These are the instructions for the review form used by reviewers at NAACL HLT 2015. Authors who have submitted a paper to NAACL 2015 can use this as a guide to understanding the numeric scores in the reviews they have received.
Please complete your review using the following guidelines for the scored categories.
Appropriateness (1-5)
Does this paper fit in NAACL-HLT?
The NAACL HLT 2015 conference covers a broad spectrum of disciplines aimed at: building intelligent systems to interact with humans using natural language; understanding computational and other linguistic properties of languages; and enhancing human-human communication through speech recognition, automatic translation, information retrieval, text summarization, and information extraction.
Both empirical and theoretical results are welcome; see the Call for Papers.
- 5 = Appropriate for NAACL-HLT. (Most submissions)
- 4 = Computational linguistics/NLP, IR, or Speech, though not typical NAACL-HLT material.
- 3 = Possibly relevant to the audience, though it's not quite computational linguistics/NLP, IR, or Speech.
- 2 = Only marginally relevant.
- 1 = Inappropriate.
Clarity (1-5)
For a reasonably well-prepared reader, is it clear what was done and why? Is the paper well-written and well-structured?
- 5 = Very clear.
- 4 = Understandable by most readers.
- 3 = Mostly understandable with some effort.
- 2 = Important questions were hard to resolve even with effort.
- 1 = Much of the paper is confusing.
Originality (1-5)
How original is the approach or problem presented in this paper? Does this paper break new ground in topic, methodology, or insight? A paper can score high for originality even if the results did not show a convincing benefit.
- 5 = Innovative: Highly original and significant new research topic, technique, methodology, or insight.
- 4 = Creative: An intriguing problem, technique, or approach that is substantially different from previous research.
- 3 = Respectable: A nice research contribution that represents a notable extension of prior approaches or methodologies.
- 2 = Uninspiring: Obvious, or a minor improvement on familiar techniques.
- 1 = Significant portions have actually been done before or done better.
Soundness/Correctness (1-5)
Is the technical approach sound and well-chosen? Can one trust the claims of the paper – are they supported by proofs or proper experiments whose results are correctly interpreted?
- 5 = The approach is sound, and the claims are convincingly supported.
- 4 = Generally solid, but there are some aspects of the approach or evaluation I am not sure about.
- 3 = Fairly reasonable, but the main claims cannot be accepted based on the material provided.
- 2 = Troublesome. Some interesting ideas, but the work needs better justification or evaluation.
- 1 = Fatally flawed.
Impact of Ideas/Results (1-5)
How significant is the work described? If the ideas are novel, will they also be useful or inspirational? If the results are sound, are they also important?
- 5 = Will have a significant impact on the field.
- 4 = Some of the ideas or results will substantially help ongoing research.
- 3 = Interesting but not too influential: a minor contribution that will be used mainly for comparison.
- 2 = Marginally interesting. Probably will not be read or used.
- 1 = Will have no impact on the field.
Meaningful Comparison (1-5)
Do the authors place their work well with respect to existing literature? Are the references adequate? Are any experimental results meaningfully compared with appropriate prior approaches or other baselines?
If you feel references are inadequate be sure to include the relevant references in your comments.
- 5 = Comparison to prior work is superbly carried out given the space constraints.
- 4 = Comparisons are mostly solid, but there are some missing references.
- 3 = Comparisons are weak; it is very hard to determine how this work compares to previous work.
- 2 = Only partial awareness or understanding of related work, or a flawed empirical comparison.
- 1 = Little awareness of related work, or lacks necessary empirical comparison.
Substance (1-5)
Does this paper have enough substance, or would it benefit from more ideas or results? Note that this question mainly concerns the amount of work; quality is evaluated in other categories.
- 5 = Contains more ideas or results than most publications of this length at NAACL-HLT.
- 4 = Represents an appropriate amount of content for a NAACL-HLT paper of this length (most submissions).
- 3 = Leaves open one or two natural questions that should have been pursued within the paper.
- 2 = Work in progress. There are enough good ideas, but perhaps not enough results yet.
- 1 = Seems thin. Not enough ideas here.
Replicability (1-5)
Will members of the research community be able to reproduce or verify the results described in this paper? A lower score might be assigned if an insufficient amount of detail has been provided, if there is a highly subjective component to the setting of certain parameters, or if proprietary data have been used in the experiments. A low score here does not necessarily imply a low overall recommendation.
Members of the ACL community…
- 5 = could easily reproduce the results and verify the correctness of the results described here.
- 4 = could mostly reproduce the results described here, although there may be some variation because of sample variance or minor variations in their interpretation of the protocol or method.
- 3 = could possibly reproduce the results described here with some difficulty. The settings of parameters are underspecified or very subjectively determined; the training/evaluation data required are not widely available.
- 2 = could not reproduce the results described here no matter how hard they tried. The author simply has not provided sufficient detail or access to resources for us to do anything more than accept their conclusions without question.
- 1 = not applicable (please use this very sparingly, such as for short submissions that are opinion pieces).
Overall Recommendation (1-5)
There are many good submissions to NAACL-HLT 2015; how important is it to feature this one? Will people learn a lot by reading this paper or seeing it presented?
In deciding on your ultimate recommendation, please think over all your scores above. But remember that no paper is perfect, and remember that we want a conference full of interesting, diverse, and timely work. If a paper has some weaknesses, but you really got a lot out of it, feel free to fight for it. If a paper is solid but you could live without it, let us know that you would rather not see it in the conference. Remember also that the author has about a month to address reviewer comments before the camera-ready deadline.
Please do take the length of the submission into account. Rank short submissions relative to other short submissions, and full-length submissions relative to other full-length submissions. Acceptable short submissions include small, focused contributions; works in progress; negative results; opinion pieces; and interesting application notes.
- 5 = Exciting: I would fight for this paper to be accepted.
- 4 = Strong: I learned a lot from this paper.
- 3 = Borderline: It has some merits but also some serious problems. I’m ambivalent about this one.
- 2 = Mediocre: I would rather not see it in the conference.
- 1 = Poor: I would fight to have it rejected.
Reviewer Confidence (1-5)
- 5 = Positive that my evaluation is correct. I read the paper very carefully and I am very familiar with related work.
- 4 = Quite sure. I tried to check the important points carefully. It’s unlikely, though conceivable, that I missed something that should affect my ratings.
- 3 = Pretty sure, but there’s a chance I missed something. Although I have a good feel for this area in general, I did not carefully check the paper’s details, e.g., the math, experimental design, or novelty.
- 2 = Willing to defend my evaluation, but it is fairly likely that I missed some details, didn’t understand some central points, or can’t be sure about the novelty of the work.
- 1 = Not my area, or paper was hard for me to understand. My evaluation is just an educated guess.