Evaluating grammatical error corrections

Organized by cnapoles

This competition has moved. Please update your bookmarks to https://competitions.codalab.org/competitions/15475

This platform evaluates grammatical error corrections of the CoNLL 2014 Shared Task test set [1] and is released to accompany the following paper:

Courtney Napoles, Keisuke Sakaguchi, and Joel Tetreault
There’s No Comparison: Reference-less Evaluation Metrics in Grammatical Error Correction
EMNLP 2016

Please include the following citation if you use this toolkit.

@InProceedings{napoles-sakaguchi-tetreault:2016:EMNLP2016,
  author    = {Napoles, Courtney  and  Sakaguchi, Keisuke  and  Tetreault, Joel},
  title     = {There's No Comparison: Reference-less Evaluation Metrics in Grammatical Error Correction},
  booktitle = {Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing},
  month     = {November},
  year      = {2016},
  address   = {Austin, Texas},
  publisher = {Association for Computational Linguistics},
  pages     = {2109--2115},
  url       = {https://aclweb.org/anthology/D16-1228}
}

The code for running this evaluation program is also available from our git repository: https://github.com/cnap/grammaticality-metrics


Data

The CoNLL 2014 test set can be obtained from the official shared task website:

http://www.comp.nus.edu.sg/~nlp/conll14st.html

The following metrics and reference sets are supported in this competition:

Metrics

  • Reference-based metrics (RBMs)
    • GLEU [2]
    • I-measure [3] (not supported in CodaLab; see Credits)
    • M2 [4]
  • Grammaticality-based metrics (GBM)
    • LT
  • Interpolated metrics
    • LT interpolated with each RBM (see the interpolation sketch after the reference sets below)

Reference sets

  • NUCLE references [1]
  • non-expert fluency edits [5]
  • non-expert minimal edits [5]
  • expert fluency edits [5]
  • expert minimal edits [5]
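
To make the interpolated metrics above concrete, here is a minimal sketch of linearly interpolating a sentence-level grammaticality score (e.g. from LT) with a sentence-level reference-based score (e.g. GLEU or M2). The interpolation weight and the assumption that both scores lie in [0, 1] are illustrative choices, not the exact configuration of the competition's scoring program.

# Illustrative sketch: linear interpolation of a grammaticality-based metric
# (GBM, e.g. LT) with a reference-based metric (RBM, e.g. GLEU or M2).
# The weight lam and the [0, 1] normalization are assumptions for this example.

def interpolate(gbm_score, rbm_score, lam=0.5):
    """Interpolate sentence-level GBM and RBM scores (both assumed in [0, 1])."""
    return lam * gbm_score + (1.0 - lam) * rbm_score

def system_score(gbm_scores, rbm_scores, lam=0.5):
    """Average the interpolated sentence-level scores over a submission."""
    pairs = list(zip(gbm_scores, rbm_scores))
    return sum(interpolate(g, r, lam) for g, r in pairs) / len(pairs)

if __name__ == "__main__":
    lt = [0.95, 0.80, 1.00]    # hypothetical LT (grammaticality) scores
    gleu = [0.62, 0.41, 0.77]  # hypothetical GLEU (reference-based) scores
    print(system_score(lt, gleu, lam=0.5))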

Credits

The scripts for calculating GLEU, I-measure, and M2 were modified to return sentence-level scores and to be callable from an external program. At the time of writing, CodaLab does not support Java 8, so we use the most recent version of LanguageTool that supports Java 7 (v3.1). I-measure takes several minutes to run and exceeds the time limit that CodaLab imposes on scoring programs; it is therefore not enabled in the online CodaLab competition, but you can run it from the original repository (https://github.com/mfelice/imeasure) or from our git repository (https://github.com/cnap/grammaticality-metrics).
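
As a rough illustration of the grammaticality-based (LT) side of the evaluation, the sketch below counts LanguageTool matches per sentence and converts them into a length-normalized score. It uses the third-party language_tool_python wrapper and a simple 1 - errors/tokens formula purely for illustration; neither the wrapper nor this exact formula is necessarily what the competition's scoring program uses (which calls LanguageTool v3.1 directly, as noted above).

# Illustration only: a simple LanguageTool-based grammaticality score,
# computed as 1 - (number of LT matches / number of tokens). Both the
# language_tool_python wrapper and this formula are assumptions made for
# this example, not the toolkit's own implementation.
import language_tool_python

def lt_score(sentence, tool):
    """Return a crude per-sentence grammaticality score in [0, 1]."""
    tokens = sentence.split()
    if not tokens:
        return 1.0
    errors = len(tool.check(sentence))
    return max(0.0, 1.0 - errors / len(tokens))

if __name__ == "__main__":
    tool = language_tool_python.LanguageTool("en-US")
    print(lt_score("He go to school every days .", tool))
    tool.close()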


References

1. Ng et al. The CoNLL-2014 Shared Task on grammatical error correction. In Proceedings of CoNLL, 2014.
2. Napoles et al. Ground truth for grammatical error correction metrics. In Proceedings of ACL, 2015.
3. Felice and Briscoe. Towards a standard evaluation method for grammatical error detection and correction. In Proceedings of NAACL, 2015.
4. Dahlmeier and Ng. Better evaluation for grammatical error correction. In Proceedings of NAACL, 2012.
5. Sakaguchi et al. Reassessing the goals of grammatical error correction: Fluency instead of grammaticality. TACL, 2016.


Misc.

In the future, we hope to expand the evaluation to include new data, references, and metrics. Feel free to contact us with any suggestions, questions, or comments.


Courtney Napoles (napoles@cs.jhu.edu)

2016-11-16
