Organized by hollensteinnora



CMCL 2021 Shared Task

Predicting Human Reading Behavior

The ability to accurately model eye-tracking features is crucial for advancing our understanding of language processing. Eye tracking provides millisecond-accurate records of where humans look while they are reading, shedding light on where they allocate attention during reading and comprehension. The benefits of utilizing eye movement data have been demonstrated in various domains, including natural language processing and computer vision. Thanks to the recent introduction of a standardized dataset, it is finally possible to compare the capabilities of machine learning approaches to model and analyze human patterns of reading.


In this shared task, we present the challenge of predicting eye tracking-based metrics recorded during English sentence processing. We are interested in submissions concerning both cognitive modelling approaches and linguistically motivated approaches (e.g., language models). All participants are encouraged to write a short system description paper after the evaluation to present their models and results (see timeline below).

To participate please register in the "Participate" tab. Then you will be able to download the data provided during the respective phases.


We will use the eye-tracking data of the Zurich Cognitive Language Processing Corpus (ZuCo 1.0 and ZuCo 2.0) recorded during normal reading. The training data contains 800 sentences, and the test set 191 sentences. The provided feature values are scaled to the range between 0 and 100 to facilitate evaluation via the mean absolute error (MAE), and are averaged over all readers.

The shared task is formulated as a regression task to predict 5 eye-tracking features:

  1. number of fixations (nFix), total number of fixations on the current word;
  2. first fixation duration (FFD), the duration of the first fixation on the current word;
  3. total reading time (TRT), the sum of all fixation durations on the current word, including regressions;
  4. go-past time (GPT), the sum of all fixations prior to progressing to the right of the current word, including regressions to previous words that originated from the current word;
  5. fixation proportion (fixProp), the proportion of participants that fixated the current word (proxy for how likely a word is to be fixated).
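To make the definitions above concrete, here is a small sketch that derives nFix, FFD, TRT, and GPT from one hypothetical reader's fixation sequence. This is purely illustrative: the shared task data is averaged over all readers and scaled to 0-100, and fixProp additionally requires data from multiple readers (the proportion of them who fixated the word).

```python
def word_metrics(fixations, w):
    """Compute nFix, FFD, TRT, GPT for word index w from one reader's
    fixations, given as (word_index, duration_ms) pairs in temporal order."""
    durations = [d for i, d in fixations if i == w]
    nfix = len(durations)                  # number of fixations on w
    ffd = durations[0] if durations else 0  # first fixation duration
    trt = sum(durations)                   # total reading time on w

    # Go-past time: all fixation durations from the first fixation on w
    # until the reader progresses to the right of w, including regressions.
    gpt = 0
    started = False
    for i, d in fixations:
        if i == w:
            started = True
        if started:
            if i > w:
                break
            gpt += d
    return nfix, ffd, trt, gpt

# Reader fixates word 0, word 1, regresses to word 0, returns to word 1,
# then moves on to word 2.
fx = [(0, 100), (1, 120), (0, 80), (1, 90), (2, 110)]
```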

For a detailed description of the dataset please refer to the original publications (Hollenstein et al., 2018 and Hollenstein et al., 2020).


  • January 15, 2021: Trial data release
  • January 29, 2021: Participant registration deadline & training data release
  • February 23, 2021: Test data release
  • March 2, 2021: Submission deadline
  • March 9, 2021: Results release
  • March 22, 2021: Paper submission deadline
  • April 15, 2021: Reviews released to participants
  • April 26, 2021: Camera-ready papers due


This shared task is organized as part of the CMCL workshop. If you have any questions, please contact the organizers.


Task objective
The shared task is formulated as a regression task to predict the five eye-tracking features for each token in the test set. Each training sample consists of a token in a sentence and the corresponding features. You must predict the five features: nFix, FFD, TRT, GPT, and fixProp.
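As a minimal reference point for this regression setup, one could predict each feature's training-set mean for every test token. The sketch below shows this mean baseline for two of the five features; the values are invented for illustration.

```python
def fit_mean_baseline(train_rows, features):
    """Fit a trivial baseline: the prediction for any token is the
    per-feature mean observed in the training rows."""
    means = {f: sum(row[f] for row in train_rows) / len(train_rows)
             for f in features}
    return lambda token: dict(means)  # identical prediction for every token

# Two toy training samples with invented feature values.
train = [
    {"nFix": 4.0, "FFD": 1.2},
    {"nFix": 6.0, "FFD": 1.6},
]
predict = fit_mean_baseline(train, ["nFix", "FFD"])
```

Any real submission should of course beat this baseline, e.g. by conditioning on word length, frequency, or language-model surprisal.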

Data Format

You are given a CSV file with sentences to train your model. Tokens in the sentences are split in the same manner as they were presented to the participants during the reading experiments; hence, the splits do not necessarily follow linguistically correct tokenization. Sentence endings are marked with an <EOS> symbol. You must train a model that predicts the eye-tracking features for each token. At test time, you will receive a CSV file containing the text input only, and your model must predict the eye-tracking features.
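A sketch of loading such a file with the Python standard library, grouping tokens and feature vectors per sentence (e.g. for a sequence model). The column order follows the submission header given below; the feature values, and the placement of <EOS> on the final token, are assumptions made for this toy sample.

```python
import csv
import io

# Toy sample in the assumed training-data layout (values invented).
sample = """sentence_id,word_id,word,nFix,FFD,GPT,TRT,fixProp
1,1,The,4.0,1.2,1.5,1.8,55.0
1,2,cat<EOS>,6.5,1.6,2.0,2.4,70.0
"""
FEATURES = ["nFix", "FFD", "GPT", "TRT", "fixProp"]

# Group (token, feature-vector) pairs by sentence id.
sentences = {}
for row in csv.DictReader(io.StringIO(sample)):
    sentences.setdefault(row["sentence_id"], []).append(
        (row["word"], [float(row[f]) for f in FEATURES])
    )
```

For the real data, replace the `io.StringIO(sample)` handle with an open file object for the released CSV.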

To make a submission, you need to upload a ZIP file containing a text file named "answer.txt", which contains your predictions in the same format as the training data. This means a header line containing "sentence_id,word_id,word,nFix,FFD,GPT,TRT,fixProp", followed by one token per line as in the test data file including the predicted features.
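The packaging step can be scripted with the standard library. The single prediction row below is an invented placeholder; a real submission has one row per test token, in the same order as the test file.

```python
import csv
import zipfile

HEADER = ["sentence_id", "word_id", "word",
          "nFix", "FFD", "GPT", "TRT", "fixProp"]

# One illustrative prediction row (values invented).
predictions = [
    (1, 1, "The", 4.0, 1.2, 1.5, 1.8, 55.0),
]

# Write the predictions in the required format.
with open("answer.txt", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(HEADER)
    writer.writerows(predictions)

# Per the instructions, answer.txt goes at the top level of the ZIP.
with zipfile.ZipFile("submission.zip", "w") as zf:
    zf.write("answer.txt")
```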


Submissions are evaluated by comparing the predictions against the real eye-tracking feature values using the mean absolute error (MAE) metric. The winning system will be the one with the lowest MAE averaged across all 5 eye-tracking features.
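The ranking criterion can be written out in a few lines: MAE per feature, then the mean over the features. Only two of the five features are shown here, with invented values.

```python
def mae(true_vals, pred_vals):
    """Mean absolute error over paired observations."""
    return sum(abs(t - p) for t, p in zip(true_vals, pred_vals)) / len(true_vals)

# Two tokens, two of the five features (values invented for illustration).
true = {"nFix": [4.0, 6.5], "FFD": [1.2, 1.6]}
pred = {"nFix": [5.0, 6.0], "FFD": [1.0, 1.6]}

# The ranking score: per-feature MAE averaged over the features.
overall = sum(mae(true[f], pred[f]) for f in true) / len(true)
```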

There will be no leaderboard available until the release of the results after the evaluation phase.

Additional Rules

Any additional data source is allowed, as long as it is freely available to the research community and you describe it in the submitted paper. Examples include additional eye-tracking corpora, additional features such as brain activity signals, pre-trained language models, and lexical and syntactic features.


Submissions must be made before the end of the evaluation phase.

This challenge is organized as part of the CMCL workshop. If you have any questions, please contact the organizers.


Start: Jan. 15, 2021, 8 a.m.

Description: Practice phase: Trial data is available in the starting kit to the participants. No leaderboard.


Start: Jan. 29, 2021, 8 a.m.

Description: Training phase: Training data is released to the participants. No leaderboard.


Start: Feb. 23, 2021, 8 a.m.

Description: Evaluation phase: Test data is released to the participants. Leaderboard will be published at the end.


Start: March 9, 2021, 8 a.m.

Description: Post-Evaluation phase: Results, scoring script, and datasets are available to everyone.

Competition Ends

