The ability to accurately model eye-tracking features is crucial to advancing our understanding of language processing. Eye tracking provides millisecond-accurate records of where humans look while reading, shedding light on where they allocate attention during reading and comprehension. The benefits of utilizing eye movement data have been recognized in various domains, including natural language processing and computer vision. Thanks to the recent introduction of a standardized dataset, it is finally possible to compare the capabilities of machine learning approaches to model and analyze human reading patterns.
In this shared task, we present the challenge of predicting eye tracking-based metrics recorded during English sentence processing. We are interested in submissions concerning both cognitive modelling approaches and linguistically motivated approaches (e.g., language models). All participants are encouraged to write a short system description paper after the evaluation to present their models and results (see timeline below).
To participate please register in the "Participate" tab. Then you will be able to download the data provided during the respective phases.
We will use the eye-tracking data of the Zurich Cognitive Language Processing Corpus (ZuCo 1.0 and ZuCo 2.0) recorded during normal reading. The training data contains 800 sentences, and the test set 191 sentences. The features provided are scaled to the range between 0 and 100 to facilitate evaluation via the mean absolute error (MAE). The feature values are averaged over all readers.
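The exact scaling procedure is not specified beyond the target range; a standard min-max rescaling into [0, 100] would look like the following sketch (the function name is illustrative):

```python
def scale_to_range(values, low=0.0, high=100.0):
    """Min-max rescale a list of feature values into [low, high]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: map everything to the lower bound
        return [low for _ in values]
    return [low + (v - lo) * (high - low) / (hi - lo) for v in values]
```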
The shared task is formulated as a regression task to predict five eye-tracking features:

- nFix: number of fixations on the word
- FFD: first fixation duration, the duration of the first fixation on the word
- GPT: go-past time, the sum of all fixation durations from the first fixation on the word until the first fixation on a word to its right
- TRT: total reading time, the sum of all fixation durations on the word
- fixProp: fixation proportion, the proportion of participants that fixated the word
For a detailed description of the dataset please refer to the original publications (Hollenstein et al., 2018 and Hollenstein et al., 2020).
Concretely, you must predict the five features (nFix, FFD, GPT, TRT, and fixProp) for each token in the test set. Each training sample consists of a token in a sentence and the corresponding feature values.
You are given a CSV file with sentences to train your model. Tokens in the sentences are split in the same manner as they were presented to the participants during the reading experiments; hence, the splitting does not necessarily follow linguistically correct tokenization. Sentence endings are marked with an <EOS> symbol. You must train a model that predicts the eye-tracking features for each token. At test time, you will receive a CSV file with the text input only, and your model must predict the eye-tracking features.
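As a concrete starting point, the training CSV can be read with the standard library and used to fit a trivial baseline that predicts the per-feature training mean for every token. This is only an illustrative sketch (the function name and the toy sample values are made up; the column names follow the submission header format):

```python
import csv
import io

FEATURES = ["nFix", "FFD", "GPT", "TRT", "fixProp"]

def fit_mean_baseline(csv_text):
    """Return the per-feature training mean: a minimal constant predictor."""
    reader = csv.DictReader(io.StringIO(csv_text))
    sums = {f: 0.0 for f in FEATURES}
    n = 0
    for row in reader:
        n += 1
        for f in FEATURES:
            sums[f] += float(row[f])
    return {f: sums[f] / n for f in FEATURES}

# Tiny illustrative sample (feature values are invented, scaled to 0-100):
sample = """sentence_id,word_id,word,nFix,FFD,GPT,TRT,fixProp
1,1,The,10,20,25,30,40
1,2,cat,20,40,35,50,60
"""
means = fit_mean_baseline(sample)
```

Any real system would replace the constant prediction with a learned model, but a mean baseline is a useful sanity check for the submission pipeline.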
To make a submission, upload a ZIP file containing a text file named "answer.txt" with your predictions in the same format as the training data: a header line "sentence_id,word_id,word,nFix,FFD,GPT,TRT,fixProp", followed by one token per line, as in the test data file, including the predicted features.
Submissions are evaluated by comparing the predictions against the true eye-tracking feature values using the mean absolute error (MAE). The winning system will be the one with the lowest MAE averaged across all five eye-tracking features.
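The ranking metric is straightforward to reproduce locally; this sketch (not the official scoring script) computes the per-feature MAE and averages it across features:

```python
def mae(y_true, y_pred):
    """Mean absolute error for a single feature."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def average_mae(true_by_feature, pred_by_feature):
    """Average of the per-feature MAEs: the ranking criterion."""
    return sum(
        mae(true_by_feature[f], pred_by_feature[f]) for f in true_by_feature
    ) / len(true_by_feature)
```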
There will be no leaderboard available until the release of the results after the evaluation phase.
Any additional data source is allowed, as long as it is freely available to the research community and you describe it in the submitted paper. For example, additional eye-tracking corpora, additional features such as brain activity signals, pre-trained language models, lexical and syntactic features, etc.
Start: Jan. 15, 2021, 8 a.m.
Description: Practice phase: Trial data is available in the starting kit to the participants. No leaderboard.
Start: Jan. 29, 2021, 8 a.m.
Description: Training phase: Training data is released to the participants. No leaderboard.
Start: Feb. 23, 2021, 8 a.m.
Description: Evaluation phase: Test data is released to the participants. Leaderboard will be published at the end.
Start: March 9, 2021, 8 a.m.
Description: Post-Evaluation phase: Results, the scoring script, and the datasets are available to everyone.