SemEval 2021 Task 1 - Lexical Complexity Prediction (LCP)

Organized by ghpaetzold

SemEval 2021 Task 1: Lexical Complexity Prediction (LCP 2021) is a follow-up of the CWI 2016 and CWI 2018 shared tasks. 
LCP 2021 provides participants with an augmented version of CompLex, a multi-domain English dataset with sentences annotated using a 5-point Likert scale (1-5) (from very easy to very difficult). The task is to predict the complexity value of words in context.

LCP 2021 is divided into two sub-tasks:

  • Sub-task 1: predicting the complexity score of single words;
  • Sub-task 2: predicting the complexity score of multi-word expressions 
The dataset and a detailed task description are available at:

LCP 2021 systems will be ranked using Pearson correlation (R).

We will also report scores for the following metrics:

  • Spearman correlation (Rho)
  • Mean absolute error (MAE)
  • Mean squared error (MSE)
  • R-squared (R2)
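The metrics above can be sketched as follows. This is an illustrative implementation using only NumPy, not the organizers' official evaluation script; the function and variable names are our own.

```python
import numpy as np

def rankdata(x):
    # Assign ranks 1..n, averaging the ranks of tied values
    # (the convention used by Spearman correlation).
    x = np.asarray(x, dtype=float)
    order = np.argsort(x)
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    for v in np.unique(x):
        mask = x == v
        ranks[mask] = ranks[mask].mean()
    return ranks

def evaluate(gold, pred):
    # Compute the five reported metrics for a set of predictions.
    gold, pred = np.asarray(gold, float), np.asarray(pred, float)
    pearson = np.corrcoef(gold, pred)[0, 1]
    spearman = np.corrcoef(rankdata(gold), rankdata(pred))[0, 1]
    mae = np.abs(gold - pred).mean()
    mse = ((gold - pred) ** 2).mean()
    r2 = 1.0 - mse / gold.var()  # 1 - SS_res / SS_tot
    return {"Pearson": pearson, "Spearman": spearman,
            "MAE": mae, "MSE": mse, "R2": r2}
```

Note that Pearson correlation, the official ranking metric, rewards linear agreement with the gold scores, while Spearman only considers the ranking of instances.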

In the shared-task report paper, we will also report the overall performance for sub-task 1 and sub-task 2 for teams that made a submission for both tracks. We will also include scores for different content sub-genres present in the dataset.

To give participants a reasonable estimate of achievable performance, we have released the following baselines, which combine log word frequency from SUBTLEX with word length. For MWEs, features were averaged over the tokens in the expression. Models were trained on the training set and evaluated on the trial set; each feature, or combination of features, was passed through linear regression. The complexity-average baseline assigns the same 'average complexity' value, learned from the training set, to every instance in the trial set. The results are illustrated below:
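A minimal sketch of the frequency-plus-length baseline described above. In a real system the frequencies would be looked up in the SUBTLEX corpus file; the tiny `FREQ` table here is a hypothetical stand-in, and unseen words fall back to a count of 1.

```python
import numpy as np

# Hypothetical stand-in for SUBTLEX frequency counts.
FREQ = {"the": 1501908, "cell": 12430, "ribosome": 21}

def features(token):
    # Log word frequency and word length; for an MWE the features
    # are averaged over its tokens, as in the baseline description.
    words = token.split()
    logf = np.mean([np.log(FREQ.get(w, 1)) for w in words])
    length = np.mean([len(w) for w in words])
    return [1.0, logf, length]  # bias term + the two features

def fit(tokens, scores):
    # Ordinary least-squares linear regression on the features.
    X = np.array([features(t) for t in tokens])
    y = np.array(scores, dtype=float)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def predict(w, token):
    return float(np.array(features(token)) @ w)
```

The complexity-average baseline corresponds to dropping both features and keeping only the bias term, so every trial instance receives the mean training-set complexity.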

By submitting results to this competition, you consent to the public release of your scores at the SemEval-2021 workshop and in the associated proceedings, at the task organizers' discretion. Scores may include, but are not limited to, automatically and manually calculated quantitative judgements, qualitative judgements, and such other metrics as the task organizers see fit. You accept that the ultimate decision of metric choice and score value is that of the task organizers.

You further agree that if your team has several members, each of them will register to the competition and build a competition team (as described on the 'Overview' page) and that if you are a single participant you will build a team with a single member.

You further agree that the task organizers are under no obligation to release scores and that scores may be withheld if it is the task organizers' judgement that the submission was incomplete, erroneous, deceptive, or violated the letter or spirit of the competition's rules. Inclusion of a submission's scores is not an endorsement of a team or individual's submission, system, or science.

You further agree that your system may be named according to the team name provided at the time of submission, or to a suitable shorthand as determined by the task organizers.

For both sub-tasks, the submission must be a ZIP file containing a single .CSV file.

Each line of the .CSV file must follow the pattern:

<ID>,<SCORE>

where <ID> is the instance's ID provided in the dataset, and <SCORE> is the complexity score predicted by your system.

The starting kits you find in the "Participate/Files" section are examples of valid submissions!
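Producing such a submission can be sketched as below. The instance IDs and the inner file name `answer.csv` are illustrative assumptions; use the IDs from the dataset and check the starting kits for the expected file name.

```python
import csv
import zipfile

# Hypothetical instance IDs mapped to predicted complexity scores.
predictions = {"3WMX9UNQ": 0.275, "F5KZ8APD": 0.412}

# Write one "<ID>,<SCORE>" line per test instance.
with open("answer.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for inst_id, score in predictions.items():
        writer.writerow([inst_id, score])

# Package the CSV into the ZIP file to be uploaded.
with zipfile.ZipFile("submission.zip", "w") as zf:
    zf.write("answer.csv")
```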

The CodaLab username used for your submission will be listed in the shared task report. If your team would like to use a team name different from your CodaLab username, you must include this information in your CodaLab profile before submission. We cannot make any name changes after submissions are completed. It is also prohibited for the same team to submit results using multiple accounts.

During the trial phase, you will be allowed 999 submissions so that you can familiarize yourself with the platform. However, during the test phase, you will be allowed only 3 submissions, so make them count! Only your best submission will be featured on the leaderboard.

TRIAL: Sub-Task 1 (Single Words)

Start: Oct. 29, 2020, midnight UTC

TRIAL: Sub-Task 2 (Multi-Word Expressions)

Start: Oct. 29, 2020, midnight UTC

TEST: Sub-Task 1 (Single Words)

Start: Jan. 11, 2021, midnight UTC

TEST: Sub-Task 2 (Multi-Word Expressions)

Start: Jan. 11, 2021, midnight UTC

Competition Ends


Rank  Username         Score
1     DeepBlueAI       0.8612
2     rg_pa            0.8575
3     xiang_wen_tian   0.8571