We would like to announce that we are extending the test phase by 48 hours.
The test phase now ends on Jan 30, 23:59 UTC.
You can download the test data and submit predictions here:
https://competitions.codalab.org/competitions/36425
View the task FAQ: https://multiconer.github.io/faq
This shared task challenges NLP enthusiasts to develop complex Named Entity Recognition (NER) systems for 11 languages. The task focuses on detecting semantically ambiguous and complex entities in short and low-context settings. Participants are welcome to build NER systems for any number of languages, and we encourage them to aim for the bigger challenge of building NER systems for multiple languages. The languages are: English, Spanish, Dutch, Russian, Turkish, Korean, Farsi, German, Chinese, Hindi, and Bangla.
For some languages, an additional track with code-mixed data is offered. The task also aims to test the domain adaptation capability of systems with additional test sets of questions and short search queries.
For more information, please visit the official website for the task.
More Details about the Test Phase: https://multiconer.github.io/faq
In this shared task, we provide train/dev/test data for 11 languages (English, Spanish, Dutch, Russian, Turkish, Korean, Farsi, German, Chinese, Hindi, and Bangla) and two additional language settings: multilingual and code-mixed. In total, we provide 13 train/dev/test sets. This Codalab competition is the practice phase, where you are allowed to submit prediction files for all dev sets. The evaluation framework is divided into three broad types of tracks.
Tracks 1-11 (Monolingual): Participants use a monolingual training set (e.g. en_train.conll) to train their model. The model will be evaluated on en_dev.conll in the practice phase and en_test.conll in the evaluation phase. Predictions from any multilingual model are not allowed in these tracks.

Track 12 (Multilingual): Participants use the multilingual training set (multi_train.conll) to train their model. The training data contains sentences in all 11 languages, but note that each sentence in this data is in a single language only, and the data does not identify the language of each sentence. The trained model should be used to predict the multilingual evaluation sets, i.e. multi_dev.conll or multi_test.conll. Predictions from any monolingual model are not allowed in this track, so please do not submit them here.

Track 13 (Code-mixed): Participants use the code-mixed training set (mix_train.conll) or the training data of Tracks 1-12 to train their model. The trained model will be evaluated on mix_dev.conll and mix_test.conll. The data does not identify the languages used in the sentences.

Your submissions will be evaluated by macro-averaged Precision, Recall, and F1 over the 6 entity classes: LOC, PER, PROD, GRP, CW, and CORP. The leaderboard for each track contains these three metrics, and performance is ranked by macro-F1. To check more detailed evaluation scores after submission, you can either "View scoring output log" or "Download output from scoring step" and inspect scores.json.
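To illustrate the ranking metric, here is a minimal sketch of macro-averaged precision/recall/F1 over the six entity classes. It assumes gold and predicted entities have already been extracted from the BIO tags as (class, span) tuples; the official scorer's exact implementation may differ in details such as tie handling and span extraction.

```python
# Sketch: macro-averaged precision/recall/F1 over the 6 MultiCoNER classes.
# Assumes entities are (class, span) tuples already extracted from BIO tags;
# this is an illustration, not the official evaluation script.
CLASSES = ["LOC", "PER", "PROD", "GRP", "CW", "CORP"]

def macro_scores(gold, pred):
    """Return per-class scores and the macro-averaged (precision, recall, F1)."""
    per_class = {}
    for cls in CLASSES:
        g = {e for e in gold if e[0] == cls}
        p = {e for e in pred if e[0] == cls}
        tp = len(g & p)  # exact span-and-class matches
        prec = tp / len(p) if p else 0.0
        rec = tp / len(g) if g else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        per_class[cls] = (prec, rec, f1)
    # Macro average: unweighted mean over all classes.
    macro = tuple(
        sum(scores[i] for scores in per_class.values()) / len(CLASSES)
        for i in range(3)
    )
    return per_class, macro
```

Because the average is unweighted, a rare class like PROD counts as much toward the final score as a frequent one like PER.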
The prediction file should follow the CoNLL format but contain only tags, i.e. each line contains only the predicted tag for one token, and sentences are separated by a blank line. Make sure the tags in your prediction file are exactly aligned with the provided dev/test sets. For example, consider en_dev.conll/en_test.conll below. Note that the 4th column in the data contains the entity tags, which will be hidden (replaced by _) in the test set.
# id f423a88e-02b7-4d61-a546-4a1bd89cfa15 domain=dev
it _ _ O
originally _ _ O
operated _ _ O
seven _ _ O
bus _ _ O
routes _ _ O
which _ _ O
were _ _ O
mainly _ _ O
supermarket _ _ O
routes _ _ O
for _ _ O
asda _ _ B-CORP
and _ _ O
tesco _ _ B-CORP
. _ _ O
...
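A small reader for files in the format above can be sketched as follows, assuming the column layout shown in the example (token first, tag last) and `# id` comment lines as sentence headers:

```python
# Sketch: read a MultiCoNER-style .conll file into sentences of
# (token, tag) pairs. Assumes the layout from the example above:
# "# id ..." header lines, blank lines between sentences, tag in the
# last column. Verify against the actual data before relying on it.
def read_conll(path):
    sentences, current = [], []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if line.startswith("# id"):
                continue  # sentence id header, not a token
            if not line.strip():
                if current:  # blank line ends the current sentence
                    sentences.append(current)
                    current = []
                continue
            cols = line.split()
            current.append((cols[0], cols[-1]))  # (token, tag)
    if current:  # flush a final sentence with no trailing blank line
        sentences.append(current)
    return sentences
```

Keeping sentences in file order matters: predictions must later be written out in exactly the same order.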
You will need to generate the prediction file in the following format:
# (You can either delete the sentence id or keep it)
O
O
O
O
O
O
O
O
O
O
O
O
B-CORP
O
B-CORP
O
...
Follow the instructions below to submit your prediction files for a track. Codalab requires all submissions in zip format.

1. Name your prediction file <language_code>.pred.conll. For example, generate predictions for en_dev.conll (or en_test.conll in the testing phase) and name the file en.pred.conll. The language_code values for Track 12 (Multilingual) and Track 13 (Code-mixed) are multi and mix, i.e. you will need to name the prediction file multi.pred.conll or mix.pred.conll.
2. Zip the prediction file, e.g. zip my_submission.zip <language_code>.pred.conll (or use your favorite zip utility), and then submit the zip file to the right track on Codalab.
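The two submission steps can be sketched in Python as well. The `tag_sentences` argument below is a hypothetical list of per-sentence tag lists produced by your model, in the same order as the input file; whether Codalab tolerates a trailing blank line is an assumption worth checking against the FAQ.

```python
# Sketch: write a tags-only prediction file (one tag per line, blank line
# between sentences) and zip it for Codalab submission. "tag_sentences"
# is a hypothetical per-sentence list of tag lists from your model.
import os
import zipfile

def write_pred_file(tag_sentences, out_path="en.pred.conll"):
    with open(out_path, "w", encoding="utf-8") as fh:
        for tags in tag_sentences:
            fh.write("\n".join(tags) + "\n\n")  # blank line ends each sentence

def make_submission(pred_path, zip_path="my_submission.zip"):
    with zipfile.ZipFile(zip_path, "w") as zf:
        # Store only the file name, not its directory path, in the archive.
        zf.write(pred_path, arcname=os.path.basename(pred_path))
```

For example, `write_pred_file(preds, "multi.pred.conll")` followed by `make_submission("multi.pred.conll")` produces a zip ready for the multilingual track.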
More info and FAQ: https://multiconer.github.io/faq
By submitting results to this competition, you consent to the public release of your scores at the SemEval-2022 workshop and in the associated proceedings, at the task organizers' discretion. Scores may include, but are not limited to, automatically and manually calculated quantitative judgements, qualitative judgements, and such other metrics as the task organizers see fit. You accept that the ultimate decision of metric choice and score value is that of the task organizers.
You further agree that if your team has several members, each of them will register to the competition and build a competition team (as described on the 'Overview' page) and that if you are a single participant you will build a team with a single member.
You further agree that the task organizers are under no obligation to release scores and that scores may be withheld if it is the task organizers' judgement that the submission was incomplete, erroneous, deceptive, or violated the letter or spirit of the competition's rules. Inclusion of a submission's scores is not an endorsement of a team or individual's submission, system, or science.
You further agree that your system may be named according to the team name provided at the time of submission, or to a suitable shorthand as determined by the task organizers.
The data of this competition is released under the CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/) license (see lowcontext-ner-gaz (https://registry.opendata.aws/lowcontext-ner-gaz/) and code-mixed-ner (https://registry.opendata.aws/code-mixed-ner/)). Attribution shall be provided by citing:
Shervin Malmasi, Amazon, USA.
Besnik Fetahu, Amazon, USA.
Anjie Fang, Amazon, USA.
Sudipta Kar, Amazon, USA.
Oleg Rokhlenko, Amazon, USA.
Join us in Slack
Subscribe to the task mailing list
Start: Oct. 27, 2021, midnight
Description: Develop and train your system, and try evaluating on development EN data.
Start: Oct. 27, 2021, midnight
Description: Develop and train your system, and try evaluating on development ES data.
Start: Oct. 27, 2021, midnight
Description: Develop and train your system, and try evaluating on development NL data.
Start: Oct. 27, 2021, midnight
Description: Develop and train your system, and try evaluating on development RU data.
Start: Oct. 27, 2021, midnight
Description: Develop and train your system, and try evaluating on development TR data.
Start: Oct. 27, 2021, midnight
Description: Develop and train your system, and try evaluating on development KO data.
Start: Oct. 27, 2021, midnight
Description: Develop and train your system, and try evaluating on development FA data.
Start: Oct. 27, 2021, midnight
Description: Develop and train your system, and try evaluating on development DE data.
Start: Oct. 27, 2021, midnight
Description: Develop and train your system, and try evaluating on development ZH data.
Start: Oct. 27, 2021, midnight
Description: Develop and train your system, and try evaluating on development HI data.
Start: Oct. 27, 2021, midnight
Description: Develop and train your system, and try evaluating on development BN data.
Start: Nov. 1, 2021, midnight
Description: Develop and train your system, and try evaluating on multilingual development data covering all languages.
Start: Nov. 1, 2021, midnight
Description: Develop and train your system, and try evaluating on Code-mixed development data.
# | Username | Score |
---|---|---|
1 | omnifish_ygb | 0.822 |
2 | jplu | 0.822 |
3 | lld | 0.813 |