Please cite the shared task paper with the following BibTeX:
Code-switching (CS) is the phenomenon by which multilingual speakers switch back and forth between their common languages in written or spoken communication. CS typically occurs at the intersentential, intrasentential (mixing of words from multiple languages in the same utterance) and even morphological (mixing of morphemes) levels. CS presents serious challenges for language technologies such as parsing, machine translation (MT), automatic speech recognition (ASR), information retrieval (IR) and extraction (IE), and semantic processing. Traditional techniques trained for one language quickly break down when input from another language is mixed in. Even for problems that are considered solved, such as language identification or part-of-speech tagging, performance degrades in proportion to the amount and level of mixed-language content present.
The third workshop on Computational Approaches to Linguistic Code-Switching (CALCS-2018) has prepared a shared task on Named Entity Recognition using English-Spanish code-switched data from social media. The goal is to allow participants to explore supervised, semi-supervised and/or unsupervised approaches to predicting entity types in CS data. We believe that this effort will provide more resources to the growing CS research community.
Participants of the shared task will have to register on the official website of the workshop as well as request access to the CodaLab competition (see Participate tab). Additionally, participants will be required to submit the output of their systems within a pre-specified time window in order to qualify for evaluation in the shared task. They will also be required to submit a paper describing their system.
For more information, please visit the official website of the workshop.
We will evaluate your predictions with the F1 metric, the harmonic mean of precision and recall. This is the standard way to evaluate NER tasks. Additionally, we include the Surface Forms F1 metric introduced in the Workshop on Noisy User-generated Text, W-NUT 2017 (Derczynski et al., 2017).
The leaderboard will show both the standard F1 and the Surface Forms F1. However, the ranking will be ordered by the average of those two metrics.
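To make the ranking score concrete, here is a minimal Python sketch of one common reading of the two metrics: standard F1 over exact entity matches (same position and type), and Surface Forms F1 over the unique (surface text, type) pairs, following the description in Derczynski et al., 2017. This is not the official evaluation script. The tuple layout, the function names, and the `PER`/`LOC`/`ORG` labels are illustrative assumptions.

```python
from typing import Set, Tuple

# Assumed entity layout: (sentence index, start token, end token, type, surface text).
Entity = Tuple[int, int, int, str, str]


def f1(gold: set, pred: set) -> float:
    """Harmonic mean of precision and recall over two sets of items."""
    true_positives = len(gold & pred)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def surface_forms(entities: Set[Entity]) -> Set[Tuple[str, str]]:
    """Collapse entity mentions into unique (surface text, type) pairs."""
    return {(surface, etype) for (_, _, _, etype, surface) in entities}


def ranking_score(gold: Set[Entity], pred: Set[Entity]) -> float:
    """Average of standard F1 and Surface Forms F1."""
    return (f1(gold, pred) + f1(surface_forms(gold), surface_forms(pred))) / 2


# Toy data with placeholder labels (not taken from the task data).
gold = {
    (0, 0, 2, "PER", "Juan Perez"),
    (0, 5, 6, "LOC", "Houston"),
    (1, 1, 2, "LOC", "Houston"),
}
pred = {
    (0, 0, 2, "PER", "Juan Perez"),
    (1, 1, 2, "LOC", "Houston"),
    (1, 4, 5, "ORG", "Houston"),
}

standard = f1(gold, pred)
surface = f1(surface_forms(gold), surface_forms(pred))
print(f"{standard:.3f} {surface:.3f} {ranking_score(gold, pred):.3f}")
# prints "0.667 0.800 0.733"
```

In this toy example the missed second mention of "Houston" lowers the standard F1 but not the surface-form recall, because the surface form was already found once.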
By participating in the CALCS 2018 shared task, you agree to the following terms and conditions:
This is a Named Entity Recognition shared task on social media data that exhibits English-Spanish code-switching. You will have to predict the correct entity type using the IOB scheme for the following categories:
Note that [BI] stands for the Beginning and Inside prefixes of each category. The prefix indicates whether a token starts a named entity (NE) or is a subsequent token of a multi-word NE. You can find the annotation guidelines used for this data here.
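As an illustration of how IOB tags map to entities, the following Python sketch groups a tag sequence into (start, end, type) spans. The function name `iob_to_spans` and the `PER`/`LOC` labels are placeholders chosen for the example. Treating an `I-` tag with no preceding `B-` tag as the start of a new entity is a lenient assumption, not a rule stated in the task description.

```python
from typing import List, Tuple


def iob_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Convert IOB tags into (start, end_exclusive, type) spans."""
    spans = []
    start, etype = None, None
    # A trailing "O" sentinel closes any entity that runs to the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, etype))
            start, etype = None, None
        # "B-" always opens an entity; an orphan "I-" is treated the same way.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, etype = i, tag[2:]
    return spans


tokens = ["Juan", "Perez", "vive", "en", "Houston"]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC"]
for start, end, etype in iob_to_spans(tags):
    print(etype, " ".join(tokens[start:end]))
# prints "PER Juan Perez" then "LOC Houston"
```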
Participants can use any resources (e.g., pre-trained word embeddings, gazetteers, etc.) that they consider appropriate for the task. The competition does not distinguish between systems that use external resources and systems that do not. However, we highly encourage participants to keep track of how performance changes as resources are added, so that these insights can be included in the paper.
We provide the test set in CoNLL format. We expect you to add the label next to each token, using a tab as the delimiter. Additionally, do not change the order of the lines in the test set, because doing so could hurt your scores.
The evaluation script expects your submission file to be named "calcs_eng_spa_preds.conll". Additionally, you will need to compress your submission file in order to upload it to CodaLab.
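The submission steps can be sketched in Python as follows. This sketch assumes that the test file has one token per line with blank lines between sentences, and that a zip archive is an acceptable compressed format. The function name `write_submission` and the archive name `submission.zip` are hypothetical; only the file name `calcs_eng_spa_preds.conll` comes from the instructions above.

```python
import zipfile
from pathlib import Path
from typing import Iterable


def write_submission(test_path: str, labels: Iterable[str], out_dir: str) -> Path:
    """Append one tab-separated label per token, keeping the original line order."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    pred_path = out / "calcs_eng_spa_preds.conll"  # name required by the evaluation script
    label_iter = iter(labels)

    with open(test_path, encoding="utf-8") as src, \
            open(pred_path, "w", encoding="utf-8") as dst:
        for line in src:
            token = line.rstrip("\n")
            if not token.strip():
                dst.write("\n")  # keep sentence boundaries untouched
                continue
            label = next(label_iter, None)
            if label is None:
                raise ValueError("fewer labels than tokens")
            dst.write(f"{token}\t{label}\n")

    if next(label_iter, None) is not None:
        raise ValueError("more labels than tokens")

    # Assumed archive name; the prediction file sits at the archive root.
    zip_path = out / "submission.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.write(pred_path, arcname=pred_path.name)
    return zip_path
```

A call such as `write_submission("test.conll", predicted_labels, "out")` would then produce `out/submission.zip`, where `predicted_labels` is a flat list of labels in the same order as the tokens in the test file. The length checks catch the misalignment that would otherwise shift every label by one line.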
Finally, you will be able to submit your results as a team. As such, please use a team name that you would like to see in the proceedings of the workshop. To join/create a team, please follow the instructions here.
Start: March 23, 2018, midnight
Description: This is the NER English-Spanish competition.
Start: April 29, 2018, 10:25 p.m.
Description: This is the NER English-Spanish perpetual benchmark phase.
April 19, 2018, 11 p.m.