CodaLab -

> About test datasets

Hello, are there five final multilingual test datasets? How many datasets are there for each language? How many cross-lingual test datasets are there, and what is the data size? How many evaluation lists will be published?

Thanks very much

Posted by: Sattiy @ Jan. 4, 2021, 2:56 a.m.

Hello,

As far as the test data is concerned, we provided/are providing 5 multilingual datasets (test.ar-ar.data, test.en-en.data, test.fr-fr.data, test.ru-ru.data and test.zh-zh.data) and 4 cross-lingual datasets (test.en-ar.data, test.en-fr.data, test.en-ru.data and test.en-zh.data). Gold files will not be released. Each file contains 500 unique lemmas and 2000 sentences. What exactly do you mean by evaluation lists?

Please check out our GitHub page to access the data!

Best regards,
The MCL-WiC team

Posted by: Federico_Martelli @ Jan. 4, 2021, 9:56 p.m.

I mean, do the nine test datasets mean there will be nine results (ranking lists) in the evaluation phase?
Thanks very much

Posted by: Sattiy @ Jan. 5, 2021, 1:15 a.m.

Hi, what is the submission format?
Thanks

Posted by: Sattiy @ Jan. 10, 2021, 12:40 p.m.

Hello,

Yes, there will be 9 ranking lists (one for each gold dataset provided) and you can also submit only one or two datasets.

Please follow these steps for the submission:

1. download the test data (.data) from our GitHub page https://github.com/SapienzaNLP/mcl-wic,
2. generate your answers,
3. name each file "test.{language}-{language}" (for example "test.ru-ru" if you wish to participate in the Russian multilingual sub-task),
4. create a submission.zip file containing all your datasets which you would like to submit (for example the submission.zip file could contain the files "test.ru-ru" and "test.en-ru", indicating that you will participate in the Russian multilingual sub-task and the English-Russian cross-lingual sub-task), and
5. submit!
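
The steps above can be sketched in Python as follows. This is only a minimal sketch, not an official script: the function name build_submission and the shape of the predictions argument are made up for illustration, and it assumes that each answer file is a JSON list of id/tag objects (as in the dev .gold files) placed at the top level of the zip.

```python
import json
import zipfile
from pathlib import Path


def build_submission(predictions, out_dir="."):
    """Write submission.zip from a dict such as
    {"ru-ru": {"test.ru-ru.0": "T", ...}, "en-ru": {...}}.

    Hypothetical helper: the argument layout is an assumption,
    not something prescribed by the organizers.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    zip_path = out_dir / "submission.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for pair, tags in predictions.items():
            entries = [{"id": instance_id, "tag": tag}
                       for instance_id, tag in tags.items()]
            # Step 3: the file is named "test.{language}-{language}".
            archive.writestr(f"test.{pair}", json.dumps(entries, indent=2))
    return zip_path
```

Calling build_submission({"ru-ru": {...}, "en-ru": {...}}) would then produce a submission.zip containing the files "test.ru-ru" and "test.en-ru", matching the example in step 4.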

Best regards,
The MCL-WiC team

Posted by: Federico_Martelli @ Jan. 10, 2021, 8:29 p.m.

PS. Your "test.{language}-{language}" files (to be zipped together) must be in the same format as our .gold files. Please see our CodaLab page for a detailed description and download the dev .gold files from our GitHub page: https://github.com/SapienzaNLP/mcl-wic.
PPS. Possible language combinations for the multilingual sub-task: ar-ar, en-en, fr-fr, ru-ru, zh-zh. Possible combinations for the cross-lingual sub-task: en-ar, en-fr, en-ru, en-zh.

Posted by: Federico_Martelli @ Jan. 10, 2021, 8:45 p.m.

Thanks for your reply!

Posted by: Sattiy @ Jan. 11, 2021, 1:14 a.m.

Dear participants,

Please pay attention to the format of the submission files before uploading. IMPORTANT: tags must be either T or F (Y/N will not be processed by the script).

Example 1:

[
    {
        "id": "test.en-en.0",
        "tag": "T"
    }
]

Example 2:

[
    {
        "id": "test.en-en.1",
        "tag": "F"
    }
]
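
A quick local check along these lines may help catch Y/N tags before uploading. It is only a sketch under the assumption that an answer file is a JSON list of objects with "id" and "tag" keys, as in the examples above; the function name validate_answers is hypothetical and this is not the scoring script.

```python
import json

ALLOWED_TAGS = ("T", "F")


def validate_answers(text):
    """Return a list of problems found in the JSON text of one answer file."""
    entries = json.loads(text)
    if not isinstance(entries, list):
        return ["top-level value must be a list"]
    problems = []
    for position, entry in enumerate(entries):
        if not isinstance(entry, dict) or "id" not in entry or "tag" not in entry:
            problems.append(f"entry {position}: expected an object with 'id' and 'tag'")
        elif entry["tag"] not in ALLOWED_TAGS:
            problems.append(f"{entry['id']}: tag {entry['tag']!r} is not 'T' or 'F'")
    return problems
```

An empty list means that no problems were found; a file tagged with Y/N would yield one message per offending entry.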

Best regards,
The MCL-WiC team

Posted by: Federico_Martelli @ Jan. 11, 2021, 8:17 a.m.

Hello,

Please clarify this - 'and you can also submit only one or two datasets.'

Thanks.

Posted by: amansinha_ @ Jan. 12, 2021, 3:59 p.m.

Does a submission status of "Finished" mean that the submission was successful?
But the line chart shows "-1".
Thanks!

Posted by: Sattiy @ Jan. 13, 2021, 10:50 a.m.

Yes, if you received no errors and see "Finished", that means that the submission was successful!

All the best,
The MCL-WiC team

Posted by: Federico_Martelli @ Jan. 13, 2021, 11:13 a.m.

Hi again,

To answer this question: Please clarify this - 'and you can also submit only one or two datasets': Your submission.zip file can also contain only one file (for example test.ru-ru). In this case you will receive only one score (multilingual sub-task, language combination: Russian-Russian); for all other datasets you will receive -1, meaning that the dataset was not uploaded.
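
To see in advance which datasets would show -1, one could list what is actually inside the zip. The following is a rough sketch, assuming the nine file names used in this thread sit at the top level of submission.zip; the function name coverage is made up for illustration.

```python
import zipfile

MULTILINGUAL = ["ar-ar", "en-en", "fr-fr", "ru-ru", "zh-zh"]
CROSS_LINGUAL = ["en-ar", "en-fr", "en-ru", "en-zh"]
EXPECTED = [f"test.{pair}" for pair in MULTILINGUAL + CROSS_LINGUAL]


def coverage(zip_path):
    """Map each of the nine expected file names to True if it is in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        present = set(archive.namelist())
    return {name: name in present for name in EXPECTED}
```

Any name mapped to False corresponds to a dataset that was not uploaded.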

All the best,
The MCL-WiC team

Posted by: Federico_Martelli @ Jan. 13, 2021, 11:20 a.m.

Is the score based on the last submission?
Thanks!

Posted by: Sattiy @ Jan. 14, 2021, 9:43 a.m.

Hello,

You will receive a score for each submission.

Cheers,
The MCL-WiC team

Posted by: Federico_Martelli @ Jan. 14, 2021, 4:24 p.m.

I mean, which submission does the result in the final evaluation list depend on?
Thanks!

Posted by: mengyuan_jiayi @ Jan. 15, 2021, 1:17 a.m.

Hi,

This is still to be clarified internally.

Best regards,
The MCL-WiC team

Posted by: najlakalach @ Jan. 15, 2021, 9:38 a.m.

Hi,
Sorry, I didn't understand: what should we do if we see a -1 score in the chart? I am sure that I am submitting my results in the right way.
Thank you,
Niloofar_R

Posted by: niloofar_r @ Jan. 18, 2021, 6:38 a.m.

Hi,

That's because you did not submit all datasets; -1 indicates that the corresponding dataset was not uploaded.

Cheers,
The MCL-WiC team

Posted by: Federico_Martelli @ Jan. 18, 2021, 7:53 a.m.

SemEval-2021 Task 2: Multilingual and Cross-lingual Word-in-Context Disambiguation (MCL-WiC) Forum
