SemEval 2020 Task 3 - Predicting the (Graded) Effect of Context in Word Similarity Forum


> Question about Training Data

Hi Organizers!

I was going through the submission section on the CodaLab page and found this paragraph regarding the subtasks:

"Both are unsupervised tasks, we won't be releasing training data. Both use the same input data (pairs of words and contexts) but each of them has its own phases and leaderboards. This means the submissions are independent and you can use
different models for each of the subtasks."

Please correct me if I am wrong, but does this mean that the only data we have access to is the sample dataset, and that we will have to build "generalized" models that operate on a pair of contexts containing words that occur in the SimLex-999 dataset? This confusion comes from the absence of a training dataset: wouldn't we still need some form of supervision on how humans differentiate the similarity between the word pairs in different contexts? Or are you expecting participants to collect this data themselves?

Please let me know,

Thanks!

Posted by: kanishka @ Sept. 12, 2019, 2:26 p.m.

Hi,

The task is "unsupervised" in that we are not going to release any "training data" to be used to train models specifically for this task.
You can use any amount of text/data that is available to you to train your models (for example, the English/Croatian/Slovenian Wikipedia).
You don't need to collect any human annotation; that is our role and exactly what we are working on at the moment.
We will release a dataset of similarity in context annotated by humans, but in this task it will serve as the gold standard, the values for the models to predict, and not as training data.
Does that answer your question? Please let me know if it is not clear yet.

We will release some additional "sample data" for Croatian, Slovenian, and possibly Estonian, though.

Thanks for the interest!

Posted by: csantosarmendariz @ Sept. 14, 2019, 4:35 p.m.

Oh, I think I get it now. All you want are models that operate on your test/gold-standard dataset, which is structured like the sample data, and produce the required values for the two subtasks. So essentially you're going to test how well general-purpose models, trained for various objectives, perform at distinguishing the similarity of two words in different contexts.
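
For concreteness, here's the kind of approach I have in mind: a minimal sketch that scores a word pair inside a given context with a pretrained multilingual BERT via HuggingFace transformers. The model choice, the mean-pooling over subword tokens, and the cosine scoring are all my own assumptions, not anything prescribed by the organizers.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Assumption: a multilingual contextual encoder plus cosine scoring;
# nothing here is prescribed by the task organizers.
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")
model.eval()

def word_vector(context: str, word: str) -> torch.Tensor:
    """Mean-pool the hidden states of the subword tokens that make up
    `word` at its first occurrence in `context`."""
    inputs = tokenizer(context, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (seq_len, dim)
    word_ids = tokenizer(word, add_special_tokens=False)["input_ids"]
    ids = inputs["input_ids"][0].tolist()
    # Locate the word's subword span inside the tokenized context.
    for i in range(len(ids) - len(word_ids) + 1):
        if ids[i:i + len(word_ids)] == word_ids:
            return hidden[i:i + len(word_ids)].mean(dim=0)
    raise ValueError(f"{word!r} not found in context")

def similarity_in_context(context: str, w1: str, w2: str) -> float:
    """Cosine similarity of the two words' contextual embeddings."""
    v1, v2 = word_vector(context, w1), word_vector(context, w2)
    return torch.nn.functional.cosine_similarity(v1, v2, dim=0).item()

# Subtask 2 asks for a similarity score per context; subtask 1 for the
# change in similarity between the two contexts of a pair:
# sim1 = similarity_in_context(context1, "word1", "word2")
# sim2 = similarity_in_context(context2, "word1", "word2")
# change = sim2 - sim1
```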

I was under the assumption that you'd provide us with something like SimLex-999, with three sets (training, validation, and a test set without gold labels), and that we would train a model for the task and then submit our systems. But that is clearly not the case.

This makes sense and I can see why you've structured the task in this way.

Thanks!

Posted by: kanishka @ Sept. 16, 2019, 12:40 a.m.