Multi-Hop Inference Explanation Regeneration (TextGraphs-14) Forum


> Link for test data?


Can someone please direct me on how to obtain the test data?


Posted by: jeshuren @ May 30, 2020, 11:45 a.m.


We have published the test dataset with the answers currently masked. Please find instructions on getting started on GitHub. We will publish the answers after the competition ends.

- Dmitry

Posted by: dustalov @ May 30, 2020, 12:04 p.m.

Hi Dmitry,

Thanks for your response. I downloaded the zip file and on extraction, these are the contents of the zip file.

|-- tables/
|-- questions.train.tsv
|-- tableindex.txt

Am I missing something here? Kindly help. Thanks.

Posted by: jeshuren @ May 30, 2020, 1:37 p.m.

There should be one more file, questions.test.tsv, with the answers removed. We will investigate why it is missing.
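(As an aside: once questions.test.tsv is present, the shared-task TSV files can be read with Python's standard csv module. A minimal sketch follows; the column names and sample row below are placeholders for illustration, not the actual corpus schema.)

```python
import csv
import io

# Placeholder stand-in for one of the shared-task TSV files; the real
# column names and contents may differ.
sample = (
    "QuestionID\tquestion\n"
    "Q1\tWhich gas do plants absorb from the air? (A) oxygen (B) carbon dioxide\n"
)

# The files are tab-separated, so read them with delimiter="\t".
rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))
print(rows[0]["QuestionID"])  # Q1
```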

Posted by: dustalov @ May 30, 2020, 2:15 p.m.

Okay. Thanks much.

Posted by: jeshuren @ May 30, 2020, 2:24 p.m.

Hello again, I have updated the corresponding parts of the Makefile to use the dataset uploaded to GitHub. Hope it helps.

Posted by: dustalov @ June 1, 2020, 9:30 p.m.

Hi Jeshuren,

We're internally discussing whether it makes sense to move the test set over to the official WorldTree V2.1 set, since it's now officially released from LREC 2020 and the shared task deadlines have been extended significantly due to the virus. This would also have the benefit of making participants' results directly comparable to work others do with the corpus after the shared task. The main differences between the official WorldTree V2.1 release and the TG-2020 shared task release are some cleanup, filtering, merging of explanations authored by different annotators for the same question (reducing the size by about 14%), and some TableStore improvements/refactoring.

WorldTree V2.1 is available here:
(Direct link: )

Posted by: pajansen @ June 11, 2020, 12:22 a.m.


Does this mean that only the test set is affected, or the whole dataset?


Posted by: jeshuren @ June 11, 2020, 5:55 p.m.

Hi Jeshuren,

While we're still discussing this (and hopefully will have an answer in a few days), if we do go with the official V2.1 test data for forward compatibility (which I think is very beneficial), participants would be free to choose whether to work from the original or the newer train and dev sets. They're nearly identical, except for the merged explanations and a small set of refactored TableStore rows. Any performance difference from training on the old data and testing on the new data due to the refactored rows would likely be extremely small, so models trained on the old data should transfer fine.

One thing we've noticed in my lab is that if you decide to retrain on the new data, you may have to tinker with the hyperparameters a bit. In the original dataset, about 14% of explanations were essentially duplicates (usually explanations for the same question written by different authors); in the new V2.1 dataset we've merged each set of duplicates into a single explanation.
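The merging step described above can be sketched roughly as follows. This is an illustration only, under an assumed record layout (question IDs paired with lists of supporting-fact IDs), not the actual corpus format: a merged explanation takes the union of the fact rows from each annotator's version.

```python
from collections import defaultdict

def merge_explanations(records):
    """Merge explanations written by different annotators for the same
    question by taking the union of their supporting-fact IDs.
    `records` is a list of (question_id, [fact_id, ...]) pairs
    (a hypothetical layout, not the real TSV schema)."""
    merged = defaultdict(set)
    for question_id, fact_ids in records:
        merged[question_id].update(fact_ids)
    return {qid: sorted(facts) for qid, facts in merged.items()}

# Two annotators explained q1 with overlapping fact sets.
records = [
    ("q1", ["f1", "f2"]),
    ("q1", ["f2", "f3"]),
    ("q2", ["f4"]),
]
print(merge_explanations(records))
# {'q1': ['f1', 'f2', 'f3'], 'q2': ['f4']}
```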

best wishes,

Posted by: pajansen @ June 11, 2020, 10:20 p.m.

Dear participants, we have updated the shared task dataset. Please check our GitHub repository and download the file again:


The newer version of the dataset contains more useful data. Our CodaLab competition is already using it. Since we have changed the dataset, we increased your submission limits to 30.

Note that we have updated our Terms and Conditions. In particular, we added a reproducibility requirement: to encourage transparency and replicability, all teams must publish their code, tuning procedures, and instructions for running their models along with their shared task paper submissions.

Posted by: dustalov @ June 23, 2020, 8:33 p.m.