Our task description paper:
SemEval-2020 Task 5: Counterfactual Recognition (available on arXiv since 2020-08-03)
Our dataset may be used in any paper, provided the paper cites it (BibTeX as below):
(1) LaTeX version:
@inproceedings{yang-2020-semeval-task5,
title = "{S}em{E}val-2020 Task 5: Counterfactual Recognition",
author = "Yang, Xiaoyu and
Obadinma, Stephen and
Zhao, Huasha and
Zhang, Qiong and
Matwin, Stan and
Zhu, Xiaodan",
booktitle = "Proceedings of the 14th International Workshop on Semantic Evaluation (SemEval-2020)",
year = "2020",
address = "Barcelona, Spain",
}
(2) MS Word version:
Xiaoyu Yang, Stephen Obadinma, Huasha Zhao, Qiong Zhang, Stan Matwin, and Xiaodan Zhu. 2020. SemEval-2020 Task 5: Counterfactual Recognition. In Proceedings of the 14th International Workshop on Semantic Evaluation (SemEval-2020), Barcelona, Spain.
To model counterfactual semantics and reasoning in natural language, our shared task aims to provide a benchmark for two basic problems.
In this task, you are asked to determine whether a given statement is counterfactual or not. Counterfactual statements describe events that did not actually happen or cannot happen, as well as their possible consequences if the events had happened. More specifically, counterfactuals describe events counter to facts and hence naturally involve common sense, knowledge, and reasoning. Tackling this problem is the basis for all downstream counterfactual-related causal inference analysis in natural language. For example, the following statements, one from the healthcare domain and one from the finance domain, are counterfactuals that need to be detected:
The important dates below have been updated according to the revised SemEval-2020 schedule. For details, please refer to the official SemEval-2020 website: http://alt.qcri.org/semeval2020/
Email: task5.counterfactual AT gmail.com
We provide separate datasets for Subtask-1 and Subtask-2; each includes train.csv and test.csv.
Please note that, to ensure fairness, you may only use the Subtask-1 dataset to build models for Subtask-1 and the Subtask-2 dataset to build models for Subtask-2.
In 'Participate -> Submit/View Results -> Practise-Subtask1' (or '... -> Practise-Subtask2'), you can submit your own results to verify the format.
A valid submission for CodaLab is a single .zip file containing exactly one of the results files (e.g., subtask1.csv for Subtask-1). Please note:
* A .csv file with an incorrect file name (file names are case-sensitive) will not be accepted.
* A zip file containing both files will not be accepted.
* Bare .csv and .rar files will not be accepted; only .zip files are accepted.
* Please zip your results file (e.g., subtask1.csv) directly, without putting it into a folder and zipping the folder. (A minimal packaging sketch follows the format table below.)
For 'pred_label', '1' means counterfactual and '0' means non-counterfactual. The 'sentenceID' rows should appear in the same order as in 'test.csv' for Subtask-1 (in the evaluation phase). For example:
sentenceID | pred_label |
---|---|
322893 | 1 |
322892 | 0 |
... | ... |
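To make the packaging rules above concrete, here is a minimal sketch in Python that writes subtask1.csv and zips it directly, with no enclosing folder. The predictions themselves are hypothetical placeholders:

```python
import csv
import zipfile

# Hypothetical predictions: (sentenceID, pred_label) pairs, kept in the
# same order as the sentences appear in test.csv.
predictions = [("322893", 1), ("322892", 0)]

# Write the results file; the name subtask1.csv is case-sensitive.
with open("subtask1.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["sentenceID", "pred_label"])
    writer.writerows(predictions)

# Zip the .csv directly (no enclosing folder), as required.
with zipfile.ZipFile("submission.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write("subtask1.csv")
```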
If a sentence has no consequent part (a consequent does not always exist in a counterfactual statement), please put '-1' in both 'consequent_startid' and 'consequent_endid'. The 'sentenceID' rows should appear in the same order as in 'test.csv' for Subtask-2 (in the evaluation phase). For example:
sentenceID | antecedent_startid | antecedent_endid | consequent_startid | consequent_endid |
---|---|---|---|---|
104975 | 15 | 72 | 88 | 100 |
104976 | 18 | 38 | -1 | -1 |
... | ... | ... | ... | ... |
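Analogously, a minimal Subtask-2 packaging sketch, again with hypothetical placeholder spans, where '-1/-1' marks a missing consequent as described above. (The file name subtask2.csv is assumed here by analogy with subtask1.csv; check the CodaLab submission page for the exact required name.)

```python
import csv
import zipfile

# Hypothetical span predictions; -1/-1 marks a sentence with no consequent.
rows = [
    ("104975", 15, 72, 88, 100),
    ("104976", 18, 38, -1, -1),
]

with open("subtask2.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["sentenceID", "antecedent_startid", "antecedent_endid",
                     "consequent_startid", "consequent_endid"])
    writer.writerows(rows)

# Zip the .csv directly, without an enclosing folder.
with zipfile.ZipFile("submission.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write("subtask2.csv")
```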
Subtask-1 data format:
sentenceID,gold_label,sentence
"6000627","1","Had Russia possessed such warships in 2008, boasted its naval chief, Admiral Vladimir Vysotsky, it would have won its war against Georgia in 40 minutes instead of 26 hours."
Subtask-2 data format:
sentenceID,sentence,domain,antecedent_startid,antecedent_endid,consequence_startid,consequence_endid
3S0001,"For someone who's so emotionally complicated, who could have given up many times if he was made of straw - he hasn't.",Health,83,105,48,81
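As a rough illustration of how the ids relate to the text, the following sketch slices a span out of a sentence by character offset. It rests on two assumptions that should be verified against the trial data: that the ids are character (not token) offsets, and that the end id is inclusive:

```python
def extract_span(sentence: str, start: int, end: int) -> str:
    """Recover a span from its offsets; '' for the missing (-1/-1) case.

    Assumes character offsets with an inclusive end id; verify both
    conventions against the trial data before relying on this.
    """
    if start == -1 or end == -1:
        return ""
    return sentence[start:end + 1]
```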
Participants are required to take part in both subtasks. The following evaluation metrics will be applied:
For Subtask-1, the evaluation script verifies whether each predicted binary label matches the gold label annotated by human workers, and then calculates precision, recall, and F1 scores.
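This is not the official evaluation script, but a minimal sketch of the same metrics using scikit-learn; the labels here are hypothetical:

```python
from sklearn.metrics import precision_recall_fscore_support

# Hypothetical gold and predicted labels (1 = counterfactual).
gold = [1, 0, 1, 1, 0]
pred = [1, 0, 0, 1, 1]

# Binary precision/recall/F1 on the positive (counterfactual) class.
p, r, f1, _ = precision_recall_fscore_support(gold, pred, average="binary")
print(f"precision={p:.4f}  recall={r:.4f}  f1={f1:.4f}")
```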
For Subtask-2, Exact Match represents the percentage of sentences for which both the predicted antecedent and the predicted consequent exactly match the spans annotated by human workers.
The F1 score here is a token-level metric, calculated from the submitted antecedent_startid, antecedent_endid, consequent_startid, and consequent_endid. Please refer to our baseline model for evaluation details.
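For intuition only (the official computation lives in the baseline model's evaluation code), here is a common SQuAD-style token-level F1 between a predicted span and a gold span:

```python
from collections import Counter

def token_f1(pred_span: str, gold_span: str) -> float:
    """Token-level F1 between two extracted spans (whitespace tokenized)."""
    pred_tokens = pred_span.split()
    gold_tokens = gold_span.split()
    if not pred_tokens or not gold_tokens:
        # By convention, two empty spans match exactly; otherwise F1 is 0.
        return float(pred_tokens == gold_tokens)
    # Count overlapping tokens with multiplicity.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```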
By submitting results to this competition, you consent to the public release of your scores at the SemEval-2020 workshop and in the associated proceedings, at the task organizers' discretion. Scores may include, but are not limited to, automatic and manual quantitative judgments, qualitative judgments, and such other metrics as the task organizers see fit. You accept that the ultimate decision of metric choice and score value is that of the task organizers.
You further agree that the task organizers are under no obligation to release scores and that scores may be withheld if it is the task organizers' judgment that the submission was incomplete, erroneous, deceptive, or violated the letter or spirit of the competition's rules. Inclusion of a submission's scores is not an endorsement of a team or individual's submission, system, or science.
You further agree that your system may be named according to the team name provided at the time of submission, or to a suitable shorthand as determined by the task organizers.
You agree not to redistribute the test data except in the manner prescribed by its license.
(1) Please note that, to ensure fairness, you may only use the provided Subtask-1 dataset to build models for Subtask-1 and the Subtask-2 dataset to build models for Subtask-2.
(2) Please use the corresponding training data for Subtask-1 and Subtask-2 to train your own models. The trial data only illustrates the format of our dataset and is not part of the competition.
Xiaodan Zhu, Queen's University
Xiaoyu Yang, Queen's University
Huasha Zhao, Alibaba Group
Qiong Zhang, Alibaba Group
Stan Matwin, Dalhousie University
We also kindly thank Jiaqi Li, Qianyu Zhang, Stephen Obadinma, Xiao Chu, and Rohan for their help and effort on this project.
Start: Sept. 1, 2019, midnight
Description: Practice phase for Subtask-1: submit your Subtask-1 results here. (Please choose a specific task and a target phase before submitting your answers!)
Start: Sept. 1, 2019, midnight
Description: Practice phase for Subtask-2: submit your Subtask-2 results here. (Please choose a specific task and a target phase before submitting your answers!)
Start: Feb. 19, 2020, midnight
Description: Evaluation phase for Subtask-1 (competition submissions only).
Start: March 1, 2020, midnight
Description: Evaluation phase for Subtask-2 (competition submissions only).
Start: March 18, 2020, 1 a.m.
Description: Post-evaluation phase for Subtask-1: please submit your results here after Mar. 18, 2020. Note that only the latest result is kept, not the best.
Start: March 18, 2020, 1 a.m.
Description: Post-evaluation phase for Subtask-2: please submit your results here after Mar. 18, 2020. Note that only the latest result is kept, not the best.
Competition ends: Sept. 14, 2050, midnight
# | Username | Score |
---|---|---|
1 | Martin | 0.8790 |
2 | pouria_babvey | 0.8710 |
3 | Roger | 0.8660 |