CodaLab - Competition

SemEval-2020 Task5: Modelling Causal Reasoning in Language: Detecting Counterfactuals

Organized by Ariel_yang



Our Task Description Paper

Our task description paper:

SemEval-2020 Task 5: Counterfactual Recognition (available on arXiv 2020-08-03)

Our dataset may be used in any paper, provided that the paper cites it (BibTeX below):

(1) LaTeX version:

@inproceedings{yang-2020-semeval-task5,
    title = "{S}em{E}val-2020 Task 5: Counterfactual Recognition",
    author = "Yang, Xiaoyu and
      Obadinma, Stephen and
      Zhao, Huasha and
      Zhang, Qiong and
      Matwin, Stan and
      Zhu, Xiaodan",
    booktitle = "Proceedings of the 14th International Workshop on Semantic Evaluation (SemEval-2020)",
    year = "2020",
    address = "Barcelona, Spain",
}

(2) MS Word:

Xiaoyu Yang, Stephen Obadinma, Huasha Zhao, Qiong Zhang, Stan Matwin, and Xiaodan Zhu. 2020. SemEval-2020 Task 5: Counterfactual Recognition. In Proceedings of the 14th International Workshop on Semantic Evaluation (SemEval-2020), Barcelona, Spain.

Task Description

To model counterfactual semantics and reasoning in natural language, our shared task aims to provide a benchmark for two basic problems.

Subtask1: Detecting counterfactual statements

In this task, you are asked to determine whether a given statement is counterfactual or not. Counterfactual statements describe events that did not actually happen or cannot happen, as well as the possible consequences if those events had happened. More specifically, counterfactuals describe events counter to facts and hence naturally involve common sense, knowledge, and reasoning. Tackling this problem is the basis for all downstream counterfactual-related causal inference analysis in natural language. For example, the following statements are counterfactuals that need to be detected, one from the healthcare domain and one from the finance domain:

Her post-traumatic stress could have been avoided if a combination of paroxetine and exposure therapy had been prescribed two months earlier.
Finance Minister Jose Antonio Meade noted that if a jump in tomato prices had been factored out, inflation would have begun to drop.

While the above examples were chosen for clarity of demonstration, real statements are harder for computers to judge.

Subtask2: Detecting antecedent and consequent

Conveying causal insight is an inherent characteristic of counterfactuals. To further detect the causal knowledge conveyed in counterfactual statements, subtask 2 aims to locate the antecedent and the consequent in counterfactuals.

According to (Nelson Goodman, 1947. The problem of counterfactual conditionals), a counterfactual statement can be converted to a contrapositive with a true antecedent and consequent. Consider the “post-traumatic stress” example discussed above; it can be transposed into “because her post-traumatic stress was not avoided, (we know) a combination of paroxetine and exposure therapy was not prescribed”. Such knowledge can be used not only to analyze the specific statement but can also be accumulated across corpora to develop domain causal knowledge (e.g., a combination of paroxetine and exposure therapy may help cure post-traumatic stress).

Please note that some counterfactual statements contain only an antecedent and no consequent. For example, in "Frankly, I wish he had issued this order two years ago instead of this year", only the antecedent is present. In subtask 2, when a sentence has no consequent, please set both the consequent starting index and the consequent ending index (character indices) to '-1'. For details, please refer to the 'Evaluation' section of this website.

Important Dates

The important dates below follow the updated SemEval-2020 schedule. For details, please refer to the official SemEval-2020 website: http://alt.qcri.org/semeval2020/

19 February 2020: Evaluation start*
11 March 2020: Evaluation end*
18 March 2020: Results posted
15 May 2020: System description paper submissions due (11:59pm, UTC-12)
22 May 2020: Task description paper submissions due (11:59pm, UTC-12)
24 June 2020: Author notifications
24 July 2020: Camera-ready submissions due for system description papers (11:59pm, UTC-12)
31 July 2020: Camera-ready submissions due for task description papers (11:59pm, UTC-12)
12-13 December 2020: SemEval 2020

Contact Us

Email: task5.counterfactual AT gmail.com

Submission Details & Evaluation Criteria

We provide separate datasets for task-1 and task-2; each includes train.csv and test.csv.

Please note that, to ensure fairness, you may only use the task-1 dataset to build models for task-1 and the task-2 dataset to build models for task-2.

Under 'Participate -> Submit/View Results -> Practice-Subtask1' (or '... -> Practice-Subtask2'), you can submit trial results to verify the format.

A valid submission zip file for CodaLab contains one of the following files:

subtask1.csv (zip the file directly and submit it only to the "xxx-Subtask1" section)
subtask2.csv (zip the file directly and submit it only to the "xxx-Subtask2" section)

* A .csv file with an incorrect file name will not be accepted (file names are case-sensitive).

* A zip file containing both files will not be accepted.

* Only .zip files are accepted; bare .csv files and .rar archives will be rejected.

* Please zip your results file (e.g. subtask1.csv) directly; do not put it into a folder and zip the folder.
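
The packaging rules above can be sketched in Python with the standard zipfile module. This is only an illustration, not an official tool: the helper name make_submission_zip and the output name submission.zip are assumptions made for this example.

```python
import zipfile
from pathlib import Path

# File names accepted by the submission rules (case-sensitive).
ALLOWED_NAMES = {"subtask1.csv", "subtask2.csv"}


def make_submission_zip(csv_path, zip_path):
    """Zip one results file so that it sits at the top level of the archive.

    Hypothetical helper for illustration; not part of the official tooling.
    """
    csv_path = Path(csv_path)
    if csv_path.name not in ALLOWED_NAMES:
        raise ValueError(f"unexpected results file name: {csv_path.name}")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # arcname drops every directory component, so no folder ends up in the zip.
        archive.write(csv_path, arcname=csv_path.name)
    return zip_path
```

Calling make_submission_zip("results/subtask1.csv", "submission.zip") would produce an archive whose only entry is subtask1.csv, with no enclosing folder.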

Submission format for task1

For the pred_label, '1' refers to counterfactual while '0' refers to non-counterfactual. The 'sentenceID' should be in the same order as in 'test.csv' for subtask-1 (in evaluation phase).

sentenceID	pred_label
322893	1
322892	0
...	...
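
A minimal sketch of writing such a file follows. It assumes the submission is comma-separated like the provided train.csv (the table above does not state the delimiter), and the function name write_subtask1 is hypothetical.

```python
import csv


def write_subtask1(path, sentence_ids, pred_labels):
    """Write predictions in the sentenceID / pred_label layout shown above.

    Assumption: comma-separated values. sentence_ids must already be in the
    same order as in test.csv.
    """
    if len(sentence_ids) != len(pred_labels):
        raise ValueError("sentence_ids and pred_labels differ in length")
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["sentenceID", "pred_label"])
        for sentence_id, label in zip(sentence_ids, pred_labels):
            if label not in (0, 1):
                raise ValueError(f"pred_label must be 0 or 1, got {label!r}")
            writer.writerow([sentence_id, label])
```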

Submission format for task2

If a sentence has no consequent part (a counterfactual statement does not always contain one), please put '-1' in both 'consequent_startid' and 'consequent_endid'. The 'sentenceID' should be in the same order as in 'test.csv' for subtask-2 (in evaluation phase).

sentenceID	antecedent_startid	antecedent_endid	consequent_startid	consequent_endid
104975	15	72	88	100
104976	18	38	-1	-1
...	...	...	...	...
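
The '-1' convention for a missing consequent can be sketched as below. The helper names subtask2_row and write_subtask2 are hypothetical, and comma-separated output is an assumption carried over from the provided train.csv.

```python
import csv

HEADER = [
    "sentenceID",
    "antecedent_startid",
    "antecedent_endid",
    "consequent_startid",
    "consequent_endid",
]


def subtask2_row(sentence_id, antecedent, consequent=None):
    """Build one submission row; a missing consequent becomes (-1, -1)."""
    antecedent_start, antecedent_end = antecedent
    consequent_start, consequent_end = consequent if consequent is not None else (-1, -1)
    return [sentence_id, antecedent_start, antecedent_end, consequent_start, consequent_end]


def write_subtask2(path, rows):
    """Write rows (already in test.csv order) under the header shown above."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(HEADER)
        writer.writerows(rows)
```

For instance, subtask2_row(104976, (18, 38)) yields the second row of the table above, with both consequent fields set to -1.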

Example of train.csv for subtask1

sentenceID,gold_label,sentence

"6000627","1","Had Russia possessed such warships in 2008, boasted its naval chief, Admiral Vladimir Vysotsky, it would have won its war against Georgia in 40 minutes instead of 26 hours."

sentenceID: the identifier of the labeled sentence
gold_label: 1 if the sentence is counterfactual, 0 otherwise
sentence: the original sentence, exactly as it appears in the provided dataset
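
As an illustration of this layout, the following sketch parses the example row with Python's csv module. The SAMPLE string simply repeats the row shown above, and load_subtask1 is a hypothetical helper name; for a real file you would pass an open file handle instead.

```python
import csv
import io
from collections import Counter

# The header and example row shown above, inlined so the sketch is self-contained.
SAMPLE = (
    "sentenceID,gold_label,sentence\n"
    '"6000627","1","Had Russia possessed such warships in 2008, boasted its naval '
    "chief, Admiral Vladimir Vysotsky, it would have won its war against Georgia "
    'in 40 minutes instead of 26 hours."\n'
)


def load_subtask1(handle):
    """Return (sentenceID, gold_label, sentence) tuples from an open CSV handle."""
    reader = csv.DictReader(handle)
    return [(row["sentenceID"], int(row["gold_label"]), row["sentence"]) for row in reader]


rows = load_subtask1(io.StringIO(SAMPLE))
# Count how many sentences carry each label.
label_counts = Counter(label for _, label, _ in rows)
```

The quoted sentence field contains commas, which csv.DictReader handles without any extra configuration.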

Example of train.csv for subtask2

sentenceID,sentence,domain,antecedent_startid,antecedent_endid,consequent_startid,consequent_endid

3S0001,"For someone who's so emotionally complicated, who could have given up many times if he was made of straw - he hasn't.",Health,83,105,48,81

sentenceID: the identifier of the labeled sentence
sentence: the original sentence, exactly as it appears in the provided dataset
domain: the domain to which the sentence belongs
antecedent_startid: the character index in the original sentence at which the antecedent starts
antecedent_endid: the character index in the original sentence at which the antecedent ends
consequent_startid: the character index in the original sentence at which the consequent starts (if there is no consequent part, put -1 here)
consequent_endid: the character index in the original sentence at which the consequent ends (if there is no consequent part, put -1 here)
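
A short sketch of turning these indices back into text follows. It assumes 0-based character indices with an inclusive end index, which this page does not state explicitly; please check the convention against the released data. The sentence used here is a made-up example, not a dataset row, and extract_span is a hypothetical helper name.

```python
def extract_span(sentence, start, end):
    """Return the text covered by a character span, or None when both indices are -1.

    Assumption: indices are 0-based and the end index is inclusive.
    """
    if start == -1 and end == -1:
        return None  # no consequent part in this sentence
    if not (0 <= start <= end < len(sentence)):
        raise ValueError(f"span ({start}, {end}) lies outside the sentence")
    return sentence[start:end + 1]


# Made-up sentence for illustration only.
sentence = "If I had left earlier, I would have caught the train."
antecedent = extract_span(sentence, 0, 20)
consequent = extract_span(sentence, 23, 51)
missing = extract_span(sentence, -1, -1)
```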

Evaluation Method

Participants must take part in both subtasks. The following evaluation metrics will be applied:

Subtask1: Precision, Recall, and F1

The evaluation script will verify whether the predicted binary "label" is the same as the desired "label" which is annotated by human workers, and then calculate its precision, recall, and F1 scores.
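
For reference, these scores can be sketched as below. The sketch assumes that the counterfactual label (1) is the positive class; the official computation is the one in the evaluation script shipped with the baseline, and precision_recall_f1 is a hypothetical name.

```python
def precision_recall_f1(gold, pred, positive=1):
    """Precision, recall, and F1 for the positive class of a binary task.

    Assumption: the counterfactual label (1) is treated as the positive class.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred differ in length")
    true_pos = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    false_pos = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    false_neg = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```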

Subtask2: Exact Match, Precision, Recall, and F1

Exact Match is the percentage of sentences for which both the predicted antecedent and the predicted consequent exactly match the annotations made by human workers.

F1 score is a token level metric and will be calculated according to the submitted antecedent_startid, antecedent_endid, consequent_startid, consequent_endid. Please refer to our baseline model for evaluation details.
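
The following is a rough sketch of how a token-level F1 and an exact-match check could be derived from character spans. It is not the official scorer: whitespace tokenization, inclusive end indices, and the choice to score two empty spans as 1.0 are all assumptions made here, and the function names are hypothetical. The baseline's evaluation code remains the reference.

```python
import re


def span_to_tokens(sentence, start, end):
    """Indices of whitespace-delimited tokens overlapping an inclusive char span."""
    if start == -1 or end == -1:
        return set()  # no span (e.g. a missing consequent)
    return {
        index
        for index, match in enumerate(re.finditer(r"\S+", sentence))
        if match.start() <= end and match.end() - 1 >= start
    }


def token_f1(sentence, pred_span, gold_span):
    """F1 over the token sets covered by the predicted and gold spans."""
    pred_tokens = span_to_tokens(sentence, *pred_span)
    gold_tokens = span_to_tokens(sentence, *gold_span)
    if not pred_tokens and not gold_tokens:
        return 1.0  # assumption: agreeing that no span exists counts as a match
    overlap = len(pred_tokens & gold_tokens)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def exact_match(pred_indices, gold_indices):
    """True when all four submitted indices equal the gold indices."""
    return tuple(pred_indices) == tuple(gold_indices)
```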

Terms & Conditions

By submitting results to this competition, you consent to the public release of your scores at the SemEval-2020 workshop and in the associated proceedings, at the task organizers' discretion. Scores may include, but are not limited to, automatic and manual quantitative judgments, qualitative judgments, and such other metrics as the task organizers see fit. You accept that the ultimate decision of metric choice and score value is that of the task organizers.

You further agree that the task organizers are under no obligation to release scores and that scores may be withheld if it is the task organizers' judgment that the submission was incomplete, erroneous, deceptive, or violated the letter or spirit of the competition's rules. Inclusion of a submission's scores is not an endorsement of a team or individual's submission, system, or science.

You further agree that your system may be named according to the team name provided at the time of submission, or to a suitable shorthand as determined by the task organizers.

You agree not to redistribute the test data except in the manner prescribed by its license.

Note

(1) Please note that, to ensure fairness, you may only use the provided task-1 dataset to build models for task-1 and the provided task-2 dataset to build models for task-2.

(2) Please use the corresponding training data for Subtask-1 and Subtask-2 to train your own models. The trial data only shows the format of our dataset and is not used in the competition.

Download Dataset and Baseline Code

Click to download from GitHub

Click to download from Zenodo

Subtask-1

train.csv (released on Nov 8)

test.csv (released on Feb 19, 2020)
Baseline (with eval method): SVM

Subtask-2

train.csv (released on Nov 16)
test.csv (released on Mar 1, 2020)
Baseline (with eval method): Sequence Labeling Model

Organizer List

Xiaodan Zhu, Queen's University

Xiaoyu Yang, Queen's University

Huasha Zhao, Alibaba Group

Qiong Zhang, Alibaba Group

Stan Matwin, Dalhousie University

We also kindly thank Jiaqi Li, Qianyu Zhang, Stephen Obadinma, Xiao Chu and Rohan for their help and effort in this project.

Practice-Subtask1

Start: Sept. 1, 2019, midnight

Description: Submit your results for subtask-1 here [phase: practice]. Please choose a specific task and a target phase before submitting your answers.

Practice-Subtask2

Start: Sept. 1, 2019, midnight

Description: Submit your results for subtask-2 here [phase: practice]. Please choose a specific task and a target phase before submitting your answers.

Evaluation-Subtask1

Start: Feb. 19, 2020, midnight

Description: Evaluation subtask-1 (only for competition)

Evaluation-Subtask2

Start: March 1, 2020, midnight

Description: Evaluation subtask-2 (only for competition)

Post-Evaluation-Subtask1

Start: March 18, 2020, 1 a.m.

Description: Please submit your results for Subtask-1 here after Mar 18, 2020. Only the latest results are kept, not the best. [Post Evaluation]

Post-Evaluation-Subtask2

Start: March 18, 2020, 1 a.m.

Description: Please submit your results for Subtask-2 here after Mar 18, 2020. Only the latest results are kept, not the best. [Post Evaluation]

Competition Ends

Sept. 14, 2050, midnight


#	Username	Score
1	Martin	0.8790
2	pouria_babvey	0.8710
3	Roger	0.8660
