SemEval-2021 Task 4: Reading Comprehension of Abstract Meaning

Organized by BoyuanZheng

First phase: Practice (starts Oct. 1, 2020, midnight UTC)

End: Competition ends Jan. 31, 2021, 11 p.m. UTC


ReCAM: Reading Comprehension of Abstract Meaning

Boyuan Zheng, Xiaoyu Yang, Yu-Ping Ruan, Quan Liu, Zhen-Hua Ling, Si Wei, Xiaodan Zhu

We consider computers' ability to understand, represent, and express abstract meaning a basic component of true natural language understanding. Over the past decade, significant advances in representation learning have been achieved, or claimed, on many NLP problems. How can such success help us develop models that understand and model abstract meaning?

The aim of this shared task is to provide a benchmark for studying machines' ability to represent and understand abstract concepts. Specifically, computers are given passages to read and understand. If computers can digest the passages as humans do, we expect them to be able to predict the abstract words that people use when writing summaries of the given passages. Note that unlike some previous datasets, such as CNN/Daily Mail (Hermann et al., 2015), which require computers to predict concrete concepts (e.g., named entities), our task asks algorithms to fill in abstract words that have been removed from human-written summaries.

Tasks

Our shared task has three subtasks. Subtasks 1 and 2 evaluate machine learning models' performance with respect to two definitions of abstractness (Spreen and Schulz, 1966; Changizi, 2008), which we call imperceptibility and nonspecificity, respectively. Subtask 3 aims to provide insight into the relationship between the two.

• Subtask 1: ReCAM-Imperceptibility

Concrete words refer to things, events, and properties that we can perceive directly with our senses (Spreen and Schulz, 1966; Coltheart, 1981; Turney et al., 2011), e.g., donut, trees, and red. In contrast, abstract words refer to ideas and concepts that are distant from immediate perception. Examples include objective, culture, and economy. In subtask 1, participating systems are required to perform reading comprehension of abstract meaning for imperceptible concepts.

Below is an example. Given a passage and a question, your model must choose, from the five candidate words, the one that best replaces @placeholder.
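As a rough illustration of the answer format (not the official example, and not the organizers' method), the sketch below substitutes each of the five candidates for @placeholder and keeps the highest-scoring one. The scoring function is a hypothetical stand-in: `overlap_score` is a toy word-overlap heuristic, whereas a real system would use a trained reading-comprehension model. The passage, question, and options in the usage note are invented for illustration.

```python
from typing import Callable, List


def fill_placeholder(question: str, option: str) -> str:
    """Substitute a candidate word for the @placeholder token."""
    return question.replace("@placeholder", option)


def choose_option(passage: str, question: str, options: List[str],
                  score: Callable[[str, str], float]) -> int:
    """Return the index of the candidate whose filled-in question scores highest.

    `score(passage, filled_question)` is a hypothetical plausibility function;
    in practice it would be a trained model.
    """
    filled = [fill_placeholder(question, opt) for opt in options]
    scores = [score(passage, q) for q in filled]
    return max(range(len(options)), key=lambda i: scores[i])


def overlap_score(passage: str, filled_question: str) -> float:
    """Toy scorer (hypothetical): count question tokens that also occur in the passage."""
    passage_tokens = set(passage.lower().split())
    return float(sum(tok in passage_tokens for tok in filled_question.lower().split()))
```

For the invented passage "The government announced new measures to support the economy", the question "Officials unveiled a plan to boost the @placeholder", and the options ["weather", "economy", "donut", "culture", "trees"], `choose_option` returns 1 ("economy"), because that is the only filled question sharing three tokens with the passage.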

• Subtask 2: ReCAM-Nonspecificity

Subtask 2 focuses on a different definition of abstractness, nonspecificity. Compared with concrete concepts such as groundhog and whale, hypernyms such as vertebrate are regarded as more abstract (Changizi, 2008).

• Subtask 3: ReCAM-Intersection

Subtask 3 aims to provide more insight into the relationship between the two views of abstractness. In this subtask, we test the performance of a system that is trained on one definition and evaluated on the other.

Important Dates

Trial data ready: July 31, 2020

Training data ready: October 1, 2020

Test data ready: December 3, 2020

Evaluation start: January 20, 2021

Evaluation end: January 31, 2021

Paper submission due: February 23, 2021

Notification to authors: March 29, 2021

Camera ready due: April 5, 2021

SemEval workshop: Summer 2021

Contact Us

Email: mrc-abstract-participants@googlegroups.com

References:

[1] Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. "Teaching Machines to Read and Comprehend." Advances in Neural Information Processing Systems 28 (NIPS 2015), Montreal, Quebec, Canada, December 2015, 1693-1701.

[2] Otfried Spreen and Rudolph W. Schulz. "Parameters of abstraction, meaningfulness, and pronunciability for 329 nouns." Journal of Verbal Learning and Verbal Behavior, 5(5), 1966, 459-468.

[3] Mark A. Changizi. "Economically organized hierarchies in WordNet and the Oxford English Dictionary." Cognitive Systems Research, 9(3), 2008, 214-228.

[4] Max Coltheart. "The MRC psycholinguistic database." The Quarterly Journal of Experimental Psychology Section A, 33(4), 1981, 497-505.

[5] Peter D. Turney, Yair Neuman, Dan Assaf, and Yohai Cohen. "Literal and Metaphorical Sense Identification through Concrete and Abstract Context." Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, Edinburgh, United Kingdom, July 2011, 680-690.

Evaluation Method

Accuracy, i.e., the proportion of questions for which the predicted option matches the gold label, is used as the evaluation metric for all three subtasks.
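A minimal sketch of the metric, assuming predictions and gold labels are both given as option indices in the same question order; the function name `accuracy` is illustrative and not part of any official scoring script.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of questions whose predicted option index equals the gold label."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold labels must have the same length")
    if not gold:
        raise ValueError("no examples to score")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)
```

For example, `accuracy([0, 3, 1, 4], [0, 3, 2, 4])` returns 0.75, since three of the four predictions match.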

Terms & Conditions

By submitting results to this competition, you consent to the public release of your scores at the SemEval-2021 workshop and in the associated proceedings, at the task organizers' discretion. Scores may include, but are not limited to, automatic and manual quantitative judgments, qualitative judgments, and such other metrics as the task organizers see fit. You accept that the ultimate decision of metric choice and score value is that of the task organizers.

You further agree that the task organizers are under no obligation to release scores and that scores may be withheld if it is the task organizers' judgment that the submission was incomplete, erroneous, deceptive, or violated the letter or spirit of the competition's rules. Inclusion of a submission's scores is not an endorsement of a team or individual's submission, system, or science.

You further agree that your system may be named according to the team name provided at the time of submission, or to a suitable shorthand as determined by the task organizers.

You agree not to redistribute the test data except in the manner prescribed by its license.

Baseline

Since the three subtasks share the same task format, the baseline applies to all of them.

GA Reader

Instruction for GA Reader

You are free to build a system from scratch using any available software packages and resources, as long as they do not violate the spirit of fair competition. To help you test ideas, we also provide a GA Reader implementation that you can build on. Using this system is completely optional. The system is available.

Organizers

Boyuan Zheng
Northeastern University
steven.zheng010@gmail.com

Xiaoyu Yang
Queen's University
xiaoyu.yang@queensu.ca

Yu-Ping Ruan
University of Science and Technology of China
ypruan@mail.ustc.edu.cn

Quan Liu
iFlytek Research
quanliu@ustc.edu.cn

Zhen-Hua Ling
University of Science and Technology of China
zhling@ustc.edu.cn

Si Wei
iFlytek Research
siwei@iflytek.com

Xiaodan Zhu
Queen's University
xiaodan.zhu@queensu.ca

Submission Details & Evaluation Criteria

We provide separate datasets for Subtask 1 and Subtask 2; each includes train.jsonl, dev.jsonl, and test.jsonl.
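The files are in JSON Lines format (one JSON object per line). The loader below is a sketch that assumes each record has the fields `article`, `question`, `option_0` to `option_4`, and an integer `label`; these field names are an assumption and should be checked against the released files. The label is read with `.get` so the same code also works if `test.jsonl` omits it. `Example`, `parse_examples`, and `read_jsonl` are hypothetical helper names.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


@dataclass
class Example:
    article: str
    question: str          # contains the @placeholder token
    options: List[str]     # the five candidate words
    label: Optional[int]   # gold option index; None if absent (assumed for test.jsonl)


def parse_examples(lines: Iterable[str]) -> Iterator[Example]:
    """Parse JSON Lines records into Example objects (field names are assumed)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        yield Example(
            article=record["article"],
            question=record["question"],
            options=[record[f"option_{i}"] for i in range(5)],
            label=record.get("label"),
        )


def read_jsonl(path: str) -> Iterator[Example]:
    """Stream examples from a .jsonl file such as train.jsonl or dev.jsonl."""
    with open(path, encoding="utf-8") as f:
        yield from parse_examples(f)
```

Separating `parse_examples` from file handling keeps the parser easy to test on in-memory strings.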

To ensure fairness, you may only use the Subtask 1 dataset to build models for Subtask 1 and the Subtask 2 dataset to build models for Subtask 2.

Under 'Participate -> Submit/View Results -> Practice', you can submit your own results to verify the format. In the Practice phase, the gold labels for each subtask are the same as those in dev.jsonl.

A valid CodaLab submission is a .zip file containing the following file(s), depending on the phase:

Practice phase: subtask1.csv and subtask2.csv (zip them directly and submit to the Practice section). We use the labels of
Subtask 1 evaluation: subtask1.csv (zip it directly and submit to the Subtask-1:ReCAM-Imperceptibility section).
Subtask 2 evaluation: subtask2.csv (zip it directly and submit to the Subtask-2: ReCAM-Nonspecificity section).

* A .csv file with an incorrect file name will not be accepted; file names are case-sensitive.

* Only .zip files are accepted; .csv and .rar files will be rejected.

* Zip your result files (e.g., subtask1.csv) directly; do not put them into a folder and zip the folder.
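The packaging rules above can be checked with a short script. This is a sketch using Python's standard `zipfile` module: it rejects non-.zip output names, enforces the exact, case-sensitive file names, and passes `arcname` so each CSV sits at the archive root rather than inside a folder. The helper name `make_submission` is hypothetical, and the script does not check the contents of the CSV files, whose format is shown in the submission examples.

```python
import os
import zipfile
from typing import Sequence

# Exact, case-sensitive file names accepted by the evaluation phases.
ALLOWED_NAMES = {"subtask1.csv", "subtask2.csv"}


def make_submission(zip_path: str, csv_paths: Sequence[str]) -> None:
    """Zip result CSVs at the archive root after validating their names."""
    if not zip_path.endswith(".zip"):
        raise ValueError("only .zip submissions are accepted")
    names = [os.path.basename(p) for p in csv_paths]
    for name in names:
        if name not in ALLOWED_NAMES:  # case-sensitive comparison
            raise ValueError(f"unexpected file name: {name!r}")
    # Validate everything before creating the archive, so no partial zip is left behind.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path, name in zip(csv_paths, names):
            # arcname=name stores the file at the root, not under its directory path.
            zf.write(path, arcname=name)
```

For the Practice phase, both files would be passed, e.g. `make_submission("practice.zip", ["out/subtask1.csv", "out/subtask2.csv"])`; for a single evaluation phase, only the corresponding file.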

Example submission files are available at the following link:

Submission examples

Practice

Start: Oct. 1, 2020, midnight

Subtask-1: ReCAM-Imperceptibility

Start: Jan. 19, 2021, 6 p.m.

Subtask 2: ReCAM-Nonspecificity

Start: Jan. 25, 2021, noon

