SemEval-2017 Task 3 Subtask D

Organized by DorisHoogeveen

Development: starts Aug. 1, 2016, midnight UTC
Testing: starts Jan. 9, 2017, midnight UTC
Competition ends: Jan. 31, 2017, noon UTC

Welcome!

Community Question Answering (CQA) forums are gaining popularity online. They are seldom moderated and rather open, and thus have few restrictions, if any, on who can post and who can answer a question. On the positive side, this means that one can freely ask any question and expect some good, honest answers. On the negative side, it takes effort to go through all possible answers and to make sense of them. For example, it is not unusual for a question to have hundreds of answers, which makes it very time-consuming for the user to inspect and winnow them. The challenge we propose may help automate the process of finding good answers to new questions in a community-created discussion forum (e.g., by retrieving similar questions in the forum and identifying the posts in the answer threads of those questions that answer the question well).

We build on the success of the previous editions of our SemEval tasks on CQA, SemEval-2015 Task 3 and SemEval-2016 Task 3, and present an extended edition for SemEval-2017, which incorporates several novel facets.

This CodaLab competition is for Subtask D of SemEval-2017 Task 3, the Arabic subtask.

Task D: Rerank the correct answers for a new question.

Given the extra challenges that the Arabic language entails (e.g., it is not spoken by most NLP researchers and there are fewer resources and toolkits available), we target only one task, which is a simplified version of Subtask C for English.

Given:

  • a new question (aka the original question),
  • the set of the first 30 related questions (retrieved by a search engine), each associated with one correct answer (typically one or two paragraphs long),

rerank the 30 question-answer pairs according to their relevance with respect to the original question. We want the "Direct" (D) and the "Relevant" (R) answers to be ranked above the "Irrelevant" (I) answers; the former two are both considered "Relevant" for evaluation purposes (gold labels are contained in the QArel field of the XML file). We evaluate the position of the "Relevant" answers in the ranking, so this is again a ranking task.
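As an illustration of the setup above, here is a minimal Python sketch of the two steps involved: collapsing the three gold labels into a binary relevance judgment, and ordering candidate pairs by a system score. The function names, the pair identifiers, and the scores are hypothetical, and accepting both the short ("D", "R", "I") and the long label spellings is an assumption; check the data README for the exact QArel values.

```python
from typing import List, Tuple

# Assumption: QArel values may appear as short codes or full names.
# "Direct" and "Relevant" both count as relevant; "Irrelevant" does not.
RELEVANT_LABELS = {"D", "R", "Direct", "Relevant"}


def is_relevant(qarel_label: str) -> bool:
    """Collapse the three-way gold label into a binary judgment."""
    return qarel_label.strip() in RELEVANT_LABELS


def rerank(scored_pairs: List[Tuple[str, float]]) -> List[str]:
    """Return pair identifiers ordered from highest to lowest score.

    `scored_pairs` holds (pair_id, score) tuples, where the score is
    whatever a (hypothetical) system assigns to a question-answer pair.
    sorted() is stable, so ties keep their original retrieval order.
    """
    ordered = sorted(scored_pairs, key=lambda pair: pair[1], reverse=True)
    return [pair_id for pair_id, _ in ordered]
```

Keeping tied pairs in their retrieval order is a design choice of this sketch, not something the task description prescribes.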

Unlike the English subtasks, here we use 30 answers since the retrieval task is much more difficult, leading to low recall, and the frequency of correct answers is much lower.

More information on the task and all the subtasks can be found on the SemEval Task website.

Evaluation Criteria

The leaderboard shows three scores: MAP, Average Recall, and MRR. The official evaluation measure, by which all systems will be evaluated and ranked, is mean average precision (MAP) computed over the top-10 ranked question-answer pairs.
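To make the measures concrete, the following Python sketch computes average precision and reciprocal rank over the top 10 positions of one ranked list, and averages them over questions to obtain MAP and MRR. It is not the official scorer: the function names are hypothetical, and normalizing average precision by the number of relevant pairs found within the top k is an assumption that the official scorer may not share.

```python
from typing import Callable, List, Sequence


def average_precision_at_k(relevance: Sequence[bool], k: int = 10) -> float:
    """Average precision over the top-k positions of one ranked list.

    `relevance[i]` is True when the pair at rank i+1 is relevant.
    Assumption: the sum is divided by the number of relevant pairs
    found in the top k (the official scorer may normalize differently).
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def reciprocal_rank_at_k(relevance: Sequence[bool], k: int = 10) -> float:
    """1 / rank of the first relevant pair in the top k, else 0."""
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def mean_over_questions(
    rankings: List[Sequence[bool]],
    metric: Callable[[Sequence[bool], int], float],
    k: int = 10,
) -> float:
    """Average a per-question metric over all original questions."""
    if not rankings:
        return 0.0
    return sum(metric(r, k) for r in rankings) / len(rankings)
```

Calling `mean_over_questions(rankings, average_precision_at_k)` gives MAP under these assumptions, and passing `reciprocal_rank_at_k` instead gives MRR.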

Note: The datasets are already provided in a form appropriate for this subtask. For each original question there is a list of 30 related question-answer pairs (see the README file that comes with the data distribution). The test set will follow the same format. The format required for the output of your systems is detailed in the scorer and format-checker README files. These can be found here. We use a different terminology to better characterize the Arabic subtask, which differs from the English ones and is much more similar to a traditional QA task. That said, "Direct", "Relevant" and "Irrelevant" may be roughly mapped to "PerfectMatch", "Relevant" and "Bad", respectively, in the English Subtask B. Note, however, that for the Arabic Subtask D, we only evaluate positively the "Direct" and the "Relevant" answers, while the "Irrelevant" ones have to be pushed below them.

The name of the development file you submit needs to be SemEval2017-Task3-CQA-MD-dev-subtaskD.xml.pred, and it needs to be zipped.

The name of the test file you submit needs to be SemEval2017-Task3-CQA-MD-test.xml.subtaskD.pred, and it needs to be zipped.
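The naming and zipping requirement can be met with a few lines of Python. The sketch below stores a prediction file in a zip archive under the required name. The helper name `zip_prediction`, the archive name, and placing the file at the root of the archive are assumptions; the prediction file itself must already follow the format described in the scorer README.

```python
import zipfile
from pathlib import Path

# Required file names, copied from the instructions above.
DEV_PRED_NAME = "SemEval2017-Task3-CQA-MD-dev-subtaskD.xml.pred"
TEST_PRED_NAME = "SemEval2017-Task3-CQA-MD-test.xml.subtaskD.pred"


def zip_prediction(pred_path: Path, required_name: str, zip_path: Path) -> Path:
    """Write `pred_path` into a new zip archive under `required_name`.

    Assumption: the file should sit at the root of the archive.
    The local file may have any name; only the name inside the
    archive is set to the required one.
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.write(pred_path, arcname=required_name)
    return zip_path
```

For example, `zip_prediction(Path("my_run.pred"), DEV_PRED_NAME, Path("submission.zip"))` would package a hypothetical local file `my_run.pred` for the development phase.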

Terms and Conditions

By participating in this competition and submitting results in CodaLab you agree to the public release of your results in the proceedings of SemEval 2017. Furthermore, you accept that the choice of evaluation metric is made by the task organizers, and that they have the right to decide the winner of the competition, and to disqualify teams if they do not follow the rules of the competition.

# Username Score
1 UPC-USMBA 0.416
2 preslav 0.311
3 DorisHoogeveen 0.286