SemEval-2017 Task 3 Subtask C

Organized by DorisHoogeveen


Welcome!

Community Question Answering (CQA) forums are gaining popularity online. They are seldom moderated and rather open, and thus have few restrictions, if any, on who can post and who can answer a question. On the positive side, this means that one can freely ask any question and expect some good, honest answers. On the negative side, it takes effort to go through all possible answers and to make sense of them. For example, it is not unusual for a question to have hundreds of answers, which makes it very time-consuming for the user to inspect and winnow them. The challenge we propose may help automate the process of finding good answers to new questions in a community-created discussion forum (e.g., by retrieving similar questions in the forum and identifying the posts in the answer threads of those questions that answer the question well).

We build on the success of the previous editions of our SemEval tasks on CQA, SemEval-2015 Task 3 and SemEval-2016 Task 3, and present an extended edition for SemEval-2017, which incorporates several novel facets.

This CodaLab competition is for Subtask C of SemEval Task 3: the Question-External Comment Similarity Subtask (English).

This is the main English subtask.

Given:

  • a new question (aka the original question),
  • the set of the first 10 related questions (retrieved by a search engine), each associated with its first 10 comments appearing in its thread,

rerank the 100 comments (10 questions × 10 comments) according to their relevance with respect to the original question. We want the "Good" comments to be ranked above the "PotentiallyUseful" and "Bad" comments; the latter two will both be treated as bad for evaluation purposes (the gold labels are contained in the RELC_RELEVANCE2ORGQ field of the related XML file). We will evaluate the position of the good comments in the ranking; thus, this is again a ranking task.
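As a minimal illustration of the reranking setup, the sketch below scores each candidate comment against the original question with a naive Jaccard word-overlap measure, used here only as a stand-in for a real similarity model; the question, comment IDs, and texts are toy data, not drawn from the actual dataset.

```python
def overlap_score(question: str, comment: str) -> float:
    """Jaccard similarity over lowercased word sets (a toy relevance model)."""
    q, c = set(question.lower().split()), set(comment.lower().split())
    return len(q & c) / (len(q | c) or 1)

# Toy data standing in for one original question and its candidate comments.
original = "how do I renew my residence permit in Qatar"
comments = [
    ("c1", "you can renew the permit at the immigration office"),
    ("c2", "I love the weather this time of year"),
    ("c3", "renew it online through the ministry website"),
]

# Rerank: highest-scoring comments first.
ranked = sorted(comments, key=lambda pair: overlap_score(original, pair[1]),
                reverse=True)
print([cid for cid, _ in ranked])  # c1 shares the most words with the question
```

A real system would replace `overlap_score` with a learned model and apply it to all 100 comments per original question.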

Although systems are expected to rank all 100 comments, we take an application-oriented view in the evaluation: we assume that potential users are presented with a relatively short list of candidate answers (e.g., 10, as in common search engines today). Thus, users would like the good comments to be concentrated in the first 10 positions (i.e., all good comments ranked before any non-good comment). We believe the user cares much less about what happens at lower positions in the ranking (e.g., after the 10th), as they typically do not ask for the next page of 10 comments. This is reflected in our primary evaluation score, which only considers the top 10 results.

More information on the task and all the subtasks can be found on the SemEval Task website.

Evaluation Criteria

On the Leaderboard three scores will be provided: MAP, Average Recall, and MRR. However, the official evaluation measure, against which all systems will be evaluated and ranked, is Mean Average Precision (MAP) computed over the first 10 ranked comments only.
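The truncated MAP measure can be sketched as follows. This is a common formulation of AP@10 and MRR for illustration only; the official scorer's exact definitions (e.g., the AP denominator, or score scaling) may differ in detail, so always use the provided scorer for reported numbers.

```python
def average_precision_at_k(ranked_labels, k=10):
    """AP over the top-k positions; labels are True for 'Good' comments."""
    num_rel = sum(ranked_labels)
    if num_rel == 0:
        return 0.0
    hits, score = 0, 0.0
    for i, rel in enumerate(ranked_labels[:k], start=1):
        if rel:
            hits += 1
            score += hits / i          # precision at each relevant position
    return score / min(num_rel, k)

def mean_average_precision(rankings, k=10):
    """MAP: mean of AP@k over all original questions."""
    return sum(average_precision_at_k(r, k) for r in rankings) / len(rankings)

def mrr(rankings, k=10):
    """Mean reciprocal rank of the first 'Good' comment within the top k."""
    total = 0.0
    for r in rankings:
        for i, rel in enumerate(r[:k], start=1):
            if rel:
                total += 1.0 / i
                break
    return total / len(rankings)

# Toy example: two queries; True marks a "Good" comment at that rank.
rankings = [[True, False, True], [False, True]]
print(round(mean_average_precision(rankings), 4))  # 0.6667
```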

Note: The datasets are already provided in a form appropriate for this subtask. For each original question there is a list of 10 related questions with 10 comments each (see the README file that comes with the data distribution). The test set will follow the same format. The format required for the output of your systems will be detailed in the scorer and in the format-checker README files. These can be found here.

The name of the development file you submit needs to be SemEval2017-Task3-CQA-QL-dev.xml.subtaskC.pred, and it needs to be zipped.

The name of the test file you submit needs to be SemEval2017-task3-English-test.xml.subtaskC.pred, and it needs to be zipped.

Terms and Conditions

By participating in this competition and submitting results in CodaLab you agree to the public release of your results in the proceedings of SemEval 2017. Furthermore, you accept that the choice of evaluation metric is made by the task organizers, and that they have the right to decide the winner of the competition, and to disqualify teams if they do not follow the rules of the competition.

Development

Start: Aug. 1, 2016, midnight

Description: The development phase

Testing

Start: Jan. 9, 2017, midnight

Description: The testing phase

Competition Ends

Jan. 31, 2017, noon

# Username Score
1 TitasNandi 0.380
2 DorisHoogeveen 0.306
3 preslav 0.138