SemEval 2018 Task 4: Character Identification on Multiparty Dialogues

Organized by jdchoi

Development: Aug. 21, 2017, midnight UTC
Evaluation: Jan. 8, 2018, 1 a.m. UTC
Competition ends: Never

Important Notice

  • The evaluation period of this task has ended. Please visit our GitHub page for the detailed results as well as the datasets used for this task.
  • The official Character Identification site provides the latest updates on this task, including larger and cleaner datasets. Please visit it and give us your feedback.

Welcome

This is the CodaLab Competition for SemEval 2018 Task 4: Character Identification on Multiparty Dialogues.

Important Dates

  • 08/21/2017: Trial and training data release.

  • 01/08/2018: Test data release.

  • 01/29/2018: Evaluation end.

Description

Character Identification is an entity linking task that identifies each mention as a certain character in multiparty dialogue. Let a mention be a nominal referring to a person (e.g., she, mom, Judy), and an entity be a character in a dialogue. The goal is to assign each mention to its entity, who may or may not participate in the dialogue. For instance, the mention "mom" may not be one of the speakers in a conversation; nonetheless, it clearly refers to a specific person, Judy, who could appear in some other dialogue. Identifying such mentions as real characters requires cross-document entity resolution, which makes this task challenging.

This year's competition focuses on singular mentions with gold mention boundaries.  We plan to open another competition in the following year that also challenges plural mentions and ambiguous mention types, using predicted mention boundaries.

Datasets

The first two seasons of the TV show Friends are annotated for this task.  Each season consists of episodes, each episode comprises scenes, and each scene is segmented into sentences.  The following datasets are distributed:

  • friends.train.episode_delim.conll: the training data where each episode is considered a document.
  • friends.train.scene_delim.conll: the training data where each scene is considered a document.
  • friends.trial.episode_delim.conll: the trial data where each episode is considered a document (the first two episodes of the training data).
  • friends.trial.scene_delim.conll: the trial data where each scene is considered a document (the first two episodes of the training data).
  • friends.test.episode_delim.conll.nokey: the test data where each episode is considered a document (gold keys are replaced by -1).
  • friends.test.scene_delim.conll.nokey: the test data where each scene is considered a document (gold keys are replaced by -1).

Data Format

All datasets follow the CoNLL 2012 Shared Task data format.  Documents are delimited by the comments in the following format:

#begin document (<Document ID>)[; part ###]
...
#end document

Each sentence is delimited by a new line ("\n"), and each column indicates the following (a short sketch mapping these columns to named fields appears after the list):

  1. Document ID: /<name of the show>-<season ID><episode ID> (e.g., /friends-s01e01).
  2. Scene ID: the ID of the scene within the episode.
  3. Token ID: the ID of the token within the sentence.
  4. Word form: the tokenized word.
  5. Part-of-speech tag: the part-of-speech tag of the word (auto generated).
  6. Constituency tag: the Penn Treebank style constituency tag (auto generated).
  7. Lemma: the lemma of the word (auto generated).
  8. Frameset ID: not provided (always "-").
  9. Word sense: not provided (always "-").
  10. Speaker: the speaker of this sentence.
  11. Named entity tag: the named entity tag of the word (auto generated).
  12. Entity ID: the entity ID of the mention, which is consistent across all documents.
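
The snippet below is a minimal sketch, not part of the task distribution, that maps these 12 columns to named fields for a single token line; the field names are illustrative, and it assumes every column is a single whitespace-separated value, as in the samples that follow.

from collections import namedtuple

# Named view of one token line; field names are illustrative, not official.
Token = namedtuple("Token", [
    "document_id", "scene_id", "token_id", "word", "pos", "constituency",
    "lemma", "frameset_id", "word_sense", "speaker", "named_entity", "entity_id",
])

line = "/friends-s01e01 0 4 guy NN *) guy - - Monica_Geller * (284)"
token = Token(*line.split())
print(token.speaker, token.word, token.entity_id)  # Monica_Geller guy (284)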

Here is a sample from the training dataset:

/friends-s01e01 0 0 He PRP (TOP(S(NP*) he - - Monica_Geller * (284)
/friends-s01e01 0 1 's VBZ (VP* be - - Monica_Geller * -
/friends-s01e01 0 2 just RB (ADVP*) just - - Monica_Geller * -
/friends-s01e01 0 3 some DT (NP(NP* some - - Monica_Geller * -
/friends-s01e01 0 4 guy NN *) guy - - Monica_Geller * (284)
/friends-s01e01 0 5 I PRP (SBAR(S(NP*) I - - Monica_Geller * (248)
/friends-s01e01 0 6 work VBP (VP* work - - Monica_Geller * -
/friends-s01e01 0 7 with IN (PP*)))))) with - - Monica_Geller * -
/friends-s01e01 0 8 ! . *)) ! - - Monica_Geller * -
/friends-s01e01 0 0 C'mon VB (TOP(S(S(VP*)) c'mon - - Joey_Tribbiani * -
/friends-s01e01 0 1 , , * , - - Joey_Tribbiani * -
/friends-s01e01 0 2 you PRP (NP*) you - - Joey_Tribbiani * (248)
/friends-s01e01 0 3 're VBP (VP* be - - Joey_Tribbiani * -
/friends-s01e01 0 4 going VBG (VP* go - - Joey_Tribbiani * -
/friends-s01e01 0 5 out RP (PRT*) out - - Joey_Tribbiani * -
/friends-s01e01 0 6 with IN (PP* with - - Joey_Tribbiani * -
/friends-s01e01 0 7 the DT (NP* the - - Joey_Tribbiani * -
/friends-s01e01 0 8 guy NN *)))) guy - - Joey_Tribbiani * (284)
/friends-s01e01 0 9 ! . *)) ! - - Joey_Tribbiani * -

A mention may include more than one word:

/friends-s01e02 0 0 Ugly JJ (TOP(S(NP(ADJP* ugly - - Chandler_Bing * (380
/friends-s01e02 0 1 Naked JJ *) naked - - Chandler_Bing * -
/friends-s01e02 0 2 Guy NNP *) Guy - - Chandler_Bing * 380)
/friends-s01e02 0 3 got VBD (VP* get - - Chandler_Bing * -
/friends-s01e02 0 4 a DT (NP* a - - Chandler_Bing * -
/friends-s01e02 0 5 Thighmaster NN *)) thighmaster - - Chandler_Bing * -
/friends-s01e02 0 6 ! . *)) ! - - Chandler_Bing * -

The mapping between the entity ID and the actual character can be found in friends_entity_map.txt.
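
The sketch below, which is not part of the official distribution, reads one of these files and yields the entity ID of every mention in the order mentions open; the file name in the example is illustrative, and the sketch assumes the entity ID is always the last whitespace-separated column.

import re

def read_mention_entities(path):
    # Yield the entity ID of each mention, in the order mentions open.
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            # Skip blank lines and the #begin/#end document delimiters.
            if not line or line.startswith("#"):
                continue
            entity_col = line.split()[-1]  # column 12: entity ID
            # "(284)" is a single-token mention, "(380" opens a multi-token
            # mention (closed later by "380)"), and "-" means no mention.
            for match in re.finditer(r"\((\d+)", entity_col):
                yield match.group(1)

# Example: print one entity ID per line, the format expected for answer.txt.
# for entity_id in read_mention_entities("friends.trial.scene_delim.conll"):
#     print(entity_id)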

Evaluation

Your output must consist of the entity ID of each mention, one per line, in the order the mentions appear.  There are 6 mentions in the samples above, which generate the following output:

284
284
248
248
284
380

Given this output, the evaluation script measures the following (a minimal sketch of the accuracy and macro-F1 computations appears after this list):

  1. The label accuracy considering only 7 entities: the 6 main characters (Chandler, Joey, Monica, Phoebe, Rachel, and Ross), with all other characters grouped as one entity.
  2. The macro average of the F1 scores of these 7 entities.
  3. The label accuracy considering all entities, where characters not appearing in the training data are grouped as one entity, others.
  4. The macro average of the F1 scores of all entities.
  5. The F1 score of each of the 7 entities.
  6. The F1 score of each entity.
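
The following is a minimal sketch of the two headline metrics, label accuracy and the macro average of per-entity F1 scores; it is not the official evaluate.py, and it omits the grouping of non-main or unseen characters into the others class. The predicted IDs below are hypothetical.

def label_accuracy(gold, pred):
    # Fraction of mentions whose predicted entity ID equals the gold ID.
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def macro_f1(gold, pred):
    # Unweighted average of the F1 score of every entity in the gold labels.
    f1_scores = []
    for label in set(gold):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1_scores.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(f1_scores) / len(f1_scores)

gold = ["284", "284", "248", "248", "284", "380"]  # the six mentions above
pred = ["284", "248", "248", "248", "284", "380"]  # a hypothetical system output
print(label_accuracy(gold, pred))  # 0.833...
print(macro_f1(gold, pred))        # 0.866...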

The following shows the command to run the evaluation script:

python evaluate.py input_dir output_dir
  • input_dir: the input directory that consists of ref/answer.txt and res/answer.txt; these files contain gold and predicted keys, respectively (a sketch for preparing these directories follows this list). The gold keys for the trial data can be found here.
  • output_dir: the output directory where scores.txt is saved.
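
The sketch below, with illustrative directory and variable names, lays out input_dir in the structure described above by writing one entity ID per line to the gold and predicted answer files.

import os

def write_answers(input_dir, gold_ids, predicted_ids):
    # ref/answer.txt holds the gold keys; res/answer.txt holds the predicted keys.
    for subdir, ids in (("ref", gold_ids), ("res", predicted_ids)):
        os.makedirs(os.path.join(input_dir, subdir), exist_ok=True)
        with open(os.path.join(input_dir, subdir, "answer.txt"), "w") as f:
            f.write("\n".join(str(i) for i in ids) + "\n")

# write_answers("input_dir", gold_ids, predicted_ids)
# then: python evaluate.py input_dir output_dir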

The macro average of the F1 scores of all entities is the metric used for the leaderboard.

Terms and conditions

By submitting results to this competition, you consent to the public release of your scores at the SemEval-2018 workshop and in the associated proceedings, at the task organizers' discretion. Scores may include, but are not limited to, automatic and manual quantitative judgements, qualitative judgements, and such other metrics as the task organizers see fit. You accept that the ultimate decision of metric choice and score value is that of the task organizers.

You further agree that the task organizers are under no obligation to release scores and that scores may be withheld if it is the task organizers' judgement that the submission was incomplete, erroneous, deceptive, or violated the letter or spirit of the competition's rules. Inclusion of a submission's scores is not an endorsement of a team or individual's submission, system, or science.

You further agree that your system may be named according to the team name provided at the time of submission, or to a suitable shorthand as determined by the task organizers.

You agree not to redistribute the test data except in the manner prescribed by its licence.

All Entities + Others

This evaluation considers all characters appearing in the training, development, and evaluation sets as individual classes.  Characters that appear in only one or two of these sets are grouped into one class called OTHERS.

User ID        Label Accuracy  Average F1
AMORE UPF           74.72         41.05
Cheoneum            68.55         13.53
Kampfpudding        59.45         37.37
zuma                25.81         14.42

Main Entities + Others

This evaluation considers the 6 main characters as individual classes and all other characters as one class called OTHERS.

User ID        Label Accuracy  Average F1
Cheoneum            82.13         83.37
AMORE UPF           77.23         79.36
Kampfpudding        73.36         73.51
zuma                46.07         43.15
