SemEval 2019 Task 5 - Shared Task on Multilingual Detection of Hate

Organized by msang


Multilingual detection of hate speech against immigrants and women in Twitter (hatEval)

Hate Speech is commonly defined as any communication that disparages a person or a group on the basis of some characteristic such as race, color, ethnicity, gender, sexual orientation, nationality, religion, or other characteristics. Given the huge amount of user-generated content on the Web, and in particular on social media, the problem of detecting, and thereby possibly limiting, the diffusion of Hate Speech is becoming fundamental, for instance in the fight against misogyny and xenophobia.

The proposed task consists in Hate Speech detection on Twitter, featuring two specific targets, immigrants and women, from a multilingual perspective, for Spanish and English.
The task is articulated around two related subtasks for each of the involved languages: a basic task about Hate Speech detection, and a second one where fine-grained features of hateful content are investigated in order to understand how existing approaches deal with the identification of especially dangerous forms of hate, i.e. those where the incitement is against an individual rather than against a group of people, and where an aggressive behavior of the author can be identified as a prominent feature of the expression of hate. Participants are asked to identify, on the one hand, whether the target of hate is a single human or a group of persons and, on the other hand, whether the message author intends to be aggressive, harmful, or even to incite, in various forms, violent acts against the target.

  • TASK A - Hate Speech Detection against Immigrants and Women: a two-class (or binary) classification where systems have to predict whether a tweet in English or in Spanish with a given target (women or immigrants) is hateful or not hateful.
  • TASK B - Aggressive Behavior and Target Classification: where systems are asked, first, to classify hateful tweets in English and Spanish (i.e., tweets where Hate Speech against women or immigrants has been identified) as aggressive or not aggressive and, second, to identify the harassed target as individual or generic (i.e., a single human or a group).

Important dates

  • January 10 2019: Evaluation begins
  • January 20 2019: Evaluation ends
  • February 05 2019: Results are notified to participants
  • February 28 2019: System and Task description paper submission due
  • March 14 2019: Paper reviews due
  • April 06 2019: Author notifications
  • April 20 2019: Camera ready submissions due
  • Summer 2019: SemEval 2019

Join the hatEval mailing group: semeval2019-task5-hateval[at]googlegroups.com

Organizers:

  • Paolo Rosso
    prosso@dsic.upv.es
    Universidad Politecnica de Valencia, Valencia (Spain)
  • Francisco Rangel
    francisco.rangel@autoritas.es
    Autoritas Consulting S.A., Madrid (Spain)
  • Elisabetta Fersini
    fersiniel@disco.unimib.it
    Università degli Studi di Milano Bicocca, Milan (Italy)
  • Debora Nozza
    debora.nozza@disco.unimib.it 
    Università degli Studi di Milano Bicocca, Milan (Italy)
  • Viviana Patti
    viviana.patti@unito.it
    Università degli Studi di Torino, Turin (Italy)
  • Valerio Basile
    valerio.basile@unito.it
    Università degli Studi di Torino, Turin (Italy)
  • Cristina Bosco
    cristina.bosco@unito.it
    Università degli Studi di Torino, Turin (Italy)
  • Manuela Sanguinetti
    manuela.sanguinetti@unito.it
    Università degli Studi di Torino, Turin (Italy)

 

 

Evaluation

For the evaluation of the results of tasks A and B, different strategies and metrics are applied in order to allow for more fine-grained scores.

TASK A.

Systems will be evaluated using standard evaluation metrics, including accuracy, precision, recall and F1-score. The submissions will be ranked by F1-score.
The metrics will be computed as follows:

  • Accuracy = (number of correctly predicted instances) / (total number of instances)
  • Precision = (number of correctly predicted labels) / (number of predicted labels)
  • Recall = (number of correctly predicted labels) / (number of labels in the gold standard)
  • F1-score = (2 * Precision * Recall) / (Precision + Recall)
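
For illustration, the following minimal Python sketch computes these metrics for the positive (hateful) class from two lists of gold and predicted labels. It is not the official scorer (linked under Scoring program below), and the function name is illustrative only.

    # Minimal sketch of the Task A metrics (illustrative, not the official scorer).
    def task_a_metrics(gold, pred, positive=1):
        assert len(gold) == len(pred)
        tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
        fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
        fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
        accuracy = sum(1 for g, p in zip(gold, pred) if g == p) / len(gold)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
        return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

    # Example: gold and predicted labels (1 = hateful, 0 = not hateful) for five tweets.
    print(task_a_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))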

TASK B.

Systems will be evaluated on the basis of two criteria: partial match and exact match.

  • Partial match: each dimension to be predicted (Hate Speech HS, Target TR and Aggressiveness AG) will be evaluated independently of the others using standard evaluation metrics, including accuracy, precision, recall and F1-score as defined above. The report for each participant will include all the measures and a summary of the performance in terms of macro-averaged F1-score, computed as the average of the F1-scores obtained on the three dimensions:

                 F1_macro = (F1(HS) + F1(TR) + F1(AG)) / 3

  • Exact match: all the dimensions to be predicted will be jointly considered by computing the Exact Match Ratio (Kazawa, 2005). Given the multi-label dataset consisting of n multi-label samples (xi, Yi), where xi denotes the i-th instance and Yi represents the corresponding set of labels to be predicted (HS ∈ {0,1}, TR ∈ {0,1} and AG ∈ {0,1}), the Exact Match Ratio (EMR) will be computed as follows:

                 EMR = (1/n) * Σ_{i=1..n} I(Yi = Zi)

            where Zi denotes the set of labels predicted for the i-th instance and I is the indicator function.

           The submissions will be ranked by EMR. This choice is motivated by the intent to reward the most difficult part of the task, i.e. capturing the phenomenon in its entirety, and therefore identifying the most dangerous behaviours against the targets.
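
A minimal Python sketch of these Task B measures is given below for illustration; it assumes gold and predicted labels are available as (HS, TR, AG) triples and reuses the task_a_metrics helper from the sketch above. It is not the official scorer, which is linked in the Scoring program section.

    # Illustrative Task B scoring sketch (not the official evaluation script).
    # gold and pred are lists of (HS, TR, AG) triples, one per tweet.
    def task_b_scores(gold, pred):
        assert len(gold) == len(pred)
        f1s = []
        for dim in range(3):  # 0 = HS, 1 = TR, 2 = AG
            gold_dim = [labels[dim] for labels in gold]
            pred_dim = [labels[dim] for labels in pred]
            f1s.append(task_a_metrics(gold_dim, pred_dim)["f1"])  # per-dimension F1
        macro_f1 = sum(f1s) / len(f1s)                            # partial match summary
        emr = sum(1 for g, p in zip(gold, pred)
                  if tuple(g) == tuple(p)) / len(gold)            # exact match ratio
        return {"macro_f1": macro_f1, "emr": emr}

    # Example: two tweets, the first predicted exactly, the second not.
    print(task_b_scores([(1, 1, 0), (0, 0, 0)], [(1, 1, 0), (1, 0, 0)]))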

 

Scoring program

The evaluation script is available in this GitHub repository:

https://github.com/msang/hateval/tree/master/SemEval2019-Task5/evaluation

NOTE

During the Practice phase, the prediction files submitted by participants to the task page will be evaluated for task A only, and for demonstration purposes only; participants who wish to test the script on prediction files for task B as well can use the version available in the GitHub repository.

For the Development and Evaluation phases, the script will provide a complete evaluation for each language and task for any submitted file, provided that it meets the submission requirements (see Submission Instructions).

Submission Instructions

The official hatEval evaluation script takes a single prediction file as input for each task and each language; the file MUST be a TSV file structured as follows:

Task A

id[tab]{0|1}

e.g.

101[tab]1

102[tab]0

103[tab]1

Task B

id[tab]{0|1}[tab]{0|1}[tab]{0|1}

e.g.

101[tab]1[tab]1[tab]1

102[tab]0[tab]0[tab]0

103[tab]1[tab]1[tab]0

104[tab]1[tab]0[tab]0

105[tab]1[tab]0[tab]1

 

Contrary to the trial and training sets, the submission files do NOT have a header in the first line.
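
As an illustration, the short Python sketch below writes a headerless Task B prediction file in this format. The column order (id, HS, TR, AG) is an assumption based on the order in which the dimensions are listed in the Evaluation section; please check the trial data and the official README to confirm it.

    # Sketch: write a headerless TSV prediction file for Task B (illustrative).
    import csv

    # tweet id -> (HS, TR, AG); ids and values are made up for the example.
    predictions = {101: (1, 1, 1), 102: (0, 0, 0), 103: (1, 1, 0)}

    with open("en_b.tsv", "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        for tweet_id, labels in sorted(predictions.items()):
            writer.writerow([tweet_id, *labels])  # no header line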

 

File names

When submitting predictions to the task page in Codalab, a single file should be uploaded for each task and language, as a zip-compressed file, and the prediction file should be named according to the language and task it refers to, therefore:

  • en_a.tsv for predictions for taskA-English
  • es_a.tsv for predictions for taskA-Spanish
  • en_b.tsv for predictions for taskB-English
  • es_b.tsv for predictions for taskB-Spanish
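
A minimal sketch of packaging a prediction file for upload might look as follows; the TSV name matches the list above, while the archive name itself is illustrative.

    # Sketch: zip a single prediction file before uploading it to Codalab.
    import zipfile

    with zipfile.ZipFile("en_b.zip", "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write("en_b.tsv")  # the archive contains only the correctly named TSV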

 

NOTE

For the Practice phase, more than one submission is allowed, BUT for task A only. During the Development and Evaluation phases, participants are free to submit their systems' predictions for each language and task separately.

For the Development phase, participants will be able to make more than one submission for each language and task, while for the Evaluation phase a maximum of 2 submissions is allowed for each language and for both tasks A and B; please note that only the final valid one is taken as the official submission for the competition.

Terms and conditions

By submitting results to this competition, you consent to the public release of your scores at the SemEval-2019 workshop and in the associated proceedings, at the task organizers' discretion. Scores may include, but are not limited to, automatic and manual quantitative judgements, qualitative judgements, and such other metrics as the task organizers see fit. You accept that the ultimate decision of metric choice and score value is that of the task organizers.

You further agree that the task organizers are under no obligation to release scores and that scores may be withheld if it is the task organizers' judgement that the submission was incomplete, erroneous, deceptive, or violated the letter or spirit of the competition's rules. Inclusion of a submission's scores is not an endorsement of a team or individual's submission, system, or science.

You further agree that your system may be named according to the team name provided at the time of submission, or to a suitable shorthand as determined by the task organizers.

You agree not to redistribute the test data except in the manner prescribed by its licence.

Practice

Start: Aug. 20, 2018, midnight

Description: Trial data available.

Development-English-A

Start: Sept. 17, 2018, midnight

Description: English dataset for task A available for training. More than one submission allowed in this phase.

Development-Spanish-A

Start: Sept. 17, 2018, midnight

Description: Spanish dataset for task A available for training. More than one submission allowed in this phase.

Development-Spanish-B

Start: Sept. 17, 2018, midnight

Description: Spanish dataset for task B available for training. More than one submission allowed in this phase.

Development-English-B

Start: Sept. 17, 2018, midnight

Description: English dataset for task B available for training. More than one submission allowed in this phase.

Evaluation-English-A

Start: Jan. 10, 2019, midnight

Description: English test set available for task A. Up to 2 submissions are allowed, but only the final valid one is taken as the official submission for the competition.

Evaluation-Spanish-A

Start: Jan. 10, 2019, midnight

Description: Spanish test set available for task A. Up to 2 submissions are allowed, but only the final valid one is taken as the official submission for the competition.

Evaluation-English-B

Start: Jan. 10, 2019, midnight

Description: English test set available for task B. Up to 2 submissions are allowed, but only the final valid one is taken as the official submission for the competition.

Evaluation-Spanish-B

Start: Jan. 10, 2019, midnight

Description: Spanish test set available for task B. Up to 2 submissions are allowed, but only the final valid one is taken as the official submission for the competition.

Competition Ends

Never
