SIMAH(SocIaL Media And Harassment) Categorizing Different Types of Online Harassment Language on Social Media

Organized by sima


Development phase
April 1, 2018, midnight UTC


Evaluation phase
June 23, 2019, 11 p.m. UTC


Competition Ends
June 23, 2019, 11 p.m. UTC

Online harassment is becoming prevalent as a specific type of communication on Twitter. Given the huge number of user-generated tweets posted each day, detecting and possibly limiting this content automatically and in real time is becoming a fundamental problem, particularly for female figures who have been harassed for a long time while Twitter was unable to help them.

The proposed task consists of two subtasks, and participants are required to participate in both:

  • TASK A - Detection of a tweet as "Harassment" or "Not-harassment": a two-class (binary) classification task in which systems have to predict whether a tweet is harassment or not.
  • TASK B - Classification of a harassing tweet into one of three categories, "Indirect harassment", "Physical harassment" or "Sexual harassment": a multiclass classification task.

Important dates

  • April 1: Data and a form (for getting the dataset) will be provided.
  • June 23 2019: Evaluation begins
  • June 25 2019: Evaluation ends
  • June 25 - June 28 2019: Results are notified to participants
  • June 28 - July 20 2019: System and Task description paper submission due
  • July 30 2019: Paper reviews due
  • August 1 2019: Author notifications
  • August 20 2019: Camera ready submissions due
  • September 16-20 2019: ECML PKDD 2019

Join the SIMAH mailing group: simah_competition_ecmlpkdd2019
Please note that the Google group will act as the main communication channel between the organizers and the participants.



  • Sima Sharifirad, Stan Matwin
    Dalhousie University, Institute of Big Data Analytics.


For the evaluation of the results of both tasks different strategies and metrics are applied in order to allow for more fine-grained scores.

TASK A and B.

Systems will be evaluated using standard evaluation metrics, including accuracy, precision, recall and F1-score. The submissions will be ranked by F1-score.
The metrics will be computed as follows:

  • Accuracy = (number of correctly predicted instances) / (total number of instances)
  • Precision = (number of correctly predicted instances) / (number of predicted labels)
  • Recall = (number of correctly predicted labels) / (number of labels in the gold standard)
  • F1-score = (2 * Precision * Recall) / (Precision + Recall)
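
The four metrics above can be sketched in a few lines of Python. This is an illustrative sketch, not the official scoring program: the function name `binary_scores` and the encoding of labels as integers with 1 as the positive ("Harassment") class are assumptions, and the official script may average scores differently, especially for the multiclass task B.

```python
from typing import Dict, Sequence


def binary_scores(gold: Sequence[int], pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 for one positive label.

    Assumption: labels are integers and `positive` marks the class of interest.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    pairs = list(zip(gold, pred))
    # Count the outcomes needed for the four metrics.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    correct = sum(1 for g, p in pairs if g == p)

    # Guard every division so empty inputs or empty classes yield 0.0.
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels made up for illustration; not taken from the SIMAH data.
    print(binary_scores([1, 0, 1, 1, 0], [1, 1, 0, 1, 0]))
```

Note that the F1-score uses the sum of precision and recall in the denominator, so it is zero whenever either of them is zero.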


Scoring program

The evaluation script will be available in this GitHub repository:


During the Practice phase, prediction files submitted to the task page will be evaluated for task A only, and for demonstration purposes only. Participants who wish to test the script on prediction files for task B as well can use the version available in the GitHub repository.

Terms and conditions

By submitting results to this competition, you consent to the public release of your scores at SIMAH and in the associated proceedings, at the task organizers' discretion. Scores may include, but are not limited to, automatic and manual quantitative judgements, qualitative judgements, and such other metrics as the task organizers see fit. You accept that the ultimate decision of metric choice and score value is that of the task organizers.

You further agree that the task organizers are under no obligation to release scores and that scores may be withheld if it is the task organizers' judgement that the submission was incomplete, erroneous, deceptive, or violated the letter or spirit of the competition's rules. Inclusion of a submission's scores is not an endorsement of a team or individual's submission, system, or science.

You further agree that your system may be named according to the team name provided at the time of submission, or to a suitable shorthand as determined by the task organizers.

You agree not to redistribute the test data except in the manner prescribed by its licence.
A participant can be involved in exactly one team (no more). If there are reasons why it makes sense for you to be on more than one team, then email us before the evaluation period begins. In special circumstances this may be allowed.
Each team must create and use exactly one CodaLab account.
The datasets must not be redistributed or shared in part or in full with any third party. Redirect interested parties to the organizers' email.
If you use any of the datasets provided here, cite the following papers:

  • When a Tweet is Actually Sexist. A more Comprehensive Classification of Different Online Harassment Categories and The Challenges in NLP:
  • How is Your Mood When Writing Sexist tweets? Detecting the Emotion Type and Intensity of Emotion Using Natural Language Processing Techniques:
  • Boosting Text Classification Performance on Sexist Tweets by Text Augmentation and Text Generation Using a Combination of Knowledge Graphs:
Submission Instructions

    The official SIMAH evaluation script takes one single prediction file as input, for each task, that MUST be a TSV file structured as follows:

    Task A






    Task B









    Contrary to the trial and training set, the submission files do NOT have the header in the first line.


    File names

    When submitting predictions to the task page on CodaLab, upload a single zip-compressed file for each task, named according to the task the predictions are for:

    • en_a.tsv for predictions for taskA-English
    • es_b.tsv for predictions for taskB-English
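
As a rough sketch of the packaging step, the Python below writes predictions to a header-less TSV file and wraps it in a zip archive. The helper name `write_submission` and the two-column layout (tweet id, predicted label) are assumptions made for illustration only; the column structure required by the official evaluation script is the one given in these instructions and should be checked before submitting.

```python
import csv
import zipfile
from pathlib import Path
from typing import Iterable, Sequence, Union


def write_submission(rows: Iterable[Sequence[object]], tsv_name: str,
                     out_dir: Union[str, Path]) -> Path:
    """Write rows to a header-less TSV file and zip it.

    Assumption: each row is (tweet id, predicted label); this layout is
    hypothetical and must be adapted to the official format.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    tsv_path = out_dir / tsv_name

    # No header row is written, as required for submission files.
    with tsv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)

    # Store the TSV at the top level of the archive under its expected name.
    zip_path = tsv_path.with_suffix(".zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.write(tsv_path, arcname=tsv_name)
    return zip_path


if __name__ == "__main__":
    # Toy predictions made up for illustration.
    print(write_submission([(1, 0), (2, 1)], "en_a.tsv", "submission"))
```

Passing `arcname` keeps the file at the root of the archive, so the archive contains only the expected file name and no directory prefix.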



    For the Practice phase, more than one submission is allowed, but for task A only. During the Development and Evaluation phases, participants are free to submit their system's predictions for each language and task separately.

    For the Development phase, participants can make more than one submission for each language and task. For the Evaluation phase, a maximum of 2 submissions has been set for both task A and task B; note that only the final valid one is taken as the official submission for the competition.

    Download       Size (mb)    Phase
    Public Data    0.283        #2 Development phase

    Development phase

    Start: April 1, 2018, midnight

    Description: Train and validation datasets for task A and B are available for training and validation. More than one submission allowed in this phase.

    Evaluation phase

    Start: June 23, 2019, 11 p.m.

    Description: Up to 10 submissions are allowed, but only the final valid one is taken as the official submission for the competition.

    Competition Ends

    June 23, 2019, 11 p.m.
