SIMAH (SocIaL Media And Harassment): Categorizing Different Types of Online Harassment Language on Social Media
Online harassment is becoming prevalent as a specific communication type on Twitter. Given the huge volume of user-generated tweets each day, detecting and possibly limiting such content automatically and in real time is a fundamental problem, particularly for female public figures, who have been harassed for a long time without Twitter being able to help them.
The proposed task consists of two subtasks, and participants are required to participate in both:
Join the SIMAH mailing group: simah_competition_ecmlpkdd2019
Please note that the Google group will act as the main communication channel between the organizers and the participants.
Different strategies and metrics are applied to evaluate the results of both tasks, allowing for more fine-grained scores.
Tasks A and B
Systems will be evaluated using standard evaluation metrics, including accuracy, precision, recall and F1-score. The submissions will be ranked by F1-score.
The metrics will be computed as follows:
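As a rough guide, the ranking metrics can be sketched in plain Python as below. This is an illustrative implementation only, with hypothetical labels; the official evaluation script (see the GitHub repository) is authoritative, and its averaging strategy may differ.

```python
def macro_metrics(gold, pred):
    """Compute accuracy and macro-averaged precision, recall and F1-score.

    Illustrative sketch only; the official SIMAH script may average
    the per-class scores differently.
    """
    labels = sorted(set(gold) | set(pred))
    acc = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    precisions, recalls, f1s = [], [], []
    for lab in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == lab and p == lab)
        fp = sum(1 for g, p in zip(gold, pred) if g != lab and p == lab)
        fn = sum(1 for g, p in zip(gold, pred) if g == lab and p != lab)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    n = len(labels)
    return acc, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical gold and predicted labels for four tweets.
gold = ["harassment", "none", "harassment", "none"]
pred = ["harassment", "harassment", "harassment", "none"]
acc, prec, rec, f1 = macro_metrics(gold, pred)
```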
The evaluation script will be available in this GitHub repository:
During the Practice phase, the prediction files submitted by participants to the task page will be evaluated for task A only, and for demonstration purposes; participants who wish to test the script on prediction files for task B as well can use the version available in the GitHub repository.
Terms and conditions
By submitting results to this competition, you consent to the public release of your scores at the SIMAH workshop and in the associated proceedings, at the task organizers' discretion. Scores may include, but are not limited to, automatic and manual quantitative judgements, qualitative judgements, and such other metrics as the task organizers see fit. You accept that the ultimate decision of metric choice and score value rests with the task organizers.
You further agree that the task organizers are under no obligation to release scores and that scores may be withheld if it is the task organizers' judgement that the submission was incomplete, erroneous, deceptive, or violated the letter or spirit of the competition's rules. Inclusion of a submission's scores is not an endorsement of a team or individual's submission, system, or science.
You further agree that your system may be named according to the team name provided at the time of submission, or to a suitable shorthand as determined by the task organizers.
You agree not to redistribute the test data except in the manner prescribed by its licence.
A participant can be involved in exactly one team. If there are reasons why it makes sense for you to be on more than one team, email us before the evaluation period begins; in special circumstances this may be allowed.
Each team must create and use exactly one CodaLab account.
The datasets must not be redistributed or shared, in part or in full, with any third party. Redirect interested parties to the organizers' email.
If you use any of the datasets provided here, please cite the following papers:
The official SIMAH evaluation script takes a single prediction file as input for each task, which MUST be a TSV file structured as follows:
Unlike the trial and training sets, the submission files do NOT have a header in the first line.
When submitting predictions to the task page on CodaLab, a single zip-compressed file should be uploaded for each task, named according to the task the predictions are submitted for:
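The packaging steps above can be sketched as follows. The column layout and file names here are assumptions for illustration; use the actual TSV structure and naming convention stated on this page.

```python
import csv
import zipfile

# Hypothetical predictions: (tweet id, predicted label) pairs.
# The real column layout is defined in the task description above.
predictions = [
    ("123", "harassment"),
    ("456", "non-harassment"),
]

# Write the predictions as a TSV file with NO header line,
# matching the submission format (unlike the training set).
with open("taskA.tsv", "w", newline="", encoding="utf-8") as fh:
    writer = csv.writer(fh, delimiter="\t")
    writer.writerows(predictions)

# Zip the file for upload. "taskA.zip" is a placeholder name --
# follow the naming convention required by the task page.
with zipfile.ZipFile("taskA.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write("taskA.tsv")
```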
For the Practice phase, more than one submission is allowed, BUT for task A only. During the Development and Evaluation phases, participants are free to submit their system's predictions for each language and task separately.
In the Development phase participants may make more than one submission for each language and task, while in the Evaluation phase a maximum of 2 submissions has been set for both task A and B; note that only the final valid one is taken as the official submission for the competition.
Start: April 1, 2018, midnight
Description: Training and validation datasets for tasks A and B are available. More than one submission is allowed in this phase.
Start: June 23, 2019, 11 p.m.
Description: Up to 10 submissions are allowed, but only the final valid one is taken as the official submission for the competition.