The large amount of hate speech and other offensive and objectionable content online poses a huge challenge to societies. Offensive language, such as insulting, hurtful, derogatory or obscene content directed from one person to another and visible to others, undermines objective discussion. Such language is increasingly found on the web and can lead to the radicalization of debates. Public opinion forming requires rational critical discourse (Habermas 1984), and objectionable content can pose a threat to democracy. At the same time, open societies need to find an adequate way to react to such content without imposing rigid censorship regimes. As a consequence, many social media platforms monitor user posts, which leads to a pressing demand for methods to automatically identify suspicious posts. Online communities, social media enterprises and technology companies have been investing heavily in technology and processes to identify offensive language in order to prevent abusive behavior on social media.
HASOC provides a forum and a data challenge for multilingual research on the identification of problematic content. This year, we again offer two sub-tasks for each of three languages (English, German, and Hindi), with altogether over 10,000 annotated tweets from Twitter. Participants in this year's shared task can choose to take part in one or both of the sub-tasks. Participants can also look at the openly available data of HASOC 2019: https://hasocfire.github.io/hasoc/2019/dataset.html
There are two sub-tasks for each language. A brief description of each task is given below.
This task focuses on hate speech and offensive language identification and is offered for English, German, and Hindi. Sub-task A is a coarse-grained binary classification task in which participating systems are required to classify tweets into two classes: Hate and Offensive (HOF) and Non-Hate and Offensive (NOT).
This sub-task is a fine-grained classification, offered for English, German, and Hindi. Hate speech and offensive posts from sub-task A are further classified into three categories:
HATE SPEECH: Ascribing negative attributes or deficiencies to groups of individuals because they are members of a group (e.g., "all poor people are stupid"); hateful comments toward groups because of race, political opinion, sexual orientation, gender, social status, health condition or similar.
OFFENSIVE: Posts that degrade, dehumanize or insult an individual, or threaten violent acts, are categorized as OFFENSIVE.
PROFANITY: Unacceptable language in the absence of insults and abuse. This typically concerns the use of swearwords ("Scheiße", "Fuck", etc.) and cursing ("Zur Hölle!", "Verdammt!", etc.).
For more detailed information, please refer to this link.
Please contact us at hasocfire@gmail.com or post on the competition forum if you have any further queries.
Each language task consists of two sub-tasks. Teams can participate in any or all of the sub-tasks.
Sub-tasks A and B are evaluated by the macro-averaged F1 score, following the scikit-learn implementation.
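As an illustration, here is a minimal sketch of how the metric can be reproduced locally with scikit-learn; the label values shown are hypothetical sub-task A examples:

```python
# Minimal sketch: macro-averaged F1 with scikit-learn.
# F1 is computed per class and then averaged, so each class
# counts equally regardless of how frequent it is.
from sklearn.metrics import f1_score

gold = ["HOF", "NOT", "NOT", "HOF"]  # hypothetical gold labels
pred = ["HOF", "NOT", "HOF", "HOF"]  # hypothetical system predictions

print(f1_score(gold, pred, average="macro"))
```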
Note: The final leaderboard is calculated with approximately 15% of the private test data.
To submit your results to the leaderboard, you must construct a submission zip file containing the prediction file submission_<LANGUAGE>_<SUBTASK_NAME>.csv (for example, submission_EN_A.csv for English sub-task A) with the model's results on the test set, together with the code files used to generate the prediction file. The prediction file should follow the format detailed in the subsequent section.
The CSV submission format is composed of three columns: each row contains the tweet_id of the tweet, the predicted label (for task1 or task2), and the ID of the annotator.
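For illustration, a minimal sketch of writing such a prediction file with pandas; the column names used here ("tweet_id", "task1", "ID") are assumptions and should be checked against the sample files in the "Participate" tab:

```python
# Minimal sketch: write a sub-task A prediction file.
# Column names are assumed, not confirmed by the task description.
import pandas as pd

predictions = pd.DataFrame({
    "tweet_id": ["1001", "1002", "1003"],  # hypothetical test-set tweet IDs
    "task1": ["HOF", "NOT", "HOF"],        # predicted sub-task A labels
    "ID": ["team42", "team42", "team42"],  # hypothetical annotator/team ID
})
predictions.to_csv("submission_EN_A.csv", index=False)
```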
The naming convention of the submission files for the corresponding language sub-tasks is given below:
The submission zip file (i.e., submission.zip) must contain the following files:
<submission.zip>
- submission_<LANGUAGE>_<SUBTASK_NAME>.csv
- code.zip (containing the scripts used to generate the predictions for the test set and an instruction or README file that includes a link (or links) to the trained model weights)
- (Optional) A link to the model weights is required if the model was trained for a long time on GPU(s).
Note: For each sub-task, the submission zip file must contain a prediction file and the code files used to generate the prediction.
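As a convenience, a minimal sketch of assembling the archive, assuming the prediction file and code.zip already exist in the working directory:

```python
# Minimal sketch: bundle the prediction file and code archive
# into submission.zip as described above.
import zipfile

with zipfile.ZipFile("submission.zip", "w") as zf:
    zf.write("submission_EN_A.csv")  # prediction file (English sub-task A)
    zf.write("code.zip")             # scripts + README with model-weight link
```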
Participants should comply with the general rules of HASOC.
The organizers are free to penalize or disqualify participants for any violation of the above rules, or for misuse, unethical behaviour or other behaviour that they agree is not acceptable in a scientific competition in general and in this one in particular.
Please contact the task organisers or post on the competition forum if you have any further queries.
The full datasets for the ENGLISH, GERMAN and HINDI languages can be found in the "Participate" tab (only registered participants can access them).
The training data for each language contains the columns described below; the sub-task A and B gold labels are given in the columns "task1" and "task2", respectively.
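A minimal sketch of loading and inspecting the training data; the file name and the exact column set are assumptions, while "task1" and "task2" come from the description above:

```python
# Minimal sketch: inspect the label distribution of a training file.
# The file name "english_dataset.csv" is hypothetical; use the file
# downloaded from the "Participate" tab.
import pandas as pd

train = pd.read_csv("english_dataset.csv")
print(train["task1"].value_counts())  # sub-task A labels (HOF vs. NOT)
print(train["task2"].value_counts())  # sub-task B fine-grained labels
```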
Thomas Mandl, University of Hildesheim, Germany
Sandip Modha, DA-IICT & LDRP-ITR, Gandhinagar, India
Gautam Kishore Shahi, University of Duisburg-Essen, Germany
Amit Kumar Jaiswal, University of Bedfordshire, UK
Durgesh Nandini, University of Bamberg, Germany
Prasenjit Majumder, DA-IICT, Gandhinagar, India
Daksh Patel, Dalhousie University, Halifax, Canada
Johannes Schäfer, University of Hildesheim, Germany
Note: The final leaderboards below are calculated with approximately 15% of the private test data.
English, Sub-task A:

| # | Team Name | Entries | Subtask A F1 Macro average |
|----|---------------------------|---------|----------------------------|
| 1 | IIIT_DWD | 1 | 0.5152 |
| 2 | CONCORDIA_CIT_TEAM | 1 | 0.5078 |
| 3 | AI_ML_NIT_Patna | 1 | 0.5078 |
| 4 | Oreo | 6 | 0.5067 |
| 5 | MUM | 3 | 0.5046 |
| 6 | Huiping Shi | 6 | 0.5042 |
| 7 | TU Berlin | 1 | 0.5041 |
| 8 | NITP-AI-NLP | 1 | 0.5031 |
| 9 | JU | 2 | 0.5028 |
| 10 | HASOCOne | 6 | 0.5018 |
| 11 | Astralis | 2 | 0.5017 |
| 12 | YNU_WU | 3 | 0.5017 |
| 13 | YNU_OXZ | 2 | 0.5006 |
| 14 | HRS-TECHIE | 6 | 0.5002 |
| 15 | ZYJ | 2 | 0.4994 |
| 16 | Buddi_SAP | 2 | 0.4991 |
| 17 | HateDetectors | 2 | 0.4981 |
| 18 | QutBird | 8 | 0.4981 |
| 19 | NLP-CIC | 2 | 0.4980 |
| 20 | SSN_NLP_MLRG | 1 | 0.4979 |
| 21 | Fazlourrahman Balouchzahi | 4 | 0.4979 |
| 22 | Lee | 1 | 0.4976 |
| 23 | IRIT-PREVISION | 2 | 0.4969 |
| 24 | chrestotes | 1 | 0.4969 |
| 25 | zeus | 1 | 0.4954 |
| 26 | DLRG | 4 | 0.4951 |
| 27 | ComMA | 4 | 0.4945 |
| 28 | Siva | 1 | 0.4927 |
| 29 | hub | 2 | 0.4917 |
| 30 | CFILT IIT Bombay | 2 | 0.4889 |
| 31 | Salil Mishra | 1 | 0.4881 |
| 32 | NSIT_ML_Geeks | 1 | 0.4879 |
| 33 | Buddi_avengers | 1 | 0.4871 |
| 34 | yasuo | 2 | 0.4856 |
| 35 | UDE-LTL | 2 | 0.4571 |
| 36 | Sushma Kumari | 2 | 0.1612 |
English, Sub-task B:

| # | Team Name | Entries | Subtask B F1 Macro average |
|----|---------------------------|---------|----------------------------|
| 1 | chrestotes | 2 | 0.2652 |
| 2 | hub | 1 | 0.2649 |
| 3 | zeus | 1 | 0.2619 |
| 4 | Oreo | 2 | 0.2529 |
| 5 | Fazlourrahman Balouchzahi | 4 | 0.2517 |
| 6 | Astralis | 1 | 0.2484 |
| 7 | QutBird | 1 | 0.2450 |
| 8 | Siva | 1 | 0.2432 |
| 9 | Buddi_SAP | 2 | 0.2427 |
| 10 | HRS-TECHIE | 4 | 0.2426 |
| 11 | ZYJ | 1 | 0.2412 |
| 12 | ComMA | 4 | 0.2398 |
| 13 | Huiping Shi | 5 | 0.2396 |
| 14 | Buddi_avengers | 1 | 0.2391 |
| 15 | MUM | 2 | 0.2388 |
| 16 | NSIT_ML_Geeks | 1 | 0.2361 |
| 17 | HASOCOne | 7 | 0.2357 |
| 18 | IIIT_DWD | 1 | 0.2341 |
| 19 | SSN_NLP_MLRG | 1 | 0.2305 |
| 20 | HateDetectors | 2 | 0.2299 |
| 21 | AI_ML_NIT_Patna | 1 | 0.2298 |
| 22 | CFILT IIT Bombay | 1 | 0.2229 |
| 23 | CONCORDIA_CIT_TEAM | 1 | 0.2115 |
| 24 | JU | 2 | 0.1623 |
| 25 | NITP-AI-NLP | 1 | 0.1623 |
| 26 | Sushma Kumari | 1 | 0.1423 |
German, Sub-task A:

| # | Team Name | Entries | Subtask A F1 Macro average |
|----|---------------------------|---------|----------------------------|
| 1 | ComMA | 4 | 0.5235 |
| 2 | simon | 1 | 0.5225 |
| 3 | CONCORDIA_CIT_TEAM | 1 | 0.5200 |
| 4 | YNU_OXZ | 3 | 0.5177 |
| 5 | Siva | 1 | 0.5158 |
| 6 | Buddi_avengers | 2 | 0.5121 |
| 7 | Huiping Shi | 2 | 0.5121 |
| 8 | NITP-AI-NLP | 1 | 0.5109 |
| 9 | MUM | 1 | 0.5106 |
| 10 | HASOCOne | 4 | 0.5054 |
| 11 | Fazlourrahman Balouchzahi | 2 | 0.5044 |
| 12 | Oreo | 1 | 0.5036 |
| 13 | CFILT IIT Bombay | 1 | 0.5028 |
| 14 | SSN_NLP_MLRG | 2 | 0.5025 |
| 15 | IIIT_DWD | 1 | 0.5019 |
| 16 | yasuo | 1 | 0.4968 |
| 17 | hub | 2 | 0.4953 |
| 18 | NSIT_ML_Geeks | 2 | 0.4919 |
| 19 | DLRG | 2 | 0.4843 |
| 20 | Astralis | 1 | 0.4789 |
| 21 | AI_ML_NIT_Patna | 1 | 0.4768 |
| 22 | Sushma Kumari | 1 | 0.4368 |
| 23 | TU Berlin | 1 | 0.4276 |
| 24 | IRLab@IITVaranasi | 1 | 0.3840 |
| 25 | JU | 1 | 0.3231 |
German, Sub-task B:

| # | Team Name | Entries | Subtask B F1 Macro average |
|----|--------------------|---------|----------------------------|
| 1 | Siva | 1 | 0.2943 |
| 2 | SSN_NLP_MLRG | 1 | 0.2920 |
| 3 | ComMA | 4 | 0.2831 |
| 4 | Huiping Shi | 1 | 0.2736 |
| 5 | CONCORDIA_CIT_TEAM | 1 | 0.2727 |
| 6 | Astralis | 1 | 0.2627 |
| 7 | Buddi_avengers | 2 | 0.2609 |
| 8 | MUM | 2 | 0.2595 |
| 9 | CFILT IIT Bombay | 1 | 0.2594 |
| 10 | simon | 1 | 0.2579 |
| 11 | hub | 1 | 0.2567 |
| 12 | Oreo | 1 | 0.2542 |
| 13 | IIIT_DWD | 1 | 0.2513 |
| 14 | NSIT_ML_Geeks | 2 | 0.2468 |
| 15 | HASOCOne | 4 | 0.2397 |
| 16 | Sushma Kumari | 1 | 0.2346 |
| 17 | AI_ML_NIT_Patna | 1 | 0.2210 |
| 18 | NITP-AI-NLP | 1 | 0.1214 |
| 19 | JU | 1 | 0.0984 |
Hindi, Sub-task A:

| # | Team Name | Entries | Subtask A F1 Macro average |
|----|---------------------------|---------|----------------------------|
| 1 | NSIT_ML_Geeks | 1 | 0.5337 |
| 2 | Siva | 1 | 0.5335 |
| 3 | DLRG | 2 | 0.5325 |
| 4 | NITP-AI-NLP | 1 | 0.5300 |
| 5 | YUN111 | 1 | 0.5216 |
| 6 | YNU_OXZ | 2 | 0.5200 |
| 7 | ComMA | 4 | 0.5197 |
| 8 | Fazlourrahman Balouchzahi | 3 | 0.5182 |
| 9 | HASOCOne | 1 | 0.5150 |
| 10 | HateDetectors | 2 | 0.5129 |
| 11 | IIIT_DWD | 1 | 0.5121 |
| 12 | LoneWolf | 2 | 0.5095 |
| 13 | MUM | 2 | 0.5033 |
| 14 | IRLab@IITVaranasi | 2 | 0.5028 |
| 15 | CONCORDIA_CIT_TEAM | 1 | 0.5027 |
| 16 | QutBird | 2 | 0.4992 |
| 17 | Oreo | 2 | 0.4943 |
| 18 | CFILT IIT Bombay | 1 | 0.4834 |
| 19 | TU Berlin | 1 | 0.4678 |
| 20 | JU | 1 | 0.4599 |
| 21 | AI_ML_NIT_Patna | 1 | 0.4561 |
| 22 | Sushma Kumari | 1 | 0.4346 |
| 23 | Astralis | 1 | 0.4293 |
| 24 | SSN_NLP_MLRG | 2 | 0.3971 |
Hindi, Sub-task B:

| # | Team Name | Entries | Subtask B F1 Macro average |
|----|--------------------|---------|----------------------------|
| 1 | Sushma Kumari | 1 | 0.3345 |
| 2 | NSIT_ML_Geeks | 1 | 0.2667 |
| 3 | Astralis | 1 | 0.2644 |
| 4 | Oreo | 1 | 0.2612 |
| 5 | Siva | 1 | 0.2602 |
| 6 | HASOCOne | 2 | 0.2574 |
| 7 | MUM | 3 | 0.2488 |
| 8 | ComMA | 5 | 0.2464 |
| 9 | AI_ML_NIT_Patna | 1 | 0.2399 |
| 10 | IIIT_DWD | 1 | 0.2374 |
| 11 | CFILT IIT Bombay | 1 | 0.2355 |
| 12 | CONCORDIA_CIT_TEAM | 1 | 0.2323 |
| 13 | HateDetectors | 1 | 0.2272 |
| 14 | YUN111 | 1 | 0.2100 |
| 15 | SSN_NLP_MLRG | 2 | 0.2063 |
| 16 | JU | 1 | 0.1600 |
| 17 | NITP-AI-NLP | 1 | 0.0940 |
Start: Aug. 20, 2020, midnight (all phases)
End: Sept. 28, 2020, 11:59 a.m.
Description (English sub-task A): submit results on the public data and get results, for a taste of the data and task.