HASOC-Dravidian-CodeMix - FIRE 2020

Organized by dravidiancodemixed

First phase: June 19, 2020, 6:53 p.m. UTC

Competition ends: Never

HASOC-Offensive Language Identification- DravidianCodeMix FIRE 2020

https://sites.google.com/view/dravidian-codemix-fire2020/overview

There is an increasing demand for offensive language detection on social media texts which are largely code-mixed. Code-mixing is a prevalent phenomenon in a multilingual community and the code-mixed texts are sometimes written in non-native scripts. Systems trained on monolingual data fail on code-mixed data due to the complexity of code-switching at different linguistic levels in the text. This shared task presents a new gold standard corpus for offensive language detection of code-mixed text in Dravidian languages (Malayalam-English and Tamil-English). 

The goal of this task is to identify offensive language in a code-mixed dataset of comments/posts in Dravidian languages (Malayalam-English and Tamil-English) collected from social media. A comment/post may contain more than one sentence, but on average a comment/post in the corpora is one sentence long. Each comment/post is annotated with an offensive language label at the comment/post level. The dataset is also class-imbalanced, reflecting real-world scenarios.
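One common way to account for class imbalance during training is to weight each class inversely to its frequency. The sketch below is only an illustration of that idea, not something the organizers prescribe; the function name `inverse_frequency_weights` and the label counts in the example are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return a weight per class: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Hypothetical label distribution, NOT taken from the shared-task data.
example_labels = ["not"] * 8 + ["off"] * 2
example_weights = inverse_frequency_weights(example_labels)
print(example_weights)
```

With this scheme the rarer class receives the larger weight, so errors on it count more in a weighted loss.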

Participants will be provided with development, training, and test datasets.

Task1:

This is a message-level label classification task. Given a YouTube comment in code-mixed Malayalam, systems have to classify it as offensive or not-offensive.

 

Task2:

This is a message-level label classification task. Given a tweet or YouTube comment in Tanglish or Manglish (Tamil and Malayalam written using Roman characters), systems have to classify it as offensive or not-offensive.
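To make the input/output shape of both tasks concrete, here is a minimal baseline sketch: a character trigram Naive Bayes classifier in plain Python. It is not the organizers' system or a recommended approach; character n-grams are simply one plausible choice for romanized text with variable spelling. The class name `CharNgramNB` is hypothetical, and the two training rows are the format examples shown on this page, used only as toy data.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return lowercase character n-grams, padding the text with one space on each side."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramNB:
    """Multinomial Naive Bayes over character n-grams with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.gram_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.gram_counts[label].update(char_ngrams(text))
        self.vocab = {g for counts in self.gram_counts.values() for g in counts}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.gram_counts[label].values()) + len(self.vocab)
            for gram in char_ngrams(text):
                score += math.log((self.gram_counts[label][gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data: the two example rows from the submission format on this page.
train_texts = [
    "Yarayellam FDFS ppga ippove ready agitinga",
    "Ennada viswasam mersal sarkar madhri time la likes and views create pannalayae",
]
train_labels = ["Off", "Not"]
model = CharNgramNB().fit(train_texts, train_labels)
print(model.predict(train_texts[0]))
```

A real system would be trained on the full training set and tuned on the development set; two rows are far too few for meaningful predictions on unseen text.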

As far as we know, this is the first shared task on Offensive language in Dravidian Code-Mixed text.

To download the data and participate, go to the "Participate" tab.

Results:

Task1-Malayalam-mix-rank

Task2:Tamil-rank

Task2:Malayalam-rank

 

Paper submission link: https://easychair.org/conferences/?conf=hasocdravidiancodemi. Details about submission will be updated soon.

We accept test results only through the Google form.
The submission file should be formatted as follows:

id	text	label
ml_1	Yarayellam FDFS ppga ippove ready agitinga	Off
ml_2	Ennada viswasam mersal sarkar madhri time la likes and views create pannalayae	Not
...	...	...
  • The label column should contain only labels in the form used in the training data, i.e. 'off', 'not'.
  • The id column is the index of the row. (Keep the sequence of the YouTube comments the same as provided in the test data.)
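The format rules above can be checked before submitting. The following is a rough sketch, assuming a three-column TSV with an `id`, `text`, `label` header row; the function name `write_submission` is hypothetical. Because this page shows the labels both capitalised (in the example rows) and lowercase (in the bullet), the allowed label set is passed in explicitly and should be read from the training data.

```python
import csv
import io


def write_submission(rows, test_ids, allowed_labels, out):
    """Write (id, text, label) rows as TSV after checking order and labels.

    rows: list of (id, text, label) tuples, one per test comment.
    test_ids: ids in the order given in the test data.
    allowed_labels: label strings exactly as spelled in the training data.
    out: a writable text file object.
    """
    if [row[0] for row in rows] != list(test_ids):
        raise ValueError("ids must match the test data, in the same order")
    for row_id, _, label in rows:
        if label not in allowed_labels:
            raise ValueError(f"unexpected label {label!r} for id {row_id}")
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "text", "label"])
    writer.writerows(rows)


# Example using the two rows shown on this page.
rows = [
    ("ml_1", "Yarayellam FDFS ppga ippove ready agitinga", "Off"),
    ("ml_2", "Ennada viswasam mersal sarkar madhri time la likes and views create pannalayae", "Not"),
]
buffer = io.StringIO()
write_submission(rows, ["ml_1", "ml_2"], {"Off", "Not"}, buffer)
print(buffer.getvalue())
```

To write a real file, pass `open(path, "w", newline="", encoding="utf-8")` instead of the in-memory buffer.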
 
The submission should be a zip/rar/tar compressed file named after your team, containing a TSV file named 'Teamname_task1ortask2_language_submissionnumber.tsv'. Submissions will be evaluated using the weighted average F1-score. Submit results through the Google form: https://docs.google.com/forms/d/e/1FAIpQLScYtYnunMcqDlGR1MVre9BRccOnqIRqyrpouCJqNjr_5rVJbg/viewform?usp=sf_link
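For local validation on the development set, the weighted average F1-score can be approximated as below: the F1 of each class is computed separately and the results are averaged with weights proportional to each class's number of gold instances. This is a sketch of the standard definition, not the organizers' evaluation script, which is not shown on this page; the function name `weighted_f1` and the example labels are hypothetical.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's share of y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n_gold in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n_gold - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n_gold / total) * f1
    return score


# Hypothetical gold labels and predictions, for illustration only.
gold = ["not", "not", "not", "off"]
pred = ["not", "not", "off", "off"]
print(round(weighted_f1(gold, pred), 4))
```

In this example the 'not' class has F1 = 0.8 with weight 3/4 and the 'off' class has F1 = 2/3 with weight 1/4. The same averaging is available as `f1_score(y_true, y_pred, average="weighted")` in scikit-learn.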
  • We expect each team to submit a system description paper after the evaluation. The deadline, length of submission, and other instructions for the system description papers will be the same as those for FIRE 2020 conference papers. All system papers will be published in the proceedings, and the best systems will be given slots for demos and presentations at the workshop.
Task announcement: 15 June

Release of Trial data: 20 June

Release of Training data: 1 July

Release of Test data: 1 August

Run submission deadline: 10 August

Results declared: 20 August

Paper submission: 20 September

Revised paper: 30 September (will be updated)

Dr. Bharathi Raja Chakravarthi, Researcher, Insight SFI Research Centre for Data Analytics, Data Science Institute, National University of Ireland Galway

Dr. M Anand Kumar, Assistant Professor, Department of Information Technology, National Institute of Technology Karnataka Surathkal, India

Dr John P. McCrae, Lecturer-above-the-bar, Insight SFI Research Centre for Data Analytics, Data Science Institute, National University of Ireland Galway

Prof. K P Soman, Head, CEN, Amrita Vishwa Vidyapeetham

Mr. Premjith, Faculty Associate, CEN, Amrita Vishwa Vidyapeetham

 

HASOC Organizers

 

Thomas Mandl - University of Hildesheim, Germany

Sandip Modha - DA-IICT, Gandhinagar, India

Prasenjit Majumder - DA-IICT, Gandhinagar, India

Daksh Patel - Dalhousie University, Halifax, Canada

Gautam Kishore Shahi - University of Duisburg-Essen

Johannes Schäfer - University of Hildesheim

Amit Kumar Jaiswal - University of Bedfordshire

Terms and Conditions

By downloading the data or by accessing it in any manner, you agree not to redistribute the data except for non-commercial and academic-research purposes. The data must not be used for providing surveillance, analyses, or research that isolates a group of individuals or any single individual for any unlawful or discriminatory purpose.

 
