https://sites.google.com/view/dravidian-codemix-fire2020/overview
There is an increasing demand for offensive language detection on social media texts which are largely code-mixed. Code-mixing is a prevalent phenomenon in a multilingual community and the code-mixed texts are sometimes written in non-native scripts. Systems trained on monolingual data fail on code-mixed data due to the complexity of code-switching at different linguistic levels in the text. This shared task presents a new gold standard corpus for offensive language detection of code-mixed text in Dravidian languages (Malayalam-English and Tamil-English).
The goal of this task is to identify offensive language in a code-mixed dataset of comments/posts in Dravidian languages (Malayalam-English and Tamil-English) collected from social media. A comment/post may contain more than one sentence, but the average number of sentences per comment/post in the corpora is 1. Each comment/post is annotated with an offensive language label at the comment/post level. The dataset is also class-imbalanced, reflecting real-world scenarios.
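Because the labels are imbalanced, participants may want to reweight classes during training. The following is a minimal sketch of one common heuristic, weighting each class by `n_samples / (n_classes * class_count)`. The function name `balanced_class_weights` and the toy label list are assumptions made for illustration; they do not reflect the real label distribution or any method prescribed by the organizers.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Invented toy labels for illustration only; NOT the real label distribution.
toy_labels = ["Not"] * 8 + ["Off"] * 2
weights = balanced_class_weights(toy_labels)
print(weights)
```

The rarer class receives the larger weight, so errors on it count more in a weighted loss.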
Participants will be provided with development, training, and test datasets.
Task 1:
This is a message-level label classification task. Given a YouTube comment in code-mixed Malayalam, systems have to classify it as offensive or not-offensive.
Task 2:
This is a message-level label classification task. Given a tweet or YouTube comment in Tanglish or Manglish (Tamil and Malayalam written using Roman characters), systems have to classify it as offensive or not-offensive.
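To make the message-level classification setup concrete, here is a minimal baseline sketch: a multinomial Naive Bayes classifier over character n-grams, written with the Python standard library only. Character n-grams are chosen here because romanized code-mixed text tends to have inconsistent spelling; this is an assumption of the sketch, not the organizers' baseline. The names `char_ngrams` and `NaiveBayes` are hypothetical, and the two training rows are the sample rows shown on this page, used purely as a smoke test.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=2, n_max=4):
    """Return overlapping character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower().strip()} "
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


class NaiveBayes:
    """Multinomial Naive Bayes over character n-grams with add-one smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.gram_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.gram_counts[label].update(char_ngrams(text))
        self.vocab = {g for c in self.gram_counts.values() for g in c}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.gram_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of every n-gram.
            score = math.log(n_docs / total_docs)
            for gram in char_ngrams(text):
                score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Smoke test only: two rows are far too few to learn anything useful.
train_texts = [
    "Yarayellam FDFS ppga ippove ready agitinga",
    "Ennada viswasam mersal sarkar madhri time la likes and views create pannalayae",
]
train_labels = ["Off", "Not"]
model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict(train_texts[0]))
```

In practice the model would be fitted on the released training data and evaluated on the development data before submitting test predictions.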
As far as we know, this is the first shared task on offensive language in Dravidian code-mixed text.
To download the data and participate, go to the "Participate" tab.
Paper submission link: https://easychair.org/conferences/?conf=hasocdravidiancodemi. Details about submission will be updated soon.
id | text | label |
---|---|---|
ml_1 | Yarayellam FDFS ppga ippove ready agitinga | Off |
ml_2 | Ennada viswasam mersal sarkar madhri time la likes and views create pannalayae | Not |
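The sample rows above can be read into `(id, text, label)` tuples with a few lines of Python. This is a sketch that parses the pipe-delimited layout exactly as displayed on this page; the format of the released data files may differ, and the names `SAMPLE` and `parse_rows` are hypothetical.

```python
from collections import Counter

# The sample table as shown on this page (assumed layout, not the release format).
SAMPLE = """\
id | text | label |
---|---|---|
ml_1 | Yarayellam FDFS ppga ippove ready agitinga | Off |
ml_2 | Ennada viswasam mersal sarkar madhri time la likes and views create pannalayae | Not |
"""


def parse_rows(raw):
    """Parse pipe-delimited rows into (id, text, label) tuples."""
    rows = []
    for line in raw.splitlines():
        line = line.strip().rstrip("|").strip()
        # Skip blank lines, the header row, and the separator row.
        if not line or line.startswith("---") or line.startswith("id "):
            continue
        # Split the id from the left and the label from the right, so a
        # "|" inside the comment text does not break the row.
        row_id, rest = line.split("|", 1)
        text, label = rest.rsplit("|", 1)
        rows.append((row_id.strip(), text.strip(), label.strip()))
    return rows


rows = parse_rows(SAMPLE)
label_counts = Counter(label for _, _, label in rows)
print(rows[0])
print(label_counts)
```

Counting labels right after loading is a quick way to inspect the class imbalance mentioned in the task description.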
Dr. Bharathi Raja Chakravarthi, Researcher, Insight SFI Research Centre for Data Analytics, Data Science Institute, National University of Ireland Galway
Dr. M Anand Kumar, Assistant Professor, Department of Information Technology, National Institute of Technology Karnataka Surathkal, India
Dr. John P. McCrae, Lecturer-above-the-bar, Insight SFI Research Centre for Data Analytics, Data Science Institute, National University of Ireland Galway
Prof. K P Soman, Head, CEN, Amrita Vishwa Vidyapeetham
Mr. Premjith, Faculty Associate, CEN, Amrita Vishwa Vidyapeetham
HASOC Organizers
Thomas Mandl - University of Hildesheim, Germany
Sandip Modha - DA-IICT, Gandhinagar, India
Prasenjit Majumder - DA-IICT, Gandhinagar, India
Daksh Patel - Dalhousie University, Halifax, Canada
Gautam Kishore Shahi - University of Duisburg-Essen
Johannes Schäfer - University of Hildesheim
Amit Kumar Jaiswal - University of Bedfordshire
By downloading the data or by accessing it in any manner, you agree not to redistribute the data except for non-commercial and academic-research purposes. The data must not be used for providing surveillance, analyses, or research that isolates a group of individuals or any single individual for any unlawful or discriminatory purpose.
Release of Trial data: 20 June
Release of Training data: 1 July
Release of Test data: 1 August
Run submission deadline: 10 August
Results declared: 20 August
Paper submission: 20 September
Revised paper: 30 September (will be updated)
Start: June 19, 2020, 6:53 p.m.