SentiMix Hindi-English

Organized by suraj1ly

First phase
Sept. 4, 2019, 8 a.m. UTC

Competition Ends
March 12, 2020, noon UTC

Overview

Abstract 
Mixing languages, also known as code-mixing, is a norm in multilingual societies. Multilingual people who are non-native English speakers tend to code-mix using English-based phonetic typing and by inserting anglicisms into their main language. In addition to mixing languages at the sentence level, it is fairly common to find code-mixing at the word level. This linguistic phenomenon poses a great challenge to conventional NLP systems, which currently rely on monolingual resources to handle the combination of multiple languages. The objective of this proposal is to bring the attention of the research community to the task of sentiment analysis in code-mixed social media text. Specifically, we focus on the combination of English with Spanish (Spanglish) and with Hindi (Hinglish), which are the third and fourth most spoken languages in the world, respectively.


Hinglish and Spanglish - the Modern Urban Languages 
The evolution of social media texts such as blogs, micro-blogs (e.g., Twitter), and chats (e.g., WhatsApp and Facebook messages) has created many new opportunities for information access and language technology, but it has also posed many new challenges, making it one of the prime current research areas. Although current language technologies are primarily built for English, non-native English speakers combine English and other languages when they use social media. In fact, statistics show that half of the messages on Twitter are in a language other than English. This evidence suggests that other languages, including multilinguality and code-mixing, need to be considered by the NLP community. Code-mixing poses several unseen difficulties for NLP tasks such as word-level language identification, part-of-speech tagging, dependency parsing, machine translation, and semantic processing. Conventional NLP systems rely heavily on monolingual resources to address code-mixed text, which limits their ability to properly handle issues such as English-based phonetic typing and word-level code-mixing. The following two examples illustrate code-mixing in Spanglish and Hinglish. In the Spanglish example, in addition to code-mixing at the sentence level, the word "pushes" conjugates the English word "push" according to Spanish grammar rules, which shows that code-mixing can also happen at the word level. In the Hinglish example, only one English word, "enjoy", is used, but more noticeably, the Hindi words are written in English phonetic typing rather than Devanagari script, a popular practice in India.


The SentiMix task - A summary 
The task is to predict the sentiment of a given code-mixed tweet. The sentiment labels are positive, negative, or neutral, and the code-mixed language pairs are English-Hindi and English-Spanish. Besides the sentiment labels, we will also provide language labels at the word level. The word-level language tags are en (English), spa (Spanish), hi (Hindi), mixed, and univ (e.g., symbols, @-mentions, hashtags). Performance will be measured in terms of Precision, Recall, and F-measure.
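As an illustration, a word-level annotation with the tag set above might look like the following. The tweet text here is invented for illustration only; it is not taken from the dataset, and the exact file format used in the task may differ:

```python
# Hypothetical tokenized Hinglish tweet with word-level language tags.
# Tags come from the set described above: en, hi, univ (spa and mixed
# would also be possible in general). The text itself is made up.
tweet = [
    ("@friend",    "univ"),   # @-mention -> univ
    ("movie",      "en"),
    ("bahut",      "hi"),     # Hindi written in Roman script (phonetic typing)
    ("achhi",      "hi"),
    ("thi",        "hi"),
    (",",          "univ"),
    ("enjoy",      "en"),
    ("kiya",       "hi"),
    ("#Bollywood", "univ"),
]

# The sentence-level label is one of: positive, negative, neutral.
sentiment = "positive"

hindi_tokens = [tok for tok, lang in tweet if lang == "hi"]
print(sentiment, hindi_tokens)
```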

 

Bibtex

If you are a participant or a researcher using our dataset, please cite the following paper:

@inproceedings{patwa2020sentimix,
  title={{SemEval-2020 Sentimix Task 9: Overview of SENTIment Analysis of Code-MIXed Tweets}},
  author="Patwa, Parth and
          Aguilar, Gustavo and
          Kar, Sudipta and
          Pandey, Suraj and
          PYKL, Srinivas and
          Garrette, Dan and
          Gamb{\"a}ck, Bj{\"o}rn and
          Chakraborty, Tanmoy and
          Solorio, Thamar and  
          Das, Amitava",
  booktitle = "Proceedings of the 14th International Workshop on Semantic Evaluation ({S}em{E}val-2020)",
  year = {2020},
  month = {Sep},
  address = "Barcelona, Spain",
  publisher = "Association for Computational Linguistics"
}

Evaluation

Official Competition Metric
The metric for evaluating the participating systems is the F1 score averaged across the positive, negative, and neutral classes. The final ranking will be based on this average F1 score. For further theoretical discussion, we will also release the macro-averaged recall (recall averaged across the three classes), since the latter has better theoretical properties than the former (2015), and since this provides better consistency.
Each participating team will initially have access to the training data only. Later, the unlabelled test data will be released. After SemEval-2020, the labels for the test data will be released as well. We will ask the participants to submit their predictions in a specified format (within 24 hours), and the organizers will calculate the results for each participant. We will make no distinction between constrained and unconstrained systems, but the participants will be asked to report what additional resources they have used for each submitted run.
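The two evaluation quantities above can be computed as follows. This is a minimal sketch of macro-averaged F1 and macro-averaged recall, not the organizers' official scorer, and the gold/predicted labels below are invented for illustration:

```python
LABELS = ["positive", "negative", "neutral"]

def macro_scores(gold, pred):
    """Return (macro-F1, macro-recall) over the three sentiment classes."""
    f1s, recalls = [], []
    for label in LABELS:
        # Per-class counts of true positives, false positives, false negatives.
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1s.append(f1)
        recalls.append(recall)
    # Macro-averaging: unweighted mean over the classes.
    return sum(f1s) / len(LABELS), sum(recalls) / len(LABELS)

# Toy example (invented labels, not from the task data):
gold = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
pred = ["positive", "negative", "positive", "positive", "neutral", "neutral"]
macro_f1, macro_recall = macro_scores(gold, pred)
print(round(macro_f1, 3), round(macro_recall, 3))  # -> 0.656 0.667
```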

Organizer List

Dr. Amitava Das, Wipro AI Labs, Bangalore, India, and Mahindra École Centrale, Hyderabad, India
Dr. Tanmoy Chakraborty, Indraprastha Institute of Information Technology Delhi, India
Dr. Thamar Solorio, University of Houston, USA
Dr. Björn Gambäck, Norwegian University of Science and Technology, Norway
Gustavo Aguilar, University of Houston, USA
Sudipta Kar, University of Houston, USA
Dr. Dan Garrette, Google Research, New York, USA
Srinivas P Y K L, Indian Institute of Information Technology Sri City, India

Student Volunteers

Parth Patwa, Indian Institute of Information Technology Sri City, India
Suraj Pandey, Indraprastha Institute of Information Technology Delhi, India

Schedule

Trial data ready: July 31, 2019
Training data ready: September 4, 2019
Test data ready: February 19, 2020
Evaluation start: February 19, 2020
Evaluation end: March 1, 2020
Results posted: March 18, 2020
System description paper submissions due: May 1, 2020
Task description paper submissions due: May 8, 2020
Author notifications: June 24, 2020
Camera-ready submissions due: July 8, 2020

SemEval 2020: 12-13 December 2020

Terms & Conditions

By submitting results to this competition, you consent to the public release of your scores at the SemEval workshop and in the associated proceedings, at the task organizers' discretion. Scores may include but are not limited to, automatic and manual quantitative judgments, qualitative judgments, and such other metrics as the task organizers see fit. You accept that the ultimate decision of metric choice and score value is that of the task organizers.

You further agree that the task organizers are under no obligation to release scores and that scores may be withheld if it is the task organizers' judgment that the submission was incomplete, erroneous, deceptive, or violated the letter or spirit of the competition's rules. Inclusion of a submission's scores is not an endorsement of a team or individual's submission, system, or science.

You further agree that your system may be named according to the team name provided at the time of submission, or to a suitable shorthand as determined by the task organizers.

By downloading the data or by accessing it in any manner, you agree not to redistribute the data except for non-commercial, academic-research purposes. The data must not be used for surveillance, or for analyses or research that isolates a group of individuals or any single individual for any unlawful or discriminatory purpose.

For any queries, contact us by email: semevalsentiment@gmail.com

Rank  User  Score 1 (Best Score)  Score 2  Score 3
1 kk2018 0.75 0.735 0.732  
2 Genius1237 0.726 0.684 0.68  
3 olenet 0.715 0.713 0.659  
4 gopalanvinay 0.713 0.699 0  
5 ayushk 0.707 0.636 0.623  
6 Taha 0.706 0.705 0.618  
7 Miriam 0.702 0.689 0.685  
8 HugoLerogeron 0.695 0.69 0.666  
9 somban 0.691 0.656 0  
10 aditya_malte 0.69 0.681 0.652  
11 MeisterMorxrc 0.69 0.687 0.525  
12 nirantk 0.689 0.661 0.543  
13 apurva19 0.688 0.662 0.66  
14 c1pher 0.687 0.615 0.615  
15 will_go 0.686 0.685 0.685  
16 eduardgzaharia 0.685 0.66 0.593  
17 guzimanis 0.685 0.682 0.674  
18 epochs 0.683 0.682 0.679  
19 kongjun 0.682 0.681 0.667  
20 koustava 0.68 0 0  
21 abaruah 0.678 0.674 0.342  
22 ayan7246 0.677 0.666 0  
23 caozhou 0.676 0.674 0.627  
24 gundapusunil 0.675 0.673 0.626  
25 HaoYu 0.675 0.673 0.668  
26 meiyim 0.673 0 0  
27 rachel 0.671 0.67 0  
28 the0ne 0.671 0.652 0.641  
29 anuragvij264 0.668 0.668 0.601  
30 talent404 0.668 0.653 0.646  
31 asking28 0.666 0.647 0.629  
32 mamta 0.666 0 0  
33 pribanp 0.665 0.532 0  
34 verissimo.manoel 0.665 0.665 0.662  
35 sainik.mahata 0.662 0.657 0.535  
36 pratikbhavsar 0.661 0 0  
37 souryadipta 0.661 0.653 0.615  
38 harsh_6 0.659 0.648 0.642  
39 Turaga_Tulasi_Sasidhar 0.659 0.648 0.644  
40 mohitasudani 0.657 0.654 0.643  
41 RAMANDEEP 0.657 0.349 0.313  
42 lakshadvani 0.655 0.639 0  
43 sjmaharjan 0.655 0.655 0  
44 keshav22b 0.654 0.649 0.632  
45 suraj1ly (organizer baseline) 0.654 0 0  
46 dhruvrnaik 0.65 0.64 0.626  
47 buraka 0.648 0.648 0.645  
48 Sugeeth14 0.648 0.617 0.503  
49 zyy1510 0.647 0.635 0.628  
50 riveill 0.641 0.62 0.62  
51 ugrganesh 0.641 0.353 0.23  
52 KalyanGvs 0.637 0.551 0.535  
53 abestard 0.626 0.625 0.623  
54 Manmeet 0.626 0 0  
55 vivek_IITGN 0.6 0.591 0.558  
56 gagan42 0.589 0 0  
57 himanshu366 0.58 0 0  
58 rns2020 0.56 0.559 0.557  
59 Gaurav337 0.532 0.497 0  
60 Abhilash 0.463 0 0  
61 nomanashraf712 0.412 0.257 0.207  
62 Lavinia_Ap 0.324 0.195 0  
