MeOffendEs@IberLEF 2021

Organized by amontejo

Previous

Contextual binary classification on OffendMex
May 10, 2021, midnight UTC

End

Competition Ends
May 26, 2021, 11 p.m. UTC

About the task

Social networks expose their users to a number of risks and potential attacks. One such threat is offensive and aggressive comments, which can cause long-term harm to victims and, in the most acute cases, can lead to suicide. Tools that can detect and analyze such comments could therefore have a great impact on everyone's lives. For this reason, the detection and analysis of offensive language in social media is a hot research topic in NLP. Nevertheless, few resources are available for Spanish, despite it being the fourth most spoken language in the world.

We are organizing a shared task focused on offensive language analysis in social networks for Spanish. Participants will be provided with training corpora for developing solutions that detect and recognize offensive language and its categories. The shared task comprises four subtasks that target different aspects of the problem and rely on different information sources. Participants are free to take part in any subset of these subtasks.

The aim of this competition is to boost research on a sensitive topic that has not been deeply addressed for the Spanish language. Of particular novelty: we release corpora that consider multiple social networks and a diversity of variants of Spanish. Likewise, we are exploring the benefits of incorporating metadata as an additional information source when approaching the task, and the feasibility of learning to indirectly predict the agreement of annotators. We foresee this task will lead to interesting findings and progress in the field.

Subtasks

  • Subtask 1: Non-contextual multiclass classification for generic Spanish. Participants have to classify comments into the four categories associated with the OffendEs corpus. No information about the comment (source or influencer ID) is provided. Participants can optionally submit confidence values with their predictions (a probability for each of the four categories, so that they sum to 1.0), in order to evaluate the agreement of predictions with the confidence of human annotators.
  • Subtask 2: Contextual multiclass classification for generic Spanish. Same problem as subtask 1, but metadata (information about targeted users and the related social media) will be provided to participants.
  • Subtask 3: Non-contextual binary classification for Mexican Spanish. Participants must classify tweets as offensive or non-offensive in the OffendMEX corpus.
  • Subtask 4: Contextual binary classification for Mexican Spanish. Same problem as subtask 3, but metadata about each tweet will be provided to participants.

Submissions will be evaluated on the test partitions for the corresponding corpora. These are the evaluation measures computed:

  • Subtasks 1 and 2: Micro-averaged precision, recall and F-score. Macro-averaged precision, recall and F-score. Weighted macro-averaged precision, recall and F-score. When participants submit confidence values (between 0 and 1) with their outputs, Mean Squared Error (MSE) will be applied (with an error value equal to one for wrongly predicted classes).
  • Subtasks 3 and 4: Precision, recall and F-score with respect to the offensive class will be used, with the F1 score as the leading evaluation measure.
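The measures listed above can be sketched in a few lines of Python. This is only an illustrative sketch, not the official scorer: the function names are hypothetical, macro-averaging is taken here as the plain mean of per-class F1 scores, and a score of 0.0 is assumed whenever a denominator is zero. The official evaluation script may make different choices.

```python
from collections import Counter


def prf(gold, pred, label):
    """Precision, recall and F1 for one class, treated as the positive class."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    # Assumption: undefined ratios (zero denominator) are scored as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def macro_f1(gold, pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    return sum(prf(gold, pred, lab)[2] for lab in labels) / len(labels)


def weighted_f1(gold, pred, labels):
    """Per-class F1 scores weighted by the number of gold items in each class."""
    support = Counter(gold)
    return sum(support[lab] * prf(gold, pred, lab)[2] for lab in labels) / len(gold)


def micro_f1(gold, pred):
    """With exactly one label per item, micro-averaged P, R and F1 all equal accuracy."""
    return sum(1 for g, p in zip(gold, pred) if g == p) / len(gold)


# Subtasks 3 and 4: scores with respect to the offensive class (label 1).
gold_binary = [1, 0, 1, 1, 0, 0]
pred_binary = [1, 1, 0, 1, 0, 0]
print(prf(gold_binary, pred_binary, label=1))

# Subtasks 1 and 2: averaged scores over the four OffendEs labels.
labels = ["NO", "NOM", "OFP", "OFG"]
gold_multi = ["NO", "NO", "NOM", "OFP"]
pred_multi = ["NO", "NOM", "NOM", "OFP"]
print(micro_f1(gold_multi, pred_multi))
print(macro_f1(gold_multi, pred_multi, labels))
print(weighted_f1(gold_multi, pred_multi, labels))
```

The toy gold and predicted labels are made up for illustration and do not come from either corpus.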

By submitting results to this competition, you consent to the public release of your scores at the IberLEF workshop and in the associated proceedings, at the task organizers' discretion. Scores may include, but are not limited to, automatically and manually calculated quantitative judgements, qualitative judgements, and such other metrics as the task organizers see fit. You accept that the ultimate decision of metric choice and score value is that of the task organizers.

You further agree that if your team has several members, each of them will register for the competition and build a competition team (as described on the 'Overview' page), and that if you are a single participant you will build a team with a single member.

You further agree that the task organizers are under no obligation to release scores and that scores may be withheld if it is the task organizers' judgement that the submission was incomplete, erroneous, deceptive, or violated the letter or spirit of the competition's rules. Inclusion of a submission's scores is not an endorsement of a team or individual's submission, system, or science.

Schedule

  • Feb 15th, 2021: Release of trial corpora
  • Mar 15th, 2021: Release of training corpora.
  • May 10th, 2021: Release of test corpora and start of evaluation campaign
  • May 26th, 2021 (extended from May 21st, 2021): End of evaluation campaign (deadline for submission of runs)
  • May 29th, 2021 (extended from May 24th, 2021): Publication of official results
  • June 10th, 2021 (extended from June 7th, 2021): Deadline for paper submission
  • Jun 21st, 2021: Acceptance notification
  • Jun 28th, 2021: Camera ready submission deadline
  • September 2021: Publication of proceedings
  • September 2021: Workshop with SEPLN 2021

Paper submission

Format details will be communicated shortly, according to the specifications of IberLEF organizers.

 

Organization team

  • Hugo Jair Escalante (INAOE, Mexico)
  • Flor Miriam Plaza-de-Arco (Universidad de Jaén, Spain)
  • Luis Villaseñor (INAOE, Mexico)
  • Manuel Montes (INAOE, Mexico)
  • Arturo Montejo-Ráez (Universidad de Jaén, Spain)

Short bios

Hugo Jair Escalante (hugojair@inaoep.mx) is a research scientist at INAOE, Mexico, secretary and member of the board of directors of ChaLearn USA, and vice chair officer of the IAPR Technical Committee 12. He has been involved in the organization of several challenges in machine learning and computer vision co-located with top venues; see http://chalearnlap.cvc.uab.es/. He has served as competition chair of NeurIPS2020, FG2020, ICPR2020, NeurIPS2019, PAKDD2019-2018 and IJCNN2019. His research interests are machine learning, challenge organization, and their applications to language and vision.

Flor Miriam Plaza-de-Arco, Universidad de Jaén, Spain (fmplaza@ujaen.es). Her research interests are in Computational Linguistics and Natural Language Processing, especially Machine Learning, Deep Learning, Text Categorization, Sentiment Analysis, Emotion Analysis, Social Media Analysis, Offensive Language detection, Computational Psycholinguistics and Low-Resource Generation. She has been an organizer of the last edition of the TASS workshop and of the 36th Annual SEPLN conference.

Luis Villaseñor Pineda (INAOE, Mexico, villasen@inaoep.mx) is a research scientist at the Instituto Nacional de Astrofísica, Óptica y Electrónica, INAOE, Mexico. For more than 15 years he has organized different events and initiatives to promote Language Technologies in Mexico, such as the Annual Mexican Workshop of Language Technologies, the Mexican Language Technologies Autumn School, and the Mexican Workshop on Plagiarism Detection and Authorship Analysis. He was head researcher of the Mexican Network for Language Technologies (RedTTL) from 2014 to 2016 and president of the Mexican Association of Natural Language Processing (AMNLP) from 2016 to 2020.

Manuel Montes y Gómez, Instituto Nacional de Astrofísica, Óptica y Electrónica, INAOE, Mexico (mmontesg@inaoep.mx). His research is on automatic text processing. He is the author of more than two hundred papers in the fields of information retrieval, text mining and authorship analysis. Together with Luis Villaseñor, he organized the Annual Mexican Workshop of Language Technologies from 2004 to 2016, the Mexican Language Technologies Autumn School in 2015 and 2016, and the Mexican Workshop on Plagiarism Detection and Authorship Analysis in 2016 and 2017. He also organized the track on Personality Recognition in Source Code at the FIRE 2016 conference, and he has served as program chair for IBERAMIA 2016 and as NLP area chair for IBERAMIA 2012 and 2014.

Arturo Montejo-Ráez, Universidad de Jaén, Spain (amontejo@ujaen.es). His research is focused on Natural Language Processing, Human Language Technologies, Machine Learning, Text Categorization, Opinion Mining, Sentiment Analysis, Semantic Web, Linked Open Data, Language Complexity, Text Simplification and Deep Learning for NLP. He has been an organizer of the past three editions of the TASS workshop at IberLEF 2018, 2019 and 2020 and of the ALexS workshop at IberLEF 2020.

Each team can participate with up to three submissions at each phase/subtask (except for development, where 100 submissions are allowed). Files to be uploaded must be compressed in a .zip file. These are the expected formats for submissions:

Subtasks 1 and 2 (OffendEs)

Predictions must be given as a .tsv (tab-separated) file inside a .zip file (no folders within). The .tsv file must contain these columns (no header):

comment_id label confidence_list

confidence_list is a list with four probability values (which sum to 1.0), one for each possible label in this order: [NO_prob, NOM_prob, OFP_prob, OFG_prob]

Example:

38564    NOM   [0.0, 0.7777777777777778, 0.2222222222222222, 0.0]
4522     OFP    [0.1111111111111111, 0.0, 0.8888888888888888, 0.0]
529     NO     [0.6666666666666666, 0.2222222222222222, 0.1111111111111111, 0.0]
13756    NO     [1.0, 0.0, 0.0, 0.0]
...

You can check development data to see the format for both data and submissions by looking into reference (ground-truth) files.
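As a rough illustration of the format described above, the following sketch builds one tab-separated row per comment and packs the file at the root of a .zip archive. The function names and the file name predictions.tsv are assumptions made for this example; the page does not state a required file name, so the development reference files remain the authority.

```python
import io
import math
import zipfile

# Label order given in the format description.
LABELS = ["NO", "NOM", "OFP", "OFG"]


def format_row(comment_id, label, probs):
    """Return one line: comment_id, label and confidence_list separated by tabs."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    if len(probs) != len(LABELS):
        raise ValueError("expected one probability per label")
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-6):
        raise ValueError("probabilities must sum to 1.0")
    confidence_list = "[" + ", ".join(repr(float(p)) for p in probs) + "]"
    return f"{comment_id}\t{label}\t{confidence_list}"


def build_zip(rows, tsv_name="predictions.tsv"):
    """Return the bytes of a .zip holding a single headerless .tsv, with no folders.

    tsv_name is a hypothetical default, not a name required by the organizers.
    """
    body = "\n".join(format_row(*row) for row in rows) + "\n"
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(tsv_name, body)
    return buffer.getvalue()


rows = [
    (38564, "NOM", [0.0, 0.7777777777777778, 0.2222222222222222, 0.0]),
    (13756, "NO", [1.0, 0.0, 0.0, 0.0]),
]
zip_bytes = build_zip(rows)
```

Writing zip_bytes to disk in binary mode then yields an archive that could be uploaded.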

Subtasks 3 and 4 (OffendMex)

Format for predictions is a plain text file with one prediction per line (0 for non-offensive or 1 for offensive). The file must be in a .zip file for submission.

Example:

0
1
0
1
1
1
...
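A submission in this format could be produced along the following lines. This is a minimal sketch: the function name and the file name predictions.txt are assumptions for illustration, since the page does not specify a file name, and predictions are assumed to be listed in the same order as the test tweets.

```python
import io
import zipfile


def build_binary_zip(predictions, txt_name="predictions.txt"):
    """Return the bytes of a .zip holding one 0/1 prediction per line.

    txt_name is a hypothetical default, not a name required by the organizers.
    """
    lines = []
    for value in predictions:
        if value not in (0, 1):
            raise ValueError(f"prediction must be 0 or 1, got {value!r}")
        lines.append(str(int(value)))
    body = "\n".join(lines) + "\n"
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(txt_name, body)
    return buffer.getvalue()


# 0 = non-offensive, 1 = offensive, in the order of the test tweets (assumed).
zip_bytes = build_binary_zip([0, 1, 0, 1, 1, 1])
```

Rejecting any value other than 0 or 1 before writing is a design choice of this sketch, intended to catch malformed output before upload.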

Discussion forum

There is a Google Group list where you can post your questions: MeOffendEs@IberLEF group

Development (OffendEs)

Start: Feb. 15, 2021, midnight

Development (OffendMex)

Start: Feb. 15, 2021, midnight

Non-contextual multiclass on OffendEs

Start: May 10, 2021, midnight

Contextual multiclass on OffendEs

Start: May 10, 2021, midnight

Non-contextual binary classification on OffendMex

Start: May 10, 2021, midnight

Contextual binary classification on OffendMex

Start: May 10, 2021, midnight

Competition Ends

May 26, 2021, 11 p.m.

# Username Score
1 hugo.jair -1.000000
2 Timen -1.000000
3 JAGD -1.000000