HAHA@IberLEF2019

Organized by luischir

First phase: Task 1 (Humor Detection) | Task 2 (Humor Scoring)

Start: March 25, 2019, midnight UTC
Competition ends: June 10, 2019, 11:59 p.m. UTC

Welcome to HAHA: Humor Analysis based on Human Annotation

Welcome to the 2019 edition of the shared task HAHA - Humor Analysis based on Human Annotation, a task to classify tweets in Spanish as humorous or not, and to determine how funny they are. This task is part of IberLEF 2019.

Introduction

While humor has historically been studied from psychological, cognitive and linguistic standpoints, its study from a computational perspective is an area yet to be explored in Machine Learning and Computational Linguistics. Some previous work exists (Mihalcea & Strapparava, 2005; Sjöbergh & Araki, 2007; Castro et al., 2016), but a characterization of humor that allows its automatic recognition and generation is far from being specified. The aim of this task is to gain better insight into what is humorous and what causes laughter.

There is past work on this topic. SemEval-2015 Task 11 proposed working on figurative language, such as metaphors and irony, but focused on Sentiment Analysis. SemEval-2017 Task 6 also presented a task similar to this one. This is the second edition of the HAHA task; see also the results of last year's edition (Castro et al., 2018b).

Task description

Based on tweets written in Spanish, the following subtasks are proposed:

  • Humor Detection: determining whether a tweet is a joke or not (i.e., whether the author intended it to be humorous). Results will be measured using F-measure for the humorous category and accuracy; F-measure is the main measure for this task.
  • Funniness Score Prediction: predicting a funniness score (average stars) for a tweet on a 5-star scale, assuming it is a joke. Results will be measured using root-mean-squared error (RMSE).
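As a rough illustration, the two measures can be computed in a few lines of plain Python. This is a sketch, not the official evaluation script; the organizers' scorer may differ in details such as label encoding:

```python
def f1_and_accuracy(gold, pred, positive=1):
    """F-measure for the positive (humorous) class, plus overall accuracy."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(1 for g, p in zip(gold, pred) if g == p) / len(gold)
    return f1, accuracy

def rmse(gold_scores, pred_scores):
    """Root-mean-squared error between gold and predicted funniness scores."""
    return (sum((g - p) ** 2 for g, p in zip(gold_scores, pred_scores))
            / len(gold_scores)) ** 0.5
```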

Corpus

We provide a corpus of crowd-annotated tweets based on (Castro et al., 2018a), divided into 80% for training and 20% for testing. The annotation was made with a voting scheme in which users could select one of six options: the tweet is not humorous, or the tweet is humorous with a score from one (not funny) to five (excellent).

All tweets are classified as humorous or not humorous. Humorous tweets received at least three votes assigning a number of stars, and at least five votes in total. Non-humorous tweets received at least three votes for "not humorous" (they may have fewer than five votes in total).
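The selection rules above can be sketched as follows. The field names (`star_votes`, `not_humor_votes`) are hypothetical; the actual corpus columns may differ:

```python
def is_valid_humorous(star_votes, not_humor_votes):
    """Humorous: at least three star votes and at least five votes in total."""
    total_star_votes = sum(star_votes.values())
    return total_star_votes >= 3 and total_star_votes + not_humor_votes >= 5

def is_valid_not_humorous(not_humor_votes):
    """Not humorous: at least three 'not humor' votes (fewer than five total is allowed)."""
    return not_humor_votes >= 3
```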

The corpus contains annotated tweets such as the following:

Text: – La semana pasada mi hijo hizo un triple salto mortal desde 20 metros de altura. – ¿Es trapecista? – Era :(
(English: – Last week my son did a triple somersault from 20 meters up. – Is he a trapeze artist? – He was :( )
Is humorous: True
Votes (not humor): 1
Votes (1 star): 0
Votes (2 stars): 1
Votes (3 stars): 2
Votes (4 stars): 0
Votes (5 stars): 1
Funniness score: 3.25
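The funniness score is the average number of stars over the star votes; the example above can be reproduced directly:

```python
# Vote counts from the example tweet: one 2-star, two 3-star and one 5-star vote.
votes = {1: 0, 2: 1, 3: 2, 4: 0, 5: 1}

# Average stars = (2 + 3 + 3 + 5) / 4 = 3.25
score = sum(stars * count for stars, count in votes.items()) / sum(votes.values())
```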

Important Dates

  • March 18th, 2019: team registration opens.
  • March 25th, 2019: release of training data.
  • May 20th, 2019: release of test data.
  • June 3rd, 2019: results submission opens.
  • June 10th, 2019: publication of results.
  • June 17th, 2019: working notes paper submission.
  • June 24th, 2019: notification of acceptance.
  • July 1st, 2019: camera ready paper submission.
  • September 24th, 2019: IberLEF 2019 Workshop.

Contact

Please join the Google Group hahaiberlef2019. We will be sharing news and important information about the task in that group. If you have any questions you would prefer to ask privately, contact us at hahapln@fing.edu.uy.

The organizers of the task are:

  • Luis Chiruzzo. Facultad de Ingeniería, Universidad de la República, Uruguay. Areas of research: Syntactic Analysis, Subjectivity Analysis.
  • Santiago Castro. University of Michigan, Ann Arbor, USA. Areas of research: Multimodality, Question Answering, Subjectivity, Sarcasm and Humor Analysis.
  • Mathias Etcheverry. Facultad de Ingeniería, Universidad de la República, Uruguay. Areas of research: Lexical Semantics, Subjectivity Analysis.
  • Diego Garat. Facultad de Ingeniería, Universidad de la República, Uruguay. Areas of research: Subjectivity Analysis, NLP for Legal Texts.
  • Juan José Prada. Facultad de Ingeniería, Universidad de la República, Uruguay. Areas of research: Information Extraction, Event Analysis in Social Networks, Syntactic Analysis.
  • Aiala Rosá. Facultad de Ingeniería, Universidad de la República, Uruguay. Areas of research: Subjectivity Analysis, Event and Temporal Analysis, Syntactic Analysis.

We are part of the NLP research group at Instituto de Computación, Facultad de Ingeniería, Universidad de la República, Uruguay.

Bibliography

(Castro et al., 2018a) Castro, S., Chiruzzo, L., Rosá, A., Garat, D., & Moncecchi, G. (2018). A Crowd-Annotated Spanish Corpus for Humor Analysis. In Proceedings of the Sixth International Workshop on Natural Language Processing for Social Media (pp. 7-11).

(Castro et al., 2018b) Castro, S., Chiruzzo, L., & Rosá, A. (2018). Overview of the HAHA Task: Humor Analysis based on Human Annotation at IberEval 2018.

(Castro et al., 2016) Castro, S., Cubero, M., Garat, D., & Moncecchi, G. (2016). Is This a Joke? Detecting Humor in Spanish Tweets. In Ibero-American Conference on Artificial Intelligence (pp. 139-150). Springer International Publishing.

(Castro et al., 2017) Castro, S., Cubero, M., Garat, D., & Moncecchi, G. (2017). HUMOR: A Crowd-Annotated Spanish Corpus for Humor Analysis. arXiv preprint arXiv:1710.00477.

(Fleiss, 1971) Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological bulletin, 76(5), 378.

(Mihalcea & Strapparava, 2005) Mihalcea, R., & Strapparava, C. (2005). Making Computers Laugh: Investigations in Automatic Humor Recognition. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, HLT ’05 (pp. 531–538). Association for Computational Linguistics, Vancouver, British Columbia, Canada.

(Sjöbergh & Araki, 2007) Sjöbergh, J., & Araki, K. (2007). Recognizing Humor Without Recognizing Meaning. In WILF (pp. 469–476). Springer.

Evaluation Criteria

The submissions in this competition will be evaluated and scored using:
- Task 1: accuracy and F-measure.
- Task 2: root-mean-squared error.

Submission Format

The upload format is a .zip file containing a CSV file. The CSV file has the columns id, is_humor and funniness_average. The funniness_average column is optional. A sample of the file format can be downloaded here.

IMPORTANT: You should include a row for each of the 6000 tweets in the test corpus. For Task 2 especially, every row should have a predicted score. The scoring algorithm will select the rows that are relevant for each task's evaluation.
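A minimal sketch of producing such a submission, assuming predictions are held in a dictionary. The file names ("predictions.csv", "submission.zip") are arbitrary choices, not requirements of the organizers:

```python
import csv
import zipfile

# Hypothetical predictions: {tweet_id: (is_humor, funniness_average)}
predictions = {1: (1, 3.2), 2: (0, 1.0)}

# Write the CSV with the column names described above.
with open("predictions.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "is_humor", "funniness_average"])
    for tweet_id, (is_humor, score) in sorted(predictions.items()):
        writer.writerow([tweet_id, is_humor, score])

# Package the CSV into the .zip file to upload.
with zipfile.ZipFile("submission.zip", "w") as z:
    z.write("predictions.csv")
```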

Terms and Conditions

The data used in this competition was created by Grupo PLN-InCo (Uruguay).

The entire corpus will be published at the end of the competition for research and teaching purposes.

If you use the corpus please cite the overview of the HAHA shared task that will be available in September 2019 and the following paper:

Castro, S., Chiruzzo, L., Rosá, A., Garat, D., & Moncecchi, G. (2018). A Crowd-Annotated Spanish Corpus for Humor Analysis. In Proceedings of the Sixth International Workshop on Natural Language Processing for Social Media (pp. 7-11).
