Task 7: Hahackathon: Linking Humor and Offense Across Different Age Groups
Join our mailing list: hahackathon@googlegroups.com
Background and Motivation
Humor, like most figurative language, poses interesting linguistic challenges to NLP due to its reliance on multiple word senses, cultural knowledge, and pragmatic competence. Humor appreciation is also a highly subjective phenomenon: age, gender, and socio-economic status are known to affect how a joke is perceived. For this task, we collected labels and ratings from a balanced set of age groups from 18 to 70. Our annotators also represented a variety of genders, political stances, and income levels.
We asked annotators:
With the above questions, we determine the class of each text and the humor score associated with it. We take the majority label assigned by the annotators and the average of their ratings. Notably, we also allowed annotators to label a text as intended to be humorous (e.g. due to its content or structure) while still giving ‘I don’t get it’ as a rating. In this case, the humor rating for that annotator is 0.
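The aggregation described above can be sketched as follows. The function name and the `(label, rating)` tuple format are our own choices, not part of the task definition:

```python
from statistics import mean, mode

def aggregate(annotations):
    """Aggregate per-annotator (is_humor_label, rating) pairs for one text.

    The task takes the majority label and the average rating; an
    'I don't get it' response is recorded as a rating of 0 beforehand.
    Tie-breaking for an even annotator split is simplified here
    (statistics.mode returns the first mode encountered on Python 3.8+).
    """
    labels = [label for label, _ in annotations]
    ratings = [rating for _, rating in annotations]
    return mode(labels), mean(ratings)

# e.g. three annotators all mark the text humorous, but one
# "doesn't get it" (rating 0): majority label 1, mean rating ~2.33
label, rating = aggregate([(1, 3.0), (1, 4.0), (1, 0.0)])
```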
We represent the subjectivity of humor appreciation with a controversy score. This examines the variance in humor ratings for each text. If the variance of a text was higher than the median variance of all texts, we labelled the humor of the text as controversial. Prediction of this value is a binary classification task.
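A minimal sketch of the controversy labelling described above. Whether population or sample variance is used, and whether the comparison with the median is strict, are assumptions on our part:

```python
import statistics

def controversy_labels(ratings_per_text):
    """Label each text's humor as controversial (1) if the variance of its
    annotator ratings exceeds the median variance across all texts, else 0."""
    variances = [statistics.pvariance(r) for r in ratings_per_text]
    median_var = statistics.median(variances)
    return [1 if v > median_var else 0 for v in variances]
```

For example, a text rated `[0, 5, 0]` has far higher rating variance than texts rated `[1, 1, 1]` or `[2, 3, 2]`, so only it would be labelled controversial.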
This is also the first task to combine humor and offense detection. This stems from the observation that what is humorous to one user may be offensive to another. To explore this, we add a further layer of annotation by asking raters:
Tasks
Task 1 emulates previous humor detection tasks in which all ratings were averaged to provide mean classification and rating scores.
Task 2 aims to predict how offensive a text would be (for an average user) with values between 0 and 5.
The main metric for the classification tasks will be the F1-measure, and the metric for the regression tasks will be the root mean squared error (RMSE).
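For reference, both metrics can be computed from scratch as below (the official scorer may differ in details such as averaging; this is a plain binary F1 and RMSE):

```python
import math

def f1_score(y_true, y_pred):
    """Binary F1 for the classification subtasks (positive class = 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def rmse(y_true, y_pred):
    """Root mean squared error for the rating subtasks."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )
```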
For all tasks, please submit a zipped csv file with a row for each text and a column for each task you are participating in. The csv file format should be like the following:
| id | is_humor | humor_rating | humor_controversy | offense_rating |
|----|----------|--------------|-------------------|----------------|
| 1  | 1        | 1.126        | 0                 | 3.098          |
| 2  | 0        | 4.527        | 1                 | 1.282          |
| 3  | 1        | 3.983        | 1                 | 1.644          |
Your csv file should always include the 'id' column, and can include one or more of the other columns corresponding to the different subtasks. The columns for the different tasks are the following:
IMPORTANT: Note that, if you include the humor_rating or humor_controversy columns, you must provide a value for all rows (whether or not your system considers them humorous); the scorer will only take into consideration the values for the rows that are humorous according to the gold standard.
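A submission file matching the format above could be assembled like this. The file names, output directory, and the prediction values are our own choices for illustration:

```python
import csv
import os
import tempfile
import zipfile

# Hypothetical predictions; every row carries a value for every
# included column, as required by the submission rules.
predictions = [
    {"id": 1, "is_humor": 1, "humor_rating": 1.126,
     "humor_controversy": 0, "offense_rating": 3.098},
    {"id": 2, "is_humor": 0, "humor_rating": 4.527,
     "humor_controversy": 1, "offense_rating": 1.282},
]

out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, "answer.csv")
zip_path = os.path.join(out_dir, "submission.zip")

# Write one row per text, one column per subtask entered.
with open(csv_path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(predictions[0]))
    writer.writeheader()
    writer.writerows(predictions)

# The task asks for a zipped csv file.
with zipfile.ZipFile(zip_path, "w") as z:
    z.write(csv_path, arcname="answer.csv")
```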
| Phase | Start | Description |
|-------|-------|-------------|
| Development | Oct. 1, 2020, midnight | Development phase for all tasks. |
| Evaluation | Jan. 10, 2021, midnight | Evaluate your trained system on our test data. |
| Post-Evaluation | Feb. 1, 2021, midnight | Open Post-Evaluation phase that lasts forever (no end date). |
| # | Username | Score |
|---|----------|-------|
| 1 | DeepBlueAI | 0.9676 |
| 2 | dalya | 0.9675 |
| 3 | ThisIstheEnd | 0.9655 |