HaHackathon: Detecting and Rating Humor and Offense

Organized by jam

Task 7: Hahackathon: Incorporating Demographic Factors into Shared Humor Tasks 


Join our mailing list: hahackathon@googlegroups.com 

 


Background and Motivation


Humor, like most figurative language, poses interesting linguistic challenges for NLP due to its reliance on multiple word senses, cultural knowledge, and pragmatic competence. Humor appreciation is also a highly subjective phenomenon, with age, gender, and socio-economic status known to affect how a joke is perceived. Previous humor detection and rating challenges have taken the mean of all annotations given to a joke to produce an average classification or rating. This treats humor as an objective concept, rather than the subjective phenomenon it is in reality.


With that in mind, we propose the first humor detection challenge that incorporates the subjectivity associated with humor across different demographic groups. We have collected annotations from users of different age groups and genders, who answered the question "Is the intention of this text to be humorous?" and, if yes, "How humorous do you find this text?" This enables us to bin users into groups based on their demographic information and pose the question "Is this text humorous for a person of this age group/gender?" alongside the more established question "Is this text humorous in general?", which takes the average of the annotations.


We also note that what is humorous to one group may be offensive to another, and that this distinction may be particularly relevant for people of different age groups. We therefore add a further layer of annotation by asking raters "Is this text generally offensive?" and, if so, "How generally offensive is the text?" This is the first shared task to combine humor detection and offense detection.


The ability to detect whether a text is humorous or offensive for a person with given demographic characteristics could aid downstream applications such as personalized content moderation and recommendation systems.


Tasks


Task 1 emulates previous humor detection tasks, in which all ratings were averaged to provide mean classification and rating scores.


Task 1a: given a text, predict whether it is humorous and/or offensive, or other.


Task 1b: if the text is classed as humorous and/or offensive, predict how humorous and/or offensive it is. 


Task 2 uses the demographic information collected at annotation time to provide classifications and ratings based on age and gender. 


Task 2a: because humor classification is relatively objective and shows little variation across age groups, this task addresses only offensiveness classification: given a text, predict whether it is offensive or not for each age group and gender. Humor-class predictions are taken from the output of Task 1a.


Task 2b: if the text is classed as humorous and/or offensive, predict how humorous and/or offensive it is for each age group and gender. 

Evaluation criteria


The classification tasks will be evaluated using the F-measure, and the metric for the regression tasks will be root mean squared error (RMSE).
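As an illustration of these metrics, here is a minimal sketch in plain Python. The official scorer may differ in details such as averaging and rounding, and the function names `f_measure` and `rmse` are chosen here for illustration; they are not part of the task's tooling.

```python
import math


def f_measure(gold, pred):
    """Binary F-measure (F1) over parallel lists of 0/1 labels."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def rmse(gold, pred):
    """Root mean squared error over parallel lists of ratings."""
    return math.sqrt(sum((g - p) ** 2 for g, p in zip(gold, pred)) / len(gold))
```

A lower RMSE is better for the rating tasks, while a higher F-measure is better for the classification tasks.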


Task 1a: predict whether the text is classed as humorous and/or offensive, or other, according to the average of the annotators. Submit a zipped folder containing a file named task-1a-output.csv with three columns:


  • id: the id of the text given in the dataset
  • class_humor: a prediction about whether the class of the text is humor, a binary choice 1 or 0
  • class_offense: a prediction about whether the class of the text is offense, a binary choice 1 or 0
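A minimal sketch of producing a Task 1a submission file with the Python standard library. The `predictions` dictionary and the zip file name `task-1a-submission.zip` are hypothetical placeholders; only the inner file name task-1a-output.csv and its columns are prescribed above.

```python
import csv
import zipfile

# Hypothetical predictions: {text id: (class_humor, class_offense)}.
predictions = {1: (1, 0), 2: (0, 0), 3: (1, 1)}

# Write the prescribed three-column CSV.
with open("task-1a-output.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "class_humor", "class_offense"])
    for text_id, (humor, offense) in sorted(predictions.items()):
        writer.writerow([text_id, humor, offense])

# Package the CSV into the zipped folder expected by the submission system.
with zipfile.ZipFile("task-1a-submission.zip", "w") as zf:
    zf.write("task-1a-output.csv")
```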


Task 1b: predict the average funniness or offensiveness score given by annotators. Submit a zipped folder containing a file named task-1b-output.csv with three columns:

  • id: the id of the text given in the dataset
  • rating_humor: a prediction of the humor rating given
  • rating_offense: a prediction of the offensiveness rating
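The rating file follows the same pattern as the classification file, with real-valued columns. The `ratings` values below are hypothetical placeholders; only the file name task-1b-output.csv and its columns are prescribed above.

```python
import csv

# Hypothetical rating predictions: {text id: (rating_humor, rating_offense)}.
ratings = {1: (3.2, 0.5), 2: (0.0, 1.8)}

# Write the prescribed three-column CSV with real-valued ratings.
with open("task-1b-output.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "rating_humor", "rating_offense"])
    for text_id, (humor, offense) in sorted(ratings.items()):
        writer.writerow([text_id, humor, offense])
```

Tasks 2a and 2b use the same column layouts in task-2a-output.csv and task-2b-output.csv respectively.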



Task 2a: predict if the text is classed as humorous and/or offensive, or other, by different age and gender groups of annotators. Submit a zipped folder containing a file named task-2a-output.csv with three columns:

  • id: the id of the text given in the dataset
  • class_humor: a prediction about whether the class of the text is humor, a binary choice 1 or 0
  • class_offense: a prediction about whether the class of the text is offense, a binary choice 1 or 0


Task 2b: predict the average funniness or offensiveness score given by different age and gender groups of annotators. Submit a zipped folder containing a file named task-2b-output.csv with three columns:

  • id: the id of the text given in the dataset
  • rating_humor: a prediction of the humor rating given
  • rating_offense: a prediction of the offensiveness rating


Terms

  • By submitting results to this competition, you consent to the public release of your scores at this website and at the SemEval 2021 workshop and in the associated proceedings, at the task organizers' discretion. Scores may include, but are not limited to, automatic and manual quantitative judgments, qualitative judgments, and such other metrics as the task organizers see fit. You accept that the ultimate decision of metric choice and score value is that of the task organizers.
  • You further agree that the task organizers are under no obligation to release scores and that scores may be withheld if it is the task organizers' judgment that the submission was incomplete, erroneous, deceptive, or violated the letter or spirit of the competition's rules. Inclusion of a submission's scores is not an endorsement of a team or individual's submission, system, or science.
  • This task has a single evaluation phase. To be considered a valid participation/submission in the task's evaluation, you agree to submit predictions for every test text, in the formats described above.
  • Each team must create and use exactly one CodaLab account.
  • Team constitution (members of a team) cannot be changed after the evaluation phase has begun.
  • During the evaluation phase, each team can submit as many as ten submissions; the top-scoring submission will be considered as the official submission to the competition.
  • The organizers and the organizations they are affiliated with make no warranties regarding the datasets provided, including but not limited to being correct or complete. They cannot be held liable for providing access to the datasets or the usage of the datasets.
  • Each task participant will be assigned at least one other team's system description paper for review, using the START system. The papers will thus be peer reviewed.

Schedule

  • Trial data ready: July 31, 2020
  • Task website ready: August 14, 2020
  • Training data ready: October 1, 2020
  • Test data ready: December 3, 2020
  • Evaluation start: January 10, 2021
  • Evaluation end: January 31, 2021
  • Paper submission due: February 23, 2021
  • Notification to authors: March 29, 2021
  • Camera ready due: April 5, 2021
  • SemEval workshop: Summer 2021


Organizers





Practice

Start: July 31, 2020, noon

Description: Explore the dataset and task.

Development - Task 1a

Start: Oct. 1, 2020, noon

Description: Predict if a text is classed as funny and/or offensive or other in general

Development - Task 1b

Start: Oct. 1, 2020, noon

Description: Predict how funny or offensive a text is in general

Development - Task 2a

Start: Oct. 1, 2020, noon

Description: Predict if a text is classed as funny and/or offensive or other for different age groups and genders

Development - Task 2b

Start: Oct. 1, 2020, noon

Description: Predict how funny or offensive a text is for different age groups and genders

Evaluation - Task 1a

Start: Jan. 10, 2021, noon

Description: Evaluate your trained system on our test data.

Evaluation - Task 1b

Start: Jan. 10, 2021, noon

Description: Evaluate your trained system on our test data.

Evaluation - Task 2a

Start: Jan. 10, 2021, noon

Description: Evaluate your trained system on our test data.

Evaluation - Task 2b

Start: Jan. 10, 2021, noon

Description: Evaluate your trained system on our test data.

Post-Evaluation - Task 1a

Start: Feb. 1, 2021, noon

Description: Explore the dataset without participating in SemEval.

Post-Evaluation - Task 1b

Start: Feb. 1, 2021, noon

Description: Explore the dataset without participating in SemEval.

Post-Evaluation - Task 2a

Start: Feb. 1, 2021, noon

Description: Explore the dataset without participating in SemEval.

Post-Evaluation - Task 2b

Start: Feb. 1, 2021, noon

Description: Explore the dataset without participating in SemEval.

Competition Ends

Never
