SemEval-2017 Task 6: "#HashtagWars Learning a Sense of Humor" - Subtask A

Organized by jgc128

First phase

Development (Trial)
Aug. 1, 2016, midnight UTC


Competition Ends
Jan. 31, 2017, 7:59 a.m. UTC


This competition is dedicated to Subtask A (Pairwise Comparison) of SemEval-2017 Task 6: "#HashtagWars Learning a Sense of Humor". If you are participating in both subtasks, make sure to also submit to the competition for Subtask B.

Humor is an essential trait of human intelligence that has not yet been addressed extensively in current AI research. Most work on humor detection to date has approached the problem as binary classification: humor or not humor. This representation ignores both the continuous nature of humor and the fact that humor is subjective. To address these concerns, we introduce the task #HashtagWars: Learning a Sense of Humor. The goal of this task is to encourage the development of methods that take into account the continuous nature of humor, on the one hand, and aim to characterize the sense of humor of a particular source, on the other hand.

Task Definition

The dataset for this task is based on humorous responses submitted to a Comedy Central TV show. Debuting in Fall 2013, the Comedy Central show @midnight is a late-night "game-show" that presents a modern outlook on current events by focusing on content from social media. The show's contestants (generally professional comedians or actors) are awarded points based on how funny their answers are. The segment of the show that best illustrates this attitude is the Hashtag Wars (HW).


In every episode, the show's host proposes a topic in the form of a hashtag, and the show's contestants must provide tweets that would have this hashtag. Viewers are encouraged to tweet their own responses. In the next episode, the show finds the ten funniest tweets from the viewers' responses. From this top-10, the show selects a single winning tweet. From the point of view of the show, all tweets in the top-10 are funnier than the non-top-10 tweets, and the winning tweet is funnier than the rest of the tweets in the top-10. Therefore, we are able to apply labels that determine how relatively humorous the show finds a given tweet.
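
The ordering described above can be sketched as a comparison of ordinal levels. This is only an illustration: the level names and their numeric values are assumptions made for this example, not the dataset's documented encoding, and the tweet ids are made up.

```python
# Hypothetical ordinal levels; names and values are assumptions chosen
# for illustration, not the dataset's documented label encoding.
NOT_TOP10, TOP10, WINNER = 0, 1, 2


def funnier(level_a, level_b):
    """Return 1 if the first tweet ranks higher, 0 if the second does,
    and None when both share a level (the show gives no ordering there)."""
    if level_a == level_b:
        return None
    return 1 if level_a > level_b else 0


# Toy example with made-up tweet ids.
levels = {"t1": WINNER, "t2": TOP10, "t3": NOT_TOP10, "t4": NOT_TOP10}
winner_vs_top10 = funnier(levels["t1"], levels["t2"])
other_vs_top10 = funnier(levels["t3"], levels["t2"])
same_level = funnier(levels["t3"], levels["t4"])
```

Pairs that share a level carry no relative label under this reading, which is why the helper returns None for them instead of forcing a 0 or 1.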

We advise potential participants to watch clips from the HW segment available from the show's webpage for a better understanding of the task.

The contest's format provides an adequate method for addressing the selection bias often present in machine learning techniques. Consequently, tweets are seen not as humor/non-humor, but rather as showing varying degrees of wit and cleverness. Moreover, given the subjective nature of humor, labels in the dataset are only "gold" with respect to the show's sense of humor. This concept becomes more grounded when considering the use of supervised systems for the dataset.

Goal of the task

The goal of the task is to learn to characterize the sense of humor represented in this show. Given a set of hashtags, the goal is to predict which tweets the show will find funnier within each hashtag. The degree of humor in a given tweet is determined by the labels provided by the show.

More info

More information is available on the SemEval-2017 website.

Join our mailing list for questions or important updates.

Evaluation Criteria

Subtask A: Pairwise Comparison

Given two tweets, a successful system will be able to predict which tweet is funnier, according to the gold labels of the tweets.

For evaluation, we will release data formatted exactly like the Trial/Training data, but without labels. To evaluate this subtask, teams will produce predictions for every possible combination of tweet pairs from a given Evaluation file. The evaluation script will then select the appropriate pairs for evaluation. The evaluation metric is accuracy micro-averaged across all evaluation files.

Input format

For each hashtag, a team should produce a file, using the following filename template: Hashtag_File_PREDICT.tsv. For example, for the hashtag Fast_Food_Books, the file with predictions should be called Fast_Food_Books_PREDICT.tsv

This file should contain a prediction for each possible pair of tweets, formatted as follows: <tweet1_id>\t<tweet2_id>\t<prediction>\n where <prediction> is 1 if the first tweet is funnier and 0 otherwise. An excerpt from a correctly formatted file is below:

651601127432192000 651608477043789824 1
651601127432192000 651604707035713536 1
651601127432192000 651626009498898433 0
651601127432192000 651600581060984832 1
651601127432192000 651602058492030976 1
651601127432192000 651602018176364544 1
651601127432192000 651615279131168769 1
651601127432192000 651626121251954689 1
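
A minimal sketch of how such a file might be produced is shown here. The function names and the placeholder scorer are hypothetical and stand in for a real humor model. The text does not say whether both orderings of a pair are expected, so this sketch assumes they are and emits each ordered pair once.

```python
import itertools
import os
import tempfile


def write_predictions(hashtag, tweet_ids, predict, out_dir):
    """Write <hashtag>_PREDICT.tsv with one tab-separated line per ordered pair."""
    path = os.path.join(out_dir, f"{hashtag}_PREDICT.tsv")
    with open(path, "w", encoding="utf-8", newline="") as handle:
        # Assumption: both orderings of each pair are written.
        for first, second in itertools.permutations(tweet_ids, 2):
            label = 1 if predict(first, second) else 0
            handle.write(f"{first}\t{second}\t{label}\n")
    return path


def placeholder_predict(first, second):
    # Hypothetical scorer with no meaning; replace with a real model.
    return int(first) > int(second)


ids = ["651601127432192000", "651608477043789824", "651604707035713536"]
out_path = write_predictions(
    "Fast_Food_Books", ids, placeholder_predict, tempfile.mkdtemp()
)
with open(out_path, encoding="utf-8") as handle:
    lines = handle.read().splitlines()
```

With three tweet ids the sketch writes six lines, one for each ordered pair.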

All files (without any directories) should be placed into a single zip archive and submitted using the "Participate" tab ("Participate" -> "Submit / View Results").
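
One way to build a flat archive is sketched below with Python's standard zipfile module. The function name, the directory layout, and the archive name submission.zip are assumptions made for this example.

```python
import os
import tempfile
import zipfile


def zip_predictions(prediction_dir, archive_path):
    """Zip every *_PREDICT.tsv in prediction_dir at the archive root."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in sorted(os.listdir(prediction_dir)):
            if name.endswith("_PREDICT.tsv"):
                # arcname drops the directory part so files sit at the top level.
                archive.write(os.path.join(prediction_dir, name), arcname=name)
    return archive_path


# Demo with a single made-up prediction file.
work_dir = tempfile.mkdtemp()
with open(os.path.join(work_dir, "Fast_Food_Books_PREDICT.tsv"), "w") as handle:
    handle.write("651601127432192000\t651608477043789824\t1\n")

archive_path = zip_predictions(
    work_dir, os.path.join(tempfile.mkdtemp(), "submission.zip")
)
with zipfile.ZipFile(archive_path) as archive:
    names = archive.namelist()
```

Listing the archive contents before uploading is a quick check that no entry carries a directory prefix.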

Evaluation metric

The evaluation metric is accuracy, micro-averaged across all evaluation files.
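
Under the usual reading of micro-averaging, the evaluated pairs from all files are pooled before accuracy is computed, so files with more pairs weigh more. The sketch below illustrates that reading with made-up labels. It is not the official evaluation script, and the function name is hypothetical.

```python
def micro_accuracy(per_file_results):
    """per_file_results: list of (gold_labels, predicted_labels), one tuple per file."""
    correct = 0
    total = 0
    for gold, predicted in per_file_results:
        # Pool counts across files instead of averaging per-file accuracies.
        correct += sum(g == p for g, p in zip(gold, predicted))
        total += len(gold)
    return correct / total if total else 0.0


# Made-up labels for two evaluation files.
results = [
    ([1, 0, 1, 1], [1, 0, 0, 1]),  # 3 of 4 pairs correct
    ([0, 1], [1, 1]),              # 1 of 2 pairs correct
]
score = micro_accuracy(results)  # 4 correct out of 6 pooled pairs
```

For comparison, averaging the two per-file accuracies (0.75 and 0.5) would give 0.625, whereas pooling gives 4/6.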

Terms and Conditions

This page enumerates the terms and conditions of the competition.

Development (Trial)

Start: Aug. 1, 2016, midnight

Description: Predictions on the trial data


Start: Jan. 9, 2017, midnight

Description: Predictions on the evaluation data

Competition Ends

Jan. 31, 2017, 7:59 a.m.
