This competition is dedicated to Subtask B: Semi-Ranking of SemEval-2017 Task 6: "#HashtagWars: Learning a Sense of Humor". If you are participating in both subtasks, make sure to also submit to the competition for Subtask A.

Humor is an essential trait of human intelligence that has not yet been addressed extensively in current AI research. Most work on humor detection to date has approached the problem as binary classification: humor or not humor. This representation ignores both the continuous nature of humor and the fact that humor is subjective. To address these concerns, we introduce the task #HashtagWars: Learning a Sense of Humor. The goal of this task is to encourage the development of methods that take into account the continuous nature of humor, on the one hand, and aim to characterize the sense of humor of a particular source, on the other hand.

Task Definition

The dataset for this task is based on humorous responses submitted to a Comedy Central TV show. Debuting in Fall 2013, the Comedy Central show @midnight is a late-night "game show" that presents a modern outlook on current events by focusing on content from social media. The show's contestants (generally professional comedians or actors) are awarded points based on how funny their answers are. The segment of the show that best illustrates this attitude is the Hashtag Wars (HW).


In every episode, the show's host proposes a topic in the form of a hashtag, and the show's contestants must provide tweets that would have this hashtag. Viewers are encouraged to tweet their own responses. In the next episode, the show finds the ten funniest tweets from the viewers' responses. From this top 10, the show selects a single winning tweet. From the point of view of the show, all tweets in the top 10 are funnier than the non-top-10 tweets, and the winning tweet is funnier than the rest of the tweets in the top 10. Therefore, we are able to apply labels that determine how relatively humorous the show finds a given tweet.

We advise potential participants to watch clips from the HW segment available from the show's webpage for a better understanding of the task.

The contest's format provides an adequate method for addressing the selection bias often present in machine learning techniques. Consequently, tweets are seen not as humor/non-humor, but rather as exhibiting varying degrees of wit and cleverness. Moreover, given the subjective nature of humor, labels in the dataset are only "gold" with respect to the show's sense of humor. This concept becomes more grounded when considering the use of supervised systems for the dataset.

Goal of the task

The goal of the task is to learn to characterize the sense of humor represented in this show. Given a set of hashtags, the goal is to predict which tweets the show will find funnier within each hashtag. The degree of humor in a given tweet is determined by the labels provided by the show.

More info

More information is available on the SemEval-2017 website.

Evaluation Criteria

Subtask B: Semi-Ranking

Given an input file of tweets for a given hashtag, systems will produce a ranking of tweets from funniest to least funny. Since the tweet files do not provide an explicit ranking, we will evaluate whether tweets have been placed in the appropriate bucket: winning tweet, top 10 but not winning, and not in the top 10. In a certain sense this can be thought of as labeling; however, there is a known cardinality for tweets in each bucket: 1 tweet, 9 tweets, and the rest of the tweets.

Input format

For each hashtag, a team should produce a file, using the following filename template: Hashtag_File_PREDICT.tsv. For example, for the hashtag Fast_Food_Books, the file with predictions should be called Fast_Food_Books_PREDICT.tsv.

These files should contain tweet ids, one per line, ranked in decreasing order according to how funny they are, as follows:

<winning tweet_id>
<top10 but not winning tweet_id>
<top10 but not winning tweet_id>
<not in top10 tweet_id>
<not in top10 tweet_id>

An excerpt from a correctly formatted file is below:

651608477043789824
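
The file naming and ordering rules above can be sketched in a few lines of Python. This is only an illustration: the function name write_predictions, its parameters, and the idea that a system assigns each tweet a numeric funniness score are assumptions made for the example, not part of the task specification. The sketch also assumes one tweet id per line and no header row.

```python
from pathlib import Path


def write_predictions(hashtag, scored_tweets, out_dir="."):
    """Write <hashtag>_PREDICT.tsv with tweet ids ordered funniest first.

    scored_tweets is a list of (tweet_id, score) pairs, where a higher
    score means "funnier" (a hypothetical system output).
    """
    # Sort by score in decreasing order so the predicted winner comes first.
    ranked = sorted(scored_tweets, key=lambda pair: pair[1], reverse=True)
    path = Path(out_dir) / f"{hashtag}_PREDICT.tsv"
    # One tweet id per line, no header (assumed from the template above).
    path.write_text(
        "".join(f"{tweet_id}\n" for tweet_id, _ in ranked),
        encoding="utf-8",
    )
    return path
```

Calling write_predictions("Fast_Food_Books", scores) would then produce Fast_Food_Books_PREDICT.tsv in the current directory, with the first line interpreted as the predicted winning tweet and the next nine lines as the predicted top 10.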

Evaluation metric

System evaluation will use a measure inspired by edit distance. For each tweet, we will compute how many moves are required for the tweet to be placed in the right bucket. For example, if the winning tweet has been placed in the top-10 (but not winning) bucket, and a tweet from the top-10 (but not winning) bucket has been placed in the winning-tweet bucket, the total edit error will be 2: 1 for each tweet. The edit error for each file is normalized by 22, the maximum edit error. The normalized error is then averaged across all evaluation files to produce the final metric.
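
To make the metric concrete, here is a minimal sketch of one way to compute it. It is not the official scorer. It assumes that the three buckets are ordered (winning, top 10 but not winning, rest), that one "move" shifts a tweet to an adjacent bucket, and that a tweet's predicted bucket follows from its line position in the submitted file (first line, next nine lines, all remaining lines). All function names and the toy tweet ids are hypothetical.

```python
MAX_EDIT_ERROR = 22  # maximum edit error, as stated in the task description


def bucket_for_rank(rank):
    """Map a 0-based rank to a bucket: 0 = winning, 1 = top 10 but not winning, 2 = rest."""
    if rank == 0:
        return 0
    return 1 if rank < 10 else 2


def normalized_edit_error(predicted_ids, gold_buckets):
    """Total bucket moves for one hashtag file, divided by the maximum of 22."""
    moves = sum(
        abs(bucket_for_rank(rank) - gold_buckets[tweet_id])
        for rank, tweet_id in enumerate(predicted_ids)
    )
    return moves / MAX_EDIT_ERROR


def final_score(per_file_errors):
    """Average the normalized errors over all evaluation files."""
    return sum(per_file_errors) / len(per_file_errors)


# Toy gold labels: 1 winning tweet, 9 other top-10 tweets, 20 remaining tweets.
gold = {"w": 0}
gold.update({f"t{i}": 1 for i in range(1, 10)})
gold.update({f"n{i}": 2 for i in range(1, 21)})

perfect = ["w"] + [f"t{i}" for i in range(1, 10)] + [f"n{i}" for i in range(1, 21)]
# The example from the text: the winner and one top-10 tweet trade places.
swapped = ["t1", "w"] + perfect[2:]
# Ten non-top-10 tweets occupy the first ten positions.
worst = [f"n{i}" for i in range(1, 11)] + perfect[:10] + [f"n{i}" for i in range(11, 21)]
```

Under these assumptions, perfect yields an error of 0, swapped yields 2/22 (one move for each of the two misplaced tweets), and worst yields 22/22: 2 moves for the winner, 9 for the other top-10 tweets, and 11 for the ten tweets that displaced them, which matches the stated maximum.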

