This is the CodaLab Competition for Subtask 1 of SemEval-2017 Task 7: Detection and Interpretation of English Puns. The competition took place in January 2017 and the official results were presented at the SemEval-2017 workshop in August 2017. The CodaLab Competition has now been re-opened on an unofficial basis for the benefit of others who wish to use the automated scoring system.
A pun is a form of wordplay in which one signifier (e.g., a word or phrase) suggests two or more meanings by exploiting polysemy, or phonological similarity to another signifier, for an intended humorous or rhetorical effect. For example, the first of the following two punning jokes exploits contrasting meanings of the word "interest", while the second exploits the sound similarity between the surface form "propane" and the latent target "profane":
I used to be a banker but I lost interest.
When the church bought gas for their annual barbecue, proceeds went from the sacred to the propane.
Puns where the two meanings share the same pronunciation are known as homophonic or perfect, while those relying on similar- but not identical-sounding signs are known as heterophonic or imperfect. Where the signs are considered as written rather than spoken sequences, a similar distinction can be made between homographic and heterographic puns.
Conscious or tacit linguistic knowledge – particularly of lexical semantics and phonology – is an essential prerequisite for the production and interpretation of puns. This has long made them an attractive subject of study in theoretical linguistics, and has led to a small but growing body of research into puns in computational linguistics. Most computational treatments of puns to date, however, have focused on generation algorithms or on modelling puns' phonological properties.
Participants will be provided with two data sets: a homographic data set and a heterographic data set.
This subtask is a binary classification task. Participating systems must classify each context according to whether or not it contains a pun.
The evaluation for this subtask will be carried out in two simultaneous phases, one for the homographic data set and one for the heterographic data set. Systems may participate in either or both phases.
Systems participating in a given phase must classify all contexts in the data set. Contexts must be classified as either containing or not containing a pun.
The classification results for each phase must be submitted in a delimited text file named answer.txt. Each line consists of two fields separated by horizontal whitespace (a single tab or space character). The first field is the ID of a context from the data set. The second field is either 1 if the text contains a pun, or 0 if the text does not contain a pun. Sample data and results files are available in the trial data.
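For illustration, the first few lines of an answers file might look like the following (the context IDs shown are hypothetical; the actual IDs must be taken from the released data set, and each ID is separated from its label by a single tab or space):

```
hom_1 1
hom_2 0
hom_3 1
```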
To submit the results, place answer.txt in a ZIP file (in the top-level directory), and then upload it to CodaLab according to the instructions at Participating in a Competition.
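As a minimal sketch of the packaging step (the file names and the predictions mapping below are illustrative, not part of any official tooling), the following Python snippet writes predictions to answer.txt and places it at the top level of a ZIP archive using only the standard library:

```python
import zipfile

# Hypothetical predictions: context ID -> 1 (contains a pun) or 0 (does not).
# Real context IDs must be taken from the released data set.
predictions = {"hom_1": 1, "hom_2": 0, "hom_3": 1}

# Write one "ID<TAB>label" line per context.
with open("answer.txt", "w") as f:
    for context_id, label in predictions.items():
        f.write(f"{context_id}\t{label}\n")

# answer.txt must sit in the top-level directory of the ZIP,
# so pass an arcname with no enclosing folder.
with zipfile.ZipFile("submission.zip", "w") as zf:
    zf.write("answer.txt", arcname="answer.txt")
```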
Systems will be scored using the standard precision, recall, accuracy, and F1 measures as used in classification:
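Writing TP, FP, TN, and FN for the numbers of true positives, false positives, true negatives, and false negatives, these measures take their standard definitions:

$$\mathrm{precision} = \frac{TP}{TP + FP}, \qquad \mathrm{recall} = \frac{TP}{TP + FN}$$

$$\mathrm{accuracy} = \frac{TP + TN}{TP + FP + TN + FN}, \qquad F_1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$$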
By submitting results to this competition, you consent to the public release of your scores at the SemEval-2017 workshop and in the associated proceedings, at the task organizers' discretion. Scores may include, but are not limited to, automatic and manual quantitative judgements, qualitative judgements, and such other metrics as the task organizers see fit. You accept that the ultimate decision of metric choice and score value is that of the task organizers.
You further agree that the task organizers are under no obligation to release scores and that scores may be withheld if it is the task organizers' judgement that the submission was incomplete, erroneous, deceptive, or violated the letter or spirit of the competition's rules. Inclusion of a submission's scores is not an endorsement of a team or individual's submission, system, or science.
You further agree that your system may be named according to the team name provided at the time of submission, or to a suitable shorthand as determined by the task organizers.
You agree not to redistribute the test data except in the manner prescribed by its licence.
Schedule: the practice phase started Dec. 5, 2016 (midnight); the two evaluation phases (homographic and heterographic) started Jan. 9, 2017 (midnight); the re-opened competition remains open until Jan. 15, 2050 (11:59 p.m.).