Sarcasm is a form of figurative language that often presents a sharp, bitter, or cutting expression or remark.
Sarcasm detection has received considerable attention in the NLP community in recent years (Joshi, 2016). Computational approaches for sarcasm detection have modeled either the utterance in isolation or together with contextual information such as the conversation context, author context, visual context, or cognitive features. This shared task aims to study the role of conversation context in sarcasm detection (Ghosh et al., 2018).
We will be using two different datasets: Twitter conversations and conversation threads from Reddit. For both datasets, we will provide the immediate context (i.e., only the previous dialogue turn) as well as the full dialogue thread, when available. The goal is to understand how much conversation context is needed or helpful for sarcasm detection.
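As a minimal illustration of what "immediate context" versus "full dialogue thread" means for an instance, consider the sketch below. The field names and values here are hypothetical; the authoritative format is defined by the official data release.

    # Hypothetical shape of a single training instance; the actual field
    # names and file format are specified in the official data release.
    example = {
        "label": "SARCASM",                       # or "NOT_SARCASM"
        "response": "Oh great, another Monday.",  # the turn to classify
        "context": [                              # prior dialogue turns, oldest first
            "The weekend went by so fast.",
            "Back to work tomorrow.",
        ],
    }

    immediate_context = example["context"][-1]    # only the previous dialogue turn
    full_thread = example["context"]              # the entire dialogue thread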
The computational models proposed by Ghosh et al. (2018), already released as open source, will be used as baselines. The Twitter sarcasm data is collected using the standard hashtags #sarcasm and #sarcastic. The Reddit data will be a subset of the corpus introduced by Khodak et al. (2017).
As mentioned, two datasets will be available for this shared task. You can elect to participate using either one or both of them.
For both the Twitter and Reddit datasets we will provide training and test utterances, so you do not have to use APIs to download the data. We also provide a baseline classification model for prediction. The metric for comparison will be the F-measure.
Training phase: Data is released for developing and training your sarcasm detection software. You could run cross-validation on the training data, partition it further to obtain a held-out set for preliminary evaluations, and/or set apart a subset for development and hyper-parameter tuning. However the released data is used, the goal is to have N final systems (or versions of a system) ready when the test data is released.
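As one concrete way to set up such a held-out evaluation, the sketch below carves a stratified development set out of the released training data. It assumes scikit-learn is available; the instance and label lists are placeholders for however you load the actual data.

    # Sketch: hold out 20% of the training data for preliminary evaluation.
    # Assumes scikit-learn; replace the placeholder lists with the real data.
    from sklearn.model_selection import train_test_split

    instances = ["utterance %d" % i for i in range(10)]   # placeholder utterances
    labels = ["SARCASM" if i % 2 == 0 else "NOT_SARCASM" for i in range(10)]

    train_x, dev_x, train_y, dev_y = train_test_split(
        instances, labels,
        test_size=0.2,       # 20% held out for development/tuning
        stratify=labels,     # preserve the SARCASM / NOT_SARCASM ratio
        random_state=42,
    )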
Testing phase: Test instances are released. Each team generates predictions for the test instances, for up to N models. The submitted predictions are evaluated (by us) against the true labels. Submissions will be anonymized; only the highest score of all submitted systems per day will be displayed. The metric will be the F-measure, with Precision and Recall also available via the detailed results link.
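For reference, the F-measure is the harmonic mean of Precision and Recall, F1 = 2PR / (P + R). A minimal computation for the positive (SARCASM) class is sketched below; the gold and predicted labels are made up for illustration.

    # Minimal F-measure computation for the SARCASM class; the gold and
    # predicted labels below are illustrative only.
    gold = ["SARCASM", "NOT_SARCASM", "SARCASM", "SARCASM"]
    pred = ["SARCASM", "SARCASM", "NOT_SARCASM", "SARCASM"]

    tp = sum(g == p == "SARCASM" for g, p in zip(gold, pred))      # true positives
    fp = sum(g != "SARCASM" and p == "SARCASM" for g, p in zip(gold, pred))
    fn = sum(g == "SARCASM" and p != "SARCASM" for g, p in zip(gold, pred))

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    print(f"P={precision:.2f}  R={recall:.2f}  F1={f1:.2f}")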
NOTE: If you submit predictions for the Twitter test set, the submission must be named "twitter_answer.txt" and compressed into "twitter_answer.zip" for the evaluation to work. Likewise, if you submit predictions for Reddit, the submission must be named "reddit_answer.txt" and compressed into a "reddit_answer.zip" file. The answer files will have two tab-separated columns: the first column contains the identifier of the test instance and the second column the prediction (i.e., "SARCASM" or "NOT_SARCASM").
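A short sketch of producing a valid Twitter submission under the naming rules above; the instance identifiers and predictions here are hypothetical.

    # Write tab-separated predictions and package them with the required names.
    import zipfile

    predictions = {"twitter_1": "SARCASM", "twitter_2": "NOT_SARCASM"}  # hypothetical

    with open("twitter_answer.txt", "w") as f:
        for instance_id, label in predictions.items():
            f.write(f"{instance_id}\t{label}\n")   # column 1: id, column 2: prediction

    with zipfile.ZipFile("twitter_answer.zip", "w") as zf:
        zf.write("twitter_answer.txt")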
We plan to have N=12 (up to 12 submissions per team), so that several versions of your system can be evaluated without overwhelming the competition. You can make as many submissions as CodaLab allows (1000), but only the last 12 will be valid for comparative evaluation against the other systems participating in the shared task. For the task summary paper, we require that each submission include the mandatory team or individual name, method name, and method description. In particular, please list any third-party datasets you trained your system on (if any) in the method description textbox.
By submitting results to this competition, you consent to the public release of your scores at the Figurative Language Processing-2020 workshop and in the associated proceedings, at the task organizers' discretion. Scores may include, but are not limited to, automatic and manual quantitative judgements, qualitative judgements, and such other metrics as the task organizers see fit. You accept that the ultimate decision of metric choice and score value is that of the task organizers.
You further agree that the task organizers are under no obligation to release scores and that scores may be withheld if it is the task organizers' judgement that the submission was incomplete, erroneous, deceptive, or violated the letter or spirit of the competition's rules. Inclusion of a submission's scores is not an endorsement of a team or individual's submission, system, or science.
You further agree that your system may be named according to the team name provided at the time of submission, or to a suitable shorthand as determined by the task organizers.
Training and testing data will be made available and will persist on GitHub after the competition to encourage research in sarcasm detection. You agree not to redistribute these data except in the manner prescribed by their licence.
Start: Jan. 19, 2020, midnight
End: March 22, 2020, 6:31 p.m.