SemEval-2018 Task 3: Irony detection in English tweets (Forum)


> Using other data

Hi, I want to know whether it is okay to use other data that is not provided by SemEval.
Can I use external data?

Posted by: minok00 @ Nov. 2, 2017, 7:46 a.m.

Hi,

Yes, you are allowed to use external data, as long as you describe it well in the system description paper.

Cheers
Cynthia

Posted by: cyvhee @ Nov. 22, 2017, 9:58 a.m.

@Cynthia
You said that participants can use external data, but I think you should set some limits. Specifically, the test data does not contain the irony hashtags, but I can easily restore them by querying Twitter (e.g., I can restore 75% of the tweets in the test data). With the help of the hashtags, I can achieve much higher performance.
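
To be concrete, the kind of lookup meant here could look like the sketch below; the use of tweepy and a text-based search query is only an illustration of one way to do it, not anything provided by the task:

```python
# Rough sketch of restoring the original (hashtag-bearing) tweets by searching
# Twitter for the cleaned test text. tweepy 3.x and standard search access are
# assumed, and the matching heuristic is only a guess at how such a lookup works.
import tweepy

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

def try_restore(cleaned_text):
    """Search for a public tweet that starts with the cleaned test sentence;
    if one is found, its full text still carries the removed irony hashtags."""
    for status in api.search(q=f'"{cleaned_text}"', count=5, tweet_mode="extended"):
        if status.full_text.lower().startswith(cleaned_text.lower()):
            return status.full_text
    return None
```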

I don't know whether that activity is allowed. If it is, it is not fair to the other participants who do not apply that trick.
Thanks & regards,
T

Posted by: thanhvu @ Jan. 24, 2018, 11:02 a.m.

Hi

It was mentioned in an earlier post on this forum that external data such as sentiment lexicons, dictionaries and embeddings can be used for feature engineering purposes. We trust in the participants' fair play: systems should not be tuned as a result of manually labelling the test set (with or without hashtag information), as this completely defeats the purpose of the task (i.e., to achieve better insights into irony detection). Evidently, submissions with suspiciously high scores given the described approach (and compared to other teams with similar approaches) will be noticed and possibly not considered for the official ranking.
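
To make the allowed use concrete, a feature built from a sentiment lexicon could look like the sketch below; the 'word<TAB>score' file format is a hypothetical placeholder, not a resource distributed with the task:

```python
# Minimal sketch of lexicon-based feature engineering. Any sentiment lexicon
# with per-word polarity scores could be plugged in the same way.
def load_lexicon(path):
    """Load a lexicon assumed to contain one 'word<TAB>score' pair per line."""
    lexicon = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            word, score = line.rstrip("\n").split("\t")
            lexicon[word] = float(score)
    return lexicon

def lexicon_features(tweet, lexicon):
    """Simple polarity features: sum, max and min lexicon score over the tokens."""
    scores = [lexicon.get(tok.lower(), 0.0) for tok in tweet.split()]
    if not scores:
        return [0.0, 0.0, 0.0]
    return [sum(scores), max(scores), min(scores)]
```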

Kind regards
Cynthia

Posted by: cyvhee @ Jan. 24, 2018, 11:37 a.m.

Thanks for your answer. However, it is still not clear to me. For example, suppose someone simply restores the original tweets (i.e., with the irony hashtags not removed), meaning they never look at the labels of the test data or manually annotate it. Is that participant considered to be playing fair?

The second problem is this: when the annotators annotated the data, did they look at the original tweets or at the tweets with the irony hashtags removed? If they looked at the original tweets, I think it is not really right to provide only the hashtag-removed data for testing.

Posted by: thanhvu @ Jan. 24, 2018, 11:52 a.m.

Here are some examples from the training data for which, I think, it is really confusing to assign the right label without the irony hashtags:
46 1 Luv this #not
116 1 We want turkey!! #not
147 1 @JordanNoftall that's funny #not
204 1 Love this weather #Not

Moreover, the way all the irony hashtags were removed (probably programmatically) is not right. For example:
140 0 @DFDSUKUpdates why all the delays? #not happy
--> 140 0 @DFDSUKUpdates why all the delays? happy
The first one is clearly not ironic, while the second one reads as ironic, doesn't it?
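
To illustrate the point, a blanket removal along the lines below reproduces the example above (the actual preprocessing script is not public, so the regex is an assumption):

```python
import re

def strip_irony_tags(text):
    # Remove #not / #irony / #sarcasm wherever they occur, then tidy the spacing.
    text = re.sub(r"#(not|irony|sarcasm)\b", "", text, flags=re.IGNORECASE)
    return re.sub(r"\s{2,}", " ", text).strip()

print(strip_irony_tags("@DFDSUKUpdates why all the delays? #not happy"))
# -> "@DFDSUKUpdates why all the delays? happy"
# Here "#not" negated "happy"; dropping it flips the apparent meaning of the tweet.
```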

Posted by: thanhvu @ Jan. 24, 2018, 12:07 p.m.

"Thanks for your answer. However, it is still not clear to me. For example, if someone just restores the original tweets (i.e., not removing the irony hashtags), that means they do nothing with looking at the label of the test data or manually annotate the data), is that participant considered as playing fair?"
> No, participants are expected to submit predictions for the test data as provided for this task.

"The second problem is that when annotators had annotated the data, did they look at the original tweet or the tweet after removing the irony hashtags? If they look at the original tweet, I think that it is not really right to just only provide the data after removing the irony hashtags for testing."
> Annotators had the original tweets at their disposal (as illustrated on the CodaLab "Overview" page). However, the test data does not contain ironic utterances that cannot be identified as such without an irony hashtag.

Posted by: cyvhee @ Jan. 24, 2018, 12:08 p.m.