SemEval 2019 Task 9 - SubTask B - Suggestion Mining from Online Reviews and Forums

Organized by TDaudert

First phase: Scoring the Trial Data (starts Aug. 12, 2018, midnight UTC)

End: Competition Ends (Feb. 1, 2019, 11:59 a.m. UTC)

Welcome to Subtask B of SemEval 2019 Task 9 on suggestion mining. A detailed introduction is provided on the CodaLab page for Subtask A.

   

Task Organisers

  • Sapna Negi (sapna.negi@genesys.com), Genesys Telecommunications Laboratory Inc, Galway, Ireland
  • Tobias Daudert (tobias.daudert@insight-centre.org), Insight Centre for Data Analytics, National University of Ireland Galway, Galway, Ireland
  • Paul Buitelaar (paul.buitelaar@insight-centre.org), Insight Centre for Data Analytics, National University of Ireland Galway, Galway, Ireland
 
For any discussions related to the task, please post to and follow the Google group.

Evaluation

The suggestion mining task comprises two subtasks. Participating teams should take part in at least one of them.
All scripts and data can be downloaded here: https://github.com/Semeval2019Task9?tab=repositories

Sub-task A 

Under this subtask, participants will perform domain-specific suggestion mining, where the test dataset belongs to the same domain as the training and development datasets, i.e. a suggestion forum for Windows platform developers. A separate CodaLab page is set up for Subtask A.

Sub-task B

Under this subtask, participants will perform cross-domain suggestion mining, where the train/development and test datasets belong to separate domains. The train and development datasets remain the same as in Subtask A, while the test dataset belongs to the domain of hotel reviews.
This means that a model trained on the suggestion forum dataset will be evaluated on the hotel review dataset.
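As a rough illustration of this setup, the sketch below trains a simple bag-of-words classifier on the Subtask A forum sentences and scores it on hotel-review sentences. The file names and the column layout (id, sentence, label with 1 = suggestion) are assumptions for illustration only; substitute the actual files released on the task GitHub page.

    # Cross-domain baseline sketch (assumed CSV layout: id, sentence, label).
    # File names are placeholders; use the files from the task GitHub repositories.
    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score
    from sklearn.pipeline import make_pipeline

    train = pd.read_csv("subtaskA_train.csv", names=["id", "sentence", "label"])      # forum domain
    test = pd.read_csv("subtaskB_trial_test.csv", names=["id", "sentence", "label"])  # hotel reviews

    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), min_df=2),
        LogisticRegression(max_iter=1000),
    )
    model.fit(train["sentence"], train["label"])

    predictions = model.predict(test["sentence"])
    # Official metric: F1 for the positive (suggestion) class.
    print("F1 (suggestion class):", f1_score(test["label"], predictions, pos_label=1))

A model like this typically loses accuracy when moved from the forum domain to hotel reviews, which is exactly the domain gap the subtask is designed to measure.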

 

Evaluation Metrics 

Classification performance of the submissions will be evaluated on the basis of the F1 score for the positive class, i.e. the suggestion class. The F1 score ranges from 0 to 1.

                             Actual: Suggestion     Actual: Non-Suggestion
Predicted: Suggestion        True Positive          False Positive
Predicted: Non-Suggestion    False Negative         True Negative

 

Given that Psugg, Rsugg, and F1sugg are the precision, recall and F1 score for the suggestion class:

Psugg = True Positives / (True Positives + False Positives)

Rsugg = True Positives / (True Positives + False Negatives)

F1sugg  = 2 * (Psugg * Rsugg) / (Psugg + Rsugg)
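A minimal sketch of computing these quantities from the four confusion-matrix counts (the function name and the example counts are illustrative, not taken from the official evaluation script):

    # Precision, recall and F1 for the positive (suggestion) class,
    # guarded against division by zero.
    def suggestion_scores(tp, fp, fn):
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
        return precision, recall, f1

    # Example with made-up counts: 250 correctly found suggestions,
    # 50 false alarms, 46 missed suggestions.
    p, r, f1 = suggestion_scores(tp=250, fp=50, fn=46)
    print(f"P={p:.3f}  R={r:.3f}  F1={f1:.3f}")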

 

Rule-based systems

Submissions are not limited to statistical classifiers. In the case of rule-based systems, participants can choose to take part in both subtasks with the same system, or submit different rule-based systems for the two subtasks.

 


Additional resources

Both rule-based and statistical systems are allowed to use additional language resources, with one exception: participants are prohibited from using additional hand-labeled training datasets for any of the domains, i.e. data where sentences are manually labeled as suggestions and non-suggestions.
Any other resource that is readily available on the web or generated using an automated system is allowed, e.g. text scraped from a website where suggestions can be identified automatically, or additional data tagged automatically by a system trained on the provided training data.
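For instance, the self-labelling sketch below tags extra unlabeled sentences with a classifier trained on the provided data and keeps only confident predictions as silver-standard examples; the confidence threshold and the source of the unlabeled sentences are assumptions for illustration.

    # Silver-standard labelling sketch: no manual annotation is involved.
    # 'model' is any fitted classifier with predict_proba (e.g. the pipeline above);
    # 'unlabeled_sentences' is a list of sentences gathered automatically.
    def silver_label(model, unlabeled_sentences, threshold=0.9):
        proba = model.predict_proba(unlabeled_sentences)[:, 1]   # P(suggestion)
        silver = [(s, 1) for s, p in zip(unlabeled_sentences, proba) if p >= threshold]
        silver += [(s, 0) for s, p in zip(unlabeled_sentences, proba) if p <= 1 - threshold]
        return silver   # (sentence, automatically assigned label) pairs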

 

The datasets and evaluation scripts are available at our GitHub page. The datasets will be made available incrementally as per the SemEval-2019 timelines. Please refer to the Terms and Conditions for the timelines.

Submitted systems

  • Teams should not use a manually labeled training dataset (suggestion-labeled sentences) from the same domain.
  • Teams are allowed to use silver-standard datasets, for example, sentences scraped from the web which are likely to be suggestions or non-suggestions but are not manually labeled further. The silver-standard dataset can belong to the same domain.
  • Teams cannot use the Subtask B trial test set as a training set.
  • Teams can look at the trial test dataset labels for validating their system, error analysis, etc.
  • In the case of systems which typically use a validation dataset for automatic hyperparameter tuning, teams can use the trial test set as a validation set. In that case, please mention this in the description section and submit your source code (with comments indicating where exactly the trial test set is used) using the Project URL section on CodaLab.
  • Only one final submission will be recorded per team. The leaderboard will only show an updated submission if its results are higher.

 Permissions

  • All data for this task is released under a Creative Commons license (the specific licenses can also be found with the data).

  • The organizers of the competition may publicize, analyze, and modify in any way any content submitted as part of this task. Wherever appropriate, an academic citation for the submitting group will be added (e.g. in a paper summarizing the task).

 

Teams wishing to participate in SemEval 2019 should strictly adhere to the following deadlines.

Task Schedule for SemEval 2019

  • 20 Aug 2018: Trial data and evaluation script available
  • 18 Sep 2018: Training and development data ready; benchmark system results available
  • 5 Jan 2019: Trial test labels released; trial phase submissions no longer accepted on CodaLab, trial phase leaderboard frozen
  • 12 Jan 2019: Evaluation starts
  • 26 Jan 2019: Evaluation period ends
  • 5 Feb 2019: Results posted
  • 23 Feb 2019: System description papers due
  • 16 Mar 2019: Paper reviews due for system description papers
  • 29 Mar 2019: Author notifications
  • 5 Apr 2019: Camera-ready submissions due
  • 6-7 Jun 2019: SemEval 2019 takes place in Minneapolis, USA

All deadlines correspond to 23:59 GMT -12:00 of the given dates.

Competition entries should comply with the general rules of SemEval.

The organizers are free to penalize or disqualify participants for any violation of the above rules, or for misuse, unethical behaviour, or other behaviour they agree is not acceptable in a scientific competition in general and in this specific one in particular.

 

Please contact the task organisers or post on the task mailing list if you have any further queries.

 

Annotation Overview

The Oxford dictionary defines a suggestion as "an idea or plan put forward for consideration". Some of the listed synonyms of suggestion are proposal, proposition, recommendation, advice, hint, tip, and clue. In our annotation study, we observe that human perception of the term suggestion is subjective, and this affects the preparation of hand-labeled datasets for suggestion mining.

The datasets provided under this task are backed by a study of suggestions appearing in different domains and a formalisation of the definition of suggestions in the context of suggestion mining [1]. The datasets have been annotated in two phases, where phase 1 employs crowdsourced annotators and phase 2 employs in-house expert annotators.

The final datasets label as suggestions only those sentences which explicitly express a suggestion (explicit suggestions), and not those which merely provide information from which a suggestion could be inferred (implicit suggestions). For example, 'I loved the cupcakes from the bakery next door' is an implicit form of a suggestion which can be explicitly expressed as 'Do try the cupcakes from the bakery next door'.

In this year's SemEval, we are evaluating suggestion mining systems for two domains:

 

Suggestion Forums - Subtask A

Suggestion forums are dedicated forums used to provide suggestions for improving an entity. The data is collected from feedback posts for the Universal Windows Platform, available on uservoice.com.
People often provide context in suggestion posts, which becomes repetitive when there is a large number of posts under the same topic. Suggestion mining can act as automatic summarisation in this use case, by identifying the sentences in which a concrete suggestion is expressed. We observe that the datasets derived from this domain contain a relatively larger number of positive-class instances compared to the other domains. The sentences are automatically split using Stanford's parser.
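The organisers report using Stanford's parser for sentence splitting. Participants who want to preprocess additional raw text in a comparable way could use Stanford CoreNLP or a Python wrapper such as stanza; the snippet below is only a sketch under that assumption, not the organisers' exact script.

    # Sentence-splitting sketch using stanza, a Python wrapper around Stanford NLP models.
    # This is an illustrative assumption, not the preprocessing script used by the organisers.
    import stanza

    stanza.download("en", processors="tokenize")           # one-time model download
    nlp = stanza.Pipeline(lang="en", processors="tokenize")

    post = ("Please add dark mode to the settings. It would reduce eye strain at night. "
            "Also, the search bar keeps resetting my filters.")
    sentences = [sentence.text for sentence in nlp(post).sentences]
    print(sentences)   # one string per automatically detected sentence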
 
Under Subtask A, training and validation sets will be provided for this domain, and submissions will be evaluated on a test dataset from the same domain.
 
Number of sentences in each dataset
                   Trial data: Train set   Trial data: Test set   Train   Development   Test
Suggestions        585                     296                    TBD     TBD           TBD
Non-suggestions    1915                    296                    TBD     TBD           TBD

Hotel Reviews - Subtask B

Wachsmuth et al. (2014) [2] provide a large sentiment analysis dataset of hotel reviews from the TripAdvisor website. We take a subset of these reviews; the sentences were already split in the dataset. The hotel review dataset will be used as the test dataset for Subtask B. As mentioned in the Terms and Conditions, participants are free to use additional non-labeled datasets; the raw hotel review dataset provided by Wachsmuth et al. (2014), which is openly available, could be one such resource.

Number of sentences in each dataset
                   Trial data: Test set   Test set
Suggestions        404                    TBD
Non-suggestions    404                    TBD

References

[1] Sapna Negi, Maarten de Rijke, and Paul Buitelaar. Open Domain Suggestion Mining: Problem Definition and Datasets. arXiv preprint arXiv:1806.02179 (2018).

[2] Henning Wachsmuth, Martin Trenkmann, Benno Stein, Gregor Engels, and Tsvetomira Palakarska. A Review Corpus for Argumentation Analysis. In Proceedings of the 15th International Conference on Computational Linguistics and Intelligent Text Processing, volume 8404 of LNCS, pages 115–127, Kathmandu, Nepal, 2014. Springer.

