Welcome to the pilot challenge on suggestion mining!
Suggestion mining can be defined as the extraction of suggestions from unstructured text, where the term 'suggestions' refers to expressions of tips, advice, recommendations, etc. Consumer opinions towards commercial entities like brands, services, and products are generally expressed through online reviews, blogs, discussion forums, or social media platforms. These opinions largely express positive and negative sentiments towards a given entity, but also tend to contain suggestions for improving the entity or tips for fellow consumers. Traditional opinion mining systems mainly focus on automatically calculating the sentiment distribution towards an entity of interest by means of Sentiment Analysis methods. A suggestion mining component can extend the capabilities of traditional opinion mining systems, which can then cater to additional applications. Such systems can empower both public and private sectors by extracting the suggestions which are spontaneously expressed on various online platforms, enabling organisations to collect suggestions from much larger and more varied sources of opinions than the traditional suggestion box or online feedback forms.
Suggestion mining remains a relatively young area compared to Sentiment Analysis, especially in the context of recent advancements in neural network based approaches for learning feature representations. Suggestion mining research could drive the engagement of both commercial entities and the research communities working on problems like opinion mining, supervised learning, representation learning, etc. From a linguistic viewpoint, topics of interest to be explored within this task include extra-propositional aspects like mood and modality, as well as determining the importance of different kinds of syntactic and semantic features. It is observed that in some cases the grammatical properties of a sentence alone can decide its label, while at other times semantics can play a significant role. In this pilot SemEval task, we introduce suggestion mining as a simple task of classifying given sentences into suggestion and non-suggestion classes. With this task, we will evaluate the submitted systems for two domains: a software developers' suggestion forum and hotel reviews. We will evaluate the cross-domain performance of statistical models, since suggestions tend to possess similar linguistic properties across domains. The task can also serve as an evaluation for transfer learning methods.
Suggestion mining from online reviews and forums
Considering the domain of discussion forums, all the posts within a single thread are centered around answering a question or replying to a topic defined in the first post of the thread. Often these questions or topics are advice seeking, for example, the "Advice for our week in Vienna" thread on a travel discussion forum. The answers on discussion forums are conversational and may contain more context and information than the first post sought. A suggestion mining system can extract the exact sentences where the advice is expressed, which makes it act as a suggestion summarisation system for discussion forums. Another type of discussion forum is the dedicated suggestion forum pertaining to a commercial entity, for example, a suggestion forum for sharing platform capability requests and general ideas for improving the Windows developer platform. Such forums operate by developers or customers posting messages explaining the improvements they want to see in the product. This is different from online reviews, where the main objective is to provide positive or negative ratings and the rest of the information is additional. In the case of developer suggestion forums, the contextual text describing the functionality of the product gets repetitive over a large number of posts, and a suggestion mining system would help extract the sentences containing concrete suggestions. A suggestion post is shown in the image below, where only the first and last sentences express suggestions while the rest can be considered as the context.
Some examples of suggestions found among the text from different opinion platforms are listed below.
Source | Example Suggestion |
---|---|
Electronics Reviews | I would recommend doing the upgrade to be sure you have the best chance at trouble free operation. |
Electronics Reviews | My one recommendation to creative is to get some marketing people to work on the names of these things. |
Hotel Reviews | An electric kettle would have been a good addition to the room. |
Hotel Reviews | Be sure to specify a room at the back of the hotel. |
Travel Discussion Forum | If you do book your own airfare, be sure you don’t have problems if Insight has to cancel the tour or reschedule it |
Some of the observed challenges in suggestion mining are:
The suggestion mining task comprises two subtasks. Participating teams should take part in at least one of the subtasks. Relevant scripts and datasets are available at: https://github.com/Semeval2019Task9?tab=repositories
Sub-task A
Under this subtask, participants will perform domain-specific suggestion mining, where the test dataset will belong to the same domain as the training and development datasets, i.e. a suggestion forum for Windows platform developers.
Sub-task B
Under this subtask, participants will perform cross domain suggestion mining, where train/development and test datasets will belong to separate domains. Train and development datasets will remain the same as subtask A, while the test dataset will belong to the domain of hotel reviews.
This means that a model trained on the suggestion forum dataset will be evaluated on the hotel review dataset.
A separate CodaLab page is set up for Sub-task B.
Evaluation Metrics
Classification performance of the submissions will be evaluated on the basis of the F1 score for the positive class, i.e. the suggestion class. The F1 score ranges from 0 to 1. The class distribution in the provided test datasets will be balanced prior to the release of the test set.
Predicted Label | Actual Label: Suggestion | Actual Label: Non-Suggestion |
---|---|---|
Suggestion | True Positive | False Positive |
Non-Suggestion | False Negative | True Negative |
Given that Psugg, Rsugg, and F1sugg are the precision, recall and F1 score for the suggestion class:
Psugg = True Positives / (True Positives + False Positives)
Rsugg = True Positives / (True Positives + False Negatives)
F1sugg = 2 * (Psugg * Rsugg) / (Psugg + Rsugg)
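The three formulas above can be sketched in a few lines of Python. This is only an illustration and not the official evaluation script from the task repository: the function name `suggestion_f1`, the label strings, and the choice to return 0.0 when a denominator is zero are all assumptions made for this example.

```python
def suggestion_f1(gold, predicted, positive="suggestion"):
    """Return (precision, recall, F1) for the positive (suggestion) class.

    `gold` and `predicted` are equal-length sequences of label strings.
    The label string "suggestion" is an assumed convention for this sketch.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    # Counts follow the confusion matrix: rows are predictions, columns are actual labels.
    tp = sum(1 for g, p in pairs if p == positive and g == positive)
    fp = sum(1 for g, p in pairs if p == positive and g != positive)
    fn = sum(1 for g, p in pairs if p != positive and g == positive)

    # Assumed convention: an undefined ratio (zero denominator) is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    gold = ["suggestion", "suggestion", "non-suggestion", "non-suggestion"]
    pred = ["suggestion", "non-suggestion", "suggestion", "non-suggestion"]
    print(suggestion_f1(gold, pred))
```

Because only the suggestion class counts, a system that labels every sentence as non-suggestion scores 0 under this metric regardless of its accuracy.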
Rule based systems
Submissions are not limited to statistical classifiers. In the case of rule based systems, participants can choose to participate in the two subtasks with the same system. Participants can also submit different rule based systems for the two subtasks.
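As a rough illustration of what a rule based system might look like, the sketch below flags a sentence as a suggestion when it matches any of a handful of surface patterns. The patterns and the names `SUGGESTION_PATTERNS` and `is_suggestion` are hypothetical and chosen only for this example; they are not derived from the task data or from any official baseline, and a competitive system would need far richer rules.

```python
import re

# Hypothetical cue patterns for illustration only; not taken from the task datasets.
SUGGESTION_PATTERNS = [
    # Modal and performative cues: "should", "recommend(ation)", "suggest(ed)".
    re.compile(r"\b(should|recommend|suggest)\w*\b", re.IGNORECASE),
    # Imperative-style cue.
    re.compile(r"\bbe sure to\b", re.IGNORECASE),
    # Wish-like constructions: "would be nice", "would have been a good ...".
    re.compile(r"\bwould (be|have been) (a )?(nice|great|good|helpful)\b",
               re.IGNORECASE),
    # Sentence-initial "please".
    re.compile(r"^\s*please\b", re.IGNORECASE),
]


def is_suggestion(sentence):
    """Return True if any cue pattern matches the sentence."""
    return any(pattern.search(sentence) for pattern in SUGGESTION_PATTERNS)


if __name__ == "__main__":
    for text in [
        "Be sure to specify a room at the back of the hotel.",
        "An electric kettle would have been a good addition to the room.",
        "The room was clean and quiet.",
    ]:
        print(is_suggestion(text), text)
```

Since rules of this kind are not fitted to a training domain, the same function could be applied unchanged to both the forum data and the hotel reviews.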
Additional resources
Both rule based systems and statistical systems are allowed to use additional language resources, with one exception. Participants are prohibited from using additional hand labeled training datasets for any of the domains, i.e. data where sentences are manually labeled as suggestions and non-suggestions.
Any other resources which are readily available on the web or are generated using automated systems are allowed, e.g. scraping text from a website which can be automatically identified as suggestions, or automatically tagging additional data using a system trained on the provided training data.
The datasets and evaluation scripts are available at our GitHub page. The datasets will be made available incrementally as per the SemEval-2019 timelines. Please refer to the Terms and Conditions for the timelines.
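One way to read the second allowed example (automatically tagging additional data with a system trained on the provided training data) is as a simple pseudo-labelling step. The sketch below assumes a scoring function that returns the model's estimated probability that a sentence is a suggestion; `pseudo_label`, `score_fn`, the thresholds, and the label strings are all hypothetical names and values introduced for this example.

```python
def pseudo_label(sentences, score_fn, high=0.9, low=0.1):
    """Attach automatic labels to unlabeled sentences.

    `score_fn` is assumed to map a sentence to a probability in [0, 1]
    that it is a suggestion, e.g. from a model trained only on the
    provided training data. Sentences with scores between `low` and
    `high` are skipped because the model is not confident about them.
    """
    labelled = []
    for sentence in sentences:
        score = score_fn(sentence)
        if score >= high:
            labelled.append((sentence, "suggestion"))
        elif score <= low:
            labelled.append((sentence, "non-suggestion"))
    return labelled


if __name__ == "__main__":
    # Toy scoring function standing in for a trained model.
    def toy_score(sentence):
        if "recommend" in sentence:
            return 0.95
        if "was" in sentence:
            return 0.05
        return 0.5

    unlabeled = [
        "I recommend the rooftop bar.",
        "The lobby was crowded.",
        "Breakfast at eight.",
    ]
    print(pseudo_label(unlabeled, toy_score))
```

No human labels enter this loop, which is what distinguishes it from the prohibited hand labeled data; the thresholds trade the amount of extra data against label noise.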
All data released for this task is done so under the Creative Commons License (licenses could also be found with the data).
Organizers of the competition might choose to publicize, analyze and change in any way any content submitted as a part of this task. Wherever appropriate, academic citation for the sending group would be added (e.g. in a paper summarizing the task).
The teams wishing to participate in SemEval 2019 should strictly adhere to the following deadlines.
Task Schedule for SemEval2019
Competitions should comply with any general rules of SemEval.
The organizers are free to penalize or disqualify participants for any violation of the above rules, or for misuse, unethical behaviour, or other behaviours they agree are not acceptable in a scientific competition in general and in this one in particular.
Please contact the task organisers or post on the task mailing list if you have any further queries.
The Oxford dictionary defines a suggestion as 'An idea or plan put forward for consideration'. Some of the listed synonyms of suggestion are proposal, proposition, recommendation, advice, hint, tip, and clue. In our annotation study, we observe that human perception of the term suggestion is subjective, and this affects the preparation of hand labeled datasets for suggestion mining.
The datasets provided under this task are backed by a study of suggestions appearing in different domains and formalisation of the definition of suggestions in the context of suggestion mining [1]. The datasets have been annotated in two phases, where phase-1 employs crowdsourced annotators, and phase-2 employs in-house expert annotators.
In the final datasets, only those sentences are tagged as suggestions which explicitly express suggestions (explicit suggestions), and not those which just provide information that could be used to infer suggestions (implicit suggestions). For example, 'I loved the cupcakes from the bakery next door' is an implicit form of a suggestion which can be explicitly expressed as 'Do try the cupcakes from the bakery next door'.
In this year's SemEval, we are evaluating suggestion mining systems for two domains, suggestion forums and hotel reviews. Datasets are available at: https://github.com/Semeval2019Task9?tab=repositories
Class | Trial data: Train set | Trial data: Test set | Train | Development | Test |
---|---|---|---|---|---|
Suggestions | 1428 | 296 | TBD | TBD | TBD |
Non-suggestions | 4356 | 296 | TBD | TBD | TBD |
Wachsmuth et al. [2] provide a large sentiment analysis dataset of hotel reviews from the TripAdvisor website. We take a subset of these reviews; the sentences were already split in the dataset. The hotel review dataset will be used as the test dataset for subtask B. As mentioned in the Terms and Conditions, participants are free to use additional non-labeled datasets. The raw hotel review dataset provided by Wachsmuth et al. [2] could be one such dataset, and is openly available.
Class | Trial data: Test set | Test set |
---|---|---|
Suggestions | 404 | TBD |
Non-suggestions | 404 | TBD |
[1] Sapna Negi, Maarten de Rijke, and Paul Buitelaar. Open Domain Suggestion Mining: Problem Definition and Datasets. arXiv preprint arXiv:1806.02179 (2018).
[2] Henning Wachsmuth, Martin Trenkmann, Benno Stein, Gregor Engels, and Tsvetomira Palakarska. A review corpus for argumentation analysis. In Proceedings of the 15th International Conference on Computational Linguistics and Intelligent Text Processing, volume 8404 of LNCS, pages 115–127, Kathmandu, Nepal, 2014. Springer.
Start: Aug. 18, 2018, midnight
Start: Jan. 10, 2019, midnight
Jan. 31, 2019, midnight