Held as part of the IberLEF evaluation forum at the XXXV edition of the International Conference of the Spanish Society for Natural Language Processing (SEPLN 2019)
The workshop and shared task "Sentiment Analysis at SEPLN (TASS)" has been held since 2012 under the umbrella of the International Conference of the Spanish Society for Natural Language Processing (SEPLN). TASS was the first shared task on sentiment analysis of Spanish tweets. Spanish is the second most used language on Facebook and Twitter, which calls for the development and availability of language-specific methods and resources for sentiment analysis. The initial aim of TASS was to further research on sentiment analysis in Spanish, with special interest in the language used on Twitter.
Although sentiment analysis is still an open problem, the Organization Committee would like to foster research on other tasks related to the semantic processing of texts written in Spanish. Consequently, the name of the workshop/shared task has been changed to "Workshop on Semantic Analysis at SEPLN (TASS)".
The Organization Committee invites the research community to propose and organize evaluation tasks on other semantic problems in Spanish. New tasks provide an opportunity to create linguistic resources and evaluate their usefulness, and they promote the consolidation of a community of researchers interested in the addressed topics. We therefore encourage the semantic processing community to propose and submit evaluation tasks (see Proposal of Tasks).
The tasks we propose are the natural evolution of TASS 2018 Task 1. Their first aim is to further research on sentiment analysis in Spanish, with special interest in the language used on Twitter. The target community is any research group working in this area; traditionally, about ten systems have taken part each year. The task also seeks to attract Hispanic American groups and to offer a common meeting point for research on this type of task.
This task focuses on the evaluation of polarity classification systems for tweets written in Spanish. Submitted systems will have to face the following challenges:
Participants will be provided with a training corpus, a development corpus and several test corpora (see Important Dates). All corpora are annotated with four polarity classes: P (positive), N (negative), NEU (neutral) and NONE (no sentiment).
If participants submit a supervised or semi-supervised system, it must be trained only on the provided training data; the use of any other training set is strictly forbidden. However, linguistic resources such as lexicons, word embeddings or knowledge bases may be used. We want a fair competition that fosters creativity, so we aim to assess the originality of the systems when all of them are given the same training data.
For each submitted data set, macro precision, macro recall, macro F1-score and accuracy are computed. Submissions are ranked by macro F1-score.
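The four metrics can be sketched in plain Python as follows. This is a minimal illustration, not the official evaluate.py from the starting kit: it assumes macro F1 is the unweighted mean of the per-class F1 scores over the four labels (P, N, NEU, NONE), which is one common definition, whereas the official script may instead derive F1 from macro precision and macro recall. The function name `score` is hypothetical.

```python
from collections import Counter

# The four polarity classes used in the TASS corpora.
LABELS = ("P", "N", "NEU", "NONE")


def score(gold, pred, labels=LABELS):
    """Return macro precision, macro recall, macro F1 and accuracy.

    Assumption: macro F1 is the unweighted mean of per-class F1 scores.
    A class that is never predicted (or never in gold) gets 0 precision
    (or 0 recall) instead of raising a division error.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    true_pos = Counter(g for g, p in zip(gold, pred) if g == p)
    pred_count = Counter(pred)
    gold_count = Counter(gold)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        prec = true_pos[label] / pred_count[label] if pred_count[label] else 0.0
        rec = true_pos[label] / gold_count[label] if gold_count[label] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return {
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
        "accuracy": sum(true_pos.values()) / len(gold) if gold else 0.0,
    }
```

For example, with gold labels `["P", "N", "NEU", "NONE", "P"]` and predictions `["P", "N", "NONE", "NONE", "N"]`, the sketch gives an accuracy of 0.6, macro precision 0.5, macro recall 0.625 and macro F1 of about 0.5. Because NEU is never predicted correctly, its zero F1 pulls the macro average down, which is why macro F1 rewards systems that handle all classes rather than only the frequent ones.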
Registering for the task and submitting results only registers you for the TASS task. To attend the conference, you must register through the space provided for that purpose on the page of the XXXV International Congress of the Spanish Society for Natural Language Processing (http://www.sepln2019.com/).
This license is signed between the undersigned user/user group and MeaningCloud Europe (Spain), acting on behalf of the organizing committee of the TASS workshop for sentiment analysis and online reputation analysis focused on Spanish, which also includes the SINAI group at the University of Jaén (Spain).
The TASS Dataset is a corpus of texts (mainly tweets) in Spanish tagged for Sentiment Analysis related tasks. It is divided into several subsets created for the various tasks proposed in the different editions through the years. All the information on these datasets can be found in the TASS website at http://www.sepln.org/workshops/tass
The Dataset is disclosed on a voluntary basis by the concerned persons. The Dataset contains the actual data as well as any derivative work, products or services based on all or part of the data. All data contained within the Dataset have been collected and processed in accordance with the laws applicable in Spain.
The Dataset is the sole property of the Licensor and is protected by copyright. The Licensor reserves all rights to use in any way and distribute the Dataset. The Dataset shall remain the exclusive property of the Licensor. The End User acquires no ownership, rights or title of any kind with regard to the Dataset.
Subject to prior identification of the End User and signature of this License, the Dataset is freely available to the benefit of the End User. The Licensor grants to the End User the right to use the Dataset, for its own internal and non-commercial use and for the purpose of scientific research only.
The End User shall be liable for any infringement of the present License by any of their subsidiaries and/or students. The End User may only disclose, give access to, and/or transfer the rights related to the Dataset to subsidiaries and/or students under the following conditions:
The Licensor grants to the End User the right to reproduce temporarily, to adapt, arrange and modify by any means the Dataset. The Licensor grants to the End User the right to rework and build upon the Dataset, or any component thereof, as necessary or desirable for research or technology development activity and create derivative products or services for the End User's own internal research and development. The End User is permitted to make a copy of the Dataset for archiving only. This License gives no right of any kind to the End User over the Dataset. The License is deemed non-exclusive and nontransferable to third parties.
The End User may only use the Dataset after this License has been signed by filling in the form below this license. The Dataset will be accessible using the provided URL.
The End User shall not, without prior authorization of Licensor, transfer in any way, permanently or temporarily, distribute or broadcast all or part of the Dataset to third parties.
Research includes all types of scientific research, irrespective of the object under scrutiny, aimed at achieving progress in science. The Dataset can be used in any kind of research.
Any commercial use of the Dataset is strictly prohibited.
Commercial use of the Dataset includes, but is not limited to:
Any violation of this clause will give rise to immediate legal prosecution by the Licensor.
Any damages and/or unfair enrichment or the like of the End User due to the breach of the License shall be immediately restituted to the Licensor together with the derivative works, products and services based on the Dataset.
The End User shall reference the Dataset, or results obtained with it, in publications. Publications include, but are not limited to, research papers, articles, and presentations for conferences or educational purposes. Small portions of the Dataset cannot be used in any publication.
All publications that report on research using the Dataset will acknowledge this by citing one of the following publications:
Any illegal or criminal use of the Dataset by the End User is strictly prohibited.
The Dataset is granted without any warranty. Licensor shall not be held responsible for any damage caused by the use of the Dataset. Licensor shall not be held responsible for any illegal or criminal use of the Dataset by the End User.
This License is subject to and interpreted in accordance with Spanish Law. Any claim arising on the basis of this License shall exclusively be submitted to the Courts of Madrid, Spain.
The Licensor is allowed to amend this License at any time without prior announcement or consent to/of the End User. The End User can opt out of this License by contacting the Licensor at any time.
The End User warrants that they are an authorized signatory, an adult, and not legally forbidden to enter into this License. The End User warrants that they have read and understood all elements contained herein and that the signature apposed hereunder is the result of a fully aware decision.
By signing this License, the End User undertakes to strictly respect the conditions set forth herein and to respect all the laws applicable in Spain in relation to data and personality protection with regard to the data contained within the Dataset collected and processed by the Licensor:
| March 7, 2019 | Registration open |
| March 19, 2019 | Training and development sets released |
| May 1, 2019 | Test set released |
| May 12, 2019 | Submission end |
| May 14, 2019 | Results posted |
| May 25, 2019 | Deadline for paper submission |
| June 15, 2019 | Notification of acceptance |
| July 3, 2019 | Deadline for camera-ready paper submission |
| September 24, 2019 | IberLEF, SEPLN 2019, Bilbao (Spain) |
Format details will be communicated shortly, according to the specifications of the IberLEF organizers.
More information is available on the IberLEF webpage.
Manuel Carlos Díaz Galiano (Universidad de Jaén, Spain)
Miguel Ángel García Cumbreras (Universidad de Jaén, Spain)
Manuel García Vega (Universidad de Jaén, Spain)
Arturo Montejo Ráez (Universidad de Jaén, Spain)
Edgar Casasola Murillo (University of Costa Rica, Costa Rica)
Marco Antonio Sobrevilla Cabezudo (University of São Paulo, Brazil)
Luis Chiruzzo (Universidad de la República, Uruguay)
Eugenio Martínez Cámara (University of Granada, Spain)
Daniela Moctezuma (Centro de Investigación en Ciencias de Información Geoespacial, Mexico)
The starting kit at Participate > Files contains the script evaluate.py, which you can use to test your submission files. It expects two input files, a candidate submission file and a gold standard file, and prints the scoring metrics to standard output.
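Before running evaluate.py, it can help to check that a submission covers the same tweets as the gold standard. The sketch below assumes that both files use the `tweet_id \t polarity` layout described for submission files and have no header row; the gold file format is an assumption, and the function names `read_predictions` and `align` are hypothetical, not part of the starting kit.

```python
import csv


def read_predictions(path):
    """Read a 'tweet_id<TAB>polarity' file into a {tweet_id: polarity} dict.

    Assumes no header row; blank lines are skipped.
    """
    labels = {}
    with open(path, encoding="utf-8", newline="") as f:
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row_no, row in enumerate(reader, start=1):
            if not row:
                continue  # skip empty lines
            if len(row) < 2:
                raise ValueError(f"{path}: row {row_no}: expected tweet_id and polarity")
            labels[row[0].strip()] = row[1].strip()
    return labels


def align(submission_path, gold_path):
    """Pair gold and predicted labels by tweet_id.

    Returns (gold_labels, pred_labels, missing_ids, extra_ids): missing_ids
    appear in the gold standard but not in the submission, extra_ids appear
    in the submission but not in the gold standard.
    """
    sub = read_predictions(submission_path)
    gold = read_predictions(gold_path)
    common = sorted(gold.keys() & sub.keys())
    missing = sorted(gold.keys() - sub.keys())
    extra = sorted(sub.keys() - gold.keys())
    return [gold[i] for i in common], [sub[i] for i in common], missing, extra
```

The two aligned label lists can then be passed to any scoring routine; a non-empty `missing` list signals gold tweets that received no prediction, which would otherwise go unnoticed if only the common tweets were scored.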
For each subtask, a single submission file must be uploaded (up to three submissions are allowed). This file is a ZIP archive containing one prediction file per language variant, named according to the following convention (ISO country codes):
/
|- es.tsv
|- pe.tsv
|- cr.tsv
|- mx.tsv
\- uy.tsv
Your submission may contain predictions for all targeted languages or only some of them (even just one); it will be scored only against the languages it includes. For both subtasks, submitted ZIP files follow the same structure.
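A submission archive following this naming convention could be assembled with Python's standard `zipfile` module, as in the minimal sketch below. It assumes the TSV files sit at the root of the ZIP (as the tree above suggests) and that predictions are already held in memory as `(tweet_id, polarity)` pairs; `build_submission` is a hypothetical helper, not something provided by the organizers.

```python
import io
import zipfile

# Country codes from the naming convention above.
ALLOWED_CODES = {"es", "pe", "cr", "mx", "uy"}


def build_submission(predictions_by_country):
    """Return the bytes of a ZIP with one <code>.tsv per country.

    predictions_by_country maps a country code to a list of
    (tweet_id, polarity) pairs; any subset of the five codes may be given.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for code, rows in predictions_by_country.items():
            if code not in ALLOWED_CODES:
                raise ValueError(f"unexpected country code: {code!r}")
            body = "".join(f"{tweet_id}\t{polarity}\n" for tweet_id, polarity in rows)
            # Written at the archive root, not inside a subfolder.
            zf.writestr(f"{code}.tsv", body)
    return buf.getvalue()
```

Writing the returned bytes to disk, for instance with `pathlib.Path("submission.zip").write_bytes(data)`, produces the archive to upload; the file name here is only an example. Passing a single key such as `"es"` yields an archive with just `es.tsv`, which matches the option of submitting predictions for only one language.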
Each TSV file is a plain-text file in the following tab-separated format:
tweet_id \t polarity
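Each line can be checked against this format before packaging. The sketch below assumes exactly two tab-separated fields per line, no header row, and that the polarity must be one of the four labels used in the corpora (P, N, NEU, NONE); `parse_line` is a hypothetical name.

```python
# Polarity labels used in the TASS corpora.
VALID_LABELS = {"P", "N", "NEU", "NONE"}


def parse_line(line):
    """Split one 'tweet_id<TAB>polarity' line and validate the label.

    Returns (tweet_id, polarity) or raises ValueError.
    """
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 2:
        raise ValueError(f"expected 2 tab-separated fields, got {len(fields)}: {line!r}")
    tweet_id, polarity = fields
    if not tweet_id:
        raise ValueError("empty tweet_id")
    if polarity not in VALID_LABELS:
        raise ValueError(f"unknown polarity label: {polarity!r}")
    return tweet_id, polarity
```

A line such as `123\tNONE` parses to `("123", "NONE")`, while a space instead of a tab, or a label such as `POS`, raises `ValueError`; catching these before upload is cheaper than discovering them after a submission has been used up.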
Questions can be posted to the Google Group list: Tass2019 Google Group
Start: March 19, 2019, midnight
Start: May 1, 2019, midnight
Start: May 1, 2019, midnight
May 12, 2019, midnight