TASS 2019

Organized by amontejo

Current

Task 2: InterTASS crosslingual
May 1, 2019, midnight UTC

Next

Task 1: InterTASS monolingual
May 1, 2019, midnight UTC

End

Competition Ends
May 12, 2019, midnight UTC

Held as part of the evaluation forum IberLEF in the XXXV edition of the International Conference of the Spanish Society for Natural Language Processing (SEPLN 2019)

 

About the task

The workshop and shared task "Sentiment Analysis at SEPLN (TASS)" has been held since 2012, under the umbrella of the International Conference of the Spanish Society for Natural Language Processing (SEPLN). TASS was the first shared task on sentiment analysis of Spanish-language tweets. Spanish is the second most used language on Facebook and Twitter [1], which calls for the development and availability of language-specific methods and resources for sentiment analysis. The initial aim of TASS was to further research on sentiment analysis in Spanish, with a special interest in the language used on Twitter.

Although sentiment analysis is still an open problem, the Organization Committee would like to foster research on other tasks related to the processing of the semantics of texts written in Spanish. Consequently, the name of the workshop/shared task has been changed to "Workshop on Semantic Analysis at SEPLN (TASS)".

The Organization Committee invites the research community to propose and organize evaluation tasks related to other semantic tasks in the Spanish language. New tasks provide an opportunity to create linguistic resources, evaluate their usefulness, and promote the consolidation of a community of researchers interested in the addressed topics. Thus, we encourage the semantic processing community to propose and submit evaluation tasks (see Proposal of Tasks).

Task: Sentiment Analysis at Tweet level

The tasks we propose are the natural evolution of TASS 2018 Task 1. The first aim of this task is to further research on sentiment analysis in Spanish, with a special interest in the language used on Twitter. The target community for this task is any research group working in this area. Traditionally, we have had about ten systems each year. The task also aims to attract Hispanic American groups and to offer a common meeting point for research on this type of task.

This task focuses on the evaluation of polarity classification systems for tweets written in Spanish. The submitted systems will have to face the following challenges:

  • Lack of context: tweets are short (up to 280 characters).
  • Informal language: misspellings, emojis and onomatopoeias are common.
  • (Local) multilinguality: the training, development and test corpora contain tweets written in the varieties of Spanish spoken in Spain, Peru and Costa Rica.
  • Generalization: the systems will be assessed with several corpora. The first is the test set that accompanies the training data, so it follows a similar distribution. The second is the test set of the General Corpus of TASS (see previous editions), which was compiled some years ago, so it may be lexically and semantically different from the training data. Furthermore, the systems will be evaluated with test sets of tweets written in the varieties of Spanish spoken in different American countries.

Participants will be provided with a training corpus, a development corpus and several test corpora (see important dates). All the corpora are annotated with four polarity labels (P, N, NEU, NONE).

If participants submit a supervised or semi-supervised system, it must be trained only with the provided training data; the use of any other training set is strictly forbidden. However, linguistic resources such as lexicons, word embedding vectors or knowledge bases can be used. We want a fair competition that fosters creativity, so we will assess the originality of the systems given the same set of training data.

Subtasks

  • Subtask-1: Monolingual Sentiment Analysis. Training and testing on each InterTASS dataset (ES-Spain, PE-Peru, CR-Costa Rica and UY-Uruguay).
  • Subtask-2: Cross-lingual Sentiment Analysis. Training on a selection of the datasets and testing on a different one, in order to measure how much the systems depend on a particular language.

For each submitted dataset, macro precision, macro recall, macro F1-score and accuracy are computed. Submissions are ranked according to macro F1-score.
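
As a rough illustration of these four metrics, the sketch below computes them in plain Python from two parallel lists of labels. It is not the official scorer: the function name macro_scores is hypothetical, and it assumes that macro F1 is the unweighted mean of the per-label F1 scores. Some evaluations instead derive macro F1 from macro precision and macro recall, so the evaluate.py script in the starting kit remains the reference.

```python
from collections import Counter

# The four labels named in the task description.
LABELS = ("P", "N", "NEU", "NONE")


def macro_scores(gold, pred, labels=LABELS):
    """Return macro precision, macro recall, macro F1 and accuracy.

    Assumption: macro F1 is the unweighted mean of per-label F1 scores.
    The official evaluate.py may define it differently.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[g] += 1  # gold label g was missed

    def safe_div(num, den):
        # Treat an undefined ratio (empty denominator) as 0.0.
        return num / den if den else 0.0

    precisions, recalls, f1s = [], [], []
    for label in labels:
        prec = safe_div(tp[label], tp[label] + fp[label])
        rec = safe_div(tp[label], tp[label] + fn[label])
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(safe_div(2 * prec * rec, prec + rec))

    n = len(labels)
    return {
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
        "accuracy": safe_div(sum(tp.values()), len(gold)),
    }
```

Because every label contributes equally to the macro averages, a system that ignores a rare label such as NEU is penalized more heavily than accuracy alone would suggest.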

Terms and conditions

Registering for the task and submitting your results only implies registration in the TASS task. To attend the workshop, you must register through the space provided for that purpose on the page of the XXXV International Conference of the Spanish Society for Natural Language Processing (http://www.sepln2019.com/).

TASS Dataset Research/Non Commercial License Agreement

This license is signed between the undersigned user/user group and MeaningCloud Europe (Spain), acting on behalf of the organizing committee of the TASS workshop for sentiment analysis and online reputation analysis focused on Spanish, which also includes the SINAI group at the University of Jaén (Spain).

1. Description of Data

The TASS Dataset is a corpus of texts (mainly tweets) in Spanish tagged for tasks related to Sentiment Analysis. It is divided into several subsets created for the various tasks proposed in the different editions over the years. All the information on these datasets can be found on the TASS website at http://www.sepln.org/workshops/tass

The Dataset is disclosed on a voluntary basis by the concerned persons. The Dataset contains the actual data as well as any derivative work, products or services based on all or part of the data. All data contained within the Dataset have been collected and processed in accordance with the laws applicable in Spain.

2. Copyright

The Dataset is the sole property of the Licensor and is protected by copyright. The Licensor reserves all rights to use in any way and distribute the Dataset. The Dataset shall remain the exclusive property of the Licensor. The End User acquires no ownership, rights or title of any kind with regard to the Dataset.

3. License

Subject to prior identification of the End User and signature of this License, the Dataset is freely available to the benefit of the End User. The Licensor grants to the End User the right to use the Dataset, for its own internal and non-commercial use and for the purpose of scientific research only.

The End User shall be liable for any infringement of the present License by any of their subsidiaries and/or students. The End User may only disclose, give access to, and/or transfer the rights related to the Dataset to subsidiaries and/or students under the following conditions:

  1. a copy of the present License has already been transferred to them;
  2. the subsidiaries/students have fully read and understood all terms of the present License;
  3. the access to the Dataset is granted under the close supervision of the End User;
  4. the access to the Dataset is granted under the sole responsibility of the End User.

The Licensor grants to the End User the right to reproduce temporarily, to adapt, arrange and modify by any means the Dataset. The Licensor grants to the End User the right to rework and build upon the Dataset, or any component thereof, as necessary or desirable for research or technology development activity and create derivative products or services for the End User's own internal research and development. The End User is permitted to make a copy of the Dataset for archiving only. This License gives no right of any kind to the End User over the Dataset. The License is deemed non-exclusive and nontransferable to third parties.

4. Access and Distribution

The End User may only use the Dataset after this License has been signed by filling in the form below this license. The Dataset will be accessible using the provided URL.

The End User shall not, without prior authorization of Licensor, transfer in any way, permanently or temporarily, distribute or broadcast all or part of the Dataset to third parties.

5. Research

Research includes all types of scientific research, irrespective of the object under scrutiny, aimed at achieving progress in science. The Dataset can be used in any kind of research.

6. Commercial use

Any commercial use of the Dataset is strictly prohibited.

Commercial use of the Dataset includes, but is not limited to:

  1. Proving the efficiency of commercial systems;
  2. Testing commercial systems;
  3. Using screenshots of subjects from the Dataset in advertisements;
  4. Selling data or making any commercial use of the Dataset;
  5. Broadcasting data from the Dataset.

Any violation of this clause will give rise to immediate legal prosecution by the Licensor.

Any damages and/or unfair enrichment or the like of the End User due to the breach of the License shall be immediately restituted to the Licensor together with the derivative works, products and services based on the Dataset.

7. Publications

The End User shall reference the Dataset, or results obtained with it, in publications. Publications include, but are not limited to, research papers, articles, and presentations for conferences or educational purposes. Small portions of the Dataset cannot be used in any publication.

All publications that report on research that uses the Dataset shall acknowledge this by citing one of the following publications:

  • Villena-Román, J., Lana-Serrano, S., Martínez-Cámara, E., González-Cristóbal, J.C. (2013). TASS - Workshop on Sentiment Analysis at SEPLN. Procesamiento del Lenguaje Natural, 50. http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/4657
  • Villena-Román, J., García-Morera, J., Lana-Serrano, S., González-Cristóbal, J.C. (2014). TASS 2013 - A Second Step in Reputation Analysis in Spanish. Procesamiento del Lenguaje Natural, 52. http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/4901
  • Villena-Román, J., Martínez-Cámara, E., García-Morera, J., Jiménez-Zafra, S. (2015). TASS 2014 - The Challenge of Aspect-based Sentiment Analysis. Procesamiento del Lenguaje Natural, 54. http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/5095
  • Martínez-Cámara, E., García-Cumbreras, M.A., Villena-Román, J., García-Morera, J. (2016). TASS 2015 - The Evolution of the Spanish Opinion Mining Systems. Procesamiento del Lenguaje Natural, 56. http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/5284

8. Illegal or criminal use of the Dataset

Any illegal or criminal use of the Dataset by the End User is strictly prohibited.

9. Legal Disclaimer

The Dataset is granted without any warranty. The Licensor shall not be held responsible for any damage caused by the use of the Dataset. The Licensor shall not be held responsible for any illegal or criminal use of the Dataset by the End User.

10. Jurisdiction

This License is subject to and interpreted in accordance with Spanish Law. Any claim arising on the basis of this License shall exclusively be submitted to the Courts of Madrid, Spain.

11. Amendments

The Licensor is allowed to amend this License at any time without prior announcement or consent to/of the End User. The End User can opt out of this License by contacting the Licensor at any time.

12. Warranties

The End User warrants that they are an authorized signatory, an adult, and not legally forbidden to enter into this License. The End User warrants that they have read and understood all elements contained herein and that the signature affixed hereunder is the result of a fully informed decision.

By signing this License, the End User undertakes to strictly respect the conditions set forth herein and to respect all the laws applicable in Spain in relation to data and personality protection with regard to the data contained within the Dataset collected and processed by the Licensor.

Important dates

  • March 7, 2019: Registration open
  • March 19, 2019: Training and development sets released
  • May 1, 2019: Test set released
  • May 12, 2019: Submission end
  • May 14, 2019: Results posted
  • May 25, 2019: Deadline for paper submission
  • June 15, 2019: Notification of acceptance
  • July 3, 2019: Deadline for camera-ready paper submission
  • September 24, 2019: IberLEF, SEPLN 2019, Bilbao (Spain)

Paper submission

Format details will be communicated shortly, according to the specifications of IberLEF organizers.

More information is available on the IberLEF webpage.

Organizing committee

Manuel Carlos Díaz Galiano (Universidad de Jaén, Spain)

Miguel Ángel García Cumbreras (Universidad de Jaén, Spain)

Manuel García Vega (Universidad de Jaén, Spain)

Arturo Montejo Ráez (Universidad de Jaén, Spain)

Edgar Casasola Murillo (University of Costa Rica, Costa Rica)

Marco Antonio Sobrevilla Cabezudo (University of São Paulo, Brazil)

Luis Chiruzzo (Universidad de la República, Uruguay)

Eugenio Martínez Cámara (University of Granada, Spain)

Program committee

  • Julio Villena-Román (MeaningCloud, Spain)
  • Yoan Gutiérrez Vázquez (University of Alicante, Spain)
  • Lluís F. Hurtado (Polytechnic University of Valencia, Spain)
  • Ferrán Pla (Universidad Politécnica de Valencia, Spain)
  • Salud María Jiménez Zafra (Universidad de Jaén, Spain)
  • Mª. Teresa Martín Valdivia (Universidad de Jaén, Spain)
  • L. Alfonso Ureña López (Universidad de Jaén, Spain)
  • Manuel Montes Gómez (National Institute of Astrophysics, Optics and Electronics, Mexico)
  • Antonio Moreno Ortíz (University of Málaga, Spain)
  • José Manuel Perea Ortega (University of Extremadura, Spain)
  • Sara Rosenthal (IBM Research, U.S.A.)
  • Maite Taboada (Simon Fraser University, Canada)
  • Fermín Cruz Mata (University of Sevilla, Spain)

Starting kit

The starting kit at Participate > Files contains the script evaluate.py which you can use to test your submission files. It expects two input files: a submission candidate file and a gold standard. It dumps scoring metrics to standard output.

Submission files

For each subtask, a single submission file has to be uploaded (up to 3 submissions are allowed). This file is a ZIP file containing the predictions for each language, with the following naming convention (ISO country codes):

  /-
   |- es.tsv
   |- pe.tsv
   |- cr.tsv
   \- uy.tsv

Your submission may contain predictions for all targeted languages or only some of them (even just one). It will be scored only against the languages it includes. Submitted ZIP files follow the same structure for both subtasks.

Each TSV is a plain text file with the following format (Tab Separated Values):

tweet_id \t polarity
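
A submission with this layout could be assembled along the following lines. This is only a sketch under stated assumptions: the function name build_submission and the tweet IDs in the usage example are hypothetical, the TSV files are assumed to have no header row and to sit at the root of the archive, and the accepted polarity values are assumed to be the four corpus labels (P, N, NEU, NONE). Checking the result with evaluate.py from the starting kit is still advisable.

```python
import csv
import io
import zipfile

# Assumed from the task description: corpus labels and ISO country codes.
VALID_LABELS = {"P", "N", "NEU", "NONE"}
VALID_CODES = {"es", "pe", "cr", "uy"}


def build_submission(predictions, zip_target):
    """Write one <code>.tsv per country into a ZIP archive.

    predictions: dict mapping a country code to a list of
                 (tweet_id, polarity) pairs.
    zip_target:  a file path or a writable binary file object.
    """
    with zipfile.ZipFile(zip_target, "w", zipfile.ZIP_DEFLATED) as archive:
        for code, rows in predictions.items():
            if code not in VALID_CODES:
                raise ValueError(f"unknown country code: {code!r}")
            buffer = io.StringIO()
            writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
            for tweet_id, polarity in rows:
                if polarity not in VALID_LABELS:
                    raise ValueError(f"unknown polarity: {polarity!r}")
                writer.writerow([tweet_id, polarity])
            # Files are stored at the archive root, e.g. "es.tsv".
            archive.writestr(f"{code}.tsv", buffer.getvalue())


if __name__ == "__main__":
    # Hypothetical tweet IDs, for illustration only.
    build_submission(
        {"es": [("1001", "P"), ("1002", "NONE")], "uy": [("2001", "N")]},
        "submission.zip",
    )
```

Only the languages present in the dictionary end up in the archive, which matches the rule that a submission may cover a subset of the languages.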

Discussion forum

There is a Google Group list where you can post your questions: Tass2019 Google Group
