FACT@IberLEF2020

Organized by pln_udelar

First phase: Event Identification and Factuality Detection

Start: March 18, 2020, midnight UTC
End: June 17, 2020, 11:59 p.m. UTC

Welcome to FACT: Factuality Analysis and Classification Task

Welcome to FACT: Factuality Analysis and Classification Task, a task to classify events in Spanish texts according to their factuality status. The main page of the task is here. This task is part of IberLEF 2020.

The FACT shared task is organized by Grupo PLN-InCo (UdelaR - Uruguay) and GRIAL (UB-UAB-UDL, Spain).

News

June 17th:

Final results published.

June 3rd:

A description of the baseline for each task was published at "Learn the Details > Evaluation".

The guidelines for the working notes are described at the end of this page.

May 25th:

Results submission deadline has been extended to June 17.

May 20th:

Test data for tasks 1 and 2 is ready to download. You can get it by downloading the Public Data in the "Participate > Files" section.
You can now upload your results for the tasks in the "Participate > Submit / View Results" section.
Please use the data format specified in "Learn the Details > Evaluation".
 
Due to issues with Codalab we are not able to show the results for task 2 in the leaderboard, so during this phase the leaderboard will only show the results for task 1. We will publish the leaderboard table for task 2 when the competition ends. In the meantime, you will be able to see your own results for task 2 using the "View scoring output log" for your submission.

March 19th:

The training dataset is available, see the Participate section.

Task Description

Factuality is understood, following Saurí (2008), as the category that determines the factual status of events, that is, whether or not events are presented as certain. In 2019, the first edition of the FACT task focused on determining the factuality of verbal events. The goal of the second edition is to identify noun events and determine the factuality of all events (verbs and nouns).

In this task, facts are not verified against the real world; they are only assessed with respect to how they are presented by the source (in this case the writer), that is, the commitment of the source to the truth-value of the event. In this sense, the task can be seen as a core component of other tasks such as fact-checking and fake-news detection, making it possible, in future tasks, to compare what is narrated in the text (fact tagging) with what is happening in the world (fact-checking and fake-news detection).

Subtask 1: Factuality Determination

We establish three possible categories for factuality:

  • Facts (F): Current and past situations in the world that are presented as real.
  • Counterfacts (CF): Current and past situations that the writer presents as not having happened.
  • Undefined (U): Possibilities, future situations, predictions, hypotheses and other options. Situations presented as uncertain, since the writer does not openly commit to the truth-value, either because they have not happened yet or because the author does not know.

The systems must automatically propose a factuality tag for each event (verbs and nouns). The events are already annotated in the texts. The structure of the tags used in the annotation is the following:

<event factuality="F">verb</event>

For example, for the following paragraph, where events are already annotated:

De acuerdo con el Instituto Nacional de Sismología, Vulcanología, Meteorología e Hidrología (Insivumeh), el volcán de Fuego <event>ha</event> <event> vuelto</event> a la normalidad, aunque <event>mantiene</event> <event>explosiones</event> moderadas, por lo que no <event>descarta</event> una nueva <event>erupción</event> .

The system output should be:

De acuerdo con el Instituto Nacional de Sismología, Vulcanología, Meteorología e Hidrología (Insivumeh), el volcán de Fuego <event factuality="F">ha</event> <event factuality="F">vuelto</event> a la normalidad, aunque <event factuality="F">mantiene</event> <event factuality="F">explosiones</event> moderadas, por lo que no <event factuality="CF">descarta</event> una nueva <event factuality="U">erupción</event> .

The expected target audience is NLP researchers interested in advancing event detection and modeling, temporal text analysis, and Information Extraction in general.

Performance on this task will be measured on the evaluation corpus using the following metrics:

  • Precision, Recall and F1 score for each category.
  • Macro-F1.
  • Global accuracy.

The main score for evaluating the submissions will be Macro-F1.
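
To make the scoring concrete, below is a minimal sketch of how these metrics can be computed over parallel lists of gold and predicted labels. It uses scikit-learn for illustration only; the official evaluation script is not reproduced here.

# Illustrative computation of the Subtask 1 metrics (per-category P/R/F1,
# Macro-F1 and accuracy), assuming gold and predicted labels as parallel lists.
from sklearn.metrics import precision_recall_fscore_support, f1_score, accuracy_score

LABELS = ["F", "CF", "U"]

gold = ["F", "F", "F", "F", "CF", "U"]   # example gold labels
pred = ["F", "F", "U", "F", "CF", "U"]   # example system output

# Precision, recall and F1 for each category.
p, r, f1, _ = precision_recall_fscore_support(gold, pred, labels=LABELS, zero_division=0)
for label, pi, ri, fi in zip(LABELS, p, r, f1):
    print(f"{label}: P={pi:.3f} R={ri:.3f} F1={fi:.3f}")

# Macro-F1 (the main score) and global accuracy.
print("Macro-F1:", f1_score(gold, pred, labels=LABELS, average="macro", zero_division=0))
print("Accuracy:", accuracy_score(gold, pred))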

Subtask 2: Event Identification

The recognition of noun events presents different challenges (Saurí et al., 2005; Wonsever et al., 2012): on the one hand, identifying the nouns that convey eventive information, such as war or construction, and, on the other hand, disambiguating those nouns that are eventive in certain contexts (conversaremos durante la cena) and not eventive in others (la cena está servida).

The participants will receive text with no annotations:

De acuerdo con el Instituto Nacional de Sismología, Vulcanología, Meteorología e Hidrología (Insivumeh), el volcán de Fuego ha vuelto a la normalidad, aunque mantiene explosiones moderadas, por lo que no descarta una nueva erupción.

and have to identify verbal and noun events:

De acuerdo con el Instituto Nacional de Sismología, Vulcanología, Meteorología e Hidrología (Insivumeh), el volcán de Fuego <event>ha</event> <event>vuelto</event> a la normalidad, aunque <event>mantiene</event> <event>explosiones</event> moderadas, por lo que no <event>descarta</event> una nueva <event>erupción</event> .

Performance on this task will be measured on the evaluation corpus using the following metrics:

  • Precision, Recall and F1 score.
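
One natural way to compute these scores is as set precision and recall over the identified events; the official scorer is not reproduced here. A minimal sketch, assuming gold and predicted events are represented as sets of token identifiers (the format used in the "Task 2" section below):

# Illustrative Precision / Recall / F1 for Subtask 2 over sets of event token ids.
def prf(gold_ids, pred_ids):
    tp = len(gold_ids & pred_ids)                      # correctly identified events
    precision = tp / len(pred_ids) if pred_ids else 0.0
    recall = tp / len(gold_ids) if gold_ids else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = {23, 24, 30, 31, 38, 41}   # token ids of gold events
pred = {23, 24, 30, 38}           # token ids predicted by a system
print(prf(gold, pred))            # (1.0, 0.666..., 0.8)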

Corpus

The corpus contains Spanish texts with approximately 6,300 events classified as F (Fact), CF (Counterfact), or U (Undefined). The texts belong to the journalistic register, and most of them come from the political sections of Spanish and Uruguayan newspapers.

Working Notes

All papers will be part of the official IberLEF proceedings, which will be published at CEUR-WS.org. The proceedings of the workshop will be titled: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020).

Please send your working notes to factiberlef@fing.edu.uy by July 15, 2020.

Instructions for working notes:

  • IberLEF will use the Springer style (https://www.springer.com/gp/computer-science/lncs/conference-proceedings-guidelines).
  • The paper should be between 5 and 8 pages long.
  • Papers should include enough information to reproduce the reported results.
  • Each paper must include the following copyright footnote on its first page: {\let\thefootnote\relax\footnotetext{Copyright \textcopyright\ 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). IberLEF 2020, September 2020, Málaga, Spain. }}
  • Remove page numbering, if present, and make sure there are no headers or footers, except the required copyright footnote on the first page.
  • Author names must be complete and accompanied by their full affiliation (university and country); for example, "S. Perez" is not acceptable, it must be "Soto Perez".
  • Paper titles should use English title case, i.e. "Filling an Author Agreement by Autocompletion" rather than "Filling an author agreement by autocompletion".
  • We will send you the copyright agreement, which at least one author of each working notes paper must sign. Please note that CEUR can only accept author agreements that were physically signed; digital signatures are not accepted.
  • Please cite the FACT overview paper using this citation:
@inproceedings{rosa2020overview,
  title={Overview of FACT at IberLEF 2020: Events Detection and Classification},
  author={Ros{\'a}, Aiala and Alonso, Laura and Castell{\'o}n, Irene and Chiruzzo, Luis and Curell, Hortensia and Fern{\'a}ndez, Ana and G{\'o}ngora, Santiago and Malcuori, Marisa and V{\'a}zquez, Gloria and Wonsever, Dina},
  booktitle={Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020)},
  year={2020}
}

Important Dates

  • March 11th, 2020: team registration.
  • March 18th, 2020: release of training data.
  • May 20th, 2020: release of test data.
  • June 17th, 2020: publication of results (extended from May 27th).
  • July 15th, 2020: working notes paper submission (extended from June 5th).
  • August 5th, 2020: notification of acceptance (extended from June 12th).
  • August 26th, 2020: camera-ready paper submission (extended from June 19th).
  • September, 2020: IberLEF 2020 Workshop.

Results

The following are the results for Subtask 1:

Participant      Macro-F1  Macro-Precision  Macro-Recall  Accuracy
t.romani             60.7             61.2          60.4      84.8
guster               59.3             62.1          57.4      83.1
accg14               55.0             55.6          54.5      79.8
trinidadg            53.6             55.8          52.0      80.6
premjithb            39.3             45.5          37.6      71.6
garain               36.6             35.7          39.4      59.9
FACT_baseline        24.6             25.4          25.1      52.4

The following are the results for Subtask 2:

Participant        F1  Precision  Recall
trinidadg        86.5       95.1    79.3
FACT_baseline    59.7       60.3    59.1

Contact

Please join the Google Group factiberlef2020. We will be sharing news and important information about the task in that group.

Task 1

For the factuality classification task, the test corpus will include unique identifiers for each event, like this:

De acuerdo con el Instituto Nacional de Sismología, Vulcanología, Meteorología e Hidrología (Insivumeh), el volcán de Fuego <event id="1">ha</event> <event id="2">vuelto</event> a la normalidad, aunque <event id="3">mantiene</event> <event id="4">explosiones</event> moderadas, por lo que no <event id="5">descarta</event> una nueva <event id="6">erupción</event>.

The results must be uploaded in a CSV file named task1.csv with two columns: id and factuality. For example:

id,factuality
1,F
2,F
3,F
4,F
5,CF
6,U
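
The following is a minimal sketch of producing task1.csv from a test file in the format above. The function classify_event and the file name test_task1.txt are hypothetical placeholders, not part of the official distribution.

# Read the annotated test text, classify each event, and write task1.csv.
import csv
import re

def classify_event(word, context):
    # Hypothetical classifier; replace with your own model.
    return "F"

text = open("test_task1.txt", encoding="utf-8").read()   # assumed file name
events = re.findall(r'<event id="(\d+)">(.*?)</event>', text)

with open("task1.csv", "w", newline="", encoding="utf-8") as out:
    writer = csv.writer(out)
    writer.writerow(["id", "factuality"])
    for event_id, word in events:
        writer.writerow([event_id, classify_event(word, text)])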

Task 2

For the event detection task, the test corpus will include unique identifiers for each token, like this:

De/1 acuerdo/2 con/3 el/4 Instituto/5 Nacional/6 de/7 Sismología/8 ,/9 Vulcanología/10 ,/11 Meteorología/12 e/13 Hidrología/14 (/15 Insivumeh/16 )/17 ,/18 el/19 volcán/20 de/21 Fuego/22 ha/23 vuelto/24 a/25 la/26 normalidad/27 ,/28 aunque/29 mantiene/30 explosiones/31 moderadas/32 ,/33 por/34 lo/35 que/36 no/37 descarta/38 una/39 nueva/40 erupción/41 ./42

The results must be uploaded in a CSV file named task2.csv with one column (id). For example:

id
23
24
30
31
38
41
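
Analogously, a minimal sketch of producing task2.csv from a test file in the word/id format above; is_event and the file name test_task2.txt are hypothetical placeholders.

# Read the tokenized test text, detect events, and write task2.csv.
import csv

def is_event(word):
    # Hypothetical detector; replace with your own model.
    return word in {"ha", "vuelto", "mantiene", "explosiones", "descarta", "erupción"}

with open("task2.csv", "w", newline="", encoding="utf-8") as out:
    writer = csv.writer(out)
    writer.writerow(["id"])
    with open("test_task2.txt", encoding="utf-8") as f:   # assumed file name
        for line in f:
            for token in line.split():
                word, _, token_id = token.rpartition("/")  # split off the trailing id
                if is_event(word):
                    writer.writerow([token_id])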

Submissions

Participants must upload their submissions as a zip file containing one CSV file for each of the tasks. There must be at least one CSV file in the submission, with the format described above.
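
For example, a submission containing both files can be packaged as follows (a sketch assuming both CSV files are in the current directory):

# Package the prediction files into a zip for upload.
import zipfile

with zipfile.ZipFile("submission.zip", "w") as zf:
    zf.write("task1.csv")
    zf.write("task2.csv")   # include only the tasks you participate in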

Baselines

The baselines for each task are as follows:

Task 1:
The baseline assigns random factuality values with the following probabilities: F-70%, U-20%, CF-10%.
Macro-F1: 0.246
Accuracy: 0.524
 
Task 2:
The baseline assigns the class 'event' to the words tagged as 'event' at least once in the training corpus.
F1: 0.597
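
The following sketch reimplements both baselines as described above; the organizers' exact code (e.g. random seed, tokenization details) is not published, so this is only an approximation.

import random

# Task 1 baseline: assign a random factuality value with P(F)=0.7, P(U)=0.2, P(CF)=0.1.
def baseline_task1(event_ids):
    return {i: random.choices(["F", "U", "CF"], weights=[0.7, 0.2, 0.1])[0]
            for i in event_ids}

# Task 2 baseline: a token is an event if its word form was tagged as an event
# at least once in the training corpus.
def baseline_task2(tokens, training_event_words):
    # tokens: list of (word, token_id) pairs; training_event_words: set of word forms.
    return [token_id for word, token_id in tokens if word in training_event_words]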


 

Terms and Conditions

The data used in this competition was created by Grupo PLN-InCo (Uruguay) and GRIAL (Spain).

The entire corpus will be published at the end of the competition for research and teaching purposes.

If you use the corpus, please cite the overview papers of the FACT shared tasks 2019 and 2020.
