(Additional Scoring) CLEF 2019 Lab ProtestNews

Organized by OsmanMutlu

Previous phase: Task 2 (Aug. 9, 2021, 1 p.m. UTC)
Current phase: Task 3 (Aug. 9, 2021, 1 p.m. UTC)

CLEF ProtestNews-2019 Additional Scoring (Refer to https://competitions.codalab.org/competitions/20318 for the original results.)

NOTE: PLEASE DO NOT BE MISLED BY THE ABOVE PHASE LABELS. ALL PHASES ARE OPEN SIMULTANEOUSLY.

This is a CLEF 2019 Lab. Please register to CLEF 2019 for the ProtestNews task and contact Ali Hürriyetoğlu (ahurriyetoglu@ku.edu.tr) to participate in the task.

 

The task setting is presented in [1] and the overview of the results in [2]. The complete proceedings are available at http://ceur-ws.org/Vol-2380/
 

[1] Hürriyetoğlu, A., Yörük, E., Yüret, D., Yoltar, Ç., Gürel, B., Duruşan, F., & Mutlu, O. (2019, April). A Task Set Proposal for Automatic Protest Information Collection Across Multiple Countries. In European Conference on Information Retrieval (pp. 316-323). Springer, Cham. URL: https://link.springer.com/chapter/10.1007/978-3-030-15719-7_42

[2] Hürriyetoğlu, A., Yörük, E., Yüret, D., Yoltar, Ç., Gürel, B., Duruşan, F., Mutlu, O., & Akdemir, A. (2019, July). Overview of CLEF 2019 Lab ProtestNews: Extracting Protests from News in a Cross-context Setting. In: Cappellato, L., Ferro, N., Losada, D. E., and Müller, H. (eds.) CLEF 2019 Working Notes. URL: http://ceur-ws.org/Vol-2380/paper_249.pdf

Extracting Protests from News Using Automated Methods

The task ProtestNews aims at extracting event information from news articles across multiple countries. We focus in particular on events that are in the scope of contentious politics and characterized by riots and social movements, i.e. the “repertoire of contention” (Giugni 1998, Tarrow 1994, Tilly 1984). Our aim is to develop text classification and information extraction tools on data from one country and test them on data from different countries. The text data is in English and collected from India, China, and South Africa.

We believe our task will set a baseline for evaluating the generalizability of NLP tools. Further challenges of the task are handling the nuanced protest definition used in social science studies, differences in protest types and their expression across countries, and the variety of target information to be extracted. The clues needed to discriminate between relevant and irrelevant information in this context may be merely implied without any explicit expression, or hinted at by a single word in the whole article. For instance, a news article about a protest threat, or an open letter written by a single person, does not qualify as relevant: a protest must have happened, and an open letter must be supported by more than one person, to be in scope.

 

Organization

Please check the lab website regularly for updates. The Forums tab can be used to discuss any issues.

If you have not done so, please complete the individual application form (form link) for each member of your team. The forms should be sent to Ali Hürriyetoğlu (ahurriyetoglu@ku.edu.tr) to access the data and be accepted into the submission system.

Participants are strongly recommended to read each page and to refer to the starting kit in the Participate tab. The README.ipynb provided inside the starting kit includes example code for reading the data and making a valid submission file in .zip format.

Organizing Committee

Ali Hürriyetoğlu: ahurriyetoglu@ku.edu.tr
Deniz Yüret: dyuret@ku.edu.tr
Erdem Yörük: eryoruk@ku.edu.tr
Çağrı Yoltar: cyoltar@ku.edu.tr
Burak Gürel: bgurel@ku.edu.tr
Fırat Duruşan: fdurusan@ku.edu.tr
Osman Mutlu: omutlu@ku.edu.tr
Arda Akdemir: aakdemir@ku.edu.tr
Theresa Gessler: Theresa.Gessler@EUI.eu
Peter Makarov: makarov@cl.uzh.ch

 


CLEF ProtestNews 2019: Evaluation

LEADERBOARD LINKS: Screenshot1, Screenshot2

The lab aims at evaluating the generalizability of text classification and information extraction tools; the evaluation is therefore designed as follows. The training data is obtained from a single country, the source country. The evaluation consists of two steps. The first step, which we call Test 1 or the intermediate evaluation, is performed on data from the source country. The second step, which we call Test 2 or the final evaluation, is performed on data from a target country, which is China in our setting. The performance metrics for both Test 1 and Test 2 are described below.

 

FINAL EVALUATION PHASE

In addition to the news articles obtained from India, we will use news articles from China for the final evaluation of the participants on all three tasks.

Our main aim in choosing this approach is to favor models that generalize well and adapt better to new domains.

CLEF ProtestNews 2019: Terms and Conditions

The participants of the competition are assumed to have read and agreed to the terms and conditions listed below.

  • The data shared for the competition must be used for research purposes only.
  • Parts of the provided data may be shared by participants for illustrative purposes only. Sharing the datasets in a way that makes it possible to reconstruct the whole dataset is not allowed.
  • Only individuals/teams whose registration for the competition has been approved have the right to make use of the data. It is the responsibility of each participant to prevent third parties from accessing the data.
  • Copyright holders of the datasets shared for the competition retain all rights regarding the use and distribution of all the material.

ProtestNews 2019 Organizing Committee

CLEF ProtestNews 2019: Submission

We provide the data for each task in a subfolder of its own. You should have received the data from one of the organizers.

Tasks 1 and 2

For Tasks 1 and 2 we provide "test.json" and "china_test.json" as test files. Each of them has a "prediction" field with null values. For your submission you must fill the "prediction" field with your model's predictions (please provide integers) and submit these two files in a zip file; a minimal sketch follows the rules below.

Rules for submission:

  • Submitted files in the zip must have the same names as the provided test files ("test.json", "china_test.json").
  • The zip file must not contain a folder, only these two files.
  • If you do not provide one of the test files in the zip file, its corresponding score will be 0, which will also affect the average score.
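
For convenience, here is a minimal sketch of preparing a Tasks 1 and 2 submission in Python. It assumes the test files contain one JSON object per line (check the starting kit's README.ipynb for the actual layout), and dummy_predict is a hypothetical stand-in for your own classifier:

    import json
    import zipfile

    def fill_predictions(path, predict):
        # Read the test file, fill the null "prediction" fields, write it back.
        # Assumes one JSON object per line; adjust if the files hold a single
        # JSON array instead.
        with open(path, encoding="utf-8") as f:
            records = [json.loads(line) for line in f if line.strip()]
        for record in records:
            record["prediction"] = int(predict(record))  # integers only: 0 or 1
        with open(path, "w", encoding="utf-8") as f:
            for record in records:
                f.write(json.dumps(record) + "\n")

    def dummy_predict(record):
        # Hypothetical placeholder: always predicts the non-protest class.
        return 0

    for name in ("test.json", "china_test.json"):
        fill_predictions(name, dummy_predict)

    # The zip must hold the two files at its top level, with no folder inside.
    with zipfile.ZipFile("submission.zip", "w") as zf:
        zf.write("test.json")
        zf.write("china_test.json")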

 

Task 3

We provide the data in a token-per-line format. Participants must write their predictions with a tab between the token and the prediction on each line.

For Task 3 we provide "test.txt" and "china_test.txt" as test files. Unlike the train and dev files, they contain only tokens. For your submission you must add your model's prediction for each token (separating tokens and predictions with a tab, as in the train and dev data) and submit these two files in a zip file.

Rules for submission:

  • Submitted files in the zip must have the same names as the provided test files ("test.txt", "china_test.txt").
  • The zip file must not contain a folder, only these two files.
  • If you do not provide one of the test files in the zip file, its corresponding score will be 0, which will also affect the average score.
  • All lines must exactly match the provided test file; otherwise the scoring program will not be able to calculate the score.
  • Be careful with extra empty lines: make sure that the tokens and empty lines of your submission align exactly with those of the test file.
  • You can check here for a sample submission; a minimal sketch of writing a valid file follows this list.
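
Here is a minimal sketch, again in Python, of turning a token-only test file into a valid prediction file; dummy_tag is a hypothetical stand-in for your sequence labeler, and the real label inventory comes from the training data:

    def write_predictions(in_path, out_path, tag):
        # Copy the token-per-line file, appending a tab and a label to every
        # token line while preserving the empty lines between sentences.
        with open(in_path, encoding="utf-8") as fin, \
             open(out_path, "w", encoding="utf-8") as fout:
            for line in fin:
                token = line.rstrip("\n")
                if token.strip() == "":
                    fout.write("\n")  # sentence boundary: keep the empty line
                else:
                    fout.write(token + "\t" + tag(token) + "\n")

    def dummy_tag(token):
        # Hypothetical placeholder: tags every token as outside any span.
        return "O"

    # Write the filled files next to the originals, then rename them to
    # "test.txt" and "china_test.txt" before zipping, as in the earlier sketch.
    for name in ("test.txt", "china_test.txt"):
        write_predictions(name, name + ".pred", dummy_tag)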

Tasks

Task 1: The task is to classify news documents as protest (1) or non-protest (0), given the raw document.

Task 2: The task is to classify sentences as containing an event trigger (1) or not (0), given the sentence and the news article containing that sentence.

Task 3: The task is to extract various information from a given event sentence, such as the location, time, and participants of an event.

Task 1 and Task 2

Both Task 1 and Task 2 are binary classification tasks.

Submissions will be evaluated using the F1 score for Task 1 and Task 2.
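
For reference, the F1 score is the harmonic mean of precision and recall; below is a minimal check in Python using scikit-learn (which may differ from what the scoring program uses internally):

    from sklearn.metrics import f1_score

    y_true = [1, 0, 1, 1, 0, 0, 1]  # gold labels: protest (1) / non-protest (0)
    y_pred = [1, 0, 0, 1, 0, 1, 1]  # hypothetical model predictions

    # F1 = 2 * precision * recall / (precision + recall), for the positive class.
    print(f1_score(y_true, y_pred))  # 0.75: precision = recall = 3/4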

We will release intermediate results several times before the end of the first phase, so that competitors can get feedback on their models.

For the intermediate evaluation of Task 1 and Task 2 we will use news articles from India. For the final evaluation we use a mixture of news articles from China and India.

The final evaluation will be made on the test set released for the final phase.

Task 3

For Task 3, the F1 metric will be used. The corpus is annotated with the BIO tagging scheme for the various information types.
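
As a brief illustration of the BIO scheme: the first token of a span is tagged B-<type>, tokens inside a span I-<type>, and tokens outside any span O. The label names below are assumptions for illustration only; the actual inventory is defined by the released training data (columns are tab-separated in the actual files):

    Workers    B-participant
    marched    O
    in         O
    New        B-loc
    Delhi      I-loc
    on         O
    Friday     B-time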

We will provide intermediate results under the Task 3 phase for Task 3 submissions.

The final results will be given on the test set which will be provided later.

 

 


Task 1

Start: Aug. 9, 2021, 1 p.m.

Description: Task 1: Document classification

Task 2

Start: Aug. 9, 2021, 1 p.m.

Description: Task 2: Sentence classification

Task 3

Start: Aug. 9, 2021, 1 p.m.

Description: Task 3: Token classification

Competition Ends

Never

Leaderboard

#  Username    Score
1  OsmanMutlu  46.9361