CLEF 2019 Lab ProtestNews

Organized by ardaakdemir


Final Results Task 3
April 1, 2019, 11:18 a.m. UTC


End of Competition
May 20, 2019, 11:18 a.m. UTC


Competition Ends

CLEF ProtestNews-2019 (Completed! Refer to for the next iteration.)


This is a CLEF 2019 Lab. Please register with CLEF 2019 for the ProtestNews task and contact Ali Hürriyetoğlu to participate in the task.


Results of the competition were lost due to a major crash of the CodaLab platform. Screenshots of the results can be found here for Tasks 1 & 2 and here for Task 3.

The task setting is presented in [1] and the overview of the results in [2]. Please access the complete proceedings on

[1] Hürriyetoğlu, A., Yörük, E., Yüret, D., Yoltar, Ç., Gürel, B., Duruşan, F., & Mutlu, O. (2019, April). A Task Set Proposal for Automatic Protest Information Collection Across Multiple Countries. In European Conference on Information Retrieval (pp. 316-323). Springer, Cham. URL:

[2] Hürriyetoğlu, A., Yörük, E., Yüret, D., Yoltar, Ç., Gürel, B., Duruşan, F., Mutlu, O., & Akdemir, A. (2019, July). Overview of CLEF 2019 Lab ProtestNews: Extracting Protests from News in a Cross-context Setting. In: Cappellato, L., Ferro, N., Losada, D. E., and Müller, H. (eds.) CLEF 2019 Working Notes. URL:




Extracting Protests from News Using Automated Methods

The task ProtestNews aims at extracting event information from news articles across multiple countries. We particularly focus on events that are in the scope of contentious politics and characterized by riots and social movements, i.e. the “repertoire of contention” (Giugni 1998, Tarrow 1994, Tilly 1984). Our aim is to develop text classification and information extraction tools on one country and test them on data from different countries. The text data is in English and collected from India, China, and South Africa.

We believe our task will set a baseline for evaluating the generalizability of NLP tools. Another challenge of the task is handling the nuanced protest definition used in social science studies, the differences in protest types and their expression across countries, and the target information to be extracted. The clues needed to discriminate between relevant and irrelevant information in this context may be either implied without any explicit expression or hinted at by a single word in the whole article. For instance, a news article about a protest threat or an open letter written by a single person does not qualify as relevant. A protest must have happened, and an open letter must be supported by more than one person, to be in scope.



Please check the website of the lab regularly for updates. The Forums tab can be used to discuss any issues.

If you have not done so, please complete the individual application form (form link) for each member of your team. The forms should be sent to Ali Hürriyetoğlu to access the data and be accepted into the submission system.

Participants are strongly encouraged to read each page and to refer to the starting kit in the Participate tab. The README.ipynb provided inside the starting kit includes example code for reading the data and making a valid submission file in .zip format.

Organizing Committee

Ali Hürriyetoğlu
Deniz Yüret
Erdem Yörük
Çağrı Yoltar
Burak Gürel
Fırat Duruşan
Osman Mutlu
Arda Akdemir
Theresa Gessler
Peter Makarov


CLEF ProtestNews 2019: Evaluation





The lab aims to evaluate the generalizability of text classification and information extraction tools. We therefore designed the evaluation as follows. The training data is obtained from a single country, which is the source country. The evaluation consists of two steps. The first step, which we call Test 1 or intermediate evaluation, is performed on data from the source country. The second step, which we call Test 2 or final evaluation, is performed on data from a target country, which is China in our setting. The performance metrics for both Test 1 and Test 2 are described below.


All tasks will be evaluated using news articles from India in this phase. The aim of this phase is to give participants some feedback and get some rivalry going on our leaderboard!

We will release the results obtained in this phase on specific dates. This test set will also be used as part of the final test set, so we will give only a limited number of intermediate evaluation results to make sure participants do not overfit to the dataset.



In addition to the news articles obtained from India, we will use news articles from China for the final evaluation of the participants on all three tasks.

Our main aim in choosing this approach is to favor models that generalize better and adapt better to new domains.

Below we give the dates of the data releases and deadlines for each phase of the competition.

The competition will end on May 11.


Intermediate Evaluation Task 1 | 2, Task 3:

Cycle 1:

Data Release, India: April 12

Submission deadline: April 26

Scores: April 26


Cycle 2:

Data Release, China: April 29

Submission deadline: May 3

Scores: May 4


Final Evaluation Task 1 | 2, Task 3:


Cycle 3 (final evaluation):

No new data.

Submission deadline: May 10

Scores: May 11


We will evaluate the performance of the participants by the average of all 3 F1 scores obtained.
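The averaging step can be sketched as follows. The page does not spell out which three F1 scores are meant, so this minimal example assumes one F1 score per task and an unweighted mean; the function name and the numbers are hypothetical.

```python
from statistics import mean


def overall_score(f1_scores):
    """Return the unweighted mean of three F1 scores.

    Assumption: one score per task (Task 1, Task 2, Task 3).
    """
    if len(f1_scores) != 3:
        raise ValueError("expected exactly three F1 scores")
    return mean(f1_scores)


# Made-up per-task F1 values, for illustration only.
example = overall_score([0.80, 0.60, 0.40])
```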

CLEF ProtestNews 2019: Terms and Conditions

The participants of the competition are assumed to have read and agreed to the terms and conditions listed below.

  • The data shared for the competition must only be used for research-related purposes.
  • Parts of the data provided can be shared by participants for illustrative purposes only. Sharing the datasets in a way that makes it possible to reconstruct the whole dataset is not allowed.
  • Only individuals/teams with an approved registration for the competition have the right to make use of the data. It is the responsibility of each participant to prevent third parties from accessing the data.
  • Copyright holders of the datasets shared for the competition retain all rights regarding the use and distribution of all the material.

ProtestNews 2019 Organizing Committee

CLEF ProtestNews 2019: Submission


Download the datasets using the Docker image and obtain the public data (in submission format) from the Files under the Participate tab for each phase and task.

For each phase, files with the same names as the '.solution' files provided in the Public Data must be zipped together and submitted as a single file (the zip file itself can have any name). Note that the names of the predict files must match the names of the data files provided for evaluation.


Task 1 | 2:

Submit results on test sets which will be provided.

Submission format: The submission files must have the same basename as the data provided and must end in .predict. For example, the predictions for the data file with basename x_dev must be given in a file named x_dev.predict. The prediction files must be zipped. The submitted files must be in the same format as the .data files provided in the Public Data. The data we provide is in a straightforward format where each line contains the id of an instance followed by the prediction for that instance. For Task 1, each line corresponds to the binary prediction for the label of a news document.

Example: If we have three news articles in the file, an example .predict file submission would look as follows:

id1 0

id2 1

id3 1

Each line gives the prediction for the news article on the corresponding line of the data file. The .data files contain the ids of the news articles; the articles themselves (in raw text format) are to be obtained using the Docker image.

Important Note: The scoring algorithm goes over all the instances given in the .data file. Be sure to include predictions for all of them in your submission.
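As a rough illustration of the format described above, the sketch below writes a .predict file next to a .data file and refuses to write it if any id lacks a prediction. The helper names are hypothetical, and the separator between id and label is an assumption (a single space, as in the example above); check it against the actual .data file and the README.ipynb in the starting kit.

```python
from pathlib import Path


def read_ids(data_path):
    """Return the ids in a .data file (first field of each non-empty line)."""
    ids = []
    for line in Path(data_path).read_text(encoding="utf-8").splitlines():
        if line.strip():
            ids.append(line.split()[0])
    return ids


def write_predictions(data_path, predictions, sep=" "):
    """Write <basename>.predict with one 'id<sep>label' line per instance.

    `predictions` maps id -> 0/1. `sep` is an assumption; match the .data file.
    """
    data_path = Path(data_path)
    ids = read_ids(data_path)
    missing = [i for i in ids if i not in predictions]
    if missing:
        raise ValueError(f"no prediction for {len(missing)} ids, e.g. {missing[0]}")
    # Same basename as the data file, with the ending replaced by .predict.
    out_path = data_path.with_suffix(".predict")
    lines = [f"{i}{sep}{int(predictions[i])}" for i in ids]
    out_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return out_path
```

Keeping the ids in the order of the .data file means the output lines up with the input line by line.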

Important Note: The predictions for Task 1 and Task 2 must be zipped together into a single zip file for submission. The files must be zipped with no intermediate folder; otherwise the scoring program will not detect them. Do not put the files in a folder before zipping.

The system also accepts a separate submission if you plan to participate in a single task.

If you plan to participate in both tasks, submit both .predict files in a single zip file; otherwise the previously submitted task will receive a score of 0.
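One way to produce a zip file with no intermediate folder is sketched below, using Python's standard zipfile module. The function name and the default archive name are hypothetical; the page only requires that the .predict files sit at the top level of the archive.

```python
import zipfile
from pathlib import Path


def zip_predictions(predict_files, zip_path="submission.zip"):
    """Zip .predict files at the top level of the archive (no folders)."""
    files = [Path(f) for f in predict_files]
    for f in files:
        if f.suffix != ".predict":
            raise ValueError(f"unexpected file ending: {f.name}")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            # arcname drops the directory part, so the file sits at the archive root.
            zf.write(f, arcname=f.name)
    return Path(zip_path)
```

Passing the Task 1 and Task 2 prediction files in the same call yields the single archive described above.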

Task 3

Submit results on all test sets for Task 3, which will be provided.

We will provide the data in a token-per-line format. Participants must write their predictions with a tab between the token and the prediction on each line.

Again, the ending must be .predict and the file name must exactly match the .data file that will be shared.

Important Note: All lines must exactly match the .data file provided. Otherwise the scoring program will not be able to calculate the score.

Be careful with extra empty lines, and make sure that the token on each line and the positions of the empty lines match the .data file exactly.
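A quick self-check for the alignment requirement might look like the sketch below. It is not the official scoring program: the function name is hypothetical, and it assumes the token is the first tab-separated field on each non-empty line of both files.

```python
def check_alignment(data_lines, predict_lines):
    """Return a list of (line_number, problem); an empty list means aligned.

    Line number 0 is used for a problem that concerns the whole file.
    """
    problems = []
    if len(data_lines) != len(predict_lines):
        problems.append(
            (0, f"line count differs: {len(data_lines)} vs {len(predict_lines)}")
        )
    for n, (d, p) in enumerate(zip(data_lines, predict_lines), start=1):
        d, p = d.rstrip("\n"), p.rstrip("\n")
        if not d.strip():
            # Empty lines in the data must be empty in the predictions too.
            if p.strip():
                problems.append((n, "expected an empty line"))
            continue
        if not p.strip():
            problems.append((n, "unexpected empty line"))
            continue
        if "\t" not in p:
            problems.append((n, "missing tab between token and prediction"))
            continue
        d_token, p_token = d.split("\t")[0], p.split("\t")[0]
        if d_token != p_token:
            problems.append((n, f"token mismatch: {d_token!r} vs {p_token!r}"))
    return problems
```

Running it on the lines of the .data file and the .predict file before zipping catches stray or missing blank lines early.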



Task 1: The task is to classify news documents as protest (1) or non-protest (0), given the raw document.

Task 2: The task is to classify sentences as containing an event trigger (1) or not (0), given the sentence and the news article containing that sentence.

Task 3: The task is to extract various pieces of information from a given event sentence, such as the location, time, and participants of an event.

Task 1 and Task 2

Both Task 1 and Task 2 are binary classification tasks.

Submissions for Task 1 and Task 2 will be evaluated using the F-score.
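For reference, a minimal F1 computation for binary labels is sketched below. The page does not say whether the score is computed on the positive class only or averaged over both classes; this example assumes the F1 of the positive class (1), and the function name is hypothetical.

```python
def f1_binary(gold, pred, positive=1):
    """F1 of the positive class for two equal-length label sequences."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```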

We will give intermediate results at the end of the first phase several times in order to ensure that competitors can get feedback about their models.

For the intermediate evaluation of Task 1 and Task 2 we will use news articles from India. For the final evaluation we will use a mixture of news articles from China and India.

Final evaluation will be made on the test set that will be released for the final phase.

Task 3

For Task 3, the F1 metric will be used. The BIO tagging scheme is used to annotate the corpus for the various information types.
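To make the combination of BIO tags and F1 concrete, the sketch below extracts labeled spans from a BIO sequence and scores exact span matches. This is an assumption about how such an evaluation is commonly done, not a description of the official scorer, which may differ (for example, by scoring per token). The function names and the tag names used in any example are hypothetical.

```python
def bio_spans(tags):
    """Return the set of (start, end, type) spans in a BIO sequence.

    `end` is exclusive. An I- tag that does not continue a span of the same
    type is treated leniently as the start of a new span.
    """
    spans = set()
    start, kind = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        inside = tag.startswith("I-") and kind == tag[2:]
        if not inside:
            if kind is not None:
                spans.add((start, i, kind))
            if tag.startswith(("B-", "I-")):
                start, kind = i, tag[2:]
            else:
                start, kind = None, None
    return spans


def span_f1(gold_tags, pred_tags):
    """F1 over exactly matching spans (same boundaries and same type)."""
    gold, pred = bio_spans(gold_tags), bio_spans(pred_tags)
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```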

We will provide intermediate results under the Task 3 phase for Task 3 submissions.

The final results will be given on the test set which will be provided later.



Download       Size (MB)   Phase
Starting Kit   1.258       #1 Intermediate Evaluation Task 1 | 2
Public Data    0.048       #1 Intermediate Evaluation Task 1 | 2
Public Data    0.003       #3 Final Results Task 3

Intermediate Evaluation Task 1 | 2

Start: April 1, 2019, 9:15 a.m.

Description: We will provide intermediate evaluation results for Task 1 and Task 2 to give everyone feedback.

Final Results Task 1 | 2

Start: April 1, 2019, 11:18 a.m.

Description: We will provide intermediate evaluation results to give everyone feedback.

Final Results Task 3

Start: April 1, 2019, 11:18 a.m.

Description: Submissions for both Intermediate and Final Evaluation of Task 3 will be done here.

End of Competition

Start: May 20, 2019, 11:18 a.m.

Description: End of Competition
