This is a CLEF 2019 Lab. Please register for the ProtestNews task at CLEF 2019 and contact Ali Hürriyetoğlu (ahurriyetoglu@ku.edu.tr) to participate in the task.
[1] Hürriyetoğlu, A., Yörük, E., Yüret, D., Yoltar, Ç., Gürel, B., Duruşan, F., & Mutlu, O. (2019, April). A Task Set Proposal for Automatic Protest Information Collection Across Multiple Countries. In European Conference on Information Retrieval (pp. 316-323). Springer, Cham. URL: https://link.springer.com/chapter/10.1007/978-3-030-15719-7_42
[2] Hürriyetoğlu, A., Yörük, E., Yüret, D., Yoltar, Ç., Gürel, B., Duruşan, F., Mutlu, O., & Akdemir, A. (2019, July). Overview of CLEF 2019 Lab ProtestNews: Extracting Protests from News in a Cross-context Setting. In: Cappellato, L., Ferro, N., Losada, D. E., and Müller, H. (eds.) CLEF 2019 Working Notes. URL: http://ceur-ws.org/Vol-2380/paper_249.pdf
The ProtestNews task aims at extracting event information from news articles across multiple countries. We particularly focus on events that are in the scope of contentious politics and characterized by riots and social movements, i.e. the “repertoire of contention” (Giugni 1998, Tarrow 1994, Tilly 1984). Our aim is to develop text classification and information extraction tools on one country and test them on data from different countries. The text data is in English and was collected from India, China, and South Africa.
We believe our task will set a baseline for evaluating the generalizability of NLP tools. Further challenges of the task are handling the nuanced protest definition used in social science studies, differences in protest types and their expression across countries, and identifying the target information to be extracted. The clues needed to discriminate between relevant and irrelevant information in this context may be implied without any explicit expression, or hinted at by a single word in the whole article. For instance, a news article about a protest threat or an open letter written by a single person does not qualify as relevant. A protest must have happened, and an open letter must be supported by more than one person, to be in scope.
Please check the website of the lab regularly for updates. The Forums tab can be used to discuss any issues.
If you have not done so, please complete the individual application form (form link) for each member of your team. The forms should be sent to Ali Hürriyetoğlu (ahurriyetoglu@ku.edu.tr) to access the data and to be accepted into the submission system.
Participants are strongly encouraged to read each page and to refer to the starting kit in the Participate tab. The README.ipynb provided inside the starting kit includes example code for reading the data and making a valid submission file in .zip format.
Ali Hürriyetoğlu: ahurriyetoglu@ku.edu.tr
Deniz Yüret: dyuret@ku.edu.tr
Erdem Yörük: eryoruk@ku.edu.tr
Çağrı Yoltar: cyoltar@ku.edu.tr
Burak Gürel: bgurel@ku.edu.tr
Fırat Duruşan: fdurusan@ku.edu.tr
Osman Mutlu: omutlu@ku.edu.tr
Arda Akdemir: aakdemir@ku.edu.tr
Theresa Gessler: Theresa.Gessler@EUI.eu
Peter Makarov: makarov@cl.uzh.ch
The lab aims at evaluating the generalizability of text classification and information extraction tools. Therefore, we designed the evaluation as follows. The training data is obtained from a single country, the source country. The evaluation consists of two steps. The first step, which we call Test 1 or the intermediate evaluation, is performed on data from the source country. The second step, which we call Test 2 or the final evaluation, is performed on data from a target country, which is China in our setting. The performance metrics for both Test 1 and Test 2 are described below.
In addition to the news articles obtained from India, we will make use of news articles from China for the final evaluation of the participants in all three tasks.
Our main aim in choosing this approach is to favor models that generalize better and adapt more easily to new domains.
The participants of the competition are assumed to have read and agreed to the terms and conditions listed below.
ProtestNews 2019 Organizing Committee
We provide the data for each task in a subfolder of its own. You should have obtained the data from one of the organizers.
For Tasks 1 and 2 we provide "test.json" and "china_test.json" as the test files. Each of them contains a "prediction" field with null values. For your submission you must fill the "prediction" field with your model's predictions (please provide integers) and submit these two files in a zip file.
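For illustration, here is a minimal sketch of filling the "prediction" fields and packaging the submission. It assumes one JSON object per line; the authoritative example is in the README.ipynb of the starting kit, and predict_label is a hypothetical stand-in for your own classifier.

import json
import zipfile

def fill_predictions(path, predict_label):
    # Assumes one JSON object per line; adjust if the file holds a single array.
    with open(path, encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    for record in records:
        # Replace the null placeholder with an integer label (0 or 1).
        record["prediction"] = int(predict_label(record))
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")

def predict_label(record):
    # Hypothetical placeholder: predict non-protest (0) for every document.
    return 0

for name in ["test.json", "china_test.json"]:
    fill_predictions(name, predict_label)

with zipfile.ZipFile("submission.zip", "w") as zf:
    for name in ["test.json", "china_test.json"]:
        zf.write(name)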
Rules for submission:
We provide the data in a one-token-per-line format. Participants must write their predictions with a tab between the token and the prediction on each line.
For Task 3 we provide "test.txt" and "china_test.txt" as the test files. Unlike the train and dev files, they contain only tokens. For your submission you must add your model's prediction for each token (separating tokens and predictions with a tab, as in the train and dev data) and submit these two files in a zip file.
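As above, a minimal sketch of producing and packaging the token-level submission; the blank-line handling and the all-"O" baseline tagger are assumptions for illustration.

import zipfile

def add_predictions(path, tag_token):
    with open(path, encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f]
    with open(path, "w", encoding="utf-8") as f:
        for token in lines:
            if token.strip():
                # One "token<TAB>prediction" pair per line, as in train/dev.
                f.write(f"{token}\t{tag_token(token)}\n")
            else:
                f.write("\n")  # keep blank lines (assumed sentence separators)

def tag_token(token):
    # Hypothetical baseline: tag every token as outside any span.
    return "O"

for name in ["test.txt", "china_test.txt"]:
    add_predictions(name, tag_token)

with zipfile.ZipFile("submission.zip", "w") as zf:
    for name in ["test.txt", "china_test.txt"]:
        zf.write(name)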
Rules for submission:
Task 1: The task is to classify news documents as protest (1) or non-protest (0), given the raw document.
Task 2: The task is to classify sentences as containing an event trigger (1) or not (0), given the sentence and the news article containing that sentence.
Task 3: The task is to extract various pieces of information, such as the location, time, and participants of an event, from a given event sentence.
Both Task 1 and Task 2 are binary classification tasks.
Submissions will be evaluated using the F1 score for Task 1 and Task 2.
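For reference, scoring against gold labels can be sketched with scikit-learn's f1_score; whether the official score uses binary, macro, or micro averaging is not stated here, so the averaging below is an assumption to verify.

from sklearn.metrics import f1_score

y_true = [1, 0, 1, 1, 0]  # toy gold labels
y_pred = [1, 0, 0, 1, 0]  # toy model predictions
# F1 on the positive (protest) class; the official averaging may differ.
print(f1_score(y_true, y_pred, average="binary"))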
We will release intermediate results several times during the first phase so that competitors can get feedback on their models.
For the intermediate evaluation of Task 1 and Task 2 we will use news articles from India. For the final evaluation we will use a mixture of news articles from China and India.
Final evaluation will be made on the test set that will be released for the final phase.
For Task 3, the F1 metric will be used as well. The corpus is annotated with the BIO tagging scheme for the various information types.
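For illustration, here is a made-up fragment in the BIO scheme; the exact label inventory (e.g. participant, loc, time) is an assumption based on the task description, so check the train and dev files for the actual labels.

Thousands	B-participant
of	I-participant
workers	I-participant
marched	O
in	O
Delhi	B-loc
on	O
Monday	B-time
.	O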
We will provide intermediate results for Task 3 submissions under the Task 3 phase.
The final results will be given on the test set which will be provided later.
Start: Aug. 9, 2021, 1 p.m.
Description: Task 1: Document classification
Start: Aug. 9, 2021, 1 p.m.
Description: Task 2: Sentence classification
Start: Aug. 9, 2021, 1 p.m.
Description: Task 3: Token classification
End: Never
# | Username | Score
---|---|---
1 | OsmanMutlu | 46.9361