There will be three evaluation scenarios:
Only plain text is given (Subtasks A, B, C)
Plain text with manually annotated keyphrase boundaries is given (Subtasks B, C)
Plain text with manually annotated keyphrases and their types is given (Subtask C)
System output is matched exactly against the gold standard. The standard metrics of precision, recall and F1-score are computed, and the micro-average of those metrics across publications of the three genres is calculated. These metrics are calculated for Subtasks A, B and C.
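The exact-match, micro-averaged scoring described above can be sketched as follows. This is a minimal illustration, not the official scorer: the function name `micro_prf`, the representation of an annotation as a `(start, end, label)` tuple, and the example labels are all assumptions made for the example. Micro-averaging here means that true positives, predictions and gold annotations are pooled across all publications before the metrics are computed.

```python
from typing import Dict, Hashable, Set, Tuple


def micro_prf(
    gold: Dict[str, Set[Hashable]],
    pred: Dict[str, Set[Hashable]],
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 under exact matching.

    `gold` and `pred` map a publication id to a set of annotations.
    An annotation counts as correct only if an identical item appears
    in the gold set for the same publication (hypothetical format:
    a (start, end, label) tuple).
    """
    true_pos = 0
    n_pred = 0
    n_gold = 0
    # Pool counts over every publication seen in either mapping.
    for doc_id in set(gold) | set(pred):
        gold_items = gold.get(doc_id, set())
        pred_items = pred.get(doc_id, set())
        true_pos += len(gold_items & pred_items)
        n_pred += len(pred_items)
        n_gold += len(gold_items)

    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Illustrative data only; offsets and labels are made up.
    gold = {
        "doc1": {(0, 5, "Task"), (10, 15, "Process")},
        "doc2": {(3, 8, "Material")},
    }
    pred = {
        "doc1": {(0, 5, "Task"), (10, 14, "Process")},  # second span is off by one
        "doc2": {(3, 8, "Material"), (20, 25, "Task")},
    }
    p, r, f = micro_prf(gold, pred)
    print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```

In the example data, the prediction `(10, 14, "Process")` differs from the gold span by one character and so earns no credit, which is what exact matching implies.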
Participants may use additional external resources, as long as they declare this at submission time. However, participants may not manually annotate the test data.