SemEval 2018 Task 8: SecureNLP (Semantic Extraction from CybersecUrity REports using NLP)

Organized by PeterP



This is the CodaLab competition for the four subtasks of SemEval-2018 Task 8: SecureNLP (Semantic Extraction from CybersecUrity REports using Natural Language Processing).

This competition follows the tasks described in our ACL 2017 paper: MalwareTextDB: A Database for Annotated Malware Articles.

Task Summary

As the world becomes more connected and digitized, cyber-threats are also becoming more dangerous. A large body of malware-related text is available online, including detailed malware reports published by cybersecurity companies such as Symantec and Cylance, as well as various blog posts. Cybersecurity researchers often use such texts in the process of data collection. However, the sheer volume and diversity of these texts make it difficult for researchers to quickly obtain useful information.

A potential application of NLP can be to quickly highlight critical information from these texts, such as the specific actions taken by a certain malware. This can help researchers quickly understand the capabilities of a specific malware and search in other texts for malware with similar capabilities.

We defined 4 subtasks for this competition:

  1. SubTask 1: Classify sentences as relevant or irrelevant to malware
  2. SubTask 2: Predict token labels
  3. SubTask 3: Predict relation labels
  4. SubTask 4: Predict attribute labels

A more detailed explanation of each subtask can be found in our ACL paper. We exclude subtask 5 because its gold answers can be obtained using sandboxes such as Cuckoo Sandbox.

The participants are free to use any external data for this competition.

Important Dates:
14 Aug 2017: Trial data release
18 Sep 2017: Training data release
8 Jan 2018:   Test data release
29 Jan 2018: Evaluation end


You can join our mailing list; we will post updates and announcements there from time to time.

Organizers: Peter Phandi, Wei Lu (Singapore University of Technology and Design)


Evaluation Criteria

We will use F1 score as our evaluation metric, following our paper in ACL 2017.
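As a reminder, F1 is the harmonic mean of precision and recall. The following is our own minimal helper for a binary label set, not the official scorer:

```python
def f1_score(gold, pred, positive=1):
    """Precision/recall-based F1 for a single positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# 2 true positives, 1 false positive, 1 false negative:
# precision = recall = 2/3, so F1 = 2/3
print(round(f1_score([1, 1, 0, 1, 0], [1, 1, 1, 0, 0]), 3))  # 0.667
```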

We will provide the input data for each subtask, and the participants will need to make predictions based on the given input data. Name your output files for SubTasks 1, 2, 3, and 4 as Task1.out, Task2.out, Task3.out, and Task4.out respectively. The CodaLab submission format requires the submission to be a single zip file containing the output files. We provide a sample file containing the benchmark outputs as an example of what a submission looks like.
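To illustrate the packaging step, here is a sketch that bundles the four output files into one zip with Python's standard zipfile module. The zip name "submission.zip" and the placeholder file contents are our own choices for the example:

```python
import zipfile

out_files = ["Task1.out", "Task2.out", "Task3.out", "Task4.out"]

# Write placeholder outputs so the example runs end to end.
for name in out_files:
    with open(name, "w") as f:
        f.write("0\n")

# CodaLab expects a single zip containing the output files.
with zipfile.ZipFile("submission.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for name in out_files:
        zf.write(name)  # stored at the root of the archive
```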

SubTask 1

We will provide a list of sentences; the participants need to predict whether each sentence is relevant for inferring the malware's actions and capabilities.

For each sentence, the participants will need to output 1 if the sentence is relevant or 0 if it is irrelevant.
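For instance, writing the predictions to Task1.out might look like the sketch below. The one-label-per-line layout and the prediction values are illustrative assumptions:

```python
# Hypothetical relevance predictions, one per sentence,
# in the sentences' order of appearance.
predictions = [1, 0, 1]

with open("Task1.out", "w") as f:
    for label in predictions:
        f.write(f"{label}\n")
```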

SubTask 2

Using the same sentences as provided in Task 1, the participants need to predict the token labels in the sentences. The output needs to be in BIO format. There are 3 types of token labels: "Action", "Entity", and "Modifier". 
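In BIO format, the first token of a labelled span is tagged B-&lt;label&gt;, subsequent tokens inside the span are tagged I-&lt;label&gt;, and all other tokens are tagged O. A sketch with a made-up sentence and spans:

```python
tokens = ["The", "malware", "deletes", "the", "backup", "files", "."]
# Hypothetical labelled spans: (first_token, last_token_exclusive, label)
spans = [(2, 3, "Action"), (3, 6, "Entity")]

tags = ["O"] * len(tokens)
for start, end, label in spans:
    tags[start] = f"B-{label}"          # span opens with B-
    for i in range(start + 1, end):
        tags[i] = f"I-{label}"          # remaining span tokens get I-

for tok, tag in zip(tokens, tags):
    print(f"{tok}\t{tag}")
```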

SubTask 3

We will provide a different set of sentences along with their token labels, in separate files according to their source documents. The participants need to predict the relations between the token labels. This task is treated as a binary classification task: for each entity pair, the participants need to output O if there is no relation between the entities, or <relation_type> if there is a relation between them. The relation types are: "SubjAction", "ActionObj", "ActionMod", and "ModObj". Token 0 is reserved for the root; any other token without a parent will be connected to this root entity with the 'ROOT' relation.
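As an illustration, the per-pair decisions could be serialised like this. The tab-separated layout, the entity IDs, and the predicted relations are our own assumptions, not the official format:

```python
# Hypothetical candidate entity pairs and predicted relations; "O" means none.
pairs = [("T1", "T2"), ("T2", "T3"), ("T1", "T3")]
predicted = {("T1", "T2"): "SubjAction", ("T2", "T3"): "ActionObj"}

# One tab-separated line per pair; pairs with no relation get "O".
lines = [f"{a}\t{b}\t{predicted.get((a, b), 'O')}" for a, b in pairs]
print("\n".join(lines))
```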

Here's an example output for SubTask 3:

[Example SubTask 3 output]

SubTask 4

We will provide a different set of sentences, annotated with their token labels and relation labels. The sentences will be separated into several files based on the documents they came from. The participants need to predict the attributes for each 'Action' token in 4 categories: 'ActionName', 'Capability', 'StrategicObjectives', and 'TacticalObjectives'. Output 'O' if the 'Action' token does not correspond to any attribute in that category. The detailed list of attributes can be found in the MalwareTextDB annotation guidelines.

The annotations given will follow the brat annotation format. Each annotation line is given an ID depending on its type: annotations for token labels start with 'T', annotations for relation labels start with 'R', and annotations for attribute labels start with 'A'. Please refer to the linked webpage for a more detailed explanation. We did not include the attribute labels in the input files, since these are what the participants need to predict.
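A small sketch of reading these three annotation types from a brat .ann file. The example lines and the attribute value "SomeCapability" are invented for illustration:

```python
def parse_ann(lines):
    """Split brat standoff lines into T (token), R (relation), A (attribute)."""
    tokens, relations, attributes = {}, {}, {}
    for line in lines:
        ann_id, rest = line.rstrip("\n").split("\t", 1)
        if ann_id.startswith("T"):
            label_span, text = rest.split("\t")
            label, start, end = label_span.split()
            tokens[ann_id] = (label, int(start), int(end), text)
        elif ann_id.startswith("R"):
            rel, arg1, arg2 = rest.split()
            relations[ann_id] = (rel, arg1.split(":", 1)[1], arg2.split(":", 1)[1])
        elif ann_id.startswith("A"):
            attributes[ann_id] = tuple(rest.split())
    return tokens, relations, attributes

# Invented example annotations in brat standoff style.
example = [
    "T1\tEntity 0 11\tThe malware",
    "T2\tAction 12 19\tdeletes",
    "R1\tSubjAction Arg1:T1 Arg2:T2",
    "A1\tCapability T2 SomeCapability",
]
tokens, relations, attributes = parse_ann(example)
print(relations["R1"])  # ('SubjAction', 'T1', 'T2')
```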

Output format (separated by tab):

<doc_id> <action_label> <action_name> <capability> <strategic_objectives> <tactical_objectives>

Here's an example of what the output looks like:

[Sample output for SubTask 4]

For SubTasks 3 and 4, we will provide a file called 'doclist.txt' which contains the document file names. The doc_id is the row number of that document in 'doclist.txt', starting from index 0.

For example, given a doclist.txt with this content:

Operation_Snowman
The_Monju_Incident

Operation_Snowman will have the doc_id 0 and The_Monju_Incident will have the doc_id 1.
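In other words, the doc_id is simply the zero-based line index, e.g.:

```python
# Contents of doclist.txt, one document name per row.
doclist = ["Operation_Snowman", "The_Monju_Incident"]

# doc_id = row number in doclist.txt, starting from 0.
doc_id = {name: i for i, name in enumerate(doclist)}
print(doc_id["Operation_Snowman"])   # 0
print(doc_id["The_Monju_Incident"])  # 1
```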

For all subtasks, the output must follow the order in which the sentences appear in the document, and the order of the documents in doclist.txt (if given). Additionally, for SubTask 4, the action labels within each document must be output in ascending order of their numbers (T2, T8, T13, etc.).
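Note that plain string sorting would put T13 before T2; the numeric suffix has to be compared as an integer, e.g.:

```python
action_ids = ["T13", "T2", "T8"]
# Strip the leading 'T' and compare the remaining digits numerically.
action_ids.sort(key=lambda t: int(t[1:]))
print(action_ids)  # ['T2', 'T8', 'T13']
```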

Terms and Conditions

By submitting results to this competition, you consent to the public release of your scores at the SemEval-2018 workshop and in the associated proceedings, at the task organizers' discretion. Scores may include, but are not limited to, automatic and manual quantitative judgements, qualitative judgements, and such other metrics as the task organizers see fit. You accept that the ultimate decision of metric choice and score value is that of the task organizers.

You further agree that the task organizers are under no obligation to release scores and that scores may be withheld if it is the task organizers' judgement that the submission was incomplete, erroneous, deceptive, or violated the letter or spirit of the competition's rules. Inclusion of a submission's scores is not an endorsement of a team or individual's submission, system, or science.

You further agree that your system may be named according to the team name provided at the time of submission, or to a suitable shorthand as determined by the task organizers.

You agree not to redistribute the test data except in the manner prescribed by its licence.


# Username Score
1 lukedorney 0.55
2 yingBear 0.53
3 williamvt 0.51