SemEval 2021 Task 11: NLPContributionGraph

Organized by jdsouza


Structuring Scholarly NLP Contributions in the Open Research Knowledge Graph

NLPContributionGraph is introduced as Task 11 at SemEval 2021 for the first time. The task is defined on a dataset of NLP scholarly articles with their contributions structured to be integrable within Knowledge Graph infrastructures such as the Open Research Knowledge Graph. The structured contribution annotations are provided as: (1) Contribution sentences: a set of sentences about the contribution in the article; (2) Scientific terms and relations: a set of scientific terms and relational cue phrases extracted from the contribution sentences; and (3) Triples: semantic statements that pair scientific terms with a relation, modeled toward subject-predicate-object RDF statements for KG building. The Triples are organized under three (mandatory) or more of twelve total information units (viz., ResearchProblem, Approach, Model, Code, Dataset, ExperimentalSetup, Hyperparameters, Baselines, Results, Tasks, Experiments, and AblationAnalysis).

The Shared Task

As a complete submission for the Shared Task, systems will have to extract the following information:

  1. contribution sentences;
  2. scientific term and predicate phrases from the sentences; and
  3. (subject,predicate,object) triple statements toward KG building organized under three or more of twelve total information units.

For example, given the article:

BioBERT: a pre-trained biomedical language representation model for biomedical text mining

Systems should identify:

  • Sentence:
    • We used the BERTBASE model pre-trained on English Wikipedia and BooksCorpus for 1M steps.
  • Scientific Term and Predicate Phrases:
    • used
    • BERTBASE model
    • pre-trained on
    • English Wikipedia
    • BooksCorpus
    • for
    • 1M steps
  • Triples:
    • (Contribution, has, ExperimentalSetup)
    • (ExperimentalSetup, used, BERTBASE model)
    • (BERTBASE model, pre-trained on, English Wikipedia)
    • (BERTBASE model, pre-trained on, BooksCorpus)
    • (BERTBASE model, for, 1M steps)

Note that the above example covers only one contribution-related sentence from the article. Participating systems should identify all contribution-related sentences and then perform the subsequent Phrases and Triples extraction tasks on those sentences, where the Triples extraction task entails categorizing each triples sequence under one of the twelve information units. In the example above, the set of triples pertains to the ExperimentalSetup information unit and, for the evaluation submission, will need to be saved in a file named after the information unit. More details on the task submission format can be found on the Evaluation page.
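To make the categorization step concrete, here is a minimal sketch (not the official scorer or data format) of grouping extracted (subject, predicate, object) triples under their information units, using the convention from the example above that a (Contribution, has, &lt;unit&gt;) statement opens a unit's triple sequence:

```python
from collections import defaultdict

# Triples from the BioBERT example above.
triples = [
    ("Contribution", "has", "ExperimentalSetup"),
    ("ExperimentalSetup", "used", "BERTBASE model"),
    ("BERTBASE model", "pre-trained on", "English Wikipedia"),
    ("BERTBASE model", "pre-trained on", "BooksCorpus"),
    ("BERTBASE model", "for", "1M steps"),
]

def group_by_unit(triples):
    """Assign each triple to the information unit named in the most
    recent (Contribution, has, <unit>) root statement."""
    units = defaultdict(list)
    current_unit = None
    for subj, pred, obj in triples:
        if subj == "Contribution" and pred == "has":
            current_unit = obj  # a new information unit begins here
        units[current_unit].append((subj, pred, obj))
    return dict(units)

grouped = group_by_unit(triples)
```

Each key of `grouped` would then correspond to one per-unit output file (e.g. the ExperimentalSetup triples going into one file), matching the submission layout described on the Evaluation page.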

An NLPContributionGraph submission will be considered complete with predictions made for all 3 tasks (Sentence, Phrases, Triples). The evaluation metrics that will be applied are:

  • Sentences: precision, recall and F1
  • Scientific Term and Predicate Phrases: precision, recall and F1
  • Triples: precision, recall and F1 overall and for each information unit
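The listed metrics all follow the standard set-overlap shape; a hedged sketch (the official evaluation script may apply different matching rules, e.g. for partial phrase matches) is:

```python
def precision_recall_f1(predicted, gold):
    """Set-based precision, recall, and F1 over predicted vs. gold items
    (sentences, phrases, or triples)."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # true positives: exact matches
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For the Triples metric, the same function would be applied once over all triples and once per information unit.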

The focus of NLPContributionGraph is on the structuring of contributions in NLP scholarly articles to form a knowledge graph. To allow a thorough evaluation of systems, NLPContributionGraph will have multiple evaluation phases:

  1. Practice phase:
    • Training and dev data:
      • contribution sentences from NLP scholarly articles across different IE tasks (e.g., machine translation, named entity recognition, etc.)
      • phrases from the annotated contribution sentences
      • triples from the entities under any of the twelve information units listed below
        • ResearchProblem, Approach, Model, Code, Dataset, ExperimentalSetup, Hyperparameters, Baselines, Results, Tasks, Experiments, and AblationAnalysis
    • Test data:
      • In the practice phase, the test data is the dev set itself, i.e., a different set of annotated articles from those chosen as training data
  2. Evaluation Phase 1: End-to-end pipeline testing phase
    • Training and dev data:
      • Same as in the practice phase; no new data will be released
    • Test data:
      • Participants will be provided with unannotated articles that are not part of the training or dev set
      • The server will host the reference annotations for these articles for participant automated system output evaluations
  3. Evaluation Phase 2: Phrases and Triples extraction testing phase
    This phase will have two parts. In Part 1: Phrase Extraction Testing, participant systems will be given the gold-annotated contribution sentences and are expected to provide only their scientific term and predicate phrase extraction output. In Part 2: Triples Extraction Testing, participant systems will be given the gold phrases and are expected to provide their system output just for the triples.
    • Training data:
      • Same as in the practice phase; no new data will be released
    • Test data:
      • Part 1: Gold annotated contribution sentences will be released
      • Part 2: Gold annotated phrases from the contribution sentences
While participation is encouraged in all Evaluation Phases and Parts, it is not required. Please see our Terms and Conditions for more information.

Evaluation Metrics

The evaluation metrics in the Evaluation Phases 1 and 2 will be the standard Precision, Recall, and F-score measures. Details of the evaluation units can be found in our evaluation script or in our Codalab competition configuration yaml file.

Evaluation Submission Format

For Evaluation Phase 1: End-to-end pipeline testing phase, the submission will have to be organized per the following directory structure:

    [task-name-folder]/
        ├── [article-counter-folder]/
        │   ├── sentences.txt
        │   └── entities.txt
        │   └── triples/
        │   │   └── research-problem.txt
        │   │   └── model.txt
        │   │   └── ...                         # each article may be annotated with three or more of the twelve information units
        │   └── ...                             # repeats for all articles annotated in a release
        └── ...                                 # repeats depending on the number of tasks in the release

Please see our Github repository https://github.com/ncg-task/sample-submission for detailed information and for sample system input and output data for each of the Evaluation Phases.

Terms and Conditions

By participating in this task you agree to these terms and conditions. If, however, one or more of these conditions is a concern for you, send us an email and we will consider whether an exception can be made.

  • By submitting results to this competition, you consent to the public release of your scores at this website and at SemEval-2021 workshop and in the associated proceedings, at the task organizers' discretion. Scores may include, but are not limited to, automatic and manual quantitative judgements, qualitative judgements, and such other metrics as the task organizers see fit. You accept that the ultimate decision of metric choice and score value is that of the task organizers.
  • You further agree that the task organizers are under no obligation to release scores and that scores may be withheld if it is the task organizers' judgement that the submission was incomplete, erroneous, deceptive, or violated the letter or spirit of the competition's rules. Inclusion of a submission's scores is not an endorsement of a team or individual's submission, system, or science.
  • This task will have three evaluation phases in all: Evaluation Phase 1: End-to-end pipeline testing; Evaluation Phase 2, Part 1: Phrase extraction testing; and Evaluation Phase 2, Part 2: Triples extraction testing. You may choose to participate in any one or more of them.

    To be considered a valid participation/submission in Evaluation Phase 1: End-to-end pipeline testing phase, you agree to:

    • Submit predictions for all three defined NLPContributionGraph elements (i.e. Sentences, Phrases, and Triples)

    If there are reasons why submission of all elements is not possible, then email us before the evaluation period begins. In special circumstances this may be allowed.
  • Each team must create and use exactly one CodaLab account.
  • Team constitution (members of a team) cannot be changed after the evaluation period has begun.
  • During each evaluation phase:
    • Each team may make up to ten submissions; the top-scoring submission will be considered the official submission to the competition.
  • The organizers and their affiliated institutions make no warranties regarding the datasets provided, including but not limited to their correctness or completeness. They cannot be held liable for providing access to the datasets or for the usage of the datasets.
  • Each task participant will be assigned at least one other team's system description paper for review, using the START system. The papers will thus be peer reviewed.

Practice

Start: Aug. 16, 2020, midnight

Evaluation Phase 1: End-to-end Pipeline Testing

Start: Jan. 10, 2021, midnight

Evaluation Phase 2, Part 1: Phrase Extraction Testing

Start: Jan. 17, 2021, midnight

Evaluation Phase 2, Part 2: Triples Extraction Testing

Start: Jan. 24, 2021, midnight

Post-competition

Start: Jan. 31, 2021, midnight

Competition Ends

Never
