LaySumm - Overview
Lay Summary Generation (LaySumm) is a shared task on generating lay summaries for scientific documents. It is one of three shared tasks conducted as part of the 1st Workshop on Scholarly Document Processing.
To ensure and increase the relevance of science for all of society and not just a small group of niche practitioners, researchers have been increasingly tasked by funders and publishers to outline the scope of their research for the general public by writing a summary for a lay audience, or lay summary. The LaySumm summarization task considers automating this responsibility by enabling systems to generate lay summaries automatically.
The CL-LaySumm Shared Task is to automatically produce Lay Summaries of technical texts (scientific research articles). A Lay Summary is defined as a textual summary intended for a non-technical audience. It is typically produced either by the authors or by a journalist or commentator. In more detail, a lay summary explains, succinctly and without using technical jargon, what the overall scope, goal and potential impact of a scientific paper is. The task is to generate summaries that are representative of the content, comprehensible, and interesting to a lay audience. The corpus for this task comprises full-text papers with lay summaries, drawn from a number of journals and covering three distinct domains: epilepsy, archeology, and materials engineering. Elsevier has made available a collection of lay summaries from a multidisciplinary collection of journals, as well as the abstracts and full text from these journals.
Please see the workshop page for more details: https://ornlcda.github.io/SDProc/sharedtasks.html
Intrinsic evaluation will use ROUGE, specifically ROUGE-1, ROUGE-2, and ROUGE-L. In addition, a randomly selected subset of the summaries will undergo human evaluation.
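For participants who want a rough local check before submitting, the three ROUGE variants named above can be sketched as follows. This is a simplified illustration, not the organizers' scoring script: it assumes lowercased whitespace tokenization, no stemming or stopword removal, a single reference per summary, and a plain F1 score. The function names (`tokenize`, `rouge_n`, `rouge_l`) are hypothetical helpers introduced here, and official scores may differ from what this sketch produces.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split; the official setup may differ.
    return text.lower().split()


def ngram_counts(tokens, n):
    # Multiset of n-grams in a token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, candidate_total, reference_total):
    # Harmonic mean of precision and recall; 0.0 when nothing overlaps.
    if overlap == 0 or candidate_total == 0 or reference_total == 0:
        return 0.0
    precision = overlap / candidate_total
    recall = overlap / reference_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    # ROUGE-N: clipped n-gram overlap between candidate and reference.
    cand = ngram_counts(tokenize(candidate), n)
    ref = ngram_counts(tokenize(reference), n)
    overlap = sum((cand & ref).values())
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    # Length of the longest common subsequence (row-by-row dynamic programming).
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    # ROUGE-L: F1 based on the longest common subsequence of tokens.
    cand = tokenize(candidate)
    ref = tokenize(reference)
    return f1(lcs_length(cand, ref), len(cand), len(ref))


if __name__ == "__main__":
    system = "the cat sat on the mat"
    gold = "the cat lay on the mat"
    print("ROUGE-1:", rouge_n(system, gold, 1))
    print("ROUGE-2:", rouge_n(system, gold, 2))
    print("ROUGE-L:", rouge_l(system, gold))
```

Because tokenization and stemming choices shift ROUGE values noticeably, numbers from a sketch like this are best used only to compare system variants against each other, not to predict leaderboard scores.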
Terms and Conditions
By participating in this task you agree to these terms and conditions:
- By submitting results to this competition, you consent to the public release of your scores at LaySumm2020 and in the associated proceedings, at the task organizers' discretion.
- You accept that the ultimate decision of metric choice and score value is that of the task organizers.
- Scores may include, but are not limited to, automatic and manual quantitative judgements, qualitative judgements, and such other metrics as the task organizers see fit.
- You agree that the task organizers are under no obligation to release scores and that scores may be withheld if it is the task organizers' judgement that the submission was incomplete, erroneous, deceptive, or violated the letter or spirit of the competition's rules.
- Inclusion of a submission's scores is not an endorsement of a team or individual's submission, system, or science.
- A participant may be a member of exactly one team.
- A participant may not use any external resource to produce machine-generated summaries, other than those provided by the LaySumm shared task.
- The submitted code must generate predictions using a machine learning model. Submissions making hard-coded predictions are forbidden.
- Users may make use of open source libraries given proper attribution. At the end of the competition, we encourage all code to be open-sourced so the results can be reproduced.
- Each team must create and use exactly one CodaLab account.
- Team constitution (members of a team) cannot be changed after the evaluation period has begun.
- During the evaluation period, each team may make up to 100 submissions.
- Once the competition is over, we will release the gold test set and you will be able to determine results on various system variants you may have developed. We encourage you to report results on all of your systems (or system variants) in the system description paper. However, we will ask you to clearly indicate the result of your official submission.
- The organizers and their affiliated institutions cannot be held liable for providing access to the datasets or the usage of the datasets.
- The training dataset should only be used for scientific or research purposes. Any other use is explicitly prohibited.
- The datasets must not be redistributed or shared in part or full with any third party.
- If you use any of the datasets provided here, or discuss your work developed for this workshop, please cite the Workshop page: ‘First Workshop on Scholarly Document Processing, 2020, Lay Summary Task https://ornlcda.github.io/SDProc/index.html (LaySumm2020)’. Once the results of the task have been published, we will share a final citation to the paper describing the outputs of this task.
If one or more of these conditions is a concern for you, please send an email to Anita de Waard (firstname.lastname@example.org) and the organizers will consider whether an exception can be made.