GermEval 2019 Task 1 -- Shared task on hierarchical classification of blurbs


Shared Task on Hierarchical Classification of Blurbs

Hierarchical multi-label classification (HMC) of blurbs is the task of assigning multiple labels to a short descriptive text, where each label is part of an underlying hierarchy of categories. Large datasets often incorporate such a hierarchy, which can be used to categorize documents at different levels of specificity. Traditional multi-class text classification is thoroughly researched, but with the growing amount of available digital documents and the need for ever finer-grained categories, these approaches fail to generalize adequately, so more robust and sophisticated classification methods are required.

With this task we aim to foster research in this area. The task focuses on classifying German books into their hierarchically structured writing genres, using short advertisement texts (blurbs) and further meta information such as author, number of pages, release date, etc.

GermEval is a series of shared task evaluation campaigns that focus on Natural Language Processing for the German language. The workshop of this shared task will be held in conjunction with the Conference on Natural Language Processing KONVENS 2019 in Erlangen/Nürnberg.

Submission rules

System submissions are done in teams. There is no restriction on the number of people in a team, and a participant is allowed to be a member of multiple teams, so splitting up into teams with overlapping members is possible. Every participating team may submit 3 different systems to the competition. For submissions in the final evaluation phase, every team must name their submission (both the .zip and the actual submission .txt file) in the form "[Teamname]__[Systemname]" (note the two underscores!). For example, your submission could look like

Funtastic4__SVM_NAIVEBAYES_ensemble1.zip
   |
   +-- Funtastic4__SVM_NAIVEBAYES_ensemble1.txt

We also ask you to put exactly this name into the description before submitting your system. This identification is needed to correctly associate each submitted system with its description paper, so please make sure to write the name exactly as it will appear in your description paper (i.e. case sensitive). If your submission does not follow these rules, it might not be evaluated. The evaluation script has been adapted to check these formalities.
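
The following minimal sketch shows one way to package a prediction file under the required naming scheme. It is not part of the official tooling; the team and system names are the example names from above, and "predictions.txt" is a hypothetical placeholder for your system output.

# Minimal packaging sketch (assumptions: your predictions already sit in predictions.txt).
import zipfile

team, system = "Funtastic4", "SVM_NAIVEBAYES_ensemble1"   # replace with your own names
base = f"{team}__{system}"                                # note the two underscores

with zipfile.ZipFile(f"{base}.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    # Store the prediction file inside the archive under the required name.
    zf.write("predictions.txt", arcname=f"{base}.txt")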

Only the person who makes the submission is required to register for the competition; all team members need to be stated in the description paper of the submitted system. The last submission of a system will be used for the final evaluation. Participants will see whether their submission succeeds, but there will be no feedback regarding the score; the leaderboard is therefore disabled during the test phase.

The evaluation script is provided with the data so that participants can evaluate their own data splits. evaluation.py takes two parameters: the path to the input folder and the path to the output folder. The input folder must contain two files: the output of the system, answer.txt, and the gold data, gold.txt. The result files are then written to the output folder.
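
As a sketch of local usage (the folder names here are placeholders, not prescribed by the task), the script could be invoked on a held-out split like this:

# Assumed layout:
#   my_run/input/answer.txt   <- your system output
#   my_run/input/gold.txt     <- gold labels for the same split
# Score files are then written to my_run/output/.
import subprocess

subprocess.run(
    ["python", "evaluation.py", "my_run/input", "my_run/output"],
    check=True,  # raise an error if the script fails
)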

Evaluation Criteria

Classification systems will be ranked on the leaderboard by the micro-averaged F1-score, one of the most common metrics for evaluating multi-label as well as hierarchical classification tasks (Silla and Freitas, 2011). The detailed test report additionally includes micro recall, micro precision and subset accuracy. The latter is reported because it measures how well labels are selected in relation to each other. All submissions should be formatted as shown below (for submissions to both tasks), written into a file called answer.txt and uploaded as a zipped file:

subtask_a
ISBN<Tab>Label1<Tab>Label2<Tab>....<Tab>Label_n
ISBN<Tab>Label1<Tab>Label2<Tab>....<Tab>Label_n...
subtask_b
ISBN<Tab>Label1<Tab>Label2<Tab>....<Tab>Label_n
ISBN<Tab>Label1<Tab>Label2<Tab>....<Tab>Label_n...

The label order is irrelevant. The scoring program handles the assigned labels as a set, so duplicates are ignored. This is not an issue for the hierarchical task because every child has exactly one parent, which makes the labels unambiguous.

If only one subtask is submitted, only the respective header (subtask_a or subtask_b) and the label assignments have to be written to the submission file.
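
The following is an unofficial sketch of the metrics described above, computed over label sets keyed by ISBN in the tab-separated format shown above. The function names and the single-subtask assumption are mine, not part of the official evaluation script.

# Parse the label sets for one subtask block of a file in the format above.
def read_labels(path, subtask="subtask_a"):
    labels, current = {}, None
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if line.startswith("subtask_"):
                current = line
            elif line and current == subtask:
                isbn, *cats = line.split("\t")
                labels[isbn] = set(cats)      # duplicates collapse into a set
    return labels

# Micro-averaged precision/recall/F1 and subset accuracy; ISBNs are taken from the gold file.
def evaluate(gold, pred):
    tp = sum(len(gold[i] & pred.get(i, set())) for i in gold)
    fp = sum(len(pred.get(i, set()) - gold[i]) for i in gold)
    fn = sum(len(gold[i] - pred.get(i, set())) for i in gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Subset accuracy: the predicted set must match the gold set exactly.
    subset_acc = sum(pred.get(i, set()) == gold[i] for i in gold) / len(gold)
    return precision, recall, f1, subset_acc

# Example: print(evaluate(read_labels("gold.txt"), read_labels("answer.txt")))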

 

References
  • Silla, C. N. and Freitas, A. A. (2011). A survey of hierarchical classification across different application domains. Data Mining and Knowledge Discovery, 22(1-2):31–72.

Terms and Conditions

The copyright to all blurbs belongs to Random House, its licensors, vendors and/or its content providers, since the blurbs were obtained from the randomhouse.de website. The blurbs serve promotional/public purposes, and Random House has granted permission to share this dataset. The dataset is redistributed under the Creative Commons license CC BY-NC.

By participating in this competition, you consent to the public release of your scores at the GermEval 2019 workshop and in the respective proceedings, at the task organizers' discretion. The scoring of a system may include, but is not limited to, the metrics mentioned on this page. The final decision on the choice of metric and the score value is made by the task organizers. In case the organizers judge a submission as incomplete, deceptive or in violation of the competition's rules, scores may be withheld. The system of a participating team will be named according to the team name provided at the time of submission, or an abbreviation selected by the task organizers.

Important Dates

  • Jan-2019: Release of trial data
  • 01-Feb-2019: Release of training data (train + validation)
  • 01-Jun-2019: Release of test data
  • 15-Jul-2019: Final submission of test results
  • 31-Jul-2019: Submission of description paper
  • 8-Oct-2019: Workshop in Erlangen, Germany at the Conference on Natural Language Processing KONVENS 2019.

Subtasks

This shared task consists of two subtasks, described below. You can participate in one of them or in both.

Subtask A

The task is to classify German books into one or multiple of the most general writing genres (d=0). It can therefore be considered a multi-label classification task. In total, there are 8 classes that can be assigned to a book: Literatur & Unterhaltung, Ratgeber, Kinderbuch & Jugendbuch, Sachbuch, Ganzheitliches Bewusstsein, Glaube & Ethik, Künste, Architektur & Garten.

Subtask B

The second task is a hierarchical multi-label classification into multiple writing genres. In addition to the most general writing genres, genres of finer specificity can be assigned to a book. In total, there are 343 hierarchically structured classes.

Data

A sample set is now available to get familiar with the general structure of the data and the evaluation system.

The dataset consists of information on German books (blurb, genres, title, author, URL, ISBN, date of publication) crawled from the Random House page. The rights to all blurbs and further meta information belong to Random House. The date of publication normally refers to the particular version of the book. Genres that capture properties relating not to the content but to the shape or form of a book were removed. Since every ISBN and URL appears only once, all blurbs should in theory be unique; in exceptional cases, however, book blurbs can be very similar, for example if a book is part of a series. There are some other anomalies in the documents, e.g. missing authors and inaccurate publication dates. Furthermore, multiple publications of a book can have differing blurbs although author and title are identical, and vice versa. However, only around 1% of all blurbs are affected by these anomalies.

An example entry is shown below:

<book date="2019-01-04" xml:lang="de">
<title>Blueberry Summer</title>
<body>In neuer Sommer beginnt für Rory und Isabel – mit einer kleinen Neuerung: Rory ist fest mit Connor Rule zusammen und deshalb als Hausgast in den Hamptons. Und genau das bringt Komplikationen mit sich, denn irgendwie scheint Connor ein Problem damit zu haben, dass Rory nicht mehr für seine Familie arbeitet. Isabel dagegen arbeitet zur Überraschung aller als Kellnerin, um einen süßen Typen zu beeindrucken – irgendwie muss sie ja über ihre Affäre mit Mike hinwegkommen. Das klappt ganz gut, bis Rory auf Isabels Neuen trifft ... Und Isabel wieder auf Mike.</body>
<copyright>(c) Verlagsgruppe Random House GmbH</copyright>
<categories>
<category>
<topic d="0">Kinderbuch & Jugendbuch </topic>
<topic d="1" label="True">Liebe, Beziehung und Freundschaft</topic>
</category>
<category>
<topic d="0">Kinderbuch & Jugendbuch </topic>
<topic d="1" label="True">Echtes Leben, Realistischer Roman</topic>
</category>
</categories>
<authors>Joanna Philbin</authors>
<published>2015-02-09</published>
<isbn>9780451457998</isbn>
<url>https://www.randomhouse.de/Taschenbuch/Blueberry-Summer/Joanna-Philbin/cbj-Jugendbuecher/e455949.rhd%0A/</url>
</book>
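
As a minimal sketch (not the official loader), entries like the one above could be read with xml.etree.ElementTree. It assumes the released file is a plain sequence of <book> elements, so they are wrapped in a dummy root here; the ampersand handling is an assumption for unescaped "&" in genre names such as "Kinderbuch & Jugendbuch".

import xml.etree.ElementTree as ET

def load_books(path):
    with open(path, encoding="utf-8") as f:
        # Unescaped ampersands would otherwise break the XML parser.
        text = f.read().replace(" & ", " &amp; ")
    root = ET.fromstring("<root>" + text + "</root>")
    books = []
    for book in root.iter("book"):
        # d="0" topics are the most general genres used in Subtask A.
        genres = {t.text.strip() for t in book.iter("topic") if t.get("d") == "0"}
        books.append({
            "isbn": book.findtext("isbn"),
            "title": book.findtext("title"),
            "blurb": book.findtext("body"),
            "genres": sorted(genres),
        })
    return books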

Key characteristics

The following table shows important quantitative characteristics of the total dataset:

Number of books:         20,784
Average number of words: 94.67
Number of classes:       343 (8, 93, 242 on levels 1, 2, 3, respectively)

Information for Subtask B: Exactly one parent is assigned to a child genre. The underlying hierarchy is a forest. The most specific writing genre of a book is not necessarily a leaf node.
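
Because each child genre has exactly one parent, a label set can be expanded upward through the hierarchy. The sketch below assumes a child-to-parent mapping has already been built from the hierarchy provided with the data; how that map is constructed is left out here.

# `parent` maps each child genre to its single parent; roots (d=0 genres) are absent.
def with_ancestors(labels, parent):
    expanded = set(labels)
    for label in labels:
        while label in parent:          # climb until a root genre is reached
            label = parent[label]
            expanded.add(label)
    return expanded

# Hypothetical example:
# parent = {"Liebe, Beziehung und Freundschaft": "Kinderbuch & Jugendbuch"}
# with_ancestors({"Liebe, Beziehung und Freundschaft"}, parent)
# -> {"Liebe, Beziehung und Freundschaft", "Kinderbuch & Jugendbuch"}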

Organizers

The shared task is organized by Rami Aly, Steffen Remus and Chris Biemann from the Language Technology group of the University of Hamburg, Germany.

Preparation

Start: Jan. 11, 2019, midnight

Description: Submit practice predictions on the sample dataset. Use this to check your file format. A sample submission is available for download under the tab Participate/Files.

Evaluation Validation Set

Start: Feb. 1, 2019, midnight

Description: Submit predictions for the validation set. The scoreboard will be enabled during this phase.

Evaluation Test Set

Start: June 1, 2019, 11:59 p.m.

Description: Submit predictions for the test set. Results from this phase will be used to assess the performance of a submission to this shared task. The scoreboard is disabled. Please adhere to the submission formalities as explained in https://competitions.codalab.org/forums/17948/3701/ ; otherwise, the submission might not be evaluated!

Post Evaluation

Start: July 15, 2019, midnight

Description: For evaluation after the competition ends. Submit additional test set predictions.

Competition Ends

Never
