GermEval 2019 Task 1 -- Shared task on hierarchical classification of blurbs

Organized by Raly


Evaluation Test Set
June 1, 2019, midnight UTC


Post Evaluation
July 28, 2019, noon UTC


Competition Ends

Shared Task on Hierarchical Classification of Blurbs

The test labels can now be found under the Files tab in the Participate section.

Please note the extended system submission deadline (July 27th, 2019) and the extended paper deadline (August 7th, 2019).

Results for the test-phase are available here.

Please note that we still accept system description papers which have systems submitted in the post-evaluation phase. These system submissions will be clearly distinguished from systems submitted in the test-phase and the paper should clearly indicate that the results were produced after the final submission deadline.

Hierarchical multi-label classification (HMC) of blurbs is the task of assigning multiple labels to a short descriptive text, where each label is part of an underlying hierarchy of categories. The increasing amount of available digital documents and the need for more and finer-grained categories call for new, more robust and sophisticated text classification methods. Large datasets often incorporate a hierarchy which can be used to categorize documents on different levels of specificity. The traditional multi-class text classification approach is thoroughly researched. However, as the amount of available data grows and more specific hierarchies become necessary, traditional approaches fail to generalize adequately, which increases the need for more robust and sophisticated classification methods.

With this task we aim to foster research within this context. The task focuses on classifying German books into their respective hierarchically structured writing genres using short advertisement texts (blurbs) and further meta information such as author, number of pages, release date, etc.

GermEval is a series of shared task evaluation campaigns that focus on Natural Language Processing for the German language. The workshop of this shared task will be held in conjunction with the Conference on Natural Language Processing KONVENS 2019 in Erlangen/Nürnberg.

Submission Rules

System submissions are made in teams. There is no restriction on the number of people in a team. However, keep in mind that a participant is allowed to be in multiple teams, so splitting up into teams with overlapping members is a possibility. Every participating team is allowed to submit 3 different systems to the competition. For submissions in the final evaluation phase, every team must name its submission (the .zip and the actual submission .txt file) in the form "[Teamname]__[Systemname]" (note the two underscores!). For example, your submission could look like
	+-- Funtastic4__SVM_NAIVEBAYES_ensemble1.txt

We also ask you to put exactly this name into the description before submitting your system. This identification method is needed to correctly associate each submitted system with its description paper. Thus, please make sure to write the name exactly as it will appear in your description paper (i.e. case sensitive). If your submission does not follow these rules, it might not be evaluated. The evaluation script has been adapted to include a formality check.
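The naming rule above can be checked before uploading. The following is a minimal sketch, not the organizers' formality check: the function name `parse_submission_name` is hypothetical, and it assumes that the first double underscore separates the team name from the system name.

```python
from pathlib import Path


def parse_submission_name(filename: str) -> tuple[str, str]:
    """Split '[Teamname]__[Systemname].txt' (or .zip) into (team, system).

    Assumption: the first '__' is the separator, so the team name itself
    must not contain a double underscore.
    """
    path = Path(filename)
    if path.suffix not in {".txt", ".zip"}:
        raise ValueError(f"expected a .txt or .zip file, got {filename!r}")
    team, separator, system = path.stem.partition("__")
    if not separator or not team or not system:
        raise ValueError(f"{filename!r} is not of the form Teamname__Systemname")
    return team, system


team, system = parse_submission_name("Funtastic4__SVM_NAIVEBAYES_ensemble1.txt")
print(team, system)
```

Because the name is case sensitive, comparing the returned pair against the spelling used in the description paper is a cheap final check.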

Only the person who makes the submission is required to register for the competition. All team members need to be stated in the description paper of the submitted system. The last submission of a system will be used for the final evaluation. Participants will see whether the submission succeeds; however, there will be no feedback regarding the score. The leaderboard will thus be disabled during the test phase.

The evaluation script is provided with the data so that participants can still evaluate their own data splits. The script takes two input parameters: the path of the input folder and the path of the output folder. The input folder must contain two files: the output of the system, named with the scheme described above, and the gold data, gold.txt. The output files will then be written into the output folder.
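Before running the evaluation script locally, the input folder layout described above can be verified. This is only a sketch under stated assumptions: `check_input_folder` is a hypothetical helper, not part of the provided script, and it assumes the system output is a .txt file whose name contains the double underscore.

```python
import tempfile
from pathlib import Path


def check_input_folder(input_dir: str) -> Path:
    """Return the system output file if the folder holds exactly gold.txt plus one system file."""
    names = sorted(p.name for p in Path(input_dir).iterdir() if p.is_file())
    if "gold.txt" not in names:
        raise FileNotFoundError("gold.txt is missing from the input folder")
    others = [name for name in names if name != "gold.txt"]
    if len(others) != 1:
        raise ValueError(f"expected exactly one system output file, found {others}")
    # Assumption: the system output follows the Teamname__Systemname.txt scheme.
    if not others[0].endswith(".txt") or "__" not in Path(others[0]).stem:
        raise ValueError(f"{others[0]!r} does not follow the naming scheme")
    return Path(input_dir) / others[0]


# Usage: build a throwaway folder containing the two expected files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "gold.txt").write_text("", encoding="utf-8")
    (Path(tmp) / "Funtastic4__SVM_NAIVEBAYES_ensemble1.txt").write_text("", encoding="utf-8")
    system_file_name = check_input_folder(tmp).name
    print(system_file_name)
```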

Evaluation Criteria

Classification systems will be listed on the leaderboard based on the micro-averaged F1-score, which is one of the most common metrics to evaluate multi-label as well as hierarchical classification tasks (Silla and Freitas, 2011). The detailed test report will additionally include micro recall, micro precision, and the subset accuracy. The latter metric is captured because it measures how well labels are selected in relation to each other. All submissions should be formatted as shown below (submissions to both tasks) and written into a file, with the naming scheme as described above, and finally uploaded as a zipped file:


The label order is irrelevant. The scoring program handles the assigned labels as a set; duplicates are thus ignored. This is not an issue for the hierarchical task because every child has exactly one parent, so the hierarchy is unambiguous.
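To make the metrics concrete, here is a minimal sketch of micro-averaged precision, recall, F1, and subset accuracy over label sets. It is not the official scoring program; the function name `micro_scores` and the dictionary-based input are assumptions chosen for illustration.

```python
def micro_scores(gold: dict, predicted: dict) -> dict:
    """Compute micro precision/recall/F1 and subset accuracy.

    gold, predicted: map a book identifier to an iterable of labels.
    Labels are compared as sets, so order and duplicates do not matter.
    """
    true_pos = false_pos = false_neg = exact_matches = 0
    for book_id, gold_labels in gold.items():
        gold_set = set(gold_labels)
        pred_set = set(predicted.get(book_id, ()))
        true_pos += len(gold_set & pred_set)
        false_pos += len(pred_set - gold_set)
        false_neg += len(gold_set - pred_set)
        exact_matches += gold_set == pred_set  # the whole label set must match
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    subset_accuracy = exact_matches / len(gold) if gold else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "subset_accuracy": subset_accuracy}


gold = {"book1": ["Ratgeber"], "book2": ["Sachbuch", "Künste"]}
predicted = {"book1": ["Ratgeber", "Ratgeber"], "book2": ["Sachbuch"]}
scores = micro_scores(gold, predicted)
print(scores)
```

In this toy example the duplicated label for book1 has no effect, while the missing label for book2 lowers recall and makes that book count as a miss for subset accuracy.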

If only one subtask is being submitted, only the respective header (subtask_a or subtask_b) and the label assignments have to be written to the submission file.
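A submission file with one or both headers could be assembled along the following lines. Only the header names subtask_a and subtask_b come from this page; the line layout (book identifier followed by its labels, separated by tabs) and the function name `format_submission` are assumptions and should be checked against the sample submission under Participate/Files.

```python
def format_submission(predictions_a=None, predictions_b=None) -> str:
    """Build the text of a submission file.

    predictions_a / predictions_b: map a book identifier to a list of labels,
    or None if that subtask is not submitted (its header is then omitted).
    Assumption: one line per book, fields separated by tabs.
    """
    if predictions_a is None and predictions_b is None:
        raise ValueError("at least one subtask must be submitted")
    lines = []
    for header, predictions in (("subtask_a", predictions_a),
                                ("subtask_b", predictions_b)):
        if predictions is None:
            continue  # skip the header of a subtask that is not submitted
        lines.append(header)
        for book_id, labels in predictions.items():
            unique_labels = list(dict.fromkeys(labels))  # drop duplicates, keep order
            lines.append("\t".join([book_id, *unique_labels]))
    return "\n".join(lines) + "\n"


text = format_submission(
    predictions_a={"book1": ["Kinderbuch & Jugendbuch"]},
    predictions_b={"book1": ["Kinderbuch & Jugendbuch", "Echtes Leben, Realistischer Roman"]},
)
print(text)
```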


  • Silla, C. N. and Freitas, A. A. (2011). A survey of hierarchical classification across different application domains. Data Mining and Knowledge Discovery, 22(1-2):31–72.

Terms and Conditions

The copyright to all blurbs belongs to Random House, its licensors, vendors and/or its content providers, since the blurbs were obtained through the website. The blurbs serve promotional/public purposes and permission has been granted by Random House to share this dataset. This dataset is redistributed under the Creative Commons license CC BY-NC.

By participating in this competition, you consent to the public release of your scores at the GermEval-2019 workshop and in the respective proceedings, at the task organizers' discretion. The scoring of a system may include, but is not limited to, the metrics mentioned on this page. The final decision on the choice of metric and the score value is made by the task organizers. If the organizers judge a submission to be incomplete, deceptive, or in violation of the competition's rules, scores may be withheld. The system of a participating team will be named according to the team name provided at the time of submission, or to an abbreviation selected by the task organizers.

Important Dates

  • Jan-2019: Release of trial data
  • 01-Feb-2019: Release of training data (train + validation)
  • 01-Jun-2019: Release of test data
  • 15-Jul-2019, extended to 27-Jul-2019: Final submission of test results
  • 31-Jul-2019, extended to 07-Aug-2019: Submission of description paper
  • 20-Aug-2019, extended to 27-Aug-2019: Notification of acceptance
  • 15-Sep-2019, extended to 22-Sep-2019: Camera-ready deadline for system description papers
  • 08-Oct-2019: Workshop in Nürnberg/Erlangen, Germany, at the Conference on Natural Language Processing KONVENS 2019

All due times are at 23:59 (AoE)


  • The workshop is co-located with KONVENS 2019
  • Register here for the workshop
  • Please note that the early bird registration ends on August 31
  • Find most recent information here.


This shared task consists of two subtasks, described below. You can participate in one of them or both.

Subtask A

The task is to classify German books into one or multiple most general writing genres (d=0). Therefore, it can be considered a multi-label classification task. In total, there are 8 classes that can be assigned to a book: Literatur & Unterhaltung, Ratgeber, Kinderbuch & Jugendbuch, Sachbuch, Ganzheitliches Bewusstsein, Glaube & Ethik, Künste, Architektur & Garten.

Subtask B

The second task is a hierarchical multi-label classification into multiple writing genres. In addition to the very general writing genres, further genres of different specificity can be assigned to a book. In total, there are 343 different classes that are hierarchically structured.


The dataset is available under the tab 'Participate/Files'.

The dataset consists of information on German books (blurb, genres, title, author, URL, ISBN, date of publication), crawled from the Random House page. The rights to all blurbs and further meta information belong to Random House. The date of publication normally represents the publication date of the particular version of the book. Genres that capture properties which do not rely on content but on the shape or form of a book were removed. Since every ISBN and URL appears only once, all blurbs should in theory be unique. In exceptional cases, however, book blurbs can be very similar, for example if a book is part of a series. There are some other anomalies in the documents, e.g. missing authors and inaccurate publication dates. Furthermore, it appears that multiple publications of a book can cause blurbs to differ although the author and title are identical, and vice versa. However, only around 1% of all blurbs are affected by these anomalies.

An example entry is shown below:

<book date="2019-01-04" xml:lang="de">
<title>Blueberry Summer</title>
<body>In neuer Sommer beginnt für Rory und Isabel – mit einer kleinen Neuerung: Rory ist fest mit Connor Rule zusammen und deshalb als Hausgast in den Hamptons. Und genau das bringt Komplikationen mit sich, denn irgendwie scheint Connor ein Problem damit zu haben, dass Rory nicht mehr für seine Familie arbeitet. Isabel dagegen arbeitet zur Überraschung aller als Kellnerin, um einen süßen Typen zu beeindrucken – irgendwie muss sie ja über ihre Affäre mit Mike hinwegkommen. Das klappt ganz gut, bis Rory auf Isabels Neuen trifft ... Und Isabel wieder auf Mike.</body>
<copyright>(c) Penguin Random House</copyright>
<topic d="0">Kinderbuch & Jugendbuch </topic>
<topic d="1" label="True">Liebe, Beziehung und Freundschaft</topic>
<topic d="0">Kinderbuch & Jugendbuch </topic>
<topic d="1" label="True">Echtes Leben, Realistischer Roman</topic>
<authors>Joanna Philbin</authors>
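The genre annotations of such an entry can be grouped by depth with a few lines of code. The sketch below uses a regular expression instead of an XML parser because the entry as displayed here contains a bare "&"; whether the distributed files escape it is not stated on this page. The function name `topics_by_depth` is hypothetical.

```python
import re

# Matches <topic d="N" ...>label</topic> and captures the depth and the label.
TOPIC_PATTERN = re.compile(r'<topic d="(\d+)"[^>]*>(.*?)</topic>')


def topics_by_depth(entry: str) -> dict[int, list[str]]:
    """Group the genre labels of one book entry by their depth attribute."""
    grouped: dict[int, list[str]] = {}
    for depth, label in TOPIC_PATTERN.findall(entry):
        labels = grouped.setdefault(int(depth), [])
        label = label.strip()  # the example contains trailing spaces
        if label not in labels:  # repeated parents are listed only once
            labels.append(label)
    return grouped


entry = """<topic d="0">Kinderbuch & Jugendbuch </topic>
<topic d="1" label="True">Liebe, Beziehung und Freundschaft</topic>
<topic d="0">Kinderbuch & Jugendbuch </topic>
<topic d="1" label="True">Echtes Leben, Realistischer Roman</topic>"""

genres = topics_by_depth(entry)
print(genres)
```

For Subtask A only the labels at depth 0 are relevant, while Subtask B uses the labels of all depths.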

Key characteristics

The following table shows important quantitative characteristics of the total dataset:

Number of books: 20784
Average number of words: 94.67
Number of classes: 343 (8, 93, and 242 on levels 1, 2, and 3, respectively)

Information for Subtask B: Exactly one parent is assigned to a child genre. The underlying hierarchy is a forest. The most specific writing genre of a book is not necessarily a leaf node.
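Because every child genre has exactly one parent, a set of predicted genres can be completed with all of its ancestors by following parent links. The following is a minimal sketch; the function name `with_ancestors` and the `parent` mapping are illustrative, and the two parent links shown are read off the example entry above rather than taken from the official hierarchy file.

```python
def with_ancestors(labels, parent: dict) -> set:
    """Return the labels together with all of their ancestors.

    parent: maps a child genre to its single parent genre; root genres
    (the trees of the forest) have no entry.
    """
    expanded = set()
    for label in labels:
        node = label
        # Walk upwards until a root or an already visited genre is reached.
        while node is not None and node not in expanded:
            expanded.add(node)
            node = parent.get(node)
    return expanded


# Assumed parent links, inferred from the d=0 / d=1 pairs of the example entry.
parent = {
    "Liebe, Beziehung und Freundschaft": "Kinderbuch & Jugendbuch",
    "Echtes Leben, Realistischer Roman": "Kinderbuch & Jugendbuch",
}

completed = with_ancestors(["Echtes Leben, Realistischer Roman"], parent)
print(sorted(completed))
```

Since the most specific genre of a book need not be a leaf, the expansion only ever moves upwards and never adds child genres.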


The shared task is organized by Rami Aly, Steffen Remus and Chris Biemann from the Language Technology Group of the University of Hamburg, Germany.

System Description Paper Author Guidelines

  • Submission is electronic, using the EasyChair management system. The submission site is now available at
  • Paper submissions must use the official ACL 2019 style templates and must be within 6 to 8 pages of content. References do not count against these limits.
  • Upon acceptance, authors will be given one extra page of content for including the reviewers' feedback.
  • All submissions must be in PDF format and must conform to the official style guidelines, which are contained in the template files that are available at
  • The review process will be single-blind, i.e. authors are allowed to enter information that might reveal their identity.
  • The decision on paper acceptance will be based on the feedback from the reviewers.
  • Authors of accepted papers are invited to present their system at the official GermEval workshop as a poster or in an oral presentation (depending on the number of submissions).
  • Accepted system description papers will appear in the online workshop proceedings.


Start: Jan. 11, 2019, midnight

Description: Submit practice predictions on the sample dataset. Use this to check your file format. A sample submission is available for download under the tab Participate/Files.

Evaluation Validation Set

Start: Feb. 1, 2019, midnight

Description: Submit predictions for the validation set. The Scoreboard will be enabled.

Evaluation Test Set

Start: June 1, 2019, midnight

Description: Submit predictions for the test set. Results during this phase will be used to assess the performance of a submission for this shared task. The scoreboard is disabled.

Post Evaluation

Start: July 28, 2019, noon

Description: For evaluation after the competition ends. Submit additional test set predictions.

Competition Ends


# Username Score
1 Christian.Gawron 4.5000
2 Raghavan 2.5000
3 benf 3.0000