The validity of high school grades as a predictor of academic success is controversial. Researchers have found indications that linguistic features such as function words used in a prospective student's writing perform better in predicting academic success (Pennebaker et al., 2014).
During an aptitude test, participants are asked to write freely associated texts in response to given questions and images. From these expressions, trained psychologists can predict behavior, long-term development, and subsequent success. When the texts are paired with IQ test results and high school grades, the prediction of intellectual ability from text can be investigated. Such an approach goes beyond plain text classification and could reveal insightful psychological traits.
Operant motives are unconscious intrinsic desires that can be measured by implicit or operant methods, such as the Operant Motive Test (OMT) or the Motive Index (MIX). During the OMT and the MIX, participants are asked to write freely associated texts in response to given questions and images. Trained psychologists label these textual answers with one of five motives and a corresponding level. The identified motives allow psychologists to predict behavior, long-term development, and subsequent success. For our task, we provide extensive amounts of textual data from both the OMT and the MIX, paired with IQ scores and high school grades (MIX) and with motive labels (OMT).
With this task, we aim to foster research in this context. The task focuses on classifying German psychological text data in order to predict the IQ and high school grades of college applicants, as well as to classify the operant motives expressed in image descriptions.
The shared task is organized by Dirk Johannßen, Chris Biemann, Steffen Remus and Timo Baumann from the Language Technology group of the University of Hamburg, as well as David Scheffer from the NORDAKADEMIE Elmshorn, Nicola Baumann from the Universität Trier and Gudula Ritz from Impart GmbH (Germany).
System submissions are made in teams. There is no restriction on the number of people in a team. Note, however, that a participant may be a member of several teams, so teams with overlapping members are possible. Every participating team may submit 3 different systems to the competition. For the final evaluation phase, every team must name its submission (both the .zip archive and the actual submission .txt file) in the form "[Teamname]__[Systemname]" (note the two underscores!). For example, your submission could look like
We also ask you to enter exactly this name in the description before submitting your system. This name is needed to associate each submitted system correctly with its description paper, so please make sure to write it exactly as it will appear in your description paper (it is case sensitive). Submissions that do not follow these rules might not be evaluated. The evaluation script has been adapted to perform this formality check.
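A small sketch of the naming rule, assuming that team and system names consist only of letters, digits and hyphens so that the double underscore separating them is unambiguous (this character restriction is our assumption, not a stated rule). The names `MyTeam` and `BertBaseline` are hypothetical, and the official formality check in the evaluation script may be stricter or more lenient.

```python
import re

# Assumption: team and system names use only letters, digits and hyphens,
# so the double underscore between them is unambiguous.
NAME_PATTERN = re.compile(r"(?P<team>[A-Za-z0-9-]+)__(?P<system>[A-Za-z0-9-]+)")


def submission_name(team: str, system: str) -> str:
    """Build "[Teamname]__[Systemname]" and reject names that break the pattern."""
    name = f"{team}__{system}"
    if NAME_PATTERN.fullmatch(name) is None:
        raise ValueError(f"invalid submission name: {name!r}")
    return name


def split_submission_name(name: str) -> tuple[str, str]:
    """Recover (team, system) from a submission name; matching is case sensitive."""
    match = NAME_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"invalid submission name: {name!r}")
    return match.group("team"), match.group("system")


if __name__ == "__main__":
    # Hypothetical team and system names.
    print(submission_name("MyTeam", "BertBaseline"))  # MyTeam__BertBaseline
    print(split_submission_name("MyTeam__BertBaseline"))  # ('MyTeam', 'BertBaseline')
```

Rejecting underscores inside the two parts is a deliberate simplification: it guarantees that splitting the name back into team and system is unique.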
Only the person who makes the submission needs to register for the competition. All team members must be listed in the description paper of the submitted system. The last submission of a system will be used for the final evaluation. Participants will see whether a submission succeeded; however, there will be no feedback regarding the score, as the leaderboard is disabled during the test phase.
The evaluation script is provided with the data so that participants can still evaluate their own data splits. The following zip file contains this year's evaluation tool:
The evaluation tool comes as a self-contained Python script and accepts both tasks. So that the tasks can be distinguished, your submission must include a text file, being either
for Task 1 and
The evaluation tool requires three files: the task1/2.txt file described above, a file with the system predictions, and a gold standard file. The latter two must follow the tab-separated format below. For Subtask 1, they reproduce the target rank (based on the averaged z-standardized scores of a participant) relative to all participants in a collection (i.e. test / dev / train):
and for Task 2:
UUID motive level
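With the prediction file in this format, a submission archive could be assembled with Python's standard `zipfile` module, as sketched below. The sketch assumes that the marker file is named task1.txt or task2.txt (following the "task1/2.txt" referred to for the evaluation tool), that it may be empty, and that the prediction file inside the archive carries the same "[Teamname]__[Systemname]" name as the archive; `package_submission` and the example names are hypothetical. The sample submission under Participate/Files remains the authoritative reference for the layout.

```python
import tempfile
import zipfile
from pathlib import Path


def package_submission(team: str, system: str, predictions: Path,
                       task: int, out_dir: Path) -> Path:
    """Zip a prediction file together with the task marker file.

    Assumptions: the marker is named task1.txt or task2.txt and may be empty,
    and the prediction file is stored under the same "[Teamname]__[Systemname]"
    name as the archive.
    """
    if task not in (1, 2):
        raise ValueError("task must be 1 or 2")
    name = f"{team}__{system}"
    archive = out_dir / f"{name}.zip"
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(predictions, arcname=f"{name}.txt")
        # Marker file telling the evaluation tool which task to score.
        zf.writestr(f"task{task}.txt", "")
    return archive


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pred = Path(tmp) / "predictions.txt"
        pred.write_text("6221323283933528M10\tF\t5\n", encoding="utf-8")
        zip_path = package_submission("MyTeam", "BertBaseline", pred, 2, Path(tmp))
        with zipfile.ZipFile(zip_path) as zf:
            print(zip_path.name, sorted(zf.namelist()))
            # MyTeam__BertBaseline.zip ['MyTeam__BertBaseline.txt', 'task2.txt']
```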
To get more information about its usage, simply type:
python evaluationScriptGermeval2020_psychpred.py --help
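Before calling the script, it can help to check that a prediction file and a gold file cover exactly the same instances. The sketch below reads the Task 2 layout (UUID, motive, level, tab-separated) and aligns both files by UUID. Skipping a leading row that starts with "UUID" and keeping the level as a string are assumptions; `read_task2_file` and `align` are hypothetical helpers, not part of the official evaluation script.

```python
import csv
import tempfile
from pathlib import Path

Label = tuple[str, str]  # (motive, level)


def read_task2_file(path: Path) -> dict[str, Label]:
    """Read tab-separated "UUID motive level" lines into {UUID: (motive, level)}.

    A leading header row starting with "UUID" is skipped if present (assumption).
    """
    labels: dict[str, Label] = {}
    with path.open(encoding="utf-8", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
            if not row or row[0] == "UUID":
                continue
            if len(row) != 3:
                raise ValueError(f"expected 3 columns, got {len(row)}: {row!r}")
            uuid, motive, level = row
            if uuid in labels:
                raise ValueError(f"duplicate UUID: {uuid}")
            labels[uuid] = (motive, level)
    return labels


def align(pred: dict[str, Label], gold: dict[str, Label]) -> tuple[list[Label], list[Label]]:
    """Return gold and predicted labels in the same UUID order, or fail loudly."""
    missing = gold.keys() - pred.keys()
    extra = pred.keys() - gold.keys()
    if missing or extra:
        raise ValueError(f"{len(missing)} UUIDs missing, {len(extra)} unexpected")
    order = sorted(gold)
    return [gold[u] for u in order], [pred[u] for u in order]


if __name__ == "__main__":
    # Hypothetical UUIDs "a" and "b".
    with tempfile.TemporaryDirectory() as tmp:
        gold_path = Path(tmp) / "gold.txt"
        pred_path = Path(tmp) / "pred.txt"
        gold_path.write_text("UUID\tmotive\tlevel\na\tF\t5\nb\tA\t2\n", encoding="utf-8")
        pred_path.write_text("b\tA\t3\na\tF\t5\n", encoding="utf-8")
        y_true, y_pred = align(read_task2_file(pred_path), read_task2_file(gold_path))
        print(y_true)  # [('F', '5'), ('A', '2')]
        print(y_pred)  # [('F', '5'), ('A', '3')]
```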
For the task being evaluated, the script computes precision, recall and F1 score for each class. As summary scores, it computes accuracy as well as macro-averaged precision, recall and F1 score.
Although the evaluation tool outputs several evaluation measures, the official ranking of the Subtask 2 systems will be based on the macro-averaged F1 score only (Subtask 1 is evaluated by the Pearson correlation coefficient, as described below). Please keep this in mind when tuning your classifiers: a classifier optimized for accuracy does not necessarily produce optimal results in terms of macro-averaged F1.
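To illustrate why accuracy and macro-averaged F1 can disagree, the sketch below computes the measures listed above from two label lists. It averages the per-class F1 scores with equal weight (the convention used, for example, by scikit-learn's `average="macro"`) and sets any score with a zero denominator to 0.0; both conventions are assumptions, and the official script may treat such edge cases differently. `evaluate` is a hypothetical helper and the labels are invented.

```python
from collections import Counter


def evaluate(y_true: list, y_pred: list):
    """Per-class (precision, recall, F1), accuracy and macro averages.

    Classes are the union of gold and predicted labels; a score whose
    denominator is zero is set to 0.0 (assumed convention).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted class gets a false positive
            fn[gold] += 1  # gold class gets a false negative
    per_class = {}
    for c in sorted(set(y_true) | set(y_pred)):
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = (precision, recall, f1)
    n_classes = len(per_class)
    # Unweighted mean over classes for precision, recall and F1.
    macro = tuple(sum(s[i] for s in per_class.values()) / n_classes for i in range(3))
    accuracy = sum(tp.values()) / len(y_true)
    return per_class, accuracy, macro


if __name__ == "__main__":
    gold = ["A", "A", "F", "M"]
    pred = ["A", "F", "F", "F"]
    per_class, accuracy, (macro_p, macro_r, macro_f1) = evaluate(gold, pred)
    print(accuracy, round(macro_f1, 2))  # 0.5 0.39
```

Half of the predictions are correct, yet macro F1 is only about 0.39 because class M is never predicted and contributes an F1 of 0 with the same weight as the frequent classes.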
The copyright to the provided data belongs to the NORDAKADEMIE and, for the OMT-related tasks, to the University of Trier and Impart GmbH, their licensors, vendors and/or content providers. The scores and instances serve promotional/public purposes, and permission has been granted by the NORDAKADEMIE and the University of Trier, which both share this dataset. This dataset is redistributed under the Creative Commons license CC BY-NC-SA 4.0.
By participating in this competition, you consent to the public release of your anonymized scores at the GermEval-2020 workshop and in the respective proceedings, at the task organizers' discretion.
All deadlines are at 23:59 AoE (Anywhere on Earth).
The shared task on predicting intellectual ability from text consists of two subtasks, described below. You may participate in either or both of them, learn from external data and/or use the data of the other subtask for training, and apply techniques such as multi-task or transfer learning.
The task is to predict measures of intellectual ability solely from text. For this, the z-standardized high school grades and IQ scores of college applicants are summed and ranked globally. The goal of this subtask is to reproduce this ranking; systems are evaluated by the Pearson correlation coefficient between the system ranking and the gold ranking. An exemplary illustration can be found in the Data area.
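A minimal sketch of this evaluation, assuming that rank 1 denotes the highest combined score and that tied scores share their average rank (neither convention is stated above). Because the Pearson coefficient is computed on ranks here, it is effectively Spearman's rank correlation of the underlying scores. The helper names `rank` and `pearson` are hypothetical and the scores are invented for illustration.

```python
import math


def rank(scores: list[float]) -> list[float]:
    """Return 1-based ranks, 1 = highest score; tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        average = (start + end) / 2 + 1  # convert 0-based positions to 1-based ranks
        for pos in range(start, end + 1):
            ranks[order[pos]] = average
        start = end + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented combined z-scores for four students (gold) and a system's ranking.
    gold_ranks = rank([0.9, -0.2, 0.4, -1.1])  # [1.0, 3.0, 2.0, 4.0]
    system_ranks = [1, 2, 3, 4]
    print(round(pearson(gold_ranks, system_ranks), 3))  # 0.8
```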
An example instance is shown below (including the spelling errors made by the participant). Each sample has a unique ID (consisting of studentID_imageNo_questionNo), a student ID, an image number and an answer number; the corresponding participant record contains the German, English and math grade points as well as the language and logic IQ scores (all z-standardized).
The data is delivered in two files, one containing participant data and the other containing sample data, linked by the student ID. The rank in the sample data reflects a participant's averaged performance relative to all other instances within the same collection (i.e. within train / test / dev); this rank is what has to be reproduced for the task.
student_ID image_no answer_no UUID MIX_text
1034-875791 2 2 1034-875791_2_2 Die Person fühl sich eingebunden in die Unterhatung.
student_ID german_grade english_grade math_grade lang_iq logic_iq
1034-875791 -0.08651999119820285 0.3747985587188588 0.5115559707967757 -0.010173719700624676 -0.13686707618782515
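The two files can be joined on student_ID. The sketch below assumes both are tab-separated with the header rows shown above and, as a stand-in for the target, averages the five z-standardized columns per student. Whether the official target weights these columns equally is not specified here, so treat `avg_z` as a hypothetical proxy; `read_tsv` and `join_samples` are likewise hypothetical helpers.

```python
import csv
import io
from statistics import mean

# Columns of the participant file, as shown in the example above.
SCORE_COLUMNS = ("german_grade", "english_grade", "math_grade", "lang_iq", "logic_iq")


def read_tsv(handle) -> list[dict[str, str]]:
    """Parse a tab-separated file with a header row into dictionaries."""
    return list(csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))


def join_samples(samples, participants):
    """Attach each student's mean z-score (hypothetical proxy target) to their texts."""
    avg_z = {
        p["student_ID"]: mean(float(p[c]) for c in SCORE_COLUMNS)
        for p in participants
    }
    return [{**s, "avg_z": avg_z[s["student_ID"]]} for s in samples]


if __name__ == "__main__":
    samples = read_tsv(io.StringIO(
        "student_ID\timage_no\tanswer_no\tUUID\tMIX_text\n"
        "1034-875791\t2\t2\t1034-875791_2_2\t"
        "Die Person fühl sich eingebunden in die Unterhatung.\n"
    ))
    participants = read_tsv(io.StringIO(
        "student_ID\tgerman_grade\tenglish_grade\tmath_grade\tlang_iq\tlogic_iq\n"
        "1034-875791\t-0.08651999119820285\t0.3747985587188588\t"
        "0.5115559707967757\t-0.010173719700624676\t-0.13686707618782515\n"
    ))
    joined = join_samples(samples, participants)
    print(joined[0]["UUID"], round(joined[0]["avg_z"], 4))  # 1034-875791_2_2 0.1306
```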
The training set contains 80% of all available data, i.e. 62,280 expressions from 2,076 participants. The development and test sets contain roughly 10% each: 7,800 expressions from 260 participants for the dev set and 7,770 expressions from 259 participants for the test set. This split was chosen in order to preserve the order and completeness of the 30 answers per participant.
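The arithmetic behind these numbers can be reproduced with a participant-level split. The section does not say how participants were assigned to the collections, so the sketch below simply shuffles hypothetical participant IDs with a fixed seed and cuts them 80/10/10, using 2,595 participants (the sum of 2,076, 260 and 259). `split_participants` is a hypothetical helper; splitting by participant rather than by expression is what keeps all 30 answers of a person together.

```python
import random

ANSWERS_PER_PARTICIPANT = 30


def split_participants(ids, seed: int = 0):
    """Shuffle participant IDs and cut them roughly 80/10/10 into train/dev/test.

    Splitting by participant (not by expression) keeps all 30 answers of a person
    in one collection. Integer arithmetic avoids floating-point rounding surprises.
    """
    ids = list(ids)
    random.Random(seed).shuffle(ids)
    n_train = len(ids) * 8 // 10
    n_dev = (len(ids) - n_train + 1) // 2  # dev gets the extra participant if odd
    return ids[:n_train], ids[n_train:n_train + n_dev], ids[n_train + n_dev:]


if __name__ == "__main__":
    train_ids, dev_ids, test_ids = split_participants(range(2595))  # hypothetical IDs
    for name, ids in (("train", train_ids), ("dev", dev_ids), ("test", test_ids)):
        print(name, len(ids), len(ids) * ANSWERS_PER_PARTICIPANT)
    # train 2076 62280
    # dev 260 7800
    # test 259 7770
```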
For the final results, participants of this shared task will be provided with the MIX_text only and are asked to reproduce the rank of each student relative to all students in the same collection (i.e. within the test set).
Operant motives are unconscious intrinsic desires that can be measured by implicit or operant methods, such as the Operant Motive Test (OMT) (Kuhl and Scheffer, 1999). During the OMT, participants are asked to write freely associated texts in response to given questions and images. An exemplary illustration can be found in the Data area. Trained psychologists label these textual answers with one of four motives. The identified motives allow psychologists to predict behavior, long-term development, and subsequent success.
For this task, we provide the participants with a large dataset of labeled textual data that emerged from an operant motive test. The training set contains 80% of all available data (167,200 instances), and the development and test sets contain 10% each (20,900 instances each).
6221323283933528M10 Sie wird ausgeschimpft, will jedoch das Gesicht bewahren.Beleidigt.Weil sie sich schämt, ausgeschimpft zu werden. Die blaue Person ist verletzt und hört nicht auf die Worte der weißen Person.
UUID motive level
6221323283933528M10 F 5
For this shared task, participants will be provided with an OMT_text and are asked to predict the motive and level of each instance. Performance will be measured with the macro-averaged F1 score.
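Predictions could then be written in the tab-separated UUID / motive / level layout shown above. Whether the official files include a header row is not stated, so the header is optional in this sketch; `write_task2_predictions` is a hypothetical helper, and the sample submission is the reference for the exact format.

```python
import csv
import io
from typing import Iterable, TextIO


def write_task2_predictions(predictions: Iterable[tuple[str, str, object]],
                            handle: TextIO, include_header: bool = False) -> None:
    """Write (UUID, motive, level) triples as tab-separated lines.

    For real files, open the handle with newline="" and encoding="utf-8".
    The optional header row is an assumption, not a confirmed requirement.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    if include_header:
        writer.writerow(["UUID", "motive", "level"])
    for uuid, motive, level in predictions:
        writer.writerow([uuid, motive, level])


if __name__ == "__main__":
    buffer = io.StringIO()
    write_task2_predictions([("6221323283933528M10", "F", 5)], buffer)
    print(repr(buffer.getvalue()))  # '6221323283933528M10\tF\t5\n'
```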
Start: Dec. 1, 2019, 10 a.m.
Description: Preparation: Submit practice predictions on the sample dataset. Use this to check your file format. A sample submission is available for download under the tab Participate/Files.
Start: Jan. 1, 2020, 10 a.m.
Description: Evaluation Validation Set: Submit predictions for the validation set. The Scoreboard will be enabled.
Start: May 8, 2020, 10 a.m.
Description: Evaluation Test Set: Submit predictions for the test set. Results during this phase will be used to assess the performance of a submission for this shared task. The scoreboard is disabled.