This shared task will help advance the state-of-the-art in automatic speech recognition (ASR) by considering a challenging domain for ASR: non-native children's speech. A new data set containing English spoken responses produced by Italian students will be released for training and evaluation. The spoken responses in the data set were produced in the context of an English speaking proficiency examination. The following will be released for this shared task: a training set of 49 hours of transcribed speech, a development set of 2 hours of transcribed speech, a test set of 2 hours of speech, and a baseline Kaldi ASR system with evaluation scripts. The shared task will consist of two tracks: a closed track and an open track. In the closed track, only the training data distributed as part of the shared task can be used to train the models; in the open track, any additional data can be used to train the models.
For questions about the shared task, please email is2020challenge@gmail.com.
Important Dates
Organizers
Daniele Falavigna, Fondazione Bruno Kessler
Roberto Gretter, Fondazione Bruno Kessler
Marco Matassoni, Fondazione Bruno Kessler
Keelan Evanini, Educational Testing Service
Ben Leong, Educational Testing Service
Further information about the shared task
The availability of large amounts of training data and large computational resources has made Automatic Speech Recognition (ASR) technology usable in many application domains, and recent research has demonstrated that ASR systems can achieve performance levels that match human transcribers for some tasks. However, ASR systems still fall short when applied to speech produced by certain types of speakers, in particular non-native speakers and children.
Several phenomena that regularly occur in non-native speech can greatly reduce ASR performance, including mispronounced words, ungrammatical utterances, disfluencies (such as false starts, partial words, and filled pauses), and code-switched words. ASR for children’s speech can be challenging due to linguistic differences from adult speech at many levels (acoustic, prosodic, lexical, morphosyntactic, and pragmatic) caused by physiological differences (e.g., shorter vocal tract lengths), cognitive differences (e.g., different stages of language acquisition), and behavioral differences (e.g., whispered speech). Developing ASR systems for both of these domains is made more challenging by the lack of publicly available databases of non-native speech and children’s speech.
Despite these difficulties, a significant portion of the speech transcribed by ASR systems in practical applications may come from both non-native speakers (e.g., newscasts, movies, internet videos, human-machine interactions, human-human conversations in telephone call centers, etc.) and children (e.g., educational applications, smart speakers, speech-enabled gaming devices, etc.). Therefore, it is necessary to continue to improve ASR systems so that they can accurately process speech from these populations. An additional important application area is the automatic assessment of second language speaking proficiency, where the ASR difficulties can be increased by the low proficiency levels of the speakers, especially if they are children. The lack of training data is especially pronounced for this population (non-native children’s speech).
With this special session we aim to help address these gaps and stimulate research that can advance the present state-of-the-art in ASR for non-native children’s speech. To achieve this aim we will distribute a new data set containing non-native children’s speech and organize a challenge that will be presented in the special session. The data set consists of spoken responses collected in Italian schools from students between the ages of 9 and 16 in the context of English speaking proficiency assessments. The data that will be released includes both a test set (ca. 4 hours) and an adaptation set (ca. 9 hours), both of which were carefully transcribed by human listeners. In addition, a set of around 90 hours of untranscribed spoken responses will be distributed. A Kaldi baseline system will also be released together with the data, and a challenge web site will be developed for collecting and scoring submissions.
The following points make this session special:
Submissions to the Shared Task on Automatic Speech Recognition for Non-Native Children’s Speech will be evaluated according to the Word Error Rate (WER) between the ASR hypotheses in the submission and the reference human transcriptions for the evaluation set as calculated by the evaluation script that was distributed with the training data.
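For reference, WER is the word-level edit distance between a hypothesis and its reference transcription, divided by the number of reference words. The short Python sketch below illustrates this computation on whitespace-tokenized strings; it is only an illustration and is not the official evaluation script distributed with the training data.

# Illustrative WER computation (not the official evaluation script).
def wer(reference, hypothesis):
    """Return the word error rate between two whitespace-tokenized strings."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum number of edits to turn the first i reference words
    # into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("this is the reference text", "this is a fake asr output"))  # 0.8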
Each participating team may make one submission per day for each track during the evaluation period, with a maximum of 7 submissions per team per track. Submissions will be ranked against other submissions based on WER, regardless of the order in which they were made (e.g., if a team's submission from the first day achieves the lowest WER out of a total of 7 submissions from that team, the submission from the first day will be the top-ranking submission for that team).
The baseline system developed by the organizers at FBK achieves a WER of 35.09% on the evaluation set. This result is displayed under the team name "Baseline System from Organizing Team" on the CodaLab leaderboard for the shared task.
A participating team can view detailed results for their submissions, including the number of substitutions, deletions, and insertions, by going to the "Participate" tab in CodaLab, selecting the "View / Submit Results" sub-page, clicking on the "+" at the right side of the entry for a submission in the table to expand the box, and then accessing the "View scoring output log" link. The detailed results for that submission will then be displayed on a separate web page in the following format:
WER= 35.09% (S= 971 I= 437 D= 711) / REFERENCE_WORDS= 6038 - UTTERANCES= 578
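In this summary line, the WER is the sum of substitutions (S), insertions (I), and deletions (D) divided by the number of reference words: (971 + 437 + 711) / 6038 = 2119 / 6038 ≈ 35.09%.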
Submissions should contain one line per utterance, consisting of the utterance ID followed by the ASR hypothesis for that utterance, as in the following example:

1010106_en_22_20_100 this is a fake asr output
1010106_en_22_20_101 this is a fake asr output
1010106_en_22_20_102 this is a fake asr output
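For illustration only, a file in this format can be produced with a few lines of Python; the output file name and the hypotheses below are hypothetical placeholders.

# Hypothetical sketch: write ASR hypotheses in the format shown above,
# one line per utterance: <utterance_id> <hypothesis text>.
hypotheses = {
    "1010106_en_22_20_100": "this is a fake asr output",
    "1010106_en_22_20_101": "this is a fake asr output",
}
with open("hypotheses.txt", "w", encoding="utf-8") as out:  # assumed file name
    for utt_id, text in sorted(hypotheses.items()):
        out.write(f"{utt_id} {text}\n")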