Short-duration Speaker Verification (SdSV) Challenge 2020 - Task 1 : Text-Dependent

Organized by sdsvc
Reward $1,000


Challenge Period
Jan. 15, 2020, midnight UTC


Post Evaluation
April 18, 2020, noon UTC


Competition Ends
Dec. 31, 2020, midnight UTC

Short-duration Speaker Verification (SdSV) Challenge 2020

Task 1 : Text-Dependent Speaker Verification

Evaluate New Technologies in Short Duration Scenarios

The main goal of the SdSV Challenge 2020 is to evaluate new technologies for text-dependent (TD) and text-independent (TI) speaker verification (SV) in short-duration scenarios.

The challenge evaluates SdSV with varying degrees of phonetic overlap between the enrollment and test utterances. It is the first challenge with a broad focus on systematically benchmarking and analyzing the effect of varying degrees of phonetic variability on short-duration speaker recognition.

The full challenge evaluation plan can be found via this link. If you have any further questions regarding the challenge, you can contact the organizers via sdsvc2020[at]

Each team needs at least one CodaLab account to be able to submit results. When creating an account, please choose a team name, which can be the name of your organization or an anonymous identity. There are two separate tasks in the challenge; participants can register for either task or both. The same user account (i.e., team name) should be used if a team decides to participate in both tasks. This page corresponds to the first task of the challenge.


Evaluation Plan

Task 1 of the SdSV Challenge 2020 is defined as speaker verification in text-dependent mode: given a test segment of speech and the target speaker's enrollment data, automatically determine whether the test segment was spoken by the target speaker uttering a specific phrase. In contrast to text-independent speaker verification, the lexical content of the utterance is also taken into consideration. As such, Task 1 is a twofold verification task in which both the speaker and the phrase are verified.

In Task 1, each trial consists of a test segment along with a model identifier, which indicates three enrollment utterances and the ID of the phrase uttered in them. The system is required to process each trial independently and produce a log-likelihood ratio (LLR) score that combines both speaker and phrase verification scores.
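The per-trial workflow described above can be sketched as follows. Everything here is a hypothetical illustration, not part of any challenge toolkit: `score_speaker` and `score_phrase` are placeholder scorers standing in for whatever speaker-embedding and phrase-verification models a participant actually builds, and the simple additive fusion is just one possible way to combine the two LLRs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Model:
    enrollment_utterances: List[str]  # three enrollment utterances (Task 1)
    phrase_id: int                    # one of the ten fixed phrases


# Stand-in scorers: a real system would use speaker embeddings and a
# phrase verifier; these placeholders only illustrate the interface.
def score_speaker(enrollment: List[str], test: str) -> float:
    # Hypothetical speaker LLR (positive = likely the same speaker).
    return 1.0 if test in enrollment else -1.0


def score_phrase(phrase_id: int, test: str) -> float:
    # Hypothetical phrase LLR (a constant placeholder here).
    return 0.5


def score_trial(model: Model, test_segment: str) -> float:
    """Process one trial independently: fuse speaker and phrase LLRs
    into the single score required per trial (here by simple addition)."""
    return (score_speaker(model.enrollment_utterances, test_segment)
            + score_phrase(model.phrase_id, test_segment))
```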

The enrollment and test phrases are drawn from a fixed set of ten phrases: five in Persian and five in English. The in-domain training data contains utterances from 936 speakers, some of which have only Persian phrases. Model enrollment is done in a phrase- and language-dependent way using three utterances per model. Given the same set of target speakers, Task 1 provides a good basis for analyzing the language factor in text-dependent speaker recognition.

Trial types:

Given the ground truth, there are four types of trial in a TD-SV task (Larcher et al., 2014). The first is Target-Correct (TC), where the target speaker utters the correct pass-phrase. The second is Target-Wrong (TW), where the target speaker utters a wrong pass-phrase. In the same manner, Imposter-Correct (IC) and Imposter-Wrong (IW) refer to the cases where an imposter utters the correct or a wrong pass-phrase. The system should accept TC trials as target trials and reject the other three types as non-target (imposter) trials. Note that a key difference from text-independent speaker verification is that TW trials count as imposter trials here, whereas both TC and TW would be considered target trials in the text-independent mode. There are no cross-language or cross-gender trials in Task 1 of the challenge.
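The mapping from ground truth to the four trial types, and to the accept/reject decision the system should make, can be written out directly (a minimal sketch; the function and flag names are illustrative, not from the challenge tooling):

```python
def trial_type(same_speaker: bool, correct_phrase: bool) -> str:
    """Map ground-truth flags to the four TD-SV trial types."""
    if same_speaker and correct_phrase:
        return "TC"   # Target-Correct: the only target trial type
    if same_speaker:
        return "TW"   # Target-Wrong: non-target in TD-SV (target in TI-SV)
    if correct_phrase:
        return "IC"   # Imposter-Correct
    return "IW"       # Imposter-Wrong


def is_target(trial: str) -> bool:
    """In Task 1, only TC trials are target trials."""
    return trial == "TC"
```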

Training condition:

The training condition is defined as the amount of data/resources used to build a Speaker Recognition (SR) system. Unlike SRE19, we adopted a fixed training condition where the system should only be trained using a designated set. The fixed training set consists of the following:

  • VoxCeleb1
  • VoxCeleb2
  • LibriSpeech
  • DeepMine (Task 1 Train Partition)

The use of other public or private speech data for training is forbidden, while the use of non-speech data for data augmentation purposes is allowed. The in-domain DeepMine training data can be used for any purpose, such as neural network training, LDA or PLDA model training, and score normalization. Part of it can also serve as a development set, since no separate development data is provided for the challenge. Note, however, that usage of Task 2 in-domain data for this task is not allowed.

Enrollment Condition:

The enrollment is accomplished using three utterances of a specific phrase for each model. We use three utterances per model since this is commonly adopted in practice. Note that using enrollment utterances of other models, for example to calculate score normalization parameters, is forbidden (i.e., trials are to be processed independently).

Test Condition:

Each trial in the evaluation contains a test utterance and a target model. As described above, there are four types of trial, of which only TC is considered a target trial; the rest are considered imposter trials. Similar to the SRE 2019 CTS challenge, the whole set of trials is divided into two subsets: a progress subset (30%) and an evaluation subset (70%). The progress subset is used to monitor progress on the leaderboard, while the evaluation subset is used to generate the official results at the end of the challenge.
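The 30%/70% partition can be sketched as below. To be clear, the organizers' actual partition is fixed and not disclosed to participants; this hypothetical snippet only illustrates the proportions involved:

```python
import random


def split_trials(trial_ids, progress_frac=0.3, seed=0):
    """Randomly partition trial IDs into a progress subset (default 30%)
    and an evaluation subset (the remaining 70%). Illustrative only:
    the real challenge split is predefined by the organizers."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    ids = list(trial_ids)
    rng.shuffle(ids)
    cut = int(len(ids) * progress_frac)
    return ids[:cut], ids[cut:]  # (progress, evaluation)
```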

Performance Measurement:

The main metric for the challenge is the normalized minimum Detection Cost Function (DCF) as defined in SRE08. This detection cost function is a weighted sum of miss and false-alarm error probabilities:

C_Det = C_Miss × P_Miss|Target × P_Target + C_FalseAlarm × P_FalseAlarm|NonTarget × (1 − P_Target)

where C_Miss = 10, C_FalseAlarm = 1, and P_Target = 0.01. The cost is normalized by dividing by C_Default = min(C_Miss × P_Target, C_FalseAlarm × (1 − P_Target)), the cost of the best trivial system, and the minimum of the normalized cost over all score thresholds is reported.

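A minimal sketch of computing a normalized minimum DCF from system scores, assuming the SRE08 parameters C_miss = 10, C_fa = 1, and P_target = 0.01 (this is an illustrative implementation, not the official scoring tool):

```python
def min_dcf(target_scores, nontarget_scores,
            c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Normalized minimum detection cost over all score thresholds."""
    # Candidate thresholds: every observed score, plus one above the max
    # so the "reject everything" operating point is also considered.
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    thresholds.append(float("inf"))
    # Cost of the best trivial system (always accept or always reject),
    # used as the normalization constant.
    c_default = min(c_miss * p_target, c_fa * (1.0 - p_target))
    best = float("inf")
    for t in thresholds:
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        p_fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        dcf = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
        best = min(best, dcf / c_default)
    return best
```

With perfectly separated scores the minimum normalized cost is 0; a system no better than chance saturates at 1, the cost of the trivial reject-everything decision.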
By using this competition page, teams will be able to submit the system scores and see the results on the progress set. To participate, open the competition page and click on the "Participate" tab. Then accept the Terms and Conditions and click on the "Register" button. After this, you will be able to upload files via the "Participate" tab. Note that a team must only submit to CodaLab the output scores of its system, not the system itself. Make sure you have read the "Submission Instructions" under the "Participate" tab before uploading any files.

To submit a file you need to click on the "Submit/View Results" link under the "Participate" tab. After this, you will be able to see two buttons, corresponding to the Competition Phases. Click on one of the buttons to choose the phase you want to submit to. The available phases are:

  • Challenge Period: This is the main phase of the challenge. Participants can use it to evaluate their systems on the progress set, select the best system, and deliver it to the challenge organizers.
  • Post Evaluation: During this phase, new submissions are still accepted, and participants can use it for any post-evaluation experiments when writing papers.

Each team can make one submission per day during the Challenge Period phase; after that, the limit increases to 5 submissions per day. See more details under "Submit/View Results".


Evaluation Dataset

The evaluation dataset used for the challenge is drawn from the recently released multi-purpose DeepMine corpus. The corpus has three parts; among them, Part 1 is used for TD-SV and Part 3 for TI-SV. Since the evaluation dataset is a subset of the DeepMine corpus, in addition to the CodaLab account, teams need to complete the dataset's License Agreement. After signing, a scanned copy of the signed agreement should be sent back to the challenge organizers via the challenge email: sdsvc2020[at] The dataset download links will be sent to the team's corresponding user. More information can be found on the SdSV Challenge 2020 page at


Terms and Conditions

Participation in this challenge is open to all who are interested. There is no cost to participate other than writing a system description of at least two pages. We highly recommend submitting a corresponding paper to the challenge's special session at Interspeech 2020.

We kindly ask participants to use their organization name as the team name. Also, each organization is allowed to participate using only one account.


There will be three cash prizes. The winners will be selected based on the results of the primary systems on the evaluation subset. In addition to the cash prize, each winner will receive a certificate for their achievement. The cash prizes are as follows:

  • Rank 1: 500 EUR
  • Rank 2: 300 EUR
  • Rank 3: 100 EUR

Challenge Period

Start: Jan. 15, 2020, midnight

Description: Submissions for evaluating systems on the Progress set. Note that in this phase you can only see results on the progress set, which comprises 30% of all trials.

Post Evaluation

Start: April 18, 2020, noon

Description: Submissions for post-evaluation experiments. Note that in this phase the reported results are for the Evaluation set, while the results in the Challenge Period phase are for the Progress set.

Competition Ends

Dec. 31, 2020, midnight

# Username Score
1 matejkap 0.0409
2 zxchen123 0.0456
3 david02 0.0470