The VoxCeleb Speaker Recognition Challenge 2019 - Audio speaker verification - OPEN training data Forum

> Rules regarding the processing of trials

I was not able to find any description of how the trials need to be processed. That is, for each same-vs-different question, are we only allowed to use the two audio files in that trial, or are we allowed to use other audio files from the test set?

Posted by: dgromero @ July 23, 2019, 1:49 p.m.

Hi,

Please read carefully the "Evaluation", "Test Data" and "Submission Instructions" pages. You are only supposed to upload results of evaluating the pairs specified in the "list_pairs_test_data.txt" file, which you can download with the test data.

I hope this clarifies your question.

Posted by: vgg @ July 23, 2019, 2 p.m.

Let me be more specific. In typical NIST evaluation setups, for a trial (which is the equivalent of the pair of segments in this eval) you produce a score based only on the audio pair involved. That is, you are not allowed to use the audio from any other test segment to compute the score. The task description does not state any constraints on how to process the trials: "Teams are invited to create a system that takes the test data and produces a list of floating-point scores, with a single score for each pair of segments". For example, if I am allowed to look at all the test segments to produce the list of scores, I could cluster all the audio files and then produce the score for a pair based on whether or not the two audio files are in the same cluster. This is a very different problem statement from one where you are only allowed to use the audio segments in a pair to produce the score. I hope this makes my question clearer.

Thanks!

Posted by: dgromero @ July 23, 2019, 2:17 p.m.
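
To illustrate the trial-independent setup described in the post above, here is a minimal sketch in which each score depends only on the two segments in the trial. The embeddings, the file names, and the choice of cosine similarity are assumptions made for illustration; the thread does not specify any particular scoring method.

```python
import math


def cosine_score(emb_a, emb_b):
    """Cosine similarity between two speaker embeddings (plain lists of floats)."""
    dot = sum(x * y for x, y in zip(emb_a, emb_b))
    norm_a = math.sqrt(sum(x * x for x in emb_a))
    norm_b = math.sqrt(sum(y * y for y in emb_b))
    return dot / (norm_a * norm_b)


def score_trials(trials, embeddings):
    """Score every (file_a, file_b) pair using only the two files in that pair."""
    return [cosine_score(embeddings[a], embeddings[b]) for a, b in trials]


# Hypothetical embeddings, as if extracted by a model trained without test data.
embeddings = {
    "a.wav": [1.0, 0.0],
    "b.wav": [0.6, 0.8],
    "c.wav": [0.0, 1.0],
}

all_trials = [("a.wav", "b.wav"), ("a.wav", "c.wav"), ("b.wav", "c.wav")]
full_scores = score_trials(all_trials, embeddings)

# Dropping other trials from the list leaves the remaining score unchanged.
subset_scores = score_trials([("a.wav", "b.wav")], embeddings)
```

In this setup, the score for a trial is the same no matter which other trials or test segments exist, which is the property the NIST-style constraint is meant to guarantee.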

We would discourage this, but if you plan to do it, you should include a description that explicitly specifies the method you have used.

I would recommend including an extra text file within the ZIP file that you will use to upload your solution. As stated in the "Submission Instructions", the file with the solution should be called "answer.txt". So, please include inside the ZIP another text file, with a different name, where you describe the method you have used.

Now, let's say you call that file "description.txt". If you do want to try more than one methodology, you have up to 3 attempts. For each different methodology you can include a different "description.txt" within the ZIP file. Naturally, the methodology that achieves the best score will replace the others in the leaderboard, but we will have them all on record in case we need to reconsider one of your solutions.

I hope this helps.

Posted by: vgg @ July 24, 2019, 8:25 a.m.

I'm a little concerned that your last statement seems to allow teams to use the complete pool of test data to potentially enhance the reliability of a trial score. The reason for concern is that it appears to conflict with the statement on the challenge website: "In both training conditions, i.e. fixed and open, the test data can be used strictly for reporting of results alone - it cannot be used in any way to train or tune systems". Is not the use of all test data to produce a score for a single trial a form of modifying/tuning the system based on the test data distribution (i.e., changing system behaviour based on knowledge of all the test data)? For instance, simply taking the mean of the trial data and using those statistics in the system could be beneficial. In such a system, the score for a given trial will be different after removing say 10% of other trials from the trial list since each trial score is dependent on the statistics from all trials. Could you please re-confirm this is allowable as it may or may not be a game changer, and is a relatively unseen evaluation paradigm for the general speaker recognition community.

Posted by: mmclaren @ Aug. 20, 2019, 6:55 a.m.
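
The dependence described in the post above can be shown with a minimal sketch, assuming cosine scoring on hypothetical two-dimensional embeddings. It subtracts the mean of all test embeddings before scoring, which is one simple instance of using test-set statistics. The function names, file names, and values are illustrative only and are not taken from any challenge system.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def centered_score(trial, test_embeddings):
    """Score a trial after subtracting the mean of ALL test embeddings.

    This uses statistics of the whole test pool, so the result depends on
    segments that are not part of the trial itself.
    """
    mu = mean_vector(list(test_embeddings.values()))
    a = [x - m for x, m in zip(test_embeddings[trial[0]], mu)]
    b = [x - m for x, m in zip(test_embeddings[trial[1]], mu)]
    return cosine(a, b)


# Hypothetical test pool of three segments.
pool = {"a.wav": [1.0, 0.0], "b.wav": [0.0, 1.0], "c.wav": [4.0, 4.0]}
trial = ("a.wav", "b.wav")

score_full_pool = centered_score(trial, pool)

# Remove an unrelated segment from the pool: the score for the same trial changes.
reduced_pool = {name: pool[name] for name in ("a.wav", "b.wav")}
score_reduced_pool = centered_score(trial, reduced_pool)

score_changed = not math.isclose(score_full_pool, score_reduced_pool)
```

Here the same pair of files receives a different score once an unrelated segment is removed from the pool, so the trial score is no longer a function of the trial alone.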

Apologies if the earlier response was unclear. Any form of unsupervised learning on the test set, such as getting the mean of test samples, is not allowed. If you have already used such a method for your submission, please email us immediately at voxsrc@googlegroups.com and we will sort it out.

Posted by: vgg @ Aug. 27, 2019, 2:58 p.m.