This shared task addresses a challenging area in Automatic Speech Recognition (ASR): recognising the Tamil speech of vulnerable elderly and transgender people. Elderly people visit primary locations such as banks, hospitals and administrative offices to address their everyday needs, yet many of them do not know how to use the equipment provided there to assist the public. Similarly, transgender people are often deprived of primary education because of prejudice in society, so speech is the only medium that could help them meet their needs. The spontaneous speech data was gathered from elderly and transgender people who are unable to use these facilities to their advantage. A speech corpus containing 5.5 hours of transcribed speech will be released as the training set, and 2 hours of speech data will be released for testing.
Participants will be provided with the training and test datasets. To download the data and participate, go to CodaLab and click the "Participate" tab.
Paper name format should be: TEAM_NAME@LT-EDI-ACL2022: Title of the paper.
Example: NUIG_ULD@LT-EDI-ACL2022: Speech Recognition in Tamil
For electronic submission of papers to DravidianLangTech workshop please use this link:
A submission should be a zip file named after your team, containing one folder per submission (maximum of 3 submissions). Each folder contains one .txt file with the recognised text for each test audio file.
E.g. teamname.zip contains the folder sub_1, which contains Audio-37.txt, Audio-38.txt and so on (test-filename.txt).
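The layout described above could be packaged with a short script. The following is only a sketch under assumptions: the function name `build_submission_zip` and the shape of the `runs` argument are hypothetical, and it assumes the run folders (sub_1, sub_2, ...) sit directly at the root of the zip file, which the task description suggests but does not state explicitly.

```python
import zipfile
from pathlib import Path


def build_submission_zip(team_name, runs, out_dir="."):
    """Write <team_name>.zip with one folder per run.

    runs maps a run folder name (e.g. "sub_1") to a dict that maps a
    test audio name without extension (e.g. "Audio-37") to the
    recognised text for that audio file.
    """
    # The task allows at most 3 submissions per team.
    if not 1 <= len(runs) <= 3:
        raise ValueError("a team may include between 1 and 3 submissions")

    zip_path = Path(out_dir) / f"{team_name}.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for run_name, transcripts in runs.items():
            for audio_name, text in transcripts.items():
                # writestr encodes str data as UTF-8, which suits Tamil text.
                zf.writestr(f"{run_name}/{audio_name}.txt", text)
    return zip_path
```

For example, `build_submission_zip("teamname", {"sub_1": {"Audio-37": "...", "Audio-38": "..."}})` would produce teamname.zip containing sub_1/Audio-37.txt and sub_1/Audio-38.txt.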
Submissions to the Shared task on speech recognition for vulnerable individuals in Tamil will be evaluated according to the Word Error Rate (WER) between the ASR hypotheses in the submission and the reference human transcriptions for the evaluation set.
WER = (S + D + I) / N
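where S is the number of substitutions, D is the number of deletions, I is the number of insertions, and N is the number of words in the reference transcription.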
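As an illustration of the formula, WER for a single utterance can be computed as the word-level edit distance (the minimum total of substitutions, deletions and insertions) divided by the number of reference words. This is a minimal sketch, not the official scoring script: the function name `word_error_rate` is hypothetical, words are split on whitespace only, and no text normalisation is applied. The organisers' scorer may differ, for example by aggregating errors over the whole evaluation set.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N for one reference/hypothesis pair."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr

    # prev[-1] is the minimum S + D + I; len(ref) is N.
    return prev[-1] / len(ref)
```

With the reference "a b c d" and the hypothesis "a x c", there is one substitution (b to x) and one deletion (d), so the function returns 2 / 4 = 0.5. Because insertions count as errors, WER can exceed 1 when the hypothesis is much longer than the reference.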
Note: Participants can use either the full utterances, i.e. one utterance per speaker (shared via GitHub), or the smaller utterances shared via Google Drive.
By downloading the data or by accessing it in any manner, you agree not to redistribute the data except for non-commercial and academic-research purposes. The data must not be used for providing surveillance, analyses or research that isolates a group of individuals or any single individual for any unlawful or discriminatory purpose.
You should cite this paper if you use our data.
@inproceedings{speech-acl,
title={Findings of the Shared Task on {S}peech {R}ecognition for {V}ulnerable {I}ndividuals in {T}amil},
author = "Bharathi, B and
Chakravarthi, Bharathi Raja and
Chinnaudayar Navaneethakrishnan, Subalalitha and
Sripriya, N and
Pandian, Arunaggiri and
Valli, Swetha",
booktitle = "Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion",
month = may,
year = "2022",
publisher = "Association for Computational Linguistics",
}
Important Dates for shared task:
Task announcement: Nov 20, 2021
Release of Training data: Nov 20, 2021
Release of Test data: Jan 14, 2022
Run submission deadline: Jan 30, 2022
Results declared: Feb 10, 2022
Paper submission: March 10, 2022
Peer review notification: March 26, 2022
Camera-ready paper due: April 5, 2022
Workshop Dates: May 26-28, 2022
Bharathi B, SSN College of Engineering, Tamil Nadu
Bharathi Raja Chakravarthi, Insight SFI Research Centre for Data Analytics, Data Science Institute, National University of Ireland Galway
Subalalitha Chinnaudayar Navaneethakrishnan, Department Of Computer Science & Engineering, SRM Institute Of Science And Technology, Tamil Nadu
Sripriya N, SSN College of Engineering, Tamil Nadu
Student Volunteers:
Arunaggiri Pandian, Thiagarajar College of Engineering, India.
Swetha Valli, Thiagarajar College of Engineering, India.
Email: bharathib@ssn.edu.in, arunabimanyu123@gmail.com, bharathiraja.akr@gmail.com
You can find the rank list in the link below:
Start: Nov. 21, 2021, midnight
June 15, 2022, 8:07 a.m.