Speech Recognition for Vulnerable Individuals in Tamil-ACL 2022

Organized by dravidianlangtech

First phase: Nov. 21, 2021, midnight UTC

Competition Ends: June 15, 2022, 8:07 a.m. UTC

Shared Task on Speech Recognition for Vulnerable Individuals in Tamil - LT-EDI@ACL-2022

This shared task addresses a challenging area in Automatic Speech Recognition: Tamil speech from vulnerable elderly and transgender people. Elderly people visit primary locations such as banks, hospitals and administrative offices to meet their everyday needs. Many of them do not know how to use the equipment provided there to assist people. Similarly, transgender people are deprived of primary education because of prejudice in society, so speech is the only medium that could help them meet their needs. The spontaneous speech data is gathered from elderly and transgender people who are unable to use these facilities to their advantage. A speech corpus containing 5.5 hours of transcribed speech will be released as the training set, and 2 hours of speech data will be released for testing.

Participants will be provided with the training and test datasets. To download the data and participate, go to CodaLab and click the "Participate" tab.

 

Paper name format should be: TEAM_NAME@LT-EDI-ACL2022: Title of the paper.

Example: NUIG_ULD@LT-EDI-ACL2022: Speech Recognition in Tamil

For electronic submission of papers to DravidianLangTech workshop please use this link:

Following are some general guidelines to keep in mind while submitting the working notes.
- Basic sanity check for grammatical errors and reported results
- Papers should have sufficient information for reproducing the mentioned results
- Papers should follow the appropriate style (We will use ACL 2022 style: details below)
- Check the papers for text reuse / plagiarism. This includes self-plagiarism as well. We would like to stress this point as ACL is quite strict about it. Any paper found to have plagiarized content will be rejected without further consideration.
- Please ensure the author names do not have any salutations like Dr., Prof., etc. in the final version
 
All submissions should be in double-column ACL 2022 format. Authors should use one of the ACL 2022 templates below:
 
Email: anbu.1318@gmail.com, theni_d@ssn.edu.in, bharathiraja.akr@gmail.com
 

Submission Format

The submission should be a zip file named after your team. It should contain one folder per submission (maximum 3 submissions), and each folder should contain one .txt file of recognised text for each test audio file.

E.g., teamname.zip contains a sub_1 folder, which contains Audio-37.txt, Audio-38.txt, and so on (test-filename.txt).
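The layout described above can be produced with a short script. The following is only a minimal sketch in Python: the function name `build_submission`, the run folder names beyond `sub_1`, and the choice of UTF-8 for the transcript files are assumptions made for illustration and are not specified by the organizers.

```python
import zipfile
from pathlib import Path

MAX_RUNS = 3  # the task allows at most three submissions per team


def build_submission(team_name, runs, out_dir="."):
    """Write <team_name>.zip with one folder per run and one .txt per test audio.

    `runs` maps a run folder name (e.g. "sub_1") to a dict that maps each
    test audio name without extension (e.g. "Audio-37") to its recognised text.
    This helper is hypothetical; it is not part of any official tooling.
    """
    if not 1 <= len(runs) <= MAX_RUNS:
        raise ValueError(f"expected 1 to {MAX_RUNS} runs, got {len(runs)}")

    zip_path = Path(out_dir) / f"{team_name}.zip"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for run_name, transcripts in runs.items():
            for audio_name, text in transcripts.items():
                # Assumption: str data is stored as UTF-8 so Tamil script survives.
                zf.writestr(f"{run_name}/{audio_name}.txt", text.strip() + "\n")
    return zip_path
```

For example, `build_submission("teamname", {"sub_1": {"Audio-37": "...", "Audio-38": "..."}})` would write teamname.zip containing sub_1/Audio-37.txt and sub_1/Audio-38.txt.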

Submission Instructions

We accept the test results only through the google form.
 
Upload your submission here: https://forms.gle/Q2M6W5veHEE7S1ny7
  • One team can fill the form only once.
  • Each team may submit a maximum of three runs.
  • Zip all the run files as a single zip folder and upload it. 

Evaluation Criteria

 
 

Submissions to the Shared task on speech recognition for vulnerable individuals in Tamil will be evaluated according to the Word Error Rate (WER) between the ASR hypotheses in the submission and the reference human transcriptions for the evaluation set.

WER = (S + D + I) / N

Where,

  • S is the number of substitutions
  • D is the number of deletions
  • I is the number of insertions
  • N is the number of words in the reference transcriptions
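To make the formula concrete, here is a minimal sketch in Python that computes WER for a single reference/hypothesis pair using a word-level edit distance, whose minimum value equals S + D + I for the best alignment. It is not the organizers' scoring script: the function name `word_error_rate`, whitespace tokenization, and the absence of any text normalization are assumptions, and how scores are pooled across test files is not stated here.

```python
def word_error_rate(reference, hypothesis):
    """Return (S + D + I) / N for one reference/hypothesis pair.

    Assumption: words are separated by whitespace and compared exactly.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j]: minimum edits to align the reference words seen so far
    # with the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

With the reference "a b c d" and the hypothesis "a x c", there is one substitution and one deletion, so the function returns 2 / 4 = 0.5. Because insertions are counted against N, WER can exceed 1 when the hypothesis is much longer than the reference.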

Note: Participants can use either the full utterances, i.e., one utterance per speaker (shared via GitHub), or the smaller utterances shared via Google Drive.

Terms and Conditions

By downloading the data or by accessing it in any manner, you agree not to redistribute the data except for non-commercial and academic-research purposes. The data must not be used for providing surveillance, analyses or research that isolates a group of individuals or any single individual for any unlawful or discriminatory purpose.

Please cite this paper if you use our data.

 

@inproceedings{speech-acl,
    title = "Findings of the Shared Task on {S}peech {R}ecognition for {V}ulnerable {I}ndividuals in {T}amil",
    author = "Bharathi, B and
      Chakravarthi, Bharathi Raja and
      Chinnaudayar Navaneethakrishnan, Subalalitha and
      Sripriya, N and
      Pandian, Arunaggiri and
      Valli, Swetha",
    booktitle = "Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion",
    month = may,
    year = "2022",
    publisher = "Association for Computational Linguistics",
}

 

 

Important Dates for shared task:

Task announcement: Nov 20, 2021

Release of Training data: Nov 20, 2021

Release of Test data: Jan 14, 2022

Run submission deadline: Jan 30, 2022

Results declared: Feb 10, 2022

Paper submission: March 10, 2022

Peer review notification: March 26, 2022

Camera-ready paper due: April 5, 2022

Workshop Dates: May 26-28, 2022

Organizers:

Bharathi B, SSN College of Engineering, Tamil Nadu

Bharathi Raja Chakravarthi, Insight SFI Research Centre for Data Analytics, Data Science Institute, National University of Ireland Galway

Subalalitha Chinnaudayar Navaneethakrishnan, Department Of Computer Science & Engineering, SRM Institute Of Science And Technology, Tamil Nadu

Sripriya N, SSN College of Engineering, Tamil Nadu

Student Volunteers:

Arunaggiri Pandian, Thiagarajar College of Engineering, India.

Swetha Valli, Thiagarajar College of Engineering, India.

Email: bharathib@ssn.edu.in, arunabimanyu123@gmail.com, bharathiraja.akr@gmail.com

You can find the rank list in the link below:

 Rank list
