CodaLab

> Problem with Dictionary Preparation in ETLT2021_CAMBRIDGE_EN_baseline

Hello everybody,

I ran into a problem while generating the dictionary with the provided baseline for the English data. The output of ETLT2021enChallenge/baseline/local/prepare_dict.sh contained only the special tokens defined in the script (hesitation, noise, laugh, etc.).
After examining the script, I found that line 54 has a trailing dash after the target encoding:

| iconv -f latin1 -t utf-8 - | tr '[A-Z]' '[a-z]' | \

Removing this dash fixed the problem:

| iconv -f latin1 -t utf-8 - | tr '[A-Z]' '[a-z]' | \

I don't know whether anyone else has run into this issue, so I thought I would post it here in case someone has.

Thanks.

Posted by: ai_zahran @ Feb. 20, 2021, 8:26 p.m.

I am guessing the authors intended the dash to make iconv read from standard input. However, for some reason it caused iconv to read empty input (and thus produce empty output) on my system.

Posted by: ai_zahran @ Feb. 21, 2021, 7:59 a.m.

Hello

Just to check, did this work for you:
| iconv -f latin1 -t utf-8 | tr '[A-Z]' '[a-z]' | \

i.e. removing the "-" after "utf-8".

Thanks
Kate

Posted by: kateknill @ March 3, 2021, 11 a.m.

Dear Kate,

Yes, that is correct. I am sorry, I forgot to remove the dash from the second command in my original post.

Thank you
Ahmed

Posted by: ai_zahran @ March 3, 2021, 8 p.m.
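
For anyone who wants to sanity-check the fix in isolation, here is a minimal sketch of the corrected pipeline stage confirmed above. The function name `to_lower_utf8` and the two sample words are hypothetical and only stand in for the real lexicon input. This is not a copy of the baseline script, and it assumes an iconv that accepts the `latin1` encoding name, as the original command does.

```shell
#!/bin/sh
# Hypothetical helper wrapping the corrected stage: iconv is called with no
# trailing "-", so it reads the data piped to it on standard input.
to_lower_utf8() {
  iconv -f latin1 -t utf-8 | tr '[A-Z]' '[a-z]'
}

# Hypothetical sample input; the real script pipes the lexicon source here.
printf 'HELLO\nWORLD\n' | to_lower_utf8
```

If this prints the two words in lowercase but the dictionary still contains only the special tokens, the empty output is probably coming from a different stage of prepare_dict.sh.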

Interspeech Shared Task: Automatic Speech Recognition for Non-Native Children’s Speech Forum