Hello everybody,
I ran into a problem while generating the dictionary with the provided baseline for the English data. The output of ETLT2021enChallenge/baseline/local/prepare_dict.sh contained only the special tokens defined in the script (hesitation, noise, laugh, etc.).
After examining the script, I found that line 54 had a trailing dash after the target encoding:
| iconv -f latin1 -t utf-8 - | tr '[A-Z]' '[a-z]' | \
Removing this dash fixed the problem:
| iconv -f latin1 -t utf-8 - | tr '[A-Z]' '[a-z]' | \
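To check whether your own iconv build behaves this way, a small shell sketch like the following runs both variants of this pipeline stage on a made-up Latin-1 sample and prints the results. The sample string, the `make_sample` helper and the variable names are hypothetical and are not part of prepare_dict.sh. Only the iconv/tr stages mirror the line discussed above.

```shell
#!/bin/sh
# Hypothetical helper: emit one line of Latin-1 text.
# \351 (octal) is the byte 0xE9, which is "é" in Latin-1.
make_sample() { printf 'CAF\351\n'; }

# Variant with the trailing dash, as in the original script line.
with_dash=$(make_sample | iconv -f latin1 -t utf-8 - | tr '[A-Z]' '[a-z]')

# Variant without the dash, as proposed in this thread.
without_dash=$(make_sample | iconv -f latin1 -t utf-8 | tr '[A-Z]' '[a-z]')

printf 'with dash:    %s\n' "$with_dash"
printf 'without dash: %s\n' "$without_dash"

# An empty result here means this system's iconv shows the reported problem.
if [ -z "$with_dash" ]; then
  echo "The variant with the dash produced no output on this system."
fi
```

If the "with dash" line comes out empty while the "without dash" line prints "café", your iconv behaves like the one reported here, and dropping the dash is the safer choice.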
I don't know whether anyone else has run into this issue, but I wanted to post it here in case someone has.
Thanks.
Posted by: ai_zahran @ Feb. 20, 2021, 8:26 p.m.
I am guessing the authors intended the dash to make iconv read from standard input. However, for some reason it caused iconv to read empty input (and thus produce empty output) on my system.
Posted by: ai_zahran @ Feb. 21, 2021, 7:59 a.m.
Hello
Just to check, did this work for you:
| iconv -f latin1 -t utf-8 | tr '[A-Z]' '[a-z]' | \
i.e., removing the "-" after "utf-8"?
Thanks
Kate
Dear Kate,
Yes, that is correct. I'm sorry, I also forgot to remove it from the second command in my original post.
Thank you
Ahmed