IndoNLU Benchmark

Organized by gentaiscool

NERP

Sept. 20, 2020, midnight UTC

Current

FacQA

Sept. 20, 2020, midnight UTC

End

Competition Ends

Never

IndoNLU Benchmark

Indonesian is known to be the fourth most-used language on the internet, with around 171 million internet users across the globe. Despite the large amount of Indonesian data available online, NLP research on Indonesian has advanced slowly. This is because the available datasets are scattered, poorly documented, and see minimal community engagement.

To address this problem, we propose the first-ever Indonesian natural language understanding benchmark, IndoNLU, a collection of 12 diverse tasks. The tasks are categorized mainly by input, such as single sentences and sentence pairs, and by objective, such as sentence classification and sequence labeling, and they span different levels of difficulty, domains, and styles. The benchmark is designed to cover both formal and colloquial Indonesian, which are highly diverse.

To establish a strong baseline, we collect a large, clean Indonesian dataset, called Indo4B, and use it to train monolingual contextual pre-trained language models, called IndoBERT and IndoBERT-lite. We demonstrate the effectiveness of our dataset and our pre-trained models in capturing sentence-level semantics, and apply them to the classification and sequence labeling tasks.

To help with the reproducibility of the benchmark, we release the pre-trained models together with the collected data and code. To accelerate community engagement and benchmark transparency, we have set up a leaderboard website for the NLP community at https://www.indobenchmark.com/, and we provide the models and the data here: https://github.com/indobenchmark/indonlu.

To participate in the challenge, submit your predictions to this CodaLab competition!

Best of luck!

Evaluation Criteria

Evaluation for the classification subtasks is based on accuracy, macro-precision, macro-recall, and macro-F1. We use F1 as our main evaluation metric.
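
As a rough illustration of the classification metrics named above, here is a minimal sketch in plain Python. It assumes the common convention that macro scores are the unweighted mean of per-class precision, recall, and F1, with a score of 0 for any class whose denominator is empty. The function name `classification_metrics` is hypothetical, and the competition's actual scoring script may differ in details.

```python
from collections import Counter


def classification_metrics(gold, pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Assumption: macro scores are the unweighted mean over every label that
    appears in either the gold or the predicted labels.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("gold and pred must not be empty")

    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1   # correct prediction for class g
        else:
            fp[p] += 1   # class p was predicted but wrong
            fn[g] += 1   # class g was missed

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(gold),
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }
```

Calling `classification_metrics(gold_labels, predicted_labels)` on two equal-length lists returns a dictionary with the four scores, which can be handy for checking a model locally before using one of the daily submissions.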

Terms and Conditions

Submissions are limited to 3 per day.

Please check the submission example directory. The format is different for each task. Every submission file always starts with the `index` column (the id of the test sample, following the order of the masked test set).

For your submission, first rename your prediction file to 'pred.txt', then zip the file.
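
The packaging steps above could be scripted roughly as follows. This is only a sketch under stated assumptions: the comma-separated layout, the `label` column name, the zero-based `index` values, and the archive name `submission.zip` are all hypothetical. The real per-task format should be taken from the submission example directory. Only the `index` column and the 'pred.txt' file name come from the instructions here.

```python
import csv
import zipfile
from pathlib import Path


def package_submission(predictions, out_dir=".", label_column="label"):
    """Write predictions to pred.txt and zip it for upload.

    Assumptions (not confirmed by the competition page): comma-separated
    values, a single prediction column named `label_column`, and index
    values that count up from 0 in test-set order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # The prediction file must be named pred.txt and start with `index`.
    pred_path = out_dir / "pred.txt"
    with pred_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["index", label_column])
        for index, label in enumerate(predictions):
            writer.writerow([index, label])

    # Zip the file, storing it at the archive root as pred.txt.
    zip_path = out_dir / "submission.zip"  # hypothetical archive name
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(pred_path, arcname="pred.txt")
    return zip_path
```

For example, `package_submission(["positive", "negative"], out_dir="out")` would write `out/pred.txt` and `out/submission.zip`. Tasks with a different layout, such as sequence labeling, would need the row-writing loop adapted to match their example file.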