IndoNLU Benchmark

Organized by gentaiscool

Competition Ends: Never

Indonesian is known to be the fourth most frequently used language on the internet, with around 171 million internet users worldwide. Despite the large amount of Indonesian data available online, NLP research on Indonesian has progressed slowly. This is largely because the available datasets are scattered, poorly documented, and have seen minimal community engagement.

To address this problem, we propose IndoNLU, the first-ever Indonesian natural language understanding benchmark: a collection of 12 diverse tasks. The tasks are categorized by input type (single sentences or sentence pairs) and by objective (sentence classification or sequence labeling), and they vary in difficulty, domain, and style. The benchmark is designed to cover the highly diverse range of styles found in both formal and colloquial Indonesian.

To establish a strong baseline, we collect a large, clean Indonesian dataset called Indo4B and use it to train monolingual contextual pre-trained language models called IndoBERT and IndoBERT-lite. We demonstrate the effectiveness of our dataset and pre-trained models in capturing sentence-level semantics and apply them to the classification and sequence labeling tasks.

To support the reproducibility of the benchmark, we release the pre-trained models along with the collected data and code. To accelerate community engagement and improve benchmark transparency, we have set up a leaderboard website for the NLP community at https://www.indobenchmark.com/. The models and data are available at https://github.com/indobenchmark/indonlu.

To participate in the challenge, submit your predictions to this CodaLab competition!

Best of luck!

Evaluation Criteria

Evaluation is based on accuracy, macro-precision, macro-recall, and macro-F1 for the classification subtasks. We use the F1 score as our main evaluation metric.
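For reference, here is a minimal Python sketch of how accuracy and the macro-averaged precision, recall, and F1 scores can be computed for a classification subtask. It is not the official CodaLab scoring script: the function name `classification_scores` is hypothetical, and two conventions are assumptions on our part, namely that every label appearing in either the gold or the predicted labels counts as a class, and that an undefined ratio (0 / 0) counts as 0. Both match the default behavior of scikit-learn's `f1_score(average="macro")`.

```python
from collections import Counter


def classification_scores(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro averaging computes each metric per class and then takes the
    unweighted mean, so rare classes count as much as frequent ones.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one example")

    # Assumption: every label seen in either gold or predictions is a class.
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # the predicted class received a wrong example
            fn[gold] += 1  # the gold class missed an example

    def safe_div(num, den):
        # Assumption: an undefined ratio (0 / 0) counts as 0.
        return num / den if den else 0.0

    precision, recall, f1 = [], [], []
    for c in labels:
        p = safe_div(tp[c], tp[c] + fp[c])
        r = safe_div(tp[c], tp[c] + fn[c])
        precision.append(p)
        recall.append(r)
        f1.append(safe_div(2 * p * r, p + r))

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "macro_precision": sum(precision) / n,
        "macro_recall": sum(recall) / n,
        "macro_f1": sum(f1) / n,
    }


# Toy example with three classes: accuracy is 0.75, macro_f1 is about 0.778.
scores = classification_scores(
    ["pos", "neg", "neg", "neu"],
    ["pos", "neg", "pos", "neu"],
)
print(scores)
```

Because macro averaging weights every class equally, a model cannot score well on macro-F1 simply by predicting the majority class, which is one reason it is a useful headline metric for imbalanced label sets.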

Terms and Conditions

Submissions are limited to 3 per day.

Please check the submission example directory, as each task uses a different format. Every submission file must start with the `index` column (the ID of the test sample, following the order of the masked test set).
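As an illustration only, the sketch below renders predictions as CSV text with `index` as the first column and rows sorted by index. The exact columns differ per task, so the comma separator, the `label` column name, 0-based indices, and the function name `format_predictions` are all hypothetical; copy the real layout from the task's submission example.

```python
import csv
import io


def format_predictions(predictions, label_column="label"):
    """Render predictions as CSV text whose first column is `index`.

    `predictions` maps each test-sample index to its predicted label.
    Rows are written in ascending index order, assumed to match the
    order of the masked test set. The separator and `label_column`
    name are hypothetical placeholders.
    """
    indices = sorted(predictions)
    # Assumption: indices are 0-based and every test sample has a prediction.
    if indices != list(range(len(indices))):
        raise ValueError("expected indices 0..n-1 with no gaps")
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["index", label_column])
    for i in indices:
        writer.writerow([i, predictions[i]])
    return buf.getvalue()
```

For example, `format_predictions({1: "sadness", 0: "happy"})` returns the header row followed by the rows for index 0 and index 1, regardless of insertion order; the result can then be saved with `open("pred.txt", "w", encoding="utf-8", newline="")`.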

To submit, rename your prediction file to 'pred.txt' and then zip it.
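The rename-and-zip step can be scripted with Python's standard `zipfile` module, as in this minimal sketch. Passing `arcname="pred.txt"` stores the file under that name at the archive root, so the original file does not need to be renamed on disk. That the scorer expects `pred.txt` at the root (rather than inside a folder) is an assumption, and the function name `package_submission` and default archive name `submission.zip` are hypothetical.

```python
import zipfile
from pathlib import Path


def package_submission(pred_file, zip_path="submission.zip"):
    """Zip a prediction file for upload, storing it as 'pred.txt'.

    Assumption: the scorer looks for 'pred.txt' at the archive root.
    """
    pred_file = Path(pred_file)
    if not pred_file.is_file():
        raise FileNotFoundError(pred_file)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # arcname renames the file inside the archive and drops any directory prefix.
        zf.write(pred_file, arcname="pred.txt")
    return Path(zip_path)
```

Keeping the rename inside the archive avoids accidentally overwriting a different `pred.txt` from another task in the working directory.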

EmoT

Start: Sept. 20, 2020, midnight

Description: Emotion Twitter Classification Task

SmSA

Start: Sept. 20, 2020, midnight

Description: Sentence-level Sentiment Analysis Task

CASA

Start: Sept. 20, 2020, midnight

Description: Car Reviews Aspect-based Sentiment Analysis Task

HoASA

Start: Sept. 20, 2020, midnight

Description: Hotel Aspect-based Sentiment Analysis Task

WReTE

Start: Sept. 20, 2020, midnight

Description: The Wiki Revision Edits Textual Entailment Task

POSP

Start: Sept. 20, 2020, midnight

Description: The Prosa Part-of-Speech Task

BaPOS

Start: Sept. 20, 2020, midnight

Description: The PAN Localization Project Part-of-Speech Task

TermA

Start: Sept. 20, 2020, midnight

Description: The Airy Span Extraction Task

KEPS

Start: Sept. 20, 2020, midnight

Description: The Keyphrase Extraction Task

NERGrit

Start: Sept. 20, 2020, midnight

Description: The Grit-ID Named Entity Recognition Task

NERP

Start: Sept. 20, 2020, midnight

Description: The Prosa Named Entity Recognition Task

FacQA

Start: Sept. 20, 2020, midnight

Description: The Factoid Question Answering Task

