2019 PRMU challenge on old Japanese character recognition

Organized by tttamaki

Overview

Recognize three successive characters in old Japanese documents and output the Unicode code points of the three characters.

2019/Sep/6: The challenge is over. Thank you for your participation!

2019/July/23: We apologize for the inconvenience caused by the CodaLab server outage. The system is back online.

  • Samples: Each sample is a rectangle containing three successive characters, arranged vertically.
  • Categories: about 50 kana characters (the Japanese syllabary), called "Kuzushiji". Kanji (Chinese characters) are not included.
  • References: See the CODH seminars for classical Japanese literature, KMNIST for Kuzushiji characters in classical Japanese literature, and the Kuzushiji Dataset for details of the dataset.

Organizers

  • Tomo Miyazaki (Tohoku University)
  • Toru Tamaki (Hiroshima University)
  • Kazuaki Nakamura (Osaka University)
  • Masashi Nishiyama (Tottori University)
  • Yusuke Uchida (DeNA)
  • Takanori Ogata (ABEJA)
  • Keiichiro Shirai (Shinshu University)
  • Asanobu Kitamoto (ROIS-DS Center for Open Data in the Humanities, NII)
  • Tarin Clanuwat (ROIS-DS Center for Open Data in the Humanities, NII)

Terms and Conditions

This challenge is governed by the general ChaLearn contest rules.

Evaluation and submission

Metric

A sample (a single training or test image) contains three characters. A sample counts as correct only if all three characters are correct; otherwise it is wrong. Submissions are evaluated by recognition rate (number of correct samples / number of all samples).
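
As a rough illustration of the metric, the sketch below computes the recognition rate from two index-aligned lists of three-character label tuples. The function name `recognition_rate` and the in-memory data layout are assumptions made for this example; it is not the organizers' scoring script.

```python
from typing import Sequence, Tuple

Triplet = Tuple[str, str, str]


def recognition_rate(truth: Sequence[Triplet], pred: Sequence[Triplet]) -> float:
    """Fraction of samples whose three labels all match (plain string comparison)."""
    if len(truth) != len(pred):
        raise ValueError("truth and pred must have the same number of samples")
    if not truth:
        raise ValueError("at least one sample is required")
    # A sample is correct only when all three characters agree.
    correct = sum(1 for t, p in zip(truth, pred) if tuple(t) == tuple(p))
    return correct / len(truth)


truth = [("U+3057", "U+3044", "U+308A"), ("U+304A", "U+3068", "U+304F")]
pred = [("U+3057", "U+3044", "U+308A"), ("U+304A", "U+3068", "U+3057")]
print(recognition_rate(truth, pred))  # 0.5
```

In the second sample only the third character is wrong, yet the whole sample counts as wrong, so one of two samples is correct.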

Phases

  1. test-dev phase: the first 3000 samples in the test set are evaluated, and their recognition rate is shown on the leaderboard. This is NOT the final score.
  2. test-challenge phase: all remaining samples in the test set (from 3000 to the last, that is, excluding the first 3000) are evaluated. Their recognition rate is the final score.
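
The two phases can be pictured as scoring two disjoint slices of the same prediction file. The sketch below assumes that samples are ordered by ID, so that "the first 3000" means IDs 0 to 2999; the function name `phase_scores` and its `dev_size` parameter are hypothetical and serve only as illustration.

```python
from typing import Sequence, Tuple

Triplet = Tuple[str, str, str]
DEV_SIZE = 3000  # number of leading test samples scored in the test-dev phase


def phase_scores(
    truth: Sequence[Triplet], pred: Sequence[Triplet], dev_size: int = DEV_SIZE
) -> Tuple[float, float]:
    """Return (test-dev rate, test-challenge rate) for index-aligned label lists."""
    if len(truth) != len(pred):
        raise ValueError("truth and pred must have the same number of samples")
    if dev_size <= 0 or len(truth) <= dev_size:
        raise ValueError("need at least one sample in each phase")
    hits = [tuple(t) == tuple(p) for t, p in zip(truth, pred)]
    dev = sum(hits[:dev_size]) / dev_size  # first dev_size samples
    challenge = sum(hits[dev_size:]) / (len(hits) - dev_size)  # all the rest
    return dev, challenge


# Tiny demo with dev_size=2 instead of 3000.
good = ("U+3042", "U+3042", "U+3042")
bad = ("U+3042", "U+3042", "U+3044")
truth = [good] * 6
pred = [bad, good, good, good, good, bad]
print(phase_scores(truth, pred, dev_size=2))  # (0.5, 0.75)
```

Because the slices do not overlap, a high test-dev score says nothing directly about the remaining samples, which is consistent with the gap between the two result tables.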

Protocol

  • During the test-dev phase, you can submit a CSV file up to 15 times (changed on 2019/7/23 from a maximum of 5 to a maximum of 15), with at most 5 submissions per day. (Caution! If you submit 5 times a day for three days, you will use up all 15 allowed submissions.)
  • The submitted CSV file must contain predictions for all 16387 samples in the test set (NOT only the first 3000 samples).
  • If the file is submitted successfully, the resulting score is shown in your private view (the score is the recognition rate for the first 3000 samples in the test set). Scoring may take from a few minutes up to 20 minutes. Please be patient! (If something goes wrong, check the log by clicking the "View scoring output log" link.)
  • Once your submissions have scores, choose one of the results to show on the public leaderboard by clicking the "Submit to Leaderboard" button. The selected submission is used for the evaluation in the test-challenge phase. To show a different submission on the leaderboard (and hence use it for the test-challenge evaluation), simply click the "Submit to Leaderboard" button of that submission.
  • In the test-challenge phase, you do not have to do anything. The score of the submission selected in the test-dev phase is shown on the leaderboard. The score is the recognition rate for all samples in the test set excluding the first 3000.

Note: The leaderboard is currently hidden, but the submission you choose is still valid for the test-challenge phase. Please do not forget to click the "Submit to Leaderboard" button!

Submission file format

example

The file should have 16388 lines (one header line followed by 16387 sample lines):

ID,Unicode1,Unicode2,Unicode3
0,U+3057,U+3044,U+308A
1,U+304A,U+3068,U+304F
2,U+304B,U+306A,U+3057
...
16386,U+304B,U+306A,U+3057
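
A quick layout check before uploading might look like the sketch below. It assumes that every ID from 0 to 16386 should appear exactly once, which is inferred from the example rather than stated explicitly; the helper name `check_layout` is hypothetical.

```python
import csv
import io
from typing import List

EXPECTED_HEADER = ["ID", "Unicode1", "Unicode2", "Unicode3"]
NUM_SAMPLES = 16387  # test-set size stated in the protocol


def check_layout(text: str, num_samples: int = NUM_SAMPLES) -> List[str]:
    """Return a list of problems; an empty list means the layout looks fine."""
    rows = list(csv.reader(io.StringIO(text)))
    problems = []
    if not rows or rows[0] != EXPECTED_HEADER:
        problems.append("first line must be ID,Unicode1,Unicode2,Unicode3")
    body = rows[1:]
    if len(body) != num_samples:
        problems.append(f"expected {num_samples} data lines, found {len(body)}")
    if any(len(r) != 4 for r in body):
        problems.append("every data line needs 4 comma-separated fields")
    # Assumption: IDs 0..num_samples-1 each appear exactly once.
    ids = [r[0] for r in body if r]
    if sorted(ids) != sorted(str(i) for i in range(num_samples)):
        problems.append("IDs should cover 0..N-1 exactly once")
    return problems


sample = (
    "ID,Unicode1,Unicode2,Unicode3\n"
    "0,U+3057,U+3044,U+308A\n"
    "1,U+304A,U+3068,U+304F\n"
)
print(check_layout(sample, num_samples=2))  # []
```

For a real submission, read the contents of test_prediction.csv and call `check_layout` with the default `num_samples`.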

header

  1. ID: ID of the sample in the test set
  2. Unicode1: Unicode code point of the first character (top of the image)
  3. Unicode2: Unicode code point of the second character (middle of the image)
  4. Unicode3: Unicode code point of the third character (bottom of the image)

each line

Each line contains the ID followed by the Unicode code points of the three characters, each in the form "U+xxxx", separated by commas. The hexadecimal digits A to F must be uppercase, because evaluation is done by string matching.
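
Since matching is done on strings, it can help to generate the codes programmatically rather than by hand. The following sketch formats three predicted characters into one line and checks a line against a regular expression. The helper names and the pattern are assumptions for illustration; in particular, the pattern assumes exactly four hexadecimal digits, following the "U+xxxx" form.

```python
import re

# Assumed line shape: decimal ID, then three codes with 4 uppercase hex digits.
ROW_PATTERN = re.compile(r"^[0-9]+(,U\+[0-9A-F]{4}){3}$")


def to_code(ch: str) -> str:
    """Format a single character as 'U+XXXX' with uppercase hex digits."""
    return f"U+{ord(ch):04X}"


def make_row(sample_id: int, chars: str) -> str:
    """Build one CSV line from a sample ID and a string of three characters."""
    if len(chars) != 3:
        raise ValueError("each sample has exactly three characters")
    return ",".join([str(sample_id)] + [to_code(c) for c in chars])


row = make_row(0, "\u3057\u3044\u308A")  # hiragana shi, i, ri
print(row)  # 0,U+3057,U+3044,U+308A
print(bool(ROW_PATTERN.match(row)))  # True
print(bool(ROW_PATTERN.match("1,U+304a,U+3068,U+304f")))  # False
```

The last line fails the check only because its hex digits are lowercase, which is the mistake the capitalization rule warns about.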

file name

The file must be named "test_prediction.csv" and compressed into a single zip file.

The zip file can have any name, but should contain a single file named "test_prediction.csv" without any folders.

You can create and check your submission zip file by using zip and unzip as follows:

$ zip yoursubmission.zip test_prediction.csv
updating: test_prediction.csv (deflated 79%)
$ unzip -l yoursubmission.zip
Archive:  yoursubmission.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
   357417  03-22-2019 10:17   test_prediction.csv
---------                     -------
   357417                     1 file
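
If the zip and unzip commands are not available, the same archive can be produced with Python's standard zipfile module. This is a minimal sketch; the function names `make_submission_zip` and `check_submission_zip` are made up for this example, and the demo writes a two-line placeholder file rather than a real prediction.

```python
import tempfile
import zipfile
from pathlib import Path

CSV_NAME = "test_prediction.csv"


def make_submission_zip(csv_path, zip_path) -> None:
    """Zip the prediction file so it sits at the archive root with the required name."""
    csv_path = Path(csv_path)
    if csv_path.name != CSV_NAME:
        raise ValueError(f"the file must be named {CSV_NAME}")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(csv_path, arcname=CSV_NAME)  # arcname drops any folder prefix


def check_submission_zip(zip_path) -> bool:
    """True if the archive holds exactly one entry named test_prediction.csv."""
    with zipfile.ZipFile(zip_path) as zf:
        return zf.namelist() == [CSV_NAME]


with tempfile.TemporaryDirectory() as tmp:
    csv_file = Path(tmp) / CSV_NAME
    csv_file.write_text("ID,Unicode1,Unicode2,Unicode3\n0,U+3057,U+3044,U+308A\n")
    zip_file = Path(tmp) / "yoursubmission.zip"
    make_submission_zip(csv_file, zip_file)
    print(check_submission_zip(zip_file))  # True
```

Passing `arcname` explicitly keeps the CSV out of any folder inside the archive, even when the source file lives in a subdirectory.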

The challenge ended on 1/Sep/2019. Here are the results.

test-challenge result

User          score (rank)
catla         0.906300 (1)
komachi       0.893500 (2)
blue0620      0.889500 (3)
tangyiping    0.874100 (4)
suzuki.t      0.844000 (5)
takedarts     0.832700 (6)
hrokamo       0.776400 (7)
ITakahashi    0.768300 (8)
xiechun       0.764400 (9)
moriatyyuan   0.756000 (10)
HU_XIAORAN    0.738800 (11)
takedak       0.720000 (12)
morimori      0.691500 (13)
ItoChihiro    0.686500 (14)
ntuanhung     0.683700 (15)
ueki          0.657700 (16)
YukiNiino     0.627100 (17)
h_o_t         0.623000 (18)
sandy2008     0.583000 (19)
mys3_74       0.576000 (20)
YusukeKyokawa 0.512500 (21)
b1016042      0.506800 (22)
verlier       0.355000 (23)
gotoy         0.233300 (24)

test-dev result

User          score (rank)
catla         0.964000 (1)
komachi       0.958700 (2)
blue0620      0.948300 (3)
tangyiping    0.936700 (4)
suzuki.t      0.927700 (5)
takedarts     0.905000 (6)
ITakahashi    0.890300 (7)
hrokamo       0.888700 (8)
xiechun       0.884000 (9)
moriatyyuan   0.870700 (10)
HU_XIAORAN    0.857300 (11)
morimori      0.837000 (12)
takedak       0.832700 (13)
ntuanhung     0.814700 (14)
ItoChihiro    0.813000 (15)
ueki          0.788000 (16)
YukiNiino     0.761000 (17)
h_o_t         0.757700 (18)
sandy2008     0.746700 (19)
mys3_74       0.722300 (20)
YusukeKyokawa 0.667300 (21)
b1016042      0.654000 (22)
verlier       0.497000 (23)
gotoy         0.347700 (24)

test-dev

Start: May 1, 2019, 3 p.m.

Description: Submit your results on the test set. Choose one of your results to show on the leaderboard; that result is used for the final phase.

test-challenge

Start: Aug. 31, 2019, 2:59 p.m.

Description: Final phase. You do not submit results. Instead, the result shown on the leaderboard of the test-dev phase is used for the final evaluation.

Competition Ends

Sept. 1, 2019, 2:59 p.m.
