OCR Technology Challenge for Automatic Machine Scoring - Test Paper Character Recognition

Organized by handan

First phase
Aug. 19, 2019, midnight UTC

End

Competition Ends
Sept. 6, 2019, noon UTC

OCR Challenge for Automatic Machine Scoring

Challenge Overview

In recent years, with the rapid development of artificial intelligence, many computer vision tasks such as object detection, image classification, and object segmentation have seen major breakthroughs and gradually found practical application. Among them, OCR (optical character recognition) is already mature for document and book digitization and for license plate recognition under ideal conditions. However, in the relatively open scenario of machine scoring, despite much research, a series of challenging problems remain difficult to solve, such as complex layouts, mixed Chinese and English text, numbers, symbols, and image distortion.

Tomorrow Advancing Life (TAL) and the Institute of Computing Technology, Chinese Academy of Sciences (ICT, CAS) co-organize this "OCR Challenge for Automatic Machine Scoring" (referred to below as the challenge) at PRCV2019. The challenge consists of two tasks: (1) test paper layout analysis and text line detection, and (2) character recognition. It aims to promote understanding of the fundamental problems and to advance the key technologies of OCR in machine scoring scenarios, so that OCR can be applied to practical machine scoring.

Important Dates

  • 2019.07.09 Release of training data (images and labels).
  • 2019.07.15 Team registration deadline.
  • 2019.08.19 Release of validation data (images only). Validation submission site open.
  • 2019.09.06 Validation submission closes. Release of validation data (labels).
  • 2019.09.20 Final test submission deadline.
  • 2019.10.15 Release of challenge results.
  • 2019.11.08 Workshop at PRCV.

Evaluation Criteria

Task Introduction

Output the position of each text line in the test paper and recognize the text in each box. The text includes, but is not limited to, Chinese, English, numbers, formulas, and symbols, and may be printed or handwritten.

Evaluation

The average similarity of text is used as the final score.

A predicted text box is recalled if its IoU (intersection over union) with a ground-truth box is above 0.7. Let x be the recognized content of a recalled box, y the content of its corresponding ground-truth box, and L the Levenshtein distance between x and y.
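The IoU test can be sketched in Python as follows. This is an illustration only, not the organizers' evaluation script: it assumes each box is given as the eight-number location list used in the labels, that both quadrilaterals are convex, and it uses Sutherland-Hodgman clipping to obtain the intersection polygon. The function names are hypothetical.

```python
def signed_area(points):
    """Signed area of a polygon via the shoelace formula."""
    total = 0.0
    for i in range(len(points)):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % len(points)]
        total += x1 * y2 - x2 * y1
    return total / 2.0


def polygon_area(points):
    """Unsigned polygon area (0.0 for fewer than three points)."""
    return abs(signed_area(points))


def clip_convex(subject, clipper):
    """Sutherland-Hodgman clipping of `subject` against a convex `clipper`."""
    if signed_area(clipper) < 0:
        clipper = clipper[::-1]  # normalise orientation so "inside" means side >= 0
    output = list(subject)
    for i in range(len(clipper)):
        if not output:
            break
        ax, ay = clipper[i]
        bx, by = clipper[(i + 1) % len(clipper)]

        def side(p):
            # Sign of the cross product: which side of edge a->b the point lies on.
            return (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)

        polygon, output = output, []
        prev = polygon[-1]
        for cur in polygon:
            sp, sc = side(prev), side(cur)
            if (sp < 0) != (sc < 0):
                # Segment prev->cur crosses the clip edge: keep the crossing point.
                t = sp / (sp - sc)
                output.append((prev[0] + t * (cur[0] - prev[0]),
                               prev[1] + t * (cur[1] - prev[1])))
            if sc >= 0:
                output.append(cur)
            prev = cur
    return output


def quad_iou(loc_a, loc_b):
    """IoU of two convex quadrilaterals given as [x1, y1, ..., x4, y4]."""
    a = list(zip(loc_a[0::2], loc_a[1::2]))
    b = list(zip(loc_b[0::2], loc_b[1::2]))
    inter = polygon_area(clip_convex(a, b))
    union = polygon_area(a) + polygon_area(b) - inter
    return inter / union if union > 0 else 0.0


def is_recalled(pred_loc, gt_loc, threshold=0.7):
    """A prediction counts as recalled when its IoU is above the threshold."""
    return quad_iou(pred_loc, gt_loc) > threshold
```

How predictions are paired with ground-truth boxes when several overlap is not specified on this page, so the sketch only covers the pairwise test.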

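The similarity is calculated as  sim = 1 - L / max(|x|, |y|), where L is the Levenshtein distance between x and y, and |x| and |y| are the lengths of x and y in characters.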
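A minimal Python sketch of this similarity, assuming the denominator is the length in characters of the longer string. The function names are hypothetical, and the handling of two empty strings is an assumption, since the page does not cover that case.

```python
def levenshtein(x, y):
    """Levenshtein (edit) distance between strings x and y."""
    prev = list(range(len(y) + 1))  # distances from "" to each prefix of y
    for i, cx in enumerate(x, start=1):
        cur = [i]
        for j, cy in enumerate(y, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (cx != cy)))   # substitution or match
        prev = cur
    return prev[-1]


def similarity(x, y):
    """sim = 1 - L / max(len(x), len(y))."""
    longest = max(len(x), len(y))
    if longest == 0:
        return 1.0  # assumption: two empty strings are treated as identical
    return 1.0 - levenshtein(x, y) / longest
```

For example, `similarity("abcd", "abcf")` needs one substitution over four characters and gives 0.75.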

Precision, recall, and their harmonic mean (hmean) are then computed as:

     precision = sum(sim) / (TP + FP)

     recall = sum(sim) / (TP + FN)

     hmean = 2 * precision * recall / (precision + recall)
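The aggregation can be sketched as below. It is not the official scorer: it assumes that TP + FP equals the total number of predicted boxes, that TP + FN equals the total number of ground-truth boxes, and that `sims` holds one similarity value per recalled box. The function name and the zero-division fallbacks are hypothetical.

```python
def score(sims, num_pred, num_gt):
    """Return (precision, recall, hmean) from similarities of recalled boxes.

    sims     -- similarity values, one per recalled (true positive) box
    num_pred -- total predicted boxes, taken here as TP + FP (assumption)
    num_gt   -- total ground-truth boxes, taken here as TP + FN (assumption)
    """
    total = sum(sims)
    precision = total / num_pred if num_pred else 0.0
    recall = total / num_gt if num_gt else 0.0
    denom = precision + recall
    hmean = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, hmean
```

With `sims = [1.0, 0.5]`, four predicted boxes, and five ground-truth boxes, precision is 1.5 / 4 = 0.375 and recall is 1.5 / 5 = 0.3.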

Results Submission

For each test image, submit a dict dumped to a JSON file containing the detection and recognition results. The dict has two keys:

  • "image_name": the file name of the image, without any path.
  • "label": a list of all detection and recognition results for the image.

Each result in "label" is a dict with three keys:

  • "type": always "TextLineBox".
  • "content": the string of characters in the text line. Formulas should be written in LaTeX.
  • "location": eight integers giving the x and y coordinates of the corner points, in the order top-left, top-right, bottom-right, bottom-left.

The submission file format is exactly the same as that of the training labels.
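A sketch of building one such result in Python is shown below. The helper name, the image name "0001.jpg", the coordinates, and the use of `ensure_ascii=False` are assumptions for illustration; the training labels remain the authoritative reference for the exact layout.

```python
import json


def make_result(image_name, lines):
    """Build the per-image dict; `lines` is a list of (content, location) pairs."""
    label = []
    for content, location in lines:
        if len(location) != 8:
            raise ValueError("location needs 8 numbers: x and y for 4 corners")
        label.append({
            "type": "TextLineBox",
            "content": content,
            # Order: top-left, top-right, bottom-right, bottom-left.
            "location": [int(v) for v in location],
        })
    return {"image_name": image_name, "label": label}


# Hypothetical example: one text line containing a formula written in LaTeX.
result = make_result("0001.jpg", [("x^{2}+1=5", [10, 20, 200, 20, 200, 60, 10, 60])])
text = json.dumps(result, ensure_ascii=False)  # keeps Chinese characters readable
```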

All the JSON files should be put into a folder named "answers". The "answers" folder should then be zipped into a file named "answers.zip" and submitted.
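Packaging could look like the following sketch, which writes each per-image dict into an "answers" folder inside the archive. Naming each JSON file after the image (for example "0001.json" for "0001.jpg") is an assumption, as the page does not state a naming rule; the function name is hypothetical.

```python
import json
import zipfile
from pathlib import Path


def write_submission(results, out_zip="answers.zip"):
    """Write one JSON file per result dict under answers/ inside the zip."""
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for result in results:
            # Assumed naming rule: image file stem plus ".json".
            json_name = Path(result["image_name"]).stem + ".json"
            zf.writestr("answers/" + json_name,
                        json.dumps(result, ensure_ascii=False))
    return out_zip
```

Calling `write_submission(list_of_result_dicts)` produces "answers.zip" in the current directory.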

Terms and Conditions

Database Release Agreement

The database for the challenge consists of examination papers and annotations provided by Tomorrow Advancing Life (TAL).

  • The database can only be used for research purposes, i.e., in papers or technical reports. No images can be used in commercial materials, newspapers, or other public media.
  • The database should not be re-distributed, published, copied or further disseminated in any way or form whatsoever, whether for profit or not. This includes further distributing, copying, or disseminating to a different facility or organizational unit in the requesting university, organization, or company.
  • All the images will be used for the purpose of scientific research only. The database, in whole or in part, cannot be used for any commercial purpose in any form.
  • The copyright of the data belongs to Tomorrow Advancing Life (TAL).
  • In case of any violation of the above commitment, all losses caused to Tomorrow Advancing Life (TAL) shall be borne by the participant.
  • TAL reserves the right of final interpretation of this agreement.

Prize Settings

The prizes for the challenge are as follows.

  • One first place award: ¥ 30,000.00.
  • One second place award: ¥ 10,000.00.

All the prizes are provided by TAL.
