OCR Technology Challenge for Automatic Machine Scoring - Layout and Text Line Detection

Organized by handan

First phase: Aug. 19, 2019, midnight UTC

Competition ends: Sept. 6, 2019, noon UTC

OCR Challenge for Automatic Machine Scoring

Challenge Overview

In recent years, with the rapid development of artificial intelligence, many computer vision tasks such as object detection, image classification, and object segmentation have seen major breakthroughs and have gradually found practical application. Among them, optical character recognition (OCR) is already mature for document and book digitization and for license plate recognition under ideal conditions. In the relatively open setting of machine scoring, however, despite considerable research, a series of challenging problems remains difficult to solve: complex layouts, mixed Chinese and English text, numbers, symbols, image distortion, and so on.

Tomorrow Advancing Life (TAL) and the Institute of Computing Technology, Chinese Academy of Sciences (ICT, CAS) propose to co-organize this "OCR Challenge for Automatic Machine Scoring" (referred to below as the challenge) at PRCV2019. The challenge consists of two tasks: (1) test paper layout analysis and text line detection, and (2) character recognition. It aims to promote understanding of the fundamental problems and to advance the key technologies of OCR in machine scoring scenarios, so that OCR can be applied to practical machine scoring.

Important Dates

  • 2019.07.09 Release of training data (images and labels).
  • 2019.07.15 Team registration deadline.
  • 2019.08.19 Release of validation data (images only). Validation submission site opens.
  • 2019.09.06 Validation submission closes. Release of validation data (labels).
  • 2019.09.20 Final test submission deadline.
  • 2019.10.15 Release of challenge results.
  • 2019.11.08 Workshop in PRCV.

Evaluation Criteria

Task Introduction

  • Layout analysis: analyze the layout of the paper and locate a bounding box for each question. Each box encloses the question together with its corresponding answer.
  • Text line detection: locate the bounding box of each text line in the paper.

Evaluation

The evaluation criterion is the hmean (F-score), where a detection counts as correct when its IoU with a ground-truth box is above 0.7. The final score is the average of the hmean for layout analysis and the hmean for text line detection.
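The scoring rule can be illustrated with a minimal sketch. It is not the organizers' evaluation script, and it rests on several assumptions: each box is reduced to its axis-aligned bounding rectangle (the official metric may use the exact quadrilateral), predictions are matched to ground truth greedily and one-to-one, and the function names `to_bbox`, `iou`, `hmean` and `final_score` are illustrative only.

```python
def to_bbox(location):
    """Collapse eight ints (x1, y1, ..., x4, y4) to (xmin, ymin, xmax, ymax).

    Assumption: the axis-aligned bounding rectangle stands in for the quadrilateral.
    """
    xs, ys = location[0::2], location[1::2]
    return min(xs), min(ys), max(xs), max(ys)


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def hmean(preds, gts, thresh=0.7):
    """Harmonic mean of precision and recall for one box type.

    Assumption: greedy one-to-one matching; a match requires IoU above `thresh`.
    """
    pred_boxes = [to_bbox(p) for p in preds]
    gt_boxes = [to_bbox(g) for g in gts]
    matched = set()
    hits = 0
    for p in pred_boxes:
        best_j, best_iou = None, thresh
        for j, g in enumerate(gt_boxes):
            if j in matched:
                continue
            value = iou(p, g)
            if value > best_iou:
                best_j, best_iou = j, value
        if best_j is not None:
            matched.add(best_j)
            hits += 1
    precision = hits / len(pred_boxes) if pred_boxes else 0.0
    recall = hits / len(gt_boxes) if gt_boxes else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def final_score(layout_hmean, textline_hmean):
    """Average of the two per-task hmean values."""
    return (layout_hmean + textline_hmean) / 2
```

Under these assumptions, `hmean` would be computed once over the "SubjectBox" results and once over the "TextLineBox" results, and the two values passed to `final_score`.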

Results Submission

For each test image, submit the detection result as a dict dumped to a JSON file. The dict has two keys, "image_name" and "label". The value of "image_name" is the file name of the image, without any path. The value of "label" is a list of all detection results. Each detection result is a dict with the keys "type", "content" and "location". The value of "type" is "TextLineBox" for a text line and "SubjectBox" for a layout box. The value of "location" consists of eight integers giving the x and y coordinates of the four corner points, in the order left-top, right-top, right-down, left-down. The value of "content" should be empty for this task.
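As an illustration of this structure, the sketch below builds one result dict in Python. The helper names `make_box` and `make_result` are hypothetical, and two details are assumptions to be checked against the training labels: "location" is written as a flat list of eight ints, and an empty "content" is written as an empty string.

```python
import json
import os

VALID_TYPES = ("TextLineBox", "SubjectBox")


def make_box(box_type, corners):
    """Build one detection result.

    corners: four (x, y) points ordered left-top, right-top, right-down, left-down.
    Assumption: "location" is a flat list and empty "content" is "".
    """
    if box_type not in VALID_TYPES:
        raise ValueError("unknown box type: %r" % (box_type,))
    if len(corners) != 4:
        raise ValueError("expected exactly four corner points")
    location = [int(round(v)) for point in corners for v in point]
    return {"type": box_type, "content": "", "location": location}


def make_result(image_path, boxes):
    """Build the per-image dict; only the bare file name is kept."""
    return {"image_name": os.path.basename(image_path), "label": list(boxes)}


result = make_result(
    "data/test/0001.jpg",
    [
        make_box("SubjectBox", [(8, 12), (620, 12), (620, 150), (8, 150)]),
        make_box("TextLineBox", [(10, 20), (200, 20), (200, 48), (10, 48)]),
    ],
)
text = json.dumps(result)  # the string that would be written to the JSON file
```

The file name and coordinates here are made-up sample values, not taken from the dataset.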

The submission file format is exactly the same as train labels.

All the JSON files should be put into a folder named "answers". The folder "answers" should then be zipped into a file named "answers.zip" and submitted.
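The packaging step might be scripted as in the following sketch, which uses only the Python standard library. The function name `write_submission` is hypothetical, and naming each JSON file after the image's base name is an assumption; the page does not state how the individual files should be named.

```python
import json
import os
import shutil


def write_submission(results, work_dir):
    """Write one JSON file per image into work_dir/answers, then zip the folder.

    results: iterable of dicts with the keys "image_name" and "label".
    Assumption: each file is named <image base name>.json.
    Returns the path of the created answers.zip.
    """
    answers_dir = os.path.join(work_dir, "answers")
    os.makedirs(answers_dir, exist_ok=True)
    for result in results:
        stem = os.path.splitext(result["image_name"])[0]
        path = os.path.join(answers_dir, stem + ".json")
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(result, handle, ensure_ascii=False)
    # base_dir="answers" keeps the folder itself as the top-level entry of the archive.
    return shutil.make_archive(
        os.path.join(work_dir, "answers"), "zip", root_dir=work_dir, base_dir="answers"
    )
```

Passing `base_dir="answers"` makes the archive contain the "answers" folder rather than loose JSON files at its root, which matches the instruction to zip the folder itself.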

Terms and Conditions

Database Release Agreement

The database for the challenge consists of examination papers and annotations provided by Tomorrow Advancing Life (TAL).

  • The database can only be used for research purposes, i.e., in papers or technical reports. No images can be used in commercial materials, newspapers, or other public media.
  • The database should not be re-distributed, published, copied or further disseminated in any way or form whatsoever, whether for profit or not. This includes further distributing, copying, or disseminating to a different facility or organizational unit in the requesting university, organization, or company.
  • All the images will be used for the purpose of scientific research only. The database, in whole or in part, cannot be used for any commercial purpose in any form.
  • The copyright of the data belongs to Tomorrow Advancing Life (TAL).
  • In case of any violation of the above commitments, all losses caused to Tomorrow Advancing Life (TAL) shall be borne by the participant.
  • TAL reserves the right of final interpretation of this agreement.

Prize Settings

The prizes for the challenge are as follows.

  • One first place award: ¥ 30,000.00.
  • One second place award: ¥ 10,000.00.
All the prizes are provided by TAL.
