In recent years, rapid progress in artificial intelligence has produced major breakthroughs in computer vision tasks such as object detection, image classification, and object segmentation, and many of these techniques are now in practical use. Among them, optical character recognition (OCR) is already mature for document and book digitization and for license plate recognition under ideal conditions. In the more open setting of machine scoring, however, a series of problems remains hard to solve despite considerable research: complex layouts, mixed Chinese and English text, numbers, symbols, image distortion, and so on.
Tomorrow Advancing Life (TAL) and the Institute of Computing Technology, Chinese Academy of Sciences (ICT, CAS) propose to co-organize this "OCR Challenge for Automatic Machine Scoring" (referred to below as the challenge) at PRCV2019. The challenge consists of two tasks: (1) test paper layout analysis and text line detection, and (2) character recognition. It aims to promote the understanding of fundamental problems and to advance key OCR technologies for machine scoring, so that these technologies can be applied in practical machine scoring scenarios.
The hmean (F-score), computed with an IoU threshold above 0.7, is used as the evaluation criterion. The final score is the average of the hmean for layout analysis and the hmean for text line detection.
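The scoring rule above can be illustrated with a minimal sketch. It is not the organizers' evaluation script: it approximates each quadrilateral by its axis-aligned bounding rectangle, and it assumes a greedy one-to-one matching between predictions and ground truth, neither of which is specified by the challenge. All function names are hypothetical.

```python
def bounding_rect(location):
    """Axis-aligned rectangle (x_min, y_min, x_max, y_max) around eight
    corner coordinates [x1, y1, x2, y2, x3, y3, x4, y4].
    ASSUMPTION: a simplification; true quadrilateral IoU needs polygon clipping."""
    xs, ys = location[0::2], location[1::2]
    return (min(xs), min(ys), max(xs), max(ys))


def rect_iou(a, b):
    """Intersection over union of two axis-aligned rectangles."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def hmean(precision, recall):
    """Harmonic mean of precision and recall (the F-score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def hmean_at_iou(preds, gts, threshold=0.7):
    """hmean for one box type, counting a prediction as correct when its
    IoU with a not-yet-matched ground-truth box is above the threshold.
    ASSUMPTION: greedy one-to-one matching in prediction order."""
    matched = set()
    true_positives = 0
    for pred in preds:
        for i, gt in enumerate(gts):
            if i not in matched and rect_iou(pred, gt) > threshold:
                matched.add(i)
                true_positives += 1
                break
    precision = true_positives / len(preds) if preds else 0.0
    recall = true_positives / len(gts) if gts else 0.0
    return hmean(precision, recall)


def final_score(layout_hmean, text_line_hmean):
    """Average of the two per-type hmean values."""
    return (layout_hmean + text_line_hmean) / 2
```

For example, one correct detection out of two predictions against two ground-truth boxes gives a precision and recall of 0.5 each, so the hmean for that box type is 0.5.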
For each test image, submit the detection result as a dict dumped to a JSON file. The keys of the dict are "image_name" and "label". The value of "image_name" should be the file name of the image, without any path. The value of "label" should be a list of all detection results. Each detection result should be a dict with the keys "type", "content" and "location". If the detection result is a text line, the value of "type" is "TextLineBox". If the detection result is a layout region, the value of "type" is "SubjectBox". The value of "location" should consist of eight integers giving the x and y coordinates of the corner points, in the order left-top, right-top, right-bottom, left-bottom. The value of "content" should be empty for this task.
The submission file format is exactly the same as that of the training labels.
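As a rough illustration of the format described above, the following sketch builds one submission dict and serializes it with the standard `json` module. The helper names and the example file name `page_001.jpg` are hypothetical, and representing "location" as a flat list of eight integers and the empty "content" as an empty string are assumptions; the training labels are the authoritative reference for the exact layout.

```python
import json
import os

VALID_TYPES = ("TextLineBox", "SubjectBox")


def make_detection(box_type, corners):
    """Build one detection result.

    corners: eight numbers [x_lt, y_lt, x_rt, y_rt, x_rb, y_rb, x_lb, y_lb]
    (left-top, right-top, right-bottom, left-bottom).
    ASSUMPTION: "location" is a flat list of eight ints.
    """
    if box_type not in VALID_TYPES:
        raise ValueError(f"unknown type: {box_type!r}")
    if len(corners) != 8:
        raise ValueError("location needs exactly eight values")
    return {
        "type": box_type,
        "content": "",  # ASSUMPTION: empty string stands for "empty"
        "location": [int(v) for v in corners],
    }


def make_submission(image_path, detections):
    """Build the per-image dict; the image name carries no path."""
    return {
        "image_name": os.path.basename(image_path),
        "label": list(detections),
    }


# Hypothetical example: one layout region and one text line inside it.
submission = make_submission(
    "some/dir/page_001.jpg",
    [
        make_detection("SubjectBox", [5, 10, 400, 10, 400, 120, 5, 120]),
        make_detection("TextLineBox", [10, 20, 110, 20, 110, 50, 10, 50]),
    ],
)
submission_json = json.dumps(submission, ensure_ascii=False)
```

Writing `submission_json` to one file per test image then yields the files that go into the submission folder.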
All the JSON files should be put into a folder named "answers". The folder "answers" should be zipped into a file named "answers.zip" and submitted.
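The packaging step could be done with Python's standard `zipfile` module, as in the sketch below. The function name `zip_answers` is hypothetical, and storing each file under an `answers/` prefix inside the archive is an assumption about what "zipping the folder" means here.

```python
import zipfile
from pathlib import Path


def zip_answers(answers_dir="answers", zip_path="answers.zip"):
    """Pack every .json file in answers_dir into zip_path.

    ASSUMPTION: entries are stored as "answers/<file>.json" so that the
    archive contains the folder itself, not just its contents.
    """
    answers_dir = Path(answers_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for json_file in sorted(answers_dir.glob("*.json")):
            archive.write(json_file, arcname=f"answers/{json_file.name}")
    return zip_path
```

Calling `zip_answers()` from the directory that contains the "answers" folder writes "answers.zip" next to it.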
The database for the challenge consists of examination papers and annotations provided by Tomorrow Advancing Life (TAL).
The prizes of the challenge are set as follows.
All the prizes are provided by TAL.
Start: Aug. 19, 2019, midnight
Sept. 6, 2019, noon