## ICDAR 2021 Competition on Scientific Table Image Recognition to LaTeX

Organized by icdariitgn


### Welcome to ICDAR 2021 Competition on Scientific Table Image Recognition to LaTeX!

Table recognition is a well-studied problem in document analysis, and many academic and commercial approaches have been developed to recognize tables in several document formats, including plain text, scanned page images, and born-digital, object-based formats such as PDF. There are several works that can convert tables in text-based PDF format into structured representations. However, there is limited work on image-based table content recognition.

The challenge assesses the ability of state-of-the-art methods to recognize scientific tables from images and express them in LaTeX format. The problem is split into two subtasks:

• Subtask I: Table structure reconstruction (S): Reconstructing the structure of a table in the form of LaTeX symbols and code. In this subtask, you are given an image of a table and its corresponding LaTeX code. You need to construct the LaTeX structural tokens that define the table.

• Subtask II: Table content reconstruction (C): Reconstructing and recognizing the content of a table in the form of LaTeX symbols and code. In this subtask, you are given an image of a table and its corresponding LaTeX code. You need to construct the LaTeX content tokens that belong to the table.

#### Timeline:

• Registration Period: 15th Oct 2020 to 28th Feb 2021
• Release of training and validation set: 20th Oct 2020
• Release of test set: 1st Mar 2021
• Submission Deadline: 31st Mar 2021
• Post-Evaluation Phase Starts: 1st Apr 2021

### Evaluation

For both subtasks, participants must submit their prediction files in the submission format.

Both tasks are scored by Exact Match Accuracy and Exact Match Accuracy @ 95% similarity as common evaluation metrics. In addition, the following task-specific metrics are used:

• Row Prediction Accuracy and Column Prediction Accuracy for the Table structure reconstruction task
• Alpha-Numeric Characters Prediction Accuracy, LaTeX Token Accuracy, LaTeX Symbol Accuracy, and Non-LaTeX Symbol Prediction Accuracy for the Table content reconstruction task

The description of each metric is as follows:

1. Exact Match Accuracy: Fraction of predictions that match the ground truth exactly
2. Exact Match Accuracy @ 95% similarity: Fraction of predictions that are at least 95% similar to the ground truth
3. Row Prediction Accuracy: Fraction of predictions whose row count equals the row count of the ground truth
4. Column Prediction Accuracy: Fraction of predictions whose count of cell alignment ('c', 'r', 'l') tokens equals the count of cell alignment tokens in the ground truth
5. Alpha-Numeric Characters Prediction Accuracy: Fraction of predictions that have the same alphanumeric characters as the ground truth
6. LaTeX Token Accuracy: Fraction of predictions that have the same LaTeX tokens as the ground truth
7. LaTeX Symbol Accuracy: Fraction of predictions that have the same LaTeX symbols as the ground truth
8. Non-LaTeX Symbol Prediction Accuracy: Fraction of predictions that have the same non-LaTeX symbols as the ground truth
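The first, third, and fourth metrics above can be sketched in Python as follows. This is a minimal illustration, not the official scorer: it assumes sequences are space-delimited tokens and that a row is counted by its `\\` row separator. The function names are hypothetical.

```python
from typing import List, Sequence

# Cell alignment tokens named in the Column Prediction Accuracy definition.
ALIGNMENT_TOKENS = {"c", "r", "l"}


def tokenize(latex: str) -> List[str]:
    """Split a space-delimited LaTeX sequence into tokens (assumed format)."""
    return latex.split()


def count_rows(latex: str) -> int:
    # Assumption: every table row ends with the LaTeX row separator "\\".
    return tokenize(latex).count(r"\\")


def count_alignment_tokens(latex: str) -> int:
    return sum(tok in ALIGNMENT_TOKENS for tok in tokenize(latex))


def exact_match_accuracy(truths: Sequence[str], preds: Sequence[str]) -> float:
    hits = sum(tokenize(g) == tokenize(p) for g, p in zip(truths, preds))
    return hits / len(truths)


def row_prediction_accuracy(truths: Sequence[str], preds: Sequence[str]) -> float:
    hits = sum(count_rows(g) == count_rows(p) for g, p in zip(truths, preds))
    return hits / len(truths)


def column_prediction_accuracy(truths: Sequence[str], preds: Sequence[str]) -> float:
    hits = sum(
        count_alignment_tokens(g) == count_alignment_tokens(p)
        for g, p in zip(truths, preds)
    )
    return hits / len(truths)


# Two toy structure sequences; the second prediction has one extra column.
truths = [r"{ c c } & \\ & \\", r"{ l r } & \\"]
preds = [r"{ c c } & \\ & \\", r"{ l r r } & \\"]

print(exact_match_accuracy(truths, preds))        # 0.5
print(row_prediction_accuracy(truths, preds))     # 1.0
print(column_prediction_accuracy(truths, preds))  # 0.5
```

The content-task metrics (5 to 8) follow the same pattern, but they depend on how tokens are classified as LaTeX tokens, LaTeX symbols, or non-LaTeX symbols, which is not specified here.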

Example:

To calculate Exact Match Accuracy @ 95% similarity for an image, we use the Longest Common Subsequence algorithm to find the similarity percentage between the ground truth target sequence and the predicted target sequence, and set the minimum similarity threshold to 95%.

The ground truth target sequence (G) for the Table structure recognition task is { c | c c c } & \multicolumn { 3 } { c } \\ & & & \\ \hline \hline & & \\ & & & \\ \hline \multicolumn { 3 } { c } (No. of tokens = 37)

and the predicted target sequence (P) is { c | c c } & \multicolumn { 2 } { c } \\ & & & \\ \hline \hline & & \\ & & & \\ \hline \multicolumn { 3 } { c } (No. of tokens = 36)

The longest common subsequence between G and P is } { c } \\ & & & \\ \hline \hline & & \\ & & & \\ \hline \multicolumn { 3 } { c } (No. of tokens = 26).

Thus, the percentage similarity calculated is 70.27% (26/37), which is below the 95% threshold, so this prediction does not count as a match.
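The arithmetic in this example can be checked with a short Python sketch. One observation, stated as such: the common sequence quoted above is a contiguous run of tokens, which is what a longest-common-substring computation returns, whereas a classical (non-contiguous) longest common subsequence over the same tokens is longer. The sketch computes both; which variant the official scorer uses is not confirmed here, and the function names are hypothetical.

```python
from typing import Sequence


def longest_common_substring_len(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest contiguous run of tokens shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def longest_common_subsequence_len(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the classical (possibly non-contiguous) LCS of a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


# Token sequences from the example, split on whitespace.
G = (r"{ c | c c c } & \multicolumn { 3 } { c } \\ & & & \\ \hline \hline "
     r"& & \\ & & & \\ \hline \multicolumn { 3 } { c }").split()
P = (r"{ c | c c } & \multicolumn { 2 } { c } \\ & & & \\ \hline \hline "
     r"& & \\ & & & \\ \hline \multicolumn { 3 } { c }").split()

contiguous = longest_common_substring_len(G, P)
subsequence = longest_common_subsequence_len(G, P)

# Similarity is taken relative to the ground truth length, as in the example.
print(len(G), len(P))                              # 37 36
print(f"{100 * contiguous / len(G):.2f}%")         # 70.27%
print(f"{100 * subsequence / len(G):.2f}%")        # 94.59%
```

Under either computation the prediction falls short of the 95% threshold for this pair.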

Note: The final evaluation for both subtasks will be based on Exact Match Accuracy on the test dataset.

For more details visit the FAQ page in the "Learn the Details" tab.

### Submission Format

To submit your results to the leaderboard you must construct a submission.zip file containing a single prediction file in .txt format according to the naming convention given below.

This prediction file should follow the format detailed in the subsequent section.

File Submission format

The prediction file should contain the LaTeX code for each corresponding image ID.

Note: The LaTeX code for each image ID must be on its own line.

File Naming Convention

The naming conventions for the prediction file of each subtask are given below:

• Table Structure Reconstruction (Validation Phase): submission_tsr_val.txt
• Table Content Reconstruction (Validation Phase): submission_tcr_val.txt
• Table Structure Reconstruction (Testing & Post-Evaluation Phase): submission_tsr_test.txt
• Table Content Reconstruction (Testing & Post-Evaluation Phase): submission_tcr_test.txt

### Submission Archive

To upload your results to CodaLab, you have to zip the prediction file into a flat zip archive (the file can't be inside a folder within the archive).

You can create a flat archive from the command line, provided the .txt file is in your current directory.
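As one possible way to build such an archive, the following Python sketch writes one LaTeX sequence per line and zips the file at the top level of submission.zip. The helper name `write_submission` is hypothetical, and the sketch assumes the predictions are already ordered to match the image IDs.

```python
import zipfile
from pathlib import Path
from typing import Sequence


def write_submission(predictions: Sequence[str], txt_name: str, out_dir: str = ".") -> Path:
    """Write predictions (one per line) and pack them into a flat submission.zip."""
    out = Path(out_dir)
    txt_path = out / txt_name
    # Assumption: one LaTeX sequence per line, in image-ID order.
    txt_path.write_text("\n".join(predictions) + "\n", encoding="utf-8")

    zip_path = out / "submission.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # arcname drops any directory prefix, so the archive stays flat.
        zf.write(txt_path, arcname=txt_path.name)
    return zip_path
```

For example, `write_submission(preds, "submission_tsr_val.txt")` would produce a submission.zip in the current directory whose only entry is submission_tsr_val.txt.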

### The organizers of the competition are:

Pratik Kayal, IIT Gandhinagar, India

Email: pratik.kayal@iitgn.ac.in

Mrinal Anand, IIT Gandhinagar, India

Email: mrinal.anand@iitgn.ac.in

Harsh Desai, IIT Gandhinagar, India

Email: hsd31196@gmail.com

Prof. Mayank Singh, IIT Gandhinagar, India

Email: singh.mayank@iitgn.ac.in

### (S) Table Structure Reconstruction: Validation

Start: Oct. 14, 2020, midnight

Description: In the validation phase, you can make up to 50 successful submissions per day; the number of online evaluation attempts is updated at the stipulated time each week. Submissions that fail on CodaLab (this website) because of an unexpected issue can be re-submitted, but please do not cause failures intentionally. Please enter your method description.

### (C) Table Content Reconstruction: Validation

Start: Oct. 14, 2020, midnight

Description: In the validation phase, you can make up to 50 successful submissions per day; the number of online evaluation attempts is updated at the stipulated time each week. Submissions that fail on CodaLab (this website) because of an unexpected issue can be re-submitted, but please do not cause failures intentionally. Please enter your method description.

### (S) Table Structure Reconstruction: Testing

Start: March 1, 2021, midnight

Description: In the testing phase, you can make up to 50 successful submissions per day; the number of online evaluation attempts is updated at the stipulated time each week. Submissions that fail on CodaLab (this website) because of an unexpected issue can be re-submitted, but please do not cause failures intentionally. Please enter your method description. The results will be revealed after the final check.

### (C) Table Content Reconstruction: Testing

Start: March 1, 2021, midnight

Description: In the testing phase, you can make up to 50 successful submissions per day; the number of online evaluation attempts is updated at the stipulated time each week. Submissions that fail on CodaLab (this website) because of an unexpected issue can be re-submitted, but please do not cause failures intentionally. Please enter your method description. The results will be revealed after the final check.

### (S) Table Structure Reconstruction: Post-Evaluation

Start: April 1, 2021, midnight

Description: In the Post-Evaluation phase, you can make any number of successful submissions per day. Submissions that fail on CodaLab (this website) because of an unexpected issue can be re-submitted, but please do not cause failures intentionally. Please enter your method description.

### (C) Table Content Reconstruction: Post-Evaluation

Start: April 1, 2021, midnight

Description: In the Post-Evaluation phase, you can make any number of successful submissions per day. Submissions that fail on CodaLab (this website) because of an unexpected issue can be re-submitted, but please do not cause failures intentionally. Please enter your method description.

### Competition Ends

Never

