> Submission file format

Our team is only doing Task 2, but the organizers' submission file format is img_id,anno_image_quality,anno_texts. So should we just leave anno_image_quality at a default value of 0.5? Our submission failed with the following error:
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
Traceback (most recent call last):
  File "/tmp/codalab/tmpE7aH_1/run/program/evaluate.py", line 97, in <module>
    rmse = sqrt(mean_squared_error(gt_quality_score, sub_quality_score))
  File "/opt/conda/lib/python2.7/site-packages/sklearn/metrics/regression.py", line 231, in mean_squared_error
    y_true, y_pred, multioutput)
  File "/opt/conda/lib/python2.7/site-packages/sklearn/metrics/regression.py", line 74, in _check_reg_targets
    check_consistent_length(y_true, y_pred)
  File "/opt/conda/lib/python2.7/site-packages/sklearn/utils/validation.py", line 176, in check_consistent_length
    "%s" % str(uniques))
ValueError: Found arrays with inconsistent numbers of samples: [390 391]

Posted by: phamnhattruong.skyo @ Jan. 13, 2021, 8:37 a.m.

Hi phamnhattruong.skyo,

> ValueError: Found arrays with inconsistent numbers of samples: [390 391]

This error means that one column of your submitted file is missing a value: the evaluator found 390 values where it expected 391.
The submission for the test set must contain 391 values in both columns.

Please check it again.
Best regards,

Posted by: mc-ocr-organizers @ Jan. 14, 2021, 2:37 p.m.
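A length mismatch like the one above can usually be caught locally before uploading. The following is only a minimal sketch, assuming the submission is a comma-separated file whose header is exactly img_id,anno_image_quality,anno_texts and that every row needs a numeric anno_image_quality. The helper name `check_submission` is hypothetical, and the official evaluate.py may parse the file differently.

```python
import csv
import io

# Column names as quoted in the question; the row count is the figure
# given by the organizers for the test set.
EXPECTED_COLUMNS = ["img_id", "anno_image_quality", "anno_texts"]
EXPECTED_ROWS = 391


def check_submission(csv_file, expected_rows=EXPECTED_ROWS):
    """Return a list of problems found in a submission CSV (hypothetical helper)."""
    problems = []
    reader = csv.DictReader(csv_file)
    if reader.fieldnames != EXPECTED_COLUMNS:
        problems.append("unexpected header: %r" % (reader.fieldnames,))
        return problems

    rows = list(reader)
    if len(rows) != expected_rows:
        problems.append("expected %d rows, found %d" % (expected_rows, len(rows)))

    for row_no, row in enumerate(rows, start=1):
        quality = row["anno_image_quality"]
        try:
            # An empty or absent cell cannot be converted and is reported.
            float(quality)
        except (TypeError, ValueError):
            problems.append(
                "row %d: anno_image_quality is not a number: %r" % (row_no, quality)
            )
    return problems


# Tiny in-memory example. The 0.5 is the placeholder value from the question,
# not a value confirmed by the organizers.
sample = io.StringIO(
    "img_id,anno_image_quality,anno_texts\n"
    "a.jpg,0.5,SOME TEXT\n"
    "b.jpg,,OTHER TEXT\n"
)
for problem in check_submission(sample, expected_rows=3):
    print(problem)
```

For a real file, the same function can be called on `open(path, newline="", encoding="utf-8")`, keeping the default of 391 expected rows.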

Dear organizers,
I saw the description of Task 2 on the website https://rivf2021-mc-ocr.vietnlp.com/challenge : "At maximum, a receipt image is associated with 4 text lines annotated by human annotators..." Is that still true?

Posted by: SDSV_AICR @ Jan. 15, 2021, 6:54 a.m.

Hi SDSV_AICR,

> I saw the description of Task 2 on the website https://rivf2021-mc-ocr.vietnlp.com/challenge : "At maximum, a receipt image is associated with 4 text lines annotated by human annotators..." Is that still true?

Thank you for pointing this out. That description is a bit outdated. It should read "At maximum, a receipt image is associated with 4 *fields* annotated by human annotators".
We will update the webpage to address this issue.

Regards,

Posted by: mc-ocr-organizers @ Jan. 15, 2021, 7:20 a.m.

That means, if a field like "SELLER" has 2 text lines such as ["SCTC CÔ THỎ 104 TRẦN PHÚ -CẨM","PHẢ"], we should merge those 2 lines into a single field "SCTC CÔ THỎ 104 TRẦN PHÚ -CẨM PHẢ" in the final result, right?

Posted by: SDSV_AICR @ Jan. 15, 2021, 7:41 a.m.

Hi,

Please note that the format is the same as in the public training data. There are at most 4 fields; however, one field might have multiple instances.

> That means, if a field like "SELLER" has 2 text lines such as ["SCTC CÔ THỎ 104 TRẦN PHÚ -CẨM","PHẢ"], we should merge those 2 lines into a single field "SCTC CÔ THỎ 104 TRẦN PHÚ -CẨM PHẢ" in the final result, right?

No, you don't need to merge. For the above example, the correct output is "SCTC CÔ THỎ 104 TRẦN PHÚ -CẨM|||PHẢ".
However, whether you join the lines into one sequence with or without "|||", the CER score is the same, because we remove "|||" before evaluation.

Hope this answers your questions.

Regards,

Posted by: mc-ocr-organizers @ Jan. 15, 2021, 7:58 a.m.
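To make the "|||" convention in the reply above concrete, here is a rough sketch of joining the instances of one field and of a character error rate that strips "|||" first. It assumes CER is the Levenshtein distance divided by the reference length. The functions `join_instances`, `strip_separator`, `levenshtein` and `cer` are illustrative names, not taken from the official evaluation script, whose exact CER definition is not given in this thread.

```python
SEP = "|||"  # separator between instances of a field, as used by the organizers


def join_instances(lines):
    """Join the text lines of one field with the '|||' separator."""
    return SEP.join(lines)


def strip_separator(text):
    """Drop every '|||' so that it does not affect the score."""
    return text.replace(SEP, "")


def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def cer(reference, hypothesis):
    """Assumed CER: edit distance over reference length, separators removed."""
    ref = strip_separator(reference)
    hyp = strip_separator(hypothesis)
    if not ref:
        return 0.0 if not hyp else 1.0
    return levenshtein(ref, hyp) / len(ref)


lines = ["SCTC CÔ THỎ 104 TRẦN PHÚ -CẨM", "PHẢ"]
with_sep = join_instances(lines)  # instances joined with "|||"
without_sep = "".join(lines)      # same characters, no separator
print(with_sep)
print(cer(with_sep, without_sep))
```

Under this assumed definition, the two variants score identically because the separator is removed from both strings before the distance is computed.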


Mobile-Captured Image Document Recognition for Vietnamese Receipts (MC-OCR) Forum