TBX11K Tuberculosis Classification and Detection Challenge

Organized by yunliu94


Start: Aug. 3, 2020, midnight UTC

As a serious infectious disease, tuberculosis (TB) is one of the major threats to human health worldwide, leading to millions of deaths every year. Although early diagnosis and treatment can greatly improve the chances of survival, early diagnosis remains a major challenge, especially in developing countries. Computer-aided tuberculosis diagnosis (CTD) is a promising approach given the success of deep learning, but the lack of training data has hampered its progress. To address this problem, we establish a large-scale TB dataset, the Tuberculosis X-ray (TBX11K) dataset. It contains 11,200 X-ray images with corresponding bounding box annotations for TB areas, whereas the largest existing public TB dataset has only 662 X-ray images with image-level annotations. The proposed dataset enables the training of sophisticated detectors for high-quality CTD.

Evaluation Criteria

To meet practical demand, this challenge targets simultaneous tuberculosis (TB) X-ray classification and TB area detection in a single system (e.g., a convolutional neural network). The classification task assigns each test X-ray to one of three categories: Healthy, Sick but Non-TB, and TB (the super-category of active TB and latent TB). We adopt six metrics to evaluate the X-ray classification results:

  • Accuracy, the percentage of X-rays that are correctly classified into one of the three classes;
  • Area Under Curve (AUC), the area under the Receiver Operating Characteristic (ROC) curve, which plots the true positive rate against the false positive rate for the TB class;
  • Sensitivity, the percentage of TB cases that are correctly identified as TB, i.e., the recall for the TB class;
  • Specificity, the percentage of non-TB cases that are correctly identified as non-TB, i.e., the recall for the non-TB class, where non-TB includes the Healthy and Sick but Non-TB classes;
  • Average Precision (AP), the precision of each class averaged across all classes;
  • Average Recall (AR), the recall of each class averaged across all classes.
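The six classification metrics listed above can be sketched in plain Python as follows. This is only an illustration, not the challenge's evaluation program: the integer label encoding (0 = Healthy, 1 = Sick but Non-TB, 2 = TB), the function names, and the convention of scoring an empty class as 0 are all assumptions made for the example.

```python
# Hypothetical label encoding (not specified by the challenge page):
# 0 = Healthy, 1 = Sick but Non-TB, 2 = TB.
TB = 2
NUM_CLASSES = 3


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, AP and AR from hard labels."""
    pairs = list(zip(y_true, y_pred))
    n = len(pairs)
    accuracy = sum(t == p for t, p in pairs) / n

    # Sensitivity: recall of the TB class.
    tb_total = sum(t == TB for t in y_true)
    tb_hit = sum(t == TB and p == TB for t, p in pairs)
    # Specificity: recall of the merged non-TB class (Healthy + Sick but Non-TB).
    non_tb_total = n - tb_total
    non_tb_hit = sum(t != TB and p != TB for t, p in pairs)

    precisions, recalls = [], []
    for c in range(NUM_CLASSES):
        tp = sum(t == c and p == c for t, p in pairs)
        predicted = sum(p == c for p in y_pred)
        actual = sum(t == c for t in y_true)
        # Assumption: a class with no predictions or no samples scores 0.
        precisions.append(tp / predicted if predicted else 0.0)
        recalls.append(tp / actual if actual else 0.0)

    return {
        "accuracy": accuracy,
        "sensitivity": tb_hit / tb_total if tb_total else 0.0,
        "specificity": non_tb_hit / non_tb_total if non_tb_total else 0.0,
        "AP": sum(precisions) / NUM_CLASSES,
        "AR": sum(recalls) / NUM_CLASSES,
    }


def tb_auc(y_true, tb_scores):
    """ROC AUC for TB vs. non-TB: the fraction of (TB, non-TB) pairs in
    which the TB X-ray gets the higher TB score (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, tb_scores) if t == TB]
    neg = [s for t, s in zip(y_true, tb_scores) if t != TB]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


metrics = classification_metrics(
    y_true=[0, 0, 1, 1, 2, 2, 2, 0],
    y_pred=[0, 1, 1, 1, 2, 2, 0, 0],
)
auc = tb_auc([2, 0, 1, 2], [0.9, 0.1, 0.8, 0.7])
```

In this toy example, one TB X-ray is misclassified as Healthy, so sensitivity drops below 1 while specificity stays at 1 because no non-TB X-ray is predicted as TB.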

For the evaluation of TB detection, we adopt the bounding-box average precision (AP) proposed by the COCO dataset challenge. The default AP is averaged over IoU (intersection-over-union) thresholds of [0.5 : 0.05 : 0.95], i.e., from 0.5 to 0.95 in steps of 0.05. AP50 refers to AP at an IoU threshold of 0.5, and AP75 refers to AP at an IoU threshold of 0.75.

To observe the detection of each TB type, we report the evaluation results for active TB and latent TB separately. In this setting, uncertain TB X-rays are ignored (uncertain TB X-rays exist only in the test set). We also report category-agnostic TB detection results, in which the TB categories are ignored, to describe the detection of all TB areas. In this setting, uncertain TB X-rays are included. Note that the test set contains both TB and non-TB X-rays. Ideally, a model should not predict any TB areas for a non-TB X-ray, so the evaluation program penalizes false positives in non-TB X-rays.
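As a rough illustration of the IoU thresholds behind AP, AP50, and AP75, the sketch below computes the IoU of two boxes and lists the thresholds at which a predicted box would count as a match. It is not the COCO evaluation procedure itself, which also ranks predictions by confidence and integrates a precision-recall curve. The corner-based box format (x1, y1, x2, y2) and the function names are assumptions made for the example.

```python
# The ten IoU thresholds 0.50, 0.55, ..., 0.95 used by the default AP.
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2).

    Assumption: corner coordinates with x1 < x2 and y1 < y2.
    """
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matched_thresholds(pred_box, gt_box):
    """Thresholds at which the prediction counts as a true positive."""
    overlap = iou(pred_box, gt_box)
    return [t for t in IOU_THRESHOLDS if overlap >= t]


gt = (0, 0, 100, 100)      # hypothetical annotated TB area
pred = (0, 0, 100, 77)     # hypothetical predicted TB area, IoU = 0.77
hits = matched_thresholds(pred, gt)
```

Here the prediction is a true positive at the 0.5 and 0.75 thresholds (so it contributes to AP50 and AP75) but a miss at 0.8 and above, which is why the averaged AP is stricter than AP50. On a non-TB X-ray there is no ground-truth box to match, so any predicted box is a false positive at every threshold.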

Terms and Conditions

This dataset belongs to the Media Computing Lab at Nankai University and is licensed under a Creative Commons Attribution 4.0 License.


Description: Please provide your team name, method name, method description, affiliation, etc. If external data are used, please specify them.

#  Username               Score
1  Faster_R-CNN-ResNet50  89.73
2  FCOS-ResNet50          88.92
3  RetinaNet-ResNet50     87.37