PAPER 13 Apr 2026 Global

Benchmarking AI for predicting second-line TB drug resistance

Manikandan Narayanan led a systematic benchmark showing that traditional machine learning often outperforms deep learning for predicting second-line TB drug resistance from bacterial genomes.

Drug-resistant tuberculosis (TB) is a major barrier to global elimination efforts because it requires longer, more complex treatment and often has poorer outcomes. Advances in sequencing technologies, particularly whole-genome sequencing (WGS), have opened the door to predicting drug resistance directly from bacterial genomes using machine-learning (ML) and deep-learning (DL) methods. Despite rapid technical progress, translating these models into reliable clinical tools remains difficult, especially for second-line drugs, where accuracy has lagged behind traditional drug-susceptibility testing. To address this gap, a team led by Manikandan Narayanan carried out a standardized benchmarking study called TB-Bench. The study systematically reviewed existing methods, selected 20 traditional ML and DL models from 8 studies, and evaluated them within a single, consistent framework. The researchers used the WHO dataset of 50,801 samples for model training and internal testing, and an external validation set of 1,199 samples to assess generalizability. By placing many published approaches on an equal footing, the study aimed to clarify which techniques are most reliable and to highlight the practical barriers to deploying genomic predictors of second-line drug resistance in clinical settings.

TB-Bench compared drug-specific versions of the selected models across 14 second-line drugs. To reflect real-world variability in how input data are represented, the team used three distinct feature sets, including binary features. Model performance was evaluated on a held-out portion of the WHO dataset and on the external validation dataset. In internal testing on the WHO test data, traditional ML models using binary features tended to outperform DL models. Notably, XGBoost achieved the highest area under the precision-recall curve (PRAUC) scores, ranging from 46% to 93% for 10 of the 14 drugs, although performance varied considerably by drug. The finding that traditional ML methods often did best, even with limited feature sets, suggests these approaches may be practical in low-resource settings. However, when models were evaluated on the external validation dataset, traditional ML and DL models performed similarly, and neither class produced clear gains over catalogue-based approaches, highlighting persistent challenges in cross-dataset generalization and real-world transferability.
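For readers unfamiliar with the PRAUC metric reported above, the following is a minimal sketch in plain Python of one common estimator, average precision. It assumes binary labels (1 = resistant, 0 = susceptible) and one model score per isolate. The function name and the toy data are hypothetical illustrations, not taken from TB-Bench, and the study's own implementation may differ, for example in how tied scores are handled.

```python
def average_precision(y_true, y_score):
    """Step-wise estimate of the area under the precision-recall curve.

    y_true  : list of 0/1 labels (assumed here: 1 = resistant, 0 = susceptible)
    y_score : list of model scores, higher meaning "more likely resistant"
    """
    if len(y_true) != len(y_score):
        raise ValueError("y_true and y_score must have the same length")
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")

    # Rank isolates from highest to lowest score.
    # Simplification: tied scores keep their input order.
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])

    true_positives = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            # Precision at the rank where this resistant isolate is recovered.
            precision_sum += true_positives / rank

    # Average the precision over all resistant isolates.
    return precision_sum / n_pos


# Toy example with made-up labels and scores (not study data).
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(average_precision(labels, scores), 3))  # prints 0.806
```

Unlike accuracy, this metric has a chance-level baseline roughly equal to the fraction of resistant isolates, which is one reason it is often preferred when resistance to a drug is rare in the dataset.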

The TB-Bench study provides a comprehensive, standardized evaluation framework that researchers and clinicians can use to compare future models for TB drug resistance prediction. By identifying which approaches work best under consistent conditions and where performance breaks down, the work points to key methodological considerations (feature representation, the choice between traditional ML and DL, and the need for better external validation) that must be addressed before genomic predictors can be reliably used in clinical practice. The public release of the TB-Bench source code at https://github.com/BIRDSgroup/TB-Bench supports reproducibility and makes it easier for others to test new models or apply the benchmark to additional datasets. While genomic prediction shows promise, particularly for first-line drugs in other studies, the variable results for second-line drugs documented here mean that caution is still needed before it replaces standard drug-susceptibility testing in patient care. Continued work to improve generalization, expand and diversify training datasets, and refine model inputs will be essential to realize clinical benefits.

Public Health Impact

Better benchmarking helps researchers identify which prediction methods are most reliable and where to focus improvements, speeding progress toward clinically useful genomic tests. Wider adoption of validated, easy-to-run traditional ML models could improve TB care in low-resource settings more quickly than complex deep-learning approaches.

tuberculosis
machine-learning
deep-learning
whole-genome sequencing
drug-resistance

Author: Brintha VP
