AI predicts TB treatment failure — promising but not ready
Rogers Kamulegeya led a systematic review finding that AI/ML models show promise for predicting TB treatment failure but are not yet ready for clinical use.
Tuberculosis remains one of the world’s deadliest infectious diseases, and when treatment fails it fuels ongoing transmission, drug resistance, and poor outcomes for patients. To find out whether artificial intelligence and machine learning can help identify patients at high risk of treatment failure, Rogers Kamulegeya and colleagues carried out a systematic review and meta-analysis of studies that developed or validated predictive models. They searched PubMed/MEDLINE and Embase for work published between January 2000 and October 2025 and included 34 studies in the review. According to the authors’ summary, those studies collectively covered nearly 1.1 million participants from 22 countries, and the 19 studies used for the quantitative meta-analysis contributed data on 100,790 participants. The papers in the review were recent: all were published between 2014 and 2025, and 91% appeared from 2019 onward. The review set out to document which models are being built, how well they perform, and whether they have been tested in new settings, in order to assess whether these tools are ready to help clinicians manage patients on anti-TB treatment.
The team screened for studies that developed, validated, or implemented artificial intelligence or machine learning models to predict TB treatment failure or similar poor outcomes. They assessed study quality with the Prediction model Risk Of Bias ASsessment Tool (PROBAST) and pooled performance using a random-effects meta-analysis of area under the curve (AUC) values. Among included studies, tree-based methods were the most common algorithm family (52.9%), and 41.2% used multimodal models that combined three or more types of data. The pooled AUC was 0.836 (95% confidence interval 0.799 to 0.868), indicating generally good discrimination, but statistical heterogeneity was very high (I² = 97.9%). Subgroup analysis showed worse performance in studies that included HIV-positive participants (pooled AUC 0.748) than in studies that excluded them (0.924). Only eight studies (23.5%) performed external validation, and just one study (2.9%) was rated low risk of bias overall; most studies had methodological concerns, especially in the analysis domain. Egger’s test suggested publication bias (p = 0.024). The review also identified major evidence gaps: underrepresentation of high-burden countries, HIV-affected populations, social determinants, pediatric TB, and extrapulmonary disease.
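For readers curious how a "pooled AUC" and I² are produced, the following is a minimal sketch of a standard random-effects approach (the DerSimonian-Laird estimator). It is an illustration only: the study-level AUCs and variances below are invented, not taken from the review, and the review's exact statistical pipeline may differ.

```python
import math

def random_effects_pool(effects, variances):
    """DerSimonian-Laird random-effects pooling.
    effects: per-study estimates (e.g. AUCs); variances: their variances.
    Returns (pooled_estimate, ci_low, ci_high, i_squared_percent)."""
    # Fixed-effect (inverse-variance) weights and Cochran's Q statistic
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sw
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    k = len(effects)
    # Between-study variance tau^2 (DerSimonian-Laird estimator)
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights incorporate tau^2, widening the interval
    wr = [1.0 / (v + tau2) for v in variances]
    swr = sum(wr)
    pooled = sum(wi * e for wi, e in zip(wr, effects)) / swr
    se = math.sqrt(1.0 / swr)
    # I^2: percentage of total variability due to between-study heterogeneity
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se, i2

# Hypothetical study-level AUCs and variances (not from the review)
pooled, lo, hi, i2 = random_effects_pool(
    [0.80, 0.85, 0.90, 0.75],
    [0.001, 0.002, 0.0015, 0.001],
)
```

A high I², as reported here (97.9%), means most of the spread between studies reflects genuine differences in populations and methods rather than chance, which is why the single pooled number should be read cautiously.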
What does this mean for patients and clinicians? On average, the reviewed machine learning models would rank a randomly chosen patient who goes on to fail treatment as higher risk than one who does not about 84% of the time (that is what a pooled area under the curve of 0.836 means), a level many would call promising. But that average hides huge variation: performance dropped markedly in studies that included people with HIV and likely varies by setting and population. Methodological weaknesses were common: most studies did not test their models in new populations, often mishandled missing data, and rarely checked calibration (whether predicted risks match observed outcome rates), so the headline numbers may overstate real-world reliability. The authors conclude these tools are not yet ready for routine clinical implementation. Future work should focus on rigorous external validation, careful calibration assessment, and model development in underrepresented groups and high-burden settings, so that any tools brought into clinics will be trustworthy, equitable, and useful for patients on anti-TB treatment.
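The calibration check the review found to be rare can be sketched simply: group patients by predicted risk and compare the average predicted risk in each group with the fraction who actually had the outcome. This is a generic illustration of the idea, not the review's procedure; the function name and data are hypothetical.

```python
def calibration_by_bin(probs, outcomes, bins=10):
    """Sort predictions into equal-sized risk bins and compare mean
    predicted risk with the observed event rate in each bin.
    A well-calibrated model has predicted ~= observed across bins."""
    pairs = sorted(zip(probs, outcomes))
    n = len(pairs)
    rows = []
    for b in range(bins):
        chunk = pairs[b * n // bins:(b + 1) * n // bins]
        if not chunk:
            continue
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_pred, observed))
    return rows

# Hypothetical example: a model predicting 10% and 90% risk groups
probs = [0.1] * 10 + [0.9] * 10
outcomes = [1] + [0] * 9 + [1] * 9 + [0]
rows = calibration_by_bin(probs, outcomes, bins=2)
```

A model can discriminate well (high AUC) yet be badly miscalibrated, systematically over- or under-stating risk, which matters when clinicians act on the predicted probability itself.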
If strengthened and properly validated, AI/ML tools could help target support to patients at highest risk of TB treatment failure. Until then, current models should not guide care because of inconsistent performance and methodological flaws.
Author: Rogers Kamulegeya