AI reads TB genes to measure drug resistance and predict outcomes
Sanjana G. Kulkarni and colleagues used convolutional neural networks to predict TB drug MICs and link higher rifampicin MICs to worse treatment outcomes.
Tuberculosis remains a global threat because the bacteria that cause it, Mycobacterium tuberculosis (Mtb), can resist antibiotics in complex ways. Researchers are therefore eager to apply machine learning to genomic data to build fast, accurate diagnostics. Many successful models simplify the problem to yes-or-no predictions, but predicting the continuous measures that matter in the clinic is much harder. In new work led by Sanjana G. Kulkarni, scientists trained convolutional neural networks (CNNs) to predict minimum inhibitory concentrations (MICs), a continuous lab measure of how much drug is needed to stop bacterial growth, directly from Mtb gene sequences. Rather than treating the genome as a black box, the team built models that folded in biological knowledge (evolutionary information about mutations and the biochemical properties of proteins) and augmented the data for rare variants so the models could learn from uncommon changes. The goal was to move beyond binary resistant/susceptible calls and produce diagnostic-grade, quantitative predictions that reflect the real biology of resistance.
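To make the idea of folding biological knowledge into the input concrete, here is a minimal Python sketch. It is not the authors' encoding: the function name `encode_sequence`, the per-position `conservation` score, and the five-value layout are all assumptions chosen for illustration. It shows one common way to present a gene sequence to a CNN, with each position becoming a one-hot nucleotide vector plus one extra value carrying evolutionary information.

```python
# Illustrative only: a hypothetical input encoding, not the study's pipeline.
NUCLEOTIDES = "ACGT"


def encode_sequence(seq, conservation):
    """Encode a DNA sequence as one row per position.

    Each row holds four one-hot values (A, C, G, T) followed by a
    hypothetical per-position conservation score in [0, 1].
    """
    if len(seq) != len(conservation):
        raise ValueError("sequence and conservation must have the same length")
    rows = []
    for base, score in zip(seq.upper(), conservation):
        # Bases outside A/C/G/T (for example N) become an all-zero vector.
        one_hot = [1.0 if base == n else 0.0 for n in NUCLEOTIDES]
        rows.append(one_hot + [float(score)])
    return rows


# Made-up example: five positions with invented conservation scores.
encoded = encode_sequence("ACGTN", [0.9, 0.1, 0.5, 1.0, 0.0])
```

A convolutional layer would then slide over these rows, so a mutation at a highly conserved position looks different to the model than the same change at a variable one.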
The technical approach combined CNNs with several kinds of domain-specific information. By encoding evolutionary patterns and protein biochemical properties, and by augmenting the data for rare variants, the models learned to predict MICs for eight antibiotics from Mtb gene sequences. Performance was strong: the CNNs predicted 89% of MICs within one drug concentration doubling. Notably, although the models were trained on no more than 52% of the data in the World Health Organization (WHO) drug resistance mutation catalogue, they still accurately predicted the effects of 97% of its graded mutations. The team tested clinical relevance in a cohort of 373 patients with rifampicin-susceptible Mtb infections and found that higher CNN-predicted rifampicin MICs were associated with unfavorable treatment outcomes. These results indicate that a model informed by biology and sequence data can produce interpretable, clinically meaningful MIC predictions rather than just binary labels.
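The "within one doubling" figure can be read as follows: MICs are typically measured on a series of twofold dilutions, so a prediction counts as a hit when it is no more than one step away from the measured value. The Python sketch below assumes this means the predicted and measured MICs differ by at most a factor of two; the study's exact scoring rules (for example, how values at the edge of the tested range are handled) may differ. The function names and the MIC values are invented for illustration.

```python
import math


def within_one_doubling(predicted_mic, measured_mic, tol=1e-9):
    """Return True if two MICs differ by at most one twofold dilution step.

    Assumption: "within one doubling" means |log2(pred) - log2(meas)| <= 1.
    The small tolerance guards against floating-point rounding.
    """
    diff = abs(math.log2(predicted_mic) - math.log2(measured_mic))
    return diff <= 1.0 + tol


def fraction_within_one_doubling(predicted, measured):
    """Fraction of prediction/measurement pairs within one doubling."""
    pairs = list(zip(predicted, measured))
    if not pairs:
        raise ValueError("need at least one prediction")
    hits = sum(within_one_doubling(p, m) for p, m in pairs)
    return hits / len(pairs)


# Made-up MIC values in mg/L, not data from the study.
predicted = [0.25, 0.5, 1.0, 4.0]
measured = [0.5, 0.5, 0.25, 2.0]
fraction = fraction_within_one_doubling(predicted, measured)  # 3 of 4 pairs are hits
```

In this toy example the third pair misses because the prediction (1.0) is two doublings above the measurement (0.25), which gives a score of 0.75.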
The implications are twofold. First, the study suggests that subtle differences in MIC below the standard resistance threshold matter: patients with higher predicted rifampicin MICs did worse even though their infections were labeled rifampicin-susceptible. That finding argues for richer, quantitative diagnostics that can flag borderline cases and potentially guide closer monitoring or adjusted therapy. Second, the work demonstrates the value of combining evolutionary and biochemical knowledge with modern machine learning: CNNs inspired by domain knowledge can be interpretable and attain clinical-grade accuracy. Although the study focuses on eight antibiotics and work remains to translate the models into routine practice, it provides a clear example of how encoding multiple biological dimensions into AI models can improve predictions of bacterial function and treatment response.
Quantitative MIC predictions from genomic data could help clinicians identify patients at higher risk of treatment failure even when standard tests label infections as susceptible. This approach may lead to more personalized antibiotic choices and targeted monitoring using existing genetic sequencing.
Author: Sanjana G. Kulkarni