Machine learning boosts community TB screening accuracy in South Africa and Zambia
A study led by Claudia M Denkinger reports that an XGBoost machine learning model greatly outperformed the WHO four-symptom screen but did not yet meet the targets in WHO's 2025 Target Product Profile (TPP).
Tuberculosis remains a major public health challenge, and finding cases in the community depends on simple screening tools that decide who should get diagnostic testing. Current tools, such as the WHO four-symptom screen (W4SS), are easy to use but either miss many cases or flag too many people for testing, resulting in missed diagnoses and unnecessary follow-up. To address this gap, researchers led by Claudia M Denkinger combined data from four large community-based TB prevalence surveys in South Africa and Zambia to test whether a machine learning approach could do better. They assembled a harmonized dataset of 169,813 people aged 15 and older who were not on TB treatment when surveyed. Using available microbiological and radiographic results, the team labeled each person as either 'Possible TB' or 'Unlikely TB', giving a binary outcome for model training and testing. The study focused on community-based active case finding, with the aim of developing a practical risk-prediction tool that could be used before sending people for diagnostic tests.
The team trained an XGBoost model on 80% of the combined dataset (135,854 people) using demographic, clinical, and socio-economic variables, and tested it on the remaining 20% (33,959 people). Overall, 16,413 people (9.7%) were labeled 'Possible TB'. Model interpretability was assessed with SHapley Additive exPlanations (SHAP) values to see which factors drove the risk scores. On the held-out test set, the XGBoost model achieved an area under the receiver operating characteristic curve (AUC) of 79.7% (95% CI: 78.7, 80.7), substantially higher than the W4SS, which had an AUC of 57.0% (95% CI: 56.1, 57.8). At a score threshold chosen to give about 60% specificity, XGBoost reached 81.5% sensitivity (95% CI: 77.6, 84.9), whereas the W4SS reached only 38.2% sensitivity (95% CI: 36.5, 39.9) on the same data. SHAP analysis identified age, previous TB treatment, the number of times a person had been treated for TB, and unemployment as the main contributors to predicted risk. The authors compared performance against WHO's 2025 Target Product Profile (TPP) and noted that the model did not fully meet those targets.
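The two headline metrics can be illustrated with a small sketch. The Python below is not the authors' code: it uses ten hypothetical toy risk scores and labels (not study data) to show how an AUC can be computed as the probability that a randomly chosen 'Possible TB' case scores higher than a randomly chosen 'Unlikely TB' case, and how a cut-off is chosen to reach a target specificity (here 60%) before sensitivity is read off. The function names are illustrative only; in practice, library routines such as scikit-learn's `roc_auc_score` and `roc_curve` would typically be used instead.

```python
"""Sketch: AUC and sensitivity at a fixed-specificity threshold.

All scores and labels below are HYPOTHETICAL toy values, not study data.
Label 1 = 'Possible TB', label 0 = 'Unlikely TB'.
"""


def auc_rank(scores, labels):
    """AUC as P(random positive scores higher than random negative).

    Mann-Whitney formulation: compare every positive/negative pair,
    counting ties as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, t):
    """Flag everyone with score >= t for testing; return (sens, spec)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
    return tp / (tp + fn), tn / (tn + fp)


def threshold_for_specificity(scores, labels, target_spec):
    """Lowest observed score t whose specificity is >= target_spec.

    Specificity rises as t rises, so the lowest qualifying cut-off
    keeps sensitivity as high as possible.
    """
    for t in sorted(set(scores)):
        _, spec = sensitivity_specificity(scores, labels, t)
        if spec >= target_spec:
            return t
    return float("inf")  # no observed cut-off qualifies: flag nobody


# Hypothetical toy data: 4 'Possible TB' and 6 'Unlikely TB' people.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.35, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

auc = auc_rank(scores, labels)
t = threshold_for_specificity(scores, labels, 0.60)
sens, spec = sensitivity_specificity(scores, labels, t)

print(f"AUC = {auc:.3f}")  # prints "AUC = 0.833"
print(f"threshold = {t}, sensitivity = {sens:.2f}, specificity = {spec:.3f}")
# prints "threshold = 0.55, sensitivity = 0.75, specificity = 0.667"
```

A binary rule such as a symptom screen yields a single operating point, whereas a continuous risk score lets a program move the threshold to trade sensitivity against specificity, which is what makes a fixed-specificity comparison like the one above possible.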
These findings suggest that a machine learning-based screen could be a useful step before diagnostic testing in community TB programs, helping to prioritize who should receive further tests and potentially reducing both missed cases and unnecessary evaluations. Because the XGBoost model markedly outperformed the WHO four-symptom screen in this large, multi-country dataset, programs that struggle with the low sensitivity of simple symptom screens might consider piloting similar approaches. At the same time, the model did not meet the WHO 2025 TPP benchmarks, so it is not yet a finished solution; the authors suggest that adding further variables, such as geolocation data, could improve performance. The study used data only from the four surveys and was not registered, so further external validation, refinement, and attention to practical implementation issues will be needed before this approach can be recommended for broad rollout in community screening.
Author: A. Zimmer