AI spots likely tuberculosis cases from routine clinic data
Devendra Singh Parihar and colleagues show that machine learning, including convolutional neural networks (CNNs), can flag people for confirmatory TB testing using routine clinical and demographic data.
Tuberculosis remains a global health challenge, and finding efficient ways to identify people who need definitive testing is a priority. In routine care, community-level clinics often collect basic clinical and demographic information from people who come in with a cough or other TB-related symptoms. Devendra Singh Parihar and colleagues explored whether machine learning could use those easily obtainable data to classify who is likely to have TB. The idea is straightforward: an automated triage tool would analyze information already being collected at the first point of contact and point clinicians toward people who should receive more expensive, specialized confirmatory testing. By relying on existing data rather than new laboratory tests, such a tool could lower costs and help prioritize limited resources. Parihar’s team set out to test multiple machine learning approaches on real-world datasets made up of individuals who self-presented to healthcare facilities with symptoms or risk factors suggestive of TB, seeking a low-cost method that could be deployed in primary health-care settings.
The researchers evaluated logistic regression, XGBoost, and convolutional neural network (CNN) classifiers, using fully nested cross-validation with and without feature selection, to see which approach worked best on clinical and demographic inputs. Although applying CNNs to this kind of data is unconventional, the team found it to be effective. Experiments used two datasets: cough diagnostic algorithm for TB (CODA TB), n = 1140, and cough audio triage for TB (CAGE-TB), n = 463. All participants in both datasets self-presented to healthcare facilities with symptoms or risk factors suggestive of TB. The CNN achieved areas under the receiver operating characteristic curve (AUROC) of 80.48% and 83.06% on the two datasets, respectively. The study also showed that performance improved when the set of clinical features was extended and when the number of people in the dataset increased, indicating that more and richer data help the models make better predictions.
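For readers unfamiliar with the evaluation terms above, the following is a minimal Python sketch of the two ideas, not the authors' code. It assumes 5 outer and 3 inner folds, which the summary does not specify, and the helper names (auroc, k_folds, nested_cv_splits) are hypothetical. In nested cross-validation, model tuning and feature selection see only the inner splits, and each outer test fold is held back for the final score. AUROC is the probability that a randomly chosen person with TB receives a higher model score than a randomly chosen person without TB.

```python
import random


def auroc(labels, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_folds(indices, k, seed):
    """Shuffle the indices and deal them into k nearly equal folds."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nested_cv_splits(n, outer_k=5, inner_k=3, seed=0):
    """Yield (inner_splits, outer_train, outer_test) for each outer fold.

    inner_splits is a list of (inner_train, inner_val) pairs drawn only from
    outer_train, so tuning never touches the outer test fold.
    Fold counts are illustrative assumptions, not taken from the study.
    """
    outer = k_folds(range(n), outer_k, seed)
    for i, test in enumerate(outer):
        train = [j for f, fold in enumerate(outer) if f != i for j in fold]
        inner = k_folds(train, inner_k, seed + 1 + i)
        inner_splits = []
        for v, val in enumerate(inner):
            inner_train = [j for f, fold in enumerate(inner) if f != v for j in fold]
            inner_splits.append((inner_train, val))
        yield inner_splits, train, test
```

As a check on the metric, auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]) evaluates to 0.75, because three of the four positive-negative pairs are ranked correctly. The percentages reported above are this same quantity multiplied by 100.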
The results point toward a practical application: an automated TB triage tool that runs on a low-cost mobile device such as a smartphone and is suitable for use at primary health-care facilities. If a clinic could run a short, inexpensive screening using routine demographic and clinical information and a phone-based model, it could flag people most likely to have TB and ensure they receive confirmatory tests. That kind of triage would not replace laboratory diagnosis but could make screening more efficient, directing scarce diagnostic resources to the people who need them most. The study’s finding that wider feature sets and larger datasets improve accuracy suggests a pathway for future development: collecting more routinely available data and continuing to refine models could raise performance further and make smartphone-based triage a useful adjunct to existing TB programs.
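To make the triage step concrete, here is a minimal Python sketch of how a model's scores might be turned into a refer or do-not-refer decision. It is an illustration under stated assumptions, not the method reported in the study: the function names, the target sensitivity of 0.75, and the example scores are all hypothetical. The idea is to choose the score cutoff that still catches the desired share of true TB cases, then see how many people without TB would be sent for confirmatory testing at that cutoff.

```python
import math


def threshold_for_sensitivity(labels, scores, target):
    """Return the highest cutoff whose sensitivity on this data meets the target.

    People with score >= cutoff are flagged for confirmatory testing.
    The target value is chosen by the user; nothing here comes from the study.
    """
    pos = sorted((s for y, s in zip(labels, scores) if y == 1), reverse=True)
    if not pos:
        raise ValueError("need at least one positive label")
    # Number of true cases that must be flagged to reach the target.
    needed = max(1, math.ceil(target * len(pos)))
    return pos[needed - 1]


def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when flagging scores >= threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical example data: 1 = confirmed TB, 0 = no TB.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
cutoff = threshold_for_sensitivity(labels, scores, target=0.75)
sens, spec = sensitivity_specificity(labels, scores, cutoff)
```

With these made-up numbers the cutoff is 0.6, which flags three of the four true cases and one of the four people without TB. In practice the cutoff would be set on held-out data and chosen to suit the programme's tolerance for missed cases versus unnecessary referrals.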
A phone-based AI triage tool could help clinics identify who should get expensive confirmatory TB tests, saving time and resources. Implemented at primary-care facilities, it could speed referrals and focus testing on the people most likely to have TB.
Author: Devendra Singh Parihar