AI tools make a complex TB mutation catalogue easier to use
Timothy C. Rodwell led a study showing AI models can help users query the WHO 2023 Mutation Catalogue for Mycobacterium tuberculosis more easily.
The World Health Organization (WHO) 2023 Mutation Catalogue for Mycobacterium tuberculosis is a central resource for interpreting genetic changes linked to drug-resistant tuberculosis, but its size and complexity make it hard for many clinicians, laboratorians, and public health workers to use. To tackle that access problem, a team led by corresponding author Timothy C. Rodwell explored whether modern generative artificial intelligence could let people interact with the catalogue in everyday language. Rather than asking users to wade through dense tables and long documents, the researchers tested whether chat-style AI tools could answer general questions, find specific mutations, and retrieve information from the full catalogue as well as from antibiotic-specific tables. They also tested whether models could apply the catalogue's additional grading rules to score novel mutations, an important step when a genetic change has not been seen before. The goal was practical: make the catalogue more usable for healthcare users at all levels without changing the authoritative document itself, by building an AI interface that helps people find and understand the catalogue’s clinical interpretations.
The study compared four prominent AI models: Google Gemini 2.5 Pro, OpenAI ChatGPT 4.1, Perplexity AI, and DeepSeek R1. Tests included general questions, mutation search and retrieval tasks using both full-catalogue queries and focused antibiotic-specific tables, and an exercise in applying additional grading rules to novel mutations. Performance was judged on accuracy, completeness, clarity, whether the model cited sources, and whether it produced unsupported statements (so-called hallucinations). Across most evaluations, Google Gemini 2.5 Pro performed best overall, showing stronger accuracy, more complete answers, and fewer hallucinations, especially on general queries and large-dataset searches. DeepSeek R1 stood out for applying grading rules to novel mutations and showed high accuracy on focused datasets, though it did produce some hallucinations. OpenAI ChatGPT 4.1 was noted for clarity but fell short on proper source citation. Perplexity AI delivered variable results and a higher frequency of hallucinations. These head-to-head findings point to clear differences between models in their suitability for clinical support tasks.
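To make the five judging criteria concrete, here is a minimal Python sketch of how per-response rubric scores could be rolled up into a per-model summary and a simple ranking. It is an illustration only, not the study's actual scoring procedure: the 0-to-1 scales, the names RubricScore, summarize, and rank_models, the ranking rule, and the placeholder labels model_a and model_b are all assumptions, and the numbers are invented rather than taken from the study.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class RubricScore:
    """One graded response. Scales and field names are hypothetical."""
    accuracy: float       # 0.0 (wrong) to 1.0 (fully correct)
    completeness: float   # 0.0 to 1.0
    clarity: float        # 0.0 to 1.0
    cites_sources: bool   # did the answer cite the catalogue?
    hallucinated: bool    # did it contain an unsupported statement?


def summarize(scores: list[RubricScore]) -> dict[str, float]:
    """Average each criterion over all graded responses for one model."""
    if not scores:
        raise ValueError("at least one graded response is required")
    return {
        "accuracy": mean(s.accuracy for s in scores),
        "completeness": mean(s.completeness for s in scores),
        "clarity": mean(s.clarity for s in scores),
        "citation_rate": mean(1.0 if s.cites_sources else 0.0 for s in scores),
        "hallucination_rate": mean(1.0 if s.hallucinated else 0.0 for s in scores),
    }


def rank_models(results: dict[str, list[RubricScore]]) -> list[str]:
    """Order models by accuracy (high first), then hallucination rate (low first).

    This ordering rule is an assumption chosen for illustration.
    """
    summaries = {name: summarize(scores) for name, scores in results.items()}
    return sorted(
        summaries,
        key=lambda name: (
            -summaries[name]["accuracy"],
            summaries[name]["hallucination_rate"],
        ),
    )


if __name__ == "__main__":
    # Invented placeholder grades, not results from the study.
    demo = {
        "model_a": [
            RubricScore(1.0, 1.0, 1.0, True, False),
            RubricScore(0.5, 0.5, 1.0, False, True),
        ],
        "model_b": [
            RubricScore(1.0, 1.0, 0.5, True, False),
            RubricScore(1.0, 0.5, 0.5, True, False),
        ],
    }
    for name in rank_models(demo):
        print(name, summarize(demo[name]))
```

Keeping the hallucination rate as its own column, rather than folding it into a single composite score, mirrors the article's point that a model can be highly accurate on some tasks and still produce unsupported statements.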
The results matter because the WHO 2023 Mutation Catalogue is intended to guide clinical interpretation of mutations tied to drug resistance, and any tool that helps people use it must be reliable. The study highlights that generative AI can increase accessibility and speed for users who need to interpret catalogue entries, but it also makes clear that not all models are equally safe or useful for clinical contexts. The authors emphasize the need for careful model selection, rigorous benchmarking, and attention to citation and hallucination risks before deploying any clinical AI agent. Based on this evaluation, Google Gemini 2.5 Pro and DeepSeek R1 are promising candidates for building a custom clinical AI assistant to help users at all levels navigate the Mutation Catalogue. If implemented with strong quality controls, such an assistant could make the catalogue’s complex information easier to use while preserving the accuracy required for TB control efforts.
A well-chosen AI interface could let clinicians, laboratorians, and public health workers query the WHO 2023 Mutation Catalogue in plain language, speeding interpretation of mutations. However, rigorous testing and careful model choice are essential to prevent errors that could affect patient care.
Author: Miguel Moreno-Molina