New genetic map reveals tuberculosis diversity
David Alland led the sequencing of 50 complete Mycobacterium tuberculosis genomes and the creation of a Pangenome Gene Reference Resource to capture strain diversity and aid drug and vaccine discovery.
Tuberculosis (TB), caused by Mycobacterium tuberculosis (Mtb), remains a global disease with a wide range of clinical and microbiological outcomes. For decades, much of the laboratory study of Mtb has relied on only a few reference strains, which limits our view of the bacterium’s true diversity and may obscure genetic differences that influence disease, transmission, or treatment response. To address this gap, a team led by David Alland used a de novo assembly pipeline to produce complete, highly accurate whole-genome sequences. Rather than relying on fragmented or error-prone reference genomes, the researchers assembled 50 closed Mtb genomes that represent all seven major lineages of the species. By generating fully closed genomes from across the Mtb family tree, the study moves beyond single-strain comparisons and creates a broader foundation for exploring how genetic differences map onto the variety of clinical behaviors seen in TB patients worldwide.
The team then analyzed the 50 closed genomes and cataloged gene content across strains. They identified 3,377 core gene clusters present across the collection and 379 accessory clusters found in some but not all strains. Where core clusters appeared in multiple copies, the researchers traced the cause: 76% were due to gene fragmentation, 12% to paralogs, 4% to nearly identical gene duplications, and 8% to combinations of these processes. They also pinpointed 16 hypervariable regions (HVRs), which included novel paralogs and variable members of the PE/PPE gene family, a family known for its variability. All of these findings were consolidated into a Pangenome Gene Reference Resource (PGRR) meant for precision alignment and comparison. The analysis supports the idea of a largely closed Mtb pangenome in which most of the meaningful diversity is concentrated in accessory genes and the identified HVRs.
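To make the core/accessory distinction concrete, here is a minimal Python sketch. It is an illustration only, not the study's pipeline: the strain names, cluster names, and the `classify_clusters` helper are hypothetical, and it assumes the common convention that a "core" cluster is one found in every strain, while an "accessory" cluster is found in at least one strain but not all. The final lines use the cluster counts reported above to work out the accessory share of the pangenome.

```python
from typing import Dict, List, Set, Tuple


def classify_clusters(
    presence: Dict[str, Set[str]], all_strains: Set[str]
) -> Tuple[List[str], List[str]]:
    """Split gene clusters into core and accessory lists.

    Assumption: "core" means present in every strain; "accessory" means
    present in at least one strain but not all of them.
    """
    core, accessory = [], []
    for cluster, strains in presence.items():
        if strains >= all_strains:   # superset test: found in every strain
            core.append(cluster)
        elif strains:                # found in some strains, but not all
            accessory.append(cluster)
    return sorted(core), sorted(accessory)


# Toy, made-up presence/absence data for three hypothetical strains.
strains = {"strainA", "strainB", "strainC"}
presence = {
    "cluster_1": {"strainA", "strainB", "strainC"},
    "cluster_2": {"strainA", "strainC"},
    "cluster_3": {"strainB"},
    "cluster_4": {"strainA", "strainB", "strainC"},
}

core, accessory = classify_clusters(presence, strains)
print("core:", core)
print("accessory:", accessory)

# Counts reported in the article: 3,377 core and 379 accessory clusters.
total_clusters = 3377 + 379
accessory_share = 379 / total_clusters
print(f"total clusters: {total_clusters}, accessory share: {accessory_share:.1%}")
```

On the reported counts, accessory clusters make up roughly one tenth of all gene clusters, which is consistent with the article's description of a largely closed pangenome.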
The practical result of this work is a unified resource that can make genetic comparisons more precise and meaningful. By creating the Pangenome Gene Reference Resource, the authors provide a tool intended to improve the search for drug and vaccine targets and to better link genetic variation to clinical outcomes. The study underscores why researchers should move beyond reliance on the single, commonly used H37Rv strain when studying Mtb genetic and phenotypic diversity: important differences are found across lineages, in accessory clusters, in PE/PPE representation, and in the 16 hypervariable regions described. Because many observed differences arose from fragmentation, paralog duplication and deletion events, the new closed genomes and the PGRR can help distinguish true biological variation from assembly artifacts, giving clinicians and scientists a firmer basis for developing new interventions and interpreting diagnostic or resistance-related genetic signals.
A unified reference like the PGRR can help researchers find new drug and vaccine targets by making genetic comparisons more precise. Sequencing diverse strains and reducing reliance on H37Rv could improve diagnostics, treatment strategies, and public health responses to tuberculosis worldwide.
Author: Poonam Chitale