Hidden TB diversity revealed by complete genomes
A team led by Iñaki Comas reports that long-read sequencing of 216 MTBC isolates reveals far more genetic diversity than short-read methods detect, raising evolutionary rate estimates about 1.5-fold and uncovering hundreds of previously hidden SNPs.
Tuberculosis research has benefited from short-read sequencing, but those methods miss parts of the pathogen's full genetic picture. To address that gap, a team led by Iñaki Comas applied long-read sequencing to 216 Mycobacterium tuberculosis complex (MTBC) isolates collected in the Valencia Region (Spain). By producing a high-quality, complete genome for each isolate, the researchers could examine genetic variation across multiple evolutionary scales, from differences between major lineages down to changes that occur within individual patients. Complete genomes allow direct comparison across entire bacterial chromosomes, making it possible to detect mutations, structural changes, and repeated sequence events that short-read methods overlook. The work set out to test how much MTBC genetic diversity had been hidden from view, and to explore what that hidden diversity means for our understanding of the bacterium's evolution, how it spreads, and how it interacts with the human immune system.
The study compared the complete genomes obtained by long-read sequencing against what is typically seen with short-read sequencing, and the differences were striking. Complete genome comparisons increased the estimated evolutionary rate by about 1.5-fold and added a median of 312 (–1 to 792) SNPs per pairwise comparison. The researchers identified multiple diversity hotspots, mostly in pe/ppe genes, and showed that much of this variation is driven by gene conversion. At the same time, most PE/PPE epitopes were hyperconserved, with notable exceptions involving vaccine candidates. The team also found previously undetected SNPs and indels that improved the resolution of transmission analyses, helping to distinguish linked cases more clearly. Finally, using patient-specific reference mapping, the authors validated only 5–10% of the within-host diversity detected by standard pipelines, indicating that earlier approaches may substantially overestimate diversity inside patients. Together, these results show that complete genomes from long-read sequencing reveal layers of MTBC variation that were previously invisible.
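To make the "SNPs per pairwise comparison" figure concrete, the following Python sketch shows one way such a number could be computed: count the SNP differences for every pair of isolates under each method, then take the median of the per-pair gain. This is an illustration only, not the authors' pipeline. The function names and the toy sequences are hypothetical, and "N" stands in for regions that short reads cannot resolve.

```python
from __future__ import annotations

from itertools import combinations
from statistics import median


def snp_distance(a: str, b: str) -> int:
    """Count differing positions between two aligned sequences.

    Positions where either sequence has a gap ("-") or an
    unresolved base ("N") are skipped, not counted as SNPs.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"-", "N"}
    return sum(
        1 for x, y in zip(a, b)
        if x != y and x not in skip and y not in skip
    )


def pairwise_distances(seqs: dict[str, str]) -> dict[tuple[str, str], int]:
    """SNP distance for every unordered pair of isolates."""
    return {
        (i, j): snp_distance(seqs[i], seqs[j])
        for i, j in combinations(sorted(seqs), 2)
    }


# Toy alignments (made up for illustration). The short-read version
# masks a region as "N"; the complete version resolves it.
short_read = {
    "A": "ACGTNNNNACGT",
    "B": "ACGANNNNACGT",
    "C": "ACGTNNNNACGA",
}
complete = {
    "A": "ACGTACGTACGT",
    "B": "ACGAACCTACGT",
    "C": "ACGTAGGTTCGA",
}

d_short = pairwise_distances(short_read)
d_complete = pairwise_distances(complete)

# Per-pair gain: extra SNPs seen only when complete genomes are compared.
gains = {pair: d_complete[pair] - d_short[pair] for pair in d_short}

print("gain per pair:", gains)
print("median gain:", median(gains.values()))
print("range:", min(gains.values()), "to", max(gains.values()))
```

Because the gain is a difference between two counts, it can be negative for an individual pair if the complete genomes disagree with a call made from short reads, which is consistent with the lower bound of –1 reported above.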
These findings expand our view of MTBC diversity and carry several important implications. First, a 1.5-fold increase in the estimated evolutionary rate and hundreds of additional SNPs per comparison change how scientists measure the pace of TB evolution and how they date transmission events. Second, the diversity hotspots in pe/ppe genes and the hyperconservation of most PE/PPE epitopes show where the bacterium changes and where it remains stable, which matters for understanding host-pathogen interactions and for designing or evaluating vaccine candidates. Third, the improved resolution in transmission analyses means public health teams could better trace who infected whom when they rely on more complete genomic data. Finally, the finding that standard pipelines may overestimate within-host diversity calls for caution when interpreting studies of infection dynamics and could prompt updates to genomic surveillance and analysis methods. Overall, complete genomes offer a deeper, more accurate picture of MTBC that affects epidemiology, vaccine development, and strategies to control tuberculosis.
Author: Ana María García García