Faster TB lineage typing without a reference genome
Conor J. Meehan led research showing reference-free methods, especially SKA2, can rapidly and accurately type Mycobacterium tuberculosis lineages.
Whole-genome sequencing (WGS) of Mycobacterium tuberculosis (Mtb) has transformed how public health teams investigate recent transmission, offering high-resolution strain typing that helps guide outbreak responses and tuberculosis control strategies. But the current gold-standard approach, a reference-guided SNP-calling pipeline built on computationally intensive read mapping, can be hard to run in high-burden, resource-limited settings, where simplified and scalable genomic tools are urgently needed. To address that gap, a team led by corresponding author Conor J. Meehan explored whether simpler, reference-free approaches could do the job at medium resolution: lineage typing of Mtb strains. They worked from a collection of 535 complete genomes spanning the human- and animal-adapted lineages of Mtb. From each complete genome they simulated Illumina paired-end reads, assembled those reads, and then tested reference-free, k-mer-based tools to see whether lineage structure could be recovered without the heavy computational steps that reference mapping requires. The study set out to compare these faster workflows directly against established lineage assignments and to evaluate whether they could be a practical alternative for many laboratories.
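To make "reference-free, k-mer-based" concrete, here is a minimal Python sketch of the general idea: two sequences are compared directly through the short words (k-mers) they share, with no reference genome involved. It is an illustration only, not how MASH, PopPUNK, or SKA2 are implemented. The function names, the choice of k=5, and the Jaccard distance are assumptions made for the example.

```python
# Toy illustration of a reference-free, k-mer-based distance.
# All names and parameters here are hypothetical, chosen for the example;
# real tools use far larger k, sketching, and other optimisations.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_set(seq, k):
    """Collect the canonical k-mers of a sequence (empty set if seq is shorter than k)."""
    return {canonical(seq[i:i + k]) for i in range(len(seq) - k + 1)}


def jaccard_distance(seq_a, seq_b, k=5):
    """Return 1 minus the fraction of k-mers shared between two sequences."""
    a, b = kmer_set(seq_a, k), kmer_set(seq_b, k)
    union = a | b
    if not union:
        return 0.0  # neither sequence is long enough to yield a k-mer
    return 1.0 - len(a & b) / len(union)
```

Because k-mers are stored in canonical form, a sequence and its reverse complement are treated as identical, which matters when reads or contigs can come from either strand.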
The team compared three reference-free, k-mer-based tools: MASH, PopPUNK, and SKA2 (Split K-mer Analysis). For each simulated and assembled sample, they used those tools to generate genetic distance measures and then compared the results against a ground-truth lineage assignment obtained with TB Profiler. Across the dataset of 535 complete genomes, reference-free methods were able to distinguish Mtb lineages, but performance differed among tools. SKA2 showed the most promising performance across all datasets, consistently recovering lineage and sub-lineage structure with high accuracy. MASH and PopPUNK also produced genetic distance estimates but did not match SKA2's consistency in reconstructing the known lineage partitions defined by TB Profiler. By demonstrating that SKA2 in particular can reproduce lineage-level relationships without reference-guided mapping, the study suggests a viable reference-free path for medium-resolution epidemiology of Mtb that still aligns with established lineage calls.
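One simple way to check whether a set of pairwise distances reflects known lineage labels is to ask how often a sample's nearest neighbour belongs to the same lineage. The Python sketch below shows that check on made-up data. It is not the evaluation procedure used in the study, which this summary does not describe; the function name, sample names, distances, and labels are all hypothetical.

```python
# Toy check of distance-based grouping against ground-truth labels.
# Hypothetical example: the metric, samples, and distances are invented
# for illustration and are not taken from the study.


def nearest_neighbour_agreement(dist, lineage):
    """Fraction of samples whose closest other sample shares their lineage label.

    dist maps frozenset({sample_a, sample_b}) to a pairwise distance;
    lineage maps each sample name to its ground-truth lineage label.
    """
    samples = sorted(lineage)
    hits = 0
    for sample in samples:
        others = [other for other in samples if other != sample]
        nearest = min(others, key=lambda other: dist[frozenset((sample, other))])
        if lineage[nearest] == lineage[sample]:
            hits += 1
    return hits / len(samples)


# Made-up labels and distances for four samples in two lineages.
toy_lineage = {"s1": "L1", "s2": "L1", "s3": "L2", "s4": "L2"}
toy_dist = {
    frozenset(("s1", "s2")): 0.01,
    frozenset(("s3", "s4")): 0.02,
    frozenset(("s1", "s3")): 0.30,
    frozenset(("s1", "s4")): 0.30,
    frozenset(("s2", "s3")): 0.30,
    frozenset(("s2", "s4")): 0.30,
}
```

Calling `nearest_neighbour_agreement(toy_dist, toy_lineage)` on this toy input gives a score between 0 and 1, where 1 means every sample sits closest to a member of its own lineage. A per-tool score of this kind is one possible way to compare distance methods against lineage calls.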
These findings point to a practical shift in how genomic epidemiology for tuberculosis could be performed in settings with limited computational resources. Reference-free methods, and SKA2 specifically, offer a route to accessible, scalable, and rapid Mtb strain typing that supports epidemiological investigations without the need for computationally intensive reference-mapping workflows. That matters because accurate and rapid strain typing is essential for informing outbreak investigations and guiding tuberculosis control strategies, and many high-burden regions lack the infrastructure to run the gold-standard pipelines. By recovering lineage and sub-lineage structure reliably, SKA2 could enable more labs to incorporate genomic information into routine surveillance and outbreak response, accelerating the translation of sequencing data into public health action while keeping computational demands low.
Reference-free tools like SKA2 could allow more laboratories in resource-limited, high-burden settings to perform Mtb lineage typing quickly and cheaply. Faster lineage information may speed outbreak investigations and improve targeting of tuberculosis control strategies.
Author: Aureliana F.C. Chilengue