New realistic genomes expose blind spots in TB variant detection
Guislaine Refrégier and colleagues developed Maketube, showing that conventional benchmarks miss important Mycobacterium tuberculosis variants and overestimate tool performance.
As whole genome sequencing becomes routine in bacterial research, scientists need reliable ways to judge the software tools that detect genetic differences relative to a reference, a process called reference-based variant calling. Current benchmarks use two imperfect approaches: natural genomes assembled from real bacteria and compared back to a reference with a genome aligner, or simplified in silico genomes created by inserting small changes into a reference. Both strategies can hide real-world complications. Guislaine Refrégier and collaborators set out to build a better test system. They developed Maketube, a method that evolves realistic versions of Mycobacterium tuberculosis complex (MTBC) genomes in silico by incorporating the full diversity of variants that have been observed and verified in natural isolates. Using these Maketube-evolved genomes, the team re-ran benchmark tests and compared how well common pipelines and aligners detect variants in a setting that more closely resembles true biological diversity. Their goal was to reveal biases in current benchmarking approaches and to offer a more faithful tool for testing variant-calling software.
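To make the "simplified in silico" approach concrete, here is a minimal sketch (not the authors' Maketube code; the function name and parameters are illustrative) of how such conventional benchmark genomes are typically built: scatter a handful of point mutations across a reference and record them as the truth set. The contrast with Maketube is that this approach produces only simple substitutions, with none of the structural complexity seen in natural isolates.

```python
import random

def mutate_reference(ref, n_snps, seed=0):
    """Build a simplistic benchmark genome by sprinkling SNPs into a
    reference sequence. Returns the mutated genome plus the truth set
    (position -> alternate base) that a variant caller should recover.
    Illustrative only: real M. tuberculosis genomes also differ by
    insertions, deletions, and duplications, which this ignores."""
    rng = random.Random(seed)
    genome = list(ref)
    truth = {}
    for pos in rng.sample(range(len(ref)), n_snps):
        # pick an alternate base different from the reference base
        alt = rng.choice([b for b in "ACGT" if b != ref[pos]])
        genome[pos] = alt
        truth[pos] = alt
    return "".join(genome), truth

ref = "ACGT" * 25          # toy 100 bp "reference"
sim, truth = mutate_reference(ref, n_snps=5)
```

Because every simulated difference is a lone substitution at a known position, aligners and callers handle these genomes easily, which is one reason such benchmarks can inflate apparent performance.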
Maketube-evolved genomes were first evaluated for how well they mimic real MTBC genomes and then used to test common analysis tools. The authors found that genome aligners can miss as much as 7.5% of true variants, meaning that benchmarks relying on de novo assembled natural genomes inherit the aligner's own errors. They also compared variant-calling pipelines and found that recall was overestimated by 1 to 10% when benchmarks used simplistic in silico-evolved genomes instead of Maketube-derived ones. The tools tested included MTBseq, TB-Profiler, and an in-house pipeline called genotube; the study reports slight but significant differences in performance among them. The work also pinpoints specific problem areas: variants are frequently missed in duplicated regions and in regions flanking sequences absent from the reference, such as displaced insertion sequences or sequences deleted during the reference’s evolution. Finally, the authors show that structural variants interfere with variant calling both because they add extra sequence and because they cause misalignments around insertions.
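The recall figures above come down to a simple comparison between a pipeline's calls and the known truth set. The sketch below (illustrative, not taken from the paper; variants are modeled as (position, alternate-base) pairs) shows the computation, and why a benchmark whose truth set omits hard variants will report inflated recall.

```python
def recall_precision(truth, called):
    """Benchmark a variant caller against a known truth set.
    recall   = fraction of true variants the caller recovered
    precision = fraction of the caller's calls that are true.
    If hard-to-call variants never enter `truth` (as with simplistic
    simulated genomes), recall is overestimated."""
    tp = len(truth & called)                     # true positives
    recall = tp / len(truth) if truth else 1.0
    precision = tp / len(called) if called else 1.0
    return recall, precision

truth = {(100, "A"), (250, "T"), (400, "G"), (512, "C")}
called = {(100, "A"), (250, "T"), (400, "G"), (999, "A")}
r, p = recall_precision(truth, called)  # r = 0.75, p = 0.75
```

Here the caller misses the variant at position 512 and makes one false call, so both recall and precision are 0.75; dropping (512, "C") from the truth set, as an over-simple benchmark effectively does, would raise reported recall to 1.0.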
These findings matter because benchmarking drives which tools people trust for research and clinical interpretation of Mycobacterium tuberculosis genomes. By showing that Maketube-evolved genomes more faithfully reproduce the kinds of variants seen in nature, Guislaine Refrégier and colleagues argue for using realistic in silico genomes when evaluating variant-calling tools. Benchmarks that ignore structural complexity and duplicated regions can give inflated confidence in pipelines, masking systematic blind spots where real variants will be missed. The evidence that structural variants and sequences absent from the reference lead to missed calls highlights the need for tests that include these challenges, so developers can improve aligners and callers and users can understand their limitations. In short, Maketube-derived genomes provide a more demanding and informative benchmark, helping to make genomic tools and comparisons more reliable for anyone working with MTBC data.
More realistic benchmarking with Maketube can reveal when popular pipelines miss clinically or epidemiologically important variants, reducing overconfidence in flawed workflows. Better benchmarks should drive improvements in tools and more accurate interpretation of Mycobacterium tuberculosis genomic data in research and public health.
Author: Adrien Le Meur