PAPER 19 Feb 2026 Global

Minipoa accelerates large-scale genome alignments with far less memory

Mengting Niu presents minipoa, a new tool that speeds up large-scale sequence alignment and pangenomics tasks while cutting their memory requirements.

Aligning many DNA sequences efficiently is a growing challenge as researchers work with longer reads and far larger datasets. Partial order alignment (POA) is a core technique for long-read error correction, assembly and pangenomics, but conventional POA methods can be slow and memory-hungry, which limits their use on big projects. To address this, Mengting Niu and colleagues developed minipoa, a new POA tool designed for speed and low memory usage. The team kept the core goal of POA, which is to align many sequences accurately while respecting their order, but rethought how alignments are found and constructed so that the method could scale to megabase-long genomes and to datasets containing millions of sequences. The aim was to keep accuracy high while sharply cutting computational demands, so that workflows that need multiple sequence alignment or consensus building can run on much larger collections of long reads than before.

Minipoa combines several algorithmic and implementation strategies to achieve its gains. The tool uses seed-chain-align heuristics, adaptive or static banding, and single-instruction, multiple-data (SIMD) optimizations to reduce computation and memory footprint during alignment. In head-to-head comparisons reported in the abstract, minipoa reached up to a 5-fold speedup over abPOA and reduced memory usage by up to 16-fold, while also improving correction accuracy. Performance remained strong on both PacBio and ONT simulated datasets. On multiple sequence alignment benchmarks, minipoa showed superior computational efficiency and alignment accuracy compared with other tested tools, producing Total Column scores up to 2.5-fold higher than MAFFT in low-similarity scenarios. The abstract also reports that minipoa enabled multiple sequence alignment of megabase-long genomes on a set of 342 Mycobacterium tuberculosis sequences and scaled to one million SARS-CoV-2 sequences, illustrating its ability to handle both bacterial genome collections and very large viral datasets.
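To give a feel for why banding saves work, here is a minimal sketch of static banding applied to plain pairwise edit distance. It is not minipoa's code and does not involve a POA graph, seeding, chaining or SIMD; the function name `banded_edit_distance` and the `band` parameter are hypothetical and chosen only for illustration. The idea is that only cells within `band` of the main diagonal are computed, so the work grows with the band width rather than with the full product of the sequence lengths.

```python
from typing import Optional


def banded_edit_distance(a: str, b: str, band: int) -> Optional[int]:
    """Edit distance between a and b using only cells with |i - j| <= band.

    Illustrative sketch of static banding, not minipoa's implementation.
    Returns None when the final cell lies outside the band. If the band is
    too narrow, the result can overestimate the true edit distance.
    """
    n, m = len(a), len(b)
    if abs(n - m) > band:
        return None  # the end cell (n, m) is not inside the band

    inf = float("inf")
    # Row 0: aligning the empty prefix of a with b[:j] costs j insertions.
    prev = [j if j <= band else inf for j in range(m + 1)]

    for i in range(1, n + 1):
        # For simplicity a full row is allocated; a memory-lean version
        # would store only the 2 * band + 1 cells that lie inside the band.
        cur = [inf] * (m + 1)
        lo = max(0, i - band)
        hi = min(m, i + band)
        if lo == 0:
            cur[0] = i  # aligning a[:i] with the empty prefix of b
        for j in range(max(1, lo), hi + 1):
            substitute = prev[j - 1] + (a[i - 1] != b[j - 1])
            delete = prev[j] + 1
            insert = cur[j - 1] + 1
            cur[j] = min(substitute, delete, insert)
        prev = cur

    return prev[m]


if __name__ == "__main__":
    print(banded_edit_distance("kitten", "sitting", band=2))
```

A static band keeps the same width throughout, as above. An adaptive band would instead widen or shift as the alignment proceeds, which is the other option the abstract mentions.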

These improvements have practical and immediate implications for groups working in long-read error correction, assembly and pangenomics. By trimming runtime and memory requirements while maintaining or improving alignment accuracy, minipoa can be integrated into existing workflows to make large analyses feasible on available compute resources. Its ability to handle megabase-long genomes and million-sequence datasets suggests that researchers can broaden the scale of comparative and consensus analyses without bespoke high-memory systems. According to the abstract, these features, which stem from combining algorithmic heuristics with low-level optimizations, make minipoa well positioned to become a cornerstone in the era of large-scale pangenomics, enabling more comprehensive studies of genomic diversity across pathogens and other organisms.

Public Health Impact

Minipoa could let researchers run large-scale multiple sequence alignments and error correction on much bigger datasets without prohibitive memory or time costs. This makes routine analysis of megabase genomes and million-sequence collections more accessible to labs and consortiums.

minipoa
abPOA
MAFFT
pangenomics
long-read sequencing
Featured Experts

Author: Haodong Liu
