PAPER 27 Apr 2026 Global

New computational method uncovers missed gene signals in small tuberculosis studies

Tavpritesh Sethi's GeneLift uses generative data augmentation and BayesScore to reveal gene signals missed in small transcriptomic studies, including tuberculosis.

A major obstacle in studying diseases at the molecular level is that many gene expression studies measure thousands of genes but include only a handful of patient samples. This imbalance, with far more features than samples, makes it hard to tell which gene changes are real and which are just noise. To tackle that problem, Tavpritesh Sethi and colleagues developed GeneLift, a computational pipeline designed to strengthen small transcriptomic studies by combining synthetic data generation, stability-aware signature discovery, and multi-source biological validation into one workflow. The team tested GeneLift across 36 microarray datasets from five disease areas: sepsis, breast cancer, ovarian cancer, tuberculosis, and diabetes. The idea is straightforward: carefully create additional, realistic-looking data, use that augmented dataset to test which gene signals are stable, and then check those signals against independent biological evidence. By packaging these steps together, the team aimed to find gene signatures that standard small-cohort analyses would miss, including previously overlooked molecular signals in diseases such as tuberculosis.

The team examined a range of generative approaches that have been successful in imaging, including GANs, VAEs, and diffusion models, and performed a component-wise evaluation to see which methods actually reproduce gene-level behavior. Surprisingly, Gaussian Mixture Models (GMMs) outperformed the deep generative approaches, more faithfully reproducing the distribution of individual genes. The researchers also introduced a novel way to 'titrate' augmentation: they varied how much synthetic data was added and observed which gene candidates became more robust as augmentation increased. This approach revealed biologically meaningful gene candidates that were absent from the original, underpowered analyses. To add literature-based validation, the team developed BayesScore, a Bayesian posterior probability of gene-disease association computed from PubMed co-occurrence. BayesScore recovered well-characterized disease genes that standard differential expression missed and even flagged candidates that were later confirmed in the literature, in some cases up to 18 years after the original dataset was published. The software is freely available at tavlab-iiitd/GeneLift.

For tuberculosis research and other fields that rely on small transcriptomic cohorts, GeneLift offers a practical route to surface signals that would otherwise remain hidden. The combination of GMM-based augmentation, stability testing across titrated synthetic data, and BayesScore literature validation provides multiple, complementary ways to judge which gene patterns are credible. Importantly, the findings suggest that complex deep generative models are not always superior for omics augmentation and that simpler probabilistic models can better preserve gene-level distributions. By making GeneLift openly available, Tavpritesh Sethi and collaborators enable other teams to re-examine legacy datasets or new small studies with a systematic augmentation and validation framework. While computational results always need experimental follow-up, this approach could help prioritize genes for further laboratory testing and speed up the search for reliable biomarkers in tuberculosis and other diseases.
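The literature-validation step can be illustrated with a small Bayesian calculation. The source describes BayesScore only as a posterior probability of gene-disease association computed from PubMed co-occurrence, so the sketch below is an assumed two-hypothesis model, not the published formula. The function name, the `enrichment` and `prior` parameters, and all counts are hypothetical.

```python
from math import exp, log


def log_binom_lik(k, n, p):
    """Binomial log-likelihood without the C(n, k) term,
    which cancels when two hypotheses are compared."""
    return k * log(p) + (n - k) * log(1.0 - p)


def posterior_association(k, n, background, enrichment=5.0, prior=0.01):
    """Posterior probability that a gene is associated with a disease,
    given that k of the gene's n papers also mention the disease.

    background : fraction of all papers that mention the disease
                 (co-mention rate if there is no association)
    enrichment : assumed fold increase in that rate under association
    prior      : assumed prior probability of association
    """
    p0 = background
    p1 = min(background * enrichment, 0.999)
    log_odds = (
        log(prior) - log(1.0 - prior)
        + log_binom_lik(k, n, p1)
        - log_binom_lik(k, n, p0)
    )
    # Numerically stable logistic transform of the posterior log-odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + exp(-log_odds))
    z = exp(log_odds)
    return z / (1.0 + z)


# Invented counts for two placeholder genes, with 1% of papers on the disease.
for name, k, n in [("gene_a", 30, 200), ("gene_b", 0, 200)]:
    print(name, posterior_association(k, n, background=0.01))
```

Under these assumptions, a gene that is co-mentioned with the disease far more often than the background rate gets a posterior near 1, while a gene that is never co-mentioned falls below its prior. A real score would also need to handle gene-name ambiguity and publication bias, which this sketch ignores.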

Public Health Impact

GeneLift can help researchers extract more reliable gene signals from small, existing tuberculosis transcriptomic datasets, improving biomarker discovery without requiring large new cohorts. By surfacing candidates missed by traditional analysis, it may speed the prioritization of genes for experimental follow-up and clinical study.

GeneLift
BayesScore
transcriptomics
tuberculosis
Gaussian Mixture Models
Featured Experts

Author: Tavpritesh Sethi
