PAPER 31 Jan 2025 Global

crumblr reveals subtle cell changes in disease

Gabriel E. Hoffman introduces crumblr, a tool that more reliably detects shifts in cell type frequencies from single-cell data while controlling false positives.

Studying how the mix of cell types in a tissue changes is central to understanding disease. Advances in single-cell technology have made it possible to measure the proportions of many cell types at high resolution across large groups of people, but that very detail creates new statistical problems. Raw cell counts must be interpreted as compositional data (parts of a whole), so a rise in one cell type necessarily lowers the measured fraction of the others. Traditional tests can miss real differences or produce false leads when studies have complex designs or when cell types are related in a lineage. To address these challenges, Gabriel E. Hoffman and colleagues developed crumblr (DiseaseNeurogenomics.github.io/crumblr), a scalable statistical approach designed specifically for count ratio data from single-cell studies. The method handles large datasets and complicated study setups by explicitly modeling differences in cell composition while accounting for study structure and measurement variability. By focusing on how cell types relate to one another and on reliable estimates of uncertainty, crumblr aims to make comparisons across individuals and experimental groups more robust and interpretable.
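To make the "parts of a whole" point concrete, here is a minimal Python sketch. The counts are hypothetical and the helper name `proportions` is an illustration only; none of this is taken from the crumblr package. It shows that when one cell type expands, the measured fractions of the unchanged cell types fall, while the ratio between the unchanged types stays the same.

```python
import numpy as np

def proportions(counts):
    """Convert raw cell counts into fractions that sum to 1."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()

# Hypothetical counts for three cell types (A, B, C) in two samples.
# Only cell type A expands; B and C have identical absolute counts.
control = np.array([100, 200, 50])
disease = np.array([400, 200, 50])

p_control = proportions(control)
p_disease = proportions(disease)

# The fraction of B drops although its count did not change.
print("fraction of B:", p_control[1], "->", p_disease[1])

# The ratio of B to C is unaffected by the expansion of A.
ratio_control = p_control[1] / p_control[2]
ratio_disease = p_disease[1] / p_disease[2]
print("ratio B/C:", ratio_control, "->", ratio_disease)
```

In this toy example the fraction of B falls from 200/350 to 200/650 even though its count is unchanged, which is why a test on raw fractions can flag cell types that did not actually change. Ratios between cell types are not distorted in this way, which is one motivation for analyzing count ratios.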

At its core, crumblr analyzes count ratio data using precision-weighted linear mixed models, with random effects to accommodate complex study designs. Unlike methods that test one cell type at a time, crumblr performs multivariate statistical testing at multiple levels of the cell lineage hierarchy, which lets it borrow strength across related cell types. This multilevel testing increases the chance of detecting true shifts in composition compared to single-type tests. In simulation studies reported by the authors, crumblr increased statistical power relative to existing methods while still controlling the false positive rate, meaning it finds more real signals while keeping spurious ones at the expected rate. The team demonstrated the method on published single-cell RNA-seq datasets covering diverse biological contexts: aging, tuberculosis infection in T cells, bone metastases from prostate cancer, and SARS-CoV-2 infection. Across these examples, crumblr provided a flexible way to test for and interpret changes in cell type frequency in real-world datasets.
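The idea of precision weighting can be illustrated with a simplified Python sketch. It is not crumblr's implementation, and it rests on several assumptions that the article does not state: the count ratios are represented by a centered log-ratio (CLR) transform with a pseudocount of 0.5, the sampling variance comes from a delta-method approximation under multinomial sampling, and the model has fixed effects only (no random effects, no multivariate testing across the lineage). The counts and the function names `clr_with_variance` and `weighted_least_squares` are hypothetical.

```python
import numpy as np

def clr_with_variance(counts, pseudocount=0.5):
    """CLR-transform a samples-by-cell-types count matrix and return an
    approximate sampling variance for every entry.

    Assumed variance (delta method, multinomial sampling), for D cell types:
    var(clr_k) ~ (1 - 2/D) / c_k + (1/D^2) * sum_j 1/c_j
    """
    c = np.asarray(counts, dtype=float) + pseudocount
    d = c.shape[1]
    log_c = np.log(c)
    clr = log_c - log_c.mean(axis=1, keepdims=True)
    inv = 1.0 / c
    var = inv * (1.0 - 2.0 / d) + inv.sum(axis=1, keepdims=True) / d**2
    return clr, var

def weighted_least_squares(y, X, weights):
    """Solve min_b sum_i w_i * (y_i - X_i b)^2 by rescaling rows by sqrt(w_i)."""
    sw = np.sqrt(weights)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

# Hypothetical counts: 6 samples (rows) by 3 cell types (columns).
# Samples 2 and 5 have few cells, so their ratios are measured less precisely.
counts = np.array([
    [120, 300,  80],
    [ 15,  40,  10],
    [200, 520, 130],
    [ 90, 110,  60],
    [ 30,  35,  22],
    [400, 480, 260],
])
group = np.array([0, 0, 0, 1, 1, 1])  # 0 = control, 1 = disease

clr, var = clr_with_variance(counts)
X = np.column_stack([np.ones(len(group)), group])  # intercept + group effect

# Test the second cell type, weighting each sample by its precision (1 / variance).
beta = weighted_least_squares(clr[:, 1], X, 1.0 / var[:, 1])
print("estimated group effect on the CLR scale:", beta[1])
```

Samples with few cells receive larger variances and therefore smaller weights, so they influence the estimated group effect less than deeply sampled ones. A mixed-model version would add random effects, for example for repeated measures from the same donor, which this sketch leaves out.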

The approach has several practical implications for researchers working with single-cell data. By combining precision-weighted linear mixed models, random effects, and multivariate testing across the cell lineage hierarchy, crumblr offers a more powerful and adaptable framework for detecting meaningful shifts in cell composition. This can help scientists identify which parts of a tissue or immune system are changing in disease or aging, and do so with better statistical confidence. Because crumblr is scalable and designed for complex study designs, it is intended for large cohort studies and for experiments with repeated measures or nested sampling. The range of example datasets, from aging to infection and cancer, underscores the method's broad applicability. Making such tools available helps translate rich single-cell measurements into clearer biological insights without overstating confidence in weak signals.

Public Health Impact

crumblr can help researchers detect and trust changes in cell populations that matter for disease research and patient studies. By increasing power while controlling false positives, it can make single-cell analyses more reliable across diverse biological questions.

single-cell RNA-seq
compositional analysis
statistical methods
crumblr
cell lineage hierarchy
Featured Experts

Author: Gabriel E. Hoffman
