PAPER 24 Aug 2025 Global

Can AIs Find Disease Signals in Blood RNA?

Iwijn De Vlaminck led a study showing large language models can help identify diagnostic gene panels from plasma cell-free RNA and sometimes match conventional methods.

Researchers led by Iwijn De Vlaminck asked whether large language models (LLMs), software that can read huge amounts of information and even write executable code, can help discover diagnostic biomarkers from high-throughput molecular data. To test this, they benchmarked six LLMs against real clinical data: OpenAI’s o3 and GPT-4o, Anthropic’s Claude Opus 4 and Claude 3.7 Sonnet, and Google’s Gemini 2.5 Pro and Gemini 2.0 Flash. The team used plasma cell-free RNA (cfRNA) profiles obtained by RNA sequencing from three distinct comparisons: children with Kawasaki disease (KD) versus multisystem inflammatory syndrome in children (MIS-C); adults with active tuberculosis (TB) versus other non-TB respiratory conditions; and people with myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) versus sedentary controls. They set up two tasks: a gene-panel design task in which each LLM mined public knowledge to propose diagnostic genes for use in machine learning (ML), and an end-to-end modeling task in which LLMs were asked to build a full ML workflow directly from raw RNA-seq counts. The study aimed to map what these LLMs can and cannot do when applied to biomedical biomarker discovery.
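The panel-evaluation step in the first task can be sketched in a few lines: restrict the counts matrix to a candidate gene panel, then score a classifier by cross-validated ROC AUC. Everything here is an illustrative assumption, not the study's exact pipeline: the data are synthetic, the panel contents are made up, and a cross-validated logistic regression stands in for whatever ML model was actually used.

```python
# Sketch of scoring a candidate gene panel against a counts matrix
# (samples x genes) with binary case/control labels. Synthetic data only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
genes = [f"GENE{i}" for i in range(200)]
labels = rng.integers(0, 2, size=60)               # 0 = control, 1 = case
counts = rng.poisson(10, size=(60, len(genes))).astype(float)
counts[labels == 1, :5] += 8                       # make the first 5 genes informative

def panel_auc(panel, counts, labels, genes):
    """Cross-validated ROC AUC of a classifier restricted to a gene panel."""
    idx = [genes.index(g) for g in panel]
    X = np.log1p(counts[:, idx])                   # log-transform raw counts
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    return cross_val_score(model, X, labels, cv=5, scoring="roc_auc").mean()

llm_panel = ["GENE0", "GENE1", "GENE2", "GENE3", "GENE4"]    # hypothetical LLM proposal
random_panel = list(rng.choice(genes, size=5, replace=False))
print(f"informative panel AUC: {panel_auc(llm_panel, counts, labels, genes):.2f}")
print(f"random panel AUC:      {panel_auc(random_panel, counts, labels, genes):.2f}")
```

The same scoring function applies to any panel source (LLM-proposed, DGE-selected, or random), which is what makes the head-to-head comparisons in the study possible.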

In the gene-panel design task, LLM-derived panels tended to capture canonical immune pathways and, across all three cohorts, outperformed randomly selected gene sets. However, when compared to panels chosen by differential gene expression (DGE) analysis, the LLM panels underperformed in the KD versus MIS-C and ME/CFS cohorts, while performing comparably or better for the TB cohort. For the end-to-end modeling task, one model in particular, OpenAI’s o3, produced classifiers for KD versus MIS-C that performed just as well as conventional statistical methods without human intervention, using the raw RNA-seq counts as input. For the TB and ME/CFS cohorts, the automated LLM-built workflows produced slightly lower performance than the conventional approach. The models tested therefore showed a mixed picture: they can nominate biologically meaningful gene lists and in some cases build effective classifiers, but they do not uniformly beat standard DGE-driven or statistical pipelines.
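For context, the DGE-driven baseline that the LLM panels were compared against amounts to ranking genes by a per-gene test statistic and keeping the top scorers. A minimal stand-in on synthetic data is shown below; real DGE tools (e.g. DESeq2 or edgeR) use count-aware statistical models, so this t-test on log-transformed counts is only a simplified illustration.

```python
# Simplified DGE-style gene selection on synthetic data: rank genes by the
# absolute two-sample t-statistic and keep the top k.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
labels = rng.integers(0, 2, size=60)               # 0 = control, 1 = case
counts = rng.poisson(10, size=(60, 200)).astype(float)
counts[labels == 1, :5] += 8                       # genes 0-4 truly differ

def dge_panel(counts, labels, k=5):
    """Top-k genes by absolute t-statistic on log-transformed counts."""
    X = np.log1p(counts)
    t, _ = stats.ttest_ind(X[labels == 1], X[labels == 0], axis=0)
    return np.argsort(-np.abs(t))[:k]

print(sorted(dge_panel(counts, labels).tolist()))  # → [0, 1, 2, 3, 4]
```

Because this baseline is purely data-driven, it needs no prior biological knowledge, which is exactly where the LLM panels, built from public knowledge rather than the cohort's own data, take a different route.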

These results outline both promise and limits for applying LLMs to diagnostics. On the positive side, LLMs can sift public knowledge to nominate genes that reflect immune biology and can outperform random choices, making them useful for prioritizing candidates in early biomarker discovery. The success of o3 in producing KD versus MIS-C classifiers without human coding suggests parts of the analytic workflow can be automated, potentially speeding development and lowering technical barriers. On the cautionary side, LLM-derived panels did not consistently outperform established differential gene expression analysis across all conditions, and automated end-to-end modeling was slightly weaker for TB and ME/CFS. That pattern indicates LLMs are a complement rather than a replacement for traditional methods today: they can accelerate hypothesis generation and workflow assembly, but their outputs will still need careful evaluation and validation before clinical or research adoption.

Public Health Impact

LLMs could speed the early stages of biomarker discovery by proposing biologically relevant gene panels and automating parts of analysis workflows. Because performance varies by disease, rigorous validation will remain essential before clinical application.

cell-free RNA
large language models
biomarker discovery
tuberculosis diagnostics
Kawasaki disease & MIS-C

Author: Hunter Gaudio
