Federated Learning Boosts HIV Prediction Across International Sites
Nicholas J Jackson reports that Federated Learning enables privacy-preserving, multi-site HIV prediction models that match centralized performance and outperform local models across diverse sites.
Machine learning and other digital health technologies are changing how infectious diseases are managed, but building useful models for HIV care has run into a practical barrier: patient-level data often cannot be shared across hospitals and countries. That limitation makes it hard to train models on the large, diverse datasets needed for reliable predictions. To tackle the problem, a team led by Nicholas J Jackson tested Federated Learning (FL), a privacy-preserving approach that trains models across multiple locations without moving or sharing raw patient records. The researchers worked with the Caribbean, Central, and South America network for HIV epidemiology (CCASAnet) and analyzed data from 22,234 people living with HIV (PLWH) across six sites in five countries. Instead of centralizing all data, FL lets each site keep its records and contribute to a shared model through secure updates. The study used FL to build clinical prediction models and compared them with models trained on pooled, centralized data and with models trained only at individual sites. The results speak directly to whether FL can overcome data-sharing hurdles in international HIV research.
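To make the mechanics concrete, the sketch below shows federated averaging (FedAvg), one common FL scheme: each site runs a few training steps on its own data, and only the resulting model weights are sent back and averaged, weighted by site size. The study does not publish its training code, so the logistic-regression model, the helper names (local_update, fedavg_round), and the synthetic three-site setup here are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One site's private training step: gradient descent on a
    logistic-regression loss, using only that site's own data."""
    w = weights.copy()
    for _ in range(epochs):
        preds = 1.0 / (1.0 + np.exp(-X @ w))    # sigmoid predictions
        w -= lr * (X.T @ (preds - y)) / len(y)  # logistic-loss gradient step
    return w

def fedavg_round(global_w, site_data):
    """One communication round: every site trains locally, then only
    the updated weights (never the patient records) are averaged,
    weighted by each site's sample size."""
    updates = [local_update(global_w, X, y) for X, y in site_data]
    sizes = np.array([len(y) for _, y in site_data], dtype=float)
    return np.average(np.stack(updates), axis=0, weights=sizes)

# Toy data: three unequal "sites" standing in for CCASAnet's six.
rng = np.random.default_rng(0)
sites = []
for n in (500, 200, 80):
    X = rng.normal(size=(n, 4))
    y = (X @ np.array([1.0, -0.5, 0.3, 0.0]) > 0).astype(float)
    sites.append((X, y))

w = np.zeros(4)          # shared global model; raw data never moves
for _ in range(20):      # 20 federated communication rounds
    w = fedavg_round(w, sites)
print("federated weights:", np.round(w, 3))
```

The privacy property is visible in fedavg_round: the only objects that leave a site are weight vectors, so patient records stay local by construction.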
The team evaluated FL across four clinical prediction tasks: 1-year mortality, 3-year mortality, tuberculosis incidence, and AIDS-defining cancer incidence. Using the records of the 22,234 PLWH across CCASAnet's six sites, they trained federated algorithms and compared their performance against a centralized benchmark and against site-specific models. The FL algorithms achieved near-centralized performance, closely matching models trained on pooled data while substantially outperforming models trained only on a single site's data. However, the size of each site's dataset and the differences between sites (between-site heterogeneity) affected how much improvement FL delivered. In many cases, applying local fine-tuning after federated training further improved performance, but the benefit of this step varied by prediction task. Overall, FL combined strong accuracy with the privacy advantage of not sharing patient-level records.
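Local fine-tuning of the kind described above can be sketched as a few additional low-learning-rate gradient steps on a single site's own data, starting from the federated weights. This continues the previous sketch (reusing the hypothetical local_update helper and the toy sites); the learning-rate and epoch choices are assumptions, not values from the study.

```python
def fine_tune(global_w, X_site, y_site, lr=0.02, epochs=3):
    """Adapt the shared federated model to one site: a few small,
    local gradient steps keep the model near the global solution
    while nudging it toward the site's own case mix."""
    return local_update(global_w, X_site, y_site, lr=lr, epochs=epochs)

# Example: the smallest toy site adapts the federated model locally.
X_small, y_small = sites[-1]
w_small = fine_tune(w, X_small, y_small)
print("fine-tuned weights:", np.round(w_small, 3))
```

Because fine-tuning happens after federated training and uses only local data, it preserves the no-data-sharing guarantee while letting each site trade some generality for local fit, consistent with the study's observation that the benefit of this step varies across sites and tasks.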
These findings support Federated Learning as a practical, scalable, and privacy-preserving infrastructure for multi-site machine learning in international HIV research. By matching centralized model performance without moving individual-level data, FL can help research networks like CCASAnet build robust clinical prediction tools while respecting local data policies and patient privacy. The variation in benefit across sites shows that dataset size and local differences matter: smaller sites, and sites whose populations differ more from the rest of the network, tend to gain more from pooled learning, and local fine-tuning can help adapt shared models to specific settings. Because that benefit is task-dependent, implementers should consider both global federated training and local adjustments when deploying models for outcomes such as short- and longer-term mortality, tuberculosis, and AIDS-defining cancers. In short, FL offers a path to collaborative model development across borders, but attention to site variation and targeted fine-tuning will be important for getting the best results.
Federated Learning lets hospitals and clinics in different countries train shared prediction models without exchanging patient-level data, preserving patient privacy. By improving prediction accuracy over locally trained models, it can help researchers develop better tools for forecasting mortality, tuberculosis, and AIDS-defining cancers among people living with HIV.
Author: Nicholas J Jackson