Network maps reveal tuberculosis spread patterns in Lima
Anne N. Shapiro and colleagues show that a network method built on directed pairwise transmission probabilities can identify TB clusters and gives consistent results across clustering approaches.
In tuberculosis (TB), the actual transmission events (who infected whom) are usually hidden, in part because a long time can pass between one case and the cases it causes. That makes it hard for public health teams to map how the disease moves through a community. Anne N. Shapiro and collaborators set out to tackle this problem with network tools that work from estimated connections instead of direct observations. Rather than relying only on clear outbreak chains, the team estimated, for every ordered pair of people, the probability that one infected the other, and used those pairwise probabilities to build a network of possible transmission links. They applied this approach to both simulated data and real-world records from a cohort study in Lima, Peru. The goal was to identify transmission clusters, meaning groups of people likely connected by chains of infection, and then look for features that cluster members shared. The simulations tested whether the network methods could recover known clusters; the Lima data showed whether the resulting groups were similar or diverse in key traits.
The study estimated directed pairwise transmission probabilities using an existing iterative algorithm that employs a modified Naïve Bayes classifier to combine demographic, clinical, and genetic information. Those probabilities were then turned into a network by drawing links between people with higher estimated transmission likelihoods. To reduce noise, the team explored techniques for trimming low-probability edges so that weak, uncertain links did not overwhelm the network. They applied a variety of clustering algorithms to group people based on the retained edges, and they first evaluated the approach on simulated data to see how well the clustering algorithms recovered the known simulated clusters. When they applied the same pipeline to the cohort study from Lima, Peru, they assessed similarity within clusters with a binary entropy measure. Across different edge-trimming scenarios and clustering methods, clustering performance remained consistent. The analysis also found high levels of entropy for age, sex, socioeconomic status, and individuals who work outside the house and use public transit, indicating these variables were heterogeneous across clusters rather than defining tight, uniform groups.
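The trim, cluster, and score steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the summary does not say which trimming rule, clustering algorithms, or exact entropy definition were used, so the sketch assumes a fixed probability cutoff, weakly connected components as the clustering step, and the Shannon entropy of a yes/no trait. The function names, the cutoff of 0.1, and the five-person data set are all hypothetical.

```python
import math
from collections import defaultdict

def trim_edges(probs, threshold):
    """Keep directed edges whose estimated probability meets the cutoff (assumed rule)."""
    return {edge: p for edge, p in probs.items() if p >= threshold}

def cluster_components(nodes, edges):
    """Group nodes into weakly connected components with union-find.

    Connected components stand in for the clustering algorithms here;
    edge direction is ignored when forming groups.
    """
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])

def binary_entropy(values):
    """Shannon entropy in bits of a 0/1 trait: 0 = everyone alike, 1 = even split."""
    if not values:
        return 0.0
    p = sum(values) / len(values)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Hypothetical directed probabilities: (infector, infectee) -> probability.
probs = {
    ("A", "B"): 0.80,
    ("B", "C"): 0.60,
    ("C", "D"): 0.05,  # weak link, removed by trimming
    ("D", "E"): 0.70,
    ("A", "E"): 0.02,  # weak link, removed by trimming
}
nodes = ["A", "B", "C", "D", "E"]
# Hypothetical binary trait, e.g. 1 = uses public transit.
trait = {"A": 1, "B": 0, "C": 1, "D": 1, "E": 1}

kept = trim_edges(probs, threshold=0.1)
clusters = cluster_components(nodes, kept)
entropies = [binary_entropy([trait[n] for n in c]) for c in clusters]

for members, h in zip(clusters, entropies):
    print(members, round(h, 3))
```

With this toy input, trimming removes the two weak links and leaves two clusters, A-B-C and D-E. The first cluster is mixed on the trait (entropy of about 0.92 bits), while the second is uniform (entropy 0), which is the kind of contrast an entropy measure is meant to expose.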
This work demonstrates practical ways to analyze estimated transmission links with network techniques, offering a route to study outbreaks when direct observation of transmission is not possible. The authors show that the approach is robust, with results consistent across choices about how to build the network and which clustering method to use, which strengthens confidence that the findings are not artifacts of a particular analytic decision. Because the method starts from directed pairwise transmission probabilities, it can highlight likely chains and clusters even for diseases with long serial intervals such as TB. Importantly, the finding that common traits like age, sex, socioeconomic status, and patterns of work and transit were heterogeneous across clusters suggests that transmission in this setting does not simply follow obvious demographic lines. The researchers conclude that this general framework can be applied to any disease outbreak to understand its dynamics, helping public health teams focus investigations and interventions based on inferred transmission structure rather than only on visible case counts.
Public health teams could use this network approach to identify likely transmission clusters when direct links are unobservable, improving situational awareness during TB outbreaks. By revealing which traits are or are not shared within clusters, the method can help target investigations and interventions more precisely.
Author: Anne N. Shapiro