NEW STATISTICAL METHODS FOR LONGITUDINAL MICROBIOME DATA
Conference
Category: International Statistical Institute
Abstract
Talk1: Statistical challenges in longitudinal microbiome data analysis
The microbiome is a complex and dynamic community of microorganisms that co-exist interdependently within an ecosystem and interact with their host or environment. Longitudinal studies can capture temporal variation within the microbiome and yield mechanistic insights into microbial systems; however, current statistical methods are limited by the complex, inherent features of the data. We have identified three analytical objectives in longitudinal microbial studies: (1) differential abundance over time and between sample groups, demographic factors or clinical variables of interest; (2) clustering of microorganisms evolving concomitantly across time; and (3) network modelling to identify temporal relationships between microorganisms. We have explored the strengths and limitations of current methods for fulfilling these objectives, and compared different methods in simulation and case studies for objectives (1) and (2). We will also present our current methodological developments for objectives (2) and (3).
Talk2: Addressing model identifiability in the analysis of longitudinal sequence count data
A common statistical problem is inference from positive-valued multivariate measurements where the scale (e.g., sum) of the measurements is not representative of the scale (e.g., total size) of the system being studied. This situation is common in the analysis of modern sequencing data. The field of Compositional Data Analysis (CoDA) axiomatically states that analyses must be invariant to scale. Yet many scientific questions posed in the analysis of longitudinal studies rely on the unmeasured system scale for identifiability. In practice, many existing tools make a wide variety of assumptions to identify models, often imputing the unmeasured scale. Here, we analyze the theoretical limits on inference given these data and formalize the assumptions required to provide principled scale-reliant inference. Using statistical concepts such as consistency and calibration, we provide guidance on how to make scale-reliant inference from these data. We prove that the Frequentist ideal is often unachievable and that existing methods can demonstrate bias and a breakdown of Type-I error control. We introduce scale simulation estimators and scale sensitivity analysis as a rigorous, flexible, and computationally efficient means of performing scale-reliant inference.
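The identifiability problem described in this abstract can be illustrated with a small sketch. It is not the speakers' scale simulation estimator; the abundances are made-up numbers, and the helper names `closure` and `log2_fold_changes` are hypothetical. It only shows that a change in total system size leaves the composition untouched, and that the absolute log fold change equals the relative log fold change plus the (unmeasured) log scale ratio, which a sensitivity sweep can vary.

```python
import math

def closure(abundances):
    """Rescale positive values to proportions that sum to 1."""
    total = sum(abundances)
    return [a / total for a in abundances]

def log2_fold_changes(before, after):
    """Per-taxon log2 fold change between two vectors."""
    return [math.log2(b / a) for a, b in zip(before, after)]

# Hypothetical absolute abundances for three taxa (made-up numbers).
# Every taxon doubles between the two time points.
time1 = [100.0, 300.0, 600.0]
time2 = [200.0, 600.0, 1200.0]

# Sequencing carries information about proportions, not about totals.
comp1 = closure(time1)
comp2 = closure(time2)

absolute_lfc = log2_fold_changes(time1, time2)  # what the question is about
relative_lfc = log2_fold_changes(comp1, comp2)  # what the data can show

# Identity: absolute LFC = relative LFC + log2(scale ratio).
# The scale ratio is unmeasured, so sweep a range of assumed values.
for assumed_log2_scale in (-1.0, 0.0, 1.0):
    implied = [r + assumed_log2_scale for r in relative_lfc]
    print(assumed_log2_scale, implied)
```

Each assumed scale ratio implies a different absolute change from the same observed compositions, which is why conclusions about absolute abundance depend on how the scale is modelled.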
Talk3: Dynamic clustering and heterogeneity pursuit with longitudinal microbiome data
Detecting sample clusters and identifying sources of heterogeneity (i.e., the distinctive microbial components or phenotypic features that differentiate the clusters) play a critical role in unraveling the relationship between microbial profiles and heterogeneous health states. We develop Dirichlet-Multinomial (DM) mixture models with heterogeneity pursuit to cluster microbiome profiles and pinpoint key taxa with distinctive abundances across clusters. We further adapt the heterogeneity pursuit method to the longitudinal setting through a hidden Markov setup to simultaneously identify latent microbial states and characterize the dynamics of state transitions. An application to iHMP data demonstrates the promise of the proposed framework for deciphering the heterogeneity and dynamics of the microbiome.
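As a rough illustration of the building block named in this abstract, the sketch below evaluates the standard Dirichlet-multinomial log-probability and uses it to compute posterior cluster probabilities for one sample under a two-component mixture. The weights, concentration parameters, and counts are hypothetical, as are the function names `dm_log_pmf` and `responsibilities`. It does not attempt the heterogeneity pursuit step or the hidden Markov extension.

```python
import math

def dm_log_pmf(counts, alpha):
    """Log-probability of a count vector under a Dirichlet-multinomial."""
    n = sum(counts)
    a_sum = sum(alpha)
    # Multinomial coefficient: n! / prod(x_k!)
    log_p = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # Ratio of gamma functions from integrating out the Dirichlet
    log_p += math.lgamma(a_sum) - math.lgamma(n + a_sum)
    log_p += sum(math.lgamma(x + a) - math.lgamma(a)
                 for x, a in zip(counts, alpha))
    return log_p

def responsibilities(counts, weights, alphas):
    """Posterior cluster probabilities for one sample (log-sum-exp for stability)."""
    log_terms = [math.log(w) + dm_log_pmf(counts, a)
                 for w, a in zip(weights, alphas)]
    top = max(log_terms)
    unnorm = [math.exp(t - top) for t in log_terms]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical two-cluster mixture over three taxa (made-up parameters).
weights = [0.5, 0.5]
alphas = [[8.0, 1.0, 1.0], [1.0, 1.0, 8.0]]

# Hypothetical sample dominated by the first taxon.
sample = [40, 5, 5]
post = responsibilities(sample, weights, alphas)
print(post)
```

In a full mixture fit these posterior probabilities would feed an iterative procedure such as EM; here the parameters are fixed by hand purely to show the likelihood computation.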
Talk4: Strain genetic association studies within the human microbiome