65th ISI World Statistics Congress 2025

Mixed effects models for longitudinal compositional data using the SAEM algorithm: Application to identifying respiratory microbiome-based predictive

Format: IPS Abstract - WSC 2025

Keywords: compositional

Abstract

Data that describe, at a given time, the intrinsic structure of an integrated system are very commonly expressed as proportions or percentages. Such data are called compositional data, and they have attracted great interest recently. As pointed out by Wang et al. (2019), compositional data arise in many fields of application, such as economic data analysis (water consumption structure, investment/employment structure of industrial sectors, proportions of industrial raw material consumption across regions), but above all in biometrics and, more precisely, in longitudinal microbiome studies. Although the significance of the microbiome in biological systems and in disease has been extensively studied, its structure and function are still not fully understood. A microbiome's temporal variation is also fundamentally complex, with dynamic interactions with the host or environment that have not yet been well researched (see Gerber, 2015, or Schmidt et al., 2018). Addressing these problems is the aim of longitudinal studies. In recent work (see Kodikara et al., 2022, for a review), complex mixed-effects models have been proposed to analyze longitudinal microbiome compositions, accounting for the overabundance of zeros (which leads to zero-inflation and overdispersion issues) and for the dependence between the binary and non-zero parts of the taxon-level relative abundance. In the main works to date, the authors obtain maximum likelihood estimates of the parameters using approximation methods such as Gauss-Hermite quadrature to deal with intractable integrals. However, it is well known that numerical methods of this kind can produce inconsistent estimates. In this work, we propose to develop stochastic EM-type algorithms, such as the SAEM algorithm (Delyon et al., 1999; Kuhn and Lavielle, 2004, 2005), for longitudinal microbiome data under complex mixed-effects models, thereby avoiding the issues associated with approximation methods such as Gauss-Hermite quadrature.
We applied the SAEM algorithm to study the association between the environmental or human microbiome and chronic respiratory diseases, such as asthma or cystic fibrosis, across different cohorts.
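
To make the structure of an SAEM iteration concrete, the following is a minimal sketch on a deliberately simple toy model, not the zero-inflated compositional model discussed in the abstract. It assumes a Gaussian random-intercept model, y_ij = phi_i + e_ij with phi_i ~ N(mu, omega2) and e_ij ~ N(0, sigma2). The function name `saem_random_intercept`, its arguments, and the step-size schedule (step 1 during a burn-in phase, then 1/k) are illustrative assumptions. In this toy model the latent phi_i can be drawn exactly from its conditional distribution; for nonlinear or zero-inflated models that draw would typically be replaced by a few MCMC steps.

```python
import math
import random


def saem_random_intercept(y, n_iter=300, n_burn=100, seed=0):
    """Toy SAEM for y_ij = phi_i + e_ij, phi_i ~ N(mu, omega2), e_ij ~ N(0, sigma2).

    y is a list of per-subject lists of observations (hypothetical layout).
    Returns the estimates (mu, omega2, sigma2).
    """
    rng = random.Random(seed)
    n_subj = len(y)
    n_total = sum(len(yi) for yi in y)

    # Crude starting values (an assumption of this sketch).
    mu = sum(sum(yi) for yi in y) / n_total
    omega2, sigma2 = 1.0, 1.0

    # Stochastic approximations of the complete-data sufficient statistics.
    s1 = s2 = s3 = 0.0

    for k in range(1, n_iter + 1):
        # Simulation step: draw each latent phi_i given y_i and current parameters.
        t1 = t2 = t3 = 0.0
        for yi in y:
            v = 1.0 / (1.0 / omega2 + len(yi) / sigma2)   # conditional variance
            m = v * (mu / omega2 + sum(yi) / sigma2)      # conditional mean
            phi = rng.gauss(m, math.sqrt(v))
            t1 += phi
            t2 += phi * phi
            t3 += sum((obs - phi) ** 2 for obs in yi)

        # Stochastic approximation step: step size 1 in burn-in, then decreasing.
        gamma = 1.0 if k <= n_burn else 1.0 / (k - n_burn)
        s1 += gamma * (t1 - s1)
        s2 += gamma * (t2 - s2)
        s3 += gamma * (t3 - s3)

        # Maximization step: closed-form updates from the averaged statistics.
        mu = s1 / n_subj
        omega2 = max(s2 / n_subj - mu * mu, 1e-8)  # guard against round-off
        sigma2 = s3 / n_total

    return mu, omega2, sigma2


if __name__ == "__main__":
    # Simulated data under assumed true values mu=2, omega2=1, sigma2=0.25.
    sim = random.Random(1)
    data = []
    for _ in range(200):
        phi_i = sim.gauss(2.0, 1.0)
        data.append([sim.gauss(phi_i, 0.5) for _ in range(5)])
    print(saem_random_intercept(data))
```

The sketch never evaluates the integral over the random effects, which is the step that quadrature approximates. It averages simulated complete-data statistics across iterations instead.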