
A divide-and-conquer EM algorithm for large matrix-variate non-Gaussian longitudinal data

Format: IPS Abstract - WSC 2025

Keywords: electronic, big-data, matrix-data, skewed longitudinal data

Session: IPS 770 - Computationally-Intensive Methodologies for Analyzing Large Datasets: A Blissful Marriage Against All Odds?

Wednesday 8 October 10:50 a.m. - 12:30 p.m. (Europe/Amsterdam)

Abstract

Non-Gaussian features, such as skewness and heavy tails, are ubiquitous in databases generated from large-scale observational studies, yet such data continue to be routinely analyzed with linear or nonlinear mixed-effects models that assume Gaussian random terms. In periodontal disease data, these issues arise when modeling clinical attachment loss and pocket depth. The problems are exacerbated in large longitudinal observational data extracted from electronic health records (EHR), where subjects are monitored at irregular time points. In this talk, we define and elucidate a matrix-variate extension of the skew-t regression model, along with several strategies for extending it to irregularly observed data. This extension is achieved through a distributed implementation of the expectation-maximization (EM) algorithm, often considered the “gold-standard” algorithm for longitudinal data. Specifically, the E-step of the EM algorithm runs in parallel on multiple worker processes, while manager processes perform the M-step using a fraction of the results from the local E-steps. Using synthetic data, we assess the methodology in terms of computational improvements and finite-sample performance relative to existing tools, and we present an application to the periodontal EHR data.
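The manager/worker split described above can be illustrated with a small sketch. The abstract does not give the matrix-variate skew-t updates, so the code below swaps in a much simpler model, a univariate Student-t location-scale model with known degrees of freedom, which captures heavy tails but not skewness. Workers run the E-step on their own data shard and return additive sufficient statistics; a manager pools them and performs a closed-form M-step. All names (`local_e_step`, `manager_m_step`, `divide_and_conquer_em`), the `fraction` parameter, and the rule of using a random subset of worker results are hypothetical illustrations of "a fraction of the results", not the authors' implementation. Threads stand in for worker processes.

```python
"""Divide-and-conquer EM sketch on a toy model (assumption, not the authors' method).

Toy model (assumed): y_i ~ Student-t(mu, sigma^2, nu) with nu known.
Workers run the E-step on their own shard and return sufficient statistics;
a manager pools them and performs the closed-form M-step.
"""
import math
import random
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def local_e_step(shard, mu, sigma2, nu):
    """Worker E-step: latent precision weights, summarized as additive sums."""
    s_w = s_wy = s_wyy = 0.0
    for y in shard:
        d = (y - mu) ** 2 / sigma2        # squared standardized residual
        w = (nu + 1.0) / (nu + d)         # E[tau_i | y_i]; small for outlying y_i
        s_w += w
        s_wy += w * y
        s_wyy += w * y * y
    return len(shard), s_w, s_wy, s_wyy


def manager_m_step(stats):
    """Manager M-step: closed-form mu and sigma^2 from pooled shard sums."""
    n = sum(s[0] for s in stats)
    s_w = sum(s[1] for s in stats)
    s_wy = sum(s[2] for s in stats)
    s_wyy = sum(s[3] for s in stats)
    mu = s_wy / s_w                              # weighted mean
    sigma2 = (s_wyy - s_wy * s_wy / s_w) / n     # sum_i w_i (y_i - mu)^2 / n
    return mu, sigma2


def divide_and_conquer_em(shards, nu=4.0, fraction=1.0, n_iter=200, tol=1e-10, seed=0):
    """Return (mu, sigma). fraction < 1 is a hypothetical partial-aggregation rule."""
    rng = random.Random(seed)
    all_y = sorted(y for shard in shards for y in shard)
    mu = all_y[len(all_y) // 2]                  # median as a robust starting value
    sigma2 = sum((y - mu) ** 2 for y in all_y) / len(all_y)
    k = max(1, round(fraction * len(shards)))    # number of shard results the manager uses
    with ThreadPoolExecutor() as pool:           # threads stand in for worker processes
        for _ in range(n_iter):
            e_step = partial(local_e_step, mu=mu, sigma2=sigma2, nu=nu)
            stats = list(pool.map(e_step, shards))
            # Emulates a manager that proceeds once k workers have reported.
            used = rng.sample(stats, k) if k < len(stats) else stats
            new_mu, new_sigma2 = manager_m_step(used)
            done = abs(new_mu - mu) < tol and abs(new_sigma2 - sigma2) < tol
            mu, sigma2 = new_mu, new_sigma2
            if done:
                break
    return mu, math.sqrt(sigma2)


if __name__ == "__main__":
    gen = random.Random(42)
    # Synthetic t_4 data with mu = 2, sigma = 1: z / sqrt(chi2_4 / 4), chi2_4 = Gamma(2, scale=2).
    data = [2.0 + gen.gauss(0.0, 1.0) / math.sqrt(gen.gammavariate(2.0, 2.0) / 4.0)
            for _ in range(20000)]
    shards = [data[i::8] for i in range(8)]      # 8 hypothetical workers
    print(divide_and_conquer_em(shards, nu=4.0))
```

Because this toy M-step depends on the data only through four additive sums per shard, pooling every shard (`fraction=1.0`) reproduces single-machine EM up to floating-point rounding; with `fraction < 1` the iterates fluctuate around the full-data fit rather than converging exactly.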

65th ISI World Statistics Congress
