A divide-and-conquer EM algorithm for large matrix-variate non-Gaussian longitudinal data
Conference
65th ISI World Statistics Congress 2025
Format: IPS Abstract - WSC 2025
Keywords: electronic, big-data, matrix-data, skewed longitudinal data
Wednesday 8 October 10:50 a.m. - 12:30 p.m. (Europe/Amsterdam)
Abstract
Features of non-Gaussianity, manifested via skewness and heavy tails, are ubiquitous in databases generated from large-scale observational studies, yet such data continue to be routinely analyzed via linear/non-linear mixed effects models under standard Gaussian assumptions for the random terms. In periodontal disease data, these issues arise in the modeling of clinical attachment loss and pocket depth. The problems are exacerbated in large longitudinal observational data extracted from electronic health records (EHR), where subjects are monitored at irregular time-points. In this talk, we define and elucidate a matrix-variate extension of the skew-t regression model, along with several strategies to ensure its extensibility to the irregular time-point setting. This extensibility is achieved via a distributed implementation of the expectation-maximization (EM) algorithm, often considered the “gold-standard” algorithm for longitudinal data. Specifically, the E-step of the EM algorithm is run in parallel on multiple worker processes, while manager processes perform the M-step using a fraction of the results from the local E-steps. We assess our methodology in terms of computational gains and finite-sample performance (relative to existing tools) using synthetic data, and present an application to the periodontal EHR data.
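The worker/manager split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps the matrix-variate skew-t model for a much simpler univariate Student-t location-scale model with fixed degrees of freedom, and it uses threads as stand-ins for worker processes. All names (`local_e_step`, `m_step`, `distributed_em`, `make_shards`, `NU`, `fraction`) are hypothetical and chosen for illustration only.

```python
import random
from concurrent.futures import ThreadPoolExecutor, as_completed

NU = 4.0  # degrees of freedom, held fixed (an assumption of this sketch)


def local_e_step(shard, mu, sigma2):
    """Worker: compute E-step weights and weighted sufficient statistics."""
    s_w = s_wx = s_wx2 = 0.0
    for x in shard:
        # Conditional expectation of the latent scale variable under a t model.
        w = (NU + 1.0) / (NU + (x - mu) ** 2 / sigma2)
        s_w += w
        s_wx += w * x
        s_wx2 += w * x * x
    return s_w, s_wx, s_wx2, len(shard)


def m_step(stats):
    """Manager: update (mu, sigma2) from the pooled worker statistics."""
    s_w = sum(s[0] for s in stats)
    s_wx = sum(s[1] for s in stats)
    s_wx2 = sum(s[2] for s in stats)
    n = sum(s[3] for s in stats)
    mu = s_wx / s_w
    sigma2 = (s_wx2 - s_wx ** 2 / s_w) / n
    return mu, sigma2


def distributed_em(shards, fraction=0.75, iters=50):
    """Run EM, updating from only the first `fraction` of workers to finish."""
    mu, sigma2 = 0.0, 1.0
    need = max(1, int(fraction * len(shards)))
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        for _ in range(iters):
            futures = [pool.submit(local_e_step, s, mu, sigma2) for s in shards]
            stats = []
            for fut in as_completed(futures):
                stats.append(fut.result())
                if len(stats) == need:
                    break  # do not wait for the slowest workers
            mu, sigma2 = m_step(stats)
    return mu, sigma2


def make_shards(n_shards, shard_size, loc, scale, seed):
    """Simulate heavy-tailed t-distributed data split across shards."""
    rng = random.Random(seed)

    def draw():
        z = rng.gauss(0.0, 1.0)
        chi2 = rng.gammavariate(NU / 2.0, 2.0)  # chi-square with NU d.f.
        return loc + scale * z / (chi2 / NU) ** 0.5

    return [[draw() for _ in range(shard_size)] for _ in range(n_shards)]
```

For example, `distributed_em(make_shards(8, 500, loc=2.0, scale=1.5, seed=1))` returns estimates of the location and squared scale. Because each M-step uses whichever workers finish first, results can differ slightly between runs; that is the price paid in this sketch for not waiting on stragglers.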