Advancing environmental statistics through online collaborative groups
Conference
Category: The International Environmetrics Society (TIES)
Abstract
Title: Mixed Bayesian compressed regression for multivariate models for large correlated geospatial data-sets
Abstract: Modeling complex high-dimensional geostatistical data presents many computational challenges, which have led to substantive algorithmic developments beyond the possible need for high-performance computing. Even with these developments, model fitting and multivariate inference remain substantially challenging, especially within a Bayesian statistical framework. Here, we extend the efficient sampling algorithm recently developed by Moran and Weller (2022), named Fast Increased Fidelity Approximate Gaussian Process (FIFA-GP), to multivariate spatial data observed at fixed locations in a region. The algorithm exploits $\mathcal{H}$-matrix approximations of the matrices comprising the GP posterior covariance, which reduces the computational cost from cubic to near-linear complexity. We demonstrate the scalability of the proposed approach using synthetic data as well as existing geospatial ecological data.
Title: A novel Bayesian framework for source apportionment of particle number size distribution
Abstract: The increased health risk due to exposure to particle pollution is a constant concern for society. Particulate Matter (PM) is a combination of different components from multiple sources. Identifying those sources is vital for designing effective air pollution and health policies and regulations. In this work, we present a flexible statistical approach to apportion air pollution particles to their sources. While factor analysis, and particularly Positive Matrix Factorization (PMF), is the usual approach to this problem, we propose a novel Bayesian modelling framework to overcome the limitations of these traditional models. Specifically, we model source contributions with a Dirichlet process (DP), which allows us to estimate the number of components that contribute to the particle concentration rather than fixing it, and so to identify latent sources without specifying their number a priori. Furthermore, we extend the DP framework by allowing dependence on meteorological variables such as wind speed and wind direction, while also smoothing the process using a flexible Gaussian kernel. We applied our method to Particle Number Size Distribution (PNSD) time series data gathered near London Gatwick Airport (UK) in 2019. The results demonstrate the effectiveness of the proposed approach in identifying the expected sources as well as new ones not previously identified by other methodologies.
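As a rough illustration of why a Dirichlet process prior avoids fixing the number of sources, the sketch below draws mixture weights from a truncated stick-breaking construction and counts how many components carry non-negligible mass. This is a generic textbook construction, not the authors' model: the function name, the concentration parameter `alpha`, the truncation level, and the 0.01 threshold are all assumptions made for this example, and the meteorological dependence and kernel smoothing described in the abstract are not represented.

```python
import numpy as np


def stick_breaking_weights(alpha, truncation, rng):
    """Draw truncated stick-breaking weights for a Dirichlet process.

    alpha      -- concentration parameter (illustrative value, not from the talk)
    truncation -- number of sticks kept (an approximation to the infinite sequence)
    rng        -- a numpy.random.Generator
    """
    # Each beta_k is the fraction broken off the remaining stick.
    betas = rng.beta(1.0, alpha, size=truncation)
    # Length of stick remaining before the k-th break: prod_{j<k} (1 - beta_j).
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining


rng = np.random.default_rng(0)
weights = stick_breaking_weights(alpha=2.0, truncation=50, rng=rng)

# Components with non-negligible weight play the role of "active" sources;
# the 0.01 threshold is an arbitrary choice for this sketch.
n_active = int(np.sum(weights > 0.01))
print(f"total mass kept: {weights.sum():.4f}, active components: {n_active}")
```

Larger values of `alpha` spread the mass over more components, so the effective number of sources is governed by the prior and the data rather than set in advance.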
Title: Explainable AI, Uncertainty, and Environmental Modeling
Abstract: Statisticians, grounded as they are in the core problems of uncertainty quantification and inference, have historically had issues with machine learning and deep neural network models because these models offer limited access to uncertainty quantification and inference. In this talk, we will review the key findings of explorations into these issues, including results on the explainability of model-input importance and on the coverage and performance of uncertainty-quantified model outputs, both in the context of environmental modeling problems.