Computationally-Intensive Methodologies for Analyzing Large Datasets: A Blissful Marriage Against All Odds?
Conference
Category: International Statistical Institute
Proposal Description
Recent spectacular advances in data storage and processing capabilities have made it possible to generate, store, and access massive datasets in a variety of domains, such as biomedicine, genomics, and bio-behavior. Proper statistical analysis of these datasets is challenging not only because of their size but also because of a variety of non-trivial complexities. "Necessity is the mother of invention," as the adage goes. With significant advances in computational tools and techniques (within both the classical and Bayesian frameworks), a variety of computationally intensive tools are already available to analyze datasets of varying complexity, and the future of methodology development in this direction remains extremely promising. However, whether these techniques are well equipped and scalable enough to handle large datasets from these domains remains debatable. The need to discuss the pros and cons of these modern techniques, develop alternatives, and disseminate them not only to theoretically motivated statisticians but also to those with an applied bent of mind is well justified and timely, and cannot be overstated. An invited session at the 2025 ISI World Statistics Congress would be an excellent platform to do so.
The main purpose of this invited session, in its focus, content, timeliness, and appeal, is to bring together a group of researchers in this age of big data to explore recent cutting-edge advances, and their limitations, in marrying computer-intensive techniques to large datasets. The four speakers are among the most accomplished statisticians and data scientists engaged in developing tools and techniques for analyzing large datasets. Their work spans both the classical and Bayesian paradigms, and together they represent a balanced mix of academic ranks, genders, and geographic regions.
Submissions
- A divide-and-conquer EM algorithm for large matrix-variate non-Gaussian longitudinal data
- Asynchronous and Distributed Data Augmentation for Massive Data Settings
- Supervised Modeling of Heterogeneous Networks: Investigating Functional Connectivity Across Various Cognitive Control Tasks
- Topical Hidden Genome: Discovering Latent Cancer Mutational Topics Using a Bayesian Multilevel Context-Learning Approach