Large-Scale Spatial Data Science

Instructors: Marc G. Genton, Sameh Abdulah & Mary Lai O. Salvaña

03 October 2025



About this short course

The course, designed for data scientists, geospatial analysts, and researchers, provides a comprehensive understanding of advanced methods in large-scale geospatial data science. It focuses on three key topics: large-scale data modeling and prediction, accelerating geospatial data processing with multi- and mixed-precision techniques on modern hardware architectures, and parallelizing related R code using the first parallel runtime system package in R. Participants will first explore ExaGeoStatCPP, a parallel framework for high-performance geostatistical computations that enables efficient modeling and prediction of large-scale geospatial datasets within C++ and R environments. The course then turns to the MPCR package, which provides multi- and mixed-precision support on CPUs and GPUs; attendees will learn how to integrate MPCR functions into their R workflows to balance performance against precision in computational tasks. Finally, participants will be introduced to RCOMPSs, a new runtime system designed to parallelize R code across HPC systems, with hands-on sessions offering practical examples of accelerating R code execution in high-performance computing environments. By the end of the course, participants will have gained advanced skills in large-scale geospatial data science and be ready to apply them in their professional roles.

 

Instructors' biographies:

Marc G. Genton

Marc G. Genton is Al-Khawarizmi Distinguished Professor of Statistics at the King Abdullah University of Science and Technology (KAUST) in Saudi Arabia. He received his Ph.D. degree in Statistics in 1996 from the Swiss Federal Institute of Technology (EPFL), Lausanne. He is a fellow of the American Statistical Association (ASA), the Institute of Mathematical Statistics (IMS), and the American Association for the Advancement of Science (AAAS), and an elected member of the International Statistical Institute (ISI). In 2010, he received the El-Shaarawi award for excellence from the International Environmetrics Society (TIES) and the Distinguished Achievement award from the ASA Section on Statistics and the Environment (ENVR). He received an ISI Service award in 2019 and the Georges Matheron Lectureship award from the International Association for Mathematical Geosciences (IAMG) in 2020. He led a Gordon Bell Prize finalist team with the ExaGeoStat software at Supercomputing 2022. He received the Royal Statistical Society (RSS) 2023 Barnett Award for his outstanding research in environmental statistics and the 2024 Don Owen Award from the ASA's San Antonio Chapter. At Supercomputing 2024, he led the team that won the Gordon Bell Prize in Climate Modeling with Exascale Climate Emulators. His research interests include statistical analysis, flexible modeling, prediction, and uncertainty quantification of spatio-temporal data, with applications in environmental and climate science, as well as renewable energies.

Personal webpage: http://stsds.kaust.edu.sa

Sameh Abdulah

Sameh Abdulah obtained his M.S. and Ph.D. degrees from Ohio State University, Columbus, USA, in 2014 and 2016, respectively. He is currently a research scientist at the Extreme Computing Research Center (ECRC), King Abdullah University of Science and Technology, Saudi Arabia. His research interests include high-performance computing applications, big data, bitmap indexing, handling large spatial datasets, parallel spatial statistics applications, algorithm-based fault tolerance, and machine learning and data mining algorithms. Sameh was part of the KAUST team that was nominated for the ACM Gordon Bell Prize in 2022 and won it in 2024 (climate track) for its work on large-scale climate/weather modeling and prediction.

Personal webpage: https://sites.google.com/view/samehabdulah

Mary Lai O. Salvaña

Mary Lai O. Salvaña is an Assistant Professor in Statistics at the University of Connecticut (UConn). Prior to joining UConn, she was a Postdoctoral Fellow in the Department of Mathematics at the University of Houston. She received her B.S. and M.S. degrees in Applied Mathematics from Ateneo de Manila University, Philippines, in 2015 and 2016, respectively, and her Ph.D. degree from the King Abdullah University of Science and Technology (KAUST), Saudi Arabia. Her research interests include extreme and catastrophic events, risks, disasters, space-time statistics, environmental statistics, high-performance computing, and computational statistics.

Personal webpage: https://marylaisalvana.com/

 


 

Course outline

The proposed course topics can be summarized as follows:

  • Overview of Spatial Statistics: introduction to spatial statistics, including background and tools for large-scale spatial data manipulation.
  • Introduction to High-Performance Computing (HPC) and Parallel Systems: overview of HPC and parallel hardware systems with an introduction to ExaGeoStatCPP and its large-scale geospatial data modeling capabilities using modern parallel systems, including GPUs.
  • Hands-on: Spatial Data Modeling and Prediction with ExaGeoStatCPP in R: practical session on spatial data modeling and prediction, focusing on performance and accuracy with large synthetic and real datasets.
  • Introduction to Multi-Precision and Mixed-Precision Computing: overview of multi-precision and mixed-precision computing, featuring the MPCR R package on CPU and GPU architectures.
  • Hands-on: Spatial Data with Multi- and Mixed-Precision Computation in R: practical session using the MPCR package to process spatial data with multi- and mixed-precision techniques in R.
  • Overview of Parallel Processing with RCOMPSs: introduction to parallel processing using the RCOMPSs runtime system in R, focusing on task-based parallelism.
  • Hands-on: Developing Task-Based Algorithms for Big Data: practical session on building task-based algorithms with examples of parallelizing spatial data analysis for big data applications.

Prerequisites: participants should have a background in data science, geospatial analysis, or related research. The course is intended for those who want to advance their skills in large-scale geospatial data science, particularly geospatial data modeling, multi-precision computing, and the parallelization of R code for high-performance computing.

