Integrating Data Science into Official Statistics
Conference: 65th ISI World Statistics Congress 2025
Format: IPS Abstract - WSC 2025
Keywords: alternative data sources, data science, official statistics
Session: IPS 734 - Data Science and Official Statistics: Toward a New Culture
Monday 6 October 10:50 a.m. - 12:30 p.m. (Europe/Amsterdam)
Abstract
The discussion of the advantages, disadvantages, limitations, and requirements of integrating alternative data sources with probability sample surveys is shaping the debate within national and international statistical systems worldwide. The temptation to replace rigorous but costly data collection approaches with “smarter” ones is growing. However, the reliability of statistics produced by processing alternative data sources must be evaluated. In this work, we analyze the relationships among data science, new data sources, machine learning, citizen science, smart statistics, and official statistics, as well as the role of probability sample surveys.
We show that processing satellite data with parametric and machine learning classifiers does not always yield accurate statistics in complex landscapes, and that machine learning classifiers do not systematically outperform parametric ones. Moreover, data collected through probability samples play a crucial role: when official statistics are to be produced, they should not be replaced by data collected by citizens unless clear and strict guidelines are in place.
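The abstract does not describe its estimation method. As a minimal illustration of why a probability sample matters when statistics are derived from classified satellite imagery, the Python sketch below compares a naive pixel-counting estimate of a land-cover share with a difference estimator that corrects the map using a simple random sample of ground-truth pixels. The simulated population, the classifier's recall and false-positive rate, and the function names (`simulate_population`, `pixel_counting`, `difference_estimator`) are hypothetical assumptions for illustration, not the authors' method or data.

```python
import math
import random


def simulate_population(n_pixels, true_share, recall, false_pos_rate, seed=0):
    """Hypothetical pixel population for one land-cover class (1 = class present).

    Each pixel gets a true label and a map label produced by an imperfect
    classifier with the given recall and false-positive rate (assumed values).
    """
    rng = random.Random(seed)
    truth, mapped = [], []
    for _ in range(n_pixels):
        y = 1 if rng.random() < true_share else 0
        # Probability that the classifier labels this pixel as the class.
        rate = recall if y == 1 else false_pos_rate
        m = 1 if rng.random() < rate else 0
        truth.append(y)
        mapped.append(m)
    return truth, mapped


def pixel_counting(mapped):
    """Naive estimate: share of pixels the classifier labels as the class."""
    return sum(mapped) / len(mapped)


def difference_estimator(truth, mapped, sample_size, seed=0):
    """Map share corrected by the mean (truth - map) discrepancy observed on a
    simple random sample, drawn without replacement, of ground-truth pixels.

    Returns the estimate and its estimated standard error.
    """
    rng = random.Random(seed)
    n_pixels = len(mapped)
    idx = rng.sample(range(n_pixels), sample_size)
    diffs = [truth[i] - mapped[i] for i in idx]
    mean_diff = sum(diffs) / sample_size
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (sample_size - 1)
    fpc = 1 - sample_size / n_pixels  # finite population correction
    se = math.sqrt(fpc * var_diff / sample_size)
    return pixel_counting(mapped) + mean_diff, se


if __name__ == "__main__":
    truth, mapped = simulate_population(100_000, 0.30, 0.70, 0.05, seed=1)
    est, se = difference_estimator(truth, mapped, 1_000, seed=2)
    print(f"true share:          {sum(truth) / len(truth):.3f}")
    print(f"pixel counting:      {pixel_counting(mapped):.3f}")
    print(f"difference estimate: {est:.3f} (SE {se:.3f})")
```

Under these assumed parameters, pixel counting inherits the classifier's omission and commission errors: its expected value is 0.70 × 0.30 + 0.05 × 0.70 = 0.245 against a true share of about 0.30. The difference estimator is design-unbiased under simple random sampling whatever the classifier's error pattern, and the sample also provides a standard error, which the map alone cannot.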