Enhancing Official Statistics through Artificial Intelligence
Conference
65th ISI World Statistics Congress 2025
Format: CPS Abstract - WSC 2025
Keywords: data, imputation, missing
Session: CPS 63 - Transforming Official Statistics
Tuesday 7 October 4 p.m. - 5 p.m. (Europe/Amsterdam)
Tuesday 7 October 5:10 p.m. - 6:10 p.m. (Europe/Amsterdam)
Abstract
In recent years, the need for reliable and timely data has become even more apparent, and National Statistical Institutes are increasingly called upon to develop statistical frameworks that support informed policy decision-making. However, incomplete or missing data in questionnaires and registers can affect the accuracy and reliability of the results.
The aim of this work is therefore to ascertain whether the rapid progress in information technology, which has driven major advances in artificial intelligence, especially in Machine Learning and Deep Learning, can also be applied in Official Statistics to address the problem of incomplete or missing data.
A comparative analysis of different imputation techniques, ranging from traditional statistical methods to cutting-edge deep learning algorithms, has been carried out to achieve this goal. These techniques include Linear Regression (LR), k-Nearest Neighbours (KNN), Decision Trees (DT), Random Forests (RF), Gradient Boosting (GB), Support Vector Machines (SVMs), and Deep Learning models such as Multi-Layer Perceptrons (MLPs), Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs), Generative Adversarial Networks (GANs) and the more recent Transformers.
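To illustrate the difference between a traditional baseline and a model-based method from the list above, the following minimal sketch contrasts column-mean imputation with Linear Regression (LR) imputation. It uses NumPy only; the helper names `mean_impute` and `regression_impute` are hypothetical, the toy matrix is invented for illustration, and the regression helper assumes the predictor columns are fully observed. It is not the pipeline used in this work.

```python
import numpy as np

def mean_impute(X):
    """Traditional baseline: replace each NaN with its column mean."""
    X = X.copy()
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = np.nanmean(X, axis=0)[cols]
    return X

def regression_impute(X, target):
    """Impute NaNs in column `target` by ordinary least squares on the other
    columns, fitted on the rows where `target` is observed.
    Assumption: the predictor columns contain no NaNs."""
    X = X.copy()
    predictors = [j for j in range(X.shape[1]) if j != target]
    observed = ~np.isnan(X[:, target])
    # Design matrix with an intercept column
    A = np.column_stack([np.ones(X.shape[0]), X[:, predictors]])
    coef, *_ = np.linalg.lstsq(A[observed], X[observed, target], rcond=None)
    # Predict the missing entries from the fitted coefficients
    X[~observed, target] = A[~observed] @ coef
    return X

# Hypothetical toy data: the second column is exactly 2 * first + 1
X = np.array([[1.0, 3.0], [2.0, 5.0], [3.0, np.nan], [4.0, 9.0]])
print(mean_impute(X)[2, 1])
print(regression_impute(X, target=1)[2, 1])
```

On this toy matrix, mean imputation fills the gap with the mean of the observed values (17/3, about 5.67), while the regression recovers a value close to 7, because it exploits the relationship with the other variable.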
The comparisons are based on real datasets from the Istat Census and the multipurpose survey on households, where missing data are common. Preliminary results suggest that ML/AI-based imputation methods outperform traditional statistical techniques in terms of imputation performance and robustness, especially on complex datasets with high-dimensional features. This work therefore explores innovative AI solutions that can advance imputation techniques in official statistics and yield more complete and more accurate data.
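A common way to compare imputation methods when the true values of the missing cells are unknown is to hide a share of observed values, impute them, and score the imputations against the hidden truth. The sketch below assumes values are masked completely at random (MCAR), uses synthetic correlated data rather than Istat microdata, and implements a simplified k-Nearest Neighbours imputer in NumPy; `knn_impute`, `masked_rmse` and all parameter values are hypothetical choices for illustration, not the evaluation protocol of this study.

```python
import numpy as np

def mean_impute(X):
    """Baseline: replace each NaN with its column mean."""
    X = X.copy()
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = np.nanmean(X, axis=0)[cols]
    return X

def knn_impute(X, k=5):
    """Simplified KNN imputation: each NaN becomes the mean of that column
    over the k nearest donor rows. Distance is the RMS difference on the
    columns observed in both rows; falls back to the column mean."""
    X = np.asarray(X, dtype=float)
    out = X.copy()
    missing = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    for i in np.where(missing.any(axis=1))[0]:
        dists = np.full(X.shape[0], np.inf)
        for r in range(X.shape[0]):
            shared = ~missing[i] & ~missing[r]
            if r != i and shared.any():
                dists[r] = np.sqrt(np.mean((X[i, shared] - X[r, shared]) ** 2))
        order = np.argsort(dists)
        for j in np.where(missing[i])[0]:
            # Donors must observe column j and have a finite distance
            donors = [r for r in order if not missing[r, j] and np.isfinite(dists[r])][:k]
            out[i, j] = X[donors, j].mean() if donors else col_means[j]
    return out

def masked_rmse(X_true, impute_fn, frac=0.1, seed=0):
    """Hide a random fraction of entries (MCAR), impute them, and return
    the RMSE computed on the hidden entries only."""
    rng = np.random.default_rng(seed)
    mask = rng.random(X_true.shape) < frac
    X_obs = X_true.copy()
    X_obs[mask] = np.nan
    X_hat = impute_fn(X_obs)
    return float(np.sqrt(np.mean((X_hat[mask] - X_true[mask]) ** 2)))

# Synthetic data (illustrative only): four noisy copies of one latent factor
rng = np.random.default_rng(42)
latent = rng.normal(size=(200, 1))
X_true = latent + 0.1 * rng.normal(size=(200, 4))

mean_rmse = masked_rmse(X_true, mean_impute)
knn_rmse = masked_rmse(X_true, knn_impute)
print(f"mean imputation RMSE: {mean_rmse:.3f}")
print(f"KNN imputation RMSE:  {knn_rmse:.3f}")
```

With strongly correlated columns, the KNN imputer can draw on the information carried by the other variables and should score a lower RMSE than mean imputation. On real survey data the ranking may differ, which is why such comparisons have to be repeated on each dataset and, ideally, over several random masks.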