Integrating Available Data Sources for Official Statistics: Challenges and Opportunities
Conference
65th ISI World Statistics Congress 2025
Format: CPS Abstract - WSC 2025
Keywords: data, official statistics
Session: CPS 61 - Data Integration in Official Statistics
Monday 6 October 4 p.m. - 5 p.m. (Europe/Amsterdam)
Abstract
Data integration refers to the technical and business processes used to combine data from multiple sources into a single, unified view. The production of official statistics is multifaceted, and many of its facets are affected by the nature of the data. Statistical offices nowadays identify three basic data sources: survey, administrative and digital data. Official statistics is also increasingly associated with Big Data.
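As a minimal illustration of combining sources into a unified view, the sketch below links survey records and administrative records on a shared unit identifier (a full outer join). The record layouts, the field names `unit_id`, `employees` and `turnover`, and the assumption that both sources carry a common, clean identifier are all hypothetical; real integration usually also requires identifier harmonization and probabilistic linkage when no common key exists.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical record layouts; real statistical registers differ.
@dataclass
class SurveyRecord:
    unit_id: str
    employees: int  # value reported by the unit in a survey


@dataclass
class AdminRecord:
    unit_id: str
    turnover: float  # value taken from an administrative source (assumed)


@dataclass
class IntegratedRecord:
    unit_id: str
    employees: Optional[int]   # None if the unit is absent from the survey
    turnover: Optional[float]  # None if the unit is absent from the admin source


def integrate(survey: List[SurveyRecord], admin: List[AdminRecord]) -> List[IntegratedRecord]:
    """Full outer join of two sources on the shared unit identifier."""
    survey_by_id = {r.unit_id: r for r in survey}
    admin_by_id = {r.unit_id: r for r in admin}
    unified = []
    # Every unit seen in either source appears exactly once in the result.
    for uid in sorted(survey_by_id.keys() | admin_by_id.keys()):
        s = survey_by_id.get(uid)
        a = admin_by_id.get(uid)
        unified.append(IntegratedRecord(
            unit_id=uid,
            employees=s.employees if s is not None else None,
            turnover=a.turnover if a is not None else None,
        ))
    return unified


if __name__ == "__main__":
    survey = [SurveyRecord("A1", 12), SurveyRecord("B2", 40)]
    admin = [AdminRecord("A1", 1.5e6), AdminRecord("C3", 2.0e5)]
    for rec in integrate(survey, admin):
        print(rec)
```

Keeping units that appear in only one source (rather than an inner join) makes coverage gaps between sources visible, which is often the first quality question in an integration exercise.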
The advent of administrative and digital data has introduced important changes in the production landscape of statistical offices. In recent years, statistical offices have faced new challenges in taking advantage of new data sources and analytical methods. This session starts by setting the scene of the current official statistics system, with a focus on the fundamental principles and dimensions relevant to the use of non-traditional data.
The rapid evolution of information technologies has changed our lives. Nowadays, almost every human activity leaves a digital footprint: whether searching for information on the Internet with a search engine, making a simple call on a mobile phone or paying for a product with a credit card, traces of these activities are stored in some digital database. These enormous quantities of data have drawn the attention of statisticians, who have started to consider their potential for computing new indicators.
The session describes the new technologies needed to handle new big data sources and emphasizes that computing technologies are evolving at unprecedented speed, so that what seems to be the best solution today may not remain so for long. It then presents some experiments and proofs of concept in the context of data innovation for official statistics, followed by a discussion of prospective challenges related to sustainable data access, new technical and methodological approaches, and the effective use of integrated data. It also provides some examples of concrete computing environments used for experimental studies in the official statistics area.
Different phases of the statistical production process, such as drawing samples, data editing and imputation, calculating aggregates, calibrating sampling weights, seasonally adjusting time series, and performing statistical matching or record linkage, rely on specialized software routines. Most of the time these routines are developed in-house by individual statistical agencies and then shared with the rest of the statistical community. The approaches discussed here apply to official statistics tools in general, from commercial products to open-source software such as R or Python.
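To make one of these routines concrete, the sketch below shows post-stratification, the simplest special case of calibrating sampling weights: design weights are rescaled so that, within each post-stratum, they sum to a known population total (for example, counts taken from an administrative register). The function name `poststratify` and its inputs are hypothetical; production calibration (e.g. generalized regression or raking over several auxiliary variables) is considerably more involved.

```python
from collections import defaultdict
from typing import Dict, List


def poststratify(design_weights: List[float],
                 strata: List[str],
                 population_totals: Dict[str, float]) -> List[float]:
    """Rescale design weights so that the weights in each post-stratum
    sum to that stratum's known population total."""
    if len(design_weights) != len(strata):
        raise ValueError("one stratum label per sampled unit is required")

    # Weighted sample count in each post-stratum.
    weighted_counts: Dict[str, float] = defaultdict(float)
    for w, h in zip(design_weights, strata):
        weighted_counts[h] += w

    # A stratum with a known total but no sampled units cannot be calibrated.
    empty = set(population_totals) - set(weighted_counts)
    if empty:
        raise ValueError(f"no sampled units in strata: {sorted(empty)}")

    calibrated = []
    for w, h in zip(design_weights, strata):
        if h not in population_totals:
            raise KeyError(f"no population total for stratum {h!r}")
        # Calibration factor = known total / weighted sample count.
        calibrated.append(w * population_totals[h] / weighted_counts[h])
    return calibrated


if __name__ == "__main__":
    weights = poststratify([10, 10, 20, 20], ["M", "M", "F", "F"], {"M": 30, "F": 80})
    print(weights)
```

In this example the "M" weights are scaled by 30/20 and the "F" weights by 80/40, so each group's calibrated weights reproduce its assumed population total.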
The session also describes the new technologies needed for big data analytics, including the implementation of machine learning algorithms and AI models that identify patterns, trends and anomalies in data sets, thereby improving the efficiency of statistical analysis, data linkage, interactive data visualization and real-time data processing. Such AI and machine learning models can be used in official statistics.
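Machine learning models for anomaly detection are usually compared against a simple statistical baseline. The sketch below is such a baseline, not a machine learning model: it flags values whose modified z-score, 0.6745 · (x − median) / MAD, exceeds 3.5, the cut-off suggested by Iglewicz and Hoaglin. The function name `flag_anomalies`, the default threshold and the handling of a zero MAD are illustrative assumptions, not part of the session's method.

```python
import statistics
from typing import List, Sequence


def flag_anomalies(values: Sequence[float], threshold: float = 3.5) -> List[int]:
    """Return the indices of values whose modified z-score exceeds threshold.

    Modified z-score = 0.6745 * (x - median) / MAD, where MAD is the median
    absolute deviation from the median. Median and MAD are robust, so a few
    extreme values do not mask themselves the way they can with mean/SD.
    """
    if not values:
        return []
    med = statistics.median(values)
    mad = statistics.median([abs(x - med) for x in values])
    if mad == 0:
        # More than half the values equal the median: any differing value
        # has an unbounded score, so flag every value that differs.
        return [i for i, x in enumerate(values) if x != med]
    return [i for i, x in enumerate(values)
            if abs(0.6745 * (x - med) / mad) > threshold]


if __name__ == "__main__":
    monthly_reports = [100, 102, 98, 101, 99, 250]  # hypothetical series
    print(flag_anomalies(monthly_reports))
```

Flagged records would typically be routed to data editing for review rather than deleted automatically.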
By embracing these innovative approaches, statisticians can enhance the quality, timeliness and relevance of official statistics, leading to better-informed decision making and improved public policy outcomes.
Figures/Tables
Data Integration concept
Data Integration