Data science and the future of official statistics
Conference: 64th ISI World Statistics Congress
Format: IPS Abstract
Keywords: data science, official statistics
Session: IPS 90 - Fourth Industrial Revolution, Data Science and the Future of Official Statistics
Monday 17 July 2 p.m. - 3:40 p.m. (Canada/Eastern)
Abstract
Data science covers data engineering, data analytics, machine learning, artificial intelligence and more. It serves three main functions in official statistics. Firstly, it can be used to build reproducible data pipelines that automate statistical production processes, which increases efficiency and improves quality. Secondly, it can be used for supplementary analysis and insights. Whereas existing statistics and indicators are usually produced routinely and on a regular schedule, supplementary indicators would be produced in response to emerging issues, providing additional insights that matter at that moment but need not be maintained as a continuous series. Thirdly, data science can be used to transform the statistical production process itself. One example is combining scanner data from retail stores and web-scraped online prices with traditional price surveys to produce regular consumer price indices. While transforming their production processes, statistical institutes may also adopt new ways of working, such as the DevOps approach with its corresponding technological, process, measurement and cultural capabilities. All of these applications of data science require new skills. Institutes could either train existing staff to acquire these skills or recruit data scientists, data engineers and data analysts.