The process for machine learning at Statistics Sweden
Conference
65th ISI World Statistics Congress 2025
Format: CPS Abstract - WSC 2025
Keywords: machine learning, official statistics, production-process
Session: CPS 60 - Technology and Knowledge Integration in Official Statistics
Tuesday 7 October 4 p.m. - 5 p.m. (Europe/Amsterdam)
Abstract
The production of statistics at Statistics Sweden is governed by the Swedish Process Model, which is similar to the Generic Statistical Business Process Model (GSBPM). The Process Model is operationalized in the Process Support System (PSS), which describes the phases and subprocesses of statistical production. To facilitate the use of machine learning to improve the efficiency of processes such as imputation, editing, and coding, Statistics Sweden has developed a process for machine learning that is fully integrated into the Process Model and the PSS.
The process for machine learning was introduced as an overarching process in the PSS in 2023. It can be used to support the development and implementation of machine learning applications for use in any other process. The process builds in part on CRISP-DM (Cross-Industry Standard Process for Data Mining) and is inspired by previous work on developing and using machine learning in official statistics production, for example by UNECE.
The initial subprocess maps the user needs, conditions, risks, and requirements of the task at hand to identify challenges and guide subsequent work. It is followed by subprocesses for the development and validation of machine learning models, including data processing, model building, and model selection. The chosen model and its maintenance plan are used as inputs for the final deployment and production subprocesses. Each subprocess is described by its realisation, inputs, and outputs; the outputs are typically used as inputs for subsequent subprocesses.
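The chaining of subprocesses through their inputs and outputs can be illustrated with a minimal Python sketch. It is not taken from the PSS: the class `Subprocess`, the function `missing_inputs`, and all subprocess and artefact names (for example `needs_and_requirements` and `chosen_model`) are hypothetical labels chosen for illustration, loosely following the description above. The sketch only checks that every input of a subprocess is either available from the start or produced by an earlier subprocess.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Subprocess:
    """One step of the process; all field values below are illustrative."""
    name: str
    realisation: str  # short description of how the step is carried out
    inputs: frozenset[str]
    outputs: frozenset[str]


def missing_inputs(chain: list[Subprocess], initial: set[str]) -> dict[str, frozenset[str]]:
    """Walk the chain in order and report inputs that no earlier step provides."""
    available = set(initial)
    gaps: dict[str, frozenset[str]] = {}
    for step in chain:
        missing = step.inputs - available
        if missing:
            gaps[step.name] = missing
        # Outputs of this step become available to all later steps.
        available |= step.outputs
    return gaps


# Hypothetical chain; names are assumptions, not the PSS terminology.
CHAIN = [
    Subprocess(
        "needs_and_requirements",
        "map user needs, conditions, risks, and requirements",
        frozenset({"task_description"}),
        frozenset({"requirements", "risks"}),
    ),
    Subprocess(
        "development_and_validation",
        "data processing, model building, model selection",
        frozenset({"requirements", "risks", "training_data"}),
        frozenset({"chosen_model", "maintenance_plan"}),
    ),
    Subprocess(
        "deployment",
        "put the chosen model into the production environment",
        frozenset({"chosen_model", "maintenance_plan"}),
        frozenset({"deployed_model"}),
    ),
    Subprocess(
        "production",
        "run and maintain the model in regular production",
        frozenset({"deployed_model", "maintenance_plan"}),
        frozenset({"predictions"}),
    ),
]

if __name__ == "__main__":
    # An empty result means every input is covered by an earlier output.
    print(missing_inputs(CHAIN, {"task_description", "training_data"}))
```

Skipping the initial subprocess in this sketch makes `missing_inputs` report that the development step lacks its requirements and risks, which mirrors the role of the first subprocess in guiding subsequent work.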
The development of the process was motivated by the process-based workflow at Statistics Sweden and by previous experiences of machine learning development. It has been used successfully to support several machine learning projects. The process is under continuous development and maintenance, for example with respect to documentation, standardization, and integration with technical platforms.