AI and quality in official statistics
Conference
65th ISI World Statistics Congress 2025
Format: IPS Abstract - WSC 2025
Keywords: statistical quality control, artificial intelligence
Session: IPS 889 - From Theory to Practice - Implementing Generative AI in Statistical Organizations
Thursday 9 October, 2 p.m. - 3:40 p.m. (Europe/Amsterdam)
Abstract
Artificial intelligence (AI) is an emerging software technology that can undertake tasks that would traditionally require human intelligence. In this paper, we explore the use of generative AI and large language models in official statistics. We focus on quality assurance and consider how the quality assurance of AI-assisted work aligns with traditional assurance frameworks and standards. We draw on some exploratory case studies from the UK Office for National Statistics.
Our focus here is on Large Language Models (LLMs), which are generative neural networks trained on large volumes of information. LLMs can perform a wide range of tasks with direct application for official statistics, including writing code, giving advice, answering questions, summarising or transcribing conversations and documents, translating languages, designing survey questions, and performing data analysis. The technology is evolving at an increasing pace, with significant step changes in performance over periods of a few months.
Frameworks for assessing, validating, and assuring the quality of large language models and their outputs in the context of analysis and statistics are still emerging, although many metrics have been suggested. As National Statistical Institutes begin to incorporate AI into their work, how might we assure the quality of AI-assisted outputs while maintaining transparency?
Our paper discusses the approach taken to AI adoption at the UK Office for National Statistics, which has been controlled and phased. We are starting with an evidence-gathering stage, assessing quality risks together with their impact and likelihood. This includes the evaluation of a series of prototypes that address potential applications of AI for official statistics. Case studies include the automatic generation of survey questions, improving data linkage using LLMs, extracting text from PDF files, and supporting developers as they write statistical code. We explore these examples in the paper and draw out key lessons.
A significant challenge is the lack of agreed, standardised frameworks for evaluating the specification, performance, and output of LLM-generated analytical content in the context of official statistics. Robust quality assurance frameworks for analysis are already in place, but they may need to be adapted to address challenges specific to AI deployment. We reflect on how well our generic assurance frameworks have coped with the challenges of AI quality assurance in our exploratory work, and draw out common themes.