65th ISI World Statistics Congress 2025

Use of AI for raising awareness of users of statistics on data quality

Format: CPS Abstract - WSC 2025

Keywords: artificial-intelligence, data-quality, user-dialogue

Session: CPS 55 - Public Engagement and Statistical Literacy

Monday 6 October 4 p.m. - 5 p.m. (Europe/Amsterdam)

Session: CPS 55 - Public Engagement and Statistical Literacy

Tuesday 7 October 4 p.m. - 5 p.m. (Europe/Amsterdam)

Session: CPS 55 - Public Engagement and Statistical Literacy

Tuesday 7 October 5:10 p.m. - 6:10 p.m. (Europe/Amsterdam)

Abstract

This paper addresses the challenges of assessing the quality of statistical data from the perspective of data users. Traditional quality reports produced by data providers are often difficult for users to comprehend: they contain excessive detail and are unclear about the usability and limitations of the data, or of combinations of statistical data. We propose a conceptual model for informing users about the quality of statistics through web services that draw on artificial intelligence, namely large language models, and on socio-psychological theories. Our goal is to present quality information in a format that is easily understandable and engaging for users, ultimately increasing their interest in data quality.

Drawing inspiration from theories on survey participation, we train models to extract data from user interactions with web services and to gather feedback from users. We stress a critical point: users must be made aware of the quality declaration and become interested in reading it, even if only as a very brief summary at the level of detail they need for their use of statistics. By tailoring our approach to different user personas, we aim to improve the communication of data quality information and enhance user awareness. Our study also explores how statistical agencies can use artificial intelligence to communicate data quality information effectively and in a responsible, transparent and trustworthy manner.

We pilot the use of these models to gather reliable information that meets user requirements and to develop a proof of concept for interactive statistical data services. Through focus groups, we assess the potential of large language models to improve communication on data quality, and we tailor the information to better suit the diverse needs of users. Our ultimate goal is to empower users to make informed decisions about the suitability of individual statistics for their purposes and to use data from statistical databases effectively.