65th ISI World Statistics Congress 2025

Challenges and Insights in Developing and Evaluating a Retrieval Augmented Generation Agent for Official Statistics

Format: CPS Abstract - WSC 2025

Keywords: artificial intelligence, large language models

Session: CPS 64 - Data Dissemination and User Engagement in Official Statistics

Tuesday 7 October 4 p.m. - 5 p.m. (Europe/Amsterdam)

Abstract

This paper presents an experimental implementation of a Retrieval Augmented Generation (RAG) system at the French National Statistics Institute (INSEE). The primary objective was to develop an agent capable of retrieving accurate information from INSEE's website while minimizing hallucinations. We focus on the methodological challenges encountered during development and evaluation.
The RAG approach integrates a Large Language Model (LLM) with a retrieval mechanism based on official statistical publications. This experimental setup aimed to enhance the model's ability to provide reliable, domain-specific information.
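As an illustration of the retrieve-then-generate pattern described above, the following is a minimal sketch, not the system built at INSEE. It assumes a toy bag-of-words cosine retriever and a hand-written prompt template; the abstract does not specify the actual retriever, model, or prompt. The names `retrieve` and `build_prompt`, the document ids, and the sample texts are all hypothetical placeholders, and the LLM call itself is deliberately left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Return up to k documents with non-zero similarity to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d["text"]))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for score, d in scored[:k] if score > 0]


def build_prompt(query, passages):
    """Assemble a prompt that restricts the model to the retrieved sources."""
    context = "\n".join(f"[{p['id']}] {p['text']}" for p in passages)
    return (
        "Answer using only the sources below and cite their ids. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )


# Made-up placeholder passages, not actual INSEE publications.
DOCS = [
    {"id": "doc-1", "text": "The unemployment rate is published quarterly."},
    {"id": "doc-2", "text": "The consumer price index measures inflation."},
    {"id": "doc-3", "text": "Population census results are released every year."},
]

query = "How often is the unemployment rate published?"
passages = retrieve(query, DOCS, k=2)
prompt = build_prompt(query, passages)  # this string would be sent to an LLM
print(prompt)
```

Asking the model to answer only from cited sources, and to say so when they are insufficient, is one common way to limit hallucinations. A production retriever would more likely rely on dense embeddings or a search index than on raw term counts.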
Our evaluation methodology combined several strategies to assess the performance and reliability of the RAG system. We used real user queries paired with the answers INSEE experts had provided to them. We supplemented these with manual annotations, which served both to fine-tune the system and to assess its output. We also used an LLM as a judge to provide an additional perspective on the system's performance. Together, these approaches allowed us to explore the complexities of evaluating RAG systems in a specialized context. By combining human expertise, real-world data, and AI-driven assessment, we aimed to gain a comprehensive understanding of the system's capabilities and limitations.
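To make two of these evaluation ideas concrete, here is a minimal sketch under stated assumptions: the abstract does not name the metrics actually used, so the hit rate at k (for retrieval checked against manually annotated sources) and Cohen's kappa (for agreement between human and LLM-judge verdicts) are illustrative choices. The function names and all data below are hypothetical toy values, not results from the study.

```python
from collections import Counter


def hit_rate_at_k(retrieved, relevant, k):
    """Share of queries with at least one annotated source in the top k results."""
    hits = sum(
        1 for ranked, rel in zip(retrieved, relevant) if set(ranked[:k]) & set(rel)
    )
    return hits / len(retrieved)


def cohen_kappa(human, judge):
    """Chance-corrected agreement between two lists of categorical labels."""
    n = len(human)
    observed = sum(h == j for h, j in zip(human, judge)) / n
    count_h, count_j = Counter(human), Counter(judge)
    expected = sum(
        (count_h[label] / n) * (count_j[label] / n)
        for label in set(human) | set(judge)
    )
    if expected == 1.0:  # both raters used a single identical label
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy data only: ranked document ids per query and the annotated relevant ids.
retrieved = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
relevant = [{"b"}, {"x"}, {"i"}]

# Toy verdicts on four generated answers from a human annotator and an LLM judge.
human = ["correct", "correct", "incorrect", "correct"]
judge = ["correct", "incorrect", "incorrect", "correct"]

print("hit rate@2:", hit_rate_at_k(retrieved, relevant, k=2))
print("hit rate@3:", hit_rate_at_k(retrieved, relevant, k=3))
print("kappa:", cohen_kappa(human, judge))
```

Reporting a chance-corrected statistic alongside raw agreement helps show whether the LLM judge tracks human judgment or simply shares the same majority label.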
Key challenges included developing effective retrieval strategies, mitigating LLM hallucinations, and establishing robust evaluation metrics. Our findings provide insights into the practical implementation of RAG systems for official statistics and highlight areas for future work.