Challenges and Insights in Developing and Evaluating a Retrieval Augmented Generation Agent for Official Statistics
Conference
65th ISI World Statistics Congress 2025
Format: CPS Abstract - WSC 2025
Keywords: artificial intelligence, large language models
Session: CPS 64 - Data Dissemination and User Engagement in Official Statistics
Tuesday 7 October 4 p.m. - 5 p.m. (Europe/Amsterdam)
Abstract
This paper presents an experimental implementation of a Retrieval Augmented Generation (RAG) system at the French National Statistics Institute (INSEE). The primary objective was to develop an agent capable of retrieving accurate information from INSEE's website while minimizing hallucinations. We focus on the methodological challenges encountered during development and evaluation.
The RAG approach integrates a Large Language Model (LLM) with a retrieval mechanism based on official statistical publications. This experimental setup aimed to enhance the model's ability to provide reliable, domain-specific information.
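As an illustration of the retrieve-then-generate pattern described above, the following is a minimal sketch and not the system built at INSEE. It assumes a bag-of-words cosine similarity as the retrieval mechanism and stops at prompt construction, leaving the LLM call out. The function names (`retrieve`, `build_prompt`) and the toy passages in `DOCS` are hypothetical placeholders, not content taken from INSEE's website.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Return up to k documents ranked by similarity to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query, passages):
    """Assemble a prompt that restricts the model to the retrieved sources."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the sources below. "
        "If they do not contain the answer, say so.\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical toy passages, for illustration only.
DOCS = [
    "The consumer price index measures the change in prices of goods "
    "and services bought by households.",
    "The unemployment rate is the share of the labour force that is "
    "without work and looking for a job.",
    "A census counts the population living in each municipality.",
]

if __name__ == "__main__":
    question = "How is the unemployment rate defined?"
    print(build_prompt(question, retrieve(question, DOCS, k=1)))
```

Instructing the model to answer only from the retrieved sources is one common way to limit hallucinations; a production system would more likely use dense embeddings or a hybrid retriever than raw token counts.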
Our evaluation methodology combined several strategies to assess the performance and reliability of the RAG system. We used real user queries and the corresponding expert-provided answers from INSEE, supplemented with manual annotations, to fine-tune the system and assess its performance. We also used an LLM as a judge to provide an additional perspective on the system's performance. These approaches allowed us to explore the complexities of evaluating RAG systems in a specialized context. By combining human expertise, real-world data, and AI-driven assessment, we aimed to gain a comprehensive understanding of the system's capabilities and limitations.
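One way to combine reference-based and judge-based evaluation can be sketched as follows. This is an assumption-laden illustration, not the evaluation pipeline used in the experiment: token-level F1 stands in for comparison against expert answers, and `stub_judge` is a hypothetical placeholder for a real LLM-as-a-judge call, which would send the question, the system answer, and the expert answer to a model and parse a score from its reply.

```python
import re
from collections import Counter
from statistics import mean


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_f1(prediction, reference):
    """Token-overlap F1 between a system answer and an expert answer."""
    pred, ref = tokenize(prediction), tokenize(reference)
    if not pred or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def stub_judge(question, answer, expert_answer):
    """Hypothetical stand-in for an LLM judge: 1.0 on exact token match."""
    return 1.0 if tokenize(answer) == tokenize(expert_answer) else 0.0


def evaluate(examples, judge):
    """Score each example with both methods and report the averages."""
    rows = [
        {
            "question": ex["question"],
            "f1": token_f1(ex["answer"], ex["expert_answer"]),
            "judge": judge(ex["question"], ex["answer"], ex["expert_answer"]),
        }
        for ex in examples
    ]
    return {
        "mean_f1": mean(r["f1"] for r in rows),
        "mean_judge": mean(r["judge"] for r in rows),
        "rows": rows,
    }
```

Reporting the two scores side by side, rather than merging them into one number, makes disagreements between the overlap metric and the judge visible, and those cases are natural candidates for manual annotation.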
Key challenges included developing effective retrieval strategies, mitigating LLM hallucinations, and establishing robust evaluation metrics. Our findings provide insights into the practical implementation of RAG systems for official statistics and highlight areas for future work.