65th ISI World Statistics Congress 2025

Exploring ODQA for Improved Understanding of Complex Numerical Tables

Conference

65th ISI World Statistics Congress 2025

Format: CPS Abstract - WSC 2025

Keywords: nlp, numerical-tables, open-domain-question-answering

Session: CPS 78 - AI and Machine Learning in Statistics

Monday 6 October 5:10 p.m. - 6:10 p.m. (Europe/Amsterdam)

Abstract

The volume of data is growing rapidly, and data are presented in various forms to make them easier for users to comprehend. Tables are among the most commonly used formats because they organize data into a standardized structure, which makes information easier to retrieve and compare. Although tables are compact and easy to read and process, visual forms such as graphs or maps can make data easier to understand, especially when the tables are complex. However, tables offer richer and more comprehensive information than graphs or maps. Solutions are therefore needed to help users find information more easily and quickly, especially in complex tables. This challenge is evident at BPS-Statistics Indonesia, the official provider of statistical data in Indonesia. BPS presents data from surveys and censuses as numerical tables, with numbers arranged in rows and columns. Moreover, the number and variety of released tables continue to grow, making it increasingly difficult for data users to find tables relevant to their needs, so mechanisms that make relevant tables easier to discover are necessary. Advances in Natural Language Processing (NLP) offer a potential solution through Open Domain Question Answering (ODQA), which finds answers to natural language questions in large collections of documents (PDFs, Word files, tables). ODQA has two main components: the Retriever, an Information Retrieval (IR) system that identifies and retrieves a set of documents likely to contain relevant answers, and the Reader, which analyzes the retrieved documents to derive the answer to the user’s question. The objective of this research is to explore the application of ODQA by comparing several publicly available ODQA models on the task of answering user questions from complex numerical tables.
Furthermore, research on ODQA models applied specifically to numerical tables is limited. This study therefore contributes by assessing how well ODQA models can be applied to numerical tables. Each model is evaluated automatically to measure the accuracy of its answers to the questions. In addition, human evaluations are conducted to compare results before and after using ODQA. The findings provide recommendations on which ODQA models are effective for enhancing users' understanding of data presented in complex numerical tables.
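
The Retriever and Reader roles and the automated accuracy check described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy tables and their figures are invented, the lexical overlap scoring stands in for whatever retrievers the compared models use, the rule-based cell lookup stands in for a neural reader, and exact match is only one possible automated accuracy measure. It is not the method evaluated in this study.

```python
import math
import re

# Hypothetical toy corpus: titles, labels, and figures are invented for illustration.
TABLES = [
    {
        "title": "Population by province",
        "columns": ["2020", "2021"],
        "rows": {"Bali": [100, 110], "Papua": [200, 210]},
    },
    {
        "title": "Rice production by province",
        "columns": ["2020", "2021"],
        "rows": {"Bali": [30, 35], "Papua": [40, 45]},
    },
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def table_text(table):
    """Flatten a table's title, column headers, and row labels into one string."""
    return " ".join([table["title"], *table["columns"], *table["rows"]])


def retrieve(question, tables, top_k=1):
    """Retriever: rank tables by IDF-weighted token overlap with the question."""
    docs = [set(tokenize(table_text(t))) for t in tables]
    n = len(docs)
    q_tokens = set(tokenize(question))

    def idf(token):
        # Smoothed inverse document frequency: rarer tokens weigh more.
        df = sum(token in d for d in docs)
        return math.log((n + 1) / (df + 1)) + 1.0

    scores = [sum(idf(tok) for tok in q_tokens & d) for d in docs]
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [tables[i] for i in ranked[:top_k]]


def read(question, table):
    """Reader: return the cell whose row label and column header both occur in the question."""
    q_tokens = set(tokenize(question))
    row = next((r for r in table["rows"] if set(tokenize(r)) <= q_tokens), None)
    col = next(
        (i for i, c in enumerate(table["columns"]) if set(tokenize(c)) <= q_tokens),
        None,
    )
    if row is None or col is None:
        return None  # the toy reader cannot locate a cell
    return str(table["rows"][row][col])


def answer(question, tables=TABLES):
    """Run the two-stage pipeline: retrieve the best table, then read the answer from it."""
    best_table = retrieve(question, tables, top_k=1)[0]
    return read(question, best_table)


def exact_match(predictions, gold):
    """Automated evaluation: fraction of predictions identical to the gold answers."""
    hits = sum(p is not None and p.strip() == g.strip() for p, g in zip(predictions, gold))
    return hits / len(gold)
```

For example, `answer("What was the rice production in Bali in 2021?")` first selects the rice production table, because "rice" and "production" occur only there, and then reads the cell at row "Bali" and column "2021". Calling `exact_match` on a list of such predictions and a list of reference answers gives the share of questions answered exactly.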