65th ISI World Statistics Congress 2025


Study on Large Language Models and Knowledge Graphs integration for item classification

Conference: 65th ISI World Statistics Congress 2025

Format: CPS Abstract - WSC 2025

Keywords: AI, classification, KGs, LLMs, NLP, statistical models

Session: CPS 78 - AI and Machine Learning in Statistics

Monday 6 October 5:10 p.m. - 6:10 p.m. (Europe/Amsterdam)

Abstract

Interest in artificial intelligence (AI) has recently grown significantly, and Large Language Models (LLMs) in particular are attracting great attention. However, because LLMs are black-box models, they are limited in how reliably they capture and retrieve factual knowledge. Knowledge Graphs (KGs), which store and structure factual knowledge explicitly, can compensate for these limitations by supplying external expert knowledge that supports reasoning and interpretation. This paper proposes integrating KGs with LLMs to enhance the LLMs. Fine-tuning LLMs on datasets derived from KGs allows them to acquire both the factual and the structural knowledge contained in the KGs, thereby improving their reasoning capabilities. Recognizing the urgent need for methodologies to compile statistics in such a rapidly changing environment, the paper also proposes a method that integrates LLMs and KGs to support AI-based refinement and utilization of online price information.
Statistics Korea uses web scraping to collect online price data, gathering 322 product items daily. The collected data are large in volume and consist mostly of heterogeneous, unstructured entries, so they must be refined before use. Daily price data are provided for 132 product items, and 'ramen', which has 4,614 unstructured data entries, was selected for analysis.
The results obtained by integrating LLMs and KGs were significantly better than those obtained with LLMs alone. Rather than assigning items from unstructured entries through LLM analysis alone, item classifications should be determined in a more structured manner to achieve faster and more accurate classification. Item classifications selected by LLMs with applied weights can be unstable, whereas applying the structure provided by KGs yields a more refined and accurate item classification scheme.
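The abstract argues that item classes should be chosen in a structured way rather than taken directly from weighted LLM output. The following is a minimal sketch of one way to do that, assuming the LLM returns a weight per candidate label for each scraped entry and the KG encodes the item taxonomy as subclass triples. The taxonomy, the relation name `has_subclass`, the scores, and the `threshold` value are all hypothetical illustrations, not the classification system or weights used in the study.

```python
"""Sketch: constrain weighted LLM label scores with a knowledge graph.

All category names, triples, scores, and the threshold are hypothetical
illustrations; they are not the taxonomy or weights used in the study.
"""
from collections import defaultdict

# Hypothetical KG edges as (subject, relation, object) triples.
TRIPLES = [
    ("ramen", "has_subclass", "bag_ramen"),
    ("ramen", "has_subclass", "cup_ramen"),
    ("cup_ramen", "has_subclass", "large_cup_ramen"),
    ("snack", "has_subclass", "chips"),
]


def build_children(triples):
    """Index subclass edges so each node maps to its direct children."""
    children = defaultdict(list)
    for subj, rel, obj in triples:
        if rel == "has_subclass":
            children[subj].append(obj)
    return children


def valid_labels(children, root):
    """Collect every class reachable from `root`; these are the allowed labels."""
    allowed, stack = set(), [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            if child not in allowed:
                allowed.add(child)
                stack.append(child)
    return allowed


def classify(llm_scores, children, root, threshold=0.5):
    """Pick the highest-weighted label that the KG allows under `root`.

    `llm_scores` stands in for label weights an LLM might assign to one
    scraped product entry. Labels outside the KG subtree are discarded, and
    the entry is left unclassified (None) if no allowed label reaches the
    threshold.
    """
    allowed = valid_labels(children, root)
    candidates = {lbl: s for lbl, s in llm_scores.items() if lbl in allowed}
    if not candidates:
        return None
    best = max(candidates, key=candidates.get)
    return best if candidates[best] >= threshold else None


if __name__ == "__main__":
    kg = build_children(TRIPLES)
    # Hypothetical LLM weights for one scraped title: "chips" scores highest
    # but lies outside the "ramen" subtree, so the KG filters it out.
    scores = {"chips": 0.9, "cup_ramen": 0.7, "bag_ramen": 0.2}
    print(classify(scores, kg, "ramen"))
```

Here the KG acts as a hard constraint on the LLM's choices, so an unconstrained LLM preference for an out-of-scope label cannot change the result; a softer design could instead down-weight labels that are distant in the graph.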
In this study, we presented a methodology for an item classification system that integrates LLMs and KGs for efficient natural language processing, and presented results showing that two-way reasoning based on data and knowledge improved both the LLMs and the KGs.
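The two-way reasoning described in the abstract can be pictured as a loop in which knowledge flows from the KG to the LLM and classified data flow back into the KG. The sketch below is one possible reading, not the study's implementation: it assumes KG triples are serialized into simple prompt/completion records for fine-tuning, and that confident classifications are written back as new `instance_of` triples. The record format, relation names, entry texts, and the `min_conf` cutoff are hypothetical.

```python
"""Sketch of a two-way loop between a KG and an LLM classifier.

Direction 1 (KG -> LLM): serialize KG triples into prompt/completion records
that could serve as fine-tuning data. Direction 2 (LLM -> KG): add
high-confidence classifications back to the KG as new triples.
The record format, relation names, and threshold are hypothetical.
"""
import json


def triples_to_records(triples):
    """Turn (subject, relation, object) triples into text training records."""
    records = []
    for subj, rel, obj in triples:
        records.append({
            "prompt": f"{subj} {rel.replace('_', ' ')}",
            "completion": obj,
        })
    return records


def update_kg(triples, classified, min_conf=0.8):
    """Add (entry, 'instance_of', label) triples for confident predictions.

    `classified` holds (entry_text, label, confidence) tuples, e.g. the output
    of an LLM+KG classifier; low-confidence predictions are left for review.
    Returns a new triple list and the triples that were added.
    """
    known = set(triples)
    added = []
    for entry, label, conf in classified:
        triple = (entry, "instance_of", label)
        if conf >= min_conf and triple not in known:
            known.add(triple)
            added.append(triple)
    return triples + added, added


if __name__ == "__main__":
    kg = [("ramen", "has_subclass", "cup_ramen")]
    # KG -> LLM: one JSON line per training record.
    for rec in triples_to_records(kg):
        print(json.dumps(rec))
    # LLM -> KG: hypothetical predictions for two scraped entries.
    preds = [("Spicy cup noodle 65g", "cup_ramen", 0.93),
             ("Noodle gift set", "cup_ramen", 0.41)]
    kg, added = update_kg(kg, preds)
    print(added)
```

Keeping a confidence cutoff on the write-back direction is a design choice meant to stop uncertain LLM output from polluting the KG; entries below the cutoff would be candidates for manual review.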