Developing a Comprehensive Business Glossary and Core Ontology (CBGCO)
Conference
65th ISI World Statistics Congress 2025
Format: CPS Abstract - WSC 2025
Keywords: standardisation
Session: CPS 67 - Data Quality and Management in Official Statistics
Tuesday 7 October 4 p.m. - 5 p.m. (Europe/Amsterdam)
Abstract
This paper covers the development of a Comprehensive Business Glossary and Core Ontology (CBGCO) at the Statistics Centre Abu Dhabi (SCAD) to serve the Emirate of Abu Dhabi. Initially, the project focused on developing a Business Glossary to unify key concepts used across the various domains involved in producing official statistics. Statistical offices commonly face ambiguities when the same concept is defined or used differently from one domain to another, and these ambiguities are obstacles that standardization is meant to address. The CBGCO aims to mitigate them by establishing a unified glossary that aligns concepts across domains. By defining key concepts and their interdependencies in a consistent manner, users can enhance the comparability of data, streamline data integration processes, and foster cross-domain collaboration.
The spread of Large Language Models (LLMs) and their application in statistical analysis underscores the critical need for standardized concepts and semantic frameworks. LLMs excel at processing vast amounts of unstructured data, yet their effectiveness hinges on precise definitions and contextual understanding. The core ontology will provide structured definitions with semantic relationships that help LLMs interpret and manipulate data accurately, thereby enhancing the reliability and reproducibility of statistical analyses.
In a proof-of-concept initiative, we applied the ontology framework to a specific domain extracted from the CBGCO. This pilot aimed to demonstrate the feasibility and effectiveness of using ontological principles within statistical methodologies. We evaluated how structured definitions and semantic relationships can enhance data interpretation and analysis. The outcomes of this pilot study will inform the broader implementation strategy across all domains. Based on the results and insights gained, we will refine and expand the ontology framework to encompass additional domains within SCAD. This iterative approach is intended to maximize the ontology's benefits for LLM applications, prompt engineering practices, and machine learning capabilities across diverse data sets and analytical contexts.
With the CBGCO, SCAD can enhance communication with stakeholders and facilitate informed decision-making based on robust data interpretations. Moreover, the CBGCO supports SCAD in keeping pace with emerging trends in data analytics and technological advancements. This paper advocates the adoption of the CBGCO as an initiative to strengthen the credibility and utility of statistical outputs, supporting data-based decision-making in the Emirate of Abu Dhabi.
Keywords: Glossary, Ontology, Standardization, Data Governance.