Streamlining Official Statistics Production via Standardization: Practical Solutions and Trade-Offs for a Hard Problem
Conference Category: International Association for Official Statistics (IAOS)
Proposal Description
Over the past decade, official statistics organizations have focused on developing common ways of understanding the information, processes and architectures used to produce high-quality, meaningful, accessible and timely statistics. Standard resources developed under the UNECE ModernStats initiative, such as the Generic Statistical Information Model (GSIM), the Generic Statistical Business Process Model (GSBPM), the Common Statistical Data Architecture (CSDA), the Core Ontology for Official Statistics (COOS), and the Metadata Glossary, to name but a few, together provide a solid conceptual and methodological foundation to help statistical agencies achieve this goal. In parallel, the development and use of implementation standards in the statistical domain, for instance SDMX, DDI, VTL, SKOS, DCAT and schema.org, have been expanding, and many standardized tools have been built on them. This flourishing of standardization activities, together with the development of supporting frameworks such as the UNECE Data Governance Framework for Statistical Interoperability (DAFI), has led to many normalized descriptions of both statistical processes and data, and to some degree of automation and interoperability. In the context of AI, standards are starting to play a pivotal role in structuring processes, providing context and improving quality in general, for instance by enhancing the accuracy and transparency of results and reducing the assumptions AI systems must make to interpret data.
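To make the notion of implementation standards concrete, the minimal sketch below (an illustrative example only, not part of the session material) uses the Python rdflib library to describe a fictitious dataset with DCAT and a fictitious code-list entry with SKOS; all URIs, names and values are hypothetical.

```python
# Minimal sketch, assuming rdflib is available: dataset metadata expressed
# with DCAT and a classification code expressed with SKOS. All identifiers
# below are illustrative, not real agency resources.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DCAT, DCTERMS, RDF, SKOS

EX = Namespace("https://example.org/")  # hypothetical agency namespace

g = Graph()
g.bind("dcat", DCAT)
g.bind("dcterms", DCTERMS)
g.bind("skos", SKOS)
g.bind("ex", EX)

# Describe a fictitious statistical dataset as a dcat:Dataset.
dataset = EX["dataset/labour-force-2024"]
g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Labour Force Survey 2024", lang="en")))
g.add((dataset, DCTERMS.publisher, EX["org/nsi"]))
g.add((dataset, DCAT.theme, EX["theme/labour-market"]))

# Describe one code of a fictitious classification as a skos:Concept.
scheme = EX["codelist/activity-status"]
employed = EX["codelist/activity-status/EMP"]
g.add((scheme, RDF.type, SKOS.ConceptScheme))
g.add((employed, RDF.type, SKOS.Concept))
g.add((employed, SKOS.inScheme, scheme))
g.add((employed, SKOS.prefLabel, Literal("Employed", lang="en")))
g.add((employed, SKOS.notation, Literal("EMP")))

# rdflib >= 6 returns the serialization as a string.
print(g.serialize(format="turtle"))
```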
However, this multiplication of standards has had three major unintended consequences. First, no single standard is in itself a silver bullet for improving quality and finding operational efficiencies, so there is a growing need to understand exactly how, when and where to use each of them across the statistical production process. Second, there is an “impedance mismatch” among them: they were developed by different communities that see the world in fundamentally different ways, which hinders interoperability. Third, there is a disconnect between the conceptual models and their implementation counterparts that stands in the way of building cost-effective and efficient end-to-end production pipelines.
The session will discuss use cases and ongoing work towards practical solutions to these issues, based on rich, standardized and machine-actionable metadata and mappings. These approaches aim to produce automated data pipelines and to make data findable, accessible, interoperable and reusable, in line with the FAIR principles. Concretely, the session will elaborate on how implementation standards can be used together with the conceptual ModernStats models to improve interoperability at the technical, semantic and organizational levels, and how they can be leveraged to build statistical production pipelines that are metadata-driven, semantically consistent and reusable.
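As an illustration of what "metadata-driven" can mean in practice, the hypothetical Python sketch below drives a toy pipeline from a list of GSBPM sub-process codes; the registry, processing functions and sample data are assumptions made for the example, not part of the session's use cases.

```python
# Minimal sketch of a metadata-driven pipeline: processing steps are looked up
# by GSBPM sub-process code and executed in the order given by the process
# metadata. The steps and data are toy examples, not a reference implementation.
from typing import Callable, Dict, List

Record = Dict[str, object]
Step = Callable[[List[Record]], List[Record]]

REGISTRY: Dict[str, Step] = {}

def register(gsbpm_code: str) -> Callable[[Step], Step]:
    """Associate a processing function with a GSBPM sub-process code."""
    def wrap(fn: Step) -> Step:
        REGISTRY[gsbpm_code] = fn
        return fn
    return wrap

@register("5.3")  # GSBPM 5.3 'Review and validate'
def review_and_validate(records: List[Record]) -> List[Record]:
    # Drop records with a missing mandatory variable (toy validation rule).
    return [r for r in records if r.get("age") is not None]

@register("5.7")  # GSBPM 5.7 'Calculate aggregates'
def calculate_aggregates(records: List[Record]) -> List[Record]:
    # Count units by region (toy aggregation).
    counts: Dict[str, int] = {}
    for r in records:
        counts[str(r["region"])] = counts.get(str(r["region"]), 0) + 1
    return [{"region": k, "count": v} for k, v in counts.items()]

def run_pipeline(process_metadata: List[str], data: List[Record]) -> List[Record]:
    """Execute the steps listed in the process metadata, in order."""
    for code in process_metadata:
        data = REGISTRY[code](data)
    return data

if __name__ == "__main__":
    metadata = ["5.3", "5.7"]  # the pipeline itself is described as metadata
    sample = [{"region": "N", "age": 34},
              {"region": "S", "age": None},
              {"region": "N", "age": 51}]
    print(run_pipeline(metadata, sample))
```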