65th ISI World Statistics Congress 2025

Critical Points on Using Differential Privacy for Official Statistics: Case Study on Economic Census Data of Indonesia

Conference

65th ISI World Statistics Congress 2025

Format: CPS Abstract - WSC 2025

Keywords: data-simulation, differential privacy, economic-census, machine learning, statistical-query

Abstract

Demand for open data sharing between government agencies, along with the need for a growing range of statistical indicators, is rising steadily as awareness of evidence-based policy making increases. Although providing these indicators as a single source of truth serves societal needs, privacy risk inevitably grows as more computed statistics are released. In the latest economic census, the National Statistics Office (NSO) of Indonesia (BPS-Statistics Indonesia) implemented a set of disclosure avoidance mechanisms for both microdata and statistical indicator releases. However, emerging studies indicate that carefully constructed queries on computed tables can allow someone to isolate the information of a specific observation, thereby disclosing personal information. Given this risk, we consider it crucial to implement a privacy mechanism suited to the nature of the dataset while maintaining the quality of the economic census data, since it is an actual enumeration of business establishments in Indonesia. This research studies the feasibility of applying a Differential Privacy (DP) mechanism to Indonesia’s 2016 economic census data, in view of the growing importance of data privacy for the upcoming economic census in 2026. To this end, we ran a series of DP mechanism simulations in both the statistical query and machine learning domains. For statistical queries, we considered both the numerical and categorical data in the economic census dataset and quantified the performance of the mechanism under different data sizes. For machine learning, DP modeling was conducted with two algorithms: Gaussian Naive Bayes and Logistic Regression.
We observed how a sequence of privacy budgets affects the accuracy of the queries and the models. The findings contribute baseline knowledge for the upcoming 2026 economic census on generating privacy-protected microdata releases and privacy-protected aggregate tables for statistical indicator releases.
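
The relationship between privacy budget and query accuracy can be illustrated with a minimal sketch. It assumes the Laplace mechanism applied to a counting query (sensitivity 1), which is one common DP mechanism and not necessarily the one used in this study. The data are synthetic, and every name here (`dp_count`, `mean_abs_error`, `establishments`, the 20-worker threshold) is hypothetical.

```python
import random
import statistics


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(records, predicate, epsilon, rng):
    # A counting query changes by at most 1 when one record is added or
    # removed, so its sensitivity is 1 and the noise scale is 1 / epsilon.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


def mean_abs_error(records, predicate, epsilon, trials, rng):
    # Average absolute deviation of the noisy count from the true count.
    true_count = sum(1 for r in records if predicate(r))
    errors = [
        abs(dp_count(records, predicate, epsilon, rng) - true_count)
        for _ in range(trials)
    ]
    return statistics.mean(errors)


rng = random.Random(0)

# Assumption: synthetic worker counts standing in for census records.
establishments = [rng.randint(1, 200) for _ in range(1000)]


def is_small(workers):
    # Hypothetical query: establishments with fewer than 20 workers.
    return workers < 20


errors_by_epsilon = {
    eps: mean_abs_error(establishments, is_small, eps, 2000, rng)
    for eps in (0.1, 1.0, 10.0)
}

for eps, err in errors_by_epsilon.items():
    print(f"epsilon={eps}: mean absolute error = {err:.3f}")
```

Under this mechanism the expected absolute error equals 1/epsilon, so a smaller privacy budget gives stronger protection at the cost of noisier counts. Because the noise scale does not depend on the number of records, the relative error shrinks as the queried count grows.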