65th ISI World Statistics Congress 2025

Statistical Disclosure Control Strategy for microdata from the Brazilian ICT Enterprises Survey

Author

Denise Silva

Co-author

  • Camila dos Reis Lima
  • Mayra Pizzott R. dos Santos
  • Sâmela B. Arantes

Conference

65th ISI World Statistics Congress 2025

Format: CPS Abstract - WSC 2025

Keywords: confidentiality, survey-data

Session: CPS 65 - Enhancing Data Access and Privacy Protection in Official Statistics

Wednesday 8 October 4 p.m. - 5 p.m. (Europe/Amsterdam)

Abstract

The Fundamental Principles of Official Statistics (United Nations Statistics Division [UNSD], 2014) require the careful handling and protection of confidential information. Principle 6 (Confidentiality) states that individual data collected by National Statistical Offices, whether they refer to natural or legal persons, must be kept strictly confidential and used solely for statistical purposes.

In an effort to publish microdata from an ICT survey of enterprises, the Regional Center for Studies on the Development of the Information Society (Cetic.br) investigated the use of Statistical Disclosure Control (SDC) procedures. SDC tools enable the processing and treatment of microdata to reduce the risk of disclosing confidential information. The risk pertains to the possibility of re-identifying an enterprise in published microdata or revealing sensitive attributes by combining its data with external information. Therefore, the initial development of a confidentiality protection strategy involves creating intruder scenarios and measuring the disclosure risk.
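As a rough illustration of how disclosure risk can be measured under an intruder scenario, the sketch below treats a scenario as a set of key variables that an intruder is assumed to know from external sources. It estimates each record's risk as the reciprocal of the summed sampling weights in its key cell. This is a simplified estimator chosen for illustration, not necessarily the one used in the study; the variable names, the sample records, and the choice of key variables are all hypothetical.

```python
from collections import defaultdict

def individual_risk(records, key_vars, weight_var="weight"):
    """Simplified per-record risk: 1 / (sum of sampling weights in the key cell).

    The summed weights act as an estimate of how many population units
    share the record's combination of key variables.
    """
    totals = defaultdict(float)
    for rec in records:
        totals[tuple(rec[k] for k in key_vars)] += rec[weight_var]
    return [1.0 / totals[tuple(rec[k] for k in key_vars)] for rec in records]

# Hypothetical records; values and weights are invented for illustration.
sample = [
    {"region": "North", "activity": "Manufacturing", "size": "10-19", "weight": 2.0},
    {"region": "North", "activity": "Manufacturing", "size": "20-49", "weight": 3.0},
    {"region": "South", "activity": "Retail", "size": "10-19", "weight": 20.0},
    {"region": "South", "activity": "Retail", "size": "10-19", "weight": 30.0},
]

# Assumed intruder scenario: the intruder knows region, activity, and size.
scenario = ["region", "activity", "size"]
risks = individual_risk(sample, scenario)

# Indices of records above a 10% risk threshold.
high_risk = [i for i, r in enumerate(risks) if r > 0.10]
```

Changing the list in `scenario` corresponds to evaluating a different intruder scenario: the more variables the intruder is assumed to know, the smaller the key cells and the higher the estimated risk.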

The ICT Enterprises survey conducted by Cetic.br gathers information on "the ownership and use of information and communication technologies (ICT) among Brazilian companies with 10 or more employees" (CGI.br 2020). A stratified sample is used in which the stratification variables are geographic regions, economic activity, and company size (defined by the number of employees). For this study, 201 variables were considered, most of them categorical.

The microdata from the ICT Enterprises survey is currently made available to specific users under confidentiality agreements. Given the growing demand for microdata from this survey, we report the evaluation of individual disclosure risk, considering different intruder scenarios, and present a proposal for statistical treatment to enable the public release of the ICT Enterprises microdata.

The original 2019 survey data contained 160 records with a disclosure risk higher than 10%, and the maximum risk was 54%. Non-perturbative protection methods, such as global recoding and local suppression, were tested and implemented to control disclosure risk. In the SDC-treated microdata, the maximum individual disclosure risk fell to 15%, with only 7 companies presenting a risk higher than 10%. In addition, information loss was evaluated based on the survey tabulation plan (by comparing contingency tables) and on the number of records changed.
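The two non-perturbative methods and the two information-loss measures mentioned above can be sketched as follows. This is a minimal illustration under assumptions, not the procedure applied to the survey: the function names, the example records, the recoding map, and the suppression rule (blank a variable when its key combination occurs fewer than `k` times in the sample) are all hypothetical.

```python
from collections import Counter

def global_recode(records, var, mapping):
    """Global recoding: merge categories of `var` in every record."""
    return [{**r, var: mapping.get(r[var], r[var])} for r in records]

def local_suppress(records, key_vars, var, k=2):
    """Local suppression: set `var` to None where the key combination
    occurs fewer than k times in the sample."""
    counts = Counter(tuple(r[x] for x in key_vars) for r in records)
    return [
        {**r, var: None} if counts[tuple(r[x] for x in key_vars)] < k else dict(r)
        for r in records
    ]

def records_changed(original, treated):
    """Information loss as the number of records that differ after treatment."""
    return sum(o != t for o, t in zip(original, treated))

def contingency(records, row_var, col_var):
    """Two-way contingency table as a Counter of (row, column) pairs."""
    return Counter((r[row_var], r[col_var]) for r in records)

# Hypothetical records; values are invented for illustration.
records = [
    {"region": "North", "size": "10-19", "uses_cloud": "yes"},
    {"region": "North", "size": "20-49", "uses_cloud": "no"},
    {"region": "South", "size": "10-19", "uses_cloud": "yes"},
    {"region": "South", "size": "10-19", "uses_cloud": "no"},
    {"region": "South", "size": "250+", "uses_cloud": "yes"},
]
keys = ["region", "size"]

recoded = global_recode(records, "size", {"10-19": "10-49", "20-49": "10-49"})
suppressed = local_suppress(recoded, keys, "size", k=2)

changed_by_recoding = records_changed(records, recoded)
changed_in_total = records_changed(records, suppressed)

# Tables that do not involve the treated variable are unaffected.
same_table = contingency(records, "region", "uses_cloud") == contingency(
    suppressed, "region", "uses_cloud"
)
```

In this toy example, recoding merges the two smallest size classes and suppression blanks the size of the one remaining sample-unique record. Tables that involve `size` change, while the table of `region` by `uses_cloud` is preserved, which is the kind of trade-off that a comparison against the tabulation plan is meant to expose.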

Finally, after considering specific disclosure scenarios and assessing the risk of company identification in the ICT Enterprises survey microdata, the proposed SDC strategy performed well in controlling the risk of identity disclosure for respondent companies.

Keywords: Statistical Disclosure Control, microdata, ICT Enterprises