65th ISI World Statistics Congress 2025

BPLIM approach to accessing highly sensitive micro databases

Author

Rita Sousa

Co-author

  • Paulo Guimarães
  • Gustavo Iglesias

Conference

65th ISI World Statistics Congress 2025

Format: IPS Abstract - WSC 2025

Session: IPS 959 - Sharing and Accessing Granular Administrative Data

Wednesday 8 October 2 p.m. - 3:40 p.m. (Europe/Amsterdam)

Abstract

Research increasingly depends on data, yet accessing public datasets has become more difficult, particularly in jurisdictions with stringent data privacy laws, such as the EU. To address this challenge, some research centers have developed methods to provide secure access to confidential datasets. Banco de Portugal's Microdata Research Laboratory (BPLIM) is a notable example. Established in 2016, BPLIM aims to grant external researchers access to the microdata collected and maintained by the central bank. Initially, BPLIM allowed researchers to work with confidential datasets by providing remote access to perturbed versions of the original data on the bank's servers. Researchers prepared their scripts on the perturbed data, and BPLIM staff then executed those scripts on the original data. However, this method had limitations, including the need for continuous server access and potential delays in analysis. To overcome these limitations, BPLIM has developed tools that allow researchers to prepare their scripts on anonymized pseudo datasets, which are representative of the original confidential data. These pseudo datasets mirror the structure and metadata of the original data but do not contain any identifying information. Instead of sharing the pseudo datasets themselves, BPLIM provides researchers with the code (do-files) that generates them, which ensures transparency and allows researchers to customize the datasets to their needs. Researchers use the pseudo datasets to develop their scripts, which are then submitted to BPLIM. BPLIM staff run the scripts on the original confidential data and, after standard output controls are applied, share the results with the researchers. This workflow protects the identity of respondents while allowing researchers to develop meaningful analyses on data that are representative of the original dataset.
Overall, BPLIM's workflow shows how public institutions can provide access to confidential data while maintaining privacy, enabling research in a secure and efficient manner. The approach protects data privacy while facilitating rigorous research, balancing the needs of data providers and analysts.
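
The idea of generating a pseudo dataset from metadata alone can be illustrated with a minimal sketch. It is written in Python for illustration only and is not BPLIM's code, which is distributed as do-files. The schema, variable names, value ranges, and the function `make_pseudo_dataset` are all hypothetical assumptions. The sketch draws random values that match each variable's declared type and range, so the output has the structure of a dataset without being derived from any real record.

```python
import random

# Hypothetical metadata: variable name -> (type, parameters).
# These names and ranges are illustrative assumptions, not BPLIM's schema.
SCHEMA = {
    "firm_id": ("id", None),
    "year": ("int", (2010, 2020)),
    "sector": ("category", ["A", "B", "C"]),
    "total_assets": ("float", (0.0, 1_000_000.0)),
}


def make_pseudo_dataset(schema, n_rows, seed=0):
    """Return n_rows random rows matching the schema's names, types and ranges."""
    rng = random.Random(seed)  # fixed seed so the generated data is reproducible
    rows = []
    for i in range(n_rows):
        row = {}
        for name, (kind, spec) in schema.items():
            if kind == "id":
                # Sequential placeholder, unrelated to any real identifier.
                row[name] = i + 1
            elif kind == "int":
                row[name] = rng.randint(spec[0], spec[1])
            elif kind == "float":
                row[name] = rng.uniform(spec[0], spec[1])
            elif kind == "category":
                row[name] = rng.choice(spec)
            else:
                raise ValueError(f"unknown variable type: {kind}")
        rows.append(row)
    return rows


pseudo = make_pseudo_dataset(SCHEMA, n_rows=5)
```

Because only the generating code and the metadata are needed, a researcher could change `n_rows` or the seed locally and still develop scripts against the same variable names and types as the confidential data.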