Anonymization for integrated and georeferenced Data (AnigeD)
Conference
65th ISI World Statistics Congress 2025
Format: CPS Abstract - WSC 2025
Keywords: anonymization, confidentiality, data_integration, geo statistics, synthetic
Session: CPS 71 - Spatial Data and Geomasking
Tuesday 7 October 4 p.m. - 5 p.m. (Europe/Amsterdam)
Abstract
Data-based information plays a central role in politics, business, science and public life. With digitization and the exponential growth of stored data, as well as new analytical methods such as machine learning, the possibilities for evidence-based decision making have expanded and evolved significantly.
A key challenge in integrating disparate data sets from different data custodians is the protection of personal privacy and trade secrets within organizations. This currently hinders both the wider use of data as a product and the use of integrated data in policy advice and scientific research. Methods for anonymization and statistical confidentiality must strike a compromise: they need to protect the information of the data subjects, yet the anonymized data should still offer sufficient analytical and information potential. Any anonymization or confidentiality measure applied to individual data reduces the information it contains.
Past experience has shown that common anonymization strategies for individual data in economic statistics produced de facto or absolutely anonymized data sets whose value for scientific analysis was severely limited by reduced or even distorted information content. Anonymization and pseudonymization methods that limit the risk of disclosure to an acceptable level while preserving sufficient analytical potential are therefore essential for wider use and value creation.
The AnigeD competence cluster is part of the "Research Network Anonymization for Secure Data Use" of the German Federal Ministry of Education and Research (BMBF) within the framework of the Federal Government's IT security research program "Digital. Secure. Sovereign". Its thematic focus, which is supported by various research strands, is the refinement and new development of strategies for protecting personal and company-related data when complex integrated data sets are used. This covers not only the integration of different data via direct identifiers or probabilities, but also the integration and linking of data via regional information in the form of georeferencing.
The talk will introduce the challenges in the anonymization process and its possible impact on the quality of statistical products and publications. It will summarize the research results of the cluster with regard to the main research strands of the AnigeD project: 1) Evaluation of anonymized data according to formal criteria, 2) Anonymization through synthetic data, 3) Anonymization of georeferenced data, and 4) Open software tools for anonymization.