Regional analysis of taxable income in Berlin and spatial anonymization methods
Conference
65th ISI World Statistics Congress 2025
Format: CPS Abstract - WSC 2025
Keywords: anonymization, kernel estimation, spatial statistics
Session: CPS 71 - Spatial Data and Geomasking
Tuesday 7 October 4 p.m. - 5 p.m. (Europe/Amsterdam)
Abstract
For the first time, geocoded taxable income data based on taxpayers' places of residence are available for research. Previously, only analyses at the municipal level were possible, so in the case of Berlin the taxable income of all income taxpayers was described by a single figure. However, Berlin is characterized by a heterogeneous society with widely varying taxable incomes. We therefore aim for a smooth regional representation based on kernel density estimation in order to reveal the local structure. The regional shares of high and low earners are also of interest.
When analyzing highly sensitive data, the question arises of how the data can be presented and made accessible to the public securely. One anonymization method frequently used in official statistics is aggregation. However, aggregated data has some disadvantages, such as discontinuities at the boundaries of the administrative aggregation areas. A smooth representation can nevertheless be generated from aggregated data using an algorithmic approach. Other anonymization methods, in which the original coordinates are shifted by random displacements drawn from a Gaussian distribution, are known as geomasking.
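As a minimal sketch of the Gaussian geomasking idea described above, each coordinate can be displaced by independent Gaussian noise. The function name `gaussian_geomask`, the parameter `sigma`, and the example coordinates are assumptions made for illustration; they are not taken from the study and do not use real data.

```python
import random


def gaussian_geomask(points, sigma, seed=None):
    """Shift each (x, y) coordinate by independent Gaussian noise.

    sigma is the standard deviation of the displacement, in the same
    unit as the coordinates (e.g. metres in a projected system).
    """
    rng = random.Random(seed)
    return [
        (x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
        for x, y in points
    ]


# Hypothetical projected coordinates in metres, not real taxpayer data.
original = [(0.0, 0.0), (100.0, 50.0), (250.0, 300.0)]
masked = gaussian_geomask(original, sigma=50.0, seed=1)
```

A larger `sigma` gives stronger protection but distorts the local structure more, which is the trade-off the abstract refers to.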
This raises the question of up to which level of aggregation or displacement the local structures can still be recovered. Error measures serve as the reference for this comparison.
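The comparison above can be illustrated with a small sketch: a Gaussian kernel density estimate is computed on a grid for the original points and for geomasked copies, and the two surfaces are compared. The root mean squared error used here is only one possible error measure and is an assumption, as are the synthetic points, the grid, the bandwidth, and the displacement levels; the abstract does not specify which measures or settings the study uses.

```python
import math
import random


def kde_2d(points, grid, bandwidth):
    """Gaussian kernel density estimate evaluated at each grid location."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))
    densities = []
    for gx, gy in grid:
        total = sum(
            math.exp(-((gx - x) ** 2 + (gy - y) ** 2) / (2.0 * bandwidth ** 2))
            for x, y in points
        )
        densities.append(norm * total)
    return densities


def rmse(a, b):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))


rng = random.Random(0)
# Synthetic point pattern (illustrative only, not real data).
points = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
grid = [(i * 0.5, j * 0.5) for i in range(-6, 7) for j in range(-6, 7)]
reference = kde_2d(points, grid, bandwidth=0.5)

# Error of the density surface for several displacement levels.
errors = {}
for sigma in (0.0, 0.5, 2.0):
    masked = [
        (x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
        for x, y in points
    ]
    errors[sigma] = rmse(reference, kde_2d(masked, grid, bandwidth=0.5))
```

Inspecting `errors` across displacement levels gives a rough indication of how quickly the masked surface departs from the original one.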