Estimating Small Area Indicators with Complex Survey Data and Limited Auxiliary Data
Conference
65th ISI World Statistics Congress 2025
Format: CPS Abstract - WSC 2025
Keywords: complex sampling design, official statistics, small area estimation
Session: CPS 13 - Small Area Estimation for Policy and Socio-Economic Modelling
Tuesday 7 October 4 p.m. - 5 p.m. (Europe/Amsterdam)
Abstract
Many surveys face the issue of small sample sizes within certain subpopulations. Small area estimation is a powerful tool to address this problem. Small area models combine survey data (from complex sampling designs) with auxiliary population data. Typically, spatially disaggregated indicators are estimated using model-based methods that assume access to auxiliary information from population micro-data. However, in many countries, such as Germany, population micro-data are not publicly available. Therefore, there is a need for small area estimators that can work with survey data from complex sampling designs in the absence of population micro-data.
As small area models rely on linear mixed models, the Gaussian assumption of the error terms must hold. In practice, this assumption is often not met for many indicators, so transforming the response can help satisfy the model assumptions.
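As an illustration of what a data-driven transformation can look like, the following minimal sketch selects a Box-Cox parameter by maximising the profile Gaussian log-likelihood over a grid. It is not the estimator proposed in this abstract: the simulated data, the grid, and the function names `box_cox`, `profile_loglik` and `select_lambda` are assumptions made for the example, and survey weights and area-level random effects are ignored.

```python
import math
import random
import statistics


def box_cox(y, lam):
    """Box-Cox transform of a positive value y for parameter lam."""
    if abs(lam) < 1e-8:
        return math.log(y)
    return (y ** lam - 1.0) / lam


def profile_loglik(ys, lam):
    """Profile Gaussian log-likelihood of lam, up to an additive constant."""
    n = len(ys)
    transformed = [box_cox(y, lam) for y in ys]
    sigma2 = statistics.pvariance(transformed)  # ML estimate of the variance
    jacobian = (lam - 1.0) * sum(math.log(y) for y in ys)
    return -0.5 * n * math.log(sigma2) + jacobian


def select_lambda(ys, grid):
    """Return the grid value with the highest profile log-likelihood."""
    return max(grid, key=lambda lam: profile_loglik(ys, lam))


# Hypothetical right-skewed response (log-normal), standing in for an
# income-type variable; no real survey data are used here.
rng = random.Random(42)
incomes = [math.exp(rng.gauss(0.0, 1.0)) for _ in range(2000)]

grid = [round(-1.0 + 0.05 * i, 2) for i in range(61)]  # -1.00, -0.95, ..., 2.00
lam_hat = select_lambda(incomes, grid)
print(f"selected lambda: {lam_hat:.2f}")
```

For log-normal data the selected parameter should lie close to 0, which corresponds to the log transformation.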
The literature discusses small area estimators for survey data with informative weights in the absence of population micro-data. However, there is a research gap in addressing this data situation while simultaneously using the (data-driven) transformations needed for many applications.
In the absence of population micro-data, appropriate bias corrections for small area prediction are needed, taking into account the weighting of the individuals. The approach we propose uses aggregate statistics (means and covariances) and kernel density estimation to address the issue of not having access to population micro-data. To measure the uncertainty of the estimators, a parametric bootstrap is proposed.
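To see why a bias correction is needed once the response has been transformed, consider the simplest case of a log transformation. The sketch below compares the naive back-transformation exp(mean) with the standard log-normal correction exp(mean + variance / 2). It is only an illustration under assumed parameters (`mu`, `sigma` and the sample size are hypothetical); it does not use survey weights, aggregate statistics or kernel density estimation, and it is not the correction proposed here.

```python
import math
import random
import statistics

rng = random.Random(7)
mu, sigma = 0.0, 1.0  # hypothetical log-scale mean and standard deviation

# Simulated values on the transformed (log) scale.
log_y = [rng.gauss(mu, sigma) for _ in range(5000)]

m = statistics.fmean(log_y)      # estimated mean on the log scale
s2 = statistics.variance(log_y)  # estimated variance on the log scale

naive = math.exp(m)                          # ignores the transformation bias
corrected = math.exp(m + 0.5 * s2)           # log-normal bias correction
true_mean = math.exp(mu + 0.5 * sigma ** 2)  # mean on the original scale

print(f"naive: {naive:.3f}, corrected: {corrected:.3f}, true: {true_mean:.3f}")
```

The naive back-transformation targets the median rather than the mean of the original-scale variable, so it underestimates the mean whenever the log-scale variance is positive.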
We evaluate the proposed method against competitors (those using population micro-data or ignoring the informative weights) through extensive model-based and design-based simulation studies. These simulations demonstrate the validity and necessity of the proposed estimator. Finally, the proposed methodology is applied to the aggregate census information from Germany for estimating a spatially disaggregated indicator.