65th ISI World Statistics Congress 2025

Combining Probability and Nonprobability Samples on an Aggregated Level

Format: IPS Abstract - WSC 2025

Keywords: administrative data, data integration, nonprobability sample, probability sampling

Abstract

Probability surveys face growing drawbacks: costs are relatively high and participation rates are declining, which can yield less accurate estimates. Meanwhile, nonprobability samples such as administrative records are growing in popularity because they are convenient and inexpensive. Unfortunately, nonprobability samples are often selective and, because the underlying sampling design is unknown, estimators based on such samples are generally biased. How to correct for this selection bias is a subject of ongoing research.

In this presentation, we propose a method that combines estimators from a probability sample and a nonprobability sample at an aggregated level. The combined estimator is a weighted mean of the two estimators. The weight is chosen to minimize the expected mean squared error (MSE) of the combined estimator under an assumed model for the bias of the estimator based on the nonprobability sample. The method does not require any unit-level data from either sample. We performed simulation studies that tested two different methods of modeling the bias in the nonprobability sample. We also applied one of these methods to a real dataset from Statistics Netherlands and found that the MSE was indeed reduced.
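
The idea of an MSE-minimizing weighted mean can be illustrated with a minimal Python sketch. This is not the estimator from the presentation, only a textbook simplification under assumptions that the abstract does not state: the two estimators are independent, the probability-sample estimator is unbiased, and the bias model is summarized by a single number, the expected squared bias of the nonprobability estimator. All function and parameter names are hypothetical.

```python
def optimal_weight(var_prob, var_nonprob, expected_sq_bias):
    """Weight on the probability-sample estimator that minimizes expected MSE.

    Assumptions (not taken from the abstract): independent estimators,
    an unbiased probability-sample estimator, and a bias model that is
    summarized by E[B^2] for the nonprobability-sample estimator.
    """
    total = var_prob + var_nonprob + expected_sq_bias
    if total <= 0:
        raise ValueError("variances and expected squared bias must sum to > 0")
    return (var_nonprob + expected_sq_bias) / total


def expected_mse(w, var_prob, var_nonprob, expected_sq_bias):
    """Expected MSE of w * est_prob + (1 - w) * est_nonprob under the same assumptions."""
    return w ** 2 * var_prob + (1.0 - w) ** 2 * (var_nonprob + expected_sq_bias)


def combine(est_prob, est_nonprob, w):
    """Weighted mean of the two aggregate-level estimates."""
    return w * est_prob + (1.0 - w) * est_nonprob


# Illustrative inputs only (made-up numbers, not results from the study).
w = optimal_weight(var_prob=4.0, var_nonprob=1.0, expected_sq_bias=3.0)
combined = combine(est_prob=10.0, est_nonprob=12.0, w=w)
mse = expected_mse(w, var_prob=4.0, var_nonprob=1.0, expected_sq_bias=3.0)
```

Only aggregate quantities enter the calculation (two point estimates, two variances, and the expected squared bias), which matches the point that no unit-level data are needed. In this simplified setting the combined estimator never has a larger expected MSE than either input estimator, because the weights 0 and 1 are among the candidates being minimized over.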