Correcting Sample Selection Bias: Practical Challenges
Conference
65th ISI World Statistics Congress 2025
Format: CPS Poster - WSC 2025
Keywords: data integration, nonprobability sample, sample selection bias, weighting
Abstract
Research on correcting sample selection bias in nonprobability samples is flourishing, and many approaches have been proposed in the recent literature. One often-used method assigns a set of pseudo weights to the units in the nonprobability sample, which may then be treated as survey weights. The pseudo weights can be constructed, for example, by estimating the inclusion probabilities of the nonprobability sample with the help of a reference probability sample that shares some auxiliary variables with it. However, constructing pseudo weights faces several practical challenges. For example, approaches for model selection in standard statistical analysis (i.e., those assuming an i.i.d. sample or a known sampling mechanism) may not be useful. Also, the nonprobability sample is often much larger than the reference probability sample, which may result in large variation in the pseudo weights. Furthermore, when (aggregated) results for the target variable based on a probability sample are available in addition to a reference probability sample with common auxiliary variables, estimates based on the pseudo weights, e.g., for small areas, may be improved further. In this presentation, we will share how we try to address these challenges and present simulation results as a proof of concept.
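To make the idea of pseudo weights concrete, the following minimal Python sketch illustrates one simple way of estimating inclusion probabilities with a reference probability sample. It is not necessarily the approach taken in this work. It stacks the nonprobability sample (unit weight) on a design-weighted reference sample, fits a weighted logistic regression for sample membership, and uses the inverse of the fitted odds as pseudo weights. Everything in it is an assumption made for illustration: the simulated population, the single auxiliary variable x, the log-linear inclusion model, the simple random reference sample, and the helper name fit_weighted_logit.

```python
import math
import random

random.seed(2025)

# --- Simulated population (all values are illustrative assumptions) ---
N = 20000
x_pop = [random.gauss(0.0, 1.0) for _ in range(N)]
y_pop = [2.0 + x + random.gauss(0.0, 1.0) for x in x_pop]
pop_mean = sum(y_pop) / N

# Nonprobability sample: inclusion depends on x, so the naive mean is biased.
np_idx = [i for i in range(N) if random.random() < math.exp(-3.0 + 0.7 * x_pop[i])]

# Reference probability sample: simple random sample, design weight N / n_ref.
n_ref = 500
ref_idx = random.sample(range(N), n_ref)
d_ref = N / n_ref

# Stack both samples: z = 1 for nonprobability units, z = 0 for reference units.
xs = [x_pop[i] for i in np_idx] + [x_pop[i] for i in ref_idx]
zs = [1.0] * len(np_idx) + [0.0] * n_ref
ws = [1.0] * len(np_idx) + [d_ref] * n_ref


def fit_weighted_logit(xs, zs, ws, n_iter=25):
    """Newton-Raphson for a weighted logistic regression (intercept + one slope)."""
    w1 = sum(w for w, z in zip(ws, zs) if z == 1.0)
    w0 = sum(w for w, z in zip(ws, zs) if z == 0.0)
    b0, b1 = math.log(w1 / w0), 0.0  # start at the intercept-only solution
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, z, w in zip(xs, zs, ws):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            r = w * (z - p)          # weighted score contribution
            v = w * p * (1.0 - p)    # weighted information contribution
            g0 += r
            g1 += r * x
            h00 += v
            h01 += v * x
            h11 += v * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


b0, b1 = fit_weighted_logit(xs, zs, ws)

# The fitted odds p / (1 - p) approximate the inclusion probability,
# so the pseudo weight is the inverse odds, exp(-(b0 + b1 * x)).
pseudo_weights = [math.exp(-(b0 + b1 * x_pop[i])) for i in np_idx]
y_np = [y_pop[i] for i in np_idx]

naive_mean = sum(y_np) / len(y_np)
pseudo_mean = sum(w * y for w, y in zip(pseudo_weights, y_np)) / sum(pseudo_weights)

# Kish's effective sample size summarises how much the weights vary.
n_eff = sum(pseudo_weights) ** 2 / sum(w * w for w in pseudo_weights)

print(f"population mean      : {pop_mean:.3f}")
print(f"naive mean           : {naive_mean:.3f}")
print(f"pseudo-weighted mean : {pseudo_mean:.3f}")
print(f"nonprob. sample size : {len(np_idx)}, effective size: {n_eff:.0f}")
```

In this toy setting the inclusion model is correctly specified by construction, which real applications cannot assume; this is where the model selection challenge mentioned above comes in. The effective sample size printed at the end gives a rough indication of how much the variation in the pseudo weights costs in precision.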