Nonprobability samples and uncertainty: Avoiding maximal estimation error via averaging

Author

Alexander Murray-Watters

Co-author

Stefan Zins
Joe Sakshaug

Conference

65th ISI World Statistics Congress

Format: IPS Abstract - WSC 2025

Keywords: model_averaging, nonprobability, uncertainty

Session: IPS 700 - Non-probability and Probability Sample Integrated Estimators for the Population Parameters

Monday 6 October 2 p.m. - 3:40 p.m. (Europe/Amsterdam)

Abstract

Online non-probability sample data are often treated as if they were obtained from a simple random sample drawn from the general population. As the exact sampling frame for these samples is typically unknown, there is no general method to construct unbiased estimators. This suggests a question: Are estimates based on online non-probability samples consistent across sample vendors and with respect to probability-based estimates? To address this question, we analyze data collected from eight different online non-probability sample vendors and one online probability-based sample. We find that estimates from the different non-probability samples are often highly inconsistent, and suggest averaging estimates across multiple vendor samples to eliminate the risk of worst-case (maximum) estimation error. We assess multiple averaging approaches, including a LASSO regression procedure which discovers a subset of vendors that, when averaged, produces estimates more consistent with the reference probability-based estimates, outperforming any single vendor. Our results imply that averaging across multiple non-probability sample vendors, depending on the research question at hand, may result in substantial gains in estimation precision.
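
The worst-case argument for averaging can be illustrated with a minimal sketch. It is not the authors' procedure: it uses a simple unweighted mean rather than the LASSO-based vendor selection, and the vendor estimates and reference value below are made-up numbers chosen only for illustration, not results from the study. The function names are likewise hypothetical. The sketch shows that the absolute error of the averaged estimate cannot exceed the largest single-vendor absolute error, which follows from the triangle inequality.

```python
from statistics import mean


def abs_errors(estimates, reference):
    """Absolute deviation of each vendor estimate from the reference value."""
    return [abs(e - reference) for e in estimates]


def averaged_error(estimates, reference):
    """Absolute deviation of the unweighted mean of the estimates from the reference."""
    return abs(mean(estimates) - reference)


# Hypothetical values for illustration only (NOT data from the study):
# eight vendor estimates of some proportion, plus one reference estimate
# standing in for the probability-based sample.
vendor_estimates = [0.42, 0.55, 0.38, 0.61, 0.47, 0.50, 0.33, 0.58]
reference = 0.45

worst_single = max(abs_errors(vendor_estimates, reference))
avg_err = averaged_error(vendor_estimates, reference)

# By the triangle inequality, the averaged estimate is never further from
# the reference than the worst single vendor is.
assert avg_err <= worst_single

print(f"worst single-vendor error: {worst_single:.3f}")
print(f"error of averaged estimate: {avg_err:.3f}")
```

The bound holds for any set of estimates, but it only guards against the worst case: a simple mean can still be less accurate than the best individual vendor, which is one motivation for weighted or subset-based averaging such as the LASSO procedure the abstract describes.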