Non-probability and Probability Sample Integrated Estimators for the Population Parameters
Conference
Category: International Association of Survey Statisticians (IASS)
Proposal Description
Probability surveys face significant challenges today: costs are relatively high and participation rates are declining, which can make estimates less accurate. Meanwhile, non-probability samples such as administrative records or volunteer web surveys are growing in popularity because of their convenience, low cost, low respondent burden, and quick turnaround, since they allow estimates to be produced shortly after the information need has been identified. Unfortunately, the mechanism by which such samples are formed is usually unknown, estimators based on them are usually biased, and there is no general method for constructing unbiased estimators. Nevertheless, research continues on making use of non-probability data sets by combining them with probability survey data. This session consists of four presentations covering several integrated estimators designed to improve the accuracy of the final population estimates.
1. The question of the consistency of non-probability samples across sample vendors has arisen. The authors analyze data collected from several online non-probability sample vendors and one online probability-based sample. They find that parameter estimates across multiple non-probability samples are often highly inconsistent. They therefore propose averaging the estimates obtained from different sample vendors, with the probability sample data used in the averaging procedure. One objective of the study is to assess how consistent the resulting estimates are with the reference estimates based on the probability sample. Speaker: Alexander Murray-Waters.
2. Values of a binary study variable are available in both the probability and the non-probability sample. Assuming independent pseudo-inclusion indicators for the non-probability sample, a composite estimator of the population total is studied. The composite estimator is a linear combination of an inverse probability weighted estimator and a design-based estimator. In evaluating the variance of the former, the randomness of the underlying non-probability sample is taken into account through the distribution of the estimated propensity scores. Speaker: Vilma Nekrasaite-Liege.
3. The authors estimate the proportions of units belonging to certain categories across finite population domains. A probability sample and a non-probability sample from the same population are available. A composite estimator integrating the estimators from both samples is used. Two models for the selection bias of the estimator obtained from the non-probability sample are assumed and used to find the weights that minimize the expected mean squared error of the composite estimator. Speaker: Sander Scholtus.
4. Administrative data are generally affected by errors; among others, under- and over-coverage may introduce bias in the statistics produced. The presentation proposes a method for making inferences on population sizes at different aggregation levels by leveraging administrative data in the presence of coverage errors. The proposed Bayesian statistical model integrates register data based on administrative sources with sample surveys carried out to gather information on over- and under-coverage. Speaker: Veronica Ballerini.
All the proposed methods will be illustrated with simulation studies demonstrating their advantages. The methods may be applied in public opinion surveys, official statistics, and other studies.
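The idea of a composite estimator, as mentioned in presentations 2 and 3, can be illustrated with a minimal sketch. It is not the speakers' method: it assumes a generic textbook setting in which the probability-sample estimator is unbiased, the non-probability estimator has a known bias, and the two estimators are independent. All function and parameter names are hypothetical.

```python
def optimal_weight(var_prob, var_nonprob, bias_nonprob):
    """Weight on the probability-sample estimator that minimizes the MSE
    of w * est_prob + (1 - w) * est_nonprob.

    Assumptions (illustrative only): est_prob is unbiased, est_nonprob has
    bias `bias_nonprob`, and the two estimators are independent.
    """
    mse_nonprob = var_nonprob + bias_nonprob ** 2
    return mse_nonprob / (var_prob + mse_nonprob)


def composite_estimate(est_prob, est_nonprob, weight):
    """Linear combination of the two estimators."""
    return weight * est_prob + (1.0 - weight) * est_nonprob


def composite_mse(var_prob, var_nonprob, bias_nonprob, weight):
    """MSE of the composite estimator under the same assumptions."""
    mse_nonprob = var_nonprob + bias_nonprob ** 2
    return weight ** 2 * var_prob + (1.0 - weight) ** 2 * mse_nonprob


if __name__ == "__main__":
    # Made-up inputs, used only to show how the functions are called.
    w = optimal_weight(var_prob=4.0, var_nonprob=1.0, bias_nonprob=1.0)
    print("weight on probability estimator:", w)
    print("composite estimate:", composite_estimate(10.0, 13.0, w))
    print("composite MSE:", composite_mse(4.0, 1.0, 1.0, w))
```

In practice the bias and variances are unknown and must be estimated or modelled, which is where assumptions such as the selection-bias models of presentation 3 come in.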
Submissions
- Combining Probability and Nonprobability Samples on an Aggregated Level
- Combining administrative and survey data to correct coverage errors in register-based statistics: a Bayesian approach
- Impact of the non-probability sample on the accuracy of an estimator of a total when integrating non-probability and probability samples
- Nonprobability samples and uncertainty: Avoiding maximal estimation error via averaging