Computational Analysis and Comparison of Snowball and Respondent-Driven Sampling

Author

João Gabriel Malaguti

Conference

65th ISI World Statistics Congress

Format: CPS Abstract - WSC 2025

Keywords: monte carlo simulation, non-probability sample

Session: CPS 27 - Nonresponse Bias, Nonprobability Sampling, and Estimation Strategies in Survey Methodology

Tuesday 7 October 4 p.m. - 5 p.m. (Europe/Amsterdam)

Abstract

Across the social sciences, there is growing interest in understanding the dynamics to which certain social groups are subjected, as well as their in-group and out-group relationships. Some of these groups, however, are hard-to-reach populations, whether geographically difficult to access, such as nomads, forcibly displaced people and homeless populations, or socially difficult to access, such as drug users, victims of domestic violence, cultural and linguistic minorities and queer people. For such populations, the usual sampling methods cannot be applied, because a proper sampling frame does not exist and, for ethical reasons, should not be constructed.
Nevertheless, because individuals in these populations tend to have many social bonds with others of the same population, there exists a class of methods, the so-called “link-tracing methods”, that leverage these social networks to produce a sample. Both snowball sampling and Respondent-Driven Sampling (RDS) belong to this class. Snowball sampling is a non-probabilistic sampling method widely used in the social sciences, while RDS is applied more often in the health fields.
Snowball sampling starts from a small initial sample, whose respondents are asked to refer other people (those who belong to the population of interest) to join the sample. The referral request is repeated to these new respondents and, like a snowball rolling down a hill, the sample gradually increases in size. Because the method is non-probabilistic, no formal expressions exist for the standard error of its estimators, which makes analyses more complex.
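The wave-by-wave referral process described above can be sketched as follows. This is a minimal illustration and not the implementation used in the paper: the function name `snowball_sample`, its parameters, and the choice to pick referrals uniformly at random among contacts not yet sampled are all assumptions made for the example.

```python
import random

def snowball_sample(adjacency, seeds, max_referrals, n_waves, rng=None):
    """Trace links outward from the seeds, wave by wave.

    adjacency: dict mapping each person to a list of their contacts.
    seeds: the initial sample (wave 0).
    max_referrals: upper limit on referrals per respondent (assumed rule).
    Returns a list of waves, each a list of newly sampled people.
    """
    rng = rng or random.Random()
    sampled = set(seeds)
    waves = [list(seeds)]
    for _ in range(n_waves):
        new_wave = []
        for person in waves[-1]:
            # Only contacts who are not yet in the sample can be referred.
            candidates = [c for c in adjacency[person] if c not in sampled]
            k = min(max_referrals, len(candidates))
            # Assumption: referrals are chosen uniformly at random.
            for referred in rng.sample(candidates, k):
                sampled.add(referred)
                new_wave.append(referred)
        if not new_wave:  # the snowball has stopped growing
            break
        waves.append(new_wave)
    return waves

# Toy network (hypothetical data): person -> contacts.
network = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
waves = snowball_sample(network, seeds=[0], max_referrals=2, n_waves=3)
```

Returning the sample as a list of waves, rather than a flat list, keeps the information needed later to study how estimates change across waves.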
RDS differs in that the respondents themselves recruit other respondents, instead of the researcher leading the effort. It often also features incentives for both answering the survey and recruiting others. Though some would argue that RDS is a quasi-probabilistic method capable of producing unbiased estimates, such estimates rest on a number of strong assumptions that are rarely met in practice. The RDS estimators could also be applied to snowball samples, should the necessary information be collected.
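The abstract does not say which estimators it has in mind. One widely used choice in the RDS literature is the inverse-degree-weighted estimator (often called RDS-II or the Volz-Heckathorn estimator), for which the necessary information is each respondent's reported number of contacts in the population. The sketch below assumes that estimator; the function name and arguments are illustrative only.

```python
def inverse_degree_proportion(has_trait, degrees):
    """Estimate a population proportion, weighting each respondent by 1/degree.

    has_trait: list of booleans, one per respondent.
    degrees: each respondent's reported network size (must be positive).
    """
    if not degrees or len(has_trait) != len(degrees):
        raise ValueError("need one positive degree per respondent")
    if any(d <= 0 for d in degrees):
        raise ValueError("degrees must be positive")
    # Well-connected people are more likely to be recruited,
    # so each respondent is down-weighted by their degree.
    weights = [1.0 / d for d in degrees]
    weighted_trait = sum(w for w, y in zip(weights, has_trait) if y)
    return weighted_trait / sum(weights)

# Hypothetical sample: the naive proportion is 0.5, the weighted one 0.625.
estimate = inverse_degree_proportion([True, False, True, False], [1, 2, 4, 4])
```

Nothing in the function depends on how the sample was recruited, which is why the same computation can be run on a snowball sample that recorded degrees.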
This paper seeks to bypass these estimation issues by using Monte Carlo simulations, which allow the behaviour of sample dynamics and estimates to be studied in different controlled scenarios, in order to improve our understanding of the sampling methods.
For this, data from Project 90 and from the National Longitudinal Study of Adolescent Health, both of which mapped social networks, will be used. These datasets have previously been used to assess the performance of RDS.
The present investigation seeks to understand the effects of (i) the initial sample size, (ii) the manner of selection of the initial sample, (iii) the number of referrals, (iv) the manner of selection of referrals and (v) burn-in, that is, discarding the initial sample and possibly the first few waves. The effects of these factors on the sample size reached and on the estimates will be studied with regard to bias, convergence and the distribution of values across waves, and the different methods of estimating the standard error will be compared.
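A Monte Carlo study of this kind can be sketched as a loop that repeats the sampling procedure, applies burn-in, and summarises the resulting estimates. The sketch below is an illustration under stated assumptions, not the paper's design: the sampler is passed in as a hypothetical callable `draw_waves`, the estimator is the plain sample mean, and the standard deviation of the estimates across replicates stands in for an empirical standard error.

```python
import random
import statistics

def drop_burn_in(waves, burn_in):
    """Discard the first `burn_in` waves (wave 0 is the initial sample)."""
    return [person for wave in waves[burn_in:] for person in wave]

def monte_carlo_bias(draw_waves, outcome, true_mean, burn_in, n_reps, seed=0):
    """Repeat the sampling procedure and summarise the sample-mean estimates.

    draw_waves: callable taking a random.Random and returning a list of waves.
    outcome: dict mapping each person to a numeric outcome.
    Returns (bias, spread) across the replicates that kept any respondent.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_reps):
        kept = drop_burn_in(draw_waves(rng), burn_in)
        if kept:  # skip replicates where burn-in removed everyone
            estimates.append(statistics.fmean(outcome[p] for p in kept))
    if not estimates:
        raise ValueError("burn-in removed every respondent in every replicate")
    bias = statistics.fmean(estimates) - true_mean
    spread = statistics.stdev(estimates) if len(estimates) > 1 else 0.0
    return bias, spread
```

Because the true population mean is known in a simulation on a fully mapped network, bias can be computed directly, and rerunning the loop with different values of `burn_in` or a different `draw_waves` isolates the effect of one factor at a time.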