Estimation of Non-sampling Error in Brazilian Household Sample Survey Using Reinterviews
Conference
65th ISI World Statistics Congress 2025
Format: CPS Abstract - WSC 2025
Keywords: data-quality, household surveys, measurement error, official statistics
Session: CPS 28 - Nonresponse Bias and Missing Data in Surveys
Wednesday 8 October 4 p.m. - 5 p.m. (Europe/Amsterdam)
Abstract
The credibility of an official statistics producer depends on several factors, one of the most important being the production of quality statistical data. The United Nations Statistics Quality Assurance Framework introduces a common understanding of the quality dimensions and of quality assurance for these producers. It highlights the relevance of the output quality dimension (in particular, the accuracy and reliability components), which specifies that quality performance indicators should be implemented to ensure that data sources and results are assessed and validated. Studies in this field have focused on the total survey error, which comprises any error arising from the survey process that contributes to the deviation of an estimate from its true value, and which can be decomposed into sampling and non-sampling error. Sampling error arises from the fact that only a subset of the population is surveyed, and it can be controlled during the development of the sampling plan. Non-sampling error arises from adverse conditions in the process of obtaining information; it is difficult to control and measure, and it tends to grow as the sample size increases. Studies on non-sampling errors are rare, although they are essential for the efficient allocation of resources in surveys and for understanding the uncertainty in estimates. Non-sampling error can have different sources, among them measurement error, which in turn has four primary sources: the questionnaire, the data-collection method, the interviewer, and the respondent. The interviewer introduces error in survey responses by not reading the items as intended, by probing inappropriately when handling an inadequate response, or by adding information that may confuse or mislead the respondent.
In this context, the main objective of this study is to investigate the non-sampling error in the Continuous National Household Sample Survey (Continuous PNAD) conducted by IBGE, the official statistics agency of Brazil, through a quantitative study focused on measuring and potentially minimizing the interviewer's influence on the results. Continuous PNAD provides information on the participation of the population in the labor market, associated with education and demographic characteristics, by visiting selected households. Reinterviews were applied as a quality check. A sample of the households selected for Continuous PNAD was reinterviewed, and the discrepancies between the reinterview responses and the original interview responses were reconciled in order to obtain response bias estimates; an index of consistency was also estimated. The highest consistency index (96%) was obtained for questions related to race/skin color and date of birth, whereas the lowest indexes were observed for questions related to educational level and participation in the labor force (69% and 81%, respectively). These results reveal specific situations that require attention, point to improvements that can be introduced in the survey process, and suggest new paradata to be collected. Based on the results, a quality standard process was defined and implemented, and the interviewer workforce has been fully trained to mitigate interviewer-induced error when administering the survey. The goal is to ensure continuous quality management, reinforcing IBGE's commitment to the principles of accuracy and reliability.
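The reinterview measures mentioned above can be illustrated with a minimal sketch. The abstract does not state which formula underlies the reported index of consistency, so the code below is an assumption rather than the study's method: it shows the simple agreement rate, the aggregate index of inconsistency commonly used in reinterview studies (observed disagreement divided by the disagreement expected if the two interviews were independent), and a net difference rate as a simple response bias estimate for one category. All function names and the toy responses are hypothetical.

```python
from collections import Counter


def agreement_rate(original, reinterview):
    """Share of cases where the reinterview answer equals the original answer."""
    if len(original) != len(reinterview) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(o == r for o, r in zip(original, reinterview))
    return matches / len(original)


def index_of_inconsistency(original, reinterview):
    """Observed disagreement divided by the disagreement expected
    if original and reinterview answers were independent."""
    n = len(original)
    observed_disagreement = 1 - agreement_rate(original, reinterview)
    count_orig = Counter(original)
    count_re = Counter(reinterview)
    categories = set(count_orig) | set(count_re)
    expected_agreement = sum(count_orig[c] * count_re[c] for c in categories) / n**2
    expected_disagreement = 1 - expected_agreement
    if expected_disagreement == 0:
        return 0.0  # only one category observed, so no disagreement is possible
    return observed_disagreement / expected_disagreement


def net_difference_rate(original, reconciled, category):
    """Share of `category` in the original interview minus its share in the
    reconciled reinterview (assumed here to be the reference value)."""
    share_orig = sum(o == category for o in original) / len(original)
    share_rec = sum(r == category for r in reconciled) / len(reconciled)
    return share_orig - share_rec


# Toy data (hypothetical, not from Continuous PNAD): four households, one item.
original = ["a", "a", "b", "b"]
reinterview = ["a", "b", "b", "b"]

rate = agreement_rate(original, reinterview)
inconsistency = index_of_inconsistency(original, reinterview)
bias_a = net_difference_rate(original, reinterview, "a")
```

Under these assumptions, one minus the index of inconsistency is one possible reading of a "consistency" measure; a plain agreement rate is another, and the two can differ substantially for items with skewed category distributions.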