A Total Error Framework for Digital Data
Conference
64th ISI World Statistics Congress
Format: CPS Abstract
Keywords: big data, quality frameworks, quality-assessment
Session: CPS 50 - Statistical methodology V
Tuesday 18 July 4 p.m. - 5:25 p.m. (Canada/Eastern)
Abstract
A changing survey landscape (Lyberg and Heeringa 2021), with increasing nonresponse rates and survey costs, has caused organizations to explore new data sources for statistics production (Japec and Lyberg 2021). There is great potential in using new types of data, hereafter called digital data, for statistics production, especially when blending them with existing survey or administrative data (National Academies of Sciences, Engineering, and Medicine 2022, Japec et al. 2015). Our quality framework builds on existing frameworks for survey data (Groves and Lyberg 2010), administrative data (Zhang 2012, Reid et al. 2017), found data (Biemer and Amaya 2021) and digital trace data (Sen et al. 2021).
In our framework we describe the steps taken when statistics are produced from digital data and the error sources associated with each step. Blending digital data with other data sources is a vital step in our quality framework. The framework offers standard terminology to describe and document errors in digital data. We connect the terminology used in our framework to that used in total survey error (TSE) frameworks. We also provide indicators for evaluating the quality of statistics produced from digital data. We will present examples from applying the framework to digital data at Statistics Sweden.
Biemer, P. and Amaya (2021). Total Error Framework for Found Data. Pp 133-162 in Big Data Meets Survey Science: A Collection of Innovative Methods (C.A. Hill, P.P. Biemer, T.D. Buskirk, L. Japec, A. Kirchner, S. Kolenikov, L.E. Lyberg, eds.). New York: John Wiley & Sons.
Lyberg, L. and Heeringa, S. (2021). A Changing Survey Landscape. In Handbook of Computational Social Science, Volume 1 (Engel, U., Quan-Haase, A., Xun Liu, S. and Lyberg, L., eds.). London: Taylor & Francis Group.
National Academies of Sciences, Engineering, and Medicine (2022). Toward a 21st Century National Data Infrastructure: Mobilizing Information for the Common Good. Washington, DC: The National Academies Press. https://doi.org/10.17226/26688
Groves, R.M. & Lyberg, L. (2010) Total Survey Error: Past, present, and future. Public Opinion Quarterly 74:5, pp 849-879. DOI: https://doi.org/10.1093/poq/nfq065
Japec, L. and Lyberg, L. (2021). Big Data Initiative in Official Statistics. In Big Data Meets Survey Science: A Collection of Innovative Methods (C.A. Hill, P.P. Biemer, T.D. Buskirk, L. Japec, A. Kirchner, S. Kolenikov, L.E. Lyberg, eds.). New York: John Wiley & Sons.
Japec, L., Kreuter, F., Berg, M., Biemer, P., Decker, P., Lampe, C., Lane, J., O’Neil, C., and Usher, A. (2015). “Big Data in Survey Research: AAPOR Task Force Report.” Public Opinion Quarterly 79(4):839–80.
Reid, G., Zabala, F. & Holmberg, A. (2017) Extending TSE to administrative data: A quality framework and case studies from Stats NZ. Journal of Official Statistics 33:2, pp 477-511. DOI: https://doi.org/10.1515/jos-2017-0023
Zhang, L.-C. (2012) Topics of statistical theory for register-based statistics and data integration. Statistica Neerlandica 66:1, pp 41-63. DOI: https://doi.org/10.1111/j.1467-9574.2011.00508.x
Sen, I., Flöck, F., Weller, K., Weiss, B. & Wagner, C. (2021) A total error framework for digital traces of human behavior on online platforms. Public Opinion Quarterly 85:S1, pp 399-422. DOI: https://doi.org/10.1093/poq/nfab018