Official statistics using web data: new use cases
Conference
65th ISI World Statistics Congress 2025
Format: IPS Abstract - WSC 2025
Keywords: data-linkage, experimental, scraping, web
Session: IPS 776 - Web Data for Official Statistics – Methodology, Quality, Production and Community
Wednesday 8 October 10:50 a.m. - 12:30 p.m. (Europe/Amsterdam)
Abstract
Web data has become an invaluable resource for official statistics, offering new perspectives and opportunities. In combination with traditional inputs, such data can be a valuable addition to the data portfolio, improving or speeding up existing statistics or enabling new indicators. There are successful examples in price statistics (scraping web shops), enterprise statistics (scraping business websites) and social statistics (using social media). However, there are also challenges: web data can be very volatile, the biases contained in these data must be accounted for, and quality depends heavily on the web data sources of interest.
In the international ESSnet project Trusted Smart Statistics – Web Intelligence Network (TSS-WIN), a work package was dedicated to exploring new use cases for web data sources in official statistics: 1) characteristics of the real estate market, 2) measuring construction activities, 3) online prices of household appliances and audio-visual, photographic and information processing equipment, 4) the development of experimental indices in tourism statistics, 5) business register quality enhancement from web data sources and 6) faster economic indicators from web data. Each had its successes and its challenges, such as volatile inputs, deduplication, mapping web data onto statistical concepts and operational issues.
In many web data projects it is a challenge to link, map or cluster web data, which were not designed for official statistics, into statistical units or aggregates. Therefore, in addition to bulk scraping approaches, where vast amounts of data are collected from one or many sites, selective or statistical scraping methods, where the web is queried with an identifier, name, category or statistical definition, have gained popularity. One example is the business register enhancement use case, where the web is searched for digital traces associated with a statistical unit in the business register.
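As a rough illustration of the linking step in selective scraping, the sketch below matches a business register unit name against candidate web results by normalized string similarity. It is not the TSS-WIN method: the function names, the list of legal-form suffixes, the 0.8 threshold and the example.com candidates are all hypothetical, and a production system would use richer evidence (addresses, identifiers, page content) than a name comparison.

```python
from difflib import SequenceMatcher

# Hypothetical, incomplete list of legal-form suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"bv", "nv", "gmbh", "ltd", "inc", "sa"}


def normalize(name: str) -> str:
    """Lowercase, drop dots, turn other punctuation into spaces, remove legal suffixes."""
    lowered = name.lower().replace(".", "")
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in lowered)
    tokens = [t for t in cleaned.split() if t not in LEGAL_SUFFIXES]
    return " ".join(tokens)


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_unit(unit_name, candidates, threshold=0.8):
    """Return (url, score) for the best candidate at or above the threshold, else None.

    `candidates` is a list of dicts with "title" and "url" keys, standing in for
    search results obtained by querying the web with the unit's name (assumed input).
    """
    best = None
    for cand in candidates:
        score = similarity(unit_name, cand["title"])
        if score >= threshold and (best is None or score > best[1]):
            best = (cand["url"], score)
    return best


# Made-up candidates for a made-up register unit.
candidates = [
    {"title": "Acme Gadgets Ltd", "url": "https://example.com/gadgets"},
    {"title": "ACME Widgets BV", "url": "https://example.com/widgets"},
]
match = link_unit("Acme Widgets B.V.", candidates)
```

The threshold trades missed links against false links; in practice it would be tuned against a manually verified sample of register units.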
In this presentation, we highlight the work in the TSS-WIN project on new use cases for web data in official statistics and relate them to emerging concepts such as statistical or selective scraping. We also reflect on the way forward for web data in official statistics: keeping pace with the ever-changing web while meeting the high quality standards expected of official statistics.