65th ISI World Statistics Congress 2025

Identifying Online Platforms: Model development, validation and type-I error reduction

Format: IPS Abstract - WSC 2025

Keywords: big data, classification, web scraping

Session: IPS 799 - Real-World Machine Learning Applications in Official Statistics

Thursday 9 October 10:50 a.m. - 12:30 p.m. (Europe/Amsterdam)

Abstract

A machine learning-based classification model was developed to identify online platform organizations from the text on their websites. The model was applied to the Dutch Business Register to identify all potential online platform organizations. The external validity of the model-based findings was verified through a survey conducted among a sample of the organizations identified as potential platforms. The survey responses confirmed the validity of the model but also revealed a substantial number of type-I errors (false positives). Based on these findings, the classification approach was adjusted to reduce the number of false positives as far as possible while retaining high accuracy and recall. This was achieved by using calibrated probabilities and ensembles.
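
The abstract does not say which calibration method or ensemble scheme was used. As an illustration only, the following minimal sketch assumes histogram-binning calibration and a soft-voting ensemble (averaging calibrated probabilities) with a strict decision threshold. All function names, the threshold of 0.8, and the toy scores are hypothetical and are not taken from the study.

```python
from statistics import mean


def fit_histogram_calibrator(scores, labels, n_bins=5):
    """Histogram binning: map each raw-score bin to its observed positive rate."""
    totals = [0] * n_bins
    positives = [0] * n_bins
    for s, y in zip(scores, labels):
        b = min(int(s * n_bins), n_bins - 1)
        totals[b] += 1
        positives[b] += y
    # Empty bins fall back to the bin midpoint (an assumption of this sketch).
    rates = [
        positives[b] / totals[b] if totals[b] else (b + 0.5) / n_bins
        for b in range(n_bins)
    ]

    def calibrate(score):
        return rates[min(int(score * n_bins), n_bins - 1)]

    return calibrate


def classify(calibrated_probs, threshold=0.8):
    """Soft voting: flag a platform only if the mean calibrated probability is high."""
    return mean(calibrated_probs) >= threshold


# Toy validation data for one hypothetical, overconfident model:
# raw scores above 0.9 are correct only 3 times out of 4.
val_scores = [0.95, 0.92, 0.97, 0.91, 0.55, 0.45, 0.10, 0.15]
val_labels = [1, 1, 0, 1, 1, 0, 0, 0]
calibrate_a = fit_histogram_calibrator(val_scores, val_labels)

# A raw score of 0.93 is mapped to the observed positive rate of its bin.
p_a = calibrate_a(0.93)

# Two hypothetical cases: a second model agrees (0.9) or disagrees (0.5).
flag_when_models_agree = classify([p_a, 0.9])
flag_when_models_disagree = classify([p_a, 0.5])
```

In this sketch, calibration lowers an overconfident score so that it no longer clears a strict threshold on its own, and averaging over ensemble members means an organization is flagged only when the models broadly agree. Both effects trade a little recall for fewer false positives; how large that trade-off is in practice depends on the data.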