On Developing Splink: A Free Software Package for Probabilistic Record Linkage at Scale
Conference
65th ISI World Statistics Congress 2025
Format: IPS Abstract - WSC 2025
Keywords: data linkage, probabilistic linkage, record linkage
Tuesday 7 October 10:50 a.m. - 12:30 p.m. (Europe/Amsterdam)
Abstract
The Fellegi-Sunter model is widely used in probabilistic record linkage to link and deduplicate datasets that lack a unique identifier. Splink is an open-source software package developed by the UK Ministry of Justice to address the challenges of probabilistic record linkage on large datasets. Built on a flexible and customisable statistical framework, Splink efficiently handles millions of records. It offers a range of transparency and diagnostic tools, together with interactive visualisations, that support model understanding and validation.
The presentation gives an overview of Splink and discusses its adoption and use, as well as the challenges and benefits of developing statistical software in the open.
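
As a rough illustration of the Fellegi-Sunter scoring referred to in the abstract, the sketch below combines per-field agreement evidence into a match weight and a match probability. It is a minimal plain-Python sketch and does not use Splink's API; the field names, the m and u values, and the prior are hypothetical numbers chosen only for illustration.

```python
import math

# Hypothetical parameters (illustrative values, not taken from any real model).
# m = P(field agrees | the two records are a true match)
# u = P(field agrees | the two records are not a match)
PARAMS = {
    "first_name": {"m": 0.90, "u": 0.010},
    "surname": {"m": 0.95, "u": 0.005},
    "dob": {"m": 0.98, "u": 0.001},
}

# Assumed prior probability that a randomly chosen record pair is a match.
PRIOR_MATCH_PROBABILITY = 1e-4


def match_weight(agreements, params=PARAMS, prior=PRIOR_MATCH_PROBABILITY):
    """Sum log2 Bayes factors over fields, starting from the prior log-odds."""
    weight = math.log2(prior / (1 - prior))
    for field, agrees in agreements.items():
        m = params[field]["m"]
        u = params[field]["u"]
        # Agreement contributes m/u; disagreement contributes (1-m)/(1-u).
        bayes_factor = m / u if agrees else (1 - m) / (1 - u)
        weight += math.log2(bayes_factor)
    return weight


def match_probability(weight):
    """Convert a log2 odds match weight into a probability."""
    odds = 2 ** weight
    return odds / (1 + odds)


all_agree = {"first_name": True, "surname": True, "dob": True}
all_disagree = {"first_name": False, "surname": False, "dob": False}

p_agree = match_probability(match_weight(all_agree))
p_disagree = match_probability(match_weight(all_disagree))
```

Under these assumed values, a pair agreeing on all three fields scores a match probability close to 1, and a pair disagreeing on all three scores close to 0. In practice the m and u parameters are estimated from the data rather than set by hand.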