65th ISI World Statistics Congress 2025

On Developing Splink: A Free Software Package for Probabilistic Record Linkage at Scale

Conference

65th ISI World Statistics Congress 2025

Format: IPS Abstract - WSC 2025

Keywords: data_linkage, data-linkage, probabilistic linkage, record linkage

Session: IPS 798 - Assessment and Improvement of Data Quality Through Use of Auxiliary Information and Record Linkage

Tuesday 7 October 10:50 a.m. - 12:30 p.m. (Europe/Amsterdam)

Abstract

The Fellegi-Sunter model is widely used for probabilistic record linkage to link and deduplicate datasets which lack a unique identifier. Splink is an open-source software package developed by the UK Ministry of Justice to address the challenges of probabilistic record linkage on large datasets. Using a flexible and customisable statistical framework, Splink efficiently handles millions of records. It offers a variety of transparency and diagnostic capabilities and interactive visualisations that enhance model understanding and validation.

The presentation gives an overview of Splink and discusses its adoption and use, as well as the challenges and benefits of developing statistical software in the open.
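
For readers unfamiliar with the Fellegi-Sunter model mentioned above, the following is a minimal sketch of how it scores a pair of records. It is not Splink's API: the function names, the field names, and the m and u values are all hypothetical and chosen only for illustration. Here m is the probability that a field agrees given the records truly match, and u is the probability that it agrees given they do not match.

```python
import math


def match_weight(agrees: bool, m: float, u: float) -> float:
    """Log2 Bayes factor contributed by one field comparison."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_probability(agreements: dict, params: dict, prior: float) -> float:
    """Combine per-field weights with the prior to get a match probability.

    Assumes fields are conditionally independent given match status,
    which is the usual simplifying assumption of the model.
    """
    log_odds = math.log2(prior / (1 - prior))
    for field, agrees in agreements.items():
        m, u = params[field]
        log_odds += match_weight(agrees, m, u)
    odds = 2 ** log_odds
    return odds / (1 + odds)


# Hypothetical (m, u) values, for illustration only.
PARAMS = {
    "first_name": (0.9, 0.01),
    "surname": (0.9, 0.005),
    "date_of_birth": (0.95, 0.001),
}

if __name__ == "__main__":
    # Hypothetical prior: 1 in 10,000 candidate pairs is a true match.
    prior = 1e-4
    all_agree = {"first_name": True, "surname": True, "date_of_birth": True}
    one_differs = {"first_name": False, "surname": True, "date_of_birth": True}
    print(match_probability(all_agree, PARAMS, prior))
    print(match_probability(one_differs, PARAMS, prior))
```

In this sketch, agreement on a field with a small u (a value that rarely coincides by chance) adds a large positive weight, while disagreement on a field with a high m subtracts weight. In practice the m and u parameters are estimated from the data rather than fixed by hand.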