65th ISI World Statistics Congress 2025 | The Hague

Assessment and Improvement of Data Quality Through Use of Auxiliary Information and Record Linkage

Organiser

Wendy Martinez

Participants

  • Dr Wendy Martinez
    (Chair)

  • Andrew Martin
    (Presenter/Speaker)
    Estimating missed links between administrative data lists using dual systems estimation

  • Dr Anders Holmberg
    (Presenter/Speaker)
    Modernising probabilistic linking at the Australian Bureau of Statistics and its potential to improve multisource statistics production

  • Dr Krista Park
    (Presenter/Speaker)
    Developing quality metrics and error in record linkage methodologies

  • Category: International Association of Survey Statisticians (IASS)

    Proposal Description

    Assessment and Improvement of Data Quality through Use of Auxiliary Information and Record Linkage

    As National Statistical Offices (NSOs) work to modernize their production of statistical information, they have come to rely more deeply on the integration of multiple data sources, such as data collected from a survey or census, administrative data, web scraping, satellite imagery, and third-party data. Many of the concepts, methods, technologies, and empirical properties of multiple-source work depend heavily on the use of record linkage, and on the availability and quality of auxiliary data.

    A record linkage system is required to successfully use data from multiple sources. Better understanding and quantification of the uncertainty in the linkage process (error and data quality) are needed to properly disseminate and utilize statistical products resulting from the use of integrated data sets.

    This session explores in depth five ongoing record linkage efforts in different countries: Australia, Canada, New Zealand, the United Kingdom, and the United States. All speakers are confirmed and have agreed to attend the World Congress, funding permitting. The speakers will address the following topics.

    • Estimating the probabilities in the Fellegi-Sunter approach using new density estimation methods to reflect changes in name patterns and other demographic shifts.
    • Assessing improvements in linkages and linkage counts.
    • Exploring the effect of different blocking techniques and/or string comparator approaches on linkage rates.
    • Developing and implementing quality metrics and error calculations for the linking process, concentrating on managing error in the Fellegi-Sunter method.
    • Employing dual systems estimation to correct for missed links.
    • Providing open-source tools for fast implementation of record linkage (e.g., Bayesian or Fellegi-Sunter methods).
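
    As a rough illustration of two of the ideas listed above, the sketch below computes textbook Fellegi-Sunter agreement weights and a basic Lincoln-Petersen dual system estimate. It is not the method of any of the speakers, whose approaches are not described here. The function names, the m- and u-probabilities, and the list sizes are all hypothetical values chosen for illustration.

```python
import math


def fs_field_weight(m: float, u: float, agrees: bool) -> float:
    """Textbook Fellegi-Sunter log2 weight for a single comparison field.

    m = P(field agrees | the two records are a true match)
    u = P(field agrees | the two records are not a match)
    """
    if agrees:
        return math.log2(m / u)
    return math.log2((1.0 - m) / (1.0 - u))


def fs_total_weight(params: dict, agreement: dict) -> float:
    """Sum field weights, assuming fields are conditionally independent."""
    return sum(
        fs_field_weight(m, u, agreement[field])
        for field, (m, u) in params.items()
    )


def dual_system_estimate(n_list_a: int, n_list_b: int, n_linked: int) -> float:
    """Lincoln-Petersen estimate of population size from two lists.

    Assumes the lists are independent and the linked count is error-free;
    missed links bias this estimate upward, which is why linkage error
    matters for dual systems estimation.
    """
    if n_linked <= 0:
        raise ValueError("n_linked must be positive")
    return n_list_a * n_list_b / n_linked


# Hypothetical (m, u) values for two comparison fields.
params = {"surname": (0.9, 0.1), "birth_year": (0.95, 0.05)}

# A record pair that agrees on surname but disagrees on birth year.
weight = fs_total_weight(params, {"surname": True, "birth_year": False})

# Hypothetical list sizes and number of linked records.
population = dual_system_estimate(1000, 800, 400)
```

    Higher total weights indicate stronger evidence that a record pair is a match; in practice the weights are compared against thresholds to classify pairs as links, non-links, or cases for clerical review.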

    All participants are confirmed and have agreed to abide by the GDPR and the ISI code of conduct.

    The organizer was unable to enter all participants in the online proposal system because the system kept returning an error, so participant information (name, email) and talk titles are given below.

    Chair: John Eltinge, U.S. Census Bureau

    Paper A.1: Wendy Martinez (speaker), Krista Park, U.S. Census Bureau. wendy.l.martinez@census.gov Affiliation: IASC, IAOS, IASS
    Title: Developing Quality Metrics and Error in Record Linkage Methodologies

    Paper A.2: Anders Holmberg (speaker), Daniel Elazar, Aymon Wuolanne, Australian Bureau of Statistics. anders.holmberg@abs.gov.au Affiliation: IAOS
    Title: Modernising probabilistic linking at the Australian Bureau of Statistics and its potential to improve multisource statistics production

    Paper A.3: Robin Linacre, Ministry of Justice, UK. robin.linacre@justice.gov.uk
    Title: On Developing Splink: A Free Software Package for Probabilistic Record Linkage at Scale

    Paper A.4: Vince Galvin, Patrick Graham, Andrew Martin, Statistics New Zealand. vince.galvin@stats.govt.nz
    Title: Estimating missed links between administrative data lists using dual systems estimation

    Paper A.5: Martin Provost, Statistics Canada. martin.provost@statcan.gc.ca
    Title: The use of admin data in the Canadian census and population estimates, and its impact on data quality