64th ISI World Statistics Congress

Modernizing Access to Statistics Canada’s Microdata Files

Conference

64th ISI World Statistics Congress

Format: CPS Abstract

Keywords: governance, metadata, microdata

Session: CPS 40 - Aspects of official statistics III

Tuesday 18 July 8:30 a.m. - 9:40 a.m. (Canada/Eastern)

Abstract

Over the last several years, Statistics Canada (StatCan) has been developing and implementing its modernization strategy, with a focus on user-centric delivery, sharing and collaboration, and enhanced tools and platforms. In line with this corporate initiative, StatCan’s Data Access Division (DAD) has been leading efforts to update the platforms and governance that facilitate access to microdata for external users.

Launched in 2021 and 2022, the Virtual DataLab (vDL) and Rich Data Services (RDS) platforms provide users with 24/7 access to confidential and public use microdata files, respectively. Both cloud-based platforms were developed to align with the Government of Canada’s cloud-first approach, which calls for adopting cloud technology to improve existing information technology (IT) services. The shift to cloud-based IT infrastructure has significantly improved access to microdata for external users by eliminating geographical constraints and providing modern, flexible platforms.

The vDL is a secure access solution hosted on the Microsoft Azure cloud. It provides access to a wide range of de-identified microdata files, along with a variety of software for analyzing the data. With the launch of this platform, a new governance framework was implemented, premised on a shared-risk approach that leverages partner accountability for the oversight and monitoring of each partner’s vDL users. Access to the vDL is predicated on three conditions: an agreement between StatCan and the accessing organization, accreditation of each user, and an evaluation of the sensitivity of the data. Data deemed too sensitive remains in StatCan’s on-premises access locations.

The RDS platform, developed by Metadata Technology North America, hosts StatCan’s public use microdata files (PUMFs) and allows users to create customized datasets, codebooks, and analytical tables, as well as run linear regressions. It also standardizes PUMF metadata in machine-readable formats to support automation and system integration, both of which are part of a StatCan-led metadata modernization project. This project will help establish a clear governance structure so that metadata is standardized and managed across the agency, and so that users have easy access to clear and accurate data and metadata.

This paper will outline the new platforms and governance for access to microdata in Statistics Canada’s Data Access Division.