Automation of the Capture and Coding of the Canadian Survey of Household Spending Diary
Conference
65th ISI World Statistics Congress 2025
Format: IPS Abstract - WSC 2025
Keywords: automation, coding, diary, ocr
Thursday 9 October 2 p.m. - 3:40 p.m. (Europe/Amsterdam)
Abstract
Accurate and comprehensive data collection on household expenditures is critical for understanding consumer behavior, economic analysis and policymaking. Like most statistical agencies, Statistics Canada has adopted the international collection model for its Survey of Household Spending (SHS). The survey collects household expenditure data with a self-completed questionnaire or an interview, and a one- or two-week paper diary.
While completing the SHS diary, respondents can choose to submit their paper shopping receipts instead of writing down each of their daily expenses. Traditional methods of processing diary entries and shopping receipts are labor-intensive, time-consuming, and prone to human error. This presentation introduces a solution that uses optical character recognition and machine learning techniques to automate both the capture of shopping receipts and the coding of expenditure items, streamlining the processing of Canadian household expenditure data.
The approach integrates optical character recognition (OCR) technology with machine learning algorithms to automatically extract and parse information from scanned shopping receipts. Once captured, the diary and receipt items are classified into the SHS expenditure codes by a trained machine learning algorithm. This algorithm is retrained at each survey cycle, which for the SHS means every two years.
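As a rough illustration only, the two steps described above (parsing OCR output into items, then coding each item) can be sketched in a few lines of Python. Everything in this sketch is an assumption made for illustration: the abstract does not specify the OCR engine, the receipt layout, the model family, or the code list. The receipt line format, the expenditure codes (FOOD_DAIRY and so on), the training examples, and the names parse_receipt and NaiveBayesCoder are all hypothetical, and a small naive Bayes classifier over character trigrams stands in for whatever algorithm the production system uses.

```python
import math
import re
from collections import Counter, defaultdict

# Assumed receipt layout: an item description followed by a price.
LINE_RE = re.compile(r"^(?P<desc>.+?)\s+\$?(?P<price>\d+\.\d{2})\s*$")
# Summary lines (totals, taxes, payment) are not expenditure items.
SKIP_RE = re.compile(r"(subtotal|total|tax|gst|hst|pst|cash|change)\b", re.IGNORECASE)


def parse_receipt(ocr_text):
    """Extract (description, price) pairs from already-OCR'd receipt text."""
    items = []
    for line in ocr_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and not SKIP_RE.match(match.group("desc")):
            items.append((match.group("desc").strip(), float(match.group("price"))))
    return items


def features(text):
    """Character trigrams, which tolerate small OCR misreadings better than words."""
    cleaned = " " + re.sub(r"[^a-z0-9]+", " ", text.lower()).strip() + " "
    return [cleaned[i:i + 3] for i in range(len(cleaned) - 2)]


class NaiveBayesCoder:
    """Multinomial naive Bayes with add-one smoothing (illustrative stand-in)."""

    def fit(self, descriptions, codes):
        self.class_counts = Counter(codes)
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()
        for description, code in zip(descriptions, codes):
            trigrams = features(description)
            self.feature_counts[code].update(trigrams)
            self.vocabulary.update(trigrams)
        return self

    def predict(self, description):
        n_examples = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_code, best_score = None, -math.inf
        for code, count in self.class_counts.items():
            total = sum(self.feature_counts[code].values())
            score = math.log(count / n_examples)
            for trigram in features(description):
                score += math.log((self.feature_counts[code][trigram] + 1) / (total + vocab_size))
            if score > best_score:
                best_code, best_score = code, score
        return best_code


# Hypothetical training examples; these are not real SHS codes or data.
TRAINING = [
    ("milk 2% 4l", "FOOD_DAIRY"),
    ("cheddar cheese", "FOOD_DAIRY"),
    ("yogurt vanilla", "FOOD_DAIRY"),
    ("whole wheat bread", "FOOD_BAKERY"),
    ("bagels sesame", "FOOD_BAKERY"),
    ("dish soap", "HOUSEHOLD_CLEANING"),
    ("laundry detergent", "HOUSEHOLD_CLEANING"),
]

if __name__ == "__main__":
    coder = NaiveBayesCoder().fit(*zip(*TRAINING))
    receipt = "MILK 2% 2L      5.49\nDISH SOAP LEMON   $3.99\nTOTAL           9.48"
    for description, price in parse_receipt(receipt):
        print(description, price, coder.predict(description))
```

Under this sketch, retraining at each survey cycle would amount to calling fit again on the newly coded items from the latest cycle.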
This presentation will cover the technical aspects of the automatic capture and coding systems, including the OCR and machine learning techniques, as well as the implementation challenges encountered and the solutions found along the way. It will close with potential research avenues and next steps for automating the processing of the Canadian Survey of Household Spending diary.