Transforming Statistical Workflows to Refine Weekly Price Index Reporting
Conference
65th ISI World Statistics Congress 2025
Format: CPS Poster - WSC 2025
Keywords: automation, data_pipeline, data_processing, price_index
Abstract
The price index is a critical indicator for monitoring price fluctuations in the market. Since 2022, Statistics Indonesia (BPS), in collaboration with the Ministry of Trade and the Ministry of Home Affairs, has computed the Price Index for 20 essential food commodities across all cities and districts in Indonesia on a weekly basis. This collaboration exemplifies the principles of official statistics, emphasizing inter-agency cooperation for statistical purposes. Data collection is managed by the Industry and Trade Offices of each district and city, with subsequent aggregation at the Ministry of Trade and processing by BPS to generate the Price Index. Currently, the data processing workflow relies on Excel, spanning from data cleaning to dissemination, and takes up to two days to complete, from Friday to Sunday. To enhance data quality, accuracy, reliability, and timeliness, BPS is committed to developing more efficient and precise processing methods.
This study integrates technology to construct a data pipeline that accelerates price data processing. A comprehensive data pipeline has been developed to support the full statistical workflow, including data acquisition from the Ministry of Trade, data cleaning, data processing, and dissemination through a visualization dashboard. The pipeline employs ETL (Extract, Transform, Load) tools, relational database management systems, and advanced visualization technologies. An agile development methodology enables rapid iteration on the processing algorithms. The pipeline comprises the following stages: data acquisition from the Ministry of Trade via API, transformation of the raw data into an analysis-ready format, data cleaning through missing-value imputation, indicator calculation, and data preparation for visualization. After development, the manual and automated processing methods were operated in parallel to verify compliance with established protocols and to ensure consistency between their outputs.
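The cleaning and calculation stages can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the record schema, the carry-forward imputation rule (fill a missing price with last week's value for the same commodity and district), and the mean-of-price-relatives index formula are all hypothetical, since the abstract does not specify BPS's actual methodology.

```python
import statistics

def extract(payload):
    """Simulate extraction from the Ministry of Trade API (hypothetical schema)."""
    return payload["prices"]  # list of {"commodity", "district", "price"} records

def impute_missing(records, previous_week):
    """Fill a missing price with last week's value for the same commodity/district;
    drop records that remain missing after imputation."""
    filled = []
    for r in records:
        if r["price"] is None:
            r = {**r, "price": previous_week.get((r["commodity"], r["district"]))}
        if r["price"] is not None:
            filled.append(r)
    return filled

def price_index(current, base):
    """Illustrative index: mean of price relatives (current / base) x 100
    over all commodity-district pairs present in the base period."""
    relatives = [
        r["price"] / base[(r["commodity"], r["district"])] * 100
        for r in current
        if (r["commodity"], r["district"]) in base
    ]
    return round(statistics.mean(relatives), 2)
```

In a production pipeline these steps would read from the API and a relational database rather than in-memory dictionaries, but the structure (extract, impute, calculate) mirrors the stages described above.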
The findings of this study demonstrate significant improvements in both the speed and accuracy of data processing compared with the previous Excel-based workflow. The introduction of the data pipeline has reduced processing time from two days to approximately 30 minutes. The automation not only frees analysts from manual data processing but also mitigates the risk of human error by enforcing automated cleaning based on predefined rules.
The research outcomes underscore the necessity of integrating technology into statistical processes to enhance the quality of official statistics generated by the National Statistics Office. The adoption of technological advancements facilitates improvements in efficiency, speed, and overall quality of statistical outputs.