Outlier management on administrative data
Conference
64th ISI World Statistics Congress
Format: IPS Abstract
Keywords: administrative data, outlier
Session: IPS 434 - New methods and sources in the modernisation of economic statistics
Monday 17 July 10 a.m. - noon (Canada/Eastern)
Abstract
When using administrative data, especially tax data, national statistical institutes (NSIs) face several important challenges. All tax datasets need some pre-processing before they become suitable for statistical purposes. The principal reason is that the logic of tax regulations differs from the accounting rules that serve as the basis of survey statistics in the vast majority of countries. Official statistics increasingly need the information and knowledge that can be extracted from administrative datasets, but ordinary statistical methods cannot be applied to administrative data, or can be applied only to a limited extent. The Hungarian Central Statistical Office is committed to using administrative datasets wherever possible in order to provide consistent data for the largest multinational enterprises and to help estimate and check some key variables of economic statistics. Therefore, it has to develop appropriate methods for transforming administrative datasets into statistical "raw material". From this point of view, outlier treatment is a particularly interesting issue because the presence of outliers affects statistical models (both time series and regression models), imputation, the comparison of different datasets, and so on. Moreover, an outlier is not necessarily an error; it can also be a genuine value, and separating these cases is a very important part of outlier treatment. The presentation focuses on the experience gained from examining VAT data and on the effectiveness of the different methods. Several data sources are used to check the results, including the summary reports of VAT data and a new data source called e-invoice. The government ordered the introduction of e-invoice in 2018, but it has been used in its current form only since 2021. As for the outlier detection and handling methods, the novelty lies rather in the fields of their application.
The presentation will touch on the connection between VAT and survey data (for example, in the case of performance statistics) and on the impact of outlier treatment on the quality of economic statistics.
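The abstract does not name the detection methods used, so the following is only an illustrative sketch of the general idea of separating genuine extreme values from suspected errors. It assumes a median/MAD-based modified z-score with the commonly used cut-off of 3.5, and a hypothetical second source (for instance, totals derived from e-invoice data) against which flagged VAT values are compared. The function names, the threshold, the 5% tolerance, and the sample figures are all assumptions made for illustration, not the method of the Hungarian Central Statistical Office.

```python
from statistics import median


def modified_z_scores(values):
    """Return median/MAD-based modified z-scores for a list of numbers."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Degenerate case (at least half of the values are identical):
        # this sketch simply reports no outliers rather than dividing by zero.
        return [0.0 for _ in values]
    # 0.6745 rescales the MAD so that scores are comparable to ordinary
    # z-scores under normality (an assumption of this illustration).
    return [0.6745 * (v - med) / mad for v in values]


def classify(vat, reference, threshold=3.5, tolerance=0.05):
    """Label each VAT value as regular, genuine outlier, or suspected error.

    `reference` is a hypothetical second source reported for the same units.
    A flagged value that agrees with the reference within `tolerance`
    (relative difference) is treated as a genuine extreme value.
    """
    labels = []
    for v, r, z in zip(vat, reference, modified_z_scores(vat)):
        if abs(z) <= threshold:
            labels.append("regular")
        elif r != 0 and abs(v - r) / abs(r) <= tolerance:
            labels.append("genuine outlier")
        else:
            labels.append("suspected error")
    return labels


# Made-up figures: two extreme VAT values, only one confirmed by the reference.
vat_values = [100, 102, 98, 101, 99, 5000, 4000]
reference_values = [100, 101, 98, 100, 99, 5010, 400]
labels = classify(vat_values, reference_values)
```

In this toy example the value 5000 is extreme but is confirmed by the second source, while 4000 is extreme and contradicted by it, so only the latter would be passed on for correction or imputation.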