The Minimum Covariance Determinant estimator for interval-valued data
Conference
65th ISI World Statistics Congress 2025
Format: IPS Abstract - WSC 2025
Keywords: anomaly-detection, outliers, robust inference, symbolic_data_analysis
Session: IPS 768 - Symbolic Data Analysis for Data Science
Thursday 9 October 10:50 a.m. - 12:30 p.m. (Europe/Amsterdam)
Abstract
The increasing need to analyze immense volumes of data has led to the emergence of symbolic data analysis, which addresses complex data challenges. New data types, such as interval-valued data, pose theoretical and methodological problems that require innovative solutions. The location and scale of interval-valued random vectors are commonly estimated by the barycentre approach based on Mallows’ distance [1]. We illustrate that, as in conventional data analysis, these (classical) estimates can be significantly affected by anomalous data, which are frequently present in real-life datasets. We then present a robust alternative that estimates location and scale by generalizing the Minimum Covariance Determinant (MCD) estimator [2] to interval-valued data. The MCD algorithm yields a robust distance that is used to detect anomalous observations. To conclude, we evaluate the performance of these MCD estimators and the outlier detection method on synthetic and real datasets.
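For readers unfamiliar with the MCD, the following is a minimal sketch of the conventional (point-valued) estimator in the spirit of [2]. It is not the interval-valued generalization presented in the talk. It assumes two-dimensional data, random three-point starts, no consistency or reweighting correction, and a chi-square cutoff at the 0.975 level. The toy data and all function names (`mean_cov`, `c_step`, `mcd`) are hypothetical and chosen only for illustration.

```python
import math
import random

def mean_cov(pts):
    """Sample mean and 2x2 covariance (sxx, sxy, syy) of (x, y) pairs."""
    n = len(pts)
    mx = sum(p[0] for p in pts) / n
    my = sum(p[1] for p in pts) / n
    sxx = sum((p[0] - mx) ** 2 for p in pts) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in pts) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pts) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def det(cov):
    sxx, sxy, syy = cov
    return sxx * syy - sxy ** 2

def sq_mahalanobis(p, mean, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det(cov)

def c_step(pts, subset, h):
    """Concentration step: keep the h points closest to the current fit."""
    mean, cov = mean_cov(subset)
    return sorted(pts, key=lambda p: sq_mahalanobis(p, mean, cov))[:h]

def mcd(pts, h, n_starts=50, max_iter=100, seed=0):
    """Raw MCD: lowest-determinant h-subset found over random starts."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_starts):
        subset = rng.sample(pts, 3)  # p + 1 points for p = 2
        if det(mean_cov(subset)[1]) <= 1e-12:
            continue  # skip (near-)collinear starts
        prev = float("inf")
        for _ in range(max_iter):
            subset = c_step(pts, subset, h)
            mean, cov = mean_cov(subset)
            d = det(cov)
            if d >= prev:  # determinant stopped decreasing
                break
            prev = d
        if best is None or d < best[0]:
            best = (d, mean, cov)
    return best[1], best[2]

# Toy data (an assumption): 40 regular points plus 10 shifted anomalies.
data_rng = random.Random(1)
inliers = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(40)]
outliers = [(data_rng.gauss(8, 0.5), data_rng.gauss(8, 0.5)) for _ in range(10)]
points = inliers + outliers

# 0.975 quantile of the chi-square distribution with 2 degrees of freedom.
cutoff = -2 * math.log(0.025)

mean_c, cov_c = mean_cov(points)    # classical estimates
mean_r, cov_r = mcd(points, h=38)   # robust estimates
classical_d = [sq_mahalanobis(p, mean_c, cov_c) for p in points]
robust_d = [sq_mahalanobis(p, mean_r, cov_r) for p in points]

print("anomalies flagged (classical):", sum(d > cutoff for d in classical_d[40:]))
print("anomalies flagged (robust):   ", sum(d > cutoff for d in robust_d[40:]))
```

In this sketch the anomalies inflate the classical covariance, which shrinks their own distances, whereas the MCD fit is computed from a concentrated subset and leaves them far from the bulk. A practical implementation would typically add a consistency factor and a reweighting step, both omitted here.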
References:
[1] M. R. Oliveira, D. Pinheiro, and L. Oliveira, “Towards location and association measures for interval data: a theoretical approach based on Mallows’ distance,” arXiv:2407.05105, 2024.
[2] P. J. Rousseeuw and K. Van Driessen, “A Fast Algorithm for the Minimum Covariance Determinant Estimator,” Technometrics, 41(3), 212–223, 1999.