65th ISI World Statistics Congress 2025

The Minimum Covariance Determinant estimator for interval-valued data

Author

Maria do Rosario Oliveira

Co-author

Conference

65th ISI World Statistics Congress 2025

Format: IPS Abstract - WSC 2025

Keywords: anomaly detection, outliers, robust inference, symbolic data analysis

Session: IPS 768 - Symbolic Data Analysis for Data Science

Thursday 9 October 10:50 a.m. - 12:30 p.m. (Europe/Amsterdam)

Abstract

The growing need to analyze very large volumes of data has led to the emergence of symbolic data analysis, which addresses complex data challenges. New data types, such as interval-valued data, pose theoretical and methodological problems that require innovative solutions. The location and scale of interval-valued random vectors are commonly estimated by the barycentre approach based on Mallows’ distance [1]. We illustrate that, as in conventional data analysis, these (classical) estimates can be significantly affected by anomalous data, which are frequently present in real-life datasets. We then present a robust alternative that estimates location and scale by generalizing the Minimum Covariance Determinant (MCD) estimator [2] to interval-valued data. The MCD algorithm yields a robust distance that is used to detect anomalous observations. To conclude, we evaluate the performance of these MCD estimators and of the outlier detection method on synthetic and real datasets.
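For readers unfamiliar with the MCD idea, the following is a minimal sketch of the conventional (real-valued) estimator in the spirit of the concentration steps of [2]. It is not the interval-valued generalization presented in this talk. The function names `sq_mahalanobis` and `mcd_sketch`, the random h-subset starts, and the simulated data are illustrative assumptions, and no consistency or reweighting correction is applied.

```python
import numpy as np

def sq_mahalanobis(X, mu, cov):
    """Squared Mahalanobis distance of every row of X to (mu, cov)."""
    diff = X - mu
    return np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)

def mcd_sketch(X, h, n_starts=20, max_steps=50, seed=0):
    """Approximate MCD: search for an h-subset with a small covariance determinant.

    Hypothetical helper for illustration only; starts from random h-subsets
    (an assumption made for brevity) and iterates concentration steps.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    best_det, best_subset = np.inf, None
    for _ in range(n_starts):
        subset = np.sort(rng.choice(n, size=h, replace=False))
        for _ in range(max_steps):
            # Concentration step: refit on the subset, keep the h closest points.
            mu = X[subset].mean(axis=0)
            cov = np.cov(X[subset], rowvar=False)
            new_subset = np.sort(np.argsort(sq_mahalanobis(X, mu, cov))[:h])
            if np.array_equal(new_subset, subset):
                break
            subset = new_subset
        det = np.linalg.det(np.cov(X[subset], rowvar=False))
        if det < best_det:
            best_det, best_subset = det, subset
    mu = X[best_subset].mean(axis=0)
    cov = np.cov(X[best_subset], rowvar=False)
    # Robust squared distances of all observations to the selected fit.
    return mu, cov, sq_mahalanobis(X, mu, cov)

# Simulated example (assumed data): 100 points in 4 dimensions, first 10 shifted.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
X[:10] += 10.0
mu, cov, d2 = mcd_sketch(X, h=75)
suspects = np.argsort(d2)[-10:]  # indices with the largest robust distances
```

In this sketch the classical mean `X.mean(axis=0)` is pulled towards the shifted points, whereas `mu` is computed only from the selected h-subset, and the observations with the largest values of `d2` are the candidates for anomalies.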

References:
[1] M. R. Oliveira, D. Pinheiro, and L. Oliveira, “Towards location and association measures for interval data: a theoretical approach based on Mallows’ distance,” arXiv:2407.05105, 2024.
[2] P. J. Rousseeuw and K. Van Driessen, “A Fast Algorithm for the Minimum Covariance Determinant Estimator,” Technometrics, 41(3), 212–223, 1999.