65th ISI World Statistics Congress 2025

Reliability of clustering methods-based uncertainty

Conference

65th ISI World Statistics Congress 2025

Format: CPS Abstract - WSC 2025

Keywords: cluster, cluster-validation, clustering, complex, fuzzy

Session: CPS 22 - Statistical Theory

Wednesday 8 October 4 p.m. - 5 p.m. (Europe/Amsterdam)

Abstract

Large volumes of complex data are now collected across a wide variety of fields, and clustering methods are becoming increasingly important because they summarize such data and extract latent characteristics by grouping objects according to their similarity.
Some clustering methods account for the uncertainty of an object's cluster membership and aim to capture the structure of complex real-world data with a smaller number of clusters.
Earlier classification methods were grounded in statistics and probability theory, but more recent clustering methods expand the solution space of the clustering partition matrix more flexibly. Fuzzy clustering is one such method: it extracts fuzzy clusters defined in terms of fuzzy subsets, which are the basis of fuzzy logic. Fuzzy theory, along with neural networks and evolutionary computation, is a soft computing field, and these fields constitute computational intelligence (CI), a branch of artificial intelligence.
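To make the idea of a fuzzy partition matrix concrete, the following is a minimal sketch of the standard fuzzy c-means algorithm in plain Python. The abstract does not say which fuzzy clustering method is used, so the choice of fuzzy c-means, the function name `fuzzy_c_means`, and the parameters (fuzzifier `m`, `n_iter`, `tol`, `seed`) are assumptions made for illustration only.

```python
import random

def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Illustrative fuzzy c-means; returns (memberships, centers).

    memberships[i][k] lies in [0, 1] and each row sums to 1.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # Random initial memberships, normalised so each row sums to 1.
    U = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        U.append([u / total for u in row])
    centers = []
    for _ in range(n_iter):
        # Centers: means of the objects weighted by membership ** m.
        centers = []
        for k in range(n_clusters):
            w = [row[k] ** m for row in U]
            w_sum = sum(w)
            centers.append(
                [sum(wi * p[d] for wi, p in zip(w, points)) / w_sum
                 for d in range(dim)]
            )
        # Memberships: inverse squared-distance weights, renormalised per object.
        new_U = []
        for p in points:
            d2 = [max(sum((a - b) ** 2 for a, b in zip(p, c)), 1e-12)
                  for c in centers]
            inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
            total = sum(inv)
            new_U.append([v / total for v in inv])
        # Largest change in any membership value since the last iteration.
        shift = max(abs(a - b)
                    for ra, rb in zip(new_U, U) for a, b in zip(ra, rb))
        U = new_U
        if shift < tol:
            break
    return U, centers
```

Unlike a hard partition, an object lying between two groups receives intermediate membership in both, which is the uncertainty in cluster membership that the abstract refers to.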
The advantage of fuzzy clustering is that it allows for uncertainty in the membership of objects to clusters, which makes it possible to obtain robust and tractable classification results for large-scale, complex data. However, as the classification results become more complex they become harder to interpret, so the reliability of the results becomes an issue. In particular, because of how fuzzy subsets are defined, fuzzy clustering results cannot be assessed with ordinary probability measures, and dedicated validity functions are currently being developed for this purpose.
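As an illustration of what a validity function for a fuzzy partition can look like, the sketch below computes two classical indices, Bezdek's partition coefficient and partition entropy, from a membership matrix. The abstract does not name the validity function it uses, so these two indices are stand-ins chosen only because they are widely known; the function names are hypothetical.

```python
import math

def partition_coefficient(U):
    """Mean of squared memberships; ranges from 1/c (fully fuzzy) to 1 (crisp)."""
    return sum(u * u for row in U for u in row) / len(U)

def partition_entropy(U):
    """Mean membership entropy (natural log); 0 for crisp, log(c) for fully fuzzy."""
    return -sum(u * math.log(u) for row in U for u in row if u > 0) / len(U)
```

Both functions take a list of membership rows, one row per object and one column per cluster, with each row summing to 1.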
We therefore define the reliability of fuzzy classification results by incorporating a validity measure of fuzzy clustering into the clustering results. An aggregation operator defined in a statistical measure space is used to aggregate the clustering results and the validity function values. We also show that applying the asymmetric aggregation operator developed by the author makes it possible to introduce weights that take into account the difference between the validity measure values and the clustering results. From the mathematical definition of these aggregation operators, we further show that the proposed reliability measures make it possible to remove noise in the data, and we discuss theoretically the usefulness of classification results that use reliability measures.
Finally, we show that the proposed method can be used for supervised classification in machine learning. It exploits the robustness of fuzzy clustering to large amounts of complex data, which addresses the problem of training-data reliability: noise can be removed by mapping objects from the training data into the solution space of fuzzy clustering. We demonstrate the effectiveness of the method through several numerical examples.
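The aggregation step can be pictured with a rough sketch. The abstract does not define its aggregation operators, so the code below is not the proposed method: it assumes a product t-norm as the aggregation operator, a normalised partition coefficient as the validity value, and a fixed threshold for discarding noise. All names (`normalised_pc`, `reliability`, `keep_reliable`) and the threshold value are hypothetical, and the author's asymmetric operator is not reproduced here.

```python
def normalised_pc(U):
    """Partition coefficient rescaled to [0, 1] (assumed validity value)."""
    c = len(U[0])
    pc = sum(u * u for row in U for u in row) / len(U)
    return (c * pc - 1.0) / (c - 1.0)

def product_t_norm(a, b):
    """Product t-norm on [0, 1]; a stand-in for the aggregation operator."""
    return a * b

def reliability(U, validity, aggregate=product_t_norm):
    """Aggregate every membership value with the validity value."""
    return [[aggregate(u, validity) for u in row] for row in U]

def keep_reliable(U, threshold, aggregate=product_t_norm):
    """Indices of objects whose largest aggregated membership reaches the threshold."""
    R = reliability(U, normalised_pc(U), aggregate)
    return [i for i, row in enumerate(R) if max(row) >= threshold]

# Two clearly assigned objects and one ambiguous object.
U = [[0.95, 0.05], [0.90, 0.10], [0.50, 0.50]]
kept = keep_reliable(U, threshold=0.4)
```

In this toy case the ambiguous third object falls below the threshold and is dropped, which mirrors, in a simplified way, the idea of removing noisy training objects after mapping them into the solution space of fuzzy clustering.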