Sparse Feature Group K-Means
Conference
65th ISI World Statistics Congress 2025
Format: CPS Abstract - WSC 2025
Keywords: clustering, high-dimensional, sparsity
Session: CPS 11 - Dimension Reduction and Clustering Techniques for High-Dimensional Data
Wednesday 8 October 4 p.m. - 5 p.m. (Europe/Amsterdam)
Abstract
We consider the problem of clustering high-dimensional multiblock data. Not all features are necessarily relevant for recovering the true clusters, which may differ with respect to only a small number of features or blocks of features.
Subspace clustering methods such as Entropy weighting K-Means (EWKM) or Feature Group K-means (FGKM) provide a partition of the observations together with sets of weights associated with features only (EWKM) or with both features and blocks (FGKM). Further analysis of these weights helps to identify the features or blocks relevant to the clusters, namely those with the largest weights. However, when the number of features or blocks is large, this weight analysis step can be tedious and difficult.
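To make the role of the weights concrete, the sketch below computes entropy-regularised feature weights for a single cluster. It assumes the usual EWKM-style update, in which a feature's weight decays exponentially with its within-cluster dispersion. The function name `entropy_feature_weights`, the parameter `gamma`, and the toy data are illustrative assumptions, not part of the work presented here.

```python
import math


def entropy_feature_weights(points, centroid, gamma=1.0):
    """Feature weights for one cluster, EWKM-style (illustrative sketch).

    points   : list of observations (sequences of floats) assigned to the cluster
    centroid : cluster centre, one value per feature
    gamma    : assumed entropy-regularisation parameter (> 0); larger values
               give flatter weights
    """
    n_features = len(centroid)
    # Within-cluster dispersion of each feature around the centroid.
    dispersion = [
        sum((p[j] - centroid[j]) ** 2 for p in points) for j in range(n_features)
    ]
    # Softmax of -dispersion / gamma; shifting by the minimum avoids underflow.
    d_min = min(dispersion)
    unnormalised = [math.exp(-(d - d_min) / gamma) for d in dispersion]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Toy cluster: features 0 and 2 are tight, feature 1 is noisy.
cluster = [(1.0, 5.0, 0.2), (1.1, -3.0, 0.1), (0.9, 9.0, 0.3)]
centroid = [sum(col) / len(cluster) for col in zip(*cluster)]
weights = entropy_feature_weights(cluster, centroid, gamma=0.5)

# Ranking features by weight is the manual inspection step that becomes
# tedious when there are many features or blocks.
ranked = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
```

Every feature keeps a strictly positive weight under this update, so deciding which features matter still requires inspecting and ranking the weights by hand.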
Sparse K-means has been proposed as a method that clusters observations using an adaptively chosen subset of features relevant to the whole partition. Sparse subspace K-means (SSKM), a new subspace clustering method, simultaneously clusters the observations and selects the features relevant to each cluster rather than to the whole partition. We propose Sparse-FGKM, an extension of SSKM that takes the multiblock structure into account in the clustering process by simultaneously identifying the importance of features and feature blocks in each cluster.
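As an illustration of how sparsity can set feature weights exactly to zero, the sketch below implements a weight step in the style of sparse K-means. Per-feature between-cluster sums of squares are soft-thresholded and rescaled to unit L2 norm, and the threshold is found by bisection so that the L1 norm does not exceed a bound `s`. This is a minimal sketch under stated assumptions and is not the Sparse-FGKM algorithm: the names `sparse_weights`, `bcss`, and `s`, and the toy values, are hypothetical, and `s` is assumed to lie between 1 and the square root of the number of features.

```python
import math


def soft_threshold(values, delta):
    """Shrink non-negative values towards zero by delta, clipping at zero."""
    return [max(v - delta, 0.0) for v in values]


def sparse_weights(bcss, s, n_iter=60):
    """Feature weights with unit L2 norm and L1 norm at most s (sketch).

    bcss : non-negative per-feature between-cluster sums of squares
    s    : assumed L1 bound, 1 <= s <= sqrt(len(bcss))
    """

    def normalised(delta):
        shrunk = soft_threshold(bcss, delta)
        norm = math.sqrt(sum(v * v for v in shrunk))
        if norm == 0.0:
            # Everything was thresholded away; return all-zero weights.
            return shrunk
        return [v / norm for v in shrunk]

    weights = normalised(0.0)
    if sum(weights) <= s:
        return weights  # the L1 constraint is already satisfied

    # Bisection on the threshold: a larger delta gives a sparser weight vector.
    lo, hi = 0.0, max(bcss)
    for _ in range(n_iter):
        mid = (lo + hi) / 2.0
        if sum(normalised(mid)) > s:
            lo = mid
        else:
            hi = mid
    return normalised(hi)


# Toy example: two informative features and three near-noise features.
bcss = [10.0, 8.0, 0.5, 0.3, 0.1]
w = sparse_weights(bcss, s=1.2)
selected = [j for j, wj in enumerate(w) if wj > 0.0]
```

Here the weights of the weakly separating features become exactly zero, so the selected features can be read off directly. A cluster-specific or block-aware variant would apply a comparable step per cluster and per block, which this sketch does not attempt.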