64th ISI World Statistics Congress

Flexible clustering for asymmetric data via mixtures of unrestricted skew normal factor analyzers

Format: IPS Paper

Keywords: cluster, factor analysis, maximum likelihood, missing data, mixture model, skewness

Session: IPS 92 - Innovative Nonregular Approaches to Statistical Modelling for Complex Data

Tuesday 18 July 2 p.m. - 3:40 p.m. (Canada/Eastern)

Abstract

Mixtures of factor analyzers (MFA) based on the restricted skew normal (rMSN) distribution have been shown to be a flexible tool for handling asymmetric, heterogeneous high-dimensional data. However, the rMSN distribution is often criticized for its limited ability to accommodate skewness arising from more than one feature space. This paper presents an alternative extension of MFA that assumes the unrestricted skew normal (uMSN) distribution for the component factors. In particular, the proposed mixtures of unrestricted skew normal factor analyzers (MuSNFA) can simultaneously capture multiple directions of skewness and handle missing values or nonresponses. Under the missing at random (MAR) mechanism, we develop a computationally feasible expectation conditional maximization (ECM) algorithm for computing the maximum likelihood estimates of the model parameters. Practical aspects of model-based clustering and the prediction of factor scores and missing values are also discussed. The utility of the proposed methodology is illustrated through the analysis of simulated data and of the Pima Indian women diabetes data, which contain genuine missing values.
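
For readers unfamiliar with the distinction between the two skew normal families, the contrast can be sketched through their stochastic representations. The forms below follow a convention commonly used in the skew normal mixture literature and are an assumption here: the abstract does not state its parameterization, and the symbols (\mu, \delta, \Delta, \Sigma, U_0, U_1) are illustrative rather than taken from the paper.

```latex
% Restricted form (rMSN): a single scalar latent variable drives the skewness,
% so all of the asymmetry lies along the one direction \boldsymbol{\delta}.
\mathbf{Y} = \boldsymbol{\mu} + \boldsymbol{\delta}\,|U_0| + \mathbf{U}_1,
\qquad U_0 \sim N(0, 1), \quad \mathbf{U}_1 \sim N_p(\mathbf{0}, \boldsymbol{\Sigma}),
\quad U_0 \perp \mathbf{U}_1 .

% Unrestricted form (uMSN): a vector of latent variables, one per coordinate,
% so skewness can arise along several directions at once.
\mathbf{Y} = \boldsymbol{\mu} + \boldsymbol{\Delta}\,|\mathbf{U}_0| + \mathbf{U}_1,
\qquad \mathbf{U}_0 \sim N_p(\mathbf{0}, \mathbf{I}_p), \quad
\mathbf{U}_1 \sim N_p(\mathbf{0}, \boldsymbol{\Sigma}), \quad
\boldsymbol{\Delta} = \mathrm{diag}(\delta_1, \dots, \delta_p),
\quad \mathbf{U}_0 \perp \mathbf{U}_1 .
```

Here |\mathbf{U}_0| denotes the componentwise absolute value. Under this convention the restricted form shares one half-normal variable across all coordinates, whereas the unrestricted form gives each coordinate its own, which is what allows multiple directions of skewness.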