Abstract

A new method is proposed that combines dimension reduction and cluster analysis for categorical data. A least-squares objective function is formulated that approximates the cluster-by-variables cross-tabulation. Individual observations are assigned to clusters in such a way that the distributions over the categorical variables for the different clusters are optimally separated. A brief review of alternative methods is provided within a unified framework, and the performance of the methods is appraised by means of a simulation study. The results of the joint dimension reduction and clustering methods are compared with cluster analysis based on the full-dimensional data. Our results show that the joint dimension reduction and clustering methods outperform full-dimensional clustering, both with respect to the retrieval of the true underlying cluster structure and with respect to internal cluster validity measures. The differences increase when more variables are involved and in the presence of noise variables.
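
To make the central object concrete, the following is a minimal sketch of how a cluster-by-variables cross-tabulation can be built from integer-coded categorical data and a given cluster assignment. It is not the authors' implementation and does not perform the least-squares optimization or the dimension reduction; the function names, the toy data, and the cluster labels are hypothetical and chosen only for illustration.

```python
import numpy as np

def indicator_matrix(data):
    """One-hot encode an (n, p) integer-coded categorical array, one block per variable."""
    blocks = []
    for j in range(data.shape[1]):
        categories = np.unique(data[:, j])
        # Each row gets a 1 in the column of the category it takes on variable j.
        blocks.append((data[:, j][:, None] == categories[None, :]).astype(int))
    return np.hstack(blocks)

def cluster_by_category_table(data, labels, n_clusters):
    """Cross-tabulate cluster membership against all categories of all variables."""
    Z = indicator_matrix(data)                   # n x (total number of categories)
    Z_K = np.eye(n_clusters, dtype=int)[labels]  # n x K cluster membership matrix
    return Z_K.T @ Z                             # K x (total number of categories)

# Toy data (hypothetical): 6 observations, 2 categorical variables.
data = np.array([[0, 1], [0, 1], [1, 0], [1, 2], [0, 1], [1, 0]])
labels = np.array([0, 0, 1, 1, 0, 1])  # assumed cluster assignment

F = cluster_by_category_table(data, labels, n_clusters=2)
print(F)
```

Each row of `F` holds the category counts for one cluster, so comparing rows shows how well the clusters' distributions over the categorical variables are separated. In this toy example the two clusters share no categories at all.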

Erasmus School of Economics
hdl.handle.net/1765/77010
Econometric Institute Research Papers

van de Velden, M., Iodice D'Enza, A., & Palumbo, F. (2014). Cluster Correspondence Analysis (No. EI 2014-24). Econometric Institute Research Papers. Retrieved from http://hdl.handle.net/1765/77010