Friedman and Tukey (1974, A projection pursuit algorithm for exploratory data analysis. IEEE Trans Comput. C-23:881-890), in their seminal paper, coined the term projection pursuit to denote a sequential classification method where data are projected onto the direction which best separates a cluster from the others. The detected cluster is then removed from the data and the procedure is repeated until no clusters are left. This paper implements their approach by an iterative algorithm based on kurtosis minimization and k-means clustering. The algorithm is initialized with the eigenvector associated with the smallest positive eigenvalue of the standardized fourth moment. The iterative step simplifies the multivariate minimization procedure to a bivariate one, which admits an analytical solution. The classification step is a simplified version of k-means clustering, where the data are unidimensional and only two groups are sought. The practical usefulness and the asymptotic properties of the proposed method are illustrated with simulated data.
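The two building blocks named in the abstract, kurtosis minimization over a bivariate projection and a two-group, one-dimensional k-means split, can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names (`projection_kurtosis`, `best_angle_2d`, `two_means_1d`) are hypothetical, a plain grid search over angles stands in for the analytical bivariate solution mentioned in the abstract, and the eigenvector-based initialization and the cluster-removal loop are not shown.

```python
import math

def projection_kurtosis(rows, direction):
    """Kurtosis (m4 / m2^2) of the data projected onto `direction`."""
    proj = [sum(x * w for x, w in zip(row, direction)) for row in rows]
    n = len(proj)
    mean = sum(proj) / n
    m2 = sum((p - mean) ** 2 for p in proj) / n
    m4 = sum((p - mean) ** 4 for p in proj) / n
    return m4 / m2 ** 2

def best_angle_2d(rows, steps=180):
    """Grid search for the unit direction in the plane with minimal kurtosis.

    Assumption: this replaces the paper's analytical bivariate solution,
    which the abstract does not spell out.
    """
    best_kurt, best_dir = math.inf, None
    for k in range(steps):
        theta = math.pi * k / steps
        direction = (math.cos(theta), math.sin(theta))
        kurt = projection_kurtosis(rows, direction)
        if kurt < best_kurt:
            best_kurt, best_dir = kurt, direction
    return best_dir

def two_means_1d(values):
    """Exact 2-means for one-dimensional data (needs at least two values).

    In one dimension the optimal groups are contiguous in sorted order,
    so it is enough to try every cut point of the sorted values.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    xs = [values[i] for i in order]
    n = len(xs)
    total, total_sq = sum(xs), sum(x * x for x in xs)
    best_cost, best_cut = math.inf, 1
    left, left_sq = 0.0, 0.0
    for cut in range(1, n):  # left group is xs[:cut]
        left += xs[cut - 1]
        left_sq += xs[cut - 1] ** 2
        right, right_sq = total - left, total_sq - left_sq
        # Within-group sum of squares of both groups.
        cost = (left_sq - left ** 2 / cut) + (right_sq - right ** 2 / (n - cut))
        if cost < best_cost:
            best_cost, best_cut = cost, cut
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = 0 if rank < best_cut else 1
    return labels

# Toy data: two groups separated along the first coordinate.
rows = [(-2, -1), (-2, 0), (-2, 1), (2, -1), (2, 0), (2, 1)]
direction = best_angle_2d(rows)
proj = [sum(x * w for x, w in zip(row, direction)) for row in rows]
labels = two_means_1d(proj)  # [0, 0, 0, 1, 1, 1]
```

On the toy data the projection with the smallest kurtosis is the first coordinate, which is exactly the direction separating the two groups; a balanced two-point projection attains the lower bound of 1 for kurtosis.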

Special issue linStat: sequential projection pursuit / Franceschini, Cinzia; Loperfido, Nicola. - In: JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION. - ISSN 0094-9655. - (2023), pp. 1-18. [10.1080/00949655.2023.2294101]

Special issue linStat: sequential projection pursuit

Franceschini, Cinzia; Loperfido, Nicola
2023-01-01



Use this identifier to cite or link to this document: https://hdl.handle.net/11388/338629