Abstract
High dimensionality poses two challenges for clustering algorithms: features may be noisy and data may be sparse. To address these challenges, subspace clustering seeks to project the data onto simple yet informative subspaces. The projection should be fast, and the projected subspaces should be well-clusterable. In this paper, we describe a numerical one-dimensional subspace approach for high-dimensional data. First, we show that numerical one-dimensional subspaces can be constructed efficiently by controlling the correlation structure. Next, we propose two strategies to aggregate the representatives from each numerical one-dimensional subspace into the final projected space, where the clustering problem becomes tractable. Finally, experiments on real-world document data sets demonstrate that, compared to competing methods, our approach finds more clusterable subspaces that align better with the true class labels.
Original language | English (US) |
---|---|
Pages (from-to) | 311-323 |
Number of pages | 13 |
Journal | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
Volume | 8444 LNAI |
Issue number | PART 2 |
State | Published - 2014 |
Externally published | Yes |
Event | 18th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2014 - Tainan, Taiwan, Province of China; Duration: May 13 2014 → May 16 2014 |
Keywords
- clusterable subspace
- numerical one-dimension
- subspace learning
ASJC Scopus subject areas
- Theoretical Computer Science
- General Computer Science