Scalable temporal clustering for massive multidimensional data streams

Gediminas Adomavicius, Jesse Bockstedt, Vishnu Parimi

Research output: Contribution to conference (paper, peer-reviewed)

1 Scopus citation

Abstract

Today's organizations are continuously capturing extremely large amounts of data, which will only continue to increase. In this paper we present a new approach to discovering clusters in these massive amounts of complex (i.e., multidimensional) continuously-arriving data, which are much too large to be analyzed as one dataset. In order to guarantee acceptable scalability, our approach builds on existing data mining literature and uses sampling-based techniques, an advanced variation of hierarchical agglomerative clustering, and an approach for sample-based cluster reconstruction to provide an approximate clustering solution of very high accuracy. We test the proposed approach empirically and show that it provides excellent clustering performance and, at the same time, demonstrates significant computational savings.
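The abstract names three ingredients: sampling, hierarchical agglomerative clustering, and sample-based cluster reconstruction. The following is only a minimal sketch of how such a pipeline might fit together for one chunk of a stream, not the authors' algorithm. The function names (`agglomerate`, `cluster_chunk`), the plain centroid linkage, and the nearest-centre reconstruction rule are all assumptions made for illustration; the paper's "advanced variation" of agglomerative clustering and its handling of the temporal dimension are not described in the abstract and are not modelled here.

```python
import math
import random


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def agglomerate(sample, k):
    """Naive centroid-linkage agglomerative clustering down to k clusters.

    Hypothetical helper: roughly cubic in the sample size, so it is only
    practical on a small sample, which is the point of sampling first.
    """
    clusters = [[p] for p in sample]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            ci = centroid(clusters[i])
            for j in range(i + 1, len(clusters)):
                d = math.dist(ci, centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # j > i, so popping j leaves index i valid.
        clusters[i].extend(clusters.pop(j))
    return [centroid(c) for c in clusters]


def cluster_chunk(chunk, k, sample_size, rng):
    """Cluster one chunk of a stream: sample, agglomerate, then reconstruct."""
    sample = rng.sample(chunk, min(sample_size, len(chunk)))
    centers = agglomerate(sample, k)
    # Assumed reconstruction rule: label every point by its nearest centre.
    labels = [
        min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
        for p in chunk
    ]
    return centers, labels


# Synthetic two-dimensional chunk with two well-separated groups.
rng = random.Random(0)
chunk = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(200)]
chunk += [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(200)]
centers, labels = cluster_chunk(chunk, k=2, sample_size=40, rng=rng)
```

In this sketch only the 40 sampled points go through the expensive pairwise merging; the remaining points are labelled in a single linear pass, which is one plausible source of the computational savings the abstract refers to.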

Original language: English (US)
Pages: 121-126
Number of pages: 6
State: Published - 2008
Externally published: Yes
Event: 2008 Workshop on Information Technologies and Systems, WITS 2008 - Paris, France
Duration: Dec 13 2008 to Dec 14 2008

Other

Other: 2008 Workshop on Information Technologies and Systems, WITS 2008
Country/Territory: France
City: Paris
Period: 12/13/08 to 12/14/08

ASJC Scopus subject areas

  • Information Systems
  • Control and Systems Engineering
