BICO: BIRCH meets coresets for k-means clustering

Citation Data: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), ISSN: 0302-9743, Vol: 8125 LNCS, Pages: 481-492

Publication Year: 2013

Metrics
Citations: 49 (citation indexes)
Usage: 0
Captures: 19 (readers)
Mentions: 0
Social Media: 0


Conference Paper Description

We design a data stream algorithm for the k-means problem, called BICO, that combines the data structure of the SIGMOD Test of Time award-winning algorithm BIRCH [27] with the theoretical concept of coresets for clustering problems. The k-means problem asks for a set C of k centers minimizing the sum of the squared distances from every point in a set P to its nearest center in C. In a data stream, the points arrive one by one in arbitrary order and storage space is limited. BICO computes high-quality solutions with running times that are short in practice. First, BICO computes a summary S of the data with a provable quality guarantee: for every center set C, S has the same cost as P up to a (1 + ε)-factor, i.e., S is a coreset. Then, it runs k-means++ [5] on S. We compare BICO experimentally with popular and very fast heuristics (BIRCH, MacQueen [24]) and with the approximation algorithms with the best known quality guarantees (Stream-KM++ [2], StreamLS [16,26]). We achieve the same quality as these approximation algorithms with a much shorter running time, and we get much better solutions than the heuristics at the cost of only a moderate increase in running time. © 2013 Springer-Verlag.
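The abstract describes the pipeline (summarize the stream into a weighted set S, then run k-means++ on S) only in words. The Python sketch below illustrates those pieces under simplifying assumptions; it is not BICO. It keeps BIRCH's clustering feature (count, linear sum, sum of squared norms), from which the exact cost of sending a whole group of points to one center can be computed without storing the points. However, it replaces BICO's threshold-driven tree with a hypothetical fixed grid (`grid_summary`, cell side `cell`), so the resulting summary carries no provable (1 + ε) guarantee. The names `ClusteringFeature`, `grid_summary`, and `kmeanspp_seed` are illustrative only.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Euclidean distance ||p - q||^2."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_cost(points: Sequence[Point], centers: Sequence[Point],
                weights: Optional[Sequence[float]] = None) -> float:
    """Weighted k-means cost: sum over p of w(p) * min over c in C of ||p - c||^2."""
    if weights is None:
        weights = [1.0] * len(points)
    return sum(w * min(sq_dist(p, c) for c in centers)
               for p, w in zip(points, weights))


@dataclass
class ClusteringFeature:
    """BIRCH-style group summary: count n, linear sum ls, sum of squared norms ss."""
    dim: int
    n: int = 0
    ls: List[float] = field(default_factory=list)
    ss: float = 0.0

    def __post_init__(self) -> None:
        if not self.ls:
            self.ls = [0.0] * self.dim

    def add(self, p: Point) -> None:
        # O(d) update; the point itself is not stored.
        self.n += 1
        self.ls = [a + x for a, x in zip(self.ls, p)]
        self.ss += sum(x * x for x in p)

    def centroid(self) -> Point:
        return tuple(a / self.n for a in self.ls)

    def cost_to(self, c: Point) -> float:
        """Exact sum of ||p - c||^2 over the group, from (n, ls, ss) alone."""
        dot = sum(a * x for a, x in zip(self.ls, c))
        return self.ss - 2.0 * dot + self.n * sum(x * x for x in c)


def grid_summary(stream, cell: float) -> Dict[Tuple[int, ...], ClusteringFeature]:
    """Hypothetical stand-in for BICO's tree: one feature per grid cell of side `cell`.
    Single pass over the stream; memory grows with the number of occupied cells."""
    cfs: Dict[Tuple[int, ...], ClusteringFeature] = {}
    for p in stream:
        key = tuple(math.floor(x / cell) for x in p)
        if key not in cfs:
            cfs[key] = ClusteringFeature(dim=len(p))
        cfs[key].add(p)
    return cfs


def kmeanspp_seed(points: Sequence[Point], weights: Sequence[float], k: int,
                  rng: random.Random) -> List[Point]:
    """k-means++ (D^2) seeding on a weighted point set."""
    centers = [rng.choices(points, weights=weights, k=1)[0]]
    while len(centers) < k:
        d2 = [w * min(sq_dist(p, c) for c in centers)
              for p, w in zip(points, weights)]
        if sum(d2) == 0:
            break  # every weighted point already coincides with a center
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


# Usage on a synthetic stream: two well-separated Gaussian blobs in the plane.
rng = random.Random(0)
stream = ([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
          + [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(500)])
cfs = list(grid_summary(stream, cell=1.0).values())
summary = [cf.centroid() for cf in cfs]   # plays the role of the summary S
weights = [cf.n for cf in cfs]            # each centroid weighted by its group size
centers = kmeanspp_seed(summary, weights, k=2, rng=rng)
print(len(summary), "weighted points summarize", len(stream), "stream points")
print("cost on P:", kmeans_cost(stream, centers))
print("cost on S:", kmeans_cost(summary, centers, weights))
```

Seeding touches only the weighted centroids, not the 1,000 stream points. Because `cost_to(c)` equals `cost_to(centroid) + n * ||centroid - c||^2`, replacing a group by its weighted centroid shifts that group's cost by a constant whenever all of its points go to the same center; the error comes from groups split between centers, which is what a coreset construction has to bound.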

Bibliographic Details

DOI: 10.1007/978-3-642-40450-4_41

URL ID: http://www.scopus.com/inward/record.url?partnerID=HzOxMe3b&scp=84884342599&origin=inward; http://dx.doi.org/10.1007/978-3-642-40450-4_41; http://link.springer.com/10.1007/978-3-642-40450-4_41; http://link.springer.com/content/pdf/10.1007/978-3-642-40450-4_41; https://dx.doi.org/10.1007/978-3-642-40450-4_41; https://link.springer.com/chapter/10.1007/978-3-642-40450-4_41

AUTHOR(S)

Hendrik Fichtenberger; Marc Gillé; Melanie Schmidt; Chris Schwiegelshohn; Christian Sohler

PUBLISHER(S)

Springer Science and Business Media LLC

TAG(S)

Mathematics; Computer Science
