PlumX Metrics
Embed PlumX Metrics

BICO: BIRCH meets coresets for k-means clustering

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), ISSN: 0302-9743, Vol: 8125 LNCS, Page: 481-492
2013
  • 49
    Citations
  • 0
    Usage
  • 19
    Captures
  • 0
    Mentions
  • 0
    Social Media
Metric Options:   Counts1 Year3 Year

Metrics Details

  • Citations
    49
    • Citation Indexes
      49
  • Captures
    19

Conference Paper Description

We design a data stream algorithm for the k-means problem, called BICO, that combines the data structure of the SIGMOD Test of Time award winning algorithm BIRCH [27] with the theoretical concept of coresets for clustering problems. The k-means problem asks for a set C of k centers minimizing the sum of the squared distances from every point in a set P to its nearest center in C. In a data stream, the points arrive one by one in arbitrary order and there is limited storage space. BICO computes high quality solutions in a time short in practice. First, BICO computes a summary S of the data with a provable quality guarantee: For every center set C, S has the same cost as P up to a (1 + ε)-factor, i. e., S is a coreset. Then, it runs k-means++ [5] on S. We compare BICO experimentally with popular and very fast heuristics (BIRCH, MacQueen [24]) and with approximation algorithms (Stream-KM++ [2], StreamLS [16,26]) with the best known quality guarantees. We achieve the same quality as the approximation algorithms mentioned with a much shorter running time, and we get much better solutions than the heuristics at the cost of only a moderate increase in running time. © 2013 Springer-Verlag.

Provide Feedback

Have ideas for a new metric? Would you like to see something else here?Let us know