Web-scale k-means clustering

D Sculley - Proceedings of the 19th International Conference on World Wide Web, 2010 - dl.acm.org
We present two modifications to the popular k-means clustering algorithm to address the extreme requirements for latency, scalability, and sparsity encountered in user-facing web applications. First, we propose the use of mini-batch optimization for k-means clustering. This reduces computation cost by orders of magnitude compared to the classic batch algorithm while yielding significantly better solutions than online stochastic gradient descent. Second, we achieve sparsity with projected gradient descent, and give a fast ε-accurate projection onto the L1-ball. Source code is freely available: http://code.google.com/p/sofia-ml
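The two ideas in the abstract can be illustrated with a minimal pure-Python sketch. It is not the sofia-ml implementation, and the abstract does not spell out either procedure, so the details here are assumptions: the mini-batch step uses a per-center learning rate of 1/count, and the L1-ball projection finds a soft-threshold value by bisection until the search interval is narrower than eps. All function and parameter names (mini_batch_kmeans, project_l1_ball, batch_size, radius, eps) are hypothetical.

```python
import random


def nearest(centers, x):
    """Return the index of the center closest to x (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda j: sum((c - v) ** 2 for c, v in zip(centers[j], x)),
    )


def mini_batch_kmeans(data, k, batch_size, iterations, seed=0):
    """Sketch of mini-batch k-means; the 1/count step size is an assumption."""
    rng = random.Random(seed)
    centers = [list(x) for x in rng.sample(data, k)]  # k distinct starting points
    counts = [0] * k
    for _ in range(iterations):
        # Draw a small random batch instead of scanning the full data set.
        batch = [rng.choice(data) for _ in range(batch_size)]
        # Assign the whole batch while the centers are held fixed.
        assignments = [nearest(centers, x) for x in batch]
        # Move each assigned center toward its point with a decaying step.
        for x, j in zip(batch, assignments):
            counts[j] += 1
            eta = 1.0 / counts[j]
            centers[j] = [(1 - eta) * c + eta * v for c, v in zip(centers[j], x)]
    return centers


def project_l1_ball(v, radius, eps=1e-9):
    """Approximately project v onto the L1 ball of the given radius.

    Assumed approach: bisection on a soft-threshold theta, stopping when the
    bracketing interval is narrower than eps.
    """
    if sum(abs(a) for a in v) <= radius:
        return list(v)  # already inside the ball
    lo, hi = 0.0, max(abs(a) for a in v)
    while hi - lo > eps:
        theta = (lo + hi) / 2
        if sum(max(abs(a) - theta, 0.0) for a in v) > radius:
            lo = theta  # threshold too small, norm still exceeds radius
        else:
            hi = theta  # threshold large enough, try a smaller one
    # Using hi guarantees the result's L1 norm does not exceed radius.
    return [(1 if a > 0 else -1) * max(abs(a) - hi, 0.0) for a in v]


if __name__ == "__main__":
    points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
    print(mini_batch_kmeans(points, k=2, batch_size=4, iterations=50))
    print(project_l1_ball([3.0, 1.0], radius=2.0))
```

Soft-thresholding sets small coordinates exactly to zero, which is how a projection onto the L1 ball can yield sparse cluster centers. In this sketch the projection would be applied to each center after its update.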