Title: Accelerating Clustering on the GPU Platform
Author: 帅吉红, 邓仰东
Jihong Shuai, Yandong Deng
Conference: iTAP 2012
Location: Wuhan, China
Issue Date: 18-20 Aug. 2012
Pages:
Abstract:
Data clustering is a fundamental problem in data analysis. The large volume of data to be clustered in real-world applications calls for ever faster processing. Although its data-parallel nature makes clustering well suited to GPU platforms, the data sets involved are often very large and exceed the memory available on an off-the-shelf graphics card. As a result, current data clustering methods require frequent data transfers between the host machine and the GPU. Such transfers are time-consuming, and coordinating them complicates the programming. In this paper, we accelerate the BIRCH method, which can handle large datasets with a relatively low memory requirement, on GPU platforms. We propose an efficient GPU implementation of the BIRCH algorithm and provide a solution for efficiently processing datasets larger than the capacity of GPU memory. Experimental results show that our implementation achieves a 50X speedup over the original CPU-based BIRCH and a running time that scales linearly with dataset size.
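
As an illustration of why BIRCH needs relatively little memory, the sketch below shows the clustering feature (CF) summary that the standard BIRCH algorithm uses: a triple of point count, per-dimension linear sum, and sum of squared norms. CFs are additive, so a dataset can be summarized one chunk at a time and the partial summaries merged. This is plain CPU Python and not the GPU implementation described in the abstract; the names `ClusteringFeature` and `summarize_in_chunks` and the chunking scheme are assumptions made for illustration only.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ClusteringFeature:
    """BIRCH-style summary of a set of points: (N, LS, SS)."""
    n: int            # number of points summarized
    ls: List[float]   # linear sum of the points, per dimension
    ss: float         # sum of squared Euclidean norms of the points

    @classmethod
    def empty(cls, dim: int) -> "ClusteringFeature":
        return cls(0, [0.0] * dim, 0.0)

    def add_point(self, point: Sequence[float]) -> None:
        """Absorb one point without storing it."""
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x
        self.ss += sum(x * x for x in point)

    def merge(self, other: "ClusteringFeature") -> None:
        """CF additivity: the summary of a union is the component-wise sum."""
        self.n += other.n
        for i, x in enumerate(other.ls):
            self.ls[i] += x
        self.ss += other.ss

    def centroid(self) -> List[float]:
        return [x / self.n for x in self.ls]

    def radius(self) -> float:
        """Root-mean-square distance of the summarized points to the centroid."""
        c = self.centroid()
        variance = self.ss / self.n - sum(x * x for x in c)
        return math.sqrt(max(variance, 0.0))  # clamp tiny negative rounding error


def summarize_in_chunks(points: Sequence[Sequence[float]],
                        chunk_size: int, dim: int) -> ClusteringFeature:
    """Hypothetical helper: summarize a dataset chunk by chunk.

    Each chunk stands in for a batch that fits in device memory; only the
    fixed-size CF of each chunk has to be kept after the chunk is processed.
    """
    total = ClusteringFeature.empty(dim)
    for start in range(0, len(points), chunk_size):
        partial = ClusteringFeature.empty(dim)
        for p in points[start:start + chunk_size]:
            partial.add_point(p)
        total.merge(partial)
    return total


if __name__ == "__main__":
    data = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
    cf = summarize_in_chunks(data, chunk_size=3, dim=2)
    print(cf.n, cf.centroid(), cf.radius())
```

Because a CF has a fixed size regardless of how many points it summarizes, the memory needed for the summary does not grow with the chunk count, which is the property that makes chunked processing of oversized datasets plausible in this setting.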