Biography
Enrollment Date: 2008
Graduation Date: 2013
Degree: Ph.D.
Defense Date: 2013.11.19
Advisor: Zhihua Wang
Department: Institute of Microelectronics, Tsinghua University
Title of Dissertation/Thesis: Research on Resources Scheduling for Irregular Applications on Graphics Processing Units
Abstract:
In recent years, GPUs (Graphics Processing Units) have been widely adopted in many scientific and engineering applications, such as graphics and image processing, scientific computing, multimedia, data mining, and financial computing. GPUs are inherently suited to regular applications because they follow the SIMD (Single Instruction, Multiple Data) execution model. However, the irregular patterns pervasive in computation and memory operations have become the performance bottleneck of GPU applications. Irregular patterns such as unbalanced workloads, divergent control flow, irregular memory accesses, and poor data locality appear in almost every aspect of the architecture design, so minimizing the overhead of handling them is critical to performance. This work addresses these obstacles from two perspectives: designing efficient algorithms and optimizing the microarchitecture. The contributions of this thesis are as follows:
(1) We analyze and optimize three irregular applications: sparse matrix-vector product (SMVP), string matching, and QR decomposition. For SMVP, we propose a technique that eliminates irregular memory accesses by expanding the vector; for string matching, we devise two techniques, data partitioning and data reordering, that address the irregular computation and memory access patterns simultaneously; for QR decomposition, we exploit pipelined parallelism while respecting the inherent data dependences. These techniques achieve an average speed-up of one order of magnitude over CPU implementations (a sketch of the baseline irregular access pattern in SMVP is given below the abstract).
(2) We conduct a systematic analysis of the characteristics of GPU programs. The analysis shows that irregular patterns lead to low utilization of GPU resources: on one hand, memory latency divergence produces unbalanced memory access latencies that leave the multiprocessors under-utilized; on the other hand, current cache management cannot adapt to complex memory access patterns, so GPU programs cannot fully exploit the cache resources.
(3) We develop a cache management policy, Effective Address Based Priority, and a memory scheduling policy, Divergence Aware Memory Scheduling. Together, these two microarchitecture techniques improve cache efficiency and reduce the impact of memory latency divergence. Experimental results show that the cache miss rate is reduced by 20% and system performance is improved by 30%.
(4) For the unbalanced task workloads found in stream processing, we develop a dynamic resource scheduling policy. The policy monitors the workload of each task and measures the amount of data transferred between tasks, so that computation and cache resources can be allocated to each task in a dynamically tuned manner. Experimental results show that this dynamic scheduling policy improves system performance by 20% compared to current GPU preemptive scheduling.
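The first contribution targets irregular memory access in SMVP. As a point of reference only, the following CUDA sketch shows a generic, unoptimized CSR SMVP kernel; the kernel name, parameter names, and CSR layout are illustrative assumptions, and the code exhibits the baseline irregular gather x[col[j]] rather than the thesis's vector-expansion technique.

// Generic CSR sparse matrix-vector product baseline (one thread per row),
// for illustration only; not the thesis's optimized implementation.
__global__ void spmv_csr(int num_rows,
                         const int   *row_ptr,   // CSR row offsets, length num_rows + 1
                         const int   *col,       // column indices of the nonzeros
                         const float *val,       // nonzero values
                         const float *x,         // dense input vector
                         float       *y)         // output vector
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < num_rows) {
        float sum = 0.0f;
        // Rows carry different numbers of nonzeros, so threads in a warp do
        // unequal work (workload imbalance), and the x[col[j]] loads are
        // scattered (irregular access, poor locality, memory latency divergence).
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
            sum += val[j] * x[col[j]];
        y[row] = sum;
    }
}

Launched with, for example, spmv_csr<<<(num_rows + 255) / 256, 256>>>(...), a long row keeps one thread busy while its warp mates sit idle; this is the kind of imbalance and latency divergence that the algorithmic techniques in contribution (1) and the microarchitectural policies in contributions (3) and (4) aim to mitigate.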