Biography
Enrollment Date: 2010
Graduation Date: 2016
Degree: Ph.D.
Defense Date: 2016.12.18
Advisors: Zhihua Wang
Department: Institute of Microelectronics, Tsinghua University
Title of Dissertation/Thesis: Research on Hand Gesture Interface for Wearable Applications
Abstract:
With the development of microelectronic technology, processors are achieving higher performance in smaller sizes and at lower power consumption, and computing devices are likewise becoming more capable while shrinking. In 2013, Google put a head-mounted wearable computing device, Google Glass, on people's heads. Since then, a variety of wearable devices with different applications have hit the market and begun to catch on. However, these devices lack a natural human-computer interface. To address this problem, we propose a hand-gesture-based interface for wearable applications. The main contributions of this dissertation include:
1. Considering that wearable devices have limited computational power, memory, and battery capacity, this work proposes a scan-line forest-growing hand segmentation framework with multi-priority stereo matching for wearable applications. To improve accuracy, all the objects in a scene are regarded as trees in a forest, so the hand segmentation problem becomes one of constructing the forest and finding the target tree (called the hand tree) that satisfies hand properties, including color consistency, spatial consistency, disparity, and hand-shape constraints. To reduce computational and memory costs, and to exploit the strong spatial bias of the hand's location in the first-person view of wearable devices, a Scan-line based Maximum Spanning Forest (SMSF) is proposed to construct the forest and segment the hand scan-line by scan-line, which is well suited to hardware implementation. To avoid losing global or local information in this scan-line scheme, History Information (HI), which carries information from high-confidence regions (the palm or forearm), propagates along the edges of the trees to aid segmentation and stereo matching in low-confidence regions (the fingers). To suppress interference from other skin-colored objects, an under-segmented vertex splitting step is used to find blurred boundaries. To obtain accurate depth features for segmentation at low complexity, a multi-priority stereo matching method is used to overcome the problems caused by the short distance (25 cm to 70 cm) between the hand and the device, such as large occlusions. Experimental results demonstrate that the hand is well segmented by our method: the F1 score and accuracy of the segmentation results are about 96.3% and 92.9%, respectively, and the method runs at 1.6 fps on a PC. A simplified code sketch of the scan-line growing idea is given after this list of contributions.
2. Building on the hand segmentation method above, this work first defines 14 types of natural, easily performed bare-hand gestures for wearable applications. To recognize these fourteen gestures, it then proposes a static hand gesture recognition method with low computational and memory costs. In the first-person view of wearable devices, the hand contour is the most significant feature, so the contour is chosen as the gesture feature and an SVM is used for classification. A Multi-Scale Weighted Histogram of Contour Direction (MSWHCD) based direction normalization is proposed to ensure good recognition performance. To keep computational and memory costs low, the proposed histogram counts only the directions of contour points, focusing on the most significant hand feature. Based on hand anatomy, the histogram is weighted by jointly considering each contour point's position and direction, which ensures robustness. Experimental results show that the proposed method achieves a recognition accuracy of 97.1% over all fourteen gestures at a frame rate of 30 fps on a PC; if five of the fourteen gestures are selected for normal use, the accuracy exceeds 99%. A sketch of a contour-direction feature in this spirit is also given after this list.
3. To the best of our knowledge, this work establishes the first dataset for hand segmentation and recognition in the first-person view captured with a stereo camera. The dataset covers the 14 types of hand gestures and contains 8,400 pairs of images (16,800 images in total) collected in six indoor and outdoor scenes. We labeled the ground truth of 300 image pairs for hand segmentation evaluation. We also segmented all the images to extract the hand regions using the proposed hand segmentation method, and labeled each image's gesture type for hand gesture recognition evaluation.
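The following is a minimal Python sketch of the scan-line growing idea behind contribution 1. It models only a union-find forest grown scan-line by scan-line with a plain color-consistency edge test and the bottom-center spatial prior of a first-person view; the disparity cues, hand-shape constraints, History Information, and vertex-splitting steps of the actual SMSF framework are not reproduced, and the threshold is an illustrative assumption.

    import numpy as np

    def find(parent, i):
        # Path-compressing find for the union-find forest.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def scanline_hand_mask(img, color_thresh=20.0):
        """img: H x W x 3 float array; returns a boolean hand mask."""
        h, w, _ = img.shape
        parent = np.arange(h * w)
        for y in range(h):  # grow the forest scan-line by scan-line
            for x in range(w):
                i = y * w + x
                # Link to the left neighbor when colors are consistent.
                if x > 0 and np.linalg.norm(img[y, x] - img[y, x - 1]) < color_thresh:
                    parent[find(parent, i)] = find(parent, i - 1)
                # Link to the previous scan-line in the same way.
                if y > 0 and np.linalg.norm(img[y, x] - img[y - 1, x]) < color_thresh:
                    parent[find(parent, i)] = find(parent, i - w)
        # First-person spatial prior: the forearm enters from the bottom of the
        # frame, so take the tree containing the bottom-center pixel as the hand.
        seed = find(parent, (h - 1) * w + w // 2)
        roots = np.array([find(parent, i) for i in range(h * w)])
        return (roots == seed).reshape(h, w)

In the dissertation's method the merges would additionally be ordered by edge weight (the maximum-spanning aspect) and fused with disparity from the stereo pair; this sketch only illustrates the data structure and the scan order.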
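Contribution 2 pairs a contour-direction histogram with an SVM. The exact multi-scale weighting and direction normalization of MSWHCD are not detailed in this abstract, so the Python sketch below (assuming OpenCV and scikit-learn) simply bins per-point tangent directions at a few step sizes; the bin count, scales, and SVM kernel are illustrative assumptions rather than the dissertation's settings.

    import cv2
    import numpy as np
    from sklearn.svm import SVC

    def contour_direction_histogram(mask, bins=16, scales=(1, 3, 5)):
        """mask: uint8 binary hand mask; returns a concatenated direction histogram."""
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
        if not contours:
            return np.zeros(bins * len(scales))
        pts = max(contours, key=cv2.contourArea).reshape(-1, 2).astype(np.float64)
        feats = []
        for s in scales:  # larger steps estimate the tangent at a coarser scale
            d = np.roll(pts, -s, axis=0) - pts
            ang = np.arctan2(d[:, 1], d[:, 0])  # direction at each contour point
            hist, _ = np.histogram(ang, bins=bins, range=(-np.pi, np.pi))
            feats.append(hist / max(hist.sum(), 1))  # per-scale normalization
        return np.concatenate(feats)

    # Usage sketch: masks is a list of hand masks, labels their gesture types (0-13).
    # clf = SVC(kernel="rbf").fit([contour_direction_histogram(m) for m in masks], labels)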
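Finally, given the 300 labeled ground-truth pairs described in contribution 3, the pixel-level F1 score and accuracy quoted above could be computed per image as in the generic Python sketch below; this is standard metric code, not the dissertation's evaluation implementation.

    import numpy as np

    def segmentation_scores(pred, gt):
        """pred, gt: boolean hand masks of equal shape."""
        tp = np.logical_and(pred, gt).sum()    # hand pixels found correctly
        fp = np.logical_and(pred, ~gt).sum()   # background marked as hand
        fn = np.logical_and(~pred, gt).sum()   # hand pixels missed
        tn = np.logical_and(~pred, ~gt).sum()  # background found correctly
        precision = tp / max(tp + fp, 1)
        recall = tp / max(tp + fn, 1)
        f1 = 2 * precision * recall / max(precision + recall, 1e-9)
        accuracy = (tp + tn) / pred.size
        return f1, accuracy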