Widar3.0
Zero-Effort Cross-Domain Gesture Recognition with Wi-Fi

Yue Zheng1, Yi Zhang1, Kun Qian1, Guidong Zhang1, Yunhao Liu1,3, Chenshu Wu2, Zheng Yang1
1Tsinghua University  2University of Maryland, College Park  3Michigan State University
Motivation

Wi-Fi based sensing enables a wide range of applications:
– Fewer privacy concerns.
– No on-body sensors required.
– More ubiquitous deployment and a larger sensing range.

Applications: Smart Home, Virtual Reality, Security Surveillance.

Wi-Fi is already widely deployed!
State-of-the-Art Works

– E-eyes: a pioneering work that uses the strength distribution of commercial Wi-Fi signals and KNN to recognize human activities.
– CARM: calculates the power distribution of Doppler Frequency Shift (DFS) components as learning features of an HMM model.
– WiMU: segments DFS power profiles for multi-person activity recognition.

They use primitive signal features that usually carry environment information unrelated to gestures.
State-of-the-Art Works

Cross-domain gesture recognition:
– CrossSense (Zhang et al., MobiCom’18)
– EI (Jiang et al., MobiCom’18)
– WiAG (Virmani et al., MobiSys’17)

All require extra training effort each time a new target domain is added to the recognition model.

Domain: location, orientation, environment.
Key Idea

Can we avoid model retraining for cross-domain recognition?
– Yes! We push the generalization ability down to the lower signal level, rather than the upper model level.
– Extract domain-independent features.
– Train once, use anywhere.
System Overview

C1: How to define a domain-independent feature in theory?
C2: How to estimate the feature in practice with collected Wi-Fi measurements?
C3: How to devise the recognition model to fully capture the characteristics of the new feature?
Our Prior Efforts

Widar (MobiHoc ’17):
– Models the relation among a person’s walking velocity, location, and DFS, and pinpoints the person passively.
– Achieves decimeter-level accuracy with only one commercial Wi-Fi sender and two receivers.

Widar2.0:
– Proposes a unified model of ToF, AoA, and DFS, and devises an efficient algorithm for their joint estimation.
– With fine-grained range and AoA provided by a single link, directly localizes the moving person at the decimeter level.

[Figure: Tx/Rx geometry showing the LoS path, antenna array baseline, AoA, ToF, and DFS ellipse curves and rays.]
Our Prior Efforts
Prior works regard a person as a single point, which is infeasible for recognizing complex gestures that involve multiple body parts. We need to define a new feature!
Anticipated Properties of Signal Features for Finer-Grained Tasks

– Domain-independent: capture only human actions rather than domain factors (location, orientation, environment, etc.).
– Zero-effort: no model re-training for a new domain.
– Fine-grained: contain multiple signal components that correspond to different body parts.
Our Solution

Body-coordinate Velocity Profile (BVP):
– The same gesture may exhibit different velocity distributions in the global coordinate system, but a consistent distribution in the body coordinate system.
– The transformation can be achieved with knowledge of the device locations and of the user’s location and orientation.
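The coordinate transformation above is a plane rotation. A minimal sketch, assuming the user's orientation is known as an angle in the global frame (the function name and toy values are illustrative, not from the original system):

```python
import math

def global_to_body(vx, vy, orientation_rad):
    """Rotate a velocity vector from the global frame into the
    user's body frame, given the user's facing orientation."""
    c, s = math.cos(orientation_rad), math.sin(orientation_rad)
    # Body-frame axes are the global axes rotated by `orientation_rad`,
    # so the vector itself is rotated by the negative angle.
    bx = c * vx + s * vy
    by = -s * vx + c * vy
    return bx, by

# A forward push looks different in the global frame depending on
# orientation, but identical in the body frame:
print(global_to_body(1.0, 0.0, 0.0))           # facing +x, pushing +x
print(global_to_body(0.0, 1.0, math.pi / 2))   # facing +y, pushing +y
```

Both calls yield (approximately) the same body-frame velocity, which is exactly the domain-independence the BVP is built on.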
One-Link DFS and BVP

The DFS profile of link j, E^(j), and the vectorized BVP W, which includes multiple velocity components, can be modeled as:

    E^(j) = d^(j) B^(j) W

where
– d^(j) is a scaling factor due to propagation loss;
– B^(j) is the assignment matrix, with B^(j)_{k,l} = 1 if f_k = f^(j)(v_l), and 0 otherwise;
– E^(j) is G × 1, where G is the number of sampling points in the frequency domain;
– W is O² × 1, where O is the number of sampling points in the velocity domain.
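The assignment-matrix model can be made concrete with a toy numerical sketch. The velocity grid, frequency bins, and the per-link velocity-to-DFS mapping `link_freq` below are simplified stand-ins for the real signal geometry, not the paper's actual parameters:

```python
import numpy as np

def assignment_matrix(velocities, freq_bins, link_freq):
    """Build the binary assignment matrix B^(j) for one link:
    B[k, l] = 1 iff velocity component v_l induces a DFS that falls
    into frequency bin f_k. `link_freq` maps a 2-D velocity to the
    DFS it produces on this link (geometry-dependent)."""
    G, L = len(freq_bins), len(velocities)
    B = np.zeros((G, L))
    bin_width = freq_bins[1] - freq_bins[0]
    for l, v in enumerate(velocities):
        f = link_freq(v)
        k = int(round((f - freq_bins[0]) / bin_width))
        if 0 <= k < G:
            B[k, l] = 1.0
    return B

# Toy link: DFS proportional to the velocity component along the x axis.
link_freq = lambda v: 10.0 * v[0]
velocities = [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)]  # coarse velocity grid
freq_bins = np.arange(-20.0, 21.0, 10.0)            # G = 5 frequency bins
B = assignment_matrix(velocities, freq_bins, link_freq)
W = np.array([0.0, 0.0, 1.0])                       # one active component
d = 0.5                                             # propagation-loss scale
E = d * B @ W                                       # modeled DFS profile E^(j)
```

With one active velocity component, the modeled DFS profile has a single non-zero bin, scaled by the propagation-loss factor d.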
From Multiple DFS to BVP

– The DFS profile of a single link depicts only radial velocity components [1].
– DFS profiles from multiple links are utilized to fully recover the BVP.

[1] Widar, MobiHoc ’17
Problems of BVP Estimation

The estimation problem is under-determined:
– DFS profiles from multiple links provide far fewer constraints than the number of variables to be estimated in the BVP.
– Only a small number of velocity components are significant in each BVP snapshot, which makes sparse recovery possible.
Optimization of BVP Estimation

The BVP is estimated by solving:

    min_W  Σ_{j=1..N} EMD(B^(j) W, E^(j)) + θ ||W||_0

– The sparsity of the number of velocity components is coerced by the term θ ||W||_0.
– EMD (Earth Mover’s Distance) resolves the unknown scaling factor caused by propagation loss of the reflected signal and relieves quantization error in BVP.
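The sparse structure of this objective can be illustrated with a greedy solver. This is only a sketch: it replaces the EMD loss with an l2 residual and the l0 penalty with a hard sparsity budget (orthogonal matching pursuit), so it shows the flavor of the recovery, not the paper's actual algorithm:

```python
import numpy as np

def omp(B_list, E_list, sparsity):
    """Greedy sketch of BVP estimation: pick at most `sparsity`
    velocity components that best explain the DFS profiles of all
    links, refitting on the chosen support by least squares."""
    B = np.vstack(B_list)          # stack constraints from all links
    E = np.concatenate(E_list)
    support, residual = [], E.copy()
    coef = np.zeros(0)
    for _ in range(sparsity):
        # Choose the column most correlated with the residual.
        scores = np.abs(B.T @ residual)
        scores[support] = -np.inf
        support.append(int(np.argmax(scores)))
        # Refit on the current support with least squares.
        coef, *_ = np.linalg.lstsq(B[:, support], E, rcond=None)
        residual = E - B[:, support] @ coef
    W = np.zeros(B.shape[1])
    W[support] = coef
    return W

# Toy demo: one link, identity dictionary, one active velocity component.
B1 = np.eye(4)
true_W = np.array([0.0, 2.0, 0.0, 0.0])
W_hat = omp([B1], [B1 @ true_W], sparsity=1)
```

Stacking the per-link matrices is how multiple links add constraints to the otherwise under-determined problem described above.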
Comparison of Signal Features

– Example gesture: Pushing and Pulling.
– Two domains: Domain-1 (location #1, environment #1) and Domain-2 (location #2, environment #2).

[Figure: CSI, DFS, and BVP of the same gesture in the two domains.]

CSI and DFS of the same gesture are likely to vary across different domains, but BVP stays consistent!
BVP Examples

[Figure: BVP snapshots for Pushing & Pulling, Clapping, and Sliding.]
Gesture Recognition Model

A CNN-GRU network is devised to capture the characteristics of BVP:
– CNN extracts spatial features from each single BVP snapshot.
– GRU captures temporal dependencies among BVP snapshots, and is easier to train with less data.

With the help of BVP, even a simple recognition model is effective.
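The temporal half of the model can be sketched with a single GRU update in plain numpy. The dimensions below are toy values, and the per-snapshot CNN stage is replaced by a flat feature vector, so this only illustrates how the recurrence aggregates a gesture over time:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, params):
    """One GRU update: gates decide how much of the previous hidden
    state (summary of past BVP snapshots) to keep vs. overwrite."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = sigmoid(Wz @ x + Uz @ h)            # update gate
    r = sigmoid(Wr @ x + Ur @ h)            # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))
    return (1 - z) * h + z * h_tilde

# Toy dimensions: each BVP snapshot is flattened to a 16-dim vector
# (standing in for the CNN's per-snapshot spatial features).
feat_dim, hidden = 16, 8
params = tuple(rng.standard_normal(s) * 0.1
               for s in [(hidden, feat_dim), (hidden, hidden)] * 3)
h = np.zeros(hidden)
for snapshot in rng.standard_normal((5, feat_dim)):  # 5-snapshot gesture
    h = gru_step(snapshot, h, params)
# `h` now summarizes the whole gesture sequence for classification.
```

In the full system, the final hidden state would feed a softmax classifier over gesture classes.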
Experiment

– Devices: mini-desktops with Intel 5300 NICs.
– 3 scenarios: classroom, hall, office.

[Figure: deployment layouts showing the sensing area, Tx/Rx positions, 5 locations (1–5), and 5 orientations (A–E) in (a) classroom, (b) hall, (c) office.]
Overall Accuracy

– Dataset: 5 positions × 5 orientations × 6 gestures × 5 instances.
– Gestures include sliding, drawing circle, and drawing zigzag.
– Widar3.0 maintains high recognition accuracy across different domains.
Method Comparison

Widar3.0 is compared with state-of-the-art cross-domain learning methodologies:
– It does not require extra data from a new domain or model re-training.
– Even simple learning models are effective with BVP as input.

[Figure: comparison across different approaches, different inputs, and different learning models.]
Parameter Study

Impact of training set diversity: accuracy improves as the number of training users varies from 1 to 7.
– More data to train the learning model.
– More likely to reduce the behavior difference between testing persons and training persons.
Conclusion

– Widar3.0 aims at recognizing complex gestures that involve multiple body parts, rather than regarding a person as a single point.
– We propose a domain-independent feature, BVP.
– With BVP as input, the recognition model requires no extra data collection or model retraining when a new domain is added.
– With the spatial-temporal characteristics of BVP fully captured, the system achieves high recognition accuracy across different domain factors: 89.7%, 82.6%, and 92.4% across the user’s location, orientation, and environment, respectively.
– The dataset is publicly available.
Data Availability

The dataset includes features (e.g., DFS and BVP) of 258K instances, with a total duration of 8,620 minutes, from 75 domains.

It can be found at http://tns.thss.tsinghua.edu.cn/widar3.0/index.html
Yue Zheng Tsinghua University
cczhengy@gmail.com