

SLIDE 1

Yue Zheng1, Yi Zhang1, Kun Qian1, Guidong Zhang1 , Yunhao Liu1,3 , Chenshu Wu2, Zheng Yang1

1Tsinghua University 2University of Maryland, College Park 3Michigan State University

Widar3.0

Zero-Effort Cross-Domain Gesture Recognition with Wi-Fi

SLIDE 2

Motivation

  • Human gesture recognition is a core enabler for a wide range of applications.
  • RF radios vs. cameras, wearable devices, and ultrasound:
    – Fewer privacy concerns.
    – No on-body sensors required.
    – More ubiquitous deployment and a larger sensing range.

Applications: Smart Home, Virtual Reality, Security Surveillance.

Wi-Fi is currently widely deployed!

SLIDE 3

State-of-the-Art Works

  • E-eyes (Wang et al., MobiCom’14)
    – a pioneering work that uses the strength distribution of commercial Wi-Fi signals and KNN to recognize human activities.
  • CARM (Wang et al., MobiCom’15)
    – calculates the power distribution of Doppler Frequency Shift (DFS) components as learning features for an HMM model.
  • WIMU (Venkatnarayan et al., MobiSys’18)
    – segments DFS power profiles for multi-person activity recognition.

These works use primitive signal features that usually carry environment information unrelated to gestures.

SLIDE 4

State-of-the-Art Works

  • Explore the cross-domain generalization ability of the recognition model.
    – CrossSense (Zhang et al., MobiCom’18)
    – EI (Jiang et al., MobiCom’18)
  • Generate signal features of the target domain for model re-training.
    – WiAG (Virmani et al., MobiSys’17)

Cross-domain Gesture Recognition. Domain: location, orientation, environment.

All of these require extra training effort each time a new target domain is added to the recognition model.

SLIDE 5

Key Idea

  • Can we avoid extra data collection or model re-training for cross-domain recognition?
    – Yes! We push the generalization ability down to the lower signal level, rather than the upper model level.
    – Extract domain-independent features.
    – Train once, use anywhere.

SLIDE 6

System Overview

C1: How can a domain-independent feature be defined in theory?
C2: How can the feature be estimated in practice from collected Wi-Fi measurements?
C3: How can the recognition model be devised to fully capture the characteristics of the new feature?

SLIDE 7

Our Prior Efforts

  • Widar (MobiHoc’17)
    – models the relation among a person’s walking velocity, location, and DFS, and pinpoints the person passively.
    – achieves decimeter-level accuracy with only one commercial Wi-Fi sender and two receivers.

SLIDE 8

Our Prior Efforts

[Figure: Tx-Rx link geometry showing the LoS path, array baseline, AoA, ToF, DFS, and the reflection ellipse]

  • Widar2.0 (MobiSys’18)
    – proposes a unified model of ToF, AoA, and DFS, and devises an efficient algorithm for their joint estimation.
    – with the fine-grained range and AoA provided by a single link, it directly localizes the moving person at decimeter level.

Prior works regard a person as a single point, which is infeasible for recognizing complex gestures that involve multiple body parts. We need to define a new feature!

SLIDE 9

Anticipated Properties of Signal Features for Finer-Grained Tasks

  • Domain-independent
    – captures only human actions, not domain factors (location, orientation, environment, etc.).
  • Zero-effort
    – requires no model re-training for a new domain.
  • Finer-grained
    – contains multiple signal components that correspond to different body parts.

SLIDE 10

Our Solution

  • BVP: Body-coordinate Velocity Profile
    – The same gesture may exhibit different velocity distributions in the global coordinate system.
    – Transformation into the body coordinate system can be achieved given the locations of the devices and the location and orientation of the user.
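The transformation amounts to a 2-D rotation by the user's orientation. A minimal numpy sketch, assuming the orientation angle is known (function names and the body-frame convention of "forward = +x" are illustrative, not the paper's exact formulation):

```python
import numpy as np

def global_to_body(v_global, orientation_rad):
    """Rotate a 2-D velocity vector from the global coordinate
    system into the user's body coordinate system.

    orientation_rad: the user's facing direction in the global
    frame (assumed known; Widar3.0 obtains the user's location
    and orientation as auxiliary input).
    """
    c, s = np.cos(orientation_rad), np.sin(orientation_rad)
    # Rotating the frame by +theta rotates coordinates by -theta.
    rot = np.array([[c, s],
                    [-s, c]])
    return rot @ np.asarray(v_global, dtype=float)

# A user facing +y (90 degrees) pushing forward (+y in the global
# frame) sees that motion as "forward" (+x) in the body frame:
v_body = global_to_body([0.0, 1.0], np.pi / 2)   # -> approx [1.0, 0.0]
```

Applying this rotation to every velocity component is what makes the resulting profile independent of where the user stands and which way they face.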

SLIDE 11
One-Link DFS and BVP

  • The relation between the DFS profile of the j-th link, E(j), and the vectorized BVP W, which includes multiple velocity components, can be modeled as:

    E(j) = d(j) B(j) W

    – d(j): scaling factor due to propagation loss.
    – B(j): assignment matrix, with B(j)[k, l] = 1 if velocity component v_l projects onto frequency bin f_k of link j, and 0 otherwise.
    – E(j): G × 1, where G is the number of sampling points in the frequency domain.
    – W: O² × 1, where O is the number of sampling points along each axis of the velocity domain.
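The model above can be made concrete with a toy numerical sketch (the sizes and the bin mapping `f_bin` are made up for illustration; the real mapping is determined by the link geometry):

```python
import numpy as np

# Toy sketch of E(j) = d(j) B(j) W for one link.
G = 4          # frequency-domain sampling points
O = 3          # velocity sampling points per axis -> W is O^2 x 1
d_j = 0.5      # scaling factor from propagation loss

# f_bin[l]: frequency bin that velocity component l projects to
# on this link (illustrative values, length O^2 = 9).
f_bin = np.array([0, 2, 2, 1, 3, 0, 1, 2, 3])

# Assignment matrix: B[k, l] = 1 iff velocity bin l maps to
# frequency bin k; exactly one 1 per column.
B = np.zeros((G, O * O))
B[f_bin, np.arange(O * O)] = 1.0

W = np.random.rand(O * O)      # vectorized BVP
E = d_j * B @ W                # predicted DFS profile for this link

# Since each column of B sums to 1, total power is preserved
# up to the scaling factor:
assert np.isclose(E.sum(), d_j * W.sum())
```

Each frequency bin of E thus accumulates the power of every velocity component that happens to project onto it, which is why a single link cannot be inverted on its own.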

SLIDE 12

From Multiple DFS to BVP

  • DFS from one link only depicts radial velocity components. [1]
  • DFS from multiple links is utilized to fully recover the BVP.

[1] Widar, MobiHoc ’17
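The radial-only nature of a single link follows from the standard bistatic Doppler model: the frequency shift depends on the velocity's projection onto the sum of the unit vectors toward Tx and Rx. A sketch under that model (the geometry and the 5.825 GHz carrier are illustrative assumptions):

```python
import numpy as np

def doppler_hz(p, v, tx, rx, f_c=5.825e9, c=3e8):
    """Bistatic Doppler of a reflector at position p (m) moving
    with velocity v (m/s), for a given Tx-Rx link. Standard
    geometric model: DFS is proportional to the rate of change
    of the total reflection path length."""
    p, v, tx, rx = (np.asarray(a, float) for a in (p, v, tx, rx))
    u_tx = (p - tx) / np.linalg.norm(p - tx)
    u_rx = (p - rx) / np.linalg.norm(p - rx)
    dL_dt = v @ (u_tx + u_rx)        # path-length change rate
    return -f_c / c * dL_dt          # shortening path -> positive DFS

# Motion along the line through Tx and Rx keeps the total path
# length constant, so this velocity component is invisible to
# the link -- one link alone cannot recover the full 2-D velocity:
f = doppler_hz(p=[0, 1], v=[1, 0], tx=[-1, 1], rx=[1, 1])   # -> 0.0
```

Links placed at different positions project the same velocity onto different directions, which is what lets multiple links jointly pin down the full BVP.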

SLIDE 13

Problems of BVP Estimation

  • The equation system E(j) = d(j) B(j) W is severely under-determined.
    – DFS profiles from multiple links provide far fewer constraints than the number of variables to be estimated in the BVP.
  • Only a few dominant velocity components exist in each BVP snapshot.

SLIDE 14

Optimization of BVP Estimation

  • We adopt sparse recovery to estimate the BVP.
  • We formulate BVP estimation as an ℓ0 optimization problem:

    min_W  Σ_{j=1..N} EMD(B(j) W, E(j)) + θ‖W‖₀

    – The term θ‖W‖₀ enforces sparsity in the number of velocity components.
    – EMD (Earth Mover’s Distance) resolves the unknown scaling factor caused by the propagation loss of the reflected signal and relieves quantization error in the BVP.
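A toy sketch of evaluating this objective, using the closed-form 1-D EMD (sum of absolute CDF differences) and a nonzero count for the ℓ0 term; the paper's actual solver is not reproduced here, and all sizes and values are illustrative:

```python
import numpy as np

def emd_1d(p, q):
    """EMD between two 1-D distributions on the same grid: the sum
    of absolute CDF differences. Normalizing first cancels the
    unknown scaling factor d(j)."""
    p = p / p.sum()
    q = q / q.sum()
    return np.abs(np.cumsum(p) - np.cumsum(q)).sum()

def objective(W, B_list, E_list, theta=0.1):
    """Sum over links of EMD(B(j) W, E(j)) plus the sparsity
    penalty theta * ||W||_0."""
    fit = sum(emd_1d(B @ W, E) for B, E in zip(B_list, E_list))
    return fit + theta * np.count_nonzero(W)

# One link, G = 3 frequency bins, 4 velocity bins:
B = np.array([[1., 0., 0., 1.],
              [0., 1., 0., 0.],
              [0., 0., 1., 0.]])
E = np.array([0.5, 0.25, 0.25])
W_sparse = np.array([0.5, 0.25, 0.25, 0.0])   # reproduces E exactly
loss = objective(W_sparse, [B], [E])          # 0 fit error + 0.1 * 3
```

Because EMD compares normalized distributions, a candidate W is judged only on the shape of its predicted DFS profile, not its absolute magnitude.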

SLIDE 15

Comparison of Signal Features

  • Investigate raw CSI, DFS, and BVP.
    – Example gesture: Pushing and Pulling.
    – Two domains.

SLIDE 16

Comparison of Signal Features

[Figure: CSI, DFS, and BVP of the same gesture in Domain 1 (orientation #1, location #1, environment #1) and in Domain 2 (orientation #2, location #2, environment #2)]

CSI and DFS of the same gesture are likely to vary across different domains, but the BVP stays consistent!

SLIDE 17

BVP Examples

[Figure: BVP examples for Pushing & Pulling, Clapping, and Sliding]

SLIDE 18

Gesture Recognition Model

  • A hybrid CNN+RNN model is designed to fully capture the characteristics of the BVP.
    – The CNN extracts spatial features from each single BVP snapshot.
    – The GRU captures temporal dependencies among BVP snapshots, and is easier to train with less data.

With the help of the BVP, this simple recognition model is effective.
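The recurrent half can be sketched as a single GRU cell stepped over per-snapshot feature vectors (e.g. flattened CNN outputs). A minimal numpy sketch; the weight shapes, dimensions, and naming are illustrative, not the paper's exact architecture:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, params):
    """One GRU step over a per-snapshot feature vector x and the
    previous hidden state h."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = sigmoid(Wz @ x + Uz @ h)               # update gate
    r = sigmoid(Wr @ x + Ur @ h)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))   # candidate state
    return (1 - z) * h + z * h_tilde           # blend old and new

rng = np.random.default_rng(0)
d_in, d_h = 8, 4                               # toy dimensions
# Even indices: input->hidden weights; odd: hidden->hidden.
params = [rng.standard_normal((d_h, d_in)) if i % 2 == 0
          else rng.standard_normal((d_h, d_h)) for i in range(6)]

# Run the GRU across a sequence of T = 5 snapshot features:
h = np.zeros(d_h)
for x in rng.standard_normal((5, d_in)):
    h = gru_step(x, h, params)
# h now summarizes the temporal dynamics of the whole gesture
# and would feed a final classification layer.
```

Compared with an LSTM, the GRU keeps a single state vector and three weight pairs, which is part of why it trains well on small gesture datasets.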

SLIDE 19

Experiment

  • Implementation
    – Mini-desktops with Intel 5300 NICs.
  • Setup
    – 3 scenarios: classroom, hall, office.

[Figure: sensing-area layouts for (a) Classroom, (b) Hall, and (c) Office, each with Tx and Rx devices around the sensing area, five locations (1–5), five orientations (A–E), and grid spacings of 0.5 m, 0.9 m, and 2 m]

SLIDE 20

Overall Accuracy

  • Dataset: 12,000 gesture samples (16 users × 5 positions × 5 orientations × 6 gestures × 5 instances).
  • Gestures: pushing and pulling, sweeping, clapping, sliding, drawing circle, and drawing zigzag.
  • Widar3.0 achieves consistently high accuracy across different domains.

SLIDE 21

Method Comparison

  • Widar3.0 outperforms the state-of-the-art cross-domain learning methodologies.
    – It requires neither extra data from a new domain nor model re-training.
  • BVP outperforms both denoised CSI and DFS.
  • The proposed recognition model is simple but effective with BVP as input.

Comparisons cover different approaches, different inputs, and different learning models.

SLIDE 22

Parameter Study

  • The accuracy increases from 74% to 89% as the number of training users varies from 1 to 7.
    – More data to train the learning model.
    – More likely to reduce behavioral differences between testing and training persons.

Impact of training-set diversity.

SLIDE 23

Conclusion

  • From Widar and Widar2.0 to Widar3.0
    – Widar3.0 aims at recognizing complex gestures that involve multiple body parts, rather than regarding a person as a single point.
  • Zero-effort cross-domain gesture recognition system
    – We propose the domain-independent feature, BVP.
    – With BVP as input, the recognition model requires no extra data collection or model re-training when a new domain is added.
    – With the spatial-temporal characteristics of BVP fully captured, the system achieves high recognition accuracy across different domain factors: specifically, 89.7%, 82.6%, and 92.4% across users' locations, orientations, and environments, respectively.
    – The dataset is publicly available.

SLIDE 24

Data Availability

  • We collected a hand gesture dataset consisting of raw Wi-Fi readings (CSI) and other sophisticated features (e.g., DFS and BVP): 258K instances, 8,620 minutes in total, from 75 domains.
  • The dataset and the Widar series of works can be found at http://tns.thss.tsinghua.edu.cn/widar3.0/index.html

SLIDE 25


Yue Zheng Tsinghua University

cczhengy@gmail.com