Mobile AR/VR with Edge-based Deep Learning - Jiasi Chen - PowerPoint PPT Presentation



SLIDE 1

Mobile AR/VR with Edge-based Deep Learning

Jiasi Chen
Department of Computer Science & Engineering, University of California, Riverside
CNSM, Oct. 23, 2019
SLIDE 2

Outline

  • What is AR/VR?
  • Edge computing can provide...
  • 1. Real-time object detection for mobile AR
  • 2. Bandwidth-efficient VR streaming with deep learning
  • Future directions

2

SLIDE 3

What is AR/VR?

3

SLIDE 4

Multimedia is…

4

Pipeline: content creation → compression → storage → distribution (Internet) → end users
Media types: audio, on-demand video, live video, virtual and augmented reality

SLIDE 5

What is AR/VR?

5

The reality-virtuality continuum: reality | augmented reality | augmented virtuality | virtual reality (the middle of the continuum is "mixed reality")

SLIDE 6

Who’s Using Virtual Reality?

High-end hardware: HTC Vive, PlayStation VR
Smartphone-based hardware: Google Daydream, Google Cardboard

6

SLIDE 7

Why VR now?

7

Movies: (1) have to go somewhere → (2) watch it at home → (3) carry it with you
VR: CAVE (1992) → Virtuality gaming (1990s) → Oculus Rift (2016)

Similar portability trend for VR, driven by hardware advances from the smartphone revolution.

SLIDE 8

Who’s Using Augmented Reality?

High-end hardware: Microsoft HoloLens, Google Glass
Smartphone-based: Pokemon Go, Snapchat filters (face detection), Google Translate (text processing)

8

SLIDE 9

Is it all just fun and games?

  • AR/VR has applications in many areas: education, data visualization, public safety
  • What are the engineering challenges?
  • AR: process input from the real world (related to computer vision, robotics)
  • VR: output the virtual world to your display (related to computer graphics)

9

SLIDE 10

How AR/VR Works

VR:
  • 1. Virtual world generation
  • 3. Render
  • 4. Display

AR:
  • 1. Device tracking
  • 2. Real object detection
  • 4. Render
  • 5. Display

10

SLIDE 11

What systems functionality is currently available in AR/VR?

11

SLIDE 12

Systems Support for VR

VR pipeline: 1. Virtual world generation → 3. Render → 4. Display

  • Game engines: Unity, Unreal
  • Mobile GPU
  • Qualcomm VR/AR chips

12

SLIDE 13

Systems Support for AR

AR pipeline: 1. Device tracking → 2. Real object detection → 4. Render → 5. Display

  • Platforms: Google ARCore, Apple ARKit, Microsoft HoloLens, Magic Leap, smartphones
  • Computer vision / machine learning libraries: Vuforia, OpenCV, TensorFlow

13

SLIDE 14

What AR/VR functionality is needed by researchers?

14

SLIDE 15

Research Space in AR

AR pipeline: 1. Device tracking → 2. Real object detection → 4. Render → 5. Display

Real object detection: typically done using deep learning (in research; not yet in industry)
  • Slow: ~600 ms per frame on a smartphone
  • Energy drain: ~1% battery per minute on a smartphone
  • MARLIN (SenSys’19), Liu et al. (MobiCom’19), DeepDecision (INFOCOM’18), DeepMon (MobiSys’17)

Device tracking: typically done using SLAM (combining camera + IMU sensors)
  • Slow: ~30 ms per frame on a smartphone
  • Energy drain: > 1.5 W on a smartphone
  • ShareAR (HotNets’19), MARVEL (SenSys’18), OverLay (MobiSys’15)

Can edge computing help?

15

SLIDE 16

Research Space in AR

(Figures: example of slow object detection; comparison of different apps’ energy drain)

16

Take-home message: Machine learning is useful in AR

  • As part of the AR processing pipeline (object detection)
  • At the expense of energy
SLIDE 17

Research Space in VR

  • 1a. Virtual world generation (on a content/edge server)
  • 1b. Transmission over the network (Internet)
  • 3. Render
  • 4. Display (on the mobile device)

Rubiks (MobiSys’18), FLARE (MobiCom’18), Characterization (SIGCOMM workshop’17), FlashBack (MobiSys’16)

High bandwidth: Up to 25 Mbps on YouTube at max resolution

Can machine learning help with VR traffic optimization?

17

Take-home message: Machine learning is useful in VR

  • To help with user predictions, traffic management
SLIDE 18

Outline

  • Overview of AR/VR
  • Edge computing can provide...
  • 1. Real-time object detection for mobile AR
  • 2. Bandwidth-efficient VR streaming with deep learning
  • Future directions

18

SLIDE 19

How AR Works

  • 1. Device tracking
  • 2. Real object detection
  • 4. Render
  • 5. Display
  • Object detection is a computational bottleneck for AR
  • Current AR is only able to detect flat planes or specific object instances
  • Can we do more powerful processing on a server?

19

SLIDE 20

Reducing lag for augmented reality

  • Augmented and virtual reality require a lot of computational power
  • Run expensive computer vision and machine learning algorithms

20

Run on the device? Too slow!
Run on the cloud (datacenter across the Internet, e.g. AWS)? Too far → too slow!
Run on the edge (edge compute node)?

Xukan Ran, Haoliang Chen, Xiaodan Zhu, Zhenming Liu, Jiasi Chen, “DeepDecision: A Mobile Deep Learning Framework”, IEEE INFOCOM, 2018.

SLIDE 21

Challenges with current approaches

  • Current approaches for machine learning on mobile devices:
  • Local-only processing (Apple Photos, Google Translate; GPU speedup): slow! (~600 ms/frame)
  • Remote-only processing (Apple Siri, Amazon Alexa): doesn’t work when the network is bad
  • Our observations:
  • Different AR apps have different accuracy and latency requirements
  • Network latency is often higher than CPU/GPU processing time on the edge server
  • Video streams and deep learning models can scale gracefully

21

SLIDE 22

Problem Statement

  • Problem: How should the mobile device be configured to meet the lag requirements of the AR app and the user?
  • Solution: Periodically profile, optimize, and update the configuration

22

  • 1. Offline performance characterization
  • 2. Online optimization
  • 3. Update the configuration

SLIDE 23

Optimize decision

Constraints:
  • Current network conditions
  • Application requirements

Degrees of freedom:
  • Offloading decision
  • Neural net model size
  • Video resolution

Metrics:
  • Detection accuracy
  • Time
  • Energy consumption

Online decision framework

23

  • Video characteristics: frame rate, resolution, bit rate
  • Deep learning characteristics: model size, model latency / energy, model accuracy
  • Network condition: bandwidth, latency
  • App requirements: latency, accuracy, energy
SLIDE 24

System design

24

Front-end device: runs tiny deep learning on the input live video; shows the output display. Edge server: runs big deep learning. The online decision framework combines the offline performance characterization with the user’s battery constraint, the current network conditions, and the app’s latency and accuracy requirements.

SLIDE 25

AR Object Detection Quality Metrics

25

  • Accuracy
  • Classification and location both important for AR
  • Intersection over union (IoU) metric
  • Ground truth: big deep learning running on the highest resolution

  • Timing
  • Latency: time from when we sent the frame to getting the result
  • Frame rate: 1 / time between consecutive frames
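The IoU accuracy metric above is simple to compute; a minimal sketch, assuming boxes are (x1, y1, x2, y2) corner tuples (the box format is an assumption for illustration, not the system’s actual representation):

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

IoU is 1.0 for identical boxes, 0.0 for disjoint ones, so it captures both classification location quality in a single score.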
SLIDE 26
  • 1. Offline performance characterization:

How do latency and energy change with video resolution?

26

Energy and latency increase quadratically with the number of pixels (∝ pixels²) for local processing
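One use of this quadratic relationship is extrapolating profiled latency to an unprofiled resolution; a minimal sketch (the baseline numbers in the test below are illustrative placeholders, not measured profiles):

```python
def scale_latency(lat_ms_ref, pixels_ref, pixels_target):
    """Extrapolate per-frame latency assuming latency grows with (pixel count)^2,
    matching the quadratic trend observed for local processing."""
    return lat_ms_ref * (pixels_target / pixels_ref) ** 2
```

For example, a latency profiled at 640×480 would be predicted to grow 16x at 1280×960 (4x the pixels, squared).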

SLIDE 27
  • 1. Offline performance characterization:

How does accuracy change with bit rate and resolution?

27

Accuracy increases more with resolution than with bit rate, especially for big deep learning. (Figures: big deep learning vs. tiny deep learning)

  • Encoded videos at different bitrates and resolutions
SLIDE 28
  • 1. Performance characterization:

How does accuracy change with latency?

28

Accuracy decreases as latency increases.

(Figure: the result from deep learning is stale after the processing delay, e.g. a detection computed at t = 0 ms arriving at t = 100 ms)

  • Measured accuracy as deep learning processing latency increased
SLIDE 29
  • 2. Online decision framework: Optimization problem

Maximize:  g + α Σ_j b_j(q, s, m_j) · z_j

Subject to:
  • m_j = m_j(q) + M if model j is remote (add network transmission time M); m_j(q) if local
  • Σ_j m_j(q) · z_j ≤ 1/g   (finish processing a frame before the next frame arrives)
  • Σ_j c_j(q, s, g) · z_j ≤ ℬ   (don’t use more than ℬ battery)
  • b_j(q, s, m_j) ≥ B · z_j, ∀j   (meet the application accuracy requirement B)
  • g ≥ G   (meet the application frame rate requirement G)
  • s · z_j ≤ S   (don’t use more than S bandwidth)
  • Σ_j z_j = 1
  • q, s, g ≥ 0; z_j ∈ {0, 1}

29

Variables: q: video resolution; s: video bit rate; g: frame rate; z_j: which deep learning model to run (local or remote).
From the offline performance characterization: b_j(q, s, m_j): accuracy function of model j; m_j(q): latency function of model j; c_j(q, s, g): battery function of model j. Together these give the end-to-end latency (local processing time + network transmission time).
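Because the configuration space (model × resolution × frame rate) is small, the online decision can be sketched as a brute-force search over profiled configurations. This is an illustrative simplification: the profile functions are placeholders, and the bit-rate variable s is dropped for brevity.

```python
def choose_config(models, resolutions, frame_rates, req_acc, req_fps, battery_budget):
    """Enumerate (model, resolution, frame rate) configurations and return the
    feasible one maximizing frame rate + alpha * accuracy, or None if infeasible."""
    alpha = 1.0
    best, best_score = None, float("-inf")
    for m in models:                          # candidate models (local / remote)
        for q in resolutions:
            for g in frame_rates:
                acc = m["accuracy"](q)        # profiled accuracy b_j(q)
                lat = m["latency"](q)         # profiled per-frame latency m_j(q), in seconds
                power = m["power"](q, g)      # profiled battery drain c_j(q, g)
                if acc < req_acc:             # meet the app accuracy requirement
                    continue
                if g < req_fps:               # meet the app frame rate requirement
                    continue
                if lat > 1.0 / g:             # finish a frame before the next one arrives
                    continue
                if power > battery_budget:    # battery constraint
                    continue
                score = g + alpha * acc
                if score > best_score:
                    best, best_score = (m["name"], q, g), score
    return best
```

Rerunning this search periodically, as conditions change, matches the profile/optimize/update loop described earlier.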

SLIDE 30

30


SLIDE 31

Key Take-Aways

31

  • Real-time video analysis using local deep learning is slow (~600 ms/frame on current smartphones)
  • The relationship between the degrees of freedom and the metrics is complex, and requires profiling
  • Choose the right device configuration (resolution, frame rate, deep learning model) to meet QoE requirements

SLIDE 32

Outline

  • Overview of AR/VR
  • Edge computing can provide...
  • 1. Real-time object detection for mobile AR
  • 2. Bandwidth-efficient VR streaming using deep learning
  • Future directions

32

SLIDE 33

How VR Works

  • Can we only send what is needed?
  • How do we know what to send?

33

  • 1a. Virtual world generation (on a content/edge server)
  • 1b. Transmission over the network (Internet)
  • 3. Render
  • 4. Display (on the mobile device)
SLIDE 34

34

360-degree Video Example

  • https://www.youtube.com/watch?v=sT0hVLEe5mU
SLIDE 35

35

Only a portion of the scene is viewed

SLIDE 36

Motivation

  • 360° videos are becoming popular
  • Predicted to become a $108B industry by 2021 [1]
  • More engaging and interesting for the user
  • Off-the-shelf hardware and software for content creators
  • 360° camera hardware
  • Automatic stitching software
  • Many companies/websites serving 360° videos

36

[1] https://www.digi-capital.com/news/2017/01/after-mixed-year-mobile-ar-to-drive-108-billion-vrar-market-by-2021/
SLIDE 37

Challenges

  • 360° videos take more bandwidth
  • Higher resolution: 360° videos cover all spatial directions
  • Portions out of the field-of-view are wasted
  • How can we reduce the bandwidth requirements?
  • 1. Chop up the scene into tiles
  • 2. Predict the field-of-view beforehand
  • 3. Send the appropriate tiles to the client in advance
  • How can we predict the future field-of-view of the user?
  • Machine learning / time series analysis

37
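Steps 1-3 above can be sketched as a tile selector: given the predicted viewing direction, fetch only the tiles that overlap the field of view. The equirectangular 8×4 tile grid, the square field of view, and the coarse boundary sampling here are all illustrative assumptions:

```python
def tiles_to_fetch(yaw_deg, pitch_deg, n_cols=8, n_rows=4, fov_deg=100.0):
    """Return (col, row) indices of equirectangular tiles overlapping a square
    field of view centered at the predicted (yaw, pitch), in degrees."""
    tile_w, tile_h = 360.0 / n_cols, 180.0 / n_rows
    half = fov_deg / 2.0
    tiles = set()
    steps = 8  # sample the field of view on a coarse grid
    for i in range(steps + 1):
        for j in range(steps + 1):
            y = (yaw_deg - half + fov_deg * i / steps) % 360.0       # wrap yaw
            p = min(89.999, max(-89.999, pitch_deg - half + fov_deg * j / steps))
            col = int(y // tile_w)
            row = int((p + 90.0) // tile_h)
            tiles.add((col, row))
    return sorted(tiles)
```

The server would then send only these tiles (plus, in practice, low-quality fallbacks for the rest, in case the prediction is wrong).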

SLIDE 38

How much bandwidth do 360° videos need?

  • Collected dataset of ~4600 YouTube 360° and regular videos
  • Duration
  • Resolution
  • Bit rate
  • Motion vector
  • Measured variability of bit rates over time of 360° and regular videos
  • Compared the motion vectors of 360° and regular videos
  • Calculated effective resolution of 360° videos based on field-of-view

38

Shahryar Afzal, Jiasi Chen, K.K. Ramakrishnan, “Characterization of 360-degree videos,” ACM SIGCOMM Workshop on Virtual Reality and Augmented Reality Network, 2017

SLIDE 39

Duration

39

360° videos are short:
  • a new medium
  • complex to produce

(Figures: aggregate and per-category median duration, in seconds)

SLIDE 40

Resolution

(Figures: number of distinct resolutions per video; fraction of videos encoded at each resolution)

40

  • DASH: multiple resolutions of each video are stored on the server
  • 360° videos have more resolutions
  • 360° videos tend to have higher resolutions

SLIDE 41

Bit rate

  • What is the bit rate of the maximum resolution?

41

(Figure: bit rate at the maximum resolution)

High bit rates for 360° videos

SLIDE 42

System Design

Client (VR player): gyroscope and accelerometer capture the user’s head movements; downloaded tiles are rendered for display.
Server: user prediction (from the current user’s and other users’ historical data) feeds the streaming optimization, which decides which tiles to fetch; tile delivery then uses the VR video metadata and the prediction of where the user will look.

42

SLIDE 43

User Prediction

  • Combined 3 publicly available datasets of users watching 360° videos
  • Used LSTM machine learning model for time series prediction
  • Data representation + cleaning matters!

43

1Xavier Corbillon, Francesca De Simone, and Gwendal Simon, “360-Degree Video Head Movement Dataset”, ACM MMSys, 2017.

(Sample dataset [1], with head orientation shown in quaternion, spherical, and Euclidean representations)
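To illustrate why the data representation matters, here is a minimal conversion from a head-orientation quaternion to spherical (yaw, pitch) angles. The y-up convention with the camera looking down -z is an assumption for illustration; datasets differ in their conventions.

```python
import math

def quat_to_yaw_pitch(w, x, y, z):
    """Rotate the forward vector (0, 0, -1) by unit quaternion (w, x, y, z)
    and return the (yaw, pitch) viewing angles in degrees (y-up convention)."""
    u, v = (x, y, z), (0.0, 0.0, -1.0)
    # Rotated vector v' = v + w*t + u x t, where t = 2 (u x v)
    t = (2 * (u[1] * v[2] - u[2] * v[1]),
         2 * (u[2] * v[0] - u[0] * v[2]),
         2 * (u[0] * v[1] - u[1] * v[0]))
    vp = (v[0] + w * t[0] + u[1] * t[2] - u[2] * t[1],
          v[1] + w * t[1] + u[2] * t[0] - u[0] * t[2],
          v[2] + w * t[2] + u[0] * t[1] - u[1] * t[0])
    yaw = math.degrees(math.atan2(vp[0], -vp[2]))
    pitch = math.degrees(math.asin(max(-1.0, min(1.0, vp[1]))))
    return yaw, pitch
```

Angles wrap around (e.g. 359° and 1° are close), which is one reason representation and cleaning affect prediction quality.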

SLIDE 44

Prediction

44

  • Two dots video
SLIDE 45

User Prediction Results

  • Average loss: average loss of the prediction across all frames across all users in the test set (in degrees)
  • Future value indicates how many frames ahead we are predicting

45
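A minimal sketch of the average-loss computation, assuming predictions and ground truth are (yaw, pitch) pairs in degrees and the loss is the great-circle angle between the two viewing directions (the exact loss function used may differ):

```python
import math

def angular_error_deg(pred, actual):
    """Great-circle angle (degrees) between two (yaw, pitch) viewing directions."""
    def to_vec(yaw_deg, pitch_deg):
        yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
        return (math.cos(pitch) * math.sin(yaw),
                math.sin(pitch),
                -math.cos(pitch) * math.cos(yaw))
    a, b = to_vec(*pred), to_vec(*actual)
    dot = sum(x * y for x, y in zip(a, b))          # cosine of the angle
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

def average_loss(preds, actuals):
    """Average angular loss across all frames/users in a test set."""
    errs = [angular_error_deg(p, a) for p, a in zip(preds, actuals)]
    return sum(errs) / len(errs)
```

Working on direction vectors avoids the yaw wrap-around problem that a naive degree difference would have.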


SLIDE 52

Does video content matter?

  • Why are the losses so different between the two videos?
  • Can the content of the video help us predict more accurately?
  • We plot the heat map of the head position of the users for each video
  • Videos where users don’t look around much → lower prediction error

52

SLIDE 53

Heat maps

53

Paris

SLIDE 54

Heat maps

54

Rollercoaster

SLIDE 55

Heat maps

55

Rhino

SLIDE 56

Heat maps

56

Venise

SLIDE 57

Heat maps

57

Timelapse

SLIDE 58

Key Take-Aways

58

  • 360-degree VR videos are large (up to 25 Mbps)
  • Machine learning or time series prediction can help predict user behavior and avoid wasted bandwidth
  • Data representation and pre-processing matter!
  • ... Is machine learning really the optimal choice?

SLIDE 59

Future Directions

59

SLIDE 60

New application: Multi-User AR

How to create a synchronized world view for multiple users?

  • 1. Device tracking
  • 2. Real object detection
  • 4. Render
  • 5. Display

60

SLIDE 61

What does AR network traffic look like?

  • AR traffic mainly involves sending device tracking information

  • Unpredictable, because of user interactions
  • Large bursts (>20 Mb) corresponding to tracking data

Xukan Ran, Carter Slocum, Maria Gorlatova, Jiasi Chen, “ShareAR: Communication-efficient Multi-user Mobile Augmented Reality”, ACM HotNets, 2019.

How can networks manage this type of traffic?

61

SLIDE 62

What should AR network architectures look like?

  • Current AR platforms (Google, Apple, Microsoft) use cloud or P2P network architectures

  • Focus is on device tracking computations
  • → communication vs. computation vs. privacy tradeoffs

Can edge computing help device tracking- based AR systems?

62

SLIDE 63

What are AR quality-of-experience metrics?

  • How to evaluate whether an AR/VR system is performing well?
  • Needed to evaluate the performance of traffic management schemes
  • For video, we have MOS, PSNR, SSIM, stalls, bit rate
  • What are the equivalent quality-of-experience metrics for AR/VR?
  • Motion-to-photon latency
  • Bit rate?
  • Just noticeable difference?
  • Immersion?
  • ...?

63

SLIDE 64

Summary

  • VR != AR != video streaming
  • Machine learning is helpful in certain aspects of AR/VR
  • As part of the AR processing pipeline (object detection)
  • To solve problems in VR (user prediction)
  • Edge computing is helpful in certain aspects of AR/VR
  • Reduce the computational load on the AR devices
  • Trade off between computation, communication, privacy
  • Many interesting research problems remain
  • Managing multi-user AR traffic
  • Defining user quality-of-experience metrics
  • ...?

Thank you! Questions?

64