Predictive View Generation to Enable Mobile 360-degree and VR Experiences

Xueshi Hou, Sujit Dey
Mobile Systems Design Lab, Center for Wireless Communications, UC San Diego

Jianzhong Zhang, Madhukar Budagavi
Samsung Research America
§ Goal: Enable a wireless, lightweight VR experience
§ Observation: Existing head-mounted displays (HMDs) have limitations
  § Rendering with a tethered PC: not mobile
  § Rendering on a mobile device attached to the HMD: clunky to wear
§ How to make it mobile and portable (wireless and lighter)? Stream only the Field of View (FOV)
Example of a cloud-based solution:
§ Solution: Shift computing tasks (e.g. rendering) to the edge/cloud, and stream videos to the HMD
§ VR head-mounted devices make the requirements much steeper than for cloud/edge-based video streaming
Display Device   Head Motion       Bitrate needed (Mbps)                Acceptable Latency
                 Framerate & QP    Virtual Classroom    Racing Game
                                   1080p     4K         1080p    4K
PC Monitor       QP=20             5.8       14.5       16.6     41.5    100-200ms (for VC), <100ms (for Game)
Oculus           QP=15             10.9      27.3       33.9     84.8    28ms
Oculus           75fps, QP=15      28.2      70.5       39.7     99.3    22ms
For head motion, cloud/edge-based wireless VR requires very high frame rates and bitrates, and must also satisfy ultra-low latency!
[Screenshots: Virtual Classroom, Racing Game]
Experiment setup: 1080p/4K, GOP=30
Note: For a Virtual Classroom with 50 students, the 4K bitrate needed exceeds 3.5 Gbps
§ Possible Method 1: Render the 360-degree video in the cloud, transmit it to the RAN edge, and extract the FOV at the edge based on head motion
  § Advantage: low computation overhead on the edge device
  § Problem: very high (backhaul) data rate
§ Possible Method 2: Render the 360-degree video on the edge device and extract the FOV based on head motion
  § Advantage: theoretically low (backhaul) data rate
  § Problem: restricted to edge devices with very high computation capability
[Diagram: FOV extraction]
§ Solution: Based on head motion prediction, pre-render and stream the predicted FOV in advance from the edge device
§ Advantages:
  § Latency: no rendering/encoding delay, and minimal communication delay with significantly reduced bandwidth
  § Edge can be RAN or local; it can even be a mobile device
[Diagram: (a) Using a Mobile Edge Computing node (MEC): glasses and controller connect to the MEC over a cellular connection. (b) Using a Local Edge Computing node (LEC): glasses and controller connect to the LEC over WiFi/millimeter wave. In both, the edge node performs Predictive FOV Generation and exchanges control and video data with a cloud server.]
§ Motivation: address both the bandwidth and latency challenges
§ Common approach to reduce bandwidth: streaming only the FOV → still cannot address the latency problem
§ System overview for proposed approach:
[Figure: the FOV (~90° x ~90°) within a 360-degree view (90° x 180° shown), and its projection in Euler coordinates]
Predictive View Generation to Enable Mobile 360-degree and VR Experiences:
Early experiments with Samsung Dataset
[Figure: tiling of the 360-degree view (90° x 180° shown) into 30° x 30° tiles; the FOV is ~90° x ~90°, and the viewpoint refers to the center of the FOV]
§ Setup: Samsung Gear VR, sampling frequency f = 5Hz
§ Dataset: head motion traces from over 36,000 viewers for 19 360-degree/VR videos during 7 days
§ Tile options: 12x6 tiles (30°x30°), 18x6 tiles (20°x30°), etc.
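As a small sketch of the tiling above, the code below maps a viewpoint to a tile index on the 12x6 grid of 30°x30° tiles. The function name and the (yaw, pitch) angle convention are illustrative assumptions, not from the talk.

```python
# Sketch (assumed conventions): map a viewpoint given as (yaw, pitch) in
# degrees to a tile index on a 12x6 grid of 30-degree x 30-degree tiles.

def viewpoint_to_tile(yaw, pitch, cols=12, rows=6):
    """yaw in [0, 360), pitch in [-90, 90]; returns a tile index in [0, cols*rows)."""
    tile_w = 360.0 / cols                                # 30 degrees per column
    tile_h = 180.0 / rows                                # 30 degrees per row
    col = int(yaw % 360.0 // tile_w)
    row = int(min((pitch + 90.0) // tile_h, rows - 1))   # clamp pitch = +90
    return row * cols + col

print(viewpoint_to_tile(0.0, -90.0))    # 0  (first tile)
print(viewpoint_to_tile(359.9, 89.9))   # 71 (last tile)
```

The same mapping with an 18x6 grid (20°x30° tiles) follows by setting cols=18.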
[Figure: VR dataset statistics: CDFs of video duration (s) and number of viewers]
§ Over 80% of videos are longer than 100s
§ Around 85% of videos have more than 1,000 viewers
[Figure: boxplot of head motion speed (°/s) versus time (s) for Kong VR, showing min, 25th percentile, median, 75th percentile, and max]
This boxplot shows the head motion speed distribution for over 1,500 viewers during 60s; it illustrates how challenging head motion prediction is, since viewers may change viewing direction quickly as well as frequently.
§ Brighter tiles attract more attention, and viewers are more likely to look at these areas
§ Feasibility of viewpoint prediction: some areas attract more attention than the remaining areas within a 360-degree view
§ Multiple tiles (as many as 11) can have relatively high probabilities (>5%), indicating the difficulty of predicting the viewpoint accurately
[Figure: example of an attention heatmap]
§ The attention heatmap is defined as the series of probabilities that the viewpoint is within each tile, over n viewers during the time period from ts1 to ts2
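The heatmap definition above can be sketched as follows; the trace format (a per-viewer list of tile indices sampled at 5 Hz) and the function name are assumptions for illustration.

```python
# Sketch: compute the attention heatmap -- for each tile, the fraction of
# (viewer, timestamp) viewpoint samples falling in that tile during a window.
from collections import Counter

def attention_heatmap(traces, t_start, t_end, n_tiles=72, freq=5):
    """traces: per-viewer lists of tile indices sampled at `freq` Hz.
    Returns a length-n_tiles list of probabilities over [t_start, t_end)."""
    i0, i1 = int(t_start * freq), int(t_end * freq)
    counts = Counter()
    total = 0
    for trace in traces:
        for tile in trace[i0:i1]:
            counts[tile] += 1
            total += 1
    return [counts[t] / total if total else 0.0 for t in range(n_tiles)]

# Two viewers, each spending half the window on tile 5 and half on tile 6:
heat = attention_heatmap([[5, 5, 6, 6], [5, 6, 5, 6]], 0.0, 0.8, n_tiles=8)
```

Brighter tiles in the rendered heatmap correspond to larger entries of this probability vector.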
§ Goal: predict the viewpoint position (tile) 200ms in advance
§ Model: multi-layer long short-term memory (LSTM) network
§ Input features: tile-based one-hot encoding of the viewpoint trace as a 72x10 matrix (72 tiles, 10 timestamps in 2s)
§ Label for training: whether the viewpoint belongs to each tile, as a 72x1 matrix
§ Output: probability of the viewpoint belonging to each of the 72 tiles
Example: given the viewpoint trace during t ∈ (3, 5] seconds, where is the viewpoint at t = 5.2s (200ms afterwards)?
[Figure: LSTM architecture: viewpoint features feed two layers of LSTM units, followed by a fully connected layer and a softmax layer that outputs the predicted per-tile viewpoint probabilities]
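The input encoding described above can be sketched as follows; the function name is an illustrative assumption.

```python
import numpy as np

# Sketch of the input encoding: a 2 s viewpoint trace sampled at 5 Hz gives
# 10 tile indices, encoded as a 72x10 one-hot matrix (one column per timestamp).

def encode_trace(tile_indices, n_tiles=72):
    x = np.zeros((n_tiles, len(tile_indices)), dtype=np.float32)
    x[tile_indices, np.arange(len(tile_indices))] = 1.0  # one 1 per column
    return x

x = encode_trace([30, 30, 31, 31, 32, 32, 33, 33, 34, 34])
# x has shape (72, 10); each column sums to 1
```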
§ Dataset: head motion traces of 36,000 viewers during 7 days for 19 360-degree/VR videos; trace points are sampled every 200ms
§ Training data: 45,000 head motion sampling traces (each 2s long)
§ Test data: 5,000 head motion sampling traces (from viewers not in the training data)
§ Parameters:
  § first layer: 128 LSTM units; second layer: 128 LSTM units; fully connected layer: 72 nodes
§ We explore four deep learning or classical machine learning models for viewpoint prediction: LSTM, stacked sparse autoencoders (SAE), bootstrap-aggregated decision trees (BT), and weighted k-nearest neighbors (kNN)
  § SAE: two fully connected layers with 100 and 80 nodes respectively; BT: ensembles of 30 bagged decision trees; kNN: 100 nearest neighbors
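The LSTM configuration above (two layers of 128 units, a 72-node fully connected layer, softmax output) can be sketched in PyTorch; this is an illustrative assumption, not the authors' code.

```python
import torch
import torch.nn as nn

# Sketch of the described architecture: two 128-unit LSTM layers over the
# 10-step one-hot viewpoint trace, then a 72-node fully connected layer
# and a softmax over the 72 tiles.

class ViewpointLSTM(nn.Module):
    def __init__(self, n_tiles=72, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_tiles, hidden_size=hidden,
                            num_layers=2, batch_first=True)
        self.fc = nn.Linear(hidden, n_tiles)

    def forward(self, x):
        # x: (batch, 10, 72) -- 10 timestamps, one-hot over 72 tiles
        out, _ = self.lstm(x)
        # use the last timestep's hidden state to predict the next viewpoint
        return torch.softmax(self.fc(out[:, -1]), dim=-1)  # (batch, 72)

model = ViewpointLSTM()
probs = model(torch.zeros(1, 10, 72))  # per-tile viewpoint probabilities
```

Training would then minimize cross-entropy against the 72x1 one-hot label of the tile observed 200ms later.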
FOV prediction accuracy:
§ the probability that the actual user view will be within the predicted FOV
§ depends on both the LSTM model accuracy and the FOV generation method, and thus reflects the performance of both
FOV generation
§ Tradeoff between FOV size (bitrate) and FOV prediction accuracy:
  § Prediction accuracy is 100% if the predicted FOV is the whole 360-degree view, but the bitrate is very high
  § By selecting more tiles (i.e. larger m) with high viewpoint probability, we can achieve higher FOV prediction accuracy, but also higher bitrate
§ Use the choice of m to achieve the desired tradeoff between FOV prediction accuracy and bandwidth consumed:
  § Larger m → higher bandwidth but better FOV prediction accuracy
  § Smaller m → lower bandwidth but higher risk of missing the actual FOV
SAE: stacked sparse autoencoders; BT: bootstrap-aggregated decision trees; kNN: weighted k-nearest neighbors
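A minimal sketch of top-m FOV generation and the accuracy metric, assuming each prediction is a per-tile probability vector (the function names are illustrative):

```python
# Sketch: build the predicted FOV from the m tiles with the highest predicted
# probability, then score FOV prediction accuracy as the fraction of samples
# whose actual viewpoint tile lies inside the predicted FOV.

def top_m_tiles(probs, m):
    return sorted(range(len(probs)), key=lambda t: probs[t], reverse=True)[:m]

def fov_accuracy(pred_probs_list, actual_tiles, m):
    hits = sum(actual in top_m_tiles(p, m)
               for p, actual in zip(pred_probs_list, actual_tiles))
    return hits / len(actual_tiles)

probs = [0.0] * 72
probs[10], probs[11], probs[22] = 0.5, 0.3, 0.2
print(top_m_tiles(probs, 2))   # [10, 11]
```

Sweeping m then traces out the accuracy-versus-bandwidth tradeoff described above.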
§ FOV prediction accuracy and pixel savings obtained when selecting different numbers of tiles (i.e. the choice of m) to generate the FOV
§ As the number of tiles m increases, FOV prediction accuracy continuously increases while pixel saving decreases → a tradeoff between FOV prediction accuracy and pixel saving
§ Results below for one strategy show it is possible to achieve high FOV prediction accuracy while significantly reducing bitrate (to a rate comparable to a real-time generated FOV)
§ We fix the FOV prediction accuracy at ~95% and compare the pixel savings achieved by each model

Kong VR
Model   Medium Motion Sequence           High Motion Sequence
        FOV Acc.(%)  Pixel Saving(%)     FOV Acc.(%)  Pixel Saving(%)
SAE     95.0         34.0                95.0         3.9
LSTM    95.5         55.7                95.0         43.7
BT      95.0         14.8                95.2         14.4
kNN     94.8         12.0                95.3         12.0

Model   Fashion Show                  Whale Encounter               Roller Coaster
        FOV Acc.(%)  Pixel Saving(%)  FOV Acc.(%)  Pixel Saving(%)  FOV Acc.(%)  Pixel Saving(%)
SAE     95.4         52.7             95.1         46.8             95.3         29.9
LSTM    95.2         69.7             95.5         66.8             95.2         71.0
BT      95.3         19.1             95.0         18.6             95.2         48.9
kNN     94.9         12.0             95.2         10.3             95.1         21.2
§ We propose a predictive view generation approach to reduce the latency and bandwidth needed to deliver 360-degree videos and cloud/edge-based VR applications, leading to better mobile VR experiences
§ We present a multi-layer LSTM model that can learn general head motion patterns and predict the future viewpoint from past traces
§ Our method shows good results on a real head motion trace dataset and great potential to reduce bandwidth
Future Work:
§ Adaptive streaming using our trained model (tiles with different video quality)
§ 3DoF → 6DoF: view prediction considering body motion (6DoF) and hand motion
[Figure: 3 Degrees of Freedom (3DoF) vs. 6 Degrees of Freedom (6DoF); photo source: Qualcomm]