Distribution Systems for 3D Teleimmersive and Video 360 Content: Similarities and Differences


SLIDE 1

Distribution Systems for 3D Teleimmersive and Video 360 Content: Similarities and Differences

Klara Nahrstedt Department of Computer Science University of Illinois at Urbana-Champaign klara@illinois.edu

ACM Multimedia Systems, June 12, 2018, Amsterdam, Netherlands

SLIDE 2

Overview

  • Motivation
  • 3D Teleimmersive Video Representation
  • Video 360 Representation
  • Similarities and Differences in Content Representation
  • Distribution of 3DTI Video
  • Distribution of Video 360
  • Similarities and Differences in Content Distribution
  • Conclusion
SLIDE 3

3D Teleimmersive (3DTI) Systems


Source: http://tele-immersion.citris-uc.org; http://monet.cs.illinois.edu/projects/cyphy-multi-modal-teleimmersion-for-tele-physiotherapy/teleimmersion-gallery/

SLIDE 4

High-End Tele-Presence Environments

Cisco Tele-presence, HP Halo, UNC, HP Coliseum

SLIDE 5

Multi-Camera Live Broadcast Systems

http://www.dailymail.co.uk/sciencetech/article-2336893/New-TV-cameras-bring-Matrix-style-bullet-time-trickery-live-sports-coverage.html

SLIDE 6

Multi-Camera Broadcast Systems

https://thegadgetflow.com/portfolio/slingstudio-multi-camera-broadcaster/
https://www.myslingstudio.com/
https://www.cinfo.es/our-products/synthetrick/multicam
https://www.spiideo.com/sports/

SLIDE 7

360-Degree Video


360 Degrees Cameras – CoolPile.com: http://coolpile.com/tag/360-degrees-cameras

SLIDE 8

3D Teleimmersive Video Representation

SLIDE 9

3D Teleimmersive Stereo Video and Free Viewpoint Video Capture

SLIDE 10

3DTI Viewing

Photo courtesy of Prof. Ruzena Bajcsy. Singapore, 2014

SLIDE 11

3D Stereo Video Representation

Wu, Ahsan, Kurillo, Agarwal, Nahrstedt, Bajcsy, “Color-plus-Depth Level-of-Detail in 3D Teleimmersive Video: A Psychophysical Approach”, ACM Multimedia 2011

SLIDE 12

Free-Viewpoint 3D Video Representation

Example of 3D representation captured by different cameras: Camera-1, Camera-2, Camera-3, Camera-8

SLIDE 13

[Figure: View Model – the angle θ between the camera direction Oi and the user view direction Ou. Source: http://zing.ncsl.nist.gov/~gseidman/vrml/]

SLIDE 14

3DTI Data Model

  • 3D frame for camera i at time t: fi,t
  • Each pixel in the frame carries color+depth data and can be independently rendered
  • Stream for camera i: Si = { fi,t1, fi,t2, … }
  • Macro-frame: Ft = { f1,t, f2,t, …, fn,t }

[Figure: n camera streams S1 … Sn; the frames f1,t1 … fn,t1 form macro-frame Ft1, the frames f1,t2 … fn,t2 form macro-frame Ft2.]
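This data model can be sketched in a few lines of Python. All class and function names here are illustrative assumptions, not taken from any real 3DTI codebase:

```python
from dataclasses import dataclass, field

# Illustrative sketch of the 3DTI data model: a 3D frame f_{i,t} per camera
# i and capture time t, a stream S_i per camera, and a macro-frame F_t
# grouping every camera's frame at time t.

@dataclass
class Frame3D:            # f_{i,t}
    camera_id: int
    t: int
    pixels: list          # each pixel: (r, g, b, depth), independently renderable

@dataclass
class Stream:             # S_i = { f_{i,t1}, f_{i,t2}, ... }
    camera_id: int
    frames: list = field(default_factory=list)

def macro_frame(streams, t):
    """F_t = { f_{1,t}, ..., f_{n,t} }: one frame per camera at time t."""
    return {s.camera_id: next(f for f in s.frames if f.t == t)
            for s in streams}
```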

SLIDE 15

360-Degree Video Representation

SLIDE 16

360-Degree Video

User’s Viewport

Generation of 360-Degree Video

  • Capturing of multiple 2D videos together with their metadata
  • Stitching the videos together and further editing them into a spherical video
  • Encoding the spherical video considering projection, interactivity, storage, and delivery formats (this will impact the decoding and rendering processes)

SLIDE 17

Video 360 Viewing and Navigation

Examples of HMDs (Head-Mounted Displays) with controller: Oculus Rift, Samsung Gear VR, HTC Vive. https://en.wikipedia.org/wiki/Head-mounted_display

SLIDE 18

360-Degree Video Data Model

  • Field-of-View or Viewport – display region on the Head-Mounted Display
  • Fraction of the omnidirectional view of the scene
  • Viewport defined by a device-specific viewing angle (typically 120 degrees) which delimits the scene horizontally from the head-direction center, called the viewport center
  • Viewport Resolution – 4K (3840x2160) pixels
  • Resolution of full 360-degree video – at least 12K (11520x6480)
  • Video Framerate – on the order of the HMD refresh rate of 100 Hz, i.e., 100 fps
  • Motion-to-Photon Latency requirement
  • Less than 20 ms for VR – much smaller than an Internet request-reply delay
  • Need viewport prediction
  • Bitrate – Video 360 vs. HEVC (8K video at 60 fps is approx. 100 Mbps)
  • Tiling – spatial division of the spherical video into independent tiles
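As a sanity check, the viewport-to-full-sphere resolution scaling above can be worked out directly (assuming the 120-degree horizontal viewing angle and the same scale factor applied to both axes):

```python
# Back-of-the-envelope check of the slide's resolution figures, assuming a
# 120-degree horizontal viewport and the same scale factor on both axes.

VIEWPORT_W, VIEWPORT_H = 3840, 2160  # 4K viewport on the HMD
FOV_DEG = 120                        # device-specific horizontal viewing angle

scale = 360 / FOV_DEG                # the viewport spans 1/3 of the horizontal sweep
full_w, full_h = int(VIEWPORT_W * scale), int(VIEWPORT_H * scale)
print(full_w, full_h)                # 11520 6480, i.e. the "at least 12K" figure
```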
SLIDE 19

Tiles and Spherical Maps

Issues with Spherical Mapping to Tiles

  • Viewport distortion
  • Spatial quality variance

Considerations of sphere-to-plane mapping and the viewing probability of tiles are IMPORTANT

  • Overall spherical distortion of a segment is the sum of the distortion over all pixels the segment covers

Xie et al. “360ProbDASH: Improving QoE of 360 Video Streaming Using Tile-based HTTP Adaptive Streaming”, ACM MM 2017

SLIDE 20

Video 360 Spherical-to-Plane Projections

Video 360 Capture as Spherical Video. Corbillon, Simon, Devlic, Chakareski, “Viewport-Adaptive Navigable 360-Degree Video Delivery”, May 2017; Nasrabadi et al., “Adaptive 360-Degree Video Streaming using Scalable Video Coding”, ACM Multimedia 2017

  • Equirectangular Projection – stretches the poles and reduces coding efficiency
  • Pyramid Projection – exhibits degradation on the sides
  • Cubemap – maps a 90-degree FOV to each side of a cube and hence shows less degradation
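The equirectangular case can be illustrated with the standard longitude/latitude mapping (a generic sketch, not taken from the cited papers):

```python
import math

# Generic equirectangular (sphere-to-plane) mapping: yaw and pitch map
# linearly to pixel columns and rows. Rows near the poles cover far less
# sphere surface per pixel, which is the stretching that hurts coding
# efficiency.

def equirect_project(yaw, pitch, width, height):
    """Map a direction (yaw in [-pi, pi], pitch in [-pi/2, pi/2]) to (u, v)."""
    u = (yaw / (2 * math.pi) + 0.5) * width
    v = (0.5 - pitch / math.pi) * height
    return u, v
```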

SLIDE 21

Encoding and Delivery Formats

  • Codecs
  • AVC/H.264, HEVC/H.265
  • VP8, VP9
  • Delivery Formats
  • DASH/HLS (Dynamic Adaptive HTTP)
  • MPEG-DASH standard considers tiling
  • MPD (Media Presentation Description) – modified for Video 360
  • SRD (Spatial Relation Description) integrated into MPD
  • HEVC considers video tiles
  • MPEG – Immersive Media standard ISO/IEC 23090
  • Part 1: Use cases
  • Part 2: OMAF (Omnidirectional Media Application Format)
  • Description of the equirectangular projection format
  • Metadata for interoperable rendering of 360-degree monoscopic and stereoscopic audio-visual data
  • Storage format (ISO base media file format/MP4)
  • Codecs: HEVC, MPEG-H 3D audio
  • Part 3: Immersive video
  • Part 4: Immersive audio

Graf, Timmerer, Mueller, “Towards Bandwidth Efficient Adaptive Streaming of Omnidirectional Video over HTTP”, ACM MMSys 2017
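For illustration, an MPD AdaptationSet for one tile can advertise its spatial position through the SRD descriptor roughly as below. This is a hand-written fragment, not from any real manifest; the SRD value string encodes source_id, x, y, w, h, and the total reference space W, H:

```xml
<AdaptationSet>
  <!-- Tile at (0,0), 1920x1080, within a 3840x2160 reference space -->
  <SupplementalProperty schemeIdUri="urn:mpeg:dash:srd:2014"
                        value="0,0,0,1920,1080,3840,2160"/>
  <Representation id="tile0-high" bandwidth="4000000" width="1920" height="1080"/>
  <Representation id="tile0-low"  bandwidth="800000"  width="960"  height="540"/>
</AdaptationSet>
```

The client can then match each tile's SRD coordinates against the predicted viewport to decide which Representation to fetch.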

SLIDE 22

Similarities and Differences of Representations

SLIDE 23

Similarities:

Parameter               | 3DTI Video | 360-Degree Video
Multi-camera views      | Yes (view) | Yes (viewport)
Joint coordinate system | Yes        | Yes
Bitrate consideration   | Yes        | Yes
View change             | Yes        | Yes

Differences:

Parameter                      | 3DTI Video | 360-Degree Video
Video format                   | Color-plus-depth | Color
Smallest item to adapt         | 3DTI frame | Tile
Frame representation           | Manipulation at pixel level (RGB, depth, polygons) | Manipulation at tile and region-of-interest level
Coding                         | Simple (zlib) | Complex (HEVC)
Resolution                     | 640x480 or 1080p | 4K to 16K
Resolution for diverse devices | No | Yes
Format for diverse navigation  | No | Yes

SLIDE 24

Distribution Systems of 3DTI Video

SLIDE 25

Multi-Camera 3DTI Transmission System

[Figure: two 3DTI sites (Site-1 and Site-2) connected over the Internet; each site has cameras (C), a microphone (A), a renderer (R), an A/V display, and gateways (G) behind a switch.]

C = camera, A = microphone, G = gateway, R = renderer

SLIDE 26

Approach: Multi-stream Hierarchical Adaptation

SLIDE 27

Multi-stream Adaptation (Stream Selection)

  • Camera orientation: oi
  • User view orientation: ou
  • cos θ = (oi · ou) / (‖oi‖ ‖ou‖), where θ is the angle between the camera direction and the user view direction
  • Selection (SI) – View-Centric Stream Selection: select the stream of camera i if cos θ ≥ T, where T is a user-specified parameter

Zhenyu Yang, Klara Nahrstedt, Bin Yu, Ruzena Bajcsy, “A Multi-stream Adaptation Framework for Bandwidth Management in 3D Teleimmersion”, ACM NOSSDAV 2006, May 2006, Newport, Rhode Island
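A hedged sketch of this selection rule in Python (the vector form and the sample threshold value are assumptions for illustration, not from the paper):

```python
import math

# View-centric stream selection: cos(theta) is the normalized dot product
# between a camera's direction and the user's view direction; a camera's
# stream is selected when cos(theta) >= T (T user-specified).

def cos_angle(cam_dir, view_dir):
    dot = sum(c * v for c, v in zip(cam_dir, view_dir))
    return dot / (math.hypot(*cam_dir) * math.hypot(*view_dir))

def select_streams(camera_dirs, view_dir, T=0.5):
    """Indices of cameras whose direction is within threshold T of the view."""
    return [i for i, d in enumerate(camera_dirs)
            if cos_angle(d, view_dir) >= T]
```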

SLIDE 28

View-Centric Stream Differentiation

[Figure: 3D capturing (cameras 8, 4, 6, 2) → 3D camera transmission → 3D rendering of the user view. Streams contributing more to the user view are differentiated from less important streams.]

SLIDE 29

Timing Performance Validation

Macro-Frame Delay at Sender side Macro-frame Completion Interval at Receiver Side (End-to-End Delay UIUC-UCB)

SLIDE 30

Immersive View-Centric Multi-View Multi-Party 3DTI

  • Z. Yang et al., “ViewCast: View Dissemination and Management for Multi-Party 3D Tele-immersive Environments”, ACM Multimedia 2007

SLIDE 31

Multi-Party Multi-View Telepresence

Example of 3D representation captured by 4 cameras

Camera-1, Camera-2, Camera-3, Camera-8 (cameras c1 through c8 arranged around the user's view)

  • Multi-stream contents
  • Multi-view environment
  • High resource demand
  • Multi-stream dependency
  • Real-time interactivity

SLIDE 32

Telepresence Session Control

[Figure: three sites (Site-X, Site-Y, Site-Z), each with cameras (C), microphones (A), renderers (R), and a gateway (G), coordinated by a Global Session Controller; each gateway holds a Session Routing Table (SRT) with a matching field (ID), a forwarding action, and a bitrate. C = camera, A = microphone, G = gateway, R = renderer]

  • Decoupled control and data plane
  • Hierarchical control
  • Global session controller
  • Local session controllers at the gateways G
  • Coordinated global control plane
  • Monitor data plane
  • Configure data plane
  • Data plane at TI participants
  • Session routing table (SRT)
  • Stream forwarding

SLIDE 33

ViewCast: Middleware (Overlay) Framework

A three-layer multi-party/multi-stream management framework

  • Tele-immersive Application
  • ViewCast Middleware: View-aware Stream Differentiation/Selection, Overlay Network Service
  • Network

SLIDE 34

[Figure: users U2, U3, U4 request views V1 to V4 with view weights U2.w, U3.w, U4.w, coordinated by a session controller. 3D capturing (cameras 8, 4, 6, 2) → 3D camera transmission → 3D rendering of the user/node's view request; streams contributing more to the user view are differentiated from less important streams.]

SLIDE 35

[Figure: the session controller serves users U2, U3, U4; when U3 changes its view weight U3.w, a previously selected stream becomes a victim.]

Why is view change a problem?

SLIDE 36

Streams/View

GC = 100%, Ii (Oi) = 24

On average 3.2 streams per view – better than MC-3 performance, with a 22% lower rejection ratio

SLIDE 37

Immersive and Non-Immersive Multi-Party Multi-View (Live Broadcast) Systems

Ahsan Arefin, Zixia Huang, Klara Nahrstedt, Pooja Agarwal, “4D TeleCast: Towards Large Scale Multi-site and Multi-view Dissemination of 3DTI Content”, IEEE ICDCS 2012, Macau, China.

SLIDE 38

TI Components & Participants

  • Immersive Participants
  • Tight interactivity
  • Limited scale

[Figure: immersive sites at Berkeley (Site-2) and Illinois (Site-1), each with cameras (C), a microphone (A), a renderer (R), sensors (S), an A/V display, and a gateway (G) behind a switch, connected over the Internet. C = camera, A = microphone, G = gateway, R = renderer, S = sensors]

  • Non-immersive Participants
  • Large scale

[Figure: non-immersive viewer sites Site-3 through Site-10, each with a renderer (R) and gateway (G); Producers feed the NI Viewers.]

SLIDE 39

View/Stream Concepts among Immersive Participants

[Figure: 3D capturing (cameras 8, 4, 6, 2) → 3D camera transmission → 3D rendering of the user view between two Content Producers (Immersive Participants); streams contributing more to the user view are differentiated from less important streams.]

SLIDE 40

View/Stream Concept among Non-Immersive Participants

[Figure: Content Producers at Site-A and Site-B capture 3D streams with cameras 1, 3, 5, 7; the combined 4D Content is delivered to Viewers (Non-immersive Participants), whose display view v1 is an ordered preference list over the camera streams.]

SLIDE 41

Multi-View Video among Non-Immersive Participants

[Figure: as on the previous slide, but the Viewer at Site-B now requests view v2, a different ordered preference list over the Site-A Producers' camera streams.]

SLIDE 42

Approach: 4D TeleCast

[Figure: Producer Tier (Site-A, Site-B, Site-C) and Viewer Tier connected over the Internet; each site has a Camera (C) and a Communication Gateway (G); the GSC coordinates per-site LSCs.]

GSC – Global Session Controller; LSC – Local Session Controller

SLIDE 43

Viewer Tier

G C C A S R G C C A S R

Producer Tier

Site-B Site-A Site-C D

Internet

CDN CDN-P2P

Infrastructure Management

[CDN Assisted Peer] Wang’08, Liu’10, Chang’09

SLIDE 44

4D TeleCast

[Figure: CDN with a Distribution Core Server and Edge Servers; multiple clients request view V1 = {S1, S2, S3}, and the streams s1, s2, s3 are replicated through the CDN toward them.]

SLIDE 45

Multi-stream Dependency (Problem Description)

[Figure: view v1 = {S1_A, S2_A} must be assembled within a maximum allowed delay bound d_buff before being sent to the display. When one stream of the view arrives later than d_buff, the delay bound is violated, bandwidth is wasted, and the on-time streams become victim streams.]

SLIDE 46

Understanding E2E Delay

[Figure: producer stream s1 propagates from Site-A through Site-B and Site-C across delay layers (Layer-0, Layer-1, Layer-2) to users u1, u2, u3; τ = layer size, Δ = distance from the source; end-to-end delay grows with Δ.]

  • Use Delay Layer Hierarchy

[Plot: fraction of viewers vs. maximum layer of accepted streams]

SLIDE 47

Distribution Systems for Video 360

SLIDE 48

Pipeline of 360-Degree Video

Graf, Timmerer, Mueller, “Towards Bandwidth Efficient Adaptive Streaming of Omnidirectional Video over HTTP”, ACM MMSys 2017

SLIDE 49

Challenges of 360-degree Video Distribution

  • Real-Time Stitching
  • Simulator Sickness in interactivity scenarios
  • Must react to HMD head movements as fast as the HMD refresh rate (120 Hz)
  • Viewport extraction in real time
  • Challenge: difficult to predict user orientation for more than 3 seconds
  • Challenge: if short-term prediction is needed, how do we avoid rebuffering/stalls under small playout buffers?
  • Avoidance of bandwidth waste (if one downloads viewports that are not needed)
  • Tile prefetching errors
SLIDE 50

MPEG-DASH Video Distribution System for Single 2D Video Stream

dash.js. https://github.com/Dash-Industry-Forum/dash.js/wiki.

slide-51
SLIDE 51

MPEG-DASH Video 360 Streaming using Tiles

Graf, Timmerer, Mueller, “Towards Bandwidth Efficient Adaptive Streaming of Omnidirectional Video over HTTP”, ACM MMSys 2017

SLIDE 52

360-Video Streaming Systems

  • Tiling for Adaptive Streaming
  • Video divided into tiles
  • Depending on the mapping of the spherical video projection, different tiles will be streamed
  • Tiles currently viewed by the user are streamed at high quality and the rest at low resolution
  • Personalized Viewport-Only Streaming – asymmetric panorama viewing
  • Also called asymmetric panorama viewport-adaptive streaming
  • Methods: Truncated Square Pyramid (TSP) projection, Cubemap
  • Video divided into segments
  • When the client moves their head, the viewport center changes and a new viewport must be displayed
  • Decrease of bitrate without decrease of viewport quality

ISO/IEC JTC1/SC29/WG11/M. 2016. VR/360 Video Truncated Square Pyramid Geometry for OMAF.
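The tile-selection idea can be sketched as below, assuming an equirectangular frame cut into an 8x8 grid (as in the Ochi et al. system cited on a later slide); all coordinates are fractions of the frame, and the 360-degree seam wrap-around is omitted for brevity:

```python
# Tiles overlapping the (predicted) viewport get the high-quality
# representation; all other tiles are fetched at low quality.

def viewport_tiles(center_u, center_v, fov_u, fov_v, cols=8, rows=8):
    """Raster-scan indices of tiles overlapping the viewport rectangle."""
    lo_u, hi_u = center_u - fov_u / 2, center_u + fov_u / 2
    lo_v, hi_v = center_v - fov_v / 2, center_v + fov_v / 2
    selected = []
    for r in range(rows):
        for c in range(cols):
            u0, u1 = c / cols, (c + 1) / cols      # tile bounds (fractions)
            v0, v1 = r / rows, (r + 1) / rows
            if u0 < hi_u and u1 > lo_u and v0 < hi_v and v1 > lo_v:
                selected.append(r * cols + c)       # raster-scan tile index
    return selected
```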

SLIDE 53

Tile-based HTTP Adaptive Streaming and Head Movement Prediction

Xie, Xu, Ban, Zhang, Guo, “360ProbDASH: Improving QoE of 360 Video Streaming Using Tile-based HTTP Adaptive Streaming”, ACM Multimedia 2017

SLIDE 54

Tile-based HTTP Adaptive Streaming for 360 Video

SLIDE 55

Data Model at 360ProbDASH Server

ERP – Raw Panoramic Video

  • The ERP is divided into video chunks
  • Each chunk is cropped into N tiles, indexed in raster-scan order
  • Each tile is encoded into segments at M bit-rate levels
  • M x N optional segments are stored at the server, ready for pre-fetching and streaming
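The resulting server-side catalog can be sketched as follows (tile counts and bit-rates below are made-up example values):

```python
from itertools import product

# 360ProbDASH-style server layout sketch: each ERP chunk is cropped into
# N raster-scan-indexed tiles, each encoded at M bit-rate levels, so the
# server stores M x N optional segments per chunk.

def segment_catalog(chunk_id, n_tiles, bitrates_kbps):
    """Enumerate the M x N pre-encoded segments for one video chunk."""
    return [{"chunk": chunk_id, "tile": t, "kbps": b}
            for t, b in product(range(n_tiles), bitrates_kbps)]
```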
SLIDE 56

360ProbDASH Approach

  • Pre-fetch segments by predicting the viewport
  • Use a probabilistic model for prediction
  • Leverage linear-regression prediction of orientation
  • Distribution of prediction errors
  • Long-term predictions are hard
  • Data from 5 users collected for short-term prediction error (3 seconds)

[Plot: yaw, pitch, and roll prediction errors for delta = 3 sec; source: Yourstory.com]
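The linear-regression predictor can be sketched in a few lines of plain least squares on one angle; a real predictor would treat yaw, pitch, and roll separately and handle 360-degree wrap-around, so this is only an illustration:

```python
# Fit angle = slope * t + intercept over a short history window of
# (timestamp, angle) samples, then extrapolate to the pre-fetch horizon.

def predict_orientation(samples, horizon):
    """samples: list of (t, angle) pairs; returns angle at last_t + horizon."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_a = sum(a for _, a in samples) / n
    cov = sum((t - mean_t) * (a - mean_a) for t, a in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = cov / var if var else 0.0
    intercept = mean_a - slope * mean_t
    last_t = samples[-1][0]
    return slope * (last_t + horizon) + intercept
```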

SLIDE 57

Tile-based Adaptive Video Streaming

  • Ochi et al. use tile-based streaming where the spherical video is mapped to equirectangular video and cut into 8x8 tiles
  • Hosseini and Swaminathan use hexa-face sphere-based tiling of 360-degree video to take projection distortion into account
  • Description of tiles with the MPEG-DASH Spatial Relation Description
  • Qian et al. use prediction of head movement to deliver tiles
  • Weaknesses of tiling systems
  • Time- and energy-consuming reconstruction
  • Coding inefficiency due to independent tiling
  • Server management of files is difficult due to the large number of quality levels and large MPD files
  • The client selection process is complex
  • Mixed bit-rate tiles can result in visible borders and quality inconsistency when rendering combined tiles
  • Multiple decoders
  • D. Ochi, Y. Kunita, A. Kameda, A. Kojima, and S. Iwaki. Live streaming system for omnidirectional video. In Proc. of IEEE Virtual Reality (VR), 2015.
  • M. Hosseini and V. Swaminathan. Adaptive 360 VR video streaming: Divide and conquer! In IEEE International Symposium on Multimedia (ISM), 2016.
  • F. Qian, B. Han, L. Ji, and V. Gopalakrishnan. Optimizing 360 video delivery over cellular networks. In ACM SIGCOMM AllThingsCellular, 2016.
SLIDE 58

QER Viewport-Adaptive Streaming

Corbillon, Simon, Devlic, Chakareski, “Viewport-Adaptive Navigable 360-Degree Video Delivery”, May 2017

SLIDE 59

Viewport Adaptive Streaming System

Corbillon, Simon, Devlic, Chakareski, “Viewport-Adaptive Navigable 360-Degree Video Delivery”, May 2017

SLIDE 60

Approach: QER - Quality Emphasized Region

  • Not only bit-rate adaptation but also QER server-side adaptation, where different regions have different quality
  • QER – Quality Emphasized Region
  • Each QER is represented by a Quality Emphasis Center (QEC)
  • The full video is delivered in a certain projection representation (equirectangular, cube, ..), but in different versions, one per QEC
  • The client device selects the right representation and extracts the viewport
  • Viewport-adaptive streaming similar to DASH
  • The client runs an adaptation algorithm to select a video representation; it selects the QER and the QEC of the available QERs
  • QEC selection is based on the smallest orthodromic distance
  • Orthodromic distance – shortest distance between two points on the surface of a sphere, measured along the surface of the sphere
  • Video segment length
  • Temporal chunk sent from the server – 1-10 seconds
  • Tradeoff between short and long segments
  • Expanded MPD
  • MPD file expanded with new information
  • Coordinates of its QEC in degrees
  • Two angles: (0, 360) degrees and (-90, 90) degrees
  • All representations assume the same reference coordinate system
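The QEC selection step can be sketched with the spherical-law-of-cosines form of the great-circle distance (points given as (yaw, pitch) in radians; an illustrative sketch, not the paper's implementation):

```python
import math

# Pick the Quality Emphasis Center with the smallest orthodromic
# (great-circle) distance to the current viewport center on the unit sphere.

def orthodromic(a, b):
    """Great-circle distance (radians) between two (yaw, pitch) points."""
    (l1, f1), (l2, f2) = a, b
    cos_d = (math.sin(f1) * math.sin(f2) +
             math.cos(f1) * math.cos(f2) * math.cos(l1 - l2))
    return math.acos(min(1.0, max(-1.0, cos_d)))  # clamp for float safety

def pick_qec(viewport_center, qecs):
    """Index of the QEC closest to the current viewport center."""
    return min(range(len(qecs)),
               key=lambda i: orthodromic(viewport_center, qecs[i]))
```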
SLIDE 61

QER-Based Viewport Adaptive Streaming

Corbillon, Simon, Devlic, Chakareski, “Viewport-Adaptive Navigable 360-Degree Video Delivery”, May 2017

SLIDE 62

Examples of Experimental Results

  • Metrics on the extracted viewport – (1) MS-SSIM: Multi-Scale Structural Similarity and (2) PSNR
  • Original equirectangular video at full quality – 4K video with 1080p viewport resolution
  • QEC – the face containing the QEC is encoded at best quality, other faces at 25% of full quality
  • Distance – for d = 0, QEC and viewport center match (quality 0.98); as d increases, quality decreases
  • QEC count – with an increased number of QECs, quality increases; shorter segments are better
SLIDE 63

Similarities and Differences of Distribution Systems

SLIDE 64

Similarities:

Parameter              | 3DTI Video     | 360-Degree Video
Dealing with bandwidth | Adapt views    | Adapt viewports
View change            | Yes            | Yes
Navigation via mouse   | Yes            | Yes
Client adaptation      | Yes            | Yes
Streaming protocols    | TCP-based      | TCP-based

Differences:

Parameter              | 3DTI Video | 360-Degree Video
Dealing with bandwidth | Adapt views/streams | Adapt viewports/tiles
Encoding standards     | zlib; some efforts in MPEG/OMAF on 3DTI compression | MPEG-DASH considers omnidirectional video tiles
Distribution style     | Real-time view-based telepresence or live view-based broadcast | On-demand DASH-style
Clients                | Homogeneous | Heterogeneous
Viewing                | Flat 2D or 3D displays | Head-Mounted Displays
Streaming protocols    | TCP-based | HTTP-based (MPEG-DASH standard)
Navigation             | Via mouse only | Via mouse, head movement, hand movement

SLIDE 65

Conclusion and Summary

  • 360-degree viewing is becoming possible via
  • 3D teleimmersive video or
  • Omnidirectional video
  • First solutions are coming up in terms of
  • capture, encoding, and viewing
  • But distribution represents a challenge
  • Real-time live streaming or
  • Near-real-time distribution of 360-degree video
  • A lot of the presented material will be published in a survey paper
  • “Scalable 360-Degree Video Streaming: Challenges, Solutions and Opportunities”
  • Authors: Michael Zink, Ramesh Sitaraman, Klara Nahrstedt
  • Journal venue: Proceedings of the IEEE special issue
  • Editors: Boris Koldehofe, Ralf Steinmetz, …
  • Coming up in early 2019