Holograms are the Next Video. Philip A. Chou, 8i Labs, Inc. ACM Multimedia Systems Conference (PowerPoint presentation)


slide-1
SLIDE 1

Holograms are the Next Video

Philip A. Chou, 8i Labs, Inc. ACM Multimedia Systems Conference 13 June 2018

slide-2
SLIDE 2

Princess Leia (Star Wars Episode IV, 1977)

slide-3
SLIDE 3

The Holodeck (Star Trek: The Next Generation, Episode 12, 1988)

slide-4
SLIDE 4

Black Panther, 2018

slide-5
SLIDE 5

Solo, 2018: “No, you can’t wipe ’em off. They’re holograms.” (Tobias Beckett to Chewbacca)

slide-6
SLIDE 6
slide-7
SLIDE 7

Gabor Holograms

  • Dennis Gabor, “A new microscopic principle,” Nature, 1948.
  • Etymology: holo + gram, from Ancient Greek ὅλος (hólos, “whole”) + γραμμή (grammḗ, “letter, line, writing, message”)

[Figure: hologram Encode and Decode stages]

Source: https://en.wikipedia.org/wiki/Holography
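A minimal numeric sketch of the Encode/Decode idea, assuming sampled 1D complex fields: recording keeps only the intensity $|R + O|^2$ of a reference wave $R$ plus an object wave $O$, and re-illuminating that pattern with $R$ produces a term $|R|^2 O$, a copy of the object wave, alongside other terms. The field shapes below are arbitrary illustrative choices, not taken from the talk.

```python
import cmath
import math

def record(obj, ref):
    """Encode: a film or sensor stores only the intensity |R + O|^2 at each sample."""
    return [abs(r + o) ** 2 for r, o in zip(ref, obj)]

def reconstruct(intensity, ref):
    """Decode: illuminate the recorded pattern with the reference wave R."""
    return [i * r for i, r in zip(intensity, ref)]

n = 64
# Illustrative fields: a unit-amplitude plane reference wave and a weak object wave.
ref = [cmath.exp(2j * math.pi * 0.25 * x) for x in range(n)]
obj = [0.1 * (1 + 0.5 * math.sin(0.2 * x)) * cmath.exp(0.3j * x) for x in range(n)]

holo = record(obj, ref)
field = reconstruct(holo, ref)

# I * R = (|R|^2 + |O|^2) R  +  |R|^2 O  +  R^2 conj(O)
# The middle term is the object wave itself (scaled by |R|^2 = 1 here).
expansion = [
    (abs(r) ** 2 + abs(o) ** 2) * r + abs(r) ** 2 * o + r * r * o.conjugate()
    for r, o in zip(ref, obj)
]
max_err = max(abs(f - e) for f, e in zip(field, expansion))
print(f"largest deviation from the expansion: {max_err:.1e}")
```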

slide-8
SLIDE 8

Gabor Holograms as (Angular Spectra of) Images from Multiple Viewpoints

[Figure: two panels, each labeled with position axes $u$, $v$ and angles $\theta$, $\phi$]

slide-9
SLIDE 9

Images from Multiple Viewpoints as Light Fields

[Figure: light field parameterized by axes $u$, $v$ and $s$, $t$]
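Under the common two-plane parameterization (Levoy and Hanrahan), a ray is indexed by where it crosses two parallel planes, $(u, v)$ on one and $(s, t)$ on the other, so each viewpoint image is a 2D slice of a 4D function. The sketch below assumes the planes sit at $z = 0$ and $z = 1$ and converts a $(u, v, s, t)$ sample into an origin plus direction angles $(\theta, \phi)$. The plane placement and angle conventions are assumptions for illustration.

```python
import math

def ray_from_two_planes(u, v, s, t, z_uv=0.0, z_st=1.0):
    """Ray through (u, v) on the plane z = z_uv and (s, t) on the plane z = z_st.

    Returns the origin on the (u, v) plane and a unit direction vector.
    """
    origin = (u, v, z_uv)
    d = (s - u, t - v, z_st - z_uv)
    length = math.sqrt(sum(c * c for c in d))
    return origin, tuple(c / length for c in d)

def direction_to_angles(d):
    """Polar angle theta (from +z) and azimuth phi (in the x-y plane) of a unit vector."""
    theta = math.acos(d[2])
    phi = math.atan2(d[1], d[0])
    return theta, phi

origin, d = ray_from_two_planes(0.0, 0.0, 1.0, 0.0)
theta, phi = direction_to_angles(d)
print(origin, theta, phi)   # theta is approximately pi/4, phi is 0
```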

slide-10
SLIDE 10

Light Fields as Point Clouds

slide-11
SLIDE 11

Agenda

  • Introduction
  • Holograms == Volumetric Media (Gabor Holograms, Light Fields, Point Clouds, …)
  • Applications
  • Historical remarks
  • Point Cloud Compression (PCC)
  • Light Field Compression using PCC
  • Streaming Holograms
  • Conclusion
slide-12
SLIDE 12

Applications

slide-13
SLIDE 13

Holograms: The Medium to Represent Natural Content in VR / AR / MR

VR puts you in a virtual world.
AR puts virtual objects in your world.

slide-14
SLIDE 14

Audio: Three Modes of Distribution

  • On-Demand
  • Live Broadcast
  • Telecommunication

slide-15
SLIDE 15

Video: Three Modes of Distribution

  • On-Demand
  • Live Broadcast
  • Telecommunication

slide-16
SLIDE 16

Holograms: Three Modes of Distribution

  • On-Demand
  • Live Broadcast
  • Telecommunication

Buzz Aldrin: Cycling Pathways to Mars

slide-17
SLIDE 17

Historical Remarks

slide-18
SLIDE 18

180+ years since the invention of images (as photographs)

Daguerreotype, 1838

slide-19
SLIDE 19

140+ years since the invention of audio (as telephony)

Telephone ca. 1875

slide-20
SLIDE 20

90+ years since the invention of video (as television)

Television, 1926

slide-21
SLIDE 21

Dawn of Digital Video

Arun Netravali, Head Visual Communications Research Dept. IEEE Transactions on Computers, 1974

slide-22
SLIDE 22

JPEG (January 1988)

Today: > 1 trillion photos/year

Photo from JPEG (Macau, October 2017): Celebration of 25th Anniversary of JPEG Standard (1992)
slide-23
SLIDE 23

MPEG PCC (Macau, October 2017)

Hologram compression today is like video compression in 1988

slide-24
SLIDE 24

Subjective Results: Category 2 (Dynamic)

[Figure: three plots of subjective scores at rate points of 3.9, 6.0, 13, and 27 Mbps; 3.5, 6.0, 11, and 20 Mbps; and 3.5, 6.0, 9, and 18 Mbps]

slide-25
SLIDE 25

MPEG Point Cloud Compression (PCC)

1. Static (single-frame)
2. Dynamic (multi-frame)
3. Dynamic Acquisition (e.g., from Mobile Mapping Systems)

slide-26
SLIDE 26

“Video-based” approach

  • Patch information
    • Dominant axis
    • (x,y,z) offset
    • (u,v) offset
    • Dimensions
  • Occupancy map
  • Geometry video
  • Texture video
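One way to picture how a patch becomes part of the geometry video and occupancy map: project the patch's points along its dominant axis, store per-pixel depth relative to the patch's 3D offset, and mark occupied pixels. This is a hypothetical, heavily simplified sketch (one depth layer, nearest point wins, no packing or padding), not the MPEG TMC2 algorithm, and the keys in `info` are illustrative names for the patch information listed above.

```python
from typing import Dict, List, Set, Tuple

Point = Tuple[int, int, int]

def project_patch(points: List[Point], axis: int, u0: int = 0, v0: int = 0):
    """Project a patch onto the plane orthogonal to `axis` (0 = x, 1 = y, 2 = z).

    Returns patch info, a depth map {(u, v): depth}, and the set of occupied pixels.
    """
    ta, tb = [a for a in range(3) if a != axis]                  # tangent axes -> (u, v)
    offset = tuple(min(p[a] for p in points) for a in range(3))  # patch (x, y, z) offset
    width = max(p[ta] for p in points) - offset[ta] + 1
    height = max(p[tb] for p in points) - offset[tb] + 1
    depth: Dict[Tuple[int, int], int] = {}
    for p in points:
        pixel = (u0 + p[ta] - offset[ta], v0 + p[tb] - offset[tb])
        d = p[axis] - offset[axis]
        if pixel not in depth or d < depth[pixel]:               # nearest point wins
            depth[pixel] = d
    occupancy: Set[Tuple[int, int]] = set(depth)                 # occupancy map
    info = {"axis": axis, "offset3d": offset, "offset2d": (u0, v0), "size": (width, height)}
    return info, depth, occupancy

pts = [(5, 2, 7), (5, 3, 7), (6, 2, 9), (6, 2, 8), (7, 2, 7)]
info, depth, occ = project_patch(pts, axis=0)
print(info["size"], depth)
```

Note that the point (7, 2, 7) lands on an already-occupied pixel and is dropped here; a real codec needs extra layers or patches to keep such points.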
slide-27
SLIDE 27

“Native 3D” approach to coding geometry

[Figure: octree with example occupancy codes 10010001, 10010001, 11001001, 10010001]
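The bit strings in the figure are consistent with 8-bit octree occupancy codes, one bit per child node. A minimal breadth-first sketch that produces such codes from integer voxel coordinates; the child-to-bit mapping and node order are assumed conventions, and real codecs additionally entropy-code these bytes.

```python
from typing import List, Set, Tuple

Voxel = Tuple[int, int, int]

def occupancy_codes(voxels: Set[Voxel], depth: int) -> List[int]:
    """Breadth-first octree serialization: one 8-bit child-occupancy code per occupied node.

    Voxel coordinates must lie in [0, 2**depth). Child index = (x_bit << 2) | (y_bit << 1) | z_bit,
    and child 0 maps to the most significant bit (an assumed convention).
    """
    codes: List[int] = []
    level = {(0, 0, 0)}                       # occupied nodes at the current depth
    for d in range(depth):
        shift = depth - d - 1                 # coordinate bit that selects the child
        code_of = {}
        for x, y, z in voxels:
            parent = (x >> (shift + 1), y >> (shift + 1), z >> (shift + 1))
            child = (((x >> shift) & 1) << 2) | (((y >> shift) & 1) << 1) | ((z >> shift) & 1)
            code_of[parent] = code_of.get(parent, 0) | (1 << (7 - child))
        # Visit nodes in an order the decoder can reproduce (sorted here; codecs often use Morton order).
        codes.extend(code_of[node] for node in sorted(level))
        level = {(x >> shift, y >> shift, z >> shift) for x, y, z in voxels}
    return codes

codes = occupancy_codes({(0, 0, 0), (3, 3, 3)}, depth=2)
print([format(c, "08b") for c in codes])   # ['10000001', '10000000', '00000001']
```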

slide-28
SLIDE 28

“Native 3D” approach to coding attributes

[Figure: per-point attribute triples, e.g., 221,136,255 / 255,153,255 / 255,102,255 / 153,153,255]

e.g., Y, U, V

slide-29
SLIDE 29

Point Cloud Attribute Compression using a Region Adaptive Hierarchical Transform (RAHT)

Ricardo L. de Queiroz and Philip A. Chou, “Compression of 3D Point Clouds Using a Region-Adaptive Hierarchical Transform,” IEEE Trans. Image Processing, Aug 2016.

Maja Krivokuca, Maxim Koroteev, Philip A. Chou, Robert Higgs, and Charles Loop, “A Volumetric Approach to Point Cloud Compression,” in preparation.
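A compact sketch of the RAHT butterfly described in the de Queiroz and Chou paper: sibling nodes with weights $w_1, w_2$ (point counts) and coefficients $c_1, c_2$ are combined, with $a = \sqrt{w_1/(w_1+w_2)}$ and $b = \sqrt{w_2/(w_1+w_2)}$, into a low-pass $a c_1 + b c_2$ (passed up with weight $w_1 + w_2$) and a high-pass $-b c_1 + a c_2$. The axis order and traversal here are choices for illustration, sign conventions vary, and quantization and entropy coding are omitted. Because each butterfly is orthonormal, signal energy is preserved, and the final DC coefficient equals $\sqrt{N}$ times the mean attribute.

```python
import math
from typing import Dict, List, Tuple

Voxel = Tuple[int, int, int]

def raht_forward(attrs: Dict[Voxel, float], depth: int) -> Tuple[float, List[float]]:
    """Forward RAHT sketch on voxelized points with coordinates in [0, 2**depth).

    Returns the final low-pass (DC) coefficient and the list of high-pass coefficients.
    """
    # node -> (weight, low-pass coefficient); every occupied voxel starts with weight 1
    nodes: Dict[Voxel, Tuple[float, float]] = {p: (1.0, a) for p, a in attrs.items()}
    highs: List[float] = []
    for _ in range(depth):
        for axis in (0, 1, 2):                 # merge along x, then y, then z (an assumed order)
            pairs: Dict[Voxel, list] = {}
            for p, (w, c) in nodes.items():
                q = list(p)
                bit = q[axis] & 1
                q[axis] >>= 1
                pairs.setdefault(tuple(q), []).append((bit, w, c))
            nodes = {}
            for q, kids in pairs.items():
                if len(kids) == 1:             # no sibling along this axis: pass through
                    _, w, c = kids[0]
                    nodes[q] = (w, c)
                    continue
                (_, w1, c1), (_, w2, c2) = sorted(kids)   # sibling with bit 0 first
                a = math.sqrt(w1 / (w1 + w2))
                b = math.sqrt(w2 / (w1 + w2))
                nodes[q] = (w1 + w2, a * c1 + b * c2)     # weighted low-pass
                highs.append(-b * c1 + a * c2)            # weighted high-pass
    [(_, dc)] = list(nodes.values())
    return dc, highs

attrs = {(0, 0, 0): 10.0, (1, 0, 0): 20.0, (3, 2, 1): 40.0}
dc, highs = raht_forward(attrs, depth=2)
print(dc, highs)
```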

slide-30
SLIDE 30

Three Generations of Transforms for Point Cloud Attribute Compression

  • 1. Graph Signal Processing (Graph Fourier Transform – GFT)
  • 2. Sampled Spatial Stochastic Process (Gaussian Process Transform – GPT)
  • 3. Volumetric Functions (Region Adaptive Hierarchical Transform – RAHT)
slide-31
SLIDE 31

Measure

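  • A measure $\mu: S \mapsto \mathbb{R}^+$ maps each set to a non-negative real number
  • The sets lie in a $\sigma$-algebra $\mathcal{B}$ (a set of sets for which $S_i \in \mathcal{B} \Rightarrow S_i^c \in \mathcal{B}$ and $\cup_i S_i \in \mathcal{B}$)
  • If $S_1, S_2, \ldots$ are disjoint, then $\mu(\cup_i S_i) = \sum_i \mu(S_i)$.
  • Examples:
    • Lebesgue measure on $\mathbb{R}$ maps each interval of length $L$ to $L$
    • The probability distribution of a random variable $X$ maps each set $S$ to the probability that $X \in S$
    • Counting measure w.r.t. points $\mathbf{x}_1, \ldots, \mathbf{x}_n \in \mathbb{R}^3$ maps each $S \subset \mathbb{R}^3$ to the number of points in $S$

[Figure: example sets containing points $x_1$ and $x_2$, with $\mu = 2$ under the counting measure]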

slide-32
SLIDE 32

Measure defines Integration

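$$\int f(\mathbf{x})\, d\mu(\mathbf{x}) = \lim_{\Delta \to 0} \Delta \sum_{n} \mu(\{\mathbf{x} \mid f(\mathbf{x}) \ge n\Delta\}) = \sum_i f(\mathbf{x}_i)$$

(The last equality holds for the counting measure on the points $\mathbf{x}_i$.)

[Figure: $f(x)$ sliced at levels $n\Delta$; each slice of height $\Delta$ contributes $\Delta \cdot \mu(\{x \mid f(x) \ge n\Delta\})$]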
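A quick numeric check of this layer-cake formula, assuming a non-negative function and the counting measure on three illustrative points: for small $\Delta$, $\Delta$ times the sum of the measures of the super-level sets is within $n\Delta$ of $\sum_i f(\mathbf{x}_i)$ for $n$ points.

```python
import math

def counting_measure(points):
    """mu(S) = number of the given points lying in S; S is passed as a predicate."""
    return lambda in_set: sum(1 for x in points if in_set(x))

def layer_cake_integral(f, mu, delta, f_max):
    """Delta * sum_{n >= 1} mu({x : f(x) >= n * delta}) for 0 <= f <= f_max."""
    n_levels = math.ceil(f_max / delta)
    return delta * sum(mu(lambda x, n=n: f(x) >= n * delta) for n in range(1, n_levels + 1))

pts = [(0.1, 0.2, 0.3), (0.5, 0.5, 0.5), (0.9, 0.1, 0.4)]
f = lambda x: x[0] + x[1] + x[2]          # a non-negative "attribute" function
mu = counting_measure(pts)
approx = layer_cake_integral(f, mu, delta=1e-3, f_max=2.0)
exact = sum(f(x) for x in pts)            # sum_i f(x_i)
print(approx, exact)
```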

slide-33
SLIDE 33

Integration defines Inner Product. Inner Product defines Norm, Orthogonality.

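$$\langle f, g \rangle = \int f(\mathbf{x})\, g(\mathbf{x})\, d\mu(\mathbf{x}) = \sum_i f(\mathbf{x}_i)\, g(\mathbf{x}_i), \qquad \|f\|^2 = \langle f, f \rangle = \sum_i f(\mathbf{x}_i)^2$$

$$f \perp g \iff 0 = \langle f, g \rangle = \sum_i f(\mathbf{x}_i)\, g(\mathbf{x}_i)$$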

⇒ Measure defines Hilbert Space, and with it all the machinery required for function approximation
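Under the counting measure, orthogonality depends on where the points are. A small 1D check, assuming three points on $[0, 1)$: the usual Haar wavelet, orthogonal to the constant function under Lebesgue measure, is not orthogonal to it under the counting measure when the two halves hold different numbers of points. Rescaling the wavelet by the point counts (an illustrative choice of weights) restores orthogonality.

```python
import math

def inner(f, g, points):
    """<f, g> under the counting measure on `points`: sum_i f(x_i) g(x_i)."""
    return sum(f(x) * g(x) for x in points)

def norm(f, points):
    """||f|| = sqrt(<f, f>)."""
    return math.sqrt(inner(f, f, points))

pts = [0.1, 0.3, 0.8]                              # two points in [0, 0.5), one in [0.5, 1)
phi = lambda x: 1.0                                # constant (scaling) function on [0, 1)
psi = lambda x: 1.0 if x < 0.5 else -1.0           # Haar wavelet, orthogonal to phi under Lebesgue measure

n_left = sum(1 for x in pts if x < 0.5)
n_right = len(pts) - n_left
psi_w = lambda x: 1.0 / n_left if x < 0.5 else -1.0 / n_right   # reweighted by point counts

print(inner(phi, psi, pts))     # 1.0: not orthogonal under the counting measure
print(inner(phi, psi_w, pts))   # 0.0: orthogonal under the counting measure
```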

slide-34
SLIDE 34

Cardinal B-Splines of Order $p$

Scaling functions: integer shifts of the scaling functions span the space of functions that are

  • Piecewise polynomial of degree $p - 1$ over unit intervals
  • Continuously differentiable up to order $p - 2$
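A sketch that evaluates cardinal B-splines with the standard recurrence $N_1 = \mathbf{1}_{[0,1)}$, $N_p(x) = [x N_{p-1}(x) + (p - x) N_{p-1}(x - 1)]/(p - 1)$, and checks that the integer shifts sum to one (partition of unity). The recurrence is a textbook construction, not something taken from the talk.

```python
import math

def cardinal_bspline(p: int, x: float) -> float:
    """Cardinal B-spline N_p of order p (piecewise polynomial of degree p - 1) on [0, p)."""
    if p == 1:
        return 1.0 if 0.0 <= x < 1.0 else 0.0
    return (x * cardinal_bspline(p - 1, x) + (p - x) * cardinal_bspline(p - 1, x - 1.0)) / (p - 1)

def sum_of_shifts(p: int, x: float) -> float:
    """Sum over integer k of N_p(x - k); only k with 0 <= x - k < p contribute."""
    k0 = math.floor(x)
    return sum(cardinal_bspline(p, x - k) for k in range(k0 - p, k0 + 2))

for p in (1, 2, 3, 4):
    print(p, [round(sum_of_shifts(p, x), 12) for x in (0.0, 0.3, 1.7, 2.5)])
```

For example, `cardinal_bspline(2, x)` is the hat function on [0, 2], and `cardinal_bspline(4, x)` is the cubic B-spline, whose peak value at x = 2 is 2/3.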
slide-35
SLIDE 35

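B-Spline Basis Functions ($p = 1$)

[Figure: basis functions spanning $V_0$, $V_1$, $V_2$ and $W_0$, $W_1$]

$V_0 \oplus W_0 = V_1$, $V_1 \oplus W_1 = V_2$

Nested subspaces: $V_0 \subset V_1 \subset V_2$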

slide-36
SLIDE 36

B-Spline Wavelet Basis Functions ($p = 1$)

[Figure: scaling functions ($V$) and wavelets ($W$) using Lebesgue measure vs. using counting measure]

slide-37
SLIDE 37

Multiresolution Approximation

[Figure: multiresolution approximation in $V_1$ and $V_2$, using Lebesgue measure vs. using counting measure]

slide-38
SLIDE 38

B-Spline Approximation ($p = 1$)

[Figure panels: Level 5 (917 coeffs), Level 6 (3821 coeffs), Level 7 (15604 coeffs), Level 8 (62073 coeffs), Level 9 (237965 coeffs)]
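For $p = 1$, a minimal sketch of the level-$L$ approximation under the counting measure, assuming point coordinates normalized to $[0, 1)^3$: the orthogonal projection onto functions that are constant on each voxel of side $2^{-L}$ is the mean attribute of the points in that voxel, so there is one coefficient per occupied voxel.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
Cell = Tuple[int, int, int]

def bspline1_approx(points: List[Point], attrs: List[float], level: int) -> Dict[Cell, float]:
    """Order-1 (piecewise-constant) approximation at `level` under the counting measure.

    Assumes coordinates in [0, 1)^3; returns one coefficient (the mean attribute) per occupied voxel.
    """
    scale = 2 ** level
    sums: Dict[Cell, float] = defaultdict(float)
    counts: Dict[Cell, int] = defaultdict(int)
    for (x, y, z), a in zip(points, attrs):
        cell = (int(x * scale), int(y * scale), int(z * scale))
        sums[cell] += a
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}

pts = [(0.1, 0.1, 0.1), (0.2, 0.1, 0.1), (0.9, 0.9, 0.9)]
vals = [10.0, 20.0, 30.0]
for level in (1, 3):
    coeffs = bspline1_approx(pts, vals, level)
    print(level, len(coeffs), coeffs)
```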

slide-39
SLIDE 39

B-Spline Approximation ($p = 2$)

[Figure panels: Level 5 (1699 coeffs), Level 6 (7213 coeffs), Level 7 (30455 coeffs), Level 8 (125244 coeffs), Level 9 (497199 coeffs)]

slide-40
SLIDE 40

Compression Results

Comparison to Zhang, Florencio, and Loop, “Point cloud attribute compression with graph transform,” ICIP 2014

slide-41
SLIDE 41

Surface Light Field Compression using a Point Cloud Codec

Xiang Zhang, Philip A. Chou, Ming-Ting Sun, Maolong Yang, et al., “Surface Light Field Compression using a Point Cloud Codec,” submitted to IEEE JETCAS special issue on immersive video, and to appear at ICIP 2018.

slide-42
SLIDE 42

“Light Field” == Plenoptic Function

  • 7D: $f(x, y, z, \theta, \phi, \lambda, t)$
  • 5D: $f(x, y, z, \theta, \phi)$
  • 4D: $f(x, y, \theta, \phi)$
  • E. H. Adelson and J. R. Bergen, “The plenoptic function and the elements of early vision,” in Computational Models of Visual Processing, 1991.
slide-43
SLIDE 43

Image-Based Light Field Representations

  • M. Levoy and P. Hanrahan, “Light field rendering,” SIGGRAPH 1996.
  • S. J. Gortler, R. Grzeszczuk, R. Szeliski, M. Cohen, “The Lumigraph,” SIGGRAPH 1996.

[Figure: multiview representation; lenslet representation]

slide-44
SLIDE 44

Surface Light Field (SLF)

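  • The SLF can be regarded as a function $f(\mathbf{p}, \boldsymbol{\omega})$, representing the color of surface point $\mathbf{p} = (x, y, z)$ when viewed from direction $\boldsymbol{\omega} = (\theta, \phi)$.
  • The spherical image $f(\boldsymbol{\omega} \mid \mathbf{p})$, or view map, for each $\mathbf{p}$ generalizes the lenslet representation.
  • To compress $f(\mathbf{p}, \boldsymbol{\omega})$ efficiently:
    • Represent $f(\boldsymbol{\omega} \mid \mathbf{p})$ for each $\mathbf{p}$ in some image basis
    • Compress the coefficients across surface points to reduce spatial redundancy
  • D. N. Wood, et al., “Surface light fields for 3D photography,” SIGGRAPH 2000
  • W.-C. Chen, et al., “Light field mapping: efficient representation and hardware rendering of surface light fields,” SIGGRAPH 2002

[Figure: surface point $\mathbf{p}$, viewing direction $\boldsymbol{\omega}$, and the view map $f(\boldsymbol{\omega} \mid \mathbf{p})$]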

slide-45
SLIDE 45

View Map Representation

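Linear combination of basis functions: $f(\boldsymbol{\omega} \mid \mathbf{p}) = \sum_i G_i(\boldsymbol{\omega})\, \alpha_i(\mathbf{p})$, with B-spline wavelet basis functions $G_i$.

Observations $\mathbf{c} \approx \mathbf{G}\boldsymbol{\alpha}$, with basis functions $\mathbf{G}$ and coefficients $\boldsymbol{\alpha}$:

$$\hat{\boldsymbol{\alpha}} = \arg\min_{\boldsymbol{\alpha}} \|\mathbf{G}\boldsymbol{\alpha} - \mathbf{c}\|^2 + \lambda \|\boldsymbol{\alpha}\|^2 + \beta \|\boldsymbol{\alpha} - \bar{\boldsymbol{\alpha}}\|^2$$

[Figure: view map basis functions, with an axis labeled $\theta \sin\phi$]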
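Setting the gradient of $\|G\alpha - c\|^2 + \lambda\|\alpha\|^2 + \beta\|\alpha - \bar\alpha\|^2$ to zero gives the normal equations $(G^\top G + (\lambda + \beta) I)\,\alpha = G^\top c + \beta\bar\alpha$. A dependency-free sketch that solves them, assuming entry `G[i][j]` is basis function $j$ evaluated at the $i$-th observed view direction of one surface point and `c` holds the observed colors. The slide does not define $\bar\alpha$, so here it is simply a given prior vector, and `fit_view_map` is a hypothetical name.

```python
from typing import List

Matrix = List[List[float]]

def solve(A: Matrix, b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting (A square and non-singular)."""
    n = len(A)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_view_map(G: Matrix, c: List[float], lam: float, beta: float, alpha_bar: List[float]) -> List[float]:
    """argmin_a ||G a - c||^2 + lam ||a||^2 + beta ||a - alpha_bar||^2 via the normal equations."""
    m, n = len(G), len(G[0])
    A = [[sum(G[i][r] * G[i][s] for i in range(m)) + (lam + beta if r == s else 0.0)
          for s in range(n)] for r in range(n)]
    rhs = [sum(G[i][r] * c[i] for i in range(m)) + beta * alpha_bar[r] for r in range(n)]
    return solve(A, rhs)

G = [[1, 0], [0, 1], [1, 1]]
c = [1.0, 2.0, 3.0]
print(fit_view_map(G, c, lam=0.0, beta=0.0, alpha_bar=[0.0, 0.0]))
print(fit_view_map(G, c, lam=0.1, beta=1.0, alpha_bar=[1.0, 1.0]))
```

Larger $\lambda$ shrinks the coefficients toward zero, and larger $\beta$ pulls them toward $\bar\alpha$.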

slide-46
SLIDE 46

Compress Coefficients of Representation

  • Spatially, using a point cloud codec
    • Coefficients are attributes of the points
  • In this work, we used
    • Octree+RAHT PCC (MPEG PCC TMC1)
    • Video-based PCC (MPEG PCC TMC2)
  • All the SLF coefficients are scaled to the range [0, 255] for the 8-bit video codec
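The slide says only that the coefficients are scaled to [0, 255]. A minimal sketch, assuming a linear min/max mapping with the minimum and maximum sent as side information, showing the round trip and its half-step quantization error.

```python
from typing import List, Tuple

def to_uint8(coefs: List[float]) -> Tuple[List[int], float, float]:
    """Map coefficients linearly onto 0..255; lo and hi must travel as side information."""
    lo, hi = min(coefs), max(coefs)
    span = (hi - lo) or 1.0                 # guard against a constant input
    return [round((c - lo) / span * 255) for c in coefs], lo, hi

def from_uint8(q: List[int], lo: float, hi: float) -> List[float]:
    """Invert the mapping; the error is at most half a quantization step."""
    span = (hi - lo) or 1.0
    return [lo + v / 255 * span for v in q]

coefs = [-1.0, 0.0, 2.5, 1.2]
q, lo, hi = to_uint8(coefs)
rec = from_uint8(q, lo, hi)
print(q, max(abs(a - b) for a, b in zip(coefs, rec)))
```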
slide-47
SLIDE 47

Datasets

Synthetic datasets: Can, Die
Natural datasets: Elephant, Fish

  • D. N. Wood, et al., “Surface light fields for 3D photography,” SIGGRAPH 2000
slide-48
SLIDE 48

Die LF reconstruction

[Figure panels: N = 1 (0.30 MB), N = 8 (0.62 MB), N = 32 (1.71 MB), N = 128 (3.90 MB)]

slide-49
SLIDE 49

Fish LF reconstruction

[Figure panels: N = 1 (0.24 MB), N = 8 (0.53 MB), N = 32 (1.57 MB), N = 128 (4.02 MB)]

slide-50
SLIDE 50

RD Performance

[Figure: RD performance for Fish and Elephant]

slide-51
SLIDE 51

Streaming of Volumetric Media

Jounsup Park, Philip A. Chou, and Jenq-Neng Hwang, “Rate-Utility Optimized Streaming of Volumetric Media for Augmented Reality,” arXiv:1804.09864. Also submitted to IEEE JETCAS special issue on immersive video, and to appear at Globecom 2018.

slide-52
SLIDE 52

Streaming begins: Delivery rate > Media rate

Streaming

QCIF (176x144) streaming video over 56 Kbps in 1997

Hologram streaming today is like video streaming in 1997

slide-53
SLIDE 53

Streaming 360° (Spherical) Video as Tiles

https://bitmovin.com/bitmovin-receives-excellence-dash-award-tile-based-streaming-vr-360-video/

slide-54
SLIDE 54
slide-55
SLIDE 55
slide-56
SLIDE 56

[Figure: streaming system architecture with capture stage, VPC, 3D tiles, encoding, multiple representations per tile, data delivery, client buffer manager (requests data chunks, uses viewport information), buffer (new data, play out), decoding, rendering, reconstructed VPC, and the user’s viewport]

slide-57
SLIDE 57

DASH-Like File Layout at Server

[Figure: file layout with a manifest, objects, segments, representations, groups of frames (GOFs), tiles, and an index]

slide-58
SLIDE 58

Window-Based Algorithm

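[Figure: media timeline $t$ (marks $t_0$, $t_{i-1}$, $t_i$, $t_{i+1}$) and playout timeline $\tau$ (mark $\tau_0$), with a window bounded by $W_{\mathrm{trail}}(t)$ and $W_{\mathrm{lead}}(t)$]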

slide-59
SLIDE 59

Utility Maximization

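Maximize

$$U(r_1, \ldots, r_K) = \sum_{k=1}^{K} U_k(r_k)$$

subject to

$$B(r_1, \ldots, r_K) = \sum_{k=1}^{K} B_k(r_k) \le \mathrm{BitCount}_t$$

$$\max_{r_1, \ldots, r_K} \left[ U(r_1, \ldots, r_K) - \lambda B(r_1, \ldots, r_K) \right] = \max_{r_1, \ldots, r_K} \sum_{k=1}^{K} \left[ U_k(r_k) - \lambda B_k(r_k) \right] = \sum_{k=1}^{K} \max_{r} \left[ U_k(r) - \lambda B_k(r) \right]$$

$$r_k(\lambda) = \arg\max_{r} \left[ U_k(r) - \lambda B_k(r) \right]$$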
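Because the Lagrangian objective separates over tiles, each tile can pick $r_k(\lambda) = \arg\max_r [U_k(r) - \lambda B_k(r)]$ independently, and $\lambda$ is then adjusted until the bits fit the budget. A minimal sketch, assuming each tile offers a finite list of (bits, utility) pairs and that the cheapest representations of all tiles fit the budget. Bisection on $\lambda$ is one common way to search for the multiplier; the talk does not say how $\lambda$ is found. Like any Lagrangian method, it only reaches upper-hull points, so part of the budget can go unused.

```python
from typing import List, Tuple

Rep = Tuple[float, float]   # (bits B_k(r), utility U_k(r)) for one representation of one tile

def best_rep(reps: List[Rep], lam: float) -> int:
    """r_k(lambda) = argmax_r U_k(r) - lambda * B_k(r); ties go to the lower index."""
    return max(range(len(reps)), key=lambda r: reps[r][1] - lam * reps[r][0])

def allocate(tiles: List[List[Rep]], budget: float, iters: int = 60) -> List[int]:
    """Pick one representation per tile by bisecting on lambda until total bits fit the budget."""
    if sum(min(b for b, _ in reps) for reps in tiles) > budget:
        raise ValueError("budget too small even for the cheapest representations")

    def total_bits(lam: float) -> float:
        return sum(reps[best_rep(reps, lam)][0] for reps in tiles)

    lo, hi = 0.0, 1.0
    if total_bits(lo) <= budget:          # the budget is not binding
        return [best_rep(reps, lo) for reps in tiles]
    while total_bits(hi) > budget:        # find a lambda large enough to meet the budget
        hi *= 2.0
    for _ in range(iters):                # invariant: total_bits(hi) <= budget < total_bits(lo)
        mid = 0.5 * (lo + hi)
        if total_bits(mid) > budget:
            lo = mid
        else:
            hi = mid
    return [best_rep(reps, hi) for reps in tiles]

tiles = [[(0, 0), (1, 5), (2, 7), (4, 8)],
         [(0, 0), (1, 3), (3, 6)]]
print(allocate(tiles, budget=3))   # [2, 1]
```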

slide-60
SLIDE 60

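Upper Convex Hull for Individual Tile at $t_i$

[Figure: utility vs. bits for representations 0 to 5 of one tile, points $(B(r), U(r))$; the set $\mathcal{S}$, its upper convex hull $\hat{\mathcal{S}}$, and the hull slope $\lambda_{45}$ between representations 4 and 5]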
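The hull can be computed with a monotone-chain pass over the (bits, utility) points sorted by bits. This sketch keeps only the rising part of the hull (positive slopes), since a Lagrangian choice $\arg\max_r [U(r) - \lambda B(r)]$ with $\lambda > 0$ only selects points on it (up to ties); the slopes between consecutive hull points are the $\lambda$ values at which the preferred representation switches. The function names are illustrative.

```python
from typing import List, Tuple

def upper_hull(reps: List[Tuple[float, float]]) -> List[int]:
    """Indices of (bits, utility) points on the rising part of the upper convex hull."""
    order = sorted(range(len(reps)), key=lambda i: (reps[i][0], -reps[i][1]))
    hull: List[int] = []
    for i in order:
        b, u = reps[i]
        if hull and reps[hull[-1]][0] == b:
            continue                              # same bits, lower utility: dominated
        while len(hull) >= 2:
            (b1, u1), (b2, u2) = reps[hull[-2]], reps[hull[-1]]
            # Drop the middle point if it lies on or below the chord from hull[-2] to i.
            if (u2 - u1) * (b - b1) <= (u - u1) * (b2 - b1):
                hull.pop()
            else:
                break
        hull.append(i)
    while len(hull) >= 2 and reps[hull[-1]][1] <= reps[hull[-2]][1]:
        hull.pop()                                # drop the tail where utility stops rising
    return hull

def hull_slopes(reps, hull) -> List[float]:
    """Slopes (utility per bit) between consecutive hull points."""
    return [(reps[j][1] - reps[i][1]) / (reps[j][0] - reps[i][0]) for i, j in zip(hull, hull[1:])]

reps = [(0, 0), (1, 5), (2, 6), (3, 9), (5, 10), (6, 9)]
hull = upper_hull(reps)
print(hull, hull_slopes(reps, hull))   # [0, 1, 3, 4] [5.0, 2.0, 0.5]
```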

slide-61
SLIDE 61

Upper Convex Hull for Individual Tile at $t_{i+1}$

[Figure: utility vs. bits for the same tile’s representations at $t_{i+1}$, with the set $\mathcal{S}$, its upper convex hull $\hat{\mathcal{S}}$, and the hull slope $\lambda_{45}$]

slide-62
SLIDE 62

Utility Functions

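  • $U_k(r_k) = u(R_{r_k}) \times P_k(v) \times \mathrm{LOD}_k(r_k, v)$
  • $u(R) = \begin{cases} \alpha \log(\beta R) & R > 0 \\ 0 & R = 0 \end{cases}$
  • $P_k(v) = \begin{cases} 1 - P_k^{\mathrm{err}}(v) & \text{if } k \text{ is currently visible from } v \\ P_k^{\mathrm{err}}(v) & \text{otherwise} \end{cases}$
  • $\mathrm{LOD}_k(r, v) = \left(\mathrm{RAD}_k(v) \cdot \min\{\mathrm{VPR}_k(r, v),\ \mathrm{PPR}_k(v)\}\right)^2$
  • $\mathrm{RAD}_k(v) = \dfrac{\texttt{object.tileWidth} \cdot \texttt{object.cubeToObjectScale}}{\mathrm{dist}(v)}$
  • $\mathrm{VPR}_k(r, v) = \dfrac{\texttt{object.representation[r].width} \cdot \mathrm{dist}(v)}{\texttt{object.maxWidth} \cdot \texttt{object.cubeToObjectScale}}$
  • $\mathrm{PPR}_k(v) = \dfrac{\texttt{display.horzPixels}}{\texttt{view[v].frustum.horzFOV}}$

where

  • $u(R)$: basic utility, based on the bitrate of the representation
  • $P_k(v)$: probability of the user seeing the tile, based on a user prediction model
  • $\mathrm{LOD}_k(r, v)$: level of detail provided by the tile
  • $\mathrm{RAD}_k(v)$: radians subtended
  • $\mathrm{VPR}_k(r, v)$: tile voxels per radian
  • $\mathrm{PPR}_k(v)$: display pixels per radian
  • $v$: the user view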
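A direct transcription of these utility functions into code, as a sketch. The `TileObject` class and the function names are hypothetical, with fields mirroring the slide's `object.*`, `display.*`, and `view[v].*` expressions. Two details are this reading of the slide rather than confirmed: $u(0) = 0$, and the grouping $(\mathrm{RAD} \cdot \min(\mathrm{VPR}, \mathrm{PPR}))^2$, chosen because radians times samples per radian gives a sample count across the tile, whose square approximates the samples it covers.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass
class TileObject:
    """Hypothetical stand-in for the slide's object.* fields."""
    tile_width: float                   # object.tileWidth
    cube_to_object_scale: float         # object.cubeToObjectScale
    max_width: float                    # object.maxWidth
    rep_widths: List[float]             # object.representation[r].width

def basic_utility(rate: float, alpha: float, beta: float) -> float:
    """u(R) = alpha * log(beta * R) for R > 0, and 0 for R = 0."""
    return alpha * math.log(beta * rate) if rate > 0 else 0.0

def visibility_prob(visible: bool, p_err: float) -> float:
    """P_k(v) = 1 - P_err if the tile is currently visible from view v, else P_err."""
    return 1.0 - p_err if visible else p_err

def level_of_detail(obj: TileObject, r: int, dist: float, horz_pixels: float, horz_fov: float) -> float:
    """LOD_k(r, v) = (RAD * min(VPR, PPR))^2."""
    rad = obj.tile_width * obj.cube_to_object_scale / dist                        # radians subtended
    vpr = obj.rep_widths[r] * dist / (obj.max_width * obj.cube_to_object_scale)  # tile voxels per radian
    ppr = horz_pixels / horz_fov                                                  # display pixels per radian
    return (rad * min(vpr, ppr)) ** 2

def tile_utility(obj, r, rate, visible, p_err, dist, horz_pixels, horz_fov, alpha=1.0, beta=1.0):
    """U_k(r) = u(R_r) * P_k(v) * LOD_k(r, v)."""
    return (basic_utility(rate, alpha, beta)
            * visibility_prob(visible, p_err)
            * level_of_detail(obj, r, dist, horz_pixels, horz_fov))

obj = TileObject(tile_width=1.0, cube_to_object_scale=1.0, max_width=1.0, rep_widths=[64.0, 256.0])
fov = math.pi / 2
print(level_of_detail(obj, 0, 2.0, 1000.0, fov), level_of_detail(obj, 1, 2.0, 1000.0, fov))
```

With these numbers, both representations are limited by their own voxel density (128 and 512 voxels per radian), which stays below the display's roughly 637 pixels per radian.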

slide-63
SLIDE 63

Representations

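| Test dataset | Rep. 1 | Rep. 2 | Rep. 3 | Rep. 4 | Rep. 5 |
|---|---|---|---|---|---|
| Queen | 3 | 5* | 15* | 30 | 55* |
| Loot | 3.5* | 5 | 8* | 16 | 27* |
| Redandblack | 3.5* | 6 | 9* | 18 | 30* |
| Soldier | 3.5* | 6 | 11* | 20 | 37.1* |
| Longdress | 3.9* | 6 | 13* | 27 | 42.7* |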

Call for Proposals for Point Cloud Coding V2, ISO/IEC JTC1/SC29 WG11 Doc. N16763, Hobart, 2017

slide-64
SLIDE 64

Stable and Variable Network Conditions

slide-65
SLIDE 65

Network Adaptivity Results (no user interaction) – variable network conditions

[Figure panels: bitrates for (a) TBA, (b) BBA, (c) WBA (proposed); buffer occupancy for (a) TBA, (b) BBA, (c) WBA (proposed)]

slide-66
SLIDE 66

Viewpoint Paths

slide-67
SLIDE 67

GoF

slide-68
SLIDE 68

GoF

slide-69
SLIDE 69
slide-70
SLIDE 70

User Adaptivity Results

slide-71
SLIDE 71

Conclusion: Challenges Ahead

slide-72
SLIDE 72

Theses of this talk

  • Hologram compression today is like video compression in 1988
  • Hologram streaming today is like video streaming in 1997

slide-73
SLIDE 73

Challenges ahead for holograms

(Hint: If you’ve seen it for video, you’ll see it for holograms.)

  • Capture hardware
  • Playback hardware
  • Compression
  • Streaming on-demand
  • Live broadcast
  • Telecommunication
  • Format wars
  • Industry vs international standards
  • Royalty-free vs fee-based licensing
  • Encryption and DRM
  • Distribution through the Web
  • Distribution to mobile devices
  • Quality measurements
  • Search
  • Analytics
  • Advertisements
  • High-value production vs the long tail of user-generated content

  • Applications to
    • Entertainment
    • Social networking
    • Communication
    • Commerce
    • Education
    • Healthcare
    • Surveillance
    • Intelligent agents
    • Travel
    • Mapping
    • Etc.
  • Etc.
slide-74
SLIDE 74

Holograms are the Next Video

Philip A. Chou, 8i Labs, Inc. ACM Multimedia Systems Conference 13 June 2018