
SLIDE 1

Holograms are the Next Video

Philip A. Chou, 8i Labs, Inc. ACM Multimedia Systems Conference 13 June 2018

SLIDE 2

Star Wars: Episode IV, 1977
Princess Leia

SLIDE 3

Star Trek: The Next Generation, Episode 12, 1988
The Holodeck

SLIDE 4

Black Panther, 2018

SLIDE 5

Solo, 2018
“No, you can’t wipe ’em off. They’re holograms.” – Tobias Beckett to Chewbacca

SLIDE 6

SLIDE 7

Gabor Holograms

Dennis Gabor, “A new microscopic principle,” Nature, 1948.

Etymology: holo + gram, from Ancient Greek ὅλος (hólos, “whole”) + γραμμή (grammḗ, “letter, line, writing, message”)

[Figure: encoding and decoding a hologram]

https://en.wikipedia.org/wiki/Holography

SLIDE 8

Gabor Holograms as (Angular Spectra of) Images from Multiple Viewpoints

[Figure: coordinates $(u, v)$ and viewing angles $(\theta, \phi)$]

SLIDE 9

Images from Multiple Viewpoints as Light Fields

[Figure: light field parameterized by $(u, v)$ and $(s, t)$]

SLIDE 10

Light Fields as Point Clouds

SLIDE 11

Agenda

Introduction
Holograms == Volumetric Media (Gabor Holograms, Light Fields, Point Clouds, …)
Applications
Historical remarks
Point Cloud Compression (PCC)
Light Field Compression using PCC
Streaming Holograms
Conclusion

SLIDE 12

Applications

SLIDE 13

Holograms: The Medium to Represent Natural Content in VR / AR / MR

VR puts you in a virtual world.
AR puts virtual objects in your world.

SLIDE 14

Audio: Three Modes of Distribution

On-Demand
Live Broadcast
Telecommunication

SLIDE 15

Video: Three Modes of Distribution

On-Demand
Live Broadcast
Telecommunication

SLIDE 16

Holograms: Three Modes of Distribution

On-Demand
Live Broadcast
Telecommunication

Buzz Aldrin: Cycling Pathways to Mars

SLIDE 17

Historical Remarks

SLIDE 18

180+ years since the invention of images (as photographs)

Daguerreotype, 1838

SLIDE 19

140+ years since the invention of audio (as telephony)

Telephone ca. 1875

SLIDE 20

90+ years since the invention of video (as television)

Television, 1926

SLIDE 21

Dawn of Digital Video

Arun Netravali, Head, Visual Communications Research Dept.

IEEE Transactions on Computers, 1974

SLIDE 22

JPEG (January 1988)

Today: > 1 trillion photos/year

Photo from JPEG (Macau, October 2017): celebration of the 25th anniversary of the JPEG Standard (1992)

SLIDE 23

MPEG PCC (Macau, October 2017)

Hologram compression today is like video compression in 1988

SLIDE 24

Subjective Results: Category 2 (Dynamic)

[Figure: subjective results at rate points 3.9, 6.0, 13, 27 Mbps; 3.5, 6.0, 11, 20 Mbps; and 3.5, 6.0, 9, 18 Mbps]

SLIDE 25

MPEG Point Cloud Compression (PCC)

1. Static (single-frame)
2. Dynamic (multi-frame)
3. Dynamic Acquisition (e.g., from Mobile Mapping Systems)

SLIDE 26

“Video-based” approach

Patch information: dominant axis, (x, y, z) offset, (u, v) offset, dimensions
Occupancy map
Geometry video
Texture video
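A much-simplified sketch of the projection step behind this patch information may help. It assumes one already-segmented patch with integer voxel coordinates and a known normal, takes the dominant axis as the largest-magnitude normal component, and keeps a single depth layer (the nearest point per pixel). Normal estimation, segmentation, texture, and the 2D packing that determines the (u, v) offset are omitted, and the function names are hypothetical rather than taken from the MPEG reference software.

```python
import numpy as np

def dominant_axis(normal):
    """Index (0 = x, 1 = y, 2 = z) of the largest-magnitude normal component."""
    return int(np.argmax(np.abs(np.asarray(normal, dtype=float))))

def project_patch(points, axis):
    """Project the integer points of one patch along `axis` (hypothetical helper).

    Returns the patch's 3D offset, a boolean occupancy map, and a depth image,
    both indexed as [v, u]; empty pixels hold -1 (a marker chosen for this sketch).
    """
    pts = np.asarray(points, dtype=int)
    u_axis, v_axis = [a for a in range(3) if a != axis]  # the two in-plane axes
    offset = pts.min(axis=0)                              # (x, y, z) offset of the patch
    local = pts - offset
    u, v, d = local[:, u_axis], local[:, v_axis], local[:, axis]
    height, width = int(v.max()) + 1, int(u.max()) + 1    # patch dimensions
    occupancy = np.zeros((height, width), dtype=bool)
    depth = np.full((height, width), -1, dtype=int)
    for ui, vi, di in zip(u, v, d):
        # Keep only the point nearest the projection plane in each pixel.
        if not occupancy[vi, ui] or di < depth[vi, ui]:
            depth[vi, ui] = di
        occupancy[vi, ui] = True
    return offset, occupancy, depth

# Usage: a tiny patch whose normal points mostly along +z.
pts = [(0, 0, 5), (1, 0, 6), (0, 1, 5), (0, 0, 7)]
axis = dominant_axis([0.1, -0.2, 0.9])
offset, occ, depth = project_patch(pts, axis)
print(axis, offset.tolist())        # 2 [0, 0, 5]
print(occ.astype(int).tolist())     # [[1, 1], [1, 0]]
print(depth.tolist())               # [[0, 1], [0, -1]]
```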

SLIDE 27

“Native 3D” approach to coding geometry

Example octree occupancy codes (one bit per child octant): 10010001 10010001 11001001 10010001
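Occupancy bytes like these can be produced with a short sketch: each octree node gets one byte with a bit set for every occupied child octant, and nodes are emitted level by level. The child ordering (x as the most significant bit, child 0 in bit 7) is an assumption of this sketch, and the entropy coding that an actual geometry codec applies to these bytes is not shown.

```python
from collections import defaultdict

def child_index(x, y, z, bit):
    """Octant (0..7) selected by one bit of each coordinate, x most significant."""
    return (((x >> bit) & 1) << 2) | (((y >> bit) & 1) << 1) | ((z >> bit) & 1)

def morton(x, y, z, bits):
    """Interleave the low `bits` bits of x, y, z (Morton / z-order code)."""
    code = 0
    for b in range(bits - 1, -1, -1):
        code = (code << 3) | child_index(x, y, z, b)
    return code

def octree_occupancy(points, depth):
    """Breadth-first 8-bit occupancy codes for integer points in [0, 2**depth)^3.

    Bit 7 of each byte flags child octant 0 and bit 0 flags octant 7
    (an ordering convention assumed for this sketch).
    """
    pts = set(points)
    codes = []
    for level in range(depth):
        bit = depth - level - 1                 # coordinate bit that picks the child
        occ = defaultdict(int)
        for x, y, z in pts:
            node = (x >> (bit + 1), y >> (bit + 1), z >> (bit + 1))
            occ[node] |= 1 << (7 - child_index(x, y, z, bit))
        # Visiting nodes in Morton order reproduces breadth-first traversal order.
        for node in sorted(occ, key=lambda n: morton(*n, level)):
            codes.append(occ[node])
    return codes

codes = octree_occupancy([(0, 0, 0), (3, 3, 3)], depth=2)
print([f"{c:08b}" for c in codes])  # ['10000001', '10000000', '00000001']
```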

SLIDE 28

“Native 3D” approach to coding attributes

Example per-point attribute triples: (221, 136, 255), (255, 153, 255), (255, 102, 255), (153, 153, 255)

e.g., Y, U, V

SLIDE 29

Point Cloud Attribute Compression using a Region Adaptive Hierarchical Transform (RAHT)

Ricardo L. de Queiroz and Philip A. Chou, “Compression of 3D Point Clouds Using a Region-Adaptive Hierarchical Transform,” IEEE Trans. Image Processing, Aug 2016.

Maja Krivokuca, Maxim Koroteev, Philip A. Chou, Robert Higgs, and Charles Loop, “A Volumetric Approach to Point Cloud Compression,” in preparation.

SLIDE 30

Three Generations of Transforms for Point Cloud Attribute Compression

1. Graph Signal Processing (Graph Fourier Transform – GFT)
2. Sampled Spatial Stochastic Process (Gaussian Process Transform – GPT)
3. Volumetric Functions (Region Adaptive Hierarchical Transform – RAHT)
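As a rough illustration of the third generation, the sketch below applies the RAHT butterfly of de Queiroz and Chou to one scalar attribute: occupied sibling pairs are merged along x, then y, then z at each level, with weights equal to the number of points beneath each node, and a lone sibling passes through unchanged. It is a simplified, unquantized sketch under those assumptions, not a codec implementation. The transform is orthonormal, and each low-pass coefficient equals the square root of its weight times the mean attribute beneath it.

```python
import math

def raht(points, attrs, depth):
    """Simplified RAHT of one scalar attribute on distinct integer points in [0, 2**depth)^3.

    Returns the root (DC) coefficient and the list of high-pass coefficients.
    """
    # Each node carries (weight = number of points below it, low-pass coefficient).
    nodes = {tuple(p): (1, float(a)) for p, a in zip(points, attrs)}
    high = []
    for _ in range(depth):
        for axis in (0, 1, 2):
            groups = {}
            for p, (w, c) in nodes.items():
                parent = tuple(v >> 1 if i == axis else v for i, v in enumerate(p))
                groups.setdefault(parent, []).append((p[axis] & 1, w, c))
            nodes = {}
            for parent, kids in groups.items():
                if len(kids) == 1:               # lone child passes through unchanged
                    _, w, c = kids[0]
                    nodes[parent] = (w, c)
                    continue
                (_, w1, c1), (_, w2, c2) = sorted(kids)   # left child (bit 0) first
                a = math.sqrt(w1 / (w1 + w2))
                b = math.sqrt(w2 / (w1 + w2))
                nodes[parent] = (w1 + w2, a * c1 + b * c2)  # low-pass
                high.append(-b * c1 + a * c2)               # high-pass
    (_, dc), = nodes.values()
    return dc, high

dc, high = raht([(0, 0, 0), (1, 0, 0), (0, 1, 0)], [1.0, 2.0, 3.0], depth=1)
print(round(dc, 4))  # 3.4641, i.e. sqrt(3) times the mean attribute 2.0
```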

SLIDE 31

Measure

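Measure $\mu: S \mapsto \mathbb{R}_+$ maps each set to a non-negative real number
The sets lie in a $\sigma$-algebra $\mathcal{B}$ (a set of sets for which $S_i \in \mathcal{B} \Rightarrow S_i^c \in \mathcal{B}$ and $\bigcup_i S_i \in \mathcal{B}$)
If $S_1, S_2, \dots$ are disjoint, then $\mu\left(\bigcup_i S_i\right) = \sum_i \mu(S_i)$.
Examples:
Lebesgue measure on $\mathbb{R}$ maps each interval of length $L$ to $L$
Probability distribution of r.v. $X$ maps each set $S$ to the probability that $X \in S$
Counting measure w.r.t. points $\mathbf{x}_1, \dots, \mathbf{x}_n \in \mathbb{R}^3$ maps each $S \subset \mathbb{R}^3$ to the number of points in $S$

[Figure: points $x_1, x_2$ and example sets under each measure, one with counting measure $\mu = 2$]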

SLIDE 32

Measure defines Integration

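$$\int f(\mathbf{x})\, d\mu(\mathbf{x}) = \lim_{\Delta \to 0} \Delta \sum_{n} \mu\big(\{\mathbf{x} \mid f(\mathbf{x}) \ge n\Delta\}\big) = \sum_i f(\mathbf{x}_i)$$

(the last equality holds for the counting measure w.r.t. points $\mathbf{x}_i$)

[Figure: $f(x)$ sliced into levels $n\Delta$; each level contributes $\Delta \cdot \mu(\{x \mid f(x) \ge n\Delta\})$]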
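For a non-negative $f$ and the counting measure, the limit on this slide can be checked numerically. This sketch assumes the sum over $n$ starts at 1; an $n = 0$ term would only add $\Delta$ times the number of points, which vanishes as $\Delta \to 0$. The function name is illustrative.

```python
def layer_cake(values, delta):
    """Approximate the integral of f for the counting measure on points x_i.

    `values` holds f(x_i) >= 0. Computes delta * sum_{n >= 1} mu({x : f(x) >= n * delta}),
    where mu(S) is the number of points in S.
    """
    total, n = 0, 1
    top = max(values, default=0.0)
    while n * delta <= top:
        total += sum(1 for v in values if v >= n * delta)
        n += 1
    return delta * total

vals = [0.3, 1.2, 2.0]                 # f evaluated at three points
print(layer_cake(vals, 0.5))           # coarse slices: 3.0
print(layer_cake(vals, 1e-3), sum(vals))
```

As $\Delta$ shrinks, the result approaches $\sum_i f(\mathbf{x}_i)$, here 3.5.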

SLIDE 33

Integration defines Inner Product. Inner Product defines Norm, Orthogonality.

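$$\langle f, g \rangle = \int f(\mathbf{x})\, g(\mathbf{x})\, d\mu(\mathbf{x}) = \sum_i f(\mathbf{x}_i)\, g(\mathbf{x}_i)$$

$$\|f\|^2 = \langle f, f \rangle = \sum_i f(\mathbf{x}_i)^2$$

$$f \perp g \iff 0 = \langle f, g \rangle = \sum_i f(\mathbf{x}_i)\, g(\mathbf{x}_i)$$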

⇒ Measure defines Hilbert Space, and with it all the machinery required for function approximation
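Under the counting measure these definitions reduce to sums over the points, as in this minimal sketch. The helper names and example points are illustrative assumptions.

```python
def integral(f, points):
    """Integral of f for the counting measure on `points`: a plain sum of f(x_i)."""
    return sum(f(x) for x in points)

def inner(f, g, points):
    """<f, g> = integral of f * g = sum_i f(x_i) g(x_i)."""
    return integral(lambda x: f(x) * g(x), points)

def box(lo, hi):
    """Indicator function of the axis-aligned cube [lo, hi)^3."""
    return lambda x: 1.0 if all(lo <= c < hi for c in x) else 0.0

pts = [(0.1, 0.2, 0.3), (0.4, 0.1, 0.2), (0.7, 0.8, 0.9)]
A, B = box(0.0, 0.5), box(0.5, 1.0)
print(inner(A, A, pts))  # ||1_A||^2 = mu(A) = number of points in A: 2.0
print(inner(A, B, pts))  # disjoint supports, so 1_A is orthogonal to 1_B: 0.0
```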

SLIDE 34

Cardinal B-Splines of Order $p$

Scaling functions: integer shifts of the scaling function span the space of functions that are

piecewise polynomial of degree $p - 1$ over unit intervals

continuously differentiable up to order $p - 2$

SLIDE 35

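B-Spline Basis Functions ($p = 1$)

[Figure: basis functions of the spaces $V_0$, $V_1$, $V_2$, $W_0$, $W_1$]

$V_0 \oplus W_0 = V_1$, $V_1 \oplus W_1 = V_2$

Nested subspaces: $V_0 \subset V_1 \subset V_2$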

SLIDE 36

B-Spline Wavelet Basis Functions ($p = 1$)

[Figure: scaling functions ($V$) and wavelets ($W$), using Lebesgue measure vs. using counting measure]

SLIDE 37

Multiresolution Approximation

[Figure: approximations in the nested spaces $V_j$, using Lebesgue measure vs. using counting measure]

SLIDE 38

B-Spline Approximation ($p = 1$)

Level 5 (917 coeffs), Level 6 (3821 coeffs), Level 7 (15604 coeffs), Level 8 (62073 coeffs), Level 9 (237965 coeffs)
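With the counting measure, a $p = 1$ approximation at a given level can be sketched as follows: orthogonally projecting onto cell indicator functions gives each occupied cell the mean attribute of its points, and empty cells (whose indicators have zero norm) get no coefficient, so in this sketch the number of coefficients is the number of occupied cells. The unit-cube normalization and the function name are assumptions.

```python
from collections import defaultdict

def bspline_p1_approx(points, attrs, level):
    """Piecewise-constant (p = 1) approximation of a point attribute at `level`.

    Points lie in the unit cube [0, 1)^3, split into 2**level cells per axis.
    Each occupied cell's coefficient is the mean attribute of its points.
    """
    n = 1 << level
    sums, counts, cell_of = defaultdict(float), defaultdict(int), []
    for (x, y, z), a in zip(points, attrs):
        cell = tuple(min(int(c * n), n - 1) for c in (x, y, z))
        sums[cell] += a
        counts[cell] += 1
        cell_of.append(cell)
    coeffs = {cell: sums[cell] / counts[cell] for cell in sums}
    approx = [coeffs[cell] for cell in cell_of]   # reconstructed value at each point
    return coeffs, approx

pts = [(0.1, 0.2, 0.3), (0.4, 0.1, 0.2), (0.7, 0.8, 0.9)]
vals = [10.0, 20.0, 50.0]
for level in (1, 2):
    coeffs, approx = bspline_p1_approx(pts, vals, level)
    print(level, len(coeffs), approx)
# 1 2 [15.0, 15.0, 50.0]
# 2 3 [10.0, 20.0, 50.0]
```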

SLIDE 39

B-Spline Approximation ($p = 2$)

Level 5 (1699 coeffs), Level 6 (7213 coeffs), Level 7 (30455 coeffs), Level 8 (125244 coeffs), Level 9 (497199 coeffs)

SLIDE 40

Compression Results

Comparison to Zhang, Florencio, and Loop, “Point cloud attribute compression with graph transform,” ICIP 2014

SLIDE 41

Surface Light Field Compression using a Point Cloud Codec

Xiang Zhang, Philip A. Chou, Ming-Ting Sun, Maolong Yang, et al., “Surface Light Field Compression using a Point Cloud Codec,” submitted to IEEE JETCAS special issue on immersive video, and to appear at ICIP 2018.

SLIDE 42

“Light Field” == Plenoptic Function

7D: $f(x, y, z, \theta, \phi, \lambda, t)$
5D: $f(x, y, z, \theta, \phi)$
4D: $f(x, y, \theta, \phi)$
E. H. Adelson and J. R. Bergen, “The plenoptic function and the elements of early vision,” in Computational Models of Visual Processing, 1991.

SLIDE 43

Image-Based Light Field Representations

M. Levoy and P. Hanrahan, “Light field rendering,” SIGGRAPH 1996.
S. J. Gortler, R. Grzeszczuk, R. Szeliski, M. Cohen, “The Lumigraph,” SIGGRAPH 1996.

[Figure panels: multiview representation; lenslet representation]

SLIDE 44

Surface Light Field (SLF)

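The SLF can be regarded as a function $f(\mathbf{p}, \boldsymbol{\omega})$, representing the color of surface point $\mathbf{p} = (x, y, z)$ when viewed from direction $\boldsymbol{\omega} = (\theta, \phi)$.

The spherical image $f_{\mathbf{p}}(\boldsymbol{\omega})$, or view map, for each $\mathbf{p}$ generalizes the lenslet representation.

To compress $f(\mathbf{p}, \boldsymbol{\omega})$ efficiently:
Represent $f_{\mathbf{p}}(\boldsymbol{\omega})$ for each $\mathbf{p}$ in some image basis
Compress the coefficients across surface points to reduce spatial redundancy

D. N. Wood, et al., “Surface light fields for 3D photography,” SIGGRAPH 2000

W.-C. Chen, et al., “Light field mapping: efficient representation and hardware rendering of surface light fields,” SIGGRAPH 2002

[Figure: surface point $\mathbf{p}$, viewing direction $\boldsymbol{\omega}$, and the view map $f_{\mathbf{p}}(\boldsymbol{\omega})$]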

SLIDE 45

View Map Representation

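Linear combination of basis functions: $f_{\mathbf{p}}(\boldsymbol{\omega}) = \sum_i G_i(\boldsymbol{\omega})\, \alpha_i(\mathbf{p})$

In matrix form, observations $\mathbf{c} \approx \mathbf{G}\boldsymbol{\alpha}$, with basis functions $\mathbf{G}$ (B-spline wavelet basis functions), coefficients $\boldsymbol{\alpha}$, and observations $\mathbf{c}$

$$\boldsymbol{\alpha} = \arg\min_{\boldsymbol{\alpha}} \|\mathbf{G}\boldsymbol{\alpha} - \mathbf{c}\|^2 + \lambda \|\boldsymbol{\alpha}\|^2 + \beta \|\boldsymbol{\alpha} - \bar{\boldsymbol{\alpha}}\|^2$$

[Figure: view map plotted over $\theta$ and $\sin\phi$]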
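Setting the gradient of this regularized least-squares objective to zero gives the linear system $(\mathbf{G}^T\mathbf{G} + (\lambda + \beta)\mathbf{I})\,\boldsymbol{\alpha} = \mathbf{G}^T\mathbf{c} + \beta\bar{\boldsymbol{\alpha}}$, where $\lambda$ and $\beta$ are the two regularization weights. The sketch below solves it with NumPy for one surface point and one color channel. It assumes G holds the basis functions evaluated at the observed directions and c the observed colors; the reference vector $\bar{\boldsymbol{\alpha}}$ is simply taken as given, since the slide does not define it further, and the data are random placeholders.

```python
import numpy as np

def fit_view_map(G, c, lam, beta, alpha_bar):
    """Minimize ||G a - c||^2 + lam ||a||^2 + beta ||a - alpha_bar||^2 over a.

    Solves the normal equations (G^T G + (lam + beta) I) a = G^T c + beta * alpha_bar.
    """
    n = G.shape[1]
    A = G.T @ G + (lam + beta) * np.eye(n)
    b = G.T @ c + beta * alpha_bar
    return np.linalg.solve(A, b)

rng = np.random.default_rng(0)
G = rng.standard_normal((40, 6))    # 40 observed directions x 6 basis functions
c = rng.standard_normal(40)         # observed colors (one channel), placeholder data
alpha_bar = np.zeros(6)             # hypothetical reference coefficients
alpha = fit_view_map(G, c, lam=0.1, beta=0.5, alpha_bar=alpha_bar)

# Half the gradient of the objective; it vanishes at the minimizer.
grad = G.T @ (G @ alpha - c) + 0.1 * alpha + 0.5 * (alpha - alpha_bar)
print(np.allclose(grad, 0))  # True
```

With both weights set to zero the solution reduces to ordinary least squares.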

SLIDE 46

Compress Coefficients of Representation

Spatially, using a Point Cloud Codec
Coefficients are attributes of the points
In this work, we used
Octree+RAHT PCC (MPEG PCC TMC1)
Video-based PCC (MPEG PCC TMC2)
All SLF coefficients are scaled to the range [0, 255] for the 8-bit video codec
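The slide does not say how the coefficients are mapped to [0, 255]. One simple possibility, assumed here, is min-max scaling per coefficient array, with the minimum and maximum sent as side information so a decoder can invert the mapping. The function names are hypothetical.

```python
import numpy as np

def to_uint8(coefs):
    """Min-max scale a float array to integers in [0, 255] for an 8-bit codec.

    Returns the scaled array plus (lo, hi), which a decoder needs as side information.
    """
    coefs = np.asarray(coefs, dtype=np.float64)
    lo, hi = float(coefs.min()), float(coefs.max())
    span = (hi - lo) or 1.0                 # avoid dividing by zero for constant input
    q = np.round((coefs - lo) / span * 255.0).astype(np.uint8)
    return q, lo, hi

def from_uint8(q, lo, hi):
    """Invert the scaling, up to a rounding error of at most half a quantization step."""
    span = (hi - lo) or 1.0
    return lo + q.astype(np.float64) / 255.0 * span

coefs = np.array([-1.0, 0.0, 0.25, 1.0])
q, lo, hi = to_uint8(coefs)
print(q.tolist())  # [0, 128, 159, 255]
print(np.abs(from_uint8(q, lo, hi) - coefs).max())
```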

SLIDE 47

Datasets

Synthetic datasets: Can, Die
Natural datasets: Elephant, Fish

D. N. Wood, et al., “Surface light fields for 3D photography,” SIGGRAPH 2000

SLIDE 48

Die LF reconstruction

N = 1: 0.30 MB; N = 8: 0.62 MB; N = 32: 1.71 MB; N = 128: 3.90 MB

SLIDE 49

Fish LF reconstruction

N = 1: 0.24 MB; N = 8: 0.53 MB; N = 32: 1.57 MB; N = 128: 4.02 MB

SLIDE 50

RD Performance

[Figure panels: Fish; Elephant]

SLIDE 51

Streaming of Volumetric Media

Jounsup Park, Philip A. Chou, and Jenq-Neng Hwang, “Rate-Utility Optimized Streaming of Volumetric Media for Augmented Reality,” arXiv:1804.09864. Also submitted to IEEE JETCAS special issue on immersive video, and to appear at Globecom 2018.

SLIDE 52

Streaming begins: Delivery rate > Media rate

Streaming

QCIF (176×144) streaming video over 56 kbps in 1997

Hologram streaming today is like video streaming in 1997

SLIDE 53

Streaming 360° (Spherical) Video as Tiles

https://bitmovin.com/bitmovin-receives-excellence-dash-award-tile-based-streaming-vr-360-video/

SLIDE 54

SLIDE 55

SLIDE 56

[Figure: streaming system. Server side: capture stage, VPC, 3D tiles, encoding, multiple representations (per tile and representation). Client side: buffer manager, buffer (new data in, play out), decoding, reconstructed VPC, rendering to the user’s viewport. Messages: requests for data chunks with viewport information; data delivery.]

SLIDE 57

DASH-Like File Layout at Server

[Figure: server file layout with a manifest and a tile index; objects are split into segments, segments into representations, representations into GOFs (groups of frames), and GOFs into tiles]

SLIDE 58

Window-Based Algorithm

[Figure: media timeline (times $t_0$, $t$, $t_{i-1}$, $t_i$, $t_{i+1}$) and playout timeline (times $\tau_0$, $\tau$), with windows $W_{lead}(t)$ and $W_{trail}(t)$]

SLIDE 59

Utility Maximization

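Maximize $U(r_1, \dots, r_K) = \sum_{k=1}^{K} U_k(r_k)$

subject to $B(r_1, \dots, r_K) = \sum_{k=1}^{K} B_k(r_k) \le \mathit{BitCount}$

$$\max_{r_1, \dots, r_K} \big[U(r_1, \dots, r_K) - \lambda B(r_1, \dots, r_K)\big] = \max_{r_1, \dots, r_K} \sum_{k=1}^{K} \big[U_k(r_k) - \lambda B_k(r_k)\big] = \sum_{k=1}^{K} \max_{r} \big[U_k(r) - \lambda B_k(r)\big]$$

$$r_k(\lambda) = \arg\max_{r} \big[U_k(r) - \lambda B_k(r)\big]$$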
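Because the Lagrangian separates over tiles, each tile can pick its representation independently for a given λ, and λ can then be tuned to meet the bit budget. The sketch below does this by bisection on λ. It assumes each tile lists (bits, utility) pairs per representation and includes a zero-bit, zero-utility option; the function names and numbers are hypothetical. As with any Lagrangian method, it only reaches allocations on the convex hull and may leave part of the budget unused.

```python
def choose(tiles, lam):
    """r_k(lam) = argmax_r U_k(r) - lam * B_k(r), independently for each tile k.

    tiles[k] lists (bits, utility) for each representation r of tile k.
    """
    return [max(range(len(t)), key=lambda r: t[r][1] - lam * t[r][0]) for t in tiles]

def total_bits(tiles, picks):
    return sum(t[r][0] for t, r in zip(tiles, picks))

def allocate(tiles, budget, iters=60):
    """Bisect on lam for the smallest lam whose choices fit within `budget` bits.

    Assumes every tile has a zero-bit option (r = 0, "not sent"), so a large
    enough lam always fits.
    """
    lo, hi = 0.0, 1.0
    while total_bits(tiles, choose(tiles, hi)) > budget:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total_bits(tiles, choose(tiles, mid)) > budget:
            lo = mid
        else:
            hi = mid
    return choose(tiles, hi)

tiles = [
    [(0, 0.0), (10, 5.0), (20, 8.0)],   # tile 1: (bits, utility) per representation
    [(0, 0.0), (10, 6.0), (30, 9.0)],   # tile 2
]
picks = allocate(tiles, budget=20)
print(picks, total_bits(tiles, picks))  # [1, 1] 20
```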

SLIDE 60

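Upper Convex Hull for Individual Tile at $t_i$

[Figure: utility vs. bits for one tile, with points $(B(r), U(r))$ for representations $r = 0, \dots, 5$, the set $\mathcal{S}$ and its upper convex hull $\hat{\mathcal{S}}$, and the slope $\lambda_{45}$]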

SLIDE 61

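Upper Convex Hull for Individual Tile at $t_{i+1}$

[Figure: utility vs. bits for the same tile at the next time, with points $(B(r), U(r))$ for representations $r = 1, \dots, 5$, the set $\mathcal{S}$ and its upper convex hull $\hat{\mathcal{S}}$, and the slope $\lambda_{45}$]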
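The upper convex hull of a tile’s (bits, utility) points can be computed with a monotone-chain scan, as in this sketch. Only hull points can be selected by the Lagrangian rule, and the slopes between consecutive hull points are the values of λ at which the choice switches. Since utilities depend on the user’s view and on time, the hull would be recomputed as they change. The input format and example points are assumptions.

```python
def upper_convex_hull(points):
    """Indices of the upper-left convex hull of (bits, utility) points, by increasing bits."""
    order = sorted(range(len(points)), key=lambda i: (points[i][0], -points[i][1]))
    hull = []
    for i in order:
        b, u = points[i]
        if hull and u <= points[hull[-1]][1]:
            continue  # dominated: at least as many bits, no more utility
        while len(hull) >= 2:
            b1, u1 = points[hull[-2]]
            b2, u2 = points[hull[-1]]
            # Drop the middle point if it lies on or below the chord (b1, u1)-(b, u).
            if (u2 - u1) * (b - b1) <= (u - u1) * (b2 - b1):
                hull.pop()
            else:
                break
        hull.append(i)
    return hull

pts = [(0, 0), (10, 5), (20, 6), (30, 9), (25, 4)]   # (bits, utility) per representation
hull = upper_convex_hull(pts)
slopes = [(pts[j][1] - pts[i][1]) / (pts[j][0] - pts[i][0]) for i, j in zip(hull, hull[1:])]
print(hull, slopes)  # [0, 1, 3] [0.5, 0.2]
```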

SLIDE 62

Utility Functions

$$U_k(r_k) = u(R_{r_k}) \times P_k(v) \times LOD_k(r_k, v)$$

$$u(R) = \begin{cases} \alpha \log(\beta R) & R > 0 \\ 0 & R = 0 \end{cases}$$

$$P_k(v) = \begin{cases} 1 - P_k^{err}(v) & \text{if } k \text{ currently visible from } v \\ P_k^{err}(v) & \text{otherwise} \end{cases}$$
$$LOD_k(r, v) = \big(RAD_k(v) \cdot \min\{VPR_k(r, v),\, PPR_k(v)\}\big)^2$$

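$$RAD_k(v) = \frac{\texttt{object.tileWidth} \cdot \texttt{object.cubeToObjectScale}}{\texttt{dist}(v)}$$

$$VPR_k(r, v) = \frac{\texttt{object.representation}[r]\texttt{.width} \cdot \texttt{dist}(v)}{\texttt{object.maxWidth} \cdot \texttt{object.cubeToObjectScale}}$$

$$PPR_k(v) = \frac{\texttt{display.horzPixels}}{\texttt{view}[v]\texttt{.frustum.horzFOV}}$$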

$u(R)$: basic utility, based on bitrate of representation
$P_k(v)$: probability of user seeing tile, based on user prediction model
$LOD_k$: level of detail provided by tile
$RAD_k$: radians subtended
$VPR_k$: tile voxels per radian
$PPR_k$: display pixels per radian
$v$: user view

SLIDE 63

Representations

Bitrates (Mbps) of representations 1 to 5 for each test dataset:

Test dataset   1      2     3      4     5
Queen          3      5*    15*    30    55*
Loot           3.5*   5     8*     16    27*
Redandblack    3.5*   6     9*     18    30*
Soldier        3.5*   6     11*    20    37.1*
Longdress      3.9*   6     13*    27    42.7*

Call for Proposals for Point Cloud Coding V2, ISO/IEC JTC1/SC29 WG11 Doc. N16763, Hobart, 2017

SLIDE 64

Stable and Variable Network Conditions

SLIDE 65

Network Adaptivity Results (no user interaction) – variable network conditions

[Figure panels: bitrates and buffer occupancy for (a) TBA, (b) BBA, and (c) WBA (proposed)]

SLIDE 66

Viewpoint Paths

SLIDE 67

GoF

SLIDE 68

GoF

SLIDE 69

SLIDE 70

User Adaptivity Results

SLIDE 71

Conclusion: Challenges Ahead

SLIDE 72

Theses of this talk

Hologram compression today is like video compression in 1988
Hologram streaming today is like video streaming in 1997

SLIDE 73

Challenges ahead for holograms

(Hint: If you’ve seen it for video, you’ll see it for holograms.)

Capture hardware
Playback hardware
Compression
Streaming on-demand
Live broadcast
Telecommunication
Format wars
Industry vs international standards
Royalty-free vs fee-based licensing
Encryption and DRM
Distribution through the Web
Distribution to mobile devices
Quality measurements
Search
Analytics
Advertisements
High-value production vs. the long tail of user-generated content

Applications to
Entertainment
Social networking
Communication
Commerce
Education
Healthcare
Surveillance
Intelligent agents
Travel
Mapping
Etc.

SLIDE 74

Holograms are the Next Video

Philip A. Chou, 8i Labs, Inc. ACM Multimedia Systems Conference 13 June 2018