Image and Video Coding: Video Coding Extensions
Screen Content Coding
[Figure: example of sensor-captured video content and example of screen content video]
Screen Content Video
- Increasingly important for a number of applications (e.g., online meetings)
- Screen content video sequences have different properties than sensor-captured video sequences
- Coding efficiency could be improved by dedicated coding tools / coding modes
Heiko Schwarz (Freie Universität Berlin) — Image and Video Coding: Video Coding Extensions 2 / 34
Screen Content Coding / Coding Tools
Transform Skip Mode
[Figure: transform coding chain: DCT-II, quantization, dequantization, inverse DCT-II]
Transform Coding Efficiency for Screen Content
- Less energy compaction than for typical sensor-captured content
- Strong quantization can result in disturbing artifacts
Transform Skip Mode
- Coding mode for which no transform is carried out (indicated by a special flag)
- Direct quantization of residual samples
- Can be combined with dedicated entropy coding for quantization indexes
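The reduced energy compaction can be illustrated with a small sketch. It assumes a hypothetical one-dimensional residual with a single sharp edge, an orthonormal DCT-II, and a uniform quantizer with step size 8; none of these values come from the slides, and the code is not the VVC design.

```python
import math

def dct2(x):
    """Orthonormal 1-D DCT-II of a list of samples."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        out.append(scale * sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i, v in enumerate(x)))
    return out

def quantize(values, step):
    """Uniform scalar quantization to integer quantization indexes."""
    return [round(v / step) for v in values]

residual = [0, 0, 0, 40, 0, 0, 0, 0]    # hypothetical residual: one sharp edge
step = 8                                # hypothetical quantization step size

q_skip = quantize(residual, step)       # transform skip: quantize samples directly
q_dct = quantize(dct2(residual), step)  # conventional: transform, then quantize

nonzero_skip = sum(1 for q in q_skip if q != 0)
nonzero_dct = sum(1 for q in q_dct if q != 0)
print(nonzero_skip, nonzero_dct)
```

For this residual, transform skip leaves a single non-zero index, while the DCT spreads the energy over most coefficients.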
Block Differential Pulse Code Modulation (BDPCM)
[Figure: quantization followed by prediction of quantization indexes]
vertical BDPCM: $\hat{q}[x, y] = q[x, y-1]$
horizontal BDPCM: $\hat{q}[x, y] = q[x-1, y]$
no BDPCM: $\hat{q}[x, y] = 0$
Exploit Dependencies in Transform Skip Mode
- Quantization indexes are not directly transmitted by entropy coding
- Two additional modes for prediction of quantization indexes (inside block):
  - Horizontal prediction (first column is not predicted)
  - Vertical prediction (first row is not predicted)
- Entropy coding of prediction errors $\Delta q = q[x, y] - \hat{q}[x, y]$
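The prediction rules above can be sketched as follows. This is a minimal illustration, assuming the block is stored as `q[y][x]`; the function names and the example block are hypothetical, and entropy coding is left out.

```python
def _predict(q, x, y, mode):
    """Prediction of q[y][x] from already processed indexes in the block."""
    if mode == "vertical" and y > 0:
        return q[y - 1][x]
    if mode == "horizontal" and x > 0:
        return q[y][x - 1]
    return 0  # no BDPCM, or first row / first column

def bdpcm_encode(q, mode):
    """Return prediction errors dq = q - q_hat for a block of indexes."""
    return [[q[y][x] - _predict(q, x, y, mode) for x in range(len(q[0]))]
            for y in range(len(q))]

def bdpcm_decode(dq, mode):
    """Invert bdpcm_encode by accumulating the prediction errors."""
    q = [[0] * len(dq[0]) for _ in dq]
    for y in range(len(dq)):
        for x in range(len(dq[0])):
            q[y][x] = dq[y][x] + _predict(q, x, y, mode)
    return q

block = [[3, 3, 0, 0],
         [3, 3, 0, 0],
         [3, 3, 0, 0],
         [3, 3, 0, 0]]   # hypothetical indexes of a block with a vertical edge
errors = bdpcm_encode(block, "vertical")
```

With vertical BDPCM only the first row of `errors` is non-zero for this block, which is what makes the prediction errors cheaper to entropy code.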
Intra Block Copy
[Figure: restrictions in VVC: valid and invalid 64×64 reference regions relative to the current block (curr)]
- “Motion-compensated” prediction inside a picture with integer-sample accurate motion vectors
- To reduce memory access complexity, VVC includes restrictions of permitted motion vectors
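A minimal sketch of intra block copy prediction is given below. The validity check is a simplified assumption (equal-size blocks coded in raster order, reference block entirely above or entirely to the left); it does not reproduce the 64×64 region restrictions of VVC. The function name and arguments are hypothetical.

```python
def ibc_predict(recon, x0, y0, w, h, bvx, bvy):
    """Predict the w x h block at (x0, y0) by copying reconstructed samples
    from (x0 + bvx, y0 + bvy) inside the same picture (recon[y][x])."""
    rx, ry = x0 + bvx, y0 + bvy
    width = len(recon[0])
    inside = rx >= 0 and ry >= 0 and rx + w <= width
    above = ry + h <= y0                  # reference entirely above current block
    left = rx + w <= x0 and ry <= y0      # reference entirely to the left
    if not (inside and (above or left)):
        raise ValueError("block vector does not point into the reconstructed area")
    return [row[rx:rx + w] for row in recon[ry:ry + h]]

# Left half already reconstructed, right half still to be coded.
recon = [[1, 2, 0, 0],
         [3, 4, 0, 0]]
prediction = ibc_predict(recon, x0=2, y0=0, w=2, h=2, bvx=-2, bvy=0)
```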
Palette Mode
[Figure: palette table with color entries (x0, y0, z0) to (x3, y3, z3) for the components G|Y, B|Cb, R|Cr, plus an escape entry]
Palette indexes of an 8×8 block:
0 0 0 0 0 0 0 0
0 3 3 3 4 2 2 2
0 3 3 1 1 2 2 2
0 3 3 1 1 2 2 2
0 0 0 1 1 2 2 2
0 1 1 1 1 1 1 4
0 1 0 1 1 1 1 2
0 0 0 0 0 0 0 0
Index mode example: index = 1, run = 5 (samples 1 1 1 1 1 1)
Copy mode example: copy above, run = 7 (second of the two identical rows 0 3 3 1 1 2 2 2)
Alternative Coding Mode: Palette Mode
- Quantized color vectors are represented by palette indexes
  - Palette for current block is predictively coded referring to preceding palettes
  - Palette can include an escape symbol for representing less likely values
- Palette indexes are coded using horizontal or vertical scanning, using two coding modes
  1. Index mode: Transmit palette index and run length (≥ 0)
  2. Copy mode: Index is copied from top (hor. scan) or left (ver. scan), transmit run length (≥ 0)
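The two coding modes can be sketched as a simple run coder. The sketch assumes a plain raster (horizontal) scan and a greedy encoder that picks whichever mode yields the longer run; both are simplifying assumptions, and the symbol format is hypothetical. Run lengths count the additional samples, so a run of 0 covers one sample.

```python
def encode_runs(index_map):
    """Greedy run coding of a palette index map in raster scan order."""
    w = len(index_map[0])
    idx = [v for row in index_map for v in row]
    symbols, i = [], 0
    while i < len(idx):
        n_index = 1                       # samples equal to the index at i
        while i + n_index < len(idx) and idx[i + n_index] == idx[i]:
            n_index += 1
        n_copy = 0                        # samples equal to the sample above
        while (i + n_copy < len(idx) and i + n_copy >= w
               and idx[i + n_copy] == idx[i + n_copy - w]):
            n_copy += 1
        if n_copy > n_index:
            symbols.append(("copy", n_copy - 1))
            i += n_copy
        else:
            symbols.append(("index", idx[i], n_index - 1))
            i += n_index
    return symbols

def decode_runs(symbols, w):
    """Rebuild the index map (rows of width w) from the coded symbols."""
    idx = []
    for sym in symbols:
        if sym[0] == "copy":
            for _ in range(sym[1] + 1):
                idx.append(idx[len(idx) - w])   # copy the sample above
        else:
            idx.extend([sym[1]] * (sym[2] + 1))
    return [idx[r:r + w] for r in range(0, len(idx), w)]

index_map = [[0, 0, 0, 0, 0, 0, 0, 0],
             [0, 3, 3, 3, 4, 2, 2, 2],
             [0, 3, 3, 1, 1, 2, 2, 2],
             [0, 3, 3, 1, 1, 2, 2, 2],
             [0, 0, 0, 1, 1, 2, 2, 2],
             [0, 1, 1, 1, 1, 1, 1, 4],
             [0, 1, 0, 1, 1, 1, 1, 2],
             [0, 0, 0, 0, 0, 0, 0, 0]]
symbols = encode_runs(index_map)
```

Because the encoder is greedy, its runs need not coincide with the runs marked in the slide figure; only the lossless round trip is guaranteed.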
Screen Content Coding / Coding Efficiency
Coding Efficiency Example: “Desktop” (1920×1080)
[Figure: PSNR [dB] over bit rate [Mbit/s] for VVC without screen content tools and VVC with additional screen content tools]
Subjective Comparison: “Desktop” (Crop of Top-Left Region)
[Figure panels: VVC without SCC tools @ 1 Mbit/s; VVC with SCC tools @ 1 Mbit/s]
Coding Efficiency Impact of Screen Content Coding Tools (Example: VVC)
average bit-rate savings   intra only   random access   low delay
ChineseEditing             38 %         36 %            32 %
Console                    66 %         52 %            48 %
Desktop                    67 %         61 %            57 %
FlyingGraphics             41 %         18 %            14 %
SlideEditing               47 %         44 %            36 %
SlideShow                  20 %         16 %            10 %
average                    46 %         38 %            33 %
Average Bit Rate Savings
- Bit-rate savings based on PSNR as quality measure
- Averages over reasonable quality range
- Screen content tools provide large gains for many sequences
Scalable Video Coding / Types of Scalability
Scalable Video Coding
[Figure: original (1080p, 60 Hz) is coded by a video encoder into one scalable bitstream; video decoders extract 1080p, 60 Hz at 10 Mbit/s; 1080p, 60 Hz at 5 Mbit/s; and 720p, 30 Hz at 1.5 Mbit/s]
Scalable Bitstream
- Includes multiple coded versions of a video sequence
- Representations must be extractable by simple discarding of packets
- Decoder or middlebox can extract representation suitable for application requirements
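Extraction by discarding packets can be sketched as a plain filter. The `Packet` type and its fields `temporal_id` and `layer_id` are hypothetical stand-ins for the layer identifiers carried in packet headers; real bitstream formats define their own syntax.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    temporal_id: int   # hypothetical: temporal layer the packet belongs to
    layer_id: int      # hypothetical: spatial/quality layer the packet belongs to
    payload: bytes

def extract(packets, max_temporal_id, max_layer_id):
    """Keep only the packets needed for the requested representation."""
    return [p for p in packets
            if p.temporal_id <= max_temporal_id and p.layer_id <= max_layer_id]

stream = [Packet(0, 0, b"I"), Packet(1, 0, b"B"), Packet(0, 1, b"E"), Packet(1, 1, b"e")]
base_only = extract(stream, max_temporal_id=0, max_layer_id=0)
```

No packet is parsed or re-encoded, which is why a middlebox can perform the extraction.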
Types of Scalability
Temporal Scalability
- Scalable bitstream contains representations with different frame rates
Spatial Scalability
- Scalable bitstream contains representations with different spatial resolutions
Quality Scalability
- Scalable bitstream contains representations with different bit rates (but same resolution)
Combined Scalability
- Combination of two or more of the above types
Scalable Video Coding / Temporal Scalability
Temporal Scalability
[Figure: hierarchical B-picture coding structure with base layer pictures and additional enhancement layer pictures]
Coding Structures for Temporal Scalability
- Requirement: Enhancement layer pictures are not used for prediction of base layer pictures
- Hierarchical B pictures are well suited and provide very high coding efficiency
- Very small loss in coding efficiency relative to best possible single layer coding
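The requirement can be checked on a dyadic hierarchical structure. The sketch assumes a group of pictures whose size is a power of two, and that each picture references the nearest past and future pictures of lower temporal layers; the function names are hypothetical.

```python
def temporal_id(poc, gop_size=8):
    """Temporal layer of a picture in a dyadic hierarchical structure."""
    levels = gop_size.bit_length() - 1             # log2(gop_size)
    pos = poc % gop_size
    if pos == 0:
        return 0                                   # key pictures form the base layer
    trailing_zeros = (pos & -pos).bit_length() - 1
    return levels - trailing_zeros

def reference_pocs(poc, gop_size=8):
    """Pictures used for prediction (display-order numbers)."""
    pos = poc % gop_size
    if pos == 0:
        return [poc - gop_size] if poc >= gop_size else []
    dist = pos & -pos                              # distance to nearest lower-layer pictures
    return [poc - dist, poc + dist]

layers = [temporal_id(p) for p in range(9)]
```

Discarding all pictures with the highest `temporal_id` halves the frame rate, and no remaining picture loses a reference.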
Scalable Video Coding / Quality and Spatial Scalability
Quality / SNR Scalability
[Figure: two-layer coding structure with enhancement layer pictures and co-located base layer pictures]
Inter-Layer Prediction
- Add co-located base layer picture to reference list of enhancement layer picture
- Base layer data are exploited by sample prediction and motion prediction
- Improves coding efficiency relative to independent coding of both layers (simulcast)
Spatial Scalability
[Figure: two-layer coding structure; each base layer picture passes through an upsampler before it is used by the co-located enhancement layer picture]
Inter-Layer Prediction with Upsampling
- Add upsampled co-located base layer picture to reference list of enhancement layer picture
- Use information coded in base layer for improving coding efficiency relative to simulcast
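A minimal sketch of generating the inter-layer reference picture follows. It uses nearest-neighbor upsampling purely for brevity, which is an assumption; practical codecs use dedicated interpolation filters. The function name is hypothetical.

```python
def upsample_nearest(picture, factor=2):
    """Upsample a picture (list of sample rows) by an integer factor."""
    out = []
    for row in picture:
        wide = [s for s in row for _ in range(factor)]    # repeat each sample
        out.extend([list(wide) for _ in range(factor)])   # repeat each row
    return out

base_picture = [[1, 2],
                [3, 4]]
inter_layer_reference = upsample_nearest(base_picture, factor=2)
```

The resulting picture has the enhancement layer resolution and can be appended to the reference list like any temporal reference picture.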
Multi-Layer and Combined Scalability
[Figure: three-layer coding structure with layer 0, layer 1, and layer 2]
- Multiple quality and/or spatial enhancement layers are possible
  - Coding efficiency for top layer decreases with number of supported layers
  - Decoding complexity for top layer increases with number of supported layers
- Temporal scalability can be straightforwardly combined with quality/spatial scalability
Multiview and 3D Video Coding / Stereo and Multiview Coding
3D Cinema / Home Cinema: Stereo Video
[Figure: display and positive parallax]
Why Glasses?
- Need to project a different image to each eye
- Glasses control what each eye sees
- Need to transmit video with two images per time instance
Stereo Video Example
- Similarities between left and right picture for same time instance
- Can be exploited by technique similar to motion-compensated prediction
Multi-view Coding with Disparity-Compensated Prediction
[Figure: coding structure with left view (primary) and right view (secondary)]
Multiview Coding with Disparity-Compensated Prediction
- Add reconstructed picture of primary view to reference lists for secondary view (same time instance)
- Only change required is construction of reference picture lists
- Straightforward extension to more than 2 views
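The reference list construction can be sketched as below. The picture store `decoded`, keyed by `(view, time)`, and the convention that view 0 is the primary view are hypothetical choices for illustration.

```python
def build_reference_list(decoded, view, time, temporal_ref_times):
    """Reference pictures for the picture of `view` at `time`.
    decoded maps (view, time) to a reconstructed picture."""
    refs = [decoded[(view, t)] for t in temporal_ref_times]  # same view, other times
    if view > 0:
        refs.append(decoded[(0, time)])   # primary view, same time instance
    return refs

decoded = {(0, 0): "L0", (0, 4): "L4", (1, 0): "R0"}   # placeholder pictures
refs_right = build_reference_list(decoded, view=1, time=4, temporal_ref_times=[0])
refs_left = build_reference_list(decoded, view=0, time=4, temporal_ref_times=[0])
```

Block-level prediction then works unchanged: a vector pointing into the inter-view reference acts as a disparity vector.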
Multiview and 3D Video Coding / 3D Coding
Autostereoscopic Displays
[ J. Geng, Three-dimensional display technologies, 2013 ]
- Need to provide very large number of views (> 50)
- Problem for video acquisition and transmission
Disparity and Object Distance
[Figure: stereo camera geometry with object distance z, baseline b, focal length f, and disparity d]
Disparity: $d = \frac{f \cdot b}{z}$
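A short worked example of the relation d = f · b / z, using hypothetical values (focal length given in samples, baseline and object distance in meters):

```python
def disparity(f, b, z):
    """Disparity d = f * b / z (in samples if f is given in samples)."""
    return f * b / z

def object_distance(f, b, d):
    """Inverse relation: object distance z = f * b / d."""
    return f * b / d

f = 1000.0   # hypothetical focal length in samples
b = 0.065    # hypothetical baseline in meters
d_near = disparity(f, b, z=2.0)    # about 32.5 samples
d_far = disparity(f, b, z=20.0)    # about 3.25 samples
```

Disparity is inversely proportional to object distance, so near objects shift much more between the two views than distant ones.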
Depth Maps and Rendering of Virtual View
Depth Image Based Rendering
- Depth maps provide information about object distance for each sample in a picture
- Virtual views can be generated at receiver side by depth image based rendering using
  - One or multiple views (preferably multiple views due to occlusions)
  - Associated depth maps
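Depth image based rendering can be sketched for a single picture row. The sketch assumes one input view, a virtual camera shifted to the right by the baseline b (so samples move left by their disparity), and rounding to integer positions; the names and values are hypothetical.

```python
def render_virtual_row(texture, depth, f, b):
    """Forward-warp one row of samples into a virtual view."""
    width = len(texture)
    out = [None] * width          # None marks holes (disoccluded samples)
    best = [-1.0] * width         # largest disparity written to each position
    for x in range(width):
        d = f * b / depth[x]      # disparity from object distance
        xt = x - round(d)         # assumed sign: virtual camera to the right
        if 0 <= xt < width and d > best[xt]:
            out[xt] = texture[x]  # nearer objects (larger disparity) win
            best[xt] = d
    return out

texture = [10, 11, 12, 13, 14, 15]
depth = [16, 16, 2, 2, 16, 16]    # two near samples in front of a far background
virtual_row = render_virtual_row(texture, depth, f=4, b=1)
```

The near object covers background samples in the virtual view and leaves holes where background was hidden in the input view; such holes are the reason multiple input views are preferable.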
3D Video Coding: Transmission of Multiple Views with Depth Maps
- Conventional multiview coding (with disparity-compensated prediction) for textures and depth maps
- Potential improvements:
  - Dedicated coding tools for depth map coding (characterized by sharp edges, low details)
  - Exploitation of texture data for depth coding (or vice versa)
Virtual Reality: 360 Degree Video
Virtual Reality (VR) / 360° Video
[Figure: processing chain: omnidirectional capture, panorama stitching, frame packing, video encoder, video decoder, viewport rendering (controlled by head movement); encoder and decoder form the conventional video coding part]
Virtual Reality: Coding of 360° Video
- Panorama stitching: Combine multiple videos into single 360° panoramic video
- Frame packing: Project 3D representation into conventional video frames
  - Requires suitable projection formats
- Video coding: Conventional coding of 2D video frames (needs very large resolution!)
  - Coding efficiency depends on chosen projection format
- Viewport rendering: Rendering of viewport (e.g., 75° viewing angle) given projection format
  - Considering head movement in real time
Virtual Reality: 360 Degree Video / Image Stitching
Panorama Stitching
[ source: Wikipedia ]
- Transform images into common coordinate system (compositing surface, projection format)
- Seamless blending of overlapping parts
- Stitching issues: Parallax, lens distortion, motion in scene, camera calibration, exposure, ...
Example: Possible Stitching Artifacts
Virtual Reality: 360 Degree Video / Projection Formats
Projection onto 2D Video Frames
Representation of 360° Video
- Video samples for dense set of latitude angles θ ∈ [−π/2; π/2] and longitude angles φ ∈ [−π; π]
- Represent as 2D arrays of samples
Projection Formats
- Use a virtual object in 3D space (e.g., sphere, cube)
- Project captured video samples on surface of object
- Pack surface samples into 2D array (video frame)
Impact of Projection Formats
- Chosen format impacts quality of viewport rendering
- Chosen format impacts efficiency of video coding
[Figure: point P on the sphere with latitude θ and longitude φ in the coordinate system X, Y, Z]
X = cos(θ) · cos(φ)
Y = sin(θ)
Z = −cos(θ) · sin(φ)
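The coordinate relations can be written out directly, together with their inverse. The inverse is derived from the formulas above (θ from Y, φ from −Z and X); the function names are hypothetical.

```python
import math

def sphere_to_xyz(theta, phi):
    """Point on the unit sphere for latitude theta and longitude phi."""
    return (math.cos(theta) * math.cos(phi),
            math.sin(theta),
            -math.cos(theta) * math.sin(phi))

def xyz_to_sphere(x, y, z):
    """Latitude and longitude of the direction (x, y, z), which must be non-zero."""
    norm = math.sqrt(x * x + y * y + z * z)
    theta = math.asin(max(-1.0, min(1.0, y / norm)))
    phi = math.atan2(-z, x)
    return theta, phi

point = sphere_to_xyz(0.3, -2.0)
angles = xyz_to_sphere(*point)
```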
360° Projection Formats: Equirectangular Projection (ERP)
- Project surface of sphere into a rectangular picture: x = αφ and y = αθ (α specifies resolution)
- Non-uniform sampling, strong geometric distortions (in particular at the poles)
- Camera and object motion difficult to represent in coded video
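The non-uniform sampling can be made concrete with a small sketch. It assumes a picture width W so that α = W / (2π), with θ as latitude and φ as longitude; the oversampling factor follows from the fact that a circle of latitude θ has circumference 2π · cos(θ) but is represented by the same number of samples as the equator. The names and the value of W are hypothetical.

```python
import math

def erp_position(theta, phi, alpha):
    """ERP coordinates x = alpha * phi, y = alpha * theta (origin at theta = phi = 0)."""
    return alpha * phi, alpha * theta

def horizontal_oversampling(theta):
    """Horizontal sampling density at latitude theta relative to the equator."""
    return 1.0 / math.cos(theta)

W = 4096                       # hypothetical picture width in samples
alpha = W / (2 * math.pi)
x_max, y_max = erp_position(math.pi / 2, math.pi, alpha)   # W / 2 and W / 4
factor_60 = horizontal_oversampling(math.radians(60))      # about 2
```

At 60° latitude every sample row already uses about twice as many samples per unit arc length as the equator, and the factor grows without bound toward the poles.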
360° Projection Formats: Segmented Sphere Projection (SSP)
[Figure: packed picture with top and bottom circles]
- Latitude angles θ ∈ [−π/4; π/4] (middle segment) are projected as in equirectangular projection
- Top and bottom parts of sphere surface are represented as additional circles
- Reduced geometric distortion for pole regions, still similar problems as in equirectangular projection
360° Projection Formats: Octahedron Projection (OHP)
- 8 triangular faces of regular octahedron are arranged in rectangular picture
- Small geometric distortion inside faces
- Complicated motion across face boundaries
- Diagonal borders at top and bottom are unsuitable for video coding
360° Projection Formats: Truncated Square Pyramid Projection (TSP)
- Project 6 faces of truncated square pyramid into rectangular picture
- Front face has same resolution as combination of remaining 5 faces
- No borders in composed picture, typical geometric motion artifacts at face boundaries
360° Projection Formats: Cubemap Projection (CMP)
[Figure: packed cube faces: left, front, right; bottom, back, top]
- Project 6 faces of cube into rectangular picture (two connected regions of 3 faces)
- Multiple versions that differ in projection geometry (slightly distorted faces)
- Small geometric distortion inside faces, typical geometric motion artifacts at face boundaries
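Mapping a direction to a cube face can be sketched by selecting the dominant coordinate. The assignment of axes to face names (+X front, +Y top, +Z right) and the orientation of the face coordinates (u, v) are assumptions for illustration; actual cubemap formats fix their own conventions.

```python
def cube_face(x, y, z):
    """Return (face, u, v) for a non-zero direction, with u and v in [-1, 1]."""
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        face = "front" if x > 0 else "back"
        u, v = z / ax, y / ax
    elif ay >= az:
        face = "top" if y > 0 else "bottom"
        u, v = x / ay, z / ay
    else:
        face = "right" if z > 0 else "left"
        u, v = x / az, y / az
    return face, u, v

center = cube_face(1, 0, 0)
down = cube_face(0.2, -3, 0.5)
```

Dividing by the dominant coordinate is the perspective projection onto the face plane, which keeps the distortion inside a face small.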
Virtual Reality: 360 Degree Video / Viewport Rendering
Attenuation of Seam Artifacts in Viewport Rendered from Decoded Video
[Figure: guard bands for equirectangular projection; guard bands for cubemap projection (faces left, front, right, bottom, back, top)]
Modified Projection Formats: Guard Bands
- Extend faces of 3D body (regions at boundaries are included multiple times)
- Reduction of coding artifacts at seam boundaries
- Additional samples for interpolation filters and seam blending
Dynamic Viewport Rendering
[Figure: viewport in the coordinate system X, Y, Z and the packed cube faces left, front, right, bottom, back, top]
Rendering of Viewport
- Map viewport coordinates to world coordinates XYZ
- Rotate XYZ according to head movement
- Map XYZ to coordinates of projection format
- Generate viewport sample by interpolation and blending
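The four rendering steps can be sketched for an equirectangular source picture. Several assumptions are made for brevity: a pinhole viewport looking along +X with Y up, head movement limited to a yaw rotation about the Y axis, columns proportional to longitude φ and rows to latitude θ, and nearest-neighbor sampling in place of interpolation and blending. All names are hypothetical.

```python
import math

def render_viewport(erp, vp_w, vp_h, fov_deg, yaw):
    """Render a vp_w x vp_h viewport from an ERP picture erp[row][col]."""
    H, W = len(erp), len(erp[0])
    half = math.tan(math.radians(fov_deg) / 2)
    out = []
    for j in range(vp_h):
        row = []
        for i in range(vp_w):
            # 1. viewport coordinates -> direction in world coordinates XYZ
            u = (2 * (i + 0.5) / vp_w - 1) * half
            v = (1 - 2 * (j + 0.5) / vp_h) * half * vp_h / vp_w
            x, y, z = 1.0, v, u
            # 2. rotate according to head movement (yaw about the Y axis)
            xr = x * math.cos(yaw) + z * math.sin(yaw)
            zr = -x * math.sin(yaw) + z * math.cos(yaw)
            # 3. map XYZ to coordinates of the projection format (ERP)
            norm = math.sqrt(xr * xr + y * y + zr * zr)
            theta = math.asin(max(-1.0, min(1.0, y / norm)))
            phi = math.atan2(-zr, xr)
            col = min(W - 1, int((phi / (2 * math.pi) + 0.5) * W))
            r = min(H - 1, int((0.5 - theta / math.pi) * H))
            # 4. generate the viewport sample (nearest neighbor here)
            row.append(erp[r][col])
        out.append(row)
    return out

erp = [[r * 8 + c for c in range(8)] for r in range(4)]   # tiny test picture
straight = render_viewport(erp, 3, 3, fov_deg=90, yaw=0.0)
turned = render_viewport(erp, 3, 3, fov_deg=90, yaw=math.radians(100))
```

With these conventions a yaw of a radians moves the viewport center to longitude φ = a, so only the lookup positions change with head movement; the decoded picture itself is untouched.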
Summary
Summary of Lecture
Screen Content Coding
- Dedicated coding tools: Transform skip, BDPCM, intra block copy, palette mode
- Improved coding efficiency for typical screen content pictures and videos
Scalable Video Coding
- Hierarchical B pictures suitable for providing temporal scalability
- Quality and spatial scalability: Layered coding with inter-layer prediction
Multiview and 3D Video Coding
- Multiview coding with inter-view prediction (similar to quality scalability)
- 3D video: Multiview coding of texture views and associated depth maps
Virtual Reality: 360° Video Coding
- Projection of 360° video into conventional video pictures
- Conventional coding of resulting video pictures: Efficiency depends on projection format