SLIDE 1

Comunicação de Áudio e Vídeo, Fernando Pereira, 2012

3D VIDEO SYSTEMS

Fernando Pereira Instituto Superior Técnico

SLIDE 2

It’s a 3D World, Stupid!

SLIDE 3

Context and Motivation

  • Strong interest in 3D services
  • Production of premium content increasing
  • Numerous devices supporting stereoscopic display available to the consumer
  • Substantial investments being made to upgrade digital cinema theaters with 3D capabilities
  • Many new standards being developed, e.g. production, distribution, digital interfaces

SLIDE 4

Stereoscopic Displays Sales Forecast

Source: DisplaySearch 3D Display Technology and Market Forecast Report

SLIDE 5

Critical Success Factors

  • High quality experience not burdened with high transition costs or turned off by viewing discomfort or fatigue
  • Usability and consumer acceptance of 3D viewing technology, e.g., glasses vs. no glasses, transition costs
  • Availability of premium 3D content in the home
  • Determination of an appropriate data format providing interoperability through the delivery chain and taking into consideration the constraints imposed by each delivery channel

SLIDE 6

3D Basics

SLIDE 7

3D Experiences … and 3D Video …

  • 3D experiences may be provided with 3D video in two main ways:
      • Depth perception/illusion - Provided through stereo video pairs which create an illusion of depth for the scene
      • Navigation - Provided through free viewpoint video (FVV) with n video views which allow navigating the 3D scene by changing the viewpoint and view direction within certain ranges (each view may be stereo)
  • 3D video is considered to refer to both the general n-view multi-view video representation AND its important stereo-view special case.

SLIDE 8

Human Visual System

SLIDE 9

Screen Parallax …

Screen Parallax = distance between Pleft and Pright

SLIDE 10

Stereoscopic Vision

  • Accommodation/convergence
      • The viewer's eyes accommodate (focus) to the depth of the display
      • and converge (point) to the depth of the image
  • Negative and positive disparities are not a natural situation, as
      • Normally, both accommodation and convergence occur at the same depth
  • Too large a disparity causes eyestrain (especially in older viewers)

[Figure labels: Negative Disparity, Positive Disparity]
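The relation between screen parallax and the depth at which a point appears can be sketched with similar triangles. The sketch below is only illustrative: the sign convention (negative parallax means the point appears in front of the screen), the 300 cm viewing distance and the function name are assumptions, and the 6.35 cm eye separation is the inter-ocular figure quoted later in these slides.

```python
def perceived_depth(parallax_cm, viewing_distance_cm=300.0, eye_separation_cm=6.35):
    """Distance from the viewer at which a point appears to lie.

    Similar triangles give p = e * (Z - D) / Z, hence Z = D * e / (e - p), with
    p = screen parallax, e = eye separation, D = viewing distance.
    Assumed convention: p < 0 is in front of the screen, p > 0 behind it.
    """
    if parallax_cm >= eye_separation_cm:
        # The eyes would have to be parallel or diverge: treat as infinitely far.
        return float("inf")
    return viewing_distance_cm * eye_separation_cm / (eye_separation_cm - parallax_cm)


if __name__ == "__main__":
    for p in (-6.35, -2.0, 0.0, 2.0, 6.35):
        print(f"parallax {p:+.2f} cm -> perceived depth {perceived_depth(p):.1f} cm")
```

The eyes still focus on the screen at the viewing distance while converging at the perceived depth, so the gap between the two values gives a rough indication of the accommodation/convergence conflict listed above.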

SLIDE 11

Stereoscopy

Stereoscopy (also called stereoscopic or 3D imaging) regards the capability of recreating 3D visual information or creating the illusion of depth in an image based on two appropriate views. The basic requirement is to recreate offset images (with parallax) that are presented separately to the left and right eye. Both of these 2D offset images are then combined in the brain to give the perception of 3D depth.

SLIDE 12

Stereoscopy: Better 3D Illusions …

  • Most of the perceptual cues that humans use to visualize the world’s 3D structure are available in 2D projections; this is why images on a television screen and at the cinema make sense. Perceptual cues for 3D perception include:
      • Occlusion - one object partially covering another
      • Perspective - point of view
      • Familiar size - we know the real-world sizes of many objects
      • Atmospheric haze - objects further away look more washed out
      • Selective focus - the object of interest is in focus
  • Some main cues are missing from 2D media:
      • Stereo parallax - seeing a different image with each eye
      • Movement parallax - seeing different images when we move our heads
      • Accommodation of the eyeball (eyeball focus) - process by which the eye changes optical power to maintain a clear image (focus) on an object as its distance changes.
  • Stereoscopy is the enhancement of the illusion of depth in an image or movie by presenting a slightly different image to each eye. The movement parallax cue is still not satisfied with stereoscopy and, therefore, the illusion of depth is incomplete.

SLIDE 13

Free Viewpoint Systems

Free viewpoint systems require the acquisition of multiple scene views taken from different angles, allowing the user to navigate around the scene.

SLIDE 14

3D Video Applications …

The complete 3D video system is relevant for multiple applications such as broadcast TV, teleconference, surveillance, interactive video, cinema, gaming and other immersive video applications.

[Diagram: 3D Home Master → 3D Encoding & Video Compression (3D Format Encode, Video Compress) → 3D Home Package → 3D Video Distribution Channels (Blu-ray Disc, DVD, Cable TV, Satellite TV, Terrestrial TV, IPTV, Internet) → Media Players & Set Top Boxes (Video Decompress, 3D Format Decode) → 3D TV (Left Eye, Right Eye)]

SLIDE 15

3D Video Content Chain …

  • The 3D content chain is a sequence of modules that closely mirrors a conventional 2D system, yet each module is quite different; all of them have to evolve from the available 2D solutions towards 3D.
  • 3D content creation involves special production “rules”, e.g. avoid fast pans and manage depth transitions.
  • Content representation, distribution and display may be performed with many different formats; the best choice depends on distribution constraints, display capabilities, available equipment, target quality, etc.
  • New 3D display technology is an important driving force: no glasses, multi-person displays, higher display resolutions, avoiding uneasy feelings (headaches, nausea, eye strain, etc.).

[Diagram: Content Acquisition and Creation → Content Representation → Content Distribution → Content Consumption]

SLIDE 16

3D Video Content Acquisition and Creation

SLIDE 17

History of 3D Video …

SLIDE 18

3D Content is Exploding … Again …

  • 165 3D movies released since 1953
  • Almost 30 3D movies in 1953 alone
  • Much more to come …
SLIDE 19

3D Momentum …

  • Hollywood is now able to offer unique, high-quality immersive 3D experiences in theaters
  • Revenue per 3D screen is typically three times higher than for traditional 2D screens
  • Increased momentum in 3D production and growing consumer appetite for 3D content

Avatar cost around $500 million! Box office in Jan 2011 was $2,781,835,502 … Naturally, the sequel is coming!

SLIDE 20

3D Cameras …

  • A stereo camera is a type of camera with two lenses and a separate image sensor or film frame for each lens. This allows it to simulate human binocular vision and gives it the ability to capture 3D images, a process known as stereo photography. Stereo cameras may be used for making stereoviews and 3D pictures for movies.
  • The distance between the lenses in a typical stereo camera (the inter-axial distance) is about the distance between one's eyes (known as the inter-ocular distance); this is about 6.35 cm, although a longer baseline (greater inter-camera distance) produces more extreme 3-dimensionality.

SLIDE 21

3D Video Content Representation

SLIDE 22

3D Video Formats/Standards …

  • There is much confusion in the area of 3D video formats and standards. Many formats are closely coupled to 3D display types and application scenarios.
  • A universal, flexible, generic, scalable, backward compatible 3D video format/standard would be highly desirable to support any 3D video application in an efficient way, while decoupling content creation from display and application.
  • Experts expect 3D television to follow much the same trajectory as HDTV did earlier this decade: a slow start, then a rapid ascent in sales once enough content exists to attract mainstream buyers.

SLIDE 23

Multi-View Video Data

  • Most test sequences have 8-16 views
  • But several 100-camera arrays exist!
  • Redundancy reduction between camera views
  • Need to cope with color/illumination mismatch problems
  • Alignment may not always be perfect either
SLIDE 24

Main 3D Video Format Requirements

  • HIGH COMPRESSION EFFICIENCY - significant compression gains compared to the independent compression of each view.
  • VIEW-SWITCHING RANDOM ACCESS - any image can be accessed, decoded and displayed by starting the decoder at a random access point and decoding a relatively small quantity of data on which that image may depend.
  • SCALABILITY - a decoder accessing only a portion of a bitstream is still able to generate effective video output, although reduced in quality to a degree commensurate with the quantity of data in the subset used for the decoding process.
  • VIEW SCALABILITY - only a portion of the bitstream has to be accessed to output a limited subset of the encoded views.
  • BACKWARD COMPATIBILITY - a subset of the bitstream corresponding to one ‘base view’ is decodable by an ordinary H.264/AVC decoder.
  • QUALITY CONSISTENCY AMONG VIEWS - it should be possible to control the encoding quality of the various views.

SLIDE 25

3D Video Related Formats: the Menu …

  • Multi-View Simulcasting
  • Frame Compatible Stereo
  • Conventional Stereo Video
  • 2D (Texture)+Depth
  • Multi-View Video
  • Multi-View+Depth (MVD)
  • 3DV (MVD+synthesis)
SLIDE 26

Display versus Distribution Formats

  • Generally, 3D content requires conversion prior to display, either in the STB, an external converter box or the TV itself
  • Important to minimize quality degradation
      • e.g. it is a bad idea to use a side-by-side image representation for line interleaved displays, since both horizontal and vertical resolution are lost

[Diagram: distribution format → STB / Blu-ray Player → HDMI → display format]

SLIDE 27

Multi-View Simulcasting

  • Multi-view simulcasting refers to the independent encoding of each view (ignoring that they are like ‘brothers’ because of the inter-view redundancy).
  • May use any coding technology, e.g. MPEG-2 Video, but an advanced codec such as H.264/AVC is more likely.
  • This solution was used in Portugal by Meo and Zon Multimedia to broadcast the 2010 World Cup games.

SLIDE 28

Frame Compatible Stereo Formats

  • Frame compatible formats refer to a class of formats in which the stereo signal is essentially a multiplex of the two views into a single frame or sequence of frames to be coded with 2D video coding solutions. They are also called stereo interleaving or spatial/temporal multiplexing formats.
  • The signaling for a complete set of frame-compatible formats has been standardized in H.264/AVC as supplemental enhancement information (SEI) messages.
  • Embraced by broadcasters for the initial phase of services.

SLIDE 29

Frame Compatible Stereo Formats Examples

  • Basic concept: pack pixels from left and right views into a single frame to be coded ‘as usual’:
      • Spatial Multiplexing: side-by-side, top-bottom, checkerboard formats
      • Time Multiplexing: views interleaved as alternating frames or fields
  • In such a format, half of the coded samples represent the left view and the other half represent the right view; thus, each coded view has half the resolution of the full coded frame.

[Figure labels: Left and Right packed within one frame; Left and Right alternating along the time axis]
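The spatial multiplexing idea above can be sketched in a few lines. This is a minimal illustration, not a production packer: frames are plain lists of rows, the function names are hypothetical, and views are decimated by simply dropping every other column or row, whereas a real system would normally low-pass filter before subsampling.

```python
def pack_side_by_side(left, right):
    """Keep every other column of each view and place the halves side by side."""
    return [l_row[::2] + r_row[::2] for l_row, r_row in zip(left, right)]


def pack_top_bottom(left, right):
    """Keep every other row of each view and stack the halves vertically."""
    return left[::2] + right[::2]


def unpack_side_by_side(frame):
    """Split a side-by-side frame back into two half-width views."""
    half = len(frame[0]) // 2
    return [row[:half] for row in frame], [row[half:] for row in frame]


if __name__ == "__main__":
    # Two tiny 4x4 "views"; each sample is tagged with its view, row and column.
    left = [[f"L{r}{c}" for c in range(4)] for r in range(4)]
    right = [[f"R{r}{c}" for c in range(4)] for r in range(4)]
    for row in pack_side_by_side(left, right):
        print(" ".join(row))
```

The packed frame has the same size as a single view, which is why it can pass through an existing 2D encoder and channel, at the cost of half the samples of each view.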

SLIDE 30

Frame Compatible Formats: Pros and Cons

Advantages

  • Tunnels the stereo bitstream through existing decoders (the stereo video can be compressed with existing encoders, transmitted through existing channels, and decoded by existing receivers)
  • Depending on format, bandwidth of compressed stream is similar to any 2D stream (some rate increase expected)
  • Uncompressed format has minimal impact on baseband infrastructure (production and consumer interfaces)

Drawbacks

  • Interleaved views not readily usable for legacy receivers
  • Loss of resolution for each view (if total frame resolution is the same)
  • Potential mismatch between interleaving format of compressed stream and various native display formats (further quality degradation)
  • Frame-compatible stereo video tends to have higher spatial frequency content characteristics

SLIDE 31

Preferred Frame Compatible Formats

  • Industry has moved forward with two primary formats: side-by-side and top-bottom
      • Quality of reconstructed signal after compression can be better maintained
      • Row/column interleaving and checkerboard introduce high frequencies (higher bitrate, cross talk, color bleeding)
  • For interlaced, clear preference for side-by-side
      • Top-bottom would further reduce the vertical resolution
SLIDE 32

Conventional Stereo Format

  • Conventional stereo refers to the case where two full resolution stereo views are coded exploiting their inter-view redundancy.
  • MPEG-2 Video, MPEG-4 Visual and the MVC standards offer full stereo coding solutions with increased compression efficiency.

[Figure: combined temporal and inter-view prediction]

SLIDE 33

2D+Depth Format

  • Includes a 2D view and the corresponding depth
  • Depth enables intermediate view generation
  • Standardized as ISO/IEC 23002-3 “MPEG-C Part 3”
  • Advantages
      • 2D video is backward compatible with legacy devices
      • Agnostic of coding format, so could utilize MPEG-2
      • Additional bandwidth to code depth could be minimal
      • Supports both stereo and multi-view displays
  • Drawbacks
      • Stereo signal not easily accessible and error-prone (view generation needed)
      • No provisions to handle occlusions, capable of rendering a limited depth range
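To make "depth enables intermediate view generation" concrete, the sketch below converts a stored depth sample into a pixel shift (disparity). It rests on assumptions that the slides do not state: depth is stored as 8-bit values quantized uniformly in inverse depth between a near and a far plane (a common convention, with 255 as the nearest), the cameras are parallel, and all names and numbers are illustrative.

```python
def depth_value_to_z(v, z_near, z_far):
    """Map an 8-bit depth sample v (assumed: 255 = nearest, 0 = farthest) to metric depth.

    Assumed quantization: uniform in 1/Z between the near and far planes.
    """
    inv_z = (v / 255.0) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    return 1.0 / inv_z


def disparity_pixels(z, focal_length_px, baseline):
    """Horizontal shift, in pixels, of a point at depth z between two parallel
    cameras separated by `baseline` (same unit as z): d = f * B / Z."""
    return focal_length_px * baseline / z


if __name__ == "__main__":
    # Hypothetical setup: depth range 1 m to 10 m, 1000-pixel focal length, 5 cm baseline.
    for v in (0, 128, 255):
        z = depth_value_to_z(v, z_near=1.0, z_far=10.0)
        d = disparity_pixels(z, focal_length_px=1000.0, baseline=0.05)
        print(f"depth sample {v:3d} -> Z = {z:5.2f} m -> shift = {d:5.1f} px")
```

Near samples shift more than far ones, so a renderer can produce a second view, or several intermediate ones, by choosing the virtual baseline.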

SLIDE 34

Multi-View Video Coding Format

Multi-view video (MVV) refers to a set of N temporally synchronized video streams coming from cameras capturing the same real scenery from different viewpoints.

  • Provides the ability to change viewpoint freely with multiple views available
  • Renders one view (real or virtual) to legacy 2D display
  • Most important case is stereo video (N = 2), with each view derived for projection into one eye, in order to generate a depth impression

[Diagram: VIEW-1, VIEW-2, VIEW-3, …, VIEW-N → Channel → TV/HDTV, 3DTV stereo system, multi-view display]
SLIDE 35

MPEG-2 Multiview Profile

  • MPEG-2 design leveraged temporal scalability for coding the second view
  • Reference picture could be either a picture from the base view or from within the enhancement view
  • Main benefits
      • Uses existing block level coding tools and syntax
      • Enables inter-view prediction for the first enhancement-view picture in each random-accessible encoded video segment
  • Drawback
      • Prediction in the reverse-temporal direction is not enabled for the enhancement view, which minimizes memory storage but reduces compression efficiency

SLIDE 36

Multi-View Video Coding (MVC) Standard

  • MVC is an H.264/AVC extension without any changes to the syntax at the slice layer and below, or to the decoding process.
  • Provides coding of multiple views, from stereo to multi-view.
  • Exploits redundancy between views using inter-camera prediction to reduce the required bitrate.
  • It is mandatory for the multi-view stream to include a base view, which is coded independently of the other, non-base views.
  • The MVC coding gains are:
      • For stereo video, the rate of the dependent view is reduced by around 30%
      • For multi-view, rate savings over all views are about 25%

SLIDE 37

Inter-view Prediction: Basics

Many prediction structures are possible to exploit inter-view redundancy, each trading off memory, delay, computation and coding efficiency differently.

[Figure: prediction structure of the MPEG-2 Video Multi-view profile; axis label: View]

Pictures in the second view are not only predicted from temporal references (in the same view), but also from inter-view references (in the other view). The prediction is adaptive, so the best predictor among temporal and inter-view references can be selected on a block basis in terms of rate-distortion cost.
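The block-level choice between a temporal and an inter-view predictor can be sketched as a cost comparison. This is a simplified illustration under stated assumptions: distortion is measured with the sum of absolute differences (SAD), the side-information bit counts and the Lagrange multiplier `lam` are made-up inputs, and the function names are hypothetical; real encoders use more elaborate distortion measures and actual entropy-coded rates.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two blocks given as flat sample lists."""
    return sum(abs(a - b) for a, b in zip(block_a, block_b))


def choose_reference(current, candidates, lam=4.0):
    """Pick the candidate with the lowest cost J = D + lam * R.

    candidates maps a label to (predicted_block, side_info_bits).
    Returns (label, cost).
    """
    best = None
    for label, (predicted, bits) in candidates.items():
        cost = sad(current, predicted) + lam * bits
        if best is None or cost < best[1]:
            best = (label, cost)
    return best


if __name__ == "__main__":
    current = [10, 12, 14, 16]
    candidates = {
        "temporal": ([10, 12, 20, 22], 2),    # prediction from the same view, earlier time
        "inter-view": ([11, 12, 14, 17], 3),  # prediction from the other view, same time
    }
    print(choose_reference(current, candidates))
```

In this toy case the inter-view candidate wins despite costing more bits, because its prediction error is much smaller; with different content the temporal candidate would be selected instead.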

SLIDE 38

Inter-view Prediction in MVC

  • The MVC standard enables inter-view prediction, as well as supporting ordinary temporal and spatial prediction.
  • Inter-view prediction is a key feature of the MVC design, and it is enabled in a way that makes use of the flexible reference picture management capabilities that had already been designed into H.264/AVC.
  • It also supports backward compatibility with existing legacy systems by structuring the MVC bitstream to include a compatible ‘base view’.

[Figure: MVC prediction structure with axes Time and View; base view with GOP size 6]

For complexity reasons, the MVC design does not allow the prediction of a picture in one view at a given time using a picture from another view at a different time.

SLIDE 39

View Random Access

  • Random Access Delay - As MVC introduces dependencies between views, random access in the view dimension must also be considered; this concerns view switching with controlled delay.
  • Target versus Non-Target Views - Specifically, in addition to the views to be accessed (called the target views), any views on which they depend for purposes of inter-view referencing also need to be accessed and decoded, which typically requires some additional decoding time or delay.
  • Prediction Structure - For applications in which random access or view switching is important, the prediction structure has to be designed to minimize access delay, and the MVC design provides a way for an encoder to describe the prediction structure.
  • Access Points - To access a particular picture in a given view, the decoder should first determine an appropriate access point. In H.264/AVC, each IDR picture provides a clean random access point, since these pictures can be independently decoded and all the coded pictures that follow them in bitstream order can also be decoded without temporal prediction from any picture decoded prior to the IDR picture.
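The target versus non-target distinction amounts to following the inter-view dependencies from the requested view back to the base view. The sketch below assumes the dependencies are available as a simple mapping from view id to the views it references; the mapping, the view ids and the function name are illustrative, not the actual MVC syntax.

```python
def views_to_decode(target_view, depends_on):
    """Return the sorted list of views that must be decoded to output target_view.

    depends_on maps a view id to the list of view ids it uses as inter-view
    references (directly). Dependencies are followed transitively.
    """
    needed = set()
    stack = [target_view]
    while stack:
        view = stack.pop()
        if view in needed:
            continue
        needed.add(view)
        stack.extend(depends_on.get(view, []))
    return sorted(needed)


if __name__ == "__main__":
    # Hypothetical 5-view structure: view 0 is the base view.
    depends_on = {0: [], 2: [0], 1: [0, 2], 4: [2], 3: [2, 4]}
    for target in range(5):
        print(f"target view {target}: decode views {views_to_decode(target, depends_on)}")
```

Views other than the target in the returned list are the non-target views; the more of them there are, the longer the view-switching delay, which is why the prediction structure matters for random access.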

SLIDE 40

MVC: Technical Solution

The core macroblock-level and lower-level decoding modules of an MVC decoder are the same, regardless of whether a reference picture is a temporal or an inter-view reference. This distinction is managed at a higher level of the decoding process.

  • Key elements of the MVC design
      • Does not require any changes to lower-level syntax, so it is very compatible with single-layer AVC hardware
      • Base layer required and easily extracted from the video bitstream (identified by NAL unit type)
      • Several additions to the high-level syntax, which are primarily signaled through a multi-view extension of the sequence parameter set (SPS) defined by H.264/AVC
      • Three important pieces of information are carried in the SPS extension: i) view identification; ii) view dependency information; and iii) level index for operation points
  • Inter-view prediction
      • Enabled through flexible reference picture management; allows decoded pictures from other views to be inserted into and removed from the reference picture buffer
      • Core decoding modules do not need to be aware of whether a reference picture is a temporal reference or an inter-view reference
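Extraction of the base view "by NAL unit type" can be sketched as a simple filter. The specific type values used here (14 for prefix NAL units, 15 for the subset sequence parameter set, 20 for coded slice extensions) are my understanding of the MVC extension of H.264/AVC and should be checked against the specification; start-code parsing is omitted and each NAL unit is assumed to be given as a bytes object starting with its one-byte header.

```python
# Assumed NAL unit types that carry MVC-specific data (verify against the H.264/AVC spec).
MVC_ONLY_NAL_TYPES = {14, 15, 20}


def nal_unit_type(nal):
    """The type is the low 5 bits of the first header byte."""
    return nal[0] & 0x1F


def extract_base_view(nal_units):
    """Drop the MVC-specific NAL units, leaving what a single-view decoder consumes."""
    return [nal for nal in nal_units if nal_unit_type(nal) not in MVC_ONLY_NAL_TYPES]


if __name__ == "__main__":
    # Header bytes only, for illustration: SPS (7), subset SPS (15), PPS (8),
    # prefix NAL (14), IDR slice (5), coded slice extension (20).
    stream = [bytes([b]) for b in (0x67, 0x6F, 0x68, 0x6E, 0x65, 0x74)]
    print([nal_unit_type(n) for n in extract_base_view(stream)])
```

A legacy decoder achieves the same effect by ignoring NAL unit types it does not know, which is what makes the base view backward compatible.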

SLIDE 41

Proposed Tools for MVC

  • During the development of MVC, a number of macroblock-level coding tools were also explored:
      • Illumination compensation: incorporates illumination change into the motion compensation process; more efficient inter-view prediction when illumination mismatches exist
      • Adaptive reference filtering: compensates for other mismatches between views, such as focus
      • Motion skip mode: exploits the high correlation between motion vectors in neighboring views; infers motion information from the corresponding block in a neighboring view; similar in concept to the inter-layer motion prediction of SVC
      • View synthesis prediction: estimate depth, synthesize a virtual view and use it for prediction
  • Poor gains led to non-adoption
      • As these tools would provide only an additional 10-15% gain, the gains were not considered sufficient justification to change the macroblock-level syntax of the standard
      • Possibility of a future amendment including these tools (and possibly others) once the 3D market becomes clearer and needs are better understood

SLIDE 42

MVC: Profiles and Levels

There are two MVC profiles with support for more than one view, both based on the H.264/AVC High profile:

  • The Multi-view High profile supports multiple views and does not support interlaced coding tools.
  • The Stereo High profile is limited to two views, but does support interlaced coding tools.

Levels impose constraints on the MVC bitstreams to establish bounds on the necessary decoder resources and complexity. The level limits include limits on the amount of frame memory required for the decoding of a bitstream, the maximum throughput in terms of macroblocks per second, maximum picture size, overall bit rate, etc.

SLIDE 43

MVC Compression Performance

Simulcasting versus MVC comparison

8 views (with 640×480 resolution), considering the rate for all views: ~25% bit rate savings over all views.

[Plots: PSNR (dB) versus Bitrate (Kb/s) for the Ballroom and Race1 sequences, Simulcast versus MVC]

SLIDE 44

MVC: Subjective Stereo Performance

  • MVC achieves perceptual quality comparable to simulcast with the dependent view coded at as little as 25% of the base view rate (75% gain); this percentage may have to be higher when the main view is coded at less than 12 Mbit/s.
  • For similar PSNR, the gains are only about 30% for the dependent view.
  • This experiment shows that the 2 views don’t need to have the same quality.

[Chart: Mean Opinion Score (axis from 1.00 to 4.50) for Original, Simulcast (AVC+AVC), and MVC conditions 12L_50Pct, 12L_35Pct, 12L_25Pct, 12L_20Pct, 12L_15Pct, 12L_10Pct and 12L_5Pct]

Base view fixed at 12 Mbit/s; dependent view at varying percentage of base view rate.
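The rates behind these test conditions follow from simple arithmetic, sketched below. One assumption is made that the slide does not state explicitly: the simulcast reference codes both views at the base view rate, i.e. 2 × 12 Mbit/s. The function name is illustrative.

```python
BASE_RATE_MBPS = 12.0  # base view rate stated on the slide


def stereo_rates(dependent_fraction, base_rate=BASE_RATE_MBPS):
    """Return (dependent rate, total rate, saving versus simulcast).

    Assumption: simulcast spends base_rate on each of the two views.
    """
    dependent = base_rate * dependent_fraction
    total = base_rate + dependent
    saving = 1.0 - total / (2.0 * base_rate)
    return dependent, total, saving


if __name__ == "__main__":
    for pct in (50, 35, 25, 20, 15, 10, 5):
        dep, total, saving = stereo_rates(pct / 100.0)
        print(f"{pct:2d}%: dependent {dep:4.1f} Mbit/s, total {total:4.1f} Mbit/s, "
              f"saving vs simulcast {saving:.1%}")
```

Under that assumption, a dependent view at 25% of the base rate gives a 15 Mbit/s stereo stream instead of 24 Mbit/s, so a 75% gain on the dependent view corresponds to a 37.5% saving on the total rate.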

SLIDE 45

MVC Limitations

  • Acquisition and production of video with large camera arrays is hard and uncommon
  • MVC is more efficient than simulcast, but the rate is still roughly proportional to the number of views
      • Varies with scene, camera arrangements, etc.

SLIDE 46

Multi-View Video plus Depth (MVD)

  • The MVD format encodes both the texture and the depth data (same number of views) with MVC.
  • Coding texture and depth simultaneously is a direction currently explored in MPEG as part of the 3D Video coding activity.
  • MVD is the reference format for MPEG 3D Video: stereo texture and stereo depth (encoded with MVC).

SLIDE 47

Depth Coding vs Natural Image Coding

  • Depth has unique signal properties relative to natural images
      • Larger homogeneous areas inside scene objects
      • Sharp transitions along object boundaries
  • Depth maps are not reconstructed for display, but rather for view synthesis of the video data (we never see depth maps!)
      • Depth represents a shift value for color samples from original views
      • Coding errors in depth maps result in wrong pixel shifts in synthesized views
      • Errors especially visible around depth discontinuities at the borders of objects with different scene depth
  • Depth compression algorithm needs to preserve depth edges much better than current coding methods such as AVC/MVC
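The role of depth as a shift value, and the effect of a depth coding error, can be illustrated on a single scanline. This is a toy sketch under assumptions: disparities are given directly as integer pixel shifts, samples move left by their disparity, the nearer sample (larger disparity) wins when two land on the same position, and unfilled positions are left as `None`; the function name is hypothetical.

```python
def warp_scanline(colors, disparities):
    """Synthesize a scanline by shifting each sample left by its disparity.

    Nearer samples (larger disparity) overwrite farther ones; None marks holes
    (disocclusions) that no source sample maps to.
    """
    width = len(colors)
    out = [None] * width
    winner = [-1] * width  # disparity of the sample currently stored at each position
    for x, (color, d) in enumerate(zip(colors, disparities)):
        target = x - d
        if 0 <= target < width and d > winner[target]:
            out[target] = color
            winner[target] = d
    return out


if __name__ == "__main__":
    # Background samples b*, a two-sample foreground object F3 F4.
    colors = ["b0", "b1", "b2", "F3", "F4", "b5", "b6", "b7"]
    correct = [0, 0, 0, 2, 2, 0, 0, 0]
    corrupted = [0, 0, 0, 2, 1, 0, 0, 0]  # one depth sample damaged at the object edge
    print(warp_scanline(colors, correct))
    print(warp_scanline(colors, corrupted))
```

With correct depth the object moves as a whole and leaves a hole where the background was hidden; with one wrong depth sample at the object border the object is torn apart, which is the kind of artifact that makes depth edges so important to preserve.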

SLIDE 48

After MVC: the MPEG 3DVC Approach

  • Synthesize a continuum of views based on a limited set of decoded views
  • Specify a format that fixes a rate, but allows an arbitrarily large number of views to be rendered

[Diagram: Limited Camera Inputs → Data Format, Constrained Rate (based on distribution) → Arbitrarily Large Number of Output Views]

  • Auto-stereoscopic N-view displays
      • Wide viewing angle
      • Large number of output views
  • Stereoscopic displays (Left, Right)
      • Variable stereo baseline
      • Adjust depth perception
SLIDE 49

Bitrate versus 3D Rendering Capability

[Chart: 3D Rendering Capability versus Bit Rate, positioning 2D, 2D+Depth, MVC, Simulcast and 3DVC]

3DVC should be compatible with:

  • existing standards
  • mono and stereo devices
  • existing or planned infrastructure
SLIDE 50

MPEG 3DVC Framework

[Diagram: Limited Video Inputs (e.g., 2 or 3 views) → Depth Estimation → Video/Depth Codec (Binary Representation & Reconstruction Process) → View Synthesis → Larger # Output Views]

SLIDE 51

Quality Metrics: an Even Bigger Challenge

How to measure the quality of the ‘synthetic’ views for which no ‘real’ references exist? How do we know/measure what is ‘good quality’? Subjective testing is mostly being used by MPEG …

SLIDE 52

3D Content Distribution

(after this slide the topics are not for the exam)

SLIDE 53

TV Service Transitions …

  • 1930s: Black and white TV starts
  • 1950s: Color TV introduced
      • Analog, backward compatible
  • 1990s: Digital TV
      • New infrastructure required
      • Transitions from SD to HD
  • 2010: 3D
      • Introduction of services is mixed
      • Not a single format across all services
SLIDE 54

Blu-ray Disc

  • Large storage capacity
  • Desire for 2D compatibility, and reuse of existing players
  • New 3D Blu-ray players introduced to the market
  • In recognition of its high-quality encoding capability and support for backward compatibility, the MVC Stereo High profile was selected by the Blu-ray Disc Association as the coding format for 3D video with high-definition resolution.
  • BD Specification
      • 1080p resolution to each eye
      • Adopted MVC format for high coding efficiency and backward compatibility

  • Upgrade of existing legacy players also possible
SLIDE 55

Cable and Satellite Transmission

  • Cable/satellite systems aware of set-top box capabilities
  • Backward compatibility less critical
  • Bandwidth not a major problem either
  • Biggest issue: large installed base of set-top boxes; how to deliver 3D to those boxes?
  • First step: frame-compatible formats
  • Full-resolution stereo later
  • Services have started
  • Good business models: video-on-demand, premium channels
  • Already many channels around the world
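As a rough illustration of a frame-compatible format, the sketch below packs a stereo pair side by side into a single frame of the original size, so that existing set-top boxes can carry it as ordinary 2D video. The function names are hypothetical, and plain column dropping and pixel repetition stand in for the filtering a real system would use.

```python
def pack_side_by_side(left, right):
    """Pack two equally sized views into one frame of the same size.

    Each view is a list of rows (lists of pixels). Every second column is
    kept, so each eye ends up with half the horizontal resolution.
    """
    if len(left) != len(right) or any(len(l) != len(r) for l, r in zip(left, right)):
        raise ValueError("left and right views must have the same dimensions")
    return [lrow[::2] + rrow[::2] for lrow, rrow in zip(left, right)]


def unpack_side_by_side(packed):
    """Split a packed frame and stretch each half back to full width."""
    left, right = [], []
    for row in packed:
        half = len(row) // 2
        # Pixel repetition as a crude stand-in for interpolation.
        left.append([p for p in row[:half] for _ in range(2)])
        right.append([p for p in row[half:] for _ in range(2)])
    return left, right


if __name__ == "__main__":
    left_view = [[1, 2, 3, 4]]
    right_view = [[5, 6, 7, 8]]
    frame = pack_side_by_side(left_view, right_view)
    print(frame)
    print(unpack_side_by_side(frame))
```

The resolution lost in packing is what a later full-resolution stereo service would restore.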
SLIDE 56

Internet

  • External boxes with access to premium content, games, e.g. VUDU, TiVo
  • Internet-connected TVs already in the market: CEA projects 50% penetration by 2013

  • Samsung launched Explore 3D Service in US/UK in May 2011
      • Free streaming of HD content including movie trailers, music videos, TV shows
      • Pay services by end of 2011 including feature films and other premium content
  • Non-real-time (NRT) delivery of 3D content also on the horizon
SLIDE 57

Terrestrial Broadcast

  • Problems with bandwidth and legacy devices
  • MPEG-2 is still mandatory in most regions
  • HD broadcasting eats up significant bandwidth

[Chart: Alternative Bandwidth Allocations, 0% to 100%, for four scenarios: SDTV & 3DTV; SDTV, Mobile & 3DTV; HDTV & 3DTV; HDTV, Mobile & 3DTV. Legend: 3DTV, Mobile, SDTV, HDTV]

SLIDE 58

Games, Really Helping to Boost 3D …

Games create another important channel to bring 3D content and displays to the home, especially when there are young people around …

SLIDE 59

3D Content Consumption

SLIDE 60

3D Displays: a Major Driving Force …

  • 3D displays are maturing rapidly
  • High quality stereoscopic displays now with minimal added cost; lots of investment in auto-stereoscopic

  • As display bandwidth increases, 3D more attractive to consumer
  • Customer base with 3D-ready HD displays has notably increased
SLIDE 61

About 3D and 3D Displays …

  • A 3D display is any display device capable of conveying a stereoscopic perception of 3D depth to the viewer.
  • The basic requirement is to present offset images that are displayed separately to the left and right eye. Both of these 2D offset images are then combined in the brain to give the perception of 3D depth.
  • Although the term ‘3D’ is ubiquitously used, the presentation of dual 2D images is distinctly different from displaying an image in three full dimensions. The most notable difference is that the observer lacks any (or full) freedom of head movement (movement parallax) to gain more information about the 3D objects being displayed.
  • Holographic/volumetric displays do not have this limitation, so the term ‘3D display’ fits such technology more accurately.
  • Just as two stereophonic speakers cannot recreate a full 3D sound field, it is likewise an overstatement of capability to refer to dual 2D images as being ‘3D’.
SLIDE 62

3D Display Formats: a Taxonomy

  • Variety - There is a wide range of 3D display technologies available and they all have to solve the same fundamental problem: how to direct a different image to the left and right eyes.
  • Stereoscopic versus autostereoscopic - The optical technologies available to direct light in this way have resulted in many commercially available 3D display designs, falling into two broad categories: stereoscopic and autostereoscopic.
  • Increasing views - The taxonomy is arranged so that the number of simultaneous views in each display type increases from left to right, from two-view stereoscopic displays, to horizontal parallax multi-view displays, to full horizontal and vertical parallax volumetric displays.
  • Parallax resolution - The number of views needed to drive a display, or its parallax resolution, is a critical factor for display designers, content producers and users since it directly affects the whole imaging process from capture through to display.

[Figure: taxonomy of 3D display formats, ranging from a discrete number of views to light field reconstruction]

SLIDE 63

3D Display Formats: Some Main Differences …

  • Stereoscopic
      • Different images to the left and right eyes
      • Requires special glasses for viewing
  • Autostereoscopic (with discrete number of views)
      • Display enables each eye to see a different image through optics
      • No glasses required, but often sensitive to viewer positions
      • Only horizontal parallax
  • Computer-generated holography
      • Display creates a light field identical to that which would emanate from the original scene
      • Provides both horizontal and vertical parallax across a large range of viewing angles - similar to looking through a window at the scene being reproduced
  • Volumetric displays
      • Projected points of light within a volume
SLIDE 64

Stereoscopic Anaglyphic Displays

Anaglyph images are used to provide a stereoscopic 3D effect, when viewed with glasses where the two lenses are different (usually chromatically opposite) colors, such as red and cyan. Images are made up of two color layers, superimposed, but offset with respect to each other to produce a depth effect (significant problems with color representation).
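A minimal sketch of how the two color layers can be combined, assuming red/cyan glasses with the red filter over the left eye (the function name and the channel assignment are illustrative assumptions, not a prescribed method): the red channel is taken from the left view and the green and blue channels from the right view.

```python
def make_anaglyph(left, right):
    """Combine two RGB views into a single red/cyan anaglyph image.

    Each view is a list of rows, each row a list of (r, g, b) tuples.
    Assumption: the red filter covers the left eye, so red comes from the
    left view and green/blue (cyan) come from the right view.
    """
    if len(left) != len(right) or any(len(l) != len(r) for l, r in zip(left, right)):
        raise ValueError("left and right views must have the same dimensions")
    return [
        [(lp[0], rp[1], rp[2]) for lp, rp in zip(lrow, rrow)]
        for lrow, rrow in zip(left, right)
    ]


if __name__ == "__main__":
    left_view = [[(200, 10, 20)]]
    right_view = [[(30, 150, 160)]]
    print(make_anaglyph(left_view, right_view))
```

Because each eye receives only part of the color spectrum, the original colors cannot be reproduced faithfully, which is the color representation problem noted above.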

SLIDE 65

Stereoscopic Wavelength Multiplexing Displays

Specific wavelengths of red, green, and blue are used for the right eye, and different wavelengths of red, green, and blue for the left eye. Eyeglasses which filter out the very specific wavelengths allow the wearer to see a 3D image (problems with color representation). This technology eliminates the expensive silver screens required for polarized systems, which are the most common 3D display systems in theaters. However, it requires more expensive glasses than the polarized systems.

Left eye: Red 629 nm, Green 532 nm, Blue 446 nm
Right eye: Red 615 nm, Green 518 nm, Blue 432 nm

SLIDE 66

Stereoscopic Polarized Displays

  • Polarized 3D glasses create the illusion of 3D images by restricting the light that reaches each eye, exploiting the polarization of light; this is referred to as a passive system.
  • Two images are projected superimposed onto the same screen through different polarizing filters; the viewer wears low-cost eyeglasses which also contain a pair of different polarizing filters.
  • Two types of polarized systems: patterned retarder (micro-polarized) and active retarder.

SLIDE 67

Polarized Stereo Display: Patterned Retarder

[Figure: Patterned Retarder (Micro-polarizer) display viewed with Polarized 3D Glasses; half vertical resolution to each eye (courtesy of Brad Hunt)]
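The halving of vertical resolution can be pictured as row interleaving: alternate display rows carry opposite polarization, so each eye sees only every second row. The sketch below assumes even rows go to the left eye and odd rows to the right eye; the function name and that assignment are illustrative.

```python
def interleave_rows(left, right):
    """Build the panel image for a row-interleaved (patterned retarder) display.

    Each view is a list of rows. Assumption: even rows are polarized for the
    left eye and odd rows for the right eye, so each eye receives half of
    the vertical resolution.
    """
    if len(left) != len(right):
        raise ValueError("left and right views must have the same height")
    return [left[y] if y % 2 == 0 else right[y] for y in range(len(left))]


if __name__ == "__main__":
    left_view = [["L0"], ["L1"], ["L2"], ["L3"]]
    right_view = [["R0"], ["R1"], ["R2"], ["R3"]]
    print(interleave_rows(left_view, right_view))
```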

SLIDE 68

Polarized Stereo Display: Active Retarder

[Figure: Active Retarder display viewed with Polarized 3D Glasses; full resolution to each eye (courtesy of Brad Hunt)]

SLIDE 69

Stereoscopic Alternate-Frame Sequencing Displays

Temporal 3D display technique that creates a stereoscopic 3D effect by alternately displaying two different perspectives, one for each eye. When viewed through active shutter glasses that switch each eye between transparent and opaque in sync with the display at a very high frame rate (e.g., 120 Hz), the brain merges the images into an integrated stereoscopic view; widely used on PC systems to render 3D games. The frame rate has to be doubled to get an equivalent result, or the spatial resolution has to be halved if alternate field sequencing is used. In the glasses, glass containing a liquid crystal and a polarizing filter becomes dark when a voltage is synchronously applied, but is otherwise transparent.
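The timing can be sketched as a simple schedule in which the display alternates left and right frames and the shutter glasses open the matching eye. The function names and the tuple layout are assumptions for illustration; the point is only that a 120 Hz display yields 60 frames per second for each eye.

```python
def sequence_frames(left_frames, right_frames, display_hz=120):
    """Return the display schedule as (time_in_seconds, open_eye, frame) tuples.

    Left and right frames alternate, and the shutter glasses open only the
    eye named in each entry while the other eye is opaque.
    """
    period = 1.0 / display_hz
    schedule = []
    for i, (lf, rf) in enumerate(zip(left_frames, right_frames)):
        schedule.append((2 * i * period, "L", lf))
        schedule.append(((2 * i + 1) * period, "R", rf))
    return schedule


def per_eye_rate(display_hz):
    """Each eye sees every second displayed frame."""
    return display_hz / 2


if __name__ == "__main__":
    for entry in sequence_frames(["l0", "l1"], ["r0", "r1"]):
        print(entry)
    print(per_eye_rate(120), "frames per second for each eye")
```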
SLIDE 70

3D Displays: a Summary

Many types of 3D displays are available

  • Stereo vs Multiview
  • Glasses vs No Glasses
  • Varying resolutions and quality
  • Each with different data requirements
SLIDE 71

Conclusion

SLIDE 72

The Standardization Path …

[Figure: standardization timeline: JPEG, JPEG-LS, JPEG 2000, MJPEG 2000, JPEG XR, AIC ?; H.261, H.263, H.264/AVC/SVC/MVC; MPEG-1 Video, H.262/MPEG-2 Video, MPEG-4 Visual; HEVC, RVC, 3DV]

SLIDE 73

3D Related Standards Landscape

[Figure: 3D related standards grouped by Capture/Production, Distribution and Consumption, plus Video Coding Formats]

SLIDE 74

Another Big Step Forward …

  • Stereo services are arriving now
  • Mix of formats for different distribution channels
  • Blu-ray has decided on full-resolution format
  • Broadcasters have embraced frame-compatible formats
  • Expect migration from frame-compatible to full-resolution
  • Variety of “enhancement layer” solutions to consider (either based on MVC/SVC or some hybrid)

  • Should plan support for auto-stereoscopic displays
SLIDE 76

Video Coding Standards: a Summary

| Standard | Year | Main Applications | Profiles | Main Bitrates | Frame Types | Ref. Frames | Transform | Number Motion Vectors (if any) | Motion Vectors Precision | Entropy Coding | Deblocking Filter |
|---|---|---|---|---|---|---|---|---|---|---|---|
| H.261 | 1988 | Videotelephony and videoconference | No | p×64 kbit/s | - | 1 | DCT | 1 per MB | Integer pel | Huffman based | In loop |
| MPEG-1 Video | 1991 | Digital storage in CD-ROM | No | Around 1-1.2 Mbit/s | I, P, B and D | 0-2 | DCT | 1 or 2 per MB (P and B) | Half pel | Huffman based | Out of the loop |
| H.262/MPEG-2 Video | 1994 | Digital TV and DVD | Yes, most used is Main Profile | From 2 to 10 Mbit/s | I, P and B | 0-2 | DCT | 1 or 2 per MB (2 to 4 for interlaced video) | Half pel | Huffman based | Out of the loop |
| H.263 | 1995 | Videotelephony and videoconference and more | Only in extensions | From very low rates to around 1 Mbit/s | I, P and B | 0-2 | DCT | 1 or 2 per MB (4 in the optional modes) | Half pel | Huffman based | Out of the loop |
| MPEG-4 Visual | 1998 | Large range with objects | Yes, most used are Simple and Advanced Simple | Very large range using levels | I, P and B | 0-2 | DCT | 1 or 2 per MB (4 in the optional modes); also global motion vectors | 1/4 pel | Huffman based; arithmetic coding for the shape | Out of the loop |
| H.264/AVC | 2004 | Large range, from mobile to Blu-ray | Yes, most used are Baseline, Main and High | Very large range using levels | I, P, generalized B, SP and SI | Up to 16 | Integer DCT | 1 to 16 per MB (P slices) and 1 to 32 (B slices) | 1/4 pel | CAVLC and CABAC | In loop |
| SVC | 2007 | Robust delivery, graceful degradation, broadcasting | Yes | Very large range using layers | I, P and generalized B | Up to 16 | Integer DCT | 1 to 16 per MB (?) | 1/4 pel | CAVLC and CABAC | In loop |
| MVC | 2009 | Stereo TV, Free viewpoint TV | Yes | Very large range using levels | I, P, B | Up to 16 | Integer DCT | 1 to 16 per MB (?) | 1/4 pel | CAVLC and CABAC | In loop |

SLIDE 77

Many of these slides have been inspired (and even more ;-) by materials provided by several friends, with special emphasis on Anthony Vetro, MERL, USA. Thank you!