SLIDE 1

Audio and Video Communication, Fernando Pereira, 2014/2015

3D VIDEO SYSTEMS

Fernando Pereira

SLIDE 2

It’s a 3D World !

SLIDE 3

Context and Motivation

  • Strong interest in 3D services
  • Increasing production of premium content, e.g. movies and sports
  • Numerous devices supporting stereoscopic display available to the consumer, including mobile
  • Substantial investments being made to upgrade digital cinema theaters with 3D capabilities
  • Many new standards being developed, e.g. for production, distribution and digital interfaces

SLIDE 4

Stereoscopic Displays Sales Forecast

Source: DisplaySearch 3D Display Technology and Market Forecast Report

SLIDE 5

Critical Success Factors

  • A high-quality experience, not burdened by high transition costs or undermined by viewing discomfort or fatigue
  • Usability and consumer acceptance of 3D viewing technology, e.g. glasses vs no glasses
  • Availability of premium 3D content in the home
  • Availability of an appropriate data format providing interoperability through the delivery chain and taking into consideration the constraints imposed by each delivery channel

SLIDE 6

3D Perception Basics

SLIDE 7

The Human Eye

Rod and cone cells in the retina allow conscious light perception and vision, including color differentiation and the perception of depth.

The crystalline lens changes shape to focus the light onto the retina.

SLIDE 8

Human Visual System

SLIDE 9

Stereoscopic Vision

  • Accommodation, a monocular cue, refers to the variation of the crystalline lens shape and thickness (and thus its focal length), allowing the eye to keep an object in focus as its distance varies.
  • Vergence, a binocular cue, refers to the muscular rotation of the eyeballs, used to converge both eyes on the same object.
  • Under normal conditions, changing the focus of the eyes to look at an object at a different distance automatically triggers both vergence and accommodation; this is sometimes known as the accommodation-convergence reflex.
  • In reality, the viewer’s eyes accommodate (focus) and converge (point) to the depth of the object.

SLIDE 10

Retinal Disparity

  • The retinal images in the left and right eyes differ depending on the object’s distance and angle.
  • Retinal disparity is the slight difference between the two retinal images due to the angle from which each eye views an object.
  • Retinal disparity is an important depth cue.

SLIDE 11

Positive and Negative Disparities

  • Negative and positive disparities are not a natural situation, as normally both accommodation and vergence occur at the same depth.
  • Too large a disparity may cause eyestrain (especially in older viewers).

[Figure: stereoscopic display and the location of the “3D” image for negative vs positive disparity]

SLIDE 12

Depth Perception Control

  • In the human visual system, depth perception is largely controlled by two parameters: retinal disparity (controlled by vergence) and focus (controlled by accommodation).
  • Accommodation cannot be controlled in a stereo display situation, as humans focus their eyes on the screen surface even when objects are positioned in 3D in front of or behind the screen surface.
  • Current stereo display technologies have only indirect control over vergence, by presenting slightly different images to the left and right eyes (disparity).

SLIDE 13

Accommodation-Vergence Conflict

  • In natural viewing, the vergence stimulus and focal stimulus are always at the same distance and are therefore consistent with one another.
  • Stereo displays create (varying) inconsistencies between the vergence and focal distances, because the vergence distance varies depending on the image content while the focal distance remains constant (at the screen).
  • These accommodation-vergence conflicts lead to problems, notably 3D structure distortions and visual fatigue.

SLIDE 14

Depth Perception: the Comfort Zone

  • Due to the accommodation-vergence conflict, there is a limited disparity range allowing proper stereo vision and depth perception. In content production, this admissible disparity range is called the comfort zone.
  • 3D video production has to map the arbitrary depth range of the real world into this comfort zone by carefully modifying the stereo camera baseline and convergence settings.

SLIDE 15

Camera Baseline

  • The camera baseline, or base, is the distance between the two stereoscopic lenses. This distance has a profound effect on stereo content.
  • For most images, the baseline is close to the distance between the human eyes, which is around 65 mm.
  • However, it is possible to use a shorter baseline for close-up photography, or a longer baseline when shooting distant subjects such as the moon or mountains. This is critical to ‘put’ the real world in the comfort zone.
  • Especially for 3D home entertainment, newer stereoscopic displays can vary the baseline between the views to adapt to different viewing distances.

SLIDE 16

Screen Disparity …

Screen disparity = the distance between Pleft and Pright on the display screen.

[Figure: left and right eyes viewing the display screen, with one object at positive disparity and one at negative disparity]

SLIDE 17

Disparity, Depth & Perceived Depth

  • Disparity depends on:
    • f: focal length of the cameras
    • b: distance between the cameras (baseline)
    • Z: depth of the object

  d = f · b / Z

  • Perceived depth depends on:
    • be: eye distance
    • D: viewer distance to the display
    • d’: screen parallax (a function of disparity)

  Z’ = be · D / (be − d’)

SLIDE 18

Disparity, Depth & Perceived Depth

The left and right views are shot with some disparity ! The left and right views are consumed with some depth depending on the consumption conditions !
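These relations can be sketched numerically. A minimal sketch, assuming the standard stereo-geometry formulas d = f·b/Z and Z′ = be·D/(be − d′); all numeric values are illustrative only:

```python
# Sketch of the disparity and perceived-depth relations. Symbol names follow
# the slides; the numbers below are illustrative only.

def disparity(f, b, Z):
    """Disparity d = f * b / Z for focal length f, camera baseline b, object depth Z."""
    return f * b / Z

def perceived_depth(b_e, D, d_prime):
    """Perceived depth Z' = b_e * D / (b_e - d') for eye distance b_e,
    viewer-to-display distance D and screen parallax d' (same length unit)."""
    return b_e * D / (b_e - d_prime)

b_e, D = 0.065, 2.0   # 65 mm eye distance, viewer 2 m from the screen
print(perceived_depth(b_e, D, 0.00))   # zero parallax: perceived on the screen (Z' = D)
print(perceived_depth(b_e, D, 0.01))   # positive parallax: behind the screen (Z' > D)
print(perceived_depth(b_e, D, -0.01))  # negative parallax: in front of the screen (Z' < D)
```

Zero parallax places the object exactly on the screen plane, which matches the intuition that identical left/right images carry no stereo depth.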

SLIDE 19

Influence of Viewing Distance and Age

  • Perceived depth is directly proportional to viewer distance.
  • A kid’s eye distance/baseline is ~84% of an adult’s.
  • Therefore, for kids, closer objects look closer and far objects look farther.
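The effect of eye distance can be checked with the perceived-depth relation Z′ = be·D/(be − d′) from the earlier slides; a small sketch, with illustrative numbers and the kid’s eye distance taken as 84% of the adult’s per the slide:

```python
# Effect of a smaller eye distance on perceived depth,
# using Z' = b_e * D / (b_e - d'). Numbers are illustrative.

def perceived_depth(b_e, D, d_prime):
    return b_e * D / (b_e - d_prime)

D = 2.0                           # viewer distance to the display (m)
adult, kid = 0.065, 0.065 * 0.84  # eye distances (m)

# Positive parallax (object behind the screen): looks even farther to a kid.
print(perceived_depth(adult, D, 0.02), perceived_depth(kid, D, 0.02))
# Negative parallax (object in front of the screen): looks even closer to a kid.
print(perceived_depth(adult, D, -0.02), perceived_depth(kid, D, -0.02))
```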

SLIDE 20

Depth Cues: Monocular and Binocular

  • Most of the depth cues used by humans to perceive the world’s 3D structure are available in 2D projections; this is why images make sense on a TV/cinema screen.
  • The depth cues can be classified into oculomotor cues, coming from the eye muscles, and visual cues, coming from the scene content itself. They can also be classified into monocular and binocular cues.
  • Monocular cues for 3D perception include:
    • Occlusion - one object partially covering another
    • Perspective - point of view
    • Familiar size - we know the real-world sizes of many objects
    • Atmospheric haze - objects further away look more washed out
    • Selective focus or accommodation of the eyeball (eyeball focus) - the eye changes optical power to maintain a clear image (focus) on an object as its distance changes

SLIDE 21

Main Monocular Depth Cues

SLIDE 22

Main Binocular Depth Cues

Some main cues are missing from 2D media:

  • Stereo parallax - seeing a different image with each eye, and thus different aspects of the same object
  • Motion parallax - seeing different perspective images when we move our heads; nearby objects appear to move faster across the view
  • Vergence - the muscular rotation of the eyeballs, used to converge both eyes on the same object

SLIDE 23

Range of Effectiveness of Depth Cues

  • Not all cues have the same importance in the visual system, and their relative importance depends on the viewing distance, among other factors.
  • Some depth cues are independent of distance, such as occlusion or relative size, whereas others are distance-dependent, such as disparity or vergence.

SLIDE 24

3D Systems

SLIDE 25

3D Video Experiences …

  • Depth perception in stereoscopic displays – Effect provided through stereo video pairs, targeting the left and right eyes, allowing the perception of depth using stereo parallax
  • Depth perception in auto-stereoscopic displays – Effect provided through n video views, targeting the left and right eyes in multiple positions, allowing the perception of depth using stereo and motion parallaxes
  • Navigation – Effect provided through n video views, allowing the viewer to navigate the 3D scene by changing the viewpoint and view direction within certain ranges; the viewer may experience a look-around effect as well as depth perception

SLIDE 26

Early Stereoscopy

Stereoscopy regards the capability of recreating 3D visual information, or creating the illusion of depth in an image, based on two appropriate views. These two slightly different images are presented one to each eye. The two 2D offset images are then combined in the brain to give the perception of 3D depth.

The motion parallax cue is not satisfied with stereoscopy and, therefore, the illusion of depth is incomplete.

SLIDE 27

Free Viewpoint Systems

Free viewpoint systems require the acquisition of multiple scene views taken from different angles, allowing the user to navigate around the scene.

SLIDE 28

3D Video Applications …

The complete 3D video system is relevant for multiple applications such as broadcast TV, teleconferencing, surveillance, interactive video, cinema, gaming and other immersive video applications.

[Diagram: 3D home master → video compression and 3D format encoding → 3D home package → distribution channels (Blu-ray Disc, DVD, cable TV, satellite TV, terrestrial TV, IPTV, Internet) → media players & set-top boxes → video decompression and 3D format decoding → 3D TV (left eye / right eye)]

SLIDE 29

Main 3D Video Application Areas

3D cinema

  • Technology: stereoscopic 3D, glasses-based
  • Good stereo 3D viewing
  • Decent number of 3D productions

3D mobile

  • Technology: auto-stereoscopic 2-view display with a fixed viewing position
  • Good 3D viewing despite the small display sizes

3D home entertainment (3DTV)

  • Technology: different types of displays available: stereoscopic, and auto-stereoscopic with 2 … N views
  • Various technologies, input formats and display sizes
  • Glasses-based systems may not be acceptable

SLIDE 30

3D Video Content Chain …

  • The 3D content chain includes a sequence of modules that closely mirrors a conventional 2D system, but each module is quite different: all the available 2D solutions have to evolve towards 3D.
  • 3D content creation involves special production “rules”, e.g. avoiding fast pans and managing depth transitions.
  • Content representation, distribution and display may be performed with many different formats; the best choice depends on distribution constraints, display capabilities, available equipment, target quality, etc.
  • New 3D display technology is an important driving force: no glasses, multi-person displays, higher display resolutions, and avoiding uneasy feelings (headaches, nausea, eye strain, etc.).

Content acquisition and creation → Content representation → Content distribution → Content consumption

SLIDE 31

3D Video Content Acquisition and Creation

SLIDE 32

History of 3D Video …

SLIDE 33

3D Content is Exploding … Again …

  • 165 3D movies released since 1953
  • Almost 30 3D movies in 1953 alone
  • Many more to come …

SLIDE 34

3D Momentum …

  • Hollywood is now able to offer unique, high-quality immersive 3D experiences in theaters
  • Revenue per 3D screen is typically three times higher than for traditional 2D screens
  • Increased momentum in 3D production and a growing consumer appetite for 3D content

Avatar cost was around $500 million !!! Box office in Jan 2011 was $2,781,835,502 … Naturally, the sequel is coming !

SLIDE 35

3D Content Acquisition Modes

3D content production methods can be classified into three categories, namely:

  • Direct acquisition by stereo or multiview cameras - Precise calibration and temporal synchronization of the cameras is very important for capturing high-quality multiview video.
  • Active depth sensing - Comprises time-of-flight (ToF) sensors and methods based on structured light, such as Microsoft's Kinect. ToF sensors estimate the depth, i.e. the distance between the sensor and an object, by extracting phase information from received light pulses. The structured-light approach usually recovers 3D shape from monocular images, using a projector to illuminate objects with special patterns.
  • 2D-to-3D conversion – Existing 2D content can be converted to 3D video by considering several depth cues such as motion parallax, vanishing points/lines, or camera motion in a structure-from-motion framework.

SLIDE 36

Stereo Cameras …

  • A stereo camera is a type of camera with two lenses and a separate image sensor or film frame for each lens. This allows it to simulate human binocular vision and gives it the ability to capture 3D images, a process known as stereo photography. Stereo cameras may be used for making stereoviews and 3D pictures for movies.
  • The distance between the lenses in a typical stereo camera (the intra-axial distance) is about the distance between one's eyes (known as the intra-ocular distance); this is about 6.35 cm, although a longer baseline (greater inter-camera distance) produces more extreme 3-dimensionality.

SLIDE 37

3D Video Content Representation

SLIDE 38

3D Video Formats/Standards …

  • Many formats are closely coupled to specific 3D display types and application scenarios.
  • A universal, flexible, generic, scalable, backward-compatible 3D video format/standard would be highly desirable to support any 3D video application in an efficient way, while decoupling content creation from display and application.
  • Some experts expected 3D television to follow much the same trajectory as HDTV did earlier this decade: a slow start, then a rapid ascent in sales once enough content exists to attract mainstream buyers. But this is clearly not happening!

SLIDE 39

Stereo and Multiview Video Data

  • Redundancy reduction between camera views
  • Need to cope with color/illumination mismatch problems
  • Alignment may not always be perfect either

SLIDE 40

Main Multiview Video Format Requirements

  • HIGH COMPRESSION EFFICIENCY - Significant compression gains compared to the independent compression of each view, so-called simulcasting.
  • VIEW-SWITCHING RANDOM ACCESS - Any image can be accessed, decoded and displayed by starting the decoder at a random access point and decoding a relatively small quantity of data on which that image may depend.
  • SCALABILITY – A decoder is able to generate effective video output while accessing only a portion of the bitstream, although reduced in quality to a degree commensurate with the quantity of data in the subset used for the decoding process.
  • VIEW SCALABILITY – Only a portion of the bitstream has to be accessed to output a limited number (subset) of the set of encoded views.
  • BACKWARD COMPATIBILITY - A subset of the bitstream corresponding to one ‘base view’ is decodable by an ordinary H.264/AVC decoder.
  • QUALITY CONSISTENCY AMONG VIEWS - It should be possible to control the encoding quality of the various views.

SLIDE 41

3D Video Formats: the Menu …

Texture

  • Multiview Simulcasting
  • Frame Compatible Stereo
  • Conventional Stereo Video
  • Multiview Video, MVC and MV-HEVC standards

Texture + Depth

  • 2D (Texture)+Depth, MPEG-C standard
  • Multiview+Depth (MVD), 3D-HEVC standard

SLIDE 42

Redundancies in 3D Video

SLIDE 43

The Texture Approach

SLIDE 44

Multiview Simulcasting

  • Multiview simulcasting refers to the independent encoding of each view (ignoring that the views are like ‘brothers’ due to the inter-view redundancy).
  • It may use any coding technology, e.g. MPEG-2 Video, but an advanced codec such as H.264/AVC is more likely.
  • This solution has been largely used in many countries, e.g. to broadcast the 2010 World Cup games.

SLIDE 45

Frame Compatible Stereo Formats

  • Frame-compatible formats refer to a class of formats in which the stereo signal is essentially a multiplex of the two views into a single frame, or sequence of frames, to be coded with 2D video coding solutions. They are also called stereo interleaving or spatial/temporal multiplexing formats.
  • The signaling for a complete set of frame-compatible formats has been standardized in H.264/AVC as supplemental enhancement information (SEI) messages.
  • Embraced by broadcasters for the initial phase of 3D video services.

SLIDE 46

Frame Compatible Stereo Formats Examples

  • Basic concept: pack pixels from the left and right views into a single frame to be coded ‘as usual’:
    • Spatial multiplexing: side-by-side, top-bottom, checkerboard formats
    • Time multiplexing: views interleaved as alternating frames or fields
  • In such a format, half of the coded samples represent the left view and the other half represent the right view; thus, each coded view has half the resolution of the full coded frame.

[Figure: left/right views alternating over time, and left/right views packed side-by-side within one frame]

SLIDE 47

Frame Compatible Stereo: Spatial Multiplexing

Spatial multiplexing variants: side-by-side, checkerboard, line interleave, column interleave, top & bottom.

Reduced picture resolution!
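The packing idea can be sketched in a few lines. A minimal sketch with NumPy: real encoders low-pass filter before decimating, whereas this toy version simply drops alternate columns or rows; the frame sizes are illustrative.

```python
# Frame-compatible spatial multiplexing: two views packed into one 2D frame.
import numpy as np

def side_by_side(left, right):
    """Halve each view horizontally, then pack left|right into one frame."""
    return np.hstack([left[:, ::2], right[:, ::2]])

def top_bottom(left, right):
    """Halve each view vertically, then stack left over right."""
    return np.vstack([left[::2, :], right[::2, :]])

left = np.zeros((1080, 1920), dtype=np.uint8)        # left view (luma plane only)
right = np.full((1080, 1920), 255, dtype=np.uint8)   # right view

packed = side_by_side(left, right)
print(packed.shape)   # (1080, 1920): full frame size, half horizontal resolution per view
```

The packed frame has the original 2D dimensions, which is exactly why a legacy 2D codec can compress it unchanged, at the cost of half the resolution per view.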

SLIDE 48

Frame Compatible Stereo: Temporal Multiplexing

[Figure: a 2D frame sequence vs a 3D frame-sequential sequence alternating right-eye (RE) and left-eye (LE) frames over time]

Provides full resolution quality but requires increased bandwidth and storage!

SLIDE 49

Conventional Stereo Format

  • Conventional stereo refers to the case where two full-resolution stereo views are coded exploiting their inter-view redundancy, with combined temporal and inter-view prediction.
  • MPEG-2 Video, MPEG-4 Visual and the MVC standards offer full stereo coding solutions with increased compression efficiency.

SLIDE 50

Multiview Video Coding Format

Multiview video (MVV) refers to a set of N temporally synchronized video streams coming from cameras capturing the same real scenery from different viewpoints.

  • Provides the ability to change viewpoint freely with multiple views available
  • Renders one view (real or virtual) to legacy 2D displays
  • The most important case is stereo video (N = 2), generating a depth impression with each view derived for projection into one eye

SLIDE 51

MPEG-2 Multiview Profile

  • The MPEG-2 design leveraged temporal scalability for coding the second view; only two views are supported.
  • The reference picture could be either a picture from the base view or from within the enhancement view.
  • Main benefits
    • Uses existing block-level coding tools and syntax
    • Enables inter-view prediction for the first enhancement-view picture in each randomly accessible encoded video segment
  • Drawback
    • Prediction in the reverse-temporal direction is not enabled for the enhancement view, which minimizes the memory storage but reduces compression efficiency

SLIDE 52

Multiview Video Coding (MVC) Standard

  • MVC is an H.264/AVC extension without any changes to the slice-layer syntax and below, or to the decoding process.
  • Provides coding of multiple views, from stereo to multiview.
  • Exploits the redundancy between views, using inter-camera prediction to reduce the required bitrate.
  • It is mandatory for the multiview stream to include a base view, which is coded independently from the other, non-base views.
  • For similar PSNR, the MVC coding gains are:
    • For stereo video, the rate of the dependent view is reduced by around 30%
    • For multiview, rate savings over all views are about 25%

SLIDE 53

Interview Prediction: Basics

Many prediction structures are possible to exploit inter-view redundancy, trading off memory, delay, computation and coding efficiency differently.

  • Pictures in the non-base views are predicted not only from temporal references (in the same view), but also from inter-view references (in the other views).
  • Limitations: i) inter-view prediction only from the same time instance; ii) cannot exceed the maximum number of stored reference pictures.
  • The prediction is adaptive, so the best predictor among temporal and inter-view references can be selected on a block basis in terms of RD cost.

SLIDE 54

MVC Prediction Structures

  • View-progressive encoding – View dependencies are exploited only for the first frame of each GOP
  • Fully hierarchical encoding – Bidirectional predictions are allowed in both the time and view dimensions

SLIDE 55

Disparity-Compensated Prediction

  • Uses previously decoded pictures in neighboring views as additional reference pictures
  • Only the construction of the reference picture lists is modified relative to H.264/AVC

SLIDE 56

MVC: Technical Solution

The core macroblock-level and lower-level decoding modules of an MVC decoder are the same regardless of whether a reference picture is a temporal or an inter-view reference. This distinction is managed at a higher level of the decoding process.

  • Key elements of the MVC design
    • Does not require any changes to the lower-level syntax, so it is very compatible with single-layer H.264/AVC hardware
    • A base layer is required and easily extracted from the video bitstream (identified by NAL unit type)
    • Several additions to the high-level syntax, primarily signaled through a multiview extension of the sequence parameter set (SPS) defined by H.264/AVC
    • Three important pieces of information are carried in the SPS extension: i) view identification; ii) view dependency information; and iii) level index for operation points
  • Inter-view prediction
    • Enabled through flexible reference picture management, allowing decoded pictures from other views to be inserted into and removed from the reference picture buffer
    • Core decoding modules do not need to be aware of whether a reference picture is a temporal reference or a multiview reference

SLIDE 57

MVC: Profiles and Levels

There are two MVC profiles with support for more than one view, both based on the H.264/AVC High profile:

  • The Multiview High profile supports multiple views and does not support interlaced coding tools.
  • The Stereo High profile is limited to two views, but does support interlaced coding tools.

Levels impose constraints on MVC bitstreams to establish bounds on the necessary decoder resources and complexity. The level constraints include limits on the amount of frame memory required for decoding a bitstream, the maximum throughput in terms of macroblocks per second, the maximum picture size, the overall bitrate, etc.

SLIDE 58

MVC Compression Performance

Simulcasting versus MVC comparison: with 8 views (640×480 resolution), and considering the rate for all views, MVC yields ~25% bit rate savings over all views.

[Rate-distortion plots for the Ballroom and Race1 sequences: PSNR (dB) versus bitrate (kbit/s), Simulcast vs MVC]

SLIDE 59

MVC: Subjective Stereo Performance

  • MVC achieves perceptual quality comparable to simulcasting with as little as 25% of the rate for the dependent view (a 75% gain); for main-view rates lower than 12 Mbit/s, this dependent-view rate may have to be higher.
  • For similar PSNR, the gains are only about 30% for the dependent view.
  • This experiment shows that the two views don’t need to have the same PSNR quality.

[Chart: mean opinion scores for the original, simulcast (AVC+AVC), and MVC with the dependent view at 50%, 35%, 25%, 20%, 15%, 10% and 5% of the base view rate]

Base view fixed at 12 Mbit/s; dependent view at a varying percentage of the base view rate.
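The overall stereo saving implied by these numbers is easy to check; a back-of-the-envelope sketch, assuming the dependent view is coded at 25% of the 12 Mbit/s base-view rate:

```python
# Total stereo rate: simulcast vs MVC, with the dependent view at 25%
# of the base-view rate (numbers taken from the experiment above).
base = 12.0                    # Mbit/s for the base view

simulcast = base + base        # both views coded independently
mvc = base + 0.25 * base       # MVC: dependent view at 25% of the base rate

print(simulcast, mvc)                 # 24.0 15.0
print(f"{1 - mvc / simulcast:.1%}")   # overall stereo bitrate saving: 37.5%
```

So a 75% gain on the dependent view translates into roughly a 37.5% saving on the total stereo rate.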

SLIDE 60

MVC Limitations

  • Acquisition and production of video with large camera arrays is hard and uncommon
  • MVC is more efficient than simulcast, but the rate is still roughly proportional to the number of views
    • Varies with scene, camera arrangements, etc.

SLIDE 61

The Texture+Depth Approach

SLIDE 62

3D Video Coding Challenges

  • Consider the capturing technology, i.e. at most 2-3 recorded views
  • Provide scene geometry data in a general form, e.g. pixel-wise depth data (sender-side depth provision gives producers more control over depth perception)
  • Provide constant bit rates and a generic transmission format independent of the 3D display (with a display-specific number of views)
  • Consider the statistical properties of depth (and supplementary) data
  • Consider intermediate views for coding optimization
  • Provide high-quality view synthesis for a continuous viewing range

SLIDE 63

Sensing More with Depth …

  • A depth map is a ‘gray image’ containing information about the distance from the scene objects to the camera.
  • Depth maps may be obtained by:
    • Special range cameras
    • Extraction from texture
    • Being inherent to the content, e.g. computer-generated imagery
  • Depth maps provide important information about the scene geometry.
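The ‘gray image’ is usually an inverse-depth quantization of the scene geometry. A minimal sketch, assuming the common convention that gray value 255 maps to the nearest plane z_near and 0 to the farthest plane z_far; the plane distances are illustrative:

```python
# Inverse-depth quantization behind an 8-bit depth map: gray values are
# spaced uniformly in 1/Z between the near and far clipping planes.

def depth_from_gray(v, z_near, z_far):
    """Map an 8-bit depth-map value (0..255) back to metric depth Z."""
    inv_z = (v / 255.0) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    return 1.0 / inv_z

z_near, z_far = 1.0, 100.0
print(depth_from_gray(255, z_near, z_far))  # nearest plane
print(depth_from_gray(0, z_near, z_far))    # farthest plane
print(depth_from_gray(128, z_near, z_far))  # mid-gray lies much nearer than the midpoint
```

Quantizing in 1/Z gives nearby objects, where disparity errors matter most, far more gray levels than the distant background.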

SLIDE 64

Representing Depth …

SLIDE 65

Depth Maps Properties

  • Sharp edges at object borders
  • Large areas of gradual variation in object areas
  • Edges in depth maps are correlated with edges in video pictures

SLIDE 66

Texture and Depth ...

Depth-enhanced formats are suitable for generic 3D video solutions, where only one format is coded and transmitted while all necessary views for any 3D display are generated from the decoded data, e.g., by means of depth image based rendering (DIBR).

SLIDE 67

2D+Depth Format

  • Includes a 2D view and the corresponding depth
  • Depth enables neighboring view generation/synthesis
  • Standardized as ISO/IEC 23002-3 “MPEG-C Part 3”
  • Advantages
    • 2D video is backward compatible with legacy devices
    • Agnostic of the coding format; can utilize MPEG-2, H.264/AVC
    • The additional bandwidth to code depth can be minimal
    • Supports both stereo and multiview displays
  • Drawbacks
    • Stereo signal not directly accessible and error-prone (view generation needed)
    • No provisions to handle occlusions
    • Limited depth range rendering capability

SLIDE 68

Multiview Video plus Depth (MVD)

  • The MVD format encodes both the texture and the depth data for the same number of views.
  • Coding texture and depth simultaneously is a direction currently explored in MPEG as part of the 3D Video coding activity.
  • MVD is the reference format for MPEG 3D Video, with the texture and depth views independently encoded with MVC.

SLIDE 69

Depth Coding vs Texture Coding

  • Depth has unique signal properties relative to natural images:
    • Large homogeneous areas inside scene objects
    • Sharp transitions along object boundaries
  • Depth maps are not reconstructed for display, but rather for view synthesis of the video data (we never see depth maps!)
    • Depth represents a shift value (disparity) for the colour samples from the original views
  • Coding errors in depth maps result in wrong pixel shifts in the synthesized views
    • Errors are especially visible (in the synthesized views) around depth discontinuities at the borders of objects with different scene depth
  • The depth compression algorithm needs to preserve depth edges much better than current texture coding methods such as H.264/AVC and MVC
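Since depth is just a shift value, a one-line conversion shows how depth coding errors become pixel shifts. A sketch for rectified cameras (pinhole model; the focal length and baseline values below are illustrative assumptions):

```python
def disparity_px(depth_m, focal_px, baseline_m):
    """Horizontal disparity (in pixels) between two rectified views for a
    point at depth Z metres: d = f * B / Z (parallel cameras, same focal
    length; parameter values used here are illustrative)."""
    return focal_px * baseline_m / depth_m

# A point at 2.0 m with f = 1000 px and B = 0.1 m shifts by 50 px;
# a coding error that moves the reconstructed depth to 2.2 m shifts it
# to ~45.5 px, i.e. a clearly visible displacement in the synthesized view.
```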

SLIDE 70

Combining Coding with Synthesis

  • As the transmission rate is limited, only a small number of texture and depth views may be coded.
  • However, an arbitrarily large number of views may need to be rendered.
  • Using depth-image-based rendering (DIBR) techniques, a continuum of views may be synthesized based on the limited set of decoded views.

[Diagram: encoding side - limited camera inputs coded into a rate-constrained data format (based on distribution); decoding and synthesis side - an arbitrarily large number of output views generated for stereoscopic displays (left/right, variable stereo baseline, adjustable depth perception) and auto-stereoscopic N-view displays (wide viewing angle, large number of output views)]

SLIDE 71

Depth-Image-Based Rendering (DIBR)

  • In the general case, 3D warping is done using projective matrices and depth information.
  • When the cameras are rectified, 3D warping amounts to a simple 1D shift.
  • Views may be either extrapolated or interpolated.
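For rectified cameras, the 1D-shift warping can be sketched per scanline. A minimal illustration with integer disparities and no hole filling; real DIBR renderers use sub-pixel shifts, z-buffering and inpainting:

```python
def warp_1d(texture_row, disparity_row, direction=1):
    """Forward-warp one scanline of a rectified view by per-pixel integer
    disparities (toy DIBR sketch). Pixels are processed far-to-near so
    that nearer pixels (larger disparity) overwrite farther ones; target
    positions left unwritten are disocclusion holes, marked None."""
    width = len(texture_row)
    out = [None] * width
    for x in sorted(range(width), key=lambda x: disparity_row[x]):
        tx = x + direction * disparity_row[x]
        if 0 <= tx < width:
            out[tx] = texture_row[x]
    return out
```

Warping [10, 20, 30, 40] with disparities [0, 0, 1, 0] yields [10, 20, None, 30]: the nearer pixel 30 occludes 40 at its new position, and its old position becomes a hole to be filled.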

SLIDE 72

Trading-off Bitrate with 3D Rendering Capability

[Chart: 3D rendering capability vs bit rate for 2D, 2D+Depth, MVC and simulcast formats; a few 2D+Depth views offer more rendering capability at lower rates]

3DV coding should be compatible with:

  • existing standards
  • mono and stereo devices
  • existing or planned infrastructure

More for less !

SLIDE 73

MPEG 3D Video Framework

[Diagram: limited video inputs (e.g., 2 or 3 views) → depth estimation → video/depth codec (binary representation & reconstruction process) → view synthesis → larger number of output views]

SLIDE 74

HEVC 3D Related Extensions

  • MV-HEVC - Simple stereo/multiview extension, potentially including encoding of depth maps as an additional colour plane (early 2014)
  • 3D-HEVC - More efficient video-plus-depth coding (2014/15+)
    • Scalable stereo/multiview
    • Combined coding of video and depth
    • Closer integration with view synthesis to save data rate by irrelevance criteria, particularly for larger view ranges, which are costly in terms of data rate

SLIDE 75

MV-HEVC Approach

SLIDE 76

3D-HEVC Coding Framework

  • Source: K. Müller, Fraunhofer HHI
SLIDE 77

3D-HEVC Approach

3D-HEVC follows a format-scalable approach: sub-bitstreams representing a subset of the video views (with or without the associated depth data) can be extracted by discarding NAL units from the 3D bitstream and independently decoded.

  • Source: K. Müller, Fraunhofer HHI
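The sub-bitstream extraction idea can be illustrated with a toy filter over hypothetical (layer_id, payload) pairs; a real extractor parses the layer identifier from each NAL unit header instead:

```python
def extract_sub_bitstream(nal_units, max_layer_id):
    """Toy sketch of format-scalable sub-bitstream extraction: keep only
    the NAL units whose layer id does not exceed the target operation
    point. NAL units are modelled here as hypothetical
    (layer_id, payload) pairs - an assumption for illustration only."""
    return [nal for nal in nal_units if nal[0] <= max_layer_id]
```

For example, keeping layer ids up to the texture layers yields a video-only sub-bitstream that a stereo decoder can handle, while the full stream also carries the depth layers.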
SLIDE 78

Coding of Texture Views

Coding of the independent (base) view:

  • Unmodified HEVC

Coding of dependent views: inter-view correlations are exploited by prediction-based coding tools:

  • Disparity-compensated prediction
  • View synthesis prediction
  • Depth-based block partitioning
  • Inter-view prediction of motion parameters
  • Inter-view prediction of residual data
SLIDE 79

Coding of Depth Maps

Coding of depth or disparity maps: inter-view and, additionally, inter-component correlations are exploited by prediction-based coding tools:

  • Disparity-compensated prediction for dependent views
  • Depth modelling modes
  • Segment-wise DC prediction
  • Motion parameter inheritance
  • Quadtree prediction
  • Synthesized view distortion optimization

[Diagram: inter-view correlation between views and inter-component correlation between texture and depth]

SLIDE 80

Depth-based View Synthesis

  • The computation of the synthesized view distortion change (SVDC) requires including rendering in the encoding process; since complexity is a critical factor, a simplified rendering method is used.
  • After decoding the 3D video content, a decoder-side synthesis algorithm generates the required number of dense views for a particular multiview display.
  • Since the proposed 3D video codec produces a view- and component-scalable bitstream, two main synthesis approaches can be applied:
    • View synthesis from a video-only decoded bitstream - operates only on the decoded video data (depth may be generated from disparities)
    • View synthesis from a full MVD decoded bitstream - based on classical depth-image-based rendering (DIBR) solutions

SLIDE 81

Synthesized Views Quality Assessment

How to measure the quality of the ‘synthetic’ views, for which no ‘real’ references exist? A common solution is to compute a PSNR comparing the views synthesized from the decoded data with the views synthesized from the original, uncoded video and depth data. Subjective testing is also widely used by MPEG …
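The decoded-vs-uncoded synthesized-view PSNR described above can be sketched as follows (the peak value of 255 assumes 8-bit samples; views are flattened to 1D sample lists for simplicity):

```python
import math

def psnr(ref, test, peak=255.0):
    """PSNR between a view synthesized from decoded data and the same
    view synthesized from the original (uncoded) texture and depth,
    which serves as the reference since no 'real' camera view exists."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    if mse == 0.0:
        return float('inf')
    return 10.0 * math.log10(peak * peak / mse)
```

Note that this measures fidelity to the ideal synthesis, not to a captured view, which is why MPEG complements it with subjective testing.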

SLIDE 82

Conclusion

SLIDE 83

The Right Balance: Science or Art ?

For given available resources, e.g. in terms of bandwidth or memory, it is critical to find the right balance between

  • Number of views
  • Spatial resolution
  • Temporal resolution
  • Dynamic range
  • Colour subsampling
  • ...

to provide the best visual user experience … But this is expected to be content and display dependent …

SLIDE 84

The Standardization Path …

[Timeline: JPEG, JPEG-LS, JPEG 2000, MJPEG 2000, JPEG XR, AIC?; H.261, H.263, H.264/AVC/SVC/MVC; MPEG-1 Video, H.262/MPEG-2 Video, MPEG-4 Visual; HEVC, RVC, MV-HEVC, 3D-HEVC, SHVC]

SLIDE 85

SLIDE 86

3D Video: Success or Not so Much ?

SLIDE 87

Video Coding Standards: a Summary

  • H.261 (1988) - videotelephony and videoconference; profiles: no; bitrates: p×64 kbit/s; frame types: I, P; ref. frames: 1; transform: DCT; motion vectors: 1 per MB, integer pel; entropy coding: Huffman based; deblocking filter: in loop
  • MPEG-1 Video (1991) - digital storage in CD-ROM; profiles: no; bitrates: around 1-1.2 Mbit/s; frame types: I, P, B and D; ref. frames: 0-2; transform: DCT; motion vectors: 1 or 2 per MB (P and B), half pel; entropy coding: Huffman based; deblocking filter: out of the loop
  • H.262/MPEG-2 Video (1994) - digital TV and DVD; profiles: yes, most used is Main; bitrates: from 2 to 10 Mbit/s; frame types: I, P and B; ref. frames: 0-2; transform: DCT; motion vectors: 1 or 2 per MB (2 to 4 for interlaced video), half pel; entropy coding: Huffman based; deblocking filter: out of the loop
  • H.263 (1995) - videotelephony and videoconference and more; profiles: only in extensions; bitrates: from very low rates to around 1 Mbit/s; frame types: I, P and B; ref. frames: 0-2; transform: DCT; motion vectors: 1 or 2 per MB (4 in the optional modes), half pel; entropy coding: Huffman based; deblocking filter: out of the loop
  • MPEG-4 Visual (1998) - large range, with objects; profiles: yes, most used are Simple and Advanced Simple; bitrates: very large range using levels; frame types: I, P and B; ref. frames: 0-2; transform: DCT; motion vectors: 1 or 2 per MB (4 in the optional modes), plus global motion vectors, 1/4 pel; entropy coding: Huffman based, arithmetic coding for the shape; deblocking filter: out of the loop
  • H.264/AVC (2004) - large range, from mobile to Blu-ray; profiles: yes, most used are Baseline, Main and High; bitrates: very large range using levels; frame types: I, P, generalized B, SP and SI; ref. frames: up to 16; transform: integer DCT; motion vectors: 1 to 16 per MB (P slices) and 1 to 32 (B slices), 1/4 pel; entropy coding: CAVLC and CABAC; deblocking filter: in loop
  • SVC (2007) - robust delivery, graceful degradation, broadcasting; profiles: yes; bitrates: very large range using layers; frame types: I, P and generalized B; ref. frames: up to 16; transform: integer DCT; motion vectors: 1 to 16 per MB, 1/4 pel; entropy coding: CAVLC and CABAC; deblocking filter: in loop
  • MVC (2009) - stereo TV, free viewpoint TV; profiles: yes; bitrates: very large range using levels; frame types: I, P and B; ref. frames: up to 16; transform: integer DCT; motion vectors: 1 to 16 per MB, 1/4 pel; entropy coding: CAVLC and CABAC; deblocking filter: in loop

SLIDE 88

Bibliography

  • A. Smolic et al., “Coding Algorithms for 3DTV—A Survey”, IEEE Transactions on Circuits and Systems for Video Technology, November 2007
  • A. Vetro, T. Wiegand, G. Sullivan, “Overview of the Stereo and Multiview Video Coding Extensions of the H.264/AVC Standard”, Proceedings of the IEEE, April 2011
  • A. Vetro, A. Tourapis, K. Müller, T. Chen, “3D-TV Content Storage and Transmission”, IEEE Transactions on Broadcasting, June 2011
  • N. S. Holliman, N. A. Dodgson, G. E. Favalora, L. Pockett, “Three-Dimensional Displays: A Review and Applications Analysis”, IEEE Transactions on Broadcasting, June 2011
  • K. Müller et al., “3D High-Efficiency Video Coding for Multi-View Video and Depth Data”, IEEE Transactions on Image Processing, September 2013
  • S. Winkler, D. Min, “Stereo/Multiview Picture Quality: Overview and Recent Advances”, Signal Processing: Image Communication, 2013