Audio and Video Communication, Fernando Pereira, 2014/2015
3D VIDEO SYSTEMS
Fernando Pereira
It’s a 3D World !
Context and Motivation
and sports
available to the consumer including mobile
cinema theaters with 3D capabilities
distribution, digital interfaces
Stereoscopic Displays Sales Forecast
Source: DisplaySearch 3D Display Technology and Market Forecast Report
Critical Success Factors
transition costs or turned off by viewing discomfort or fatigue
technology, e.g., glasses vs no glasses
interoperability through the delivery chain and taking into consideration the constraints imposed by each delivery channel
The Human Eye
Rod and cone cells in the retina allow conscious light perception and vision including color differentiation and the perception of depth.
The crystalline lens changes shape to focus the light on the retina.
Human Visual System
Stereoscopic Vision
Accommodation: the variation of the crystalline lens shape and thickness (and thus its focal length) that allows the eye to focus on an object as its distance varies, maintaining a clear image.
Vergence: the muscular rotation of the eyeballs, used to converge both eyes on the same object.
distance will automatically cause vergence and accommodation, sometimes known as the accommodation-convergence reflex.
(focus) and converge (point) to the depth of the object.
Retinal Disparity
distance and angle.
angle from which each eye views an object.
Positive and Negative Disparities
accommodation and vergence occur at the same depth.
[Figure: stereoscopic display and location of the "3D" image for negative disparity and positive disparity]
Depth Perception Control
parameters: retinal disparity (controlled by vergence) and focus (controlled by accommodation).
focus the eyes on the screen surface, even if objects are positioned in 3D in front of or behind the screen surface.
by presenting slightly different images to the left and right eyes (disparity).
Accommodation-Vergence Conflict
distance and, therefore, are consistent with one another.
distances because the vergence distance varies depending on the image content while the focal distance remains constant (at the screen).
distortions and visual fatigue.
Depth Perception: the Comfort Zone
allowing proper stereo vision and depth perception. In content production, the admissible disparity range is called comfort zone.
this comfort zone by carefully modifying the stereo camera baseline and convergence settings.
Camera Baseline
eyes, which is around 65 mm.
a longer baseline when shooting distant subjects such as the moon or
vary the baseline between the views to adapt to different viewing distances.
Screen Disparity …
Screen Disparity = distance between Pleft and Pright
[Figure: left and right eyes viewing the display screen, with the screen projections P_left and P_right of an object with positive disparity and of an object with negative disparity]
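A small sketch can make the sign of the screen disparity concrete. It assumes the convention screen disparity = x(P_right) - x(P_left). Under that convention a positive (uncrossed) value puts the point behind the screen and a negative (crossed) value puts it in front. The function name and the tolerance parameter are illustrative choices, not part of the slides.

```python
def classify_screen_disparity(x_left: float, x_right: float, tol: float = 1e-9) -> str:
    """Say where a point is perceived relative to the display screen.

    x_left, x_right: horizontal screen positions of P_left and P_right (same unit).
    Assumed convention: screen disparity = x_right - x_left; positive (uncrossed)
    means behind the screen, negative (crossed) means in front of it.
    """
    disparity = x_right - x_left
    if abs(disparity) <= tol:
        return "on screen"
    return "behind screen" if disparity > 0 else "in front of screen"


print(classify_screen_disparity(100.0, 110.0))  # behind screen
print(classify_screen_disparity(110.0, 100.0))  # in front of screen
```

Zero disparity places the point on the screen plane. This is the only case where accommodation and vergence agree.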
Disparity, Depth & Perceived Depth
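Scene depth Z follows from the disparity d, the distance b between the cameras (baseline) and the focal length f:
$$Z = \frac{b\,f}{d}$$
Perceived depth Z' depends on the screen disparity d' (a function of the captured disparity), the eye separation b_e and the viewing distance D:
$$Z' = \frac{b_e\,D}{b_e - d'}$$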
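The two relations above can be evaluated directly. The sketch below is a minimal example that assumes rectified, parallel cameras for Z = b·f/d. It also assumes a single length unit for b_e, D and d'. The function names and the example numbers (65 mm eye separation, 3 m viewing distance) are illustrative.

```python
def scene_depth(baseline: float, focal_length: float, disparity: float) -> float:
    """Z = b * f / d for a rectified, parallel camera pair.

    baseline sets the unit of Z (e.g. metres); focal_length and disparity must
    share a unit (e.g. pixels).
    """
    if disparity <= 0:
        raise ValueError("a point in front of parallel cameras has positive disparity")
    return baseline * focal_length / disparity


def perceived_depth(eye_sep: float, view_dist: float, screen_disp: float) -> float:
    """Z' = b_e * D / (b_e - d'), all lengths in the same unit (e.g. mm).

    d' > 0 places the point behind the screen, d' < 0 in front, d' = 0 on it;
    d' >= b_e would force the eyes to diverge, so it is rejected.
    """
    if screen_disp >= eye_sep:
        raise ValueError("screen disparity must be smaller than the eye separation")
    return eye_sep * view_dist / (eye_sep - screen_disp)


# Viewer with 65 mm eye separation sitting 3 m (3000 mm) from the screen.
print(perceived_depth(65.0, 3000.0, 13.0))   # 3750.0 (75 cm behind the screen)
print(perceived_depth(65.0, 3000.0, -13.0))  # 2500.0 (50 cm in front of it)
```

The same shot can therefore be perceived with quite different depth depending on where and by whom it is watched.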
Disparity, Depth & Perceived Depth
The left and right views are shot with some disparity ! The left and right views are consumed with some depth depending on the consumption conditions !
Influence of Viewing Distance and Age
directly proportional to viewer distance
is ~84% of an adult
closer objects look closer, far objects look farther
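The perceived-depth relation Z' = b_e·D/(b_e - d') reproduces the effects listed on this slide. The sketch assumes that the "~84%" figure refers to a child's eye separation relative to an adult's. It also reuses the ~65 mm adult value from the camera baseline slide. Both the constants and `compare_viewers` are hypothetical names for illustration.

```python
def perceived_depth(eye_sep: float, view_dist: float, screen_disp: float) -> float:
    """Z' = b_e * D / (b_e - d'), all lengths in millimetres."""
    return eye_sep * view_dist / (eye_sep - screen_disp)


ADULT_EYE_SEP_MM = 65.0                     # ~65 mm, as on the Camera Baseline slide
CHILD_EYE_SEP_MM = 0.84 * ADULT_EYE_SEP_MM  # assumption: the "~84%" figure


def compare_viewers(view_dist_mm: float, screen_disp_mm: float) -> dict:
    """Perceived depth of the same screen disparity for an adult and a child."""
    return {
        "adult": perceived_depth(ADULT_EYE_SEP_MM, view_dist_mm, screen_disp_mm),
        "child": perceived_depth(CHILD_EYE_SEP_MM, view_dist_mm, screen_disp_mm),
    }


for disp in (-13.0, 13.0):
    r = compare_viewers(3000.0, disp)
    print(f"d' = {disp:+.0f} mm: adult {r['adult']:.0f} mm, child {r['child']:.0f} mm")
```

For crossed (negative) disparity the child's value is smaller, so the object looks closer. For uncrossed (positive) disparity it is larger, so the object looks farther. Doubling D doubles Z', which matches the proportionality to viewer distance.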
Depth Cues: Monocular and Binocular
available in 2D projections; this is why images make sense on a TV/cinema screen.
muscles, and visual cues from the scene content itself. They can also be classified into monocular and binocular cues.
changes optical power to maintain a clear image (focus) on an object as its distance changes.
Main Monocular Depth Cues
Main Binocular Depth Cues
Some main cues are missing from 2D media:
image with each eye, thus different aspects of the same object
perspective images when we move our heads; nearby objects appear to move faster across the view
eyeballs, used to converge both eyes on the same object
Range of Effectiveness of Depth Cues
importance depends on the viewing distance, among other factors
where as others are distance-dependent, such as disparity or vergence
3D Video Experiences …
Depth perception in stereoscopic displays – Effect provided through stereo video
pairs, targeting the left and right eyes, allowing the perception of depth using stereo parallax
Depth perception in auto-stereoscopic displays – Effect provided through n video
views, targeting the left and right eyes in multiple positions, allowing the perception of depth using stereo and motion parallaxes
Navigation – Effect provided through n video views, allowing navigating the 3D scene
by changing the viewpoint and view direction within certain ranges; the viewer may experience a look around effect as well as depth perception
Early Stereoscopy
Stereoscopy refers to the capability of recreating 3D visual information, or creating the illusion of depth in an image, based on two appropriate views. These two slightly different images are presented separately, one to each eye. Both of these 2D
The motion parallax cue is not satisfied by stereoscopy and, therefore, the illusion
Free Viewpoint Systems
Free viewpoint systems require the acquisition of multiple scene views taken from different angles, allowing the user to navigate around the scene.
3D Video Applications …
The complete 3D video system is relevant for multiple applications such as broadcast TV, teleconference, surveillance, interactive video, cinema, gaming and other immersive video applications.
[Figure: 3D home master → 3D encoding and video compression (3D format encode, video compress) → 3D video distribution channels (Blu-ray Disc, DVD, cable TV, satellite TV, terrestrial TV, IPTV, Internet) → media players and set-top boxes (video decompress, 3D format decode) → 3D TV (left eye, right eye); 3D home package]
Main 3D Video Application Areas
3D cinema
3D mobile
3D home entertainment (3DTV)
with 2 … N views
3D Video Content Chain …
conventional 2D system but are quite different; they all have to evolve from the available 2D solutions towards 3D.
depth transitions.
formats; the best choice depends on distribution constraints, display capabilities, available equipment, target quality, etc.
displays, higher display resolutions, avoid uneasy feelings (headaches, nausea, eye strain, etc.).
[Figure: content acquisition and creation → content representation → content distribution → content consumption]
History of 3D Video …
3D Content is Exploding … Again …
3D Momentum …
unique, high-quality immersive 3D experiences in theaters
three times higher than traditional 2D screens
production and growing consumer appetite for 3D content
Avatar cost was around $500 million !!! Box office in Jan 2011 was $2,781,835,502 … Naturally, the sequel is coming !
3D Content Acquisition Modes
3D content production methods can be classified into three categories, namely:
temporal synchronization of the cameras is very important for capturing high-quality multiview video.
means the distance between the sensor and an object by extracting phase information from received light pulses. The structured-light approach usually recovers 3D shape from monocular images using a projector to illuminate objects with special patterns.
considering several depth cues such as motion parallax, vanishing points/lines, or camera motion in a structure-from-motion framework.
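The phase-based time-of-flight measurement mentioned above can be written as a one-line formula. The sketch below assumes a continuous-wave modulated source. The light travels to the object and back, so the measured phase shift is Δφ = 2π·f_mod·(2d/c) and d = c·Δφ/(4π·f_mod). Real sensors add calibration and phase unwrapping, and those are not modelled here. The function names and the 20 MHz example frequency are illustrative.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def unambiguous_range(mod_freq_hz: float) -> float:
    """Largest distance before the round-trip phase wraps past 2*pi: c / (2 * f_mod)."""
    return SPEED_OF_LIGHT_M_S / (2.0 * mod_freq_hz)


def tof_distance(phase_shift_rad: float, mod_freq_hz: float) -> float:
    """Continuous-wave ToF: the light covers 2*d, so
    phase = 2*pi*f_mod * (2*d / c)  =>  d = c * phase / (4*pi*f_mod)."""
    if not 0.0 <= phase_shift_rad < 2.0 * math.pi:
        raise ValueError("phase shift must lie in [0, 2*pi)")
    return SPEED_OF_LIGHT_M_S * phase_shift_rad / (4.0 * math.pi * mod_freq_hz)


# With a 20 MHz modulation the range wraps at about 7.49 m;
# a phase shift of pi corresponds to half of that.
print(round(unambiguous_range(20e6), 2))
print(round(tof_distance(math.pi, 20e6), 2))
```

A higher modulation frequency gives finer depth resolution per radian but a shorter unambiguous range.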
Stereo Cameras …
sensor or film frame for each lens. This allows simulating human binocular vision, and gives it the ability to capture 3D images, a process known as stereo
pictures for movies.
distance) is about the distance between one's eyes (known as the interocular distance); this is about 6.35 cm, although a longer baseline (greater inter-camera distance) produces more extreme 3-dimensionality.
3D Video Formats/Standards …
scenarios.
format/standard would be highly desirable to support any 3D video application in an efficient way, while decoupling content creation from display and application.
HDTV did earlier this decade: a slow start, then a rapid ascent in sales once enough content existed to attract mainstream buyers. But this is clearly not happening !
Stereo and Multiview Video Data
camera views
color/illumination mismatch problems
perfect either
Main Multiview Video Format Requirements
independent compression of each view, so-called simulcasting.
displayed by starting the decoder at a random access point and decoding a relatively small quantity of data on which that image may depend.
reduced in quality to a degree commensurate with the quantity of data in the subset used for the decoding process – although accessing only a portion of a bitstream.
limited number (subset) of the set of encoded views.
‘base view’ is decodable by an ordinary H.264/AVC decoder.
encoding quality of the various views.
3D Video Formats: the Menu …
Texture
Texture + Depth
Redundancies in 3D Video
Multiview Simulcasting
they are like ‘brothers’ due to the interview redundancy).
as H.264/AVC is more likely.
World Cup games.
Frame Compatible Stereo Formats
to a class of formats in which the stereo signal is essentially a multiplex of the two views into a single frame or sequence of frames to be coded with 2D video coding solutions. They are also called stereo interleaving or spatial/temporal multiplexing formats.
frame-compatible formats has been standardized in H.264/AVC as supplemental enhancement information (SEI) messages.
initial phase of 3D video services.
Frame Compatible Stereo Formats Examples
‘as usual’:
half represent the right view; thus, each coded view has half the resolution of the full coded frame.
[Figure: left and right views packed into each frame, and left and right views alternating over time]
Frame Compatible Stereo: Spatial Multiplexing
[Figure: spatial multiplexing of the left-eye and right-eye views: side-by-side, checkerboard, line interleave, column interleave, top & bottom]
Reduced picture resolution !
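The five spatial multiplexing layouts differ only in which samples of each view are kept and where they are placed. The sketch below represents frames as lists of rows and assumes even dimensions. It decimates by plain subsampling, whereas practical systems low-pass filter before decimation. In every layout each view keeps only half of its samples, which is the resolution loss the slide refers to.

```python
from typing import List

Frame = List[List[int]]  # frame[row][column], one sample per entry


def side_by_side(left: Frame, right: Frame) -> Frame:
    """Every other column of each view; left view in the left half."""
    return [l_row[::2] + r_row[::2] for l_row, r_row in zip(left, right)]


def top_and_bottom(left: Frame, right: Frame) -> Frame:
    """Every other row of each view; left view in the top half."""
    return [row[:] for row in left[::2]] + [row[:] for row in right[::2]]


def line_interleave(left: Frame, right: Frame) -> Frame:
    """Even rows from the left view, odd rows from the right view."""
    return [(left if y % 2 == 0 else right)[y][:] for y in range(len(left))]


def column_interleave(left: Frame, right: Frame) -> Frame:
    """Even columns from the left view, odd columns from the right view."""
    return [[(l_row if x % 2 == 0 else r_row)[x] for x in range(len(l_row))]
            for l_row, r_row in zip(left, right)]


def checkerboard(left: Frame, right: Frame) -> Frame:
    """Quincunx pattern: left where (x + y) is even, right where it is odd."""
    return [[(left if (x + y) % 2 == 0 else right)[y][x] for x in range(len(left[y]))]
            for y in range(len(left))]


L = [[1] * 4 for _ in range(4)]  # left view: all samples = 1
R = [[2] * 4 for _ in range(4)]  # right view: all samples = 2
for pack in (side_by_side, top_and_bottom, line_interleave, column_interleave, checkerboard):
    print(pack.__name__, pack(L, R))
```

The packed frame has the same size as one original view, so an unmodified 2D encoder can code it "as usual".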
Frame Compatible Stereo: Temporal Multiplexing
[Figure: 2D sequence versus 3D frame-sequential sequence over time, alternating right-eye (RE) and left-eye (LE) frames]
Provides full resolution quality but requires increased bandwidth and storage!
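Temporal multiplexing keeps every frame at full resolution and interleaves the two views in time. This doubles the number of frames to carry. The sketch below is a minimal illustration. Starting with the left eye is an arbitrary choice here, since the slide's figure starts with RE, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

FrameT = TypeVar("FrameT")


def frame_sequential(left: Sequence[FrameT], right: Sequence[FrameT]) -> List[Tuple[str, FrameT]]:
    """Alternate full-resolution left-eye (LE) and right-eye (RE) frames in time.

    Starting with LE is an arbitrary choice here; the slide's figure starts with RE.
    """
    if len(left) != len(right):
        raise ValueError("both views must have the same number of frames")
    sequence: List[Tuple[str, FrameT]] = []
    for l_frame, r_frame in zip(left, right):
        sequence.append(("LE", l_frame))
        sequence.append(("RE", r_frame))
    return sequence


def multiplexed_frame_rate(per_view_fps: float) -> float:
    """Each eye keeps its own rate, so the multiplexed sequence carries twice as many frames."""
    return 2.0 * per_view_fps


print(frame_sequential(["L0", "L1"], ["R0", "R1"]))
# [('LE', 'L0'), ('RE', 'R0'), ('LE', 'L1'), ('RE', 'R1')]
print(multiplexed_frame_rate(60))  # 120.0
```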
Conventional Stereo Format
are coded exploiting their interview redundancy.
coding solutions with increased compression efficiency.
Combined temporal and interview prediction
Multiview Video Coding Format
Multiview video (MVV) refers to a set of N temporally synchronized video streams coming from cameras capturing the same real scenery from different viewpoints.
derived for projection into one eye
MPEG-2 Multiview Profile
scalability for coding second view; only two views are supported.
picture from the base view or from within the enhancement view.
accessible encoded video segment
which minimizes the memory storage, but reduces compression efficiency
Multiview Video Coding (MVC) Standard
changes of the slice layer syntax and below and
multiview.
inter-camera prediction to reduce the required bitrate.
include a base view, which is independently coded from other non-base views.
view is reduced around 30%
about 25%
Interview Prediction: Basics
Many prediction structures can exploit interview redundancy, each trading off memory, delay, computation and coding efficiency differently.
(in the same view), but also from interview references (in the other views).
exceed maximum number of stored reference pictures.
references can be selected on a block basis in terms of RD cost.
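Block-wise selection between temporal and interview references is usually driven by a Lagrangian cost J = D + λ·R. The sketch below picks the candidate with the lowest J. The distortion and rate numbers are made up for illustration, and a real encoder would obtain them from motion/disparity estimation and entropy coding. `RefCandidate` and `pick_reference` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RefCandidate:
    kind: str          # "temporal" (same view) or "interview" (other view)
    distortion: float  # e.g. sum of squared differences after prediction
    rate_bits: float   # bits for vector, reference index and residual


def pick_reference(candidates: Iterable[RefCandidate], lagrange_multiplier: float) -> RefCandidate:
    """Pick the candidate with the smallest Lagrangian cost J = D + lambda * R."""
    return min(candidates, key=lambda c: c.distortion + lagrange_multiplier * c.rate_bits)


cands = [RefCandidate("temporal", 100.0, 20.0), RefCandidate("interview", 80.0, 30.0)]
print(pick_reference(cands, 1.0).kind)  # interview (J = 110 vs 120)
print(pick_reference(cands, 3.0).kind)  # temporal  (J = 160 vs 170)
```

The larger λ used at lower target rates favours the cheaper-to-signal candidate even when its prediction is worse.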
MVC Prediction Structures
first frame of each GOP
the time and view dimensions
Disparity-Compensated Prediction
decoded pictures in neighbor views as additional reference pictures
reference picture lists is modified from H.264/AVC
MVC: Technical Solution
The core macroblock-level and lower-level decoding modules of an MVC decoder are the same, regardless of whether a reference picture is a temporal or an interview
layer H.264/AVC hardware;
type)
multiview extension of the sequence parameter set (SPS) defined by H.264/AVC.
identification; ii) view dependency information; and iii) level index for operation points.
views to be inserted and removed from reference picture buffer
reference or multiview reference
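The view dependency information lets a decoder or extractor work out which views must be decoded to output a chosen subset of views. The sketch below computes that set as a transitive closure. The dictionary is a hypothetical stand-in for the dependency information signalled in the SPS MVC extension, not the actual syntax. The 5-view layout is made up for illustration.

```python
from typing import Dict, Iterable, List, Set


def views_to_decode(targets: Iterable[int], depends_on: Dict[int, List[int]]) -> Set[int]:
    """Target views plus every view they reference, directly or indirectly."""
    needed: Set[int] = set()
    pending = list(targets)
    while pending:
        view = pending.pop()
        if view in needed:
            continue
        needed.add(view)
        pending.extend(depends_on.get(view, []))
    return needed


# Hypothetical 5-view layout: view 0 is the base view (no interview references).
DEPS = {0: [], 2: [0], 1: [0, 2], 4: [2], 3: [2, 4]}
print(sorted(views_to_decode([1], DEPS)))  # [0, 1, 2]
print(sorted(views_to_decode([3], DEPS)))  # [0, 2, 3, 4]
```

Decoding the base view alone needs nothing else, which is why it stays decodable by an ordinary H.264/AVC decoder.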
MVC: Profiles and Levels
There are two MVC profiles with support for more than
H.264/AVC High profile:
Multiview High profile: supports multiple views and does not support interlaced coding tools.
Stereo High profile: limited to two views, but does support interlaced coding tools.
Levels impose constraints on the MVC bitstreams to establish bounds on the necessary decoder resources and complexity. The level constraints include limits on the amount of frame memory required for decoding, the picture size, the overall bitrate, etc.
MVC Compression Performance
Simulcasting versus MVC comparison
8 views (with 640×480 resolution), and considering the rate for all views ~25% bit rate savings over all views
[Figure: PSNR (dB) versus bitrate (kbit/s), Simulcast vs MVC, for the Ballroom and Race1 sequences]
MVC: Subjective Stereo Performance
rate for the dependent view (75% gain); this percentage may have to be higher when the main (base) view is coded at less than 12 Mbit/s.
[Figure: Mean Opinion Score (1.0 to 4.5) for Original, Simulcast (AVC+AVC), and MVC with the dependent view at 50%, 35%, 25%, 20%, 15%, 10% and 5% of the base view rate (12L_50Pct to 12L_5Pct)]
Base view fixed at 12 Mbit/s; dependent view at varying percentage of base view rate.
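The 75% gain on the dependent view translates into a smaller saving on the total stereo rate. The sketch below does the arithmetic. It assumes that the AVC+AVC simulcast reference codes each view at the 12 Mbit/s base rate. That assumption is not stated on the slide.

```python
def stereo_rate_comparison(base_mbps: float, dependent_fraction: float) -> dict:
    """Total stereo rate: simulcast (two views at the base rate, an assumption)
    versus MVC (base view plus a dependent view at a fraction of the base rate)."""
    simulcast = 2.0 * base_mbps
    mvc = base_mbps * (1.0 + dependent_fraction)
    return {"simulcast": simulcast, "mvc": mvc, "total_saving": 1.0 - mvc / simulcast}


print(stereo_rate_comparison(12.0, 0.25))
# {'simulcast': 24.0, 'mvc': 15.0, 'total_saving': 0.375}
```

A dependent view at 25% of the base rate thus saves 37.5% of the total stereo rate under this assumption.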
MVC Limitations
uncommon
the number of views
3D Video Coding Challenges
data (sender-side depth provision gives producers more control on depth perception)
independent of 3D display (with display-specific number of views)
Sensing More with Depth …
containing information with the distance from the scene
by:
e.g. computer-generated imagery
information about the scene geometry.
Representing Depth …
Depth Maps Properties
Texture and Depth ...
Depth-enhanced formats are suitable for generic 3D video solutions, where only one format is coded and transmitted while all necessary views for any 3D display are generated from the decoded data, e.g., by means of depth image based rendering (DIBR).
2D+Depth Format
Multiview Video plus Depth (MVD)
number of views.
MPEG as part of the 3D Video coding activity.
views independently encoded with MVC.
Depth Coding vs Texture Coding
view synthesis of the video data (we never see depth maps!)
views
at the borders of objects with different scene depth
than current texture coding methods such as H.264/AVC and MVC
Combining Coding with Synthesis
views may be coded.
may be synthesized based on the limited set of decoded views.
[Figure: encoding side (limited camera inputs → data format at a constrained rate based on distribution) and decoding and synthesis side (data format → left and right views for stereoscopic displays, or an arbitrarily large number of views for auto-stereoscopic N-view displays)]
Depth-Image-Based Rendering (DIBR)
done using projective matrices and depth info
3D warping amounts to a simple 1D shift
extrapolated or interpolated
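For rectified, parallel cameras the 3D warping reduces to a horizontal shift of d = f·b/Z pixels. The sketch below warps one image row to a virtual camera displaced by `baseline` to the right. It makes three assumptions that are not from the slides. First, depth is stored as 8-bit inverse depth, 1/Z = (v/255)·(1/Z_near - 1/Z_far) + 1/Z_far, a convention common in MPEG 3D video test material. Second, a simple z-buffer resolves pixels that land on the same target. Third, unfilled targets stay as holes (None) for later inpainting. All names are hypothetical.

```python
from typing import List, Optional


def depth_from_8bit(v: int, z_near: float, z_far: float) -> float:
    """Assumed inverse-depth quantization: v = 255 -> z_near, v = 0 -> z_far."""
    return 1.0 / (v / 255.0 * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far)


def warp_row(texture: List[int], depth8: List[int], focal_px: float, baseline: float,
             z_near: float, z_far: float) -> List[Optional[int]]:
    """Forward-warp one rectified row to a virtual camera shifted right by `baseline`.

    Each pixel moves left by d = f * b / Z (rounded to whole pixels). When several
    pixels land on the same target, the nearest one (largest d) wins. Targets that
    receive nothing stay None: disocclusion holes to be filled afterwards.
    """
    width = len(texture)
    out: List[Optional[int]] = [None] * width
    best_disp = [float("-inf")] * width
    for x in range(width):
        disp = focal_px * baseline / depth_from_8bit(depth8[x], z_near, z_far)
        xt = x - int(round(disp))
        if 0 <= xt < width and disp > best_disp[xt]:
            best_disp[xt] = disp
            out[xt] = texture[x]
    return out


# Background (v = 0) shifts by 1 pixel, the foreground pixel (v = 255) by 2.
print(warp_row([10, 20, 30, 40, 50, 60], [0, 0, 0, 255, 0, 0],
               focal_px=2.0, baseline=1.0, z_near=1.0, z_far=2.0))
# [20, 40, None, 50, 60, None]
```

The hole at index 2 is background that the new viewpoint uncovers next to the foreground object. The hole at the right border has no source pixel at all. Both cases show why synthesized views need hole filling, or interpolation from a second reference view.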
Trading-off Bitrate with 3D Rendering Capability
[Figure: bit rate versus 3D rendering capability for 2D, 2D+Depth, MVC and Simulcast]
3DV coding should be compatible with:
More for less !
MPEG 3D Video Framework
[Figure: limited video inputs (e.g., 2 or 3 views) → depth estimation → video/depth codec (binary representation and reconstruction process) → view synthesis → larger number of output views]
HEVC 3D Related Extensions
encoding of depth maps as additional color plane (early 2014)
criteria, particularly for larger view ranges which are costly in terms of data rate
MV-HEVC Approach
3D-HEVC Coding Framework
3D-HEVC Approach
Format scalable approach as sub-bitstreams representing a subset of the video views (with or without associated depth data) can be extracted by discarding NAL units from the 3D bitstream and be independently decoded.
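Sub-bitstream extraction amounts to filtering NAL units by the layer they belong to. The sketch below uses a `layer_id` field in the role of `nuh_layer_id` from the HEVC NAL unit header. It models the NAL unit as a plain tuple rather than parsing real bytes. The layer-to-component mapping in the example is hypothetical.

```python
from typing import Iterable, List, NamedTuple, Set


class NalUnit(NamedTuple):
    layer_id: int   # plays the role of nuh_layer_id in the HEVC NAL unit header
    payload: bytes  # coded data, left opaque here


def extract_sub_bitstream(nal_units: Iterable[NalUnit], keep_layers: Set[int]) -> List[NalUnit]:
    """Drop every NAL unit whose layer is not kept, preserving the original order.
    keep_layers must already contain all layers the kept ones reference."""
    return [nal for nal in nal_units if nal.layer_id in keep_layers]


# Hypothetical mapping: 0 = base texture view, 1 = its depth map,
# 2 = second texture view, 3 = its depth map.
stream = [NalUnit(0, b"T0"), NalUnit(1, b"D0"), NalUnit(2, b"T1"), NalUnit(3, b"D1")]
print([n.payload for n in extract_sub_bitstream(stream, {0})])     # [b'T0']
print([n.payload for n in extract_sub_bitstream(stream, {0, 2})])  # [b'T0', b'T1']
```

Keeping only layer 0 yields a plain 2D stream for a single-layer HEVC decoder. Keeping the texture layers without depth yields a stereo (MV-HEVC-like) subset.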
Coding of Texture Views
Coding of independent view:
Coding of dependent views: Inter-view correlations are exploited by prediction-based coding tools:
Coding of Depth Maps
Coding of depth or disparity maps
prediction-based coding Tools:
dependent view
Inter-view correlation Inter-component correlation
Depth-based View Synthesis
since complexity is a critical factor, a simplified rendering method is used.
the required number of dense views for a particular multiview display.
bitstream, two main synthesis approaches can be applied:
data (depth may be generated from disparities)
rendering (DIBR) solutions
Synthesized Views Quality Assessment
How to measure the quality of the ‘synthetic’ views for which no ‘real’ references exist ? A common solution is to compute a PSNR comparing the decoded synthesized views with the synthesized views from original uncoded video and depth data. Subjective testing is also largely used by MPEG …
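The PSNR-based assessment described above compares two synthesized versions of the same virtual view. The reference is synthesized from the original, uncoded video and depth. The test view is synthesized from the decoded data. The sketch below works on flat lists of 8-bit samples. The sample values are made up.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[int], test: Sequence[int], peak: int = 255) -> float:
    """PSNR in dB between two equally sized sample sequences."""
    if len(reference) != len(test) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


# reference: view synthesized from the original, uncoded texture and depth
# test:      same view synthesized from the decoded texture and depth
ref_view = [100, 120, 140, 160]
dec_view = [101, 119, 141, 159]
print(round(psnr(ref_view, dec_view), 2))
```

This measures only the degradation caused by coding. Errors already present in the uncoded synthesis, such as hole-filling artifacts, are invisible to it, which is one reason subjective testing is still needed.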
The Right Balance: Science or Art ?
For some given available resources, e.g. in terms of bandwidth or memory, it is critical to find the right balance between
to provide the best visual user experience … But this is expected to be content and display dependent …
The Standardization Path …
[Figure: standardization path. Image coding: JPEG, JPEG-LS, JPEG 2000, MJPEG 2000, JPEG XR, AIC (?). Video coding: H.261, H.263, H.264/AVC/SVC/MVC, MPEG-1 Video, H.262/MPEG-2 Video, MPEG-4 Visual, HEVC, RVC, MV-HEVC, 3D-HEVC, SHVC]
3D Video: Success or Not so Much ?
Video Coding Standards: a Summary
| Standard | Year | Main Applications | Profiles | Main Bitrates | Frame Types | Ref. Frames Number | Transform | Motion Vectors (if any) | Motion Vectors Precision | Entropy Coding | Deblocking Filter |
|---|---|---|---|---|---|---|---|---|---|---|---|
| H.261 | 1988 | Videotelephony and videoconference | No | p×64 kbit/s | | | DCT | 1 per MB | Integer pel | Huffman based | In loop |
| MPEG-1 Video | 1991 | Digital storage in CD-ROM | No | Around 1-1.2 Mbit/s | I, P, B and D | 0-2 | DCT | 1 or 2 per MB (P and B) | Half pel | Huffman based | Out of the loop |
| H.262/MPEG-2 Video | 1994 | Digital TV and DVD | Yes, most used is Main Profile | From 2 to 10 Mbit/s | I, P and B | 0-2 | DCT | 1 or 2 per MB (2 to 4 for interlaced video) | Half pel | Huffman based | Out of the loop |
| H.263 | 1995 | Videotelephony and videoconference and more | Only in extensions | From very low rates to around 1 Mbit/s | I, P and B | 0-2 | DCT | 1 or 2 per MB (4 in the optional modes) | Half pel | Huffman based | Out of the loop |
| MPEG-4 Visual | 1998 | Large range with … | Yes, most used are Simple and Advanced Simple | Very large range using levels | I, P and B | 0-2 | DCT | 1 or 2 per MB (4 in the optional modes); also global motion vectors | 1/4 pel | Huffman based; arithmetic coding for the shape | Out of the loop |
| H.264/AVC | 2004 | Large range, from mobile to Blu-ray | Yes, most used are Baseline, Main and High | Very large range using levels | I, P, generalized B, SP and SI | Up to 16 | Integer DCT | 1 to 16 per MB (P slices) and 1 to 32 (B slices) | 1/4 pel | CAVLC and CABAC | In loop |
| SVC | 2007 | Robust delivery, graceful degradation, broadcasting | Yes | Very large range using layers | I, P and generalized B | Up to 16 | Integer DCT | 1 to 16 per MB (?) | 1/4 pel | CAVLC and CABAC | In loop |
| MVC | 2009 | Stereo TV, Free viewpoint TV | Yes | Very large range using levels | I, P, B | Up to 16 | Integer DCT | 1 to 16 per MB (?) | 1/4 pel | CAVLC and CABAC | In loop |