Audiovisual Communications, Fernando Pereira, 2011
ADVANCED MULTIMEDIA CODING
Fernando Pereira Instituto Superior Técnico
Video Coding in MPEG-4
There are two Parts in the MPEG-4 standard dealing with video coding:
Part 2: Visual (1998) – Specifies several coding tools targeting the efficient and error-resilient coding of video, including arbitrarily shaped video; it also includes coding of 3D faces and bodies.
Part 10: Advanced Video Coding (AVC) (2003) – Specifies more efficient (about 50% less bitrate for the same quality) and more error-resilient frame-based video coding tools; this Part has been jointly developed by ISO/IEC MPEG and ITU-T through the Joint Video Team (JVT) and it is often known as H.264/AVC. Each of these 2 Parts specifies several profiles with different video coding functionalities and compression efficiency versus complexity trade-offs.
Coding of rectangular video with increased efficiency: about 50% less rate for the same quality regarding existing standards such as H.263, MPEG-2 Video and MPEG-4 Visual.
This standard (joint between ISO/IEC MPEG and ITU-T VCEG) also offers good flexibility in terms of efficiency-complexity trade-offs, as well as good error-resilience performance for mobile environments and for fixed and wireless Internet (both progressive and interlaced formats).
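[Figure: H.264/AVC layered structure: the Video Coding Layer (coded macroblock) and optional Data Partitioning (coded slice/partition) feed the Network Abstraction Layer, which maps the data and control data onto systems such as H.320, MP4FF, H.323/IP, MPEG-2, etc.]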
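To address this need for flexibility and customizability, the H.264/AVC design covers:
a Video Coding Layer (VCL), designed to efficiently represent the video content;
a Network Abstraction Layer (NAL), which formats the VCL representation and provides header information in a manner appropriate for conveyance by a variety of transport layers or storage media.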
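Every NAL unit starts with a one-byte header. The sketch below (our own helper, not from the slides) splits that byte into the three fields defined by H.264/AVC; it assumes the header byte has already been located, for example right after an Annex B start code.

```python
def parse_nal_header(first_byte: int) -> dict:
    """Split the one-byte H.264/AVC NAL unit header into its three fields."""
    return {
        # Must be 0 in a conforming stream.
        "forbidden_zero_bit": (first_byte >> 7) & 0x1,
        # 0 means the content is not used as a reference.
        "nal_ref_idc": (first_byte >> 5) & 0x3,
        # e.g. 1 = non-IDR slice, 5 = IDR slice, 7 = SPS, 8 = PPS.
        "nal_unit_type": first_byte & 0x1F,
    }
```

For instance, the byte 0x67 that usually opens a stream decodes to nal_unit_type 7, a sequence parameter set.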
The H.264/AVC standard is based on the same hybrid coding architecture used for previous video coding standards with some important differences:
which all together allow achieving substantial gains regarding the bitrate needed to reach a certain quality level. The H.264/AVC standard addresses a vast set of applications, from personal communications to storage and broadcasting, at various qualities and resolutions.
bit/sample):
several slices
Macroblocks: 16×16 luminance samples and 2 × 8×8 chrominance samples (4:2:0 content)
[Figure: a picture divided into Slice #0, Slice #1 and Slice #2, with macroblocks numbered 0, 1, 2, … in raster scan order (e.g. Macroblock #40)]
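Macroblock addresses follow raster scan order, so an address maps directly to a row and column of the macroblock grid. A minimal sketch, assuming a CIF picture (352×288, our choice for illustration) whose dimensions are multiples of 16; the function names are ours.

```python
def macroblock_grid(width, height, mb_size=16):
    """(columns, rows) of macroblocks for a picture whose sides are multiples of 16."""
    assert width % mb_size == 0 and height % mb_size == 0
    return width // mb_size, height // mb_size


def mb_position(mb_addr, width_in_mbs):
    """(row, column) of a macroblock address in raster scan order."""
    return divmod(mb_addr, width_in_mbs)


cols, rows = macroblock_grid(352, 288)   # CIF: 22 x 18 = 396 macroblocks
row, col = mb_position(40, cols)         # Macroblock #40 lies in the second MB row
```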
Allocation Map
location in raster scan order
“leftover” slice group
[Figure: examples of macroblock allocation maps with Slice Group #0, Slice Group #1 and Slice Group #2]
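The macroblock-to-slice-group allocation map can be generated from a few parameters. The sketch below shows two maps patterned after H.264/AVC Flexible Macroblock Ordering: a "foreground with leftover" map, where rectangles claim macroblocks and everything else falls into the last (leftover) slice group, and a dispersed, checkerboard-like map. Rectangles are given here as (top, left, bottom, right) in macroblock units, which is our simplification; the standard signals them differently.

```python
def foreground_leftover_map(width_in_mbs, height_in_mbs, rects):
    """rects[g] = (top, left, bottom, right) in MB units for slice group g;
    macroblocks outside every rectangle go to the leftover group len(rects)."""
    leftover = len(rects)
    mb_map = [leftover] * (width_in_mbs * height_in_mbs)
    # Paint higher-numbered groups first so lower-numbered groups win overlaps.
    for group in reversed(range(len(rects))):
        top, left, bottom, right = rects[group]
        for y in range(top, bottom + 1):
            for x in range(left, right + 1):
                mb_map[y * width_in_mbs + x] = group   # raster scan address
    return mb_map


def dispersed_map(width_in_mbs, height_in_mbs, num_groups):
    """Dispersed map: neighbouring MBs land in different slice groups."""
    return [((i % width_in_mbs) + ((i // width_in_mbs) * num_groups) // 2) % num_groups
            for i in range(width_in_mbs * height_in_mbs)]
```

The dispersed map helps error concealment, since every lost macroblock is surrounded by macroblocks from other slice groups.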
Intra (I) Slices - A slice in which all macroblocks of the slice are coded using intra prediction:
pictures whereby the latter do not necessarily provide the random access property as pictures before the intra pictures may be used as reference for succeeding predictively coded pictures.
coded macroblocks is disallowed. The corresponding constrained intra prediction flag (constrained_intra_pred_flag) is signaled in the PPS.
P Slices - In addition, some P slice macroblocks can also be coded using inter prediction with at most one motion-compensated prediction signal per prediction block.
B Slices - In addition, some B slice macroblocks can also be coded using inter prediction with two motion-compensated prediction signals per prediction block.
[Figure: H.264/AVC encoder block diagram: the input video signal is split into 16x16 macroblocks; transform/scaling/quantization and entropy coding of the quantized coefficients, control data and motion data; an embedded decoder with scaling & inverse transform, intra-frame prediction, motion compensation and deblocking filter producing the output video signal; motion estimation; intra/inter coder control]
Macroblock: 16×16 luminance + 2 × 8×8 chrominance samples
sub-sampling of chrominance (4:2:0, 4:2:2, 4:4:4)
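The chrominance sub-sampling format fixes how large each chroma block of a macroblock is. A small sketch, using the horizontal and vertical sub-sampling factors (named SubWidthC and SubHeightC in the standard); the function names are ours.

```python
# (SubWidthC, SubHeightC): horizontal and vertical chroma sub-sampling factors.
CHROMA_SUBSAMPLING = {"4:2:0": (2, 2), "4:2:2": (2, 1), "4:4:4": (1, 1)}


def chroma_mb_size(chroma_format, mb_size=16):
    """(width, height) of each of the two chroma blocks of a macroblock."""
    sub_w, sub_h = CHROMA_SUBSAMPLING[chroma_format]
    return mb_size // sub_w, mb_size // sub_h


def samples_per_mb(chroma_format, mb_size=16):
    """Luma samples plus the samples of both chroma blocks (Cb and Cr)."""
    w, h = chroma_mb_size(chroma_format, mb_size)
    return mb_size * mb_size + 2 * w * h
```

For 4:2:0 this gives the 16×16 + 2 × 8×8 = 384 samples per macroblock stated above.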
each MB the correlation with adjacent blocks or MBs in the same picture.
the previously coded and decoded blocks or MBs in the same picture.
being coded.
used to form the Intra prediction. This type of Intra coding may imply error propagation if the prediction uses adjacent MBs which have been Inter coded; this may be solved by using the so-called Constrained Intra Coding Mode where only adjacent Intra coded MBs are used to form the prediction.
Intra predictions may be performed in several ways:
1. Single prediction for the whole MB (Intra16×16): four modes are possible (vertical, horizontal, DC and planar) -> for uniform areas.
2. Different predictions for the 16 samples of each 4×4 block in an MB (Intra4×4): nine modes (DC and 8 directional modes) -> for areas with detail.
3. Single prediction for the chrominance: four modes (vertical, horizontal, DC and planar).
Directional spatial prediction (9 types for luma, 1 chroma)
Diagonal down/right prediction: a, f, k, p are predicted by (A + 2Q + I + 2) >> 2
[Figure: the 4×4 block samples a to p are predicted from the neighbouring samples Q (above-left corner), A to H (above and above-right) and I to L (left); the prediction directions are numbered 1 to 8]
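A minimal sketch of four of the nine Intra4×4 luma modes (vertical, horizontal, DC and diagonal down/right), assuming every neighbouring sample is available; the function and mode names are ours. The diagonal case reproduces the rule above for a, f, k, p and applies the same 3-tap smoothing along the other diagonals.

```python
def intra4x4_predict(mode, top, left, corner):
    """Predict a 4x4 luma block; returns 4 rows of 4 samples.

    top    -- samples A, B, C, D above the block
    left   -- samples I, J, K, L to the left of the block
    corner -- sample Q above-left of the block
    Assumes every neighbour is available (no border/slice handling).
    """
    def p(x, y):
        # Neighbour at (x, y) relative to the block's top-left sample a = (0, 0).
        if x == -1 and y == -1:
            return corner
        return top[x] if y == -1 else left[y]

    if mode == "vertical":
        return [list(top) for _ in range(4)]
    if mode == "horizontal":
        return [[left[y]] * 4 for y in range(4)]
    if mode == "dc":
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    if mode == "diagonal_down_right":
        pred = [[0] * 4 for _ in range(4)]
        for y in range(4):
            for x in range(4):
                if x > y:    # above the main diagonal: smooth the top row
                    pred[y][x] = (p(x - y - 2, -1) + 2 * p(x - y - 1, -1) + p(x - y, -1) + 2) >> 2
                elif x < y:  # below it: smooth the left column
                    pred[y][x] = (p(-1, y - x - 2) + 2 * p(-1, y - x - 1) + p(-1, y - x) + 2) >> 2
                else:        # a, f, k, p: (A + 2Q + I + 2) >> 2
                    pred[y][x] = (p(0, -1) + 2 * p(-1, -1) + p(-1, 0) + 2) >> 2
        return pred
    raise ValueError(f"mode not covered by this sketch: {mode}")
```

An encoder would evaluate each mode against the original block and keep the one with the best rate-distortion cost.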
16×16 MB (Intra16×16 modes).
smooth variation.
Mean of all neighbours
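The planar Intra16×16 mode fits a plane to the row above and the column to the left of the MB, which suits areas with smooth variation. The sketch below follows the H.264/AVC luma plane-prediction formulas as we understand them, for 8-bit video with all neighbours available; names are ours.

```python
def intra16x16_planar(top, left, corner):
    """Planar prediction for a 16x16 luma MB; returns 16 rows of 16 samples.

    top[x]  = p[x, -1] for x = 0..15 (row above the MB)
    left[y] = p[-1, y] for y = 0..15 (column left of the MB)
    corner  = p[-1, -1]
    """
    def above(x):
        return corner if x == -1 else top[x]

    def leftcol(y):
        return corner if y == -1 else left[y]

    # Weighted horizontal and vertical gradients of the neighbours.
    H = sum((i + 1) * (above(8 + i) - above(6 - i)) for i in range(8))
    V = sum((i + 1) * (leftcol(8 + i) - leftcol(6 - i)) for i in range(8))
    a = 16 * (left[15] + top[15])
    b = (5 * H + 32) >> 6
    c = (5 * V + 32) >> 6

    def clip1(v):
        return max(0, min(255, v))   # 8-bit sample range

    return [[clip1((a + b * (x - 7) + c * (y - 7) + 16) >> 5) for x in range(16)]
            for y in range(16)]
```

A flat neighbourhood yields a flat prediction, while a ramp in the row above yields a ramp across the block.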
[Figure: H.264/AVC encoder block diagram highlighting motion compensation: motion vector accuracy 1/4 (6-tap filter); MB types 16x16, 16x8, 8x16 and 8x8; 8x8 types 8x8, 8x4, 4x8 and 4x4]
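The 1/4-sample motion vector accuracy relies on interpolating the reference picture. For luma, half-sample values come from the 6-tap filter (1, -5, 20, 20, -5, 1) with rounding and a shift by 5, and quarter-sample values are the rounded average of the two nearest integer/half samples. The one-dimensional sketch below ignores picture-border padding and the 2-D centre position; the function names are ours.

```python
TAPS = (1, -5, 20, 20, -5, 1)   # 6-tap half-sample filter, taps sum to 32


def clip1(v, bit_depth=8):
    return max(0, min((1 << bit_depth) - 1, v))


def half_sample(row, i):
    """Half-sample value between row[i] and row[i + 1] (needs row[i-2] .. row[i+3])."""
    assert 2 <= i <= len(row) - 4, "no border padding in this sketch"
    acc = sum(t * row[i - 2 + k] for k, t in enumerate(TAPS))
    return clip1((acc + 16) >> 5)


def quarter_sample(row, i):
    """Quarter-sample value between row[i] and the half sample to its right."""
    return (row[i] + half_sample(row, i) + 1) >> 1
```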
describe the motion with ¼ pel accuracy.
4×4 to 16×16 luminance samples, with many options between the two limits.
Each MB (16×16) may be divided in four ways (Inter16×16, Inter16×8, Inter8×16 and Inter8×8), corresponding to the four prediction modes at MB level.
If the Inter8×8 mode is selected, each sub-MB (with 8×8 samples) may be divided again (or not), obtaining 8×8, 8×4, 4×8 and 4×4 partitions, which correspond to the four prediction modes at sub-MB level.
For example, a maximum of 16 motion vectors may be used for a P coded MB.
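The two-level partitioning can be enumerated directly; for a P macroblock there is one motion vector per partition, which is where the maximum of 16 comes from. A small sketch with our own mode labels.

```python
# Partitions (width, height) produced by each MB-level and sub-MB-level mode.
MB_MODES = {"16x16": [(16, 16)], "16x8": [(16, 8)] * 2, "8x16": [(8, 16)] * 2}
SUB_MB_MODES = {"8x8": [(8, 8)], "8x4": [(8, 4)] * 2,
                "4x8": [(4, 8)] * 2, "4x4": [(4, 4)] * 4}


def partitions(mb_mode, sub_modes=None):
    """List the motion-compensation partitions of one macroblock.

    For mb_mode "8x8", sub_modes gives the mode of each of the four 8x8 sub-MBs.
    """
    if mb_mode != "8x8":
        return MB_MODES[mb_mode]
    assert sub_modes is not None and len(sub_modes) == 4
    return [part for mode in sub_modes for part in SUB_MB_MODES[mode]]


# One motion vector per partition for a P macroblock:
max_mvs = len(partitions("8x8", ["4x4"] * 4))   # 16
```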
MBs and sub-MBs Partitioning for Motion Compensation
Motion vectors are differentially coded but not across slices.
[Figure: macroblock partitions (16×16, 16×8, 8×16, 8×8) and sub-macroblock partitions (8×8, 8×4, 4×8, 4×4), with the partitions numbered 0 to 3 in coding order]
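Because motion vectors are differentially coded, only the difference to a predicted vector is sent. In the common case the prediction is the component-wise median of the vectors of the left (A), above (B) and above-right (C) neighbours. This sketch covers only that median case and leaves out the special rules for 16×8/8×16 partitions, reference-index matching and unavailable neighbours (a neighbour in another slice counts as unavailable, which is why prediction does not cross slices).

```python
def median3(a, b, c):
    """Median of three numbers."""
    return max(min(a, b), min(max(a, b), c))


def predict_mv(mv_a, mv_b, mv_c):
    """Component-wise median of the neighbouring motion vectors A, B and C."""
    return tuple(median3(a, b, c) for a, b, c in zip(mv_a, mv_b, mv_c))


def mv_difference(mv, mv_a, mv_b, mv_c):
    """Motion vector difference that would be entropy coded."""
    pred = predict_mv(mv_a, mv_b, mv_c)
    return tuple(m - p for m, p in zip(mv, pred))
```

Vectors are in quarter-sample units; smooth motion fields give small differences, which are cheap to code.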
The H.264/AVC standard supports motion compensation with multiple reference frames; this means that more than one previously coded picture may be simultaneously used as prediction reference for the motion compensation of the MBs in a picture (at the cost of memory and computation).
multiple frames; up to 16 reference frames are allowed.
by means of memory control commands which are included in the coded bitstream.
[Figure: multiple reference frames in H.264/AVC versus other standards]
The B frame concept is generalized in the H.264/AVC standard since now any frame may also use B frames as prediction references for motion compensation; this means the selection of the prediction frames depends only on the memory management performed by the encoder.
blocks or MBs in two reference frames, both in the past, both in the future, or one in the past and another in the future.
frames.
with a lower prediction error.
the past, both in the future or one in the past and another in the future
next frames due to the availability of multiple reference frames
(using only references from the past)
notably in terms of the memory bandwidth (double fetching)
[Figure: example frame sequences (I P P P P ... and I P B B P B B ...) illustrating known and new types of dependencies]
Known dependencies, e.g. MPEG-1 Video, MPEG-2 Video, etc. New types of dependencies:
display order are decoupled, e.g. a P frame may not use the previous P frames for prediction
picture type are decoupled, e.g. it is possible to use a B frame as reference
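Decoupling coding order from display order, and allowing B frames as references, makes hierarchical (dyadic) B-frame structures possible. The sketch below is one illustrative structure, not one mandated by the standard: for a GOP whose size is a power of two, it lists one valid coding order and the two references of each B frame, assuming display positions 0 and gop_size are coded first as key frames.

```python
def hierarchical_b(gop_size):
    """Return (coding_order, refs) for display positions 0..gop_size.

    refs[n] = (past_ref, future_ref) for every B frame n; key frames have no entry.
    """
    order = [0, gop_size]          # key frames (e.g. I or P) are coded first
    refs = {}

    def split(lo, hi):
        if hi - lo < 2:
            return
        mid = (lo + hi) // 2
        order.append(mid)          # the middle frame is bi-predicted from both ends
        refs[mid] = (lo, hi)
        split(lo, mid)             # it then serves as a reference for its left half...
        split(mid, hi)             # ...and for its right half

    split(0, gop_size)
    return order, refs
```

For a GOP of 8 the B frames at display positions 4, 2 and 6 are themselves used as references, a dependency that MPEG-1/2 Video could not express.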
Comparative Performance: Mobile & Calendar, CIF, 30 Hz
[Figure: PSNR Y (dB) versus R (Mbit/s) for PBB... with generalized B pictures, PBB... with classic B pictures, PPP... with 5 previous references and PPP... with 1 previous reference; a gain of ~40% is marked]
The H.264/AVC standard uses three transforms depending on the type of prediction residue to code:
4×4 Hadamard Transform for the luminance DC coefficients in MBs coded with the Intra 16×16 mode
2×2 Hadamard Transform for the chrominance DC coefficients in any MB
4×4 Integer Transform based on DCT for all the other blocks
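The two Hadamard transforms act on the DC coefficients gathered from the 4×4 blocks of a macroblock. The sketch below shows the core 2-D transforms only; the normalization and rounding the standard applies around them are omitted, and the names are ours.

```python
def matmul(a, b):
    """Plain integer matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


H2 = [[1, 1], [1, -1]]                       # chroma DC (2x2)
H4 = [[1, 1, 1, 1], [1, 1, -1, -1],
      [1, -1, -1, 1], [1, -1, 1, -1]]        # Intra16x16 luma DC (4x4)


def hadamard(block, h):
    """Unnormalized separable 2-D Hadamard transform h * block * h (h is symmetric)."""
    return matmul(matmul(h, block), h)
```

Only additions and subtractions are involved, so these transforms are cheap and exactly invertible up to scaling.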
Chroma 4x4 block order for 4x4 residual coding, shown as 16-25, and Intra4x4 prediction, shown as 18-21 and 22-25
[Figure: luma 4x4 block order for 4x4 intra prediction and 4x4 residual coding (rows: 0 1 4 5 / 2 3 6 7 / 8 9 12 13 / 10 11 14 15), with a Hadamard transform on the luma DC coefficients for the Intra_16x16 macroblock type; chroma Cb and Cr 2x2 DC blocks (16, 17, Hadamard) and 4x4 AC blocks (18-21 and 22-25, integer DCT)]
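The luma 4×4 block numbering visits the four 8×8 quadrants in raster order and, inside each quadrant, the four 4×4 blocks in raster order. The helper below (our own name) converts a block index into the position of its top-left sample in the macroblock.

```python
def luma4x4_block_origin(blk_idx):
    """(x, y) of the top-left luma sample of 4x4 block blk_idx (0..15) in the MB."""
    quad, sub = divmod(blk_idx, 4)           # 8x8 quadrant, then 4x4 block inside it
    x = 8 * (quad % 2) + 4 * (sub % 2)
    y = 8 * (quad // 2) + 4 * (sub // 2)
    return x, y


# Rebuild the block-number layout of the macroblock, row by row.
layout = [[0] * 4 for _ in range(4)]
for idx in range(16):
    x, y = luma4x4_block_origin(idx)
    layout[y // 4][x // 4] = idx
```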
The H.264/AVC standard uses transform coding to code the prediction residue.
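4×4 blocks using a separable transform with properties similar to a 4×4 DCT
4×4 Integer DCT Transform
$$Y = T_v\,X\,T_h^{T}, \qquad T_h = T_v = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$$
where X is the 4×4 block of prediction residue and Y the block of transform coefficients.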
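The 4×4 integer transform uses the matrix with rows (1, 1, 1, 1), (2, 1, -1, -2), (1, -1, -1, 1) and (1, -2, 2, -1), applied to the columns and then to the rows of the residue block. The sketch below computes only this integer core; the per-coefficient scaling that makes it DCT-like is folded into quantization in H.264/AVC and is not shown. Names are ours.

```python
C = [[1, 1, 1, 1],
     [2, 1, -1, -2],
     [1, -1, -1, 1],
     [1, -2, 2, -1]]


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def core_transform(x):
    """Integer-only forward core transform Y = C X C^T of a 4x4 residue block."""
    return matmul(matmul(C, x), transpose(C))


# The rows are orthogonal but not equally scaled: C C^T = diag(4, 10, 4, 10).
gram = matmul(C, transpose(C))
```

Using only small integers (multiplications by 2 are shifts) avoids the encoder/decoder mismatch caused by floating-point DCT implementations.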
rather substantial bitrate reduction.
factor while inverse quantization (reconstruction) corresponds to the multiplication of each coefficient by the same factor (there is a quantization error involved ...).
factor for all the transform coefficients in the MB; some changes in this respect were made later.
each MB indexed through the quantization parameter (QP) using a table which defines the relation between QP and Qstep.
approximately 12.5% in the bitrate for an increment of 1 in the quantization parameter value, QP.
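The QP-to-Qstep table can be summarised compactly: Qstep doubles for every increment of 6 in QP, with six base values for QP = 0 to 5, so each QP increment raises Qstep by about 12%. The sketch below encodes the values commonly tabulated for H.264/AVC; Qstep is a conceptual step size, since the standard itself implements quantization through integer scaling tables. Names are ours.

```python
QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)   # Qstep for QP = 0..5


def qstep(qp):
    """Quantization step size for QP in 0..51 (8-bit video)."""
    assert 0 <= qp <= 51
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))   # doubles every 6 QP steps
```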
There are two building blocks within the H.264/AVC architecture which can be a source of blocking artifacts:
(DCTs) in intra and inter frame prediction error coding. Coarse quantization
block boundaries.
Motion compensated blocks are generated by copying interpolated pixel data from different locations of possibly different reference frames. Since there is almost never a perfect fit for this data, discontinuities on the edges of the copied blocks of data typically arise. Additionally, in the copying process, existing edge discontinuities in reference frames are carried into the interior
Although the small 4×4 sample transform size used in H.264/MPEG-4 AVC somewhat reduces the problem, a deblocking filter is still an advantageous tool to maximize coding performance.
There are two main approaches in integrating deblocking filters into video codecs, as post filters or as loop filters.
and thus are not normative in the standardization process. Because their use is
used as reference frames for motion compensation of subsequent coded
filtering to stay in synchronization with the encoder. Naturally, a decoder can still perform post filtering in addition to the loop filtering if found necessary in a specific application.
The H.264/AVC standard specifies the use of an adaptive deblocking filter which operates at the block edges with the target to increase the final subjective and objective qualities. The filter performs simple operations to detect and analyze artifacts on coded block boundaries and attenuates those by applying a selected filter.
since the filtered blocks are afterwards used as references for motion-compensated prediction (filter in the loop). This filter performs better than a post-processing filter (not in the loop and thus not normative).
subjective quality.
after prediction, this means reducing the bitrate for the same target quality.
the image and those created by quantization of the DCT coefficients. To preserve image sharpness, the true edges should be left unfiltered as much as possible while filtering artificial edges to reduce their visibility.
the edges of 2 blocks should only be filtered if it can be attributed to quantization;
filtered.
The filter is adaptive to the content, essentially removing the block effect without unnecessarily smoothing the image:
sequence.
coding (Intra or Inter), the motion and the coded residues.
filtering for each individual sample.
blocking artifacts are mainly due to intra and prediction error coding and are to a smaller extent caused by block motion compensation.
corresponding value is assigned to Bs. For Bs = 0, no sample is filtered, while for Bs = 4 the filter reduces the block effect the most.
is allocated, at the decoder, to every edge between two 4×4 luminance sample blocks to define the filter strength. The value depends on the modes and coding conditions
filtering, whereas a value of 0 means no filtering is applied on this specific edge. In the standard mode of filtering, which is applied for edges with Bs from 1 to 3, the value of Bs affects the maximum modification of the sample values.
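The Bs value for an edge between two 4×4 luma blocks, p and q, follows a priority list: intra coding first, then coded residue, then differences in motion. The sketch below is a simplification for frame (non-field) coding. It compares references as ordered tuples and motion vectors pairwise (the standard's rules for bi-predicted blocks are more detailed), and it treats a difference of 4 quarter samples, i.e. one integer sample, as significant. The class and function names are ours.

```python
from dataclasses import dataclass


@dataclass
class BlockInfo:
    intra: bool
    nonzero_coeffs: bool
    refs: tuple     # identities of the reference pictures used
    mvs: tuple      # one (x, y) motion vector per reference, quarter-sample units


def boundary_strength(p: BlockInfo, q: BlockInfo, mb_edge: bool) -> int:
    if p.intra or q.intra:
        return 4 if mb_edge else 3         # strongest filtering on intra MB edges
    if p.nonzero_coeffs or q.nonzero_coeffs:
        return 2                           # coded residue on either side
    if p.refs != q.refs or len(p.mvs) != len(q.mvs):
        return 1                           # different references or number of MVs
    for (px, py), (qx, qy) in zip(p.mvs, q.mvs):
        if abs(px - qx) >= 4 or abs(py - qy) >= 4:
            return 1                       # motion differs by one sample or more
    return 0                               # edge left unfiltered
```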
chrominance on each side of the edge may be modified by the filtering process.
three conditions all hold
are dependent on the average quantization parameter (QP) employed over the edge, as well as on encoder-selected offset values that can be used to control the properties of the deblocking filter at the slice level.
filtering to the general quality of the reconstructed picture prior to filtering.
[Figure: samples p2, p1, p0 and q0, q1, q2 on either side of a block edge]
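The three conditions compare sample differences across the edge with two thresholds, α and β, which grow with QP: |p0 - q0| < α, |p1 - p0| < β and |q1 - q0| < β. The sketch below shows that test and the standard-mode (Bs 1 to 3) update of p0 and q0. Here α, β and the clipping bound tc are plain parameters; in the standard they come from QP-indexed tables that are not reproduced here. The filtering of p1/q1 and the strong Bs = 4 filter are omitted, and the names are ours.

```python
def clip3(lo, hi, v):
    return max(lo, min(hi, v))


def filter_edge_normal(p, q, alpha, beta, tc, bit_depth=8):
    """p = (p0, p1, p2), q = (q0, q1, q2); returns the new (p0, q0) for Bs in 1..3."""
    p0, p1, _ = p
    q0, q1, _ = q
    if not (abs(p0 - q0) < alpha and abs(p1 - p0) < beta and abs(q1 - q0) < beta):
        return p0, q0              # large step: probably a real edge, keep it sharp
    delta = clip3(-tc, tc, ((q0 - p0) * 4 + (p1 - q1) + 4) >> 3)
    max_val = (1 << bit_depth) - 1
    return clip3(0, max_val, p0 + delta), clip3(0, max_val, q0 - delta)
```

The tc clipping bound is what makes Bs control the maximum modification of the sample values.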
H.264/AVC Deblocking: Subjective Result for Intra Coding at 0.28 bit/sample
[Figure: 1) without filter; 2) with H.264/AVC deblocking]
H.264/AVC Deblocking: Subjective Result for Strong Inter Coding
[Figure: 1) without filter; 2) with H.264/AVC deblocking]
SOLUTION 1
transform coefficients
coefficients
SOLUTION 2 (5-15% less bitrate)
1 1 1 1 1 0 0 0 …
Complexity (memory and computation) typically increases 4× at the encoder and 3× at the decoder relative to MPEG-2 Video, Main profile. Problematic aspects:
access)
among others, the vendors of H.264/AVC products and services are expected to pay patent licensing royalties for the patented technology that their products use.
applying to this standard is a private
affiliated in any way with the MPEG standardization organization); MPEG LA also administers patent pools for MPEG-2 Part 1 Systems, MPEG-2 Part 2 Video, MPEG-4 Part 2 Video, and other technologies.
Decoder-Encoder Royalties
(“unit”) begin at US $0.20 per unit after the first 100,000 units each year. There are no royalties on the first 100,000 units each year. Above 5 million units per year, the royalty is US $0.10 per unit.
than 50% owned subsidiaries) is $3.5 million per year in 2005-2006, $4.25 million per year in 2007-08 and $5 million per year in 2009-10.
an Enterprise selling decoders or encoders both (i) as end products under its own brand name to end users for use in personal computers and (ii) for incorporation under its brand name into personal computers sold to end users by other licensees, also may pay royalties
limited to $10.5 million per year in 2005-2006, $11 million per year in 2007-2008 and $11.5 million per year in 2009-2010.
adoption and start-up, the License will provide a grace period in which no royalties will be payable on decoders and encoders sold before January 1, 2005.
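As a worked example of the unit-based royalties above, the sketch below computes the annual amount for one enterprise. It assumes the $0.10 rate applies only to the units above 5 million (a marginal reading of the terms) and takes the annual cap as a parameter, e.g. $3.5 million for 2005-2006. Amounts are kept in integer cents to avoid floating-point rounding. This is an illustration, not a statement of the actual license terms.

```python
def annual_unit_royalty_cents(units, cap_dollars):
    """Annual decoder/encoder royalty in cents under the assumed tiering."""
    free_units = 100_000
    tier_limit = 5_000_000
    tier1 = max(0, min(units, tier_limit) - free_units)   # $0.20 per unit
    tier2 = max(0, units - tier_limit)                     # $0.10 per unit
    cents = 20 * tier1 + 10 * tier2
    return min(cents, cap_dollars * 100)                   # annual enterprise cap


# 1.1 million units: 1,000,000 billable units at $0.20 = $200,000
example = annual_unit_royalty_cents(1_100_000, 3_500_000)
```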
for on title-by-title basis, e.g., PPV, VOD, or digital download, where viewer determines titles to be viewed or number of viewable titles are otherwise limited), there are no royalties up to 12 minutes in length. For AVC video greater than 12 minutes in length, royalties are the lower of (a) 2% of the price paid to the licensee from licensee’s first arms length sale or (b) $0.02 per title. Categories of licensees include (i) replicators of physical media, and (ii) service/content providers (e.g., cable, satellite, video DSL, internet and mobile) of VOD, PPV and electronic downloads to end users.
title-by-title), no royalties are payable by a system (satellite, internet, local mobile
systems with greater than 100,000 AVC video subscribers, the annual participation fee is $25,000 per year up to 250,000 subscribers, $50,000 per year for greater than 250,000 AVC video subscribers up to 500,000 subscribers, $75,000 per year for greater than 500,000 AVC video subscribers up to 1,000,000 subscribers, and $100,000 per year for greater than 1,000,000 AVC video subscribers.
AVC video to markets of 100,000 or fewer households. For over-the-air free broadcast AVC video to markets of greater than 100,000 households, royalties are $10,000 per year per local market service (by a transmitter or transmitter simultaneously with repeaters, e.g., multiple transmitters serving one station).
developing, no royalties will be payable for internet broadcast services (non-subscription, not title-by-title) during the initial term of the license (which runs through December 31, 2010) and then shall not exceed the over-the-air free broadcast TV encoding fee during the renewal term.
and greater than 50% owned subsidiaries) is $3.5 million per year in 2006-2007, $4.25 million in 2008-09 and $5 million in 2010.
encourage early marketplace adoption and start-up, the License will provide for a grace period in which no Participation Fees will be payable for products or services sold before January 1, 2006.
[Figure: image and video coding standards: JPEG, JPEG-LS, JPEG 2000, MJPEG 2000, JPEG XR, AIC ?; H.261, H.263, H.264/AVC/SVC/MVC; MPEG-1 Video, H.262/MPEG-2 Video, MPEG-4 Visual, RVC, HEVC]
Scalability is a functionality regarding the decoding of parts of the coded bitstream, ideally:
1. while achieving an RD performance at any supported spatial, temporal, or SNR resolution that is comparable to single-layer coding at that particular resolution, and
2. without significantly increasing the decoding complexity.
For each spatial resolution (except the lowest), the scalable stream requires a bitrate overhead relative to the corresponding non-scalable stream, although its total bitrate is lower than the total simulcasting bitrate.
[Figure: bitrates of simulcast non-scalable streams (CIF, SDTV, HDTV) versus a spatially scalable stream (CIF, SDTV, HDTV), showing the simulcasting overhead and the scalability overhead]
The SVC standard objective was to enable the encoding of a high-quality video bit stream that contains one or more subset bit streams that can themselves be decoded with a complexity and reconstruction quality similar to that achieved using the existing H.264/AVC design with the same quantity of data as in the subset bit stream.
transmission environments as well as bitrate, format, and power adaptation; this should provide enhancements to transmission and storage applications.
already defined codecs that were not successful due to the characteristics of traditional video transmission systems, the significant loss in coding efficiency, and the large increase in decoder complexity in comparison with non-scalable solutions.
subset of the scalable bit stream.
decoding that scales with the decoded spatio-temporal resolution and bitrate.
case).
varying capability
Spatio-Temporal-Quality Cube
[Figure: the global bit-stream spans spatial resolution (QCIF, CIF, 4CIF), temporal resolution (7.5, 15, 30, 60) and bitrate (quality, SNR; from low to high)]
[Figure: SVC encoder structure: spatial decimation feeds each layer; each layer uses hierarchical MCP & intra prediction, base layer coding of texture and motion, and progressive SNR refinement texture coding, with inter-layer prediction between layers; the lowest layer is an H.264/AVC compatible encoder producing an H.264/AVC compatible base layer bit-stream; all layers are multiplexed into the scalable bit-stream]
identifiers in the NAL unit header
and deblocking
target layer
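Since each SVC NAL unit carries dependency (D), temporal (T) and quality (Q) identifiers in its header, extracting a sub-stream for one point of the spatio-temporal-quality cube amounts to filtering NAL units. The sketch below is simplified: NAL units are modelled as plain tuples rather than parsed from header bits, and it keeps every NAL unit whose three identifiers do not exceed the target. A conforming extractor applies further rules that are not modelled here.

```python
from typing import NamedTuple


class NalUnit(NamedTuple):
    dependency_id: int   # D: spatial / coarse-grain quality layer
    temporal_id: int     # T: temporal level
    quality_id: int      # Q: quality refinement
    payload: bytes


def extract_operation_point(nal_units, max_d, max_t, max_q):
    """Keep the NAL units needed (in this simplified model) for target (D, T, Q)."""
    return [n for n in nal_units
            if n.dependency_id <= max_d
            and n.temporal_id <= max_t
            and n.quality_id <= max_q]
```

No re-encoding is involved: adaptation is a matter of dropping packets, which is why it suits network nodes and heterogeneous receivers.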
The main goal of inter layer prediction is to enable the usage of as much lower layer information as possible for improving the RD performance
distortion performance.
bitrate increase relative to non-scalable H.264/AVC coding, at the same fidelity, can be as low as 10% for dyadic spatial scalability. The results typically become worse as spatial resolution of both layers decreases and results improve as spatial resolution increases.
increase relative to non-scalable H.264/AVC coding, at the same fidelity, can be as low as 10% for all supported rate points when spanning a bitrate range with a factor of 2-3 between the lowest and highest supported rate point.
From IEEE Transactions on Circuits and Systems for Video Technology, September 2007.
[Segall& Sullivan, T-CSVT, Sept’07]
already with some adoption
embracing SVC for DTV broadcast services
possible …
ATSC-M/H
by C. Wheatstone
stereoscope
anaglyph glasses
technique
projection
using polarization
short films
IMAX screens by J. Cameron
times more revenue in 3D than 2D
Source: DisplaySearch, 3D Display Technology and Market Forecast Report
technology
transition costs or turned off by viewing discomfort or fatigue
providing interoperability through the delivery chain and taking into consideration the constraints imposed by each delivery channel
an illusion of depth for the scene
which allow navigating the 3D scene by changing the viewpoint and view direction within certain ranges (each view may be stereo)
video representation or its important stereo-view special case.
available in 2D projections; this is why images on a television screen and at the cinema make sense. Perceptual cues for 3D perception include:
maintain a clear image (focus) on an object as its distance changes.
presenting a slightly different image to each eye. It is important to note that the motion parallax cue is still not satisfied with stereoscopy and, therefore, the illusion of depth is incomplete.
Free viewpoint systems require the acquisition of multiple scene views taken from different angles, allowing the user to navigate around the scene.
formats are closely coupled to 3D display types and application scenarios.
format/standard would be highly desirable to support any 3D video application in an efficient way, while decoupling content creation from display and application.
earlier this decade: a slow start, then a rapid ascent in sales once enough content exists to attract mainstream buyers.
independent compression of each view.
displayed by starting the decoder at a random access point and decoding a relatively small quantity of data on which that image may depend.
reduced in quality to a degree commensurate with the quantity of data in the subset used for the decoding process – although accessing only a portion of a bitstream.
limited subset of the set of encoded views.
‘base view’ is decodable by an ordinary (non-MVC) H.264/AVC decoder.
encoding quality of the various views.
(ignoring they are like ‘brothers’ due to the interview redundancy).
as H.264/AVC is more likely.
2010 World Cup games.
‘as usual’:
half represent the right view; thus, each coded view has half the resolution of the full coded frame.
[Figure: Left and Right views packed into each frame, shown over time]
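A minimal sketch of side-by-side frame packing, where each view keeps half of its horizontal resolution so both fit in one frame of the original size. It uses plain column decimation on lists of rows; a real system would low-pass filter before decimation and interpolate after unpacking. The function names are ours, and the frame width is assumed to be even.

```python
def pack_side_by_side(left, right):
    """Pack two equally sized views (lists of rows) into one frame of the same size."""
    assert len(left) == len(right) and len(left[0]) == len(right[0])
    assert len(left[0]) % 2 == 0
    # Keep every other column of each view: left half, then right half.
    return [l_row[::2] + r_row[::2] for l_row, r_row in zip(left, right)]


def unpack_side_by_side(frame):
    """Split a side-by-side frame back into two half-width views."""
    half = len(frame[0]) // 2
    return [row[:half] for row in frame], [row[half:] for row in frame]
```

The packed frame looks like ordinary 2D video to the encoder, transport and decoder, which is the compatibility advantage of this format.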
Advantages
compressed with existing encoders, transmitted through existing channels, and decoded by existing receivers)
(some increase expected)
and consumer interfaces)
Drawbacks
native display formats (further quality degradation)
characteristics
views are coded exploiting their interview redundancy.
coding solutions with increased compression efficiency.
Combined temporal and interview prediction
needed)
depth range
Multi-view video (MVV) refers to a set of N temporally synchronized video streams coming from cameras capturing the same real scenery from different viewpoints.
eye, in order to generate a depth impression
[Figure: multi-view video (VIEW-1, VIEW-2, VIEW-3) transmitted over a channel to TV/HDTV, stereo system and 3DTV receivers]
changes of the slice layer syntax and below and
multi-view.
inter-camera prediction to reduce the required bitrate.
include a base view, which is independently coded from other non-base views.
view is reduced around 30%
are about 25%
Many prediction structures are possible to exploit the interview redundancy, each trading off memory, delay, computation and coding efficiency differently.
[Figure: interview prediction structure of the MPEG-2 Video Multi-view profile]
Pictures are not only predicted from temporal references, but also from interview references.
The prediction is adaptive, so the best predictor among temporal and interview references can be selected on a block basis in terms of rate-distortion cost.
[Figure: MVC prediction structure (axes: time and view)]
The MVC standard enables interview prediction, as well as supporting
Interview prediction is a key feature of the MVC design, and it is enabled in a way that makes use of the flexible reference picture management capabilities that had already been designed into H.264/AVC.
It also supports backward compatibility with existing legacy systems by structuring the MVC bitstream to include a compatible ‘base view’.
Base View with GOP size 6
For complexity reasons, the MVC design does not allow the prediction of a picture in one view at a given time using a picture from another view at a different time.
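The constraint above can be read as a rule for building the candidate reference set of a picture: temporal references come from the same view at other time instants, and interview references come from other views at the same time instant only. The sketch below models pictures as (view, time) pairs, with a hypothetical view_deps mapping standing in for the view dependency information signalled in the bitstream; it ignores reference list ordering and size limits.

```python
def mvc_references(view, time, decoded, view_deps):
    """Candidate references for picture (view, time).

    decoded   -- set of (view, time) pictures already decoded
    view_deps -- hypothetical {view: [views it may predict from]} mapping
    """
    # Temporal references: same view, other time instants.
    temporal = sorted((v, t) for (v, t) in decoded if v == view and t != time)
    # Interview references: other views, same time instant only.
    interview = [(v, time) for v in view_deps.get(view, []) if (v, time) in decoded]
    return temporal + interview
```

Because interview references enter the same reference lists as temporal ones, the macroblock-level decoding process does not need to know which kind of reference it is using.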
The core macroblock-level and lower-level decoding modules of an MVC decoder are the same, regardless of whether a reference picture is a temporal or an interview
single-layer AVC hardware;
unit type)
multi-view extension of the sequence parameter set (SPS) defined by H.264/AVC.
identification; ii) view dependency information; and iii) level index for operation points.
reference or multi-view reference
There are two MVC profiles with support for more than
H.264/AVC High profile:
Multiview High profile: supports multiple views and does not support interlaced coding tools.
Stereo High profile: limited to two views, but does support interlaced coding tools.
Levels impose constraints on the MVC bitstreams to establish bounds on the necessary decoder resources and complexity. The level limits include limits on the amount of frame memory required for the decoding of a bitstream, the maximum throughput in terms of macroblocks per second, maximum picture size, overall bit rate, etc.
Simulcast versus inter-view prediction comparison
8 views (640×480); considering the total rate for all views, MVC achieves ~25% bit rate savings relative to simulcast.
[Figure: PSNR (dB) versus bit rate (kbit/s), simulcast versus MVC; panels: Ballroom and Race1]
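A saving such as the ~25% above is read off two rate-distortion curves by comparing the rates each codec needs for the same PSNR. The following minimal sketch compares the curves at a single quality point using linear interpolation between measured points. Averaged measures such as the Bjøntegaard delta rate are a common alternative. The curves below are made up for illustration and are not read from the plots.

```python
def rate_at_psnr(curve: list, target_psnr: float) -> float:
    """Bit rate needed to reach target_psnr, by linear interpolation.

    curve: (rate_kbps, psnr_db) points sorted by increasing rate.
    """
    for (r0, p0), (r1, p1) in zip(curve, curve[1:]):
        if p0 <= target_psnr <= p1:
            if p1 == p0:
                return r0
            return r0 + (target_psnr - p0) * (r1 - r0) / (p1 - p0)
    raise ValueError("target PSNR outside the range of the curve")


def rate_saving(reference_curve: list, test_curve: list, target_psnr: float) -> float:
    # Relative saving of the test codec versus the reference at equal quality
    r_ref = rate_at_psnr(reference_curve, target_psnr)
    r_test = rate_at_psnr(test_curve, target_psnr)
    return (r_ref - r_test) / r_ref


# Made-up curves for illustration only (not read from the plots)
simulcast = [(400, 34.0), (800, 37.0), (1200, 39.0)]
mvc = [(300, 34.0), (600, 37.0), (900, 39.0)]
print(rate_saving(simulcast, mvc, 37.0))  # 0.25
```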
MVC achieves stereo quality comparable to simulcast with as little as 25% of the base view rate for the dependent view.
[Figure: Mean Opinion Score for the original, simulcast (AVC+AVC) and MVC; base view fixed at 12 Mbit/s, dependent view at varying percentages of the base view rate]
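The arithmetic behind the stereo result is short. With the base view at 12 Mbit/s and the dependent view at 25% of that rate, the stereo pair needs 15 Mbit/s. The comparison with simulcast below assumes that both independent AVC streams are coded at 12 Mbit/s. The slide does not state this explicitly, so treat it as an assumption.

```python
def stereo_rate(base_mbps: float, dependent_fraction: float) -> float:
    # Total rate when the dependent view is coded at a fraction of the base view rate
    return base_mbps * (1.0 + dependent_fraction)


base = 12.0                          # base view rate used in the test (Mbit/s)
mvc_total = stereo_rate(base, 0.25)  # 12 + 3 = 15 Mbit/s
simulcast_total = 2 * base           # assumption: both AVC streams coded at 12 Mbit/s
print(mvc_total, 1 - mvc_total / simulcast_total)  # 15.0 0.375
```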
achieve a typical compression gain of about 50%, largely at the cost
smaller) block size motion compensation, multiple reference frames, smaller block size transform, deblocking filter in the prediction loop, and improved entropy coding.
in video coding and it is currently being adopted by a growing number of organizations, companies and consortia.
relevance is already growing considering the increasing overall system heterogeneity.
3D video ...
[Figure: image and video coding standards: JPEG, JPEG-LS, JPEG 2000, MJPEG 2000, JPEG XR, AIC?, H.261, H.263, H.264/AVC/SVC/MVC, MPEG-1 Video, H.262/MPEG-2 Video, MPEG-4 Visual, HEVC, RVC]
| Standard | Year | Main applications | Profiles | Main bit rates | Frame types | Ref. frames | Transform | Number of motion vectors (if any) | Motion vector precision | Entropy coding | Deblocking filter |
|---|---|---|---|---|---|---|---|---|---|---|---|
| H.261 | 1988 | Videotelephony and videoconference | No | p×64 kbit/s | … | … | DCT | 1 per MB | Integer pel | Huffman based | In loop |
| MPEG-1 Video | 1991 | Digital storage in CD-ROM | No | Around 1-1.2 Mbit/s | I, P, B and D | 0-2 | DCT | 1 or 2 per MB (P and B) | Half pel | Huffman based | Out of the loop |
| H.262/MPEG-2 Video | 1994 | Digital TV and DVD | Yes, most used is Main Profile | From 2 to 10 Mbit/s | I, P and B | 0-2 | DCT | 1 or 2 per MB (2 to 4 for interlaced video) | Half pel | Huffman based | Out of the loop |
| H.263 | 1995 | Videotelephony and videoconference and more | Only in extensions | From very low rates to around 1 Mbit/s | I, P and B | 0-2 | DCT | 1 or 2 per MB (4 in the optional modes) | Half pel | Huffman based | Out of the loop |
| MPEG-4 Visual | 1998 | Large range with … | Yes, most used are Simple and Advanced Simple | Very large range using levels | I, P and B | 0-2 | DCT | 1 or 2 per MB (4 in the optional modes); also global motion vectors | 1/4 pel | Huffman based; arithmetic coding for the shape | Out of the loop |
| H.264/AVC | 2004 | Large range, from mobile to Blu-ray | Yes, most used are Baseline, Main and High | Very large range using levels | I, P, generalized B, SP and SI | Up to 16 | Integer DCT | 1 to 16 per MB (P slices) and 1 to 32 (B slices) | 1/4 pel | CAVLC and CABAC | In loop |
| SVC | 2007 | Robust delivery, graceful degradation, broadcasting | Yes | Very large range using layers | I, P and generalized B | Up to 16 | Integer DCT | 1 to 16 per MB (?) | 1/4 pel | CAVLC and CABAC | In loop |
| MVC | 2009 | Stereo TV, free viewpoint TV | Yes | Very large range using levels | I, P, B | Up to 16 | Integer DCT | 1 to 16 per MB (?) | 1/4 pel | CAVLC and CABAC | In loop |
Audio
Kbps), Protected AAC (from iTunes Store), MP3 (16 to 320 Kbps), MP3 VBR, Audible (formats 2, 3, and 4), Apple Lossless, WAV, and AIFF
Video
H.264/AVC video, up to 1.5 Mbps, 640 by 480 pixels, 30 frames per second, Low-Complexity version of the H.264/AVC Baseline Profile with AAC-LC audio up to 160 Kbps, 48kHz, stereo audio in .m4v, .mp4, and .mov file formats;
H.264/AVC video, up to 2.5 Mbps, 640 by 480 pixels, 30 frames per second, Baseline Profile up to Level 3.0 with AAC-LC audio up to 160 Kbps, 48kHz, stereo audio in .m4v, .mp4, and .mov file formats;
480 pixels, 30 frames per second, Simple Profile with AAC-LC audio up to 160 Kbps, 48kHz, stereo audio in .m4v, .mp4, and .mov file formats
"Amplifiers at Bolling Field, 1921." Two giant horns with ear tubes, evidently designed to listen for approaching aircraft.
Audio
AAC, MP3, MP3 VBR, Audible (formats 1, 2, and 3), Apple Lossless, AIFF, and WAV
Video
H.264/AVC video, up to 1.5 Mbps, 640 by 480 pixels, 30 frames per second, Low-Complexity version of the H.264 Baseline Profile with AAC-LC audio up to 160 Kbps, 48kHz, stereo audio in .m4v, .mp4, and .mov file formats;
H.264/AVC video, up to 768 Kbps, 320 by 240 pixels, 30 frames per second, Baseline Profile up to Level 1.3 with AAC-LC audio up to 160 Kbps, 48kHz, stereo audio in .m4v, .mp4, and .mov file formats;
480 pixels, 30 frames per second, Simple Profile with AAC-LC audio up to 160 Kbps, 48kHz, stereo audio in .m4v, .mp4, and .mov file formats