1
H.264/AVC Standard
2
History
- Objectives:
- 50% bit rate savings compared to MPEG-2
- High quality video at both low and high bit rates:
- 64kbps to 240Mbps
- Network-friendly: more error resilient tools
- Support both conversational and non-conversational applications:
- Conversational: video conference
- Non-conversational: storage, broadcast, streaming
- 1998: Call for proposals for H.26L issued by the ITU-T VCEG (Video Coding Experts Group)
- Oct. 1999: First draft design
- Dec. 2001: ITU and ISO formed the Joint Video Team (JVT)
- Mar. 2003: approved
- ITU-T H.264 and ISO/IEC MPEG-4 Part 10 Advanced Video Coding (AVC)
- Jul 2004: Fidelity Range Extensions (FRExt)
- Current: Scalability Extensions
3
Terminology
- Block: An MxN (M-column by N-row) array of samples, or an MxN array of transform coefficients (e.g., coded data corresponding to a 4x4 sample region of the video frame).
- Macro-block: A 16x16 block of luma samples and two
corresponding blocks of chroma samples.
- sub-macroblock: One quarter of the samples of a macroblock,
i.e., an 8x8 luma block and two corresponding chroma blocks of which one corner is located at a corner of the macroblock.
- Slice: Contains an integral number of MBs, from 1 (1MB per
slice) to the total number of MBs in a picture (1 slice per picture)
4
Terminology
- Reference Picture: the previously encoded frames that may
be used as a reference in motion estimation (i.e. A reference picture contains samples that may be used for inter prediction in the decoding process of subsequent pictures in decoding order.)
- Motion Vector: A two-dimensional vector used for inter
prediction that provides an offset from the coordinates in the decoded picture to the coordinates in a reference picture.
- Profile: A set of coding functions; each profile specifies what is required of an encoder or decoder that complies with it.
- Level: Performance limits for CODECs are defined by a set of
Levels, each placing limits on parameters such as sample processing rate, picture size, coded bitrate and memory requirements.
5
Key Idea in Video Coding
- Predict each frame from the previous frame and only encode the prediction error:
- The prediction error has smaller energy and is easier to compress
- Predict at what level?
- Frame level: good for camera panning
- Block level: too much motion information to code
- Macro-block (MB) level (16 x 16 pixels): most widely used
6
Scope of Picture and Video Coding Standardization
- Only the Syntax and Decoder are standardized
7
Coded Data Format
- H.264 makes a distinction between a Video Coding Layer (VCL) and a
Network Abstraction Layer (NAL).
- The output of the encoding process is VCL data (a sequence of bits
representing the coded video data) which are mapped to NAL units prior to transmission or storage.
- Each NAL unit contains a Raw Byte Sequence Payload (RBSP), a set of
data corresponding to coded video data or header information.
- A coded video sequence is represented by a sequence of NAL units (see the figure below) that can be transmitted over a packet-based network or a bitstream transmission link, or stored in a file.
8
NAL
H.264 consists of a Video Coding Layer (VCL) and a Network Abstraction Layer (NAL).
A NAL unit includes a header and a payload. The header specifies the NAL unit type and the payload contains the related data.
The NAL design specified in the Recommendation is appropriate for the adaptation of H.264 over RTP/UDP/IP, H.324/M, MPEG-2 transport, and H.320.
9
NALU
- The NAL header contains three fields:
1. The Nal_Ref_Idc (NRI) contains two bits that indicate the priority of the NALU payload, where 11 is the highest transport priority, followed by 10, then by 01 and, finally, 00 is the lowest.
10
NALU
2. The NALU type (T) is a 5-bit field that characterizes the NALU as one of 32 different types. Types 1–12 are currently defined by H.264. Types 24 to 31 are made available for uses outside of H.264.
3. The forbidden_bit, finally, is specified to be zero in H.264 encoding. Network elements can set this bit to 1 when they identify bit errors in the NALU.
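The three header fields can be unpacked from the first byte of a NAL unit. The sketch below assumes the usual one-byte layout (forbidden bit in the most significant bit, then the two NRI bits, then the five type bits); the names `NalHeader` and `parse_nal_header` are illustrative and do not come from the slides.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    forbidden_bit: int   # 1 bit, specified to be 0 by the encoder
    nal_ref_idc: int     # 2 bits, 0 (lowest priority) .. 3 (highest)
    nal_unit_type: int   # 5 bits, 0 .. 31


def parse_nal_header(first_byte: int) -> NalHeader:
    """Split the one-byte NAL header into its three fields."""
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("NAL header must be a single byte")
    return NalHeader(
        forbidden_bit=(first_byte >> 7) & 0x01,
        nal_ref_idc=(first_byte >> 5) & 0x03,
        nal_unit_type=first_byte & 0x1F,
    )


# 0x65 = 0b0_11_00101: forbidden bit 0, NRI 3 (binary 11), type 5
header = parse_nal_header(0x65)
print(header)
```

A network element that detects bit errors could set the top bit (`first_byte | 0x80`), which this parser would then report as `forbidden_bit == 1`.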
11
Nal_Ref_Idc (NRI)
12
H.264/AVC Profiles
- The AVC scheme includes different profiles:
- Baseline Profile – for low-delay end-to-end applications.
- Main Profile – for broadcasting application at SD (Standard
Definition) level.
- Extended Profile – for mobile applications and e-streaming.
- High Profiles – address the needs of the most demanding applications, such as contribution and distribution of content, studio editing and post processing; introduced as the "fidelity range extensions" or FRExt:
- The High profile (HP): 8 bits per sample, 4:2:0 sampling
- The High 10 profile (Hi10P): 8-10 bits per sample, 4:2:0 sampling
- The High 4:2:2 profile (H422P): 10 bits per sample, 4:2:2 sampling
- The High 4:4:4 profile (H444P): 12 bits per sample, supporting up to 4:4:4 chroma sampling, and additionally supporting efficient lossless region coding.
13
Baseline
- Intra coding
- Inter coding using I-Slice and P-Slice
- Entropy coding using context-adaptive variable length codes (CAVLC)
14
Main
- Interlaced video
- Inter coding using B-Slices
- Inter coding using Weighted Prediction
- Entropy coding using Context-Based Arithmetic Coding (CABAC)
15
Extended
- SI and SP Slices to enable efficient switching between coded bit streams
- Improved error resilience (data partitioning)
- Does not support CABAC (interlaced video is supported)
16
Applications
Baseline: video telephony, video
conferencing, wireless communications
Main: television broadcasting, video
storage
Extended: streaming media applications
17
Video Format (Progressive)
18
Video Format (Interlaced)
19
Frame and Field Coding
From a coding-efficiency point of view, a decision needs to be made whether to compress the video as one single frame or as two separate fields
20
Reference Pictures
- Reference picture management:
- Uses a window of n frames
- Two lists: list0 and list1
21
New Added Features to H.264/AVC
- Intra-prediction
- Motion Estimation:
- Variable block sizes
- Multiple reference frames
- Sub-pixel Motion Estimation
- Image transform
- De-blocking filter
- Entropy coding
22
Basic Macroblock Coding Structure
[Figure: encoder block diagram. The input video signal is split into 16x16-pixel macroblocks; blocks shown are coder control (intra/inter), transform/scaling/quantization, scaling and inverse transform, de-blocking filter, intra-frame prediction, motion compensation, motion estimation and entropy coding, with the embedded decoder producing the output video signal. Control data, quantized transform coefficients and motion data feed the entropy coder.]
23
Encoder (Forward Path)
- An input frame or field Fn is processed in units of MBs.
- Each MB is encoded in intra or inter mode
- For each block in the MB, a prediction PRED (marked ‘P’ in the figure) is formed based on reconstructed picture samples.
24
Encoder (Forward Path)
- In Intra mode,
- PRED is formed from samples in the current slice that have previously been encoded, decoded and reconstructed (uF’n in the figures; note that unfiltered samples are used to form PRED).
- In Inter mode,
- PRED is formed by motion-compensated prediction from one or more reference picture(s) selected from a set of reference pictures
25
Encoder (Forward Path)
- P is subtracted from the current block to produce Dn (a residual block) that is
transformed and quantized to give X
- X: a set of quantized transform coefficients which are reordered and entropy
encoded.
- The entropy-encoded coefficients, together with side information (prediction
modes, quantization parameter, motion vector information, etc.) form the compressed bitstream
- The compressed bitstream is passed to a Network Abstraction Layer (NAL) for
transmission or storage.
26
Encoder (Reconstruction Path)
- The encoder decodes (reconstructs) every MB to provide a reference for further
predictions.
- The coefficients X are scaled (Q−1) and inverse transformed (T−1) to produce a
difference block D’n.
- The prediction block P is added to D’n to create a reconstructed block uF’n (a
decoded version of the original block; u indicates that it is unfiltered).
- A filter is applied to reduce the effects of blocking distortion and the
reconstructed reference picture is created from a series of blocks F’n.
27
Decoder
- The decoder receives a compressed bitstream from the NAL
- Entropy decodes the data elements to produce a set of quantized coefficients X.
- These are scaled and inverse transformed to give D’n (identical to the D’n
shown in the Encoder).
- Using the header information decoded from the bitstream, the decoder creates a
prediction block PRED, (identical to the original prediction PRED formed in the encoder).
- PRED is added to D’n to produce uF’n which is filtered to create each decoded
block F’n.
28
Intra-prediction
- Motivation: intra-frames are natural images, so they exhibit strong spatial correlation
- Macro-blocks in intra-coded frames are predicted based on previously-coded ones
- Above and/or to the left of the current block
- The macro-block may be divided into sixteen 4x4 sub-blocks, which are predicted in cascading fashion
- 9 modes for 4x4 and 4 modes for 16x16 size
29
Intra-prediction (cont'd)
[Figure: macroblock coding structure block diagram, highlighting intra-frame prediction]
- Directional spatial prediction (9 types for luma, 4 for chroma)
- e.g., Mode 3: diagonal down/right prediction: a, f, k, p are predicted by (A + 2Q + I + 2) >> 2
[Figure: prediction directions 1 to 8 and the sample layout]
Q A B C D E F G H
I a b c d
J e f g h
K i j k l
L m n o p
M N O P
30
Luma 4x4 Intra Modes
31
Luma 4x4 Intra Modes
d = round (B/4 + C/2 + D/4)
32
Luma 16x16 Intra Modes
33
Intra4x4 Prediction Example: Vertical
34
Intra4x4 Prediction Example: Horizontal
35
Intra4x4 Prediction Example: DC
36
Intra4x4 Prediction Example: Diagonal Down-Right
37
Optimal Intra4x4 Mode Selection
- Select the mode with the best R-D tradeoff.
- Full search method:
- Divide each MB into sixteen 4x4 blocks.
- For each 4x4 block:
- For each of the nine Intra_4x4 prediction modes:
- Predict the current 4x4 block by the current mode.
- Get the prediction residual.
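The full-search idea can be illustrated with a small sketch that covers only three of the nine Intra_4x4 modes (vertical, horizontal and DC) and, as a simplifying assumption, uses plain SAD of the residual as the cost instead of a true R-D cost. All function names are illustrative.

```python
MODES = ("vertical", "horizontal", "dc")


def predict_4x4(mode, above, left):
    """Build a 4x4 prediction from the 4 samples above and the 4 samples to the left."""
    if mode == "vertical":      # copy the row above downwards
        return [list(above) for _ in range(4)]
    if mode == "horizontal":    # copy the left column to the right
        return [[left[r]] * 4 for r in range(4)]
    if mode == "dc":            # rounded mean of the 8 neighbouring samples
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"unsupported mode: {mode}")


def sad_4x4(block, pred):
    """Sum of absolute differences between a block and its prediction."""
    return sum(abs(b - p) for rb, rp in zip(block, pred) for b, p in zip(rb, rp))


def best_intra_4x4_mode(block, above, left):
    """Try every mode in MODES and keep the one with the smallest cost."""
    costs = {m: sad_4x4(block, predict_4x4(m, above, left)) for m in MODES}
    return min(costs, key=costs.get), costs


above = [10, 20, 30, 40]
left = [12, 12, 12, 12]
block = [[10, 20, 30, 40] for _ in range(4)]   # columns continue the row above
best, costs = best_intra_4x4_mode(block, above, left)
print(best, costs)
```

Because every row of the example block repeats the samples above it, vertical prediction leaves a zero residual and is selected.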
38
Intra_16x16 Prediction
- Intra_16x16 prediction (4 modes)
- Predict the entire 16 x 16 block
- Suitable for smooth areas
- Four modes:
- 0: Vertical
- 1: Horizontal
- 2: DC
- 3: Plane
39
Optimal Intra16x16 Mode Selection
- Full search method:
For each Intra_16x16 prediction mode:
- Get prediction of the current MB.
- Find the prediction residual.
- Perform a 2D 4-point Hadamard transform for each 4x4 block.
- Extract the DC coefficients from the sixteen 4x4 blocks and apply the 2D 4-point Hadamard transform again to the 4x4 block of DC values.
- Cost estimation: Compute the sum of the absolute values of all the Hadamard transform coefficients.
end
Find the mode with the smallest cost as the best Intra_16x16 prediction mode for this MB.
- Decision between Intra_4x4 and Intra_16x16:
- Compare the costs of Intra_4x4 mode and Intra_16x16 mode to find the
best mode.
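The Hadamard-based cost for one Intra_16x16 candidate can be sketched as follows. This is a simplified reading of the steps above: normalisation factors that a real encoder may apply are omitted, and the function names are illustrative.

```python
H4 = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
    [1, -1, 1, -1],
]


def matmul4(a, b):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def hadamard_4x4(block):
    """2D 4-point Hadamard transform H * X * H (H4 is symmetric)."""
    return matmul4(matmul4(H4, block), H4)


def intra16x16_cost(residual):
    """Cost of a 16x16 residual: |AC| of each 4x4 block plus |Hadamard(DC block)|."""
    dc = [[0] * 4 for _ in range(4)]
    cost = 0
    for by in range(4):
        for bx in range(4):
            blk = [row[4 * bx:4 * bx + 4] for row in residual[4 * by:4 * by + 4]]
            t = hadamard_4x4(blk)
            dc[by][bx] = t[0][0]
            # AC part only: the DC term is handled by the second transform
            cost += sum(abs(v) for row in t for v in row) - abs(t[0][0])
    cost += sum(abs(v) for row in hadamard_4x4(dc) for v in row)
    return cost


flat_residual = [[1] * 16 for _ in range(16)]
zero_residual = [[0] * 16 for _ in range(16)]
print(intra16x16_cost(flat_residual), intra16x16_cost(zero_residual))
```

The mode with the smallest such cost would then be compared against the best Intra_4x4 cost, as described in the decision step above.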
40
Motion Estimation (ME)
- For each block, find the best match in the previous frame (reference
frame)
- Upper-left corner of the block being encoded: (x0, y0)
- Upper-left corner of the matched block in the reference frame: (x1, y1)
- Motion vector (dx, dy): the offset of the two blocks:
- (dx, dy) = (x1 – x0, y1 – y0)
- (x0, y0) + (dx, dy) = (x1, y1)
- The motion vector needs to be sent to the decoder.
41
Motion Compensation (MC)
- Given the reference frame and the motion vector, a prediction of the current frame can be obtained
- Prediction error: the difference between the current frame and the prediction.
- The prediction error is coded by DCT, quantization, and entropy coding.
42
GOP, I, P, and B Frames
- GOP: Group of pictures (frames).
- I frames (Key frames):
- Intra-coded frame, coded as a still image. Can be decoded directly.
- Used for GOP head, or at scene changes.
- I frames also improve the error resilience.
- P frames: (Inter-coded frames)
- Prediction-based coding, based on previous frames.
43
GOP, I, P, and B Frames
- B frames: bi-directionally interpolated prediction frames
- Predicted from both the previous frame and the next frame: more flexibility -> better prediction.
- Encoding order: 1 4 2 3 7 5 6
- Decoding order: 1 4 2 3 7 5 6
- Display order: 1 2 3 4 5 6 7
- Need more buffers
- Need buffer manipulations to display the frames in the correct order.
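The relation between display order and coding order can be sketched as below, assuming a simple GOP pattern in which each B frame is coded right after the next I or P frame it depends on. The pattern string and function name are illustrative.

```python
def coding_order(frame_types):
    """Map frame types in display order (e.g. 'IBBPBBP') to coding order.

    B frames are held back until the following I/P frame has been coded,
    because they are predicted from it.
    """
    order = []
    pending_b = []
    for number, frame_type in enumerate(frame_types, start=1):
        if frame_type == "B":
            pending_b.append(number)
        else:
            order.append(number)       # code the anchor frame first
            order.extend(pending_b)    # then the B frames that waited for it
            pending_b = []
    order.extend(pending_b)            # trailing B frames with no later anchor
    return order


order = coding_order("IBBPBBP")
print(order)                 # coding (= decoding) order
print(sorted(order))         # display order restored by reordering
```

The decoder has to buffer frames 4 and 7 until the B frames between the anchors have been decoded, which is the extra buffering mentioned above.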
44
Block Matching Algorithms for ME
- Each frame is split into 16x16 pel blocks (MBs); motion estimation is done for each macro-block.
- Search window (maximum movement) w: typically 8, 16 or 32
- Define a cost for finding the best match for each block in the previous frame:
- Mean Absolute Error (MAE) or Sum of Absolute Differences (SAD)
- Mean Square Error (MSE)
- Sum of Squared Errors (SSE)
- Calculate the motion vector (MV) between the current block and its counterpart in the previous frame
- Calculate the macroblock difference and send it
45
Cost Function
- The best match is found by minimizing the SAD (Sum of Absolute Differences) function, which is computed as:
$$SAD(s, c(m)) = \sum_{x=1,\,y=1}^{16,\,16} \left| s[x, y] - c[x - m_x,\, y - m_y] \right|$$
- where s is the original video signal and c is the coded video signal
46
Motion Estimation in H.264
- What is new?
- Variable block size motion estimation: can yield 15% bit rate savings
- Multiple reference frame motion estimation: 5-20% bit rate savings
- Sub-pixel motion estimation: 20% bit rate savings over integer ME
47
Search Window
- Search window (in previous frame): a rectangle with the same coordinates as the current block in the current frame, extended by w pixels in each direction
[Figure: a p x q block inside its (p+2w) x (q+2w) search window]
49
Full Search Method
- Full Search
- All candidates within the search window are examined
- (2w+1)^2 positions should be examined
- Advantage: good accuracy, finds the best match
- Disadvantage: large amount of computation: (2w+1)^2 matches, with a 16x16 MAE for each match, which is impractical for real-time applications
- To avoid this complexity, the number of search points must be reduced, which is the purpose of Fast Block Matching Algorithms
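A minimal pure-Python sketch of full-search block matching with SAD is given below. It follows the earlier convention (dx, dy) = (x1 - x0, y1 - y0); frames are plain lists of rows, candidates that fall outside the reference frame are skipped, and the function names and test frames are illustrative.

```python
def block_sad(cur, ref, x0, y0, dx, dy, n):
    """SAD between the n x n block at (x0, y0) in cur and at (x0+dx, y0+dy) in ref."""
    return sum(
        abs(cur[y0 + j][x0 + i] - ref[y0 + dy + j][x0 + dx + i])
        for j in range(n)
        for i in range(n)
    )


def full_search(cur, ref, x0, y0, n, w):
    """Examine every displacement in [-w, w] x [-w, w]; return (dx, dy, cost)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-w, w + 1):
        for dx in range(-w, w + 1):
            x1, y1 = x0 + dx, y0 + dy
            if x1 < 0 or y1 < 0 or x1 + n > width or y1 + n > height:
                continue  # candidate block would leave the reference frame
            cost = block_sad(cur, ref, x0, y0, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


# Synthetic frames: the current frame is the reference shifted by (2, 1).
SIZE = 12
ref = [[16 * y + x for x in range(SIZE)] for y in range(SIZE)]
cur = [[ref[min(y + 1, SIZE - 1)][min(x + 2, SIZE - 1)] for x in range(SIZE)]
       for y in range(SIZE)]
print(full_search(cur, ref, x0=4, y0=4, n=4, w=3))
```

With w = 3 the loop visits (2w+1)^2 = 49 candidates for a single 4x4 block, which shows why the cost grows quickly for 16x16 blocks and larger windows.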
50
Initial Search Point Prediction
- A median predictor is used for defining the initial search point
- That is, the median value of the motion vectors of three spatially adjacent blocks: left, top and top-right (or top-left) of the current block.
- If C does not exist, then C = D
- If B and C do not exist, then prediction = MV_A
- If A and C do not exist, then prediction = MV_B
- If A and B do not exist, then prediction = MV_C
- Otherwise, prediction = median(MV_A, MV_B, MV_C)
[Figure: current block E and its neighbouring blocks A, B, C and D]
$$pred\_mv = (pred\_x, pred\_y) = \mathrm{median}(mv\_A, mv\_B, mv\_C)$$
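The predictor rules can be sketched as follows, taking A, B, C and D as the left, top, top-right and top-left neighbours in the order listed above. Treating a single missing vector as (0, 0) inside the median, and returning (0, 0) when no neighbour exists, are assumptions of this sketch rather than statements from the slides.

```python
def median3(a, b, c):
    """Median of three numbers."""
    return sorted((a, b, c))[1]


def predict_mv(mv_a, mv_b, mv_c, mv_d=None):
    """Component-wise median predictor; None marks a neighbour that does not exist."""
    if mv_c is None:
        mv_c = mv_d                      # "If C does not exist, then C = D"
    available = [mv for mv in (mv_a, mv_b, mv_c) if mv is not None]
    if not available:
        return (0, 0)                    # assumption: no neighbours at all
    if len(available) == 1:
        return available[0]              # only one neighbour: use it directly
    # assumption: a single missing neighbour counts as a zero vector
    a, b, c = (mv if mv is not None else (0, 0) for mv in (mv_a, mv_b, mv_c))
    return (median3(a[0], b[0], c[0]), median3(a[1], b[1], c[1]))


print(predict_mv((4, 0), (2, -2), (8, 6)))
print(predict_mv((3, 1), None, None))
print(predict_mv((1, 1), (5, 5), None, mv_d=(3, 9)))
```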
51
2-D Logarithmic Search (TDL)
- Examine the central point and its four surrounding points
- Distance from center: w/2
- Find the best match
- If the best match is not in the center, examine three new points centered on the previous best
- Halve the distance and continue until the distance is 1; then use all 9 matches to find the best, and stop
- Here the maximum number of search points is: 2 + 7 log w
[Figure: search points labelled by step (1, 2, 3)]
52
Three Step Search (TSS)
- 1. Check nine search points
- 2. The step size is reduced by half after each step.
- 3. At the end of the search the step size is one pel.
- The algorithm is repeated 3 times
- Examines 25 points
- Number of search points: 1 + 8 log w
- Advantage: simple and regular structure, good for HW implementation
- Disadvantage: uniformly allocated checking points make it inefficient for small motion.
[Figure: search points labelled by step (1, 2, 3)]
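The three-step search can be sketched as below, assuming w = 7 so that the step sizes are 4, 2 and 1. It is a heuristic: unlike full search it is not guaranteed to reach the global minimum. Function names and test frames are illustrative.

```python
def block_sad(cur, ref, x0, y0, dx, dy, n):
    """SAD between the n x n block at (x0, y0) in cur and at (x0+dx, y0+dy) in ref."""
    return sum(
        abs(cur[y0 + j][x0 + i] - ref[y0 + dy + j][x0 + dx + i])
        for j in range(n)
        for i in range(n)
    )


def three_step_search(cur, ref, x0, y0, n, w=7):
    """Return (dx, dy, cost, points_checked) for the block at (x0, y0)."""
    height, width = len(ref), len(ref[0])
    step = max(1, (w + 1) // 2)          # w = 7 gives steps 4, 2, 1
    cx, cy = 0, 0                        # current search centre (displacement)
    best_cost = block_sad(cur, ref, x0, y0, 0, 0, n)
    checked = 1
    while step >= 1:
        best_dx, best_dy = cx, cy
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                if dx == 0 and dy == 0:
                    continue             # centre was evaluated already
                mx, my = cx + dx, cy + dy
                x1, y1 = x0 + mx, y0 + my
                if x1 < 0 or y1 < 0 or x1 + n > width or y1 + n > height:
                    continue
                checked += 1
                cost = block_sad(cur, ref, x0, y0, mx, my, n)
                if cost < best_cost:
                    best_cost, best_dx, best_dy = cost, mx, my
        cx, cy = best_dx, best_dy        # recentre on the best point so far
        step //= 2                       # halve the step size
    return cx, cy, best_cost, checked


SIZE = 24
ref = [[32 * y + x for x in range(SIZE)] for y in range(SIZE)]
cur = [[ref[y][min(x + 4, SIZE - 1)] for x in range(SIZE)] for y in range(SIZE)]
print(three_step_search(cur, ref, x0=8, y0=8, n=4, w=7))
```

In this example the search examines 1 + 8 x 3 = 25 points, compared with (2 x 7 + 1)^2 = 225 for full search with the same window.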
53
Diamond Search (DS)
Experimental results show that:
- 53% to 98% of the motion vectors are enclosed in a circular area with a radius of 2 pels, centered on the position of zero motion.
- The block displacement of real-world video sequences is mainly in the horizontal and vertical directions.
- The DS search points lie within the circle with a radius of 2 pels.
- Outperforms the TSS algorithm
54
DS Algorithm
- 1. The 9 checking points of the LDSP (large diamond search pattern) are tested. If the minimum point is located at the center position, go to Step 2; otherwise, recursively repeat this step centered on the best point.
- 2. Switch the search pattern from LDSP to SDSP (small diamond search pattern). The minimum point found in this step is the best match.
[Figure: the LDSP and SDSP search patterns]
55
DS Algorithm
- (b) LDSP->LDSP when minimum is at one of the corner points
- (c) LDSP->LDSP when minimum is along the edge of the
diamond
- (d) LDSP->SDSP when minimum is at the center of the search
pattern.
56
H.264 ME Algorithm (UMHexagonS)
- 1) Initial search point prediction
- 2) Unsymmetrical-cross search
- 3) Uneven multi-hexagonal-grid search
- 4) Extended hexagonal-based search
Note that ME is not a normative part of the standard; only the ME algorithm implemented in the reference software is described here.
57
Initial Search Point Prediction
- A median predictor is used for defining the initial search point
- That is, the median value of the motion vectors of three spatially adjacent blocks: left, top and top-right (or top-left) of the current block.
[Figure: current block E and its neighbouring blocks A, B, C and D]
$$pred\_mv = (pred\_x, pred\_y) = \mathrm{median}(mv\_A, mv\_B, mv\_C)$$
58
Unsymmetrical-Cross Search
- Based on experimental results, movement in the horizontal direction is much heavier than that in the vertical direction
- The distance between search points is chosen to be 2
- The minimum-cost MV will be chosen as the search center of the next search step
59
Uneven Multi-Hexagonal-Grid Search
60
Extended Hexagonal-Based Search
- When the previous optimum MV is located in the outer concentric area, the search result has relatively low accuracy
- The motion vector is therefore refined by the extended hexagonal-based search method.
61
Motion Estimation in H.264
- One of the main enhancement features of H.264 is its motion estimation algorithm
- What is new?
- Variable block size motion estimation: can yield 15% bit rate savings
- Multiple reference frame motion estimation: 5-20% bit rate savings
- Sub-pixel motion estimation: 20% bit rate savings over integer ME
62
Variable Block Size ME
- A 16x16 macroblock may contain more than one object
- In other words: the size of moving/stationary objects is variable
- The objects may move in different directions,
- One motion vector is not enough to describe the movement of all objects
- With a single MV, some part of the macroblock is described well and the other part gives a large error.
- The solution is to define variable block sizes
- A macroblock with more detail will be coded using a smaller block size
- H.264 supports 7 block sizes for partitioning
63
Variable Block Size ME (Cont'd)
[Figure: macroblock coding structure block diagram, highlighting motion estimation, with the partition types shown below it]
- MB types: 16x16, 16x8, 8x16, 8x8
- 8x8 types: 8x8, 8x4, 4x8, 4x4
64
Partitions of MB
65
Variable Block Size ME (Cont'd)
- An inter MB can be partitioned into smaller regions for ME:
- Up to 16 MVs
- MVs are differentially encoded.
- Considerable optimization effort is needed to decide the best mode: SAD + λ(Q) R
- Mode decision:
- R-D optimization with the Lagrangian method
- Also an active research area.
66
Variable Block Size ME: Example
[Figure: frames at T=1 and T=2]
67
Variable Block Size ME: Example
[Figure: frames at T=1 and T=2]
68
Variable Block Size ME: Example
[Figure: frames at T=1 and T=2]
69
Multiple Reference Frames ME
- In previous standards, up to 2 reference frames were used for ME
- Here, up to five different reference frames can be selected
- This results in better subjective video quality and more efficient coding of the video sequence.
- It might also help make the H.264 bit stream error resilient.
70
Multiple Reference Frames ME
- In H.263, the reference frame for prediction is always the previous frame
- In MPEG and H.26L, some frames are predicted from both the previous and the next frames (bi-prediction)
- In H.264, up to 16 frames may be used as reference:
- Encoder and decoder maintain synchronized buffers of available frames (previously decoded)
- This results in better subjective video quality and more efficient coding of the video sequence
- It might also help make the H.264 bit stream error resilient
71
Multiple Reference Frames ME
[Figure: macroblock coding structure block diagram, highlighting multiple reference frames for motion compensation]
72
Subpixel Motion Estimation
- When an object has sub-pixel movement, integer-pixel ME cannot describe it, so sub-pixel ME is defined
- H.263 uses only half-pixel accuracy and MPEG-4 uses quarter-pixel accuracy
- A gain of 1.5-2 dB across the board over ½-pixel
- H.264 uses quarter-pixel accuracy for luma ME, with chroma motion vectors resolved to one-eighth pixel for 4:2:0 video
73
Example
b = round [(E – 5F + 20G + 20H – 5I + J)/32]
74
Example (cont’d)
a = round [(G + b)/2]
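The two formulas above can be evaluated in integer arithmetic as sketched below: rounding is done by adding half of the divisor before shifting, and the half-pel result is clipped to the 8-bit sample range. The sample values and function names are illustrative.

```python
def clip_8bit(value):
    """Clamp a value to the 8-bit sample range 0..255."""
    return max(0, min(255, value))


def half_pel(e, f, g, h, i, j):
    """Half-sample b between G and H: round((E - 5F + 20G + 20H - 5I + J) / 32)."""
    return clip_8bit((e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5)


def quarter_pel(g, b):
    """Quarter-sample a between G and b: round((G + b) / 2)."""
    return (g + b + 1) >> 1


# Six horizontally adjacent integer samples E..J on a linear ramp.
E, F, G, H, I, J = 10, 20, 30, 40, 50, 60
b = half_pel(E, F, G, H, I, J)
a = quarter_pel(G, b)
print(b, a)
```

On the ramp the half-pel sample lands midway between G = 30 and H = 40, and the quarter-pel sample lies between G and b.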
75
Chroma Motion Vector
76
H.264 Cost Function
- The best match is found by minimizing the cost function:
$$J(m, \lambda_{motion}) = SAD(s, c(m)) + \lambda_{motion} \cdot R(m - p)$$
- m = (mx, my)^T is the motion vector
- p = (px, py)^T is the predicted motion vector
- λmotion is the Lagrange multiplier
- R(m-p) represents the bits used to encode the motion information
- The SAD (Sum of Absolute Differences) is computed as:
$$SAD(s, c(m)) = \sum_{x=1,\,y=1}^{B,\,B} \left| s[x, y] - c[x - m_x,\, y - m_y] \right|$$
- where B = 16, 8 or 4, s is the original video signal and c is the coded video signal
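A small sketch of the Lagrangian motion cost follows. As an assumption, R(m - p) is estimated as the length of the signed Exp-Golomb codes of the two vector-difference components; the λ value, SAD values and function names are illustrative.

```python
def signed_exp_golomb_bits(value):
    """Number of bits of the signed Exp-Golomb code for an integer."""
    code_num = 2 * abs(value) - (1 if value > 0 else 0)
    return 2 * (code_num + 1).bit_length() - 1


def motion_cost(sad, mv, pred_mv, lam):
    """J = SAD + lambda * R(m - p), with R estimated from Exp-Golomb code lengths."""
    rate = (signed_exp_golomb_bits(mv[0] - pred_mv[0])
            + signed_exp_golomb_bits(mv[1] - pred_mv[1]))
    return sad + lam * rate


pred = (2, 1)
near = motion_cost(sad=100, mv=(3, 1), pred_mv=pred, lam=4.0)   # close to the predictor
far = motion_cost(sad=90, mv=(8, -5), pred_mv=pred, lam=4.0)    # lower SAD, costly vector
print(near, far)
```

The candidate with the lower SAD loses here because its vector is far from the prediction and therefore expensive to code, which is the trade-off the Lagrange multiplier controls.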
77
MB Modes
A MB can select one of these modes:
- Intra_16x16
- Intra_8x8 (not allowed in Baseline)
- Intra_4x4
- I_PCM: enables an encoder to transmit the values of the image samples directly (without prediction or transformation).
- Inter_16x16
- Inter_16x8
- Inter_8x16
- Inter_8x8
- SKIP
78
P_SKIP Type
- For this type, neither a quantized prediction error signal nor a motion vector or reference index parameter is transmitted
- The reference picture is located at index 0 in the multi-picture buffer
- The motion vector is predicted from the motion vector predictor
- It is used for large areas with no change or constant motion.
- Its size is 16x16
79
Mode Decision Method in H.264/AVC
- Calculate the RDCost for each Intra mode
- Calculate the RDCost for SKIP mode
- For each inter mode (16x16, 16x8, 8x16 and 8x8),
- For each block in the current mode
- Do ME in a search area and select the point that minimizes the equation below:
$$J(m, \lambda_{motion}) = SAD(s, c(m)) + \lambda_{motion} \cdot R(m - p)$$
- End
- Calculate the RDCost using:
RDCost = Distortion + λ×Rate
- Note that:
- Computing Rate requires: transform, quantization and entropy coding
- Computing Distortion requires: transform, quantization, inverse quantization and inverse transform
- End
- From the calculated RDCosts:
(RDCost_Intra_16x16, RDCost_Intra_4x4, RDCost_I_PCM, RDCost_SKIP, RDCost_Inter_16x16, RDCost_Inter_16x8, RDCost_Inter_8x16 and RDCost_Inter_8x8)
select the smallest one as the best mode.
80
Slice
- Each frame can be coded in one or more slices, each containing from one macroblock (16x16) up to all the macroblocks in the frame (1 slice per picture)
- The number of macroblocks per slice need not be constant within a picture
- Because of the minimal inter-dependency between coded slices, propagation of errors can be limited
81
Slice Coding
- Slices can have different shapes and sizes
- Slices do not have to be consecutive in the raster scan
- Each slice is self-contained
- Can be decoded without data from other slices
- Useful for:
- Error resilience and concealment
- Parallel processing
82
Slice Type
Each slice can be coded as one of 5 types:
- I slice: All MBs are coded using intra mode.
- P slice: A MB can be coded in intra mode or in inter mode with at most one prediction signal per block.
- B slice: In addition to the modes in a P slice, some MBs can also be predicted using two prediction signals per block.
- SP slice (Switching-P slice): facilitates switching between different video streams
- SI slice (Switching-I slice): uses only Intra prediction
83
Slice Modes in H.264
84
Slice Syntax
85
Slice Syntax
A macroblock contains coded data corresponding to a 16x16 sample region of a video frame: 16x16 for luma and 8x8 for Cr and Cb
86
Slices
- The H.264 encoder intelligently groups MBs into a slice whose size is less than (or equal to) the size of the maximum transmission unit (MTU).
- Slices are decoded independently
- Prediction beyond the slice boundaries is forbidden to prevent error propagation from intra-frame predictions
87
Arbitrary Slice Order (ASO)
- The Baseline Profile allows the decoding order of slices to be arbitrary.
- This permits, for example, reducing decoding delay in case of out-of-order delivery of NAL units.
- Application example: reducing end-to-end transmission delay in real-time applications
88
Flexible Macroblock Ordering (FMO)
- Using FMO, it is no longer required that slices consist of neighboring macroblocks.
- FMO provides efficient methods for error concealment in error-prone channels
- The objective behind flexible macroblock ordering (FMO) is to scatter possible errors over the whole frame as evenly as possible to avoid error accumulation in a limited region.
89
Slice Group
- Slice group: a subset of the macroblocks of a picture, which may contain one or more slices
- In FMO, the frame is divided into several slice groups.
- Each macroblock can be assigned freely to a certain slice group using a MAP function.
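One possible macroblock-to-slice-group map is a checkerboard, sketched below for illustration. The pattern choice and function names are assumptions of this example and do not reproduce the standard's syntax.

```python
def checkerboard_map(width_mbs, height_mbs, num_groups=2):
    """Assign each macroblock (x, y) to a slice group in a checkerboard pattern."""
    return [[(x + y) % num_groups for x in range(width_mbs)]
            for y in range(height_mbs)]


def macroblocks_in_group(mb_map, group):
    """Raster-scan addresses of the macroblocks that belong to one slice group."""
    width = len(mb_map[0])
    return [y * width + x
            for y, row in enumerate(mb_map)
            for x, g in enumerate(row)
            if g == group]


mb_map = checkerboard_map(width_mbs=4, height_mbs=2)
print(mb_map)
print(macroblocks_in_group(mb_map, 0))
print(macroblocks_in_group(mb_map, 1))
```

If the data of slice group 1 is lost, every missing macroblock still has neighbours from group 0 on its sides, which is what makes concealment easier.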
90
MAP Function
91
Redundant Coded Picture
- Sends a duplicate of part or all of a coded picture
- In normal operation, the decoder reconstructs the frame from ‘primary’ (non-redundant) pictures and discards any redundant pictures.
- However, if a primary coded picture is damaged (e.g. due to a transmission error), the decoder may replace the damaged area with decoded data from a redundant picture, if available.
92
MB Prediction Types
- Intra: the MB is predicted from the neighboring blocks of the same frame. Intra prediction is performed on 16x16, 4x4 and 8x8 (in FRExt profiles) blocks.
- Inter: the MB is predicted from regions in previous (or next) frames, using motion estimation.
93
MB Syntax Element
94
Transformation and Quantization in H.264
95
Transformation
H.264 uses three transforms
- Hadamard transform for the 4x4 array of luma DC coefficients
- Hadamard transform for the 2x2 array of chroma DC coefficients
- DCT-based transform for all other 4x4 blocks in the residual data
96
Transformation
[1]
97
Transformation
Fundamental differences between the H.264 transform and the DCT
- It is an integer transform
- It is possible to ensure zero mismatch between encoder and decoder
- It can be implemented using only additions and shifts
- A scaling multiplication is integrated into the quantizer
- It can be carried out using 16-bit integer arithmetic
98
Transformation
DCT
[1]
99
Transformation
DCT Approximation
[1]
100
Transformation
4x4 Hadamard Transform
2x2 Hadamard Transform
[1]
101
Quantization
- Requirements that complicate the forward and inverse quantizers:
- Avoid division and/or floating point arithmetic
- Incorporate the post- and pre-scaling matrices
- The basic forward quantizer divides each coefficient by the step size Qstep and rounds the result
- A total of 52 values of Qstep are supported by the standard, indexed by a quantization parameter (QP)
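The relation between QP and Qstep can be sketched as follows: Qstep doubles for every increase of 6 in QP, starting from six base values. The floating-point quantizer below is for illustration only, since the standard's integer formulation avoids the division; function names are illustrative.

```python
# Qstep for QP = 0..5; the step size doubles for every increase of 6 in QP.
QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Quantizer step size for a quantization parameter in 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


def quantize(coeff, qp):
    """Basic forward quantizer: divide by Qstep and round to the nearest integer."""
    magnitude = int(abs(coeff) / qstep(qp) + 0.5)
    return -magnitude if coeff < 0 else magnitude


def rescale(level, qp):
    """Basic inverse quantizer: multiply the level by Qstep."""
    return level * qstep(qp)


print(qstep(0), qstep(6), qstep(28), qstep(51))
level = quantize(100, 28)
print(level, rescale(level, 28))
```

At QP = 28 the step size is 16, so a coefficient of 100 becomes level 6 and is reconstructed as 96; a larger QP gives a coarser reconstruction and fewer bits.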
102
Quantization
The wide range of quantizer step sizes
makes it possible for an encoder to control the tradeoff accurately and flexibly between bit rate and quality
QP can be different for luma and chroma (QPC is derived from QPY; by default QPC ≤ QPY)
103
Quantization
Forward quantization in integer arithmetic
Inverse quantization in integer arithmetic
[1]
104
Complete T&Q
[1]
105
Complete T&Q
Encoding
- Input: 4 × 4 residual samples: X
- Forward ‘core’ transform: $W = C_f X C_f^T$ (followed by forward transform for Chroma DC or Intra-16 Luma DC coefficients).
- Post-scaling and quantization: Z = W.round(PF/Qstep) (different for Chroma DC or Intra-16 Luma DC).
106
Complete T&Q
Decoding:
- (Inverse transform for Chroma DC or Intra-16 Luma DC coefficients)
- Decoder scaling (incorporating inverse transform pre-scaling): W’ = Z.Qstep.PF.64 (different for Chroma DC or Intra-16 Luma DC).
- Inverse ‘core’ transform: $X' = C_i^T W' C_i$
- Post-scaling: X” = round(X’/64)
- Output: 4 × 4 residual samples: X”
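The forward 'core' transform from the encoding steps can be sketched in pure Python using the integer matrix C_f of the 4x4 H.264 transform. Only the transform itself is shown; post-scaling and quantization are left out, and the helper names are illustrative.

```python
# Integer forward transform matrix C_f of the 4x4 core transform.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul4(a, b):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose4(m):
    """Transpose a 4x4 matrix."""
    return [list(col) for col in zip(*m)]


def forward_core_transform(x):
    """W = C_f X C_f^T, using integer arithmetic only."""
    return matmul4(matmul4(CF, x), transpose4(CF))


flat = [[1] * 4 for _ in range(4)]            # constant residual
ramp = [[0, 1, 2, 3] for _ in range(4)]       # horizontal ramp
print(forward_core_transform(flat))
print(forward_core_transform(ramp)[0])
```

A constant block produces a single DC coefficient, and a purely horizontal ramp produces energy only in the first row of W.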
107
Deblocking Filter
- Applied to each decoded macroblock to reduce the blocking distortion
- Filtering is applied to vertical or horizontal edges of 4x4 blocks in a macroblock (except for edges on slice boundaries)
- The filter is stronger at places where there is likely to be significant blocking distortion, such as the boundary of an Intra coded macroblock or a boundary
108
Deblocking Filter (cont’d)
The effect of the filter decision is to
switch off the filter when there is a significant change (gradient) across the block boundary in the original image
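The filter decision can be sketched as a threshold test on the samples either side of an edge (p1, p0 | q0, q1). The thresholds alpha and beta depend on QP in the standard; the values below are placeholders chosen for illustration, and the function name is hypothetical.

```python
def should_filter(p1, p0, q0, q1, alpha, beta, boundary_strength=1):
    """Decide whether the edge between p0 and q0 is filtered.

    A small step across the edge is treated as a blocking artefact and
    filtered; a large step is assumed to be a real image edge and is kept.
    """
    return (
        boundary_strength > 0
        and abs(p0 - q0) < alpha
        and abs(p1 - p0) < beta
        and abs(q1 - q0) < beta
    )


# Placeholder thresholds (not taken from the standard's tables).
ALPHA, BETA = 10, 4
print(should_filter(100, 101, 105, 106, ALPHA, BETA))   # small step: filter
print(should_filter(100, 100, 160, 161, ALPHA, BETA))   # strong edge: leave alone
```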
109
Error Resilience Tools
Error Resilience Tools:
- 1) Flexible Macroblock Order (FMO)
- 2) Arbitrary Slice Order (ASO)
- 3) Redundant Slices (RS)
- 4) Data Partitioning (DP)
110
FMO
- FMO can work to randomize the data prior to transmission, so that if a segment of data is lost, the errors are distributed more randomly over the video pictures
- As a result, the relevant neighboring data is more likely to be available for concealment of lost content.
111
FMO
112
ASO
ASO allows slices of a picture to appear in
any order for delay reduction (particularly for use on networks that can deliver data packets out of order).
113
RS
- The primary slice is coded with a low QP (and hence in good quality), and the redundant slice (RS) is coded with a high QP (utilizing fewer bits).
- The decoder decodes the primary slice, if it is available, and discards the RS.
- If the primary slice is missing, the RS can be reconstructed instead.
114
DP
- Type A partition: header information, including MB types, quantization parameters, and motion vectors
- Type B partition (the Intra partition): carries Intra CBPs and Intra coefficients
- Type C partition (the Inter partition): contains only Inter CBPs and Inter coefficients
115
Baseline Profile
- Baseline (Progressive, Videoconferencing & Wireless)
- I and P picture types (not B)
- Uses list0 for P-Slices
- In-loop deblocking filter
- 1/4-sample motion compensation
- Tree-structured motion segmentation down to 4x4 block size
- VLC-based entropy coding (UVLC and CAVLC)
- Some enhanced error resilience features:
- Flexible macroblock ordering/arbitrary slice ordering
- Redundant slices
116
Main
May use list0 and/or list1 for B-Slices
B-Slices may use:
- One past and one future reference
- Two past references
- Two future references
117
B-Slice Prediction
118
Prediction Options
B Slices may be predicted in one of several ways
119
Direct Prediction
No motion vector is transmitted for a B slice
macroblock or macroblock partition encoded in direct mode
Decoder calculates list0 and list1 vectors
based on previously coded vectors and uses these to carry out bi-predictive motion compensation of the decoded residual samples
A skipped macroblock in a B slice is
reconstructed at the decoder using Direct prediction
120
Weighted Prediction
The weights w0, w1 may be transmitted from encoder to decoder in explicit mode, or the decoder calculates them based on the relative temporal positions of the list0 and list1 reference pictures in implicit mode
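A minimal sketch of the case where the decoder derives the weights itself, assuming the reference that is nearer in time to the current picture gets the larger weight. The function names, the use of floating point, and the picture-order arguments are illustrative assumptions; the standard specifies an integer derivation that is not reproduced here.

```python
def temporal_weights(poc_cur, poc_ref0, poc_ref1):
    """Return (w0, w1) from picture order positions; the weights sum to 1."""
    td = poc_ref1 - poc_ref0   # distance between the two reference pictures
    tb = poc_cur - poc_ref0    # distance from the list0 reference to the current picture
    if td == 0:
        return 0.5, 0.5        # fall back to a plain average
    w1 = tb / td
    return 1.0 - w1, w1


def bipredict(sample0, sample1, w0, w1):
    """Weighted bi-prediction of one sample, rounded to the nearest integer."""
    return int(w0 * sample0 + w1 * sample1 + 0.5)


# Current picture at position 2, list0 reference at 0, list1 reference at 8:
w0, w1 = temporal_weights(2, 0, 8)
print(w0, w1, bipredict(100, 200, w0, w1))
```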
121
Extended
SI and SP slices are specially coded
slices that enable (among other things) efficient switching between video streams and efficient random access for video decoders
A common requirement in a streaming application is for a video decoder to switch between several encoded streams
122
Example
The same video material is coded at
multiple bitrates for transmission across the internet and a decoder attempts to decode the highest-bitrate stream it can receive but may require switching automatically to a lower-bitrate stream if the data throughput drops
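The selection rule in this example can be sketched as follows. The function name, the bitrate ladder, and the fallback to the lowest stream when nothing fits are assumptions for illustration; how the switch itself is carried out (for instance with SP slices) is a separate matter.

```python
def pick_stream(bitrates_kbps, throughput_kbps):
    """Return the highest bitrate that fits the throughput, else the lowest."""
    fitting = [b for b in bitrates_kbps if b <= throughput_kbps]
    return max(fitting) if fitting else min(bitrates_kbps)


ladder = [250, 500, 1000]          # hypothetical encodings of the same material
print(pick_stream(ladder, 1200))   # plenty of throughput
print(pick_stream(ladder, 700))    # throughput has dropped
```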
123
Switching
124
Switching (cont’d)
125
SP Slice
126
SP Slice (cont’d)
127
Random Access to Video Frames
We can decode A0, create SP-Slice A0-10,
predict A10 from A0
128
SI Slices
The same as SP Slices but their prediction is Intra 4x4 from the previously decoded and reconstructed image samples
129
H.264/AVC Profiles
130
H.264 Rate Control
131
Compression in H.264
- Compression unit
  - MacroBlock (MB)
- Coding parameters
  - Motion estimation search area
  - Quantization step size
  - Etc
- Constant parameters produce a variable bit-rate
- Variable bit-rate problems
  - Practical delivery
  - Storage mechanism (a film on VCD or DVD)
  - Network (circuit switch, packet switch)
  - Delay, FIFO size, Decoder stall
132
Variable Bitrate: Example
[Figure: Encoder Output, Encoder Buffer, Decoder Buffer]
- 25 Frame/Sec
- Qp = 12
- Net = 100 Kbps
- Net = 4 Kb/frame
- F1 = 54 Kbit
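The numbers above can be checked with a short calculation: at 100 kbit/s and 25 frames per second the channel carries 4 kbit per frame period, so a 54 kbit first frame occupies the channel for 13.5 frame periods. The variable names are illustrative, and the sketch assumes 1 kbit = 1000 bits and a constant channel rate.

```python
FRAME_RATE = 25      # frames per second
NET_KBPS = 100       # channel rate in kbit/s
F1_KBIT = 54         # size of the first coded frame in kbit

kbit_per_frame_period = NET_KBPS / FRAME_RATE        # channel capacity per frame period
f1_time_s = F1_KBIT / NET_KBPS                       # time to deliver frame 1
f1_frame_periods = F1_KBIT / kbit_per_frame_period   # the same time in frame periods

print(kbit_per_frame_period, f1_time_s, f1_frame_periods)
```

With constant coding parameters the decoder therefore either waits more than half a second before it can show the first frame, or it needs a buffer and start-up delay large enough to absorb such peaks.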
133
Rate Control
- Problem
  - Decoder stall
    - Freeze playback
  - Wider bit-rate variation
    - Larger buffer size
    - Larger decoding delay
  - Increasing delay avoids stall
    - Online applications
  - Buffer Overflow (Under flow?)
- Solutions
  - Rate control modifies coding parameters
    - Maintain a target bit-rate
    - Minimize distortion in decoding
    - The most obvious parameter = Qp
    - Increasing Qp reduces bit-rate (and quality)
  - The choice of rate control
    - Nature of video application
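A minimal sketch of the idea that raising Qp lowers the bit-rate: a hypothetical controller nudges Qp up when a frame overshoots its bit budget and down when it undershoots. The function name, the 10% dead band, and the step of 1 are assumptions for illustration; the 0 to 51 clamp is the Qp range for 8-bit H.264 video. Real rate control algorithms are considerably more elaborate.

```python
def update_qp(qp, bits_used, bits_target,
              dead_band=0.10, qp_min=0, qp_max=51):
    """Return the Qp for the next frame given how the last frame did."""
    if bits_used > bits_target * (1 + dead_band):
        qp += 1      # too many bits: quantize more coarsely
    elif bits_used < bits_target * (1 - dead_band):
        qp -= 1      # bits to spare: quantize more finely
    return max(qp_min, min(qp_max, qp))


# A 54 kbit frame against a 4 kbit budget pushes Qp up by one step.
print(update_qp(28, bits_used=54000, bits_target=4000))
```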
134
Nature of Video Applications
- Offline encoding on DVD
  - Processing time = not important
  - Goals
    - Fit on the DVD
    - Best quality
    - DVD buffer = no overflow/underflow
  - Rate control = two-pass encoding
    - Pass 1 = statistics of the video
    - Pass 2 = coding
- Live video for broadcast
  - Encoder = one, high quality, fast hardware
  - Decoder = many, limited buffer/processing
  - A small delay = acceptable
  - Rate control = medium complexity, two-pass in frame
- Video-conferencing
  - Each side = both encoding and decoding, power = limited
  - Delay = as small as possible (0.5 Sec)
  - Rate control = low complexity
  - Encoder = tightly controls bit-rate; quality varies (drops in high motion)
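The two-pass scheme listed for offline encoding can be sketched as follows, assuming pass 1 yields a per-frame complexity figure and pass 2 splits a fixed bit budget in proportion to it. The function name, the complexity values, and the proportional rule are illustrative assumptions, not a description of any particular encoder.

```python
def allocate_bits(complexities, total_bits):
    """Split total_bits across frames in proportion to pass-1 complexity."""
    total = sum(complexities)
    if total == 0:
        # No statistics to go on: share the budget equally.
        return [total_bits / len(complexities)] * len(complexities)
    return [total_bits * c / total for c in complexities]


pass1_stats = [1, 3]                 # hypothetical complexity per frame
print(allocate_bits(pass1_stats, 4000))
```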
135
H.264/AVC
In the first version of the standard:
- The 4:2:0 chroma format (typically derived by performing an RGB-to-YCbCr color-space transformation)
- 8 bit sample precision for luma and chroma values
136
Color Space
137
FRExt Color Space
The FRExt amendment extended the
standard to 4:2:2 and 4:4:4 chroma formats and higher than 8 bits precision
In 4:2:0 chroma format, each macroblock consists of a 16x16 region of luma samples and two corresponding 8x8 chroma sample arrays. In a macroblock of 4:2:2 chroma format video, the chroma sample arrays are 8x16 in size; and in a macroblock of 4:4:4 chroma format video, they are 16x16 in size.
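The sizes quoted above follow from the horizontal and vertical subsampling factors of each format. A small sketch, with an illustrative table and function name that are not taken from the standard:

```python
# (horizontal, vertical) chroma subsampling factors per chroma format
SUBSAMPLING = {"4:2:0": (2, 2), "4:2:2": (2, 1), "4:4:4": (1, 1)}


def chroma_array_size(chroma_format, luma_width=16, luma_height=16):
    """Return (width, height) of each chroma array in one macroblock."""
    sub_w, sub_h = SUBSAMPLING[chroma_format]
    return luma_width // sub_w, luma_height // sub_h


for fmt in SUBSAMPLING:
    print(fmt, chroma_array_size(fmt))
```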
138
FRExt Color Space
FRExt also supports a new color space called YCgCo (where the "Cg" stands for green chroma and the "Co" stands for orange chroma), which is much simpler and typically has equal or better coding efficiency.
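To show how simple the transform is, here is a sketch of a reversible integer lifting form that is commonly called YCgCo-R. It is offered as an illustration of the color space, under the assumption that this variant is representative; the function names are hypothetical and the slides do not say which exact form they have in mind.

```python
def rgb_to_ycgco_r(r, g, b):
    """Forward transform using only additions, subtractions and shifts."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_r_to_rgb(y, cg, co):
    """Inverse transform; undoes the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b


pixel = (200, 120, 40)
print(ycgco_r_to_rgb(*rgb_to_ycgco_r(*pixel)) == pixel)
```

Because each lifting step is undone exactly, the round trip is lossless for integer samples, and no multiplications are needed in either direction.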
139
FRExt Color Space
140