In the name of Allah, the Compassionate, the Merciful
Digital Video Processing
Room: CE 307
Department of Computer Engineering, Sharif University of Technology
E-mail: skasaei@sharif.edu
Web page: http://sharif.edu/~skasaei
Lab. website: http://ipl.ce.sharif.edu
50% bit rate savings compared to MPEG-2.
High quality video at both low and high bit rates: 64 kbps to 240 Mbps.
Network-friendly: more error-resilient tools.
Supports both conversational and non-conversational applications:
Conversational: video conference.
Non-conversational: storage, broadcast, streaming.
(VCEG).
ITU-T H.264 and ISO/IEC MPEG-4 Part 10 advanced video coding (AVC).
Kasaei
QCIF formatted frames with 10 Hz frame rate.
Block: An MxN (M-column by N-row) array of samples, or an MxN array of transform coefficients.
Macro-block: A 16x16 block of luma samples and two corresponding blocks of chroma samples.
Sub-macroblock: One quarter of the samples of a macroblock.
located at a corner of the macroblock.
Slice: Contains an integral number of MBs, from 1 (1 MB per slice) to the total number of MBs in a picture (1 slice per picture).
Reference Picture: Previously encoded frames that may be used as a reference in motion estimation.
process of subsequent pictures in decoding order.
Motion Vector: A 2-D vector used for inter prediction that provides an offset from the coordinates in the decoded picture to the coordinates in a reference picture.
Profile: A set of coding functions; each profile specifies what is required of an encoder or decoder that complies with it.
Level: Performance limits for CODECs are defined by a set of Levels.
memory requirements.
Predicts each frame from a previous frame and only encodes the prediction error:
Frame level: good for camera panning.
Block level: too much motion information to code.
Macro-block (MB) level (16 x 16 pixels): most widely used.
Only the Syntax and the Decoder are standardized.
abstraction layer (NAL).
architectures.
storage.
Figure).
H.264 consists of a video coding layer (VCL) and a network abstraction layer (NAL).
A NAL unit (NALU) includes a header and a payload.
The NAL design specified in the recommendation is appropriate for the adaptation of H.264 over RTP/UDP/IP, H.324/M, MPEG-2 transport, and H.320.
1. Nal_Ref_Idc (NRI) contains two bits that indicate the priority of the NALU payload.
2. NALU type (T) is a 5-bit field that characterizes the NALU as one of 32 different types.
H.264.
3. Forbidden_bit is specified to be zero in H.264 encoding.
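The three header fields listed above fit in one byte. A minimal sketch of unpacking them follows; it assumes the bit order used by the H.264 NAL unit header (forbidden bit first, then NRI, then the 5-bit type), and the function name `parse_nal_header` is only illustrative.

```python
def parse_nal_header(first_byte: int) -> dict:
    """Split the first byte of a NAL unit into its three header fields."""
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("expected a single byte (0..255)")
    return {
        "forbidden_bit": (first_byte >> 7) & 0x1,  # 1 bit, specified to be zero
        "nal_ref_idc": (first_byte >> 5) & 0x3,    # 2 bits, payload priority (NRI)
        "nal_unit_type": first_byte & 0x1F,        # 5 bits, one of 32 types (T)
    }
```

For example, the byte 0x67 unpacks to a forbidden bit of 0, an NRI of 3, and a type of 7.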
AVC scheme includes different profiles:
Baseline Profile - For low-delay end-to-end applications.
Main Profile - For broadcasting applications at standard definition (SD) level.
Extended Profile - For mobile applications and e-streaming.
High Profile - To address the needs of the most demanding applications (such as contribution and distribution of content, studio editing, and post processing), named fidelity range extensions (FRExt):
High profile (HP): 8-bit per sample, 4:2:0 sampling.
High 10 profile (Hi10P): 8-10 bit per sample, 4:2:0 sampling.
High 4:2:2 profile (H422P): 10-bit per sample, 4:2:2 sampling.
High 4:4:4 profile (H444P): 12-bit per sample, supporting up to 4:4:4 chroma sampling, and additionally supporting efficient lossless region coding.
Intra coding
Inter coding using I-slice and P-slice
Entropy coding using context-based adaptive variable length coding (CAVLC)
Interlaced video
Inter coding using B-slices
Inter coding using weighted prediction
Entropy coding using context-based adaptive binary arithmetic coding (CABAC)
SI and SP slices to enable efficient switching between coded bit streams
Improved error resilience (data partitioning)
Does not have interlaced video or CABAC
Baseline:
Main:
Extended:
Reference picture management:
Intra-prediction
Motion estimation:
Variable block sizes
Multiple reference frames
Sub-pixel motion estimation
Image transform
De-blocking filter
Entropy coding
[Figure: H.264 encoder block diagram. Labels: Input Video Signal, Macroblocks (16x16 pixels), Coder Control, Control Data, Transform/Scal./Quant., Scaling & Inv. Transform, Decoder, De-blocking Filter, Intra-frame Prediction, Motion Compensation, Intra/Inter, Motion Estimation, Motion Data, Entropy Coding, Output Video Signal.]
An input frame or field Fn is processed in units of MBs.
Each MB is encoded in intra or inter mode.
For each block in the MB, a prediction PRED (marked 'P' in the figure) is formed based on reconstructed picture samples.
In Intra mode:
PRED is formed from samples in the current slice that have been previously encoded, decoded and reconstructed.
uF'n in the figures; note that unfiltered samples are used to form PRED.
In Inter mode:
PRED is formed by motion-compensated prediction from one or more reference picture(s) selected from a set of reference pictures.
transformed and quantized to give X.
encoded.
quantization parameter, motion vector information, etc.) form the compressed bitstream.
predictions.
difference block D'n.
reconstructed reference picture is created from a series of blocks F'n.
Entropy decodes the data elements to produce a set of quantized coefficients X.
prediction block PRED.
block F'n.
Motivation: Intra-frames are natural images, so they
Macro-blocks in intra-coded frames are predicted based on
Above and (or to the left of) the current block.
Macro-block may be divided into 16 4x4 sub-blocks, which are predicted in a cascading fashion.
9 modes for 4x4 and 4 modes for 16x16 size.
Directional spatial prediction (9 types for luma, 4 for chroma).
[Figure: the 4x4 block a to p and its neighboring samples; Q is the corner, A to H the row above, I to P the column to the left.]
Q A B C D E F G H
I a b c d
J e f g h
K i j k l
L m n o p
M
N
O
P
e.g., Mode 3: Diagonal down/right prediction:
a, f, k, p are predicted by (A + 2Q + I + 2) >> 2
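The diagonal rule above extends to every sample of the 4x4 block: each predicted sample is a (1, 2, 1) weighted average of three consecutive neighbors taken along the edge L K J I Q A B C D. The following is a minimal sketch of that idea; the function name and the argument layout are assumptions for illustration, not taken from the slides.

```python
def predict_diagonal_down_right(top, left, corner):
    """Diagonal down/right prediction of a 4x4 block.

    top    = [A, B, C, D]  (samples above the block)
    left   = [I, J, K, L]  (samples to the left of the block)
    corner = Q             (sample above-left of the block)
    """
    # Lay the neighbors out along one line: L K J I Q A B C D (Q at index 4).
    edge = list(reversed(left)) + [corner] + list(top)
    pred = [[0] * 4 for _ in range(4)]
    for y in range(4):
        for x in range(4):
            c = 4 + (x - y)  # the main diagonal (a, f, k, p) is centered on Q
            pred[y][x] = (edge[c - 1] + 2 * edge[c] + edge[c + 1] + 2) >> 2
    return pred
```

On the main diagonal this reduces to (I + 2Q + A + 2) >> 2, which is the expression given for a, f, k, p.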
Reducing spatial redundancies within a picture.
Prediction using surrounding pixels.
Used only in H.264.
2 sizes for intra prediction: 4x4 and 16x16.
Selects the mode with the best R-D tradeoff.
Full search method:
Divide each MB into sixteen 4x4 blocks.
For each 4x4 block:
  For each of the nine Intra_4x4 prediction modes:
    Predict the current 4x4 block by the current mode.
    Get prediction residual.
    Apply transform, quantization, entropy coding, inverse quantization, and inverse transform; find the output bits R and the reconstruction error SAD or MSE.
    Compute the joint cost: SAD + λ(Q) R.
  End
  Find the mode with the smallest cost as the best Intra_4x4 prediction mode for this 4x4 block.
End
Fast method: An active research area.
Reduce the number of searches.
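The loop structure of the full search can be sketched as below. This is a deliberately simplified illustration, not an encoder: it tries only three of the nine modes (vertical, horizontal, DC), measures SAD on the prediction residual instead of the reconstruction, and replaces the transform/quantization/entropy-coding chain with a hypothetical `rate_proxy` that counts nonzero residual samples.

```python
def predict(mode, top, left):
    """Return a 4x4 prediction for three of the nine Intra_4x4 modes."""
    if mode == "vertical":
        return [list(top) for _ in range(4)]
    if mode == "horizontal":
        return [[left[y]] * 4 for y in range(4)]
    if mode == "dc":
        dc = (sum(top) + sum(left) + 4) >> 3  # rounded mean of the 8 neighbors
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"unknown mode: {mode}")


def rate_proxy(residual):
    # Hypothetical stand-in for the entropy coder: one "bit" per nonzero sample.
    return sum(1 for row in residual for v in row if v != 0)


def best_intra4x4_mode(block, top, left, lam):
    """Return (mode, cost) minimizing SAD + lam * R over the candidate modes."""
    best = None
    for mode in ("vertical", "horizontal", "dc"):
        pred = predict(mode, top, left)
        residual = [[block[y][x] - pred[y][x] for x in range(4)] for y in range(4)]
        sad = sum(abs(v) for row in residual for v in row)
        cost = sad + lam * rate_proxy(residual)
        if best is None or cost < best[1]:
            best = (mode, cost)
    return best
```

A fast method would shrink the candidate tuple before entering the loop instead of evaluating every mode.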
Full search method:
For each Intra_16x16 prediction mode:
  Get prediction of the current MB.
  Find the prediction residual.
  Perform 2D 4-point Hadamard transform for each 4x4 block.
  Extract all the DC coefficients from the sixteen 4x4 blocks and apply the 2D 4-point Hadamard transform to the 4x4 DC array again.
  Cost estimation: Compute the sum of the absolute values of all the Hadamard transform coefficients.
End
Find the mode with the smallest cost as the best Intra_16x16 prediction mode for this MB.
Decision between Intra_4x4 and Intra_16x16:
Compare the costs of Intra_4x4 mode and Intra_16x16 mode to find the best mode.
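The per-block part of this cost (a 2D 4-point Hadamard transform followed by a sum of absolute coefficients) can be sketched as follows. The sketch is unnormalized and omits the second Hadamard pass over the sixteen DC values; the helper names are illustrative.

```python
# 4-point Hadamard matrix (symmetric, entries +1/-1).
H4 = [[1, 1, 1, 1],
      [1, 1, -1, -1],
      [1, -1, -1, 1],
      [1, -1, 1, -1]]


def matmul4(a, b):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def hadamard4x4(block):
    # H4 is symmetric, so H4 * X * H4^T equals H4 * X * H4.
    return matmul4(matmul4(H4, block), H4)


def satd4x4(residual):
    """Sum of absolute Hadamard-transformed differences for one 4x4 residual."""
    return sum(abs(v) for row in hadamard4x4(residual) for v in row)
```

A flat residual concentrates all of its energy in the DC coefficient, which is why the DC terms are collected and transformed once more in the Intra_16x16 procedure.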
Block size for inter-prediction:
H.261 & MPEG-1: 16x16
MPEG-2: 16x16, 16x8
H.263 & MPEG-4: 16x16 & 8x8
H.264: 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Motion estimation accuracy:
H.261: integer-pel
MPEG-1 & MPEG-2 & H.263: half-pel
MPEG-4 & H.264: quarter-pel
Skip prediction
For each block, find the best match in the previous frame (reference frame).
Upper-left corner of the block being encoded: (x0, y0)
Upper-left corner of the matched block in the reference frame: (x1, y1)
Motion vector (dx, dy): the offset of the two blocks:
(dx, dy) = (x1 - x0, y1 - y0)
The motion vector needs to be sent to the decoder.
Given the reference frame and the motion vector, the decoder can obtain a prediction of the current frame.
Prediction error: Difference between the current frame and the predicted frame.
The prediction error is coded by DCT, quantization, and entropy coding.
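A minimal sketch of this prediction step is given below. It assumes frames stored as lists of rows (`frame[y][x]`), integer-pel motion vectors with the convention (dx, dy) = (x1 - x0, y1 - y0), and a displaced block that stays inside the reference frame; the function names are illustrative, and the DCT/quantization stage is left out, so the residual is reconstructed exactly.

```python
def motion_compensate(reference, x0, y0, dx, dy, size):
    """Copy the size x size block at (x0 + dx, y0 + dy) from the reference frame."""
    x1, y1 = x0 + dx, y0 + dy
    return [row[x1:x1 + size] for row in reference[y1:y1 + size]]


def prediction_error(current_block, predicted_block):
    """Sample-wise difference between the current block and its prediction."""
    return [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(current_block, predicted_block)]


def reconstruct(predicted_block, error_block):
    """Decoder side: add the (here lossless) prediction error back."""
    return [[p + e for p, e in zip(prow, erow)]
            for prow, erow in zip(predicted_block, error_block)]
```

Only the motion vector and the prediction error need to reach the decoder, which repeats `motion_compensate` on its own copy of the reference frame.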
GOP: Group of pictures (frames).
I frames (key frames):
Intra-coded frame, coded as a still image.
Can be decoded directly.
Used for GOP head, or at scene changes.
I frames also improve the error resilience.
P frames (inter-coded frames):
Prediction-based coding, based on previous frames.
B frames: Bi-directional interpolated prediction.
Predicted from both the previous frame and the next frame: more flexibility -> better prediction.
Encoding order: 1 4 2 3 7 5 6.
Decoding order: 1 4 2 3 7 5 6.
Display order: 1 2 3 4 5 6 7.
Needs more buffers.
Needs buffer manipulations to
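The encoding order listed above follows from coding each reference frame before the B frames that depend on it. A small sketch of that reordering is shown here; the frame-type pattern "IBBPBBP" is an assumption chosen because it reproduces the order on the slide, and `coding_order` is an illustrative name.

```python
def coding_order(frame_types):
    """Map display order (1-based) to coding order for a string such as 'IBBPBBP'."""
    order, pending_b = [], []
    for number, frame_type in enumerate(frame_types, start=1):
        if frame_type == "B":
            pending_b.append(number)   # must wait for the next reference frame
        else:
            order.append(number)       # I or P frame: code it first ...
            order.extend(pending_b)    # ... then the B frames displayed before it
            pending_b = []
    # Trailing B frames have no future reference here; they are simply appended.
    return order + pending_b
```

The decoder has to hold frame 4 in a buffer while frames 2 and 3 are decoded and displayed, which is where the extra buffering comes from.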
Split each frame into 16x16 blocks (MB), apply motion estimation for each macro-block.
Search window (maximum movement): w
Define a cost for finding the best match for each block in the previous frame:
Mean absolute error (MAE) or sum of absolute difference (SAD).
Mean squared error (MSE).
Sum of squared error (SSE).
Calculate the motion vector (MV) between the current block and its counterpart in the previous frame.
Calculate the macro-block differences and send them.
The best match is found by minimizing the sum of absolute differences (SAD):
\[
\mathrm{SAD}(s, c(m)) = \sum_{x=1,\,y=1}^{16,\,16} \left| s[x, y] - c[x - m_x,\, y - m_y] \right|
\]
where s is the original video signal, c is the coded video signal, and m = (m_x, m_y) is the motion vector.
What is new?
Variable block size ME.
Can yield 15% bit rate savings.
Multiple reference frame ME.
5-20% bit rate savings.
Sub-pixel ME.
20% bit rate savings over integer ME.
[Figure: encoder block diagram highlighting motion estimation with variable block sizes. MB types: 16x16, 16x8, 8x16, 8x8. 8x8 types: 8x8, 8x4, 4x8, 4x4.]
[Figure: encoder block diagram. Caption: Multiple Reference Frames for Motion Compensation.]
describe it; so sub-pixel ME is defined.
accuracy.
Search window (in previous frame):
Rectangle with the same coordinates as the current block in the current frame, extended by w pixels in each direction.
[Figure: a p x q block inside its (p+2w) x (q+2w) search window.]
Full Search:
All candidates within the search window are examined.
(2w+1)^2 positions should be examined.
Advantage: Good accuracy; finds the best match.
Disadvantage: Large amount of computation; (2w+1)^2 matches, with a 16x16 MAE for each match, which is impractical for real-time applications.
In order to avoid this complexity, we should reduce the number of search points, so we have to use fast block matching algorithms.
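A minimal sketch of full-search block matching is shown below. It assumes frames stored as lists of rows, the motion-vector convention (dx, dy) = (x1 - x0, y1 - y0), SAD as the matching cost, and that candidates falling outside the reference frame are skipped; the function names are illustrative.

```python
def sad(current, reference, x0, y0, dx, dy, size):
    """Sum of absolute differences between a block and its displaced candidate."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(current[y0 + y][x0 + x]
                         - reference[y0 + dy + y][x0 + dx + x])
    return total


def full_search(current, reference, x0, y0, size, w):
    """Examine every displacement in [-w, w] x [-w, w]; return (best_mv, best_cost)."""
    height, width = len(reference), len(reference[0])
    best_mv, best_cost = None, None
    for dy in range(-w, w + 1):
        for dx in range(-w, w + 1):
            x1, y1 = x0 + dx, y0 + dy
            if x1 < 0 or y1 < 0 or x1 + size > width or y1 + size > height:
                continue  # candidate block falls outside the reference frame
            cost = sad(current, reference, x0, y0, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

The two nested loops visit up to (2w+1)^2 candidates, and each visit costs size x size absolute differences, which is the complexity that fast block matching algorithms try to cut.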
It is not a normative part of the standard.
Due to the variety of encoding modes for a MB, a mode selection should be made:
\[
N_{\text{modes for a MB}} = N_{\text{QPs}} \times N_{\text{ref pics}} \times N_{\text{inter modes}} + N_{\text{QPs}} \times N_{\text{intra modes}} = 3 \times 1 \times 259 + 3 \times (9 + 4) = 816
\]
Existing criteria:
\[
\min_{X} D(X) \quad \text{s.t.} \quad R(X) \le R_T
\]
Lagrange method: min D(X) + λ R(X), where λ trades off D against R.
\[
J(m, \lambda_{\text{motion}}) = \mathrm{SAD}(s, c(m)) + \lambda_{\text{motion}} \cdot R(m - p)
\]
RDCost = Distortion + λ×Rate
(RDCost_Intra_16x16, RDCost_Intra_4x4, RDCost_I_PCM, RDCost_SKIP, RDCost_Inter_16x16, RDCost_Inter_16x8, RDCost_Inter_8x16 and RDCost_Inter_8x8)
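The motion cost and the final mode choice can be sketched as follows. The rate term R(m - p) is modeled here as the bit length of signed Exp-Golomb codes for the two components of the motion vector difference; that rate model, and the names `se_bits`, `motion_cost`, and `best_mode`, are assumptions made for illustration.

```python
def se_bits(value):
    """Bit length of the signed Exp-Golomb code for an integer."""
    code_num = 2 * abs(value) - (1 if value > 0 else 0)
    return 2 * (code_num + 1).bit_length() - 1


def motion_cost(sad, mv, predictor, lam):
    """J(m, lambda_motion) = SAD(s, c(m)) + lambda_motion * R(m - p)."""
    rate = se_bits(mv[0] - predictor[0]) + se_bits(mv[1] - predictor[1])
    return sad + lam * rate


def best_mode(rd_costs):
    """Pick the mode name with the smallest RDCost from a {name: cost} mapping."""
    return min(rd_costs, key=rd_costs.get)
```

A vector close to its predictor p is cheap to code, so a slightly worse SAD can still win once the rate term is added.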
Each frame can be coded in one or more slices.
Number of macro-blocks per slice need not be constant within a picture.
Because of minimal inter-dependency between coded slices, the propagation of error can be limited.
Slices can have different shapes and sizes.
Slices do not have to be consecutive in the raster scan.
Each slice is self-contained.
Can be decoded without knowing the data of other slices.
Useful for:
Error resilience and concealment.
Parallel processing.
Each slice can be coded as one of 5 types:
I slice: All MBs are coded using intra mode.
P slice: An MB can be coded in intra mode or inter mode with at most one prediction signal per block.
B slice: In addition to the modes in a P slice, some MBs can also be predicted using two prediction signals per block.
SP slice: Switching-P slice. To facilitate switching between different video streams.
SI slice: Switching-I slice. Using only Intra prediction.
Transform Coding
Reducing the spatial redundancy of prediction error:
Like 2-D Fourier transform.
Lossy compression (not by itself only, but) after quantization (ignoring high frequency components).
Using integer transform instead of DCT:
Similar properties with DCT.
Integer operations.
No worries about DCT and IDCT matching.
Transform size:
Other standards: 8x8
H.264: 4x4, 8x8
H.264 uses three types of transforms:
Hadamard transform for 4x4 array of luma DC coefficients.
Hadamard transform for 2x2 array of chroma DC coefficients.
DCT-based transform for all other 4x4 blocks in residual data.
Fundamental differences between Hadamard transform and DCT:
It is an integer transform.
It is possible to ensure zero mismatch between encoder and decoder.
Can be implemented using only additions and shifts.
A scaling multiplication is integrated into the quantizer.
Can be carried out using 16-bit integer arithmetic.
DCT:
[1]
DCT Approximation:
[1]
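As an illustration of the integer approximation, the sketch below applies the 4x4 forward core transform W = Cf X Cf^T using the integer matrix commonly given for H.264. The post-scaling is not applied here, since it is folded into the quantizer; the names `CF` and `forward_core_transform` are illustrative.

```python
# Integer forward transform matrix (entries are +-1 and +-2 only).
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def forward_core_transform(block):
    """Compute W = CF * X * CF^T for a 4x4 block using integer arithmetic only."""
    tmp = [[sum(CF[i][k] * block[k][j] for k in range(4)) for j in range(4)]
           for i in range(4)]
    return [[sum(tmp[i][k] * CF[j][k] for k in range(4)) for j in range(4)]
            for i in range(4)]
```

Because every entry is 1 or 2 in magnitude, each multiplication can be replaced by an addition or a one-bit shift, and encoder and decoder obtain bit-identical results.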
4x4 and 2x2 Hadamard transforms:
[1]
Requirements of complicated forward and inverse quantizers:
Avoid division and/or floating point arithmetic.
Incorporate the post- and pre-scaling matrices.
The basic forward quantizer:
A total of 52 values of Qstep are supported by the standard, indexed by a quantization parameter.
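The mapping from the quantization parameter (QP) to Qstep can be sketched as below. The sketch relies on a property of H.264 that is not spelled out on the slide: Qstep doubles for every increase of 6 in QP, starting from six base values. The floating-point `quantize` helper is only a conceptual rounding of coefficient / Qstep, not the integer-arithmetic quantizer, and both function names are illustrative.

```python
# Qstep for QP = 0..5; every further 6 steps in QP doubles the step size.
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantizer step size for a quantization parameter in 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


def quantize(coeff, qp):
    """Conceptual forward quantizer: round(coeff / Qstep), symmetric around zero."""
    level = int(abs(coeff) / qstep(qp) + 0.5)
    return level if coeff >= 0 else -level
```

For example, QP = 28 gives a step size of 16, so a coefficient of 100 quantizes to level 6.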
Wide range of quantizer step sizes makes it possible for an encoder to control the tradeoff accurately and flexibly between bit rate and quality.
QP can be different for luma and chroma.
Forward quantization in integer arithmetic:
Inverse quantization in integer arithmetic:
Applied to each decoded macro-block to reduce the blocking distortion.
Filtering is applied to vertical or horizontal edges of 4x4 blocks in a macro-block (except for edges on slice boundaries).
Filter is stronger at places where there is likely to be significant blocking distortion.
contain coded coefficients.
Effect of the filter decision is to switch off the filter when there is a significant change (gradient) across the block boundary in the original image.
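The switch-off decision can be sketched as a threshold test on the samples next to the edge. In the sketch, p1 and p0 are the two samples on one side of the boundary and q0 and q1 the two on the other side; `alpha` and `beta` stand for the QP-dependent thresholds of the standard and are passed in as plain parameters here, and the boundary-strength check is omitted. The function name is illustrative.

```python
def should_filter(p1, p0, q0, q1, alpha, beta):
    """Return True when the step across the edge looks like a blocking artifact.

    A large jump across the boundary (>= alpha) or a steep gradient on either
    side (>= beta) is treated as real image content, and filtering is skipped.
    """
    return (abs(p0 - q0) < alpha
            and abs(p1 - p0) < beta
            and abs(q1 - q0) < beta)
```

With hypothetical thresholds alpha = 10 and beta = 4, a small step such as 62 to 66 across the edge is smoothed, while a step from 62 to 120 is left untouched.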
Only in H.26x standards.
[Figure: Without deblocking filter. With deblocking filter.]
Error Resilience Tools:
To minimize the visual effect of error within a frame.
To avoid error propagation.
These tools include:
1. Flexible macro-block ordering (FMO).
2. Arbitrary slice ordering (ASO).
3. Redundant slices (RS).
4. Slice data partitioning (DP).
5. Slice structured coding.
6. Flexible reference frame concept.
7. Picture switching.
8. Intra-coding.
FMO can work to randomize the data prior to transmission.
pictures.
Relevant neighboring data is available for concealment of lost content.
ASO allows slices of a picture to appear in any order for delay reduction.
The primary slice is coded with a low QP (and hence in good quality); the RS is coded with a high QP (utilizing fewer bits).
Decoder decodes the primary slice, if it is available, and discards the RS.
If the primary slice is missing, the RS can be reconstructed.
Type A: Header information.
Type B: Intra Partition.
Type C: Inter Partition.
Baseline: Progressive, Videoconferencing & Wireless.
I and P picture types (not B).
Uses list0 for P-Slices.
In-loop deblocking filter.
1/4-sample motion compensation.
Tree-structured motion segmentation down to 4x4 block size.
VLC-based entropy coding (UVLC and CAVLC).
Some enhanced error resilience features:
Flexible macro-block ordering/arbitrary slice ordering.
Redundant slices.
May use list0 and/or list1 for B-Slices.
B-Slices may use:
One past and one future reference.
Two past references.
Two future references.
SI and SP slices are specially coded slices that enable (among other things) efficient switching between video streams and efficient random access for video decoders.
A common requirement in a streaming application is for a video decoder to switch between one of several encoded streams.
SI slices are the same as SP slices, but their prediction is Intra 4x4 from the previously decoded and reconstructed image samples.
Can decode A0, create SP-Slice A0-10, and predict A11 from A0.
The first version of the standard uses:
The 4:2:0 chroma format.
Typically derived by performing an RGB-to-YCbCr color-space transformation.
8-bit sample precision for luma and chroma values.
The FRExt amendment extended the standard to 4:2:2 and 4:4:4 chroma formats and higher than 8 bits precision.
In 4:2:0 chroma format, each macro-block consists of a 16x16 region of luma samples and two corresponding 8x8 chroma sample arrays.
In a macro-block of 4:2:2 chroma format video, the chroma sample arrays are 8x16 in size; and in a macro-block of 4:4:4 chroma format video, they are 16x16 in size.
FRExt uses the YCgCo color space (where the "Cg" stands for green chroma and the "Co" stands for orange chroma), which is much simpler and typically has equal or better coding efficiency.
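To show why the YCgCo conversion is simple, here is a sketch of the reversible lifting variant often written YCgCo-R, which needs only additions, subtractions, and one-bit shifts. Choosing this particular variant is an assumption made for illustration; the slides do not say which form is meant, and the function names are illustrative.

```python
def rgb_to_ycgco_r(r, g, b):
    """Forward lifting steps: integer in, integer out, exactly invertible."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_r_to_rgb(y, cg, co):
    """Inverse lifting steps, undoing the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

Each inverse step subtracts exactly what the matching forward step added, so the round trip is lossless for any integer input.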
Hybrid video coding:
Motion compensated DPCM + spatial decorrelating transformations.
Difference between encoder and decoder prediction loop
MPEG issued a call for proposals for efficient
12 of 14 submitted proposals were based on 3-D
After six months of extensive study, the scalable
In January 2005, MPEG and VCEG agreed to jointly
Temporal scalability.
Spatial scalability.
Quality scalability:
Coarse grain scalability.
Fine grain scalability.
Temporal layers.
MCP is restricted to reference pictures of the lower
Hierarchical B-pictures.
In general, hierarchical prediction structures can be
It is possible to arbitrarily adjust the structural delay
However, the coding efficiency typically decreases.
To reduce the complexity of RDO the following
Based on the QP0 for the base layer:
QPt = QP0 + 3 + T (T is the temporal layer number).
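The rule above can be turned into a per-layer QP table with a few lines. The sketch assumes that the base temporal layer (T = 0) keeps QP0 itself and that the formula applies to T > 0; it also clips the result to 51, the largest QP the standard supports. Both choices, and the function name, are assumptions for illustration.

```python
def temporal_layer_qps(qp0, num_layers):
    """QP for each temporal layer T = 0 .. num_layers - 1."""
    qps = []
    for t in range(num_layers):
        qp = qp0 if t == 0 else qp0 + 3 + t  # QPt = QP0 + 3 + T for T > 0
        qps.append(min(qp, 51))              # stay inside the valid QP range
    return qps
```

Higher temporal layers are referenced by fewer pictures, so they are quantized more coarsely without a per-picture rate-distortion search.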
Inter layer prediction:
Inter layer intra prediction.
Inter layer motion prediction.
Inter layer residual prediction.
Macroblock partitioning is obtained by upsampling
Reference picture indices are copied from the co-
MVs may be refined by quarter pel accuracy.
Base layer signal of the co-located block is
When the co-located 8x8 sub-macroblock in the
Corresponding reconstructed intra signal of the
SNR scalability:
Coarse grain scalability.
Fine grain scalability.
Coarse grain scalability:
It is achieved by using the concepts for spatial scalability.
Difference: no upsampling for inter layer prediction.
Progressive refinement (PR) slices.
Each PR slice represents a refinement of the residual
Only a single inverse transform has to be performed
Drift: When MCP loops at decoder and encoder are
The highest quality reference available is employed
MCP for key pictures is done by only using the base
Temporal, spatial, and SNR scalability can be
Temporal Scalability: Without loss in RD
Spatial Scalability: Bit rate increase relative to non
SNR Scalability: 10% increase in the best case.