H.264/AVC Standard

History

  • Objectives:
    • 50% bit rate savings compared to MPEG-2
    • High-quality video at both low and high bit rates: 64 kbps to 240 Mbps
    • Network-friendly: more error-resilient tools
    • Support for both conversational and non-conversational applications:
      • Conversational: video conferencing
      • Non-conversational: storage, broadcast, streaming
  • 1998: Call for proposals for H.26L issued by ITU-T VCEG (Video Coding Experts Group)
  • Oct. 1999: First draft design
  • Dec. 2001: ITU and ISO formed the Joint Video Team (JVT)
  • Mar. 2003: Approved as ITU-T H.264 and ISO/IEC MPEG-4 Part 10 Advanced Video Coding (AVC)
  • Jul. 2004: Fidelity Range Extensions (FRExt)
  • Current: Scalability Extensions

Terminology

  • Block: An MxN (M-column by N-row) array of samples, or an MxN array of transform coefficients (e.g., a 4x4 block contains coded data corresponding to a 4x4 sample region of the video frame).
  • Macroblock: A 16x16 block of luma samples and two corresponding blocks of chroma samples.
  • Sub-macroblock: One quarter of the samples of a macroblock, i.e., an 8x8 luma block and two corresponding chroma blocks, of which one corner is located at a corner of the macroblock.
  • Slice: Contains an integral number of MBs, from 1 (1 MB per slice) to the total number of MBs in a picture (1 slice per picture).

Terminology

  • Reference picture: a previously encoded frame that may be used as a reference in motion estimation (i.e., a reference picture contains samples that may be used for inter prediction in the decoding process of subsequent pictures in decoding order).
  • Motion vector: A two-dimensional vector used for inter prediction that provides an offset from the coordinates in the decoded picture to the coordinates in a reference picture.
  • Profile: A set of coding functions; each profile specifies what is required of an encoder or decoder that complies with it.
  • Level: Performance limits for codecs are defined by a set of levels, each placing limits on parameters such as sample processing rate, picture size, coded bit rate and memory requirements.

Key Idea in Video Coding

  • Predict each frame from the previous frame and encode only the prediction error:
    • The prediction error has less energy and is easier to compress.
  • Predict at what level?
    • Frame level: good for camera panning
    • Block level (small blocks): too much motion information to code
    • Macroblock (MB) level (16x16 pixels): most widely used

Scope of Picture and Video Coding Standardization

  • Only the syntax and the decoder are standardized.

Coded Data Format

  • H.264 makes a distinction between a Video Coding Layer (VCL) and a Network Abstraction Layer (NAL).
  • The output of the encoding process is VCL data (a sequence of bits representing the coded video data), which is mapped to NAL units prior to transmission or storage.
  • Each NAL unit contains a Raw Byte Sequence Payload (RBSP), a set of data corresponding to coded video data or header information.
  • A coded video sequence is represented by a sequence of NAL units (figure below) that can be transmitted over a packet-based network or a bitstream transmission link, or stored in a file.

NAL

H.264 consists of a video coding layer (VCL) and a network abstraction layer (NAL).

Each NAL unit consists of a header and a payload. The header specifies the NAL unit type, and the payload contains the related data.

The NAL design specified in the Recommendation is appropriate for the adaptation of H.264 over RTP/UDP/IP, H.324/M, MPEG-2 transport, and H.320.

NALU

  • The NAL header contains three fields:

1. The Nal_Ref_Idc (NRI) field contains two bits that indicate the priority of the NALU payload: 11 is the highest transport priority, followed by 10, then 01, and finally 00, the lowest.

NALU

2. The NALU type (T) is a 5-bit field that characterizes the NALU as one of 32 different types. Types 1–12 are currently defined by H.264; types 24–31 are made available for uses outside of H.264.

3. Finally, the forbidden bit (forbidden_zero_bit) is specified to be zero in H.264 encoding. Network elements can set this bit to 1 when they identify bit errors in the NALU.
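
All three fields fit in the first byte of a NAL unit: from the most significant bit down, 1 bit of forbidden_zero_bit, 2 bits of nal_ref_idc (NRI) and 5 bits of nal_unit_type. The Python sketch below splits that byte; `parse_nal_header` is a hypothetical helper name, not a library function.

```python
def parse_nal_header(first_byte: int) -> dict:
    """Split the one-byte H.264 NAL unit header into its three fields."""
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("the NAL unit header is a single byte (0..255)")
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x1,  # 1 bit, 0 in a valid stream
        "nal_ref_idc": (first_byte >> 5) & 0x3,         # 2 bits, NRI priority
        "nal_unit_type": first_byte & 0x1F,             # 5 bits, one of 32 types
    }


# 0x67 = 0b0_11_00111: NRI = 3 (highest priority), type 7 (sequence parameter set)
print(parse_nal_header(0x67))
# {'forbidden_zero_bit': 0, 'nal_ref_idc': 3, 'nal_unit_type': 7}
```

A network element that detects bit errors sets the top bit, so a receiver can check `forbidden_zero_bit` before handing the payload to the decoder.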

Nal_Ref_Idc (NRI)

H.264/AVC Profiles

  • The AVC scheme includes different profiles:
    • Baseline Profile – for low-delay end-to-end applications.
    • Main Profile – for broadcast applications at SD (Standard Definition) level.
    • Extended Profile – for mobile applications and streaming.
    • High Profiles – to address the needs of the most demanding applications, such as contribution and distribution of content, studio editing and post-processing; introduced as the "fidelity range extensions" (FRExt):
      • High profile (HP): 8 bits per sample, 4:2:0 sampling
      • High 10 profile (Hi10P): 8–10 bits per sample, 4:2:0 sampling
      • High 4:2:2 profile (H422P): 10 bits per sample, 4:2:2 sampling
      • High 4:4:4 profile (H444P): 12 bits per sample, supporting up to 4:4:4 chroma sampling, and additionally supporting efficient lossless region coding

Baseline

  • Intra coding
  • Inter coding using I-slices and P-slices
  • Entropy coding using context-adaptive variable-length codes (CAVLC)

Main

  • Interlaced video
  • Inter coding using B-slices
  • Inter coding using weighted prediction
  • Entropy coding using context-based adaptive binary arithmetic coding (CABAC)

Extended

  • SI and SP slices to enable efficient switching between coded bit streams
  • Improved error resilience (data partitioning)
  • Does not support CABAC

Applications

Baseline: video telephony, video conferencing, wireless communications

Main: television broadcasting, video storage

Extended: streaming media applications

Video Format (Progressive)

Video Format (Interlaced)

Frame and Field Coding

From a coding-efficiency point of view, a decision needs to be made whether to compress each picture as one single frame or as two separate fields.

Reference Pictures

Reference picture management uses a window of n frames and two lists, list0 and list1.

New Features Added to H.264/AVC

  • Intra prediction
  • Motion estimation:
    • Variable block sizes
    • Multiple reference frames
    • Sub-pixel motion estimation
  • Image transform
  • Deblocking filter
  • Entropy coding

Basic Macroblock Coding Structure

[Figure: macroblock coding block diagram with the labels: input video signal split into 16x16-pixel macroblocks, coder control, transform/scaling/quantization, decoder (scaling and inverse transform, intra-frame prediction, motion compensation, deblocking filter, output video signal), intra/inter switch, motion estimation, and entropy coding of control data, quantized transform coefficients and motion data.]

Encoder (Forward Path)

  • An input frame or field Fn is processed in units of MBs.
  • Each MB is encoded in intra or inter mode.
  • For each block in the MB, a prediction PRED (marked ‘P’ in the figure) is formed based on reconstructed picture samples.

Encoder (Forward Path)

  • In intra mode:
    • PRED is formed from samples in the current slice that have previously been encoded, decoded and reconstructed (uF’n in the figures; note that unfiltered samples are used to form PRED).
  • In inter mode:
    • PRED is formed by motion-compensated prediction from one or more reference picture(s) selected from a set of reference pictures.

Encoder (Forward Path)

  • P is subtracted from the current block to produce Dn (a residual block), which is transformed and quantized to give X.
  • X: a set of quantized transform coefficients, which are reordered and entropy-encoded.
  • The entropy-encoded coefficients, together with side information (prediction modes, quantization parameter, motion vector information, etc.), form the compressed bitstream.
  • The compressed bitstream is passed to a Network Abstraction Layer (NAL) for transmission or storage.

Encoder (Reconstruction Path)

  • The encoder decodes (reconstructs) every MB to provide a reference for further predictions.
  • The coefficients X are scaled (Q⁻¹) and inverse transformed (T⁻¹) to produce a difference block D’n.
  • The prediction block P is added to D’n to create a reconstructed block uF’n (a decoded version of the original block; u indicates that it is unfiltered).
  • A filter is applied to reduce the effects of blocking distortion, and the reconstructed reference picture is created from a series of blocks F’n.

Decoder

  • The decoder receives a compressed bitstream from the NAL.
  • It entropy-decodes the data elements to produce a set of quantized coefficients X.
  • These are scaled and inverse transformed to give D’n (identical to the D’n shown in the encoder).
  • Using the header information decoded from the bitstream, the decoder creates a prediction block PRED (identical to the original prediction PRED formed in the encoder).
  • PRED is added to D’n to produce uF’n, which is filtered to create each decoded block F’n.

Intra-prediction

Motivation: intra frames are natural images, so they exhibit strong spatial correlation.

Macroblocks in intra-coded frames are predicted from previously coded blocks above and/or to the left of the current block. The macroblock may be divided into sixteen 4x4 sub-blocks, which are predicted in cascading fashion.

9 modes for 4x4 blocks and 4 modes for 16x16 blocks.

Intra-prediction (cont’d)

[Figure: macroblock coding block diagram]

Directional spatial prediction (9 types for luma, 4 for chroma). For example, in mode 4 (diagonal down-right), samples a, f, k and p are predicted by (A + 2Q + I + 2) >> 2.

[Figure: 4x4 block with samples a–p, predicted from previously reconstructed neighboring samples Q (above-left corner), A–H (above and above-right) and I–P (left column); the numbers 1–8 mark the prediction directions.]

Luma 4x4 Intra Modes

Luma 4x4 Intra Modes

d = round(B/4 + C/2 + D/4), which in integer arithmetic is d = (B + 2C + D + 2) >> 2
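
The formula above is the 3-tap (1, 2, 1)/4 smoothing filter used by the diagonal modes; for sample d it corresponds to the diagonal down-right mode. Below is a minimal Python sketch of four of the nine Intra_4x4 predictors, assuming all neighbors are available and using the standard's mode numbers (0 vertical, 1 horizontal, 2 DC, 4 diagonal down-right). `top` holds A–D, `left` holds I–L and `corner` is Q; the function name is illustrative.

```python
def intra4x4_predict(mode, top, left, corner):
    """Return a 4x4 prediction (list of rows) for modes 0, 1, 2 and 4."""
    def t(i):  # sample above the block; index -1 is the corner Q
        return corner if i < 0 else top[i]

    def l(j):  # sample left of the block; index -1 is the corner Q
        return corner if j < 0 else left[j]

    if mode == 0:  # vertical: copy A..D downwards
        return [list(top[:4]) for _ in range(4)]
    if mode == 1:  # horizontal: copy I..L to the right
        return [[left[y]] * 4 for y in range(4)]
    if mode == 2:  # DC: rounded mean of the 8 neighbors
        dc = (sum(top[:4]) + sum(left[:4]) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    if mode == 4:  # diagonal down-right: (1, 2, 1)/4 filter along the diagonal
        pred = [[0] * 4 for _ in range(4)]
        for y in range(4):
            for x in range(4):
                if x > y:
                    d = x - y
                    pred[y][x] = (t(d - 2) + 2 * t(d - 1) + t(d) + 2) >> 2
                elif x < y:
                    d = y - x
                    pred[y][x] = (l(d - 2) + 2 * l(d - 1) + l(d) + 2) >> 2
                else:
                    pred[y][x] = (top[0] + 2 * corner + left[0] + 2) >> 2
        return pred
    raise NotImplementedError("only modes 0, 1, 2 and 4 are sketched here")


pred = intra4x4_predict(4, top=[10, 20, 30, 40], left=[50, 60, 70, 80], corner=0)
print(pred[0])  # first row a, b, c, d: [15, 10, 20, 30]
```

Sample d in this output is (20 + 2·30 + 40 + 2) >> 2 = 30, the integer form of round(B/4 + C/2 + D/4).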

Luma 16x16 Intra Modes

Intra4x4 Prediction Example – Vertical

Intra4x4 Prediction Example – Horizontal

Intra4x4 Prediction Example – DC

Intra4x4 Prediction Example – Diagonal Down-Right

Optimal Intra4x4 Mode Selection

Select the mode with the best R-D tradeoff.

Full search method:
  • Divide each MB into sixteen 4x4 blocks.
  • For each 4x4 block, for each of the nine Intra_4x4 prediction modes:
    • Predict the current 4x4 block using the current mode.
    • Get the prediction residual.

Intra_16x16 Prediction

Intra_16x16 prediction (4 modes):
  • Predicts the entire 16x16 block
  • Suitable for smooth areas
  • Four modes: 0: Vertical, 1: Horizontal, 2: DC, 3: Plane

Optimal Intra16x16 Mode Selection

  • Full search method:
    For each Intra_16x16 prediction mode:
      • Get the prediction of the current MB.
      • Find the prediction residual.
      • Perform a 2D 4-point Hadamard transform on each 4x4 block.
      • Extract the DC coefficients of the sixteen 4x4 blocks and apply a 2D 4-point Hadamard transform to this 4x4 DC array again.
      • Cost estimation: sum the absolute values of all the Hadamard transform coefficients.
    end
    Choose the mode with the smallest cost as the best Intra_16x16 prediction mode for this MB.
  • Decision between Intra_4x4 and Intra_16x16:
    • Compare the costs of the Intra_4x4 and Intra_16x16 modes to find the best mode.
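
A minimal Python sketch of this Hadamard-based cost, assuming all neighbors are available. Two simplifications are assumptions of the sketch, not taken from the slide: the plane mode is omitted, and the cost sums the AC Hadamard coefficients of each 4x4 block plus all coefficients of the transformed DC array, without any normalization a real encoder might apply. Function names are illustrative.

```python
H4 = [[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, -1, 1], [1, -1, 1, -1]]


def hadamard4x4(block):
    """2D 4-point Hadamard transform H * block * H^T."""
    tmp = [[sum(H4[i][k] * block[k][j] for k in range(4)) for j in range(4)] for i in range(4)]
    return [[sum(tmp[i][k] * H4[j][k] for k in range(4)) for j in range(4)] for i in range(4)]


def predict16x16(mode, top, left):
    """Intra_16x16 prediction for modes 0 (vertical), 1 (horizontal) and 2 (DC)."""
    if mode == 0:
        return [list(top) for _ in range(16)]
    if mode == 1:
        return [[left[y]] * 16 for y in range(16)]
    if mode == 2:
        dc = (sum(top) + sum(left) + 16) >> 5
        return [[dc] * 16 for _ in range(16)]
    raise NotImplementedError("plane mode (3) is not covered by this sketch")


def intra16x16_cost(cur, pred):
    """Hadamard-based cost of the residual cur - pred (both 16x16 lists of rows)."""
    resid = [[cur[y][x] - pred[y][x] for x in range(16)] for y in range(16)]
    dc = [[0] * 4 for _ in range(4)]
    cost = 0
    for by in range(4):
        for bx in range(4):
            blk = [row[4 * bx:4 * bx + 4] for row in resid[4 * by:4 * by + 4]]
            coeffs = hadamard4x4(blk)
            dc[by][bx] = coeffs[0][0]  # keep the DC for the second-stage transform
            cost += sum(abs(c) for row in coeffs for c in row) - abs(coeffs[0][0])
    cost += sum(abs(c) for row in hadamard4x4(dc) for c in row)
    return cost


def best_intra16x16_mode(cur, top, left):
    """Try modes 0-2 and return (best_mode, {mode: cost})."""
    costs = {m: intra16x16_cost(cur, predict16x16(m, top, left)) for m in (0, 1, 2)}
    return min(costs, key=costs.get), costs
```

For a macroblock whose rows all repeat the row above it, the vertical mode leaves a zero residual and therefore wins with cost 0.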

Motion Estimation (ME)

  • For each block, find the best match in the previous frame (reference frame):
    • Upper-left corner of the block being encoded: (x0, y0)
    • Upper-left corner of the matched block in the reference frame: (x1, y1)
    • Motion vector (dx, dy): the offset between the two blocks:
      • (dx, dy) = (x1 – x0, y1 – y0)
      • (x0, y0) + (dx, dy) = (x1, y1)
  • The motion vector needs to be sent to the decoder.

Motion Compensation (MC)

Given the reference frame and the motion vector, the decoder can obtain a prediction of the current frame.

Prediction error: the difference between the current frame and the prediction.

The prediction error is coded by DCT, quantization, and entropy coding.

GOP, I, P, and B Frames

  • GOP: group of pictures (frames).
  • I frames (key frames):
    • Intra-coded frames, coded as still images; they can be decoded directly.
    • Used at the head of a GOP or at scene changes.
    • I frames also improve error resilience.
  • P frames (inter-coded frames):
    • Prediction-based coding, based on previous frames.

GOP, I, P, and B Frames

B frames: bi-directionally interpolated prediction frames.
  • Predicted from both the previous frame and the next frame: more flexibility, better prediction.
  • Encoding order: 1 4 2 3 7 5 6
  • Decoding order: 1 4 2 3 7 5 6
  • Display order: 1 2 3 4 5 6 7
  • More buffers are needed, and buffer manipulation is required to display the frames in the correct order.
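
The listed orders follow from treating frames 1, 4 and 7 as I/P anchors and frames 2, 3, 5 and 6 as B frames: each anchor must be coded before the B frames that precede it in display order. A small Python sketch of that reordering (a hypothetical helper that assumes every B frame references only the anchors around it):

```python
def coding_order(frame_types):
    """Return 1-based display numbers in coding order for a string such as "IBBPBBP"."""
    order, waiting_b = [], []
    for number, kind in enumerate(frame_types, start=1):
        if kind == "B":
            waiting_b.append(number)  # a B frame needs the next anchor first
        else:
            order.append(number)      # the I or P anchor is coded first...
            order.extend(waiting_b)   # ...then the B frames it closes
            waiting_b = []
    order.extend(waiting_b)  # trailing B frames are simply flushed in this sketch
    return order


print(coding_order("IBBPBBP"))  # [1, 4, 2, 3, 7, 5, 6]
```

The decoder receives frames in this order and holds each decoded anchor in a buffer until the B frames before it have been displayed, which is why extra buffering is needed.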

Block Matching Algorithms for ME

  • Each frame is split into 16x16-pel blocks (MBs); motion estimation is done for each macroblock.
  • Search window (maximum movement) w: typically 8, 16 or 32.
  • A cost is defined for finding the best match for each block in the previous frame:
    • Mean Absolute Error (MAE) or Sum of Absolute Differences (SAD)
    • Mean Square Error (MSE)
    • Sum of Squared Errors (SSE)
  • The motion vector (MV) is calculated between the current block and its counterpart in the previous frame.
  • The macroblock differences are calculated and sent.

Cost Function

The best match is found by minimizing the SAD (Sum of Absolute Differences) function, computed as:

$$\mathrm{SAD}(s, c(\mathbf{m})) = \sum_{x=1,\,y=1}^{16,\,16} \left| s[x, y] - c[x - m_x,\, y - m_y] \right|$$

where s is the original video signal and c is the coded video signal.
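
A Python sketch of the block-matching costs from the previous slide. `displaced_block` builds c(m) with the same sign convention as the SAD formula (sample (x, y) taken from position (x − m_x, y − m_y)), but with 0-based indices; both helper names are hypothetical, and no padding is done, so the displaced block is assumed to lie inside the reference frame.

```python
def displaced_block(ref, x0, y0, mx, my, size=16):
    """c(m): block whose sample at (x, y) is ref[y - my][x - mx], as in the SAD formula.

    Frames are lists of rows indexed [y][x]; there is no bounds checking or padding.
    """
    return [[ref[y - my][x - mx] for x in range(x0, x0 + size)]
            for y in range(y0, y0 + size)]


def block_costs(cur, cand):
    """SAD, MAE, SSE and MSE between two equally sized blocks (lists of rows)."""
    diffs = [a - b for row_s, row_c in zip(cur, cand) for a, b in zip(row_s, row_c)]
    sad = sum(abs(d) for d in diffs)
    sse = sum(d * d for d in diffs)
    return {"SAD": sad, "MAE": sad / len(diffs), "SSE": sse, "MSE": sse / len(diffs)}


print(block_costs([[1, 2], [3, 4]], [[1, 0], [5, 4]]))
# {'SAD': 4, 'MAE': 1.0, 'SSE': 8, 'MSE': 2.0}
```

SAD and MAE differ only by the constant block size, so they pick the same best match; SAD avoids the division and is the cheaper of the two.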

Motion Estimation in H.264

What is new?
  • Variable block-size motion estimation: can yield 15% bit rate savings
  • Multiple reference frame motion estimation: 5–20% bit rate savings
  • Sub-pixel motion estimation: 20% bit rate savings over integer ME

Search Window

Search window (in the previous frame): a rectangle with the same coordinates as the current block in the current frame, extended by w pixels in each direction.

[Figure: a p x q block and its (p + 2w) x (q + 2w) search window.]

Full Search Method

  • Full search:
    • All candidates within the search window are examined.
    • (2w+1)² positions must be examined.
    • Advantage: good accuracy; finds the best match.
    • Disadvantage: a large amount of computation, (2w+1)² matches with a 16x16 MAE for each match, which is impractical for real-time applications.
  • To avoid this complexity, the number of search points must be reduced, so fast block-matching algorithms are used.
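
A direct Python sketch of full search over a square window of ±w pixels, using SAD as the cost. It follows the (dx, dy) = (x1 − x0, y1 − y0) convention from the motion estimation slide, and, as an assumption of this sketch, simply skips candidates that would fall outside the reference frame instead of padding it. The function name is illustrative.

```python
def full_search(cur, ref, x0, y0, w, n=16):
    """Exhaustive block matching over all (2w+1)^2 displacements in [-w, w] x [-w, w].

    Returns ((dx, dy), sad) for the n x n block at (x0, y0) in cur; the candidate
    in ref starts at (x0 + dx, y0 + dy). Frames are lists of rows indexed [y][x].
    """
    height, width = len(ref), len(ref[0])
    best_mv, best_sad = None, None
    for dy in range(-w, w + 1):
        for dx in range(-w, w + 1):
            x1, y1 = x0 + dx, y0 + dy
            if x1 < 0 or y1 < 0 or x1 + n > width or y1 + n > height:
                continue  # candidate leaves the reference frame
            sad = sum(abs(cur[y0 + j][x0 + i] - ref[y1 + j][x1 + i])
                      for j in range(n) for i in range(n))
            if best_sad is None or sad < best_sad:
                best_mv, best_sad = (dx, dy), sad
    return best_mv, best_sad
```

The two nested loops make the (2w+1)² cost explicit: with w = 16 and 16x16 blocks this is 1089 candidates of 256 absolute differences each, per macroblock.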

Initial Search Point Prediction

  • A median predictor is used to define the initial search point.
  • This is the median of the motion vectors of three spatially adjacent blocks: left (A), top (B) and top-right (C), or top-left (D) when the top-right block is unavailable:
    • If C does not exist, then C = D.
    • If B and C do not exist, then prediction = MV_A.
    • If A and C do not exist, then prediction = MV_B.
    • If A and B do not exist, then prediction = MV_C.
    • Otherwise, prediction = median(MV_A, MV_B, MV_C).

[Figure: neighboring blocks D (top-left), B (top), C (top-right) and A (left) of the current block E.]

$$mv_{pred} = (pred_x,\ pred_y) = \operatorname{median}(mv_A,\ mv_B,\ mv_C)$$
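
The availability rules above translate directly into code. In this Python sketch each neighbor is an (x, y) tuple or None when unavailable; treating a still-missing neighbor as a zero vector in the median case is an assumption here, and the standard's additional reference-index rules are not modeled. Function names are illustrative.

```python
def median3(a, b, c):
    """Median of three numbers."""
    return max(min(a, b), min(max(a, b), c))


def predict_mv(mv_a, mv_b, mv_c, mv_d):
    """Median MV predictor from left (A), top (B), top-right (C) and top-left (D)."""
    if mv_c is None:
        mv_c = mv_d  # top-right missing: fall back to top-left
    avail = [mv is not None for mv in (mv_a, mv_b, mv_c)]
    if avail == [True, False, False]:
        return mv_a
    if avail == [False, True, False]:
        return mv_b
    if avail == [False, False, True]:
        return mv_c
    # Assumption: any neighbor that is still missing counts as a zero vector.
    a, b, c = (mv if mv is not None else (0, 0) for mv in (mv_a, mv_b, mv_c))
    return (median3(a[0], b[0], c[0]), median3(a[1], b[1], c[1]))


print(predict_mv((1, 2), (5, -1), (3, 4), None))  # (3, 2): median taken per component
```

The median is taken separately for the x and y components, so the predicted vector need not equal any single neighbor's vector.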

2-D Logarithmic Search (TDL)

  • Examine the central point and its four surrounding points.
  • Distance from the center: w/2.
  • Find the best match.
  • If the best match is not at the center, examine three new points centered on the previous best.
  • When the best match is at the center, halve the distance and continue until the distance is 1; then use all 9 points, find the best one, and stop.
  • The maximum number of search points is 2 + 7 log w.

[Figure: search points labelled with the step (1, 2, 3) in which they are examined.]
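
A Python sketch of the 2-D logarithmic search, written against a generic cost function `cost(dx, dy)` (for example, the SAD of the displaced block) so the search logic stands on its own. It evaluates each point once, halves the distance only when the center wins, and finishes with the 3x3 neighborhood; these details are one reasonable reading of the steps above, not a reference implementation.

```python
def log_search_2d(cost, w):
    """2-D logarithmic search over displacements in [-w, w] x [-w, w].

    cost(dx, dy) returns the matching cost of one candidate.
    Returns (best_mv, number_of_distinct_points_evaluated).
    """
    seen = {}

    def evaluate(p):
        if p not in seen:  # reuse results for points already checked
            seen[p] = cost(*p)
        return seen[p]

    def inside(p):
        return abs(p[0]) <= w and abs(p[1]) <= w

    center, step = (0, 0), max(1, w // 2)
    while step > 1:
        cx, cy = center
        cands = [center, (cx + step, cy), (cx - step, cy), (cx, cy + step), (cx, cy - step)]
        best = min((p for p in cands if inside(p)), key=evaluate)  # center wins ties
        if best == center:
            step //= 2     # best at the center: halve the distance
        else:
            center = best  # move; only three of the next five points are new
    cx, cy = center
    final = [(cx + i, cy + j) for j in (-1, 0, 1) for i in (-1, 0, 1)]
    return min((p for p in final if inside(p)), key=evaluate), len(seen)
```

Because the center is listed first, it is kept on ties, and every move strictly lowers the cost, so the loop always terminates.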

Three Step Search (TSS)

  • 1. Check nine search points.
  • 2. Reduce the step size by half after each step.
  • 3. At the end of the search, the step size is one pel.
  • The step is repeated 3 times, examining 25 points.
  • Number of search points: 1 + 8 log w.
  • Advantage: simple and regular structure, good for hardware implementation.
  • Disadvantage: uniformly allocated checking points, which make it inefficient for small motion.

[Figure: search points labelled with the step (1, 2, 3) in which they are examined.]
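
The same idea in Python for the three step search, again with a generic `cost(dx, dy)`. With an initial step of 4 the steps are 4, 2 and 1 pels; because each new center has already been evaluated, the search needs 9 + 8 + 8 = 25 evaluations. The function name and the default step are illustrative.

```python
def three_step_search(cost, step=4):
    """Three step search starting at (0, 0); step=4 covers displacements of up to +/-7."""
    seen = {}

    def evaluate(p):
        if p not in seen:
            seen[p] = cost(*p)
        return seen[p]

    center = (0, 0)
    while step >= 1:
        cx, cy = center
        nine = [(cx + i * step, cy + j * step) for j in (-1, 0, 1) for i in (-1, 0, 1)]
        center = min(nine, key=evaluate)  # best of the nine points becomes the new center
        step //= 2                        # 4 -> 2 -> 1 -> stop
    return center, len(seen)


print(three_step_search(lambda dx, dy: (dx - 3) ** 2 + (dy + 5) ** 2))  # ((3, -5), 25)
```

The fixed 25-point budget regardless of content is what makes TSS attractive for hardware, and also why it wastes effort on blocks that barely move.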

Diamond Search (DS)

Experimental results show that:
  • 53% to 98% of the motion vectors are enclosed in a circular area with a radius of 2 pels, centered on the position of zero motion.
  • The block displacement of real-world video sequences is mainly in the horizontal and vertical directions.
  • The diamond-shaped search patterns place the search points within the circle with a radius of 2 pels.
  • DS outperforms the TSS algorithm.

DS Algorithm

  • 1. The 9 checking points of the large diamond search pattern (LDSP) are tested. If the minimum point is located at the center position, go to Step 2; otherwise, recursively repeat this step centered on the best point.
  • 2. Switch the search pattern from LDSP to the small diamond search pattern (SDSP). The minimum point found is the best point.

[Figure: LDSP and SDSP.]
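
A Python sketch of DS with a generic `cost(dx, dy)`. LDSP is the center plus eight points at (0, ±2), (±2, 0) and (±1, ±1); SDSP is the center plus its four direct neighbors. Restricting candidates to a ±w window and the `max_iter` safety limit are assumptions of this sketch.

```python
LDSP = [(0, 0), (0, -2), (-1, -1), (1, -1), (-2, 0), (2, 0), (-1, 1), (1, 1), (0, 2)]
SDSP = [(0, 0), (0, -1), (-1, 0), (1, 0), (0, 1)]


def diamond_search(cost, w, max_iter=64):
    """Diamond search over displacements in [-w, w] x [-w, w]; returns (best_mv, evaluations)."""
    seen = {}

    def evaluate(p):
        if p not in seen:
            seen[p] = cost(*p)
        return seen[p]

    def pattern(center, offsets):
        cx, cy = center
        return [(cx + dx, cy + dy) for dx, dy in offsets
                if abs(cx + dx) <= w and abs(cy + dy) <= w]

    center = (0, 0)
    for _ in range(max_iter):
        best = min(pattern(center, LDSP), key=evaluate)  # center listed first, wins ties
        if best == center:
            break        # Step 2: minimum at the LDSP center
        center = best    # Step 1 again, around the new best point
    return min(pattern(center, SDSP), key=evaluate), len(seen)


print(diamond_search(lambda dx, dy: (dx - 5) ** 2 + (dy - 1) ** 2, 8)[0])  # (5, 1)
```

Moving the LDSP to a corner or edge point reuses several already-evaluated points, which the `seen` cache takes advantage of.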

DS Algorithm

  • (b) LDSP -> LDSP when the minimum is at one of the corner points
  • (c) LDSP -> LDSP when the minimum is along the edge of the diamond
  • (d) LDSP -> SDSP when the minimum is at the center of the search pattern

H.264 ME Algorithm (UMHexagonS)

  1) Initial search point prediction
  2) Unsymmetrical-cross search
  3) Uneven multi-hexagonal-grid search
  4) Extended hexagonal-based search

Note that ME is not a normative part of the standard; only the ME implemented in the reference software is described here.

Initial Search Point Prediction

A median predictor is used to define the initial search point: the median of the motion vectors of three spatially adjacent blocks, left (A), top (B) and top-right (C), or top-left (D) when the top-right block is unavailable.

[Figure: neighboring blocks D (top-left), B (top), C (top-right) and A (left) of the current block E.]

$$mv_{pred} = (pred_x,\ pred_y) = \operatorname{median}(mv_A,\ mv_B,\ mv_C)$$

Unsymmetrical-Cross Search

  • Based on experimental results, motion in the horizontal direction is much heavier than in the vertical direction.
  • The distance between search points is chosen to be 2.
  • The minimum-cost MV is chosen as the search center of the next search step.
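
A Python sketch of the search points of one unsymmetrical-cross step, with spacing 2 as stated above. Giving the vertical arm half the range of the horizontal one is an assumption made here to reflect the horizontal bias; the helper names are hypothetical.

```python
def unsymmetrical_cross(center, w):
    """Search points of an unsymmetrical cross with spacing 2 around center.

    Assumption: the horizontal arm spans +/-w and the vertical arm only +/-(w // 2).
    """
    cx, cy = center
    points = [(cx + dx, cy) for dx in range(-w, w + 1) if dx % 2 == 0]
    points += [(cx, cy + dy) for dy in range(-(w // 2), w // 2 + 1)
               if dy % 2 == 0 and dy != 0]  # center already in the horizontal arm
    return points


def cross_search_step(cost, center, w):
    """Pick the minimum-cost point of the cross; it becomes the next search center."""
    return min(unsymmetrical_cross(center, w), key=lambda p: cost(*p))


print(len(unsymmetrical_cross((0, 0), 16)))  # 25 points: 17 horizontal + 8 vertical
```

Compared with a symmetric cross of the same reach, the shorter vertical arm saves evaluations in the direction where large motion is rare.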

Uneven Multi-Hexagonal-Grid Search

Extended Hexagonal-Based Search

  • When the previous optimum MV lies in the outer concentric area, the search result has relatively low accuracy.
  • The motion vector is therefore refined by the extended hexagonal-based search method.

Motion Estimation in H.264

One of the main H.264 enhancements is its motion estimation.

What is new?
  • Variable block-size motion estimation: can yield 15% bit rate savings
  • Multiple reference frame motion estimation: 5–20% bit rate savings
  • Sub-pixel motion estimation: 20% bit rate savings over integer ME

Variable Block Size ME

  • A 16x16 macroblock may contain more than one object.
  • In other words, the size of moving/stationary objects is variable.
  • The objects may move in different directions, so one motion vector is not enough to describe the movement of all objects.
  • With a single MV, one part of the block is described well while the other part gives a large error.
  • The solution is to define variable block sizes.
  • A macroblock with more detail is coded using smaller block sizes.

H.264 partitions blocks into 7 different sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4.

Variable Block Size ME (cont’d)

[Figure: macroblock coding block diagram with the motion-compensation partitions: MB types 16x16, 16x8, 8x16 and 8x8; 8x8 types 8x8, 8x4, 4x8 and 4x4.]

Partitions of MB

Variable Block Size ME (cont’d)

An inter MB can be partitioned into smaller regions for ME:
  • Up to 16 MVs per MB
  • MVs are differentially encoded
  • Much optimization effort is needed to decide the best mode: SAD + λ(Q)·R

Mode decision:
  • R-D optimization with the Lagrangian method
  • Also an active research area

Variable Block Size ME – Example

[Figure: example frames at T=1 and T=2.]

Multiple Reference Frames ME

  • In previous standards, up to 2 reference frames are used for ME.
  • In H.264, multiple reference frames (up to 16) can be selected, resulting in better subjective video quality and more efficient coding of the video sequence.
  • This might also help make the H.264 bitstream error resilient.

Multiple Reference Frames ME

In H.263, the reference frame for prediction is always the previous frame.

In MPEG and H.26L, some frames are predicted from both the previous and the next frames (bi-prediction).

In H.264, up to 16 frames may be used as references:
  • The encoder and decoder maintain synchronized buffers of available (previously decoded) frames.
  • This results in better subjective video quality and more efficient coding of the video sequence.
  • It might also help make the H.264 bitstream error resilient.

Multiple Reference Frames ME

[Figure: macroblock coding block diagram with multiple reference frames for motion compensation.]

Subpixel Motion Estimation

  • When an object has a sub-pixel movement, integer-pixel ME cannot describe it, so sub-pixel ME is defined.
  • H.263 uses only half-pixel accuracy, and MPEG-4 uses quarter-pixel accuracy.
  • Quarter-pixel accuracy gives a gain of 1.5–2 dB across the board over ½-pixel accuracy.
  • H.264 uses quarter-pixel accuracy for luma motion vectors, which corresponds to eighth-pixel accuracy for 4:2:0 chroma.

Example

b = round [(E – 5F + 20G + 20H – 5I + J)/32]

Example (cont’d)

a = round [(G + b)/2]
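
The two rounding formulas in integer form, as a Python sketch assuming 8-bit samples: E..J are six consecutive integer samples on a row, b lies halfway between G and H, and a lies halfway between G and b, as in the two examples. The standard's 2-D cases (half-samples computed from other half-samples) are not covered; function names are illustrative.

```python
def clip8(v):
    """Clip to the 8-bit sample range 0..255."""
    return max(0, min(255, v))


def half_pel(e, f, g, h, i, j):
    """Half-sample b between G and H: 6-tap filter (1, -5, 20, 20, -5, 1) / 32, rounded."""
    return clip8((e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5)


def quarter_pel(g, b):
    """Quarter-sample a between integer sample G and half-sample b: rounded average."""
    return (g + b + 1) >> 1


row = [0, 10, 20, 30, 40, 50]     # E, F, G, H, I, J on a linear ramp
b = half_pel(*row)
print(b, quarter_pel(row[2], b))  # 25 23
```

Adding 16 before the shift by 5 implements round(x/32), and the negative taps can push the result outside 0..255 near sharp edges, hence the clipping.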

Chroma Motion Vector

slide-76
SLIDE 76

76

  • H. 264 Cost Function
  • H. 264 Cost Function
  • The best match is found by minimizing the cost function:
  • m=(mx,my)T is the motion vector
  • p=(px,py)T is the predicted motion vector
  • λmotion is the Lagrange multiplier
  • R(m-p) represents the bits used to encode the motion information
  • The SAD (sum Absolute Difference) is computed as:
  • Where B = 16, 8 or 4 and s being the original video signal and c being the coded video

signal

) ( . )) ( , ( ) , ( p m R m c s SAD m J

motion motion

− + = λ λ

= =

− − − =

B B y x y x

m y m x c y x s m c s SAD

, 1 , 1

] , [ ] , [ )) ( , (
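
A Python sketch of this cost for one candidate vector. The slide does not say how R(m − p) is computed; as an assumption, the sketch counts the bits of a signed Exp-Golomb codeword for each component of the motion vector difference (the code CAVLC-mode H.264 uses for motion vector differences), and λmotion is passed in as a plain number. Function names are illustrative.

```python
def ue_bits(code_num):
    """Length of the unsigned Exp-Golomb codeword for code_num >= 0."""
    return 2 * (code_num + 1).bit_length() - 1


def se_bits(v):
    """Length of the signed Exp-Golomb codeword (v > 0 -> 2v - 1, v <= 0 -> -2v)."""
    return ue_bits(2 * v - 1 if v > 0 else -2 * v)


def motion_cost(sad_value, mv, pred_mv, lam_motion):
    """J(m, lambda) = SAD + lambda * R(m - p), with R estimated from Exp-Golomb lengths."""
    rate = se_bits(mv[0] - pred_mv[0]) + se_bits(mv[1] - pred_mv[1])
    return sad_value + lam_motion * rate


# Same SAD, different distance from the predicted vector p = (3, 0):
print(motion_cost(100, (3, 0), (3, 0), 4))   # 108: zero difference costs 1 bit per component
print(motion_cost(100, (5, -1), (3, 0), 4))  # 132: difference (2, -1) costs 5 + 3 bits
```

The rate term biases the search toward vectors close to the prediction p, so among candidates with similar SAD the one that is cheaper to transmit wins.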

MB Modes

A MB can select one of these modes:
  • Intra_16x16
  • Intra_8x8 (not allowed in Baseline)
  • Intra_4x4
  • I_PCM: enables an encoder to transmit the values of the image samples directly (without prediction or transformation)
  • Inter_16x16, Inter_16x8, Inter_8x16, Inter_8x8
  • SKIP

P_SKIP Type

For this type, neither a quantized prediction error signal nor a motion vector or reference index parameter is transmitted.

The reference picture is the one at index 0 in the multi-picture buffer.

The motion vector is obtained from the motion vector predictor.

It is used for large areas with no change or with constant motion.

Its size is 16x16.

Mode Decision Method in H.264/AVC

  • Calculate the RDCost for each intra mode.
  • Calculate the RDCost for the SKIP mode.
  • For each inter mode (16x16, 16x8, 8x16 and 8x8):
    • For each block in the current mode:
      • Do ME in a search area and select the point that minimizes

$$J(\mathbf{m}, \lambda_{motion}) = \mathrm{SAD}(s, c(\mathbf{m})) + \lambda_{motion} \cdot R(\mathbf{m} - \mathbf{p})$$

    • End
    • Calculate the RDCost using RDCost = Distortion + λ × Rate. Note that:
      • The rate requires transform, quantization and entropy coding.
      • The distortion requires transform, quantization, inverse quantization and inverse transform.
  • End
  • From the calculated RDCosts (RDCost_Intra_16x16, RDCost_Intra_4x4, RDCost_I_PCM, RDCost_SKIP, RDCost_Inter_16x16, RDCost_Inter_16x8, RDCost_Inter_8x16 and RDCost_Inter_8x8), select the smallest as the best mode.
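
The final selection step is a plain minimum over RD costs. In this Python sketch the distortion and rate of each mode are inputs (they would come from actually transforming, quantizing and entropy coding the MB); the λ formula 0.85 · 2^((QP − 12)/3) is an assumption borrowed from the commonly cited reference-software choice, and the numbers in the example are hypothetical.

```python
def lambda_mode(qp):
    """Assumed Lagrange multiplier for mode decision: 0.85 * 2^((QP - 12) / 3)."""
    return 0.85 * 2 ** ((qp - 12) / 3)


def choose_mode(candidates, qp):
    """candidates: {mode_name: (distortion, rate_bits)}, both measured after coding the MB.

    Returns (best_mode, {mode_name: RDCost}) with RDCost = D + lambda * R.
    """
    lam = lambda_mode(qp)
    rd_costs = {mode: d + lam * r for mode, (d, r) in candidates.items()}
    return min(rd_costs, key=rd_costs.get), rd_costs


# Hypothetical measurements for one macroblock (not real encoder output).
measured = {"SKIP": (900, 1), "Inter_16x16": (400, 30), "Intra_4x4": (300, 120)}
print(choose_mode(measured, qp=24)[0])  # Inter_16x16
print(choose_mode(measured, qp=0)[0])   # Intra_4x4
```

The same measurements lead to different decisions: at high QP the large λ penalizes the bit-hungry Intra_4x4 mode, while at low QP distortion dominates and it wins.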

Slice

Each frame can be coded in one or more slices, each containing from one (16x16) macroblock up to all the macroblocks in the frame (1 slice per picture).

The number of macroblocks per slice need not be constant within a picture.

Because of the minimal inter-dependency between coded slices, the propagation of errors can be limited.

Slice Coding

  • Slices can have different shapes and sizes.
  • Slices do not have to be consecutive in the raster scan.
  • Each slice is self-contained:
    • It can be decoded without knowing the data of other slices.
  • Useful for:
    • Error resilience and concealment
    • Parallel processing

Slice Type

Each slice can be coded as one of 5 types:
  • I slice: all MBs are coded using intra mode.
  • P slice: an MB can be coded in intra mode or in inter mode with at most one prediction signal per block.
  • B slice: in addition to the modes of a P slice, some MBs can also be predicted using two prediction signals per block.
  • SP slice (switching-P slice): facilitates switching between different video streams.
  • SI slice (switching-I slice): uses only intra prediction.

Slice Modes in H.264

Slice Syntax

Slice Syntax

A macroblock contains coded data corresponding to a 16x16 sample region of a video frame: 16x16 for luma and 8x8 for Cb and Cr (in 4:2:0 sampling).

Slices

The H.264 encoder intelligently groups MBs into a slice whose size is less than or equal to the size of the maximum transmission unit (MTU).

Slices are decoded independently. Prediction beyond slice boundaries is forbidden, to prevent error propagation from intra-frame prediction.
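
A greedy Python sketch of grouping macroblocks under an MTU budget, assuming the coded size of each macroblock is already known and ignoring slice header and NAL overhead; the helper name and the example sizes are hypothetical.

```python
def pack_slices(mb_sizes, mtu):
    """Greedily group consecutive macroblocks (coded sizes in bytes) into slices <= mtu."""
    slices, current, used = [], [], 0
    for addr, size in enumerate(mb_sizes):
        if size > mtu:
            raise ValueError(f"macroblock {addr} alone exceeds the MTU")
        if current and used + size > mtu:
            slices.append(current)  # close the slice before it would overflow
            current, used = [], 0
        current.append(addr)
        used += size
    if current:
        slices.append(current)
    return slices


print(pack_slices([400, 500, 600, 300, 200], mtu=1000))  # [[0, 1], [2, 3], [4]]
```

A real encoder cannot know a macroblock's size before coding it in the current slice context, so in practice it usually closes the slice when the next MB no longer fits and re-codes that MB in a new slice.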

Arbitrary Slice Order (ASO)

  • The Baseline profile allows slices to be decoded in arbitrary order.
  • This permits, for example, reducing the decoding delay in case of out-of-order delivery of NAL units.
  • Application example: reducing end-to-end transmission delay in real-time applications.

Flexible Macroblock Ordering (FMO)

Using FMO, slices are no longer required to consist of neighboring macroblocks.

FMO provides efficient methods for error concealment in error-prone channels.

The objective behind flexible macroblock ordering (FMO) is to scatter possible errors over the whole frame as evenly as possible, to avoid error accumulation in a limited region.

Slice Group

Slice group: a subset of the macroblocks of a picture, which may contain one or more slices.

With FMO, a frame is divided into slice groups. Each macroblock can be assigned freely to a slice group using a MAP function.
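
One predefined map is the dispersed pattern. The Python sketch below assumes that pattern (slice_group_map_type 1 in the standard) and computes the slice group of every macroblock address in raster order; with two slice groups it yields a checkerboard, so if one slice group is lost, each missing MB is surrounded by correctly received neighbors that concealment can use. The function name is illustrative.

```python
def dispersed_slice_groups(pic_width_in_mbs, pic_height_in_mbs, num_slice_groups):
    """Slice group of each macroblock address (raster order) for the dispersed map."""
    n = num_slice_groups
    return [((addr % pic_width_in_mbs) + (((addr // pic_width_in_mbs) * n) // 2)) % n
            for addr in range(pic_width_in_mbs * pic_height_in_mbs)]


groups = dispersed_slice_groups(4, 3, 2)
for row in range(3):
    print(groups[4 * row:4 * row + 4])
# [0, 1, 0, 1]
# [1, 0, 1, 0]
# [0, 1, 0, 1]
```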

MAP Function

Redundant Coded Picture

An encoder can send a duplicate of part or all of a coded picture. In normal operation, the decoder reconstructs the frame from ‘primary’ (non-redundant) pictures and discards any redundant pictures.

However, if a primary coded picture is damaged (e.g., due to a transmission error), the decoder may replace the damaged area with decoded data from a redundant picture, if available.

MB Prediction Types

Intra: the MB is predicted from the neighboring blocks of the same frame. Intra prediction is performed on 16x16, 4x4 and 8x8 (FRExt profiles only) blocks.

Inter: the MB is predicted from regions in previous (or next) frames, using motion estimation.

MB Syntax Element

Transformation and Quantization in H.264

Transformation

H.264 uses three transforms:
  • A Hadamard transform for the 4x4 array of luma DC coefficients (in Intra_16x16 macroblocks)
  • A Hadamard transform for the 2x2 array of chroma DC coefficients
  • A DCT-based transform for all other 4x4 blocks in the residual data

Transformation

[1]

Transformation

Fundamental differences between the H.264 transform and the DCT:
  • It is an integer transform, so it is possible to ensure zero mismatch between encoder and decoder.
  • It can be implemented using only additions and shifts.
  • A scaling multiplication is integrated into the quantizer.
  • It can be carried out using 16-bit integer arithmetic.
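
The add-and-shift property is easy to see in code. A Python sketch of the forward core transform W = C_f X C_fᵀ, with C_f = [[1, 1, 1, 1], [2, 1, −1, −2], [1, −1, −1, 1], [1, −2, 2, −1]], applied as a 1-D butterfly to the rows and then to the columns; the function names are illustrative.

```python
def core_1d(v):
    """4-point forward core transform of one row/column using only adds and shifts."""
    s0, s1 = v[0] + v[3], v[1] + v[2]
    d0, d1 = v[0] - v[3], v[1] - v[2]
    return [s0 + s1, (d0 << 1) + d1, s0 - s1, d0 - (d1 << 1)]


def forward_core_4x4(x):
    """W = Cf X Cf^T for a 4x4 block given as a list of rows."""
    rows = [core_1d(r) for r in x]                                      # X Cf^T
    cols = [core_1d([rows[i][j] for i in range(4)]) for j in range(4)]  # Cf (X Cf^T)
    return [[cols[j][i] for j in range(4)] for i in range(4)]


print(forward_core_4x4([[1] * 4 for _ in range(4)])[0])  # [16, 0, 0, 0]: flat block -> DC only
```

Each 1-D pass costs 8 additions and 2 shifts, and because every operation is exact integer arithmetic, encoder and decoder implementations cannot drift apart as they could with a floating-point DCT.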

Transformation

DCT

[1]

Transformation

DCT Approximation

[1]

Transformation

4x4 Hadamard transform

2x2 Hadamard transform

[1]

slide-101
SLIDE 101

101

Quantization

The forward and inverse quantizers are complicated by two requirements:

Avoid division and/or floating-point arithmetic

Incorporate the post- and pre-scaling matrices

The basic forward quantizer is $Z = \mathrm{round}(W / Q_{step})$. A total of 52 values of Qstep are supported by the standard, indexed by a quantization parameter (QP).
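The 52 step sizes follow a simple rule: the six values for QP 0 to 5 are listed explicitly, and Qstep doubles for every increase of 6 in QP, so it grows by roughly 12% (2^(1/6)) per QP step on average. A minimal sketch for 8-bit video; the names are hypothetical.

```python
# Qstep for QP = 0..5; the pattern repeats with Qstep doubling every 6 QP.
QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Quantizer step size for QP in 0..51 (8-bit video)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


if __name__ == "__main__":
    print(qstep(0), qstep(4), qstep(10), qstep(51))   # 0.625 1.0 2.0 224.0
```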

slide-102
SLIDE 102

102

Quantization

The wide range of quantizer step sizes makes it possible for an encoder to control the trade-off between bit rate and quality accurately and flexibly

QP can be different for luma and chroma: QPC is derived from QPY (plus an optional offset), and with a zero offset QPC is lower than QPY for QPY of 30 and above

slide-103
SLIDE 103

103

Quantization

Forward quantization in integer arithmetic

Inverse quantization in integer arithmetic

[1]
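A sketch of forward quantization in integer arithmetic, in the form used by the JM reference software and common textbook descriptions: the division by Qstep and the post-scaling factor PF are merged into an integer multiplier MF ≈ PF · 2^qbits / Qstep with qbits = 15 + floor(QP/6), followed by a right shift. For example, at QP = 4 and position (0,0), PF = 1/4 and Qstep = 1, giving MF = 8192. The rounding offset f (2^qbits/3 for intra, 2^qbits/6 for inter blocks) is an encoder choice, since the standard specifies only decoding; the function names are hypothetical.

```python
# MF for QP % 6 = 0..5, per position class
# (class 0: both indices even, class 1: both odd, class 2: mixed).
MF = (
    (13107, 5243, 8066),
    (11916, 4660, 7490),
    (10082, 4194, 6554),
    (9362, 3647, 5825),
    (8192, 3355, 5243),
    (7282, 2893, 4559),
)


def position_class(i, j):
    """Map a coefficient position (row i, column j) to its MF class."""
    if i % 2 == 0 and j % 2 == 0:
        return 0
    if i % 2 == 1 and j % 2 == 1:
        return 1
    return 2


def quantize(w, i, j, qp, intra=True):
    """Quantize transform coefficient w at row i, column j of a 4x4 block."""
    qbits = 15 + qp // 6
    f = (1 << qbits) // (3 if intra else 6)       # rounding offset (encoder choice)
    level = (abs(w) * MF[qp % 6][position_class(i, j)] + f) >> qbits
    return -level if w < 0 else level             # the sign is restored after shifting


if __name__ == "__main__":
    print(quantize(100, 0, 0, qp=10))   # 12
```

The smaller inter offset rounds more values down, which produces more zero coefficients in motion-compensated residuals.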

slide-104
SLIDE 104

104

Complete T&Q

[1]

slide-105
SLIDE 105

105

Complete T&Q

Encoding

Input: 4 × 4 residual samples: $X$

Forward ‘core’ transform: $W = C_f X C_f^T$ (followed by the forward transform for Chroma DC or Intra-16 Luma DC coefficients)

Post-scaling and quantization: $Z = \mathrm{round}(W \cdot PF / Q_{step})$, element by element (different for Chroma DC or Intra-16 Luma DC)

slide-106
SLIDE 106

106

Complete T&Q

Decoding:

(Inverse transform for Chroma DC or Intra-16 Luma DC coefficients)

Decoder scaling (incorporating inverse transform pre-scaling): $W' = Z \cdot Q_{step} \cdot PF \cdot 64$, element by element (different for Chroma DC or Intra-16 Luma DC)

Inverse ‘core’ transform: $X' = C_i^T W' C_i$

Post-scaling: $X'' = \mathrm{round}(X'/64)$

Output: 4 × 4 residual samples: $X''$
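The decoding steps for an ordinary 4x4 block (not the Chroma DC or Intra-16 Luma DC path) can be sketched in integer arithmetic: the rescaling factor V = Qstep · PF · 64 is tabulated for QP 0 to 5 and multiplied by 2^floor(QP/6), the inverse core transform uses additions, subtractions and one-bit right shifts, and round(X'/64) becomes (X' + 32) >> 6. Frame-coded 8-bit video without scaling matrices is assumed, and the function names are hypothetical.

```python
# V = Qstep * PF * 64 for QP % 6 = 0..5, per position class
# (class 0: both indices even, class 1: both odd, class 2: mixed).
V = (
    (10, 16, 13),
    (11, 18, 14),
    (13, 20, 16),
    (14, 23, 18),
    (16, 25, 20),
    (18, 29, 23),
)


def position_class(i, j):
    """Map a coefficient position (row i, column j) to its V class."""
    if i % 2 == 0 and j % 2 == 0:
        return 0
    if i % 2 == 1 and j % 2 == 1:
        return 1
    return 2


def rescale(Z, qp):
    """Decoder scaling: W'[i][j] = Z[i][j] * V * 2^(QP // 6)."""
    factor = 1 << (qp // 6)
    return [[Z[i][j] * V[qp % 6][position_class(i, j)] * factor
             for j in range(4)] for i in range(4)]


def inverse_1d(w0, w1, w2, w3):
    """One 4-point pass of the inverse core transform (multiplies by Ci^T)."""
    e0, e1 = w0 + w2, w0 - w2
    e2, e3 = (w1 >> 1) - w3, w1 + (w3 >> 1)
    return e0 + e3, e1 + e2, e1 - e2, e0 - e3


def decode_block(Z, qp):
    """Quantized levels Z -> reconstructed 4x4 residual samples X''."""
    W = rescale(Z, qp)
    rows = [inverse_1d(*row) for row in W]                    # horizontal pass
    cols = [inverse_1d(*(rows[r][c] for r in range(4)))       # vertical pass
            for c in range(4)]
    # cols[c][r] is X'[r][c]; post-scale with rounding: (X' + 32) >> 6.
    return [[(cols[c][r] + 32) >> 6 for c in range(4)] for r in range(4)]


if __name__ == "__main__":
    Z = [[2, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    for row in decode_block(Z, qp=28):
        print(row)   # [8, 8, 8, 8] on every row
```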

slide-107
SLIDE 107

107

Deblocking Filter

Applied to each decoded macroblock to reduce blocking distortion

Filtering is applied to the vertical and horizontal edges of the 4x4 blocks in a macroblock (edges on picture boundaries are not filtered, and filtering across slice boundaries can optionally be disabled)

The filter is stronger at places where there is likely to be significant blocking distortion, such as the boundary of an Intra coded macroblock or a boundary between blocks that contain coded coefficients
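The idea that the filter is stronger where blocking is more likely is formalised as a boundary strength (bS) for each edge, from 0 (no filtering) to 4 (strongest). The sketch follows the usual rules for frame-coded pictures, simplified to one motion vector per block with components in quarter-sample units; the parameter names are hypothetical, and field/MBAFF special cases are ignored.

```python
def boundary_strength(p_intra, q_intra, mb_edge, p_has_coeffs, q_has_coeffs,
                      same_refs, mv_p, mv_q):
    """Simplified bS for the edge between blocks p and q (frame coding).

    mv_p and mv_q are (x, y) motion vectors in quarter-sample units.
    """
    if (p_intra or q_intra) and mb_edge:
        return 4        # intra on a macroblock edge: strongest filtering
    if p_intra or q_intra:
        return 3        # intra, internal edge
    if p_has_coeffs or q_has_coeffs:
        return 2        # coded residual coefficients on either side
    if not same_refs or any(abs(a - b) >= 4 for a, b in zip(mv_p, mv_q)):
        return 1        # different references or motion differing by >= 1 sample
    return 0            # no filtering


if __name__ == "__main__":
    print(boundary_strength(False, False, True, False, False, True, (8, 0), (2, 0)))   # 1
```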

slide-108
SLIDE 108

108

Deblocking Filter (cont’d)

The effect of the filter decision is to switch off the filter when there is a significant change (gradient) across the block boundary in the original image
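On each line of samples across an edge with bS > 0, the decision compares sample differences with two thresholds, alpha and beta, which the standard takes from QP-indexed tables (larger QP gives larger thresholds). A sketch with alpha and beta passed in directly; the threshold values in the example are made up for illustration, and the function name is hypothetical.

```python
def edge_is_filtered(p, q, alpha, beta, bs):
    """Decide whether to filter one line of samples across a block edge.

    p = (p0, p1, ...) and q = (q0, q1, ...) are the samples on either side,
    with p0 and q0 adjacent to the edge.
    """
    return (bs > 0
            and abs(p[0] - q[0]) < alpha    # small step across the edge
            and abs(p[1] - p[0]) < beta     # p side is smooth
            and abs(q[1] - q[0]) < beta)    # q side is smooth


if __name__ == "__main__":
    alpha, beta = 10, 4                     # illustrative values only
    print(edge_is_filtered((60, 61), (64, 63), alpha, beta, bs=2))    # True
    print(edge_is_filtered((60, 61), (120, 121), alpha, beta, bs=2))  # False
```

A large step such as 60 to 120 is treated as a real edge in the image and left untouched, which is the "switch off" effect described above.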

slide-109
SLIDE 109

109

Error Resilience Tools

Error Resilience Tools: 1) Flexible Macroblock Order (FMO) 2) Arbitrary Slice Order (ASO) 3) Redundant Slices (RS) 4) Data Partitioning (DP).

slide-110
SLIDE 110

110

FMO

FMO can work to randomize the data prior to transmission, so that if a segment of data is lost, the errors are distributed more randomly over the video picture, and the relevant neighboring data is more likely to be available for concealment of the lost content.
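To see why this helps concealment, the sketch counts how many of a lost macroblock's left, right, upper and lower neighbours were still received. It compares a two-group checkerboard FMO map with plain slices of one MB row each; the picture size, the lost groups and the function names are hypothetical.

```python
def available_neighbours(x, y, width, height, group_of, lost_group):
    """Count the left/right/upper/lower neighbours of MB (x, y) that were received.

    group_of(x, y) returns the slice group of a macroblock; every MB in
    lost_group is assumed lost.
    """
    count = 0
    for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height and group_of(nx, ny) != lost_group:
            count += 1
    return count


def checkerboard(x, y):
    return (x + y) % 2        # two interleaved slice groups (FMO)


def row_slices(x, y):
    return y                  # one slice per MB row, no FMO


if __name__ == "__main__":
    W, H = 6, 4
    # Lose slice group 1 (FMO) versus losing MB row 1 (no FMO).
    print(available_neighbours(2, 1, W, H, checkerboard, 1))  # 4
    print(available_neighbours(2, 1, W, H, row_slices, 1))    # 2
```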

slide-111
SLIDE 111

111

FMO

slide-112
SLIDE 112

112

ASO

ASO allows slices of a picture to appear in any order for delay reduction (particularly for use on networks that can deliver data packets out of order).

slide-113
SLIDE 113

113

RS

The primary slice is coded with a low QP (and hence in good quality), and the RS is coded with a high QP (using fewer bits). The decoder decodes the primary slice if it is available and discards the RS.

If the primary slice is missing, the RS can be decoded instead to reconstruct that region.
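The decoder-side rule for redundant slices can be sketched as a per-slice choice. Slices are identified by hypothetical integer IDs, and the "conceal" fallback when neither copy arrives is an assumption about decoder behaviour (error concealment is not specified by the standard).

```python
def plan_reconstruction(slice_ids, primary_received, redundant_received):
    """Choose, per slice, which received copy the decoder should use."""
    plan = {}
    for sid in slice_ids:
        if sid in primary_received:
            plan[sid] = "primary"      # normal case: any redundant copy is discarded
        elif sid in redundant_received:
            plan[sid] = "redundant"    # primary lost: use the lower-quality copy
        else:
            plan[sid] = "conceal"      # nothing received: decoder-specific concealment
    return plan


if __name__ == "__main__":
    print(plan_reconstruction([0, 1, 2], primary_received={0, 2}, redundant_received={1, 2}))
```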

slide-114
SLIDE 114

114

DP

Type A partition: header information, including MB types, quantization parameters, and motion vectors

Type B partition (the Intra partition): carries Intra CBPs and Intra coefficients

Type C partition (the Inter partition): contains only Inter CBPs and Inter coefficients

slide-115
SLIDE 115

115

Baseline Profile

  • Baseline (Progressive, Videoconferencing & Wireless)
  • I and P picture types (not B)
  • Uses list0 for P slices
  • In-loop deblocking filter
  • 1/4-sample motion compensation
  • Tree-structured motion segmentation down to 4x4 block size
  • VLC-based entropy coding (UVLC and CAVLC)
  • Some enhanced error resilience features: flexible macroblock ordering/arbitrary slice ordering, redundant slices

slide-116
SLIDE 116

116

Main

May use list0 and/or list1 for B slices

B slices may use:

One past and one future reference

Two past references

Two future references

slide-117
SLIDE 117

117

B-Slice Prediction

slide-118
SLIDE 118

118

Prediction Options

B Slices may be predicted in one of several ways

slide-119
SLIDE 119

119

Direct Prediction

No motion vector is transmitted for a B slice macroblock or macroblock partition encoded in direct mode

The decoder calculates list0 and list1 vectors based on previously coded vectors and uses these to carry out bi-predictive motion compensation of the decoded residual samples

A skipped macroblock in a B slice is reconstructed at the decoder using Direct prediction
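The standard has two direct modes, spatial and temporal. As one illustration of how the decoder can derive both vectors from previously coded data, the sketch below follows temporal direct scaling: the motion vector of the co-located block in the first list1 reference is scaled by picture order count (POC) distances in integer arithmetic. Here poc_ref0 is the POC of the picture the co-located block refers to (used as the list0 reference) and poc_ref1 that of the list1 reference. One vector component is handled at a time; long-term references and field coding are ignored, and the names are hypothetical.

```python
def clip3(lo, hi, v):
    return max(lo, min(hi, v))


def div_trunc(a, b):
    """Integer division rounding toward zero, like '/' in the standard (and C)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def temporal_direct_mv(mv_col, poc_cur, poc_ref0, poc_ref1):
    """Derive (mvL0, mvL1) for one MV component (quarter-sample units)."""
    tb = clip3(-128, 127, poc_cur - poc_ref0)
    td = clip3(-128, 127, poc_ref1 - poc_ref0)
    if td == 0:
        return mv_col, 0
    tx = div_trunc(16384 + abs(div_trunc(td, 2)), td)
    dist_scale_factor = clip3(-1024, 1023, (tb * tx + 32) >> 6)
    mv_l0 = (dist_scale_factor * mv_col + 128) >> 8   # about mv_col * tb / td
    return mv_l0, mv_l0 - mv_col


if __name__ == "__main__":
    # B picture halfway between its references: the co-located MV is split in two.
    print(temporal_direct_mv(8, poc_cur=2, poc_ref0=0, poc_ref1=4))   # (4, -4)
```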

slide-120
SLIDE 120

120

Weighted Prediction

The weights w0, w1 may be transmitted from the encoder to the decoder (explicit mode), or the decoder may calculate them based on the relative temporal positions of the list0 and list1 reference pictures (implicit mode)
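A sketch of weighted bi-prediction for one 8-bit sample: the weighted sum is rounded and shifted, the offsets are added, and the result is clipped to the sample range. In explicit mode w0, w1, the offsets and the denominator log_wd are sent by the encoder; in implicit mode the weights come from POC distances (with log_wd = 5 and zero offsets), which is what implicit_weights computes. Field coding and long-term references are ignored, and the function names are hypothetical.

```python
def clip1(v, bit_depth=8):
    return max(0, min((1 << bit_depth) - 1, v))


def weighted_bipred(p0, p1, w0, w1, o0=0, o1=0, log_wd=5):
    """Weighted bi-prediction of one sample from list0 (p0) and list1 (p1)."""
    return clip1(((p0 * w0 + p1 * w1 + (1 << log_wd)) >> (log_wd + 1))
                 + ((o0 + o1 + 1) >> 1))


def implicit_weights(poc_cur, poc_ref0, poc_ref1):
    """Implicit mode: derive (w0, w1) from POC distances."""
    tb = max(-128, min(127, poc_cur - poc_ref0))
    td = max(-128, min(127, poc_ref1 - poc_ref0))
    if td == 0:
        return 32, 32
    # tx = (16384 + |td / 2|) / td with division truncating toward zero.
    tx = (16384 + abs(td) // 2) // abs(td)
    if td < 0:
        tx = -tx
    dist_scale_factor = max(-1024, min(1023, (tb * tx + 32) >> 6))
    w1 = dist_scale_factor >> 2
    if w1 < -64 or w1 > 128:
        return 32, 32              # fall back to equal weights
    return 64 - w1, w1


if __name__ == "__main__":
    w0, w1 = implicit_weights(poc_cur=1, poc_ref0=0, poc_ref1=4)
    print(w0, w1)                                  # 48 16
    print(weighted_bipred(100, 200, w0, w1))       # 125
```

The closer reference (list0 here) gets the larger weight, so the prediction behaves like a linear interpolation in time.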

slide-121
SLIDE 121

121

Extended

SI and SP slices are specially coded slices that enable (among other things) efficient switching between video streams and efficient random access for video decoders

A common requirement in a streaming application is for a video decoder to switch between several encoded streams

slide-122
SLIDE 122

122

Example

The same video material is coded at multiple bitrates for transmission across the Internet. A decoder attempts to decode the highest-bitrate stream it can receive, but may need to switch automatically to a lower-bitrate stream if the data throughput drops

slide-123
SLIDE 123

123

Switching

slide-124
SLIDE 124

124

Switching (cont’d)

slide-125
SLIDE 125

125

SP Slice

slide-126
SLIDE 126

126

SP Slice (cont’d)

slide-127
SLIDE 127

127

Random Access to Video Frames

We can decode A0 and then decode the SP slice A0-10, which predicts A10 from A0, so A10 is reached without decoding the frames in between

slide-128
SLIDE 128

128

SI Slices

SI slices are similar to SP slices, but their prediction is Intra 4x4 prediction from previously decoded and reconstructed samples of the same picture

slide-129
SLIDE 129

129

H.264/AVC Profiles

slide-130
SLIDE 130

130

H.264 Rate Control

slide-131
SLIDE 131

131

Compression in H.264

Compression unit: macroblock (MB)

Coding parameters: motion estimation search area, quantization step size, etc.

Constant coding parameters produce a variable bit rate

Variable bit-rate problems:

Practical delivery mechanisms: storage (a film on VCD or DVD) and networks (circuit-switched, packet-switched)

Delay, FIFO size, decoder stall

slide-132
SLIDE 132

132

Variable Bitrate: Example

[Figure: encoder output, encoder buffer and decoder buffer]

  • 25 frames/s
  • QP = 12
  • Network = 100 kbps
  • Network = 4 kbit/frame
  • Frame F1 = 54 kbit
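With these numbers, the 54 kbit first frame needs 54 / 100 = 0.54 s, i.e. 13.5 frame periods, to cross the 100 kbps network, while the channel carries only 4 kbit per frame period. The sketch computes when each frame has fully arrived and the smallest decoder start-up delay that avoids a stall. Only the 54 kbit size comes from the slide; the later 4 kbit frames are hypothetical, transmission is assumed to start as soon as a frame is coded, and network latency and decoding time are ignored. The function names are hypothetical.

```python
from fractions import Fraction


def arrival_times(frame_kbits, fps, channel_kbps):
    """Time (s) at which each frame has completely crossed a constant-rate channel.

    Frame i is produced by the encoder at i / fps and is sent after the
    previous frame has finished.
    """
    t = Fraction(0)
    times = []
    for i, kbits in enumerate(frame_kbits):
        start = max(t, Fraction(i, fps))
        t = start + Fraction(kbits, channel_kbps)
        times.append(t)
    return times


def min_startup_delay(frame_kbits, fps, channel_kbps):
    """Smallest delay before playback such that frame i, shown at
    delay + i / fps, has always arrived in time (no decoder stall)."""
    times = arrival_times(frame_kbits, fps, channel_kbps)
    return max(t - Fraction(i, fps) for i, t in enumerate(times))


if __name__ == "__main__":
    frames = [54, 4, 4, 4]   # F1 from the slide; later sizes hypothetical
    print(float(arrival_times(frames, 25, 100)[0]))      # 0.54
    print(float(min_startup_delay(frames, 25, 100)))     # 0.54
```

Any start-up delay shorter than 0.54 s makes the decoder stall on the first frame; a larger buffer and delay absorb such peaks at the cost of latency.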
slide-133
SLIDE 133

133

Rate Control

  • Problem
  • Decoder stall: playback freezes
  • Wider bit-rate variation requires a larger buffer size and a larger decoding delay
  • Increasing the delay avoids stalls, but is unsuitable for online applications
  • Buffer overflow (or underflow)
  • Solutions
  • Rate control modifies the coding parameters to maintain a target bit rate while minimizing distortion in the decoded video
  • The most obvious parameter is QP: increasing QP reduces the bit rate (and the quality)
  • The choice of rate control depends on the nature of the video application
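The simplest form of the QP feedback loop described above adjusts QP according to the encoder-buffer fullness. This is a deliberately minimal sketch, not the rate-control algorithm of any particular encoder; the thresholds, the one-step adjustment and the function names are assumptions.

```python
def update_buffer(buffer_kbits, frame_kbits, drain_kbits_per_frame):
    """Encoder buffer after adding one coded frame and draining one frame period."""
    return max(0, buffer_kbits + frame_kbits - drain_kbits_per_frame)


def update_qp(qp, buffer_kbits, buffer_size_kbits, low=0.2, high=0.8):
    """Raise QP when the buffer is nearly full, lower it when nearly empty."""
    fullness = buffer_kbits / buffer_size_kbits
    if fullness > high:
        qp += 1     # fewer bits (and lower quality) to avoid overflow
    elif fullness < low:
        qp -= 1     # more bits (and higher quality) to avoid underflow
    return max(0, min(51, qp))


if __name__ == "__main__":
    qp, buf = 28, 0
    # Fixed, hypothetical frame sizes; a real encoder's frames shrink as QP rises.
    for frame_kbits in (54, 4, 4, 3):
        buf = update_buffer(buf, frame_kbits, drain_kbits_per_frame=4)
        qp = update_qp(qp, buf, buffer_size_kbits=60)
        print(buf, qp)
```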

slide-134
SLIDE 134

134

Nature of Video Applications

  • Offline encoding on DVD
  • Processing time is not important
  • Goals: fit on the DVD, with the best quality
  • The DVD buffer must not overflow or underflow
  • Rate control: two-pass encoding
  • Pass 1: gather statistics of the video
  • Pass 2: coding
  • Live video for broadcast
  • Encoder: one, high quality, fast hardware
  • Decoders: many, with limited buffer and processing
  • A small delay is acceptable
  • Rate control: medium complexity, two passes within a frame
  • Video conferencing
  • Each side performs both encoding and decoding, with limited processing power
  • Delay must be as small as possible (0.5 s)
  • Rate control: low complexity
  • The encoder tightly controls the bit rate, so quality varies (it drops during high motion)
slide-135
SLIDE 135

135

H.264/AVC

In the first version of the standard:

The 4:2:0 chroma format (typically derived by performing an RGB-to-YCbCr color-space transformation)

8-bit sample precision for luma and chroma values
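The standard does not mandate how RGB is converted to YCbCr or how chroma is downsampled; the matrix is only signalled as video usability information. The sketch below therefore assumes the full-range BT.601 (JFIF-style) coefficients and a simple 2x2 average as the 4:2:0 downsampling filter; both choices and the function names are hypothetical.

```python
def _to_8bit(v):
    return max(0, min(255, round(v)))


def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601-style conversion of one 8-bit RGB pixel."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return _to_8bit(y), _to_8bit(cb), _to_8bit(cr)


def subsample_420(plane):
    """Halve a chroma plane in both directions by averaging each 2x2 block."""
    h, w = len(plane), len(plane[0])
    return [[(plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1] + 2) // 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]


if __name__ == "__main__":
    print(rgb_to_ycbcr(255, 255, 255))   # (255, 128, 128)
    print(subsample_420([[0, 2, 4, 6], [2, 4, 6, 8], [0, 0, 0, 0], [4, 4, 4, 4]]))
```

After subsampling, each chroma plane holds a quarter of the luma samples, which is where 4:2:0 gets its factor-of-two saving in raw data over 4:4:4.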

slide-136
SLIDE 136

136

Color Space

slide-137
SLIDE 137

137

FRExt Color Space

The FRExt amendment extended the standard to the 4:2:2 and 4:4:4 chroma formats and to sample precision higher than 8 bits

In the 4:2:0 chroma format, each macroblock consists of a 16x16 region of luma samples and two corresponding 8x8 chroma sample arrays. In a macroblock of 4:2:2 chroma format video, the chroma sample arrays are 8x16 in size; and in a macroblock of 4:4:4 chroma format video, they are 16x16 in size.
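These sizes follow from the chroma subsampling factors, called SubWidthC and SubHeightC in the standard: each chroma array in a macroblock is (16 / SubWidthC) wide and (16 / SubHeightC) high. A small sketch with hypothetical function names:

```python
# (SubWidthC, SubHeightC): chroma subsampling factors per format.
SUBSAMPLING = {"4:2:0": (2, 2), "4:2:2": (2, 1), "4:4:4": (1, 1)}


def chroma_mb_size(chroma_format):
    """(width, height) of each chroma array in a 16x16 macroblock."""
    sub_width_c, sub_height_c = SUBSAMPLING[chroma_format]
    return 16 // sub_width_c, 16 // sub_height_c


def samples_per_mb(chroma_format):
    """Luma samples plus both chroma arrays."""
    width, height = chroma_mb_size(chroma_format)
    return 16 * 16 + 2 * width * height


if __name__ == "__main__":
    for fmt in SUBSAMPLING:
        print(fmt, chroma_mb_size(fmt), samples_per_mb(fmt))
```

A 4:4:4 macroblock thus carries twice the samples of a 4:2:0 macroblock (768 versus 384).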

slide-138
SLIDE 138

138

FRExt Color Space

FRExt also supports a color space called YCgCo (where the "Cg" stands for green chroma and the "Co" stands for orange chroma), which is much simpler than YCbCr and typically has equal or better coding efficiency.
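The simplicity shows in the transform itself: the forward weights are only 1/4 and 1/2, and the inverse needs nothing but additions and subtractions. The sketch uses exact rational arithmetic to show that the transform is invertible; real implementations work in integers at a fixed bit depth, which is not shown, and the function names are hypothetical.

```python
from fractions import Fraction


def rgb_to_ycgco(r, g, b):
    """Forward YCgCo: weights are only 1/4 and 1/2."""
    y = Fraction(r, 4) + Fraction(g, 2) + Fraction(b, 4)
    cg = -Fraction(r, 4) + Fraction(g, 2) - Fraction(b, 4)
    co = Fraction(r, 2) - Fraction(b, 2)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse YCgCo: additions and subtractions only."""
    t = y - cg                      # equals (R + B) / 2
    return t + co, y + cg, t - co   # (R, G, B)


if __name__ == "__main__":
    ycgco = rgb_to_ycgco(200, 100, 50)
    print([str(v) for v in ycgco])     # ['225/2', '-25/2', '75']
    print(ycgco_to_rgb(*ycgco))
```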

slide-139
SLIDE 139

139

FRExt Color Space

slide-140
SLIDE 140

140

The End