In the name of Allah: Digital Video Processing (PowerPoint presentation)

SLIDE 1

In the name of Allah

THE COMPASSIONATE, THE MERCIFUL

SLIDE 2

Digital Video Processing

  • S. KASAEI

ROOM: CE 307

DEPARTMENT OF COMPUTER ENGINEERING, SHARIF UNIVERSITY OF TECHNOLOGY

E-MAIL: SKASAEI@SHARIF.EDU
WEB PAGE: HTTP://SHARIF.EDU/~SKASAEI
LAB. WEBSITE: HTTP://IPL.CE.SHARIF.EDU

SLIDE 3

Chapter 13

VIDEO COMPRESSION STANDARDS

SLIDE 4

H.264/AVC Standard

SLIDE 5

H.264/AVC History

  • Objectives:

50% bit rate savings compared to MPEG-2.
High-quality video at both low and high bit rates: 64 kbps to 240 Mbps.
Network-friendly: more error resilience tools.
Supports both conversational and non-conversational applications:
Conversational: video conferencing. Non-conversational: storage, broadcast, streaming.

  • 1998: Call for proposals for H.26L issued by the ITU-T Video Coding Experts Group (VCEG).
  • Oct. 1999: First draft design.
  • Dec. 2001: ITU-T and ISO/IEC formed the Joint Video Team (JVT).
  • Mar. 2003: Approved as ITU-T H.264 and ISO/IEC MPEG-4 Part 10 Advanced Video Coding (AVC).
  • Jul. 2004: Fidelity range extensions (FRExt).
  • Current: Scalable video coding (SVC) extensions.

Kasaei

SLIDE 6

Performance of H.264

SLIDE 7

Rate-Distortion Function in H.264

QCIF-formatted frames at a 10 Hz frame rate.

SLIDE 8

Comparison with Other Video Coders

SLIDE 9

H.264/AVC Terminology

Block: An MxN (M-column by N-row) array of samples, or an MxN array of transform coefficients.

  • Contains coded data corresponding to a 4x4 sample region of the video frame.

Macro-block: A 16x16 block of luma samples and two corresponding blocks of chroma samples.

Sub-macroblock: One quarter of the samples of a macroblock.

  • An 8x8 luma block and two corresponding chroma blocks, one corner of which is located at a corner of the macroblock.

Slice: Contains an integral number of MBs, from 1 (1 MB per slice) to the total number of MBs in a picture (1 slice per picture).

SLIDE 10

H.264/AVC Terminology

Reference Picture: A previously encoded frame that may be used as a reference in motion estimation.

  • A reference picture contains samples that may be used for inter prediction in the decoding process of subsequent pictures in decoding order.

Motion Vector: A 2-D vector used for inter prediction that provides an offset from the coordinates in the decoded picture to the coordinates in a reference picture.

Profile: A set of coding functions; it specifies what is required of an encoder or decoder that complies with the profile.

Level: Performance limits for codecs are defined by a set of levels.

  • Each level places limits on parameters such as sample processing rate, picture size, coded bit rate, and memory requirements.

SLIDE 11

Key Idea in Video Coding

Predict each frame from a previous frame and encode only the prediction error:

  • The prediction error has smaller energy and is easier to compress.
  • Predict at what level?

Frame level: good for camera panning.
Block level: too much motion information to code.
Macro-block (MB) level (16x16 pixels): most widely used.

SLIDE 12

Scope of Picture and Video Coding Standardization

Only the syntax and the decoder are standardized.

SLIDE 13

VCL & NAL in H.264

SLIDE 14

Coded Data Format

  • H.264 makes a distinction between a video coding layer (VCL) and a network abstraction layer (NAL).
  • The NAL allows easy integration of the coded video into all current and possible future protocol and multiplex architectures.
  • The output of the encoding process is VCL data:
  • A sequence of bits representing the coded video data, which is mapped to NAL units prior to transmission or storage.
  • Each NAL unit contains a raw byte sequence payload (RBSP):
  • A set of data corresponding to coded video data or header information.
  • A coded video sequence is represented by a sequence of NAL units (see figure),
  • which can be transmitted over a packet-based network or a bit-stream transmission link, or stored in a file.

SLIDE 15

NAL

H.264 consists of a video coding layer and a network abstraction layer.

A NAL unit includes a header and a payload.

  • The header specifies the NAL unit type, and the payload contains the related data.

The NAL design specified in the recommendation is appropriate for the adaptation of H.264 over RTP/UDP/IP, H.324/M, MPEG-2 transport, and H.320.

SLIDE 16

NALU

  • The NAL unit header contains three fields:

1. Nal_Ref_Idc (NRI) contains two bits that indicate the priority of the NALU payload.

  • 11 is the highest transport priority, followed by 10, then 01; 00 is the lowest.

2. NALU type (T) is a 5-bit field that characterizes the NALU as one of 32 different types.

  • Types 1 to 12 are currently defined by H.264. Types 24 to 31 are made available for use outside of H.264.

3. Forbidden_bit is specified to be zero in H.264 encoding.

  • Network elements can set this bit to 1 when they identify bit errors in the NALU.
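The three header fields above share a single byte, with the forbidden bit as the most significant bit, then the two NRI bits, then the five type bits. The following is a minimal Python sketch of splitting and packing that byte. The helper names `parse_nal_header` and `build_nal_header` are illustrative and do not come from any library:

```python
def parse_nal_header(first_byte: int) -> dict:
    """Split the one-byte H.264 NAL unit header into its three fields.

    Bit layout, most significant bit first:
      forbidden bit (1) | nal_ref_idc (2) | nal_unit_type (5)
    """
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("a NAL unit header is a single byte")
    return {
        "forbidden_bit": (first_byte >> 7) & 0x1,
        "nal_ref_idc": (first_byte >> 5) & 0x3,   # 3 = highest priority, 0 = lowest
        "nal_unit_type": first_byte & 0x1F,      # one of 32 types
    }


def build_nal_header(nal_ref_idc: int, nal_unit_type: int, forbidden_bit: int = 0) -> int:
    """Pack the three fields back into one byte (inverse of parse_nal_header)."""
    if not (0 <= nal_ref_idc <= 3 and 0 <= nal_unit_type <= 31 and forbidden_bit in (0, 1)):
        raise ValueError("field out of range")
    return (forbidden_bit << 7) | (nal_ref_idc << 5) | nal_unit_type


print(parse_nal_header(0x67))
# {'forbidden_bit': 0, 'nal_ref_idc': 3, 'nal_unit_type': 7}
```

For example, a first byte of 0x67 decodes to NRI = 3 (highest priority) and type 7. H.264 uses type 7 for a sequence parameter set.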

SLIDE 17

Nal-Ref-Idc (NRI)

SLIDE 18

H.264/AVC Profiles

SLIDE 19

H.264/AVC Profiles

The AVC scheme includes different profiles:

Baseline Profile: for low-delay end-to-end applications.
Main Profile: for broadcast applications at standard-definition (SD) level.
Extended Profile: for mobile applications and streaming.
High Profiles: to address the needs of the most demanding applications (such as contribution and distribution of content, studio editing, and post-processing), known as the fidelity range extensions (FRExt):

High profile (HP): 8 bits per sample, 4:2:0 sampling.
High 10 profile (Hi10P): 8 to 10 bits per sample, 4:2:0 sampling.
High 4:2:2 profile (H422P): up to 10 bits per sample, 4:2:2 sampling.
High 4:4:4 profile (H444P): up to 12 bits per sample, supporting up to 4:4:4 chroma sampling, and additionally supporting efficient lossless region coding.

SLIDE 20

H.264/AVC Profiles

SLIDE 21

H.264/AVC Levels

SLIDE 22

Baseline

Intra coding and inter coding using I-slices and P-slices.
Entropy coding using context-based adaptive variable length coding (CAVLC).

SLIDE 23

Main

Interlaced video.
Inter coding using B-slices.
Inter coding using weighted prediction.
Entropy coding using context-based adaptive binary arithmetic coding (CABAC).

SLIDE 24

Extended

SI and SP slices to enable efficient switching between coded bit streams.
Improved error resilience (data partitioning).
Does not have CABAC (interlaced coding, however, is supported).

SLIDE 25

Applications

Baseline:

  • Video telephony, video conferencing, wireless communications.

Main:

  • Television broadcasting, video storage.

Extended:

  • Streaming media applications.

SLIDE 26

Reference Pictures

Reference picture management:

  • Uses a window of n frames.
  • Two lists: list0 and list1.

SLIDE 27

New Features Added in H.264/AVC

Intra-prediction.
Motion estimation:
  • Variable block sizes.
  • Multiple reference frames.
  • Sub-pixel motion estimation.
Image transform.
De-blocking filter.
Entropy coding.

SLIDE 28

Basic Macroblock Coding Structure

[Figure: H.264 macroblock coding structure. Input video signal split into macroblocks (16x16 pixels); coder control; transform/scaling/quantization; decoder (scaling and inverse transform, de-blocking filter, intra-frame prediction, motion compensation, intra/inter switch); entropy coding; motion estimation supplying motion data; output video signal.]

SLIDE 29

Encoder (Forward Path)

An input frame or field Fn is processed in units of MBs.

Each MB is encoded in intra or inter mode. For each block in the MB, a prediction PRED (marked ‘P’ in the figure) is formed based on reconstructed picture samples.

SLIDE 30

Encoder (Forward Path)

In Intra mode:

PRED is formed from samples in the current slice that have been previously encoded, decoded, and reconstructed (uF’n in the figures). Note that unfiltered samples are used to form PRED.

In Inter mode:

PRED is formed by motion-compensated prediction from one or more reference pictures selected from a set of reference pictures.

SLIDE 31

Encoder (Forward Path)

  • P is subtracted from the current block to produce a residual block Dn, which is transformed and quantized to give X.
  • X is a set of quantized transform coefficients, which are reordered and entropy encoded.
  • The entropy-encoded coefficients, together with side information (prediction modes, quantization parameter, motion vector information, etc.), form the compressed bitstream.
  • The compressed bitstream is passed to the NAL for transmission or storage.

SLIDE 32

Encoder (Reconstruction Path)

  • The encoder decodes (reconstructs) every MB to provide a reference for further predictions.
  • The coefficients X are scaled (Q−1) and inverse transformed (T−1) to produce a difference block D’n.
  • The prediction block P is added to D’n to create a reconstructed block uF’n:
  • A decoded version of the original block.
  • u indicates that it is unfiltered.
  • A filter is applied to reduce the effects of blocking distortion, and the reconstructed reference picture is created from a series of blocks F’n.

SLIDE 33

Decoder

  • The decoder receives a compressed bitstream from the NAL.
  • It entropy decodes the data elements to produce a set of quantized coefficients X.
  • These are scaled and inverse transformed to give D’n:
  • Identical to the D’n shown in the encoder.
  • Using the header information decoded from the bitstream, the decoder creates a prediction block PRED:
  • Identical to the original prediction PRED formed in the encoder.
  • PRED is added to D’n to produce uF’n, which is filtered to create each decoded block F’n.
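The reconstruction path exists so that the encoder predicts from the same lossy samples the decoder will have. The following toy Python sketch passes one row of samples through both paths. The identity "transform" and the uniform quantizer with step `QSTEP = 8` are stand-ins (assumptions), not H.264's actual T and Q. The deblocking filter is omitted:

```python
QSTEP = 8   # toy quantiser step (assumption, not an H.264 value)


def forward(residual):
    """Stand-in for transform + quantisation: Dn -> levels X."""
    return [round(d / QSTEP) for d in residual]


def inverse(levels):
    """Stand-in for rescaling + inverse transform: X -> D'n."""
    return [x * QSTEP for x in levels]


def encode_block(current, pred):
    dn = [c - p for c, p in zip(current, pred)]        # Dn = current - PRED
    x = forward(dn)                                    # X goes to the entropy coder
    d_rec = inverse(x)                                 # D'n, a lossy copy of Dn
    uf_rec = [p + d for p, d in zip(pred, d_rec)]      # uF'n = PRED + D'n
    return x, uf_rec


def decode_block(x, pred):
    d_rec = inverse(x)                                 # the same inverse path
    return [p + d for p, d in zip(pred, d_rec)]


current = [52, 55, 61, 66]   # one row of samples, for brevity
pred = [50, 50, 60, 60]
x, enc_recon = encode_block(current, pred)
dec_recon = decode_block(x, pred)
print(x, enc_recon, dec_recon)   # [0, 1, 0, 1] [50, 58, 60, 68] [50, 58, 60, 68]
```

The reconstruction differs from the input because quantization is lossy. However, the encoder and decoder reconstructions are identical. This is why PRED is formed from reconstructed samples rather than original ones.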

SLIDE 34

Intra-Prediction

Motivation: intra frames are natural images, so they exhibit strong spatial correlation.

Macro-blocks in intra-coded frames are predicted based on previously coded macro-blocks:

Above and/or to the left of the current block.
A macro-block may be divided into sixteen 4x4 sub-blocks, which are predicted in a cascading fashion.
9 modes for the 4x4 size and 4 modes for the 16x16 size.

SLIDE 35

Intra-Prediction

[Figure: H.264 macroblock coding structure block diagram.]

SLIDE 36

Intra-Prediction

[Figure: H.264 macroblock coding structure block diagram.]

Directional spatial prediction (9 types for luma, 4 for chroma).

Neighboring samples (Q: top-left corner; A to H: above; I to P: left) and the samples a to p of the current 4x4 block:

Q A B C D E F G H
I a b c d
J e f g h
K i j k l
L m n o p
M
N
O
P

e.g., diagonal down-right prediction (mode 4 in the final standard's numbering): a, f, k, and p are predicted by (A + 2Q + I + 2) >> 2.
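Diagonal down-right prediction extends the formula above to all 16 samples. Each sample is a [1, 2, 1]-weighted average of three neighbors taken along the 45-degree direction. The following Python sketch uses the slide's labels (Q for the corner, A to D above, I to L to the left). The function name is hypothetical:

```python
def intra4x4_diag_down_right(corner, top, left):
    """Diagonal down-right prediction of a 4x4 block.

    corner: top-left neighbour (Q in the slide's figure)
    top:    the four samples above the block, [A, B, C, D]
    left:   the four samples to the left of the block, [I, J, K, L]
    Returns pred as 4 rows of 4 values; pred[y][x] is row y, column x,
    so pred[0] holds a, b, c, d and pred[3][3] is p.
    """
    def above(i):   # column i of the row above; column -1 is the corner
        return corner if i == -1 else top[i]

    def beside(j):  # row j of the column to the left; row -1 is the corner
        return corner if j == -1 else left[j]

    pred = [[0] * 4 for _ in range(4)]
    for y in range(4):
        for x in range(4):
            if x > y:       # upper-right triangle: filter along the top row
                d = x - y
                s = above(d - 2) + 2 * above(d - 1) + above(d)
            elif x < y:     # lower-left triangle: filter along the left column
                d = y - x
                s = beside(d - 2) + 2 * beside(d - 1) + beside(d)
            else:           # main diagonal a, f, k, p: A + 2Q + I
                s = top[0] + 2 * corner + left[0]
            pred[y][x] = (s + 2) >> 2   # [1, 2, 1] / 4 with rounding
    return pred


pred = intra4x4_diag_down_right(corner=10, top=[20, 30, 40, 50], left=[60, 70, 80, 90])
for row in pred:
    print(row)
```

Only A to D and I to L are needed for this mode. Other directions, such as diagonal down-left, also use the above-right samples E to H.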

SLIDE 37

Intra-Prediction

Reducing spatial redundancies within a picture.

Prediction using surrounding pixels. Used only in H.264. Two sizes for intra prediction: 4x4 and 16x16.

SLIDE 38

Luma 4x4 Intra Modes

SLIDE 39

Luma 16x16 Intra Modes

SLIDE 40

Optimal Intra 4x4 Mode Selection

Selects the mode with the best R-D tradeoff.

Full search method:

Divide each MB into sixteen 4x4 blocks.
For each 4x4 block:
    For each of the nine Intra_4x4 prediction modes:
        Predict the current 4x4 block using the current mode.
        Get the prediction residual.
        Apply transform, quantization, entropy coding, inverse quantization, and inverse transform; find the output bits R and the reconstruction error (SAD or MSE).
        • Compute the joint cost: SAD + λ(Q)·R:
        • Q: quantization parameter (QP).
        • λ(Q) = 0.85 × 2^((Q − 12)/3): obtained through experiments.
    End
    Find the mode with the smallest cost as the best Intra_4x4 prediction mode for this 4x4 block.
End

Fast method: an active research area.

Reduce the number of searches.
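Once the distortion and bit count for each candidate mode are known, the decision step above reduces to a minimum over J = SAD + λ(Q)·R. The following minimal Python sketch assumes Q is the quantization parameter QP. The (SAD, bits) numbers are made up for illustration. In practice they would come from actually coding the block in each mode:

```python
def intra_lambda(qp: int) -> float:
    """Empirical Lagrange multiplier from the slide: 0.85 * 2^((QP - 12) / 3)."""
    return 0.85 * 2 ** ((qp - 12) / 3)


def best_intra4x4_mode(costs: dict, qp: int):
    """Return the mode minimising J = SAD + lambda(QP) * R.

    costs maps a mode number to (SAD, bits) measured for that mode.
    """
    lam = intra_lambda(qp)
    return min(costs, key=lambda mode: costs[mode][0] + lam * costs[mode][1])


# Hypothetical measurements for three of the nine modes: (SAD, bits).
measured = {0: (120, 14), 1: (95, 30), 2: (140, 6)}
print(round(intra_lambda(28), 2), best_intra4x4_mode(measured, 28),
      best_intra4x4_mode(measured, 12))   # 34.27 2 1
```

At QP 12 (λ = 0.85), the low-distortion mode 1 wins. At QP 28, λ is about 34, which makes the cheaper mode 2 preferable.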

SLIDE 41

Optimal Intra 16x16 Mode Selection

Full search method:

For each Intra_16x16 prediction mode:
    Get the prediction of the current MB.
    Find the prediction residual.
    Perform a 2-D 4-point Hadamard transform on each 4x4 block.
    Extract the DC coefficients of all sixteen 4x4 blocks and apply the 2-D 4-point Hadamard transform again to this 4x4 array of DC values.
    Cost estimation: compute the sum of the absolute values of all the Hadamard transform coefficients.
End
Find the mode with the smallest cost as the best Intra_16x16 prediction mode for this MB.

Decision between Intra_4x4 and Intra_16x16:

Compare the costs of the Intra_4x4 mode and the Intra_16x16 mode to find the best mode.
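The following Python sketch implements the Intra_16x16 cost above. The Hadamard matrix is the 4-point one that H.264 uses for luma DC coefficients. Two choices are assumptions about how "all the Hadamard transform coefficients" are summed: each sub-block's DC term is counted only through the second transform, and no normalization is applied:

```python
H4 = [[1, 1, 1, 1],
      [1, 1, -1, -1],
      [1, -1, -1, 1],
      [1, -1, 1, -1]]


def hadamard4x4(block):
    """2-D 4-point Hadamard transform H4 * block * H4^T, without normalisation."""
    tmp = [[sum(H4[i][k] * block[k][j] for k in range(4)) for j in range(4)]
           for i in range(4)]
    return [[sum(tmp[i][k] * H4[j][k] for k in range(4)) for j in range(4)]
            for i in range(4)]


def intra16x16_cost(residual):
    """Hadamard-based cost of a 16x16 prediction residual (16 rows of 16 ints)."""
    dc = [[0] * 4 for _ in range(4)]
    total = 0
    for by in range(4):
        for bx in range(4):
            sub = [row[4 * bx:4 * bx + 4] for row in residual[4 * by:4 * by + 4]]
            coeffs = hadamard4x4(sub)
            dc[by][bx] = coeffs[0][0]
            # AC magnitudes count now; the DC terms are counted after the 2nd pass.
            total += sum(abs(c) for r in coeffs for c in r) - abs(coeffs[0][0])
    total += sum(abs(c) for r in hadamard4x4(dc) for c in r)
    return total


def best_intra16x16_mode(current, predictions):
    """predictions maps a mode number to its 16x16 predicted block."""
    def cost(mode):
        pred = predictions[mode]
        return intra16x16_cost([[c - p for c, p in zip(cr, pr)]
                                for cr, pr in zip(current, pred)])
    return min(predictions, key=cost)


flat = [[100] * 16 for _ in range(16)]
print(best_intra16x16_mode(flat, {0: flat, 2: [[90] * 16 for _ in range(16)]}))
```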

SLIDE 42

Inter-Prediction

Block sizes for inter-prediction:

H.261 & MPEG-1: 16x16
MPEG-2: 16x16, 16x8
H.263 & MPEG-4: 16x16 & 8x8
H.264: 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4

Motion estimation accuracy:

H.261: integer-pel
MPEG-1 & MPEG-2 & H.263: half-pel
MPEG-4 & H.264: quarter-pel

Skip prediction.

SLIDE 43

Motion Estimation (ME)

For each block, find the best match in the previous frame (reference frame).

Upper-left corner of the block being encoded: (x0, y0).
Upper-left corner of the matched block in the reference frame: (x1, y1).

Motion vector (dx, dy): the offset between the two blocks:

  • (dx, dy) = (x1 – x0, y1 – y0)
  • (x0, y0) + (dx, dy) = (x1, y1)

The motion vector needs to be sent to the decoder.

SLIDE 44

Motion Compensation (MC)

Given the reference frame and the motion vector, a prediction of the current frame can be obtained.

Prediction error: the difference between the current frame and the predicted frame.

The prediction error is coded by DCT, quantization, and entropy coding.
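Using the motion vector convention from the motion estimation slide, the prediction is an offset copy from the reference frame and the prediction error is a subtraction. The following Python sketch handles integer-pel vectors, with frames stored as lists of rows (frame[y][x]). The function names are illustrative:

```python
def motion_compensate(ref, x0, y0, dx, dy, size):
    """Copy the size x size block whose upper-left corner is (x0, y0) + (dx, dy)."""
    x1, y1 = x0 + dx, y0 + dy
    if x1 < 0 or y1 < 0 or y1 + size > len(ref) or x1 + size > len(ref[0]):
        raise ValueError("block falls outside the reference frame")
    return [row[x1:x1 + size] for row in ref[y1:y1 + size]]


def prediction_error(cur, x0, y0, pred):
    """Residual = current block - motion-compensated prediction."""
    size = len(pred)
    return [[cur[y0 + j][x0 + i] - pred[j][i] for i in range(size)]
            for j in range(size)]


# Tiny 6x6 example: the content of the current frame moved 1 pixel to the right.
ref = [[10 * y + x for x in range(6)] for y in range(6)]
cur = [[ref[y][max(x - 1, 0)] for x in range(6)] for y in range(6)]
pred = motion_compensate(ref, x0=2, y0=2, dx=-1, dy=0, size=2)
print(pred, prediction_error(cur, 2, 2, pred))   # [[21, 22], [31, 32]] [[0, 0], [0, 0]]
```

A correct vector leaves a zero residual here. With real content, the residual is small but nonzero, and it is what gets transformed and quantized.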

SLIDE 45

GOP, I, and P Frames

GOP: group of pictures (frames).

I frames (key frames):
Intra-coded frames, coded as still images.
Can be decoded directly.
Used at the head of a GOP or at scene changes. I frames also improve error resilience.

P frames (inter-coded frames):
Prediction-based coding, based on previous frames.

SLIDE 46

GOP, I, P, and B Frames

B frames: bi-directionally interpolated prediction frames.

Predicted from both the previous frame and the next frame: more flexibility, hence better prediction.
Encoding order: 1 4 2 3 7 5 6.
Decoding order: 1 4 2 3 7 5 6.
Display order: 1 2 3 4 5 6 7.
Needs more buffers, and buffer manipulation to display frames in the correct order.
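The orders above follow from one rule: a B frame can only be coded after both of its anchor frames. The following Python sketch derives the coding order from a display-order pattern. The I B B P B B P pattern is an assumption about the slide's seven-frame example, chosen because it reproduces the listed orders:

```python
def coding_order(frame_types):
    """Reorder frames from display order to coding (= decoding) order.

    frame_types: one letter per frame in display order, e.g. "IBBPBBP";
    frames are numbered from 1 in display order. Each I/P anchor is moved
    ahead of the B frames that precede it in display order.
    """
    order, pending_b = [], []
    for number, kind in enumerate(frame_types, start=1):
        if kind == "B":
            pending_b.append(number)      # wait until the next anchor is coded
        else:                             # I or P anchor frame
            order.append(number)
            order.extend(pending_b)
            pending_b.clear()
    if pending_b:
        raise ValueError("trailing B frames have no future anchor")
    return order


print(coding_order("IBBPBBP"))   # [1, 4, 2, 3, 7, 5, 6]
```

The B frames that are held back in `pending_b` correspond to the extra buffering the slide mentions.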

SLIDE 47

Block Matching Algorithms for ME

Split each frame into 16x16 blocks (MBs) and apply motion estimation to each macro-block.

Search window (maximum movement): w

  • Typically 8, 16, or 32.

Define a cost for finding the best match for each block in the previous frame:

Mean absolute error (MAE) or sum of absolute differences (SAD).
Mean squared error (MSE).
Sum of squared errors (SSE).

Calculate the motion vector (MV) between the current block and its counterpart in the previous frame.

Calculate the macro-block differences and send them.

SLIDE 48

Cost Function

The best match is found by minimizing the sum of absolute differences (SAD) function:

$$\mathrm{SAD}(s, c(\mathbf{m})) = \sum_{x=1,\,y=1}^{16,\,16} \left| s[x, y] - c[x - m_x,\, y - m_y] \right|$$

where s is the original video signal, c is the coded video signal, and m = (m_x, m_y) is the candidate motion vector.

SLIDE 49

Motion Estimation in H.264

What is new?

Variable block-size ME:
Can yield 15% bit rate savings.
Multiple reference frame ME:
5-20% bit rate savings.
Sub-pixel ME:
20% bit rate savings over integer ME.

SLIDE 50

Variable Block Size ME

[Figure: H.264 macroblock coding structure block diagram, with macroblock partitions (16x16, 16x8, 8x16, 8x8) and 8x8 sub-macroblock partitions (8x8, 8x4, 4x8, 4x4).]

SLIDE 51

Variable Block Size ME: Example

[Figure: frames at T=1 and T=2.]

SLIDE 52

Variable Block Size ME: Example

[Figure: frames at T=1 and T=2.]

SLIDE 53

Variable Block Size ME: Example

[Figure: frames at T=1 and T=2.]

SLIDE 54

Multiple Reference Frames ME

[Figure: H.264 macroblock coding structure block diagram.]

Multiple Reference Frames for Motion Compensation

SLIDE 55

Sub-Pixel Motion Estimation

  • When an object moves by a sub-pixel amount, integer-pixel ME cannot describe the movement; therefore, sub-pixel ME is defined.
  • H.263 uses only half-pixel accuracy, and MPEG-4 uses quarter-pixel accuracy:
  • A gain of 1.5-2 dB across the board over ½-pixel.
  • H.264 uses quarter-pixel accuracy for luma ME and up to eighth-pixel accuracy for chroma.
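In H.264, luma half-sample positions are interpolated with the 6-tap filter (1, -5, 20, 20, -5, 1)/32. Quarter-sample positions are then obtained by averaging two neighboring full- or half-sample values with upward rounding. The following is a one-dimensional Python sketch. The 2-D center positions and the bilinear chroma interpolation are left out:

```python
TAPS = (1, -5, 20, 20, -5, 1)   # H.264 luma half-sample filter, sum = 32


def clip255(v):
    return max(0, min(255, v))


def half_pel(row, i):
    """Half-sample value between row[i] and row[i + 1].

    Needs three full samples on each side: row[i - 2] .. row[i + 3].
    """
    if i < 2 or i + 3 >= len(row):
        raise IndexError("not enough neighbouring samples")
    acc = sum(t * s for t, s in zip(TAPS, row[i - 2:i + 4]))
    return clip255((acc + 16) >> 5)     # divide by 32 with rounding


def quarter_pel(a, b):
    """Quarter-sample value: rounded-up average of two neighbouring samples."""
    return (a + b + 1) >> 1


row = [10, 10, 10, 80, 80, 80]
b = half_pel(row, 2)               # halfway between row[2] = 10 and row[3] = 80
print(b, quarter_pel(row[2], b))   # 45 28
```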

SLIDE 56

Search Window

Search window (in the previous frame):

A rectangle at the same coordinates as the current block in the current frame, extended by w pixels in each direction.

[Figure: a (p+2w) x (q+2w) search window around a p x q block, with a margin of w on each side.]

SLIDE 57

Full Search Method

Full search:

All candidates within the search window are examined.

(2w+1)^2 positions must be examined.
Advantage: good accuracy; finds the best match.
Disadvantage: large amount of computation; (2w+1)^2 matches, each requiring a 16x16 MAE, which is impractical for real-time applications.

To avoid this complexity, the number of search points must be reduced, so fast block-matching algorithms are used.
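Full search is expensive because the double loop visits each of the (2w+1)^2 offsets and computes a complete block SAD at every one. For w = 16 and a 16x16 block, that is 33^2 = 1089 positions and 1089 x 256 = 278,784 absolute differences per macroblock.

The following Python sketch uses the (dx, dy) = (x1 - x0, y1 - y0) convention of the motion estimation slide; the SAD formula on the cost function slide writes the displacement with the opposite sign. Skipping candidates that leave the frame is a simplifying assumption. The demo uses an 8x8 block on a synthetic frame to keep it fast:

```python
import random


def sad(cur, ref, x0, y0, dx, dy, n):
    """SAD between the n x n block at (x0, y0) in cur and the block at
    (x0 + dx, y0 + dy) in ref."""
    return sum(abs(cur[y0 + j][x0 + i] - ref[y0 + dy + j][x0 + dx + i])
               for j in range(n) for i in range(n))


def full_search(cur, ref, x0, y0, n, w):
    """Exhaustive block matching over offsets -w..w in each direction.

    Returns (best_mv, best_sad, positions_checked).
    """
    height, width = len(ref), len(ref[0])
    best_mv, best_cost, checked = None, float("inf"), 0
    for dy in range(-w, w + 1):
        for dx in range(-w, w + 1):
            x1, y1 = x0 + dx, y0 + dy
            if x1 < 0 or y1 < 0 or x1 + n > width or y1 + n > height:
                continue                      # candidate leaves the frame
            checked += 1
            cost = sad(cur, ref, x0, y0, dx, dy, n)
            if cost < best_cost:              # ties keep the first candidate
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost, checked


# Synthetic test: each block of the current frame matches the reference at (3, -2).
rng = random.Random(7)
ref = [[rng.randrange(256) for _ in range(32)] for _ in range(32)]
dx_true, dy_true = 3, -2
cur = [[ref[min(max(y + dy_true, 0), 31)][min(max(x + dx_true, 0), 31)]
        for x in range(32)] for y in range(32)]
print(full_search(cur, ref, x0=8, y0=8, n=8, w=4))
```

With w = 4, all (2·4+1)^2 = 81 candidates lie inside the frame, so all of them are checked.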

SLIDE 58

Mode Selection

Mode selection is not a normative part of the standard.

Due to the variety of encoding modes for an MB, a mode selection must be made:

$$f_{\text{modes for a MB}} = f_{\text{QPs}} \times f_{\text{ref pics}} \times f_{\text{inter modes}} + f_{\text{QPs}} \times f_{\text{intra modes}} = 3 \times 1 \times 259 + 3 \times (9 + 4) = 816$$

where each f is the number of QPs, reference pictures, or inter/intra modes considered.

Existing criteria:

  • SAD.
  • SATD.
  • Rate-distortion optimization.
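The count of 816 can be checked directly. The split of the 259 inter modes is an assumption, chosen because it reproduces the slide's number: 3 whole-macroblock partitions plus 4^4 = 256 combinations of sub-partitions for the four 8x8 quadrants.

```python
qps = 3                        # QP values tried per macroblock
ref_pics = 1                   # reference pictures searched
inter_modes = 3 + 4 ** 4       # 16x16, 16x8, 8x16, plus 4 sub-partition choices
                               # (8x8, 8x4, 4x8, 4x4) for each of four 8x8 blocks
intra_modes = 9 + 4            # Intra_4x4 modes + Intra_16x16 modes

total = qps * ref_pics * inter_modes + qps * intra_modes
print(inter_modes, total)      # 259 816
```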
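SLIDE 59

Mode Selection Criteria

  • SAD: sum of absolute differences between the original block and its prediction.
  • SATD: sum of absolute transformed differences, i.e., the differences are Hadamard transformed before their absolute values are summed.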

SLIDE 60

Mode Selection Criteria

  • Rate-Distortion Optimization:

$$\min_{X} D(X) \quad \text{s.t.} \quad R(X) \le R_T$$

Lagrange method:

$$\min_{X} \; D(X) + \lambda R(X)$$

λ: trades off D against R.

SLIDE 61

Mode Decision Method in H.264/AVC

  • Calculate the RDCost for each intra mode.
  • For each inter mode (16x16, 16x8, 8x16, and 8x8):
  • For each block in the current mode:
  • Do ME in a search area and select the point that minimizes:

$$J_{\text{motion}}(\mathbf{m}, \lambda_{\text{motion}}) = \mathrm{SAD}(s, c(\mathbf{m})) + \lambda_{\text{motion}} \cdot R(\mathbf{m} - \mathbf{p})$$

where p is the predicted motion vector.

  • End
  • Calculate the RDCost using:

RDCost = Distortion + λ × Rate

  • Note that:
  • Rate requires transform, quantization, and entropy coding.
  • Distortion requires transform, quantization, inverse quantization, and inverse transform.
  • End
  • From the calculated RDCosts (RDCost_Intra_16x16, RDCost_Intra_4x4, RDCost_I_PCM, RDCost_SKIP, RDCost_Inter_16x16, RDCost_Inter_16x8, RDCost_Inter_8x16, and RDCost_Inter_8x8):
  • Select the smallest one as the best mode.
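The two-level decision above can be sketched in Python. The rate model for R(m - p) is an assumption: it counts the bits of the signed Exp-Golomb code se(v) for each motion-vector-difference component, which is the code CAVLC streams use for MVDs. The SAD and RDCost values in the usage lines are hypothetical:

```python
def se_bits(v):
    """Length in bits of the signed Exp-Golomb code se(v)."""
    code_num = 2 * v - 1 if v > 0 else -2 * v      # map signed value to code number
    return 2 * (code_num + 1).bit_length() - 1


def motion_cost(sad_value, mv, pred_mv, lam_motion):
    """J_motion = SAD(s, c(m)) + lambda_motion * R(m - p).

    R is approximated (assumption) by the se(v) code lengths of the two
    components of the motion vector difference m - p.
    """
    mvd = (mv[0] - pred_mv[0], mv[1] - pred_mv[1])
    return sad_value + lam_motion * (se_bits(mvd[0]) + se_bits(mvd[1]))


def best_mode(rd_costs):
    """Final step: the mode with the smallest RDCost = Distortion + lambda * Rate."""
    return min(rd_costs, key=rd_costs.get)


# Hypothetical SAD values for two candidate vectors, predicted vector p = (1, 0).
candidates = {(1, 0): 420, (6, -3): 400}
best_mv = min(candidates, key=lambda m: motion_cost(candidates[m], m, (1, 0), 4.0))

# Hypothetical RDCosts for some of the modes listed above.
rd = {"Intra_4x4": 2100.0, "SKIP": 1900.0, "Inter_16x16": 1830.0, "Inter_8x8": 1795.5}
print(best_mv, best_mode(rd))   # (1, 0) Inter_8x8
```

The vector (1, 0) has a slightly larger SAD but still wins, because its difference from p costs 10 fewer bits (2 instead of 12).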

SLIDE 62

Slice

Each frame can be coded as one or more slices.

  • A slice contains anywhere from one macro-block (16x16) to all the macro-blocks in the frame (1 slice per picture).

The number of macro-blocks per slice need not be constant within a picture.

Because coded slices have minimal inter-dependency, error propagation can be limited.

SLIDE 63

Slice Coding

Slices can have different shapes and sizes.

Slices do not have to be consecutive in the raster scan.

Each slice is self-contained:

It can be decoded without knowing the data of other slices.

Useful for:

Error resilience and concealment.
Parallel processing.

SLIDE 64

Slice Type

Each slice can be coded as one of 5 types:

I slice: all MBs are coded using intra mode.
P slice: an MB can be coded in intra mode or in inter mode with at most one prediction signal per block.
B slice: in addition to the modes in a P slice, some MBs can also be predicted using two prediction signals per block.
SP slice: switching P slice, to facilitate switching between different video streams.
SI slice: switching I slice, using only intra prediction.

SLIDE 65

Transformation

Transform coding:

Reduces the spatial redundancy of the prediction error.
Like the 2-D Fourier transform.
Lossy compression, not from the transform itself but after quantization (discarding high-frequency components).

Uses an integer transform instead of the DCT:

Similar properties to the DCT.
Integer operations.
No worries about DCT/IDCT mismatch.

Transform size:

Other standards: 8x8.
H.264: 4x4, 8x8.

SLIDE 66

Transformation

H.264 uses three types of transforms:

Hadamard transform for the 4x4 array of luma DC coefficients.
Hadamard transform for the 2x2 array of chroma DC coefficients.
DCT-based transform for all other 4x4 blocks of residual data.

SLIDE 67

Transformation

SLIDE 68

Transformation

Fundamental differences between the H.264 integer transform and the DCT:

It is an integer transform.
It is possible to ensure zero mismatch between encoder and decoder.
It can be implemented using only additions and shifts.
A scaling multiplication is integrated into the quantizer.
It can be carried out using 16-bit integer arithmetic.
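The core transform matrix contains only ±1 and ±2, so each 4-point pass needs only additions, subtractions, and one-bit shifts. The following Python sketch computes the forward 4x4 core transform Y = Cf X Cf^T as a butterfly. The post-scaling that turns it into a DCT approximation is omitted, because H.264 folds that step into the quantizer:

```python
def core_1d(x0, x1, x2, x3):
    """One 4-point pass of the forward core transform using adds and shifts."""
    e0, e1 = x0 + x3, x1 + x2
    e2, e3 = x1 - x2, x0 - x3
    return [e0 + e1, (e3 << 1) + e2, e0 - e1, e3 - (e2 << 1)]


def forward_core_4x4(block):
    """Y = Cf * X * Cf^T with Cf = [[1, 1, 1, 1], [2, 1, -1, -2],
    [1, -1, -1, 1], [1, -2, 2, -1]]; no scaling is applied."""
    rows = [core_1d(*r) for r in block]                                  # X * Cf^T
    cols = [core_1d(*(rows[j][i] for j in range(4))) for i in range(4)]  # Cf * (...)
    return [[cols[i][j] for i in range(4)] for j in range(4)]           # back to rows


print(forward_core_4x4([[1] * 4 for _ in range(4)]))
# [[16, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```

A flat block produces a single DC coefficient, as expected of a DCT-like transform.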

SLIDE 69

Transformation

DCT: [1]

SLIDE 70

Transformation

DCT Approximation: [1]

SLIDE 71

Transformation

4x4 Hadamard transform:

2x2 Hadamard transform: [1]

SLIDE 72

Quantization

The forward and inverse quantizers are complicated by two requirements:

Avoid division and/or floating-point arithmetic.
Incorporate the post- and pre-scaling matrices.

The basic forward quantizer is Z_ij = round(Y_ij / Qstep). A total of 52 values of Qstep are supported by the standard, indexed by a quantization parameter (QP).

SLIDE 73

Quantization

A wide range of quantizer step sizes makes it possible for an encoder to control the tradeoff between bit rate and quality accurately and flexibly.

QP can be different for luma and chroma.

  • QPC is derived from QPY through a mapping table; by default, QPC does not exceed QPY.

SLIDE 74

Quantization

Forward quantization in integer arithmetic:

  • For Intra mode:
  • For Inter mode:
  • MF: Multiplication Factor.

Inverse quantization in integer arithmetic:
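The integer forward quantizer can be sketched as follows. This assumes the widely published form of H.264 4x4 quantization: |Z| = (|W|·MF + f) >> qbits, with qbits = 15 + floor(QP/6) and a rounding offset f of 2^qbits/3 for intra blocks and 2^qbits/6 for inter blocks. W is the unscaled core-transform coefficient. The Qstep and MF tables are indexed by QP mod 6, and both repeat every 6 QP steps, with Qstep doubling each time:

```python
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]   # Qstep for QP 0..5

# MF for QP % 6, by coefficient-position class.
MF_TABLE = {
    "a": [13107, 11916, 10082, 9362, 8192, 7282],   # (0,0) (0,2) (2,0) (2,2)
    "b": [5243, 4660, 4194, 3647, 3355, 2893],      # (1,1) (1,3) (3,1) (3,3)
    "c": [8066, 7490, 6554, 5825, 5243, 4559],      # all other positions
}


def qstep(qp):
    """Quantiser step size for QP 0..51; it doubles every 6 QP steps."""
    return QSTEP_BASE[qp % 6] * 2 ** (qp // 6)


def position_class(i, j):
    if i % 2 == 0 and j % 2 == 0:
        return "a"
    if i % 2 == 1 and j % 2 == 1:
        return "b"
    return "c"


def quantize(w, i, j, qp, intra=True):
    """Quantise one core-transform coefficient w at position (i, j)."""
    qbits = 15 + qp // 6
    f = (1 << qbits) // (3 if intra else 6)          # rounding offset
    mf = MF_TABLE[position_class(i, j)][qp % 6]
    level = (abs(w) * mf + f) >> qbits               # no division needed
    return level if w >= 0 else -level


print(qstep(0), qstep(51), quantize(100, 0, 0, 4), quantize(-100, 0, 0, 4, intra=False))
# 0.625 224.0 25 -25
```

As a check, at QP 4 the step size is 1.0 and the (0,0) scaling factor is 1/4, so W = 100 should quantize to 25. The integer formula gives the same result.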

SLIDE 75

Complete T & Q

SLIDE 76

Deblocking Filter

Applied to each decoded macro-block to reduce blocking distortion. Filtering is applied to the vertical and horizontal edges of the 4x4 blocks in a macro-block (except for edges on slice boundaries).

The filter is stronger at places where there is likely to be significant blocking distortion,

  • such as the boundary of an intra-coded macro-block or a boundary between blocks that contain coded coefficients.

The filter decision switches the filter off when there is a significant change (gradient) across the block boundary in the original image.
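The "switch off at a real edge" decision is made for each line of samples across an edge. The filter is applied only if |p0 - q0| < α, |p1 - p0| < β, and |q1 - q0| < β. The sketch below also shows the normal-strength update of p0 and q0. In the standard, α, β, and the clipping bound tc come from QP-indexed tables; the values used here are hypothetical, and the stronger filtering modes are omitted:

```python
def should_filter(p, q, alpha, beta):
    """Filter decision for one line of samples across a block edge.

    p = [p0, p1, ...] on one side and q = [q0, q1, ...] on the other, with
    p0 and q0 adjacent to the edge. A large step |p0 - q0| >= alpha is taken
    to be a real image edge, so the filter is switched off.
    """
    return (abs(p[0] - q[0]) < alpha
            and abs(p[1] - p[0]) < beta
            and abs(q[1] - q[0]) < beta)


def normal_filter(p, q, tc):
    """Normal-strength update of p0 and q0; the change is clipped to +/- tc."""
    delta = (((q[0] - p[0]) << 2) + (p[1] - q[1]) + 4) >> 3
    delta = max(-tc, min(tc, delta))
    return p[0] + delta, q[0] - delta   # final clipping to 0..255 omitted


ALPHA, BETA, TC = 20, 5, 3                     # hypothetical thresholds for some QP
blocky = ([60, 60, 60], [70, 70, 70])          # small step: a coding artifact
real_edge = ([20, 20, 20], [120, 120, 120])    # large step: an image edge
print(should_filter(*blocky, ALPHA, BETA), normal_filter(*blocky, TC))   # True (63, 67)
print(should_filter(*real_edge, ALPHA, BETA))                            # False
```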

SLIDE 77

Deblocking Filter

Only in H.26x standards.

[Figure: without deblocking filter vs. with deblocking filter.]

SLIDE 78

Error Resilience Tools

Error resilience tools are used:

To minimize the visual effect of errors within a frame.
To avoid error propagation.

These tools include:

1. Flexible macro-block ordering (FMO).
2. Arbitrary slice ordering (ASO).
3. Redundant slices (RS).
4. Slice data partitioning (DP).
5. Slice-structured coding.
6. Flexible reference frame concept.
7. Picture switching.
8. Intra coding.

SLIDE 79

Flexible Macro-Block Order

FMO can randomize the data prior to transmission,

  • so that if a segment of data is lost, the errors are distributed more randomly over the video pictures.

Relevant neighboring data is then available for concealment of the lost content.
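A dispersed slice-group map shows how FMO spreads losses. Macroblocks alternate between two groups like a checkerboard, so if one group is lost, every missing macroblock still has its left, right, upper, and lower neighbors in the surviving group. The following Python sketch builds such a map. The mapping formula is an assumed rendering of H.264's dispersed map type and should be checked against the standard:

```python
def dispersed_map(width_mbs, height_mbs, num_groups):
    """Slice-group number of each macroblock, as rows of the picture.

    Assumed rule: macroblock i in raster order goes to
    ((i % W) + ((i // W) * N) // 2) % N, with W the picture width in
    macroblocks and N the number of slice groups.
    """
    return [[((i % width_mbs) + ((i // width_mbs) * num_groups) // 2) % num_groups
             for i in range(y * width_mbs, (y + 1) * width_mbs)]
            for y in range(height_mbs)]


for row in dispersed_map(6, 3, 2):
    print(row)
# [0, 1, 0, 1, 0, 1]
# [1, 0, 1, 0, 1, 0]
# [0, 1, 0, 1, 0, 1]
```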

SLIDE 80

Arbitrary Slice Order

ASO allows the slices of a picture to appear in any order, for delay reduction,

  • particularly on networks that can deliver data packets out of order.

SLIDE 81

Redundant Slices

The primary slice is coded with a low QP (and hence in good quality), while the RS is coded with a high QP (utilizing fewer bits).

The decoder decodes the primary slice, if it is available, and discards the RS. If the primary slice is missing, it can be reconstructed from the RS.

SLIDE 82

Data Partitioning

Type A: Header information.

  • Includes MB types, quantization parameters, and motion vectors.

Type B: Intra partition.

  • Carries intra CBPs (coded block patterns) and intra coefficients.

Type C: Inter partition.

  • Contains only inter CBPs and inter coefficients.

SLIDE 83

Baseline Profile

Baseline: progressive video, videoconferencing, and wireless.

I and P picture types (not B).
Uses list0 for P-slices.
In-loop deblocking filter.
1/4-sample motion compensation.
Tree-structured motion segmentation down to 4x4 block size.
VLC-based entropy coding (UVLC and CAVLC).
Some enhanced error resilience features:
Flexible macro-block ordering/arbitrary slice ordering.
Redundant slices.

SLIDE 84

Main Profile

May use list0 and/or list1 for B-slices. B-slices may use:

One past and one future reference.
Two past references.
Two future references.

slide-85
SLIDE 85

B-Slice Prediction


slide-86
SLIDE 86

Extended Profile


SI and SP slices are specially coded slices that enable (among other things) efficient switching between video streams and efficient random access for video decoders.

A common requirement in a streaming application is for a video decoder to switch between several encoded streams.

SI slices are the same as SP slices, but their prediction is Intra 4x4 from the previously decoded and reconstructed image samples.
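Stream switching can be sketched as the sequence of coded pictures the decoder consumes. The sketch assumes, for illustration, that stream B has an SP frame at the switching point and that the server sends a switching SP slice (labelled "AB<n>" here, a hypothetical name) predicted from stream A's previous frame. Because that slice reconstructs exactly the same picture as B's own SP frame, decoding continues in B without mismatch.

```python
def switching_decode_order(switch_at: int, num_frames: int) -> list[str]:
    """Pictures decoded when moving from stream A to stream B at frame `switch_at`."""
    order = [f"A{i}" for i in range(switch_at)]    # still receiving stream A
    order.append(f"AB{switch_at}")                 # switching SP slice replaces B's SP frame
    order += [f"B{i}" for i in range(switch_at + 1, num_frames)]  # continue in B
    return order


print(switching_decode_order(3, 6))  # ['A0', 'A1', 'A2', 'AB3', 'B4', 'B5']
```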


slide-87
SLIDE 87

Switching


slide-88
SLIDE 88

SP Slice


slide-89
SLIDE 89

Random Access to Video Frames

The decoder can decode A0, then the SP-slice A0-10 (predicted from A0, which reconstructs A10), and then continue decoding from A11.
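The random-access procedure can be sketched under a hypothetical layout: A0 is an I frame, and some frames of stream A are SP frames that each have an extra switching SP slice "A0-p", predicted from A0, which reconstructs A_p exactly. To reach a target frame, the decoder decodes A0, then the nearest preceding such slice, then the remaining frames in order. The function name and labels are illustrative.

```python
def random_access_path(target: int, sp_points) -> list[str]:
    """Pictures to decode in order to display frame `target` of stream A."""
    usable = [p for p in sp_points if 0 < p <= target]
    if not usable:
        return [f"A{i}" for i in range(target + 1)]  # no shortcut: plain sequential decoding
    p = max(usable)                                  # nearest SP frame at or before target
    return ["A0", f"A0-{p}"] + [f"A{i}" for i in range(p + 1, target + 1)]


print(random_access_path(11, [10, 20]))  # ['A0', 'A0-10', 'A11']
```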


slide-90
SLIDE 90

H.264/AVC Extension (FRExt)


The first version of the standard uses:

The 4:2:0 chroma format.

Typically derived by performing an RGB-to-YCbCr color-space transformation.

8-bit sample precision for luma and chroma values.
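The RGB-to-YCbCr step and the 4:2:0 downsampling can be sketched as follows. The ITU-R BT.601 luma weights (0.299, 0.587, 0.114), full 8-bit range, chroma offset of 128 and 2x2 averaging filter are assumptions for illustration: H.264 itself does not specify the color conversion or the downsampling filter, which happen before encoding.

```python
KR, KB = 0.299, 0.114     # ITU-R BT.601 luma weights for R and B
KG = 1.0 - KR - KB        # weight for G (0.587)


def rgb_to_ycbcr(r, g, b):
    """Full-range 8-bit RGB -> (Y, Cb, Cr) with BT.601 weights and offset 128."""
    y = KR * r + KG * g + KB * b
    cb = (b - y) / (2 * (1 - KB)) + 128
    cr = (r - y) / (2 * (1 - KR)) + 128
    return tuple(min(255, max(0, round(v))) for v in (y, cb, cr))


def subsample_420(plane):
    """4:2:0 chroma: one sample per 2x2 block, by rounded averaging
    (plane width and height assumed even)."""
    return [[(plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1] + 2) // 4
             for x in range(0, len(plane[0]), 2)]
            for y in range(0, len(plane), 2)]


print(rgb_to_ycbcr(255, 255, 255))          # (255, 128, 128)
print(subsample_420([[10, 20], [30, 40]]))  # [[25]]
```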


slide-91
SLIDE 91

FRExt Color Space


The FRExt amendment extended the standard to the 4:2:2 and 4:4:4 chroma formats and to sample precisions higher than 8 bits.

In the 4:2:0 chroma format, each macroblock consists of a 16x16 region of luma samples and two corresponding 8x8 chroma sample arrays.

In a macroblock of 4:2:2 chroma format video, the chroma sample arrays are 8x16 in size (8 wide, 16 high); in a macroblock of 4:4:4 chroma format video, they are 16x16 in size.
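These chroma array sizes follow directly from the horizontal and vertical chroma subsampling factors, which the H.264 specification calls SubWidthC and SubHeightC. A short sketch (the dictionary and function names are illustrative):

```python
# (SubWidthC, SubHeightC): horizontal and vertical chroma subsampling factors.
CHROMA_SUBSAMPLING = {"4:2:0": (2, 2), "4:2:2": (2, 1), "4:4:4": (1, 1)}


def macroblock_chroma_size(chroma_format: str, mb_size: int = 16) -> tuple[int, int]:
    """(width, height) of each of the two chroma sample arrays in one macroblock."""
    sub_w, sub_h = CHROMA_SUBSAMPLING[chroma_format]
    return mb_size // sub_w, mb_size // sub_h


for fmt in CHROMA_SUBSAMPLING:
    w, h = macroblock_chroma_size(fmt)
    print(f"{fmt}: 16x16 luma + two {w}x{h} chroma arrays")
# 4:2:0: 16x16 luma + two 8x8 chroma arrays
# 4:2:2: 16x16 luma + two 8x16 chroma arrays
# 4:4:4: 16x16 luma + two 16x16 chroma arrays
```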

FRExt also supports the YCgCo color space (where "Cg" stands for green chroma and "Co" stands for orange chroma), which is much simpler than YCbCr and typically has equal or better coding efficiency.
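One way to see why YCgCo is "much simpler" is its integer lifting form, commonly called YCoCg-R, which needs only additions and shifts and is exactly invertible. The slide does not say which exact variant each FRExt profile uses, so treat this as an illustration of the transform family; function names are illustrative. Y is approximately R/4 + G/2 + B/4, and Cg and Co each need one extra bit of range.

```python
def rgb_to_ycgco_r(r: int, g: int, b: int) -> tuple[int, int, int]:
    """Forward YCoCg-R lifting: adds and shifts only."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_r_to_rgb(y: int, cg: int, co: int) -> tuple[int, int, int]:
    """Inverse: undo the lifting steps in reverse order (lossless)."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b


print(rgb_to_ycgco_r(255, 0, 0))        # (63, -127, 255)
print(ycgco_r_to_rgb(63, -127, 255))    # (255, 0, 0)
```

Python's `>>` floors negative values, and the inverse uses the same shifts as the forward transform, which is what makes the round trip exact.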


slide-92
SLIDE 92

FRExt Color Space


slide-93
SLIDE 93

Scalable Video Coding Extension of H.264/AVC

slide-94
SLIDE 94

History of SVC

Hybrid video coding:

Motion-compensated DPCM + spatial decorrelating transformations.

A mismatch between the encoder and decoder prediction loops leads to the drift problem. Thus, video coding techniques based on the motion-compensated 3-D wavelet transform have been extensively studied.


slide-95
SLIDE 95

History of SVC (Cont.)

MPEG issued a call for proposals for efficient scalable video coding technology in October 2003.

12 of the 14 submitted proposals were based on 3-D wavelet transforms; the other two were extensions of H.264/AVC.


slide-96
SLIDE 96

History of SVC (Cont.)

After six months of extensive study, the scalable extension of H.264/AVC was chosen as the starting point of MPEG’s SVC project in October 2004.

In January 2005, MPEG and VCEG agreed to jointly finalize the SVC project as an amendment of H.264/AVC within the Joint Video Team.


slide-97
SLIDE 97

Types of Scalability in SVC

Temporal Scalability.

Spatial scalability.

Quality scalability:

  • Coarse grain scalability.
  • Fine grain scalability.


slide-98
SLIDE 98

Temporal Scalability

Temporal layers.

MCP is restricted to reference pictures of the same or a lower temporal layer.

Hierarchical B-pictures.

In general, hierarchical prediction structures can be combined with the multiple reference picture concept of H.264/AVC.
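For a dyadic hierarchical B-picture structure, the temporal layer of each picture follows from its position in the GOP: key pictures form layer 0, the middle picture layer 1, the quarter positions layer 2, and so on. A minimal sketch, assuming the GOP size is a power of two and pictures are numbered in display order (the function name is illustrative):

```python
def temporal_layer(poc: int, gop_size: int) -> int:
    """Temporal layer of picture `poc` in a dyadic hierarchical-B structure."""
    levels = gop_size.bit_length() - 1               # log2(gop_size)
    if poc % gop_size == 0:
        return 0                                     # key picture
    trailing_zeros = (poc & -poc).bit_length() - 1   # how many times 2 divides poc
    return levels - trailing_zeros


print([temporal_layer(p, 8) for p in range(9)])  # [0, 3, 2, 3, 1, 3, 2, 3, 0]

# Dropping the highest layer halves the frame rate; the rest still decodes,
# because no remaining picture references a dropped one.
print([p for p in range(17) if temporal_layer(p, 8) <= 2])
# [0, 2, 4, 6, 8, 10, 12, 14, 16]
```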


slide-99
SLIDE 99

Temporal Scalability


slide-100
SLIDE 100

Temporal Scalability

It is possible to arbitrarily adjust the structural delay between encoding and decoding of a picture by restricting MCP from pictures that follow the picture to be predicted in display order (Fig. 1c).

However, the coding efficiency typically decreases.


slide-101
SLIDE 101

Temporal Scalability

To reduce the complexity of RDO, the following strategy is used:

Based on the quantization parameter $QP_0$ for the base layer:

$QP_T = QP_0 + 3 + T$ for $T > 0$, where $T$ is the temporal layer number.
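The QP assignment is a one-line rule per picture. In this sketch the base layer (T = 0) uses QP0 directly and higher layers use QP0 + 3 + T; clipping to H.264's 0 to 51 range for 8-bit video is an added assumption, and the function name is illustrative.

```python
def cascaded_qp(qp0: int, temporal_layer: int) -> int:
    """QP of a picture in the given temporal layer, from the base-layer QP0."""
    if temporal_layer == 0:
        return qp0
    return min(51, qp0 + 3 + temporal_layer)   # clip to the 8-bit H.264 QP range


print([cascaded_qp(28, t) for t in range(4)])  # [28, 32, 33, 34]
```

Pictures in higher layers are referenced by fewer pictures, so coding them more coarsely costs less quality overall.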


slide-102
SLIDE 102

Spatial Scalability


slide-103
SLIDE 103

Spatial Scalability

Inter-layer prediction:

  • Inter-layer intra prediction.
  • Inter-layer motion prediction.
  • Inter-layer residual prediction.


slide-104
SLIDE 104

Inter-Layer Motion Prediction

Macroblock partitioning is obtained by upsampling the partitioning of the co-located 8x8 block in the lower-resolution layer.

Reference picture indices are copied from the co-located base layer blocks, and the associated motion vectors are scaled.

MVs may be refined with quarter-sample accuracy.
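The derivation above can be sketched for dyadic (2:1) spatial scalability: a base-layer partition inside the co-located 8x8 block doubles in size to cover the 16x16 macroblock, the reference index is copied, and the motion vector (in quarter-sample units) is scaled by 2 with an optional quarter-sample refinement added. The MotionInfo class and function name are hypothetical, and non-dyadic ratios are not handled.

```python
from dataclasses import dataclass


@dataclass
class MotionInfo:
    width: int            # partition width in luma samples
    height: int           # partition height in luma samples
    ref_idx: int          # reference picture index
    mv: tuple[int, int]   # motion vector in quarter-sample units


def predict_from_base(base: MotionInfo, ratio: int = 2,
                      refinement: tuple[int, int] = (0, 0)) -> MotionInfo:
    """Enhancement-layer motion derived from the co-located base-layer partition."""
    mvx, mvy = base.mv
    dx, dy = refinement
    return MotionInfo(base.width * ratio, base.height * ratio,   # upsampled partition
                      base.ref_idx,                              # copied reference index
                      (mvx * ratio + dx, mvy * ratio + dy))      # scaled + refined MV


base = MotionInfo(width=8, height=4, ref_idx=1, mv=(-3, 5))
print(predict_from_base(base))
# MotionInfo(width=16, height=8, ref_idx=1, mv=(-6, 10))
print(predict_from_base(base, refinement=(1, 0)).mv)  # (-5, 10)
```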


slide-105
SLIDE 105

Inter-Layer Residual Prediction

The residual signal of the co-located base layer block is upsampled and used as a prediction for the residual signal of the current macroblock.
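A minimal sketch of residual prediction for a 2:1 ratio: the base-layer residual is upsampled and subtracted, and only the difference is coded in the enhancement layer. Sample repetition is used here as a simplification; SVC specifies its own residual upsampling filter. Function names are illustrative.

```python
def upsample2x_nearest(block):
    """Double a block horizontally and vertically by repeating samples
    (a stand-in for the standard's residual upsampling filter)."""
    out = []
    for row in block:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def residual_after_inter_layer_prediction(enh_residual, base_residual):
    """Enhancement residual minus the upsampled base residual (what gets coded)."""
    pred = upsample2x_nearest(base_residual)
    return [[e - p for e, p in zip(er, pr)] for er, pr in zip(enh_residual, pred)]


base_res = [[4, -2], [0, 6]]
enh_res = [[5, 4, -2, -1], [3, 4, -3, -2], [0, 1, 6, 7], [-1, 0, 5, 6]]
print(residual_after_inter_layer_prediction(enh_res, base_res))
# [[1, 0, 0, 1], [-1, 0, -1, 0], [0, 1, 0, 1], [-1, 0, -1, 0]]
```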


slide-106
SLIDE 106

Inter-Layer Intra Prediction

When the co-located 8x8 sub-macroblock in the reference layer is intra-coded, the prediction signal of the enhancement layer MB is obtained by inter-layer intra prediction.

The corresponding reconstructed intra signal of the reference layer is upsampled.


slide-107
SLIDE 107

Quality Scalability

SNR scalability:

  • Coarse grain scalability.
  • Fine grain scalability.

Coarse grain scalability:

It is achieved by using the concepts for spatial scalability. Difference: no upsampling is performed for inter-layer prediction.


slide-108
SLIDE 108

Fine Grain Scalability

Progressive refinement (PR) slices.

Each PR slice represents a refinement of the residual signal with a QP decrease of 6 (i.e., a halving of the quantization step size).

Only a single inverse transform has to be performed for each transform block at the decoder.
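Successive refinement can be sketched on a single transform coefficient. In H.264 the quantization step size doubles every 6 QP steps, so each refinement pass below re-quantizes the remaining error with QP lowered by 6 (half the step size). Levels accumulate in the coefficient domain, which is why one inverse transform at the end suffices. The plain rounding quantizer is a simplification: actual SVC FGS coding of PR slices is considerably more elaborate. Function names are illustrative, and QP is assumed to stay non-negative.

```python
# H.264 quantization step sizes for QP 0..5; Qstep doubles every 6 QP steps.
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp: int) -> float:
    return QSTEP_BASE[qp % 6] * (2 ** (qp // 6))


def refine(coeff: float, qp: int, num_refinements: int) -> list[float]:
    """Reconstructed coefficient after the base quantization and each refinement."""
    recon, history = 0.0, []
    for k in range(num_refinements + 1):
        step = qstep(qp - 6 * k)                 # each pass: QP - 6, half the step
        level = round((coeff - recon) / step)    # quantize the remaining error
        recon += level * step                    # accumulate in the coefficient domain
        history.append(recon)
    return history


print(refine(37.3, 28, 3))  # [32.0, 40.0, 36.0, 38.0]
```

The error shrinks from 5.3 to 2.7, 1.3 and 0.7, roughly halving with each refinement.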


slide-109
SLIDE 109

Fine Grain Scalability

Drift: occurs when the MCP loops at the decoder and encoder are not synchronized (e.g., due to the loss of quality refinement packets).

The highest quality reference available is employed for the MCP.

MCP for key pictures is done by using only the base layer representation of reference pictures.
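The effect of key pictures can be illustrated with a toy model, not SVC's actual behaviour: treat encoder/decoder mismatch as an abstract number per picture, let each picture predict from the previous one so mismatch is inherited, and let every key picture predict from the base-layer representation, which encoder and decoder share, so no mismatch is inherited there. The loss pattern and key-picture interval are hypothetical inputs.

```python
def drift_per_picture(losses, key_interval=None):
    """Toy encoder/decoder mismatch along a chain of pictures.

    losses[n] is the mismatch added at picture n by lost refinement packets.
    If key_interval is set, pictures with n % key_interval == 0 are key
    pictures predicted from the shared base-layer representation.
    """
    result, inherited = [], 0
    for n, loss in enumerate(losses):
        if key_interval is not None and n % key_interval == 0:
            inherited = 0        # key picture: base-layer reference, drift stops here
        current = inherited + loss
        result.append(current)
        inherited = current      # next picture predicts from this (possibly damaged) one
    return result


losses = [0, 1, 0, 1, 0, 1, 0, 1]
print(drift_per_picture(losses))                  # [0, 1, 1, 2, 2, 3, 3, 4]
print(drift_per_picture(losses, key_interval=4))  # [0, 1, 1, 2, 0, 1, 1, 2]
```

Without key pictures the mismatch keeps growing; with them it is bounded by what accumulates within one key-picture interval.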


slide-110
SLIDE 110

Fine Grain Scalability


slide-111
SLIDE 111

Combined Scalability

Temporal, spatial, and SNR scalability can be combined.
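In an SVC bitstream each NAL unit header extension carries dependency_id (D), temporal_id (T) and quality_id (Q), so a combined operating point can be extracted by filtering on all three. This sketch keeps every unit whose identifiers do not exceed the target; real extraction also treats quality levels of lower dependency layers more selectively, and the NalUnit class and function name are hypothetical.

```python
from itertools import product
from typing import NamedTuple


class NalUnit(NamedTuple):
    dependency_id: int  # spatial or coarse-grain quality layer (D)
    temporal_id: int    # temporal layer (T)
    quality_id: int     # quality refinement level (Q)


def extract(nal_units, max_d, max_t, max_q):
    """Keep NAL units whose (D, T, Q) identifiers do not exceed the target point."""
    return [nal for nal in nal_units
            if nal.dependency_id <= max_d
            and nal.temporal_id <= max_t
            and nal.quality_id <= max_q]


# Hypothetical stream: 2 dependency layers x 3 temporal layers x 2 quality levels.
stream = [NalUnit(d, t, q) for d, t, q in product(range(2), range(3), range(2))]
print(len(extract(stream, 0, 1, 0)))  # 2: base layer at a reduced frame rate
print(len(extract(stream, 1, 2, 0)))  # 6: both spatial layers, full frame rate, no refinement
```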


slide-112
SLIDE 112

Performance

Temporal Scalability: Without loss in RD performance.

Spatial Scalability: The bit rate increase relative to non-scalable H.264/AVC can be as low as 10%.

SNR Scalability: A 10% bit rate increase in the best case.


slide-113
SLIDE 113

References

[1] H. Schwarz, D. Marpe, and T. Wiegand, "Overview of the Scalable Video Coding Extension of the H.264/AVC," IEEE Transactions on Circuits and Systems for Video Technology, September 2007.

[2] H. Schwarz, D. Marpe, and T. Wiegand, "Overview of the Scalable H.264/MPEG4-AVC Extension," Heinrich Hertz Institute, Berlin, Germany.


slide-114
SLIDE 114

The End