
Integrated Systems Group Massachusetts Institute of Technology

H.264 Luma Predictor

Maxine Lee, Alex Moore May 17, 2006


Why H.264?

  • End-to-end protocol
  • Better compression
  • Designed for efficient encoding
  • ITU standard
  • It’s on your iPod

Project Scope

Prediction module of H.264 Encoder

  • Intraframe Prediction
  • Interframe Prediction
  • Transforms
  • Luma only (no color information!)

Why?

  • 85%+ of encoder computation time
  • Rich problem with lots of exploration


Intraframe Prediction Motivation


Intraframe Prediction Block Diagram


Interframe Prediction


Intra-Frame Prediction

Use spatial similarities to compress each frame

  • Use neighboring pixels to make a prediction on a block
  • Transmit the difference between actual and predicted
  • Tradeoff: prediction accuracy vs. number of control bits

H.264’s answer: 4x4 and 16x16 prediction!

[Figure labels: homogeneous region; huge gradient]


Intra – 4x4 Prediction

  • 9 prediction modes
  • Prediction proceeds left to right, top to bottom
  • When not all boundary pixels are available (i.e. we’re at the border of the picture), can’t predict with all the modes

[Figure labels: current pixels; previously predicted and reconstructed blocks]


Intra - 16x16 Prediction

  • Mode 0: Vertical
  • Mode 1: Horizontal
  • Mode 2: DC (average)
  • Mode 3: Plane
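
The first three 16x16 modes can be sketched in a few lines. This is a minimal illustration, not the presenters' hardware design: it assumes both the row above (`top`) and the column to the left (`left`) are available, uses plain Python lists, and leaves out Mode 3 (Plane). The function names are made up for this example.

```python
def predict_16x16(mode, top, left):
    """Return a 16x16 luma prediction as a list of 16 rows.

    top  : 16 reconstructed pixels directly above the macroblock
    left : 16 reconstructed pixels directly to its left
    """
    if mode == 0:  # Vertical: every row repeats the row above
        return [list(top) for _ in range(16)]
    if mode == 1:  # Horizontal: every column repeats the left column
        return [[left[y]] * 16 for y in range(16)]
    if mode == 2:  # DC: rounded average of all 32 neighboring pixels
        dc = (sum(top) + sum(left) + 16) >> 5
        return [[dc] * 16 for _ in range(16)]
    raise NotImplementedError("Mode 3 (Plane) is omitted from this sketch")


def residual(actual, predicted):
    """Difference between actual and predicted pixels (what gets transmitted)."""
    return [[a - p for a, p in zip(row_a, row_p)]
            for row_a, row_p in zip(actual, predicted)]
```

The encoder would compute `residual(block, predict_16x16(mode, top, left))` for each candidate mode and compare the costs.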


Advantages/Disadvantages

Encoder’s job to compare options and pick the best

  • Exhaustive search …
  • Uses a cost function to compare different modes

Intra 4x4

  • Good for detailed areas
  • Lots of options
  • 9 modes = 4 bits for every 16 pixels (!)

Intra 16x16

  • Good for smooth areas
  • 4 modes = 2 bits
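
The mode-signalling overhead can be worked out per pixel. The sketch below assumes a fixed-length code for the mode, which is how the slide counts bits; the real bitstream predicts 4x4 modes from a most probable mode, so actual costs vary. The function name is made up for this example.

```python
import math


def mode_overhead_bits_per_pixel(num_modes, block_width):
    """Bits spent naming the mode, divided by the pixels the mode covers."""
    bits = math.ceil(math.log2(num_modes))  # fixed-length code (assumption)
    return bits / (block_width * block_width)


intra_4x4 = mode_overhead_bits_per_pixel(9, 4)     # 4 bits over 16 pixels
intra_16x16 = mode_overhead_bits_per_pixel(4, 16)  # 2 bits over 256 pixels
print(intra_4x4, intra_16x16, intra_4x4 / intra_16x16)
```

Under that assumption, Intra 4x4 spends 0.25 bits per pixel on mode selection, 32 times as much as Intra 16x16.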


Block Diagram (Baseline)

[Block diagram. Blocks: Input video; Picture Parsing; Config; Initialize prediction variables; 16x16 path (Try all 4 modes, Get 16x16 Prediction Residual, Compute 16x16 Cost); 4x4 path (Loop through 16 4x4 blocks, Try all 9 modes, Get 4x4 Prediction Residual, Compute 4x4 Cost); Choose Prediction Mode (Get best mode, send to output); DCT; Quant; IQuant; IDCT; QP inputs; Output (to entropy encoder)]


Intra – 16x16 Considerations

Process

  • Loop through the available*** modes
  • Generate the prediction
  • Compute cost of residual

Cost ≈ SAD (sum of absolute differences)
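
As a minimal sketch, SAD over two equally sized blocks (lists of rows) looks like this; the function name is chosen for this example only.

```python
def sad(actual, predicted):
    """Sum of absolute differences between two equally sized pixel blocks."""
    return sum(abs(a - p)
               for row_a, row_p in zip(actual, predicted)
               for a, p in zip(row_a, row_p))
```

A lower SAD means a smaller residual, so the mode with the lowest SAD is the cheapest to encode by this measure.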

***What’s available?

Depends on location in the frame!

[Figure: 16x16 path of the block diagram (Try all 4 modes, Get 16x16 Prediction Residual, Compute 16x16 Cost); frame positions labeled "All modes possible" and "Only DC possible"]
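
Which modes are available can be sketched from the macroblock's position. This assumes availability is decided only by the frame border (slices are ignored), and that Vertical needs the row above, Horizontal needs the left column, Plane needs both, and DC always works. The names are made up for this example.

```python
VERTICAL, HORIZONTAL, DC, PLANE = 0, 1, 2, 3


def available_16x16_modes(mb_x, mb_y):
    """Modes usable for the macroblock at column mb_x, row mb_y of the frame."""
    has_left = mb_x > 0  # a macroblock exists to the left
    has_top = mb_y > 0   # a macroblock exists above
    modes = [DC]         # DC has a fallback for missing neighbors
    if has_top:
        modes.append(VERTICAL)
    if has_left:
        modes.append(HORIZONTAL)
    if has_top and has_left:
        modes.append(PLANE)
    return sorted(modes)
```

For the top-left macroblock, `available_16x16_modes(0, 0)` leaves only DC, which matches the "Only DC possible" case on the slide.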


Intra – 4x4 Considerations

Process:

  • Loop through all 16 blocks
  • For each block, loop through available modes
  • Get ***cost = SAD + 4*P*λ(QP)
  • Pick best mode and send to DCT
  • Save reconstructed 4x4 block, so you can use it to predict the next 4x4 block

Cost:

  • f(QP), since overhead bits hurt more with higher compression
  • P: most probable mode
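
A hedged sketch of the 4x4 cost follows. Two things are assumptions here: P is read as 0 when the candidate is the most probable mode and 1 otherwise, and λ(QP) is passed in as a function because the slides do not define it. The most probable mode is taken as the smaller of the neighbors' modes, falling back to DC when a neighbor is missing, which is the H.264 rule as we understand it. All names are made up for this example.

```python
DC_MODE = 2


def most_probable_mode(mode_a, mode_b):
    """Smaller of the left (A) and top (B) block modes; DC if either is missing."""
    if mode_a is None or mode_b is None:
        return DC_MODE
    return min(mode_a, mode_b)


def cost_4x4(sad_value, mode, mpm, qp, lam_of_qp):
    """cost = SAD + 4 * P * lambda(QP), with P = 0 for the most probable mode."""
    p = 0 if mode == mpm else 1
    return sad_value + 4 * p * lam_of_qp(qp)


def pick_best_mode(sads_by_mode, mpm, qp, lam_of_qp):
    """Return the mode with the lowest cost among the available candidates."""
    return min(sads_by_mode,
               key=lambda m: cost_4x4(sads_by_mode[m], m, mpm, qp, lam_of_qp))


# Hypothetical lambda table, for illustration only.
example_lambda = lambda qp: qp // 4
sads = {0: 40, 2: 45}  # SAD per candidate mode (made-up numbers)
print(pick_best_mode(sads, mpm=2, qp=4, lam_of_qp=example_lambda))
print(pick_best_mode(sads, mpm=2, qp=28, lam_of_qp=example_lambda))
```

With these made-up numbers, the slightly better-fitting mode 0 wins at low QP, but at high QP the overhead term makes the most probable mode cheaper overall.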

[Figure: 4x4 path of the block diagram (Loop through 16 4x4 blocks, Try all 9 modes, Get 4x4 Prediction Residual, Compute 4x4 Cost, QP), annotated "Overhead!!!", with neighboring blocks labeled A and B]


Extra Concerns with Intra 4x4

Which boundary pixels do you use?

Boundary depends on where in the picture you are AND which 4x4 block you’re working on

[Figure labels: only left boundary available, and in another macroblock; upper right pixels not available (can extrapolate)]


Storing Boundary Pixels

  • To predict current macroblock, need pixels from FOUR neighbors (A-D)
  • D can be stored in a register, since it is immediately used
  • Pixels for previous row (A-C) have to be stored in a register file
  • Also save A in register to limit regfile reads to 2

[Figure: neighboring macroblocks labeled B, A, C, D]


Synthesis Numbers

Note: not P+R – not enough RAM / hard disk (ask us tomorrow if you’re really curious about P+R numbers)

Total Area = 609,940 um^2
Clock Cycle = 7.27 ns (quant multiplications)

Breakdown:

  • 66% Predictor
  • 15% Quant (with QP lookup tables)
  • 10% DCT/IDCT
  • 9% Misc.


Only Three Regions of Change


Interframe Prediction

Use previous frame(s) to predict macroblocks of the current frame

Most of the time, the majority of the frame isn’t moving

If the change within a macroblock is sufficiently small, just reproduce it exactly!
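
The "sufficiently small" test could be sketched as below. The slides do not say how change is measured or what the cutoff is, so both the SAD measure and the `threshold` parameter are assumptions for illustration.

```python
def can_copy_macroblock(current, previous, threshold):
    """True if the co-located block in the previous frame is close enough to reuse."""
    change = sum(abs(c - p)
                 for row_c, row_p in zip(current, previous)
                 for c, p in zip(row_c, row_p))
    return change <= threshold
```

When this returns True, the encoder can reproduce the previous block as is instead of searching for a motion vector.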


Interprediction Algorithm

  • Use a motion vector to predict the current macroblock.
  • Start at (0,0) (same block) and calculate error for each motion vector
  • Full-Search algorithm: try all possible motion vectors within a window
  • Final prediction will be the block given by the motion vector with minimum error
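
A full search can be sketched as follows. The assumptions are ours: error is SAD, frames are lists of pixel rows, a window of 16 means offsets from -16 to 15 in each direction, and candidates that fall outside the reference frame are skipped. The function names are made up for this example.

```python
def block_sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the current block at (bx, by) and the reference block shifted by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n=16, window=16):
    """Try every motion vector in the window; return (best_vector, best_error)."""
    height, width = len(ref), len(ref[0])
    # Start at (0, 0): the same block position in the previous frame.
    best_mv, best_err = (0, 0), block_sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-window, window):
        for dx in range(-window, window):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
                continue  # candidate block is not fully inside the reference frame
            err = block_sad(cur, ref, bx, by, dx, dy, n)
            if err < best_err:
                best_mv, best_err = (dx, dy), err
    return best_mv, best_err
```

The strict `<` comparison keeps (0,0) on ties, so a macroblock that has not moved gets the zero vector.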


Problem…

  • Assume a window size of 16 (conservative)
  • 1024 possible motion vectors to check per macroblock (vs. 9 for intra)
  • 307200 possible motion vectors per frame!
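
These counts can be reproduced with a little arithmetic. The sketch assumes a window of 16 means 32 offsets per axis, and it uses a 320x240 frame, which is not stated on the slides: it is one frame size that gives the 300 macroblocks implied by 307200 / 1024.

```python
def motion_vectors_per_macroblock(window):
    """Candidate offsets in a square window of +/- window pixels (2*window per axis)."""
    return (2 * window) ** 2


def motion_vectors_per_frame(width, height, window, mb_size=16):
    """Candidates for every macroblock in a frame of the given size."""
    macroblocks = (width // mb_size) * (height // mb_size)
    return macroblocks * motion_vectors_per_macroblock(window)


print(motion_vectors_per_macroblock(16))       # candidates per macroblock
print(motion_vectors_per_frame(320, 240, 16))  # candidates per frame (assumed size)
```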


Solution

  • A better algorithm! Assume motion estimation gets better as we get closer to the ideal motion vector.
  • Diamond-shaped algorithm reduces points checked by ~80%, with a mean error per pixel of about 3 (vs. about 2 for FS, the full search).
  • Hexagonal algorithm reduces points checked by another ~35% (3.2 mean error vs. 3.0).
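
A generic hexagon-based search can be sketched as below; it is not necessarily the variant the presenters built. It assumes the matching error is supplied as a function `error(dx, dy)`, walks a six-point hexagon toward lower error until the center is best, then refines with the four nearest neighbors. The pattern offsets and names are choices made for this example.

```python
LARGE_HEXAGON = [(-2, 0), (2, 0), (-1, -2), (1, -2), (-1, 2), (1, 2)]
SMALL_PATTERN = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def hexagon_search(error, window=16):
    """Return (best_vector, best_error, points_checked) for error(dx, dy)."""
    cache = {}  # each candidate is evaluated at most once

    def cost(mv):
        if mv not in cache:
            cache[mv] = error(mv[0], mv[1])
        return cache[mv]

    def in_window(mv):
        return -window <= mv[0] < window and -window <= mv[1] < window

    center = (0, 0)
    while True:
        ring = [(center[0] + dx, center[1] + dy) for dx, dy in LARGE_HEXAGON]
        ring = [mv for mv in ring if in_window(mv)]
        best = min(ring, key=cost, default=center)
        if cost(best) >= cost(center):
            break          # no hexagon point improves on the center
        center = best      # move the hexagon and repeat

    base = center          # final refinement around the last center
    for dx, dy in SMALL_PATTERN:
        mv = (base[0] + dx, base[1] + dy)
        if in_window(mv) and cost(mv) < cost(center):
            center = mv
    return center, cost(center), len(cache)
```

The search relies on the assumption that error shrinks as the candidate approaches the ideal vector. On real video it can stop at a local minimum, which fits the slightly higher mean error quoted for the faster algorithms.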


Hexagonal Algorithm


Circuit Implementation

[Block diagram. Blocks: Residual and Cost; Frame Buffer; Predict; Control; Transforms; Network Layer]


Results…

Results? What Results?

  • H.264 predictor ~40x size of SMIPS processor
  • Frame buffer adds ~18000 area (+4%)
    But we’re cheating (64x48 video size)
  • Interprediction block adds ~35000 area (+7%)
  • Performance evaluation TBA
