H.264 Luma Predictor, Maxine Lee and Alex Moore, May 17, 2006

SLIDE 1

Integrated Systems Group Massachusetts Institute of Technology

H.264 Luma Predictor

Maxine Lee, Alex Moore May 17, 2006

SLIDE 2

Why H.264?

  • End-to-end protocol
  • Better compression
  • Designed for efficient encoding
  • ITU standard
  • It’s on your iPod

SLIDE 3

Project Scope

Prediction module of H.264 Encoder

  • Intraframe Prediction
  • Interframe Prediction
  • Transforms
  • Luma only (no color information!)

Why?

  • 85%+ of encoder computation time
  • Rich problem with lots of exploration

SLIDE 4

Intraframe Prediction Motivation

SLIDE 5

Intraframe Prediction Block Diagram

SLIDE 6

Interframe Prediction

SLIDE 7

Intra-Frame Prediction

Use spatial similarities to compress each frame

  • Use neighboring pixels to make a prediction on a block
  • Transmit the difference between actual and predicted
  • Tradeoff: prediction accuracy vs. # control bits

H.264 Answer: 4x4 and 16x16 prediction!

[Figure labels: homogeneous region; huge gradient]

SLIDE 8

Intra – 4x4 Prediction

  • 9 prediction modes
  • Prediction proceeds left to right, top to bottom
  • When not all boundary pixels are available (i.e. we’re at the border of the picture), can’t predict with all the modes

[Figure labels: current pixels; previously predicted and reconstructed blocks]

SLIDE 9

Intra - 16x16 Prediction

  • Mode 0: Vertical
  • Mode 1: Horizontal
  • Mode 2: DC (average)
  • Mode 3: Plane
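The first three modes listed here can be sketched in a few lines. This is a minimal illustration, not the presenters' hardware design: the function names `predict_16x16` and `residual` are hypothetical, the macroblock is assumed to have both its top and left neighbors available, and Mode 3 (Plane) is left out because it needs a gradient fit that the slide does not describe.

```python
def predict_16x16(mode, top, left):
    """Return a 16x16 prediction (list of rows) from neighboring pixels.

    top  -- the 16 reconstructed pixels directly above the macroblock
    left -- the 16 reconstructed pixels directly to its left
    """
    n = 16
    if mode == 0:    # Vertical: copy the row above into every row
        return [list(top) for _ in range(n)]
    if mode == 1:    # Horizontal: copy the left column into every column
        return [[left[y]] * n for y in range(n)]
    if mode == 2:    # DC: every pixel is the rounded average of all 32 neighbors
        dc = (sum(top) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError("mode 3 (Plane) is not covered by this sketch")


def residual(actual, predicted):
    """Difference block (actual minus predicted) that gets transmitted."""
    return [[a - p for a, p in zip(row_a, row_p)]
            for row_a, row_p in zip(actual, predicted)]
```

A smooth block whose rows all equal the row above it would give an all-zero residual under Mode 0, which is the case where 16x16 prediction pays off.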

SLIDE 10

Advantages/Disadvantages

Encoder’s job to compare options and pick the best
  • Exhaustive search …
  • Uses a cost function to compare different modes

Intra 4x4
  • Lots of options
  • Good for detailed areas
  • 9 modes = 4 bits for every 16 pixels (!)

Intra 16x16
  • Good for smooth areas
  • 4 modes = 2 bits
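The overhead comparison can be worked through as simple arithmetic. The sketch below assumes a plain fixed-length code for the mode index (an assumption for illustration; a real encoder can spend fewer bits when the most probable mode is chosen), and the function name `mode_bits` is hypothetical.

```python
def mode_bits(num_modes):
    """Bits needed to index num_modes choices with a fixed-length code."""
    return (num_modes - 1).bit_length()


# Intra 4x4: one mode per 4x4 block, 16 blocks per 16x16 macroblock.
bits_per_mb_4x4 = mode_bits(9) * 16

# Intra 16x16: one mode for the whole macroblock.
bits_per_mb_16x16 = mode_bits(4)

print(bits_per_mb_4x4, bits_per_mb_16x16)
```

Under this assumption a macroblock coded with Intra 4x4 carries 64 bits of mode signalling against 2 bits for Intra 16x16, which is why the finer prediction is only worth it in detailed areas.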

SLIDE 11

Block Diagram (Baseline)

[Block diagram. Blocks: Input video; Picture Parsing; Config; Get 4x4 Prediction Residual; Compute 4x4 Cost; Get 16x16 Prediction Residual; Compute 16x16 Cost; Choose Prediction Mode; DCT; Quant; IQuant; IDCT; Output (to entropy encoder). QP appears as an input in two places. 4x4 path: initialize prediction variables, loop through 16 4x4 blocks, try all 9 modes. 16x16 path: try all 4 modes. Final step: get best mode, send to output.]

SLIDE 12

Intra – 16x16 Considerations

Process
  • Loop through the available*** modes
  • Generate the prediction
  • Compute cost of residual
  • Cost ~ SAD (sum of absolute differences)

***What’s available?
  • Depends on location in the frame!

[Figure labels: Get 16x16 Prediction Residual; Compute 16x16 Cost; try all 4 modes; frame regions marked "All modes possible" and "Only DC possible"]
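The two ideas on this slide, SAD as the cost and mode availability depending on position, can be sketched as follows. The function names are hypothetical. The availability rule is the usual one (Vertical needs the macroblock above, Horizontal needs the one to the left, Plane needs both, DC always works) and it ignores slice boundaries, which is a simplifying assumption.

```python
def sad(actual, predicted):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - p)
               for row_a, row_p in zip(actual, predicted)
               for a, p in zip(row_a, row_p))


def available_modes_16x16(mb_x, mb_y):
    """Modes usable for the macroblock at column mb_x, row mb_y of the frame."""
    has_left = mb_x > 0
    has_top = mb_y > 0
    modes = [2]                      # DC is always possible
    if has_top:
        modes.append(0)              # Vertical needs the row above
    if has_left:
        modes.append(1)              # Horizontal needs the column to the left
    if has_top and has_left:
        modes.append(3)              # Plane needs both
    return sorted(modes)
```

For the top-left macroblock this returns only DC, and for any macroblock away from the top and left edges it returns all four modes, matching the two regions labeled in the figure.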

SLIDE 13

Intra – 4x4 Considerations

Process:
  • Loop through all 16 blocks
  • For each block, loop through available modes
  • Get ***cost = SAD + 4*P*λ(QP)
  • Pick best mode, send to DCT
  • Save reconstructed 4x4 block, so you can use it to predict the next 4x4 block

***Cost:
  • f(QP), since overhead bits hurt more with higher compression
  • P: most probable mode

[Figure labels: Get 4x4 Prediction Residual; Compute 4x4 Cost (QP); try all 9 modes; loop through 16 4x4 blocks; "Overhead!!!"; neighboring blocks A and B]
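The cost formula above can be sketched as a small mode-selection routine. Two things here are assumptions, not taken from the slide: P is treated as 0 when the candidate is the most probable mode and 1 otherwise (a common reading of this formula), and λ(QP) is passed in as a plain number `lam` because the slide does not give the lookup table. All function names are hypothetical.

```python
def sad_4x4(actual, predicted):
    """Sum of absolute differences over a 4x4 block."""
    return sum(abs(a - p)
               for row_a, row_p in zip(actual, predicted)
               for a, p in zip(row_a, row_p))


def cost_4x4(actual, predicted, mode, most_probable_mode, lam):
    """cost = SAD + 4*P*lambda, with P = 0 for the most probable mode (assumed)."""
    p = 0 if mode == most_probable_mode else 1
    return sad_4x4(actual, predicted) + 4 * p * lam


def best_mode_4x4(actual, candidates, most_probable_mode, lam):
    """candidates maps each available mode number to its 4x4 prediction."""
    return min(sorted(candidates),
               key=lambda m: cost_4x4(actual, candidates[m], m,
                                      most_probable_mode, lam))
```

With a large `lam` (strong compression) a slightly worse prediction in the most probable mode can win, because it avoids the signalling overhead. With `lam` set to 0 the choice falls back to pure SAD.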

SLIDE 14

Extra Concerns with Intra 4x4

Which boundary pixels do you use?
  • Boundary depends on where in the picture you are AND which 4x4 block you’re working on

[Figure labels: only left boundary available, and in another macroblock; upper right pixels not available (can extrapolate)]
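The dependence on both picture position and block position can be sketched as a lookup. This is an illustrative assumption about how one might organize the check, with a hypothetical function name; it covers only the left and top boundaries and leaves out the upper-right pixels, whose availability also depends on the order in which blocks are processed.

```python
def neighbors_4x4(mb_x, mb_y, bx, by):
    """Boundary availability for a 4x4 block.

    mb_x, mb_y -- macroblock column and row within the frame
    bx, by     -- 4x4 block column and row within the macroblock (0..3)
    """
    left_inside = bx > 0          # left neighbor lies in the same macroblock
    top_inside = by > 0           # top neighbor lies in the same macroblock
    return {
        "left": left_inside or mb_x > 0,
        "top": top_inside or mb_y > 0,
        "left_from_other_mb": (not left_inside) and mb_x > 0,
        "top_from_other_mb": (not top_inside) and mb_y > 0,
    }
```

For example, `neighbors_4x4(1, 0, 0, 0)` describes the first case in the figure: only the left boundary is available, and it comes from another macroblock.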

SLIDE 15

Storing Boundary Pixels

  • To predict current macroblock, need pixels from FOUR neighbors (A-D)
  • D can be stored in a register, since it is immediately used
  • Pixels for previous row (A-C) have to be stored in a register file
  • Also save A in register to limit regfile reads to 2

[Figure: current macroblock with neighboring macroblocks labeled A, B, C and D]

SLIDE 16

Synthesis Numbers

Note: not P+R; not enough RAM / hard disk (ask us tomorrow if you’re really curious about P+R numbers)

  • Total Area = 609,940 um^2
  • Clock Cycle = 7.27 ns (quant multiplications)

Area breakdown:
  • 66% Predictor
  • 15% Quant (with QP lookup tables)
  • 10% DCT/IDCT
  • 9% Misc.
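For a rough sense of scale, the synthesis numbers can be converted into a clock frequency and per-block areas. This is plain arithmetic on the figures quoted on the slide; the variable names are hypothetical, and the per-block areas are only as accurate as the rounded percentages.

```python
total_area_um2 = 609_940
clock_ns = 7.27
shares_percent = {"Predictor": 66, "Quant": 15, "DCT/IDCT": 10, "Misc.": 9}

# Clock frequency in MHz: 1 / period, with the period given in nanoseconds.
clock_mhz = 1000.0 / clock_ns

# Approximate area per block, in um^2.
area_um2 = {name: total_area_um2 * pct / 100
            for name, pct in shares_percent.items()}

print(f"{clock_mhz:.1f} MHz")
for name, area in area_um2.items():
    print(f"{name}: ~{area:,.0f} um^2")
```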

SLIDE 17

Only Three Regions of Change

SLIDE 18

Interframe Prediction

  • Use previous frame(s) to predict macroblocks of current frame
  • Most of the time, majority of frame isn’t moving
  • If change within macroblock is sufficiently small, just reproduce it exactly!

SLIDE 19

Interframe Prediction

SLIDE 20

Interframe Prediction

SLIDE 21

Interprediction Algorithm

  • Use a motion vector to predict the current macroblock.
  • Start at (0,0), i.e. the same block, and calculate error for each motion vector
  • Full-Search algorithm. Try all possible motion vectors within a window
  • Final prediction will be block given by motion vector with minimum error
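The full-search procedure described here can be sketched directly. This is a software illustration under stated assumptions, not the presenters' circuit: frames are lists of pixel rows, the error measure is SAD, the window covers offsets from -window to window-1 in each direction, and candidates that fall outside the reference frame are skipped. The function names are hypothetical.

```python
def block_sad(cur, ref, cx, cy, rx, ry, n):
    """SAD between the n x n block of cur at (cx, cy) and of ref at (rx, ry)."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))


def full_search(cur, ref, cx, cy, n=16, window=16):
    """Try every motion vector in the window; return (best_mv, best_error)."""
    height, width = len(ref), len(ref[0])
    # Start at (0, 0): the block at the same position in the previous frame.
    best_mv, best_err = (0, 0), block_sad(cur, ref, cx, cy, cx, cy, n)
    for dy in range(-window, window):
        for dx in range(-window, window):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue  # candidate block would leave the reference frame
            err = block_sad(cur, ref, cx, cy, rx, ry, n)
            if err < best_err:
                best_mv, best_err = (dx, dy), err
    return best_mv, best_err
```

The two nested loops visit (2 * window)^2 candidates, each costing n * n absolute differences, so the work grows quickly with the window size.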

SLIDE 22

Interprediction Algorithm

SLIDE 23

Interprediction Algorithm

SLIDE 24

Interprediction Algorithm

SLIDE 25

Interprediction Algorithm

SLIDE 26

Problem…

  • Assume a window size of 16 (conservative)
  • 1024 possible motion vectors to check per macroblock (vs. 9 for intra)
  • 307200 possible motion vectors per frame!
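These counts can be reproduced with a short calculation. The slide does not state the frame size. The 320x240 frame used below is an assumption, chosen because it yields 300 macroblocks and therefore matches the quoted per-frame total. The function names are hypothetical.

```python
def vectors_per_macroblock(window):
    """Candidates when offsets range over 2*window values in x and in y."""
    return (2 * window) ** 2


def vectors_per_frame(window, frame_w, frame_h, mb_size=16):
    macroblocks = (frame_w // mb_size) * (frame_h // mb_size)
    return vectors_per_macroblock(window) * macroblocks


print(vectors_per_macroblock(16))        # 1024
print(vectors_per_frame(16, 320, 240))   # 307200 (assumed 320x240 frame)
```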

SLIDE 27

Solution

  • A better algorithm! Assume motion estimation gets better as we get closer to ideal motion vector.
  • Diamond-shaped algorithm reduces points checked by ~80%, with mean error per pixel about 3 (vs. about 2 for FS).
  • Hexagonal algorithm reduces by another ~35% (3.2 mean error vs. 3.0)
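A pattern search of this kind can be sketched as follows. The slide's figure is not available, so the pattern is an assumption: the commonly used hexagon-based search, which walks a large six-point hexagon until the center is the best point and then refines once with a small four-point pattern. The function name and the `error` callback (which returns the matching error for a motion vector) are hypothetical.

```python
LARGE_HEXAGON = [(2, 0), (-2, 0), (1, 2), (1, -2), (-1, 2), (-1, -2)]
SMALL_PATTERN = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def hexagon_search(error, window=16):
    """Return (best_mv, best_error, points_checked) for error(dx, dy)."""
    cache = {}

    def cost(mv):
        if mv not in cache:          # each point is evaluated at most once
            cache[mv] = error(*mv)
        return cache[mv]

    def in_window(mv):
        return -window <= mv[0] < window and -window <= mv[1] < window

    def best_around(center, pattern):
        points = [center] + [(center[0] + dx, center[1] + dy)
                             for dx, dy in pattern]
        # The center is listed first, so ties keep the current center.
        return min((p for p in points if in_window(p)), key=cost)

    center = (0, 0)
    while True:                      # coarse stage: move the large hexagon
        best = best_around(center, LARGE_HEXAGON)
        if best == center:
            break
        center = best
    center = best_around(center, SMALL_PATTERN)   # fine stage: one refinement
    return center, cost(center), len(cache)
```

The search relies on the assumption stated above, that the error shrinks as the candidate approaches the ideal motion vector. On an error surface with several local minima it can stop early, which is consistent with the higher mean error quoted relative to full search.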

SLIDE 28

Hexagonal Algorithm

SLIDE 29

Circuit Implementation

[Block diagram. Blocks: Network Layer; Frame Buffer; Predict; Residual and Cost; Transforms; Control]

SLIDE 30

Results…

Results? What Results?
  • H.264 predictor ~40x size of SMIPS processor
  • Frame buffer adds ~18000 area (+4%)
  • But we’re cheating (64x48 video size)
  • Interprediction block adds ~35000 area (+7%)
  • Performance evaluation TBA
