SLIDE 1

An H.264/AVC Main Profile Video Decoder Accelerator in a Multimedia SOC Platform

Youn-Long Lin

Department of Computer Science National Tsing Hua University Hsin-Chu, TAIWAN 300 ylin@cs.nthu.edu.tw

2006/08/16 MPSOC Colorado, USA

SLIDE 2

YLLIN NTHU-CS 2

Main Points

  • Hardwired design has excellent area, performance, and power advantages
  • If it is to be used by 1B people every day, every bit and every cycle counts
  • It is not difficult
    – 15 CS student-years; the students had no background in video or HDL-based design, and neither did the professor
    – RTL design is the easy part
    – Understanding the algorithm and designing the architecture are most critical

SLIDE 3

Video Coding Standards

Standard           MPEG-1          MPEG-2          MPEG-4          H.264
MB size            16*16           16*16 (frame)   16*16           16*16
Block size         8*8             8*8             16*16, 8*8      16*16, 16*8, 8*16, 8*8, 8*4, 4*8, 4*4
Transform          DCT             DCT             DCT/Wavelet     4*4 int transform
Entropy coding     VLC             VLC             VLC             VLC, CAVLC and CABAC
ME, MC             Yes             Yes             Yes             41 MVs per MB
Pixel accuracy     ½ pel           ½ pel           ¼ pel           ¼ pel
Reference frames   One frame       One frame       One frame       Multiple (5) frames
Picture type       I, P, B         I, P, B         I, P, B         I, P, B
Transmission rate  Up to 1.5 Mbps  2-15 Mbps       64kbps~2Mbps    64kbps~150Mbps

SLIDE 4

Get More for Less

MPEG-2 H.264

SLIDE 5

H.264/AVC Profiles

[Figure: diagram of coding tools grouped by profile (Baseline, Main, Extended, FRExt (High)). Tools shown: I slice, P slice, B slice, CAVLC, CABAC, Weighted prediction, Interlace, Slice group, ASO, Redundant Slice, SP and SI slices, Data partition, 8x8 transform, Quantization matrix, Color Sampling, 8/10/12 bit sampling.]

SLIDE 6

NTHU H.264/AVC Main Profile Video Decoder Prototype

  • Multimedia SOC Platform
  • FPGA @ 10 MHz: Main Profile, CIF (352x288) @ 30 fps
  • FPGA @ 24 MHz: Main Profile, D1 (720x480) @ 30 fps
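
The two operating points above imply a per-macroblock cycle budget, which is a useful way to read them. The following is a minimal sketch of that arithmetic, not something taken from the slides: the function names are hypothetical, and it assumes the decoder processes every 16x16 macroblock of every frame at a steady rate.

```python
import math

MB_SIZE = 16  # an H.264 macroblock covers 16x16 luma samples


def macroblocks_per_frame(width, height):
    # Round up so partially covered macroblocks at the frame edge are counted.
    return math.ceil(width / MB_SIZE) * math.ceil(height / MB_SIZE)


def cycles_per_macroblock(clock_hz, width, height, fps):
    # Average number of clock cycles available to decode one macroblock.
    return clock_hz / (macroblocks_per_frame(width, height) * fps)


cif = cycles_per_macroblock(10_000_000, 352, 288, 30)
d1 = cycles_per_macroblock(24_000_000, 720, 480, 30)
print(f"CIF @ 10 MHz: {macroblocks_per_frame(352, 288)} MB/frame, {cif:.0f} cycles/MB")
print(f"D1  @ 24 MHz: {macroblocks_per_frame(720, 480)} MB/frame, {d1:.0f} cycles/MB")
```

Under these assumptions the budget is about 842 cycles per macroblock for CIF at 10 MHz and about 593 cycles per macroblock for D1 at 24 MHz.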

SLIDE 7

A Multimedia SOC Platform

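[Figure: block diagram of the multimedia SOC platform, organized around a High-Speed Bus and a Peripheral Bus joined by an APB Bridge. Blocks shown: CPU, Accelerator (FPGA), VIC, USB 2.0, Static memory controller, SDRAM Controller (4-CH), JPEG Codec, DMA, SRAM, Capture, Display Controller, PWM, WDT, TIMER, DAI, SSI, SD, SM, UART, GPIO, I2C. External parts shown: USB (PHY), Daughter Board, ROM/Flash Memory, SRAM, SDRAM, Audio Codec (I2S), Flash memory with SSI, Flash Card, Button, LED, Video-In (CCIR601), TV/LCD, FPGA.]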

SLIDE 8

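H.264/AVC Decoder System Diagram

[Figure: system diagram with two AHB buses (AHB1, AHB2) and an ARM926EJS CPU. Blocks shown with their master/slave port labels: H264 (Master, Slave), TV (Master), Timer (Slave), SD Card (Slave), UART (Slave), LM, SDC (Slave), SDRAM.]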

SLIDE 9

H.264/AVC Decoder Architecture

[Figure: decoder block diagram. Processing blocks: Parser, CAVLD, CABAD, IQ/IDCT, MC, Intra pred, Pic Rec, DF. Internal SRAMs: MBinfo, Coeff, Residual, Pred, reconstruct, unfilter, MV, Ref idx, DF Para. Outside the decoder: SDRAM (Input/Ref./Display Frame), AHB, Display, Storage Device, CPU, SD Card.]

SLIDE 10

Hierarchical FSM in Main Controller

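[Figure: the main controller contains a Main FSM (Frame Level) and one FSM per module (MB Level): CABAC FSM, MC FSM, IPRED FSM, IQ/IDCT FSM, PICREC FSM, DF FSM. Each FSM controls its module (CABAC, MC, IPRED, IQ/IDCT, PICREC, DF). Other labels: Type decoder, rden, rd_addr, rd_data.]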
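
The two-level control idea on this slide can be illustrated in software. This is only a sketch under stated assumptions: the slide names the module FSMs but not their ordering or handshake, so the strictly sequential order below, the intra/inter choice between IPRED and MC, and all function names are hypothetical. A real hardware design would likely overlap the stages.

```python
from enum import Enum, auto


class Stage(Enum):
    # One entry per module-level FSM named on the slide.
    CABAC = auto()
    MC = auto()
    IPRED = auto()
    IQ_IDCT = auto()
    PICREC = auto()
    DF = auto()


def mb_stage_sequence(is_intra):
    """Assumed order in which the module FSMs run for one macroblock."""
    prediction = Stage.IPRED if is_intra else Stage.MC
    return [Stage.CABAC, prediction, Stage.IQ_IDCT, Stage.PICREC, Stage.DF]


def decode_frame(mb_types):
    """Frame-level FSM: visit each macroblock and run its MB-level sequence.

    mb_types is a hypothetical list of booleans (True means intra-coded).
    """
    trace = []
    for mb_index, is_intra in enumerate(mb_types):
        for stage in mb_stage_sequence(is_intra):
            # In hardware the main FSM would start a sub-FSM and wait for it
            # to finish; here we only record the order of activation.
            trace.append((mb_index, stage.name))
    return trace


for mb_index, stage_name in decode_frame([True, False]):
    print(mb_index, stage_name)
```

The point of the hierarchy is that the frame-level FSM only needs to know when each module starts and finishes, while the detail of each module stays inside its own FSM.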

SLIDE 11

AMBA interface

[Figure: bus interface of the H.264 Decoder. Labels: control register, slave wrapper, MFU, master wrapper 1, master wrapper 2, arbiter 1, arbiter 2, VLC & TV OUT, DF & MC, SDC, LM, AHB A, AHB B.]

SLIDE 12

Our Design Flow

[Figure: design-flow chart. Inputs: User Spec., Platform spec., Performance constraint, Software spec. in C & Acceleration specify. Steps: System description, System configuration, System generation, HW IP Synthesizer, HW/SW co-simulation (Embedded Software Co-Sim, Parameterized ISS, Platform model), Evaluation (Yes/No), Acceleration, Compilation, Pin assignment & Hardware compilation, Integration, System Integrate, FPGA prototyping, FPGA Verify, Area & Timing & Power evaluation. Artifacts: System.h, API, System.v, Accelerator.v, Software image, Hardware image. Libraries: HW lib. (HDL IPs), SW lib. (C models, drivers).]

SLIDE 13

Memory Traffic Consideration

  • One SDRAM for All External Storage
    – Encoded Bitstream
    – Reference Frames
    – Currently Reconstructed Frame
    – Display Buffer
  • SDRAM Burst Mode
  • Internal Storage for Compact Access & Data Reuse
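
To get a feel for why a single shared SDRAM makes traffic a concern, here is a rough lower-bound estimate. It is a sketch, not a figure from the talk: it assumes 4:2:0 sampling at 8 bits per sample, counts only one write of each reconstructed frame plus one read for display, and leaves out reference-frame reads for motion compensation and bitstream reads, which add to the total. The function names are hypothetical.

```python
def frame_bytes_420(width, height):
    # 4:2:0 at 8 bits per sample: one full-size luma plane plus two
    # quarter-size chroma planes, i.e. 1.5 bytes per pixel.
    return width * height * 3 // 2


def min_frame_traffic_bytes_per_s(width, height, fps):
    # Lower bound: each decoded frame is written once and read once for display.
    # Motion-compensation reference reads and bitstream reads are NOT included.
    return 2 * frame_bytes_420(width, height) * fps


for name, (w, h) in {"CIF": (352, 288), "D1": (720, 480)}.items():
    traffic = min_frame_traffic_bytes_per_s(w, h, 30)
    print(f"{name}: {frame_bytes_420(w, h)} bytes/frame, "
          f"at least {traffic / 1e6:.2f} MB/s at 30 fps")
```

Even this floor is roughly 31 MB/s for D1 at 30 fps, which suggests why burst-mode access and on-chip buffering for data reuse are listed alongside the shared SDRAM.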

SLIDE 14

Buffer Size vs Bus Traffic

[Figure: plot of Frame per Sec (axis ticks 10 to 60) against Buffer Size (axis ticks 1 to 21) for four configurations: 16MHz/TV, 16MHz/LCD, 24MHz/TV, 24MHz/LCD.]

SLIDE 15

Performance Comparison

              HW Accelerated   HW Accelerated   DSP Core
Profile       Main             Main             Baseline
Resolution    CIF (352x288)    D1 (720x480)     QCIF (176x144)
Frame Rate    30               30               15
MHz           10               24               200
Gate Count    180K             180K             230K
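
One way to compare these designs on a common scale is macroblock throughput per MHz. The sketch below reads the comparison as three designs (hardware-accelerated CIF at 10 MHz, hardware-accelerated D1 at 24 MHz, and a DSP core decoding QCIF at 15 fps at 200 MHz). That reading, the metric, and the names are assumptions for illustration, and the DSP core runs Baseline rather than Main profile, so the numbers are not a like-for-like benchmark.

```python
import math


def mb_per_second(width, height, fps):
    # Number of 16x16 macroblocks decoded per second.
    return math.ceil(width / 16) * math.ceil(height / 16) * fps


# (label, width, height, frames per second, clock in MHz)
designs = [
    ("HW accelerated, CIF", 352, 288, 30, 10),
    ("HW accelerated, D1", 720, 480, 30, 24),
    ("DSP core, QCIF", 176, 144, 15, 200),
]

for name, w, h, fps, mhz in designs:
    rate = mb_per_second(w, h, fps)
    print(f"{name}: {rate} MB/s, {rate / mhz:.1f} MB/s per MHz")
```

On this metric the hardwired decoder delivers on the order of a thousand or more macroblocks per second per MHz, against single digits for the DSP core.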

SLIDE 16

Summary

  • An H.264/AVC main profile decoder on an ad hoc multimedia SOC platform
  • The hardware-accelerated approach is high-performance and energy-efficient
  • Memory traffic has a major impact on performance
  • It is not as difficult as you may think; algorithm and architecture are critical; writing Verilog is no different from writing C
  • Do not try to parallelize the Reference Software; it is just a proof of concept, not an implementation

SLIDE 17

Demo Video