Design Challenge of a QuadHDTV Video Decoder
Youn-Long Lin Department of Computer Science National Tsing Hua University
MPSOC2007, Japan
More Pixels
NHK Proposes UHD TV Broadcast: Super HiVision, 7680x4320 pixels at 60
YLLIN NTHU-CS 2
SDTV
1920x1080 – HDTV
3840x2160 – QFHD TV
7680x4320 – UHD TV
H.264 50% 69%
Standard           H.264                                  MPEG-4            MPEG-2         MPEG-1
MB size            16*16                                  16*16             16*16 (frame)  16*16
Block size         16*16, 16*8, 8*16, 8*8, 8*4, 4*8, 4*4  16*16, 8*8        8*8            8*8
Transform          4*4 int transform                      DCT / Wavelet     DCT            DCT
Entropy coding     VLC, CAVLC and CABAC                   VLC               VLC            VLC
ME, MC             41 MVs per MB                          Yes               Yes            Yes
Pixel accuracy     ¼ pel                                  ¼ pel             ½ pel          ½ pel
Reference frames   Multiple (5) frames                    One frame         One frame      One frame
Picture type       I, P, B                                I, P, B           I, P, B        I, P, B
Transmission rate  64 kbps ~ 150 Mbps                     64 kbps ~ 2 Mbps  2-15 Mbps      Up to 1.5 Mbps
Video Coding with H.264/AVC: Tools, Performance and Complexity, J. Ostermann et al, IEEE CAS Mag., Q1 2004.
Relative Computational Complexity
Decoding Capability of a 600MHz CPU
Resolution            Size (relative to SQCIF)  Clock Frequency  Application
QFHD (3840 x 2160)    675.0                     249 MHz          Digital signage, Medical video, Satellite image, Space exploration
1080HD (1920 x 1088)  170.0                     62 MHz
720HD (1280 x 720)    75.0                      30 MHz           Home theater
D2 (720 x 480)        28.1                      10 MHz           Car TV, Surveillance
CIF (352 x 288)       8.3                       3 MHz            Mobile TV
QCIF (176 x 144)      2.0                       0.8 MHz
SQCIF (128 x 96)      1.0                       0.4 MHz          Video phone
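The Size figures appear to be each format's pixel count divided by that of SQCIF (128 x 96). The short Python sketch below recomputes that ratio as a sanity check. It assumes 720HD means 1280 x 720, which is what the listed ratio of 75.0 implies; the names `RESOLUTIONS` and `relative_size` are illustrative and not from the slides.

```python
# Frame sizes from the table. 720HD is taken as 1280 x 720 (assumption:
# this is the size that reproduces the listed ratio of 75.0).
RESOLUTIONS = {
    "QFHD": (3840, 2160),
    "1080HD": (1920, 1088),
    "720HD": (1280, 720),
    "D2": (720, 480),
    "CIF": (352, 288),
    "QCIF": (176, 144),
    "SQCIF": (128, 96),
}


def relative_size(name, base="SQCIF"):
    """Return the pixel count of `name` divided by the pixel count of `base`."""
    width, height = RESOLUTIONS[name]
    base_width, base_height = RESOLUTIONS[base]
    return (width * height) / (base_width * base_height)


if __name__ == "__main__":
    for name in RESOLUTIONS:
        print(f"{name:7s} {relative_size(name):8.3f}")
```

The computed ratios agree with the listed Size values to within about 0.1. The clock frequencies grow somewhat more slowly than the pixel count (0.4 MHz to 249 MHz for a 675-fold increase in pixels).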
H.264 Video Decoder (block diagram)
Decoder modules: Parser, CAVLD/CABAD, IQ & IT, MVG, IPRED, INTERP, BSG, DF, MAU & AMBA Interface Translator
System blocks on the AHB: CPU, Display, Memory Controller, Ethernet
Signals between modules: para & predinfo, recon, bs, residual, mv & ridx, coeff, mvdinfo
Chart: Memory Bandwidth (MB/s) and Memory Size (Bytes) for A, B, C, D
Data labels: 19658, 240, 1200, 4977, 124929, 1516, 317, 62
IME block diagram
Memories and registers: CB mem, rf0 mem, rf1 mem, MV mem, CMB reg (x4), rf reg array
Address generators: CB AG, rf AG, MV AG
Other blocks: rf router, comparator (x4), MVGen rf0 (x8)
– Collect the motion vectors of several MBs, and read a location they share with a single operation
– On average, 2 burst initiations per MB (1 for luma, 1 for chroma); a burst is a group of sequential reads (burst read)
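The idea of collecting motion vectors and reading a shared location once can be sketched in Python as below. This is only an illustration under stated assumptions, not the design on the slides: the fixed region size `REGION`, the request tuple layout, and the function name `group_fetches` are all hypothetical, and a reference block that straddles a region boundary (which would need more than one region) is ignored for brevity.

```python
from collections import defaultdict

REGION = 32  # hypothetical reference-region size in pixels (assumption)


def group_fetches(requests):
    """Group per-MB reference requests by the region they fall in.

    requests: list of (mb_index, ref_idx, x, y), where (x, y) is the
    top-left pixel of the reference block after applying the motion vector.
    Returns {region_key: [mb_index, ...]}; each key stands for one read
    that serves every MB listed under it.
    """
    groups = defaultdict(list)
    for mb_index, ref_idx, x, y in requests:
        key = (ref_idx, x // REGION, y // REGION)
        groups[key].append(mb_index)
    return dict(groups)


# Four MBs, two of which point into the same region of reference frame 0.
requests = [
    (0, 0, 3, 5),
    (1, 0, 18, 7),
    (2, 0, 40, 6),
    (3, 1, 2, 2),
]
groups = group_fetches(requests)
```

With these sample requests, MB 0 and MB 1 land in the same region, so three reads serve four MBs.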
Figure: reference data flow from CABAC to Interp
Blocks: CABAC, Motion Vector Generator, Translator, Reference Region & Index Register, Region Analyzer / Searcher, OES manager, Buffer, Interp, MAU Interface
Items shown: macroblocks MB0 to MB10; reference regions R0 to R7
Data labels: R0/R1 Data; R2 Data from SDRAM; R2 Information; MB7 Information; MB7 MV; MB7 Region Information
Figure: IQ and IDCT timing diagram (time axis t)
Rows: IDCT stage 1, coeflag_mem read, coeff_mem read, IQ stage 1, IQ stage 2, residual_mem write, IDCT stage 2
Blocks processed: dc; luma ac_0_1 to luma ac_14_15; chroma ac_0_1 to chroma ac_6_7
Time-axis marks: 122, 140, 144, 161, 195, 212, 219
Figure: five-stage deblocking filter pipeline (Stage 1 to Stage 5)
Pixel labels: L00 to L33, M00 to M33, R00 to R33
Operations: Read Pixels; Strong filter (Bs=4) / Left delta calculation; Left weak filter (Bs<4); Right delta calculation; Right weak filter (Bs<4); Write Pixels
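For readers unfamiliar with the delta term, the following Python sketch shows the p0/q0 update of the H.264 normal (weak) filter used when Bs<4, which is presumably what the delta calculation stages compute. It is a simplified illustration, not the pipeline from the slides: the edge-activity threshold tests, the derivation of the clipping bound `tc`, and the optional p1/q1 updates are omitted, and the function names are illustrative.

```python
def clip3(lo, hi, value):
    """Clamp value to the inclusive range [lo, hi]."""
    return max(lo, min(hi, value))


def weak_filter_p0_q0(p1, p0, q0, q1, tc):
    """Apply the Bs<4 update to the two pixels nearest the edge.

    p1, p0 are the pixels on one side of the edge (p0 adjacent to it),
    q0, q1 the pixels on the other side. tc is the clipping bound, passed
    in directly here (assumption: it was derived elsewhere).
    Returns (delta, new_p0, new_q0) for 8-bit samples.
    """
    delta = clip3(-tc, tc, (((q0 - p0) << 2) + (p1 - q1) + 4) >> 3)
    new_p0 = clip3(0, 255, p0 + delta)
    new_q0 = clip3(0, 255, q0 - delta)
    return delta, new_p0, new_q0


# A small step across the edge: the raw offset is 3, clipped to tc = 2.
result = weak_filter_p0_q0(p1=60, p0=62, q0=70, q1=72, tc=2)
```

The same delta is added on one side of the edge and subtracted on the other, so once it has been calculated both updates are simple additions followed by a clip.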
Figure: decoding schedule over time
Modules: PARSER, CABAD, IQ/IT, IPRED, BSG, DF
Steps: Header information decode; Initial context table and condition offset; MB 0 decode to MB 2 decode
Figure: decoding schedule over time
Modules: PARSER, CABAD, IQ/IT, IPRED, BSG, DF
Steps: Header information decode; Initial context table and condition offset; MB 0 decode to MB 6 decode
Figure: decoding schedule over time
Modules: PARSER, CABAD, IQ/IT, IPRED, BSG, DF
Steps: Header information decode; Initial context table and condition offset; MB 0 decode to MB 8 decode
Scheme             SRAM Usage (KB)  Turnaround Cycle (Cycles/MB)  Processing Cycle (Cycles/MB)
Sequential         2.62             486                           486
Elastic Pipeline   5.6              620                           161
ASAP Ping-Pong     5.6              644                           159
ASAP Cyclic-queue  8.3              540                           140

Test Pattern: “pedestrian”; Resolution: 720*480; QP: 28; GOP: III…; Frame #: 30
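The trade-off in this comparison can be made concrete with a little arithmetic. The Python sketch below assumes the twelve chart values map to the four schemes in legend order (SRAM usage, then turnaround cycles, then processing cycles); that mapping is a reading of the chart labels, not something the slides state explicitly, and the names `SCHEMES`, `speedup`, and `sram_ratio` are illustrative.

```python
# Chart values, read in legend order (assumed mapping of labels to schemes).
SCHEMES = {
    "Sequential": {"sram_kb": 2.62, "turnaround": 486, "processing": 486},
    "Elastic Pipeline": {"sram_kb": 5.6, "turnaround": 620, "processing": 161},
    "ASAP Ping-Pong": {"sram_kb": 5.6, "turnaround": 644, "processing": 159},
    "ASAP Cyclic-queue": {"sram_kb": 8.3, "turnaround": 540, "processing": 140},
}


def speedup(name, base="Sequential"):
    """Throughput gain: processing cycles per MB of `base` over `name`."""
    return SCHEMES[base]["processing"] / SCHEMES[name]["processing"]


def sram_ratio(name, base="Sequential"):
    """SRAM cost: buffer size of `name` relative to `base`."""
    return SCHEMES[name]["sram_kb"] / SCHEMES[base]["sram_kb"]


if __name__ == "__main__":
    for name in SCHEMES:
        print(f"{name:18s} speedup {speedup(name):.2f}x, SRAM {sram_ratio(name):.2f}x")
```

Under that reading, the cyclic-queue scheme processes a macroblock in about 3.5 times fewer cycles than the sequential design while using about 3.2 times the SRAM.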
Easy Bug Tracing
Figure: project directory structure
H264: filelist, fpga_lib, gate_sim, asic_lib, syn, jm11.0, mem, netlist, rtl_sim, tbench, lm_wrap, nlint, vn
Sub IP: mfu, parser, cabad, idct, ipred, interp, df, bsg, main_ctrl, top, amba_wrap, mvg
Sub IP directories: def, rtl, syn, vn, nlint, gate_sim, rtl_sim, filelist, tbench
Also shown: hd_amba; xilinx_mem, altera_mem, artisan_mem
Figure: platform block diagram
Buses: High-Speed Bus, Peripheral Bus, APB Bridge
Processing: CPU, Accelerator (FPGA), JPEG Codec, DMA, VIC
Memory: Static memory, SDRAM Controller (4-CH), SRAM, SDRAM, ROM/Flash Memory
Video: Capture, Video-In CCIR601, Display Controller, TV/LCD
Peripherals: USB 2.0, USB (PHY), PWM, WDT, TIMER, DAI, SSI, SD, SM, UART, GPIO, I2C
External: Daughter Board, FPGA, Audio Codec (I2S), Flash memory with SSI, Flash Card, Button, LED