Super-Resolution System for Ultra High Definition Videos (PowerPoint PPT Presentation)

SLIDE 1

FPGA-based Real-Time Super-Resolution System for Ultra High Definition Videos

Zhuolun He, Hanxian Huang, Ming Jiang, Yuanchao Bai, and Guojie Luo Peking University FCCM 2018

SLIDE 2

Ultra High Definition (UHD) Technology

UHD Television UHD Projector UHD Phone UHD Camera

Content?

  • Limited creators
  • High network bandwidth cost
  • Huge storage cost
SLIDE 3

High-Resolution <---> Low-Resolution

[Diagram: the desired HR image Y undergoes blur, down-sampling, and additive noise n to yield the observed LR image Z; super-resolution recovers Y from Z.]
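The degradation model on this slide can be sketched in plain Python. This is a toy illustration, not the paper's code: the 3x3 box blur, 2x decimation, and Gaussian noise level are all assumptions standing in for the generic blur/down-sampling/noise operators.

```python
import random

def degrade(hr, scale=2, noise_std=2.0):
    """Toy degradation model: blur the HR image, down-sample by `scale`,
    and add Gaussian noise, yielding the observed LR image.
    `hr` is a 2D list of grayscale values; all parameters are illustrative."""
    h, w = len(hr), len(hr[0])
    # 1. Blur: 3x3 box filter, clamped at the borders.
    blurred = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [hr[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), w - 1)]
                    for di in (-1, 0, 1) for dj in (-1, 0, 1)]
            blurred[i][j] = sum(vals) / 9.0
    # 2. Down-sample: keep every `scale`-th pixel in each direction.
    lr = [row[::scale] for row in blurred[::scale]]
    # 3. Noise: additive Gaussian.
    return [[v + random.gauss(0.0, noise_std) for v in row] for row in lr]

hr = [[float((i + j) % 256) for j in range(8)] for i in range(8)]
lr = degrade(hr)
```

Super-resolution is the ill-posed inverse of this forward process, which is why model-based methods must assume or estimate the blur kernel and noise.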

SLIDE 4

Spectrum of Super Resolution Methods

Interpolation

  • Fast
  • Easy to implement
  • Blurry results

Model-based

  • Interpretable
  • High complexity
  • Assumes a known blur kernel/noise

Example-based

  • State-of-the-art quality
  • High complexity
  • Training data needed

(Spectrum from simple to complicated)

SLIDE 5

Model-based Method is also Compute-Intensive

[Diagram: the degradation model from Slide 3, inverted by an iterative solver: Iteration 1, Iteration 2, …]

Model-based methods may not be needed:

  • The iterative computation also has a layered structure
  • We can use a neural network to approximate it
SLIDE 6

Total Variation Distribution

Fact:

Blocks contain DIFFERENT amounts of information (NOT all equally important)

Insight:

Use DIFFERENT upscaling methods for different blocks

SLIDE 7

A Hybrid Algorithm

INPUT: LR Image Z

  1. Crop Z into sub-images {z}
  2. For each sub-image z:
     2.1. IF M(z) > U: y <- Upscale(z)
     2.2. ELSE: y <- CheapUpscale(z)
  3. Mosaic Y with {y}

OUTPUT: HR Image Y

M: Total Variation (TV); U: TV threshold; Upscale: FSRCNN-s; CheapUpscale: interpolation
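The hybrid algorithm can be sketched in software. The block size, the 2x scale factor, and the nearest-neighbour stand-in for both Upscale and CheapUpscale are assumptions for illustration; only the TV-based dispatch mirrors the slide.

```python
def total_variation(block):
    """M metric: sum of |right - here| + |down - here| over the block."""
    h, w = len(block), len(block[0])
    return (sum(abs(block[i][j + 1] - block[i][j]) for i in range(h) for j in range(w - 1))
            + sum(abs(block[i + 1][j] - block[i][j]) for i in range(h - 1) for j in range(w)))

def nn2x(z):
    """Nearest-neighbour 2x upscale; stands in for both paths in this sketch."""
    return [[v for v in row for _ in (0, 1)] for row in z for _ in (0, 1)]

def hybrid_upscale(img, bs, threshold, upscale=nn2x, cheap_upscale=nn2x):
    """Crop `img` into bs x bs sub-images, take the expensive path only
    when the TV metric exceeds the threshold U, then mosaic the results."""
    h, w = len(img), len(img[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for bi in range(0, h, bs):
        for bj in range(0, w, bs):
            z = [row[bj:bj + bs] for row in img[bi:bi + bs]]
            y = upscale(z) if total_variation(z) > threshold else cheap_upscale(z)
            for di, row in enumerate(y):
                out[2 * bi + di][2 * bj:2 * bj + len(row)] = row
    return out
```

In the real system the expensive path is the FSRCNN-s accelerator and the cheap path is the hardware interpolator; the dispatch logic is the same.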

SLIDE 8

Overall System

Low-Res Image -> Dispatcher -> { Interpolator | Accelerator (Pipelined Neural Network) } -> High-Res Image

Pipelined Neural Network, with layers written Conv(c, f, n) = (input channels, kernel size, output channels):

Feature Extraction Conv(1, 5, 32) -> Shrinking Conv(32, 1, 5) -> Mapping Conv(5, 3, 5) -> Expanding Conv(5, 1, 32) -> Deconvolution Deconv(32, 9, 1)

SLIDE 9

Stencil Access of TV Computation

Stencil access pattern: each output depends on three taps of x, namely x[offset], x[right] (one pixel to the right), and x[down] (one pixel below):

(Δy)_offset = abs(y_right − y_offset) + abs(y_down − y_offset)
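In plain Python the stencil amounts to the following reference model (software only, not the hardware datapath):

```python
def tv_map(y):
    """(dy)_offset = |y_right - y_offset| + |y_down - y_offset|,
    computed for every pixel that has both a right and a down neighbour."""
    h, w = len(y), len(y[0])
    return [[abs(y[i][j + 1] - y[i][j]) + abs(y[i + 1][j] - y[i][j])
             for j in range(w - 1)] for i in range(h - 1)]
```

Summing this map over a block gives the TV value used by the dispatcher.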

SLIDE 10

Micro-architecture for Stencil Computation

Buffering system for array x: as samples stream in, shift registers s1, s2, s3 together with buffer1 (depth W−1) and buffer2 (depth 1) hold one full row of delay, so the three stencil taps x[i−1][j] (x[offset]), x[i−1][j+1] (x[right]), and x[i][j] (x[down]) are available every cycle; the computation kernel f1, f2, f3 then produces (Δy)_{i,j}.
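A software model of this buffering scheme, processing one pixel per "cycle": buffer names follow the slide, while the handling of wrap-around positions at row ends is my assumption.

```python
from collections import deque

def stream_tv(pixels, width):
    """Line-buffered stencil: pixels arrive row-major, one per cycle.
    buffer1 delays width-1 samples (its head is x[right]); buffer2 adds
    one more delay (its head is x[offset]); the newest sample is x[down]."""
    buffer1, buffer2 = deque(), deque()
    out = []
    for t, x_down in enumerate(pixels):
        if buffer2:                        # x[offset] has emerged from the delay line
            x_offset, x_right = buffer2[0], buffer1[0]
            if t % width != width - 1:     # x[offset] needs a right neighbour
                out.append(abs(x_right - x_offset) + abs(x_down - x_offset))
        buffer1.append(x_down)             # newest sample enters buffer1
        if len(buffer1) > width - 1:       # oldest buffer1 sample spills into buffer2
            buffer2.append(buffer1.popleft())
        if len(buffer2) > 1:               # buffer2 holds exactly one sample
            buffer2.popleft()
    return out
```

The output matches the direct per-pixel formula while reading each pixel exactly once, which is why the on-chip buffers only need one row plus a few registers.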

SLIDE 11

Convolutional Neural Network

Pipelined Neural Network

Feature Extraction Conv(1, 5, 32) -> Shrinking Conv(32, 1, 5) -> Mapping Conv(5, 3, 5) -> Expanding Conv(5, 1, 32) -> Deconvolution Deconv(32, 9, 1)

SLIDE 12

Convolution

A Conv(ci, fi, ni) layer slides fi x fi window(s) with stride 1 over ci input feature maps (side Ni) and computes a dot product at each position, producing ni output feature maps (side Ni+1).

SLIDE 13

Deconvolution

A Deconv(ci, fi, ni) layer slides its window(s) with stride s: each input pixel of the ci input maps (side Ni) scatters an fi x fi kernel into the ni output maps (side Ni+1), and overlapping contributions are accumulated.
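Single-channel reference models of the two sliding-window operators above; the stride and "valid" padding choices are assumptions for illustration, not a statement about the accelerator's exact border handling.

```python
def conv_valid(x, k):
    """Convolution view: slide an f x f window over the input map and take
    a dot product at each position (one channel, stride 1, valid padding)."""
    f = len(k)
    out_h, out_w = len(x) - f + 1, len(x[0]) - f + 1
    return [[sum(x[i + a][j + b] * k[a][b] for a in range(f) for b in range(f))
             for j in range(out_w)] for i in range(out_h)]

def deconv(x, k, stride):
    """Deconvolution view: each input pixel scatters a scaled copy of the
    f x f kernel into the output at the given stride; overlaps accumulate."""
    f = len(k)
    h, w = len(x), len(x[0])
    out_h, out_w = (h - 1) * stride + f, (w - 1) * stride + f
    y = [[0.0] * out_w for _ in range(out_h)]
    for i in range(h):
        for j in range(w):
            for a in range(f):
                for b in range(f):
                    y[i * stride + a][j * stride + b] += x[i][j] * k[a][b]
    return y
```

The gather (conv) and scatter (deconv) loops have the same multiply count per input window, which is what the pipeline-balancing analysis on the next slide exploits.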

SLIDE 14

Pipeline Balancing

Layer             c_j  f_j  n_j  N_j   #Mult.   Ideal #DSP  Ideal II  Alloc. #DSP  Alloc. II
Extraction         1    5   32   36    819200      201        4076       200         4096
Shrinking         32    1    5   32    163840       40        4096        32         4096
Mapping            5    3    5   32    202500       50        4050        45         4500
Expanding          5    1   32   30    144000       35        4115        32         4500
Deconvolution     32    9    1   30   2332800      573        4072       519         4500
Overall            -    -    -    -   3662340      899        4115       828         4500
Available (ZC706)  -    -    -    -       -        900          -        900          -
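The #Mult. column can be reproduced from the (c, f, n, N) parameters, assuming valid convolutions (output side N − f + 1) and a deconvolution that applies its full 9x9 kernel once per input pixel; both assumptions are my reading of the table, not stated on the slide.

```python
# Per-layer parameters (c = input channels, f = kernel size,
# n = output channels, N = feature-map side), as in the table.
layers = [
    ("Extraction",    "conv",    1, 5, 32, 36),
    ("Shrinking",     "conv",   32, 1,  5, 32),
    ("Mapping",       "conv",    5, 3,  5, 32),
    ("Expanding",     "conv",    5, 1, 32, 30),
    ("Deconvolution", "deconv", 32, 9,  1, 30),
]

def mult_count(kind, c, f, n, N):
    # Valid conv shrinks the map; deconv multiplies once per input pixel.
    side = N - f + 1 if kind == "conv" else N
    return c * f * f * n * side * side

counts = {name: mult_count(kind, c, f, n, N)
          for name, kind, c, f, n, N in layers}
total = sum(counts.values())
```

Dividing each count by the allocated #DSP gives the per-layer initiation interval, which the allocation balances to roughly 4096-4500 cycles across the pipeline.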
SLIDE 15

Sub-image Size

  • Padding: to emit an l x l output block, layer j must process a block of side
    N_j = l + Σ_{i=j}^{#Conv} (f_i − 1)
    where the f_i are the kernel sizes of the remaining conv layers
  • If the sub-image size is too small:
      – Large border-to-block ratio
      – Limited by memory bandwidth
  • If the sub-image size is too large:
      – Large feature maps
      – Limited by on-chip BRAM capacity
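Assuming the padding rule N_j = l + Σ_{i≥j}(f_i − 1) over the conv layers (my reconstruction of the formula), the N_j column of the pipeline-balancing table falls out of the kernel sizes alone:

```python
# Kernel sizes of the conv layers, in pipeline order:
# extraction (5), shrinking (1), mapping (3), expanding (1).
conv_kernels = [5, 1, 3, 1]

def block_sides(l, kernels):
    """Input block side required at each conv layer so that the last
    conv layer emits an l x l block (valid convolutions assumed)."""
    sides, acc = [], 0
    for f in reversed(kernels):
        acc += f - 1
        sides.append(l + acc)
    return list(reversed(sides))

# With l = 30, this reproduces the table's N_j column for the conv layers.
sides = block_sides(30, conv_kernels)
```

Larger blocks amortize this border overhead but blow up the on-chip feature-map buffers, which is the trade-off the next slide sweeps.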
SLIDE 16

Sub-image Size vs. Performance vs. #mult.

[Charts: PSNR (36.0-39.0 dB) and SSIM (0.915-0.940) vs. block size (10-50); multiplications (8.0e9-9.5e9) vs. block size.]

SLIDE 17

Overall Comparisons

  • Compared six configurations

No.  Preprocessing  Upscaling       #Mult.   PSNR (dB)  SSIM
1    None           Interpolation   6.6e7    35.51      0.9138
2    None           Neural Network  8.2e9    38.55      0.9421
3    Blocking       Interpolation   6.6e7    35.51      0.9138
4    Blocking       Neural Network  8.4e9    38.55      0.9420
5    Blocking       Mixed-Random    2.2e9    36.10      0.9211
6    Blocking       Mixed-TV        2.2e9    37.36      0.9287

  • Neural network vs. interpolation: +3.04 dB (2 vs. 1)
  • Blocking alone: no performance loss (3 vs. 1, 4 vs. 2)
  • TV dispatch vs. random dispatch: +1.26 dB (6 vs. 5)
  • Mixed-TV vs. full neural network: only 1.19 dB lower, with ~75% fewer multiplications (6 vs. 4)
  • >100x

SLIDE 18

Example Outputs

Configuration 1 None/Interpolation Configuration 2 None/Neural Network Configuration 3 Blocking/Interpolation Configuration 4 Blocking/Neural Network Configuration 5 Blocking/Mixed-Random Configuration 6 Blocking/Mixed-TV

SLIDE 19

Summary Flow

  • Crop each frame into blocks
      – Suitable for low-level (pixel-level) tasks
      – GOOD: on-chip buffer friendly
      – BAD: computation overheads
  • Dispatch blocks according to TV value
      – Micro-architecture for the buffering system
  • Fully-pipelined CNN for upscaling
      – Sliding window for convolution/deconvolution
      – Pipeline balancing
  • Performance
      – Full-HD (1920x1080) -> Ultra-HD (3840x2160) at 31.7 fps
SLIDE 20

Thank you!

SLIDE 21

TV Threshold vs. Performance vs. #mult.

[Charts: PSNR (35.0-38.5 dB) and SSIM (0.910-0.945) vs. TV threshold (30-70); multiplications (0-2.5e10) vs. TV threshold.]

SLIDE 22

Resource Utilizations

Component        BRAM   DSP      FF      LUT
Dispatcher          1     2     618     1138
Neural Network    178   844   63149    98439
Interpolator       10          1414     3076
Total             327   858   66261   103714
Available        1090   900  437200   218600
Utilization (%)    30    95      15       47