
A Highly Efficient and Comprehensive Image Processing Library for C++-based High-Level Synthesis
M. Akif Özkan, Oliver Reiche, Frank Hannig, and Jürgen Teich
Hardware/Software Co-Design, Friedrich-Alexander University Erlangen-Nürnberg
FSP, September 7, 2017, Ghent

Motivation
Opportunity: FPGAs have great potential for improving throughput per watt.
Challenge: Hardware design is time-consuming and requires expertise.
Solution: High-Level Synthesis (HLS), which aims to provide the most suitable architecture from traditional C++ code.


What would be even better is simply asking Siri: “Siri, could you please design a ConvNet accelerator for my 200-dollar FPGA?” Unfortunately, we are not there yet!

Programming methodologies for other platforms are not there yet either:
• GPUs: map, gather, and scatter operations expressed in a different language, e.g., OpenCL or CUDA
• Multi-core CPUs: OpenMP or Cilk Plus for proper thread-level parallelism when programming Xeon Phi architectures
• CPUs: explicit vectorization

Maybe it is time to reconsider the abstractions for FPGA design?
• Computational parallel patterns, e.g., gather and scatter
• Domain-specific languages: HIPAcc, Halide, PolyMage
• Hardware-friendly library objects for essential algorithmic instances


“Best” is hard to reach: a design space exploration is needed!
• The definition of “best” depends on the design objectives (e.g., speed, area).
• Multiple alternative architectures exist for the same algorithmic instance.
• The hardware architecture that is Pareto-optimal for an algorithmic instance under given design objectives might not be optimal for different scheduling specifications (e.g., filter size, parallelization factor).
Efficiency is important when cost is considered!

Not all bad news:
• HLS has become sophisticated enough for data path design
• Different speed constraints are possible
• Support for deploying FPGAs in heterogeneous systems

Outline
• Analysis of the Domain
• Proposed Image Processing Library
• A Deeper Look Into the Library
• Evaluation and Results

Analysis of the Domain

Image Processing Applications
We can define three characteristic data operations in image processing applications:
• Point operators: each output data element is determined by a single input data element.
• Local operators: each output data element is determined by a local region of the input data (stencil pattern-based calculations).
• Global operators: each output data element is determined by all of the input data.
[Figure: input image and output image for each of the three operator classes]
FSP’17 | M. Akif Özkan | Hardware/Software Co-Design | A Highly Efficient and Comprehensive Image Processing Library for C++-based High-Level Synthesis
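The three operator classes can be illustrated with a small software-only sketch. It is not taken from the library: the 4×4 image size and the names `Image`, `make_ramp`, `point_square`, `local_sum3x3`, and `global_sum` are assumptions chosen for illustration, and border pixels of the local operator are simply left at zero here.

```cpp
#include <array>
#include <cassert>

constexpr int kW = 4;  // hypothetical tiny image width
constexpr int kH = 4;  // hypothetical tiny image height
using Image = std::array<std::array<int, kW>, kH>;

// Test image whose pixel at (x, y) holds y * kW + x.
Image make_ramp() {
    Image img{};
    for (int y = 0; y < kH; ++y)
        for (int x = 0; x < kW; ++x)
            img[y][x] = y * kW + x;
    return img;
}

// Point operator: each output pixel depends on exactly one input pixel.
Image point_square(const Image& in) {
    Image out{};
    for (int y = 0; y < kH; ++y)
        for (int x = 0; x < kW; ++x)
            out[y][x] = in[y][x] * in[y][x];
    return out;
}

// Local operator: each output pixel depends on a 3x3 neighborhood.
// Only interior pixels are computed; border pixels stay 0 in this sketch.
Image local_sum3x3(const Image& in) {
    Image out{};
    for (int y = 1; y < kH - 1; ++y) {
        for (int x = 1; x < kW - 1; ++x) {
            int sum = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    sum += in[y + dy][x + dx];
            out[y][x] = sum;
        }
    }
    return out;
}

// Global operator: the result depends on all input pixels.
int global_sum(const Image& in) {
    int sum = 0;
    for (const auto& row : in)
        for (int v : row)
            sum += v;
    return sum;
}
```

The point and local operators produce one output per input position and can therefore be streamed, whereas the global operator needs to see the whole image before its result is known.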

Image Processing Applications
A large portion of image processing applications can be described as task graphs of point, local, and global operators.
[Figure: An example task graph for Harris Corner Detection (square: local operator, circle: point operator). Nodes: input, dx, dy, sx, sy, sxy, gx, gy, gxy, hc, output]

Coarse-Grained Parallelism
Memory bandwidth limits can be reached by processing multiple pixels per cycle.
[Figure: the Harris corner task graph with every operator replicated four times, e.g., {dx, dx, dx, dx}, {sx, sx, sx, sx}, {gx, gx, gx, gx}, {hc, hc, hc, hc}]
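One way to picture processing multiple pixels per cycle is to pack P neighboring pixels into a vector and apply the same kernel to every lane. The sketch below is a plain C++ model under that assumption; `PixelVec`, `apply_point`, `iterations_per_row`, and `square_kernel` are hypothetical names, not the library's API. In an HLS flow the lane loop would typically be fully unrolled so that P kernel copies run in parallel.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical vector type: P pixels that are consumed together in one step.
template <typename T, std::size_t P>
using PixelVec = std::array<T, P>;

// Applies a point-operator kernel to every lane of the vector.
template <typename Out, typename In, std::size_t P, typename Kernel>
PixelVec<Out, P> apply_point(const PixelVec<In, P>& in, Kernel kernel) {
    PixelVec<Out, P> out{};
    for (std::size_t lane = 0; lane < P; ++lane)
        out[lane] = kernel(in[lane]);
    return out;
}

// Number of vector steps needed to consume one image row of the given width.
constexpr std::size_t iterations_per_row(std::size_t width, std::size_t p) {
    return (width + p - 1) / p;
}

// Example point-operator kernel.
inline int square_kernel(int v) { return v * v; }
```

With a row width of 1024 and P = 4, a row is consumed in 256 steps instead of 1024, which is where the throughput gain comes from as long as memory can deliver four pixels per step.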

Image Border Handling
• A fundamental image processing issue for local operators
• Should be considered together with coarse-grained parallelization
[Figure: Common border handling modes, shown on a 4×4 image with pixel values 0 to 15 that is extended by two pixels on each side: (a) clamp, (b) mirror, (c) mirror-101, (d) constant]
Image row 0 (pixels 0 1 2 3) extended under each mode:
(a) clamp: 0 0 0 1 2 3 3 3
(b) mirror: 1 0 0 1 2 3 3 2
(c) mirror-101: 2 1 0 1 2 3 2 1
(d) constant: c c 0 1 2 3 c c
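The four modes differ only in how an out-of-range coordinate is mapped back into the image. The following sketch expresses that mapping in plain C++; `Border`, `border_index`, `read_pixel`, and `make_ramp4` are hypothetical names, and the code assumes the overshoot is smaller than the image size (filter radius below image size). It says nothing about how the library implements border handling in hardware.

```cpp
#include <array>
#include <cassert>

enum class Border { Clamp, Mirror, Mirror101, Constant };

constexpr int kN = 4;  // hypothetical 4x4 image
using Image4 = std::array<int, kN * kN>;

// Row-major test image with pixel values 0 to 15.
Image4 make_ramp4() {
    Image4 img{};
    for (int k = 0; k < kN * kN; ++k) img[k] = k;
    return img;
}

// Maps coordinate i onto [0, n). Returns -1 for an out-of-range
// coordinate in Constant mode, which the caller replaces by a constant.
int border_index(int i, int n, Border mode) {
    assert(n > 1);
    assert(i >= -(n - 1) && i <= 2 * n - 2);  // overshoot must be below n
    if (i >= 0 && i < n) return i;
    switch (mode) {
        case Border::Clamp:     return i < 0 ? 0 : n - 1;
        case Border::Mirror:    return i < 0 ? -i - 1 : 2 * n - 1 - i;
        case Border::Mirror101: return i < 0 ? -i : 2 * n - 2 - i;
        case Border::Constant:  return -1;
    }
    return -1;
}

// Reads pixel (x, y) with border handling; c is the constant-mode value.
int read_pixel(const Image4& img, int x, int y, Border mode, int c) {
    const int xi = border_index(x, kN, mode);
    const int yi = border_index(y, kN, mode);
    if (xi < 0 || yi < 0) return c;
    return img[yi * kN + xi];
}
```

For the top-left corner of the extended image, position (-2, -2), this mapping yields pixel value 0 under clamp, 5 under mirror, and 10 under mirror-101.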

Proposed Image Processing Library

Description of an Application Data Flow Graph
[Figure: Harris corner task graph (input, dx, dy, sx, sy, sxy, gx, gy, gxy, hc, output)]

    #define W 1024      // Image width
    #define H 1024      // Image height
    #define pFactor 1   // Parallelization factor

    // Data type descriptions
    ...

    // Local operator definitions
    localOp<W, H, pFactor, ..., MIRROR> sobelX, sobelY;
    localOp<W, H, pFactor, ...> gaussX, gaussY, gaussXY;

    // Point operator definitions
    pointOp<W, H, pFactor, ...> square, mult, harrisCorner;

    // Hardware top function
    void harris_corner(hls::stream<outVecDataType> &out_s,
                       hls::stream<inVecDataType> &in_s) {
    #pragma HLS dataflow
        // Stream definitions
        hls::stream<VecDataType1> in_sx, in_sy, ...;
        hls::stream<VecDataType2> ...;
        ...

        // Data path construction
        sobelX.run(Dx_s, in_sx);
        sobelY.run(Dy_s, in_sy);
        square.run(Mx_s, Dx_s1, square_kernel);
        square.run(My_s, Dy_s1, square_kernel);
        mult.run(Mxy_s, Dy_s2, Dx_s2, mult_kernel);
        gaussX.run(Gx_s, Mx_s, gauss_kernel);
        gaussY.run(Gy_s, My_s, gauss_kernel);
        gaussXY.run(Gxy_s, Mxy_s, gauss_kernel);
        harrisCorner.run(out_s, Gxy_s, Gy_s, Gx_s, threshold_kernel);
    }
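The final node of the graph combines the three smoothed streams per pixel. The slides do not show the body of `threshold_kernel`, so the sketch below uses the textbook Harris response, det(M) - k * trace(M)^2, as a stand-in. The names `harris_response` and `harris_threshold_sketch` and the values of `kHarrisK` and `kThreshold` are assumptions for illustration only.

```cpp
#include <cassert>
#include <cmath>

constexpr float kHarrisK = 0.04f;    // assumed sensitivity constant
constexpr float kThreshold = 50.0f;  // assumed corner threshold

// Textbook Harris response for one pixel.
// gx and gy are the smoothed squared derivatives, gxy the smoothed product.
float harris_response(float gx, float gy, float gxy) {
    const float det = gx * gy - gxy * gxy;
    const float trace = gx + gy;
    return det - kHarrisK * trace * trace;
}

// Point operator with three inputs, in the argument order of the
// harrisCorner.run call on the slide (Gxy, Gy, Gx). Responses that do not
// exceed the threshold are suppressed to zero.
float harris_threshold_sketch(float gxy, float gy, float gx) {
    const float r = harris_response(gx, gy, gxy);
    return r > kThreshold ? r : 0.0f;
}
```

A pixel with strong gradients in both directions (gx = gy = 10, gxy = 0) gives a positive response, while an edge-like pixel (gx = 10, gy = 0) gives a negative one and is suppressed.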

Specification of a Data Path
• A data path is a regular C++ function.
• A point operator reads from an input data element.
• A local operator reads from a window (2D array).

    outDataType datapath(inDataType in_d) {
    #pragma HLS inline
        return in_d * in_d;
    }

Datapath of a multiplication (point operator).

    outDataT datapath(inDataT win[KernelH][KernelW]) {
    #pragma HLS inline
        unsigned sum = 0;
        for (unsigned j = 0; j < KernelH; j++) {
    #pragma HLS unroll
            for (unsigned i = 0; i < KernelW; i++) {
    #pragma HLS unroll
                sum += win[j][i];
            }
        }
        return (outDataT)(sum / (KernelH * KernelW));
    }

Datapath of a mean filter (local operator).
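To try the mean-filter datapath with an ordinary C++ compiler, the types and kernel size can be pinned to concrete values. In this sketch the 8-bit pixel types and the 3×3 kernel size are assumptions, `mean_datapath` and `mean_of_ramp_window` are hypothetical names, and the HLS pragmas appear only as comments so that no compiler warns about unknown pragmas.

```cpp
#include <cassert>
#include <cstdint>

using inDataT = std::uint8_t;   // assumed input pixel type
using outDataT = std::uint8_t;  // assumed output pixel type
constexpr unsigned KernelH = 3; // assumed window height
constexpr unsigned KernelW = 3; // assumed window width

// Mean filter over one window, following the slide's datapath.
// In an HLS flow, "#pragma HLS inline" and "#pragma HLS unroll" would be
// placed on the function and on both loops.
outDataT mean_datapath(const inDataT win[KernelH][KernelW]) {
    unsigned sum = 0;
    for (unsigned j = 0; j < KernelH; j++)
        for (unsigned i = 0; i < KernelW; i++)
            sum += win[j][i];
    return static_cast<outDataT>(sum / (KernelH * KernelW));
}

// Usage example: fills a window with first, first + 1, ..., first + 8
// and returns its mean, which is first + 4 for a 3x3 window.
outDataT mean_of_ramp_window(inDataT first) {
    inDataT win[KernelH][KernelW];
    for (unsigned j = 0; j < KernelH; j++)
        for (unsigned i = 0; i < KernelW; i++)
            win[j][i] = static_cast<inDataT>(first + j * KernelW + i);
    return mean_datapath(win);
}
```

The accumulator is `unsigned` rather than the 8-bit pixel type, so the sum of nine pixels cannot overflow before the division.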
