High Level Synthesis
Eunike, Pierri, Matthew

Seminar Overview
Significance of HLS (Eunike)
● Overview
● What's so good about it
● What are the challenges it faces
Breakdown of HLS (Pierri)
● How it works
Possibilities of HLS (Matthew)
● The future of HLS

Introduction to HLS

Software vs. Hardware
[Figure: SOFTWARE compared with HARDWARE, captioned "ONE SPEEDY BOI"]

WHAT IS HIGH-LEVEL SYNTHESIS?
“[a design process which enables] the automatic synthesis of high level, untimed or partially timed specifications, such as C or SystemC, to low level cycle-accurate RTL specifications for efficient implementation in ASICs or FPGAs” *
* Cong, J. et al. High-Level Synthesis for FPGAs: From Prototyping to Deployment. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 30, 473 (2011)

BENEFITS OF HLS
General Perspective
● Decreases code complexity
● Codesign and coverification
Software Perspective

SOFTWARE PERSPECTIVE
“RTL programming in VHDL or Verilog is unacceptable to most software application developers...” *
* Cong, J. et al. High-Level Synthesis for FPGAs: From Prototyping to Deployment. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 30, 474 (2011)

BENEFITS OF HLS
General Perspective
● Decreases code complexity
● Codesign and coverification
Software Perspective
● Don't need hardware expertise
● Can benefit from hardware performance
Hardware Perspective
● Can design faster
● Can experiment with hardware faster

DOWNFALLS OF HLS
Design Specifications
● Timing, interface information and constraints need to be specified
● Cannot be implemented on different targets
Choice of Language
● Lack of built-in constructs, e.g. bit accuracy specification, timing, concurrency...
● Complex constructs, e.g. pointers, dynamic memory management, polymorphism…
● Too many options in the past

HLS: How it Works

Stages
Parsing & Optimisation
● Transform C, C++ code into an intermediate representation (IR)
● Can take advantage of existing tools, e.g. gcc
Scheduling
● Sort the operations of the IR into a series of control steps
● Can be optimised for minimum resources or time
● Available resource/time constraints can be specified
Binding
● Choose the hardware to be used for each operation (library components, muxes, etc.)
● Introduce registers where values are used across cycles

Parsing & Optimisation
Goal: Transform high-level code (C, C++) into IR
● Typical IR is a control & data flow graph (CDFG)
● Each node represents a simple operation, e.g. add, read/write, compare
● Parsing and optimisation of high-level code can be done using existing tools like gcc
● Besides the usual optimisation techniques, some HLS-specific optimisations can be used
Example:
    out = (A+B) * (B-C);
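The data-flow part of a CDFG for the example expression can be sketched as a small node table. This is only an illustration: the names `OpKind`, `Node`, `graph` and `eval_graph` are hypothetical and not taken from any HLS tool, and control flow is left out entirely.

```c
#include <assert.h>

/* Hypothetical node kinds for a tiny data-flow graph. */
typedef enum { OP_INPUT, OP_ADD, OP_SUB, OP_MUL } OpKind;

typedef struct {
    OpKind kind;
    int lhs;   /* index of the left operand node, -1 for inputs  */
    int rhs;   /* index of the right operand node, -1 for inputs */
} Node;

/* out = (A + B) * (B - C); nodes 0..2 are the inputs A, B and C. */
static const Node graph[] = {
    { OP_INPUT, -1, -1 },  /* 0: A     */
    { OP_INPUT, -1, -1 },  /* 1: B     */
    { OP_INPUT, -1, -1 },  /* 2: C     */
    { OP_ADD,    0,  1 },  /* 3: A + B */
    { OP_SUB,    1,  2 },  /* 4: B - C */
    { OP_MUL,    3,  4 },  /* 5: out   */
};
enum { NUM_NODES = sizeof graph / sizeof graph[0] };

/* Evaluate the nodes in index order; operands always precede their users. */
int eval_graph(int a, int b, int c)
{
    const int inputs[3] = { a, b, c };
    int value[NUM_NODES];
    for (int i = 0; i < NUM_NODES; i++) {
        const Node *n = &graph[i];
        switch (n->kind) {
        case OP_INPUT: assert(i < 3); value[i] = inputs[i]; break;
        case OP_ADD:   value[i] = value[n->lhs] + value[n->rhs]; break;
        case OP_SUB:   value[i] = value[n->lhs] - value[n->rhs]; break;
        case OP_MUL:   value[i] = value[n->lhs] * value[n->rhs]; break;
        }
    }
    return value[NUM_NODES - 1];
}
```

Nodes 3 and 4 do not depend on each other, which is what later allows a scheduler to place the add and the subtract in the same control step.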

Parsing & Optimisation
Optimisations
● Constant propagation/dead code elimination
○ Typical compiler technique - avoid recalculation of constant values at run-time
Original code:
    int a = 30;
    int b = 9 - (a / 5);
    int c = b*4;
    if (c > 10) {
        c -= 10;
    }
    return c * (60 / a);
After constant propagation:
    int c = 12;
    if (true) {
        c = 2;
    }
    return c * 2;
After folding and dead code elimination:
    return 4;
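The three stages of the slide's example can be written as separate C functions to confirm that they return the same value. The function names are made up for this sketch, and `if (1)` stands in for the slide's `if (true)` so that no extra header is needed.

```c
#include <assert.h>

/* Original code from the slide. */
int original(void)
{
    int a = 30;
    int b = 9 - (a / 5);   /* 9 - 6 = 3 */
    int c = b * 4;         /* 12        */
    if (c > 10) {
        c -= 10;           /* 2         */
    }
    return c * (60 / a);   /* 2 * 2     */
}

/* After propagating the constant values of a and b. */
int propagated(void)
{
    int c = 12;
    if (1) {               /* 12 > 10 is known at compile time */
        c = 2;
    }
    return c * 2;
}

/* After folding the remaining arithmetic and removing dead code. */
int folded(void)
{
    return 4;
}

/* Small self-check: all three versions must agree. */
void check_equivalence(void)
{
    assert(original() == propagated());
    assert(propagated() == folded());
}
```

In hardware terms the folded version needs no adder, divider or multiplier at all, only a constant.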

Parsing & Optimisation
● Loop unrolling & pipelining
○ Unrolling is a typical compiler technique - the iterations are written out explicitly to reduce branching
○ On an FPGA we can also execute multiple iterations simultaneously
○ Pipelining is done by starting a new loop iteration as soon as data dependencies are cleared, even if the previous one is still in progress
○ May even be able to use the same components, depending on the datapath
● If-conversion
○ Better than branch prediction - execute both branches in parallel, and discard the incorrect one's results
○ Can provide nearly zero-cost branches in some situations
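A minimal sketch of both transformations in plain C, with invented function names. The unrolled loop assumes the element count is a multiple of 4, and the if-converted form is only valid because neither branch has side effects. In software the rewritten versions merely compute the same results; the point is that their independent operations could be mapped to parallel hardware.

```c
#include <assert.h>

/* Rolled loop: one addition and one branch per element. */
int sum_rolled(const int *v, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += v[i];
    }
    return sum;
}

/* Unrolled by 4 with independent accumulators, so the four additions
   in the body do not depend on each other. */
int sum_unrolled4(const int *v, int n)
{
    assert(n % 4 == 0);   /* simplifying assumption of this sketch */
    int s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    for (int i = 0; i < n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    return (s0 + s1) + (s2 + s3);
}

/* Branching form: only one side is evaluated. */
int with_branch(int cond, int a, int b)
{
    if (cond) {
        return a + b;
    } else {
        return a - b;
    }
}

/* If-converted form: both sides are computed, then one is selected,
   much like a multiplexer driven by cond. */
int if_converted(int cond, int a, int b)
{
    int then_val = a + b;
    int else_val = a - b;
    return cond ? then_val : else_val;
}
```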

Parsing & Optimisation
● Strength reduction/simplification
○ Replace operators with less expensive equivalents
○ May also use more specific operators if available, e.g. an increment in place of an add
    res = x % (1 << n);    becomes    res = x & ((1 << n) - 1);
  (valid for unsigned x; the slide writes the power of two as 2^n, which in C would be an XOR)
● Range analysis
○ FPGA datapath width can be freely changed, unlike processors with a fixed bus size
○ Track range of values through a program to minimise bit width of variables and operators
[Figure: an ADD node with input ranges 0..4 and 0..3 and an output range marked ??]
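Both ideas on this slide can be checked with a few lines of C. The helper names (`mod_pow2_slow`, `mod_pow2_fast`, `Range`, `range_add`, `bits_needed`) are invented for this sketch, and the range arithmetic assumes unsigned values with no overflow.

```c
#include <assert.h>
#include <limits.h>

/* x modulo 2^n using the expensive operator. */
unsigned mod_pow2_slow(unsigned x, unsigned n)
{
    assert(n < sizeof(unsigned) * CHAR_BIT);
    return x % (1u << n);
}

/* Strength-reduced equivalent: keep only the low n bits. */
unsigned mod_pow2_fast(unsigned x, unsigned n)
{
    assert(n < sizeof(unsigned) * CHAR_BIT);
    return x & ((1u << n) - 1u);
}

/* Closed interval of possible values for an unsigned signal. */
typedef struct {
    unsigned lo;
    unsigned hi;
} Range;

/* Range of a + b given the ranges of a and b (no overflow assumed). */
Range range_add(Range a, Range b)
{
    Range r = { a.lo + b.lo, a.hi + b.hi };
    return r;
}

/* Number of bits needed to hold every value up to max_value. */
unsigned bits_needed(unsigned max_value)
{
    unsigned bits = 1;
    while (max_value >>= 1) {
        bits++;
    }
    return bits;
}
```

For the ADD node in the figure, `range_add` on 0..4 and 0..3 gives 0..7, so under these assumptions a 3-bit adder output is sufficient.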

Parsing & Optimisation
● Bitwise analysis
○ Variant of range analysis using bitwise checks
○ Performed together with range analysis, as results are better in some cases and worse in others
[Figure: ???? AND 0010 gives __?_ (a single unknown bit). For a 0..15 input shifted left by 2 (SHL 2), range analysis gives 0..60, i.e. 6 bits, while bitwise analysis gives ????__, i.e. a bit width of only 4 bits]
● The LegUp HLS tool also performs profiling-based range analysis, where actual runtime values are recorded and bit-widths are adjusted based on that data
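One common way to implement this kind of bitwise analysis is to track, for every signal, which bits are known to be 0 and which are known to be 1. The sketch below assumes an 8-bit datapath and uses invented names (`KnownBits`, `kb_and`, `kb_shl`, ...). It is not LegUp's implementation, only an illustration that reproduces the two cases in the figure.

```c
#include <assert.h>
#include <stdint.h>

/* Per-bit knowledge about an 8-bit signal. A bit set in neither mask
   is unknown (drawn as '?' on the slide). */
typedef struct {
    uint8_t zero;   /* bits known to be 0 */
    uint8_t one;    /* bits known to be 1 */
} KnownBits;

/* Every bit of a constant is known. */
KnownBits kb_constant(uint8_t v)
{
    KnownBits k = { (uint8_t)~v, v };
    return k;
}

/* Low n bits unknown, all higher bits known to be 0 (e.g. range 0..15 for n = 4). */
KnownBits kb_unknown_low(unsigned n)
{
    assert(n <= 8);
    KnownBits k = { (uint8_t)(0xFFu << n), 0 };
    return k;
}

/* AND: a result bit is 0 if either input bit is 0, and 1 only if both are 1. */
KnownBits kb_and(KnownBits a, KnownBits b)
{
    KnownBits k = { (uint8_t)(a.zero | b.zero), (uint8_t)(a.one & b.one) };
    return k;
}

/* Shift left by a constant: knowledge moves up, vacated low bits are known 0. */
KnownBits kb_shl(KnownBits a, unsigned s)
{
    assert(s < 8);
    KnownBits k;
    k.zero = (uint8_t)((a.zero << s) | ((1u << s) - 1u));
    k.one  = (uint8_t)(a.one << s);
    return k;
}

/* Number of bits that still need real wires and logic. */
unsigned kb_unknown_count(KnownBits k)
{
    uint8_t unknown = (uint8_t)~(k.zero | k.one);
    unsigned count = 0;
    while (unknown) {
        count += unknown & 1u;
        unknown >>= 1;
    }
    return count;
}
```

Shifting a 0..15 value left by 2 leaves four unknown bits, matching the "bit width - 4 bits" result in the figure, whereas the range 0..60 on its own suggests six.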

Parsing & Optimisation
● Memory analysis
○ Identify opportunities for parallelism in memory accesses, e.g. writing an array
○ May involve splitting an array across multiple memory banks to allow simultaneous access
○ Array scalarization can be applied to remove a memory access altogether
○ Instead of instantiating a memory component for an array, convert it to a list of registers
Array version (left):
    for (i = 0; i < 4; i++) {
        A[i] = A[i] + x;
    }
Scalarized version (right):
    A0 = A0 + x;
    A1 = A1 + x;
    A2 = A2 + x;
    A3 = A3 + x;
○ The above example saves a read & write cycle per iteration, and all 4 iterations can be performed at once on the right
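The two forms of the example can be compared directly in C. The struct `Regs` and the function names are made up for this sketch. In software both versions behave identically; the saving described on the slide only appears once the four scalars are mapped to separate registers.

```c
#include <assert.h>

/* Array form: each iteration reads and writes the array memory. */
void add_array(int A[4], int x)
{
    for (int i = 0; i < 4; i++) {
        A[i] = A[i] + x;
    }
}

/* Scalarized form: four independent values that could live in
   registers and be updated in the same control step. */
typedef struct {
    int A0, A1, A2, A3;
} Regs;

Regs add_scalarized(Regs r, int x)
{
    r.A0 = r.A0 + x;
    r.A1 = r.A1 + x;
    r.A2 = r.A2 + x;
    r.A3 = r.A3 + x;
    return r;
}

/* Small self-check: both forms must produce the same values. */
void check_scalarization(void)
{
    int A[4] = { 1, 2, 3, 4 };
    Regs r = { 1, 2, 3, 4 };
    add_array(A, 10);
    r = add_scalarized(r, 10);
    assert(A[0] == r.A0 && A[1] == r.A1 && A[2] == r.A2 && A[3] == r.A3);
}
```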

Scheduling
Goal: Organise the CDFG into a series of control steps
● Each operation is assigned a control step, which typically corresponds to a single clock cycle
● Each of the control steps will eventually become a state in a finite state machine, which is the final RTL output of the HLS process
● Time and resource constraints can be specified (e.g. function f must finish within 4 cycles, using at most 2 adders and 1 multiplier)

Scheduling
● A fully organised CDFG is a schedule, and many schedules are possible for each CDFG
● Finding an optimal schedule under resource constraints is an NP-complete problem in general, so many heuristic algorithms have been developed to find good, though not necessarily optimal, results

Scheduling
ASAP (As Soon As Possible)
● From first to last operation, inserts each one into the earliest possible control step
● To schedule a new operation, its predecessors must have been scheduled in an earlier step
ALAP (As Late As Possible)
● Opposite of ASAP, starts at final operation and inserts into the latest control step
● Requires successors to have been scheduled in a later step
Both of the above finish successfully once all operations have been scheduled. Both assume infinite resources (i.e. no resource constraints, only time)
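Both algorithms fit in a few lines when the operations are stored in topological order. The six-operation graph below is a made-up example, not the CDFG from the slides, and every operation is assumed to take exactly one control step.

```c
#include <assert.h>

#define NUM_OPS   6
#define MAX_PREDS 2

/* Hypothetical DAG, listed in topological order; -1 means "no predecessor".
   Ops 0 and 1 feed op 2, which feeds op 4; op 3 feeds op 5. */
static const int preds[NUM_OPS][MAX_PREDS] = {
    { -1, -1 },   /* op 0 */
    { -1, -1 },   /* op 1 */
    {  0,  1 },   /* op 2 */
    { -1, -1 },   /* op 3 */
    {  2, -1 },   /* op 4 */
    {  3, -1 },   /* op 5 */
};

/* ASAP: each op goes one step after its latest predecessor (steps start at 1). */
void asap_schedule(int step[NUM_OPS])
{
    for (int i = 0; i < NUM_OPS; i++) {
        step[i] = 1;
        for (int k = 0; k < MAX_PREDS; k++) {
            int p = preds[i][k];
            if (p < 0) continue;
            assert(p < i);   /* relies on topological order */
            if (step[p] + 1 > step[i]) step[i] = step[p] + 1;
        }
    }
}

/* ALAP: start every op at the deadline, then walk backwards and pull
   each predecessor to at least one step before its successor. */
void alap_schedule(int deadline, int step[NUM_OPS])
{
    for (int i = 0; i < NUM_OPS; i++) {
        step[i] = deadline;
    }
    for (int i = NUM_OPS - 1; i >= 0; i--) {
        for (int k = 0; k < MAX_PREDS; k++) {
            int p = preds[i][k];
            if (p < 0) continue;
            if (step[i] - 1 < step[p]) step[p] = step[i] - 1;
        }
    }
}
```

With a deadline of 3 steps, ops 0, 1, 2 and 4 land in the same step under both schedules, while ops 3 and 5 each have one step of slack.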

Scheduling
Example (4 cycle time constraint):
● ALAP: 2 fewer multipliers, 1 more adder than ASAP
[Figure: the CDFG alongside its ASAP and ALAP schedules]

Scheduling
FDS (Force Directed Scheduling)
● Combines ASAP and ALAP to maximise resource utilization, and therefore minimise total resources required
● First calculate both ASAP and ALAP. Any operations that have the same step in both can remain unchanged.
● The remaining ones could potentially be scheduled anywhere between their ASAP location and ALAP location
● This difference in steps is called the range

Scheduling
[Figure: the CDFG alongside its ASAP and ALAP schedules]
● Working with one type of operation at a time, try each possible control step, calculating the cost function each time to find the minimum
● The cost function is probability-based and takes into account the expected operations that will be required in each step
● Scheduling an operator can cause the cost function to change due to data dependencies
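The slides do not spell out the cost function. The sketch below assumes the classic force-directed formulation: each operation is equally likely to land on any step in its ASAP..ALAP range, the per-step sum of these probabilities forms a distribution graph, and the "self force" of a candidate step measures how much choosing it would add to already crowded steps. Function names are invented, and the predecessor/successor forces caused by data dependencies are left out.

```c
#include <assert.h>

#define MAX_STEPS 8

/* Expected number of operations (of one type) in each control step.
   dg[0] is unused; steps are numbered from 1. */
void distribution_graph(const int asap[], const int alap[], int num_ops,
                        int num_steps, double dg[MAX_STEPS + 1])
{
    assert(num_steps >= 1 && num_steps <= MAX_STEPS);
    for (int s = 0; s <= num_steps; s++) {
        dg[s] = 0.0;
    }
    for (int i = 0; i < num_ops; i++) {
        assert(1 <= asap[i] && asap[i] <= alap[i] && alap[i] <= num_steps);
        double p = 1.0 / (alap[i] - asap[i] + 1);   /* uniform over the range */
        for (int s = asap[i]; s <= alap[i]; s++) {
            dg[s] += p;
        }
    }
}

/* Self force of fixing one operation at step `chosen`: its probability
   becomes 1 there and 0 elsewhere in its range. Lower is better. */
double self_force(int asap, int alap, int chosen,
                  const double dg[MAX_STEPS + 1])
{
    assert(asap <= chosen && chosen <= alap);
    double p = 1.0 / (alap - asap + 1);
    double force = 0.0;
    for (int s = asap; s <= alap; s++) {
        double delta = (s == chosen ? 1.0 : 0.0) - p;
        force += dg[s] * delta;
    }
    return force;
}
```

For three multiplications with ranges 1..1, 1..1 and 1..2, the distribution graph is 2.5 in step 1 and 0.5 in step 2. Placing the flexible one in step 2 has the lower force, so it is moved away from the crowded first step.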

Scheduling
List Scheduling
● Unlike the previous time-constrained algorithms, LS is resource-constrained
● Working 1 control step at a time, LS schedules as many operations as possible, subject to data dependencies and resource constraints
● If multiple operations are competing for a resource, one is chosen based on a priority function
● The priority is typically based on each operation's ASAP/ALAP range, where operations with smaller ranges are given higher priority
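A minimal list scheduler, under several simplifying assumptions: a made-up six-operation graph, a single resource type with `units` identical copies, one control step per operation, and a precomputed `mobility` table (ALAP step minus ASAP step) as the priority, with ties broken by index.

```c
#include <assert.h>

#define NUM_OPS   6
#define MAX_PREDS 2

/* Hypothetical DAG; -1 means "no predecessor". */
static const int preds[NUM_OPS][MAX_PREDS] = {
    { -1, -1 }, { -1, -1 }, { 0, 1 }, { -1, -1 }, { 2, -1 }, { 3, -1 },
};

/* ALAP step minus ASAP step for each op, assuming a 3-step deadline. */
static const int mobility[NUM_OPS] = { 0, 0, 0, 1, 0, 1 };

/* An op is ready once every predecessor was scheduled in an earlier step. */
static int is_ready(int op, const int step[NUM_OPS], int current)
{
    for (int k = 0; k < MAX_PREDS; k++) {
        int p = preds[op][k];
        if (p >= 0 && (step[p] == 0 || step[p] >= current)) {
            return 0;
        }
    }
    return 1;
}

/* Fills step[] (1-based; 0 means unscheduled) and returns the number
   of control steps used. */
int list_schedule(int units, int step[NUM_OPS])
{
    assert(units >= 1);
    int remaining = NUM_OPS;
    int current = 0;
    for (int i = 0; i < NUM_OPS; i++) {
        step[i] = 0;
    }
    while (remaining > 0) {
        current++;
        for (int used = 0; used < units; used++) {
            int best = -1;   /* ready op with the smallest mobility */
            for (int i = 0; i < NUM_OPS; i++) {
                if (step[i] != 0 || !is_ready(i, step, current)) continue;
                if (best < 0 || mobility[i] < mobility[best]) best = i;
            }
            if (best < 0) break;   /* nothing else is ready this step */
            step[best] = current;
            remaining--;
        }
    }
    return current;
}
```

With two units the example finishes in 3 steps. With a single unit it needs 6, and the zero-mobility operations on the critical path are placed before the ones that have slack.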
