Introduction to High-level Synthesis

slide-1
SLIDE 1

Smr3143 – ICTP & IAEA (Aug. & Sept. 2017)

Joint ICTP-IAEA School on Zynq-7000 SoC and its Applications for Nuclear and Related Instrumentation

Introduction to High-level Synthesis

Fernando Rincón

fernando.rincon@uclm.es

slide-2
SLIDE 2

Contents

  • What is High-level Synthesis?
  • Why HLS?
  • How Does it Work?
  • HLS Coding
  • An example: Matrix Multiplication
  • Validation Flow
  • RTL Export
  • Design analysis
slide-3
SLIDE 3

What is High-level Synthesis?

  • Compilation of behavioral algorithms into RTL descriptions

[Figure: HLS turns an algorithm (behavioral description) plus constraints (I/O description, timing, memory) into an RTL description (datapath and finite state machine). Steps shown: operations extraction, control & datapath extraction, micro-architecture evaluation.]

slide-4
SLIDE 4

Why HLS?

  • Need for productivity improvement at design level

– Design Space Exploration
– Reduced Time-to-market
– Trend to use FPGAs as Hw accelerators

  • Electronic System Level Design is based on

– Hw/Sw Co-design

  • SystemC / SystemVerilog
  • Transaction-Level Modelling

– One common C-based description of the system
– Iterative refinement
– Integration of models at very different levels of abstraction
– But an efficient way to get to silicon is needed

  • Raising the level of abstraction gives Sw programmers access to silicon

slide-5
SLIDE 5

Why HLS?

Video Design Example

Input                 C Simulation Time   RTL Simulation Time   Improvement
10 frames 1280x720    10 s                ~2 days (ModelSim)    ~12000x

[Figure: design flows compared. Labels: RTL (Spec), RTL (Sim), C (Spec/Sim), RTL (Sim)]

slide-6
SLIDE 6

HLS Benefits

  • Design Space Exploration

– Early estimation of main design variables: latency, performance, consumption
– Can be targeted to different technologies

  • Verification

– Reuse of C-based testbenches
– Can be complemented with formal verification

  • Reuse

– Higher abstraction provides better reuse opportunities
– Cores can be exported to different bus technologies

slide-7
SLIDE 7

Design Space Exploration

…
loop: for (i = 3; i >= 0; i--) {
    if (i == 0) {
        acc += x * c[0];
        shift_reg[0] = x;
    } else {
        shift_reg[i] = shift_reg[i-1];
        acc += shift_reg[i] * c[i];
    }
}
…

Same hardware is used for each loop iteration:

  • Small area
  • Long latency
  • Low throughput

Different iterations executed concurrently:

  • Higher area
  • Short latency
  • Best throughput

Different hardware for each loop iteration:

  • Higher area
  • Short latency
  • Better throughput

slide-8
SLIDE 8

How Does it Work? - Control Extraction

Code: from any C code example...

void fir(data_t *y, coef_t c[4], data_t x) {
    static data_t shift_reg[4];
    acc_t acc;
    int i;
    acc = 0;
    loop: for (i = 3; i >= 0; i--) {
        if (i == 0) {
            acc += x * c[0];
            shift_reg[0] = x;
        } else {
            shift_reg[i] = shift_reg[i-1];
            acc += shift_reg[i] * c[i];
        }
    }
    *y = acc;
}

Control Behavior: the loops in the C code correlate to states of behavior.

[Figure: control states in order: Function Start, For-Loop Start, For-Loop End, Function End]

Finite State Machine (FSM) states: this behavior is extracted into a hardware state machine.

slide-9
SLIDE 9

How Does it Work? - Datapath Extraction

Code: from any C code example...

void fir(data_t *y, coef_t c[4], data_t x) {
    static data_t shift_reg[4];
    acc_t acc;
    int i;
    acc = 0;
    loop: for (i = 3; i >= 0; i--) {
        if (i == 0) {
            acc += x * c[0];
            shift_reg[0] = x;
        } else {
            shift_reg[i] = shift_reg[i-1];
            acc += shift_reg[i] * c[i];
        }
    }
    *y = acc;
}

Operations: operations are extracted…

[Figure: extracted operations: RDx, RDc, WRy, -, ==, >=, +, *, +, *]

Control & Datapath Behavior: a unified control dataflow behavior is created.

[Figure: control dataflow graph built from the same operations, which feeds Scheduling + Binding]

slide-10
SLIDE 10

How Does it Work? - Scheduling & Binding

  • Scheduling and Binding are at the heart of HLS
  • Scheduling determines in which clock cycle an operation will occur

– Takes into account the control, dataflow and user directives
– The allocation of resources can be constrained

  • Binding determines which library cell is used for each operation

– Takes into account component delays, user directives

[Figure: Design Source (C, C++, SystemC) -> Scheduling -> Binding -> RTL (Verilog, VHDL, SystemC), with Technology Library and User Directives as inputs]

slide-11
SLIDE 11

How Does it Work? - Scheduling

void foo ( …
    t1 = a * b;
    t2 = c + t1;
    t3 = d * t2;
    out = t3 - e;
}

[Figure: dataflow graph with inputs a, b, c, d, e, operations *, +, *, - and output out, placed into clock cycles in two ways. Schedule 1. Schedule 2: when a faster technology or slower clock...]

  • Operations are mapped into clock cycles, depending on timing, resources, user directives, ...
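
The mapping of operations to clock cycles can be sketched with a small as-soon-as-possible (ASAP) scheduler that chains dependent operations within one cycle when they fit in the clock period. This is only an illustration of the idea, not the algorithm Vivado HLS uses: the function name asap_schedule and the operator delays (3 ns for a multiply, 2 ns for an add or subtract) are assumptions made up for the example.

```c
#include <assert.h>

#define NUM_OPS 4

/* Dataflow of foo: t1 = a*b; t2 = c+t1; t3 = d*t2; out = t3-e.
 * dep[i] is the index of the operation whose result op i consumes
 * (-1 if it only uses inputs). Ops are listed in dependency order. */
static const int dep[NUM_OPS] = { -1, 0, 1, 2 };

/* Hypothetical operator delays in ns: mul, add, mul, sub. */
static const double delay_ns[NUM_OPS] = { 3.0, 2.0, 3.0, 2.0 };

/* ASAP scheduling with chaining: an operation starts right after its
 * producer if it still fits in the same clock period, otherwise at the
 * start of the next cycle. Writes the 0-based cycle of each operation
 * into cycle[] and returns the number of cycles used. */
int asap_schedule(double clock_ns, int cycle[NUM_OPS])
{
    double finish_ns[NUM_OPS]; /* finish time inside the op's own cycle */
    int last = 0;

    for (int i = 0; i < NUM_OPS; i++) {
        assert(delay_ns[i] <= clock_ns); /* no multi-cycle ops in this sketch */
        int c = 0;
        double start = 0.0;
        if (dep[i] >= 0) {
            c = cycle[dep[i]];
            start = finish_ns[dep[i]];
        }
        if (start + delay_ns[i] > clock_ns) { /* does not fit: next cycle */
            c++;
            start = 0.0;
        }
        cycle[i] = c;
        finish_ns[i] = start + delay_ns[i];
        if (c > last)
            last = c;
    }
    return last + 1;
}
```

With these assumed delays, a 5 ns clock places the first multiply and the add in cycle 0 and the second multiply and the subtract in cycle 1. A 10 ns clock lets all four operations chain into a single cycle, which is the kind of change the second schedule on the slide refers to.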

slide-12
SLIDE 12

How Does it Work? - Allocation & Binding

Operations are assigned to functional units available in the library
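
One way to picture this step: once every operation has a clock cycle, operations of the same kind that never share a cycle can be bound to the same functional unit. The sketch below counts how many units of each kind a given schedule needs. The types, names and the idea of treating add and subtract as one kind are assumptions for illustration; a real binder also weighs delays, multiplexer cost and user directives.

```c
#include <assert.h>

enum op_kind { OP_MUL, OP_ADD, NUM_KINDS };

struct sched_op {
    enum op_kind kind;
    int cycle;          /* clock cycle assigned by scheduling (0-based) */
};

/* Returns how many functional units of `kind` are needed: the largest
 * number of operations of that kind scheduled in the same cycle. */
int units_needed(const struct sched_op *ops, int n, enum op_kind kind,
                 int num_cycles)
{
    int max_used = 0;

    assert(n >= 0 && num_cycles > 0);
    for (int c = 0; c < num_cycles; c++) {
        int used = 0;
        for (int i = 0; i < n; i++)
            if (ops[i].kind == kind && ops[i].cycle == c)
                used++;
        if (used > max_used)
            max_used = used;
    }
    return max_used;
}
```

For a schedule where two multiplies sit in different cycles, units_needed reports one multiplier, so both can share it. If the same two multiplies are scheduled in the same cycle, two multipliers are required.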

slide-13
SLIDE 13

Vivado HLS

  • High-level Synthesis Suite from Xilinx

[Figure: Vivado HLS takes C, C++, SystemC sources plus constraints/directives and produces RTL (VHDL, Verilog, SystemC). RTL Export targets: IP-XACT, Sys Gen, PCore]

TODO: transition slide. Visualize the flow.

slide-14
SLIDE 14

Source Code: Language Support

  • Vivado HLS supports C, C++, SystemC and OpenCL API C kernel

– Provided it is statically defined at compile time
– Default extensions: .c for C / .cpp for C++ & SystemC

  • Modeling with bit-accuracy

– Supports arbitrary precision types for all input languages
– Allowing the exact bit-widths to be modeled and synthesized

  • Floating point support

– Support for the use of float and double in the code

  • Support for OpenCV functions

– Enable migration of OpenCV designs into Xilinx FPGA
– Libraries target real-time full HD video processing

slide-15
SLIDE 15

Source Code: Key Attributes

void fir(data_t *y, coef_t c[4], data_t x) {
    static data_t shift_reg[4];
    acc_t acc;
    int i;
    acc = 0;
    loop: for (i = 3; i >= 0; i--) {
        if (i == 0) {
            acc += x * c[0];
            shift_reg[0] = x;
        } else {
            shift_reg[i] = shift_reg[i-1];
            acc += shift_reg[i] * c[i];
        }
    }
    *y = acc;
}

Functions: Represent the design hierarchy
Loops: Their scheduling has major impact on area and performance
Arrays: Mapped into memory. May become main performance bottlenecks
Operators: Can be shared or replicated to meet performance
Types: Type influences area and performance
Top Level IO: Top-level arguments determine interface ports

  • Only one top-level function is allowed
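
The fir function can be compiled and exercised as ordinary C before any synthesis. The version below is a minimal sketch: the slides never show how data_t, coef_t and acc_t are defined, so plain int is assumed for all three, and the assert on the pointer arguments is an addition for the example.

```c
#include <assert.h>

/* Assumed types: the slides do not show these typedefs. */
typedef int data_t;
typedef int coef_t;
typedef int acc_t;

/* 4-tap FIR filter as on the slide: shifts x into shift_reg and
 * accumulates shift_reg[i] * c[i] over the four taps. */
void fir(data_t *y, coef_t c[4], data_t x)
{
    static data_t shift_reg[4]; /* static: keeps samples between calls */
    acc_t acc = 0;
    int i;

    assert(y && c);
    loop: for (i = 3; i >= 0; i--) {
        if (i == 0) {
            acc += x * c[0];    /* newest sample uses tap 0 */
            shift_reg[0] = x;
        } else {
            shift_reg[i] = shift_reg[i - 1];
            acc += shift_reg[i] * c[i];
        }
    }
    *y = acc;
}
```

Feeding a single 1 followed by zeros returns the coefficients one per call, which is a quick sanity check of the shift register: with c = {1, 2, 3, 4} the outputs are 1, 2, 3, 4 and then 0.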
slide-16
SLIDE 16

Functions & RTL Hierarchy

Source Code (my_code.c)

void A() { ..body A.. }
void B() { ..body B.. }
void C() { B(); }
void D() { B(); }
void foo_top() { A(…); C(…); D(…); }

RTL hierarchy

foo_top
    A
    C
        B
    D
        B

  • Each function is translated into an RTL block.
  • Can be shared or inlined (dissolved)
slide-17
SLIDE 17

Operator Types

  • Standard C Types

– Integers:

  • long long => 64 bits
  • int => 32 bits
  • short => 16 bits

– Characters:

  • char => 8 bits

– Floating Point:

  • float => 32 bits
  • double => 64 bits

  • Arbitrary Precision Types

– C:

  • ap(u)int => (1-1024)

– C++:

  • ap_(u)int => (1-1024)
  • ap_fixed

– C++ / SystemC:

  • sc_(u)int => (1-1024)
  • sc_fixed

  • They define the size of the hardware used
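
To see why the chosen width matters, the sketch below emulates an accumulator of a given bit width in standard C by masking after every addition. The helper wrap_to_width is an assumption made up for this example: it only stands in for an arbitrary precision type and is not part of the Vivado HLS libraries.

```c
#include <assert.h>
#include <stdint.h>

/* Keeps only the low `bits` bits of `value`, as an unsigned register of
 * that width would (1 <= bits <= 32). */
uint32_t wrap_to_width(uint32_t value, unsigned bits)
{
    assert(bits >= 1 && bits <= 32);
    if (bits == 32)
        return value;   /* avoid shifting by the full width */
    return value & ((UINT32_C(1) << bits) - 1u);
}

/* Sums n samples in an accumulator that is `bits` wide. */
uint32_t accumulate(const uint16_t *samples, int n, unsigned bits)
{
    uint32_t acc = 0;

    for (int i = 0; i < n; i++)
        acc = wrap_to_width(acc + samples[i], bits);
    return acc;
}
```

Summing 4000 and 200 gives 4200 in a 32-bit accumulator but 104 in a 12-bit one, because the narrower register wraps at 4096. The narrowest width that still holds the largest expected result gives the smallest adder and register.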
slide-18
SLIDE 18

Loops

  • Rolled by default

– Each iteration implemented in the same state
– Each iteration implemented with the same resources

  • Loops can be unrolled if their indices are statically determinable at elaboration time

– Not when the number of iterations is variable
– Result in more elements to schedule but greater operator mobility

void foo_top (…) {
    ...
    Add: for (i=3;i>=0;i--) {
        b = a[i] + b;
    ...
}

[Figure: synthesis of foo_top into a single adder (+). Labels: a[N], b, N]
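
The difference between the rolled and unrolled forms can be written out by hand in plain C. This is a sketch of what unrolling means for the Add loop, not output from the tool: the function names are made up, and in Vivado HLS unrolling is normally requested with a directive rather than by rewriting the source.

```c
#include <assert.h>

#define N 4

static_assert(N == 4, "sum_unrolled is written out for N == 4");

/* Rolled form, as on the slide: one adder reused over N iterations. */
int sum_rolled(const int a[N], int b)
{
    for (int i = N - 1; i >= 0; i--)
        b = a[i] + b;
    return b;
}

/* Fully unrolled by hand: the loop bounds are constants, so the four
 * additions can be written out. The two partial sums do not depend on
 * each other and could be computed in parallel (integer addition is
 * associative; overflow is ignored in this sketch). */
int sum_unrolled(const int a[N], int b)
{
    int hi = a[3] + a[2];
    int lo = a[1] + a[0];
    return (hi + lo) + b;
}
```

Both functions return the same value. The unrolled one exposes more additions to the scheduler at once, which is the extra operator mobility mentioned above, at the cost of more adders.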

slide-19
SLIDE 19

Data Dependencies: Good

void fir ( …
    acc=0;
    loop: for (i=3;i>=0;i--) {
        if (i==0) {
            acc+=x*c[0];
            shift_reg[0]=x;
        } else {
            shift_reg[i]=shift_reg[i-1];
            acc+=shift_reg[i]*c[i];
        }
    }
    *y=acc;
}

[Figure: Default Schedule over Iteration 1 to Iteration 4. Each iteration contains the operations +, ==, -, >=, * and RDc; RDx and WRy are also shown. Callout: the read X operation has good mobility.]

  • Example of good mobility

– The read on data port X can occur anywhere from the start to iteration 4

  • The only constraint on RDx is that it occur before the final multiplication

– Vivado HLS has a lot of freedom with this operation

  • It waits until the read is required, saving a register
  • Input reads can be optionally registered
slide-20
SLIDE 20

Data Dependencies: Bad

  • The final multiplication must occur before the read and final addition
  • Loops are rolled by default

– Each iteration cannot start till the previous iteration completes
– The final multiplication (in iteration 4) must wait for earlier iterations to complete

  • The structure of the code is forcing a particular schedule

– There is little mobility for most operations

void fir ( …
    acc=0;
    loop: for (i=3;i>=0;i--) {
        if (i==0) {
            acc+=x*c[0];
            shift_reg[0]=x;
        } else {
            shift_reg[i]=shift_reg[i-1];
            acc+=shift_reg[i]*c[i];
        }
    }
    *y=acc;
}

[Figure: Default Schedule over Iteration 1 to Iteration 4. Each iteration contains the operations +, ==, -, >=, * and RDc; RDx and WRy are also shown. Callout: Mult is very constrained.]

slide-21
SLIDE 21

Arrays

void foo_top(int x, …) {
    int A[N];
    L1: for (i = 0; i < N; i++)
        A[i+x] = A[i] + i;
}

[Figure: synthesis maps A[N] (elements N-1, N-2, …, 1) to a single-port RAM (SPRAMB) connected to foo_top through DOUT, DIN, ADDR, CE, WE. Labels: A_in, A_out]

  • By default implemented as RAM

– Dual port if performance can be improved, otherwise Single Port RAM
– Optionally as a FIFO or register bank

  • Can be targeted to any memory resource in the library
  • Can be merged with other arrays and reconfigured
  • Arrays can be partitioned into individual elements

– Implemented as smaller RAMs or registers

slide-22
SLIDE 22

Top-Level IO Ports

#include "adders.h"
int adders(int in1, int in2, int *sum) {
    int temp;
    *sum = in1 + in2 + *sum;
    temp = in1 + in2;
    return temp;
}

[Figure: RTL block adders with ap_clk and ap_rst and the ports in1, in2, sum_datain, sum_dataout, ap_start, ap_done, ap_idle, ap_return, in1_ap_vld, in1_ap_ack, in2_empty_n, in2_read, sum_req_din, sum_req_full_n, sum_req_write, sum_rsp_empty_n, sum_rsp_read, sum_address, sum_size]
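
How the three arguments behave can be checked in plain C. In the sketch below, sum is both read and written by the function, which matches the separate sum_datain and sum_dataout ports in the list, while in1 and in2 are only read and the return value corresponds to ap_return. The prototype is inlined because the contents of adders.h are not shown in the slides, and the assert is an addition for the example.

```c
#include <assert.h>

/* Same body as the slide's adders(); prototype inlined instead of
 * including adders.h, whose contents are not shown. */
int adders(int in1, int in2, int *sum)
{
    int temp;

    assert(sum);                /* sum must point to valid storage */
    *sum = in1 + in2 + *sum;    /* in/out argument: read, then written */
    temp = in1 + in2;           /* becomes the function return value */
    return temp;
}
```

Calling adders(1, 2, &s) with s equal to 10 returns 3 and leaves 13 in s. The old value of s feeds the result, so the hardware needs both an input and an output path for that argument.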

slide-23
SLIDE 23

An example: Matrix Multiply

Loop         Latency   Iteration latency   Trip count   Initiation interval
a_row_loop   37408     2338                16
b_col_loop   2336      146                 16
a_col_loop   144       9                   16

Clock cycle: 6.68 ns

Resources

BRAM DSP FF LUT

Total 4 207 170
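
The latency column of this report can be reproduced with one multiplication per loop, since a rolled loop starts each iteration only after the previous one has finished. The helper names below are made up for the example. The 2-cycle difference between an inner loop's latency and the enclosing loop's iteration latency (144 + 2 = 146, 2336 + 2 = 2338) is read off the table and should not be taken as a general rule.

```c
#include <assert.h>

/* Latency of a rolled (non-pipelined) loop in clock cycles: iterations
 * run strictly one after another. */
long rolled_latency(long iteration_latency, long trip_count)
{
    assert(iteration_latency >= 0 && trip_count >= 0);
    return iteration_latency * trip_count;
}

/* Execution time in ns for a given latency and clock period. */
double exec_time_ns(long latency_cycles, double clock_ns)
{
    assert(latency_cycles >= 0 && clock_ns > 0.0);
    return (double)latency_cycles * clock_ns;
}
```

For the three loops this gives 9 x 16 = 144, 146 x 16 = 2336 and 2338 x 16 = 37408 cycles, matching the table. At the reported 6.68 ns clock cycle, 37408 cycles take roughly 250 microseconds.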

slide-24
SLIDE 24

Pipelined version

Loop        Latency   Iteration latency   Trip count   Initiation interval
all_fused   4105      11                  4096         1

Clock cycle: 7.83 ns

Resources

BRAM DSP FF LUT

Total 4 45 21

[Figure: pipelined loop with stages RD, CMP, WR overlapping across iterations. Latency = 3 cycles, Throughput = 1 cycle, Loop Latency = 4 cycles]
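
A common estimate for a pipelined loop is that the first iteration takes the full iteration latency and each later iteration finishes one initiation interval after its predecessor. The sketch below encodes that estimate. The function name is made up, and the formula is a textbook approximation rather than the exact rule the tool uses for its report.

```c
#include <assert.h>

/* Estimated latency of a pipelined loop in clock cycles: the first
 * iteration takes iteration_latency cycles and every following one
 * completes initiation_interval cycles after the previous one. */
long pipelined_latency(long iteration_latency, long initiation_interval,
                       long trip_count)
{
    assert(iteration_latency >= 1);
    assert(initiation_interval >= 1);
    assert(trip_count >= 1);
    return iteration_latency + initiation_interval * (trip_count - 1);
}
```

For the RD/CMP/WR figure (iteration latency 3, initiation interval 1, two iterations) the estimate is 4 cycles, as drawn. For all_fused it gives 11 + 4095 = 4106 cycles, one more than the 4105 in the table, which suggests the report counts the boundary cycle slightly differently.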

slide-25
SLIDE 25

Parallel Dot-Product MM

Loop        Latency   Iteration latency   Trip count   Initiation interval
all_fused   264       10                  256          1

Clock cycle: 7.23 ns

Resources

BRAM DSP FF LUT

Total 64 720 336

slide-26
SLIDE 26

18-bit Parallel Dot-Product MM

Loop        Latency   Iteration latency   Trip count   Initiation interval
all_fused   260       6                   256          1

Clock cycle: 7.64 ns

Resources

BRAM DSP FF LUT

Total 16 560 214

slide-27
SLIDE 27

Pipelined Floating-Point MM

Loop        Latency   Iteration latency   Trip count   Initiation interval
all_fused   2125      8                   256          8

Clock cycle: 8.03 ns

Resources

BRAM DSP FF LUT

Total 10 696 1424

slide-28
SLIDE 28

MM Interface Synthesis

RTL ports       dir   bits   Protocol      C Type
ap_clk          in    1      ap_ctrl_hs    return value
ap_rst          in    1      ap_ctrl_hs    return value
ap_start        in    1      ap_ctrl_hs    return value
ap_done         out   1      ap_ctrl_hs    return value
ap_idle         out   1      ap_ctrl_hs    return value
ap_ready        out   1      ap_ctrl_hs    return value
in_a_address0   out   8      ap_memory     array
in_a_ce0        out   1      ap_memory     array
in_a_q0         in    32     ap_memory     array
in_b_address0   out   8      ap_memory     array
in_b_ce0        out   1      ap_memory     array
in_b_q0         in    32     ap_memory     array
in_c_address0   out   8      ap_memory     array
in_c_ce0        out   1      ap_memory     array
in_c_we0        out   1      ap_memory     array
in_c_d0         out   32     ap_memory     array

Function activation interface: can be disabled (ap_control_none)

Synthesized memory ports: also dual-ported. In the array partitioned version, 16 mem ports, one per partial product.

slide-29
SLIDE 29

Validation Flow

  • There are two steps to verifying the design

– Pre-synthesis: C Validation

  • Validate the algorithm is correct

– Post-synthesis: RTL Verification

  • Verify the RTL is correct

  • C validation

– A HUGE reason users want to use HLS

  • Fast, free verification

– Validate the algorithm is correct before synthesis

  • Follow the test bench tips given on the following slides

  • RTL Verification

– Vivado HLS can co-simulate the RTL with the original test bench

[Figure: verification flow in two steps: Validate C, Verify RTL]

slide-30
SLIDE 30

Test benches

  • The test bench should be in a separate file
  • Or excluded from synthesis

– The macro __SYNTHESIS__ can be used to isolate code which will not be synthesized

// test.c
#include <stdio.h>

// Design to be synthesized
void test (int d[10]) {
    int acc = 0;
    int i;
    for (i=0;i<10;i++) {
        acc += d[i];
        d[i] = acc;
    }
}

// Test bench: nothing in this ifndef will be read by Vivado HLS
// (it will be read by gcc)
#ifndef __SYNTHESIS__
int main () {
    int d[10], i;
    for (i=0;i<10;i++) {
        d[i] = i;
    }
    test(d);
    for (i=0;i<10;i++) {
        printf("%d %d\n", i, d[i]);
    }
    return 0;
}
#endif

slide-31
SLIDE 31

Test benches

  • Ideal test bench

– Should be self checking

  • RTL verification will re-use the C test bench

– If the test bench is self-checking

  • Allows RTL Verification to be run without a requirement to check the results again

– RTL verification “passes” if the test bench return value is 0 (zero)

  • Actively return a 0 if the simulation passes

– Non-synthesizable constructs may be added to a synthesized function if __SYNTHESIS__ is used

#include <stdio.h>
#include <stdlib.h>

int main () {
    // Compare results
    int ret = system("diff --brief -w test_data/output.dat test_data/output.golden.dat");
    if (ret != 0) {
        printf("Test failed !!!\n");
        return 1;
    } else {
        printf("Test passed !\n");
        return 0;
    }
}

The -w option ensures the “new-line” does not cause a difference between Windows and Linux files

#ifndef __SYNTHESIS__
    image_t *yuv = (image_t *)malloc(sizeof(image_t));
#else
    // Workaround malloc() calls w/o changing rest of code
    image_t _yuv;
#endif
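
A self-checking test bench does not have to shell out to diff. For small designs the golden values can live in the test bench itself. The sketch below applies this to the running-sum function test from the earlier test bench slide. The name run_checks and the hand-computed golden array are assumptions added for the example.

```c
#include <assert.h>
#include <stdio.h>

/* Design under test from the earlier slide: in-place running sum. */
void test(int d[10])
{
    int acc = 0;

    for (int i = 0; i < 10; i++) {
        acc += d[i];
        d[i] = acc;
    }
}

/* Self-checking part of a test bench: returns 0 on pass and 1 on
 * failure, so main() can simply return this value. */
int run_checks(void)
{
    /* Golden values computed by hand: running sum of 0..9. */
    static const int golden[10] = { 0, 1, 3, 6, 10, 15, 21, 28, 36, 45 };
    int d[10];

    assert(sizeof d == sizeof golden); /* both hold 10 ints */
    for (int i = 0; i < 10; i++)
        d[i] = i;
    test(d);

    for (int i = 0; i < 10; i++) {
        if (d[i] != golden[i]) {
            printf("Test failed: d[%d] = %d, expected %d\n",
                   i, d[i], golden[i]);
            return 1;
        }
    }
    printf("Test passed\n");
    return 0;
}
```

In a complete test bench, main() would end with return run_checks(); so that the nonzero value reaches the tool when a mismatch is found.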

slide-32
SLIDE 32

RTL Export

RTL output in Verilog, VHDL and SystemC

Scripts created for RTL synthesis tools

RTL Export to IP-XACT, SysGen, and PCore formats

IP-XACT and SysGen => Vivado HLS for 7 Series and Zynq families

PCore => Only Vivado HLS Standalone for all families

slide-33
SLIDE 33

Design Analysis

  • Perspective for design analysis

– Allows interactive analysis

slide-34
SLIDE 34

Performance Analysis

slide-35
SLIDE 35

Resources Analysis

slide-36
SLIDE 36

References

  • M. Fingeroff, “High-Level Synthesis Blue Book”, Xlibris Corporation, 2010
  • P. Coussy, A. Morawiec, “High-Level Synthesis: from Algorithm to Digital Circuit”, Springer, 2008
  • “High-Level Synthesis Flow on Zynq”, course materials from the Xilinx University Program, 2016