Joint ICTP-IAEA School on Zynq-7000 SoC and its Applications for Nuclear and Related Instrumentation
Introduction to High-level Synthesis
Fernando Rincón fernando.rincon@uclm.es
Smr3143 – ICTP & IAEA (Aug. & Sept. 2017)
Contents
● What is High-level Synthesis?
● Why HLS?
● How Does it Work?
● HLS Coding
● An example: Matrix Multiplication
● Validation Flow
● RTL Export
● Design analysis
What is High-level Synthesis?
● Compilation of behavioral algorithms into RTL descriptions
[Figure: HLS flow from a behavioral description to an RTL description]
– Behavioral Description: Algorithm, Constraints, I/O description
– HLS: Timing, Operations Extraction, Memory, Micro-architecture evaluation
– RTL Description: Control & datapath extraction, Finite State Machine, Datapath
Why HLS?
● Need for productivity improvement at design level
– Design Space Exploration
– Reduced Time-to-market
– Trend to use FPGAs as Hw accelerators
● Electronic System Level Design is based on
– Hw/Sw Co-design
  ● SystemC / SystemVerilog
  ● Transaction-Level Modelling
– One common C-based description of the system
– Iterative refinement
– Integration of models at very different levels of abstraction
– But it needs an efficient way to get to the silicon
● Raising the level of abstraction gives Sw programmers access to silicon
Why HLS? Video Design Example
[Figure: two flows compared, RTL (Spec) / RTL (Sim) versus C (Spec/Sim) / RTL (Sim)]
Input: 10 frames, 1280x720
C Simulation Time: 10 s
RTL Simulation Time: ~2 days (ModelSim)
Improvement: ~12000x
HLS Benefits
● Design Space Exploration
– Early estimation of main design variables: latency, performance, consumption
– Can be targeted to different technologies
● Verification
– Reuse of C-based testbenches
– Can be complemented with formal verification
● Reuse
– Higher abstraction provides better reuse opportunities
– Cores can be exported to different bus technologies
Design Space Exploration
    …
    loop: for (i = 3; i >= 0; i--) {
        if (i == 0) {
            acc += x * c[0];
            shift_reg[0] = x;
        } else {
            shift_reg[i] = shift_reg[i - 1];
            acc += shift_reg[i] * c[i];
        }
    }
    …
The same loop can be implemented in several ways:
● Same hardware is used for each loop iteration
– Small area
– Long latency
– Low throughput
● Different hardware for each loop iteration
– Higher area
– Short latency
– Better throughput
● Different iterations executed concurrently
– Higher area
– Short latency
– Best throughput
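As a rough illustration of the first two options, the sketch below writes the FIR loop twice in plain C: once as a loop (one body reused for every iteration) and once unrolled by hand (one copy of the body per iteration). The `int` typedefs for `data_t`, `coef_t` and `acc_t` and the function names `fir_rolled` and `fir_unrolled` are assumptions made for this example. In practice, tools such as Vivado HLS usually request unrolling or pipelining through directives (for example `UNROLL` or `PIPELINE`) rather than by rewriting the source.

```c
/* Assumed types for illustration; the slides do not define them. */
typedef int data_t;
typedef int coef_t;
typedef int acc_t;

/* Rolled version: one loop body, reused for every iteration. */
void fir_rolled(data_t *y, const coef_t c[4], data_t x)
{
    static data_t shift_reg[4];
    acc_t acc = 0;
    int i;

    for (i = 3; i >= 0; i--) {
        if (i == 0) {
            acc += x * c[0];
            shift_reg[0] = x;
        } else {
            shift_reg[i] = shift_reg[i - 1];
            acc += shift_reg[i] * c[i];
        }
    }
    *y = acc;
}

/* Hand-unrolled version: each iteration becomes its own statements,
 * which is roughly what "different hardware for each loop iteration"
 * means. The statement order matches the loop above (i = 3 down to 0). */
void fir_unrolled(data_t *y, const coef_t c[4], data_t x)
{
    static data_t shift_reg[4];
    acc_t acc = 0;

    shift_reg[3] = shift_reg[2];      /* i = 3 */
    acc += shift_reg[3] * c[3];
    shift_reg[2] = shift_reg[1];      /* i = 2 */
    acc += shift_reg[2] * c[2];
    shift_reg[1] = shift_reg[0];      /* i = 1 */
    acc += shift_reg[1] * c[1];
    acc += x * c[0];                  /* i = 0 */
    shift_reg[0] = x;

    *y = acc;
}
```

Both functions compute the same result; only the structure offered to the synthesis tool differs, and with it the area, latency and throughput trade-off listed above.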
How Does it Work? - Control Extraction
Code (from any C code example...):
    void fir(
        data_t *y,
        coef_t c[4],
        data_t x
    ) {                                          // Function Start
        static data_t shift_reg[4];
        acc_t acc;
        int i;
        acc = 0;
        loop: for (i = 3; i >= 0; i--) {         // For-Loop Start
            if (i == 0) {
                acc += x * c[0];
                shift_reg[0] = x;
            } else {
                shift_reg[i] = shift_reg[i - 1];
                acc += shift_reg[i] * c[i];
            }
        }                                        // For-Loop End
        *y = acc;
    }                                            // Function End
Control Behavior: the loops in the C code are correlated to states of behavior.
Finite State Machine (FSM) states: 0, 1 and 2 are marked in the figure next to the function start (acc = 0), the loop body, and the loop end (*y = acc).
This behavior is extracted into a hardware state machine.
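To make the idea of control extraction more concrete, here is a small software model in which the same FIR function is driven by an explicit three-state machine. This is only a sketch of the concept, not what an HLS tool actually emits: the state names `S_START`, `S_LOOP` and `S_END`, the function name `fir_fsm`, the `int` typedefs and the returned step count are all assumptions made for this example.

```c
/* Assumed types for illustration; the slides do not define them. */
typedef int data_t;
typedef int coef_t;
typedef int acc_t;

/* Hypothetical state names, loosely following states 0, 1, 2 on the slide. */
typedef enum { S_START = 0, S_LOOP = 1, S_END = 2 } fir_state_t;

/* Runs the FIR as an explicit state machine and returns the number of
 * state visits (a software step count, not a hardware latency figure). */
int fir_fsm(data_t *y, const coef_t c[4], data_t x)
{
    static data_t shift_reg[4];
    acc_t acc = 0;
    int i = 0;
    int steps = 0;
    int done = 0;
    fir_state_t state = S_START;

    while (!done) {
        steps++;
        switch (state) {
        case S_START:               /* state 0: initialise, enter the loop */
            acc = 0;
            i = 3;
            state = S_LOOP;
            break;
        case S_LOOP:                /* state 1: one loop iteration per visit */
            if (i == 0) {
                acc += x * c[0];
                shift_reg[0] = x;
            } else {
                shift_reg[i] = shift_reg[i - 1];
                acc += shift_reg[i] * c[i];
            }
            i--;
            if (i < 0)
                state = S_END;      /* loop exit condition */
            break;
        case S_END:                 /* state 2: write the output */
            *y = acc;
            done = 1;
            break;
        }
    }
    return steps;
}
```

In this model each call visits the start state once, the loop state four times and the end state once, which mirrors how the loop in the C code corresponds to a state that is revisited until its exit condition holds.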
How Does it Work? - Datapath Extraction
Code (from any C code example...):
    void fir(
        data_t *y,
        coef_t c[4],
        data_t x
    ) {
        static data_t shift_reg[4];
        acc_t acc;
        int i;
        acc = 0;
        loop: for (i = 3; i >= 0; i--) {
            if (i == 0) {
                acc += x * c[0];
                shift_reg[0] = x;
            } else {
                shift_reg[i] = shift_reg[i - 1];
                acc += shift_reg[i] * c[i];
            }
        }
        *y = acc;
    }
Operations: operations are extracted from the code (RDx, RDc, >=, -, ==, +, *, WRy).
Control & Datapath Behavior: a unified control dataflow behavior is created.
[Figure: control dataflow graph built from the operations above, which then feeds Scheduling + Binding]
How Does it Work? - Scheduling & Binding
● Scheduling and Binding are at the heart of HLS
● Scheduling determines in which clock cycle an operation will occur
– Takes into account the control, dataflow and user directives
– The allocation of resources can be constrained
● Binding determines which library cell is used for each operation
– Takes into account component delays, user directives
[Figure: the design source (C, C++, SystemC), the user directives and the technology library feed scheduling and binding, which produce RTL (Verilog, VHDL, SystemC)]
How Does it Work? - Scheduling
● Operations are mapped into clock cycles, depending on timing, resources, user directives, ...
    void foo (
    …
        t1 = a * b;
        t2 = c + t1;
        t3 = d * t2;
        out = t3 - e;
    }
[Figure: dataflow graph of the operations *, +, *, - with inputs a, b, c, d, e and output out]
[Figure: Schedule 1, the operations *, +, *, - placed in clock cycles]
When a faster technology or slower clock ...
[Figure: Schedule 2, the same operations placed in clock cycles]
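The effect of the clock period on a schedule can be sketched with a very simple greedy scheduler for a chain of dependent operations, such as the `*`, `+`, `*`, `-` sequence in `foo`. This is a simplified model, not the algorithm of any particular HLS tool: the function name `schedule_chain`, the integer delays in nanoseconds and the rule "pack operations into a cycle until the clock period is used up" are all assumptions made for this example.

```c
/* Greedy scheduling of a linear dependency chain.
 * delay_ns[k] is the (assumed) delay of operation k, and each operation
 * depends on the previous one. Operations are chained within a clock
 * cycle as long as their accumulated delay fits in clock_ns.
 * Writes the cycle index of each operation into cycle_of[] and returns
 * the number of cycles used, or -1 if one operation alone does not fit
 * in a clock period (multi-cycle operations are not modelled here). */
int schedule_chain(const int delay_ns[], int n_ops, int clock_ns,
                   int cycle_of[])
{
    int cycle = 0;   /* current clock cycle */
    int used = 0;    /* delay already consumed in the current cycle */

    for (int k = 0; k < n_ops; k++) {
        if (delay_ns[k] > clock_ns)
            return -1;
        if (used + delay_ns[k] > clock_ns) {
            cycle++;             /* does not fit: start a new cycle */
            used = 0;
        }
        cycle_of[k] = cycle;
        used += delay_ns[k];
    }
    return n_ops > 0 ? cycle + 1 : 0;
}
```

With hypothetical delays of 4 ns for each multiplication and 2 ns for each addition or subtraction, a 4 ns clock places every operation of `foo` in its own cycle, a 6 ns clock fits them in two cycles, and a 12 ns clock fits the whole chain in one. This is the kind of change a slower clock, or a faster technology with shorter delays, would bring about.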
How Does it Work? - Allocation & Binding
Operations are assigned to functional units available in the library
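One consequence of binding can be sketched in a few lines of C: operations of the same kind that are scheduled in different clock cycles may share one functional unit, while operations in the same cycle each need their own. The sketch below counts the units required for a given schedule under that simplified rule. The names `op_kind_t` and `units_needed`, and the assumption that every operation completes within its cycle, are made up for this example and do not describe a specific tool.

```c
/* Hypothetical operation kinds for the example. */
typedef enum { OP_MUL, OP_ADD, OP_SUB } op_kind_t;

/* Returns how many functional units of the given kind are needed:
 * the largest number of operations of that kind that share a cycle.
 * kind_of[k] and cycle_of[k] describe operation k of a schedule. */
int units_needed(const op_kind_t kind_of[], const int cycle_of[],
                 int n_ops, op_kind_t kind)
{
    int max_in_cycle = 0;

    for (int k = 0; k < n_ops; k++) {
        if (kind_of[k] != kind)
            continue;
        /* Count operations of this kind in the same cycle as k. */
        int same_cycle = 0;
        for (int j = 0; j < n_ops; j++) {
            if (kind_of[j] == kind && cycle_of[j] == cycle_of[k])
                same_cycle++;
        }
        if (same_cycle > max_in_cycle)
            max_in_cycle = same_cycle;
    }
    return max_in_cycle;
}
```

For a `*`, `+`, `*`, `-` sequence scheduled in cycles 0, 0, 1, 1, the two multiplications fall in different cycles, so one multiplier is enough. If all four operations are chained in a single cycle, two multipliers are required.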