Introduction to High-level Synthesis

Joint ICTP-IAEA School on Zynq-7000 SoC and its Applications for Nuclear and Related Instrumentation
Introduction to High-level Synthesis
Fernando Rincón fernando.rincon@uclm.es
Smr3143 – ICTP & IAEA (Aug. & Sept. 2017)

Contents
● What is High-level Synthesis?
● Why HLS?
● How Does it Work?
● HLS Coding
● An example: Matrix Multiplication
● Validation Flow
● RTL Export
● Design analysis

What is High-level Synthesis?
● Compilation of behavioral algorithms into RTL descriptions
[Diagram: Behavioral Description (algorithm, constraints, I/O description) → HLS (timing, operations extraction, memory, micro-architecture evaluation, control & datapath extraction) → RTL Description (finite state machine, datapath)]

Why HLS?
● Need for productivity improvement at design level
  – Design Space Exploration
  – Reduced Time-to-market
  – Trend to use FPGAs as Hw accelerators
● Electronic System Level Design is based on
  – Hw/Sw Co-design
    ● SystemC / SystemVerilog
    ● Transaction-Level Modelling
  – One common C-based description of the system
  – Iterative refinement
  – Integration of models at very different levels of abstraction
  – But need an efficient way to get to the silicon
● Raising the level of abstraction enables Sw programmers to have access to silicon

Why HLS? Video Design Example
Input: 10 frames, 1280x720
C Simulation Time: 10s
RTL Simulation Time (ModelSim): ~2 days
Improvement: ~12000x
[Figure: two design flows compared, RTL (Spec) → RTL (Sim) versus C (Spec/Sim) → RTL (Sim)]

HLS Benefits
● Design Space Exploration
  – Early estimation of main design variables: latency, performance, consumption
  – Can be targeted to different technologies
● Verification
  – Reuse of C-based testbenches
  – Can be complemented with formal verification
● Reuse
  – Higher abstraction provides better reuse opportunities
  – Cores can be exported to different bus technologies

Design Space Exploration
    …
    loop: for (i = 3; i >= 0; i--) {
        if (i == 0) {
            acc += x * c[0];
            shift_reg[0] = x;
        } else {
            shift_reg[i] = shift_reg[i - 1];
            acc += shift_reg[i] * c[i];
        }
    }
    …
The same code can be implemented in three ways:
● Same hardware is used for each loop iteration: small area, long latency, low throughput
● Different hardware for each loop iteration: higher area, short latency, better throughput
● Different iterations executed concurrently: higher area, short latency, best throughput
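
As a rough illustration of the first two options, the sketch below writes the FIR loop above both as a rolled loop (one loop body, reused on every iteration) and as a manually unrolled sequence (one copy of the body per iteration, which a tool could map to separate hardware). The typedefs for `data_t`, `coef_t` and `acc_t` and the function names `fir_rolled` and `fir_unrolled` are assumptions made for this example; the slides do not define them. In practice an HLS tool would normally choose between these forms through user directives rather than a source rewrite, so this is only a software view of the trade-off.

```c
/* Hypothetical types: the slides do not say how these are defined. */
typedef int data_t;
typedef int coef_t;
typedef int acc_t;

/* Rolled form: a single loop body is reused for all four iterations. */
void fir_rolled(data_t *y, coef_t c[4], data_t x)
{
    static data_t shift_reg[4];
    acc_t acc = 0;
    int i;

    for (i = 3; i >= 0; i--) {
        if (i == 0) {
            acc += x * c[0];
            shift_reg[0] = x;
        } else {
            shift_reg[i] = shift_reg[i - 1];
            acc += shift_reg[i] * c[i];
        }
    }
    *y = acc;
}

/* Unrolled form: each iteration is written out, so every multiply and
 * add is a separate operation that could get its own hardware. */
void fir_unrolled(data_t *y, coef_t c[4], data_t x)
{
    static data_t shift_reg[4];
    acc_t acc = 0;

    shift_reg[3] = shift_reg[2]; acc += shift_reg[3] * c[3]; /* i = 3 */
    shift_reg[2] = shift_reg[1]; acc += shift_reg[2] * c[2]; /* i = 2 */
    shift_reg[1] = shift_reg[0]; acc += shift_reg[1] * c[1]; /* i = 1 */
    acc += x * c[0];             shift_reg[0] = x;           /* i = 0 */

    *y = acc;
}
```

Both functions compute the same result for the same input sequence; only the structure that a synthesis tool would see differs.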

How Does it Work? - Control Extraction
Code, annotated with the control behavior (Finite State Machine (FSM) states 0, 1, 2):
    void fir(
        data_t *y,
        coef_t c[4],
        data_t x
    ) {                                      // Function Start
        static data_t shift_reg[4];
        acc_t acc;
        int i;
        acc = 0;                             // state 0
        loop: for (i = 3; i >= 0; i--) {     // For-Loop Start
            if (i == 0) {                    // state 1
                acc += x * c[0];
                shift_reg[0] = x;
            } else {
                shift_reg[i] = shift_reg[i - 1];
                acc += shift_reg[i] * c[i];
            }
        }                                    // For-Loop End, state 2
        *y = acc;
    }                                        // Function End
From any C code example, the loops in the C code are correlated to states of behavior. This behavior is extracted into a hardware state machine.
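
To make the idea of control extraction more concrete, the following sketch rewrites the same FIR function as an explicit three-state machine in software. It is only an illustrative model: the state names `S_START`, `S_LOOP` and `S_END`, the function name `fir_fsm`, and the `int` typedefs are assumptions for this example, and the RTL actually produced by an HLS tool will differ.

```c
/* Hypothetical types: the slides do not say how these are defined. */
typedef int data_t;
typedef int coef_t;
typedef int acc_t;

/* Illustrative state names for FSM states 0, 1 and 2. */
enum fir_state { S_START, S_LOOP, S_END };

void fir_fsm(data_t *y, coef_t c[4], data_t x)
{
    static data_t shift_reg[4];
    acc_t acc = 0;
    int i = 0;
    int done = 0;
    enum fir_state state = S_START;

    /* Each pass through the while loop models one control step. */
    while (!done) {
        switch (state) {
        case S_START:               /* state 0: initialise */
            acc = 0;
            i = 3;
            state = S_LOOP;
            break;
        case S_LOOP:                /* state 1: one loop iteration */
            if (i == 0) {
                acc += x * c[0];
                shift_reg[0] = x;
            } else {
                shift_reg[i] = shift_reg[i - 1];
                acc += shift_reg[i] * c[i];
            }
            i--;
            /* Loop-exit test decides the next state. */
            state = (i >= 0) ? S_LOOP : S_END;
            break;
        case S_END:                 /* state 2: write the output */
            *y = acc;
            done = 1;
            break;
        }
    }
}
```

In this model the loop counter and its exit test become the transition condition between states, which is the sense in which the loop "correlates" to FSM states.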

How Does it Work? - Datapath Extraction
Code:
    void fir(
        data_t *y,
        coef_t c[4],
        data_t x
    ) {
        static data_t shift_reg[4];
        acc_t acc;
        int i;
        acc = 0;
        loop: for (i = 3; i >= 0; i--) {
            if (i == 0) {
                acc += x * c[0];
                shift_reg[0] = x;
            } else {
                shift_reg[i] = shift_reg[i - 1];
                acc += shift_reg[i] * c[i];
            }
        }
        *y = acc;
    }
From any C code example, operations are extracted: RDx, RDc, >=, -, ==, +, *, WRy.
A unified control dataflow behavior is created (Control & Datapath Behavior), which is the input to Scheduling + Binding.

How Does it Work? - Scheduling & Binding
● Scheduling and Binding are at the heart of HLS
● Scheduling determines in which clock cycle an operation will occur
  – Takes into account the control, dataflow and user directives
  – The allocation of resources can be constrained
● Binding determines which library cell is used for each operation
  – Takes into account component delays, user directives
[Diagram: the Design Source (C, C++, SystemC), the Technology Library and the User Directives feed Scheduling and Binding, which produce RTL (Verilog, VHDL, SystemC)]

How Does it Work? - Scheduling
● Operations are mapped into clock cycles, depending on timing, resources, user directives, ...
    void foo(
        …
        t1 = a * b;
        t2 = c + t1;
        t3 = d * t2;
        out = t3 - e;
    }
[Figure: dataflow graph of foo (a * b, then + c, then * d, then - e, giving out) and two alternative schedules of the operations *, +, *, -. Schedule 1 is the initial schedule; Schedule 2 is labelled "When a faster technology or slower clock..."]
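
The effect of the clock period on a schedule can be sketched with a very small model. The function below walks a linear dependency chain such as the `*`, `+`, `*`, `-` chain in `foo` and chains operations into the same clock cycle for as long as their combined delay fits in the period. The function name `schedule_chain`, the delay values used in the usage note, and the decision not to model multi-cycle operations are all assumptions made for this example; real schedulers also handle parallel operations, resource limits and user directives.

```c
/* Assigns each operation of a linear dependency chain to a clock cycle.
 *
 * delay_ns[i] : combinational delay of operation i (hypothetical values)
 * clock_ns    : target clock period
 * cycle_of[i] : output, the cycle in which operation i is scheduled
 *
 * Returns the number of cycles used, or -1 if an operation is slower
 * than the clock period (multi-cycle operations are not modelled). */
int schedule_chain(const double delay_ns[], int n_ops,
                   double clock_ns, int cycle_of[])
{
    int cycle = 0;
    double used_ns = 0.0;   /* delay already consumed in this cycle */

    for (int i = 0; i < n_ops; i++) {
        if (delay_ns[i] > clock_ns)
            return -1;
        /* Start a new cycle when this operation no longer fits. */
        if (used_ns + delay_ns[i] > clock_ns) {
            cycle++;
            used_ns = 0.0;
        }
        cycle_of[i] = cycle;
        used_ns += delay_ns[i];
    }
    return n_ops > 0 ? cycle + 1 : 0;
}
```

With assumed delays of 6 ns for a multiply and 3 ns for an add or subtract, the chain `*`, `+`, `*`, `-` takes two cycles at a 10 ns clock and a single cycle at a 20 ns clock, which is the kind of change a slower clock or a faster technology produces.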

How Does it Work? - Allocation & Binding
Operations are assigned to functional units available in the library
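
A minimal sketch of what binding means, assuming the operations have already been scheduled: every operation is given a functional unit of the matching kind, and two operations may share a unit only if they run in different cycles. The names `op_kind`, `struct op` and `bind_ops`, and the choice to treat add and subtract as one unit kind, are hypothetical and used only for illustration; a real tool also weighs component delays and user directives.

```c
/* Kinds of functional unit assumed to exist in the library. */
enum op_kind { OP_MUL, OP_ADDSUB, OP_KIND_COUNT };

struct op {
    enum op_kind kind;  /* which kind of unit the operation needs */
    int cycle;          /* clock cycle chosen by the scheduler */
    int unit;           /* output: index of the unit it is bound to */
};

/* Binds each operation to a unit instance and reports, per kind, how
 * many instances are needed. Operations of the same kind in the same
 * cycle get different instances; across cycles they reuse them. */
void bind_ops(struct op ops[], int n_ops, int units_needed[OP_KIND_COUNT])
{
    for (int k = 0; k < OP_KIND_COUNT; k++)
        units_needed[k] = 0;

    for (int i = 0; i < n_ops; i++) {
        int busy = 0;   /* same-kind units already taken in this cycle */
        for (int j = 0; j < i; j++) {
            if (ops[j].kind == ops[i].kind && ops[j].cycle == ops[i].cycle)
                busy++;
        }
        ops[i].unit = busy;
        if (busy + 1 > units_needed[ops[i].kind])
            units_needed[ops[i].kind] = busy + 1;
    }
}
```

For a schedule that places the two multiplications of `foo` in different cycles, this model binds both to the same multiplier, so one multiplier is enough; if both were scheduled in the same cycle, two would be needed.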
