Will FPGA reconfiguration change the synthesis problem? Prof. Dirk Stroobandt

Will FPGA reconfiguration change the synthesis problem? Prof. Dirk Stroobandt Ghent University, Belgium Hardware and Embedded Systems group Universiteit Gent Faculteit Ingenieurswetenschappen Vakgroep Elektronica en Informatiesystemen


slide-1
SLIDE 1 Will FPGA reconfiguration change the synthesis problem?
  • Prof. Dirk Stroobandt
Ghent University, Belgium Hardware and Embedded Systems group Universiteit Gent – Faculteit Ingenieurswetenschappen – Vakgroep Elektronica en Informatiesystemen – 11 December 2015
slide-2
SLIDE 2 Outline
  • What is Parameterized Run-time Reconfiguration?
  • The importance of the parameter choice
  • Effects on logic synthesis
slide-3
SLIDE 3 Outline
  • What is Parameterized Run-time Reconfiguration?
  • The importance of the parameter choice
  • Effects on logic synthesis
slide-4
SLIDE 4 FPGA Run-Time Reconfiguration?
  • Today: configurability on a large time scale
– Prototyping
– System update
– ...
  • We: configurability on a smaller time scale
– Dynamic circuit specialization
  • Frequently changing (regular) inputs vs. infrequently changing parameters
  • Parameters trigger a reconfiguration (through configuration manager)
– Goals:
  • Improve performance
  • Reduce area
  • Minimize design effort
slide-5
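SLIDE 5 Conventional Dynamic Reconfiguration
[Figure: block diagram. Application software on the CPU sends a reconfiguration request to the configuration manager, which takes configuration F1 or F2 from the config. DB and loads it through the configuration interface into the dynamic part of the FPGA, next to the static part.]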
slide-6
SLIDE 6 Conventional Tool Flow
[Figure: parallel tool flows. The static HDL design, F1 HDL, F2 HDL, ... each pass through Synthesis, Tech. Mapping and Place & Route, yielding the Static Config., F1 Config., F2 Config., ... respectively.]
slide-7
SLIDE 7 Dynamic Circuit Specialization
  • Application where part of the input data changes infrequently
– Conventional implementation (no reconfiguration): generic circuit, store data in memory, overwrite memory
– Dynamic circuit specialization: reconfigure with a configuration specialized for the data
  • Example: Adaptive FIR filter (16-tap, 8-bit coefficients)
– 16 x 8 = 128 coefficient bits, so 2^128 possible configurations: not feasible!
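The configuration count follows from the size of the coefficient set: 16 taps of 8 bits each give 128 parameter bits, hence 2^128 distinct specialized configurations. A minimal Python check of that arithmetic (the variable names are illustrative only):

```python
# Number of distinct coefficient sets for the adaptive FIR example.
TAPS = 16              # filter taps, from the slide
BITS_PER_COEFF = 8     # coefficient width in bits, from the slide

param_bits = TAPS * BITS_PER_COEFF      # total number of parameter bits
num_configurations = 2 ** param_bits    # one specialized configuration per coefficient set

print(param_bits)
print(f"{num_configurations:.3e}")
```

Storing one pre-generated configuration per coefficient set is therefore out of the question, whatever the size of a single configuration.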
slide-8
SLIDE 8 Our solution: Parameterized Configuration*
[Figure: a parameterized configuration { 0 1 0 A+B AB A 1 }, in which some bits are Boolean functions of the parameters A and B. Evaluating it for each parameter value gives the specialized configurations:
  A=0, B=0: { 0 1 0 0 0 0 1 }
  A=0, B=1: { 0 1 0 1 0 0 1 }
  A=1, B=0: { 0 1 0 1 0 1 1 }
  A=1, B=1: { 0 1 0 1 1 1 1 }]
* K. Bruneel and D. Stroobandt, “Automatic Generation of Run-time Parameterizable Configurations,” FPL 2008.
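The example on this slide can be reproduced with a few lines of Python. This is only a sketch of the idea, not the tool flow's actual data structure: the list `PARAM_CONFIG` and the function `specialize` are hypothetical names, and a real configuration holds far more bits.

```python
from itertools import product

# Each entry is either a constant bit or a tuning function of the parameters.
# This mirrors the slide's example { 0 1 0 A+B AB A 1 }.
PARAM_CONFIG = [
    0,
    1,
    0,
    lambda a, b: a | b,   # A+B (Boolean OR)
    lambda a, b: a & b,   # AB  (Boolean AND)
    lambda a, b: a,       # A
    1,
]

def specialize(param_config, a, b):
    """Evaluate every tuning function for the given parameter values."""
    return [bit(a, b) if callable(bit) else bit for bit in param_config]

# Enumerate all parameter values and print the specialized configurations.
for a, b in product((0, 1), repeat=2):
    print((a, b), specialize(PARAM_CONFIG, a, b))
```

Only the bits that depend on the parameters need to be recomputed and rewritten when A or B changes; the constant bits stay as they are.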
slide-9
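SLIDE 9 Dynamic Circuit Specialization (micro-reconfiguration)
[Figure: block diagram with application software and configuration manager on the CPU, config. DB, configuration interface, and an FPGA with a static part and a dynamic part holding the FIR filter. A reconfiguration request switches the filter between specialized instances such as FIR(2,8) and FIR(4,9).]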
slide-10
SLIDE 10 Two stage approach
  • Off-line stage:
– In: Generic functionality
  • Specification of the generic functionality
  • Distinction between regular and parameter inputs
– Out: Parameterizable Configuration
  • Software function
  • Outputs specialized configurations for given parameter values
  • On-line stage:
– Evaluate parameterizable configuration
– Out: Specialized Configuration
– Repeat every time parameters change
[Figure: Generic Functionality -> Off-line Stage -> Parameterizable Configuration -> On-line Stage -> Specialized Configuration]
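The split between the two stages can be sketched in Python as follows. Everything here is an illustrative assumption: `offline_stage`, `ConfigurationManager` and the two example tuning functions are hypothetical, and writing the result into the FPGA is left out.

```python
def offline_stage(tuning_functions):
    """Off-line: package the tuning functions as one software function.

    `tuning_functions` is a hypothetical list of callables, one per
    configuration bit, each mapping the parameter tuple to 0 or 1.
    """
    def parameterizable_configuration(params):
        return [f(params) for f in tuning_functions]
    return parameterizable_configuration


class ConfigurationManager:
    """On-line: re-evaluate only when the parameter values change."""

    def __init__(self, parameterizable_configuration):
        self._evaluate = parameterizable_configuration
        self._params = None
        self.current = None          # last specialized configuration
        self.reconfigurations = 0    # number of evaluations performed

    def update(self, params):
        if params != self._params:
            self._params = params
            self.current = self._evaluate(params)
            self.reconfigurations += 1
        return self.current


# Usage: two tuning functions of a two-bit parameter (made up for the example).
pconf = offline_stage([lambda p: p[0] | p[1], lambda p: p[0] & p[1]])
manager = ConfigurationManager(pconf)
manager.update((0, 1))
manager.update((0, 1))   # same parameters: no new evaluation
manager.update((1, 1))
print(manager.current, manager.reconfigurations)
```

The expensive work (synthesis, mapping, place and route) sits entirely in the off-line stage; the on-line stage only evaluates Boolean functions.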
slide-11
SLIDE 11 Param. Configuration Tool Flow
[Figure: Param. HDL -> Synthesis* -> Tech. Mapping* -> Place* & Route* -> Param. Config.]
  • Tunable truth table bits
– Adapted Tech. Mapper: TMAP
– Map to Tunable LUTs (TLUTs)
– [FPL2008], [ReConFig2008], [DATE2009]
  • Tunable routing bits
– Adapted Tech. Mapper
– Adapted Placer
– Adapted Router
slide-12
SLIDE 12 Outline
  • What is Parameterized Run-time Reconfiguration?
  • The importance of the parameter choice
  • Effects on logic synthesis
slide-13
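SLIDE 13 Parameterizable HDL design
    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.std_logic_unsigned.all;  -- provides conv_integer

    entity multiplexer is
      port(
        --BEGIN PARAM
        sel  : in  std_logic_vector(2 downto 0);
        --END PARAM
        -- named din/dout because `in` and `out` are reserved words in VHDL;
        -- they correspond to in0..in7 and out in the figures
        din  : in  std_logic_vector(7 downto 0);
        dout : out std_logic
      );
    end multiplexer;

    architecture behaviour of multiplexer is
    begin
      dout <= din(conv_integer(sel));
    end behaviour;
[Figure: 8-to-1 multiplexer with data inputs in0..in7, select inputs sel0..sel2 and output out]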
slide-14
SLIDE 14 Synthesis*
Two types of inputs:
  • Regular inputs
  • Parameter inputs
[Figure: gate-level network of the 8-to-1 multiplexer (gates labelled A and O), with regular inputs in0..in7, parameter inputs sel0..sel2 and output out]
slide-15
SLIDE 15 Conventional technology mapping
  • Tech. Mapping: search for a covering of the input circuit with K-input subcircuits.
  • K-input LUT (K=3): can implement any Boolean function with up to K arguments.
[Figure: gate-level network of the 8-to-1 multiplexer (inputs in0..in7, sel0..sel2, output out) covered with 3-input subcircuits]
slide-16
SLIDE 16 TMAP: Tunable LUT mapping
  • Tunable LUT (TLUT): can implement any Boolean function with K regular inputs and any number of parameter inputs.
  • Search for a covering with subcircuits that have up to K regular inputs and any number of parameter inputs.
[Figure: gate-level network of the 8-to-1 multiplexer (inputs in0..in7, sel0..sel2, output out) covered with TLUT subcircuits]
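To make the TLUT idea concrete, the sketch below specializes one subcircuit with three regular inputs and two parameter inputs. As an ordinary LUT function it would need five inputs, which exceeds K=3; as a TLUT it fits, because the parameters only select which truth table is loaded. The subcircuit and all function names are hypothetical choices for illustration, not output of TMAP.

```python
from itertools import product

K = 3  # regular inputs per LUT, as on the slides

def subcircuit(in0, in1, x, sel0, sel1):
    """Hypothetical subcircuit: a 2:1 mux on (in0, in1) feeding a 2:1 mux with x."""
    lower = in1 if sel0 else in0
    return x if sel1 else lower

def specialize_tlut(func, k, params):
    """Return the 2**k truth-table bits of `func` for fixed parameter values."""
    return [func(*regular, *params) for regular in product((0, 1), repeat=k)]

def lut_lookup(table, regular):
    """Read a LUT: the regular inputs form the address, most significant bit first."""
    address = 0
    for bit in regular:
        address = (address << 1) | bit
    return table[address]

# One truth table per parameter value (sel0, sel1).
for params in product((0, 1), repeat=2):
    print(params, specialize_tlut(subcircuit, K, params))
```

Each parameter value yields an ordinary 8-bit truth table for a 3-input LUT; the per-bit dependence on `sel0` and `sel1` is what the tuning functions capture.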
slide-17
SLIDE 17 LUT structure and functionality
[Equation: Boolean expressions defining the LUT functions in terms of the in, sel and L signals]
[Figure: LUT network with LUTs L0, L1, L3, L4, L5, inputs in0..in7, sel0..sel2 and output out]
slide-18
SLIDE 18 Place and Route
[Figure: placed and routed LUT network with LUTs L0, L1, L3, L4, L5, inputs in0..in7, sel0..sel2 and output out]
slide-19
SLIDE 19 Experiment: 16-tap FIR, 8-bit coefficients

                     Generic    Specialized    Parameterizable configuration
area (LUTs)          2999       1146           1301 (-56%)
clock freq. (MHz)    84         119            115 (+37%)
gen. time (ms)       -          35634          0.166
memory (kB)          -          2^128 conf.    29

  • Less area (-56%)
– More functionality in one TLUT
– Functionality is moved to the tuning functions
  • Higher clock frequency (+37%)
– Fewer LUTs can be placed closer together
– Less congestion because there are fewer nets
  • Reduced generation time (5 orders of magnitude)
– No NP-hard problems (place and route) at run-time
– Only evaluation of the tuning functions
  • Less memory (only 29 kB)
– TMAP flow finds similarity between configurations
– Compressed form of all configurations
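The percentages quoted on this slide can be re-derived from the raw numbers. The sketch below reads 2999 LUTs / 84 MHz as the generic design, 1301 LUTs / 115 MHz as the parameterizable configuration, and 35634 ms versus 0.166 ms as the generation times of a conventionally specialized and a parameterizable configuration; that reading of the table is an assumption based on the quoted -56% and +37%.

```python
import math

generic_luts, param_luts = 2999, 1301
generic_mhz, param_mhz = 84, 115
conventional_gen_ms, param_gen_ms = 35634, 0.166

area_change = (param_luts - generic_luts) / generic_luts    # relative area change
clock_change = (param_mhz - generic_mhz) / generic_mhz      # relative clock change
gen_speedup = conventional_gen_ms / param_gen_ms            # generation-time ratio

print(f"area: {area_change:+.1%}")
print(f"clock: {clock_change:+.1%}")
print(f"generation speed-up: about 10^{math.floor(math.log10(gen_speedup))}")
```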
slide-20
SLIDE 20 When should we use parameterized reconfiguration?
Use the functional density as a measure for implementation efficiency:*
$$\text{Functional density} = \frac{N}{A \cdot T}$$
  • A: the area needed
  • T: the total execution time
  • N: the number of operations
* A. M. DeHon, Reconfigurable architectures for general-purpose computing, Massachusetts Institute of Technology, 1996.
slide-21
SLIDE 21 Parameter Selection
[Figure: functional density (Ops/s/LUTs) versus average time between parameter changes (clock cycles)]
  • Profiler to trade off gain versus overhead of reconfiguration
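The trade-off can be sketched with the functional density N / (A * T), where N is the number of operations, A the area and T the total execution time including reconfiguration overhead. The model below is a rough illustration under loud assumptions: one operation per clock cycle, the FIR numbers from the experiment slide, and a per-change overhead equal to the 0.166 ms generation time only (the time to write the configuration into the FPGA is ignored). The function names are hypothetical.

```python
def functional_density(ops, area_luts, total_time_s):
    """Functional density = N / (A * T), in operations per second per LUT."""
    return ops / (area_luts * total_time_s)

def density_with_reconfiguration(cycles_between_changes, area_luts, clock_hz, overhead_s):
    """Density when every parameter change costs `overhead_s` seconds.

    Assumes one operation per clock cycle (an assumption, not from the slides).
    """
    ops = cycles_between_changes
    total_time = cycles_between_changes / clock_hz + overhead_s
    return functional_density(ops, area_luts, total_time)

# Generic circuit: no reconfiguration, 84 MHz on 2999 LUTs, observed for one second.
GENERIC = functional_density(ops=84e6, area_luts=2999, total_time_s=1.0)

# Specialized circuit: 115 MHz on 1301 LUTs, reconfigured at every parameter change.
for cycles in (10**3, 10**5, 10**7):
    d = density_with_reconfiguration(cycles, area_luts=1301,
                                     clock_hz=115e6, overhead_s=0.166e-3)
    print(cycles, round(d), d > GENERIC)
```

Under these assumptions the specialized circuit loses when parameters change every thousand cycles and wins once they stay stable for around a hundred thousand cycles or more, which is the kind of break-even point a profiler would look for.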
slide-22
SLIDE 22 Outline
  • What is Parameterized Run-time Reconfiguration?
  • The importance of the parameter choice
  • Effects on logic synthesis
slide-23
SLIDE 23 Original logic synthesis solution (3-input LUT)
[Figure: gate-level network of the 8-to-1 multiplexer (gates labelled A and O), with inputs in0..in7, sel0..sel2 and output out]
slide-24
SLIDE 24 Making subtrees according to K regular inputs
[Figure: the multiplexer network (gates labelled A and O, inputs in0..in7, sel0..sel2, output out) regrouped into subtrees with up to K regular inputs each]
slide-25
SLIDE 25 Separate parameters from other inputs
[Figure: the multiplexer network (gates labelled A and O) redrawn with the parameter inputs (sel) separated from the regular inputs in0..in7; output out]
slide-26
SLIDE 26 Changing the tree depth
[Figure: the multiplexer network (gates labelled A and O, regular inputs in0..in7, parameter inputs sel, output out) restructured with a different tree depth]
slide-27
SLIDE 27 Conclusions
  • Parameterized reconfiguration opens up new optimization possibilities using run-time reconfiguration
  • Parameters are to be treated differently in technology mapping
  • Therefore parameters and regular inputs should be treated differently in logic synthesis
  • Cost of parameter calculations (Boolean functions) should also be taken into account
  • New challenge in synthesis
slide-28
SLIDE 28 Submit to IWLS
  • Paper abstract submission: March 11, 2016
  • www.iwls.org
slide-29
SLIDE 29 Last slide
  • Much of this work was done in the framework of the EU-FP7 project FASTER and is now continued in the EU-H2020 project (FETHPC) EXTRA
  • Tools at https://github.com/UGent-HES/tlut_flow
  • Questions?
  • More information: http://hes.elis.ugent.be/