Synthesis Challenges for Next-Generation High-Performance and High-Density PLDs
Jason Cong, Songjie Xu
Department of Computer Science
Slide 2
Outline
- Introduction
- Synthesis Challenges for New Architectures
- Synthesis Challenges for High Density and High Performance
- Concluding Remarks
Slide 3
PLD Industry Growth
- Enjoyed the same exponential growth as the rest of the semiconductor industry
- At an even faster rate
[Figure: annual growth rate (1994-1998) by company/industry. Bar labels: Semiconductor Industry, Altera, Intel, LSI Logic. Plotted values: 27.78%, 36.07%, 24.50%, 15.71%. Axis range: 0% to 40%.]
Introduction
Slide 4
Definitions
- PLD (Programmable Logic Device)
  - CPLD (Complex PLD)
    - Extensions of early PAL
    - Consist of PLA-like blocks
    - Macrocell
  - FPGA (Field Programmable Gate Array)
    - Typically based on look-up tables (LUTs)
    - Multiple LUTs form a programmable logic block (PLB)
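As a rough illustration of the LUT definition above, a K-input LUT can be modeled as a 2^K-entry truth table addressed by its inputs. The helper names `program_lut` and `eval_lut` are hypothetical and are not taken from any vendor tool; this is a sketch, not a description of a real device's configuration format.

```python
from itertools import product


def program_lut(func, k):
    """Build the 2**k-entry truth table of `func`, one entry per input pattern."""
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=k)]


def eval_lut(table, bits):
    """Look up the output for `bits` (most significant input first)."""
    index = 0
    for b in bits:
        index = (index << 1) | b
    return table[index]


# Hypothetical example: a 4-LUT configured as a 4-input XOR (parity).
xor4 = program_lut(lambda a, b, c, d: a ^ b ^ c ^ d, 4)
print(len(xor4))                     # number of configuration bits for K = 4
print(eval_lut(xor4, (1, 0, 1, 1)))  # parity of the inputs 1, 0, 1, 1
```

Because the table has one entry per input pattern, the same K-LUT can realize any function of K inputs; only the stored bits change.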
Slide 5
CPLD
- Example: Altera MAX 7000
Slide 6
Macrocell
- Example: Altera MAX 7000
  - Each macrocell has a logic array, a product-term select matrix, and a programmable register
Slide 8
FPGA
- Example: Xilinx XC 4000
Slide 9
PLB
- Xilinx XC 4000
  - Each PLB has two 4-LUTs, one 3-LUT, and two FFs
Slide 10
Advance of PLD Architectures
1980’s
- Altera MAX 5000: 32-192 P-terms; 600-3,750 usable gates
- Xilinx XC 2000: 64-100 LUTs; 1,200-1,800 logic gates
1998/1999
- Altera APEX 20K: 51,840 logic elements (LUTs); 442,368 RAM bits; 3,456 P-term macrocells; 60,000-1.5M usable gates
- Xilinx Virtex: 58K-4M system gates; 1Mb distributed RAM; 832Kb embedded memory
Slide 11
PLD Synthesis Tends to Fall Behind ...
- Additional features and capabilities in a new architecture often place new requirements on synthesis tools
- Higher density and higher performance demand better scalability and more efficient optimization
- The devil is always in the software …
  - Tool effort is often underestimated
  - Quick customization from an ASIC or existing PLD synthesis tool leads to considerably inferior results
  - Software is often the bottleneck of a new PLD product release ...
Slide 12
Challenges to PLD Synthesis
- Support for new PLD architectures
  - Hierarchical architectures
  - Heterogeneous architectures
- Support for high-performance and high-density PLD designs
  - Layout-driven synthesis
  - Incremental synthesis
  - IP-based synthesis
Slide 13
Outline
- Introduction
- Synthesis Challenges for New Architectures
- Synthesis Challenges for High Density and High Performance
- Concluding Remarks
Slide 14
PLD Architecture Development
- Two important trends
  - Hierarchical architectures
  - Heterogeneous architectures
- Synthesis needs
Synthesis Challenges for New Architectures
Slide 15
PLD Architecture Development Trend: Hierarchical Architectures
- Basic idea
  - Group basic logic blocks into clusters
  - Fast local programmable interconnects inside clusters
  - May have multiple levels of hierarchy
- Benefits
  - Exploits the inherent locality of interconnections in most applications
  - Leads to improvement in both performance and density
Slide 16
Example Hierarchical Architectures
- Altera FLEX 10K
  - Each LAB has 8 LEs
  - Each LE has a 4-LUT and a programmable register
Slide 17
Two Types of Clusters
- Hard-wired connection based cluster (HCC)
  - Intra-cluster connection is formed by hard wires
  - e.g. CLB in XC4000
- Programmable interconnection based cluster (PIC)
  - Intra-cluster connection is formed by a local programmable interconnection array
  - e.g. LAB in FLEX 10K and APEX 20K
Slide 18
Existing Synthesis Results for HCC
- Traditional approach
  - Map into LUTs and then combine the LUTs to form HCCs in a heuristic post-processing step
- Recent advance [Cong & Hwang, FPGA’97]
  - Use Boolean matching techniques to completely characterize the set of functions that can be implemented in an HCC
  - Map a netlist directly into HCCs
Slide 19
Hard-Wired Connection Based Clusters (HCCs)
- Example: Xilinx XC 4000 CLB
  - Each CLB has two 4-LUTs connected to a 3-LUT
Slide 20
Example: Boolean Matching for XC4K CLB
- Characterization based on functional decomposition
  - f(X) = H(F(X1), G(X2))
  - f(X) = H(F(X1), G(X2), x)
  - f(X) = H(F(X1, x), G(X2), x)
  - f(X) = H(F(X1, x), G(X2, x), x)
- Conditions
  - F and G input sizes ≤ 4
- Result: matched all “difficult examples” (over 1,700) from Xilinx
  - Best known tool produced only about 70% match
[Figure: XC4K CLB with LUTs F, G, and H, input x, and output f(X)]
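The decomposition form f(X) = H(F(X1), G(X2), x) listed for the XC4K CLB can be sketched as two 4-LUTs feeding a 3-LUT. This is only an illustrative model, assuming each LUT is a plain truth table; the function names are hypothetical, and the code does not implement the Boolean matching algorithm of [Cong & Hwang, FPGA’97]. It merely checks by exhaustive simulation that one chosen 9-input function fits the template.

```python
from itertools import product


def lut(table, bits):
    """Evaluate a LUT given as a list of 2**k outputs (most significant input first)."""
    index = 0
    for b in bits:
        index = (index << 1) | b
    return table[index]


def clb(f_table, g_table, h_table, x1, x2, x):
    """Model f(X) = H(F(X1), G(X2), x): two 4-LUTs (F, G) feeding a 3-LUT (H)."""
    return lut(h_table, (lut(f_table, x1), lut(g_table, x2), x))


# Hypothetical example: a 9-input AND realized in a single CLB.
and4 = [0] * 15 + [1]  # 4-input AND truth table
and3 = [0] * 7 + [1]   # 3-input AND truth table

fits = all(
    clb(and4, and4, and3, bits[:4], bits[4:8], bits[8]) == int(all(bits))
    for bits in product((0, 1), repeat=9)
)
print(fits)
```

The exhaustive loop covers all 512 input patterns, which is practical here only because the CLB has at most nine inputs.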
Slide 21
Example: Mapping to XC4K CLB
- Given a function f(0,1,2,3,4,5) where
    a = 1’ + 3, b = 1 + 3
    f = 0’245b’ + 0’245’b + 0’145b + 012’5’a + 0’2’4’5a + 025b + 0’2’5’a’ + 045a’ + 05’b’
- How many XC4K CLBs are needed to implement f(0,1,2,3,4,5)?
Slide 22
Example: Mapping to XC4K CLB (Cont’d)
Mapping       Packing      #CLBs   #Levels
Chortle-crf   simple       9       4
FlowMap       simple       8       3
FlowMap       functional   6       3
Boolean       -            1       1
[Figure: the Boolean matching result, a single CLB (LUTs F, G, H) with input labels 3, 1, 2, 5, 4]
Slide 23
Programmable Interconnection Based Cluster (PIC)
- Example: Altera APEX 20K
  - Each LAB has 10 LEs (LUT + FF) connected through a fully programmable matrix
Slide 24
Existing Synthesis Results for PIC
- Common approaches
  - Map into basic logic blocks and then group them into clusters under size and pin constraints
  - Recent progress on circuit clustering
    - Performance-driven clustering for combinational circuits [Lawler’69] [Yang & Wong, T-CAD’97]
    - Simultaneous clustering with retiming for sequential circuits [Pan, et al, T-CAD’98] [Cong, et al, DAC’99]
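The "size and pin constraints" mentioned above can be made concrete with a small legality check. This is a minimal sketch, not any of the cited algorithms: it assumes each signal is named after the block that drives it, counts only cluster input pins (output pins are ignored), and uses a hypothetical function name and netlist.

```python
def cluster_is_legal(cluster, fanins, max_size, max_pins):
    """Check a candidate cluster against size and input-pin limits.

    cluster  -- set of block names placed in the cluster
    fanins   -- dict mapping each block to the list of signals it reads
    A signal needs a cluster input pin when its driver is outside the cluster.
    """
    if len(cluster) > max_size:
        return False
    external_inputs = {
        sig for blk in cluster for sig in fanins[blk] if sig not in cluster
    }
    return len(external_inputs) <= max_pins


# Hypothetical netlist: u1 -> u2 -> u3, with primary inputs a..e.
fanins = {"u1": ["a", "b"], "u2": ["u1", "c"], "u3": ["u2", "d", "e"]}

print(cluster_is_legal({"u1", "u2"}, fanins, max_size=2, max_pins=3))
print(cluster_is_legal({"u1", "u2", "u3"}, fanins, max_size=3, max_pins=4))
```

Grouping u1 and u2 absorbs the u1 signal inside the cluster, so only a, b, and c consume input pins; adding u3 brings in d and e and exceeds a four-pin limit.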
Slide 25
Benefits of Considering Retiming during Clustering
- Proper clustering allows retiming to hide inter-cluster delays (e.g., assume gate_delay = 1, inter_cluster_delay = 2)
[Figure: Clustering A (Φ=8) and Clustering B (Φ=8) with the same cut size. After retiming, one stays at Φ=8 (retiming cannot help) and the other improves to Φ=6 (retiming reduces delay).]
Slide 26
Major Challenge in Synthesis for Hierarchical Architectures
- Can we synthesize a design directly into a multi-level hierarchical architecture?
  - Most existing PLD synthesis algorithms transform a given design into a flat netlist of basic PLBs and then go through a separate clustering/partitioning step
  - Very few consider synthesizing directly for hierarchical architectures
Slide 27
PLD Architecture Development Trend: Heterogeneous Architectures
- Three types of heterogeneous architectures
  - Type 1: Multiple sizes and/or configurations of the same type of logic blocks
    - e.g. ORCA 2C, VF1, XC4000
  - Type 2: Multiple types of logic blocks
    - LUTs, macrocells, and MUXes
    - e.g. APEX 20K
  - Type 3: Different kinds of resources on the same chip
    - Programmable logic blocks
    - Embedded memory blocks (EMBs)
    - Embedded processors
Slide 28
Type 1 Heterogeneous Architectures
- Example: Xilinx XC 4000
  - Each CLB can implement two 4-LUTs or one 5-LUT
Slide 29
Synthesis Results for Type 1 Heterogeneous Architectures
- Area minimization
  - [He & Rose, FPGA’94]
  - [Korupolu, et al, DAC’98]
  - [Cong, Ding & Wu, FPGA’99]
- Delay minimization
  - HeteroMap [Cong & Xu, DAC’98]
    - Delay-optimal polynomial-time algorithm
- Evaluation results show
  - Heterogeneous architectures are superior to homogeneous ones for both area and delay
  - “One size fits all” doesn’t produce the best results
Architecture Evaluation: Homogeneous vs. Heterogeneous FPGAs
[Figure: mapping delay and memory-cell area for 3-LUT, 4-LUT, 5-LUT, 6-LUT, and 3-4-5-6-LUT heterogeneous FPGAs]
Delay(3-LUT) : Delay(4-LUT) : Delay(5-LUT) : Delay(6-LUT) = 1 : 1.3 : 1.7 : 2
Area(3-LUT) : Area(4-LUT) : Area(5-LUT) : Area(6-LUT) = 1 : 2 : 4 : 8
“AT^2-Metric” for Homogeneous and Heterogeneous FPGAs
Delay(3-LUT) : Delay(4-LUT) : Delay(5-LUT) : Delay(6-LUT) = 1 : 1.3 : 1.7 : 2
Area(3-LUT) : Area(4-LUT) : Area(5-LUT) : Area(6-LUT) = 1 : r : r^2 : r^3
[Figure: Area x Delay x Delay versus r (from 1 to 3) for 3-LUT, 4-LUT, 5-LUT, 6-LUT, and 3-4-5-6-LUT FPGAs]
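The area model and the Area x Delay x Delay metric above can be written out as a short calculation. This is a sketch under stated assumptions: the LUT counts and the delay value in the example are hypothetical and are not results from the evaluation, and the function names are illustrative only.

```python
def lut_area(k, r):
    """Relative area of a k-LUT: Area(3) : Area(4) : Area(5) : Area(6) = 1 : r : r^2 : r^3."""
    return r ** (k - 3)


def total_area(lut_counts, r):
    """Sum the relative area of a mapped design; lut_counts maps LUT size -> count."""
    return sum(n * lut_area(k, r) for k, n in lut_counts.items())


def at2(lut_counts, delay, r):
    """Area x Delay x Delay for a mapped design with the given critical-path delay."""
    return total_area(lut_counts, r) * delay ** 2


# r = 2 reproduces the 1 : 2 : 4 : 8 area ratio used in the first comparison.
print([lut_area(k, 2) for k in (3, 4, 5, 6)])

# Hypothetical mapped design: 100 4-LUTs with a critical-path delay of 6.5 units.
print(at2({4: 100}, delay=6.5, r=2))
```

Sweeping `r` from 1 to 3 with mapped LUT counts for each architecture would give one curve per architecture, which is the shape of the comparison plotted on the slide.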
Slide 32
Type 2 Heterogeneous Architectures
- An example: Altera APEX 20K
  - Embedded system blocks (ESBs) can implement dual-port RAM, ROM, FIFO, CAM blocks, and P-term logic
  - In P-term mode, each ESB has 16 macrocells
  - Each macrocell has two P-terms
Slide 33
Synthesis for Type 2 Heterogeneous Architectures
- Very little work
- Preliminary study for a hybrid architecture of LUTs and Pterm blocks [Kaviani, Ph.D. thesis’99]
  - Use a greedy approach for hybrid mapping
  - Use LUTs for density optimization
  - Use Pterm blocks for performance optimization
Slide 34
Type 3 Heterogeneous Architectures
- An example: FLEX 10K (logic array + embedded memory blocks (EMBs))
  - 576 to 12,160 LEs
  - 3 to 20 embedded array blocks (EABs)
  - Each EAB has 2K bits (11x1, 10x2, 9x4, 8x8)
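The EAB configurations listed above can be checked with a few lines of arithmetic. The sketch assumes that "11x1" means 11 address bits by 1 data bit (and likewise for the other entries), which is an interpretation of the slide's notation; the helper names are hypothetical.

```python
EAB_BITS = 2048  # 2K bits per EAB
EAB_CONFIGS = [(11, 1), (10, 2), (9, 4), (8, 8)]  # (address bits, data width), assumed reading


def eab_capacity(addr_bits, width):
    """Number of stored bits for a memory with 2**addr_bits words of `width` bits."""
    return (2 ** addr_bits) * width


def fitting_configs(num_inputs, num_outputs):
    """Configurations in which one EAB, used as a ROM, can hold a logic function
    with the given number of inputs and outputs."""
    return [(a, w) for a, w in EAB_CONFIGS if num_inputs <= a and num_outputs <= w]


for a, w in EAB_CONFIGS:
    print(f"{a}x{w}: {eab_capacity(a, w)} bits")

print(fitting_configs(9, 3))   # a 9-input, 3-output function
print(fitting_configs(12, 1))  # too many inputs for a single EAB
```

Every configuration stores the same 2,048 bits; trading address bits for data width is what lets an unused memory block absorb multi-output logic.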
Slide 35
Field-Programmable System-on-a-Chip (FPSOC)
[Figure: general-purpose FPSOC (processor, memory, programmable logic) and application-specific FPSOC (processor, memory, programmable logic, ASIC)]
Slide 36
Synthesis for Type 3 Heterogeneous Architectures
- Explore logic implementation using EMBs
  - Area minimization
    - EMB_Pack [Cong & Xu, FPGA’98]
  - With delay constraint
    - SMAP [Wilton, FPGA’98]
  - Delay minimization
    - [Cong & Xu, ICCAD’98]
- The general synthesis problem for FPSOC is largely untouched
Slide 37
Synthesis Needs for FP-SOC
- Partition the design/application onto heterogeneous resources, e.g.
  - Software/hardware partitioning
  - Memory/logic partitioning
- Efficient use of each type of resource, e.g.
  - Code generation for embedded CPUs
  - Automatic synthesis for FPGA
- Scheduling and synchronization of various components, e.g.
  - Real-time O/S
- Trade-off between heterogeneous resources
- Support for IP integration
Slide 38
Outline
- Introduction
- Synthesis Challenges for New Architectures
- Synthesis Challenges for High Density and High Performance
- Concluding Remarks
Slide 39
Important Synthesis Problems
- Layout-driven synthesis
- Incremental synthesis
- IP-based design
Synthesis Challenges for High Density and High Performance
Slide 40
Layout-Driven Synthesis
- Scaling of IC feature size [NTRS’97]
  - Interconnect delay becomes more and more dominant in the overall circuit delay
- FPGA design
  - Interconnect delay has always been very significant (due to programmable switches)
- Layout design has a significant impact on performance
- Synthesis needs to consider its impact on layout
Slide 41
Logic vs. Local Interconnect vs. Global Interconnect Delay
Delay Resource         Delay Value (ns)
Logic Element (LE)     2.4
Local Interconnect     0.5
Row Interconnect       4.7
Column Interconnect    7.2
(Altera FLEX8K part)
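To see how the FLEX8K delay values above add up along a path, the following sketch sums them for a made-up path and reports the share contributed by logic, local interconnect, and global (row and column) interconnect. The path itself is hypothetical and is not a measured design; the function names are illustrative.

```python
DELAY_NS = {"LE": 2.4, "local": 0.5, "row": 4.7, "column": 7.2}  # values from the slide
CATEGORY = {"LE": "logic", "local": "local", "row": "global", "column": "global"}


def path_delay(path):
    """Total delay in ns of a path given as a sequence of resource names."""
    return sum(DELAY_NS[res] for res in path)


def delay_shares(path):
    """Fraction of the path delay spent in logic, local, and global resources."""
    total = path_delay(path)
    shares = {"logic": 0.0, "local": 0.0, "global": 0.0}
    for res in path:
        shares[CATEGORY[res]] += DELAY_NS[res] / total
    return shares


# Hypothetical path: four LEs joined by one local, one row, and one column hop.
path = ["LE", "local", "LE", "row", "LE", "column", "LE"]
print(round(path_delay(path), 1), "ns")
for kind, share in delay_shares(path).items():
    print(f"{kind}: {share:.0%}")
```

Even with only two global hops, row and column interconnect account for more than half of this path's delay, which is why layout matters so much to timing.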
Slide 42
Delay Distribution
- Logic: 30%
- Local interconnect: 9%
- Global interconnect: 61%
Slide 43
Challenges and Opportunities for Layout-Driven Synthesis
- Challenges:
  - Interconnect design is not finalized until after placement and routing
  - Both synthesis and layout are highly complex. How can they be properly combined without a complexity explosion?
- Opportunities: substantial performance gain
  - Example: Mapping with consideration of fast interconnections (cascade chains)
Slide 44
Comparison between FlowMap and Fast Interconnection Mapping
[Figure: normalized mapping delay and #4-LUT for FlowMap (K=4) vs. Fast Interconnect Mapping; annotations: -34% (mapping delay), +24% (#4-LUT)]
- Delay assumption:
  - 4-LUT fast pin delay = 0.7 ns
  - 4-LUT slow pin delay = 2.7 ns
  - Fast interconnect delay = 0.2 ns
  - General interconnect delay = 4.1 ns
- LUT fast interconnect is connected to the fast pin
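The delay assumptions above translate into a simple arrival-time rule at a LUT output. The sketch below is not the mapping algorithm itself; it assumes that a signal routed on general interconnect enters through the slow pin (the slide only states that the fast interconnect uses the fast pin), and the function names are hypothetical.

```python
FAST_PIN_NS, SLOW_PIN_NS = 0.7, 2.7      # 4-LUT pin delays from the slide
FAST_WIRE_NS, GENERAL_WIRE_NS = 0.2, 4.1  # interconnect delays from the slide


def input_delay(uses_fast_interconnect):
    """Wire plus pin delay for one LUT input.

    Assumption: general interconnect drives the slow pin.
    """
    if uses_fast_interconnect:
        return FAST_WIRE_NS + FAST_PIN_NS
    return GENERAL_WIRE_NS + SLOW_PIN_NS


def lut_output_arrival(inputs):
    """Arrival time at the LUT output; `inputs` is a list of
    (source_arrival_ns, uses_fast_interconnect) pairs."""
    return max(t + input_delay(fast) for t, fast in inputs)


# Hypothetical LUT: the late signal (10.0 ns) uses the fast path, the early one does not.
print(round(lut_output_arrival([(10.0, True), (3.0, False)]), 1))
# Same signals with the late one routed on general interconnect instead.
print(round(lut_output_arrival([(10.0, False), (3.0, False)]), 1))
```

Putting the critical signal on the fast interconnect costs 0.9 ns for that hop instead of 6.8 ns, which is the kind of gain a mapper can exploit when it knows about the fast connections.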
Slide 45
Comparison between FlowMap and Fast Interconnection Mapping (Cont’d)
[Figure: normalized mapping delay and #4-LUT for FlowMap (K=4) vs. Fast Interconnect Postprocessing; annotations: -21% (mapping delay), +0% (#4-LUT)]
- Delay assumption:
  - 4-LUT fast pin delay = 0.7 ns
  - 4-LUT slow pin delay = 2.7 ns
  - Fast interconnect delay = 0.2 ns
  - General interconnect delay = 4.1 ns
- LUT fast interconnect is connected to the fast pin
Slide 46
Incremental Synthesis
- Motivation
  - PLD designs are getting more complex
  - Every design process is iterative/incremental
  - Resynthesizing an entire large design is not acceptable given multiple design iterations
  - The highly incremental design process requires fast incremental synthesis capabilities
Slide 47
Requirements on Incremental Synthesis
- Preservability
  - Preserve as much information as possible from the existing synthesis solution
- Efficiency
  - A faster synthesis system will enable more design iterations and shorten the overall design time
- Quality of the synthesis solution
  - Delay, area, etc. should be as close as possible to those obtained by complete re-synthesis
Slide 48
Status on Incremental Synthesis
- Very little work
  - ECO [Kukimoto & Fujita, ICCAD’92]
    - No structural change is allowed
    - Only functional change is allowed
  - Incremental mapping [Cong & Hui, DAC’2000]
    - Preserves optimal mapping depth
    - Achieves over 300X speed-up for circuits of about 100,000 gates compared to re-mapping by FlowMap
- Much more work is needed in this area
Slide 49
IP-Based Design
- Motivation
  - Design reuse to improve productivity
  - Better performance and density
- Example:
  - Altera IP MegaStore
Slide 50
Slide 51
Requirements on IP-Based Design
- IP representation: should allow migration between
  - Different FPGA vendors
  - Different FPGA generations
- Characterization
  - Functionality
  - Performance
- Interface with synthesis tools
  - Automatic inference/instantiation
  - Optimization and constraint propagation
  - Simulation and verification
- IP protection
  - How to prevent unauthorized use?
  - E.g. embed watermarks in FPGA mapping solutions [Kirovski, et al, ICCAD’98]
Slide 52
Outline
- Introduction
- Synthesis Challenges for New Architectures
- Synthesis Challenges for High Density and High Performance
- Concluding Remarks
Slide 53
Concluding Remarks
- The PLD market is going through a rapid expansion
- PLD synthesis is facing many new challenges
  - Support for new PLD architectures
    - Hierarchical architectures
    - Heterogeneous architectures
  - Support for high-performance and high-density PLD designs
    - Layout-driven synthesis
    - Incremental synthesis
    - IP-based synthesis
- Many research and business opportunities
  - UCLA VLSI CAD Laboratory
  - Aplus Design Technologies, Inc.
Concluding Remarks
Slide 54
PLD Synthesis Research at UCLA
- Advanced synthesis algorithms
  - Synthesis for heterogeneous architectures
  - Synthesis for sequential circuits with simultaneous mapping, retiming, and pipelining
  - Layout-driven synthesis
  - IP-based synthesis
  - Synthesis/compilation techniques for FPSOC …
  - Software prototype: RASP system
- Architecture evaluation
  - Evaluation of PLB architecture
  - Evaluation of heterogeneous architectures
  - Evaluation of hierarchical architectures …
  - Software prototype: fpgaEva tool
- URL: http://cadlab.cs.ucla.edu/~xfpga
Slide 55
UCLA RASP Synthesis System for LUT-Based FPGAs
[Figure: RASP flow. EDIF netlist / HDL design -> internal netlist -> LUT mapping engine -> LUT netlist -> PLB mapping engine -> vendor-specific netlist (Xilinx, Altera, ORCA) -> placement and routing -> chip programming information]
Slide 56
FPGA Architecture Evaluation
Slide 57
Aplus Design Technologies, Inc.
- A new start-up in PLD synthesis
  - Based in Los Angeles (near UCLA)
- Objective: provide Advanced Programmable Logic Unified Solution (APLUS)
  - Unify architecture and synthesis
  - Unify synthesis and layout
- Products & Services
  - Next-generation synthesis tool for high-density, high-performance PLDs
  - Architecture evaluation tool kits and services
- Has already established strategic partnerships with several major PLD vendors
- URL: http://www.aplus-dt.com
THANK YOU!
J. Cong and S. Xu
Slide 59