PART V: New Approaches to Harness Global Interconnects
Jason Cong
Computer Science Department, University of California at Los Angeles
Email: cong@cs.ucla.edu
Tel: 310-206-2775
http://cadlab.cs.ucla.edu/~cong
ASPDAC'01 Tutorial

Part V Outline
- Interconnect-Centric Design Flow
- Interconnect Performance Estimation Models
  - IPEM for optimal wiresizing
  - IPEM for wiresizing and buffer insertion
- Interconnect Planning
  - Physical hierarchy generation
  - Floorplan/coarse placement with interconnect planning
  - Interconnect architecture planning
- Concluding Remarks

Clock Cycles Required for Traveling a 2cm Line under BIWS (Buffer Insertion and Wire Sizing)
[Figure: clock cycles (1 to 5) versus technology generation (0.25, 0.18, 0.13, 0.10, 0.07 um) for 1 GHz, 3 GHz, and 5 GHz clocks]
- Estimated by IPEM on NTRS’97 technology
- Driver size: 100x min gate
- Receiver size: 100x min gate
- Buffer size: 100x min gate

How Far Can We Go in Each Clock Cycle
[Figure: die with distance marks at 7.52, 15.04, 22.56, and 24.9 mm, and regions reachable in 1 to 7 clock cycles]
- NTRS’97 0.07um technology
- 5 GHz across-chip clock
- 620 mm2 die (24.9mm x 24.9mm)
- IPEM BIWS estimations
  - Buffer size: 100x
  - Driver/receiver size: 100x
- From corner to corner: 7 clock cycles
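
The corner-to-corner figure can be checked with simple arithmetic. The sketch below uses the die edge (24.9 mm) and the one-cycle reach (7.52 mm) read from the slide; it assumes, as an illustration only, that a corner-to-corner wire follows a rectilinear (Manhattan) path of two die edges. The function name `clock_cycles` is hypothetical.

```python
import math

def clock_cycles(distance_mm: float, reach_per_cycle_mm: float) -> int:
    """Whole clock cycles needed to cover distance_mm at a fixed reach per cycle."""
    return math.ceil(distance_mm / reach_per_cycle_mm)

die_edge_mm = 24.9   # from the slide: 24.9 mm x 24.9 mm die
reach_mm = 7.52      # from the slide: distance covered in one 5 GHz cycle under BIWS

# Assumption: corner-to-corner routing is rectilinear, i.e. two die edges long.
corner_to_corner_mm = 2 * die_edge_mm

print(clock_cycles(die_edge_mm, reach_mm))          # along one die edge: 4
print(clock_cycles(corner_to_corner_mm, reach_mm))  # corner to corner: 7
```

Under that assumption the result matches the 7 clock cycles quoted on the slide.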

Two Important Implications
- Interconnects determine the system performance
- Need multiple clock cycles to cross the global interconnects in gigahertz designs
=> Interconnect/communication-centric design methodology
=> Pipelining/retiming on global interconnects

Interconnect-Centric Design Methodology
- Proposed transition: device/function centric => interconnect/communication centric
- Analogy: programs and data/objects
[Figure: device and interconnect, shown alongside programs and data/objects]

Interconnect-Centric IC Design Flow Under Development at UCLA
[Flow diagram: Design Specification, Architecture/Conceptual-level Design, Synthesis and Placement under Physical Hierarchy, Final Layout]
HDM abstraction: Structure view, Functional view, Physical view, Timing view
Interconnect Planning
- Physical Hierarchy Generation
- Floorplan/Coarse Placement with Interconnect Planning
- Interconnect Architecture Planning
Interconnect Synthesis
- Performance-driven Global Routing
- Pseudo Pin Assignment under Noise Control
Interconnect Layout
- Route Planning
- Point-to-Point Gridless Routing
Interconnect Optimization (TRIO)
- Topology Optimization with Buffer Insertion
- Wire sizing and spacing
- Simultaneous Buffer Insertion and Wire Sizing
- Simultaneous Topology Construction with Buffer Insertion and Wire Sizing
Interconnect Performance Estimation Models (IPEM)
- OWS, SDWS, BISWS

Interconnect Performance Estimation
- Introduction & Motivation
- Problem Formulation
- Interconnect Delay Estimation Models under Various Layout Optimizations
- Application and Conclusion

Impact of Interconnect Optimization on Future Technology Generations
[Figure: delay (ns, 0.5 to 5) versus technology (0.25, 0.18, 0.15, 0.13, 0.1, 0.07 um) for a 2cm line under DS, BIS, and BISWS]
- DS: Driver Sizing only
- BIS: Buffer Insertion and Sizing
- BISWS: Simultaneous Buffer Insertion/Sizing and Wiresizing

Complexity of Existing Interconnect Opt. Algorithms
- 2cm line, W=20, B=10, segment every 500um
- Use best available algorithms:
  - Local Refinement (LR)
  - Dynamic Programming (DP)
  - Hybrid of DP+LR

Optimization   OWS    BI+OWS   BIWS   BISWS
Delay (ns)     4.5    1.6      1.02   0.81
CPU (s)        0.06   0.42     4.5    12.4
Algorithms: LR, DP, DP+LR

(HSPICE needs an additional 60 seconds!)

Needs for Efficient Interconnect Estimation Models
- Efficiency
- Abstraction to hide detailed design information
  - granularity of wire segmentation
  - number of wire widths, buffer sizes, ...
- Explicit relation to enable optimal design decision at high levels
- Ease of interaction with logic/high level synthesis tools

Interconnect Performance Estimation Modeling [Cong-Pan, ASPDAC’99, TAU’99, DAC’99]
- Develop a set of interconnect performance estimation models (IPEM), under different optimization alternatives:
  - Optimal Wire Sizing (OWS)
  - Simultaneous Driver and Wire Sizing (SDWS)
  - Simultaneous Buffer Insertion and Wire Sizing (BIWS)
  - Simultaneous Buffer Insertion/Sizing and Wire Sizing (BISWS)
- IPEM have
  - closed-form formula or simple characteristic equations
  - constant running time in practice
  - high accuracy (about 90% accuracy on average)

Problem Formulation
- Rd0: driver effective resistance of the input stage G0
- Rd: driver effective resistance of G
- l: interconnect wire length
- CL: loading capacitance
[Figure: input stage G0 drives gate G, which drives a wire of length l terminated by the load CL]
What is the optimized delay? Do not run TRIO or other optimization tools!

Parameters and Notations
- Interconnect
  - ca: area capacitance coefficient
  - cf: fringing capacitance coefficient
  - r: sheet resistance
- Device
  - tg: intrinsic gate delay
  - cg: input capacitance of the minimum gate
  - rg: output resistance of the minimum gate
- Based on 1997 National Technology Roadmap for Semiconductors (NTRS’97)

Delay/Area Estimation under OWS
- Closed-form delay estimation formula

$$T_{ows}(R_d, l, C_L) = \left( \frac{\alpha_1 l}{W^2(\alpha_2 l)} + \frac{2 \alpha_1 l}{W(\alpha_2 l)} + R_d c_f + \sqrt{R_d \, r \, c_a c_f \, l} \right) \cdot l$$

where $\alpha_1 = \frac{1}{4} r c_a$, $\alpha_2 = \frac{1}{2} \sqrt{\frac{r c_a}{R_d C_L}}$, and W(x) is Lambert’s W function, defined by $w e^{w} = x$.
- Closed-form area estimation formula

$$A_{ows}(R_d, l, C_L) = \sqrt{\frac{r \, (c_f l + 2 C_L)}{2 R_d c_a}} \cdot l$$
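
A minimal sketch of how the two closed forms could be evaluated is given below. It assumes the delay formula has the form T_ows = (α1·l/W²(α2·l) + 2·α1·l/W(α2·l) + Rd·cf + sqrt(Rd·r·ca·cf·l))·l with α1 = r·ca/4 and α2 = (1/2)·sqrt(r·ca/(Rd·CL)), and the area formula A_ows = sqrt(r·(cf·l + 2·CL)/(2·Rd·ca))·l. Lambert's W is computed with a small Newton iteration so that only the standard library is needed. The function names and all numeric parameter values are illustrative placeholders, not NTRS’97 data.

```python
import math

def lambert_w(x: float, tol: float = 1e-12) -> float:
    """Principal branch of Lambert's W for x >= 0, i.e. the w with w * exp(w) = x."""
    if x < 0.0:
        raise ValueError("this sketch only handles x >= 0")
    w = math.log1p(x)  # start at or above the root; Newton then descends onto it
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))  # Newton step for f(w) = w*e^w - x
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w

def t_ows(rd: float, l: float, cl: float, r: float, ca: float, cf: float) -> float:
    """Closed-form delay under optimal wire sizing, for wire length l > 0."""
    a1 = 0.25 * r * ca
    a2 = 0.5 * math.sqrt(r * ca / (rd * cl))
    w = lambert_w(a2 * l)
    return (a1 * l / w**2 + 2.0 * a1 * l / w + rd * cf
            + math.sqrt(rd * r * ca * cf * l)) * l

def a_ows(rd: float, l: float, cl: float, r: float, ca: float, cf: float) -> float:
    """Closed-form wire area under optimal wire sizing."""
    return math.sqrt(r * (cf * l + 2.0 * cl) / (2.0 * rd * ca)) * l

# Placeholder parameters (assumed for illustration): lengths in um, otherwise SI.
r, ca, cf = 0.068, 6.0e-17, 6.4e-17   # ohm/sq, F/um^2, F/um
rd, cl = 100.0, 1.0e-13               # driver resistance [ohm], load [F]
for length_um in (5000.0, 10000.0, 20000.0):
    delay_ns = t_ows(rd, length_um, cl, r, ca, cf) * 1e9
    area_um2 = a_ows(rd, length_um, cl, r, ca, cf)
    print(f"l = {length_um:7.0f} um  delay = {delay_ns:.3f} ns  area = {area_um2:.0f} um^2")
```

A useful sanity check on the formula itself: as l approaches 0, W(α2·l) approaches α2·l, so the first term tends to α1/α2² = Rd·CL and the delay reduces to the driver charging the load.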

Property of DEM-OWS
- Theorem: Tows is a sub-quadratic, convex function of length l
- Note: Without wiresizing, wiring delay ∝ l^2, as used in some previous layout-driven logic synthesis systems such as [Ramachandran et al., ICCAD-92]; this is no longer accurate!
- Closed-form DEM-OWS will serve as a basis for deriving SDWS, BIWS and BISWS

Comparison of IPEM-OWS vs. TRIO
[Figure: delay modeling; delay (ns, 0 to 1.00) versus length (2000 to 16000 um), Model versus TRIO]
- 0.18um, Rd = rg/100, CL = cg x 100
- For the experiment, max wire width is 20x min; wire is segmented every 10um

Area Estimation for OWS
[Figure: width (um, 0.5 to 2) versus length (4000 to 20000 um), Model versus TRIO]

Critical Length for BI under OWS
[Figure: a wire of length l from driver Rd to load CL with no buffer, compared with the same wire split by one best buffer b into segments αl and (1-α)l]
Delay with one buffer b at its best position:

$$T^{1}_{biws}(R_d, l, C_L) = \min_{0 \le \alpha \le 1} \left\{ T_{ows}(R_d, \alpha l, C_b) + t_g + T_{ows}(R_b, (1-\alpha) l, C_L) \right\}$$

Set this equal to $T_{ows}(R_d, l, C_L)$ and solve for l => critical length lcrit (b, Rd , CL )
- Computed by bisection method
- Constant time in practice
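
The bisection described above can be sketched as follows. This is an illustration under stated assumptions, not the IPEM implementation: the closed-form T_ows is taken as (α1·l/W²(α2·l) + 2·α1·l/W(α2·l) + Rd·cf + sqrt(Rd·r·ca·cf·l))·l, the inner minimization over α uses a ternary search (justified by the convexity of T_ows in l), a b-times buffer is assumed to have Rb = rg/b and Cb = b·cg, and every numeric value is a placeholder rather than NTRS’97 data. All function names are hypothetical.

```python
import math

def lambert_w(x, tol=1e-12):
    """Principal branch of Lambert's W for x >= 0 (solves w * exp(w) = x)."""
    w = math.log1p(x)  # at or above the root, so Newton converges monotonically
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w

def t_ows(rd, l, cl, tech):
    """Closed-form OWS delay; tech = (r, ca, cf). Uses the l -> 0 limit rd*cl at l <= 0."""
    r, ca, cf = tech
    if l <= 0.0:
        return rd * cl
    a1 = 0.25 * r * ca
    a2 = 0.5 * math.sqrt(r * ca / (rd * cl))
    w = lambert_w(a2 * l)
    return (a1 * l / w**2 + 2.0 * a1 * l / w + rd * cf
            + math.sqrt(rd * r * ca * cf * l)) * l

def one_buffer_delay(rd, l, cl, rb, cb, tg, tech, iters=100):
    """Minimum over alpha in [0, 1] of the two-stage delay, by ternary search."""
    def cost(alpha):
        return (t_ows(rd, alpha * l, cb, tech) + tg
                + t_ows(rb, (1.0 - alpha) * l, cl, tech))
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if cost(m1) < cost(m2):
            hi = m2
        else:
            lo = m1
    return cost(0.5 * (lo + hi))

def critical_length(rd, cl, rb, cb, tg, tech, lo=1.0, hi=1.0e5, iters=60):
    """Bisection on l for T_ows(rd, l, cl) = one-buffer delay; lengths in um."""
    def gain(l):  # positive when inserting the buffer helps
        return t_ows(rd, l, cl, tech) - one_buffer_delay(rd, l, cl, rb, cb, tg, tech)
    if gain(lo) >= 0.0 or gain(hi) <= 0.0:
        raise ValueError("critical length is not bracketed by [lo, hi]")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gain(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Placeholder values (assumed): r [ohm/sq], ca [F/um^2], cf [F/um]; rg [ohm], cg [F], tg [s].
tech = (0.068, 6.0e-17, 6.4e-17)
rg, cg, tg = 1.7e4, 2.3e-16, 3.0e-11
b = 100                      # buffer size as a multiple of the minimum gate
rb, cb = rg / b, cg * b      # assumed scaling of a b-times buffer
lc = critical_length(rb, cb, rb, cb, tg, tech)
print(f"critical length for b = {b}x: about {lc:.0f} um")
```

Calling `critical_length(rb, cb, rb, cb, ...)` corresponds to the buffer-to-buffer case lcrit(b, Rb, Cb); passing a different driver resistance and load gives the general lcrit(b, Rd, CL).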

Critical Lengths lcrit (b, Rb , Cb)
(unit: mm)

Technology (um)   0.25   0.18   0.15   0.13   0.10   0.07
b=10x             4.12   3.80   3.97   3.61   2.92   2.08
b=50x             6.40   5.81   6.01   5.51   4.45   3.30
b=100x            7.47   6.83   7.04   6.39   5.30   3.91
b=200x            8.65   7.92   8.14   7.43   6.35   4.49
b=500x            9.98   9.10   9.30   8.57   7.13   5.21
Min. WS           2.52   2.23   2.14   1.94   1.50   1.43

- Trend across technology generations: decrease
- Min. WS row: uniform wire width, cf. [Otten ISPD’98, Otten-Brayton DAC’98]
- Denote lc = lcrit (b, Rb , Cb)

“Logic Volume” within lc
(unit: million gates)

Technology (um)   0.25   0.18   0.15   0.13   0.10   0.07
2-NAND (um2)      7.80   4.04   3.00   2.18   1.28   0.64
b=10x             0.55   0.89   1.31   1.49   1.66   1.69
b=50x             1.31   2.09   3.01   3.48   3.87   4.25
b=100x            1.79   2.88   4.13   4.68   5.48   5.97
b=200x            2.4    3.88   5.52   6.33   7.87   7.88
b=500x            3.19   5.12   7.21   8.42   9.93   10.6

- Trend across technology generations: increase
- Defined as the number of min 2-input NAND gates that can be packed within the area of lc/2 * lc/2
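
The definition can be checked against the two tables. The sketch below recomputes the b=10x row from the critical lengths (in mm) and the 2-input NAND areas (in um2) quoted on the slides; the function name `logic_volume_millions` is hypothetical.

```python
def logic_volume_millions(lc_mm: float, nand2_area_um2: float) -> float:
    """Min 2-input NAND gates (in millions) that fit in an (lc/2) x (lc/2) square."""
    side_um = (lc_mm / 2.0) * 1000.0          # half the critical length, in um
    return side_um * side_um / nand2_area_um2 / 1.0e6

# Values copied from the slides (technology in um, lc in mm, NAND area in um2).
technology = [0.25, 0.18, 0.15, 0.13, 0.10, 0.07]
nand2_area = [7.80, 4.04, 3.00, 2.18, 1.28, 0.64]
lc_b10 = [4.12, 3.80, 3.97, 3.61, 2.92, 2.08]      # critical lengths for b=10x
table_b10 = [0.55, 0.89, 1.31, 1.49, 1.66, 1.69]   # logic volume row for b=10x

for tech, area, lc, quoted in zip(technology, nand2_area, lc_b10, table_b10):
    computed = logic_volume_millions(lc, area)
    print(f"{tech:.2f} um: computed {computed:.2f} M gates, slide {quoted:.2f} M gates")
```

The recomputed values agree with the slide to within rounding of the tabulated critical lengths, which shows that the logic volume grows because gate area shrinks faster than lc squared.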

Property of BIWS
[Figure: a line driving the load CL, with buffers b spaced lc apart and a final segment llast]
- Theorem: For BIWS, the distances between adjacent buffers are the same, and equal to lc, the critical length.
- Proof: based on the convexity of Tows

IPEM for BIWS
- Original long interconnect is divided into l/lc stages
- The stage number is proportional to l
- Each stage of length lc has delay Tows(Rb , lc, Cb)
=> Linear DEM for BIWS

$$T_{biws} = \tau_{biws} \cdot l + t_g$$

where $\tau_{biws}$ is the slope, and can be obtained from Tows(Rb , lc, Cb)

IPEM for BIWS vs. TRIO
[Figure: delay modeling; delay (ns, 0.2 to 1) versus length (4000 to 20000 um), Model versus TRIO]
- 0.18um, Rd0 = rg/10, CL = cg x 10, buffer type is 100 x min.
- For the experiment, max. wire width is 20x min. width; wire is segmented every 100um.

IPEM under BISWS
- Observations from extensive experiments:
  - Linear delay versus length
  - Internal buffers are about the same size
- Therefore, we estimate BISWS by the best BIWS from available buffer types

$$T_{bisws} = \tau_{bisws} \cdot l + t_g$$

where $\tau_{bisws} = \min_{b \in B} \tau_{biws}$ and B is the buffer set
- Linear delay model for optimal BISWS
- Complexity O(|B|). Since |B| is normally less than 20, this is constant time in practice.
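
The O(|B|) selection can be sketched in a few lines. The slide only states that the slope is obtained from Tows(Rb, lc, Cb); the code below assumes, purely for illustration, that the slope of each buffer type is its stage delay divided by its critical length. The critical lengths are the 0.18um values from the earlier table, while the stage delays and t_g are hypothetical numbers, and the function names are not from the original tool.

```python
def tau_biws(stage_delay_ns: float, lc_um: float) -> float:
    """Slope of the linear BIWS model.

    Assumption (not stated on the slide): one stage delay spread over one
    critical length lc.
    """
    return stage_delay_ns / lc_um

def bisws_delay(l_um: float, buffer_stages: dict, tg_ns: float) -> float:
    """T_bisws = tau_bisws * l + t_g, with tau_bisws the smallest tau_biws over B."""
    tau_bisws = min(tau_biws(delay, lc) for (lc, delay) in buffer_stages.values())
    return tau_bisws * l_um + tg_ns

# Buffer set B: size -> (critical length lc [um], stage delay [ns]).
# lc values are the 0.18um entries of the critical-length table; the stage
# delays are hypothetical placeholders.
B = {10: (3800.0, 0.25), 100: (6830.0, 0.30), 500: (9100.0, 0.45)}

print(bisws_delay(20000.0, B, tg_ns=0.03))
```

One pass over B is enough because each buffer type contributes a single precomputed slope, which is why the estimate runs in constant time for the small buffer libraries mentioned above.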

Comparison of IPEM for BISWS vs. TRIO
[Figure: delay modeling; delay (ns, 0.2 to 0.8) versus length (4000 to 20000 um), Model versus TRIO]
- 0.18um, Rd0 = rg/10, CL = cg x 10
- For the experiment, max. allowable buffer/driver size is 400x min device; max. wire width is 20x min. width; wire is segmented every 100um.

IPEM for Multiple-Pin Nets
- Estimation with different optimization objectives:
  - Minimize the delay to a single critical sink (SCS)
  - Minimize the maximum delay (defined as the tree delay) for multiple critical sinks (MCS)
  - Minimize weighted delay ...
[Figure: input stage G0 and gate G driving a tree with sinks S1, S2, S3, ..., Sn and sink loads Cs1, Cs2, Cs3, ..., Csn]

Some Applications of IPEM
- Layout-driven physical and RTL level floorplanning
  - Predict accurate interconnect delay and routing resource without really going into layout details
  - Use accurate interconnect delay/area to guide floorplanning/placement
- Interconnect Architecture Planning
  - E.g. Wire width planning
- Floorplanning + interconnect planning
  - E.g. Buffer block planning
- Available from http://cadlab.cs.ucla.edu/~cong

Physical Hierarchy Generation
- Designs are hierarchical due to high complexity
- Design specification (in HDL) follows logic hierarchy
- Logic hierarchy may not be suitable to be embedded on a 2D silicon surface, resulting in poor interconnect designs
  - RT-level floorplanning is a bad idea!
- Solution: transform logic hierarchy to physical hierarchy

Example of Logic Hierarchy in Final Layout
[Figure: layout images]
By courtesy of IBM (Tony Drumm)

Transform Logic Hierarchy to Physical Hierarchy
- Simultaneous partitioning, coarse placement, and retiming on the flat netlist to generate a good physical hierarchy
  - Synthesis will follow
- Use multi-level optimization to handle the complexity

Role of Partitioning
- Importance of Partitioning:
  - Conventional view: enables divide-and-conquer
  - DSM view: defines global and local interconnects
[Figure: local interconnect (d) and global interconnect (D); D >> d !!!]

Need of Considering Retiming during Partitioning
- Retiming/pipelining on global interconnects
- Multiple clock cycles are needed to cross the chip
- Proper partitioning allows retiming to hide global interconnect delays.
[Figure: Partitioning A and Partitioning B with the same cutsize; labels f(A) = 8 and f(B) = 8, then f(B) = 8 and f(A) = 6]

Sequential Arrival Time (SAT)
- Definition [Pan et al, TCAD98]
  - l(v) = max delay from PIs to v after opt. retiming under a given clock period f
  - l(v) = max{l(u) - f · w(u,v) + d(u,v) + d(v)}
  - Relation to retiming: r(v) = l(v) / f - 1
  - Theorem: P can be retimed to f + max{d(e)} iff l(POs) ≤ f
- Example: u and w fan in to v, with l(u) = 7, l(w) = 3, d(v) = 1, d(e) = 2, f = 5: l(v) = max{7-5·1+2+1, 3+2+1} = 6
[Figure: edge (u, v) annotated with l(u), w(u,v), and d(v)]
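
The recurrence can be written directly as a one-node update. The sketch below reproduces the slide's example, reading the edge weights from the worked expression (w(u,v) = 1 and w(w,v) = 0, both edges with delay d(e) = 2). The function name `sat_label` and the tuple layout are hypothetical.

```python
def sat_label(fanins, d_v, f):
    """Sequential arrival time of node v.

    fanins: list of (l_u, w_uv, d_uv) tuples, one per fanin edge (u, v), where
            l_u is the SAT of u, w_uv the number of flip-flops on the edge,
            and d_uv the edge (interconnect) delay.
    d_v:    delay of node v itself.
    f:      target clock period.
    """
    return max(l_u - f * w_uv + d_uv + d_v for (l_u, w_uv, d_uv) in fanins)

# Slide example: l(u) = 7 with one flip-flop on (u, v); l(w) = 3 with none.
l_v = sat_label([(7, 1, 2), (3, 0, 2)], d_v=1, f=5)
print(l_v)  # 6
```

Each flip-flop on an edge subtracts one clock period from the arrival time, which is how the label accounts for retiming across that edge.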

Simultaneous Partitioning/Placement with Retiming
- Minimize SAT during partitioning/placement
- Apply optimal retiming to the resulting solution (best suited for retiming)
- Partitioning/placement with retiming can be applied recursively to generate physical hierarchy
- Good news: SAT can be computed efficiently (linear time in practice, quadratic time in the worst case)
- Difficulty: Flattened netlist can be very large!
  - Solution: use multi-level method

Multi-level Partitioning
[Figure: Coarsening, Initial Partitioning, Uncoarsening & Refinement]
- Iterative coarsening (clustering) to generate a multi-level hierarchy
- Initial partitioning on the coarsest level
- Iterative de-clustering and refinement

Hierarchical Approach vs Multi-Level Approach
- Hierarchical approach: higher-level design constrains lower-level designs
  - Not sufficient information at higher-level
  - Mistake at higher level is impossible or costly to correct
- Multi-level approach: finer-level design refines coarse-level design
  - Converge to better solution as more details are considered

Example: Multi-Level Partitioning with Coarse Placement & Retiming
[Figure: timing analysis & cell move at each cluster level, then proceed to the next cluster level]
- Bottom-up multi-level clustering
- Top down cell move based multi-level partitioning
- Sequential timing analysis at each level [Cong and Lim, ICCAD00]

Success of Multi-Level Approach
- First used to solve partial differential equations (multi-grid method)
- Successfully applied to circuit partitioning (hMetis [Karypis et al, 1997])
  - Best partitioner for cut-size minimization
- Successfully applied to physical hierarchy generation (HPM and GEO [Cong et al, DAC’00 & ICCAD’00])
  - 30-40% delay reduction compared to hMetis
- Successfully applied to circuit placement [Chan et al, ICCAD’00]
  - 10x speed-up over GordianL

Experimental Results
[Figure: delay, cutsize, wire, and runtime (scale 0.2 to 1.4) for hMetis+RT+FL, HPM+FL, and GEO]
- Comparison with existing algorithms
  - hMetis [DAC97] + retiming + slicing floorplan [Algo89]
  - HPM [DAC00] + slicing floorplan [Algo89]
  - GEO: simultaneous partitioning + coarse placement + retiming
Close to 40% delay reduction!

Interconnect Planning
- Physical Hierarchy Generation
- Floorplan/Coarse Placement with Interconnect Planning
  - Example: Buffer Block Planning in Floorplanning
- Interconnect Architecture Planning

Demand of Buffers in Nanometer Designs
(Estimated based on NTRS’97 & [Davis-Meindl’97])

Technology (um)    0.25   0.18   0.13   0.10   0.07
#buffer per chip   5k     25k    54k    230k   797k

- Need to insert buffers in long global interconnects for performance optimization
Source: [Cong’97, SRC Work Paper] http://www.src.org/research/frontier.dgw

Buffer Block Planning Problem [Cong-Kong-Pan, ICCAD’99]
[Figure: floorplan with a buffer block]
- Restriction from hard IP blocks
- Implications on P/G routing
- Impact on floorplan configuration
=> need to plan ahead for buffers.

Optimal Buffer Location Can Be Relaxed
- Closed-form formula of feasible region (FR) for inserting one buffer to meet delay constraint
[Figure: driver, one buffer at position x between xmin and xmax, and load CL on a wire of length l]

$$x \in [x_{min}, x_{max}]$$

$$x_{min} = \mathrm{MAX}\left(0, \frac{K_2 - \sqrt{K_2^2 - 4 K_1 K_3}}{2 K_1}\right), \qquad x_{max} = \mathrm{MIN}\left(l, \frac{K_2 + \sqrt{K_2^2 - 4 K_1 K_3}}{2 K_1}\right)$$
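
The feasible region amounts to the two roots of K1·x² - K2·x + K3 = 0 clipped to the wire span [0, l]. The sketch below evaluates that form; it assumes K1 > 0 and treats K1, K2, K3 as given inputs, since their definitions are not on the slide. The function name and the numeric values are hypothetical.

```python
import math

def feasible_region(k1: float, k2: float, k3: float, l: float):
    """Return (x_min, x_max) for one buffer on a wire of length l, or None if empty.

    Assumes k1 > 0, so the delay constraint holds between the two roots of
    k1*x^2 - k2*x + k3 = 0.
    """
    disc = k2 * k2 - 4.0 * k1 * k3
    if disc < 0.0:
        return None  # no real roots: the delay constraint cannot be met
    root = math.sqrt(disc)
    x_min = max(0.0, (k2 - root) / (2.0 * k1))
    x_max = min(l, (k2 + root) / (2.0 * k1))
    if x_min > x_max:
        return None  # the roots fall outside the wire
    return x_min, x_max

# Hypothetical coefficients: roots at x = 2 and x = 8 on a wire of length 6.
print(feasible_region(1.0, 10.0, 16.0, 6.0))  # (2.0, 6.0)
```

Returning `None` for an empty region keeps the infeasible case explicit for a caller that plans buffer blocks.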

Feasible Region (FR) Is Very Large
- Even under tight delay constraint, FR for BI can still be very large!
[Figure: FR (um, 2000 to 10000) versus Delta (0.1 to 0.4), with curves labeled 6000um, 7000um, 8000um, and 9000um]
Delay budget is (1+Delta) Topt, where Topt is the best delay by optimal buffer insertion

Delta   FR
1%      19%
5%      43%
10%     60%
20%     86%

=> FR provides a lot of flexibility to plan buffer location

Extension: 2D Feasible Region
- FR extended to 2-dimension with obstacles
[Figure: source and sink with the 2-D FR and the locus of min-delay BI (restricted lines)]

Experimental Results of Buffer Block Planning
Buffer block planning reduces the number of buffer blocks, better meets timing constraints, and uses smaller area
[Figure: No-planning versus With planning (scale 0.2 to 1.4) for #nets that meet delay constraints, #Buffer Block, and area]

Concluding Remarks
- High-performance designs in DSM technologies need careful interconnect planning
- Efficient interconnect performance estimation models (IPEMs) are important for interconnect planning
- Top-level partitioning defines global and local interconnects, and impacts performance significantly
- Retiming and pipelining over global interconnects are necessary for multi-gigahertz designs
- A clever combination of partitioning and retiming can hide (some) global interconnect delays
- Buffer block planning helps to reduce complexity while achieving good performance

Acknowledgments
- Thanks to Sung Lim, David Pan, and Xin Yuan at UCLA for their help with slides
- Thanks to SRC, MARCO/GSRC, and Intel Corp. for their support of a number of research projects covered in this tutorial
- Updated slides in PDF file will be available at