[PPT] - New Approaches to Harness Global Interconnects Jason Cong Computer PowerPoint Presentation

SLIDE 1

ASPDAC'01 Tutorial Jason Cong 1

PART V

New Approaches to Harness Global Interconnects

Jason Cong Computer Science Department University of California at Los Angeles Email: cong@cs.ucla.edu Tel: 310-206-2775 http://cadlab.cs.ucla.edu/~cong

SLIDE 2

Part V Outline

- Interconnect-Centric Design Flow
- Interconnect Performance Estimation Models
  - IPEM for optimal wiresizing
  - IPEM for wiresizing and buffer insertion
- Interconnect Planning
  - Physical hierarchy generation
  - Floorplan/coarse placement with interconnect planning
  - Interconnect architecture planning
- Concluding Remarks

SLIDE 3

Clock cycles required for traveling a 2cm line under BIWS (buffer insertion and wire sizing)

[Figure: clock cycle(s), 1 to 5, needed to travel the 2cm line at 1 GHz, 3 GHz, and 5 GHz for the 0.25 um, 0.18 um, 0.13 um, 0.10 um, and 0.07 um technologies]

- Estimated by IPEM on NTRS’97 technology
- Driver size: 100x min gate
- Receiver size: 100x min gate
- Buffer size: 100x min gate

SLIDE 4

How Far Can We Go in Each Clock Cycle

[Figure: distance reachable on the chip per clock cycle; axis marks at 7.52, 15.04, 22.56, and 24.9 (mm); regions labeled 1 clock through 7 clock]

- NTRS’97 0.07um Tech
- 5 GHz across-chip clock
- 620 mm2 (24.9mm x 24.9mm)
- IPEM BIWS estimations
  - Buffer size: 100x
  - Driver/receiver size: 100x
- From corner to corner:
  - 7 clock cycles
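
The 7-cycle corner-to-corner figure can be checked with simple arithmetic. The sketch below assumes a rectilinear (Manhattan) route, so the corner-to-corner distance is twice the die edge, and takes the 7.52 mm step from the axis marks as the distance covered per clock cycle. The function name `clock_cycles` is hypothetical.

```python
import math

def clock_cycles(distance_mm: float, reach_per_cycle_mm: float) -> int:
    """Whole clock cycles needed to cover distance_mm."""
    return math.ceil(distance_mm / reach_per_cycle_mm)

DIE_EDGE_MM = 24.9          # 620 mm2 die, 24.9mm x 24.9mm (from the slide)
REACH_PER_CYCLE_MM = 7.52   # distance covered per 5 GHz cycle (from the slide)

# Assumption: rectilinear routing, so corner to corner is two die edges.
corner_to_corner_mm = 2 * DIE_EDGE_MM

print(clock_cycles(DIE_EDGE_MM, REACH_PER_CYCLE_MM))          # along one edge
print(clock_cycles(corner_to_corner_mm, REACH_PER_CYCLE_MM))  # corner to corner
```

Under these assumptions the corner-to-corner route is 49.8 mm, which rounds up to 7 cycles, in line with the slide.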

SLIDE 5

Two Important Implications

- Interconnects determine the system performance
- Need multiple clock cycles to cross the global interconnects in gigahertz designs

Interconnect/communication-centric design methodology
Pipelining/retiming on global interconnects

SLIDE 6

Interconnect-Centric Design Methodology

- Proposed transition: from device/function centric to interconnect/communication centric
- Analogy

[Figure: device and interconnect, shown in analogy with Programs and Data/Objects]

SLIDE 7

Interconnect-Centric IC Design Flow Under Development at UCLA

Flow stages: Design Specification; Architecture/Conceptual-level Design; Synthesis and Placement under Physical Hierarchy; Final Layout

HDM abstraction: Structure view, Functional view, Physical view, Timing view

Interconnect Planning
- Physical Hierarchy Generation
- Floorplan/Coarse Placement with Interconnect Planning
- Interconnect Architecture Planning

Interconnect Synthesis
- Performance-driven Global Routing
- Pseudo Pin Assignment under Noise Control

Interconnect Layout
- Route Planning
- Point-to-Point Gridless Routing

Interconnect Optimization (TRIO)
- Topology Optimization with Buffer Insertion
- Wire sizing and spacing
- Simultaneous Buffer Insertion and Wire Sizing
- Simultaneous Topology Construction with Buffer Insertion and Wire Sizing

Interconnect Performance Estimation Models (IPEM)
- OWS, SDWS, BISWS

SLIDE 8

Interconnect-Centric IC Design Flow Under Development at UCLA

[Figure: same design flow diagram as Slide 7]

SLIDE 9

Interconnect-Centric IC Design Flow Under Development at UCLA

[Figure: same design flow diagram as Slide 7]

SLIDE 10

Part V Outline

- Interconnect-Centric Design Flow
- Interconnect Performance Estimation Models
  - IPEM for optimal wiresizing
  - IPEM for wiresizing and buffer insertion
- Interconnect Planning
  - Physical hierarchy generation
  - Floorplan/coarse placement with interconnect planning
  - Interconnect architecture planning
- Concluding Remarks

SLIDE 11

Interconnect Performance Estimation

- Introduction & Motivation
- Problem Formulation
- Interconnect Delay Estimation Models under Various Layout Optimizations
- Application and Conclusion

SLIDE 12

Impact of Interconnect Optimization on Future Technology Generations

[Figure: Delay (ns), 0.5 to 5, versus Technology (um) at 0.25, 0.18, 0.15, 0.13, 0.1, 0.07; curves: 2cm DS, 2cm BIS, 2cm BISWS]

- DS: Driver Sizing only
- BIS: Buffer Insertion and Sizing
- BISWS: Simultaneous Buffer Insertion/Sizing and Wiresizing

SLIDE 13

Complexity of Existing Interconnect Opt. Algorithms

- 2cm line, W=20, B=10, segment every 500um
- Use best available algorithms:
  - Local Refinement (LR)
  - Dynamic Programming (DP)
  - Hybrid of DP+LR

Algorithm    OWS    BI+OWS   BIWS   BISWS
Delay (ns)   4.5    1.6      1.02   0.81
CPU (s)      0.06   0.42     4.5    12.4

Methods: LR, DP, DP+LR

(HSPICE needs additional 60 seconds!)

SLIDE 14

Needs for Efficient Interconnect Estimation Models

- Efficiency
- Abstraction to hide detailed design information
  - granularity of wire segmentation
  - number of wire widths, buffer sizes, ...
- Explicit relation to enable optimal design decision at high levels
- Ease of interaction with logic/high level synthesis tools

SLIDE 15

Interconnect Performance Estimation Modeling
[Cong-Pan, ASPDAC’99, TAU’99, DAC’99]

- Develop a set of interconnect performance estimation models (IPEM), under different optimization alternatives:
  - Optimal Wire Sizing (OWS)
  - Simultaneous Driver and Wire Sizing (SDWS)
  - Simultaneous Buffer Insertion and Wire Sizing (BIWS)
  - Simultaneous Buffer Insertion/Sizing and Wire Sizing (BISWS)
- IPEM have
  - closed-form formula or simple characteristic equations
  - constant running time in practice
  - high accuracy (about 90% accuracy on average)

SLIDE 16

Problem Formulation

- Rd0: driver effective resistance of the input stage G0
- Rd: driver effective resistance of G
- l: interconnect wire length
- CL: loading capacitance

[Figure: Input, input stage G0, driver G, wire of length l, load CL]

What is the optimized delay? Do not run TRIO or other optimization tools!

SLIDE 17

Parameters and Notations

- Interconnect
  - ca: area capacitance coefficient
  - cf: fringing capacitance coefficient
  - r: sheet resistance
- Device
  - tg: intrinsic gate delay
  - cg: input capacitance of the minimum gate
  - rg: output resistance of the minimum gate
- Based on 1997 National Technology Roadmap for Semiconductors (NTRS’97)

SLIDE 18

Delay/Area Estimation under OWS

- Closed-form delay estimation formula

$$T_{ows}(R_d, l, C_L) = \left( \frac{\alpha_1 l}{W^2(\alpha_2 l)} + \frac{2 \alpha_1 l}{W(\alpha_2 l)} + R_d c_f + \sqrt{R_d \, r c_a c_f \, l} \right) \cdot l$$

where $\alpha_1 = \frac{1}{4} r c_a$, $\alpha_2 = \frac{1}{2} \sqrt{\frac{r c_a}{R_d C_L}}$, and W(x) is Lambert’s W function defined as $w e^w = x$

- Closed-form area estimation formula

$$A_{ows}(R_d, l, C_L) = \sqrt{\frac{r (c_f l + 2 C_L)}{2 R_d c_a}} \cdot l$$
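
Evaluating the delay formula requires Lambert's W function, which has no elementary closed form. A minimal sketch, assuming x >= 0 and using Newton's method on w e^w = x; the solver choice and the name `lambert_w` are assumptions for illustration, not part of IPEM.

```python
import math

def lambert_w(x: float, tol: float = 1e-12, max_iter: int = 100) -> float:
    """Principal branch of Lambert's W for x >= 0: returns w with w*exp(w) = x."""
    if x < 0:
        raise ValueError("this sketch only handles x >= 0")
    w = math.log1p(x)  # starting guess at or above the root when x >= 0
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))  # Newton step for f(w) = w*e^w - x
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w

w = lambert_w(10.0)
print(w, w * math.exp(w))  # the second value should be close to 10
```

Each evaluation takes a handful of iterations, which fits the claim that IPEM runs in constant time in practice.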

SLIDE 19

Property of DEM-OWS

- Theorem: T_ows is a sub-quadratic, convex function of length l
- Note: The wiring delay ∝ l^2 that holds without wiresizing, as used in some previous layout-driven logic synthesis systems such as [Ramachandran et al., ICCAD-92], is no longer accurate!
- Closed-form DEM-OWS will serve as a basis for deriving SDWS, BIWS and BISWS

SLIDE 20

Comparison of IPEM-OWS vs. TRIO

[Figure: Delay modeling; delay (ns), 0.00 to 1.00, versus length (um), 2000 to 16000; curves: Model, TRIO]

- 0.18um, Rd = rg/100, CL = cg x 100
- For expt., max wire width is 20x min, wire is segmented in every 10um

SLIDE 21

Area Estimation for OWS

[Figure: width (um), 0.5 to 2, versus length (um), 4000 to 20000; curves: Model, TRIO]

SLIDE 22

Critical Length for BI under OWS

Delay with 1 best buffer b:

$$T_{biws}(R_d, l, C_L) = \min_{0 \le \alpha \le 1} \left\{ T_{ows}(R_d, \alpha l, C_b) + t_g + T_{ows}(R_b, (1-\alpha) l, C_L) \right\}$$

Delay with no buffer: $T_{ows}(R_d, l, C_L)$

Equate the two and solve for l => critical length lcrit (b, Rd, CL)

- Computed by bisection method
- Constant time in practice

[Figure: (1) no buffer: driver Rd, wire of length l, load CL; (2) 1 best buffer b splitting the wire into αl and (1-α)l, driver Rd, load CL]
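
The bisection step can be sketched as follows. This is only an illustration under stated assumptions: it swaps the OWS delay model for a plain Elmore delay of a uniform-width wire, takes Rd = Rb and CL = Cb so that the best single buffer sits at the midpoint (α = 1/2), and uses made-up electrical values. The names `wire_delay`, `gain_from_buffer`, and `critical_length` are hypothetical.

```python
import math

R_WIRE = 0.08      # wire resistance per um (ohm/um), hypothetical
C_WIRE = 0.2e-15   # wire capacitance per um (F/um), hypothetical
R_B = 100.0        # buffer output resistance (ohm), hypothetical
C_B = 50e-15       # buffer input capacitance (F), hypothetical
T_G = 20e-12       # intrinsic buffer delay (s), hypothetical

def wire_delay(r_drv, length, c_load):
    """Elmore delay of a driver, a uniform wire, and a load (not the OWS model)."""
    c_total = C_WIRE * length
    return r_drv * (c_total + c_load) + R_WIRE * length * (c_total / 2 + c_load)

def gain_from_buffer(length):
    """Unbuffered delay minus the delay with one buffer at the midpoint."""
    unbuffered = wire_delay(R_B, length, C_B)
    buffered = 2 * wire_delay(R_B, length / 2, C_B) + T_G
    return unbuffered - buffered

def critical_length(lo=0.0, hi=1e5, tol=1e-6):
    """Bisection for the length (um) at which one buffer starts to pay off."""
    assert gain_from_buffer(lo) < 0 < gain_from_buffer(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gain_from_buffer(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(critical_length())
# Closed form that happens to exist for this simplified model, as a cross-check:
print(2 * math.sqrt((R_B * C_B + T_G) / (R_WIRE * C_WIRE)))
```

The simplified model admits a closed form, which makes the bisection easy to verify; with the Lambert-W based T_ows the root has to be found numerically.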

SLIDE 23

Critical Lengths lcrit (b, Rb, Cb)

Technology (um)   0.25   0.18   0.15   0.13   0.10   0.07
b=10x             4.12   3.80   3.97   3.61   2.92   2.08
b=50x             6.40   5.81   6.01   5.51   4.45   3.30
b=100x            7.47   6.83   7.04   6.39   5.30   3.91
b=200x            8.65   7.92   8.14   7.43   6.35   4.49
b=500x            9.98   9.10   9.30   8.57   7.13   5.21
Min. WS           2.52   2.23   2.14   1.94   1.50   1.43

Trend: decrease. Unit: mm.

- Min. WS row: cf. [Otten ISPD’98, Otten-Brayton DAC’98] (uniform wire width)
- Denote lc = lcrit (b, Rb, Cb)

SLIDE 24

“Logic Volume” within lc

Defined as the number of min 2-input NAND gates that can be packed within the area of lc/2 * lc/2 (unit: million)

Technology (um)   0.25   0.18   0.15   0.13   0.10   0.07
2-NAND (um2)      7.80   4.04   3.00   2.18   1.28   0.64
b=10x             0.55   0.89   1.31   1.49   1.66   1.69
b=50x             1.31   2.09   3.01   3.48   3.87   4.25
b=100x            1.79   2.88   4.13   4.68   5.48   5.97
b=200x            2.4    3.88   5.52   6.33   7.87   7.88
b=500x            3.19   5.12   7.21   8.42   9.93   10.6

Trend: increase.
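
The logic volume definition is a one-line computation. The sketch below applies it to values from the critical-length and 2-NAND area tables; the function name `logic_volume_millions` is hypothetical.

```python
def logic_volume_millions(lc_mm: float, nand2_area_um2: float) -> float:
    """Min 2-input NAND gates (in millions) that fit in an lc/2 x lc/2 square."""
    side_um = lc_mm * 1000.0 / 2.0
    return side_um * side_um / nand2_area_um2 / 1e6

# 0.07 um technology, b = 500x: lc = 5.21 mm, 2-NAND area = 0.64 um2 (table values)
print(round(logic_volume_millions(5.21, 0.64), 1))
```

For this entry the result rounds to the 10.6 million listed in the table.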

SLIDE 25

Property of BIWS

[Figure: buffers b spaced lc apart along the line, with a last segment llast driving CL]

- Theorem: For BIWS, the distances between adjacent buffers are the same, and equal to lc, the critical length.
- Proof: based on the convexity of T_ows

SLIDE 26

IPEM for BIWS

- Original long interconnect is divided into ⌈l/lc⌉ stages
- The stage number is proportional to l
- Each stage of length lc has delay T_ows(Rb, lc, Cb)

=> Linear DEM for BIWS

$$T_{biws} = \tau_{biws} \cdot l + t_g$$

τ_biws is the slope, and can be obtained from T_ows(Rb, lc, Cb)

SLIDE 27

IPEM for BIWS vs. TRIO

[Figure: Delay Modeling; delay (ns), 0.2 to 1, versus length (um), 4000 to 20000; curves: Model, TRIO]

- 0.18um, Rd0 = rg/10, CL = cg x 10, buffer type is 100x min.
- For expt., max. wire width is 20x min. width, wire is segmented in every 100um.

SLIDE 28

IPEM under BISWS

- Observations from extensive experiments:
  - Linear delay versus length
  - Internal buffers are about the same size
- Therefore, we estimate BISWS by the best BIWS from available buffer types

$$T_{bisws} = \tau_{bisws} \cdot l + t_g, \quad \text{where } \tau_{bisws} = \min_{b \in B} \tau_{biws}, \ B \text{ is the buffer set}$$

- Linear delay model for optimal BISWS
- Complexity O(|B|). Since |B| is normally less than 20, constant time in practice.
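
The O(|B|) estimate amounts to one pass over the buffer set. A minimal sketch, assuming the BIWS slope of each buffer type has already been computed; the slope values, units, and the name `bisws_delay` are hypothetical.

```python
def bisws_delay(length_um, slope_by_buffer, t_g):
    """Linear BISWS estimate: use the buffer type with the smallest BIWS slope."""
    best_buffer = min(slope_by_buffer, key=slope_by_buffer.get)
    tau_bisws = slope_by_buffer[best_buffer]
    return best_buffer, tau_bisws * length_um + t_g

# Hypothetical BIWS slopes (ns per um) for three buffer types, and t_g in ns.
slopes = {"50x": 5.0e-5, "100x": 4.0e-5, "400x": 4.5e-5}
buffer_type, delay_ns = bisws_delay(10000.0, slopes, t_g=0.03)
print(buffer_type, delay_ns)
```

Because the slopes do not depend on the wire length, they can be tabulated once per technology and reused for every net.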

SLIDE 29

Comparison of IPEM for BISWS vs. TRIO

[Figure: Delay Modeling; delay (ns), 0.2 to 0.8, versus length (um), 4000 to 20000; curves: Model, TRIO]

- 0.18um, Rd0 = rg/10, CL = cg x 10
- For expt., max. allowable buffer/driver size is 400x min device; max. wire width is 20x min. width; wire is segmented in every 100um.

SLIDE 30

IPEM for Multiple-Pin Nets

- Estimation with different optimization objectives:
  - Minimize the delay to a single critical sink (SCS)
  - Minimize the maximum delay (defined as the tree delay) for multiple critical sinks (MCS)
  - Minimize weighted delay ...

[Figure: Input, G0, driver G, and a net with sinks S1, S2, S3, ..., Sn and loads Cs1, Cs2, Cs3, ..., Csn]

SLIDE 31

Some Applications of IPEM

- Layout-driven physical and RTL level floorplanning
  - Predict accurate interconnect delay and routing resource without really going into layout details;
  - Use accurate interconnect delay/area to guide floorplanning/placement
- Interconnect Architecture Planning
  - E.g. Wire width planning
- Floorplanning + interconnect planning
  - E.g. Buffer block planning
- Available from http://cadlab.cs.ucla.edu/~cong

SLIDE 32

Part V Outline

- Interconnect-Centric Design Flow
- Interconnect Performance Estimation Models
  - IPEM for optimal wiresizing
  - IPEM for wiresizing and buffer insertion
- Interconnect Planning
  - Physical hierarchy generation
  - Floorplan/coarse placement with interconnect planning
  - Interconnect architecture planning
- Concluding Remarks

SLIDE 33

Physical Hierarchy Generation

- Designs are hierarchical due to high complexity
- Design specification (in HDL) follows logic hierarchy
- Logic hierarchy may not be suitable to be embedded on a 2D silicon surface, resulting in poor interconnect designs
  - RT-level floorplanning is a bad idea!
- Solution: transform logic hierarchy to physical hierarchy

SLIDE 34

Example of Logic Hierarchy in Final Layout

By courtesy of IBM (Tony Drumm)

SLIDE 35

Example of Logic Hierarchy in Final Layout

By courtesy of IBM (Tony Drumm)

SLIDE 36

Transform Logic Hierarchy to Physical Hierarchy

- Simultaneous partitioning, coarse placement, and retiming on the flat netlist to generate a good physical hierarchy
  - Synthesis will follow
- Use multi-level optimization to handle the complexity

SLIDE 37

Role of Partitioning

- Importance of Partitioning:
  - Conventional view: enables divide-and-conquer
  - DSM view: defines global and local interconnects

[Figure: Local Interconnect d, Global Interconnect D; D >> d !!!]

SLIDE 38

Need of Considering Retiming during Partitioning

Retiming/pipelining on global interconnects
- Multiple clock cycles are needed to cross the chip
- Proper partitioning allows retiming to hide global interconnect delays.

[Figure: Partitioning A and Partitioning B with the same cutsize; labels f(A) = 8 and f(B) = 8, then f(B) = 8 and f(A) = 6]

SLIDE 39

Sequential Arrival Time (SAT)

- Definition [Pan et al, TCAD98]
  - l(v) = max delay from PIs to v after opt. retiming under a given clock period f
  - l(v) = max{l(u) - f · w(u,v) + d(u,v) + d(v)}
  - Relation to retiming: r(v) = ⌈l(v) / f⌉ - 1
  - Theorem: P can be retimed to f + max{d(e)} iff l(POs) ≤ f

[Figure: edge from u to v annotated with l(u), w(u,v), d(v)]

Example: u and w both feed v, with l(u) = 7, l(w) = 3, d(v) = 1, d(e) = 2, f = 5:
l(v) = max{7-5·1+2+1, 3+2+1} = 6
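
The SAT recurrence and the slide's numeric example can be reproduced in a few lines. In this sketch the edge weights w(u,v) = 1 and w(w,v) = 0 are read off the arithmetic in the example (only the u term subtracts f); the function names are hypothetical.

```python
import math

def sat_label(fanins, d_v, f):
    """l(v) = max over fanins u of l(u) - f*w(u,v) + d(u,v) + d(v).

    fanins: list of (l_u, w_uv, d_uv) tuples, one per incoming edge.
    """
    return max(l_u - f * w_uv + d_uv + d_v for l_u, w_uv, d_uv in fanins)

def retiming_value(l_v, f):
    """r(v) = ceil(l(v) / f) - 1."""
    return math.ceil(l_v / f) - 1

# Slide example: l(u) = 7 with w(u,v) = 1, l(w) = 3 with w(w,v) = 0,
# d(e) = 2 on both edges, d(v) = 1, f = 5.
l_v = sat_label([(7, 1, 2), (3, 0, 2)], d_v=1, f=5)
print(l_v, retiming_value(l_v, f=5))
```

The two candidate values are 5 (through u) and 6 (through w), so l(v) = 6 as on the slide.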

SLIDE 40

Simultaneous Partitioning/Placement with Retiming

- Minimize SAT during partitioning/placement
- Apply optimal retiming to the resulting solution (best suitable for retiming)
- Partitioning/placement with retiming can be applied recursively to generate physical hierarchy
- Good news: SAT can be computed efficiently (linear time in practice, quadratic time in the worst case)
- Difficulty: Flattened netlist can be very large!
  - Solution: use multi-level method

SLIDE 41

Multi-level Partitioning

[Figure: Coarsening, Initial Partitioning, Uncoarsening & Refinement]

- Iterative coarsening (clustering) to generate a multi-level hierarchy
- Initial partitioning on the coarsest level
- Iterative de-clustering and refinement

SLIDE 42

Hierarchical Approach vs Multi-Level Approach

- Hierarchical approach: higher-level design constrains lower-level designs
  - Not sufficient information at higher-level
  - Mistake at higher level is impossible or costly to correct
- Multi-level approach: finer-level design refines coarse-level design
  - Converge to better solution as more details are considered

SLIDE 43

Example: Multi-Level Partitioning with Coarse Placement & Retiming

[Figure: timing analysis & cell move at each cluster level, followed by the next cluster level]

- Bottom-up multi-level clustering
- Top down cell move based multi-level partitioning
- Sequential timing analysis at each level [Cong and Lim, ICCAD00]

SLIDE 44

Success of Multi-Level Approach

- First used to solve partial differential equations (multi-grid method)
- Successfully applied to circuit partitioning (hMetis [Karypis et al, 1997])
  - Best partitioner for cut-size minimization
- Successfully applied to physical hierarchy generation (HPM and GEO [Cong et al, DAC’00 & ICCAD’00])
  - 30-40% delay reduction compared to hMetis
- Successfully applied to circuit placement [Chan et al, ICCAD’00]
  - 10x speed-up over GordianL

SLIDE 45

Experimental Results

[Figure: bar chart, scale 0.2 to 1.4; metrics: delay, cutsize, wire, runtime; series: hMetis+RT+FL, HPM+FL, GEO]

- Comparison with existing algorithms
  - hMetis [DAC97] + retiming + slicing floorplan [Algo89]
  - HPM [DAC00] + slicing floorplan [Algo89]
  - GEO: simultaneous partitioning + coarse placement + retiming

Close to 40% delay reduction!

SLIDE 46

Interconnect Planning

- Physical Hierarchy Generation
- Floorplan/Coarse Placement with Interconnect Planning
  - Example: Buffer Block Planning in Floorplanning
- Interconnect Architecture Planning

SLIDE 47

Demand of Buffers in Nanometer Designs

(Estimated based on NTRS’97 & [Davis-Meindl’97])

Technology (um)    0.25   0.18   0.13   0.10   0.07
#buffer per chip   5k     25k    54k    230k   797k

- Need to insert buffers in long global interconnects for performance optimization

Source: [Cong’97, SRC Work Paper] http://www.src.org/research/frontier.dgw

SLIDE 48

Buffer Block Planning Problem
[Cong-Kong-Pan, ICCAD’99]

[Figure: floorplan with a buffer block]

- Restriction from hard IP blocks
- Implications on P/G routing
- Impact on floorplan configuration

=> need to plan ahead for buffers.

SLIDE 49

Optimal Buffer Location Can Be Relaxed

- Closed-form formula of feasible region (FR) for inserting one buffer to meet delay constraint

[Figure: driver, wire of length l with 1 buffer at position x between xmin and xmax, load CL]

$$x \in [x_{min}, x_{max}]$$

$$x_{min} = MAX\left(0, \ \frac{K_2 - \sqrt{K_2^2 - 4 K_1 K_3}}{2 K_1}\right), \qquad x_{max} = MIN\left(l, \ \frac{K_2 + \sqrt{K_2^2 - 4 K_1 K_3}}{2 K_1}\right)$$
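
The feasible region can be sketched as the solution of a quadratic inequality. The slide does not define K1, K2, and K3, so the expressions below are an assumption: they come from writing the Elmore delay of a uniform-width wire with one buffer at position x as K1·x² - K2·x + K3 + T_req. All names and electrical values are hypothetical.

```python
import math

def buffered_delay(x, r, c, length, r_d, c_l, r_b, c_b, t_b):
    """Elmore delay with one buffer at position x on a uniform wire (assumed model)."""
    rest = length - x
    first = r_d * (c * x + c_b) + r * x * (c * x / 2 + c_b)
    second = r_b * (c * rest + c_l) + r * rest * (c * rest / 2 + c_l)
    return first + t_b + second

def feasible_region(r, c, length, r_d, c_l, r_b, c_b, t_b, t_req):
    """Interval of x in [0, length] where the delay meets t_req, or None."""
    k1 = r * c
    k2 = (r_b - r_d) * c + r * (c_l - c_b) + r * c * length
    k3 = (r_d * c_b + t_b + r_b * (c * length + c_l)
          + r * length * c_l + r * c * length ** 2 / 2 - t_req)
    disc = k2 * k2 - 4 * k1 * k3
    if disc < 0:
        return None  # no buffer position meets the constraint
    root = math.sqrt(disc)
    x_min = max(0.0, (k2 - root) / (2 * k1))
    x_max = min(length, (k2 + root) / (2 * k1))
    return (x_min, x_max) if x_min <= x_max else None

# Hypothetical values: ohm/um, F/um, um, ohm, F, ohm, F, s.
params = dict(r=0.08, c=0.2e-15, length=8000.0, r_d=100.0, c_l=50e-15,
              r_b=100.0, c_b=50e-15, t_b=20e-12)
t_opt = buffered_delay(4000.0, **params)  # midpoint is optimal here by symmetry
print(feasible_region(t_req=1.10 * t_opt, **params))
```

With a 10% delay budget over the optimum, the buffer in this toy setup can move by more than a millimeter in either direction, which illustrates why the optimal location can be relaxed.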

SLIDE 50

Feasible Region (FR) Is Very Large

- Even under tight delay constraint, FR for BI can still be very large!

[Figure: FR (um), 2000 to 10000, versus Delta, 0.1 to 0.4; curves for 6000um, 7000um, 8000um, 9000um]

Delay budget is (1+Delta) Topt (Topt: the best delay by optimal buffer insertion)

Delta   FR
1%      19%
5%      43%
10%     60%
20%     86%

=> FR provides a lot of flexibility to plan buffer location

SLIDE 51

Extension: 2D Feasible Region

- FR extended to 2-dimension with obstacles

[Figure: source and sink; 2-D FR; locus of min-delay BI (restricted lines)]

SLIDE 52

Experimental Results of Buffer Block Planning

Buffer block planning reduces # buffer blocks, better meets timing constraints, and uses smaller area

[Figure: bar chart, scale 0.2 to 1.4; series: No-planning, With planning; metrics: #nets that meet delay constraints, #Buffer Block, area]

SLIDE 53

Concluding Remarks

- High-performance designs in DSM technologies need careful interconnect planning
- Efficient interconnect performance estimation models (IPEMs) are important for interconnect planning
- Top-level partitioning defines global and local interconnects, and impacts performance significantly
- Retiming and pipelining over global interconnects are necessary for multi-gigahertz designs
- A clever combination of partitioning and retiming can hide (some) global interconnect delays
- Buffer block planning helps to reduce complexity while achieving good performance

SLIDE 54

Acknowledgments

- Thanks to Sung Lim, David Pan, and Xin Yuan at UCLA for their help with slides
- Thanks to SRC, MARCO/GSRC, and Intel Corp. for their support of a number of research projects covered in this tutorial
- Updated slides in PDF file will be available at