An Interconnect-Centric Design Flow for Nanometer Technologies


slide-1
SLIDE 1

An Interconnect-Centric Design Flow for Nanometer Technologies

Professor Jason Cong
<cong@cs.ucla.edu>
UCLA Computer Science Department
Los Angeles, CA 90095
http://cadlab.cs.ucla.edu/~cong

VLSI-TSA'99 Jason Cong 2

Gate Delays vs. Interconnect Delays

Source: National Technology Roadmap of Semiconductors (1997)

slide-2
SLIDE 2

Interconnect-Centric Design Methodology

  • Proposed transition: device/function centric => interconnect/communication centric
  • Analogy: programs and data/objects

[Figure: device and interconnect, shown alongside programs and data/objects]

Interconnect-Centric Design Flow

  • Key steps in an interconnect-centric design flow:
      • Interconnect planning
      • Interconnect synthesis
      • Interconnect layout
  • Other supporting tools to enable an interconnect-centric design flow:
      • Interconnect performance estimation
      • Interconnect performance verification

slide-3
SLIDE 3

Outline of the Talk

  • Interconnect Synthesis
  • Interconnect Performance Estimation
  • Interconnect Planning

Interconnect Synthesis

Constraints:

  • Delay
  • Skew
  • Signal integrity
  • ...

Optimized interconnect designs:

  • Topology
  • Sizing
  • Spacing
  • Buffer insertion

Automatic solutions guided by accurate interconnect models

slide-4
SLIDE 4

Example: Single-Net Optimal Wire Sizing (OWS) [Cong-Leung, ICCAD’93]

  • Given: A set of possible wire widths { W1, W2, …, Wr }
  • Find: An optimal wire width assignment to minimize distributed RC delay

[Figure: wiresizing optimization]
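The OWS problem statement can be illustrated with a small sketch. This is not the Cong-Leung algorithm: it enumerates every width assignment by brute force, which is only practical for a handful of segments, and it assumes an Elmore delay model with a uniform segment length. All parameter values (`r`, `c_a`, `c_f`, `R_d`, `C_L`, the width choices) are hypothetical placeholders, in ohms, fF, and um.

```python
from itertools import product

def elmore_delay(widths, seg_len, r, c_a, c_f, R_d, C_L):
    """Elmore delay of driver -> chain of wire segments -> load.

    widths[0] is the segment next to the driver. r is the resistance of a
    unit-length, unit-width wire; c_a and c_f are area and fringing
    capacitance coefficients (assumed model, not taken from the slides).
    """
    res = [r * seg_len / w for w in widths]
    cap = [(c_a * w + c_f) * seg_len for w in widths]
    delay = R_d * (sum(cap) + C_L)           # driver sees all capacitance
    downstream = C_L
    for R, C in zip(reversed(res), reversed(cap)):
        delay += R * (C / 2 + downstream)    # distributed segment + downstream load
        downstream += C
    return delay

def optimal_wire_sizing(choices, n_seg, **params):
    """Exhaustive search over all width assignments (exponential in n_seg)."""
    return min(product(choices, repeat=n_seg),
               key=lambda ws: elmore_delay(ws, **params))

# Hypothetical parameters for illustration only.
PARAMS = dict(seg_len=1000.0, r=0.05, c_a=0.03, c_f=0.08, R_d=100.0, C_L=50.0)
best = optimal_wire_sizing((0.5, 1.0, 2.0, 4.0), 6, **PARAMS)
print(best, elmore_delay(best, **PARAMS))
```

Under this model the best assignment never widens toward the sink, since swapping a narrower upstream segment with a wider downstream one always reduces the Elmore delay.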

Example: Global Interconnect Sizing and Spacing (GISS) [Cong et al, ICCAD’97]

  • Given:
      • Initial layout of multiple nets
      • Critical sinks and their criticalities
      • Capacitance model and design rules
  • Output:
      • Sizing and spacing of every net to minimize RC delays with consideration of coupling cap.

[Figure: sizing and spacing]

slide-5
SLIDE 5

Capacitance Model

  • 2.5D capacitance model [Cong et al, DAC’97]
      • Consider: Ca (area), Cf (fringing) and Cx (coupling)
      • Build capacitance table from 3D field solver (FastCap)
      • Table lookup by interpolation and extrapolation

[Figure: wire cross-section showing Ca, Cf and Cx]
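Table lookup by interpolation and extrapolation can be sketched as follows. This is a generic bilinear lookup, not the 2.5D model itself; the axes (`WIDTHS`, `SPACINGS`) and the `CX_TABLE` values are made-up placeholders standing in for numbers that would come from a field solver.

```python
from bisect import bisect_right

# Hypothetical table axes (um) and coupling capacitance values (fF/um).
WIDTHS = [0.1, 0.2, 0.4]
SPACINGS = [0.1, 0.2, 0.4]
CX_TABLE = [
    [0.080, 0.045, 0.022],   # width 0.1
    [0.082, 0.047, 0.024],   # width 0.2
    [0.085, 0.050, 0.026],   # width 0.4
]

def _bracket(axis, x):
    """Return (i, t) so that x = axis[i] + t * (axis[i+1] - axis[i]).

    The index is clamped to the table, so t < 0 or t > 1 outside the
    axis range, which yields linear extrapolation from the edge cell.
    """
    i = bisect_right(axis, x) - 1
    i = max(0, min(i, len(axis) - 2))
    t = (x - axis[i]) / (axis[i + 1] - axis[i])
    return i, t

def lookup(table, width, spacing):
    """Bilinear interpolation (or extrapolation) in a width x spacing table."""
    i, tw = _bracket(WIDTHS, width)
    j, ts = _bracket(SPACINGS, spacing)
    lo = table[i][j] + ts * (table[i][j + 1] - table[i][j])
    hi = table[i + 1][j] + ts * (table[i + 1][j + 1] - table[i + 1][j])
    return lo + tw * (hi - lo)

print(lookup(CX_TABLE, 0.15, 0.1))   # halfway between the first two width rows
print(lookup(CX_TABLE, 0.5, 0.5))    # extrapolated beyond the table
```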

Main Approaches to GISS

  • Heuristic: Optimize one net at a time: bottom-up dynamic programming (optimal for one net)
  • Better approach: Compute upper and lower bounds of opt. wire widths/spacings of all nets
      • Extended local refinement (ELR) using generalized CH-posynomial formulation
      • Or iterative bound refinement (BR)
      • In practice, lower and upper bounds meet most of the time => optimal solution.

slide-6
SLIDE 6

GISS Optimization Results

Average delays (ns):

Center spacing   MIN    OWS            GISS/S         GISS/M
2 x pitch        1.51   1.26 (-17%)    0.82 (-46%)    0.76 (-50%)
3 x pitch        1.33   0.73 (-45%)    0.56 (-58%)    0.50 (-62%)
4 x pitch        1.28   0.46 (-64%)    0.45 (-65%)    0.40 (-69%)
5 x pitch        1.25   0.38 (-70%)    0.37 (-70%)    0.35 (-72%)

  • 16-bit 10mm bus structure equally spaced, with 4 different center spacings from 2x to 5x min. pitch
  • pitch = min. width + min. spacing
  • For non-equal net weights, GISS/M should have a larger advantage over GISS/S

UCLA TRIO Package (Tree, Repeater, Interconnect Optimization)

  • Synthesis/optimization capabilities
      • Interconnect topology optimization
      • Optimal buffer insertion
      • Wiresizing optimization
      • Global interconnect sizing and spacing
      • Simultaneous driver, buffer, and interconnect sizing
      • Simultaneous topology generation with buffer insertion and wiresizing
      • ...
  • Efficient polynomial-time optimal/near-optimal algorithms
  • Interconnect performance can be improved by up to 7x!
  • Available on the web: http://cadlab.cs.ucla.edu/~trio (demo at DAC’99)

slide-7
SLIDE 7

Impact of Interconnect Optimization
For a 2cm Global Interconnect Using the TRIO Package

[Figure: delay (ns) vs. technology (um: 0.25, 0.18, 0.15, 0.13, 0.10, 0.07); curves: 2cm DS, 2cm BIS, 2cm BISWS]

  • DS: Driver Sizing only
  • BIS: Buffer Insertion and Sizing
  • BISWS: Simultaneous Buffer Insertion/Sizing and Wiresizing

5x ~ 7x performance improvement!

Interconnect Synthesis in Layout Design Flow

Flow steps:

  • Chip-planning, Floorplanning, Global Int. Planning and Optimization
  • Timing Driven Placement
  • Delay Budgeting
  • Performance Driven Global Routing with Interconnect Optimization
  • Detailed Routing with Variable Width and Spacing

Interconnect Optimizations Library (e.g. TRIO), used by the flow steps:

  • Topology Optimization
  • Buffer Insertion
  • Device Sizing
  • Wiresizing
  • ...

slide-8
SLIDE 8

Outline of the Talk

  • Interconnect Synthesis
  • Interconnect Performance Estimation
  • Interconnect Planning

Interconnect Performance Estimation

  • Problem: Estimate the optimized interconnect delay, area, etc., without actually running the optimization algorithms (such as TRIO)!

[Figure: input, gate G0, and sinks S1, S2, ..., Sn with capacitances Cs1, Cs2, ..., Csn]

slide-9
SLIDE 9

Needs for Interconnect Performance Estimation Models

  • Efficiency
      • need to explore many micro-architectures/floorplans => requires processing > 1 million nets/second
      • cannot afford actual synthesis/optimization (1-100 nets/second)
  • Abstraction to hide detailed design information
      • granularity of wire segmentation
      • number of wire widths, buffer sizes, ...
  • Explicit relation to enable optimal design decisions at high levels
  • Result: very efficient (constant-time) estimation models for various interconnect optimization operations

Example: Delay/Area Estimation under OWS

  • Closed-form delay estimation formula:

$$T_{ows}(R_d, l, C_L) = \left( \frac{\alpha_1 l}{W^2(\alpha_2 l)} + \frac{2\alpha_1 l}{W(\alpha_2 l)} + R_d c_f + \sqrt{R_d\, r\, c_a c_f\, l} \right) \cdot l$$

where $\alpha_1 = \frac{1}{4} r c_a$, $\alpha_2 = \frac{1}{2}\sqrt{\frac{r c_a}{R_d C_L}}$, and $W(x)$ is Lambert’s W function, defined as the value $w$ satisfying $w e^w = x$.

  • Closed-form area estimation formula:

$$A_{ows}(R_d, l, C_L) = \sqrt{\frac{r\,(c_f l + 2 C_L)}{2 R_d c_a}} \cdot l$$
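A constant-time evaluation of these closed forms might look like the sketch below. It assumes the delay formula reads T = (a1·l/W²(a2·l) + 2·a1·l/W(a2·l) + R_d·c_f + sqrt(R_d·r·c_a·c_f·l))·l with a1 = r·c_a/4 and a2 = (1/2)·sqrt(r·c_a/(R_d·C_L)), and the area formula reads A = sqrt(r·(c_f·l + 2·C_L)/(2·R_d·c_a))·l; the square roots are an assumption based on dimensional consistency. Lambert's W is computed with a small Newton iteration so that only the standard library is needed, and the numeric parameters are hypothetical.

```python
import math

def lambert_w(x, tol=1e-12):
    """Principal branch of Lambert's W for x >= 0: the w with w * exp(w) = x."""
    if x < 0:
        raise ValueError("this sketch only handles x >= 0")
    w = math.log1p(x)                    # starts at or above the root for x >= 0
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))   # Newton step on w*e^w - x
        w -= step
        if abs(step) < tol * (1.0 + abs(w)):
            break
    return w

def ows_delay(R_d, l, C_L, r, c_a, c_f):
    """Delay under optimal wire sizing, using the formula as assumed above."""
    a1 = r * c_a / 4.0
    a2 = 0.5 * math.sqrt(r * c_a / (R_d * C_L))
    w = lambert_w(a2 * l)
    return (a1 * l / w**2 + 2.0 * a1 * l / w
            + R_d * c_f + math.sqrt(R_d * r * c_a * c_f * l)) * l

def ows_area(R_d, l, C_L, r, c_a, c_f):
    """Wire area under optimal wire sizing, using the formula as assumed above."""
    return math.sqrt(r * (c_f * l + 2.0 * C_L) / (2.0 * R_d * c_a)) * l

# Hypothetical parameters (ohm, fF, um), for illustration only.
P = dict(R_d=100.0, C_L=50.0, r=0.05, c_a=0.03, c_f=0.08)
for length in (4000.0, 12000.0, 20000.0):
    print(length, ows_delay(l=length, **P), ows_area(l=length, **P) / length)
```

Dividing the area by the length gives an average width, which is the quantity a planner would compare against a routing-resource budget.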

slide-10
SLIDE 10

Delay Comparison of OWS model vs. TRIO

[Figure: delay (ns) vs. length (um, 4000 to 20000); curves: Model, TRIO]

  • OWS delay model consistently matches TRIO.
  • 0.10um technology from NTRS’97. Driver is 100x min. To run TRIO, 40 discrete wire widths are used with the max width set to be 40x min width.

Average Width (Area) Comparison

[Figure: width (um) vs. length (um, 4000 to 20000); curves: Model, TRIO]

  • Area estimation model for OWS almost exactly matches TRIO.

slide-11
SLIDE 11

Example: Delay Estimation Model for BIWS

  • Problem: estimate interconnect delay with optimal buffer insertion and wire sizing (BIWS)
  • Critical length for BIWS:
      • threshold length over which buffer insertion provides additional delay reduction over optimal wire sizing (OWS)
  • Critical length for BIWS can be computed efficiently

Critical Lengths of Un-Buffered Wires (unit: mm)

Technology (um)   0.25   0.18   0.15   0.13   0.10   0.07
b=10x             4.12   3.80   3.97   3.61   2.92   2.08
b=50x             6.40   5.81   6.01   5.51   4.45   3.30
b=100x            7.47   6.83   7.04   6.39   5.30   3.91
b=200x            8.65   7.92   8.14   7.43   6.35   4.49
b=500x            9.98   9.10   9.30   8.57   7.13   5.21
Min. WS           2.52   2.23   2.14   1.94   1.50   1.43

  • Min. WS row: without wire sizing [Otten ISPD’98, Otten-Brayton DAC’98]
  • b=10x to b=500x rows: with optimal wire sizing [Cong-Pan, IWLS’98/ASP-DAC’99]

slide-12
SLIDE 12

Example: Delay Estimation Model for BIWS (Cont’d)

  • Linear delay estimation model for BIWS:

$$T_{biws} = \tau_{biws} \cdot l + t_g$$

where $\tau_{biws}$ is the slope, and can be obtained from optimal wire sizing for the critical length.

Comparison of BIWS Model vs. TRIO

[Figure: delay modeling; delay (ns) vs. length (um, 4000 to 20000); curves: Model, TRIO]

  • Rd0 = rg/10, CL = cg x 10, buffer type is 100x min.
  • For the experiment, max. wire width is 20x min. width, and the wire is segmented every 100um.

slide-13
SLIDE 13

Outline of the Talk

  • Interconnect Synthesis
  • Interconnect Performance Estimation
  • Interconnect Planning

Interconnect Planning

  • Interconnect architecture planning (pre-design)
      • Decide within freedom of fabrication technology:
          • number of routing layers
          • metal and isolation material at each layer
          • thickness of each metal and isolation layer
          • nominal width and spacing on each layer
          • vertical interconnection schemes (via structure?) ...
  • Interconnect planning with RTL-floorplan
  • Interconnect planning with physical-level floorplan

slide-14
SLIDE 14

Interconnect Planning (cont’d)

  • Interconnect architecture planning (pre-design)
  • Interconnect planning with RTL-floorplan
      • Define global and local interconnects
      • Estimate overall interconnect distribution
      • Guide RTL-level and logic-level synthesis/optimization
          • Re-partition of design hierarchy
          • Logic replication
          • Retiming and pipelining
          • ...
  • Interconnect planning with physical-level floorplan

Interconnect Planning (cont’d)

  • Interconnect architecture planning (pre-design)
  • Interconnect planning with RTL-floorplan
  • Interconnect planning with physical-level floorplan
      • Interconnect topologies
      • Wire ordering
      • Wire width and spacing
      • Number of buffers and their locations
      • ...

slide-15
SLIDE 15

Example: Optimal Wire-Width Planning

  • Given:
      • Certain technology
      • Wire length distribution per layer
  • Find:
      • A small set of “globally optimal” widths per layer
      • Performance/Area optimization
  • Motivation
      • Simplify interconnect optimization
      • Simplify detailed routing, layout extraction, ...

Overall Flow

For each metal layer i:
  • Assign length range lmin and lmax;
  • Find a small set of optimal widths W to minimize

$$\Phi(\vec{W}, l_{min}, l_{max}) = \int_{l_{min}}^{l_{max}} \lambda(l) \cdot f(\vec{W}, l)\, dl$$

  • $f(\vec{W}, l)$: the objective function to be minimized by the design for wire length $l$, using $\vec{W}$
  • $\lambda(l)$: the weight function for wire length $l$

Method: Analytical or numerical
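The numerical method can be sketched for the simplest case of a single width per layer. The code below integrates an objective f(W, l) over a length range with a uniform weight and picks the best width from a grid. The uniform-width Elmore delay model and all parameter values are assumptions for illustration, not the model used in the study; the two objectives compared are delay only (T) and the area-delay product A·T^4.

```python
def wire_delay(w, l, r=0.05, c_a=0.03, c_f=0.08, R_d=100.0, C_L=50.0):
    """Elmore delay of a uniform-width wire (hypothetical parameters: ohm, fF, um)."""
    R_w = r * l / w                     # wire resistance
    C_w = (c_a * w + c_f) * l           # wire capacitance (area + fringing)
    return R_d * (C_w + C_L) + R_w * (C_w / 2.0 + C_L)

def phi(f, w, l_min, l_max, weight=lambda l: 1.0, n=200):
    """Midpoint-rule approximation of the integral of weight(l) * f(w, l) dl."""
    h = (l_max - l_min) / n
    return sum(weight(l_min + (k + 0.5) * h) * f(w, l_min + (k + 0.5) * h)
               for k in range(n)) * h

def best_width(f, widths, l_min, l_max):
    """Grid search for the single width that minimizes phi."""
    return min(widths, key=lambda w: phi(f, w, l_min, l_max))

def f_T(w, l):
    return wire_delay(w, l)                     # performance only

def f_AT4(w, l):
    return (w * l) * wire_delay(w, l) ** 4      # area times delay^4

WIDTHS = [0.05 * k for k in range(1, 121)]      # candidate widths, 0.05 to 6.0 um
w_T = best_width(f_T, WIDTHS, 8000.0, 23000.0)
w_AT4 = best_width(f_AT4, WIDTHS, 8000.0, 23000.0)
print(w_T, w_AT4)
```

With these placeholder numbers the A·T^4 objective selects a narrower wire than the delay-only objective, which is the trade-off the area-aware metric is meant to expose. Extending the search to pairs of widths would follow the same pattern, with f choosing the better of the two widths at each length.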

slide-16
SLIDE 16

Objective in Our Study

$$f(\vec{W}, l) = A^j(\vec{W}, l) \cdot T^k(\vec{W}, l)$$

where A: area, T: delay.

  • $f(\vec{W}, l) = T(\vec{W}, l)$ (performance only)

or

  • $f(\vec{W}, l) = A(\vec{W}, l) \cdot T^4(\vec{W}, l)$ (performance-driven and area-saving)

Recommendation for Future Tech.

  • 2-width design under objective function of AT^4
  • Wiring hierarchy for both performance and density!

Technology (um)      0.25        0.18        0.13        0.10        0.07
Tier1  Range (mm)    0-2.50      0-1.80      0-1.30      0-1.00      0-0.70
       W (um)        0.25        0.18        0.13        0.10        0.07
Tier2  Range (mm)    2.50-6.50   1.80-5.85   1.30-3.27   1.00-2.84   0.70-2.30
       W1 (um)       0.25        0.18        0.13        0.10        0.08
       W2 (um)       0.50        0.36        0.26        0.20        0.16
Tier3  Range (mm)    6.50-17.3   5.85-19.0   3.27-8.23   2.84-8.04   2.30-7.57
       W1 (um)       0.65        0.47        0.24        0.22        0.23
       W2 (um)       1.30        0.94        0.48        0.44        0.46
Tier4  Range (mm)    -           -           8.23-20.7   8.04-22.8   7.57-24.9
       W1 (um)       -           -           0.98        1.00        1.06
       W2 (um)       -           -           1.96        2.00        2.12

Strawman [Otten-Brayton, DAC’98]: 1.0um, 2.0um

slide-17
SLIDE 17

Two Simple Wire Sizing Schemes

[Figure: delay (ns) vs. length (um, 4000 to 20000); curves: Tier1-1WS, Tier1-2WS, Tier1-OWS, Tier4-1WS, Tier4-OWS]

  • 1-WS and 2-WS have less than 10% difference from OWS for length <4mm in Tier1
  • Both 1-WS and 2-WS work well in Tier4 up to chip size

A Performance-Driven, Area-Saving Metric

[Figure: metric (0.01 to 10) vs. width (um, 0.5 to 4); curves: T, AT, AT^2, AT^3, AT^4; the optimal width for delay T and the optimal width for AT^4 are marked]

  • Opt. width for AT^4: only increases delay by 10%, saves area by 60%!
  • 0.10um tech
  • Top layer pair
  • Length range 8-23 mm
  • Assume uniform distribution
  • Metric: integral of T, AT, AT^2, …, AT^4
  • Driver/load 100x min gate

slide-18
SLIDE 18

Experimental Setting

  • For each metal pair (tier), assume certain wire length range
  • Assume the max length in tier1 is 10,000x feature size, and top tier is Ledge (chip dimension) [Fisher+’98]
  • Intermediate tier length range follows a geometric sequence
  • Representative driver size for each metal layer (10x, 40x, 100x, and 250x for tiers 1-4)

Tier boundaries shown: 1, 2.84, 8.04, 22.8 mm
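The geometric sequence of tier boundaries can be reproduced with a few lines. The sketch assumes the first boundary is 10,000x the feature size (1 mm at 0.10 um) and the last is the chip edge (22.8 mm, the value shown on the slide); the function name is hypothetical.

```python
def tier_boundaries(l_first, l_top, n_tiers):
    """Upper length bound of each tier, spaced as a geometric sequence."""
    ratio = (l_top / l_first) ** (1.0 / (n_tiers - 1))
    return [l_first * ratio**k for k in range(n_tiers)]

# 0.10 um technology: 10,000 x 0.10 um = 1 mm for tier 1, 22.8 mm chip edge.
bounds = tier_boundaries(1.0, 22.8, 4)
print([round(b, 2) for b in bounds])
```

The common ratio here is the cube root of 22.8, roughly 2.84, so each tier covers wires about 2.84 times longer than the tier below it.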

A Rather Surprising Result: 2 Widths Per Layer are Sufficient! [DAC’99]

  • Assumptions: 0.10 um process, layers 7&8 (8.04 -- 22.8 mm), under AT^4 metric, limited driver size variation per layer
  • 2-width design superior to 1-width
      • delay reduction up to 12.4%
      • area saving up to 48%!
  • 2-width design comparable to many-width
      • Avg. delay difference less than 5% and max. delay difference less than 7%
      • Area difference less than 4.7%

           pitch-sp=2um              pitch-sp=2.9um            pitch-sp=3.8um
scheme     avg-d   max-err  avg-w    avg-d   max-err  avg-w    avg-d   max-err  avg-w
1-width    0.245   28%      1.98     0.177   16%      1.83     0.143   6%       1.63
2-width    0.215   7%       1.08     0.167   5.90%    1.23     0.14    4%       1.41
m-width    0.204   0%       1.03     0.159   0%       1.19     0.136   0%       1.38

slide-19
SLIDE 19

Summary

  • Paradigm shift
      • Device/function-centric => interconnect/communication-centric
  • Key components in an interconnect-centric design flow
      • Interconnect planning
      • Interconnect synthesis
      • Interconnect layout
  • Also need estimation, simulation, and verification tools at each stage for interconnect performance and signal integrity

Acknowledgements

  • Thanks for support from
      • Semiconductor Research Corporation (SRC)
      • National Science Foundation (NSF)
      • Defense Advanced Research Projects Agency (DARPA)
      • Intel Corporation
  • More information:
      • http://cadlab.cs.ucla.edu/~cong

slide-20
SLIDE 20

“Logic Volume” within critical lengths

Technology (um)   0.25   0.18   0.15   0.13   0.10   0.07
2-NAND (um2)      7.80   4.04   3.00   2.18   1.28   0.64
b=10x             0.55   0.89   1.31   1.49   1.66   1.69
b=50x             1.31   2.09   3.01   3.48   3.87   4.25
b=100x            1.79   2.88   4.13   4.68   5.48   5.97
b=200x            2.4    3.88   5.52   6.33   7.87   7.88
b=500x            3.19   5.12   7.21   8.42   9.93   10.6

  • Defined as the number of min 2-input NAND gates that can be packed within the area of lc/2 * lc/2 (unit: million)
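The definition can be checked with simple arithmetic. The sketch below takes a critical length lc in mm and a minimum 2-input NAND area in um2 and returns the gate count in millions; the function name is hypothetical, and the sample inputs are the 0.07 um, b=10x critical length (2.08 mm) and NAND area (0.64 um2) listed in the slides.

```python
def logic_volume_millions(lc_mm, nand_area_um2):
    """Min 2-input NAND gates (in millions) that fit in an (lc/2) x (lc/2) square."""
    side_um = lc_mm * 1000.0 / 2.0          # half the critical length, in um
    return side_um ** 2 / nand_area_um2 / 1e6

# 0.07 um technology, b=10x: lc = 2.08 mm, 2-NAND area = 0.64 um2.
print(round(logic_volume_millions(2.08, 0.64), 2))   # 1.69
```

The result matches the 1.69 million entry for 0.07 um at b=10x.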

Another Example: Buffer Block Planning

  • Problem: automatically generate buffer blocks during physical-level floorplan
  • Motivation:
      • Avoid buffers over hard IP-blocks
      • Power/ground network sharing among buffers
      • More regular layout, etc.

[Figure: logic blocks and buffer blocks]

slide-21
SLIDE 21

Experimental Result: Number of BB

  • RDM: a buffer is randomly assigned to a feasible location
  • BBP: buffers are clustered appropriately
  • RES: Restricted (delay-minimal) buffer insertion point
  • FR: feasible buffer region for delay constraints
  • Our buffer block planning (BBP) algorithm can reduce the number of buffer blocks to 1/10~1/20 of those from RDM

Circuit    RDM/RES   RDM/FR   BBP/RES   BBP/FR
Apte       222       248      53        29
Xerox      460       515      83        44
Hp         323       329      67        33
Ami33      338       376      97        42
Ami49      435       479      90        44
playout    763       824      101       47

Interconnect Layout

  • Need a multi-layer general-area router
      • gridless
      • flexible (variable widths within the same segment, variable spacings for each pair of nets)
      • efficient
  • Will leverage our current research on gridless routing
      • Use of implicit graph representation
      • Use of computational geometry techniques
      • Highly scalable and flexible