
Physical Hierarchy Generation

Jason Cong
UCLA Computer Science Department
Email: cong@cs.ucla.edu
Tel: 310-206-2775
http://cadlab.cs.ucla.edu/~cong

Jason Cong, 08/15/2001

Outline

  • Global interconnects in nanometer technologies
  • Interconnect-centric design flow
  • Physical hierarchy generation
    – Motivation
    – Approaches
    – Results and on-going work


Global/Local Interconnect Delays vs. Gate Delays

[Figure: delay (ns, log scale from 0.01 to 10) vs. technology generation (0.25, 0.18, 0.15, 0.13, 0.1, 0.07 um) for a 1mm line, a 2cm line (unoptimized), a 2cm line (optimized), and the intrinsic gate delay]

Optimization is obtained by buffer insertion/sizing and wire sizing.


Clock Cycles Required to Travel a 2cm Line under BIWS (Buffer Insertion and Wire Sizing)

[Figure: clock cycles (1 to 5) needed to travel a 2cm line at 1 GHz, 3 GHz, and 5 GHz for the 0.25, 0.18, 0.13, 0.10, and 0.07 um technologies]

  • Estimated by IPEM on NTRS'97 technology
  • Driver size: 100x min gate
  • Receiver size: 100x min gate
  • Buffer size: 100x min gate


How Far Can We Go in Each Clock Cycle?

[Figure: distance reachable in 1 to 7 clock cycles across the die, with marks at 7.52, 15.04, 22.56, and 24.9 mm]

  • NTRS'97 0.07um technology
  • 5 GHz across-chip clock
  • 620 mm² die (24.9mm x 24.9mm)
  • IPEM BIWS estimations
    – Buffer size: 100x
    – Driver/receiver size: 100x
  • From corner to corner: 7 clock cycles
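The corner-to-corner figure can be checked with a little arithmetic. The sketch below assumes that reach grows linearly at 7.52 mm per cycle (as the 7.52/15.04/22.56 mm marks suggest) and that a corner-to-corner route is rectilinear, i.e., twice the 24.9 mm die edge; both are our assumptions, not statements from the slide. The constant and function names are illustrative.

```python
import math

REACH_PER_CYCLE_MM = 7.52   # distance covered in one 5 GHz cycle (from the chart marks)
DIE_EDGE_MM = 24.9          # 24.9 mm x 24.9 mm die

def cycles_to_cover(distance_mm, reach_mm=REACH_PER_CYCLE_MM):
    """Whole clock cycles needed to cover distance_mm, assuming linear reach."""
    return math.ceil(distance_mm / reach_mm)

die_area_mm2 = DIE_EDGE_MM ** 2          # about 620 mm^2
corner_to_corner_mm = 2 * DIE_EDGE_MM    # assumed Manhattan (rectilinear) route
print(round(die_area_mm2), corner_to_corner_mm, cycles_to_cover(corner_to_corner_mm))
# prints: 620 49.8 7
```

Under a straight-line (Euclidean) distance of about 35.2 mm the same reach would give 5 cycles, so the slide's 7 cycles is consistent with rectilinear routing.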

Two Important Implications

  • Interconnects determine the system performance
    – Interconnect/communication-centric design methodology
  • Multiple clock cycles are needed to cross the global interconnects in gigahertz designs
    – Pipelining/retiming on global interconnects


Interconnect-Centric Design Methodology

[Figure: proposed transition from device/function-centric to interconnect/communication-centric design, with the analogy of programs and data/objects]


Interconnect-Centric IC Design Flow Under Development at UCLA

[Figure: design flow from Design Specification and Architecture/Conceptual-level Design down to Final Layout, with structure, functional, physical, and timing views at each abstraction level, built around HDM]

Synthesis and Placement under Physical Hierarchy

Interconnect Planning
  • Physical Hierarchy Generation
  • Floorplan/Coarse Placement with Interconnect Planning
  • Interconnect Architecture Planning

Interconnect Optimization (TRIO)
  • Topology Optimization with Buffer Insertion
  • Wire Sizing and Spacing
  • Simultaneous Buffer Insertion and Wire Sizing
  • Simultaneous Topology Construction with Buffer Insertion and Wire Sizing

Interconnect Layout
  • Route Planning
  • Point-to-Point Gridless Routing

Interconnect Performance Estimation Models (IPEM)
  • OWS, SDWS, BISWS

Interconnect Synthesis
  • Performance-driven Global Routing
  • Pseudo Pin Assignment under Noise Control


Interconnect Planning

  • Physical Hierarchy Generation
  • Floorplan/Coarse Placement with Interconnect Planning

  • Interconnect Architecture Planning


Physical Hierarchy Generation

Motivation:

  • Designs are hierarchical due to their high complexity
  • The design specification (in HDL) defines the logic hierarchy
  • Common practice partitions the design following the logic hierarchy
  • But the logic hierarchy may not be suitable for embedding on a 2D silicon surface, resulting in poor interconnects

Example of Logic Hierarchy

module cpu(pj_su, pj_boot8, …);
  input …;
  output …;
  fpu fpu(.fpain(iu_rs2_e), .fpbin(iu_rs1_e), .fpop(fpop), .fpbusyn(fp_rdy_e),
          .fpkill(iu_kill_fpu), .fpout(fpu_data_e), .clk(clk), …);
  pcsu pcsu(.pj_clk_out(pj_clk_out), …);
  smu smu(.iu_optop_in(iu_optop_din), …);
  dtag_shell dtag_shell(.tag_in(dcu_tag_in), …);
  dcram_shell dcram_shell(.data_in({dcu_din_e[31], …);
  dcu dcu(.biu_data(pj_datain), …);
  itag_shell itag_shell(.icu_tag_in(icu_tag_in), …);
  icram_shell icram_shell(.icu_din(icu_din), …);
  icu icu(.biu_data(pj_datain), …);
  iu iu(.iu_data_vld(iu_data_vld), …);
endmodule

[Figure: the Verilog top module cpu and its logic hierarchy: Integer Unit (IU), FPU, SMU, DCU, ICU, PCSU, ICRAM, DCRAM, itag, dtag, memory, SRAM, latches]


Example of Logic Hierarchy in Final Layout

By courtesy of IBM (Tony Drumm)


What Have We Learned?

  • Logic hierarchy may not map well to physical hierarchy
  • Floorplanning of logic blocks at the RT level may be a bad idea
  • Alternatives?
    – Synthesis under physical hierarchy!


Physical Hierarchy Generation Problem Formulation

[Figure: logical hierarchy of hard IP blocks and soft modules, with the same color for modules of the same logic hierarchy]

Assign modules to the physical hierarchy with interconnect estimation and optimization.


Impact of Physical Hierarchy Generation

The physical hierarchy defines the global interconnects.

[Figure: global interconnects defined by two different physical hierarchies, with the critical path and latches marked]


Synthesis under Physical Hierarchy

[Figure: alternative architecture block selection (A=3, D=4 vs. A=4, D=3) and re-synthesis and retiming on the critical path, with latches]


Difficulties in Physical Hierarchy Generation

  • How to consider retiming/pipelining over global interconnects
    – Use the concepts of sequential arrival/required times
  • How to handle the high complexity of “almost flattened” designs
    – Use multi-level optimization techniques

The Need to Consider Retiming during Placement

  • Retiming/pipelining on global interconnects
    – Multiple clock cycles are needed to cross the chip
    – Proper placement allows retiming to hide global interconnect delays

[Figure: two placements of nodes a, b, c, d with d(v) = 1, WL = 6, and d(e) ∝ WL. Placement 1: φ = 5.0 before retiming, φ = 3.0 after retiming. Placement 2: φ = 4.0 before retiming, φ = 4.0 after retiming.]

Placement 2 is the better initial placement (φ = 4.0 vs. 5.0), but Placement 1 is better after retiming (φ = 3.0 vs. 4.0).


Sequential Arrival Time (SAT)

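Definition [Pan et al, TCAD98]: l(v) is the maximum delay from the PIs to v after optimal retiming under a given clock period φ:

$$ l(v) = \max_{(u,v)} \{\, l(u) - \varphi \cdot w(u,v) + d(u,v) + d(v) \,\} $$

where w(u,v) is the number of registers on edge (u,v), d(u,v) is the edge delay, and d(v) is the delay of v.

Relation to retiming: $r(v) = \lceil l(v)/\varphi \rceil - 1$

Theorem: P can be retimed to clock period $\varphi + \max_e d(e)$ iff $l(\mathrm{PO}) \le \varphi$ for all POs.

Example: node v has fanins u (l(u) = 7, w(u,v) = 1) and w (l(w) = 3, no register on (w,v)), with d(v) = 1, d(e) = 2, φ = 5:

$$ l(v) = \max\{7 - 5 \cdot 1 + 2 + 1,\; 3 + 2 + 1\} = 6 $$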
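The recurrence can be checked numerically. This minimal sketch evaluates it at a single node using the slide's example values, then applies the stated relation to retiming. The tuple encoding of fanins and the function names `sat_update` and `retiming_label` are hypothetical, chosen only for illustration.

```python
import math

def sat_update(fanins, d_v, phi):
    """One application of the SAT recurrence at node v.

    fanins: list of (l_u, w_uv, d_uv) tuples, where l_u is the fanin's
    sequential arrival time, w_uv the number of registers on edge (u, v),
    and d_uv the edge delay; d_v is v's own delay, phi the clock period.
    """
    return max(l_u - phi * w_uv + d_uv + d_v for l_u, w_uv, d_uv in fanins)

def retiming_label(l_v, phi):
    """r(v) = ceil(l(v) / phi) - 1."""
    return math.ceil(l_v / phi) - 1

l_v = sat_update([(7, 1, 2), (3, 0, 2)], d_v=1, phi=5)
print(l_v, retiming_label(l_v, phi=5))  # prints: 6 1
```

The register on (u, v) discounts u's contribution by one clock period, which is why the unregistered fanin w, despite its smaller l-value, determines l(v).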


Sequential Arrival Time (SAT)

  • With loops, the problem is difficult
    – A topological order does not exist!
    – Start with a minimum l-value for each node and iteratively improve it
    – Convergence is guaranteed in O(n) iterations if the circuit can be retimed to the target cycle time
  • Outline of our approach
    – SAT(PI) = 0, SAT(others) = -∞
    – Relax one vertex at a time and update the l-values
    – Complexity is O(VE)
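The iterative scheme resembles a longest-path Bellman-Ford computation with edge weight d(u,v) + d(v) - φ·w(u,v). The sketch below is written under that reading. It relaxes edges rather than one vertex at a time (each pass visits the same fanin terms), and it reports infeasibility when values still change after |V| passes, i.e., when some loop's delay exceeds φ times its register count. The graph encoding and the example circuit are hypothetical.

```python
import math

def sequential_arrival_times(nodes, edges, d, phi, pis):
    """Compute SAT l-values by repeated relaxation (Bellman-Ford style).

    nodes: node names; edges: (u, v, w, d_e) tuples, with w registers and
    edge delay d_e on edge (u, v); d: node delays; phi: target clock period;
    pis: set of primary inputs. Returns {node: l-value}, or None if values
    still change after len(nodes) passes (phi is infeasible for some loop).
    """
    l = {v: -math.inf for v in nodes}  # SAT(others) = -inf
    for p in pis:
        l[p] = 0.0                      # SAT(PI) = 0
    for _ in range(len(nodes)):         # at most |V| passes of O(E) work each
        changed = False
        for u, v, w, d_e in edges:
            if v in pis:
                continue
            cand = l[u] - phi * w + d_e + d[v]
            if cand > l[v]:
                l[v] = cand
                changed = True
        if not changed:
            return l
    return None

# Hypothetical example: PI a feeds a loop b <-> c (one register on c -> b); c drives PO o.
nodes = ["a", "b", "c", "o"]
edges = [("a", "b", 0, 2), ("b", "c", 0, 2), ("c", "b", 1, 2), ("c", "o", 0, 2)]
d = {"a": 0, "b": 1, "c": 1, "o": 1}
print(sequential_arrival_times(nodes, edges, d, phi=8, pis={"a"}))
# prints: {'a': 0.0, 'b': 3.0, 'c': 6.0, 'o': 9.0}
print(sequential_arrival_times(nodes, edges, d, phi=5, pis={"a"}))
# prints: None
```

In this example the loop b → c → b carries 6 units of delay and one register, so every φ < 6 makes the l-values grow on each pass and the function returns None.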


Sequential Arrival Time (SAT)

d(v) = 1, d(e) = 2

Is φ = 4.5 possible?

[Figure and table: example circuit with nodes a through g, and their l-values over successive relaxation iterations]

Cycle time 4.5 is possible, as l(g) ≤ 4.5.


Multi-Level Framework

[Figure: problem size across levels; coarsening moves up the levels, then uncoarsening & refinement (optimization) moves back down]

  • Multi-level coarsening generates smaller problems at the top levels, allowing faster optimization there
  • Different levels explore different aspects of the solution space
  • Refinement of good solutions from coarser levels can be fast and simple while keeping good solution quality
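To make the coarsening half of the framework concrete, here is a minimal sketch of graph coarsening by greedy heavy-edge matching, plus the projection step used during uncoarsening. Heavy-edge matching is a common coarsening heuristic picked here for illustration; the slides do not say which scheme is used (hMetis, for instance, coarsens hyperedges), and refinement is omitted. All function names are hypothetical.

```python
from collections import defaultdict

def heavy_edge_matching(n, edges):
    """Greedy heavy-edge matching on vertices 0..n-1.

    edges: {(u, v): weight} with u < v. Returns (cid, m): cid[u] is the
    coarse vertex u maps to, and m is the number of coarse vertices.
    """
    adj = defaultdict(list)
    for (u, v), w in edges.items():
        adj[u].append((w, v))
        adj[v].append((w, u))
    match = [-1] * n
    for u in range(n):
        if match[u] != -1:
            continue
        best = None  # heaviest edge to a still-unmatched neighbour
        for w, v in adj[u]:
            if match[v] == -1 and (best is None or (w, -v) > (best[0], -best[1])):
                best = (w, v)
        if best is None:
            match[u] = u  # no free neighbour: u stays a singleton
        else:
            match[u], match[best[1]] = best[1], u
    cid, m = [-1] * n, 0
    for u in range(n):
        if cid[u] == -1:
            cid[u] = cid[match[u]] = m
            m += 1
    return cid, m

def coarsen(n, edges):
    """One coarsening level: contract matched pairs, summing parallel edges."""
    cid, m = heavy_edge_matching(n, edges)
    coarse = defaultdict(int)
    for (u, v), w in edges.items():
        cu, cv = cid[u], cid[v]
        if cu != cv:  # edges inside a cluster disappear at the coarser level
            coarse[(min(cu, cv), max(cu, cv))] += w
    return cid, m, dict(coarse)

def build_levels(n, edges, min_size=2):
    """Coarsen repeatedly; return the problem size at each level."""
    sizes = [n]
    while n > min_size:
        _, m, coarse = coarsen(n, edges)
        if m == n:  # nothing left to contract
            break
        n, edges = m, coarse
        sizes.append(n)
    return sizes

def project(cid, coarse_part):
    """Uncoarsening step: each fine vertex inherits its coarse vertex's block."""
    return [coarse_part[c] for c in cid]

path = {(i, i + 1): 1 for i in range(7)}   # 8-vertex path graph
print(build_levels(8, path, min_size=1))   # prints: [8, 4, 2, 1]
```

Each level roughly halves the problem, which is what makes optimization at the top levels cheap; a real flow would run refinement after every `project` call.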


Successes of Multi-Level Approach

  • First used to solve partial differential equations (multigrid method)
  • Successfully applied to circuit partitioning (hMetis [Karypis et al, 1997])
    – Best partitioner for cut-size minimization
  • Successfully applied to physical hierarchy generation (HPM and GEO [Cong et al, DAC’00 & ICCAD’00])
    – 30-40% delay reduction compared to hMetis
  • Successfully applied to circuit placement [Chan et al, ICCAD’00]
    – 10x speed-up over GordianL


Physical Hierarchy Generation: Multi-Level Coarse Placement & Retiming

– Bottom-up multi-level clustering
– Coarse placement at each level using multi-way weighted min-cut or SA
– Sequential timing analysis at each level

[Figure: at each cluster level, timing analysis & cell moves, then on to the next cluster level]


Hierarchical Approach vs. Multi-Level Approach

  • Hierarchical approach: higher-level design constrains lower-level designs
    – Not enough information at the higher levels
    – Mistakes at a higher level are impossible or costly to correct
  • Multi-level approach: finer-level design refines coarse-level design
    – Converges to a better solution as more details are considered


Coarsening for Physical Hierarchy Generation (Multi-level Clustering)

  • Follow the logic hierarchy
  • Connectivity-based clustering:
    – hMetis [Karypis et al, DAC’97]: hyperedge coarsening
    – ESC [Cong and Lim, ICCAD’00]: global edge-separability-based clustering
  • Performance-driven multi-level clustering:
    – TLC [Cong and Romesis, DAC’01]


ESC Clustering

  • Edge separability [Cong & Lim, ASPDAC00]
    – $\lambda(e)$ of an edge $e = (x, y)$ is the minimum number of edges that must be removed to separate x and y, i.e., the x-y min cut
    – $w(e) \le q(e) \le \lambda(e)$
  • ESC clustering algorithm
    – A tight lower bound $q(e)$ of $\lambda(e)$ can be computed for all edges in O(n log n) time [Nagamochi & Ibaraki, Algorithmica92]
    – $q(e)$ is used for bottom-up multi-level clustering
    – Produces very good cutsize, comparable to hMetis [KA+97]
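The quantity λ(e) itself can be computed exactly, if expensively, with one max-flow per edge. The sketch below does this with the Edmonds-Karp algorithm on an undirected weighted graph. It is not the Nagamochi-Ibaraki lower bound q(e) that ESC uses; it only illustrates the quantity that bound approximates. The function name and graph encoding are hypothetical.

```python
from collections import defaultdict, deque

def edge_separability(edges, x, y):
    """Exact x-y min-cut weight of an undirected weighted graph (Edmonds-Karp).

    edges: {(u, v): weight}. Each undirected edge becomes two arcs of
    capacity `weight` in one shared residual map.
    """
    cap = defaultdict(lambda: defaultdict(float))
    for (u, v), w in edges.items():
        cap[u][v] += w
        cap[v][u] += w
    flow = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path from x to y.
        parent = {x: None}
        queue = deque([x])
        while queue and y not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if y not in parent:
            return flow  # max flow = min cut = lambda(e) for e = (x, y)
        # Find the bottleneck capacity, then push that much flow along the path.
        bottleneck, v = float("inf"), y
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = y
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck

# A 4-cycle 0-1-2-3 with chord 0-2, all weights 1.
square = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (3, 0): 1, (0, 2): 1}
print(edge_separability(square, 0, 1))  # prints: 2.0
```

Here w(e) = 1 for edge (0, 1) while λ(e) = 2, matching the bound w(e) ≤ λ(e): the edge itself must always be cut, plus any alternative paths. Running one max-flow per edge is far too slow for large netlists, which is why a fast lower bound is used instead.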


ESC Experimental Results

[Chart: scaled cutsize of LR [CL+97] bipartitioning on ISPD98 [Alp98] circuits, clustering with each metric]

Metric           ABS   DEN   REP   RTC   CLO   CON   ESC
Scaled cutsize   1.13  1.09  1.31  1.19  1.15  1.31  1.00

  • ABS: Absorption [SS93] (max): weight of edges absorbed into C
  • DEN: Density [CS93, HK95] (max): density of C in terms of w(e)
  • REP: Rent Parameter [NOP87] (min): entails better placement results
  • RTC: Ratio Cut [WC92] (min): identifies natural clusters
  • CLO: Closeness [SK93] (max): connectivity to neighboring vertices
  • CON: Connectivity [SU72] (max): connectivity to neighboring vertices

Performance Driven Clustering

[Figure: two clustering solutions of a small circuit into 1st-level and 2nd-level clusters, with inter-cluster delays D1, D2, D3]

  • Capacity of first-level cluster: 2
  • Capacity of second-level cluster: 4
  • d = 1, D1 = 2, D2 = 4, D3 = 8
  • First solution delay: 35
  • Second solution delay: 31
  • Problem formulation
    – Inputs: areas and delays for all modules; different inter-cluster delays for different levels; area constraints on each level of clustering
    – Objective: build multi-level clusters that minimize the delay under the area constraints
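The slide does not spell out the two-level delay model. One reading consistent with d = 1 and D1 < D2 < D3 is that d is the delay of each node, and that an edge costs D1 inside a first-level cluster, D2 between first-level clusters of the same second-level cluster, and D3 between second-level clusters. Under that assumption (ours, not the slide's), the sketch below evaluates a given clustering by propagating arrival times from PIs to POs in topological order. It does not search for the clustering itself and does not model node duplication; the function name and example are hypothetical.

```python
from graphlib import TopologicalSorter

def clustered_delay(fanins, node_delay, cluster1, cluster2, D):
    """Arrival time of every node under an assumed two-level delay model.

    fanins: {v: [u, ...]} (predecessors of each node, must be acyclic);
    cluster1 / cluster2: node -> first-/second-level cluster id, with each
    first-level cluster nested inside one second-level cluster;
    D = (D1, D2, D3): assumed edge delay inside a first-level cluster,
    between first-level clusters of the same second-level cluster, and
    between second-level clusters.
    """
    def edge_delay(u, v):
        if cluster1[u] == cluster1[v]:
            return D[0]
        if cluster2[u] == cluster2[v]:
            return D[1]
        return D[2]

    arrival = {}
    # static_order() yields every node after all of its predecessors (PIs first).
    for v in TopologicalSorter(fanins).static_order():
        arrival[v] = node_delay[v] + max(
            (arrival[u] + edge_delay(u, v) for u in fanins.get(v, [])), default=0)
    return arrival

# Hypothetical chain a -> b -> c -> d with d = 1 per node and D = (2, 4, 8).
fanins = {"b": ["a"], "c": ["b"], "d": ["c"]}
delay = {v: 1 for v in "abcd"}
c1 = {"a": "A", "b": "A", "c": "B", "d": "B"}
print(clustered_delay(fanins, delay, c1, {v: "X" for v in "abcd"}, (2, 4, 8))["d"])  # prints: 12
```

Moving c and d into a different second-level cluster raises the edge b → c from D2 to D3 and the delay at d from 12 to 16, which shows why the clustering objective is delay and not cut size alone.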


Performance Driven Clustering – TLC Clustering

  • Linear space and time complexity (if the network is bounded)
  • Two phases (labeling and clustering)
    – First phase: labeling. From PIs to POs, visit nodes in topological order and label each node with its maximum delay under the two-level delay model
    – Second phase: clustering. From POs to PIs, cluster nodes
  • Node duplication (ND) control
    – Full node duplication
    – Partial node duplication (depending on node criticality)
    – No node duplication


TLC Experimental Results

[Chart: delay for Quartus alone and for Quartus + TLC with no ND, partial ND, and full ND]


GEO Experimental Results

  • Comparison with existing algorithms
    – hMetis [DAC97] + retiming + slicing floorplan [Algo89]
    – HPM [DAC00] + slicing floorplan [Algo89]
    – GEO: simultaneous partitioning + coarse placement + retiming
  • Close to 40% delay reduction!

[Chart: normalized delay, cutsize, wire length, and runtime for hMetis+RT+FL, HPM+FL, and GEO]


Preliminary Results on Multi-level Coarse Placement

  • Multi-level simulated annealing coarse placement engine
  • Comparison with GORDIAN:
    – Our engine only turns on wire length optimization
    – Legalized by DOMINO for wire length comparison

[Chart: our wire length / GD wire length and our time / GD time for circuit sizes 1k-10k, 10k-50k, and 50k-200k; data labels: 108%, 140%, 101%, 76%, 101%, 45%]

Our multi-level engine performs well for big circuits.

  • 1k-10k test cases: s9234, s5378, s13207, s15850, bigkey
  • 10k-50k test cases: s38417, s38584, clma, big1, big3, big4
  • 50k-200k test cases: big2, big5, big6


Ongoing work – Architecture Evaluation

  • Architecture blocks with different implementations, with
    – different areas
    – different delays
    – different pipeline stages
    – …
  • Parameterized buses with different bus widths
  • Interconnect planning extracts area, delay, etc. for architecture evaluation
  • Interconnect planning uses architecture evaluation functions to explore alternative architecture blocks and buses for system performance optimization


Ongoing work – Synthesis under Physical Hierarchy

  • Consider interconnect information during behavior- and logic-level synthesis
  • Explore various synthesis solutions to trade off long global wires against short local wires
  • Generalized technology mapping: choose among different behavior and logic synthesis solutions for each block
  • Revisit and extend various rewiring techniques


Concluding Remarks

  • Interconnects determine system performance
    – Interconnect-centric design is needed
    – Interconnect planning, interconnect synthesis, interconnect layout
  • Physical hierarchy generation is crucial for interconnect planning
  • A good combination of partitioning/placement and retiming can hide global interconnect delays and lead to a good physical hierarchy
  • The multi-level method is an effective way to cope with complexity


Acknowledgements

Thanks to the current and former students who contributed to this project: Chin-Chih Chang, Ashok Jagannathan, Sung Lim, Michail Romesis, Chang Wu, and Xin Yuan.

Thanks for support from GSRC, SRC, Fujitsu, IBM, and Intel.

More details: http://cadlab.cs.ucla.edu