
CALTECH CS137 Winter2006 -- DeHon 1

CS137: Electronic Design Automation

Day 2: January 6, 2006
Spatial Routing

Today

  • Idea
  • Challenges

– Path Selection
– Victimization
– Allocation

  • Methodology
  • Quality, Timing
  • Parallelism
  • Mesh
  • FPGA Implementation

Global/Detail

  • With limited switching (e.g. FPGA)

– can represent routing graph exactly

(See CS137a, Day 22.)

Pathfinder Review

  • Key step: find the shortest path from source to sink

– Mark links by usage
– Used links cost more
– Shortest path tries to avoid them

  • Negotiated congestion with history

– Increase the cost of congested nodes
– Adaptive cost: makes historically congested nodes expensive, so routes try to avoid them
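Negotiated congestion can be sketched in software as repeated rip-up and reroute with a congestion-aware shortest path. This is a minimal sketch, assuming the node cost (b + h)·p commonly quoted for Pathfinder, a unit base cost b, and fixed illustrative constants `PRES_FAC` and `HIST_FAC` (Pathfinder implementations usually also raise the present-congestion factor every iteration). The graph, net names, and function names are hypothetical.

```python
import heapq

PRES_FAC = 0.5  # assumed weight on present over-use (kept fixed here for simplicity)
HIST_FAC = 1.0  # assumed history increment per iteration a node stays congested


def node_cost(n, usage, history, capacity=1):
    """Pathfinder-style cost (b + h) * p with base cost b = 1."""
    over = max(0, usage.get(n, 0) + 1 - capacity)  # over-use if this net is added
    p = 1 + PRES_FAC * over                        # present-congestion penalty
    return (1 + history.get(n, 0.0)) * p


def shortest_path(graph, src, sink, usage, history):
    """Dijkstra from src to sink over congestion-aware node costs."""
    dist, prev = {src: 0.0}, {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == sink:
            break
        if d > dist[u]:
            continue
        for v in graph[u]:
            nd = d + node_cost(v, usage, history)
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path = [sink]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def pathfinder(graph, nets, max_iters=20):
    """Rip up and reroute every net each iteration until no node is over-used."""
    usage, history, routes = {}, {}, {}
    for _ in range(max_iters):
        for name, (src, sink) in nets.items():
            for n in routes.get(name, []):      # rip up this net's old route
                usage[n] -= 1
            routes[name] = shortest_path(graph, src, sink, usage, history)
            for n in routes[name]:
                usage[n] = usage.get(n, 0) + 1
        congested = [n for n, u in usage.items() if u > 1]
        if not congested:
            return routes
        for n in congested:                     # remember chronic congestion
            history[n] = history.get(n, 0.0) + HIST_FAC
    return None


# Hypothetical example: both nets prefer node "m"; n2 also has a longer detour.
GRAPH = {"s1": ["m"], "s2": ["m", "x1"], "m": ["t1", "t2"],
         "x1": ["x2"], "x2": ["t2"], "t1": [], "t2": []}
NETS = {"n1": ("s1", "t1"), "n2": ("s2", "t2")}
print(pathfinder(GRAPH, NETS))
```

In the first iteration both nets take "m"; the history term then makes "m" expensive enough that n2, the net with an alternative, moves to its detour.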

Slow?

  • Why is routing slow?

– Each route:

  • search all possible paths from source to sink
  • Number of paths expands as distance²
  • Network graph is megabytes in size

– Large, complicated data structure to walk
– Won’t all fit in cache

– Number of nets = number of edges
– Perform many iterations to converge
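One way to see the distance² growth: in a 2-D mesh (assumed here purely for illustration), the number of switches within Manhattan distance d of the source, which is the region a breadth-first search covers before reaching a sink d hops away, is 1 + 4 + 8 + … + 4d = 2d² + 2d + 1. The hypothetical helper below just counts them.

```python
def nodes_within(d):
    """Count grid points (x, y) with |x| + |y| <= d, i.e. the search region."""
    return sum(1 for x in range(-d, d + 1) for y in range(-d, d + 1)
               if abs(x) + abs(y) <= d)


for d in (1, 2, 4, 8, 16):
    print(d, nodes_within(d))  # grows roughly as 2 * d**2
```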

Parallelism?

  • Search all paths in parallel for a single route
  • Search routes for multiple nets in parallel

– Don’t overlap
– Overlap?

Initial Key Ideas

  • Augment the existing static network structure to route itself
  • Use hardware to exploit parallelism in routing

– Search all paths in parallel
– Route multiple nets in parallel
– Avoid walking an irregular graph
– Specialized/pipelined hardware at each switch

  • Hardware can perform a route trial in tens of cycles vs. 10K-100K cycles for software

Hardware Route Search in Action


Path Search Hardware

Idea

  • Existing paths are already allocated
  • Drive a one into the search paths
  • All free paths pass the one up
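In software, the search can be modelled as reachability over free links: a one driven in at the source and another at the sink each spread through every unallocated link, and any switch reached by both is a candidate crossover. This is a sequential sketch of what the hardware does in parallel; `drive_ones`, `crossovers`, and the small two-level tree `UP` are hypothetical. For simplicity it returns every common switch, not only the lowest ones a tree router would use.

```python
from collections import deque


def drive_ones(start, up_links, allocated):
    """Spread a 'one' from start along every link not in `allocated`.
    Returns {node: (prev_node, link)} for each node reached (None for start)."""
    reached = {start: None}
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        for nxt, link in up_links.get(node, []):
            if link not in allocated and nxt not in reached:
                reached[nxt] = (node, link)
                frontier.append(nxt)
    return reached


def crossovers(src, sink, up_links, allocated):
    """Switches reachable over free links from both the source and the sink."""
    a = drive_ones(src, up_links, allocated)
    b = drive_ones(sink, up_links, allocated)
    return sorted(set(a) & set(b), key=str)


# Hypothetical tree: leaves 0..3, lower switches A and B, root switches R1 and R2.
UP = {
    0: [("A", "0A")], 1: [("A", "1A")],
    2: [("B", "2B")], 3: [("B", "3B")],
    "A": [("R1", "A1"), ("R2", "A2")],
    "B": [("R1", "B1"), ("R2", "B2")],
}
print(crossovers(0, 2, UP, allocated=set()))   # ['R1', 'R2']
print(crossovers(0, 2, UP, allocated={"A1"}))  # ['R2']
```

Allocating link "A1" blocks the one from reaching R1 on the source side, so only R2 remains a crossover.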

Challenges

  • How do we select among paths?
  • What if there are no free paths?
  • Can we work without Pathfinder’s history?
  • How do we handle fanout?
  • How do we handle allocation and victimization?

Select Among Paths?

  • Easy: Randomly

– Use a PRNG at the crossover (xover) switchbox

  • Otherwise, need to represent costs…
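Random selection needs only a small PRNG per crossover switchbox. A common hardware choice is a linear-feedback shift register; the 16-bit Galois LFSR below (maximal-length tap mask 0xB400) and the `pick_path` helper are illustrative assumptions, not the circuit used in this work. Rejection sampling keeps the choice approximately uniform when the number of free paths is not a power of two.

```python
class Lfsr16:
    """16-bit Galois LFSR with tap mask 0xB400 (period 65535 for a non-zero seed)."""

    def __init__(self, seed=0xACE1):
        if seed & 0xFFFF == 0:
            raise ValueError("seed must be non-zero")
        self.state = seed & 0xFFFF

    def next_bit(self):
        lsb = self.state & 1
        self.state >>= 1
        if lsb:
            self.state ^= 0xB400
        return lsb

    def next_bits(self, n):
        value = 0
        for _ in range(n):
            value = (value << 1) | self.next_bit()
        return value


def pick_path(free_paths, rng):
    """Choose one of the free paths arriving at a crossover (None if there are none)."""
    k = len(free_paths)
    if k == 0:
        return None
    if k == 1:
        return free_paths[0]
    width = (k - 1).bit_length()
    while True:                      # reject indices >= k
        i = rng.next_bits(width)
        if i < k:
            return free_paths[i]
```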
No Paths?

  • Try stealing a path: rip up (victimize) an existing path
  • Which one?

– Randomly select a victim
– History-free Pathfinder suggests:

  • CountCost: the path that shares the fewest resources with other routes
  • CountNet: the path that intersects the fewest existing nets
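The two victim metrics differ in what they count. In this minimal sketch (all names hypothetical), `owner` maps each occupied switch to the net holding it; CountCost counts occupied switches on a candidate path, while CountNet counts distinct nets. The example path P1 overlaps a single net at six switches, so it scores 6 under CountCost but 1 under CountNet.

```python
import random


def count_cost(path, owner):
    """Number of switches on `path` already used by some existing net."""
    return sum(1 for sw in path if owner.get(sw) is not None)


def count_net(path, owner):
    """Number of distinct existing nets that `path` intersects."""
    return len({owner[sw] for sw in path if owner.get(sw) is not None})


def choose_victim_path(paths, owner, metric, rng=random):
    """Pick the candidate with the lowest metric (ties broken randomly).
    Returns (path, set of nets that must be ripped up)."""
    best = min(metric(p, owner) for p in paths)
    path = rng.choice([p for p in paths if metric(p, owner) == best])
    return path, {owner[sw] for sw in path if owner.get(sw) is not None}


# Hypothetical occupancy: net x holds s0..s5, nets y and z hold one switch each.
OWNER = {f"s{i}": "x" for i in range(6)}
OWNER.update({"t0": "y", "t1": "z"})
P1 = [f"s{i}" for i in range(6)] + ["f0"]
P2 = ["t0", "t1", "f1", "f2"]
print(choose_victim_path([P1, P2], OWNER, count_cost)[1])  # rips up y and z
print(choose_victim_path([P1, P2], OWNER, count_net)[1])   # rips up only x
```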

CountNet vs. CountCost

  • CountCost: 6
  • CountNet: 1

Implement Counting?

  • Idea: delay the search signal at congested switches
  • Free paths are not delayed
  • The signal from the least-congested path arrives at the xover first
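Counting by delay turns path selection into a race. If each congested switch holds the signal back for an extra `CONGESTION_DELAY` cycles (one cycle is an assumption here), a path's arrival time at the crossover is its hop count plus the penalty, so the first signal to arrive comes from the cheapest path. The hypothetical helpers below compute the arrival times and return the winners; `paths` must be non-empty.

```python
CONGESTION_DELAY = 1  # assumed extra cycles spent at each congested switch


def arrival_time(path, congested):
    """Cycles for a one to traverse `path`: 1 per hop plus the congestion penalty."""
    return sum(1 + (CONGESTION_DELAY if sw in congested else 0) for sw in path)


def race(paths, congested):
    """Return (first arrival cycle, paths whose signal arrives in that cycle)."""
    times = [arrival_time(p, congested) for p in paths]
    first = min(times)
    return first, [p for p, t in zip(paths, times) if t == first]
```

With a one-cycle penalty, a free three-hop path (3 cycles) beats a two-hop path through two congested switches (4 cycles); a larger penalty would weight congestion more heavily against path length.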

CountNet Approximation

  • Keeping track of which net uses each switch would require much more state and complexity
  • Approximate CountNet by delaying only at conflicting switches

Implement CountNet Approximation

Allow the signal to pass without delay if it agrees with the existing switch setting.
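A sketch of the approximation rule, assuming each switch stores only its current setting (None when free) and each step of a candidate path names the setting it needs. A delay is charged only where the switch is set and the setting conflicts; once a search merges onto an existing net and follows it, the settings agree and no further delay accrues, so the total roughly tracks the number of nets crossed. The representation and names are hypothetical.

```python
def approx_delay(path, settings):
    """path: list of (switch, wanted_setting); settings: switch -> setting or None.
    Charge one delay unit per switch whose existing setting conflicts."""
    return sum(1 for sw, want in path
               if settings.get(sw) is not None and settings[sw] != want)


# Hypothetical state: one net runs "up" through s0..s5; nets at y0 and z0 are set "left".
SETTINGS = {f"s{i}": "up" for i in range(6)}
SETTINGS.update({"y0": "left", "z0": "left"})
P1 = [("s0", "left")] + [(f"s{i}", "up") for i in range(1, 6)]  # joins one net
P2 = [("y0", "right"), ("z0", "right"), ("f0", "up")]           # crosses two nets
print(approx_delay(P1, SETTINGS), approx_delay(P2, SETTINGS))   # 1 2
```

Like CountNet, the rule prefers P1 even though it overlaps more switches. It can also undercount: a path that agrees everywhere is charged nothing.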

Cost is max of sides

  • Also note:

– Actual cost is max(src→xover, sink→xover) instead of the sum
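The max arises because the source-side and sink-side searches propagate simultaneously. A crossover completes only when the slower of the two signals arrives. With the hypothetical delays below, the max rule and the sum rule pick different crossovers.

```python
def best_crossover(candidates, combine):
    """candidates: {xover: (src_to_xover_cycles, sink_to_xover_cycles)}.
    Returns the crossover with the lowest combined cost."""
    return min(candidates, key=lambda x: combine(*candidates[x]))


CANDIDATES = {"A": (1, 7), "B": (5, 5)}  # hypothetical delays in cycles
print(best_crossover(CANDIDATES, max))                 # B: max 5 beats max 7
print(best_crossover(CANDIDATES, lambda a, b: a + b))  # A: sum 8 beats sum 10
```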

Algorithm Comparison – Random Netlist

[Figure: total channels vs. HSRA array size; series: Pathfinder, Random]

How to Improve?

  • Compensate for the lack of history?

– Exploit speed: route trials are fast
– Try multiple starts and exploit randomness
– Like multiple starts of FM (Fiduccia-Mattheyses)
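Because each trial is cheap, a randomized router can be rerun with different seeds and the best result kept, as with multiple FM starts. A minimal sketch, assuming the router is wrapped as `route_once(rng)` and returns `(cost, solution)`. `toy_route_once` is a hypothetical stand-in that only draws a random track count.

```python
import random


def best_of_starts(route_once, starts, seed=0):
    """Run a randomized router `starts` times with different seeds; keep the
    lowest-cost result. route_once(rng) must return (cost, solution)."""
    best = None
    for i in range(starts):
        result = route_once(random.Random(seed + i))
        if best is None or result[0] < best[0]:
            best = result
    return best


def toy_route_once(rng):
    """Hypothetical stand-in for one randomized route: cost = tracks needed."""
    tracks = rng.randint(8, 12)
    return tracks, {"tracks": tracks}


print(best_of_starts(toy_route_once, starts=20))
```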

Trading Routing Time for Quality

[Figure: number of tracks vs. array size (8 to 4096); series: Pathfinder, Random Average, Random Best of 20, CountNet]

Choosing the Right Victims

[Figure: quality vs. array size (8 to 4096); series: CountCongestion, CountNet, Pathfinder, CountNetApproximation]

CountNet

CountNet best of 20 starts.

Hypergraphs (Fanout)

  • Sequentially route each two-point net, trying to reuse as much as possible from existing allocated paths.

  • Add a state bit at every switch

– Set when the switch is allocated during the current net’s search
– Cleared when we begin to route a new net

  • Order the destinations associated with a single source
  • For each destination:

– Search from the sink as before (only from the sink)
– At a switch, if the state bit is set and the sink side is congestion-free, we have found an available path
– Otherwise, drive ones into all available source paths and allocate a new path, as in a standard route search
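The fanout procedure above can be sketched sequentially. Assumptions (all hypothetical): links are stored as in a tree where every route goes up from both ends, `owner` maps allocated links to nets, and the per-switch state bit is a Python set cleared for each new net. Each sink first searches over free links for a switch whose state bit is set; only if none is reachable does it fall back to a two-ended search, in which the source side may reuse the net's own links. Among candidate meeting switches, the sketch picks one with the fewest hops.

```python
import random
from collections import deque


def drive_ones(start, up_links, usable):
    """Spread a 'one' from start over links accepted by usable(link)."""
    reached = {start: None}
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        for nxt, link in up_links.get(node, []):
            if usable(link) and nxt not in reached:
                reached[nxt] = (node, link)
                frontier.append(nxt)
    return reached


def trace(reached, node):
    """Follow predecessor pointers from node back to the search start."""
    nodes, links = [node], []
    while reached[node] is not None:
        node, link = reached[node]
        nodes.append(node)
        links.append(link)
    return nodes, links


def closest(candidates, *searches, rng=random):
    """Pick a meeting switch with the fewest total hops (ties broken randomly)."""
    def cost(x):
        return sum(len(trace(s, x)[1]) for s in searches)
    best = min(cost(x) for x in candidates)
    return rng.choice(sorted((x for x in candidates if cost(x) == best), key=str))


def route_fanout(net, source, sinks, up_links, owner, rng=random):
    """Route one multi-sink net, reusing switches already on its tree."""
    state_bit = {source}                 # per-switch state bit, cleared per net

    def is_free(link):
        return link not in owner

    def free_or_mine(link):
        return owner.get(link, net) == net

    for sink in sinks:                   # destinations in a fixed order
        from_sink = drive_ones(sink, up_links, is_free)
        hits = [n for n in from_sink if n in state_bit]
        if hits:                         # reached a switch with its state bit set
            nodes, links = trace(from_sink, closest(hits, from_sink, rng=rng))
        else:                            # standard two-ended search
            from_src = drive_ones(source, up_links, free_or_mine)
            xovers = [n for n in from_sink if n in from_src]
            if not xovers:
                return False             # would need victimization
            joint = closest(xovers, from_sink, from_src, rng=rng)
            n1, l1 = trace(from_sink, joint)
            n2, l2 = trace(from_src, joint)
            nodes, links = n1 + n2, l1 + l2
        state_bit.update(nodes)
        for link in links:
            owner[link] = net
    return True


# Hypothetical tree: leaves 0..3, lower switches A and B, root switches R1 and R2.
UP = {
    0: [("A", "0A")], 1: [("A", "1A")],
    2: [("B", "2B")], 3: [("B", "3B")],
    "A": [("R1", "A1"), ("R2", "A2")],
    "B": [("R1", "B1"), ("R2", "B2")],
}
```

Routing net "n" from leaf 0 to sinks 1, 2 and 3 allocates six links. The third sink stops at switch B, which is already on the net's tree, instead of climbing to a root switch.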

High Fanout Nets

  • Victimizing a high-fanout net will cause considerable re-route work
  • Might want to penalize victimizing high-fanout nets
  • CountNetFanout?

– Requires more state… expensive…

  • Simple hack: lock high-fanout nets against victimization

– What counts as a high-fanout net? Fanout >10?
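The locking hack amounts to filtering victim candidates before choosing one. A minimal sketch. The threshold `FANOUT_LOCK = 10` only echoes the slide's tentative ">10?", and the helper names and fanout table are hypothetical.

```python
import random

FANOUT_LOCK = 10  # assumed threshold; nets with larger fanout are never victimized


def eligible_victims(candidate_nets, fanout, limit=FANOUT_LOCK):
    """Drop nets whose fanout exceeds `limit`; they are locked against rip-up."""
    return [n for n in candidate_nets if fanout[n] <= limit]


def pick_victim(candidate_nets, fanout, rng=random, limit=FANOUT_LOCK):
    """Randomly choose a victim among unlocked nets; None if all are locked."""
    unlocked = eligible_victims(candidate_nets, fanout, limit)
    return rng.choice(unlocked) if unlocked else None
```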

Toronto20 − Quality

10 10 9 8 9 11 9 11 11 10

Pathfinder

12.80 ex1010 10.00 elliptic 8.10 dsip 9.84 diffeq 10.00 des 11.00 clma 8.01 bigkey 11.00 apex4 10.12 apex2 10.00 alu4

CountNet

199 8 12 11 9 9 9 12 11 10 10 204.50 Total 10.00 tseng 12.11 spla 10.00 seq 9.00 s38584 10.00 s38417 9.06 s298 12.00 pdc 10.00 misex3 10.45 frisc 11.00 ex5p

So far

  • All results so far are quality results

– …haven’t dealt with all the performance details

  • Had a basis for confidence in performance
  • Wanted to make sure it was worthwhile first

Hardware Allocation

Add all nets to R
While nets in R > 0 and routeTrial < RTmax
    For each unrouted net
        Find all possible routes
        If found possible routes
            Randomly select and allocate a route
        Else
            Select a route to victimize and allocate the route
    Endfor
    Adjust R
Endwhile

Idea: send a one down the selected path to allocate it

With Victimization

Add all nets to R
While nets in R > 0 and routeTrial < RTmax
    For each unrouted net
        Find all possible routes
        If found possible routes
            Randomly select and allocate a route
        Else
            Randomly select a route to victimize and allocate the route
    Endfor
    Adjust R
Endwhile
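The loop above can be rendered in software for experimentation. This is a sketch under a simplifying assumption: each net's possible routes are given up front as frozensets of resource ids, standing in for what the hardware search would find. Victimization rips up every net that holds a resource on the randomly chosen route, and those nets rejoin R when it is adjusted. `route_all` and the three-net instance are hypothetical.

```python
import random


def route_all(candidates, rt_max=100, rng=random):
    """Allocation loop with random victimization.
    candidates: net -> non-empty list of paths (frozensets of resource ids).
    Returns net -> allocated path, or None if RTmax route trials run out."""
    owner = {}                       # resource id -> net currently holding it
    routed = {}                      # net -> allocated path
    unrouted = set(candidates)       # "Add all nets to R"
    trial = 0
    while unrouted and trial < rt_max:
        trial += 1
        for net in sorted(unrouted):                 # "For each unrouted net"
            free = [p for p in candidates[net] if not any(r in owner for r in p)]
            if free:
                path = rng.choice(free)              # random free route
            else:
                path = rng.choice(candidates[net])   # random route to victimize
                for victim in {owner[r] for r in path if r in owner}:
                    for r in routed.pop(victim):     # rip up the victim net
                        del owner[r]
            for r in path:                           # allocate the chosen route
                owner[r] = net
            routed[net] = path
        unrouted = set(candidates) - set(routed)     # "Adjust R"
    return routed if not unrouted else None


# Hypothetical instance: three nets, three single-resource routes each net can use.
CANDIDATES = {
    "a": [frozenset({"r1"}), frozenset({"r2"})],
    "b": [frozenset({"r2"}), frozenset({"r3"})],
    "c": [frozenset({"r3"}), frozenset({"r1"})],
}
print(route_all(CANDIDATES, rt_max=1000, rng=random.Random(0)))
```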

Analysis Methodology

  • Sequential version that does effectively the same thing (perhaps inefficiently)
  • Count key operations/variables

– Number of net searches
– Number of victims

  • Timing model for key operations
  • Calculate performance under various timing assumptions

Timing Models

  • Hardware Timing

– Tpath = length of path ≈ log(N)
– Tallocate ≈ Tpath
– Tvictim ≈ 4 × Tpath

  • Software Timing

– Tallocate ≈ Npathsw × (Tm + Tc + Twb + Ta)
– Tvictim ≈ Npathsw × (Tm + Tc) + V × Tallocate

  • Tm = main-memory reference
  • Tc = cache reference
  • Twb = write buffer
  • Ta = bit allocation
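The timing models can be evaluated directly once the counts are known. The sketch below encodes them as functions and makes several assumptions: log(N) is base 2, hardware times are in switch cycles, and `victims` stands in for the slide's V. `route_time` is a hypothetical aggregate (each path search costs one Tallocate and each rip-up one Tvictim), not the formula used in the study.

```python
import math


def hw_times(n_leaves):
    """Hardware model: Tpath ~ log2(N) cycles, Tallocate ~ Tpath, Tvictim ~ 4 Tpath."""
    t_path = math.log2(n_leaves)
    return {"Tpath": t_path, "Tallocate": t_path, "Tvictim": 4 * t_path}


def sw_times(n_path_sw, t_m, t_c, t_wb, t_a, victims):
    """Software model from the slide, with `victims` playing the role of V."""
    t_alloc = n_path_sw * (t_m + t_c + t_wb + t_a)
    t_victim = n_path_sw * (t_m + t_c) + victims * t_alloc
    return {"Tallocate": t_alloc, "Tvictim": t_victim}


def route_time(times, n_searches, n_ripups):
    """Hypothetical aggregate: searches x Tallocate + rip-ups x Tvictim."""
    return n_searches * times["Tallocate"] + n_ripups * times["Tvictim"]
```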
Route Time

  • Ntry: number of route starts
  • NRT: number of path searches
  • NRO: number of rip-ups
  • NFO: number of fanout searches
  • NFOA: number of fanout allocations

Raw Data

Making comparisons

  • There is a quality/time tradeoff
  • Want to compare at iso-quality

More Parallelism

  • Only exploiting parallelism in path search
  • Subtrees are independent
  • Route root
  • Then route next two channels in parallel
  • Then route next 4…
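A rough estimate of what this buys: if nets are grouped by the subtree (channel) containing their crossover, channels at the same level can route concurrently, so each level costs only as many sequential route steps as its busiest channel. The sketch below assumes a binary tree whose leaves are numbered 0 to 2^L − 1 and one route per channel per step. The net list is hypothetical.

```python
from collections import defaultdict


def crossover_level(src, dst):
    """Level of the lowest common ancestor of leaves src and dst
    (level 1 = parent of two adjacent leaves)."""
    return (src ^ dst).bit_length()


def parallel_steps(nets, levels):
    """Route the root first, then each lower level with its channels in
    parallel; each level costs as many steps as its busiest channel."""
    per_channel = defaultdict(int)
    for src, dst in nets:
        lvl = crossover_level(src, dst)
        per_channel[(lvl, src >> lvl)] += 1
    steps = 0
    for lvl in range(levels, 0, -1):
        steps += max((c for (l, _), c in per_channel.items() if l == lvl), default=0)
    return steps


NETS = [(0, 7), (1, 6), (0, 3), (4, 6), (0, 1), (2, 3), (4, 5), (6, 7)]
print(len(NETS), parallel_steps(NETS, levels=3))  # 8 sequential vs. 4 parallel steps
```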
Still Not Exploiting

  • Multiple path searches in parallel that overlap routing resources…

Extension to Mesh Networks

  • No well-defined crossover point
  • Path back to the source is not implied directly by the topology of the routing network
  • Paths of different lengths

– and non-minimal-length paths may be important components of a good solution

Mesh Approach

  • Single-ended search from source
  • Larger delay on congestion allows non-minimal-length paths
  • Breadcrumb approach: leave state in switches pointing back to the source
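The mesh approach can be sketched as a single-ended wavefront. Here it is simulated with Dijkstra over integer cycle delays, where crossing an occupied cell costs an assumed `CONGESTION_DELAY` extra cycles instead of being forbidden. Every cell the wave reaches records a breadcrumb pointing to the neighbour it came from, and the path is recovered by following breadcrumbs back from the sink. The grid model and names are hypothetical.

```python
import heapq

CONGESTION_DELAY = 4  # assumed extra cycles to cross an occupied cell


def mesh_route(width, height, src, sink, occupied):
    """Single-ended wavefront from src; returns the path src..sink."""
    arrival = {src: 0}
    crumb = {src: None}               # breadcrumb: where the wave came from
    heap = [(0, src)]
    while heap:
        t, (x, y) = heapq.heappop(heap)
        if (x, y) == sink:
            break
        if t > arrival[(x, y)]:
            continue
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= nxt[0] < width and 0 <= nxt[1] < height):
                continue
            nt = t + 1 + (CONGESTION_DELAY if nxt in occupied else 0)
            if nt < arrival.get(nxt, float("inf")):
                arrival[nxt] = nt
                crumb[nxt] = (x, y)
                heapq.heappush(heap, (nt, nxt))
    path, cell = [], sink             # follow breadcrumbs back to the source
    while cell is not None:
        path.append(cell)
        cell = crumb[cell]
    return path[::-1]


print(mesh_route(5, 3, (0, 1), (4, 1), occupied={(2, 1)}))
```

With the occupied cell on the straight route, the 4-hop path costs 8 cycles, so a 6-hop, non-minimal detour (6 cycles) wins.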

Extension to Mesh Networks - Results

71 61 87 75 64 52

(ms)

Hardware Router VPR 4.3 (μs)

Rnd atomic Fast

Quality

Design

1.5(ms) 5 5 5 s526 890 5 5 4 s526n 140 6 5 5 mm9a 80 6 6 5 ex2 150 6 5 5 c8 800 5 5 4 5xp1

(Simulator too slow to run larger)

BFT FPGA Implementation

  • 21 4-LUTs to implement switch logic + 9 4-LUTs to manage PRNG/allocation = 30 4-LUTs/T-switch
  • 13/3 switches/PE/domain × 30 4-LUTs/switch = 130 4-LUTs/PE/domain
  • C = 10: 10 × 130 = 1300 4-LUTs/PE

Mesh FPGA Implementation

FPGA Implementation

  • Slow clock

– 3ns vs. 0.3ns?

  • 1000-2000 FPGAs
  • Speedup/FPGA

– 0.51

Saving Area

  • Allocation and victimization occur in a single domain

– All other domains idle

  • Maybe implement only a single physical domain?

– Pipeline (C-slow) the path search through the other domains
– Slight slowdown, big area savings???

Admin

  • Read GraphStep for Monday

– (paper not due until next Friday, so feedback welcomed…)

  • Monday: GraphStep talk
  • Friday: Project Selection Due

Big Ideas

  • Parallelism
  • Avoiding bad memory hierarchy
  • Specialization
  • Simple/Lightweight algorithm

– Fast “dumb” alg. vs. slow/stateful alg.