
CALTECH CS137 Winter2006 -- DeHon 1

CS137: Electronic Design Automation

Day 8: January 27, 2006 Cellular Placement


Today

  • Problem
  • Parallelism
  • Cellular Automata
  • Idea
  • Details
    – Avoid Local Minima
    – Update locations
  • Results
  • Directions
  • Primary Sources
    – Wrighton & DeHon, FPGA 2003
    – Wrighton, MS Thesis, 2003


Placement

  • Problem: Pick locations for all building blocks
    – minimizing energy, delay, area
    – really:
        • minimize wire length
        • minimize channel density
    – surrogates:
        • Minimizing squared wire length
        • Minimize bounding box
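
The two surrogates can be compared on a small example. This is a minimal sketch, not the course's code: the function names are hypothetical, and the bounding box is measured by its half-perimeter, which is an assumption (the slide only says "bounding box").

```python
from typing import List, Tuple

Location = Tuple[int, int]  # (x, y) grid coordinates


def squared_wire_length(src: Location, snk: Location) -> int:
    """Squared Euclidean distance between the two ends of one edge."""
    dx = src[0] - snk[0]
    dy = src[1] - snk[1]
    return dx * dx + dy * dy


def bounding_box_half_perimeter(pins: List[Location]) -> int:
    """Half-perimeter of the smallest rectangle enclosing every pin of a net."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


net = [(0, 0), (3, 4), (1, 2)]  # hypothetical three-pin net
print(squared_wire_length(net[0], net[1]))   # 25
print(bounding_box_half_perimeter(net))      # 7
```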


Parallelism

  • What parallelism exists in placement?
    – Evaluate costs of prospective moves
        • One set to many prospective locations
        • Many moves, each to a single location
    – Perform moves


Cellular Automata

  • Basic idea: regular array of identical cells with nearest-neighbor communication


CA Model

  • On each cycle:
    – Each cell exchanges values with neighbors
    – Updates state/value based on own state and that of neighbors
    – E.g. Conway’s LIFE
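
As a concrete instance of the CA model, here is a minimal sketch of one synchronous cycle of Conway's Life. The non-wrapping grid boundary and the name `life_step` are assumptions made for the example.

```python
from typing import List

Grid = List[List[int]]  # 1 = live cell, 0 = dead cell


def life_step(grid: Grid) -> Grid:
    """One synchronous CA cycle of Conway's Life on a non-wrapping grid."""
    rows, cols = len(grid), len(grid[0])
    nxt = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Exchange values with neighbors: count live cells among the 8 neighbors.
            live = sum(
                grid[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
                and 0 <= r + dr < rows
                and 0 <= c + dc < cols
            )
            # Update state based on own state and that of the neighbors.
            if grid[r][c] == 1:
                nxt[r][c] = 1 if live in (2, 3) else 0
            else:
                nxt[r][c] = 1 if live == 3 else 0
    return nxt


blinker = [
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
print(life_step(blinker))  # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```

Every cell reads only its immediate neighbors, so all cells can update in the same cycle.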


Cellular Automata

  • Physical Advantage:

– No long wires

  • Area linear in number of nodes
  • Minimum delay → small cycle time
  • Good scaling properties


System Architecture Taxonomy

(Subject to continuing refinement and embellishment)


CA Placement

  • Can we perform placement in a CA?


Mapping

  • Each cell is a physical placement location
  • State is a logical node assigned to the cell
  • Assume:
    – Cell knows own location
    – State knows location of connected nodes


Costs

  • Assume:
    – Cell knows own location
    – State knows location of connected nodes
  • Cell computes: its cost at that location

$$\sum_{e \in g.\mathrm{edges}} \left( L(e.\mathrm{src}) - L(e.\mathrm{snk}) \right)^2$$
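
A minimal sketch of the per-cell cost, assuming 2-D grid locations and reading the squared difference of locations as squared Euclidean distance. `node_cost`, `edges`, and `loc` are hypothetical names for the example.

```python
from typing import Dict, List, Tuple

Location = Tuple[int, int]
Edge = Tuple[str, str]  # (src node, snk node)


def node_cost(node: str, edges: List[Edge], loc: Dict[str, Location]) -> int:
    """Sum of squared distances over the edges that touch `node`."""
    total = 0
    for src, snk in edges:
        if node in (src, snk):
            dx = loc[src][0] - loc[snk][0]
            dy = loc[src][1] - loc[snk][1]
            total += dx * dx + dy * dy
    return total


edges = [("a", "b"), ("b", "c")]
loc = {"a": (0, 0), "b": (2, 0), "c": (2, 3)}
print(node_cost("b", edges, loc))  # 4 + 9 = 13
```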


Moves

  • Two adjacent cells can exchange graph nodes


Moves

  • Evaluate goodness of proposed swap
    – Each cell considers impact of its graph node being in the other cell
    – Keep if swap reduces cost
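
The greedy swap test can be sketched as follows. This is an illustration under assumptions: it uses a squared-distance cost, and it sums the impact on both nodes in one function, whereas in the CA each cell would evaluate its own node. All names are hypothetical.

```python
from typing import Dict, List, Tuple

Location = Tuple[int, int]
Edge = Tuple[str, str]  # (src node, snk node)


def edge_cost(edges: List[Edge], loc: Dict[str, Location]) -> int:
    """Sum of squared edge lengths for the given edges."""
    return sum(
        (loc[s][0] - loc[t][0]) ** 2 + (loc[s][1] - loc[t][1]) ** 2
        for s, t in edges
    )


def swap_gain(a: str, b: str, edges: List[Edge], loc: Dict[str, Location]) -> int:
    """Cost reduction if nodes a and b exchange cells (positive means better)."""
    touched = [e for e in edges if a in e or b in e]  # only these edges change
    trial = dict(loc)
    trial[a], trial[b] = loc[b], loc[a]
    return edge_cost(touched, loc) - edge_cost(touched, trial)


def greedy_swap(a: str, b: str, edges: List[Edge], loc: Dict[str, Location]) -> bool:
    """Keep the swap only if it reduces cost; updates loc in place."""
    if swap_gain(a, b, edges, loc) > 0:
        loc[a], loc[b] = loc[b], loc[a]
        return True
    return False


edges = [("x", "b"), ("a", "y")]
loc = {"x": (0, 0), "a": (1, 0), "b": (2, 0), "y": (3, 0)}
print(swap_gain("a", "b", edges, loc))    # 6: cost drops from 8 to 2
print(greedy_swap("a", "b", edges, loc))  # True, a and b trade places
```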


Move Costs

  • Only really need to evaluate delta cost
  • Cost of one edge (x term): (src.x − sink.x)²
  • Moving sink:
  • d/d(sink.x) = −2 (src.x − sink.x)
  • Delta move cost is linear in distance
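
A worked version of the delta for a discrete move, writing $s$ for src.x, $k$ for sink.x, and $\delta = \pm 1$ for a one-cell move of the sink (these symbols are introduced here for the derivation only):

```latex
\begin{align*}
\Delta &= \bigl(s - (k + \delta)\bigr)^2 - (s - k)^2 \\
       &= -2\,\delta\,(s - k) + \delta^2
\end{align*}
```

For a unit step $\delta^2 = 1$, so the change in cost is $-2\delta(s - k) + 1$: linear in the current distance, with no need to recompute the squared term.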


Parallel Swaps

  • Pair up and perform N/2 swaps in parallel


Movement

  • Alternate pairings with N, S, E, W neighbors → nodes can move in any direction
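
One way to generate the alternating pairings is sketched below. The four-phase numbering (two horizontal, two vertical) and the function name are assumptions for the example; the slides only state that pairings alternate among the four neighbors.

```python
from typing import List, Tuple

Cell = Tuple[int, int]  # (row, col)


def pairings(rows: int, cols: int, phase: int) -> List[Tuple[Cell, Cell]]:
    """Disjoint adjacent cell pairs for phase 0..3 (0/1 horizontal, 2/3 vertical)."""
    pairs = []
    offset = phase % 2  # which column/row parity starts a pair in this phase
    if phase < 2:       # horizontal partners (E/W neighbors)
        for r in range(rows):
            for c in range(offset, cols - 1, 2):
                pairs.append(((r, c), (r, c + 1)))
    else:               # vertical partners (N/S neighbors)
        for r in range(offset, rows - 1, 2):
            for c in range(cols):
                pairs.append(((r, c), (r + 1, c)))
    return pairs


for phase in range(4):
    print(phase, len(pairings(4, 4, phase)))  # 8, 4, 8, 4 pairs on a 4x4 array
```

Within a phase no cell appears in two pairs, so all swaps of that phase can be evaluated and performed in parallel. Cycling through the phases lets a node drift in any direction over successive steps.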


Basic Idea

  • Pair up PEs
  • Compute impact of swaps in parallel
  • Perform swaps in parallel
  • Repeat until convergence


Problems/Details

  • Greedy swaps → local minima?
  • How to update locations of neighbors?
    – …they are moving, too


Avoid Greedy

  • Insert randomness in swaps
  • Simulated Annealing
  • Shake up system to get out of local minima
  • Swap if
    – Randomly decide to swap
    – OR beneficial to swap
  • Change swap thresholds over time
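
A minimal sketch of the swap rule as stated in the bullets: swap when beneficial, or when a random draw falls under a threshold that shrinks over time. The geometric schedule and all names are assumptions; the slides do not give the actual threshold function.

```python
import random
from typing import List


def should_swap(gain: float, threshold: float, rng: random.Random) -> bool:
    """Swap if it is beneficial OR a random draw falls under the current threshold."""
    return gain > 0 or rng.random() < threshold


def threshold_schedule(start: float, decay: float, steps: int) -> List[float]:
    """Hypothetical geometric schedule: the random-swap threshold shrinks each step."""
    return [start * decay ** i for i in range(steps)]


rng = random.Random(0)
print(threshold_schedule(0.5, 0.5, 4))   # [0.5, 0.25, 0.125, 0.0625]
print(should_swap(1.0, 0.0, rng))        # True: beneficial swaps are always taken
print(should_swap(-1.0, 0.0, rng))       # False: threshold 0 means purely greedy
```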


Swap?


Impact of Randomness


Range Limiting

Eguro, Hauck, & Sharma DAC 2005


Local Swaps Only

  • Assume there’s an ideal location
  • Each node takes a biased Random Walk away from minimum cost location
  • Gives node a distribution function around the minimum cost location
  • If it wanders into a better “minimum cost” home, then it wanders around the new centerpoint
  • Decreasing temperature restricts effective radius of walk
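
The effect of temperature on the walk radius can be illustrated with a 1-D toy model. This is a sketch under assumptions: steps toward the home location are always taken, steps away are taken with probability `p_away` (standing in for temperature), and the names are hypothetical.

```python
import random


def max_walk_radius(p_away: float, steps: int, seed: int = 0) -> int:
    """Farthest distance from home reached by a biased 1-D random walk."""
    rng = random.Random(seed)
    home, pos, farthest = 0, 0, 0
    for _ in range(steps):
        step = rng.choice((-1, 1))
        moves_away = abs(pos + step - home) > abs(pos - home)
        # Steps toward home are always kept; steps away only with probability p_away.
        if not moves_away or rng.random() < p_away:
            pos += step
        farthest = max(farthest, abs(pos - home))
    return farthest


for p in (0.0, 0.3, 0.9):
    print(p, max_walk_radius(p, steps=1000))
```

With `p_away = 0.0` the node never leaves home; raising `p_away` tends to let it wander farther before being pulled back.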


Local Swap Random Walk

  • Decreasing temperature restricts effective radius of walk


How to update locations?

  • Broadcast?
  • Pipelined Ring?
  • Send to neighbors?
    – Routing network?
  • Tree?
  • For whom?
    – Everyone? Only things moved? Only things moved a lot?


Simple Solution: Ring

  • Drop value in ring
  • Shift around entire array
  • Everyone listens for updates


Simple Solution: Ring

  • Weakness?
    – Serial
    – N cycles to complete
    – N/2 swaps in O(1)
    – Then O(N) to update?
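
A small simulation shows why the ring needs N cycles. This is a sketch, not the hardware design: it assumes one ring slot per PE, one shift per cycle, and counts the initial drop as the first cycle.

```python
from typing import Hashable, List, Set, Tuple


def ring_broadcast(updates: List[Hashable]) -> Tuple[int, List[Set]]:
    """Shift every PE's update around the ring until all PEs have seen all updates."""
    n = len(updates)
    ring = list(enumerate(updates))   # cycle 1: each PE drops (its index, its update)
    seen = [{item} for item in ring]
    cycles = 1
    while any(len(s) < n for s in seen):
        ring = [ring[-1]] + ring[:-1]  # shift every value one PE around the ring
        for pe, item in enumerate(ring):
            seen[pe].add(item)         # every PE listens to the value passing by
        cycles += 1
    return cycles, seen


cycles, seen = ring_broadcast([("node%d" % i, (i, 0)) for i in range(8)])
print(cycles)  # 8: one cycle per PE in the ring
```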


Simple Solution: Ring

  • Linear update bad
  • Idea: allow staleness
    – Things move slowly
    – Estimate of position not that bad…
    – …and continued operation will correct…


Algorithm


Algorithm

Update Locations


Algorithm

Try Moves


Quality vs. Parameters


Iso-Quality

Pick point on Iso-Quality Curve that minimizes time


FPGA Implementation

  • Virtex E (180nm)
  • 10ns cycle (100MHz)
  • 150 cycles for 4-phase swap
    – (~40 cycles/swap)
  • 400 LUTs / Placement Engine
  • Comparing
    – 2.2GHz Intel Xeon (L2 512KB)


Results


Tuning Quality


Scaling

  • Processor cycles $O(N^{4/3})$
    – VPR
  • Systolic cycles
    – $O(N^{1/2})$: assume geometric refinement; $O(N^{1/2})$ update
    – $O(N^{5/6})$: mesh sort, same number of swaps as VPR ($N^{4/3} / N^{1/2}$)
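
The exponent arithmetic behind the mesh-sort figure, spelled out:

```latex
\[
\frac{N^{4/3}}{N^{1/2}} = N^{4/3 - 1/2} = N^{8/6 - 3/6} = N^{5/6}
\]
```

One reading of this, offered as an interpretation rather than the slide's stated argument: the same $N^{4/3}$ total swaps as VPR, performed $N$ at a time, take $N^{1/3}$ rounds, and each round pays $O(N^{1/2})$ cycles for mesh routing, giving $N^{1/3} \cdot N^{1/2} = N^{5/6}$ cycles.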


Scaling

Also includes technology scaling


Variations

  • Update Schemes
  • Cost Functions
  • Larger bins than PEs


Update Scheme: Tree

  • Build Reduce Tree (H-Tree)
  • Route to root in $O(N^{1/2})$ time
  • Route from root to leaves in $O(N^{1/2})$ time
  • Pipeline
  • Same bandwidth as Ring (1/cycle)
  • But less staleness (only $O(N^{1/2})$)


Reducing Broadcast (Idea 1)

  • Don’t update things that haven’t moved (much)
    – …or things that move and move back before broadcast
  • Keep track of staleness
    – How far moved from last broadcast
  • Give priority to stalest data
  • Max staleness wins at each tree stage
    – Break ties with randomness
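
The "max staleness wins" selection can be sketched as a pairwise reduce tree. This is a software illustration under assumptions: each candidate is a hypothetical `(node, staleness)` pair, staleness is whatever distance measure the PE tracks, and the list is non-empty.

```python
import random
from typing import List, Tuple

Candidate = Tuple[str, int]  # (node name, staleness since last broadcast)


def stalest_update(candidates: List[Candidate], rng: random.Random) -> Candidate:
    """Reduce pairwise, as in a tree: the staler candidate advances at each stage."""
    level = list(candidates)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            a, b = level[i], level[i + 1]
            if a[1] != b[1]:
                nxt.append(a if a[1] > b[1] else b)
            else:
                nxt.append(rng.choice((a, b)))  # break ties with randomness
        if len(level) % 2 == 1:
            nxt.append(level[-1])               # odd one out advances unopposed
        level = nxt
    return level[0]


rng = random.Random(0)
moved = [("a", 0), ("b", 3), ("c", 1), ("d", 2), ("e", 5)]
print(stalest_update(moved, rng))  # ('e', 5)
```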


Reducing Broadcast (Idea 2)

  • Update locally
  • Don’t need to know if someone far away moved by 1 square
  • …but need to know if near neighbor did
  • Multigrid/multiscale scheme
    – Only alert nodes in same subtree
    – When change subtrees at a level, alert all nodes underneath


Update Scheme: Mesh Route

  • Can route a permutation in $O(N^{1/2})$ time on a mesh
  • Build mesh switching
  • Make O(N) swaps
  • Then take $O(N^{1/2})$ time moving/updating
  • Becomes full simulated annealing
    – i.e. not just local swaps


Cost Functions


Cost Functions

  • Bounding Box → 2-phase update
    – Phase 1: alert source to location of all sinks
    – Phase 2: source communicates bbox extents to all sinks
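
The two phases can be sketched for a single net. The data layout (a source location plus a dictionary of named sinks) and the function name are assumptions made for the example.

```python
from typing import Dict, Tuple

Location = Tuple[int, int]
BBox = Tuple[int, int, int, int]  # (min_x, min_y, max_x, max_y)


def two_phase_bbox(source: Location, sinks: Dict[str, Location]) -> Dict[str, BBox]:
    """Return the bounding-box message each sink receives from the source."""
    # Phase 1: every sink reports its location to the source.
    reported = list(sinks.values())
    xs = [source[0]] + [x for x, _ in reported]
    ys = [source[1]] + [y for _, y in reported]
    bbox = (min(xs), min(ys), max(xs), max(ys))
    # Phase 2: the source sends the bounding-box extents back to every sink.
    return {name: bbox for name in sinks}


extents = two_phase_bbox((2, 2), {"s1": (0, 5), "s2": (4, 1)})
print(extents["s1"])  # (0, 1, 4, 5)
```

After phase 2 every pin of the net holds the same extents, so each can judge locally whether a proposed move would grow or shrink the box.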


Timing

  • Linear Update:
    – Topological ordering of netlist
    – Use tree to distribute updates
    – Send updates in netlist order
    – get delay in one pass
  • Mesh:
    – Compute directly with dataflow-style spreading activation
        • Wait for all inputs; then send output
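
The "wait for all inputs; then send output" rule amounts to computing arrival times in dataflow order. A minimal software sketch, assuming fixed per-node delays and ignoring wire delay (both assumptions; in placement the interconnect delay would depend on distance):

```python
from collections import deque
from typing import Dict, List


def arrival_times(delay: Dict[str, int], fanin: Dict[str, List[str]]) -> Dict[str, int]:
    """Each node fires once all its inputs have arrived: max(input arrivals) + delay."""
    fanout: Dict[str, List[str]] = {n: [] for n in delay}
    for node, preds in fanin.items():
        for p in preds:
            fanout[p].append(node)
    waiting = {n: len(fanin.get(n, [])) for n in delay}  # inputs not yet received
    ready = deque(n for n in delay if waiting[n] == 0)   # primary inputs fire first
    arrival: Dict[str, int] = {}
    while ready:
        n = ready.popleft()
        arrival[n] = max((arrival[p] for p in fanin.get(n, [])), default=0) + delay[n]
        for succ in fanout[n]:
            waiting[succ] -= 1          # one more input has arrived at succ
            if waiting[succ] == 0:
                ready.append(succ)      # all inputs present: succ can send its output
    return arrival


delay = {"a": 1, "b": 2, "c": 1, "d": 3}
fanin = {"c": ["a", "b"], "d": ["c"]}
print(arrival_times(delay, fanin))  # {'a': 1, 'b': 2, 'c': 3, 'd': 6}
```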


Bins


Node Bins

  • Keep more than one graph node per PE
  • Local swap of one node from each PE node set each step
    – One with largest benefit?
    – Randomly select based on cost/benefit?
  • Like rejectionless annealing

Admin

  • Parallel Prefix familiarity?
  • Due today: literature review
  • There is class on Monday