


Caltech CS184a Fall2000 -- DeHon 1

CS184a: Computer Architecture (Structures and Organization)

Day12: November 1, 2000 Interconnect Requirements and Richness


Last Time

  • Dominance of Interconnect
  • Simple things
    – and why they don’t work
  • Characterizing Interconnect Requirements
    – start


Today

  • Followups from Monday (3)
  • Interconnect Design Space
  • Characterizing Interconnect Requirements
  • Interconnect Implications
  • How rich should interconnect be
    – specifics of understanding interconnect
    – methodology for attacking these kinds of questions


Tree Cut

  • Bisection bandwidth
    – binary: 1
    – general: log(n)
  • Rent IO Cut
    – IO ~ K/2 * N
    – P=1
  • Difference:
    – include inputs


Resource Bounded Scheduling

  • Last time: pointed out can get lower bound on time (upper bound on performance)
  • Scheduling in general NP-hard
    – (find optimum)
    – can approximate in O(E) time


Lower Bound: Critical Path

  • ASAP schedule ignoring resource constraints
    – (look at length of remaining critical path)
  • Certainly cannot finish any faster than that

Lower Bound: Resource Capacity

  • Sum up all capacity required per resource
  • Divide by total resource (for type)
  • Lower bound on remaining schedule time
    – (best can do is pack all use densely)
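
The two lower bounds (critical path and resource capacity) can be sketched in a few lines. This is a minimal illustration, not code from the course: the 8-operation graph, the single resource type "alu", and the unit latency are all made-up assumptions.

```python
from collections import defaultdict


def critical_path_bound(deps, latency=1):
    """Longest dependency chain: an ASAP schedule that ignores resource limits.

    deps maps each operation to the list of operations it depends on.
    """
    memo = {}

    def finish(op):
        if op not in memo:
            memo[op] = latency + max((finish(d) for d in deps[op]), default=0)
        return memo[op]

    return max(finish(op) for op in deps)


def resource_bound(op_type, capacity):
    """Max over resource types of ceil(uses / units available)."""
    uses = defaultdict(int)
    for rtype in op_type.values():
        uses[rtype] += 1
    # -(-a // b) is integer ceiling division
    return max(-(-count // capacity[rtype]) for rtype, count in uses.items())


# Hypothetical graph: 8 operations, all using one resource type "alu"
deps = {
    "a": [], "b": [], "c": [], "d": [],
    "e": ["a", "b"], "f": ["c", "d"],
    "g": ["e", "f"], "h": ["g"],
}
op_type = {op: "alu" for op in deps}

cp = critical_path_bound(deps)               # chain a -> e -> g -> h
rb2 = resource_bound(op_type, {"alu": 2})    # 8 operations on 2 units
rb4 = resource_bound(op_type, {"alu": 4})    # 8 operations on 4 units
lower_bound = max(cp, rb2)                   # no schedule can beat either bound
```

The larger of the two bounds is the one that applies; the best achievable delay can still be worse than both.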


Example

[Figure: example schedule in three panels: Critical Path; Resource Bound (2 resources); Resource Bound (4 resources)]


Example 2

RB = 8/2 = 4, LB = 5, best delay = 6


Example 3

LB = 3, RB = ⌈13/2⌉ = 7, best delay = 7


Good Model?

Log-log plot ==> straight lines represent geometric growth


Rent’s Rule

  • Long standing empirical relationship
    – $IO = C \cdot N^P$
    – $0 \le P \le 1.0$
    – compare (F,α)-bifurcator: $\alpha = 2^P$
  • Captures notion of locality
    – some signals generated and consumed locally
    – reconvergent fanout
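
Since the rule is a power law, it is a straight line on a log-log plot, so C and P can be estimated with a least-squares line through (log N, log IO). A minimal sketch follows; the sample points are synthetic (generated from the rule itself with C=5, P=0.72), not measured data, and the function names are illustrative.

```python
import math


def rent_io(n, c, p):
    """IO predicted by Rent's rule for a block of n gates."""
    return c * n ** p


def fit_rent(samples):
    """Least-squares line through (log N, log IO); returns (C, P)."""
    xs = [math.log(n) for n, _ in samples]
    ys = [math.log(io) for _, io in samples]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    p = sxy / sxx                  # slope of the line is the Rent exponent
    c = math.exp(my - p * mx)      # intercept gives the constant C
    return c, p


# Synthetic samples generated from the rule, only to exercise the fit
samples = [(n, rent_io(n, 5.0, 0.72)) for n in (4, 16, 64, 256, 1024)]
c_fit, p_fit = fit_rent(samples)
```

On real partitioning data the points scatter around the line, and the fit gives an average C and P rather than exact values.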


Rent and Locality

  • Rent and IO capture locality
    – local consumption
    – local fanout


Resuming...


Rent’s Rule

  • Typically consider
    – $0.5 \le P \le 0.75$
  • “High-Speed” Logic P=0.67
  • Memory (P~0.1-0.2)
  • Example (i10)
    – max C=7, P=0.68
    – avg C=5, P=0.72


What tell us about design?

  • Recursive bandwidth requirements in network
    – lower bound on resource requirements
  • N.B. necessary but not sufficient condition on network design
    – I.e. design must also be able to use the wires
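
To make "recursive bandwidth requirements" concrete, the sketch below applies Rent's rule at the root of every subtree size in a binary hierarchy. The binary tree and the 1024-leaf size are assumptions for illustration; C=7, P=0.68 are the "max" values quoted for i10.

```python
def tree_bandwidth(n_leaves, c, p):
    """(subtree size, wires needed at its root) for each level of a binary tree."""
    levels = []
    size = 1
    while size <= n_leaves:
        levels.append((size, c * size ** p))
        size *= 2
    return levels


levels = tree_bandwidth(1024, 7, 0.68)

# Each doubling of subtree size multiplies the required bandwidth by 2**P
ratios = [levels[i + 1][1] / levels[i][1] for i in range(len(levels) - 1)]
```

These numbers are only a lower bound on the wires a network must supply at each level; as noted above, the design must also be able to use them.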


What tell us about design?

  • Interconnect lengths
    – Intuition
      • if p>0.5, everything cannot be nearest neighbor
      • as p grows, so do wire distances

What tell us about design?

  • Interconnect lengths
    – $IO = (n^2)^P$ cross distance $n$
    – $dIO/dn$ end at exactly distance $n$
    – $E(l) = \int_0^{\sqrt{N}} n \cdot (dIO/dn) / n^2 \, dn$
      • assume iid sources
    – $E(l) = O(N^{(p-0.5)})$
      • p>0.5
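
Filling in the integration step behind the wire-length estimate above (a sketch that drops constant factors, as the slide does):

```latex
\begin{align*}
IO(n) &= (n^2)^p = n^{2p}, \qquad \frac{dIO}{dn} = 2p\, n^{2p-1} \\
E(l) &\sim \int_0^{\sqrt{N}} \frac{n \cdot (dIO/dn)}{n^2}\, dn
      = \int_0^{\sqrt{N}} 2p\, n^{2p-2}\, dn \\
     &= \frac{2p}{2p-1} \left(\sqrt{N}\right)^{2p-1}
      = \frac{2p}{2p-1}\, N^{p-0.5} \qquad (p > 0.5)
\end{align*}
```

The condition p>0.5 is what keeps the integral finite at the lower limit.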


What Tell us about design?

  • $IO \propto N^P$
  • Bisection BW $\propto N^P$
  • side length $\propto N^P$
    – $\sqrt{N}$ if p<0.5
  • Area $\propto N^{2p}$
    – p>0.5

N.B. 2D VLSI world has “natural” Rent of P=0.5 (area vs. perimeter)
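
A quick numeric check of these scaling relations, with all constants dropped. The sketch assumes the side length is floored at √N when p<0.5 (N blocks need at least N units of area), which is one reading of the slide rather than something it states outright.

```python
def side_length(n, p):
    """Side grows as N**p but cannot drop below sqrt(N)."""
    return max(n ** p, n ** 0.5)


def area(n, p):
    return side_length(n, p) ** 2


def area_growth(p, n_small=1024, n_large=4096):
    """Factor by which area grows when the design gets 4x larger."""
    return area(n_large, p) / area(n_small, p)


growth_low = area_growth(0.3)    # below the natural Rent of 0.5: linear in N
growth_half = area_growth(0.5)   # area vs. perimeter balance: linear in N
growth_high = area_growth(0.75)  # interconnect-dominated: faster than N
```

With p=0.75, a 4x larger design takes about 8x the area under this model, while at p≤0.5 it takes 4x.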


Rent’s Rule Caveats

  • Modern “systems” on a chip -- likely to contain subcomponents of varying Rent complexity
  • Less I/O at certain “natural” boundaries
  • System close
    – (Rent’s Rule apply to workstation, PC, PDA?)


Area/Wire Length

  • Bad news
    – Area ~ $O(N^{2p})$
      • faster than N
    – Avg. Wire Length ~ $O(N^{(p-0.5)})$
      • grows with N
  • Can designers/CAD control p (locality) once appreciate its effects?
  • I.e. maybe this cost changes design style/criteria so we mitigate effects?


What Rent didn’t tell us

  • Bisection bandwidth purely geometrical
  • No constraint for delay
    – I.e. a partition may leave critical path weaving between halves


Critical Path and Bisection

Minimum cut may cross critical path multiple times. Minimizing long wires in critical path => increase cut size.


Rent Weakness

  • Does not account for path topology
  • Can we define a “Temporal” Rent which takes this into consideration?
    – Promising research topic


Administrative Interlude

  • …won’t catch up today + lots more stuff
  • No Class Wed 11/8
  • Can we meet Friday 11/10?
  • Homework 3+4 graded
  • P/F
    – (reluctantly) …if you must
    – must attempt all (>90%) problems to get passing grade


Interconnect Richness


Now What?

  • There is structure (locality)
  • Rent characterizes locality
  • How rich should interconnect be?
    – Allow full utilization?
    – Model requirements and area impact


Step 1: Build Architecture Model

  • Assume geometric growth
  • Pick parameters: Build architecture can tune
    – F, C, α, p


Tree of Meshes

  • Tree
  • Restricted internal bandwidth
  • Can match to model

Parameterize C


Parameterize Growth

  (2 1)* => $\alpha = \sqrt{2}$
  (2 2 1)* => $\alpha = (2 \cdot 2)^{1/3} = 2^{2/3}$
  (2 2 2 1)* => $\alpha = 2^{3/4}$
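
The growth rate for any repeating pattern is the geometric mean of its per-level multipliers. The sketch below checks the three patterns above and converts each to its equivalent Rent exponent using α = 2^P; the function names are illustrative.

```python
import math


def growth_rate(pattern):
    """Geometric-mean bandwidth growth per tree level for a repeating pattern."""
    return math.prod(pattern) ** (1 / len(pattern))


def rent_exponent(pattern):
    """Equivalent Rent exponent P, from alpha = 2**P."""
    return math.log2(growth_rate(pattern))


alpha_21 = growth_rate((2, 1))         # sqrt(2)
p_21 = rent_exponent((2, 1))           # 1/2
p_221 = rent_exponent((2, 2, 1))       # 2/3
p_2221 = rent_exponent((2, 2, 2, 1))   # 3/4
```

Longer patterns give finer control over P, at the cost of a less regular tree.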


Wednesday class stopped here


Step 2: Area Model

  • Need to know effect of architecture parameters on area (costs)
    – focus on dominant components
      • wires
      • switches
      • logic blocks(?)

Area Parameters

  • $A_{logic} = 40\mathrm{K}\lambda^2$
  • $A_{sw} = 2.5\mathrm{K}\lambda^2$
  • Wire Pitch = $8\lambda$


Switchbox Population

  • Full population is excessive (next week?)
  • Hypothesis: linear population adequate
    – still to be (dis)proven


“Cartoon” VLSI Area Model

(Example artificially small for clarity)


Larger “Cartoon”

[Figure: 1024 LUT Network, P=0.67, LUT Area 3%]


Effects of P (α) on Area

[Figure: 1024 LUT Area Comparison; panels: P=0.5, P=0.67, P=0.75]


Effects of P on Capacity


Step 3: Characterize Application Requirements

  • Identify representative applications.
    – Today: IWLS93 logic benchmarks
  • How much structure there?
  • How much variation among applications?


Application Requirements

Max: C=7, P=0.68
Avg: C=5, P=0.72


Benchmark Wide


Benchmark Parameters


Complication

  • Interconnect requirements vary among applications
  • Interconnect richness has large effect on area
  • What is effect of architecture/application mismatch?
    – Interconnect too rich?
    – Interconnect too poor?


Interconnect Mismatch in Theory


Step 4: Assess Resource Impact

  • Map designs to parameterized architecture
  • Identify architectural resource required

Compare: mapping to k-LUTs; LUT count vs. k.


Mapping to Fixed Wire Schedule

  • Easy if need fewer wires than net
  • If need more wires than net, must depopulate to meet interconnect limitations.


Mapping to Fixed-WS

  • Better results if “reassociate” rather than keeping original subtrees.


Observation

  • Don’t really want a “bisection” of LUTs
    – subtree filled to capacity by either of
      • LUTs
      • root bandwidth
    – May be profitable to cut at some place other than midpoint
      • not require “balance” condition
    – “Bisection” should account for both LUT and wiring limitations


Challenge

  • Not know where to cut design into
    – not knowing when wires will limit subtree capacity


Brute Force Solution

  • Explore all cuts
    – start with all LUTs in group
    – consider “all” balances
    – try cut
    – recurse


Brute Force

  • Too expensive
  • Exponential work
  • …viable if solving same subproblems


Simplification

  • Single linear ordering
  • Partitions = pick split point on ordering
  • Reduce to finding cost of [start,end] ranges (subtrees) within linear ordering
  • Only $n^2$ such subproblems
  • Can solve with dynamic programming

Dynamic Programming

  • Start with base set of size 1
  • Compute all splits of size n, from solutions to all problems of size n-1 or smaller
  • Done when compute where to split 0,N-1
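
A minimal sketch of this bottom-up scheme over a fixed linear ordering follows. The cost function is an assumption chosen for illustration: the smallest binary-tree height whose root bandwidth at every level can carry the wires crossing each range. The net list, the `bandwidth` table, and the function names are hypothetical, and external design I/O is ignored.

```python
def min_tree_height(n, nets, bandwidth):
    """Smallest binary-tree height that holds LUTs 0..n-1 in the given order.

    nets: tuples of LUT positions that share a signal.
    bandwidth[h]: wires available at the root of a height-h subtree.
    Returns None if the design does not fit at any available height.
    """
    def crossing(i, j):
        # nets with pins both inside and outside the range [i, j)
        count = 0
        for net in nets:
            inside = sum(1 for pin in net if i <= pin < j)
            if 0 < inside < len(net):
                count += 1
        return count

    def fit_height(wires, at_least):
        # lowest height >= at_least whose root bandwidth carries the wires
        for h in range(at_least, len(bandwidth)):
            if bandwidth[h] >= wires:
                return h
        return None

    height = {}
    for i in range(n):                      # base set: ranges of size 1
        height[(i, i + 1)] = fit_height(crossing(i, i + 1), 0)
    for size in range(2, n + 1):            # build up from smaller ranges
        for i in range(n - size + 1):
            j = i + size
            wires = crossing(i, j)
            best = None
            for k in range(i + 1, j):       # every split point, balanced or not
                left, right = height[(i, k)], height[(k, j)]
                if left is None or right is None:
                    continue
                h = fit_height(wires, 1 + max(left, right))
                if h is not None and (best is None or h < best):
                    best = h
            height[(i, j)] = best
    return height[(0, n)]


# Hypothetical 4-LUT design in which every adjacent pair has 2 to 4 crossing nets
nets = [(0, 2), (1, 3), (0, 3), (1, 2)]
rich = min_tree_height(4, nets, [2, 4, 4, 4])   # enough wires at height 1
poor = min_tree_height(4, nets, [2, 2, 4, 4])   # only 2 wires at height 1
```

With the poorer wire schedule the same four LUTs need a taller tree, which means leaving leaf positions empty (depopulating) to stay within the bandwidth at each level.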


Dynamic Programming

  • Just one possible “heuristic” solution to this problem
    – not optimal
    – dependent on ordering
    – sacrifices ability to reorder on splits to avoid exponential problem size
  • Opportunity to find a better solution here...

Ordering LUTs

  • Another problem
    – lay out gates in 1D line
    – minimize sum of squared wire length
      • tend to cluster connected gates together
    – Is solvable mathematically for optimal
      • Eigenvector of connectivity matrix
  • Use this 1D ordering for our linear ordering
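
One way to compute such an ordering is sketched below. It assumes the standard quadratic-placement formulation: sort nodes by the eigenvector of the graph Laplacian (degree matrix minus adjacency matrix) with the second-smallest eigenvalue. The slide only says "eigenvector of connectivity matrix", so the Laplacian, the shifted power iteration used to find the vector, and the function name are assumptions. The graph must be connected and have at least one edge.

```python
def spectral_order(n, edges, iterations=500):
    """Order nodes 0..n-1 by the second-smallest Laplacian eigenvector."""
    degree = [0] * n
    neighbors = [[] for _ in range(n)]
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
        neighbors[a].append(b)
        neighbors[b].append(a)

    shift = 2 * max(degree)          # upper bound on the largest eigenvalue of L
    x = [float(i) for i in range(n)]
    for _ in range(iterations):
        # project out the constant vector (the trivial eigenvector of L)
        mean = sum(x) / n
        x = [v - mean for v in x]
        # multiply by (shift*I - L): (shift - d_i) * x_i + sum of neighbor values
        y = [(shift - degree[i]) * x[i] + sum(x[j] for j in neighbors[i])
             for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    return sorted(range(n), key=lambda i: x[i])


# A 6-node chain whose node labels are scrambled: 3 - 0 - 4 - 1 - 5 - 2
edges = [(3, 0), (0, 4), (4, 1), (1, 5), (5, 2)]
order = spectral_order(6, edges)
```

For this chain the ordering recovers the chain itself (possibly reversed), so connected nodes end up adjacent. For large designs a sparse eigensolver would be the practical choice; the power iteration here is only meant to keep the example self-contained.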


Mapping Results


Step 5: Apply Area Model

  • Assess impact of resource results


Resources × Area Model ⇒ Area


Net Area


Picking Network Design Point

Don’t optimize for 100% compute util. (100% yield) …also don’t optimize for highest peak.


What about a single design?


LUT Utilization predict Area?

Single design


Methodology

  • Architecture model (parameterized)
  • Cost model
  • Important task characteristics
  • Mapping Algorithm
    – Map to determine resources
  • Apply cost model
  • Digest results
    – find optimum (multiple?)
    – understand conflicts (avoidable?)


Big Ideas [MSB Ideas]

  • Rent’s rule characterizes locality
  • => Area growth $O(N^{2p})$
  • p>0.5 => interconnect growing faster than compute elements
    – expect interconnect to dominate other resources


Big Ideas [MSB Ideas]

  • Interconnect area dominates logic area
  • Interconnect requirements vary
    – among designs
    – within a single design
  • To minimize area
    – focus on using dominant resource (interconnect)
    – may underuse non-dominant resources (LUTs)


Big Ideas [MSB Ideas]

  • Two different resources here
    – compute, interconnect
  • Balance of resources required varies among designs (even within designs)
  • Cannot expect full utilization of every resource
  • Most area-efficient designs may waste some compute resources (cheaper resource)