Caltech CS184a Fall2000 -- DeHon 1

CS184a: Computer Architecture (Structures and Organization)

Day10: October 25, 2000 Computing Elements 2: Cascades, ALUs, PLAs


Last Time

  • LUTs

– area
– structure
– big LUTs vs. small LUTs with interconnect
– design space
– optimization


Today

  • LUT Delay
  • LUT Cascades
  • ALUs
  • PLAs


Delay


Delay?

  • Circuit Depth in LUTs?
  • “Simple Function” --> M-input AND

– 1 table lookup in an M-LUT
– ⌈log_K(M)⌉ levels in K-LUTs
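As a rough check of the log_K(M) figure, the sketch below counts the levels of a balanced tree of K-input ANDs, with each node standing in for one K-LUT, that reduces M inputs to one output. It is an illustration under that assumption, not a technology mapper, and the function name is hypothetical. Integer ceiling division is used instead of math.log(m, k) to avoid floating-point rounding at exact powers of K.

```python
def and_tree_depth(m: int, k: int) -> int:
    """Levels of K-LUTs needed to AND together m inputs (ceil(log_k(m)) for m > 1)."""
    if m < 1 or k < 2:
        raise ValueError("need m >= 1 and k >= 2")
    depth = 0
    signals = m
    # Each level packs the remaining signals into K-input LUTs.
    while signals > 1:
        signals = -(-signals // k)  # ceiling division
        depth += 1
    return depth

print(and_tree_depth(64, 2))  # 6
print(and_tree_depth(64, 4))  # 3
print(and_tree_depth(64, 6))  # 3
```

Depth drops only logarithmically with K (64 inputs take 6 levels of 2-LUTs but still 3 levels of 6-LUTs).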


Delay?

  • M-input “Complex” function

– 1 table lookup for an M-LUT
– in K-LUTs, between (M-K)/log2(K) + 1
– and (M-K)/log2(K - log2(K)) + 1 levels
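To make the two expressions concrete, the sketch below evaluates both for a given M and K. It only transcribes the slide's formulas and does not derive them. The function name is hypothetical. K ≥ 3 is assumed because for K = 2 the upper expression divides by log2(2 - 1) = 0.

```python
import math
from typing import Tuple

def complex_depth_bounds(m: int, k: int) -> Tuple[float, float]:
    """(lower, upper) K-LUT depth for a 'complex' m-input function, per the slide."""
    if k < 3 or m < k:
        # For k = 2, log2(k - log2(k)) = log2(1) = 0, so the upper bound is undefined.
        raise ValueError("need k >= 3 and m >= k")
    lower = (m - k) / math.log2(k) + 1
    upper = (m - k) / math.log2(k - math.log2(k)) + 1
    return lower, upper

print(complex_depth_bounds(16, 4))  # (7.0, 13.0)
```

Both bounds grow linearly in M. When M = K, both reduce to a single table lookup.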


Delay

  • Simple: log M
  • Complex: linear in M
  • Both scale as 1/log(K)


Circuit Depth vs. K


LUT Delay vs. K

  • For small LUTs:

– tLUT≈c0+c1×K

  • Large LUTs:

– add a length term: c2 × √(2^K)

  • Plus Wire Delay

– ~√area


Delay vs. K

Delay = Depth × (tLUT + tInterconnect)

Why not satisfied with this model?
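The delay model on these slides can be written out as a few lines of arithmetic. The slides give the form of the model but no constants, so C0, C1, and C2 below are made-up placeholder values in arbitrary time units. tInterconnect is passed in as a number rather than modeled; on the slides it grows roughly as √area.

```python
import math

# Hypothetical constants (arbitrary units); the lecture does not give values.
C0, C1, C2 = 1.0, 0.5, 0.25

def t_lut(k: int) -> float:
    """LUT delay: c0 + c1*K for small K, plus a length term c2*sqrt(2**K)."""
    return C0 + C1 * k + C2 * math.sqrt(2 ** k)

def path_delay(depth: int, k: int, t_interconnect: float) -> float:
    """Delay = Depth x (tLUT + tInterconnect)."""
    return depth * (t_lut(k) + t_interconnect)

print(t_lut(4))               # 4.0
print(t_lut(10))              # 14.0  (the sqrt(2**K) term now dominates)
print(path_delay(2, 4, 6.0))  # 20.0
```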


Observation

  • General interconnect is expensive
  • “Larger” logic blocks

– => less interconnect crossing
– => lower interconnect delay
– => get larger
– => get slower
    • faster than modeled here, due to area
– => less area efficient
    • don’t match structure in computation


Different Structure

  • How can we have “larger” compute nodes (less general interconnect) without paying the huge area penalty of large LUTs?


Structure in subgraphs

  • Small LUTs capture structure
  • Structure of small LUT-mapped netlists?


Structure

  • LUT sequences are ubiquitous


Hardwired Logic Blocks

Single Output


Hardwired Logic Blocks

Two outputs
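The hardwired logic blocks appear here as figures only. Below is a hypothetical behavioral model of a two-LUT cascade in which the first LUT's output is wired directly to one input of the second, with no trip through programmable interconnect. Returning the intermediate output as well corresponds to the two-output variant. `LUT`, `program_lut`, and `cascade` are illustrative names, not from the lecture.

```python
from typing import Callable, Sequence, Tuple

class LUT:
    """A K-input lookup table storing 2**K configuration bits."""

    def __init__(self, k: int, bits: Sequence[int]):
        if len(bits) != 2 ** k:
            raise ValueError("a K-LUT needs exactly 2**K configuration bits")
        self.k = k
        self.bits = list(bits)

    def __call__(self, inputs: Sequence[int]) -> int:
        if len(inputs) != self.k:
            raise ValueError("wrong number of inputs")
        # Input 0 is the least-significant address bit.
        address = sum(bit << i for i, bit in enumerate(inputs))
        return self.bits[address]

def program_lut(k: int, f: Callable[[Sequence[int]], int]) -> LUT:
    """Fill a K-LUT's table by evaluating f on every input combination."""
    return LUT(k, [f([(a >> i) & 1 for i in range(k)]) for a in range(2 ** k)])

def cascade(first: LUT, second: LUT,
            x: Sequence[int], y: Sequence[int]) -> Tuple[int, int]:
    """Hardwired cascade: first's output drives second's input 0 by a fixed wire.
    Returns (intermediate, final), i.e. the two-output block."""
    mid = first(x)
    return mid, second([mid, *y])

and4 = program_lut(4, lambda v: int(all(v)))
# A 7-input AND from two 4-LUTs: 4 inputs on the first, 3 plus the cascade on the second.
print(cascade(and4, and4, [1, 1, 1, 1], [1, 1, 1]))  # (1, 1)
print(cascade(and4, and4, [1, 1, 1, 1], [1, 0, 1]))  # (1, 0)
```

Because the cascade wire is fixed, the second LUT spends one of its K inputs on it. In exchange, the path between the two LUTs never crosses general interconnect.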


Relation to ALUs

  • How do ALUs differ?


PLAs


PLA
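A PLA has two programmable planes. The AND plane forms product terms (p-terms) from the inputs and their complements, and the OR plane sums selected p-terms into each output. The sketch below is a behavioral model only. The cube encoding (1 = true literal, 0 = complemented literal, None = unused input) is an assumption made for illustration, and `pla_eval` is a hypothetical name.

```python
from typing import List, Optional, Sequence, Set

# Hypothetical encoding: one entry per input; 1 = x, 0 = x', None = x not used.
Cube = Sequence[Optional[int]]

def pla_eval(and_plane: List[Cube], or_plane: List[Set[int]],
             inputs: Sequence[int]) -> List[int]:
    """Evaluate a two-level AND-OR PLA on one input vector."""
    # AND plane: a p-term is 1 when every literal it uses matches the input.
    terms = [int(all(lit is None or lit == x for lit, x in zip(cube, inputs)))
             for cube in and_plane]
    # OR plane: output j is the OR of the p-terms listed in or_plane[j].
    return [int(any(terms[t] for t in outs)) for outs in or_plane]

# Two outputs sharing one AND plane: out0 = a XOR b, out1 = a AND b.
and_plane = [(1, 0), (0, 1), (1, 1)]   # a·b', a'·b, a·b
or_plane = [{0, 1}, {2}]
print(pla_eval(and_plane, or_plane, [1, 0]))  # [1, 0]
print(pla_eval(and_plane, or_plane, [1, 1]))  # [0, 1]
```

Any p-term can feed any output, so outputs that share p-terms pay for them only once.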


PLA and Memory


PLA and PAL
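Memories and PALs can both be read as PLAs with one plane hardwired. A memory (ROM) fixes the AND plane as a full address decoder with one p-term per address and programs only the OR plane. A PAL programs the AND plane and fixes the OR plane. The sketch below illustrates this with a small behavioral model. The cube encoding and all function names are hypothetical.

```python
from itertools import product
from typing import List, Optional, Sequence, Set, Tuple

Cube = Sequence[Optional[int]]  # 1 = true literal, 0 = complement, None = unused

def pla_eval(and_plane: List[Cube], or_plane: List[Set[int]],
             inputs: Sequence[int]) -> List[int]:
    """Two-level AND-OR evaluation."""
    terms = [int(all(lit is None or lit == x for lit, x in zip(c, inputs)))
             for c in and_plane]
    return [int(any(terms[t] for t in outs)) for outs in or_plane]

def memory_as_pla(n: int, contents: List[List[int]]) -> Tuple[List[Cube], List[Set[int]]]:
    """Memory: hardwired AND plane = full decoder (2**n p-terms, one per address);
    programmed OR plane = stored bits. contents[a][j] is bit j of word a;
    input 0 is the address MSB."""
    and_plane = list(product((0, 1), repeat=n))
    width = len(contents[0])
    or_plane = [{a for a in range(2 ** n) if contents[a][j]} for j in range(width)]
    return and_plane, or_plane

def pal_or_plane(outputs: int, terms_per_output: int) -> List[Set[int]]:
    """PAL: hardwired OR plane; output j always ORs its own fixed group of
    terms_per_output p-terms. Only the AND plane is programmed."""
    return [set(range(j * terms_per_output, (j + 1) * terms_per_output))
            for j in range(outputs)]

contents = [[0, 1], [1, 0], [1, 1], [0, 0]]
mem_and, mem_or = memory_as_pla(2, contents)
print(pla_eval(mem_and, mem_or, [1, 0]))                       # word 2: [1, 1]
print(pla_eval([(1, 0), (0, 1)], pal_or_plane(1, 2), [0, 1]))  # XOR in a PAL: [1]
```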


PLAs

  • Fast implementations for large ANDs or ORs
  • Number of P-terms can be exponential in the number of input bits

– for the most complicated functions

  • Can use arrays of small PLAs

– to exploit structure
– like the arrays of small memories we saw last time
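A standard example of the exponential p-term count is n-input parity (XOR). Each ON-set minterm differs in a single bit only from OFF-set minterms. As a result, no two ON-set minterms can merge into a larger cube, and a two-level sum of products needs all 2^(n-1) of them. The sketch below checks that adjacency property by brute force. It is an illustration, not a logic minimizer, and the function names are hypothetical.

```python
from itertools import product
from typing import List, Tuple

def parity_on_set(n: int) -> List[Tuple[int, ...]]:
    """All n-bit input patterns for which n-input XOR (parity) is 1."""
    return [bits for bits in product((0, 1), repeat=n) if sum(bits) % 2 == 1]

def has_mergeable_pair(on_set: List[Tuple[int, ...]]) -> bool:
    """True if two ON-set minterms differ in exactly one bit (could merge into one cube)."""
    on = set(on_set)
    for bits in on:
        for i in range(len(bits)):
            neighbor = bits[:i] + (1 - bits[i],) + bits[i + 1:]
            if neighbor in on:
                return True
    return False

for n in range(2, 9):
    on = parity_on_set(n)
    # No merges are possible, so sum-of-products needs one p-term per ON minterm.
    print(n, len(on), has_mergeable_pair(on))
```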


PLAs vs. LUTs?

  • Look at inputs, outputs, P-terms

– minimum area (one study, see paper)
– K=10, N=12, M=3

  • A(PLA 10,12,3) comparable to 4-LUT?

– 80-130%?
– 300% on ECC (structure LUTs can exploit)

  • Delay?

– Claim 40% fewer logic levels
    • (general interconnect crossings)


PLA Optimization (Folding)


Conventional/Commercial FPGA

Altera 9K (from databook)


Conventional/Commercial FPGA

Altera 9K (from databook)


Finishing Up...


Admin

  • Homework 2 return
  • Questions about homework

Big Ideas [MSB Ideas]

  • Programmable interconnect allows us to exploit that structure

– want to match to application structure

  • Hardwired cascades

– key technique for reducing delay in programmables

  • PLAs

– canonical two-level structure
– hardwire portions to get memories, PALs


Big Ideas [MSB-1 Ideas]

  • Delay

– LUT depth decreases with K
    • in practice closer to log(K)
– Delay increases with K
    • small K linear + large fixed term
    • minimum around 5-6

  • Better structure match with hardwired LUT cascades