Caltech CS184a Fall2000 -- DeHon 1

CS184a: Computer Architecture (Structures and Organization)

Day 10: October 25, 2000
Computing Elements 2: Cascades, ALUs, PLAs

Last Time

LUTs

– area
– structure
– big LUTs vs. small LUTs with interconnect
– design space
– optimization

Today

LUT Delay
LUT Cascades
ALUs
PLAs

Delay

Delay?

Circuit Depth in LUTs?
“Simple Function” --> M-input AND

– 1 table lookup in M-LUT
– $\log_K(M)$ in K-LUT
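
As a rough illustration of the simple-function case, the sketch below counts the levels of a balanced tree of K-input LUTs needed to AND M inputs together. The function name `and_tree_depth` is hypothetical, and the balanced-tree arrangement is an assumption; integer arithmetic is used instead of a floating-point logarithm so that exact powers of K do not suffer rounding.

```python
def and_tree_depth(m: int, k: int) -> int:
    """Levels of K-input LUTs needed to AND m inputs in a balanced tree.

    Equals ceil(log_k(m)) for m > 1; a single input needs no LUT,
    so depth 0 is returned for m == 1.
    """
    if m < 1 or k < 2:
        raise ValueError("need m >= 1 and k >= 2")
    depth = 0
    signals = m
    while signals > 1:
        # Each level groups up to k signals into one LUT output.
        signals = -(-signals // k)  # ceiling division
        depth += 1
    return depth


if __name__ == "__main__":
    for k in (2, 4, 6):
        print(f"K={k}: 64-input AND needs {and_tree_depth(64, k)} levels")
```

Going from K=2 to K=4 halves the depth for a 64-input AND, which is the 1/log(K) behaviour in miniature.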

Delay?

M-input “Complex” function

– 1 table lookup for M-LUT
– between: $(M-K)/\log_2(K) + 1$
– and $(M-K)/\log_2(K - \log_2(K)) + 1$
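
To get a feel for the two bounds, here is a minimal sketch that evaluates them directly. The function name `complex_depth_bounds` is hypothetical, and the restriction to K >= 3 is an assumption of this sketch: it keeps K - log2(K) above 1 so the second logarithm stays positive.

```python
import math


def complex_depth_bounds(m: int, k: int) -> tuple[float, float]:
    """Return the (lower, upper) depth estimates for an M-input complex
    function built from K-LUTs, using the two expressions quoted above."""
    if k < 3 or m < k:
        # k - log2(k) must exceed 1 so the second logarithm is positive.
        raise ValueError("sketch assumes k >= 3 and m >= k")
    lower = (m - k) / math.log2(k) + 1
    upper = (m - k) / math.log2(k - math.log2(k)) + 1
    return lower, upper


if __name__ == "__main__":
    for k in (4, 6, 8):
        lo, hi = complex_depth_bounds(32, k)
        print(f"K={k}: depth between {lo:.1f} and {hi:.1f}")
```

Both expressions are linear in M for fixed K, in contrast to the logarithmic depth of the simple AND case.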

Delay

Simple: log M
Complex: linear in M
Both go as $1/\log(K)$

Circuit Depth vs. K

LUT Delay vs. K

For small LUTs:

– $t_{LUT} \approx c_0 + c_1 \times K$

Large LUTs:

– add length term
– $c_2 \times \sqrt{2^K}$

Plus Wire Delay

– $\sim \sqrt{\text{area}}$
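
The LUT delay model above can be sketched as a single function. The name `lut_delay` and the default values of `c0`, `c1`, and `c2` are arbitrary placeholders chosen for illustration, not constants from the lecture; wire delay outside the LUT is left out.

```python
import math


def lut_delay(k: int, c0: float = 1.0, c1: float = 0.1, c2: float = 0.01) -> float:
    """t_LUT ~ c0 + c1*K, plus a c2*sqrt(2**K) length term.

    The length term is negligible for small K and dominates for large K.
    The default constants are placeholders, not measured values.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return c0 + c1 * k + c2 * math.sqrt(2 ** k)


if __name__ == "__main__":
    for k in (2, 4, 8, 12, 16):
        print(f"K={k}: t_LUT ~ {lut_delay(k):.2f} (arbitrary units)")
```

Because a K-LUT holds 2^K bits, its side length grows as the square root of 2^K, which is where the length term comes from.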

Delay vs. K

$\text{Delay} = \text{Depth} \times (t_{LUT} + t_{Interconnect})$

Why not satisfied with this model?
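
A minimal numerical sketch of this model follows, assuming the lower-bound depth expression for a complex M-input function and the small/large LUT delay terms from earlier in the lecture. All constants and the fixed `t_interconnect` value are made-up placeholders, and the function names are hypothetical.

```python
import math


def depth_estimate(m: int, k: int) -> float:
    """Lower-bound depth for an M-input complex function in K-LUTs."""
    if k < 2 or m < k:
        raise ValueError("sketch assumes k >= 2 and m >= k")
    return (m - k) / math.log2(k) + 1


def total_delay(m: int, k: int, t_interconnect: float,
                c0: float = 1.0, c1: float = 0.1, c2: float = 0.01) -> float:
    """Delay = Depth * (t_LUT + t_interconnect), with placeholder constants."""
    t_lut = c0 + c1 * k + c2 * math.sqrt(2 ** k)
    return depth_estimate(m, k) * (t_lut + t_interconnect)


if __name__ == "__main__":
    for k in range(2, 11):
        print(k, round(total_delay(m=32, k=k, t_interconnect=2.0), 2))
```

With placeholder constants the absolute numbers mean nothing; the sketch only shows the tension the model captures, with depth falling as K grows while per-level LUT delay rises. It also treats interconnect delay as independent of K, which is one of the simplifications the question above is probing.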

Observation

General interconnect is expensive
“Larger” logic blocks

– => less interconnect crossing
– => lower interconnect delay
– => get larger
– => get slower

faster than modeled here due to area

– => less area efficient

don’t match structure in computation

Different Structure

How can we have “larger” compute nodes (less general interconnect) without paying huge area penalty of large LUTs?

Structure in subgraphs

Small LUTs capture structure

Structure of small LUT-mapped netlists?

Structure

LUT sequences ubiquitous

Hardwired Logic Blocks

Single Output

Hardwired Logic Blocks

Two outputs
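
The figures for these slides did not survive, so the following is only a sketch of one plausible single-output arrangement: a chain of K-LUTs in which each LUT's output is hardwired to input 0 of the next, so the signal never crosses general interconnect between stages. The chain topology and the names `make_lut` and `eval_cascade` are assumptions made for illustration.

```python
from typing import Callable, Sequence

Lut = Callable[[Sequence[int]], int]


def make_lut(k: int, truth_table: Sequence[int]) -> Lut:
    """Build a K-input LUT from a 2**k-entry truth table (input 0 is the LSB)."""
    if len(truth_table) != 2 ** k:
        raise ValueError("truth table must have 2**k entries")

    def lut(inputs: Sequence[int]) -> int:
        if len(inputs) != k:
            raise ValueError(f"expected {k} inputs")
        index = sum(bit << i for i, bit in enumerate(inputs))
        return truth_table[index]

    return lut


def eval_cascade(luts: Sequence[Lut], k: int, inputs: Sequence[int]) -> int:
    """Evaluate a chain of K-LUTs.

    The first LUT consumes k primary inputs; every later LUT takes the
    previous output on its input 0 plus k-1 fresh primary inputs.
    """
    if not luts:
        raise ValueError("need at least one LUT")
    needed = k + (len(luts) - 1) * (k - 1)
    if len(inputs) != needed:
        raise ValueError(f"expected {needed} inputs")
    carry = luts[0](inputs[:k])
    pos = k
    for lut in luts[1:]:
        carry = lut([carry, *inputs[pos:pos + k - 1]])
        pos += k - 1
    return carry


if __name__ == "__main__":
    k = 4
    and4 = make_lut(k, [0] * 15 + [1])   # 4-input AND
    chain = [and4, and4, and4]           # 4 + 3 + 3 = 10-input AND
    print(eval_cascade(chain, k, [1] * 10))
    print(eval_cascade(chain, k, [1] * 9 + [0]))
```

Three 4-LUTs chained this way cover 10 inputs using 3 × 16 = 48 table bits, where a single 10-LUT would need 1024, at the cost of only being able to implement functions that decompose along the chain.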

Relation to ALUs

How do ALUs differ?

PLAs

PLA

PLA and Memory

PLA and PAL

PLAs

Fast implementations for large ANDs or ORs

Number of P-terms can be exponential in number of input bits

– for the most complicated functions

Can use arrays of small PLAs

– to exploit structure
– like we saw arrays of small memories last time
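
To make the two-level structure and the P-term blow-up concrete, here is a minimal sketch of a PLA as an AND plane of product terms feeding an OR plane. The cube-string encoding and the function names are assumptions for illustration. Parity is used as the worst-case example: no two of its minterms are adjacent, so none can be merged, and n-input parity needs 2^(n-1) product terms in two-level form.

```python
from itertools import product
from typing import Sequence

# A product term is a string over '0', '1', '-' (one character per input):
# '1' requires the input true, '0' requires it false, '-' ignores it.
Cube = str


def cube_matches(cube: Cube, inputs: Sequence[int]) -> bool:
    """AND plane: does this product term fire for the given inputs?"""
    return all(c == "-" or int(c) == bit for c, bit in zip(cube, inputs))


def eval_pla(cubes: Sequence[Cube], or_plane: Sequence[Sequence[int]],
             inputs: Sequence[int]) -> list[int]:
    """OR plane: output j is the OR of fired terms i with or_plane[i][j] == 1."""
    fired = [cube_matches(c, inputs) for c in cubes]
    n_out = len(or_plane[0])
    return [int(any(f and row[j] for f, row in zip(fired, or_plane)))
            for j in range(n_out)]


def parity_cubes(n: int) -> list[Cube]:
    """Product terms for n-input odd parity: one minterm per odd-weight input."""
    return ["".join(map(str, bits))
            for bits in product((0, 1), repeat=n) if sum(bits) % 2 == 1]


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, len(parity_cubes(n)))  # 2**(n-1) product terms
```

By contrast, a wide AND or OR needs a single product term (or one term per input) regardless of width, which is why PLAs are fast for those functions.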

PLAs vs. LUTs?

Look at Inputs, Outputs, P-Terms

– minimum area (one study, see paper)
– K=10, N=12, M=3

A(PLA 10,12,3) comparable to 4-LUT?

– 80-130%?
– 300% on ECC (structure LUT can exploit)

Delay?

– Claim 40% fewer logic levels (general interconnect crossings)

PLA Optimization (Folding)

Conventional/Commercial FPGA

Altera 9K (from databook)

Finishing Up...

Admin

Homework 2 return
Questions about homework

Big Ideas [MSB Ideas]

Programmable Interconnect allows us to exploit that structure

– want to match to application structure

Hardwired Cascades

– key technique for reducing delay in programmables

PLAs

– canonical two-level structure
– hardwire portions to get Memories, PALs

Big Ideas [MSB-1 Ideas]

Delay

– LUT depth decreases with K

in practice closer to 1/log(K)

– Delay increases with K

small K linear + large fixed term
minimum around 5-6
Better structure match with hardwired LUT cascades