CS184a: Computer Architecture (Structures and Organization), Day 10


  1. CS184a: Computer Architecture (Structures and Organization), Day 10: October 25, 2000. Computing Elements 2: Cascades, ALUs, PLAs. Caltech CS184a Fall2000 -- DeHon
     Last Time
     • LUTs
       – area
       – structure
       – big LUTs vs. small LUTs with interconnect
       – design space
       – optimization

  2. Today
     • LUT Delay
     • LUT Cascades
     • ALUs
     • PLAs
     Delay

  3. Delay?
     • Circuit depth in LUTs?
     • “Simple function” --> M-input AND
       – 1 table lookup in an M-LUT
       – $\log_K(M)$ levels in K-LUTs
     Delay?
     • M-input “complex” function
       – 1 table lookup for an M-LUT
       – between: $\lceil (M-K)/\log_2(K) \rceil + 1$
       – and $\lceil (M-K)/\log_2(K - \log_2(K)) \rceil + 1$
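
The depth expressions on this slide can be tabulated with a short Python sketch. The function names are hypothetical, the AND case assumes a plain tree of K-LUTs, and the complex-function bounds assume the bracketed terms are ceilings; none of this is taken from the course materials.

```python
import math


def and_depth(m: int, k: int) -> int:
    """Levels of K-input LUTs needed for an M-input AND built as a tree."""
    levels, remaining = 0, m
    while remaining > 1:
        remaining = math.ceil(remaining / k)  # each LUT absorbs up to k signals
        levels += 1
    return max(levels, 1)  # even a trivial function costs one lookup


def complex_depth_bounds(m: int, k: int) -> tuple:
    """(lower, upper) depth estimates for an arbitrary M-input function."""
    if k < 3:
        raise ValueError("upper bound needs k >= 3 (log2(k - log2(k)) must be > 0)")
    if m <= k:
        return 1, 1  # fits in a single K-LUT
    lower = math.ceil((m - k) / math.log2(k)) + 1
    upper = math.ceil((m - k) / math.log2(k - math.log2(k))) + 1
    return lower, upper


if __name__ == "__main__":
    for k in (3, 4, 5, 6, 7, 8):
        print(k, and_depth(16, k), complex_depth_bounds(16, k))
```

For M=16 and K=4 this gives 2 levels for the AND and a range of 7 to 13 levels for the complex case, which shows the log-versus-linear contrast in M.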

  4. Delay
     • Simple: log M
     • Complex: linear in M
     • Both go as 1/log(K)
     Circuit Depth vs. K

  5. LUT Delay vs. K
     • For small LUTs:
       – $t_{LUT} \approx c_0 + c_1 \times K$
     • Large LUTs:
       – add length term
       – $c_2 \times \sqrt{2^K}$
     • Plus wire delay
       – ~ $\sqrt{\text{area}}$
     Delay vs. K
     • Why not satisfied with this model?
     • Delay = Depth × ($t_{LUT}$ + $t_{Interconnect}$)
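
A minimal sketch of the delay model on this slide, in Python. The constants are placeholders chosen only to make the code run; the slides give the form of the model, not values. Depth is assumed to follow the simple-function case (log base K of M), and the interconnect delay is assumed constant per level rather than scaling with the square root of area.

```python
import math

# Illustrative constants only (assumptions, not values from the slides).
C0, C1, C2 = 1.0, 0.1, 0.05
T_INTERCONNECT = 2.0


def t_lut(k: int) -> float:
    """LUT delay: fixed term + linear term + length term ~ sqrt(2^K)."""
    return C0 + C1 * k + C2 * math.sqrt(2 ** k)


def depth_estimate(m: int, k: int) -> float:
    """Depth for a simple M-input function, which goes as 1/log(K)."""
    return math.log2(m) / math.log2(k)


def total_delay(m: int, k: int) -> float:
    """Delay = Depth x (t_LUT + t_Interconnect)."""
    return depth_estimate(m, k) * (t_lut(k) + T_INTERCONNECT)


# K that minimizes the modeled delay for a 64-input function.
best_k = min(range(2, 11), key=lambda k: total_delay(64, k))

if __name__ == "__main__":
    for k in range(2, 11):
        print(k, round(t_lut(k), 3), round(total_delay(64, k), 3))
    print("best K under these assumed constants:", best_k)
```

Where the minimum lands depends entirely on the assumed constants, so the printed value should not be read as a result from the lecture.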

  6. Observation
     • General interconnect is expensive
     • “Larger” logic blocks
       – => less interconnect crossing
       – => lower interconnect delay
       – => get larger
       – => get slower
         • faster than modeled here due to area
       – => less area efficient
         • don’t match structure in computation
     Different Structure
     • How can we have “larger” compute nodes (less general interconnect) without paying the huge area penalty of large LUTs?

  7. Structure in subgraphs
     • Small LUTs capture structure
     • Structure of small LUT-mapped netlists?
     Structure
     • LUT sequences ubiquitous

  8. Hardwired Logic Blocks
     • Single output
     Hardwired Logic Blocks
     • Two outputs
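
The slide figures are not preserved here, so the following Python sketch shows only one plausible reading of a single-output hardwired block: a chain of K-LUTs in which each stage takes the previous stage's output over a dedicated link plus K-1 fresh inputs. The names `lut` and `cascade` and the truth-table encoding are assumptions made for illustration.

```python
from typing import Sequence


def lut(table: int, inputs: Sequence[int]) -> int:
    """Evaluate a LUT whose truth table is packed into an integer bit mask.

    inputs[0] is the least significant bit of the table index.
    """
    index = 0
    for position, bit in enumerate(inputs):
        index |= (bit & 1) << position
    return (table >> index) & 1


def cascade(tables: Sequence[int], inputs: Sequence[int], k: int) -> int:
    """Chain of K-LUTs joined by hardwired links instead of general interconnect.

    Stage 0 reads k inputs; every later stage reads the previous stage's
    output plus k - 1 new inputs.
    """
    value = lut(tables[0], inputs[:k])
    pos = k
    for table in tables[1:]:
        value = lut(table, [value, *inputs[pos:pos + k - 1]])
        pos += k - 1
    return value


if __name__ == "__main__":
    AND4 = 1 << 15  # 4-input AND: only index 0b1111 maps to 1
    print(cascade([AND4, AND4], [1] * 7, k=4))  # 7-input AND from two 4-LUTs
```

Two cascaded 4-LUTs cover 7 inputs with a single general-interconnect crossing into the block, which is the "larger compute node without a large LUT" trade the preceding slides ask for.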

  9. Relation to ALUs
     • How do ALUs differ?
     PLAs

  10. PLA
      PLA and Memory

  11. PLA and PAL
      PLAs
      • Fast implementations for large ANDs or ORs
      • Number of P-terms can be exponential in number of input bits
        – most complicated functions
      • Can use arrays of small PLAs
        – to exploit structure
        – like we saw arrays of small memories last time
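
As a rough illustration of the two-level structure, the Python sketch below evaluates a PLA as an AND plane feeding an OR plane, then builds the AND plane for n-input parity to show exponential P-term growth. The encoding (strings over '0', '1', '-' per product term) and the function names are assumptions, and parity is a standard textbook example rather than one taken from the slides.

```python
from itertools import product
from typing import List, Sequence


def pla(and_plane: Sequence[str], or_plane: Sequence[Sequence[int]],
        inputs: Sequence[int]) -> List[int]:
    """Evaluate a PLA.

    and_plane: one string per product term, one character per input:
               '1' = true literal, '0' = complemented literal, '-' = unused.
    or_plane:  one row per output, listing the product-term indices it ORs.
    """
    pterms = [
        all(c == "-" or int(c) == bit for c, bit in zip(term, inputs))
        for term in and_plane
    ]
    return [int(any(pterms[i] for i in row)) for row in or_plane]


def parity_and_plane(n: int) -> List[str]:
    """Product terms for n-input odd parity: every minterm with odd weight."""
    return [
        "".join(map(str, bits))
        for bits in product((0, 1), repeat=n)
        if sum(bits) % 2 == 1
    ]


if __name__ == "__main__":
    # 2-input XOR as a PLA: two product terms ORed into one output.
    print(pla(["10", "01"], [[0, 1]], [1, 0]))
    # Parity needs 2^(n-1) product terms in two-level form.
    for n in range(2, 9):
        print(n, len(parity_and_plane(n)))
```

In this encoding, fixing the AND plane to all 2^n minterms and programming only the OR plane gives a memory, while fixing the OR plane and programming only the AND plane gives a PAL.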

  12. PLAs vs. LUTs?
      • Look at Inputs, Outputs, P-Terms
        – minimum area (one study, see paper)
        – K=10, N=12, M=3
      • A(PLA 10,12,3) comparable to 4-LUT?
        – 80-130%?
        – 300% on ECC (structure LUT can exploit)
      • Delay?
        – Claim 40% fewer logic levels
          • (general interconnect crossings)
      PLA Optimization (Folding)

  13. Conventional/Commercial FPGA
      • Altera 9K (from databook)
      Conventional/Commercial FPGA
      • Altera 9K (from databook)

  14. Finishing Up...
      Admin
      • Homework 2 return
      • Questions about homework

  15. Big Ideas [MSB Ideas]
      • Programmable interconnect allows us to exploit that structure
        – want to match to application structure
      • Hardwired cascades
        – key technique for reducing delay in programmables
      • PLAs
        – canonical two-level structure
        – hardwire portions to get memories, PALs
      Big Ideas [MSB-1 Ideas]
      • Delay
        – LUT depth decreases with K
          • in practice the reduction is closer to a factor of log(K)
        – Delay increases with K
          • small K linear + large fixed term
          • minimum around 5-6
      • Better structure match with hardwired LUT cascades
