Anne Bracy, CS 3410, Computer Science, Cornell University

The slides are the product of many rounds of teaching CS 3410 by Professors Weatherspoon, Bala, Bracy, and Sirer. See P&H Appendix B.8 (register files) and B.9.


  1. Anne Bracy, CS 3410, Computer Science, Cornell University
     The slides are the product of many rounds of teaching CS 3410 by Professors Weatherspoon, Bala, Bracy, and Sirer. See P&H Appendix B.8 (register files) and B.9.

  2. A Single cycle processor
     [Figure: datapath with PC (+4), instruction memory, register file, ALU, control, immediate extend, compare (=?), offset/target for the new PC, and data memory (addr, d_in, d_out); the focus for today is highlighted.]

  3. Memory
     • Register Files
     • Tri-state devices
     • SRAM (Static RAM, i.e. static random access memory)
     • DRAM (Dynamic RAM)

  4. Register File
     • N read/write registers
     • Indexed by register number
     [Figure: 32 x 32 register file, dual-read-port and single-write-port; 32-bit data input D_W, 32-bit outputs Q_A and Q_B, 1-bit write enable W, 5-bit register selects R_W, R_A, R_B.]

  5. Recall: Register
     • D flip-flops in parallel
     • shared clock
     • extra clocked inputs: write_enable, reset, …
     [Figure: 4-bit register built from four D flip-flops (D0 to D3) sharing clk.]

  6. Register File
     • N read/write registers
     • Indexed by register number
     How to write to one register in the register file?
     • Need a decoder
     [Figure: 32-bit input D feeds Reg 0 … Reg 31; a 5-to-32 decoder driven by R_W (5 bits) and W selects the register to write. Example: addi r5, r0, 10 gives R_W = 00101.]

  7. 3-to-8 decoder
     [Figure: truth table with inputs i2 i1 i0 (rows 000 through 111) and outputs o0 to o7; example input R_W = 101 on a 3-bit select.]
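
The decoder's behavior can be sketched in a few lines of Python. This is only an illustration, not part of the slides; the function name `decode` and its `width` parameter are made up for the example. A decoder with `width` input bits raises exactly one of its 2**width outputs.

```python
def decode(index: int, width: int = 3) -> list[int]:
    """One-hot decoder: `width` input bits select one of 2**width outputs."""
    if not 0 <= index < 2 ** width:
        raise ValueError("index out of range")
    return [1 if i == index else 0 for i in range(2 ** width)]


# Print the 3-to-8 truth table: inputs i2 i1 i0, then outputs o0..o7.
for value in range(8):
    bits = format(value, "03b")
    print(" ".join(bits), "|", " ".join(str(o) for o in decode(value)))
```

Calling `decode(5, width=5)` models the 5-to-32 decoder used on the register file's write port.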

  8. Register File
     • N read/write registers
     • Indexed by register number
     How to write to one register in the register file?
     • Need a decoder
     [Figure: 32-bit input D feeds Reg 0 … Reg 31; the 5-to-32 decoder takes R_W (5 bits) and the write enable W. Example: addi r5, r0, 10.]

  9. Register File
     • N read/write registers
     • Indexed by register number
     How to read from two registers?
     • Need a multiplexor
     [Figure: Reg 0 … Reg 31 feed two muxes, selected by R_A and R_B (5 bits each), that produce the 32-bit outputs Q_A and Q_B.]

  10. Register File
      • N read/write registers
      • Indexed by register number
      Implementation:
      • D flip flops to store bits
      • Decoder for each write port
      • Mux for each read port
      [Figure: 32-bit input D, 5-to-32 decoder (R_W, W), Reg 0 … Reg 31, and two muxes (R_A, R_B) producing 32-bit Q_A and Q_B.]

  11. Register File
      • N read/write registers
      • Indexed by register number
      Implementation:
      • D flip flops to store bits
      • Decoder for each write port
      • Mux for each read port
      [Figure: 32 x 32 register file, dual-read-port and single-write-port; 32-bit D_W, Q_A, Q_B; 1-bit W; 5-bit R_W, R_A, R_B.]
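
A minimal behavioral sketch of the three-part implementation above, assuming a Python list stands in for the D flip-flops. The class `RegisterFile` and its method names are hypothetical and chosen only for this example; real hardware does all of this in parallel on a clock edge.

```python
class RegisterFile:
    """Toy model of a 32 x 32 register file: 2 read ports, 1 write port."""

    def __init__(self, num_regs: int = 32, width: int = 32):
        self.mask = (1 << width) - 1
        self.regs = [0] * num_regs          # stands in for the D flip-flops

    def read(self, r_a: int, r_b: int) -> tuple[int, int]:
        # One mux per read port: each select picks one register's value.
        return self.regs[r_a], self.regs[r_b]

    def clock(self, w: int, r_w: int, d_w: int) -> None:
        # Decoder on the write port: a one-hot enable, gated by W.
        enables = [w == 1 and i == r_w for i in range(len(self.regs))]
        for i, enabled in enumerate(enables):
            if enabled:
                self.regs[i] = d_w & self.mask


rf = RegisterFile()
rf.clock(w=1, r_w=5, d_w=10)    # e.g. the write-back of addi r5, r0, 10
rf.clock(w=0, r_w=6, d_w=99)    # W = 0, so nothing is written
print(rf.read(5, 6))            # (10, 0)
```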

  12. Register File tradeoffs
      + Very fast (a few gate delays for both read and write)
      + Adding extra ports is straightforward
      – Doesn’t scale
        e.g. 32 Mb register file with 32 bit registers
        Need 32x 1M-to-1 multiplexor and 32x 20-to-1M decoder
        How many logic gates/transistors?
      [Figure: 8-to-1 mux with inputs a to h and select bits s2 s1 s0.]
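
The numbers in the scaling example can be checked with a little arithmetic. The sketch below assumes 1 Mb = 2**20 bits and assumes each N-to-1 mux is built as a tree of 2-to-1 muxes (which takes N - 1 of them); both are assumptions made for illustration, not claims from the slides.

```python
total_bits = 32 * 2**20        # 32 Mb, taking 1 Mb = 2**20 bits (assumption)
width = 32                     # bits per register

num_regs = total_bits // width             # how many registers there are
addr_bits = num_regs.bit_length() - 1      # select bits (num_regs is a power of 2)

# Assumed construction: a tree of 2-to-1 muxes needs (inputs - 1) of them,
# and each read port needs one such tree per output bit.
mux2_per_bit = num_regs - 1
mux2_per_read_port = width * mux2_per_bit

# A decoder has one output line per register.
decoder_outputs = 2 ** addr_bits

print(num_regs, addr_bits, mux2_per_read_port, decoder_outputs)
```

Under these assumptions a single read port already needs tens of millions of 2-to-1 muxes, which is the sense in which the design does not scale.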

  13. Memory
      • CPU: Register Files (i.e. Memory w/in the CPU)
      • Scaling Memory: Tri-state devices
      • Cache: SRAM (Static RAM, i.e. static random access memory)
      • Memory: DRAM (Dynamic RAM)

  14. Need a shared bus (or shared bit line)
      • Many FlipFlops/outputs/etc. connected to single wire
      • Only one output drives the bus at a time
      • How do we build such a device?
      [Figure: data inputs D0 … D1023, each with a select S0 … S1023, all connected to one shared line.]

  15. Tri-State Buffers
      • If enabled (E=1), then Q = D
      • Otherwise, Q is not connected (z = high impedance)

      E  D | Q
      0  0 | z
      0  1 | z
      1  0 | 0
      1  1 | 1

  16. [Figure: D0 … D1023 each drive the shared line through a tri-state buffer enabled by S0 … S1023.]
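
A shared line driven through tri-state buffers can be modeled roughly as follows. The names `tri_state` and `shared_line` are invented for this sketch, and raising an error on contention is a modeling choice; in real hardware two enabled drivers would fight electrically rather than raise anything.

```python
Z = "z"  # high impedance: the buffer is not driving the line


def tri_state(e: int, d: int):
    """Q = D when enabled (E = 1); otherwise Q floats (z)."""
    return d if e == 1 else Z


def shared_line(data: list[int], select: list[int]):
    """Resolve a line driven by many tri-state buffers."""
    driven = [q for q in map(tri_state, select, data) if q != Z]
    if len(driven) > 1:
        raise ValueError("bus contention: more than one driver enabled")
    return driven[0] if driven else Z


data = [0] * 1024
select = [0] * 1024
data[3], select[3] = 1, 1          # only S3 is enabled
print(shared_line(data, select))   # 1
```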

  17. Register files are very fast storage (only a few gate delays), but do not scale to large memory sizes. Tri-state buffers allow scaling, since multiple registers can be connected to a single output while only one register actually drives the output.

  18. Memory
      • CPU: Register Files (i.e. Memory w/in the CPU)
      • Scaling Memory: Tri-state devices
      • Cache: SRAM (Static RAM, i.e. static random access memory)
      • Memory: DRAM (Dynamic RAM)

  19. • Storage Cells plus Tri-State Buffers
      • Inputs: Address, Data (for writes)
      • Outputs: Data (for reads)
      • Also need R/W signal (not shown)
      • N address bits → 2^N words total
      • M data bits → each word M bits
      [Figure: memory block with an N-bit Address input and an M-bit Data port.]

  20. • Storage Cells plus Tri-State Buffers
      • Decoder selects a word line
      • R/W selector determines access type
      • Word line is then coupled to the data lines
      [Figure: Address feeds a decoder that drives the word lines; R/W control; data lines.]

  21. E.g. How do we design a 4 x 2 Memory Module (i.e. 4 word lines that are each 2 bits wide)?
      [Figure: 4 x 2 memory. A 2-bit Address feeds a 2-to-4 decoder whose outputs 0 to 3 enable one row; each row is two D flip-flops (D, Q, enable). Inputs: D_in[1], D_in[2], Write Enable, Output Enable. Outputs: D_out[1], D_out[2].]

  22. E.g. How do we design a 4 x 2 Memory Module (i.e. 4 word lines that are each 2 bits wide)?
      [Figure: same 4 x 2 memory, drawn with storage cells: 2-bit Address, 2-to-4 decoder, enables 0 to 3, D_in[1], D_in[2], Write Enable, Output Enable, D_out[1], D_out[2].]

  23. E.g. How do we design a 4 x 2 Memory Module (i.e. 4 word lines that are each 2 bits wide)?
      [Figure: same 4 x 2 memory with the bit lines labeled.]

  24. E.g. How do we design a 4 x 2 Memory Module (i.e. 4 word lines that are each 2 bits wide)?
      [Figure: same 4 x 2 memory with the word lines labeled.]
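
The 4 x 2 module can be sketched behaviorally as below. This is a simplified model under stated assumptions: the class `Memory4x2` and its `access` method are hypothetical names, the cells are plain Python lists rather than flip-flops, and a disabled output is represented by the string "z".

```python
class Memory4x2:
    """Toy 4 x 2 memory: 4 word lines, each word 2 bits wide."""

    def __init__(self):
        self.cells = [[0, 0] for _ in range(4)]   # stand-ins for storage cells

    def access(self, address, d_in=(0, 0), write_enable=0, output_enable=0):
        # 2-to-4 decoder: exactly one word line is raised.
        word_lines = [int(i == address) for i in range(4)]
        row = word_lines.index(1)
        if write_enable:
            self.cells[row] = list(d_in)          # D_in goes onto the bit lines
        # Tri-state outputs: D_out is driven only when Output Enable is set.
        return tuple(self.cells[row]) if output_enable else ("z", "z")


mem = Memory4x2()
mem.access(2, d_in=(1, 0), write_enable=1)
print(mem.access(2, output_enable=1))   # (1, 0)
print(mem.access(1, output_enable=1))   # (0, 0)
```

With 2 address bits there are 2^2 = 4 words; widening the address by one bit doubles the number of word lines the decoder must drive.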

  25. Typical SRAM Cell
      Each cell stores one bit, and requires 4 – 8 transistors (6 is typical)
      [Figure: SRAM cell; the word line gates the pass-through transistors that connect the cell to the bit lines, labeled B and B-bar.]

  26. SRAM
      • A few transistors (~6) per cell
      • Used for working memory (caches)
      • But for even higher density…

  27. Dynamic-RAM (DRAM)
      • Data values require constant refresh
      Each cell stores one bit, and requires 1 transistor
      [Figure: DRAM cell with bit line, word line, and a capacitor tied to Gnd.]

  28. Dynamic-RAM (DRAM)
      • Data values require constant refresh
      Each cell stores one bit, and requires 1 transistor
      [Figure: DRAM cell with bit line, word line, pass-through transistor, and a capacitor tied to Gnd.]

  29. Single transistor vs. many gates
      • Denser, cheaper ($30/1GB vs. $30/2MB)
      • But more complicated, and has analog sensing
      Also needs refresh
      • Read and write back…
      • …every few milliseconds
      • Organized in 2D grid, so can do rows at a time
      • Chip can do refresh internally
      Hence… slower and energy inefficient
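
Why refresh is needed can be illustrated with a toy leaky-cell model. Everything numeric here is an assumption picked for the demo (the leak rate, the sense threshold, the 4 ms refresh interval), and `DramCell` is a hypothetical class, not a description of any real part.

```python
class DramCell:
    LEAK_PER_MS = 0.9      # assumed: fraction of charge left after 1 ms
    THRESHOLD = 0.5        # assumed: sense level separating 1 from 0

    def __init__(self):
        self.charge = 0.0

    def write(self, bit: int) -> None:
        self.charge = 1.0 if bit else 0.0

    def tick(self, ms: int = 1) -> None:
        self.charge *= self.LEAK_PER_MS ** ms    # the capacitor leaks over time

    def read(self) -> int:
        bit = int(self.charge > self.THRESHOLD)  # analog sensing
        self.write(bit)                          # write the value back
        return bit

    refresh = read   # a refresh is a read followed by the write-back


lost, kept = DramCell(), DramCell()
lost.write(1)
kept.write(1)
for ms in range(1, 21):
    lost.tick()
    kept.tick()
    if ms % 4 == 0:          # assumed refresh interval
        kept.refresh()
print(lost.read(), kept.read())   # 0 1
```

The unrefreshed cell decays below the threshold and reads back as 0, while the periodically refreshed cell still holds its 1.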

  30. Register File tradeoffs
      + Very fast (a few gate delays for both read and write)
      + Adding extra ports is straightforward
      – Expensive, doesn’t scale
      – Volatile
      Volatile Memory alternatives: SRAM, DRAM, …
      – Slower
      + Cheaper, and scales well
      – Volatile
      Non-Volatile Memory (NV-RAM): Flash, EEPROM, …
      + Scales well
      – Limited lifetime; degrades after 100,000 to 1M writes

  31. We finally have the building blocks to build machines that can perform non-trivial computational tasks
      • Register File: Tens of words of working memory
      • SRAM: Millions of words of working memory
      • DRAM: Billions of words of working memory
      • NVRAM: long term storage (usb fob, solid state disks, BIOS, …)
      Next time we will build a simple processor!
