Anne Bracy, CS 3410, Computer Science, Cornell University. The slides are the product of many rounds of teaching CS 3410 by Professors Weatherspoon, Bala, Bracy, and Sirer. See P&H Appendix B.8 (register files) and B.9.
[Figure: A single-cycle processor (PC, +4 adders, instruction memory, register file, ALU, data memory, immediate extend, compare, control). Focus for today: the register file and memory.]
Memory
• Register Files
• Tri-state devices
• SRAM (Static RAM, i.e., static random-access memory)
• DRAM (Dynamic RAM)
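Register File
• N read/write registers
• Indexed by register number
[Figure: a 32 x 32 register file with dual read ports and a single write port. Inputs: 32-bit write data D_W, 1-bit write enable W, and 5-bit register numbers R_W, R_A, R_B. Outputs: 32-bit read values Q_A and Q_B.]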
Recall: Register
• D flip-flops in parallel
• Shared clock
• Extra clocked inputs: write_enable, reset, …
[Figure: a 4-bit register built from four D flip-flops (D0 to D3) sharing one clock; 4-bit input, 4-bit output.]
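A minimal behavioral sketch of the 4-bit register above, written in Python. The class name `Register`, the method `clock_edge`, and the choice to make reset synchronous and higher priority than write_enable are assumptions for illustration, not details from the slide.

```python
from typing import List


class Register:
    """Behavioral model of an n-bit register: n D flip-flops sharing one clock."""

    def __init__(self, width: int = 4) -> None:
        self.width = width
        self.q: List[int] = [0] * width  # stored bits; q[0] holds bit D0

    def clock_edge(self, d: List[int], write_enable: int, reset: int = 0) -> List[int]:
        """Model one rising clock edge and return the new outputs Q."""
        if len(d) != self.width:
            raise ValueError("input width does not match register width")
        if reset:
            # Assumption: reset is synchronous and overrides write_enable.
            self.q = [0] * self.width
        elif write_enable:
            # Every flip-flop captures its own D input on the same shared edge.
            self.q = list(d)
        # Otherwise the flip-flops hold their previous values.
        return list(self.q)


reg = Register(4)
reg.clock_edge([1, 0, 1, 1], write_enable=1)
print(reg.clock_edge([0, 0, 0, 0], write_enable=0))  # value is held: [1, 0, 1, 1]
```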
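Register File
• N read/write registers
• Indexed by register number
How do we write to one register in the register file?
• Need a decoder
[Figure: 32-bit write data D_W fans out to Reg 0 … Reg 31; a 5-to-32 decoder driven by the 5-bit R_W selects which register is written. Example: addi r5, r0, 10 writes register 5, so R_W = 00101.]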
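3-to-8 decoder (output o_k is 1 exactly when the input i2 i1 i0 encodes k in binary):
i2 i1 i0 | o0 o1 o2 o3 o4 o5 o6 o7
 0  0  0 |  1  0  0  0  0  0  0  0
 0  0  1 |  0  1  0  0  0  0  0  0
 0  1  0 |  0  0  1  0  0  0  0  0
 0  1  1 |  0  0  0  1  0  0  0  0
 1  0  0 |  0  0  0  0  1  0  0  0
 1  0  1 |  0  0  0  0  0  1  0  0
 1  1  0 |  0  0  0  0  0  0  1  0
 1  1  1 |  0  0  0  0  0  0  0  1
Example: a 3-bit R_W = 101 drives only o5.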
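The decoder can be sketched at the gate level in Python: each output is the AND of every input bit or its complement, so exactly one output is 1. The function name `decoder` and the optional `enable` input (standing in for gating by a write-enable signal) are hypothetical.

```python
from typing import List


def decoder(inputs: List[int], enable: int = 1) -> List[int]:
    """Gate-level n-to-2^n decoder; inputs[0] is the most significant bit.

    Output k is the AND of each input bit (or its complement) matching the
    binary digits of k, so exactly one output is 1 when enable = 1.
    """
    n = len(inputs)
    outputs = []
    for k in range(2 ** n):
        out = enable
        for pos, bit in enumerate(inputs):
            wanted = (k >> (n - 1 - pos)) & 1
            out &= bit if wanted else 1 - bit
        outputs.append(out)
    return outputs


# Truth table of the 3-to-8 decoder: inputs i2 i1 i0, outputs o0..o7.
for value in range(8):
    bits = [(value >> 2) & 1, (value >> 1) & 1, value & 1]
    print(*bits, "|", *decoder(bits))
```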
Register File
• N read/write registers
• Indexed by register number
How do we read from two registers?
• Need a multiplexor
[Figure: Reg 0 … Reg 31 feed two 32-bit muxes; the 5-bit R_A selects output Q_A and the 5-bit R_B selects output Q_B.]
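A sketch of the read side in Python, assuming each large mux is built as a tree of 2-to-1 muxes and that a 32-bit read port is 32 one-bit muxes sharing the same select lines. The names `mux2`, `mux`, and `read_port` are hypothetical.

```python
from typing import List, Sequence


def mux2(a: int, b: int, s: int) -> int:
    """2-to-1 mux in gates: (a AND NOT s) OR (b AND s)."""
    return (a & (1 - s)) | (b & s)


def mux(inputs: Sequence[int], select: List[int]) -> int:
    """2^n-to-1 mux as a tree of mux2s; select is [s_(n-1), ..., s1, s0]."""
    if len(inputs) != 2 ** len(select):
        raise ValueError("need exactly 2^n inputs for n select bits")
    level = list(inputs)
    for s in reversed(select):  # s0 chooses within adjacent pairs first
        level = [mux2(level[i], level[i + 1], s) for i in range(0, len(level), 2)]
    return level[0]


def read_port(registers: Sequence[int], reg_num: int, width: int = 32) -> int:
    """One read port: `width` one-bit muxes that all share the same select lines."""
    if not 0 <= reg_num < len(registers):
        raise ValueError("register number out of range")
    sel_bits = (len(registers) - 1).bit_length()  # 5 select bits for 32 registers
    select = [(reg_num >> (sel_bits - 1 - i)) & 1 for i in range(sel_bits)]
    value = 0
    for bit in range(width):
        column = [(r >> bit) & 1 for r in registers]  # this bit of every register
        value |= mux(column, select) << bit
    return value


regs = [3 * i for i in range(32)]  # hypothetical register contents
q_a, q_b = read_port(regs, 5), read_port(regs, 31)  # two ports, independent selects
print(q_a, q_b)
```

Each additional read port is simply another set of 32 muxes fed by the same registers, which is why adding read ports is straightforward.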
Register File
• N read/write registers
• Indexed by register number
Implementation:
• D flip-flops to store bits
• Decoder for each write port
• Mux for each read port
[Figure: a 5-to-32 decoder on R_W, together with write enable W, selects which of Reg 0 … Reg 31 captures the 32-bit D_W; two muxes driven by the 5-bit R_A and R_B produce the 32-bit outputs Q_A and Q_B.]
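Putting the pieces together, here is a behavioral Python sketch of the 32 x 32 dual-read-port, single-write-port register file. Registers are modeled as Python integers rather than individual flip-flops, and read ports use list indexing in place of gate-level muxes; the class and method names are hypothetical.

```python
from typing import List, Tuple


class RegisterFile:
    """Behavioral sketch of a 32 x 32 register file (dual read port, single write port)."""

    NUM_REGS = 32
    MASK = (1 << 32) - 1  # keep stored values to 32 bits

    def __init__(self) -> None:
        self.regs: List[int] = [0] * self.NUM_REGS  # stands in for 32 x 32 D flip-flops

    def _check(self, r: int) -> None:
        if not 0 <= r < self.NUM_REGS:
            raise ValueError(f"register number {r} is outside 0..31")

    def _decode(self, r_w: int, w: int) -> List[int]:
        # 5-to-32 decoder gated by write enable W: one-hot, or all zeros if W = 0.
        return [1 if (w and i == r_w) else 0 for i in range(self.NUM_REGS)]

    def read(self, r_a: int, r_b: int) -> Tuple[int, int]:
        # Two independent read ports (one mux each): R_A -> Q_A, R_B -> Q_B.
        self._check(r_a)
        self._check(r_b)
        return self.regs[r_a], self.regs[r_b]

    def clock_edge(self, w: int, r_w: int, d_w: int) -> None:
        # On the clock edge, only the register whose decoder output is 1 captures D_W.
        self._check(r_w)
        for i, enabled in enumerate(self._decode(r_w, w)):
            if enabled:
                self.regs[i] = d_w & self.MASK


rf = RegisterFile()
q_a, _ = rf.read(0, 0)
rf.clock_edge(w=1, r_w=5, d_w=q_a + 10)  # addi r5, r0, 10, assuming r0 holds 0
print(rf.read(5, 0))  # (10, 0)
```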
Register File tradeoffs
+ Very fast (a few gate delays for both read and write)
+ Adding extra ports is straightforward
– Doesn't scale
e.g. a 32 Mb register file with 32-bit registers
Need 32x 1M-to-1 multiplexor and 32x 20-to-1M decoder
How many logic gates/transistors?
[Figure: an 8-to-1 mux with data inputs a to h and select lines s2, s1, s0.]
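The scaling example can be checked with a little arithmetic. This Python sketch assumes a single read port and that each large mux is a tree of 2-to-1 muxes (an N-to-1 tree needs N - 1 of them); `register_file_cost` is a hypothetical helper.

```python
import math


def register_file_cost(total_bits: int, reg_width: int) -> dict:
    """Rough component sizes for a flip-flop register file with one read port.

    Assumes each num_regs-to-1 mux is a tree of 2-to-1 muxes, which needs
    num_regs - 1 of them.
    """
    num_regs = total_bits // reg_width
    sel_bits = math.ceil(math.log2(num_regs))
    return {
        "registers": num_regs,
        "select_bits": sel_bits,
        "read_muxes": reg_width,  # one num_regs-to-1 mux per data bit
        "mux_size": f"{num_regs}-to-1",
        "decoder_size": f"{sel_bits}-to-{num_regs}",
        "mux2_per_read_port": reg_width * (num_regs - 1),
    }


# 32 Mb of storage organized as 32-bit registers (the example above).
for key, value in register_file_cost(32 * 2**20, 32).items():
    print(f"{key}: {value}")
```

With 2^20 registers, a single read port already needs about 33.5 million 2-to-1 muxes, before counting the flip-flops themselves.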
Memory
• CPU: Register Files (i.e., memory within the CPU)
• Scaling Memory: Tri-state devices
• Cache: SRAM (Static RAM)
• Memory: DRAM (Dynamic RAM)
Need a shared bus (or shared bit line)
• Many flip-flops/outputs/etc. connected to a single wire
• Only one output drives the bus at a time
[Figure: outputs D0 … D1023, each with a select signal S0 … S1023, all connected to one shared line.]
• How do we build such a device?
Tri-State Buffers
• If enabled (E=1), then Q = D
• Otherwise, Q is not connected (z = high impedance)
E D | Q
0 0 | z
0 1 | z
1 0 | 0
1 1 | 1
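The truth table above as a small Python model; representing the high-impedance state with the string `"z"` is an assumption of this sketch, and `tri_state` is a hypothetical name.

```python
from typing import Union

Z = "z"  # high impedance: the output is effectively disconnected


def tri_state(e: int, d: int) -> Union[int, str]:
    """Tri-state buffer: Q = D when E = 1, otherwise Q is not driven."""
    return d if e else Z


print("E D | Q")
for e in (0, 1):
    for d in (0, 1):
        print(e, d, "|", tri_state(e, d))
```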
[Figure: D0 … D1023 each connect to the shared line through a tri-state buffer enabled by its select signal S0 … S1023.]
Register files are very fast storage (only a few gate delays), but they do not scale to large memory sizes. Tri-state buffers allow scaling, since multiple registers can be connected to a single output while only one register actually drives it.
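A Python sketch of the shared line: 1024 tri-state buffers feed one wire, and one-hot selects (as a decoder would produce them) ensure only one drives it. Treating two simultaneous drivers as an error is an assumption here, and the helper names are hypothetical.

```python
from typing import List, Union

Z = "z"  # high impedance
Signal = Union[int, str]


def tri_state(e: int, d: int) -> Signal:
    return d if e else Z


def resolve_bus(driven: List[Signal]) -> Signal:
    """Value on a shared line, given the output of every tri-state buffer on it."""
    active = [v for v in driven if v != Z]
    if not active:
        return Z  # nobody drives the line, so it floats
    if len(active) > 1:
        # Assumption: any two simultaneous drivers count as bus contention.
        raise RuntimeError("bus contention: more than one output enabled")
    return active[0]


def read_shared_line(data: List[int], selected: int) -> Signal:
    # One-hot selects S0..S(n-1): only the chosen buffer is enabled.
    selects = [1 if i == selected else 0 for i in range(len(data))]
    return resolve_bus([tri_state(s, d) for s, d in zip(selects, data)])


data = [i % 2 for i in range(1024)]  # D0..D1023: hypothetical stored bits
print(read_shared_line(data, 3))
```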
• Storage cells plus tri-state buffers
• Inputs: Address, Data (for writes)
• Outputs: Data (for reads)
• Also need R/W signal (not shown)
• N address bits → 2^N words total
• M data bits → each word is M bits
[Figure: a memory block with an N-bit Address input and an M-bit Data port.]
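A quick Python check of how address and data widths set a memory's shape: N address bits give 2^N words, and a memory of W words needs ceil(log2 W) address bits. The helper names are hypothetical.

```python
def memory_shape(n_address_bits: int, m_data_bits: int) -> tuple:
    """(words, total bits) for N address bits and M-bit words."""
    words = 2 ** n_address_bits  # each address selects one word
    return words, words * m_data_bits


def address_bits_needed(words: int) -> int:
    """Smallest N with 2^N >= words, i.e. ceil(log2(words)) for words >= 1."""
    if words < 1:
        raise ValueError("need at least one word")
    return (words - 1).bit_length()


print(memory_shape(2, 2))         # a 4 x 2 memory: (4, 8)
print(memory_shape(20, 32))       # (1048576, 33554432)
print(address_bits_needed(1000))  # 10
```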
• Storage cells plus tri-state buffers
• Decoder selects a word line
• R/W selector determines access type
• Word line is then coupled to the data lines
[Figure: the Address feeds a decoder that drives the word lines; the R/W signal controls access to the data lines.]
E.g., how do we design a 4 x 2 memory module (i.e., 4 word lines that are each 2 bits wide)?
[Figure: a 4 x 2 memory. A 2-bit Address drives a 2-to-4 decoder whose outputs 0 to 3 each enable one row of two D flip-flops. D_in[1] and D_in[2] feed every row under Write Enable; the selected row drives D_out[1] and D_out[2] under Output Enable.]
In this 4 x 2 module, the column wires that carry D_in[1], D_in[2], D_out[1], and D_out[2] are the bit lines; the row wires driven by the decoder outputs 0 to 3 are the word lines.
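A behavioral Python sketch of the 4 x 2 module, assuming the outputs show the word currently stored and that a write takes effect at the end of the access (as flip-flops would capture it on the clock edge). The class name, method names, and the `"z"` marker for undriven outputs are hypothetical.

```python
from typing import List, Union

Z = "z"  # marker for an output that is not driven


class Memory4x2:
    """Behavioral sketch of a 4 x 2 memory: 4 words (rows) of 2 bits each."""

    def __init__(self) -> None:
        self.rows: List[List[int]] = [[0, 0] for _ in range(4)]  # 4 rows of 2 D flip-flops

    @staticmethod
    def decode(address: List[int]) -> List[int]:
        # 2-to-4 decoder driving the word lines; address = [a1, a0], a1 most significant.
        value = 2 * address[0] + address[1]
        return [1 if row == value else 0 for row in range(4)]

    def access(self, address: List[int], d_in: List[int],
               write_enable: int, output_enable: int) -> List[Union[int, str]]:
        """One access; returns [D_out[1], D_out[2]] as seen before the clock edge."""
        d_out: List[Union[int, str]] = [Z, Z]
        for row, selected in enumerate(self.decode(address)):
            if selected and output_enable:
                d_out = list(self.rows[row])  # selected row drives the bit lines
            if selected and write_enable:
                self.rows[row] = list(d_in)  # selected row captures D_in on the edge
        return d_out


mem = Memory4x2()
mem.access([1, 0], [1, 1], write_enable=1, output_enable=0)  # write 11 into word 2
print(mem.access([1, 0], [0, 0], write_enable=0, output_enable=1))  # [1, 1]
```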
What's your familiarity with memory (SRAM, DRAM)?
A. I've never heard of any of this.
B. I've heard the words SRAM and DRAM, but I have no idea what they are.
C. I know that DRAM means main memory.
D. I know the difference between SRAM and DRAM and where they are used in a computer system.
Typical SRAM Cell
[Figure: an SRAM cell connected to bit line B and its complement (B-bar) through pass-through transistors controlled by the word line.]
Each cell stores one bit and requires 4 to 8 transistors (6 is typical).
SRAM
• A few transistors (~6) per cell
• Used for working memory (caches)
• But for even higher density…
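Dynamic RAM (DRAM)
• Data values require constant refresh
[Figure: the word line gates a single pass-through transistor that connects the bit line to a capacitor tied to ground.]
Each cell stores one bit and requires 1 transistor (plus a capacitor).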
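To see why the one-transistor cell matters for density, this Python sketch counts transistors in the storage cells alone, using the 6-transistor SRAM cell and 1-transistor DRAM cell from the slides. It ignores decoders, sense amplifiers, and other peripheral circuitry; the 1 MiB capacity and the helper name are arbitrary examples.

```python
def storage_transistors(capacity_bytes: int, transistors_per_cell: int) -> int:
    """Transistors in the storage cells alone (one cell per bit).

    Ignores decoders, sense amplifiers, and other peripheral circuitry.
    """
    return capacity_bytes * 8 * transistors_per_cell


capacity = 2**20  # 1 MiB, a hypothetical example size
sram = storage_transistors(capacity, 6)  # typical 6-transistor SRAM cell
dram = storage_transistors(capacity, 1)  # 1 transistor (plus a capacitor) per DRAM cell
print(f"SRAM: {sram:,} transistors, DRAM: {dram:,} transistors ({sram // dram}x)")
```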
Single transistor vs. many gates
• Denser, cheaper ($30/1GB vs. $30/2MB)
• But more complicated, and has analog sensing
Also needs refresh
• Read and write back…
• …every few milliseconds
• Organized in a 2D grid, so it can refresh a whole row at a time
• Chip can do refresh internally
Hence… slower and less energy-efficient
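Refreshing a row at a time keeps the number of refresh operations manageable. This Python sketch spreads the row refreshes evenly over one retention window; the 64 ms window and the 8192 x 8192 array are hypothetical example parameters, not figures from the slides, and `refresh_schedule` is a hypothetical helper.

```python
def refresh_schedule(retention_ms: float, rows: int, cols: int):
    """Row-at-a-time refresh: every row is read and written back once per
    retention window, so the row refreshes are spread evenly across it."""
    interval_us = retention_ms * 1000 / rows  # time between successive row refreshes
    ops_row_at_a_time = rows                  # one operation refreshes a whole row
    ops_cell_at_a_time = rows * cols          # what per-cell refresh would cost
    return interval_us, ops_row_at_a_time, ops_cell_at_a_time


# Hypothetical parameters: 64 ms retention window, 8192 rows x 8192 columns.
interval, row_ops, cell_ops = refresh_schedule(64, 8192, 8192)
print(f"refresh one row every {interval} us")
print(f"{row_ops:,} row refreshes vs. {cell_ops:,} cell refreshes per window")
```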
Register File tradeoffs
+ Very fast (a few gate delays for both read and write)
+ Adding extra ports is straightforward
– Expensive, doesn't scale
– Volatile
Volatile Memory alternatives: SRAM, DRAM, …
– Slower
+ Cheaper, and scales well
– Volatile
Non-Volatile Memory (NV-RAM): Flash, EEPROM, …
+ Scales well
– Limited lifetime; degrades after 100,000 to 1M writes
We finally have the building blocks to build machines that can perform non-trivial computational tasks:
• Register File: tens of words of working memory
• SRAM: millions of words of working memory
• DRAM: billions of words of working memory
• NVRAM: long-term storage (USB fob, solid-state disks, BIOS, …)
Next time we will build a simple processor!