Increasing Bandwidth - Interleaving Main Memory Performance



  1. EECS 252 Graduate Computer Architecture
     Lec 23 – Storage Technology
     David Culler
     Electrical Engineering and Computer Sciences
     University of California, Berkeley
     http://www.eecs.berkeley.edu/~culler
     http://www-inst.eecs.berkeley.edu/~cs252

     Classical DRAM Organization (square)
     [Figure: RAM Cell Array. Labels: bit (data) lines, word (row) select, row decoder, row address, Column Selector & I/O Circuits, Column Address, data. Each intersection represents a 1-T DRAM Cell.]
     • Row and Column Address together:
       – Select 1 bit at a time

     Review: 1-T Memory Cell (DRAM)
     [Figure: 1-T cell. Labels: row select, bit]
     • Write:
       – 1. Drive bit line
       – 2. Select row
     • Read:
       – 1. Precharge bit line to Vdd/2
       – 2. Select row
       – 3. Cell and bit line share charges
         » Very small voltage changes on the bit line
       – 4. Sense (fancy sense amp)
         » Can detect changes of ~1 million electrons
       – 5. Write: restore the value
     • Refresh
       – 1. Just do a dummy read to every cell.

     DRAM Capacitors: more capacitance in a small area
     • Trench capacitors:
       – Logic ABOVE capacitor
       – Gain in surface area of capacitor
       – Better Scaling properties
       – Better Planarization
     • Stacked capacitors
       – Logic BELOW capacitor
       – Gain in surface area of capacitor
       – 2-dim cross-section quite small

     DRAM Read Timing
     [Figure: 256K x 8 DRAM with inputs RAS_L, CAS_L, WE_L, OE_L, a 9-bit address A and an 8-bit data bus D]
     • Every DRAM access begins at:
       – The assertion of the RAS_L
       – 2 ways to read: early or late v. CAS
     [Figure: timing diagram over one DRAM Read Cycle Time. Signals: RAS_L; CAS_L; A (Row Address, Col Address, Junk, Row Address, Col Address, Junk); WE_L; OE_L; D (High Z, Junk, Data Out, High Z, Data Out). Marked intervals: Read Access Time, Output Enable Delay.]
     • Early Read Cycle: OE_L asserted before CAS_L
     • Late Read Cycle: OE_L asserted after CAS_L

     4 Key DRAM Timing Parameters
     • t_RAC: minimum time from RAS line falling to the valid data output.
       – Quoted as the speed of a DRAM when you buy
       – A typical 4Mb DRAM t_RAC = 60 ns
       – Speed of DRAM since on purchase sheet?
     • t_RC: minimum time from the start of one row access to the start of the next.
       – t_RC = 110 ns for a 4Mbit DRAM with a t_RAC of 60 ns
     • t_CAC: minimum time from CAS line falling to valid data output.
       – 15 ns for a 4Mbit DRAM with a t_RAC of 60 ns
     • t_PC: minimum time from the start of one column access to the start of the next.
       – 35 ns for a 4Mbit DRAM with a t_RAC of 60 ns
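The parameters on the "4 Key DRAM Timing Parameters" slide bound how often a new access can start. As a rough sketch in C (the helper name `accesses_per_second` is made up for illustration and does not come from the lecture), the peak access rate is the reciprocal of the minimum period between access starts:

```c
#include <assert.h>

/* Hypothetical helper: peak number of accesses per second when a new
 * access may start at most once every period_ns nanoseconds. */
double accesses_per_second(double period_ns)
{
    assert(period_ns > 0.0);
    return 1e9 / period_ns;
}
```

With the slide's 4 Mbit figures, `accesses_per_second(110.0)` (t_RC) works out to about 9.1 million row accesses per second, while `accesses_per_second(35.0)` (t_PC) works out to about 28.6 million column accesses per second, roughly a 3x difference.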

  2. Main Memory Performance
     [Figure: timeline marking Access Time and Cycle Time]
     • DRAM (Read/Write) Cycle Time >> DRAM (Read/Write) Access Time
       – ≈ 2:1; why?
     • DRAM (Read/Write) Cycle Time:
       – How frequently can you initiate an access?
       – Analogy: A little kid can only ask his father for money on Saturday
     • DRAM (Read/Write) Access Time:
       – How quickly will you get what you want once you initiate an access?
       – Analogy: As soon as he asks, his father will give him the money
     • DRAM Bandwidth Limitation analogy:
       – What happens if he runs out of money on Wednesday?

     Increasing Bandwidth - Interleaving
     • Access Pattern without Interleaving (CPU, Memory):
       – Start Access for D1; D1 available; Start Access for D2
     • Access Pattern with 4-way Interleaving (CPU; Memory Bank 0, Bank 1, Bank 2, Bank 3):
       – Access Bank 0; Access Bank 1; Access Bank 2; Access Bank 3
       – We can Access Bank 0 again

     Main Memory Performance
     • Simple:
       – CPU, Cache, Bus, Memory same width (32 bits)
     • Wide:
       – CPU/Mux 1 word; Mux/Cache, Bus, Memory N words (Alpha: 64 bits & 256 bits)
     • Interleaved:
       – CPU, Cache, Bus 1 word: Memory N Modules (4 Modules); example is word interleaved

     Main Memory Performance
     • Timing model
       – 1 to send address,
       – 4 for access time, 10 cycle time, 1 to send data
       – Cache Block is 4 words
     • Simple M.P. = 4 x (1+10+1) = 48
     • Wide M.P. = 1 + 10 + 1 = 12
     • Interleaved M.P. = 1+10+1 + 3 = 15

       address   address   address   address
          0         1         2         3
          4         5         6         7
          8         9        10        11
         12        13        14        15
       Bank 0    Bank 1    Bank 2    Bank 3

     Avoiding Bank Conflicts
     • Lots of banks
         int x[256][512];
         for (j = 0; j < 512; j = j+1)
           for (i = 0; i < 256; i = i+1)
             x[i][j] = 2 * x[i][j];
     • Even with 128 banks, since 512 is multiple of 128, conflict on word accesses
     • SW: loop interchange or declaring array not power of 2 (“array padding”)
     • HW: Prime number of banks
       – bank number = address mod number of banks
       – address within bank = address / number of words in bank
       – modulo & divide per memory access with prime no. banks?

     Finding Bank Number and Address within a bank
     Problem: We want to determine the number of banks, N_b, to use and the number of words to store in each bank, W_b, such that:
     • given a word address x, it is easy to find the bank where x will be found, B(x), and the address of x within the bank, A(x).
     • for any address x, B(x) and A(x) are unique.
     • the number of bank conflicts is minimized
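The miss-penalty arithmetic from the timing-model slide can be written out as three small C functions. This is only a sketch: the function and parameter names are invented here, and the interleaved case assumes there are at least as many banks as words in the block, so every bank's memory time overlaps.

```c
#include <assert.h>

/* All times are in clock cycles; words is the cache block size in words.
 * Names are illustrative, not taken from the slides. */

/* Simple: each word pays the full address + memory + transfer sequence. */
int miss_penalty_simple(int words, int addr, int mem, int xfer)
{
    assert(words > 0);
    return words * (addr + mem + xfer);
}

/* Wide: the whole block moves in a single sequence. */
int miss_penalty_wide(int addr, int mem, int xfer)
{
    return addr + mem + xfer;
}

/* Interleaved: the banks overlap their memory time, so after the first
 * word each remaining word costs only one more transfer.
 * Assumes at least `words` banks. */
int miss_penalty_interleaved(int words, int addr, int mem, int xfer)
{
    assert(words > 0);
    return addr + mem + xfer + (words - 1) * xfer;
}
```

Calling these with the slide's numbers (4 words, 1 cycle to send the address, 10 cycles of memory time, 1 cycle to send data) reproduces 48, 12 and 15 cycles respectively.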
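To see why the loop on the "Avoiding Bank Conflicts" slide conflicts, it may help to count how many distinct banks one pass of its inner loop touches. The sketch below assumes word-interleaved banks (bank = word address mod number of banks), a row-major array starting at word address 0, and helper names that are made up for this example. It also shows the loop-interchange fix the slide mentions.

```c
#include <assert.h>
#include <stddef.h>

/* Bank holding word x[i][j] of a row-major array with ncols words per
 * row. Assumes bank = word address mod nbanks and base address 0. */
size_t bank_of(size_t i, size_t j, size_t ncols, size_t nbanks)
{
    assert(nbanks > 0);
    return (i * ncols + j) % nbanks;
}

/* Number of distinct banks touched while walking down column j
 * (i = 0 .. nrows-1), which is what the slide's inner loop does. */
size_t banks_touched_by_column(size_t j, size_t nrows,
                               size_t ncols, size_t nbanks)
{
    size_t count = 0;
    for (size_t b = 0; b < nbanks; b++) {
        for (size_t i = 0; i < nrows; i++) {
            if (bank_of(i, j, ncols, nbanks) == b) {
                count++;
                break;
            }
        }
    }
    return count;
}

/* Loop interchange: i outer, j inner, so consecutive accesses go to
 * consecutive words and therefore to consecutive banks. */
void scale_rows_first(int x[256][512])
{
    for (int i = 0; i < 256; i = i + 1)
        for (int j = 0; j < 512; j = j + 1)
            x[i][j] = 2 * x[i][j];
}
```

Under these assumptions, `banks_touched_by_column(0, 256, 512, 128)` is 1: every access in a column lands in the same bank. Padding each row to 513 words spreads the same walk over all 128 banks, and using a prime bank count of 127 with the original 512-word rows spreads it over all 127.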

  3. Finding Bank Number and Address within a bank
     Solution: We will use the following relation to determine the bank number for x, B(x), and the address of x within the bank, A(x):

       B(x) = x MOD N_b
       A(x) = x MOD W_b

     and we will choose N_b and W_b to be co-prime, i.e., there is no prime number that is a factor of N_b and W_b (this condition is satisfied if we choose N_b to be a prime number that is equal to an integer power of two minus 1).
     We can then use the Chinese Remainder Theorem to show that B(x) and A(x) are always unique.

     Fast Bank Number
     • Chinese Remainder Theorem
       As long as two sets of integers a_i and b_i follow these rules

         b_i = x mod a_i,   0 ≤ b_i < a_i,   0 ≤ x < a_0 × a_1 × a_2 × …

       and that a_i and a_j are co-prime if i ≠ j, then the integer x has only one solution (unambiguous mapping):
       – bank number = b_0, number of banks = a_0
       – address within bank = b_1, number of words in bank = a_1
       – N word address 0 to N-1, prime no. banks, words power of 2
     • 3 banks, N_b = 3, and 8 words per bank, W_b = 8.

                          Seq. Interleaved     Modulo Interleaved
       Bank Number:         0    1    2          0    1    2
       Address
       within Bank:   0     0    1    2          0   16    8
                      1     3    4    5          9    1   17
                      2     6    7    8         18   10    2
                      3     9   10   11          3   19   11
                      4    12   13   14         12    4   20
                      5    15   16   17         21   13    5
                      6    18   19   20          6   22   14
                      7    21   22   23         15    7   23

     Fast Page Mode Operation
     [Figure: DRAM array of N rows x N cols addressed by Row Address and Column Address, feeding an N x M “SRAM” register with an M-bit Output]
     • Regular DRAM Organization:
       – N rows x N column x M-bit
       – Read & Write M-bit at a time
       – Each M-bit access requires a RAS / CAS cycle
     • Fast Page Mode DRAM
       – N x M “SRAM” to save a row
     • After a row is read into the register
       – Only CAS is needed to access other M-bit blocks on that row
       – RAS_L remains asserted while CAS_L is toggled
     [Figure: timing diagram. Signals: RAS_L; CAS_L; A (Row Address, then four Col Address values). Marked accesses: 1st M-bit Access, 2nd M-bit, 3rd M-bit, 4th M-bit.]

     Fast Memory Systems: DRAM specific
     • Multiple CAS accesses: several names (page mode)
       – Extended Data Out (EDO): 30% faster in page mode
     • New DRAMs to address gap; what will they cost, will they survive?
       – RAMBUS: startup company; reinvent DRAM interface
         » Each Chip a module vs. slice of memory
         » Short bus between CPU and chips
         » Does own refresh
         » Variable amount of data returned
         » 1 byte / 2 ns (500 MB/s per chip)
       – Synchronous DRAM: 2 banks on chip, a clock signal to DRAM, transfer synchronous to system clock (66 - 150 MHz)
       – Intel claims RAMBUS Direct (16 b wide) is future PC memory
     • Niche memory or main memory?
       – e.g., Video RAM for frame buffers, DRAM + fast serial output

     DRAM History
     • DRAMs: capacity +60%/yr, cost –30%/yr
       – 2.5X cells/area, 1.5X die size in ≈3 years
     • ‘98 DRAM fab line costs $2B
       – DRAM only: density, leakage v. speed
     • Rely on increasing no. of computers & memory per computer (60% market)
       – SIMM or DIMM is replaceable unit => computers use any generation DRAM
     • Commodity, second source industry => high volume, low profit, conservative
       – Little organization innovation in 20 years
     • Order of importance: 1) Cost/bit 2) Capacity
       – First RAMBUS: 10X BW, +30% cost => little impact

     DRAM Future: 1 Gbit+ DRAM
                         Mitsubishi       Samsung
     • Blocks            512 x 2 Mbit     1024 x 1 Mbit
     • Clock             200 MHz          250 MHz
     • Data Pins         64               16
     • Die Size          24 x 24 mm       31 x 21 mm
       – Sizes will be much smaller in production
     • Metal Layers      3                4
     • Technology        0.15 micron      0.16 micron
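The modulo-interleaved mapping from the "Finding Bank Number and Address within a bank" and "Fast Bank Number" slides, B(x) = x MOD N_b and A(x) = x MOD W_b, can be checked by brute force for small sizes. The C sketch below uses invented function names and is a check of the slide's claim, not the lecture's own code: it maps every address in 0 .. N_b × W_b − 1 and reports whether any two land in the same (bank, address) slot.

```c
#include <assert.h>
#include <stdlib.h>

/* Bank number under both schemes on the slide: x mod N_b. */
unsigned modulo_bank(unsigned x, unsigned nb)
{
    assert(nb > 0);
    return x % nb;
}

/* Address within the bank, modulo interleaved: x mod W_b. */
unsigned modulo_addr(unsigned x, unsigned wb)
{
    assert(wb > 0);
    return x % wb;
}

/* Address within the bank, sequentially interleaved: x / N_b. */
unsigned sequential_addr(unsigned x, unsigned nb)
{
    assert(nb > 0);
    return x / nb;
}

/* Returns 1 if every x in [0, nb*wb) maps to a distinct
 * (bank, address) pair under the modulo scheme, else 0. */
int mapping_is_unique(unsigned nb, unsigned wb)
{
    unsigned total = nb * wb;
    unsigned char *seen = calloc(total, 1);
    int unique = 1;

    assert(seen != NULL);
    for (unsigned x = 0; x < total; x++) {
        unsigned slot = modulo_bank(x, nb) * wb + modulo_addr(x, wb);
        if (seen[slot]) {
            unique = 0;
            break;
        }
        seen[slot] = 1;
    }
    free(seen);
    return unique;
}
```

For the slide's example (N_b = 3, W_b = 8) the mapping is unique, and word 16 lands in bank 1 at address 0, matching the table. With N_b = 4 and W_b = 8, which are not co-prime, words 0 and 8 collide in bank 0 at address 0. When W_b is a power of two, the `x % wb` step reduces to a bit mask, which is the reason the slide pairs a prime number of banks with a power-of-two number of words.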
