Page 1 Increasing Bandwidth - Interleaving Main Memory Performance - PDF document

Classical DRAM Organization (square) bit (data) lines EECS 252 Graduate Computer r Each intersection represents o a 1-T DRAM Cell w Architecture RAM Cell d Array e c o Lec 23 – Storage Technology d word (row) select e r David Culler Electrical Engineering and Computer Sciences Column Selector & row Column I/O Circuits address University of California, Berkeley Address http://www.eecs.berkeley.edu/~culler • Row and Column Address data http://www-inst.eecs.berkeley.edu/~cs252 together: – Select 1 bit a time DRAM Capacitors: more capacitance Review:1-T Memory Cell (DRAM) in a small area row select • Write: – 1. Drive bit line – 2.. Select row • Read: – 1. Precharge bit line to Vdd/2 – 2.. Select row bit – 3. Cell and bit line share charges » Very small voltage changes on the bit line – 4. Sense (fancy sense amp) » Can detect changes of ~1 million electrons • Trench capacitors: • Stacked capacitors – 5. Write: restore the value – Logic ABOVE capacitor – Logic BELOW capacitor • Refresh – Gain in surface area of capacitor – Gain in surface area of capacitor – Better Scaling properties – 1. Just do a dummy read to every cell. – 2-dim cross-section quite small – Better Planarization DRAM Read Timing 4 Key DRAM Timing Parameters • Every DRAM access RAS_L CAS_L WE_L OE_L begins at: • t RAC : minimum time from RAS line falling to – The assertion of the RAS_L A 256K x 8 the valid data output. D – 2 ways to read: DRAM 9 8 – Quoted as the speed of a DRAM when buy early or late v. CAS DRAM Read Cycle Time – A typical 4Mb DRAM t RAC = 60 ns RAS_L – Speed of DRAM since on purchase sheet? • t RC : minimum time from the start of one row CAS_L access to the start of the next. A Row Address Col Address Junk Row Address Col Address Junk – t RC = 110 ns for a 4Mbit DRAM with a t RAC of 60 ns • t CAC : minimum time from CAS line falling to WE_L valid data output. OE_L – 15 ns for a 4Mbit DRAM with a t RAC of 60 ns D High Z Junk Data Out High Z Data Out • t PC : minimum time from the start of one Read Access Output Enable column access to the start of the next. Time Delay – 35 ns for a 4Mbit DRAM with a t RAC of 60 ns Early Read Cycle: OE_L asserted before CAS_L Late Read Cycle: OE_L asserted after CAS_L Page 1

Increasing Bandwidth - Interleaving Main Memory Performance Access Pattern without Interleaving: CPU Memory Cycle Time Access Time Time D1 available • DRAM (Read/Write) Cycle Time >> DRAM Start Access for D1 Start Access for D2 (Read/Write) Access Time – - 2:1; why? Memory Access Pattern with 4-way Interleaving: Bank 0 • DRAM (Read/Write) Cycle Time : Memory – How frequent can you initiate an access? Bank 1 CPU – Analogy: A little kid can only ask his father for money on Saturday Memory • DRAM (Read/Write) Access Time: Bank 2 – How quickly will you get what you want once you initiate an access? Memory Access Bank 0 Bank 3 – Analogy: As soon as he asks, his father will give him the money Access Bank 1 • DRAM Bandwidth Limitation analogy: Access Bank 2 Access Bank 3 – What happens if he runs out of money on Wednesday? We can Access Bank 0 again Main Memory Performance Main Memory Performance • Timing model – 1 to send address, – 4 for access time, 10 cycle time, 1 to send data – Cache Block is 4 words • Simple M.P. = 4 x (1+10+1) = 48 • Wide M.P. = 1 + 10 + 1 = 12 • Interleaved M.P. = 1+10+1 + 3 =15 • Wide : • Interleaved : – CPU/Mux 1 word; – CPU, Cache, Bus 1 word: address address address address Mux/Cache, Bus, Memory N Modules 0 1 2 3 Memory N words (4 Modules); example is 4 5 6 7 (Alpha: 64 bits & 256 word interleaved 8 9 10 11 bits) 12 13 14 15 • Simple : Bank 0 Bank 1 Bank 2 Bank 3 – CPU, Cache, Bus, Memory same width (32 bits) Finding Bank Number and Address Avoiding Bank Conflicts within a bank • Lots of banks Problem: We want to determine the number of banks, N b , to use int x[256][512]; and the number of words to store in each bank, W b , such that: for (j = 0; j < 512; j = j+1) for (i = 0; i < 256; i = i+1) • given a word address x, it is easy to find the bank where x will x[i][j] = 2 * x[i][j]; be found, B(x), and the address of x within the bank, A(x). • Even with 128 banks, since 512 is multiple of 128, conflict on word accesses • for any address x, B(x) and A(x) are unique. • SW: loop interchange or declaring array not power of 2 • the number of bank conflicts is minimized (“array padding”) • HW: Prime number of banks – bank number = address mod number of banks – bank number = address mod number of banks – address within bank =  address / number of words in bank – modulo & divide per memory access with prime no. banks? Page 2

Finding Bank Number and Address Fast Bank Number within a bank • Chinese Remainder Theorem Solution: We will use the following relation to determine the bank As long as two sets of integers ai and bi follow these rules number for x, B(x), and the address of x within the bank, A(x): b i = x mod a i ,0 ≤ b i < a i , 0 ≤ x < a 0 × a 1 × a 2 ×… and that ai and aj are co-prime if i ≠ j, then the integer x has only one B(x) = x MOD N b solution (unambiguous mapping): A(x) = x MOD W b – bank number = b 0 , number of banks = a 0 – address within bank = b 1 , number of words in bank = a 1 and we will choose N b and W b to be co-prime, i.e., there is no prime – N word address 0 to N-1, prime no. banks, words power of 2 number that is a factor of N b and W b (this condition is satisfied • 3 banks Nb = 3, and 8 words per bank, Wb = 8. if we choose N b to be a prime number that is equal to an integer Seq. Interleaved Modulo Interleaved power of two minus 1). Bank Number: 0 1 2 0 1 2 Address We can then use the Chinese Remainder Theorem within Bank: 0 0 1 2 0 16 8 to show that B(x) and A(x) is always unique. 1 3 4 5 9 1 17 2 6 7 8 18 10 2 3 9 10 11 3 19 11 4 12 13 14 12 4 20 5 15 16 17 21 13 5 6 18 19 20 6 22 14 7 21 22 23 15 7 23 Fast Page Mode Operation Fast Memory Systems: DRAM specific • Regular DRAM Organization: Column Address N cols – N rows x N column x M-bit • Multiple CAS accesses: several names (page mode) – Read & Write M-bit at a time – Extended Data Out (EDO) : 30% faster in page mode – Each M-bit access requires a RAS / CAS cycle DRAM • New DRAMs to address gap; Row • Fast Page Mode DRAM N rows Address what will they cost, will they survive? – N x M “SRAM” to save a row – RAMBUS : startup company; reinvent DRAM interface • After a row is read into the » Each Chip a module vs. slice of memory register » Short bus between CPU and chips – Only CAS is needed to access N x M “SRAM” » Does own refresh other M-bit blocks on that row M bits » Variable amount of data returned – RAS_L remains asserted while M-bit Output CAS_L is toggled » 1 byte / 2 ns (500 MB/s per chip) – Synchronous DRAM : 2 banks on chip, a clock signal to DRAM, 1st M-bit Access 2nd M-bit 3rd M-bit 4th M-bit transfer synchronous to system clock (66 - 150 MHz) RAS_L – Intel claims RAMBUS Direct (16 b wide) is future PC memory • Niche memory or main memory? CAS_L – e.g., Video RAM for frame buffers, DRAM + fast serial output A Row Address Col Address Col Address Col Address Col Address DRAM History DRAM Future: 1 Gbit+ DRAM • DRAMs: capacity +60%/yr, cost –30%/yr – 2.5X cells/area, 1.5X die size in -3 years • ‘98 DRAM fab line costs $2B Mitsubishi Samsung – DRAM only: density, leakage v. speed • Blocks 512 x 2 Mbit 1024 x 1 Mbit • Rely on increasing no. of computers & memory per • Clock 200 MHz 250 MHz computer (60% market) – SIMM or DIMM is replaceable unit • Data Pins 64 16 => computers use any generation DRAM • Die Size 24 x 24 mm 31 x 21 mm • Commodity, second source industry => high volume, low profit, conservative – Sizes will be much smaller in production – Little organization innovation in 20 years • Metal Layers 3 4 • Order of importance: 1) Cost/bit 2) Capacity • Technology 0.15 micron 0.16 micron – First RAMBUS: 10X BW, +30% cost => little impact Page 3

Page 1 Increasing Bandwidth - Interleaving Main Memory Performance - PDF document

Classical DRAM Organization (square) bit (data) lines EECS 252 Graduate Computer r Each intersection represents o a 1-T DRAM Cell w Architecture RAM Cell d Array e c o Lec 23 Storage Technology d word (row) select e r David

Agenda Item 7 Page 107 Page 108 Page 109 Page 110 Page 111 Page 112 Page 113 Page 114 Page

Page 1 of 36 Page 2 of 36 Page 3 of 36 Page 4 of 36 Page 5 of 36 Page 6 of 36 Page 7 of 36

Agenda Item 7 Page 1 Page 2 Page 3 Page 4 Page 5 Page 6 Page 7 Page 8 Page 9 Page 10

Wednesday, November 30, 2016 3:41 PM General Page 1 General Page 2 General Page 3 General Page

Lecture 8 Friday, June 2, 2017 5:38 PM slide_8 Page 1 slide_8 Page 2 slide_8 Page 3 slide_8

177 Hudson Street Manhattan, NY 10013 Block 219 Lot 21 Historic Photos Page 1 Page 2 Page 3

PAGE 1 PAGE 2 PAGE 3 PAGE 4 Vision PAGE 5 Desire Lines of Cow Paths? PAGE 6

1. Test page This page is for testing. This page is for testing. This page is for testing.

Lecture 12 Sunday, January 27, 2019 5:25 PM Lecture12 Page 1 Lecture12 Page 2 Lecture12 Page 3

KAMPARO page 9 page 16 page 19 page 27 page 34 2 INHOUDSOPGA VE page 4 Cables Chargers

Page 35 Page 36 Page 37 Page 38 Page 39 This page is intentionally left blank

May 26, 2015 Presentation to Council and School Board Page 1 of 24 Page 2 of 24 Page 3 of 24

BRIGHT-LINE TEST Table of Contents page page page page page 3 5 11 15 19 What is the

Contents Nordea Page 3 Integration Page 16 Highlights and market development Page 24

Contents Summary presentation Q3/02 Page 3 Nordea Page 43 Integration Page 54

Contents Press slides Page 3 Financial highlights Q3/03 Page 17 Credit quality Page

1 Memory Read Transaction (1) Memory Read Transaction (2) CPU places address A on the memory

CSSE132 Introduc0on to Computer Systems 20 : Memory hierarchy

Permanent Storage Devices Disks, RAID, and SSDs (Chapters 36 38, 44) CS 4410 Operating

PHM Report on the 2019 Population Health Colloquium in Philadelphia June

650MHz SRF Cavity TUNER Yuriy Pischalnikov PIP-II Fine Tuning Workshop, Saclay 26 June, 2018

On th the e challeng allenges es of de deplo ploying ying an an unusual sual hig igh pe

The Book Embedding Problem from a SAT-Solving Perspective [GD 2015] Michalis Bekos, Michael

Algorithms, Probability & Computing Emo Welzl Ueli Maurer Angelika Steger Peter Widmayer

Page 1 Increasing Bandwidth - Interleaving Main Memory Performance - PDF document

Classical DRAM Organization (square) bit (data) lines EECS 252 Graduate Computer r Each intersection represents o a 1-T DRAM Cell w Architecture RAM Cell d Array e c o Lec 23 Storage Technology d word (row) select e r David

Agenda Item 7 Page 107 Page 108 Page 109 Page 110 Page 111 Page 112 Page 113 Page 114 Page

Page 1 of 36 Page 2 of 36 Page 3 of 36 Page 4 of 36 Page 5 of 36 Page 6 of 36 Page 7 of 36

Agenda Item 7 Page 1 Page 2 Page 3 Page 4 Page 5 Page 6 Page 7 Page 8 Page 9 Page 10

Wednesday, November 30, 2016 3:41 PM General Page 1 General Page 2 General Page 3 General Page

Lecture 8 Friday, June 2, 2017 5:38 PM slide_8 Page 1 slide_8 Page 2 slide_8 Page 3 slide_8

177 Hudson Street Manhattan, NY 10013 Block 219 Lot 21 Historic Photos Page 1 Page 2 Page 3

PAGE 1 PAGE 2 PAGE 3 PAGE 4 Vision PAGE 5 Desire Lines of Cow Paths? PAGE 6

1. Test page This page is for testing. This page is for testing. This page is for testing.

Lecture 12 Sunday, January 27, 2019 5:25 PM Lecture12 Page 1 Lecture12 Page 2 Lecture12 Page 3

KAMPARO page 9 page 16 page 19 page 27 page 34 2 INHOUDSOPGA VE page 4 Cables Chargers

Page 35 Page 36 Page 37 Page 38 Page 39 This page is intentionally left blank

May 26, 2015 Presentation to Council and School Board Page 1 of 24 Page 2 of 24 Page 3 of 24

BRIGHT-LINE TEST Table of Contents page page page page page 3 5 11 15 19 What is the

Contents Nordea Page 3 Integration Page 16 Highlights and market development Page 24

Contents Summary presentation Q3/02 Page 3 Nordea Page 43 Integration Page 54

Contents Press slides Page 3 Financial highlights Q3/03 Page 17 Credit quality Page

1 Memory Read Transaction (1) Memory Read Transaction (2) CPU places address A on the memory

CSSE132 Introduc0on to Computer Systems 20 : Memory hierarchy

Permanent Storage Devices Disks, RAID, and SSDs (Chapters 36 38, 44) CS 4410 Operating

PHM Report on the 2019 Population Health Colloquium in Philadelphia June

650MHz SRF Cavity TUNER Yuriy Pischalnikov PIP-II Fine Tuning Workshop, Saclay 26 June, 2018

On th the e challeng allenges es of de deplo ploying ying an an unusual sual hig igh pe

The Book Embedding Problem from a SAT-Solving Perspective [GD 2015] Michalis Bekos, Michael

Algorithms, Probability &amp; Computing Emo Welzl Ueli Maurer Angelika Steger Peter Widmayer

Algorithms, Probability & Computing Emo Welzl Ueli Maurer Angelika Steger Peter Widmayer