
Non-volatile memory & Datapath component (3) Prof. Usagi



  1. Non-volatile memory & Datapath component (3) Prof. Usagi

  2. Recap: Memory “hierarchy” in modern processor architectures
     • Registers (in the processor core): 32 or 64 words, < 1 ns (fastest)
     • L1 $, L2 $, L3 $ (SRAM): KBs ~ MBs, a few ns
     • DRAM: GBs, tens of ns (larger)
     • Storage: TBs (larger still)

  3. Recap: Registers
     • Register: a sequential component that can store multiple bits
     • A basic register can be built simply by using multiple D-FFs
     • [Figure: a register made of five D flip-flops sharing one Clk; Input 1 to Input 5 drive the D pins and Output 1 to Output 5 come from the Q pins.]

  4. Recap: A Classical 6-T SRAM Cell
     • [Figure: a 6-T SRAM cell holding Q and Q’, connected through the wordline to bitline and bitline’, which feed a sense amplifier.]

  5. Recap: SRAM array
     • The upper bits of the address go to the decoder, which selects one word line (wd0 to wd(m-1))
     • Each column has a sense amplifier; the lower bits of the address drive the MUX
     • We can only work on cells sharing the same word line simultaneously
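
The address split on this slide can be sketched in C. This is only an illustration under assumed numbers: the geometry (256 word lines, 16 words per line) and the names `ROW_BITS`, `COL_BITS`, `row_of`, `col_of`, and `same_word_line` are hypothetical and do not come from the slides.

```c
#include <stdint.h>

/* Hypothetical geometry: 256 word lines, 16 words per word line. */
#define ROW_BITS 8u
#define COL_BITS 4u

/* Upper address bits drive the decoder and select one word line. */
static inline uint32_t row_of(uint32_t addr) {
    return (addr >> COL_BITS) & ((1u << ROW_BITS) - 1u);
}

/* Lower address bits drive the MUX and select one word on that line. */
static inline uint32_t col_of(uint32_t addr) {
    return addr & ((1u << COL_BITS) - 1u);
}

/* Two addresses can be served by the same activated word line
   only if their upper (row) bits match. */
static inline int same_word_line(uint32_t a, uint32_t b) {
    return row_of(a) == row_of(b);
}
```

With these assumed widths, address 0x123 maps to word line 0x12 and MUX input 0x3.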

  6. Recap: DRAM cell
     • 1 transistor (rather than 6)
     • Relies on a large capacitor to store the bit
     • Write: the transistor conducts, and the data voltage level gets stored on the top plate of the capacitor
     • Read: look at the value of d
     • Problem: the capacitor discharges over time
     • Must “refresh” regularly, by reading d and then writing it right back
     • [Figure: DRAM cell with the data line d, the wordline, one transistor, and the capacitor.]

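  7. Recap: DRAM array
     • The upper bits of the address go to the row decoder, which selects one row (0 to n-1)
     • The selected row is held in the row buffer: usually 4K, the page size of your OS!
     • The lower bits of the address drive the MUX that selects data from the row buffer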
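
To see why the row buffer matters, the sketch below counts how many accesses in a strided stream land in the row that is already open. It is a deliberately simplified model and not a description of any real memory controller: it assumes a single bank with one open row, a 4 KB row as on the slide, and the hypothetical names `ROW_BYTES` and `row_buffer_hits`.

```c
#include <stddef.h>
#include <stdint.h>

#define ROW_BYTES 4096u  /* row size from the slide: "usually 4K" */

/* Count accesses that land in the row already held in the row buffer.
   The i-th access touches byte address i * stride. */
size_t row_buffer_hits(size_t n_accesses, size_t stride) {
    size_t hits = 0;
    uint64_t open_row = UINT64_MAX;   /* no row open yet */
    for (size_t i = 0; i < n_accesses; i++) {
        uint64_t row = ((uint64_t)i * stride) / ROW_BYTES;
        if (row == open_row) {
            hits++;                   /* served from the row buffer */
        } else {
            open_row = row;           /* miss: activate a new row */
        }
    }
    return hits;
}
```

In this model a smaller stride keeps more consecutive accesses inside the same row, so fewer rows have to be activated for the same number of accesses.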

  8. Recap: Latency of volatile memory
                 Size (transistors per bit)   Latency (ns)
     Register    18T                          ~ 0.1 ns
     SRAM        6T                           ~ 0.5 ns
     DRAM        1T                           50-100 ns

  9. Recap: Thinking about programming
     • Which side is faster in executing the for-loop?
       A. Left
       B. Right
       C. About the same
     • Slide annotation on the right-hand version: more row buffer hits in the DRAM, more SRAM hits

     Left:

         struct student_record
         {
             int id;
             double homework;
             double midterm;
             double final;
         };

         int main(int argc, char **argv)
         {
             int i, j;
             double midterm_average = 0.0;
             int number_of_records = 10000000;
             struct timeval time_start, time_end;
             struct student_record *records;
             /* init() is not shown on the slide */
             records = (struct student_record *)malloc(sizeof(struct student_record) * number_of_records);
             init(number_of_records, records);
             for (j = 0; j < 100; j++)
                 for (i = 0; i < number_of_records; i++)
                     midterm_average += records[i].midterm;
             printf("average: %lf\n", midterm_average / number_of_records);
             free(records);
             return 0;
         }

     Right:

         int main(int argc, char **argv)
         {
             int i, j;
             double midterm_average = 0.0;
             int number_of_records = 10000000;
             struct timeval time_start, time_end;
             /* id, midterm, final, homework, and init() are not declared on the slide */
             id = (int *)malloc(sizeof(int) * number_of_records);
             midterm = (double *)malloc(sizeof(double) * number_of_records);
             final = (double *)malloc(sizeof(double) * number_of_records);
             homework = (double *)malloc(sizeof(double) * number_of_records);
             init(number_of_records);
             for (j = 0; j < 100; j++)
                 for (i = 0; i < number_of_records; i++)
                     midterm_average += midterm[i];
             free(id);
             free(midterm);
             free(final);
             free(homework);
             return 0;
         }
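
The two data layouts on this slide can be put side by side in one small, compilable sketch. The struct matches the slide; the helper names `sum_midterm_aos`, `sum_midterm_soa`, `aos_stride`, and `soa_stride` are hypothetical and exist only to show how far apart consecutive `midterm` values sit in each layout.

```c
#include <stddef.h>

struct student_record {
    int id;
    double homework;
    double midterm;
    double final;
};

/* Array of structs: each midterm is sizeof(struct student_record) bytes
   away from the next, so the unused fields travel through the caches too. */
double sum_midterm_aos(const struct student_record *records, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += records[i].midterm;
    return sum;
}

/* Separate arrays: consecutive midterms are adjacent in memory. */
double sum_midterm_soa(const double *midterm, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += midterm[i];
    return sum;
}

/* Distance in bytes between consecutive midterm values in each layout. */
size_t aos_stride(void) { return sizeof(struct student_record); }
size_t soa_stride(void) { return sizeof(double); }
```

Both functions compute the same sum; the difference is only in the stride, which is larger for the array of structs because every record carries the three fields the loop never reads.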

  10. Recap: Memory “hierarchy” in modern processor architectures
      • Volatile: registers (32 or 64 words, < 1 ns), L1 $, L2 $, L3 $ in SRAM (KBs ~ MBs, a few ns), DRAM (GBs, tens of ns)
      • Non-volatile: storage (TBs)

  11. Recap: Flash memory
      • A floating gate made of polycrystalline silicon traps electrons
      • The voltage level within the floating gate determines the value of the cell
      • The floating gates will wear out eventually

  12. Recap: Types of Flash Chips
      • Single-Level Cell (SLC): 2 voltage levels, 1-bit
      • Multi-Level Cell (MLC): 4 voltage levels, 2-bit
      • Triple-Level Cell (TLC): 8 voltage levels, 3-bit
      • Quad-Level Cell (QLC): 16 voltage levels, 4-bit
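
The pattern across the four cell types is that storing b bits needs 2^b distinguishable voltage levels. A minimal sketch of that relation follows; the function name `voltage_levels` is hypothetical.

```c
#include <assert.h>

/* Number of distinct voltage levels a cell must hold to store
   bits_per_cell bits: 2^bits_per_cell. */
unsigned voltage_levels(unsigned bits_per_cell) {
    assert(bits_per_cell >= 1 && bits_per_cell <= 4);  /* SLC through QLC */
    return 1u << bits_per_cell;
}
```

Each extra bit per cell doubles the number of levels that must fit in the same voltage range.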

  13. Outline
      • Non-volatile memory, case study: flash memory
      • Sequential Datapath Components

  14. Programming in MLC
      • Multi-Level Cell (MLC): 4 voltage levels, 2-bit
      • Data for the 1st page: 3.1400000000000001243449787580 = 0x40091EB851EB851F = 01000000 00001001 00011110 10111000 01010001 11101011 10000101 00011111
      • [Figure: cell voltage levels 11, 10, 01, 00, with phase #1 arrows for cells receiving a 1 or a 0.]
      • 1 phase to finish programming the first page!
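
The 64-bit pattern used as page data on this slide is the IEEE 754 encoding of the double 3.14. A short sketch, with the hypothetical helper names `double_bits` and `bit_at`, shows one way to obtain that pattern and pick out individual bits:

```c
#include <stdint.h>
#include <string.h>

/* Return the raw IEEE 754 bit pattern of a double.
   memcpy avoids the undefined behavior of pointer type punning. */
uint64_t double_bits(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Extract bit i (0 = least significant) of the pattern. */
unsigned bit_at(uint64_t bits, unsigned i) {
    return (unsigned)((bits >> i) & 1u);
}
```

Calling `double_bits(3.14)` gives 0x40091EB851EB851F, the value shown on the slide.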

  15. Programming the 2nd page in MLC
      • Multi-Level Cell (MLC): 4 voltage levels, 2-bit
      • Data for both the 1st page and the 2nd page: 3.1400000000000001243449787580 = 0x40091EB851EB851F = 01000000 00001001 00011110 10111000 01010001 11101011 10000101 00011111
      • [Figure: cell voltage levels 11, 10, 01, 00, with phase #1 and phase #2 arrows.]
      • 2 phases to finish programming the second page!

  16. Optimizing 1st Page Programming in MLC
      • Multi-Level Cell (MLC): 4 voltage levels, 2-bit
      • Data for the 1st page: 3.1400000000000001243449787580 = 0x40091EB851EB851F = 01000000 00001001 00011110 10111000 01010001 11101011 10000101 00011111
      • [Figure: cell levels labelled 1 and 0, with phase #1 arrows.]
      • 1 phase to finish programming the first page, and the phase is shorter now

  17. 2nd Page Programming in MLC
      • Multi-Level Cell (MLC): 4 voltage levels, 2-bit
      • Data for both the 1st page and the 2nd page: 3.1400000000000001243449787580 = 0x40091EB851EB851F = 01000000 00001001 00011110 10111000 01010001 11101011 10000101 00011111
      • [Figure: cell voltage levels 11, 10, 01, 00, with phase #1 and phase #2 arrows.]
      • 2 phases to finish programming the second page!

  18. Poll: Flash memory characteristics
      • Regarding the following flash memory characteristics, please identify how many of the following statements are correct
        ① Flash memory cells can only be programmed a limited number of times
        ② The read latency of flash memory cells can differ greatly from the programming latency
        ③ The latency of programming different flash memory pages can be different
        ④ A programmed cell cannot be reprogrammed unless its charge level is first refilled to the top level
      • A. 0  B. 1  C. 2  D. 3  E. 4

  19. Program-erase cycles: SLC vs. MLC vs. TLC vs. QLC

  20. Flash performance
      • [Figure: three bar charts, Read Time (µs), Program Time (µs), and Erase Time (µs), for SLC and MLC chips: A-SLC2, A-SLC4, A-SLC8, B-SLC2 50nm, B-SLC4 72nm, E-SLC8, B-MLC8 72nm, B-MLC32 50nm, C-MLC64 43nm, D-MLC32, E-MLC8. Slide annotation: “Not a good practice”.]
      • Reads: less than 150 µs
      • Program/write: less than 2 ms
      • Erase: less than 3.6 ms
      • Similar relative performance for reads, writes and erases
      • Source: Laura M. Grupp, Adrian M. Caulfield, Joel Coburn, Steven Swanson, Eitan Yaakobi, Paul H. Siegel, and Jack K. Wolf. Characterizing flash memory: anomalies, observations, and applications. In MICRO 2009.

  21. Flash memory characteristics (same question, statements ① to ④, and choices A to E as the poll on slide 18)

  22. Basic flash operations
      • [Figure: Block #0 to Block #n-1, each holding pages 0 to n-1 that are either free or programmed; Program and Read act on pages, Erase acts on a block.]
      • Flash pages must be erased in “blocks”
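
The asymmetry between page-level programming and block-level erasing can be modelled in a few lines of C. This is a toy model rather than a real flash interface: `PAGES_PER_BLOCK`, `struct flash_block`, `flash_program`, and `flash_erase` are hypothetical names, and the block size of 8 pages is chosen only to keep the example small.

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGES_PER_BLOCK 8   /* hypothetical; chosen to keep the model small */

struct flash_block {
    bool programmed[PAGES_PER_BLOCK];  /* false = free page */
    unsigned erase_count;              /* program-erase cycles used so far */
};

/* Program one page. Fails if the index is out of range or the page already
   holds data, since a programmed page cannot be overwritten in place. */
bool flash_program(struct flash_block *b, size_t page) {
    if (page >= PAGES_PER_BLOCK || b->programmed[page])
        return false;
    b->programmed[page] = true;
    return true;
}

/* Erase works only on the whole block: every page becomes free again. */
void flash_erase(struct flash_block *b) {
    for (size_t i = 0; i < PAGES_PER_BLOCK; i++)
        b->programmed[i] = false;
    b->erase_count++;
}
```

Rewriting a single page in this model requires erasing its whole block first, and each erase adds one to the block's program-erase count.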

  23. Phase change memory
      • The bit is stored in the crystal structure of a tiny speck of metal
      • To write, it melts the metal (650 °C)
      • Let it cool quickly or slowly to set the value
      • Crystalline and amorphous states have different resistance

  24. Spin-torque transfer
      • Bits stored as magnetic orientation of a thin film
      • Change the state using polarized electrons (!)
      • Depending on polarization, resistance differs
      • More complex cell structure
      • Great promise: potential DRAM replacement
      • Roughly the same speed, power, and bandwidth
      • But it’s durable!

  25. Non-volatile memory technologies
                   H.D.D          Flash                 Optane                STT-MRAM
      Latency      ~ 10-15 ms     ~ 100 us (read)       7 us (read)           35 ns
                                  ~ 1 ms (write)        18 us (write)
      Bandwidth    ~ 200 MB/sec   3.5 GB/sec (read)     1.35 GB/sec (read)
                                  2.1 GB/sec (write)    290 MB/sec (write)
      Dollar/GB    0.0295         0.583                 2.18
      • Flash is still the most convincing technology for now

  26. If the programmer doesn’t know flash “features”
      • Software designers should be aware of the characteristics of the underlying hardware components

  27. CLA vs. Carry-ripple: Area-Delay Trade-off!
      • Size:
        • 32-bit CLA with 4-bit CLAs: requires 8 of the 4-bit CLAs
          • Each requires 116 for the CLA plus 4*(4*6+8) = 128 for the A+B, 244 transistors in total
          • 8 * 244 = 1952 transistors
        • 32-bit CRA: 1600 transistors (Win!)
      • Delay:
        • 32-bit CLA with 8 4-bit CLAs: 2 gates * 8 = 16 gate delays (Win!)
        • 32-bit CRA: 64 gate delays
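
The two adder organizations compared on this slide can be modelled at the bit level in C. The sketch below is a software illustration and not the gate-level design the transistor counts refer to: `ripple_carry_add`, `cla4`, and `cla_add` are hypothetical names, and `cla_add` follows the slide's configuration of eight 4-bit CLA groups with the carry passed from group to group.

```c
#include <stdint.h>

/* Ripple-carry: each bit's carry waits for the previous bit's carry. */
uint32_t ripple_carry_add(uint32_t a, uint32_t b) {
    uint32_t sum = 0, carry = 0;
    for (unsigned i = 0; i < 32; i++) {
        uint32_t ai = (a >> i) & 1u, bi = (b >> i) & 1u;
        sum |= (ai ^ bi ^ carry) << i;
        carry = (ai & bi) | (carry & (ai ^ bi));
    }
    return sum;
}

/* One 4-bit carry-lookahead group: every carry is computed directly from
   the generate (g) and propagate (p) signals and the group carry-in. */
static uint32_t cla4(uint32_t a, uint32_t b, uint32_t cin, uint32_t *cout) {
    uint32_t g = a & b, p = a ^ b;
    uint32_t g0 = g & 1u, g1 = (g >> 1) & 1u, g2 = (g >> 2) & 1u, g3 = (g >> 3) & 1u;
    uint32_t p0 = p & 1u, p1 = (p >> 1) & 1u, p2 = (p >> 2) & 1u, p3 = (p >> 3) & 1u;
    uint32_t c1 = g0 | (p0 & cin);
    uint32_t c2 = g1 | (p1 & g0) | (p1 & p0 & cin);
    uint32_t c3 = g2 | (p2 & g1) | (p2 & p1 & g0) | (p2 & p1 & p0 & cin);
    uint32_t c4 = g3 | (p3 & g2) | (p3 & p2 & g1) | (p3 & p2 & p1 & g0)
                | (p3 & p2 & p1 & p0 & cin);
    uint32_t carries = cin | (c1 << 1) | (c2 << 2) | (c3 << 3);
    *cout = c4;
    return (p ^ carries) & 0xFu;   /* sum bit i = p_i XOR carry into bit i */
}

/* 32-bit adder built from eight 4-bit CLA groups; the carry ripples
   between groups but not between the bits inside a group. */
uint32_t cla_add(uint32_t a, uint32_t b) {
    uint32_t sum = 0, carry = 0;
    for (unsigned k = 0; k < 8; k++) {
        uint32_t s = cla4((a >> (4 * k)) & 0xFu, (b >> (4 * k)) & 0xFu,
                          carry, &carry);
        sum |= s << (4 * k);
    }
    return sum;
}
```

Both functions return the same 32-bit sum. The ripple-carry loop has 32 dependent carry steps while `cla_add` has 8 dependent group steps, which mirrors the 64 versus 16 gate-delay comparison on the slide.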

  28. Serial Adder
