
CAMELab: Emerging Non-Volatile Memory for SSDs

ATC 2020. Fully Hardware Automated Open Research Framework for Future Fast NVMe Device. Myoungsoo Jung, Computer Architecture and Memory systems Laboratory (CAMELab).


  1. ATC 2020. Fully Hardware Automated Open Research Framework for Future Fast NVMe Device. Myoungsoo Jung, Computer Architecture and Memory systems Laboratory (CAMELab). Sponsored by: (sponsor logos not captured in the transcript).

  2. Emerging Non-Volatile Memory for SSDs. [Figure: read latency by memory type. Latency (reads), in the order extracted: 450 us, 150 us, 25 us, 3 us, 120 ns, 50~80 ns, 60~80 ns. Memory types, in the order extracted: MRAM, TLC, MLC, SLC, New Flash, PRAM, DRAM. Group labels: Flash Technologies; Storage Class Memory (SCM).]

  3. NVMe Internals and Interfaces. [Diagram: an SSD with CTRL, CPU, and four Flash devices.]

  4. NVMe Storage Stack. [Diagram: host stack of Applications (Processes), VFS/FS, Page cache, Block layer, and Block device driver, connected at 1~3GB/sec to an SSD with CTRL, CPU, and four Flash devices.]

  5. NVMe Storage Stack Redesign
     • FlashShare: Punching Through Server Storage Stack from Kernel to Firmware for Ultra-Low Latency SSDs (OSDI’18)
     • De-indirection for Flash-Based SSDs with Nameless Writes (FAST’12)
     • Towards SLO Complying SSDs Through OPS Isolation (FAST’15)
     • The CASE of FEMU: Cheap, Accurate, Scalable and Extensible Flash Emulator (FAST’18)
     • And there are many more!
     Challenge #1: Most storage research relies on simulation or kernel-level emulation.
     [Diagram: the NVMe storage stack from the previous slide, 1~3GB/sec.]

  6. SCM-based NVMe Storage Card. Challenge #2: The SSD’s CPU can be a performance bottleneck for SCMs. [Diagram: host stack of Applications (Processes), VFS/FS, Page cache, Block layer, and Block device driver, connected at 7GB/sec to a card with CTRL, CPU, and four SCM devices.]

  7. What Does SSD’s CPU Do? [Diagram: the SCM-based storage card and host stack from the previous slide, 7GB/sec.]

  8. What Does SSD’s CPU Do? [Diagram: the host memory address space holds the submission queue (SQ) and completion queue (CQ); the device registers hold the SQ doorbell and CQ doorbell.]

  9. What Does SSD’s CPU Do? ❶ I/O submission. [Diagram adds a Data (PRP) region to the address space.]

  10. What Does SSD’s CPU Do? ❷ Ring SQ doorbell.

  11. What Does SSD’s CPU Do? ❸ I/O fetch.

  12. What Does SSD’s CPU Do? ❹ Data transfer.

  13. What Does SSD’s CPU Do? ❺ I/O process.

  14. What Does SSD’s CPU Do? ❻ I/O completion.

  15. What Does SSD’s CPU Do? ❼ Interrupt (notification).

  16. What Does SSD’s CPU Do? ❽ Process completion.

  17. What Does SSD’s CPU Do? ❾ Ring CQ doorbell.

  18. What Does SSD’s CPU Do? All these NVMe activities place a burden on the storage device!

  19. Multi-core IP for High-Performance SSD. [Diagram labels: Backend, I-RAM (three), PCIe Client Logic, Channel Complex, NVMe driver, PCIe, SQ, CQ, Interconnection Networks, Core0, Outbound, Inbound, PCIe, Memory Controller, SRAM, CPU.]

  20. Component Latency Decomposition. [Figure: two panels of normalized latency breakdown (y-axis 0.0 to 1.0) across TLC, MLC, SLC, ZNAND, PRAM, and MRAM. Legend: Completion, Translation, PRP, Queue/Doorbells, Fetching, NVM.]

  21-24. Component Latency Decomposition. [Figure: the same chart repeated on each slide; only the axis ticks, device labels, and legend survive in the transcript.]
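
The nine steps on slides 9 to 17 can be traced with a small toy model. The sketch below is illustrative only and is not taken from the talk: the class names, function names, queue depth, and the dictionary standing in for PRP-addressed host pages are all hypothetical, and it simplifies the protocol (no queue-full check, and consumed completion entries are cleared to `None` where a real controller relies on a phase tag).

```python
from dataclasses import dataclass, field

QUEUE_DEPTH = 4  # hypothetical depth, kept small for illustration


@dataclass
class Host:
    """Host side: SQ, CQ, and data pages all live in host memory."""
    sq: list = field(default_factory=lambda: [None] * QUEUE_DEPTH)
    cq: list = field(default_factory=lambda: [None] * QUEUE_DEPTH)
    sq_tail: int = 0   # next SQ slot the host will fill
    cq_head: int = 0   # next CQ slot the host will consume
    pages: dict = field(default_factory=dict)  # stand-in for PRP-addressed pages


@dataclass
class Device:
    """Device side: doorbell registers plus the storage medium."""
    sq_tail_doorbell: int = 0
    cq_head_doorbell: int = 0
    sq_head: int = 0   # next SQ slot the controller will fetch
    cq_tail: int = 0   # next CQ slot the controller will fill
    media: dict = field(default_factory=dict)


def submit_write(host, dev, cid, lba, prp, log):
    # Step 1: the driver places a command in the submission queue.
    host.sq[host.sq_tail] = {"cid": cid, "lba": lba, "prp": prp}
    host.sq_tail = (host.sq_tail + 1) % QUEUE_DEPTH
    log.append("1 I/O submission")
    # Step 2: the driver writes the new tail to the SQ doorbell register.
    dev.sq_tail_doorbell = host.sq_tail
    log.append("2 ring SQ doorbell")


def device_service(host, dev, log):
    # Steps 3 to 7 are the work the slides attribute to the SSD's CPU.
    while dev.sq_head != dev.sq_tail_doorbell:
        cmd = host.sq[dev.sq_head]              # Step 3: fetch the command
        dev.sq_head = (dev.sq_head + 1) % QUEUE_DEPTH
        log.append("3 I/O fetch")
        data = host.pages[cmd["prp"]]           # Step 4: move data via the PRP
        log.append("4 data transfer")
        dev.media[cmd["lba"]] = data            # Step 5: perform the I/O
        log.append("5 I/O process")
        host.cq[dev.cq_tail] = {"cid": cmd["cid"], "status": 0}
        dev.cq_tail = (dev.cq_tail + 1) % QUEUE_DEPTH
        log.append("6 I/O completion")          # Step 6: post a CQ entry
        log.append("7 interrupt")               # Step 7: notify the host


def handle_interrupt(host, dev, log):
    done = []
    # Step 8: the driver consumes every posted completion entry.
    while host.cq[host.cq_head] is not None:
        done.append(host.cq[host.cq_head]["cid"])
        host.cq[host.cq_head] = None
        host.cq_head = (host.cq_head + 1) % QUEUE_DEPTH
        log.append("8 process completion")
    # Step 9: the driver writes the new head to the CQ doorbell register.
    dev.cq_head_doorbell = host.cq_head
    log.append("9 ring CQ doorbell")
    return done


log = []
host, dev = Host(), Device()
host.pages[0x1000] = b"hello"
submit_write(host, dev, cid=7, lba=42, prp=0x1000, log=log)
device_service(host, dev, log)
completed = handle_interrupt(host, dev, log)
```

Even in this simplified form, a single write costs the device five of the nine steps (3 to 7) before and after it touches the medium, which is the per-request overhead that slide 18 calls a burden.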
