SLIDE 1 Computer Architecture and Memory Systems Laboratory
CAMEL Lab
Myoungsoo Jung
ATC 2020
Sponsored by
Fully Hardware Automated Open Research Framework for Future Fast NVMe Devices
SLIDE 2
Emerging Non-Volatile Memory for SSDs
Read latency by memory type:
- Flash technologies: TLC 450 us, MLC 150 us, SLC 25 us, new flash 3 us
- Storage class memory (SCM): PRAM 120 ns, MRAM 50~80 ns
- DRAM: 60~80 ns
SLIDE 3 NVMe Internals and Interfaces
[Figure: host CPU attached to an NVMe SSD - a controller (CTRL) managing multiple flash chips]
SLIDE 4 NVMe Storage Stack
[Figure: host storage stack - applications (processes), VFS/FS, page cache, block layer, and block device driver - issuing I/O to an NVMe flash SSD (controller plus flash chips) at 1~3 GB/sec]
SLIDE 5 NVMe Storage Stack Redesign
[Figure: the same host storage stack and flash SSD (1~3 GB/sec), annotated with prior redesign work]
- FlashShare: Punching Through Server Storage Stack from Kernel to Firmware for Ultra-Low Latency SSDs (OSDI'18)
- De-indirection for Flash-Based SSDs with Nameless Writes (FAST'12)
- Towards SLO Complying SSDs Through OPS Isolation (FAST'15)
- The CASE of FEMU: Cheap, Accurate, Scalable and Extensible Flash Emulator (FAST'18)
Challenge #1: Most storage research relies on simulation or kernel-level emulation.
SLIDE 6 SCM-based NVMe Storage Card
[Figure: the same host storage stack, now backed by an SCM-based NVMe card (controller plus SCM modules) at 7 GB/sec]
Challenge #2: The SSD's internal CPU can become a performance bottleneck for SCMs.
SLIDE 7 What Does SSD's CPU Do?
[Figure: host storage stack and SCM-based NVMe card (7 GB/sec); the card's internal CPU sits on the I/O path]
SLIDE 8 What Does SSD's CPU Do?
[Figure: host memory address space holding the submission queue (SQ) and completion queue (CQ); device registers exposing the SQ doorbell and CQ doorbell]
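To make the figure concrete, below is a minimal C sketch of the host-visible NVMe objects it shows: the 64-byte (16-DW) submission queue entry, the 16-byte completion queue entry, and the doorbell registers in the controller's BAR. The layouts follow the NVMe specification, but field names and comments are simplified for illustration.

```c
#include <stdint.h>

/* 64-byte (16-DW) submission queue entry, simplified from the NVMe spec. */
struct nvme_sq_entry {
    uint8_t  opcode;      /* e.g., 0x01 = write, 0x02 = read (NVM command set) */
    uint8_t  flags;
    uint16_t command_id;  /* echoed back in the completion entry               */
    uint32_t nsid;        /* namespace ID                                      */
    uint64_t rsvd;
    uint64_t metadata;
    uint64_t prp1;        /* physical address of the first data page           */
    uint64_t prp2;        /* second data page, or pointer to a PRP list        */
    uint32_t cdw10;       /* starting LBA (low 32 bits) for read/write         */
    uint32_t cdw11;       /* starting LBA (high 32 bits)                       */
    uint32_t cdw12;       /* number of logical blocks, 0-based                 */
    uint32_t cdw13, cdw14, cdw15;
};

/* 16-byte completion queue entry. */
struct nvme_cq_entry {
    uint32_t result;
    uint32_t rsvd;
    uint16_t sq_head;     /* how far the device has consumed the SQ            */
    uint16_t sq_id;
    uint16_t command_id;
    uint16_t status;      /* bit 0 is the phase tag the host polls on          */
};

/* Queue doorbells live in BAR0 starting at offset 0x1000, spaced by the
 * doorbell stride (CAP.DSTRD): a tail doorbell per SQ, a head doorbell per CQ. */
static inline uint32_t sq_tail_doorbell(uint16_t qid, uint32_t dstrd)
{
    return 0x1000 + (2u * qid) * (4u << dstrd);
}

static inline uint32_t cq_head_doorbell(uint16_t qid, uint32_t dstrd)
{
    return 0x1000 + (2u * qid + 1u) * (4u << dstrd);
}
```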
SLIDE 9 What Does SSD's CPU Do?
❶ I/O submission: the host places an NVMe command and its data (PRP) pointers into the submission queue (SQ)
SLIDE 10 What Does SSD's CPU Do?
❷ Ring SQ doorbell: the host writes the new SQ tail to the device's SQ doorbell register
SLIDE 11 What Does SSD's CPU Do?
❸ I/O fetch: the device fetches the SQ entry from host memory
SLIDE 12 What Does SSD's CPU Do?
❹ Data transfer: the device moves data to/from the host buffers pointed to by the PRPs
SLIDE 13 What Does SSD's CPU Do?
❺ I/O process: the device performs the actual media (SCM) access
SLIDE 14 What Does SSD's CPU Do?
❻ I/O completion: the device posts an entry into the completion queue (CQ)
SLIDE 15 What Does SSD's CPU Do?
❼ Interrupt (notification): the device raises an interrupt to notify the host
SLIDE 16 What Does SSD's CPU Do?
❽ Process completion: the host handles the completion entry
SLIDE 17 What Does SSD's CPU Do?
❾ Ring CQ doorbell: the host writes the new CQ head to the device's CQ doorbell register
SLIDE 18 What Does SSD's CPU Do?
All of these NVMe activities place a burden on the storage device!
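Putting the nine steps together, here is a minimal host-side sketch in C of what the driver does for one I/O command, with the device-side steps noted as comments. It polls the completion queue instead of taking the interrupt, and names such as sq, cq, and the doorbell pointers are illustrative placeholders for BAR-mapped and DMA-able memory; a real driver adds locking, error handling, and MSI handling.

```c
#include <stdint.h>
#include <string.h>

#define QUEUE_DEPTH 64

struct nvme_cmd { uint8_t raw[64]; };                       /* 16-DW SQ entry     */
struct nvme_cpl { uint32_t dw0, dw1;
                  uint16_t sq_head, sq_id, cid, status; };  /* 16-byte CQ entry   */

static struct nvme_cmd          sq[QUEUE_DEPTH];   /* DMA-able host memory        */
static volatile struct nvme_cpl cq[QUEUE_DEPTH];   /* written by the device       */
static volatile uint32_t *sq_tail_db;              /* mapped at BAR0 + 0x1000 + … */
static volatile uint32_t *cq_head_db;

static uint16_t sq_tail, cq_head, phase = 1;

void submit_and_wait(const struct nvme_cmd *cmd)
{
    /* ❶ I/O submission: place the command (with its PRP pointers) in the SQ.   */
    memcpy(&sq[sq_tail], cmd, sizeof(*cmd));
    sq_tail = (sq_tail + 1) % QUEUE_DEPTH;

    /* ❷ Ring the SQ tail doorbell. The SSD's CPU (or OpenExpress hardware)
     *    now performs ❸ fetching the 16-DW entry, ❹ transferring data via the
     *    PRPs, ❺ the media access, ❻ posting a CQ entry, and ❼ the interrupt.  */
    *sq_tail_db = sq_tail;

    /* ❽ Process the completion: poll the phase tag instead of waiting for MSI. */
    while ((cq[cq_head].status & 1) != phase)
        ;                                   /* spin until the device posts it    */

    cq_head = (cq_head + 1) % QUEUE_DEPTH;
    if (cq_head == 0)
        phase ^= 1;                         /* phase flips on every CQ wrap      */

    /* ❾ Ring the CQ head doorbell to release the consumed entry.               */
    *cq_head_db = cq_head;
}
```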
SLIDE 19 Multi-core IP for High-Performance SSD
[Figure: SSD internals - PCIe client logic with inbound/outbound paths facing the host's NVMe driver (SQ/CQ), multiple embedded cores (Core0, ...) each with instruction RAM (I-RAM), interconnection networks, a memory controller with SRAM, and the backend channel complex]
SLIDE 20~25 Component Latency Decomposition
[Charts: normalized latency breakdown (0.0~1.0) for TLC, MLC, SLC, Z-NAND, PRAM, and MRAM, decomposed into Completion, Translation, PRP, Queue/Doorbells, Fetching, and NVM components]
CPU bursts for firmware control can become the critical performance bottleneck
SLIDE 26 Overview
- A framework that fully automates the NVMe control logic in hardware, making it possible to build customizable devices
- OpenExpress is an NVMe host accelerator IP meant to be integrated into an easy-to-access FPGA design
- OpenExpress requires no software intervention to process concurrent NVMe read and write requests
- It supports scalable data submission, a large number of outstanding NVMe commands, and submission/completion queue management
- We prototype OpenExpress on a commercially available Xilinx FPGA board and optimize all the logic modules to operate at a high frequency
SLIDE 27 Full Hardware Automation for Critical I/O Path
[Figure: the multi-core SSD architecture from Slide 19 - PCIe client logic, embedded cores with I-RAM, interconnection networks, memory controller/SRAM, and backend channel complex]
SLIDE 28 Full Hardware Automation for Critical I/O Path
[Figure: the critical I/O path handled by hardware automation (OpenExpress) attached to the SoC memory bus and datapath, instead of the embedded cores]
SLIDE 29 Queue Dispatching
[Figure: the SQ tail doorbell region (SQ 0 DB, SQ 1 DB, ...) exposed over PCIe (address/data writes and write responses) and monitored by the SQ Entry Fetch Manager (FET)]
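As a rough software analogue of what the FET does when an inbound PCIe write lands in the SQ tail doorbell region, the sketch below decodes which queue was rung and how many new 64-byte entries need to be fetched, handling tail-pointer wrap-around. All names and the queue depth are illustrative; in OpenExpress this is an FSM in hardware, not C.

```c
#include <stdint.h>

#define MAX_SQ   8
#define SQ_DEPTH 64        /* illustrative queue depth */

struct sq_state {
    uint16_t tail;         /* latest tail value written by the host   */
    uint16_t head;         /* next entry the device will fetch        */
    uint64_t base;         /* host physical address of the SQ         */
};

static struct sq_state sq_ctx[MAX_SQ];

/* Invoked for each write hitting the SQ tail doorbell region.        */
uint16_t on_sq_doorbell_write(uint16_t qid, uint16_t new_tail)
{
    struct sq_state *q = &sq_ctx[qid];

    /* Newly posted entries still to fetch, with wrap-around.          */
    uint16_t pending = (uint16_t)((new_tail + SQ_DEPTH - q->head) % SQ_DEPTH);
    q->tail = new_tail;

    /* The FET would now issue 'pending' 64-byte host reads starting
     * at q->base + q->head * 64 and advance q->head as they return.   */
    return pending;
}
```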
SLIDE 30 Data Transferring
[Figure: the SQ Entry Fetch Manager (FET) hands fetched commands to the PRP engine (HTRW), which drives the backend DMA (DMA) to move data between the PCIe EP complex and the on-card DIMMs (DIMM0~DIMM3)]
SLIDE 31 Completion Handling
[Figure: after the data transfer, the Completion Handler (CMT) posts completion entries and tracks the CQ head doorbell region (CQ 0 DB, CQ 1 DB, ...)]
SLIDE 32 NVMe Context Management
[Figure: the NVMe Context Box (CTX) sits among FET, HTRW/DMA, and CMT, keeping per-command state across the SQ tail doorbell region, the PCIe EP complex, the DIMMs, and the CQ head doorbell region]
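The context box must remember, for every in-flight command, enough state to drive the PRP transfers and later post the right completion. A plausible per-command record looks roughly like the following; the field set is illustrative, not the actual RTL.

```c
#include <stdint.h>

/* Per-command state kept while a request is in flight between the
 * fetch (FET), PRP/DMA (HTRW), and completion (CMT) engines. Illustrative. */
struct nvme_cmd_context {
    uint16_t command_id;       /* CID from the fetched SQ entry               */
    uint16_t sq_id;            /* submission queue it arrived on              */
    uint16_t cq_id;            /* completion queue paired with that SQ        */
    uint8_t  opcode;           /* read or write                               */
    uint8_t  state;            /* e.g., FETCHED -> DMA_IN_PROGRESS -> DONE    */

    uint64_t slba;             /* starting LBA                                */
    uint32_t nlb;              /* number of logical blocks                    */

    uint64_t prp1, prp2;       /* data pointers copied from the SQ entry      */
    uint32_t bytes_done;       /* progress of the HTRW/DMA engines            */

    uint16_t sq_head;          /* SQ head value to report in the CQ entry     */
    uint16_t status;           /* completion status for CMT to post           */
};
```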
SLIDE 33 Operation Flow of OpenExpress
[Figure: OpenExpress modules (FET, HTRW, DMA, CMT, CTX), the PCIe EP complex, the on-card DIMMs (DIMM0~DIMM3), and the host-side Submission Queue 1 / Completion Queue 1 with their BAR-mapped doorbells]
1. The host issues a command into Submission Queue 1
2. The host writes the SQ 1 tail pointer (doorbell)
3. OpenExpress monitors the doorbell region and signals an event
4. FET fetches the 16-DW SQ entry
5. The parsed command is handed to the PRP engine
6. The PRPs are processed, and the data associated with each PRP is copied between host DRAM and the DIMMs
7. A CQ entry is posted
8. An MSI is created and an interrupt is raised
9. The host writes the CQ 1 head pointer (doorbell)
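Read end to end, the flow is a pipeline from a doorbell event to an MSI. The sketch below renders the device side sequentially in C for clarity; in OpenExpress these stages run as parallel hardware engines, and every helper (dma_from_host, post_cq_entry, ...) is a hypothetical stand-in for datapath logic, declared here only so the sketch is self-contained.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-ins for the hardware datapath (PCIe DMA, MSI generation). */
void dma_from_host(uint64_t host_addr, void *dst, size_t len);
void dma_to_host(uint64_t host_addr, const void *src, size_t len);
void post_cq_entry(uint16_t cq_id, uint16_t cid, uint16_t status);
void send_msi(uint16_t cq_id);
uint64_t sq_base(uint16_t qid);          /* host address of the SQ              */
uint16_t sq_head(uint16_t qid);          /* next un-fetched SQ slot             */
void    *dimm_buffer(uint64_t slba);     /* backing storage on the card's DIMMs */

struct sq_entry {                        /* only the fields used below          */
    uint8_t  opcode, flags;
    uint16_t cid;
    uint32_t nsid;
    uint64_t rsvd, metadata;
    uint64_t prp1, prp2;
    uint32_t cdw[6];
};
#define OP_WRITE 0x01

/* Steps 3-8 of the operation flow, for one command on queue 'qid'. */
void service_one_command(uint16_t qid)
{
    /* (3)-(4) The doorbell monitor signalled an event; fetch the 16-DW entry. */
    struct sq_entry sqe;
    dma_from_host(sq_base(qid) + (uint64_t)sq_head(qid) * 64, &sqe, 64);

    /* (5)-(6) Hand the parsed command to the PRP engine and copy the data
     * associated with each PRP between host memory and the on-card DIMMs
     * (a single 4KB page shown; see the PRP-traversal sketch later).          */
    uint64_t slba = ((uint64_t)sqe.cdw[1] << 32) | sqe.cdw[0];
    if (sqe.opcode == OP_WRITE)
        dma_from_host(sqe.prp1, dimm_buffer(slba), 4096);
    else
        dma_to_host(sqe.prp1, dimm_buffer(slba), 4096);

    /* (7)-(8) Post the CQ entry, then create the MSI to interrupt the host.   */
    post_cq_entry(qid, sqe.cid, /*status=*/0);
    send_msi(qid);

    /* (9) happens on the host: it writes the CQ head doorbell afterwards.     */
}
```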
SLIDE 34~36 Major IP Cores
[Figure: the major IP cores of OpenExpress - FET, HTRW, CMT, and CTX - together with the frontend logic]
SLIDE 37 Synthesis and Implementation
Prototype: Xilinx Virtex UltraScale 190 FPGA, PCIe Gen3, four 64GB DDR4 x72 DIMMs (256GB in total)
- CPU: i5-9400 (2.9GHz)
- Design suite: Vivado 2017.3.1
- OS: Ubuntu 18.04 LTS
- Build time: more than 7 hours
SLIDE 38 Frequency Tuning (50MHz ~ 250MHz)
- Detect long routing delays and place register-slice IPs between hardware modules
- This raises the achievable frequency while minimizing negative slack
- Gradual trial-and-error to reduce the amount of routing delay
[Figure: putting register slices between the different IP cores - memory controller, DDR interconnect, AXI DMA, PCIe EP, CMT, FET, HTRW, CTX]
SLIDE 39 Frequency Tuning (50MHz ~ 250MHz)
- Group hardware modules during floorplanning
- After synthesis, create an FPGA pblock (a part of the grid cells), allocate hardware IPs to the pblock, and run PNR (place and route)
- Check the implementation report and repeat the gradual trial-and-error
[Figure: reducing route distances by grouping IP logic - memory controller, DDR interconnect, AXI DMA, PCIe EP, CMT, FET, HTRW, CTX]
SLIDE 40 Perf. Improvement of Frequency Tuning
- The performance improvement for reads and writes is as high as 20% and 60%, respectively
- Writes benefit more because of the nature of their data movement over PCIe packets (explained shortly)
- Large block sizes benefit more because of the automated PRP data processing
[Charts: performance improvement from frequency tuning, for reads and writes]
SLIDE 41 Perf. Improvement of Frequency Tuning
[Figure: reads and writes between the host side (I/O CQ region, control register set, I/O SQ region, I/O commands, and PRP list/pointers behind the PCIe BAR) and OpenExpress (DRAM) over the PCIe inbound/outbound paths]
PRP parsing/traversing and data transfers are fully automated
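For reference, this is roughly what "PRP parsing/traversing" means: PRP1 addresses the first (possibly unaligned) page of the buffer, and PRP2 is either the second page or, for larger transfers, a pointer to a list of page entries that must itself be fetched from host memory. A minimal sketch follows, with illustrative helper names; the HTRW engine performs this walk in hardware, one PCIe read per list page.

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE 4096u

/* Hypothetical helper: one DMA read from host memory (the PCIe EP does this in hardware). */
void dma_from_host(uint64_t host_addr, void *dst, size_t len);

/* Hypothetical per-page callback: move a chunk between a host page and the card's DIMMs.  */
void transfer_page(uint64_t host_addr, uint32_t len);

/* Walk PRP1/PRP2 for a transfer of 'total' bytes, following the NVMe PRP rules. */
void traverse_prps(uint64_t prp1, uint64_t prp2, uint32_t total)
{
    /* PRP1 may start mid-page; the first chunk only runs to the page boundary. */
    uint32_t first = PAGE_SIZE - (uint32_t)(prp1 & (PAGE_SIZE - 1));
    if (first > total)
        first = total;
    transfer_page(prp1, first);
    total -= first;

    if (total == 0)
        return;

    if (total <= PAGE_SIZE) {
        /* Small transfer: PRP2 directly addresses the second page. */
        transfer_page(prp2, total);
        return;
    }

    /* Large transfer: PRP2 points to a PRP list in host memory. */
    uint64_t list[PAGE_SIZE / 8];
    while (total > 0) {
        dma_from_host(prp2, list, PAGE_SIZE);            /* fetch one list page */
        size_t n = PAGE_SIZE / 8;
        for (size_t i = 0; i < n && total > 0; i++) {
            /* The last entry of a full list page chains to the next list page. */
            if (i == n - 1 && total > PAGE_SIZE) {
                prp2 = list[i];
                break;
            }
            uint32_t len = total < PAGE_SIZE ? total : PAGE_SIZE;
            transfer_page(list[i], len);
            total -= len;
        }
    }
}
```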
SLIDE 42 Evaluation
- For all executions, a single I/O worker cannot extract the full bandwidth (CPU bottleneck), so we run 10 threads for both the benchmark executions and the real workloads
- CPU: 8-core 3.3GHz Intel Skylake-X server microarchitecture
- DRAM: 32GB DDR4 (2666)
- Benchmark: FIO
- Real workloads: CAMEL's Open Storage Traces (SNIA), https://trace.camelab.org/
- I/O workers: 10 threads
- Block device: Optane SSD P4800X
- All evaluations demonstrated in the paper use a queue depth of 8, the queue depth exhibiting the best bandwidth
- Note that we don't claim OpenExpress is faster than other fast NVMe devices; instead, the evaluation shows that an FPGA-based design and implementation of NVMe IP cores can offer good performance, making it a viable candidate for use in storage research
SLIDE 43 Performance w/ Microbenchmarks
[Charts: sequential and random bandwidth, and sequential and random latency]
SLIDE 44 Performance w/ Microbenchmarks
- With 4KB-sized requests, there is not much bandwidth/latency difference between reads and writes (3 GB/sec vs. 2.8 GB/sec)
- OpenExpress latency is 72~77 us at a queue depth of eight (vs. 120~150 us for the P4800X)
- OpenExpress reaches its maximum bandwidth with 16KB-sized requests: 4.7 GB/sec for random writes (258 us) and 7 GB/sec for random reads (175 us) (vs. 532~600 us for the P4800X)
[Charts: sequential bandwidth and latency]
Why are writes slower than reads?
SLIDE 45 Performance w/ Microbenchmarks
[Figure: PCIe RX/TX traffic for writes vs. reads - doorbell, NVMe command, PRP, and data payloads]
For writes, all payloads are serialized on the inbound (RX) path; for reads, the data (TX) and NVMe payloads (RX) are served in parallel
SLIDE 46 Real Workload Evaluation
[Charts: per-workload latency (us) and bandwidth (GB/s) of OpenExpress for 24HR, FIU, DevDiv, Server, and TPCE]
- The performance of real workloads differs from the microbenchmark results because of unaligned request offsets, sector-length variations, etc.
- Most storage cards cannot reach their best performance under real workload executions
- DevDiv and Server: OpenExpress offers 4 GB/sec ~ 4.5 GB/sec (100~200 us)
- Write-intensive workloads (24HR, FIU, and TPCE): 1.25 GB/sec ~ 2.1 GB/sec (all under 100 us)
SLIDE 47 Real Workload Evaluation
[Charts: per-workload latency (us) and bandwidth (GB/s) of the Optane SSD vs. OpenExpress for 24HR, FIU, DevDiv, Server, and TPCE]
- Bandwidth: OpenExpress shows 76.3% higher bandwidth than the Optane SSD, on average
- Latency: it exhibits 68.6% shorter latency than the Optane SSD
- DevDiv: 111.5% better performance compared to the Optane SSD (2.1 GB/sec)
The FPGA is still much slower, but the FPGA design and implementation of the NVMe IP cores are not on the critical path and can be used for system-level studies as a research vehicle
SLIDE 48 Conclusion: Related Work and Download
Per-month unit prices for third-party NVMe IP cores (Ip-m****, Inte******, Ep*****) range from $35K to $45K; a single-use source code license is around $100K
- For academic/non-commercial purposes, OpenExpress is freely downloadable:
  - Hardware automation IP cores (HTRW, FET, CMT, CTX, ...)
  - Firmware for MicroBlaze (to handle admin command management, device initialization, etc.)
- Download information: https://openexpress.camelab.org
SLIDE 49 Computer Architecture and Memory Systems Laboratory
CAMEL Lab
Myoungsoo Jung
ATC 2020
Sponsored by
Fully Hardware Automated Open Research Framework for Future Fast NVMe Devices