LightNVM: The Linux Open-Channel SSD Subsystem
Matias Bjørling (ITU, CNEX Labs), Javier González (CNEX Labs), Philippe Bonnet (ITU)


  1. LightNVM: The Linux Open-Channel SSD Subsystem. Matias Bjørling (ITU, CNEX Labs), Javier González (CNEX Labs), Philippe Bonnet (ITU)

  2. 0% Writes - Read Latency. [Figure: 4K random read latency by percentile]

  3. 20% Writes - Read Latency. Mixed 4K random read / 4K random write workload. [Figure: 4K random read latency by percentile; significant outliers of up to 4 ms, a 30x worst case]

  4. NAND Capacity Continues to Grow. [Figure: multiple workloads (#1 to #4) consolidated onto a single solid-state drive; performance, endurance, and DRAM overheads]. Source: William Tidwell, "The Harder Alternative: Managing NAND Capacity in the 3D Age".

  5. What contributes to outliers? Even if writes and reads do not collide at the application level, indirection and a narrow storage interface cause outliers.
     • Host: log-on-log. A log-structured database (e.g., RocksDB) performs its own space and metadata management, address mapping, and garbage collection, and issues pread/pwrite through the VFS and a write buffer to a log-structured file system that repeats the same work.
     • Device: write indirection and unknown state. The SSD pipeline (NAND controller and log-structured block layer across the dies) again performs space and metadata management, address mapping, and garbage collection; the drive maps logical data to physical locations with best effort behind a read/write/trim interface and buffered writes.
     • The host is oblivious to physical data placement due to this indirection and cannot align data logically, which increases write amplification and adds extra garbage collection.

  6. Open-Channel SSDs
     • I/O isolation: provide isolation between tenants by allocating independent parallel units.
     • Predictable latency: I/Os are synchronous, and access times to the parallel units are explicitly defined.
     • Data placement and I/O scheduling: manage the non-volatile memory as a block device, through a file system, or inside your application.

  7. Solid-State Drives
     • Host interface: read/write. Media interface: read/write/erase across tens of parallel units (channel X, channel Y, ...).
     • Media access times: read 50-100 us, write 1-5 ms, erase 3-15 ms.
     • Media controller responsibilities: flash translation layer (turning R/W/E into R/W), media error handling, media retention management, and managing media constraints (ECC, RAID, retention).

  8. Rebalance the Storage Interface
     Expose device parallelism:
     • Parallel units (LUNs) are exposed as independent units to the host.
     • Can be a logical or a physical representation.
     • Explicit performance characteristics.
     Log-structured storage:
     • Exposes storage as chunks that must be written sequentially.
     • Similar to the HDD Shingled Magnetic Recording (SMR) interface.
     • No need for internal garbage collection by the device.
     This integrates with file systems and databases, and can also implement I/O determinism, streams, barriers, and other new data management schemes without changing device firmware.

  9. Specification
     Device model:
     • Defines parallel units and how they are laid out in the LBA address space.
     • Defines chunks. Each chunk is a range of LBAs where writes must be sequential. To write again, a chunk must be reset.
       – A chunk can be in one of four states (free/open/closed/offline); see the state-machine sketch below.
       – If a chunk is open, a write pointer is associated with it.
       – The model is media-agnostic.
     Geometry and I/O commands:
     • Read/Write/Reset, as scalars and vectors.
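
To make the device model concrete, here is a minimal sketch of the chunk life cycle described above: the four states and the transitions between them. The names are illustrative placeholders that mirror the specification's free/open/closed/offline states, not identifiers from any header file.

```c
/* The four chunk states defined by the specification. */
enum chunk_state {
    CHUNK_FREE,     /* reset; ready to be written from the start      */
    CHUNK_OPEN,     /* partially written; carries a write pointer     */
    CHUNK_CLOSED,   /* fully written; readable until the next reset   */
    CHUNK_OFFLINE,  /* failed or worn out; no longer usable           */
};

/* Transitions as the slide describes them: writing opens a free chunk,
 * filling it closes it, a reset returns an open or closed chunk to
 * free, and a media failure can take any chunk offline. */
static int chunk_transition_ok(enum chunk_state from, enum chunk_state to)
{
    switch (from) {
    case CHUNK_FREE:   return to == CHUNK_OPEN   || to == CHUNK_OFFLINE;
    case CHUNK_OPEN:   return to == CHUNK_CLOSED || to == CHUNK_FREE
                           || to == CHUNK_OFFLINE;
    case CHUNK_CLOSED: return to == CHUNK_FREE   || to == CHUNK_OFFLINE;
    default:           return 0;   /* offline chunks stay offline */
    }
}
```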

  10. Drive Model - Chunks
     The logical block address space is divided into chunks (chunk 0 ... chunk N-1), and each chunk spans a contiguous range of logical blocks (LBA 0 ... LBA N-1).
     • Reads: logical block granularity (for example 4KB).
     • Writes: minimum write size granularity; synchronous; may fail. An error marks the write bad, not the whole SSD.
     • Resets: chunk granularity; synchronous; may fail. An error only marks the chunk bad, not the whole SSD.
     (A sketch of these rules follows below.)
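
The sketch below is a rough, stand-alone illustration of the write and reset rules on this slide, assuming a hypothetical chunk descriptor (re-declared here so the example is self-contained): writes must land exactly at the write pointer of an open chunk, filling a chunk closes it, and a failed reset retires only that chunk, never the whole drive.

```c
#include <stdint.h>

enum chunk_state { CHUNK_FREE, CHUNK_OPEN, CHUNK_CLOSED, CHUNK_OFFLINE };

/* Hypothetical chunk descriptor: start LBA, length, state, write pointer. */
struct chunk {
    enum chunk_state state;
    uint64_t slba, nlbas;     /* first LBA and number of logical blocks */
    uint64_t wp;              /* next LBA to write while open           */
};

/* Append a write of nlb blocks: it must start exactly at the write
 * pointer, and filling the chunk transitions it to CLOSED. */
static int chunk_append(struct chunk *c, uint64_t lba, uint64_t nlb)
{
    if (c->state == CHUNK_FREE) {              /* first write opens the chunk */
        c->state = CHUNK_OPEN;
        c->wp = c->slba;
    }
    if (c->state != CHUNK_OPEN || lba != c->wp ||
        lba + nlb > c->slba + c->nlbas)
        return -1;                             /* out-of-order or overflowing write */

    c->wp += nlb;
    if (c->wp == c->slba + c->nlbas)
        c->state = CHUNK_CLOSED;               /* fully written */
    return 0;
}

/* Reset makes a written chunk writable again; a failed reset marks
 * only this chunk offline, not the whole SSD. */
static int chunk_reset(struct chunk *c, int media_failed)
{
    if (media_failed) {
        c->state = CHUNK_OFFLINE;
        return -1;
    }
    c->state = CHUNK_FREE;
    c->wp = c->slba;
    return 0;
}
```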

  11. Drive Model - Organization
     Parallelism is exposed across groups (a shared bus) and parallel units (PUs, also called LUNs); each parallel unit contains chunks.
     The logical block address space is laid out hierarchically: group 0 ... group N-1, then PU 0 ... PU N-1 within a group, then chunk 0 ... chunk N-1 within a PU, then LBA 0 ... LBA N-1 within a chunk.
     (An address-decomposition sketch follows below.)
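
The hierarchical layout can be pictured as a mixed-radix decomposition of a flat logical block address, assuming the group-major ordering depicted on the slide. The geometry values in main() are illustrative placeholders, not the geometry of any particular drive.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative geometry: group -> PU -> chunk -> LBA within chunk. */
struct geo {
    uint64_t num_grp;   /* groups (shared bus)        */
    uint64_t num_pu;    /* parallel units per group   */
    uint64_t num_chk;   /* chunks per parallel unit   */
    uint64_t clba;      /* logical blocks per chunk   */
};

struct addr {
    uint64_t grp, pu, chk, lba;
};

/* Decompose a flat LBA into (group, PU, chunk, LBA-in-chunk),
 * assuming the group-major layout shown on slide 11. */
static struct addr decompose(const struct geo *g, uint64_t flat)
{
    struct addr a;
    a.lba = flat % g->clba;      flat /= g->clba;
    a.chk = flat % g->num_chk;   flat /= g->num_chk;
    a.pu  = flat % g->num_pu;    flat /= g->num_pu;
    a.grp = flat;                /* most-significant component */
    return a;
}

int main(void)
{
    /* Placeholder geometry (16 groups, 8 PUs, 1067 chunks, 4K blocks). */
    struct geo g = { .num_grp = 16, .num_pu = 8, .num_chk = 1067, .clba = 4096 };
    struct addr a = decompose(&g, 123456789ULL);

    printf("grp=%llu pu=%llu chk=%llu lba=%llu\n",
           (unsigned long long)a.grp, (unsigned long long)a.pu,
           (unsigned long long)a.chk, (unsigned long long)a.lba);
    return 0;
}
```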

  12. LightNVM Subsystem Architecture
     1. NVMe device driver: detects the open-channel SSD (OCSSD), implements the specification, and handles PPA addressing.
     2. LightNVM subsystem: a generic layer providing core functionality and target management; exposes the device geometry and both scalar and vectored read/write/erase.
     3. High-level I/O interfaces: a block device through a target such as pblk (with an optional file system on top), application integration through liblightnvm, file systems, and more.
     The stack spans user space (applications), kernel space (file system, pblk, the LightNVM subsystem, the NVMe device driver), and hardware (the open-channel SSD). (A small liblightnvm sketch follows below.)
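
As a rough illustration of the application-integration path (interface 3 above), the sketch below opens an open-channel device with liblightnvm and prints the geometry the subsystem exposes. It assumes liblightnvm's nvm_dev_open, nvm_dev_get_geo, nvm_geo_pr, and nvm_dev_close entry points and a /dev/nvme0n1 device path; consult the library headers for the exact signatures in your version.

```c
#include <stdio.h>
#include <liblightnvm.h>

int main(void)
{
    /* Assumed device path: an open-channel namespace exposed by the
     * NVMe driver and managed by the LightNVM subsystem. */
    struct nvm_dev *dev = nvm_dev_open("/dev/nvme0n1");
    if (!dev) {
        perror("nvm_dev_open");
        return 1;
    }

    /* Report the geometry (groups, PUs, chunks, block sizes, ...). */
    const struct nvm_geo *geo = nvm_dev_get_geo(dev);
    nvm_geo_pr(geo);

    nvm_dev_close(dev);
    return 0;
}
```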

  13. pblk - Host-Side Flash Translation Layer
     • Mapping table (L2P): logical block granularity.
     • Write buffering: a lockless circular buffer with multiple producers and a single consumer (the write thread).
     • Read path: look up the L2P table; on a cache hit the read is served from the write buffer, otherwise it goes to the device.
     • Write path: add the entry to the write buffer; the write thread drains it through the LightNVM subsystem and the NVMe device driver to the open-channel SSD.
     • Error handling: device write and reset errors.
     • Garbage collection (with a rate-limiting thread): refresh data and rewrite chunks.
     (A simplified read-path sketch follows below.)
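
Below is a simplified, single-threaded sketch of the read-path idea: the L2P table maps each logical block either to a device address or to an entry still sitting in the host-side write buffer, in which case the read is served from the buffer (a cache hit). All names and structures are illustrative; pblk's real buffer is lockless and multi-producer, and the device read here is a stub.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLK_SZ      4096u
#define BUF_ENTRIES 1024u

/* An L2P entry points either at a device address or at a slot in the
 * write buffer that has not been flushed yet. */
struct l2p_entry {
    enum { L2P_EMPTY, L2P_DEVICE, L2P_CACHE } where;
    union {
        uint64_t dev_addr;   /* device (physical) address        */
        uint32_t buf_slot;   /* index into the write buffer ring */
    } u;
};

struct pblk_like {
    struct l2p_entry *l2p;               /* indexed by logical block */
    uint8_t wbuf[BUF_ENTRIES][BLK_SZ];   /* circular write buffer    */
};

/* Stub standing in for an I/O issued through the LightNVM subsystem. */
static int device_read(uint64_t dev_addr, void *dst)
{
    (void)dev_addr;
    memset(dst, 0, BLK_SZ);
    return 0;
}

/* Read one logical block: cache hit -> copy from the write buffer,
 * otherwise issue a device read at the mapped address. */
static int read_lblk(struct pblk_like *p, uint64_t lblk, void *dst)
{
    struct l2p_entry *e = &p->l2p[lblk];

    switch (e->where) {
    case L2P_CACHE:
        memcpy(dst, p->wbuf[e->u.buf_slot], BLK_SZ);
        return 0;
    case L2P_DEVICE:
        return device_read(e->u.dev_addr, dst);
    default:
        return -1;   /* never written */
    }
}

int main(void)
{
    static struct l2p_entry table[8];
    static struct pblk_like p;
    uint8_t out[BLK_SZ];

    p.l2p = table;

    /* Simulate a buffered (not yet flushed) write of logical block 3. */
    memcpy(p.wbuf[0], "hello", 6);
    table[3].where = L2P_CACHE;
    table[3].u.buf_slot = 0;

    if (read_lblk(&p, 3, out) == 0)
        printf("lblk 3 served from write buffer: %s\n", out);
    return 0;
}
```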

  14. Experimentation
     • Drive: CNEX Labs Open-Channel SSD; NVMe, Gen3 x8, 2TB MLC NAND; implements the Open-Channel 1.2 specification.
     • Parallelism: 16 channels, 8 parallel units per channel (128 PUs in total).
     • Parallel unit characteristics: minimum write size 16K + 64B OOB; 1,067 chunks per PU; chunk size 16MB.
     • Throughput per parallel unit: write 47MB/s; read 108MB/s (4K) and 280MB/s (64K).
     (A capacity sanity check follows below.)
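
As a sanity check on the stated geometry: 1,067 chunks of 16MB per parallel unit is roughly 17GB per PU, and 128 PUs give roughly 2.2TB of raw NAND, in line with the advertised 2TB drive. The throwaway calculation below assumes decimal units, which the slide does not state.

```c
#include <stdio.h>

int main(void)
{
    /* Figures from slide 14 (decimal MB/GB/TB assumed). */
    const double chunk_mb  = 16.0;    /* MB per chunk        */
    const double chunks_pu = 1067.0;  /* chunks per PU       */
    const double pus       = 128.0;   /* 16 channels x 8 PUs */

    double per_pu_gb = chunk_mb * chunks_pu / 1000.0;  /* ~17 GB  */
    double total_tb  = per_pu_gb * pus / 1000.0;       /* ~2.2 TB */

    printf("per PU: %.1f GB, total: %.2f TB\n", per_pu_gb, total_tb);
    return 0;
}
```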

  15. Base Performance: Throughput and Latency. [Figure: throughput and latency versus request I/O size; random-read throughput is slightly lower than write and grows with parallelism]

  16. Limit the Number of Active Writers
     Limiting the number of writers improves read latency, but requires a priori knowledge of the workload.
     [Figure: single read or write performance versus mixed read/write with writes held at 200MB/s (256K writes at QD1, 256K reads at QD16); write latency increases while read latency drops]

  17. Multi-Tenant Workloads
     [Figure: NVMe SSD versus OCSSD with 2 tenants (1 writer / 1 reader), 4 tenants (3W/1R), and 8 tenants (7W/1R)]
     Source: Javier González and Matias Bjørling, "Multi-Tenant I/O Isolation with Open-Channel SSDs", NVMW '17.

  18. Lessons Learned
     1. Warranty to end users: users have direct access to the media.
     2. Media characterization is complex and must be performed for each type of NAND memory: abstract the media behind a "clean" interface.
     3. Write buffering: for MLC/TLC media, write buffering is required; decide whether it belongs in the host or in the device.
     4. Application-agnostic wear leveling is mandatory: expose statistics so the host can make appropriate decisions.

  19. Conclusion
     Contributions:
     • A new storage interface between host and drive.
     • The Linux kernel LightNVM subsystem.
     • pblk: a host-side flash translation layer for open-channel SSDs.
     • Demonstration of an open-channel SSD.
     LightNVM timeline:
     • Initial release of the subsystem with Linux kernel 4.4 (January 2016).
     • User-space library (liblightnvm) support upstream in Linux kernel 4.11 (April 2017).
     • pblk available in Linux kernel 4.12 (July 2017).
     • Open-Channel SSD 2.0 specification released (January 2018), with support available from Linux kernel 4.17 (May 2018).

  20. Thank You
