WORT: Write Optimal Radix Tree for Persistent Memory Storage Systems

Se Kwon Lee, K. Hyun Lim¹, Hyunsub Song, Beomseok Nam, Sam H. Noh (UNIST, ¹Hongik University)


  1. WORT: Write Optimal Radix Tree for Persistent Memory Storage Systems
     Se Kwon Lee, K. Hyun Lim¹, Hyunsub Song, Beomseok Nam, Sam H. Noh
     UNIST, ¹Hongik University

  2. Persistent Memory (PM)
     § Persistent memory is expected to replace both DRAM & NAND

                          NAND        STT-MRAM    PCM         DRAM
     Non-volatility       yes         yes         yes         no
     Read (ns)            2.5 × 10^4  5 - 30      20 - 70     10
     Write (ns)           2 × 10^5    10 - 100    150 - 220   10
     Byte-addressable     no          yes         yes         yes
     Density (Gbit/cm²)   185.8       0.36        13.5        9.1

     Source: K. Suzuki and S. Swanson, "A Survey of Trends in Non-Volatile Memory Technologies: 2000-2014", IMW 2015
     [Figure: persistent memory combines non-volatility with high performance]

  3. Indexing Structure for PM Storage Systems
     [Figure: example B+Tree]

  4. Consistency Issue of B+tree in PM
     § B+tree is a block-based index
       • Key sorting → Block granularity write
       • Rebalancing → Multi-block granularity write
     § Persistent memory
       • Byte-addressable → Byte granularity write
       • Write reordering
     → Can result in a consistency problem

  5. Consistency Issue of B+tree in PM
     § Traditional case
     [Figure: key 31 is inserted into a node holding 30 and 35. Write reordering happens between the volatile CPU caches and DRAM, where data is not persistent; the non-volatile block-based storage is updated at block granularity.]

  6. Consistency Issue of B+tree in PM
     § PM case
     [Figure: key 31 is inserted into a node holding 30 and 35. Updates move from the volatile CPU caches to non-volatile persistent memory at byte granularity; with write reordering, a crash can leave garbage data persistently stored.]

  7. Primitives for Data Consistency in PM
     § Durability
       • CLFLUSH (Flush cache line)
         − Can be reordered
     § Ordering
       • MFENCE (Load and Store fence)
         − Orders CPU cache line flush instructions
     [Figure: volatile CPU caches above non-volatile persistent memory]

  8. Primitives for Data Consistency in PM
     § Durability
       • CLFLUSH (Flush cache line)
         − Can be reordered
     § Ordering
       • MFENCE (Load and Store fence)
         − Orders CPU cache line flush instructions
     Serialization of CLFLUSH and MFENCE is known to cause large overhead
     [Figure: volatile CPU caches above non-volatile persistent memory]
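
The two primitives on this slide are usually combined into one "persist" helper. The following is a minimal sketch, not code from the presentation: the name `persist`, the 64-byte cache-line size, and the fence-only fallback for non-x86 builds are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>              /* _mm_clflush, _mm_mfence */
#endif

#define CACHE_LINE 64               /* assumed cache-line size in bytes */

/* Full fence: MFENCE on x86, compiler builtin elsewhere (assumption). */
static void fence(void)
{
#if defined(__SSE2__)
    _mm_mfence();
#else
    __sync_synchronize();
#endif
}

/* Flush one cache line; a no-op on builds without SSE2 (assumption). */
static void flush_line(const void *line)
{
#if defined(__SSE2__)
    _mm_clflush(line);
#else
    (void)line;
#endif
}

/* Hypothetical helper: flush every cache line that overlaps
 * [addr, addr + len), fenced on both sides so the flushes cannot be
 * reordered with surrounding stores.
 * Returns the number of cache lines that were covered. */
static size_t persist(const void *addr, size_t len)
{
    assert(addr != NULL || len == 0);
    uintptr_t line = (uintptr_t)addr & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end  = (uintptr_t)addr + len;
    size_t lines = 0;

    fence();
    for (; line < end; line += CACHE_LINE) {
        flush_line((const void *)line);
        lines++;
    }
    fence();
    return lines;
}
```

Every call pays for two fences plus one flush per cache line, which is the serialization cost the slide refers to.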

  9. Primitives for Data Consistency in PM
     § Atomicity
       • 8-byte failure atomicity
         − Needs only CLFLUSH
       • Logging or CoW based atomicity (more than 8 bytes)
         − Requires duplicate copies
     [Figure: inserting key 31 into a node holding 30 and 35, with a log area and a data area in non-volatile memory]

  10. Primitives for Data Consistency in PM
      § Atomicity
        • 8-byte failure atomicity
          − Needs only CLFLUSH
        • Logging or CoW based atomicity (more than 8 bytes)
          − Requires duplicate copies
      Logging increases cache line flush overhead
      [Figure: inserting key 31 into a node holding 30 and 35, with a log area and a data area in non-volatile memory]
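
The contrast between the two atomicity options can be sketched as follows. This is an illustration under assumptions, not the authors' code: `Block`, `flush_stub`, `update_word`, and `update_block_cow` are hypothetical names, `flush_stub` merely stands in for CLFLUSH plus MFENCE, and `__atomic_store_n` is the GCC/Clang builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record that is larger than 8 bytes. */
typedef struct {
    uint64_t keys[3];
    uint64_t count;
} Block;

/* Stand-in for CLFLUSH + MFENCE on real persistent memory. */
static void flush_stub(const void *p, size_t n) { (void)p; (void)n; }

/* 8-byte case: one aligned store, then one flush. No copy is needed. */
static void update_word(uint64_t *slot, uint64_t value)
{
    assert(((uintptr_t)slot & 7) == 0);          /* must be 8-byte aligned */
    __atomic_store_n(slot, value, __ATOMIC_RELEASE);
    flush_stub(slot, sizeof *slot);
}

/* Larger case, copy-on-write style: write a duplicate, flush it, and only
 * then swing the 8-byte pointer to it. The duplicate is the extra cost. */
static void update_block_cow(Block **slot, Block *fresh, const Block *newval)
{
    memcpy(fresh, newval, sizeof *fresh);        /* duplicate copy */
    flush_stub(fresh, sizeof *fresh);            /* copy durable first */
    __atomic_store_n(slot, fresh, __ATOMIC_RELEASE);
    flush_stub(slot, sizeof *slot);              /* then the pointer */
}
```

In the copy-on-write path the old block stays untouched until the pointer is switched, so a crash at any point leaves either the old or the new version reachable.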

  11. B+tree Variants for Persistent Memory
      How can we ensure consistency using failure-atomic writes without logging?
      § Unsorted keys → Append-only with metadata
      § Failure-atomic update of metadata
        • wB+Tree (VLDB '15): slot array and bitmap
        • NVTree (FAST '15): append-only entries with a (+/-) flag and an entry count
        • FPTree (SIGMOD '16): fingerprints, bitmap, and next pointer
      Unsorted keys → Decreased search performance
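
The append-only idea can be illustrated with a small sketch. It is loosely modeled on the "entries plus count" layout named on the slide and is not the actual node format of any of the three trees; `Leaf`, `LEAF_CAP`, `leaf_append`, `leaf_find`, and `persist_stub` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LEAF_CAP 8                      /* assumed leaf capacity */

typedef struct { uint64_t key, val; } Entry;

typedef struct {
    uint64_t count;                     /* 8-byte metadata word */
    Entry    e[LEAF_CAP];               /* unsorted, append-only */
} Leaf;

/* Stand-in for CLFLUSH + MFENCE on real persistent memory. */
static void persist_stub(const void *p, size_t n) { (void)p; (void)n; }

/* Append an entry, then commit it by bumping the 8-byte count.
 * Returns 0 on success, -1 if the leaf is full (a split would be needed). */
static int leaf_append(Leaf *l, uint64_t key, uint64_t val)
{
    assert(l != NULL);
    uint64_t n = l->count;
    if (n >= LEAF_CAP)
        return -1;
    l->e[n].key = key;                  /* invisible until count changes */
    l->e[n].val = val;
    persist_stub(&l->e[n], sizeof l->e[n]);
    __atomic_store_n(&l->count, n + 1, __ATOMIC_RELEASE);   /* commit */
    persist_stub(&l->count, sizeof l->count);
    return 0;
}

/* Linear scan, newest entry first: the price of keeping keys unsorted.
 * Returns 1 and stores the value in *out if found, else 0. */
static int leaf_find(const Leaf *l, uint64_t key, uint64_t *out)
{
    for (uint64_t i = l->count; i-- > 0; ) {
        if (l->e[i].key == key) {
            *out = l->e[i].val;
            return 1;
        }
    }
    return 0;
}
```

A crash before the count update leaves the new entry outside the visible range, so no log is needed for a plain insert. A full leaf still needs a split, and the sketch does not cover that case.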

  12. B+tree Variants for Persistent Memory
      § Logging still necessary
        • Multi-block granularity updates due to node splits and merges
          − Cannot update atomically
        • Logging-based solution
          − wB+Tree, FPTree
        • Tree reconstruction based solution
          − NVTree
      → Large overhead
      [Figure: a new key overflows a node, which is then split]

  13. B+tree Variants for Persistent Memory
      Fundamental characteristics of B+tree cause problems
      § Key sorting
      § Rebalancing
      [Figures: a sorted insert of key 31 into a node holding 30 and 35; a node overflow and split caused by a new key]

  14. B+tree Variants for Persistent Memory
      Fundamental characteristics of B+tree cause problems
      § Key sorting
      § Rebalancing
      Why use B+ trees in the first place?
      Perhaps there is a better tree data structure more suited for PM?
      [Figures: a sorted insert of key 31 into a node holding 30 and 35; a node overflow and split caused by a new key]

  15. Our Contributions
      § Show that the radix tree is a suitable data structure for PM
      § Propose optimal radix tree variants WORT and WOART
        • WORT: Write Optimal Radix Tree
        • WOART: Write Optimal redesigned Adaptive Radix Tree (ART)
          − Optimal: maintains consistency with only a single failure-atomic write, without any duplicate copies

  16. Radix Tree
      § Deterministic structure
      [Figure: radix tree storing the keys ACA, ACC, ACZ, and CAC]

  17. Radix Tree
      § Deterministic structure
        • No key comparison
      [Figure: radix tree storing the keys ACA, ACC, ACZ, and CAC]

  18. Radix Tree
      § Deterministic structure
        • No key comparison
          − Only 8-byte pointer entries
          − Implicitly stored keys
      [Figure: radix tree storing the keys ACA, ACC, ACZ, and CAC; each node entry is an 8-byte pointer]

  19. Radix Tree
      § Deterministic structure
        • No key comparison
          − Only 8-byte pointer entries
          − Implicitly stored keys
          − No problem caused by key sorting
      [Figure: radix tree storing the keys ACA, ACC, ACZ, and CAC]

  20. Radix Tree
      § Deterministic structure
        • No key comparison
          − Only 8-byte pointer entries
          − Implicitly stored keys
          − No problem caused by key sorting
        • No modification of other keys
          − Single 8-byte pointer write per node
          − Easy to use failure-atomic write
      [Figure: radix tree storing the keys ACA, ACC, ACZ, and CAC]
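
A plain radix tree insert along these lines might look like the sketch below. It is an assumption-laden illustration rather than the WORT implementation: keys are fixed at three bytes to match the ACA/ACC/ACZ/CAC example, the fanout is 256, and `Node`, `radix_insert`, `radix_lookup`, and `persist_stub` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define FANOUT  256                 /* one slot per byte value (assumption) */
#define KEY_LEN 3                   /* fixed key length, as in "ACA" */

typedef struct Node { void *child[FANOUT]; } Node;

/* Stand-in for CLFLUSH + MFENCE on real persistent memory. */
static void persist_stub(const void *p, size_t n) { (void)p; (void)n; }

static Node *node_new(void) { return calloc(1, sizeof(Node)); }

/* Insert without comparing or moving any existing key: each level is
 * indexed directly by one key byte, and each change is one pointer write.
 * Returns 0 on success, -1 on allocation failure. */
static int radix_insert(Node *root, const unsigned char *key, void *value)
{
    assert(root != NULL && key != NULL);
    Node *n = root;
    for (int d = 0; d < KEY_LEN - 1; d++) {
        void **slot = &n->child[key[d]];
        if (*slot == NULL) {
            Node *fresh = node_new();
            if (fresh == NULL)
                return -1;
            persist_stub(fresh, sizeof *fresh);  /* durable before reachable */
            *slot = fresh;                       /* single 8-byte write */
            persist_stub(slot, sizeof *slot);
        }
        n = *slot;
    }
    void **leaf = &n->child[key[KEY_LEN - 1]];
    *leaf = value;                               /* single 8-byte write */
    persist_stub(leaf, sizeof *leaf);
    return 0;
}

/* Returns the stored value, or NULL if the key is absent. */
static void *radix_lookup(const Node *root, const unsigned char *key)
{
    const Node *n = root;
    for (int d = 0; d < KEY_LEN - 1 && n != NULL; d++)
        n = n->child[key[d]];
    return n != NULL ? n->child[key[KEY_LEN - 1]] : NULL;
}
```

Because a key's position is fixed by its bytes, inserting CAC never touches the slots that hold ACA, ACC, or ACZ.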

  21. Problem of Deterministic Structure
      § For sparse key distribution
        • Wastes excessive memory space → Optimized through path compression
      [Figure: densely populated nodes (high utilization) versus sparsely populated nodes (low utilization)]

  22. Path Compression in Radix Tree
      § Path compression
        • Search paths that do not need to be distinguished can be removed
      [Figure: radix tree storing ACA, ACC, ACZ, and CAC, with an unnecessary search path highlighted]

  23. Path Compression in Radix Tree
      § Path compression
        • Common search path is compressed in header
        • Improves memory utilization & indexing performance
      [Figure: a single node whose compression header holds the common prefix AC, with children A, C, and Z for the keys ACA, ACC, and ACZ]
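
A lookup over path-compressed nodes can be sketched as follows. This is an illustration under assumptions, not the presentation's code: `CNode`, `MAX_PREFIX`, and `cradix_lookup` are hypothetical names, all keys are assumed to have the same length, and the last key byte is assumed to index a value rather than a child node.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FANOUT     256
#define MAX_PREFIX 7                /* assumed prefix capacity per node */

typedef struct CNode {
    unsigned char prefix_len;               /* bytes held in the header */
    unsigned char prefix[MAX_PREFIX];       /* compressed common path */
    void *child[FANOUT];
} CNode;

/* Walk the tree: at each node, first match the compressed prefix stored
 * in the header, then index the child array with the next key byte.
 * Returns the stored value, or NULL if the key is absent. */
static void *cradix_lookup(const CNode *n, const unsigned char *key,
                           size_t key_len)
{
    assert(key != NULL);
    size_t pos = 0;
    while (n != NULL) {
        if (pos + n->prefix_len >= key_len)
            return NULL;                    /* key too short for this path */
        if (memcmp(key + pos, n->prefix, n->prefix_len) != 0)
            return NULL;                    /* prefix mismatch */
        pos += n->prefix_len;
        void *next = n->child[key[pos++]];
        if (pos == key_len)
            return next;                    /* last byte selects the value */
        n = next;
    }
    return NULL;
}
```

With the prefix AC held in the header, the keys ACA, ACC, and ACZ are resolved in a single node instead of three levels.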

  24. Node Split with Path Compression
      § Path compression split
        • AZA is to be inserted
        • Prefix keys are not equal: AZ != AC
      [Figure: node with compressed prefix AC and children A, C, and Z for the keys ACA, ACC, and ACZ]

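  25. Node Split with Path Compression
      § Path compression split
        ① New parent allocation
      [Figure: a new parent with prefix A and children C and Z is allocated above the old node; AZA hangs off Z, and the old node still holds ACA, ACC, and ACZ]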

  26. Node Split with Path Compression
      § Path compression split
        ② Decompression of old common prefix
      [Figure: the old node under the new parent, before its common prefix AC is decompressed]

  27. Node Split with Path Compression
      § Path compression split
        ② Old common prefix update
      However, this split process causes a consistency problem in PM.
      [Figure: the old node under the new parent, with its common prefix updated]

  28. Path compression: Problem in PM

  29. Consistency Issue of Path Compression
      § Path compression split
        • Causes updates of multiple nodes
        • Has to employ expensive logging methods
      [Figure: consistent state before the split; after a crash in the middle of the split, the old node still carries the prefix AC under the new parent, which is an inconsistent state]

  30. Path compression: Solution

  31. WORT (Write-Optimal Radix Tree) for PM
      § Failure-atomic path compression
        • Add node depth field to compression header
      Compression header (8 bytes):
          struct Header {
              unsigned char depth;
              unsigned char PrefixArr[7];
          };
      [Figure: node with header (depth 0, prefix AC) and children A, C, and Z for the keys ACA, ACC, and ACZ]
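
Because the header is exactly 8 bytes, it can be replaced with one aligned store. The sketch below illustrates that point under assumptions: the struct follows the slide, but `header_pack`, `header_unpack`, and `header_store` are hypothetical helpers, the prefix is assumed to be NUL-padded because the slide shows no length field, and `__atomic_store_n` is the GCC/Clang builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct Header {
    unsigned char depth;
    unsigned char PrefixArr[7];     /* assumed NUL-padded */
} Header;

/* Compile-time check that the header fits one 8-byte word. */
typedef char header_is_8_bytes[sizeof(Header) == 8 ? 1 : -1];

static uint64_t header_pack(Header h)
{
    uint64_t w;
    memcpy(&w, &h, sizeof w);
    return w;
}

static Header header_unpack(uint64_t w)
{
    Header h;
    memcpy(&h, &w, sizeof h);
    return h;
}

/* Build a whole new header and publish it with a single 8-byte store,
 * so depth and prefix always change together. */
static void header_store(uint64_t *slot, unsigned char depth,
                         const unsigned char *prefix, size_t len)
{
    assert(len <= sizeof ((Header *)0)->PrefixArr);
    Header h;
    memset(&h, 0, sizeof h);
    h.depth = depth;
    memcpy(h.PrefixArr, prefix, len);
    __atomic_store_n(slot, header_pack(h), __ATOMIC_RELEASE);
    /* On real PM, CLFLUSH + MFENCE on slot would follow here. */
}
```

A reader therefore sees either the old header or the new one, never a mix of a new depth with an old prefix.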

  32. WORT (Write-Optimal Radix Tree) for PM
      § Failure-atomic path compression
        • Add node depth field to compression header
        • AZA is to be inserted
      [Figure: node with an 8-byte compression header (depth 0, prefix AC) and children A, C, and Z for the keys ACA, ACC, and ACZ]

  33. WORT (Write-Optimal Radix Tree) for PM
      § Failure-atomic path compression
        • Add node depth field to compression header
        ② Decompression of old common prefix
      [Figure: in the consistent state the old node's 8-byte header is updated to depth 2; after a crash before that update, the old node still holds (depth 0, prefix AC), which is an inconsistent state]

  34. WORT (Write-Optimal Radix Tree) for PM
      § Failure-atomic path compression
        • Failure detection in WORT
          − Depth in a header ≠ Counted depth → Crashed header
      [Figure: in the inconsistent state the old node's header still reads (depth 0, prefix AC), which is not equal to the expected tree depth (2)]

  35. WORT (Write-Optimal Radix Tree) for PM
      § Failure-atomic path compression
        • Failure recovery in WORT
          − Compression header can be reconstructed → Atomically overwrite
      [Figure: the old node's stale header (depth 0, prefix AC) is atomically overwritten with a header of depth 2, restoring the consistent state]
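
Detection and recovery can be sketched together. The slides say only that the header "can be reconstructed"; the rule used below, dropping from the old prefix as many bytes as the depth difference, is an assumption made for illustration, as are the names `header_load`, `header_word`, and `header_check_and_repair` and the NUL-padded prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct Header {
    unsigned char depth;
    unsigned char PrefixArr[7];     /* assumed NUL-padded */
} Header;

typedef char header_is_8_bytes[sizeof(Header) == 8 ? 1 : -1];

static Header header_load(const uint64_t *slot)
{
    Header h;
    memcpy(&h, slot, sizeof h);
    return h;
}

static uint64_t header_word(Header h)
{
    uint64_t w;
    memcpy(&w, &h, sizeof w);
    return w;
}

/* counted_depth is the depth obtained while walking down from the root.
 * Returns 0 if the header was consistent, 1 if it was stale and has been
 * rebuilt and atomically overwritten. */
static int header_check_and_repair(uint64_t *slot, unsigned char counted_depth)
{
    Header old = header_load(slot);
    assert(counted_depth >= old.depth);
    if (old.depth == counted_depth)
        return 0;                               /* consistent */

    /* Length of the stale prefix (NUL-padded by assumption). */
    size_t len = 0;
    while (len < sizeof old.PrefixArr && old.PrefixArr[len] != 0)
        len++;

    /* Bytes now covered by the new parent (assumed recovery rule). */
    size_t drop = (size_t)(counted_depth - old.depth);
    if (drop > len)
        drop = len;

    Header fixed;
    memset(&fixed, 0, sizeof fixed);
    fixed.depth = counted_depth;
    memcpy(fixed.PrefixArr, old.PrefixArr + drop, len - drop);

    __atomic_store_n(slot, header_word(fixed), __ATOMIC_RELEASE);
    /* On real PM, CLFLUSH + MFENCE on slot would follow here. */
    return 1;
}
```

For the slide's example, a stale header (depth 0, prefix AC) found at counted depth 2 becomes (depth 2, empty prefix), and the repair itself is a single 8-byte store.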

  36. Write Optimal Data Structure for PM
      § Our proposed radix tree variant is optimal for PM
        • Consistency is always guaranteed with a single 8-byte failure-atomic write, without any additional copies for logging or CoW
      § WORT (Write Optimal Radix Tree)
        1. Failure-atomic path compression
      § WOART (Write Optimal Adaptive Radix Tree)
        1. Failure-atomic path compression
        2. Redesigned adaptive node

  37. Evaluation
      § Experimental environment

      System configuration   Description
      CPU                    Intel Xeon E5-2620V3 × 2
      OS                     Linux CentOS 6.6 (64-bit), kernel v4.7.0
      PM                     Emulated with 256 GB DRAM; write latency emulated by injecting additional stall cycles

  38. Evaluation
      § Experimental environment: comparison group
        • Radix tree variants: WORT
        • B+tree variants: wB+Tree (VLDB '15), NVTree (FAST '15), FPTree (SIGMOD '16)
      [Figure: placement of tree nodes in DRAM and PM]
