RECIPE : Converting Concurrent DRAM Indexes to Persistent-Memory - - PowerPoint PPT Presentation

SLIDE 1

RECIPE : Converting Concurrent DRAM Indexes to Persistent-Memory Indexes

Se Kwon Lee, Jayashree Mohan, Sanidhya Kashyap*, Taesoo Kim, Vijay Chidambaram


*On the job market

SLIDE 2

Persistent Memory (PM)

Intel Optane DC Persistent Memory

  • New storage class memory technology
  • Performance similar to DRAM
  • Non-volatile & high-capacity
  • Up to 6 TB on a single machine
SLIDE 3

Indexing on PM

  • PM has high capacity and low latency
  • 6 TB on a single machine → 100 billion 64-byte key-value pairs
  • Indexing data on PM is crucial for efficient data access

SLIDE 4

PM Indexes

  • PM indexes need to achieve three goals simultaneously:
    Crash Consistency, Concurrency, and Cache Efficiency

SLIDE 5

PM Indexes

  • Cache Efficiency
  • Persistent memory is attached to the memory bus
  • 3x higher latency than DRAM → more cache-sensitive

SLIDE 6

PM Indexes

  • Concurrency
  • High concurrency is necessary for scalability on any modern multicore platform

SLIDE 7

PM Indexes

  • Crash Consistency
  • The CPU cache is still volatile
  • Arbitrarily evicted cache lines → persistence reordering

SLIDE 8

PM Indexes

  • Crash Consistency
  • The CPU cache is still volatile
  • Arbitrarily evicted cache lines → persistence reordering

[Diagram: program order ① write(log); ② write(commit); both writes still sit in the volatile CPU cache, and nothing has reached PM yet]

SLIDE 9

PM Indexes

  • Crash Consistency
  • The CPU cache is still volatile
  • Arbitrarily evicted cache lines → persistence reordering

[Diagram: persistence reordering; write(log); write(commit); but commit is evicted to PM before log]

SLIDE 10

PM Indexes

  • Crash Consistency
  • The CPU cache is still volatile
  • Arbitrarily evicted cache lines → persistence reordering

[Diagram: commit is persisted while log is not; a crash at this point leaves PM in an inconsistent state]

SLIDE 11

PM Indexes

  • Crash Consistency
  • The CPU cache is still volatile
  • Arbitrarily evicted cache lines → persistence reordering
  • Flush: persist writes to PM
  • Fence: ensure one write is persisted before another

[Diagram: consistent persistence ordering; write(log); flush(log); fence(); write(commit); flush(commit); fence()]
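The log/commit ordering on this slide can be sketched with a toy model. Everything here (`PMModel`, `write`, `flush`, `crash`) is an invented name for illustration, not the talk's API; in this synchronous model the fence is implicit in each `flush` call, whereas on real hardware a fence instruction is what orders one flush before the next.

```python
# A toy model of a volatile CPU cache in front of persistent memory.
# Stores land in the cache; only flushed lines reach PM; a crash wipes
# the cache. (Names are illustrative, not from the talk.)

class PMModel:
    def __init__(self):
        self.cache = {}   # volatile: lost on crash
        self.pm = {}      # persistent: survives crash

    def write(self, line, val):
        self.cache[line] = val            # store goes to the cache first

    def flush(self, line):
        self.pm[line] = self.cache[line]  # force the cache line to PM

    def crash(self):
        self.cache.clear()                # unflushed writes are gone

# Persistence reordering: commit reaches PM, log does not.
bad = PMModel()
bad.write("log", "insert k=5"); bad.write("commit", 1)
bad.flush("commit")                       # eviction order != program order
bad.crash()
assert "commit" in bad.pm and "log" not in bad.pm   # inconsistent

# Consistent ordering: flush (and fence) the log before writing commit.
good = PMModel()
good.write("log", "insert k=5"); good.flush("log")
good.write("commit", 1); good.flush("commit")
good.crash()
assert "log" in good.pm and "commit" in good.pm     # consistent
```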

SLIDE 12

Challenge in building PM indexes

  • Correctness condition: return previously inserted data without data loss or corruption

[Diagram: correct concurrency (consistent data under conflicting threads) and correct crash consistency (consistent data after a crash)]

SLIDE 13

Challenge in building PM indexes

  • Concurrency and crash consistency interact with each other; a bug in either can lead to data loss

SLIDE 14

Bugs in Concurrent PM Indexes

  • We found bugs in FAST&FAIR [FAST'18] and CCEH [FAST'19]
  • FAST&FAIR: concurrent PM-based B+Tree
  • One bug in the concurrency mechanism
  • Two bugs in the recovery mechanism
  • CCEH: concurrent PM-based dynamic hash table
  • One bug in the concurrency mechanism
  • One bug in the recovery mechanism

SLIDE 15

How can we reduce the effort involved in building concurrent, crash-consistent PM indexes?

Answer: We can convert concurrent DRAM indexes to PM indexes with low effort

Insight: Isolation and crash consistency are similar

SLIDE 16

How can we reduce the effort involved in building concurrent, crash-consistent PM indexes?

Approach: Convert concurrent DRAM indexes to PM indexes with low effort

Insight: Isolation and crash consistency are similar

SLIDE 17

DRAM Index

  • Already designed for cache efficiency and concurrency

[Timeline, 1986–2019: T-Tree, CSS-Tree, CSB+Tree, BD-Tree, FAST, Bw-Tree, ART, HOT, Masstree, CLHT; DRAM indexes evolving toward cache efficiency and concurrency]

SLIDE 18

DRAM Index

[Diagram: a DRAM index is concurrent and cache-efficient but volatile, which makes it crash-vulnerable]

SLIDE 19

Challenge in Conversion

  • Require minimal changes to the DRAM index
  • Without modifying the original design principles of the DRAM index

[Diagram: conversion takes a volatile DRAM index (concurrency, cache efficiency) to a PM index (concurrency, cache efficiency, crash consistency)]

SLIDE 20

Insight for Conversion

  • Similar semantics between isolation and crash consistency¹
  • Isolation: return consistent data while multiple active threads are running
  • Crash consistency: return consistent data even after a crash happens at any point

1. Steven Pelley et al., Memory Persistency, ISCA'14

SLIDE 21

Insight for Conversion

  • Similar semantics between isolation and crash consistency¹
  • Isolation: return consistent data while multiple active threads are running
  • Crash consistency: return consistent data even after a crash happens at any point
  • Approach: reuse the mechanisms for isolation in DRAM indexes to obtain crash consistency

1. Steven Pelley et al., Memory Persistency, ISCA'14

SLIDE 22

RECIPE

  • Principled approach to convert DRAM indexes into PM indexes
  • Case study of converting five popular DRAM indexes
  • Conversion covers different data structures: hash tables, B+Trees, and radix trees
  • Conversion required modifying <= 200 LOC
  • Up to 5.2x better performance in multi-threaded evaluation

https://github.com/utsaslab/RECIPE

SLIDE 23

Outline

  • Overall Intuition
  • Conversion Conditions
  • Conversion Example: Masstree
  • Assumptions & Limitations
  • Evaluation

SLIDE 24

Outline

  • Overall Intuition
  • Conversion Conditions
  • Conversion Example: Masstree
  • Assumptions & Limitations
  • Evaluation

SLIDE 25

Overall Intuition for Conversion

  • Blocking algorithms
  • Use explicit locks to prevent conflicting thread access to shared data
  • Non-blocking algorithms
  • Use well-defined invariants and ordering constraints without locks
  • Employed by most high-performance DRAM indexes
SLIDE 26

Overall Intuition for Conversion

  • Non-blocking algorithms
  • Readers Detect and Tolerate inconsistencies
  • E.g., ignore duplicate keys

[Diagram: a reader detects an inconsistency and tolerates it]

SLIDE 27

Overall Intuition for Conversion

  • Non-blocking algorithms
  • Readers Detect and Tolerate inconsistencies
  • E.g., ignore duplicate keys
  • Writers also Detect, but Fix inconsistencies
  • E.g., eliminate duplicate keys

[Diagram: a writer detects an inconsistency and fixes it, restoring a consistent state]
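The reader-tolerate / writer-fix pattern above can be sketched with the slide's duplicate-key example. The flat list `index` and the helper names below are assumptions made for illustration; they are not code from RECIPE or any of the converted indexes.

```python
# Minimal sketch of the non-blocking pattern: a duplicate "b" entry is
# the leftover of a half-finished update. Readers tolerate it; writers
# detect and fix it. (Names are illustrative.)

index = [("a", 1), ("b", 2), ("b", 2)]   # duplicate "b": inconsistency

def lookup(key):
    # Reader: tolerates the duplicate by returning the first match;
    # it never repairs the structure and needs no lock.
    for k, v in index:
        if k == key:
            return v
    return None

def insert(key, val):
    # Writer: detects stale copies of the key and fixes them (removes
    # duplicates) before installing the new value.
    global index
    index = [(k, v) for k, v in index if k != key]
    index.append((key, val))

assert lookup("b") == 2                  # reader tolerates the duplicate
insert("b", 3)                           # writer detects and fixes
assert [kv for kv in index if kv[0] == "b"] == [("b", 3)]
```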

SLIDE 28

Overall Intuition for Conversion

  • Non-blocking algorithms
  • Readers Detect and Tolerate inconsistencies
  • Writers also Detect, but Fix inconsistencies
  • Helping mechanism¹ ≈ Crash Recovery²
  • Such indexes are *inherently* crash consistent

1. Keren Censor-Hillel et al., Help!, PODC'15
2. Ryan Berryhill et al., Robust shared objects for non-volatile main memory, OPODIS'15

SLIDE 29

  • Not all DRAM indexes can be converted with low effort
  • Exploit inherent crash recovery in the index
  • Provide specific conditions that must hold for a DRAM index to be converted
  • Provide a matching conversion action for each condition

SLIDE 30

Outline

  • Overall Intuition
  • Conversion Conditions
  • Conversion Example: Masstree
  • Assumptions & Limitations
  • Evaluation

SLIDE 31

Three Conversion Conditions

  • Condition 1: Updates via Single Atomic Store
  • Condition 2: Writers fix inconsistencies
  • Condition 3: Writers don't fix inconsistencies
  • The conditions are not exhaustive!

SLIDE 32

Three Conversion Conditions

  • Condition 1: Updates via Single Atomic Store
  • Condition 2: Writers fix inconsistencies
  • Condition 3: Writers don't fix inconsistencies

SLIDE 33

Condition 1: Updates via Single Atomic Store

  • Non-blocking readers; (non-blocking or blocking) writers
  • Updates become visible to other threads via a single atomic commit store

[Diagram: steps 1..N are invisible to other threads until the final atomic store; a crash before that store leaves the update invisible]

SLIDE 34

Condition 1: Updates via Single Atomic Store

  • Updates become visible to other threads via a single atomic commit store
  • Conversion: add a flush after each store, and order the final atomic store using fences

[Diagram: a flush after each of steps 1..N; fences around the final atomic store]
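One way to picture the Condition-1 conversion is to record the order of stores, flushes, and fences. The names below (`insert_converted`, the `log` list, `parent_slot`) are invented for this sketch; it models ordering only, not real PM instructions.

```python
# Sketch of the Condition-1 conversion: flush every store that builds the
# new object, fence, then issue the single atomic commit store (itself
# flushed and fenced). "log" records the order operations are issued in.

log = []

def store(x): log.append(("store", x))
def flush(x): log.append(("flush", x))
def fence():  log.append(("fence",))

def insert_converted(fields, commit_slot):
    for f in fields:          # steps 1..N: invisible to other threads
        store(f)
        flush(f)              # conversion: flush after each store
    fence()                   # all field flushes ordered before commit
    store(commit_slot)        # the single atomic commit store
    flush(commit_slot)
    fence()                   # commit durable before insert() returns

insert_converted(["key", "value"], "parent_slot")
# the first fence precedes the commit store, so the new node is fully
# durable before it can become reachable
assert log.index(("fence",)) < log.index(("store", "parent_slot"))
assert log[-1] == ("fence",)
```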

SLIDE 35

Three Conversion Conditions

  • Condition 1: Updates via Single Atomic Store
  • Condition 2: Writers fix inconsistencies
  • Condition 3: Writers don't fix inconsistencies

SLIDE 36

Condition 2: Writers fix inconsistencies

  • Non-blocking readers and writers (no locks held)
  • Readers & writers Detect, Tolerate, and Fix inconsistencies

[Diagram: Writer 1 performs an update as a sequence of ordered deterministic steps ending in a commit step]

SLIDE 37

Condition 2: Writers fix inconsistencies

  • Non-blocking readers and writers (no locks held)
  • Readers & writers Detect, Tolerate, and Fix inconsistencies

[Diagram: a concurrent reader detects Writer 1's intermediate state and tolerates it]

SLIDE 38

Condition 2: Writers fix inconsistencies

  • Non-blocking readers and writers (no locks held)
  • Readers & writers Detect, Tolerate, and Fix inconsistencies

[Diagram: Writer 2 detects the intermediate state left by Writer 1 and fixes it]

SLIDE 39

Condition 2: Writers fix inconsistencies

  • Readers & writers Detect, Tolerate, and Fix inconsistencies
  • Inherently crash recoverable

[Diagram: on PM, Writer 1 performs steps 1–3 toward the commit step]

SLIDE 40

Condition 2: Writers fix inconsistencies

  • Readers & writers Detect, Tolerate, and Fix inconsistencies
  • Inherently crash recoverable

[Diagram: a crash interrupts Writer 1 mid-update]

SLIDE 41

Condition 2: Writers fix inconsistencies

  • Readers & writers Detect, Tolerate, and Fix inconsistencies
  • Inherently crash recoverable

[Diagram: after restart, any thread that detects the leftover intermediate state fixes it; recovery falls out of the normal write path]

SLIDE 42

Condition 2: Writers fix inconsistencies

  • Readers & writers Detect, Tolerate, and Fix inconsistencies
  • Inherently crash recoverable
  • Conversion: add flushes and fences after each store and after specific loads

[Diagram: flushes and fences inserted after each step, including the commit step]

SLIDE 43

Three Conversion Conditions

  • Condition 1: Updates via Single Atomic Store
  • Condition 2: Writers fix inconsistencies
  • Condition 3: Writers don't fix inconsistencies

SLIDE 44

Condition 3: Writers don't fix inconsistencies

  • Non-blocking readers; blocking writers (hold locks)
  • Readers & writers Detect and Tolerate inconsistencies, but do not Fix them

[Diagram: Writer 1 and Writer 2 each perform an update as a sequence of ordered deterministic steps ending in a commit step]

SLIDE 45

Condition 3: Writers don't fix inconsistencies

  • Non-blocking readers; blocking writers (hold locks)
  • Readers & writers Detect and Tolerate inconsistencies, but do not Fix them

[Diagram: Writer 1 and Writer 2 proceed through their steps toward their commit steps]

SLIDE 46

Condition 3: Writers don't fix inconsistencies

  • Non-blocking readers; blocking writers (hold locks)
  • Readers & writers Detect and Tolerate inconsistencies, but do not Fix them

[Diagram: a crash interrupts Writer 1 before its commit step; since writers never fix leftover state, the failure is permanent]

SLIDE 47

Condition 3: Writers don't fix inconsistencies

  • Conversion: add a helping mechanism so writers Detect and Fix leftover state
  • Reuse the existing algorithm handling each step

[Diagram: after restart, a thread detects the leftover intermediate state and fixes it by completing the remaining steps]
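The helping mechanism added by the Condition-3 conversion can be sketched with a durable "pending" record standing in for the visible intermediate state a crashed writer leaves behind. All names here are hypothetical; real indexes detect the intermediate state structurally rather than through an explicit record.

```python
# Sketch of a helping mechanism: a multi-step update exposes its intent
# durably; the next thread that sees it finishes the remaining steps by
# reusing the same commit code, which doubles as crash recovery.

data = {}
pending = None        # models durable intermediate state on PM

def commit_step():
    global pending
    key, val = pending
    data[key] = val   # final step of the update
    pending = None

def insert(key, val, crash_before_commit=False):
    global pending
    pending = (key, val)          # step 1: durable intermediate state
    if crash_before_commit:
        return                    # simulate a crash mid-update
    commit_step()

def help_if_needed():
    # Helping == recovery: detect leftover state, reuse the step's code.
    if pending is not None:
        commit_step()

insert("a", 1, crash_before_commit=True)
assert "a" not in data            # update incomplete after the "crash"
help_if_needed()                  # the next writer helps
assert data["a"] == 1 and pending is None
```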

SLIDE 48

Outline

  • Overall Intuition
  • Conversion Conditions
  • Conversion Example: Masstree
  • Assumptions & Limitations
  • Evaluation

SLIDE 49

Conversion of Masstree

  • Example: B-link tree (Masstree)

[Diagram: B-link leaf nodes connected by sibling pointers, each with a high key; e.g., keys 1, 10 | 15, 25, 30]

SLIDE 50

Conversion of Masstree

  • Example: B-link tree (Masstree)
  • Insert 26 triggers a node split:
  • 1. Install the new sibling
  • 2. Insert the middle key into the parent

[Diagram: the leaf splits; keys 15, 25 move to a new sibling node]
SLIDE 51

Conversion of Masstree

  • Example: B-link tree (Masstree)
  • Insert 26

[Diagram: intermediate state; the new sibling holding 15, 25 is installed, but the parent is not yet updated]

SLIDE 52

Conversion of Masstree

  • Example: B-link tree (Masstree)

[Diagram: a concurrent Lookup 25 arrives while the split is still in this intermediate state]

SLIDE 53

Conversion of Masstree

  • Example: B-link tree (Masstree)
  • Lookup 25: Detect (25 > 15, beyond the node's high key) → Tolerate (follow the sibling link)
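The Detect → Tolerate step can be sketched as a high-key check on a toy B-link node. The `Node` class, field names, and key values are illustrative assumptions, not Masstree's actual layout.

```python
# Toy B-link lookup: after a split moved keys 15 and 25 to a new sibling,
# the old node's high key is 15 but the parent still routes lookups to it.
# The reader detects "key >= high key" and tolerates by chasing siblings.

class Node:
    def __init__(self, keys, high_key, sibling=None):
        self.keys, self.high_key, self.sibling = keys, high_key, sibling

new_sibling = Node([15, 25], high_key=30)
old = Node([1, 10], high_key=15, sibling=new_sibling)

def lookup(node, key):
    # Detect: key beyond this node's key range means an in-progress (or
    # interrupted) split; Tolerate: follow the sibling link, no repair.
    while node.sibling is not None and key >= node.high_key:
        node = node.sibling
    return key in node.keys

assert lookup(old, 25)       # detect (25 >= 15) → tolerate via sibling
assert lookup(old, 10)       # normal case: stays in the old node
assert not lookup(old, 99)   # absent key, even after chasing siblings
```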

SLIDE 54

Conversion of Masstree

  • Example: B-link tree (Masstree)

[Diagram: a crash in the intermediate state leaves a permanent inconsistency; no writer ever fixes the half-finished split]

SLIDE 55

Conversion of Masstree

  • Example: B-link tree (Masstree)
  • Add a helping mechanism to resume the split
  • Insert 30: Detect the half-finished split → resume the split (recovery)

SLIDE 56

Conversion Results of Five DRAM Indexes

DRAM Index                               | DS Type
CLHT (Cache-Line Hash Table) [ASPLOS'15] | Hash table
HOT (Height Optimized Trie) [SIGMOD'18]  | Trie
BwTree [ICDE'13]                         | B+Tree
ART (Adaptive Radix Tree) [ICDE'13]      | Radix tree
Masstree [EuroSys'12]                    | Hybrid (B+Tree & Trie)

SLIDE 57

Conversion Results of Five DRAM Indexes

DRAM Index | PM Index   | Condition(s)
CLHT       | P-CLHT     | #1
HOT        | P-HOT      | #1
BwTree     | P-BwTree   | #1, #2
ART        | P-ART      | #1, #3
Masstree   | P-Masstree | #1, #3

  • We produce the P-* family of PM indexes
SLIDE 58

Outline

  • Overall Intuition
  • Conversion Conditions
  • Conversion Example: Masstree
  • Assumptions & Limitations
  • Evaluation

SLIDE 59

Assumptions & Limitations

  • Assume garbage collection in the memory allocator
  • Assume locks are volatile or re-initialized after a crash
  • Provide a low level of isolation: Read Uncommitted
  • RECIPE applies only to individual data structures

SLIDE 60

Outline

  • Overall Intuition
  • Conversion Conditions
  • Conversion Example: Masstree
  • Assumptions & Limitations
  • Evaluation

SLIDE 61

Evaluation

  • How much effort is involved in converting indexes?
  • What is the performance of the converted indexes?
  • Are the converted indexes crash consistent?
SLIDE 62

Evaluation

  • How much effort is involved in converting indexes?
  • What is the performance of the converted indexes?
  • Are the converted indexes crash consistent?
SLIDE 63

Evaluation

  • How much effort is involved in converting indexes?
  • What is the performance of the converted indexes?
SLIDE 64

Modified Lines of Code

  • Conversion for all indexes → <= 200 LoC changed

Index      | Core LoC | Modified LoC
P-CLHT     | 2.8K     | 30 (1%)
P-HOT      | 2K       | 38 (2%)
P-BwTree   | 5.2K     | 85 (1.6%)
P-ART      | 1.5K     | 52 (3.4%)
P-Masstree | 2.2K     | 200 (9%)

SLIDE 65

Modified Lines of Code

Index      | Core LoC | Modified LoC
P-CLHT     | 2.8K     | 30 (1%)
P-HOT      | 2K       | 38 (2%)
P-BwTree   | 5.2K     | 85 (1.6%)
P-ART      | 1.5K     | 52 (3.4%)
P-Masstree | 2.2K     | 200 (9%)

Conversion for all indexes: <= 200 LoC changed, <= 9% of the core code base

SLIDE 66

Evaluation

  • How much effort is involved in converting indexes?
  • What is the performance of the converted indexes?
SLIDE 67

Performance Evaluation

  • 2-socket, 96-core machine with a 32 MB LLC
  • 768 GB Intel Optane DC PMM, 378 GB DRAM
  • YCSB with 16 threads
  • Ordered/unordered indexes, integer/string keys

Load:       Insertion 100%
Workload A: Insertion 50%, Point Lookup 50%
Workload B: Insertion 5%, Point Lookup 95%
Workload C: Point Lookup 100%
Workload E: Insertion 5%, Range Scan 95%

SLIDE 68

Ordered Index

  • Support both point and range operations
  • P-HOT: persistent Height-Optimized Trie converted by RECIPE
  • FAST & FAIR [FAST'18]: hand-crafted PM-based concurrent B+Tree
SLIDE 69

Ordered Index

  • P-HOT produced by RECIPE conversion
  • P-HOT performs up to 5.2x better in point operations
  • Cache-efficient design of P-HOT → low cache misses

[Graph: throughput of FAST&FAIR vs P-HOT, normalized, on Load and Workloads A, B, C, E]

SLIDE 70

RECIPE

  • Principled approach to convert concurrent DRAM indexes into PM indexes
  • Case study of converting five DRAM indexes
  • Evaluations with YCSB show that RECIPE indexes outperform hand-crafted PM indexes
  • Try our indexes: https://github.com/utsaslab/RECIPE