15-721 DATABASE SYSTEMS

Lecture #07 – Latch-free OLTP Indexes (Part I)

Andy Pavlo (@Andy_Pavlo) // Carnegie Mellon University // Spring 2017

CMU 15-721 (Spring 2017)

ADMINISTRIVIA

Peloton master branch has been updated to provide cleaner test cases.

→ There is now a separate file for Skip List tests.
→ Your implementation should match the behavior of the Bw-Tree.

We will be sending out information on how to access the MemSQL development machines.

TODAY’S AGENDA

T-Trees
Skip Lists
Index Implementation Issues

T-TREES

Based on AVL Trees. Instead of storing keys in nodes, store pointers to their original values.
Proposed in 1986 by researchers at the Univ. of Wisconsin.
Used in TimesTen and other early in-memory DBMSs during the 1990s.

A STUDY OF INDEX STRUCTURES FOR MAIN MEMORY DATABASE MANAGEMENT SYSTEMS VLDB 1986

T-Tree Node

[Figure: a T-Tree node holds Data Pointers to its entries, with the Min-K and Max-K entries marking the Node Boundaries, plus a Parent Pointer, a Left Child Pointer, and a Right Child Pointer. Later builds lay nodes out along the Key Space (Low→High) for keys 1 to 7.]

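To make the node layout concrete, here is a minimal Python sketch of a T-Tree lookup. It assumes each node stores row ids (standing in for data pointers) into a hypothetical `table`, so Min-K and Max-K are found by dereferencing the node's first and last pointer. `TTreeNode`, `ttree_search`, and `key_of` are illustrative names; the tree is built by hand, and insertion and rebalancing are omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence


@dataclass(eq=False)
class TTreeNode:
    # Data pointers: row ids into the table, kept in key order. Keys are NOT
    # copied into the node; every comparison dereferences a row id.
    rows: List[int] = field(default_factory=list)
    parent: Optional["TTreeNode"] = field(default=None, repr=False)
    left: Optional["TTreeNode"] = None
    right: Optional["TTreeNode"] = None


def ttree_search(root: Optional[TTreeNode], table: Sequence,
                 key_of: Callable, key) -> Optional[int]:
    """Return the row id whose key equals `key`, or None (assumes non-empty nodes)."""
    node = root
    while node is not None:
        min_k = key_of(table[node.rows[0]])   # Min-K: first data pointer
        max_k = key_of(table[node.rows[-1]])  # Max-K: last data pointer
        if key < min_k:
            node = node.left
        elif key > max_k:
            node = node.right
        else:
            # This node bounds the key: binary search inside it.
            lo, hi = 0, len(node.rows) - 1
            while lo <= hi:
                mid = (lo + hi) // 2
                k = key_of(table[node.rows[mid]])
                if k == key:
                    return node.rows[mid]
                if k < key:
                    lo = mid + 1
                else:
                    hi = mid - 1
            return None
    return None


def key_of(row):
    return row[0]


# Usage: a hand-built three-node T-Tree over keys 1..7.
table = [(5, "e"), (1, "a"), (3, "c"), (7, "g"), (2, "b"), (6, "f"), (4, "d")]
root = TTreeNode(rows=[2, 6, 0])                   # keys 3, 4, 5
root.left = TTreeNode(rows=[1, 4], parent=root)    # keys 1, 2
root.right = TTreeNode(rows=[5, 3], parent=root)   # keys 6, 7
print(table[ttree_search(root, table, key_of, 6)])  # (6, 'f')
```

Every comparison, including each probe of the in-node binary search, dereferences a pointer into `table`; that is the pointer-chasing cost listed under Disadvantages.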
T-TREES

Advantages

→ Uses less memory because it does not store keys inside of each node.
→ Inner nodes contain key/value pairs (like B-Tree).

Disadvantages

→ Difficult to rebalance.
→ Difficult to implement safe concurrent access.
→ Have to chase pointers when scanning a range or performing binary search inside of a node.

OBSERVATION

The easiest way to implement a dynamic order-preserving index is to use a sorted linked list. All operations have to perform a linear search.

→ Average Cost: O(N)

[Figure: sorted linked list K1 → K2 → K3 → K4 → K5 → K6 → K7]

SKIP LISTS

Multiple levels of linked lists with extra pointers that skip over intermediate nodes. Maintains keys in sorted order without requiring global rebalancing.

SKIP LISTS: A PROBABILISTIC ALTERNATIVE TO BALANCED TREES CACM Volume 33 Issue 6 1990

SKIP LISTS

A collection of lists at different levels

→ Lowest level is a sorted, singly linked list of all keys.
→ 2nd level links every other key.
→ 3rd level links every fourth key.
→ In general, a level has half the keys of the one below it.

To insert a new key, flip a coin to decide how many levels to add the new key into. Provides expected O(log n) search times.

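A short Python sketch of the coin-flip rule, assuming a fair coin and a hypothetical `MAX_LEVEL` cap: every new key joins level 1, and each heads promotes it one level higher. Counting how many keys reach each level reproduces the halving pattern described above.

```python
import random

MAX_LEVEL = 16  # hypothetical cap on how many levels a key may join


def random_level(rng: random.Random) -> int:
    """Flip a fair coin until tails; each heads adds the key to one more level."""
    level = 1
    while level < MAX_LEVEL and rng.random() < 0.5:
        level += 1
    return level


def keys_per_level(n: int, seed: int = 0) -> list:
    """Insert n keys and count how many end up linked into levels 1, 2, and 3."""
    rng = random.Random(seed)
    counts = [0, 0, 0]
    for _ in range(n):
        height = random_level(rng)
        for lvl in range(min(height, 3)):
            counts[lvl] += 1
    return counts


print(keys_per_level(100_000))  # roughly N, N/2, N/4
```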
SKIP LISTS: EXAMPLE

[Figure: a skip list with three levels (P=N, P=N/2, P=N/4) over K1/V1, K2/V2, K3/V3, K4/V4, K6/V6. The bottom level links every key, each higher level links roughly half as many keys as the one below it, and every level ends at an ∞ (End) sentinel.]

SKIP LISTS: INSERT

[Figure: Insert K5 into the example skip list. The coin flips add K5 (with V5) to all three levels, and it is linked in after its predecessor at each level.]

SKIP LISTS: SEARCH

[Figure: Find K3 in the skip list after K5 was inserted. At the top level K3<K5, so the search drops down a level; there K3>K2, so it moves right to K2; then K3<K4, so it drops to the bottom level, where the next key is K3.]

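The insert and search walkthroughs can be sketched in Python as a single-threaded skip list. The tower heights for K1 to K6 are assumptions chosen to mirror the figures (in practice they would come from coin flips), a `None` forward pointer plays the role of the ∞ End sentinel, and the optional `trace` argument records each comparison the search makes. Class and method names are illustrative.

```python
from typing import List, Optional


class Node:
    def __init__(self, key, value, height: int):
        self.key, self.value = key, value
        self.next: List[Optional["Node"]] = [None] * height  # one forward pointer per level


class SkipList:
    def __init__(self, max_level: int = 3):
        self.max_level = max_level
        self.head = Node(None, None, max_level)  # sentinel; None acts as ∞ (End)

    def insert(self, key, value, height: int) -> None:
        """Link a new node into levels 0..height-1 (height <= max_level)."""
        preds = [self.head] * self.max_level
        node = self.head
        for lvl in reversed(range(self.max_level)):  # top level first
            while node.next[lvl] is not None and node.next[lvl].key < key:
                node = node.next[lvl]
            preds[lvl] = node
        new = Node(key, value, height)
        for lvl in range(height):
            new.next[lvl] = preds[lvl].next[lvl]
            preds[lvl].next[lvl] = new

    def search(self, key, trace: Optional[List[str]] = None):
        """Return the value for key (or None); optionally record each comparison."""
        node = self.head
        for lvl in reversed(range(self.max_level)):
            while node.next[lvl] is not None:
                nxt = node.next[lvl]
                if trace is not None:
                    op = "<" if key < nxt.key else (">" if key > nxt.key else "=")
                    trace.append(f"{key}{op}{nxt.key}")
                if key == nxt.key:
                    return nxt.value
                if key < nxt.key:
                    break        # overshoot: drop down one level
                node = nxt       # key is larger: move right
        return None


# Usage: rebuild the example, then insert K5 into all three levels.
sl = SkipList(max_level=3)
for key, height in [("K1", 1), ("K2", 2), ("K3", 1), ("K4", 2), ("K6", 1)]:
    sl.insert(key, "V" + key[1:], height)
sl.insert("K5", "V5", 3)
path: List[str] = []
print(sl.search("K3", path), path)  # V3 ['K3<K5', 'K3>K2', 'K3<K4', 'K3=K3']
```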
SKIP LISTS: ADVANTAGES

Uses less memory than a typical B+Tree (only if you don’t include reverse pointers).
Insertions and deletions do not require rebalancing.
It is possible to implement a concurrent skip list using only CAS instructions.

CONCURRENT SKIP LIST

Can implement insert and delete without locks using only CAS operations.

→ Only supports links in one direction.

CONCURRENT MAINTENANCE OF SKIP LISTS Univ. of Maryland Tech Report 1990

SKIP LISTS: INSERT

[Figure: Insert K5 concurrently. A new three-level node for K5/V5 is allocated first and then linked into each level with a CAS on the predecessor’s pointer.]

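A minimal Python sketch of CAS-based insertion, under two stated assumptions: CPython offers no user-level compare-and-swap, so `AtomicRef.compare_and_set` emulates one with a private lock (a C++ version would use `std::atomic<T*>::compare_exchange_strong`), and no deletes run concurrently. Following the usual bottom-up order, the bottom-level CAS makes the key visible; the upper levels are then linked one CAS at a time, and a failed CAS re-reads the predecessors and retries. All class and method names are hypothetical.

```python
import random
import threading


class AtomicRef:
    """Stand-in for a CAS-able pointer (CAS emulated with a private lock)."""

    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        return self._value

    def set(self, value):
        # Plain store: only safe while the owning node is not yet published.
        self._value = value

    def compare_and_set(self, expected, new):
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False


class Node:
    def __init__(self, key, value, height):
        self.key, self.value = key, value
        self.next = [AtomicRef() for _ in range(height)]


class LockFreeSkipList:
    def __init__(self, max_level=4):
        self.max_level = max_level
        self.head = Node(None, None, max_level)

    def _find(self, key):
        """Return the predecessor and successor of key at every level."""
        preds = [self.head] * self.max_level
        succs = [None] * self.max_level
        node = self.head
        for lvl in reversed(range(self.max_level)):
            nxt = node.next[lvl].get()
            while nxt is not None and nxt.key < key:
                node, nxt = nxt, nxt.next[lvl].get()
            preds[lvl], succs[lvl] = node, nxt
        return preds, succs

    def insert(self, key, value, height):
        new = Node(key, value, height)
        while True:  # bottom level first: this CAS makes the key visible
            preds, succs = self._find(key)
            if succs[0] is not None and succs[0].key == key:
                return False  # key already present
            new.next[0].set(succs[0])
            if preds[0].next[0].compare_and_set(succs[0], new):
                break  # otherwise another thread changed the pointer: retry
        for lvl in range(1, height):  # then link each upper level with one CAS
            while True:
                new.next[lvl].set(succs[lvl])
                if preds[lvl].next[lvl].compare_and_set(succs[lvl], new):
                    break
                preds, succs = self._find(key)  # lost a race: recompute
        return True

    def keys(self):
        out, node = [], self.head.next[0].get()
        while node is not None:
            out.append(node.key)
            node = node.next[0].get()
        return out


# Usage: four threads insert interleaved keys with coin-flip heights.
sl = LockFreeSkipList()


def worker(tid):
    rng = random.Random(tid)
    mine = list(range(tid, 800, 4))  # each thread owns every fourth key
    rng.shuffle(mine)
    for k in mine:
        height = 1
        while height < sl.max_level and rng.random() < 0.5:
            height += 1
        sl.insert(k, f"v{k}", height)


threads = [threading.Thread(target=worker, args=(t,)) for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(sl.keys()))  # 800
```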
SKIP LISTS: DELETE

First, logically remove a key from the index by setting a flag that tells other threads to ignore it. Then, physically remove the key once we know that no other thread is holding a reference to it.

→ Perform CAS to update the predecessor’s pointer.

Source: Stephen Tu

SKIP LISTS: DELETE

[Figure: Delete K5. Every node carries a Del flag that starts as false. The delete first sets K5’s flag to true (logical delete) and then unlinks K5 from each level, leaving K1, K2, K3, K4, K6 (physical delete).]

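The two-phase delete can be sketched the same way, again assuming an emulated CAS (`AtomicRef`) and hypothetical helper names. Phase 1 flips the node's Del flag with a CAS so exactly one deleter wins and readers start ignoring the node; phase 2 unlinks it level by level with a CAS on each predecessor's pointer. The flag lives in a separate field, as in the figure, and the sketch ignores races with concurrent inserts; Harris-style lock-free lists instead pack the mark into the node's next pointer, so a CAS that tries to link a new node after a deleted one fails.

```python
import threading


class AtomicRef:
    """Emulated CAS word: CPython has no user-level CAS, so a private lock stands in."""

    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        return self._value

    def compare_and_set(self, expected, new):
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False


class Node:
    def __init__(self, key, height):
        self.key = key
        self.deleted = AtomicRef(False)  # the Del flag from the figure
        self.next = [AtomicRef() for _ in range(height)]


def build(keys_heights, max_level):
    """Build a skip list from (key, height) pairs given in ascending key order."""
    head = Node(None, max_level)
    tails = [head] * max_level
    for key, height in keys_heights:
        node = Node(key, height)
        for lvl in range(height):
            tails[lvl].next[lvl].compare_and_set(None, node)
            tails[lvl] = node
    return head


def find_preds(head, key):
    """Last node with a smaller key, at every level."""
    preds, node = [head] * len(head.next), head
    for lvl in reversed(range(len(head.next))):
        nxt = node.next[lvl].get()
        while nxt is not None and nxt.key < key:
            node, nxt = nxt, nxt.next[lvl].get()
        preds[lvl] = node
    return preds


def delete(head, key):
    preds = find_preds(head, key)
    victim = preds[0].next[0].get()
    if victim is None or victim.key != key:
        return False
    # Phase 1 (logical): CAS Del from False to True; exactly one deleter wins.
    if not victim.deleted.compare_and_set(False, True):
        return False
    # Phase 2 (physical): unlink top-down with a CAS on each predecessor's pointer.
    for lvl in reversed(range(len(victim.next))):
        while not preds[lvl].next[lvl].compare_and_set(victim, victim.next[lvl].get()):
            preds = find_preds(head, key)  # predecessor changed: re-find and retry
    return True


def contains(head, key):
    node = find_preds(head, key)[0].next[0].get()
    # Readers ignore nodes whose Del flag is set, even while still linked.
    return node is not None and node.key == key and not node.deleted.get()


def bottom_keys(head):
    out, node = [], head.next[0].get()
    while node is not None:
        out.append(node.key)
        node = node.next[0].get()
    return out


head = build([("K1", 1), ("K2", 2), ("K3", 1), ("K4", 2), ("K5", 3), ("K6", 1)], max_level=3)
print(delete(head, "K5"), contains(head, "K5"), bottom_keys(head))
# True False ['K1', 'K2', 'K3', 'K4', 'K6']
```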
CONCURRENT SKIP LIST

Be careful about how you order operations. If the DBMS invokes an operation on the index, that operation can never “fail”:

→ A txn can only abort due to higher-level conflicts.
→ If a CAS fails, then the index will retry until it succeeds.

SKIP LISTS: DISADVANTAGES

Invoking the random number generator multiple times per insert is slow.
Not cache friendly because skip lists do not optimize for locality of reference.
Reverse search is non-trivial.

SKIP LIST OPTIMIZATIONS

Reducing RAND() invocations.
Packing multiple keys in a node.
Reverse iteration with a stack.
Reusing nodes with memory pools.

SKIP LISTS: DONE RIGHT Ticki(?) Blog 2016

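One common way to reduce RAND() invocations, sketched in Python with a hypothetical 32-level cap: draw a single random word per insert and treat each trailing zero bit as a coin that came up heads. This gives the same geometric level distribution as repeated coin flips with one generator call; it is not necessarily the cited post's exact method.

```python
import random

MAX_LEVEL = 32  # hypothetical cap; one draw supplies MAX_LEVEL - 1 coin flips


def random_level_one_draw(rng: random.Random) -> int:
    """Tower height from one random word: each trailing zero bit counts as
    one 'heads', so P(level >= k) = 1 / 2**(k - 1)."""
    bits = rng.getrandbits(MAX_LEVEL - 1)
    if bits == 0:
        return MAX_LEVEL  # every coin came up heads
    trailing_zeros = (bits & -bits).bit_length() - 1
    return trailing_zeros + 1


rng = random.Random(7)
levels = [random_level_one_draw(rng) for _ in range(100_000)]
print(sum(lv >= 2 for lv in levels) / len(levels))  # close to 0.5
```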
SKIP LIST: COMBINE NODES

Store multiple keys in a single node.

→ Insert Key: Find the node where it should go and look for a free slot. Perform CAS to store the new key. If no slot is available, insert a new node.
→ Search Key: Perform a linear search on the keys in each node.

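[Figure: a node holding K2/V2, K3/V3, and K6/V6 with free slots. Insert K4 stores K4/V4 in the next free slot after K6/V6; Search K6 scans the node’s keys linearly.]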

Source: Ticki
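A Python sketch of a combined node, assuming a hypothetical capacity of four slots and the same lock-emulated CAS as a stand-in for hardware CAS. Each slot is claimed with one CAS, a full node reports failure so the caller can link in a new node, and search scans the slots linearly because keys are stored in arrival order. Routing a key to the correct node in the skip list is omitted.

```python
import threading


class AtomicRef:
    """Emulated CAS slot: CPython has no user-level CAS, so a private lock stands in."""

    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        return self._value

    def compare_and_set(self, expected, new):
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False


SLOTS = 4  # hypothetical node capacity


class PackedNode:
    """One skip-list node that stores several (key, value) pairs in fixed slots."""

    def __init__(self):
        self.slots = [AtomicRef() for _ in range(SLOTS)]

    def try_insert(self, key, value):
        """Claim the first free slot with one CAS; False means the node is full
        and the caller would insert a new node instead."""
        entry = (key, value)
        for slot in self.slots:
            if slot.get() is None and slot.compare_and_set(None, entry):
                return True
        return False

    def search(self, key):
        """Linear scan: slots fill in arrival order, so keys are not sorted."""
        for slot in self.slots:
            entry = slot.get()
            if entry is not None and entry[0] == key:
                return entry[1]
        return None


node = PackedNode()
for k, v in [("K2", "V2"), ("K3", "V3"), ("K6", "V6")]:
    node.try_insert(k, v)
node.try_insert("K4", "V4")         # lands in the last free slot, after K6
print(node.search("K6"))            # V6
print(node.try_insert("K7", "V7"))  # False: node is full
```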

SKIP LISTS: REVERSE SEARCH

[Figure: Find [K4,K2], i.e., return the keys from K4 down to K2. The search first locates K2 (K2<K5, K2=K2, K2<K4), then walks forward along the bottom level, pushing K2, K3, and K4 onto a stack; popping the stack returns K4, K3, K2.]

Source: Mark Papadakis

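Because the skip list only links forward, a reverse range scan can be sketched as: descend to the lower bound, walk forward along the bottom level pushing each entry onto a stack, then pop. This Python sketch reuses the figure's keys with assumed tower heights; `build` and `reverse_range` are hypothetical names, and `build` expects its input in ascending key order.

```python
class Node:
    def __init__(self, key, value, height):
        self.key, self.value = key, value
        self.next = [None] * height  # forward pointers only


def build(entries, max_level):
    """Build a skip list from (key, value, height) triples in ascending key order."""
    head = Node(None, None, max_level)
    tails = [head] * max_level
    for key, value, height in entries:
        node = Node(key, value, height)
        for lvl in range(height):
            tails[lvl].next[lvl] = node
            tails[lvl] = node
    return head


def reverse_range(head, hi, lo):
    """Return (key, value) pairs with lo <= key <= hi, largest key first."""
    node = head
    for lvl in reversed(range(len(head.next))):  # descend to the last node < lo
        while node.next[lvl] is not None and node.next[lvl].key < lo:
            node = node.next[lvl]
    stack = []
    node = node.next[0]
    while node is not None and node.key <= hi:  # forward walk, pushing
        stack.append((node.key, node.value))
        node = node.next[0]
    result = []
    while stack:
        result.append(stack.pop())  # popping reverses the order
    return result


head = build([("K1", "V1", 1), ("K2", "V2", 2), ("K3", "V3", 1),
              ("K4", "V4", 2), ("K5", "V5", 3), ("K6", "V6", 1)], max_level=3)
print(reverse_range(head, "K4", "K2"))  # [('K4', 'V4'), ('K3', 'V3'), ('K2', 'V2')]
```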
INDEX IMPLEMENTATION ISSUES

Memory Pools
Garbage Collection
Non-Unique Keys
Variable-length Keys

MEMORY POOLS

We don’t want to call malloc and free every time we add or delete a node. If all the nodes are the same size, then the index can maintain a pool of available nodes.

→ Insert: Grab a free node, otherwise create a new one.
→ Delete: Add the node back to the free pool.

The index needs a policy to decide when to shrink the pool.

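A single-threaded Python sketch of a node pool with a hypothetical shrink policy (`max_free`): released nodes beyond that cap are dropped instead of being kept. A latch-free index would also need the free list itself to be concurrent (for example, a CAS-based stack), which this sketch leaves out.

```python
class Node:
    __slots__ = ("key", "value", "next")

    def __init__(self):
        self.key = self.value = self.next = None


class NodePool:
    """Free list of same-sized nodes; max_free is a hypothetical shrink policy."""

    def __init__(self, max_free=1024):
        self.free = []
        self.max_free = max_free
        self.created = 0  # nodes built from scratch (the "malloc" calls)

    def acquire(self):
        if self.free:
            return self.free.pop()  # Insert: grab a free node...
        self.created += 1
        return Node()  # ...otherwise create a new one

    def release(self, node):
        node.key = node.value = node.next = None  # scrub before reuse
        if len(self.free) < self.max_free:
            self.free.append(node)  # Delete: add the node back to the free pool
        # else: drop it and let the allocator reclaim it (the pool shrinks)


pool = NodePool(max_free=2)
a, b, c = pool.acquire(), pool.acquire(), pool.acquire()
for n in (a, b, c):
    pool.release(n)  # c exceeds max_free and is dropped
d = pool.acquire()   # reuses b, the last node returned to the pool
print(pool.created, len(pool.free), d is b)  # 3 1 True
```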
GARBAGE COLLECTION

We need to know when it is safe to reclaim memory for deleted nodes in a latch-free index.

→ Reference Counting
→ Epoch-based Reclamation
→ Hazard Pointers
→ Many others…

[Figure: the list K2/V2 → K3/V3 → K4/V4; K3 is deleted (X) and unlinked, leaving K2/V2 → K4/V4.]

REFERENCE COUNTING

Maintain a counter for each node to keep track of the number of threads that are accessing it.

→ Increment the counter before accessing.
→ Decrement it when finished.
→ A node is only safe to delete when the count is zero.

This has bad performance on multi-core CPUs:

→ Incrementing/decrementing counters causes a lot of cache coherence traffic.

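A Python sketch of per-node reference counting. CPython has no atomic fetch-and-add, so a per-node lock stands in for it (an assumption of the sketch); in C++ this would typically be `std::atomic<int>::fetch_add`, and every such update writes the node's cache line, which is the coherence traffic mentioned above. The `unlinked` and `freed` fields are hypothetical, and the sketch glosses over the window between loading a pointer and incrementing its count, which real designs must close.

```python
import threading


class RefCountedNode:
    """Per-node reference count; a lock emulates atomic fetch-and-add."""

    def __init__(self, key):
        self.key = key
        self.refs = 0
        self.unlinked = False  # set once the node is removed from the index
        self.freed = False     # stands in for actually returning the memory
        self._lock = threading.Lock()

    def acquire(self):
        with self._lock:
            self.refs += 1  # increment before accessing

    def release(self):
        with self._lock:
            self.refs -= 1  # decrement when finished
            self._maybe_free()

    def mark_unlinked(self):
        with self._lock:
            self.unlinked = True
            self._maybe_free()

    def _maybe_free(self):
        # Safe only when the node is unreachable AND no thread still holds it.
        if self.unlinked and self.refs == 0 and not self.freed:
            self.freed = True


node = RefCountedNode("K3")
node.acquire()        # a reader starts using K3
node.mark_unlinked()  # a deleter unlinks K3 while the reader still holds it
print(node.freed)     # False
node.release()        # the reader finishes and the count drops to zero
print(node.freed)     # True
```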
OBSERVATION

We don’t care about the actual value of the reference counter; we only need to know when it reaches zero. We also don’t have to perform garbage collection immediately when the counter reaches zero.

Source: Stephen Tu

EPOCH GARBAGE COLLECTION

Maintain a global epoch counter that is periodically updated (e.g., every 10 ms).

→ Keep track of which threads enter the index during an epoch and when they leave.

When a node is marked for deletion, tag it with the current epoch.

→ The node can be reclaimed once all threads have left that epoch (and all preceding epochs).

Also known as Read-Copy-Update (RCU) in Linux.

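A Python sketch of epoch-based reclamation with a hypothetical `EpochManager` API. The periodic timer (e.g., every 10 ms) is replaced by explicit `advance()` calls, and a single lock guards the bookkeeping for simplicity; real implementations keep per-thread epoch slots so readers do not contend. A retired node is reclaimed only when every active thread entered in a later epoch than the one in which the node was retired.

```python
import threading


class EpochManager:
    """Hypothetical epoch-based reclamation API (illustrative, not a library)."""

    def __init__(self):
        self.global_epoch = 0
        self.active = {}     # thread id -> epoch in which it entered the index
        self.retired = []    # (epoch, node) pairs waiting to be reclaimed
        self.reclaimed = []  # stands in for memory actually freed
        self._lock = threading.Lock()

    def enter(self, tid):
        with self._lock:
            self.active[tid] = self.global_epoch

    def leave(self, tid):
        with self._lock:
            self.active.pop(tid, None)

    def retire(self, node):
        """Tag a node marked for deletion with the current epoch."""
        with self._lock:
            self.retired.append((self.global_epoch, node))

    def advance(self):
        """Bump the epoch (a timer would do this periodically), then reclaim."""
        with self._lock:
            self.global_epoch += 1
            oldest = min(self.active.values(), default=self.global_epoch)
            still_waiting = []
            for epoch, node in self.retired:
                if epoch < oldest:  # every active thread entered after this epoch
                    self.reclaimed.append(node)
                else:
                    still_waiting.append((epoch, node))
            self.retired = still_waiting


em = EpochManager()
em.enter("t1")       # t1 enters during epoch 0
em.retire("K3")      # K3 is unlinked and retired during epoch 0
em.advance()         # epoch 1: t1 may still hold K3, so it is kept
em.enter("t2")       # t2 enters during epoch 1
em.leave("t1")
em.advance()         # epoch 2: oldest active epoch is 1 > 0, so K3 is reclaimed
print(em.reclaimed)  # ['K3']
```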
NON-UNIQUE INDEXES

Approach #1: Duplicate Keys

→ Use the same node layout but store duplicate keys multiple times.

Approach #2: Value Lists

→ Store each key only once and maintain a linked list of unique values.

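DUPLICATE KEYS

[Figure: B+Tree leaf node with a header (Prev and Next pointers, # Level, # Slots), a Sorted Keys array that stores each duplicate separately (K1 K1 K1 K2 K2 • • • Kn), and a Values array with one value per key slot.]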

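VALUE LISTS

[Figure: B+Tree leaf node with the same header (Prev, Next, # Level, # Slots), a Sorted Keys array in which each key appears once (K1 K2 K3 K4 K5 • • • Kn), and a Values array in which each slot points to the list of values for its key.]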

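The two approaches can be compared with a Python sketch of a single leaf, assuming parallel Python lists stand in for the node's key and value arrays and a Python list stands in for each value list. `DuplicateKeyLeaf` and `ValueListLeaf` are hypothetical names.

```python
from bisect import bisect_left, bisect_right


class DuplicateKeyLeaf:
    """Approach #1: the key array repeats a key once per value."""

    def __init__(self):
        self.keys, self.values = [], []

    def insert(self, key, value):
        i = bisect_right(self.keys, key)  # after any existing duplicates
        self.keys.insert(i, key)
        self.values.insert(i, value)

    def lookup(self, key):
        return self.values[bisect_left(self.keys, key):bisect_right(self.keys, key)]


class ValueListLeaf:
    """Approach #2: each key is stored once and owns a list of its values."""

    def __init__(self):
        self.keys, self.value_lists = [], []

    def insert(self, key, value):
        i = bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.value_lists[i].append(value)
        else:
            self.keys.insert(i, key)
            self.value_lists.insert(i, [value])

    def lookup(self, key):
        i = bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return list(self.value_lists[i])
        return []


dup, vl = DuplicateKeyLeaf(), ValueListLeaf()
for k, v in [("K1", "a"), ("K2", "d"), ("K1", "b"), ("K1", "c"), ("K2", "e")]:
    dup.insert(k, v)
    vl.insert(k, v)
print(dup.keys)  # ['K1', 'K1', 'K1', 'K2', 'K2']
print(vl.keys)   # ['K1', 'K2']
```

Both leaves return the same values for a key; the difference is that the value-list leaf stores each key only once, at the cost of an extra indirection per key.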
VARIABLE LENGTH KEYS

Approach #1: Pointers

→ Store the keys as pointers to the tuple’s attribute.

Approach #2: Variable Length Nodes

→ The size of each node in the index can vary.
→ Requires careful memory management.

Approach #3: Padding

→ Always pad the key to the maximum length of the key type.

Approach #4: Key Map

→ Embed an array of pointers that map to the key + value list within the node.

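KEY MAP

[Figure: B+Tree leaf node with a header (Prev, Next, # Level, # Slots), a Key Map array of pointers, and a Key+Values region whose entries each store a key with its value list (e.g., K1: V1 V2 and K2: V1 V2 V3).]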

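A Python sketch of the key-map layout for variable-length keys, assuming a hypothetical entry encoding (little-endian `u16` lengths) inside a `bytearray` that plays the role of the node's Key+Values region. The key map holds offsets in key order, so inserting a key shifts only the small offset array, and a lookup binary-searches the key map, following one offset per probe.

```python
import struct


class KeyMapLeaf:
    """Hypothetical leaf: variable-length entries appended to a byte region,
    plus a key map of offsets into that region, kept in key order."""

    def __init__(self):
        self.heap = bytearray()  # Key+Values region
        self.key_map = []        # offsets of entries, sorted by key

    def _key_at(self, off: int) -> bytes:
        (klen,) = struct.unpack_from("<H", self.heap, off)
        return bytes(self.heap[off + 2: off + 2 + klen])

    def _slot_for(self, key: bytes) -> int:
        """Binary search over the key map; each probe follows one offset."""
        lo, hi = 0, len(self.key_map)
        while lo < hi:
            mid = (lo + hi) // 2
            if self._key_at(self.key_map[mid]) < key:
                lo = mid + 1
            else:
                hi = mid
        return lo

    def insert(self, key: bytes, values: list) -> None:
        # Entry: [key_len:u16][key][n_values:u16] then [val_len:u16][val] per value.
        slot = self._slot_for(key)
        off = len(self.heap)
        self.heap += struct.pack("<H", len(key)) + key + struct.pack("<H", len(values))
        for v in values:
            self.heap += struct.pack("<H", len(v)) + v
        self.key_map.insert(slot, off)  # only the offset array is shifted

    def lookup(self, key: bytes):
        slot = self._slot_for(key)
        if slot == len(self.key_map) or self._key_at(self.key_map[slot]) != key:
            return None
        pos = self.key_map[slot]
        (klen,) = struct.unpack_from("<H", self.heap, pos)
        pos += 2 + klen
        (count,) = struct.unpack_from("<H", self.heap, pos)
        pos += 2
        values = []
        for _ in range(count):
            (vlen,) = struct.unpack_from("<H", self.heap, pos)
            values.append(bytes(self.heap[pos + 2: pos + 2 + vlen]))
            pos += 2 + vlen
        return values


leaf = KeyMapLeaf()
leaf.insert(b"K2", [b"V1", b"V2", b"V3"])
leaf.insert(b"K1", [b"V1", b"V2"])
print(leaf.lookup(b"K1"))  # [b'V1', b'V2']
```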
PARTING THOUGHTS

Managing a concurrent index looks a lot like managing a database.
A skip list is really easy to implement.
A concurrent skip list is trickier.
Epoch garbage collection is more cache friendly than reference counting.

NEXT CLASS

More OLTP Indexes

→ Microsoft Bw-Tree
→ HyPer ART

Crash course on performance testing.
