SLIDE 1

DATABASE SYSTEM IMPLEMENTATION

GT 4420/6422 // SPRING 2019 // @JOY_ARULRAJ LECTURE #11: OLTP INDEXES (PART 1)

SLIDE 2

ADMINISTRIVIA

→ Midterm Topics
→ HW#3 Programming Assignment Due Date
→ Project Topics

SLIDE 3

TODAY’S AGENDA

→ T-Tree
→ Skip List
→ Bw-Tree

SLIDE 4

T-TREES

Based on AVL Trees. Instead of storing keys in nodes, store pointers to their original values.

Proposed in 1986 at the Univ. of Wisconsin. Used in TimesTen and other early in-memory DBMSs during the 1990s.

A STUDY OF INDEX STRUCTURES FOR MAIN MEMORY DATABASE MANAGEMENT SYSTEMS VLDB 1986

SLIDES 5-11

T-TREES

T-Tree Node (figure): each node holds Min-K and Max-K boundary keys, a parent pointer, left and right child pointers, and an array of data pointers; the node covers a contiguous range of the key space (Low→High), illustrated with keys 1-7.

SLIDE 12

T-TREES

Searching in a T-Tree is similar to searching in a binary tree. The key difference is that comparisons are made with the min and max values of the node rather than a single value as in a binary tree node.


A STUDY OF INDEX STRUCTURES FOR MAIN MEMORY DATABASE MANAGEMENT SYSTEMS VLDB 1986
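
To make the node layout and search concrete, here is a minimal single-threaded C++ sketch (illustrative only; TTreeNode, Tuple, and Search are assumed names, not from the VLDB 1986 paper), assuming 64-bit integer keys and tuples stored outside the index:

```cpp
// Illustrative sketch: a T-Tree node stores its boundary keys implicitly via a
// sorted array of pointers to the actual tuples, plus parent/child pointers.
#include <cstdint>
#include <vector>

struct Tuple {
  int64_t key;
  // ... other attributes of the indexed record ...
};

struct TTreeNode {
  TTreeNode *parent = nullptr;
  TTreeNode *left = nullptr;
  TTreeNode *right = nullptr;
  std::vector<const Tuple *> data;  // sorted by key; keys live in the tuples

  int64_t MinKey() const { return data.front()->key; }
  int64_t MaxKey() const { return data.back()->key; }
};

// Descend like a binary tree, but compare against each node's [min, max]
// boundary; only when the target falls inside the range do we binary-search
// the node's pointer array (chasing a tuple pointer on every probe).
const Tuple *Search(const TTreeNode *node, int64_t key) {
  while (node != nullptr) {
    if (key < node->MinKey()) {
      node = node->left;
    } else if (key > node->MaxKey()) {
      node = node->right;
    } else {
      size_t lo = 0, hi = node->data.size();
      while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (node->data[mid]->key < key) lo = mid + 1;
        else hi = mid;
      }
      return (lo < node->data.size() && node->data[lo]->key == key)
                 ? node->data[lo] : nullptr;
    }
  }
  return nullptr;
}
```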

SLIDE 13

T-TREES

Advantages

→ Uses less memory because it does not store keys inside of each node.
→ Inner nodes contain key/value pairs (like B-Tree).

Disadvantages

→ The traditional assumption that memory references have uniform cost is no longer valid given the current speed gap between cache access and main memory access.
→ Difficult to rebalance and to support concurrent access.
→ Have to chase pointers when scanning a range or performing binary search inside of a node.

SLIDES 14-16

OBSERVATION

The easiest way to implement a dynamic order-preserving index is to use a sorted linked list. All operations have to do a linear search.

→ Average Cost: O(N)

Figure: a sorted, singly linked list of keys K1 through K7.

SLIDE 17

SKIP LISTS

Multiple levels of linked lists with extra pointers that skip over intermediate nodes. Maintains keys in sorted order without requiring global rebalancing. Probabilistic data structure.


SKIP LISTS: A PROBABILISTIC ALTERNATIVE TO BALANCED TREES CACM Volume 33 Issue 6 1990

SLIDE 18

SKIP LISTS

A collection of lists at different levels

→ Lowest level is a sorted, singly linked list of all keys
→ 2nd level links every other key
→ 3rd level links every fourth key
→ In general, a level has half the keys of the one below it

To insert a new key, flip a coin to decide how many levels to add the new key into. Provides approximate O(log n) search times.

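A minimal, non-concurrent C++ sketch of the structure just described (illustrative names such as SkipNode and kMaxLevel; a real index would also handle duplicate keys and memory reclamation): the coin flips pick the tower height, and search walks top-down, moving right while the next key is still smaller than the target.

```cpp
// Illustrative single-threaded skip list sketch (assumed names).
#include <cstdint>
#include <cstdlib>
#include <vector>

constexpr int kMaxLevel = 16;

struct SkipNode {
  int64_t key;
  int64_t value;
  std::vector<SkipNode *> next;  // next[i] = successor at level i
  SkipNode(int64_t k, int64_t v, int levels) : key(k), value(v), next(levels, nullptr) {}
};

struct SkipList {
  SkipNode head{INT64_MIN, 0, kMaxLevel};  // sentinel smaller than every key

  // Flip coins: each additional level is kept with probability 1/2.
  static int RandomLevel() {
    int lvl = 1;
    while (lvl < kMaxLevel && (std::rand() & 1)) lvl++;
    return lvl;
  }

  void Insert(int64_t key, int64_t value) {
    SkipNode *update[kMaxLevel];
    SkipNode *cur = &head;
    for (int i = kMaxLevel - 1; i >= 0; i--) {
      while (cur->next[i] != nullptr && cur->next[i]->key < key) cur = cur->next[i];
      update[i] = cur;  // rightmost node before `key` at level i
    }
    SkipNode *node = new SkipNode(key, value, RandomLevel());
    for (size_t i = 0; i < node->next.size(); i++) {  // splice into each chosen level
      node->next[i] = update[i]->next[i];
      update[i]->next[i] = node;
    }
  }

  SkipNode *Search(int64_t key) {
    SkipNode *cur = &head;
    for (int i = kMaxLevel - 1; i >= 0; i--)
      while (cur->next[i] != nullptr && cur->next[i]->key < key) cur = cur->next[i];
    cur = cur->next[0];
    return (cur != nullptr && cur->key == key) ? cur : nullptr;
  }
};
```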

SLIDES 19-25

SKIP LISTS: EXAMPLE

Example figure: a skip list with three levels (P=N, P=N/2, P=N/4), each level ending in a +∞ sentinel. The lowest level stores every key with its value (K1/V1 through K6/V6); a subset of the keys (e.g., K2 and K4) also appears in the levels above.

SLIDES 26-30

SKIP LISTS: INSERT

Insert K5 (figure): the coin flips choose a tower height of three for K5, so a K5/V5 entry is spliced into the lowest level between K4 and K6, and K5 entries are linked into the two upper levels as well.

SLIDES 31-36

SKIP LISTS: SEARCH

Find K3 (figure): start at the top level and move right while the next key is less than the target, dropping down a level otherwise. In the example the comparisons are K3<K5 (drop down), K3>K2 (move right), K3<K4 (drop down), and K3 is then found at the lowest level.

SLIDE 37

SKIP LISTS: ADVANTAGES

Uses less memory than a typical B+tree (only if you don’t include reverse pointers). Insertions and deletions do not require rebalancing. It is possible to implement a concurrent skip list using only CAS instructions.


SLIDE 38

CONCURRENT SKIP LIST

Can implement insert and delete without locks using only CaS operations. The data structure only supports links in one direction because CaS can only swap one pointer atomically.


CONCURRENT MAINTENANCE OF SKIP LISTS Univ. of Maryland Tech Report 1990
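
A minimal sketch of what a latch-free insert at a single level can look like with compare-and-swap (illustrative, not the algorithm from the tech report; it ignores concurrent deletes and assumes the caller links the tower bottom-up):

```cpp
// Illustrative lock-free insert at one level using compare-and-swap.
#include <atomic>
#include <cstdint>

struct Node {
  int64_t key;
  int64_t value;
  std::atomic<Node *> next{nullptr};
};

// Link `node` after its predecessor in one sorted, singly linked level.
// If another thread changes the predecessor's pointer first, the CaS fails
// and we simply retry the search; no thread ever blocks.
void InsertAtLevel(Node *head, Node *node) {
  while (true) {
    Node *pred = head;
    Node *succ = pred->next.load();
    while (succ != nullptr && succ->key < node->key) {  // find insert position
      pred = succ;
      succ = succ->next.load();
    }
    node->next.store(succ);
    if (pred->next.compare_exchange_strong(succ, node)) return;  // published
    // CaS failed: the list changed under us, so retry from the head.
  }
}
```
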
SLIDES 39-49

SKIP LISTS: INSERT

Insert K5 with CaS (figure): the K5 tower (K5/V5 plus its upper-level entries) is first created pointing at its successors, and then linked into each level by CaS-ing the predecessor's next pointer, one CaS per level.

SLIDE 50

SKIP LISTS: DELETE

First logically remove a key from the index by setting a flag that tells other threads to ignore it. Then physically remove the key once we know that no other thread holds a reference to it.

→ Perform CaS to update the predecessor’s pointer.


Source: Stephen Tu
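
A minimal sketch of the two-phase delete described above (illustrative names; safe memory reclamation, e.g., via epochs, is omitted): phase one flips a per-node deleted flag so readers skip the key, phase two CaS-es the predecessor's pointer past the node.

```cpp
// Illustrative two-phase (logical then physical) delete for one level.
#include <atomic>
#include <cstdint>

struct Node {
  int64_t key;
  std::atomic<bool> deleted{false};
  std::atomic<Node *> next{nullptr};
};

// Logical delete: only one thread wins the right to remove this key.
bool LogicalDelete(Node *node) {
  bool expected = false;
  return node->deleted.compare_exchange_strong(expected, true);
}

// Physical delete at one level, once no other thread holds a reference.
void PhysicalDelete(Node *head, Node *node) {
  while (true) {
    Node *pred = head;
    Node *cur = pred->next.load();
    while (cur != nullptr && cur != node) {  // find the node's predecessor
      pred = cur;
      cur = cur->next.load();
    }
    if (cur == nullptr) return;              // someone else already unlinked it
    Node *succ = node->next.load();
    if (pred->next.compare_exchange_strong(cur, succ)) return;
    // CaS failed because a neighbor changed; retry the search.
  }
}
```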

SLIDES 51-58

SKIP LISTS: DELETE

Delete K5 (figure): every entry carries a Del flag that starts as false. The delete first sets K5's flag to true so that other threads ignore it, then unlinks the K5 entries from each level by CaS-ing the predecessors' pointers; afterwards only K1-K4 and K6 remain in the list.

SLIDE 59

CONCURRENT SKIP LIST

Be careful about how you order operations. If the DBMS invokes an operation on the index, that operation can never “fail”.

→ A txn can only abort due to higher-level conflicts.
→ If a CaS fails, then the index retries until it succeeds.

SLIDE 60

SKIP LIST OPTIMIZATIONS

→ Reducing RAND() invocations.
→ Packing multiple keys in a node.
→ Reverse iteration with a stack.
→ Reusing nodes with memory pools.

SKIP LISTS: DONE RIGHT Ticki Blog 2016

SLIDES 61-66

SKIP LIST: COMBINE NODES

Store multiple keys in a single node.

→ Insert Key: Find the node where the key should go and look for a free slot. Perform CaS to store the new key. If no slot is available, insert a new node.
→ Search Key: Perform a linear search on the keys in each node.

Example figure: a node holding K2/V2, K3/V3, and K6/V6 with an empty slot; Insert K4 fills the slot with K4/V4, and Search K6 scans the node's keys linearly.

Source: Ticki
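
A rough sketch of such a multi-key node (an assumed layout, not Ticki's code): keys live in fixed slots, an insert claims a free slot with a CaS, and a search scans the slots linearly. A full implementation would also need per-slot "ready" flags and links to the neighboring nodes.

```cpp
// Assumed multi-key node layout with CaS-claimed slots (illustrative).
#include <atomic>
#include <cstdint>

constexpr int kSlots = 4;
constexpr int64_t kEmptySlot = INT64_MIN;  // sentinel meaning "slot unused"

struct FatNode {
  std::atomic<int64_t> keys[kSlots];
  int64_t values[kSlots] = {};
  FatNode() { for (auto &k : keys) k.store(kEmptySlot); }

  // Claim a free slot with CaS; returns false if the node is full, in which
  // case the caller would allocate and link a brand new node instead.
  bool Insert(int64_t key, int64_t value) {
    for (int i = 0; i < kSlots; i++) {
      int64_t expected = kEmptySlot;
      if (keys[i].compare_exchange_strong(expected, key)) {
        values[i] = value;  // simplification: value published after the key
        return true;
      }
    }
    return false;
  }

  bool Search(int64_t key, int64_t *out) const {
    for (int i = 0; i < kSlots; i++) {  // linear scan of the slots
      if (keys[i].load() == key) { *out = values[i]; return true; }
    }
    return false;
  }
};
```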

SLIDES 67-78

SKIP LISTS: REVERSE SEARCH

Find [K4,K2] (figure): because each level is only linked in one direction, a reverse range scan first locates the low end of the range with a normal search (K2<K5, then K2=K2), then walks forward pushing the qualifying keys onto a stack (K2, K3, K4); popping the stack returns the keys in reverse order.

Source: Mark Papadakis

SLIDE 79

OBSERVATION

Because CaS only updates a single address at a time, this limits the design of our data structures.

→ We cannot have reverse pointers in a latch-free concurrent Skip List.
→ We cannot build a latch-free B+Tree.

What if we had an indirection layer that allowed us to update multiple addresses atomically?

SLIDE 80

BW-TREE

Latch-free B+Tree index

→ Threads never need to set latches or block.

Key Idea #1: Deltas

→ No updates in place
→ Reduces cache invalidation.

Key Idea #2: Mapping Table

→ Allows for CaS of physical locations of pages.


THE BW-TREE: A B-TREE FOR NEW HARDWARE ICDE 2013
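
A minimal C++ sketch of the two key ideas (illustrative types PageNode and MappingTable, not the ICDE 2013 implementation): updates prepend a delta record to a page's chain, and a single CAS on the page's mapping-table slot publishes it to every reference that goes through the table.

```cpp
// Illustrative mapping table + delta chain for a Bw-Tree-style page.
#include <atomic>
#include <cstdint>
#include <utility>
#include <vector>

enum class DeltaType { kBasePage, kInsert, kDelete };

struct PageNode {
  DeltaType type = DeltaType::kBasePage;
  int64_t key = 0;            // key touched by this delta (unused for base pages)
  int64_t value = 0;
  PageNode *next = nullptr;   // physical pointer down the delta chain
  std::vector<std::pair<int64_t, int64_t>> base;  // sorted records, base page only
};

struct MappingTable {
  std::vector<std::atomic<PageNode *>> slots;   // indexed by logical page ID
  explicit MappingTable(size_t num_pages) : slots(num_pages) {}

  // Prepend `delta` to logical page `pid`. The single CAS makes the new chain
  // head visible everywhere the logical pointer is used; if another thread won
  // the race, re-point the delta at the new head and try again.
  void InstallDelta(uint64_t pid, PageNode *delta) {
    PageNode *head = slots[pid].load();
    do {
      delta->next = head;   // delta physically points at the current chain
    } while (!slots[pid].compare_exchange_weak(head, delta));
  }
};
```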

SLIDES 81-85

BW-TREE: MAPPING TABLE

Figure: the Mapping Table maps page IDs (PID 101-104) to physical addresses. Index pages reference other pages only by logical pointer (PID); the mapping table holds the physical pointers, so changing a page's physical location means updating a single mapping-table slot rather than every reference to it.

SLIDES 86-94

BW-TREE: DELTA UPDATES

Each update to a page produces a new delta.

Figure: Page 102 receives an ▲Insert 50 delta followed by a ▲Delete 48 delta. Each delta physically points to the previous head of the chain (ultimately the base page), and its address is installed in the page's physical address slot of the mapping table using CAS.

Source: Justin Levandoski

SLIDES 95-98

BW-TREE: SEARCH

Traverse the tree like a regular B+Tree.

→ If the mapping table points to a delta chain, stop at the first occurrence of the search key.
→ Otherwise, perform a binary search on the base page.
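
Continuing the PageNode / MappingTable sketch from the Bw-Tree overview, a point lookup on one logical page could look like the following (illustrative; a full search would first traverse inner pages to find the right leaf PID):

```cpp
// Stop at the first delta that mentions the key, otherwise binary-search the
// base page's sorted records. Reuses PageNode / MappingTable from earlier.
#include <algorithm>

bool LookupOnPage(MappingTable &table, uint64_t pid, int64_t key, int64_t *out) {
  for (PageNode *node = table.slots[pid].load(); node != nullptr; node = node->next) {
    switch (node->type) {
      case DeltaType::kInsert:
        if (node->key == key) { *out = node->value; return true; }  // newest wins
        break;
      case DeltaType::kDelete:
        if (node->key == key) return false;                          // logically gone
        break;
      case DeltaType::kBasePage: {
        auto it = std::lower_bound(node->base.begin(), node->base.end(),
                                   std::pair<int64_t, int64_t>{key, INT64_MIN});
        if (it != node->base.end() && it->first == key) { *out = it->second; return true; }
        return false;
      }
    }
  }
  return false;
}
```
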
SLIDES 99-105

BW-TREE: CONTENTION UPDATES

Threads may try to install updates to the same state of the page.

Figure: two threads build deltas (▲Delete 48 and ▲Insert 16) against the same chain head of Page 102 and race to CAS the mapping-table slot. The winner succeeds; any losers must retry or abort.

SLIDE 106

BW-TREE: DELTA TYPES

Record Update Deltas

→ Insert/Delete/Update of record on a page

Structure Modification Deltas

→ Split/Merge information


SLIDES 107-114

BW-TREE: CONSOLIDATION

Consolidate updates by creating a new page with the deltas applied.

Figure: Page 102 has accumulated ▲Insert 50, ▲Delete 48, and ▲Insert 55 deltas. A thread builds a New 102 page with those deltas applied and installs it by CAS-ing the mapping table address, which ensures no deltas are missed. The old page + deltas are marked as garbage.
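
A rough sketch of consolidation, again reusing the PageNode / MappingTable types sketched earlier (illustrative; a real Bw-Tree would trigger this once the chain grows past a threshold): replay the deltas onto a fresh base page and CAS the mapping-table slot; a failed CAS means a new delta raced in, so retry.

```cpp
// Illustrative consolidation: build a new base page, install it with CAS.
#include <algorithm>
#include <map>

PageNode *Consolidate(MappingTable &table, uint64_t pid) {
  while (true) {
    PageNode *old_head = table.slots[pid].load();

    // Remember only the newest delta per key; stop at the base page.
    std::map<int64_t, const PageNode *> newest;
    const PageNode *base = nullptr;
    for (const PageNode *n = old_head; n != nullptr; n = n->next) {
      if (n->type == DeltaType::kBasePage) { base = n; break; }
      newest.emplace(n->key, n);  // emplace keeps the first (newest) delta seen
    }

    // New base = old base records not overridden by a delta, plus new inserts.
    PageNode *new_page = new PageNode{};
    new_page->type = DeltaType::kBasePage;
    if (base != nullptr)
      for (const auto &rec : base->base)
        if (newest.count(rec.first) == 0) new_page->base.push_back(rec);
    for (const auto &entry : newest)
      if (entry.second->type == DeltaType::kInsert)
        new_page->base.emplace_back(entry.first, entry.second->value);
    std::sort(new_page->base.begin(), new_page->base.end());

    if (table.slots[pid].compare_exchange_strong(old_head, new_page))
      return old_head;  // caller hands the old chain to epoch-based GC
    delete new_page;    // lost the race; a new delta was installed, retry
  }
}
```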

SLIDE 115

BW-TREE: GARBAGE COLLECTION

Operations are tagged with an epoch

→ Each epoch tracks the threads that are part of it and the objects that can be reclaimed.
→ A thread joins an epoch prior to each operation and posts objects that can be reclaimed for the current epoch (not necessarily the one it joined).

Garbage for an epoch is reclaimed only when all threads have exited the epoch.
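
A minimal sketch of epoch-based reclamation as described on this slide (illustrative names such as EpochManager; a fixed thread count and a coarse mutex keep the example short): threads record the epoch they entered, retired objects are stamped with the epoch at which they were unlinked, and an object is freed only once no active thread can still reach it.

```cpp
// Illustrative epoch-based garbage collection (assumed names, simplified).
#include <array>
#include <atomic>
#include <cstdint>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

struct EpochManager {
  static constexpr int kMaxThreads = 64;
  std::atomic<uint64_t> global_epoch{1};
  std::array<std::atomic<uint64_t>, kMaxThreads> thread_epochs{};  // 0 = not in an epoch

  std::mutex gc_mutex;
  std::deque<std::pair<uint64_t, std::function<void()>>> retired;  // (epoch, deleter)

  void Enter(int tid) { thread_epochs[tid].store(global_epoch.load()); }
  void Exit(int tid) { thread_epochs[tid].store(0); }

  // Called after unlinking an object (e.g., an old base page + delta chain).
  void Retire(std::function<void()> deleter) {
    std::lock_guard<std::mutex> guard(gc_mutex);
    retired.emplace_back(global_epoch.load(), std::move(deleter));
  }

  // Advance the epoch and free everything retired before the oldest epoch
  // that some thread is still inside.
  void TryReclaim() {
    uint64_t min_active = UINT64_MAX;
    for (auto &e : thread_epochs) {
      uint64_t v = e.load();
      if (v != 0 && v < min_active) min_active = v;
    }
    global_epoch.fetch_add(1);
    std::lock_guard<std::mutex> guard(gc_mutex);
    while (!retired.empty() && retired.front().first < min_active) {
      retired.front().second();  // run the deleter for the old object
      retired.pop_front();
    }
  }
};
```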

SLIDES 116-127

BW-TREE: GARBAGE COLLECTION

Figure: an Epoch Table is maintained alongside the Mapping Table. CPU1 and CPU2 register in the epoch table while operating on Page 102's chain (▲Insert 50, ▲Delete 48, ▲Insert 55, New 102); the superseded page and deltas are reclaimed only after both CPUs have exited their epochs, leaving New 102 as the page's sole physical representation.

SLIDE 128

BW-TREE: STRUCTURE MODIFICATIONS

Split Delta Record

→ Mark that a subset of the base page's key range is now located at another page.
→ Use a logical pointer to the new page.

Separator Delta Record

→ Provide a shortcut in the modified page's parent indicating which key range maps to the new page.


SLIDES 129-145

BW-TREE: STRUCTURE MODIFICATIONS

Figure: an inner page routes ranges [-∞,3), [3,7), and [7,∞) to leaf pages 101, 103, and 104 (holding keys 1-2, 3-6, and 7-8) via the mapping table. To split leaf 103, a new page 105 is allocated with keys 5 and 6, a ▲Split delta is prepended to 103 recording that [5,7) now lives at page 105 (referenced by its logical pointer), and a ▲Separator delta is prepended to the parent adding the shortcut [5,7) → 105.

SLIDE 146

BW-TREE: PERFORMANCE

Chart: operations/sec (M) for Bw-Tree, B+Tree, and Skip List on the Xbox, Synthetic, and Deduplication workloads (values: 10.4, 3.83, 2.84, 0.56, 0.66, 0.33, 4.23, 1.02, 0.72).
Processor: 1 socket, 4 cores w/ 2×HT
Source: Justin Levandoski

SLIDE 147

BW-TREE: PERFORMANCE

Chart: operations/sec (M) for Open Bw-Tree, B+Tree, and Skip List on Insert-Only, Read-Only, and Read/Update workloads (values: 9.94, 15.5, 13.3, 8.09, 29, 1.51, 2.51, 2.78, 25.1).
Processor: 1 socket, 10 cores w/ 2×HT. Workload: 50m Random Integer Keys (64-bit)
Source: Ziqi Wang

SLIDE 148

BW-TREE: PERFORMANCE

Chart: same setup as the previous slide with Masstree and ART added (values: 9.94, 15.5, 13.3, 8.09, 29, 1.51, 2.51, 2.78, 25.1, 17.9, 30.5, 22, 44.9, 51.5, 42.9).
Processor: 1 socket, 10 cores w/ 2×HT. Workload: 50m Random Integer Keys (64-bit)
Source: Ziqi Wang

SLIDE 149

PARTING THOUGHTS

Managing a concurrent index looks a lot like managing a database. A non-concurrent Skip List is easy to implement. A Bw-Tree is hard to implement.

SLIDE 150

NEXT CLASS

Let's add latches back into our OLTP indexes! Other implementation issues. Crash course on performance testing.