Tree Indexes (Part I) Lecture # 06 Database Systems Andy Pavlo - - PowerPoint PPT Presentation

tree indexes
SMART_READER_LITE
LIVE PREVIEW

Tree Indexes (Part I) Lecture # 06 Database Systems Andy Pavlo - - PowerPoint PPT Presentation

Tree Indexes (Part I) Lecture # 06 Database Systems Andy Pavlo AP AP Computer Science 15-445/15-645 Carnegie Mellon Univ. Fall 2018 3 DATA STRUCTURES Internal Meta-data Core Data Storage Temporary Data Structures Table Indexes CMU


slide-1
SLIDE 1

Database Systems 15-445/15-645 Fall 2018 Andy Pavlo Computer Science Carnegie Mellon Univ.

AP AP

Lecture # 06

Tree Indexes

(Part I)

slide-2
SLIDE 2

CMU 15-445/645 (Fall 2018)

DATA STRUCTURES

Internal Meta-data Core Data Storage Temporary Data Structures Table Indexes

3

slide-3
SLIDE 3

CMU 15-445/645 (Fall 2018)

TABLE IN DEXES

A table index is a replica of a subset of a table's columns that are organized and/or sorted for efficient access using a subset of those columns. The DBMS ensures that the contents of the table and the index are logically in sync.

4

slide-4
SLIDE 4

CMU 15-445/645 (Fall 2018)

TABLE IN DEXES

It is the DBMS's job to figure out the best index(es) to use to execute each query. There is a trade-off on the number of indexes to create per database.

→ Storage Overhead → Maintenance Overhead

5

slide-5
SLIDE 5

CMU 15-445/645 (Fall 2018)

TO DAY'S AGEN DA

B+Tree Overview Design Decisions Optimizations

6

slide-6
SLIDE 6

CMU 15-445/645 (Fall 2018)

B- TREE FAM ILY

There is a specific data structure called a B-Tree, but then people also use the term to generally refer to a class of data structures.

→ B-Tree → B+Tree → Blink-Tree → B*Tree

7

slide-7
SLIDE 7

CMU 15-445/645 (Fall 2018)

B+ TREE

A B+Tree is a self-balancing tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in O(log n).

→ Generalization of a binary search tree in that a node can have more than two children. → Optimized for systems that read and write large blocks of data.

8

slide-8
SLIDE 8

CMU 15-445/645 (Fall 2018)

B+ TREE PRO PERTIES

A B+tree is an M-way search tree with the following properties:

→ It is perfectly balanced (i.e., every leaf node is at the same depth). → Every inner node other than the root, is at least half-full M/2-1 ≤ #keys ≤ M-1 → Every inner node with k keys has k+1 non-null children

9

slide-9
SLIDE 9

CMU 15-445/645 (Fall 2018)

B+ TREE EXAM PLE

10

Leaf Nodes Inner Node Sibling Pointers

5 9 6 7 9 13 1 3

slide-10
SLIDE 10

CMU 15-445/645 (Fall 2018)

B+ TREE EXAM PLE

10

Leaf Nodes <5 <9 Inner Node

<value>|<key>

Sibling Pointers

5 9 6 7 9 13 1 3

slide-11
SLIDE 11

CMU 15-445/645 (Fall 2018)

N O DES

Every node in the B+Tree contains an array of key/value pairs.

→ The keys will always be the column or columns that you built your index on → The values will differ based on whether the node is classified as inner nodes or leaf nodes.

The arrays are (usually) kept in sorted key order.

11

slide-12
SLIDE 12

CMU 15-445/645 (Fall 2018)

LEAF N O DE VALUES

Approach #1: Record Ids

→ A pointer to the location of the tuple that the index entry corresponds to.

Approach #2: Tuple Data

→ The actual contents of the tuple is stored in the leaf node. → Secondary indexes have to store the record id as their values.

12

slide-13
SLIDE 13

CMU 15-445/645 (Fall 2018)

B+Tree Leaf Node

B+ TREE LEAF N O DES

13

K1 V1 • • • Kn Vn

¤ ¤

Prev Next PageID PageID

slide-14
SLIDE 14

CMU 15-445/645 (Fall 2018)

B+Tree Leaf Node

B+ TREE LEAF N O DES

13

Key+ Value

K1 V1 • • • Kn Vn

¤ ¤

Prev Next

¤ ¤

PageID PageID

slide-15
SLIDE 15

CMU 15-445/645 (Fall 2018)

B+Tree Leaf Node

B+ TREE LEAF N O DES

13

Sorted Keys

K1 K2 K3 K4 K5 • • • Kn

Values

¤ ¤ ¤ ¤ ¤

  • • • ¤

¤

Prev

¤

Next

#

Level

#

Slots

slide-16
SLIDE 16

CMU 15-445/645 (Fall 2018)

B+Tree Leaf Node

B+ TREE LEAF N O DES

13

Sorted Keys

K1 K2 K3 K4 K5 • • • Kn

Values

¤ ¤ ¤ ¤ ¤

  • • • ¤

¤

Prev

¤

Next

#

Level

#

Slots

slide-17
SLIDE 17

CMU 15-445/645 (Fall 2018)

B- TREE VS. B+ TREE

The original B-Tree from 1972 stored keys + values in all nodes in the tree.

→ More space efficient since each key only appears once in the tree.

A B+Tree only stores values in leaf nodes. Inner nodes only guide the search process.

14

slide-18
SLIDE 18

CMU 15-445/645 (Fall 2018)

B+ TREE IN SERT

Find correct leaf L. Put data entry into L in sorted order. If L has enough space, done! Else, must split L into L and a new node L2

→ Redistribute entries evenly, copy up middle key. → Insert index entry pointing to L2 into parent of L.

To split inner node, redistribute entries evenly, but push up middle key.

15

Source: Chris Re

slide-19
SLIDE 19

CMU 15-445/645 (Fall 2018)

B+ TREE VISUALIZATIO N

https://cmudb.io/btree

Source: David Gales (Univ. of San Francisco)

16

slide-20
SLIDE 20

CMU 15-445/645 (Fall 2018)

B+ TREE DELETE

Start at root, find leaf L where entry belongs. Remove the entry. If L is at least half-full, done! If L has only M/2-1 entries,

→ Try to re-distribute, borrowing from sibling (adjacent node with same parent as L). → If re-distribution fails, merge L and sibling.

If merge occurred, must delete entry (pointing to L

  • r sibling) from parent of L.

17

Source: Chris Re

slide-21
SLIDE 21

CMU 15-445/645 (Fall 2018)

B+ TREES IN PRACTICE

Typical Fill-Factor: 67%.

→ Average Fanout = 2*100*0.67 = 134

Typical Capacities:

→ Height 4: 1334 = 312,900,721 entries → Height 3: 1333 = 2,406,104 entries

Pages per level:

→ Level 1 = 1 page = 8 KB → Level 2 = 134 pages = 1 MB → Level 3 = 17,956 pages = 140 MB

18

slide-22
SLIDE 22

CMU 15-445/645 (Fall 2018)

CLUSTERED IN DEXES

The table is stored in the sort order specified by the primary key.

→ Can be either heap- or index-organized storage.

Some DBMSs always use a clustered index.

→ If a table doesn’t include a pkey, the DBMS will automatically make a hidden row id pkey.

Other DBMSs cannot use them at all.

19

slide-23
SLIDE 23

CMU 15-445/645 (Fall 2018)

SELECTIO N CO N DITIO N S

The DBMS can use a B+Tree index if the query provides any of the attributes of the search key. Example: Index on <a,b,c>

→ Supported: (a=5 AND b=3) → Supported: (b=3).

Not all DBMSs support this. For hash index, we must have all attributes in search key.

20

slide-24
SLIDE 24

CMU 15-445/645 (Fall 2018)

SELECTIO N CO N DITIO N S

21

Find Key=(A,B)

A,C B,B C,C A,C B,A A,A A,B B,B B,C C,C C,D

slide-25
SLIDE 25

CMU 15-445/645 (Fall 2018)

SELECTIO N CO N DITIO N S

21

Find Key=(A,B) Find Key=(*,B)

A,C B,B C,C A,C B,A A,A A,B B,B B,C C,C C,D

slide-26
SLIDE 26

CMU 15-445/645 (Fall 2018)

SELECTIO N CO N DITIO N S

21

Find Key=(A,B) Find Key=(*,B)

A,C B,B C,C A,C B,A A,A A,B B,B B,C C,C C,D

*,B < C,C

slide-27
SLIDE 27

CMU 15-445/645 (Fall 2018)

B+ TREE DESIGN CH O ICES

Node Size Merge Threshold Variable Length Keys Non-Unique Indexes Intra-Node Search

22

slide-28
SLIDE 28

CMU 15-445/645 (Fall 2018)

N O DE SIZE

The slower the disk, the larger the optimal node size for a B+Tree.

→ HDD ~1MB → SSD: ~10KB → In-Memory: ~512B

Optimal sizes can vary depending on the workload

→ Leaf Node Scans vs. Root-to-Leaf Traversals

23

slide-29
SLIDE 29

CMU 15-445/645 (Fall 2018)

M ERGE TH RESH O LD

Some DBMSs don't always merge nodes when it is half full. Delaying a merge operation may reduce the amount of reorganization. May be better to just let underflows to exist and then periodically rebuild entire tree.

24

slide-30
SLIDE 30

CMU 15-445/645 (Fall 2018)

VARIABLE LEN GTH KEYS

Approach #1: Pointers

→ Store the keys as pointers to the tuple’s attribute.

Approach #2: Variable Length Nodes

→ The size of each node in the B+Tree can vary. → Requires careful memory management.

Approach #3: Key Map

→ Embed an array of pointers that map to the key + value list within the node.

25

slide-31
SLIDE 31

CMU 15-445/645 (Fall 2018)

N O N- UN IQ UE IN DEXES

Approach #1: Duplicate Keys

→ Use the same leaf node layout but store duplicate keys multiple times.

Approach #2: Value Lists

→ Store each key only once and maintain a linked list of unique values.

26

slide-32
SLIDE 32

CMU 15-445/645 (Fall 2018)

B+Tree Leaf Node

N O N- UN IQ UE: DUPLICATE KEYS

27

Sorted Keys

K1 K1 K1 K2 K2 • • • Kn

¤

Prev

¤

Next

#

Level

#

Slots Values

¤ ¤ ¤ ¤ ¤

  • • • ¤
slide-33
SLIDE 33

CMU 15-445/645 (Fall 2018)

B+Tree Leaf Node

N O N- UN IQ UE: VALUE LISTS

28

Values

¤ ¤ ¤ ¤ ¤

  • • •

¤

Prev

¤

Next

#

Level

#

Slots Sorted Keys

K1 K2 K3 K4 K5 • • • Kn

slide-34
SLIDE 34

CMU 15-445/645 (Fall 2018)

IN TRA- N O DE SEARCH

Approach #1: Linear

→ Scan node keys from beginning to end.

Approach #2: Binary

→ Jump to middle key, pivot left/right depending on comparison.

Approach #3: Interpolation

→ Approximate location of desired key based

  • n known distribution of keys.

29

Find Key=8

5 6 7 8 9 10 4

slide-35
SLIDE 35

CMU 15-445/645 (Fall 2018)

IN TRA- N O DE SEARCH

Approach #1: Linear

→ Scan node keys from beginning to end.

Approach #2: Binary

→ Jump to middle key, pivot left/right depending on comparison.

Approach #3: Interpolation

→ Approximate location of desired key based

  • n known distribution of keys.

29

Find Key=8

5 6 7 8 9 10 4

slide-36
SLIDE 36

CMU 15-445/645 (Fall 2018)

IN TRA- N O DE SEARCH

Approach #1: Linear

→ Scan node keys from beginning to end.

Approach #2: Binary

→ Jump to middle key, pivot left/right depending on comparison.

Approach #3: Interpolation

→ Approximate location of desired key based

  • n known distribution of keys.

29

Find Key=8

5 6 7 8 9 10 4 5 6 7 8 9 10 4

slide-37
SLIDE 37

CMU 15-445/645 (Fall 2018)

IN TRA- N O DE SEARCH

Approach #1: Linear

→ Scan node keys from beginning to end.

Approach #2: Binary

→ Jump to middle key, pivot left/right depending on comparison.

Approach #3: Interpolation

→ Approximate location of desired key based

  • n known distribution of keys.

29

Find Key=8

5 6 7 8 9 10 4 5 6 7 8 9 10 4

slide-38
SLIDE 38

CMU 15-445/645 (Fall 2018)

IN TRA- N O DE SEARCH

Approach #1: Linear

→ Scan node keys from beginning to end.

Approach #2: Binary

→ Jump to middle key, pivot left/right depending on comparison.

Approach #3: Interpolation

→ Approximate location of desired key based

  • n known distribution of keys.

29

Find Key=8

5 6 7 8 9 10 4 5 6 7 8 9 10 4

slide-39
SLIDE 39

CMU 15-445/645 (Fall 2018)

IN TRA- N O DE SEARCH

Approach #1: Linear

→ Scan node keys from beginning to end.

Approach #2: Binary

→ Jump to middle key, pivot left/right depending on comparison.

Approach #3: Interpolation

→ Approximate location of desired key based

  • n known distribution of keys.

29

Find Key=8

5 6 7 8 9 10 4 5 6 7 8 9 10 4 5 6 7 8 9 10 4

Offset: 7-(10-8)=5

slide-40
SLIDE 40

CMU 15-445/645 (Fall 2018)

IN TRA- N O DE SEARCH

Approach #1: Linear

→ Scan node keys from beginning to end.

Approach #2: Binary

→ Jump to middle key, pivot left/right depending on comparison.

Approach #3: Interpolation

→ Approximate location of desired key based

  • n known distribution of keys.

29

Find Key=8

5 6 7 8 9 10 4 5 6 7 8 9 10 4 5 6 7 8 9 10 4

Offset: 7-(10-8)=5

slide-41
SLIDE 41

CMU 15-445/645 (Fall 2018)

O PTIM IZATIO N S

Prefix Compression Suffix Truncation Bulk Insert Pointer Swizzling

30

slide-42
SLIDE 42

CMU 15-445/645 (Fall 2018)

PREFIX CO M PRESSIO N

Sorted keys in the same leaf node are likely to have the same prefix. Instead of storing the entire key each time, extract common prefix and store

  • nly unique suffix for each key.

→ Many variations.

31

robbed robbing robot

slide-43
SLIDE 43

CMU 15-445/645 (Fall 2018)

PREFIX CO M PRESSIO N

Sorted keys in the same leaf node are likely to have the same prefix. Instead of storing the entire key each time, extract common prefix and store

  • nly unique suffix for each key.

→ Many variations.

31

robbed robbing robot bed bing

  • t

Prefix: rob

slide-44
SLIDE 44

CMU 15-445/645 (Fall 2018)

SUFFIX TRUN CATIO N

The keys in the inner nodes are only used to "direct traffic".

→ We don't actually need the entire key.

Store a minimum prefix that is needed to correctly route probes into the index.

32

abcdefghijk lmnopqrstuv … … … …

slide-45
SLIDE 45

CMU 15-445/645 (Fall 2018)

SUFFIX TRUN CATIO N

The keys in the inner nodes are only used to "direct traffic".

→ We don't actually need the entire key.

Store a minimum prefix that is needed to correctly route probes into the index.

32

… … … … abc lmn

slide-46
SLIDE 46

CMU 15-445/645 (Fall 2018)

BULK IN SERT

The fastest/best way to build a B+Tree is to first sort the keys and then build the index from the bottom up.

33

slide-47
SLIDE 47

CMU 15-445/645 (Fall 2018)

BULK IN SERT

The fastest/best way to build a B+Tree is to first sort the keys and then build the index from the bottom up.

33

Keys: 3, 7, 9, 13, 6, 1 Sorted Keys: 1, 3, 6, 7, 9, 13

slide-48
SLIDE 48

CMU 15-445/645 (Fall 2018)

BULK IN SERT

The fastest/best way to build a B+Tree is to first sort the keys and then build the index from the bottom up.

33

6 7 9 13 1 3

Keys: 3, 7, 9, 13, 6, 1 Sorted Keys: 1, 3, 6, 7, 9, 13

slide-49
SLIDE 49

CMU 15-445/645 (Fall 2018)

BULK IN SERT

The fastest/best way to build a B+Tree is to first sort the keys and then build the index from the bottom up.

33

6 9 6 7 9 13 1 3

Keys: 3, 7, 9, 13, 6, 1 Sorted Keys: 1, 3, 6, 7, 9, 13

slide-50
SLIDE 50

CMU 15-445/645 (Fall 2018)

PO IN TER SWIZZLIN G

Nodes use page ids to reference other nodes in the index. The DBMS has to get the memory location from the page table during traversal. If a page is pinned in the buffer pool, then we can store raw pointers instead

  • f page ids, thereby removing the need

to get address from the page table.

34

slide-51
SLIDE 51

CMU 15-445/645 (Fall 2018)

PO IN TER SWIZZLIN G

Nodes use page ids to reference other nodes in the index. The DBMS has to get the memory location from the page table during traversal. If a page is pinned in the buffer pool, then we can store raw pointers instead

  • f page ids, thereby removing the need

to get address from the page table.

34

6 9 6 7 1 3

Buffer Pool

1

Header

2

Header

3

Header

Find Key>3

slide-52
SLIDE 52

CMU 15-445/645 (Fall 2018)

PO IN TER SWIZZLIN G

Nodes use page ids to reference other nodes in the index. The DBMS has to get the memory location from the page table during traversal. If a page is pinned in the buffer pool, then we can store raw pointers instead

  • f page ids, thereby removing the need

to get address from the page table.

34

6 9 6 7 1 3

Page #2

Buffer Pool

1

Header

2

Header

3

Header

Find Key>3

slide-53
SLIDE 53

CMU 15-445/645 (Fall 2018)

PO IN TER SWIZZLIN G

Nodes use page ids to reference other nodes in the index. The DBMS has to get the memory location from the page table during traversal. If a page is pinned in the buffer pool, then we can store raw pointers instead

  • f page ids, thereby removing the need

to get address from the page table.

34

6 9 6 7 1 3

Page #2

Buffer Pool

1

Header

2

Header

3

Header

Page #2

Find Key>3

slide-54
SLIDE 54

CMU 15-445/645 (Fall 2018)

PO IN TER SWIZZLIN G

Nodes use page ids to reference other nodes in the index. The DBMS has to get the memory location from the page table during traversal. If a page is pinned in the buffer pool, then we can store raw pointers instead

  • f page ids, thereby removing the need

to get address from the page table.

34

6 9 6 7 1 3

Page #2 Page #3

Buffer Pool

1

Header

2

Header

3

Header

Page #2 Page #3

Find Key>3

slide-55
SLIDE 55

CMU 15-445/645 (Fall 2018)

PO IN TER SWIZZLIN G

Nodes use page ids to reference other nodes in the index. The DBMS has to get the memory location from the page table during traversal. If a page is pinned in the buffer pool, then we can store raw pointers instead

  • f page ids, thereby removing the need

to get address from the page table.

34

6 9 6 7 1 3

Buffer Pool

1

Header

2

Header

3

Header

Find Key>3

<Page*> <Page*>

slide-56
SLIDE 56

CMU 15-445/645 (Fall 2018)

CO N CLUSIO N

The venerable B+Tree is always a good choice for your DBMS.

35

slide-57
SLIDE 57

CMU 15-445/645 (Fall 2018)

N EXT CLASS

Skip Lists Radix Trees Inverted Indexes

36