CS143: Index 1 Topics to Learn Important concepts Dense index vs. - - PowerPoint PPT Presentation

cs143 index
SMART_READER_LITE
LIVE PREVIEW

CS143: Index 1 Topics to Learn Important concepts Dense index vs. - - PowerPoint PPT Presentation

CS143: Index 1 Topics to Learn Important concepts Dense index vs. sparse index Primary index vs. secondary index (= clustering index vs. non-clustering index) Tree-based vs. hash-based index Tree-based index Indexed


slide-1
SLIDE 1

1

CS143: Index

slide-2
SLIDE 2

2

Topics to Learn

  • Important concepts

– Dense index vs. sparse index – Primary index vs. secondary index (= clustering index vs. non-clustering index) – Tree-based vs. hash-based index

  • Tree-based index

– Indexed sequential file – B+-tree

  • Hash-based index

– Static hashing – Extendible hashing

slide-3
SLIDE 3

3

Basic Problem

  • SELECT *

FROM Student WHERE sid = 40

  • How can we answer the query?

sid name GPA 20 Elaine 3.2 70 Peter 2.6 40 Susan 3.7

slide-4
SLIDE 4

4

Random-Order File

  • How do we find sid=40?

sid name GPA 20 Susan 3.5 60 James 1.7 70 Peter 2.6 40 Elaine 3.9 30 Christy 2.9

slide-5
SLIDE 5

5

Sequential File

  • Table sequenced by sid. Find sid=40?

sid name GPA 20 Susan 3.5 30 James 1.7 40 Peter 2.6 50 Elaine 3.9 60 Christy 2.9

slide-6
SLIDE 6

6

Binary Search

  • 100,000 records
  • Q: How many blocks to read?
  • Any better way?

– In a library, how do we find a book?

slide-7
SLIDE 7

7

Basic Idea

  • Build an “index” on the table

– An auxiliary structure to help us locate a record given a “key”

20 60 10 40 80

40

slide-8
SLIDE 8

8

Dense, Primary Index

  • Primary index (clustering index)

– Index on the search key

  • Dense index

– (key, pointer) pair for every record

  • Find the key from index and

follow pointer

– Maybe through binary search

  • Q: Why dense index?

– Isn’t binary search on the file the same?

20 10 40 30 60 50 80 70 100 90 Dense Index

10 20 30 40 50 60 70 80 90 100 110 120

Sequential File

slide-9
SLIDE 9

9

Why Dense Index?

  • Example

– 10,000,000 records (900-bytes/rec) – 4-byte search key, 4-byte pointer – 4096-byte block. Unspanned tuples

  • Q: How many blocks for table (how big)?
  • Q: How many blocks for index (how big)?
slide-10
SLIDE 10

10

Sparse, Primary Index

  • Sparse index

– (key, pointer) pair per every “block” – (key, pointer) pair points to the first record in the block

  • Q: How can we find 60?

Sequential File

20 10 40 30 60 50 80 70 100 90

Sparse Index 10 30 50 70 90 110 130 150

slide-11
SLIDE 11

11

Multi-level index

Sequential File 20 10 40 30 60 50 80 70 100 90 Sparse 2nd level

10 30 50 70 90 110 130 150 170 190 210 230 10 90 170 250 330 410 490 570

1st level Q: Why multi-level index? Q: Does dense, 2nd level index make sense?

slide-12
SLIDE 12

12

Secondary (non-clustering) Index

  • Secondary (non-clustering)

index

– When tuples in the table are not ordered by the index search key

  • Index on a non-search-key for

sequential file

  • Unordered file
  • Q: What index?

– Does sparse index make sense?

Sequence field

50 30 70 20 40 80 10 100 60 90

slide-13
SLIDE 13

13

Sparse and secondary index?

50 30 70 20 40 80 10 100 60 90

30 20 80 100 90 ...

slide-14
SLIDE 14

14

Secondary index

50 30 70 20 40 80 10 100 60 90

10 20 30 40 50 60 70 ... 10 50 90 ...

sparse High level

  • First level is always dense
  • Sparse from the second level
slide-15
SLIDE 15

15

Important terms

  • Dense index vs. sparse index
  • Primary index vs. secondary index

– Clustering index vs. non-clustering index

  • Multi-level index
  • Indexed sequential file

– Sometimes called ISAM (indexed sequential access method)

  • Search key ( ≠ primary key)
slide-16
SLIDE 16

16

Insertion

20 10 30 50 40 60

10 30 40 60

Insert 35

Q: Do we need to update higher-level index?

35

slide-17
SLIDE 17

17

Insertion

10 30 50 40 60

10 30 40 60

Insert 15

Q: Do we need to update higher-level index?

15 20

slide-18
SLIDE 18

18

Insertion

10 50 40 60

10 30 40 60

Q: Do we need to update higher-level index?

20 20 30 15 Insert 15

slide-19
SLIDE 19

19

Potential performance problem

After many insertions…

10 20 30 40 50 60 70 80 90 39 31 35 36 32 38 34 33

  • verflow pages

(not sequential)

Main index

slide-20
SLIDE 20

20

Traditional Index (ISAM)

  • Advantage

– Simple – Sequential blocks

  • Disadvantage

– Not suitable for updates – Becomes ugly (loses sequentiality and balance) over time

slide-21
SLIDE 21

21

B+Tree

  • Most popular index structure in RDBMS
  • Advantage

– Suitable for dynamic updates – Balanced – Minimum space usage guarantee

  • Disadvantage

– Non-sequential index blocks

slide-22
SLIDE 22

22

B+Tree Example (n=3)

20 30 50 80 90 70 50 80 70 Leaf Non leaf root

20 Susan 2.7 30 James 3.6 50 Peter 1.8 … … …

... ... ...

Balanced: All leaf nodes are at the same level

slide-23
SLIDE 23

23

  • n: max # of pointers in a node
  • All pointers (except the last one) point to tuples
  • At least half of the pointers are used.

(more precisely, ⎡(n+1)/2⎤ pointers)

Sample Leaf Node (n=3)

20 30

From a non-leaf node Last pointer: to the next leaf node

20 Susan 2.7 30 James 3.6 50 Peter 1.8 … … …

points to tuple

slide-24
SLIDE 24

24

  • Points to the nodes one-level below
  • No direct pointers to tuples
  • At least half of the ptrs used (precisely, ⎡n/2⎤)
  • except root, where at least 2 ptrs used

Sample Non-leaf Node (n=3)

23 56

To keys 23≤ k<56 To keys 56≤ k To keys k<23

slide-25
SLIDE 25

25

  • Find a greater key and follow the link on the left

(Algorithm: Figure 12.10 on textbook)

  • Find 30, 60, 70?

Search on B+tree

20 30 80 90 70 50 80 70 50

slide-26
SLIDE 26

26

Nodes are never too empty

  • Use at least

Non-leaf: ⎡n/2⎤ pointers Leaf: ⎡(n+1)/2⎤ pointers

full node min. node

Non-leaf Leaf

n=4

5 8 10 5 5 8 10 5 8

slide-27
SLIDE 27

27

Non-leaf (non-root) n n-1 ⎡n/2⎤

⎡n/2⎤-1

Leaf (non-root) n n-1 Root n n-1 2 1 Max Max Min Min Ptrs keys ptrs keys

⎡(n+1)/2⎤

⎡(n-1)/2⎤

Number of Ptrs/Keys for B+tree

slide-28
SLIDE 28

28

(a) simple case (no overflow) (b) leaf overflow (c) non-leaf overflow (d) new root

B+Tree Insertion

slide-29
SLIDE 29

29

(a) Simple case (no overflow)

slide-30
SLIDE 30

30

  • Insert 60

Insertion (Simple Case)

20 30 80 90 70 50 80 70 50

slide-31
SLIDE 31

31

  • Insert 60

Insertion (Simple Case)

20 30 80 90 70 50 80 70 50 60

slide-32
SLIDE 32

32

(b) Leaf overflow

slide-33
SLIDE 33

33

  • Insert 55
  • No space to store 55

Insertion (Leaf Overflow)

20 30 50 60 80 90 70 50 80 70

slide-34
SLIDE 34

34

50 55

  • Insert 55
  • Split the leaf into two. Put the keys half and half

Insertion (Leaf Overflow)

20 30 80 90 60

Overflow!

70 50 80 70

slide-35
SLIDE 35

35

  • Insert 55

Insertion (Leaf Overflow)

20 30 50 55 80 90 70 50 80 70 60

slide-36
SLIDE 36

36

  • Insert 55
  • Copy the first key of the new node to parent

Insertion (Leaf Overflow)

20 30 50 55 80 90 60 70 50 80 70 60

slide-37
SLIDE 37

37

  • Insert 55

Insertion (Leaf Overflow)

20 30 50 55 80 90

  • Q: After split, leaf nodes always half full?

No overflow. Stop 70 50 80 70 60 60

slide-38
SLIDE 38

38

(c) Non-leaf overflow

slide-39
SLIDE 39

39

Insertion (Non-leaf Overflow)

  • Insert 52

20 30 50 55 50 60 Leaf overflow. Split and copy the first key of the new node 60 70

slide-40
SLIDE 40

40

Insertion (Non-leaf Overflow)

  • Insert 52

20 30 50 52 50 60 55 60 70

slide-41
SLIDE 41

41

Insertion (Non-leaf Overflow)

  • Insert 52

20 30 50 52 50 60 55 55 60 70

slide-42
SLIDE 42

42

Insertion (Non-leaf Overflow)

  • Insert 52

20 30 50 52 50 55 60

Overflow!

55 60 70

slide-43
SLIDE 43

43

Insertion (Non-leaf Overflow)

  • Insert 52

20 30 50 52 50 55 Split the node into two. Move up the key in the middle. 60 55 60 70

slide-44
SLIDE 44

44

Insertion (Non-leaf Overflow)

  • Insert 52

20 30 50 52 55 Middle key 55 60 50 60 70

slide-45
SLIDE 45

45

Insertion (Non-leaf Overflow)

  • Insert 52

20 30 50 52 55 70 No overflow. Stop Q: After split, non-leaf at least half full? 55 60 50 60

slide-46
SLIDE 46

46

(d) New root

slide-47
SLIDE 47

47

Insertion (New Root Node)

  • Insert 25

20 30 50 55 50 60 60

slide-48
SLIDE 48

48

Insertion (New Root Node)

  • Insert 25

20 25 50 55 50 60

30

Overflow!

60 30

slide-49
SLIDE 49

49

Insertion (New Root Node)

  • Insert 25

20 25 50 55 50 60

30

Split and move up the mid-key. Create new root

60 30

slide-50
SLIDE 50

50

Insertion (New Root Node)

  • Insert 25

20 25 50 55

  • Q: At least 2 ptrs at root?

60 60 30 30 50

slide-51
SLIDE 51

51

B+Tree Insertion

  • Leaf node overflow

– The first key of the new node is copied to the parent

  • Non-leaf node overflow

– The middle key is moved to the parent

  • Detailed algorithm: Figure 12.13
slide-52
SLIDE 52

52

B+Tree Deletion

(a) Simple case (no underflow) (b) Leaf node, coalesce with neighbor (c) Leaf node, redistribute with neighbor (d) Non-leaf node, coalesce with neighbor (e) Non-leaf node, redistribute with neighbor In the examples, n = 4

– Underflow for non-leaf when fewer than ⎡n/2⎤ = 2 ptrs – Underflow for leaf when fewer than ⎡(n+1)/2⎤ = 3 ptrs – Nodes are labeled as a, b, c, d, …

slide-53
SLIDE 53

53

(a) Simple case (no underflow)

slide-54
SLIDE 54

54

(a) Simple case

  • Delete 25

20 25 30 40 50 20 40 60

a b c d e

slide-55
SLIDE 55

55

(a) Simple case

  • Delete 25

– Underflow? Min 3 ptrs. Currently 3 ptrs 20 30 20 40 60

a b c d e

Underflow?

40 50

slide-56
SLIDE 56

56

(b) Leaf node, coalesce with neighbor

slide-57
SLIDE 57

57

(b) Coalesce with sibling (leaf)

  • Delete 50

20 30 40 50 20 40 60 60

b c d a e

slide-58
SLIDE 58

58

(b) Coalesce with sibling (leaf)

  • Delete 50

– Underflow? Min 3 ptrs, currently 2.

20 40 60 60

b c d a

Underflow?

40 20 30

e

slide-59
SLIDE 59

59

(b) Coalesce with sibling (leaf)

  • Delete 50

– Try to merge with a sibling

20 40 60 60

b c d a

underflow! Can be merged?

40 20 30

e

slide-60
SLIDE 60

60

(b) Coalesce with sibling (leaf)

  • Delete 50

– Merge c and d. Move everything on the right to the left. 20 40 60 60

b c d a

Merge

40 20 30

e

slide-61
SLIDE 61

61

(b) Coalesce with sibling (leaf)

  • Delete 50

– Once everything is moved, delete d

20 30 40 20 40 60 60

b c d a e

slide-62
SLIDE 62

62

(b) Coalesce with sibling (leaf)

  • Delete 50

– After leaf node merge,

  • From its parent, delete the pointer and key to the

deleted node 20 30 40 20 40 60 60

b c d e a

slide-63
SLIDE 63

63

(b) Coalesce with sibling (leaf)

  • Delete 50

– Check underflow at a. Min 2 ptrs, currently 3 20 30 40 20 60 60

b c a

Underflow?

e

slide-64
SLIDE 64

64

(c) Leaf node, redistribute with neighbor

slide-65
SLIDE 65

65

(c) Redistribute (leaf)

  • Delete 50

20 40 60 60

b c d e a

40 50 20 25 30

slide-66
SLIDE 66

66

(c) Redistribute (leaf)

  • Delete 50

– Underflow? Min 3 ptrs, currently 2 – Check if d can be merged with its sibling c

  • r e

– If not, redistribute the keys in d with a sibling

  • Say, with c

20 40 60 60

b c d e a

Underflow? Can be merged?

40 20 25 30

slide-67
SLIDE 67

67

(c) Redistribute (leaf)

  • Delete 50

– Redistribute c and d, so that nodes c and d are roughly “half full”

  • Move the key 30 and its tuple pointer to the d

20 40 60 60

b c d e a

Redistribute

40 20 25 30

slide-68
SLIDE 68

68

(c) Redistribute (leaf)

  • Delete 50

– Update the key in the parent

20 25 20 40 60 60

b c d e a

30 40

slide-69
SLIDE 69

69

(c) Redistribute (leaf)

  • Delete 50

– No underflow at a. Done.

20 40 60 60

b c d e a

30 Underflow? 20 25 30 40

slide-70
SLIDE 70

70

(d) Non-leaf node, coalesce with neighbor

slide-71
SLIDE 71

71

(d) Coalesce (non-leaf)

  • Delete 20

– Underflow! Merge d with e.

  • Move everything in the right to the left

70

a b c d e f g

50 90 50 60 70 30 30 40 10 20

slide-72
SLIDE 72

72

(d) Coalesce (non-leaf)

  • Delete 20

– From the parent node, delete pointer and key to the deleted node

70

a b c d e f g

50 90 50 60 70 30 10 30 40

slide-73
SLIDE 73

73

(d) Coalesce (non-leaf)

  • Delete 20

– Underflow at b? Min 2 ptrs, currently 1. – Try to merge with its sibling.

  • Nodes b and c: 3 ptrs in total. Max 4 ptrs.
  • Merge b and c.

70

a b c d f g

underflow! Can be merged?

50 90 50 60 70 10 30 40

slide-74
SLIDE 74

74

(d) Coalesce (non-leaf)

  • Delete 20

– Merge b and c

  • Pull down the mid-key 50 in the parent node
  • Move everything in the right node to the left.
  • Very important: when we merge non-leaf nodes,

we always pull down the mid-key in the parent and place it in the merged node.

70

a b c d f g

merge

50 90 50 60 70 10 30 40

slide-75
SLIDE 75

75

(d) Coalesce (non-leaf)

  • Delete 20

– Merge b and c

  • Pull down the mid-key 50 in the parent node
  • Move everything in the right node to the left.
  • Very important: when we merge non-leaf nodes,

we always pull down the mid-key in the parent and place it in the merged node.

70

b c d f g

50 60 70 90 50

a

10 30 40

slide-76
SLIDE 76

76

(d) Coalesce (non-leaf)

70

a b c d f g

90 50 60 50 70

  • Delete 20

– Delete pointer to the merged node.

10 30 40

slide-77
SLIDE 77

77

(d) Coalesce (non-leaf)

70

a b d f g

90 50 60 50 70

  • Delete 20

– Underflow at a? Min 2 ptrs. Currently 2. Done.

10 30 40

slide-78
SLIDE 78

78

(e) Non-leaf node, redistribute with neighbor

slide-79
SLIDE 79

79

(e) Redistribute (non-leaf)

  • Delete 20

– Underflow! Merge d with e.

70 70 90 97

a b c d e f g

50 60 50 99 30 30 40 10 20

slide-80
SLIDE 80

80

(e) Redistribute (non-leaf)

  • Delete 20

– After merge, remove the key and ptr to the deleted node from the parent

70 70 90 97

a b c d e f g

50 60 50 99 30 10 30 40

slide-81
SLIDE 81

81

(e) Redistribute (non-leaf)

  • Delete 20

– Underflow at b? Min 2 ptrs, currently 1. – Merge b with c? Max 4 ptrs, 5 ptrs in total. – If cannot be merged, redistribute the keys with a sibling.

  • Redistribute b and c

70 70 90 97

a b c d f g

underflow! Can be merged?

50 60 50 99 10 30 40

slide-82
SLIDE 82

82

(e) Redistribute (non-leaf)

  • Delete 20

Redistribution at a non-leaf node is done in two steps. Step 1: Temporarily, make the left node b “overflow” by pulling down the mid-key and moving everything to the left. 70 70 90 97

a b c d f g

redistribute

50 60 50 99 10 30 40

slide-83
SLIDE 83

83

(e) Redistribute (non-leaf)

  • Delete 20

Step 2: Apply the “overflow handling algorithm” (the same algorithm used for B+tree insertion) to the

  • verflowed node

– Detailed algorithm in the next slide

70 50 70 90

a b c d f g

redistribute

97

temporary overflow

50 60 99 10 30 40

slide-84
SLIDE 84

84

(e) Redistribute (non-leaf)

  • Delete 20

Step 2: “overflow handling algorithm”

– Pick the mid-key (say 90) in the node and move it to parent. – Move everything to the right of 90 to the empty node c.

70 50 70 90

a b c d f g

redistribute

97 50 60 99 10 30 40

slide-85
SLIDE 85

85

(e) Redistribute (non-leaf)

  • Delete 20

– Underflow at a? Min 2 ptrs, currently 3. Done

70

a b c d f g

50 60 90 99 97 50 70 10 30 40

slide-86
SLIDE 86

86

Important Points

  • Remember:

– For leaf node merging, we delete the mid-key from the parent – For non-leaf node merging/redistribution, we pull down the mid-key from their parent.

  • Exact algorithm: Figure 12.17
  • In practice

– Coalescing is often not implemented

  • Too hard and not worth it
slide-87
SLIDE 87

87

Where does n come from?

  • n determined by

– Size of a node – Size of search key – Size of an index pointer

  • Q: 1024B node, 10B key, 8B ptr à n?
slide-88
SLIDE 88

88

Question on B+tree

  • SELECT *

FROM Student WHERE sid > 60?

20 30 50 60 80 90 70 50 80 70

slide-89
SLIDE 89

89

Summary on tree index

  • Issues to consider

– Sparse vs. dense – Primary (clustering) vs. secondary (non-clustering)

  • Indexed sequential file (ISAM)

– Simple algorithm. Sequential blocks – Not suitable for dynamic environment

  • B+trees

– Balanced, minimum space guarantee – Insertion, deletion algorithms

slide-90
SLIDE 90

90

Index Creation in SQL

  • CREATE INDEX <indexname>

ON <table>(<attr>,<attr>,…)

  • Example

– CREATE INDEX stidx ON Student(sid)

  • Creates a B+tree on the attributes
  • Speeds up lookup on sid
slide-91
SLIDE 91

91

Primary (Clustering) Index

  • MySQL:

– Primary key becomes the clustering index

  • DB2:

– CREATE INDEX idx ON Student(sid) CLUSTER – Tuples in the table are sequenced by sid

  • Oracle: Index-Organized Table (IOT)

– CREATE TABLE T ( ... ) ORGANIZATION INDEX – B+tree on primary key – Tuples are stored at the leaf nodes of B+tree

  • Periodic reorganization may still be necessary

to improve range scan performance

slide-92
SLIDE 92

92

Next topic

  • Hash index

– Static hashing – Extendible hashing

slide-93
SLIDE 93

93

What is a Hash Table?

  • Hash Table

– Hash function

  • h(k): key à integer [0…n]
  • e.g., h(‘Susan’) = 7

– Array for keys: T[0…n] – Given a key k, store it in T[h(k)]

1 Neil 2 3 James 4 Susan 5 h(Susan) = 4 h(James) = 3 h(Neil) = 1

slide-94
SLIDE 94

94

Hashing for DBMS (Static Hashing)

(key, record) . . .

Disk blocks (buckets) search key → h(key)

1 2 3 4

slide-95
SLIDE 95

95

Overflow and Chaining

  • Insert

h(a) = 1 h(b) = 2 h(c) = 1 h(d) = 0 h(e) = 1

  • Delete

h(b) = 2 h(c) = 1

1 2 3

a b c d e

slide-96
SLIDE 96

96

Major Problem of Static Hashing

  • How to cope with growth?

– Data tends to grow in size – Overflow blocks unavoidable

10 20 30 40 50 60 70 80 90 39 31 35 36 32 38 34 33

  • verflow blocks

hash buckets

slide-97
SLIDE 97

97

(a) Use i of b bits output by hash function

Extendible Hashing (two ideas) 00110101

h(K) → b

use i → grows over time

slide-98
SLIDE 98

98

(b) Use directory that maintains pointers to hash buckets (indirection)

Extendible Hashing (two ideas)

. . . . . . c e hash bucket directory h(c)

slide-99
SLIDE 99

99

Example

  • h(k) is 4 bits; 2 keys/bucket

Insert 0111

1 1 0001 1001 1100 1

1

i = i = i = 0111

slide-100
SLIDE 100

100

Example

Insert 1010

1 1 0001 1001 1100 0111 i = i = 1

1

i = 1010

  • verflow!

Increase i of the bucket. Split it.

slide-101
SLIDE 101

101

Example

Insert 1010

1 1 0001 1001 1100 0111 1010 1010

  • verflow!

2

Redistribute keys based

  • n first i bits

i = 2 i = i = 1

1

i =

slide-102
SLIDE 102

102

Example

Insert 1010

1 2 0001 1001 1010 0111 2 1100 Update ptr in dir to new bkt 1

1

i =

?

If no space, double directory size (increase i)

slide-103
SLIDE 103

103

Example

Insert 1010

1 2 0001 1001 1010 0111 2 1100 1

1

i =

00 01 10 11

2 i = Copy pointers

slide-104
SLIDE 104

104

1

1

i =

Example

Insert 1010

1 2 0001 1001 1010 0111 2 1100

00 01 10 11

2 i =

slide-105
SLIDE 105

105

Example

Insert 0000

1 2 0001 1001 1010 0111 2 1100

00 01 10 11

2 i = 0000 Overflow! Split bucket and increase i

slide-106
SLIDE 106

106

Example

Insert 0000

1 2 0001 1001 1010 0111 2 1100

00 01 10 11

2 i = 2 2 0000 Overflow! Redistribute keys

slide-107
SLIDE 107

107

Example

Insert 0000

1 2 0111 1001 1010 2 1100

00 01 10 11

2 i = 2 2 0000 0001 Update ptr in directory

slide-108
SLIDE 108

108

Example

Insert 0000

1 2 0111 1001 1010 2 1100

00 01 10 11

2 i = 2 2 0000 0001

slide-109
SLIDE 109

109

Insert 0011

2 2 0111 1001 1010 2 1100

00 01 10 11

2 i = 2 0000 0001 0011 Overflow!

Split bucket, increase i, redistribute keys

slide-110
SLIDE 110

110

2 2 0111 1001 1010 2 1100

00 01 10 11

2 i = 2 0011

Update ptr in dir If no space, double directory

3 0000 0001 3

Insert 0011

slide-111
SLIDE 111

111

2 2 0111 1001 1010 2 1100

00 01 10 11

2 i = 2 0011 3 0000 0001 3

Insert 0011

000 001 010 011

3 i =

100 101 110 111

slide-112
SLIDE 112

112

2 2 0111 1001 1010 2 1100

00 01 10 11

2 i = 2 0011 3 0000 0001 3

Insert 0011

000 001 010 011

3 i =

100 101 110 111

slide-113
SLIDE 113

113

Extendible Hashing: Deletion

  • Two options

a) No merging of buckets b) Merge buckets and shrink directory if possible

slide-114
SLIDE 114

114

Delete 1010

1 0001 2 1001 2 1100

00 01 10 11

2 i = a b c 1010

slide-115
SLIDE 115

115

Delete 1010

  • Can we merge a and b? b and c?

1 0001 2 1001 2 1100

00 01 10 11

2 i = a b c

slide-116
SLIDE 116

116

Delete 1010

Decrease i and merge buckets 1 0001 2 1001

00 01 10 11

2 i = a b 2 1100 c 1 1100 Update ptr in directory Q: Can we shrink directory?

slide-117
SLIDE 117

117

Delete 1010

1 0001 2 1001

00 01 10 11

2 i = a b 1 1100 1

1

i =

slide-118
SLIDE 118

118

Bucket Merge Condition

  • Bucket merge condition

– Bucket i’s are the same – First (i-1) bits of the hash key are the same

  • Directory shrink condition

– All bucket i’s are smaller than the directory i

slide-119
SLIDE 119

119

Questions on Extendible Hashing

  • Can we provide minimum space

guarantee?

slide-120
SLIDE 120

120

Space Waste

2 1 4 00010 4 00000 00001 4 i = 3

slide-121
SLIDE 121

121

Hash index summary

  • Static hashing

– Overflow and chaining

  • Extendible hashing

– Can handle growing files

  • No periodic reorganizations

– Indirection

  • Up to 2 disk accesses to access a key

– Directory doubles in size

  • Not too bad if the data is not too large
slide-122
SLIDE 122

122

Hashing vs. Tree

  • Can an extendible-hash index support?

SELECT *

FROM R

WHERE R.A > 5

  • Which one is better, B+tree or

Extendible hashing? SELECT *

FROM R

WHERE R.A = 5