Hash-Based Indexes. UMass Amherst, March 6, 2008.



SLIDE 1

Hash-Based Indexes

UMass Amherst March 6, 2008

Slides Courtesy of R. Ramakrishnan and J. Gehrke

SLIDE 2

Introduction

As for any index, 3 alternatives for data entries k*:

1. Data record with key value k
2. <k, rid of data record with search key value k>
3. <k, list of rids of data records with search key k>

Choice is orthogonal to the indexing technique.

Hash-based indexes are best for equality selections. Cannot support range searches.

Static and dynamic hashing techniques exist, with trade-offs for dynamic data.

SLIDE 3

Static Hashing

h(k) mod N = bucket to which data entry with key k belongs. Distinct keys k1 != k2 can lead to the same bucket.

Static: # buckets (N) fixed.

  • main pages allocated sequentially, never de-allocated;
  • overflow pages if needed.

[Figure: h(key) mod N directs a key to one of the primary bucket pages 0 ... N-1, each with a chain of overflow pages.]

SLIDE 4

Static Hashing (Contd.)

Hash fn works on search key field of record r. Must distribute values over range 0 ... N-1.

h(key) mod N = (a * key + b) mod N usually works well. a and b are constants; lots known about how to tune h.

Buckets contain data entries. Long overflow chains can develop and degrade performance.

Extendible and Linear Hashing: dynamic techniques to fix this problem.
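As a toy illustration of the static scheme (bucket count, page capacity, the constants a and b, and the keys below are our own illustrative choices, not from the slides), a static hashed file with overflow chains might look like:

```python
# Toy static hashed file: N primary bucket pages, fixed for the life of
# the file; when a primary page fills, further entries for that bucket
# go onto its overflow chain.
N = 4          # number of primary bucket pages (fixed)
CAPACITY = 2   # entries per primary page (tiny, to force overflow)
A, B = 37, 11  # constants of h(key) = (a * key + b)

primary = [[] for _ in range(N)]
overflow = [[] for _ in range(N)]  # one flattened overflow chain per bucket

def h(key):
    return (A * key + B) % N

def insert(key):
    b = h(key)
    (primary[b] if len(primary[b]) < CAPACITY else overflow[b]).append(key)

def search(key):
    b = h(key)  # one primary page read, plus the overflow chain if any
    return key in primary[b] or key in overflow[b]

for k in [1, 5, 9, 13, 2, 7]:  # 1, 5, 9, 13 all collide in one bucket
    insert(k)
```

With these constants, keys 1, 5, 9, 13 all land in bucket 0, so 9 and 13 end up on the overflow chain: exactly the degradation the dynamic schemes below are designed to avoid.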

SLIDE 5

Extendible Hashing

Situation: Bucket (primary page) becomes full.

Why not re-organize file by doubling # of buckets?

Reading and writing all pages is expensive!

Idea: Use directory of pointers to buckets; double # of buckets by (1) doubling the directory, (2) splitting just the bucket that overflowed!

Directory much smaller than file, so doubling it is much cheaper. Only one page of data entries is split. No overflow page!

Trick lies in how hash function is adjusted!

SLIDE 6

Example

Directory is array of size 4, global depth D = 2.

Each bucket has local depth L (L <= D).

To find bucket for r, (1) get h(r), (2) take last `global depth' # bits of h(r).

If h(r) = 5 = binary 101, take last 2 bits, go to bucket pointed to by 01.

[Figure: directory entries 00, 01, 10, 11 (GLOBAL DEPTH D = 2) point to data entry pages Bucket A = 4*, 12*, 32*, 16*; Bucket B = 1*, 5*, 21*, 13*; Bucket C = 10*; Bucket D = 15*, 7*, 19*; each has LOCAL DEPTH L = 2.]
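The lookup rule can be sketched directly (bucket contents copied from the example; as on the slide, h is the identity on the data entry's value):

```python
# Directory lookup with global depth D = 2: a data entry's bucket is the
# directory slot given by the last 2 bits of h(r).
D = 2
bucket_A = [4, 12, 32, 16]  # directory entry 00
bucket_B = [1, 5, 21, 13]   # directory entry 01
bucket_C = [10]             # directory entry 10
bucket_D = [15, 7, 19]      # directory entry 11
directory = [bucket_A, bucket_B, bucket_C, bucket_D]

def lookup(h_r):
    return directory[h_r & ((1 << D) - 1)]  # mask keeps the last D bits

# h(r) = 5 = binary 101: last 2 bits are 01, so the search goes to bucket B.
```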

SLIDE 7

Inserts

If bucket is full, split it (allocate new page, re-distribute).

If necessary, double the directory. Whether doubling is needed can be decided by comparing global depth and local depth for the split bucket:

  • Double if global depth = local depth.
  • Don't otherwise.

[Figure: directory and buckets as on the previous slide.]

Insert r with h(r)=20?

SLIDE 8

Insert h(r)=20 (Causes Doubling)

[Figure: before and after. Before (global depth 2): inserting 20* overflows Bucket A = 32*, 16*, 4*, 12*. After (global depth 3, directory entries 000-111): Bucket A = 32*, 16* (local depth 3); Bucket A2, the `split image' of Bucket A, = 4*, 12*, 20* (local depth 3); Buckets B = 1*, 5*, 21*, 13*, C = 10*, D = 15*, 7*, 19* unchanged (local depth 2).]

SLIDE 9

Points to Note

20 = binary 10100. Last 2 bits (00) tell us r belongs in A or A2. Last 3 bits needed to tell which.

Global depth of directory: Max # of bits needed to tell which bucket an entry belongs to.

Local depth of a bucket: # of bits used to determine if an entry belongs to this bucket.

When does bucket split cause directory doubling?

Before insert, local depth of bucket = global depth. Insert causes local depth to become > global depth; directory is doubled by copying it over and `fixing' pointer to split image page. (Use of least significant bits enables efficient doubling via copying of directory!)
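Putting the last few slides together, a minimal in-memory sketch of extendible hashing (identity hash, bucket capacity 4; the class and method names are ours) reproduces the running example, including the directory doubling caused by inserting 20:

```python
# Minimal in-memory extendible hashing.  Doubling the directory is
# literally copying it, thanks to least-significant-bit addressing.
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.items = []

class ExtendibleHash:
    CAP = 4  # data entries per bucket page

    def __init__(self):
        self.global_depth = 1
        self.dir = [Bucket(1), Bucket(1)]

    def _bucket(self, key):
        return self.dir[key & ((1 << self.global_depth) - 1)]

    def search(self, key):
        return key in self._bucket(key).items

    def insert(self, key):
        b = self._bucket(key)
        if len(b.items) < self.CAP:
            b.items.append(key)
            return
        if b.local_depth == self.global_depth:
            self.dir = self.dir + self.dir   # double directory by copying
            self.global_depth += 1
        b.local_depth += 1
        image = Bucket(b.local_depth)        # b's `split image'
        pending, b.items = b.items + [key], []
        for i, slot in enumerate(self.dir):  # fix pointers whose new bit is 1
            if slot is b and (i >> (b.local_depth - 1)) & 1:
                self.dir[i] = image
        for k in pending:                    # redistribute (may split again)
            self.insert(k)

eh = ExtendibleHash()
for k in [32, 16, 4, 12, 1, 5, 21, 13, 10, 15, 7, 19]:
    eh.insert(k)   # reaches the earlier example state, global depth 2
eh.insert(20)      # bucket A overflows: split plus directory doubling
```

After the final insert the sketch matches the slides: global depth 3, bucket A holding 32 and 16, and its split image holding 4, 12, 20.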

SLIDE 10

Directory Doubling (inserting 8*)

[Figure: directory doubling with Least Significant vs. Most Significant bit addressing. With least-significant bits, going from global depth 2 (entries 00-11) to depth 3 (entries 000-111) just copies the directory and fixes the pointer to the split image page; with most-significant bits, the directory entries must be reshuffled.]

SLIDE 11

Comments on Extendible Hashing

If directory fits in memory, equality search answered with one disk access; else two.

100 MB file, 100 bytes/rec, 4K pages: 1,000,000 records (as data entries) and 25,000 directory elements; chances are high that directory will fit in memory.

Directory grows in spurts, and, if the distribution of hash values is skewed, directory can grow large.

Entries with same key value (duplicates) need overflow pages!

Delete: removal of data entry from bucket.

  • If bucket is empty, can be merged with `split image'.
  • If each directory element points to same bucket as its split image, can halve directory.

SLIDE 12

Summary

Hash-based indexes: best for equality searches, cannot support range searches.

Static Hashing can lead to long overflow chains.

Extendible Hashing avoids overflow pages by splitting a full bucket when a new data entry is to be added to it. (But duplicates may require overflow pages.)

Directory to keep track of buckets, doubles periodically. Can get large with skewed data; additional I/O if this does not fit in main memory.

SLIDE 13

Tree-Structured Indexes

CMPSCI 645 Mar 6, 2008

Slides Courtesy of R. Ramakrishnan and J. Gehrke

SLIDE 14

Review

As for any index, 3 alternatives for data entries k*:

1. Data record with key value k
2. <k, rid of data record with search key value k>
3. <k, list of rids of data records with search key k>

Choice is orthogonal to the indexing technique used to locate data entries k*.

Tree-structured indexing techniques support both range searches and equality searches.

SLIDE 15

B+ Tree: Most Widely Used Index

Inserts/deletes keep tree height-balanced. log_F N cost (F = fanout, N = # leaf pages).

Minimum 50% occupancy (except for root). Each node contains d <= m <= 2d entries, where d is called the order of the tree.

Supports equality and range-searches, and updates, efficiently.

[Figure: index entries (for direct search) in the upper levels; data entries (the `sequence set') in the leaves.]

SLIDE 16

Example B+ Tree

Search begins at root, and key comparisons direct it to a leaf.

Search for 5*, 15*, all data entries >= 24* ...

Based on the search for 15*, we know it is not in the tree!

[Figure: Root node with keys 13, 17, 24, 30; leaf pages 2* 3* 5* 7* | 14* 16* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*.]
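The root-to-leaf search on this example tree can be sketched as follows (tree contents copied from the example; `bisect` does the key comparisons that pick the child pointer):

```python
import bisect

# The example tree, two levels deep: separator keys in the root direct
# the search to exactly one leaf.
root_keys = [13, 17, 24, 30]
leaves = [[2, 3, 5, 7],       # keys < 13
          [14, 16],           # 13 <= key < 17
          [19, 20, 22],       # 17 <= key < 24
          [24, 27, 29],       # 24 <= key < 30
          [33, 34, 38, 39]]   # keys >= 30

def search(key):
    # One pass of key comparisons at the root picks the leaf to read.
    return key in leaves[bisect.bisect_right(root_keys, key)]

def scan_from(key):
    # Range search: descend once, then follow the leaf sequence rightward.
    i = bisect.bisect_right(root_keys, key)
    return [k for leaf in leaves[i:] for k in leaf if k >= key]
```

The failed probe for 15 lands in the only leaf that could hold it, which is how we know 15* is not in the tree.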

SLIDE 17

B+ Trees in Practice

Typical order: 200. Typical fill-factor: 67%.

  • average fanout = 133

Typical capacities:

  • Height 4: 133^4 = 312,900,721 records
  • Height 3: 133^3 = 2,352,637 records

Can often hold top levels in buffer pool:

  • Level 1 = 1 page = 8 KBytes
  • Level 2 = 133 pages = 1 MByte
  • Level 3 = 17,689 pages = 133 MBytes
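These figures are plain fanout arithmetic, easy to check (note that 133^4 is exactly 312,900,721):

```python
# Fanout arithmetic behind the capacity figures: order-200 nodes at 67%
# fill give average fanout F = 133.  A tree of height h then indexes
# F**h records, and level L of the tree holds F**(L-1) pages.
F = 133
records = {h: F ** h for h in (1, 2, 3, 4)}
pages_at_level = {lvl: F ** (lvl - 1) for lvl in (1, 2, 3)}
```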

SLIDE 18

Inserting a Data Entry into a B+ Tree

Find correct leaf L. Put data entry onto L.

If L has enough space, done! Else, must split L (into L and a new node L2):

  • Redistribute entries evenly, copy up middle key.
  • Insert index entry pointing to L2 into parent of L.

This can happen recursively.

To split index node, redistribute entries evenly, but push up middle key. (Contrast with leaf splits.)

Splits "grow" tree; root split increases height.

Tree growth: gets wider or one level taller at top.
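The copy-up / push-up distinction can be sketched with the running example (order d = 2, so a node overflows when a fifth entry arrives; the function names are ours):

```python
# Copy-up vs push-up on a split.
def split_leaf(entries):
    # Leaf split: middle key is COPIED up and stays in the new right leaf.
    mid = len(entries) // 2
    left, right = entries[:mid], entries[mid:]
    return left, right, right[0]     # key inserted into the parent

def split_index(keys):
    # Index split: middle key is PUSHED up and leaves both halves.
    mid = len(keys) // 2
    return keys[:mid], keys[mid + 1:], keys[mid]

# Inserting 8* overflows leaf 2* 3* 5* 7*: 5 is copied up, still in a leaf.
left, right, up = split_leaf([2, 3, 5, 7, 8])
# The root 5 13 17 24 30 then overflows too: 17 is pushed up, index only.
l_keys, r_keys, pushed = split_index([5, 13, 17, 24, 30])
```

The copied-up 5 still appears at the leaf level (every data entry must stay reachable in the sequence set); the pushed-up 17 need not, because index keys only direct traffic.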

SLIDE 19

Previous Example

[Figure: the example tree again. Root keys 13, 17, 24, 30; leaves 2* 3* 5* 7* | 14* 16* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*.]

Inserting 8*

SLIDE 20

Inserting 8* into Example B+ Tree

Observe how minimum occupancy is guaranteed in both leaf and index pg splits.

Note difference between copy-up and push-up; be sure you understand the reasons for this.

[Figure: leaf split: 2* 3* and 5* 7* 8*, with entry 5 to be inserted in parent node. (Note that 5 is copied up and continues to appear in the leaf.) Index split: keys 5, 13, 17, 24, 30, with entry 17 to be inserted in parent node. (Note that 17 is pushed up and appears only once in the index. Contrast this with a leaf split.)]

SLIDE 21

Example B+ Tree After Inserting 8*

Notice that root was split, leading to increase in height. In this example, we can avoid the split by re-distributing entries between siblings, but this is not usually done in practice.

[Figure: Root key 17; internal nodes with keys 5, 13 (left) and 24, 30 (right); leaves 2* 3* | 5* 7* 8* | 14* 16* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*.]

SLIDE 22

Deleting a Data Entry from a B+ Tree

Start at root, find leaf L where entry belongs. Remove the entry.

If L is at least half-full, done! If L has only d-1 entries:

  • Try to re-distribute, borrowing from sibling (adjacent node with same parent as L).
  • If re-distribution fails, merge L and sibling.

If merge occurred, must delete entry (pointing to L or sibling) from parent of L.

Merge could propagate to root, decreasing height.
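The redistribute-or-merge decision can be sketched for order d = 2 (the helper is ours, simplified to a single sibling and ignoring the parent-key update):

```python
# A non-root leaf must keep at least d entries.  On underflow, borrow
# from a sibling that can spare an entry; otherwise merge with it.
d = 2

def delete(leaf, sibling, key):
    leaf = sorted(k for k in leaf if k != key)
    if len(leaf) >= d:
        return leaf, sibling, False           # still at least half-full
    if len(sibling) > d:                      # sibling can spare an entry
        leaf.append(sibling.pop(0))           # re-distribute
        return sorted(leaf), sibling, False
    return sorted(leaf + sibling), [], True   # merge; parent loses an entry
```

Replaying the example that follows: deleting 19* leaves the leaf half-full; deleting 20* borrows 24* from the sibling; deleting 24* then forces the merge.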

SLIDE 23

Current B+ Tree

[Figure: the tree after inserting 8*. Root key 17; internal keys 5, 13 and 24, 30; leaves 2* 3* | 5* 7* 8* | 14* 16* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*.]

Delete 19*, then delete 20*.

SLIDE 24

Example Tree After (Inserting 8*, Then) Deleting 19* and 20* ...

Deleting 19* is easy. Deleting 20* is done with re-distribution.

Notice how middle key is copied up.

[Figure: Root key 17; internal keys 5, 13 and 27, 30; leaves 2* 3* | 5* 7* 8* | 14* 16* | 22* 24* | 27* 29* | 33* 34* 38* 39*.]

SLIDE 25

... And Then Deleting 24*

Must merge. Observe `toss' of index entry (on right), and `pull down' of index entry (below).

[Figure: after tossing 27, the leaves 22* 27* 29* and 33* 34* 38* 39* hang under key 30; then the root's 17 is pulled down, giving a single root with keys 5, 13, 17, 30 over leaves 2* 3* | 5* 7* 8* | 14* 16* | 22* 27* 29* | 33* 34* 38* 39*.]

SLIDE 26

Example of Non-leaf Re-distribution

Tree is shown below during deletion of 24*. In contrast to previous example, can re-distribute entry from left child of root to right child.

[Figure: Root key 22; left child keys 5, 13, 17, 20; right child key 30; leaves 2* 3* | 5* 7* 8* | 14* 16* | 17* 18* | 20* 21* | 22* 27* 29* | 33* 34* 38* 39*.]

SLIDE 27

After Re-distribution

Intuitively, entries are re-distributed by `pushing through' the splitting entry in the parent node.

It suffices to re-distribute index entry with key 20; we've re-distributed 17 as well for illustration.

[Figure: Root key 17; left child keys 5, 13; right child keys 20, 22, 30; leaves 2* 3* | 5* 7* 8* | 14* 16* | 17* 18* | 20* 21* | 22* 27* 29* | 33* 34* 38* 39*.]

SLIDE 28

Prefix Key Compression

Important to increase fan-out. (Why?)

Key values in index entries only `direct traffic'; can often compress them.

E.g., if we have adjacent index entries with search key values Dannon Yogurt, David Smith and Devarakonda Murthy, we can abbreviate David Smith to Dav. (The other keys can be compressed too ...)

  • Is this correct? Not quite! What if there is a data entry Davey Jones? (Can only compress David Smith to Davi.)
  • In general, while compressing, must leave each index entry greater than every key value (in any subtree) to its left.

Insert/delete must be suitably modified.
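The rule above amounts to a shortest-separator computation (the function name is ours): keep the shortest prefix of the right-hand key that still sorts strictly above everything on the left.

```python
# A compressed index key must stay strictly greater than every key value
# in the subtree to its left.
def shortest_separator(left_max, right_min):
    # left_max: largest key on the left; right_min: key being compressed.
    for i in range(1, len(right_min) + 1):
        prefix = right_min[:i]
        if prefix > left_max:   # still sorts above everything on the left
            return prefix
    return right_min

# With largest left key "Dannon Yogurt", "David Smith" compresses to
# "Dav"; if "Davey Jones" is on the left, only "Davi" is safe.
```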

SLIDE 29

Prefix Key Compression (example)

[Figure: keys Daniel Lee, Dante Wu, Darius Rex, ..., Davey Jones to the left of adjacent index entries David Smith and Devarakonda; compress David Smith to `Dav' or `Davi'?]

SLIDE 30

Bulk Loading of a B+ Tree

If we have a large collection of records, and we want to create a B+ tree on some field, doing so by repeatedly inserting records is very slow.

Bulk Loading can be done much more efficiently.

Initialization: Sort all data entries, insert pointer to first (leaf) page in a new (root) page.

[Figure: sorted pages of data entries 3* 4* | 6* 9* | 10* 11* | 12* 13* | 20* 22* | 23* 31* | 35* 36* | 38* 41* | 44*, not yet in B+ tree; a new Root points to the first leaf page.]

SLIDE 31

Summary of Bulk Loading

Option 1: multiple inserts.

  • Slow. Does not give sequential storage of leaves.

Option 2: Bulk Loading.

  • Has advantages for concurrency control.
  • Fewer I/Os during build.
  • Leaves will be stored sequentially (and linked, of course).
  • Can control "fill factor" on pages.
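The leaf-building step can be sketched as follows (leaf size and fill factor are illustrative choices of ours): sort once, carve the sorted entries into sequential leaves at the chosen fill factor, and record the first key of each later leaf as the separator its parent will hold.

```python
# Bulk-loading the leaf level: no random root-to-leaf descents; leaves
# come out stored sequentially, ready to be linked.
def bulk_load_leaves(entries, leaf_size=4, fill=0.75):
    entries = sorted(entries)
    per_leaf = max(1, int(leaf_size * fill))  # "can control fill factor"
    leaves = [entries[i:i + per_leaf]
              for i in range(0, len(entries), per_leaf)]
    separators = [leaf[0] for leaf in leaves[1:]]  # keys for the parents
    return leaves, separators

# The 17 data entry values from the figure, in arrival (unsorted) order:
data = [20, 10, 35, 12, 11, 3, 44, 4, 6, 9, 22, 23, 31, 36, 38, 41, 13]
leaves, seps = bulk_load_leaves(data)
```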

SLIDE 32

A Note on `Order’

Order (d) concept replaced by physical space criterion in practice (`at least half-full').

  • Index pages can typically hold many more entries than leaf pages.
  • Variable sized records and search keys mean different nodes will contain different numbers of entries.
  • Even with fixed length fields, multiple records with the same search key value (duplicates) can lead to variable-sized data entries (if we use Alternative (3)).

SLIDE 33

Summary

Tree-structured indexes are ideal for range-searches, also good for equality searches.

B+ tree is a dynamic structure.

  • Inserts/deletes leave tree height-balanced; log_F N cost.
  • High fanout (F) means depth rarely more than 3 or 4.
  • Almost always better than maintaining a sorted file.
  • Typically, 67% occupancy on average.
  • If data entries are data records, splits can change rids!

SLIDE 34

Summary (Contd.)

Key compression increases fanout, reduces height.

Bulk loading can be much faster than repeated inserts for creating a B+ tree on a large data set.

Most widely used index in database management systems because of its versatility. One of the most optimized components of a DBMS.