Hash-Based Indexes (From Chapter 11)



SLIDE 1


Introduction

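As for any index, there are 3 alternatives for data entries k*:
  (1) the data record itself, with key value k;
  (2) <k, rid of the data record with search key value k>;
  (3) <k, list of rids of data records with search key k>.
Hash-based indexes are best for equality selections.
Static and dynamic hashing techniques exist.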

Static Hashing

The number of primary pages is fixed; they are allocated sequentially and never de-allocated, with overflow pages added if needed. h(k) mod N = the bucket to which the data entry with key k belongs (N = number of buckets).

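[Figure: a key is passed through h, and h(key) mod N selects one of the primary bucket pages 0 ... N-1; each primary bucket page may have a chain of overflow pages.]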

SLIDE 2

Static Hashing (Contd.)

Buckets contain data entries. The hash function works on the search key field(s) of record r and must distribute values over the range 0 ... N-1.

h(key) =

Long overflow chains can develop and degrade performance
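The static scheme can be sketched in a few lines of Python. This is a minimal illustration, not a real storage engine: the class name StaticHashFile, the page capacity, and the choice of h as the identity on integer keys are all assumptions made for the example. It shows how a skewed set of keys builds a long overflow chain and how that raises the number of pages read per search.

```python
class StaticHashFile:
    """Toy static hash file: N fixed primary pages, each with an overflow chain."""

    def __init__(self, num_buckets, page_capacity):
        self.n = num_buckets
        self.capacity = page_capacity
        # Each bucket is a chain of pages; page 0 is the primary page,
        # any further pages are overflow pages.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def _bucket(self, key):
        # Assumption: h is the identity on integer keys, so this is h(k) mod N.
        return key % self.n

    def insert(self, key):
        chain = self.buckets[self._bucket(key)]
        if len(chain[-1]) == self.capacity:
            chain.append([])  # last page is full: allocate an overflow page
        chain[-1].append(key)

    def search(self, key):
        """Return (found, pages_read) for an equality search."""
        chain = self.buckets[self._bucket(key)]
        for pages_read, page in enumerate(chain, start=1):
            if key in page:
                return True, pages_read
        return False, len(chain)


f = StaticHashFile(num_buckets=4, page_capacity=2)
for k in (0, 4, 8, 12, 16):   # all of these hash to bucket 0
    f.insert(k)
print(f.search(16))           # found only after walking the whole chain
print(f.search(1))            # bucket 1 has just its primary page
```

With N fixed, nothing in this design ever shortens the chain on bucket 0, which is the problem the dynamic schemes address.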

Extendible Hashing

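Main idea: if a bucket (primary page) becomes full, why not reorganize the file by doubling the number of buckets? Because reading and writing all buckets is expensive!

Idea: use a directory of pointers to buckets. Double the directory rather than the file, and split only the bucket that overflowed.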

Insert h(r)=14

Global depth: 2 (the directory is indexed by the last 2 bits of h(r))

Directory   Bucket     Local depth   Data entries
00          Bucket A   2             4* 12* 32* 16*
01          Bucket B   2             1* 5* 21* 13*
10          Bucket C   2             10*
11          Bucket D   2             15* 7* 19*

SLIDE 3

Insert h(r)=20

20 is 10100 in binary, so it belongs to directory entry 00, i.e. Bucket A, which is already full (4* 12* 32* 16*). Bucket A is split, and its entries are redistributed between Bucket A and Bucket A2, the `split image' of Bucket A. Intermediate state, before the directory is doubled:

Global depth: 2

Directory   Bucket                                  Local depth   Data entries
00          Bucket A                                2             32* 16*
01          Bucket B                                2             1* 5* 21* 13*
10          Bucket C                                2             10*
11          Bucket D                                2             15* 7* 19*
-           Bucket A2 (`split image' of Bucket A)   2             4* 12* 20*

Insert h(r)=32

[Figure: a directory pointing to a single bucket, Bucket A, which holds 1* 10* 4* 12*.]

SLIDE 4

Insert h(r)=16

Global depth: 1

Directory   Bucket     Local depth   Data entries
0           Bucket A   1             10* 32* 4* 12*
1           Bucket B   1             1*

Insert h(r)=20

Global depth: 2

Directory   Bucket     Local depth   Data entries
00          Bucket A   2             32* 16* 4* 12*
01, 11      Bucket B   1             1*
10          Bucket C   2             10*

Insert h(r)=5, 15, 7, 19

Global depth: 3

Directory            Bucket                                  Local depth   Data entries
000                  Bucket A                                3             32* 16*
001, 011, 101, 111   Bucket B                                1             1*
010, 110             Bucket C                                2             10*
100                  Bucket A2 (`split image' of Bucket A)   3             4* 12* 20*

The figure also shows 5* 15* 7*; each ends in bit 1 and so belongs in Bucket B.
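The insertions traced on these slides can be reproduced with a small sketch. The code below is an illustration under stated assumptions, not the textbook's implementation: the names Bucket and ExtendibleHash are made up here, h is taken to be the identity on integer keys, the directory is indexed by the last global_depth bits, and a bucket holds 4 data entries. It has no overflow handling, so many entries with an identical hash value would make it split forever.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.entries = []


class ExtendibleHash:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.global_depth = 0
        self.directory = [Bucket(0)]   # one directory element, one bucket

    def _slot(self, key):
        # Directory index = last global_depth bits of h(key); h is the identity.
        return key & ((1 << self.global_depth) - 1)

    def insert(self, key):
        bucket = self.directory[self._slot(key)]
        if len(bucket.entries) < self.capacity:
            bucket.entries.append(key)
            return
        # The bucket is full. If it is already as deep as the directory,
        # double the directory first: slot i and slot i + 2^d share a bucket.
        if bucket.local_depth == self.global_depth:
            self.directory = self.directory + self.directory
            self.global_depth += 1
        # Split only the overflowing bucket, using one more bit of the hash.
        bucket.local_depth += 1
        image = Bucket(bucket.local_depth)          # the split image
        bit = 1 << (bucket.local_depth - 1)
        old = bucket.entries
        bucket.entries = [k for k in old if not k & bit]
        image.entries = [k for k in old if k & bit]
        for i, b in enumerate(self.directory):
            if b is bucket and i & bit:
                self.directory[i] = image
        self.insert(key)   # retry; a further split is possible


index = ExtendibleHash(capacity=4)
for k in (1, 10, 4, 12, 32, 16, 20):
    index.insert(k)
print(index.global_depth)
print(index.directory[0b000].entries, index.directory[0b100].entries)
```

Doubling the directory copies pointers only; the data entries of the other buckets are never touched, which is why the split is cheap compared with reorganizing the whole file.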

SLIDE 5

Deletions

Deletion is the inverse of insertion.
If removal of a data entry makes a bucket empty, the bucket can be merged with its ‘split image’.
If each directory element points to the same bucket as its split image, the directory can be halved.

Comments on Extendible Hashing

If directory fits in memory, equality search answered with _____ I/O; else _____

Example: a 100MB file with 100 bytes/rec holds 1,000,000 records (as data entries). With 4K pages (about 40 records per page) that is 25,000 pages, hence 25,000 directory elements; chances are high that the directory will fit in memory.
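The sizing estimate above can be checked with a few lines of arithmetic. This sketch assumes 100MB means 10^8 bytes, a 4K page means 4096 bytes, every bucket page is full, and there is one directory element per bucket page. The 8-byte size of a directory element is a further assumption that does not come from the slides.

```python
import math

file_bytes = 100 * 10**6   # 100MB, taken as 10^8 bytes
record_bytes = 100
page_bytes = 4096          # 4K page
pointer_bytes = 8          # assumed size of one directory element

num_records = file_bytes // record_bytes              # data entries in the file
entries_per_page = page_bytes // record_bytes         # whole entries per page
directory_elements = math.ceil(num_records / entries_per_page)
directory_kb = directory_elements * pointer_bytes / 1024

print(num_records, entries_per_page, directory_elements)
print(f"directory size: about {directory_kb:.0f} KB")
```

Under these assumptions the directory takes roughly 200 KB, small next to the 100MB file it indexes.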

Directory grows in spurts, and, if the distribution of hash values is ________, directory can grow large

Linear Hashing

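This is another dynamic hashing scheme, an alternative to Extendible Hashing.
LH handles the problem of long overflow chains without using a directory, and handles duplicates.
Main idea: use a family of hash functions h_Level, h_Level+1, ... and split buckets one at a time, in rounds, in the order given by Next.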

SLIDE 6

Inserting h(r) = 43

Level=2, N=4, Next=0

h3    h2    Primary pages
000   00    32* 44* 36*
001   01    9* 25* 5*
010   10    14* 18* 10* 30*
011   11    31* 35* 7* 11*

(The h3 and h2 columns are for illustration only; the primary pages are the actual contents of the linear hashed file.)

Example (Inserting h(r) = 43)

Level=2, Next=1 (bucket 0 has been split)

h3    h2    Primary pages       Overflow pages
000   00    32*
001   01    9* 25* 5*
010   10    14* 18* 10* 30*
011   11    31* 35* 7* 11*      43*
100   00    44* 36*

Inserting h(r) = 50 (End of a Round)

Level=2, Next=3

h3    h2    Primary pages       Overflow pages
000   00    32*
001   01    9* 25*
010   10    66* 18* 10* 34*
011   11    31* 35* 7* 11*      43*
100   00    44* 36*
101   01    5* 37* 29*
110   10    14* 30* 22*
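The round traced in these figures can be replayed with a short sketch. It is an illustration under assumptions, not a definitive implementation: the class name LinearHashFile is made up, h is the identity on integer keys, h_Level looks at the last Level bits (matching Level=2, N=4 above), a primary page holds 4 entries, and a split is triggered whenever an insert lands in a bucket whose primary page is already full. Each bucket is modeled as one list, so entries past the capacity stand for entries on overflow pages. The intermediate keys 37, 29, 22, 66 and 34 are taken from the last figure; their insertion order is assumed.

```python
class LinearHashFile:
    def __init__(self, level=2, capacity=4):
        self.level = level        # h_level looks at the last `level` bits
        self.next = 0             # next bucket to split in this round
        self.capacity = capacity  # entries per primary page
        self.buckets = [[] for _ in range(1 << level)]

    def _address(self, key):
        b = key % (1 << self.level)             # h_Level
        if b < self.next:                       # already split in this round
            b = key % (1 << (self.level + 1))   # h_Level+1
        return b

    def insert(self, key):
        bucket = self.buckets[self._address(key)]
        overflowed = len(bucket) >= self.capacity
        bucket.append(key)        # beyond capacity = on an overflow page
        if overflowed:
            self._split()

    def _split(self):
        # Split bucket Next (not necessarily the one that overflowed),
        # redistributing its entries with h_Level+1.
        old = self.buckets[self.next]
        modulus = 1 << (self.level + 1)
        self.buckets[self.next] = [k for k in old if k % modulus == self.next]
        self.buckets.append([k for k in old if k % modulus != self.next])
        self.next += 1
        if self.next == 1 << self.level:   # every original bucket is split
            self.level += 1                # end of round
            self.next = 0


f = LinearHashFile(level=2, capacity=4)
f.buckets = [[32, 44, 36], [9, 25, 5], [14, 18, 10, 30], [31, 35, 7, 11]]
f.insert(43)
print(f.next, f.buckets[0], f.buckets[3], f.buckets[4])
for k in (37, 29, 22, 66, 34, 50):
    f.insert(k)
print(f.level, f.next)
```

The split image of bucket Next always has address Next + 2^Level, which equals the current number of buckets, so appending to the list is enough and no directory is needed.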

SLIDE 7

Overview of LH File

In the middle of a round.

Buckets that existed at the beginning of this round: this is the range of h_Level.
Next: the bucket to be split.
Buckets split in this round: if h_Level(search key value) is in this range, must use h_Level+1(search key value) to decide if the entry is in the `split image' bucket.
`Split image' buckets: created (through splitting of other buckets) in this round.
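The search rule in this overview fits in one small function. This is a sketch under the same conventions as the figures: h_Level is assumed to look at the last Level bits of the hash value, and the function name lh_bucket and its parameters are hypothetical.

```python
def lh_bucket(h_value, level, next_bucket):
    """Return the bucket number that holds entries with this hash value."""
    b = h_value % (1 << level)             # h_Level
    if b < next_bucket:                    # bucket b was split in this round
        b = h_value % (1 << (level + 1))   # h_Level+1 picks b or its split image
    return b


# Mid-round state from the example: Level=2, Next=3.
print(lh_bucket(44, level=2, next_bucket=3))   # bucket 0 was split: use h3
print(lh_bucket(43, level=2, next_bucket=3))   # bucket 3 not yet split: use h2
```

The only state a search needs is Level and Next, which is why Linear Hashing can do without a directory.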

Summary

Hash-based indexes: best for ______ searches, cannot support _____ searches.
Static Hashing can lead to ________________.
Extendible Hashing uses directory doubling to avoid ___________.
Duplicates may require ________________.

Linear Hashing avoids a directory by splitting in rounds.
Naturally handles ______________.
Uses overflow buckets (but chains are not very long in practice).