SLIDE 1

Outline/summary

  • Conventional Indexes
  • Sparse vs. dense
  • Primary vs. secondary
  • B trees
  • B+trees vs. indexed sequential
  • Hashing schemes ← Next
SLIDE 2

Hashing

[Diagram: key → h(key) selects one of an array of buckets]

Buckets (typically 1 disk block)

SLIDE 3

Two alternatives

(1) key → h(key) → bucket holding the records themselves
SLIDE 4

Two alternatives (continued)

(2) key → h(key) → bucket of index records (key + pointer to the actual record)

  • Alt (2) for “secondary” search key
SLIDE 5

Example hash function

  • Key = ‘x1 x2 … xn’, an n-byte character string
  • Have b buckets
  • h: add x1 + x2 + … + xn, compute sum modulo b
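As a quick sketch (the slides give no code), the sum-modulo-b function in Python:

```python
def h(key: str, b: int) -> int:
    """Add the byte values x1 + x2 + ... + xn, then take the sum modulo b."""
    return sum(key.encode("ascii")) % b

# With b = 4 buckets: 'abc' hashes to (97 + 98 + 99) % 4 = 294 % 4 = 2.
bucket = h("abc", 4)
```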

SLIDE 6

  • This may not be the best function … Read Knuth Vol. 3 if you really need to select a good function.
  • Good hash function: expected number of keys/bucket is the same for all buckets

SLIDE 7

Within a bucket:

  • Do we keep keys sorted?
  • Yes, if CPU time is critical and inserts/deletes are not too frequent

SLIDE 8

Next: an example to illustrate inserts, overflows, and deletes

SLIDE 9

EXAMPLE: 2 records/bucket

INSERT: h(a) = 1, h(b) = 2, h(c) = 1, h(d) = 0

[Buckets 0–3 after the inserts: bucket 0 holds d; bucket 1 holds a, c; bucket 2 holds b]

h(e) = 1, but bucket 1 is already full, so e goes into an overflow block chained to bucket 1
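The insert example can be simulated with a tiny sketch (Python; the bucket structure and overflow handling are assumptions consistent with the slide):

```python
CAPACITY = 2  # records per bucket, as in the example

class Bucket:
    def __init__(self):
        self.records = []
        self.overflow = None  # chained overflow block

    def insert(self, rec):
        if len(self.records) < CAPACITY:
            self.records.append(rec)
        else:  # bucket full: chain an overflow block
            if self.overflow is None:
                self.overflow = Bucket()
            self.overflow.insert(rec)

buckets = [Bucket() for _ in range(4)]
h = {"a": 1, "b": 2, "c": 1, "d": 0, "e": 1}  # hash values from the slide
for key in "abcde":
    buckets[h[key]].insert(key)
# Bucket 1 now holds a and c; e sits in its overflow block.
```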

SLIDE 10

EXAMPLE: deletion

[Buckets 0–3 holding records a, b, c, d, e, f, g, some in overflow blocks]

Delete e, f: when a deletion frees a slot in a bucket that has an overflow block, maybe move a record (e.g., “g”) up into the main bucket

SLIDE 11

Rule of thumb:

  • Try to keep space utilization between 50% and 80%

    Utilization = (# keys used) / (total # keys that fit)

  • If < 50%, wasting space
  • If > 80%, overflows significant

depends on how good the hash function is & on # keys/bucket
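The rule of thumb as a one-liner (Python; the file sizes are hypothetical):

```python
def utilization(keys_used: int, buckets: int, keys_per_bucket: int) -> float:
    """Utilization = # keys used / total # keys that fit."""
    return keys_used / (buckets * keys_per_bucket)

# Hypothetical file: 100 buckets of 2 keys each, 130 keys stored.
u = utilization(130, 100, 2)       # 0.65 -- within the 50-80% target
needs_attention = u < 0.5 or u > 0.8
```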

SLIDE 12

How do we cope with growth?

  • Overflows and reorganizations
  • Dynamic hashing
      • Extensible
      • Linear
SLIDE 13

Extensible hashing: two ideas

(a) Use i of the b bits output by the hash function

    h(K) → 00110101   (b bits total; use the i leading bits)

    i grows over time…

SLIDE 14

(b) Use a directory: h(K)[i] → pointer to bucket

[Diagram: directory of 2^i entries, each pointing to a bucket]

SLIDE 15

Example: h(k) is 4 bits; 2 keys/bucket

i = 1: directory entries 0 and 1; the bucket for prefix 0 holds 0001; the bucket for prefix 1 holds 1001, 1100 (both local depth 1)

Insert 1010: the prefix-1 bucket is full, so the directory doubles to i = 2 (entries 00, 01, 10, 11); the prefix-10 bucket now holds 1001, 1010 and the prefix-11 bucket holds 1100 (both local depth 2); the prefix-0 bucket keeps local depth 1

SLIDE 16

Example continued

i = 2; the depth-1 bucket for prefix 0 holds 0001; the depth-2 bucket for prefix 10 holds 1001, 1010; the depth-2 bucket for prefix 11 holds 1100

Insert 0111, 0000: 0111 joins 0001 in the prefix-0 bucket; inserting 0000 overflows it, so it splits (no directory doubling needed) into a depth-2 bucket for 00 holding 0000, 0001 and a depth-2 bucket for 01 holding 0111

SLIDE 17

Example continued

i = 2; bucket 00 holds 0000, 0001; bucket 01 holds 0111; bucket 10 holds 1001, 1010; bucket 11 holds 1100

Insert 1001: the prefix-10 bucket is full and its local depth equals i, so the directory doubles to i = 3 (entries 000 … 111); the depth-3 bucket for 100 holds 1001, 1001 and the depth-3 bucket for 101 holds 1010
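The example above can be reproduced with a minimal extensible-hashing sketch (Python; the class layout is an assumption, and keys are passed as 4-bit strings as in the slides):

```python
BUCKET_SIZE = 2  # keys per bucket, as in the example

class Bucket:
    def __init__(self, depth):
        self.depth = depth  # local depth: leading bits all keys here share
        self.keys = []

class ExtensibleHash:
    def __init__(self):
        self.i = 1                         # global depth
        self.dir = [Bucket(1), Bucket(1)]  # directory: 2**i pointers

    def insert(self, bits):
        bkt = self.dir[int(bits[:self.i], 2)]  # index by i leading bits
        if len(bkt.keys) < BUCKET_SIZE:
            bkt.keys.append(bits)
            return
        if bkt.depth == self.i:            # full and local depth == i:
            self.dir = [b for b in self.dir for _ in range(2)]  # double dir
            self.i += 1
        bkt.depth += 1                     # split the overflowing bucket
        sibling = Bucket(bkt.depth)
        for j, b in enumerate(self.dir):   # repoint the directory entries
            if b is bkt and (j >> (self.i - bkt.depth)) & 1:
                self.dir[j] = sibling
        old, bkt.keys = bkt.keys, []
        for k in old + [bits]:             # redistribute; may split again
            self.insert(k)

eh = ExtensibleHash()
for k in ["0001", "1001", "1100", "1010", "0111", "0000"]:
    eh.insert(k)
# As on slides 15-16: i = 2, and the prefix-10 bucket holds 1001 and 1010.
```

Inserting the duplicate 1001 from slide 17 would double the directory again to i = 3.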

SLIDE 18

Extensible hashing: deletion

  • Option 1: no merging of blocks
  • Option 2: merge blocks and cut the directory if possible (reverse of the insert procedure)

SLIDE 19

Deletion example:

  • Run through the insert example in reverse!
SLIDE 20

Summary: extensible hashing

+ Can handle growing files
    • with less wasted space
    • with no full reorganizations

− Indirection (not bad if directory is in memory)
− Directory doubles in size (now it fits, now it does not)

SLIDE 21

Linear hashing

  • Another dynamic hashing scheme

Two ideas:

(a) Use the i low-order bits of the hash

    h(K) → 01110101   (b bits total; use the i low-order bits; i grows)

(b) Number of buckets in use grows linearly

    Constraint: 2^(i−1) ≤ n+1 ≤ 2^i

(We take n to be the id of the largest bucket in use, starting at 0.)

SLIDE 22

Example: b = 4 bits, i = 2, 2 keys/bucket

Buckets 00 and 01 in use: bucket 00 holds 0000, 1010; bucket 01 holds 0101, 1111. n = 01 (number of last bucket in use); buckets 10 and 11 are future growth buckets.

Rule: if h(k)[i] ≤ n, then look at bucket h(k)[i]; else, look at bucket h(k)[i] − 2^(i−1)

  • Insert 0101: bucket 01 is already full → can have overflow chains!
slide-23
SLIDE 23

Example b=4 bits, i =2, 2 keys/bucket

00 01 10 11 0101 1111 0000 1010 n = 01 (number of last bucket in use)

Future growth buckets

If h(k)[i ] ≤ n, then look at bucket h(k)[i ] else, look at bucket h(k)[i ] - 2i -1 Rule 1110

  • insert 1110

bucket h(k)[i ] - 2i -1 is the bucket whose ith bit is fmipped in binary

SLIDE 24

Example: b = 4 bits, i = 2, 2 keys/bucket

Bucket 00 holds 0000, 1010; bucket 01 holds 0101, 1111; n = 01.

Rule: if h(k)[i] ≤ n, then look at bucket h(k)[i]; else, look at bucket h(k)[i] − 2^(i−1)

  • Insert 0101: bucket 01 overflows, and the file grows into the future growth buckets: bucket 10 is added (1010 moves into it from bucket 00), then bucket 11 is added (1111 moves into it from bucket 01), leaving both 0101s in bucket 01
SLIDE 25

Example continued: how to grow beyond this?

Buckets 00–11 all in use (0000 | 0101, 0101 | 1010 | 1111) with n = 11, i = 2; since n + 1 = 2^i, adding bucket 100 requires increasing i to 3 (entries 100, 101, 110, 111 become possible)

  • Bucket 100 splits off from bucket 000 (0000 stays: its 3 low-order bits are 000)
  • Bucket 101 splits off from bucket 001: both 0101s move there (3 low-order bits 101)

Rule: if h(k)[i] ≤ n, then look at bucket h(k)[i]; else, look at bucket h(k)[i] − 2^(i−1)

Constraint: 2^(i−1) ≤ n+1 ≤ 2^i

SLIDE 26

When do we expand the file?

  • Keep track of U = (# records) / (# buckets)
  • If U > threshold, then increase n (and maybe i)
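A compact sketch tying the lookup rule, the growth constraint, and the U-threshold trigger together (Python; THRESHOLD, the starting i, and modeling overflow chains as plain lists are assumptions, not from the slides):

```python
THRESHOLD = 1.6  # assumed trigger for U = #records/#buckets
                 # (80% of 2-key buckets, per the rule of thumb)

class LinearHash:
    def __init__(self):
        self.i = 1
        self.buckets = [[], []]  # ids 0..n, so n = len(self.buckets) - 1

    def _bucket_id(self, bits):
        m = int(bits[-self.i:], 2)     # i low-order bits of h(k)
        if m > len(self.buckets) - 1:  # bucket m not yet in use:
            m -= 2 ** (self.i - 1)     # flip the top of the i bits
        return m

    def insert(self, bits):
        # A bucket list longer than 2 models an overflow chain.
        self.buckets[self._bucket_id(bits)].append(bits)
        nrecords = sum(len(b) for b in self.buckets)
        while nrecords / len(self.buckets) > THRESHOLD:
            self._grow()

    def _grow(self):
        new_id = len(self.buckets)     # id of the bucket being added
        if new_id == 2 ** self.i:      # keep 2^(i-1) <= n+1 <= 2^i
            self.i += 1
        buddy = new_id - 2 ** (self.i - 1)
        old, self.buckets[buddy] = self.buckets[buddy], []
        self.buckets.append([])
        for k in old:                  # re-split buddy's keys on i bits
            self.buckets[int(k[-self.i:], 2)].append(k)

lh = LinearHash()
for k in ["0101", "1111", "0000", "1010", "0101"]:
    lh.insert(k)
# As on slides 22-25: buckets 00..11 hold 0000 | 0101,0101 | 1010 | 1111.
```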

SLIDE 27

Summary: linear hashing

+ Can handle growing files
    • with less wasted space
    • with no full reorganizations
+ No indirection like extensible hashing

− Can still have overflow chains

SLIDE 28

Example: BAD CASE

[Diagram: one bucket very full, the others very empty] We would need to move n past the near-empty buckets to split the very full one, which would waste space…

SLIDE 29

Summary: hashing

  • How it works
  • Dynamic hashing
      • Extensible
      • Linear

SLIDE 30

B+trees vs Hashing

  • Hashing is good for probes given a key

    e.g., SELECT … FROM R WHERE R.A = 5

SLIDE 31

B+trees vs Hashing

  • INDEXING (including B Trees) is good for range searches

    e.g., SELECT … FROM R WHERE R.A > 5