Advanced Implementations of Tables: Balanced Search Trees and Hashing



SLIDE 1

Advanced Implementations of Tables: Balanced Search Trees and Hashing

SLIDE 2

Binary searching & introduction to trees

2 CMPS 12B, UC Santa Cruz

Balanced Search Trees

  • Binary search tree operations such as insert, delete, and retrieve depend on the length of the path to the desired node
  • The path length can vary from log2(n+1) to O(n), depending on how balanced or unbalanced the tree is
  • The shape of the tree is determined by the values of the items and the order in which they were inserted

SLIDE 3

Examples

[Figure: two binary search trees, each holding the keys 10, 20, 30, 40, 50, 60, 70]

Can you get the same tree with different insertion orders?
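One way to explore the question is to build trees from several insertion orders and compare their shapes. The following is a minimal sketch, not code from the course: the class name `BstShapeSketch`, its helpers, and the three insertion orders are all assumptions chosen for illustration.

```java
import java.util.List;

class BstShapeSketch {
    static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    /** Plain (unbalanced) BST insertion; returns the subtree root. */
    static Node insert(Node root, int key) {
        if (root == null) return new Node(key);
        if (key < root.key) root.left = insert(root.left, key);
        else root.right = insert(root.right, key);
        return root;
    }

    static Node build(List<Integer> keys) {
        Node root = null;
        for (int k : keys) root = insert(root, k);
        return root;
    }

    /** Height counted in nodes: empty tree = 0, single node = 1. */
    static int height(Node n) {
        return n == null ? 0 : 1 + Math.max(height(n.left), height(n.right));
    }

    /** Preorder listing with "-" for empty subtrees; equal strings mean equal shapes. */
    static String shape(Node n) {
        return n == null ? "-" : n.key + "(" + shape(n.left) + "," + shape(n.right) + ")";
    }

    public static void main(String[] args) {
        Node a = build(List.of(40, 20, 60, 10, 30, 50, 70));
        Node b = build(List.of(40, 60, 20, 70, 50, 30, 10)); // different order
        Node c = build(List.of(10, 20, 30, 40, 50, 60, 70)); // sorted order
        System.out.println(height(a) + " " + height(b) + " " + height(c));
        System.out.println(shape(a).equals(shape(b)));
    }
}
```

In this sketch the first two orders give the same height-3 tree, because each parent is still inserted before its children, while the sorted order degenerates into a chain of height 7.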

SLIDE 4

2-3 Trees

  • Each internal node has two or three children
  • All leaves are at the same level
  • 2 children = 2-node, 3 children = 3-node

SLIDE 5

2-3 Trees (continued)

  • 2-3 trees are not binary trees (duh)
  • A 2-3 tree of height h always has at least 2^h - 1 nodes
      • i.e., at least as many as a full binary tree of height h
  • A 2-3 tree with n nodes has height less than or equal to log2(n+1)
      • i.e., less than or equal to the height of a full binary tree with n nodes

SLIDE 6

Definition of 2-3 Trees

T is a 2-3 tree of height h if
  • T is empty (height 0), OR
  • T is of the form [figure: root r with left subtree TL and right subtree TR], where r is a node that contains one data item and TL and TR are 2-3 trees, each of height h-1, and the search key of r is greater than any in TL and less than any in TR, OR

SLIDE 7

Definition of 2-3 Trees (continued)

  • T is of the form [figure: root r with left subtree TL, middle subtree TM, and right subtree TR], where r is a node that contains two data items and TL, TM, and TR are 2-3 trees, each of height h-1, and the smaller search key of r is greater than any in TL and less than any in TM, and the larger search key of r is greater than any in TM and smaller than any in TR.

SLIDE 8

Placing Data in a 2-3 tree

  • 1. A 2-node must contain a single data item whose value is
      • greater than any in its left subtree, and
      • smaller than any in its right subtree
  • 2. A 3-node must contain two data items,
      • the smaller of which is
          • greater than any in its left subtree, and
          • smaller than any in its middle subtree, and
      • the greater of which is
          • greater than any in its middle subtree, and
          • smaller than any in its right subtree
  • 3. A leaf may contain either one or two data items
SLIDE 9

2-3 Node Code

    import searchkeys.*;

    public class Tree23Node {
        private KeyedItem smallItem;     // smaller (or only) data item
        private KeyedItem largeItem;     // larger data item, if any
        private Tree23Node leftChild;
        private Tree23Node middleChild;
        private Tree23Node rightChild;
    }

SLIDE 10

Traversing a 2-3 Tree

Just like a binary tree: preorder, inorder, postorder

[50 90]
  [20]: leaves [10] [30 40]
  [70]: leaves [60] [80]
  [120 150]: leaves [100 110] [130 140] [160]
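An inorder traversal of a 2-3 tree might be sketched as follows. This is an illustration under assumptions: the `Node` class uses plain integers rather than the slides' `KeyedItem`, it assumes a 2-node uses its left and middle child references, and the sample tree is hand-built from the key values on this slide with an assumed shape.

```java
import java.util.ArrayList;
import java.util.List;

class Tree23TraversalSketch {
    /** Hypothetical simplified node: Integer keys instead of KeyedItem. */
    static final class Node {
        final Integer small;
        final Integer large;       // null for a 2-node
        Node left, middle, right;  // assumption: right is used only by a 3-node

        Node(Integer small, Integer large) { this.small = small; this.large = large; }
        Node kids(Node l, Node m, Node r) { left = l; middle = m; right = r; return this; }
    }

    /** Visits keys in ascending order: left, small, middle, then (3-node only) large, right. */
    static void inorder(Node n, List<Integer> out) {
        if (n == null) return;
        inorder(n.left, out);
        out.add(n.small);
        inorder(n.middle, out);
        if (n.large != null) {
            out.add(n.large);
            inorder(n.right, out);
        }
    }

    /** Hand-built sample tree; the shape is an assumption made for this sketch. */
    static Node sample() {
        Node l = new Node(20, null).kids(new Node(10, null), new Node(30, 40), null);
        Node m = new Node(70, null).kids(new Node(60, null), new Node(80, null), null);
        Node r = new Node(120, 150).kids(new Node(100, 110), new Node(130, 140), new Node(160, null));
        return new Node(50, 90).kids(l, m, r);
    }

    public static void main(String[] args) {
        List<Integer> out = new ArrayList<>();
        inorder(sample(), out);
        System.out.println(out); // keys in ascending order
    }
}
```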

SLIDE 11

Searching a 2-3 tree

  • Same efficiency as a balanced binary search tree: O(log n)
  • BUT, easy to keep balanced, unlike binary search trees
      • This means that as you insert elements, the balance is easily maintained, and worst case performance remains O(log n)
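A search walks one root-to-leaf path, choosing a child by comparing the key with the one or two items in each node. The sketch below is illustrative only: `Node` is a hypothetical integer-keyed stand-in for `Tree23Node`, and `search` returns the number of nodes visited so the path length is visible.

```java
class Tree23SearchSketch {
    /** Hypothetical simplified node with int keys. */
    static final class Node {
        final int small;
        final Integer large;       // null for a 2-node
        Node left, middle, right;  // assumption: right is used only by a 3-node

        Node(int small, Integer large) { this.small = small; this.large = large; }
        Node kids(Node l, Node m, Node r) { left = l; middle = m; right = r; return this; }
    }

    /** Returns the number of nodes visited if key is found, or -1 if it is absent. */
    static int search(Node n, int key) {
        int visited = 0;
        while (n != null) {
            visited++;
            if (key == n.small || (n.large != null && key == n.large)) return visited;
            if (key < n.small) n = n.left;
            else if (n.large == null || key < n.large) n = n.middle;
            else n = n.right;
        }
        return -1;
    }

    /** Small hand-built 2-3 tree of height 3 (an assumed example). */
    static Node sample() {
        Node left = new Node(30, null).kids(new Node(10, 20), new Node(40, null), null);
        Node right = new Node(70, 90).kids(new Node(60, null), new Node(80, null), new Node(160, null));
        return new Node(50, null).kids(left, right, null);
    }

    public static void main(String[] args) {
        System.out.println(search(sample(), 80)); // found in a leaf
        System.out.println(search(sample(), 39)); // absent
    }
}
```

Because all leaves are at the same level, no search visits more nodes than the height of the tree.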

SLIDE 12

Inserting 39 Into A 2-3 Tree

[50]
  [30]: leaves [10 20] [40]
  [70 90]: leaves [60] [80] [160]

SLIDE 13

Inserting 38

[50]
  [30]: leaves [10 20] [39 40]
  [70 90]: leaves [60] [80] [160]

SLIDE 14

Inserting 38 (continued)

[50]
  [30]: leaves [10 20] [38 39 40]
  [70 90]: leaves [60] [80] [160]

SLIDE 15

Inserting 37

[50]
  [30 39]: leaves [10 20] [38] [40]
  [70 90]: leaves [60] [80] [160]

SLIDE 16

Inserting 36

[50]
  [30 39]: leaves [10 20] [37 38] [40]
  [70 90]: leaves [60] [80] [160]

SLIDE 17

Inserting 36 (continued)

[50]
  [30 39]: leaves [10 20] [36 37 38] [40]
  [70 90]: leaves [60] [80] [160]

SLIDE 18

Inserting 36 (continued)

[50]
  [30 37 39]: leaves [10 20] [36] [38] [40]
  [70 90]: leaves [60] [80] [160]

SLIDE 19

Inserting 75

[37 50]
  [30]: leaves [10 20] [36]
  [39]: leaves [38] [40]
  [70 90]: leaves [60] [80] [160]

SLIDE 20

Inserting 77

[37 50]
  [30]: leaves [10 20] [36]
  [39]: leaves [38] [40]
  [70 90]: leaves [60] [75 80] [160]

SLIDE 21

Inserting 77 (continued)

[37 50]
  [30]: leaves [10 20] [36]
  [39]: leaves [38] [40]
  [70 90]: leaves [60] [75 77 80] [160]

SLIDE 22

Inserting 77 (continued)

[37 50]
  [30]: leaves [10 20] [36]
  [39]: leaves [38] [40]
  [70 77 90]: leaves [60] [75] [80] [160]

SLIDE 23

Inserting 77 (continued)

[37 50 77]
  [30]: leaves [10 20] [36]
  [39]: leaves [38] [40]
  [70]: leaves [60] [75]
  [90]: leaves [80] [160]

SLIDE 24

Inserting 77 (continued)

[50]
  [37]
    [30]: leaves [10 20] [36]
    [39]: leaves [38] [40]
  [77]
    [70]: leaves [60] [75]
    [90]: leaves [80] [160]

SLIDE 25

Inserting Into A 2-3 Tree

  • Insert into the leaf node in which the search key belongs
  • If the leaf now has no more than two values, stop
  • If the leaf now has three values, split the node into two nodes with the smallest and largest values, and
  • Push the middle value into the parent node
  • Continue with the parent node until either
      • you push a value into a node that had only one value, or
      • you create a new root node
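The steps above can be sketched in Java. This is a minimal illustration, not the course's implementation: it stores keys and children in lists instead of the `Tree23Node` fields, assumes distinct integer keys, and all names (`TwoThreeInsertSketch`, `Split`, `show`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class TwoThreeInsertSketch {
    static final class Node {
        final List<Integer> keys = new ArrayList<>(); // 1 or 2 keys (3 only transiently)
        final List<Node> kids = new ArrayList<>();    // empty for a leaf
        boolean isLeaf() { return kids.isEmpty(); }
    }

    /** Result of splitting an overfull node: the pushed-up key and the new right sibling. */
    static final class Split {
        final int middle; final Node right;
        Split(int middle, Node right) { this.middle = middle; this.right = right; }
    }

    /** Inserts key and returns the (possibly new) root. */
    static Node insert(Node root, int key) {
        if (root == null) root = new Node();
        Split s = insertInto(root, key);
        if (s == null) return root;
        Node newRoot = new Node();          // the old root was split: grow a new root
        newRoot.keys.add(s.middle);
        newRoot.kids.add(root);
        newRoot.kids.add(s.right);
        return newRoot;
    }

    private static Split insertInto(Node n, int key) {
        int i = 0;
        while (i < n.keys.size() && key > n.keys.get(i)) i++;
        if (n.isLeaf()) {
            n.keys.add(i, key);             // insert into the leaf where the key belongs
        } else {
            Split s = insertInto(n.kids.get(i), key);
            if (s == null) return null;     // child absorbed the key: stop
            n.keys.add(i, s.middle);        // push the middle value into this parent
            n.kids.add(i + 1, s.right);
        }
        if (n.keys.size() <= 2) return null;
        Node right = new Node();            // three values: split around the middle
        right.keys.add(n.keys.remove(2));   // largest value goes to the new node
        int middle = n.keys.remove(1);      // n keeps only the smallest value
        if (!n.isLeaf()) {                  // an overfull internal node has 4 children
            right.kids.add(n.kids.remove(2));
            right.kids.add(n.kids.remove(2));
        }
        return new Split(middle, right);
    }

    /** Compact text form, e.g. [4]([2]([1][3])[6]([5][7])). */
    static String show(Node n) {
        StringBuilder sb = new StringBuilder(n.keys.toString());
        if (!n.isLeaf()) {
            sb.append('(');
            for (Node k : n.kids) sb.append(show(k));
            sb.append(')');
        }
        return sb.toString();
    }
}
```

Inserting 1 through 7 in ascending order, the worst case for a plain binary search tree, still yields a perfectly balanced tree here, because every split leaves both halves at the same depth.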

SLIDE 26

Deleting From A 2-3 Tree

  • The inverse of inserting
  • Delete the value (in a leaf), then
  • Merge empty nodes
  • If necessary, delete empty root

SLIDE 27

Redistribute I

[Figure: a parent value P and leaf values S and L, shown before and after redistribution]

SLIDE 28

Merge I

[Figure: values L and S, shown before and after merging leaves]

SLIDE 29

Redistribute II

[Figure: values P, S, L with subtrees a, b, c, d, shown before and after redistribution at an internal node]

SLIDE 30

Merge II

[Figure: values L and S with subtrees a, b, c, shown before and after merging internal nodes]

SLIDE 31

Delete

[Figure: a node holding S and L with subtrees a, b, c, shown before and after the deletion step]

SLIDE 32

2-3 Trees: Results

  • Slightly more complicated than binary search trees, BUT
  • 2-3 Trees are always balanced
  • Every operation takes O(log n)

SLIDE 33

2-3-4 Trees

  • Slightly less complicated than 2-3 Trees
  • Each node can contain 1–3 values and, unless it is a leaf, have 2–4 children
  • Inserting:
      • Split 4-nodes on the way down
      • Insert into leaf
  • Deleting:
      • Only delete from 3-node or 4-node

SLIDE 34

Red-Black Trees

  • 2-3 Trees are always balanced
      • O(log n) time for all operations
  • 2-3-4 Trees are always balanced
      • O(log n) time for all operations, and
      • Insertion and deletion can be done in a single pass from root to leaf
      • But, require slightly more storage per node
  • Red-Black Trees have the advantage of 2-3-4 trees, without the overhead
      • Represent the 2-3-4 Tree as a binary tree with colored references (red or black)

SLIDE 35

Red-Black Representation of a 4-Node

[Figure: the 4-node [S M L] with subtrees a, b, c, d, and its red-black form: M on top with red references to S (subtrees a, b) and L (subtrees c, d)]

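SLIDE 36

Red-Black Representation of a 3-Node

[Figure: the 3-node [S L] with subtrees a, b, c, and its two red-black forms: L on top with a red reference to S (S has subtrees a, b; L has right subtree c), or S on top with a red reference to L (S has left subtree a; L has subtrees b, c)]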

SLIDE 37

2-3-4 Tree

[37 50]
  [30 35]: leaves [10 20] [32 33 34] [36]
  [39]: leaves [38] [40]
  [70 90]: leaves [60] [80] [160]

SLIDE 38

Equivalent Red-Black Tree

[Figure: the red-black tree equivalent to the 2-3-4 tree on the previous slide; node values as extracted: 50 30 90 70 20 100 39 80 33 40 38 36 37 35 10 32 34 60]

SLIDE 39

Red-Black Tree Node

    public class RBTreeNode {
        public static final int RED = 0;
        public static final int BLACK = 1;

        private KeyedItem item;
        private RBTreeNode leftChild;
        private RBTreeNode rightChild;
        private int leftColor;     // color of the reference to leftChild
        private int rightColor;    // color of the reference to rightChild
    }

SLIDE 40

Searching and Traversing Red-Black Trees

  • Red-Black trees are binary search trees
      • Just search them the same way you would any other binary search tree
  • Inserting
      • Split 4-nodes on the way down by changing paired red child references to black
      • Insert into a leaf
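The color change for splitting a 4-node might be sketched as below, using reference colors in the style of `RBTreeNode`. This is only an illustration: the simplified `Node` class and the `split` helper are assumptions. When the node is not the root, the middle value moves up into the parent's 2-3-4 node, so the parent's reference to it turns red. The sketch does not handle the case where this produces two red references in a row, which would need a rotation.

```java
class RedBlackSplitSketch {
    static final int RED = 0;
    static final int BLACK = 1;

    /** Hypothetical simplified node with an int item and colored child references. */
    static final class Node {
        final int item;
        Node leftChild, rightChild;
        int leftColor = BLACK, rightColor = BLACK;
        Node(int item) { this.item = item; }
    }

    /** A node whose two child references are both red represents a 4-node. */
    static boolean isFourNode(Node n) {
        return n.leftChild != null && n.rightChild != null
                && n.leftColor == RED && n.rightColor == RED;
    }

    /** Splits the 4-node at n; parent is null when n is the root. */
    static void split(Node parent, Node n) {
        n.leftColor = BLACK;   // the paired red child references become black
        n.rightColor = BLACK;
        if (parent != null) {  // the middle value joins the parent's node
            if (parent.leftChild == n) parent.leftColor = RED;
            else parent.rightColor = RED;
        }
    }

    /** Builds M with red references to S and L, i.e. the 4-node [S M L]. */
    static Node fourNode(int s, int m, int l) {
        Node mid = new Node(m);
        mid.leftChild = new Node(s);
        mid.rightChild = new Node(l);
        mid.leftColor = RED;
        mid.rightColor = RED;
        return mid;
    }
}
```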

SLIDE 41

Splitting a 4-node root

[Figure: root M with children S and L and subtrees a, b, c, d, shown before and after the split]

SLIDE 42

Splitting a 4-node

[Figure: the 4-node M, S, L with subtrees a, b, c, d, below a parent P whose other subtree is e, shown before and after the split]

SLIDE 43

Splitting a 4-node

[Figure: the 4-node M, S, L with subtrees b, c, d, e, below a parent P whose other subtree is a, shown before and after the split]

SLIDE 44

Splitting a 4-node

[Figure: the 4-node M, S, L with subtrees a, b, c, d, below a parent holding P and Q with other subtrees e and f, shown before and after the split]

SLIDE 45

Splitting a 4-node

[Figure: the 4-node M, S, L with subtrees a, b, c, d, below a parent holding P and Q with other subtrees e and f, shown before and after the split]

SLIDE 46

Splitting a 4-node

[Figure: the 4-node M, S, L with subtrees b, c, d, e, below a parent holding P and Q with other subtrees a and f, shown before and after the split]

SLIDE 47

AVL Trees

  • Insert in the appropriate spot
  • Rotate as necessary to restore balance
      • Rotate your way back up the tree
  • Every operation is O(log n)
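A compact sketch of AVL insertion follows. It is an illustration under assumptions, not code from the course: keys are plain integers, heights are stored in each node, and the names are hypothetical. The recursion inserts at the bottom and then rebalances each node on the way back up.

```java
class AvlSketch {
    static final class Node {
        final int key;
        int height = 1;          // height in nodes of the subtree rooted here
        Node left, right;
        Node(int key) { this.key = key; }
    }

    static int h(Node n) { return n == null ? 0 : n.height; }

    static void update(Node n) { n.height = 1 + Math.max(h(n.left), h(n.right)); }

    static Node rotateRight(Node y) {
        Node x = y.left;
        y.left = x.right;
        x.right = y;
        update(y);
        update(x);
        return x;
    }

    static Node rotateLeft(Node x) {
        Node y = x.right;
        x.right = y.left;
        y.left = x;
        update(x);
        update(y);
        return y;
    }

    /** Inserts key and returns the new subtree root after any rotations. */
    static Node insert(Node n, int key) {
        if (n == null) return new Node(key);
        if (key < n.key) n.left = insert(n.left, key);
        else n.right = insert(n.right, key);
        update(n);
        int balance = h(n.left) - h(n.right);
        if (balance > 1) {                                   // left-heavy
            if (h(n.left.left) < h(n.left.right)) n.left = rotateLeft(n.left);
            return rotateRight(n);
        }
        if (balance < -1) {                                  // right-heavy
            if (h(n.right.right) < h(n.right.left)) n.right = rotateRight(n.right);
            return rotateLeft(n);
        }
        return n;
    }

    public static void main(String[] args) {
        Node root = null;
        for (int k = 10; k <= 70; k += 10) root = insert(root, k); // sorted input
        System.out.println(root.key + " " + root.height);
    }
}
```

With the sorted input in `main`, the rotations keep the tree at height 3 with 40 at the root, where a plain binary search tree would have height 7.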

SLIDE 48

Hashing

  • Trees allow for efficient table operations, but
      • What if you want O(1) behavior?
      • What if time is much more critical than space?
  • Basic idea: search key → hash function → number in 0..n-1 → array index into the “hash table”
      • Same for insert, lookup, delete

SLIDE 49

Issues

  • How to create a hash function
      • Easy or difficult, depending upon the desired properties
  • How large the array should be
      • Factors: how many items will be stored, how much memory you have, how fast you want the operations to be
  • Static or dynamic
  • Hash collisions: what if two input values produce the same hash value?

SLIDE 50

Hash Functions

  • Goals
      • Fast, easy to compute
      • Distributes hash values evenly in the target range
  • Possible functions for an n-entry hash table
      • h(k) = random number in 0..n-1
      • h(k) = the sum of the digits in k
      • h(k) = the first m digits of k, where m = log10(n)
      • h(k) = k mod n
  • Are these good or bad? Why?

SLIDE 51

Hash Functions on Strings

  • Need to convert string to a number
  • Possible solutions:
      • Add the binary representations of the characters
      • Concatenate the binary representations to get a very large number
          • Hello = H×32^4 + e×32^3 + l×32^2 + l×32 + o
          • Horner’s rule: (((H×32 + e) × 32 + l) × 32 + l) × 32 + o
  • This is a very big number, so
      • (A × B) mod n = ((A mod n) × (B mod n)) mod n
      • (A + B) mod n = ((A mod n) + (B mod n)) mod n
      • Apply the modulo operator early and often to keep the number small
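Horner's rule with an early modulo might look like the sketch below. The 5-bit letter encoding (a = 1 through z = 26, case ignored) and the table size 101 are assumptions for illustration; the slide does not specify either.

```java
class HornerHashSketch {
    /** Assumed 5-bit encoding: 'a'/'A' -> 1 ... 'z'/'Z' -> 26, anything else -> 0. */
    static int code(char c) {
        char lower = Character.toLowerCase(c);
        return (lower >= 'a' && lower <= 'z') ? lower - 'a' + 1 : 0;
    }

    /** Horner's rule, reducing mod tableSize at every step so h stays small. */
    static int hash(String s, int tableSize) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = (h * 32 + code(s.charAt(i))) % tableSize;
        }
        return h;
    }

    /** Same polynomial without the modulo; overflows a long for strings past 12 letters. */
    static long unreduced(String s) {
        long v = 0;
        for (int i = 0; i < s.length(); i++) {
            v = v * 32 + code(s.charAt(i));
        }
        return v;
    }

    public static void main(String[] args) {
        System.out.println(unreduced("Hello"));            // the "very big number"
        System.out.println(hash("Hello", 101));            // reduced at every step
        System.out.println(unreduced("Hello") % 101);      // same value, reduced once at the end
    }
}
```

Both approaches give the same table index, which is what the two modular identities on the slide guarantee.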

SLIDE 52

Dealing with Hash Collisions

  • Open addressing: Collision ⇒ try to place the object in other locations in a predictable sequence
      • Linear probing: search sequentially starting from the current location
          • Issues: wrapping, slow, empty/deleted entries, clustering
      • Quadratic probing: search at offsets +1, +4, +9, +16, +25, ... (mod n)
      • Double hashing: Use second hash function to determine the size of the steps:
          • h2(k) ≠ 0
          • h2 ≠ h1
          • Example: h1(k) = k mod 11, h2(k) = 7 - (k mod 7)
          • Note: size of steps and size of table must be relatively prime so that all entries get visited
              • Use prime size, step, etc.
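The double-hashing example can be traced with a short sketch. It uses the slide's h1 and h2 with a table of size 11 and assumes non-negative integer keys; the class and method names are hypothetical.

```java
import java.util.Arrays;

class DoubleHashingSketch {
    static final int TABLE_SIZE = 11;

    static int h1(int k) { return k % TABLE_SIZE; }

    /** Step size in 1..7, so it is never 0. */
    static int h2(int k) { return 7 - (k % 7); }

    /** The i-th probe for key k is (h1(k) + i * h2(k)) mod TABLE_SIZE. */
    static int[] probeSequence(int k) {
        int[] seq = new int[TABLE_SIZE];
        for (int i = 0; i < TABLE_SIZE; i++) {
            seq[i] = (h1(k) + i * h2(k)) % TABLE_SIZE;
        }
        return seq;
    }

    public static void main(String[] args) {
        // For k = 58: h1 = 3 and h2 = 5.
        System.out.println(Arrays.toString(probeSequence(58)));
        // prints [3, 8, 2, 7, 1, 6, 0, 5, 10, 4, 9]
    }
}
```

Because 11 is prime, every step size from 1 to 7 is relatively prime to the table size, so the sequence visits all 11 slots before repeating.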

SLIDE 53

Increase The Size of the Hash Table

  • Increasing the actual size is costly
      • Everything must be rehashed and moved
  • Chain-bucket hashing
      • Each entry in a hash table is a bucket into which multiple values may be added
      • Each bucket can be implemented as a chain, or linked list
      • Now the size of the hash table is variable
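A chain-bucket table could be sketched as follows. This is a minimal illustration with assumed names, integer keys, and `key mod bucketCount` as the hash function; it uses `java.util.LinkedList` for each chain.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainedHashTableSketch {
    private final List<LinkedList<Integer>> buckets = new ArrayList<>();
    private int size;

    ChainedHashTableSketch(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) buckets.add(new LinkedList<>());
    }

    /** floorMod keeps the index non-negative even for negative keys. */
    private LinkedList<Integer> bucketFor(int key) {
        return buckets.get(Math.floorMod(key, buckets.size()));
    }

    /** Adds key to its chain; returns false if it was already present. */
    boolean insert(int key) {
        LinkedList<Integer> chain = bucketFor(key);
        if (chain.contains(key)) return false;
        chain.addFirst(key);
        size++;
        return true;
    }

    boolean contains(int key) { return bucketFor(key).contains(key); }

    boolean remove(int key) {
        // Integer.valueOf selects remove(Object) rather than remove(int index).
        boolean removed = bucketFor(key).remove(Integer.valueOf(key));
        if (removed) size--;
        return removed;
    }

    /** Items per bucket; with chaining this can exceed 1. */
    double loadFactor() { return (double) size / buckets.size(); }
}
```

Colliding keys such as 3, 14, and 25 in an 11-bucket table simply share one chain, so the table never runs out of slots; the cost is longer chains to scan as the load grows.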

SLIDE 54

Efficiency of Hashing

  • Load factor α is a measure of how full the table is: $\alpha = \dfrac{\text{number of items}}{\text{size of table}}$
      • Small α ⇒ little chance of collision and low search time
      • Large α ⇒ high chance of collision and high search time
  • Unsuccessful searches require more time than successful searches

SLIDE 55

Linear Probing

  • Successful search: $\frac{1}{2}\left[1 + \frac{1}{1-\alpha}\right]$
  • Unsuccessful search: $\frac{1}{2}\left[1 + \frac{1}{(1-\alpha)^2}\right]$

SLIDE 56

Quadratic Probing and Double Hashing

  • Successful search: $\frac{-\log_e(1-\alpha)}{\alpha}$
  • Unsuccessful search: $\frac{1}{1-\alpha}$

SLIDE 57

Chain-Bucket Hashing

  • Successful search: $1 + \frac{\alpha}{2}$
  • Unsuccessful search: $\alpha$
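To compare the three schemes, the standard approximations for the average number of comparisons per search can be tabulated for a few load factors. The sketch below is illustrative: the method names and the sample values of α are assumptions.

```java
class ProbeCountSketch {
    // Linear probing
    static double linearSuccessful(double a) { return 0.5 * (1 + 1 / (1 - a)); }
    static double linearUnsuccessful(double a) { return 0.5 * (1 + 1 / ((1 - a) * (1 - a))); }

    // Quadratic probing and double hashing
    static double doubleHashSuccessful(double a) { return -Math.log(1 - a) / a; }
    static double doubleHashUnsuccessful(double a) { return 1 / (1 - a); }

    // Chain-bucket hashing
    static double chainSuccessful(double a) { return 1 + a / 2; }
    static double chainUnsuccessful(double a) { return a; }

    public static void main(String[] args) {
        System.out.println("alpha  lin-s  lin-u  dbl-s  dbl-u  chn-s  chn-u");
        for (double a : new double[] {0.25, 0.5, 0.75, 0.9}) {
            System.out.printf("%5.2f %6.2f %6.2f %6.2f %6.2f %6.2f %6.2f%n",
                    a,
                    linearSuccessful(a), linearUnsuccessful(a),
                    doubleHashSuccessful(a), doubleHashUnsuccessful(a),
                    chainSuccessful(a), chainUnsuccessful(a));
        }
    }
}
```

At α = 0.5, for example, linear probing gives 1.5 comparisons for a successful search and 2.5 for an unsuccessful one, and the gap between the schemes widens quickly as α approaches 1.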

SLIDE 58

How well does a hash function work?

  • How fast is it to compute?
  • How well does it scatter random data?
  • How well does it scatter non-random data?
      • This can be very important
      • It is always possible to construct a worst case
  • General principles:
      • The hash function should involve the whole search key
      • If a hash function involves a modulo operation, the modulus should be prime

SLIDE 59

Hashing vs. Binary Trees

  • Hashing supports efficient insert and remove
  • Hashing does not support efficient sorting
  • Hashing does not support efficient range queries
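The range-query difference can be seen with the standard Java collections. This is a sketch for illustration: `java.util.TreeSet` is backed by a red-black tree and can return a key range directly, whereas with `java.util.HashSet` every element must be scanned and the result sorted afterwards. The helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.TreeSet;

class RangeQuerySketch {
    /** Tree: locate lo, then walk keys in order up to hi. */
    static List<Integer> rangeFromTree(TreeSet<Integer> keys, int lo, int hi) {
        return new ArrayList<>(keys.subSet(lo, true, hi, true));
    }

    /** Hash table: no ordering, so scan everything and sort the matches. */
    static List<Integer> rangeFromHash(HashSet<Integer> keys, int lo, int hi) {
        List<Integer> out = new ArrayList<>();
        for (int k : keys) {
            if (k >= lo && k <= hi) out.add(k);
        }
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(40, 20, 60, 10, 30, 50, 70);
        System.out.println(rangeFromTree(new TreeSet<>(data), 25, 55)); // prints [30, 40, 50]
        System.out.println(rangeFromHash(new HashSet<>(data), 25, 55)); // prints [30, 40, 50]
    }
}
```

Both calls return the same keys, but the tree version touches only the nodes near the range while the hash version examines the whole table.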