CE 221 Data Structures and Algorithms Chapter 4: Trees (BST)



slide-1
SLIDE 1

CE 221 Data Structures and Algorithms

Chapter 4: Trees (BST)

Text: Read Weiss, §4.3

1 Izmir University of Economics

slide-2
SLIDE 2

The Search Tree ADT – Binary Search Trees


  • An important application of binary trees is in searching. Let us assume that each node in the tree stores an item. Assume for simplicity that these are distinct integers (duplicates are dealt with later).
  • The property that makes a binary tree into a binary search tree is that for every node X in the tree, the values of all the items in the left subtree are smaller than the item in X, and the values of all the items in the right subtree are larger than the item in X.

The tree on the left is a binary search tree, but the tree on the right is not. The tree on the right has a node with key 7 in the left subtree of a node with key 6 (which happens to be the root).
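The ordering property can be checked mechanically. The following is a minimal sketch, assuming integer keys; the names `BstPropertyDemo`, `BinaryNode`, and `isBst` are illustrative and not taken from the slides, and the two trees in `main` are an assumed shape consistent with the description above (a 7 in the left subtree of the root 6).

```java
// Sketch: verifies the BST ordering property using exclusive (low, high) bounds.
// All names here are hypothetical; the slides do not define this routine.
class BstPropertyDemo {
    static class BinaryNode {
        int element;
        BinaryNode left, right;

        BinaryNode(int element, BinaryNode left, BinaryNode right) {
            this.element = element;
            this.left = left;
            this.right = right;
        }
    }

    static boolean isBst(BinaryNode t) {
        return isBst(t, Long.MIN_VALUE, Long.MAX_VALUE);
    }

    // Every key in the subtree rooted at t must lie strictly between low and high.
    private static boolean isBst(BinaryNode t, long low, long high) {
        if (t == null) return true;
        if (t.element <= low || t.element >= high) return false;
        return isBst(t.left, low, t.element) && isBst(t.right, t.element, high);
    }

    public static void main(String[] args) {
        BinaryNode good = new BinaryNode(6,
                new BinaryNode(2, new BinaryNode(1, null, null),
                        new BinaryNode(4, new BinaryNode(3, null, null), null)),
                new BinaryNode(8, null, null));
        BinaryNode bad = new BinaryNode(6,
                new BinaryNode(2, new BinaryNode(1, null, null),
                        new BinaryNode(4, new BinaryNode(3, null, null),
                                new BinaryNode(7, null, null))),
                new BinaryNode(8, null, null));
        System.out.println(isBst(good)); // true
        System.out.println(isBst(bad));  // false: 7 sits in the left subtree of 6
    }
}
```

Comparing each node only with its own children would not be enough: in the second tree, 7 is a valid right child of 4 but still violates the property with respect to the root.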

slide-3
SLIDE 3


Binary Search Trees - Operations

  • Descriptions and implementations of the operations that are usually performed on binary search trees (BST) are given.
  • Note that because of the recursive definition of trees, it is common to write these routines recursively. Because the average depth of a binary search tree is O(log N), we generally do not need to worry about running out of stack space.
  • Since all the elements can be ordered, we will assume that the operators <, >, and = can be applied to them.

slide-4
SLIDE 4

BST – Implementation - I


slide-5
SLIDE 5

BST – Implementation - II


slide-6
SLIDE 6

BST – Implementation - III


...

46 }

slide-7
SLIDE 7

BST – Implementation - IV

  • contains returns true if element x is in the BST referenced by t, or false if there is no such node. The structure of the tree makes this simple. If t is null, then we can just return false. Otherwise, if the item stored at t matches x, we return true; if it does not, we make a recursive call on either the left or the right subtree of the node referenced by t, depending on how x compares with the item stored at t.
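The contains routine described on this slide can be sketched as follows. This is a minimal version assuming integer keys; the class name `BstContainsDemo` and the simplified `BinaryNode` are illustrative stand-ins for the generic node class used in the course code.

```java
// Sketch of a recursive contains on an int-keyed BST (names are hypothetical).
class BstContainsDemo {
    static class BinaryNode {
        int element;
        BinaryNode left, right;

        BinaryNode(int element, BinaryNode left, BinaryNode right) {
            this.element = element;
            this.left = left;
            this.right = right;
        }
    }

    static boolean contains(int x, BinaryNode t) {
        if (t == null) return false;                       // empty subtree: not found
        if (x < t.element) return contains(x, t.left);     // x can only be on the left
        if (x > t.element) return contains(x, t.right);    // x can only be on the right
        return true;                                       // match
    }

    public static void main(String[] args) {
        BinaryNode root = new BinaryNode(6,
                new BinaryNode(2, null, new BinaryNode(4, null, null)),
                new BinaryNode(8, null, null));
        System.out.println(contains(4, root)); // true
        System.out.println(contains(5, root)); // false
    }
}
```

The null test comes first so that the comparisons never dereference an empty subtree.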

slide-8
SLIDE 8


BST – Implementation - V

  • To perform a findMin, start at the root and go left as long as there is a left child. The stopping point is the smallest element.
  • The findMax routine is the same, except that branching is to the right child.
  • Notice that the degenerate case of an empty tree is carefully handled.
  • Also notice that it is safe to change t in findMax, since we are only working with a copy of the reference. Always be extremely careful, however, because a statement such as t.right = t.right.right will make changes to the tree.
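The two routines might look like this. It is a sketch assuming integer keys, with findMin written recursively and findMax iteratively to show both styles; `BstMinMaxDemo` and the simplified `BinaryNode` are illustrative names.

```java
// Sketch of findMin (recursive) and findMax (iterative); names are hypothetical.
class BstMinMaxDemo {
    static class BinaryNode {
        int element;
        BinaryNode left, right;

        BinaryNode(int element, BinaryNode left, BinaryNode right) {
            this.element = element;
            this.left = left;
            this.right = right;
        }
    }

    // Returns the node holding the smallest key, or null for an empty tree.
    static BinaryNode findMin(BinaryNode t) {
        if (t == null) return null;
        if (t.left == null) return t;
        return findMin(t.left);
    }

    // Returns the node holding the largest key, or null for an empty tree.
    // Reassigning t only changes this method's local copy of the reference;
    // the caller's tree is untouched.
    static BinaryNode findMax(BinaryNode t) {
        if (t != null) {
            while (t.right != null) {
                t = t.right;
            }
        }
        return t;
    }

    public static void main(String[] args) {
        BinaryNode root = new BinaryNode(6,
                new BinaryNode(2, new BinaryNode(1, null, null), null),
                new BinaryNode(8, null, null));
        System.out.println(findMin(root).element); // 1
        System.out.println(findMax(root).element); // 8
    }
}
```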

slide-9
SLIDE 9


BST – Implementation – Insertion I

The insertion routine is conceptually simple. To insert x into tree t, proceed down the tree as you would with a contains. If x is found, do nothing (or "update" something). Otherwise, insert x at the last spot on the path traversed. Duplicates can be handled by keeping an extra field in the node indicating the frequency of occurrence. If the key is only part of a larger record, then all of the records with the same key might be kept in an auxiliary data structure, such as a list or another search tree.

Insert node 5
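A minimal sketch of the insertion routine, assuming integer keys and the "do nothing on a duplicate" policy. The class `BstInsertDemo` and the key sequence in `main` are illustrative; the sequence is chosen so that 5 ends up as the right child of 4.

```java
// Sketch of recursive BST insertion; names and the sample keys are hypothetical.
class BstInsertDemo {
    static class BinaryNode {
        int element;
        BinaryNode left, right;

        BinaryNode(int element) {
            this.element = element;
        }
    }

    // Inserts x into the subtree rooted at t and returns the (possibly new) subtree root.
    static BinaryNode insert(int x, BinaryNode t) {
        if (t == null) return new BinaryNode(x);    // last spot on the path: attach here
        if (x < t.element) {
            t.left = insert(x, t.left);
        } else if (x > t.element) {
            t.right = insert(x, t.right);
        }
        // x == t.element: duplicate, do nothing
        return t;
    }

    public static void main(String[] args) {
        BinaryNode root = null;
        for (int key : new int[] {6, 2, 8, 1, 4, 3}) {
            root = insert(key, root);
        }
        root = insert(5, root);
        System.out.println(root.left.right.right.element); // 5
    }
}
```

Returning the subtree root lets the caller re-link the child reference, which is why every call site assigns the result back.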

slide-10
SLIDE 10


BST – Implementation – Insertion II

slide-11
SLIDE 11


BST – Implementation – Deletion I

  • Once we have found the node to be deleted, we need to consider 3 possibilities. (1) If the node is a leaf, it can be deleted immediately. (2) If the node has one child, the node can be deleted after its parent adjusts a pointer to bypass the node. Notice that the deleted node is now unreferenced and can be disposed of only if a pointer to it has been saved.

Delete node 4

slide-12
SLIDE 12


BST – Implementation – Deletion II

  • (3) The complicated case deals with a node with two children. The general strategy is to replace the key of this node with the smallest key of the right subtree (which is easily found) and recursively delete that node (which is now empty). Because the smallest node in the right subtree cannot have a left child, the second delete is an easy one.

Delete node 2
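The three deletion cases can be sketched in one recursive routine. This is a straightforward version assuming integer keys; it walks the right subtree twice in the two-child case (once to find the minimum, once to delete it). `BstRemoveDemo` and the sample keys are illustrative, not the figure from the slide.

```java
// Sketch of recursive BST deletion covering leaf, one-child, and two-child cases.
// Names are hypothetical; keys are ints for simplicity.
class BstRemoveDemo {
    static class BinaryNode {
        int element;
        BinaryNode left, right;

        BinaryNode(int element) {
            this.element = element;
        }
    }

    static BinaryNode insert(int x, BinaryNode t) {
        if (t == null) return new BinaryNode(x);
        if (x < t.element) t.left = insert(x, t.left);
        else if (x > t.element) t.right = insert(x, t.right);
        return t;
    }

    static BinaryNode findMin(BinaryNode t) {
        while (t != null && t.left != null) t = t.left;
        return t;
    }

    // Removes x from the subtree rooted at t and returns the new subtree root.
    static BinaryNode remove(int x, BinaryNode t) {
        if (t == null) return null;                       // not found: nothing to do
        if (x < t.element) {
            t.left = remove(x, t.left);
        } else if (x > t.element) {
            t.right = remove(x, t.right);
        } else if (t.left != null && t.right != null) {   // case (3): two children
            t.element = findMin(t.right).element;         // copy smallest key of right subtree
            t.right = remove(t.element, t.right);         // delete that node (second pass)
        } else {                                          // cases (1) and (2)
            t = (t.left != null) ? t.left : t.right;      // bypass the node
        }
        return t;
    }

    public static void main(String[] args) {
        BinaryNode root = null;
        for (int key : new int[] {6, 2, 8, 1, 5, 3, 4}) {
            root = insert(key, root);
        }
        root = remove(2, root);                           // 2 has two children
        System.out.println(root.left.element);            // 3
        System.out.println(root.left.right.left.element); // 4
    }
}
```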

slide-13
SLIDE 13


BST – Implementation – Deletion III

Inefficient, since calls highlighted in yellow result in two passes down the tree to find and delete the smallest node in the right subtree.

slide-14
SLIDE 14
  • We can use stacks to convert an expression in standard form (otherwise known as infix) into postfix.
  • Example: operators = {+, *, (, )}, usual precedence rules; a + b * c + (d * e + f) * g
    Answer = a b c * + d e * f + g * +


BST – Implementation – Deletion IV

...

else if (t.left != null && t.right != null) {
    BinaryNode<AnyType> tmp, prev;    /* declare references */
    tmp = t.right;                    /* will point to smallest in the right subtree */
    prev = t.right;                   /* will point to parent of tmp */
    while (tmp.left != null) {        /* find smallest of right subtree */
        prev = tmp;
        tmp = tmp.left;
    }
    t.element = tmp.element;          /* replace with smallest */
    if (tmp == prev)                  /* t.right is the smallest */
        t.right = tmp.right;          /* skip over tmp */
    else                              /* connect left of prev to right of tmp */
        prev.left = tmp.right;
}
...

Efficient Version

slide-15
SLIDE 15

BST – Implementation – Lazy Deletion

  • If the number of deletions is expected to be small, then a popular strategy to use is lazy deletion: when an element is to be deleted, it is left in the tree and merely marked as deleted. This is especially popular if duplicates are present, because then the field that keeps count of the items can be decremented.
  • If the number of real nodes is the same as the number of "deleted" nodes, then the depth of the tree is only expected to go up by a small constant (why?), so there is a very small time penalty associated with lazy deletion. Also, if an item is reinserted, the overhead of allocating a new cell is avoided.
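Lazy deletion with a count field could be sketched as below. This is one possible design under stated assumptions: integer keys, a per-node `count` where 0 means "logically deleted", and nodes that are never physically unlinked. `LazyBstDemo` and its method names are illustrative.

```java
// Sketch of lazy deletion: nodes stay in the tree; a count of 0 marks "deleted".
// All names are hypothetical; this is not the course's reference code.
class LazyBstDemo {
    static class Node {
        int element;
        int count = 1;      // number of live copies of this key; 0 = lazily deleted
        Node left, right;

        Node(int element) {
            this.element = element;
        }
    }

    static Node insert(int x, Node t) {
        if (t == null) return new Node(x);
        if (x < t.element) t.left = insert(x, t.left);
        else if (x > t.element) t.right = insert(x, t.right);
        else t.count++;     // duplicate or re-insert: reuse the node, no allocation
        return t;
    }

    // Finds the node holding x, whether live or lazily deleted; null if absent.
    private static Node locate(int x, Node t) {
        while (t != null && t.element != x) {
            t = (x < t.element) ? t.left : t.right;
        }
        return t;
    }

    static void remove(int x, Node t) {
        Node n = locate(x, t);
        if (n != null && n.count > 0) n.count--;   // mark only; structure unchanged
    }

    static boolean contains(int x, Node t) {
        Node n = locate(x, t);
        return n != null && n.count > 0;
    }

    public static void main(String[] args) {
        Node root = null;
        for (int key : new int[] {5, 3, 8}) root = insert(key, root);
        remove(3, root);
        System.out.println(contains(3, root)); // false, although the node is still linked
        root = insert(3, root);
        System.out.println(contains(3, root)); // true, same node reused
    }
}
```

Searches still use deleted nodes as signposts, which is why the structure of the tree, and therefore the search paths, stay intact.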


slide-16
SLIDE 16

Average-Case Analysis - I


  • All of the operations of BST, except MakeEmpty, take O(d) time, where d is the depth of the node containing the accessed key. As a result, they are O(depth of tree).
  • Why? Because in constant time we descend a level in the tree, thus operating on a tree that is now roughly half as large.
  • MakeEmpty takes O(N) time.
  • Observation: The average depth over all nodes in a BST is O(log N), assuming all insertion sequences are equally likely.
  • Proof: The sum of the depths of all nodes in a tree is the internal path length. Let's calculate the average internal path length over all possible insertion sequences.

slide-17
SLIDE 17


Average-Case Analysis - II

  • Let D(N) be the internal path length for some BST T of N nodes. D(1) = 0.
  • D(N) = D(i) + D(N-i-1) + N - 1, where i is the size of the left subtree; the N - 1 term arises because every node in the two subtrees is 1 level deeper in T.
  • All subtree sizes are equally likely for BSTs, since the size depends only on the rank of the first element inserted into the BST. This does not hold for binary trees in general, though. Let's, then, average:

$$D(N) = \frac{1}{N}\left[\sum_{i=0}^{N-1}\bigl(D(i) + D(N-i-1)\bigr)\right] + N - 1, \qquad D(N) = \frac{2}{N}\left[\sum_{i=0}^{N-1} D(i)\right] + N - 1$$

  • If the recurrence is solved, D(N) = O(N log N). Thus, the expected depth of any node is O(log N).
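The averaged recurrence can also be evaluated numerically to see the growth rate. The sketch below assumes D(0) = 0 for the empty tree (the slide only states D(1) = 0) and keeps a running prefix sum so each D(k) costs constant time; `PathLengthRecurrence` and `averagePathLengths` are illustrative names.

```java
// Sketch: evaluates D(k) = (2/k) * (D(0) + ... + D(k-1)) + (k - 1) for k = 1..n.
// Assumes D(0) = 0; names are hypothetical.
class PathLengthRecurrence {
    // Returns d[0..n], where d[k] is the average internal path length of a k-node BST.
    static double[] averagePathLengths(int n) {
        double[] d = new double[n + 1];
        double prefix = 0.0;                 // running sum D(0) + ... + D(k-1)
        for (int k = 1; k <= n; k++) {
            d[k] = 2.0 * prefix / k + (k - 1);
            prefix += d[k];
        }
        return d;
    }

    public static void main(String[] args) {
        int n = 500;
        double[] d = averagePathLengths(n);
        // Average depth of a node is D(N) / N.
        System.out.println("average depth for N = " + n + ": " + d[n] / n);
    }
}
```

For N = 500 this gives an average depth of roughly 9.6, a small multiple of log N rather than anything close to N.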

slide-18
SLIDE 18

Derivation of D(N) - 1

$$D(N) = \frac{1}{N}\left[\sum_{i=0}^{N-1}\bigl(D(i) + D(N-i-1)\bigr)\right] + N - 1, \qquad D(N) = \frac{2}{N}\left[\sum_{i=0}^{N-1} D(i)\right] + N - 1$$

$$\begin{aligned}
N\,D(N) &= 2\left[\sum_{i=0}^{N-1} D(i)\right] + N(N-1) && (1) \\
(N-1)\,D(N-1) &= 2\left[\sum_{i=0}^{N-2} D(i)\right] + (N-1)(N-2) && (2) \\
N\,D(N) - (N-1)\,D(N-1) &= 2\,D(N-1) + 2(N-1) && \text{(subtract (2) from (1))} \\
N\,D(N) &= (N+1)\,D(N-1) + 2(N-1) \\
\frac{D(N)}{N+1} &= \frac{D(N-1)}{N} + \frac{2(N-1)}{N(N+1)} && \text{(divide by } N(N+1)\text{)} \\
\frac{D(N-1)}{N} &= \frac{D(N-2)}{N-1} + \frac{2(N-2)}{(N-1)N} \\
&\;\;\vdots \\
\frac{D(2)}{3} &= \frac{D(1)}{2} + \frac{2 \cdot 1}{2 \cdot 3} \\
\frac{D(N)}{N+1} &= \frac{D(1)}{2} + 2\sum_{i=2}^{N} \frac{i-1}{i(i+1)} && \text{(sum the equations side by side)}
\end{aligned}$$

slide-19
SLIDE 19


Derivation of D(N) - 2

$$\begin{aligned}
\frac{D(N)}{N+1} &= \frac{D(1)}{2} + 2\sum_{i=2}^{N} \frac{i-1}{i(i+1)} \\
\frac{D(N)}{N+1} &= 2\sum_{i=2}^{N} \frac{1}{i+1} - 2\sum_{i=2}^{N} \frac{1}{i(i+1)} \\
\frac{D(N)}{N+1} &= 2\left(\frac{1}{3} + \frac{1}{4} + \dots + \frac{1}{N} + \frac{1}{N+1}\right) - 2\sum_{i=2}^{N}\left(\frac{1}{i} - \frac{1}{i+1}\right) \\
\frac{D(N)}{N+1} &= 2\left(\left(\frac{1}{3} + \frac{1}{4} + \dots + \frac{1}{N}\right) + \frac{1}{N+1}\right) - 2\left(\frac{1}{2} - \frac{1}{N+1}\right) \\
\frac{D(N)}{N+1} &\approx 2\left(\left(\log_e N - \frac{3}{2}\right) + \frac{1}{N+1}\right) - 2\left(\frac{1}{2} - \frac{1}{N+1}\right) \\
\frac{D(N)}{N+1} &\approx 2\left(\log_e N - \frac{3}{2}\right) + \frac{2}{N+1} - 1 + \frac{2}{N+1} \\
\frac{D(N)}{N+1} &\approx 2\log_e N - 4 + \frac{4}{N+1} \\
D(N) &\approx 2(N+1)\log_e N - 4(N+1) + 4 \\
D(N) &= O(N \log_e N) \\
D(N) &= O\!\left(\frac{N \log_2 N}{\log_2 e}\right) \\
D(N) &= O(N \log N)
\end{aligned}$$

slide-20
SLIDE 20


Average-Case Analysis - III

  • As an example, the randomly generated 500-node BST has nodes at an expected depth of 9.98.
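A comparable experiment is easy to run. The sketch below builds a BST from a random permutation of 1..N and reports the average node depth (internal path length divided by N). The seed, the class name `RandomBstDepth`, and the helper names are assumptions for illustration; a different seed gives a different tree, so the result will not match 9.98 exactly.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Sketch: average node depth of a BST built from a random insertion order.
// Names and the seed are hypothetical.
class RandomBstDepth {
    static class BinaryNode {
        int element;
        BinaryNode left, right;

        BinaryNode(int element) {
            this.element = element;
        }
    }

    static BinaryNode insert(int x, BinaryNode t) {
        if (t == null) return new BinaryNode(x);
        if (x < t.element) t.left = insert(x, t.left);
        else if (x > t.element) t.right = insert(x, t.right);
        return t;
    }

    // Sum of the depths of all nodes in the subtree; the root is at the given depth.
    static long internalPathLength(BinaryNode t, int depth) {
        if (t == null) return 0;
        return depth
                + internalPathLength(t.left, depth + 1)
                + internalPathLength(t.right, depth + 1);
    }

    static double averageDepth(int n, long seed) {
        List<Integer> keys = new ArrayList<>();
        for (int i = 1; i <= n; i++) keys.add(i);
        Collections.shuffle(keys, new Random(seed));   // random insertion sequence
        BinaryNode root = null;
        for (int key : keys) root = insert(key, root);
        return (double) internalPathLength(root, 0) / n;
    }

    public static void main(String[] args) {
        System.out.println(averageDepth(500, 42L));
    }
}
```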

slide-21
SLIDE 21


Average-Case Analysis - IV

  • After a quarter-million random insert/remove pairs, the right-heavy tree on the previous slide looks decidedly unbalanced, and the average depth becomes 12.51.
  • The deletion algorithm described favors making left subtrees deeper than the right (a deleted node is replaced with a node from the right). The exact effect of this is still unknown, but if insertions and deletions are alternated Θ(N²) times, the expected depth is Θ(√N).
  • In the absence of deletions, or when lazy deletion is used, average running times for BST operations are O(log N).
slide-22
SLIDE 22

Homework Assignments

  • 4.9a, 4.9b, 4.16, 4.37, 4.48
  • You are requested to study and solve the exercises. Note that these are for you to practice only. You are not to deliver the results to me.
