CPSC 221: Data Structures Dictionary ADT Binary Search Trees Alan - - PowerPoint PPT Presentation



slide-1
SLIDE 1

CPSC 221: Data Structures Dictionary ADT Binary Search Trees

Alan J. Hu (Using Steve Wolfman’s Slides)

slide-2
SLIDE 2

Learning Goals

After this unit, you should be able to...

  • Determine if a given tree is an instance of a particular type (e.g. binary search tree, heap, etc.)
  • Describe and use pre-, in- and post-order traversal algorithms
  • Describe the properties of binary trees, binary search trees, and more general trees; implement iterative and recursive algorithms for navigating them in C++
  • Compare and contrast ordered versus unordered trees in terms of complexity and scope of application
  • Insert and delete elements from a binary tree
slide-3
SLIDE 3

Today’s Outline

  • Binary Trees
  • Dictionary ADT
  • Binary Search Trees
  • Deletion
  • Some troubling questions
slide-4
SLIDE 4

Binary Trees

  • Binary tree is (recursive definition!)
    – an empty tree (NULL, in our case)
    – or, a root node with two subtrees
  • Properties
    – max # of leaves:
    – max # of nodes:
  • Representation: each node holds data, a left pointer, and a right pointer

[Figure: example binary tree with nodes A through J]

slide-5
SLIDE 5

Binary Trees

  • Binary tree is (recursive definition!)
    – an empty tree (NULL, in our case)
    – or, a root node with two subtrees
  • Properties (for a tree of height h)
    – max # of leaves: 2^h
    – max # of nodes: 2^(h+1) - 1
  • Representation: each node holds data, a left pointer, and a right pointer

[Figure: example binary tree with nodes A through J]
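The two bounds can be checked by building a tree in which every level is full and counting. The following is a minimal sketch, not code from the slides: the names `buildPerfect`, `countNodes`, `countLeaves`, and `destroy` are made up for illustration, and height is assumed to count edges, so a single node has height 0.

```cpp
#include <cstddef>

// Minimal node for counting purposes (no key or data needed here).
struct Node {
    Node* left = nullptr;
    Node* right = nullptr;
};

// Build a tree of height h with every level full.
// Height counts edges: h = 0 is a single node, h < 0 is the empty tree.
Node* buildPerfect(int h) {
    if (h < 0) return nullptr;
    Node* n = new Node;
    n->left = buildPerfect(h - 1);
    n->right = buildPerfect(h - 1);
    return n;
}

// Total number of nodes in the tree.
std::size_t countNodes(const Node* n) {
    if (n == nullptr) return 0;
    return 1 + countNodes(n->left) + countNodes(n->right);
}

// Number of nodes with no children.
std::size_t countLeaves(const Node* n) {
    if (n == nullptr) return 0;
    if (n->left == nullptr && n->right == nullptr) return 1;
    return countLeaves(n->left) + countLeaves(n->right);
}

// Free every node (post-order, so children go before their parent).
void destroy(Node* n) {
    if (n == nullptr) return;
    destroy(n->left);
    destroy(n->right);
    delete n;
}
```

For h = 3 the counts come out to 8 leaves and 15 nodes, matching 2^3 and 2^4 - 1.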

slide-6
SLIDE 6

Representation

[Figure: the tree with nodes A through F drawn as linked nodes; each node stores its data plus a left pointer and a right pointer]

struct Node {
  KTYPE key;
  DTYPE data;
  Node * left;
  Node * right;
};

slide-7
SLIDE 7

Today’s Outline

  • Binary Trees
  • Dictionary ADT
  • Binary Search Trees
  • Deletion
  • Some troubling questions
slide-8
SLIDE 8

What We Can Do So Far

  • Stack
    – Push
    – Pop
  • Queue
    – Enqueue
    – Dequeue
  • List
    – Insert
    – Remove
    – Find
  • Priority Queue
    – Insert
    – DeleteMin

What’s wrong with Lists?

slide-9
SLIDE 9

Dictionary ADT

  • Dictionary operations
    – create
    – destroy
    – insert
    – find
    – delete
  • Stores values associated with user-specified keys
    – values may be any (homogeneous) type
    – keys may be any (homogeneous) comparable type

Example contents:
  • midterm
    – would be tastier with brownies
  • prog-project
    – so painful… who invented templates?
  • wolf
    – the perfect mix of oomph and Scrabble value

insert:
  • brownies
    – tasty

find(wolf) returns:
  • wolf
    – the perfect mix of oomph and Scrabble value
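To make the insert, find, and delete operations concrete, here is a small sketch that uses `std::map` from the C++ standard library as a stand-in dictionary. This is an assumption for illustration only; the slides go on to build their own structure. The wrapper names `dictInsert`, `dictFind`, and `dictRemove` are hypothetical.

```cpp
#include <map>
#include <string>

// A dictionary from string keys to string values.
// std::map keeps its keys ordered, so the key type must be comparable.
using Dictionary = std::map<std::string, std::string>;

// insert: store value under key, replacing any earlier value.
void dictInsert(Dictionary& d, const std::string& key, const std::string& value) {
    d[key] = value;
}

// find: pointer to the stored value, or nullptr if the key is absent.
const std::string* dictFind(const Dictionary& d, const std::string& key) {
    auto it = d.find(key);
    return it == d.end() ? nullptr : &it->second;
}

// delete: remove the key if present; true if something was removed.
bool dictRemove(Dictionary& d, const std::string& key) {
    return d.erase(key) > 0;
}
```

Create and destroy correspond to constructing a `Dictionary` object and letting it go out of scope.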

slide-10
SLIDE 10

Search/Set ADT

  • Dictionary operations
    – create
    – destroy
    – insert
    – find
    – delete
  • Stores keys
    – keys may be any (homogeneous) comparable type
    – quickly tests for membership

Example contents:
  • Berner
  • Whippet
  • Alsatian
  • Sarplaninac
  • Beardie
  • Sarloos
  • Malamute
  • Poodle

insert:
  • Min Pin

find(Wolf) returns: NOT FOUND

slide-11
SLIDE 11

A Modest Few Uses

  • Arrays and “Associative” Arrays
  • Sets
  • Dictionaries
  • Router tables
  • Page tables
  • Symbol tables
  • C++ Structures
  • Python’s __dict__ that stores fields/methods
slide-12
SLIDE 12

Desiderata

  • Fast insertion
    – runtime:
  • Fast searching
    – runtime:
  • Fast deletion
    – runtime:

slide-13
SLIDE 13

Naïve Implementations

                     insert    delete    find
  • Linked list
  • Unsorted array
  • Sorted array

slide-14
SLIDE 14

Naïve Implementations

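                     insert    delete    find
  • Linked list      O(1)      O(n)      O(n)
  • Unsorted array   O(1)      O(n)      O(n)
  • Sorted array     O(n)      O(n)      O(lg n)   so close!

(Standard worst-case bounds; the O(1) inserts assume no check for an existing key, and delete includes the cost of finding the element.)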

slide-15
SLIDE 15

Today’s Outline

  • Binary Trees
  • Dictionary ADT
  • Binary Search Trees
  • Deletion
  • Some troubling questions
slide-16
SLIDE 16

Binary Search Tree Dictionary Data Structure

[Figure: example binary search tree with keys 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

  • Binary tree property
    – each node has ≤ 2 children
    – result:
      • storage is small
      • operations are simple
      • average depth is small
  • Search tree property
    – all keys in left subtree smaller than root’s key
    – all keys in right subtree larger than root’s key
    – result:
      • easy to find any given key
slide-17
SLIDE 17

Example and Counter-Example

[Figure: two example trees, one labeled BINARY SEARCH TREE and the other labeled NOT A BINARY SEARCH TREE]
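One way to test the search tree property mechanically is to carry a lower and an upper bound down the tree, since a key must respect every ancestor and not only its parent. The sketch below assumes `int` keys and no duplicates; `isBST` and `isBSTInRange` are names invented for this example.

```cpp
struct Node {
    int key;
    Node* left;
    Node* right;
};

// True if every key in the subtree lies strictly between *lo and *hi.
// A null bound means "no limit" on that side.
bool isBSTInRange(const Node* n, const int* lo, const int* hi) {
    if (n == nullptr) return true;                     // empty tree is a BST
    if (lo != nullptr && n->key <= *lo) return false;  // too small for an ancestor
    if (hi != nullptr && n->key >= *hi) return false;  // too large for an ancestor
    // Left subtree must stay below this key, right subtree above it.
    return isBSTInRange(n->left, lo, &n->key) &&
           isBSTInRange(n->right, &n->key, hi);
}

bool isBST(const Node* root) {
    return isBSTInRange(root, nullptr, nullptr);
}
```

Comparing each node only with its own children is not enough: a tree with root 5, left child 3, and 7 as the right child of 3 passes every local comparison but is not a binary search tree, because 7 sits in the left subtree of 5.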

slide-18
SLIDE 18

In Order Listing

[Figure: example BST with keys 2, 5, 7, 9, 10, 15, 17, 20, 30]

In order listing: 2 5 7 9 10 15 17 20 30

struct Node {
  // constructors omitted
  KTYPE key;
  DTYPE data;
  Node *left, *right;
};

slide-19
SLIDE 19

Aside: Traversals

  • Pre-Order Traversal: Process the data at the node first, then process left child, then process right child.
  • Post-Order Traversal: Process left child, then process right child, then process data at the node.
  • In-Order Traversal: Process left child, then process data at the node, then process right child.

Code?
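A possible answer, as a sketch rather than the course's official code: each traversal is the same three steps in a different order. Here "processing" a node is assumed to mean appending its key to a vector, and keys are assumed to be `int`.

```cpp
#include <vector>

struct Node {
    int key;
    Node* left;
    Node* right;
};

// Node first, then left subtree, then right subtree.
void preOrder(const Node* n, std::vector<int>& out) {
    if (n == nullptr) return;
    out.push_back(n->key);
    preOrder(n->left, out);
    preOrder(n->right, out);
}

// Left subtree, then node, then right subtree.
void inOrder(const Node* n, std::vector<int>& out) {
    if (n == nullptr) return;
    inOrder(n->left, out);
    out.push_back(n->key);
    inOrder(n->right, out);
}

// Left subtree, then right subtree, then node last.
void postOrder(const Node* n, std::vector<int>& out) {
    if (n == nullptr) return;
    postOrder(n->left, out);
    postOrder(n->right, out);
    out.push_back(n->key);
}
```

On a binary search tree, `inOrder` produces the keys in sorted order, which is exactly the "in order listing" idea.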


slide-20
SLIDE 20

Aside: Traversals

  • Pre-Order Traversal: Process the data at the node first, then process left child, then process right child.
  • Post-Order Traversal: Process left child, then process right child, then process data at the node.
  • In-Order Traversal: Process left child, then process data at the node, then process right child.

Who cares? These are the most common ways in which code processes trees.


slide-21
SLIDE 21

Finding a Node

Node *& find(Comparable key, Node *& root) {
  if (root == NULL)
    return root;
  else if (key < root->key)
    return find(key, root->left);
  else if (key > root->key)
    return find(key, root->right);
  else
    return root;
}

[Figure: example BST with keys 2, 5, 7, 9, 10, 15, 17, 20, 30]

runtime:
  a. O(1)
  b. O(lg n)
  c. O(n)
  d. O(n lg n)
  e. None of these

slide-22
SLIDE 22

Finding a Node

Node *& find(Comparable key, Node *& root) {
  if (root == NULL)
    return root;
  else if (key < root->key)
    return find(key, root->left);
  else if (key > root->key)
    return find(key, root->right);
  else
    return root;
}

[Figure: example BST with keys 2, 5, 7, 9, 10, 15, 17, 20, 30]

WARNING: Much fancy footwork with refs (&) coming. You can do all of this without refs... just watch out for special cases.

slide-23
SLIDE 23

Iterative Find

Node * find(Comparable key, Node * root) {
  while (root != NULL && root->key != key) {
    if (key < root->key)
      root = root->left;
    else
      root = root->right;
  }
  return root;
}

Look familiar?

[Figure: example BST with keys 2, 5, 7, 9, 10, 15, 17, 20, 30]

(It’s trickier to get the ref return to work here. We won’t worry.)

slide-24
SLIDE 24

Insert

[Figure: example BST with keys 2, 5, 7, 9, 10, 15, 17, 20, 30]

runtime:

void insert(Comparable key, Node *& root) {
  Node *& target(find(key, root));
  assert(target == NULL);
  target = new Node(key);
}

Funky game we can play with the *& version.
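The trick is that `find` returns a reference to the pointer slot where the key lives or would live, so `insert` only has to fill that slot. Below is a self-contained sketch of the pair with `int` keys standing in for `Comparable`; the `Node` constructor and the `destroy` helper are assumptions, since the slides omit them.

```cpp
#include <cassert>

struct Node {
    int key;
    Node* left;
    Node* right;
    explicit Node(int k) : key(k), left(nullptr), right(nullptr) {}
};

// Returns a reference to the pointer that holds key, or to the null
// pointer slot where key would be attached if it were inserted.
Node*& find(int key, Node*& root) {
    if (root == nullptr) return root;
    if (key < root->key) return find(key, root->left);
    if (key > root->key) return find(key, root->right);
    return root;
}

// Writing through the returned reference links the new node into the
// tree, whether the slot is the root pointer or some child pointer.
void insert(int key, Node*& root) {
    Node*& target = find(key, root);
    assert(target == nullptr);   // this sketch does not allow duplicate keys
    target = new Node(key);
}

// Free every node in the tree.
void destroy(Node* n) {
    if (n == nullptr) return;
    destroy(n->left);
    destroy(n->right);
    delete n;
}
```

Starting from `Node* root = nullptr;` and inserting 10, 5, 15, 2, 9, 20, 7, 17, 30 in that order gives one tree containing the nine example keys; no special case is needed for the empty tree because `root` itself is the first slot.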

slide-25
SLIDE 25

Reminder: Value vs. Reference Parameters

  • Value parameters (Object foo)
    – copies parameter
    – no side effects
  • Reference parameters (Object & foo)
    – shares parameter
    – can affect actual value
    – use when the value needs to be changed
  • Const reference parameters (const Object & foo)
    – shares parameter
    – cannot affect actual value
    – use when the value is too intricate for pass-by-value
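The three parameter kinds can be seen side by side in a short sketch. The function names below are invented for illustration, and `std::string` plays the role of `Object`.

```cpp
#include <cstddef>
#include <string>

// Value parameter: the function works on its own copy.
void appendByValue(std::string s) {
    s += "!";              // changes only the local copy
}

// Reference parameter: the function shares the caller's object.
void appendByReference(std::string& s) {
    s += "!";              // the caller sees this change
}

// Const reference parameter: shared without copying, but read-only.
std::size_t lengthOf(const std::string& s) {
    return s.size();       // writing s += "!" here would not compile
}
```

After `std::string w = "wolf";`, calling `appendByValue(w)` leaves `w` unchanged, while `appendByReference(w)` turns it into `"wolf!"`. The `Node *&` parameters in the tree code are the reference case applied to a pointer.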

slide-26
SLIDE 26

BuildTree for BSTs

  • Suppose the data 1, 2, 3, 4, 5, 6, 7, 8, 9 is inserted into an initially empty BST:
    – in order
    – in reverse order
    – median first, then left median, right median, etc.
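The effect of insertion order can be measured directly by building each tree and comparing heights. This is a sketch under stated assumptions: `int` keys, height counted in edges, and hypothetical helper names (`medianFirst`, `heightAfterInserting`). The median-first order here inserts a median, then the whole left half recursively, then the right half, which yields the same tree as going level by level.

```cpp
#include <algorithm>
#include <vector>

struct Node {
    int key;
    Node* left;
    Node* right;
    explicit Node(int k) : key(k), left(nullptr), right(nullptr) {}
};

// Plain recursive BST insert; equal keys are ignored in this sketch.
void insert(int key, Node*& root) {
    if (root == nullptr) { root = new Node(key); return; }
    if (key < root->key) insert(key, root->left);
    else if (key > root->key) insert(key, root->right);
}

// Height in edges; the empty tree has height -1.
int height(const Node* n) {
    if (n == nullptr) return -1;
    return 1 + std::max(height(n->left), height(n->right));
}

void destroy(Node* n) {
    if (n == nullptr) return;
    destroy(n->left);
    destroy(n->right);
    delete n;
}

// Append lo..hi to order: the median first, then each half the same way.
void medianFirst(int lo, int hi, std::vector<int>& order) {
    if (lo > hi) return;
    int mid = lo + (hi - lo) / 2;
    order.push_back(mid);
    medianFirst(lo, mid - 1, order);
    medianFirst(mid + 1, hi, order);
}

// Build a BST by inserting the keys in the given order; return its height.
int heightAfterInserting(const std::vector<int>& order) {
    Node* root = nullptr;
    for (int k : order) insert(k, root);
    int h = height(root);
    destroy(root);
    return h;
}
```

For the keys 1 through 9, ascending and descending order both produce a chain of height 8, while the median-first order produces a tree of height 3, the smallest possible for nine nodes.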

slide-27
SLIDE 27

Analysis of BuildTree

  • Worst case: O(n^2) as we’ve seen
  • Average case assuming all orderings equally likely turns out to be O(n lg n).

slide-28
SLIDE 28

Bonus: FindMin/FindMax

  • Find minimum
  • Find maximum

[Figure: example BST with keys 2, 5, 7, 9, 10, 15, 17, 20, 30]
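Because smaller keys always go left and larger keys always go right, the minimum is the leftmost node and the maximum is the rightmost. A minimal iterative sketch, assuming `int` keys and the invented names `findMin` and `findMax`:

```cpp
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Follow left pointers until there are none; nullptr for an empty tree.
Node* findMin(Node* root) {
    if (root == nullptr) return nullptr;
    while (root->left != nullptr) root = root->left;
    return root;
}

// Follow right pointers until there are none; nullptr for an empty tree.
Node* findMax(Node* root) {
    if (root == nullptr) return nullptr;
    while (root->right != nullptr) root = root->right;
    return root;
}
```

Both functions visit one node per level, so their cost is proportional to the height of the tree.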

slide-29
SLIDE 29

Double Bonus: Successor

Find the next larger node in this node’s subtree.

// min is defined first so that succ can call it.
Node *& min(Node *& root) {
  if (root->left == NULL)
    return root;
  else
    return min(root->left);
}

// Note: If no succ, returns (a useful) NULL.
Node *& succ(Node *& root) {
  if (root->right == NULL)
    return root->right;
  else
    return min(root->right);
}

[Figure: example BST with keys 2, 5, 7, 9, 10, 15, 17, 20, 30]

slide-30
SLIDE 30

More Double Bonus: Predecessor

Find the next smaller node in this node’s subtree.

// max is defined first so that pred can call it.
Node *& max(Node *& root) {
  if (root->right == NULL)
    return root;
  else
    return max(root->right);
}

Node *& pred(Node *& root) {
  if (root->left == NULL)
    return root->left;
  else
    return max(root->left);
}

[Figure: example BST with keys 2, 5, 7, 9, 10, 15, 17, 20, 30]

slide-31
SLIDE 31

Today’s Outline

  • Some Tree Review (here for reference, not discussed)

  • Binary Trees
  • Dictionary ADT
  • Binary Search Trees
  • Deletion
  • Some troubling questions
slide-32
SLIDE 32

Deletion

[Figure: example BST with keys 2, 5, 7, 9, 10, 15, 17, 20, 30]

Why might deletion be harder than insertion?

slide-33
SLIDE 33

Lazy Deletion (“Tombstones”)

  • Instead of physically deleting nodes, just mark them as deleted
    + simpler
    + physical deletions done in batches
    + some adds just flip deleted flag
    – extra memory for “tombstone”
    – many lazy deletions slow finds
    – some operations may have to be modified (e.g., min and max)

[Figure: example BST with keys 2, 5, 7, 9, 10, 15, 17, 20, 30]

slide-34
SLIDE 34

Lazy Deletion

[Figure: example BST with keys 2, 5, 7, 9, 10, 15, 17, 20, 30]

  • Delete(17)
  • Delete(15)
  • Delete(5)
  • Find(9)
  • Find(16)
  • Insert(5)
  • Find(17)
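The sequence of operations can be traced with a small tombstone implementation. This is a sketch and not the slides' code: the `deleted` flag, the helper `locate`, and the function names are assumptions, and keys are `int`.

```cpp
struct Node {
    int key;
    bool deleted;        // the "tombstone" flag
    Node* left;
    Node* right;
    explicit Node(int k)
        : key(k), deleted(false), left(nullptr), right(nullptr) {}
};

// Locate the node holding key, whether or not it is marked deleted.
Node* locate(int key, Node* root) {
    while (root != nullptr && root->key != key)
        root = (key < root->key) ? root->left : root->right;
    return root;
}

// A key counts as present only if its node exists and is not tombstoned.
bool find(int key, Node* root) {
    Node* n = locate(key, root);
    return n != nullptr && !n->deleted;
}

// Lazy deletion: leave the node in place and just set the flag.
void lazyDelete(int key, Node* root) {
    Node* n = locate(key, root);
    if (n != nullptr) n->deleted = true;
}

// Insert revives a tombstoned node instead of allocating a new one.
void insert(int key, Node*& root) {
    if (root == nullptr) { root = new Node(key); return; }
    if (key == root->key) { root->deleted = false; return; }
    insert(key, key < root->key ? root->left : root->right);
}

void destroy(Node* n) {
    if (n == nullptr) return;
    destroy(n->left);
    destroy(n->right);
    delete n;
}
```

With the nine example keys inserted, running the listed operations in order gives: Find(9) succeeds, Find(16) fails because 16 was never present, Insert(5) only flips the flag on the existing node, and Find(17) fails because 17 is still tombstoned. Note that searches still walk through tombstoned nodes such as 15, which is why many lazy deletions slow finds.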

slide-35
SLIDE 35

Real Deletion - Leaf Case

[Figure: example BST with keys 2, 5, 7, 9, 10, 15, 17, 20, 30]

Delete(17)

slide-36
SLIDE 36

Real Deletion - One Child Case

[Figure: example BST with keys 2, 5, 7, 9, 10, 15, 20, 30]

Delete(15)

slide-37
SLIDE 37

Real Deletion - Two Child Case

[Figure: example BST with keys 2, 5, 7, 9, 10, 20, 30]

Delete(5)

slide-38
SLIDE 38

Finally…

[Figure: resulting BST with keys 2, 7, 9, 10, 20, 30]

slide-39
SLIDE 39

Delete Code

// Named deleteKey because "delete" is a reserved word in C++.
void deleteKey(Comparable key, Node *& root) {
  Node *& handle(find(key, root));
  Node * toDelete = handle;
  if (handle != NULL) {
    if (handle->left == NULL) {         // Leaf or one child
      handle = handle->right;
    } else if (handle->right == NULL) { // One child
      handle = handle->left;
    } else {                            // Two child case
      Node *& successor(succ(handle));
      handle->key = successor->key;     // copy the key along with the data
      handle->data = successor->data;
      toDelete = successor;
      successor = successor->right;     // Succ has <= 1 child
    }
  }
  delete toDelete;
}

Refs make this short and “elegant”… but could be done without them with a bit more work.
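To see the three cases run end to end, here is a self-contained sketch of the reference-based approach with `int` keys and no separate data field. The names `deleteKey`, `minSlot`, and `destroy` are chosen for this example, and `insert` silently ignores duplicates; treat it as one possible completion rather than the course's reference solution.

```cpp
#include <vector>

struct Node {
    int key;
    Node* left;
    Node* right;
    explicit Node(int k) : key(k), left(nullptr), right(nullptr) {}
};

// Reference to the pointer slot that holds key, or would hold it.
Node*& find(int key, Node*& root) {
    if (root == nullptr) return root;
    if (key < root->key) return find(key, root->left);
    if (key > root->key) return find(key, root->right);
    return root;
}

void insert(int key, Node*& root) {
    Node*& target = find(key, root);
    if (target == nullptr) target = new Node(key);
}

// Slot holding the smallest key of a non-empty subtree.
Node*& minSlot(Node*& root) {
    if (root->left == nullptr) return root;
    return minSlot(root->left);
}

void deleteKey(int key, Node*& root) {
    Node*& handle = find(key, root);
    if (handle == nullptr) return;            // key not present
    Node* toDelete = handle;
    if (handle->left == nullptr) {            // leaf, or right child only
        handle = handle->right;
    } else if (handle->right == nullptr) {    // left child only
        handle = handle->left;
    } else {                                  // two children
        Node*& successor = minSlot(handle->right);
        handle->key = successor->key;         // move successor's key up
        toDelete = successor;
        successor = successor->right;         // successor has no left child
    }
    delete toDelete;
}

void inOrder(const Node* n, std::vector<int>& out) {
    if (n == nullptr) return;
    inOrder(n->left, out);
    out.push_back(n->key);
    inOrder(n->right, out);
}

void destroy(Node* n) {
    if (n == nullptr) return;
    destroy(n->left);
    destroy(n->right);
    delete n;
}
```

Inserting 10, 5, 15, 2, 9, 20, 7, 17, 30 and then deleting 17 (leaf), 15 (one child), and 5 (two children) leaves the keys 2, 7, 9, 10, 20, 30, with 7 having taken the place of 5.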

slide-40
SLIDE 40

Today’s Outline

  • Binary Trees
  • Dictionary ADT
  • Binary Search Trees
  • Deletion
  • Some troubling questions
slide-41
SLIDE 41

Thinking about Binary Search Trees

  • Observations
    – Each operation views two new elements at a time
    – Elements (even siblings) may be scattered in memory
    – Binary search trees are fast if they’re shallow
  • Realities
    – For large data sets, disk accesses dominate runtime
    – Some deep and some shallow BSTs exist for any data

One more piece of bad news: what happens to a balanced tree after many insertions/deletions?

slide-42
SLIDE 42

Solutions?

  • Reduce disk accesses?
  • Keep BSTs shallow?
slide-43
SLIDE 43

Coming Up

  • Self-balancing Binary Search Trees
  • Huge Search Tree Data Structure