CS 1501: Searching (www.cs.pitt.edu/~nlf4/cs1501/)



SLIDE 1
CS 1501: Searching

www.cs.pitt.edu/~nlf4/cs1501/

SLIDE 2
Review: Searching through a collection

  • Given a collection of keys C, how do we search for a given key k?
    ○ Store the collection in an array
      ■ Unsorted
      ■ Sorted
    ○ Linked list
      ■ Unsorted
      ■ Sorted
    ○ Binary search tree
  • Differences?
  • Runtimes?
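As a reminder of how the array options differ, here is a minimal sketch in Java. The class and method names are made up for illustration and do not come from the course code. An unsorted array forces a linear scan, while a sorted array allows binary search.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of the course code.
class ArraySearchSketch {
    // Unsorted array: every element may need to be checked, O(n) comparisons.
    static int linearSearch(int[] keys, int k) {
        for (int i = 0; i < keys.length; i++) {
            if (keys[i] == k) return i;
        }
        return -1; // k is not in the collection
    }

    // Sorted array: halve the remaining range each step, O(log n) comparisons.
    static int binarySearch(int[] sorted, int k) {
        int lo = 0, hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            if (sorted[mid] == k) return mid;
            if (sorted[mid] < k) lo = mid + 1;
            else hi = mid - 1;
        }
        return -1; // k is not in the collection
    }

    public static void main(String[] args) {
        int[] unsorted = {4, 3, 2, 6, 5};
        int[] sorted = unsorted.clone();
        Arrays.sort(sorted);
        System.out.println(linearSearch(unsorted, 6));
        System.out.println(binarySearch(sorted, 6));
        System.out.println(binarySearch(sorted, 7));
    }
}
```

A linked list supports only the linear scan, even when sorted, because it has no constant-time access to the middle element.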

SLIDE 3
Symbol tables

  • Abstract structures that link keys to values
    ○ The key is used to search the data structure for a value
    ○ Described as a class in the text, but it is probably more accurate to think of the concept of a symbol table in general as an interface
      ■ Key functions:
        • put()
        • contains()
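One way to picture the "symbol table as an interface" idea is the sketch below. The names SymbolTable and MapSymbolTable are hypothetical, and get() is an assumed addition alongside the put() and contains() listed on the slide. The implementation simply delegates to java.util.HashMap to show that any structure can sit behind the interface.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface name; the slide lists put() and contains(),
// and get() is added here as an assumption.
interface SymbolTable<K, V> {
    void put(K key, V value);
    V get(K key);
    boolean contains(K key);
}

// Simplest possible implementation: delegate to java.util.HashMap.
class MapSymbolTable<K, V> implements SymbolTable<K, V> {
    private final Map<K, V> map = new HashMap<>();

    public void put(K key, V value) { map.put(key, value); }

    public V get(K key) { return map.get(key); } // null if the key is absent

    public boolean contains(K key) { return map.containsKey(key); }
}
```

The sorted-array, BST, and trie structures in the rest of the lecture could each be placed behind the same three methods.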

SLIDE 4
A closer look

  • BinarySearchST.java and BST.java present symbol tables based on sorted arrays and binary search trees, respectively
  • Can we do better than these?
  • Both methods depend on comparisons against other keys
    ○ I.e., k is compared against other keys in the data structure
  • 4 options at each node in a BST:
    ○ Node ref is null: k not found
    ○ k is equal to the current node's key: k is found
    ○ k is less than the current key: continue to left child
    ○ k is greater than the current key: continue to right child
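The four options can be sketched as a short loop. This is an illustrative BST over int keys with made-up names, not the BST.java from the text.

```java
// Hypothetical illustration class; not the BST.java from the text.
class BstSearchSketch {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Standard unbalanced BST insert; duplicates are ignored.
    static Node insert(Node node, int k) {
        if (node == null) return new Node(k);
        if (k < node.key) node.left = insert(node.left, k);
        else if (k > node.key) node.right = insert(node.right, k);
        return node;
    }

    static boolean contains(Node node, int k) {
        while (true) {
            if (node == null) return false;   // option 1: null ref, k not found
            if (k == node.key) return true;   // option 2: equal, k is found
            if (k < node.key) node = node.left;   // option 3: go left
            else node = node.right;               // option 4: go right
        }
    }
}
```

Every step compares k against a full key stored in the tree, which is the dependency the next slides try to remove.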

SLIDE 5
Digital Search Trees (DSTs)

  • Instead of looking at less than/greater than, let's go left or right based on the bits of the key, so we again have 4 options:
    ○ Node ref is null: k not found
    ○ k is equal to the current node's key: k is found
    ○ Current bit of k is 0: continue to left child
    ○ Current bit of k is 1: continue to right child
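A minimal DST sketch follows. It assumes fixed-width 4-bit keys read from the most significant bit first; the width, the bit order, and all names are assumptions made for illustration.

```java
// Hypothetical illustration class; BITS and the MSB-first order are assumptions.
class DstSketch {
    static final int BITS = 4; // assumed fixed key width

    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    Node root;

    // Bit d of key, counting from the most significant of the BITS bits.
    static int bit(int key, int d) {
        return (key >> (BITS - 1 - d)) & 1;
    }

    void insert(int k) {
        if (root == null) { root = new Node(k); return; }
        Node cur = root;
        for (int d = 0; d < BITS; d++) {
            if (cur.key == k) return; // already present
            int b = bit(k, d);
            Node next = (b == 0) ? cur.left : cur.right;
            if (next == null) {
                // First empty spot along k's bit path: the key lives here.
                if (b == 0) cur.left = new Node(k);
                else cur.right = new Node(k);
                return;
            }
            cur = next;
        }
    }

    boolean contains(int k) {
        Node cur = root;
        for (int d = 0; cur != null; d++) {
            if (cur.key == k) return true;  // full-key comparison at every node
            if (d == BITS) break;           // no bits left to branch on
            cur = (bit(k, d) == 0) ? cur.left : cur.right;
        }
        return false;
    }
}
```

The branching decision uses a single bit, but each visited node still costs a comparison against the full key.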

SLIDE 6
DST example

Insert: 4 (0100), 3 (0011), 2 (0010), 6 (0110), 5 (0101)
Search: 3 (0011), 7 (0111)

[Figure: the DST built from these insertions, with nodes 4, 3, 6, 2, and 5]

SLIDE 7
Analysis of digital search trees

  • Runtime?
  • We end up doing many comparisons against the full key. Can we improve on this?

SLIDE 8
Radix search tries (RSTs)

  • Trie as in retrieve, pronounced the same as "try"
  • Instead of storing keys as nodes in the tree, we store them implicitly as paths down the tree
    ○ Interior nodes of the tree only serve to direct us according to the bitstring of the key
    ○ Values can then be stored at the end of the key's bitstring path
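A minimal RST sketch, again assuming fixed-width 4-bit keys read most significant bit first (the width, bit order, and names are assumptions). Nodes hold no keys at all; a key is present only if the node at the end of its bit path holds a value.

```java
// Hypothetical illustration class; BITS and the MSB-first order are assumptions.
class RstSketch {
    static final int BITS = 4; // assumed fixed key width

    static class Node {
        Object val;                // non-null only at the end of a key's path
        Node[] next = new Node[2]; // next[0] for bit 0, next[1] for bit 1
    }

    Node root = new Node();

    static int bit(int key, int d) {
        return (key >> (BITS - 1 - d)) & 1;
    }

    void put(int key, Object val) {
        Node cur = root;
        for (int d = 0; d < BITS; d++) {
            int b = bit(key, d);
            if (cur.next[b] == null) cur.next[b] = new Node();
            cur = cur.next[b];
        }
        cur.val = val; // the value sits at the end of the bit path
    }

    Object get(int key) {
        Node cur = root;
        for (int d = 0; d < BITS && cur != null; d++) {
            cur = cur.next[bit(key, d)]; // no key comparison, just follow the bit
        }
        return cur == null ? null : cur.val;
    }
}
```

A search now inspects at most BITS references and never compares whole keys.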

SLIDE 9
RST example

Insert: 4 (0100), 3 (0011), 2 (0010), 6 (0110), 5 (0101)
Search: 3 (0011), 7 (0111)

[Figure: the resulting RST; V marks a value stored at the end of a key's bit path]

SLIDE 10
RST analysis

  • Runtime?
  • Would this structure work as well for other key data types?
    ○ Characters?
    ○ Strings?

SLIDE 11
Larger branching factor tries

  • In our binary-based radix search trie, we considered one bit at a time
  • What if we applied the same method to characters in a string?
    ○ What would this new structure look like?
  • Let's try inserting the following strings into a trie:
    ○ she, sells, sea, shells, by, the, sea, shore

SLIDE 12
Another trie example

[Figure: character-based trie for the strings from the previous slide]

SLIDE 13
Implementation Concerns

  • See TrieST.java
    ○ Implements an R-way trie
  • Basic node object:

    private static class Node {
        private Object val;
        private Node[] next = new Node[R];
    }

    Where R is the branching factor

  • Non-null val means we have traversed to a valid key
  • Again, note that keys are not directly stored in the trie at all
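Building on the node object above, put and get for an R-way trie might look roughly like this. It is a sketch in the spirit of the text's implementation, not a copy of it; R = 256 and the class name are assumptions, and keys are assumed to contain only characters below R.

```java
// Hypothetical illustration class; R = 256 (extended ASCII) is an assumption.
class RWayTrieSketch {
    static final int R = 256; // branching factor

    private static class Node {
        private Object val;
        private Node[] next = new Node[R];
    }

    private Node root;

    void put(String key, Object val) {
        root = put(root, key, val, 0);
    }

    // Walk (and create) the path for key one character at a time.
    private Node put(Node x, String key, Object val, int d) {
        if (x == null) x = new Node();
        if (d == key.length()) { x.val = val; return x; }
        char c = key.charAt(d); // assumed to be < R
        x.next[c] = put(x.next[c], key, val, d + 1);
        return x;
    }

    Object get(String key) {
        Node x = root;
        for (int d = 0; x != null && d < key.length(); d++) {
            x = x.next[key.charAt(d)];
        }
        // A non-null val means the path spells a valid key.
        return x == null ? null : x.val;
    }
}
```

A lookup for "sh" after inserting "she" reaches a node, but its val is null, so "sh" is correctly reported as absent.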

SLIDE 14
R-way trie example

[Figure: R-way trie nodes, each with a Val field and a Next array indexed by the letters A through Z]

SLIDE 15
Analysis

  • Runtime?

SLIDE 16
Further analysis

  • Miss times
    ○ Require an average of log_R(n) nodes to be examined
      ■ Where R is the size of the alphabet being considered
      ■ Proof in Proposition H of Section 5.2 of the text
    ○ Average # of checks with 2^20 keys in an RST?
    ○ With 2^20 keys in a large branching factor trie, assuming 8 bits at a time?
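Reading the key count as 2^20 (the exponent appears to have been flattened in extraction) and plugging into the log_R(n) estimate, the two questions work out as follows. An RST has R = 2, and consuming 8 bits at a time gives R = 2^8.

```latex
\log_{2}\left(2^{20}\right) = 20
\qquad\qquad
\log_{2^{8}}\left(2^{20}\right) = \frac{20}{8} = 2.5
```

So a search miss examines about 20 nodes on average in the RST, versus about 2.5 in the 256-way trie.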

SLIDE 17
So what's the catch?

  • Space!
    ○ Considering 8-bit ASCII, each node contains 2^8 references!
    ○ This is especially problematic as, in many cases, a lot of this space is wasted
      ■ Common paths or prefixes, for example: if all keys begin with "key", that's 255*3 wasted references!
      ■ At the lower levels of the trie, most keys have probably been separated out and reference lists will be sparse

SLIDE 18
De La Briandais tries (DLBs)

  • Replace the .next array of the R-way trie with a linked list
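One possible shape for a DLB is sketched below. Each node holds one character, a sibling reference (the next alternative at the same level), and a child reference (the start of the next level's list). The field names, the '^' terminator character, and the choice to store the value on the terminator node are assumptions made for illustration; keys are assumed never to contain '^'.

```java
// Hypothetical illustration class; names and the '^' terminator are assumptions.
class DlbSketch {
    private static final char END = '^'; // marks the end of a key

    private static class Node {
        char c;
        Object val;    // set only on terminator nodes
        Node sibling;  // next alternative character at this level
        Node child;    // first node of the next level's list
        Node(char c) { this.c = c; }
    }

    private Node root; // head of the top-level sibling list

    void put(String key, Object val) {
        String s = key + END;
        Node parent = null;
        for (int d = 0; d < s.length(); d++) {
            char c = s.charAt(d);
            Node n = (parent == null) ? root : parent.child;
            Node last = null;
            // Linear scan of the sibling list replaces the array index.
            while (n != null && n.c != c) { last = n; n = n.sibling; }
            if (n == null) {
                n = new Node(c);
                if (last != null) last.sibling = n;   // append to the list
                else if (parent == null) root = n;    // first node overall
                else parent.child = n;                // first node at this level
            }
            parent = n;
        }
        parent.val = val; // parent is now the terminator node
    }

    Object get(String key) {
        String s = key + END;
        Node head = root, n = null;
        for (int d = 0; d < s.length(); d++) {
            n = head;
            while (n != null && n.c != s.charAt(d)) n = n.sibling;
            if (n == null) return null; // character not present at this level
            head = n.child;
        }
        return n.val;
    }
}
```

Each level now costs a list scan of up to R nodes instead of one array access, but a node only allocates space for characters that actually occur.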

SLIDE 19
DLB trie example

[Figure: DLB trie built from linked nodes (characters S, H, E, A), each with Val and Next fields]

SLIDE 20
Another DLB Example

[Figure: DLB for the keys she, sells, sea, shells, by, and the, with ^ marking the end of each key]

SLIDE 21
DLB analysis

  • How does DLB performance differ from R-way tries?
  • Which should you use?

SLIDE 22
Searching

  • So far we've assumed each search would only look for the presence of a whole key
  • What if we wanted to know whether our search term was a prefix of a valid key?
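A trie answers the prefix question almost for free: after following the search term's path, the node reached tells us whether the term is a key (it is marked) and whether it is a prefix of a longer key (it has at least one child). The sketch below is an assumption-laden illustration; the Result enum, the class name, and R = 256 are all made up for the example.

```java
// Hypothetical illustration class; Result, the names, and R = 256 are assumptions.
class PrefixTrieSketch {
    static final int R = 256;

    enum Result { NOT_FOUND, PREFIX_ONLY, KEY_ONLY, KEY_AND_PREFIX }

    private static class Node {
        boolean isKey;
        Node[] next = new Node[R];
    }

    private final Node root = new Node();

    void add(String key) {
        Node x = root;
        for (int d = 0; d < key.length(); d++) {
            char c = key.charAt(d); // assumed to be < R
            if (x.next[c] == null) x.next[c] = new Node();
            x = x.next[c];
        }
        x.isKey = true;
    }

    Result search(String s) {
        Node x = root;
        for (int d = 0; d < s.length(); d++) {
            x = x.next[s.charAt(d)];
            if (x == null) return Result.NOT_FOUND; // fell off the trie
        }
        // With no deletions, any child means some longer key continues from here.
        boolean hasChild = false;
        for (Node child : x.next) {
            if (child != null) { hasChild = true; break; }
        }
        if (x.isKey) return hasChild ? Result.KEY_AND_PREFIX : Result.KEY_ONLY;
        return hasChild ? Result.PREFIX_ONLY : Result.NOT_FOUND;
    }
}
```

With she, shells, and sea inserted, "sh" is a prefix only, "she" is both a key and a prefix, and "sea" is a key only.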

SLIDE 23
Final notes

  • This lecture does not present an exhaustive look at search trees/tries, just the sampling that we're going to focus on
  • Many variations on these techniques exist and perform quite well in different circumstances
    ○ Red/black BSTs
    ○ Ternary search tries
    ○ R-way tries without 1-way branching
  • See the table at the end of Section 5.2 of the text