

SLIDE 1

COMP 3403 — Algorithm Analysis — Part 3 — Chapters 6–7

Jim Diamond, CAR 409, Jodrey School of Computer Science, Acadia University

SLIDE 2

Chapter 6

Transform-and-Conquer


SLIDE 3

Chapter 6

Transform and Conquer

  • Idea: somehow transform the given instance, or any instance of a problem, to something simpler or something already solved
  • Three major variations:
    – instance simplification: transform a problem instance to a simpler or more convenient instance of the same problem
    – representation change: transform a problem instance to a different representation of the same instance
    – problem reduction: transform a problem instance to an instance of a different problem, for which a solution technique is already known
  • The concept of problem reduction is well known in various areas of mathematics, and figures heavily in the study of NP-completeness


SLIDE 4

Chapter 6

Presorting

  • Many problems involving lists are easier when the list is sorted
    – computing the median (selection problem)
    – checking if all elements are distinct (element uniqueness)
  • Also:
    – topological sorting helps to solve some problems on dags
    – presorting is used in many geometric algorithms
  • Note: if sorting is more expensive than another solution to the original problem, it makes little or no sense to do this transformation


SLIDE 5

Chapter 6

Presorting Example: Checking For Uniqueness

  • Suppose you have a list of numbers and want to confirm that all numbers are unique
    – the brute-force approach compares every pair of numbers; this is Θ(n²)

GEQ: what is the EXACT answer?

  • Instead, consider presorting the list
  • Time complexity?

    T_unique(n) = T_sort(n) + T_scan(n) ∈ Θ(n lg n) + Θ(n) = Θ(n lg n)

  • Other possibilities?
    – linear time?
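A minimal Python sketch of the presorting approach (all_distinct is my own name, not from the slides): sort, then do a single scan comparing adjacent elements.

  def all_distinct(a):
      """Presorting-based uniqueness test: Θ(n lg n) sort + Θ(n) scan."""
      b = sorted(a)
      return all(b[i] != b[i + 1] for i in range(len(b) - 1))

  print(all_distinct([3, 1, 4, 1, 5]))   # False: 1 appears twice
  print(all_distinct([3, 1, 4, 5]))      # True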


SLIDE 6

Chapter 6

Searching

  • Searching is a ubiquitous problem in computer science
  • Example: use grep to find a matching string in a text file or text stream
  • Example: use a web search engine to find 345,678 matches to a query
  • Example: in a database use a command like

select <stuff> from <some table> where <some condition>;

to ask the database to search for some data

  • Q: how much time does grep take to search through an input of n bytes?
    – linear in n: even if the search stops upon the first match, you still might need to process the entire file
  • While linear time is generally a good time complexity for a problem, in the case of searching that may not be acceptable


SLIDE 7

Chapter 6

Binary Trees

  • A binary tree is a tree in which each node has at most two children (the left child and the right child)


SLIDE 8

Chapter 6

Binary Search Trees (also called “ordered binary trees”)

  • A binary search tree is a binary tree in which
    – all nodes in the left subtree of a given node N have values less than N’s value
    – all nodes in the right subtree of a given node N have values greater than N’s value

[Figure: an example binary search tree: root 6 with children 3 and 8; 3 has children 1 and 5; 8 has child 9]

  • A binary search tree is said to be balanced if, for every pair of leaves, the lengths of the paths from the root to these leaves differ by at most 1
  • Q: how can we efficiently build a balanced search tree?
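A minimal Python sketch of these definitions (Node, insert and search are my own names): insert places values according to the BST ordering rule, and search follows a single root-to-leaf path.

  class Node:
      def __init__(self, value):
          self.value = value
          self.left = None    # values < self.value go here
          self.right = None   # values > self.value go here

  def insert(root, value):
      """Straightforward BST insertion (no balancing); duplicates ignored."""
      if root is None:
          return Node(value)
      if value < root.value:
          root.left = insert(root.left, value)
      elif value > root.value:
          root.right = insert(root.right, value)
      return root

  def search(root, value):
      """Each probe descends one level, so cost is at most the tree depth."""
      while root is not None and root.value != value:
          root = root.left if value < root.value else root.right
      return root is not None

  root = None
  for v in [6, 3, 8, 1, 5, 9]:   # builds the tree shown in the figure above
      root = insert(root, v)
  print(search(root, 5), search(root, 7))   # True False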


SLIDE 9

Chapter 6

Binary Search Trees: 2

  • If we use a straightforward technique for building a binary search tree,

some input sequences create “deep” trees – – 2 would be at the root, 3 its only child, 5 the only child of 3, and so on – with n items in this sequence, the maximum depth would be n − 1

  • Q: how many “probes” to find an item which is in the tree?

A: on average, about n/2

  • Q: how many “probes” to determine an item is not in the tree?

A: on average, about n/2

  • Not much better than an unsorted list!

  • Solution: balance the tree


SLIDE 10

Chapter 6

Balanced Search Trees

  • There are various ways of balancing a search tree
    – AVL trees: binary trees where, at every node, the heights of the two subtrees differ by at most 1
      – if an addition or deletion unbalances the tree, some rotations are done to re-balance the tree
    – red-black trees: binary trees where, at any node, the height of one subtree can be at most twice the height of the other
    – 2–3 trees: all leaves are at the same depth, but each internal node can have either 2 or 3 children


SLIDE 11

Chapter 6

2–3 Tree Node Insertion


SLIDE 12

Chapter 6

2–3 Tree: Analysis

  • Q: how good is a 2–3 tree? (That is, how quickly can we search?)
  • A: it depends on the exact shape of the tree
  • Consider the case of an n-key tree of height h where all internal nodes have degree 2; we must have

    n = 1 + 2 + 4 + · · · + 2^h = 2^(h+1) − 1

    Therefore h = log₂(n + 1) − 1

  • Now consider the case of an n-key tree of height h where all internal nodes have degree 3; we must have

    n = 2 + 6 + 18 + · · · + 2 · 3^h = 2(1 + 3 + 9 + · · · + 3^h) = 3^(h+1) − 1

    Therefore h = log₃(n + 1) − 1

  • These provide upper and lower bounds on the height of any 2–3 tree with n keys, so that

    log₃(n + 1) − 1 ≤ h ≤ log₂(n + 1) − 1

  • Therefore insertion, deletion and searching are all Θ(log n) in both the average case and the worst case


SLIDE 13

Chapter 6

Heaps

  • A priority queue is an abstract data type which supports the following operations:
    – find the largest element (“find max”)
    – delete the largest element (“delete max”)
    – insert a new element
  • A heap is a data structure which implements a priority queue
    – it is an implicit data structure — no pointers/links/etc. are used
  • A heap is a balanced binary tree in which the deepest leaves are all to the left of the other leaves
  • Note that the children of the node in array index n are in indices 2n and 2n + 1
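A quick Python illustration of that implicit layout, using a dummy slot at index 0 so the arithmetic stays 1-based (the heap values are made up):

  h = [None, 9, 5, 8, 1, 3, 7]     # h[0] unused; the heap is h[1..6]
  n = 2                            # any node index
  left, right = 2 * n, 2 * n + 1   # children of h[n], if they exist
  parent = n // 2                  # parent of h[n], for n > 1
  print(h[n], h[left], h[right])   # 5 1 3: both children are smaller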


SLIDE 14

Chapter 6

Heaps: 2

  • In a (max-)heap, the values stored in the children of a node N are less than (or equal to) the value stored in N
  • This is similar to a binary search tree
    – but in the case of a heap there is no specified relationship between the values in the two child nodes
  • Note that a left child can have a value greater or less than that of its sibling


SLIDE 15

Chapter 6

Heap Properties (Max Heap)

  • For any given n, there is only 1 binary tree which has the right shape to be a heap
  • The largest element in a heap is always at the root
  • For any node in a heap, that node with its left and right subtrees is also a heap
  • To represent a heap in an array, just write down the elements top to bottom, left to right


SLIDE 16

Chapter 6

Constructing a Heap (“Heapify”)

  • Insert the elements in the array (or the binary tree) in the order they are received
  • Then, starting at the bottom level of non-leaves and working up towards the root: if a parent’s value is not larger than both of its children’s values, swap the parent with the largest child, and repeat at the new position until the value sinks no further


SLIDE 17

Chapter 6

Heapify Pseudo-Code

HeapBottomUp(H[1..n])
// Construct a max-heap from the elements of the given array
// using the bottom-up algorithm.
// Input: an array H[1..n] of orderable items
// Output: a max-heap H[1..n]
for i = ⌊n/2⌋ downto 1 do        // i is the root of the current sub-tree
    k = i                        // index of the sub-tree being checked
    v = H[k]                     // save this node’s value
    heap = false
    while not heap and 2 * k <= n do
        j = 2 * k
        if j < n                 // if there are two children
            if H[j] < H[j + 1]   // pick the larger value
                j++
        if v >= H[j]             // is this node’s value bigger?
            heap = true          // if so, we are done
        else
            H[k] = H[j]          // else “bubble” the child up
            k = j
    H[k] = v                     // put the saved value in the empty slot


SLIDE 18

Chapter 6

Efficiency of Heapify

  • Assume the tree is full; i.e., n = 2^k − 1 for some k
    – define h = ⌊lg n⌋ to be the height of the tree
    – in this case, h = k − 1
  • In the worst case, every time we examine an internal node its value will need to be moved all the way down to a leaf
  • Thus a value at level i in the tree requires 2(h − i) comparisons: two per level, one to find the larger child and one to compare the value with it (note: the root is at level 0)
  • The total number of comparisons is therefore

    C_worst(n) = Σ_{i=0}^{h−1} Σ_{level-i keys} 2(h − i) = Σ_{i=0}^{h−1} 2(h − i) 2^i = 2(n − lg(n + 1))


SLIDE 19

Chapter 6

Heapsort Using A Max-Heap

  • We can sort an array A of n items as follows:
    – build a max-heap from A[1 .. n]
    – for i = n downto 2 do
      – swap A[1] with A[i]
      – fix the heap: repeatedly swap the new root value with its largest child until A[1 .. i − 1] is a valid heap
  • A[1 .. n] is now sorted in increasing order
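A runnable Python sketch of this procedure (0-based indices, so the children of index k are at 2k + 1 and 2k + 2; sift_down and heapsort are my own names):

  def sift_down(h, k, n):
      """Restore the max-heap property for the sub-tree rooted at k,
      assuming both of k's sub-trees are already heaps (heap is h[0..n-1])."""
      v = h[k]
      while 2 * k + 1 < n:                    # while k has a child
          j = 2 * k + 1                       # left child
          if j + 1 < n and h[j] < h[j + 1]:
              j += 1                          # pick the larger child
          if v >= h[j]:
              break                           # heap condition holds
          h[k] = h[j]                         # "bubble" the child up
          k = j
      h[k] = v

  def heapsort(a):
      n = len(a)
      for i in range(n // 2 - 1, -1, -1):     # stage 1: bottom-up heapify
          sift_down(a, i, n)
      for i in range(n - 1, 0, -1):           # stage 2: n - 1 root removals
          a[0], a[i] = a[i], a[0]             # move the max to the end
          sift_down(a, 0, i)                  # fix the heap a[0..i-1]

  a = [5, 1, 9, 3, 7]
  heapsort(a)
  print(a)   # [1, 3, 5, 7, 9]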


SLIDE 20

Chapter 6

Analysis of Heapsort

  • Let h = ⌊lg n⌋, where n is the number of items to be sorted
  • Stage 1: build a heap from the given list of n keys; worst case:

    C(n) = Σ_{i=0}^{h−1} 2(h − i) 2^i = 2(n − lg(n + 1)) ∈ Θ(n)

  • Stage 2: repeat the root-removal operation n − 1 times, fixing the heap each time; worst case:

    C(n) = Σ_{i=1}^{n−1} 2 lg i ∈ Θ(n lg n)

  • Both worst-case and average-case efficiency: Θ(n lg n)
  • In-place: yes
  • Stable: no (e.g., two equal keys 1 1 can end up in the opposite order)


SLIDE 21

Chapter 6

Using a Heap as a Priority Queue

  • The “find max” operation can be done in constant time
  • The heapsort algorithm uses the “delete max” operation, whose cost is Θ(lg n) for a heap of n elements
  • The “insert” operation can be done by inserting the new element in the first array location following the heap, and then (repeatedly) comparing it with its parent, until the heap condition is satisfied
    – this is also Θ(lg n) in the worst case (see the sketch after this list)
  • Other data structures can implement a priority queue
    – e.g., invent a data structure which allows insertion in O(1) time, but requires O(n) time for “find max” or “delete max”
    – e.g., invent a data structure which implements “find max” and “delete max” in O(1) time, but requires O(n) time for insertion
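A Python sketch of that insertion (“sift up”), with the same 1-based dummy-slot layout as before (pq_insert is my own name):

  def pq_insert(h, value):
      """Max-heap insert: append the new element in the first free slot,
      then repeatedly compare with the parent and swap while it is larger."""
      h.append(value)
      k = len(h) - 1                            # index of the new element
      while k > 1 and h[k // 2] < h[k]:
          h[k // 2], h[k] = h[k], h[k // 2]     # swap with smaller parent
          k //= 2                               # up one level: Θ(lg n) worst case

  h = [None, 9, 5, 8, 1, 3, 7]
  pq_insert(h, 10)
  print(h)   # [None, 10, 5, 9, 1, 3, 7, 8]: 10 sifted up to the root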

GEQs


SLIDE 22

Chapter 6

Problem Reduction

  • This variation of transform-and-conquer solves a problem P by transforming it into a different problem P′ for which an algorithm is already available
  • To be of practical value, the combined time of the transformation and of solving P′ should be smaller than solving P as given by another method
  • Example: suppose you want the least common multiple lcm(m, n) of two numbers
    – you could compute the prime factorization of the two numbers, compare the exponents, and figure out the lcm
    – e.g., lcm(24, 18): 24 = 2³ · 3¹, 18 = 2¹ · 3²
      – one number (24) has three 2’s
      – one number (18) has two 3’s
      – there are no other prime factors, so the LCM is 2³ · 3² = 72
    – instead, note that lcm(m, n) = m · n / gcd(m, n), and recall that we have a “nice” algorithm to compute gcd()
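In Python the reduction is one line (math.gcd implements Euclid’s algorithm):

  from math import gcd

  def lcm(m, n):
      """Problem reduction: compute lcm via gcd instead of factoring."""
      return m * n // gcd(m, n)

  print(lcm(24, 18))   # 72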


SLIDE 23

Chapter 6

Problem Reduction: Counting Paths in Graphs

  • Suppose we have a graph G and we wish to calculate the number of walks (“paths”, according to the textbook author) of length k > 0 from one vertex to another
  • It can be shown that if A is the adjacency matrix for G, then the number of length-k walks (“paths”) from vertex i to vertex j is given by (A^k)ᵢⱼ
  • We already know how to do matrix multiplication, so. . .
  • Q: is there a better algorithm?
    – recall that the usual mat-mult algorithm is Θ(n³)
    – Q′: what about high powers of A?
    – A′: can we use fast powering algorithms on matrices?

GEQ!
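A Python sketch combining the two ideas (mat_mult, mat_pow and the triangle-graph matrix are my own; repeated squaring uses Θ(lg k) matrix multiplications):

  def mat_mult(A, B):
      n = len(A)
      return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)]
              for i in range(n)]

  def mat_pow(A, k):
      """Fast (repeated-squaring) matrix powering."""
      n = len(A)
      R = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
      while k > 0:
          if k & 1:
              R = mat_mult(R, A)
          A = mat_mult(A, A)
          k >>= 1
      return R

  # Adjacency matrix of a triangle graph (made-up example);
  # mat_pow(A, k)[i][j] counts the length-k walks from i to j.
  A = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
  print(mat_pow(A, 3)[0][0])   # 2: the walks 0-1-2-0 and 0-2-1-0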


SLIDE 24

Chapter 6

Linear Programming: 1

  • Linear programming is one of the more important ideas in “industrial mathematics”; it is used in many, many disciplines
  • Example: suppose a farmer uses three feeds for his cows: F₁, F₂ and F₃
    – feed Fᵢ has unit cost Cᵢ and gives nutrient amounts N¹ᵢ, N²ᵢ, N³ᵢ and N⁴ᵢ
    – the farmer wishes to feed his cows to meet their nutritional needs Rⱼ (1 ≤ j ≤ 4) at minimum cost
    – this requires Uᵢ units of each type of feed (Uᵢ to be found)
    – he must solve the following problem:

    minimize C = Σ_{i=1}^{3} Uᵢ · Cᵢ   (the objective function)

    subject to
      N¹₁ · U₁ + N¹₂ · U₂ + N¹₃ · U₃ ≥ R₁
      N²₁ · U₁ + N²₂ · U₂ + N²₃ · U₃ ≥ R₂
      N³₁ · U₁ + N³₂ · U₂ + N³₃ · U₃ ≥ R₃   (the constraints)
      N⁴₁ · U₁ + N⁴₂ · U₂ + N⁴₃ · U₃ ≥ R₄
      U₁, U₂, U₃ ≥ 0
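A sketch of how such a problem could be handed to an off-the-shelf LP solver, here scipy.optimize.linprog; the cost, nutrient and requirement numbers are invented for illustration. Since linprog minimizes c·x subject to A_ub·x ≤ b_ub, the ≥ constraints are negated.

  from scipy.optimize import linprog

  C = [3.0, 2.5, 4.0]          # unit cost of each feed (made up)
  N = [[2, 1, 3],              # N[j][i]: amount of nutrient j in feed i
       [1, 3, 1],
       [4, 2, 2],
       [1, 1, 5]]
  R = [10, 8, 12, 9]           # nutritional requirements (made up)

  A_ub = [[-x for x in row] for row in N]   # N·U >= R  becomes  -N·U <= -R
  b_ub = [-r for r in R]
  res = linprog(C, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3)
  print(res.x, res.fun)        # optimal amounts U1, U2, U3 and minimum cost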


SLIDE 25

Chapter 6

Linear Programming: 2

  • These optimization systems are much more difficult to solve than a linear system of equations (Q: how do we do that?)
  • It turns out that the constraints define a region in n-dimensional space
  • This feasible region is the intersection of a finite number of half-spaces
    – this means it is a convex region (and a convex polytope)
  • It can be shown that the optimum value of the objective function can be found at a corner of the polytope, if the optimum solution is finite
    – thus the problem reduces to (a) finding the corners of an n-dimensional convex polytope, and (b) finding a corner with the optimum value
  • The simplex method has been used since the 1940’s to solve such problems
    – although the simplex method can, in principle, require exponential time, in practice it runs acceptably quickly
    – this made people curious!
    – a major result from the 1980’s: there is a polynomial-time algorithm to solve linear programming problems


SLIDE 26

Chapter 7

Space and Time Tradeoffs


SLIDE 27

Chapter 7

Space and Time Tradeoffs

  • Idea: in some situations, we can
    – use less time by using more space
    – use less space by using more time
    – the “less time for more space” tradeoff is the one typically considered
  • Example: pre-compute and store some values which your program may need many times
    – a library implementing sin(x) might precompute sin() for a few hundred values of x (0 ≤ x < 2π) and then use some fast algorithm to interpolate other values
    – note that x − x³/3! + x⁵/5! − x⁷/7! + x⁹/9! is a very good approximation of sin(x) in the range [−π, π]
  • Example 2: there is less coding effort involved in computing Fib(n) if you have previously filled an array with Fib(0), Fib(1), Fib(2), Fib(3), . . .
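A Python sketch of Example 2 (fib_table is my own name): fill the table once, after which every Fib lookup is a constant-time array access.

  def fib_table(n):
      """Pre-compute Fib(0) .. Fib(n) once, trading space for time."""
      fib = [0, 1]
      for i in range(2, n + 1):
          fib.append(fib[i - 1] + fib[i - 2])
      return fib[:n + 1]

  fib = fib_table(20)
  print(fib[10])   # 55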


SLIDE 28

Chapter 7

Space and Time Tradeoffs: 2

  • We consider two varieties of space-for-time algorithms:
    – input enhancement — pre-process the input (or part of the input) to store some info to be used later in solving the problem
      – e.g., counting sorts
    – pre-structuring — pre-process the input to make accessing its elements easier
      – e.g., hashing
  • Counting sort: if your array values are restricted to some very small range (say, 16 to 25) you could sort as follows:
    – count the number of occurrences cᵥ of each value v
    – fill the first c₁ array locations with the first (smallest) value, the next c₂ array locations with the second value, the next c₃ array locations with the third value, and so on

What is the time complexity of this algorithm?
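A Python sketch of this procedure (counting_sort is my own name; the 16..25 range follows the slide’s example); counting and writing back take Θ(n + r) time for a range of size r:

  def counting_sort(a, lo=16, hi=25):
      """Count occurrences of each value in lo..hi, then write the
      values back in order: c_16 copies of 16, c_17 copies of 17, ..."""
      counts = [0] * (hi - lo + 1)
      for v in a:
          counts[v - lo] += 1
      out = []
      for v in range(lo, hi + 1):
          out.extend([v] * counts[v - lo])
      return out

  print(counting_sort([19, 16, 25, 19, 17]))   # [16, 17, 19, 19, 25]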


SLIDE 29

Chapter 7

Input Enhancement for String Searching

  • Recall: we look for a search string of length m in a text of length n
  • Also recall we looked at the brute-force string searching algorithm; its worst case is Θ(nm)
  • Better algorithms: Knuth, Morris & Pratt (1977), Boyer & Moore (1977), Horspool (1980)
    – the B-M algorithm pre-processes the search string right to left, storing information in two tables
    – Horspool’s algorithm simplifies the B-M algorithm by using just one table
  • B-M and Horspool’s algorithms compare the search string with the text right to left
    – e.g., if the search string is ABC, first C is compared, then (if necessary) B and then (if necessary) A
    – if no match is found, the search string is then moved to the right (so the text is searched from left to right)


SLIDE 30

Chapter 7

String Matching Ideas

  • Suppose you are looking for the pattern LOOK in some text, and you have reached this point in the search (the lower-case letters are “variables”):

    ...QWERTYabcd...
             LOOK

    – assume abcd != LOOK (i.e., there is not a match)
  • Q: how far ahead should you move?
    – Case 1: d does not occur anywhere in the pattern: move the pattern 4 (= |LOOK|) chars to the right, past d:

    ...QWERTYabcdefgh...
                 LOOK

    – Case 2: d is found in the pattern, somewhere other than the last char: shift the pattern so d is aligned with its rightmost occurrence in the pattern, e.g., d == ’O’:

    ...QWERTYabcOe...
              LOOK

    – why rightmost?


SLIDE 31

Chapter 7

String Matching Ideas (continued)

    ...QWERTYabcd...
             LOOK

    – Case 3: d is found only at the last char of the pattern: similar to case 1, re-start the search 4 (= |LOOK|) chars to the right:

    ...QWERTYabcKefgh...
                 LOOK

    – Case 4: d is found both at the end of the pattern and somewhere else in the pattern (consider the pattern OKOK for this case): shift the pattern so d is aligned with the next-rightmost occurrence in the pattern:

    ...QWERTYabcKe...
               OKOK

  • Using these ideas allows us to move forward more than 1 char at a time
    – but only if we can efficiently figure out how far to move ahead
    – i.e., we don’t want to do length calculations every step


SLIDE 32

Chapter 7

Horspool’s Algorithm

  • Horspool’s algorithm is a simplified version of the Boyer-Moore algorithm:
    – it pre-processes the pattern to generate a shift table t() that determines how much to shift the pattern when a mismatch occurs
    – it always makes a shift based on the text’s character d aligned with the last character of the pattern, according to the shift table’s entry for d (i.e., t(d))
  • Example: for LOOK, every table entry is 4, except for O, which is 1, and L, which is 3
    – that is, if LOOK doesn’t match the current text, look at the character d aligned with K, and shift the pattern t(d) characters to the right
  • The table can be computed in O(m) + O(|A|) time, where m is the length of the pattern and A is the alphabet of all characters
  • Summary: by pre-processing the input and using extra space (t()), we can improve upon the brute-force algorithm (sort of. . . this is still O(nm) in the worst case)

GEQ: show such a worst case.
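A Python sketch of Horspool’s algorithm as described on this slide (shift_table and horspool are my own names; the table is a dict whose missing entries default to m):

  def shift_table(pattern):
      """Horspool shift table t(): every character defaults to
      m = len(pattern); characters among the first m - 1 positions
      shift by their distance from the last position."""
      m = len(pattern)
      t = {}
      for i in range(m - 1):
          t[pattern[i]] = m - 1 - i
      return t

  def horspool(text, pattern):
      """Return the index of the first match, or -1.  Compares the
      pattern right to left; on a mismatch, shifts by t(c) where c is
      the text character aligned with the last pattern character."""
      m, n = len(pattern), len(text)
      t = shift_table(pattern)
      i = m - 1                  # text index aligned with the pattern end
      while i < n:
          k = 0
          while k < m and pattern[m - 1 - k] == text[i - k]:
              k += 1
          if k == m:
              return i - m + 1
          i += t.get(text[i], m)
      return -1

  print(shift_table("LOOK"))                    # {'L': 3, 'O': 1}; others 4
  print(horspool("...QWERTYLOOK...", "LOOK"))   # 9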


SLIDE 33

Chapter 7

Yet More String Matching Ideas

  • In Horspool’s algorithm, if a mismatch is found at any character position, the shift is based upon the character aligned with the rightmost pattern character:

    ...QWERTYabJK...
             OKOK

    – in this case Horspool’s algorithm will shift t(’K’) = 2 characters
    – but we can observe that the mismatched character (’J’ in this case) does not appear anywhere in the pattern, so we could actually shift more than 2 characters
    – specifically, we could shift t(’J’) − k = 3 characters, where k is the number of characters (starting at the right) which matched (1 in this case)
    – if 0 characters (starting at the right) match, the computation is exactly the same as in Horspool’s algorithm
  • Q: is this modified strategy always better than Horspool’s?
    A: no; for the above example, if the pattern were OZOK, Horspool would shift the pattern 4 chars to the right
    – so is this an improvement in any situation?


SLIDE 34

Chapter 7

Yet Yet More String Matching Ideas

  • Suppose the picture was like this, where e != B:

    ...abcdeER...
       ABERBER

    – in this case the length k = 2 suffix of the pattern (suff(2) = ER) matches the text
  • If the pattern contains another occurrence of suff(k), we could shift the pattern right until that occurrence aligns with where suff(k) is now:

    ...abcdeER...
          ABERBER

    – note that since the second ER from the right is preceded by the same char as suff(k), this can’t possibly match either (we know e != B)
  • A more interesting possibility:

    ...abcdeER...          ...abcdeERfgh
       AZERBER     ====>         AZERBER

    – we can shift the pattern by the distance between suff(k) and the next rightmost occurrence of suff(k) not preceded by the same char


SLIDE 35

Chapter 7

Yet Yet More String Matching Ideas: 2

  • What if there is no other occurrence of suff(k) not preceded by the same char as suff(k)?

    ...abcdeER...        ...abcdeERfghijkl
       ABERBER    ====>            ABERBER

    – in this specific example, it is safe to shift the pattern by its entire length
  • How about this (for k = 3, where the text char d is compared against the pattern’s R)?

    ...abcdBER...
         ERBER

    – if we shift by m = |ERBER| chars, we might miss a match:

    ...abcdBERBER...
            ERBER

  • We need to handle this case a bit more carefully:
    – look for the longest prefix of the pattern, of size l < k, which matches suff(l)
    – if this exists, we could shift by the distance between this prefix and suff(l)


SLIDE 36

Chapter 7

Boyer-Moore

  • Bad-symbol shift, used when char c doesn’t match after k ≥ 0 matching chars have been found; a generalization of Horspool’s algorithm:

    d₁ = max(t(c) − k, 1)

  • Good-suffix shift d₂ (when k > 0):
    – set d₂ to the distance between the matched suffix of size k and its rightmost occurrence in the pattern that is not preceded by the same character as the suffix†, if such a substring exists
    – otherwise, see if there is a prefix of size 0 < l < k which matches suff(l); if there is, d₂ is the distance between the longest such prefix and the suffix (the distance between the first chars of the prefix and the suffix)
    – otherwise, d₂ = m, the full pattern length
  • Finally, d = max(d₁, d₂) is the amount by which the pattern can be shifted

† clearer to say “preceded by a different character than the suffix”?
