Advanced algorithms


SLIDE 1

Introduction Divide and conquer D.P. G.A. NP-completeness

Advanced algorithms

Based on texts from: H. Cormen, S. Dasgupta, G. Bertrand, and M. Couprie

X. Hilaire

ESIEE Paris, IT department

October 15, 2018

SLIDE 2

Outline

1. Introduction and basic notions
2. Divide and conquer algorithms
3. Dynamic programming
4. Greedy algorithms
5. Introduction to NP-completeness

SLIDE 3

Aims of the course

To know basic notions about computational complexity
To be able to evaluate the complexity of a given algorithm
To master, and to know when, three popular resolution techniques apply for exact solutions:

Divide and conquer
Dynamic programming
Greedy algorithms

To be able to distinguish between a complex and an easy problem
The course is almost all about problem solving

SLIDE 4

Organization: You are expected to write your solutions as pieces of "pseudo-code", but real code (in C, Python, C#, or any other language) is suitable too in many cases. Two practical labs will take place and will be assessed: one for dynamic programming, one for greedy algorithms, each worth 25% of the final grade. Work is strictly individual. The final exam is worth 50% of the final grade.

SLIDE 5

Introduction and basic notions

SLIDE 6

A few definitions

Definition (RAM)
A Random Access Machine (RAM) is a machine that can address its memory only one machine word at a time, but in constant time and in any order. The same holds for elementary instructions. We will furthermore assume that our RAMs have only one processor and a limited set of instructions:

Control: if, goto, call (function), return
Comparison: =, ≠, <, >
Assignment: x ← y, where x and y are machine words
Arithmetic: +, −, ×, ÷
Logical: bitwise not (!), and (&), or (|)

All operations are assumed to be done in constant time, provided operands are no larger than a machine word.
SLIDE 7

What if operands x and y are n-bit integers, with n arbitrarily large? Operations (arithmetic in particular) then take a time that highly depends on hardware and/or software implementation. Some common bounds:

x + y done in ∝ n clock cycles
x − y done in ∝ n clock cycles as well
x × y done in ∝ n^(log_k(2k−1)) cycles, where k ≥ 2 (Cook-Toom's algorithm, wireable), or ∝ n log n log log n cycles (Schönhage-Strassen's algorithm)
x ÷ y done in... ∝ n² cycles!! (old elementary school method, nothing really better known in practice)
!, &, | all achieved in ∝ n cycles

SLIDE 8

Let f : N → R⁺ and g : N → R⁺ be functions.

Definition
f = Ω(g) if and only if there exist constants a ∈ R⁺ and n₀ ∈ N such that for all n ≥ n₀, ag(n) ≤ f(n) holds true
f = ω(g) if and only if for any constant a ∈ R⁺, there exists a constant n₀ large enough so that for all n ≥ n₀, ag(n) ≤ f(n) holds true
f = O(g) if and only if there exist constants b ∈ R⁺ and n₀ ∈ N such that for all n ≥ n₀, bg(n) ≥ f(n) holds true
f = o(g) if and only if for any constant b ∈ R⁺, there exists a constant n₀ large enough so that for all n ≥ n₀, bg(n) ≥ f(n) holds true

SLIDE 9

Moreover:

Definition
  • f = Θ(g) if and only if f ∈ Ω(g) and f ∈ O(g)
  • f ∈ θ(g) if and only if f ∈ ω(g) and f ∈ o(g)

f = Θ(g) means that g asymptotically bounds f both from below and from above, after a possible rescaling. Small o establishes a stronger result than big O, as the inequality must hold for any rescaling constant.

SLIDE 10

Corollary
The following implications hold true:
f = Θ(g) ⟹ f = Ω(g)
f = Θ(g) ⟹ f = O(g)
f = ω(g) ⟹ f = Ω(g)
f = θ(g) ⟹ f = Θ(g)

Asymptotic bounds may be used within equalities, for instance:
3n² + n + 2 = 3n² + Θ(n) = Θ(n²) + Θ(n) = Θ(n²)

Exercise
1. Show that n! = ω(2ⁿ)
2. Does ⌊log n⌋! admit a polynomial upper bound? How about ⌊log log n⌋! ?

SLIDE 11

Exercise
Let f, g : N⁺ → N⁺ be two functions. Prove or refute each of the following claims:

1. if f(n) = O(g(n)) then g(n) = O(f(n))
2. f(n) + g(n) = Θ(min(f(n), g(n))) (assuming that it exists)
3. if f(n) = O(g(n)) then log(f(n)) = O(log(g(n)))
4. if f(n) = O(g(n)) then 2^f(n) = O(2^g(n))
5. f(n) = O(f²(n))
6. if f(n) = O(g(n)) then g(n) = Ω(f(n))
7. f(n) = Θ(f(n/2))
8. f(n) + o(f(n)) = Θ(f(n))

SLIDE 12

Time complexity is an evaluation (as a bound) of the running time of an algorithm as a function of its input size:

Definition
An algorithm with input size n has a time complexity g in the worst, the best, and any case, if its total number of instructions is O(g), Ω(g), and Θ(g), respectively.

A similar definition holds for memory complexity:

Definition
An algorithm with input size n has a memory complexity g in the worst, the best, and any case, if the number of bytes it needs to allocate is O(g), Ω(g), and Θ(g), respectively.

Why pay attention to complexity? Consider the famous Fibonacci numbers: 0, 1, 1, 2, 3, 5, 8, 13, ...

Fn = { n if n = 0 or n = 1 ; Fn−1 + Fn−2 otherwise }    (1)

SLIDE 13

How to implement a function that computes Fn? First idea (naive):

int fibo1(int n) {
  if (n <= 1) return n;
  return fibo1(n-1) + fibo1(n-2);
}

What is its cost in time T(n)? Because T is increasing, for any n ≥ 2:
T(n) ≥ T(n−1) + T(n−2) ≥ 2T(n−2). So:
T(n) ≥ 2T(n−2) ≥ 4T(n−4) ≥ 8T(n−6) ≥ ... ≥ 2^(n/2) T(1) = 2^(n/2)
Note that this is a lower bound of the cost, and already the worst possible one (nonpolynomial).

SLIDE 14

Can we do better? Each time we request the computation of fibo1(n-1), we have already computed fibo1(n-2)... That seems to be the problem!

int fibo2(int n) {
  int x = 0, y = 1;
  for (int i = 1; i <= n; i++) {
    int z = y;
    y = y + x;
    x = z;
  }
  return x;
}

What is the cost now? Just linear in n, because of the for loop...
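A third variant, not in the slides, keeps the recursive shape of fibo1 but caches each value the first time it is computed (memoization, a first taste of dynamic programming). The name fibo1b and the table size are ours; 64-bit signed arithmetic holds Fibonacci numbers up to F(92).

```c
static long long memo[93];           /* memo[n] == 0 means "not computed yet" */

long long fibo1b(int n)
{
    if (n <= 1) return n;            /* F(0) = 0, F(1) = 1 */
    if (memo[n] == 0)                /* compute each value only once */
        memo[n] = fibo1b(n-1) + fibo1b(n-2);
    return memo[n];
}
```

Each of the n table entries is filled exactly once, so the cost drops from exponential to linear, like fibo2, but without reorganizing the recursion.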

SLIDE 15

What are the following pieces of code doing? What are their lower and upper bounds (when predictable)?

int foo1(int val, int *tab, int n) {
  for (int i = 0; i < n; i++)
    if (tab[i] == val) return i;
  return -1;
}

int foo2(int val, int *tab, int p, int q) {
  int m = (p+q)/2;
  if (tab[q] < val || tab[p] > val) return 0;
  if (tab[p] == val) return -1;
  if (tab[m] < val) return foo2(val, tab, m, q);
  else return foo2(val, tab, p, m);
}

SLIDE 16

int foo3(int x) {
  int y = x;
  while (y*y - x > 1) y = y/2;
  return y;
}

int foo4(int y) {
  int x = y;
  while (x*x - y > 1) x = (x + y/x)/2;
  return x;
}

SLIDE 17

Exercise
Can you think of a faster method fibo3 to compute Fn? Sketch:

Show that for any n ≥ 1:
( Fn   )   ( 0 1 )ⁿ ( F0 )
( Fn+1 ) = ( 1 1 )   ( F1 )
Show that O(log n) matrix multiplications suffice to compute Fn. Hint: write the n-th power as either (M^⌊n/2⌋)² or M^⌊n/2⌋ · M^(⌊n/2⌋+1)
Show that all computations involve at most O(n) bits
Conclude that the running time of fibo3 must be no more than O(n² log n)
Assuming that two n-bit integers can be multiplied in O(M(n)) time, can you prove that fibo3 is in O(M(n)) time?
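One possible fibo3 along the lines of this sketch, using machine words instead of big integers (so valid only up to F(92); the mat2 type and names are ours): raise M = [[1,1],[1,0]] to the n-th power by repeated squaring, using the identity Mⁿ = [[F(n+1), F(n)], [F(n), F(n−1)]].

```c
#include <stdint.h>

typedef struct { uint64_t a, b, c, d; } mat2;   /* row-major 2x2 matrix */

static mat2 mul(mat2 x, mat2 y)
{
    mat2 r = { x.a*y.a + x.b*y.c, x.a*y.b + x.b*y.d,
               x.c*y.a + x.d*y.c, x.c*y.b + x.d*y.d };
    return r;
}

uint64_t fibo3(int n)
{
    mat2 r = {1, 0, 0, 1};           /* identity matrix      */
    mat2 m = {1, 1, 1, 0};           /* M = [[1,1],[1,0]]    */
    while (n > 0) {                  /* O(log n) iterations  */
        if (n & 1) r = mul(r, m);    /* fold in current bit  */
        m = mul(m, m);               /* square               */
        n >>= 1;
    }
    return r.b;                      /* entry (1,2) of M^n is F(n) */
}
```

With fixed-size words each matrix product is O(1), so this runs in O(log n) word operations; with n-bit integers, each product costs O(M(n)), which is what the last question of the exercise is after.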

SLIDE 18

Divide and conquer algorithms

SLIDE 19

D&Q is a strategy that solves a problem of size n as follows:

Break the problem of size n into smaller subproblems of size < n
Recursively solve each subproblem independently
Return the solution as a combination of the solutions of the subproblems

Example 1: multiplication
Suppose x and y are two integers coded using n bits, with n a power of two (n = 8, 16, ..., 128, ...), in the little-endian fashion. Then x can be split into n/2 "high weight" bits and n/2 "low weight" bits, as x = 2^(n/2) xH + xL. Same for y = 2^(n/2) yH + yL.

SLIDE 20

The D&Q strategy computes x × y as
DQ(x, y) = x·y = (2^(n/2) xH + xL)(2^(n/2) yH + yL) = 2ⁿ xH yH + 2^(n/2)(xH yL + xL yH) + xL yL
and involves 4 subproblems of multiplying (n/2)-bit integers together. Combining the solutions requires two "shiftings" of n and n/2 bits, and 3 additions. What is the running time T(n) of DQ? If an addition on n bits takes O(n) time, it appears that
T(n) = 4T(n/2) + O(n)
How to exploit this recurrence relation?

SLIDE 21

Theorem (weak Master)
Suppose that T(n) = aT(n/b) + αn^d for some a > 0, b > 1, d ≥ 0, and n = b^r (with r > 0 integer). Then it holds that
T(n) = O(n^d) if d > log_b a
T(n) = O(n^d log n) if d = log_b a
T(n) = O(n^(log_b a)) if d < log_b a

Proof. Observe first that the recurrence turns into a tree, whose height is exactly r = log_b n:

(Tree: T(n) at the root; a children T(n/b); a² grandchildren T(n/b²); ...; a^r leaves T(1).)

SLIDE 22

Substitute the definition of T(n) into itself r times:
T(n) = aT(n/b) + αn^d
     = a(aT(n/b²) + α(n/b)^d) + αn^d
     = a²T(n/b²) + α(a(n/b)^d + n^d)
     = a³T(n/b³) + α(a²(n/b²)^d + a(n/b)^d + n^d)
     = ...
     = a^r T(1) + αn^d Σ_{i=0}^{r−1} a^i / (b^d)^i
     = a^r T(1) + αn^d Σ_{i=0}^{r−1} (a/b^d)^i

From the last term, we have to distinguish whether the series ratio a/b^d is equal to 1 or not to continue. If a/b^d = 1, or equivalently, d = log_b a, then the terms under summation all equal 1 and it remains
T(n) = a^r T(1) + αn^d r = a^(log_b n) T(1) + αn^d log_b(n) = O(n^d log(n))
which proves the second case of the theorem.

SLIDE 23

Suppose now d ≠ log_b a. Then the summation in T(n) evaluates to
((a/b^d)^r − 1) / (a/b^d − 1) = β((a/b^d)^r − 1) = β(n^(log_b a − d) − 1)
putting β = (a/b^d − 1)^(−1). Therefore, we always have that
T(n) = a^(log_b n) T(1) + αβ n^d (n^(log_b a − d) − 1) = n^(log_b a) T(1) + αβ(n^(log_b a) − n^d)
provided d ≠ log_b a. Two cases need be distinguished:
Case 1: d < log_b a. Then the n^(log_b a) term dominates, and β > 0, so
T(n) = n^(log_b a) T(1) + αβ n^(log_b a) = O(n^(log_b a))
which proves the last case of the theorem.
Case 2: d > log_b a. Then this time the n^d term dominates, and β < 0, so
T(n) = a^(log_b n) T(1) + α|β| n^d = O(n^d)
as claimed in the first case of the theorem.

SLIDE 24

A similar result holds with integer parts, n not necessarily a power of b, and big-O notation:

Theorem (Master)
Suppose that T(n) = aT(⌈n/b⌉) + O(n^d) for some a > 0, b > 1, and d ≥ 0. Then it holds that
T(n) = O(n^d) if d > log_b a
T(n) = O(n^d log n) if d = log_b a
T(n) = O(n^(log_b a)) if d < log_b a

Back to our multiplication... now we can answer the question:
T(n) = 4T(n/2) + O(n)
So a = 4, b = 2, d = 1, and the Master theorem tells us our D&Q strategy runs in O(n²) :(

SLIDE 25

Can we do better?
DQ(x, y) = 2ⁿ xH yH + 2^(n/2)(xH yL + xL yH) + xL yL
xH yL + xL yH = (xH + xL)(yH + yL) − xH yH − xL yL
where xH yH and xL yL are already known. So...

DQ_mul(xH, xL, yH, yL, n) {
  a = xH*yH;
  b = xL*yL;
  c = (xH+xL)*(yH+yL) - a - b;
  return (a << n) + (c << (n/2)) + b;
}

which contains 6 additions, but only 3 multiplications now. So now a = 3, b = 2, d = 1, and an application of the Master theorem tells us that DQ runs in O(n^(log₂ 3)) = O(n^1.585) time.
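As a concrete sketch of one level of this recursion, with machine words standing in for big integers (karatsuba32 is a name chosen here, not from the slides): two 32-bit operands are split into 16-bit halves and recombined by shifts, exactly as in DQ_mul.

```c
#include <stdint.h>

/* Multiply two 32-bit integers with 3 half-width multiplications. */
uint64_t karatsuba32(uint32_t x, uint32_t y)
{
    uint64_t xH = x >> 16, xL = x & 0xFFFF;     /* high/low 16-bit halves */
    uint64_t yH = y >> 16, yL = y & 0xFFFF;
    uint64_t a = xH * yH;                       /* high * high            */
    uint64_t b = xL * yL;                       /* low  * low             */
    uint64_t c = (xH + xL) * (yH + yL) - a - b; /* both cross terms, one multiplication */
    return (a << 32) + (c << 16) + b;           /* recombine by shifts    */
}
```

A full big-integer version would recurse on each of the three products instead of using the hardware multiplier, which is where the T(n) = 3T(n/2) + O(n) recurrence comes from.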

SLIDE 26

Example 2: merge sort (MS)
The MS algorithm sorts an input array of n elements by recursively splitting it into 2 subarrays of size n/2, and merges the results of the sorted subarrays to keep the solution sorted itself.

(Diagram: the input 1 8 4 2 7 3 9 4 is split and sorted pairwise into 1 8, 2 4, 3 7, 4 9; the sorted halves 1 2 4 8 and 3 4 7 9 are merged into 1 2 3 4 4 7 8 9.)

function MS(tab[begin..end])
  if begin = end then
    return tab;
  else
    s1 ← MS(tab[begin..⌊(begin+end)/2⌋]);
    s2 ← MS(tab[⌊(begin+end)/2⌋+1..end]);
    return merge(s1, s2);
  end if
end function

SLIDE 27

The merge function is easily achieved in linear time:

function merge(tab1[b1..e1], tab2[b2..e2])
  p1 ← b1, p2 ← b2, s ← (e1−b1) + (e2−b2) + 2;
  res ← new array(1..s);
  for t = 1..s do
    if tab1[p1] < tab2[p2] then
      res[t] ← tab1[p1]; p1 ← min(p1+1, e1);
    else
      res[t] ← tab2[p2]; p2 ← min(p2+1, e2);
    end if
  end for
  return res;
end function
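The two functions can be sketched in C as follows (a direct translation on int arrays, with the run-exhaustion test made explicit instead of the pseudocode's min() clamping; the names are ours). It sorts tab[begin..end] in place.

```c
#include <stdlib.h>
#include <string.h>

static void merge(int *tab, int begin, int mid, int end)
{
    int s = end - begin + 1, p1 = begin, p2 = mid + 1, t;
    int *res = malloc(s * sizeof *res);   /* temporary buffer, as in MS */
    for (t = 0; t < s; t++) {
        if (p2 > end || (p1 <= mid && tab[p1] < tab[p2]))
            res[t] = tab[p1++];           /* take from the left run  */
        else
            res[t] = tab[p2++];           /* take from the right run */
    }
    memcpy(tab + begin, res, s * sizeof *res);
    free(res);
}

void merge_sort(int *tab, int begin, int end)
{
    if (begin >= end) return;             /* 0 or 1 element: sorted  */
    int mid = (begin + end) / 2;
    merge_sort(tab, begin, mid);
    merge_sort(tab, mid + 1, end);
    merge(tab, begin, mid, end);
}
```

The per-level buffer allocation and copy is precisely the "running time constants" drawback discussed on the next slide.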

SLIDE 28

What is the time complexity of mergesort? For an array of size n:
T(2p) = 2T(p) + O(2p)
T(2p+1) = T(p) + T(p+1) + O(2p+1)
⇒ 2T(⌊n/2⌋) + O(n) ≤ T(n) ≤ 2T(⌈n/2⌉) + O(n)
The Master theorem, applied on the right-hand side, tells us that T(n) = O(n log n), since a = b = 2 and log₂(2) = 1 = d. The left-hand side tells us the same, so mergesort has a time complexity of Θ(n log n). This is its biggest advantage: n log n operations ensured whatever the input. Its drawback: running time constants (must copy all subarrays... see exercise on quicksort).

SLIDE 29

Exercise
Find the Ω and O bounds for each of the following recurrences (floors ⌊.⌋ have been omitted to simplify notations):

T(n) = 2T(n/3) + n²
T(n) = T(n−1) + n
T(n) = T(n−1) + 1/n
T(n) = T(n−1) + T(n−2) + n/2
T(n) = T(n/2) + T(n/3)
T(n) = 2T(n/2) + log n
T(n) = T(√n) + 1
T(n) = √n T(√n) + n

SLIDE 30

Exercise
Professor Sempron has n chipsets, not all of them in good state. He can only test them pairwise, according to the following:

if a chip is good, it will always say the truth about the other chip (that is, whether the other chip is good or damaged)
if a chip is damaged, its opinion on the other chip is unpredictable

1. Show it is impossible to diagnose which chips are reliable if more than n/2 of them are damaged.
2. Assuming that more than n/2 chips are good, show that Θ(n) operations are sufficient to find a good chip amongst the n.
3. Show that Θ(n) operations are sufficient to diagnose the good chips, still assuming more than n/2 of them are good.

SLIDE 31

Exercise
You are given two sorted arrays A and B of integers, of sizes m and n. Give an O(log m + log n) algorithm to find the k-th smallest element of A ∪ B.

Exercise
You are given k sorted arrays of integers of n elements each, and you would like to merge them into a single array of kn elements. Give an efficient algorithm to do this.

SLIDE 32

Exercise
Suppose you are given a set S of n points of the plane Pi = (xi, yi), i = 1, ..., n. You would like to find which pair of points Pi, Pj has smallest Euclidean distance:
(i, j) = arg min_{u,v ∈ [1:n], u≠v} ||Pu − Pv||

How can you find a value x such that the sets of points L and R for which xi ≤ x and xi ≥ x are of equal size (up to a unit)?
Say that S has been split that way. Recursively find the pairs of points (pL, qL) ∈ L × L and (pR, qR) ∈ R × R whose Euclidean distance is smallest in each subset. Put δ = min(||pL − qL||, ||pR − qR||).
It remains to check whether we can find (p, q) ∈ L × R such that ||p − q|| < δ. How can you achieve this? (hint: sort only a very specific set of points along the y axis)
SLIDE 33

Exercise (continued)
Show that your algorithm is correct (hint: show that any box of size δ × δ contains at most 4 points)
Write the full pseudocode of your algorithm, and show that its running time obeys T(n) = 2T(n/2) + O(n log n)
Show that this recurrence solves to O(n log² n)
Can you bring the time complexity down to O(n log n)?
What happens if your points live in R^d, d ≥ 3, rather than R²?
(From "Algorithms" by Dasgupta, Papadimitriou, and Vazirani. Related problem: Guibas-Stolfi and Leach algorithms for Delaunay triangulation)

SLIDE 34

Dynamic programming

SLIDE 35

What is dynamic programming (DP)? In short: a technique suitable to solve problems which exhibit two main features:

1. Optimal substructure: An¹ optimal solution for a problem of size n can be expressed as a combination of the optimal solutions of subproblems of sizes less than n. Subproblems may partially (and not completely) overlap each other.
2. Any subproblem of size less than n is easier to solve than a problem of size n.

In other words, a problem instance I can be expressed as I = ⋃_{i=1}^{k} Ai such that |Ai| < |I| for all i = 1, ..., k, and Ai ⊆ Aj does not hold for any pair (i, j). All subproblems pertaining to the Ai's must be solvable.

¹ Unicity is not required

SLIDE 36

The word "programming" does not stand for "writing code", but rather for "programming the solution", thanks to tables.

Example 1: the shortest path problem
Let G = (V, E) be a graph (fig. 1) with vertices V and edges E, and let s, t ∈ V. A path P between s and t is a sequence of vertices (x1, ..., xn) such that x1 = s, xn = t, all xi's are different, and (xi, xi+1) ∈ E, i = 1, ..., n−1. The length of P is |P| = n.

Figure 1: The shortest path problem between S = a and T = j (vertices a, b, c, d, e, f, g, h, i, j)

SLIDE 37

How to compute a shortest path SP(s, t; G) between s and t, given G? Suppose s = a and t = j as in the figure. Whatever the solution (if it exists), a shortest path ending at j must necessarily pass through one of g or i. Moreover, the "cost" for moving from g or i is the same in both cases: unitary. Therefore:
cost(a, j; G) = 1 + min_{x ∈ {g,i}} cost(a, x; Gx)
SP(a, j; G) = {j} ∪ {arg min_{x ∈ {g,i}} cost(a, x; Gx)}
or more generally:
cost(u, u) = 0
cost(u, v; G) = 1 + min_{x ∈ pred(v)} cost(u, x; Gx)
SP(u, v; G) = {v} ∪ {arg min_{x ∈ pred(v)} cost(u, x; Gx)}

SLIDE 38

where pred(x) is the set of neighbours of x not visited yet, and Gx is the subgraph of G with nodes V \ {x} and edges E \ {(z, x) : z ∈ V}. Can we exploit these relations directly, in a recursive manner? From Fig. 1, we see that after at most 2 recursions, both subproblems require solving SP(a, f; Gghij), as any path from a to j must pass through f. More glaring:

Figure 2: A failure case for recursive SP (a chain s, x−1l, x−1r, x−2l, x−2r, ..., x−nl, x−nr, t in which every level is reached twice by the recursion)

SLIDE 39

Running time: T(n) = 2T(n−2) + O(1), which solves to T(n) = Θ(2^(n/2)). Ideas:

1. Wouldn't it be better to use a table, so as to avoid recomputing the solutions we already know?
2. Do we really need recursion?

Towards the solution... Observe first that SP(a, j; G) = SP(j, a; G) ("backpropagating" towards a, or "propagating" from a towards j, are equivalent). Basic idea of the algorithm: keep two tables C[] and P[], both indexed by the vertices of G, up to date, so that at each iteration and for any x ∈ V:
C[x] contains either the length of a shortest path known from a to x, or ∞ if no path is known yet.

SLIDE 40

P[x] contains the node y such that the edge (y, x) belongs to the shortest path, if C[x] ≠ ∞.
Suppose we start from a, which we insert into A, the set of active vertices. Informal algorithm:

1. Initialize A to {a}, C[a] to 0, and the other C's to ∞. P need not be initialized.
2. For all active vertices x ∈ A, look for neighbours y (that is, all y ∈ V such that (x, y) ∈ E)
3. For all such y: if C[y] > C[x] + 1, then the path stemming from x is better than the best path known so far, so we should update the tables: C[y] = C[x] + 1, P[y] = x
4. Insert y in A
5. Iterate to step 3 until no more y
6. Remove x from A
7. Iterate to step 3 unless y = j
8. Iterate to step 2 unless y = j or A is empty

SLIDE 41

More formally:
Input: G = (V, E) a non-empty graph, s, t ∈ V.
Output: either A = ∅, and there is no solution; or a shortest path can be derived by walking P (see exercise, next frame)
Init: C[s] ← 0, C[V \ {s}] ← ∞, A ← {s}, P created

while A ≠ ∅ do
  A′ ← ∅
  for all x ∈ A do
    if x = t then
      stop
    else
      for all (x, y) ∈ E do
        if C[y] > C[x] + 1 then
          C[y] ← C[x] + 1, P[y] ← x, A′ ← A′ ∪ {y}
        end if
      end for
    end if
  end for
  A ← A′
end while
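A compact C sketch of this table-based algorithm (names and the adjacency-matrix representation are ours; a FIFO queue plays the role of the successive active sets A, which is equivalent for unit edge weights). C[] holds distances with -1 standing for ∞, P[] holds predecessors; the function returns the length of a shortest s-t path, or -1 if t is unreachable.

```c
#include <stdlib.h>

/* adj is an n*n 0/1 adjacency matrix, row-major. */
int shortest_path(const int *adj, int n, int s, int t, int *C, int *P)
{
    int *A = malloc(n * sizeof *A);       /* queue of active vertices   */
    int head = 0, tail = 0, x, y, d;
    for (x = 0; x < n; x++) C[x] = -1;    /* -1 plays the role of ∞     */
    C[s] = 0; P[s] = s;
    A[tail++] = s;
    while (head < tail) {
        x = A[head++];
        if (x == t) break;                /* first hit = shortest path  */
        for (y = 0; y < n; y++)
            if (adj[x*n + y] && C[y] == -1) {   /* i.e. C[y] > C[x] + 1 */
                C[y] = C[x] + 1;          /* update the tables          */
                P[y] = x;
                A[tail++] = y;            /* y becomes active           */
            }
    }
    d = C[t];
    free(A);
    return d;
}
```

Each vertex enters the queue at most once and each edge is scanned O(1) times per endpoint, so the running time is O(|V|² ) with this matrix representation (O(|V| + |E|) with adjacency lists).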

SLIDE 42

Remarks
Within the while loop, we build an A′ set, which we finally substitute for A. This is for the sake of clarity: the loop should indeed not modify A itself.
Entering the while loop for the n-th time, we update all nodes touched by a path whose length is exactly n. The very first time t is hit therefore corresponds to the shortest possible path that effectively reaches t.
More remarks: the solution returned is not unique, and the only case in which we terminate with no solution is one where s and t belong to different connected components.

SLIDE 43

The above algorithm would work equally well if we used positive weighted edges. It is a variant of Dijkstra's algorithm, which computes the shortest paths to all nodes reachable from s: it suffices, for this, to remove the if test.

Exercise
1. Run the above algorithm on the data of Fig. 1, and for each while iteration, give the values of A, C, and P.
2. Write the pseudo-code to print the solution of the shortest path in case the algorithm has stopped with t ∈ A ≠ ∅.
3. What is the complexity of the above algorithm? Could you improve it?

SLIDE 44

Remarks (cont'd)
Bellman-Ford's algorithm does the same, but weights can be < 0 too, provided G carries no negative cycle. The Floyd-Warshall algorithm does the same job for all source nodes.

Example 2: the Levenshtein distance of two strings
Suppose you are given two strings S (source) and T (target), defined over the same alphabet:
S is a corrupted version of T: some characters have either been replaced, inserted, or deleted, with a unitary cost²: cost(insertion) = cost(replacement) = cost(deletion) = 1, cost(match) = 0
T is not corrupted
Which sequence of insertions/replacements/deletions transforms S into T with minimal cost?

² Only to keep the problem simple...

SLIDE 45

T = h o r s e s
S = h e r b i s t s
Could be: e → o, b → ∅, i → ∅, t → e: cost = C(S, T) = 4

Put s ∈ [1 : |S|] and t ∈ [1 : |T|], and consider S[s] and T[t] alone. The crucial point: how S[s] matches T[t] does not depend on how S[1..s−1] matched T[1..t−1]. Indeed, when we arrive at s:
If S[s] is neither an inserted nor a deleted character, then C(S[1..s], T[1..t]) = C(S[s], T[t]) + C(S[1..s−1], T[1..t−1])
If S[s] is an inserted character, then C(S[1..s], T[1..t]) = 1 + C(S[1..s−1], T[1..t])
If S[s] is a deleted character, then C(S[1..s], T[1..t]) = 1 + C(S[1..s], T[1..t−1])

SLIDE 46

Of all these costs, only the one that is minimal is of interest. In other words, we always have that
C(S[1..s], T[1..t]) = min {
  C(S[s], T[t]) + C(S[1..s−1], T[1..t−1]),
  1 + C(S[1..s−1], T[1..t]),
  1 + C(S[1..s], T[1..t−1])
}
The relation is trivially true at s = |S| and t = |T|, and true at s = t = 1 if we commit that S[1..0] and T[1..0] are empty strings. As usual, direct use of recursion is a bad idea. The Levenshtein algorithm reads as follows:

SLIDE 47

Input: two strings S and T
Output: the Levenshtein distance, in C[|S|, |T|]
C ← array(0..|S|, 0..|T|); s ← |S|, t ← |T|
for i = 0..s do C[i,0] ← i; end for
for i = 0..t do C[0,i] ← i; end for
for j = 1..t do
  for i = 1..s do
    ok ← C[i−1,j−1] + 1_{S[i]≠T[j]};
    ins ← C[i−1,j] + 1;
    del ← C[i,j−1] + 1;
    C[i,j] ← min(ok, ins, del);
  end for
end for
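The same table, as a C sketch (the function name and the arbitrary MAXLEN bound are ours; the indicator 1_{S[i]≠T[j]} becomes a boolean expression):

```c
#include <string.h>

#define MAXLEN 64   /* arbitrary bound for this sketch */

int levenshtein(const char *S, const char *T)
{
    int s = strlen(S), t = strlen(T), i, j;
    static int C[MAXLEN + 1][MAXLEN + 1];
    for (i = 0; i <= s; i++) C[i][0] = i;         /* delete everything */
    for (j = 0; j <= t; j++) C[0][j] = j;         /* insert everything */
    for (j = 1; j <= t; j++)
        for (i = 1; i <= s; i++) {
            int ok  = C[i-1][j-1] + (S[i-1] != T[j-1]); /* match or replace */
            int ins = C[i-1][j] + 1;
            int del = C[i][j-1] + 1;
            int m = ok < ins ? ok : ins;
            C[i][j] = m < del ? m : del;          /* keep the cheapest */
        }
    return C[s][t];
}
```

Both time and memory are O(|S| · |T|); keeping only two rows of the table would bring the memory down to O(min(|S|, |T|)).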

SLIDE 48

To do (in class): run the algorithm on the example.

blade05$ gnatmake dp_levenstein
gcc-4.4 -c dp_levenstein.adb
gnatbind -x dp_levenstein.ali
gnatlink dp_levenstein.ali
blade05$ ./dp_levenstein
    h e r b i s t s
  0 1 2 3 4 5 6 7 8
h 1 0 1 2 3 4 5 6 7
o 2 1 1 2 3 4 5 6 7
r 3 2 2 1 2 3 4 5 6
s 4 3 3 2 2 3 3 4 5
e 5 4 3 3 3 3 4 4 5
s 6 5 4 4 4 4 3 4 4
blade05$

SLIDE 49

Example 3: matrix multiplication
Suppose A, B, C, D are 4 matrices with dimensions m × n, n × p, p × q, and q × r. We want to evaluate A × B × C × D. Matrix multiplication is not commutative, but associative:
A × B × C = (A × B) × C = A × (B × C)
Multiplying two matrices of dimensions m × n and n × p takes mnp multiplications and m(n − 1)p additions, so the running time is dominated by the mnp multiplications.

Operation    multiplications     numeric
((AB)C)D     mnp + mpq + mqr     23500
A((BC)D)     npq + nqr + mnr     12500
A(B(CD))     pqr + npr + mnr     28750
(AB)(CD)     mnp + pqr + mpr     31000

Table 1: Effect of different parenthesizing orders (m = 30, n = 8, p = 20, q = 50, r = 25)

SLIDE 50

How to compute the parenthesizing order that yields the lowest number of multiplications? Take a tree representation of the solution.

(Tree: the root ABCD has children A and BCD; BCD splits in turn into BC and D, and BC into B and C.)

At top level: cost(ABCD) = mnr + cost(A) + cost(BCD), and cost(BCD) has to be, in turn, optimal. We can easily generalize to any chain of j − i + 1 matrices:
Mi ∈ M_{d(i−1) × d(i)}, M(i+1) ∈ M_{d(i) × d(i+1)}, ..., Mn ∈ M_{d(n−1) × d(n)}

SLIDE 51

If we decide to cut at index k (k matrices for the left child), then:
We have two matrices of dimensions d(i−1) × d(k) and d(k) × d(j) to multiply → cost = d(i−1) d(k) d(j)
Final cost = d(i−1) d(k) d(j) + cost(i, ..., k) + cost(k+1, ..., j)
Of all possible costs, we want the lowest possible, so:
cost(i, j) = min_{k=i,...,j−1} { d(i−1) d(k) d(j) + cost(i, k) + cost(k+1, j) }
The code readily follows:

SLIDE 52

Input: dimensions d0, ..., dn of n matrices
Output: C[1, n] as the lowest cost
Init: C ← array(n, n)
for i = 1, ..., n do
  C[i, i] ← 0
end for
for size = 1, ..., n − 1 do
  for i = 1, ..., n − size do
    j ← size + i
    C[i, j] ← min_{k=i,...,j−1} { d(i−1) d(k) d(j) + C[i, k] + C[k+1, j] }
  end for
end for
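The table-filling pseudocode translates to C as follows (a sketch; the function name and the arbitrary bound of 32 matrices are ours). d[0..n] holds the dimensions, so matrix i is d[i-1] × d[i]; the function returns the minimum number of scalar multiplications.

```c
#include <limits.h>

long matrix_chain(const long *d, int n)   /* n matrices, n <= 32 here */
{
    static long C[33][33];
    int i, k, size;
    for (i = 1; i <= n; i++) C[i][i] = 0;       /* single matrix: free */
    for (size = 1; size < n; size++)            /* chain length - 1    */
        for (i = 1; i <= n - size; i++) {
            int j = size + i;
            long best = LONG_MAX;
            for (k = i; k < j; k++) {           /* try every cut point */
                long c = d[i-1]*d[k]*d[j] + C[i][k] + C[k+1][j];
                if (c < best) best = c;
            }
            C[i][j] = best;
        }
    return C[1][n];
}
```

The three nested loops give the classical O(n³) running time with an O(n²) table; recording the best k for each (i, j) would let us print the optimal parenthesizing as well.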

SLIDE 53

Exercise
A contiguous subsequence of a list S is a subsequence made up of consecutive elements of S. For instance, if S = {1, −3, 2, 7, 8, 0, 3}, then {−3, 2, 7} is a contiguous subsequence, but {2, 7, 3} is not. Give a linear-time algorithm to compute the contiguous subsequence of maximum sum. Hint: consider subsequences ending exactly at position j.

Exercise
You are given a piece of fabric with integer dimensions X × Y. You have a set of n template objects, each of which requires a piece of fabric with integer dimensions xi × yi to be copied. If you produce a copy of object i, your profit is ci; you can produce as many copies of any object as you want, or none. You have a machine that can cut any piece of fabric into two pieces, either horizontally or vertically. Propose an algorithm which tells you how to maximize your profit.

SLIDE 54

Exercise
Let P be a convex polygon with n vertices. A triangulation of P is a set of n − 3 diagonals of P, no two of which intersect each other. The cost of a triangulation is the sum of the lengths of all its diagonals. Give an efficient algorithm to compute a triangulation of P of minimal cost, and evaluate its complexity.

Exercise
The travelling salesman problem is NP-complete (see forthcoming chapter). A relaxed version restricts the problem to bitonic cycles: in such cycles, one is only allowed to visit points from left to right, then from right to left. Propose an O(n²) algorithm to solve the TSP problem under this hypothesis.

SLIDE 55

Exercise
You have a machine that can process only a single task at a time. You have n such tasks a1, ..., an, whose respective durations are ti seconds, and whose (absolute) execution deadlines are di. If you terminate task ai before di, you earn pi Euros; otherwise, you earn nothing. Propose an algorithm to find the scheduling that maximises your benefit, and evaluate its complexity.

SLIDE 56

Greedy algorithms

SLIDE 57

Consider a slightly modified version³ of the last exercise from the dynamic programming chapter:
Suppose you have a machine that can process only a single task at a time. You have a set S of n such tasks S = {τ1, ..., τn}. Task τi must begin at time bi, and must be terminated before time ei. Propose an algorithm to find the scheduling that maximises the number of tasks performed.
Can we do better than the solution of D.P.? Assume (wlog) that the tasks are indexed by increasing ending dates:
e1 ≤ e2 ≤ ... ≤ en
Commit that b0 = −∞, e0 = 0, and bn+1 = en+1 = +∞.

³ We have changed the profits ci's to 1's

SLIDE 58

Figure 3: Sorted intervals for the task selection problem (intervals [b1, e1], [b2, e2], ..., drawn sorted by increasing ei)

Define S_ij = {τk : ei ≤ bk < ek < bj}. The searched solution is S_{0,n+1}. First observe that for all i ≥ j, S_ij = ∅: if there were τk ∈ S_ij, then ei ≤ bk < ek < bj < ej, so ei < ej with i ≥ j, a contradiction (the ei's are in increasing order).

SLIDE 59

Suppose S⋆_ij is an optimal subset of pairwise-compatible tasks of S_ij: every feasible T ⊆ S_ij satisfies |T| ≤ |S⋆_ij|. Moreover, assume S⋆_ij ≠ ∅, so that some τk ∈ S⋆_ij exists. Then
S⋆_ij = S⋆_ik ∪ {τk} ∪ S⋆_kj
Because all three subsets in the union are disjoint⁴, it follows that
|S⋆_ij| = |S⋆_ik| + 1 + |S⋆_kj|
Remind that S_ij = ∅ whenever i ≥ j, and so is S⋆_ij. Thus, if P[i, j] is a table representing our profit, then
P[i, j] = 0 if S_ij = ∅
P[i, j] = 1 + max_{i<k<j} {P[i, k] + P[k, j]} otherwise
A DP code could readily follow... however, there is this:

⁴ For otherwise, the subset would not be optimal

SLIDE 60

Theorem
Assume S_ij ≠ ∅, and denote by τm ∈ S_ij the task with the lowest ending date. We claim that:

1. τm ∈ S⋆_ij
2. S_im = ∅

Proof.
Point 2: Suppose τk ∈ S_im ≠ ∅. By definition of S_im: ei ≤ bk < ek < bm < em, so ek < em, contradicting the minimality of em.
Point 1: let τk be the task of S⋆_ij with the lowest ending time. If it happens that k = m, the proof is achieved. If not, observe that all intervals (b, e) in S⋆_ij must be disjoint. Since em ≤ ek, replacing τk by τm keeps the set feasible: τm ends no later than τk does, hence before the beginning date of the next task in S⋆_ij. Both subsets have the same size, so using τk or τm makes no difference; therefore we may assume τm ∈ S⋆_ij.

SLIDE 61

Very important consequences:

1 Don't bother evaluating Sim: you won't find anything interesting there
2 This results in a dramatically simple algorithm:
  Find the task a with the lowest ending date
  Add it to the optimal solution ← the greedy choice
  Discard all tasks whose beginning date is before the ending date of a
  Loop to step 1 until no task remains
3 Complexity: Θ(n) if the tasks are given sorted by their ending date, Θ(n log n) if they are not.

SLIDE 62

Resulting algorithm:

Input: n tasks τi = (bi, ei)
Output: A = optimal scheduling of the τ's

Sort the τi's by increasing ei's
tmax ← 0, A ← ∅
for i = 1..n do
  if tmax ≤ bi then
    tmax ← ei
    A ← A ∪ {τi}
  end if
end for

In two words:

Greedy algorithms make the most immediately profitable choice among a (possibly very large) set of choices
That choice is never called back into question later on
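The pseudocode above, as a runnable sketch (the function name and the pair encoding of tasks are ours):

```python
# Greedy task selection: sort by ending date, then keep every task that
# begins after (or exactly when) the last selected task ends.
def schedule(tasks):
    """tasks: list of (b, e) pairs; returns a maximum-size compatible subset."""
    A = []
    t_max = float("-inf")                            # ending date of the last pick
    for b, e in sorted(tasks, key=lambda t: t[1]):   # increasing e_i
        if b >= t_max:                               # compatible: the greedy choice
            t_max = e
            A.append((b, e))
    return A

print(schedule([(1, 4), (3, 5), (4, 7)]))  # [(1, 4), (4, 7)]
```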

SLIDE 63

They might be the answer to an optimal solution (like in task scheduling): don't miss that chance, and try to derive the proof. They might be suboptimal (in time or objective) as well. Quite often, the greedy strategy finds not-too-bad solutions to NP-complete problems.

Example 2: Huffman coding
Assume a file F contains only 5 characters: a, b, c, d, e, with numbers of occurrences na = 50000, nb = nc = 800, nd = 600, ne = 300. Since 2^2 < 5 < 2^3, a fixed-length coding requires 3 bits/character, so size(F) = 52500 × 3 = 157500 bits ≈ 19688 bytes

SLIDE 64

However, if a = 1, b = 010, c = 011, d = 000, and e = 001, then size(F) = 50000 × 1 + (800 + 800 + 600 + 300) × 3 = 57500 bits ≈ 7188 bytes

(Figure: the corresponding Huffman tree, with the five symbols a, b, c, d, e at its leaves and each edge labelled 0 or 1.)

symbol  code  length
a       1     1
b       010   3
c       011   3
d       000   3
e       001   3

SLIDE 65

In a Huffman tree, symbols are stored in leaves only ⇒ coding sequences are variable in length, but unique per symbol. Questions:

1 What is the optimal structure of such a tree given an alphabet Λ with 2 or more symbols, and the numbers of occurrences n : s ∈ Λ → n(s)? (Optimal meaning: the one which yields the lowest coding cost, not accounting for the Huffman tree itself.)
2 Assuming the optimal structure is known, where should symbols be stored?

We can answer both thanks to 2 lemmas.

Lemma
Let T be a Huffman (optimal) tree, and x, y ∈ Λ denote the 2 symbols with the lowest numbers of occurrences amongst all. Then x and y have to be stored on leaves of T with the highest possible depth.

SLIDE 66

Proof. Call d : s ∈ Λ → N the function that gives the depth of any symbol s of Λ in T. Then the cost of T is

C(T) = Σ s∈Λ d(s)n(s)

Suppose x is not located on a leaf of T with maximal depth: ∃z ∈ Λ : d(z) > d(x), n(z) ≥ n(x). Call T′ a copy of T in which x and z have been swapped. Denote d′ the related depth function. Then

C(T′) − C(T) = d′(x)n(x) + d′(z)n(z) + Σ s∈Λ\{x,z} d′(s)n(s) − d(x)n(x) − d(z)n(z) − Σ s∈Λ\{x,z} d(s)n(s)

SLIDE 67

= d′(x)n(x) + d′(z)n(z) − d(x)n(x) − d(z)n(z)
= n(x)(d′(x) − d(x)) + n(z)(d′(z) − d(z))
= (n(x) − n(z))(d(z) − d(x))   [since d′(x) = d(z) and d′(z) = d(x)]
≤ 0

with strict inequality whenever n(x) < n(z), which contradicts the optimality of T; and when n(x) = n(z), the swap costs nothing, so x may be assumed to lie at maximal depth. The same result holds for y too, without it being necessary to allocate any new node.

Lemma
Consider x, y, Λ, and T as defined in the previous lemma. Introduce a dummy character z with n′(z) = n(x) + n(y), and let Λ′ = (Λ \ {x, y}) ∪ {z}. If T′ is an optimal tree for Λ′, then the tree T obtained from T′ by replacing the leaf z with an internal node whose two sons are x and y is optimal for Λ.

SLIDE 68

Proof. Because x and y are both sons of the node that replaces z,
d(x) = d(y) = d′(z) + 1. This implies

d(x)n(x) + d(y)n(y) = (d′(z) + 1)n(x) + (d′(z) + 1)n(y)
                    = d′(z)(n(x) + n(y)) + n(x) + n(y)
                    = d′(z)n′(z) + n(x) + n(y)

Therefore (copy-paste argument from the previous proof)

C(T) − C(T′) = n(x)d(x) + n(y)d(y) − n′(z)d′(z) = n(x) + n(y)

Now suppose T is not optimal: there exists an optimal coding O with C(O) < C(T), which must have x and y as deepest leaves according to the first lemma.

SLIDE 69

Denote O′ a copy of O with the pair {x, y} replaced by z. Then

C(O′) = C(O) − n(x) − n(y) < C(T) − n(x) − n(y) = C(T′)

which contradicts the optimality of T′.

SLIDE 70

Resulting algorithm:

Input: Λ and n
Output: Huffman tree of (Λ, n)

Q ← ∅  – Q is a priority queue ranked by increasing n's
for s ∈ Λ do
  insert((s, n(s)), Q)
end for
for i = 1..|Λ| − 1 do
  left ← pop(Q)
  right ← pop(Q)
  z ← new node(left, right)
  insert((z, n(left) + n(right)), Q)
end for
return pop(Q)
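The same construction in runnable form, using a binary heap as the priority queue Q (the tuple encoding and the tie-breaking counter are our own additions):

```python
# Huffman tree construction via heapq. Queue entries are
# (occurrences, tie, tree); `tie` keeps heapq from ever comparing trees.
import heapq
from itertools import count

def huffman(freqs):
    """freqs: dict symbol -> occurrences; returns dict symbol -> binary code."""
    tie = count()
    Q = [(n, next(tie), s) for s, n in freqs.items()]
    heapq.heapify(Q)
    while len(Q) > 1:
        n1, _, left = heapq.heappop(Q)    # the two lowest-occurrence nodes...
        n2, _, right = heapq.heappop(Q)
        heapq.heappush(Q, (n1 + n2, next(tie), (left, right)))  # ...merged
    codes = {}
    def walk(node, code):
        if isinstance(node, tuple):       # internal node: visit both sons
            walk(node[0], code + "0")
            walk(node[1], code + "1")
        else:                             # leaf: the root-to-leaf path is the code
            codes[node] = code or "0"
    walk(Q[0][2], "")
    return codes

freqs = {"a": 50000, "b": 800, "c": 800, "d": 600, "e": 300}
codes = huffman(freqs)
print(sum(freqs[s] * len(codes[s]) for s in freqs))  # 57500
```

On the slide's example this recovers the 57500-bit cost computed above (the individual codes may differ by 0/1 relabelling, which does not change the cost).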

SLIDE 71

Example 3: Minimal spanning trees → Jean Cousty's lecture on Kruskal's algorithm and the cut property of MSTs.

SLIDE 72

Example 4: bounded fractional knapsack problem
During a burglary, a robber finds n bags containing different kinds of powders. Powder i is worth pi > 0 Euros a gram, but is available in limited quantity, say qi > 0 grams. Also, the robber cannot carry more than b grams of product altogether. How much of each bag should he steal?

We wish to maximise

S = Σ i=1..n xi pi

subject to

Σ i=1..n xi ≤ b
0 ≤ xi ≤ qi, ∀i = 1, ..., n

Assuming (wlog) that the pi's are sorted in decreasing order, does the following greedy algorithm provide the right answer?

SLIDE 73

Input: pi's, qi's and b
Output: amounts xi to steal from each bag

r ← b
for i = 1..n do
  xi ← min(qi, r)
  r ← r − xi
end for

Justification:
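A runnable sketch of this greedy (names are our own; powders are (price, quantity) pairs, and the function sorts them itself rather than assuming them pre-sorted):

```python
# Greedy fractional knapsack: fill up with the most valuable powder first.
def steal(powders, b):
    """powders: list of (p, q) = (EUR/gram, grams available); b = capacity.
    Returns the amounts taken, in decreasing-price order."""
    x = []
    r = b                                        # remaining carrying capacity
    for p, q in sorted(powders, key=lambda t: -t[0]):
        take = min(q, r)                         # all of it, or whatever still fits
        x.append(take)
        r -= take
    return x

# Capacity 10 g: 6 g at 9 EUR/g, then 4 g at 5 EUR/g.
print(steal([(5, 8), (9, 6)], 10))  # [6, 4]
```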

SLIDE 74

Example 5: bounded integer knapsack problem
Same problem as before, but bag i now contains ni objects, each of which is worth pi Euros and weighs wi grams. Assuming the bags are ordered by decreasing pi's, does the following code produce the right answer?

Input: pi's, wi's, ni's and b
Output: numbers xi of objects to steal in each bag

r ← b
for i = 1..n do
  xi ← min(ni, r ÷ wi)
  r ← r mod wi
end for

Justification:

SLIDE 75

Example 6: vertex cover (to show in class)

SLIDE 76

Summary:

Dynamic programming:
  Make a choice at each step k
  Tabulate the optimal solutions of all sub-problems at steps < k beforehand
  Bottom-up approach

Greedy strategy:
  Make the most profitable choice at each step
  Solve the remaining sub-problem afterwards
  Top-down approach

Optimal sub-structure must be shown in both cases
Greedy additionally requires proving that the most profitable choice leads to an optimal solution

SLIDE 77

Exercise
You are going on a long journey between Antwerpen and Napoli. Once the tank of your car is filled, you know you can do at most n km. You have a roadmap that tells you where the fuel stations are located, and you would like to make as few stops as possible. Give an efficient algorithm to solve this problem.

Exercise
Suppose you have a set of n lectures that need be scheduled in classrooms. Each lecture has fixed (non-modifiable) starting and ending times. You would like to use as few classrooms as possible to schedule all lectures.

1 Describe a naive Θ(n^2) algorithm to determine the scheduling of lectures
2 Try to improve this solution to an O(n log n) time algorithm, and possibly O(n) under some conditions.

SLIDE 78

Exercise
Suppose you have two sequences of n positive numbers A = {ai} i=1..n and B = {bi} i=1..n. You are free to reorganize them as you want, after which you get a profit of Π i=1..n ai^bi. Give a strategy to maximize your profit, and justify it.

Exercise
A service has n customers waiting to be served, and can only serve one at a time. Customer i will spend di minutes in the service, and will have to wait Σ j=1..i−1 dj minutes before being served. The penalty for making customer i wait m minutes is m·pi, where pi > 0 is some constant. You would like to schedule the customers, that is, find a permutation φ : [1 : n] → [1 : n] so as to minimize the overall penalty

P(φ) = Σ i=1..n pφ(i) Σ j=1..i−1 dφ(j)

SLIDE 79

Exercise (cont'd)

1 Consider 3 customers C1, C2, C3, with service durations 3, 5, 7 and priorities 6, 11, 9. Among the possible schedulings φ1(1) = 1, φ1(2) = 3, φ1(3) = 2 and φ2(1) = 3, φ2(2) = 1, φ2(3) = 2, which one is preferable?
2 Consider two schedulings φ1 and φ2, identical everywhere except that φ1 makes customer j served immediately after customer i, while φ2 does just the opposite. What does ∆ = P(φ1) − P(φ2) equal?
3 Derive the expression of an evaluation function f which associates a number to any customer i and decides whether ∆ > 0 or not.
4 Derive an algorithm for this problem, and justify it. Complexity?

SLIDE 80

Exercise
Consider the problem of giving change on n cents using as few coins as possible.

1 Give a greedy algorithm that gives back change using coins of 50, 20, 10, 5 and 1 cents. Show that it is optimal.
2 Suppose now you have k + 1 coins whose values are powers of some constant c > 0, that is, 1, c, c^2, ..., c^k. Prove that your greedy algorithm is still optimal.
3 Give a set of coin values for which the greedy solution is not optimal.
4 Give an optimal O(nk) algorithm which gives back change whatever the values of the coins – but assuming there is always a coin of 1 cent.

SLIDE 81

Exercise
Your boss asks you to organize a party in which new colleagues can meet. So that the party be successful, you believe it reasonable not to invite someone if he/she knows more than n − p persons out of the n, or fewer than p. However, you would like to invite as many people as possible.

1 Propose an algorithm to solve this problem for p = 1. Complexity?
2 Can you generalize your solution to p ≥ 2?

Exercise
Given a graph G = (V, E), a matching is a subset E′ of the edges E such that no two edges of E′ share a node. A matching is perfect if the set of all vertices touched by E′ is exactly V. A tree is a graph that has no cycle (the path that links any two vertices x and y is unique). A forest is a collection of trees. Give an efficient algorithm which determines whether a forest G has a perfect matching or not. What happens if G is a general graph?

SLIDE 82

Introduction to NP-completeness

SLIDE 83

So far, we coped with algorithms whose worst-case time complexity was polynomial: O(n^k), where k > 0, and n = |I| = size of an instance of the problem (in bits). They were at least "acceptable": they could determine a solution in polynomial time within a solution space of exponential size. Examples:

Sorting, in Θ(n log n) time, whereas n! candidates exist
Shortest paths in a graph, in O(|V|^2) time (Dijkstra's algorithm), amongst a set of up to (|V| − 2)! candidate paths

Are there problems for which no polynomial time solution is known – which does not mean that none exists? Equivalently, are there problems for which all we can propose is to enumerate all possible solutions and retain the best one?

SLIDE 84

NP-completeness theory distinguishes optimization and decision problems, and restricts itself to the study of the latter.

Definition
An optimization problem (or search problem) is one of the form: find an x ∈ X, where X is a (possibly infinite) set and f : X → Y is a function into some ordered space (Y, ≤), such that f(x) is maximal.

Classical examples of optimization problems:

1 QCQIP: maximize f(x, y) = 2x^2 − 3y + 7 over Z^2 subject to x^2 + 2(y − 1)^2 ≤ 510 and x^2 + y > 1
2 TSP: given a graph G = (V, E), find a cycle visiting all vertices exactly once, and having minimal total length
3 CLIQUE: given a graph G = (V, E), find a clique of G with a maximal number of vertices.

SLIDE 85

Definition
A decision problem is one of the form: does there exist x ∈ X such that f(x) is true?

Decision versions of the above problems:

1 QCQIP: does there exist (x, y) ∈ Z^2 such that f(x, y) = 2x^2 − 3y + 7 ≥ k, x^2 + 2(y − 1)^2 ≤ 510, and x^2 + y > 1?
2 TSP: does G = (V, E) admit a cycle of length ≤ k visiting all vertices exactly once? Some cases are trivial: k ≥ Σ e∈E length(e), and k ≤ Σ x∈V min y∈V, y≠x length((x, y)).
3 CLIQUE: does G = (V, E) admit a clique of order k? In contrast, no trivial case exists for this problem.

SLIDE 86

Why study only decision problems?

Because if one doesn't know how to solve the decision version in reasonable time, there is no hope of solving the optimization version in reasonable time either.
Equivalently, because except in very particular cases (a search space of infinite dimension, or an objective function that is piecewise monotonic but exhibits an exponential number of discontinuities), a binary search on the threshold k takes O(log(k/ε)) calls to the decision version to determine which part of the space contains the optimal solution within an ε tolerance. Even if k = 2^n, log k = O(n) is polynomial in the size of the input.

SLIDE 87

Definition
A problem is in class P if there exists an algorithm C that can solve any instance I of this problem in polynomial time of |I|. The solution C(I) is ensured to be correct, so P ensures that both computing and checking a solution to a problem can be done in O(|I|^k) time.

Definition
A problem is in class NP if there exists an algorithm C that can check, for any instance I, whether a proposed solution S really is a solution of I in polynomial time. In other words, C(I, S) now returns a boolean value, and runs in polynomial time. NP does not ensure anything more than checkability of a solution in polynomial time.

Lemma
If a problem belongs to P, then it belongs to NP.

SLIDE 88

The big question: how about the converse? Does membership in NP imply membership in P or not? No one has been able to answer this question so far. Note that a problem's belonging to NP does not mean it admits no polynomial solution: it could admit one – but no one has ever found it.

"The classes of problems which are respectively known and not known to have good algorithms are of great theoretical interest. [...] I conjecture that there is no good algorithm for the traveling salesman problem. My reasons are the same as for any mathematical conjecture: (1) It is a legitimate mathematical possibility, and (2) I do not know." – Jack Edmonds, 1966

SLIDE 89

Let's recast the problem in a more convenient way:

the input (instance) of a problem can be seen as a string of bits of length n (we do not bother what this string means: we just know there is an encoding that can encode our input data on n bits)
the output of a search algorithm is again a string of bits
the output of a decision algorithm is a "true/false" answer, so a string of 1 bit

A decision problem is thus equivalent to checking whether a string of bits is correct or not. Language theory offers a convenient framework to express this.

Definition
Let Σ = {0, 1} be the binary alphabet. Denote by Σ^k the set of all strings obtained by concatenating exactly k symbols of Σ, and by ε the empty string. A language is any subset of Σ⋆ = {ε} ∪ Σ^1 ∪ Σ^2 ∪ Σ^3 ∪ ...

SLIDE 90

Definition
An algorithm A accepts an input string s ⇔ A(s) outputs true: A(s) = true. It rejects s ⇔ A(s) = false. A language L is accepted by A ⇔ ∀s ∈ L, A(s) = true. A language L is decided by A ⇔ it is accepted by A and ∀s ∈ L̄ = Σ⋆ \ L, A(s) = false. Moreover, if A runs in polynomial time, the above definitions straightforwardly extend to "in polynomial time" as well: accepted/rejected/decided in polynomial time.

The following theorem recasts P in terms of languages:

Theorem
A language L is in P ⇔ there exists an algorithm that can decide L in polynomial time.

Similarly, since NP is the class of all decision problems/languages whose solutions are verifiable in polynomial time:

SLIDE 91

Theorem
A language L over Σ is in NP if there exists a polynomial-time algorithm A and a polynomial p such that:

x ∈ L ⇔ ∃c ∈ Σ^p(|x|) : A(x, c) = true

Any c ∈ Σ^p(|x|) satisfying A(x, c) = true is called a certificate for x.

A crucial notion in NP-completeness is that of reduction, as defined by Karp:

Definition
Let L1 and L2 be two languages. One says L1 is reducible to L2 in polynomial time, written L1 ≤P L2, if and only if there exists a polynomial-time algorithm f such that s ∈ L1 ⇔ f(s) ∈ L2
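For instance, a polynomial-time verifier A(x, c) for CLIQUE can be sketched as follows (the graph encoding and names are our own illustration; the certificate is simply the claimed set of k vertices):

```python
from itertools import combinations

def check_clique(edges, k, certificate):
    """Accept iff `certificate` is a k-clique: O(k^2) edge lookups,
    polynomial in the instance size, even though *finding* c may not be."""
    return (len(certificate) == k and
            all(frozenset(p) in edges for p in combinations(certificate, 2)))

E = {frozenset(p) for p in [(1, 2), (1, 3), (2, 3), (3, 4)]}
print(check_clique(E, 3, {1, 2, 3}))  # True
print(check_clique(E, 3, {1, 2, 4}))  # False
```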

SLIDE 92

Karp's reduction is of great interest because of the following result:

Lemma
Let L1 and L2 be two languages such that L1 ≤P L2. Then L2 ∈ P ⇒ L1 ∈ P.

Proof
See Fig. 4. Since L1 ≤P L2, there must exist a reduction function f that can compute the image f(x) of any string x in less than T1(x) = c1|x|^u1 time, where c1, u1 are constants > 0. Put y = f(x). Since L2 ∈ P, there must also exist an algorithm A that can decide whether to accept or reject f(x) in less than T2(f(x)) = c2|f(x)|^u2 time. The time needed to accept or reject x as a string of L1 is therefore less than T1(x) + T2(f(x)) = c1|x|^u1 + c2|f(x)|^u2, which is polynomial in |x| since |f(x)| is polynomial in |x|. Therefore, L1 ∈ P.
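The construction in the proof is plain composition: run f, then the decider for L2. A toy illustration with two languages of our own choosing (L1 = nonempty binary strings encoding an even number, L2 = strings ending in '0', f keeping only the last bit):

```python
def f(x):
    # The reduction: the parity of a binary number depends only on its last
    # bit, and f clearly runs in polynomial (here constant) time.
    return x[-1]

def decide_L2(y):
    return y.endswith("0")   # a polynomial-time decider for L2

def decide_L1(x):
    return decide_L2(f(x))   # the composed decider for L1

print(decide_L1("1010"))  # True  (10 is even)
print(decide_L1("111"))   # False
```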

SLIDE 93

Figure 4: Karp reduction of L1 to L2 – the input x is first transformed by f, then the algorithm for L2 is run on f(x); its output, true iff x ∈ L1, is the final output of the algorithm for L1.

Corollary
Let L1 and L2 be two languages such that L1 ≤P L2. If L1 ∉ P, then L2 ∉ P as well.

Proof
Take the contrapositive of the claim of Lemma 22.

The corollary tells us that if we can reduce a language/problem with no known polynomial-time solution to a new language/problem, then the new language/problem has no known polynomial-time solution either: a polynomial algorithm for the new one would immediately yield one for the old.

SLIDE 94

Another useful result on Karp's reduction is its transitivity:

Property
If A ≤P B and B ≤P C, then A ≤P C. (Proof easy and left to the reader.)

We finally come to the central definitions of this chapter:

Definition
A language L is NP-hard if X ≤P L holds for every X ∈ NP. A language is NP-complete, or belongs to the class NPC, if it is both NP-hard and in NP.

(Figure: Euler diagram of the classes P, NP, NPC, and the NP-hard languages.)

SLIDE 95

Unless P = NP, no problem in NPC can be decided in polynomial time. Since Karp's reduction is the common technique to prove NP-hardness, is there a reference problem proven NP-complete?

Theorem (Cook–Levin)
Let F be a Boolean expression in conjunctive normal form of order n, with m clauses:

F(x1, ..., xn) = ∧ i=0..m−1 (y_{ni+1} ∨ y_{ni+2} ∨ ... ∨ y_{ni+n})

where each yi is either any literal amongst the set {x1, ..., xn, ¬x1, ..., ¬xn}, or the false value. The n-SAT problem (or simply SAT) is to assign values to the xi's so that F is true. The SAT problem is NP-complete.

SLIDE 96

Reductions
Many (thousands, indeed) of problems L can be proven NP-complete by proving either SAT ≤P L, or SAT ≤P X ≤P L. Below are only a few examples.

Theorem
3-SAT is NP-complete.

Proof. 3-SAT ∈ NP: the certificate is just an assignment x itself, and it takes 3m = O(|x|) time units to evaluate F on it.
3-SAT is NP-hard: it suffices to find a transform that 1) maps any k-CNF, k > 3, to an equisatisfiable 3-CNF and conversely, and 2) runs in polynomial time. Consider a k-CNF F = C1 ∧ C2 ∧ ... ∧ Cm. Each Cp is a disjunction of the form

Cp = y1 ∨ y2 ∨ ... ∨ yk

Introduce a new variable z, and let

Cp(1) = y1 ∨ ... ∨ y_{k−2} ∨ z,   Cp(2) = y_{k−1} ∨ y_k ∨ ¬z

Then:
SLIDE 97

If Cp is true and both y_{k−1} and y_k are false, we set z to false, and Cp(1) ∧ Cp(2) is true
If Cp is true and either y_{k−1} or y_k is true, we set z to true, and Cp(1) ∧ Cp(2) is true
If Cp is false, then Cp(1) ∧ Cp(2) is false too, irrespective of z

Recursive application of this transform to Cp(1), which has k − 1 literals, eventually results in an equisatisfiable set of k − 2 clauses of at most 3 literals, involving k − 3 new variables zi. This shows that to any assignment of the yi's satisfying Cp corresponds an assignment of the yi's and zi's satisfying the conjunction of the new clauses, and conversely. Applying the transform to each clause of F, we conclude that it takes O(mk) time to transform F, a polynomial time of the length of the input, as required.
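The split, iterated until every clause has at most 3 literals, can be sketched as follows (the encoding is ours: a literal is a signed integer, ±i for xi/¬xi, and fresh variables are numbered from next_var on):

```python
def split_clause(clause, next_var):
    """Turn a k-literal clause (k > 3) into an equisatisfiable set of 3-clauses."""
    out = []
    while len(clause) > 3:
        z = next_var                              # fresh variable
        next_var += 1
        out.append((clause[-2], clause[-1], -z))  # C(2) = y_{k-1} v y_k v ~z
        clause = clause[:-2] + (z,)               # C(1) = y_1 v ... v y_{k-2} v z
    out.append(clause)
    return out, next_var

clauses, _ = split_clause((1, 2, 3, 4, 5), next_var=6)
print(clauses)  # [(4, 5, -6), (3, 6, -7), (1, 2, 7)]
```

A 5-literal clause yields 5 − 2 = 3 clauses and 5 − 3 = 2 new variables, matching the count in the proof.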

SLIDE 98

CLIQUE: a clique of order k of a graph G = (V, E) is a subgraph G′ = (V′ ⊆ V, E′ ⊆ E) with |V′| = k such that ∀(x, y) ∈ V′ × V′, x ≠ y ⇒ (x, y) ∈ E′. In a clique, ∀x ∈ V′, degree(x) = k − 1 (why?). The CLIQUE problem is to find a clique of maximal order.

Figure 6: A graph and a clique of order 4 ({a, b, c, d})

Theorem
The CLIQUE problem is NP-complete.

Proof. Clearly, CLIQUE ∈ NP. We shall show that 3-SAT ≤P CLIQUE to prove that CLIQUE is NP-hard.

SLIDE 99

Given a 3-CNF formula F = C1 ∧ ... ∧ Ck, we must build, in polynomial time, a graph G = (V, E) such that F is satisfiable ⇔ G admits a clique of order k. Write Cp = yp1 ∨ yp2 ∨ yp3 for every p, each ypi being a literal xj or ¬xj by construction. We build G as follows:

V = {(p, i) ∈ [1 : k] × [1 : 3]} encodes all literal occurrences ypi
((p, i), (q, j)) ∈ E ⇔ p ≠ q and ypi ≠ ¬yqj (literals in distinct clauses that do not contradict each other)

This clearly runs in O(k^2), a polynomial time of k. If F is satisfiable, then at least one of yp1, yp2, yp3, say ypl, is true in each clause Cp. Each ypl encodes a vertex of G, and must be linked to at least one of yq1, yq2, yq3 whenever q ≠ p (check that otherwise F would not be satisfiable: two true literals can never contradict each other). So degree(ypl) ≥ k − 1, and since this holds for any p, G has a clique of order k.

SLIDE 100

Conversely, assume G has a clique of order k, and choose any two of its vertices (p, i) and (q, j). Because p ≠ q and ypi ≠ ¬yqj, the literals ypi and yqj can be set to true simultaneously, making Cp and Cq independently evaluate to true. Since two vertices of a same clause are never adjacent, the clique has exactly one vertex per clause; therefore F is satisfied too.
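The reduction, plus a brute-force clique check to try it on tiny formulas (encodings are ours: a literal is a signed integer, a clause a 3-tuple):

```python
from itertools import combinations

def sat_to_clique(clauses):
    """Build G from a 3-CNF: one vertex (p, i) per literal occurrence,
    edges between non-contradictory literals of distinct clauses."""
    V = [(p, i) for p in range(len(clauses)) for i in range(3)]
    E = {frozenset((u, v)) for u, v in combinations(V, 2)
         if u[0] != v[0]                                    # distinct clauses
         and clauses[u[0]][u[1]] != -clauses[v[0]][v[1]]}   # not contradictory
    return V, E

def has_clique(V, E, k):
    # Exponential check, for illustration only.
    return any(all(frozenset(p) in E for p in combinations(C, 2))
               for C in combinations(V, k))

# A satisfiable 3-CNF with 2 clauses -> a clique of order 2 exists...
print(has_clique(*sat_to_clique([(1, 2, 3), (-1, -2, 3)]), 2))   # True
# ...and an unsatisfiable one -> no clique of order 2.
print(has_clique(*sat_to_clique([(1, 1, 1), (-1, -1, -1)]), 2))  # False
```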
VERTEXCOVER
Given a graph G = (V, E), a vertex cover of G is a subset V′ of its vertices such that ∀(x, y) ∈ E, {x, y} ∩ V′ ≠ ∅. The VERTEXCOVER problem is to find a vertex cover of G with a minimal number of vertices.

Theorem
VERTEXCOVER is NP-complete.

Proof. VERTEXCOVER is in NP: it takes O(nk) time to mark all vertices covered by a certificate of size k in a graph of size n, and O(n^2) time to check that every edge is covered. We show CLIQUE ≤P VERTEXCOVER to prove the NP-hardness of VERTEXCOVER. Consider Ḡ = (V, Ē), the complement graph of G, in which Ē = V^2 \ E.

SLIDE 101

We show that Ḡ has a vertex cover of size |V| − k ⇔ G has a clique C of order k.
⇒: let C be a clique of G of order k. Take any edge (x, y) ∈ Ē; by definition, (x, y) ∉ E, so at least one of x or y is not a vertex of C. This amounts to saying {x, y} ∩ (V \ C) ≠ ∅, so (x, y) is covered by some vertex of V \ C. Selecting all possible edges (x, y) ∈ Ē, we get that V \ C covers Ē, and |V \ C| = |V| − k.
⇐: assume Ḡ has a vertex cover C′ of size |V| − k. Choose any two vertices x, y ∉ C′: if (x, y) were in Ē, we would have {x, y} ∩ C′ = ∅, a contradiction; equivalently, ∀x, y ∈ V, x ∉ C′ and y ∉ C′ ⇒ (x, y) ∈ E. Therefore the set V \ C′ is maximally connected, i.e. is a clique, and its size equals |V| − |C′| = k.
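The equivalence at the heart of this proof can be checked exhaustively on a small graph (the graph and encodings are of our own choosing):

```python
from itertools import combinations

def is_clique(E, C):
    return all(frozenset(p) in E for p in combinations(C, 2))

def is_cover(E, W):
    return all(e & W for e in E)           # every edge touches W

V = {1, 2, 3, 4}
E = {frozenset(p) for p in [(1, 2), (1, 3), (2, 3), (3, 4)]}
Ebar = {frozenset(p) for p in combinations(sorted(V), 2)} - E

# C is a clique of G  <=>  V \ C is a vertex cover of the complement graph.
ok = all(is_clique(E, C) == is_cover(Ebar, V - set(C))
         for k in range(len(V) + 1) for C in combinations(sorted(V), k))
print(ok)  # True
```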

SLIDE 102

Exercise
Consider ZOE (zero-one equations): given an m × n matrix A filled with 0's and 1's, is there an n-vector x filled with 0's and 1's only such that Ax = 1 (the all-ones vector)?

1 Show that ZOE is NP-complete.
2 Infer that ILP (integer linear programming) is NP-hard.

SLIDE 103

Exercise
Bonnie and Clyde have just robbed a bank. They have a loot consisting of n dollars, and they would like to split it into 2 parts of equal size. State whether splitting the loot is a P or an NP-complete problem in each of the following cases:

they have coins of x and y dollars
they have coins of 1, 2, 4, 8, ..., 2^i dollars
they have bearer bonds of arbitrary value
same, but they don't mind if the difference between the 2 parts is less than 100 dollars

SLIDE 104

Exercise
You are given a graph G = (V, E), a subset V′ of its vertices, and a positive integer k > 0. You are looking for a minimal spanning tree T of G under some particular conditions. State whether each of these conditions leads to a P or an NP-complete problem, and justify:

The leaves of T should be exactly V′
The leaves of T should be chosen only from V′
The leaves of T should include V′
T must contain k or fewer leaves
T must contain exactly k leaves
T must contain k or more leaves

SLIDE 105

Exercise
Professor Cacochyme Valétudinaire claims the following: "We know that the CLIQUE problem in general graphs is NP-complete, so it is enough to present a reduction from CLIQUE-3 to CLIQUE. Given a graph G with vertices of degree 3, and a parameter g, the reduction leaves the graph and the parameter unchanged: clearly the output of the reduction is a possible input for the CLIQUE problem. Furthermore, the answer to both problems is identical. This proves the correctness of the reduction and, therefore, the NP-completeness of CLIQUE-3." Is this correct or not?

SLIDE 106

Exercise (cont'd)
He moreover claims this: "We present a reduction from VC-3 to CLIQUE-3. Given a graph G = (V, E) with node degrees bounded by 3, and a parameter b, we create an instance of CLIQUE-3 by leaving the graph unchanged and switching the parameter to |V| − b. Now, a subset C ⊆ V is a vertex cover in G if and only if the complementary set V \ C is a clique in G. Therefore G has a vertex cover of size b if and only if it has a clique of size |V| − b. This proves the correctness of the reduction and, consequently, the NP-completeness of CLIQUE-3." Is this proof valid? Describe an O(|V|^4) algorithm to solve CLIQUE-3.