SLIDE 1

CPSC 490 DP - Guest Lecture

Part 4: DP Optimizations

David Zheng 2020/01/30

University of British Columbia

SLIDE 2

Announcements

  • An upsolver for A1 has been released! You can get 25% of the marks on the questions you didn't solve.

SLIDE 3

Guest Lecturer: David Zheng

  • Current coach of the ICPC competitive programming team

  • ICPC World Finalist in 2018 and 2019
  • Master’s student in Computer Science
  • Taught CS490 in 2017W2

SLIDE 4

What is DP Optimization?

What do you do when you work really hard to come up with a dynamic program, but the runtime is too slow? For example, suppose 1 ≤ k ≤ n ≤ 5000, but your algorithm runs in O(n²k). If we exploit the underlying structure of the problem at hand, it's possible to optimize to a solution that runs in

  • 1. O(n²)
  • 2. O(nk log n) or O(nk)
  • 3. O(n log n)

This is all VERY dependent on the structure of the original problem you are trying to solve.

SLIDE 5

How do you do DP Optimization?

This is a general outline of how to solve harder DP problems.

  • 1. Understand the problem. (This is sometimes hard!)
  • 2. Make observations about the problem. These range from simple to complex observations.
  • 3. Formulate a dynamic program by exploiting the recursive substructure you observed.
  • 4. Make further observations about the underlying structure of the dynamic program.
  • 5. Optimize the runtime with one of these common techniques:
  • Range min/max query structures,
  • Knuth Optimization (*),
  • Divide and Conquer Optimization (*),
  • Convex Lower/Upper Envelope (?),
  • Sophisticated Binary Search (AKA the “Alien’s trick”) (?).

We will be discussing the optimizations with (*) and maybe the ones with (?) if we have time.

SLIDE 6

Overview of the Lecture

One problem. Six subproblems. We will do the following:

  • 1. Understanding the problem
  • 2. Making observations
  • 3. Cooking up a DP
  • 4. Making more observations (about the DP)
  • 5. DP Optimization 1: Knuth Optimization
  • 6. DP Optimization 2: Divide and Conquer Optimization
  • 7. DP Optimization 3: Convex Hull Optimization
  • 8. DP Optimization 4: Binary Search on the Marginal (Alien’s trick)

Note: Typically you cannot use all of these DP optimization tricks on the same problem.

SLIDE 7

Motivating Problem (IOI 2016 Aliens) - Part 1

Our satellite has just discovered an alien civilization on a remote planet. We have already obtained a low-resolution photo of a square area of the planet. The photo shows many signs of intelligent life. Our experts have identified n points of interest in the photo. We now want to take high-resolution photos that contain all of those points. Internally, the satellite has divided the area of the low-resolution photo into an m by m grid of unit square cells. Our satellite is on a stable orbit that passes directly over the main diagonal of the grid. The main diagonal is the line segment that connects the top left and the bottom right corner of the grid.

SLIDE 8

Motivating Problem (IOI 2016 Aliens) - Part 2

The satellite can take a high-resolution photo of any area that satisfies the following constraints:

  • 1. the shape of the area is a square,
  • 2. two opposite corners of the square both lie on the main diagonal of the grid,
  • 3. each cell of the grid is either completely inside or outside the photographed area.

The satellite is able to take at most k high-resolution photos. Once the satellite is done taking photos, it will transmit the high-resolution photo of each photographed cell to our home base (regardless of whether that cell contains some points of interest). The data for each photographed cell will only be transmitted once, even if the cell was photographed several times. Thus, we have to choose at most k square areas that will be photographed, ensuring that:

  • 1. each cell containing at least one point of interest is photographed at least once, and
  • 2. the number of cells that are photographed at least once is minimized.

Your task is to find the minimum possible total number of photographed cells.

SLIDE 9

Understanding the problem - Part 1

The middle diagram shows a solution that photographs 41 cells, while the right one is optimal and photographs only 25 cells. Reminder: n points, the grid is m by m, at most k regions, must be squares on the diagonal.

SLIDE 10

Understanding the problem - Part 2

The right set of photos is the optimal one. Since squares on the diagonal are symmetric, if a square covers a point (x, y), it also covers its mirror image (y, x). We can therefore assume that if point i is at (xi, yi), then xi ≤ yi. Reminder: n points, the grid is m by m, at most k regions, must be squares on the diagonal.

SLIDE 11

Goal for this Lecture

Goal: We want to solve this problem with these constraints: 1 ≤ k ≤ n ≤ 100 000 and 1 ≤ m ≤ 1 000 000. This is hard, so let's start small to get an understanding of this problem.

SLIDE 12

Warmup Question - Understanding the Problem (Subtask 1)

Can you come up with a solution that will run fast enough if we have the following constraints? 1 ≤ k = n ≤ 50 and 1 ≤ m ≤ 100. Discuss! Hint: k = n makes this much easier. Reminder: n points, the grid is m by m, at most k regions, must be squares on the diagonal.

SLIDE 13

Warmup Question - Find a solution (Subtask 1)

Can you come up with a solution that will run fast enough if we have the following constraints? 1 ≤ n = k ≤ 50 and 1 ≤ m ≤ 100. Solution Since k = n, we can afford to have a square region for every point. For a point at (ri, ci) we need a square with corners at (ri, ri) and (ci, ci). We can mark parts of the grid as covered, then go through the grid and count the covered grid cells in O(nm²). Reminder: n points, the grid is m by m, at most k regions, must be squares on the diagonal.
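A minimal sketch of this idea in code (an illustration, not code from the slides): r[] and c[] are assumed to be 0-indexed arrays of the point coordinates, and the grid is 0-indexed as well.

    // Subtask 1 (k = n): one photo per point, then count covered cells in O(n m^2).
    #include <bits/stdc++.h>
    using namespace std;

    int count_covered(int n, int m, const vector<int>& r, const vector<int>& c) {
        vector<vector<char>> covered(m, vector<char>(m, 0));
        for (int i = 0; i < n; i++) {
            int lo = min(r[i], c[i]), hi = max(r[i], c[i]);   // square with corners (lo, lo) and (hi, hi)
            for (int a = lo; a <= hi; a++)
                for (int b = lo; b <= hi; b++)
                    covered[a][b] = 1;
        }
        int total = 0;
        for (int a = 0; a < m; a++)
            for (int b = 0; b < m; b++)
                total += covered[a][b];
        return total;                                          // number of photographed cells
    }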

SLIDE 14

Intro Problem - Understand the Problem (Subtask 2)

Can you come up with a solution that will run fast enough if we have the following constraints? 1 ≤ k ≤ n ≤ 500, 1 ≤ m ≤ 1000, and all points are on the diagonal, so point i is at (xi, xi). Discuss! (Make observations and work towards a DP.) Note that we may have fewer photos than we have points this time! Example: n = 5, k = 3, m = 7. Reminder: n points, the grid is m by m, at most k regions, must be squares on the diagonal.

SLIDE 15

Intro Problem - Make Observations (Subtask 2)

1 ≤ k ≤ n ≤ 500, 1 ≤ m ≤ 1000, and all points are on the diagonal, so point i is at (xi, xi). Observations

  • We can preprocess the points to remove duplicates and sort based on xi.
  • If a photo covers both (a, a) and (b, b), it must cover all points in between.
  • Each photo must be of the form (xi, xj); photos that don't do this are worse.
  • The photos we take will never overlap.

The problem is now to find k segments that cover the n points such that the sum of the squares of their lengths is minimized. Example: n = 5, k = 3, m = 7.

SLIDE 16

Intro Problem - Formulate DP (Subtask 2)

Solution Let f(i, j) be the minimum cost of covering the first i points with at most j photos. Base case: f(0, j) = 0 for all 0 ≤ j ≤ k.

f(i, j) = min_{0 ≤ t < i} { f(t, j − 1) + (x_{i−1} − x_t + 1)² }

Our solution can be found at f(n, k). We have a total of O(nk) states, and transitions take O(n), so the overall runtime is O(n²k). Example: n = 5, k = 3, m = 7.
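As a concrete reference, here is a minimal sketch of this O(n²k) DP (my own naming: xs holds the sorted, de-duplicated diagonal coordinates); it is an illustration, not code from the slides.

    #include <bits/stdc++.h>
    using namespace std;
    typedef long long ll;
    const ll INF = 1e18;

    // f[i][j] = minimum cost of covering the first i points with at most j photos
    ll solve_diagonal(const vector<ll>& xs, int k) {
        int n = xs.size();
        vector<vector<ll>> f(n + 1, vector<ll>(k + 1, INF));
        for (int j = 0; j <= k; j++) f[0][j] = 0;              // base case
        for (int i = 1; i <= n; i++)
            for (int j = 1; j <= k; j++)
                for (int t = 0; t < i; t++) {                   // last photo covers points t..i-1
                    if (f[t][j - 1] == INF) continue;
                    ll side = xs[i - 1] - xs[t] + 1;
                    f[i][j] = min(f[i][j], f[t][j - 1] + side * side);
                }
        return f[n][k];
    }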

SLIDE 17

DP Problem - Understand the Problem (Subtask 3)

1 ≤ k ≤ n ≤ 500, 1 ≤ m ≤ 1000, and points can be at general coordinates (xi, yi). Discuss! (Make observations and work towards a DP.) Example: n = 5, k = 3, m = 7. Reminder: n points, the grid is m by m, at most k regions, must be squares on the diagonal.

SLIDE 18

DP Problem - Make Observations (Subtask 3)

1 ≤ k ≤ n ≤ 500, 1 ≤ m ≤ 1000, and points can be at general coordinates (xi, yi). Observations

  • Most of the observations we made before still hold!
  • The photos we take can now overlap.
  • To cover (xi, yi), we need at least a square from (xi, xi) to (yi, yi). Any square that begins before and ends after will also cover it.
  • Our solution looks like a set of intervals that cover some given intervals.
  • We will never choose two intervals where one is contained in the other.

Example: n = 5, k = 3, m = 7. Induces intervals (0, 2), (1, 3), (3, 4), (4, 4), and (4, 6).

SLIDE 19

DP Problem - Formulate a DP (Subtask 3)

Solution The same DP from last time almost works. Sort the required intervals (xi, yi) by left endpoint after preprocessing. Base case: f(0, j) = 0 for all 0 ≤ j ≤ k.

f(i, j) = min_{0 ≤ t < i} { f(t, j − 1) + (y_{i−1} − x_t + 1)² − (max(0, y_{t−1} − x_t + 1))² }

The new term is the overlapping square! (It is exactly this square that gets overcounted.) Our solution can be found at f(n, k). Example: n = 5, k = 3, m = 7. Induces intervals (0, 2), (1, 3), (3, 4), (4, 4), and (4, 6).
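Written out as code, the transition cost looks roughly like the sketch below (my own naming: x[] and y[] are the preprocessed intervals, 0-indexed and sorted by left endpoint; this is an assumption, not code from the slides). The later sketches below reuse this helper.

    #include <bits/stdc++.h>
    using namespace std;
    typedef long long ll;

    vector<ll> x, y;   // preprocessed intervals; interval i is [x[i], y[i]]

    // Cost of covering intervals t..i-1 with one photo: its side is y[i-1] - x[t] + 1,
    // minus the square it shares with the previous photo, which ends at y[t-1].
    ll cost(int t, int i) {
        ll side = y[i - 1] - x[t] + 1;
        ll overlap = (t == 0) ? 0 : max(0LL, y[t - 1] - x[t] + 1);
        return side * side - overlap * overlap;
    }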

SLIDE 20

Optimization (Part 1) - Make more Observations (Subtask 4)

1 ≤ k ≤ n ≤ 4 000 and 1 ≤ m ≤ 1 000 000

f(i, j) = min_{0 ≤ t < i} { f(t, j − 1) + (y_{i−1} − x_t + 1)² − (max(0, y_{t−1} − x_t + 1))² }

Now, O(n²k) is going to be way too slow! It might help to define this variable: A(i, j) is the best t that we can choose for f(i, j). Claim: A(i, j − 1) ≤ A(i, j) ≤ A(i + 1, j)

SLIDE 21

Optimization (Part 1) - Make more Observations (Subtask 4)

1 ≤ k ≤ n ≤ 4 000 and 1 ≤ m ≤ 1 000 000

f(i, j) = min_{0 ≤ t < i} { f(t, j − 1) + (y_{i−1} − x_t + 1)² − (max(0, y_{t−1} − x_t + 1))² }

A(i, j) is the best t that we can choose for f(i, j). Claim: A(i, j − 1) ≤ A(i, j) ≤ A(i + 1, j). Both inequalities "feel" intuitive, but proving them formally is challenging. Observation: f(i, j) only depends on f(∗, j − 1). This is enough to do Knuth Optimization! (This is a somewhat non-standard Knuth optimization.)

SLIDE 22

Aside 1 - Knuth Optimization (Part 1)

Standard Knuth Optimization We have a DP that runs in O(n³) and looks like:

f(i, j) = cost(i, j) + min_{i < t < j} { f(i, t) + f(t, j) }

This sort of recursion often shows up for problems that involve splitting an interval into pieces, or gluing an interval together from pieces, where there is a cost associated with each step.

Suppose the costs satisfy the quadrangle inequality, as well as monotonicity, for all a ≤ b ≤ c ≤ d:

cost(a, c) + cost(b, d) ≤ cost(a, d) + cost(b, c) (Quadrangle Inequality)
cost(b, c) ≤ cost(a, d) (Monotonicity)

Let A(i, j) denote the optimal choice of t. Then the above inequalities imply that:

A(i, j − 1) ≤ A(i, j) ≤ A(i + 1, j)

SLIDE 23

Aside 1 - Knuth Optimization (Part 2)

Standard Knuth Optimization We have a DP that looks like:

f(i, j) = cost(i, j) + min_{i < t < j} { f(i, t) + f(t, j) }

This is the important inequality: A(i, j − 1) ≤ A(i, j) ≤ A(i + 1, j). We can rewrite our recurrence as:

f(i, j) = cost(i, j) + min_{A(i, j−1) ≤ t ≤ A(i+1, j)} { f(i, t) + f(t, j) }

If we solve subproblems in order of increasing j − i, we can compute A(i, j − 1) and A(i + 1, j) before we solve f(i, j).

SLIDE 24

Aside 1 - Knuth Optimization Efficiency

Standard Knuth Optimization

f(i, j) = cost(i, j) + min_{A(i, j−1) ≤ t ≤ A(i+1, j)} { f(i, t) + f(t, j) }

Now we can count the number of candidates t we iterate over when we solve all f(i, j) whose endpoints differ by exactly ℓ + 1:

Σ_{i=0}^{n−ℓ−1} [ 1 + A(i + 1, i + ℓ + 1) − A(i, i + ℓ) ] = n − ℓ + A(n − ℓ, n) − A(0, ℓ) ≤ 2n

Since we need to do this for all lengths of intervals from 0 to n, applying Knuth optimization makes this algorithm O(n²). This is a factor-of-n speedup from our original O(n³) algorithm! This optimization is one of the easiest ones to code: just create a separate array to look up A.
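To make the loop structure concrete, here is a hedged, generic sketch (mine, not from the slides). It uses the equivalent closed-interval form f[i][j] = cost(i, j) + min over i ≤ t < j of f[i][t] + f[t+1][j], and assumes a problem-specific cost(i, j) that satisfies the quadrangle inequality and monotonicity.

    #include <bits/stdc++.h>
    using namespace std;
    typedef long long ll;

    ll cost(int i, int j);   // hypothetical interval cost, assumed to satisfy the conditions above

    ll solve_knuth(int n) {
        const ll INF = LLONG_MAX / 4;
        vector<vector<ll>> f(n, vector<ll>(n, 0));
        vector<vector<int>> A(n, vector<int>(n, 0));
        for (int i = 0; i < n; i++) A[i][i] = i;                  // length-1 intervals cost nothing here
        for (int len = 2; len <= n; len++)                        // increasing j - i
            for (int i = 0; i + len - 1 < n; i++) {
                int j = i + len - 1;
                f[i][j] = INF;
                int lo = A[i][j - 1], hi = min(j - 1, A[i + 1][j]);   // Knuth's restricted window
                for (int t = lo; t <= hi; t++) {
                    ll val = f[i][t] + f[t + 1][j] + cost(i, j);
                    if (val < f[i][j]) { f[i][j] = val; A[i][j] = t; }
                }
            }
        return f[0][n - 1];
    }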

SLIDE 25

Optimization (Part 1) - Knuth Optimization (Subtask 4)

Back to our original problem: 1 ≤ k ≤ n ≤ 4 000 and 1 ≤ m ≤ 1 000 000

f(i, j) = min_{0 ≤ t < i} { f(t, j − 1) + (y_{i−1} − x_t + 1)² − (max(0, y_{t−1} − x_t + 1))² }

Observation 1: A(i, j − 1) ≤ A(i, j) ≤ A(i + 1, j). Observation 2: This DP recurrence for f(i, j) only uses values from f(∗, j − 1). Hence we can compute values of f in increasing j and decreasing i and, much like before, guarantee an O(n²) runtime like in standard Knuth Optimization! (The analysis of the runtime is exactly the same.)
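A hedged sketch of this layered variant (an illustration, not the author's code): it keeps only two rows of f and of A, and assumes the globals x, y and the cost(t, i) helper from the sketch after the Subtask 3 DP.

    #include <bits/stdc++.h>
    using namespace std;
    typedef long long ll;
    const ll INF = 1e18;

    int n, k;
    ll cost(int t, int i);                             // defined in the earlier sketch

    ll solve_knuth_layered() {
        vector<ll> prv(n + 1, INF), cur(n + 1, INF);   // f(*, j-1) and f(*, j)
        vector<int> Aprv(n + 2, 0), Acur(n + 2, 0);    // A(*, j-1) and A(*, j)
        prv[0] = 0;                                    // f(0, 0) = 0; f(i, 0) = infinity otherwise
        for (int j = 1; j <= k; j++) {
            cur[0] = 0;                                // f(0, j) = 0
            Acur[n + 1] = n - 1;                       // there is no A(n+1, j): allow every t < n
            for (int i = n; i >= 1; i--) {             // decreasing i within layer j
                cur[i] = INF;
                int lo = Aprv[i], hi = min(i - 1, Acur[i + 1]);
                for (int t = lo; t <= hi; t++) {
                    if (prv[t] == INF) continue;
                    ll val = prv[t] + cost(t, i);
                    if (val < cur[i]) { cur[i] = val; Acur[i] = t; }
                }
            }
            swap(prv, cur);
            swap(Aprv, Acur);
        }
        return prv[n];                                 // f(n, k)
    }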

SLIDE 26

Optimization (Part 2) - Divide and Conquer (Subtask 5)

Now let's look at some new constraints: 1 ≤ k ≤ 100 and 1 ≤ n ≤ 50 000 and 1 ≤ m ≤ 1 000 000

f(i, j) = min_{0 ≤ t < i} { f(t, j − 1) + (y_{i−1} − x_t + 1)² − (max(0, y_{t−1} − x_t + 1))² }

Observation 1: A(i, j) ≤ A(i + 1, j) (for fixed j, A(i, j) is monotonically increasing). Observation 2: This DP recurrence for f(i, j) only uses values from f(∗, j − 1). This is enough for standard Divide and Conquer DP optimization!

SLIDE 27

Aside 2 - Divide and Conquer (Part 1)

Standard Divide and Conquer Optimization The typical form of the dynamic program looks like:

f(i, j) = min_{0 ≤ t < i} { f(t, j − 1) + cost(i, t, j) }

This frequently arises when we need to partition n objects into at most k contiguous groups, but we have costs associated with each group (the cost function varies with the question). Let A(i, j) denote the optimal choice of t. We can do Divide and Conquer if we have monotonicity: A(i, j) ≤ A(i + 1, j)

SLIDE 28

Aside 2 - Divide and Conquer (Part 2)

Standard Divide and Conquer Optimization

f(i, j) = min_{0 ≤ t < i} { f(t, j − 1) + cost(i, t, j) }   and   A(i, j) ≤ A(i + 1, j)

High level idea: Loop over j, compute A(Imid, j) for the middle index, and use that to guide your search on each half.

    function Calculate(j, Imin, Imax, Tmin, Tmax):
        if Imin > Imax: return
        initialize Imid to (Imin + Imax) / 2
        initialize f[Imid][j] to infinity
        for t from Tmin to min(Tmax, Imid - 1):
            update f[Imid][j] with f[t][j-1] + cost(Imid, t, j)
        let Topt be the optimal t we found
        Calculate(j, Imin, Imid - 1, Tmin, Topt)
        Calculate(j, Imid + 1, Imax, Topt, Tmax)

SLIDE 29

Aside 2 - Divide and Conquer Runtime

The Calculate(j, Imin, Imax, Tmin, Tmax) function computes f(Imin, j), f(Imin + 1, j), . . . , f(Imax, j) with the information that it only needs to look at t ∈ [Tmin, Tmax]. To get our solution, we need to loop through all j.

    initialize f[i][0] for all i (f[0][0] = 0, f[i][0] = infinity for i > 0)
    for j from 1 to k:
        Calculate(j, 0, n, 0, n)

Runtime Consider how many times we look at a value of t as a candidate for some f(i, j). Every time we recurse, we split the range [Tmin, Tmax] between the two recursive calls (they share only Topt), so at each depth of the recursion each value of t is a candidate in O(1) calls. The depth of the recursion is O(log n), so every call to Calculate(j, 0, n, 0, n) takes O(n log n). The total runtime is O(nk log n).

SLIDE 30

Optimization (Part 2) - Divide and Conquer Optimization (Subtask 5)

Let's revisit our problem: 1 ≤ k ≤ 100 and 1 ≤ n ≤ 50 000 and 1 ≤ m ≤ 1 000 000

f(i, j) = min_{0 ≤ t < i} { f(t, j − 1) + (y_{i−1} − x_t + 1)² − (max(0, y_{t−1} − x_t + 1))² }

Observation 1: A(i, j) ≤ A(i + 1, j) (for fixed j, A(i, j) is monotonically increasing). Observation 2: This DP recurrence for f(i, j) only uses values from f(∗, j − 1). We can do Divide and Conquer DP optimization, as sketched below!
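A hedged sketch of putting the two observations together for this problem (an illustration, not the author's code): one Calculate pass per layer j, with two rows of f and the cost(t, i) helper from the sketch after the Subtask 3 DP.

    #include <bits/stdc++.h>
    using namespace std;
    typedef long long ll;
    const ll INF = 1e18;

    int n, k;
    ll cost(int t, int i);                        // defined in the earlier sketch
    vector<ll> prv, cur;                          // f(*, j-1) and f(*, j)

    // Fill cur[i] for i in [Imin, Imax], knowing the optimal t lies in [Tmin, Tmax].
    void calc(int Imin, int Imax, int Tmin, int Tmax) {
        if (Imin > Imax) return;
        int Imid = (Imin + Imax) / 2, Topt = Tmin;
        cur[Imid] = INF;
        for (int t = Tmin; t <= min(Tmax, Imid - 1); t++) {
            if (prv[t] == INF) continue;
            ll val = prv[t] + cost(t, Imid);
            if (val < cur[Imid]) { cur[Imid] = val; Topt = t; }
        }
        calc(Imin, Imid - 1, Tmin, Topt);
        calc(Imid + 1, Imax, Topt, Tmax);
    }

    ll solve_divide_conquer() {
        prv.assign(n + 1, INF);
        prv[0] = 0;                               // f(0, 0) = 0
        for (int j = 1; j <= k; j++) {
            cur.assign(n + 1, INF);
            cur[0] = 0;                           // f(0, j) = 0
            calc(1, n, 0, n - 1);
            swap(prv, cur);
        }
        return prv[n];                            // f(n, k), O(nk log n) overall
    }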

SLIDE 31

Optimization (Part 3) - Convex Hull (Subtask 5)

1 ≤ k ≤ 100 and 1 ≤ n ≤ 50 000 and 1 ≤ m ≤ 1 000 000

f(i, j) = min_{0 ≤ t < i} { f(t, j − 1) + (y_{i−1} − x_t + 1)² − (max(0, y_{t−1} − x_t + 1))² }

Let's go in another direction now, and expand the square.

f(i, j) = min_{0 ≤ t < i} { f(t, j − 1) + (y_{i−1} − x_t + 1)² − (max(0, y_{t−1} − x_t + 1))² }
        = min_{0 ≤ t < i} { f(t, j − 1) + y_{i−1}² − 2(x_t − 1)·y_{i−1} + (x_t − 1)² − (max(0, y_{t−1} − x_t + 1))² }
        = y_{i−1}² + min_{0 ≤ t < i} { −2(x_t − 1)·y_{i−1} + f(t, j − 1) + (x_t − 1)² − (max(0, y_{t−1} − x_t + 1))² }
        = C_i + min_{0 ≤ t < i} { M_t·y_{i−1} + B_t }

where C_i = y_{i−1}², M_t = −2(x_t − 1), and B_t = f(t, j − 1) + (x_t − 1)² − (max(0, y_{t−1} − x_t + 1))².

SLIDE 32

Optimization (Part 3) - Convex Hull (Subtask 5)

1 ≤ k ≤ 100 and 1 ≤ n ≤ 50 000 and 1 ≤ m ≤ 1 000 000

f(i, j) = min_{0 ≤ t < i} { f(t, j − 1) + (y_{i−1} − x_t + 1)² − (max(0, y_{t−1} − x_t + 1))² } = C_i + min_{0 ≤ t < i} { M_t·y_{i−1} + B_t }

where C_i = y_{i−1}², M_t = −2(x_t − 1), and B_t = f(t, j − 1) + (x_t − 1)² − (max(0, y_{t−1} − x_t + 1))².

Note that C_i is a function of only i, while M_t and B_t depend only on t. Things of the form M·y + B are equations of lines! We could solve this problem if we had a data structure where you could quickly do the following:

  • Get the minimum of a collection of linear functions at a point yi−1.
  • Insert a linear function into the collection.

SLIDE 33

Aside 3: Convex Lower Envelope

There exists a data structure that maintains the lower envelope, and performs insertions of lines and queries for minimum value at a point in O(log n) time. Unfortunately, there isn’t enough time to cover the details of how it works here.
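The slide leaves the structure abstract; one common choice is a Li Chao tree, which supports arbitrary insertion and query order in O(log n). In this particular problem, after preprocessing, the slopes M_t = −2(x_t − 1) are inserted in strictly decreasing order and the query points y_{i−1} are non-decreasing, so an even simpler monotone "convex hull trick" suffices. A hedged sketch (my naming, not the author's code):

    #include <bits/stdc++.h>
    using namespace std;
    typedef long long ll;

    struct Line { ll m, b; ll eval(ll x) const { return m * x + b; } };

    // Lower envelope for minimum queries. Requires: lines added in strictly
    // decreasing order of slope, queries asked at non-decreasing x.
    struct MonotoneCHT {
        vector<Line> hull;
        size_t ptr = 0;
        // The middle line b is useless if the intersection of a and c is not to
        // the right of the intersection of a and b (compared without division).
        static bool bad(const Line& a, const Line& b, const Line& c) {
            return (__int128)(c.b - a.b) * (a.m - b.m) <= (__int128)(b.b - a.b) * (a.m - c.m);
        }
        void add(Line l) {
            while (hull.size() >= 2 && bad(hull[hull.size() - 2], hull.back(), l)) hull.pop_back();
            hull.push_back(l);
            if (ptr >= hull.size()) ptr = hull.size() - 1;
        }
        ll query(ll x) {   // minimum value among all inserted lines at x
            while (ptr + 1 < hull.size() && hull[ptr + 1].eval(x) <= hull[ptr].eval(x)) ptr++;
            return hull[ptr].eval(x);
        }
    };

For one layer j of our DP, we would loop i from 1 to n, first inserting the line with slope M_{i−1} and intercept B_{i−1} (which needs f(i − 1, j − 1)), then querying at y_{i−1}; under the monotonicity above, each layer then costs O(n).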

SLIDE 34

Optimization (Part 4) - “Alien’s Trick” (Subtask 6)

1 ≤ k ≤ n ≤ 100 000 and 1 ≤ m ≤ 1 000 000. Reminder: n points, the grid is m by m, at most k regions, must be squares on the diagonal. Now we're out of luck. We have no hope of computing our dynamic program since it has O(nk) different states. We can think about this a different way, by thinking about f(n, ∗) for different values of ∗. Claim: f(n, j − 1) − f(n, j) ≥ f(n, j) − f(n, j + 1) for all 1 ≤ j ≤ n − 1, so f(n, ∗) is convex. Intuitively this feels correct: the more photos we are allowed to have, the less effective having an extra photo would be. Proving this takes some work. This convexity is what lets us do this trick.

SLIDE 35

Optimization (Part 4) - Binary Search on the Marginal (Subtask 6)

1 ≤ k ≤ n ≤ 100 000 and 1 ≤ m ≤ 1 000 000. f(n, ∗) is convex. Convex functions have the property of decreasing marginal returns, which is well studied by economists, so let's take an economic view of f(n, ∗).

Let's suppose that f(n, ∗) were a utility function, say the amount of happiness you get from eating chocolate. There are n chocolates in front of you. The more chocolate you eat, the more full you get (and the more you regret how much chocolate you just ate). Now let's assume that you had to pay c dollars for each chocolate you ate. How much chocolate would you eat? You would eat until the utility you get from eating one more chocolate is less than the utility of c dollars. Similarly, if you charged c dollars for each photo taken, you would stop exactly when the marginal return of taking one more photo equals c.

SLIDE 36

Optimization (Part 4) - Binary Search on the Marginal (Subtask 6)

Recall that this was our DP when we had the constraint of having j photos.

f(i, j) = min_{0 ≤ t < i} { f(t, j − 1) + (y_{i−1} − x_t + 1)² − (max(0, y_{t−1} − x_t + 1))² }

We can now write a similar-looking DP for a fixed c, where we charge c for each photo we take.

g_c(i) = min_{0 ≤ t < i} { g_c(t) + (y_{i−1} − x_t + 1)² − (max(0, y_{t−1} − x_t + 1))² + c }

We can solve this problem with convex hull optimization as before, in O(n log n) time. But how do we choose what c to pick? If c = m², we know that we would use exactly one photo. If c = 0, we would take n photos. There is some value in between where taking k photos is optimal, so we can binary search for that value. See the sketch after this slide.
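A hedged sketch of that outer binary search (my naming and tie-breaking conventions, not the author's code). It assumes a helper solve_with_penalty(c) that evaluates g_c(n) with the convex hull trick and also returns the smallest number of photos used by an optimal solution; note that g_c(n) = f(n, p) + pc for an optimal photo count p, so the answer is recovered as g_c(n) − kc.

    #include <bits/stdc++.h>
    using namespace std;
    typedef long long ll;

    // Assumed helper: runs the 1-D DP g_c in O(n log n) (or O(n)) with the convex
    // hull trick and returns {g_c(n), minimum photo count among optimal solutions}.
    pair<ll, int> solve_with_penalty(ll c);

    ll aliens_trick(int n, int k, ll m) {
        // Larger c means fewer photos: c = 0 uses as many photos as help, c = m*m uses one.
        // Binary search for the smallest c whose optimal solution uses at most k photos.
        ll lo = 0, hi = m * m;
        while (lo < hi) {
            ll mid = lo + (hi - lo) / 2;
            if (solve_with_penalty(mid).second <= k) hi = mid;
            else lo = mid + 1;
        }
        // At this c, a solution with exactly k photos is also optimal for g_c
        // (the marginals between the optimal count and k all equal c), so
        // f(n, k) = g_c(n) - k*c.
        return solve_with_penalty(lo).first - (ll)k * lo;
    }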

SLIDE 37

Binary Search on the Marginal - Technical Point

g_c(i) = min_{0 ≤ t < i} { g_c(t) + (y_{i−1} − x_t + 1)² − (max(0, y_{t−1} − x_t + 1))² + c }

Beware: there may be a whole range [ℓ, r] of photo counts, with k ∈ [ℓ, r], for which the marginal return of taking one more photo is the same. You may then have found an optimal solution with p ∈ [ℓ, r] photos instead of k. The marginal values are equal, and you know g_c(n) = f(n, p) + pc, so you can simply output g_c(n) − kc = f(n, p) + (p − k)c = f(n, k). This works because the marginals are all the same between p and k, as k, p ∈ [ℓ, r].

SLIDE 38

Summary - Solving DP Problems

How do you solve a hard DP problem?

  • 1. Understand the problem. (This is sometimes hard!)
  • 2. Make observations about the problem. These range from simple to complex observations.
  • 3. Formulate a dynamic program by exploiting the recursive substructure you observed.
  • 4. Make further observations about the underlying structure of the dynamic program.
  • 5. Optimize the runtime with one of these common techniques:
  • Range min/max query structures,
  • Knuth Optimization,
  • Divide and Conquer Optimization,
  • Convex Lower/Upper Envelope,
  • Sophisticated Binary Search (AKA the “Alien’s trick”).

SLIDE 39

Fun Things to do This Weekend

Jack’s recommendation: Contagion

SLIDE 40

Fun Things to do This Weekend

David’s recommendation: Falling Down

SLIDE 41

Fun Things to do This Weekend

Lucca’s recommendation: ✭✭✭✭✭ La La Land Moonlight
