CS3000: Algorithms & Data, Jonathan Ullman. Lecture 8: Dynamic Programming: RNA Folding, Practice

slide-1
SLIDE 1

CS3000: Algorithms & Data Jonathan Ullman

Lecture 8:

  • Dynamic Programming: RNA Folding, Practice

Feb 3, 2020

slide-2
SLIDE 2

RNA Folding

slide-3
SLIDE 3

DNA

  • DNA is a string of four bases {A,C,G,T}
  • Two complementary strands of DNA stick together and form a double helix

  • A—T and C—G are complementary pairs
slide-4
SLIDE 4

RNA Folding

  • RNA is a string of four bases {A,C,G,U}
  • A single RNA strand sticks to itself and folds into complex structures

  • A—U and C—G are complementary pairs
slide-5
SLIDE 5

RNA Folding

  • An RNA strand will try to minimize energy (form the most bonds) subject to constraints

slide-6
SLIDE 6

RNA Folding

  • RNA is a string of bases $b_1, \ldots, b_n \in \{A, C, G, U\}$
  • The structure is given by a set of bonds $S$ consisting of pairs $(i, j)$ with $i < j$
  • (Complements) Only $A$-$U$ or $C$-$G$ can be paired
  • (Matching) No base $b_i$ is in two pairs in $S$
  • (No Sharp Turns) If $(i, j) \in S$, then $i < j - 4$
  • (Non-Crossing) If $(i, j), (k, \ell) \in S$ then it cannot be the case that $i < k < j < \ell$

slide-7
SLIDE 7

RNA Folding

  • Input: RNA sequence $b_1, \ldots, b_n \in \{A, C, G, U\}$
  • Output: A set of pairs $S \subseteq \{1, \ldots, n\} \times \{1, \ldots, n\}$
  • Goal: maximize the size of $S$
  • (Complements) Only $A$-$U$ or $C$-$G$ can be paired
  • (Matching) No base $b_i$ is in two pairs in $S$
  • (No Sharp Turns) If $(i, j) \in S$, then $i < j - 4$
  • (Non-Crossing) If $(i, j), (k, \ell) \in S$ then it cannot be the case that $i < k < j < \ell$
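
The four constraints can be checked mechanically for any proposed set of bonds. The following is a minimal sketch, not part of the lecture: the function name `is_valid_folding`, the use of a Python string for the strand, and the 1-indexed pair convention are assumptions made for illustration.

```python
# Hypothetical helper: checks the four RNA folding constraints.
COMPLEMENTS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def is_valid_folding(rna, pairs):
    """Return True if `pairs` (1-indexed tuples (i, j)) is a legal set of bonds."""
    used = set()
    for i, j in pairs:
        if not (1 <= i < j <= len(rna)):
            return False
        # (Complements) only A-U or C-G can be paired
        if (rna[i - 1], rna[j - 1]) not in COMPLEMENTS:
            return False
        # (Matching) no base appears in two pairs
        if i in used or j in used:
            return False
        used.update((i, j))
        # (No Sharp Turns) i < j - 4
        if not i < j - 4:
            return False
    # (Non-Crossing) no two pairs (i, j), (k, l) with i < k < j < l
    for i, j in pairs:
        for k, l in pairs:
            if i < k < j < l:
                return False
    return True
```

For the strand ACCGGUAGU, the set {(1, 9), (2, 8)} passes all four checks, while {(1, 6), (2, 8)} fails the non-crossing check.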

slide-8
SLIDE 8

Dynamic Programming

  • Let $O$ be the optimal set of pairs for $b_1 \cdots b_n$
  • Case 1: $O$ does not include any pair involving $n$
  • Case 2: $O$ pairs $n$ with some $t < n - 4$
slide-9
SLIDE 9

Dynamic Programming

  • Let $O_{i,j}$ be the optimal set of pairs for $b_i \cdots b_j$
  • Case 1: $O_{i,j}$ does not include any pair involving $j$
  • Case 2: $O_{i,j}$ pairs $j$ with some $t < j - 4$
slide-10
SLIDE 10

Dynamic Programming

  • Let $\mathrm{OPT}(i, j)$ be the optimal number of pairs for $b_i \cdots b_j$
  • Case 1: $j$ pairs with nothing
  • Case 2: $j$ pairs with $t < j - 4$
slide-11
SLIDE 11

Dynamic Programming

  • Let $\mathrm{OPT}(i, j)$ be the optimal number of pairs for $b_i \cdots b_j$
  • Case 1: $j$ pairs with nothing
  • Case 2: $j$ pairs with $t < j - 4$

Recurrence: $\mathrm{OPT}(i, j) = \max\{\mathrm{OPT}(i, j-1),\ \max_t \{1 + \mathrm{OPT}(i, t-1) + \mathrm{OPT}(t+1, j-1)\}\}$

Base Cases: $\mathrm{OPT}(i, j) = 0$ if $i \ge j - 4$

The inner maximum is over all $t$ such that
  • $i \le t < j - 4$
  • $b_t, b_j$ are compatible bases
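
The recurrence can be evaluated bottom-up by increasing interval length. The sketch below is one possible rendering and not the lecture's own code: the name `max_pairs`, the padded table, and the 1-indexed layout are assumptions. Each bond formed in Case 2 contributes 1 to the count.

```python
# Illustrative bottom-up evaluation of OPT(i, j); names are hypothetical.
COMPLEMENTS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def max_pairs(rna):
    """Return OPT(1, n), the maximum number of bonds for the strand `rna`."""
    n = len(rna)
    # opt[i][j] for 1 <= i, j <= n. The extra row and column keep
    # opt[i][t - 1] and opt[t + 1][j - 1] in range; base cases stay 0.
    opt = [[0] * (n + 2) for _ in range(n + 2)]
    for length in range(6, n + 1):          # only intervals with i < j - 4
        for i in range(1, n - length + 2):
            j = i + length - 1
            best = opt[i][j - 1]            # Case 1: j pairs with nothing
            for t in range(i, j - 4):       # Case 2: j pairs with t, i <= t < j - 4
                if (rna[t - 1], rna[j - 1]) in COMPLEMENTS:
                    best = max(best, 1 + opt[i][t - 1] + opt[t + 1][j - 1])
            opt[i][j] = best
    return opt[1][n]


print(max_pairs("ACCGGUAGU"))  # prints 2
```

There are O(n^2) intervals and each takes O(n) work over the choices of t, which matches the cubic time and quadratic space stated in the summary slide.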

slide-12
SLIDE 12

Filling the Table

Recurrence: $\mathrm{OPT}(i, j) = \max\{\mathrm{OPT}(i, j-1),\ \max_{t} \{1 + \mathrm{OPT}(i, t-1) + \mathrm{OPT}(t+1, j-1)\}\}$

Sequence: ACCGGUAGU

[Table: OPT(i, j) with rows i = 4, 3, 2, 1 and columns j = 6, 7, 8, 9]

slide-13
SLIDE 13

RNA Folding Summary

  • Compute the optimal RNA folding in time $O(n^3)$ and space $O(n^2)$
  • Dynamic Programming:
  • Decide on an optimal pair $b_t$-$b_n$
  • Remaining RNA is two non-overlapping pieces
  • Adding variables: one subproblem for each interval
  • Non-crossing is critical
  • Think about how the dynamic programming algorithm changes if we remove each of the conditions
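
To recover the bonds themselves rather than only their number, one option is to record which case of the recurrence won for each interval and then trace back from the full interval. The sketch below assumes that approach; the name `fold` and the `choice` table are hypothetical and not from the slides.

```python
# Illustrative traceback for RNA folding; names are hypothetical.
COMPLEMENTS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def fold(rna):
    """Return a maximum-size list of 1-indexed bonds (t, j), sorted."""
    n = len(rna)
    opt = [[0] * (n + 2) for _ in range(n + 2)]
    # choice[i][j] == 0 means j is unpaired in the interval [i, j];
    # otherwise it holds the partner t of j.
    choice = [[0] * (n + 2) for _ in range(n + 2)]
    for length in range(6, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            opt[i][j] = opt[i][j - 1]
            for t in range(i, j - 4):
                if (rna[t - 1], rna[j - 1]) in COMPLEMENTS:
                    cand = 1 + opt[i][t - 1] + opt[t + 1][j - 1]
                    if cand > opt[i][j]:
                        opt[i][j] = cand
                        choice[i][j] = t
    pairs = []
    stack = [(1, n)]
    while stack:
        i, j = stack.pop()
        if i >= j - 4:                 # base case: no bonds possible
            continue
        t = choice[i][j]
        if t == 0:
            stack.append((i, j - 1))   # j pairs with nothing
        else:
            pairs.append((t, j))       # j pairs with t; two pieces remain
            stack.append((i, t - 1))
            stack.append((t + 1, j - 1))
    return sorted(pairs)
```

The traceback visits each interval at most once, so it adds only linear work on top of filling the table.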

slide-14
SLIDE 14

Dynamic Programming Practice

slide-15
SLIDE 15

Midterm I Review

slide-16
SLIDE 16

Midterm I Topics

  • Fundamentals:
  • Induction
  • Asymptotics
  • Recurrences
  • Stable Matching
  • Divide and Conquer
  • Dynamic Programming
slide-17
SLIDE 17

Topics: Induction

  • Proof by Induction:
  • Mathematical formulas, e.g. $\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$

  • Spot the bug
  • Solutions to recurrences
  • Correctness of divide-and-conquer algorithms
  • Good way to study:
  • Lehman-Leighton-Meyer, Mathematics for CS
  • Review divide-and-conquer in Kleinberg-Tardos
slide-18
SLIDE 18

Practice Question: Induction

  • Suppose you have an unlimited supply of 3 and 7 cent coins. Prove by induction that you can make any amount $n \ge 12$.
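
One possible induction uses three base cases (12, 13, 14) and the step "an amount of at least 15 is a smaller valid amount plus one 3-cent coin". The sketch below mirrors that structure as a recursive function; the name `make_change` and the (threes, sevens) return format are assumptions for illustration, and other proofs are possible.

```python
def make_change(n):
    """Return (threes, sevens) with 3 * threes + 7 * sevens == n, for n >= 12."""
    if n < 12:
        raise ValueError("the claim only covers amounts n >= 12")
    # Base cases: 12 = 3+3+3+3, 13 = 3+3+7, 14 = 7+7
    base = {12: (4, 0), 13: (2, 1), 14: (0, 2)}
    if n in base:
        return base[n]
    # Inductive step: n >= 15, so n - 3 >= 12 is already solvable; add one 3.
    threes, sevens = make_change(n - 3)
    return threes + 1, sevens
```

Three base cases are needed because the step subtracts 3, so the recursion from any n lands on exactly one of 12, 13, or 14.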

slide-19
SLIDE 19

Topics: Asymptotics

  • Asymptotic Notation
  • $o, O, \omega, \Omega, \Theta$
  • Relationships between common function types
  • Good way to study:
  • Kleinberg-Tardos Chapter 2
slide-20
SLIDE 20

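Topics: Asymptotics

Notation | … means … | Think… | E.g.
$f(n) = O(g(n))$ | $\exists c > 0, n_0 > 0, \forall n \ge n_0: 0 \le f(n) \le c\,g(n)$ | At most “≤” | $100n^2 = O(n^3)$
$f(n) = \Omega(g(n))$ | $\exists c > 0, n_0 > 0, \forall n \ge n_0: 0 \le c\,g(n) \le f(n)$ | At least “≥” | $2^n = \Omega(n^{100})$
$f(n) = \Theta(g(n))$ | $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$ | Equals “=” | $\log(n!) = \Theta(n \log n)$
$f(n) = o(g(n))$ | $\forall c > 0, \exists n_0 > 0, \forall n \ge n_0: 0 \le f(n) < c\,g(n)$ | Less than “<” | $n^2 = o(2^n)$
$f(n) = \omega(g(n))$ | $\forall c > 0, \exists n_0 > 0, \forall n \ge n_0: 0 \le c\,g(n) < f(n)$ | Greater than “>” | $n^2 = \omega(\log n)$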

slide-21
SLIDE 21
Topics: Asymptotics

  • Constant factors can be ignored
  • $\forall C > 0,\ Cn = O(n)$
  • Smaller exponents are Big-Oh of larger exponents
  • $\forall a > b,\ n^b = O(n^a)$
  • Any logarithm is Big-Oh of any polynomial
  • $\forall a, \varepsilon > 0,\ \log_2^a n = O(n^\varepsilon)$
  • Any polynomial is Big-Oh of any exponential
  • $\forall a > 0, b > 1,\ n^a = O(b^n)$
  • Lower order terms can be dropped
  • $n^2 + n^{3/2} + n = O(n^2)$

slide-22
SLIDE 22

Practice Question: Asymptotics

  • Put these functions in order so that $f_i = O(f_{i+1})$
  • $n^{\log_2 [?]}$
  • $8^{\log_2 n}$
  • $2^{3 \log_2 \log_2 n}$
  • $2^{(\log_2 n)^2}$
  • $\sum_{i=1}^{n} i$
  • $n^2 \log_2 n$
slide-23
SLIDE 23

Practice Question: Asymptotics

  • Suppose $f_1 = O(g)$ and $f_2 = O(g)$. Prove that $f_1 + f_2 = O(g)$.
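
One way to argue the claim that $f_1 = O(g)$ and $f_2 = O(g)$ imply $f_1 + f_2 = O(g)$ is directly from the definition of Big-Oh. The constants $c_1, c_2, n_1, n_2$ below are names introduced for this sketch.

```latex
\begin{align*}
f_1 = O(g) &\implies \exists\, c_1 > 0,\ n_1 > 0:\ 0 \le f_1(n) \le c_1\, g(n) \quad \forall n \ge n_1 \\
f_2 = O(g) &\implies \exists\, c_2 > 0,\ n_2 > 0:\ 0 \le f_2(n) \le c_2\, g(n) \quad \forall n \ge n_2 \\
\text{Let } c = c_1 + c_2 &\text{ and } n_0 = \max\{n_1, n_2\}. \text{ Then for all } n \ge n_0: \\
0 \le f_1(n) + f_2(n) &\le c_1\, g(n) + c_2\, g(n) = c\, g(n),
\end{align*}
```

so $c$ and $n_0$ witness $f_1 + f_2 = O(g)$.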

slide-24
SLIDE 24

Topics: Recurrences

  • Recurrences
  • Representing running time by a recurrence
  • Solving common recurrences
  • Master Theorem
  • Good way to study:
  • Erickson book
  • Kleinberg-Tardos divide-and-conquer chapter
slide-25
SLIDE 25

Practice Question: Recurrences

  • Write a recurrence for the running time of this algorithm. Write the asymptotic running time given by the recurrence.

F(n):
  For i = 1,…,n^2: Print “Hi”
  For i = 1,…,3: F(n/3)
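
A plausible reading of this algorithm is that the first loop costs n^2 prints and the second makes three recursive calls on inputs of size n/3, which suggests the recurrence T(n) = 3T(n/3) + n^2. The sketch below counts prints instead of printing. The slide gives no base case, so stopping when n < 1 and using integer division are assumptions, as is the name `count_prints`.

```python
def count_prints(n):
    """Number of times F(n) would print, assuming F stops when n < 1."""
    if n < 1:
        return 0
    # n^2 prints from the first loop, then three calls on n/3 (rounded down)
    return n * n + 3 * count_prints(n // 3)


for n in (3, 9, 27, 81):
    print(n, count_prints(n), count_prints(n) / n**2)
```

Level k of the recursion tree has 3^k calls that each cost (n/3^k)^2, so the level total is n^2/3^k. The levels form a decreasing geometric series, so the total stays between n^2 and 1.5 n^2, consistent with a Θ(n^2) running time.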

slide-26
SLIDE 26

Topics: Recurrences

  • Consider the recurrence $T(n) = \sqrt{n} \cdot T(\sqrt{n}) + n$ with $T(1) = 1$. Solve using a recursion tree.

slide-27
SLIDE 27

Topics: Divide-and-Conquer

  • Divide-and-Conquer
  • Writing pseudocode
  • Proving correctness by induction
  • Analyzing running time via recurrences
  • Examples we’ve studied:
  • Mergesort, Binary Search, Karatsuba’s, Selection
  • Good way to study:
  • Example problems from Kleinberg-Tardos or Erickson
  • Practice, practice, practice!
slide-28
SLIDE 28

Topics: Dynamic Programming

  • Dynamic Programming
  • Identify sub-problems
  • Write a recurrence, $\mathrm{OPT}(n) = \max\{v_n + \mathrm{OPT}(n - 6),\ \mathrm{OPT}(n - 1)\}$
  • Fill the dynamic programming table
  • Find the optimal solution
  • Analyze running time
  • Good way to study:
  • Example problems from Kleinberg-Tardos or Erickson
  • Practice, practice, practice!
slide-29
SLIDE 29

Practice Question

  • Design an $O(n)$-time algorithm that takes an array $A[1:n]$ and returns a sorted array containing the smallest $\sqrt{n}$ elements of $A$
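
One possible approach, assuming the intended count is the smallest √n elements: use a selection routine to isolate those elements in linear time, then sort only them, which costs O(√n log n) and is dominated by O(n). The sketch below swaps in randomized quickselect, which runs in expected rather than worst-case linear time; a deterministic selection algorithm would be needed for a worst-case bound. The names `select_smallest` and `smallest_sqrt_sorted` are hypothetical.

```python
import math
import random


def select_smallest(items, k):
    """Return the k smallest elements of `items`, in no particular order.

    Randomized quickselect: expected O(len(items)) time.
    """
    if k <= 0:
        return []
    if k >= len(items):
        return list(items)
    pivot = random.choice(items)
    less = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    if k <= len(less):
        return select_smallest(less, k)
    if k <= len(less) + len(equal):
        return less + equal[: k - len(less)]
    greater = [x for x in items if x > pivot]
    return less + equal + select_smallest(greater, k - len(less) - len(equal))


def smallest_sqrt_sorted(a):
    """Return the floor(sqrt(n)) smallest elements of `a`, sorted."""
    k = math.isqrt(len(a))
    return sorted(select_smallest(list(a), k))
```

Sorting the whole array first would also give the right answer, but at O(n log n) cost; selecting before sorting is what keeps the total linear.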
slide-30
SLIDE 30

Practice Question

  • Consider the following sorting algorithm
  • Prove that it is correct
  • Analyze its running time

A[1:n] is a global array

SillySort(1,n):
  if (n <= 2): put A in order
  else:
    SillySort(1,2n/3)
    SillySort(n/3,n)
    SillySort(1,2n/3)
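
For experimenting with the algorithm, here is a runnable sketch. It departs from the slide in a few labeled ways: it uses 0-indexed bounds `lo` and `hi` passed as arguments instead of a global array, and it rounds so that each recursive call covers the first or last n - floor(n/3) elements. The rounding convention and the name `silly_sort` are assumptions.

```python
def silly_sort(a, lo=0, hi=None):
    """Sort a[lo..hi] in place using the three-overlapping-calls scheme."""
    if hi is None:
        hi = len(a) - 1
    n = hi - lo + 1
    if n <= 2:
        # Base case: put at most two elements in order
        if n == 2 and a[lo] > a[hi]:
            a[lo], a[hi] = a[hi], a[lo]
        return
    third = n // 3                      # assumed rounding: floor(n / 3)
    silly_sort(a, lo, hi - third)       # first two thirds
    silly_sort(a, lo + third, hi)       # last two thirds
    silly_sort(a, lo, hi - third)       # first two thirds again


data = [5, 2, 9, 1, 5, 6, 0]
silly_sort(data)
print(data)  # prints [0, 1, 2, 5, 5, 6, 9]
```

Each call makes three recursive calls on about two thirds of its input plus constant extra work, which is the shape of recurrence the running-time question asks about.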

slide-31
SLIDE 31

Dynamic Programming Practice

slide-32
SLIDE 32

Chocolate Bar Splitting

  • Input: A chocolate bar with $n \times m$ pieces
  • Output: The minimum number of cuts needed to divide the bar into perfect squares
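
A possible dynamic program, under the assumption that each cut is a straight line across one rectangular piece along a grid line and splits it into two smaller rectangles: the subproblems are the piece dimensions, and the recurrence tries every first cut. The name `min_cuts` is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def min_cuts(n, m):
    """Minimum number of cuts to split an n x m bar into squares."""
    if n == m:
        return 0                         # already a perfect square
    best = n * m                         # safe upper bound (n*m - 1 cuts always suffice)
    # Horizontal first cut: pieces a x m and (n - a) x m
    for a in range(1, n // 2 + 1):
        best = min(best, 1 + min_cuts(a, m) + min_cuts(n - a, m))
    # Vertical first cut: pieces n x b and n x (m - b)
    for b in range(1, m // 2 + 1):
        best = min(best, 1 + min_cuts(n, b) + min_cuts(n, m - b))
    return best


print(min_cuts(3, 5))  # prints 3
```

There are nm subproblems and each tries O(n + m) cuts, so this sketch runs in O(nm(n + m)) time.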

slide-33
SLIDE 33

Chocolate Bar Splitting

slide-34
SLIDE 34

Vankin’s Mile

  • Input: An $n \times n$ board of numbers
  • Rules:
  • Place a chip on the board
  • Keep moving the chip down or right until you fall off
  • Score = sum of the numbers your chip visited
  • Output: The best possible strategy
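
One way to set up the dynamic program: let best(i, j) be the highest score obtainable when the chip is at cell (i, j), so best(i, j) equals the cell's value plus the better of moving down or moving right, where stepping off the board contributes 0. The sketch below assumes a non-empty square board given as a list of lists with 0-indexed cells, and it allows negative numbers; the name `best_start` is hypothetical.

```python
def best_start(board):
    """Return (score, (row, col)) for the best starting cell."""
    n = len(board)
    # best[i][j] for on-board cells; row n and column n stand for
    # "fell off the board" and contribute 0.
    best = [[0] * (n + 1) for _ in range(n + 1)]
    answer = None
    for i in range(n - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            best[i][j] = board[i][j] + max(best[i + 1][j], best[i][j + 1])
            if answer is None or best[i][j] > answer[0]:
                answer = (best[i][j], (i, j))
    return answer


print(best_start([[1, -2], [-3, 4]]))  # prints (4, (1, 1))
```

From the chosen start, following the larger of the two neighbors in `best` at each step gives the moves. The table has n^2 entries with constant work each, so the sketch runs in O(n^2) time.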