CS3000: Algorithms & Data Jonathan Ullman Midterm Info Lecture - - PowerPoint PPT Presentation


SLIDE 1

CS3000: Algorithms & Data Jonathan Ullman

Lecture 8:

  • Dynamic Programming: RNA Folding, Practice

Feb 3, 2020

Midterm Info

SLIDE 2

Dynamic Programming

Examples

  • Interval Scheduling: choose a subset; one-variable recurrence
  • Segmented Least Squares: partition the line into intervals; one-variable recurrence
  • Knapsack: choose a subset; two-variable recurrence
  • Edit Distance (Alignments): choose the last piece of the alignment; two-variable recurrence
  • Concert Scheduling: choose a subset; one-variable recurrence
  • RNA Folding: pair up items; two-variable recurrence

SLIDE 3

RNA Folding

SLIDE 4

DNA

  • DNA is a string of four bases {A,C,G,T}
  • Two complementary strands of DNA stick together and form a double helix

  • A—T and C—G are complementary pairs
SLIDE 5

RNA Folding

  • RNA is a string of four bases {A,C,G,U}
  • A single RNA strand sticks to itself and folds into complex structures

  • A—U and C—G are complementary pairs
SLIDE 6

RNA Folding

  • An RNA strand will try to minimize energy (form the most bonds) subject to constraints

[Figure: a folded RNA strand, annotated "no crossing" and "no pairs too close together"]

SLIDE 7

RNA Folding

  • RNA is a string of bases $b_1, \ldots, b_n \in \{A, C, G, U\}$
  • The structure is given by a set of bonds $S$ consisting of pairs $(i, j)$ with $i < j$
  • (Complements) Only A-U or C-G can be paired
  • (Matching) No base $b_i$ is in two pairs in $S$
  • (No Sharp Turns) If $(i, j) \in S$, then $i < j - 4$
  • (Non-Crossing) If $(i, j), (k, \ell) \in S$, then it cannot be the case that $i < k < j < \ell$

SLIDE 8

RNA Folding

  • Input: RNA sequence $b_1, \ldots, b_n \in \{A, C, G, U\}$
  • Output: A set of pairs $S \subseteq \{1, \ldots, n\} \times \{1, \ldots, n\}$
  • Goal: maximize the size of $S$
  • (Complements) Only A-U or C-G can be paired
  • (Matching) No base $b_i$ is in two pairs in $S$
  • (No Sharp Turns) If $(i, j) \in S$, then $i < j - 4$
  • (Non-Crossing) If $(i, j), (k, \ell) \in S$, then it cannot be the case that $i < k < j < \ell$
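To make the four constraints concrete, here is a minimal Python sketch of a checker for a proposed set of pairs $S$. The function name `is_valid_folding` and the representation (a string for $b_1 \cdots b_n$, 1-indexed pairs as tuples) are assumptions for illustration, not part of the lecture.

```python
def is_valid_folding(b, pairs):
    """Return True if the 1-indexed pairs (i, j) satisfy all four constraints for string b."""
    complements = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}
    used = set()
    for i, j in pairs:
        if not 1 <= i < j <= len(b):
            return False  # each pair must be (i, j) with i < j inside the string
        if (b[i - 1], b[j - 1]) not in complements:
            return False  # (Complements) only A-U or C-G
        if not i < j - 4:
            return False  # (No Sharp Turns)
        if i in used or j in used:
            return False  # (Matching) no base in two pairs
        used.update((i, j))
    for i, j in pairs:
        for k, ell in pairs:
            if i < k < j < ell:
                return False  # (Non-Crossing)
    return True


print(is_valid_folding("ACCGGUAGU", [(1, 9), (2, 8)]))  # True: the pairs are nested
print(is_valid_folding("ACCGGUAGU", [(1, 6), (3, 8)]))  # False: 1 < 3 < 6 < 8 crosses
```

The non-crossing check compares every ordered pair of bonds, so it costs $O(|S|^2)$; that is fine for checking a solution by hand-sized examples.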

SLIDE 9

Dynamic Programming

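  • Let $O$ be the optimal set of pairs for $b_1 \cdots b_n$
  • Case 1: $O$ does not include any pair involving $n$
  • Case 2: $O$ has $n$ paired with some $t < n - 4$ in $O$

In Case 1, $O$ is an optimal solution using only $b_1 \cdots b_{n-1}$. In Case 2, by non-crossing, every other pair of $O$ lies entirely within $b_1 \cdots b_{t-1}$ or entirely within $b_{t+1} \cdots b_{n-1}$, and $O$ must be optimal on each piece. The second piece does not start at $b_1$, so prefixes alone are not enough subproblems.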

SLIDE 10

Dynamic Programming

  • Let $O_{i,j}$ be the optimal set of pairs for $b_i \cdots b_j$
  • Case 1: $O_{i,j}$ does not include any pair involving $j$
  • Case 2: $O_{i,j}$ has $j$ paired with some $t < j - 4$ in $O_{i,j}$
SLIDE 11

Dynamic Programming

  • Let $\mathrm{OPT}(i, j)$ be the optimal number of pairs for $b_i \cdots b_j$
  • Case 1: $j$ pairs with nothing, so $\mathrm{OPT}(i, j) = \mathrm{OPT}(i, j-1)$
  • Case 2: $j$ pairs with $t < j - 4$, so $\mathrm{OPT}(i, j) = 1 + \mathrm{OPT}(i, t-1) + \mathrm{OPT}(t+1, j-1)$
SLIDE 12

Dynamic Programming

  • Let $\mathrm{OPT}(i, j)$ be the optimal number of pairs for $b_i \cdots b_j$
  • Case 1: $j$ pairs with nothing
  • Case 2: $j$ pairs with $t < j - 4$

Recurrence: $\mathrm{OPT}(i, j) = \max\left\{ \mathrm{OPT}(i, j-1),\ \max_t \left[ 1 + \mathrm{OPT}(i, t-1) + \mathrm{OPT}(t+1, j-1) \right] \right\}$

Base cases: $\mathrm{OPT}(i, j) = 0$ if $i \ge j - 4$

Maximum over all $t$ such that

  • $i \le t < j - 4$
  • $b_t, b_j$ are compatible bases
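The recurrence translates almost line for line into a memoized function. The Python sketch below assumes the sequence is given as a string and uses `functools.lru_cache` as the memo table; the name `rna_opt` is hypothetical. Positions stay 1-indexed to match the slides.

```python
from functools import lru_cache


def rna_opt(b):
    """Optimal number of pairs for b_1..b_n, via the recurrence above (top-down, memoized)."""
    n = len(b)
    complements = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}

    @lru_cache(maxsize=None)
    def opt(i, j):
        # Base case: OPT(i, j) = 0 if i >= j - 4
        if i >= j - 4:
            return 0
        # Case 1: b_j pairs with nothing
        best = opt(i, j - 1)
        # Case 2: b_j pairs with a compatible b_t, i <= t < j - 4
        for t in range(i, j - 4):
            if (b[t - 1], b[j - 1]) in complements:
                best = max(best, 1 + opt(i, t - 1) + opt(t + 1, j - 1))
        return best

    return opt(1, n)


print(rna_opt("ACCGGUAGU"))  # 2
```

The chain $\mathrm{OPT}(i, j) \to \mathrm{OPT}(i, j-1) \to \cdots$ recurses about $n$ levels deep, so very long sequences can hit Python's default recursion limit; a bottom-up table avoids that.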

SLIDE 13

Filling the Table

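Recurrence:

$\mathrm{OPT}(i, j) = \max\left\{ \mathrm{OPT}(i, j-1),\ \max_t \left[ 1 + \mathrm{OPT}(i, t-1) + \mathrm{OPT}(t+1, j-1) \right] \right\}$

Sequence: ACCGGUAGU

[Table: rows $i = 1, \ldots, 4$ and columns $j = 6, \ldots, 9$; cell $(i, j)$ holds $\mathrm{OPT}(i, j)$ for the substring $b_i \cdots b_j$, e.g. ACCGGU for $(1, 6)$, CCGGUAG for $(2, 8)$, ACCGGUAG for $(1, 8)$, CCGGUAGU for $(2, 9)$]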
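A bottom-up version fills the table in order of increasing interval length $j - i + 1$, so $\mathrm{OPT}(i, j-1)$, $\mathrm{OPT}(i, t-1)$ and $\mathrm{OPT}(t+1, j-1)$ are always ready when $\mathrm{OPT}(i, j)$ is computed. The following Python sketch (hypothetical name `rna_table`, 1-indexed to match the slides) prints rows $i = 1, \ldots, 4$ and columns $j = 6, \ldots, 9$ for ACCGGUAGU.

```python
def rna_table(b):
    """Return a table opt with opt[i][j] = OPT(i, j) for 1 <= i, j <= n, filled bottom-up."""
    n = len(b)
    complements = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}
    # Extra row/column so indices like t - 1 = i - 1 and t + 1 stay in range; all start at 0
    opt = [[0] * (n + 2) for _ in range(n + 2)]
    # Intervals of length <= 5 satisfy i >= j - 4, so they keep the base-case value 0
    for length in range(6, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            best = opt[i][j - 1]  # Case 1
            for t in range(i, j - 4):  # Case 2
                if (b[t - 1], b[j - 1]) in complements:
                    best = max(best, 1 + opt[i][t - 1] + opt[t + 1][j - 1])
            opt[i][j] = best
    return opt


table = rna_table("ACCGGUAGU")
for i in range(1, 5):
    print(i, [table[i][j] for j in range(6, 10)])
# 1 [1, 1, 1, 2]
# 2 [0, 0, 1, 1]
# 3 [0, 0, 1, 1]
# 4 [0, 0, 0, 0]
```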

SLIDE 14

RNA Folding Summary

  • Compute the optimal RNA folding in time $O(n^3)$ and space $O(n^2)$
  • Dynamic Programming:
  • Decide which base $b_t$ (if any) is paired with $b_n$ in an optimal solution
  • The remaining RNA splits into two non-overlapping pieces
  • Adding variables: one subproblem for each interval
  • Non-crossing is critical
  • Think about how the dynamic programming algorithm changes if we remove each of the conditions
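Putting the pieces together: a minimal Python sketch of the full algorithm, which fills the $O(n^2)$-entry table with $O(n)$ work per entry and then retraces which case produced each $\mathrm{OPT}(i, j)$ to recover an optimal set of pairs. The name `rna_fold` and the explicit-stack traceback are illustrative choices, not from the lecture.

```python
def rna_fold(b):
    """Return (OPT(1, n), sorted list of 1-indexed pairs) for b; O(n^3) time, O(n^2) space."""
    n = len(b)
    complements = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}

    def ok(t, j):
        return (b[t - 1], b[j - 1]) in complements

    # Fill the table by increasing interval length; short intervals stay 0 (base case)
    opt = [[0] * (n + 2) for _ in range(n + 2)]
    for length in range(6, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            best = opt[i][j - 1]
            for t in range(i, j - 4):
                if ok(t, j):
                    best = max(best, 1 + opt[i][t - 1] + opt[t + 1][j - 1])
            opt[i][j] = best

    # Traceback: for each interval, find which case achieved OPT(i, j)
    pairs = []
    stack = [(1, n)]
    while stack:
        i, j = stack.pop()
        if i >= j - 4:
            continue  # base case: no pairs possible
        if opt[i][j] == opt[i][j - 1]:
            stack.append((i, j - 1))  # Case 1: b_j unpaired
            continue
        for t in range(i, j - 4):
            if ok(t, j) and opt[i][j] == 1 + opt[i][t - 1] + opt[t + 1][j - 1]:
                pairs.append((t, j))  # Case 2: b_t pairs with b_j
                stack.append((i, t - 1))
                stack.append((t + 1, j - 1))
                break
    return opt[1][n], sorted(pairs)


print(rna_fold("ACCGGUAGU"))  # (2, [(1, 9), (2, 8)])
```

Each traceback step either shrinks the interval or splits it around a new pair, so recovering the pairs adds only $O(n^2)$ time on top of the table.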

SLIDE 15

Midterm I Review

SLIDE 16

Midterm I Topics

  • Fundamentals:
  • Induction
  • Asymptotics
  • Recurrences
  • Stable Matching
  • Divide and Conquer
  • Dynamic Programming

Last year's midterm will be online

Cheatsheets:

  • One 8.5 × 11 page, double sided
  • Typed or handwritten

SLIDE 17

Topics: Induction

  • Proof by Induction:
  • Mathematical formulas, e.g. $\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$
  • Spot the bug
  • Solutions to recurrences
  • Correctness of divide-and-conquer algorithms
  • Good way to study:
  • Lehman-Leighton-Meyer, Mathematics for CS (linked on the website)
  • Review divide-and-conquer in Kleinberg-Tardos
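As a worked instance of the first bullet, here is the standard induction proof of the summation formula, written as a LaTeX sketch: the base case, then the inductive step that assumes the formula for $n$ and derives it for $n + 1$.

```latex
\begin{aligned}
\text{Base case } (n = 1):&\quad \sum_{i=1}^{1} i = 1 = \frac{1 \cdot 2}{2} \\
\text{Inductive step:}&\quad \sum_{i=1}^{n+1} i = \sum_{i=1}^{n} i + (n+1)
  = \frac{n(n+1)}{2} + (n+1) = \frac{(n+1)(n+2)}{2}
\end{aligned}
```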
SLIDE 18

Practice Question: Induction

  • Suppose you have an unlimited supply of 3 and 7 cent coins. Prove by induction that you can make any amount $n \ge 12$.
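One strong-induction argument uses three base cases, $12 = 3 \cdot 4$, $13 = 3 \cdot 2 + 7$ and $14 = 7 \cdot 2$, and for $n \ge 15$ applies the hypothesis to $n - 3 \ge 12$, then adds one 3-cent coin. The Python sketch below unrolls that argument into a loop; `make_change` is a hypothetical name, and the code illustrates the proof rather than replacing it.

```python
def make_change(n):
    """Return (threes, sevens) with 3*threes + 7*sevens == n, for any n >= 12."""
    if n < 12:
        raise ValueError("the claim only covers n >= 12")
    # Base cases of the strong induction: 12, 13, 14 (one per residue mod 3)
    base = {12: (4, 0), 13: (2, 1), 14: (0, 2)}
    threes = 0
    # Inductive step, unrolled: n >= 15 reduces to n - 3 >= 12 plus one 3-cent coin
    while n not in base:
        n -= 3
        threes += 1
    b3, b7 = base[n]
    return threes + b3, b7


print(make_change(20))  # (2, 2): 3*2 + 7*2 = 20
```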

SLIDE 19

Topics: Asymptotics

  • Asymptotic Notation
  • $o$, $O$, $\omega$, $\Omega$, $\Theta$
  • Relationships between common function types
  • Good way to study:
  • Kleinberg-Tardos Chapter 2
  • Jeff Erickson's book (also linked online)

SLIDE 20

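Topics: Asymptotics

| Notation | Means | Think | E.g. |
|---|---|---|---|
| $f(n) = O(g(n))$ | $\exists c > 0, n_0 > 0, \forall n \ge n_0: 0 \le f(n) \le c\,g(n)$ | At most "≤" | $100n^2 = O(n^3)$ |
| $f(n) = \Omega(g(n))$ | $\exists c > 0, n_0 > 0, \forall n \ge n_0: 0 \le c\,g(n) \le f(n)$ | At least "≥" | $2^n = \Omega(n^{100})$ |
| $f(n) = \Theta(g(n))$ | $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$ | Equals "=" | $\log(n!) = \Theta(n \log n)$ |
| $f(n) = o(g(n))$ | $\forall c > 0, \exists n_0 > 0, \forall n \ge n_0: 0 \le f(n) < c\,g(n)$ | Less than "<" | $n^2 = o(2^n)$ |
| $f(n) = \omega(g(n))$ | $\forall c > 0, \exists n_0 > 0, \forall n \ge n_0: 0 \le c\,g(n) < f(n)$ | Greater than ">" | $n^2 = \omega(\log n)$ |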
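The $\Theta$ example $\log(n!) = \Theta(n \log n)$ can be checked directly from the definitions. A LaTeX sketch, assuming base-2 logarithms: the upper bound bounds every term by $\log n$, and the lower bound keeps only the top half of the terms, giving the constants $c = 1/4$, $n_0 = 4$.

```latex
\begin{aligned}
\log(n!) &= \sum_{i=1}^{n} \log i \le n \log n
  &&\Rightarrow\ \log(n!) = O(n \log n) \\
\log(n!) &\ge \sum_{i=\lceil n/2 \rceil}^{n} \log i \ge \frac{n}{2} \log \frac{n}{2}
  = \frac{n}{2}(\log n - 1) \ge \frac{n}{4} \log n \quad (n \ge 4)
  &&\Rightarrow\ \log(n!) = \Omega(n \log n)
\end{aligned}
```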

SLIDE 21
Topics: Asymptotics

  • Constant factors can be ignored
  • $\forall c > 0: cn = O(n)$
  • Smaller exponents are Big-Oh of larger exponents
  • $\forall a > b: n^b = O(n^a)$
  • Any logarithm is Big-Oh of any polynomial
  • $\forall a, \varepsilon > 0: \log_2^a n = O(n^{\varepsilon})$
  • Any polynomial is Big-Oh of any exponential
  • $\forall a > 0, b > 1: n^a = O(b^n)$
  • Lower order terms can be dropped
  • $n^2 + n^{3/2} + n = O(n^2)$

SLIDE 22

Practice Question: Asymptotics

  • Put these functions in order so that $f_i = O(f_{i+1})$
  • $n^{\log_2 7}$
  • $8^{\log_2 n}$
  • $2^{3 \log_2 \log_2 n}$
  • $2^{(\log_2 n)^2}$
  • $\sum_{i=1}^{n} i$
  • $n^2 \log_2 n$
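One way to attack this ordering is to rewrite each function as a power of $n$ (or of $\log_2 n$) and then compare exponents. The LaTeX sketch below reads the first function as $n^{\log_2 7}$; that reading of the extracted slide is an assumption.

```latex
\begin{aligned}
8^{\log_2 n} &= (2^{\log_2 n})^3 = n^3 \\
2^{3 \log_2 \log_2 n} &= (2^{\log_2 \log_2 n})^3 = (\log_2 n)^3 \\
2^{(\log_2 n)^2} &= (2^{\log_2 n})^{\log_2 n} = n^{\log_2 n} \\
\sum_{i=1}^{n} i &= \frac{n(n+1)}{2} = \Theta(n^2) \\
n^{\log_2 7} &\approx n^{2.807}
\end{aligned}
```

Under that reading, the order is $(\log_2 n)^3,\ \sum_{i=1}^{n} i,\ n^2 \log_2 n,\ n^{\log_2 7},\ 8^{\log_2 n},\ 2^{(\log_2 n)^2}$: $n^2 \log_2 n = O(n^{2.807})$ because any logarithm is $O(n^{\varepsilon})$, and $n^3 = O(n^{\log_2 n})$ because $\log_2 n \ge 3$ once $n \ge 8$.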

SLIDE 23

Practice Question: Asymptotics

  • Suppose $f_1 = O(g)$ and $f_2 = O(g)$. Prove that $f_1 + f_2 = O(g)$.
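A proof sketch straight from the definition of $O$, written in LaTeX: take the two witnesses $(c_1, n_1)$ and $(c_2, n_2)$, add the constants, and wait until both inequalities hold.

```latex
\begin{aligned}
&f_1 = O(g):\ \exists c_1, n_1 > 0\ \forall n \ge n_1:\ 0 \le f_1(n) \le c_1\,g(n) \\
&f_2 = O(g):\ \exists c_2, n_2 > 0\ \forall n \ge n_2:\ 0 \le f_2(n) \le c_2\,g(n) \\
&\text{Let } c = c_1 + c_2 \text{ and } n_0 = \max(n_1, n_2). \text{ Then for all } n \ge n_0: \\
&\quad 0 \le f_1(n) + f_2(n) \le c_1\,g(n) + c_2\,g(n) = c\,g(n)
\end{aligned}
```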

SLIDE 24

Topics: Recurrences

  • Recurrences
  • Representing running time by a recurrence
  • Solving common recurrences
  • Master Theorem
  • Good way to study:
  • Erickson book
  • Kleinberg-Tardos divide-and-conquer chapter

[Figure: drawing the recursion tree]
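For reference, here is one common simplified statement of the Master Theorem, the version for a polynomial additive term, as a LaTeX sketch. Other textbook versions handle more general additive terms.

```latex
T(n) = a\,T(n/b) + O(n^d),\quad a \ge 1,\ b > 1,\ d \ge 0
\;\Longrightarrow\;
T(n) =
\begin{cases}
O(n^d) & \text{if } d > \log_b a \\
O(n^d \log n) & \text{if } d = \log_b a \\
O(n^{\log_b a}) & \text{if } d < \log_b a
\end{cases}
```

For example, mergesort's $T(n) = 2T(n/2) + O(n)$ has $d = 1 = \log_2 2$, giving $O(n \log n)$.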

SLIDE 25

Practice Question: Recurrences

  • Write a recurrence for the running time of this algorithm. Write the asymptotic running time given by the recurrence.

F(n):
    For i = 1,…,n^2: Print "Hi"
    For i = 1,…,3: F(n/3)

$T(n) = n^2 + 3T(n/3)$
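A quick sanity check is to count the prints. The Python sketch below mirrors $T(n) = n^2 + 3T(n/3)$ for $n$ a power of 3. It assumes a base case the pseudocode leaves implicit (F does nothing once $n < 1$), and `count_hi` is a hypothetical name. Level $k$ of the recursion has $3^k$ calls of size $n/3^k$, so it prints $n^2/3^k$ times; the geometric series sums to $(3n^2 - n)/2 = \Theta(n^2)$.

```python
def count_hi(n):
    """Number of times F(n) prints "Hi", for n a power of 3.

    Assumed base case (not in the pseudocode): F(n) does nothing when n < 1.
    """
    if n < 1:
        return 0
    # n^2 prints at this level, then three recursive calls on n/3
    return n * n + 3 * count_hi(n // 3)


for k in range(5):
    n = 3 ** k
    print(n, count_hi(n), (3 * n * n - n) // 2)  # the last two columns agree
```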

SLIDE 26

Topics: Recurrences

  • Consider the recurrence $T(n) = \sqrt{n} \cdot T(\sqrt{n}) + n$ with $T(1) = 1$. Solve using a recursion tree.

$T(n) = \Theta(n \log \log n)$
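The recursion-tree argument can be written out level by level. In the LaTeX sketch below, level $\ell$ counts the root as level 0, and the recursion is assumed to stop once subproblems reach constant size (say 2), since repeated square roots never land exactly on 1. Every level does $n$ work, and there are about $\log_2 \log_2 n$ levels.

```latex
\begin{aligned}
\text{level } \ell:&\quad n^{1/2} \cdot n^{1/4} \cdots n^{1/2^{\ell}} = n^{1 - 1/2^{\ell}}
  \text{ subproblems, each of size } n^{1/2^{\ell}} \\
\text{work at level } \ell:&\quad n^{1 - 1/2^{\ell}} \cdot n^{1/2^{\ell}} = n \\
\text{depth:}&\quad n^{1/2^{L}} = 2 \iff 2^{L} = \log_2 n \iff L = \log_2 \log_2 n \\
\text{total:}&\quad T(n) = \Theta(n \log \log n)
\end{aligned}
```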

SLIDE 27

Topics: Divide-and-Conquer

  • Divide-and-Conquer
  • Writing pseudocode
  • Proving correctness by induction
  • Analyzing running time via recurrences
  • Examples we’ve studied:
  • Mergesort, Binary Search, Karatsuba’s, Selection
  • Good way to study:
  • Example problems from Kleinberg-Tardos or Erickson
  • Practice, practice, practice!

Good discussion of pseudocode
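Mergesort shows all three skills from this list in one place. Below is a minimal Python sketch; the name `merge_sort` is illustrative. Correctness follows by induction on the length: if the two recursive calls return sorted lists, the merge loop produces a sorted list. The running time satisfies $T(n) = 2T(n/2) + O(n) = O(n \log n)$.

```python
def merge_sort(a):
    """Return a sorted copy of list a."""
    if len(a) <= 1:
        return list(a)  # base case: already sorted
    mid = len(a) // 2
    left = merge_sort(a[:mid])   # sorted by the inductive hypothesis
    right = merge_sort(a[mid:])  # sorted by the inductive hypothesis
    # Merge: repeatedly take the smaller front element, O(n) total
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```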

SLIDE 28

Topics: Dynamic Programming

  • Dynamic Programming
  • Identify sub-problems
  • Write a recurrence, e.g. $\mathrm{OPT}(n) = \max\{v_n + \mathrm{OPT}(n-6),\ \mathrm{OPT}(n-1)\}$
  • Fill the dynamic programming table
  • Find the optimal solution
  • Analyze running time
  • Good way to study:
  • Example problems from Kleinberg-Tardos or Erickson
  • Practice, practice, practice!
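The slide gives only the recurrence, so the Python sketch below assumes a hypothetical reading of it: items $1, \ldots, n$ with values $v_1, \ldots, v_n$, where choosing item $k$ rules out items $k-5, \ldots, k-1$, and $\mathrm{OPT}(m) = 0$ for $m \le 0$. It walks through the listed steps: prefix subproblems, the recurrence, a bottom-up table, and a traceback that finds the optimal solution, all in $O(n)$ time. The name `best_subset` is hypothetical.

```python
def best_subset(v):
    """Hypothetical example: v[k - 1] holds v_k; choosing item k rules out items k-5..k-1.

    Returns (OPT(n), sorted list of chosen 1-indexed items).
    """
    n = len(v)

    def take(opt, k):
        # Value if item k is used: v_k + OPT(k - 6), with OPT(m) = 0 for m <= 0
        return v[k - 1] + (opt[k - 6] if k >= 6 else 0)

    # Fill the table: opt[k] = OPT(k) = max{v_k + OPT(k - 6), OPT(k - 1)}
    opt = [0] * (n + 1)
    for k in range(1, n + 1):
        opt[k] = max(take(opt, k), opt[k - 1])

    # Find the optimal solution by walking back through the table
    chosen = []
    k = n
    while k >= 1:
        if opt[k] == take(opt, k):  # using item k achieves OPT(k)
            chosen.append(k)
            k -= 6
        else:
            k -= 1
    return opt[n], sorted(chosen)


print(best_subset([5, 1, 1, 1, 1, 1, 4]))  # (9, [1, 7])
```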
SLIDE 29

Practice Question

  • Design an $O(n)$-time algorithm that takes an array $A[1:n]$ and returns a sorted array containing the smallest $\sqrt{n}$ elements of $A$
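One possible approach, sketched in Python: use linear-time selection (median of medians) to find the $k$-th smallest element with $k = \lfloor \sqrt{n} \rfloor$, keep everything no larger than it, and sort only those $k$ elements in $O(k \log k) = O(\sqrt{n} \log n) = O(n)$. The reading of the bound as $\sqrt{n}$, the rounding down, and the names `select` and `smallest_sqrt_sorted` are assumptions for illustration.

```python
import math


def select(a, k):
    """Return the k-th smallest element of list a (k is 0-indexed); median of medians, O(n)."""
    if len(a) <= 5:
        return sorted(a)[k]
    # Median of each group of at most 5 elements
    medians = []
    for s in range(0, len(a), 5):
        group = sorted(a[s:s + 5])
        medians.append(group[len(group) // 2])
    pivot = select(medians, len(medians) // 2)
    lows = [x for x in a if x < pivot]
    highs = [x for x in a if x > pivot]
    n_equal = len(a) - len(lows) - len(highs)
    if k < len(lows):
        return select(lows, k)
    if k < len(lows) + n_equal:
        return pivot
    return select(highs, k - len(lows) - n_equal)


def smallest_sqrt_sorted(a):
    """Sorted list of the floor(sqrt(n)) smallest elements of a."""
    k = math.isqrt(len(a))
    if k == 0:
        return []
    kth = select(a, k - 1)                      # O(n)
    below = [x for x in a if x < kth]           # O(n); at most k - 1 elements
    chosen = below + [kth] * (k - len(below))   # pad with copies of the k-th value (ties)
    return sorted(chosen)                       # O(k log k) = O(n)


print(smallest_sqrt_sorted(list(range(100, 0, -1))))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```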
SLIDE 30

Practice Question

  • Consider the following sorting algorithm
  • Prove that it is correct
  • Analyze its running time

A[1:n] is a global array

SillySort(1,n):
    if (n <= 2): put A in order
    else:
        SillySort(1,2n/3)
        SillySort(n/3,n)
        SillySort(1,2n/3)
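A runnable Python sketch of SillySort, assuming the fractions are rounded so that each recursive call covers $\lceil 2m/3 \rceil$ of the current $m$ elements (the pseudocode leaves the rounding open). Indices are 0-indexed and inclusive; `silly_sort` is a hypothetical name. The overlap between the "first two thirds" and the "last two thirds" is what lets the second call move the largest elements into place.

```python
def silly_sort(a, lo, hi):
    """Sort a[lo..hi] in place (inclusive, 0-indexed), following SillySort's three calls."""
    m = hi - lo + 1
    if m <= 2:
        # Base case "put A in order": at most one swap
        if m == 2 and a[lo] > a[hi]:
            a[lo], a[hi] = a[hi], a[lo]
        return
    t = m // 3  # elements left out of each call, so each call covers ceil(2m/3)
    silly_sort(a, lo, hi - t)  # first two thirds
    silly_sort(a, lo + t, hi)  # last two thirds
    silly_sort(a, lo, hi - t)  # first two thirds again


data = [5, 3, 9, 1, 7, 2, 8]
silly_sort(data, 0, len(data) - 1)
print(data)  # [1, 2, 3, 5, 7, 8, 9]
```

Each call does $O(1)$ work plus three calls on $2/3$ of the input, so $T(n) = 3T(2n/3) + O(1) = O(n^{\log_{3/2} 3}) \approx O(n^{2.71})$ by the Master Theorem.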

SLIDE 31

Dynamic Programming Practice

SLIDE 32

Chocolate Bar Splitting

  • Input: A chocolate bar with $n \times m$ pieces
  • Output: The minimum number of cuts needed to divide the block into perfect squares
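A possible DP, sketched in Python under an assumption the slide does not spell out: each cut goes straight across one rectangular piece along a grid line, splitting it into two rectangles. Then the subproblems are the rectangle sizes $a \times b$ with $a \le n$, $b \le m$, and an $a \times b$ piece is either already a square or is split by some horizontal or vertical first cut. That gives $O(nm)$ subproblems with $O(n + m)$ choices each. The name `min_cuts` is hypothetical.

```python
from functools import lru_cache


def min_cuts(n, m):
    """Fewest cuts to split an n x m bar into squares (straight cuts across one piece)."""

    @lru_cache(maxsize=None)
    def cuts(a, b):
        if a == b:
            return 0  # already a perfect square
        best = float("inf")  # no split tried yet; a != b guarantees one of the loops runs
        for i in range(1, a):  # horizontal cut: a x b -> i x b and (a - i) x b
            best = min(best, 1 + cuts(i, b) + cuts(a - i, b))
        for j in range(1, b):  # vertical cut: a x b -> a x j and a x (b - j)
            best = min(best, 1 + cuts(a, j) + cuts(a, b - j))
        return best

    return cuts(n, m)


print(min_cuts(3, 5))  # 3: a 3x3 square, then split the 3x2 remainder
```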

SLIDE 33

Chocolate Bar Splitting

SLIDE 34

Vankin’s Mile

  • Input: An $n \times n$ board of numbers
  • Rules:
  • Place a chip on the board
  • Keep moving the chip down or right until it falls off the board
  • Score = sum of the numbers your chip visited
  • Output: The best possible strategy

[Figure: example board showing a chip's path and the resulting score]
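One possible DP, sketched in Python: let $\mathrm{best}(i, j)$ be the highest score achievable once the chip is on square $(i, j)$. Then $\mathrm{best}(i, j) = A[i][j] + \max\{\mathrm{best}(i+1, j), \mathrm{best}(i, j+1)\}$, where a move off the board contributes 0. The answer is the maximum over all starting squares, computed in $O(n^2)$ time. The function name `vankins_mile` and the list-of-lists board are illustrative assumptions.

```python
def vankins_mile(board):
    """Best possible score on an n x n board (list of lists of numbers)."""
    n = len(board)
    # best[i][j] = max score with the chip on (i, j); row n and column n mean "fell off" = 0
    best = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            best[i][j] = board[i][j] + max(best[i + 1][j], best[i][j + 1])
    # Choose the best starting square
    return max(best[i][j] for i in range(n) for j in range(n))


print(vankins_mile([[1, -2], [3, 4]]))  # 8: start at (0, 0), move down, then right
```

The strategy itself can be recovered by starting at the maximizing square and repeatedly moving in the direction that achieved the max, stepping off the board when that direction is off the edge.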