CPSC 320: Intermediate Algorithm Design and Analysis, July 28, 2014

Course Outline
• Introduction and basic concepts
• Asymptotic notation
• Greedy algorithms
• Graph theory
• Amortized analysis
• Recursion
• Divide-and-conquer algorithms
• Randomized algorithms
• Dynamic programming algorithms
• NP-completeness

Dynamic Programming

Dynamic Programming Components
• Analyse the structure of an optimal solution
  • Separate one choice (usually the last) from a subproblem
  • Phrase the value of a choice as a function of the choice and the subproblem
• Phrase an optimal solution as the value of the best choice
  • Usually a max/min result
• Implement the calculation of the optimal value
  • Memoization: save optimal values as we compute them
  • Bottom-up: evaluate smaller problems and use them for bigger problems
  • Top-down: evaluate the big problem by calling smaller problems recursively and saving the result
• Keep a record of the choice made at each level
• Rebuild the optimal solution from the optimal value result

Knapsack Problem
Algorithm Knapsack(w, p, M) -- w is array of weights, p is array of values, M is limit
    -- r[i, m]: best value using items 1..i with limit m; l[i, m]: last item taken in that solution
    r[0, m] ← 0, l[0, m] ← false for m = 0, 1, 2, …, M
    For i ← 1 To |w| Do
        For m ← 0 To M Do
            If w[i] > m Or r[i − 1, m] > r[i − 1, m − w[i]] + p[i] Then
                r[i, m] ← r[i − 1, m], l[i, m] ← l[i − 1, m]
            Else
                r[i, m] ← r[i − 1, m − w[i]] + p[i], l[i, m] ← i
    s ← ∅, x ← |w|
    While M > 0 And l[x, M] is not false Do
        x ← l[x, M], s ← s ∪ {x}, M ← M − w[x], x ← x − 1
    Return s
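The table-filling and rebuilding steps of the knapsack pseudocode can be sketched in Python as follows. This is a minimal sketch, assuming positive integer weights and a non-negative integer limit; the names `knapsack`, `best` and `last` are illustrative and do not come from the slides, and items are numbered from 0 in the returned list.

```python
def knapsack(weights, values, limit):
    """0/1 knapsack, bottom-up. Returns (best value, sorted 0-based item indices).

    Assumes positive integer weights and a non-negative integer limit.
    """
    n = len(weights)
    # best[i][m]: best total value using the first i items with capacity m
    best = [[0] * (limit + 1) for _ in range(n + 1)]
    # last[i][m]: 1-based index of the last item taken in that solution, or None
    last = [[None] * (limit + 1) for _ in range(n + 1)]

    for i in range(1, n + 1):
        w, v = weights[i - 1], values[i - 1]
        for m in range(limit + 1):
            skip = best[i - 1][m]
            if w > m or skip > best[i - 1][m - w] + v:
                # Item i does not fit, or leaving it out is strictly better.
                best[i][m] = skip
                last[i][m] = last[i - 1][m]
            else:
                best[i][m] = best[i - 1][m - w] + v
                last[i][m] = i

    # Rebuild the chosen set by following the recorded choices backwards.
    chosen = []
    i, m = n, limit
    while m > 0 and last[i][m] is not None:
        i = last[i][m]
        chosen.append(i - 1)
        m -= weights[i - 1]
        i -= 1
    return best[n][limit], sorted(chosen)


if __name__ == "__main__":
    print(knapsack([1, 3, 4, 5], [1, 4, 5, 7], 7))
```

Both tables have (n + 1) × (limit + 1) entries, which is where the running time and the memory use come from.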

Knapsack Algorithm - Complexity
• What is the time complexity of the knapsack algorithm?
  • $O(nW)$ (number of items $n$ times the weight limit $W$)
• This algorithm is called pseudo-polynomial
  • Time complexity is based on the value of the input, not just the size
• There is no known polynomial algorithm to solve the knapsack problem
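A short derivation may help show why a bound proportional to the weight limit is not polynomial in the input size. It assumes the limit $W \ge 1$ is written in binary using $b$ bits; the symbol $b$ is introduced here only for illustration.

```latex
b = \lfloor \log_2 W \rfloor + 1
\quad\Longrightarrow\quad
2^{\,b-1} \le W < 2^{\,b}
\quad\Longrightarrow\quad
O(nW) = O\!\left(n \cdot 2^{\,b}\right)
```

Adding one bit to the encoding of the limit can double the running time, so the bound is exponential in the number of bits even though it is polynomial in the numeric value.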

Algorithm Strategies - Review
• Dynamic programming algorithms:
  • Choice is made based on evaluation of all possible results
  • Time and space complexity are usually higher
• Greedy algorithms:
  • Choice is made based on locally optimal solution
  • Usually faster, but may not result in globally optimal solution
• Divide and conquer algorithms:
  • Choice of input division is made based on the assumption that merging the results of subproblems is optimal

Global Sequence Alignment Problem
• Problem: given two sequences, analyse how similar they are
  • Allow both gaps and mismatches
• Application:
  • Finding suggestions for misspelled words (comparing strings)
  • Comparing files (diff)
  • Analysing whether two pieces of DNA match
• Example: “ocurrance” vs “occurrence”
  • There is a letter “c” missing (gap)
  • An “a” was used instead of an “e” (mismatch)
  • Mismatches may be seen as gaps on both sides
  • “oc-urra-nce” vs “occurr-ence”

Formal Definition
• We represent a gap with a hyphen “−”
• A sequence alignment of $(X, Y)$ is a pair $(X', Y')$ of sequences, such that:
  • $X'$ minus the gaps is $X$, $Y'$ minus the gaps is $Y$
  • $|X'| = |Y'|$ (the size is the same for both sides)
  • If $X'_i = -$, then $Y'_i \neq -$ (you can’t have gaps on both sides)
• A parameter $\delta > 0$ defines the gap penalty (penalty if one side has a gap)
• A parameter $\alpha_{pq}$ defines the mismatch penalty of matching $p$ and $q$ ($\alpha_{pp} = 0$)
• The cost of a matching $(X', Y')$ is $\sum_{i=1}^{|X'|} \mathrm{pen}(x'_i, y'_i)$, where $\mathrm{pen}(x'_i, y'_i)$ is $\delta$ if either symbol is a gap and $\alpha_{x'_i y'_i}$ otherwise
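As an illustration of the cost definition, the following sketch scores an alignment that is already given. It assumes a plain ASCII hyphen as the gap symbol and a hypothetical `unit_mismatch` penalty (0 for equal letters, 1 otherwise); neither choice comes from the slides.

```python
GAP = "-"  # assumption: ASCII hyphen marks a gap


def unit_mismatch(p, q):
    """Hypothetical mismatch penalty: 0 if the symbols agree, 1 otherwise."""
    return 0 if p == q else 1


def alignment_cost(x_aligned, y_aligned, delta, alpha):
    """Sum the per-position penalties of a given alignment."""
    if len(x_aligned) != len(y_aligned):
        raise ValueError("both sides of an alignment must have the same length")
    total = 0
    for a, b in zip(x_aligned, y_aligned):
        if a == GAP and b == GAP:
            raise ValueError("a position cannot have gaps on both sides")
        if a == GAP or b == GAP:
            total += delta          # gap penalty
        else:
            total += alpha(a, b)    # mismatch penalty (0 when a == b)
    return total


if __name__ == "__main__":
    # Three gaps, no mismatches.
    print(alignment_cost("oc-urra-nce", "occurr-ence", 1, unit_mismatch))
    # One gap, one mismatch (a vs e).
    print(alignment_cost("oc-urrance", "occurrence", 1, unit_mismatch))
```

With these penalties the second alignment is cheaper, which shows why treating a mismatch as two gaps is only worthwhile when the mismatch penalty exceeds twice the gap penalty.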

Finding the Best Alignment
• What is the choice to be made?
  • Last character could be a gap on either side, or a potential mismatch
• Assume $F(i, j)$ is the penalty for the best alignment of $x_1 .. x_i$ and $y_1 .. y_j$
$$F(i, j) = \begin{cases} j \cdot \delta & i = 0 \\ i \cdot \delta & j = 0 \\ \min\{F(i-1, j-1) + \alpha_{x_i y_j},\; F(i-1, j) + \delta,\; F(i, j-1) + \delta\} & \text{otherwise} \end{cases}$$
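The recurrence for the best alignment penalty can be evaluated top-down with memoization, roughly as sketched below. The function name `alignment_penalty` and the use of `functools.lru_cache` are choices made for this sketch, not something prescribed by the slides; because it recurses once per character, it is only suitable for short inputs.

```python
from functools import lru_cache


def alignment_penalty(x, y, delta, alpha):
    """Penalty of the best alignment of x and y (top-down, memoized)."""

    @lru_cache(maxsize=None)
    def best(i, j):
        # best(i, j): penalty for aligning the first i symbols of x
        # with the first j symbols of y.
        if i == 0:
            return j * delta
        if j == 0:
            return i * delta
        return min(
            best(i - 1, j - 1) + alpha(x[i - 1], y[j - 1]),  # match or mismatch
            best(i - 1, j) + delta,                           # x[i] against a gap
            best(i, j - 1) + delta,                           # y[j] against a gap
        )

    return best(len(x), len(y))


if __name__ == "__main__":
    unit = lambda p, q: 0 if p == q else 1
    print(alignment_penalty("ocurrance", "occurrence", 1, unit))
```

Each pair (i, j) is computed once and then served from the cache, so the number of evaluations is bounded by (|x| + 1)(|y| + 1).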

Algorithm (Needleman-Wunsch)
Algorithm NeedlemanWunsch(X, Y, δ, α)
    For i ← 0 To |X| Do
        F[i, 0] ← i ⋅ δ, H[i, 0] ← ”gap in Y”
    For j ← 1 To |Y| Do
        F[0, j] ← j ⋅ δ, H[0, j] ← ”gap in X”
        For i ← 1 To |X| Do
            m ← F[i − 1, j − 1] + α[X[i], Y[j]]                  -- matching cost
            g_x ← F[i, j − 1] + δ, g_y ← F[i − 1, j] + δ         -- gap penalty in X, Y
            If m ≤ g_x And m ≤ g_y Then
                F[i, j] ← m, H[i, j] ← ”match”
            Else If g_x ≤ g_y Then
                F[i, j] ← g_x, H[i, j] ← ”gap in X”
            Else
                F[i, j] ← g_y, H[i, j] ← ”gap in Y”

Algorithm (cont.)
    …
    X′ ← “”, Y′ ← “”
    i ← |X|, j ← |Y|
    While i > 0 Or j > 0 Do
        If H[i, j] = “match” Then
            X′ ← X[i] . X′, Y′ ← Y[j] . Y′
            i ← i − 1, j ← j − 1
        Else If H[i, j] = “gap in X” Then
            X′ ← − . X′, Y′ ← Y[j] . Y′
            j ← j − 1
        Else
            X′ ← X[i] . X′, Y′ ← − . Y′
            i ← i − 1
    Return X′, Y′, F[|X|, |Y|]
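The table-filling and traceback steps of the alignment algorithm can be put together in one runnable sketch. The function name `align`, the table names `cost` and `choice`, and the ASCII hyphen as gap symbol are assumptions of this sketch. It also initialises the border entries of the choice table explicitly, so that the traceback stays well defined once one of the two sequences is exhausted.

```python
def align(x, y, delta, alpha):
    """Global alignment, bottom-up. Returns (x_aligned, y_aligned, penalty)."""
    m, n = len(x), len(y)
    cost = [[0] * (n + 1) for _ in range(m + 1)]
    choice = [[None] * (n + 1) for _ in range(m + 1)]

    # Borders: one sequence is empty, so every remaining symbol faces a gap.
    for i in range(1, m + 1):
        cost[i][0], choice[i][0] = i * delta, "gap in Y"
    for j in range(1, n + 1):
        cost[0][j], choice[0][j] = j * delta, "gap in X"

    for j in range(1, n + 1):
        for i in range(1, m + 1):
            match = cost[i - 1][j - 1] + alpha(x[i - 1], y[j - 1])
            gap_x = cost[i][j - 1] + delta   # y[j] aligned with a gap in x
            gap_y = cost[i - 1][j] + delta   # x[i] aligned with a gap in y
            if match <= gap_x and match <= gap_y:
                cost[i][j], choice[i][j] = match, "match"
            elif gap_x <= gap_y:
                cost[i][j], choice[i][j] = gap_x, "gap in X"
            else:
                cost[i][j], choice[i][j] = gap_y, "gap in Y"

    # Traceback: collect symbols back to front, then reverse.
    xa, ya = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if choice[i][j] == "match":
            xa.append(x[i - 1]); ya.append(y[j - 1])
            i, j = i - 1, j - 1
        elif choice[i][j] == "gap in X":
            xa.append("-"); ya.append(y[j - 1])
            j -= 1
        else:
            xa.append(x[i - 1]); ya.append("-")
            i -= 1
    return "".join(reversed(xa)), "".join(reversed(ya)), cost[m][n]


if __name__ == "__main__":
    unit = lambda p, q: 0 if p == q else 1
    print(align("ocurrance", "occurrence", 1, unit))
```

Appending to lists and reversing once at the end avoids the repeated string prepending of the pseudocode, which would cost quadratic time in Python.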

Longest Common Subsequence
• Subsequence: any sequence of items that is contained in the original sequence in the same order (but not necessarily consecutively)
  • Example: $B, C, D, B$ is a subsequence of $A, \mathbf{B}, \mathbf{C}, B, \mathbf{D}, A, \mathbf{B}$
• Problem: Given two sequences $X$ and $Y$, find the longest common subsequence of $X$ and $Y$
• Application:
  • Find common DNA sequences in different organisms
  • Video compression (inter-frame comparison)

Characterizing the LCS
• Define $X_i$ as the sequence $X$ limited to the first $i$ elements
• Given two sequences $X = x_1, .., x_m$ and $Y = y_1, .., y_n$, let $Z = z_1, .., z_k$ be a longest common subsequence (LCS) of $X$ and $Y$
  • If $x_m = y_n$, then $z_k = x_m = y_n$, and $Z_{k-1}$ is an LCS of $X_{m-1}$ and $Y_{n-1}$
  • If $x_m \neq y_n$, then $Z$ is either an LCS of $X_m$ and $Y_{n-1}$, or an LCS of $X_{m-1}$ and $Y_n$
• Define the length of the LCS of $X_i$ and $Y_j$ as:
$$c[i, j] = \begin{cases} 0 & i = 0 \lor j = 0 \\ c[i-1, j-1] + 1 & i, j > 0 \land x_i = y_j \\ \max\{c[i, j-1],\; c[i-1, j]\} & \text{otherwise} \end{cases}$$

Algorithm
Algorithm LongestCommonSubsequence(X, Y)
    For i ← 0 To |X| Do
        c[i, 0] ← 0
    For j ← 1 To |Y| Do
        c[0, j] ← 0
        For i ← 1 To |X| Do
            If X[i] = Y[j] Then
                c[i, j] ← c[i − 1, j − 1] + 1, h[i, j] ← “+”
            Else If c[i − 1, j] > c[i, j − 1] Then
                c[i, j] ← c[i − 1, j], h[i, j] ← “X”
            Else
                c[i, j] ← c[i, j − 1], h[i, j] ← “Y”
    PrintLCS(h, X, |X|, |Y|)
    Return c[|X|, |Y|]

Algorithm (cont.)
Algorithm PrintLCS(h, X, i, j)
    If i = 0 Or j = 0 Then
        Return
    If h[i, j] = “+” Then
        PrintLCS(h, X, i − 1, j − 1)
        Print X[i]
    Else If h[i, j] = “X” Then
        PrintLCS(h, X, i − 1, j)
    Else
        PrintLCS(h, X, i, j − 1)
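The two LCS procedures can be combined into a single Python sketch. It returns the subsequence as a list instead of printing it, and it replaces the recursive printing with a loop; both are choices made for this sketch, as is the function name `longest_common_subsequence`.

```python
def longest_common_subsequence(x, y):
    """Returns (length, one longest common subsequence as a list)."""
    m, n = len(x), len(y)
    # c[i][j]: LCS length of the first i items of x and first j items of y
    c = [[0] * (n + 1) for _ in range(m + 1)]
    # h[i][j]: which case of the recurrence produced c[i][j]
    h = [[None] * (n + 1) for _ in range(m + 1)]

    for j in range(1, n + 1):
        for i in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j], h[i][j] = c[i - 1][j - 1] + 1, "+"
            elif c[i - 1][j] > c[i][j - 1]:
                c[i][j], h[i][j] = c[i - 1][j], "X"
            else:
                c[i][j], h[i][j] = c[i][j - 1], "Y"

    # Walk back from the bottom-right corner, collecting matched items.
    items = []
    i, j = m, n
    while i > 0 and j > 0:
        if h[i][j] == "+":
            items.append(x[i - 1])
            i, j = i - 1, j - 1
        elif h[i][j] == "X":
            i -= 1
        else:
            j -= 1
    items.reverse()
    return c[m][n], items


if __name__ == "__main__":
    length, items = longest_common_subsequence("ABCBDAB", "BDCABA")
    print(length, "".join(items))
```

When several subsequences share the maximum length, the one returned depends on how ties between the two non-matching cases are broken.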

NP Complexity

Time Complexity for Decision Problems
• From this point on we analyse time complexity for problems, not algorithms
  • We want to know what is the best possible complexity for the problem
• Our focus now is on decision problems, not optimization problems
  • Decision problems: Yes/No answer
  • Optimization: “find best”, “find maximum”, “find minimum”
• We also need to distinguish “finding” and “checking” a solution

Time Complexity - Classes
• A problem is solvable in polynomial time if there is an algorithm that solves it and runs in $O(n^k)$, where $k$ is a constant ($k \in \Theta(1)$) and $n$ is the size of the input representation
  • Example: sort ($O(n \log n) \subset O(n^2)$), select ($O(n)$), longest common subsequence ($O(n^2)$), matrix multiplication ($O(n^3)$ or better)
• P: set of all decision problems that are solvable in polynomial time
• NP (nondeterministic polynomial time): set of all decision problems for which a certificate of a “yes” answer can be checked in polynomial time

Example: Hamiltonian Path
• Problem: given a graph, is there a path that goes through every node exactly once?
  • Decision problem: answer is yes or no
  • Optimization problem: find a path with minimum cost, etc.; not required
• Is this problem in NP?
  • Given a path, can we verify that the path is correct in polynomial time?
• Is this problem in P?
  • Can we solve it in polynomial time?
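Checking a proposed Hamiltonian path is straightforward, which is what places the problem in NP. The sketch below assumes the graph is given as a dictionary mapping each node to the set of its neighbours and the certificate is a list of nodes; this representation and the name `is_hamiltonian_path` are assumptions of the sketch.

```python
def is_hamiltonian_path(graph, path):
    """Check a certificate: does `path` visit every node of `graph` exactly once
    while only moving along edges?

    graph: dict mapping each node to the set of its neighbours.
    path:  list of nodes in visiting order.
    """
    # Same length and same node set together imply each node appears exactly once.
    if len(path) != len(graph) or set(path) != set(graph):
        return False
    # Every consecutive pair must be joined by an edge.
    return all(v in graph[u] for u, v in zip(path, path[1:]))


if __name__ == "__main__":
    line = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
    print(is_hamiltonian_path(line, [1, 2, 3, 4]))
    print(is_hamiltonian_path(line, [1, 3, 2, 4]))
```

The check does a constant amount of set work per node and per consecutive pair, so it runs in time polynomial in the size of the graph. It says nothing about how hard it is to find such a path in the first place.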
