02110

02110 String indexing Computational geometry Introduction to - PowerPoint PPT Presentation

Overview of course 02110: balanced binary search trees (red-black trees and 2-3-4 trees), amortized analysis, dynamic programming, network flows, string matching, string indexing, computational geometry, introduction to NP-completeness.


  1. Overview (02110, Inge Li Gørtz)
• Balanced binary search trees: Red-black trees and 2-3-4 trees
• Amortized analysis
• Dynamic programming
• Network flows
• String matching
• String indexing
• Computational geometry
• Introduction to NP-completeness
• Randomized algorithms

Balanced binary search trees
• 2-3-4 trees.
  • Allow 1, 2, or 3 keys per node.
  • Perfect balance. Every path from root to leaf has same length.
  [Figure: 2-3-4 tree with root keys E and R; the subtrees hold the keys smaller than E, between E and R, and larger than R.]
• Red-black trees.
  • The root is always black.
  • All root-to-leaf paths have the same number of black nodes.
  • Red nodes do not have red children.
  • All leaves (NIL) are black.
  [Figure: red-black tree on the same keys.]

Splay trees
• Self-adjusting BST (Sleator-Tarjan 1983).
• Most frequently accessed nodes are close to the root.
• Tree reorganizes itself after each operation.
• After access to a node it is moved to the root by splay operation.
• Worst case time for insertion, deletion and search is O(n).
• Amortized time per operation O(log n).
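The four red-black properties listed above can be checked mechanically. The following is a minimal sketch, not part of the slides: the class `RBNode`, the string colors, and the convention that `None` stands for a black NIL leaf are all assumptions made for illustration. It does not check the search-tree ordering of the keys.

```python
class RBNode:
    """Hypothetical node type; None plays the role of a black NIL leaf."""
    def __init__(self, key, color, left=None, right=None):
        self.key = key
        self.color = color      # "red" or "black"
        self.left = left
        self.right = right


def black_height(node):
    """Return the number of black nodes on every path from node to a NIL
    leaf (the NIL included), or None if a red-black property is violated."""
    if node is None:
        return 1                # NIL leaves are black
    if node.color == "red":
        for child in (node.left, node.right):
            if child is not None and child.color == "red":
                return None     # a red node has a red child
    left = black_height(node.left)
    right = black_height(node.right)
    if left is None or right is None or left != right:
        return None             # paths with different numbers of black nodes
    return left + (1 if node.color == "black" else 0)


def is_red_black(root):
    """Check root color, red-red violations, and equal black counts."""
    if root is not None and root.color != "black":
        return False
    return black_height(root) is not None
```

For example, a black root with two red children passes the check, while a black root with a single black child fails because the two sides have different numbers of black nodes.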

  2. Splaying
• Splay(x): do following rotations until x is the root. Let p(x) be the parent of x.
  • right (or left): if x has no grandparent.
  • zig-zag (or zag-zig): if one of x, p(x) is a left child and the other is a right child.
  • roller-coaster: if x and p(x) are either both left children or both right children.
[Figures: right rotation at x (and left rotation at y); zig-zag at x; right roller-coaster at x (and left roller-coaster at z).]

Dynamic set implementations
Worst case running times (except splay trees)

Implementation   search      insert      delete      minimum     maximum     successor   predecessor
linked lists     O(n)        O(1)        O(1)        O(n)        O(n)        O(n)        O(n)
ordered array    O(log n)    O(n)        O(n)        O(1)        O(1)        O(log n)    O(log n)
BST              O(h)        O(h)        O(h)        O(h)        O(h)        O(h)        O(h)
2-3-4 tree       O(log n)    O(log n)    O(log n)    O(log n)    O(log n)    O(log n)    O(log n)
red-black tree   O(log n)    O(log n)    O(log n)    O(log n)    O(log n)    O(log n)    O(log n)
splay tree       O(log n) †  O(log n) †  O(log n) †  O(log n) †  O(log n) †  O(log n) †  O(log n) †

†: amortized running time
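The three splay cases can be sketched in code as follows. This is an illustrative sketch under assumptions, not the lecture's implementation: the `Node` class with parent pointers and the helpers `rotate` and `inorder` are names introduced here. A single `rotate(x)` lifts x above its parent, which covers both the right and the left rotation.

```python
class Node:
    """Hypothetical BST node with a parent pointer."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.parent = None


def rotate(x):
    """Lift x above its parent p(x): a right rotation if x is a left child,
    otherwise a left rotation."""
    p = x.parent
    g = p.parent
    if p.left is x:                 # right rotation at x
        p.left = x.right
        if x.right is not None:
            x.right.parent = p
        x.right = p
    else:                           # left rotation at x
        p.right = x.left
        if x.left is not None:
            x.left.parent = p
        x.left = p
    p.parent = x
    x.parent = g
    if g is not None:               # reattach x where p used to hang
        if g.left is p:
            g.left = x
        else:
            g.right = x


def splay(x):
    """Apply rotations until x is the root; return x."""
    while x.parent is not None:
        p = x.parent
        g = p.parent
        if g is None:
            rotate(x)               # right (or left): no grandparent
        elif (g.left is p) == (p.left is x):
            rotate(p)               # roller-coaster: parent first,
            rotate(x)               # then x
        else:
            rotate(x)               # zig-zag: rotate x twice
            rotate(x)
    return x


def inorder(node):
    """Keys in sorted order; used to confirm rotations keep the BST order."""
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)
```

Splaying the deepest node of a left-leaning chain 3, 2, 1 uses the roller-coaster case and leaves 1 at the root with the in-order sequence unchanged.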

  3. Amortized analysis
• Amortized analysis.
  • Time required to perform a sequence of data operations is averaged over all the operations performed.
• Example: dynamic tables with doubling and halving
  • If the table is full copy the elements to a new array of double size.
  • If the table is a quarter full copy the elements to a new array of half the size.
  • Worst case time for insertion or deletion: O(n)
  • Amortized time for insertion and deletion: O(1)
  • Any sequence of n insertions and deletions takes time O(n).
• Methods.
  • Aggregate method
  • Accounting method
  • Potential method

Dynamic Programming
• General algorithmic technique
• Can be used when the problem has "optimal substructure": solution can be constructed from optimal solutions to subproblems.
• Examples
  • Rod cutting
  • Longest common subsequence
  • Sequence alignment
  • All pairs shortest path

Longest common subsequence
• subproblem property: write X_i = X_{i-1} x_i and Y_j = Y_{j-1} y_j. Then

\[
\mathrm{LCS}(X_i, Y_j) =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0 \\
\mathrm{LCS}(X_{i-1}, Y_{j-1}) + 1 & \text{if } x_i = y_j \\
\max(\mathrm{LCS}(X_i, Y_{j-1}), \mathrm{LCS}(X_{i-1}, Y_j)) & \text{if } x_i \neq y_j
\end{cases}
\]

• Example table for X = BANANAS (columns) and Y = SANDALS (rows):

      B  A  N  A  N  A  S
  S   0  0  0  0  0  0  1
  A   0  1  1  1  1  1  1
  N   0  1  2  2  2  2  2
  D   0  1  2  2  2  2  2
  A   0  1  2  3  3  3  3
  L   0  1  2  3  3  3  3
  S   0  1  2  3  3  3  4

[Figure annotations: the entries that the highlighted cell LCS(X_5, Y_4) depends on; the table holds the value, not the solution.]
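The LCS recurrence can be filled in bottom-up, one table row at a time. The sketch below is an illustration under assumptions: the function names `lcs_table` and `lcs` are introduced here, strings are 0-indexed (so x_i is `x[i - 1]`), and the traceback is one of several valid ways to recover a solution from the values.

```python
def lcs_table(x, y):
    """Return table L where L[i][j] is the LCS length of x[:i] and y[:j]."""
    m, n = len(x), len(y)
    L = [[0] * (n + 1) for _ in range(m + 1)]   # row 0 and column 0 stay 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i][j - 1], L[i - 1][j])
    return L


def lcs(x, y):
    """Recover one longest common subsequence by walking the table backwards."""
    L = lcs_table(x, y)
    i, j = len(x), len(y)
    out = []
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])    # matched character is part of the LCS
            i -= 1
            j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1                  # value came from the entry with i - 1
        else:
            j -= 1                  # value came from the entry with j - 1
    return "".join(reversed(out))
```

With `x = "BANANAS"` and `y = "SANDALS"`, `lcs_table(x, y)[7][7]` is 4 and `lcs_table(x, y)[5][4]` is 2, which agrees with the table on the slide. Filling the table takes O(mn) time and space.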

  4. Network Flow
• Network flow:
  • graph G=(V,E).
  • Special vertices s (source) and t (sink).
  • Every edge (u,v) has a capacity c(u,v) ≥ 0.
• Flow:
  • capacity constraint: every edge (u,v) has a flow 0 ≤ f(u,v) ≤ c(u,v).
  • flow conservation: for all u ≠ s, t: flow into u equals flow out of u.

\[
\sum_{v : (v,u) \in E} f(v,u) = \sum_{v : (u,v) \in E} f(u,v)
\]

• Value of flow f is the sum of flows out of s minus sum of flows into s:

\[
|f| = \sum_{v : (s,v) \in E} f(s,v) - \sum_{v : (v,s) \in E} f(v,s)
\]

• Maximum flow problem: find s-t flow of maximum value
[Figure: example flow network from s to t with edge capacities 1 and 2.]

Augmenting paths
• Augmenting path (definition different than in CLRS): s-t path where
  • forward edges have leftover capacity
  • backwards edges have positive flow
[Figure: augmenting path from s to t with f1 < c1, f2 > 0, f3 < c3, f4 < c4, f5 > 0, f6 > 0; flow changes by +δ on forward edges and -δ on backward edges.]
• There is no augmenting path <=> f is a maximum flow.
• Ford-Fulkerson algorithm:
  • Repeatedly find augmenting path, use it, until no augmenting path exists
  • Running time: O(|f*| m).
• Edmonds-Karp algorithm:
  • Repeatedly find shortest augmenting path, use it, until no augmenting path exists
  • Use BFS to find a shortest augmenting path.
  • Running time: O(nm^2)

Network flow: s-t Cuts
• Cut: Partition of vertices into S and T, such that s ∈ S and t ∈ T.
• Capacity of cut: total capacity of edges going from S to T.
• Flow across cut: flow from S to T minus flow from T to S.
• Value of any flow: |f| ≤ c(S,T) for any s-t cut (S,T).
• Suppose we have found flow f and cut (S,T) such that |f| = c(S,T). Then f is a maximum flow and (S,T) is a minimum cut.
• Find minimum cut. All vertices to which there is an augmenting path from s go into S, rest into T.
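The Edmonds-Karp loop and the minimum-cut rule can be combined in one short routine. This is a sketch under stated assumptions rather than a reference implementation: vertices are numbered 0..n-1, the function name `edmonds_karp` and its `(u, v, capacity)` edge-list input are choices made here, and residual capacities are kept in an n×n matrix for brevity, so each BFS costs O(n^2) instead of the O(m) that an adjacency list would give.

```python
from collections import deque


def edmonds_karp(n, edges, s, t):
    """Return (max flow value, set S of a minimum s-t cut).

    residual[u][v] is the leftover capacity on a forward edge plus the
    flow that could be pushed back along a backward edge."""
    residual = [[0] * n for _ in range(n)]
    for u, v, c in edges:
        residual[u][v] += c
    value = 0
    while True:
        # BFS from s finds a shortest augmenting path in the residual graph.
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            # No augmenting path: vertices reached from s form the S side.
            return value, {v for v in range(n) if parent[v] != -1}
        # Bottleneck delta along the path found.
        delta = float("inf")
        v = t
        while v != s:
            delta = min(delta, residual[parent[v]][v])
            v = parent[v]
        # Use the path: +delta forward, and allow undoing it later.
        v = t
        while v != s:
            u = parent[v]
            residual[u][v] -= delta
            residual[v][u] += delta
            v = u
        value += delta
```

A small usage example: `edmonds_karp(4, [(0, 1, 2), (0, 2, 2), (1, 2, 1), (1, 3, 2), (2, 3, 2)], 0, 3)` returns the flow value 4 together with the cut side `{0}`, whose outgoing capacity is also 4.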

  5. Network flow
• Can model and solve many problems via maximum flow.
  • Maximum bipartite matching
  • k edge-disjoint paths
  • capacities on vertices
  • Many sources/sinks
  • assignment problems: Example. X doctors, Y holidays, each doctor should work on at most c holidays, each doctor is available on some of the holidays.
[Figure: flow network for the doctors example with source s, sink t, and edge capacities c and 1.]

String Matching
• String matching problem:
  • string T (text) and string P (pattern) over an alphabet Σ. |T| = n, |P| = m.
  • Report all starting positions of occurrences of P in T.
• String matching automaton. Running time: O(n + m|Σ|)
• Knuth-Morris-Pratt (KMP). Running time: O(m + n)
• Rabin-Karp (fingerprinting). Running time: Expected O(m + n)

Finite Automaton
• Finite automaton: alphabet Σ = {a,b,c}. P = ababaca.
[Figure: automaton for P with a starting state, an accepting state, and one state per matched prefix. Example transition: after reading 'abaa' the automaton is in the state for the longest prefix of P that is a suffix of 'abaa'.]
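The automaton's transition rule, "go to the longest prefix of P that is a suffix of what has been read", can be written down directly. The sketch below is an assumption-laden illustration: the names `build_automaton` and `automaton_search` are introduced here, positions are 0-based, and the construction is the naive one, which is slower than the O(m|Σ|) preprocessing quoted on the slide.

```python
def build_automaton(pattern, alphabet):
    """delta[q][a] = length of the longest prefix of pattern that is a
    suffix of pattern[:q] + a. State q means 'q characters matched'."""
    m = len(pattern)
    delta = [dict() for _ in range(m + 1)]
    for q in range(m + 1):
        for a in alphabet:
            read = pattern[:q] + a
            k = min(m, q + 1)
            # Shrink k until pattern[:k] is a suffix of what was read.
            while k > 0 and not read.endswith(pattern[:k]):
                k -= 1
            delta[q][a] = k
    return delta


def automaton_search(text, pattern, alphabet):
    """Return the 0-based starting positions of pattern in text."""
    delta = build_automaton(pattern, alphabet)
    m = len(pattern)
    q = 0                               # starting state
    hits = []
    for i, ch in enumerate(text):
        q = delta[q].get(ch, 0)         # characters outside the alphabet reset
        if q == m:                      # accepting state reached
            hits.append(i - m + 1)
    return hits
```

For P = ababaca and Σ = {a,b,c}, `build_automaton("ababaca", "abc")[3]["a"]` is 1, matching the slide's example: the longest prefix of P that is a suffix of 'abaa' is 'a'. Once the table is built, scanning the text takes one table lookup per character.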

  6. Knuth-Morris-Pratt (KMP)
• Matched P[1…q]: Find longest block P[1..k] that matches end of P[2..q].
• Find longest prefix P[1...k] of P that is a proper suffix of P[1...q]
• Array π[1…m]:
  • π[q] = max k < q such that P[1...k] is a suffix of P[1…q].
• Can be seen as finite automaton with failure links:
[Figure: automaton for P = ababaca with failure links.]

  i      1  2  3  4  5  6  7
  P[i]   a  b  a  b  a  c  a
  π[i]   0  0  1  2  3  0  1

String Indexing
• String indexing problem. Given a string S of characters from an alphabet Σ. Preprocess S into a data structure to support
  • Search(P): Return starting position of all occurrences of P in S.
• Tries.
• Compressed trie. Chains of nodes with a single child merged into a single node
• Suffix tree. Compressed trie over all the suffixes of a string.

Suffix tree
• Suffix tree. Compressed trie over all suffixes of a string.
[Figure: suffix tree of banana$ (positions 0 to 6: b a n a n a $), drawn once with string edge labels and once with the labels stored as index ranges such as [0,6], [2,3], [4,6] and [6]; leaves are numbered by suffix start position.]
• Suffix trees can be used to solve the String indexing problem in:
  • Space: O(n)
  • Search time: O(m+occ)
  • Preprocessing: O(sort(n,|Σ|)) time
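Returning to KMP, the array π and the search that follows failure links can be sketched as below. The function names `prefix_function` and `kmp_search` are assumptions made for this illustration, and the code uses 0-based indexing, so `pi[q]` here corresponds to π[q+1] in the slide's 1-based notation.

```python
def prefix_function(p):
    """pi[q] = length of the longest proper prefix of p[:q + 1]
    that is also a suffix of p[:q + 1]."""
    pi = [0] * len(p)
    k = 0                               # length of the current matched prefix
    for q in range(1, len(p)):
        while k > 0 and p[k] != p[q]:
            k = pi[k - 1]               # follow the failure link
        if p[k] == p[q]:
            k += 1
        pi[q] = k
    return pi


def kmp_search(text, p):
    """Return the 0-based starting positions of p in text in O(m + n) time."""
    if not p:
        return []
    pi = prefix_function(p)
    hits = []
    k = 0                               # number of pattern characters matched
    for i, ch in enumerate(text):
        while k > 0 and p[k] != ch:
            k = pi[k - 1]               # mismatch: fall back, do not rescan text
        if p[k] == ch:
            k += 1
        if k == len(p):
            hits.append(i - len(p) + 1)
            k = pi[k - 1]               # keep going to find overlapping matches
    return hits
```

`prefix_function("ababaca")` returns `[0, 0, 1, 2, 3, 0, 1]`, the same values as the π row in the table for P = ababaca.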
