
What comes next? Example of a hardness result: cross × plain → cross


  1. → What comes next? Example of a hardness result: cross × plain → cross with 'all operations' is Max-SNP-hard (i.e., without the restriction $w_a = w_r + w_b$). S.Will, 18.417, Fall 2011

  2. → Max-Cut-3
     [Figure: example graph on nodes $v_1, \dots, v_6$ with a maximum cut]
     • Formal: let $G = (V, E)$ be a graph.
     • A cut in $G$ is a set of edges such that there is a partition $V_1 \uplus V_2 = V$ in which every edge of the set has one endpoint in $V_1$ and the other in $V_2$.
     • Max-Cut-3: given a graph $G$ with degree ≤ 3, find a cut of maximal cardinality.
     Theorem: Max-Cut-3 is Max-SNP-hard.
     Remark: a Max-SNP-hard optimization problem does not have a PTAS (polynomial-time approximation scheme) unless P = NP. A PTAS is an algorithm that takes an instance of a maximization problem and a parameter $\epsilon > 0$ and, in polynomial time, produces a solution that is within a factor $1 - \epsilon$ of being maximal.
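
To make the definition of a cut concrete, here is a minimal brute-force sketch in Python. It tries every partition $V_1 \uplus V_2$ and counts the crossing edges, so it is exponential and only meant for tiny graphs. The edge list is a hypothetical 3-regular graph on six nodes (a triangular prism), not necessarily the graph drawn on the slide.

```python
from itertools import product

def max_cut(nodes, edges):
    """Return (cut size, V1) of a maximum cut by trying every partition."""
    best_size, best_v1 = -1, set()
    for bits in product([0, 1], repeat=len(nodes)):
        side = dict(zip(nodes, bits))
        # An edge is in the cut iff its endpoints lie on different sides.
        size = sum(1 for u, v in edges if side[u] != side[v])
        if size > best_size:
            best_size = size
            best_v1 = {v for v in nodes if side[v] == 0}
    return best_size, best_v1

# Hypothetical example: two triangles joined by a perfect matching (degree 3).
nodes = [1, 2, 3, 4, 5, 6]
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (1, 4), (2, 5), (3, 6)]

cut_size, v1 = max_cut(nodes, edges)
print(cut_size, sorted(v1))
```

For this graph every triangle keeps at least one edge uncut, so no cut can contain more than 7 of the 9 edges.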

  3. → Reduction of Max-Cut-3 to cross × plain → cross
     Reduction idea: represent the Max-Cut-3 problem as an alignment problem cross × plain → cross such that an optimal alignment corresponds to a maximum cut.
     → If Max-Cut-3 can be solved using the alignment problem, the alignment problem must also be Max-SNP-hard.
     Plan:
     • Show how to represent a graph $G$ as input of the alignment problem (e.g. sequences $S_1$, $S_2$ and a structure $P_1$ for $S_1$).
     • Show how an optimal alignment corresponds to a maximum cut of $G$.

  4. → Representation of Graph G as Alignment Problem (Example)
     [Figure: example graph on nodes $v_1, \dots, v_6$; each node $v_i$ is shown as a segment AAAUUU in the first sequence above a segment UUUAAA in the second sequence]

  5. → Representation of Graph G as Alignment Problem (formally)
     • Given: a graph $G$ with $n$ nodes. [Figure: example graph on nodes $v_1, \dots, v_6$]
     • Sequences:
       • $S_1 = (\mathrm{AAAUUU}\,\mathrm{C}^c)^{n-1}\,\mathrm{AAAUUU}$, and
       • $S_2 = (\mathrm{UUUAAA}\,\mathrm{C}^c)^{n-1}\,\mathrm{UUUAAA}$.
     • The segments AAAUUU in $S_1$ and UUUAAA in $S_2$ correspond to the nodes.
     • Each edge $(v_i, v_j)$ of $G$ corresponds to two arcs in $P_1$: one connecting an A of the $i$-th segment with a U of the $j$-th segment, and one connecting a U of the $i$-th segment with an A of the $j$-th segment.
     • The Cs are used to avoid alignment of different segments; their number $c$ depends on the ratio $\min(w_b, w_a, w_r) / w_d$ (cost of arc changes over cost of a base deletion).
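
The construction above can be sketched in Python as follows. The slide only says that each edge uses "an A" and "a U" of each segment; the rule used here (the $t$-th edge at a node uses the $t$-th A and the $t$-th U of that node's segment) is an assumption made for illustration, as are the function name `build_instance`, the 0-based positions, and the value chosen for `c`.

```python
def build_instance(n, edges, c):
    """Build S1, S2 and the arc set P1 for a graph on nodes 1..n (degree <= 3).

    Arcs are pairs of 0-based positions in S1. Which A/U of a segment an edge
    uses is an illustrative choice, not fixed by the source.
    """
    spacer = "C" * c
    s1 = spacer.join(["AAAUUU"] * n)
    s2 = spacer.join(["UUUAAA"] * n)
    seg_len = 6 + c                      # one node segment plus its spacer
    used = {v: 0 for v in range(1, n + 1)}
    arcs = []
    for u, v in edges:
        i, j = min(u, v), max(u, v)
        ti, tj = used[i], used[j]        # next free slot at each endpoint
        if ti >= 3 or tj >= 3:
            raise ValueError("node degree exceeds 3")
        used[i] += 1
        used[j] += 1
        start_i, start_j = (i - 1) * seg_len, (j - 1) * seg_len
        arcs.append((start_i + ti, start_j + 3 + tj))   # A of segment i, U of segment j
        arcs.append((start_i + 3 + ti, start_j + tj))   # U of segment i, A of segment j
    return s1, s2, arcs

# Hypothetical 3-regular example graph on six nodes.
example_edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (1, 4), (2, 5), (3, 6)]
s1, s2, arcs = build_instance(6, example_edges, c=2)
print(s1)
print(s2)
print(len(arcs), "arcs")
```

With this slot rule every base takes part in at most one arc, so $P_1$ is a valid (crossing) arc structure on $S_1$, while $S_2$ carries no structure.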

  6. → Correspondence of Optimal Alignment and Max Cut
     Properties of an optimal alignment:
     • We choose $c$ such that every optimal alignment must match all Cs.
     • We choose a scoring with $w_m > w_d$ and $2 w_a > w_b + w_r$.
     • $w_m > w_d$ implies no base mismatch. [Figure: a segment AAAUUU aligned with a segment UUUAAA]
     • Two alignment types for each node $v_i$: A-type and U-type. [Figure: the two alignments of AAAUUU with UUUAAA]
     • A-type ⇔ node in $V_1$; U-type ⇔ node in $V_2$.
     • Cost for each edge of the cut ($v_i$ and $v_j$ have different types): one arc breaking and one arc removing, cost $w_b + w_r$. [Figure: alignment of two segments of different type]

  7. → Correspondence of Optimal Alignment and Max Cut
     • Cost for each edge that is not in the cut ($v_i$ and $v_j$ have the same type): two arc alterings, cost $2 w_a$. [Figure: alignment of two segments of the same type]
     • Total cost of the alignment:
       • $V_1$ = all A-type nodes
       • $V_2$ = all U-type nodes
       • $n$ nodes, each of degree 3 ⇒ $\frac{3n}{2}$ edges
       • $k := |\mathrm{cut}(V_1, V_2)|$
     $$C = k (w_b + w_r) + \left(\tfrac{3n}{2} - k\right) 2 w_a + n \cdot 3 w_d = 3n (w_a + w_d) - k \underbrace{(2 w_a - w_b - w_r)}_{> 0}$$
     where the term is positive by the assumption $2 w_a > w_b + w_r$.
     • ⇒ $C$ minimal ≡ $k$ maximal
     • ⇒ maximal cut ≡ minimal edit distance.
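
The total-cost formula can be checked numerically. The sketch below evaluates both forms of $C$ and confirms that the cost falls as the cut size $k$ grows. The weights are hypothetical values chosen only to satisfy $2 w_a > w_b + w_r$; the function names are likewise illustrative.

```python
def cost_by_edges(n, k, w_a, w_b, w_r, w_d):
    """C = k(w_b + w_r) + (3n/2 - k) * 2 w_a + 3 n w_d."""
    # (3n/2 - k) * 2 w_a is written as (3n - 2k) * w_a to stay in integers.
    return k * (w_b + w_r) + (3 * n - 2 * k) * w_a + 3 * n * w_d

def cost_closed_form(n, k, w_a, w_b, w_r, w_d):
    """C = 3n(w_a + w_d) - k(2 w_a - w_b - w_r)."""
    return 3 * n * (w_a + w_d) - k * (2 * w_a - w_b - w_r)

# Hypothetical weights with 2*w_a > w_b + w_r.
weights = dict(w_a=3, w_b=2, w_r=2, w_d=1)
n = 6                                   # 3-regular graph: 3n/2 = 9 edges
costs = [cost_by_edges(n, k, **weights) for k in range(10)]
print(costs)
```

Each additional cut edge lowers the cost by exactly $2 w_a - w_b - w_r$, which is why minimizing $C$ is the same as maximizing $k$.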

  8. → Approaches for Alignments of RNAs
     [Figure, adapted from Gardner & Giegerich, BMC 2004: three strategies for deriving a consensus structure for sequences A and B]
     • Plan A: align the single sequences, then fold the alignment.
     • Plan B: simultaneously align and fold [Sankoff 85].
     • Plan C: fold the single sequences, then align sequence and structure.

  9. → Simultaneous Alignment and Folding: Sankoff (1985)
     • What do we want? What does folding into a common structure mean?
     • First idea: preserve “shape” ≡ branching structure.
     • Formally: let $i_1 < i_2 < \dots < i_v$ in $a$ and $j_1 < j_2 < \dots < j_w$ in $b$ be the positions in pairs that limit multiloops or are external (branching configuration). Then the structures are equivalent (according to branching) iff $v = w$, and $(i_f, i_g) \in P_a$ if and only if $(j_f, j_g) \in P_b$.
     • Finding good equivalent structures is not sufficient. [Figure]
     • Hence: minimize edit distance + energies (of the two equivalent structures).

  10. → Sankoff Problem Definition
     • Idea: Sankoff = Zuker folding + Needleman/Wunsch alignment.
     • IN: two sequences $a$ and $b$.
     • Find two equivalent structures $P_a$ and $P_b$ and a compatible alignment $A$ of $a$ and $b$ such that $\mathrm{Energy}(a, P_a) + \mathrm{Energy}(b, P_b) + \mathrm{EditDistance}(A)$ is minimal.
     • Here Energy yields the (loop-based) Turner free energy, and EditDistance is the edit distance (base mismatch $x$, indel $y$).
     • What does compatible mean? The alignment must be “consistent” with the branching structure. Formally: the base pairs $(i_f, i_g) \in P_a$ and $(j_f, j_g) \in P_b$ (from the definition of equivalent) must be aligned to each other.

  11. → Constraints
     We want to find the optimal structures + alignment under the following constraints.
     Constraints on the predicted structures:
     • They must be equivalent (intuitively: same kind of multiloops).
     Constraints on the alignment:
     • Multiloops must be aligned to their equivalent partner.
     • Hairpin loops must be aligned to their equivalent partner.
     • Each 2-loop (or stacking or bulge) must be aligned to exactly one other 2-loop or must be entirely aligned to a gap.

  12. → Edit distance of sub-sequences
     • Distance-based score: $x$ = base mismatch, $y$ = base deletion/insertion.
     • $D(i, j; h, k)$: minimum sequence alignment cost between the sequences $a_i \dots a_j$ and $b_h \dots b_k$.
     • Recursion:
     $$D(i, j; h, k) = \min \begin{cases} D(i, j-1; h, k-1) + x & \text{if } a_j \neq b_k \\ D(i, j-1; h, k-1) & \text{if } a_j = b_k \\ D(i, j-1; h, k) + y \\ D(i, j; h, k-1) + y \end{cases}$$
     $$\phantom{D(i, j; h, k)} = \min \begin{cases} D(i+1, j; h+1, k) + x & \text{if } a_i \neq b_h \\ D(i+1, j; h+1, k) & \text{if } a_i = b_h \\ D(i+1, j; h, k) + y \\ D(i, j; h+1, k) + y \end{cases}$$
     • Initialization:
     $$D(i, i; h, h) = \begin{cases} x & \text{if } a_i \neq b_h \\ 0 & \text{else} \end{cases}$$
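
A minimal memoized sketch of the first (right-end) recursion for $D(i, j; h, k)$ in Python. Two points are assumptions rather than part of the slide: an empty range (e.g. $j < i$) is scored as deleting every base of the other range at cost $y$ each, and $x \le 2y$ is assumed so that the single-base case agrees with the stated initialization. The helper name `make_D` and the example sequences are hypothetical.

```python
from functools import lru_cache

def make_D(a, b, x, y):
    """Return D(i, j, h, k): min alignment cost of a[i..j] and b[h..k].

    Positions are 1-based and inclusive, as on the slide. Assumes x <= 2*y.
    """
    @lru_cache(maxsize=None)
    def D(i, j, h, k):
        len_a = max(j - i + 1, 0)
        len_b = max(k - h + 1, 0)
        if len_a == 0 or len_b == 0:
            # Assumption: an empty range forces indels for all remaining bases.
            return (len_a + len_b) * y
        mismatch = 0 if a[j - 1] == b[k - 1] else x
        return min(
            D(i, j - 1, h, k - 1) + mismatch,  # align a_j with b_k
            D(i, j - 1, h, k) + y,             # a_j against a gap
            D(i, j, h, k - 1) + y,             # b_k against a gap
        )
    return D

# Hypothetical example with unit costs.
D = make_D("GAUC", "GUC", x=1, y=1)
print(D(1, 4, 1, 3))   # cost of aligning the full sequences
print(D(2, 2, 2, 2))   # single bases A vs U
```

Filling the table for all index quadruples is what makes $D$ a four-dimensional matrix in the later complexity count.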

  13. → Recall Zuker
     • Energies: $e(s)$, where $s$ is a $k$-loop (or $s = \varphi$ for the empty structure).
     • $F(i, j)$: “free”, minimum energy for the subsequence $a_i \dots a_j$.
     • $C(i, j)$: “closed”, minimum energy for the subsequence where $(i, j) \in P$.
     • Zuker recursion: [equations not included in the transcript]
     • Problem: (6) requires time proportional to $n^{2K}$, where $K$ is the maximum $k$ in $k$-loops.

  14. → Usual Simplification
     • $e(s)$ for $k$-loops with $k \geq 3$ (multiloops): $e(s) = A + (k - 1) P + u Q$.
     • New matrix: $G(i, j)$ for multiloops.
     • Recursion: [equations not included in the transcript]
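
The linear multiloop energy can be written as a small Python function. The reading of the symbols is an assumption based on the usual convention: $k - 1$ counts the inner base pairs of the $k$-loop and $u$ its unpaired bases. The parameter values are hypothetical integers, not Turner parameters.

```python
def multiloop_energy(k, u, A, P, Q):
    """e(s) = A + (k - 1) * P + u * Q for a k-loop with k >= 3.

    Assumed meaning: A is a fixed penalty, P is charged per inner base pair
    (k - 1 of them), Q per unpaired base (u of them).
    """
    if k < 3:
        raise ValueError("the linear model is only used for multiloops (k >= 3)")
    return A + (k - 1) * P + u * Q

# Hypothetical parameters.
A, P, Q = 34, 4, 1
print(multiloop_energy(k=3, u=5, A=A, P=P, Q=Q))

# Because the model is linear, the energy can be accumulated piece by piece:
# one P per inner pair and one Q per unpaired base, plus A once.
pieces = A + sum(P for _ in range(2)) + sum(Q for _ in range(5))
print(pieces)
```

This additivity is what allows a multiloop to be assembled incrementally in the matrix $G$ instead of minimizing over all inner pairs at once.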

  15. → Simultaneous Alignment and Folding
     • Extend the definition of $D(i_1, j_1; i_2, j_2)$ to empty ranges:
       • if $i_1 > j_1$, it is the cost of deleting $b_{i_2} \dots b_{j_2}$;
       • if $i_2 > j_2$, it is the cost of deleting $a_{i_1} \dots a_{j_1}$.
     • $F(i_1, j_1; i_2, j_2)$: minimum cost (sum of alignment cost and free energy) for $a_{i_1} \dots a_{j_1}$ and $b_{i_2} \dots b_{j_2}$.
     • $C(i_1, j_1; i_2, j_2)$: minimum cost for $a_{i_1+1} \dots a_{j_1-1}$ and $b_{i_2+1} \dots b_{j_2-1}$ under the condition $(i_1, j_1) \in P_a$ and $(i_2, j_2) \in P_b$.

  16. → Simultaneous Alignment and Folding: “Closed”

  17. → Simultaneous Alignment and Folding: Multiloop
     • $G(i_1, j_1; i_2, j_2)$: matrix for multiloop alignment.
     • Recursion for $G$:
     $$G(i_1, j_1; i_2, j_2) = \min \begin{cases} C(i_1, j_1; i_2, j_2) + 2P + \underbrace{D(i_1, i_1; i_2, i_2)}_{\text{match } i_1 \text{ and } i_2} + \underbrace{D(j_1, j_1; j_2, j_2)}_{\text{match } j_1 \text{ and } j_2} \\[1ex] \min\limits_{\substack{i_1 < h_1 < j_1 \\ i_2 < h_2 < j_2}} \begin{cases} G(i_1, h_1; i_2, h_2) + (j_1 - h_1 + j_2 - h_2)\, Q + D(h_1 + 1, j_1; h_2 + 1, j_2), \\ G(i_1, h_1; i_2, h_2) + G(h_1 + 1, j_1; h_2 + 1, j_2), \\ (h_1 - i_1 + 1 + h_2 - i_2 + 1)\, Q + D(i_1, h_1; i_2, h_2) + G(h_1 + 1, j_1; h_2 + 1, j_2) \end{cases} \end{cases}$$

  18. → Simultaneous Alignment and Folding: “free”
     • Recursion for $F$:
     $$F(i_1, j_1; i_2, j_2) = \min \begin{cases} C(i_1, j_1; i_2, j_2) + D(i_1, i_1; i_2, i_2) + D(j_1, j_1; j_2, j_2) \\ \min\limits_{\substack{i_1 < h_1 < j_1 \\ i_2 < h_2 < j_2}} F(i_1, h_1; i_2, h_2) + F(h_1 + 1, j_1; h_2 + 1, j_2) \\ D(i_1, j_1; i_2, j_2) \end{cases}$$
     • with initial conditions $C(i_1, i_1; i_2, i_2) = \infty$ and $G(i_1, i_1; i_2, j_2) = G(i_1, j_1; i_2, i_2) = \infty$.

  19. → Complexity
     Space complexity $O(n^4)$:
     • constant number of matrices ($C$, $D$, $F$, and $G$)
     • each of them has $O(n^4)$ entries
     Time complexity $O(n^6)$:
     • each entry of matrix $D$ requires constant time
     • each entry of $F$, $C$, and $G$ requires $O(n^2)$ time (minimize over all $h_1$, $h_2$)
     • hence: $n^4 \cdot n^2 = n^6$
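
As a rough illustration of the $n^6$ bound, the Python sketch below counts how many split pairs $(h_1, h_2)$ with $i_1 < h_1 < j_1$ and $i_2 < h_2 < j_2$ are visited when one such minimization is done for every entry $(i_1, j_1; i_2, j_2)$. It assumes both sequences have the same length $n$; the function name is hypothetical. The count factorizes into one choice of $i < h < j$ per sequence, so it equals $\binom{n}{3}^2$, which grows like $n^6 / 36$.

```python
from math import comb

def count_split_evaluations(n):
    """Count (h1, h2) pairs over all entries (i1, j1; i2, j2), 1 <= i <= j <= n."""
    total = 0
    for i1 in range(1, n + 1):
        for j1 in range(i1, n + 1):
            splits_a = max(j1 - i1 - 1, 0)          # choices of h1 with i1 < h1 < j1
            for i2 in range(1, n + 1):
                for j2 in range(i2, n + 1):
                    splits_b = max(j2 - i2 - 1, 0)  # choices of h2 with i2 < h2 < j2
                    total += splits_a * splits_b
    return total

for n in (5, 10, 20):
    print(n, count_split_evaluations(n), comb(n, 3) ** 2)
```

Doubling $n$ multiplies the count by roughly $2^6 = 64$ once $n$ is large, in line with the stated time complexity.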
