
A polynomial-time algorithm for finding minimal conflicting sets



  1. A polynomial-time algorithm for finding minimal conflicting sets. Stéphane Vialette (vialette@univ-mlv.fr), LIGM, Université Paris-Est Marne-la-Vallée. November 8-10, 2010. S. Vialette (LIGM) MCSR 11, 8-10, 10

  2. Consecutive 1’s Property. Definition: A binary matrix has the Consecutive 1’s Property (C1P) if its columns can be ordered in such a way that all the 1’s in each row are consecutive. Deciding whether a given binary matrix has the C1P, and finding a corresponding column permutation, can be done in linear time [Booth and Lueker, 1976; McConnell, 2004]. Algorithmic questions related to the C1P for binary matrices are central in genomics (e.g., physical mapping and ancestral genome reconstruction).
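
As a concrete illustration of the definition, the sketch below tests the C1P by brute force, trying every column order. It is only meant for tiny matrices and is not the linear-time algorithm cited on the slide. The function names `row_is_consecutive` and `find_c1p_order` are made up for this example.

```python
from itertools import permutations


def row_is_consecutive(row, order):
    """True if the 1's of `row` are contiguous when columns are read in `order`."""
    positions = [pos for pos, col in enumerate(order) if row[col] == 1]
    return not positions or positions[-1] - positions[0] + 1 == len(positions)


def find_c1p_order(matrix):
    """Return a column order witnessing the C1P, or None if no order works.

    Brute force over all n! column orders (illustration only).
    """
    n_cols = len(matrix[0]) if matrix else 0
    for order in permutations(range(n_cols)):
        if all(row_is_consecutive(row, order) for row in matrix):
            return order
    return None


# Swapping the last two columns makes the 1's of both rows consecutive.
print(find_c1p_order([[1, 0, 1], [0, 1, 1]]))
# Three rows that pairwise share one column: no column order works.
print(find_c1p_order([[1, 1, 0], [0, 1, 1], [1, 0, 1]]))
```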

  3. C1P and ancestral genomes [Chauve et al., 2009]. When inferring an ancestral genome architecture from the comparison of extant genomes, it is common to represent partial information about the ancestral genome $G$ as a binary matrix $M$: columns represent genomic markers that are believed to have been present in $G$, rows represent groups of markers that are believed to have been co-localized in $G$, and the goal is to infer the order of the markers on the chromosomes of $G$. Such an ordering of the markers defines chromosomal segments called Contiguous Ancestral Regions (CARs).

  4. C1P and ancestral genomes [Chauve et al., 2009]. If the matrix $M$ contains only correct information (i.e., groups of markers that were co-localized in the ancestral genome of interest), then it has the C1P. For most real datasets, $M$ contains errors: incorrect columns, which represent genomic markers that were not present in $G$, or incorrect rows, which represent groups of markers that were not co-localized in $G$. A fundamental question is how to detect such errors in order to correct $M$. The classical approach to handling these (unknown) errors relies on combinatorial optimization: it asks for an optimal transformation of $M$ into a matrix that has the C1P, for some notion of matrix transformation linked to the expected errors.

  5. From $(0,1)$-matrices to black-and-white bipartite graphs. Definition: Let $M$ be an $m \times n$ $(0,1)$-matrix. Its corresponding vertex-colored bipartite graph $G(M) = (R, C, E)$ is defined as follows: for every row of $M$ there is a black vertex in $R = \{r_i : 1 \le i \le m\}$; for every column of $M$ there is a white vertex in $C = \{c_i : 1 \le i \le n\}$; and there is an edge between a black vertex $r_i \in R$ and a white vertex $c_j \in C$ if and only if $M[i,j] = 1$.
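
The construction above can be sketched in a few lines. The representation is an assumption made for this example: vertices are tuples `("r", i)` and `("c", j)` with 0-based indices (the slide uses 1-based indices), and the edge set is stored as an adjacency dictionary. The name `bipartite_graph` is hypothetical.

```python
def bipartite_graph(matrix):
    """Build G(M) = (R, C, E) from a (0,1)-matrix given as a list of rows.

    Returns (rows, cols, adj), where adj maps each vertex to the set of
    its neighbors. Row vertices play the role of black vertices, column
    vertices the role of white vertices.
    """
    m = len(matrix)
    n = len(matrix[0]) if matrix else 0
    rows = [("r", i) for i in range(m)]
    cols = [("c", j) for j in range(n)]
    adj = {v: set() for v in rows + cols}
    for i in range(m):
        for j in range(n):
            if matrix[i][j] == 1:  # edge r_i -- c_j iff M[i, j] = 1
                adj[("r", i)].add(("c", j))
                adj[("c", j)].add(("r", i))
    return rows, cols, adj


rows, cols, adj = bipartite_graph([[1, 0], [1, 1]])
print(sorted(adj[("c", 0)]))  # both rows have a 1 in column 0
```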

  6. Tucker configurations. Theorem (Tucker, 1972): A $(0,1)$-matrix has the Consecutive 1’s Property (C1P) if and only if it contains none of the matrices $M_{I_k}$, $M_{II_k}$, $M_{III_k}$ ($k \ge 1$), $M_{IV}$ and $M_V$ as a submatrix.
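
The transcript does not reproduce the forbidden matrices themselves. As a small illustration, the sketch below assumes the usual presentation of $M_{I_k}$ as the $(k+2) \times (k+2)$ matrix whose bipartite graph is a chordless cycle, and checks by brute force that it lacks the C1P. The helper names `tucker_m_i` and `has_c1p` are invented for this example.

```python
from itertools import permutations


def tucker_m_i(k):
    """Assumed form of M_I_k: row i has 1's in columns i and (i + 1) mod (k + 2)."""
    size = k + 2
    return [[1 if j in (i, (i + 1) % size) else 0 for j in range(size)]
            for i in range(size)]


def has_c1p(matrix):
    """Brute-force C1P test over all column orders (tiny matrices only)."""
    n_cols = len(matrix[0])

    def consecutive(row, order):
        pos = [p for p, col in enumerate(order) if row[col] == 1]
        return not pos or pos[-1] - pos[0] + 1 == len(pos)

    return any(all(consecutive(row, order) for row in matrix)
               for order in permutations(range(n_cols)))


print(tucker_m_i(1))               # the 3 x 3 case
print(has_c1p(tucker_m_i(2)))      # the cycle cannot be laid out on a line
print(has_c1p(tucker_m_i(2)[:-1])) # dropping one row breaks the cycle
```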

  7. Minimal conflicting sets. Definition: A Minimal Conflicting Set (MCS) is a set of rows $R$ of a matrix that does not have the C1P but such that every proper subset of $R$ has the C1P. The Conflicting Index (CI) of a row $r$ is the number of MCSs it belongs to. Remarks: In [Bergeron et al., 2004] an extreme approach was followed in handling non-C1P matrices: all rows belonging to at least one MCS were discarded. In [Stoye and Wittler, 2009] rows were ranked according to their CI (or, more precisely, an approximation of their CI) before being processed by a branch-and-bound algorithm to extract a maximal subset of rows of $M$ that has the C1P.
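
To make the two definitions concrete, here is a minimal sketch that enumerates every MCS of a tiny matrix by exhaustive search and then counts the conflicting index of a row. It is exponential in both dimensions and is not the algorithm of the talk. All function names are hypothetical.

```python
from itertools import combinations, permutations


def has_c1p(rows, n_cols):
    """Brute-force C1P test over all column orders (tiny inputs only)."""
    def ok(row, order):
        pos = [p for p, c in enumerate(order) if row[c] == 1]
        return not pos or pos[-1] - pos[0] + 1 == len(pos)

    return any(all(ok(r, order) for r in rows)
               for order in permutations(range(n_cols)))


def minimal_conflicting_sets(matrix):
    """Return all MCSs as tuples of row indices, smallest sets first."""
    m, n = len(matrix), len(matrix[0])
    found = []
    for size in range(1, m + 1):
        for subset in combinations(range(m), size):
            # A set containing a smaller conflicting set is not minimal.
            if any(set(prev) <= set(subset) for prev in found):
                continue
            if not has_c1p([matrix[i] for i in subset], n):
                found.append(subset)
    return found


def conflicting_index(matrix, row):
    """Number of MCSs that contain the given row index."""
    return sum(row in mcs for mcs in minimal_conflicting_sets(matrix))


M = [[1, 1, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]]
print(minimal_conflicting_sets(M))
print([conflicting_index(M, r) for r in range(len(M))])
```

In this example the all-ones row never takes part in a conflict, so a strategy that discards every row with a positive conflicting index would keep it.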

  8. Related results. Theorem (Chauve et al., 2009): Let $M$ be a binary matrix that does not have the C1P, and $r$ a row of $M$. Deciding whether $r$ belongs to an MCS due to a bounded Tucker configuration is solvable in $m^{\max\{3,\Delta\}} \Delta (n + \Delta + e)$ time, where $\Delta$ is the maximum number of 1’s in a row. Remarks: The underlying question is whether a row has a positive conflicting index. Bounded values of $\Delta$ are well suited to some practical applications (e.g., reconstruction of ancestral mammalian genomes [Ma et al., 2006]). The general problem was left open in [Chauve et al., 2009].

  9. Main result. Theorem: Let $M$ be an $m \times n$ $(0,1)$-matrix. For any row $r$ of $M$, deciding whether there exists an MCSR involving row $r$ is solvable in $O(m^6 n^5 (m+n)^2 \log(m+n))$ time. Remarks: The proof provides a sequence of polynomial-time algorithms, each finding a minimal Tucker configuration of a given type $T \in \{M_{I_k}, M_{II_k}, M_{III_k}, M_{IV}, M_V\}$ responsible for an MCSR involving a given row (if one exists). Our approach is based on two graph pruning techniques.

  10. Graph pruning techniques. Definition (clean): Let $M$ be a binary matrix and $G(M) = (R, C, E)$ be the corresponding vertex-colored bipartite graph. For any vertex $v \in R$, clean($v$) results in the graph $G(M)[R \cup (C \setminus N(v))]$. For any vertex $v \in C$, clean($v$) results in the graph $G(M)[(R \setminus N(v)) \cup C]$. Informally, for any vertex $v \in R \cup C$, clean($v$) results in a graph from which every neighbor of $v$ has been deleted.
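
A minimal sketch of clean, assuming the graph is stored as an adjacency dictionary mapping each vertex to the set of its neighbors, with vertices written as `("r", i)` or `("c", j)`. That representation is chosen for this example only. Because all neighbors of a vertex lie on the opposite side of the bipartition, one set difference covers both cases of the definition.

```python
def clean(adj, v):
    """Return the induced subgraph obtained by deleting every neighbor of v.

    The input dictionary is left unchanged; v itself is kept and becomes
    an isolated vertex.
    """
    keep = set(adj) - adj[v]
    return {u: adj[u] & keep for u in keep}


# Rows r0 = {c0, c1} and r1 = {c1, c2}.
adj = {
    ("r", 0): {("c", 0), ("c", 1)},
    ("r", 1): {("c", 1), ("c", 2)},
    ("c", 0): {("r", 0)},
    ("c", 1): {("r", 0), ("r", 1)},
    ("c", 2): {("r", 1)},
}
pruned = clean(adj, ("r", 0))
print(sorted(pruned))  # c0 and c1 are gone
```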

  11. Graph pruning techniques. Definition (anticlean): Let $M$ be a binary matrix and $G(M) = (R, C, E)$ be the corresponding vertex-colored bipartite graph. For any vertex $v \in R$, anticlean($v$) results in the graph $G(M)[R \cup (C \setminus \{u : u \notin N(v)\})]$. For any vertex $v \in C$, anticlean($v$) results in the graph $G(M)[(R \setminus \{u : u \notin N(v)\}) \cup C]$. Informally, for any vertex $v \in R \cup C$, anticlean($v$) results in a graph from which every vertex that lies neither on the same side of the bipartition as $v$ nor in the neighborhood of $v$ has been deleted.
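
A matching sketch of anticlean, again assuming an adjacency dictionary with vertices written as `("r", i)` or `("c", j)`, so that the first tuple component tells which side of the bipartition a vertex is on. This encoding is an assumption of the example, not something the slides specify.

```python
def anticlean(adj, v):
    """Keep v's own side of the bipartition plus the neighbors of v.

    Every vertex on the opposite side that is not adjacent to v is deleted.
    The input dictionary is left unchanged.
    """
    side = v[0]
    keep = {u for u in adj if u[0] == side or u in adj[v]}
    return {u: adj[u] & keep for u in keep}


# Rows r0 = {c0, c1} and r1 = {c1, c2}.
adj = {
    ("r", 0): {("c", 0), ("c", 1)},
    ("r", 1): {("c", 1), ("c", 2)},
    ("c", 0): {("r", 0)},
    ("c", 1): {("r", 0), ("r", 1)},
    ("c", 2): {("r", 1)},
}
pruned = anticlean(adj, ("r", 0))
print(sorted(pruned))  # c2 is not adjacent to r0, so it is removed
```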

  12. An easy but useful theorem. Theorem: Let $T = (R_T, C_T, E_T)$ be a Tucker configuration responsible for an MCS involving a given row $r$ in $G(M) = (R, C, E)$. Then $R_T$ is an MCS involving $r$, and there is no smaller Tucker configuration, in terms of number of rows (black nodes), in $G(M)[R_T \cup C]$.

  13. $G(M_{I_k})$ Tucker configurations. Theorem: Let $M$ be a $(0,1)$-matrix with corresponding vertex-colored bipartite graph $G(M) = (R, C, E)$, and let $r$ be any row of $M$. Finding (if it exists) a minimum-cardinality $R' \subseteq R$ responsible for an MCS involving row $r$ such that $G(M)[R', C'] = G(M_{I_k})$ for some $C' \subseteq C$ and some $k \ge 1$ can be done in $O(m^4 n^4)$ time. Proof: Brute-force algorithm for $k = 1$ and $k = 2$; graph pruning techniques for $k > 2$.

  14. $G(M_{I_k})$ Tucker configurations: Algorithm. For all $c_x, c_y \in C$ and all $r_B, r_C \in R$ such that $(r_C, c_y, r_A, c_x, r_B)$ is a path in $G(M)$:

          if $N(r_A) \cap N(r_B) \cap N(r_C) \neq \emptyset$ then
              return "NO"
          end if
          clean($c$) for all $c \in N(r_A) \setminus N(r_B)$
          clean($c$) for all $c \in N(r_A) \setminus N(r_C)$
          clean($r_A$, $c_x$, $c_y$)
          delete vertex $r_A$
          if there exists an $r_B r_C$-path in the pruned graph then
              let $P$ be a shortest $r_B r_C$-path in the pruned graph
              return $\{r_A\} \cup \{r_i : r_i \in V(P) \cap R\}$
          else
              return "NO"
          end if
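
The final step of the algorithm (shortest path in the pruned graph, then collecting the row vertices on it) can be sketched with a breadth-first search. The sketch assumes the pruning has already been done and takes the pruned graph as input; it does not implement the clean steps. Vertices are written as `("r", i)` or `("c", j)` and the names `shortest_path` and `rows_of_solution` are hypothetical.

```python
from collections import deque


def shortest_path(adj, source, target):
    """Breadth-first search; returns a shortest path as a vertex list, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            path = []
            while u is not None:  # walk parents back to the source
                path.append(u)
                u = parent[u]
            return path[::-1]
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                queue.append(w)
    return None


def rows_of_solution(pruned_adj, r_a, r_b, r_c):
    """Return {r_A} plus the row vertices on a shortest r_B r_C-path, or None."""
    path = shortest_path(pruned_adj, r_b, r_c)
    if path is None:
        return None  # corresponds to answering "NO"
    return {r_a} | {v for v in path if v[0] == "r"}


# Pruned graph: the path r1 - c2 - r2 - c3 - r3 (r0 already deleted).
pruned = {
    ("r", 1): {("c", 2)},
    ("c", 2): {("r", 1), ("r", 2)},
    ("r", 2): {("c", 2), ("c", 3)},
    ("c", 3): {("r", 2), ("r", 3)},
    ("r", 3): {("c", 3)},
}
print(sorted(rows_of_solution(pruned, ("r", 0), ("r", 1), ("r", 3))))
```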

  15. $G(M_{I_k})$ Tucker configurations: Algorithm. Remarks: We need to prove that the pruning operations are safe (i.e., we do not miss a solution) and that the returned solution is indeed an MCS involving row $r$.

  16. Main result. Theorem: Let $M$ be an $m \times n$ $(0,1)$-matrix. For any row $r$ of $M$, deciding whether there exists an MCSR involving row $r$ is solvable in $O(m^6 n^5 (m+n)^2 \log(m+n))$ time.

      | Tucker configuration | Complexity |
      | --- | --- |
      | $M_{I_k}$ | $O(m^4 n^4)$ |
      | $M_{II_k}$ | $O(m^6 n^5 (m+n)^2 \log(m+n))$ |
      | $M_{III_k}$ | $O(m^5 n^5 (m+n)^2 \log(m+n))$ |
      | $M_{IV}$ | $O(m^2 n^6)$ |
      | $M_V$ | $O(m^3 n^5)$ |
      | Total | $O(m^6 n^5 (m+n)^2 \log(m+n))$ |
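
One way to see why the total matches the $M_{II_k}$ bound is to check that every other per-configuration bound is dominated by it. The short derivation below is an added illustration, not part of the slides; it assumes $m, n \ge 1$ and a base-2 logarithm, so that $\log(m+n) \ge 1$.

```latex
\begin{align*}
m^4 n^4 &\le m^6 n^5 \le m^6 n^5 (m+n)^2 \log(m+n), \\
m^5 n^5 (m+n)^2 \log(m+n) &\le m^6 n^5 (m+n)^2 \log(m+n), \\
m^2 n^6 &\le m^6 n^5 \cdot n^2 \le m^6 n^5 (m+n)^2 \log(m+n), \\
m^3 n^5 &\le m^6 n^5 \le m^6 n^5 (m+n)^2 \log(m+n).
\end{align*}
```

The sum of the five bounds is therefore at most five times the $M_{II_k}$ bound, which gives the stated total.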

  17. Main result. Theorem: Let $M$ be an $m \times n$ $(0,1)$-matrix. For any row $r$ of $M$, deciding whether there exists an MCSR involving row $r$ is solvable in $O(m^6 n^5 (m+n)^2 \log(m+n))$ time. The algorithms for $M_{IV}$ and $M_V$ Tucker configurations are by complete enumeration. The algorithms for $M_{II_k}$ and $M_{III_k}$ Tucker configurations are more difficult. Our algorithms are not independent (e.g., our algorithm for $M_{II_k}$ assumes that we have already failed to find an $M_{I_k}$ Tucker configuration responsible for an MCS involving row $r$).

  18. Extensions and further research. Our graph pruning framework can be extended to deal with the Circular 1’s Property. The algorithm is still not practical for large (or even moderate) $m$ and $n$. Our approach raises new combinatorial graph problems.
