

  1. 4/28/09 CSCI1950-Z Computational Methods for Biology, Lecture 23. Ben Raphael, April 27, 2009. http://cs.brown.edu/courses/csci1950-z/
     Outline: This week
     1. Subnetwork querying.
        - More color-coding. Tree-width graphs.
     2. Network Motifs
     3. Network alignment: conserved complexes
     4. Network integration: networks + gene expression data.

  2. Network Querying Problem
     • Species A: well studied; protein interaction sub-networks defined by extensive experimentation.
     • Species B: less studied; little knowledge of sub-networks; protein interaction network known using high-throughput technologies.
     • Can we use the knowledge of A to discover the corresponding sub-networks in B, if they are present?

     Graph Isomorphism
     $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ are isomorphic provided there is a bijection between the vertices $V_1$ and $V_2$ that preserves edges, i.e. there is a 1-1, onto function $\Phi: V_1 \to V_2$ such that $(u,v) \in E_1$ if and only if $(\Phi(u), \Phi(v)) \in E_2$.
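The isomorphism definition can be checked mechanically for a candidate map. The following is a minimal sketch, assuming undirected graphs without self-loops; the function name `is_isomorphism` and the dictionary representation of Φ are illustrative choices, not part of the lecture.

```python
def is_isomorphism(phi, v1, e1, v2, e2):
    """Return True if phi (a dict V1 -> V2) is an edge-preserving bijection."""
    # phi must be defined on all of V1, hit all of V2, and be one-to-one.
    if set(phi) != set(v1) or set(phi.values()) != set(v2):
        return False
    if len(set(phi.values())) != len(phi):
        return False
    # For a bijection, "(u,v) in E1 iff (phi(u),phi(v)) in E2" is the same as
    # saying the image of E1 under phi equals E2.
    image = {frozenset((phi[u], phi[v])) for u, v in e1}
    return image == {frozenset(e) for e in e2}
```

For example, mapping the path a-b-c onto the path 1-2-3 in order passes the check, while swapping the images of a and b does not.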

  3. Graph Isomorphism Problem
     Are $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ isomorphic? The problem is neither known to be in P nor known to be NP-complete.
     [Figure: two example graphs, one on vertices A-F and one on vertices 1-6.]

     Subgraph Isomorphism Problem
     Is $G_1 = (V_1, E_1)$ isomorphic to a subgraph of $G_2$? This problem is NP-complete.

  4. Network Querying Problem
     • Given a query graph Q and a network G, find the sub-network of G that is
       - isomorphic to Q
       - aligned with maximal score
     • NP-complete: subgraph isomorphism.
     [Figure: query Q and network G.]

     Network Querying Problem: Homeomorphic Alignment
     [Figure: query Q from species A aligned to a sub-network of species B that is homeomorphic to Q; aligned positions are labeled match, deletion, or insertion.]
     Match of homologous proteins and deletion/insertion of degree-2 nodes.

  5. Graph Subdivision and Homeomorphism
     • Subdivision of an edge: “insert” a vertex.
     • A subdivision of G is a graph obtained by subdividing some edges of G.
     • G and G’ are homeomorphic provided there is an isomorphism from a subdivision of G to a subdivision of G’.
     [Figure: query Q and a sub-network homeomorphic to Q, with positions labeled match, deletion, or insertion.]

     Network Querying Problem: Score of Alignment
     Score = (sequence similarity score for matches) + (interaction reliabilities score) + (penalty for deletions & insertions)
     [Figure: query nodes $q_1, \dots, q_6$ aligned to network nodes $v_1, \dots, v_6$, annotated with match scores $h(q_i, v_i)$, an interaction reliability $w(v_1, v_2)$, a deletion penalty (del pen), and an insertion penalty (ins pen).]
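The three-term score can be written out directly. This is a minimal sketch under stated assumptions: the names `alignment_score`, `h`, `w`, `del_pen`, and `ins_pen` follow the figure labels, but the dictionary representation, the use of one flat penalty per deletion or insertion, and the convention that penalties are negative numbers are assumptions made here for illustration.

```python
def alignment_score(matches, interactions, h, w, n_del, n_ins, del_pen, ins_pen):
    """Sum the three score terms of an alignment.

    matches:      list of (query_node, network_node) pairs
    interactions: list of network edges (u, v) used by the aligned sub-network
    h:            dict (query_node, network_node) -> sequence similarity score
    w:            dict frozenset({u, v}) -> interaction reliability
    del_pen, ins_pen: per-event penalties (assumed negative)
    """
    similarity = sum(h[(q, v)] for q, v in matches)
    reliability = sum(w[frozenset(e)] for e in interactions)
    penalty = n_del * del_pen + n_ins * ins_pen
    return similarity + reliability + penalty
```

With two matches scoring 2.0 and 1.0, one interaction of reliability 0.5, and a single deletion penalized by -0.25, the total is 3.25.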

  6. Network Querying Problem
     • Given a query graph Q and a network G, find the sub-network of G that is
       - homeomorphic to Q
       - aligned with maximal score
     [Figure: query Q and network G.]

     Complexity
     • The network querying problem is NP-complete (for general n and k), by reduction from the subgraph isomorphism problem.
     • The naïve algorithm has $O(n^k)$ complexity, where n = size of the PPI network and k = size of the query.
     • This is intractable for realistic values of n and k: n ~ 5000, k ~ 10.
     • We use the randomized “color coding” technique developed by [Alon et al., JACM, 1995] to find a tractable solution. It reduces $O(n^k)$ to $n^2 2^{O(k)}$.
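To get a feel for the gap between the two bounds at n = 5000 and k = 10, the counts can be evaluated directly. This is only an order-of-magnitude sketch: the constant `c` hidden inside $2^{O(k)}$ is not given on the slide, so it appears here as a hypothetical parameter.

```python
def naive_cost(n, k):
    """Operation count of the naive O(n^k) enumeration, ignoring constants."""
    return n ** k

def color_coding_cost(n, k, c=1):
    """n^2 * 2^(c*k); c stands in for the unknown constant in 2^O(k)."""
    return n ** 2 * 2 ** (c * k)

if __name__ == "__main__":
    n, k = 5000, 10
    print(f"naive:        {naive_cost(n, k):.2e}")
    for c in (1, 2, 3):
        print(f"color coding: {color_coding_cost(n, k, c):.2e}  (c = {c})")
```

Even with c = 3 the color-coding count stays around 10^16, while the naive count is near 10^37.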

  7. QNET
     • Implemented for tree-like queries.
     • Color coding approach to search for the globally optimal sub-network.
     • Extension of QPATH [Shlomi et al., 2006], which solves the problem of querying chains using the color coding approach.

     Color Coded Querying - Trees
     Query has k nodes.
     [Figure: network and query.]

  8. Color Coded Querying - Trees
     • Query has k nodes.
     • Randomly color the network with k distinct colors.
     • Suppose the optimal sub-network is “colorful”, i.e. all of its nodes receive distinct colors.
     • Use the colors to remember the visited nodes.

     DP solution for Color Coded Querying - Trees
     [Figure: query nodes $q_1, \dots, q_7$ matched to network nodes $v_1, \dots, v_7$.]
     DP: Whiteboard
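The DP itself was done on the whiteboard and is not in the slides. As a rough illustration of the idea, here is a sketch for the simpler case of a chain (path) query with no insertions or deletions, not the tree DP used by QNET. All names (`best_colorful_path`, `query_path`, `h`, `w`) and the data layout are assumptions. The table is indexed by (last network vertex, set of colors used), so the color set replaces the list of visited vertices.

```python
import random

def best_colorful_path(query, adj, h, w, colors):
    """Best score of aligning the chain `query` to a path with distinct colors.

    adj:    dict vertex -> iterable of neighbours
    h:      dict (query_node, vertex) -> similarity; missing key = no match
    w:      dict frozenset({u, v}) -> interaction reliability
    colors: dict vertex -> colour in range(len(query))
    """
    # table[(v, mask)] = best score of a prefix alignment ending at v
    # whose vertices use exactly the colours in the bitmask `mask`.
    table = {}
    for v in adj:
        if (query[0], v) in h:
            table[(v, 1 << colors[v])] = h[(query[0], v)]
    for i in range(1, len(query)):
        nxt = {}
        for (u, mask), score in table.items():
            for v in adj[u]:
                bit = 1 << colors[v]
                # A repeated colour might mean a repeated vertex: skip it.
                if mask & bit or (query[i], v) not in h:
                    continue
                cand = score + w[frozenset((u, v))] + h[(query[i], v)]
                key = (v, mask | bit)
                if cand > nxt.get(key, float("-inf")):
                    nxt[key] = cand
        table = nxt
    return max(table.values(), default=None)

def query_path(query, adj, h, w, trials, seed=0):
    """Repeat the colour-coded search over independent random colourings."""
    rng = random.Random(seed)
    k = len(query)
    best = None
    for _ in range(trials):
        colors = {v: rng.randrange(k) for v in adj}
        score = best_colorful_path(query, adj, h, w, colors)
        if score is not None and (best is None or score > best):
            best = score
    return best
```

Each trial only finds alignments that happen to be colorful under that trial's coloring, which is why the search is repeated.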

  9. Probability of failure
     • The optimal alignment can be found only if the optimal sub-network is “colorful”:
       $$P(\text{failure}) = 1 - \frac{k!}{k^k} \le 1 - e^{-k}$$
     • Repeat the color-coded search multiple times until the probability of failure is ≤ ε.
     [Figure: network with nodes $v_1, \dots, v_7$.]

     Number of Repeats
     • How many repeats are necessary to guarantee a failure probability ≤ ε?
     • Repeat $N = e^k \ln(1/\varepsilon)$ times; then $P(\text{failure}) \le (1 - e^{-k})^N \le \varepsilon$.
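The failure bound and a sufficient repeat count can be checked numerically. This sketch assumes independent colorings, so that N trials all fail with probability at most $(1 - e^{-k})^N \le e^{-N e^{-k}}$; choosing $N \ge e^k \ln(1/\varepsilon)$ then pushes the failure probability below ε. The function names are illustrative.

```python
import math

def p_colorful(k):
    """Probability that k fixed vertices get k distinct colours: k!/k^k."""
    return math.factorial(k) / k ** k

def repeats_needed(k, eps):
    """A sufficient number of independent colourings: ceil(e^k * ln(1/eps))."""
    return math.ceil(math.exp(k) * math.log(1.0 / eps))

if __name__ == "__main__":
    for k in (5, 10):
        n_rep = repeats_needed(k, 0.01)
        fail = (1.0 - p_colorful(k)) ** n_rep
        print(f"k={k}: p_colorful={p_colorful(k):.3e}, repeats={n_rep}, failure<={fail:.3e}")
```

Because $k!/k^k$ is larger than $e^{-k}$, this repeat count is conservative; the actual failure probability after N trials is smaller than ε.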

  10. Network Querying with Color Coding Approach
      [Figure: pipeline. Randomly color the network graph, run the DP algorithm on the colored network and the query to obtain a high-scoring subnetwork, and repeat N times.]

      Querying General Graphs
      • Extend the algorithm to general query graphs.
      • Idea:
        - Map the original graph into a tree, i.e. a tree decomposition.
        - Solve the querying problem on this tree using DP.

  11. Color Coded Querying - General Graphs
      Map the original query into a tree using tree decomposition.
      [Figure: graph G with vertices u, v, z and its tree decomposition T; each node of T is a set of vertices of G.]

      Tree Decomposition
      Given $G = (V, E)$, form a tree $T = (X, E_T)$ such that:
      • Each $X_i \in X$ is a subset of $V$.
      • For every edge $(u,v) \in E$, there is a set $X_i$ containing both u and v.
      • For every $v \in V$, the nodes of T that contain v form a connected subtree.
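The three conditions translate into a short validity check. A minimal sketch, assuming the caller already supplies a genuine tree in `tree_edges` (that is not verified here) and requiring every vertex to appear in at least one bag; the function name and the representation of bags as a dict of sets are illustrative.

```python
def is_tree_decomposition(vertices, edges, bags, tree_edges):
    """Check the tree-decomposition conditions.

    bags:       dict tree_node -> set of graph vertices (the X_i)
    tree_edges: list of (tree_node, tree_node) pairs, assumed to form a tree
    """
    vset = set(vertices)
    # Condition 1: every bag is a subset of V.
    if any(not bag <= vset for bag in bags.values()):
        return False
    # Condition 2: every graph edge lies inside some bag.
    for u, v in edges:
        if not any(u in bag and v in bag for bag in bags.values()):
            return False
    # Condition 3: the bags containing v form a connected subtree of T.
    nbrs = {i: set() for i in bags}
    for i, j in tree_edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    for v in vertices:
        holders = {i for i, bag in bags.items() if v in bag}
        if not holders:
            return False
        start = next(iter(holders))
        seen, stack = {start}, [start]
        while stack:
            i = stack.pop()
            for j in nbrs[i] & holders:
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        if seen != holders:
            return False
    return True
```

For the 4-cycle a-b-c-d, the two bags {a, b, c} and {a, c, d} joined by one tree edge pass all three conditions.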

  12. Color Coded Querying - General Graphs
      • A tree decomposition is not unique.
      • The width of a tree decomposition is the size of its largest node minus one: $\max_i |X_i| - 1$.
      • The treewidth of a graph G is the minimum width among all possible tree decompositions of G.
      • DPs on trees can usually be extended to tree decompositions.
      • Problems solved efficiently on trees by DP can be solved efficiently on graphs with bounded treewidth.
      [Figure: graph G and a tree decomposition T.]
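Non-uniqueness and width are easy to see on a small case. The sketch below computes the width of two different decompositions of the 4-cycle a-b-c-d; the `width` helper and the bag layout are illustrative assumptions. The treewidth is the minimum over all decompositions, which for a cycle is 2.

```python
def width(bags):
    """Width of a tree decomposition: size of the largest bag minus one."""
    return max(len(bag) for bag in bags.values()) - 1

# Two valid decompositions of the 4-cycle a-b-c-d.
one_bag = {0: {"a", "b", "c", "d"}}             # trivial: everything in one bag
two_bags = {0: {"a", "b", "c"}, 1: {"a", "c", "d"}}  # bags joined by one tree edge

if __name__ == "__main__":
    print(width(one_bag), width(two_bags))
```

The trivial single-bag decomposition always exists and has width |V| - 1, so it only gives an upper bound on the treewidth.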

  13. Color Coded Querying - General Graphs
      • Original query has k nodes and tree-width t.
      • Randomly color the network with k distinct colors.
      [Figure: tree decomposition of the query over nodes $q_1, \dots, q_8$, matched against network nodes $v_1, \dots, v_8$.]

  14. Running time
      • n = size of network, k = size of query.
      • Tree queries:
        - Reduces $O(n^k)$ to $n^2 2^{O(k)}$.
        - Tractable for realistic values of n and k: n ~ 5000, k ~ 10.
      • Bounded-tree-width graphs:
        - t: tree-width
        - $n^{t+1} 2^{O(k)}$

      Heuristic for Color Coded Querying - General Graphs
      1. Extract several spanning trees from the original query.
      [Figure: query graph G.]


  16. Heuristic for Color Coded Querying - General Graphs
      1. Extract several spanning trees from the original query.
      2. Query each spanning tree in the network.
      3. Merge the matching trees to obtain the matching graph.
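The slides do not say how the spanning trees in step 1 are chosen. One simple possibility, shown here purely as an assumption, is to shuffle the query's edges and keep those that join two different components (Kruskal's algorithm with a random edge order). This does not sample spanning trees uniformly, and steps 2 and 3 are not sketched.

```python
import random

def random_spanning_tree(vertices, edges, rng):
    """Return the edges of one spanning tree of a connected graph."""
    parent = {v: v for v in vertices}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    shuffled = list(edges)
    rng.shuffle(shuffled)
    tree = []
    for u, v in shuffled:
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two components, so it creates no cycle
            parent[ru] = rv
            tree.append((u, v))
    return tree

def spanning_trees(vertices, edges, count, seed=0):
    """Extract `count` spanning trees (possibly with repeats)."""
    rng = random.Random(seed)
    return [random_spanning_tree(vertices, edges, rng) for _ in range(count)]
```

Each returned tree can then be used as a tree query in step 2.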

  17. Test: Cross-species comparison of MAPK pathways
      • Query: human MAPK pathway involved in cell proliferation and differentiation.
      • Network: fly PPI network.
      • Result: a known fly MAPK pathway involved in dorsal pattern formation.
      [Figure: the query in human and its match in fly.]

      Test: Cross-species comparison of protein complexes
      • Queries: trees of size 3-8 extracted from ~100 hand-curated yeast MIPS complexes.
      • Network: fly PPI network.
      • Result:
        - ~40 of the queries resulted in a match with >1 protein.
        - 72% of the matches are functionally enriched (p-value < 0.05).
        - 17% of the random trees extracted from the network are functionally enriched.

  18. Network Structure
      Is there structure in this network, or is it “random”?

  19. Random Network
      Erdős-Rényi model: n vertices. For each pair (u, v) of vertices, connect them with an edge with probability p.

      Random Networks
      Erdős-Rényi graphs have a number of special properties.
      1. The degree distribution is asymptotically Poisson. With D = degree of a vertex, $\Pr[D = k] \to e^{-\lambda} \lambda^k / k!$.
      2. If p > 1/n, there is a large connected component; the second largest component has size O(log n).
      3. More…
      https://nwb.slis.indiana.edu/community/?n=CustomFillings.AnalysisOfBiologicalNetworks
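The Erdős-Rényi model and the Poisson comparison can be simulated in a few lines. This is a sketch under the assumption that λ is the expected degree, (n - 1)p, which is approximately np; the function names are illustrative, and the quadratic loop is only suitable for small n.

```python
import math
import random
from collections import Counter

def erdos_renyi(n, p, rng):
    """Sample G(n, p): each of the n(n-1)/2 vertex pairs is an edge w.p. p."""
    return [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p]

def degree_histogram(n, edges):
    """Counter mapping degree -> number of vertices with that degree."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return Counter(deg)

def poisson_pmf(k, lam):
    """Pr[D = k] for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

if __name__ == "__main__":
    n, p = 2000, 0.002
    hist = degree_histogram(n, erdos_renyi(n, p, random.Random(0)))
    lam = (n - 1) * p
    for k in range(10):
        print(k, hist.get(k, 0) / n, round(poisson_pmf(k, lam), 4))
```

The printed empirical fractions should track the Poisson column closely, with no heavy tail of high-degree vertices.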

  20. Random Networks
      Empirically, biological networks have different properties.
      • The degree distribution follows a power law: $p_k = \Pr[D = k] \sim C k^{-\lambda}$, or equivalently $\log p_k \sim C' - \lambda \log k$ with $C' = \log C$.
      • There are a few nodes of high degree, “hubs”.
      [Figure: degree distribution on a log-log scale.]
      https://nwb.slis.indiana.edu/community/?n=CustomFillings.AnalysisOfBiologicalNetworks

      Random Networks
      Empirically, biological networks have different properties.
      • This suggests that the “attachment process” does not follow the Erdős-Rényi model.
      • Is there any biological significance? A different attachment process? Clues about network evolution?
      • Major caveat: all biological networks are incomplete.
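On a log-log scale a power law is a straight line with slope -λ, so the exponent can be read off with a linear fit. The sketch below uses ordinary least squares on $(\log k, \log p_k)$; this is a crude estimator chosen here for simplicity, not a method from the lecture, and it is known to be unreliable on noisy tails. The function name is illustrative.

```python
import math

def fit_power_law(ks, pks):
    """Fit log p_k = log C - lam * log k by least squares; return (lam, C)."""
    xs = [math.log(k) for k in ks]
    ys = [math.log(p) for p in pks]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return -slope, math.exp(intercept)

if __name__ == "__main__":
    ks = list(range(1, 50))
    pks = [3.0 * k ** -2.5 for k in ks]  # synthetic, noise-free data
    print(fit_power_law(ks, pks))
```

On the noise-free synthetic data the fit recovers λ = 2.5 and C = 3 up to rounding error; real degree data with zero counts at some k would need those points dropped before taking logs.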
