Outline: This week 1. Subnetwork querying. More color-coding.



SLIDE 1

4/28/09

CSCI1950-Z Computational Methods for Biology, Lecture 23

Ben Raphael, April 27, 2009

http://cs.brown.edu/courses/csci1950-z/

Outline: This week

  • 1. Subnetwork querying.

– More color-coding. Tree-width graphs.

  • 2. Network Motifs
  • 3. Network alignment: conserved complexes
  • 4. Network integration: networks + gene expression data.

SLIDE 2

Network Querying Problem

  • Species A
    – well studied
    – protein interaction sub-networks defined by extensive experimentation
  • Species B
    – less studied
    – little knowledge of sub-networks
    – protein interaction network known using high-throughput technologies
  • Can we use the knowledge of A to discover the corresponding sub-networks in B, if they are present?

Graph Isomorphism

$G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ are isomorphic provided there is a bijection between the vertices $V_1$ and $V_2$ that preserves edges, i.e. there is a one-to-one, onto function $\Phi: V_1 \to V_2$ such that $(u, v) \in E_1$ if and only if $(\Phi(u), \Phi(v)) \in E_2$.
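The definition can be checked directly on very small graphs. The following is a minimal brute-force sketch, not an efficient algorithm: it tries every bijection Φ and tests the edge condition. The function name `find_isomorphism` and the vertex-list/edge-list representation are assumptions made for illustration, and graphs are taken to be undirected and simple.

```python
from itertools import permutations


def find_isomorphism(v1, e1, v2, e2):
    """Return a dict phi: V1 -> V2 that preserves edges, or None if none exists.

    Assumes undirected simple graphs (no self-loops) given as vertex lists and
    lists of vertex pairs. Tries all |V|! bijections, so it is only usable for
    tiny graphs.
    """
    v1, v2 = list(v1), list(v2)
    edges1 = {frozenset(e) for e in e1}
    edges2 = {frozenset(e) for e in e2}
    if len(v1) != len(v2) or len(edges1) != len(edges2):
        return None
    for image in permutations(v2):
        phi = dict(zip(v1, image))
        # With equal edge counts and phi injective, mapping every edge of E1
        # into E2 is equivalent to the "if and only if" condition.
        if all(frozenset(phi[x] for x in e) in edges2 for e in edges1):
            return phi
    return None
```

For example, a triangle on vertices 1, 2, 3 is mapped onto a triangle on a, b, c by some bijection, while a three-vertex path is not isomorphic to a triangle and the function returns `None`.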

SLIDE 3

Graph Isomorphism Problem

Are $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ isomorphic? The problem is not known to be in P, nor known to be NP-complete.

[Figure: two example graphs, one with vertices A-F and one with vertices 1-6]

Subgraph Isomorphism Problem

Is $G_1 = (V_1, E_1)$ isomorphic to a subgraph of $G_2$? This problem is NP-complete.

SLIDE 4

Network Querying Problem

  • Given a query graph Q and a network G, find the sub-network of G that is
    – isomorphic to Q
    – aligned with maximal score
  • NP-complete: subgraph isomorphism.

[Figure: query Q and network G]

Network Querying Problem: Homeomorphic Alignment

Match of homologous proteins and deletion/insertion of degree-2 nodes.

[Figure: query Q from species A aligned to a sub-network of species B that is homeomorphic to Q; aligned nodes are labeled as matches, with one insertion and one deletion]

SLIDE 5

Graph Subdivision and Homeomorphism

[Figure: query Q and a sub-network homeomorphic to Q, with matches, one insertion and one deletion]

Subdivision of an edge: "insert" a vertex on the edge. A subdivision of G is a graph obtained by subdividing some edges of G. G and G' are homeomorphic provided there is an isomorphism from a subdivision of G to a subdivision of G'.
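To make the subdivision operation concrete, here is a small sketch, assuming undirected graphs stored as sets of two-element `frozenset` edges. The helper name `subdivide_edge` is hypothetical. Note that the inserted vertex always has degree 2, which is why insertions and deletions in a homeomorphic alignment involve degree-2 nodes.

```python
def subdivide_edge(edges, u, v, new_vertex):
    """Return a new edge set in which edge {u, v} is replaced by the
    two edges {u, new_vertex} and {new_vertex, v}.

    `edges` is any iterable of vertex pairs; the input is not modified.
    """
    edge_set = {frozenset(e) for e in edges}
    target = frozenset((u, v))
    if target not in edge_set:
        raise ValueError("edge to subdivide is not in the graph")
    if any(new_vertex in e for e in edge_set):
        raise ValueError("new vertex already occurs in the graph")
    return (edge_set - {target}) | {
        frozenset((u, new_vertex)),
        frozenset((new_vertex, v)),
    }
```

Subdividing one edge of a triangle, for instance, yields a 4-cycle, so the triangle and the 4-cycle are homeomorphic.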

Network Querying Problem: Score of Alignment

Score = sequence similarity score for matches + penalty for deletions and insertions + interaction reliabilities score

[Figure: alignment annotated with match scores $h(q_i, v_i)$ for $i = 1, \dots, 6$, the interaction reliability $w(v_1, v_2)$ on a network edge, a deletion penalty and an insertion penalty]
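The three terms of the score can be summed as in the sketch below. The data layout (dictionaries `h` and `w`, lists of matched pairs and used network edges) and the default penalty values are assumptions chosen for illustration; the slides do not specify them.

```python
def alignment_score(matches, used_edges, h, w, n_insertions, n_deletions,
                    ins_pen=-0.5, del_pen=-0.5):
    """Score of an alignment as the sum of three terms.

    matches:    list of (query_node, network_node) pairs
    used_edges: network edges (pairs of network nodes) used by the alignment
    h:          dict mapping (query_node, network_node) -> similarity score
    w:          dict mapping frozenset({v_i, v_j}) -> interaction reliability
    ins_pen, del_pen: hypothetical per-event penalties (negative numbers)
    """
    similarity = sum(h[(q, v)] for q, v in matches)
    reliability = sum(w[frozenset(e)] for e in used_edges)
    penalty = n_insertions * ins_pen + n_deletions * del_pen
    return similarity + reliability + penalty
```

With this layout, an alignment that matches two query nodes with scores 2.0 and 1.0, uses one network edge of reliability 0.5, and contains one insertion with penalty -0.5 scores 3.0.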

SLIDE 6

Network Querying Problem

  • Given a query graph Q and a network G, find the sub-network of G that is
    – homeomorphic to Q
    – aligned with maximal score

[Figure: query Q and network G]

Complexity

  • The network querying problem is NP-complete (for general n and k)
    – by reduction from the subgraph isomorphism problem.
  • The naïve algorithm has $O(n^k)$ complexity.
    – n = size of the PPI network, k = size of the query.
    – Intractable for realistic values of n and k: n ~ 5000, k ~ 10.
  • We use the randomized "color coding" technique developed by [Alon et al., JACM, 1995] to find a tractable solution.
    – Reduces $O(n^k)$ to $n^2 2^{O(k)}$.
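A quick calculation shows the size of the gap for the values quoted above. This is only an order-of-magnitude sketch: it assumes the constant hidden in the exponent $O(k)$ is 1, which the slides do not state, and the function names are made up for illustration.

```python
def naive_cost(n, k):
    """Number of k-tuples of network nodes, the n^k term."""
    return n ** k


def color_coding_cost(n, k):
    """n^2 * 2^k, i.e. n^2 2^{O(k)} with the hidden constant assumed to be 1."""
    return n ** 2 * 2 ** k


n, k = 5000, 10
print(f"naive:        {naive_cost(n, k):.2e}")         # 9.77e+36
print(f"color coding: {color_coding_cost(n, k):.2e}")  # 2.56e+10
```

Under that assumption the bound drops by more than 26 orders of magnitude, before accounting for the repeated random colorings that color coding requires.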

SLIDE 7

QNET

  • Implemented for tree-like queries.
  • Color coding approach to search for the globally optimal sub-network.
  • Extension of QPATH [Shlomi et al., 2006]
    – QPATH solves the problem of querying chains using the color coding approach.

Color Coded Querying - Trees

The query has k nodes.

[Figure: query and network]

SLIDE 8

Color Coded Querying - Trees

  • The query has k nodes.
  • Randomly color the network with k distinct colors.
  • Suppose the optimal sub-network is "colorful", i.e. each of its nodes has a different color.
  • Use the colors to remember the visited nodes.

[Figure: query and randomly colored network]

DP solution for Color Coded Querying - Trees

[Figure: query tree with nodes q1-q7 aligned to network nodes v1, v2, v3, v4, v6, v7]

DP: Whiteboard
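Since the DP itself was done on the whiteboard, the following is only a plausible sketch of a color-coding DP for tree queries, not the QNET implementation. It is simplified to matches only (no insertions or deletions), and all names (`best_colorful_score`, `color_coded_query`, the `children`/`adj`/`h`/`w` layout) are assumptions. The table entry for (q, v, S) holds the best score of aligning the query subtree rooted at q with q mapped to v, using exactly the color set S; requiring disjoint color sets when merging children is what guarantees no network node is used twice.

```python
import random


def best_colorful_score(children, root, adj, h, w, color):
    """Best score of a colorful match of a rooted query tree, or None.

    children: dict query node -> list of child query nodes
    adj:      dict network node -> set of neighbouring network nodes
    h:        dict (query node, network node) -> similarity (absent = no match)
    w:        dict frozenset({u, v}) -> reliability of network edge u-v
    color:    dict network node -> color in range(k)
    """
    def solve(q):
        # table[(v, S)]: best score for the subtree of q with q -> v, colors S
        table = {(v, frozenset([color[v]])): h[(q, v)]
                 for v in adj if (q, v) in h}
        for c in children.get(q, []):
            child = solve(c)
            merged = {}
            for (v, s), score in table.items():
                for (u, t), child_score in child.items():
                    # child image must be a neighbour and reuse no color
                    if u in adj[v] and not (s & t):
                        cand = score + child_score + w[frozenset((v, u))]
                        key = (v, s | t)
                        if cand > merged.get(key, float("-inf")):
                            merged[key] = cand
            table = merged
        return table

    return max(solve(root).values(), default=None)


def color_coded_query(children, root, adj, h, w, k, trials, seed=0):
    """Repeat the DP over independent random colorings; keep the best score."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        color = {v: rng.randrange(k) for v in adj}
        score = best_colorful_score(children, root, adj, h, w, color)
        if score is not None and (best is None or score > best):
            best = score
    return best
```

A single coloring can miss the optimum when the optimal sub-network is not colorful under it, which is why the outer loop repeats the search with fresh random colorings.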

SLIDE 9

Probability of failure

  • The optimal alignment can be found only if the optimal sub-network is "colorful".
  • Repeat the color-coded search multiple times until the probability of failure is ≤ ε.

[Figure: randomly colored network with nodes v1-v7]

For a single random coloring:

$$P(\text{failure}) = 1 - \frac{k!}{k^k} \le 1 - e^{-k}$$
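The success probability $k!/k^k$ is the fraction of the $k^k$ colorings of k fixed nodes that use all k colors. The sketch below, with hypothetical function names, computes it in closed form, checks it by enumeration for small k, and can be used to confirm the bound $k!/k^k \ge e^{-k}$ numerically.

```python
from itertools import product
from math import exp, factorial


def p_colorful(k):
    """Probability that k fixed nodes receive k distinct colors."""
    return factorial(k) / k ** k


def p_colorful_by_enumeration(k):
    """Same probability, by listing all k^k colorings (small k only)."""
    colorings = list(product(range(k), repeat=k))
    colorful = sum(1 for c in colorings if len(set(c)) == k)
    return colorful / len(colorings)


for k in (3, 5, 10):
    print(k, p_colorful(k), exp(-k))
```

For k = 10 the success probability of a single coloring is already below one in a thousand, which motivates repeating the search.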

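Number of Repeats

  • How many repeats are necessary to guarantee a failure probability ≤ ε?
  • Repeat $N = e^k \ln(1/\varepsilon)$ times; then

$$P(\text{failure}) \le (1 - e^{-k})^N \le e^{-N e^{-k}} = \varepsilon$$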
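The number of repeats follows from the single-trial bound: if each independent coloring fails with probability at most $1 - e^{-k}$, then N colorings all fail with probability at most $(1 - e^{-k})^N \le e^{-N e^{-k}}$, which is at most ε once $N \ge e^k \ln(1/\varepsilon)$. A small sketch of that arithmetic (the function name is an assumption):

```python
from math import ceil, exp, log


def repeats_needed(k, eps):
    """Smallest integer N with N >= e^k * ln(1/eps)."""
    return ceil(exp(k) * log(1 / eps))


k, eps = 10, 0.01
n_repeats = repeats_needed(k, eps)
print(n_repeats, (1 - exp(-k)) ** n_repeats)
```

For k = 10 and ε = 0.01 this gives roughly $10^5$ colorings. The bound is conservative, because the true single-trial success probability $k!/k^k$ is larger than $e^{-k}$.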

SLIDE 10

Network Querying with Color Coding Approach

[Figure: pipeline. Inputs: query and network graph; randomly color the network; run the DP algorithm; repeat N times; output: high-scoring subnetwork]

Querying General Graphs

  • Extend the algorithm to general query graphs.
  • Idea:
    – Map the original graph into a tree, i.e. a tree decomposition.
    – Solve the querying problem on this tree using DP.

SLIDE 11

Color Coded Querying – General Graphs

Map the original query into a tree using tree decomposition.

[Figure: graph G with vertices u, v, z and its tree decomposition T; each node of T is a set of vertices of G]

Tree Decomposition

Given $G = (V, E)$, form a tree $T = (X, E_T)$ such that:

  • Each $X_i \in X$ is a subset of V.
  • For every edge $(u, v) \in E$, there is a set $X_i$ containing both u and v.
  • For every $v \in V$, the nodes of T that contain v form a connected subtree.
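The three conditions translate directly into a checker. The sketch below assumes the tree T is given as a dictionary of bags plus a list of tree edges, and that `tree_edges` really does form a tree (this is not verified). The names `is_tree_decomposition` and `width` are made up for illustration; `width` uses the standard definition, largest bag size minus one.

```python
def is_tree_decomposition(vertices, edges, bags, tree_edges):
    """Check the three tree-decomposition conditions.

    bags:       dict tree node id -> set of graph vertices (the X_i)
    tree_edges: list of (i, j) pairs of tree node ids, assumed to form a tree
    """
    vertex_set = set(vertices)
    # 1. every bag is a subset of V
    if any(not bag <= vertex_set for bag in bags.values()):
        return False
    # 2. every graph edge lies inside some bag
    for u, v in edges:
        if not any(u in bag and v in bag for bag in bags.values()):
            return False
    # 3. the bags containing each vertex form a connected subtree
    nbrs = {i: set() for i in bags}
    for i, j in tree_edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    for v in vertex_set:
        holding = {i for i, bag in bags.items() if v in bag}
        if not holding:
            return False
        start = next(iter(holding))
        seen, stack = {start}, [start]
        while stack:  # depth-first search restricted to bags containing v
            i = stack.pop()
            for j in (nbrs[i] & holding) - seen:
                seen.add(j)
                stack.append(j)
        if seen != holding:
            return False
    return True


def width(bags):
    """Width of a decomposition: size of the largest bag minus one."""
    return max(len(bag) for bag in bags.values()) - 1
```

For the 4-cycle a-b-c-d, the two bags {a, b, c} and {a, c, d} joined by one tree edge form a valid decomposition of width 2.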

SLIDE 12

Color Coded Querying – General Graphs

  • A tree decomposition is not unique.
  • The width of a tree decomposition is the size of its largest node minus one: $\max_i |X_i| - 1$.
  • The treewidth of a graph G is the minimum width among all possible tree decompositions of G.

[Figure: graph G and a tree decomposition T]

Color Coded Querying – General Graphs

  • DPs on trees can usually be extended to tree decompositions.
  • Problems solved efficiently on trees by DP can be solved efficiently on graphs with bounded treewidth.

SLIDE 13

Color Coded Querying – General Graphs

  • The original query has k nodes and treewidth t.
  • Randomly color the network with k distinct colors.

[Figure: tree decomposition of the query, with bags over query nodes q1-q8, and the randomly colored network with nodes v1-v8]

SLIDE 14

Running time

  • n = size of network, k = size of query.
  • Tree queries:
    – Reduces $O(n^k)$ to $n^2 2^{O(k)}$.
    – Tractable for realistic values of n and k: n ~ 5000, k ~ 10.
  • Bounded-treewidth graphs:
    – t: treewidth
    – $n^{t+1} 2^{O(k)}$

Heuristic for Color Coded Querying - General Graphs

1. Extract several spanning trees from the original query.

[Figure: query graph G]

SLIDE 15

Heuristic for Color Coded Querying - General Graphs

1. Extract several spanning trees from the original query.
2. Query each spanning tree in the network.

SLIDE 16

Heuristic for Color Coded Querying - General Graphs

1. Extract several spanning trees from the original query.
2. Query each spanning tree in the network.
3. Merge the matching trees to obtain the matching graph.
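Step 1 of the heuristic can be sketched as follows. The slides do not say how the spanning trees are chosen, so this example assumes a simple randomized Kruskal procedure (shuffle the edges, keep each edge that joins two different components). The function names and the edge-list representation are hypothetical, and the query graph is assumed to be connected.

```python
import random


def random_spanning_tree(vertices, edges, rng):
    """Randomized Kruskal with a union-find structure."""
    parent = {v: v for v in vertices}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    shuffled = list(edges)
    rng.shuffle(shuffled)
    tree = []
    for u, v in shuffled:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # edge joins two components: keep it
            parent[root_u] = root_v
            tree.append((u, v))
    return tree


def spanning_trees(vertices, edges, count, seed=0):
    """Draw `count` random spanning trees (repeats are possible)."""
    rng = random.Random(seed)
    return [random_spanning_tree(vertices, edges, rng) for _ in range(count)]
```

Each returned tree could then be passed to a tree-query routine (step 2), and the resulting matches merged (step 3); how the merge is done is not specified here.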

SLIDE 17

Test: Cross-species comparison of MAPK pathways

  • Query: human MAPK pathway involved in cell proliferation and differentiation.
  • Network: fly PPI network
  • Result: a known fly MAPK pathway involved in dorsal pattern formation.

[Figure: query from human and its match in fly]

Test: Cross-species comparison of protein complexes

  • Queries: trees of size 3-8 extracted from ~100 yeast hand-curated MIPS complexes.
  • Network: fly PPI network
  • Result:
    – ~40 of the queries resulted in a match with >1 protein.
    – 72% of the matches are functionally enriched (p-value < 0.05).
    – 17% of the random trees extracted from the network are functionally enriched.

SLIDE 18

Outline: This week

  • 1. Subnetwork querying.

– More color-coding. Tree-width graphs.

  • 2. Network Motifs
  • 3. Network alignment: conserved complexes
  • 4. Network integration: networks + gene expression data.

Network Structure

Is there structure in this network, or is it “random”?

SLIDE 19

Random Network

Erdős-Rényi model: n vertices. For each pair (u, v) of vertices, connect u and v with an edge independently with probability p.
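The model is short enough to write out directly. This is a minimal sketch with a hypothetical function name; it visits all n(n-1)/2 pairs, which is fine for small n but is not how large random graphs are usually sampled.

```python
import random
from itertools import combinations


def erdos_renyi(n, p, seed=0):
    """Edge list of a random graph on vertices 0..n-1: each of the
    n(n-1)/2 vertex pairs becomes an edge independently with probability p."""
    rng = random.Random(seed)
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]


edges = erdos_renyi(200, 0.05, seed=1)
mean_degree = 2 * len(edges) / 200  # expected value is (n - 1) * p = 9.95
print(len(edges), mean_degree)
```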

Random Networks

Erdős-Rényi graphs have a number of special properties.

  • 1. The degree distribution is asymptotically Poisson. With D = degree of a vertex, $\Pr[D = k] \to e^{-\lambda} \lambda^k / k!$, where $\lambda \approx np$ is the mean degree.
  • 2. If p > 1/n, there is a large connected component; the second largest component has size O(log n).
  • 3. More…

https://nwb.slis.indiana.edu/community/?n=CustomFillings.AnalysisOfBiologicalNetworks

SLIDE 20

Random Networks

Empirically, biological networks have different properties. The degree distribution follows a power law:

$p_k = \Pr[D = k] \sim C k^{-\lambda}$, or equivalently $\log p_k \sim C' - \lambda \log k$, where $C' = \log C$.

There are a few nodes of high degree, "hubs".

[Figure: degree distribution on a log-log scale]

https://nwb.slis.indiana.edu/community/?n=CustomFillings.AnalysisOfBiologicalNetworks
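Because a power law is a straight line on a log-log plot, the exponent λ can be read off as the negated slope of that line. The sketch below fits the slope by ordinary least squares on the empirical degree frequencies. This is a rough illustration under stated assumptions, not a recommended estimator (least squares on log-log data is known to be noisy for real networks), and the function name is made up.

```python
from collections import Counter
from math import log


def power_law_exponent(degrees):
    """Estimate lambda in p_k ~ C k^(-lambda) from a list of vertex degrees,
    as minus the least-squares slope of log p_k against log k.
    Degree-0 vertices are ignored because log 0 is undefined."""
    counts = Counter(d for d in degrees if d > 0)
    if len(counts) < 2:
        raise ValueError("need at least two distinct positive degrees")
    total = sum(counts.values())
    xs = [log(k) for k in counts]
    ys = [log(c / total) for c in counts.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx
```

On synthetic data whose degree counts are exactly proportional to $k^{-2}$, the function returns 2 up to floating-point error.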

Random Networks

Empirically, biological networks have different properties. This suggests that the "attachment process" does not follow the Erdős-Rényi model.

  • Is there any biological significance?
  • A different attachment process?
  • Clues about network evolution?

Major caveat: all biological networks are incomplete.

https://nwb.slis.indiana.edu/community/?n=CustomFillings.AnalysisOfBiologicalNetworks

SLIDE 21

Network Motifs

Subnetworks with more occurrences than expected by chance.

  • How to find?
  • How to assess statistical significance?

Shen-Orr et al. 2002

Network Motifs

Subnetworks with more occurrences than expected by chance.

  • How to find?
    – Exhaustive: count all n-node subgraphs.
    – Greedy and other heuristic methods.
    – Approximate counting via randomized algorithms.
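As an illustration of the exhaustive approach, the sketch below counts one particular 3-node pattern in a directed network, the feed-forward loop (x → y, y → z and x → z), by checking every ordered triple of nodes. The choice of pattern and the function name are assumptions made for this example; the count includes occurrences that have additional edges among the three nodes.

```python
from itertools import permutations


def count_feed_forward_loops(nodes, edges):
    """Count ordered triples (x, y, z) of distinct nodes with directed
    edges x -> y, y -> z and x -> z. Runs in O(n^3) time."""
    edge_set = set(edges)
    return sum(
        1
        for x, y, z in permutations(nodes, 3)
        if (x, y) in edge_set and (y, z) in edge_set and (x, z) in edge_set
    )
```

To decide whether the pattern is a motif, this count would be compared with the counts obtained on randomized versions of the same network; that comparison is not shown here.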

SLIDE 22

Sources

  • Dost B, Shlomi T, Gupta N, Ruppin E, Bafna V, Sharan R. QNet: a tool for querying protein interaction networks. (2008) J Comput Biol. 15(7):913-25.
  • QNET slides: modified from slides of Banu Dost.