

SLIDE 1

5/2/09 1

CSCI1950-Z Computational Methods for Biology, Lecture 24

Ben Raphael, April 29, 2009

http://cs.brown.edu/courses/csci1950-z/

Network Motifs

Subnetworks with more occurrences than expected by chance.

  • How to find?

– Exhaustive: count all k‐node subgraphs.
– Heuristic methods: sampling, greedy, etc.
– Approximate counting via randomized algorithms.

SLIDE 2

Network Motifs

Subnetworks with more occurrences than expected by chance.

  • How to assess statistical significance?

– Compare the number of occurrences to the number in random networks.

Random Networks

The occurrence of motifs depends strongly on network topology. What is an appropriate ensemble of random networks (the null model)?
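The comparison against a null model can be sketched as follows. This is a minimal illustration, not the method used in the lecture: it assumes the motif has already been counted once in the real network and once in each network of a random ensemble, and it reports a z-score and an empirical p-value. The function name `motif_significance` and the add-one correction are assumptions made for this example.

```python
import math
from statistics import mean, pstdev

def motif_significance(observed, null_counts):
    """Return (z_score, empirical_p) for an observed motif count.

    observed:    motif count in the real network
    null_counts: motif counts in networks sampled from the null model
    """
    mu = mean(null_counts)
    sigma = pstdev(null_counts)
    if sigma == 0:
        # Degenerate ensemble: every random network has the same count.
        z = 0.0 if observed == mu else math.copysign(math.inf, observed - mu)
    else:
        z = (observed - mu) / sigma
    # Fraction of random networks with at least as many occurrences.
    # The add-one correction (an assumption of this sketch) keeps the
    # estimate away from exactly zero for a finite ensemble.
    exceed = sum(1 for c in null_counts if c >= observed)
    p = (exceed + 1) / (len(null_counts) + 1)
    return z, p
```

A large positive z-score together with a small empirical p-value suggests that the subnetwork occurs more often than the chosen null model predicts; the conclusion is only as good as the choice of ensemble.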

SLIDE 3

Random Networks

One parameter governing the occurrence of motifs is the degree distribution.

https://nwb.slis.indiana.edu/community/?n=CustomFillings.AnalysisOfBiologicalNetworks

Preserving Degree Distribution

How to sample a graph with the same degree sequence?

Method of Newman, Strogatz and Watts (2001)

  • 1. Assign indegree i(v) and outdegree o(v) to each vertex v according to the degree sequence.
  • 2. Randomly pair the o(v) outgoing edge ends with the i(w) incoming edge ends.
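
The two steps above can be sketched as a stub-matching procedure. This is a minimal sketch under stated assumptions: degrees are given as dictionaries, the function name `sample_directed_graph` is hypothetical, and self-loops and repeated edges are not filtered out (a real analysis might reject or rewire them).

```python
import random

def sample_directed_graph(in_deg, out_deg, seed=None):
    """Sample a directed multigraph with the given degree sequence.

    in_deg, out_deg: dicts mapping vertex -> indegree / outdegree.
    Returns a list of (source, target) edges.
    """
    if sum(in_deg.values()) != sum(out_deg.values()):
        raise ValueError("total indegree must equal total outdegree")
    rng = random.Random(seed)
    # Step 1: give each vertex v o(v) outgoing and i(v) incoming edge ends.
    out_stubs = [v for v, d in out_deg.items() for _ in range(d)]
    in_stubs = [v for v, d in in_deg.items() for _ in range(d)]
    # Step 2: pair outgoing ends with incoming ends uniformly at random.
    rng.shuffle(in_stubs)
    return list(zip(out_stubs, in_stubs))

# Example: every sample keeps each vertex's indegree and outdegree.
example_edges = sample_directed_graph(
    in_deg={"a": 1, "b": 2, "c": 0},
    out_deg={"a": 2, "b": 0, "c": 1},
    seed=0,
)
```

Calling the function repeatedly with different seeds yields an ensemble of random networks that share the degree sequence of the original network.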

SLIDE 4

Network Motifs

  • Transcriptional regulatory network of E. coli:
  • 116 transcription factors
  • ~700 “genes” (operons)
  • 577 interactions.

Shen‐Orr et al. 2002

E. coli Network Motifs

Shen‐Orr et al. 2002

  • Enumerated all 3‐ and 4‐node motifs.
  • Looked for identical rows in the adjacency matrix to identify single‐input modules (SIM).
  • Used a clustering algorithm to identify dense overlapping regulons (DOR).

SLIDE 5

Counting Subnetworks

G = (V,E). |V| = n. |E| = m.

  • Network‐centric approach

– Count/enumerate all subgraphs with ≤ k vertices.
– Impractical for large n, m, k.

  • Query‐based approach

– Enumerate query graphs Q.
– For each Q, count occurrences (subgraph isomorphism).
– Q could be a non‐induced subgraph.

Counting non‐induced subgraphs

Suppose we want to count paths in G = (V,E). Idea: use color‐coding to count colorful paths.

– Dynamic programming solution (Whiteboard)

The dynamic program can be extended to count trees and bounded‐treewidth graphs.
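
The whiteboard derivation is not part of these notes, so the following is a sketch of the standard color‐coding dynamic program rather than a record of what was presented. It assumes an undirected graph given as an adjacency dictionary and a coloring with k colors; the table entry for (v, S) holds the number of colorful paths that end at v and use exactly the colors in S (stored as a bitmask). The function names are hypothetical. Because a fixed k‐vertex path is colorful with probability k!/k^k under a uniform random coloring, rescaling the colorful count by k^k/k! and averaging over colorings gives an estimate of the total number of paths.

```python
import random
from math import factorial

def count_colorful_paths(adj, color, k):
    """Count simple paths on k vertices whose colors are all distinct.

    adj:   dict vertex -> iterable of neighbours (undirected graph)
    color: dict vertex -> integer color in range(k)
    """
    full = (1 << k) - 1
    # table[v][S] = number of colorful paths ending at v with color set S.
    table = {v: {1 << color[v]: 1} for v in adj}
    for _ in range(k - 1):
        extended = {v: {} for v in adj}
        for v in adj:
            bit = 1 << color[v]
            for u in adj[v]:
                for used, count in table[u].items():
                    if not used & bit:  # v's color is still unused
                        key = used | bit
                        extended[v][key] = extended[v].get(key, 0) + count
        table = extended
    total = sum(row.get(full, 0) for row in table.values())
    # Each undirected path is found once from each of its two ends.
    return total // 2 if k > 1 else total

def estimate_path_count(adj, k, trials=200, seed=0):
    """Average rescaled colorful counts over random colorings."""
    rng = random.Random(seed)
    scale = k ** k / factorial(k)
    total = 0
    for _ in range(trials):
        color = {v: rng.randrange(k) for v in adj}
        total += count_colorful_paths(adj, color, k)
    return scale * total / trials
```

The table has at most n·2^k entries, which is what makes the approach feasible for small k even when n and m are large.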

SLIDE 6

Relation between Forward and Viterbi

VITERBI

Initialization: $V_0(0) = 1$; $V_k(0) = 0$ for all $k > 0$
Iteration: $V_j(i) = e_j(x_i) \max_k V_k(i-1)\, a_{kj}$
Termination: $P(x, \pi^*) = \max_k V_k(N)$

FORWARD

Initialization: $f_0(0) = 1$; $f_k(0) = 0$ for all $k > 0$
Iteration: $f_l(i) = e_l(x_i) \sum_k f_k(i-1)\, a_{kl}$
Termination: $P(x) = \sum_k f_k(N)\, a_{k0}$
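
The two recursions differ only in whether values from the previous column are combined with max or with a sum, which the sketch below makes explicit by passing the combining function as a parameter. It is an illustration under simplifying assumptions: the begin state 0 is represented by a `start` dictionary holding the transition probabilities out of state 0, no end‐state transition is applied at termination (equivalent to taking every a_{k0} as 1), the observation sequence is non‐empty, and probabilities are multiplied directly rather than in log space. The name `hmm_dp` and the example model are hypothetical.

```python
def hmm_dp(x, states, start, trans, emit, combine):
    """Shared Viterbi/forward recursion.

    combine=max returns P(x, pi*), the probability of the best path.
    combine=sum returns P(x), the total probability of the sequence.
    """
    # Column i = 1: only the begin state has non-zero value at i = 0.
    column = {j: emit[j][x[0]] * start[j] for j in states}
    for symbol in x[1:]:
        column = {
            j: emit[j][symbol] * combine(column[k] * trans[k][j] for k in states)
            for j in states
        }
    return combine(column.values())

# Example model: a fair coin (F) and a coin biased towards heads (L).
states = ["F", "L"]
start = {"F": 0.5, "L": 0.5}
trans = {"F": {"F": 0.9, "L": 0.1}, "L": {"F": 0.1, "L": 0.9}}
emit = {"F": {"H": 0.5, "T": 0.5}, "L": {"H": 0.9, "T": 0.1}}

best_path_prob = hmm_dp("HH", states, start, trans, emit, max)
total_prob = hmm_dp("HH", states, start, trans, emit, sum)
```

Since the best path is one of the paths included in the sum, the Viterbi value can never exceed the forward value for the same sequence.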

Importance of Network Motifs

  • Building blocks of networks.
  • Indicate modular structure of biological networks.
  • Appearance of some motifs might be explained by particular dynamics (e.g. feedforward and feedback loops).

Healthy skepticism about all these claims, particularly because data is incomplete.

SLIDE 7

Network Integration

Normalized expression “z‐score” $z_{ij}$ for gene i in condition/sample j.

Given: G = (V,E) interaction network.
V = genes
E = protein‐DNA or protein‐protein interactions

Goal: Find “active subnetworks”: subgraphs whose genes are differentially expressed in many conditions.

Ideker, et al. (2002); Chuang et al. (2007)

(Whiteboard)

Network Integration

$M = [ z_{ij} ]$: z‐scores of gene i in condition/sample j.

Given: G = (V,E) interaction network.
V = genes
E = protein‐DNA or protein‐protein interactions

Goal: Find $A^* = \arg\max_A r_A$, where A ranges over connected subgraphs.

Ideker, et al. (2002); Chuang et al. (2007)

SLIDE 8

Finding a High‐scoring Subnetwork

Identify set of active nodes. Gw = working subgraph induced by active nodes.

Simulated Annealing: Global optimization method.

  • Based on the idea of random, local search, similar to MCMC.
  • A “temperature” function controls when moves to suboptimal neighbors are permitted.
  • The temperature is decreased during the search, so that the search eventually settles in a local optimum.
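
The search described above can be sketched as follows. Several choices here are assumptions made for illustration and are not taken from the slides: a single z‐score per gene, a subnetwork score equal to the sum of z‐scores divided by the square root of the subnetwork size, a move that toggles one randomly chosen node, and a geometric cooling schedule. All function names are hypothetical.

```python
import math
import random

def components(nodes, adj):
    """Connected components of the subgraph induced by `nodes`."""
    seen, found = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, stack = set(), [start]
        seen.add(start)
        while stack:
            v = stack.pop()
            comp.add(v)
            for w in adj[v]:
                if w in nodes and w not in seen:
                    seen.add(w)
                    stack.append(w)
        found.append(comp)
    return found

def aggregate_score(comp, z):
    """Assumed score: sum of z-scores divided by sqrt(number of genes)."""
    return sum(z[v] for v in comp) / math.sqrt(len(comp))

def best_component(active, adj, z):
    """Highest-scoring connected component of the working subgraph Gw."""
    comps = components(active, adj)
    if not comps:
        return set(), -math.inf
    best = max(comps, key=lambda c: aggregate_score(c, z))
    return best, aggregate_score(best, z)

def anneal(adj, z, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    rng = random.Random(seed)
    nodes = sorted(adj)
    active = {v for v in nodes if rng.random() < 0.5}
    best_set, best_score = best_component(active, adj, z)
    score = best_score
    for step in range(steps):
        # Temperature decreases geometrically from t_start to t_end.
        temp = t_start * (t_end / t_start) ** (step / steps)
        v = rng.choice(nodes)
        active ^= {v}  # toggle v between active and inactive
        comp, new_score = best_component(active, adj, z)
        # Always accept improvements; accept worse states with a
        # probability that shrinks as the temperature drops.
        if new_score >= score or rng.random() < math.exp((new_score - score) / temp):
            score = new_score
            if new_score > best_score:
                best_set, best_score = comp, new_score
        else:
            active ^= {v}  # rejected: undo the toggle
    return best_set, best_score
```

The returned set is the best connected component seen at any point during the search, so the result does not depend on where the final state happens to settle.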

Results

SLIDE 9

Future: Knockout Experiments & Reverse Engineering

Input: Signal. Output: Gene/protein expression.

Given the input‐output relationship for normal (“wild type”) and mutant (“knockout”) cells, what can one infer about the network?

  • Topology: hard or impossible de novo: too many combinations.
  • New interactions or signs of existing interactions.

Future: Engineering Networks

Engineer biological networks to perform new tasks. Change metabolic networks to create cells that produce new products.

SLIDE 10

Sources

  • Shen‐Orr, S.S., Milo, R., Mangan, S., et al. 2002. Network motifs in the transcriptional regulation network of Escherichia coli. Nature Genetics 31, 64–68.
  • Newman, M.E.J., Strogatz, S.H., and Watts, D.J. 2001. Random graphs with arbitrary degree distributions and their applications. Phys. Rev. E 64, 026118–026134.
  • Ideker, T., Ozier, O., Schwikowski, B., and Siegel, A.F. 2002. Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics 18 Suppl 1, S233‐40.
  • Chuang, H.Y., Lee, E., Liu, Y.T., Lee, D., and Ideker, T. 2007. Network‐based classification of breast cancer metastasis. Mol Syst Biol 3, 140.