Bio Graph Analysis Lecture 9 CSCI 4974/6971 29 Sep 2016


slide-1
SLIDE 1

Bio Graph Analysis

Lecture 9 CSCI 4974/6971 29 Sep 2016

1 / 14

slide-2
SLIDE 2

Today’s Biz

  • 1. Reminders
  • 2. Review
  • 3. Biological Network Analysis Topics
  • 4. Hybrid processing - direction optimizing push/pull
  • 5. Assignment 2 solutions

2 / 14

slide-4
SLIDE 4

Reminders

◮ Project Presentation 1: in class 6 October
  ◮ Email me your slides (pdf only please) before class
  ◮ 5-10 minute presentation
  ◮ Introduce topic, give background, current progress, expected results
◮ No class 10/11 October
◮ Assignment 3: Thursday 13 Oct 16:00 (social analysis, posted soon)
◮ Office hours: Tuesday & Wednesday 14:00-16:00 Lally 317
  ◮ Or email me for other availability

4 / 14

slide-5
SLIDE 5

Today’s Biz

  • 1. Reminders
  • 2. Review
  • 3. Biological Network Analysis Topics
  • 4. Hybrid processing - direction optimizing push/pull
  • 5. Assignment 2 solutions

5 / 14

slide-6
SLIDE 6

Quick Review

◮ Balanced graph partitioning:
  ◮ Create k independent subsets of graph
  ◮ Satisfy some balance criteria
◮ Traditional (mesh-like graph) methods:
  ◮ Coordinate-based methods - inertial bisection, coordinate-based
  ◮ Spectral bisection - compute eigenvector using graph Laplacian
  ◮ KL-refinement - find best cost/gain for vertex swaps
  ◮ Multilevel - iterative coarsening/expanding+refinement

6 / 14

slide-7
SLIDE 7

Quick Review

◮ Drawbacks of traditional methods for small-world/massive-scale graphs
  ◮ KL and spectral methods require O(n^2)
  ◮ Coarsening incurs high overhead costs
  ◮ Traditional matching methods perform poorly on skewed graphs
◮ Small-world and large-graph methods:
  ◮ Streaming methods: perform immediate assignment based on some weighted cost/gain function for each vertex/edge encountered in the stream
  ◮ Single-level label propagation: hold full graph in memory, exploit community-like structure of small-world graphs to get quality partitions without a multilevel framework
  ◮ Other: use distributed label propagation for coarsening in multilevel; tradeoff in quality vs. overhead

7 / 14

slide-8
SLIDE 8

Today’s Biz

  • 1. Reminders
  • 2. Review
  • 3. Biological Network Analysis Topics
  • 4. Hybrid processing - direction optimizing push/pull
  • 5. Assignment 2 solutions

8 / 14

slide-9
SLIDE 9

Network Motifs: simple Building Blocks of Complex Networks Slides from Yoav Lahini, Harvard University

9 / 14

slide-10
SLIDE 10

Network Motifs: simple Building Blocks of Complex Networks

  • R. Milo et al., Science 298, 824 (2002)
  • Y. Lahini
slide-11
SLIDE 11

The cell and the environment

  • Cells need to react to their environment
  • Reaction is by synthesizing task-specific proteins, on demand.
  • The solution – regulated transcription network
  • E. coli – about 1000 protein types present at any given moment, out of >4000 genes (possible protein types) – needs a regulatory mechanism to select the active set

  • We are interested in the design principles of this network
slide-12
SLIDE 12

Proteins are encoded by DNA

DNA – the instruction manual, 4-letter chemical alphabet – A, G, T, C

DNA → (transcription) → RNA → (translation) → Protein

slide-13
SLIDE 13

Gene Regulation

  • Proteins are encoded by the DNA of the organism.
  • Proteins regulate expression of other proteins by interacting with the DNA.

[Figure: DNA with a promoter region (e.g. ACCGTTGCAT) next to the coding region of a protein; labels: transcription factor, external signal, protein]

slide-14
SLIDE 14

Two types of Transcription Factors: 1. Activators

[Figure: signal Sx converts X into its active form X*. With X* bound to the binding site of gene Y (bound activator): increased transcription of Y. Without bound activator: no transcription.]

Separation of time scales (figure labels: sub-second, seconds, hours): TF activation level is in steady state

slide-15
SLIDE 15

Two types of Transcription Factors: 2. Repressors

[Figure: signal Sx converts X into its active form X*. With X* bound to the binding site of gene Y (bound repressor): no transcription. With the repressor unbound: Y is produced.]

slide-16
SLIDE 16

Equations of gene regulation

  • If X* regulates Y, the net production rate of gene Y is

    \[ \frac{dY}{dt} = \beta\, f(X^*) - \alpha Y \]

  • α – dilution/degradation rate
  • K – activation coefficient [concentration]; related to the affinity
  • β – maximal expression level
  • Step approximation – gene is on (rate β) or off (rate 0) with threshold K:

    \[ f(X^*) = \theta(X^* > K) \ \text{(activator, } X^* \rightarrow Y\text{)}, \qquad f(X^*) = \theta(X^* < K) \ \text{(repressor, } X^* \dashv Y\text{)} \]

[Figures: Y promoter activity (axis marks β/2 and β) vs. activator concentration X*/K and vs. repressor concentration X*/K, each from 0 to 2]
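As a rough illustration of the step approximation, the sketch below integrates dY/dt = β f(X*) − αY with forward Euler for a constant X*. The function names, the time step and the parameter values are assumptions made for this example, not part of the original slides. With the gene switched on, Y should approach the steady state β/α.

```python
def step_activator(x_active, K):
    """Step approximation for an activator: on (1) when X* > K, else off (0)."""
    return 1.0 if x_active > K else 0.0


def step_repressor(x_active, K):
    """Step approximation for a repressor: on (1) when X* < K, else off (0)."""
    return 1.0 if x_active < K else 0.0


def simulate_y(x_active, K, beta, alpha, y0=0.0, dt=0.001, t_end=10.0,
               f=step_activator):
    """Forward-Euler integration of dY/dt = beta * f(X*) - alpha * Y.

    x_active is held constant (hypothetical simplification); returns Y(t_end).
    """
    steps = int(round(t_end / dt))
    y = y0
    for _ in range(steps):
        y += dt * (beta * f(x_active, K) - alpha * y)
    return y


if __name__ == "__main__":
    # Activator above threshold: Y rises towards beta / alpha = 2.0
    print(simulate_y(x_active=2.0, K=1.0, beta=2.0, alpha=1.0, t_end=20.0))
    # Repressor above threshold: production is off, Y stays at 0
    print(simulate_y(x_active=2.0, K=1.0, beta=2.0, alpha=1.0, t_end=20.0,
                     f=step_repressor))
```

A smooth (e.g. Hill-type) input function could be swapped in for `f` without changing the integrator.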

slide-17
SLIDE 17
  • Nodes are proteins (or the genes that encode them)
  • Edges = regulatory relation between two proteins

X Y

The gene regulatory network of E. coli

slide-18
SLIDE 18

Analyzing networks

  • The idea: patterns that occur in the real network much more often than in a randomized network must have functional significance.
  • The randomized networks share the same number of edges and number of nodes, but edges are assigned at random
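The comparison described above can be sketched as follows: count one candidate pattern (here the feedforward loop X→Y, Y→Z, X→Z) in the real network, then count it in randomized networks with the same number of nodes and edges, and report how far the real count lies from the random mean. This is a minimal sketch under that simple null model; the function names, the number of trials and the z-score summary are assumptions for illustration.

```python
import random
from statistics import mean, pstdev


def count_ffl(edges):
    """Count feedforward loops: X->Y, Y->Z and X->Z with X, Y, Z distinct."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
    count = 0
    for x, targets in succ.items():
        for y in targets:
            if y == x:
                continue
            for z in succ.get(y, ()):
                if z != x and z != y and z in targets:
                    count += 1
    return count


def random_edges(n, m, rng):
    """Pick m distinct directed edges (no self-loops) on n nodes at random."""
    all_pairs = [(u, v) for u in range(n) for v in range(n) if u != v]
    return rng.sample(all_pairs, m)


def motif_z_score(n, edges, trials=200, seed=0):
    """Return (N_real, mean of N_rand, std of N_rand, z-score)."""
    rng = random.Random(seed)
    n_real = count_ffl(edges)
    n_rand = [count_ffl(random_edges(n, len(edges), rng)) for _ in range(trials)]
    mu, sd = mean(n_rand), pstdev(n_rand)
    if sd > 0:
        z = (n_real - mu) / sd
    else:
        z = 0.0 if n_real == mu else float("inf")
    return n_real, mu, sd, z


if __name__ == "__main__":
    toy = [(0, 1), (1, 2), (0, 2), (0, 3), (3, 4), (0, 4), (5, 6)]
    print(motif_z_score(7, toy))
```

Enumerating all node pairs in `random_edges` is only practical for small graphs; a larger network would need a sampling scheme that avoids building the full pair list.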

slide-19
SLIDE 19

The known E. Coli transcription network

slide-20
SLIDE 20

A random graph based on the same node statistics

slide-21
SLIDE 21

3-node network motif – the feedforward loop

Nreal = 40, Nrand = 7 ± 3

slide-22
SLIDE 22

Mangan, Alon, PNAS, JMB, 2003

The feedforward loop : a sign sensitive filter

The feedforward loop is a filter for transient signals while allowing fast shutdown

slide-23
SLIDE 23

[Figure: OFF pulse; lacZYA vs. araBAD]

The Feedforward loop : a sign sensitive filter

Mangan, Alon, PNAS, JMB, 2003

slide-24
SLIDE 24

Temporal and expression level program generator

  • The temporal order is encoded in a hierarchy of thresholds
  • The hierarchy of expression levels is encoded in a hierarchy of promoter activities

Single Input Module

[Figure: target genes Z1, Z2, Z3]

slide-26
SLIDE 26

Kalir et al., Science, 2001

Single Input Module motif is responsible for exact timing in the flagella assembly

slide-27
SLIDE 27
  • Shallow network, few long cascades.
  • Modular

The gene regulatory network of E. coli

Shen-Orr et al., Nature Genetics, 2002

Single input modules Feed-forward loops

slide-28
SLIDE 28

Evolution of transcription networks

  • In 1 day, 10^10 copies of E. coli, 10^10 replications of DNA.
  • Mutation rate is 10^-9 per letter per replication
    – 10^10 × 10^-9 = 10 mutations per letter in the population per day
  • Even a single DNA base change in the promoter can change the activation/repression rate
  • Edges can be lost or gained (i.e. selected) easily.
slide-29
SLIDE 29

Links between WebPages – a completely different set of motifs is found

  • WebPages are nodes and Links are directed edges
  • 3 node results:
slide-30
SLIDE 30

Head Sensory Ring Motor Ventral Cord Motor

[White, Brenner 1986; Durbin, Thesis, 1987]

Structure of a nematode neuronal circuitry

slide-31
SLIDE 31

Neurons and transcription share similar motifs

  • C. elegans
slide-32
SLIDE 32
slide-33
SLIDE 33
slide-34
SLIDE 34

Summary

  • The production of proteins in cells is regulated using a complex

regulation network

  • Network motifs: simple building blocks of complex networks
  • An algorithm to identify network motifs
  • Example: the transcription network of E. coli.
  • The feed forward loop as a sign sensitive filter
  • The single input module: exact temporal ordering of protein

expression

slide-35
SLIDE 35

Biological Network Alignment Slides from Johannes Berg, University of Cologne

10 / 14

slide-36
SLIDE 36

Graph Alignment and Biological Networks

Johannes Berg

http://www.uni-koeln.de/˜berg

Institute for Theoretical Physics University of Cologne Germany

– p.1/12

slide-37
SLIDE 37

Networks in molecular biology

New large-scale experimental data in the form of networks:
  • transcription networks
  • protein interaction networks
  • co-regulation networks
  • signal transduction networks, metabolic networks, etc.

– p.2/12

slide-40
SLIDE 40

Networks in molecular biology

New large-scale experimental data in the form of networks: transcription networks
  • transcription factors bind to regulatory DNA
  • polymerase molecule begins transcription of the gene
  • example: sea urchin, Bolouri & Davidson (2001)

– p.2/12

slide-42
SLIDE 42

Networks in molecular biology

New large-scale experimental data in the form of networks: protein interaction networks
  • proteins interact to form larger units
  • protein aggregates may catalyze reactions etc.
  • example: protein interactions in yeast, Uetz et al. (2000)

– p.2/12

slide-44
SLIDE 44

Sequence alignment in molecular biology

  • more than 100 organisms are fully sequenced
  • genome sizes range from 3 × 10^7 to 7 × 10^11 basepairs

Global alignment: search for related sequences across species
  • evolutionary relationships
  • hints at common functionality

– p.3/12

slide-45
SLIDE 45

Sequence alignment in molecular biology

  • more than 100 organisms are fully sequenced
  • genome sizes range from 3 × 10^7 to 7 × 10^11 basepairs

Motif search: search for short repeated subsequences
  • binding sites in transcription control

– p.3/12

slide-47
SLIDE 47

Sequence alignment in molecular biology

  • more than 100 organisms are fully sequenced
  • genome sizes range from 3 × 10^7 to 7 × 10^11 basepairs

Tools
  • statistical models are used to infer non-random correlations against a background
  • build score function from statistical models
  • design efficient algorithms to maximize score
  • evaluate statistical significance of a given score

organism                    number of genes
worm (C. elegans)           19 000
fruit fly (Drosophila)      17 000
human (Homo sapiens)        25 000

– p.3/12

slide-49
SLIDE 49

Graph alignment

What can be learned from network data? Can we distinguish functional patterns from a random background?

  • 1. Search for network motifs [Alon lab]: patterns occurring repeatedly within a given network
  • 2. Alignment of networks across species: identify conserved regions, pinpoint functional innovations

Tools
  • scoring function based on statistical models
  • heuristic algorithms: algorithmic complexity

– p.4/12

slide-53
SLIDE 53

Graph alignment I: The search for network motifs

  • patterns occurring repeatedly in the network
  • building blocks of information processing [Alon lab]
  • counting of identical patterns: subgraph census
  • alignment of topologically similar regions of a network
  • allow for mismatches
  • construct a scoring function comparing the aligned subgraphs to a background model

[Figure: alignment of subgraphs α=1, α=2, α=3]

– p.5/12

slide-55
SLIDE 55

Statistical properties of alignments

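[Figure: alignment of subgraphs α=1, α=2, α=3 with aligned nodes i=1, i=2, ...; c^α_ij denotes the links of subgraph α]

Consensus motif:

\[ \bar{c} = \frac{1}{p} \sum_{\alpha=1}^{p} c^{\alpha} \]

Quantities of interest:
  • number of internal links
  • average correlation between two subgraphs
  • fuzziness of motif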
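To make the consensus motif concrete, the sketch below averages the adjacency matrices of p aligned subgraphs entry by entry. It assumes each subgraph is given as a 0/1 adjacency matrix whose rows and columns follow the alignment order; that representation and the function name are illustrative assumptions.

```python
def consensus_motif(subgraphs):
    """Average p aligned adjacency matrices entry by entry.

    subgraphs: list of p square 0/1 matrices (lists of lists), all of the
    same size, with node i of every subgraph aligned to node i of the others.
    Entry (i, j) of the result is the fraction of subgraphs containing
    the link i -> j.
    """
    p = len(subgraphs)
    n = len(subgraphs[0])
    return [[sum(c[i][j] for c in subgraphs) / p for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    c1 = [[0, 1], [0, 0]]
    c2 = [[0, 1], [1, 0]]
    print(consensus_motif([c1, c2]))
```

Entries close to 0 or 1 indicate links on which the aligned subgraphs agree; intermediate values indicate a fuzzier motif.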

– p.6/12

slide-59
SLIDE 59

Statistics of network motifs

  • null model: ensemble of uncorrelated networks with the same connectivities as the data
  • model describing network motifs: ensemble with
    – enhanced number of links
    – enhanced correlation of subgraphs
  • divergent vs convergent evolution?

Log likelihood score

\[ S(c^1, \ldots, c^p) = \log \frac{Q(c^1, \ldots, c^p)}{\prod_{\alpha=1}^{p} P_{\sigma}(c^{\alpha})} = (\sigma - \sigma_0) \sum_{\alpha=1}^{p} L(c^{\alpha}) - \frac{\mu}{2p} \sum_{\alpha,\beta=1}^{p} M(c^{\alpha}, c^{\beta}) - \log Z \]

Algorithm: Mapping onto a model from statistical mechanics (Potts model)

– p.7/12

slide-61
SLIDE 61

Consensus motif of the E. coli transcription network

µ = µ* = 2.25, µ = 5, µ = 12

[Figure: plots of ⟨c^α⟩ and ⟨c^α c^β⟩, both axes from 0 to 1]

– p.8/12

slide-62
SLIDE 62

Graph alignment II: Comparing networks across species

– p.9/12

slide-63
SLIDE 63

Graph alignment II: Comparing networks across species

Alignment: Pairwise association of nodes across species

– p.9/12

slide-64
SLIDE 64

Graph alignment II: Comparing networks across species

Last common ancestor

– p.9/12

slide-65
SLIDE 65

Graph alignment II: Comparing networks across species

Evolutionary dynamics: Link attachment and deletion

– p.9/12

slide-67
SLIDE 67

Graph alignment II: Comparing networks across species

Representation of the alignment in a single network. Conserved links are shown in green.

– p.9/12

slide-68
SLIDE 68

Scoring graph alignments across species

  • null model P: ensemble of uncorrelated networks with the same connectivities as the data
  • Q-model: correlated networks (due to functional constraints or common ancestry)
  • statistical assessment of orthologs: interplay between sequence similarity and network topology

Scoring alignments: log-likelihood score S = log(Q/P) is used to search for conserved parts of the networks

– p.10/12

slide-70
SLIDE 70

Application to Co-Expression networks

[Figure labels: ribosomal proteins, mitochondrial precursors, myelin proteolipid protein, skeletal muscle proteins]

alignment of H. sapiens and M. musculus

– p.11/12

slide-71
SLIDE 71

Genomic systems biology and network analysis

New concepts and tools are needed to fully utilize high-throughput data
  • functional design versus noise: statistical analysis
  • evolutionary conservation indicates function

Topological conservation versus sequence conservation
  • genes may change functional role in network with small corresponding change in sequence
  • the role of a gene in one species may be taken on by an entirely unrelated gene in another species

References:

  • J. Berg and M. Lässig, "Local graph alignment and motif search in biological networks", Proc. Natl. Acad. Sci. USA 101(41), 14689-14694 (2004)
  • J. Berg, M. Lässig, and A. Wagner, "Structure and Evolution of Protein Interaction Networks: A Statistical Model for Link Dynamics and Gene Duplications", BMC Evolutionary Biology 4:51 (2004)
  • J. Berg, S. Willmann, and M. Lässig, "Adaptive evolution of transcription factor binding sites", BMC Evolutionary Biology 4(1):42 (2004)
  • J. Berg and M. Lässig, "Correlated random networks", Phys. Rev. Lett. 89(22), 228701 (2002)

– p.12/12

slide-72
SLIDE 72

Detecting Signaling Pathways using Color-coding Slides from Hüffner et al., Friedrich-Schiller-Universität Jena

11 / 14

slide-73
SLIDE 73

Signaling Pathways Color-Coding Algorithm Engineering Experiments

Algorithm Engineering for Color-Coding to Facilitate Signaling Pathway Detection

Falk Hüffner, Sebastian Wernicke, Thomas Zichner

Friedrich-Schiller-Universität Jena

Fifth Asia Pacific Bioinformatics Conference, January 17, 2007

  • F. Hüffner et al. (Uni Jena), Algorithm Engineering for Color-Coding to Facilitate Signaling Pathway Detection, 1/22

slide-74
SLIDE 74

Outline

  • 1. Signaling Pathways: Protein Interaction Networks, Signaling Pathways, Graph Model
  • 2. Color-Coding
  • 3. Algorithm Engineering: Worst-case Speedup, Lower Bounds
  • 4. Experiments: Protein Interaction Networks, Simulations

slide-75
SLIDE 75

Protein Interaction Networks

[www.cellsignal.com]

slide-76
SLIDE 76

Protein Interaction Networks

Representation of protein interactions as a graph:
  • Proteins are nodes
  • Interactions are edges
  • Edges are annotated with interaction probability (obtained by two-hybrid screening)

slide-77
SLIDE 77

Signaling Pathways

[www.cellsignal.com]

slide-80
SLIDE 80

Signaling Pathways

Sequence of distinct proteins, where each interacts strongly with the previous one.

Most Probable Path

Input: Graph G = (V, E), interaction probabilities p : E → [0, 1], integer k > 0.
Task: Find a non-overlapping path v_1, ..., v_k of length k in G that maximizes p(v_1, v_2) · ... · p(v_{k−1}, v_k).

Setting w(e) := −log(p(e)):

Minimum-Weight Path

Input: Graph G = (V, E), weights w : E → [0, ∞), integer k > 0.
Task: Find a non-overlapping path v_1, ..., v_k of length k in G that minimizes w(v_1, v_2) + ... + w(v_{k−1}, v_k).
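The step from Most Probable Path to Minimum-Weight Path relies on the fact that the logarithm turns a product into a sum, so maximizing the product of probabilities is the same as minimizing the sum of −log weights. A small sketch of that equivalence follows; the helper names and the toy probabilities are made up for illustration.

```python
import math


def path_probability(path, p):
    """Product of the interaction probabilities along consecutive path edges."""
    prob = 1.0
    for u, v in zip(path, path[1:]):
        prob *= p[(u, v)]
    return prob


def path_weight(path, p):
    """Sum of w(e) = -log(p(e)) along consecutive path edges."""
    return sum(-math.log(p[(u, v)]) for u, v in zip(path, path[1:]))


if __name__ == "__main__":
    # Hypothetical interaction probabilities on a tiny graph
    p = {("a", "b"): 0.9, ("b", "c"): 0.5, ("a", "d"): 0.6, ("d", "c"): 0.6}
    for path in (["a", "b", "c"], ["a", "d", "c"]):
        print(path, path_probability(path, p), path_weight(path, p))
```

The path with the larger probability always has the smaller weight, because exp(−weight) equals the probability.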

slide-81
SLIDE 81

Yeast Network

4 400 proteins, 14 300 interactions, looking for paths of length 5–15

slide-82
SLIDE 82

Minimum-Weight Path

Theorem

Minimum-Weight Path is NP-hard [Garey & Johnson 1979]. For an exact algorithm, we have to accept exponential runtime.

Idea

Exploit the fact that the paths sought are rather short (≈ 5–15): restrict the exponential part of the runtime to k (parameterized complexity).

slide-83
SLIDE 83

Color-Coding

Color-coding [Alon, Yuster & Zwick, J. ACM 1995]:
  • randomly color each vertex of the graph with one of k colors
  • hope that all vertices in the subgraph searched for obtain different colors (colorful)
  • solve the Minimum-Weight Path under this assumption (which is much quicker)
  • repeat until it is reasonably certain that the path was colorful at least once

Result: exponential part of the runtime depends only on k

slide-85
SLIDE 85

Dynamic Programming for Minimum-Weight Colorful Path

Idea

Table entry W[v, C] stores the weight of the minimum-weight path that ends in v and uses exactly the colors in C.

[Figure: example graph on vertices A to E with edge weights 1 to 8; for a set of three colors, W[B, {·, ·, ·}] = 4]

slide-87
SLIDE 87

Dynamic Programming for Minimum-Weight Colorful Path

Coloring c : V → {1, ..., k}

Recurrence

\[ W[v, C] = \min_{u \in N(v),\; c(u) \in C \setminus \{c(v)\}} \bigl( W[u, C \setminus \{c(v)\}] + w(u, v) \bigr) \]

  • Each table entry can be calculated in O(n) time
  • n · 2^k table entries
  • Runtime: O(n · n · 2^k) = O(n^2 · 2^k)
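The recurrence can be sketched in a few lines of Python. The version below stores color sets as bitmasks, uses colors 0..k−1 instead of 1..k, and wraps the dynamic program in a loop over random colorings. The function names, the edge-dictionary input format and the fixed number of trials are assumptions for this illustration; it is not the engineered implementation evaluated in the slides.

```python
import random


def min_weight_colorful_path(n, edges, k, coloring):
    """Minimum weight of a path on k vertices with pairwise distinct colors.

    edges: dict {(u, v): weight} for an undirected graph on vertices 0..n-1.
    coloring: list with coloring[v] in 0..k-1.
    Returns None if no colorful path on k vertices exists.
    """
    adj = [[] for _ in range(n)]
    for (u, v), w in edges.items():
        adj[u].append((v, w))
        adj[v].append((u, w))
    inf = float("inf")
    full = (1 << k) - 1
    # W[v][C]: min weight of a path ending in v using exactly the colors in C
    W = [[inf] * (1 << k) for _ in range(n)]
    for v in range(n):
        W[v][1 << coloring[v]] = 0.0
    # C without c(v) is numerically smaller than C, so increasing order works
    for C in range(1, 1 << k):
        for v in range(n):
            bit = 1 << coloring[v]
            if not C & bit or C == bit:
                continue
            prev = C ^ bit
            W[v][C] = min((W[u][prev] + w for u, w in adj[v]), default=inf)
    best = min(W[v][full] for v in range(n))
    return None if best == inf else best


def color_coding_min_path(n, edges, k, trials, seed=0):
    """Repeat the colorful-path search over random colorings; keep the best."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        coloring = [rng.randrange(k) for _ in range(n)]
        w = min_weight_colorful_path(n, edges, k, coloring)
        if w is not None and (best is None or w < best):
            best = w
    return best


if __name__ == "__main__":
    edges = {(0, 1): 1.0, (1, 2): 2.0, (2, 3): 3.0}
    print(color_coding_min_path(4, edges, k=3, trials=200))
```

Entries whose color set does not contain c(v) are never filled in, which is how the condition c(u) ∈ C \ {c(v)} is enforced: such predecessor entries stay at infinity.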

slide-89
SLIDE 89

Color-coding Runtime

  • O(n^2 · 2^k) time per trial
  • To obtain error probability ε, one needs O(|ln ε| · e^k) trials

Theorem ([Alon et al. JACM 1995])

Minimum-Weight Path can be solved in O(|ln ε| · 5.44^k · |G|) time.

Color-coding can find minimum-weight paths of length 10 in the yeast protein interaction network within 3 hours (n = 4 400, k = 10) [Scott et al., RECOMB'05]

slide-93
SLIDE 93

Increasing the Number of Colors

Idea

Use k + x colors instead of k colors.
  • Trial runtime: O(2^k · |G|) → O(2^(k+x) · |G|)
  • Probability Pc for colorful path and number of trials (k = 8, ε = 0.001):

    x        0       1       2       3       4       5
    Pc       0.0024  0.0084  0.0181  0.0310  0.0464  0.0636
    trials   2871    816     378     220     146     106

Theorem

Minimum-Weight Path can be solved in O(|ln ε| · 4.32^k · |G|) time by choosing x = 0.3k.

But: Higher memory usage
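The table values can be reproduced with a short calculation. Assuming each vertex is colored independently and uniformly with one of k + x colors, a fixed set of k vertices is colorful with probability Pc = (k+x)! / (x! · (k+x)^k), and the number of trials t needed for error probability ε is the smallest t with (1 − Pc)^t ≤ ε. The function names below are chosen for this sketch.

```python
import math


def colorful_probability(k, x=0):
    """Probability that k fixed vertices get pairwise distinct colors
    when every vertex is colored uniformly at random with k + x colors."""
    c = k + x
    return math.factorial(c) / (math.factorial(x) * c ** k)


def trials_needed(k, x=0, eps=0.001):
    """Smallest number of trials t such that (1 - Pc)^t <= eps."""
    p = colorful_probability(k, x)
    return math.ceil(math.log(eps) / math.log(1.0 - p))


if __name__ == "__main__":
    for x in range(6):
        print(x, round(colorful_probability(8, x), 4), trials_needed(8, x))
```

For k = 8 this gives 2871 trials with x = 0 and 816 trials with x = 1, matching the first two columns of the table.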

slide-94
SLIDE 94

Increasing the Number of Colors

[Figure: running time in seconds (log scale, 1 to 10^3) vs. number of colors (6 to 22), one curve for each k = 5, ..., 12]

Runtimes for the yeast protein interaction network (highlighted point of each curve marks worst-case optimum)

slide-96
SLIDE 96

Exploiting Lower Bounds

Idea

Use a known solution to prune "hopeless" table entries.
  • Discard entries that already have a weight higher than the known solution.
  • Discard entries when weight + (minimum edge weight · edges left) is higher than the weight of the known solution.

slide-97
SLIDE 97

Precalculated Lower Bounds

For each vertex u and a range of lengths 1 ≤ i ≤ d, determine the minimum weight of a path of i edges that starts at u.

slide-98
SLIDE 98

Lower Bounds Experiments

[Figure: running time in seconds (log scale, 1 to 10^4) vs. path length (4 to 20), one curve for each d = 0, 1, 2, 3]

slide-99
SLIDE 99

Yeast Network

[Figure: running time in seconds (log scale, 1 to 10^5) vs. path length (4 to 22); curves: YEAST, Scott et al. (adjusted) and YEAST, this work]

slide-101
SLIDE 101

Network Comparison

network       |V|      |E|       clust. coeff.   avg. degree   max. degree
YEAST         4 389    14 319    0.067           6.5           237
DROSOPHILA    7 009    20 440    0.030           5.8           175

[Figure: running time in seconds (log scale, 1 to 10^5) vs. path length (4 to 22); curves: DROSOPHILA, 20 best paths; DROSOPHILA, 100 best paths; YEAST, 20 best paths; YEAST, 100 best paths]

slide-102
SLIDE 102

Simulations: Robustness of Algorithm

[Figure: four panels of running time in seconds (log scale, 10^-1 to 10^3): vs. number of vertices (2000 to 10000), vs. clustering coefficient (0.2 to 1), and vs. value of α, each for k = 5, 10, 15; and vs. path length (5 to 15) for the uniform distribution, the YEAST distribution, and the YEAST distribution regarding degree]

  • F. H¨

uffner et al. (Uni Jena) Algorithm Engineering for Color-Coding to Facilitate Signaling Pathway Detection 20/22

slide-103
SLIDE 103

Signaling Pathways Color-Coding Algorithm Engineering Experiments

Conclusion & Outlook

Color-coding, with some algorithm engineering, is a practical and reliable method for finding signaling pathways in protein interaction networks.


slide-104
SLIDE 104

Signaling Pathways Color-Coding Algorithm Engineering Experiments

Conclusion & Outlook

Color-coding, with some algorithm engineering, is a practical and reliable method for finding signaling pathways in protein interaction networks.

Future work:

  • Pathway queries
  • Richer motifs (cycles, trees, ...)
  • Derandomization


slide-105
SLIDE 105

Today’s Biz

  • 1. Reminders
  • 2. Review
  • 3. Biological Network Analysis Topics
  • 4. Hybrid processing - direction optimizing push/pull
  • 5. Assignment 2 solutions

12 / 14

slide-106
SLIDE 106

Direction-optimizing BFS

Slides from Yasui et al., Chuo University & JST CREST and Intel

13 / 14

slide-107
SLIDE 107

Outline

  • 1. Background
  • 2. Breadth-first Search (BFS)
  • 3. NUMA architecture
  • 4. Proposal: NUMA-optimized parallel BFS
  • 5. Numerical Results

slide-108
SLIDE 108

Breadth-first Search (BFS)

  • Obtains the level of each vertex from the source vertex
  • Level = number of hops away from the source

Input: graph G and source
Output: tree rooted at the source

[Figure: BFS from the source, with vertices grouped into Level 1, Level 2, and Level 3]
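The level and tree outputs described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the function name `bfs_levels`, the adjacency-dict representation, and the toy graph are all assumptions made for the example.

```python
from collections import deque

def bfs_levels(adj, source):
    """Return (level, parent) dicts for every vertex reachable from source.

    adj is assumed to map each vertex to a list of its neighbors.
    """
    level = {source: 0}        # hops from the source
    parent = {source: source}  # BFS tree: the root is its own parent
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in level:           # first visit fixes the level
                level[v] = level[u] + 1
                parent[v] = u
                queue.append(v)
    return level, parent

# Hypothetical toy graph (undirected, stored as symmetric adjacency lists).
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
level, parent = bfs_levels(adj, 0)
print(level)
print(parent)
```

The `parent` map encodes the output tree: following parents from any reached vertex leads back to the source in exactly `level[v]` hops.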

slide-109
SLIDE 109

Hybrid BFS for low-diameter graphs

  • Efficient for low-diameter graphs

– scale-free and/or small-world property, such as social networks

  • Used at the higher ranks of the Graph500 benchmark
  • Hybrid algorithm [Beamer 2011, 2012]

– combines the top-down algorithm and the bottom-up algorithm
– reduces unnecessary edge traversal

  • Top-down algorithm: efficient for a small frontier (frontier < neighbors)
  • Bottom-up algorithm: efficient for a large frontier (frontier > neighbors)

[Figure: frontier at Level k and neighbors at Level k+1, drawn once for the top-down direction and once for the bottom-up direction, with a switch between the two]

slide-110
SLIDE 110

Top-down algorithm

  • Explores outgoing edges of the frontier queue QF
  • Appends unvisited vertices to the neighbor queue QN

[Figure: QF holds the source (Level 0); QN holds Level 1]

slide-111
SLIDE 111

Top-down algorithm

  • Explores outgoing edges of the frontier queue QF
  • Appends unvisited vertices to the neighbor queue QN

[Figure: first step with QF = source (Level 0) and QN = Level 1; second step with QF = Level 1 and QN = Level 2, with unnecessary edge traversals marked]

slide-112
SLIDE 112

Top-down algorithm

  • Explores outgoing edges of the frontier queue QF
  • Appends unvisited vertices to the neighbor queue QN
  • Efficient for a small frontier
  • Performs unnecessary edge traversals for a large frontier

[Figure: three steps, Level 0 to Level 1, Level 1 to Level 2, and Level 2 to Level 3, each showing QF and QN, with unnecessary edge traversals marked]
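One top-down step, and the wasted edge checks it incurs once the frontier grows, can be sketched as follows. This is an illustrative sketch only: the function name `top_down_step`, the list-of-lists adjacency format, the `parent` array convention (`None` means unvisited), and the toy graph are assumptions, not taken from the slides.

```python
def top_down_step(adj, frontier, parent):
    """Expand one BFS level from the frontier queue (QF).

    Returns the neighbor queue (QN) and the number of edges checked.
    """
    next_frontier = []
    edges_checked = 0
    for u in frontier:
        for v in adj[u]:            # every outgoing edge of the frontier
            edges_checked += 1
            if parent[v] is None:   # unvisited: claim it for the next level
                parent[v] = u
                next_frontier.append(v)
    return next_frontier, edges_checked

# Hypothetical toy graph: vertices 0-3 form a clique, vertex 4 hangs off 3.
adj = [[1, 2, 3], [0, 2, 3], [0, 1, 3], [0, 1, 2, 4], [3]]
parent = [None] * len(adj)
parent[0] = 0                       # vertex 0 is the source

level1, checked1 = top_down_step(adj, [0], parent)
level2, checked2 = top_down_step(adj, level1, parent)
print(level1, checked1)
print(level2, checked2)
```

In the second step the frontier holds three vertices and ten edges are checked to discover a single new vertex. The other nine checks land on already-visited vertices, which is the unnecessary traversal the slide refers to.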

slide-113
SLIDE 113

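Bottom-up algorithm

  • Explores edges from the unvisited vertices to find the frontier queue QF
  • Appends the unvisited vertices adjacent to QF to the neighbor queue QN

[Figure: unvisited vertices search for the source in QF; Level 1 vertices enter QN, with unnecessary edge traversals marked]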

slide-114
SLIDE 114

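Bottom-up algorithm

  • Explores edges from the unvisited vertices to find the frontier queue QF
  • Appends the unvisited vertices adjacent to QF to the neighbor queue QN

[Figure: first step with QF = source and QN = Level 1, with unnecessary edge traversals marked; second step with QF = Level 1 and QN = Level 2, searched from the remaining unvisited vertices]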

slide-115
SLIDE 115

Bottom-up algorithm

  • Explores edges from the unvisited vertices to find the frontier queue QF
  • Appends the unvisited vertices adjacent to QF to the neighbor queue QN
  • Efficient for a large frontier
  • Performs unnecessary edge traversals for a small frontier

[Figure: three steps, source to Level 1, Level 1 to Level 2, and Level 2 to Level 3, each showing QF, QN, and the remaining unvisited vertices, with unnecessary edge traversals marked in the first step]
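A bottom-up step can be sketched in the same style. Each unvisited vertex scans its own edges and stops at the first neighbor found in the frontier. As before, this is a hedged illustration: `bottom_up_step`, the boolean `in_frontier` array, and the toy graph are assumptions for the example, and the graph is assumed undirected so that a vertex's adjacency list also gives its incoming edges.

```python
def bottom_up_step(adj, in_frontier, parent):
    """Expand one BFS level by searching from the unvisited vertices.

    in_frontier[u] is True when u is in the frontier queue (QF).
    Returns a boolean array marking the neighbor queue (QN) and the
    number of edges checked.
    """
    n = len(adj)
    next_frontier = [False] * n
    edges_checked = 0
    for v in range(n):
        if parent[v] is not None:   # already visited
            continue
        for u in adj[v]:
            edges_checked += 1
            if in_frontier[u]:      # found a parent: stop scanning early
                parent[v] = u
                next_frontier[v] = True
                break
    return next_frontier, edges_checked

# Same hypothetical toy graph: a clique on 0-3 with vertex 4 attached to 3.
adj = [[1, 2, 3], [0, 2, 3], [0, 1, 3], [0, 1, 2, 4], [3]]

# Large frontier: Level 1 = {1, 2, 3}, only vertex 4 is still unvisited.
parent = [0, 0, 0, 0, None]
in_frontier = [False, True, True, True, False]
next_frontier, checked = bottom_up_step(adj, in_frontier, parent)
print(next_frontier, checked)

# Small frontier: only the source, so vertex 4 scans its edge in vain.
parent0 = [0, None, None, None, None]
_, checked0 = bottom_up_step(adj, [True, False, False, False, False], parent0)
print(checked0)
```

With the large frontier, one edge check suffices where the top-down direction would check ten on this graph. With only the source in the frontier, the bottom-up step checks four edges against three for top-down, which is the small-frontier penalty.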

slide-116
SLIDE 116

Hybrid BFS combines top-down and bottom-up

[Figure: frontier at Level k and neighbors at Level k+1, drawn for the top-down algorithm and the bottom-up algorithm, with a switch between the two]

  • Traversed edges of a Kronecker graph (SCALE 26)

Level     Top-down (mF)   Bottom-up (mB)   Hybrid (min(mF, mB))
0                     2    2,103,840,895                      2
1                66,206    1,766,587,029                 66,206
2           346,918,235       52,677,691             52,677,691
3         1,727,195,615       12,820,854             12,820,854
4            29,557,400          103,184                103,184
5                82,357           21,467                 21,467
6                   221           21,240                    227
Total     2,103,820,036    3,936,072,360             65,689,631
Ratio           100.00%          187.09%                  3.12%

The hybrid run switches direction twice: from top-down to bottom-up after level 1, and back to top-down after level 5.
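The per-level switch can be sketched as a complete hybrid BFS. This is a simplified illustration, not the implementation from the slides or from [Beamer2011, 2012]. The switching rule (go bottom-up when the frontier's edge count `m_f` exceeds the unvisited vertices' edge count `m_u` divided by `alpha`) and the `alpha` default are assumptions chosen for the example, and the graph is assumed undirected.

```python
def hybrid_bfs(adj, source, alpha=1.0):
    """Level-synchronous BFS that picks a direction at every level.

    Returns (level, directions), where directions records the choice
    made for each expanded level. alpha is an illustrative threshold.
    """
    n = len(adj)
    level = [None] * n
    level[source] = 0
    frontier = [source]
    directions = []
    depth = 0
    while frontier:
        # Edges out of the frontier vs. edges incident to unvisited vertices.
        m_f = sum(len(adj[u]) for u in frontier)
        m_u = sum(len(adj[v]) for v in range(n) if level[v] is None)
        if m_u == 0:                # nothing reachable is left
            break
        depth += 1
        next_frontier = []
        if m_f > m_u / alpha:
            directions.append("bottom-up")
            in_frontier = set(frontier)
            for v in range(n):
                # any() stops at the first neighbor found in the frontier.
                if level[v] is None and any(u in in_frontier for u in adj[v]):
                    level[v] = depth
                    next_frontier.append(v)
        else:
            directions.append("top-down")
            for u in frontier:
                for v in adj[u]:
                    if level[v] is None:
                        level[v] = depth
                        next_frontier.append(v)
        frontier = next_frontier
    return level, directions

# Hypothetical toy graph: a clique on 0-3 with vertex 4 attached to 3.
adj = [[1, 2, 3], [0, 2, 3], [0, 1, 3], [0, 1, 2, 4], [3]]
level, directions = hybrid_bfs(adj, 0)
print(level)
print(directions)
```

Recomputing `m_u` by scanning all vertices keeps the sketch short. A practical implementation would maintain that count incrementally as vertices are visited.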

slide-117
SLIDE 117

Today’s Biz

  • 1. Reminders
  • 2. Review
  • 3. Biological Network Analysis Topics
  • 4. Hybrid processing - direction optimizing push/pull
  • 5. Assignment 2 solutions

14 / 14