Nick Hamilton Institute for Molecular Bioscience Essential Graph - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Essential Graph Theory for Biologists
Nick Hamilton, Institute for Molecular Bioscience

Image: Matt Moores, The Visible Cell

slide-2
SLIDE 2

Outline

  • Core definitions
  • Which are the most important bits?
  • What happens when I break it? Robustness
  • What are the functional modules?
  • Are there functional modules?
  • Getting around in a graph
  • Graph algorithms
  • Trees & hierarchical structure
  • Small world and scale free graphs
  • Software
slide-3
SLIDE 3

Core Definitions

A graph is a collection of nodes (or vertices) and a set of edges that connect pairs of nodes.
Edges may be undirected or directed, or be loops.
A graph might have multiple disconnected components.

[Figure: a graph with 3 components]
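As a minimal sketch of these definitions, the Python below stores an undirected graph as an adjacency dictionary and counts its components with a breadth-first search. The node names and edges are made up for illustration, and the function name `connected_components` is an assumption of this sketch rather than anything from the slides.

```python
from collections import deque

def connected_components(nodes, edges):
    """Return the connected components of an undirected graph as a list of sets."""
    adjacency = {node: set() for node in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)  # undirected: record the edge in both directions

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbour in adjacency[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components

# Hypothetical graph: a chain a-b-c, a pair d-e, and an isolated node f.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("d", "e")]
components = connected_components(nodes, edges)
print(len(components))
```

An isolated node counts as a component of its own, which is why the node list is passed separately from the edge list.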

slide-4
SLIDE 4

A simple example

Nodes: people in this room. Edges: “are friends” (undirected)
Nodes: people in this room. Edges: “likes” (directed)

slide-5
SLIDE 5

Which graph bit is the most important?

For an undirected graph, the degree of a node is the number of edges connected to a node

[Figure: example nodes of degree 6 and degree 0]

If the graph is directed, in‐degree and out‐degree are defined similarly

[Figure: example node with in‐degree 2 and out‐degree 4]
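The degree definitions can be sketched in a few lines of Python. The people and relationships below are hypothetical, and the function names are assumptions of this sketch.

```python
from collections import Counter

def degrees(edges):
    """Degree of each node in an undirected graph given as a list of pairs."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree

def in_out_degrees(directed_edges):
    """In-degree and out-degree of each node; edges are (source, target) pairs."""
    in_degree, out_degree = Counter(), Counter()
    for source, target in directed_edges:
        out_degree[source] += 1
        in_degree[target] += 1
    return in_degree, out_degree

# Undirected "are friends" edges (hypothetical people).
friends = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"), ("ann", "dan")]
degree = degrees(friends)

# Directed "likes" edges: (x, y) means x likes y.
likes = [("ann", "bob"), ("bob", "ann"), ("cat", "ann"), ("ann", "dan")]
in_degree, out_degree = in_out_degrees(likes)

print(degree["ann"], in_degree["ann"], out_degree["ann"])
```

A `Counter` returns 0 for a node it has never seen, so a person with no friends in the room (say `degree["eve"]`) correctly gets degree 0.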

slide-6
SLIDE 6

Which graph bit is the most important?

A hub node is a node of relatively “high” degree
The inevitable example: the p53 protein interaction network

P53: crucial for cell cycle and apoptosis
Image: Dartnell et al, FEBS Letters 579, 2005
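Since “high” is relative, any hub detector needs a threshold. The sketch below flags nodes whose degree is at least twice the mean degree. That cut-off is purely an illustrative assumption, not a standard definition, and the network is a made-up star-like graph rather than real p53 data.

```python
from collections import Counter

def find_hubs(edges, factor=2.0):
    """Return nodes whose degree is at least `factor` times the mean degree.

    The threshold rule is an assumption made for illustration only.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    mean_degree = sum(degree.values()) / len(degree)
    return {node for node, d in degree.items() if d >= factor * mean_degree}

# Hypothetical network: "hub" touches every other node.
edges = [("hub", "n1"), ("hub", "n2"), ("hub", "n3"), ("hub", "n4"), ("n1", "n2")]
hubs = find_hubs(edges)
print(hubs)
```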

slide-7
SLIDE 7

Importance: What happens if I break it?

Node Deletion. Take the graph and delete a node and all its edges.
Node separation set: a subset of nodes whose deletion causes the number of components in the graph to increase
Mutations reducing p53 activity are present in over 50% of human tumours! (Haupt et al. 2003)
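A minimal sketch of node deletion and the separation-set test follows: delete the chosen nodes with all their edges, then compare component counts before and after. The graph and all function names are hypothetical.

```python
def count_components(nodes, edges):
    """Count connected components with a simple depth-first search."""
    adjacency = {n: set() for n in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, count = set(), 0
    for start in nodes:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            current = stack.pop()
            for neighbour in adjacency[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
    return count

def delete_nodes(nodes, edges, removed):
    """Delete the given nodes together with every edge that touches them."""
    removed = set(removed)
    kept_nodes = [n for n in nodes if n not in removed]
    kept_edges = [(u, v) for u, v in edges if u not in removed and v not in removed]
    return kept_nodes, kept_edges

def is_separation_set(nodes, edges, removed):
    """True if deleting `removed` increases the number of components."""
    before = count_components(nodes, edges)
    after = count_components(*delete_nodes(nodes, edges, removed))
    return after > before

# Hypothetical graph: a triangle a-b-hub, with c and d hanging off the hub.
nodes = ["a", "b", "hub", "c", "d"]
edges = [("a", "b"), ("a", "hub"), ("b", "hub"), ("hub", "c"), ("hub", "d")]
print(is_separation_set(nodes, edges, {"hub"}), is_separation_set(nodes, edges, {"a"}))
```

Deleting the hub splits this graph into three pieces, while deleting `a` leaves it in one piece.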

slide-8
SLIDE 8

Importance: What happens if I break it?

Edge Deletion. Delete an edge (but not the nodes it joins)
Cut set: as for node separation set, but deleting edges
Network Robustness: how hard is it to break the network?
Delete a random node or edge: is it still connected?
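One simple way to put a number on the “delete a random edge” question is the fraction of single-edge deletions that leave the graph connected. The sketch below enumerates every edge instead of sampling, so the result is exact for small graphs. This measure, the function names, and the example graph are all assumptions for illustration; many other robustness measures exist.

```python
def is_connected(nodes, edges):
    """True if every node is reachable from the first one (undirected graph)."""
    if not nodes:
        return True
    adjacency = {n: set() for n in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        current = stack.pop()
        for neighbour in adjacency[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return len(seen) == len(nodes)

def edge_deletion_robustness(nodes, edges):
    """Fraction of single-edge deletions that leave the graph connected."""
    survivors = 0
    for i in range(len(edges)):
        remaining = edges[:i] + edges[i + 1:]  # delete edge i, keep its nodes
        if is_connected(nodes, remaining):
            survivors += 1
    return survivors / len(edges)

# Hypothetical graph: triangle a-b-c plus a pendant node d attached to c.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
robustness = edge_deletion_robustness(nodes, edges)
print(robustness)
```

Here only the edge c-d is a one-edge cut set: removing any triangle edge leaves an alternative route.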

slide-9
SLIDE 9

What are the (functional) modules?

Components. But what about:

[Figure: two groups of nodes labelled Mathematicians and Biologists]

  • Clique. A subset of nodes, each pair joined by an edge

A maximal clique is contained in no larger clique
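The clique definitions translate almost word for word into code. This sketch checks whether a subset is a clique and whether it is maximal, using a small made-up graph. It makes no attempt to find cliques efficiently.

```python
from itertools import combinations

def build_adjacency(edges):
    """Adjacency sets for an undirected graph given as a list of pairs."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency

def is_clique(adjacency, subset):
    """True if every pair of nodes in `subset` is joined by an edge."""
    return all(v in adjacency.get(u, set()) for u, v in combinations(subset, 2))

def is_maximal_clique(adjacency, subset):
    """True if `subset` is a clique and no other node is adjacent to all of it."""
    subset = set(subset)
    if not is_clique(adjacency, subset):
        return False
    for candidate in adjacency:
        if candidate in subset:
            continue
        if all(candidate in adjacency.get(member, set()) for member in subset):
            return False  # `candidate` could be added, so a larger clique exists
    return True

# Hypothetical graph: triangle a-b-c with d attached to c.
adjacency = build_adjacency([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")])
print(is_clique(adjacency, {"a", "b", "c"}), is_maximal_clique(adjacency, {"a", "b"}))
```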

slide-10
SLIDE 10

What are the (functional) modules?

e‐Near Clique. A subset of nodes such that a fraction e of the pairs of nodes have an edge between them

[Figure: a 10/15 near clique and a 3‐clique]

Co‐Clique. A subset of nodes, no two joined by an edge

[Figure: green nodes are a co‐clique]
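A short sketch of both definitions: count how many pairs in a subset are joined, which gives the near-clique fraction, and call the subset a co-clique when that count is zero. The six-node graph is hypothetical and was chosen only so that the fraction comes out as 10/15, matching the label on the slide's figure.

```python
from itertools import combinations

def edge_fraction(edges, subset):
    """Return (pairs joined by an edge, total pairs) for the nodes in `subset`."""
    edge_set = {frozenset(edge) for edge in edges}  # undirected: ignore order
    pairs = list(combinations(subset, 2))
    present = sum(1 for pair in pairs if frozenset(pair) in edge_set)
    return present, len(pairs)

def is_co_clique(edges, subset):
    """True if no two nodes in `subset` are joined by an edge."""
    present, _ = edge_fraction(edges, subset)
    return present == 0

# Hypothetical graph on nodes 0..5: keep the first 10 of the 15 possible edges.
all_nodes = range(6)
edges = list(combinations(all_nodes, 2))[:10]

present, possible = edge_fraction(edges, all_nodes)
print(f"{present}/{possible} near clique")
print(is_co_clique(edges, {3, 4, 5}))
```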

slide-11
SLIDE 11

Are there modules? ‐ Clustering Coefficient

How do we tell if a node u is in a cluster?

[Figure: two example nodes u, one with Cu = 8/21 and one with Cu = 0]

Why? Lots of triangles on the node, i.e. mutual connection

For a node u of degree k, where there are e edges between neighbours of u, define the cluster coefficient Cu as:

Cu = e / [k(k‐1)/2] = (# triangles on u) / (maximum possible # triangles on u)

For a graph, then define the average cluster coefficient as the mean of Cu over all nodes u
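The formula Cu = e / [k(k-1)/2] can be checked with a small sketch. The graph below is invented so that u has k = 7 neighbours with e = 8 edges among them, giving the 8/21 shown on the slide. Returning 0 for nodes with fewer than two neighbours is a convention assumed here, since the formula is undefined in that case.

```python
from fractions import Fraction
from itertools import combinations

def build_adjacency(edges):
    """Adjacency sets for an undirected graph given as a list of pairs."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency

def cluster_coefficient(adjacency, u):
    """Cu = e / [k(k-1)/2], with e the number of edges between neighbours of u."""
    neighbours = adjacency[u]
    k = len(neighbours)
    if k < 2:
        return Fraction(0)  # assumed convention: no neighbour pairs, so Cu = 0
    e = sum(1 for a, b in combinations(neighbours, 2) if b in adjacency[a])
    return Fraction(e, k * (k - 1) // 2)

def average_cluster_coefficient(adjacency):
    """Mean of Cu over all nodes in the graph."""
    return sum(cluster_coefficient(adjacency, u) for u in adjacency) / len(adjacency)

# Hypothetical node u with 7 neighbours and 8 edges between those neighbours.
spokes = [("u", f"n{i}") for i in range(7)]
between = [("n0", "n1"), ("n1", "n2"), ("n2", "n3"), ("n3", "n4"),
           ("n4", "n5"), ("n5", "n6"), ("n6", "n0"), ("n0", "n2")]
adjacency = build_adjacency(spokes + between)
c_u = cluster_coefficient(adjacency, "u")

# A star has no edges between the centre's neighbours, so Cu = 0 at the centre.
star = build_adjacency([("hub", f"leaf{i}") for i in range(4)])
c_star = cluster_coefficient(star, "hub")
print(c_u, c_star)
```

Exact fractions are used instead of floats so the result can be compared directly with the 8/21 on the slide.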

slide-12
SLIDE 12

Getting around in a Graph

  • Path. A “walk” through the graph with no repeated edges

[Figure: graph on nodes a, b, c, d with the path a-c-d highlighted]

  • Cycle. A path that begins and ends at the same node

[Figure: graph on nodes a, b, c, d with the cycle a-b-c-a highlighted]

  • Connected. There is a path between any two nodes
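Paths and connectivity can be sketched with a breadth-first search. The four-node graph below is an assumption: it is inferred from the example path a-c-d and cycle a-b-c-a on the slide, since the exact edges of the figure are not given in the text.

```python
from collections import deque

def find_path(adjacency, start, goal):
    """Breadth-first search; returns a shortest path as a list of nodes, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk back through the recorded predecessors to rebuild the path.
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbour in sorted(adjacency[current]):
            if neighbour not in previous:
                previous[neighbour] = current
                queue.append(neighbour)
    return None

def is_connected(adjacency):
    """True if there is a path from the first node to every other node."""
    nodes = list(adjacency)
    return all(find_path(adjacency, nodes[0], node) is not None for node in nodes)

# Assumed edges: a-b, a-c, b-c, c-d.
adjacency = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
path = find_path(adjacency, "a", "d")
print("-".join(path))
```

In an undirected graph, reaching every node from one starting node is enough to guarantee a path between any two nodes.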
slide-13
SLIDE 13

For instance, Metabolic Pathways

http://www.genome.jp/kegg/pathway/map/map00260.html

slide-14
SLIDE 14

Path Example: Shotgun sequence reconstruction

[Figure: an original sequence and the fragments a, b, c, d, e, f, g obtained from it]

Construct overlap graph
nodes: sequence fragments
edges: the tail of one fragment overlaps the head of another

[Figure: overlap graph on the fragments a to g]

Warning: the above ignores all the awful details: sequencing errors, repeats, …
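A toy version of the overlap graph can be built by comparing every fragment's tail with every other fragment's head. The fragment sequences, the `min_overlap` cut-off, and the function names are all hypothetical. Like the slide, this sketch ignores sequencing errors and repeats.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Overlaps shorter than `min_overlap` are treated as no overlap (returns 0).
    """
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return size
    return 0

def overlap_graph(fragments, min_overlap=3):
    """Directed graph as {(from_name, to_name): overlap length}."""
    graph = {}
    for a_name, a_seq in fragments.items():
        for b_name, b_seq in fragments.items():
            if a_name == b_name:
                continue
            size = overlap_length(a_seq, b_seq, min_overlap)
            if size:
                graph[(a_name, b_name)] = size
    return graph

# Hypothetical fragments of the sequence ATGGCGTACCTTG.
fragments = {"a": "ATGGCGT", "b": "GCGTACC", "c": "TACCTTG"}
graph = overlap_graph(fragments)
print(graph)
```

The edges are directed because “tail of one overlaps head of another” is not symmetric: a leads to b here, but b does not lead back to a.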

slide-15
SLIDE 15

Hamiltonian (no relation) Paths

[Figure: the original sequence and its fragments a to g]

Hamiltonian Path: Visits every node exactly once

[Figure: a Hamiltonian path through the overlap graph of fragments a to g]
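For a handful of nodes, Hamiltonian paths can be found by brute force: try every ordering of the nodes and keep those where each consecutive pair is joined by an edge. This is a sketch for tiny graphs only, since the number of orderings grows factorially. The example graph is made up.

```python
from itertools import permutations

def hamiltonian_paths(nodes, directed_edges):
    """All node orderings that follow edges and visit every node exactly once."""
    edge_set = set(directed_edges)
    paths = []
    for order in permutations(nodes):
        # zip pairs each node with its successor in this candidate ordering
        if all((u, v) in edge_set for u, v in zip(order, order[1:])):
            paths.append(order)
    return paths

# Hypothetical directed overlap graph on four fragments.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]
paths = hamiltonian_paths(nodes, edges)
print(paths)
```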

slide-16
SLIDE 16

Edge Weights

But there might be multiple Hamiltonian paths. Which is “best”?

[Figure: two Hamiltonian paths through a weighted graph, one with total weight 11 and one with total weight 15]

Use edge weights: amount of overlap between fragments
More overlap means a shorter combined sequence: better
In fact this is just the “famous” travelling salesman problem
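With edge weights, the brute-force search simply keeps the Hamiltonian path with the largest total overlap. The weights below are hypothetical, chosen so that the two possible paths total 11 and 15 like the figure. The exhaustive search is only a sketch and is not practical beyond a few nodes.

```python
from itertools import permutations

def best_hamiltonian_path(nodes, weights):
    """Return (path, total weight) maximising total overlap, or (None, -1)."""
    best_order, best_total = None, -1
    for order in permutations(nodes):
        steps = list(zip(order, order[1:]))
        if all(step in weights for step in steps):
            total = sum(weights[step] for step in steps)
            if total > best_total:
                best_order, best_total = order, total
    return best_order, best_total

# Hypothetical directed overlap graph: {(from, to): overlap length}.
nodes = ["a", "b", "c", "d"]
weights = {
    ("a", "b"): 3, ("b", "c"): 5, ("c", "d"): 3,   # path a-b-c-d, total 11
    ("a", "c"): 3, ("c", "b"): 6, ("b", "d"): 6,   # path a-c-b-d, total 15
}
best_path, best_total = best_hamiltonian_path(nodes, weights)
print(best_path, best_total)
```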

slide-17
SLIDE 17

Trees and Hierarchical Structure

A tree is an undirected connected acyclic graph
A directed tree is a directed graph that would be a tree if the directions were ignored

Noam Chomsky, Syntactic Structures

Species Tree with LGT events
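A compact way to test the tree definition uses a standard fact: a graph on n nodes is connected and acyclic exactly when it is connected and has n - 1 edges. The sketch below treats every edge as undirected, so it also covers the directed-tree definition. Treating the empty graph as not a tree is an assumption, and the example names are made up.

```python
def is_tree(nodes, edges):
    """True if the graph (edge directions ignored) is connected with n - 1 edges."""
    nodes = list(nodes)
    if not nodes or len(edges) != len(nodes) - 1:
        return False  # assumed convention: the empty graph is not a tree
    adjacency = {n: set() for n in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        current = stack.pop()
        for neighbour in adjacency[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return len(seen) == len(nodes)

# Hypothetical hierarchy.
nodes = ["root", "x", "y", "z"]
tree_edges = [("root", "x"), ("root", "y"), ("x", "z")]
cyclic_edges = tree_edges + [("y", "z")]  # closes the cycle root-x-z-y
print(is_tree(nodes, tree_edges), is_tree(nodes, cyclic_edges))
```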

slide-18
SLIDE 18

Small World Networks

Stanley Milgram in 1967 “showed” social networks have “six degrees of separation” (and other shocking experiments)
Variations: Six degrees of Kevin Bacon, Erdős Number, Six degrees of Eric Clapton, Erdős‐Bacon‐Sabbath Number.

Defining characteristics of small world networks
‐ Most nodes are not directly connected to each other
‐ Can get between most pairs of nodes in few steps
[For N nodes, average pair distance proportional to Log(N)]

Watts & Strogatz (Nature, 1998): constructed networks with small average shortest path & high clustering coefficient
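To see how a few long-range links shorten typical distances, the sketch below compares the average shortest path length of a plain ring of 12 nodes with the same ring plus two shortcuts. This only illustrates the idea and is not the Watts & Strogatz rewiring procedure. The graph size and shortcut positions are arbitrary assumptions.

```python
from collections import deque

def average_path_length(adjacency):
    """Mean shortest-path length over ordered pairs of mutually reachable nodes."""
    total_distance = 0
    pair_count = 0
    for source in adjacency:
        # Breadth-first search gives shortest distances in an unweighted graph.
        distance = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for neighbour in adjacency[current]:
                if neighbour not in distance:
                    distance[neighbour] = distance[current] + 1
                    queue.append(neighbour)
        total_distance += sum(distance.values())
        pair_count += len(distance) - 1  # exclude the source itself
    return total_distance / pair_count

def ring_lattice(n):
    """Each node i is joined to its two neighbours on a ring of n nodes."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

def add_edge(adjacency, u, v):
    adjacency[u].add(v)
    adjacency[v].add(u)

ring = ring_lattice(12)
ring_average = average_path_length(ring)

shortcut = ring_lattice(12)
add_edge(shortcut, 0, 6)  # two hypothetical long-range links
add_edge(shortcut, 3, 9)
shortcut_average = average_path_length(shortcut)

print(ring_average, shortcut_average)
```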

slide-19
SLIDE 19

Properties and Examples of Small World Networks

Think “airports”, “connecting flights”

  • Lots of hubs
  • Often have cliques and near cliques
  • Said to be robust to perturbation (though hubs are vulnerable)

For example (but beware, cf. Lima‐Mendez & van Helden 2009)

  • Transcriptional networks
  • Metabolic networks
  • Protein interaction networks
  • Neural connections
  • You name it, it is a small world!
slide-20
SLIDE 20

Scale Free Networks

  • Barabasi & Albert (Science, 1999)
  • Have power law distribution of degrees: P(k) ~ k^(‐α)

[Figure: proportion of nodes with degree k, plotted for three networks: Actors, Web pages, Power grid]

  • Can be constructed by preferential attachment
  • They are “ultra‐small worlds”: Log(Log(N)) steps (Cohen & Havlin, 2003)
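Preferential attachment can be sketched as follows: each new node links to m existing nodes, chosen with probability proportional to their current degree. This is a simplified illustration in the spirit of the Barabasi & Albert construction, not a faithful reproduction of the published model. The parameter names `n`, `m` and `seed` are assumptions.

```python
import random
from collections import Counter

def preferential_attachment(n, m=2, seed=0):
    """Grow a graph to n nodes; each new node attaches to m existing nodes."""
    rng = random.Random(seed)
    # Start from a small complete graph on m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Every node appears in `endpoints` once per edge it touches, so a uniform
    # draw from this list picks a node with probability proportional to degree.
    endpoints = [node for edge in edges for node in edge]
    for new_node in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for target in sorted(targets):
            edges.append((new_node, target))
            endpoints.extend((new_node, target))
    return edges

edges = preferential_attachment(200, m=2)
degree = Counter(node for edge in edges for node in edge)

# Proportion of nodes with each degree k, the quantity plotted as P(k).
degree_counts = Counter(degree.values())
proportions = {k: count / len(degree) for k, count in sorted(degree_counts.items())}
print(max(degree.values()), proportions)
```

Plotting `proportions` on log-log axes is the usual informal check for a heavy tail, though Lima‐Mendez & van Helden caution against reading too much into such plots.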

slide-21
SLIDE 21

Software for Graph Exploration & Visualisation

Tulip: 2D and 3D interactive visualisation of graphs
Pajek: graph algorithms and visualisation
Matlab (MatlabBGL): Graph algorithms & metrics
Cytoscape: viz. interaction networks/pathways
GraphViz: sophisticated graph layout

See: http://www.google.com/Top/Science/Math/Combinatorics/Software/Graph_Drawing/ for a selection of tools

images nicked from the respective websites

slide-22
SLIDE 22

Further Reading

  • Mark Buchanan, Small World: Uncovering Nature’s Hidden Networks
  • Barabasi & Albert, Emergence of scaling in random networks, Science 286(5439):509‐512, 1999
  • Watts & Strogatz, Collective dynamics of “small‐world” networks, Nature 393:440‐442, 1998
  • Lima‐Mendez & van Helden, The powerful law of the power law and other myths in network biology, Mol. BioSyst. 5(12):1482‐93, 2009

slide-23
SLIDE 23

Summary

  • Node Degree: Which are the most important bits?
  • Node & Edge Cuts: What happens when I break it? Robustness
  • Cliques & Clusters: What are the functional modules?
  • Cluster Coefficient: Are there functional modules?
  • Paths & Edge Weights: Getting around in a graph
  • Graph algorithms: Are usually hard
  • Trees: Are ubiquitous
  • Small world and scale free graphs: Are popular
  • Software: There is some
slide-24
SLIDE 24

Nick Hamilton Institute for Molecular Bioscience

The End

Image: Matt Moores, The Visible Cell