CS481: Bioinformatics Algorithms, Can Alkan, EA224



slide-1
SLIDE 1

CS481: Bioinformatics Algorithms

Can Alkan EA224 calkan@cs.bilkent.edu.tr

http://www.cs.bilkent.edu.tr/~calkan/teaching/cs481/

slide-2
SLIDE 2

CLUSTERING USING GRAPHS

slide-3
SLIDE 3

Clique Graphs

• A clique is a graph with every vertex connected to every other vertex

• A clique graph is a graph where each connected component is a clique
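The definition above can be checked mechanically. The following is a minimal Python sketch, not part of the original slides; the function names and the vertex/edge-list representation are assumptions chosen for illustration.

```python
from itertools import combinations


def connected_components(vertices, edges):
    """Return (components, adjacency) for an undirected graph."""
    adjacency = {v: set() for v in vertices}
    for u, w in edges:
        adjacency[u].add(w)
        adjacency[w].add(u)
    seen, components = set(), []
    for start in vertices:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first traversal from `start`
            v = stack.pop()
            if v in component:
                continue
            component.add(v)
            stack.extend(adjacency[v] - component)
        seen |= component
        components.append(component)
    return components, adjacency


def is_clique_graph(vertices, edges):
    """True if every connected component is a clique."""
    components, adjacency = connected_components(vertices, edges)
    return all(
        w in adjacency[u]
        for component in components
        for u, w in combinations(component, 2)
    )
```

For example, a triangle plus a separate edge is a clique graph, while a path on three vertices is not, because its two end vertices are connected but not adjacent.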

slide-4
SLIDE 4

Transforming an Arbitrary Graph into a Clique Graph

• A graph can be transformed into a clique graph by adding or removing edges

slide-5
SLIDE 5

Corrupted Cliques Problem

Input: A graph G

Output: The smallest number of additions and removals of edges that will transform G into a clique graph
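To make the problem statement concrete, here is a brute-force Python sketch that tries every partition of the vertices into clusters and counts the edge additions and removals each one requires. It is exponential in the number of vertices and only meant for tiny graphs; the function names are assumptions, not part of the original slides.

```python
from itertools import combinations


def edits_for_partition(edges, clusters):
    """Edge additions plus removals needed so each cluster becomes a clique
    and no edge runs between clusters."""
    edge_set = {frozenset(e) for e in edges}
    target = {frozenset(p) for c in clusters for p in combinations(c, 2)}
    return len(edge_set ^ target)  # symmetric difference = edits


def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # put `first` into each existing block in turn ...
        for k in range(len(partition)):
            yield partition[:k] + [[first] + partition[k]] + partition[k + 1:]
        # ... or into a new block of its own
        yield [[first]] + partition


def min_edits(vertices, edges):
    """Smallest number of edits that turns the graph into a clique graph."""
    return min(
        edits_for_partition(edges, p) for p in set_partitions(list(vertices))
    )
```

A path on three vertices needs one edit (add the missing edge or remove one of the two existing ones), and a triangle needs none.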

slide-6
SLIDE 6

Distance Graphs

• Turn the distance matrix into a distance graph

• Genes are represented as vertices in the graph

• Choose a distance threshold θ

• If the distance between two vertices is below θ, draw an edge between them

• The resulting graph may contain cliques

• These cliques represent clusters of closely located data points
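The thresholding step can be sketched in a few lines of Python. This is an illustration only: the function name and the list-of-lists matrix layout are assumptions, and the numbers in the usage note are invented toy values.

```python
def distance_graph(names, dist, theta):
    """Return the edge set of the distance graph.

    names: list of gene labels
    dist:  symmetric matrix (list of lists), dist[i][j] = distance
           between names[i] and names[j]
    theta: distance threshold; an edge is drawn when the distance is below it
    """
    edges = set()
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if dist[i][j] < theta:
                edges.add((names[i], names[j]))
    return edges
```

With three genes a, b, c, distances d(a,b) = 2, d(a,c) = 9, d(b,c) = 8 and θ = 7, only the edge (a, b) is drawn.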

slide-7
SLIDE 7

Transforming a Distance Graph into a Clique Graph

The distance graph (threshold θ = 7) is transformed into a clique graph after removing the two highlighted edges.

After transforming the distance graph into the clique graph, the dataset is partitioned into three clusters.

slide-8
SLIDE 8

Heuristics for the Corrupted Cliques Problem

• The Corrupted Cliques problem is NP-hard, but heuristics exist to solve it approximately

• CAST (Cluster Affinity Search Technique) is a practical and fast algorithm

• CAST is based on the notion of genes being close to cluster C or distant from cluster C

• Distance between gene i and cluster C:

d(i,C) = average distance between gene i and all genes in C

Gene i is close to cluster C if d(i,C) < θ and distant otherwise
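The close/distant test can be written directly from the definition of d(i,C). The sketch below is illustrative: the function names and the dictionary keyed by unordered gene pairs are assumptions. The four distances are the values used in the worked example on the following slides, but which value belongs to which gene pair is assumed here (the average does not depend on that assignment).

```python
def average_distance(gene, cluster, dist):
    """d(i, C): mean distance between `gene` (outside C) and every gene in C."""
    return sum(dist[frozenset((gene, other))] for other in cluster) / len(cluster)


def is_close(gene, cluster, dist, theta):
    """A gene is close to the cluster if d(i, C) < theta, distant otherwise."""
    return average_distance(gene, cluster, dist) < theta


# Pairwise distances; the pair-to-value assignment is an assumption.
EXAMPLE_DIST = {
    frozenset(("g1", "g2")): 7.0,
    frozenset(("g1", "g10")): 8.1,
    frozenset(("g4", "g2")): 0.9,
    frozenset(("g4", "g10")): 1.1,
}
```

With θ = 7 and C = {g2, g10}, gene g1 has d = (7 + 8.1) / 2 = 7.55 and is distant, while g4 has d = (0.9 + 1.1) / 2 = 1 and is close.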

slide-9
SLIDE 9

CAST Algorithm

CAST(S, G, θ)
    P ← Ø
    while S ≠ Ø
        v ← vertex of maximal degree in the distance graph G
        C ← {v}
        while a close gene i not in C or a distant gene i in C exists
            Find the nearest close gene i not in C and add it to C
            Remove the farthest distant gene i in C
        Add cluster C to partition P
        S ← S \ C
        Remove vertices of cluster C from the distance graph G
    return P

S: set of elements, G: distance graph, θ: distance threshold
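The pseudocode above can be turned into a short runnable Python sketch. Several details are assumptions made for illustration rather than part of the slides: distances are supplied as a function `dist(a, b)` with `dist(a, a) == 0`, the distance graph is implicit (an edge exists when the distance is below θ), ties are broken by input order, a cluster is never emptied, and `max_rounds` caps the inner loop in case adding and removing genes oscillates.

```python
def cast(genes, dist, theta, max_rounds=1000):
    """Cluster `genes` with a CAST-style heuristic; returns a list of sets."""
    remaining = list(genes)  # the set S, kept ordered for deterministic ties
    partition = []           # the partition P

    def avg(gene, cluster):
        # d(i, C): average distance between gene i and all genes in C
        return sum(dist(gene, other) for other in cluster) / len(cluster)

    def degree(v):
        # degree of v in the distance graph restricted to remaining genes
        return sum(1 for u in remaining if u != v and dist(u, v) < theta)

    while remaining:
        cluster = [max(remaining, key=degree)]
        for _ in range(max_rounds):
            changed = False
            # add the nearest close gene that is not yet in the cluster
            close = [g for g in remaining
                     if g not in cluster and avg(g, cluster) < theta]
            if close:
                cluster.append(min(close, key=lambda g: avg(g, cluster)))
                changed = True
            # remove the farthest distant gene that is in the cluster
            distant = [g for g in cluster if avg(g, cluster) >= theta]
            if distant and len(cluster) > 1:
                cluster.remove(max(distant, key=lambda g: avg(g, cluster)))
                changed = True
            if not changed:
                break
        partition.append(set(cluster))
        remaining = [g for g in remaining if g not in cluster]
    return partition
```

As a usage example with invented data, five points on a line at positions 0, 1, 2, 10 and 11 with θ = 3 split into the clusters {a, b, c} and {x, y} when `dist` is the absolute difference of positions.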

slide-10
SLIDE 10

CAST Algorithm

θ = 7
P = Ø
S = {g1, …, g10}
degree(g10) = 4
C1 = {g10}
C1 = {g2, g10}
d(g1, C1) = (7 + 8.1) / 2 = 7.55
d(g4, C1) = (0.9 + 1.1) / 2 = 1
d(g9, C1) = (2 + 1.1) / 2 = 1.55
C1 = {g2, g4, g10}
d(g9, C1) = (2 + 1.6 + 1) / 3 = 1.53
C1 = {g2, g4, g9, g10}
P = {C1}

[Figure: distance graph on genes g1 to g10; edge weights shown: 7, 5.1, 2.3, 5.6, 1.1, 1, 1.1, 2, 0.9, 1.6, 1.1, 0.7, 1]

slide-11
SLIDE 11

CAST Algorithm

θ = 7
P = {C1}
C1 = {g2, g4, g9, g10}
S = {g1, g3, g5, g6, g7, g8}
degree(g1) = 2
C2 = {g1}
C2 = {g1, g6}
d(g7, C2) = (5.1 + 5.6) / 2 = 5.35
C2 = {g1, g6, g7}
P = {C1, C2}

[Figure: remaining distance graph on genes g1, g3, g5, g6, g7, g8; edge weights shown: 5.1, 2.3, 5.6, 1.1, 0.7, 1]

slide-12
SLIDE 12

CAST Algorithm

θ = 7
P = {C1, C2}
C1 = {g2, g4, g9, g10}
C2 = {g1, g6, g7}
S = {g3, g5, g8}
degree(g3) = 2
C3 = {g3}
C3 = {g3, g5}
d(g8, C3) = (1.1 + 1) / 2 = 1.05
C3 = {g3, g5, g8}
P = {C1, C2, C3}

[Figure: remaining distance graph on genes g3, g5, g8; edge weights shown: 1.1, 0.7, 1]

slide-13
SLIDE 13

CAST Algorithm

θ = 7
P = {C1, C2, C3}
C1 = {g2, g4, g9, g10}
C2 = {g1, g6, g7}
C3 = {g3, g5, g8}
S = Ø … done

slide-14
SLIDE 14

GENOME REARRANGEMENTS

slide-15
SLIDE 15

Turnip vs Cabbage: Look and Taste Different

• Although cabbages and turnips share a recent common ancestor, they look and taste different

slide-16
SLIDE 16

Turnip vs Cabbage: Almost Identical mtDNA Gene Sequences

• In the 1980s, Jeffrey Palmer studied the evolution of plant organelles by comparing the mitochondrial genomes of cabbage and turnip

• 99% similarity between genes

• These almost identical gene sequences differed in gene order

• This study helped pave the way to analyzing genome rearrangements in molecular evolution

slide-17
SLIDE 17

Turnip vs Cabbage: Different mtDNA Gene Order

• Gene order comparison:

Similarity blocks

slide-18
SLIDE 18

Turnip vs Cabbage: Different mtDNA Gene Order

• Gene order comparison:

slide-19
SLIDE 19

Turnip vs Cabbage: Different mtDNA Gene Order

• Gene order comparison:

slide-20
SLIDE 20

Turnip vs Cabbage: Different mtDNA Gene Order

• Gene order comparison:

slide-21
SLIDE 21

Turnip vs Cabbage: Different mtDNA Gene Order

• Gene order comparison:

Before After

Evolution is manifested as the divergence in gene order

slide-22
SLIDE 22

Transforming Cabbage into Turnip

slide-23
SLIDE 23

Genome rearrangements

[Figure: an unknown ancestor (~75 million years ago) diverging into Mouse (X chrom.) and Human (X chrom.)]

• What are the similarity blocks and how do we find them?

• What is the architecture of the ancestral genome?

• What is the evolutionary scenario for transforming one genome into the other?

slide-24
SLIDE 24

History of Chromosome X

Rat Consortium, Nature, 2004

slide-25
SLIDE 25

Reversals

Blocks represent conserved genes.

[Figure: genome drawn as blocks labeled 1 to 10]

1, 2, 3, 4, 5, 6, 7, 8, 9, 10

slide-26
SLIDE 26

Reversals

[Figure: genome drawn as blocks labeled 1 to 10]

1, 2, 3, -8, -7, -6, -5, -4, 9, 10

Blocks represent conserved genes.

In the course of evolution or in a clinical context, blocks 1,…,10 could be misread as 1, 2, 3, -8, -7, -6, -5, -4, 9, 10.

slide-27
SLIDE 27

Reversals and Breakpoints

[Figure: genome drawn as blocks labeled 1 to 10]

1, 2, 3, -8, -7, -6, -5, -4, 9, 10

The reversal introduced two breakpoints (disruptions in order).
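A reversal on a signed block order, and the breakpoints it creates, can be sketched in Python as follows. The function names are assumptions. Breakpoints are counted with one common convention, also an assumption here: the order is framed by 0 and n + 1, and every adjacent pair (a, b) with b − a ≠ 1 counts as a breakpoint.

```python
def reverse_segment(perm, i, j):
    """Reverse perm[i..j] (0-based, inclusive) and flip the sign of each block."""
    return perm[:i] + [-x for x in reversed(perm[i:j + 1])] + perm[j + 1:]


def count_breakpoints(perm):
    """Count adjacent pairs that are not consecutive, after framing the
    signed permutation with 0 on the left and n + 1 on the right."""
    extended = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(extended, extended[1:]) if b - a != 1)


identity = list(range(1, 11))
reversed_order = reverse_segment(identity, 3, 7)  # blocks 4..8
```

Reversing blocks 4 through 8 of 1, …, 10 gives 1, 2, 3, -8, -7, -6, -5, -4, 9, 10, and the two breakpoints sit between 3 and -8 and between -4 and 9.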

slide-28
SLIDE 28

Reversals: Example

5’ ATGCCTGTACTA 3’
3’ TACGGACATGAT 5’

Break and Invert

5’ ATGTACAGGCTA 3’
3’ TACATGTCCGAT 5’
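At the sequence level, a reversal replaces a segment of one strand with its reverse complement. The Python sketch below reproduces the example above; the function names are assumptions, and the cut positions (after the 3rd and 9th bases) are inferred by comparing the two sequences rather than stated on the slide.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement each base, then reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]


def invert_segment(seq, start, end):
    """Replace seq[start:end] (0-based, end exclusive) with its reverse complement."""
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]


before = "ATGCCTGTACTA"
after = invert_segment(before, 3, 9)  # inverts the segment CCTGTA
```

Here `after` is ATGTACAGGCTA, and complementing it base by base gives the lower strand TACATGTCCGAT.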

slide-29
SLIDE 29

Types of Rearrangements

Reversal: 1 2 3 4 5 6 → 1 2 -5 -4 -3 6

Translocation: 1 2 3 4 5 6 → 1 2 6 4 5 3

Fusion / Fission: 1 2 3 4 5 6 ⇄ 1 2 3 4 5 6 (two chromosomes joined into one, or one chromosome split into two)
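The multi-chromosome operations can be illustrated with Python lists standing for chromosomes. This is a sketch under assumptions: the function names are invented, and the chromosome boundaries (1 2 3 | 4 5 6 for the translocation, 1 2 3 4 | 5 6 for fusion and fission) are guesses, since the figure that showed them did not survive extraction.

```python
def translocation(chrom_a, chrom_b, cut_a, cut_b):
    """Exchange the tails of two chromosomes at the given cut positions."""
    return (chrom_a[:cut_a] + chrom_b[cut_b:],
            chrom_b[:cut_b] + chrom_a[cut_a:])


def fusion(chrom_a, chrom_b):
    """Join two chromosomes into one."""
    return chrom_a + chrom_b


def fission(chrom, cut):
    """Split one chromosome into two at the cut position."""
    return chrom[:cut], chrom[cut:]
```

For example, `translocation([1, 2, 3], [4, 5, 6], 2, 2)` yields the chromosomes 1 2 6 and 4 5 3, matching the block order shown for the translocation.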

slide-30
SLIDE 30

Comparative Genomic Architectures: Mouse vs Human Genome

• Humans and mice have similar genomes, but their genes are ordered differently

• ~245 rearrangements
    • Reversals
    • Fusions
    • Fissions
    • Translocations

slide-31
SLIDE 31

Human chromosome 2