slide-1
SLIDE 1

The Potential of Family-Free Genome Comparison

Marília D. V. Braga, Cedric Chauve, Daniel Doerr, Katharina Jahn, Jens Stoye, Annelyse Thévenin, Roland Wittler

(Bielefeld, Bordeaux, Rio de Janeiro, Vancouver)

MAGE, 26 August 2013

slide-8
SLIDE 8

Introduction

Comparative genomics

Two levels of genome evolution:

◮ Small-scale mutations: point mutations

◮ Large-scale mutations: rearrangements, duplications, insertions, deletions

Structural organization provides insights into:

◮ phylogeny and evolution

◮ gene function and interactions

The Potential of Family-Free Genome Comparison (5 / 27) Jens Stoye

slide-9
SLIDE 9

[Figure: two gene orders over the same eight genes]

1 2 3 4 5 6 7 8

6 5 4 3 2 1 7 8

slide-10
SLIDE 10

Introduction

Comparative genomics with gene families

Picture with gene families:

◮ Simple and powerful data type

◮ Many databases and tools available

◮ Produce reasonable results

slide-11
SLIDE 11

Introduction

The Family-free Principle

More realistic picture:

◮ Computational prediction of gene families is (mostly) unsupervised

◮ Predicted families do not always correspond to biological gene families

◮ Wrong gene family assignments may produce incorrect results in subsequent analyses

slide-12
SLIDE 12

Introduction

The Family-free Principle

Gene family assignments not necessary:

◮ If subsequent analyses can deal with original data

◮ For example gene similarity scores

We may even invert the scenario:

◮ Integrated analysis: ortholog assignments and gene order analysis

◮ Gene family assignment based on positional orthology

slide-13
SLIDE 13

Introduction

The Family-free Principle

Family-free Principle

Gene similarities

◮ Conserved structures: pairwise proximities, gene set proximities

◮ Rearrangements: single operation models, combined operations

◮ Ancestral genome reconstruction: median-of-three, whole genome duplication

◮ Other applications: contig layouting, gene family prediction, phylogenetic distances

Combined methods... for conserved structure detection, ancestral genome reconstruction and gene family prediction.

slide-15
SLIDE 15

Conserved structures


slide-16
SLIDE 16

Conserved structures

Gene similarity graph

Gene similarity graph of 3 genomes:


slide-19
SLIDE 19

Conserved structures

Partial k-matching

Partial k-(dimensional) matching

Given a gene similarity graph $B = (G_1, \ldots, G_k, E)$, a partial k-matching $M \subseteq E$ is a selection of edges such that for each connected component $C \subseteq B_M := (G_1, \ldots, G_k, M)$ no two genes in $C$ belong to the same genome.

For $k = 3$: $2^k - 1 = 7$ valid components
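The partial k-matching condition can be checked mechanically. The following is a minimal sketch, not the authors' implementation: it assumes genes are encoded as hypothetical `(genome, gene_id)` tuples and uses a union-find structure to group the selected edges into connected components.

```python
from collections import defaultdict


def is_partial_k_matching(edges):
    """Return True if no connected component of the selected edges
    contains two genes from the same genome.

    Assumption: a gene is a (genome, gene_id) tuple, an edge a pair of genes.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge the two end points of every selected edge.
    for u, v in edges:
        parent[find(u)] = find(v)

    # Every component may contain each genome at most once.
    genomes_seen = defaultdict(set)
    for gene in list(parent):
        root = find(gene)
        genome = gene[0]
        if genome in genomes_seen[root]:
            return False
        genomes_seen[root].add(genome)
    return True


valid = [(("G1", "a"), ("G2", "x")), (("G2", "x"), ("G3", "p"))]
invalid = valid + [(("G3", "p"), ("G1", "b"))]
print(is_partial_k_matching(valid), is_partial_k_matching(invalid))
```

In the second edge set, genes `a` and `b` of genome `G1` end up in the same component, so the selection is rejected.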

slide-20
SLIDE 20

Conserved structures

Partial k-matching

Gene similarity graph of 3 genomes:

... how to construct such a matching?

slide-23
SLIDE 23

Conserved structures

Assessing matching properties

Adjacency: proximity relation between two genes

Adjacency score for consecutive genes $(g, g')$ in genome $G$ and $(h, h')$ in genome $H$:

$$s(g, g', h, h') = \begin{cases} w(e_{g,h}) \cdot w(e_{g',h'}) & \text{if } (g, g'), (h, h') \text{ form a conserved adjacency} \\ 0 & \text{otherwise} \end{cases}$$

Adjacency measure in $M$:

$$\mathrm{adj}(M) = \sum_{G,H} \; \sum_{\substack{g \text{ left of } g' \text{ in } G \\ h, h' \text{ in } H}} s(g, g', h, h')$$

Similarity measure in $M$:

$$\mathrm{edg}(M) = \sum_{e \in M} w(e)$$
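For a single pair of genomes, the two measures can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: each genome is one linear chromosome given as a list of unsigned gene names, the matching is a dict from `(g, h)` pairs to weights, and two matched pairs count as a conserved adjacency when the genes are consecutive in both genomes (in either order in `H`).

```python
def edg(matching):
    """Similarity measure: total weight of the matching edges."""
    return sum(matching.values())


def adj(genome_g, genome_h, matching):
    """Adjacency measure for one genome pair (G, H).

    Assumption: 'conserved adjacency' means g, g2 are consecutive in G
    and their matched partners are consecutive in H.
    """
    pos_h = {h: i for i, h in enumerate(genome_h)}
    partner = {g: h for (g, h) in matching}  # each gene has at most one partner
    total = 0.0
    for g, g2 in zip(genome_g, genome_g[1:]):
        h, h2 = partner.get(g), partner.get(g2)
        if h is None or h2 is None:
            continue  # at least one gene is unmatched
        if abs(pos_h[h] - pos_h[h2]) == 1:
            total += matching[(g, h)] * matching[(g2, h2)]
    return total


G = ["g1", "g2", "g3"]
H = ["h1", "h2", "h3"]
M = {("g1", "h2"): 0.5, ("g2", "h1"): 0.8, ("g3", "h3"): 1.0}
print(adj(G, H, M), edg(M))
```

Here only the pair (g1, g2) contributes, because h2 and h1 are neighbors in `H` while h1 and h3 are not.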

slide-24
SLIDE 24

Conserved structures

Family-free Adjacencies Problem

Find a matching $M$ that maximizes the following formula:

$$F_\alpha(M) = \alpha \cdot \mathrm{adj}(M) + (1 - \alpha) \cdot \mathrm{edg}(M)$$

The parameter $\alpha \in [0, 1]$ weighs similarity ($\alpha = 0$) against synteny ($\alpha = 1$).
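To see how α shifts the optimum, the objective can be maximized by exhaustive search on a toy instance. The sketch below is only illustrative: it enumerates every edge subset, so it is exponential and not a practical solver, and it assumes two unsigned linear genomes, a hypothetical `weights` dict keyed by `(g, h)`, and the same notion of conserved adjacency as above (consecutive in both genomes).

```python
from itertools import combinations


def f_alpha(genome_g, genome_h, matching, alpha):
    """alpha * adj(M) + (1 - alpha) * edg(M) for one pair of genomes."""
    pos = {h: i for i, h in enumerate(genome_h)}
    partner = {g: h for g, h in matching}
    adj = 0.0
    for g, g2 in zip(genome_g, genome_g[1:]):
        if g in partner and g2 in partner:
            if abs(pos[partner[g]] - pos[partner[g2]]) == 1:
                adj += matching[(g, partner[g])] * matching[(g2, partner[g2])]
    return alpha * adj + (1 - alpha) * sum(matching.values())


def best_matching(genome_g, genome_h, weights, alpha):
    """Try every subset of edges in which no gene is used twice."""
    edges = list(weights)
    best, best_score = {}, 0.0
    for r in range(1, len(edges) + 1):
        for subset in combinations(edges, r):
            if len({g for g, _ in subset}) < r or len({h for _, h in subset}) < r:
                continue  # some gene occurs twice: not a matching
            m = {e: weights[e] for e in subset}
            score = f_alpha(genome_g, genome_h, m, alpha)
            if score > best_score:
                best, best_score = m, score
    return best, best_score


G = ["a1", "a2"]
H = ["b1", "b2", "b3"]
W = {("a1", "b1"): 0.9, ("a2", "b3"): 0.9, ("a2", "b2"): 0.6}
for alpha in (0.0, 1.0):
    print(alpha, sorted(best_matching(G, H, W, alpha)[0]))
```

With α = 0 the search prefers the two heaviest edges (a2 matched to b3); with α = 1 it prefers the lighter edge to b2, because only that choice yields a conserved adjacency.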

slide-25
SLIDE 25

Conserved structures

Gene set proximities: gene clusters

Relaxation: conserved neighborhood up to $\theta > 0$ genes

Scoring $\theta$-adjacencies:

$$s_\theta(g, g', h, h') = \begin{cases} w(e_{g,h}) \cdot w(e_{g',h'}) & \text{if } (g, g') \text{ and } (h, h') \text{ form a } \theta\text{-adjacency} \\ 0 & \text{otherwise} \end{cases}$$
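A small sketch of the relaxed score follows. The slide does not spell out the exact definition, so the code assumes that two genes are θ-adjacent when their positions on the chromosome differ by at most θ, which makes θ = 1 the ordinary adjacency. The position and weight dictionaries are hypothetical inputs.

```python
def s_theta(g, g2, h, h2, pos_g, pos_h, weight, theta=1):
    """Score of the gene pairs (g, g2) in G and (h, h2) in H.

    pos_g / pos_h map genes to their index on the chromosome;
    weight maps matched pairs (gene in G, gene in H) to w(e).
    Assumption: theta-adjacent means at most theta positions apart.
    """
    if (g, h) not in weight or (g2, h2) not in weight:
        return 0.0
    close_in_g = 0 < abs(pos_g[g] - pos_g[g2]) <= theta
    close_in_h = 0 < abs(pos_h[h] - pos_h[h2]) <= theta
    if close_in_g and close_in_h:
        return weight[(g, h)] * weight[(g2, h2)]
    return 0.0


pos_g = {"a": 0, "b": 1, "c": 2}
pos_h = {"x": 0, "y": 1, "z": 2}
weight = {("a", "x"): 0.5, ("c", "z"): 0.4}
print(s_theta("a", "c", "x", "z", pos_g, pos_h, weight, theta=1))
print(s_theta("a", "c", "x", "z", pos_g, pos_h, weight, theta=2))
```

The pair (a, c) is separated by one gene, so it scores zero under strict adjacency and is rewarded once θ is raised to 2.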

slide-26
SLIDE 26

Conserved structures

Gene set proximities: gene clusters

Based on θ-adjacencies we can define gene clusters as pairs of intervals with large maximum weight matching M:
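As an illustration of the quantity used to score a pair of intervals, the sketch below computes the weight of a maximum weight matching between the genes of two short intervals. It is a plain bitmask recursion, exponential in the length of the second interval and meant for small examples only; the `weights` dict is a hypothetical encoding of the gene similarity graph.

```python
from functools import lru_cache


def max_weight_matching(interval_g, interval_h, weights):
    """Maximum total weight of a matching between two gene intervals.

    weights maps (gene in G, gene in H) to a similarity score; pairs
    that are missing are not connected in the gene similarity graph.
    """
    n = len(interval_h)

    @lru_cache(maxsize=None)
    def best(i, used):
        # i: index of the next gene in interval_g
        # used: bitmask of genes of interval_h that are already matched
        if i == len(interval_g):
            return 0.0
        result = best(i + 1, used)  # leave gene i unmatched
        for j in range(n):
            w = weights.get((interval_g[i], interval_h[j]))
            if w is not None and not used & (1 << j):
                result = max(result, w + best(i + 1, used | (1 << j)))
        return result

    return best(0, 0)


W = {("a", "x"): 0.9, ("a", "y"): 0.8, ("b", "x"): 0.7}
print(max_weight_matching(["a", "b"], ["x", "y"], W))
```

Matching a to y and b to x (total 1.5) beats taking the single heaviest edge a to x (0.9), which is why a greedy choice is not sufficient here.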


slide-28
SLIDE 28

Conserved structures

Gene set proximities: consimilar intervals

Calculating a maximum matching for all pairs of intervals is expensive. Therefore, use the unweighted gene similarity graph.

Consimilar interval: many edges inside, no edges to neighbors.

Algorithm: $O(n^3)$ time

Ranking by score of maximum weight matching inside the intervals.


slide-30
SLIDE 30

Rearrangements


slide-31
SLIDE 31

Rearrangements

DCJ – Double Cut and Join

DCJ accounts for rearrangement events: inversion, translocation, fusion, fission, transposition, block interchange

Adjacency graph: distance $d_{DCJ} = N - C - \frac{I}{2}$, where $N$ is the number of genes, $C$ the number of cycles and $I$ the number of odd paths in the adjacency graph
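For two genomes with identical gene content (the classical, family-based setting), the distance formula can be evaluated directly. The sketch below is one possible way to do it, under assumptions of its own: a genome is a list of `(genes, is_circular)` chromosomes with signed integer genes, and instead of building the adjacency graph explicitly it walks the gene extremities, where a component of the walk corresponds to a cycle or path of the adjacency graph and its number of extremities to the number of edges.

```python
def neighbours(genome):
    """Map each gene extremity to the extremity adjacent to it.

    genome: list of (genes, is_circular); genes are signed integers.
    Telomeres (ends of linear chromosomes) get no entry.
    """
    nbr = {}
    for genes, is_circular in genome:
        ends = []
        for x in genes:
            tail, head = (abs(x), "t"), (abs(x), "h")
            ends.append((tail, head) if x > 0 else (head, tail))
        pairs = list(zip(ends, ends[1:]))
        if is_circular:
            pairs.append((ends[-1], ends[0]))
        for (_, right), (left, _) in pairs:
            nbr[right] = left
            nbr[left] = right
    return nbr


def dcj_distance(genome_a, genome_b):
    """d = N - C - I/2 for two genomes over the same set of genes."""
    na, nb = neighbours(genome_a), neighbours(genome_b)
    extremities = {(abs(x), s) for genes, _ in genome_a for x in genes for s in "th"}
    n_genes = len(extremities) // 2
    seen, cycles, odd_paths = set(), 0, 0
    for start in extremities:
        if start in seen:
            continue
        seen.add(start)
        stack, size, is_cycle = [start], 0, True
        while stack:
            v = stack.pop()
            size += 1
            for nbr in (na, nb):
                w = nbr.get(v)
                if w is None:
                    is_cycle = False  # v is a telomere in one genome
                elif w not in seen:
                    seen.add(w)
                    stack.append(w)
        if is_cycle:
            cycles += 1
        elif size % 2 == 1:
            odd_paths += 1
    return n_genes - cycles - odd_paths // 2


A = [([1, 2, 3, 4], False)]
B = [([1, -3, -2, 4], False)]
print(dcj_distance(A, B))
```

The two example genomes differ by a single inversion of the segment 2 3, which the formula reports as one DCJ operation.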

slide-33
SLIDE 33

Rearrangements

DCJ – Double Cut and Join

From the gene similarity graph to the weighted adjacency graph (WAG):


slide-35
SLIDE 35

Rearrangements

Family-free Rearrangement Problem

Find a matching $M_{GH}$ that maximizes the following formula:

$$F^{DCJ}_\alpha(M_{GH}) = \alpha \cdot \mathrm{cyc}(M_{GH}) + (1 - \alpha) \cdot \mathrm{edg}(M_{GH})$$

where

$$\mathrm{cyc}(M_{GH}) = \sum_{C \in \mathcal{C}(M_{GH})} \left( \frac{1}{|C|} \sum_{e \in C} w(e) \right)$$

$\mathcal{C}(M_{GH})$ := set of connected components in $WAG(M_{GH})$
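Once the components of the weighted adjacency graph are known, the objective is straightforward to evaluate. The sketch below assumes that each component is handed over as a list of its edge weights and that |C| counts the edges of the component; constructing the WAG itself is not shown, and both function names are hypothetical.

```python
def cyc(components):
    """Sum over all components of the mean edge weight of the component.

    components: one list of edge weights per connected component of
    the weighted adjacency graph (assumption: |C| = number of edges).
    """
    return sum(sum(ws) / len(ws) for ws in components if ws)


def f_dcj(components, matching_weights, alpha):
    """alpha * cyc(M) + (1 - alpha) * edg(M)."""
    return alpha * cyc(components) + (1 - alpha) * sum(matching_weights)


components = [[1.0, 1.0], [0.5, 0.7, 0.9, 0.3]]
print(cyc(components))
print(f_dcj(components, [1.0, 0.8], alpha=0.5))
```

Under this reading, every component contributes at most 1 regardless of its length, so many short, heavy components score higher than a few long ones, in line with the DCJ distance decreasing as the number of cycles grows.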

slide-36
SLIDE 36

Ancestral genome reconstruction


slide-37
SLIDE 37

Ancestral genome reconstruction

Reconstruction of Ancestral Adjacencies

Emphasize adjacencies that are conserved in closely related genomes.

Phylogeny Aware Optimization Problem

Given an additive distance matrix $D^T$, find a matching $M$ that maximizes the following formula:

$$F_{\alpha,T}(M) = \sum_{G,H} \left[ \left( D^T_{\max} - D^T_{GH} \right) \left( \alpha \cdot \mathrm{adj}(M_{GH}) + (1 - \alpha) \cdot \mathrm{edg}(M_{GH}) \right) \right]$$

where $D^T_{\max} = \max_{G,H} \{ D^T_{GH} \}$
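The weighting by phylogenetic distance can be illustrated with a few lines of code. This is a sketch with hypothetical inputs: the distances and the per-pair values of adj and edg are supplied as dicts keyed by genome pair, and the numbers in the example are made up for illustration.

```python
def phylogeny_aware_score(dist, adj, edg, alpha):
    """Sum over genome pairs of (D_max - D_GH) * (alpha*adj + (1-alpha)*edg).

    dist, adj, edg: dicts keyed by genome pair (G, H).
    """
    d_max = max(dist.values())
    return sum(
        (d_max - dist[pair]) * (alpha * adj[pair] + (1 - alpha) * edg[pair])
        for pair in dist
    )


# Illustrative values only: A and B are close, C is distant from both.
dist = {("A", "B"): 1.0, ("A", "C"): 3.0, ("B", "C"): 3.0}
adj = {("A", "B"): 2.0, ("A", "C"): 1.0, ("B", "C"): 1.0}
edg = {("A", "B"): 4.0, ("A", "C"): 2.0, ("B", "C"): 2.0}
print(phylogeny_aware_score(dist, adj, edg, alpha=0.5))
```

Pairs at the maximum distance receive weight zero, so in this example only the closely related pair (A, B) contributes to the score.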

slide-38
SLIDE 38

Conclusion and outlook


slide-41
SLIDE 41

Thanks to:

Marília D. V. Braga, Cedric Chauve, Daniel Doerr, Katharina Jahn, Annelyse Thévenin, Roland Wittler

Andreas Dress: Happy birthday!

You!