slide-1
SLIDE 1

ORTHOLOGY AND PARALOGY CONSTRAINTS: SATISFIABILITY AND CONSISTENCY

Manuel Lafond, Nadia El-Mabrouk
University of Montreal

slide-2
SLIDE 2

Outline

  • Introduction
      • Gene trees, orthologs, paralogs, …
  • 3 problems, given a set of orthologs and paralogs
      • Satisfiability
      • Consistency with a species tree S
      • Self-consistency
  • Experiments
slide-3
SLIDE 3

Introduction

  • Gene trees reflect the evolutionary history of a family of homologous genes
  • Genes that all descend from a common ancestor

[Figure: gene tree G with leaves a1, a2, b1, c1, d1, where a, b, c, d are species]

Gene trees don’t have to be binary.

slide-4
SLIDE 4

Introduction

  • Ancestral genes may have undergone speciation or duplication

[Figure: gene tree G with leaves a1, a2, b1, c1, d1; internal nodes labeled Duplication or Speciation]

slide-5
SLIDE 5

Introduction

Orthologs : LCA has undergone speciation
Paralogs : LCA has undergone duplication
(LCA = Lowest Common Ancestor)

For instance, according to G :
a1, b1 are paralogs
a1, c1 are orthologs

[Figure: gene tree G with leaves a1, a2, b1, c1, d1; internal nodes labeled Duplication or Speciation]

slide-6
SLIDE 6

Introduction

If we have G (and trust its Dup/Spec labeling), then we have all orthology/paralogy relationships.

Orthologs : a2b1 a1c1 a1d1 a2c1 a2d1 b1c1 b1d1 c1d1
Paralogs : a1a2 a1b1

[Figure: gene tree G with leaves a1, a2, b1, c1, d1]
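
A minimal Python sketch of this direction, from a labeled tree to the relationships. The tree shape below is an assumption (the figure is not reproduced in this transcript); it is chosen so that a1, b1 are paralogs and a1, c1 are orthologs, as stated for G. The tuple encoding and the function names are illustrative only.

```python
from itertools import combinations

# Assumed encoding: a node is a leaf name (str) or a pair (label, children),
# where label is "Spec" or "Dup". This tree shape is hypothetical.
G = ("Spec", [("Dup", ["a1", ("Spec", ["a2", "b1"])]), "c1", "d1"])


def leaves(node):
    """Return the list of leaf names below a node."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node[1] for leaf in leaves(child)]


def relations(node):
    """Return (orthologs, paralogs) as sets of frozenset gene pairs."""
    orthologs, paralogs = set(), set()
    if isinstance(node, str):
        return orthologs, paralogs
    label, children = node
    # Two leaves in different child subtrees have this node as their LCA.
    target = orthologs if label == "Spec" else paralogs
    for left, right in combinations(children, 2):
        for x in leaves(left):
            for y in leaves(right):
                target.add(frozenset((x, y)))
    for child in children:
        child_orthologs, child_paralogs = relations(child)
        orthologs |= child_orthologs
        paralogs |= child_paralogs
    return orthologs, paralogs


orthologs, paralogs = relations(G)
print(sorted("".join(sorted(pair)) for pair in paralogs))  # ['a1a2', 'a1b1']
```

Every pair of leaves gets exactly one label, because every pair has exactly one LCA; with five genes this yields ten relationships in total.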

slide-7
SLIDE 7

Introduction

How does that go the other way around ?

If we have the orthology/paralogy relationships, can we get the gene tree ?

Orthologs : a2b1 a1c1 a1d1 a2c1 a2d1 b1c1 b1d1 c1d1
Paralogs : a1a2 a1b1

[Figure: the gene tree is unknown (?)]

slide-8
SLIDE 8

Introduction

Various software tools let us infer orthology (and sometimes paralogy) without a gene tree.

Sequence-based
  • COG (Tatusov, Galperin, Natale & Koonin, 2000)
  • OrthoMCL (Li, Stoeckert & Roos, 2003)
  • InParanoid (Berglund, Sjölund, Östlund & Sonnhammer, 2008)
  • Proteinortho (Findeiß, Steiner, Marz, Stadler & Prohaska, 2011)
  • …

Gene order-based
  • GIGA (Thomas, 2010)
  • SYNERGY (Wapinski, Pfeffer, Friedman & Regev, 2007)
  • [Unnamed] (Lafond, Swenson & El-Mabrouk, 2013)

slide-9
SLIDE 9

Introduction

None of them finds ALL orthologies/paralogies !

Various software tools let us infer orthology (and sometimes paralogy) without a gene tree.

Sequence-based: COG, OrthoMCL, InParanoid, Proteinortho, …
Gene order-based: GIGA, SYNERGY, [Unnamed]

slide-10
SLIDE 10

Satisfiability

Orthologs = (a,b) (a,c) (c,d)
Paralogs = (a,d) (b,d)

Is there some gene tree and Dup/Spec labeling that displays these relationships ?

slide-11
SLIDE 11

Satisfiability

Orthologs = (a,b) (a,c) (c,d)
Paralogs = (a,d) (b,d)

[Figure: gene tree with leaves a, b, c, d]

slide-13
SLIDE 13

Satisfiability

Orthologs = (a,b) (a, c) (c, d) Paralogs = (a, d) (b, c) (b, d)

slide-14
SLIDE 14

Satisfiability

Orthologs = (a,b) (a,c) (c,d)
Paralogs = (a,d) (b,c) (b,d)

[Figure: partial gene tree with leaves a, d]

slide-15
SLIDE 15

Satisfiability

Orthologs = (a,b) (a,c) (c,d)
Paralogs = (a,d) (b,c) (b,d)

[Figure: partial gene tree with leaves a, d, b]

slide-16
SLIDE 16

Satisfiability

Orthologs = (a,b) (a,c) (c,d)
Paralogs = (a,d) (b,c) (b,d)

[Figure: attempted gene tree with leaves a, d, b, c]

slide-20
SLIDE 20

Satisfiability

Orthologs = (a,b) (a,c) (c,d)
Paralogs = (a,d) (b,c) (b,d)

I JUST CAN’T ! THESE DON’T MAKE SENSE !

slide-21
SLIDE 21

Consistency with a species tree S

Orthologs = (a,d) (c,d) Paralogs = (a,c) (b, d)

[Figure: species tree S with leaves a, b, d, c; gene tree G unknown (?)]

slide-22
SLIDE 22

Consistency with a species tree S

Orthologs = (a,d) (c,d) Paralogs = (a,c) (b, d)

[Figure: species tree S with leaves a, b, d, c; gene tree G with leaves a, c, d, b]

slide-23
SLIDE 23

Consistency with a species tree S

Consistency with a species tree S : If genes from species sets X, Y are separated by speciation in G, then species X, Y are separated in S.

[Figure: species tree S with leaves a, b, d, c; gene tree G with leaves a, c, d, b]

slide-24
SLIDE 24

Consistency with a species tree S

Consistency with a species tree S : If genes from species sets X, Y are separated by speciation in G, then species X, Y are separated in S.

[Figure: species tree S with leaves a, b, d, c; gene tree G with leaves a, c, d, b and a Speciation node marked]

slide-25
SLIDE 25

Consistency with a species tree S

Orthologs = (a,d) (c,d) Paralogs = (a,c) (b, d)

[Figure: species tree S with leaves a, b, d, c; gene tree G unknown (?)]

slide-26
SLIDE 26

Consistency with a species tree S

Orthologs = (a,d) (c,d) Paralogs = (a,c) (b, d)

[Figure: species tree S with leaves a, b, d, c; gene tree G with leaves a, c, b, d]

slide-27
SLIDE 27

Consistency with a species tree S

Orthologs = (a,d) (c,d) Paralogs = (a,c) (b, d)

[Figure: species tree S with leaves a, b, d, c; gene tree G with leaves a, c, b, d and a Speciation node marked]

slide-28
SLIDE 28

Self-consistency

Orthologs = (a,d) (c,d)
Paralogs = (a,c) (b,d)

Can we build a gene tree G displaying these relationships such that there exists some species tree S consistent with it ?

slide-29
SLIDE 29

Self-consistency

Orthologs = (a,d) (c,d) Paralogs = (a,c) (b, d)

[Figure: gene tree G with leaves a, c, d, b and a Speciation node marked]

slide-30
SLIDE 30

Self-consistency

Orthologs = (a,d) (c,d) Paralogs = (a,c) (b, d)

[Figure: gene tree G with leaves a, c, d, b and a Speciation node marked; species tree S with leaves a, c, d, b]

slide-31
SLIDE 31

Not self-consistent

[Figure: species tree S with leaves a, b, c; gene tree G with leaves a1, b1, c1, b2, a2, c2]

slide-32
SLIDE 32

Not self-consistent

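[Figure: species tree S with leaves a, b, c; gene tree G with leaves a1, b1, c1, b2, a2, c2; second species tree S’ with leaves b, a, c]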

slide-33
SLIDE 33

The problem(s)

Given a set C of orthologs and paralogs :

  1. Is C satisfiable ?
     Does there exist a DS-tree (a gene tree with a Dup/Spec labeling) that exhibits all relationships in C ?
  2. Is C consistent with a given species tree S ?
     Is there some DS-tree that satisfies C and is also consistent with S ?
  3. Is C self-consistent ?
     Is there some species tree that C is consistent with ?

slide-34
SLIDE 34

Satisfiability

Orthologs = (a,b) (a,c) (c,d)
Paralogs = (a,d) (b,c) (b,d)

[Figure: constraint graph R on genes a, b, c, d, with orthology edges and paralogy edges drawn in different colours]

slide-35
SLIDE 35

Satisfiability

Orthologs = (a,b) (a,c) (c,d)
Paralogs = (a,d) (b,c) (b,d)

[Figure: constraint graph R on genes a, b, c, d, split into RO (orthology edges only) and RP (paralogy edges only)]

slide-36
SLIDE 36

Satisfiability

(Hernandez-Rosales et al., 2012) If R is a complete graph, then the given set of relationships is satisfiable iff RO is P4-free, i.e. RO contains no induced path on four vertices (equivalently, iff RP is P4-free).

[Figure: constraint graph R on genes a, b, c, d, with its orthology subgraph RO and paralogy subgraph RP]
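
As a rough illustration of the P4-free test, here is a brute-force Python sketch applied to the four-gene example Orthologs = (a,b) (a,c) (c,d), Paralogs = (a,d) (b,c) (b,d). The function name is hypothetical, and the exhaustive search over 4-subsets is chosen for readability, not efficiency.

```python
from itertools import combinations, permutations


def has_induced_p4(vertices, edges):
    """Return True if the graph contains an induced path on 4 vertices."""
    edge_set = {frozenset(edge) for edge in edges}

    def adjacent(u, v):
        return frozenset((u, v)) in edge_set

    for quad in combinations(vertices, 4):
        for w, x, y, z in permutations(quad):
            is_path = adjacent(w, x) and adjacent(x, y) and adjacent(y, z)
            has_chord = adjacent(w, y) or adjacent(w, z) or adjacent(x, z)
            if is_path and not has_chord:
                return True
    return False


genes = ["a", "b", "c", "d"]
orthologs = [("a", "b"), ("a", "c"), ("c", "d")]
paralogs = [("a", "d"), ("b", "c"), ("b", "d")]

print(has_induced_p4(genes, orthologs))  # True: b - a - c - d is an induced P4
print(has_induced_p4(genes, paralogs))   # True: a - d - b - c is an induced P4
```

Both subgraphs contain an induced P4 here, which agrees with the complete four-gene example being unsatisfiable.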

slide-37
SLIDE 37

Unknown relationships

Orthologs = (a,b) (a,c) (c,d)
Paralogs = (a,d) (b,d)

[Figure: constraint graph R on genes a, b, c, d]

The (b,c) relationship is unknown. Our relationships are satisfiable iff we can decide the (b,c) relationship such that RO is P4-free.

slide-39
SLIDE 39

Unknown relationships

Orthologs = (a,b) (a,c) (c,d)
Paralogs = (a,d) (b,d)

[Figure: constraint graph R on genes a, b, c, d]

The (b,c) relationship is unknown. Our relationships are satisfiable iff we can decide the (b,c) relationship such that RO is P4-free.

This problem is equivalent to the Graph Sandwich Problem on the class of cographs.

slide-40
SLIDE 40

Satisfiability

Theorem (Golumbic, Kaplan and Shamir, 1994) : A relationship graph R is satisfiable iff at least one of the following holds :

  1) RO is disconnected, and each of its components is satisfiable
  2) RP is disconnected, and each of its components is satisfiable
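
The recursion suggested by the theorem can be sketched in Python as follows. This is an illustrative decision procedure, not the authors' implementation: the function names are hypothetical, a single gene is assumed to be trivially satisfiable (the base case is not spelled out on the slide), and unknown pairs are simply absent from both edge lists.

```python
def components(vertices, edges):
    """Connected components of the graph induced on `vertices`, as a list of sets."""
    adjacency = {v: set() for v in vertices}
    for u, v in edges:
        if u in adjacency and v in adjacency:  # ignore edges leaving the vertex set
            adjacency[u].add(v)
            adjacency[v].add(u)
    seen, parts = set(), []
    for start in vertices:
        if start in seen:
            continue
        part, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in part:
                part.add(node)
                stack.extend(adjacency[node] - part)
        seen |= part
        parts.append(part)
    return parts


def satisfiable(vertices, orthologs, paralogs):
    """Decide satisfiability by recursing on components of RO, then of RP."""
    vertices = set(vertices)
    if len(vertices) <= 1:
        return True  # assumed base case: a single gene is trivially satisfiable
    for edges in (orthologs, paralogs):
        parts = components(vertices, edges)
        if len(parts) > 1:
            return all(satisfiable(part, orthologs, paralogs) for part in parts)
    return False  # RO and RP are both connected


orthologs = [("a", "b"), ("a", "c"), ("c", "d")]
print(satisfiable("abcd", orthologs, [("a", "d"), ("b", "d")]))              # True
print(satisfiable("abcd", orthologs, [("a", "d"), ("b", "c"), ("b", "d")]))  # False
```

Returning as soon as one of the two graphs is disconnected is safe because any induced subgraph of a satisfiable graph is itself satisfiable, so a failing component rules out every other decomposition as well.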

slide-41
SLIDE 41

Constructing a gene tree

[Figure: relationship graph R on genes a, b, c, d, e, f, g]

slide-42
SLIDE 42

Constructing a gene tree

[Figure: relationship graph R on genes a, b, c, d, e, f, g]

RP is connected, nothing to do here.

slide-43
SLIDE 43

Constructing a gene tree

[Figure: relationship graph R on genes a, b, c, d, e, f, g, with X and Y marked]

RO has 2 components, X and Y.

slide-44
SLIDE 44

Constructing a gene tree

[Figure: relationship graph R on genes a, b, c, d, e, f, g, with X and Y marked]

RO has 2 components, X and Y. All edges going from X to Y are either black or blue (paralogy or unknown).

slide-45
SLIDE 45

Constructing a gene tree

[Figure: relationship graph R on genes a, b, c, d, e, f, g, with X and Y marked]

RO has 2 components, X and Y. All edges going from X to Y are either black or blue (paralogy or unknown). Make it all blue !

slide-46
SLIDE 46

Constructing a gene tree

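[Figure: relationship graph R on genes a, b, c, d, e, f, g, with X and Y marked]

Now, all genes of X are paralogous to all genes of Y. We can start building our gene tree as such :

[Figure: gene tree whose root is a Duplication with two subtrees, X and Y]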

slide-47
SLIDE 47

Constructing a gene tree

Repeat with X, and Y.

[Figure: gene tree with subtrees X and Y; subgraphs RO[X] and RP[X] on X = {a, b, c}]

slide-48
SLIDE 48

Constructing a gene tree

Repeat with X, and Y.

[Figure: gene tree with leaves a, b, c in place of X; subgraphs RO[X] and RP[X] on X = {a, b, c}]

slide-49
SLIDE 49

Constructing a gene tree

Repeat with X, and Y.

[Figure: gene tree with X = {a, b, c} resolved and Y still to be resolved]

slide-50
SLIDE 50

Constructing a gene tree

Repeat with X, and Y.

[Figure: gene tree with X = {a, b, c} resolved; Y = {d, e, f, g} decomposed using RP[Y]]

slide-51
SLIDE 51

Constructing a gene tree

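[Figure: resulting gene tree G with leaves a, b, c, e, g, f, d, shown next to the relationship graph R on genes a, b, c, d, e, f, g]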
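
The construction walked through above can be sketched as a recursive Python function. Since the seven-gene graph is not recoverable from this transcript, the sketch is run on the four-gene example with (b,c) unknown; the names and the nested-tuple tree encoding are assumptions made for illustration.

```python
def components(vertices, edges):
    """Connected components of the graph induced on `vertices`, as a list of sets."""
    adjacency = {v: set() for v in vertices}
    for u, v in edges:
        if u in adjacency and v in adjacency:
            adjacency[u].add(v)
            adjacency[v].add(u)
    seen, parts = set(), []
    for start in vertices:
        if start in seen:
            continue
        part, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in part:
                part.add(node)
                stack.extend(adjacency[node] - part)
        seen |= part
        parts.append(part)
    return parts


def build_ds_tree(vertices, orthologs, paralogs):
    """Return a DS-tree as (label, children) / leaf name, or None if unsatisfiable."""
    vertices = sorted(vertices)
    if len(vertices) == 1:
        return vertices[0]
    # RO disconnected: its components are joined under a Duplication node.
    # RP disconnected: its components are joined under a Speciation node.
    for edges, label in ((orthologs, "Dup"), (paralogs, "Spec")):
        parts = components(vertices, edges)
        if len(parts) > 1:
            children = [build_ds_tree(part, orthologs, paralogs) for part in parts]
            if any(child is None for child in children):
                return None
            return (label, children)
    return None  # both graphs connected: no valid root exists


orthologs = [("a", "b"), ("a", "c"), ("c", "d")]
paralogs = [("a", "d"), ("b", "d")]
tree = build_ds_tree("abcd", orthologs, paralogs)
print(tree)
```

On this input the root is a speciation separating c from {a, b, d}, which amounts to deciding the unknown (b,c) pair as an orthology.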

slide-52
SLIDE 52

Consistency with a species tree S

[Figure: species tree S with leaves a, b, c, d, g, e, f; gene tree G with leaves a, b, c, e, g, f, d]

slide-53
SLIDE 53

Consistency with a species tree

Consistency with S: If genes from species sets X, Y are separated by speciation in G, then species X, Y are separated in S.

[Figure: species tree S with leaves a, b, c, d, g, e, f; gene tree G with leaves a, b, c, e, g, f, d]
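
One way to make this definition concrete is the Python sketch below. It reads "separated in S" as: the smallest clades of S containing X and Y are disjoint. That reading, the tree encodings, and the example species tree S are all assumptions for illustration. Both gene trees display Orthologs = (a,d) (c,d) and Paralogs = (a,c) (b,d), with one gene per species.

```python
def leaf_set(tree):
    """Leaves of a species tree given as nested tuples of species names."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaf_set(child) for child in tree))


def clade_of(tree, species):
    """Leaf set of the smallest subtree of `tree` containing all of `species`."""
    if not isinstance(tree, str):
        for child in tree:
            if species <= leaf_set(child):
                return clade_of(child, species)
    return leaf_set(tree)


def species_below(node, species_of):
    """Species of the genes below a gene-tree node (label, children) or leaf."""
    if isinstance(node, str):
        return {species_of[node]}
    return set().union(*(species_below(child, species_of) for child in node[1]))


def is_consistent(node, species_tree, species_of):
    """Check every Speciation node of the gene tree against the species tree."""
    if isinstance(node, str):
        return True
    label, children = node
    if label == "Spec":
        clades = [clade_of(species_tree, species_below(c, species_of)) for c in children]
        for i, first in enumerate(clades):
            for second in clades[i + 1:]:
                if first & second:  # overlapping clades: not separated in S
                    return False
    return all(is_consistent(c, species_tree, species_of) for c in children)


S = ((("a", "c"), "d"), "b")                      # hypothetical species tree
species_of = {gene: gene for gene in "abcd"}      # one gene per species
g1 = ("Dup", [("Spec", [("Dup", ["a", "c"]), "d"]), "b"])
g2 = ("Spec", [("Dup", ["a", "c"]), ("Dup", ["d", "b"])])
print(is_consistent(g1, S, species_of))  # True
print(is_consistent(g2, S, species_of))  # False: S does not separate {a,c} from {d,b}
```

The two trees satisfy the same constraints but differ in consistency, so the choice of decomposition during construction matters.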

slide-54
SLIDE 54

Consistency with a species tree

[Figure: species tree S with leaves a, b, c, d, g, e, f; gene tree G with leaves a, b, c, e, g, f, d]

Inconsistent !

Consistency with S: If genes from species sets X, Y are separated by speciation in G, then species X, Y are separated in S.

slide-55
SLIDE 55

Careful component selection

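Problem: at the step for Y, we chose to separate {e,g} from {f,d} by speciation, contradicting S.

[Figure: subgraph RP[Y] on genes d, e, g, f; the chosen subtree with leaves e, g, f, d; species tree S with leaves a, b, c, d, g, e, f]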

slide-56
SLIDE 56

Careful component selection

[Figure: species tree S with leaves a, b, d, c; relationship graph on genes a, b, c, d]

slide-57
SLIDE 57

Careful component selection

[Figure: species tree S with leaves a, b, d, c; relationship graph on genes a, b, c, d and its paralogy subgraph RP]

slide-58
SLIDE 58

Careful component selection

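[Figure: species tree S with leaves a, b, d, c; relationship graph on genes a, b, c, d and its paralogy subgraph RP; gene tree with leaves a, c, d, b]

NOT CAREFUL : S does not separate {a,c} from {b}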

slide-59
SLIDE 59

Careful component selection

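[Figure: species tree S with leaves a, b, d, c; relationship graph on genes a, b, c, d and its paralogy subgraph RP; gene tree with leaves a, c, d, b]

CAREFUL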

slide-61
SLIDE 61

Consistency with S

Theorem : A relationship graph R is consistent with S iff at least one of the following holds :

  1) RO is disconnected, and each of its components is consistent with S
  2) RP is disconnected, its components admit a non-trivial speciation partition P, and each member of P is consistent with S

slide-62
SLIDE 62

Self-consistency

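[Figure: species tree S with leaves a, b, c; gene tree G with leaves a1, b1, c1, b2, a2, c2; second species tree S’ with leaves b, a, c]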

slide-63
SLIDE 63

Self-consistency

Is there some gene tree G that satisfies R, such that some species tree S is consistent with G ?

The complexity of the problem is open…

slide-64
SLIDE 64

Self-consistency

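Suppose we have all relationships. Every triangle with exactly one blue edge forces a triplet in the gene tree, and consequently in the species tree.

[Figure: a triangle on genes a1, b1, c1; the forced triplet on a1, b1, c1 in G; the corresponding triplet on species a, b, c in S]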

slide-65
SLIDE 65

Self-consistency

Theorem : a full (no unknowns), satisfiable relationship graph R is self-consistent (consistent with some species tree) iff all triplets forced by one-blue-edge triangles can be displayed together in the same species tree.
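
The triplet test can be sketched in Python as below. Two assumptions are made: "blue" is read as paralogy (as in the construction slides, where making the X-Y edges blue makes the genes paralogous), and compatibility of the triplets is checked with a plain version of the classic BUILD procedure. The gene names, the species mapping and the relation lists are hypothetical, and only two triangles are given for brevity.

```python
from itertools import combinations


def forced_triplets(orthologs, paralogs, species_of):
    """Species triplets (x, y, z), read as xy|z, forced by one-paralogy-edge triangles."""
    ortho = {frozenset(pair) for pair in orthologs}
    para = {frozenset(pair) for pair in paralogs}
    genes = sorted({gene for pair in ortho | para for gene in pair})
    triplets = set()
    for trio in combinations(genes, 3):
        if len({species_of[gene] for gene in trio}) < 3:
            continue  # only genes from three distinct species constrain S
        pairs = [frozenset(pair) for pair in combinations(trio, 2)]
        para_pairs = [pair for pair in pairs if pair in para]
        ortho_pairs = [pair for pair in pairs if pair in ortho]
        if len(para_pairs) == 1 and len(ortho_pairs) == 2:
            x, y = sorted(para_pairs[0])
            (z,) = set(trio) - para_pairs[0]
            triplets.add((species_of[x], species_of[y], species_of[z]))
    return triplets


def build(species, triplets):
    """BUILD: return a nested-tuple tree displaying all triplets, or None."""
    species = set(species)
    if len(species) == 1:
        return next(iter(species))
    adjacency = {s: set() for s in species}
    for x, y, z in triplets:
        if {x, y, z} <= species:  # triplet xy|z keeps x and y on the same side
            adjacency[x].add(y)
            adjacency[y].add(x)
    parts, seen = [], set()
    for start in species:
        if start in seen:
            continue
        part, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in part:
                part.add(node)
                stack.extend(adjacency[node] - part)
        seen |= part
        parts.append(part)
    if len(parts) == 1:
        return None  # the triplets cannot be displayed together
    subtrees = [build(part, triplets) for part in parts]
    return None if any(t is None for t in subtrees) else tuple(subtrees)


species_of = {gene: gene[0] for gene in ("a1", "b1", "c1", "a2", "b2", "c2")}
orthologs = [("a1", "c1"), ("b1", "c1"), ("a2", "b2"), ("a2", "c2")]
paralogs = [("a1", "b1"), ("b2", "c2")]
triplets = forced_triplets(orthologs, paralogs, species_of)
print(sorted(triplets))        # [('a', 'b', 'c'), ('b', 'c', 'a')]
print(build("abc", triplets))  # None: ab|c and bc|a conflict
```

Dropping the second triangle leaves only ab|c, for which BUILD returns a tree, so the conflict comes from the two triangles together.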

slide-66
SLIDE 66

Self-consistency

Theorem : a full (no unknowns), satisfiable relationship graph R is self-consistent (consistent with some species tree) iff all triplets forced by one-blue-edge triangles can be displayed together in the same species tree.

Branch-and-bound algorithm with unknown edges :
  • Try both possibilities with every unknown edge e.
  • At every choice, run BUILD on the forced triplets.
  • If BUILD fails, stop exploring this branch and try some other choice.

slide-67
SLIDE 67

Experiments

We looked at 265 inferred families from ProteinOrtho, under 5 parameter sets {-2, -1, 0, +1, +2}.

Looser => More orthologies
Stricter => Fewer orthologies

[Figure: the five parameter sets -2, -1, Default, +1, +2]

slide-69
SLIDE 69

Experiments

Looser => More orthologies
Stricter => Fewer orthologies

[Figure: the five parameter sets -2, -1, Default, +1, +2]

Satisfiable ?
Consistent ?

slide-70
SLIDE 70

Experiments

Looser => More orthologies
Stricter => Fewer orthologies

[Figure: the five parameter sets -2, -1, Default, +1, +2]

Satisfiable ? NO (~90% of families)
Consistent ? NO (~96% of families)

slide-71
SLIDE 71

Experiments

Looser => More orthologies
Stricter => Fewer orthologies

Parameter set      -2    -1    Default   +1    +2
NOT Satisfiable    80%   82%   90%       83%   70%
NOT Consistent     93%   95%   96%       95%   89%

slide-72
SLIDE 72

Experiments

Looser => More orthologies
Stricter => Fewer orthologies

[Figure: the five parameter sets -2, -1, Default, +1, +2]

Can we get some robust relationships out of these ?
slide-74
SLIDE 74

Experiments

[Figure: parameter sets -2 and +2]

Keep the common orthologies and paralogies. The rest is unknown.

slide-75
SLIDE 75

Experiments

When combining +2/-2 as such, we find that these partial relationships are
  • satisfiable for 98% of families
  • consistent for 65% of families

On average, 42% of all possible relationships are known.

[Figure: parameter sets -2 and +2]

Keep the common orthologies and paralogies. The rest is unknown.
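
A small Python sketch of this combination step, assuming each run is available as two sets of gene pairs. The input format, the function name and the toy data are hypothetical, not ProteinOrtho output.

```python
from itertools import combinations


def combine(genes, run_a, run_b):
    """Keep only the relationships on which both runs agree; the rest is unknown."""
    orthologs = run_a["orthologs"] & run_b["orthologs"]
    paralogs = run_a["paralogs"] & run_b["paralogs"]
    all_pairs = {frozenset(pair) for pair in combinations(genes, 2)}
    unknown = all_pairs - orthologs - paralogs
    known_fraction = 1 - len(unknown) / len(all_pairs)
    return orthologs, paralogs, unknown, known_fraction


def pairs(text):
    """Helper for the toy data: 'ab cd' -> {frozenset('ab'), frozenset('cd')}."""
    return {frozenset(word) for word in text.split()}


strict = {"orthologs": pairs("ab"), "paralogs": pairs("ac ad bc bd cd")}
loose = {"orthologs": pairs("ab ac cd"), "paralogs": pairs("ad bc bd")}

orthologs, paralogs, unknown, known_fraction = combine("abcd", strict, loose)
print(sorted("".join(sorted(p)) for p in unknown))  # ['ac', 'cd']
print(round(known_fraction, 2))                     # 0.67
```

The known fraction computed here corresponds to the share of all possible relationships that remain decided after the intersection.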

slide-76
SLIDE 76

Experiments

Combined parameter sets (chart labels) : -1/+2, -1/+1, -2/+1, -2/+2

NOT Satisfiable : 1.9% 2.6% 4.2% 4.1%
NOT Consistent : 35.1% 35.1% 44.8% 40.8%

slide-77
SLIDE 77

Conclusion

  • Gene tree correction
      • Given a set of consistent orthologs/paralogs, modify G such that it exhibits the relationships
      • Multiple solutions… how to choose one or list them ?
  • Complexity O(n^3) for satisfiability and consistency with a species tree
      • Can we do better ?
  • Complexity of self-consistency : ????