SLIDE 1

Sophisticated models in Bio++

Julien Dutheil, Bastien Boussau

Birc, Aarhus; LBBE, Lyon

Friday, December 19th 2008

  • J. Dutheil, B. Boussau (Birc; LBBE)

Models in Bio++ 19/12/08 1 / 13

SLIDE 3

Models of sequence evolution

A tree (figure: tree with leaves a, b, c)

A model of substitution

SLIDE 4

Models of substitution in Bio++

  • for proteins and nucleic acids (codons: soon!)
  • with a gamma distribution to account for heterogeneity of evolutionary rates between sites
  • possibility for a class of invariant sites
  • possibility for covarion (heterotachous) models:
      • on-off models (Tuffley and Steel 1998)
      • changes between rates of evolution (Galtier 2001)

SLIDE 6

Homogeneous and branch-heterogeneous models in Bio++

Homogeneous model (figure: tree with leaves a, b, c)

Heterogeneous model (figure: tree with leaves a, b, c)

SLIDE 7

A simple model of substitution: Tamura’s (1992)

  • κ: Transition/transversion ratio
  • θ: Equilibrium G+C content

Galtier and Gouy, Mol. Biol. Evol. 1998.
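
The two parameters above are enough to write down the substitution rate matrix. The following is a minimal sketch in plain Python, not Bio++ code: it assumes the usual T92 parameterization in which the rate from base x to base y is the equilibrium frequency of y, multiplied by κ when the change is a transition, and it leaves the matrix unnormalized. The function names are hypothetical.

```python
NUCLEOTIDES = "ACGT"
PURINES = {"A", "G"}


def t92_frequencies(theta):
    """Equilibrium base frequencies implied by the G+C content theta."""
    return {"A": (1 - theta) / 2, "C": theta / 2,
            "G": theta / 2, "T": (1 - theta) / 2}


def is_transition(x, y):
    """True when x -> y stays within purines (A, G) or within pyrimidines (C, T)."""
    return (x in PURINES) == (y in PURINES)


def t92_rate_matrix(kappa, theta):
    """Unnormalized T92 rate matrix as a nested dict q[from][to]."""
    pi = t92_frequencies(theta)
    q = {}
    for x in NUCLEOTIDES:
        # Off-diagonal rates: target frequency, scaled by kappa for transitions.
        row = {y: (kappa if is_transition(x, y) else 1.0) * pi[y]
               for y in NUCLEOTIDES if y != x}
        # Diagonal entry makes each row sum to zero.
        row[x] = -sum(row.values())
        q[x] = row
    return q


if __name__ == "__main__":
    q = t92_rate_matrix(kappa=2.0, theta=0.6)
    for x in NUCLEOTIDES:
        print(x, [round(q[x][y], 3) for y in NUCLEOTIDES])
```

With θ = 0.5 and κ = 1 all off-diagonal rates are equal, which is the Jukes-Cantor special case up to scaling.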

SLIDE 8

Galtier and Gouy model of sequence evolution (1998)

Model (figure: tree with leaves a, b, c)

  • 1 model per branch
  • each model is characterized by an equilibrium G+C content

Parameters

SLIDE 9

Models in Bio++

General non-homogeneous model of substitution. In the homogeneous case, θ and κ are constant over the tree (case 'a'). In Galtier and Gouy's 1998 model, κ is constant over the tree and one distinct θ is allowed per branch (case 'b'). Between these two extremes lie models in which some branches, but not all, share a common value of θ (case 'c'). In the most general case ('d'), there are two sets of parameter values, one for κ and another for θ, and each value is shared by some of the branches of the tree.
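
One way to picture the four cases is as a table assigning to each branch a κ label and a θ label, where equal labels mean a shared value. The sketch below is only an illustration of that bookkeeping, not the Bio++ data structure: the branch names `b1` to `b4`, the parameter labels, and the particular groupings chosen for cases 'c' and 'd' are all hypothetical.

```python
def count_free_parameters(assignment):
    """Return (number of distinct kappa values, number of distinct theta values)."""
    kappas = {kappa for kappa, _ in assignment.values()}
    thetas = {theta for _, theta in assignment.values()}
    return len(kappas), len(thetas)


BRANCHES = ["b1", "b2", "b3", "b4"]  # hypothetical branch labels

# Case 'a': homogeneous, one kappa and one theta for the whole tree.
case_a = {b: ("kappa1", "theta1") for b in BRANCHES}

# Case 'b': Galtier and Gouy (1998), shared kappa, one theta per branch.
case_b = {b: ("kappa1", "theta_" + b) for b in BRANCHES}

# Case 'c': shared kappa, some branches (here b1 and b2) share a theta.
case_c = {"b1": ("kappa1", "theta1"), "b2": ("kappa1", "theta1"),
          "b3": ("kappa1", "theta2"), "b4": ("kappa1", "theta3")}

# Case 'd': kappa and theta values are shared independently across branches.
case_d = {"b1": ("kappa1", "theta1"), "b2": ("kappa1", "theta2"),
          "b3": ("kappa2", "theta1"), "b4": ("kappa2", "theta2")}

if __name__ == "__main__":
    for name, case in [("a", case_a), ("b", case_b), ("c", case_c), ("d", case_d)]:
        print(name, count_free_parameters(case))
```

Counting distinct labels gives the number of substitution parameters each case adds to the branch lengths, which is what a comparison between nested cases needs.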

SLIDE 10

Associating models to branches

SLIDE 11

Bio++ and BppSuite

BppSuite is a set of programs implementing various methods for the evolutionary study of sequences:

  • BppDist: distance estimation and tree reconstruction
  • BppPars: parsimony analyses
  • BppML: ML reconstruction of phylogenetic trees, including using non-homogeneous models
  • BppSeqGen: sequence simulation, including using non-homogeneous models
  • BppAncestor: ancestral sequence reconstruction, including using non-homogeneous models
  • BppSeqMan: sequence and alignment manipulation
  • BppConsense: building of consensus trees
  • BppPhySamp: select sequences according to a tree or a distance matrix
  • BppReRoot: automatic re-rooting of trees

SLIDE 12

Specifying options of BppSuite programs

Launching an analysis with bppml

Example:

    bppml param=fichier.opt

fichier.opt:

    alphabet = DNA
    sequence.file = sequences.fasta
    sequence.format = Fasta
    sequence.sites_to_use = complete
    tree.file = tree.dnd
    etc...
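
The option file is a list of `key = value` lines. To check a partially filled file before launching an analysis, a few lines of Python can read it into a dictionary. This is a rough sketch and not the parser used by BppSuite: it assumes one option per line and treats lines starting with `#` as comments, and the real syntax may support more than that. The function name `parse_options` is hypothetical.

```python
def parse_options(text):
    """Parse 'key = value' lines into a dict, skipping blanks and '#' comments."""
    options = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        if not separator:
            # Not a key = value line; ignore it in this sketch.
            continue
        options[key.strip()] = value.strip()
    return options


EXAMPLE = """\
# options taken from the example above
alphabet = DNA
sequence.file = sequences.fasta
sequence.format = Fasta
tree.file = tree.dnd
"""

if __name__ == "__main__":
    for key, value in parse_options(EXAMPLE).items():
        print(key, "->", value)
```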

SLIDE 13

Associating models to branches in BppSuite

SLIDE 14

Exercise

  • THE DATA: A well-known scientist is working on a family of homologous genes (file "sequences.fasta"). These sequences come from closely related species and have been named according to their species of origin: S vulg, S con, S dio, S lat, S dic. In species S dio, S lat, and S dic specifically, the gene is found on the sex chromosomes. For each of these species, the alignment thus contains two sequences, one from the X chromosome (X is appended to the name) and one from the Y chromosome (Y is appended to the name). The famous scientist has built a rooted phylogenetic tree relating all sequences in his dataset (file "tree.dnd").
  • THE PROBLEM: The scientist suspects that some Biased Gene Conversion (BGC) may have occurred on the branch leading to the group containing sequences S dioY, S latY, and S dicY. BGC is expected to increase the number of substitutions towards bases G and C. Your aim is to test for the presence of BGC on this branch.

SLIDE 15

Exercise

  • THE AIMS:
      • Using bppML, devise a test to see whether the data rejects BGC on this branch.
      • Meanwhile, try to accurately characterize the evolution in this dataset. Is there significant rate heterogeneity? Covarion-like evolution? How important has process heterogeneity been in the evolution of this dataset?
  • THE METHOD:
      • Option files have been partially filled in. You need to complete them to build a proper model for hypothesis 0 (model 0: there was no heterogeneity in the evolution of the dataset), hypothesis 1 (model 1: there has been one significant change in the evolutionary process on one particular branch), and hypothesis 2 (model 2: the evolution has been globally heterogeneous, with different processes on different branches).
      • Play with the options to better characterize sequence evolution.
      • Use likelihood ratio tests to compare hypotheses.
  • BONUS QUESTION:
      • Think of another way to test whether the evolutionary process has been particular on the branch of interest. BppSuite may be useful once again; you may need to do a little bit of programming.
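
For the likelihood ratio tests mentioned in the method, the arithmetic can be sketched in a few lines of Python. This assumes the two models are nested and that twice the log-likelihood difference is compared with a chi-square distribution whose degrees of freedom equal the number of extra free parameters. The log-likelihood values below are placeholders for illustration, not results from the exercise data, and the function names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper tail P(X >= x) of a chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start from the closed form for df = 2 (even) or df = 1 (odd) ...
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # ... then apply Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(lnl_null, lnl_alt, extra_params):
    """Return (statistic, p-value) for a nested null vs. alternative model."""
    statistic = 2.0 * (lnl_alt - lnl_null)
    return statistic, chi2_sf(statistic, extra_params)


if __name__ == "__main__":
    # Placeholder log-likelihoods; replace with the values reported for each model.
    stat, p = likelihood_ratio_test(lnl_null=-1234.5, lnl_alt=-1230.0, extra_params=1)
    print(f"2 * delta lnL = {stat:.2f}, p = {p:.4f}")
```

When comparing model 0 with model 1, the number of extra parameters would be the number of additional substitution parameters allowed on the branch of interest (for example 1 if only θ differs there).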
