
SLIDE 1

Multiple sequence alignments and phylogenetic trees

SLIDE 2

Multiple sequence alignment (MSA)

SLIDE 3

Software to generate MSAs

  • MAFFT (very good, very fast): http://mafft.cbrc.jp/alignment/software/
  • Clustal Omega (very good, very fast): http://www.ebi.ac.uk/Tools/msa/clustalo/
  • PRANK (extremely good, very slow): http://wasabiapp.org/software/prank/

SLIDE 4

File formats: FASTA (holds any sequence data)

>human
MNGTEGPNFYVPFSNATGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLY
VTVQHKKLRTPLNYILLNLAVADLFMVLGGFTSTLYTSLHGYFVFGPTGCNLEGFFATLG
YNPVIYIMMNKQFRNCMLTTICCGKNPLGDDEASATVSKTETSQVAPA
>domestic_cat
MNGTEGPNFYVPFSNKTGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLY
VTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLG
YNPVIYIMMNKQFRNCMLTTLCCGKNPLGDDEASTTASKTETSQVAPA
>chimpanzee
MNGTEGPNFYVPFSNATGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLY
VTVQHKKLRTPLNYILLNLAVADLFMVLGGFTSTLYTSLHGYFVFGPTGCNLEGFFATLG
YNPVIYIMMNKQFRNCMLTTICCGKNPLGDDEASATVSKTETSQVAPA

Each record: a label line starting with ">" (1 line), followed by the sequence (one or more lines).

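A minimal sketch of reading FASTA with only the Python standard library, assuming well-formed input where every record starts with a ">" line. The function name `parse_fasta` is hypothetical; for real work, Biopython's `SeqIO.parse(handle, "fasta")` is the usual choice. In this sketch, duplicate labels would silently overwrite each other.

```python
def parse_fasta(text):
    """Return {label: sequence} for FASTA-formatted text.

    The label is everything after '>' on a header line; the sequence lines
    that follow are concatenated until the next header.
    """
    records = {}
    label = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            if label is not None:
                records[label] = "".join(chunks)
            label, chunks = line[1:].strip(), []
        elif label is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            chunks.append(line)
    if label is not None:
        records[label] = "".join(chunks)
    return records


example = """>human
MNGTEGPNFYVPFSNATGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLY
VTVQHKKLRTPLNYILLNLAVADLFMVLGGFTSTLYTSLHGYFVFGPTGCNLEGFFATLG
YNPVIYIMMNKQFRNCMLTTICCGKNPLGDDEASATVSKTETSQVAPA
>domestic_cat
MNGTEGPNFYVPFSNKTGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLY
VTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLG
YNPVIYIMMNKQFRNCMLTTLCCGKNPLGDDEASTTASKTETSQVAPA
"""

seqs = parse_fasta(example)
for name, seq in seqs.items():
    print(name, len(seq))  # each record is 168 residues long
```

The multi-line layout is only for readability: the parser joins the lines, so a sequence split into 60-character lines and one written on a single line give the same result.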
SLIDE 5

File formats: Clustal (holds an alignment)

CLUSTAL O(1.2.1) multiple sequence alignment

human           MNGTEGPNFYVPFSNATGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLY
chimpanzee      MNGTEGPNFYVPFSNATGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLY
domestic_cat    MNGTEGPNFYVPFSNKTGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLY
                *************** ********************************************

human           VTVQHKKLRTPLNYILLNLAVADLFMVLGGFTSTLYTSLHGYFVFGPTGCNLEGFFATLG
chimpanzee      VTVQHKKLRTPLNYILLNLAVADLFMVLGGFTSTLYTSLHGYFVFGPTGCNLEGFFATLG
domestic_cat    VTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLG
                ***************************:****:***************************

human           YNPVIYIMMNKQFRNCMLTTICCGKNPLGDDEASATVSKTETSQVAPA
chimpanzee      YNPVIYIMMNKQFRNCMLTTICCGKNPLGDDEASATVSKTETSQVAPA
domestic_cat    YNPVIYIMMNKQFRNCMLTTLCCGKNPLGDDEASTTASKTETSQVAPA
                ********************:*************:*.***********

Labels on the left, aligned sequences on the right. The line under each block marks how conserved each column is:
  • * = no variation (identical residue in every sequence)
  • : = highly similar amino acids
  • . = somewhat similar amino acids
  • (blank) = no conservation

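A sketch of how the conservation line can be computed, assuming Clustal's residue groups: a column whose residues all fall in one "strong" group gets ":", one "weak" group gets ".". The group lists below are the ones given for ClustalW/Clustal X; that Clustal Omega uses exactly these groups is an assumption, and giving any gapped column a blank is a simplification of this sketch.

```python
# Residue groups as listed for ClustalW/Clustal X; assumed (not verified here)
# to match what Clustal Omega uses for its conservation line.
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF",
                 "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK",
               "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def conservation_symbol(column):
    """Return '*', ':', '.' or ' ' for one alignment column given as a string."""
    residues = set(column.upper())
    if "-" in residues:
        return " "  # simplification: any gap means no symbol
    if len(residues) == 1:
        return "*"  # identical in every sequence
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"  # all residues fall in one strong group
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."  # all residues fall in one weak group
    return " "


def conservation_line(sequences):
    """Symbol line for a list of equal-length aligned sequences."""
    return "".join(conservation_symbol("".join(col)) for col in zip(*sequences))


human = "YNPVIYIMMNKQFRNCMLTTICCGKNPLGDDEASATVSKTETSQVAPA"
chimpanzee = "YNPVIYIMMNKQFRNCMLTTICCGKNPLGDDEASATVSKTETSQVAPA"
domestic_cat = "YNPVIYIMMNKQFRNCMLTTLCCGKNPLGDDEASTTASKTETSQVAPA"
print(conservation_line([human, chimpanzee, domestic_cat]))
# ********************:*************:*.***********
```

On the third block this reproduces the symbol line in the alignment: I/L and A/T fall in strong groups (MILV, STA), while V/A only shares the weak group ATV.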
SLIDE 6

File formats: Phylip (holds an alignment)

3 168
human      MNGTEGPNFY VPFSNATGVV RSPFEYPQYY LAEPWQFSML AAYMFLLIVL
chimpanzee MNGTEGPNFY VPFSNATGVV RSPFEYPQYY LAEPWQFSML AAYMFLLIVL
domestic_c MNGTEGPNFY VPFSNKTGVV RSPFEYPQYY LAEPWQFSML AAYMFLLIVL

GFPINFLTLY VTVQHKKLRT PLNYILLNLA VADLFMVLGG FTSTLYTSLH
GFPINFLTLY VTVQHKKLRT PLNYILLNLA VADLFMVLGG FTSTLYTSLH
GFPINFLTLY VTVQHKKLRT PLNYILLNLA VADLFMVFGG FTTTLYTSLH

GYFVFGPTGC NLEGFFATLG YNPVIYIMMN KQFRNCMLTT ICCGKNPLGD
GYFVFGPTGC NLEGFFATLG YNPVIYIMMN KQFRNCMLTT ICCGKNPLGD
GYFVFGPTGC NLEGFFATLG YNPVIYIMMN KQFRNCMLTT LCCGKNPLGD

DEASATVSKT ETSQVAPA
DEASATVSKT ETSQVAPA
DEASTTASKT ETSQVAPA

First line: number of sequences (3) and sequence length (168). Then each label (cut to 10 characters in strict Phylip, hence "domestic_c") followed by its sequence; later blocks continue the sequences in the same order, without labels.

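A sketch of reading the interleaved strict format above with the standard library: the header gives the counts, the first block carries each name in the first 10 columns, and later blocks cycle through the sequences in the same order. `read_phylip_interleaved` is a hypothetical name; Biopython's AlignIO reads this format as "phylip" (and "phylip-relaxed" for longer, whitespace-separated names).

```python
def read_phylip_interleaved(text):
    """Parse strict interleaved PHYLIP text into {name: sequence}."""
    lines = [line for line in text.splitlines() if line.strip()]
    n_seqs, length = (int(x) for x in lines[0].split()[:2])
    names, parts = [], []
    # First block: the name occupies the first 10 columns of each line.
    for line in lines[1:1 + n_seqs]:
        names.append(line[:10].strip())
        parts.append(["".join(line[10:].split())])
    # Later blocks carry no names: line i belongs to sequence i % n_seqs.
    for i, line in enumerate(lines[1 + n_seqs:]):
        parts[i % n_seqs].append("".join(line.split()))
    seqs = {name: "".join(chunks) for name, chunks in zip(names, parts)}
    for name, seq in seqs.items():
        if len(seq) != length:
            raise ValueError(f"{name}: expected {length} residues, got {len(seq)}")
    return seqs


example = """3 168
human      MNGTEGPNFY VPFSNATGVV RSPFEYPQYY LAEPWQFSML AAYMFLLIVL
chimpanzee MNGTEGPNFY VPFSNATGVV RSPFEYPQYY LAEPWQFSML AAYMFLLIVL
domestic_c MNGTEGPNFY VPFSNKTGVV RSPFEYPQYY LAEPWQFSML AAYMFLLIVL

GFPINFLTLY VTVQHKKLRT PLNYILLNLA VADLFMVLGG FTSTLYTSLH
GFPINFLTLY VTVQHKKLRT PLNYILLNLA VADLFMVLGG FTSTLYTSLH
GFPINFLTLY VTVQHKKLRT PLNYILLNLA VADLFMVFGG FTTTLYTSLH

GYFVFGPTGC NLEGFFATLG YNPVIYIMMN KQFRNCMLTT ICCGKNPLGD
GYFVFGPTGC NLEGFFATLG YNPVIYIMMN KQFRNCMLTT ICCGKNPLGD
GYFVFGPTGC NLEGFFATLG YNPVIYIMMN KQFRNCMLTT LCCGKNPLGD

DEASATVSKT ETSQVAPA
DEASATVSKT ETSQVAPA
DEASTTASKT ETSQVAPA
"""

seqs = read_phylip_interleaved(example)
print({name: len(seq) for name, seq in seqs.items()})
```

The length check against the header is what catches a file whose blocks are out of step, which is the most common way interleaved files go wrong.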
SLIDE 7

Tools exist to convert from one sequence format to another

  • Online: https://www.ebi.ac.uk/Tools/sfc/emboss_seqret/
  • In a script: use Biopython's SeqIO module

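With Biopython installed, converting an aligned FASTA file to Phylip is typically one call, `AlignIO.convert("aln.fasta", "fasta", "aln.phy", "phylip")` (file names hypothetical). To show what such a conversion involves, here is a stdlib-only sketch, `fasta_to_phylip` (a hypothetical name), that writes sequential strict Phylip: a count/length header, then each name cut or padded to 10 characters, then the sequence in groups of 10.

```python
def fasta_to_phylip(fasta_text, group=10):
    """Convert aligned FASTA text to sequential strict PHYLIP text."""
    records, label = {}, None
    for raw in fasta_text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            label = line[1:].strip()
            records[label] = ""
        elif line and label is not None:
            records[label] += line
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError("PHYLIP needs aligned sequences of equal length")
    if len({name[:10] for name in records}) != len(records):
        raise ValueError("names are not unique after cutting to 10 characters")
    out = [f"{len(records)} {lengths.pop()}"]
    for name, seq in records.items():
        blocks = [seq[i:i + group] for i in range(0, len(seq), group)]
        # Strict PHYLIP: the name field is exactly 10 characters wide.
        out.append(name[:10].ljust(10) + " " + " ".join(blocks))
    return "\n".join(out) + "\n"


fasta = """>human
MNGTEGPNFYVPFSNATGVV
>domestic_cat
MNGTEGPNFYVPFSNKTGVV
"""
print(fasta_to_phylip(fasta), end="")
```

The uniqueness check matters because cutting names to 10 characters can make two labels identical (for example two names sharing a 10-character prefix), which would silently merge taxa downstream.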
SLIDE 8

Storing trees: The Newick format

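((A,B),(C,D))   [tree: A and B are sister taxa; C and D are sister taxa]
(((A,B),C),D)   [tree: A and B are sister taxa, C joins them next, and D is sister to ((A,B),C)]

Each pair of parentheses encloses one clade. A complete Newick string ends with a semicolon, e.g. ((A,B),(C,D));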

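A small sketch of how nesting maps onto Newick, assuming a tree stored as nested Python tuples with leaf names as strings (a representation chosen only for illustration; `to_newick` is a hypothetical name). Branch lengths, which Newick writes as e.g. `A:0.1`, are left out.

```python
def to_newick(tree):
    """Write a tree given as nested tuples (leaves are strings) as Newick.

    Each tuple becomes one pair of parentheses, i.e. one clade.
    """
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(child) for child in tree) + ")"


balanced = (("A", "B"), ("C", "D"))
ladder = ((("A", "B"), "C"), "D")
print(to_newick(balanced) + ";")  # ((A,B),(C,D));
print(to_newick(ladder) + ";")    # (((A,B),C),D);
```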
SLIDE 9

What does this tree look like?

(A,((B,C),(D,E)),F)

SLIDE 10

What does this tree look like?

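(A,((B,C),(D,E)),F)
[tree: B and C are sister taxa, D and E are sister taxa, and the two pairs form one clade; A, F, and that clade all branch from the same basal node]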

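A sketch of reading a topology-only Newick string back into nested lists and printing it as an indented outline ("+" for an internal node, "-" for a leaf). `parse_newick` and `draw` are hypothetical names; the parser handles only plain leaf names, not branch lengths or internal node labels, which a library parser such as Biopython's `Bio.Phylo.read` would handle.

```python
def parse_newick(text):
    """Parse a topology-only Newick string into nested lists of leaf names."""
    text = text.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume '('
            children = [node()]
            while pos < len(text) and text[pos] == ",":
                pos += 1  # consume ','
                children.append(node())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ')'
            return children
        start = pos  # a leaf name runs until the next '(', ')' or ','
        while pos < len(text) and text[pos] not in "(),":
            pos += 1
        return text[start:pos].strip()

    tree = node()
    if pos != len(text):
        raise ValueError(f"unexpected text at position {pos}")
    return tree


def draw(tree, depth=0):
    """Indented outline: '+' marks an internal node, '- name' a leaf."""
    pad = "    " * depth
    if isinstance(tree, str):
        return [pad + "- " + tree]
    lines = [pad + "+"]
    for child in tree:
        lines.extend(draw(child, depth + 1))
    return lines


tree = parse_newick("(A,((B,C),(D,E)),F);")
print("\n".join(draw(tree)))
```

In the outline, the root has three children (A, the ((B,C),(D,E)) clade, and F), which is the basal three-way split described for this tree.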
SLIDE 11

Not all sites in an alignment contain information about the tree topology

A MNGTEG
B MNGYER
C MQGYDK
D MQGTDI

Uninformative: column 1 (M in all four sequences). A constant site cannot favor any grouping of the taxa.

SLIDE 12

Not all sites in an alignment contain information about the tree topology

A MNGTEG
B MNGYER
C MQGYDK
D MQGTDI

Informative: column 2 (N, N, Q, Q) groups A with B and C with D, supporting the tree ((A,B),(C,D)).

SLIDE 13

Not all sites in an alignment contain information about the tree topology

A MNGTEG
B MNGYER
C MQGYDK
D MQGTDI

Uninformative: column 3 (G in all four sequences).

SLIDE 14

Not all sites in an alignment contain information about the tree topology

A MNGTEG
B MNGYER
C MQGYDK
D MQGTDI

Informative: column 4 (T, Y, Y, T) groups A with D and B with C, supporting the tree ((C,B),(A,D)).

SLIDE 15

Not all sites in an alignment contain information about the tree topology

A MNGTEG
B MNGYER
C MQGYDK
D MQGTDI

Informative: column 5 (E, E, D, D), like column 2, supports the tree ((A,B),(C,D)).

SLIDE 16

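Not all sites in an alignment contain information about the tree topology

A MNGTEG
B MNGYER
C MQGYDK
D MQGTDI

Uninformative (in the simplest model): column 6 (G, R, K, I) has a different residue in every sequence, so every tree needs the same number of changes (three) to explain it.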

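Assuming "the simplest model" means parsimony (the tree needing the fewest changes wins), the usual rule is that a site is informative only if at least two different residues each appear in at least two sequences. A sketch applying that rule to the six columns of the example; `is_parsimony_informative` is a hypothetical name, and gaps, if present, would simply count as another state here.

```python
from collections import Counter


def is_parsimony_informative(column):
    """True if at least two different states each occur in >= 2 sequences."""
    counts = Counter(column)
    return sum(1 for n in counts.values() if n >= 2) >= 2


# Taxa A, B, C, D from the slides.
alignment = ["MNGTEG", "MNGYER", "MQGYDK", "MQGTDI"]
columns = ["".join(col) for col in zip(*alignment)]
for number, column in enumerate(columns, start=1):
    print(number, column, is_parsimony_informative(column))
```

This flags columns 2, 4 and 5 as informative and columns 1, 3 and 6 as uninformative, matching the column-by-column walk-through.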
SLIDE 17

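Not all sites in an alignment contain information about the tree topology

A MNGTEG
B MNGYER
C MQGYDK
D MQGTDI

Tree chosen by majority rule: ((A,B),(C,D)), supported by two informative sites (columns 2 and 5) against one site (column 4) for ((C,B),(A,D)).

How confident are we in a given tree topology?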

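A minimal sketch of the majority rule for this four-taxon example, assuming it means: each informative column votes for the grouping it supports, and the grouping with the most votes wins. Groupings are written as strings such as "AB|CD" (the tree ((A,B),(C,D))); "AD|BC" corresponds to ((C,B),(A,D)). Function names are hypothetical. For four taxa this vote count picks the same tree as maximum parsimony, but it does not carry over to larger trees.

```python
from collections import Counter

TAXA = "ABCD"


def split_for_column(column):
    """Return the 2|2 split a 4-taxon column supports (e.g. 'AB|CD'), or None."""
    if sorted(Counter(column).values()) != [2, 2]:
        return None  # only a 2+2 pattern is informative with four taxa
    with_a = "".join(t for t, r in zip(TAXA, column) if r == column[0])
    rest = "".join(t for t in TAXA if t not in with_a)
    return with_a + "|" + rest


def tally_splits(alignment):
    """Count how many informative columns support each split."""
    votes = Counter()
    for col in zip(*alignment):
        split = split_for_column("".join(col))
        if split is not None:
            votes[split] += 1
    return votes


alignment = ["MNGTEG", "MNGYER", "MQGYDK", "MQGTDI"]  # taxa A, B, C, D
votes = tally_splits(alignment)
print(votes.most_common())  # [('AB|CD', 2), ('AD|BC', 1)]
```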
SLIDE 18

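Bootstrap: a method to assess confidence in tree topology

Randomly re-sample columns from the alignment (with replacement) and count how often each topology is recovered.

Original alignment:
A MNGTEG
B MNGYER
C MQGYDK
D MQGTDI

Re-sampled alignment:
A GMGTMG
B GMRYMR
C GMKYMK
D GMITMI

Tree from this replicate: ((C,B),(A,D))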

SLIDE 19

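Bootstrap: a method to assess confidence in tree topology

Randomly re-sample columns from the alignment (with replacement) and count how often each topology is recovered.

Original alignment:
A MNGTEG
B MNGYER
C MQGYDK
D MQGTDI

Re-sampled alignment:
A NMNTMG
B NMNYMG
C QMQYMG
D QMQTMG

Tree from this replicate: ((A,B),(C,D))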

SLIDE 20

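Bootstrap: a method to assess confidence in tree topology

Randomly re-sample columns from the alignment (with replacement) and count how often each topology is recovered.

Original alignment:
A MNGTEG
B MNGYER
C MQGYDK
D MQGTDI

Re-sampled alignment:
A MTNGEG
B MYNREG
C MYQKDG
D MTQIDG

Tree from this replicate: ((A,B),(C,D))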

SLIDE 21

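Bootstrap: a method to assess confidence in tree topology

Randomly re-sample columns from the alignment and count how often each topology is recovered.

Bootstrapped trees (100 replicates):
  • ((A,B),(C,D)): 64 times
  • ((C,B),(A,D)): 36 times

Final result: ((A,B),(C,D)), with 64% bootstrap support for grouping A with B (and C with D).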

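A sketch of the bootstrap loop on the same four-taxon toy alignment. Each replicate draws 6 column indices with replacement, and its tree is chosen by the informative-column vote from the majority-rule slide; reporting ties (or replicates with no informative column) as "unresolved" is a choice made for this sketch. `random.Random(seed)` makes runs reproducible; the resulting counts depend on the seed and will generally differ from the illustrative 64/36 on the slide. All function names are hypothetical.

```python
import random
from collections import Counter

TAXA = "ABCD"


def split_for_column(column):
    """'AB|CD'-style split supported by a 4-taxon column, or None."""
    if sorted(Counter(column).values()) != [2, 2]:
        return None
    with_a = "".join(t for t, r in zip(TAXA, column) if r == column[0])
    return with_a + "|" + "".join(t for t in TAXA if t not in with_a)


def best_split(alignment):
    """Split with the most supporting columns; 'unresolved' on a tie or none."""
    votes = Counter()
    for col in zip(*alignment):
        split = split_for_column("".join(col))
        if split is not None:
            votes[split] += 1
    ranked = votes.most_common(2)
    if not ranked or (len(ranked) == 2 and ranked[0][1] == ranked[1][1]):
        return "unresolved"
    return ranked[0][0]


def resample(alignment, indices):
    """Build a pseudo-alignment from the given (0-based) column indices."""
    return ["".join(seq[i] for i in indices) for seq in alignment]


def bootstrap(alignment, replicates=100, seed=0):
    """Draw columns with replacement and tally the split each replicate favors."""
    rng = random.Random(seed)
    n_cols = len(alignment[0])
    tally = Counter()
    for _ in range(replicates):
        indices = [rng.randrange(n_cols) for _ in range(n_cols)]
        tally[best_split(resample(alignment, indices))] += 1
    return tally


alignment = ["MNGTEG", "MNGYER", "MQGYDK", "MQGTDI"]  # taxa A, B, C, D
# First replicate on the slides used columns 3, 1, 6, 4, 1, 6 (1-based).
print(resample(alignment, [2, 0, 5, 3, 0, 5]))  # ['GMGTMG', 'GMRYMR', 'GMKYMK', 'GMITMI']
tally = bootstrap(alignment, replicates=100, seed=1)
for split, count in tally.most_common():
    print(split, count)
```

With 100 replicates, each count is directly the percentage support for that grouping.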
SLIDE 22

Tree-building methods:

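  • 1. Neighbor-joining
    • Calculate all pairwise distances
    • Join the two "closest" taxa and replace them by a new node; "closest" means the pair that minimizes the neighbor-joining criterion Q, which corrects each pairwise distance for both taxa's total distance to all other taxa
    • Repeat with the new node in place of the joined pair until all taxa are connected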

Image: http://en.wikipedia.org/wiki/File:Neighbor_joining_7_taxa_start_to_finish_diagram.svg
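A compact sketch of the loop above, following the standard neighbor-joining formulas: Q(i, j) = (n - 2) d(i, j) - (sum of i's distances) - (sum of j's distances) to choose a pair, then the usual branch-length and distance-update rules. The distance matrix is a hypothetical five-taxon example that happens to fit a tree exactly. Its smallest distance is between d and e (3), yet the first join is a with b, because Q corrects for how far each taxon is from all the others. Internal node names `u1`, `u2`, ... are invented by the sketch, and negative branch lengths (possible with noisy data) are not clamped.

```python
def neighbor_joining(names, dist):
    """Neighbor-joining on a symmetric distance matrix (list of lists).

    Returns the tree as (parent, child, branch_length) edges; internal nodes
    are named u1, u2, ... in the order they are created.
    """
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    edges = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        totals = {i: sum(d[i][k] for k in nodes) for i in nodes}
        # Choose the pair that minimizes Q(i, j).
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                i, j = nodes[x], nodes[y]
                q = (n - 2) * d[i][j] - totals[i] - totals[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from i and j to the new node u.
        li = d[i][j] / 2 + (totals[i] - totals[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        counter += 1
        u = f"u{counter}"
        edges += [(u, i, li), (u, j, lj)]
        # Distance from u to every remaining node.
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    edges.append((a, b, d[a][b]))  # connect the last two nodes
    return edges


names = ["a", "b", "c", "d", "e"]
dist = [[0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0]]
edges = neighbor_joining(names, dist)
for edge in edges:
    print(edge)
```

Because this matrix is exactly tree-like, adding up branch lengths along the recovered tree gives back every input distance (e.g. a to c: 2 + 3 + 4 = 9).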


SLIDE 24

Tree-building methods:

  • 2. Maximum likelihood
  • Builds a likelihood model of molecular evolution
  • Finds the tree that maximizes Pr(sequence data | tree)

  • Commonly used software:

RAxML, FastTree2
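For two sequences the "tree" is a single branch, so maximizing Pr(sequence data | tree) reduces to choosing one branch length t. The sketch below assumes a Jukes-Cantor-style model with 20 equally frequent amino acids and equal substitution rates, a deliberate simplification (programs such as RAxML search over topologies for many taxa with richer substitution models). It grid-searches t for human vs domestic_cat, which differ at 6 of 168 aligned sites, and compares the result with this model's closed-form estimate, t = -(19/20) ln(1 - (20/19) p), where p is the fraction of differing sites. Function names are hypothetical.

```python
import math


def log_likelihood(t, n_same, n_diff, k=20):
    """log Pr(two aligned sequences | one branch of length t), k-state JC-style model."""
    decay = math.exp(-k * t / (k - 1))
    p_same = 1 / k + (k - 1) / k * decay  # residue unchanged after time t
    p_to_other = (1 - decay) / k          # change to one particular other residue
    # Each site: Pr(first residue) = 1/k, times Pr(second residue | first).
    n_sites = n_same + n_diff
    return (n_sites * math.log(1 / k)
            + n_same * math.log(p_same)
            + n_diff * math.log(p_to_other))


def ml_branch_length(n_same, n_diff, k=20, step=1e-4, t_max=2.0):
    """Grid search for the branch length with the highest log-likelihood."""
    grid = [step * i for i in range(1, round(t_max / step) + 1)]
    return max(grid, key=lambda t: log_likelihood(t, n_same, n_diff, k))


n_sites, n_diff = 168, 6  # human vs domestic_cat in the alignment slides
t_hat = ml_branch_length(n_sites - n_diff, n_diff)
closed_form = -(19 / 20) * math.log(1 - (20 / 19) * n_diff / n_sites)
print(f"grid search: {t_hat:.4f}   closed form: {closed_form:.4f}")
```

The grid search stands in for the numerical optimizers real programs use; with more than two taxa the same idea applies, but the likelihood is summed over unobserved ancestral residues and the search also runs over topologies.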

SLIDE 25

Tree-building methods:

  • 3. Bayesian
  • Builds a likelihood model of molecular evolution
  • Calculates the posterior probability Pr(tree | sequence data)

  • Commonly used software:

MrBayes, BEAST
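The quantities on the maximum-likelihood and Bayesian slides are linked by Bayes' theorem: the posterior Pr(tree | sequence data) is the likelihood Pr(sequence data | tree) multiplied by a prior over trees and normalized over all candidate trees. Written out, with branch lengths and model parameters (which a full analysis also integrates over) left implicit:

```latex
\Pr(\text{tree} \mid \text{data}) =
  \frac{\Pr(\text{data} \mid \text{tree})\,\Pr(\text{tree})}
       {\sum_{\text{tree}'} \Pr(\text{data} \mid \text{tree}')\,\Pr(\text{tree}')}
```

The denominator sums over every possible tree, which is intractable for more than a handful of taxa; MrBayes and BEAST therefore approximate the posterior by Markov chain Monte Carlo sampling.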