Phylogenetics Tutorial 1: 1. Overview 2. Installation 3. Data 4. - - PowerPoint PPT Presentation

Phylogenetics Tutorial 1:

Making Phylogenies

Finlay Maguire

Faculty of Computer Science

Table of contents

  • 1. Overview
  • 2. Installation
  • 3. Data
  • 4. Multiple Sequence Alignment
  • 5. Trimming
  • 6. Approximate ML Tree
  • 7. Maximum-Likelihood Tree
  • 8. Phylogenomics

Overview

Protein Phylogeny Aims

  • Get a protein
  • Use pairwise alignment (BLAST) to find potential homologs
  • Perform a multiple sequence alignment
  • Trim the alignment
  • Infer an NJ (neighbour-joining) distance phylogeny
  • Infer an approximate Maximum Likelihood phylogeny
  • Infer an accurate Maximum Likelihood phylogeny
  • Compare the trees

Core Genome Phylogeny

  • Get genomes
  • Find core genome
  • Extract SNPs
  • Infer a Maximum Likelihood phylogeny
  • Visualise Phylogeny

Requirements

  • mafft
  • trimal
  • aliview
  • FastTree2
  • iqtree
  • FigTree
  • prokka
  • roary
  • snp-sites

Installation

miniconda

If you don’t have miniconda, install it from https://docs.conda.io/en/latest/miniconda.html

conda create -n phylo -c conda-forge -c bioconda mafft trimal prokka fasttree iqtree roary snp-sites

conda activate phylo

  • or, with older miniconda versions:

source activate phylo
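After activating the environment, it is worth confirming that every command-line tool is actually on your PATH before starting. A minimal sketch with a hypothetical helper `check_tools`; the executable names in the usage line are assumptions based on the bioconda packages (IQ-TREE 2, for instance, may install `iqtree2` rather than `iqtree`):

```shell
#!/usr/bin/env bash
# Report any required command that is not on PATH.
# Returns the number of missing tools (0 means everything was found).
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Executable names assumed from the bioconda packages; adjust if yours differ.
# check_tools mafft mafft-linsi trimal FastTree iqtree prokka roary snp-sites
```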

Other tools

Unfortunately, not everything is in bioconda:

  • AliView

https://github.com/AliView/AliView/releases

  • FigTree

https://github.com/rambaut/figtree/releases

Data

Starting Sequence

Figure 1: High-quality, manually reviewed protein reference database: Swiss-Prot, at http://www.uniprot.org

Starting Sequence

Figure 2: Choose ‘Gene Ontology’ and ‘biological process’

Starting Sequence

Figure 3: Scroll down to ‘detoxification’ and expand it

Starting Sequence

Figure 4: Select ‘8 results’ next to ‘detoxification of arsenic’

Using BLAST to find related sequences

Figure 5: Select the C. elegans sequence and BLAST

Using BLAST to find related sequences

Figure 6: Wait...

Using BLAST to find related sequences

Figure 7: Download 10 sequences across a range of similarity
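The alignment step works on a single FASTA file, `arsenic.faa`. A minimal sketch for combining the downloaded sequences into that file and reporting how many sequences it holds; the helper name `merge_fasta` and the `downloads/` location are hypothetical:

```shell
#!/usr/bin/env bash
# Concatenate FASTA files into one multi-FASTA and print the sequence count.
# Usage: merge_fasta OUTPUT INPUT...
merge_fasta() {
  local out=$1; shift
  # `awk 1` prints every line followed by a newline, so an input file that
  # lacks a final newline cannot glue its last line onto the next header.
  awk 1 "$@" > "$out"
  grep -c '^>' "$out"
}

# merge_fasta arsenic.faa downloads/*.fasta
```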

Multiple Sequence Alignment

MAFFT

mafft-linsi arsenic.faa > arsenic.afa
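`mafft-linsi` runs MAFFT's L-INS-i strategy, equivalent to `mafft --localpair --maxiterate 1000`. It is accurate but slow, and the MAFFT documentation recommends it for fewer than about 200 sequences. A sketch with a hypothetical helper `choose_mafft` that picks the strategy from the sequence count; the 200 cut-off follows that guidance, and falling back to `mafft --auto` for larger sets is our assumption:

```shell
#!/usr/bin/env bash
# Print a MAFFT invocation suited to the number of sequences in a FASTA file:
# L-INS-i (--localpair --maxiterate 1000) for small sets, --auto otherwise.
choose_mafft() {
  local n
  n=$(grep -c '^>' "$1")
  if [ "$n" -le 200 ]; then
    echo "mafft --localpair --maxiterate 1000"
  else
    echo "mafft --auto"
  fi
}

# The unquoted expansion deliberately splits the printed command into words:
# $(choose_mafft arsenic.faa) arsenic.faa > arsenic.afa
```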

Trimming

Inspecting the alignment

java -jar aliview.jar

trimAl

trimal -nogaps -in arsenic.afa -out arsenic_nogaps.mask
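`-nogaps` removes every alignment column that contains a gap in any sequence, which is simple but can discard many informative sites. To make the rule concrete, here is a small awk sketch of the same idea (an illustration, not trimAl's own implementation; the helper name `strip_gap_columns` is hypothetical and only `-` is treated as a gap):

```shell
#!/usr/bin/env bash
# Drop every alignment column that contains a '-' in any sequence,
# mimicking the rule behind `trimal -nogaps`. Input: aligned FASTA.
strip_gap_columns() {
  awk '
    /^>/ { n++; name[n] = $0; next }   # header starts a new record
    { seq[n] = seq[n] $0 }             # join wrapped sequence lines
    END {
      len = length(seq[1])
      for (c = 1; c <= len; c++) {
        keep[c] = 1
        for (i = 1; i <= n; i++)
          if (substr(seq[i], c, 1) == "-") { keep[c] = 0; break }
      }
      for (i = 1; i <= n; i++) {
        out = ""
        for (c = 1; c <= len; c++)
          if (keep[c]) out = out substr(seq[i], c, 1)
        print name[i]
        print out
      }
    }' "$1"
}

# strip_gap_columns arsenic.afa > arsenic_nogaps_sketch.afa
```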

trimAl

trimal -automated1 -in arsenic.afa -out arsenic_auto.mask
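`-automated1` uses a heuristic, based on similarity statistics of the alignment, to choose one of trimAl's automated trimming methods; the trimAl documentation describes it as optimised for maximum-likelihood tree reconstruction. To see which columns were kept or removed, trimAl can also write an HTML summary with `-htmlout`. A sketch with a hypothetical wrapper `trim_with_report`; the `<prefix>.mask` / `<prefix>.html` naming is only a convention:

```shell
#!/usr/bin/env bash
# Automated trimming plus trimAl's HTML report of kept and removed columns.
# $1: input alignment, $2: output prefix.
trim_with_report() {
  local in=$1 prefix=$2
  trimal -automated1 -in "$in" -out "${prefix}.mask" -htmlout "${prefix}.html"
}

# trim_with_report arsenic.afa arsenic_auto
# then open arsenic_auto.html in a web browser
```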

Compare Trimming

java -jar aliview.jar
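Alongside the visual comparison in AliView, it helps to know how many columns each trimming strategy kept. A minimal sketch with a hypothetical helper `aln_dims` that reports sequences and columns for an aligned FASTA file, assuming (as should hold after alignment) that every sequence has the same length as the first:

```shell
#!/usr/bin/env bash
# Print "<file>: <sequences> sequences x <columns> columns" for aligned FASTA.
# The column count is taken from the first sequence.
aln_dims() {
  awk -v f="$1" '
    /^>/ { n++; next }
    n == 1 { len += length($0) }
    END { printf "%s: %d sequences x %d columns\n", f, n, len }' "$1"
}

# for aln in arsenic.afa arsenic_nogaps.mask arsenic_auto.mask; do
#   aln_dims "$aln"
# done
```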

Approximate ML Tree

FastTree

FastTree -lg arsenic_auto.mask > arsenic_dist.tree
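`-lg` tells FastTree to use the LG amino-acid model instead of its default (JTT). A quick sanity check before opening the tree is that it has one tip per aligned sequence. In a Newick string the number of tips equals the number of commas plus one, so a sketch of that check, with a hypothetical helper `count_tips` and assuming one tree per file and no commas inside quoted labels, is:

```shell
#!/usr/bin/env bash
# Count tips in a single-tree Newick file: tips = commas + 1.
count_tips() {
  local commas
  commas=$(tr -cd ',' < "$1" | wc -c)
  echo $((commas + 1))
}

# count_tips arsenic_dist.tree        # tips in the FastTree tree
# grep -c '^>' arsenic_auto.mask      # sequences in the trimmed alignment
```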

Inspect

FigTree

Maximum-Likelihood Tree

IQ-TREE

  • Generate 100 parsimony trees
  • Optimise all 100 with lazy SPR moves
  • Collect resulting unique topologies and optimise branch lengths
  • Select top 20 by likelihood
  • Perform hill-climbing NNI on each and optimise
  • Retain top 5 topologies as candidate trees
  • Randomly perturb candidates (stochastic NNI), then optimise by hill-climbing NNI
  • If new tree is better than top candidate, replace
  • If top candidate doesn’t change after 100 random perturbations, then output.
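The numbers in this search map onto IQ-TREE's tree-search options: `-ninit` (initial parsimony trees, default 100), `-ntop` (top initial trees refined with NNI, default 20), `-nbest` (candidate trees kept, default 5) and `-nstop` (unsuccessful perturbation rounds before stopping, default 100). These are the IQ-TREE 1.x spellings; check `iqtree -h` for your version. A sketch with a hypothetical wrapper `iqtree_search` that writes the defaults out explicitly, as a starting point for tuning:

```shell
#!/usr/bin/env bash
# Run the IQ-TREE search with the parameters from the slide made explicit.
# $1: alignment file. Any further arguments are passed through to iqtree.
# -ninit: parsimony trees, -ntop: trees refined by NNI,
# -nbest: candidate set size, -nstop: unsuccessful rounds before stopping.
iqtree_search() {
  local aln=$1; shift
  iqtree -s "$aln" \
    -ninit 100 \
    -ntop 20 \
    -nbest 5 \
    -nstop 100 \
    "$@"
}

# iqtree_search arsenic_auto.mask -mset LG,JTT,WAG
```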

Running IQ-TREE

iqtree -mset LG,JTT,WAG -s arsenic_auto.mask

Note: IQ-TREE also outputs a neighbour-joining distance tree (BIONJ, in the .bionj file).
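By default IQ-TREE names its outputs after the alignment, so this run writes files such as `arsenic_auto.mask.iqtree` (the main report), `arsenic_auto.mask.treefile` (the ML tree to open in FigTree), `arsenic_auto.mask.log` and `arsenic_auto.mask.bionj`. The report records the model ModelFinder chose from the LG, JTT and WAG families (possibly with rate-heterogeneity terms such as `+G4`). A sketch with a hypothetical helper `best_model`, assuming the report wording `Best-fit model according to ...`:

```shell
#!/usr/bin/env bash
# Print the model ModelFinder selected, as recorded in an IQ-TREE .iqtree report.
best_model() {
  grep -m 1 'Best-fit model according to' "$1" | sed 's/.*: *//'
}

# best_model arsenic_auto.mask.iqtree
```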

Inspect

FigTree
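To compare the FastTree and IQ-TREE topologies with a number as well as by eye, IQ-TREE can compute all-to-all Robinson-Foulds distances between the trees in one file with `-rf_all`, writing them to a `.rfdist` file. A sketch with a hypothetical helper `rf_compare`, assuming the default output names from the previous steps; `both_trees.nwk` is a hypothetical file name, and the flag spelling should be checked against `iqtree -h`:

```shell
#!/usr/bin/env bash
# Collect single-tree Newick files into one file, then let IQ-TREE compute
# all pairwise Robinson-Foulds distances (written to <out>.rfdist).
rf_compare() {
  local out=$1; shift
  cat "$@" > "$out"
  iqtree -rf_all "$out"
}

# rf_compare both_trees.nwk arsenic_dist.tree arsenic_auto.mask.treefile
```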

Phylogenomics

Get genomes

Download the 6 Listeria genomes:

wget finlaymagui.re/assets/listeria_genomes.tar.gz
tar xvf listeria_genomes.tar.gz

Annotate genomes

For genome GCA_000008285:

prokka --kingdom Bacteria --outdir prokka_GCA_000008285 \
  --genus Listeria --locustag GCA_000008285 \
  GCA_000008285.1_ASM828v1_genomic.fna

Repeat for all genomes.
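Rather than typing the Prokka command once per genome, a loop can derive the output directory and locus tag from each file name. A sketch with hypothetical helpers `locustag_for` and `annotate_all`, assuming the genome files follow the `<accession>.<version>_<assembly>_genomic.fna` pattern of the example above and sit in the current directory:

```shell
#!/usr/bin/env bash
# Derive a locus tag from a genome file name:
# GCA_000008285.1_ASM828v1_genomic.fna -> GCA_000008285
locustag_for() {
  local base
  base=$(basename "$1")
  echo "${base%%.*}"
}

# Annotate every *_genomic.fna in the current directory with Prokka,
# writing each result to prokka_<locustag>.
annotate_all() {
  local fna tag
  for fna in *_genomic.fna; do
    [ -e "$fna" ] || continue   # glob matched nothing
    tag=$(locustag_for "$fna")
    prokka --kingdom Bacteria --genus Listeria \
      --outdir "prokka_${tag}" --locustag "$tag" "$fna"
  done
}

# annotate_all
```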

Find the core genome (shared genes)

mkdir annotations
cp */*.gff annotations
roary -f core_genome -e -n -v annotations/*.gff
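Roary's flags here: `-f` sets the output directory, `-e` builds a multi-FASTA alignment of the core genes, `-n` makes that alignment use MAFFT (faster than the default PRANK) and `-v` is verbose. Roary also writes `summary_statistics.txt` to the output directory with gene counts per category. A sketch with a hypothetical helper `core_gene_count` that prints the core-gene count, assuming the file's usual tab-separated layout (label, definition, count):

```shell
#!/usr/bin/env bash
# Print the number of core genes from Roary's summary_statistics.txt.
core_gene_count() {
  awk -F '\t' '$1 == "Core genes" { print $NF }' "$1"
}

# core_gene_count core_genome/summary_statistics.txt
```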

Extract SNPs

snp-sites -o listeria_snps.fna core_genome/core_gene_alignment.aln
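`snp-sites` keeps only the columns of the core-gene alignment that vary between genomes, shrinking it to the sites that carry phylogenetic signal. To illustrate the idea (not snp-sites' own code or its exact rules), a sketch with a hypothetical helper `count_variable_sites` that counts such columns, assuming nucleotide input and ignoring anything other than A, C, G or T (gaps, N) when judging whether a column varies:

```shell
#!/usr/bin/env bash
# Count alignment columns with at least two different bases among A/C/G/T.
# Other characters (gaps, N) are ignored when judging variability.
count_variable_sites() {
  awk '
    /^>/ { n++; next }
    { seq[n] = seq[n] toupper($0) }
    END {
      len = length(seq[1]); snps = 0
      for (c = 1; c <= len; c++) {
        split("", seen); k = 0         # reset the set of bases seen
        for (i = 1; i <= n; i++) {
          b = substr(seq[i], c, 1)
          if (b ~ /^[ACGT]$/ && !(b in seen)) { seen[b] = 1; k++ }
        }
        if (k > 1) snps++
      }
      print snps
    }' "$1"
}

# count_variable_sites core_genome/core_gene_alignment.aln
```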

Infer ML Phylogeny

iqtree -mset GTR -s listeria_snps.fna
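Because `listeria_snps.fna` contains only variable sites, IQ-TREE's documentation recommends an ascertainment bias correction (`+ASC`) for SNP alignments like this; without it, branch lengths are overestimated. `+ASC` requires that no column is constant, so it is commonly paired with `snp-sites -c`, which keeps only columns made exclusively of A, C, G and T. A sketch of that variant with a hypothetical wrapper `snp_tree_asc` and a hypothetical output prefix:

```shell
#!/usr/bin/env bash
# Build a strict SNP alignment and infer a tree with ascertainment bias
# correction. $1: Roary core gene alignment, $2: output prefix.
snp_tree_asc() {
  local core=$1 prefix=$2
  # snp-sites keeps only variable columns; -c further restricts them to
  # columns made exclusively of A, C, G and T, so no constant column
  # remains once gaps are excluded, as +ASC requires.
  snp-sites -c -o "${prefix}.fna" "$core" &&
    iqtree -s "${prefix}.fna" -m GTR+ASC
}

# snp_tree_asc core_genome/core_gene_alignment.aln listeria_snps_acgt
```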

Visualise Tree

Figure 8: Roary Tutorial

Questions?
