  1. Making Phylogenies: Phylogenetics Tutorial 1
      Finlay Maguire, Faculty of Computer Science

  2. Table of contents
      1. Overview
      2. Installation
      3. Data
      4. Multiple Sequence Alignment
      5. Trimming
      6. Approximate ML Tree
      7. Maximum-Likelihood Tree
      8. Phylogenomics

  3. Overview

  4. Protein Phylogeny Aims
      • Get a protein
      • Use pairwise alignment (BLAST) to find potential homologs
      • Perform a multiple sequence alignment
      • Trim the alignment
      • Infer a neighbour-joining (NJ) distance phylogeny
      • Infer an approximate Maximum Likelihood phylogeny
      • Infer an accurate Maximum Likelihood phylogeny
      • Compare the trees

  5. Core Genome Phylogeny
      • Get genomes
      • Find the core genome
      • Extract SNPs
      • Infer a Maximum Likelihood phylogeny
      • Visualise the phylogeny

  6. Requirements
      • mafft
      • trimal
      • aliview
      • FastTree2
      • iqtree
      • FigTree
      • prokka
      • roary
      • snp-sites

  7. Installation

  8. miniconda
      If you don’t have miniconda, install it first: https://docs.conda.io/en/latest/miniconda.html
      Then create and activate an environment containing the tools:
      conda create -n phylo -c bioconda mafft trimal prokka fasttree iqtree roary snp-sites
      conda activate phylo
      With older miniconda versions, activate with: source activate phylo
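A quick way to confirm the environment is complete is to check that every command used later in the tutorial is on PATH. The sketch below is a hypothetical helper (check_phylo_tools is not part of conda or of any of these tools). It assumes the executable names used on these slides; depending on the version, the IQ-Tree binary may be installed as iqtree2 instead. If conda cannot resolve the packages, Bioconda's own documentation recommends also enabling the conda-forge channel (for example, adding -c conda-forge before -c bioconda).

```shell
# Hypothetical helper: report which command-line tools from the phylo
# environment are on PATH. Names follow the commands used in this tutorial;
# AliView and FigTree are installed separately and are not checked here.
check_phylo_tools() {
  local missing=0 tool
  for tool in mafft-linsi trimal FastTree iqtree prokka roary snp-sites; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf 'found    %s\n' "$tool"
    else
      printf 'MISSING  %s\n' "$tool"
      missing=$((missing + 1))
    fi
  done
  # Exit status = number of missing tools (0 means everything was found)
  return "$missing"
}
```

Run check_phylo_tools after conda activate phylo; a non-zero exit status is the number of tools that could not be found.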

  9. Other tools
      Unfortunately, not everything is in bioconda:
      • AliView: https://github.com/AliView/AliView/releases
      • FigTree: https://github.com/rambaut/figtree/releases

  10. Data

  11. Starting Sequence
      http://www.uniprot.org
      Figure 1: High-quality protein reference database: Swiss-Prot

  12. Starting Sequence
      Figure 2: Choose ‘Gene Ontology’ and ‘biological process’

  13. Starting Sequence
      Figure 3: Go down to ‘detoxification’ and expand

  14. Starting Sequence
      Figure 4: Select ‘8 results’ next to ‘detoxification of arsenic’

  15. Using BLAST to find related sequences
      Figure 5: Select the C. elegans sequence and BLAST

  16. Using BLAST to find related sequences
      Figure 6: Wait...

  17. Using BLAST to find related sequences
      Figure 7: Download 10 sequences across a range of similarity
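Before aligning, it is worth checking how many sequences the input file holds and whether any identifier appears twice (for example, if the query sequence also comes back among the BLAST hits); IQ-Tree, for one, stops on duplicated sequence names. This is a hypothetical helper, and it assumes the downloaded sequences have been saved together as arsenic.faa, the file name the MAFFT step uses.

```shell
# Hypothetical helper: count FASTA records and list duplicated IDs.
# The ID is taken as the first whitespace-delimited token of each header.
fasta_summary() {
  local fasta="$1"
  printf 'records: %s\n' "$(grep -c '^>' "$fasta")"
  grep '^>' "$fasta" \
    | awk '{ print substr($1, 2) }' \
    | sort | uniq -d \
    | sed 's/^/duplicate ID: /'
}
```

Usage: fasta_summary arsenic.faa should report 10 records (or 11 if the query was added separately) and no duplicate lines.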

  18. Multiple Sequence Alignment

  19. MAFFT
      mafft-linsi arsenic.faa > arsenic.afa
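mafft-linsi is a wrapper for MAFFT's L-INS-i strategy. The MAFFT manual gives the equivalent explicit options as --localpair --maxiterate 1000 and recommends L-INS-i (accurate but slow) for alignments of fewer than about 200 sequences, which comfortably covers the ~10 sequences here. A minimal sketch with a hypothetical helper name:

```shell
# Hypothetical helper: L-INS-i written out with explicit MAFFT options
# instead of the mafft-linsi wrapper. Args: input FASTA, output alignment.
align_linsi() {
  local in_fasta="$1" out_aln="$2"
  # --localpair: local pairwise alignment scores
  # --maxiterate 1000: up to 1000 rounds of iterative refinement
  mafft --localpair --maxiterate 1000 "$in_fasta" > "$out_aln"
}
```

Usage: align_linsi arsenic.faa arsenic.afa should produce the same kind of alignment as the mafft-linsi command above.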

  20. Trimming

  21. Inspecting the alignment
      java -jar aliview.jar

  22. trimAl
      trimal -nogaps -in arsenic.afa -out arsenic_nogaps.mask

  23. trimAl
      trimal -automated1 -in arsenic.afa -out arsenic_auto.mask
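The two trimAl runs differ in how much they remove: -nogaps drops every column containing a gap, while -automated1 picks a trimming heuristic automatically (intended for ML tree reconstruction). trimAl's -htmlout option also writes an HTML summary showing which columns were kept, which makes the comparison easier. A sketch running both modes with reports, under a hypothetical helper name:

```shell
# Hypothetical helper: run both trimAl modes from the slides and also write
# trimAl's HTML summary (-htmlout) for each. Args: alignment, output prefix.
trim_both() {
  local aln="$1" prefix="$2"
  trimal -nogaps -in "$aln" -out "${prefix}_nogaps.mask" -htmlout "${prefix}_nogaps.html"
  trimal -automated1 -in "$aln" -out "${prefix}_auto.mask" -htmlout "${prefix}_auto.html"
}
```

Usage: trim_both arsenic.afa arsenic recreates arsenic_nogaps.mask and arsenic_auto.mask plus two HTML reports to open in a browser.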

  24. Compare Trimming
      java -jar aliview.jar
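Alongside the visual comparison in AliView, the number of alignment columns left by each method gives a quick numeric comparison. This sketch assumes the trimmed files are FASTA (trimAl keeps the input format by default) and measures the length of the first record; aln_length is a hypothetical helper name.

```shell
# Hypothetical helper: print the number of columns in a FASTA alignment,
# measured on the first record (sequence lines may be wrapped).
aln_length() {
  awk '/^>/ { if (seen++) exit; next }            # stop at the 2nd header
       { gsub(/[ \t\r]/, ""); len += length($0) } # sum residues/gaps
       END { print len + 0 }' "$1"
}
```

Usage: for f in arsenic.afa arsenic_nogaps.mask arsenic_auto.mask; do printf '%s\t%s\n' "$f" "$(aln_length "$f")"; done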

  25. Approximate ML Tree

  26. FastTree
      FastTree -lg arsenic_auto.mask > arsenic_dist.tree
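By default FastTree optimises branch lengths under its CAT approximation. Its -gamma option rescales branch lengths and reports a Gamma20 likelihood after the search, and -log writes settings, model details and intermediate trees to a file. A minimal sketch with a hypothetical wrapper name; these are optional extras, not part of the slide's command:

```shell
# Hypothetical wrapper: FastTree with the LG model, Gamma20 rescaling and a
# log file named after the output tree. Args: alignment, output tree file.
fasttree_gamma() {
  local aln="$1" out_tree="$2"
  FastTree -lg -gamma -log "${out_tree%.tree}.log" "$aln" > "$out_tree"
}
```

Usage: fasttree_gamma arsenic_auto.mask arsenic_fasttree_gamma.tree also writes arsenic_fasttree_gamma.log.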

  27. Inspect (FigTree)
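Before opening a tree in FigTree, a quick check that it contains every sequence from the alignment can catch truncated or mismatched runs. In Newick, an internal node with k children contributes k - 1 commas, so a single tree with n tips contains exactly n - 1 commas. The sketch assumes one tree per file and no commas inside quoted labels; newick_tip_count is a hypothetical helper.

```shell
# Hypothetical helper: count tips in a single Newick tree as (commas + 1).
# Assumes one tree per file and no commas inside quoted taxon labels.
newick_tip_count() {
  tr -cd ',' < "$1" | wc -c | awk '{ print $1 + 1 }'
}
```

Usage: newick_tip_count arsenic_dist.tree should match grep -c '^>' arsenic_auto.mask.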

  28. Maximum-Likelihood Tree

  29. IQ-Tree
      • Generate 100 parsimony trees
      • Optimise all 100 with lazy SPR moves
      • Collect the resulting unique topologies and optimise their branch lengths
      • Select the top 20 by likelihood
      • Optimise each of these with NNI moves (stochastic, then hill-climbing)
      • Retain the top 5 topologies as candidate trees
      • Randomly perturb the candidates (stochastic NNI) and re-optimise them (hill-climbing NNI)
      • If a new tree is better than the top candidate, it replaces it
      • If the top candidate does not change after 100 random perturbations, output it
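The numbers in this search correspond to IQ-Tree's tree-search options. In the IQ-TREE 1.6 command reference these are -ninit (initial parsimony trees, default 100), -ntop (top initial trees optimised with NNI, default 20), -nbest (size of the candidate set, default 5) and -nstop (unsuccessful iterations before stopping, default 100); IQ-TREE 2 also accepts double-dash spellings, so check iqtree -h for your version. This hypothetical wrapper spells out those defaults, so it should behave like the plain command on the next slides.

```shell
# Hypothetical wrapper: the protein IQ-Tree run with the search settings
# from the slide made explicit (values are the documented defaults):
#   -ninit 100  initial parsimony trees
#   -ntop 20    top trees optimised with NNI to seed the candidate set
#   -nbest 5    candidate trees kept during the search
#   -nstop 100  unsuccessful perturbations before stopping
iqtree_explicit_search() {
  local aln="$1"
  iqtree -s "$aln" -mset LG,JTT,WAG -ninit 100 -ntop 20 -nbest 5 -nstop 100
}
```

Raising -nstop (for example to 500) makes the search more thorough at the cost of run time.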

  38. Running IQ-Tree
      iqtree -mset LG,JTT,WAG -s arsenic_auto.mask
      Note: IQ-Tree also outputs a neighbour-joining distance tree (.bionj).
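By default IQ-Tree names its outputs after the alignment file, so this run should write arsenic_auto.mask.treefile (the ML tree), arsenic_auto.mask.iqtree (the report), arsenic_auto.mask.log, the .bionj tree mentioned in the note, and .mldist (ML distances). The .treefile and .bionj trees are the natural pair for comparing ML and NJ topologies in FigTree. A hypothetical helper to check which outputs exist:

```shell
# Hypothetical helper: report which of the main IQ-Tree output files exist
# for a given prefix (by default the alignment file name).
list_iqtree_outputs() {
  local prefix="$1" ext
  for ext in treefile iqtree log bionj mldist; do
    if [ -e "${prefix}.${ext}" ]; then
      printf 'present  %s.%s\n' "$prefix" "$ext"
    else
      printf 'absent   %s.%s\n' "$prefix" "$ext"
    fi
  done
}
```

Usage: list_iqtree_outputs arsenic_auto.mask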

  39. Inspect (FigTree)

  40. Phylogenomics

  41. Get genomes
      Download the 6 Listeria genomes:
      wget finlaymagui.re/assets/listeria_genomes.tar.gz
      tar xvf listeria_genomes.tar.gz

  42. Annotate genomes
      For genome GCA_000008285:
      prokka --kingdom Bacteria --outdir prokka_GCA_000008285 --genus Listeria --locustag GCA_000008285 GCA_000008285.1_ASM828v1_genomic.fna
      Repeat for all genomes.
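"Repeat for all genomes" can be done with a loop. This sketch assumes the extracted genomes sit in the current directory and are all named like the one on the slide (*_genomic.fna); the accession used for --outdir and --locustag is taken as everything before the first dot in the file name. annotate_all is a hypothetical helper name.

```shell
# Hypothetical helper: run the slide's prokka command on every
# *_genomic.fna file in the current directory.
annotate_all() {
  local fna acc
  for fna in *_genomic.fna; do
    [ -e "$fna" ] || continue   # glob matched nothing
    # GCA_000008285.1_ASM828v1_genomic.fna -> GCA_000008285
    acc="${fna%%.*}"
    prokka --kingdom Bacteria --outdir "prokka_${acc}" --genus Listeria \
      --locustag "$acc" "$fna"
  done
}
```

Each genome gets its own prokka_<accession> directory, which is what the */*.gff pattern on the next slide picks up.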

  43. Find shared parts
      mkdir annotations
      cp */*.gff annotations
      roary -f core_genome -e -n -v annotations/*.gff
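Roary writes a summary_statistics.txt file into the output directory (core_genome here), which reports how many genes fall into the core, shell and cloud categories. This sketch assumes that file's tab-separated layout (category, strain-fraction range, count) and extracts the core gene count; core_gene_count is a hypothetical helper.

```shell
# Hypothetical helper: print the "Core genes" count from Roary's
# summary_statistics.txt (assumed tab-separated, count in the last column).
core_gene_count() {
  awk -F '\t' '$1 == "Core genes" { print $NF }' "$1"
}
```

Usage: core_gene_count core_genome/summary_statistics.txt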

  44. Extract SNPs
      snp-sites -o listeria_snps.fna core_genome/core_gene_alignment.aln
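snp-sites keeps only the variable columns of the core gene alignment. Its -c option further restricts the output to columns made up exclusively of A, C, G and T, which drops gap-containing columns that can look invariant to downstream tools. A sketch with a hypothetical wrapper name; -c is an optional extra, not part of the slide's command:

```shell
# Hypothetical wrapper: extract SNP columns, keeping only columns that
# contain exclusively A, C, G or T (-c). Args: core alignment, output file.
extract_snps() {
  local aln="$1" out="$2"
  snp-sites -c -o "$out" "$aln"
}
```

Usage: extract_snps core_genome/core_gene_alignment.aln listeria_snps.fna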

  45. Infer ML Phylogeny
      iqtree -mset GTR -s listeria_snps.fna
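Because the SNP alignment contains only variable sites, branch lengths estimated as if it were a full alignment will be inflated. IQ-Tree offers an ascertainment bias correction (+ASC) for exactly this kind of input; it refuses to run +ASC if invariant columns remain, which is one reason to extract SNPs with snp-sites -c. A sketch with a hypothetical wrapper name, fixing the model to GTR+ASC rather than the slide's model selection:

```shell
# Hypothetical wrapper: ML tree from a SNP-only alignment with GTR plus
# ascertainment bias correction (+ASC) for the missing constant sites.
snp_tree() {
  local snps="$1"
  iqtree -s "$snps" -m GTR+ASC
}
```

Usage: snp_tree listeria_snps.fna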

  46. Visualise Tree
      Figure 8: Roary Tutorial

  47. Questions?
