Phylogenetic Trees (COMPSCI 260, Spring 2016)


Phylogenetics
• Phylogenetics is the study of evolutionary relationships among organisms or genes
• In general, we are interested in the phylogeny of organisms or species
• But oftentimes phylogenies are constructed from genes
• Phylogenetic trees are used to describe phylogenies
• The purpose of phylogenetic studies:
  – reconstruct evolutionary ties between species
  – estimate the time of divergence between species since they last shared a common ancestor

Binomial nomenclature for species
• Binomial nomenclature is a formal system of naming species by giving each a name composed of two parts, both of which use Latin grammatical forms, although they can be based on words from other languages
• The first part of the name identifies the genus to which the species belongs; the second part identifies the species within the genus
• Introduced by Carl Linnaeus in 1753
• Also called 'scientific name' or 'Latin name'
• Examples:
  – human: Homo sapiens
  – chimp: Pan troglodytes
  – mouse: Mus musculus
  – rat: Rattus norvegicus

What is a phylogenetic tree?
• Binary tree (every node has at most 3 neighbors)
• Rooted or unrooted
• Nodes
  – Leaves: current species
  – Internal nodes: (hypothetical) ancestral species
• Edges
  – Amount of change (mutation rate) or
  – Evolutionary time
[Figure: example trees over chicken, chimp, mouse, human, and rat]

[Figure: rooted tree with leaves human, chimp, mouse, rat, and chicken, drawn against a time axis on which the leaves sit at "today"]

Data used to build phylogenetic trees
• Traditionally, phylogenetic trees were built from morphological features (e.g., beak shapes, presence of feathers, number of legs, etc.)
• Today, we use mostly molecular data like DNA sequences and protein sequences
• Data can be classified into 2 categories:
• Discrete characters
  – Each character has a finite number of states. For example, discrete characters include the number of legs of an organism, or a column in an alignment of DNA sequences.
• Comparative numerical data
  – These data encode the distances between objects and are usually derived from sequence data. For example, we could hypothetically say distance(man, mouse) = 500 and distance(man, chimp) = 100.

Phylogenetic trees
• NOTE: in general, different genes/proteins may give slightly different phylogenetic trees (because different genes/proteins may evolve at different rates)
• Averaging over large sets of genes/proteins does demonstrate a broad correspondence between lengths of branches and evolutionary time
• NOTE: topology vs. (phylogenetic) tree
  – Topology: which nodes are connected?
  – (Phylogenetic) tree: topology + edge lengths
[Figure: a tree on nodes 1 to 6, shown once as a topology and once with edge lengths]
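The distinction between a topology and a tree with edge lengths can be made concrete with a small sketch. The nested-tuple representation, the function names, and the edge lengths below are all assumptions chosen for illustration; they do not come from the slides.

```python
# Hypothetical representation (not from the slides): a leaf is a species name
# (str); an internal node is a tuple of (child, edge_length) pairs.

def topology(node):
    """Drop edge lengths: return a leaf name or a frozenset of child topologies."""
    if isinstance(node, str):
        return node
    return frozenset(topology(child) for child, _length in node)


def leaf_depths(node, depth=0.0):
    """Map each leaf name to the summed edge length on its path from the root."""
    if isinstance(node, str):
        return {node: depth}
    depths = {}
    for child, length in node:
        depths.update(leaf_depths(child, depth + length))
    return depths


# Two rooted trees with made-up edge lengths: same connections, different lengths.
tree_a = (((("human", 1.0), ("chimp", 1.0)), 2.0), ("mouse", 3.0))
tree_b = (((("chimp", 0.5), ("human", 0.5)), 4.0), ("mouse", 4.5))

same_topology = topology(tree_a) == topology(tree_b)        # True
same_lengths = leaf_depths(tree_a) == leaf_depths(tree_b)   # False
```

Here `tree_a` and `tree_b` answer "which nodes are connected?" identically, but they are different phylogenetic trees because their edge lengths differ.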

Speciation vs. duplication events
• Another thing to keep in mind: in general, we assume that the sequences in a phylogenetic tree have descended from an ancestral gene A in an ancestral species
• In other words, we assume they arose through a speciation event
• Another mechanism by which two sequences can diverge from a common ancestor is through a duplication event in the same species
[Figure, two panels. Orthologs: Gene A in Species 3 gives rise to Gene A in Species 1 and Gene A in Species 2. Paralogs: Gene A gives rise to Gene A1 and Gene A2.]
We need to make sure we are using orthologs when building phylogenetic trees!

Homology example: evolution of globins
• Are human α-globin and human β-globin paralogs or orthologs?
  – Paralogs
• Are human α-globin and mouse α-globin homologs or orthologs?
  – Both (orthologs are homologs that diverged through a speciation event)

Building a phylogenetic tree
• Distance methods
  – Evolutionary distances are computed for all pairs of leaf nodes, and these are used to construct trees
• Maximum parsimony methods
  – The tree is chosen to minimize the number of changes required to explain the data
• Maximum likelihood methods
  – Under a model of sequence evolution, we search for the tree which gives the highest likelihood of the data
• Bootstrapping
[Figure: tree for Gene A over chicken, chimp, mouse, human, and rat]

Building a phylogenetic tree
• We will discuss two algorithms: UPGMA and NJ (neighbor joining)
• Both algorithms require a metric that describes the distance between any 2 leaf nodes (i.e., any 2 sequences)
• How can we obtain such distances?
  – Align the 2 sequences and take the fraction of nucleotides/amino acids that are different
  – Use models of residue/nucleotide substitution (for example, the Jukes-Cantor model for DNA sequences)
• Assume we have 5 sequences. We need to define a metric d:

          1   2   3   4   5
      1   0   *
      2   *   0
  d = 3           0
      4               0
      5                   0
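Both ways of obtaining a distance can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the two sequences are already aligned to equal length, columns containing a gap character "-" are skipped (a choice made here, not one stated in the slides), and the function names and example sequences are hypothetical. The Jukes-Cantor correction used is the standard d = -(3/4) ln(1 - (4/3) p), where p is the observed fraction of differing sites.

```python
import math


def p_distance(seq_a, seq_b):
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: columns with a gap in either sequence are ignored.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for DNA: d = -(3/4) ln(1 - (4/3) p)."""
    if p >= 0.75:
        raise ValueError("correction is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Made-up aligned sequences that differ at 2 of 10 positions.
p = p_distance("ACGTACGTAC", "ACGTTCGTAA")
d = jukes_cantor(p)
```

The corrected distance `d` is always at least as large as `p`, because the model accounts for multiple substitutions at the same site that a raw count of differences cannot see.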

Building rooted phylogenetic trees
• UPGMA = unweighted pair group method using arithmetic averages [the name is actually more complicated than the method]
• It is basically a hierarchical clustering algorithm
[Figure: five points, 1 to 5, with the tree still to be determined]

[Figure: UPGMA clusters the five points 1 to 5 step by step, producing a rooted tree with leaves 1 to 5 and internal nodes 6 to 9]

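UPGMA – distance between clusters?
• Distance between 2 clusters (groups) C_p and C_q?

$$ d_{pq} = \frac{1}{|C_p| \cdot |C_q|} \sum_{i \in C_p,\; j \in C_q} d_{ij} $$

(average linkage clustering)
[Figure: points 1 to 5 grouped into nested clusters 6 to 9, with two clusters labeled C_p and C_q]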
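The average-linkage distance translates directly into code. In this sketch the function name, the index-based clusters, and the 4 x 4 distance matrix are illustrative assumptions, not data from the slides.

```python
def cluster_distance(cluster_p, cluster_q, d):
    """Average of d[i][j] over every i in cluster_p and every j in cluster_q."""
    total = sum(d[i][j] for i in cluster_p for j in cluster_q)
    return total / (len(cluster_p) * len(cluster_q))


# Hypothetical symmetric distance matrix for four sequences, indexed 0..3.
d = [
    [0, 2, 6, 8],
    [2, 0, 6, 8],
    [6, 6, 0, 4],
    [8, 8, 4, 0],
]

# Cross-cluster pairs are (0,2), (0,3), (1,2), (1,3): (6 + 8 + 6 + 8) / 4
d_pq = cluster_distance([0, 1], [2, 3], d)  # 7.0
```

Distances within a cluster never enter the sum; only pairs that straddle the two clusters are averaged.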

UPGMA algorithm
• Initialization
  – For each sequence i, create cluster C_i
  – For each sequence i, create a leaf node at height 0
• Iterate
  – Find i, j such that d_ij is minimal
  – Define new cluster C_k = C_i ∪ C_j and compute d_kl for all other clusters l
  – Create node k (parent of i and j) at height d_ij / 2
  – Remove clusters i and j
• Terminate
  – When only 2 clusters remain
  – Create root at height d_ij / 2
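The steps above can be sketched in Python as follows. This is an illustrative sketch, not the course's reference implementation: the function name, the tuple-based tree representation, and the example matrix are assumptions. For simplicity it recomputes every cluster-to-cluster average from the original matrix rather than updating distances incrementally, and it folds the termination step into the loop by merging until a single cluster (the root) remains, which gives the same root height d_ij / 2.

```python
def upgma(names, d):
    """Build a rooted tree by UPGMA.

    names: list of leaf labels; d: symmetric matrix (list of lists) of
    pairwise distances between the leaves. Returns (tree, root_height), where
    a leaf is its label and an internal node is a tuple (left, right, height).
    """
    # Each active cluster id maps to (subtree, member leaf indices).
    clusters = {i: (name, [i]) for i, name in enumerate(names)}
    next_id = len(names)
    root_height = 0.0

    def average(members_p, members_q):
        total = sum(d[i][j] for i in members_p for j in members_q)
        return total / (len(members_p) * len(members_q))

    while len(clusters) > 1:
        ids = sorted(clusters)
        # Find the pair of active clusters with minimal average distance.
        dist, a, b = min(
            (average(clusters[a][1], clusters[b][1]), a, b)
            for pos, a in enumerate(ids)
            for b in ids[pos + 1:]
        )
        tree_a, members_a = clusters.pop(a)
        tree_b, members_b = clusters.pop(b)
        # The new parent node sits at half the distance between its children.
        root_height = dist / 2.0
        clusters[next_id] = ((tree_a, tree_b, root_height), members_a + members_b)
        next_id += 1

    (tree, _members), = clusters.values()
    return tree, root_height


# Made-up distances: A and B are close, C and D are close, the pairs are far apart.
names = ["A", "B", "C", "D"]
d = [
    [0, 2, 8, 8],
    [2, 0, 8, 8],
    [8, 8, 0, 4],
    [8, 8, 4, 0],
]
tree, height = upgma(names, d)
# tree == (("A", "B", 1.0), ("C", "D", 2.0), 4.0)
```

In this example A and B are merged first at height 1.0, then C and D at height 2.0, and the root joins the two clusters at height 4.0, half of their average distance of 8.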
