1 Molecular characters Nucleotide sequences structural genes - PDF document

Phylogenetics 3: Methods to reconstruct phylogenies

A generalized protocol for molecular phylogenetics, and the associated concerns with each step

1. Collect homologous sequences
   Concerns: gene tree-species tree / paralogy–orthology / trees within trees
2. Multiple sequence alignment
   Concerns: positional homology / gaps / subjectivity-objectivity / methods
3. Phylogeny estimation
   Concerns: philosophy / methods / consistency / power and accuracy
4. Test reliability or fit of phylogenetic estimates
   Concerns: branch support / tree comparison / statistical issues with trees
5. Interpretation and application
   Concerns: independent contrasts / impact of error on conclusions

Molecular characters

• Nucleotide sequences
    structural genes (protein, RNA, regulatory)
    non-structural genes (introns, intergenic sequences, pseudogenes)
• Protein sequences
    translate DNA to protein
    thousands to choose from
• INDELS
    nucleotides, amino acids, genes, segments of DNA, etc.
• DNA-DNA hybridization
• Restriction fragment length polymorphisms (RFLPs)
• Large scale genomic rearrangements

Molecular characters: multiple sequence alignment

Some methods:
1. sum-all-pairs method: count the cost of aligning all pairs of sequences and select the alignment that minimizes the total cost
2. star alignment: alignment based on a tree that assumes all sequences are equally related
3. tree alignment: uses “known” information about relationships of sequences (lineages) to guide the alignment

Take the course called “Bioinformatics” (BIOC 4010 / BIOL 4041) to learn more about alignments.

Species 1   … T A G …
Species 2   … T A G …
Species 3   … T A A …
Species 4   … T C A …
Species 5   … C C A …
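The sum-all-pairs idea can be sketched in a few lines of Python. This is only an illustration of the scoring step, not of the search for the best alignment. The cost scheme (match 0, mismatch 1, residue against gap 2) and the function names are assumptions made for the example; the slides do not specify them.

```python
from itertools import combinations

# Hypothetical cost scheme (not from the slides):
# match = 0, mismatch = 1, residue aligned against a gap = 2.
MISMATCH_COST = 1
GAP_COST = 2


def pair_cost(a, b):
    """Cost of aligning two character states in one column."""
    if a == b:
        return 0  # identical states (including gap against gap) cost nothing
    if a == "-" or b == "-":
        return GAP_COST
    return MISMATCH_COST


def sum_of_pairs_cost(alignment):
    """Total cost over every pair of sequences in every column."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            total += pair_cost(a, b)
    return total


# The five-species toy alignment shown above (Species 1 to Species 5).
toy = ["TAG", "TAG", "TAA", "TCA", "CCA"]
print(sum_of_pairs_cost(toy))
```

An alignment program using this criterion would compute such a cost for many candidate alignments and keep the one with the lowest total.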

Molecular characters: DNA alignment

Alignment of the nucleotide character states of the β-globin gene from five species of mammals

human    GTG CTG TCT CCT GCC GAC AAG ACC AAC GTC AAG GCC GCC TGG GGC AAG GTT GGC GCG CAC
cow      ... ... ... G.C ... ... ... T.. ..T ... ... ... ... ... ... ... ... ... .GC A..
rabbit   ... ... ... ..C ..T ... ... ... ... A.. ... A.T ... ... .AA ... A.C ... AGC ...
rat      ... ..C ... G.A .AT ... ..A ... ... A.. ... AA. TG. ... ..G ... A.. ..T .GC ..T
opossum  ... ..C ..G GA. ..T ... ... ..T C.. ..G ..A ... AT. ... ..T ... ..G ..A .GC ...

human    GCT GGC GAG TAT GGT GCG GAG GCC CTG GAG AGG ATG TTC CTG TCC TTC CCC ACC ACC AAG
cow      ... ..A .CT ... ..C ..A ... ..T ... ... ... ... ... ... AG. ... ... ... ... ...
rabbit   .G. ... ... ... ..C ..C ... ... G.. ... ... ... ... T.. GG. ... ... ... ... ...
rat      .G. ..T ..A ... ..C .A. ... ... ..A C.. ... ... ... GCT G.. ... ... ... ... ...
opossum  ..C ..T .CC ..C .CA ..T ..A ..T ..T .CC ..A .CC ... ..C ... ... ... ..T ... ..A

human    ACC TAC TTC CCG CAC TTC GAC CTG AGC CAC GGC TCT GCC CAG GTT AAG GGC CAC GGC AAG
cow      ... ... ... ..C ... ... ... ... ... ... ... ..G ... ... ..C ... ... ... ... G..
rabbit   ... ... ... ..C ... ... ... T.C .C. ... ... ... .AG ... A.C ..A .C. ... ... ...
rat      ... ... ... T.T ... A.T ..T G.A ... .C. ... ... ... ... ..C ... .CT ... ... ...
opossum  ..T ... ... ..C ... ... ... ... TC. .C. ... ..C ... ... A.C C.. ..T ..T ..T ...

The order of DNA sequences in the alignment is specified by the order of the taxa in the list. To fit it on the page, the alignment is broken into three parts; such alignments are called INTERLEAVED. The complete DNA sequence is shown for the first taxon (human). All the other sequences are shown relative to human, with the dot, “.”, signifying a match in the character state with the human sequence. Differences are indicated by using the single-letter nucleotide code (A, C, T or G). Note that this alignment could also be analyzed by using distance, likelihood, and Bayesian methods.

Positional homology is always assumed when constructing alignments.
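The dot notation is easy to undo in code. The sketch below restores a full sequence from a row written relative to the reference (human) row; the function name `expand_dots` is made up for this example, and the input is just the first ten codons of the human and cow rows.

```python
def expand_dots(reference, dotted):
    """Replace each '.' in a dotted row with the state from the reference row."""
    ref = reference.replace(" ", "")
    row = dotted.replace(" ", "")
    if len(ref) != len(row):
        raise ValueError("reference and dotted row must cover the same sites")
    # A dot means "same state as the reference"; anything else is a difference.
    return "".join(r if d == "." else d for r, d in zip(ref, row))


# First ten codons of the human and cow rows from the alignment above.
human = "GTG CTG TCT CCT GCC GAC AAG ACC AAC GTC"
cow = "... ... ... G.C ... ... ... T.. ..T ..."

print(expand_dots(human, cow))  # GTGCTGTCTGCCGCCGACAAGTCCAATGTC
```

Most analysis programs expect the full sequences, so a step like this is needed before a dotted alignment can be used as input.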

Molecular characters: presence-absence data matrix (easy)

Hypothetical presence-absence data matrix for a diversity of molecular characters

Species 1   1 0 0 1 1 0 1 0 1 0 1 0 0 0 0 1 0 1 1 1 0 1 1 1 0 0
Species 2   1 0 1 0 1 1 1 0 0 0 1 0 0 1 0 0 1 1 1 1 0 1 0 1 0 1
Species 3   1 1 0 0 1 0 1 1 1 0 0 0 0 1 1 0 0 1 1 1 0 1 0 1 0 1
Species 4   1 1 0 0 1 0 1 1 1 0 0 0 0 1 0 1 0 1 1 1 0 1 0 1 0 1
Species 5   1 0 1 0 1 1 1 0 0 0 1 0 0 1 0 0 0 1 1 1 0 1 0 1 1 1

The columns represent four kinds of characters:
• Amino acid INDELS in 8 different genes
• Pseudogene / functional gene
• Presence / absence of transposon elements at 8 different genomic locations
• Tandem gene duplication events

Molecular characters: positional homology of gaps is a real pain in the ass

                    10         20         30         40         50         60
            ....|....| ....|....| ....|....| ....|....| ....|....| ....|....|
Mus2.FAS    MTTPALLPLS -----GRRIP PLNL--GPP- ----SFPHHR ATLRLSEKFI LLLILSAFIT
Human_GIA   ---------- ---------- ---------- -----MNSNF ITFDLKMSLL PSNLFSAFIT
Human_GIB   MTTPALLPLS -----GRRIP PLNL--GPP- ----SFPHHR ATLRLSEKFI LLLILSAFIT
Mus_GIA     MPVGGLLPLF SSPGGGGLGS GLGGGLGGG- ----RKGSGP AAFRLTEKFV LLLVFSAFIT
Rabbit_GIA  ---------- ---------- ---------- ---------- ---------- ----------
Sus_GIA     MPVGGLLPLF SSPAGGGLGG GLGGGLGGGG GGGGRKGSGP SAFRLTEKFV LLLVFSAFIT

                    70         80         90        100        110        120
            ....|....| ....|....| ....|....| ....|....| ....|....| ....|....|
Mus2.FAS    LCFGAFFFLP DSSKHKRFDL G-LEDVLIPH VDAGKG---- AKNPGVFLIH GPDEHRHREE
Human_GIA   LCFGAIFFLP DSSKLLSGVL FHSSPALQPA ADHKPGPGAR AEDAAEGRAR RREEGAPGDP
Human_GIB   LCFGAFFFLP DSSKHKRFDL G-LEDVLIPH VDAGKG---- AKNPGVFLIH GPDEHRHREE
Mus_GIA     LCFGAIFFLP DSSKLLSGVL FHSNPALQPP AEHKPGLGAR AEDAAEGRVR HREEGAPGDP
Rabbit_GIA  ---------- ---------- ---------- ---------- AEDAADGRAR PGEEGAPGDP
Sus_GIA     LCFGAIFFLP DSSKLLSGVL FHSSPALQPA ADHKPGPGAR AEDAADGRAR PGEEGAPGDP

An alignment of real amino acid sequences of the mannosidase protein
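One reason the presence-absence matrix is the "easy" case is that every column is already a comparable character, so species can be compared directly. As a rough sketch, the Python below counts the characters at which two species differ. The dictionary layout and the function name `mismatches` are assumptions for illustration, and counting mismatches is only one simple way to summarize such data.

```python
from itertools import combinations

# The hypothetical presence-absence matrix from the slide (26 characters per species).
matrix = {
    "Species 1": "1 0 0 1 1 0 1 0 1 0 1 0 0 0 0 1 0 1 1 1 0 1 1 1 0 0",
    "Species 2": "1 0 1 0 1 1 1 0 0 0 1 0 0 1 0 0 1 1 1 1 0 1 0 1 0 1",
    "Species 3": "1 1 0 0 1 0 1 1 1 0 0 0 0 1 1 0 0 1 1 1 0 1 0 1 0 1",
    "Species 4": "1 1 0 0 1 0 1 1 1 0 0 0 0 1 0 1 0 1 1 1 0 1 0 1 0 1",
    "Species 5": "1 0 1 0 1 1 1 0 0 0 1 0 0 1 0 0 0 1 1 1 0 1 0 1 1 1",
}
rows = {name: [int(x) for x in states.split()] for name, states in matrix.items()}


def mismatches(a, b):
    """Number of characters at which two species have different states."""
    if len(a) != len(b):
        raise ValueError("rows must score the same characters")
    return sum(x != y for x, y in zip(a, b))


for first, second in combinations(rows, 2):
    print(first, second, mismatches(rows[first], rows[second]))
```

Species 3 and 4, and Species 2 and 5, each differ at only two characters, which is the kind of signal a tree-building method would pick up from this matrix.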

Molecular characters: multiple sequence alignment

• software is far from flawless (many different methods)
• all alignments must be inspected “by eye”
• any manual adjustments (by eye) introduce subjectivity
• one solution is to publish alignments:
    • public database
    • in scientific paper
    • supplementary online materials of a scientific journal

Molecular phylogenetics: methods

We divide methods up by two criteria (data and method):

Type of data:
1. characters: discrete character states at positionally homologous sites in a multiple sequence alignment (hence, discrete character methods)
2. distances: evolutionary distance, measured in average numbers of substitutions per positionally homologous site, between all pairs of taxa (hence, distance methods)

Species 1   … T A G …
Species 2   … T A G …
Species 3   … T A A …
Species 4   … T C A …
Species 5   … C C A …
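To make the character/distance distinction concrete, the sketch below turns the five-species character alignment into a matrix of pairwise distances. It uses the uncorrected proportion of differing sites (often called the p-distance), which is only the simplest possible choice. Real distance methods usually apply a substitution model to correct for multiple changes at the same site. The function and variable names are assumptions for this example.

```python
def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ (no correction)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Character data: the toy alignment shown above.
alignment = {
    "Species 1": "TAG",
    "Species 2": "TAG",
    "Species 3": "TAA",
    "Species 4": "TCA",
    "Species 5": "CCA",
}

# Distance data: one number for every pair of taxa.
names = list(alignment)
distances = [[p_distance(alignment[i], alignment[j]) for j in names] for i in names]

for name, row in zip(names, distances):
    print(name, " ".join(f"{d:.2f}" for d in row))
```

Once the matrix is built, the individual character states are no longer used; a distance method works from the pairwise numbers alone.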

Type of tree-building:
1. clustering algorithm: computationally “build a tree” according to a specific set of “steps”.
2. optimality criterion: a criterion for scoring a tree and comparing different trees with the goal of finding the tree with the best (optimal) score [also called objective function].

Molecular phylogenetics: most common methods

                          Type of data
Tree-building method      Character                  Distance
Clustering algorithm      -                          UPGMA
                                                     Neighbor-joining (NJ)
Optimality criterion      Maximum parsimony (MP)     Least squares
                          Maximum likelihood (ML)    Minimum evolution (ME)
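An optimality criterion needs a way to score any given tree. As a minimal sketch for maximum parsimony, the Python below uses Fitch's algorithm to count the minimum number of state changes a rooted binary tree requires, then compares two candidate trees. The slides do not name a scoring algorithm, so Fitch is an assumption here, and both tree topologies are hypothetical examples built on the five-species toy alignment.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, changes needed) for one site.

    A tree is either a taxon name (leaf) or a pair (left subtree, right subtree).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost  # no change needed at this node
    return left_set | right_set, left_cost + right_cost + 1  # one change needed


def parsimony_score(tree, alignment):
    """Sum the Fitch change counts over all sites of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        site = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, site)[1]
    return total


alignment = {"S1": "TAG", "S2": "TAG", "S3": "TAA", "S4": "TCA", "S5": "CCA"}

# Two hypothetical candidate trees, written as nested pairs.
tree_a = (("S1", "S2"), ("S3", ("S4", "S5")))
tree_b = (("S1", "S5"), ("S2", ("S3", "S4")))

print(parsimony_score(tree_a, alignment))  # 3
print(parsimony_score(tree_b, alignment))  # 5
```

Under the parsimony criterion the tree with the lower score is preferred, so `tree_a` beats `tree_b` here. A full analysis would score many more trees; a clustering algorithm such as UPGMA or NJ, by contrast, builds a single tree step by step without comparing alternatives.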
