

SLIDE 1

1/27/09 1

CSCI1950-Z Computational Methods for Biology, Lecture 2

Ben Raphael, January 26, 2009

http://cs.brown.edu/courses/csci1950-z/

Outline

  • Review of trees. Counting features.
  • Character-based phylogeny
    – Maximum parsimony
    – Maximum likelihood

SLIDE 2

Tree Definitions

graph: A set V of vertices (nodes) and a set E of edges, where each edge (vi, vj) connects a pair of vertices.

tree: A connected acyclic graph G = (V, E).

A path in G is a sequence (v1, v2, …, vn) of vertices in V such that each (vi, vi+1) is an edge in E. A graph is connected provided that for every pair vi, vj of vertices there is a path between vi and vj. A cycle is a path with the same starting and ending vertices. A graph is acyclic provided it has no cycles.

Tree Definitions

The degree of a vertex v is the number of edges incident to v. A phylogenetic tree is a tree with a label for each leaf (vertex of degree one). A binary phylogenetic tree is a phylogenetic tree in which every interior (non-leaf) vertex has degree 3 (one parent and two children). A rooted (binary) phylogenetic tree is a phylogenetic tree with a single designated vertex r (of degree 2 in the binary case). w is a parent (ancestor) of v provided (v, w) is on the path from v to the root; in this case v is a child (descendant) of w.

SLIDE 3

Tree Definitions

tree: A connected acyclic graph G = (V, E). The degree of a vertex v is the number of edges incident to v. A phylogenetic tree is a tree with a label for each leaf (vertex of degree one).

  • Leaves represent existing species.
  • Other vertices represent most recent common ancestors.
  • Lengths of branches represent evolutionary time.
  • The root (if present) represents the oldest evolutionary ancestor.

Counting and Trees

  • A tree with n vertices has n-1 edges. (Proof?)
  • A rooted binary phylogenetic tree with n leaves has n-1 internal vertices, and thus 2n-1 total vertices.
  • How many rooted binary phylogenetic trees with n leaves?
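The closed-form answer to the last question is a standard fact (not derived on this slide): there are (2n-3)!! = 1 · 3 · 5 ··· (2n-3) rooted binary phylogenetic trees on n labeled leaves, since the k-th leaf can be attached at any of the 2k-3 positions of a (k-1)-leaf tree (its 2k-4 edges plus the position above the root). A quick sketch:

```python
def num_rooted_binary_trees(n):
    """(2n-3)!! = 1 * 3 * 5 * ... * (2n-3) rooted binary phylogenetic
    trees on n >= 2 labeled leaves."""
    count = 1
    for k in range(3, n + 1):      # attach leaves 3..n one at a time
        count *= 2 * k - 3         # 2k-3 possible attachment points
    return count
```

The count grows super-exponentially, which is why the "over all possible trees" problem later in the lecture is hard.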

SLIDE 4

Character-based Phylogenetic Tree Reconstruction

  1. What is character data?
  2. What is the criterion for evaluating a tree?
  3. How do we optimize this criterion:
     1. over all possible trees?
     2. over a restricted class of trees?

[Diagram: Input (characters: molecular, morphological) → Algorithm → Output (optimal phylogenetic tree)]

Character-Based Tree Reconstruction

  • Characters may be nucleotides of DNA (A, G, C, T) or amino acids (20-letter alphabet).
  • Values are called states of the character.
  • Characters may be morphological features: # of eyes or legs, or the shape of a beak or a fin.

Gorilla:     CCTGTGACGTAACAAACGA
Chimpanzee:  CCTGTGACGTAGCAAACGA
Human:       CCTGTGACGTAGCAAACGA

[Slide annotations mark one column as a 2-state character and another as a non-informative character]

SLIDE 5

Character-Based Tree Reconstruction

GOAL: determine what character strings at internal nodes would best explain the character strings for the n observed species

An Example

Character   Value 1   Value 2
Mouth       Smile     Frown
Eyebrows    Normal    Pointed

SLIDE 6

Character-Based Tree Reconstruction

Which tree is better?

Character-Based Tree Reconstruction

Count the changes on the tree

SLIDE 7

Character-Based Tree Reconstruction

Maximum Parsimony: minimize the number of changes on the edges of the tree

Maximum Parsimony

  • Ockham's razor: the "simplest" explanation for the data
  • Assumes that observed character differences resulted from the fewest possible mutations
  • Seeks the tree with the lowest possible parsimony score, defined as the sum of the costs of all mutations found in the tree

SLIDE 8

Character Matrix

Given n species, each labeled by m characters. Each character has k possible states. This gives an n x m character matrix. Assume that the characters in a character string are independent.

Gorilla:     CCTGTGACGTAACAAACGA
Chimpanzee:  CCTGTGACGTAGCAAACGA
Human:       CCTGTGACGTAGCAAACGA

Parsimony Score

Assume that the characters in a character string are independent. Given character strings S = s1…sm and T = t1…tm:

#changes(S → T) = Σi dH(si, ti)

where dH is the Hamming distance:
dH(v, w) = 0 if v = w, 1 otherwise.

The parsimony score of a tree is the sum of the lengths (weights) of its edges.

Gorilla:     CCTGTGACGTAACAAACGA
Chimpanzee:  CCTGTGACGTAGCAAACGA
Human:       CCTGTGACGTAGCAAACGA
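Since characters are independent, the number of changes between two aligned strings is just a coordinate-wise Hamming distance; a minimal sketch using the sequences above:

```python
def hamming(s, t):
    """Number of positions at which aligned strings s and t differ."""
    assert len(s) == len(t), "strings must be aligned to equal length"
    return sum(1 for a, b in zip(s, t) if a != b)

gorilla    = "CCTGTGACGTAACAAACGA"
chimpanzee = "CCTGTGACGTAGCAAACGA"
human      = "CCTGTGACGTAGCAAACGA"
```

Here gorilla differs from human at exactly one position (the A/G column), while chimpanzee and human are identical.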

SLIDE 9

Parsimony and Tree Reconstruction

Maximum Parsimony

Two computational sub-problems:

  1. Find the parsimony score for a fixed tree.
     – Small Parsimony Problem (easy)
  2. Find the lowest parsimony score over all trees with n leaves.
     – Large Parsimony Problem (hard)

SLIDE 10

Small Parsimony Problem

Input: Tree T with each leaf labeled by an m-character string.
Output: Labeling of the internal vertices of the tree T minimizing the parsimony score.

Since characters are independent, we may assume every leaf is labeled by a single character.

Small Parsimony Problem

Input: T: tree with each leaf labeled by an m-character string.
Output: Labeling of the internal vertices of the tree T minimizing the parsimony score.

Large Parsimony Problem

Input: M: an n x m character matrix.
Output: A tree T with:
  • n leaves labeled by the n rows of matrix M
  • a labeling of the internal vertices of T minimizing the parsimony score over all possible trees and all possible labelings of internal vertices

SLIDE 11

Small Parsimony Problem

Input: Binary tree T with each leaf labeled by an m-character string.
Output: Labeling of the internal vertices of the tree T minimizing the parsimony score.

Since characters are independent, we may assume every leaf is labeled by a single character.

Weighted Small Parsimony Problem

A more general version of the Small Parsimony Problem:

  • The input includes a k x k scoring matrix δ describing the cost of transforming each of the k states into any other state.
  • The Small Parsimony Problem is the special case:
    δij = 0 if i = j, 1 otherwise.

SLIDE 12

Scoring Matrices

Small Parsimony Problem:

    A  T  G  C
A   0  1  1  1
T   1  0  1  1
G   1  1  0  1
C   1  1  1  0

Weighted Small Parsimony Problem:

    A  T  G  C
A   0  3  4  9
T   3  0  2  4
G   4  2  0  4
C   9  4  4  0

Unweighted vs. Weighted

Small Parsimony Scoring Matrix:

    A  T  G  C
A   0  1  1  1
T   1  0  1  1
G   1  1  0  1
C   1  1  1  0

Small Parsimony Score: 5

SLIDE 13

Unweighted vs. Weighted

Weighted Parsimony Scoring Matrix:

    A  T  G  C
A   0  3  4  9
T   3  0  2  4
G   4  2  0  4
C   9  4  4  0

Weighted Parsimony Score: 22

Weighted Small Parsimony Problem

Input: T: tree with each leaf labeled by an m-character string from a k-letter alphabet; δ: a k x k scoring matrix.
Output: Labeling of the internal vertices of the tree T minimizing the weighted parsimony score.

SLIDE 14

Sankoff Algorithm

Calculate and keep track of a score for every possible label at each vertex:

st(v) = minimum parsimony score of the subtree rooted at vertex v, if v has character t

Sankoff Algorithm

st(v) = minimum parsimony score of the subtree rooted at vertex v, if v has character t

The score st(v) is based only on the scores of v's children:

st(parent) = mini { si(left child) + δi,t } + minj { sj(right child) + δj,t }

SLIDE 15

Sankoff Algorithm (cont.)

  • Begin at the leaves:
    – If the leaf has the character in question, the score is 0
    – Else, the score is ∞

Sankoff Algorithm (cont.)

st(v) = mini { si(u) + δi,t } + minj { sj(w) + δj,t }

sA(v) = mini { si(u) + δi,A } + minj { sj(w) + δj,A }

[Worked example on slide: a table of si(u), δi,A, and their sum for each state i of the left child u]

SLIDE 16

Sankoff Algorithm (cont.)

st(v) = mini { si(u) + δi,t } + minj { sj(w) + δj,t }

sA(v) = mini { si(u) + δi,A } + minj { sj(w) + δj,A }

[Worked example continues: the corresponding score table for the right child w]

Sankoff Algorithm (cont.)

st(v) = mini { si(u) + δi,t } + minj { sj(w) + δj,t }

Repeat for T, G, and C

SLIDE 17

Sankoff Algorithm (cont.)

Repeat for right subtree

Sankoff Algorithm (cont.)

Repeat for root

SLIDE 18

Sankoff Algorithm (cont.)

The smallest score at the root is the minimum weighted parsimony score.

In this case it is 9, so label the root with T.

Sankoff Algorithm: Traveling down the Tree

  • The scores at the root vertex have been computed by going up the tree.
  • After the scores at the root vertex are computed, the Sankoff algorithm moves down the tree and assigns each vertex its optimal character.
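The bottom-up pass can be sketched compactly, with a hypothetical tree given as nested pairs and the weighted scoring matrix from these slides (this example tree is illustrative, not the one drawn on the slides):

```python
import math

STATES = "ATGC"
# Weighted scoring matrix from the slides (rows/columns in A, T, G, C order)
COSTS = [[0, 3, 4, 9],
         [3, 0, 2, 4],
         [4, 2, 0, 4],
         [9, 4, 4, 0]]
DELTA = {(a, b): COSTS[i][j]
         for i, a in enumerate(STATES) for j, b in enumerate(STATES)}

def sankoff(tree):
    """Bottom-up pass: dict t -> s_t(v), the minimum parsimony score of
    the subtree rooted at v given that v is assigned character t."""
    if isinstance(tree, str):          # leaf: 0 for its own character, inf otherwise
        return {t: (0 if t == tree else math.inf) for t in STATES}
    sl, sr = sankoff(tree[0]), sankoff(tree[1])
    return {t: min(sl[i] + DELTA[i, t] for i in STATES)
             + min(sr[j] + DELTA[j, t] for j in STATES)
            for t in STATES}

# Illustrative tree ((A,T),(G,C)); the minimum over the root's scores
# is the weighted parsimony score of the tree.
root = sankoff((("A", "T"), ("G", "C")))
```

A traceback that records the minimizing states i and j at each vertex then recovers the optimal internal labels, as described above.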

SLIDE 19

Sankoff Algorithm (cont.)

9 is derived from 7 + 2, so the left child is labeled T and the right child is labeled T.

Sankoff Algorithm (cont.)

And the tree is thus labeled…

SLIDE 20

Analysis of Sankoff's Algorithm

A dynamic programming algorithm:

Optimal substructure: the solution is obtained by solving smaller problems of the same type:
st(parent) = mini { si(left child) + δi,t } + minj { sj(right child) + δj,t }

The recurrence terminates at the leaves, where the solution is known.

Analysis of Sankoff's Algorithm

How many computations do we perform for n species, m characters, and k states per character?

Forward step:
  • At each internal node of the tree:
    st(parent) = mini { si(left child) + δi,t } + minj { sj(right child) + δj,t }
  • 2k sums and 2(k-1) comparisons = 4k - 2 operations
  • n-1 internal nodes
  • (4k - 2)(n - 1) operations

Traceback: one "lookup" per internal node: (n-1) operations.

For each character: (4k - 2)(n-1) + (n-1) operations ≤ C n k

  • The above calculation is performed once for each character:
    ≤ C m n k operations
  • O(mnk) time. ["big-O"]
  • Increases linearly with the # of species or # of characters.

SLIDE 21

Analysis of Sankoff's Algorithm

How many computations do we perform for n species, m characters, and k states per character?

Traceback: 2k sums

  • The above calculation is performed once for each character
  • O(mnk) time. ["big-O"]
  • Increases linearly with the # of species or # of characters.

Fitch's Algorithm

  • Solves the Small Parsimony Problem
    – Published 4 years before Sankoff (Fitch, 1971)
  • Makes two passes through the tree:
    – Leaves → root
    – Root → leaves

SLIDE 22

Fitch Algorithm: Step 1

Assign a set S(v) of letters to every vertex v in the tree, traversing the tree from leaves to root.

  • S(l) = observed character for each leaf l
  • For a vertex v with children u and w:
    S(v) = S(u) ∩ S(w) if the intersection is non-empty, S(u) ∪ S(w) otherwise
  • E.g., if a vertex has a left child labeled {A, C} and a right child labeled {A, T}, the intersection {A} is non-empty, so the vertex is labeled {A}; if the children were labeled {G, C} and {A, T}, the empty intersection would give the union {A, C, G, T}.

Fitch's Algorithm: Example

[Tree figure: leaves labeled with single characters (a, c, t); internal vertices carry sets such as {t,a} and {a,c} from the intersection/union rule]

SLIDE 23

Fitch Algorithm: Step 2

Assign a label to each vertex, traversing the tree from root to leaves.

  • Assign the root r a label arbitrarily from its set S(r)
  • For all other vertices v:
    – If its parent's label is in its set S(v), assign it its parent's label
    – Else, choose an arbitrary letter from its set S(v) as its label
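The leaves-to-root pass, together with the mutation count it implies, fits in a few lines; a sketch with trees encoded as nested pairs (a hypothetical encoding, not from the slides):

```python
def fitch(tree):
    """Leaves-to-root pass: return (S, score), where S is the candidate
    set at the root of `tree` and score counts the mutations charged."""
    if isinstance(tree, str):              # leaf labeled by one character
        return {tree}, 0
    (su, cu), (sw, cw) = fitch(tree[0]), fitch(tree[1])
    inter = su & sw
    if inter:                              # non-empty intersection: no change charged
        return inter, cu + cw
    return su | sw, cu + cw + 1            # empty intersection: union, one change

# Small illustrative tree: ((a,c),(a,t))
sets, score = fitch((("a", "c"), ("a", "t")))
```

The root-to-leaves pass then fixes one label per vertex, preferring the parent's label whenever it lies in S(v).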

Fitch's Algorithm: Example

[Tree figure: the example tree with a final single-character label assigned to every vertex in the top-down pass]

SLIDE 24

Fitch Algorithm (cont.)

Fitch vs. Sankoff

  • Both have an O(nk) runtime
  • Are they actually different?
  • Let's compare…

SLIDE 25

Fitch

As seen previously: [worked Fitch example figure]

Comparison of Fitch and Sankoff

  • As seen earlier, the scoring matrix for the Fitch algorithm is simply:

        A  T  G  C
    A   0  1  1  1
    T   1  0  1  1
    G   1  1  0  1
    C   1  1  1  0

  • So let's do the same problem using the Sankoff algorithm and this scoring matrix

SLIDE 26

Sankoff vs. Fitch

  • The Sankoff algorithm gives the same set of optimal labels as the Fitch algorithm.
  • For the Sankoff algorithm, character t is optimal for vertex v if st(v) = min1≤i≤k si(v).
  • Let Sv = the set of optimal letters for v. Then
    Sv = Su ∩ Sw if Su ∩ Sw ≠ ∅, Su ∪ Sw otherwise.
  • This is also the Fitch recurrence.
  • The two algorithms are identical.
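The claimed equivalence is easy to check numerically: run Sankoff with unit costs and compare its set of optimal root characters with the Fitch set. A sketch on a small hypothetical tree:

```python
import math

STATES = "acgt"

def sankoff(tree):
    """Unit-cost Sankoff bottom-up pass: dict t -> s_t(v)."""
    if isinstance(tree, str):
        return {t: (0 if t == tree else math.inf) for t in STATES}
    sl, sr = sankoff(tree[0]), sankoff(tree[1])
    return {t: min(sl[i] + (i != t) for i in STATES)      # bool adds as 0/1
             + min(sr[j] + (j != t) for j in STATES)
            for t in STATES}

def fitch_sets(tree):
    """Fitch bottom-up pass: candidate set at the root of `tree`."""
    if isinstance(tree, str):
        return {tree}
    su, sw = fitch_sets(tree[0]), fitch_sets(tree[1])
    return (su & sw) or (su | sw)       # intersection if non-empty, else union

tree = (("a", "c"), ("a", "t"))
scores = sankoff(tree)
optimal = {t for t in STATES if scores[t] == min(scores.values())}
```

On this tree both passes agree: the Sankoff argmin set at the root equals the Fitch candidate set.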

SLIDE 27

A Problem with Parsimony

Parsimony ignores branch lengths on trees.

[Figure: two trees with the same leaf labels (A, A, A, C) and internal A labels, but with the single change placed on branches of different lengths]

Both trees have the same parsimony score, but a mutation is "more likely" on the longer branch.

Probabilistic Model

Given a tree T with leaves labeled by present-day characters, what is the probability of a labeling of the ancestral nodes?

Assume:
  1. Characters evolve independently.
  2. Constant rate of mutation on each branch.
  3. The state of a vertex depends only on its parent and the branch length:
     i.e., Pr[x | y, t] depends only on y and t. (Markov process)

Pr[x | y, t] = probability that y mutates to x in time t

SLIDE 28

Probabilistic Model

Two species:

Pr[x1, x2, a | T, t1, t2] = qa Pr[x1 | a, t1] Pr[x2 | a, t2]

where:
  T = tree topology
  x1, x2 : characters for each species
  a : character for the ancestor
  qa = Pr[ancestor has character a]
  Pr[x | y, t] = probability that y mutates to x in time t

Probabilistic Model

n species: x1, x2, …, xn. Let α(i) = the ancestor (parent) of node i. Let an+1, an+2, …, a2n-1 be the characters on the internal nodes, where nodes are numbered from the internal vertices up to the root.

Pr[x1, …, xn | T, t1, …, t2n-2] =
    Σ_{an+1,…,a2n-1} q_{a2n-1} Π_{i=n+1}^{2n-2} Pr[ai | aα(i), ti] Π_{i=1}^{n} Pr[xi | aα(i), ti]

This follows from the Law of Total Probability: P(X) = Σi P(X | Yi) P(Yi).

SLIDE 29

Felsenstein's Algorithm

Let Pr[Tk | a] = probability of the leaf nodes "below" node k, given ak = a. Compute via dynamic programming:

Pr[Tk | a] = ( Σ_b Pr[b | a, ti] Pr[Ti | b] ) · ( Σ_c Pr[c | a, tj] Pr[Tj | c] )

where i and j are the children of node k.

Initial conditions: for k = 1, …, n (leaf nodes):
Pr[Tk | a] = 1 if a = xk, 0 otherwise.

Computing the Likelihood

Let Pr[Tk | a] = probability of the leaf nodes "below" node k, given ak = a.

Pr[x1, …, xn | T, t*] = Σ_a Pr[T2n-1 | a] qa

Note: the root is node 2n-1.
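The pruning recursion is a direct translation of the formulas above. The sketch below assumes a Jukes-Cantor substitution model for Pr[x | y, t] and uniform root frequencies qa = 1/4; the slides only require some Markov process, so both choices are illustrative assumptions:

```python
import math

def jc_prob(x, y, t, mu=1.0):
    """Pr[x | y, t] under the Jukes-Cantor model (an assumed model;
    the slides only require a Markov transition probability)."""
    e = math.exp(-4.0 * mu * t / 3.0)
    return 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e

def prune(tree, states="ACGT"):
    """Felsenstein's pruning recursion: dict a -> Pr[T_k | a].
    Internal nodes are ((left, t_left), (right, t_right)); leaves are strings."""
    if isinstance(tree, str):                       # leaf: observed character
        return {a: 1.0 if a == tree else 0.0 for a in states}
    (left, t1), (right, t2) = tree
    pl, pr = prune(left, states), prune(right, states)
    return {a: sum(jc_prob(b, a, t1) * pl[b] for b in states)
             * sum(jc_prob(c, a, t2) * pr[c] for c in states)
            for a in states}

def likelihood(tree, states="ACGT"):
    """Pr[x1, ..., xn | T, t*] = sum_a Pr[T_root | a] q_a, uniform q_a."""
    root = prune(tree, states)
    return sum(0.25 * root[a] for a in states)

lik_same = likelihood((("A", 0.1), ("A", 0.1)))   # two identical leaves
lik_diff = likelihood((("A", 0.1), ("C", 0.1)))   # two differing leaves
```

As expected, with short branches the identical-leaf pattern is much more likely than the differing-leaf pattern.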

SLIDE 30

Maximum Likelihood

Let Pr[Tk | a] = probability of the leaf nodes "below" node k, given ak = a. Traceback as before, as in Sankoff's algorithm.

Pr[Tk | a] = ( max_b Pr[b | a, ti] Pr[Ti | b] ) · ( max_c Pr[c | a, tj] Pr[Tj | c] )

Max. Parsimony vs. Max. Likelihood

  • Set δij = -log P(j | i) in weighted parsimony (Sankoff algorithm)
  • Weighted parsimony then produces "maximum probability" assignments, ignoring branch lengths
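The correspondence rests on the fact that -log turns products into sums: minimizing Σ -log P is the same as maximizing Π P. A tiny check with hypothetical per-edge probabilities:

```python
import math

# Hypothetical substitution probabilities along the edges of some labeling
probs = [0.9, 0.05, 0.9]

# Sankoff costs delta_ij = -log P(j | i): minimizing the summed cost...
costs = [-math.log(p) for p in probs]

# ...is equivalent to maximizing the product of the probabilities,
# since exp(-sum(costs)) recovers that product exactly.
product = math.prod(probs)
```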