Introduction to characters and parsimony analysis



slide-1
SLIDE 1

Introduction to characters and parsimony analysis

slide-2
SLIDE 2

Genetic Relationships

  • Genetic relationships exist between individuals within populations
  • These include ancestor-descendant relationships and more indirect relationships based on common ancestry
  • Within sexually reproducing populations there is a network of relationships
  • Genetic relations within populations can be measured with a coefficient of genetic relatedness

slide-3
SLIDE 3

Phylogenetic Relationships

  • Phylogenetic relationships exist between lineages (e.g. species, genes)
  • These include ancestor-descendant relationships and more indirect relationships based on common ancestry
  • Phylogenetic relationships between species or lineages are (expected to be) tree-like
  • Phylogenetic relationships are not measured with a simple coefficient

slide-4
SLIDE 4

Phylogenetic Relationships

  • Traditionally phylogeny reconstruction was dominated by the search for ancestors, and ancestor-descendant relationships
  • In modern phylogenetics there is an emphasis on indirect relationships
  • Given that all lineages are related, closeness of phylogenetic relationships is a relative concept

slide-5
SLIDE 5

Phylogenetic relationships

  • Two lineages are more closely related to each other than to some other lineage if they share a more recent common ancestor - this is the cladistic concept of relationships
  • Phylogenetic hypotheses are hypotheses of common ancestry

[Figure: tree of Frog, Toad and Oak showing the grouping (Frog,Toad)Oak, with the hypothetical ancestral lineage marked]

slide-6
SLIDE 6

Phylogenetic Trees

[Figure: a cladogram of taxa A-J, labelling the root, the leaves, terminal branches, interior branches, node 1, node 2 and a polytomy]

slide-7
SLIDE 7

CLADOGRAMS AND PHYLOGRAMS

[Figure: two trees of taxa A-J, a cladogram and a phylogram. Axis labels: relative time; absolute time or divergence]

slide-8
SLIDE 8

Trees - Rooted and Unrooted

[Figure: rooted and unrooted trees of taxa A-J, with the root marked on the rooted trees]

slide-9
SLIDE 9

Characters and Character States

  • Organisms comprise sets of features
  • When organisms/taxa differ with respect to a feature (e.g. its presence or absence, or different nucleotide bases at specific sites in a sequence) the different conditions are called character states
  • The collection of character states with respect to a feature constitutes a character

slide-10
SLIDE 10

Character evolution

  • Heritable changes (in morphology, gene sequences, etc.) produce different character states
  • Similarities and differences in character states provide the basis for inferring phylogeny (i.e. provide evidence of relationships)
  • The utility of this evidence depends on how often the evolutionary changes that produce the different character states occur independently

slide-11
SLIDE 11

Unique and unreversed characters

  • Given a heritable evolutionary change that is unique and unreversed (e.g. the origin of hair) in an ancestral species, the presence of the novel character state in any taxon must be due to inheritance from the ancestor
  • Similarly, absence in any taxon must be because the taxon is not a descendant of that ancestor
  • The novelty is a homology acting as a badge or marker for the descendants of the ancestor
  • The taxa with the novelty are a clade (e.g. Mammalia)
slide-12
SLIDE 12

Unique and unreversed characters

  • Because hair evolved only once and is unreversed (not subsequently lost) it is homologous and provides unambiguous evidence of relationships

[Figure: tree of Lizard, Frog, Human and Dog with HAIR (absent/present) mapped as a single change or step]
slide-13
SLIDE 13
Homoplasy - Independent evolution

  • Homoplasy is similarity that is not homologous (not due to common ancestry)
  • It is the result of independent evolution (convergence, parallelism, reversal)
  • Homoplasy can provide misleading evidence of phylogenetic relationships (if mistakenly interpreted as homology)

slide-14
SLIDE 14

Homoplasy - independent evolution

[Figure: tree of Human, Lizard, Frog and Dog with TAIL (adult) mapped as absent or present]

  • Loss of tails evolved independently in humans and frogs - there are two steps on the true tree

slide-15
SLIDE 15

Homoplasy - misleading evidence of phylogeny

  • If misinterpreted as homology, the absence of tails would be evidence for a wrong tree: grouping humans with frogs and lizards with dogs

[Figure: the wrong tree of Human, Frog, Lizard and Dog with TAIL mapped as absent or present]

slide-16
SLIDE 16

Homoplasy - reversal

  • Reversals are evolutionary changes back to an ancestral condition
  • As with any homoplasy, reversals can provide misleading evidence of relationships

[Figure: a true tree and a wrong tree for taxa 1-10]

slide-17
SLIDE 17

Homoplasy - a fundamental problem of phylogenetic inference

  • If there were no homoplastic similarities, inferring phylogeny would be easy - all the pieces of the jigsaw would fit together neatly
  • Distinguishing the misleading evidence of homoplasy from the reliable evidence of homology is a fundamental problem of phylogenetic inference

slide-18
SLIDE 18

Homoplasy and Incongruence

  • If we assume that there is a single correct phylogenetic tree then:
    – When characters support conflicting phylogenetic trees we know that there must be some misleading evidence of relationships among the incongruent or incompatible characters
    – Incongruence between two characters implies that at least one of the characters is homoplastic and that at least one of the trees the characters support is wrong

slide-19
SLIDE 19

Incongruence or Incompatibility

  • These trees and characters are incongruent - both trees cannot be correct, at least one is wrong and at least one character must be homoplastic

[Figure: two trees. Lizard, Frog, Human and Dog with HAIR (absent/present) mapped; Human, Frog, Lizard and Dog with TAIL (absent/present) mapped]
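The incongruence between hair and tail can be checked mechanically. The following is a minimal sketch, assuming binary characters coded 0 (absent) and 1 (present); the function name `compatible` is illustrative and not from the slides. It uses the standard pairwise compatibility test: two binary characters can both fit one tree without homoplasy unless all four combinations of states occur among the taxa.

```python
def compatible(char_a, char_b):
    """Return True if two binary characters can fit one tree without homoplasy.

    The characters conflict only when all four state combinations
    (0,0), (0,1), (1,0) and (1,1) occur among the taxa.
    """
    combos = {(char_a[taxon], char_b[taxon]) for taxon in char_a}
    return len(combos) < 4


# States as drawn on the slides: hair and (adult) tail, 1 = present.
hair = {"Lizard": 0, "Frog": 0, "Human": 1, "Dog": 1}
tail = {"Lizard": 1, "Frog": 0, "Human": 0, "Dog": 1}

print(compatible(hair, tail))  # False: at least one character is homoplastic
```

The test says only that the two characters conflict; it does not say which one is homoplastic. That is decided by congruence with further characters.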

slide-20
SLIDE 20

Distinguishing homology and homoplasy

  • Morphologists use a variety of techniques to distinguish homoplasy and homology
  • Homologous features are expected to display detailed similarity (in position, structure, development) whereas homoplastic similarities are more likely to be superficial
  • As recognised by Charles Darwin, congruence with other characters provides the most compelling evidence for homology

slide-21
SLIDE 21

The importance of congruence

  • “The importance, for classification, of trifling characters, mainly depends on their being correlated with several other characters of more or less importance. The value indeed of an aggregate of characters is very evident ........ a classification founded on any single character, however important that may be, has always failed.”

Charles Darwin: Origin of Species, Ch. 13
slide-22
SLIDE 22

Congruence

  • We prefer the ‘true’ tree because it is supported by multiple congruent characters

[Figure: tree of Lizard, Frog, Human and Dog. MAMMALIA (Human, Dog) is supported by hair, a single bone in the lower jaw, lactation, etc.]

slide-23
SLIDE 23

Homoplasy in molecular data

Incongruence and therefore homoplasy can be common in molecular sequence data
– There are a limited number of alternative character states (e.g. only A, G, C and T in DNA)
– Rates of evolution are sometimes high

Character states are chemically identical
– homology and homoplasy are equally similar
– cannot be distinguished by detailed study of similarity and differences

slide-24
SLIDE 24

Parsimony analysis

  • Parsimony methods provide one way of choosing among alternative phylogenetic hypotheses
  • The parsimony criterion favours hypotheses that maximise congruence and minimise homoplasy
  • It depends on the idea of the fit of a character to a tree

slide-25
SLIDE 25

Character Fit

  • Initially, we can define the fit of a character to a tree as the minimum number of steps required to explain the observed distribution of character states among taxa
  • This is determined by parsimonious character optimization
  • Characters differ in their fit to different trees
slide-26
SLIDE 26

Character Fit

[Figure: hair (absent/present) mapped on two trees of Frog, Crocodile, Bird, Kangaroo, Bat and Human. Tree A: 1 step. Tree B: 2 steps.]

slide-27
SLIDE 27

Parsimony Analysis

  • Given a set of characters, such as aligned sequences, parsimony analysis works by determining the fit (number of steps) of each character on a given tree
  • The sum over all characters is called Tree Length
  • Most parsimonious trees (MPTs) have the minimum tree length needed to explain the observed distributions of all the characters
slide-28
SLIDE 28

Parsimony in practice

TAXA: Frog, Bird, Crocodile, Kangaroo, Bat, Human
CHARACTERS scored: amnion, hair, wings, antorbital fenestra, placenta, lactation

[Figure: data matrix of presences (+) for the six characters in the six taxa, and the characters (numbered 1-6) mapped on Tree 1 and Tree 2]

FIT
Character    1  2  3  4  5  6    TREE LENGTH
Tree 1       1  1  1  1  1  2     7
Tree 2       1  2  2  2  2  1    10

Of these two trees, Tree 1 has the shorter length and is the more parsimonious. Both trees require some homoplasy (extra steps).
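The fit of each character and the resulting tree lengths can be computed with Fitch's algorithm for unordered characters. The sketch below is illustrative only. The presence/absence scores are an assumption based on general knowledge of these animals. The two tree shapes (written as nested tuples) are likewise a guess at the figure, and the ordering of characters 1-6 is assumed. All three were chosen because together they reproduce the fits and tree lengths reported on the slide (7 and 10).

```python
def fitch_steps(tree, states):
    """Return (state_set, steps) for one character on a rooted binary tree.

    `tree` is a taxon name (leaf) or a 2-tuple of subtrees;
    `states` maps each taxon name to its character state.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    (lset, lsteps), (rset, rsteps) = (fitch_steps(sub, states) for sub in tree)
    common = lset & rset
    if common:                       # children agree: no change needed here
        return common, lsteps + rsteps
    return lset | rset, lsteps + rsteps + 1   # disagreement costs one step


def tree_length(tree, matrix):
    """Sum the fit (number of steps) of every character on the tree."""
    return sum(fitch_steps(tree, states)[1] for states in matrix.values())


TAXA = ["Frog", "Bird", "Crocodile", "Kangaroo", "Bat", "Human"]


def character(present):
    """Code a character as 1 in the listed taxa and 0 elsewhere."""
    return {taxon: int(taxon in present) for taxon in TAXA}


# Assumed scores (not recoverable from the slide text itself).
matrix = {
    "amnion": character({"Bird", "Crocodile", "Kangaroo", "Bat", "Human"}),
    "hair": character({"Kangaroo", "Bat", "Human"}),
    "lactation": character({"Kangaroo", "Bat", "Human"}),
    "placenta": character({"Bat", "Human"}),
    "antorbital fenestra": character({"Bird", "Crocodile"}),
    "wings": character({"Bird", "Bat"}),
}

# Assumed tree shapes: Tree 1 groups bats with humans, Tree 2 with birds.
tree1 = ("Frog", (("Crocodile", "Bird"), ("Kangaroo", ("Bat", "Human"))))
tree2 = ("Frog", ("Crocodile", ("Kangaroo", (("Bat", "Bird"), "Human"))))

print(tree_length(tree1, matrix))  # 7
print(tree_length(tree2, matrix))  # 10
```

On Tree 1 only wings needs two steps. On Tree 2 wings fits in one step, but hair, lactation, placenta and the antorbital fenestra each need two.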

slide-29
SLIDE 29

Results of parsimony analysis

  • One or more most parsimonious trees
  • Hypotheses of character evolution associated with each tree (where and how changes have occurred)
  • Branch lengths (amounts of change associated with branches)
  • Various tree and character statistics describing the fit between tree and data
  • Suboptimal trees - optional
slide-30
SLIDE 30

Character types

  • Characters may differ in the costs (contribution to tree length) made by different kinds of changes
  • Wagner (ordered, additive): states 0, 1, 2 (morphology, unequal costs)
    – a change between adjacent states (e.g. 0 to 1) costs one step; a change from 0 to 2 costs two steps
  • Fitch (unordered, non-additive): states A, G, T, C (morphology, molecules)
    – equal costs for all changes
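The difference between ordered and unordered characters comes down to the cost assigned to a change between two states. A minimal sketch, with hypothetical function names, assuming ordered states are coded as integers:

```python
def wagner_cost(i, j):
    """Ordered (additive) character: cost is the distance between states."""
    return abs(i - j)


def fitch_cost(x, y):
    """Unordered (non-additive) character: every change costs one step."""
    return 0 if x == y else 1


print(wagner_cost(0, 1))     # 1
print(wagner_cost(0, 2))     # 2: the change passes through state 1
print(fitch_cost("A", "T"))  # 1
print(fitch_cost("A", "G"))  # 1
```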

slide-31
SLIDE 31

Character types

  • Sankoff (generalised): states A, G, T, C (morphology, molecules), with user specified costs
  • For example, differential weighting of transitions and transversions (e.g. a transition costs one step, a transversion five steps)
  • Costs are specified in a stepmatrix
  • Costs are usually symmetric but can also be asymmetric (e.g. it costs more to gain than to lose a restriction site)

slide-32
SLIDE 32

Stepmatrices

  • Stepmatrices specify the costs of changes within a character

         To
From     A  C  G  T
A        0  5  1  5
C        5  0  5  1
G        1  5  0  5
T        5  1  5  0

Purines (Pu): A, G. Pyrimidines (Py): C, T.
Transitions: Pu <-> Pu, Py <-> Py. Transversions: Pu <-> Py.

Different characters (e.g. 1st, 2nd and 3rd codon positions) can also have different weights
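A stepmatrix character can be fitted to a tree with Sankoff's dynamic programming algorithm. The sketch below uses the stepmatrix from this slide (transitions cost 1, transversions cost 5). The four-taxon tree and its tip states are hypothetical, invented only to show the calculation.

```python
INF = float("inf")
STATES = "ACGT"

# Stepmatrix from the slide: COST[from_state][to_state].
COST = {
    "A": {"A": 0, "C": 5, "G": 1, "T": 5},
    "C": {"A": 5, "C": 0, "G": 5, "T": 1},
    "G": {"A": 1, "C": 5, "G": 0, "T": 5},
    "T": {"A": 5, "C": 1, "G": 5, "T": 0},
}


def sankoff(tree, tip_states):
    """Return {state: minimum cost of the subtree if its root has that state}."""
    if isinstance(tree, str):  # leaf: only the observed state is allowed
        return {s: (0 if s == tip_states[tree] else INF) for s in STATES}
    left, right = (sankoff(sub, tip_states) for sub in tree)
    return {
        s: min(COST[s][t] + left[t] for t in STATES)
        + min(COST[s][t] + right[t] for t in STATES)
        for s in STATES
    }


def min_cost(tree, tip_states):
    """Weighted fit of the character: the cheapest root assignment."""
    return min(sankoff(tree, tip_states).values())


# Hypothetical example: one site in four taxa t1..t4.
tree = (("t1", "t2"), ("t3", "t4"))
site = {"t1": "A", "t2": "G", "t3": "C", "t4": "T"}

print(min_cost(tree, site))  # 7 = two transitions (1 + 1) + one transversion (5)
```

With equal costs the same site would need three steps. The stepmatrix changes the weighted length, and so can change which tree is preferred.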

slide-33
SLIDE 33

Weighted parsimony

  • If all kinds of steps of all characters have equal weight then parsimony:
    – Minimises homoplasy (extra steps)
    – Maximises the amount of similarity due to common ancestry
    – Minimises tree length
  • If steps are weighted unequally parsimony minimises tree length - a weighted sum of the cost of each character
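The weighted tree length can be written out explicitly. The notation here is an assumption introduced for illustration (it does not appear on the slides): $T$ is a tree, $n$ the number of characters, $w_i$ the weight of character $i$, $c_i(x, y)$ the stepmatrix cost of a change from state $x$ to state $y$ in character $i$, and $x_u$ the state assigned to node $u$.

```latex
L(T) = \sum_{i=1}^{n} w_i \, s_i(T),
\qquad
s_i(T) = \min_{\text{ancestral states}} \; \sum_{(u,v) \in \text{branches}(T)} c_i(x_u, x_v)
```

With every $w_i = 1$ and $c_i(x, y) = 1$ for all $x \neq y$, $L(T)$ is simply the total number of steps, so minimising tree length and minimising homoplasy coincide.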

slide-34
SLIDE 34

Why weight characters?

  • Many systematists consider weighting unacceptable, but weighting is unavoidable (unweighted = equal weights)
  • Transitions may be more common than transversions
  • Different kinds of transitions and transversions may be more or less common
  • Rates of change may vary with codon positions
  • The fit of different characters on trees may indicate differences in their reliabilities
  • However, equal weighting is the commonest procedure and is the simplest (but probably not the best) approach

[Figure: Ciliate SSUrDNA data - histogram of number of characters (0-250) against number of steps (1-21)]

slide-35
SLIDE 35

Different kinds of changes differ in their frequencies

[Figure: frequencies of changes from and to each of A, C, G and T, separated into transitions and transversions. Unambiguous changes on the most parsimonious tree of Ciliate SSUrDNA]

slide-36
SLIDE 36

Parsimony - advantages

  • is a simple method - easily understood operation
  • does not seem to depend on an explicit model of evolution
  • gives both trees and associated hypotheses of character evolution
  • should give reliable results if the data are well structured and homoplasy is either rare or widely (randomly) distributed on the tree

slide-37
SLIDE 37

Parsimony - disadvantages

  • May give misleading results if homoplasy is common or concentrated in particular parts of the tree, e.g.:
    – thermophilic convergence
    – base composition biases
    – long branch attraction
  • Underestimates branch lengths
  • Model of evolution is implicit - behaviour of method not well understood
  • Parsimony often justified on purely philosophical grounds - we must prefer simplest hypotheses - particularly by morphologists
  • For most molecular systematists this is uncompelling
slide-38
SLIDE 38

Parsimony can be inconsistent

  • Felsenstein (1978) developed a simple model phylogeny including four taxa and a mixture of short and long branches
  • Under this model parsimony will give the wrong tree

[Figure: model tree of taxa A-D with two long branches (rate or branch length p) and three short branches (q), p >> q, beside the wrong parsimony tree. Long branches are attracted but the similarity is homoplastic.]

  • With more data the certainty that parsimony will give the wrong tree increases - so that parsimony is statistically inconsistent
  • Advocates of parsimony initially responded by claiming that Felsenstein’s result showed only that his model was unrealistic
  • It is now recognised that long-branch attraction (in the Felsenstein Zone) is one of the most serious problems in phylogenetic inference
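The inconsistency can be illustrated by computing the expected frequency of each parsimony-informative site pattern on a four-taxon model tree. The sketch below rests on several assumptions that are not taken from the slides: a two-state character with symmetric change probabilities, a true tree ((A,B),(C,D)) in which A and C sit on long branches (probability of change `p`) and B, D and the internal branch are short (`q`), and the particular values of `p` and `q`.

```python
from itertools import product


def change(prob, a, b):
    """Probability of ending in state b after starting in a on one branch."""
    return prob if a != b else 1.0 - prob


def pattern_prob(tips, p, q):
    """Probability of tip states (A, B, C, D) on the true tree ((A,B),(C,D)).

    A and C are on long branches (p); B, D and the internal branch are short (q).
    u and v are the unobserved states at the two internal nodes.
    """
    a, b, c, d = tips
    total = 0.0
    for u, v in product((0, 1), repeat=2):
        total += (0.5 * change(p, u, a) * change(q, u, b)
                  * change(q, u, v) * change(p, v, c) * change(q, v, d))
    return total


def support(p, q):
    """Expected proportion of sites favouring each of the three groupings."""
    return {
        "AB|CD": 2 * pattern_prob((0, 0, 1, 1), p, q),  # true tree
        "AC|BD": 2 * pattern_prob((0, 1, 0, 1), p, q),  # long branches together
        "AD|BC": 2 * pattern_prob((0, 1, 1, 0), p, q),
    }


for p, q in [(0.05, 0.05), (0.40, 0.05)]:
    s = support(p, q)
    print(p, q, max(s, key=s.get), {k: round(x, 4) for k, x in s.items()})
```

When all branches are short, the pattern supporting the true grouping AB|CD is the most frequent. When `p` is much larger than `q`, the pattern grouping the two long branches (AC|BD) becomes the most frequent, so adding more sites only strengthens parsimony's preference for the wrong tree.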

slide-39
SLIDE 39

Finding optimal trees - exact solutions

  • Exact solutions can only be used for small numbers of taxa
  • Exhaustive search examines all possible trees
  • Typically used for problems with fewer than 10 taxa

slide-40
SLIDE 40

Finding optimal trees - exhaustive search

[Figure: stepwise construction of all trees, labelled 1, 2a, 2b, 2c]

  • Starting tree, any 3 taxa (A, B, C)
  • Add fourth taxon (D) in each of three possible positions -> three trees
  • Add fifth taxon (E) in each of the five possible positions on each of the three trees -> 15 trees, and so on ....
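The stepwise construction gives a simple way to count the trees an exhaustive search must examine. A minimal sketch, assuming unrooted, fully bifurcating trees; the function name is illustrative:

```python
def unrooted_tree_count(n_taxa):
    """Number of unrooted, fully bifurcating trees for n_taxa >= 3 labelled taxa."""
    count = 1  # three taxa can be joined in only one way
    for k in range(4, n_taxa + 1):
        # The tree of k - 1 taxa has 2k - 5 branches, and taxon k
        # can be attached to any one of them.
        count *= 2 * k - 5
    return count


for n in (3, 4, 5, 10):
    print(n, unrooted_tree_count(n))
# 3 1
# 4 3
# 5 15
# 10 2027025
```

Ten taxa already give over two million trees, which is why exhaustive search is restricted to small problems.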

slide-41
SLIDE 41

Finding optimal trees - exact solutions

  • Branch and bound saves time by discarding families of trees during tree construction that cannot be shorter than the shortest tree found so far
  • Can be enhanced by specifying an initial upper bound for tree length
  • Typically used only for problems with fewer than 18 taxa

slide-42
SLIDE 42

Finding optimal trees - branch and bound

[Figure: branch and bound search tree. Starting tree A1 (taxa A, B, C); three trees B1-B3 formed by adding taxon D; trees C1.1-C1.5, C2.1-C2.5 and C3.1-C3.5 formed by adding taxon E to each of B1, B2 and B3]
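Branch and bound can be sketched on top of the same stepwise construction: a partial tree is abandoned as soon as its length exceeds the best complete tree found so far, because adding taxa can never shorten a tree. This is an illustrative sketch, not the implementation of any particular program. Trees are nested tuples with the first taxon attached at the root, Fitch (unordered) costs are assumed, and the data matrix is the same assumed scoring of the six-taxon example (character order: amnion, hair, lactation, placenta, antorbital fenestra, wings).

```python
def fitch_length(tree, matrix):
    """Total Fitch steps; matrix maps taxon name -> string of states."""
    def walk(node, i):
        if isinstance(node, str):
            return {matrix[node][i]}, 0
        (ls, ln), (rs, rn) = walk(node[0], i), walk(node[1], i)
        return (ls & rs, ln + rn) if ls & rs else (ls | rs, ln + rn + 1)
    n_chars = len(next(iter(matrix.values())))
    return sum(walk(tree, i)[1] for i in range(n_chars))


def insertions(subtree, leaf):
    """Yield each tree made by attaching leaf to one branch of subtree."""
    yield (subtree, leaf)                    # branch above this node
    if isinstance(subtree, tuple):
        left, right = subtree
        for grown in insertions(left, leaf):
            yield (grown, right)
        for grown in insertions(right, leaf):
            yield (left, grown)


def branch_and_bound(matrix, upper_bound=float("inf")):
    """Return (minimum length, list of all trees of that length)."""
    taxa = list(matrix)
    best = {"length": upper_bound, "trees": []}

    def extend(tree, next_index):
        length = fitch_length(tree, matrix)
        if length > best["length"]:
            return                           # bound: discard this whole family
        if next_index == len(taxa):          # complete tree
            if length < best["length"]:
                best["length"], best["trees"] = length, []
            best["trees"].append(tree)
            return
        first, rest = tree
        for grown in insertions(rest, taxa[next_index]):
            extend((first, grown), next_index + 1)

    extend((taxa[0], (taxa[1], taxa[2])), 3)
    return best["length"], best["trees"]


# Assumed scores: amnion, hair, lactation, placenta, antorbital fenestra, wings.
matrix = {
    "Frog": "000000", "Crocodile": "100010", "Bird": "100011",
    "Kangaroo": "111000", "Bat": "111101", "Human": "111100",
}
length, trees = branch_and_bound(matrix)
print(length)  # 7
print(trees)
```

Passing a finite `upper_bound` (for example the length of a tree found by a quick heuristic) lets pruning start immediately. If the bound is set below the true minimum, no tree is returned.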

slide-43
SLIDE 43

Finding optimal trees - heuristics

  • The number of possible trees increases faster than exponentially with the number of taxa, making exhaustive searches impractical for many data sets (an NP-complete problem)
  • Heuristic methods are used to search tree space for most parsimonious trees by building or selecting an initial tree and swapping branches to search for better ones
  • The trees found are not guaranteed to be the most parsimonious - they are best guesses

slide-44
SLIDE 44

Finding optimal trees - heuristics

  • Stepwise addition
    – Asis - the order in the data matrix
    – Closest - starts with the shortest 3-taxon tree and adds taxa in the order that produces the least increase in tree length (greedy heuristic)
    – Simple - the first taxon in the matrix is taken as a reference - taxa are added to it in the order of their decreasing similarity to the reference
    – Random - taxa are added in a random sequence, many different sequences can be used
  • Recommend random with as many (e.g. 10-100) addition sequences as practical

slide-45
SLIDE 45

Finding most parsimonious trees - heuristics

  • Branch Swapping:
    – Nearest neighbor interchange (NNI)
    – Subtree pruning and regrafting (SPR)
    – Tree bisection and reconnection (TBR)
    – Other methods ....

slide-46
SLIDE 46

Finding optimal trees - heuristics

  • Nearest neighbor interchange (NNI)

[Figure: a tree of taxa A-G and trees produced from it by nearest neighbor interchange]
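NNI rearrangements are easy to enumerate. The sketch below is a simplified illustration that assumes rooted binary trees written as nested tuples (the slide's figure is of an unrooted tree). For every interior branch it swaps one subtree below the branch with the subtree on the other side, which gives two neighbours per interior branch.

```python
def nni_neighbors(tree):
    """Return all trees one nearest neighbor interchange away from tree.

    Trees are leaf names or 2-tuples of subtrees (rooted, binary).
    """
    if not isinstance(tree, tuple):
        return []
    left, right = tree
    neighbors = []
    if isinstance(left, tuple):      # interior branch above `left`
        a, b = left
        neighbors.append(((a, right), b))   # swap b with right
        neighbors.append(((b, right), a))   # swap a with right
    if isinstance(right, tuple):     # interior branch above `right`
        c, d = right
        neighbors.append(((left, c), d))    # swap d with left
        neighbors.append(((left, d), c))    # swap c with left
    # Rearrangements deeper in the tree leave this node's other side intact.
    neighbors += [(n, right) for n in nni_neighbors(left)]
    neighbors += [(left, n) for n in nni_neighbors(right)]
    return neighbors


start = (("A", "B"), ("C", "D"))
for neighbor in nni_neighbors(start):
    print(neighbor)
```

A heuristic search would score each neighbour, move to a shorter one if any exists, and repeat until no neighbour is shorter. SPR and TBR follow the same pattern but generate larger neighbourhoods.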

slide-47
SLIDE 47

Finding optimal trees - heuristics

  • Subtree pruning and regrafting (SPR)

[Figure: a tree of taxa A-G, a subtree pruned from it, and the tree produced by regrafting the subtree elsewhere]

slide-48
SLIDE 48

Finding optimal trees - heuristics

  • Tree bisection and reconnection (TBR)

[Figure: a tree of taxa A-G bisected into two subtrees, and a tree produced by reconnecting them]

slide-49
SLIDE 49

Finding optimal trees - heuristics

  • Branch Swapping
    – Nearest neighbor interchange (NNI)
    – Subtree pruning and regrafting (SPR)
    – Tree bisection and reconnection (TBR)
  • The nature of heuristic searches means we cannot know which method will find the most parsimonious trees or all such trees
  • However, TBR is the most extensive swapping routine and its use with multiple random addition sequences should work well

slide-50
SLIDE 50

Tree space may be populated by local minima and islands of optimal trees

[Figure: tree length across tree space, showing the global minimum and local minima. Branch swapping from three random addition sequence replicates: one reaches the global minimum (success) and two stop at local minima (failure)]

slide-51
SLIDE 51

Searching with topological constraints

  • Topological constraints are user-defined phylogenetic hypotheses
  • Can be used to find optimal trees that either:
    1. include a specified clade or set of relationships
    2. exclude a specified clade or set of relationships (reverse constraint)

slide-52
SLIDE 52

Searching with topological constraints

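CONSTRAINT TREE: ((A,B,C,D)(E,F,G))

[Figure: the constraint tree with groups ABCD and EFG, and two trees of taxa A-G. The tree containing the groups ABCD and EFG is compatible with the constraint tree and incompatible with the reverse constraint tree. The tree with taxa in the order A B C E D F G is incompatible with the constraint tree and compatible with the reverse constraint tree.]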

slide-53
SLIDE 53

Searching with topological constraints - backbone constraints

  • Backbone constraints specify relationships among a subset of the taxa

BACKBONE CONSTRAINT: ((A,B)(D,E)) - relationships of taxon C are not specified

[Figure: the backbone tree of A, B, D and E with the possible positions of taxon C marked, and two example trees. One is compatible with the backbone constraint and incompatible with the reverse constraint; the other (taxa in the order A D B E) is incompatible with the backbone constraint and compatible with the reverse constraint.]

slide-54
SLIDE 54

Parsimonious Character Optimization

[Figure: a character mapped on a tree of taxa A-E, with state 1 in the two taxa marked *. Two equally parsimonious optimizations are shown: an origin (0 => 1) and a reversal (1 => 0) (ACCTRAN), OR parallelism with 2 separate origins (0 => 1) (DELTRAN)]

  • Homoplastic characters often have alternative equally parsimonious optimizations
  • Commonly used varieties are:
    – ACCTRAN - accelerated transformation
    – DELTRAN - delayed transformation
  • Consequently, branch lengths are not always fully determined
  • PAUP reports minimum and maximum branch lengths

slide-55
SLIDE 55

Missing data

  • Missing data are ignored in tree building but can lead to alternative equally parsimonious optimizations in the absence of homoplasy

[Figure: tree of taxa A-E with tip states 1, ? and ?. A single origin (0 => 1) can be placed on any one of 3 branches (marked *)]

  • Abundant missing data can lead to multiple equally parsimonious trees. This can be a serious problem with morphological data but is unlikely to arise with molecular data unless analyses are of incomplete data

slide-56
SLIDE 56

Multiple optimal trees

  • Many methods can yield multiple equally optimal trees
  • We can further select among these trees with additional criteria, but
  • Typically, relationships common to all the optimal trees are summarised with consensus trees

slide-57
SLIDE 57

Consensus methods

  • A consensus tree is a summary of the agreement among a set of fundamental trees
  • There are many consensus methods that differ in:
    1. the kind of agreement
    2. the level of agreement
  • Consensus methods can be used with multiple trees from a single analysis or from multiple analyses

slide-58
SLIDE 58

Strict consensus methods

  • Strict consensus methods require agreement across all the fundamental trees
  • They show only those relationships that are unambiguously supported by the parsimonious interpretation of the data
  • The commonest method (strict component consensus) focuses on clades/components/full splits
  • This method produces a consensus tree that includes all and only those full splits found in all the fundamental trees
  • Other relationships (those in which the fundamental trees disagree) are shown as unresolved polytomies
  • Implemented in PAUP
slide-59
SLIDE 59

Strict consensus methods

[Figure: TWO FUNDAMENTAL TREES of taxa A-G (tip orders A B C D E F G and A B C E D F G) and their STRICT COMPONENT CONSENSUS TREE]

slide-60
SLIDE 60

Majority-rule consensus methods

  • Majority-rule consensus methods require agreement across a majority of the fundamental trees
  • May include relationships that are not supported by the most parsimonious interpretation of the data
  • The commonest method focuses on clades/components/full splits
  • This method produces a consensus tree that includes all and only those full splits found in a majority (>50%) of the fundamental trees
  • Other relationships are shown as unresolved polytomies
  • Of particular use in bootstrapping
  • Implemented in PAUP
slide-61
SLIDE 61

Majority rule consensus

[Figure: THREE FUNDAMENTAL TREES of taxa A-G and their MAJORITY-RULE COMPONENT CONSENSUS TREE. Numbers (100, 66) indicate the frequency of clades in the fundamental trees]
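Both strict and majority-rule component consensus reduce to counting how often each clade occurs among the fundamental trees. The following is a minimal sketch for rooted trees written as nested tuples. The three five-taxon trees are hypothetical, since the shapes of the slide's trees cannot be recovered from the text, and the function names are illustrative.

```python
from collections import Counter


def clades(tree):
    """Return the set of clades (frozensets of taxa) in a rooted tree.

    The clade containing every taxon is omitted because all trees share it.
    """
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(walk(sub) for sub in node))
        found.add(members)
        return members

    found.discard(walk(tree))
    return found


def clade_counts(trees):
    """Count the number of fundamental trees containing each clade."""
    return Counter(clade for tree in trees for clade in clades(tree))


def strict_consensus(trees):
    """Clades present in every fundamental tree."""
    return {c for c, n in clade_counts(trees).items() if n == len(trees)}


def majority_rule(trees):
    """Clades present in more than half of the trees, with their frequency (%)."""
    return {c: 100 * n // len(trees)
            for c, n in clade_counts(trees).items() if 2 * n > len(trees)}


# Hypothetical fundamental trees.
trees = [
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "C"), "B"), ("D", "E")),
    (((("A", "B"), "C"), "D"), "E"),
]

print([sorted(c) for c in strict_consensus(trees)])
print({"".join(sorted(c)): pct for c, pct in majority_rule(trees).items()})
```

Here only the clade ABC is found in all three trees, so the strict consensus contains that clade alone. AB and DE each occur in two of three trees (66%), so they also appear in the majority-rule consensus.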

slide-62
SLIDE 62

Reduced consensus methods

  • Focus upon any relationships (not just full splits)
  • Reduced consensus methods occur in strict and majority-rule varieties
  • Other relationships are shown as unresolved polytomies
  • May be more sensitive than methods focusing only on clades/components/full splits
  • Strict reduced consensus methods are implemented in RadCon

slide-63
SLIDE 63

Types of Cladistic Relationships

[Figure: types of cladistic relationships, with groupings labelled X, Y and Z. (a) FIVE LEAF TREE of taxa A-E. (b) COMPONENTS / CLADES (5-TAXON STATEMENTS). (c) ROOTED TRIPLETS (3-TAXON STATEMENTS). (d) FOUR LEAF SUBTREE (4-TAXON STATEMENTS)]

slide-64
SLIDE 64

Reduced consensus methods

[Figure: TWO FUNDAMENTAL TREES of taxa A-G. The strict component consensus is completely unresolved. The STRICT REDUCED CONSENSUS TREE, from which taxon G is excluded, shows the relationships among A-F]

slide-65
SLIDE 65

Consensus methods

[Figure: three fundamental trees of Ochromonas, Symbiodinium, Prorocentrum, Loxodes, Tetrahymena, Tracheloraphis, Spirostomum, Euplotes and Gruberia, with their majority-rule, strict (component) and strict reduced cladistic consensus trees. Clade frequencies on the majority-rule tree are 100 or 66. Euplotes is excluded from the strict reduced consensus tree]

slide-66
SLIDE 66

Consensus methods

  • Use strict methods to identify those relationships unambiguously supported by parsimonious interpretation of the data
  • Use reduced methods where consensus trees are poorly resolved
  • Use majority-rule methods in bootstrapping
  • Avoid other methods which have ambiguous interpretations