

slide-1
SLIDE 1

Estimating the contribution of sequence context to nucleotide substitution rate heterogeneity

Helen Lindsay and Gavin A. Huttley

slide-2
SLIDE 2

The Gamma Model

  • Yang (1993) used a gamma distribution to model rate variation in α- and β-globin genes
  • The gamma distribution is often approximated by four equi-probable bins
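
The four-bin approximation can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the talk: it assumes a gamma distribution with mean 1 (shape and rate both equal to `alpha`), cuts it at the quartiles, and represents each bin by its mean rate. The function names and the value `alpha=0.5` are hypothetical choices for the example.

```python
import math


def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma function P(a, x), by power series."""
    if x <= 0.0:
        return 0.0
    if math.isinf(x):
        return 1.0
    term = 1.0 / a
    total = term
    n = 1
    # All terms are positive and eventually shrink, so this loop terminates.
    while term > 1e-16 * total:
        term *= x / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def gamma_quantile(p, alpha):
    """Rate r whose cumulative probability is p, for shape = rate = alpha."""
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(alpha, alpha * hi) < p:
        hi *= 2.0
    for _ in range(200):  # plain bisection on the CDF
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(alpha, alpha * mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha, bins=4):
    """Mean rate of each of `bins` equi-probable bins of a mean-1 gamma."""
    cuts = [0.0]
    cuts += [gamma_quantile(k / bins, alpha) for k in range(1, bins)]
    cuts.append(math.inf)
    rates = []
    for lo, hi in zip(cuts[:-1], cuts[1:]):
        # Integral of r * f(r) over the bin, using the shape alpha + 1 identity.
        mass = (reg_lower_gamma(alpha + 1.0, alpha * hi)
                - reg_lower_gamma(alpha + 1.0, alpha * lo))
        rates.append(bins * mass)
    return rates


rates = discrete_gamma_rates(alpha=0.5)
```

Because each bin has probability 1/4, the four rates average to 1, so the discretised model keeps the same overall rate as the continuous one.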

slide-3
SLIDE 3

Gamma rate variation

slide-4
SLIDE 4

Improvements on the Gamma model

  • Allow sites to change rates
  • Allow clustering of rates
  • Consider other/multiple rate distributions

slide-5
SLIDE 5

What causes substitution rate variation?

slide-6
SLIDE 6

What causes substitution rate variation?

Natural selection

slide-7
SLIDE 7

What causes substitution rate variation?

Natural selection

Differential repair

slide-8
SLIDE 8

What causes substitution rate variation?

Natural selection

Differential repair

Nucleotide properties

slide-9
SLIDE 9

[Figure: dinucleotides AG, CG and TG, with the labels "(slow)" and "(fast)"]

slide-10
SLIDE 10

Data

  • 470 alignments, each 50 000 nucleotides long, of introns from human, chimpanzee and macaque one-to-one orthologs.
  • Sampled from Ensembl version 49.

slide-11
SLIDE 11
slide-12
SLIDE 12

The baseline model

slide-13
SLIDE 13

The CpG model

slide-14
SLIDE 14

The Gamma Model

slide-15
SLIDE 15

Gamma vs Dinucleotide models

slide-16
SLIDE 16

Gamma vs Dinucleotide models

slide-17
SLIDE 17

Gamma vs Dinucleotide models

slide-18
SLIDE 18

186.05 40.77 51.07 175.52

slide-19
SLIDE 19

Accounting for CpG substitutions decreases rate variation

slide-20
SLIDE 20

  • Independent sites
  • Reversible
  • Compositional variance

[Figure: G+C% and G+C% (alignment) against alignment position (nucleotides); GA and GG rate]

slide-21
SLIDE 21

Advantages of dinucleotide models

  • Less likelihood computation
  • Equivalently parameter-rich
  • No assumed distribution of rate variation
  • Can incorporate known mutation biases, for example deamination of methylated cytosine.

  • Smaller alphabet than amino acids
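
To make the last two points concrete, the dinucleotide state space and the CpG changes produced by deamination of methylated cytosine can be enumerated as below. This is an illustrative sketch only; the helper names are hypothetical and it does not reproduce the models fitted in the talk. It relies on the standard fact that deamination turns a methylated C into T, so CG becomes TG on one strand and CA when the change occurs on the complementary strand.

```python
from itertools import product

NUCLEOTIDES = "ACGT"

# The 16 dinucleotide states, compared with 20 amino acids.
DINUCLEOTIDES = ["".join(pair) for pair in product(NUCLEOTIDES, repeat=2)]


def single_step_changes(dinuc):
    """Dinucleotides that differ from `dinuc` at exactly one position."""
    neighbours = []
    for pos in (0, 1):
        for base in NUCLEOTIDES:
            if base != dinuc[pos]:
                neighbours.append(dinuc[:pos] + base + dinuc[pos + 1:])
    return neighbours


def is_cpg_deamination(src, dst):
    """True for CG -> TG (this strand) or CG -> CA (complementary strand)."""
    return src == "CG" and dst in ("TG", "CA")


# Substitutions that a CpG-aware model could give their own rate parameter.
cpg_changes = [
    (src, dst)
    for src in DINUCLEOTIDES
    for dst in single_step_changes(src)
    if is_cpg_deamination(src, dst)
]
```

Each dinucleotide has six single-step neighbours, and only two of the 96 possible single-step changes are flagged here, which is how a known mutation bias can be added with very few extra parameters.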
slide-22
SLIDE 22

Acknowledgements

Australian National University

  • Gavin Huttley
  • Hua Ying

University of Singapore

  • Von Bing Yap