

SLIDE 1


Gibbs sampling for parsimonious Markov models with latent variables

Ralf Eggeling¹, Pierre-Yves Bourguignon², André Gohr¹, Ivo Grosse¹

¹ Martin Luther University Halle-Wittenberg
² Max Planck Institute for Mathematics in the Sciences

SLIDES 2-8

Premise 1: Parsimonious Markov models

- proposed by Bourguignon (2008)
- generalize variable-order Markov models
- use parsimonious context trees (PCTs)
- here: inhomogeneous models → separate PCTs for each random variable

[Figure: a PCT over the DNA alphabet, built up leaf by leaf across slides 2-8. Inner nodes partition A = {A, C, G, T} into context sets such as {A}, {C,G}, {T} or {A,G}, {C,T}; the leaves represent merged contexts such as AA, CA, GA, TA, AC, AG, AT, GC, GG, GT, CC, CG, CT, TC, TG, TT.]
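As a rough illustration of the data structure (a sketch, not the authors' implementation; all names hypothetical): each inner PCT node partitions the alphabet into context sets, and looking up a context word consumes one symbol per level, so several contexts can share one leaf and hence one set of conditional probabilities.

```python
# Minimal sketch of a parsimonious context tree (PCT).
# All names are hypothetical, not taken from the authors' implementation.

class PCTNode:
    def __init__(self, children=None, leaf_id=None):
        # children: list of (context_set, PCTNode) pairs whose context
        # sets partition the alphabet {A, C, G, T}
        self.children = children or []
        self.leaf_id = leaf_id  # set only at leaves

    def lookup(self, context):
        """Walk the tree along the context word, one symbol per level."""
        if not self.children:
            return self.leaf_id
        symbol, rest = context[0], context[1:]
        for context_set, child in self.children:
            if symbol in context_set:
                return child.lookup(rest)
        raise ValueError("children must partition the alphabet")

# Example: a depth-1 PCT that merges the contexts C, G, T into one leaf,
# so they share parameters while context A keeps its own.
root = PCTNode(children=[
    ({"A"}, PCTNode(leaf_id=0)),
    ({"C", "G", "T"}, PCTNode(leaf_id=1)),
])
assert root.lookup("A") == 0 and root.lookup("G") == 1
```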

SLIDES 9-11

Premise 2: Latent variable models

- many practical applications: latent variables, unobserved/missing data
- examples: Naive Bayes, hidden Markov models, mixture models

Mixture models

- model assumption: data point i generated from one out of C component models → latent variable u_i ∈ {1, …, C}
- analytical learning infeasible
- approximate algorithms: EM algorithm, Gibbs sampling (latent-variable step sketched below)
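To make the Gibbs side of the mixture concrete, here is a minimal sketch of the latent-variable update (hypothetical names, not the authors' code): each u_i is redrawn from its full conditional, which is proportional to the mixture weight times the component likelihood of data point x_i.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latent_variables(data, weights, component_likelihood):
    """One Gibbs sweep over the latent variables u_i in {0, ..., C-1}.

    component_likelihood(x, c) -> P(x | component c) is a hypothetical
    callback, standing in for the parsimonious Markov model likelihood.
    """
    C = len(weights)
    u = np.empty(len(data), dtype=int)
    for i, x in enumerate(data):
        # full conditional: P(u_i = c | x_i, ...) ∝ w_c * P(x_i | c)
        p = np.array([weights[c] * component_likelihood(x, c)
                      for c in range(C)])
        u[i] = rng.choice(C, p=p / p.sum())
    return u
```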

SLIDES 12-14

Premise 3: Bayesian prediction

Classical prediction

- estimate optimal parameters Θ̂(X) from training data X
- P_classic(Y | X) = P(Y | Θ̂(X))

Bayesian prediction

- do not estimate optimal parameters
- P_Bayes(Y | X) = ∫ P(Y | Θ) P(Θ | X) dΘ

- classical prediction approximates Bayesian prediction
- posterior concentrated around Θ̂ → good approximation
- posterior diverse → bad approximation
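This predictive integral is exactly what posterior samples make tractable: given draws Θ⁽¹⁾, …, Θ⁽ᵀ⁾ from P(Θ | X), e.g. from a Gibbs sampler, P_Bayes(Y | X) ≈ (1/T) Σ_t P(Y | Θ⁽ᵗ⁾). A minimal sketch, where `likelihood` is a hypothetical model callback:

```python
import numpy as np

def bayesian_predictive(y, posterior_samples, likelihood):
    """Monte Carlo estimate of P(y | X) = ∫ P(y | Θ) P(Θ | X) dΘ.

    posterior_samples: parameter draws Θ^(t) from the posterior;
    likelihood(y, theta) -> P(y | Θ) is a hypothetical callback.
    For long sequences, average log-likelihoods with
    scipy.special.logsumexp instead to avoid underflow.
    """
    return np.mean([likelihood(y, theta) for theta in posterior_samples])
```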

SLIDE 15

Putting premises together

Bayesian prediction + latent variable models + parsimonious Markov models
→ Gibbs sampling

SLIDE 16

Gibbs sampling algorithm

- goal: sample from the posterior distribution
- Gibbs sampling: sample iteratively from the conditional probability distributions of each variable/parameter:
  (1) tree structures
  (2) probability parameters
  (3) latent variables
- probability parameters → simple; latent variables → simple; structure → difficult
  (a skeleton of the loop is sketched below)
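A skeleton of the loop these bullets describe, with the three model-specific samplers passed in as hypothetical callbacks (a sketch, not the authors' implementation):

```python
def gibbs_sampler(X, init, steps, n_iter=1000, burn_in=100):
    """Skeleton of the Gibbs sampler: iterate the three conditionals.

    steps is a dict of three hypothetical callbacks, one per block:
      steps["structures"](X, theta, u)  -> new PCTs        (1) difficult
      steps["parameters"](X, trees, u)  -> new parameters  (2) simple
      steps["latents"](X, trees, theta) -> new latents     (3) simple
    Each draws from the full conditional of its block given the
    current values of the other two.
    """
    trees, theta, u = init
    samples = []
    for t in range(n_iter):
        trees = steps["structures"](X, theta, u)
        theta = steps["parameters"](X, trees, u)
        u = steps["latents"](X, trees, theta)
        if t >= burn_in:  # keep post-burn-in draws for prediction
            samples.append((trees, theta, u))
    return samples
```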

SLIDE 17

Structure sampling

- probability of a PCT structure:

  P(τ | X) ∝ ∏_{w ∈ C_τ} κ · B(N_w + α_w) / B(α_w)

  a product of leaf scores, where C_τ denotes the leaves (contexts) of τ, N_w the symbol counts observed in context w, α_w the Dirichlet pseudocounts, B the multivariate Beta function, and κ a structure prior factor
- observation: subtree (red) probability independent of sibling subtree(s) given subtree root (green)

[Figure: a PCT with inner-node partitions {A}, {C,G,T}; {A}, {C,G}, {T}; {A,G}, {C,T}, with one subtree and its root highlighted.]
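Each leaf factor κ · B(N_w + α_w) / B(α_w) is the structure prior factor κ times the Dirichlet-multinomial marginal likelihood of the counts in context w. In log space this is a sum of gammaln terms; a sketch, assuming SciPy:

```python
import numpy as np
from scipy.special import gammaln

def log_leaf_score(counts, alpha, log_kappa=0.0):
    """log [ κ · B(N_w + α_w) / B(α_w) ] for one leaf/context w.

    counts: symbol counts N_w in context w (length |A| = 4);
    alpha: Dirichlet pseudocounts α_w; log_kappa: log of the structure
    prior factor κ. The multivariate Beta function is
    B(a) = prod_j Γ(a_j) / Γ(sum_j a_j).
    """
    counts = np.asarray(counts, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    post = counts + alpha
    log_B_post = gammaln(post).sum() - gammaln(post.sum())
    log_B_prior = gammaln(alpha).sum() - gammaln(alpha.sum())
    return log_kappa + log_B_post - log_B_prior

# e.g. hypothetical counts for one context with uniform pseudocount 1/4:
print(log_leaf_score(counts=[12, 3, 0, 5], alpha=[0.25] * 4))
```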

SLIDES 18-20

Structure sampling

- dynamic programming on extended PCT → sibling nodes form P(A) \ {∅}
- depth identical to that of the PCT
- traverse tree top-down

[Figure: an extended PCT node X whose children are the 15 non-empty subsets of A: A, C, G, T, {A,C}, {A,G}, {A,T}, {C,G}, {C,T}, {G,T}, {A,C,G}, {A,C,T}, {A,G,T}, {C,G,T}, {A,C,G,T}.]
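For |A| = 4 these sibling nodes are exactly the 15 non-empty subsets listed in the figure; a short sketch of enumerating P(A) \ {∅}:

```python
from itertools import chain, combinations

A = ("A", "C", "G", "T")

# all non-empty subsets of the alphabet: P(A) \ {∅}, 2^4 - 1 = 15 of them
nonempty_subsets = list(chain.from_iterable(
    combinations(A, k) for k in range(1, len(A) + 1)))
assert len(nonempty_subsets) == 15
```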

SLIDES 21-26

Structure sampling

- sample subtrees bottom-up
- child nodes are (a) leaves or (b) roots of valid PCT subtrees
- compute score of all valid child combinations (15)
- sample from that distribution (sketched below)
- discard rest

[Figure: the bottom-up step at the extended-PCT node A, repeated across slides 21-26: scores of all valid child combinations are computed, one combination is sampled, the rest are discarded.]
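"Sample from that distribution" means drawing one of the 15 valid child combinations with probability proportional to its score. Since leaf scores are best kept in log space, a numerically stable sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_combination(log_scores):
    """Draw an index with probability proportional to exp(log_scores).

    log_scores: log-scores of all valid child combinations of one
    extended-PCT node (15 of them for |A| = 4).
    """
    log_scores = np.asarray(log_scores, dtype=float)
    p = np.exp(log_scores - log_scores.max())  # shift for stability
    p /= p.sum()
    return rng.choice(len(p), p=p)
```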

SLIDES 27-31

Structure sampling

- assign score of sampled children to subtree root
- repeat procedure for all siblings
- sample a valid selection of siblings (context sets that partition A)
- discard rest

[Figure: the same step applied one level up in the extended PCT, ending with a complete sampled PCT whose inner nodes carry partitions such as {A}, {C,G}, {T} and {A,G}, {C,T}.]

SLIDES 32-34

Case studies

- C = 2 mixture components
- each component: parsMM(2)
- two questions:
  1. Does the algorithm converge? → see paper/poster
  2. Does the algorithm work? → classify splice donor sites vs. non-splice sites; compare Bayesian prediction (using Gibbs sampling) with classical prediction (using the EM algorithm)

SLIDE 35

Data

Splice sites

- exon-intron boundaries in genes of higher organisms
- length = 9 (7 without the conserved GT)
- alphabet A = {A, C, G, T}

[Figure: gene diagram with exon-intron-exon structure; the splice donor site N N N G T N N N N spans the exon-intron boundary, with the conserved G T at the intron start.]

SLIDE 36

Classification problem

- data from Yeo/Burge (2004)
- repeated holdout, sample size = 500

[Figure: example 9-mers from the four data blocks (splice sites vs. non-splice sites, training vs. test):

  TTTGTAATA CAAGTAGTG GTAGTTGAC CAAGTATTT AAAGTATAG ...
  TGGGTTATG ATAGTGGGC TATGTATTA AGGGTTGAA AGGGTCCGA ...
  AAGGTATTG CAGGTAATA AAGGTAAAA ATAGTAAGT CTGGTGAGC ...
  CAGGTGTGT AGGGTGAGT CGGGTAAGG AAGGTGGGA ATAGTAAGT ...]

SLIDE 37

Classification results

[Figure: area under the ROC curve (0.960 to 0.980) plotted against the number of leaves (20, 50, 100), comparing Gibbs sampling with the EM algorithm.]

SLIDE 38

Summary

- premises: parsimonious Markov models, latent variables, Bayesian prediction → Gibbs sampling
- key step: structure sampling → dynamic programming
- convergence: autocorrelations okay
- classification: Bayesian prediction/Gibbs sampling outperforms classical prediction/EM algorithm
- future: application to other problems involving latent variables