

SLIDE 1

MCMC based machine learning
(Bayesian Model Averaging) [a]

Nicos Angelopoulos
n.angelopoulos@ed.ac.uk

School of Biological Sciences, Biochemistry Group, University of Edinburgh, Scotland, UK.

[a] Collaborative work with James Cussens, York University, jc@cs.york.ac.uk


SLIDE 2

MCMC Overview

Class of sampling algorithms that estimate a posterior distribution.

Markov chain: construct a chain of visited values, M1, M2, . . . , Mn, by proposing M∗ from Mi with probability q(Mi, M∗). Use prior knowledge, p(M∗), and the relative likelihood of the two values, p(D|M∗)/p(D|Mi), to decide the chain's construction.

Monte Carlo: use the chain to approximate the posterior p(M|D).


SLIDE 3

Bayesian learning with MCMC

Given some data D and a class of statistical models M (M ∈ M) that can express relations in the data, use MCMC to approximate the posterior of Bayes' theorem, whose normalisation factor sums over the whole model class:

p(M|D) = p(D|M) p(M) / Σ_{M′ ∈ M} p(D|M′) p(M′)

p(M) is the prior probability of each model; p(D|M) the likelihood (how well the model fits the data); p(M|D) the posterior.
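For a model class small enough to enumerate, the denominator can be computed directly. A minimal Python sketch (the three models and their prior/likelihood numbers are hypothetical stand-ins, not taken from the slides):

# Hypothetical prior p(M) and likelihood p(D|M) for a three-model class.
prior      = {"B1": 0.5, "B2": 0.3, "B3": 0.2}
likelihood = {"B1": 0.1, "B2": 0.4, "B3": 0.9}

# Denominator of Bayes' theorem: sum over M' of p(D|M') p(M').
evidence = sum(likelihood[m] * prior[m] for m in prior)

# Posterior p(M|D) for every model in the class.
posterior = {m: likelihood[m] * prior[m] / evidence for m in prior}

MCMC becomes useful precisely when this enumeration is infeasible, as with spaces of BN structures or CART trees.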


SLIDE 4

Example: Data

           smoker   bronchitis   l_cancer
person 1     y          y           n
person 2     y          n           n
person 3     y          y           y
person 4     n          y           n
person 5     n          n           n


SLIDE 5

Example: Models

[each model Bx is drawn in the slide as a directed graph over the nodes S, B, L]

B1 : [b-[], l-[], s-[]]
B2 : [b-[s], l-[], s-[]]
. . .
B24 : [b-[s], l-[b,s], s-[]]


SLIDE 6

Example: Objective

[bar chart: the posterior probability P(Bx) of each model B1, B2, B3, B4, . . . , B24]

Σ_{Bx} p(Bx) = 1


SLIDE 7

Metropolis-Hastings (M-H) MCMC

0. Set i = 0 and find M0 using the prior.

1. From Mi produce a candidate model M∗. Let the probability of reaching M∗ be q(Mi, M∗).

2. Let

   α(Mi, M∗) = min( [q(M∗, Mi) P(D|M∗) P(M∗)] / [q(Mi, M∗) P(D|Mi) P(Mi)] , 1 )

   and set Mi+1 = M∗ with probability α(Mi, M∗), and Mi+1 = Mi with probability 1 − α(Mi, M∗).

3. If i has reached the iteration limit then terminate, else set i = i + 1 and repeat from 1.
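A minimal Metropolis-Hastings loop over the same hypothetical three-model class as the SLIDE 3 sketch (Python, for illustration only; the deck's actual sampler runs over SLP-defined model spaces, and the symmetric uniform proposal here is an assumption that makes the q terms cancel):

import random

# Same hypothetical three-model class as the SLIDE 3 sketch.
models     = ["B1", "B2", "B3"]
prior      = {"B1": 0.5, "B2": 0.3, "B3": 0.2}   # p(M)
likelihood = {"B1": 0.1, "B2": 0.4, "B3": 0.9}   # p(D|M)

def propose(m):
    # Uniform choice among the other models; q is symmetric,
    # so q(M*, Mi) / q(Mi, M*) = 1 and drops out of alpha.
    return random.choice([x for x in models if x != m])

def metropolis_hastings(n_iters, seed=1):
    random.seed(seed)
    # Step 0: draw M0 from the prior.
    m = random.choices(models, weights=[prior[x] for x in models])[0]
    chain = []
    for _ in range(n_iters):                      # step 3: stop at the limit
        m_star = propose(m)                       # step 1: candidate M*
        ratio = (likelihood[m_star] * prior[m_star]) / (likelihood[m] * prior[m])
        if random.random() < min(ratio, 1.0):     # step 2: accept with prob alpha
            m = m_star
        chain.append(m)
    return chain

chain = metropolis_hastings(10000)

Because q is symmetric here, α reduces to the prior-times-likelihood ratio; with an asymmetric proposal the q(M∗, Mi)/q(Mi, M∗) factor must be kept.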


SLIDE 8

Example: MCMC

Markov Chain: M1 = B3


SLIDE 9

Example: MCMC

Markov Chain: M1, M2 = B3, B3


SLIDE 10

Example: MCMC

Markov Chain: M1, M2, M3, M4, M5, . . . = B3, B3, B10, B3, B24, . . .


SLIDE 11

Example: MCMC

Markov Chain: M1, M2, M3, M4, M5, . . . = B3, B3, B10, B3, B24, . . .

Monte Carlo: p(Bk) = #(Bk) / Σ_{Bx} #(Bx)
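In code, the Monte Carlo step is just a visit count over the chain; a sketch using the five-state chain prefix shown above:

from collections import Counter

# The visited models from the example chain above (first five states).
chain = ["B3", "B3", "B10", "B3", "B24"]

counts = Counter(chain)                                 # #(Bk) per model
total  = sum(counts.values())                           # sum over Bx of #(Bx)
posterior = {m: n / total for m, n in counts.items()}   # p(Bk) estimate
# For this prefix: posterior["B3"] == 0.6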


SLIDE 12

SLP-defined model space

?- bn( [1,2,3], Bn ).

[diagram: SLD derivation tree rooted at the goal G0; the current model Mi and the proposal M∗ are leaves below the shared choice point Gi]

From Mi, identify Gi, then sample forward to M⋆. q(Mi, M⋆) is the probability of proposing M⋆ when Mi is the current model.
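A simplified picture of this proposal mechanism in Python (a hypothetical rendering: the real implementation backtracks a Prolog derivation, whereas here a model is identified with its sequence of binary include/exclude choices and Gi with a uniformly chosen position in that sequence):

import random

def complete(n_choices, prefix=()):
    # Sample forward: one fair coin per remaining include/exclude choice.
    return list(prefix) + [random.random() < 0.5
                           for _ in range(n_choices - len(prefix))]

def propose(current):
    # Backtrack to a uniformly chosen choice point Gi, keep the choices
    # made up to Gi, then resample forward to obtain M*.
    i = random.randrange(len(current))
    return complete(len(current), prefix=current[:i])

m_i    = complete(3)      # the 0+1+2 parent choices behind bn([1,2,3], Bn)
m_star = propose(m_i)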


SLIDE 13

BN Prior

bn( OrdNodes, Bn ) :-
    bn( OrdNodes, [], Bn ).

bn( [], _PotPar, [] ).
bn( [H|T], PotPar, [H-SelParOfH|RemBn] ) :-
    select_parents( PotPar, SelParOfH ),
    bn( T, [H|PotPar], RemBn ).

select_parents( [], [] ).
select_parents( [H|T], Pa ) :-
    include_element( H, Pa, RemPa ),
    select_parents( T, RemPa ).

1/2 : include_element( H, [H|TPa], TPa ).
1/2 : include_element( _H, TPa, TPa ).
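A procedural reading of this prior in Python (a sketch, not the authors' code: each node takes each earlier node as a parent independently with probability 1/2, exactly as the 1/2-labelled include_element clauses specify):

import random

def bn_prior(ordered_nodes):
    # Each previously seen node becomes a parent with probability 1/2.
    bn, pot_par = [], []
    for node in ordered_nodes:
        parents = [p for p in pot_par if random.random() < 0.5]
        bn.append((node, parents))
        pot_par.insert(0, node)   # mirrors [H|PotPar] in the SLP
    return bn

print(bn_prior([1, 2, 3]))        # e.g. [(1, []), (2, [1]), (3, [2])]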


SLIDE 14

example BN (Asia)

For example:

?- bn( [1,2,3,4,5,6,7,8], M ).
M = [1-[],2-[1],3-[2,5],4-[],5-[4],6-[4],7-[3],8-[3,6]].


SLIDE 15

visits and stays


SLIDE 16

Edges recovery

With a topological ordering constraint and a maximum of 2 parents per node, the algorithm recovers most of the BN arcs in 0.5M iterations. For example, at a .99 cut-off we have:

Missing: 2 → 3 (.84), 3 → 7 (.47)
Superfluous: 5 → 7


SLIDE 17

CART priors

Psplit(η) = α (1 + dη)^−β, where dη is the depth of node η.

?- cart( M ).

[tree diagram of M: the root branches on x2 =< 1 / 1 < x2; its left child branches on x1 =< 0 / 0 < x1]

M = node( b, 1, node(a,0,leaf,leaf), leaf )

1 - Sp : [Sp] : cart( Data, D, A/B, leaf(Data) ).
Sp : [Sp] : cart( Data, D, A/B, node(F,V,L,R) ) :-
    branch( Data, F, V, LData, RData ),
    D1 is D + 1,
    NxtSp is A * ((1 + D1) ^ -B),
    [NxtSp] : cart( LData, D1, A/B, L ),
    [NxtSp] : cart( RData, D1, A/B, R ).
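A procedural reading of the same prior in Python (a sketch under assumptions: the feature list and uniform threshold are hypothetical stand-ins for branch/5, which in the SLP splits the actual data):

import random

def sample_cart(depth, alpha, beta, features):
    # Split with probability Psplit(eta) = alpha * (1 + d_eta) ** (-beta);
    # otherwise close the branch with a leaf.
    if random.random() >= alpha * (1 + depth) ** (-beta):
        return "leaf"
    feature = random.choice(features)   # stand-in for branch/5's feature choice
    value   = random.random()           # stand-in split threshold
    return ("node", feature, value,
            sample_cart(depth + 1, alpha, beta, features),
            sample_cart(depth + 1, alpha, beta, features))

tree = sample_cart(0, 0.95, 0.8, ["x1", "x2"])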


SLIDE 18

Experiment

Pima Indians Diabetes Database: 768 complete entries of 8 variables.

Denison et al. ran 250,000 iterations of local perturbations. Their best likelihood model: -343.056.

Our experiment ran for 250,000 iterations with branch replacing. Parameters: uniform-choice proposal, α = .95, β = .8. Our best likelihood model: -347.651.


SLIDE 19

Likelihoods trace

[trace plot of model log-likelihoods over the 250,000 iterations; y-axis ticks 340 to 420, x-axis 50000 to 250000; data file 'tr_uc_rm_pima_idsd_a0_95b0_8_i250K__s776.llhoods']

β = .8, α = .95, proposal = uniform choice


SLIDE 20

Best likelihood

[tree diagram of the best-likelihood CART model; nodes numbered 1-31 with split thresholds and leaf class counts]

best log-likelihood: -347.61529077520584
best_llhood : vst(37) : msclf(145)

SLIDE 21

in Kyoto

Models: HMRFs for clustering.
Likelihood: design and implement a likelihood-ratio function for HMRFs.
Proposal: implement function(s) for reaching the proposal model.
Application: to real data.
SLPs: for more complex priors.
