MCMC based machine learning



  1. MCMC based machine learning (Bayesian Model Averaging). Nicos Angelopoulos, n.angelopoulos@ed.ac.uk, School of Biological Sciences, Biochemistry Group, University of Edinburgh, Scotland, UK. Collaborative work with James Cussens, York University, jc@cs.york.ac.uk.

  2. MCMC Overview. A class of sampling algorithms that estimate a posterior distribution. Markov chain: construct a chain of visited values, M_1, M_2, ..., M_n, by proposing M* from M_i with probability q(M*, M_i); the prior p(M*) and the relative likelihood of the two values, p(D|M*)/p(D|M_i), decide whether the proposal joins the chain. Monte Carlo: use the chain to approximate the posterior p(M|D).

  3. Bayesian learning with MCMC. Given some data D and a class of statistical models M (M ∈ M) that can express relations in the data, use MCMC to approximate the normalisation factor in Bayes' theorem:

        p(M|D) = p(D|M) p(M) / Σ_{M'∈M} p(D|M') p(M')

     p(M) is the prior probability of each model, p(D|M) the likelihood (how well the model fits the data), and p(M|D) the posterior.
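  A minimal Python sketch of what this normalisation amounts to when the model class is small enough to enumerate; MCMC is the approximation for when the sum is intractable. The arguments models, prior and likelihood are hypothetical stand-ins supplied by the caller:

        def posterior_by_enumeration(models, prior, likelihood):
            # p(D|M) p(M) for every model in the class
            joint = {m: likelihood(m) * prior[m] for m in models}
            # the normalisation factor: the sum over all models
            z = sum(joint.values())
            # the posterior p(M|D)
            return {m: joint[m] / z for m in models}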

  4. Example: Data

                 smoker   bronchitis   l_cancer
     person 1      y          y           n
     person 2      y          n           n
     person 3      y          y           y
     person 4      n          y           n
     person 5      n          n           n

  5. Example: Models. [Figure: the candidate BN structures over the nodes S (smoker), B (bronchitis) and L (l_cancer), each drawn next to its parent-list encoding.] B_1: [b-[], l-[], s-[]] ... B_2: [b-[s], l-[], s-[]] ... B_24: [b-[s], l-[b,s], s-[]].

  6. Example: Objective. [Figure: bar chart of the target posterior P(B_x) over the models B_1, B_2, B_3, B_4, ..., B_24.] The probabilities sum to one: Σ_{B_x} p(B_x) = 1.

  7. Metropolis-Hastings (M-H) MCMC.
     0. Set i = 0 and find M_0 using the prior.
     1. From M_i produce a candidate model M*. Let the probability of reaching M* be q(M*, M_i).
     2. Let

        α(M_i, M*) = min( [ q(M_i, M*) P(D|M*) P(M*) ] / [ q(M*, M_i) P(D|M_i) P(M_i) ], 1 )

        where q(M_i, M*) is the probability of proposing M_i back from M*. Set M_{i+1} = M* with probability α(M_i, M*), and M_{i+1} = M_i with probability 1 − α(M_i, M*).
     3. If i has reached the iteration limit then terminate, else set i = i + 1 and repeat from 1.
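  A minimal Python sketch of this loop over a small discrete model space. Everything concrete here is a hypothetical stand-in, not the SLP machinery of the talk: the likelihood function, the uniform prior, and a symmetric uniform proposal (chosen so that q(M*, M_i) = q(M_i, M*) and the proposal terms cancel):

        import random

        models = list(range(1, 25))                     # toy stand-in for B_1..B_24
        prior = {m: 1.0 / len(models) for m in models}  # assumed uniform prior p(M)

        def likelihood(m):
            # hypothetical stand-in for p(D|M); a real run scores m against the data
            return 1.0 + (m % 5)

        def propose(m_i):
            # symmetric proposal: any model, uniformly
            return random.choice(models)

        def metropolis_hastings(n_iters, seed=0):
            random.seed(seed)
            chain = [random.choice(models)]             # step 0: M_0 from the (uniform) prior
            for _ in range(n_iters):
                m_i = chain[-1]
                m_star = propose(m_i)
                # acceptance ratio; the q terms cancel for a symmetric proposal
                alpha = min((likelihood(m_star) * prior[m_star])
                            / (likelihood(m_i) * prior[m_i]), 1.0)
                chain.append(m_star if random.random() < alpha else m_i)
            return chain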

  8. Example: MCMC. Markov chain: M_1 = B_3.

  9. Example: MCMC. Markov chain: M_1, M_2 = B_3, B_3.

  10. Example: MCMC. Markov chain: M_1, M_2, M_3, M_4, M_5, ... = B_3, B_3, B_10, B_3, B_24, ...

  11. Example: MCMC. Markov chain: M_1, M_2, M_3, M_4, M_5, ... = B_3, B_3, B_10, B_3, B_24, ... Monte Carlo: p(B_k) = #(B_k) / Σ_{B_x} #(B_x), i.e. each posterior probability is estimated by the fraction of chain steps that visit B_k.
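  Continuing the Python sketch from slide 7 (reusing its hypothetical models and metropolis_hastings), the Monte Carlo step is just normalised visit counts:

        from collections import Counter

        chain = metropolis_hastings(50000)
        counts = Counter(chain)
        # p(B_k) estimated as #(B_k) / n, the visit frequency of model k
        posterior = {m: counts[m] / len(chain) for m in models}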

  12. SLP defined model space. ?- bn( [1,2,3], Bn ). [Figure: a derivation tree for the query, with root goal G_0, an intermediate goal G_i, the current model M_i and the proposed model M*.] From M_i identify G_i, then sample forward to M*. q(M*, M_i) is the probability of proposing M* when M_i is the current model.

  13. BN Prior

     % Given a topological ordering of the nodes, choose each node's parents
     % from the nodes that precede it in the order.
     bn( OrdNodes, Bn ) :-
         bn( OrdNodes, [], Bn ).

     bn( [], _PotPars, [] ).
     bn( [H|T], PotPars, [H-SelParsOfH|RemBn] ) :-
         select_parents( PotPars, SelParsOfH ),
         bn( T, [H|PotPars], RemBn ).

     select_parents( [], [] ).
     select_parents( [H|T], Pa ) :-
         include_element( H, Pa, TPa ),
         select_parents( T, TPa ).

     % Each potential parent is included with probability 1/2.
     1/2 : include_element( H, [H|TPa], TPa ).
     1/2 : include_element( _H, TPa, TPa ).
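  A Python rendering of what this prior samples, for readers without an SLP interpreter. The function name and the (node, parents) representation are assumptions, but the 1/2 inclusion probability mirrors the two labelled include_element clauses:

        import random

        def sample_bn(ordered_nodes, rng=random):
            bn, pot_pars = [], []
            for node in ordered_nodes:
                # keep each potential parent (an earlier node) with probability 1/2
                parents = [p for p in pot_pars if rng.random() < 0.5]
                bn.append((node, parents))
                pot_pars = [node] + pot_pars   # node joins the potential parents
            return bn

        # e.g. sample_bn([1, 2, 3]) may return [(1, []), (2, [1]), (3, [2])]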

  14. Example BN (Asia). For example: ?- bn( [1,2,3,4,5,6,7,8], M ). M = [1-[],2-[1],3-[2,5],4-[],5-[4],6-[4],7-[3],8-[3,6]].

  15. Visits and stays. [Figure only.]

  16. Edges recovery. With the topological ordering constraint and a maximum of 2 parents per node, the algorithm recovers most of the BN arcs in 0.5M iterations. For example, at a .99 cut-off: missing: 2→3 (.84) and 3→7 (.47); superfluous: 5→7.

  17. CART priors. ?- cart( M ). A node η at depth d_η splits with probability

        P_split(η) = α (1 + d_η)^(−β)

     [Figure: the example tree M drawn with its splits, x2 =< 1 versus 1 < x2 and x1 =< 0 versus 0 < x1.] M = node( b, 1, node(a,0,leaf,leaf), leaf )

     1-Sp: [Sp]: cart( Data, D, A/B, leaf(Data) ).
     Sp: [Sp]: cart( Data, D, A/B, node(F,V,L,R) ) :-
         branch( Data, F, V, LData, RData ),
         D1 is D + 1,
         NxtSp is A * ((1 + D1) ^ -B),
         [NxtSp]: cart( LData, D1, A/B, L ),
         [NxtSp]: cart( RData, D1, A/B, R ).
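  A Python sketch of sampling a tree from this prior. The split probability follows P_split above, while the feature/value choices, the data-free tree representation and the max_depth safeguard are hypothetical additions, not part of the slide's SLP (which also threads the data through branch):

        import random

        def sample_cart(depth, alpha, beta, max_depth=25, rng=random):
            # split a node at depth d with probability alpha * (1 + d)^(-beta)
            p_split = alpha * (1 + depth) ** (-beta)
            if depth >= max_depth or rng.random() >= p_split:
                return ('leaf',)
            feature = rng.randrange(8)         # placeholder: one of 8 variables
            value = rng.random()               # placeholder split point
            return ('node', feature, value,
                    sample_cart(depth + 1, alpha, beta, max_depth, rng),
                    sample_cart(depth + 1, alpha, beta, max_depth, rng))

        # e.g. sample_cart(0, alpha=0.95, beta=0.8) grows a tree from the root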

  18. Experiment. Pima Indians Diabetes Database: 768 complete entries of 8 variables. Denison et al. ran 250,000 iterations of local perturbations; their best likelihood model scored -343.056. Our experiment ran for 250,000 iterations with branch replacing. Parameters: uniform choice proposal, α = .95, β = .8. Our best likelihood model scored -347.651.

  19. Likelihoods trace. [Figure: log-likelihood trace over 250,000 iterations; y-axis from -420 to -340; data file tr_uc_rm_pima_idsd_a0_95b0_8_i250K__s776.llhoods.] β = .8, α = .95, proposal = uniform choice.

  20. Best likelihood: -347.61529077520584. [Figure: the best-likelihood CART tree (header best_llhood:vst(37):msclf(145)), with internal nodes testing thresholds such as =< 29.3, =< 27 and =< 166, and leaves labelled with counts such as 141/5 and 52/19.]

  21. In Kyoto. Models: HMRFs for clustering. Likelihood: design and implement a likelihood-ratio function for HMRFs. Proposal: implement function(s) for reaching proposal models. Application: to real data. SLPs: for more complex priors.
