

  1. Being Bayesian About Network Structure: A Bayesian Approach to Structure Discovery in Bayesian Networks
     Nir Friedman and Daphne Koller
     04/21/2005, CS673

  2. Roadmap
     • Bayesian learning of Bayesian networks
       – Exact vs. approximate learning
     • Markov chain Monte Carlo methods
       – MCMC over structures
       – MCMC over orderings
     • Experimental results
     • Conclusions

  3. Bayesian Networks
     • Compact representation of probability distributions via conditional independence
     • Qualitative part: a directed acyclic graph (DAG)
       – Nodes – random variables
       – Edges – direct influence
     • Quantitative part: a set of conditional probability distributions, e.g. P(A | E, B):

           E   B  | P(A)  P(!A)
           e   b  | 0.9   0.1
           e  !b  | 0.2   0.8
          !e   b  | 0.9   0.1
          !e  !b  | 0.01  0.99

     • Together they define a unique distribution in factored form:
       P(B,E,A,C,R) = P(B) P(E) P(A|B,E) P(R|E) P(C|A)
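
A minimal sketch of this factored form in Python. The P(A | E, B) table is taken from the slide; the remaining distributions (p_b, p_e, p_r, p_c) are hypothetical values invented here for illustration, since the slide does not give them.

```python
from itertools import product

p_b = {True: 0.01, False: 0.99}                        # hypothetical P(B)
p_e = {True: 0.02, False: 0.98}                        # hypothetical P(E)
p_a = {(True, True): 0.9, (True, False): 0.2,          # P(A=1 | E, B) from the slide
       (False, True): 0.9, (False, False): 0.01}
p_r = {True: 0.95, False: 0.01}                        # hypothetical P(R=1 | E)
p_c = {True: 0.7, False: 0.05}                         # hypothetical P(C=1 | A)

def joint(b, e, a, r, c):
    """P(B,E,A,C,R) = P(B) P(E) P(A|B,E) P(R|E) P(C|A), the factored form."""
    pa = p_a[(e, b)] if a else 1 - p_a[(e, b)]
    pr = p_r[e] if r else 1 - p_r[e]
    pc = p_c[a] if c else 1 - p_c[a]
    return p_b[b] * p_e[e] * pa * pr * pc

# Sanity check: the 2**5 entries of the joint distribution sum to 1.
assert abs(sum(joint(*v) for v in product([True, False], repeat=5)) - 1) < 1e-9
```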

  4. Why Learn Bayesian Networks?
     • Conditional independencies and the graphical representation capture the structure of many real-world distributions
       – Provides insights into the domain
     • Graph structure allows "knowledge discovery":
       – Is there a direct connection between X and Y?
       – Does X separate two "subsystems"?
       – Does X causally affect Y?
     • Bayesian networks can be used for many tasks – inference, causality, etc.
     • Examples from scientific data mining:
       – Disease properties and symptoms
       – Interactions between the expression of genes

  5. Learning Bayesian Networks
     • An inducer takes data plus prior information and outputs a network structure with its CPTs (e.g., P(A | E, B))
     • The inducer needs the prior probability distribution P(B) over networks B
     • Using Bayesian conditioning, it updates the prior P(B) to the posterior P(B | D)

  6. Why Struggle for Accurate Structure?
     • Adding an arc to the "true" structure:
       – Increases the number of parameters to be fitted
       – Encodes wrong assumptions about causality and domain structure
     • Missing an arc:
       – Cannot be compensated for by accurate fitting of parameters
       – Also misses causality and domain structure

  7. Score-Based Learning
     • Define a scoring function that evaluates how well a structure matches the data, e.g. samples of (E, B, A): <Y,N,N>, <Y,Y,Y>, <N,Y,Y>, ..., <N,N,N>
     • Search for a structure that maximizes the score (a hill-climbing sketch follows below)
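
A minimal hill-climbing sketch of the search step, assuming some score(graph, data) function (for instance the Bayesian score of the next slide). Graphs are represented as a dict mapping each node to its parent set; all names here are illustrative, not from the paper.

```python
import itertools

def is_acyclic(parents):
    """Check that the parent-set representation encodes a DAG (Kahn's algorithm)."""
    indeg = {v: len(ps) for v, ps in parents.items()}
    children = {v: [] for v in parents}
    for v, ps in parents.items():
        for p in ps:
            children[p].append(v)
    queue = [v for v, d in indeg.items() if d == 0]
    seen = 0
    while queue:
        u = queue.pop()
        seen += 1
        for c in children[u]:
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    return seen == len(parents)

def neighbors(parents):
    """All graphs one edge addition, deletion, or reversal away."""
    for x, y in itertools.permutations(parents, 2):
        g = {v: set(ps) for v, ps in parents.items()}
        if x in g[y]:
            g[y].discard(x)                  # delete x -> y (stays acyclic)
            yield g
            g2 = {v: set(ps) for v, ps in g.items()}
            g2[x].add(y)                     # reverse: add y -> x instead
            if is_acyclic(g2):
                yield g2
        else:
            g[y].add(x)                      # add x -> y
            if is_acyclic(g):
                yield g

def hill_climb(nodes, data, score):
    """Greedy local search from the empty graph; returns a local maximum."""
    current = {v: set() for v in nodes}
    best = score(current, data)
    improved = True
    while improved:
        improved = False
        for g in neighbors(current):
            s = score(g, data)
            if s > best:
                current, best, improved = g, s, True
    return current, best
```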

  8. Bayesian Score of a Model
     • Posterior of a structure:
       P(G | D) = P(D | G) P(G) / P(D)
       where P(G) is the prior over structures and P(D | G) is the marginal likelihood
     • Marginal likelihood: the likelihood averaged over the prior on parameters,
       P(D | G) = ∫ P(D | G, θ) P(θ | G) dθ
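
For discrete networks with Dirichlet parameter priors, the integral above has a well-known closed form (the BD family of scores, computed per node and per parent configuration; this is a standard result, not code from the paper). A sketch of the per-node term, assuming counts[j][k] holds how often the node took value k while its parents took configuration j:

```python
from scipy.special import gammaln  # log Gamma, to stay safely in log space

def log_marglik_node(counts, alpha=1.0):
    """Closed-form log marginal likelihood of one node given its parents.

    counts: list of rows, one per parent configuration j; row[k] = N_jk.
    alpha:  uniform Dirichlet hyperparameter (alpha=1 gives the K2 score).
    """
    total = 0.0
    for row in counts:
        a_j, n_j = alpha * len(row), sum(row)
        total += gammaln(a_j) - gammaln(a_j + n_j)
        for n_jk in row:
            total += gammaln(alpha + n_jk) - gammaln(alpha)
    return total

# By decomposability, log P(D|G) is the sum of this term over all nodes
# (plus the structure prior log P(G) for the full Bayesian score).
```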

  9. Discovering Structure – Model Selection
     • Current practice: model selection
       – Pick a single high-scoring model
       – Use that model to infer domain structure

  10. Discovering Structure – Model Averaging
     • Problem: a small sample size means many high-scoring models
       – An answer based on one model is often useless
       – We want features common to many models

  11. Bayesian Approach
     • Estimate the probability of features:
       – Edge X → Y
       – Markov edge X -- Y
       – Path X → … → Y
       – ...
     • P(f | D) = Σ_G f(G) P(G | D)
       where f(G) is the indicator function for feature f and the sum ranges over all structures G
     • Huge (super-exponential, 2^Θ(n²)) number of networks G
     • Exact learning is intractable (a brute-force sketch for toy domains follows below)
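
For a toy domain the sum over all graphs can be done by brute force, which makes the intractability concrete: there are 25 DAGs on 3 nodes but more than 10^18 on 10. A sketch, assuming an unnormalized log-score log P(D, G) is available (e.g., from the previous slide) and reusing is_acyclic from the hill-climbing sketch above:

```python
import itertools, math

def exact_feature_prob(nodes, log_score, feature):
    """P(f|D) = sum_G f(G) P(G|D), normalizing over every DAG on `nodes`."""
    arcs = list(itertools.permutations(nodes, 2))
    num = den = 0.0
    for mask in itertools.product([0, 1], repeat=len(arcs)):
        parents = {v: {x for (x, y), on in zip(arcs, mask) if on and y == v}
                   for v in nodes}
        if not is_acyclic(parents):          # from the hill-climbing sketch
            continue
        w = math.exp(log_score(parents))     # fine at toy scale; real scores
        den += w                             # would need log-sum-exp
        num += w * feature(parents)
    return num / den

# Example feature: the indicator of edge X -> Y.
# edge_xy = lambda g: 'X' in g['Y']
```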

  12. Approximate Bayesian Learning
     • Restrict the search space to G_k, the set of graphs with indegree bounded by k
       – The space is still super-exponential
     • Find a set 𝒢 of high-scoring structures and estimate
       P(f | D) ≈ Σ_{G ∈ 𝒢} f(G) P(G | D) / Σ_{G ∈ 𝒢} P(G | D)
       (a sketch follows below)
     • Hill-climbing yields a biased sample of structures
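
A sketch of that estimator: given high-scoring graphs and their log-scores (e.g., collected by hill-climbing with random restarts), vote on the feature with score-proportional weights. Subtracting the maximum keeps the exponentials numerically stable; names are illustrative.

```python
import math

def approx_feature_prob(graphs, log_scores, feature):
    """Score-weighted vote over a set of high-scoring structures."""
    m = max(log_scores)                       # subtract max for stability
    weights = [math.exp(s - m) for s in log_scores]
    z = sum(weights)
    return sum(w * feature(g) for w, g in zip(weights, graphs)) / z
```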

  13. Markov Chain Monte Carlo over Networks
     • MCMC sampling (a sketch follows below):
       – Define a Markov chain over Bayesian networks
       – Perform a walk through the chain to get samples G whose distribution converges to the posterior P(G | D) of the true structure
     • Possible pitfalls:
       – Still a super-exponential number of networks
       – Time for the chain to converge to the posterior is unknown
       – Islands of high posterior, connected by low bridges
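
A minimal Metropolis-Hastings sketch of structure-MCMC, reusing neighbors() and a log_score() in the spirit of the earlier sketches. For brevity it treats the proposal as symmetric; a careful implementation corrects the acceptance ratio for the differing neighborhood sizes (the Hastings term).

```python
import math, random

def structure_mcmc(init, log_score, steps, burn_in=1000):
    """Random walk over DAGs whose samples approximate P(G|D)."""
    current, cur = init, log_score(init)
    samples = []
    for t in range(steps):
        proposal = random.choice(list(neighbors(current)))
        prop = log_score(proposal)
        # Metropolis rule: accept with prob min(1, P(G'|D) / P(G|D))
        if math.log(random.random()) < prop - cur:
            current, cur = proposal, prop
        if t >= burn_in:
            samples.append(current)
    return samples

# P(f|D) is then estimated as the fraction of samples with the feature:
# sum(feature(g) for g in samples) / len(samples)
```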

  14. A Better Approach to Approximate Learning
     • Further constrain the search space:
       – Perform model averaging over the structures consistent with some known (fixed) total ordering ≺
     • Ordering of variables: X_1 ≺ X_2 ≺ … ≺ X_n means the parents of X_i must come from X_1, X_2, …, X_{i-1}
     • Intuition: the order decouples the choices of parents
       – The choice of Pa(X_7) does not restrict the choice of Pa(X_12)
     • Both the likelihood P(D | ≺) and the feature probability P(f | D, ≺) can be computed efficiently in closed form (a sketch follows below)
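
A sketch of the closed-form computation for a fixed ordering, assuming a decomposable local_score(node, parents) such as the per-node BD term above: because the order makes parent choices independent across nodes, P(D | ≺) factors into a product over nodes of a sum over candidate parent sets of bounded size k.

```python
import itertools, math

def log_p_data_given_order(order, local_score, k=2):
    """log P(D | order): a product over nodes of a sum over parent sets."""
    total = 0.0
    for i, node in enumerate(order):
        preds = order[:i]                     # only earlier nodes may be parents
        terms = [local_score(node, pa)
                 for size in range(min(k, len(preds)) + 1)
                 for pa in itertools.combinations(preds, size)]
        m = max(terms)                        # log-sum-exp over parent sets
        total += m + math.log(sum(math.exp(t - m) for t in terms))
    return total
```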

  15. Sample Orderings
     • We can write P(f | D) = Σ_≺ P(f | D, ≺) P(≺ | D)
     • Sample n orderings and approximate: P(f | D) ≈ (1/n) Σ_{i=1}^n P(f | ≺_i, D)
     • MCMC sampling (a sketch follows below):
       – Define a Markov chain over orderings
       – Run the chain to get samples from the posterior P(≺ | D)
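
A sketch of order-MCMC, assuming the log_p_data_given_order function from the previous slide's sketch as the (unnormalized) log-posterior over orderings. The proposal swaps two positions, which is symmetric, so the plain Metropolis ratio applies.

```python
import math, random

def order_mcmc(variables, log_p_order, steps, burn_in=1000):
    """Random walk over orderings whose samples approximate P(order | D)."""
    current = list(variables)
    cur = log_p_order(current)
    samples = []
    for t in range(steps):
        i, j = random.sample(range(len(current)), 2)
        proposal = list(current)
        proposal[i], proposal[j] = proposal[j], proposal[i]  # swap two slots
        prop = log_p_order(proposal)
        if math.log(random.random()) < prop - cur:           # Metropolis rule
            current, cur = proposal, prop
        if t >= burn_in:
            samples.append(list(current))
    return samples

# P(f|D) is then approximated by averaging P(f | order_i, D) over the samples.
```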

  16. Experiments: Exact Posterior over Orders versus Order-MCMC

  17. Experiments: Convergence

  18. Experiments: Structure-MCMC – Posterior Correlation for Two Different Runs

  19. Experiments: Order-MCMC – Posterior Correlation for Two Different Runs

  20. Conclusion
     • Order-MCMC performs better than structure-MCMC (faster convergence and higher run-to-run posterior correlation in the experiments)

  21. References
     • N. Friedman and D. Koller. Being Bayesian about Network Structure: A Bayesian Approach to Structure Discovery in Bayesian Networks. Machine Learning Journal, 2002.
     • N. Friedman and D. Koller. Tutorial on Learning Bayesian Networks from Data. NIPS 2001.
     • N. Friedman and M. Goldszmidt. Tutorial on Learning Bayesian Networks from Data. AAAI-98.
     • D. Heckerman. A Tutorial on Learning with Bayesian Networks. In Learning in Graphical Models, M. Jordan, ed., MIT Press, Cambridge, MA, 1999. Also appears as Technical Report MSR-TR-95-06, Microsoft Research, March 1995; an earlier version appears as Bayesian Networks for Data Mining, Data Mining and Knowledge Discovery, 1:79–119, 1997.
     • C. Andrieu, N. de Freitas, A. Doucet, and M. I. Jordan. An Introduction to MCMC for Machine Learning. Machine Learning, 2002.
     • S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach.
