Bayesian networks: Compact representation of distributions over …

Introduction to Artificial Intelligence
Lecture 13 – Approximate Inference
CS/CNS/EE 154
Andreas Krause

Bayesian networks
• Compact representation of distributions over a large number of variables
• (Often) allow efficient exact inference (computing marginals, etc.)
[Figure: HailFinder network in the JavaBayes applet: 56 variables, ~3 states each → ~10^26 terms; > 10,000 years on top supercomputers]

Typical queries: Conditional distribution
• Compute the distribution of some variables given values for others
[Figure: Bayesian network with nodes E, B, A, J, M]

Typical queries: Maximization
• MPE (Most probable explanation): Given values for some variables, compute the most likely assignment to all remaining variables
• MAP (Maximum a posteriori): Compute the most likely assignment to some variables
[Figure: Bayesian network with nodes E, B, A, J, M]

Hardness of inference for general BNs
• Computing conditional distributions:
  • Exact solution: #P-complete
  • NP-hard to obtain any nontrivial approximation
• Maximization:
  • MPE: NP-complete
  • MAP: NP^PP-complete
• Inference in general BNs is really hard

Inference
• Can exploit structure (conditional independence) to efficiently perform exact inference in many practical situations
• For BNs where exact inference is not possible, can use algorithms for approximate inference (later)

Variable elimination algorithm
• Given a BN and query P(X | E = e)
• Choose an ordering of X_1, …, X_n
• Set up initial factors: f_i = P(X_i | Pa_i)
• For i = 1:n, X_i ∉ {X, E}
  • Collect all factors f that include X_i
  • Generate a new factor g by multiplying them and marginalizing out X_i: $g = \sum_{x_i} \prod_j f_j$
  • Add g to the set of factors (in place of the collected ones)
• Renormalize P(x, e) to get P(x | e)
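The steps above can be sketched in a few lines of Python. This is a minimal illustration and not the lecture's implementation: the `Factor` class, the helper names, and the CPT numbers for a binary chain A → B → C are all hypothetical.

```python
from itertools import product

class Factor:
    """Table over binary variables: maps value tuples (0/1) to numbers."""
    def __init__(self, variables, table):
        self.variables = tuple(variables)
        self.table = dict(table)

def multiply(f, g):
    """Pointwise product of two factors over the union of their scopes."""
    variables = f.variables + tuple(v for v in g.variables if v not in f.variables)
    table = {}
    for values in product((0, 1), repeat=len(variables)):
        a = dict(zip(variables, values))
        fv = f.table[tuple(a[v] for v in f.variables)]
        gv = g.table[tuple(a[v] for v in g.variables)]
        table[values] = fv * gv
    return Factor(variables, table)

def sum_out(f, var):
    """Marginalize `var` out of factor f."""
    idx = f.variables.index(var)
    table = {}
    for values, p in f.table.items():
        key = values[:idx] + values[idx + 1:]
        table[key] = table.get(key, 0.0) + p
    return Factor(f.variables[:idx] + f.variables[idx + 1:], table)

def restrict(f, var, value):
    """Fix `var` to an observed value and drop it from the scope."""
    if var not in f.variables:
        return f
    idx = f.variables.index(var)
    table = {v[:idx] + v[idx + 1:]: p for v, p in f.table.items() if v[idx] == value}
    return Factor(f.variables[:idx] + f.variables[idx + 1:], table)

def variable_elimination(factors, query, evidence, order):
    """Return P(query | evidence) as a dict {value: probability}."""
    for var, value in evidence.items():
        factors = [restrict(f, var, value) for f in factors]
    for var in order:
        if var == query or var in evidence:
            continue
        relevant = [f for f in factors if var in f.variables]
        if not relevant:
            continue
        rest = [f for f in factors if var not in f.variables]
        prod = relevant[0]
        for f in relevant[1:]:
            prod = multiply(prod, f)
        factors = rest + [sum_out(prod, var)]   # new factor g replaces the collected ones
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    z = sum(result.table.values())              # renormalize P(x, e) to P(x | e)
    return {values[0]: p / z for values, p in result.table.items()}

# Hypothetical CPTs for the chain A -> B -> C (numbers are made up).
f_a = Factor(["A"], {(0,): 0.7, (1,): 0.3})
f_b = Factor(["A", "B"], {(0, 0): 0.8, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.9})
f_c = Factor(["B", "C"], {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8})

posterior = variable_elimination([f_a, f_b, f_c], "A", {"C": 1}, ["A", "B", "C"])
```

Here `posterior` holds P(A | C = 1). Restricting factors to the evidence first keeps every intermediate table as small as possible.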

Reusing computation
• Often, want to compute conditional distributions of many variables, for fixed observations
• E.g., probability of Pits at different locations given observed Breezes
• Repeatedly performing variable elimination is wasteful (many factors are recomputed)
• Need the right data structure to avoid recomputation
→ Message passing on factor graphs

Factor graphs
• P(C,D,G,I,S,L) = P(C) P(I) P(D|C) P(G|D,I) P(S|I,G) P(L|S)
[Figure: Bayesian network over C, D, I, G, S, L, and its factor graph with factors f1 (CD), f2 (DIG), f3 (IGS), f4 (SL) connected to variables C, D, G, I, S, L]

Factor graph
• A factor graph for a Bayesian network is a bipartite graph consisting of
  • Variables and
  • Factors
• Each factor is associated with a subset of variables, and all CPDs of the Bayesian network have to be assigned to one of the factor nodes
[Figure: Bayesian network over C, D, I, G, S, L, and its factor graph with factors f1 (CD), f2 (DIG), f3 (IGS), f4 (SL) connected to variables C, D, G, I, S, L]
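As a rough illustration of the definition, the factor graph from the figure can be stored as two adjacency maps. The factor scopes (CD, DIG, IGS, SL) come from the slide. The rule for assigning CPDs, "first factor whose scope covers the CPD", is an assumption, since the slide does not say which factor receives P(I).

```python
# Factor scopes as shown on the slide.
factor_scopes = {
    "f1": ("C", "D"),
    "f2": ("D", "I", "G"),
    "f3": ("I", "G", "S"),
    "f4": ("S", "L"),
}

# CPDs of the Bayesian network, each with the variables it mentions.
cpds = {
    "P(C)": ("C",),
    "P(I)": ("I",),
    "P(D|C)": ("D", "C"),
    "P(G|D,I)": ("G", "D", "I"),
    "P(S|I,G)": ("S", "I", "G"),
    "P(L|S)": ("L", "S"),
}

# Bipartite adjacency: variable node -> adjacent factor nodes.
variable_neighbors = {}
for factor, scope in factor_scopes.items():
    for var in scope:
        variable_neighbors.setdefault(var, []).append(factor)

# Assumed rule: attach each CPD to the first factor whose scope covers it.
cpd_assignment = {}
for cpd, variables in cpds.items():
    for factor, scope in factor_scopes.items():
        if set(variables) <= set(scope):
            cpd_assignment[cpd] = factor
            break
```

Note that I and G are both shared by f2 and f3, so this particular factor graph contains a cycle.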

Sum-product message passing on factor graphs
• Messages from node v to factor u
• Messages from factor u to node v
[Figure: factor graph with factors f1 (CD), f2 (DIG), f3 (IGS), f4 (SL) and variables C, D, G, I, S, L]
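The slide only names the two message types. In the standard textbook form they can be written as below. The notation is an assumption: N(·) denotes neighbors in the factor graph, μ a message, f_u the factor at node u, and x_u a joint assignment to the variables of u.

```latex
\begin{align*}
\mu_{v \to u}(x_v) &= \prod_{u' \in N(v) \setminus \{u\}} \mu_{u' \to v}(x_v) \\
\mu_{u \to v}(x_v) &= \sum_{\mathbf{x}_u \sim x_v} f_u(\mathbf{x}_u)
    \prod_{v' \in N(u) \setminus \{v\}} \mu_{v' \to u}(x_{v'})
\end{align*}
```

The sum in the second line ranges over all assignments to the variables of factor u that agree with the value x_v.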

Example messages
[Figure: factor graph for variables A, B, C with factor f1 (scope AB) = P(A) P(B|A) and factor f2 (scope BC) = P(C|B)]

Belief propagation on polytrees
• Belief propagation (aka sum-product) is exact for polytree Bayesian networks
• Factor graph of a polytree is a tree
• Choose one node as root
• Send messages from leaves to root, and from root to leaves
• After convergence: the marginal of each variable is proportional to the product of its incoming messages
• Thus: immediately have correct values for all marginals!
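A small sketch of the two passes on the chain from the "Example messages" slide, with B as root. The factors f1 = P(A) P(B|A) and f2 = P(C|B) follow that slide. The binary CPT values are hypothetical.

```python
# Hypothetical CPTs for binary A -> B -> C (numbers are made up).
p_a = [0.7, 0.3]                          # P(A)
p_b_given_a = [[0.8, 0.2], [0.1, 0.9]]    # P(B | A), rows indexed by a
p_c_given_b = [[0.9, 0.1], [0.2, 0.8]]    # P(C | B), rows indexed by b

# Factors as on the slide: f1(A, B) = P(A) P(B|A), f2(B, C) = P(C|B).
f1 = [[p_a[a] * p_b_given_a[a][b] for b in (0, 1)] for a in (0, 1)]
f2 = p_c_given_b

def normalize(v):
    total = sum(v)
    return [x / total for x in v]

# Pass 1: leaves (A, C) toward the root B. Leaf variables send all-ones messages.
msg_a_to_f1 = [1.0, 1.0]
msg_c_to_f2 = [1.0, 1.0]
msg_f1_to_b = [sum(f1[a][b] * msg_a_to_f1[a] for a in (0, 1)) for b in (0, 1)]
msg_f2_to_b = [sum(f2[b][c] * msg_c_to_f2[c] for c in (0, 1)) for b in (0, 1)]

# Pass 2: root B back toward the leaves. B forwards what it got from the other side.
msg_b_to_f1 = msg_f2_to_b
msg_b_to_f2 = msg_f1_to_b
msg_f1_to_a = [sum(f1[a][b] * msg_b_to_f1[b] for b in (0, 1)) for a in (0, 1)]
msg_f2_to_c = [sum(f2[b][c] * msg_b_to_f2[b] for b in (0, 1)) for c in (0, 1)]

# Marginals: normalized product of incoming messages at each variable.
marginal_a = normalize(msg_f1_to_a)
marginal_b = normalize([msg_f1_to_b[b] * msg_f2_to_b[b] for b in (0, 1)])
marginal_c = normalize(msg_f2_to_c)
```

All three marginals come out of one upward and one downward pass, which is the reuse that repeated variable elimination lacks.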

What if we have loops?
• Can still apply belief propagation even if we have loops
  • Just run it, close your eyes and hope for the best!
• Use the resulting beliefs as an approximation of the marginals
  • In general, not guaranteed to converge…
  • Even if it converges, may converge to incorrect marginals…
• However, in practice often still useful!
  • E.g., turbo-codes, etc.
• "Loopy belief propagation"
[Figure: Bayesian network with nodes C, D, I, G, S, L]

Behavior of Loopy BP
[Figure: plot of P(X_1 = 1) against iteration #, comparing the BP estimate with the true posterior, for a network over X_1, X_2, X_3, X_4]
• Loopy BP multiplies the same factors multiple times
→ BP often overconfident
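The overconfidence effect can be reproduced on a toy model. The sketch below swaps in a simpler setting than the slide's network: a pairwise model on a three-node cycle with made-up potentials, run with synchronous loopy BP and compared against brute-force marginals. All names and numbers are assumptions for illustration.

```python
from itertools import product

# Toy pairwise model on a 3-cycle (hypothetical potentials).
edges = [(0, 1), (1, 2), (2, 0)]
phi = [[0.3, 0.7], [0.5, 0.5], [0.5, 0.5]]   # unary potentials; only node 0 has evidence

def psi(a, b):
    """Attractive pairwise potential: neighbors prefer equal values."""
    return 2.0 if a == b else 1.0

def neighbors(i):
    return [b if a == i else a for a, b in edges if i in (a, b)]

def normalize(v):
    total = sum(v)
    return [x / total for x in v]

def loopy_bp(iterations=50):
    """Synchronous sum-product updates; returns one belief vector per node."""
    msgs = {(i, j): [0.5, 0.5] for i in range(3) for j in neighbors(i)}
    for _ in range(iterations):
        new = {}
        for (i, j) in msgs:
            out = []
            for xj in (0, 1):
                total = 0.0
                for xi in (0, 1):
                    incoming = 1.0
                    for k in neighbors(i):
                        if k != j:
                            incoming *= msgs[(k, i)][xi]
                    total += phi[i][xi] * psi(xi, xj) * incoming
                out.append(total)
            new[(i, j)] = normalize(out)
        msgs = new
    beliefs = []
    for i in range(3):
        b = list(phi[i])
        for k in neighbors(i):
            b = [b[x] * msgs[(k, i)][x] for x in (0, 1)]
        beliefs.append(normalize(b))
    return beliefs

def exact_marginals():
    """Brute-force marginals by enumerating all 8 joint assignments."""
    marg = [[0.0, 0.0] for _ in range(3)]
    for x in product((0, 1), repeat=3):
        w = 1.0
        for i in range(3):
            w *= phi[i][x[i]]
        for a, b in edges:
            w *= psi(x[a], x[b])
        for i in range(3):
            marg[i][x[i]] += w
    return [normalize(m) for m in marg]

bp = loopy_bp()
exact = exact_marginals()
```

In this toy model node 0's own evidence travels around the cycle and returns to it through both neighbors, so `bp[0][1]` ends up above the exact value `exact[0][1]`.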

Does Loopy BP always converge?
• No! Can oscillate!
• Typically, the more "deterministic" the potentials, the more severe the oscillation
[Figure: graphs from K. Murphy, UAI '99]

What about MPE queries?
• E.g.: What's the most likely assignment to the unobserved variables, given the observed ones?
[Figure: Bayesian network with nodes E, B, A, J, M]
• Use max-product (same as sum-product/BP, but with max instead of sums!)

Max-product message passing on factor graphs
• Messages from nodes to factors
• Messages from factors to nodes
[Figure: factor graph with factors f1 (CD), f2 (DIG), f3 (IGS), f4 (SL) and variables C, D, G, I, S, L]
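A minimal max-product sketch on a binary chain A → B → C with C observed, using hypothetical CPT values. It replaces each sum of sum-product with a max and keeps the maximizing argument so the MPE assignment can be read off by backtracking.

```python
# Hypothetical CPTs for binary A -> B -> C (numbers are made up).
p_a = [0.7, 0.3]
p_b_given_a = [[0.8, 0.2], [0.1, 0.9]]
p_c_given_b = [[0.9, 0.1], [0.2, 0.8]]

f1 = [[p_a[a] * p_b_given_a[a][b] for b in (0, 1)] for a in (0, 1)]  # f1(A, B)
f2 = p_c_given_b                                                      # f2(B, C)
c_obs = 1                                                             # evidence C = 1

# Message f1 -> B: maximize over a (instead of summing), remembering the argmax.
msg_f1_to_b = [max(f1[a][b] for a in (0, 1)) for b in (0, 1)]
best_a_given_b = [max((0, 1), key=lambda a, b=b: f1[a][b]) for b in (0, 1)]

# At the root B, combine the message with f2 restricted to the observed C.
score_b = [msg_f1_to_b[b] * f2[b][c_obs] for b in (0, 1)]
b_star = max((0, 1), key=lambda b: score_b[b])

# Backtrack to recover the maximizing value of A.
a_star = best_a_given_b[b_star]
mpe_value = score_b[b_star]   # equals max over a, b of P(a, b, C = c_obs)
```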

Sampling based inference
• So far: deterministic inference techniques
  • Variable elimination
  • (Loopy) belief propagation
• Will now introduce stochastic approximations
  • Algorithms that "randomize" to compute expectations
  • In contrast to the deterministic methods, guaranteed to converge to the right answer (if you wait long enough)
  • More exact, but slower than deterministic variants

Computing expectations
• Often, we're not necessarily interested in computing marginal distributions, but certain expectations:
  • Moments (mean, variance, …)
  • Event probabilities

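Sample approximations of expectations
• x_1, …, x_N independent samples from RV X
• Law of large numbers: $\mathbb{E}_P[f(X)] = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} f(x_i)$
• Here, the convergence is with probability 1 (almost sure convergence)
• Finite samples: $\mathbb{E}_P[f(X)] \approx \frac{1}{N} \sum_{i=1}^{N} f(x_i)$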
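As a quick illustration, the finite-sample average can estimate both a moment and an event probability. The fair die, the helper name `monte_carlo_expectation`, and the fixed seed are choices made for this sketch only.

```python
import random

def monte_carlo_expectation(f, sampler, n, rng):
    """Estimate E[f(X)] by averaging f over n independent samples of X."""
    return sum(f(sampler(rng)) for _ in range(n)) / n

def roll_die(rng):
    """Hypothetical example distribution: a fair six-sided die."""
    return rng.randint(1, 6)

rng = random.Random(0)  # fixed seed so the run is reproducible

# A moment: the mean E[X] (true value 3.5).
mean_estimate = monte_carlo_expectation(lambda x: x, roll_die, 100_000, rng)

# An event probability, written as the expectation of an indicator:
# P(X >= 5) = E[1{X >= 5}] (true value 1/3).
prob_estimate = monte_carlo_expectation(
    lambda x: 1.0 if x >= 5 else 0.0, roll_die, 100_000, rng
)
```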

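How many samples do we need?
• Hoeffding inequality: Suppose f is bounded in [0, C]. Then $P\left( \left| \mathbb{E}_P[f(X)] - \frac{1}{N} \sum_{i=1}^{N} f(x_i) \right| > \varepsilon \right) \le 2 \exp\left( -\frac{2 N \varepsilon^2}{C^2} \right)$
• Thus, probability of error decreases exponentially in N!
• Need to be able to draw samples from P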
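Solving a bound of the form 2 exp(−2Nε²/C²) ≤ δ for N gives a sufficient sample size, N ≥ C² ln(2/δ) / (2ε²). The sketch below assumes that two-sided form of Hoeffding's inequality. The function name and the example tolerances are hypothetical.

```python
import math

def required_samples(epsilon, delta, c=1.0):
    """Smallest N with 2 * exp(-2 * N * epsilon**2 / c**2) <= delta."""
    return math.ceil(c ** 2 * math.log(2.0 / delta) / (2.0 * epsilon ** 2))

# Example: error at most 0.01 with probability at least 95%, f bounded in [0, 1].
n_needed = required_samples(epsilon=0.01, delta=0.05, c=1.0)
```

Halving ε quadruples the required N, while shrinking δ costs only logarithmically.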

Sampling from a Bernoulli distribution
• Most random number generators produce (approximately) uniformly distributed random numbers
• How can we draw samples from X ~ Bernoulli(p)?
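One common answer, sketched here with Python's standard `random` module: draw u uniformly from [0, 1) and return 1 exactly when u < p, which happens with probability p. The function name and seed are illustrative.

```python
import random

def sample_bernoulli(p, rng):
    """Return 1 with probability p and 0 otherwise, from one uniform draw."""
    return 1 if rng.random() < p else 0

rng = random.Random(0)  # fixed seed so the run is reproducible
draws = [sample_bernoulli(0.3, rng) for _ in range(100_000)]
empirical_p = sum(draws) / len(draws)
```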

Sampling from a Multinomial
• X ~ Mult([µ_1, …, µ_k]) where µ_i = P(X = i); ∑_i µ_i = 1
[Figure: the interval [0, 1] partitioned into consecutive segments of lengths µ_1, µ_2, µ_3, …, µ_k]
• Function g: [0,1] → {1, …, k} assigns state g(x) to each x
• Draw sample x from the uniform distribution on [0,1]
• Return g(x)
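A possible implementation of g: accumulate the µ_i into segment boundaries and locate the uniform draw with a binary search. States are numbered 1..k as on the slide. The probabilities used in the example are made up.

```python
import bisect
import random
from itertools import accumulate

def g(u, mu):
    """Map u in [0, 1] to the state whose segment of the unit interval contains u."""
    boundaries = list(accumulate(mu))           # right end of each segment
    idx = bisect.bisect_right(boundaries, u)    # first segment whose right end exceeds u
    return min(idx, len(mu) - 1) + 1            # clamp guards against rounding at u near 1

def sample_multinomial(mu, rng):
    """Draw one state from Mult(mu) using a single uniform draw."""
    return g(rng.random(), mu)

mu = [0.2, 0.5, 0.3]        # hypothetical probabilities for states 1, 2, 3
rng = random.Random(0)      # fixed seed so the run is reproducible
draws = [sample_multinomial(mu, rng) for _ in range(100_000)]
frequencies = [draws.count(state) / len(draws) for state in (1, 2, 3)]
```

Recomputing the boundaries on every call keeps the sketch short. For repeated sampling they would normally be computed once.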

Forward sampling from a BN
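Only the slide title survives here, so the following is a generic sketch of forward (ancestral) sampling rather than the lecture's own example: visit the variables in a topological order and sample each one from its CPD given the parent values already drawn. The chain A → B → C and its CPT numbers are hypothetical.

```python
import random

# Hypothetical binary network A -> B -> C.
parents = {"A": (), "B": ("A",), "C": ("B",)}

# cpt[var][parent_values] = P(var = 1 | parents = parent_values); numbers are made up.
cpt = {
    "A": {(): 0.3},
    "B": {(0,): 0.2, (1,): 0.9},
    "C": {(0,): 0.1, (1,): 0.8},
}

topological_order = ["A", "B", "C"]   # every parent appears before its children

def forward_sample(rng):
    """Draw one joint sample by sampling each variable given its sampled parents."""
    sample = {}
    for var in topological_order:
        parent_values = tuple(sample[p] for p in parents[var])
        sample[var] = 1 if rng.random() < cpt[var][parent_values] else 0
    return sample

rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [forward_sample(rng) for _ in range(100_000)]

# Sample-based estimate of the marginal P(C = 1); the exact value for these CPTs is 0.387.
estimate_c = sum(s["C"] for s in samples) / len(samples)
```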
