Bayesian Networks
Philipp Koehn 29 October 2015
Outline
– Bayesian networks
– Parameterized distributions
– Exact inference
– Approximate inference
Bayesian networks: a simple, graphical notation for conditional independence assertions, and hence for compact specification of full joint distributions.

Syntax:
– a set of nodes, one per variable
– a directed, acyclic graph (link ≈ “directly influences”)
– a conditional distribution for each node given its parents: P(Xi ∣ Parents(Xi))

In the simplest case, a conditional distribution is represented as a conditional probability table (CPT) giving the distribution over Xi for each combination of parent values (see the sketch below).
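As a concrete illustration, here is a minimal Python sketch of one possible representation of such a network (the burglary network from the following slides), with the DAG as a parent list per node and CPTs as plain dicts. The variable names and numbers follow the standard example; the representation itself is just one convenient choice, not the slides' own code.

parents = {
    "Burglary": [], "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"],
}

cpt = {  # P(variable = True | parent values)
    "Burglary":   {(): 0.001},
    "Earthquake": {(): 0.002},
    "Alarm": {(True, True): 0.95, (True, False): 0.94,
              (False, True): 0.29, (False, False): 0.001},
    "JohnCalls": {(True,): 0.90, (False,): 0.05},
    "MaryCalls": {(True,): 0.70, (False,): 0.01},
}

def prob(var, value, event):
    # P(var = value | parents(var)), read off the CPT for a full assignment
    p_true = cpt[var][tuple(event[p] for p in parents[var])]
    return p_true if value else 1.0 - p_true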
Example: I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar?

Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls

Network topology reflects “causal” knowledge:
– A burglar can set the alarm off
– An earthquake can set the alarm off
– The alarm can cause Mary to call
– The alarm can cause John to call
Compactness: a CPT for Boolean Xi with k Boolean parents has 2^k rows for the combinations of parent values. Each row requires one number p for Xi = true (the number for Xi = false is just 1 − p). If each variable has no more than k parents, the complete network requires O(n ⋅ 2^k) numbers, i.e., it grows linearly with n, vs. O(2^n) for the full joint distribution. For the burglary net: 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 − 1 = 31).
Global semantics defines the full joint distribution as the product of the local conditional distributions:

P(x1, …, xn) = ∏_{i=1}^{n} P(xi ∣ parents(Xi))
E.g., P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j ∣ a) P(m ∣ a) P(a ∣ ¬b, ¬e) P(¬b) P(¬e) = 0.9 × 0.7 × 0.001 × 0.999 × 0.998 ≈ 0.00063
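Continuing the dict-based sketch above, the global semantics translates directly into code. This is just an illustrative fragment reusing the hypothetical parents/cpt/prob definitions from the earlier sketch:

def joint_probability(event):
    # P(x1, ..., xn) = product over all variables of P(xi | parents(Xi))
    result = 1.0
    for var in parents:  # parents, cpt, prob as defined in the earlier sketch
        result *= prob(var, event[var], event)
    return result

# The slide's example: P(j ∧ m ∧ a ∧ ¬b ∧ ¬e)
event = {"Burglary": False, "Earthquake": False,
         "Alarm": True, "JohnCalls": True, "MaryCalls": True}
print(joint_probability(event))  # ≈ 0.00063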
Local semantics: each node is conditionally independent of all other nodes given its Markov blanket: parents + children + children's parents.
Constructing Bayesian networks: we need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics.

1. Choose an ordering of variables X1, …, Xn
2. For i = 1 to n:
   add Xi to the network
   select parents from X1, …, Xi−1 such that P(Xi ∣ Parents(Xi)) = P(Xi ∣ X1, …, Xi−1)

This choice of parents guarantees the global semantics:

P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi ∣ X1, …, Xi−1)   (chain rule)
             = ∏_{i=1}^{n} P(Xi ∣ Parents(Xi))   (by construction)
Example: suppose we choose the ordering MaryCalls, JohnCalls, Alarm, Burglary, Earthquake.

P(J ∣ M) = P(J)? No
P(A ∣ J, M) = P(A ∣ J)? P(A ∣ J, M) = P(A)? No
P(B ∣ A, J, M) = P(B ∣ A)? Yes
P(B ∣ A, J, M) = P(B)? No
P(E ∣ B, A, J, M) = P(E ∣ A)? No
P(E ∣ B, A, J, M) = P(E ∣ A, B)? Yes
Compact conditional distributions: a CPT grows exponentially with the number of parents, and becomes infinite with a continuous-valued parent or child. Solution: canonical distributions that are defined compactly.

Deterministic nodes are the simplest case: X = f(Parents(X)) for some function f.

E.g., Boolean functions: NorthAmerican ⇔ Canadian ∨ US ∨ Mexican

E.g., numerical relationships among continuous variables: ∂Level/∂t = inflow + precipitation − outflow − evaporation
Noisy-OR distributions model multiple noninteracting causes:
– parents U1, …, Uk include all causes (can add a leak node)
– independent failure probability qi for each cause alone

⇒ P(X ∣ U1, …, Uj, ¬Uj+1, …, ¬Uk) = 1 − ∏_{i=1}^{j} qi

Cold  Flu  Malaria  P(Fever)  P(¬Fever)
F     F    F        0.0       1.0
F     F    T        0.9       0.1
F     T    F        0.8       0.2
F     T    T        0.98      0.02 = 0.2 × 0.1
T     F    F        0.4       0.6
T     F    T        0.94      0.06 = 0.6 × 0.1
T     T    F        0.88      0.12 = 0.6 × 0.2
T     T    T        0.988     0.012 = 0.6 × 0.2 × 0.1

The number of parameters is linear in the number of causes.
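A small sketch of the noisy-OR computation behind this table, using the per-cause failure probabilities it implies (q_cold = 0.6, q_flu = 0.2, q_malaria = 0.1):

# Noisy-OR: P(fever | active causes) = 1 - product of the causes' failure probabilities
q = {"Cold": 0.6, "Flu": 0.2, "Malaria": 0.1}  # P(no fever | only this cause present)

def noisy_or_fever(active_causes):
    p_no_fever = 1.0
    for cause in active_causes:
        p_no_fever *= q[cause]
    return 1.0 - p_no_fever

print(noisy_or_fever({"Flu"}))                     # 0.8, matching row F,T,F
print(noisy_or_fever({"Cold", "Flu", "Malaria"}))  # 0.988, matching row T,T,T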
Hybrid (discrete + continuous) networks, e.g., discrete (Subsidy?, Buys?) and continuous (Harvest, Cost) variables.

Option 1: discretization (possibly large errors, large CPTs)
Option 2: finitely parameterized canonical families

1) Continuous variable, discrete + continuous parents (e.g., Cost)
2) Discrete variable, continuous parents (e.g., Buys?)
Continuous child variables need one conditional density function for the child variable given continuous parents, for each possible assignment to the discrete parents.

Most common is the linear Gaussian model, e.g.:

P(Cost = c ∣ Harvest = h, Subsidy? = true) = N(a_t h + b_t, σ_t)(c)
    = (1 / (σ_t √(2π))) exp(−½ ((c − (a_t h + b_t)) / σ_t)²)

Mean Cost varies linearly with Harvest; variance is fixed. Linear variation is unreasonable over the full range, but works OK if the likely range of Harvest is narrow.
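A minimal sketch of this linear Gaussian density using only the standard library; the parameter values a_t, b_t, σ_t below are invented for illustration, not taken from the slides:

import math

def cost_density(c, h, a_t=-0.5, b_t=10.0, sigma_t=1.0):
    # N(a_t*h + b_t, sigma_t)(c): mean varies linearly with Harvest, fixed variance
    mean = a_t * h + b_t
    z = (c - mean) / sigma_t
    return math.exp(-0.5 * z * z) / (sigma_t * math.sqrt(2.0 * math.pi))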
An all-continuous network with linear Gaussian distributions has a joint distribution that is a multivariate Gaussian. A discrete + continuous linear Gaussian network is a conditional Gaussian network, i.e., a multivariate Gaussian over all continuous variables for each combination of discrete variable values.
Discrete variable with continuous parents: the probability of Buys? given Cost should be a “soft” threshold. The probit distribution uses the integral of the standard normal density:

Φ(x) = ∫_{−∞}^{x} N(0,1)(t) dt

P(Buys? = true ∣ Cost = c) = Φ((−c + µ)/σ)
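The probit can be evaluated with math.erf, since Φ(x) = (1 + erf(x/√2))/2. The values of µ and σ below are illustrative assumptions:

import math

def probit_buys(c, mu=5.0, sigma=1.0):
    # P(Buys? = true | Cost = c) = Phi((-c + mu) / sigma)
    x = (-c + mu) / sigma
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))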
The sigmoid (or logit) distribution is an alternative soft threshold, with a similar shape but longer tails:

P(Buys? = true ∣ Cost = c) = 1 / (1 + exp(−2(−c + µ)/σ))
Inference tasks:
– Simple queries: compute the posterior marginal P(Xi ∣ E = e), e.g., P(NoGas ∣ Gauge=empty, Lights=on, Starts=false)
– Optimal decisions: decision networks include utility information; probabilistic inference is required for P(outcome ∣ action, evidence)
Inference by enumeration: a slightly intelligent way to sum out variables from the joint without actually constructing its explicit representation.

A simple query on the burglary network:

P(B ∣ j, m) = P(B, j, m)/P(j, m) = α P(B, j, m) = α ∑e ∑a P(B, e, a, j, m)

Rewrite full joint entries using products of CPT entries:

P(B ∣ j, m) = α ∑e ∑a P(B) P(e) P(a ∣ B, e) P(j ∣ a) P(m ∣ a)
            = α P(B) ∑e P(e) ∑a P(a ∣ B, e) P(j ∣ a) P(m ∣ a)
function ENUMERATION-ASK(X, e, bn) returns a distribution over X
  inputs: X, the query variable
          e, observed values for variables E
          bn, a Bayesian network with variables {X} ∪ E ∪ Y
  Q(X) ← a distribution over X, initially empty
  for each value xi of X do
    extend e with value xi for X
    Q(xi) ← ENUMERATE-ALL(VARS[bn], e)
  return NORMALIZE(Q(X))

function ENUMERATE-ALL(vars, e) returns a real number
  if EMPTY?(vars) then return 1.0
  Y ← FIRST(vars)
  if Y has value y in e
    then return P(y ∣ Pa(Y)) × ENUMERATE-ALL(REST(vars), e)
    else return ∑y P(y ∣ Pa(Y)) × ENUMERATE-ALL(REST(vars), ey)
      where ey is e extended with Y = y
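A direct, runnable transcription of this algorithm for the Boolean burglary network, reusing the parents/cpt/prob sketch from earlier; TOPO is a topological ordering of the variables:

TOPO = ["Burglary", "Earthquake", "Alarm", "JohnCalls", "MaryCalls"]

def enumeration_ask(X, e):
    # distribution over the query variable X given evidence e
    Q = {x: enumerate_all(TOPO, {**e, X: x}) for x in (True, False)}
    total = sum(Q.values())
    return {x: p / total for x, p in Q.items()}  # NORMALIZE

def enumerate_all(variables, e):
    if not variables:
        return 1.0
    Y, rest = variables[0], variables[1:]
    if Y in e:  # evidence: multiply in its CPT entry and recurse
        return prob(Y, e[Y], e) * enumerate_all(rest, e)
    # hidden variable: sum over both of its values
    return sum(prob(Y, y, {**e, Y: y}) * enumerate_all(rest, {**e, Y: y})
               for y in (True, False))

print(enumeration_ask("Burglary", {"JohnCalls": True, "MaryCalls": True}))
# ≈ {True: 0.284, False: 0.716}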
Enumeration is inefficient due to repeated computation: e.g., it computes P(j ∣ a) P(m ∣ a) for each value of e.
Variable elimination: carry out summations right-to-left, storing intermediate results (factors) to avoid recomputation.

P(B ∣ j, m) = α P(B) ∑e P(e) ∑a P(a ∣ B, e) P(j ∣ a) P(m ∣ a)
              (factors for B, E, A, J, M respectively)
= α P(B) ∑e P(e) ∑a P(a ∣ B, e) P(j ∣ a) fM(a)
= α P(B) ∑e P(e) ∑a P(a ∣ B, e) fJ(a) fM(a)
= α P(B) ∑e P(e) ∑a fA(a, b, e) fJ(a) fM(a)
= α P(B) ∑e P(e) fĀJM(b, e)   (sum out A)
= α P(B) fĒĀJM(b)             (sum out E)
= α fB(b) × fĒĀJM(b)
Basic operations:

Summing out a variable from a product of factors: move any constant factors outside the summation, then add up submatrices in the pointwise product of the remaining factors:

∑x f1 × ⋯ × fk = f1 × ⋯ × fi × ∑x fi+1 × ⋯ × fk = f1 × ⋯ × fi × fX̄

assuming f1, …, fi do not depend on X.

Pointwise product of factors f1 and f2:

f1(x1, …, xj, y1, …, yk) × f2(y1, …, yk, z1, …, zl) = f(x1, …, xj, y1, …, yk, z1, …, zl)
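A sketch of these two operations for Boolean factors, each stored as a pair (variable list, table mapping value tuples to numbers); this representation is an assumption for illustration:

from itertools import product

def pointwise_product(f1, f2):
    # f1(X..., Y...) * f2(Y..., Z...) = f(X..., Y..., Z...)
    vars1, t1 = f1
    vars2, t2 = f2
    out_vars = vars1 + [v for v in vars2 if v not in vars1]
    table = {}
    for values in product((True, False), repeat=len(out_vars)):
        assign = dict(zip(out_vars, values))
        table[values] = (t1[tuple(assign[v] for v in vars1)] *
                         t2[tuple(assign[v] for v in vars2)])
    return out_vars, table

def sum_out(var, factor):
    # add up the entries of the factor that differ only in var's value
    vars_, table = factor
    i = vars_.index(var)
    out = {}
    for values, p in table.items():
        key = values[:i] + values[i + 1:]
        out[key] = out.get(key, 0.0) + p
    return vars_[:i] + vars_[i + 1:], out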
function ELIMINATION-ASK(X, e, bn) returns a distribution over X
  inputs: X, the query variable
          e, evidence specified as an event
          bn, a belief network specifying joint distribution P(X1, …, Xn)
  factors ← [ ]; vars ← REVERSE(VARS[bn])
  for each var in vars do
    factors ← [MAKE-FACTOR(var, e) ∣ factors]
    if var is a hidden variable then factors ← SUM-OUT(var, factors)
  return NORMALIZE(POINTWISE-PRODUCT(factors))
Irrelevant variables: consider the query P(JohnCalls ∣ Burglary = true):

P(J ∣ b) = α P(b) ∑e P(e) ∑a P(a ∣ b, e) P(J ∣ a) ∑m P(m ∣ a)

The sum over m is identically 1; M is irrelevant to the query.

Theorem: Y is irrelevant unless Y ∈ Ancestors({X} ∪ E). Here:
– X = JohnCalls, E = {Burglary}
– Ancestors({X} ∪ E) = {Alarm, Earthquake} ⇒ MaryCalls is irrelevant
E.g., for the query P(JohnCalls ∣ Alarm = true), both Burglary and Earthquake are irrelevant.
Complexity of exact inference:

Singly connected networks (or polytrees):
– any two nodes are connected by at most one (undirected) path
– time and space cost of variable elimination are O(d^k n)

Multiply connected networks:
– can reduce 3SAT to exact inference ⇒ NP-hard
– equivalent to counting 3SAT models ⇒ #P-complete
Inference by stochastic simulation. Basic idea:
– Draw N samples from a sampling distribution S
– Compute an approximate posterior probability P̂
– Show this converges to the true probability P

Methods:
– Sampling from an empty network
– Rejection sampling: reject samples disagreeing with evidence
– Likelihood weighting: use evidence to weight samples
– Markov chain Monte Carlo (MCMC): sample from a stochastic process whose stationary distribution is the true posterior
function PRIOR-SAMPLE(bn) returns an event sampled from bn
  inputs: bn, a belief network specifying joint distribution P(X1, …, Xn)
  x ← an event with n elements
  for i = 1 to n do
    xi ← a random sample from P(Xi ∣ parents(Xi)) given the values of Parents(Xi) in x
  return x
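The same procedure as runnable Python for the sprinkler network used in the following examples. The CPT numbers are the standard values for this network (the weights 0.1 and 0.99 appearing in the likelihood weighting example below are consistent with them):

import random

parents = {"Cloudy": [], "Sprinkler": ["Cloudy"], "Rain": ["Cloudy"],
           "WetGrass": ["Sprinkler", "Rain"]}
cpt = {  # P(variable = True | parent values); note: shadows the burglary dicts above
    "Cloudy": {(): 0.5},
    "Sprinkler": {(True,): 0.1, (False,): 0.5},
    "Rain": {(True,): 0.8, (False,): 0.2},
    "WetGrass": {(True, True): 0.99, (True, False): 0.90,
                 (False, True): 0.90, (False, False): 0.00},
}
TOPO = ["Cloudy", "Sprinkler", "Rain", "WetGrass"]

def prior_sample():
    # sample each variable in topological order from P(Xi | parents(Xi))
    x = {}
    for var in TOPO:
        p_true = cpt[var][tuple(x[p] for p in parents[var])]
        x[var] = random.random() < p_true
    return x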
The probability that PRIOR-SAMPLE generates a particular event is

S_PS(x1, …, xn) = ∏_{i=1}^{n} P(xi ∣ parents(Xi)) = P(x1, …, xn)

i.e., the true prior probability.

Let N_PS(x1, …, xn) be the number of samples generated for the event x1, …, xn. Then

lim_{N→∞} P̂(x1, …, xn) = lim_{N→∞} N_PS(x1, …, xn)/N = S_PS(x1, …, xn) = P(x1, …, xn)

That is, estimates derived from PRIOR-SAMPLE are consistent. Shorthand: P̂(x1, …, xn) ≈ P(x1, …, xn).
Rejection sampling: P̂(X ∣ e) is estimated from samples agreeing with e.

function REJECTION-SAMPLING(X, e, bn, N) returns an estimate of P(X ∣ e)
  local variables: N, a vector of counts over X, initially zero
  for j = 1 to N do
    x ← PRIOR-SAMPLE(bn)
    if x is consistent with e then
      N[x] ← N[x] + 1 where x is the value of X in x
  return NORMALIZE(N[X])

E.g., estimate P(Rain ∣ Sprinkler = true) using 100 samples:
27 samples have Sprinkler = true; of these, 8 have Rain = true and 19 have Rain = false.

P̂(Rain ∣ Sprinkler = true) = NORMALIZE(⟨8, 19⟩) = ⟨0.296, 0.704⟩
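A sketch of rejection sampling built on prior_sample from the sprinkler sketch above:

def rejection_sampling(X, e, N=10000):
    counts = {True: 0, False: 0}
    for _ in range(N):
        x = prior_sample()
        if all(x[var] == val for var, val in e.items()):  # consistent with e?
            counts[x[X]] += 1
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}

print(rejection_sampling("Rain", {"Sprinkler": True}))
# approaches <0.3, 0.7> as N grows (cf. the slide's <0.296, 0.704>)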
P̂(X ∣ e) = α N_PS(X, e)          (algorithm defn.)
         = N_PS(X, e)/N_PS(e)    (normalized by N_PS(e))
         ≈ P(X, e)/P(e)          (property of PRIOR-SAMPLE)
         = P(X ∣ e)              (defn. of conditional probability)

Hence rejection sampling returns consistent posterior estimates. Problem: it is hopelessly expensive if P(e) is small.
Likelihood weighting idea: fix the evidence variables, sample only the nonevidence variables, and weight each sample by the likelihood it accords the evidence.

function LIKELIHOOD-WEIGHTING(X, e, bn, N) returns an estimate of P(X ∣ e)
  local variables: W, a vector of weighted counts over X, initially zero
  for j = 1 to N do
    x, w ← WEIGHTED-SAMPLE(bn)
    W[x] ← W[x] + w where x is the value of X in x
  return NORMALIZE(W[X])

function WEIGHTED-SAMPLE(bn, e) returns an event and a weight
  x ← an event with n elements; w ← 1
  for i = 1 to n do
    if Xi has a value xi in e
      then w ← w × P(Xi = xi ∣ parents(Xi))
      else xi ← a random sample from P(Xi ∣ parents(Xi))
  return x, w
Likelihood weighting example: query P(Rain ∣ Sprinkler = true, WetGrass = true). One run of WEIGHTED-SAMPLE proceeds as follows:

– initially w = 1.0
– sample Cloudy from P(Cloudy) = ⟨0.5, 0.5⟩; say Cloudy = true
– Sprinkler is evidence: w ← w × P(Sprinkler = true ∣ Cloudy = true), so w = 1.0 × 0.1
– sample Rain from P(Rain ∣ Cloudy = true) = ⟨0.8, 0.2⟩; say Rain = true
– WetGrass is evidence: w ← w × P(WetGrass = true ∣ Sprinkler = true, Rain = true), so w = 1.0 × 0.1 × 0.99 = 0.099
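The walk-through above, expressed as code over the same sprinkler sketch (a minimal illustration, not the slides' own code):

def weighted_sample(e):
    x, w = dict(e), 1.0
    for var in TOPO:
        p_true = cpt[var][tuple(x[p] for p in parents[var])]
        if var in e:   # evidence: keep its value, multiply its likelihood into w
            w *= p_true if e[var] else 1.0 - p_true
        else:          # nonevidence: sample as in PRIOR-SAMPLE
            x[var] = random.random() < p_true
    return x, w

def likelihood_weighting(X, e, N=10000):
    W = {True: 0.0, False: 0.0}
    for _ in range(N):
        x, w = weighted_sample(e)
        W[x[X]] += w
    total = sum(W.values())
    return {v: wv / total for v, wv in W.items()}

print(likelihood_weighting("Rain", {"Sprinkler": True, "WetGrass": True}))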
The sampling probability for WEIGHTED-SAMPLE is

S_WS(z, e) = ∏_{i=1}^{l} P(zi ∣ parents(Zi))

Note: it pays attention to evidence only in the ancestors of the sampled variables, so it lies somewhere “in between” the prior and the posterior distribution.

The weight for a given sample z, e is

w(z, e) = ∏_{i=1}^{m} P(ei ∣ parents(Ei))

The weighted sampling probability is

S_WS(z, e) w(z, e) = ∏_{i=1}^{l} P(zi ∣ parents(Zi)) ∏_{i=1}^{m} P(ei ∣ parents(Ei))
                   = P(z, e)   (by standard global semantics of network)

Hence likelihood weighting returns consistent estimates, but performance still degrades with many evidence variables, because a few samples have nearly all the total weight.
Approximate inference using MCMC: sample each variable in turn, keeping the evidence fixed.

function MCMC-ASK(X, e, bn, N) returns an estimate of P(X ∣ e)
  local variables: N[X], a vector of counts over X, initially zero
                   Z, the nonevidence variables in bn
                   x, the current state of the network, initially copied from e
  initialize x with random values for the variables in Z
  for j = 1 to N do
    for each Zi in Z do
      sample the value of Zi in x from P(Zi ∣ mb(Zi)) given the values of MB(Zi) in x
    N[x] ← N[x] + 1 where x is the value of X in x
  return NORMALIZE(N[X])
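A sketch of this Gibbs sampler over the sprinkler network defined earlier; P(Zi ∣ mb(Zi)) is computed from the Markov blanket formula shown below, with the children table written out by hand:

import random

children = {"Cloudy": ["Sprinkler", "Rain"], "Sprinkler": ["WetGrass"],
            "Rain": ["WetGrass"], "WetGrass": []}

def p_cond(var, value, event):
    # P(var = value | parents(var)) from the sprinkler CPTs above
    p_true = cpt[var][tuple(event[p] for p in parents[var])]
    return p_true if value else 1.0 - p_true

def gibbs_ask(X, e, N=20000):
    Z = [v for v in TOPO if v not in e]
    x = dict(e)
    for z in Z:
        x[z] = random.random() < 0.5  # random initial state for nonevidence vars
    counts = {True: 0, False: 0}
    for _ in range(N):
        for z in Z:
            weights = {}
            for v in (True, False):  # score both values against the Markov blanket
                x[z] = v
                w = p_cond(z, v, x)
                for c in children[z]:
                    w *= p_cond(c, x[c], x)
                weights[v] = w
            x[z] = random.random() < weights[True] / (weights[True] + weights[False])
        counts[x[X]] += 1
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}

print(gibbs_ask("Rain", {"Sprinkler": True, "WetGrass": True}))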
Estimate P(Rain ∣ Sprinkler = true, WetGrass = true) by counting the number of times Rain is true and false in the samples. E.g., after visiting 100 states: 31 have Rain = true, 69 have Rain = false.

P̂(Rain ∣ Sprinkler = true, WetGrass = true) = NORMALIZE(⟨31, 69⟩) = ⟨0.31, 0.69⟩

Theorem: the chain approaches its stationary distribution, i.e., the long-run fraction of time spent in each state is exactly proportional to its posterior probability.
Markov blanket sampling: the Markov blanket of Cloudy is Sprinkler and Rain; the Markov blanket of Rain is Cloudy, Sprinkler, and WetGrass.

The probability given the Markov blanket is calculated as follows (up to normalization):

P(x′i ∣ mb(Xi)) = P(x′i ∣ parents(Xi)) ∏_{Zj ∈ Children(Xi)} P(zj ∣ parents(Zj))

Main computational problems:
– difficult to tell if convergence has been achieved
– can be wasteful if the Markov blanket is large: P(Xi ∣ mb(Xi)) won't change much (law of large numbers)
Summary:
– Bayesian networks provide a natural representation for (causally induced) conditional independence
– Canonical distributions (e.g., noisy-OR) give compact CPTs; continuous variables ⇒ parameterized distributions (e.g., linear Gaussian)
– Exact inference by variable elimination:
  – polytime on polytrees, NP-hard on general graphs
  – space = time, very sensitive to topology
– Approximate inference by LW, MCMC:
  – LW does poorly when there is lots of (downstream) evidence
  – LW, MCMC generally insensitive to topology
  – convergence can be very slow with probabilities close to 1 or 0
  – can handle arbitrary combinations of discrete and continuous variables