

SLIDE 1

Learning a Belief Network

If you
◮ know the structure,
◮ have observed all of the variables, and
◮ have no missing data,

you can learn each conditional probability separately.

© D. Poole and A. Mackworth 2010, Artificial Intelligence, Lecture 11.2, Page 1

SLIDE 2

Learning belief network example

Model:

[Figure: network with A and B as the parents of E, and E as the parent of C and D]

Data → Probabilities:

A B C D E
t f t t f
f t t t t
t t f t f
· · ·

P(A), P(B), P(E | A, B), P(C | E), P(D | E)
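The Data → Probabilities step can be sketched in Python as plain counting. The first three rows below come from the slide; the fourth is made up so that the conditioning case has more than one matching example, and the function name `estimate` is my own.

```python
# Fully observed data for the example network (columns A, B, C, D, E).
# Rows 1-3 are from the slide; row 4 is illustrative.
data = [
    (True, False, True, True, False),
    (False, True, True, True, True),
    (True, True, False, True, False),
    (True, False, True, False, True),
]

def estimate(var_idx, parent_idx, parent_vals, data):
    """Maximum-likelihood estimate of P(var = t | parents = parent_vals),
    computed by counting matching examples."""
    matching = [row for row in data
                if all(row[i] == v for i, v in zip(parent_idx, parent_vals))]
    if not matching:
        return None  # no data for this parent assignment
    return sum(row[var_idx] for row in matching) / len(matching)

p_a = estimate(0, (), (), data)               # P(A): no parents -> 0.75
p_e = estimate(4, (0, 1), (True, False), data)  # P(E=t | A=t, B=f) -> 0.5
```

Because each variable's CPT only looks at that variable and its parents, every conditional probability really is learned separately, as the slide says.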



SLIDE 4

Learning conditional probabilities

Each conditional probability distribution can be learned separately. For example:

P(E = t | A = t ∧ B = f) = ((#examples: E = t ∧ A = t ∧ B = f) + c1) / ((#examples: A = t ∧ B = f) + c)

where c1 and c reflect prior (expert) knowledge (c1 ≤ c).

When a node has many parents, there can be little or no data for each probability estimate: use supervised learning to learn a decision tree, a linear classifier, a neural network, or another representation of the conditional probability. A conditional probability doesn't need to be represented as a table!
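The pseudocount formula above is a one-liner; this sketch (function name mine) shows how the prior counts c1 and c take over when there is no data and fade as real counts accumulate:

```python
def pseudo_estimate(n_true, n_total, c1, c):
    """Estimate P(X = t | parents) from n_true successes out of n_total
    matching examples, with pseudocounts c1 and c encoding prior (expert)
    knowledge; requires c1 <= c."""
    assert 0 <= c1 <= c
    return (n_true + c1) / (n_total + c)

# With no data the estimate falls back on the prior c1/c:
print(pseudo_estimate(0, 0, 1, 2))   # 0.5
# With data, the observed counts dominate the prior:
print(pseudo_estimate(8, 10, 1, 2))  # 0.75
```

Note the estimate is never exactly 0 or 1 when c1 and c − c1 are positive, which keeps unseen cases from being ruled impossible.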


SLIDE 5

Unobserved Variables

[Figure: network with hidden variable H; A is the parent of H, and H is the parent of B and C]

What if we had only observed values for A, B, C?

A B C
t f t
f t t
t t f
· · ·


SLIDE 6

EM Algorithm

Augmented Data:

A B C H | Count
t f t t | 0.7
t f t f | 0.3
f t t f | 0.9
f t t t | 0.1
· · ·

Probabilities: P(A), P(H | A), P(B | H), P(C | H)

The M-step goes from the augmented data to the probabilities; the E-step goes from the probabilities back to the expected counts.


SLIDE 7

EM Algorithm

Repeat the following two steps:

E-step: compute the expected counts for the unobserved variables, based on the current probability distribution. This requires probabilistic inference.

M-step: infer the (maximum-likelihood) probabilities from the augmented data. This is the same as in the fully observable case.

Start either with made-up data or made-up probabilities. EM converges to a local maximum.
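A runnable sketch of these two steps for the hidden-variable network of the previous slides (A → H, H → B, H → C, all Boolean). Everything here — the function names, the made-up starting probabilities, and the data — is illustrative; the E-step's inference is easy because conditioning on A, B, C makes P(H | a, b, c) a two-term normalization.

```python
import random

def bern(p, x):
    """P(X = x) for a Boolean X with P(X = t) = p."""
    return p if x else 1.0 - p

def em(data, iters=100, seed=0):
    """EM for the network A -> H -> {B, C}, H unobserved.
    data is a list of (a, b, c) Boolean triples."""
    rng = random.Random(seed)
    # Start with made-up probabilities (the other option: made-up data).
    p_a = 0.5
    p_h = {a: rng.uniform(0.2, 0.8) for a in (False, True)}  # P(H=t | A=a)
    p_b = {h: rng.uniform(0.2, 0.8) for h in (False, True)}  # P(B=t | H=h)
    p_c = {h: rng.uniform(0.2, 0.8) for h in (False, True)}  # P(C=t | H=h)
    n = len(data)
    for _ in range(iters):
        # E-step: expected count w_i = P(H=t | a, b, c) for each example,
        # obtained by probabilistic inference in the current model.
        w = []
        for a, b, c in data:
            jt = p_h[a] * bern(p_b[True], b) * bern(p_c[True], c)
            jf = (1.0 - p_h[a]) * bern(p_b[False], b) * bern(p_c[False], c)
            w.append(jt / (jt + jf))
        # M-step: maximum-likelihood estimates from the augmented
        # (weighted) data, exactly as in the fully observable case.
        p_a = sum(a for a, _, _ in data) / n
        for av in (False, True):
            den = sum(1 for a, _, _ in data if a == av)
            num = sum(wi for wi, (a, _, _) in zip(w, data) if a == av)
            p_h[av] = num / den if den else 0.5
        tot_t = sum(w)
        tot_f = n - tot_t
        p_b[True] = sum(wi for wi, (_, b, _) in zip(w, data) if b) / tot_t
        p_b[False] = sum(1 - wi for wi, (_, b, _) in zip(w, data) if b) / tot_f
        p_c[True] = sum(wi for wi, (_, _, c) in zip(w, data) if c) / tot_t
        p_c[False] = sum(1 - wi for wi, (_, _, c) in zip(w, data) if c) / tot_f
    return p_a, p_h, p_b, p_c

data = [(True, True, True), (True, False, False),
        (False, True, True), (False, False, False)] * 5
p_a_hat, p_h_hat, p_b_hat, p_c_hat = em(data)
```

Different seeds can converge to different local maxima (e.g. the two labelings of H swapped), which is exactly the local-maximum caveat on this slide.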


SLIDE 8

Belief network structure learning (I)

P(model | data) = P(data | model) × P(model) / P(data)

A model here is a belief network. A bigger network can always fit the data better. P(model) lets us encode a preference for smaller networks (e.g., using the description length). You can search over network structures looking for the most likely model.


SLIDE 9

A belief network structure learning algorithm

Search over total orderings of variables. For each total ordering X1, . . . , Xn, use supervised learning to learn P(Xi | X1, . . . , Xi−1). Return the network model found with minimum:

− log P(data | model) − log P(model)

◮ P(data | model) can be obtained by inference.
◮ How to determine − log P(model)?
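A small, assumption-laden sketch of this search for Boolean data: table CPTs with maximum-likelihood parameters stand in for the supervised learner, each variable picks the predecessor subset minimizing its local score, and − log P(model) is taken to be log(|D| + 1) per free parameter (as on the later BIC slide). All function names are mine, and the brute-force enumeration only scales to a handful of variables.

```python
import itertools
import math

def cpt_score(data, child, parents):
    """Penalized score of a table CPT for `child` (0/1 values) given
    `parents`: negative max-likelihood log-likelihood of the data plus
    log(|D|+1) per free parameter (stand-in for -log P(model))."""
    counts = {}
    for row in data:
        key = tuple(row[p] for p in parents)
        counts.setdefault(key, [0, 0])[row[child]] += 1
    neg_ll = 0.0
    for nf, nt in counts.values():
        total = nf + nt
        for k in (nf, nt):
            if k:
                neg_ll -= k * math.log(k / total)
    return neg_ll + 2 ** len(parents) * math.log(len(data) + 1)

def best_parents(data, child, candidates):
    """Pick the subset of candidate predecessors minimizing the local
    score (a crude stand-in for the slide's supervised learning step)."""
    options = (ps for r in range(len(candidates) + 1)
               for ps in itertools.combinations(candidates, r))
    return min(options, key=lambda ps: cpt_score(data, child, ps))

def search_orderings(data, n_vars):
    """For each total ordering, learn each variable from its predecessors
    and sum the scores; return (best score, parents per variable)."""
    best = None
    for order in itertools.permutations(range(n_vars)):
        structure, score = {}, 0.0
        for i, x in enumerate(order):
            ps = best_parents(data, x, order[:i])
            structure[x] = ps
            score += cpt_score(data, x, ps)
        if best is None or score < best[0]:
            best = (score, structure)
    return best

# X1 copies X0; X2 is independent. The search should link X0 and X1.
data = [(0, 0, 1), (1, 1, 0), (0, 0, 0), (1, 1, 1)] * 5
score, structure = search_orderings(data, 3)
```

The penalty term is what stops every variable from simply taking all of its predecessors as parents.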



SLIDE 11

Bayesian Information Criterion (BIC) Score

P(M | D) = P(D | M) × P(M) / P(D)

− log P(M | D) ∝ − log P(D | M) − log P(M)

− log P(D | M) is the negative log likelihood of the model: the number of bits to describe the data in terms of the model. If |D| is the number of data instances, there are |D| + 1 different probabilities to distinguish, and each one can be described in log(|D| + 1) bits. If there are ||M|| independent parameters (||M|| is the dimensionality of the model):

− log P(M | D) ∝ − log P(D | M) + ||M|| log(|D| + 1)

(This is approximately the (negated) BIC score.)
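To make the arithmetic concrete, here is the final expression as a function (name mine; base-2 logs are used since the slide counts bits):

```python
import math

def description_length(neg_log_data, n_params, n_data):
    """-log P(M|D) up to a constant: bits to describe the data given the
    model, plus log2(|D|+1) bits for each of the ||M|| parameters."""
    return neg_log_data + n_params * math.log2(n_data + 1)

# A model with ||M|| = 7 parameters fit to |D| = 1000 instances pays about
# 7 * log2(1001) ≈ 69.8 bits for its parameters before encoding any data.
print(description_length(0.0, 7, 1000))
```

The penalty grows with both the number of parameters and (logarithmically) the amount of data, so extra structure must buy a real improvement in data fit to be worthwhile.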


SLIDE 12

Belief network structure learning (II)

Given a total ordering, to determine parents(Xi), do independence tests to determine which features should be the parents.

XOR problem: just because features do not give information individually does not mean they will not give information in combination.

Search over total orderings of variables.



SLIDE 14

Missing Data

You cannot just ignore missing data unless you know it is missing at random. Is the reason data is missing correlated with something of interest? For example: data in a clinical trial to test a drug may be missing because:

◮ the patient dies
◮ the patient had severe side effects
◮ the patient was cured
◮ the patient had to visit a sick relative

Ignoring some of these may make the drug look better or worse than it is. In general, you need to model why data is missing.


SLIDE 15

Causal Networks

A causal network is a Bayesian network that predicts the effects of interventions. To intervene on a variable:

◮ remove the arcs into the variable from its parents
◮ set the value of the variable

Intervening on a variable only affects descendants of the variable.
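The two-step intervention recipe can be sketched directly on a dictionary representation of a network. The network A → E ← B below, its CPT numbers, and the function name are all hypothetical; only the mutilation procedure comes from the slide.

```python
def intervene(parents, cpts, var, value):
    """Return a mutilated copy of the network for do(var = value):
    remove the arcs into var from its parents, then fix var's value.
    `parents` maps each variable to a tuple of its parents; `cpts` maps
    each variable to {parent-assignment: {value: probability}}."""
    new_parents = dict(parents)
    new_cpts = dict(cpts)
    new_parents[var] = ()                # cut the incoming arcs
    new_cpts[var] = {(): {value: 1.0}}   # clamp the variable
    return new_parents, new_cpts

# Hypothetical network: A and B are the parents of E.
parents = {"A": (), "B": (), "E": ("A", "B")}
cpts = {
    "A": {(): {True: 0.3, False: 0.7}},
    "B": {(): {True: 0.6, False: 0.4}},
    "E": {(a, b): {True: p, False: 1 - p}
          for (a, b), p in {(True, True): 0.9, (True, False): 0.7,
                            (False, True): 0.8, (False, False): 0.1}.items()},
}
do_parents, do_cpts = intervene(parents, cpts, "E", True)
```

In the mutilated network E no longer depends on A or B, so inference about A and B is unchanged by the intervention: only E's descendants are affected, as the slide states.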


SLIDE 16

Causality

We would expect a causal model to obey the independencies of a belief network. Not all belief networks are causal:

[Figure: two networks over the same variables, Switch_up → Light_on and Light_on → Switch_up]

Conjecture: causal belief networks are more natural and more concise than non-causal networks. We can’t learn causal models from observational data unless we are prepared to make modeling assumptions. Causal models can be learned from randomized experiments.


SLIDE 17

General Learning of Belief Networks

◮ We have a mixture of observational data and data from randomized studies.
◮ We are not given the structure.
◮ We don't know whether there are hidden variables or not.
◮ We don't know the domain size of hidden variables.
◮ There is missing data.

. . . this is too difficult for current techniques!
