

  1. Bayesian Networks Part 2 CS 760@UW-Madison

  2. Goals for the lecture you should understand the following concepts:
  • the parameter learning task for Bayes nets
  • the structure learning task for Bayes nets
  • maximum likelihood estimation
  • Laplace estimates
  • m-estimates
  • missing data in machine learning
  • hidden variables
  • missing at random
  • missing systematically
  • the EM approach to imputing missing values in Bayes net parameter learning
  • the Chow-Liu algorithm for structure search

  3. Learning Bayes Networks: Parameters

  4. The parameter learning task
  • Given: a set of training instances, and the graph structure of a BN (here the alarm network: Burglary and Earthquake are parents of Alarm; Alarm is the parent of JohnCalls and MaryCalls)

      B E A J M
      f f f t f
      f t f f f
      f f t f t
      …

  • Do: infer the parameters of the CPDs

  5. The structure learning task
  • Given: a set of training instances

      B E A J M
      f f f t f
      f t f f f
      f f t f t
      …

  • Do: infer the graph structure (and perhaps the parameters of the CPDs too)

  6. Parameter learning and MLE
  • maximum likelihood estimation (MLE)
  • given a model structure (e.g. a Bayes net graph) G and a set of data D
  • set the model parameters θ to maximize P(D | G, θ)
  • i.e. make the data D look as likely as possible under the model

  7. Maximum likelihood estimation review
  consider trying to estimate the parameter θ (the probability of heads) of a biased coin from a sequence of flips (1 stands for heads)

      x = {1, 1, 1, 0, 1, 0, 0, 1, 0, 1}

  the likelihood function for θ is given by:

      L(\theta : x) = \prod_{d} \theta^{x_d} (1 - \theta)^{1 - x_d} = \theta^6 (1 - \theta)^4

  What's the MLE of the parameter?
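  A worked answer (not spelled out on the slide; this is the standard Bernoulli MLE derivation): maximize the log-likelihood by setting its derivative to zero, which recovers the observed frequency of heads.

      \log L(\theta : x) = 6 \log\theta + 4 \log(1 - \theta)
      \frac{d}{d\theta} \log L = \frac{6}{\theta} - \frac{4}{1 - \theta} = 0
      \Rightarrow \hat{\theta} = \frac{6}{10} = 0.6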

  8. MLE in a Bayes net

      L(D : G, \theta) = P(D \mid G, \theta)
                       = \prod_{d \in D} P(x_1^{(d)}, x_2^{(d)}, \ldots, x_n^{(d)})
                       = \prod_{d \in D} \prod_{i} P(x_i^{(d)} \mid \mathrm{Parents}(x_i^{(d)}))
                       = \prod_{i} \left[ \prod_{d \in D} P(x_i^{(d)} \mid \mathrm{Parents}(x_i^{(d)})) \right]

  9. MLE in a Bayes net

      L(D : G, \theta) = \prod_{d \in D} \prod_{i} P(x_i^{(d)} \mid \mathrm{Parents}(x_i^{(d)}))
                       = \prod_{i} \left[ \prod_{d \in D} P(x_i^{(d)} \mid \mathrm{Parents}(x_i^{(d)})) \right]

  the likelihood decomposes into an independent parameter learning problem for each CPD
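  A minimal counting sketch of this per-CPD decomposition, assuming fully observed data where each instance is a dict mapping variable names to values and the structure is given as a parent list (the function and data layout are illustrative, not from the slides):

      from collections import Counter

      def mle_cpds(data, parents):
          """MLE for a discrete Bayes net: one independent counting problem per CPD."""
          cpds = {}
          for var, pa in parents.items():
              joint = Counter()     # counts of (parent values, var value)
              marginal = Counter()  # counts of parent values alone
              for inst in data:
                  pa_vals = tuple(inst[p] for p in pa)
                  joint[(pa_vals, inst[var])] += 1
                  marginal[pa_vals] += 1
              cpds[var] = {k: joint[k] / marginal[k[0]] for k in joint}
          return cpds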

  10. Maximum likelihood estimation
  now consider estimating the CPD parameters for B and J in the alarm network given the following data set

      B E A J M
      f f f t f
      f t f f f
      f f f t t
      t f f f t
      f f t t f
      f f t f t
      f f t t t
      f f t t t

  P(b) = 1/8 = 0.125
  P(¬b) = 7/8 = 0.875

  11. Maximum likelihood estimation
  now consider estimating the CPD parameters for B and J in the alarm network given the following data set

      B E A J M
      f f f t f
      f t f f f
      f f f t t
      t f f f t
      f f t t f
      f f t f t
      f f t t t
      f f t t t

  P(b) = 1/8 = 0.125      P(¬b) = 7/8 = 0.875
  P(j | a) = 3/4 = 0.75   P(¬j | a) = 1/4 = 0.25
  P(j | ¬a) = 2/4 = 0.5   P(¬j | ¬a) = 2/4 = 0.5
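  Using the counting sketch above on this data set reproduces the slide's numbers (again illustrative code, with the alarm-network structure assumed):

      rows = ["f f f t f", "f t f f f", "f f f t t", "t f f f t",
              "f f t t f", "f f t f t", "f f t t t", "f f t t t"]
      data = [dict(zip(["B", "E", "A", "J", "M"], r.split())) for r in rows]
      parents = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}

      cpds = mle_cpds(data, parents)
      print(cpds["B"][((), "t")])      # 0.125
      print(cpds["J"][(("t",), "t")])  # 0.75
      print(cpds["J"][(("f",), "t")])  # 0.5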

  12. Maximum likelihood estimation
  suppose instead, our data set was this…

      B E A J M
      f f f t f
      f t f f f
      f f f t t
      f f f f t
      f f t t f
      f f t f t
      f f t t t
      f f t t t

  P(b) = 0/8 = 0
  P(¬b) = 8/8 = 1

  do we really want to set P(b) to 0?

  13. Laplace estimates
  • instead of estimating parameters strictly from the data, we could start with some prior belief for each
  • for example, we could use Laplace estimates, which add one pseudocount to each value:

      P(X = x) = \frac{n_x + 1}{\sum_{v \in \mathrm{Values}(X)} (n_v + 1)}

  • where n_v represents the number of occurrences of value v
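  A minimal sketch of the Laplace estimate for one discrete variable (the function name and input format are my own):

      def laplace_estimate(counts, x):
          """counts: dict mapping each possible value of X to its observed count."""
          return (counts[x] + 1) / sum(n + 1 for n in counts.values())

      # slide-12 data set: B was never true in 8 instances
      print(laplace_estimate({"t": 0, "f": 8}, "t"))  # 1/10 = 0.1 instead of 0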

  14. M-estimates
  a more general form: m-estimates

      P(X = x) = \frac{n_x + p_x m}{\left( \sum_{v \in \mathrm{Values}(X)} n_v \right) + m}

  where p_x is the prior probability of value x and m is the number of "virtual" instances

  15. M-estimates example
  now let's estimate parameters for B using m = 4 and p_b = 0.25

      B E A J M
      f f f t f
      f t f f f
      f f f t t
      f f f f t
      f f t t f
      f f t f t
      f f t t t
      f f t t t

  P(b) = (0 + 0.25 × 4) / (8 + 4) = 1/12 ≈ 0.08
  P(¬b) = (8 + 0.75 × 4) / (8 + 4) = 11/12 ≈ 0.92
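  A sketch of the general m-estimate (names are again illustrative); the Laplace estimate is the special case m = |Values(X)| with a uniform prior:

      def m_estimate(counts, x, m, prior):
          """counts: value -> observed count; prior: value -> prior probability p_x."""
          return (counts[x] + prior[x] * m) / (sum(counts.values()) + m)

      counts = {"t": 0, "f": 8}
      prior = {"t": 0.25, "f": 0.75}
      print(m_estimate(counts, "t", m=4, prior=prior))  # 1/12 ≈ 0.08
      print(m_estimate(counts, "f", m=4, prior=prior))  # 11/12 ≈ 0.92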

  16. EM Algorithm

  17. Missing data
  • Commonly in machine learning tasks, some feature values are missing
  • some variables may not be observable (i.e. hidden) even for training instances
  • values for some variables may be missing at random: what caused the data to be missing does not depend on the missing data itself
    • e.g. someone accidentally skips a question on a questionnaire
    • e.g. a sensor fails to record a value due to a power blip
  • values for some variables may be missing systematically: the probability of a value being missing depends on the value itself
    • e.g. a medical test result is missing because a doctor was fairly sure of a diagnosis given earlier test results
    • e.g. the graded exams that go missing on the way home from school are those with poor scores

  18. Missing data
  • hidden variables; values missing at random
    • these are the cases we'll focus on
    • one solution: try to impute the values
  • values missing systematically
    • may be sensible to represent "missing" as an explicit feature value

  19. Imputing missing data with EM
  Given:
  • a data set with some missing values
  • a model structure and initial model parameters
  Repeat until convergence:
  • Expectation (E) step: using the current model, compute the expectation over the missing values
  • Maximization (M) step: update the model parameters with those that maximize the probability of the data (MLE or MAP)
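  A high-level sketch of this loop in Python; expected_counts and max_likelihood_params are hypothetical helpers standing in for the two steps, which the next slides work out by hand for the alarm network:

      def em(data, params, n_iters=50):
          """Generic EM loop for Bayes net parameter learning with missing values."""
          for _ in range(n_iters):
              # E step: compute expected counts, filling in missing values with their posteriors
              counts = expected_counts(data, params)        # hypothetical helper
              # M step: re-estimate parameters from the expected counts (MLE or MAP)
              params = max_likelihood_params(counts)        # hypothetical helper
              # in practice, stop early once params (or the data log-likelihood) stop changing
          return params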

  20. Example: EM for parameter learning
  suppose we're given the following initial BN and training set (the value of A is missing in every instance)

  initial parameters:
      P(B) = 0.1    P(E) = 0.2

      B E  P(A)        A  P(J)      A  P(M)
      t t  0.9         t  0.9       t  0.8
      t f  0.6         f  0.2       f  0.1
      f t  0.3
      f f  0.2

  training set:
      B E A J M
      f f ? f f
      f f ? t f
      t f ? t t
      f f ? f t
      f t ? t f
      f f ? f t
      t t ? t t
      f f ? f f
      f f ? t f
      f f ? f t

  21. Example: E-step
  using the current parameters, compute the posterior over the missing value of A for each training instance

      B E A J M    P(A | b, e, j, m)
      f f ? f f    t: 0.0069   f: 0.9931
      f f ? t f    t: 0.2      f: 0.8
      t f ? t t    t: 0.98     f: 0.02
      f f ? f t    t: 0.2      f: 0.8
      f t ? t f    t: 0.3      f: 0.7
      f f ? f t    t: 0.2      f: 0.8
      t t ? t t    t: 0.997    f: 0.003
      f f ? f f    t: 0.0069   f: 0.9931
      f f ? t f    t: 0.2      f: 0.8
      f f ? f t    t: 0.2      f: 0.8

  22. Example: E-step

      P(a | ¬b, ¬e, ¬j, ¬m)
        = P(¬b, ¬e, a, ¬j, ¬m) / [ P(¬b, ¬e, a, ¬j, ¬m) + P(¬b, ¬e, ¬a, ¬j, ¬m) ]
        = (0.9 × 0.8 × 0.2 × 0.1 × 0.2) / (0.9 × 0.8 × 0.2 × 0.1 × 0.2 + 0.9 × 0.8 × 0.8 × 0.8 × 0.9)
        = 0.00288 / 0.4176
        = 0.0069

      P(a | ¬b, ¬e, j, ¬m)
        = P(¬b, ¬e, a, j, ¬m) / [ P(¬b, ¬e, a, j, ¬m) + P(¬b, ¬e, ¬a, j, ¬m) ]
        = (0.9 × 0.8 × 0.2 × 0.9 × 0.2) / (0.9 × 0.8 × 0.2 × 0.9 × 0.2 + 0.9 × 0.8 × 0.8 × 0.2 × 0.9)
        = 0.02592 / 0.1296
        = 0.2
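  A small sketch that reproduces these two E-step posteriors from the initial parameters of slide 20 (the dictionaries and function names are my own):

      # initial parameters from slide 20
      pB, pE = 0.1, 0.2
      pA = {("t", "t"): 0.9, ("t", "f"): 0.6, ("f", "t"): 0.3, ("f", "f"): 0.2}
      pJ = {"t": 0.9, "f": 0.2}   # P(J = t | A)
      pM = {"t": 0.8, "f": 0.1}   # P(M = t | A)

      def joint(b, e, a, j, m):
          """P(B=b, E=e, A=a, J=j, M=m) under the alarm-network factorization."""
          p = (pB if b == "t" else 1 - pB) * (pE if e == "t" else 1 - pE)
          p *= pA[(b, e)] if a == "t" else 1 - pA[(b, e)]
          p *= pJ[a] if j == "t" else 1 - pJ[a]
          p *= pM[a] if m == "t" else 1 - pM[a]
          return p

      def p_a_given_rest(b, e, j, m):
          num = joint(b, e, "t", j, m)
          return num / (num + joint(b, e, "f", j, m))

      print(p_a_given_rest("f", "f", "f", "f"))  # ≈ 0.0069
      print(p_a_given_rest("f", "f", "t", "f"))  # = 0.2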

  23. Example: M-step
  re-estimate the probabilities using expected counts

      P(a | b, e) = E#(a ∧ b ∧ e) / E#(b ∧ e)

      P(a | b, e)   = 0.997 / 1 = 0.997
      P(a | b, ¬e)  = 0.98 / 1 = 0.98
      P(a | ¬b, e)  = 0.3 / 1 = 0.3
      P(a | ¬b, ¬e) = (0.0069 + 0.2 + 0.2 + 0.2 + 0.0069 + 0.2 + 0.2) / 7 ≈ 0.145

  updated CPT:
      B E  P(A)
      t t  0.997
      t f  0.98
      f t  0.3
      f f  0.145

  re-estimate probabilities for P(J | A) and P(M | A) in the same way
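  Continuing the sketch above (reusing p_a_given_rest from the E-step snippet), the M-step re-estimate of P(A | B, E) from expected counts:

      from collections import defaultdict

      # M step for P(A | B, E), using the E-step posteriors as expected counts
      rows = ["f f ? f f", "f f ? t f", "t f ? t t", "f f ? f t", "f t ? t f",
              "f f ? f t", "t t ? t t", "f f ? f f", "f f ? t f", "f f ? f t"]
      data = [r.split() for r in rows]

      num = defaultdict(float)  # expected count of instances with (b, e, A = t)
      den = defaultdict(float)  # count of instances with (b, e)
      for b, e, _, j, m in data:
          num[(b, e)] += p_a_given_rest(b, e, j, m)
          den[(b, e)] += 1

      new_pA = {k: num[k] / den[k] for k in den}
      print(new_pA[("f", "f")])  # ≈ 0.145
      print(new_pA[("t", "t")])  # ≈ 0.997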
