Bayes Network description of the learning problem

Feb 07, 2023

SLIDE 1

Bayes Network description of the learning problem

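[Figure: Bayes network with nodes h, x1, ..., x4 and y1, ..., y4. Each label yi has parents xi and h. S denotes the training examples (x1, y1), (x2, y2), (x3, y3).]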

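$$
\begin{aligned}
P(S, h, x_4, y_4) &= \Big[\prod_{i=1}^{3} P(y_i \mid x_i, h)\, P(x_i)\Big]\, P(h)\, P(y_4 \mid x_4, h)\, P(x_4) \\
&= P(S \mid h)\, P(h)\, P(y_4 \mid x_4, h)\, P(x_4) \\
&= P(h \mid S)\, P(S)\, P(y_4 \mid x_4, h)\, P(x_4)
\end{aligned}
$$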

SLIDE 2

Making a Prediction: Bayesian Model Averaging

Goal: given S and x4, predict y4.

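$$
\begin{aligned}
P(y_4 \mid x_4, S) &= \frac{P(S, x_4, y_4)}{P(S, x_4)} \\
&= \frac{\sum_h P(S, x_4, y_4, h)}{P(S)\, P(x_4)} \\
&= \frac{\sum_h P(h \mid S)\, P(S)\, P(x_4)\, P(y_4 \mid x_4, h)}{P(S)\, P(x_4)} \\
&= \sum_h P(h \mid S)\, P(y_4 \mid x_4, h)
\end{aligned}
$$

The second line uses P(S, x4) = P(S) P(x4), which holds because x4 is independent of S in the network.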
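The averaging formula can be made concrete on a small finite hypothesis space. The following is a minimal sketch, not part of the original slides: the noisy threshold classifiers, the `NOISE` level, the uniform prior, and the three training examples are all assumptions chosen for illustration.

```python
import math

# Hypothetical hypothesis space: noisy threshold classifiers on a 1-D input.
THRESHOLDS = [0.5, 1.5, 2.5, 3.5]
NOISE = 0.1  # assumed label-flip probability


def likelihood(y, x, t):
    """P(y | x, h) for the threshold hypothesis h = t."""
    p_one = 1.0 - NOISE if x >= t else NOISE
    return p_one if y == 1 else 1.0 - p_one


def posterior(S, prior):
    """P(h | S), proportional to P(h) * prod_i P(y_i | x_i, h)."""
    unnorm = [p * math.prod(likelihood(y, x, t) for x, y in S)
              for t, p in zip(THRESHOLDS, prior)]
    z = sum(unnorm)  # plays the role of P(S) up to the P(x_i) factors
    return [u / z for u in unnorm]


def bma_predict(x4, S, prior):
    """P(y4 = 1 | x4, S) = sum_h P(h | S) P(y4 = 1 | x4, h)."""
    post = posterior(S, prior)
    return sum(p * likelihood(1, x4, t) for t, p in zip(THRESHOLDS, post))


S = [(1.0, 0), (2.0, 1), (3.0, 1)]  # assumed training set (x_i, y_i)
prior = [0.25] * 4                  # assumed uniform P(h)
post = posterior(S, prior)
p = bma_predict(1.8, S, prior)
print(post, p)
```

The P(x_i) factors are the same for every h, so they cancel when the posterior is normalized and never need to be modeled here.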

SLIDE 3

Maximum A Posteriori (MAP) Estimation

Bayesian model averaging is usually infeasible to compute. Replace the Bayesian model average by the best single model h_MAP:

$$
P(y = k \mid S, x) = \sum_{h \in H} P(y = k \mid h, x)\, P(h \mid S) \approx P(y = k \mid h_{\mathrm{MAP}}, x)
$$

where

$$
h_{\mathrm{MAP}} = \arg\max_h P(h \mid S) = \arg\max_h P(S \mid h)\, P(h)
$$
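To see how close the single-model approximation can be to the full average, the sketch below compares the two on a toy problem. Everything in it is hypothetical: the three named hypotheses, the `PRIOR` values, and the data are assumptions made for illustration only.

```python
import math

# Hypothetical finite hypothesis space: each h maps x to P(y = 1 | x, h).
HYPOTHESES = {
    "always_low": lambda x: 0.2,
    "always_high": lambda x: 0.8,
    "step_at_2": lambda x: 0.9 if x >= 2 else 0.1,
}
PRIOR = {"always_low": 0.4, "always_high": 0.4, "step_at_2": 0.2}  # assumed P(h)


def p_y(y, x, name):
    """P(y | x, h) for the hypothesis called `name`."""
    p1 = HYPOTHESES[name](x)
    return p1 if y == 1 else 1.0 - p1


def unnormalized_posterior(S):
    """P(S | h) P(h) for every h; P(S) is omitted because argmax ignores it."""
    return {name: PRIOR[name] * math.prod(p_y(y, x, name) for x, y in S)
            for name in HYPOTHESES}


def h_map(S):
    """argmax_h P(S | h) P(h)."""
    scores = unnormalized_posterior(S)
    return max(scores, key=scores.get)


def predict_map(k, x, S):
    """P(y = k | h_MAP, x): prediction from the single best model."""
    return p_y(k, x, h_map(S))


def predict_bma(k, x, S):
    """sum_h P(y = k | h, x) P(h | S): the full Bayesian model average."""
    scores = unnormalized_posterior(S)
    z = sum(scores.values())
    return sum(scores[n] / z * p_y(k, x, n) for n in scores)


S = [(1, 0), (3, 1), (4, 1)]  # assumed training set (x_i, y_i)
print(h_map(S), predict_map(1, 3, S), predict_bma(1, 3, S))
```

With these numbers the posterior concentrates on one hypothesis, so the two predictions are close. When the posterior is spread over many hypotheses, the gap can be larger.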

SLIDE 4

MAP = Penalized Maximum Likelihood

We can view P(h) as a "complexity" penalty on the maximum likelihood hypothesis:

$$
\begin{aligned}
h_{\mathrm{MAP}} &= \arg\max_h P(S \mid h)\, P(h) \\
&= \arg\max_h \big[\log P(S \mid h) + \log P(h)\big] \\
&= \arg\max_h \big[\ell(h) + \log P(h)\big]
\end{aligned}
$$

where \ell(h) = \log P(S \mid h) is the log-likelihood.
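A small numeric illustration of the penalty view, under assumptions that are not in the slides: a one-parameter linear model y = w x with unit-variance Gaussian noise, a zero-mean unit-variance Gaussian prior on w, and a grid search standing in for a real optimizer. Under these assumptions log P(h) becomes a quadratic penalty on w.

```python
def log_likelihood(w, xs, ys, sigma=1.0):
    """l(h) = log P(S | h) for y = w * x + Gaussian noise (constants dropped)."""
    return -sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / (2 * sigma ** 2)


def log_prior(w, tau=1.0):
    """log P(h) for an assumed zero-mean Gaussian prior on w (constants dropped)."""
    return -w ** 2 / (2 * tau ** 2)


def argmax_on_grid(score, grid):
    """Return the grid point with the highest score."""
    return max(grid, key=score)


xs = [1.0, 2.0, 3.0]  # assumed inputs
ys = [2.0, 5.0, 6.0]  # assumed targets
grid = [i / 1000 for i in range(0, 4001)]  # candidate weights in [0, 4]

# Maximum likelihood: argmax_h l(h)
w_ml = argmax_on_grid(lambda w: log_likelihood(w, xs, ys), grid)
# MAP: argmax_h l(h) + log P(h)
w_map = argmax_on_grid(lambda w: log_likelihood(w, xs, ys) + log_prior(w), grid)
print(w_ml, w_map)
```

The MAP weight is pulled toward zero relative to the maximum likelihood weight, which is the sense in which the prior acts as a penalty on large weights.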

SLIDE 5

Where does P(H) come from?

Theory: P(H) should encode all and only our prior knowledge about H.

Practice:

– Complexity-based priors
    – penalize large neural network weights
    – penalize large SVM weights
    – penalize large decision trees
    – penalize long "description lengths"

– Knowledge-based priors
    – Bayes net structure prior
    – qualitative monotonicity priors
    – smoothness priors
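As one worked instance of a complexity-based prior, the description-length case can be written out explicitly. This is a sketch under an assumption not stated in the slides: that each hypothesis h has a code length L(h) in bits and that the prior is taken to be P(h) = 2^{-L(h)}. Substituting into the MAP objective (the base of the logarithm does not change the argmax) gives:

```latex
\begin{aligned}
h_{\mathrm{MAP}} &= \arg\max_h \big[\log_2 P(S \mid h) + \log_2 P(h)\big] \\
&= \arg\max_h \big[\log_2 P(S \mid h) - L(h)\big] \\
&= \arg\min_h \big[L(h) - \log_2 P(S \mid h)\big]
\end{aligned}
```

Under this assumed prior, the MAP hypothesis minimizes the number of bits needed to describe the hypothesis plus the number of bits needed to describe the data given the hypothesis.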