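Bayes Network description of the learning problem

[Figure: Bayes network over the nodes h, x_1, ..., x_4 and y_1, ..., y_4; S denotes the training examples (x_1, y_1), (x_2, y_2), (x_3, y_3).]

\[
\begin{aligned}
P(S, h, x_4, y_4) &= \Big[ \prod_i P(y_i \mid x_i, h)\, P(x_i) \Big]\, P(h)\, P(y_4 \mid x_4, h)\, P(x_4) \\
&= P(S \mid h)\, P(h)\, P(y_4 \mid x_4, h)\, P(x_4) \\
&= P(h \mid S)\, P(S)\, P(y_4 \mid x_4, h)\, P(x_4)
\end{aligned}
\]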

Making a Prediction: Bayesian Model Averaging

Goal: given S, x_4, predict y_4

\[
\begin{aligned}
P(y_4 \mid x_4, S) &= \frac{P(S, x_4, y_4)}{P(S, x_4)} \\
&= \frac{\sum_h P(S, x_4, y_4, h)}{P(S)\, P(x_4)} \\
&= \frac{\sum_h P(h \mid S)\, P(S)\, P(x_4)\, P(y_4 \mid x_4, h)}{P(S)\, P(x_4)} \\
&= \sum_h P(h \mid S)\, P(y_4 \mid x_4, h)
\end{aligned}
\]
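
The final sum can be evaluated directly when the hypothesis space is small. The sketch below is only an illustration: the hypothesis class (noisy threshold classifiers), the noise rate, the uniform prior, and the data are all assumptions, not part of the slides. The P(x_i) factors are omitted because they are the same for every h and cancel when the posterior is normalized.

```python
import math

# Hypothetical finite hypothesis class: noisy threshold classifiers h_t.
THRESHOLDS = [0.5, 1.5, 2.5, 3.5]
NOISE = 0.1  # assumed probability that a label is flipped


def likelihood(y, x, t):
    """P(y | x, h_t): h_t predicts 1 when x >= t; the label flips with prob. NOISE."""
    predicted = 1 if x >= t else 0
    return 1.0 - NOISE if y == predicted else NOISE


def posterior(S, prior):
    """P(h | S), proportional to P(h) * prod_i P(y_i | x_i, h)."""
    unnorm = [p * math.prod(likelihood(y, x, t) for x, y in S)
              for t, p in zip(THRESHOLDS, prior)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def bma_predict(x_new, S, prior):
    """P(y = 1 | x_new, S) = sum_h P(h | S) * P(y = 1 | x_new, h)."""
    post = posterior(S, prior)
    return sum(p * likelihood(1, x_new, t) for t, p in zip(THRESHOLDS, post))


S = [(1.0, 0), (2.0, 1), (3.0, 1)]   # made-up training examples (x_i, y_i)
prior = [0.25] * len(THRESHOLDS)     # assumed uniform prior P(h)
post = posterior(S, prior)
p_new = bma_predict(1.2, S, prior)
```

With these made-up numbers most of the posterior mass falls on the threshold 1.5, but the other hypotheses still contribute to the prediction in proportion to their posterior weight.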

Maximum A Posteriori (MAP) Estimation

Bayesian model averaging is usually infeasible to compute.
Replace the Bayesian model average by the best single model h_MAP:

\[
P(y = k \mid S, x) = \sum_{h \in H} P(y = k \mid h, x)\, P(h \mid S) \approx P(y = k \mid h_{MAP}, x)
\]

where

\[
h_{MAP} = \arg\max_h P(h \mid S) = \arg\max_h P(S \mid h)\, P(h)
\]
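
How good the approximation is depends on how concentrated the posterior is. A toy comparison, with a made-up posterior over three hypothetical hypotheses and made-up per-hypothesis predictions, might look like this:

```python
# Hypothetical posterior P(h | S) over three hypotheses (illustrative numbers).
posterior = {"h1": 0.4, "h2": 0.3, "h3": 0.3}

# Hypothetical predictions P(y = 1 | h, x) for one query point x.
p_y1 = {"h1": 1.0, "h2": 0.0, "h3": 0.0}

# MAP: keep only the single most probable hypothesis.
h_map = max(posterior, key=posterior.get)
map_prediction = p_y1[h_map]

# Bayesian model average: weight every hypothesis by its posterior.
bma_prediction = sum(posterior[h] * p_y1[h] for h in posterior)
```

Here h_MAP is h1, so the MAP prediction for y = 1 is 1.0, while the full average gives 0.4. The two agree closely only when one hypothesis dominates the posterior.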

MAP = Penalized Maximum Likelihood

We can view P(h) as a “complexity” penalty on the maximum likelihood hypothesis:

\[
\begin{aligned}
h_{MAP} &= \arg\max_h P(S \mid h)\, P(h) \\
&= \arg\max_h \log P(S \mid h) + \log P(h) \\
&= \arg\max_h \ell(h) + \log P(h)
\end{aligned}
\]

where \ell(h) = \log P(S \mid h) is the log-likelihood.
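
To see the penalty at work, consider a deliberately simple case that is not from the slides: the hypothesis is a coin's heads probability theta, the data are 9 heads in 10 flips, and the prior is an assumed Beta(3, 3) that prefers values near 0.5. The sketch searches a grid of candidate hypotheses for the maximizer of l(h) and of l(h) + log P(h).

```python
import math

# Hypothetical data S: 9 heads in 10 independent flips.
HEADS, FLIPS = 9, 10
A = 3.0  # assumed Beta(A, A) prior; larger A pulls harder toward 0.5

GRID = [i / 1000 for i in range(1, 1000)]  # candidate hypotheses 0.001 .. 0.999


def log_likelihood(theta):
    """l(h) = log P(S | h) for independent coin flips."""
    return HEADS * math.log(theta) + (FLIPS - HEADS) * math.log(1.0 - theta)


def log_prior(theta):
    """log P(h) up to an additive constant, which does not change the argmax."""
    return (A - 1.0) * (math.log(theta) + math.log(1.0 - theta))


# Maximum likelihood: maximize l(h) alone.
theta_ml = max(GRID, key=log_likelihood)

# MAP: maximize l(h) + log P(h), the penalized log-likelihood.
theta_map = max(GRID, key=lambda t: log_likelihood(t) + log_prior(t))
```

The maximum likelihood estimate is 0.9, while the MAP estimate lands near 11/14 (about 0.786): the log-prior term pulls the estimate away from the extreme value that the small sample favors.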

Where does P(H) come from?

Theory: P(H) should encode all and only our prior knowledge about H.

Practice:
– Complexity-based priors
    – penalize large neural network weights
    – penalize large SVM weights
    – penalize large decision trees
    – penalize long “description lengths”
– Knowledge-based priors
    – Bayes net structure prior
    – qualitative monotonicity priors
    – smoothness priors
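
As one worked instance of a complexity-based prior, suppose (an assumption made here for illustration, not stated on the slide) that the hypothesis is a weight vector w with a zero-mean Gaussian prior of variance \sigma^2 per weight. Substituting into h_MAP = argmax_h \ell(h) + log P(h) gives the familiar squared-norm penalty on large weights:

```latex
\begin{aligned}
P(w) &= \mathcal{N}(w;\, 0,\, \sigma^2 I) && \text{(assumed prior)} \\
\log P(w) &= -\frac{1}{2\sigma^2} \lVert w \rVert^2 + \text{const} \\
w_{MAP} &= \arg\max_w \; \ell(w) - \lambda \lVert w \rVert^2,
\qquad \lambda = \frac{1}{2\sigma^2}
\end{aligned}
```

A smaller \sigma^2 expresses a stronger prior belief that the weights are small and therefore yields a larger penalty coefficient \lambda.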