Machine Learning (CSE 446): Probabilistic Machine Learning, MLE & MAP
Sham M Kakade
© 2018 University of Washington, cse446-staff@cs.washington.edu
Announcements

Homeworks
◮ HW 3 posted. Get the most recent version.
◮ You must do the regular problems before obtaining any extra credit.
◮ Extra credit is factored in after your scores are averaged together.
Today:
◮ Review
◮ Probabilistic methods
◮ Starting step size: start it "large".
◮ When do we decay it?
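The "start large, then decay" idea can be sketched with a simple schedule. The 1/√t form below is one common choice; the specific schedule is an assumption here, not something the slides prescribe.

```python
# Sketch of a "start large, then decay" step-size schedule for SGD.
# The 1/sqrt(t) decay is one common choice (an illustrative assumption).

def step_size(t, eta0=1.0):
    """Step size at iteration t (t >= 1): starts at eta0, decays like 1/sqrt(t)."""
    return eta0 / (t ** 0.5)

print(step_size(1))    # large at the start: 1.0
print(step_size(100))  # decayed later: 0.1
```

When to decay is typically tuned: a fixed schedule like this, or decaying only when dev-set progress stalls.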
◮ We DO know that "early stopping" for GD/SGD is (basically) doing L2 regularization.
◮ i.e., if we don't run for too long, then ‖w‖₂ won't become too big.
◮ Set the stopping point with a dev set!
◮ Exact methods (like matrix inverse / least squares): always need to regularize.
◮ GD/SGD: sometimes (often?) it works just fine ignoring regularization.
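The "set it with a dev set" bullet can be sketched as follows: run GD on a training set and keep the iterate with the best dev-set loss. The toy data and hyperparameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem (synthetic data, for illustration only).
X = rng.normal(size=(100, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=100)

# Split into train and dev; the dev set chooses the stopping point.
X_tr, y_tr = X[:80], y[:80]
X_dev, y_dev = X[80:], y[80:]

w = np.zeros(5)
eta = 0.001
best_dev, best_w = float("inf"), w.copy()
for t in range(500):
    grad = X_tr.T @ (X_tr @ w - y_tr)        # gradient of squared loss
    w -= eta * grad
    dev_loss = np.mean((X_dev @ w - y_dev) ** 2)
    if dev_loss < best_dev:                  # remember the best iterate so far
        best_dev, best_w = dev_loss, w.copy()

# "Early stopping" = report best_w rather than running to convergence,
# which implicitly keeps its L2 norm from growing too large.
```

Stopping at the dev-optimal iterate plays the role of picking the L2 regularization strength, without ever adding a penalty term.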
θ̂_MLE = (1/N) Σ_{n=1}^{N} y_n
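For the coin-flip (Bernoulli) model, the MLE of the heads probability is just the sample mean of the observed outcomes; a minimal sketch:

```python
# MLE for a Bernoulli (coin-flip) model: the estimate of the heads
# probability is the sample mean of the observed outcomes y_n.

def bernoulli_mle(ys):
    """theta_hat = (1/N) * sum_n y_n."""
    return sum(ys) / len(ys)

print(bernoulli_mle([1, 0, 1, 1]))  # 0.75
```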
◮ Observe xn.
◮ Transform it using parameters w.
◮ Sample yn ∼ p(Y | xn, w).
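The three-step generative story above can be sketched in code. The slides leave p(Y | x, w) unspecified; the logistic (sigmoid) form below is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_label(x, w):
    """Generative story: observe x, transform it with w, then sample y.
    p(Y=1 | x, w) is taken to be logistic here -- an illustrative
    assumption; the slides do not fix the conditional model."""
    p = 1.0 / (1.0 + np.exp(-(w @ x)))  # transform x using parameters w
    return int(rng.random() < p)        # sample y ~ p(Y | x, w)

w = np.array([2.0, -1.0])   # hypothetical parameters
x = np.array([1.0, 0.5])    # an observed input
y = sample_label(x, w)      # y is 0 or 1
```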
◮ We have a model Pr(Data | w).
◮ Find the w which maximizes the probability of the data you have observed:
  ŵ_MLE = argmax_w Pr(Data | w)
◮ We also have a prior Pr(W = w).
◮ Now we have a posterior distribution:
  Pr(W = w | Data) ∝ Pr(Data | w) Pr(W = w)
◮ Now suppose we are asked to provide our "best guess" at w. What should we do?
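One natural "best guess" is the posterior mode (the MAP estimate). For the coin-flip model with a Beta(α, β) prior, the posterior is Beta(α + #heads, β + #tails) and its mode has a closed form; the conjugate Beta prior is an illustrative choice here.

```python
# MAP estimate for the coin-flip probability with a Beta(alpha, beta)
# prior (an illustrative conjugate-prior choice). The posterior is
# Beta(alpha + #heads, beta + #tails); its mode is the MAP estimate.

def bernoulli_map(ys, alpha=2.0, beta=2.0):
    heads = sum(ys)
    tails = len(ys) - heads
    # Mode of Beta(alpha + heads, beta + tails), valid when both params > 1.
    return (alpha + heads - 1) / (alpha + beta + len(ys) - 2)

print(bernoulli_map([1, 0, 1, 1]))  # pulled toward 1/2 relative to the MLE 0.75
```

With α = β = 2 the prior favors fair coins, so the MAP estimate sits between the MLE and 1/2; as more data arrives it converges to the MLE.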