CSE446: Point Estimation Spring 2017 Ali Farhadi Slides adapted from Carlos Guestrin, Dan Klein, and Luke Zettlemoyer
Your first consulting job • A billionaire from the suburbs of Seattle asks you a question: – He says: I have a thumbtack; if I flip it, what's the probability it will fall with the nail up? – You say: Please flip it a few times. – You say: The probability is: • P(H) = 3/5 – He says: Why??? – You say: Because…
Thumbtack – Binomial Distribution • P(Heads) = θ, P(Tails) = 1 − θ • Flips are i.i.d.: D = {x_i | i = 1…n}, P(D | θ) = ∏_i P(x_i | θ) – Independent events – Identically distributed according to a Binomial distribution • Sequence D of α_H Heads and α_T Tails: P(D | θ) = θ^{α_H} (1 − θ)^{α_T}
Maximum Likelihood Estimation • Data: Observed set D of α_H Heads and α_T Tails • Hypothesis space: Binomial distributions • Learning: finding θ is an optimization problem – What's the objective function? The (log-)likelihood of the data • MLE: Choose θ to maximize the probability of D: θ̂ = argmax_θ P(D | θ)
Your first parameter learning algorithm • ln P(D | θ) = α_H ln θ + α_T ln(1 − θ) • Set derivative to zero, and solve! d/dθ ln P(D | θ) = α_H/θ − α_T/(1 − θ) = 0 ⇒ θ̂_MLE = α_H / (α_H + α_T)
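The closed-form MLE above can be checked with a minimal Python sketch (not part of the original slides; the function name is ours):

```python
# Minimal sketch of the thumbtack MLE: setting the derivative of the
# log-likelihood to zero gives theta_hat = alpha_H / (alpha_H + alpha_T).

def mle_theta(alpha_h, alpha_t):
    """MLE of P(Heads) after observing alpha_h heads and alpha_t tails."""
    return alpha_h / (alpha_h + alpha_t)

print(mle_theta(3, 2))    # the billionaire's 3 heads, 2 tails -> 0.6
print(mle_theta(30, 20))  # same ratio, same answer -> 0.6
```

Note that, as the billionaire observes on the next slide, the estimate is the same for 3H/2T and 30H/20T; only the confidence bound (below) distinguishes them.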
But, how many flips do I need? • Billionaire says: I flipped 3 heads and 2 tails. • You say: θ = 3/5, I can prove it! • He says: What if I flipped 30 heads and 20 tails? • You say: Same answer, I can prove it! • He says: What's better? • You say: Umm… The more the merrier??? • He says: Is this why I am paying you the big bucks???
A bound (from Hoeffding's inequality) • For N = α_H + α_T, and θ̂ = α_H / N • Let θ* be the true parameter; for any ε > 0: P(|θ̂ − θ*| ≥ ε) ≤ 2e^{−2Nε²} [Plot: probability of mistake vs. N — exponential decay!]
PAC Learning • PAC: Probably Approximately Correct • Billionaire says: I want to know the thumbtack θ within ε = 0.1, with probability at least 1 − δ = 0.95. • How many flips? Or, how big do I set N? Solving 2e^{−2Nε²} ≤ δ gives N ≥ ln(2/δ) / (2ε²) • Interesting! Let's look at some numbers: ε = 0.1, δ = 0.05
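Plugging the slide's numbers into the Hoeffding-derived sample-size bound can be sketched as follows (a quick check, not from the slides):

```python
import math

def pac_sample_size(eps, delta):
    """Smallest N such that 2 * exp(-2 * N * eps**2) <= delta,
    i.e. N >= ln(2/delta) / (2 * eps**2), from Hoeffding's inequality."""
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))

# eps = 0.1, delta = 0.05: ln(40) / 0.02 ~= 184.4, so 185 flips suffice
print(pac_sample_size(0.1, 0.05))  # -> 185
```

Notice the 1/ε² dependence: halving the tolerance ε quadruples the required number of flips, while tightening δ costs only logarithmically.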
What if I have prior beliefs? • Billionaire says: Wait, I know that the thumbtack is "close" to 50-50. What can you do for me now? • You say: I can learn it the Bayesian way… • Rather than estimating a single θ, we obtain a distribution over possible values of θ [Figure: a prior over θ in the beginning; after observing flips, e.g. {tails, tails}, an updated posterior]
Bayesian Learning • Use Bayes rule: P(θ | D) = P(D | θ) P(θ) / P(D) (posterior = data likelihood × prior / normalization) • Or equivalently: P(θ | D) ∝ P(D | θ) P(θ) • Also, for uniform priors, P(θ) ∝ 1: maximizing the posterior reduces to the MLE objective
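The "posterior ∝ likelihood × prior" update can be illustrated numerically on a grid of θ values (a hypothetical sketch, not from the slides; function and variable names are ours):

```python
# Discretized Bayes rule for the thumbtack: evaluate likelihood * prior on a
# grid of theta values, then normalize so the posterior sums to 1.

def grid_posterior(alpha_h, alpha_t, prior, grid):
    unnorm = [(t ** alpha_h) * ((1 - t) ** alpha_t) * p
              for t, p in zip(grid, prior)]
    z = sum(unnorm)  # discrete approximation of the normalizer P(D)
    return [u / z for u in unnorm]

grid = [i / 100 for i in range(1, 100)]  # theta in (0, 1)
uniform = [1.0] * len(grid)              # uniform prior over the grid
post = grid_posterior(3, 2, uniform, grid)

# With a uniform prior, the posterior mode coincides with the MLE, theta = 0.6
peak = grid[max(range(len(post)), key=post.__getitem__)]
print(peak)
```

This grid view is only for intuition; the next slides show that for the Binomial likelihood a conjugate (Beta) prior gives the posterior in closed form, with no grid needed.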
Bayesian Learning for Thumbtacks • Likelihood function is Binomial • What about the prior? – Represent expert knowledge – Simple posterior form • Conjugate priors: – Closed-form representation of posterior – For Binomial, conjugate prior is the Beta distribution
Beta prior distribution – P(θ) • P(θ) = θ^{β_H − 1} (1 − θ)^{β_T − 1} / B(β_H, β_T) ~ Beta(β_H, β_T) • Likelihood function: P(D | θ) = θ^{α_H} (1 − θ)^{α_T} • Posterior: P(θ | D) ∝ θ^{α_H + β_H − 1} (1 − θ)^{α_T + β_T − 1}
Posterior distribution • Prior: Beta(β_H, β_T) • Data: α_H heads and α_T tails • Posterior distribution: P(θ | D) = Beta(α_H + β_H, α_T + β_T)
MAP for Beta distribution • MAP: use the most likely parameter (mode of the posterior): θ̂_MAP = argmax_θ P(θ | D) = (α_H + β_H − 1) / (α_H + α_T + β_H + β_T − 2) • Beta prior equivalent to extra thumbtack flips • As N → ∞, prior is "forgotten" • But, for small sample size, prior is important!
What about continuous variables? • Billionaire says: If I am measuring a continuous variable, what can you do for me? • You say: Let me tell you about Gaussians…
Some properties of Gaussians • An affine transformation (multiplying by a scalar and adding a constant) of a Gaussian is Gaussian – X ~ N(μ, σ²), Y = aX + b ⇒ Y ~ N(aμ + b, a²σ²) • A sum of independent Gaussians is Gaussian – X ~ N(μ_X, σ²_X), Y ~ N(μ_Y, σ²_Y), Z = X + Y ⇒ Z ~ N(μ_X + μ_Y, σ²_X + σ²_Y) • Easy to differentiate, as we will see soon!
Learning a Gaussian • Collect a bunch of data – Hopefully, i.i.d. samples – e.g., exam scores • Learn parameters – Mean: μ – Variance: σ²

  i  | Exam Score x_i
  0  | 85
  1  | 95
  2  | 100
  3  | 12
  …  | …
  99 | 89
MLE for Gaussian • Prob. of i.i.d. samples D = {x_1, …, x_N}: P(D | μ, σ) = ∏_{i=1}^N (1 / (σ√(2π))) exp(−(x_i − μ)² / (2σ²)) • Log-likelihood of data: ln P(D | μ, σ) = −N ln(σ√(2π)) − Σ_{i=1}^N (x_i − μ)² / (2σ²)
Your second learning algorithm: MLE for mean of a Gaussian • What's the MLE for the mean? Set d/dμ ln P(D | μ, σ) = Σ_i (x_i − μ)/σ² = 0 ⇒ μ̂_MLE = (1/N) Σ_{i=1}^N x_i
MLE for variance • Again, set the derivative to zero: d/dσ ln P(D | μ, σ) = −N/σ + Σ_i (x_i − μ)²/σ³ = 0 ⇒ σ̂²_MLE = (1/N) Σ_{i=1}^N (x_i − μ̂)²
Learning Gaussian parameters • MLE: μ̂ = (1/N) Σ_i x_i, σ̂² = (1/N) Σ_i (x_i − μ̂)² • BTW, the MLE for the variance of a Gaussian is biased – The expected result of estimation is not the true parameter! E[σ̂²_MLE] = ((N − 1)/N) σ² – Unbiased variance estimator: σ̂²_unbiased = (1/(N − 1)) Σ_i (x_i − μ̂)²
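Both Gaussian estimators can be sketched in a few lines of Python (ours, not from the slides; the five scores are the ones visible on the "Learning a Gaussian" slide, treated as a toy dataset):

```python
def gaussian_mle(xs):
    """MLE for a Gaussian: sample mean and *biased* variance (divide by N),
    plus the unbiased variance (divide by N - 1, Bessel's correction)."""
    n = len(xs)
    mu = sum(xs) / n
    sq = sum((x - mu) ** 2 for x in xs)
    return mu, sq / n, sq / (n - 1)

scores = [85, 95, 100, 12, 89]  # toy exam-score sample from the slide
mu, var_mle, var_unbiased = gaussian_mle(scores)
print(mu, var_mle, var_unbiased)
```

With only N = 5 points the two variance estimates differ by the factor N/(N − 1) = 5/4, illustrating why the bias matters for small samples and vanishes as N → ∞.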
Bayesian learning of Gaussian parameters • Conjugate priors – Mean: Gaussian prior – Variance: Wishart distribution • Prior for mean: