

  1. CSE446: Point Estimation Spring 2017 Ali Farhadi Slides adapted from Carlos Guestrin, Dan Klein, and Luke Zettlemoyer

  2. Your first consulting job • A billionaire from the suburbs of Seattle asks you a question: – He says: I have a thumbtack; if I flip it, what’s the probability it will fall with the nail up? – You say: Please flip it a few times: – You say: The probability is: • P(H) = 3/5 – He says: Why??? – You say: Because…

  3. Thumbtack – Binomial Distribution • P(Heads) = θ, P(Tails) = 1 − θ • Flips are i.i.d.: D = {x_i | i = 1…n}, P(D | θ) = Π_i P(x_i | θ) – Independent events – Identically distributed according to Binomial distribution • Sequence D of α_H Heads and α_T Tails
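  Written out, since each head contributes a factor θ and each tail a factor 1 − θ, a sequence D with α_H heads and α_T tails has likelihood P(D | θ) = θ^α_H (1 − θ)^α_T ; only the counts matter, not the order.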

  4. Maximum Likelihood Estimation • Data: Observed set D of α_H Heads and α_T Tails • Hypothesis space: Binomial distributions • Learning: finding θ is an optimization problem – What’s the objective function? • MLE: Choose θ to maximize probability of D
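  Written out, the MLE objective is θ̂ = argmax_θ P(D | θ) = argmax_θ θ^α_H (1 − θ)^α_T , or equivalently, taking logs, θ̂ = argmax_θ [ α_H ln θ + α_T ln(1 − θ) ].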

  5. Your first parameter learning algorithm • Set derivative to zero, and solve!
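  Carrying out that step: d/dθ [ α_H ln θ + α_T ln(1 − θ) ] = α_H/θ − α_T/(1 − θ) = 0, which solves to θ̂_MLE = α_H / (α_H + α_T). For 3 heads and 2 tails this gives 3/5, matching the earlier answer.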

  6. But, how many flips do I need? • Billionaire says: I flipped 3 heads and 2 tails. • You say: θ = 3/5, I can prove it! • He says: What if I flipped 30 heads and 20 tails? • You say: Same answer, I can prove it! • He says: What’s better? • You say: Umm… The more the merrier??? • He says: Is this why I am paying you the big bucks???

  7. A bound (from Hoeffding’s inequality) • For N = α_H + α_T , and • Let θ* be the true parameter, for any ε > 0: [plot: probability of mistake vs. N, showing exponential decay]
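  The standard Hoeffding bound for this setting is P( |θ̂_MLE − θ*| ≥ ε ) ≤ 2 e^(−2Nε²) : the probability of being off by more than ε decays exponentially in the number of flips N.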

  8. PAC Learning • PAC: Probably Approximately Correct • Billionaire says: I want to know the thumbtack θ, within ε = 0.1, with probability at least 1 − δ = 0.95. • How many flips? Or, how big do I set N? Interesting! Let’s look at some numbers! • ε = 0.1, δ = 0.05
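  Requiring the Hoeffding bound 2 e^(−2Nε²) to be at most δ and solving for N gives N ≥ ln(2/δ) / (2ε²). A quick numeric check (a minimal sketch; the variable names are just for illustration):

      import math

      eps, delta = 0.1, 0.05
      # Hoeffding: P(|theta_hat - theta*| >= eps) <= 2*exp(-2*N*eps^2)
      # Require that bound to be <= delta and solve for N.
      N = math.log(2 / delta) / (2 * eps ** 2)
      print(N)  # about 184.4, so roughly 185 flips suffice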

  9. What if I have prior beliefs? • Billionaire says: Wait, I know that the thumbtack is “close” to 50-50. What can you do for me now? • You say: I can learn it the Bayesian way… • Rather than estimating a single θ, we obtain a distribution over possible values of θ [plots: distribution over θ in the beginning; observe flips, e.g. {tails, tails}; distribution after observations]

  10. Bayesian Learning • Use Bayes rule: P(θ | D) = P(D | θ) P(θ) / P(D), i.e. posterior = data likelihood × prior / normalization • Or equivalently: P(θ | D) ∝ P(D | θ) P(θ) • Also, for uniform priors, P(θ) ∝ 1: reduces to MLE objective

  11. Bayesian Learning for Thumbtacks • Likelihood function is Binomial: • What about prior? – Represent expert knowledge – Simple posterior form • Conjugate priors: – Closed-form representation of posterior – For Binomial, conjugate prior is Beta distribution

  12. Beta prior distribution – P(θ) • Likelihood function: • Posterior:
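  Written out (using β_H, β_T for the prior’s hyperparameters, a notation assumed here): the Beta(β_H, β_T) prior is P(θ) ∝ θ^(β_H − 1) (1 − θ)^(β_T − 1), the likelihood is θ^α_H (1 − θ)^α_T as before, and the posterior is proportional to their product.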

  13. Posterior distribution • Prior: • Data: α_H heads and α_T tails • Posterior distribution:
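  Concretely, multiplying the Beta(β_H, β_T) prior by the likelihood gives P(θ | D) ∝ θ^(α_H + β_H − 1) (1 − θ)^(α_T + β_T − 1), i.e. the posterior is again a Beta distribution: Beta(β_H + α_H, β_T + α_T).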

  14. MAP for Beta distribution • MAP: use most likely parameter: • Beta prior equivalent to extra thumbtack flips • As N → ∞, prior is “forgotten” • But, for small sample size, prior is important!
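  For the Beta(β_H + α_H, β_T + α_T) posterior, the most likely parameter (the mode) is θ̂_MAP = (α_H + β_H − 1) / (α_H + β_H + α_T + β_T − 2). This is exactly the MLE computed as if the prior had contributed β_H − 1 extra heads and β_T − 1 extra tails, which is why the prior acts like extra flips and gets swamped as N → ∞.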

  15. What about continuous variables? • Billionaire says: If I am measuring a continuous variable, what can you do for me? • You say: Let me tell you about Gaussians…
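  For reference, the model used in the following slides is X ~ N(µ, σ²) with density p(x | µ, σ²) = 1/(σ√(2π)) · exp( −(x − µ)² / (2σ²) ).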

  16. Some properties of Gaussians • Affine transformations (multiplying by a scalar and adding a constant) stay Gaussian – X ~ N(µ, σ²) – Y = aX + b ⇒ Y ~ N(aµ + b, a²σ²) • Sum of (independent) Gaussians is Gaussian – X ~ N(µ_X, σ²_X) – Y ~ N(µ_Y, σ²_Y) – Z = X + Y ⇒ Z ~ N(µ_X + µ_Y, σ²_X + σ²_Y) • Easy to differentiate, as we will see soon!
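  A quick numerical sanity check of these two properties (a minimal sketch with made-up parameter values; the variance-addition rule assumes the two summands are independent):

      import numpy as np

      rng = np.random.default_rng(0)
      n = 1_000_000

      x = rng.normal(2.0, 3.0, size=n)   # X ~ N(2, 3^2)
      y = 5.0 * x + 1.0                  # affine: Y should be ~ N(5*2 + 1, 5^2 * 3^2)
      print(y.mean(), y.var())           # ~11, ~225

      w = rng.normal(-1.0, 4.0, size=n)  # W ~ N(-1, 4^2), independent of X
      z = x + w                          # Z should be ~ N(2 + (-1), 3^2 + 4^2)
      print(z.mean(), z.var())           # ~1, ~25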

  17. Learning a Gaussian • Collect a bunch of data – Hopefully, i.i.d. samples – e.g., exam scores • Learn parameters – Mean: µ – Variance: σ²
       i :  exam score x_i
       0 :  85
       1 :  95
       2 :  100
       3 :  12
       … :  …
      99 :  89

  18. MLE for Gaussian • Prob. of i.i.d. samples D = {x_1, …, x_N}: • Log-likelihood of data:
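  Written out: P(D | µ, σ) = Π_i 1/(σ√(2π)) exp( −(x_i − µ)² / (2σ²) ), so the log-likelihood is ln P(D | µ, σ) = −N ln(σ√(2π)) − Σ_i (x_i − µ)² / (2σ²).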

  19. Your second learning algorithm: MLE for mean of a Gaussian • What’s the MLE for the mean?
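  Setting the derivative of the log-likelihood with respect to µ to zero gives Σ_i (x_i − µ) / σ² = 0, so µ̂_MLE = (1/N) Σ_i x_i , the sample mean.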

  20. MLE for variance • Again, set derivative to zero:
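  Setting the derivative with respect to σ to zero (with µ̂_MLE plugged in) gives −N/σ + Σ_i (x_i − µ̂)² / σ³ = 0, so σ̂²_MLE = (1/N) Σ_i (x_i − µ̂)².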

  21. Learning Gaussian parameters • MLE: • BTW, the MLE for the variance of a Gaussian is biased – Expected result of estimation is not the true parameter! – Unbiased variance estimator:
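  The unbiased estimator divides by N − 1 instead of N: σ̂²_unbiased = 1/(N − 1) · Σ_i (x_i − µ̂)². A quick check with NumPy (a sketch using only the handful of exam scores visible in the earlier table; np.var uses the 1/N MLE form by default and the 1/(N − 1) form with ddof=1):

      import numpy as np

      scores = np.array([85.0, 95.0, 100.0, 12.0, 89.0])  # a few of the exam scores above
      mu_hat = scores.mean()               # MLE for the mean
      var_mle = scores.var()               # (1/N) * sum((x - mu_hat)^2), biased
      var_unbiased = scores.var(ddof=1)    # (1/(N-1)) * sum(...), unbiased
      print(mu_hat, var_mle, var_unbiased)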

  22. Bayesian learning of Gaussian parameters • Conjugate priors – Mean: Gaussian prior – Variance: Wishart distribution • Prior for mean:
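  A common concrete choice (the hyperparameter names η and τ² are illustrative, not from the original slide) is a Gaussian prior on the mean, µ ~ N(η, τ²), i.e. P(µ) = 1/(τ√(2π)) exp( −(µ − η)² / (2τ²) ); combined with the Gaussian likelihood this yields a Gaussian posterior over µ.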
