maximum likelihood estimation

Maximum Likelihood Estimation (Daphne Koller) - PowerPoint PPT Presentation



  1. Probabilistic Graphical Models: Learning, Parameter Estimation. Maximum Likelihood Estimation. Daphne Koller

  2. Biased Coin Example
  • P is a Bernoulli distribution: P(X=1) = θ, P(X=0) = 1 − θ
  • x[1], ..., x[M] sampled IID from P:
  • Tosses are independent of each other
  • Tosses are sampled from the same distribution (identically distributed)
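
To make the IID assumption concrete, here is a minimal Python sketch (not part of the slides) that draws M tosses independently from the same Bernoulli(θ); the values of theta and M are illustrative.

```python
# Sketch: M IID tosses from a Bernoulli(theta).
# theta and M are illustrative values, not from the slides.
import numpy as np

rng = np.random.default_rng(seed=0)
theta = 0.7   # assumed bias P(X=1)
M = 10        # number of tosses

# Independent draws from the same distribution: the IID assumption.
tosses = rng.binomial(n=1, p=theta, size=M)
print(tosses)  # array of 0s (tails) and 1s (heads)
```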

  3. IID as a PGM
  [Figure: plate model in which the parameter node θ points into each observed node X[1], ..., X[M], inside a plate labeled "Data"]
  • P(x[m] | θ) = θ if x[m] = x¹, and 1 − θ if x[m] = x⁰

  4. Maximum Likelihood Estimation
  • Goal: find θ ∈ [0, 1] that predicts D well
  • Prediction quality = likelihood of D given θ:
    L(θ : D) = P(D | θ) = ∏_{m=1}^{M} P(x[m] | θ)
  • Example: L(θ : ⟨H, T, T, H, H⟩)
  [Plot: the likelihood L(θ : D) as a function of θ over [0, 1]]
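
A hedged Python sketch of this likelihood for the slide's dataset ⟨H, T, T, H, H⟩, encoding heads as 1 and tails as 0 (an assumption, since the slides use symbolic H/T), evaluated on a grid of θ values:

```python
# Sketch: L(theta : D) = prod_m P(x[m] | theta) for D = <H,T,T,H,H>,
# with heads encoded as 1 and tails as 0 (an assumed encoding).
import numpy as np

D = np.array([1, 0, 0, 1, 1])          # H, T, T, H, H

def likelihood(theta, data):
    # Each toss contributes theta (heads) or 1 - theta (tails).
    return np.prod(np.where(data == 1, theta, 1.0 - theta))

thetas = np.linspace(0.0, 1.0, 101)
L = np.array([likelihood(t, D) for t in thetas])
print(thetas[np.argmax(L)])            # ~0.6 = 3 heads / 5 tosses
```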

  5. Maximum Likelihood Estimator
  • Observations: M_H heads and M_T tails
  • Find θ maximizing the likelihood L(θ : M_H, M_T) = θ^{M_H} (1 − θ)^{M_T}
  • Equivalent to maximizing the log-likelihood l(θ : M_H, M_T) = M_H log θ + M_T log(1 − θ)
  • Differentiating the log-likelihood, setting dl/dθ = M_H/θ − M_T/(1 − θ) = 0, and solving for θ:
    θ̂ = M_H / (M_H + M_T)
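
A small sketch (counts are illustrative) that computes the closed-form estimator and checks it against a brute-force grid search over the log-likelihood:

```python
# Sketch: closed-form MLE from head/tail counts, verified numerically.
import numpy as np

M_H, M_T = 3, 2                        # counts from D = <H,T,T,H,H>

theta_hat = M_H / (M_H + M_T)          # closed-form MLE

# Grid search over the log-likelihood should agree with the closed form.
thetas = np.linspace(1e-6, 1 - 1e-6, 100001)
log_lik = M_H * np.log(thetas) + M_T * np.log(1.0 - thetas)
assert abs(thetas[np.argmax(log_lik)] - theta_hat) < 1e-4
print(theta_hat)                       # 0.6
```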

  6. Sufficient Statistics
  • For computing θ in the coin-toss example, we only needed M_H and M_T, since L(θ : D) = θ^{M_H} (1 − θ)^{M_T}
  • ⇒ M_H and M_T are sufficient statistics

  7. Sufficient Statistics
  • A function s(D) from instances to a vector in ℝ^k is a sufficient statistic if, for any two datasets D and D′ and any θ ∈ Θ:
    ∑_{x[i] ∈ D} s(x[i]) = ∑_{x[i] ∈ D′} s(x[i])  ⇒  L(θ : D) = L(θ : D′)
  [Diagram: many datasets mapping to the same statistics]
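
A sketch of this defining property for the coin example: two different datasets with the same summed statistics (the same head/tail counts ⟨3, 2⟩) induce identical likelihood functions.

```python
# Sketch: equal sufficient statistics imply equal likelihoods.
import numpy as np

def likelihood(theta, data):
    return np.prod(np.where(data == 1, theta, 1.0 - theta))

D1 = np.array([1, 0, 0, 1, 1])         # H, T, T, H, H
D2 = np.array([0, 1, 1, 1, 0])         # T, H, H, H, T: same counts <3, 2>

# s(D1) == s(D2) implies L(theta : D1) == L(theta : D2) for every theta.
thetas = np.linspace(0.0, 1.0, 11)
assert np.allclose([likelihood(t, D1) for t in thetas],
                   [likelihood(t, D2) for t in thetas])
```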

  8. Sufficient Statistic for Multinomial
  • For a dataset D over a variable X with k values, the sufficient statistics are the counts ⟨M₁, ..., M_k⟩, where M_i is the number of times that X[m] = x^i in D
  • The sufficient statistic s(x) is a tuple of dimension k: s(x^i) = (0, ..., 0, 1, 0, ..., 0), with the 1 in position i
  • L(θ : D) = ∏_{i=1}^{k} θ_i^{M_i}
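
A sketch of the multinomial case with illustrative data: the counts ⟨M₁, ..., M_k⟩ summarize the dataset, and the likelihood is computed from them alone.

```python
# Sketch: multinomial sufficient statistics via counting.
import numpy as np

k = 3
data = np.array([0, 2, 1, 0, 0, 2])    # X[m] takes values in {0, 1, 2}

M = np.bincount(data, minlength=k)     # counts <M_1, ..., M_k>
print(M)                               # [3 1 2]

theta = np.array([0.5, 0.2, 0.3])      # an example parameter vector
L = np.prod(theta ** M)                # L(theta : D) = prod_i theta_i^{M_i}
print(L)
```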

  9. Sufficient Statistic for Gaussian
  • Gaussian distribution: P(X) ~ N(μ, σ²) if p(X) = (1 / (√(2π) σ)) exp(−(x − μ)² / (2σ²))
  • Rewrite so the exponent is a linear function of x², x, and 1:
    p(X) = (2πσ²)^{−1/2} exp(−(x² − 2μx + μ²) / (2σ²))
  • Sufficient statistics for the Gaussian: s(x) = ⟨1, x, x²⟩
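
A sketch (sample values are illustrative) showing that the summed statistics ∑1, ∑x, ∑x² are all that is needed: the mean and variance are recoverable from these three numbers alone.

```python
# Sketch: Gaussian sufficient statistics <1, x, x^2> summed over the data.
import numpy as np

rng = np.random.default_rng(seed=0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)   # illustrative sample

# Summed sufficient statistics: sum of 1, sum of x, sum of x^2.
s = np.array([len(x), x.sum(), (x ** 2).sum()])

mu = s[1] / s[0]                       # mean from the sums
var = s[2] / s[0] - mu ** 2            # E[x^2] - (E[x])^2
print(mu, var)                         # ~2.0 and ~2.25 (= 1.5^2)
```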

  10. Maximum Likelihood Estimation
  • MLE Principle: choose θ to maximize L(θ : D)
  • Multinomial MLE:
    θ̂_i = M_i / ∑_i M_i
  • Gaussian MLE:
    μ̂ = (1/M) ∑_m x[m]
    σ̂ = √( (1/M) ∑_m (x[m] − μ̂)² )
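
A sketch of both closed-form MLEs from the slide, with illustrative data; note the Gaussian MLE uses the biased (1/M) variance, not the 1/(M−1) sample variance.

```python
# Sketch: closed-form MLEs for the multinomial and the Gaussian.
import numpy as np

# Multinomial MLE: theta_i = M_i / sum_i M_i.
counts = np.array([3, 1, 2])
theta_hat = counts / counts.sum()
print(theta_hat)                       # [0.5    0.1667 0.3333]

# Gaussian MLE: sample mean, and the 1/M standard deviation.
x = np.array([1.2, 0.7, 2.3, 1.9, 1.4])
mu_hat = x.mean()
sigma_hat = np.sqrt(np.mean((x - mu_hat) ** 2))   # same as x.std(ddof=0)
print(mu_hat, sigma_hat)
```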

  11. Summary
  • Maximum likelihood estimation is a simple principle for parameter selection given D
  • The likelihood function is uniquely determined by sufficient statistics that summarize D
  • The MLE has a closed-form solution for many parametric distributions

