parametric methods

Parametric Methods, Steven J Zeil, Old Dominion Univ., Fall 2010

Distributions | Estimating Distribution Parameters | Parametric Classification | Regression


  1. Parametric Methods. Steven J Zeil, Old Dominion Univ., Fall 2010

  2. Outline
     1. Distributions
     2. Estimating Distribution Parameters
        - Maximum Likelihood Estimation
        - Evaluating an Estimator: Bias and Variance
        - Bayes’ Estimator
     3. Parametric Classification
     4. Regression
        - Regression Error
        - Linear Regression
        - Polynomial Regression
        - Tuning Model Complexity
        - Model Selection

  7. Parametric Methods
     - Assume that the sample is drawn from a known distribution
     - Advantage: the model can be formed from a small number of parameters, e.g., mean and variance
     - Estimate the parameters from the sample to get an estimated distribution
     - Then use that distribution to make decisions

  11. Distributions
      - Classification discriminant function: $g_i(x) = P(C_i)\, p(x \mid C_i)$
      - For classification, we need to estimate the class-conditional densities $p(x \mid C_i)$ and the priors $P(C_i)$
      - For regression, we need to estimate $p(y \mid x)$
      - In this chapter, we use single variables ($\vec{x} = [x]$)

  12. Example Distributions
      - Bernoulli: $x \in \{0, 1\}$, $P(x) = p_0^{x} (1 - p_0)^{1 - x}$
      - Multinomial: $K > 2$ states, $x_i \in \{0, 1\}$, $P(x_1, x_2, \ldots, x_K) = \prod_i p_i^{x_i}$

  13. Example Distributions
      - Gaussian (Normal): $p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{(x - \mu)^2}{2\sigma^2} \right]$

  17. Likelihood
      - iid sample $\mathcal{X} = \{x^t\}_t$, drawn from $p(x \mid \theta)$
      - How do we find the $\theta$ that makes our sample as likely as possible?
      - Because the $x^t$ are independent, the likelihood of $\theta$ given $\mathcal{X}$ is $l(\theta \mid \mathcal{X}) \equiv p(\mathcal{X} \mid \theta) = \prod_{t=1}^{N} p(x^t \mid \theta)$

  18. Maximum Likelihood Estimation (MLE)
      - Likelihood: $l(\theta \mid \mathcal{X}) \equiv p(\mathcal{X} \mid \theta) = \prod_{t=1}^{N} p(x^t \mid \theta)$
      - In MLE, find the $\theta$ that makes $\mathcal{X}$ the most likely to be seen
      - Search for the $\theta$ that maximizes $l(\theta \mid \mathcal{X})$
      - To simplify, we often instead maximize the log likelihood: $L(\theta \mid \mathcal{X}) \equiv \log l(\theta \mid \mathcal{X}) = \sum_{t=1}^{N} \log p(x^t \mid \theta)$
      - Maximum likelihood estimator: $\theta^* = \arg\max_\theta L(\theta \mid \mathcal{X})$
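One practical reason to prefer the log likelihood is numerical: a product of many probabilities underflows in floating point, while the sum of their logs stays finite. The sketch below is an illustration only, assuming a Bernoulli model; the sample, the grid of candidate values, and the function names are made up for this example and are not from the slides. It searches a grid for the maximizer of $L(p_0 \mid \mathcal{X})$ and compares it with the sample mean.

```python
import math
import random

def bernoulli_log_likelihood(p0, sample):
    """L(p0 | X) = sum_t [x^t log p0 + (1 - x^t) log(1 - p0)]."""
    return sum(x * math.log(p0) + (1 - x) * math.log(1 - p0) for x in sample)

def bernoulli_likelihood(p0, sample):
    """l(p0 | X) = prod_t p0^(x^t) * (1 - p0)^(1 - x^t)."""
    return math.prod(p0 if x == 1 else 1 - p0 for x in sample)

# Hypothetical data: 2000 coin flips with an assumed true p0 of 0.3.
random.seed(0)
sample = [1 if random.random() < 0.3 else 0 for _ in range(2000)]

# Candidate values 0.01, 0.02, ..., 0.99 (endpoints excluded so log() is defined).
grid = [i / 100 for i in range(1, 100)]
theta_star = max(grid, key=lambda p0: bernoulli_log_likelihood(p0, sample))
sample_mean = sum(sample) / len(sample)

print("grid argmax of log likelihood:", theta_star)
print("sample mean:", sample_mean)
# The raw product of 2000 probabilities is too small for a float and prints 0.0.
print("raw likelihood at the argmax:", bernoulli_likelihood(theta_star, sample))
```

The grid search is only a stand-in for a real optimizer; for the Bernoulli case a closed-form answer exists, so the search is useful mainly as a check.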

  19. Example MLEs
      - Bernoulli: $x \in \{0, 1\}$, $P(x) = p_0^{x} (1 - p_0)^{1 - x}$
        - $L(p_0 \mid \mathcal{X}) = \log \prod_t p_0^{x^t} (1 - p_0)^{1 - x^t}$
        - MLE: $p_0 = \frac{\sum_t x^t}{N}$
      - Multinomial: $K > 2$ states, $x_i \in \{0, 1\}$, $P(x_1, x_2, \ldots, x_K) = \prod_i p_i^{x_i}$
        - $L(p_1, p_2, \ldots, p_K \mid \mathcal{X}) = \log \prod_t \prod_i p_i^{x_i^t}$
        - MLE: $p_i = \frac{\sum_t x_i^t}{N}$
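The Bernoulli estimate on the slide above follows from setting the derivative of the log likelihood to zero. A short worked derivation, using the shorthand $S = \sum_t x^t$ (a symbol introduced here for convenience, not one from the slides):

```latex
\begin{align*}
L(p_0 \mid \mathcal{X})
  &= \sum_{t=1}^{N} \left[ x^t \log p_0 + (1 - x^t) \log (1 - p_0) \right]
   = S \log p_0 + (N - S) \log (1 - p_0) \\
\frac{dL}{dp_0}
  &= \frac{S}{p_0} - \frac{N - S}{1 - p_0} = 0 \\
\Rightarrow \quad S\,(1 - p_0) &= (N - S)\, p_0
  \quad \Rightarrow \quad p_0 = \frac{S}{N} = \frac{\sum_t x^t}{N}
\end{align*}
```

The multinomial case works the same way, with the added constraint $\sum_i p_i = 1$.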

  20. Example MLEs
      - Gaussian (Normal): $p(x) = \mathcal{N}(\mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{(x - \mu)^2}{2\sigma^2} \right]$
      - MLE for $\mu$: $m = \frac{\sum_t x^t}{N}$
      - MLE for $\sigma^2$: $s^2 = \frac{\sum_t (x^t - m)^2}{N}$
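The closed-form estimates on the example-MLE slides translate directly into code. The following is a minimal sketch using only the Python standard library; the function names and the tiny data sets are assumptions chosen for illustration. The multinomial sample is given as a list of state indices, which is equivalent to summing the one-hot indicators $x_i^t$.

```python
def bernoulli_mle(sample):
    """p0 = sum_t x^t / N for a sample of 0/1 values."""
    return sum(sample) / len(sample)

def multinomial_mle(sample, num_states):
    """p_i = sum_t x_i^t / N, with the sample given as state indices 0..K-1."""
    counts = [0] * num_states
    for state in sample:
        counts[state] += 1
    return [count / len(sample) for count in counts]

def gaussian_mle(sample):
    """Return (m, s2): the sample mean and the variance estimate that divides by N."""
    n = len(sample)
    m = sum(sample) / n
    s2 = sum((x - m) ** 2 for x in sample) / n
    return m, s2

# Hypothetical toy data.
print(bernoulli_mle([1, 0, 1, 1]))         # 3 ones out of 4 -> 0.75
print(multinomial_mle([0, 2, 2, 1], 3))    # counts 1, 1, 2 out of 4 -> [0.25, 0.25, 0.5]
print(gaussian_mle([1.0, 2.0, 3.0]))       # m = 2.0, s2 = 2/3
```

Note that `gaussian_mle` divides by $N$, matching the slide, rather than by $N - 1$ as the usual unbiased sample variance does.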

  21. Bias and Variance
      - Population $\mathcal{X}$ drawn from $p(x \mid \theta)$
      - Estimator of $\theta$: $d_i = d(\mathcal{X}_i)$ on sample $\mathcal{X}_i$
      - Bias: $b_\theta(d) = E[d] - \theta$
      - Variance: $E\left[ (d - E[d])^2 \right]$
      - Mean square error: $r(d, \theta) = E\left[ (d - \theta)^2 \right] = (E[d] - \theta)^2 + E\left[ (d - E[d])^2 \right] = \text{Bias}^2 + \text{Variance}$
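The decomposition on the slide above can be checked by simulation. The sketch below is an illustration under stated assumptions: it takes $\theta$ to be the variance of a Gaussian, uses the divide-by-$N$ estimator $s^2$ as $d$, and picks the sample size, trial count, and true variance arbitrarily. It relies on the standard result, not stated on the slide, that this estimator has expectation $\frac{N-1}{N}\sigma^2$ and is therefore biased low.

```python
import random

def variance_mle(sample):
    """s^2 = sum_t (x^t - m)^2 / N, the divide-by-N variance estimator."""
    n = len(sample)
    m = sum(sample) / n
    return sum((x - m) ** 2 for x in sample) / n

random.seed(1)
true_var = 4.0    # theta: the assumed true sigma^2
n = 5             # a small sample size makes the bias easy to see
trials = 20000    # number of independent samples X_i

# One estimate d_i = d(X_i) per simulated sample.
estimates = []
for _ in range(trials):
    sample = [random.gauss(0.0, true_var ** 0.5) for _ in range(n)]
    estimates.append(variance_mle(sample))

mean_d = sum(estimates) / trials                               # approximates E[d]
bias = mean_d - true_var                                       # E[d] - theta
variance = sum((d - mean_d) ** 2 for d in estimates) / trials  # E[(d - E[d])^2]
mse = sum((d - true_var) ** 2 for d in estimates) / trials     # E[(d - theta)^2]

print("bias:", bias, "(theory: -sigma^2 / N =", -true_var / n, ")")
print("variance:", variance)
print("mse:", mse)
print("bias^2 + variance:", bias ** 2 + variance)
```

With expectations replaced by averages over the trials, the identity holds exactly up to floating-point rounding, so the last two printed values should agree closely.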

  27. Estimators
      - If we have prior knowledge about $\theta$ in the form of a prior $p(\theta)$, Bayes’ rule gives the posterior: $p(\theta \mid \mathcal{X}) = \frac{p(\mathcal{X} \mid \theta)\, p(\theta)}{p(\mathcal{X})} = \frac{p(\mathcal{X} \mid \theta)\, p(\theta)}{\int p(\mathcal{X} \mid \theta')\, p(\theta')\, d\theta'}$
      - Problem: except in special cases, this won’t have a nice, closed-form solution
        - Numerical estimation
        - Use simpler “point” estimators
        - If the form is tractable, we can use Bayes’ estimator

  29. Simpler Estimators
      - Maximum a Posteriori (MAP): $\theta_{MAP} = \arg\max_\theta p(\theta \mid \mathcal{X})$
      - Maximum Likelihood (ML): $\theta_{ML} = \arg\max_\theta p(\mathcal{X} \mid \theta)$

  30. Bayes’ Estimator
      - Bayes: $\theta_{Bayes} = E[\theta \mid \mathcal{X}] = \int \theta\, p(\theta \mid \mathcal{X})\, d\theta$
      - Example: $x^t \sim \mathcal{N}(\theta, \sigma_0^2)$ and $\theta \sim \mathcal{N}(\mu, \sigma^2)$
      - Let $m$ be the mean of the sample
      - By the central limit theorem, the distribution of the sample mean of even a non-normal population is approximately normal, centered on the population mean, with a standard deviation equal to the population standard deviation divided by $\sqrt{N}$ (here $\sigma_0 / \sqrt{N}$)
      - $\theta_{ML} = m$
      - $\theta_{MAP} = \theta_{Bayes} = E[\theta \mid \mathcal{X}] = \frac{N/\sigma_0^2}{N/\sigma_0^2 + 1/\sigma^2}\, m + \frac{1/\sigma^2}{N/\sigma_0^2 + 1/\sigma^2}\, \mu$
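The Gaussian example on the slide above is a precision-weighted average of the sample mean $m$ and the prior mean $\mu$. A minimal sketch of that formula follows; the function name, argument names, and numbers are hypothetical and chosen only to make the weighting visible.

```python
def gaussian_bayes_estimate(sample, sigma0_sq, mu, sigma_sq):
    """E[theta | X] for x^t ~ N(theta, sigma0_sq) with prior theta ~ N(mu, sigma_sq)."""
    n = len(sample)
    m = sum(sample) / n                    # sample mean, which is also theta_ML
    data_precision = n / sigma0_sq         # N / sigma_0^2
    prior_precision = 1.0 / sigma_sq       # 1 / sigma^2
    weight_on_m = data_precision / (data_precision + prior_precision)
    return weight_on_m * m + (1.0 - weight_on_m) * mu

sample = [2.0, 4.0]   # hypothetical data with sample mean m = 3.0

# Prior centered at 0 with the same variance as the data: the weight on m is 2/3,
# so the estimate is pulled from 3.0 toward the prior mean, to about 2.0.
shrunk = gaussian_bayes_estimate(sample, sigma0_sq=1.0, mu=0.0, sigma_sq=1.0)

# A very wide prior carries almost no information, so the estimate approaches m.
nearly_ml = gaussian_bayes_estimate(sample, sigma0_sq=1.0, mu=0.0, sigma_sq=1e12)

print(shrunk, nearly_ml)
```

As $N$ grows, the data precision $N/\sigma_0^2$ dominates and the estimate moves toward $\theta_{ML} = m$; with few samples or a tight prior it stays closer to $\mu$.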
