


  1. Machine Learning: Estimation. Hamid R. Rabiee, Spring 2015. http://ce.sharif.edu/courses//93-94/2/ce717-1/

  2. Agenda  Introduction  Maximum Likelihood Estimation  Maximum A Posteriori Estimation  Bayesian Estimators Sharif University of Technology, Computer Engineering Department, Machine Learning Course

  3. Density Estimation  Model the probability distribution p(x) of a random variable x, given a finite set x1, . . . , xN of observations.  A good estimator is:  Unbiased: the sampling distribution of the estimator centers around the true parameter value  Efficient: has the smallest possible standard error compared to other estimators  Methods for parameter estimation:  Maximum Likelihood Estimation (MLE)  Maximum A Posteriori estimation (MAP)

  4. Likelihood Function  Consider n independent observations of x: x1, ..., xn, where x follows f(x; θ). The joint pdf for the whole data sample is f(x1, ..., xn; θ) = f(x1; θ) · · · f(xn; θ). Now evaluate this function with the data sample obtained and regard it as a function of the parameter(s). This is the likelihood function L(θ) = ∏i f(xi; θ) (with the xi held constant).
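The idea of "fixing the data and varying the parameter" can be made concrete numerically. The sketch below (not from the slides; the sample values and the N(θ, 1) model are illustrative assumptions) evaluates the log-likelihood on a grid of candidate θ and picks the maximizer:

```python
import numpy as np

def log_likelihood(theta, x):
    # log L(theta | x) for iid samples from N(theta, 1)
    return float(np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * (x - theta) ** 2))

x = np.array([1.2, 0.8, 1.5, 0.9, 1.1])    # hypothetical fixed data sample
grid = np.linspace(-2.0, 4.0, 601)          # candidate values of theta (step 0.01)
ll = np.array([log_likelihood(t, x) for t in grid])
theta_star = float(grid[np.argmax(ll)])     # grid maximizer of the likelihood
print(theta_star)
```

For this Gaussian model the grid maximizer lands on the sample mean, anticipating the MLE result derived on the following slides.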

  5. Maximum Likelihood Estimation (MLE)  Likelihood function: L(θ | x) = ∏i f(xi; θ)  For each sample point x, let θ̂(x) be the parameter value at which L(θ | x) attains its maximum as a function of θ.  The MLE of θ based on a sample x is θ̂(x).  The MLE is the parameter point for which the observed sample is most likely.

  6. Maximum Likelihood Estimation (MLE)  If the likelihood function is differentiable (in θj), possible candidates for the MLE are the values (θ1, ..., θk) that solve ∂L(θ | x)/∂θj = 0 for j = 1, ..., k.  Note that the solutions are only candidates: to find the exact MLE we should also check the boundary of the parameter space and the second-order conditions.

  7. Example 1 Adopted from slides of Harvard University

  8. Example 2  MLE for a Gaussian with unknown mean: let x1, x2, ..., xn be iid samples from N(θ, 1). Find the MLE of θ.  Solution: setting the score to zero gives θ̂ = (1/n) Σi xi, the sample mean.
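A quick numerical check of Example 2 (the sample values below are made up for illustration): the score, i.e. the derivative of the log-likelihood of N(θ, 1), is Σi (xi − θ), and it vanishes exactly at the sample mean:

```python
import numpy as np

x = np.array([2.3, 1.7, 2.9, 2.1, 1.5, 2.5])  # hypothetical sample from N(theta, 1)
theta_hat = float(x.mean())                    # the claimed MLE: the sample mean

# The score, d/dtheta log L = sum(x_i - theta), vanishes at theta_hat
score = float(np.sum(x - theta_hat))
print(theta_hat, score)
```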

  9. Maximum Likelihood Estimation (MLE)  Sometimes it is more convenient to use the log likelihood.  Let x1, x2, ..., xn be iid samples from Bernoulli(p); then the likelihood function is L(p | x) = p^(Σi xi) (1 − p)^(n − Σi xi).  Invariance: if θ̂ is the MLE of θ, then for any function τ(θ) the MLE of τ(θ) is τ(θ̂).
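A small sketch of both points on this slide (the 0/1 sample is hypothetical): maximizing the Bernoulli log-likelihood gives p̂ = (fraction of successes), and by the invariance property the MLE of the odds τ(p) = p/(1 − p) is obtained by simply plugging in p̂:

```python
import numpy as np

x = np.array([1, 0, 1, 1, 0, 1, 1, 0])  # hypothetical Bernoulli(p) sample
p_hat = float(x.mean())                  # MLE: fraction of successes (5/8 here)

# Invariance: the MLE of the odds tau(p) = p / (1 - p) is tau(p_hat)
odds_hat = p_hat / (1 - p_hat)
print(p_hat, odds_hat)
```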

  10. Example 3  MLE for a Gaussian with unknown mean and variance  Let x1, x2, ..., xN be iid samples from N(μ, σ2). Find the MLE for θ = (μ, σ2).  Solution: μ̂ = (1/N) Σi xi and σ̂2 = (1/N) Σi (xi − μ̂)2.  Prove that the MLE for the variance of a Gaussian is biased!
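The bias claimed in the exercise can be seen empirically. This simulation (the sample size, trial count, and seed are arbitrary choices, not from the slides) averages the divide-by-N variance estimate over many small samples; it tends to (N − 1)/N · σ², not σ²:

```python
import numpy as np

rng = np.random.default_rng(42)
N, trials = 5, 20_000

# Average the MLE variance estimate (divide by N, i.e. ddof=0) over many N(0, 1) samples
var_mle = np.array([rng.normal(0.0, 1.0, N).var(ddof=0) for _ in range(trials)])
print(var_mle.mean())   # tends to (N - 1)/N * sigma^2 = 0.8 rather than sigma^2 = 1
```

Dividing by N − 1 instead (ddof=1) would remove the bias, which is exactly the content of the exercise.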

  11. Property of MLE  To use two-variable calculus to verify that a function H(θ1, θ2) has a maximum at (θ̂1, θ̂2), it must be shown that the following three conditions hold: a) the first-order partial derivatives are zero; b) at least one second-order partial derivative is negative; c) the determinant of the matrix of second-order derivatives (the Hessian) is positive.

  12. Example 4  MLE for the Multinomial distribution (Hint: use Lagrange multipliers)  Solution: maximizing Σj Nj log pj subject to Σj pj = 1 gives p̂j = Nj / N, the fraction of observations in category j.
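The Lagrange-multiplier solution is simple enough to verify directly (the category counts below are hypothetical): the MLE of each multinomial probability is its empirical frequency, and the estimates sum to one as the constraint requires:

```python
import numpy as np

counts = np.array([30, 50, 20])    # hypothetical category counts N_j, N = 100
p_hat = counts / counts.sum()      # Lagrange-multiplier solution: p_j = N_j / N
print(p_hat, p_hat.sum())
```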

  13. MLE: Multinomial distribution

  14. Example 5  MLE for the uniform distribution U(0, θ)  Solution: the likelihood is L(θ | x) = θ^(−n) 1[θ ≥ max xi], where 1[·] is the indicator function; it is decreasing in θ on the feasible region, so θ̂ = max xi.
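Example 5 is a case where the derivative condition fails (the maximum sits on the boundary enforced by the indicator), yet the MLE is trivial to compute. A sketch with a made-up sample:

```python
import numpy as np

x = np.array([0.4, 2.7, 1.1, 3.6, 0.9])  # hypothetical sample from U(0, theta)

# L(theta | x) = theta^(-n) * 1[theta >= max x_i] is maximized at the sample maximum
theta_hat = float(x.max())
print(theta_hat)
```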

  15. Maximum A Posteriori Estimation  Approximation:  Instead of averaging over all parameter values,  consider only the most probable value (i.e., the value with the highest posterior probability).  Usually a very good approximation, and much simpler.  MAP value ≠ Expected value  MAP → ML for infinite data (as long as the prior is nonzero everywhere)  Given a set of observations D and a prior distribution on the parameters, find the parameter vector θ that maximizes p(D | θ) p(θ).

  16. Maximum A Posteriori Estimation  Priors:  Uninformative priors: e.g., the uniform distribution  Conjugate priors: closed-form representation of the posterior; P(θ) and P(θ | D) have the same form. Examples of conjugate pairs: Binomial – Beta; Multinomial – Dirichlet.
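The Binomial–Beta pair from the table makes conjugacy concrete. In this sketch (the hyperparameters and data are illustrative assumptions), a Beta(a, b) prior combined with k successes in n trials yields a Beta(a + k, b + n − k) posterior, whose mode is the MAP estimate:

```python
# Conjugate update: Beta(a, b) prior + Binomial data -> Beta(a + k, b + n - k)
a, b = 2.0, 2.0          # hypothetical Beta prior hyperparameters
n, k = 10, 7             # hypothetical data: k successes in n trials

a_post, b_post = a + k, b + (n - k)            # posterior is again a Beta distribution
map_p = (a_post - 1) / (a_post + b_post - 2)   # mode of Beta(a_post, b_post)
print(a_post, b_post, map_p)
```

Note how the posterior has the same functional form as the prior, so repeated updates stay in closed form; this is exactly what makes conjugate priors convenient.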

  17. MAP vs. MLE Adopted from slides of A. Zisserman

  18. MAP vs. MLE  MLE: choose the value that maximizes the probability of the observed data; it may suffer from overfitting.  MAP: choose the value that is most probable given the observed data and the prior belief; it can avoid overfitting.  When are MAP and MLE the same?

  19. Example 6  MAP for a Gaussian with unknown mean and a Gaussian prior  Let x1, x2, ..., xN be iid samples from N(μ, σ2) with prior μ ~ N(μ0, σ02). Find the MAP estimate of μ.  Solution: μ̂MAP = (μ0/σ02 + Σi xi/σ2) / (1/σ02 + N/σ2), a precision-weighted average of the prior mean and the sample mean.
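The precision-weighted form of the Gaussian MAP estimate is easy to evaluate. In this sketch all numbers (data, known likelihood variance, prior hyperparameters) are made up for illustration; note how a broad prior leaves the estimate close to the sample mean:

```python
import numpy as np

x = np.array([4.2, 5.1, 4.8, 5.5])   # hypothetical sample from N(mu, sigma^2)
sigma2 = 1.0                          # likelihood variance (assumed known)
mu0, sigma0_2 = 0.0, 10.0             # hypothetical prior N(mu0, sigma0^2)

N = len(x)
# MAP for a Gaussian mean: precision-weighted average of prior mean and data
mu_map = (mu0 / sigma0_2 + x.sum() / sigma2) / (1 / sigma0_2 + N / sigma2)
print(float(mu_map))
```

As σ0² → ∞ the prior term vanishes and μ̂MAP → x̄, which is one answer to the "when are MAP and MLE the same?" question on slide 18.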

  20. Bayes Estimators  Suppose that we have a prior distribution for θ: π(θ).  Let f(x | θ) be the sampling distribution; then the conditional distribution of θ given the sample x is π(θ | x) = f(x | θ) π(θ) / m(x), where m(x) is the marginal distribution of x: m(x) = ∫ f(x | θ) π(θ) dθ.
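Unlike MLE and MAP, a Bayes estimator uses the whole posterior. The sketch below (sample, prior, and grid are illustrative assumptions) builds π(θ | x) ∝ f(x | θ) π(θ) on a grid for an N(θ, 1) likelihood with an N(0, 4) prior, normalizes it (the role of m(x)), and reports the posterior mean, i.e. the Bayes estimate under squared-error loss:

```python
import numpy as np

x = np.array([1.8, 2.4, 2.1])                 # hypothetical observed sample
grid = np.linspace(-5.0, 10.0, 3001)          # discretized values of theta

prior = np.exp(-0.5 * grid**2 / 4.0)          # unnormalized N(0, 4) prior pi(theta)
like = np.array([np.exp(-0.5 * np.sum((x - t) ** 2)) for t in grid])  # N(theta, 1)

post = prior * like                           # unnormalized posterior f(x|theta) pi(theta)
post /= post.sum()                            # normalization plays the role of m(x)
posterior_mean = float(np.sum(grid * post))   # Bayes estimate under squared loss
print(posterior_mean)
```

For this conjugate Gaussian–Gaussian pair the grid answer matches the closed form (Σ xi) / (1/σ0² + N), which is also what Example 7 derives analytically.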

  21. Example 7  Estimation for a Gaussian with unknown mean and a Gaussian prior  Let x1, ..., xN be iid samples from N(θ, σ02) with prior θ ~ N(μ, σ2).  Solution: the posterior of θ is again Gaussian, and its mean is a precision-weighted average of the prior mean μ and the sample mean.

  22. Bayesian Estimators  Both ML and MAP return only a single, specific value for the parameter Θ. Bayesian estimation, by contrast, computes the full posterior distribution Prob(Θ | X).  If the prior is well-behaved (i.e., does not assign 0 density to any "feasible" parameter value), then both the MLE and the Bayesian prediction converge to the same value as the number of training data increases.

  23. Any Questions? End of Lecture 2. Thank you! Spring 2015 http://ce.sharif.edu/courses//93-94/2/ce717-1/
