12. Principles of Parameter Estimation

The purpose of this lecture is to illustrate the usefulness of the various concepts introduced and studied in earlier lectures on practical problems of interest. In this context, consider the problem of estimating an unknown parameter θ of interest from a few of its noisy observations. For example, determining the daily temperature in a city, or the depth of a river at a particular spot, are problems that fall into this category. Observations (measurements) are made on data that contain the desired nonrandom parameter θ and undesired noise. Thus, for example,

PILLAI

$$\text{Observation} = \text{signal (desired part)} + \text{noise}. \tag{12-1}$$

Further, the $i$-th observation can be represented as

$$X_i = \theta + n_i, \qquad i = 1, 2, \ldots, n. \tag{12-2}$$

Here θ represents the unknown nonrandom desired parameter, and $n_i$, $i = 1, 2, \ldots, n$, represent random variables that may be dependent or independent from observation to observation. Given $n$ observations $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$, the estimation problem is to obtain the "best" estimator for the unknown parameter θ in terms of these observations. Let us denote by $\hat\theta(X)$ the estimator for θ. Obviously $\hat\theta(X)$ is a function of only the observations. "Best estimator" in what sense? Various optimization strategies can be used to define the term "best."


The ideal solution would be for the estimate to coincide with the unknown θ. This of course may not be possible, and almost always any estimate will result in an error given by

$$e = \hat\theta(X) - \theta. \tag{12-3}$$

One strategy would be to select the estimator $\hat\theta(X)$ so as to minimize some function of this error, such as the mean square error (MMSE) or the absolute value of the error. A more fundamental approach is the principle of Maximum Likelihood (ML). The underlying assumption in any estimation problem is


that the available data $X_1, X_2, \ldots, X_n$ has something to do with the unknown parameter θ. More precisely, we assume that the joint p.d.f. of $X_1, X_2, \ldots, X_n$, given by $f_X(x_1, x_2, \ldots, x_n; \theta)$, depends on θ. The method of maximum likelihood assumes that the given sample data set is representative of the population and chooses that value for θ that most likely caused the observed data to occur; i.e., once the observations $x_1, x_2, \ldots, x_n$ are given, $f_X(x_1, x_2, \ldots, x_n; \theta)$ is a function of θ alone, and the value of θ that maximizes this p.d.f. is the most likely value for θ. It is chosen as the ML estimate $\hat\theta_{ML}(X)$ for θ (Fig. 12.1).

[Fig. 12.1: the likelihood $f_X(x_1, \ldots, x_n; \theta)$ plotted as a function of θ, with its maximum at $\hat\theta_{ML}(X)$.]


Given $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$, the joint p.d.f. $f_X(x_1, x_2, \ldots, x_n; \theta)$ represents the likelihood function, and the ML estimate can be determined either from the likelihood equation

$$\sup_{\theta} f_X(x_1, x_2, \ldots, x_n; \theta) = f_X(x_1, x_2, \ldots, x_n; \hat\theta_{ML}) \tag{12-4}$$

(sup in (12-4) represents the supremum operation), or using the log-likelihood function

$$L(x_1, x_2, \ldots, x_n; \theta) = \log f_X(x_1, x_2, \ldots, x_n; \theta). \tag{12-5}$$

If $L(x_1, x_2, \ldots, x_n; \theta)$ is differentiable and a supremum $\hat\theta_{ML}$ exists in (12-5), then it must satisfy the equation

$$\left.\frac{\partial \log f_X(x_1, x_2, \ldots, x_n; \theta)}{\partial \theta}\right|_{\theta = \hat\theta_{ML}} = 0. \tag{12-6}$$

We will illustrate the above procedure through several examples.
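The recipe in (12-4)-(12-6) can also be carried out numerically. The sketch below uses a model of our own choosing (exponential observations with unknown rate θ, not one of the lecture's examples), where the likelihood equation (12-6) gives the closed form $\hat\theta_{ML} = 1/\bar{x}$; a crude grid search for the supremum in (12-4) should land on the same value.

```python
import math
import random

# Sketch of (12-4)-(12-6) on an illustrative model of our own choosing:
# exponential observations with unknown rate theta, so that
# log f_X(x_1..x_n; theta) = n*log(theta) - theta*sum(x_i).

def log_likelihood(n, s, theta):
    # log-likelihood (12-5) for the exponential model; s = sum of the data
    return n * math.log(theta) - theta * s

def ml_estimate(xs, lo=1e-3, hi=10.0, steps=20000):
    # crude grid search for the supremum in (12-4)
    n, s = len(xs), sum(xs)
    cand = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return max(cand, key=lambda t: log_likelihood(n, s, t))

random.seed(1)
xs = [random.expovariate(2.5) for _ in range(5000)]  # true rate 2.5
closed_form = len(xs) / sum(xs)   # solves (12-6) analytically for this model
print(ml_estimate(xs), closed_form)
```

The grid search and the analytic root of (12-6) agree to within the grid spacing; with a differentiable log-likelihood, solving (12-6) directly is of course preferable.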


Example 12.1: Let $X_i = \theta + w_i$, $i = 1, \ldots, n$, represent $n$ observations, where θ is the unknown parameter of interest and $w_i$, $i = 1, \ldots, n$, are zero-mean independent normal r.v.s with common variance $\sigma^2$. Determine the ML estimate for θ.

Solution: Since the $w_i$ are independent r.v.s and θ is an unknown constant, the $X_i$'s are independent normal random variables. Thus the likelihood function takes the form

$$f_X(x_1, x_2, \ldots, x_n; \theta) = \prod_{i=1}^{n} f_{X_i}(x_i; \theta). \tag{12-7}$$

Moreover, each $X_i$ is Gaussian with mean θ and variance $\sigma^2$ (why?). Thus

$$f_{X_i}(x_i; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x_i-\theta)^2/2\sigma^2}. \tag{12-8}$$

Substituting (12-8) into (12-7), we get the likelihood function to be


$$f_X(x_1, x_2, \ldots, x_n; \theta) = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\sum_{i=1}^{n}(x_i-\theta)^2/2\sigma^2}. \tag{12-9}$$

It is easier to work with the log-likelihood function $L(X;\theta)$ in this case:

$$L(X;\theta) = \ln f_X(x_1, x_2, \ldots, x_n; \theta) = -\frac{n}{2}\ln(2\pi\sigma^2) - \sum_{i=1}^{n}\frac{(x_i-\theta)^2}{2\sigma^2}. \tag{12-10}$$

From (12-10), taking the derivative with respect to θ as in (12-6), we get

$$\left.\frac{\partial \ln f_X(x_1, \ldots, x_n; \theta)}{\partial\theta}\right|_{\theta=\hat\theta_{ML}} = \frac{1}{\sigma^2}\sum_{i=1}^{n}\left(x_i - \hat\theta_{ML}\right) = 0, \tag{12-11}$$

or

$$\hat\theta_{ML}(X) = \frac{1}{n}\sum_{i=1}^{n} X_i. \tag{12-12}$$

Thus (12-12) represents the ML estimate for θ, which happens to be a linear estimator (a linear function of the data) in this case.
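As a quick sanity check on Example 12.1 (a sketch; the sample size, seed, and candidate offsets are arbitrary choices), the code below evaluates the log-likelihood (12-10) on simulated data and confirms that the sample mean (12-12) beats nearby candidate values of θ.

```python
import math
import random

# Numerical check of Example 12.1: for X_i = theta + w_i with Gaussian
# noise, the log-likelihood (12-10) is maximized at the sample mean (12-12).

def log_likelihood(xs, theta, sigma=1.0):
    # ln f_X(x_1..x_n; theta) as in (12-10)
    n = len(xs)
    return (-0.5 * n * math.log(2 * math.pi * sigma**2)
            - sum((x - theta) ** 2 for x in xs) / (2 * sigma**2))

random.seed(0)
theta_true = 3.0
xs = [theta_true + random.gauss(0.0, 1.0) for _ in range(1000)]

theta_ml = sum(xs) / len(xs)   # (12-12): the sample mean
# the log-likelihood at the sample mean beats nearby candidate values
for cand in (theta_ml - 0.1, theta_ml + 0.1):
    assert log_likelihood(xs, theta_ml) > log_likelihood(xs, cand)
print(theta_ml)
```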


Notice that the estimator $\hat\theta_{ML}(X)$ is a r.v. Taking its expected value, we get

$$E[\hat\theta_{ML}(X)] = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \theta, \tag{12-13}$$

i.e., the expected value of the estimator does not differ from the desired parameter, and hence there is no bias between the two. Such estimators are known as unbiased estimators. Thus (12-12) represents an unbiased estimator for θ. Moreover, the variance of the estimator is given by

$$\mathrm{Var}(\hat\theta_{ML}) = E\big[(\hat\theta_{ML}-\theta)^2\big] = E\left[\left(\frac{1}{n}\sum_{i=1}^{n} X_i - \theta\right)^2\right] = \frac{1}{n^2}\left\{\sum_{i=1}^{n} E\big[(X_i-\theta)^2\big] + \sum_{\substack{i,j=1\\ i\neq j}}^{n} E\big[(X_i-\theta)(X_j-\theta)\big]\right\}.$$

The latter terms are zeros since $X_i$ and $X_j$ are independent r.v.s.


Then

$$\mathrm{Var}(\hat\theta_{ML}) = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(X_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}. \tag{12-14}$$

Thus

$$\mathrm{Var}(\hat\theta_{ML}) \to 0 \quad \text{as } n \to \infty, \tag{12-15}$$

another desired property. We say such estimators (that satisfy (12-15)) are consistent estimators.

The next two examples show that the ML estimator can be highly nonlinear.

Example 12.2: Let $X_1, X_2, \ldots, X_n$ be independent, identically distributed uniform random variables in the interval $(0, \theta)$ with common p.d.f.

$$f_{X_i}(x_i; \theta) = \frac{1}{\theta}, \qquad 0 < x_i < \theta, \tag{12-16}$$


where θ is an unknown parameter. Find the ML estimate for θ.

Solution: The likelihood function in this case is given by

$$f_X(X_1 = x_1, \ldots, X_n = x_n; \theta) = \frac{1}{\theta^n}, \qquad 0 < x_i \le \theta,\ i = 1, \ldots, n \;\Longrightarrow\; \theta \ge \max(x_1, x_2, \ldots, x_n). \tag{12-17}$$

From (12-17), the likelihood function is maximized by the minimum value of θ, and since $\theta \ge \max(X_1, X_2, \ldots, X_n)$, we get

$$\hat\theta_{ML}(X) = \max(X_1, X_2, \ldots, X_n) \tag{12-18}$$

to be the ML estimate for θ. Notice that (12-18) represents a nonlinear function of the observations. To determine whether (12-18) represents an unbiased estimate for θ, we need to evaluate its mean. To accomplish that, it is easier to determine its p.d.f. and proceed directly. Let


$$Z = \max(X_1, X_2, \ldots, X_n) \tag{12-19}$$

with $X_i$ as in (12-16). Then

$$F_Z(z) = P\big[\max(X_1, \ldots, X_n) \le z\big] = P(X_1 \le z, X_2 \le z, \ldots, X_n \le z) = \prod_{i=1}^{n} P(X_i \le z) = \prod_{i=1}^{n} F_{X_i}(z) = \left(\frac{z}{\theta}\right)^n, \qquad 0 < z < \theta, \tag{12-20}$$

so that

$$f_Z(z) = \begin{cases} \dfrac{n z^{n-1}}{\theta^n}, & 0 < z < \theta, \\[4pt] 0, & \text{otherwise.} \end{cases} \tag{12-21}$$

Using (12-21), we get

$$E[\hat\theta_{ML}(X)] = E(Z) = \int_0^\theta z f_Z(z)\,dz = \frac{n}{\theta^n}\int_0^\theta z^n\,dz = \frac{n\theta}{n+1} = \frac{\theta}{1 + 1/n}. \tag{12-22}$$

In this case $E[\hat\theta_{ML}(X)] \ne \theta$, and hence the ML estimator is not an unbiased estimator for θ.
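The bias in (12-22) is easy to probe by simulation. The sketch below (θ, the sample size, and the trial count are arbitrary choices) compares the Monte Carlo mean of $\max(X_1, \ldots, X_n)$ against the prediction $n\theta/(n+1)$.

```python
import random

# Monte Carlo illustration of the bias in (12-22): for i.i.d. Uniform(0, theta)
# samples, E[max(X_1..X_n)] = n*theta/(n+1), which is strictly below theta.

random.seed(42)
theta, n, trials = 2.0, 10, 20000

est = [max(random.uniform(0.0, theta) for _ in range(n)) for _ in range(trials)]
mean_est = sum(est) / trials

predicted = n * theta / (n + 1)   # (12-22)
print(mean_est, predicted)
```

The empirical mean sits close to $n\theta/(n+1)$ and below θ, matching (12-22); increasing $n$ shrinks the gap, previewing the asymptotic unbiasedness shown next.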


However, from (12-22), as $n \to \infty$,

$$\lim_{n\to\infty} E[\hat\theta_{ML}(X)] = \lim_{n\to\infty} \frac{\theta}{1+1/n} = \theta, \tag{12-23}$$

i.e., the ML estimator is an asymptotically unbiased estimator. From (12-21), we also get

$$E(Z^2) = \int_0^\theta z^2 f_Z(z)\,dz = \frac{n}{\theta^n}\int_0^\theta z^{n+1}\,dz = \frac{n\theta^2}{n+2}, \tag{12-24}$$

so that

$$\mathrm{Var}[\hat\theta_{ML}(X)] = E(Z^2) - [E(Z)]^2 = \frac{n\theta^2}{n+2} - \frac{n^2\theta^2}{(n+1)^2} = \frac{n\theta^2}{(n+1)^2(n+2)}. \tag{12-25}$$

Once again $\mathrm{Var}[\hat\theta_{ML}(X)] \to 0$ as $n \to \infty$, implying that the estimator in (12-18) is a consistent estimator.

Example 12.3: Let $X_1, X_2, \ldots, X_n$ be i.i.d. Gamma random variables with unknown parameters α and β. Determine the ML estimators for α and β.


Solution: Here each $x_i \ge 0$, and

$$f_X(x_1, x_2, \ldots, x_n; \alpha, \beta) = \frac{\beta^{n\alpha}}{(\Gamma(\alpha))^n}\left(\prod_{i=1}^{n} x_i^{\alpha-1}\right) e^{-\beta\sum_{i=1}^{n} x_i}. \tag{12-26}$$

This gives the log-likelihood function to be

$$L(x_1, \ldots, x_n; \alpha, \beta) = \log f_X(x_1, \ldots, x_n; \alpha, \beta) = n\alpha\log\beta - n\log\Gamma(\alpha) + (\alpha-1)\sum_{i=1}^{n}\log x_i - \beta\sum_{i=1}^{n} x_i. \tag{12-27}$$

Differentiating $L$ with respect to α and β, we get

$$\left.\frac{\partial L}{\partial\alpha}\right|_{\hat\alpha,\hat\beta} = n\log\hat\beta - n\frac{\Gamma'(\hat\alpha)}{\Gamma(\hat\alpha)} + \sum_{i=1}^{n}\log x_i = 0, \tag{12-28}$$

$$\left.\frac{\partial L}{\partial\beta}\right|_{\hat\alpha,\hat\beta} = \frac{n\hat\alpha}{\hat\beta} - \sum_{i=1}^{n} x_i = 0. \tag{12-29}$$

Thus from (12-29),

$$\hat\beta_{ML}(X) = \frac{\hat\alpha_{ML}}{\frac{1}{n}\sum_{i=1}^{n} x_i}, \tag{12-30}$$


and substituting (12-30) into (12-28), it gives

$$\log\hat\alpha_{ML} - \frac{\Gamma'(\hat\alpha_{ML})}{\Gamma(\hat\alpha_{ML})} = \log\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right) - \frac{1}{n}\sum_{i=1}^{n}\log x_i. \tag{12-31}$$

Notice that (12-31) is highly nonlinear in $\hat\alpha_{ML}$. In general the (log-)likelihood function can have more than one solution, or no solutions at all. Further, the (log-)likelihood function may not even be differentiable, or it can be extremely complicated to solve explicitly (see Example 12.3, equation (12-31)).
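Equation (12-31) has to be solved numerically. One possible sketch (our own construction, not from the text) exploits the fact that $g(\alpha) = \log\alpha - \Gamma'(\alpha)/\Gamma(\alpha)$ is positive and strictly decreasing, so simple bisection works. The `digamma` helper below is a crude finite-difference approximation of $\Gamma'(\alpha)/\Gamma(\alpha)$ via `math.lgamma`; in practice one would use a library routine such as `scipy.special.digamma`.

```python
import math
import random

# Numerical sketch of Example 12.3: solve (12-31) for alpha_hat by
# bisection, then obtain beta_hat from (12-30).

def digamma(a, h=1e-6):
    # psi(a) = d/da log Gamma(a), approximated by a central difference
    return (math.lgamma(a + h) - math.lgamma(a - h)) / (2 * h)

def gamma_ml(xs):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_log = sum(math.log(x) for x in xs) / n
    c = math.log(mean_x) - mean_log        # right-hand side of (12-31)
    # g(a) = log a - psi(a) is positive and strictly decreasing, so bisect
    lo, hi = 1e-3, 1e3
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if math.log(mid) - digamma(mid) > c:
            lo = mid
        else:
            hi = mid
    alpha_hat = 0.5 * (lo + hi)
    beta_hat = alpha_hat / mean_x          # (12-30)
    return alpha_hat, beta_hat

random.seed(7)
# note: random.gammavariate takes a *scale* parameter, so scale = 1/beta
alpha_true, beta_true = 3.0, 2.0
xs = [random.gammavariate(alpha_true, 1.0 / beta_true) for _ in range(5000)]
print(gamma_ml(xs))
```

With a few thousand samples the recovered pair lands close to the true (α, β), illustrating that the nonlinear equation (12-31), though not solvable in closed form, is easy to handle numerically.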

Best Unbiased Estimator: Referring back to Example 12.1, we have seen that (12-12) represents an unbiased estimator for θ with variance given by (12-14). It is possible that, for a given $n$, there may be other unbiased estimators for this problem with even lower variances. If such is indeed the case, those estimators will naturally be preferable to (12-12). In a given scenario, is it possible to determine the lowest possible value for the variance of any unbiased estimator? Fortunately, a theorem by Cramér and Rao (Rao 1945; Cramér 1948) gives a complete answer to this problem.

Cramér-Rao Bound: The variance of any unbiased estimator $\hat\theta$ for θ based on observations $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$ must satisfy the lower bound

$$\mathrm{Var}(\hat\theta) \;\ge\; \frac{1}{E\left[\left(\dfrac{\partial\ln f_X(x_1,\ldots,x_n;\theta)}{\partial\theta}\right)^2\right]} \;=\; \frac{-1}{E\left[\dfrac{\partial^2\ln f_X(x_1,\ldots,x_n;\theta)}{\partial\theta^2}\right]}. \tag{12-32}$$

This important result states that the right side of (12-32) acts as a lower bound on the variance of all unbiased estimators for θ, provided their joint p.d.f. satisfies certain regularity restrictions (see (8-79)-(8-81), Text).


Naturally, any unbiased estimator whose variance coincides with the bound in (12-32) must be the best; there are no better solutions! Such estimators are known as efficient estimators. Let us examine whether (12-12) represents an efficient estimator. Toward this, using (12-11),

$$\left(\frac{\partial\ln f_X(x_1,\ldots,x_n;\theta)}{\partial\theta}\right)^2 = \frac{1}{\sigma^4}\left(\sum_{i=1}^{n}(X_i-\theta)\right)^2, \tag{12-33}$$

and

$$E\left[\left(\frac{\partial\ln f_X(x_1,\ldots,x_n;\theta)}{\partial\theta}\right)^2\right] = \frac{1}{\sigma^4}\left\{\sum_{i=1}^{n}E\big[(X_i-\theta)^2\big] + \sum_{\substack{i,j=1\\ i\neq j}}^{n}E\big[(X_i-\theta)(X_j-\theta)\big]\right\} = \frac{n\sigma^2}{\sigma^4} = \frac{n}{\sigma^2}. \tag{12-34}$$

Substituting this into the first form on the right side of (12-32), we obtain the Cramér-Rao lower bound for this problem to be


$$\mathrm{Var}(\hat\theta) \ge \frac{\sigma^2}{n}. \tag{12-35}$$

But from (12-14) the variance of the ML estimator in (12-12) is the same as (12-35), implying that (12-12) indeed represents an efficient estimator in this case, the best of all possibilities! It is possible that in certain cases there are no unbiased estimators that are efficient. In that case, the best estimator will be an unbiased estimator with the lowest possible variance. How does one find such an unbiased estimator? Fortunately, the Rao-Blackwell theorem (pages 335-337, Text) gives a complete answer to this problem. The Cramér-Rao bound can be extended to the multiparameter case as well (see pages 343-345, Text).
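The efficiency claim can be probed by simulation: the empirical variance of the sample mean (12-12) should sit at the Cramér-Rao floor $\sigma^2/n$ of (12-35). This is a sketch; the trial count is an arbitrary choice and the match is approximate, not exact.

```python
import random
import statistics

# Monte Carlo check of efficiency: for Gaussian data, the variance of the
# sample mean should match the Cramer-Rao lower bound sigma^2/n in (12-35).

random.seed(3)
theta, sigma, n, trials = 1.0, 2.0, 25, 40000

means = []
for _ in range(trials):
    xs = [random.gauss(theta, sigma) for _ in range(n)]
    means.append(sum(xs) / n)

empirical = statistics.pvariance(means)
bound = sigma ** 2 / n            # (12-35): 4/25 = 0.16
print(empirical, bound)
```

No unbiased estimator built from these observations can beat `bound`; the sample mean attains it, which is exactly what "efficient" means here.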


So far, we have discussed nonrandom parameters that are unknown. What if the parameter of interest is a r.v. with a-priori p.d.f. $f_\theta(\theta)$? How does one obtain a good estimate for θ based on the observations $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$? One technique is to use the observations to compute its a-posteriori probability density function $f_{\theta|X}(\theta \mid x_1, x_2, \ldots, x_n)$. Of course, we can use Bayes' theorem in (11-22) to obtain this a-posteriori p.d.f. This gives

$$f_{\theta|X}(\theta \mid x_1, x_2, \ldots, x_n) = \frac{f_{X|\theta}(x_1, x_2, \ldots, x_n \mid \theta)\, f_\theta(\theta)}{f_X(x_1, x_2, \ldots, x_n)}. \tag{12-36}$$

Notice that (12-36) is only a function of θ, since $x_1, x_2, \ldots, x_n$ represent given observations.

Once again, we can look for the most probable value of θ suggested by the above a-posteriori p.d.f. Naturally, the most likely value for θ is the one corresponding to the maximum of the a-posteriori p.d.f. (see Fig. 12.2). This estimator, the maximum of the a-posteriori p.d.f. $\hat\theta_{MAP}$, is known as the MAP estimator for θ. It is possible to use other optimality criteria as well. Of course, that should be the subject matter of another course!

[Fig. 12.2: the a-posteriori p.d.f. $f_{\theta|X}(\theta \mid x_1, \ldots, x_n)$ as a function of θ, with its maximum at $\hat\theta_{MAP}$.]
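As a closing illustration (our own numbers, not from the text): for a Gaussian prior on θ and a Gaussian likelihood, the a-posteriori p.d.f. (12-36) is again Gaussian, so the MAP estimate has a well-known closed form, a precision-weighted blend of the prior mean and the sample mean.

```python
import random

# Minimal MAP sketch: prior theta ~ N(mu0, tau^2), likelihood
# X_i ~ N(theta, sigma^2).  Maximizing the posterior (12-36) in theta
# gives the standard precision-weighted closed form below.

random.seed(5)
mu0, tau = 0.0, 1.0            # a-priori p.d.f. f_theta: N(mu0, tau^2)
sigma, theta_true, n = 2.0, 1.5, 50
xs = [random.gauss(theta_true, sigma) for _ in range(n)]

xbar = sum(xs) / n
# MAP estimate: maximum of the Gaussian posterior (its mean)
theta_map = (mu0 / tau**2 + n * xbar / sigma**2) / (1 / tau**2 + n / sigma**2)
theta_ml = xbar                 # the ML estimate (12-12) ignores the prior
print(theta_map, theta_ml)
```

The MAP estimate always lies between the prior mean and the sample mean, and as $n$ grows the data term dominates, so $\hat\theta_{MAP}$ approaches $\hat\theta_{ML}$.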