12. Principles of Parameter Estimation

The purpose of this lecture is to illustrate the usefulness of the various concepts introduced and studied in earlier lectures on practical problems of interest. In this context, consider the problem of estimating an unknown parameter θ of interest from a few of its noisy observations. For example, determining the daily temperature in a city, or the depth of a river at a particular spot, are problems that fall into this category. Observations (measurements) are made on data that contain the desired nonrandom parameter θ and undesired noise. Thus, for example,

PILLAI

$$\text{Observation} = \text{signal (desired part)} + \text{noise}. \tag{12-1}$$

Further, the $i$-th observation can be represented as

$$X_i = \theta + n_i, \qquad i = 1, 2, \ldots, n. \tag{12-2}$$

Here θ represents the unknown nonrandom desired parameter, and $n_i$, $i = 1, 2, \ldots, n$, represent random variables that may be dependent or independent from observation to observation. Given $n$ observations $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$, the estimation problem is to obtain the "best" estimator for the unknown parameter θ in terms of these observations. Let us denote by $\hat\theta(X)$ the estimator for θ. Obviously $\hat\theta(X)$ is a function of only the observations. "Best estimator" in what sense? Various optimization strategies can be used to define the term "best."


The ideal solution would be for the estimate to coincide with the unknown θ. This of course may not be possible, and almost always any estimate will result in an error given by

$$e = \hat\theta(X) - \theta. \tag{12-3}$$

One strategy would be to select the estimator $\hat\theta(X)$ so as to minimize some function of this error, such as the mean square error (MMSE) or the absolute value of the error. A more fundamental approach is the principle of Maximum Likelihood (ML). The underlying assumption in any estimation problem is


that the available data $X_1, X_2, \ldots, X_n$ has something to do with the unknown parameter θ. More precisely, we assume that the joint p.d.f. of $X_1, X_2, \ldots, X_n$, given by $f_X(x_1, x_2, \ldots, x_n; \theta)$, depends on θ. The method of maximum likelihood assumes that the given sample data set is representative of the population and chooses that value for θ that most likely caused the observed data to occur; i.e., once the observations $x_1, x_2, \ldots, x_n$ are given, $f_X(x_1, x_2, \ldots, x_n; \theta)$ is a function of θ alone, and the value of θ that maximizes this p.d.f. is the most likely value for θ. It is chosen as the ML estimate $\hat\theta_{ML}(X)$ for θ (Fig. 12.1).

[Fig. 12.1: the likelihood $f_X(x_1, \ldots, x_n; \theta)$ plotted as a function of θ, with its maximum at $\hat\theta_{ML}(X)$.]


Given $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$, the joint p.d.f. $f_X(x_1, x_2, \ldots, x_n; \theta)$ represents the likelihood function, and the ML estimate can be determined either from the likelihood equation

$$\sup_{\theta} f_X(x_1, x_2, \ldots, x_n; \theta) = f_X(x_1, x_2, \ldots, x_n; \hat\theta_{ML}) \tag{12-4}$$

(sup in (12-4) represents the supremum operation), or using the log-likelihood function

$$L(x_1, x_2, \ldots, x_n; \theta) = \log f_X(x_1, x_2, \ldots, x_n; \theta). \tag{12-5}$$

If $L(x_1, x_2, \ldots, x_n; \theta)$ is differentiable and a supremum $\hat\theta_{ML}$ exists in (12-5), then it must satisfy the equation

$$\left.\frac{\partial \log f_X(x_1, x_2, \ldots, x_n; \theta)}{\partial \theta}\right|_{\theta = \hat\theta_{ML}} = 0. \tag{12-6}$$

We will illustrate the above procedure through several examples.
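The recipe in (12-4)-(12-6) can also be carried out numerically. The sketch below uses a model of our own choosing (exponential observations with unknown rate θ, not one of the lecture's examples), where the likelihood equation (12-6) gives the closed form $\hat\theta_{ML} = 1/\bar{x}$; a crude grid search for the supremum in (12-4) should land on the same value.

```python
import math
import random

# Sketch of (12-4)-(12-6) on an illustrative model of our own choosing:
# exponential observations with unknown rate theta, so that
# log f_X(x_1..x_n; theta) = n*log(theta) - theta*sum(x_i).

def log_likelihood(n, s, theta):
    # log-likelihood (12-5) for the exponential model; s = sum of the data
    return n * math.log(theta) - theta * s

def ml_estimate(xs, lo=1e-3, hi=10.0, steps=20000):
    # crude grid search for the supremum in (12-4)
    n, s = len(xs), sum(xs)
    cand = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return max(cand, key=lambda t: log_likelihood(n, s, t))

random.seed(1)
xs = [random.expovariate(2.5) for _ in range(5000)]  # true rate 2.5
closed_form = len(xs) / sum(xs)   # solves (12-6) analytically for this model
print(ml_estimate(xs), closed_form)
```

The grid search and the analytic root of (12-6) agree to within the grid spacing; with a differentiable log-likelihood, solving (12-6) directly is of course preferable.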


Example 12.1: Let $X_i = \theta + w_i$, $i = 1, \ldots, n$, represent $n$ observations, where θ is the unknown parameter of interest and $w_i$, $i = 1, \ldots, n$, are zero-mean independent normal r.v.s with common variance $\sigma^2$. Determine the ML estimate for θ.

Solution: Since the $w_i$ are independent r.v.s and θ is an unknown constant, the $X_i$'s are independent normal random variables. Thus the likelihood function takes the form

$$f_X(x_1, x_2, \ldots, x_n; \theta) = \prod_{i=1}^{n} f_{X_i}(x_i; \theta). \tag{12-7}$$

Moreover, each $X_i$ is Gaussian with mean θ and variance $\sigma^2$ (why?). Thus

$$f_{X_i}(x_i; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x_i-\theta)^2/2\sigma^2}. \tag{12-8}$$

Substituting (12-8) into (12-7), we get the likelihood function to be


$$f_X(x_1, x_2, \ldots, x_n; \theta) = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\sum_{i=1}^{n}(x_i-\theta)^2/2\sigma^2}. \tag{12-9}$$

It is easier to work with the log-likelihood function $L(X;\theta)$ in this case:

$$L(X;\theta) = \ln f_X(x_1, x_2, \ldots, x_n; \theta) = -\frac{n}{2}\ln(2\pi\sigma^2) - \sum_{i=1}^{n}\frac{(x_i-\theta)^2}{2\sigma^2}. \tag{12-10}$$

From (12-10), taking the derivative with respect to θ as in (12-6), we get

$$\left.\frac{\partial \ln f_X(x_1, \ldots, x_n; \theta)}{\partial\theta}\right|_{\theta=\hat\theta_{ML}} = \frac{1}{\sigma^2}\sum_{i=1}^{n}\left(x_i - \hat\theta_{ML}\right) = 0, \tag{12-11}$$

or

$$\hat\theta_{ML}(X) = \frac{1}{n}\sum_{i=1}^{n} X_i. \tag{12-12}$$

Thus (12-12) represents the ML estimate for θ, which happens to be a linear estimator (a linear function of the data) in this case.
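As a quick sanity check on Example 12.1 (a sketch; the sample size, seed, and candidate offsets are arbitrary choices), the code below evaluates the log-likelihood (12-10) on simulated data and confirms that the sample mean (12-12) beats nearby candidate values of θ.

```python
import math
import random

# Numerical check of Example 12.1: for X_i = theta + w_i with Gaussian
# noise, the log-likelihood (12-10) is maximized at the sample mean (12-12).

def log_likelihood(xs, theta, sigma=1.0):
    # ln f_X(x_1..x_n; theta) as in (12-10)
    n = len(xs)
    return (-0.5 * n * math.log(2 * math.pi * sigma**2)
            - sum((x - theta) ** 2 for x in xs) / (2 * sigma**2))

random.seed(0)
theta_true = 3.0
xs = [theta_true + random.gauss(0.0, 1.0) for _ in range(1000)]

theta_ml = sum(xs) / len(xs)   # (12-12): the sample mean
# the log-likelihood at the sample mean beats nearby candidate values
for cand in (theta_ml - 0.1, theta_ml + 0.1):
    assert log_likelihood(xs, theta_ml) > log_likelihood(xs, cand)
print(theta_ml)
```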


Notice that the estimator $\hat\theta_{ML}(X)$ is a r.v. Taking its expected value, we get

$$E[\hat\theta_{ML}(X)] = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \theta, \tag{12-13}$$

i.e., the expected value of the estimator does not differ from the desired parameter, and hence there is no bias between the two. Such estimators are known as unbiased estimators. Thus (12-12) represents an unbiased estimator for θ. Moreover, the variance of the estimator is given by

$$\mathrm{Var}(\hat\theta_{ML}) = E\big[(\hat\theta_{ML}-\theta)^2\big] = E\left[\left(\frac{1}{n}\sum_{i=1}^{n} X_i - \theta\right)^2\right] = \frac{1}{n^2}\left\{\sum_{i=1}^{n} E\big[(X_i-\theta)^2\big] + \sum_{\substack{i,j=1\\ i\neq j}}^{n} E\big[(X_i-\theta)(X_j-\theta)\big]\right\}.$$

The latter terms are zeros since $X_i$ and $X_j$ are independent r.v.s.


Then

$$\mathrm{Var}(\hat\theta_{ML}) = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(X_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}. \tag{12-14}$$

Thus

$$\mathrm{Var}(\hat\theta_{ML}) \to 0 \quad \text{as } n \to \infty, \tag{12-15}$$

another desired property. We say such estimators (that satisfy (12-15)) are consistent estimators.

The next two examples show that the ML estimator can be highly nonlinear.

Example 12.2: Let $X_1, X_2, \ldots, X_n$ be independent, identically distributed uniform random variables in the interval $(0, \theta)$ with common p.d.f.

$$f_{X_i}(x_i; \theta) = \frac{1}{\theta}, \qquad 0 < x_i < \theta, \tag{12-16}$$


where θ is an unknown parameter. Find the ML estimate for θ.

Solution: The likelihood function in this case is given by

$$f_X(X_1 = x_1, \ldots, X_n = x_n; \theta) = \frac{1}{\theta^n}, \qquad 0 < x_i \le \theta,\ i = 1, \ldots, n \;\Longrightarrow\; \theta \ge \max(x_1, x_2, \ldots, x_n). \tag{12-17}$$

From (12-17), the likelihood function is maximized by the minimum value of θ, and since $\theta \ge \max(X_1, X_2, \ldots, X_n)$, we get

$$\hat\theta_{ML}(X) = \max(X_1, X_2, \ldots, X_n) \tag{12-18}$$

to be the ML estimate for θ. Notice that (12-18) represents a nonlinear function of the observations. To determine whether (12-18) represents an unbiased estimate for θ, we need to evaluate its mean. To accomplish that, it is easier to determine its p.d.f. and proceed directly. Let


$$Z = \max(X_1, X_2, \ldots, X_n) \tag{12-19}$$

with $X_i$ as in (12-16). Then

$$F_Z(z) = P\big[\max(X_1, \ldots, X_n) \le z\big] = P(X_1 \le z, X_2 \le z, \ldots, X_n \le z) = \prod_{i=1}^{n} P(X_i \le z) = \prod_{i=1}^{n} F_{X_i}(z) = \left(\frac{z}{\theta}\right)^n, \qquad 0 < z < \theta, \tag{12-20}$$

so that

$$f_Z(z) = \begin{cases} \dfrac{n z^{n-1}}{\theta^n}, & 0 < z < \theta, \\[4pt] 0, & \text{otherwise.} \end{cases} \tag{12-21}$$

Using (12-21), we get

$$E[\hat\theta_{ML}(X)] = E(Z) = \int_0^\theta z f_Z(z)\,dz = \frac{n}{\theta^n}\int_0^\theta z^n\,dz = \frac{n\theta}{n+1} = \frac{\theta}{1 + 1/n}. \tag{12-22}$$

In this case $E[\hat\theta_{ML}(X)] \ne \theta$, and hence the ML estimator is not an unbiased estimator for θ.
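The bias in (12-22) is easy to probe by simulation. The sketch below (θ, the sample size, and the trial count are arbitrary choices) compares the Monte Carlo mean of $\max(X_1, \ldots, X_n)$ against the prediction $n\theta/(n+1)$.

```python
import random

# Monte Carlo illustration of the bias in (12-22): for i.i.d. Uniform(0, theta)
# samples, E[max(X_1..X_n)] = n*theta/(n+1), which is strictly below theta.

random.seed(42)
theta, n, trials = 2.0, 10, 20000

est = [max(random.uniform(0.0, theta) for _ in range(n)) for _ in range(trials)]
mean_est = sum(est) / trials

predicted = n * theta / (n + 1)   # (12-22)
print(mean_est, predicted)
```

The empirical mean sits close to $n\theta/(n+1)$ and below θ, matching (12-22); increasing $n$ shrinks the gap, previewing the asymptotic unbiasedness shown next.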


However, from (12-22), as $n \to \infty$,

$$\lim_{n\to\infty} E[\hat\theta_{ML}(X)] = \lim_{n\to\infty} \frac{\theta}{1+1/n} = \theta, \tag{12-23}$$

i.e., the ML estimator is an asymptotically unbiased estimator. From (12-21), we also get

$$E(Z^2) = \int_0^\theta z^2 f_Z(z)\,dz = \frac{n}{\theta^n}\int_0^\theta z^{n+1}\,dz = \frac{n\theta^2}{n+2}, \tag{12-24}$$

so that

$$\mathrm{Var}[\hat\theta_{ML}(X)] = E(Z^2) - [E(Z)]^2 = \frac{n\theta^2}{n+2} - \frac{n^2\theta^2}{(n+1)^2} = \frac{n\theta^2}{(n+1)^2(n+2)}. \tag{12-25}$$

Once again $\mathrm{Var}[\hat\theta_{ML}(X)] \to 0$ as $n \to \infty$, implying that the estimator in (12-18) is a consistent estimator.

Example 12.3: Let $X_1, X_2, \ldots, X_n$ be i.i.d. Gamma random variables with unknown parameters α and β. Determine the ML estimators for α and β.


Solution: Here each $x_i \ge 0$, and

$$f_X(x_1, x_2, \ldots, x_n; \alpha, \beta) = \frac{\beta^{n\alpha}}{(\Gamma(\alpha))^n}\left(\prod_{i=1}^{n} x_i^{\alpha-1}\right) e^{-\beta\sum_{i=1}^{n} x_i}. \tag{12-26}$$

This gives the log-likelihood function to be

$$L(x_1, \ldots, x_n; \alpha, \beta) = \log f_X(x_1, \ldots, x_n; \alpha, \beta) = n\alpha\log\beta - n\log\Gamma(\alpha) + (\alpha-1)\sum_{i=1}^{n}\log x_i - \beta\sum_{i=1}^{n} x_i. \tag{12-27}$$

Differentiating $L$ with respect to α and β, we get

$$\left.\frac{\partial L}{\partial\alpha}\right|_{\hat\alpha,\hat\beta} = n\log\hat\beta - n\frac{\Gamma'(\hat\alpha)}{\Gamma(\hat\alpha)} + \sum_{i=1}^{n}\log x_i = 0, \tag{12-28}$$

$$\left.\frac{\partial L}{\partial\beta}\right|_{\hat\alpha,\hat\beta} = \frac{n\hat\alpha}{\hat\beta} - \sum_{i=1}^{n} x_i = 0. \tag{12-29}$$

Thus from (12-29),

$$\hat\beta_{ML}(X) = \frac{\hat\alpha_{ML}}{\frac{1}{n}\sum_{i=1}^{n} x_i}, \tag{12-30}$$


and substituting (12-30) into (12-28), it gives

$$\log\hat\alpha_{ML} - \frac{\Gamma'(\hat\alpha_{ML})}{\Gamma(\hat\alpha_{ML})} = \log\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right) - \frac{1}{n}\sum_{i=1}^{n}\log x_i. \tag{12-31}$$

Notice that (12-31) is highly nonlinear in $\hat\alpha_{ML}$. In general the (log-)likelihood function can have more than one solution, or no solutions at all. Further, the (log-)likelihood function may not even be differentiable, or it can be extremely complicated to solve explicitly (see Example 12.3, equation (12-31)).
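Equation (12-31) has to be solved numerically. One possible sketch (our own construction, not from the text) exploits the fact that $g(\alpha) = \log\alpha - \Gamma'(\alpha)/\Gamma(\alpha)$ is positive and strictly decreasing, so simple bisection works. The `digamma` helper below is a crude finite-difference approximation of $\Gamma'(\alpha)/\Gamma(\alpha)$ via `math.lgamma`; in practice one would use a library routine such as `scipy.special.digamma`.

```python
import math
import random

# Numerical sketch of Example 12.3: solve (12-31) for alpha_hat by
# bisection, then obtain beta_hat from (12-30).

def digamma(a, h=1e-6):
    # psi(a) = d/da log Gamma(a), approximated by a central difference
    return (math.lgamma(a + h) - math.lgamma(a - h)) / (2 * h)

def gamma_ml(xs):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_log = sum(math.log(x) for x in xs) / n
    c = math.log(mean_x) - mean_log        # right-hand side of (12-31)
    # g(a) = log a - psi(a) is positive and strictly decreasing, so bisect
    lo, hi = 1e-3, 1e3
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if math.log(mid) - digamma(mid) > c:
            lo = mid
        else:
            hi = mid
    alpha_hat = 0.5 * (lo + hi)
    beta_hat = alpha_hat / mean_x          # (12-30)
    return alpha_hat, beta_hat

random.seed(7)
# note: random.gammavariate takes a *scale* parameter, so scale = 1/beta
alpha_true, beta_true = 3.0, 2.0
xs = [random.gammavariate(alpha_true, 1.0 / beta_true) for _ in range(5000)]
print(gamma_ml(xs))
```

With a few thousand samples the recovered pair lands close to the true (α, β), illustrating that the nonlinear equation (12-31), though not solvable in closed form, is easy to handle numerically.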

Best Unbiased Estimator: Referring back to Example 12.1, we have seen that (12-12) represents an unbiased estimator for θ with variance given by (12-14). It is possible that, for a given $n$, there may be other unbiased estimators for this problem with even lower variances. If such is indeed the case, those estimators will naturally be preferable to (12-12). In a given scenario, is it possible to determine the lowest possible value for the variance of any unbiased estimator? Fortunately, a theorem by Cramér and Rao (Rao 1945; Cramér 1948) gives a complete answer to this problem.

Cramér-Rao Bound: The variance of any unbiased estimator $\hat\theta$ for θ based on observations $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$ must satisfy the lower bound

$$\mathrm{Var}(\hat\theta) \;\ge\; \frac{1}{E\left[\left(\dfrac{\partial\ln f_X(x_1,\ldots,x_n;\theta)}{\partial\theta}\right)^2\right]} \;=\; \frac{-1}{E\left[\dfrac{\partial^2\ln f_X(x_1,\ldots,x_n;\theta)}{\partial\theta^2}\right]}. \tag{12-32}$$

This important result states that the right side of (12-32) acts as a lower bound on the variance of all unbiased estimators for θ, provided their joint p.d.f. satisfies certain regularity restrictions (see (8-79)-(8-81), Text).


Naturally, any unbiased estimator whose variance coincides with the bound in (12-32) must be the best; there are no better solutions! Such estimators are known as efficient estimators. Let us examine whether (12-12) represents an efficient estimator. Toward this, using (12-11),

$$\left(\frac{\partial\ln f_X(x_1,\ldots,x_n;\theta)}{\partial\theta}\right)^2 = \frac{1}{\sigma^4}\left(\sum_{i=1}^{n}(X_i-\theta)\right)^2, \tag{12-33}$$

and

$$E\left[\left(\frac{\partial\ln f_X(x_1,\ldots,x_n;\theta)}{\partial\theta}\right)^2\right] = \frac{1}{\sigma^4}\left\{\sum_{i=1}^{n}E\big[(X_i-\theta)^2\big] + \sum_{\substack{i,j=1\\ i\neq j}}^{n}E\big[(X_i-\theta)(X_j-\theta)\big]\right\} = \frac{n\sigma^2}{\sigma^4} = \frac{n}{\sigma^2}. \tag{12-34}$$

Substituting this into the first form on the right side of (12-32), we obtain the Cramér-Rao lower bound for this problem to be


$$\mathrm{Var}(\hat\theta) \ge \frac{\sigma^2}{n}. \tag{12-35}$$

But from (12-14) the variance of the ML estimator in (12-12) is the same as (12-35), implying that (12-12) indeed represents an efficient estimator in this case, the best of all possibilities! It is possible that in certain cases there are no unbiased estimators that are efficient. In that case, the best estimator will be an unbiased estimator with the lowest possible variance. How does one find such an unbiased estimator? Fortunately, the Rao-Blackwell theorem (pages 335-337, Text) gives a complete answer to this problem. The Cramér-Rao bound can be extended to the multiparameter case as well (see pages 343-345, Text).
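The efficiency claim can be probed by simulation: the empirical variance of the sample mean (12-12) should sit at the Cramér-Rao floor $\sigma^2/n$ of (12-35). This is a sketch; the trial count is an arbitrary choice and the match is approximate, not exact.

```python
import random
import statistics

# Monte Carlo check of efficiency: for Gaussian data, the variance of the
# sample mean should match the Cramer-Rao lower bound sigma^2/n in (12-35).

random.seed(3)
theta, sigma, n, trials = 1.0, 2.0, 25, 40000

means = []
for _ in range(trials):
    xs = [random.gauss(theta, sigma) for _ in range(n)]
    means.append(sum(xs) / n)

empirical = statistics.pvariance(means)
bound = sigma ** 2 / n            # (12-35): 4/25 = 0.16
print(empirical, bound)
```

No unbiased estimator built from these observations can beat `bound`; the sample mean attains it, which is exactly what "efficient" means here.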


So far, we have discussed nonrandom parameters that are unknown. What if the parameter of interest is a r.v. with a-priori p.d.f. $f_\theta(\theta)$? How does one obtain a good estimate for θ based on the observations $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$? One technique is to use the observations to compute its a-posteriori probability density function $f_{\theta|X}(\theta \mid x_1, x_2, \ldots, x_n)$. Of course, we can use Bayes' theorem in (11-22) to obtain this a-posteriori p.d.f. This gives

$$f_{\theta|X}(\theta \mid x_1, x_2, \ldots, x_n) = \frac{f_{X|\theta}(x_1, x_2, \ldots, x_n \mid \theta)\, f_\theta(\theta)}{f_X(x_1, x_2, \ldots, x_n)}. \tag{12-36}$$

Notice that (12-36) is only a function of θ, since $x_1, x_2, \ldots, x_n$ represent given observations.

Once again, we can look for the most probable value of θ suggested by the above a-posteriori p.d.f. Naturally, the most likely value for θ is the one corresponding to the maximum of the a-posteriori p.d.f. (see Fig. 12.2). This estimator, the maximum of the a-posteriori p.d.f. $\hat\theta_{MAP}$, is known as the MAP estimator for θ. It is possible to use other optimality criteria as well. Of course, that should be the subject matter of another course!

[Fig. 12.2: the a-posteriori p.d.f. $f_{\theta|X}(\theta \mid x_1, \ldots, x_n)$ as a function of θ, with its maximum at $\hat\theta_{MAP}$.]
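As a closing illustration (our own numbers, not from the text): for a Gaussian prior on θ and a Gaussian likelihood, the a-posteriori p.d.f. (12-36) is again Gaussian, so the MAP estimate has a well-known closed form, a precision-weighted blend of the prior mean and the sample mean.

```python
import random

# Minimal MAP sketch: prior theta ~ N(mu0, tau^2), likelihood
# X_i ~ N(theta, sigma^2).  Maximizing the posterior (12-36) in theta
# gives the standard precision-weighted closed form below.

random.seed(5)
mu0, tau = 0.0, 1.0            # a-priori p.d.f. f_theta: N(mu0, tau^2)
sigma, theta_true, n = 2.0, 1.5, 50
xs = [random.gauss(theta_true, sigma) for _ in range(n)]

xbar = sum(xs) / n
# MAP estimate: maximum of the Gaussian posterior (its mean)
theta_map = (mu0 / tau**2 + n * xbar / sigma**2) / (1 / tau**2 + n / sigma**2)
theta_ml = xbar                 # the ML estimate (12-12) ignores the prior
print(theta_map, theta_ml)
```

The MAP estimate always lies between the prior mean and the sample mean, and as $n$ grows the data term dominates, so $\hat\theta_{MAP}$ approaches $\hat\theta_{ML}$.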