Applied Statistics
Lecturer: Serena Arima
PowerPoint PPT Presentation

Contents: Likelihood, ML estimator, Summaries, ML properties, LR test, Profile likelihood


SLIDE 1

Likelihood ML estimator Summaries ML properties LR test Profile likelihood

Applied Statistics

Lecturer: Serena Arima

SLIDE 2

Statistical models

Statistics concerns what can be learned from data, using statistical models to study the variability of the data.

The key feature of a statistical model is that variability is represented using probability distributions, which form the building-blocks from which the model is constructed. The key idea in statistical modelling is to treat the data as the outcome of a random experiment.

SLIDE 5

Likelihood

Suppose we have observed the value y of a random variable Y whose probability density function is supposed known up to the value of a parameter θ. We write

f (y; θ)

to emphasize that the density is a function of both the data y and the parameter θ. Here θ ∈ Θ (the parameter space) and y ∈ Y (the sample space).

SLIDE 7

Likelihood

Our goal is to make statements about the distribution of Y based on the observed value y, that is, to make inference about θ.

A fundamental tool is the likelihood for θ based on y, which is defined as

L(θ) = f (y; θ),   θ ∈ Θ,

regarded as a function of θ for fixed y.

When Y is discrete we use f (y; θ) = Pr(Y = y; θ).

SLIDE 8

Likelihood

Binomial example: a coin comes up heads with probability θ and tails with probability 1 − θ. Suppose we toss the coin n = 10 times.

1. What is the parameter space? θ ∈ Θ = [0, 1]

2. What is the sample space (number of heads obtained tossing the coin 10 times)? y ∈ {0, 1, 2, ..., 10}

3. Which random variable represents the experiment? Y ∼ Binomial(10, θ)

SLIDE 15

Likelihood

The likelihood is

f (y; θ) = (10 choose y) θ^y (1 − θ)^(10−y).

Suppose that the experiment leads to Y = y = 7:

For θ = 0.6, L(θ; y) = 0.215; for θ = 0.5, L(θ; y) = 0.117. Hence θ = 0.6 is more likely than θ = 0.5; more precisely, θ = 0.6 is 0.215/0.117 = 1.838 times more likely than θ = 0.5. What is the most likely value?

[Figure: likelihood of θ for the Binomial(n = 10) experiment with y = 7]
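As a quick numerical check, the likelihood values above can be reproduced in a short Python sketch (my illustration; `math.comb` gives the binomial coefficient). Scanning a grid confirms that the most likely value is θ = y/n = 0.7:

```python
import math

def binom_lik(theta, y=7, n=10):
    # L(theta) = C(n, y) * theta^y * (1 - theta)^(n - y)
    return math.comb(n, y) * theta**y * (1 - theta)**(n - y)

l06 = binom_lik(0.6)   # ≈ 0.215
l05 = binom_lik(0.5)   # ≈ 0.117

# Scan a grid for the most likely value: the maximum is at theta = y/n = 0.7.
grid = [i / 1000 for i in range(1, 1000)]
theta_hat = max(grid, key=binom_lik)
print(round(l06, 3), round(l05, 3), round(l06 / l05, 2), theta_hat)
```

The ratio printed here is 1.83 from the unrounded likelihoods; the slide's 1.838 comes from dividing the rounded values 0.215/0.117.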

SLIDE 16

Likelihood

When y = (y1, y2, ..., yn) is a collection of independent observations, the likelihood is

L(θ) = ∏_{j=1}^{n} f (yj; θ).

SLIDE 17

Likelihood

Normal example: we observe n replicates (X1, X2, ..., Xn) from a Normal random variable Xi ∼ N(µ, σ2).

The likelihood of a Normal experiment is

L(y; θ = (µ, σ²)) = (2πσ²)^(−n/2) exp{ −n(s² + (x̄ − µ)²)/(2σ²) },

where x̄ is the sample mean and s² = (1/n) Σᵢ(xᵢ − x̄)². Proof (blackboard).
SLIDE 18

Likelihood

Suppose that n = 10 and σ² = 4 is known and fixed, so the likelihood is a function of µ alone: L(y, σ²; θ = µ). For a sample z = (6.38, 1.39, 5.67, 3.26, 1.96, 3.73, −0.32, 0.54, 2.53, 5.40) the likelihood is

[Figure: likelihood L(µ) with σ² = 4 fixed]
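A Python sketch of this computation (using the sample z from the slide; the grid search is just for illustration) confirms that with σ² fixed the likelihood peaks at the sample mean z̄ ≈ 3.05:

```python
import math

z = [6.38, 1.39, 5.67, 3.26, 1.96, 3.73, -0.32, 0.54, 2.53, 5.40]
n, sigma2 = len(z), 4.0

def lik_mu(mu):
    # L(mu) = (2*pi*sigma2)^(-n/2) * exp(-sum((z_i - mu)^2) / (2*sigma2))
    ss = sum((zi - mu) ** 2 for zi in z)
    return (2 * math.pi * sigma2) ** (-n / 2) * math.exp(-ss / (2 * sigma2))

# With sigma2 fixed, L(mu) peaks at the sample mean.
grid = [i / 100 for i in range(-200, 801)]
mu_hat = max(grid, key=lik_mu)
print(mu_hat, sum(z) / n)   # both close to 3.05
```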

SLIDE 19

Likelihood

Suppose now that n = 10 and µ = 3 is known and fixed, so the likelihood is a function of σ² alone: L(y, µ; θ = σ²). For the same sample the likelihood is

[Figure: likelihood L(σ²) with µ = 3 fixed]

SLIDE 20

Likelihood

When both parameters are unknown, the likelihood L(y; θ = (µ, σ²)) is a surface in (µ, σ²).

[Figure: likelihood surface over µ and σ², both parameters unknown]

SLIDE 21

Likelihood

Example 1: Exponential distribution. Let y1, ..., yn be a random sample from the exponential density f (y; θ) = θ⁻¹ e^(−y/θ) (y > 0, θ > 0). The parameter space is Θ = ℝ⁺ and the sample space is the Cartesian product (ℝ⁺)ⁿ.

The likelihood is

L(θ) = ∏_{i=1}^{n} θ⁻¹ e^(−yᵢ/θ) = θ⁻ⁿ exp( −(1/θ) Σ_{i=1}^{n} yᵢ ).
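Setting dl/dθ = −n/θ + Σyᵢ/θ² = 0 gives the closed form θ̂ = ȳ. A Python sketch with made-up data (the values of y are illustrative, not from the slides) checks a numeric maximizer against it:

```python
import math

def exp_loglik(theta, y):
    # l(theta) = -n*log(theta) - sum(y)/theta (constants dropped)
    return -len(y) * math.log(theta) - sum(y) / theta

y = [0.4, 1.7, 0.9, 2.3, 0.6, 1.1]            # illustrative data
grid = [i / 1000 for i in range(100, 5000)]   # theta in (0.1, 5)
theta_hat = max(grid, key=lambda t: exp_loglik(t, y))
print(theta_hat, sum(y) / len(y))             # numeric maximizer vs closed form ybar
```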

SLIDE 22

Likelihood

Example 2: Cauchy distribution. The Cauchy distribution centered at θ has density f (y; θ) = 1/( π[1 + (y − θ)²] ), where y ∈ ℝ and θ ∈ ℝ. The likelihood for a random sample y1, ..., yn is

L(θ) = ∏_{i=1}^{n} 1/( π[1 + (yᵢ − θ)²] ).

The sample space is ℝⁿ and the parameter space is ℝ.

SLIDE 23

Likelihood

Example 3: Binomial experiment. Consider a random variable R ∼ Binom(m, π),

Pr(R = r) = [ m!/( r!(m − r)! ) ] π^r (1 − π)^(m−r).

Suppose π depends on a variable x1 through the relation

π = exp(β0 + β1x1)/( 1 + exp(β0 + β1x1) ).

SLIDE 24

Likelihood

If the Rᵢ are independent, the likelihood for the random sample is

L(β0, β1) = ∏_{i=1}^{n} Pr(Rᵢ = rᵢ; β0, β1)
          = [ ∏_{i=1}^{n} m!/( rᵢ!(m − rᵢ)! ) ] · exp( β0 Σ_{i=1}^{n} rᵢ + β1 Σ_{i=1}^{n} rᵢx1ᵢ ) / ∏_{i=1}^{n} (1 + exp(β0 + β1x1ᵢ))^m.
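The factored form can be checked numerically: summing the individual log pmfs must give the same value as the log of the factored expression. A Python sketch with hypothetical data (m, x1, r and the coefficients are invented for illustration):

```python
import math

# Hypothetical small dataset: m trials per unit, covariate x1, successes r.
m = 5
x1 = [0.0, 0.5, 1.0, 1.5]
r = [1, 2, 3, 4]
b0, b1 = -0.3, 0.8

def loglik_pointwise(b0, b1):
    # sum of log Binom(m, pi_i) pmfs with pi_i = logistic(b0 + b1*x1_i)
    total = 0.0
    for ri, xi in zip(r, x1):
        pi = math.exp(b0 + b1 * xi) / (1 + math.exp(b0 + b1 * xi))
        total += (math.log(math.comb(m, ri))
                  + ri * math.log(pi) + (m - ri) * math.log(1 - pi))
    return total

def loglik_factored(b0, b1):
    # log of the factored form: constants + b0*sum(r) + b1*sum(r*x1)
    # minus m * sum of log(1 + exp(eta_i))
    const = sum(math.log(math.comb(m, ri)) for ri in r)
    return (const + b0 * sum(r)
            + b1 * sum(ri * xi for ri, xi in zip(r, x1))
            - m * sum(math.log(1 + math.exp(b0 + b1 * xi)) for xi in x1))

print(loglik_pointwise(b0, b1), loglik_factored(b0, b1))  # identical up to rounding
```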

SLIDE 25

Properties of the likelihood function

1. L(θ) is defined up to a constant;
2. L(θ) is NOT a probability distribution;
3. L(θ) is invariant to known transformations of the data.

SLIDE 26

Properties of the likelihood function

1. L(θ) is defined up to a constant.

For the Normal example with known σ², the likelihood depends only on µ (and the data), so

L(y, σ²; θ = µ) ∝ exp{ −n(s² + (x̄ − µ)²)/(2σ²) }.

SLIDE 27

Properties of the likelihood function

2. L(θ) is NOT a probability distribution.

If you integrate the likelihood over θ, you do not necessarily obtain 1!
SLIDE 28

Properties of the likelihood function

3. L(θ) is invariant to known transformations of the data.

Suppose Z is a known one-to-one transformation of Y. The probability density function of Z is

fZ(z; θ) = fY(y; θ) |dy/dz|,

where |dy/dz| is the so-called Jacobian. Since the Jacobian does not depend on θ, the likelihood based on z equals that based on y (up to a constant not involving θ).
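A small simulation illustrates this invariance, assuming exponential data and the transformation Z = log Y (this concrete example is mine, not from the slides): the Jacobian term exp(z) shifts the log-likelihood by a constant, so the maximizer is unchanged.

```python
import math
import random

random.seed(1)
theta_true = 2.0
# random.expovariate takes the rate, i.e. 1/mean
y = [random.expovariate(1 / theta_true) for _ in range(50)]
z = [math.log(yi) for yi in y]   # known one-to-one transformation

def loglik_y(theta):
    # exponential log-likelihood from the raw data y
    return -len(y) * math.log(theta) - sum(y) / theta

def loglik_z(theta):
    # density of Z = log Y: f_Z(z) = theta^-1 * exp(z) * exp(-exp(z)/theta);
    # the Jacobian term exp(z) does not involve theta
    return sum(-math.log(theta) + zi - math.exp(zi) / theta for zi in z)

grid = [i / 100 for i in range(50, 500)]
m_y = max(grid, key=loglik_y)
m_z = max(grid, key=loglik_z)
print(m_y, m_z)   # same maximizer from either representation of the data
```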

SLIDE 29

Relative likelihood

When the value that maximizes the likelihood is finite, we define the relative likelihood of θ to be

RL(θ) = L(θ) / max_{θ′} L(θ′),

so that RL(θ) ∈ [0, 1] and log RL(θ) ∈ (−∞, 0].

SLIDE 30

Relative likelihood

When there is a particular parametric model for a set of data, the likelihood provides a natural basis for assessing the plausibility of different parameter values, but how should it be interpreted? One viewpoint is that values of θ can be compared using the scale:

1 ≥ RL(θ) > 1/3          θ strongly supported
1/3 ≥ RL(θ) > 1/10       θ supported
1/10 ≥ RL(θ) > 1/100     θ weakly supported
1/100 ≥ RL(θ) > 1/1000   θ poorly supported
1/1000 ≥ RL(θ) > 0       θ very poorly supported

Under this pure likelihood approach, values of θ are compared solely in terms of relative likelihoods. The thresholds are arbitrary and they do not take into account the dimension of θ. That is why this interpretation is not common in practice.

SLIDE 31

Maximum Likelihood estimator

The maximum likelihood (ML) estimate of θ, θ̂, is a value of θ that maximizes the likelihood L(θ), or equivalently the log-likelihood l(θ) = log L(θ). The estimate θ̂ often satisfies the likelihood equation

dl(θ̂)/dθ = 0.

The score function is defined as U(Y; θ) = dl(θ)/dθ. We check that θ̂ gives a local maximum by verifying that −d²l(θ̂)/dθ² > 0.
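For the binomial example (n = 10, y = 7), the likelihood equation can be checked directly in a Python sketch: the score vanishes at θ̂ = y/n and the negative second derivative is positive there.

```python
n, y = 10, 7
theta_hat = y / n   # solves the likelihood equation

def score(theta):
    # U(theta) = dl/dtheta for l(theta) = y*log(theta) + (n-y)*log(1-theta) + const
    return y / theta - (n - y) / (1 - theta)

def neg_second_deriv(theta):
    # -d2l/dtheta2 = y/theta^2 + (n-y)/(1-theta)^2 > 0, so theta_hat is a maximum
    return y / theta**2 + (n - y) / (1 - theta)**2

print(abs(score(theta_hat)) < 1e-9, neg_second_deriv(theta_hat) > 0)
```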

SLIDE 32

Maximum Likelihood estimator

If θ is a p × 1 vector, we have p likelihood equations to be solved simultaneously. We check that θ̂ gives a local maximum by verifying that

J(θ̂) = −d²l(θ̂)/(dθ dθᵀ) > 0   (positive definite).

The quantity J(θ) is called the observed information.

SLIDE 33

Expected and observed information

In a model with log-likelihood l(θ), the observed information is defined as J(θ) = −d²l(θ)/dθ². When θ is a p × 1 vector, the observed information is the matrix J(θ) = −d²l(θ)/(dθ dθᵀ), whose (r, s) element is −d²l(θ)/(dθr dθs).

SLIDE 34

Expected and observed information

Before the experiment is performed we have no data, so we cannot obtain the observed information. However, we can calculate the expected or Fisher information (passing from y to Y) as follows:

I(θ) = E[ −d²l(θ)/dθ² ].

When θ is a p × 1 vector, the expected information matrix is

I(θ) = E[ −d²l(θ)/(dθ dθᵀ) ],

whose (r, s) element is −E[ d²l(θ)/(dθr dθs) ].
SLIDE 35

Expected and observed information

The observed and expected information for a normal random variable with mean µ and variance σ² are

J(µ, σ²) = [  n/σ²             (n/σ⁴)(ȳ − µ)
              (n/σ⁴)(ȳ − µ)    −n/(2σ⁴) + (1/σ⁶) Σ_{i=1}^{n} (yᵢ − µ)²  ]

and

I(µ, σ²) = [  n/σ²    0
              0       n/(2σ⁴)  ].

Exercise: X1, ..., Xn ∼ Exponential(θ); find the ML estimate of θ.
SLIDE 36

Some properties

Proposition 1. Under general conditions,

E_θ[U(Y; θ)] = 0,    Var[U(Y; θ)] = −E_θ[ d²l(θ)/dθ² ].

Proof (blackboard)
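Proposition 1 can also be verified exactly for a small discrete model. A Python sketch for Y ∼ Binomial(n, θ) (the binomial instance is my choice of example) sums over all outcomes and compares Var[U] with the Fisher information n/(θ(1 − θ)):

```python
import math

n, theta = 10, 0.3

def pmf(y):
    return math.comb(n, y) * theta**y * (1 - theta)**(n - y)

def score(y):
    # U(y; theta) = y/theta - (n - y)/(1 - theta)
    return y / theta - (n - y) / (1 - theta)

# Exact expectations over the 11 possible outcomes
mean_U = sum(pmf(y) * score(y) for y in range(n + 1))
var_U = sum(pmf(y) * score(y) ** 2 for y in range(n + 1))
fisher = n / (theta * (1 - theta))   # -E[d2l/dtheta2] for Binomial(n, theta)

print(round(mean_U, 10), round(var_U, 6), round(fisher, 6))
```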

SLIDE 37

Some properties

Proposition 2. Under general conditions,

J(θ)/n →p I1(θ),

where I1(θ) is the expected information associated with a sample of size 1.

Proof (blackboard)

SLIDE 38

Summaries: quadratic approximation

In a problem with one or two parameters the likelihood can be visualized. However, realistic models may have dozens of parameters, and then we need to summarize the likelihood.

In regular situations, we can approximate the relative likelihood with a Gaussian density. When θ is scalar, we can write the log relative likelihood as log RL(θ) = l(θ) − l(θ̂), where θ̂ is the ML estimate.

SLIDE 39

Summaries: quadratic approximation

Expanding l(θ) in a Taylor series about θ̂ we get

l(θ) ≈ l(θ̂) + (θ − θ̂) l′(θ̂) + (1/2)(θ − θ̂)² l″(θ̂)
     = l(θ̂) + U(θ̂; y)(θ − θ̂) − (1/2) J(θ̂)(θ − θ̂)²
     ≈ l(θ̂) − (n/2) I1(θ̂)(θ − θ̂)²,

using U(θ̂; y) = 0 and J(θ̂) ≈ n I1(θ̂) in the last step.

SLIDE 40

Summaries: quadratic approximation

Hence we have

log RL(θ) = log[ L(θ)/L(θ̂) ] ≈ −(n/2) I1(θ̂)(θ − θ̂)²,

and the relative likelihood can be approximated by a Normal density with mean θ̂ and variance (n I1(θ̂))⁻¹, that is,

RL(θ) ≈ exp{ −(n/2) I1(θ̂)(θ − θ̂)² }.

SLIDE 41

Summaries: quadratic approximation

Example: Poisson Likelihood.

[Figure: exact and quadratic-approximated relative likelihood for a Poisson sample, plotted against θ]
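A Python sketch of the Poisson comparison (the counts y are invented for illustration): with l(θ) = Σyᵢ log θ − nθ up to a constant, θ̂ = ȳ and I1(θ) = 1/θ, so the exact and approximated log relative likelihoods can be evaluated side by side.

```python
import math

y = [2, 4, 3, 1, 3, 2, 5, 4]            # illustrative Poisson counts
n = len(y)
theta_hat = sum(y) / n                  # ML estimate is the sample mean

def log_rl(theta):
    # log RL(theta) = l(theta) - l(theta_hat), with
    # l(theta) = sum(y)*log(theta) - n*theta (terms free of theta dropped)
    l = lambda t: sum(y) * math.log(t) - n * t
    return l(theta) - l(theta_hat)

def log_rl_quad(theta):
    # quadratic approximation: -(n/2) * I1(theta_hat) * (theta - theta_hat)^2,
    # with I1(theta) = 1/theta for the Poisson
    return -(n / 2) * (1 / theta_hat) * (theta - theta_hat) ** 2

for t in [2.5, 3.0, 3.5]:
    print(t, round(log_rl(t), 4), round(log_rl_quad(t), 4))
```

The two curves agree closely near θ̂ and drift apart in the tails, as in the figure.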

SLIDE 42

Summaries: sufficient statistics

The likelihood often depends on the data only through some low-dimensional function s(y) of the yj, and then a suitable summary can be given in terms of this. If we believe that our model is correct, we need only these functions to calculate the likelihood for any value of θ.

Suppose we have observed data y generated by a distribution f (y; θ), and that the statistic S = s(y) is a function of y such that f_{Y|S}(y|s; θ) does not depend on θ. Then S is said to be a sufficient statistic for θ.

SLIDE 43

Summaries: sufficient statistics

Factorization criterion. A necessary and sufficient condition for a statistic S to be sufficient for a parameter θ in a family of probability density functions f (y; θ) is that the density of Y can be expressed as f (y; θ) = g(s(y); θ) h(y), where h(y) does not depend on θ.

SLIDE 44

Summaries: sufficient statistics

Examples: Bernoulli distribution: S = Σⱼ Yⱼ; Exponential distribution: S = Σⱼ Yⱼ or S = Ȳ; Normal distribution: S = (Ȳ, s²); ...
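The Bernoulli case can be demonstrated directly: two samples with the same value of S = Σⱼ yⱼ produce identical likelihood functions. A Python sketch (the two samples are invented):

```python
# Two Bernoulli samples sharing the sufficient statistic S = sum of y_j
y_a = [1, 0, 1, 1, 0]
y_b = [0, 1, 1, 0, 1]
assert sum(y_a) == sum(y_b)

def lik(theta, y):
    # L(theta) = theta^S * (1-theta)^(n-S): depends on y only through S
    p = 1.0
    for yi in y:
        p *= theta if yi == 1 else (1 - theta)
    return p

for theta in [0.2, 0.5, 0.8]:
    print(theta, lik(theta, y_a), lik(theta, y_b))  # identical likelihoods
```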

SLIDE 45

Properties of the ML estimator

Provided that the likelihood function is correctly specified, it can be shown under weak regularity conditions that:

1. The ML estimator is consistent for θ (θ̂ →p θ);

2. The ML estimator is asymptotically efficient (that is, asymptotically the ML estimator has the smallest variance among all consistent asymptotically normal estimators);

3. The ML estimator is asymptotically normally distributed: √n(θ̂ − θ) →d N(0, V), where V is the asymptotic covariance matrix.

SLIDE 48

Properties of the ML estimator: example

Consider random samples of size n = 10 from the exponential distribution with true mean θ0 = 1. The ML estimate is θ̂ = ȳ and I(θ) = n/θ².

1. Sample n = 10 values from an Exp(1);
2. Compute the ML estimate;
3. Repeat Steps 1 and 2 5000 times;
4. Make a histogram of the values.
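The recipe above translates directly into a Python sketch (seeded for reproducibility; `random.expovariate` takes the rate 1/θ0):

```python
import random
import statistics

random.seed(42)
n, reps, theta0 = 10, 5000, 1.0

estimates = []
for _ in range(reps):
    sample = [random.expovariate(1 / theta0) for _ in range(n)]
    estimates.append(sum(sample) / n)   # ML estimate theta_hat = ybar

# Mean of the estimates is close to theta0 = 1;
# their spread is close to theta0/sqrt(n) ≈ 0.316, matching I(theta0)^(-1/2)
print(round(statistics.mean(estimates), 3), round(statistics.stdev(estimates), 3))
```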

SLIDE 49

Properties of the ML estimator: example

[Figure: histogram of the 5000 ML estimates, with the asymptotic normal density overlaid]

SLIDE 50

Properties of the ML estimator

The covariance matrix V is defined as V = I(θ)⁻¹, where I(θ) is the Fisher information matrix

I(θ) = E[ −d²l(θ)/(dθ dθ′) ].

Loosely speaking, this matrix summarizes the expected amount of information about θ contained in the sample.

SLIDE 51

Properties of the ML estimator

Given the asymptotic efficiency of the ML estimator, the inverse of the information matrix, I(θ)⁻¹, provides a lower bound on the asymptotic covariance matrix of any consistent asymptotically normal estimator of θ. The ML estimator is asymptotically efficient because it attains this bound, often referred to as the Cramér-Rao lower bound.

SLIDE 52

Properties of the ML estimator

In practice V can be estimated consistently as

V̂ = [ −(1/n) Σ_{i=1}^{n} d²lᵢ(θ)/(dθ dθ′) |_{θ = θ̂} ]⁻¹,

where we take the derivatives first and then, in the result, replace the unknown θ with θ̂.

SLIDE 53

Inferential use of the ML estimates

The main use of this approximation is to construct confidence regions for θ and to test hypotheses.

Scalar parameter. If θ is scalar, the (1 − 2α) confidence interval for θ0 is

( θ̂ − z_{1−α} I(θ̂)^(−1/2) ,  θ̂ + z_{1−α} I(θ̂)^(−1/2) ).

The corresponding interval using the observed information J(θ̂),

( θ̂ − z_{1−α} J(θ̂)^(−1/2) ,  θ̂ + z_{1−α} J(θ̂)^(−1/2) ),

is easier to calculate because it requires no expectations, and moreover its coverage probability is often closer to the nominal level. Both intervals are symmetric about θ̂.
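A Python sketch of this interval for an exponential sample (the data are invented; for Exp(θ), I(θ) = n/θ², so the interval is θ̂ ± z_{1−α} θ̂/√n, and `statistics.NormalDist().inv_cdf` supplies the normal quantile):

```python
import math
from statistics import NormalDist

# Illustrative exponential data; I(theta) = n/theta^2 gives the Wald interval
# theta_hat ± z_{1-alpha} * theta_hat / sqrt(n)
y = [0.8, 2.1, 0.3, 1.5, 0.9, 1.2, 2.4, 0.6, 1.1, 1.7]
n = len(y)
theta_hat = sum(y) / n
alpha = 0.025                      # gives a 1 - 2*alpha = 95% interval
z = NormalDist().inv_cdf(1 - alpha)
half = z * theta_hat / math.sqrt(n)
print(round(theta_hat - half, 3), round(theta_hat + half, 3))
```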

SLIDE 54

Inferential use of the ML estimates

Vector parameter. When θ is a vector, confidence intervals for the r-th element of θ, θr, may be based on the fact that the corresponding ML estimator θ̂r is approximately N(θr, v_rr), where v_rr is the (r, r) element of V. This gives intervals of the same form as in the scalar case, with θ̂ replaced by θ̂r and with I(θ̂)⁻¹, J(θ̂)⁻¹ replaced by v_rr or the (r, r) element of J(θ̂)⁻¹.

SLIDE 55

Inferential use of the ML estimates: example

For the Normal distribution we know that I(θ) = diag(n/σ², n/(2σ⁴)). Hence the (1 − 2α) confidence intervals for µ and σ² based on the large-sample results are

ȳ ± n^(−1/2) σ̂ z_α   and   σ̂² ± (2/n)^(1/2) σ̂² z_α.

The asymptotic approximation gives, for µ, an interval of the same form as the exact interval ȳ ± n^(−1/2) s t_{n−1}(α), but with s replaced by σ̂ and the t quantile replaced by the corresponding normal quantile.

SLIDE 57

Likelihood Ratio Statistic

Suppose our model is determined by a parameter θ (p × 1) whose true value is θ0 and whose ML estimate is θ̂.

Provided the conditions for asymptotic normality of the ML estimator hold, in large samples the likelihood ratio statistic satisfies

W(θ0) = −2 log RL(θ0) = 2[ l(θ̂) − l(θ0) ] →d χ²_p.

When θ is scalar, W(θ0) →d χ²₁.
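A Python sketch for the exponential case (the data are invented): W is computed from the log-likelihood l(θ) = −n log θ − Σyᵢ/θ, and since the χ²₁ CDF at w is erf(√(w/2)), the p-value needs only `math.erf`.

```python
import math

y = [0.8, 2.1, 0.3, 1.5, 0.9, 1.2, 2.4, 0.6, 1.1, 1.7]   # illustrative data
n, s = len(y), sum(y)
theta_hat = s / n

def loglik(theta):
    # exponential log-likelihood (constants dropped)
    return -n * math.log(theta) - s / theta

def lr_stat(theta0):
    # W(theta0) = 2 * [l(theta_hat) - l(theta0)]
    return 2 * (loglik(theta_hat) - loglik(theta0))

def chi2_1_pvalue(w):
    # for chi-square with 1 df: P(W > w) = 1 - erf(sqrt(w/2))
    return 1 - math.erf(math.sqrt(w / 2))

w = lr_stat(1.0)   # test H0: theta = 1
print(round(w, 3), round(chi2_1_pvalue(w), 3))
```

Here W is small, so H0 : θ = 1 is not rejected for these data.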

SLIDE 58

Likelihood Ratio Statistic

The LR statistic is widely used for hypothesis testing. To test

H0 : θ = θ0   against   H1 : θ ≠ θ0,

we can approximate W using the quadratic approximation,

W ≈ n I1(θ̂)(θ0 − θ̂)²,

which under H0 is approximately χ²₁. This approximating statistic is usually called the Wald statistic. Using W we can compute confidence intervals and p-values.

Example: exponential distribution.

SLIDE 59

Profile likelihood

Until now we have treated all elements of θ equally, but in practice some are more important than others. We write

θᵀ = (ψᵀ, λᵀ)   (p × 1, q × 1).

If our focus is on ψ (that is, we want to build confidence intervals for ψ, compute p-values, or test hypotheses about ψ), we say that ψ is the parameter of interest and λ is the vector of nuisance parameters. We would like to eliminate λ and make inference about ψ.

SLIDE 61

Profile likelihood

We say that two models are nested if one reduces to the other when certain parameters are fixed. Thus a model with parameters (ψ0, λ) is nested within the more general model (ψ, λ). A natural statistic with which to compare two nested models is

Wp(ψ0) = 2[ l(ψ̂, λ̂) − l(ψ0, λ̂_{ψ0}) ],

which is called the generalized likelihood ratio statistic. It follows that

Wp(ψ0) →d χ²_p,

that is, even though the nuisance parameters are estimated, the likelihood ratio statistic has an approximate chi-squared distribution.

slide-63
SLIDE 63

Likelihood ML estimator Summaries ML properties LR test Profile likelihood

Profile likelihood

Often the parameter of interest ψ is scalar, or of much smaller dimension than λ, and we wish to form a confidence interval for ψ0 regardless of λ.

To do so, we use the profile log-likelihood

lp(ψ) = max_λ l(ψ, λ) = l(ψ, λ̂_ψ),

where λ̂_ψ is the ML estimate of λ for fixed ψ.
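For the Normal model with ψ = µ and nuisance λ = σ², the inner maximization has the closed form λ̂_µ = Σ(yᵢ − µ)²/n. A Python sketch (reusing the sample from the earlier Normal slide) shows that the profile maximizer coincides with ȳ:

```python
import math

y = [6.38, 1.39, 5.67, 3.26, 1.96, 3.73, -0.32, 0.54, 2.53, 5.40]
n = len(y)

def profile_loglik(mu):
    # For fixed mu, the ML estimate of the nuisance parameter sigma^2 is
    # sigma2_hat(mu) = sum((y_i - mu)^2)/n; plugging it back in gives
    # lp(mu) = -(n/2)*log(sigma2_hat(mu)) - n/2 (up to a constant)
    sigma2_mu = sum((yi - mu) ** 2 for yi in y) / n
    return -(n / 2) * math.log(sigma2_mu) - n / 2

grid = [i / 100 for i in range(-200, 801)]
mu_hat = max(grid, key=profile_loglik)
print(mu_hat, sum(y) / n)   # profile maximizer equals the overall ML estimate ybar
```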

SLIDE 64

Profile likelihood

In this way, we can form a (1 − 2α) confidence region for ψ0 as

{ ψ : lp(ψ) ≥ lp(ψ̂) − (1/2) c_p(1 − 2α) },

where c_p(1 − 2α) is the (1 − 2α) quantile of the χ²_p distribution.

Example: Normal distribution (blackboard).