Likelihood ML estimator Summaries ML properties LR test Profile likelihood
Applied Statistics. Lecturer: Serena Arima.
Statistical models
Statistics concerns what can be learned from data using statistical models to study the variability of the data.
The key feature of a statistical model is that variability is represented using probability distributions, which form the building-blocks from which the model is constructed. The key idea in statistical modelling is to treat the data as the outcome of a random experiment.
Likelihood
Suppose we have observed the value y of a random variable Y whose probability density function is assumed known up to the value of a parameter θ. We write
f (y; θ)
to emphasize that the density is a function of both the data y and the parameter θ. Here θ ∈ Θ, the parameter space, and y ∈ Y, the sample space.
Our goal is to make statements about the distribution of Y based on the observed value y, that is, to make inference about θ.
A fundamental tool is the likelihood for θ based on y, defined as L(θ) = f (y; θ), θ ∈ Θ, regarded as a function of θ for fixed y.
When Y is discrete we use f (y; θ) = Pr(Y = y; θ).
Binomial example: a coin gives Head with probability θ and Tail with probability 1 − θ. Suppose we toss the coin n = 10 times.
1. What is the parameter space? θ ∈ Θ = [0, 1]
2. What is the sample space (number of heads in the 10 tosses)? y ∈ {0, 1, 2, ..., 10}
3. Which random variable represents the experiment? Y ∼ Binomial(10, θ)
The likelihood is
L(θ) = (10 choose y) θ^y (1 − θ)^{10−y}
Suppose that the experiment leads to Y = y = 7:
For θ = 0.6, L(θ; y) = 0.215. For θ = 0.5, L(θ; y) = 0.117. Hence θ = 0.6 is more likely than θ = 0.5; more precisely, θ = 0.6 is 0.215/0.117 = 1.838 times more likely than θ = 0.5. What is the most likely value?
[Figure: likelihood L(θ) for Bin(n = 10, y = 7), plotted over θ ∈ (0, 1).]
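These numbers are easy to reproduce; a minimal Python sketch (the grid search is only an illustration, since the maximizer is y/n = 0.7):

```python
from math import comb

def binom_lik(theta, n=10, y=7):
    """Likelihood L(theta) = C(n, y) * theta^y * (1 - theta)^(n - y)."""
    return comb(n, y) * theta**y * (1 - theta)**(n - y)

# Compare the two candidate values from the slide.
L_06 = binom_lik(0.6)
L_05 = binom_lik(0.5)
print(round(L_06, 3), round(L_05, 3), round(L_06 / L_05, 3))  # prints: 0.215 0.117 1.835

# Scan a grid to locate the most likely value (it is y/n = 0.7).
grid = [i / 1000 for i in range(1, 1000)]
theta_hat = max(grid, key=binom_lik)
print(theta_hat)  # prints: 0.7
```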
When y = (y1, y2, ..., yn) is a collection of independent observations, the likelihood is
L(θ) = ∏_{j=1}^{n} f (yj; θ).
Normal example: we observe n replicates (X1, X2, ..., Xn) from a Normal random variable, Xi ∼ N(µ, σ²).
The likelihood of a Normal experiment is
L(y; θ = (µ, σ²)) = (2πσ²)^{−n/2} exp{ −(n/(2σ²)) [s² + (x̄ − µ)²] }
where x̄ is the sample mean and s² = n^{−1} Σ_{i=1}^{n} (xi − x̄)².
Proof (blackboard).
Suppose that n = 10 and that σ² = 4 is known and fixed; hence L(y; θ = (µ, σ²)) = L(y, σ²; θ = µ), a function of µ alone. For the sample z = (6.38, 1.39, 5.67, 3.26, 1.96, 3.73, −0.32, 0.54, 2.53, 5.40) the likelihood is
[Figure: likelihood L(µ) for σ² = 4 fixed, plotted over µ ∈ (−2, 8).]
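The curve above can be reproduced numerically; a minimal Python sketch using the sample z (the closed form (2πσ²)^{−n/2} exp{−Σ(x − µ)²/(2σ²)} is equivalent to the s² form above):

```python
from math import exp, pi

z = [6.38, 1.39, 5.67, 3.26, 1.96, 3.73, -0.32, 0.54, 2.53, 5.40]
n, sigma2 = len(z), 4.0

def normal_lik(mu):
    """L(mu) for the sample z with sigma^2 = 4 known and fixed."""
    ss = sum((x - mu) ** 2 for x in z)
    return (2 * pi * sigma2) ** (-n / 2) * exp(-ss / (2 * sigma2))

# The likelihood peaks at the sample mean (here 3.054), consistent
# with the figure's maximum near mu = 3.
mu_hat = sum(z) / n
print(mu_hat, normal_lik(mu_hat) > normal_lik(3.0))
```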
Suppose now that n = 10 and that µ = 3 is known and fixed; hence L(y; θ = (µ, σ²)) = L(y, µ; θ = σ²). For the same sample the likelihood is
[Figure: likelihood L(σ²) for µ = 3 fixed, plotted over σ² ∈ (0.8, 8).]
When both parameters are unknown, the likelihood L(y; θ = (µ, σ²)) is a surface:
[Figure: likelihood surface L(µ, σ²) when both µ and σ² are unknown.]
Example 1: Exponential distribution. Let y1, ..., yn be a random sample from the exponential density f (y; θ) = θ^{−1} e^{−y/θ} (y > 0, θ > 0). The parameter space is Θ = R⁺ and the sample space is the Cartesian product (R⁺)ⁿ.
The likelihood is
L(θ) = ∏_{i=1}^{n} θ^{−1} e^{−yi/θ} = θ^{−n} exp( −θ^{−1} Σ_{i=1}^{n} yi )
Example 2: Cauchy distribution. The Cauchy distribution centered at θ is
f (y; θ) = 1 / (π[1 + (y − θ)²])
where y ∈ R and θ ∈ R. The likelihood for a random sample y1, ..., yn is
L(θ) = ∏_{i=1}^{n} 1 / (π[1 + (yi − θ)²])
The sample space is Rⁿ and the parameter space is R.
Example 3: Binomial experiment. Consider a random variable R ∼ Binom(m, π),
Pr(R = r) = (m! / (r!(m − r)!)) π^r (1 − π)^{m−r}
Suppose π depends on a variable x1 through the relation
π = exp(β0 + β1x1) / (1 + exp(β0 + β1x1)).
If the Ri are independent, the likelihood for the random sample is
L(β0, β1) = ∏_{i=1}^{n} Pr(Ri = ri; β0, β1)
= [ ∏_{i=1}^{n} m! / (ri!(m − ri)!) ] × exp( β0 Σ_{i=1}^{n} ri + β1 Σ_{i=1}^{n} ri x1i ) / ∏_{i=1}^{n} (1 + exp(β0 + β1x1i))^m
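A hedged sketch of how this likelihood can be evaluated and maximized in practice; the data and the crude grid search are illustrative, not from the lecture:

```python
from math import comb, exp, log

# Hypothetical data: n = 4 groups of m = 20 trials with covariate x1.
m = 20
x1 = [0.0, 1.0, 2.0, 3.0]
r = [3, 7, 12, 17]          # observed successes per group

def loglik(b0, b1):
    """Log-likelihood of (b0, b1) under the logistic-binomial model."""
    ll = 0.0
    for xi, ri in zip(x1, r):
        pi = exp(b0 + b1 * xi) / (1 + exp(b0 + b1 * xi))
        ll += log(comb(m, ri)) + ri * log(pi) + (m - ri) * log(1 - pi)
    return ll

# Crude grid search for the ML estimates (a real analysis would use
# Newton-type iterations, as in logistic regression software).
grid = [i / 20 for i in range(-60, 61)]       # -3.00 to 3.00 step 0.05
b0_hat, b1_hat = max(((b0, b1) for b0 in grid for b1 in grid),
                     key=lambda p: loglik(*p))
print(b0_hat, b1_hat)
```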
Properties of the likelihood function
1. L(θ) is defined up to a constant;
2. L(θ) is NOT a probability distribution;
3. L(θ) is invariant to known transformations of the data.
1. L(θ) is defined up to a constant.
For the Normal example with known σ², the likelihood depends only on µ (and the data), so
L(y, σ²; θ = µ) ∝ exp{ −(n/(2σ²)) [s² + (x̄ − µ)²] }
2. L(θ) is NOT a probability distribution.
If you integrate the likelihood over θ, you do not necessarily obtain 1!
3. L(θ) is invariant to known transformations of the data.
Suppose Z is a known one-to-one transformation of Y. The probability density function of Z is
fZ(z; θ) = fY(y; θ) |dy/dz|
where |dy/dz| is the so-called Jacobian.
Since the Jacobian does not depend on θ, the likelihood based on z equals that based on y.
Relative likelihood
When the value that maximizes the likelihood is finite, we define the relative likelihood of θ to be
RL(θ) = L(θ) / max_{θ′} L(θ′)
Note that RL(θ) ∈ [0, 1] and log RL(θ) ∈ (−∞, 0].
When there is a particular parametric model for a set of data, the likelihood provides a natural basis for assessing the plausibility of different parameter values, but how should it be interpreted? One viewpoint is that values of θ can be compared using the scale

1 ≥ RL(θ) > 1/3          θ strongly supported
1/3 ≥ RL(θ) > 1/10       θ supported
1/10 ≥ RL(θ) > 1/100     θ weakly supported
1/100 ≥ RL(θ) > 1/1000   θ poorly supported
1/1000 ≥ RL(θ) > 0       θ very poorly supported

Under this pure likelihood approach, values of θ are compared solely in terms of relative likelihoods. The cutoffs are arbitrary and they do not take into account the dimension of θ. That is why this interpretation is not common in practice.
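The scale can be applied to the earlier binomial example (n = 10, y = 7); a short Python sketch, where the `support` helper is hypothetical and simply encodes the table above:

```python
from math import comb

n, y = 10, 7
def L(theta):
    """Binomial likelihood for n = 10 tosses with y = 7 heads."""
    return comb(n, y) * theta**y * (1 - theta)**(n - y)

theta_hat = y / n                       # ML estimate, 0.7
def RL(theta):
    """Relative likelihood L(theta) / L(theta_hat)."""
    return L(theta) / L(theta_hat)

def support(rl):
    """Map a relative likelihood to the verbal scale above (hypothetical helper)."""
    if rl > 1 / 3:    return "strongly supported"
    if rl > 1 / 10:   return "supported"
    if rl > 1 / 100:  return "weakly supported"
    if rl > 1 / 1000: return "poorly supported"
    return "very poorly supported"

print(round(RL(0.5), 3), support(RL(0.5)))  # prints: 0.439 strongly supported
```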
Maximum Likelihood estimator
The maximum likelihood (ML) estimate of θ, θ̂, is a value of θ that maximizes the likelihood L(θ), or equivalently the log-likelihood l(θ) = log L(θ). The estimate θ̂ often satisfies the likelihood equation
dl(θ̂)/dθ = 0
The score function is defined as U(Y; θ) = dl(θ)/dθ. We check that θ̂ gives a local maximum by verifying that −d²l(θ̂)/dθ² > 0.
If θ is a p × 1 vector, we have p likelihood equations to be solved simultaneously. We check that θ̂ gives a local maximum by verifying that
J(θ̂) = −d²l(θ̂)/(dθ dθᵀ) > 0 (positive definite)
The quantity J(θ) is called the observed information.
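In practice the likelihood equation is often solved iteratively. A minimal Newton-Raphson sketch using the analytic score and observed information of the exponential model (illustrative data; for this model the closed-form answer is the sample mean, which the iteration recovers):

```python
# Newton-Raphson on the likelihood equation U(theta) = 0 for the
# exponential model f(y; theta) = theta^{-1} exp(-y / theta).
y = [1.2, 0.4, 3.1, 2.2, 0.9, 1.6]   # illustrative data
n, s = len(y), sum(y)

def score(t):
    """U(theta) = dl/dtheta = -n/theta + sum(y)/theta^2."""
    return -n / t + s / t**2

def obs_info(t):
    """J(theta) = -d^2 l/dtheta^2 = -n/theta^2 + 2*sum(y)/theta^3."""
    return -n / t**2 + 2 * s / t**3

theta = 1.0                          # starting value
for _ in range(50):
    step = score(theta) / obs_info(theta)
    theta += step
    if abs(step) < 1e-10:
        break

print(theta)                         # converges to the sample mean y-bar
```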
Expected and observed information
In a model with log-likelihood l(θ), the observed information is defined as J(θ) = −d²l(θ)/dθ². When θ is a p × 1 vector, the observed information is the matrix
J(θ) = −d²l(θ)/(dθ dθᵀ)
whose (r, s) element is −∂²l(θ)/(∂θr ∂θs).
Before the experiment is performed we have no data, so we cannot obtain the observed information. However, we can calculate the expected or Fisher information (passing from y to Y) as follows:
I(θ) = E[ −d²l(θ)/dθ² ]
When θ is a p × 1 vector, the expected information matrix is
I(θ) = E[ −d²l(θ)/(dθ dθᵀ) ]
whose (r, s) element is −E[ ∂²l(θ)/(∂θr ∂θs) ].
The observed and expected information for a normal random variable with mean µ and variance σ² are

J(µ, σ²) = [ n/σ²              (n/σ⁴)(ȳ − µ)
             (n/σ⁴)(ȳ − µ)     −n/(2σ⁴) + σ⁻⁶ Σ_{i=1}^{n} (yi − µ)² ]

and

I(µ, σ²) = diag( n/σ², n/(2σ⁴) )

Exercise: X1, ..., Xn ∼ Exponential(θ); find the ML estimate of θ.
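The entries of J(µ, σ²) can be verified numerically by finite differences of the Normal log-likelihood; a short Python check with an illustrative sample (h is a step size chosen for numerical stability):

```python
from math import log, pi

# Finite-difference check of the observed-information formulas
# J_mumu = n/sigma^2 and J_mus2 = (n/sigma^4)(ybar - mu).
y = [2.1, 3.4, 2.9, 4.0, 3.2]    # illustrative sample
n = len(y)
ybar = sum(y) / n

def l(mu, s2):
    """Normal log-likelihood."""
    return -n / 2 * log(2 * pi * s2) - sum((v - mu)**2 for v in y) / (2 * s2)

h = 1e-4
mu, s2 = 3.0, 1.5

# Second derivative in mu via central differences.
J_mumu = -(l(mu + h, s2) - 2 * l(mu, s2) + l(mu - h, s2)) / h**2

# Mixed second derivative in (mu, sigma^2).
J_mus2 = -(l(mu + h, s2 + h) - l(mu + h, s2 - h)
           - l(mu - h, s2 + h) + l(mu - h, s2 - h)) / (4 * h**2)

print(round(J_mumu, 4), round(J_mus2, 4))
```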
Some properties
Proposition 1. Under general conditions,
Eθ[U(Y; θ)] = 0
Var[U(Y; θ)] = −Eθ[ d²l(θ)/dθ² ]
Proof (blackboard)
Proposition 2. Under general conditions,
J(θ)/n →p I1(θ)
where I1(θ) is the expected information associated with a sample of size 1.
Proof (blackboard)
Summaries: quadratic approximation
In a problem with one or two parameters the likelihood can be visualized. However, realistic models may have dozens of parameters, and we then need to summarize the likelihood.
In regular situations, we can approximate the relative likelihood with a Gaussian density. When θ is scalar, we can write the log relative likelihood as
log RL(θ) = l(θ) − l(θ̂)
where θ̂ is the ML estimate.
Expanding l(θ) in a Taylor series about θ̂ we get
l(θ) = l(θ̂) + (θ − θ̂) l′(θ̂) + (1/2)(θ − θ̂)² l″(θ̂)
     = l(θ̂) + U(θ̂; y)(θ − θ̂) − (1/2) J(θ̂)(θ − θ̂)²
     ≈ l(θ̂) − (n/2) I1(θ̂)(θ − θ̂)²
Hence we have that
log RL(θ) = log[ L(θ)/L(θ̂) ] ≈ −(n/2) I1(θ̂)(θ − θ̂)²
and the relative likelihood can be approximated by a Normal density with mean θ̂ and variance (n I1(θ̂))⁻¹, that is
RL(θ) ≈ exp{ −(n/2) I1(θ̂)(θ − θ̂)² }
Example: Poisson Likelihood.
[Figure: exact and approximated relative likelihood for a Poisson density, plotted against θ.]
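The comparison in the figure is easy to reproduce; a Python sketch for the Poisson model, where θ̂ = ȳ and I1(θ) = 1/θ (the counts are illustrative):

```python
from math import exp, log

# Exact vs quadratic-approximate relative likelihood, Poisson model.
y = [2, 4, 3, 1, 5, 3, 2, 4]     # illustrative counts
n, s = len(y), sum(y)
theta_hat = s / n                # ML estimate y-bar

def logRL(theta):
    """Exact log relative likelihood l(theta) - l(theta_hat)."""
    return s * log(theta / theta_hat) - n * (theta - theta_hat)

def logRL_quad(theta):
    """Quadratic approximation -(n / (2 theta_hat)) (theta - theta_hat)^2."""
    return -n / (2 * theta_hat) * (theta - theta_hat)**2

# The two curves agree near theta_hat and drift apart in the tails.
for t in (2.5, 3.0, 3.5):
    print(t, round(exp(logRL(t)), 3), round(exp(logRL_quad(t)), 3))
```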
Summaries: sufficient statistics
The likelihood often depends on the data through some low-dimensional function s(y) of the yj and then a suitable summary can be given in terms of this. If we believe that our model is correct, we need only these functions to calculate the likelihood for any value of θ.
Suppose we have observed data y generated by a distribution f (y; θ), and that the statistic s(y) is a function of y such that fY|S(y|s; θ) does not depend on θ. Then S is said to be a sufficient statistic for θ.
Factorization criterion. A necessary and sufficient condition for a statistic S to be a sufficient statistic for a parameter θ in a family of probability density functions f (y; θ) is that the density of Y can be expressed as
f (y; θ) = g(s(y); θ) h(y)
where h(y) does not depend on θ.
Examples: Bernoulli distribution: S = Σ Yj; Exponential distribution: S = Σ Yi or S = Ȳ; Normal distribution: S = (Ȳ, s²); ...
Properties of the ML estimator
Provided that the likelihood function is correctly specified, it can be shown under weak regularity conditions that:
1. The ML estimator is consistent for θ (θ̂ →p θ);
2. The ML estimator is asymptotically efficient (that is, asymptotically the ML estimator has the smallest variance among all consistent asymptotically normal estimators);
3. The ML estimator is asymptotically normally distributed: √n(θ̂ − θ) →d N(0, V), where V is the asymptotic covariance matrix.
Properties of the ML estimator: example
Consider random samples of size n = 10 from the exponential distribution with true mean θ0 = 1. The ML estimate is θ̂ = ȳ and I(θ) = n/θ².
1. Sample n = 10 from an Exp(1);
2. Compute the ML estimate;
3. Repeat Steps 1 and 2 5000 times;
4. Make a histogram of the values.
[Figure: histogram of the 5000 simulated ML estimates θ̂ with the asymptotic normal density overlaid.]
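Steps 1-4 can be coded directly; a Python sketch (the seed is illustrative and the histogram itself is omitted; we just check that the mean and variance of the estimates are close to θ0 = 1 and θ0²/n = 0.1):

```python
import random

random.seed(42)                  # illustrative seed for reproducibility

n, theta0, B = 10, 1.0, 5000
estimates = []
for _ in range(B):
    # Step 1: sample n = 10 from an Exp(1) (expovariate takes the rate).
    sample = [random.expovariate(1 / theta0) for _ in range(n)]
    # Step 2: the ML estimate is the sample mean.
    estimates.append(sum(sample) / n)

# Steps 3-4: with B = 5000 replicates, the empirical mean and variance
# of theta-hat should be near theta0 = 1 and theta0^2 / n = 0.1.
mean_hat = sum(estimates) / B
var_hat = sum((e - mean_hat)**2 for e in estimates) / B
print(round(mean_hat, 2), round(var_hat, 3))
```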
The covariance matrix V is defined as V = I(θ)⁻¹, where I is the Fisher information matrix
I(θ) = E[ −d²l(θ)/(dθ dθ′) ]
Loosely speaking, this matrix summarizes the expected amount of information about θ contained in the sample.
Given the asymptotic efficiency of the ML estimator, the inverse of the information matrix I(θ)⁻¹ provides a lower bound on the asymptotic covariance matrix of any consistent asymptotically normal estimator of θ. The ML estimator is asymptotically efficient because it attains this bound, often referred to as the Cramér-Rao lower bound.
In practice V can be estimated consistently as follows:
V̂ = [ −(1/n) Σ_{i=1}^{n} d²li(θ)/(dθ dθ′) |_{θ = θ̂} ]⁻¹
where we take derivatives first and in the result replace the unknown θ with θ̂.
Inferential use of the ML estimates
The main use of this approximation is to construct confidence regions for θ and to test hypotheses.
Scalar parameter. If θ is scalar, the (1 − 2α) confidence interval for θ0 is
(θ̂ − z_{1−α} I(θ̂)^{−1/2}, θ̂ + z_{1−α} I(θ̂)^{−1/2})
The corresponding interval using the observed information J(θ̂),
(θ̂ − z_{1−α} J(θ̂)^{−1/2}, θ̂ + z_{1−α} J(θ̂)^{−1/2}),
is easier to calculate because it requires no expectations, and moreover its coverage probability is often closer to the nominal level. Both intervals are symmetric about θ̂.
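A worked instance for the exponential model, where I(θ) = n/θ² so I(θ̂)^{−1/2} = θ̂/√n; the data and the 95% level (z_{1−α} = 1.96) are illustrative:

```python
from math import sqrt

# 95% Wald-type interval for the exponential mean theta, using the
# expected information I(theta) = n / theta^2.
y = [1.2, 0.4, 3.1, 2.2, 0.9, 1.6, 0.7, 2.8]   # illustrative data
n = len(y)
theta_hat = sum(y) / n                         # ML estimate y-bar
z = 1.96                                       # z_{1-alpha} for alpha = 0.025

half = z * theta_hat / sqrt(n)                 # z * I(theta_hat)^{-1/2}
print((round(theta_hat - half, 3), round(theta_hat + half, 3)))
```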
Vector parameter. When θ is a vector, confidence sets for the r-th element of θ, θr, may be based on the fact that the corresponding ML estimator θ̂r is approximately N(θr, vrr). This gives intervals of the same form as in the scalar case, with θ̂ replaced by θ̂r and with I(θ̂)⁻¹ or J(θ̂)⁻¹ replaced by their (r, r) elements.
Inferential use of the ML estimates: example
For the Normal distribution we know that I(θ) = diag(n/σ², n/(2σ⁴)). Hence the (1 − 2α) confidence intervals for µ and σ² based on the large-sample results are
ȳ ± n^{−1/2} σ̂ z_{1−α} and σ̂² ± (2/n)^{1/2} σ̂² z_{1−α}
For µ, the asymptotic approximation gives an interval with the same form as the exact interval ȳ ± n^{−1/2} s t_{n−1}(α), but with s replaced by σ̂ and the t quantile replaced by the corresponding normal quantile.
Likelihood Ratio Statistic
Suppose our model is determined by a parameter θ (p × 1) whose true value is θ0 and whose ML estimate is θ̂.
Provided the conditions for asymptotic normality of the ML estimator hold, in large samples the likelihood ratio statistic
W(θ0) = −2 log RL(θ0) = 2[l(θ̂) − l(θ0)] →d χ²_p
When θ is scalar, W(θ0) →d χ²_1.
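A worked instance of W(θ0) for the exponential model (illustrative data; 3.84 is the 5% critical value of χ²_1):

```python
from math import log

# Likelihood ratio statistic W(theta0) = 2 [l(theta_hat) - l(theta0)]
# for the exponential model with l(theta) = -n log(theta) - sum(y)/theta.
y = [1.2, 0.4, 3.1, 2.2, 0.9, 1.6, 0.7, 2.8]   # illustrative data
n, s = len(y), sum(y)

def l(t):
    """Exponential log-likelihood."""
    return -n * log(t) - s / t

theta_hat = s / n                  # ML estimate y-bar
W = 2 * (l(theta_hat) - l(1.0))    # test H0: theta0 = 1
print(round(W, 3), W > 3.84)       # reject H0 at the 5% level?
```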
It is widely used for hypothesis testing. To test
H0: θ = θ0 against H1: θ ≠ θ0
we can approximate W as
W = n I1(θ̂)(θ0 − θ̂)²
which under H0 is approximately χ²_1. The statistic W is usually called the Wald statistic. Using W we can compute confidence intervals and p-values. Example: exponential distribution.
Profile likelihood
Until now we have treated all elements of θ equally, but in practice some are more important than others. We write
θᵀ = (ψᵀ, λᵀ), of dimensions p × 1 and q × 1.
If our focus is on ψ (that is, we want to build confidence intervals for ψ, p-values, or hypothesis tests on ψ), we say that ψ is the parameter of interest and λ is the vector of nuisance parameters.
We would like to eliminate λ and make inference about ψ.
We say that two models are nested if one reduces to the other when certain parameters are fixed. Thus a model with parameters (ψ0, λ) is nested within the more general model (ψ, λ). A natural statistic with which to compare two nested models is
Wp(ψ0) = 2[ l(ψ̂, λ̂) − l(ψ0, λ̂_{ψ0}) ]
which is called the generalized likelihood ratio statistic. It follows that
Wp(ψ0) →d χ²_p
that is, even though nuisance parameters are estimated, the likelihood ratio statistic has an approximate chi-squared distribution.
Often the parameter of interest ψ is scalar, or of much smaller dimension than λ, and we wish to form a confidence interval for ψ0 regardless of λ.
To do so, we use the profile log-likelihood
lp(ψ) = max_λ l(ψ, λ) = l(ψ, λ̂_ψ)
where λ̂_ψ is the ML estimate of λ for fixed ψ.
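For the Normal model with ψ = µ and λ = σ², the inner maximization has the closed form σ̂²_µ = n^{−1} Σ (yi − µ)²; a short Python sketch using the earlier sample:

```python
from math import log, pi

# Profile log-likelihood for the Normal mean: for fixed mu, the ML
# estimate of the nuisance parameter is s2(mu) = (1/n) sum (y_i - mu)^2,
# so l_p(mu) = -(n/2) log(2 pi s2(mu)) - n/2.
y = [6.38, 1.39, 5.67, 3.26, 1.96, 3.73, -0.32, 0.54, 2.53, 5.40]
n = len(y)

def lp(mu):
    """Profile log-likelihood l_p(mu) = l(mu, sigma2-hat(mu))."""
    s2_mu = sum((v - mu)**2 for v in y) / n   # lambda-hat for this psi
    return -n / 2 * log(2 * pi * s2_mu) - n / 2

# The profile peaks at the overall ML estimate mu-hat = y-bar.
ybar = sum(y) / n
print(round(ybar, 3), lp(ybar) > lp(3.0))
```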
In this way, we can form a (1 − 2α) confidence region for ψ0 as
{ ψ : lp(ψ) ≥ lp(ψ̂) − (1/2) c_p(1 − 2α) }
where c_p(1 − 2α) denotes the (1 − 2α) quantile of the χ²_p distribution.
Example: Normal distribution (blackboard)