Lecture 7: Maximum likelihood estimators and estimation topics
Jason Mezey jgm45@cornell.edu
Feb. 23, 2016 (T) 8:40-9:55
Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01

Announcements:
- Homework #2 is graded (!!); the key will be available (we will go over it in computer lab).
- Homework #4 will be available ~week from today.
- The class site has been updated (videos, updated schedule, homework key, new supplemental material, etc.).

Summary of this lecture: an introduction to estimation and a discussion of maximum likelihood estimators (MLEs).
[Slide figure: the conceptual chain from experiment to estimator; only the recoverable structure is reproduced here.]

An experiment (with assumptions) $\to$ a sample space $\Omega$ with a probability function $Pr(F)$ on events $F$, mapping sets to the reals (Axioms of probability!) $\to$ a random variable $X(\omega), \omega \in \Omega$, taking values $X = x$ with distribution $Pr(X)$ $\to$ a sample of size $n$, $[X_1 = x_1, ..., X_n = x_n]$, with sampling distribution $Pr([X_1 = x_1, ..., X_n = x_n])$, which represents the probability of every sample under given assumptions (e.g., i.i.d.) and is indexed by constant(s) (= parameter $\theta \in \Theta$) belonging to a probability model "family" $\to$ a statistic $T(x)$ with sampling distribution $Pr(T(X)|\theta)$ evaluated at $\theta$ $\to$ an estimator $\hat{\theta}$.
The goal of estimation (inference) is to determine the value of a parameter of a distribution (from an assumed family of probability distributions indexed by parameters) on the basis of a sample. A sample is a set of $n$ observations of experimental trials (= random vector!), for which we usually assume i.i.d.!!:

identically distributed: $Pr(X_1 = x_1) = Pr(X_2 = x_2) = ... = Pr(X_n = x_n)$

independent: $Pr(X = x) = Pr(X_1 = x_1)Pr(X_2 = x_2) \cdots Pr(X_n = x_n)$

A numeric sketch of this factorization follows.
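A minimal sketch of the i.i.d. factorization (my own illustration, not lecture code; the sample and parameter value are made up), assuming numpy is available:

    # Under independence, the probability of an i.i.d. Bernoulli sample is the
    # product of the marginals Pr(X_i = x_i | p) = p^x_i * (1 - p)^(1 - x_i).
    import numpy as np

    p = 0.3                        # an assumed Bernoulli parameter
    x = np.array([1, 0, 0, 1, 0])  # a hypothetical observed sample of 5 trials

    marginals = p ** x * (1 - p) ** (1 - x)  # Pr(X_i = x_i) for each trial
    print(np.prod(marginals))                # Pr(X = x) = product of marginals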
A statistic is a function on a sample: $T(x) = T([x_1, x_2, ..., x_n]) = t$. An estimator is a statistic whose specific value $T(x) = \hat{\theta}$ we take as our estimate of a parameter. Since the sample $x = [x_1, x_2, ..., x_n]$ has a sampling distribution $Pr([X_1, X_2, ..., X_n])$, the statistic also has a distribution $Pr(T(X))$, and hence the estimator has a distribution $Pr(\hat{\theta})$.
- The value of an estimator for an observed sample is our evidence for being the true value of a parameter.
- We want a statistic that returns a reasonable or "good" estimator of the true parameter value under a variety of conditions; much of estimation theory concerns assessing "good" and our underlying assumptions.
- Since there is a probability distribution on a statistic, and an estimator is just a statistic, there is an underlying probability distribution on an estimator: $Pr(T(X)|\theta)$.
- We can define an estimator to output a single value or a vector of estimates.
- No estimator can return the true value of the parameter for every possible sample (hence no perfect estimator!), but we can define "bad" estimators (examples?).
Formally, an estimator of parameter(s) $\theta$ is a statistic (a function on a sample!!) that is a vector valued function on samples: $T(X = x) = \hat{\theta} = [\hat{\theta}_1, \hat{\theta}_2, ...]$. We will concentrate on maximum likelihood estimators (MLEs), which is one class of estimators, in this course; but just keep this big picture in mind.
A probability distribution is a function of the sample with fixed constants in the formula called parameters, where different sample inputs produce different outputs. A likelihood is the same formula treated as a function of the parameters for a fixed sample. Likelihoods are therefore not probability functions, e.g. they are functions of parameters and they are used for estimation; the true parameter (that determines the probability of the values of $X$ and $T(X)$!) is a specific value in $\Theta$.
For a normal random variable, the pdf and the likelihood have the same form:

$$Pr(X = x|\mu, \sigma^2) = f_X(x|\mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

$$L(\mu, \sigma^2|X = x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
For a sample of $n$ i.i.d. normal random variables (a multivariate, i.e. $n$-variate, normal distribution), the (marginal) probability that a single observation in our sample is $x_i$ is:

$$Pr(X_i = x_i|\mu, \sigma^2) = f_{X_i}(x_i|\mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}$$

and independence allows us to write the pdf of the entire sample as follows:

$$Pr(X = x|\mu, \sigma^2) = Pr(X_1 = x_1)Pr(X_2 = x_2) \cdots Pr(X_n = x_n) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}$$

so the likelihood of the whole sample is:

$$L(\mu, \sigma^2|X = x) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}$$
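A minimal sketch of the pdf/likelihood distinction (my own illustration; the sample is simulated and the helper names are assumptions), assuming numpy is available:

    # The same normal formula, read two ways: a pdf varies x with fixed
    # parameters; a likelihood varies the parameters for a fixed sample.
    import numpy as np

    def norm_pdf(x, mu, sigma2):
        return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

    def likelihood(mu, sigma2, sample):
        # L(mu, sigma^2 | x) = prod_i f(x_i | mu, sigma^2)
        return np.prod(norm_pdf(sample, mu, sigma2))

    rng = np.random.default_rng(1)
    sample = rng.normal(loc=0.0, scale=1.0, size=10)  # X_i ~ N(0, 1)

    # For this fixed sample, the likelihood is a function of mu (sigma^2 = 1);
    # it is maximized at mu = sample mean.
    for mu in (-1.0, 0.0, 1.0):
        print(mu, likelihood(mu, 1.0, sample))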
[Slide figure: for a sample where each $X_i \sim N(\mu = 0, \sigma^2 = 1)$, a plot where the x and y axes are $\mu$ and $\sigma^2$ and the z-axis is the likelihood $L(\mu, \sigma^2|X = x)$, alongside the marginal likelihood for $\mu$ when fixing $\sigma^2 = 1$.]
The value of the parameter that maximizes the likelihood function for our observed sample is our estimator (!!):

$$MLE(\hat{\theta}) = \hat{\theta} = \mathrm{argmax}_{\theta \in \Theta} L(\theta|x)$$

For the normal model these turn out to be:

$$MLE(\hat{\mu}) = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad MLE(\hat{\sigma}^2) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

and for the Bernoulli model:

$$MLE(\hat{p}) = \frac{1}{n}\sum_{i=1}^{n} x_i$$

A numeric check of the argmax definition follows.
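A minimal grid-search sketch of the argmax definition (my own illustration; the sample is simulated and the grid bounds are arbitrary), assuming numpy is available:

    # Maximize L(mu, sigma^2 | x) over a grid; compare with the closed-form MLEs.
    import numpy as np

    def norm_pdf(x, mu, sigma2):
        return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

    rng = np.random.default_rng(1)
    sample = rng.normal(loc=0.0, scale=1.0, size=10)

    mus = np.linspace(-2.0, 2.0, 401)
    sigma2s = np.linspace(0.1, 3.0, 291)
    L = np.array([[np.prod(norm_pdf(sample, m, s2)) for s2 in sigma2s] for m in mus])

    i, j = np.unravel_index(np.argmax(L), L.shape)  # indices of the grid maximum
    print("grid argmax: ", mus[i], sigma2s[j])
    print("closed form: ", sample.mean(), np.mean((sample - sample.mean()) ** 2))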
To find the maximum of the likelihood function for our observed sample, we take the derivative with respect to the parameter and set it equal to zero (why? the derivative of a function is zero at a maximum). Before setting the derivative equal to zero, we often transform the likelihood by the natural log (ln) to produce the log-likelihood:

$$l(\theta|x) = \ln[L(\theta|x)]$$

This does not move the maximum, because ln is a monotonic function, and it is often easier to work with the log-likelihood. Note that the log-likelihood can be negative, but likelihoods are never negative (consider the structure of probability!): $L(\theta|x)$ takes values in $[0, \infty)$.
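A minimal sketch of why the log-likelihood is easier to work with numerically (my own illustration, not from the lecture; the sample is simulated), assuming numpy is available:

    # For a large sample, the raw likelihood (a product of many densities)
    # underflows to 0.0 in floating point, while the log-likelihood (a sum)
    # stays finite; both peak at the same theta because ln is monotonic.
    import numpy as np

    def norm_pdf(x, mu, sigma2):
        return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

    rng = np.random.default_rng(2)
    big = rng.normal(size=10_000)  # a large i.i.d. N(0, 1) sample

    print(np.prod(norm_pdf(big, 0.0, 1.0)))         # underflows to 0.0
    print(np.sum(np.log(norm_pdf(big, 0.0, 1.0))))  # finite log-likelihood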
For example, our normal model has the following likelihood:

$$L(\mu, \sigma^2|X = x) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}$$

To find $MLE(\hat{\mu})$ for this model, we first take the log-likelihood (using $\ln\frac{1}{a} = -\ln(a)$):

$$l(\mu, \sigma^2|X = x) = -n\ln(\sigma) - \frac{n}{2}\ln(2\pi) - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (x_i - \mu)^2$$

then take the partial (!!) derivative with respect to this parameter, set this equal to zero, and solve (this is the MLE!):

$$\frac{\partial l(\theta|X = x)}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n} (x_i - \mu) = 0 \implies MLE(\hat{\mu}) = \frac{1}{n}\sum_{i=1}^{n} x_i$$
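A minimal sketch checking this first-order condition (my own illustration; the sample is simulated and the helper name is an assumption), assuming numpy is available:

    # The score dl/dmu = (1/sigma^2) * sum_i (x_i - mu) is exactly zero at
    # mu = sample mean, positive below it and negative above it (a maximum).
    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.normal(loc=2.0, scale=1.0, size=50)

    def score_mu(mu, x, sigma2=1.0):
        return np.sum(x - mu) / sigma2

    print(score_mu(x.mean(), x))  # ~0 (up to floating point)
    print(score_mu(x.mean() - 0.5, x) > 0, score_mu(x.mean() + 0.5, x) < 0)  # True True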
Similarly, to find $MLE(\hat{\sigma}^2)$, we take the partial derivative of the same log-likelihood

$$l(\mu, \sigma^2|X = x) = -n\ln(\sigma) - \frac{n}{2}\ln(2\pi) - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (x_i - \mu)^2$$

with respect to $\sigma^2$, set it equal to zero, and solve (substituting $\hat{\mu} = \bar{x}$):

$$\frac{\partial l(\theta|X = x)}{\partial \sigma^2} = 0 \implies MLE(\hat{\sigma}^2) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
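A minimal numeric check (my own illustration; the sample is simulated), assuming numpy is available. Note that numpy's variance function defaults to the $1/n$ normalization, i.e. the MLE:

    # MLE(sigma^2) = (1/n) * sum_i (x_i - xbar)^2 is what np.var computes
    # with its default ddof=0.
    import numpy as np

    rng = np.random.default_rng(4)
    x = rng.normal(loc=0.0, scale=2.0, size=100)

    mle_sigma2 = np.mean((x - x.mean()) ** 2)
    print(mle_sigma2, np.var(x))  # the same value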
As a second example, consider a sample of $n$ i.i.d. (single-variate) Bernoulli random variables. The probability of one observation is:

$$Pr(x_i|p) = p^{x_i}(1 - p)^{1-x_i}$$

so the probability of the sample is:

$$Pr(\mathbf{x}|p) = \prod_{i=1}^{n} p^{x_i}(1 - p)^{1-x_i}$$

which we can rewrite by considering $x$ = total number of tails in the entire sample:

$$Pr(x|p) = \binom{n}{x} p^{x}(1 - p)^{n-x}, \qquad L(p|X = x) = \binom{n}{x} p^{x}(1 - p)^{n-x}$$

Taking the log-likelihood:

$$l(p|X = x) = \ln\binom{n}{x} + x\ln(p) + (n - x)\ln(1 - p)$$

then taking the derivative with respect to $p$, setting it equal to zero, and solving:

$$\frac{\partial l(p|X = x)}{\partial p} = \frac{x}{p} - \frac{n - x}{1 - p} = 0 \implies MLE(\hat{p}) = \frac{x}{n}$$

This is a maximum because the second derivative is always negative (why? both terms are $\leq 0$ for $0 \leq x \leq n$):

$$\frac{\partial^2 l(p|X = x)}{\partial p^2} = -\frac{x}{p^2} + \frac{x - n}{(1 - p)^2}$$
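A minimal sketch confirming the Bernoulli MLE (my own illustration; the counts are made up), assuming numpy is available:

    # The closed form MLE(p) = x/n matches a grid search over the
    # log-likelihood; ln C(n, x) is constant in p, so it is omitted.
    import numpy as np

    n, x = 100, 37  # n trials, x tails

    p_grid = np.linspace(0.001, 0.999, 999)
    loglik = x * np.log(p_grid) + (n - x) * np.log(1 - p_grid)
    print(p_grid[np.argmax(loglik)], x / n)  # both ~0.37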
Notes on MLEs and estimators:
- MLEs are the basis for most standard "parametric" estimation and hypothesis testing (stay tuned!) that you will do in basic statistical analysis.
- Likelihoods and MLEs have deep theoretical properties (i.e. no surprise they play a central role), although we will not have time to discuss them in detail in this course (e.g. likelihood has strong connections to the concept of sufficiency, the likelihood principle, etc., and MLEs have nice properties as estimators).
- When you calculate an MLE, you are just calculating a statistic (an estimator!).
- While we will make extensive use of MLEs in this class (!!), these are still just estimators, i.e. statistics that take a sample as input and output a value that we consider an estimate of our parameter; there are many other estimators that we could use.
- Each estimator we can define has different properties, some of which are desirable, some are less desirable, and we compare estimators based on well defined criteria.
- For example, an estimator is unbiased if the estimator has a bias of zero:

$$Bias(\hat{\theta}) = E[\hat{\theta}] - \theta$$

and an estimator is consistent if it converges in probability to the true value as the sample size grows:

$$\lim_{n \to \infty} Pr(|\hat{\theta} - \theta| < \epsilon) = 1$$

Note that an unbiased estimator need not be consistent (and vice versa!).
For example, the MLE of the normal variance is biased:

$$MLE(\hat{\sigma}^2) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

while the estimator

$$\hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

is unbiased (see the sketch below).
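A Monte Carlo sketch of this bias (my own illustration; sample size and replicate count are arbitrary), assuming numpy is available:

    # Averaged over many samples, the 1/n estimator of sigma^2 comes out
    # near ((n-1)/n) * sigma^2, while the 1/(n-1) estimator is unbiased.
    import numpy as np

    rng = np.random.default_rng(5)
    n, sigma2, reps = 5, 1.0, 200_000

    samples = rng.normal(scale=np.sqrt(sigma2), size=(reps, n))
    ss = ((samples - samples.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

    print(ss.mean() / n)        # ~0.8 = ((n-1)/n) * sigma^2  (biased)
    print(ss.mean() / (n - 1))  # ~1.0 = sigma^2              (unbiased)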
That's it for today; next lecture we begin hypothesis testing, one of the most important concepts in probability and statistics!