Data Analysis and Uncertainty Part 2: Estimation
Instructor: Sargur N. Srihari
University at Buffalo The State University of New York
srihari@cedar.buffalo.edu
Topics in Estimation:
1. Estimation
2. Desirable Properties of Estimators
– Bias: measures the systematic departure of an estimator from the true value.
– Variance: the data-driven component of error in the estimation procedure.
– E.g., an estimator that always returns the same value has zero variance but possibly high bias.
$$\mathrm{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta$$

$$\mathrm{Var}(\hat{\theta}) = E[(\hat{\theta} - E[\hat{\theta}])^2]$$

Mean squared error: $E[(\hat{\theta} - \theta)^2]$

The expectation is over all possible data sets of size n.
Example: estimating a height whose true value is 200.
– Always predict 180: variance = 0, bias = -20.
– Predictions with mean 180 and std dev 10: variance = 100.
– Predictions with mean 180 and std dev 20: variance = 400.
– Predict 200 half the time and 160 the other half:
  – Average error is -20
  – Average squared error is 800

[Figure: three panels (true value 200, estimate mean 180) showing bias with no variance, bias with some variance, and bias with more variance.]

Each scenario has expected value 180 (bias = -20), but increasing variance in the estimate.
$$E[(\hat{\theta} - \theta)^2] = E[(\hat{\theta} - E[\hat{\theta}] + E[\hat{\theta}] - \theta)^2] = (E[\hat{\theta}] - \theta)^2 + E[(\hat{\theta} - E[\hat{\theta}])^2] = (\mathrm{Bias}(\hat{\theta}))^2 + \mathrm{Var}(\hat{\theta})$$
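The bias-variance decomposition can be checked numerically. A minimal sketch (not from the slides) using the height example, where the true value is 200 and the estimator predicts 200 or 160 with equal probability:

```python
import random

random.seed(0)
true_theta = 200
# Estimator: predicts 200 or 160 with equal probability
estimates = [random.choice([200, 160]) for _ in range(100_000)]

mean_est = sum(estimates) / len(estimates)
bias = mean_est - true_theta
variance = sum((e - mean_est) ** 2 for e in estimates) / len(estimates)
mse = sum((e - true_theta) ** 2 for e in estimates) / len(estimates)

# MSE = Bias^2 + Variance holds exactly for the sample moments;
# here MSE is near 800 = (-20)^2 + 400.
print(round(bias, 1), round(variance, 1), round(mse, 1))
```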
$$L(\theta \mid D) = L(\theta \mid x(1), \ldots, x(n)) = p(x(1), \ldots, x(n) \mid \theta) = \prod_{i=1}^{n} p(x(i) \mid \theta)$$
$$\mathrm{Bin}(r \mid n, \theta) = \binom{n}{r}\,\theta^r (1-\theta)^{n-r}$$
$$L(\theta \mid x(1), \ldots, x(1000)) = \prod_i \theta^{x(i)} (1-\theta)^{1-x(i)} = \theta^r (1-\theta)^{1000-r}$$

$$l(\theta) = \log L(\theta) = r \log\theta + (1000 - r)\log(1-\theta)$$

$$\hat{\theta}_{ML} = r/1000$$
Example: r milk purchases out of n customers, where θ is the probability that milk is purchased by a random customer. Compare r = 7, n = 10; r = 70, n = 100; r = 700, n = 1000.
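A minimal sketch (not from the slides) of the milk-purchase cases: maximizing the binomial log-likelihood on a grid recovers the MLE r/n, which is 0.7 in all three cases even though the likelihood peaks more sharply as n grows.

```python
import math

def log_likelihood(theta, r, n):
    # Binomial log-likelihood: r log(theta) + (n - r) log(1 - theta)
    return r * math.log(theta) + (n - r) * math.log(1 - theta)

thetas = [i / 1000 for i in range(1, 1000)]  # grid over (0, 1)
for r, n in [(7, 10), (70, 100), (700, 1000)]:
    mle = max(thetas, key=lambda t: log_likelihood(t, r, n))
    print(r, n, mle)  # maximizing theta is r/n = 0.7 in each case
```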
$$l(\theta \mid x(1), \ldots, x(n)) = -\frac{n}{2}\log 2\pi - \frac{1}{2}\sum_{i=1}^{n}(x(i) - \theta)^2$$

$$L(\theta \mid x(1), \ldots, x(n)) = \prod_{i=1}^{n} (2\pi)^{-1/2}\exp\left(-\frac{1}{2}(x(i) - \theta)^2\right) = (2\pi)^{-n/2}\exp\left(-\frac{1}{2}\sum_{i=1}^{n}(x(i) - \theta)^2\right)$$

$$\hat{\theta}_{ML} = \sum_i x(i)/n$$
Estimate the unknown mean θ. [Figure: histogram of 20 data points drawn from a zero-mean, unit-variance Gaussian, with the corresponding likelihood and log-likelihood functions.]
[Figure: histogram of 200 data points drawn from a zero-mean, unit-variance Gaussian, with the corresponding likelihood and log-likelihood functions.]
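A minimal sketch (not from the slides) of the 20- vs 200-point comparison: the MLE of the mean is the sample average, and the log-likelihood falls off more steeply around its maximum as n grows (for unit variance, the drop at a fixed offset of 0.5 from the MLE equals n/8).

```python
import math
import random

random.seed(1)

def log_likelihood(theta, xs):
    # Gaussian (unit variance) log-likelihood of the mean theta
    n = len(xs)
    return -n / 2 * math.log(2 * math.pi) - 0.5 * sum((x - theta) ** 2 for x in xs)

drops = {}
for n in (20, 200):
    xs = [random.gauss(0, 1) for _ in range(n)]
    mle = sum(xs) / n  # theta_hat_ML = sum of x(i) / n
    # Drop in log-likelihood 0.5 away from the MLE: equals n/8 exactly
    drops[n] = log_likelihood(mle, xs) - log_likelihood(mle + 0.5, xs)
    print(n, round(mle, 3), round(drops[n], 2))  # larger n -> sharper peak
```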
For n = 100 (so the standard error is $\sigma/\sqrt{n} = \sigma/10$):

$$P(\mu - 1.96\sigma/10 < \bar{x} < \mu + 1.96\sigma/10) = 0.95$$

Rewritten as

$$P(\bar{x} - 1.96\sigma/10 < \mu < \bar{x} + 1.96\sigma/10) = 0.95$$

so $l(x) = \bar{x} - 1.96\sigma/10$ and $u(x) = \bar{x} + 1.96\sigma/10$ form a 95% confidence interval.
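A minimal sketch of this interval, with illustrative numbers (the sample mean 5.0 and σ = 2.0 are assumptions, not from the slides):

```python
import math

def ci95(x_bar, sigma, n):
    # 95% confidence interval for the mean with known sigma:
    # x_bar +/- 1.96 * sigma / sqrt(n)
    half = 1.96 * sigma / math.sqrt(n)
    return x_bar - half, x_bar + half

lo, hi = ci95(x_bar=5.0, sigma=2.0, n=100)  # half-width 1.96 * 2 / 10 = 0.392
print(lo, hi)
```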
– Since the prior is flat, it prefers no single value.
– MLE can be viewed as a special case of the MAP procedure, which in turn is a restricted form of Bayesian estimation:

$$p(\theta \mid D) = \frac{p(D \mid \theta)\,p(\theta)}{p(D)} = \frac{p(D \mid \theta)\,p(\theta)}{\int p(D \mid \psi)\,p(\psi)\,d\psi}$$
Binomial likelihood: $L(\theta \mid D) = \theta^r (1-\theta)^{n-r}$

Beta prior, where $\alpha > 0$, $\beta > 0$ are two parameters:

$$\mathrm{Beta}(\theta \mid \alpha, \beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,\theta^{\alpha-1}(1-\theta)^{\beta-1}, \qquad p(\theta) \propto \theta^{\alpha-1}(1-\theta)^{\beta-1}$$

$$E[\theta] = \frac{\alpha}{\alpha+\beta}, \qquad \mathrm{mode}[\theta] = \frac{\alpha-1}{\alpha+\beta-2}, \qquad \mathrm{var}[\theta] = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$$

Posterior:

$$p(\theta \mid D) \propto p(D \mid \theta)\,p(\theta) = \theta^r(1-\theta)^{n-r}\,\theta^{\alpha-1}(1-\theta)^{\beta-1} = \theta^{r+\alpha-1}(1-\theta)^{n-r+\beta-1}$$

Posterior mean:

$$E[\theta \mid D] = \frac{r+\alpha}{n+\alpha+\beta}$$
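A minimal sketch of the beta-binomial update (the Beta(2, 2) prior is an illustrative assumption; r = 70, n = 100 matches the milk-purchase example): the posterior is Beta(r + α, n − r + β), and its mean shrinks the MLE r/n toward the prior mean α/(α + β).

```python
alpha, beta = 2.0, 2.0   # illustrative Beta prior, mean 0.5
r, n = 70, 100           # observed: 70 milk purchases out of 100 customers

# Conjugate update: posterior is Beta(r + alpha, n - r + beta)
post_alpha = r + alpha
post_beta = n - r + beta
post_mean = post_alpha / (post_alpha + post_beta)  # (r + alpha) / (n + alpha + beta)

print(post_mean)  # between the MLE 0.7 and the prior mean 0.5
```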
$$p(x(n+1) \mid D) = \int p(x(n+1), \theta \mid D)\,d\theta = \int p(x(n+1) \mid \theta)\,p(\theta \mid D)\,d\theta$$

since x(n+1) is conditionally independent of D given θ. The posterior can also be updated sequentially:

$$p(\theta \mid D_1, D_2) \propto p(D_2 \mid \theta)\,p(D_1 \mid \theta)\,p(\theta)$$
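A minimal sketch of sequential updating with a Beta prior (the batch sizes and counts are illustrative assumptions): using the posterior after D1 as the prior for D2 gives exactly the same posterior as one batch update on D1 ∪ D2.

```python
alpha, beta = 1.0, 1.0   # flat Beta(1, 1) prior
r1, n1 = 3, 10           # first batch of Bernoulli data
r2, n2 = 40, 90          # second batch

# Sequential: update on D1, then use that posterior as the prior for D2
a_seq, b_seq = alpha + r1, beta + (n1 - r1)
a_seq, b_seq = a_seq + r2, b_seq + (n2 - r2)

# Batch: update once on all the data
a_bat = alpha + r1 + r2
b_bat = beta + (n1 + n2) - (r1 + r2)

print((a_seq, b_seq) == (a_bat, b_bat))  # → True
```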
$$p(\theta \mid x) \propto p(x \mid \theta)\,p(\theta) = \frac{1}{\sqrt{2\pi\alpha}}\exp\left(-\frac{1}{2\alpha}(x-\theta)^2\right)\cdot\frac{1}{\sqrt{2\pi\alpha_0}}\exp\left(-\frac{1}{2\alpha_0}(\theta-\theta_0)^2\right)$$

$$p(\theta \mid x) = \frac{1}{\sqrt{2\pi\alpha_1}}\exp\left(-\frac{1}{2}(\theta-\theta_1)^2/\alpha_1\right)$$

where $\alpha_1 = \alpha_0\alpha/(\alpha_0 + \alpha)$ and $\theta_1 = \alpha_1(\theta_0/\alpha_0 + x/\alpha)$, a precision-weighted sum of the prior mean and the datum.
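A minimal sketch of this Gaussian update (the prior mean 0, prior variance 4, datum 2, and noise variance 1 are illustrative assumptions): the posterior mean θ1 lands between the prior mean θ0 and the datum x, pulled toward whichever has the smaller variance.

```python
theta0, alpha0 = 0.0, 4.0  # prior mean and variance (illustrative)
x, alpha = 2.0, 1.0        # observed datum and its known noise variance

# Posterior variance and mean of the Gaussian posterior N(theta1, alpha1)
alpha1 = alpha0 * alpha / (alpha0 + alpha)        # 4/5 = 0.8
theta1 = alpha1 * (theta0 / alpha0 + x / alpha)   # 0.8 * 2 = 1.6

print(theta1, alpha1)  # posterior mean lies between theta0 and x, nearer x
```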
Fisher information and the noninformative (Jeffreys) prior derived from it:

$$I(\theta \mid x) = -E\left[\frac{\partial^2 \log L(\theta \mid x)}{\partial\theta^2}\right], \qquad p(\theta) \propto \sqrt{I(\theta \mid x)}$$