Lecture 7 Introduction to Statistical Decision Theory
I-Hsiang Wang
Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw
December 14, 2016
1 / 62 I-Hsiang Wang IT Lecture 7
1 In this lecture, we will introduce the basic elements of statistical decision theory.
2 In the follow-up lectures, we will go into the details of several topics.
1 We will begin by setting up the framework of statistical decision theory.
2 Next, we will introduce two basic statistical decision-making problems: hypothesis testing and estimation.
Statistical Model and Decision Making
Statistical Model and Decision Making Basic Framework
[Figure: a statistical experiment maps the parameter θ ∈ Θ to an observation X ∈ X drawn from Pθ, which is then fed to the decision making.]
▶ Θ is called the parameter space; it could be finite, countably infinite, or uncountable.
▶ Pθ(·) is a probability distribution which accounts for the implicit randomness in experiments.
▶ X could be random variables, vectors, matrices, processes, etc.
Loss function: l(T(θ), τ(X)).
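As a small numeric sketch of this framework (the two-point parameter space, the distributions, and the 0-1 loss below are all hypothetical choices for illustration), the risk of a decision rule τ is the expected loss under Pθ:

```python
# Sketch of the decision-theoretic framework (hypothetical finite example):
# parameter space Theta = {0, 1}, observation alphabet X = {0, 1},
# 0-1 loss, and risk R_theta(tau) = E_{X ~ P_theta}[ l(theta, tau(X)) ].

P = {0: {0: 0.9, 1: 0.1},   # P_0(x): distribution of X when theta = 0
     1: {0: 0.2, 1: 0.8}}   # P_1(x): distribution of X when theta = 1

def loss(theta, decision):
    """0-1 loss: 1 if the decision misses the parameter."""
    return 0 if theta == decision else 1

def risk(theta, tau):
    """Risk R_theta(tau) = sum_x P_theta(x) * l(theta, tau(x))."""
    return sum(p * loss(theta, tau(x)) for x, p in P[theta].items())

tau = lambda x: x          # decide theta-hat = x
print(risk(0, tau))        # 0.1
print(risk(1, tau))        # 0.2
```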
Statistical Model and Decision Making Examples
∏_{i=1}^{n} PY|X(yi | xi(m))
∏_{i=1}^{n} PX(xi) PY|h(X)(yi | h(xi))
(Note: This is still random, as ĥ depends on the data.)
∏_{i=1}^{n} PX(xi) PZ(yi − f(xi))   (Y = f(X) + Z where Z is the observation noise)
(Note: This is still random, as f̂ depends on the data.)
∏_{i=1}^{n} PX(xi) PZ(yi − γ − β⊺xi)
(Note: This is still random, as (β̂, γ̂) depends on the data.)
Statistical Model and Decision Making Paradigms
The Bayes risk under prior π is R∗π ≜ infτ Rπ(τ), attained by a Bayes-optimal rule τ∗π (may not be unique).
Sufficient conditions for the existence of a Bayes-optimal rule τ∗π include:
1 The parameter space Θ and the data alphabet X are both finite.
2 |Θ| < ∞ and the loss function is bounded from below.
The least favorable prior π∗ satisfies R∗π∗ = supπ infτ EΘ∼π[LΘ(τ)].
1 If the loss function τ ↦ l(T, τ) is convex, then randomization does not help.
2 In the Bayes paradigm, there always exists a deterministic decision rule which is Bayes optimal.
Hypothesis Testing
Hypothesis Testing Basics
1 Two hypotheses regarding the observation X, indexed by θ ∈ {0, 1}: H0 : X ∼ P0 versus H1 : X ∼ P1.
2 Goal: design a decision-making algorithm ϕ : X → {0, 1}, x ↦ θ̂.
3 The loss function is the 0-1 loss, rendering two kinds of error probabilities:
PFA(ϕ) ≜ ∑_{x∈A1(ϕ)} P0(x) = ∑_{x∈X} ϕ(x) P0(x),
PMD(ϕ) ≜ ∑_{x∈A0(ϕ)} P1(x) = ∑_{x∈X} (1 − ϕ(x)) P1(x).
The likelihood ratio is L(x) ≜ P1(x)/P0(x). Hence, the LRT is a thresholding algorithm on the likelihood ratio.
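A minimal sketch of the LRT, assuming hypothetical distributions P0, P1 on a three-letter alphabet (decide H1 whenever L(x) ≥ τ):

```python
# Likelihood ratio test: decide H1 iff L(x) = P1(x)/P0(x) >= tau.
P0 = {'a': 0.5, 'b': 0.3, 'c': 0.2}
P1 = {'a': 0.1, 'b': 0.3, 'c': 0.6}

def lrt(x, tau):
    L = P1[x] / P0[x]            # likelihood ratio at x
    return 1 if L >= tau else 0  # 1 = accept H1, 0 = accept H0

# with tau = 1: 'a' -> H0 (L = 0.2), 'b' -> H1 (L = 1.0), 'c' -> H1 (L = 3.0)
print([lrt(x, 1.0) for x in 'abc'])  # [0, 1, 1]
```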
The Neyman-Pearson problem optimizes over randomized tests φ : X → [0, 1].
PFA(φ∗) = ∑_{x: L(x)>τ∗} P0(x) + γ∗ ∑_{x: L(x)=τ∗} P0(x).
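The pair (τ∗, γ∗) can be found by greedily filling the false-alarm budget in decreasing order of the likelihood ratio; a sketch under assumed P0, P1 and level ε:

```python
# Neyman-Pearson: pick tau* and gamma* so that P_FA equals eps exactly.
P0 = {'a': 0.5, 'b': 0.3, 'c': 0.2}
P1 = {'a': 0.1, 'b': 0.3, 'c': 0.6}
eps = 0.3

# sort symbols by likelihood ratio L(x) = P1(x)/P0(x), largest first
order = sorted(P0, key=lambda x: P1[x] / P0[x], reverse=True)
fa = 0.0
for x in order:
    if fa + P0[x] <= eps:         # accept H1 on x deterministically
        fa += P0[x]
    else:                          # randomize on the boundary symbol
        tau_star = P1[x] / P0[x]
        gamma_star = (eps - fa) / P0[x]
        break

print(tau_star, gamma_star)       # here: threshold 1.0, randomization 1/3
```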
The Bayes-optimal test over randomized rules φ : X → [0, 1] is an LRT with threshold ((l(0,1) − l(0,0))π0) / ((l(1,0) − l(1,1))π1).
R(ϕ) = ∑_{x∈X} l(0,0)π0P0(x)(1 − ϕ(x)) + ∑_{x∈X} l(0,1)π0P0(x)ϕ(x)
+ ∑_{x∈X} l(1,0)π1P1(x)(1 − ϕ(x)) + ∑_{x∈X} l(1,1)π1P1(x)ϕ(x)
= const + ∑_{x∈X} [(l(0,1) − l(0,0))π0P0(x) + (l(1,1) − l(1,0))π1P1(x)] ϕ(x),
which is minimized by setting ϕ(x) = 1 exactly when the bracketed coefficient is negative.
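Since the risk is linear in ϕ(x), it is minimized pointwise; a sketch of the resulting Bayes rule (the distributions, priors, and losses are hypothetical):

```python
# Bayes-optimal test from the risk expansion: set phi(x) = 1 exactly when
# (l01 - l00) * pi0 * P0(x) < (l10 - l11) * pi1 * P1(x),
# i.e. an LRT with threshold ((l01 - l00) * pi0) / ((l10 - l11) * pi1).
P0 = {'a': 0.5, 'b': 0.3, 'c': 0.2}
P1 = {'a': 0.1, 'b': 0.3, 'c': 0.6}
pi0, pi1 = 0.5, 0.5
l00, l01, l10, l11 = 0.0, 1.0, 1.0, 0.0   # 0-1 loss

def bayes_rule(x):
    return 1 if (l01 - l00) * pi0 * P0[x] < (l10 - l11) * pi1 * P1[x] else 0

print([bayes_rule(x) for x in 'abc'])  # with these numbers: [0, 0, 1]
```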
The likelihood ratio L(x) = P1(x)/P0(x) is a sufficient statistic.
Hypothesis Testing Asymptotics: Overview
Under H0, X1, …, Xn are i.i.d. ∼ P0; under H1, they are i.i.d. ∼ P1. The two error probabilities are
P(n)FA ≜ P{H1 is chosen | H0},
P(n)MD ≜ P{H0 is chosen | H1}.
The likelihood ratio of xⁿ factorizes through the empirical distribution:
L(xⁿ) = ∏_{i=1}^{n} P1(xi)/P0(xi) = ∏_{a∈X} (P1(a)/P0(a))^{N(a|xⁿ)},
so (1/n) log L(xⁿ) = ∑_{a∈X} (N(a|xⁿ)/n) log(P1(a)/P0(a)),
where (1/n) N(a|xⁿ) is the relative frequency of occurrence of a in xⁿ.
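This factorization can be verified numerically: the normalized log-likelihood ratio computed term by term matches the one computed from the empirical distribution (P0, P1, and the sequence below are hypothetical):

```python
import math
from collections import Counter

P0 = {'a': 0.5, 'b': 0.5}
P1 = {'a': 0.2, 'b': 0.8}
xn = "abbab"
n = len(xn)

# direct computation: (1/n) * sum_i log(P1(x_i) / P0(x_i))
direct = sum(math.log(P1[x] / P0[x]) for x in xn) / n

# via the empirical distribution pi(a|x^n) = N(a|x^n) / n
emp = {a: c / n for a, c in Counter(xn).items()}
via_type = sum(emp[a] * math.log(P1[a] / P0[a]) for a in emp)

print(abs(direct - via_type) < 1e-12)  # True: the two forms agree
```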
In terms of the empirical distribution π(·|xⁿ) ≜ (1/n) N(·|xⁿ),
(1/n) log L(xⁿ) = ∑_{a∈X} π(a|xⁿ) log(P1(a)/P0(a)) = D(π(·|xⁿ) ∥ P0) − D(π(·|xⁿ) ∥ P1),
so the LRT compares this quantity with the normalized threshold (1/n) log τn.
[Figure: the observation space is partitioned into the acceptance region of H0 and the acceptance region of H1.]
1 Neyman-Pearson: β∗(n, ε) ≜ inf over φn : Xⁿ → [0, 1] of β(n)(φn), subject to α(n)(φn) ≤ ε. It turns out that (Chernoff-Stein)
lim_{n→∞} −(1/n) log β∗(n, ε) = D(P0 ∥ P1).
2 Bayes: P∗e(n) ≜ inf over φn : Xⁿ → [0, 1] of π0 α(n)(φn) + π1 β(n)(φn). It turns out that
lim_{n→∞} −(1/n) log P∗e(n) = C(P0, P1) ≜ − min_{0≤λ≤1} log ∑_{x∈X} (P0(x))^{1−λ}(P1(x))^{λ},
the Chernoff information; the optimal exponent is attained by the tilted distribution
Pλ(a) = (P0(a))^{1−λ}(P1(a))^{λ} / ∑_{x∈X} (P0(x))^{1−λ}(P1(x))^{λ}, ∀ a ∈ X.
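Both exponents can be evaluated numerically; a sketch computing D(P0 ∥ P1) directly and the Chernoff information by a grid search over λ (P0, P1 are hypothetical):

```python
import math

P0 = {'a': 0.5, 'b': 0.3, 'c': 0.2}
P1 = {'a': 0.1, 'b': 0.3, 'c': 0.6}

# Chernoff-Stein exponent for the miss-detection probability: D(P0 || P1)
D01 = sum(p * math.log(p / P1[a]) for a, p in P0.items())

# Chernoff information: -min_{0<=lam<=1} log sum_x P0(x)^(1-lam) P1(x)^lam
def log_moment(lam):
    return math.log(sum(P0[x] ** (1 - lam) * P1[x] ** lam for x in P0))

chernoff = -min(log_moment(k / 1000) for k in range(1001))

print(D01, chernoff)  # the Bayes exponent is smaller than the NP exponent
```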
Estimation
Estimation Mean-Squared Error (MSE) and Cramér-Rao Lower Bound
The Fisher information is J(θ) ≜ Efθ[(∂/∂θ ln fθ(X))²]. Cramér-Rao: for any unbiased estimator ϕ of θ,
MSEθ(ϕ) ≥ 1/J(θ), ∀ θ ∈ Θ.
The score ∂/∂θ ln fθ(X) = (1/fθ(X)) ∂fθ(X)/∂θ has zero mean, because
∫_{−∞}^{∞} fθ(x) (1/fθ(x)) (∂fθ(x)/∂θ) dx = ∫_{−∞}^{∞} (∂fθ(x)/∂θ) dx = (d/dθ) ∫_{−∞}^{∞} fθ(x) dx = 0.
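The zero-mean property of the score can be checked exactly for a discrete family, e.g. Bernoulli(θ) with fθ(1) = θ and fθ(0) = 1 − θ (a sketch; the family choice is for illustration):

```python
# Numerical check that the score has zero mean, for a Bernoulli(theta) family:
# f_theta(1) = theta, f_theta(0) = 1 - theta, so the score is
#   d/dtheta ln f_theta(x) = x / theta - (1 - x) / (1 - theta).
theta = 0.3

def score(x):
    return x / theta - (1 - x) / (1 - theta)

mean_score = (1 - theta) * score(0) + theta * score(1)
print(mean_score)  # 0 up to floating point
```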
Moreover, Efθ[(1/fθ(X)) (∂fθ(X)/∂θ) ϕ(X)] = ∫_{−∞}^{∞} (∂fθ(x)/∂θ) ϕ(x) dx = (d/dθ) ∫_{−∞}^{∞} fθ(x)ϕ(x) dx = (d/dθ) Efθ[ϕ(X)] =(a) (d/dθ) θ = 1,
where (a) is due to the unbiasedness of ϕ.
Exercise 1 (Cramér-Rao Inequality for Unbiased Functional Estimators) Prove that for any unbiased estimator ζ of z(θ), MSEθ(ζ) ≥ (1/J(θ)) ((d/dθ) z(θ))².
Exercise 2 (Cramér-Rao Inequality for Biased Estimators) Prove that for any estimator ϕ of the parameter θ, MSEθ(ϕ) ≥ (1/J(θ)) (1 + (d/dθ) Biasθ(ϕ))² + (Biasθ(ϕ))².
Exercise 3 (Attainment of Cramér-Rao) Show that the necessary and sufficient condition for an unbiased estimator ϕ to attain the Cramér-Rao lower bound is that there exists some function g such that for all x, ϕ(x) − θ = g(θ) ∂/∂θ ln fθ(x).
1 J(θ) ≜ Efθ[(∂/∂θ ln fθ(X))²] is the variance of the score, since ∂/∂θ ln fθ(X) = (1/fθ(X)) ∂fθ(X)/∂θ is zero-mean.
2 Suppose Xi i.i.d. ∼ fθ; then the Fisher information of Xⁿ = (X1, …, Xn) is n J(θ).
3 For an exponential family {fθ | θ ∈ Θ}, it can be shown that J(θ) = −Efθ[∂²/∂θ² ln fθ(X)].
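As a sketch of the first remark, the Fisher information of a Bernoulli(θ) family equals the second moment of its score, which matches the closed form 1/(θ(1 − θ)):

```python
# Fisher information of Bernoulli(theta): J(theta) = 1 / (theta * (1 - theta)).
# Check it against the second moment of the (zero-mean) score.
theta = 0.3
score = {0: -1 / (1 - theta), 1: 1 / theta}   # d/dtheta ln f_theta(x)
probs = {0: 1 - theta, 1: theta}

J_from_score = sum(probs[x] * score[x] ** 2 for x in (0, 1))  # E[score^2]
J_closed_form = 1 / (theta * (1 - theta))

print(J_from_score, J_closed_form)  # both about 4.76
```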
Estimation Maximum Likelihood Estimator, Consistency, and Efficiency
The maximum likelihood estimator (MLE) is θ̂ML(x) ≜ arg max_{θ∈Θ} fθ(x).
Exercise 4 (MLE of Gaussian with Unknown Mean and Variance) Consider Xi i.i.d. ∼ N(μ, σ²) for i = 1, 2, …, n, where θ ≜ (μ, σ²) denotes the unknown parameter. Let x̄ ≜ (1/n) ∑_{i=1}^{n} xi. Show that θ̂ML = (x̄, (1/n) ∑_{i=1}^{n} (xi − x̄)²).
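The closed form in Exercise 4 can be sanity-checked numerically: on a fixed sample, the log-likelihood at (x̄, σ̂²) should dominate nearby parameter values (the sample below is arbitrary):

```python
import math

xs = [1.0, 2.0, 2.0, 3.0, 4.5]
n = len(xs)

# closed-form MLE: sample mean and (biased) sample variance
mu_hat = sum(xs) / n
var_hat = sum((x - mu_hat) ** 2 for x in xs) / n

def log_lik(mu, var):
    """Gaussian log-likelihood of the sample at parameters (mu, var)."""
    return sum(-0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
               for x in xs)

# the closed form should dominate perturbed parameter values
best = log_lik(mu_hat, var_hat)
print(best >= log_lik(mu_hat + 0.1, var_hat))   # True
print(best >= log_lik(mu_hat, var_hat * 1.1))   # True
```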
For Xi i.i.d. ∼ Pθ, two desirable asymptotic properties of an estimator are:
1 Consistency: the estimate converges (in probability) to the true value as n → ∞.
2 Efficiency: the estimator asymptotically attains the Cramér-Rao lower bound.
Definition (Consistency): ζn is consistent if for every ε > 0,
lim_{n→∞} P_{Xi i.i.d. ∼ Pθ}{|ζn(Xⁿ) − z(θ)| < ε} = 1, ∀ θ ∈ Θ,
that is, ζn(Xⁿ) →p z(θ).
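Consistency of the sample mean (as an estimator of a Bernoulli parameter) can be illustrated by a seeded simulation; the deviation from θ shrinks as n grows:

```python
import random

random.seed(0)
theta = 0.3   # true Bernoulli parameter

def sample_mean(n):
    """Empirical mean of n Bernoulli(theta) draws."""
    return sum(random.random() < theta for _ in range(n)) / n

dev_small = abs(sample_mean(100) - theta)
dev_large = abs(sample_mean(100000) - theta)
print(dev_small, dev_large)   # the deviation concentrates as n grows
```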
Definition (Efficiency): ζn is (asymptotically) efficient if its MSE attains the Cramér-Rao lower bound (1/J(θ)) ((d/dθ) z(θ))² asymptotically.