PATTERN RECOGNITION AND MACHINE LEARNING
Slide Set 2: Estimation Theory
October 2019
Heikki Huttunen, heikki.huttunen@tuni.fi
Signal Processing, Tampere University
Classical Estimation and Detection Theory

Classical estimation and detection theory was developed before the machine learning era, and addresses two kinds of problems:
1 Estimation theory: inferring the values of unknown parameters from noisy data.
2 Detection theory: deciding which of a set of hypotheses the observed data supports.
Estimation and detection methods are used throughout engineering: radar, sonar, speech, image analysis, communications, control, seismology, etc.
Estimation theory: given an N-point data set x[0], x[1], …, x[N−1] that depends on an unknown parameter θ ∈ R, we wish to design an estimator g(·) for θ:

θ̂ = g(x[0], x[1], …, x[N−1]).

Two questions arise:
1 What is the model for our data?
2 How to determine its parameters?
Suppose we have measured pairs of points (x, y), and we would like to approximate the relationship of the two coordinates. To this end, we assume the following model: y[n] = ax[n] + b + w[n], with a ∈ R and b ∈ R unknown, and w[n] ∼ N(0, σ²) Gaussian noise with zero mean and variance σ².
[Figure: scatter plot of the measured (x, y) points.]
Which line fits best to the data set? One of the three candidates below, or some other line?
[Figure: data with three candidate lines: Model candidate 1 (a = 0.07, b = 0.49); Model candidate 2 (a = 0.06, b = 0.33); Model candidate 3 (a = 0.08, b = 0.51).]
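One way to compare the candidates, foreshadowing the least squares criterion on the next slides, is to score each line by its sum of squared errors. Below is a minimal Python sketch; the data here is synthetic (generated from the model above), since the original measurements are not available, so the printed numbers are illustrative only.

import numpy as np

# Synthetic data following y[n] = a*x[n] + b + w[n]
rng = np.random.default_rng(0)
x = np.linspace(-10, 10, 100)
y = 0.07 * x + 0.49 + rng.normal(0, 0.1, x.size)

# Score each candidate line (a, b) by its sum of squared errors
for a, b in [(0.07, 0.49), (0.06, 0.33), (0.08, 0.51)]:
    sse = np.sum((y - (a * x + b)) ** 2)
    print("a = %.2f, b = %.2f, SSE = %.2f" % (a, b, sse))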
The least squares estimator (derived later) is given by

â = −(6 / (N(N+1))) ∑_{n=0}^{N−1} y(n) + (12 / (N(N²−1))) ∑_{n=0}^{N−1} x(n)y(n)

b̂ = (2(2N−1) / (N(N+1))) ∑_{n=0}^{N−1} y(n) − (6 / (N(N+1))) ∑_{n=0}^{N−1} x(n)y(n).
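These closed-form sums are easy to check numerically. The sketch below assumes, as in the derivation behind these coefficients, that the x-coordinates are the sample indices x(n) = n; the data is synthetic. The generic degree-1 polynomial fit from numpy serves as a reference.

import numpy as np

# Synthetic data with x(n) = n, n = 0, ..., N-1
rng = np.random.default_rng(1)
N = 100
n = np.arange(N)
y = 0.07 * n + 0.49 + rng.normal(0, 0.5, N)

# Closed-form least squares estimates
a_hat = -6.0 / (N * (N + 1)) * y.sum() + 12.0 / (N * (N**2 - 1)) * (n * y).sum()
b_hat = 2.0 * (2 * N - 1) / (N * (N + 1)) * y.sum() - 6.0 / (N * (N + 1)) * (n * y).sum()

# Reference: numpy's generic least squares line fit
a_ref, b_ref = np.polyfit(n, y, 1)
print(a_hat - a_ref, b_hat - b_ref)  # both differences should be ~0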
For our data the result is â = 0.07401 and b̂ = 0.49319, which produces the line shown in the figure. The line minimizes the sum of the squared distances (green dashed lines) between the model (blue line) and the data (red circles).
[Figure: best fit y = 0.0740x + 0.4932, sum of squares = 3.62.]
As a second example, consider estimating the parameters of the noisy sinusoidal signal shown below.
[Figure: noisy sinusoidal signal.]
We assume the sinusoidal model x[n] = A cos(2πf₀n + φ) + w[n], with w[n] ∼ N(0, σ²). The unknown parameters are the amplitude A, the frequency f₀ and the phase φ.
Good estimates of the three parameters (these will turn out to be the maximum likelihood estimates, derived later) are given by

f̂₀ = the value of f that maximizes |∑_{n=0}^{N−1} x[n] exp(−j2πfn)|

Â = (2/N) |∑_{n=0}^{N−1} x[n] exp(−j2πf̂₀n)|

φ̂ = arctan( −(∑_{n=0}^{N−1} x[n] sin(2πf̂₀n)) / (∑_{n=0}^{N−1} x[n] cos(2πf̂₀n)) ).
The estimates computed from the data are f̂₀ = 0.068 (true 0.068), Â = 0.692 (true 0.667) and φ̂ = 0.238 (true 0.609). The fitted model is shown with the green circles.
[Figure: noisy sinusoid with the fitted model overlaid as green circles.]
Each realization of the measurement noise produces different estimates, as the four repetitions below show.
[Figure: four noise realizations of the same sinusoid with the fitted models.]
f̂₀ = 0.068; Â = 0.652; φ̂ = −0.023
f̂₀ = 0.066; Â = 0.660; φ̂ = 0.851
f̂₀ = 0.067; Â = 0.786; φ̂ = 0.814
f̂₀ = 0.459; Â = 0.618; φ̂ = 0.299
Since the estimates depend on the random noise realization, they are random variables themselves. What can be said about the statistics of the estimates: what are E[f̂₀], E[φ̂] and E[Â]?
All the models above are linear, and can be written in matrix form as y = Xθ + w. The general least squares solution, minimizing ‖y − Xθ‖², is θ̂ = (XᵀX)⁻¹Xᵀy. A worked example is available at https://github.com/mahehu/SGN-41007/blob/master/code/Least_Squares.ipynb
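As a sketch of the matrix formulation (with made-up synthetic data for the line-fit model), the design matrix X has one column per unknown, and np.linalg.lstsq solves the least squares problem; this is numerically preferable to forming (XᵀX)⁻¹ explicitly.

import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-10, 10, 100)
y = 0.07 * x + 0.49 + rng.normal(0, 0.1, x.size)

# Line model y = a*x + b: the columns of X multiply theta = [a, b]
X = np.column_stack([x, np.ones_like(x)])

# Least squares solution of y = X theta + w
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("a = %.4f, b = %.4f" % tuple(theta))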
Another example: suppose the images we capture have an uneven illumination, with the lamp producing the highest brightness at the center. We model the brightness with the quadratic surface z(x, y) = c₁x² + c₂y² + c₃xy + c₄x + c₅y + c₆, with z(x, y) the brightness at pixel (x, y).
Collecting the model for all N pixels gives the matrix form

z = [ x1²  y1²  x1y1  x1  y1  1
      x2²  y2²  x2y2  x2  y2  1
      ⋮    ⋮    ⋮     ⋮   ⋮   ⋮
      xN²  yN²  xNyN  xN  yN  1 ] c
Solving for the least squares coefficients c yields the illumination model

z(x, y) = −0.000080x² − 0.000288y² + 0.000123xy + 0.022064x + 0.284020y + 106.538687.
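A minimal sketch of this fit on a synthetic image (the true coefficients below are made up; only the construction of the matrix and the lstsq call mirror the slide):

import numpy as np

rng = np.random.default_rng(3)

# Pixel coordinates of a synthetic 64 x 64 image
xx, yy = np.meshgrid(np.arange(64.0), np.arange(64.0))
x, y = xx.ravel(), yy.ravel()

# Quadratic illumination surface plus noise (made-up coefficients)
z = -1e-4 * x**2 - 3e-4 * y**2 + 0.02 * x + 0.28 * y + 100 + rng.normal(0, 0.5, x.size)

# One row [x^2, y^2, xy, x, y, 1] per pixel
X = np.column_stack([x**2, y**2, x * y, x, y, np.ones_like(x)])
c, *_ = np.linalg.lstsq(X, z, rcond=None)
print(c)  # estimated coefficients c1, ..., c6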
Maximum likelihood (ML) is the most widely used estimation method, due to its applicability in complicated estimation problems, and it is also at the heart of machine learning. The method was proposed by R. A. Fisher already in 1912, as a third year undergraduate. The idea is to choose the parameter value that most likely generated the data x. The MLE is only asymptotically optimal, and it is not necessarily unbiased, either.
Consider the simple model x[n] = A + w[n], where w[n] ∼ N(0, σ²): a constant A plus Gaussian random noise with zero mean and variance σ². The task is to estimate A.
[Figure: a realization of x[n] = A + w[n].]
When the PDF p(x; A) is viewed as a function of the unknown parameter (with the data fixed), it is called the likelihood function. Often it is convenient to take the natural logarithm to get rid of the exponential. Note that the location of the maximum of the new log-likelihood function does not change.
[Figure: PDF of A assuming x[0] = 3, shown together with the likelihood and the log-likelihood.]
For our example model x[n] = A + w[n], n = 0, 1, …, N − 1, the likelihood function is easy to construct. Since the noise samples are independent, the likelihood of the whole batch of samples x = (x[0], …, x[N−1]) is obtained by the product rule:

p(x; A) = ∏_{n=0}^{N−1} p(x[n]; A) = (1 / (2πσ²)^{N/2}) exp( −(1/(2σ²)) ∑_{n=0}^{N−1} (x[n] − A)² ).

The maximum likelihood question is then: what is the most likely parameter A to have generated the data?
When the likelihood is considered as a function of the parameter, this is sometimes emphasized by a different name such as L(A; x) or ℓ(A; x).
The figure below shows a set of samples together with the likelihood and the log-likelihood evaluated for different values of A.
[Figure: samples; likelihood of A (max at 5.17); log-likelihood.]
import numpy as np

def gaussian(x, mu, sigma):
    # Gaussian PDF evaluated elementwise
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def gaussian_log(x, mu, sigma):
    # Log of the Gaussian PDF (avoids underflow for large N)
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

# The samples are in an array called x0 (synthetic here, true A = 5)
x0 = 5 + np.random.randn(50)
x = np.linspace(2, 8, 200)
likelihood = [gaussian(x0, A, 1).prod() for A in x]
log_likelihood = [gaussian_log(x0, A, 1).sum() for A in x]
print("Max likelihood is at %.2f" % x[np.argmax(log_likelihood)])
In practice it is easier to maximize the log-likelihood instead:

p(x; A) = (1 / (2πσ²)^{N/2}) exp( −(1/(2σ²)) ∑_{n=0}^{N−1} (x[n] − A)² )

ln p(x; A) = −(N/2) ln(2πσ²) − (1/(2σ²)) ∑_{n=0}^{N−1} (x[n] − A)²

∂ ln p(x; A) / ∂A = (1/σ²) ∑_{n=0}^{N−1} (x[n] − A)
Setting the derivative to zero and solving for A:

(1/σ²) ∑_{n=0}^{N−1} (x[n] − A) = 0
∑_{n=0}^{N−1} (x[n] − A) = 0
∑_{n=0}^{N−1} x[n] − ∑_{n=0}^{N−1} A = 0
∑_{n=0}^{N−1} x[n] − NA = 0
∑_{n=0}^{N−1} x[n] = NA
Â = (1/N) ∑_{n=0}^{N−1} x[n]
In other words, the maximum likelihood estimator of A is the sample mean, the natural estimator of the distribution mean.
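This closed-form result is easy to verify against the grid search used earlier; a small sketch with synthetic samples (true A = 5, σ = 1):

import numpy as np

rng = np.random.default_rng(4)
x0 = 5 + rng.normal(0, 1, 500)

# Closed-form MLE: the sample mean
print("Sample mean:     %.3f" % x0.mean())

# Grid search over the log-likelihood (constants dropped) should agree
grid = np.linspace(2, 8, 2001)
loglik = [-0.5 * np.sum((x0 - A) ** 2) for A in grid]
print("Grid-search MLE: %.3f" % grid[np.argmax(loglik)])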
The ML approach also yields the sinusoidal parameter estimators shown earlier. The model is x[n] = A cos(2πf₀n + φ) + w[n] with w[n] ∼ N(0, σ²). It is possible to find the MLE for all three parameters θ = [A, f₀, φ]ᵀ. The likelihood is

p(x; θ) = (1 / (2πσ²)^{N/2}) exp( −(1/(2σ²)) ∑_{n=0}^{N−1} (x[n] − A cos(2πf₀n + φ))² ).
The above function is maximized when

J(A, f₀, φ) = ∑_{n=0}^{N−1} (x[n] − A cos(2πf₀n + φ))²

is minimized. The minimization is lengthy and is not reproduced in these slides; the details can be found in S. Kay, "Fundamentals of Statistical Signal Processing: Estimation Theory," 1993.
The resulting estimators are:

f̂₀ = arg max_f |∑_{n=0}^{N−1} x[n] exp(−j2πfn)|

Once f̂₀ is available, proceed by calculating the other parameters:

Â = (2/N) |∑_{n=0}^{N−1} x[n] exp(−j2πf̂₀n)|

φ̂ = arctan( −(∑_{n=0}^{N−1} x[n] sin(2πf̂₀n)) / (∑_{n=0}^{N−1} x[n] cos(2πf̂₀n)) )
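The three steps translate directly to numpy. In the sketch below the maximization over f is done on a dense grid via a zero-padded FFT, and arctan2 is used in place of arctan to resolve the quadrant of the phase; the signal parameters are synthetic.

import numpy as np

rng = np.random.default_rng(5)
N = 160
n = np.arange(N)
x = 0.67 * np.cos(2 * np.pi * 0.07 * n + 0.61) + rng.normal(0, 0.5, N)

# f0_hat: maximize |sum_n x[n] exp(-j 2 pi f n)| over a dense grid of f
K = 2 ** 16                         # zero-padded FFT length
S = np.fft.rfft(x, K)
f0_hat = np.fft.rfftfreq(K)[np.argmax(np.abs(S))]

# A_hat and phi_hat from the closed-form expressions
A_hat = 2.0 / N * np.abs(np.sum(x * np.exp(-2j * np.pi * f0_hat * n)))
phi_hat = np.arctan2(-np.sum(x * np.sin(2 * np.pi * f0_hat * n)),
                     np.sum(x * np.cos(2 * np.pi * f0_hat * n)))
print(f0_hat, A_hat, phi_hat)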
How good are these estimators? Their variability over 10000 noise realizations is illustrated in the figures.
[Figure: histograms of 10000 estimates of f₀ (true = 0.07), of A (true = 0.67) and of φ (true = 0.61).]
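The histograms can be reproduced with a short Monte Carlo loop around the frequency estimator sketched above (synthetic true values f₀ = 0.07, A = 0.67, φ = 0.61; only f̂₀ is collected here for brevity):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(6)
n = np.arange(160)
f_grid = np.fft.rfftfreq(4096)

f0_estimates = []
for _ in range(10000):
    x = 0.67 * np.cos(2 * np.pi * 0.07 * n + 0.61) + rng.normal(0, 0.5, n.size)
    S = np.fft.rfft(x, 4096)        # dense grid over f in [0, 0.5]
    f0_estimates.append(f_grid[np.argmax(np.abs(S))])

plt.hist(f0_estimates, bins=50)
plt.title("10000 estimates of parameter f0 (true = 0.07)")
plt.show()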
In summary, the key tool of classical estimation is Maximum Likelihood. When a statistical model of the data is known, classical estimation theory is the answer. For problems where no explicit model is available, classical theory has no answer, and this is where machine learning enters.