PATTERN RECOGNITION AND MACHINE LEARNING
Slide Set 3: Detection Theory October 2019 Heikki Huttunen heikki.huttunen@tuni.fi
Signal Processing Tampere University
Detection theory
Detection theory considers questions such as:
- Is there a beep or no beep?
- Is there something moving in the scene?
- Is the response in patients due to the new drug or due to random fluctuations?
Detection theory
[Figure: an example of detection, three panels showing the Noiseless Signal, the Noisy Signal, and the Detection Result.]
Detection theory
For example, the detection of a sinusoid in noise leads to the hypothesis test

H1 : x[n] = A cos(2π f0 n + φ) + w[n]
H0 : x[n] = w[n]

Here, H0 is called the null hypothesis and H1 the alternative hypothesis.
Introductory Example
Suppose we observe a single sample x[0], drawn either from a Gaussian with µ = 1 or from a Gaussian with µ = 0. Where did this sample come from?

[Figure: the two Gaussian densities with means 0 and 1.]
Introductory Example
Formally, the hypotheses are H1 : µ = 1 and H0 : µ = 0, and the corresponding likelihoods are plotted below.

[Figure: likelihood of observing different values of x[0] given µ = 0 or µ = 1, i.e., the curves p(x[0] | µ = 0) and p(x[0] | µ = 1).]
Introductory Example
Since both hypotheses are Gaussian with unit variance, the likelihoods are

H1 : p(x[0] | µ = 1) = (1/√(2π)) exp( −(x[0] − 1)² / 2 )

H0 : p(x[0] | µ = 0) = (1/√(2π)) exp( −x[0]² / 2 )

A natural rule is to choose the hypothesis with the higher likelihood for a particular x[0].
Introductory Example
Thus, we decide H1 if

p(x[0] | µ = 1) > p(x[0] | µ = 0)

⇔ p(x[0] | µ = 1) / p(x[0] | µ = 0) > 1

⇔ [ (1/√(2π)) exp( −(x[0] − 1)²/2 ) ] / [ (1/√(2π)) exp( −x[0]²/2 ) ] > 1

⇔ exp( (x[0]² − (x[0] − 1)²) / 2 ) > 1
Introductory Example
Taking logarithms of both sides:

⇔ x[0]² − (x[0] − 1)² > 0 ⇔ 2x[0] − 1 > 0 ⇔ x[0] > 1/2.

A test of this form, deciding H1 when

p(x[0] | µ = 1) / p(x[0] | µ = 0) > 1,

is called a likelihood ratio test (LRT).
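As a sketch of the LRT in code (a construction of mine, not from the slides), comparing the two Gaussian likelihoods is equivalent to thresholding the sample at 1/2:

```python
from scipy.stats import norm

def lrt_decide(x0):
    """LRT for H1: mu = 1 vs H0: mu = 0, both with unit variance."""
    ratio = norm.pdf(x0, loc=1) / norm.pdf(x0, loc=0)
    return 1 if ratio > 1 else 0

# Equivalent to the simplified rule x[0] > 1/2:
for x0 in (-0.3, 0.4, 0.6, 2.0):
    assert lrt_decide(x0) == (1 if x0 > 0.5 else 0)
print("LRT matches the threshold rule")
```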
Error Types
The detector can make two kinds of error, and typically some errors are more costly than others.
- Deciding H1 when H0 is true is called a false alarm.
- Deciding H0 when H1 is true is called a missed detection.
The balance between the two error types is adjusted by the threshold of the LRT.
Error Types
The figure below illustrates the two error types: the red area corresponds to deciding H0 when H1 would hold (missed detection), and the blue area to deciding H1 when H0 would hold (false match).

[Figure: the two likelihoods with the error regions shaded.]
Error Types
The trade-off between the error types can be adjusted by moving the detection threshold.

[Figure: two panels. Left: detection threshold at 0, a small amount of missed detections (red) but many false matches (blue). Right: detection threshold at 1.5, a small amount of false matches (blue) but many missed detections (red).]
Error Types
With the detection threshold at γ = 1.5, the error probabilities become

PFA = P(x[0] > γ | µ = 0) = ∫_{1.5}^{∞} (1/√(2π)) exp( −x²/2 ) dx

PM = P(x[0] < γ | µ = 1) = ∫_{−∞}^{1.5} (1/√(2π)) exp( −(x − 1)²/2 ) dx

PD = 1 − PM = ∫_{1.5}^{∞} (1/√(2π)) exp( −(x − 1)²/2 ) dx
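These tail probabilities can be evaluated directly with `scipy.stats.norm` (a sketch; γ = 1.5 is the threshold used above):

```python
from scipy.stats import norm

gamma = 1.5
P_FA = norm.sf(gamma, loc=0, scale=1)   # P(x[0] > gamma | mu = 0)
P_M = norm.cdf(gamma, loc=1, scale=1)   # P(x[0] < gamma | mu = 1)
P_D = 1 - P_M                           # = norm.sf(gamma, loc=1)

print(P_FA, P_M, P_D)  # roughly 0.067, 0.691 and 0.309
```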
Choosing the threshold
A common approach is to fix the false alarm rate to the highest value we can accept. Suppose the data follows the above Gaussian model, and we can tolerate 10% false alarms (PFA = 0.1).

Select H1 if p(x | µ = 1) / p(x | µ = 0) > γ.

The only thing to find out now is the threshold γ such that

∫_{γ}^{∞} p(x | µ = 0) dx = 0.1.
Choosing the threshold
In Python, the threshold can be computed with the inverse survival function of the Gaussian cumulative distribution function.

>>> import scipy.stats as stats
>>> # Compute threshold such that P_FA = 0.1
>>> T = stats.norm.isf(0.1, loc=0, scale=1)
>>> print(T)
1.2815515655446004

Here, `isf` is the inverse survival function; `cdf` and `pdf` would give the cumulative distribution function and the density, respectively.
Detector for a known waveform
Next, consider detecting a known waveform s[n] in a Gaussian noise sequence w[n]:

H1 : x[n] = s[n] + w[n]
H0 : x[n] = w[n].

This situation occurs, for example, in radar signals, where a pulse s[n] transmitted by us is reflected back after some propagation time.

[Figure: the transmitted signal s[n] and the received signal s[n] + w[n].]
Detector for a known waveform
The likelihoods under the two hypotheses are

p(x | H1) = ∏_{n=0}^{N−1} (1/√(2πσ²)) exp( −(x[n] − s[n])² / (2σ²) )

p(x | H0) = ∏_{n=0}^{N−1} (1/√(2πσ²)) exp( −(x[n])² / (2σ²) )

and the likelihood ratio becomes

p(x | H1) / p(x | H0) = exp( −(1/(2σ²)) [ Σ_{n=0}^{N−1} (x[n] − s[n])² − Σ_{n=0}^{N−1} (x[n])² ] )
Detector for a known waveform
Taking the logarithm of both sides, we decide H1 if

−(1/(2σ²)) [ Σ_{n=0}^{N−1} (x[n] − s[n])² − Σ_{n=0}^{N−1} (x[n])² ] > ln γ.

Expanding the squares and simplifying gives

(1/σ²) Σ_{n=0}^{N−1} x[n] s[n] − (1/(2σ²)) Σ_{n=0}^{N−1} (s[n])² > ln γ.
Detector for a known waveform
The latter sum does not depend on the data x, so we can move it to the right hand side and combine it with the threshold:

Σ_{n=0}^{N−1} x[n] s[n] > σ² ln γ + (1/2) Σ_{n=0}^{N−1} (s[n])².

We can equivalently call the right hand side our threshold (say γ′) to get the final decision rule:

Decide H1 if Σ_{n=0}^{N−1} x[n] s[n] > γ′.
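A minimal sketch of this correlation detector in NumPy (the example waveform, noise level, and threshold are my own choices, not from the slides):

```python
import numpy as np

def correlator(x, s, gamma):
    """Decide H1 if the correlation sum(x[n] * s[n]) exceeds the threshold gamma."""
    return np.dot(x, s) > gamma

rng = np.random.default_rng(0)
n = np.arange(100)
s = np.sin(2 * np.pi * 0.05 * n)        # known waveform (an assumed example)
w = 0.5 * rng.standard_normal(100)      # noise with sigma = 0.5

print(correlator(s + w, s, gamma=20.0))  # H1: signal present
print(correlator(w, s, gamma=20.0))      # H0: noise only
```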
Example
For the sinusoid A cos(2π f0 n + φ), the rule becomes

Σ_{n=0}^{N−1} x[n] A cos(2π f0 n + φ) > γ ⇒ A Σ_{n=0}^{N−1} x[n] cos(2π f0 n + φ) > γ.

Dividing both sides by A (assumed positive), the decision rule is

Σ_{n=0}^{N−1} x[n] cos(2π f0 n + φ) > γ′.

In other words, the amplitude A does not affect our statistic, only the threshold, which is anyway selected according to the fixed PFA rate.
Example
The figure illustrates the detection process with σ = 0.5. The statistic is computed over a sliding window; i.e., we perform the hypothesis test at every window of length 100.

[Figure: three panels, Noiseless Signal, Noisy Signal, and Detection Result.]
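The sliding-window correlation can be computed with a single convolution; below is a sketch (the frequency, window length, and signal placement are assumptions of mine, not taken from the figure):

```python
import numpy as np

f0, N = 0.02, 100                       # assumed sinusoid frequency and window length
template = np.cos(2 * np.pi * f0 * np.arange(N))  # known waveform, phase assumed known

rng = np.random.default_rng(1)
x = 0.5 * rng.standard_normal(1000)
x[400:500] += np.cos(2 * np.pi * f0 * np.arange(N))  # embed the sinusoid

# Correlate x against the time-reversed template at every offset:
statistic = np.convolve(x, template[::-1], mode='valid')
peak = int(np.argmax(statistic))
print(peak)  # the largest statistic occurs near sample 400, where the sinusoid starts
```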
Detection of random signals
The correlator above requires the waveform, including its phase, to be known exactly; otherwise the results depend on how well the phases match. An alternative is to model the signal itself as random: its realization is unknown, but the correlation structure is known. Since the correlation captures the frequency (but not the phase), this is exactly what we want. Assuming a zero-mean Gaussian signal with covariance Cs, we test

H0 : x ∼ N(0, σ²I)
H1 : x ∼ N(0, Cs + σ²I)
Detection of random signals
It can be shown that the optimal detector in this case is the estimator-correlator:

Decide H1 if xᵀŝ > γ, where ŝ = Cs(Cs + σ²I)⁻¹ x

is an estimate of the signal computed from the data.
Example of Random Signal Detection
For the random-phase sinusoid, the resulting statistic is essentially the magnitude of the correlation with a complex sinusoid:

| Σ_n x[n] exp(−2πi f0 n) |

In Python, this can be computed over a sliding window as

import numpy as np
# n: sample indices of the window, xn: the noisy input signal
h = np.exp(-2 * np.pi * 1j * f0 * n)
y = np.abs(np.convolve(h, xn, 'same'))
Example of Random Signal Detection
[Figure: three panels, Noiseless Signal, Noisy Signal, and Detection Result, for the random signal detector.]
Receiver Operating Characteristics
The performance of a detector is often summarized by the Receiver Operating Characteristics curve (ROC curve). The ROC plots the probability of detection PD against the probability of false alarm PFA as the threshold γ sweeps over all values, and thus describes the whole trade-off achievable with the selected detector.
Receiver Operating Characteristics
For our introductory example, the detection and false alarm rates as functions of the threshold are

PD(γ) = ∫_{γ}^{∞} (1/√(2π)) exp( −(x − 1)²/2 ) dx

PFA(γ) = ∫_{γ}^{∞} (1/√(2π)) exp( −x²/2 ) dx

A change of variables links the two:

PD(γ) = ∫_{γ−1}^{∞} (1/√(2π)) exp( −x²/2 ) dx = PFA(γ − 1).
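These expressions can be evaluated with the Gaussian survival function; the sketch below (threshold grid of my choosing) traces the ROC and integrates it numerically into an AUC:

```python
import numpy as np
from scipy.stats import norm

gammas = np.linspace(-5, 6, 1101)
P_FA = norm.sf(gammas, loc=0, scale=1)  # P(x > gamma | mu = 0)
P_D = norm.sf(gammas, loc=1, scale=1)   # P(x > gamma | mu = 1) = P_FA(gamma - 1)

# Area under the ROC by trapezoidal integration (reverse so P_FA increases):
auc = np.trapz(P_D[::-1], P_FA[::-1])
print(round(auc, 2))  # 0.76 for these two unit-variance Gaussians
```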
Receiver Operating Characteristics
The higher the ROC curve, the better the detector performance. A single number summarizing the curve is often used to describe the performance: the Area Under (ROC) Curve, or AUC criterion. Instead of fixing one operating point, the AUC sweeps the threshold over its whole range and tests the accuracy for all thresholds.

[Figure: example ROC curves for a good detector, a better detector, and a bad detector.]
Empirical AUC
Since the ROC can be traced from data, the AUC is widely used as a measure of classifier performance, as well. Empirically, the curve is obtained by sweeping a threshold over the classifier's prediction results on a holdout test set.
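With scikit-learn, the empirical AUC is a single call to `roc_auc_score`; the toy labels and scores below are my own illustration:

```python
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1, 0, 1]               # ground-truth classes of the test set
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]  # classifier's predicted probabilities
auc = roc_auc_score(y_true, scores)
print(auc)  # 8 of 9 positive-negative pairs are ranked correctly: 0.888...
```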
Classification Example—ROC and AUC
As an example, consider a two-class problem with separate training and test sets drawn from exactly the same distribution.

[Figure: scatter plots of the Training Data and the Test Data, two classes each.]
Classification Example—ROC and AUC
Training a linear classifier on the training data produces the shown class boundary. The boundary is the line y = c1x + c0, with parameters c1 and c0 learned from data, chosen here such that the boundary minimizes the overall classification error for the training data.

[Figure: classifier with minimum error boundary; 13.5% of circles detected as crosses, 5.0% of crosses detected as circles.]
Classification Example—ROC and AUC
The classifier can be tuned to favor one class over the other by moving the boundary y = c1x + c0 up or down, i.e., by adjusting the intercept c0.
Classification Example—ROC and AUC
Sweeping the boundary over all positions and recording the two error rates at each position traces the ROC curve; for this classifier, AUC = 0.98.

[Figure: classifier with minimum error boundary, as on the previous slide; 13.5% of circles detected as crosses, 5.0% of crosses detected as circles.]
Classification Example—ROC and AUC
The ROC curves and AUC scores of four classifiers trained on this data are shown below; the code is on the next slide.

[Figure: ROC curves for Logistic Regression (AUC = 0.98), Support Vector Machine (AUC = 0.96), Random Forest (AUC = 0.97), and Nearest Neighbor (AUC = 0.96).]
ROC and AUC code in Python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import roc_auc_score

classifiers = [(LogisticRegression(), "Logistic Regression"),
               (SVC(probability=True), "Support Vector Machine"),
               (RandomForestClassifier(n_estimators=100), "Random Forest"),
               (KNeighborsClassifier(), "Nearest Neighbor")]

# X, y: training data; X_test, y_test: holdout test data
for clf, name in classifiers:
    clf.fit(X, y)
    ROC = []
    for gamma in np.linspace(0, 1, 1000):
        # Fraction of class-0 samples at or below the threshold (true rejections)
        err1 = np.count_nonzero(clf.predict_proba(X_test[y_test == 0, :])[:, 1] <= gamma)
        # Fraction of class-1 samples above the threshold (detections)
        err2 = np.count_nonzero(clf.predict_proba(X_test[y_test == 1, :])[:, 1] > gamma)
        err1 = float(err1) / np.count_nonzero(y_test == 0)
        err2 = float(err2) / np.count_nonzero(y_test == 1)
        ROC.append([err1, err2])
    ROC = np.array(ROC)
    ROC = ROC[::-1, :]
    auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
    plt.plot(1 - ROC[:, 0], ROC[:, 1], linewidth=2,
             label="%s (AUC = %.2f)" % (name, auc))