SLIDE 1

PATTERN RECOGNITION AND MACHINE LEARNING

Slide Set 3: Detection Theory
October 2019
Heikki Huttunen (heikki.huttunen@tuni.fi)

Signal Processing, Tampere University

SLIDE 2

Detection theory

  • In this section, we will briefly consider detection theory.
  • Detection theory has many topics in common with machine learning.
  • The methods are based on estimation theory and attempt to answer questions such as:
  • Is a signal of a specific model present in our time series? E.g., detection of a noisy sinusoid; beep or no beep?
  • Is the transmitted pulse present in the radar signal at time t?
  • Does the mean level of a signal change at time t?
  • After calculating the mean change in pixel values of subsequent frames in a video, is there something moving in the scene?
  • Is there a person in this video frame?
  • The area is closely related to hypothesis testing, which is widely used e.g. in medicine: Is the response in patients due to the new drug or due to random fluctuations?

SLIDE 3

Detection theory

  • Consider the detection of a sinusoidal waveform
[Figure: Noiseless Signal, Noisy Signal, and Detection Result panels]
SLIDE 4

Detection theory

  • In our case, the hypotheses could be

H1 : x[n] = A cos(2πf0n + φ) + w[n]
H0 : x[n] = w[n]

  • This example corresponds to the detection of a noisy sinusoid.
  • The hypothesis H1 corresponds to the case that the sinusoid is present and is called the alternative hypothesis.
  • The hypothesis H0 corresponds to the case that the measurements consist of noise only and is called the null hypothesis.
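  • For concreteness, a small sketch generating data under both hypotheses (the frequency, amplitude, phase, and noise level below are illustrative assumptions, not values fixed on the slides):

import numpy as np

N = 1000           # number of samples
f0 = 0.05          # normalized frequency (cycles per sample)
A, phi = 1.0, 0.3  # amplitude and phase
sigma = 0.5        # noise standard deviation

n = np.arange(N)
w = sigma * np.random.randn(N)                      # white Gaussian noise w[n]
x_H0 = w                                            # H0: noise only
x_H1 = A * np.cos(2 * np.pi * f0 * n + phi) + w     # H1: sinusoid + noise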

SLIDE 5

Introductory Example

  • Consider a simplistic detection problem, where we observe one sample x[0] from one of two densities: N(0, 1) or N(1, 1).
  • The task is to choose the correct density in an optimal manner.

[Figure: the two Gaussian densities, with µ = 0 and µ = 1, and the observed sample ("Where did this come from?")]

SLIDE 6

Introductory Example

  • Our hypotheses are now

H1 : µ = 1,
H0 : µ = 0,

and the corresponding likelihoods are plotted below.

[Figure: likelihood of observing different values of x[0] given µ = 0 or µ = 1, i.e., p(x[0] | µ = 0) and p(x[0] | µ = 1)]

SLIDE 7

Introductory Example

  • An obvious approach for deciding the density would be to choose the one that is higher for the particular x[0].
  • More specifically, study the likelihoods and choose the more likely one.
  • The likelihoods are

H1 : p(x[0] | µ = 1) = (1/√(2π)) exp(−(x[0] − 1)²/2),
H0 : p(x[0] | µ = 0) = (1/√(2π)) exp(−(x[0])²/2).

  • One should select H1 if "µ = 1" is more likely than "µ = 0".
  • In other words, select H1 if p(x[0] | µ = 1) > p(x[0] | µ = 0).

SLIDE 8

Introductory Example

  • Let’s state this in terms of x[0]:

p(x[0] | µ = 1) > p(x[0] | µ = 0)
⇔ p(x[0] | µ = 1) / p(x[0] | µ = 0) > 1
⇔ [(1/√(2π)) exp(−(x[0] − 1)²/2)] / [(1/√(2π)) exp(−(x[0])²/2)] > 1
⇔ exp(−((x[0] − 1)² − (x[0])²)/2) > 1

SLIDE 9

Introductory Example

⇔ (x[0])² − (x[0] − 1)² > 0
⇔ 2x[0] − 1 > 0
⇔ x[0] > 1/2.

  • In other words, choose H1 if x[0] > 0.5 and H0 if x[0] < 0.5.
  • Studying the ratio of likelihoods (second row of the previous derivation) is the key:

p(x[0] | µ = 1) / p(x[0] | µ = 0) > 1

  • This ratio is called the likelihood ratio, and comparing it to a threshold (here γ = 1) is called the likelihood ratio test (LRT).
  • Of course, the detection threshold γ may be chosen to be other than γ = 1.
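  • For concreteness, a minimal Python sketch of this test (the helper name and the example inputs are ours, not from the slides):

from scipy.stats import norm

def lrt_decision(x0, gamma=1.0):
    # Likelihood ratio test between N(1, 1) and N(0, 1) for a single sample x0
    ratio = norm.pdf(x0, loc=1, scale=1) / norm.pdf(x0, loc=0, scale=1)
    return 1 if ratio > gamma else 0   # 1 = decide H1, 0 = decide H0

# With gamma = 1 this is equivalent to checking x0 > 0.5
print(lrt_decision(0.7))   # -> 1
print(lrt_decision(0.2))   # -> 0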

SLIDE 10

Error Types

  • It might be that the detection problem is not symmetric and some errors are more costly than others.
  • For example, when detecting a disease, a missed detection is more costly than a false alarm.
  • The tradeoff between misses and false alarms can be adjusted using the threshold of the LRT.

SLIDE 11

Error Types

  • The below figure illustrates the probabilities of the two kinds of errors.
  • The blue area on the left corresponds to the probability of choosing H1 while H0 would hold (false match).
  • The red area is the probability of choosing H0 while H1 would hold (missed detection).

[Figure: the two likelihoods with the error regions shaded; "Decide H0 when H1 holds" (red) and "Decide H1 when H0 holds" (blue)]

SLIDE 12

Error Types

  • It can be seen that we can make either error probability arbitrarily small by adjusting the detection threshold.

[Figure, left: detection threshold at 0; few missed detections (red) but many false matches (blue)]
[Figure, right: detection threshold at 1.5; few false matches (blue) but many missed detections (red)]

SLIDE 13

Error Types

  • For example, suppose the threshold is γ = 1.5. What are PFA and PD?
  • The probability of false alarm is found by integrating over the blue area:

PFA = P(x[0] > γ | µ = 0) = ∫_{1.5}^{∞} (1/√(2π)) exp(−(x[0])²/2) dx[0] ≈ 0.0668.

  • The probability of missed detection is the area marked in red:

PM = P(x[0] < γ | µ = 1) = ∫_{−∞}^{1.5} (1/√(2π)) exp(−(x[0] − 1)²/2) dx[0] ≈ 0.6915.

  • An equivalent, but more useful, term is the complement of PM, the probability of detection:

PD = 1 − PM = ∫_{1.5}^{∞} (1/√(2π)) exp(−(x[0] − 1)²/2) dx[0] ≈ 0.3085.
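  • These numbers are easy to verify numerically; a small sketch using scipy (sf is the Gaussian survival function, 1 − cdf):

from scipy.stats import norm

gamma = 1.5
P_FA = norm.sf(gamma, loc=0, scale=1)    # P(x[0] > 1.5 | mu = 0), approx. 0.0668
P_M  = norm.cdf(gamma, loc=1, scale=1)   # P(x[0] < 1.5 | mu = 1), approx. 0.6915
P_D  = 1 - P_M                           # approx. 0.3085
print(P_FA, P_M, P_D)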

SLIDE 14

Choosing the threshold

  • Often we don’t want to define the threshold directly, but rather the amount of false alarms we can accept.
  • For example, suppose we want to find the best detector for our introductory example, and we can tolerate 10% false alarms (PFA = 0.1).
  • The likelihood ratio detection rule is: select H1 if p(x | µ = 1) / p(x | µ = 0) > γ.
  • The only thing to find out now is the threshold γ such that

∫_{γ}^{∞} p(x | µ = 0) dx = 0.1.

SLIDE 15

Choosing the threshold

  • This can be done with the Python function isf, which evaluates the inverse survival function (the inverse of the complementary cumulative distribution function).

>>> import scipy.stats as stats
>>> # Compute threshold such that P_FA = 0.1
>>> T = stats.norm.isf(0.1, loc = 0, scale = 1)
>>> print(T)
1.28155156554

  • The parameters loc and scale are the mean and standard deviation of the Gaussian density, respectively.

SLIDE 16

Detector for a known waveform

  • An important special case is that of a known waveform s[n] embedded in a WGN sequence w[n]:

H1 : x[n] = s[n] + w[n]
H0 : x[n] = w[n].

  • An example of a case where the waveform is known could be the detection of radar signals, where a pulse s[n] transmitted by us is reflected back after some propagation time.

[Figure: Transmitted signal s[n] and received signal s[n] + w[n]]
SLIDE 17

Detector for a known waveform

  • For this case the likelihoods are

p(x | H1) = ∏_{n=0}^{N−1} (1/√(2πσ²)) exp(−(x[n] − s[n])²/(2σ²)),
p(x | H0) = ∏_{n=0}^{N−1} (1/√(2πσ²)) exp(−(x[n])²/(2σ²)).

  • The likelihood ratio test is easily obtained as

p(x | H1) / p(x | H0) = exp[ −(1/(2σ²)) ( ∑_{n=0}^{N−1} (x[n] − s[n])² − ∑_{n=0}^{N−1} (x[n])² ) ] > γ.

SLIDE 18

Detector for a known waveform

  • This simplifies by taking the logarithm of both sides:

−(1/(2σ²)) ( ∑_{n=0}^{N−1} (x[n] − s[n])² − ∑_{n=0}^{N−1} (x[n])² ) > ln γ.

  • This further simplifies into

(1/σ²) ∑_{n=0}^{N−1} x[n]s[n] − (1/(2σ²)) ∑_{n=0}^{N−1} (s[n])² > ln γ.

SLIDE 19

Detector for a known waveform

  • Since s[n] is a known waveform (= constant), we can simplify the procedure by moving it to the right-hand side and combining it with the threshold:

∑_{n=0}^{N−1} x[n]s[n] > σ² ln γ + (1/2) ∑_{n=0}^{N−1} (s[n])².

  • We can equivalently call the right-hand side our threshold (say γ′) to get the final decision rule

∑_{n=0}^{N−1} x[n]s[n] > γ′.
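  • In code this detector is just a dot product compared to a threshold; a minimal sketch (the function name and the toy inputs are ours):

import numpy as np

def correlator_detector(x, s, gamma_prime):
    # Decide H1 if the correlation sum_n x[n]*s[n] exceeds the threshold gamma'
    T = np.dot(x, s)
    return T > gamma_prime, T

# Toy illustration with made-up numbers
s = np.array([1.0, -1.0, 1.0, -1.0])
x = s + 0.1 * np.random.randn(4)
print(correlator_detector(x, s, gamma_prime=2.0))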

SLIDE 20

Example

  • The detector for a sinusoid in WGN is

∑_{n=0}^{N−1} x[n] · A cos(2πf0n + φ) > γ   ⇒   A ∑_{n=0}^{N−1} x[n] cos(2πf0n + φ) > γ.

  • Again we can divide by A to get

∑_{n=0}^{N−1} x[n] cos(2πf0n + φ) > γ′.

  • In other words, we check the correlation with the sinusoid. Note that the amplitude A does not affect our statistic, only the threshold, which is anyway selected according to the fixed PFA rate.

SLIDE 21

Example

  • As an example, the picture shows the detection process with σ = 0.5.
  • Note that we apply the detector with a sliding window; i.e., we perform the hypothesis test at every window of length 100 (a code sketch is shown below).
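  • A minimal sketch of such a sliding-window correlator (the signal, frequency, phase, and threshold below are illustrative assumptions; only the window length 100 comes from the slide):

import numpy as np

# Illustrative data: a sinusoid burst embedded in WGN
N, win = 1000, 100                  # signal length and window length
f0, phi, sigma = 0.05, 0.0, 0.5
n = np.arange(N)
xn = sigma * np.random.randn(N)
xn[400:600] += np.cos(2 * np.pi * f0 * n[400:600] + phi)   # sinusoid present here

# Correlate every length-100 window with the known waveform
template = np.cos(2 * np.pi * f0 * np.arange(win) + phi)
statistic = np.correlate(xn, template, mode='same')

gamma_prime = 25.0    # illustrative threshold; in practice set from the desired P_FA
detections = statistic > gamma_prime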

[Figure: Noiseless Signal, Noisy Signal, and Detection Result panels]

SLIDE 22

Detection of random signals

  • The problem with the previous approach was that the model was too restrictive; the results depend on how well the phases match.
  • The model can be relaxed by considering random signals, whose exact form is unknown, but whose correlation structure is known. Since the correlation captures the frequency (but not the phase), this is exactly what we want.
  • In general, the detection of a random signal can be formulated as follows.
  • Suppose s ∼ N(0, Cs) and w ∼ N(0, σ²I). Then the detection problem is a hypothesis test

H0 : x ∼ N(0, σ²I)
H1 : x ∼ N(0, Cs + σ²I)

SLIDE 23

Detection of random signals

  • It can be shown that the decision rule becomes: decide H1 if

xᵀŝ > γ,  where  ŝ = Cs(Cs + σ²I)⁻¹x.
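  • A minimal sketch of this test statistic (the function and its inputs are placeholders; Cs and σ² are assumed known):

import numpy as np

def estimator_correlator_statistic(x, Cs, sigma2):
    # s_hat = Cs (Cs + sigma^2 I)^(-1) x, then the statistic is x^T s_hat
    N = len(x)
    s_hat = Cs @ np.linalg.solve(Cs + sigma2 * np.eye(N), x)
    return x @ s_hat   # decide H1 if this exceeds the threshold gamma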

SLIDE 24

Example of Random Signal Detection

  • Without going into the details, let’s jump directly to the derived decision rule for the sinusoid:

| ∑_{n=0}^{N−1} x[n] exp(−2πi f0 n) | > γ.

  • As an example, the picture on the next slide shows the detection process with σ = 0.5.
  • Note the simplicity of the Python implementation:

import numpy as np
# f0 is the frequency of interest, n the window sample indices, xn the noisy signal
h = np.exp(-2 * np.pi * 1j * f0 * n)
y = np.abs(np.convolve(h, xn, 'same'))

SLIDE 25

Example of Random Signal Detection

[Figure: Noiseless Signal, Noisy Signal, and Detection Result panels]

SLIDE 26

Receiver Operating Characteristics

  • A usual way of illustrating detector performance is the Receiver Operating Characteristic (ROC) curve.
  • It describes the relationship between PFA and PD for all possible values of the threshold γ.
  • The functional relationship between PFA and PD depends on the problem and the selected detector.

SLIDE 27

Receiver Operating Characteristics

  • For example, in the DC level example,

PD(γ) = ∫_{γ}^{∞} (1/√(2π)) exp(−(x − 1)²/2) dx,
PFA(γ) = ∫_{γ}^{∞} (1/√(2π)) exp(−x²/2) dx.

  • It is easy to see the relationship:

PD(γ) = ∫_{γ−1}^{∞} (1/√(2π)) exp(−x²/2) dx = PFA(γ − 1).

  • The ROC curve, plotted for all γ, is shown on the right.
[Figure: ROC curve, probability of detection PD versus probability of false alarm PFA, running from large γ (low sensitivity) to small γ (high sensitivity)]
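  • A quick sketch of how this curve can be traced numerically with scipy (the grid of thresholds is an arbitrary choice):

import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

gammas = np.linspace(-4, 5, 200)
P_FA = norm.sf(gammas, loc=0, scale=1)   # P(x[0] > gamma | mu = 0)
P_D  = norm.sf(gammas, loc=1, scale=1)   # P(x[0] > gamma | mu = 1) = P_FA(gamma - 1)

plt.plot(P_FA, P_D)
plt.xlabel('Probability of False Alarm P_FA')
plt.ylabel('Probability of Detection P_D')
plt.show()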

SLIDE 28

Receiver Operating Characteristics

  • The higher the ROC curve, the better the performance.
  • A random guess has a diagonal ROC curve.
  • This gives rise to a widely used measure for detector performance: the Area Under (ROC) Curve, or AUC criterion.
  • The benefit of AUC is that it is threshold independent, and tests the accuracy for all thresholds.

[Figure: ROC curves of a good detector, a better detector, and a bad detector]

SLIDE 29

Empirical AUC

  • Initially, AUC and ROC stem from radar and radio detection problems.
  • More recently, AUC has become one of the standard measures of classification performance as well.
  • Usually a closed-form expression for PD and PFA cannot be derived.
  • Thus, ROC and AUC are most often computed empirically; i.e., by evaluating the prediction results on a holdout test set.

SLIDE 30

Classification Example—ROC and AUC

  • For example, consider the 2-dimensional dataset on the right.
  • The data is split into training and test sets, which are similar but not exactly the same.
  • Let’s train 4 classifiers on the upper data and compute the ROC for each on the bottom data.

[Figure: scatter plots of the Training Data (top) and the Test Data (bottom)]

SLIDE 31

Classification Example—ROC and AUC

  • A linear classifier trained with the training data produces the shown class boundary.
  • The class boundary has the orientation and location that minimize the overall classification error for the training data.
  • The boundary is defined by y = c1x + c0, with parameters c1 and c0 learned from the data.

[Figure: classifier with minimum error boundary; 13.5 % of circles detected as crosses, 5.0 % of crosses detected as circles]

SLIDE 32

Classification Example—ROC and AUC

  • We can adjust the sensitivity of the classification by moving the decision boundary up or down.
  • In other words, slide the parameter c0 in y = c1x + c0.
  • This can be seen as a tuning parameter for plotting the ROC curve.

[Figure, left: classifier with boundary lifted up; 35.0 % of circles detected as crosses, 0.5 % of crosses detected as circles]
[Figure, right: classifier with boundary lifted down; 2.0 % of circles detected as crosses, 26.0 % of crosses detected as circles]

SLIDE 33

Classification Example—ROC and AUC

  • When the boundary slides from bottom to top, we plot the empirical ROC curve.
  • Plotting starts from the upper right corner.
  • Every time the boundary passes a blue cross, the curve moves left.
  • Every time the boundary passes a red circle, the curve moves down.

[Figure, left: empirical ROC curve of the classifier on the right (AUC = 0.98)]
[Figure, right: classifier with minimum error boundary; 13.5 % of circles detected as crosses, 5.0 % of crosses detected as circles]

SLIDE 34

Classification Example—ROC and AUC

  • Real usage is for comparing classifiers.
  • Below is a plot of ROC curves for 4 widely used classifiers.
  • Each classifier produces a class membership score over which the tuning parameter slides.

[Figure: ROC curves for Logistic Regression (AUC = 0.98), Support Vector Machine (AUC = 0.96), Random Forest (AUC = 0.97), and Nearest Neighbor (AUC = 0.96)]

SLIDE 35

ROC and AUC code in Python

# X, y are the training data and X_test, y_test the test data from the previous slides
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import roc_auc_score
import numpy as np
import matplotlib.pyplot as plt

classifiers = [(LogisticRegression(), "Logistic Regression"),
               (SVC(probability = True), "Support Vector Machine"),
               (RandomForestClassifier(n_estimators = 100), "Random Forest"),
               (KNeighborsClassifier(), "Nearest Neighbor")]

for clf, name in classifiers:
    clf.fit(X, y)
    ROC = []
    # err1: fraction of negatives with score <= gamma (specificity)
    # err2: fraction of positives with score > gamma (sensitivity, P_D)
    for gamma in np.linspace(0, 1, 1000):
        err1 = np.count_nonzero(clf.predict_proba(X_test[y_test == 0, :])[:, 1] <= gamma)
        err2 = np.count_nonzero(clf.predict_proba(X_test[y_test == 1, :])[:, 1] > gamma)
        err1 = float(err1) / np.count_nonzero(y_test == 0)
        err2 = float(err2) / np.count_nonzero(y_test == 1)
        ROC.append([err1, err2])
    ROC = np.array(ROC)
    ROC = ROC[::-1, :]
    auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
    plt.plot(1 - ROC[:, 0], ROC[:, 1], linewidth = 2,
             label = "%s (AUC = %.2f)" % (name, auc))

SLIDE 36

Precision and Recall

  • Computing the PFA from the negative examples may not always be feasible: for example, in the attached pictures, there are millions of negative examples (locations).
  • Thus, in search and retrieval, other metrics are preferred:
  • Detector recall is defined as the proportion of true objects found.
  • Detector precision is the proportion of true objects among all found objects.
  • For example, in the bottom figure:
  • Recall = #found true objects / #all true objects = 1/1 = 100%
  • Precision = #found true objects / #all found objects = 1/5 = 20%

[Figures: three example detection results annotated R = 0%, P = 100%; R = 100%, P = 100%; and R = 100%, P = 20%]
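  • These two metrics are simple to compute once we know which found objects are true; a small sketch (the helper and the toy input are ours):

import numpy as np

def precision_recall(found_is_true, n_true_objects):
    # found_is_true: boolean array, True where a found object is a true object
    # n_true_objects: total number of true objects in the data
    n_found_true = int(np.sum(found_is_true))
    precision = n_found_true / max(len(found_is_true), 1)
    recall = n_found_true / max(n_true_objects, 1)
    return precision, recall

# The bottom-figure example: 5 found objects, 1 of them true, 1 true object in total
print(precision_recall(np.array([True, False, False, False, False]), 1))   # (0.2, 1.0)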

SLIDE 37

Precision-Recall tradeoff

  • So, we can adjust the sensitivity (find many objects) and precision (find only true objects) by tuning the detection threshold.
  • How could we study the detector itself without the need to speculate about the sensitivity?
  • This can be done by plotting the precision versus the recall for all relevant thresholds.
  • An example of a precision-recall curve is shown on the right.
  • It is evident that the RF classifier (red curve) is superior to the SVM classifier (blue curve).
  • If we looked only at an individual operating point (e.g., recall = 0.3, precision = 1.0), we might fail to see the difference.

SLIDE 38

Average Precision

  • The precision-recall curve also gives rise to a single metric of detector performance: average precision (AP).
  • Average precision is simply the average of the precision over all recalls.
  • Roughly the same as the area under the precision-recall curve.
  • The details of the computation may vary; for example the tested thresholds, etc.

[Figure: precision-recall curve, running from low sensitivity to high sensitivity]
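  • In practice, AP and the precision-recall curve are usually computed from classifier scores; a minimal scikit-learn sketch (the labels and scores below are made-up placeholders):

import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

# y_true: true 0/1 labels, scores: detector scores for the positive class
y_true = np.array([0, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])

precision, recall, thresholds = precision_recall_curve(y_true, scores)
ap = average_precision_score(y_true, scores)   # roughly the area under the PR curve
print(ap)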
