

SLIDE 1

PATTERN RECOGNITION AND MACHINE LEARNING

Slide Set 2: Estimation Theory
October 2019
Heikki Huttunen, heikki.huttunen@tuni.fi

Signal Processing, Tampere University

SLIDE 2

Classical Estimation and Detection Theory

  • Before the machine learning part, we will take a look at classical estimation theory.
  • Estimation theory has many connections to the foundations of modern machine learning.
  • Outline of the next few hours:

1 Estimation theory:
  • Fundamentals
  • Maximum likelihood
  • Examples

2 Detection theory:
  • Fundamentals
  • Error metrics
  • Examples


SLIDE 3

Introduction – Estimation

  • Our goal is to estimate the values of a group of parameters from data.
  • Examples: radar, sonar, speech, image analysis, biomedicine, communications, control, seismology, etc.
  • Parameter estimation: Given an N-point data set x = {x[0], x[1], . . . , x[N − 1]} which depends on the unknown parameter θ ∈ R, we wish to design an estimator g(·) for θ:
$$\hat{\theta} = g(x[0], x[1], \ldots, x[N-1]).$$

  • The fundamental questions are:

1 What is the model for our data?
2 How to determine its parameters?

SLIDE 4

Introductory Example – Straight line

  • Suppose we have the illustrated time series and would like to approximate the relationship of the two coordinates.
  • The relationship looks linear, so we could assume the following model:
$$y[n] = a\,x[n] + b + w[n],$$
with a ∈ R and b ∈ R unknown and w[n] ∼ N(0, σ²).
  • N(0, σ²) is the normal distribution with mean 0 and variance σ².

[Figure: scatter plot of the example data (x vs. y)]


SLIDE 5

Introductory Example – Straight line

  • Each pair of a and b represents one line.
  • Which line of the three would best describe the data set? Or some other line?

[Figure: the data with three candidate lines; Model candidate 1 (a = 0.07, b = 0.49), Model candidate 2 (a = 0.06, b = 0.33), Model candidate 3 (a = 0.08, b = 0.51)]


SLIDE 6

Introductory Example – Straight line

  • It can be shown that the best solution (in the maximum likelihood sense; to be defined later) is given by
$$\hat{a} = -\frac{6}{N(N+1)} \sum_{n=0}^{N-1} y(n) + \frac{12}{N(N^2-1)} \sum_{n=0}^{N-1} x(n)\,y(n)$$
$$\hat{b} = \frac{2(2N-1)}{N(N+1)} \sum_{n=0}^{N-1} y(n) - \frac{6}{N(N+1)} \sum_{n=0}^{N-1} x(n)\,y(n).$$
  • Or, as we will later learn, in an easy matrix form:
$$\hat{\theta} = \begin{bmatrix} \hat{a} \\ \hat{b} \end{bmatrix} = (X^T X)^{-1} X^T y$$
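A minimal NumPy sketch of this fit (not from the slides; the data here is a synthetic stand-in with a ≈ 0.07 and b ≈ 0.5):

import numpy as np

# Synthetic data resembling the example: y = 0.07 x + 0.5 plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(-10, 10, 50)
y = 0.07 * x + 0.5 + rng.normal(0.0, 0.1, size=x.size)

# Least squares with the design matrix X = [x 1].
X = np.column_stack([x, np.ones_like(x)])
a_hat, b_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print("a = %.4f, b = %.4f" % (a_hat, b_hat))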


SLIDE 7

Introductory Example – Straight line

  • In this case, $\hat{a} = 0.07401$ and $\hat{b} = 0.49319$, which produces the line shown on the right.
  • The line also minimizes the squared distances (green dashed lines) between the model (blue line) and the data (red circles).

[Figure: best fit y = 0.0740x + 0.4932; sum of squares = 3.62]


SLIDE 8

Introductory Example 2 – Sinusoid

  • Consider transmitting the sinusoid below.

[Figure: the transmitted sinusoid]


SLIDE 9

Introductory Example 2 – Sinusoid

  • When the data is received, it is corrupted by noise, and the received samples look like below.

[Figure: the received noisy samples]

  • Can we recover the parameters of the sinusoid?


SLIDE 10

Introductory Example 2 – Sinusoid

  • In this case, the problem is to find good values for A, f0 and φ in the following model:
$$x[n] = A\cos(2\pi f_0 n + \phi) + w[n], \quad \text{with } w[n] \sim \mathcal{N}(0, \sigma^2).$$


SLIDE 11

Introductory Example 2 – Sinusoid

  • It can be shown that the maximum likelihood estimates (MLE) of the parameters A, f0 and φ are given by
$$\hat{f}_0 = \text{the value of } f \text{ that maximizes } \left|\sum_{n=0}^{N-1} x(n)\,e^{-2\pi i f n}\right|,$$
$$\hat{A} = \frac{2}{N} \left|\sum_{n=0}^{N-1} x(n)\,e^{-2\pi i \hat{f}_0 n}\right|,$$
$$\hat{\phi} = \arctan\left(\frac{-\sum_{n=0}^{N-1} x(n)\sin(2\pi \hat{f}_0 n)}{\sum_{n=0}^{N-1} x(n)\cos(2\pi \hat{f}_0 n)}\right).$$
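A minimal simulation sketch of these estimators (not from the slides; the noise level, frequency-grid resolution and true parameter values are illustrative assumptions):

import numpy as np

# Generate a noisy sinusoid; the true parameters mimic the example on the next slide.
rng = np.random.default_rng(1)
N = 160
n = np.arange(N)
f0, A, phi = 0.068, 0.667, 0.609
x = A * np.cos(2 * np.pi * f0 * n + phi) + rng.normal(0.0, 1.0, N)

# MLE of f0: maximize |sum_n x(n) exp(-2*pi*i*f*n)| over a dense grid of f.
freqs = np.linspace(0.001, 0.5, 5000)
spectrum = np.abs(np.exp(-2j * np.pi * np.outer(freqs, n)) @ x)
f0_hat = freqs[np.argmax(spectrum)]

# Plug f0_hat into the amplitude and phase formulas above.
A_hat = (2.0 / N) * np.abs(np.sum(x * np.exp(-2j * np.pi * f0_hat * n)))
phi_hat = np.arctan2(-np.sum(x * np.sin(2 * np.pi * f0_hat * n)),
                     np.sum(x * np.cos(2 * np.pi * f0_hat * n)))
print("f0 = %.3f, A = %.3f, phi = %.3f" % (f0_hat, A_hat, phi_hat))

Here np.arctan2 is used instead of plain arctan so that the phase estimate lands in the correct quadrant.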


SLIDE 12

Introductory Example 2 – Sinusoid

  • It turns out that the sinusoidal parameter estimation is very successful:

[Figure: $\hat{f}_0$ = 0.068 (0.068); $\hat{A}$ = 0.692 (0.667); $\hat{\phi}$ = 0.238 (0.609)]

  • The blue curve is the original sinusoid, and the red curve is the one estimated from the green circles.
  • The estimates are shown in the figure (true values in parentheses).


SLIDE 13

Introductory Example 2 – Sinusoid

  • However, the results are different for each realization of the noise w[n].

[Figure: four runs on different noise realizations:
$\hat{f}_0$ = 0.068, $\hat{A}$ = 0.652, $\hat{\phi}$ = −0.023;
$\hat{f}_0$ = 0.066, $\hat{A}$ = 0.660, $\hat{\phi}$ = 0.851;
$\hat{f}_0$ = 0.067, $\hat{A}$ = 0.786, $\hat{\phi}$ = 0.814;
$\hat{f}_0$ = 0.459, $\hat{A}$ = 0.618, $\hat{\phi}$ = 0.299]

SLIDE 14

Introductory Example 2 – Sinusoid

  • Thus, we're not very interested in an individual case, but rather in the distributions of the estimates:
  • What are the expectations $E[\hat{f}_0]$, $E[\hat{\phi}]$ and $E[\hat{A}]$?
  • What are their respective variances?
  • Could there be a better formula that would yield smaller variance?
  • If yes, how to discover the better estimators?


SLIDE 15

LEAST SQUARES


SLIDE 16

Least Squares

  • The general solution for the linear case is easy to remember.
  • Consider the line fitting case: y[n] = ax[n] + b + w[n].
  • This can be written in matrix form as follows:
$$\underbrace{\begin{bmatrix} y[0] \\ y[1] \\ \vdots \\ y[N-1] \end{bmatrix}}_{y} = \underbrace{\begin{bmatrix} x[0] & 1 \\ x[1] & 1 \\ \vdots & \vdots \\ x[N-1] & 1 \end{bmatrix}}_{X} \underbrace{\begin{bmatrix} a \\ b \end{bmatrix}}_{\theta} + w.$$


SLIDE 17

Least Squares

  • Now the model is written compactly as
$$y = X\theta + w.$$
  • Solution: the value of θ minimizing the error $w^T w$ is given by
$$\hat{\theta}_{LS} = \left(X^T X\right)^{-1} X^T y.$$
  • See sample Python code here: https://github.com/mahehu/SGN-41007/blob/master/code/Least_Squares.ipynb
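As a quick sanity check (this derivation is not spelled out on the slide), the formula follows by setting the gradient of the squared error to zero:

$$w^T w = (y - X\theta)^T (y - X\theta)$$
$$\frac{\partial}{\partial \theta}\,(y - X\theta)^T (y - X\theta) = -2X^T y + 2X^T X\theta = 0 \quad\Rightarrow\quad X^T X\,\hat{\theta} = X^T y.$$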


SLIDE 18

LS Example with two variables

  • Consider an example where microscope images have an uneven illumination.
  • The illumination pattern appears to have the highest brightness at the center.
  • Let's try to fit a paraboloid to remove the effect:
$$z(x, y) = c_1 x^2 + c_2 y^2 + c_3 xy + c_4 x + c_5 y + c_6,$$
with z(x, y) the brightness at pixel (x, y).


SLIDE 19

LS Example with two variables

  • In matrix form, the model looks like the following:
$$\underbrace{\begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_N \end{bmatrix}}_{z} = \underbrace{\begin{bmatrix} x_1^2 & y_1^2 & x_1 y_1 & x_1 & y_1 & 1 \\ x_2^2 & y_2^2 & x_2 y_2 & x_2 & y_2 & 1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ x_N^2 & y_N^2 & x_N y_N & x_N & y_N & 1 \end{bmatrix}}_{H} \cdot \underbrace{\begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \\ c_5 \\ c_6 \end{bmatrix}}_{c} + \epsilon, \tag{1}$$
with $z_k$ the grayscale value at the k'th pixel $(x_k, y_k)$.


SLIDE 20

LS Example with two variables

  • As a result we get the LS fit by
$$\hat{c} = \left(H^T H\right)^{-1} H^T z,$$
$$\hat{c} = (-0.000080, -0.000288, 0.000123, 0.022064, 0.284020, 106.538687).$$
  • Or, in other words,
$$z(x, y) = -0.000080x^2 - 0.000288y^2 + 0.000123xy + 0.022064x + 0.284020y + 106.538687.$$
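A minimal sketch of the fit (not from the slides; img is a hypothetical 2-D grayscale image array, replaced here by random stand-in data):

import numpy as np

# img: a 2-D grayscale image (stand-in data for illustration).
img = np.random.default_rng(2).normal(100.0, 5.0, (256, 256))

# Pixel coordinate grids, flattened to vectors.
ys, xs = np.mgrid[0:img.shape[0], 0:img.shape[1]]
x = xs.ravel().astype(float)
y = ys.ravel().astype(float)
z = img.ravel()

# Build H exactly as in Eq. (1): columns x^2, y^2, xy, x, y, 1.
H = np.column_stack([x**2, y**2, x * y, x, y, np.ones_like(x)])
c_hat = np.linalg.lstsq(H, z, rcond=None)[0]

# Remove the fitted illumination component by subtraction (cf. next slide).
illumination = (H @ c_hat).reshape(img.shape)
corrected = img - illumination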


SLIDE 21

LS Example with two variables

  • Finally, we remove the illumination component by subtraction.


SLIDE 22

MAXIMUM LIKELIHOOD


SLIDE 23

Maximum Likelihood Estimation

  • Maximum likelihood estimation (MLE) is the most popular estimation approach due to its applicability in complicated estimation problems.
  • Maximization of likelihood also appears often as the optimality criterion in machine learning.
  • The method was proposed by Fisher in 1922, though he published the basic principle already in 1912 as a third-year undergraduate.
  • The basic principle is simple: find the parameter θ that is the most probable to have generated the data x.
  • The ML estimator may or may not be optimal in the minimum variance sense. It is not necessarily unbiased, either.


SLIDE 24

The Likelihood Function

  • Consider again the problem of estimating the mean level A of noisy data.
  • Assume that the data originates from the following model:
$$x[n] = A + w[n],$$
where w[n] ∼ N(0, σ²): a constant plus Gaussian random noise with zero mean and variance σ².

[Figure: noisy constant-level signal]


SLIDE 25

The Likelihood Function

  • The key to maximum likelihood is the likelihood function.
  • If the probability density function (PDF) of the data is viewed as a function of the unknown parameter (with fixed data), it is called the likelihood function.
  • Often the likelihood function has an exponential form. Then it is usual to take the natural logarithm to get rid of the exponential. Note that the location of the maximum of the new log-likelihood function does not change.

[Figure: PDF of A assuming x[0] = 3, with the likelihood and log-likelihood curves]


SLIDE 26

MLE Example

  • In our case of estimating the mean of a signal
$$x[n] = A + w[n], \quad n = 0, 1, \ldots, N-1,$$
the likelihood function is easy to construct.
  • The noise samples w[n] ∼ N(0, σ²) are assumed independent, so the PDF of the whole batch of samples x = (x[0], . . . , x[N − 1]) is obtained by the product rule:
$$p(x; A) = \prod_{n=0}^{N-1} p(x[n]; A) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{1}{2\sigma^2} \sum_{n=0}^{N-1} (x[n] - A)^2\right)$$
  • When we have observed the data x, we can turn the problem around and consider what is the most likely parameter A to have generated the data.


SLIDE 27

MLE Example

  • Some authors emphasize this by turning the order around, p(A; x), or by giving the function a different name, such as L(A; x) or ℓ(A; x).
  • So, consider p(x; A) as a function of A and try to maximize it.


SLIDE 28

MLE Example

  • The picture below shows the likelihood function and the log-likelihood function for one possible realization of data.
  • The data consists of 50 points, with true A = 5.
  • The likelihood function gives the probability of observing these particular points with different values of A.

[Figure: likelihood of A (max at 5.17) and the corresponding log-likelihood for the observed samples]


SLIDE 29

MLE Example

  • Instead of finding the maximum from the plot, we wish to have a closed form solution.
  • A closed form solution is faster, more elegant, more accurate and numerically more stable.
  • Just for the sake of an example, below is the code for the stupid version. (The helpers gaussian and gaussian_log were not defined on the slide; standard Gaussian PDF implementations and stand-in x0 data are filled in here.)

import numpy as np

# Helpers not shown on the slide: the Gaussian PDF and its logarithm.
def gaussian(x, mu, sigma):
    return np.exp(-(x - mu)**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

def gaussian_log(x, mu, sigma):
    return -(x - mu)**2 / (2 * sigma**2) - 0.5 * np.log(2 * np.pi * sigma**2)

# The samples are in an array called x0 (stand-in data: 50 points, true A = 5).
x0 = np.random.default_rng(0).normal(5.0, 1.0, 50)

x = np.linspace(2, 8, 200)   # candidate values of A
likelihood = []
log_likelihood = []
for A in x:
    likelihood.append(gaussian(x0, A, 1).prod())
    log_likelihood.append(gaussian_log(x0, A, 1).sum())
print("Max likelihood is at %.2f" % (x[np.argmax(log_likelihood)]))


SLIDE 30

MLE Example

  • Maximization of p(x; A) directly is nontrivial. Therefore, we take the logarithm and maximize it instead:
$$p(x; A) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}(x[n] - A)^2\right)$$
$$\ln p(x; A) = -\frac{N}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{n=0}^{N-1}(x[n] - A)^2$$
  • The maximum is found via differentiation:
$$\frac{\partial \ln p(x; A)}{\partial A} = \frac{1}{\sigma^2}\sum_{n=0}^{N-1}(x[n] - A)$$


SLIDE 31

MLE Example

  • Setting this equal to zero gives
$$\frac{1}{\sigma^2}\sum_{n=0}^{N-1}(x[n] - A) = 0$$
$$\sum_{n=0}^{N-1}(x[n] - A) = 0$$
$$\sum_{n=0}^{N-1} x[n] - NA = 0$$
$$\sum_{n=0}^{N-1} x[n] = NA$$
$$\hat{A} = \frac{1}{N}\sum_{n=0}^{N-1} x[n]$$
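A quick numeric check (a sketch assuming the same stand-in x0 data as in the brute-force code above): the closed form should agree with the grid search up to the grid resolution.

import numpy as np

x0 = np.random.default_rng(0).normal(5.0, 1.0, 50)  # same stand-in data as before
print("Closed-form MLE (sample mean): %.2f" % np.mean(x0))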

SLIDE 32

Conclusion

  • What did we actually do?
  • We proved that the sample mean is the maximum likelihood estimator of the distribution mean.
  • But I could have guessed this result from the beginning. What's the point?
  • We can do the same thing for cases where you cannot guess.


SLIDE 33

Example: Sinusoidal Parameter Estimation

  • Consider the model
$$x[n] = A\cos(2\pi f_0 n + \phi) + w[n]$$
with w[n] ∼ N(0, σ²). It is possible to find the MLE for all three parameters: $\theta = [A, f_0, \phi]^T$.
  • The PDF is given as
$$p(x; \theta) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}\Big(\underbrace{x[n] - A\cos(2\pi f_0 n + \phi)}_{w[n]}\Big)^2\right)$$


SLIDE 34

Example: Sinusoidal Parameter Estimation

  • Instead of proceeding directly through the log-likelihood function, we note that the above function is maximized when
$$J(A, f_0, \phi) = \sum_{n=0}^{N-1} \left(x[n] - A\cos(2\pi f_0 n + \phi)\right)^2$$
is minimized.
  • The minimum of this function can be found, although it is a nontrivial task (about 10 slides).
  • We skip the derivation; for details, see S. M. Kay, "Fundamentals of Statistical Signal Processing: Estimation Theory," 1993.


SLIDE 35

Sinusoidal Parameter Estimation

  • The MLE of the frequency f0 is obtained by maximizing the periodogram:
$$\hat{f}_0 = \arg\max_f \left|\sum_{n=0}^{N-1} x[n]\exp(-j2\pi f n)\right|$$
  • Once $\hat{f}_0$ is available, proceed by calculating the other parameters:
$$\hat{A} = \frac{2}{N}\left|\sum_{n=0}^{N-1} x[n]\exp(-j2\pi \hat{f}_0 n)\right|$$
$$\hat{\phi} = \arctan\left(\frac{-\sum_{n=0}^{N-1} x[n]\sin(2\pi \hat{f}_0 n)}{\sum_{n=0}^{N-1} x[n]\cos(2\pi \hat{f}_0 n)}\right)$$


SLIDE 36

Sinusoidal Parameter Estimation—Experiments

  • Four example runs of the estimation algorithm are illustrated in the figures.
  • The algorithm was also tested on 10000 realizations of a sinusoid with fixed θ and N = 160, σ² = 1.2.
  • Note that the estimator is not unbiased. A Monte Carlo sketch of this experiment follows below.

[Figure: histograms of 10000 estimates of f0 (true = 0.07), A (true = 0.67), and φ (true = 0.61)]
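A minimal Monte Carlo sketch (not from the slides; it reuses the grid-search frequency estimator from the earlier sinusoid sketch, with a coarser grid and fewer realizations to keep the runtime short):

import numpy as np

rng = np.random.default_rng(3)
N = 160
n = np.arange(N)
f0, A, phi = 0.07, 0.67, 0.61

# Precompute the frequency-grid transform matrix once.
freqs = np.linspace(0.001, 0.5, 1000)
E = np.exp(-2j * np.pi * np.outer(freqs, n))

f0_hats = []
for _ in range(1000):   # 1000 realizations (the slides use 10000)
    x = A * np.cos(2 * np.pi * f0 * n + phi) + rng.normal(0.0, np.sqrt(1.2), N)
    f0_hats.append(freqs[np.argmax(np.abs(E @ x))])

f0_hats = np.array(f0_hats)
print("E[f0_hat] = %.4f, std = %.4f" % (f0_hats.mean(), f0_hats.std()))

The mean and standard deviation of f0_hats summarize the histogram on the slide; occasional outliers (like the f0 = 0.459 run on slide 13) come from noise peaks in the periodogram.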


SLIDE 37

Estimation Theory—Summary

  • We have seen a brief overview of estimation theory, with particular focus on maximum likelihood.
  • If your problem is simple enough to be modeled by an equation, estimation theory is the answer.
  • Estimating the frequency of a sinusoid is completely solved by the classical theory.
  • Estimating the age of a person in a picture cannot possibly be modeled this simply, and the classical theory has no answer.
  • Model-based estimation is the best answer when a model exists.
  • Machine learning can be understood as a data-driven approach.
