Linear Methods for Regression and Classification
Petr Pošík, Czech Technical University in Prague, Faculty of Electrical Engineering, Department of Cybernetics


slide-1
SLIDE 1

CZECH TECHNICAL UNIVERSITY IN PRAGUE

Faculty of Electrical Engineering Department of Cybernetics


Linear Methods for Regression and Classification

Petr Pošík, Czech Technical University in Prague, Faculty of Electrical Engineering, Department of Cybernetics
slide-2
SLIDE 2

Linear regression


slide-3
SLIDE 3

Linear regression: Illustration



Given a dataset of input vectors x(i) and the respective values of output variable y(i) . . .

slide-4
SLIDE 4

Linear regression: Illustration


. . . we would like to find a linear model of this dataset . . .

slide-5
SLIDE 5

Linear regression: Illustration


. . . which would minimize a certain error measure between the known values of the output variable and the model predictions.


slide-7
SLIDE 7

Linear regression


The regression task is a supervised learning task, i.e.

■ a training (multi)set T = {(x(1), y(1)), . . . , (x(|T|), y(|T|))} is available, where
■ the labels y(i) are quantitative, often continuous (as opposed to classification tasks, where y(i) are nominal).
■ Its purpose is to model the relationship between the independent variables (inputs) x = (x1, . . . , xD) and the dependent variable (output) y.

Linear regression is a particular regression model which assumes (and learns) a linear relationship between the inputs and the output:

ŷ = h(x) = w0 + w1x1 + . . . + wDxD = w0 + ⟨w, x⟩ = w0 + xwT,

where

■ ŷ is the model prediction (an estimate of the true value y),
■ h(x) is the linear model (a hypothesis),
■ w0, . . . , wD are the coefficients of the linear function (w0 is the bias), organized in a row vector w,
■ ⟨w, x⟩ is the dot product (scalar product) of the vectors w and x,
■ which can also be computed as the matrix product xwT if w and x are row vectors.

slide-9
SLIDE 9

Notation remarks


Homogeneous coordinates: If we add “1” as the first element of x, so that x = (1, x1, . . . , xD), then we can write the linear model in an even simpler form (without the explicit bias term):

ŷ = h(x) = w0 · 1 + w1x1 + . . . + wDxD = ⟨w, x⟩ = xwT.

Matrix notation: If we organize the data into a matrix X and a vector y, stacking one example per row,

X = [ 1  x(1) ;  . . . ;  1  x(|T|) ],   y = [ y(1) ;  . . . ;  y(|T|) ],

and similarly with ŷ, then we can write a batch computation of the predictions for all data in X as

ŷ = XwT.
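A minimal sketch of the batch prediction ŷ = XwT in NumPy; the data and weight values below are illustrative, not from the slides:

import numpy as np

# Illustrative data: 3 examples with D = 2 features each (assumed values).
X_raw = np.array([[0.5, 1.2],
                  [1.0, 0.3],
                  [2.0, 1.0]])
w = np.array([0.1, 2.0, -0.5])                        # row vector (w0, w1, w2)

# Homogeneous coordinates: prepend a column of ones so w0 needs no special handling.
X = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])

# Batch prediction for all rows of X at once: y_hat = X w^T.
y_hat = X @ w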

slide-14
SLIDE 14

Two operation modes


Any ML model has 2 operation modes:

  1. learning (training, fitting) and
  2. application (testing, making predictions).

The model h can be viewed as a function of 2 variables: h(x, w).

Model application: If the model is given (w is fixed), we can manipulate x to make predictions:

ŷ = h(x, w) = hw(x).

Model learning: If the data is given (T is fixed), we can manipulate the model parameters w to fit the model to the data:

w∗ = argmin_w J(w, T).

How to train the model?


slide-17
SLIDE 17

Simple (univariate) linear regression


Simple (univariate) regression deals with cases where the input vector x(i) reduces to a single scalar x(i), i.e. the examples are described by a single feature (they are 1-dimensional).

Fitting a line to data:

■ find parameters w0, w1 of a linear model ŷ = w0 + w1x,
■ given a training (multi)set T = {(x(i), y(i))}, i = 1, . . . , |T|.

How to fit a line depending on the number of training examples |T|:

■ Given a single example (1 equation, 2 parameters) ⇒ infinitely many linear functions can be fitted.
■ Given 2 examples (2 equations, 2 parameters) ⇒ exactly 1 linear function can be fitted.
■ Given 3 or more examples (> 2 equations, 2 parameters) ⇒ no line can be fitted with zero error ⇒ a line which minimizes the “size” of the error y − ŷ can be fitted:

w∗ = (w0∗, w1∗) = argmin_{w0,w1} J(w0, w1, T).


slide-19
SLIDE 19

The least squares method


The least squares method (LSM) suggests choosing the parameters w which minimize the mean squared error (MSE)

J_MSE(w) = (1/|T|) Σ_{i=1}^{|T|} (y(i) − ŷ(i))² = (1/|T|) Σ_{i=1}^{|T|} (y(i) − hw(x(i)))².

[Figure: a line ŷ = w0 + w1x fitted to the points (x(1), y(1)), (x(2), y(2)), (x(3), y(3)); the vertical distances |y(i) − ŷ(i)| are the errors being minimized, w0 is the intercept and w1 the slope.]

Explicit solution:

w1 = Σ_{i=1}^{|T|} (x(i) − x̄)(y(i) − ȳ) / Σ_{i=1}^{|T|} (x(i) − x̄)² = s_xy / s_x²,
w0 = ȳ − w1 x̄.
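A minimal sketch of this explicit least-squares solution for simple linear regression; the data values are illustrative, not from the slides:

import numpy as np

def fit_simple_linear_regression(x, y):
    """Closed-form least-squares fit of y ≈ w0 + w1*x (univariate case)."""
    x_mean, y_mean = x.mean(), y.mean()
    # w1 = s_xy / s_x^2 (covariance of x and y divided by variance of x)
    w1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    w0 = y_mean - w1 * x_mean
    return w0, w1

# Illustrative data (assumed).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.9])
w0, w1 = fit_simple_linear_regression(x, y)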

slide-20
SLIDE 20

Universal fitting method: minimization of cost function J


The landscape of J in the space of parameters w0 and w1:

[Figure: surface and contour plot of J(w0, w1) for w0 ∈ (20, 100) and w1 ∈ (0, 1).]

Gradually better linear models found by an optimization method (BFGS):

[Figure: four scatter plots of hp versus disp with successively better fitted regression lines.]

slide-21
SLIDE 21

Gradient descent algorithm


■ Given a function J(w0, w1) that should be minimized,
■ start with a guess of w0 and w1 and
■ change it so that J(w0, w1) decreases, i.e.
■ update our current guess of w0 and w1 by taking a step in the direction opposite to the gradient:

w ← w − α∇J(w0, w1),   i.e.   wd ← wd − α ∂J(w0, w1)/∂wd,

where all the parameters wd are updated simultaneously and α is a learning rate (step size).

■ For the cost function

J(w0, w1) = (1/|T|) Σ_{i=1}^{|T|} (y(i) − hw(x(i)))² = (1/|T|) Σ_{i=1}^{|T|} (y(i) − (w0 + w1x(i)))²,

the gradient can be computed as

∂J(w0, w1)/∂w0 = −(2/|T|) Σ_{i=1}^{|T|} (y(i) − hw(x(i))) = (2/|T|) Σ_{i=1}^{|T|} (hw(x(i)) − y(i)),
∂J(w0, w1)/∂w1 = −(2/|T|) Σ_{i=1}^{|T|} (y(i) − hw(x(i))) x(i) = (2/|T|) Σ_{i=1}^{|T|} (hw(x(i)) − y(i)) x(i).
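A minimal sketch of this gradient descent update for the univariate model; the learning rate and iteration count are assumptions, not values prescribed by the slides:

import numpy as np

def gradient_descent_1d(x, y, alpha=0.01, n_iters=1000):
    """Minimize J_MSE(w0, w1) of the model y_hat = w0 + w1*x by gradient descent."""
    w0, w1 = 0.0, 0.0                                    # initial guess
    n = len(x)
    for _ in range(n_iters):
        y_hat = w0 + w1 * x
        # Gradient components from the slide: (2/|T|) * sum of (h_w(x) - y), [* x for w1].
        grad_w0 = 2.0 / n * np.sum(y_hat - y)
        grad_w1 = 2.0 / n * np.sum((y_hat - y) * x)
        # Simultaneous update of both parameters.
        w0, w1 = w0 - alpha * grad_w0, w1 - alpha * grad_w1
    return w0, w1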


slide-24
SLIDE 24

Multivariate linear regression


Multivariate linear regression deals with cases where x(i) = (x(i)1, . . . , x(i)D), i.e. the examples are described by more than 1 feature (they are D-dimensional).

Model fitting:

■ find parameters w = (w1, . . . , wD) of a linear model ŷ = xwT,
■ given the training (multi)set T = {(x(i), y(i))}, i = 1, . . . , |T|.
■ The model is a hyperplane in the (D + 1)-dimensional space.

Fitting methods:

1. Numeric optimization of J(w, T):
■ Works as for simple regression, it only searches a space with more dimensions.
■ Sometimes one needs to tune some parameters of the optimization algorithm to make it work properly (learning rate in gradient descent, etc.).
■ May be slow (many iterations needed), but works even for very large D.

2. Normal equation:

w∗ = (XTX)−1XTy

■ A method to solve for the optimal w∗ analytically!
■ No need to choose optimization algorithm parameters.
■ No iterations.
■ Needs to compute (XTX)−1, which is O(D³). Slow, or intractable, for large D.
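A minimal sketch of the normal-equation fit in NumPy; solving the linear system instead of forming the explicit inverse is an implementation choice, not something the slides prescribe:

import numpy as np

def fit_normal_equation(X_raw, y):
    """Multivariate linear regression: w* = (X^T X)^{-1} X^T y."""
    # Homogeneous coordinates: prepend a column of ones for the bias w0.
    X = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])
    # Solve (X^T X) w = X^T y; numerically preferable to computing the inverse.
    w = np.linalg.solve(X.T @ X, X.T @ y)
    return w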

slide-25
SLIDE 25

Linear classification


slide-26
SLIDE 26

Binary classification task (dichotomy)


Let’s have the training dataset T = {(x(1), y(1)), . . . , (x(|T|), y(|T|))}:

■ each example is described by a vector of features x = (x1, . . . , xD),
■ each example is labeled with the correct class y ∈ {+1, −1}.

Discrimination function: a function allowing us to decide to which class an example x belongs.

■ For 2 classes, 1 discrimination function is enough.
■ Decision rule:

f(x) > 0 ⇔ ŷ = +1,
f(x) < 0 ⇔ ŷ = −1,   i.e.   ŷ = sign(f(x)).

■ Decision boundary: {x : f(x) = 0}
■ Learning then amounts to finding (the parameters of) the function f.

[Figure: two 1D examples of a discrimination function f(x); the decision boundary is where f(x) crosses zero.]

slide-27
SLIDE 27

Naive approach: Illustration



Given a dataset of input vectors x(i) and their classes y(i) . . .

slide-28
SLIDE 28

Naive approach: Illustration



. . . we shall encode the class label as y = −1 and y = 1 . . .

slide-29
SLIDE 29

Naive approach: Illustration


. . . and fit a linear discrimination function by minimizing MSE as in regression. The contour line ŷ = 0 . . .

slide-30
SLIDE 30

Naive approach: Illustration



. . . then forms a linear decision boundary in the original 2D space. But is such a classifier good in general?


slide-33
SLIDE 33

Naive approach


Problem: Learn a linear discrimination function f from the data T.

Naive solution: fit a linear regression model to the data!

■ Use the cost function

J_MSE(w, T) = (1/|T|) Σ_{i=1}^{|T|} (y(i) − f(w, x(i)))²,

■ minimize it with respect to w,
■ and use ŷ = sign(f(x)).
■ Issue: Points far away from the decision boundary have a huge effect on the model!

Better solution: fit a linear discrimination function which minimizes the number of errors!

■ Cost function:

J_01(w, T) = (1/|T|) Σ_{i=1}^{|T|} I(y(i) ≠ ŷ(i)),

where I is the indicator function: I(a) returns 1 iff a is True, 0 otherwise (see the sketch after this list).
■ The cost function is non-smooth, contains plateaus, and is not easy to optimize, but there are algorithms which attempt to solve it, e.g. the perceptron, Kozinec’s algorithm, etc.
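A minimal sketch of the zero-one cost J_01 and the sign-based prediction it scores; the function names are illustrative:

import numpy as np

def predict_sign(X, w):
    """y_hat = sign(x w^T), with X already in homogeneous coordinates."""
    return np.sign(X @ w)

def zero_one_cost(y_true, y_pred):
    """J_01: fraction of misclassified examples (mean of the indicator I(y != y_hat))."""
    return np.mean(y_true != y_pred)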

slide-34
SLIDE 34

Perceptron


slide-35
SLIDE 35

Perceptron algorithm


Perceptron [Ros62]:

■ a simple model of a neuron,
■ a linear classifier (in this case, a classifier with a linear discrimination function).

Algorithm 1: Perceptron algorithm
Input: Linearly separable training dataset {x(i), y(i)}, x(i) ∈ R^(D+1) (homogeneous coordinates), y(i) ∈ {+1, −1}
Output: Weight vector w such that x(i)wT > 0 iff y(i) = +1 and x(i)wT < 0 iff y(i) = −1

1. Initialize the weight vector, e.g. w = 0.
2. Invert all examples x belonging to class −1: x(i) = −x(i) for all i where y(i) = −1.
3. Find an incorrectly classified training vector, i.e. find j such that x(j)wT ≤ 0, e.g. the worst classified vector: x(j) = argmin_{x(i)} (x(i)wT).
4. If all examples are classified correctly, return the solution w and terminate.
5. Otherwise, update the weight vector w = w + x(j) and go to step 3.

Instead of using the worst classified point, the algorithm may go over the training set (several times) and use all encountered wrongly classified points to update w.

[Ros62] Frank Rosenblatt. Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Spartan Books, Washington, D.C., 1962.
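A minimal sketch of the algorithm above (invert class −1, then repeatedly add the worst classified example); the iteration cap is an assumption added to avoid looping forever on non-separable data:

import numpy as np

def perceptron(X, y, max_iters=10000):
    """Perceptron algorithm; X in homogeneous coordinates, y in {+1, -1}."""
    # Invert the examples of class -1, so correct classification means z w^T > 0 for every row z.
    Z = X * y[:, None]
    w = np.zeros(X.shape[1])
    for _ in range(max_iters):
        scores = Z @ w
        if np.all(scores > 0):           # all examples classified correctly
            return w
        j = np.argmin(scores)            # the worst classified example
        w = w + Z[j]                     # update the weight vector
    return w                             # may not separate the data if no separating hyperplane exists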

slide-36
SLIDE 36

Demo: Perceptron


[Figure: perceptron demo on 2D data; the decision boundary after iteration 257.]

slide-37
SLIDE 37

Features of the perceptron algorithm


Perceptron convergence theorem [Nov62]:

■ The perceptron algorithm finds a hyperplane that separates the 2 classes of points in a finite number of steps, if such a hyperplane exists.
■ If no separating hyperplane exists, the algorithm does not converge and will iterate forever.

Possible solutions:

■ Pocket algorithm: track the error the perceptron makes in each iteration and store the best weights found so far in a separate memory (the pocket).
■ Use a different learning algorithm which finds an approximate solution if the classes are not linearly separable.

[Nov62] Albert B. J. Novikoff. On convergence proofs for perceptrons. In Proceedings of the Symposium on Mathematical Theory of Automata, volume 12, Brooklyn, New York, 1962.

slide-38
SLIDE 38

The hyperplane found by perceptron


The perceptron algorithm

■ finds a separating hyperplane, if it exists;
■ but if a single separating hyperplane exists, then there are infinitely many (equally good?) separating hyperplanes,
■ and the perceptron finds any one of them!

Which separating hyperplane is the optimal one? What does “optimal” actually mean?

slide-39
SLIDE 39

Logistic regression


slide-40
SLIDE 40

Logistic regression: Illustration



Given a dataset of input vectors x(i) and their classes y(i) . . .

slide-41
SLIDE 41

Logistic regression: Illustration



. . . we shall encode the class label as y = 0 and y = 1 . . .

slide-42
SLIDE 42

Logistic regression: Illustration


. . . and fit a sigmoidal discrimination function with the threshold 0.5 . . .

slide-43
SLIDE 43

Logistic regression: Illustration



. . . which forms a linear decision boundary in the original 2D space.


slide-46
SLIDE 46

Logistic regression model


Problem: Learn a binary classifier for the dataset T = {(x(i), y(i))}, where y(i) ∈ {0, 1}.¹

To reiterate: when using linear regression, the examples far from the decision boundary have a huge impact on f. How do we limit their influence?

Logistic regression uses a discrimination function which is a nonlinear transformation of the values of a linear function:

fw(x) = g(xwT) = 1 / (1 + e^(−xwT)),

where g(z) = 1 / (1 + e^(−z)) is the sigmoid function (a.k.a. the logistic function).

Interpretation of the model:

■ fw(x) is interpreted as an estimate of the probability that x belongs to class 1.
■ The decision boundary is defined using a different level-set: {x : fw(x) = 0.5}.
■ Logistic regression is a classification model!
■ The discrimination function fw(x) itself is not linear anymore; but the decision boundary is still linear!
■ Thanks to the sigmoidal transformation, logistic regression is much less influenced by examples far from the decision boundary!

¹ Previously, we have used y(i) ∈ {−1, +1}, but the values can be chosen arbitrarily, and {0, 1} is convenient for logistic regression.
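A minimal sketch of the logistic regression discrimination function and the 0.5-threshold decision rule; the weight vector is assumed to be already fitted:

import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w):
    """f_w(x) = g(x w^T): estimated probability that each example belongs to class 1."""
    return sigmoid(X @ w)

def predict_class(X, w, threshold=0.5):
    """Classify by thresholding the probability at 0.5 (the decision boundary is still linear)."""
    return (predict_proba(X, w) >= threshold).astype(int)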


slide-48
SLIDE 48

Cost function


To train the logistic regression model, one can use the J_MSE criterion:

J(w, T) = (1/|T|) Σ_{i=1}^{|T|} (y(i) − fw(x(i)))².

However, this results in a non-convex multimodal landscape which is hard to optimize.

Logistic regression uses a modified cost function (sometimes called cross-entropy):

J(w, T) = (1/|T|) Σ_{i=1}^{|T|} cost(y(i), fw(x(i))),   where

cost(y, ŷ) = −log(ŷ)        if y = 1,
cost(y, ŷ) = −log(1 − ŷ)    if y = 0,

which can be rewritten in a single expression as

cost(y, ŷ) = −y · log(ŷ) − (1 − y) · log(1 − ŷ).

Such a cost function is simpler to optimize for numerical solvers.

[Figure: the two branches −log(ŷ) and −log(1 − ŷ) of cost(y, ŷ) as functions of ŷ ∈ (0, 1).]
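A minimal sketch of the cross-entropy cost; the small epsilon is an assumption added to guard against log(0), it is not part of the slide formula:

import numpy as np

def cross_entropy_cost(y, y_hat, eps=1e-12):
    """J(w,T): mean over examples of -y*log(y_hat) - (1-y)*log(1-y_hat)."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)   # avoid log(0) for predictions of exactly 0 or 1
    return -np.mean(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))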

slide-49
SLIDE 49

Optimal separating hyperplane


slide-50
SLIDE 50

Optimal separating hyperplane (separable case)


Margin (cz: odstup):

■ “The width of the band in which the decision boundary can move (in the direction of its normal vector) without touching any data point.”

Maximum margin linear classifier:

■ Decision boundary (separating hyperplane): {x : xwT + w0 = 0}
■ Plus 1 level: {x : xwT + w0 = 1}
■ Minus 1 level: {x : xwT + w0 = −1}

Support vectors:

■ Data points x lying at the plus 1 level or the minus 1 level.
■ Only these points influence the decision boundary!

Why would we like to maximize the margin?

■ Intuitively, it is safe.
■ If we make a small error in estimating the boundary, the classification will likely stay correct.
■ The model is invariant with respect to changes of the training set, except for changes of the support vectors.
■ There are sound theoretical results that having a maximum-margin classifier is good.
■ A maximal margin works well in practice.

slide-51
SLIDE 51

Margin size


How to compute the margin M given w = (w1, . . . , wD) and w0 of a certain separating hyperplane?

■ Let’s choose two points x+ and x−, lying at the plus 1 level and the minus 1 level, respectively.
■ Let’s compute the margin M as their distance.

[Figure: the levels xwT + w0 = 1, 0, −1, the normal vector w, the points x+ and x−, and the margin M.]

We know that:

x+wT + w0 = 1,
x−wT + w0 = −1,
x− + λw = x+.

And we can derive:

(x+ − x−)wT = 2
(x− + λw − x−)wT = 2
λwwT = 2
λ = 2 / (wwT) = 2 / ‖w‖².

Thus the margin size is

M = ‖x+ − x−‖ = ‖λw‖ = λ‖w‖ = (2 / ‖w‖²) · ‖w‖ = 2 / ‖w‖.
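A minimal sketch of this margin computation for a given weight vector (the bias w0 does not enter the formula):

import numpy as np

def margin_size(w):
    """Margin M = 2 / ||w|| of the separating hyperplane x w^T + w0 = 0."""
    return 2.0 / np.linalg.norm(w)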


slide-55
SLIDE 55

Optimal separating hyperplane learning


We want to maximize the margin M = 2/‖w‖ subject to the constraints ensuring correct classification of the training set T. This optimization problem can be formulated as a quadratic programming (QP) task.

■ Primal QP task:

minimize (1/2) wwT
with respect to w0, . . . , wD
subject to y(i)(x(i)wT + w0) ≥ 1 for all i ∈ 1, . . . , |T|.

■ Dual QP task:

maximize Σ_{i=1}^{|T|} αi − (1/2) Σ_{i=1}^{|T|} Σ_{j=1}^{|T|} αi αj y(i) y(j) x(i)x(j)T
with respect to α1, . . . , α|T|
subject to αi ≥ 0 and Σ_{i=1}^{|T|} αi y(i) = 0.

■ From the solution of the dual task, we can compute the solution of the primal task:

w = Σ_{i=1}^{|T|} αi y(i) x(i),   w0 = y(k) − x(k)wT,

where (x(k), y(k)) is any support vector, i.e. any example with αk > 0.
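A minimal sketch of recovering (w, w0) from a dual solution; the alpha values are assumed to come from some QP solver, which is not shown here:

import numpy as np

def primal_from_dual(alpha, X, y, tol=1e-8):
    """Recover (w, w0) of the separating hyperplane from the dual variables alpha."""
    # w = sum_i alpha_i * y_i * x_i
    w = (alpha * y) @ X
    # w0 from any support vector (alpha_k > 0): w0 = y_k - x_k w^T
    k = int(np.argmax(alpha > tol))
    w0 = y[k] - X[k] @ w
    return w, w0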

slide-56
SLIDE 56

Non-separable case


Soft margin: Allows for incorrect classification of some data points.

Slack variables ξi: The shortest distances of the data points to their “correct place”:

■ 0 for correctly classified data “outside the margin”,
■ positive for incorrectly classified data and for data “inside the margin”.

[Figure: slack variables ξi, ξj, ξk of points relative to the levels xwT + w0 = 1, 0, −1.]


slide-59
SLIDE 59

Optimal separating hyperplane learning for non-separable data


■ Primal QP task with slack variables:

minimize (1/2) wwT + C Σ_{i=1}^{|T|} ξi
with respect to w0, . . . , wD, ξ1, . . . , ξ|T|
subject to y(i)(x(i)wT + w0) ≥ 1 − ξi for all i ∈ 1, . . . , |T|,
and ξi ≥ 0 for all i ∈ 1, . . . , |T|.

■ Dual QP task:

maximize Σ_{i=1}^{|T|} αi − (1/2) Σ_{i=1}^{|T|} Σ_{j=1}^{|T|} αi αj y(i) y(j) x(i)x(j)T
with respect to α1, . . . , α|T|, µ1, . . . , µ|T|,
subject to αi ≥ 0, µi ≥ 0, αi + µi = C, and Σ_{i=1}^{|T|} αi y(i) = 0.

■ The variables αi are more constrained than in the separable case, but the solution has the same form:

w = Σ_{i=1}^{|T|} αi y(i) x(i),   w0 = y(k) − x(k)wT,

where (x(k), y(k)) is any support vector, i.e. any example with αk > 0.
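A minimal sketch of fitting a soft-margin linear classifier in practice; using scikit-learn's SVC with a linear kernel is an illustrative choice, not something the slides prescribe, and the data below are made up:

import numpy as np
from sklearn.svm import SVC

# Illustrative 2D data (assumed); labels are in {+1, -1}.
X = np.array([[0.1, 0.2], [0.4, 0.5], [0.9, 0.8], [1.0, 1.1]])
y = np.array([-1, -1, +1, +1])

# Linear soft-margin SVM; C penalizes the sum of the slack variables.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

w = clf.coef_[0]                          # weight vector of the separating hyperplane
w0 = clf.intercept_[0]                    # bias term
support_vectors = clf.support_vectors_    # the examples with alpha_i > 0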

slide-60
SLIDE 60

Lagrange function


Primal QP task:

(w∗, w0∗, ξ∗) = argmin_{w, w0, ξ} [ (1/2) wwT + C Σ_{i=1}^{|T|} ξi ]

with constraints, for all i = 1, . . . , |T|:

y(i)(x(i)wT + w0) − 1 + ξi ≥ 0,
ξi ≥ 0.

Method of Lagrange multipliers:

■ replaces the search for stationary points of a function of D variables with K constraints by the search for stationary points of an unconstrained function of D + K variables;
■ creates a new variable (a Lagrange multiplier) for each constraint and defines a new function, the Lagrangian, formed from the original function, the constraints and the multipliers:

L(w, w0, ξi, αi, µi) = (1/2)‖w‖² + C Σ_{i=1}^{|T|} ξi − Σ_{i=1}^{|T|} αi { y(i)(x(i)wT + w0) − 1 + ξi } − Σ_{i=1}^{|T|} µi ξi,

where

■ αi ≥ 0 are the Lagrange multipliers for the constraints ensuring the correct classification of the points, and
■ µi ≥ 0 are the Lagrange multipliers for the constraints on the positivity of ξi.

The Lagrangian must be minimized w.r.t. the primal variables w, w0 and ξi, and maximized w.r.t. the dual variables αi and µi.

slide-61
SLIDE 61

Dual QP Task


The dual QP task is obtained when we take the Lagrangian

L(w, w0, ξi, αi, µi) = (1/2)‖w‖² + C Σ_{i=1}^{|T|} ξi − Σ_{i=1}^{|T|} αi { y(i)(x(i)wT + w0) − 1 + ξi } − Σ_{i=1}^{|T|} µi ξi

and substitute for the primal variables w, w0 and ξi. For a stationary point:

∂L/∂w = w − Σ_{i=1}^{|T|} αi y(i) x(i) = 0   ⇒   w = Σ_{i=1}^{|T|} αi y(i) x(i),
∂L/∂w0 = − Σ_{i=1}^{|T|} αi y(i) = 0   ⇒   Σ_{i=1}^{|T|} αi y(i) = 0,
∂L/∂ξi = C − αi − µi = 0   ⇒   C = αi + µi.

After substituting back into L and simplifying, we get the criterion of the dual task:

LD = (1/2) Σ_{i=1}^{|T|} Σ_{j=1}^{|T|} αi αj y(i) y(j) x(i)x(j)T + Σ_{i=1}^{|T|} αi ξi + Σ_{i=1}^{|T|} µi ξi
     − Σ_{i=1}^{|T|} Σ_{j=1}^{|T|} αi αj y(i) y(j) x(i)x(j)T − Σ_{i=1}^{|T|} αi y(i) w0 + Σ_{i=1}^{|T|} αi − Σ_{i=1}^{|T|} αi ξi − Σ_{i=1}^{|T|} µi ξi
   = Σ_{i=1}^{|T|} αi − (1/2) Σ_{i=1}^{|T|} Σ_{j=1}^{|T|} αi αj y(i) y(j) x(i)x(j)T.

slide-62
SLIDE 62

Relations of the variables in Lagrangian


The Lagrangian

L(w, w0, ξi, αi, µi) = (1/2)‖w‖² + C Σ_{i=1}^{|T|} ξi − Σ_{i=1}^{|T|} αi { y(i)(x(i)wT + w0) − 1 + ξi } − Σ_{i=1}^{|T|} µi ξi

shall be minimized w.r.t. the primal variables w, w0 and ξi, and maximized w.r.t. the dual variables αi and µi.

1. If a point x(i) lies on the incorrect side of the plus- or minus-plane:
■ y(i)(x(i)wT + w0) − 1 < 0, then ξi > 0 so that y(i)(x(i)wT + w0) − 1 + ξi = 0,
■ ξi > 0 and L must be maximized w.r.t. µi, so µi must be as small as possible, i.e. µi = 0,
■ C = αi + µi and µi = 0, so αi = C.

2. If a point x(i) lies on the correct side of the plus- or minus-plane:
■ y(i)(x(i)wT + w0) − 1 > 0, so ξi = 0,
■ y(i)(x(i)wT + w0) − 1 + ξi > 0 and L must be maximized w.r.t. αi, so αi must be as small as possible, i.e. αi = 0,
■ C = αi + µi and αi = 0, so µi = C.

3. If a point x(i) lies directly on the plus- or minus-plane:
■ y(i)(x(i)wT + w0) − 1 = 0, so ξi = 0,
■ 0 < µi < C,
■ 0 < αi < C.

slide-63
SLIDE 63

Optimal separating hyperplane: remarks


The importance of the dual formulation:

■ The QP task in the dual formulation is easier to solve for QP solvers than the primal formulation.
■ New, unseen examples can be classified using the function

f(x, w, w0) = sign(xwT + w0) = sign( Σ_{i=1}^{|T|} αi y(i) x(i)xT + w0 ),

i.e. the discrimination function contains the examples x only in the form of dot products (which will be useful later).
■ The examples with αi > 0 are the support vectors, thus the sums may be carried out only over the support vectors.
■ The dual formulation contains the data only in the form of dot products, which allows for other tricks you will learn later.
■ The primal task with soft margin has double the number of constraints, so the task is more complex, but
■ the results of the QP task with soft margin are of the same type as in the separable case.

slide-64
SLIDE 64

Optimal separating hyperplane: demo


[Figure: demo of the optimal separating hyperplane on 2D data.]

slide-65
SLIDE 65

Summary


slide-66
SLIDE 66

Competencies


After this lecture, a student shall be able to . . .

■ define and recognize the linear regression model (with scalar parameters, in scalar-product form, in matrix form, in non-homogeneous and homogeneous coordinates);
■ define the loss function suitable for fitting a regression model;
■ explain the least squares method, draw an illustration;
■ compute the coefficients of simple (1D) linear regression by hand, write a computer program computing the coefficients for multiple regression;
■ explain the concept of a discrimination function for binary and multinomial classification;
■ define a loss function suitable for fitting a classification model;
■ describe the perceptron algorithm, perform a few iterations by hand;
■ explain the characteristics of the perceptron algorithm;
■ describe logistic regression, the interpretation of its outputs, and why we classify it as a linear model;
■ define loss functions suitable for fitting logistic regression;
■ define the optimal separating hyperplane, explain in what sense it is optimal;
■ define what a margin is, what support vectors are, and explain their relation;
■ compute the margin given the parameters of a separating hyperplane for which min_{i: y(i)=+1}(x(i)wT + w0) = 1 and max_{i: y(i)=−1}(x(i)wT + w0) = −1;
■ formulate the primal quadratic programming task which results in the optimal separating hyperplane (including the soft-margin version);
■ compute the parameters of the optimal hyperplane given the set of support vectors and their weights.