
Linear regression
Petr Pošík
Czech Technical University in Prague, Faculty of Electrical Engineering, Department of Cybernetics
© 2015



1. Linear regression

Regression is a supervised learning task, i.e.
■ a training (multi)set T = {(x^(1), y^(1)), ..., (x^(|T|), y^(|T|))} is available, where
■ the labels y^(i) are quantitative, often continuous (as opposed to classification tasks, where the y^(i) are nominal).
■ Its purpose is to model the relationship between the independent variables (inputs) x = (x_1, ..., x_D) and the dependent variable (output) y.

Linear regression is a particular regression model which assumes (and learns) a linear relationship between the inputs and the output:

    ŷ = h(x) = w_0 + w_1 x_1 + ... + w_D x_D = w_0 + ⟨w, x⟩ = w_0 + x wᵀ,

where
■ ŷ is the model prediction (an estimate of the true value y), and h(x) is the linear model (a hypothesis),
■ w_0, ..., w_D are the coefficients of the linear function (w_0 is the bias), organized in a row vector w,
■ ⟨w, x⟩ is the dot (scalar) product of the vectors w and x,
■ which can also be computed as the matrix product x wᵀ if w and x are row vectors.
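As a minimal sketch of the prediction formula above (the function name and the data are illustrative, not from the slides), in NumPy:

```python
import numpy as np

def predict(x, w0, w):
    """Evaluate the linear model h(x) = w0 + <w, x> for one example x."""
    return w0 + np.dot(w, x)

# Illustrative coefficients for D = 2 features.
w0, w = 3.0, np.array([2.0, -1.0])
x = np.array([1.0, 4.0])
print(predict(x, w0, w))  # 3 + 2*1 - 1*4 = 1.0
```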

2. Notation remarks

Homogeneous coordinates: if we add "1" as the first element of x, so that x = (1, x_1, ..., x_D), then we can write the linear model in an even simpler form (without the explicit bias term):

    ŷ = h(x) = w_0·1 + w_1 x_1 + ... + w_D x_D = ⟨w, x⟩ = x wᵀ.

Matrix notation: if we organize the data into a matrix X and a vector y, such that

    X = [ 1   x^(1)   ]        y = [ y^(1)   ]
        [ ⋮     ⋮     ]   and      [   ⋮     ]
        [ 1   x^(|T|) ]            [ y^(|T|) ]

and similarly with ŷ, then we can write a batch computation of predictions for all data in X as

    ŷ = X wᵀ.
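A sketch of batch prediction in homogeneous coordinates (variable names are my own; the slides define only the math): prepend a column of ones to the data matrix, and one matrix product then predicts for all examples at once.

```python
import numpy as np

X_raw = np.array([[1.0], [2.0], [3.0]])   # three 1-feature examples
w = np.array([3.0, 2.0])                  # row vector w = (w0, w1)

# Homogeneous coordinates: add "1" as the first element of every x.
X = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])

y_hat = X @ w                             # batch prediction  ŷ = X wᵀ
print(y_hat)                              # [5. 7. 9.]
```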

3. Two operation modes

Any ML model has 2 operation modes:
1. learning (training, fitting), and
2. application (testing, making predictions).

The model h can be viewed as a function of 2 variables: h(x, w).

Model application: if the model is given (w is fixed), we can manipulate x to make predictions:

    ŷ = h(x, w) = h_w(x).

Model learning: if the data are given (T is fixed), we can manipulate the model parameters w to fit the model to the data:

    w* = argmin_w J(w, T).

How to train the model?
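The two modes correspond to the fit/predict pattern of common ML libraries; a hedged sketch with scikit-learn (assuming it is available; the data are made up):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Training (multi)set T -- fixed during learning.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.1, 2.9, 5.2, 6.8])

model = LinearRegression()
model.fit(X, y)                          # learning: adjust w, T is fixed
print(model.intercept_, model.coef_)     # the fitted w0 and w1

print(model.predict(np.array([[4.0]])))  # application: w is fixed, x varies
```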

4. Simple (univariate) linear regression

Simple (univariate) regression deals with cases where x^(i) is a single scalar, i.e. the examples are described by a single feature (they are 1-dimensional).

Fitting a line to data:
■ find parameters w_0, w_1 of a linear model ŷ = w_0 + w_1 x,
■ given a training (multi)set T = {(x^(i), y^(i))}, i = 1, ..., |T|.

How to fit, depending on the number of training examples:
■ Given a single example (1 equation, 2 parameters) ⇒ infinitely many linear functions can be fitted.
■ Given 2 examples (2 equations, 2 parameters) ⇒ exactly 1 linear function can be fitted.
■ Given 3 or more examples (> 2 equations, 2 parameters) ⇒ in general, no line can be fitted without error ⇒ we fit the line which minimizes the "size" of the errors y − ŷ (see the sketch after this list):

    w* = (w_0*, w_1*) = argmin_{w_0, w_1} J(w_0, w_1, T).
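For the univariate case this minimization has a closed form; a sketch (my own helper, not slide code) using the textbook formulas w_1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and w_0 = ȳ − w_1·x̄:

```python
import numpy as np

def fit_line(x, y):
    """Least-squares fit of y ≈ w0 + w1*x; needs at least 2 distinct x values."""
    x_mean, y_mean = x.mean(), y.mean()
    w1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    w0 = y_mean - w1 * x_mean
    return w0, w1

# With 2 examples the line is exact; with 3+ it is the best line in the LSM sense.
x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 5.0])
print(fit_line(x, y))  # (1.0, 2.0) -- these points happen to lie on a line
```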

5. The least squares method

The least squares method (LSM) chooses the parameters w which minimize the mean squared error

    J(w) = (1/|T|) Σ_{i=1}^{|T|} (y^(i) − ŷ^(i))² = (1/|T|) Σ_{i=1}^{|T|} (y^(i) − h_w(x^(i)))².

[Figure: a line ŷ = w_0 + w_1 x fitted to three points (x^(i), y^(i)); the vertical distances |y^(i) − ŷ^(i)| are the errors whose squares are averaged; w_0 is the intercept and w_1 the slope.]
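A sketch that evaluates J(w) and finds its minimizer (the solver below is the standard least-squares routine, not something this slide prescribes; names and data are illustrative):

```python
import numpy as np

def J(w, X, y):
    """Mean squared error of the linear model with parameters w on data (X, y)."""
    return np.mean((y - X @ w) ** 2)

# Data in homogeneous coordinates (first column of ones).
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 2.5, 5.0])

# Least-squares minimizer of J (equivalent to solving the normal equations,
# but numerically safer than forming the inverse of X^T X explicitly).
w_star, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_star, J(w_star, X, y))  # w* ≈ (0.833, 2.0)
```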
