Introduction to Machine Learning (NPFL129, Lecture 1), Milan Straka, October 07, 2019


  1. NPFL129, Lecture 1: Introduction to Machine Learning. Milan Straka, October 07, 2019. Charles University in Prague, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics.

  2. Machine Learning. A possible definition of learning from Mitchell (1997): A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
  Task T: classification (assigning one of k categories to a given input x ∈ R^d), regression (producing a number for a given input), structured prediction, denoising, density estimation, …
  Experience E: supervised (usually a dataset with desired outcomes, called labels or targets), unsupervised (usually data without any annotation: raw text, raw images, …), reinforcement learning, semi-supervised learning, …
  Measure P: accuracy, error rate, F-score, …

  3. Deep Learning Highlights: image recognition, object detection, image segmentation, human pose estimation, image labeling, visual question answering, speech recognition and generation, lip reading, machine translation, machine translation without parallel data, chess, Go and Shogi, multiplayer Capture the Flag.

  4. Introduction to Machine Learning History. https://www.slideshare.net/deview/251-implementing-deep-learning-using-cu-dnn/4

  5. Machine and Representation Learning. Figure 1.5, page 10 of Deep Learning Book, http://deeplearningbook.org.

  6. Basic Machine Learning Settings. Assume we have an input x ∈ R^d. Then the two basic ML tasks are:
  1. regression: the goal is to predict a real-valued target variable t ∈ R for the given input;
  2. classification: assuming a fixed set of K labels, the goal is to choose a corresponding label/class for a given input. We can predict the class only, or we can predict the whole distribution of all class probabilities.
  We usually have a training set, which is assumed to consist of examples (x, t) generated independently from a data generating distribution. The goal of optimization is to match the training set as well as possible. However, the goal of machine learning is to perform well on previously unseen data, i.e., to achieve the lowest generalization error or test error. We typically estimate it using a test set of examples independent of the training set, but generated by the same data generating distribution.
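A minimal sketch of the training/test distinction, using made-up synthetic data and a deliberately trivial constant predictor (all names and values below are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data from an assumed data generating distribution: t = 2x + Gaussian noise.
x = rng.uniform(0, 1, size=100)
t = 2 * x + rng.normal(scale=0.1, size=100)

# Training set and an independent test set from the same distribution.
x_train, t_train = x[:80], t[:80]
x_test, t_test = x[80:], t[80:]

# Trivial "model": always predict the mean target seen during training.
prediction = t_train.mean()

train_error = np.mean((prediction - t_train) ** 2)
test_error = np.mean((prediction - t_test) ** 2)  # estimate of the generalization error
print(train_error, test_error)
```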

  7. Notation.
  a, a, A, A: scalar (integer or real), vector, matrix, tensor
  a, a, A: scalar, vector, matrix random variable
  df/dx: derivative of f with respect to x
  ∂f/∂x: partial derivative of f with respect to x
  ∇_x f(x): gradient of f with respect to x, i.e., (∂f(x)/∂x_1, ∂f(x)/∂x_2, …, ∂f(x)/∂x_n)
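To make the gradient notation concrete, here is a small self-contained check (my own example, not from the slides): for f(x) = x^T x the gradient is 2x, which a central finite-difference approximation should reproduce.

```python
import numpy as np

def f(x):
    return x @ x  # f(x) = x^T x, whose gradient is 2x

def numerical_gradient(f, x, eps=1e-6):
    grad = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        grad[i] = (f(x + e) - f(x - e)) / (2 * eps)  # central difference for ∂f/∂x_i
    return grad

x = np.array([1.0, -2.0, 3.0])
print(numerical_gradient(f, x))  # close to 2 * x = [2, -4, 6]
```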

  8. Example Dataset. Assume we have the following data, generated from an underlying curve by adding a small amount of Gaussian noise. Figure 1.2 of Pattern Recognition and Machine Learning.
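A sketch of how such a dataset could be generated; the underlying curve sin(2πx), the number of points, and the noise scale are assumptions in the spirit of the PRML figure, not values stated here.

```python
import numpy as np

rng = np.random.default_rng(1)

N = 10
x = np.linspace(0, 1, N)
# Assumed underlying curve plus a small amount of Gaussian noise.
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=N)
```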

  9. Linear Regression. Given an input value x ∈ R^D, one of the simplest models to predict a target real value is linear regression:
  f(x; w, b) = x_1 w_1 + x_2 w_2 + … + x_D w_D + b = ∑_{i=1}^D x_i w_i + b = x^T w + b.
  The w are usually called weights and b is called bias. Sometimes it is convenient not to deal with the bias separately. Instead, we might enlarge the input vector x by padding it with a value 1 and consider only x^T w, where the role of the bias is accomplished by the last weight. Therefore, when we say "weights", we usually mean both weights and biases.
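A minimal NumPy sketch of this prediction; the concrete values of x, w, and b are placeholders.

```python
import numpy as np

x = np.array([1.5, -0.7, 2.0])   # input x ∈ R^D
w = np.array([0.3, 0.8, -0.5])   # weights
b = 0.1                          # bias

# f(x; w, b) = x_1 w_1 + ... + x_D w_D + b = x^T w + b
prediction = x @ w + b
```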

  10. Separate Bias vs. Padding with Ones.
  Using an explicit bias term in the form of f(x) = x^T w + b:
  ⎡ x_11  x_12 ⎤   ⎡ w_1 ⎤       ⎡ x_11 w_1 + x_12 w_2 + b ⎤
  ⎢ x_21  x_22 ⎥ · ⎣ w_2 ⎦ + b = ⎢ x_21 w_1 + x_22 w_2 + b ⎥
  ⎢  ⋮     ⋮   ⎥                 ⎢            ⋮            ⎥
  ⎣ x_n1  x_n2 ⎦                 ⎣ x_n1 w_1 + x_n2 w_2 + b ⎦
  With extra padding in X and an additional weight representing the bias:
  ⎡ x_11  x_12  1 ⎤   ⎡ w_1 ⎤   ⎡ x_11 w_1 + x_12 w_2 + b ⎤
  ⎢ x_21  x_22  1 ⎥ · ⎢ w_2 ⎥ = ⎢ x_21 w_1 + x_22 w_2 + b ⎥
  ⎢  ⋮     ⋮    ⋮ ⎥   ⎣  b  ⎦   ⎢            ⋮            ⎥
  ⎣ x_n1  x_n2  1 ⎦             ⎣ x_n1 w_1 + x_n2 w_2 + b ⎦
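A quick sketch (with made-up values) verifying that padding X with a column of ones and appending the bias to the weight vector yields the same predictions as the explicit bias term:

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
w = np.array([0.5, -1.0])
b = 0.25

# Explicit bias term: Xw + b.
y_explicit = X @ w + b

# Pad X with ones and fold the bias into the weight vector.
X_padded = np.concatenate([X, np.ones((len(X), 1))], axis=1)
w_padded = np.concatenate([w, [b]])
y_padded = X_padded @ w_padded

assert np.allclose(y_explicit, y_padded)
```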

  11. Linear Regression. Assume we have a dataset of N input values x_1, …, x_N and targets t_1, …, t_N. To find the values of the weights, we usually minimize an error function between the real target values and their predictions. A popular and simple error function is the mean squared error:
  MSE(w) = (1/N) ∑_{i=1}^N (f(x_i; w) − t_i)^2.
  Often, the sum of squares
  (1/2) ∑_{i=1}^N (f(x_i; w) − t_i)^2
  is used instead, because the math comes out nicer. Figure 1.3 of Pattern Recognition and Machine Learning.
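Both error functions in NumPy, as a sketch with placeholder predictions and targets; they differ only by the constant factor N/2, so they share the same minimizer.

```python
import numpy as np

predictions = np.array([1.1, 1.9, 3.2])
targets = np.array([1.0, 2.0, 3.0])

mse = np.mean((predictions - targets) ** 2)                  # MSE(w)
sum_of_squares = 0.5 * np.sum((predictions - targets) ** 2)  # (1/2) Σ (f(x_i; w) − t_i)^2
```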

  12. Linear Regression. There are several ways to minimize the error function, but in the case of linear regression and the sum of squares error, there exists an explicit solution. Our goal is to minimize the following quantity:
  (1/2) ∑_{i=1}^N (x_i^T w − t_i)^2.
  Note that if we denote by X ∈ R^{N×D} the matrix of input values with x_i on a row and by t ∈ R^N the vector of target values, we can rewrite the minimized quantity as
  (1/2) ||Xw − t||^2.
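A small check (placeholder values) that the matrix form equals the per-example sum:

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
w = np.array([0.5, -1.0])
t = np.array([1.0, 2.0, 3.0])

per_example = 0.5 * np.sum((X @ w - t) ** 2)
matrix_form = 0.5 * np.linalg.norm(X @ w - t) ** 2
assert np.isclose(per_example, matrix_form)
```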

  13. Linear Regression. In order to find a minimum of (1/2) ∑_{i=1}^N (x_i^T w − t_i)^2, we can inspect the values where the derivative of the error function with respect to each weight w_j is zero:
  ∂/∂w_j (1/2) ∑_{i=1}^N (x_i^T w − t_i)^2 = (1/2) ∑_{i=1}^N 2 (x_i^T w − t_i) x_ij = ∑_{i=1}^N x_ij (x_i^T w − t_i).
  Therefore, we want for all j that ∑_{i=1}^N x_ij (x_i^T w − t_i) = 0. We can write all these equations together using matrix notation as X^T (Xw − t) = 0 and rewrite it to
  X^T X w = X^T t.
  The matrix X^T X is of size D×D. If it is regular, we can compute its inverse, and therefore
  w = (X^T X)^{-1} X^T t.
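A sketch of the closed-form solution on synthetic data; solving the normal equations with np.linalg.solve instead of forming the inverse explicitly is my choice here (numerically preferable, equivalent when X^T X is regular).

```python
import numpy as np

def fit_linear_regression(X, t):
    # Solve X^T X w = X^T t; assumes X^T X is regular.
    return np.linalg.solve(X.T @ X, X.T @ t)

# Placeholder synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
t = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

w = fit_linear_regression(X, t)
print(X.T @ (X @ w - t))  # the gradient X^T(Xw − t) is numerically zero at the minimum
```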

  14. Linear Regression.
  Input: dataset (X ∈ R^{N×D}, t ∈ R^N).
  Output: weights w ∈ R^D minimizing the MSE of linear regression.
  w ← (X^T X)^{-1} X^T t.
  The algorithm has complexity O(N D^2), assuming N ≥ D. When the matrix X^T X is singular, we can solve X^T X w = X^T t using SVD, which will be demonstrated in the next lecture.
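The SVD-based treatment of the singular case is deferred to the next lecture; as a placeholder, np.linalg.lstsq (which internally uses an SVD-based least-squares solver) handles both the regular and the singular case.

```python
import numpy as np

def fit_linear_regression_lstsq(X, t):
    # Least-squares solution of Xw ≈ t via SVD; works even when X^T X is singular.
    w, residuals, rank, singular_values = np.linalg.lstsq(X, t, rcond=None)
    return w
```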

  15. Linear Regression Example. Assume our input vectors comprise x = (x^0, x^1, …, x^M), for M ≥ 0. Figure 1.4 of Pattern Recognition and Machine Learning.
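A sketch of this polynomial-features example for one value of M; the data generation reuses the sin(2πx) assumption from the earlier sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=10)  # assumed underlying curve

M = 3
# Design matrix with rows (x^0, x^1, ..., x^M); the x^0 column plays the role of the bias.
X = np.stack([x ** m for m in range(M + 1)], axis=1)

w, *_ = np.linalg.lstsq(X, t, rcond=None)
predictions = X @ w
```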

  16. Linear Regression Example. To plot the error, the root mean squared error RMSE = √MSE is frequently used. The displayed error nicely illustrates two main challenges in machine learning: underfitting and overfitting. Figure 1.5 of Pattern Recognition and Machine Learning.
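RMSE in code, as a one-line helper (placeholder naming):

```python
import numpy as np

def rmse(predictions, targets):
    return np.sqrt(np.mean((predictions - targets) ** 2))  # RMSE = sqrt(MSE)
```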

  17. Model Capacity. We can control whether a model underfits or overfits by modifying its capacity: representational capacity, effective capacity. Figure 5.3, page 115 of Deep Learning Book, http://deeplearningbook.org.

  18. Linear Regression Overfitting. Note that employing more data also usually alleviates overfitting (the relative capacity of the model is decreased). Figure 1.6 of Pattern Recognition and Machine Learning.
