CSI5180. Machine Learning for Bioinformatics Applications
Regularized Linear Models
by
Marcel Turcotte
Version November 6, 2019
Preamble
In this lecture, we introduce the concept of regularization. We consider the specific context of linear models: Ridge Regression, Lasso Regression, and Elastic Net. Finally, we discuss a simple technique called early stopping.

General objective:
Explain the concept of regularization in the context of linear regression and logistic regression.
Reading:
Simon Dirmeier, Christiane Fuchs, Nikola S Mueller, and Fabian J Theis, netReg: network-regularized linear models for biological association studies, Bioinformatics 34 (2018), no. 5, 896–898.
Introduction
The data set is a collection of labelled examples:

{(x_i, y_i)}_{i=1}^{N}

Each x_i is a feature vector with D dimensions; x_k^(j) is the value of the feature j of the example k, for j ∈ 1 . . . D and k ∈ 1 . . . N.
The label y_i is either a class, taken from a finite list of classes, {1, 2, . . . , C}, or a real number (or a more complex structure, etc.).
Problem: given the data set as input, create a “model” that can be used to predict the value of y for an unseen x.
Classification: y_i ∈ {Positive, Negative}, a binary classification problem.
Regression: y_i is a real number.
A linear model assumes that the value of the label, ŷ_i, can be expressed as a linear combination of the feature values, x_i^(j):

ŷ_i = h(x_i) = θ_0 + θ_1 x_i^(1) + θ_2 x_i^(2) + . . . + θ_D x_i^(D)
Here, θ_j is the jth parameter of the (linear) model, with θ_0 being the bias term/parameter, and θ_1 . . . θ_D being the feature weights.
Problem: find values for all the model parameters so that the model best fits the training data.
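As a quick illustration (not from the slides; all values are hypothetical), computing a prediction is a single dot product once a constant 1 is prepended to the feature vector so that θ_0 is handled uniformly:

import numpy as np

def h(theta, x):
    # x has a leading 1 so that theta[0] (the bias) is handled uniformly
    return np.dot(theta, x)

theta = np.array([4.0, 3.0])  # hypothetical parameters: theta_0 = 4, theta_1 = 3
x = np.array([1.0, 2.5])      # 1 followed by a single feature value
print(h(theta, x))            # 4 + 3 * 2.5 = 11.5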
The Root Mean Square Error (RMSE) is a common performance measure for regression problems:

RMSE = sqrt( (1/N) Σ_{i=1}^{N} [h(x_i) − y_i]^2 )
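As a sketch, this measure is a few lines of NumPy (variable names are illustrative):

import numpy as np

def rmse(y_pred, y_true):
    # square the residuals, average them, take the square root
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

print(rmse(np.array([2.0, 3.0]), np.array([2.5, 2.5])))  # 0.5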
Polynomial Regression
What if the data is more complex?
In our discussion on underfitting and overfitting the training data, we did look at polynomial models, but did not discuss how to learn them.
Can we use our linear model to “fit” non-linear data, and specifically data that would have been generated by a polynomial “process”?
How?
A surprisingly simple solution consists of generating new features that are powers of existing ones!

from sklearn.preprocessing import PolynomialFeatures
poly_features = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly_features.fit_transform(X)

from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(X_poly, y)
print(lin_reg.intercept_, lin_reg.coef_)
import numpy as np
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(X, y)
lin_reg.intercept_, lin_reg.coef_
# [4.07916603] [[2.90173949]]

y = 4 + 3x + noise
ŷ = 4.07916603 + 2.90173949x
import numpy as np
X = 6 * np.random.rand(100, 1) - 3
y = 2 + 0.5 * X**2 + X + np.random.randn(100, 1)

from sklearn.preprocessing import PolynomialFeatures
poly_features = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly_features.fit_transform(X)

lin_reg = LinearRegression()
lin_reg.fit(X_poly, y)
lin_reg.intercept_, lin_reg.coef_
# [1.701144] [[1.02118676 0.55725864]]

y = 2.0 + 0.5x^2 + 1.0x + noise
ŷ = 1.701144 + 0.55725864x^2 + 1.02118676x
For higher degrees, PolynomialFeatures adds all combinations of features.
Given two features a and b (and degree 3), PolynomialFeatures generates a^2, a^3, b^2, b^3, but also ab, a^2b, and ab^2.
Given n features and degree d, PolynomialFeatures produces (n + d)! / (d! n!) combinations!
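This is easy to verify by inspecting the generated feature names; a sketch assuming two features and scikit-learn ≥ 1.0 (older versions use get_feature_names instead of get_feature_names_out):

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0, 3.0]])  # one example with two features, a and b
poly = PolynomialFeatures(degree=3, include_bias=False)
poly.fit_transform(X)
print(poly.get_feature_names_out(["a", "b"]))
# ['a' 'b' 'a^2' 'a b' 'b^2' 'a^3' 'a^2 b' 'a b^2' 'b^3']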
Regularization
From [2] §4:
“(. . . ) a model’s generalization error can be expressed as the sum of three very different errors:”
Bias: “is due to wrong assumptions”; “A high-bias model is most likely to underfit the training data.”
Variance: “the model’s excessive sensitivity to small variations in the training data”. A model with many parameters “is likely to have high variance and thus overfit the training data.”
Irreducible error: “noisiness of the data itself”.
“Increasing a model’s complexity will typically increase its variance and reduce its bias. Conversely, reducing a model’s complexity increases its bias and reduces its variance.”
[Three figure slides omitted; source: Géron 2019]
“Constraining a model to make it simpler and reduce the risk of overfitting is called regularization.” [2]
One way to regularize a polynomial model is to restrict its degree.
How would you do that?
Make the degree a hyperparameter; use a holdout set or cross-validation.
Alternatively, we can constrain the weights of the model.
A norm is a function that assigns a number (length, size) to a vector.

ℓp-norm: ||θ||_p = ( Σ_{j=1}^{D} |θ^(j)|^p )^(1/p)
ℓ1-norm: ||θ||_1 = Σ_{j=1}^{D} |θ^(j)|
ℓ2-norm: ||θ||_2 = sqrt( Σ_{j=1}^{D} |θ^(j)|^2 )
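These definitions map directly onto np.linalg.norm; a small sketch with an arbitrary weight vector:

import numpy as np

theta = np.array([1.0, -2.0, 3.0])

print(np.linalg.norm(theta, ord=1))  # ell_1: |1| + |-2| + |3| = 6.0
print(np.linalg.norm(theta, ord=2))  # ell_2: sqrt(1 + 4 + 9) ~ 3.74
p = 4
print(np.sum(np.abs(theta) ** p) ** (1 / p))  # general ell_p-norm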
You will remember the objective function, Mean Squared Error (MSE), used by our gradient descent:

(1/N) Σ_{i=1}^{N} [h(x_i) − y_i]^2

In the case of Ridge Regression, the objective function becomes:

(1/N) Σ_{i=1}^{N} [h(x_i) − y_i]^2 + (α/2) Σ_{j=1}^{D} (θ^(j))^2

The regularization is applied at training time only.
α is a hyperparameter; with α = 0, Ridge Regression is equivalent to Linear Regression.
The penalty term (α/2) Σ_{j=1}^{D} (θ^(j))^2 is α/2 times the squared ℓ2-norm of the weight vector.
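A direct NumPy transcription of this objective, as a sketch (X is assumed to carry a leading column of 1s matching θ_0, which is excluded from the penalty):

import numpy as np

def ridge_objective(theta, X, y, alpha):
    # X is assumed to have a leading column of 1s matching theta[0] (the bias)
    residuals = X @ theta - y
    mse = np.mean(residuals ** 2)
    penalty = (alpha / 2) * np.sum(theta[1:] ** 2)  # theta[0] excluded
    return mse + penalty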
from sklearn.linear_model import Ridge
ridge_reg = Ridge(alpha=1, solver="cholesky")
ridge_reg.fit(X, y)
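Alternatively, as noted in [2], the same kind of penalty can be requested from stochastic gradient descent; a sketch reusing the X and y from the examples above:

from sklearn.linear_model import SGDRegressor

sgd_reg = SGDRegressor(penalty="l2")  # "l2" adds a Ridge-style penalty
sgd_reg.fit(X, y.ravel())             # SGDRegressor expects a 1-D target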
[Figure omitted; source: [2] Figure 4.17]
Regularization 25/42
Another popular regularization is the Least Absolute Shrinkage and Selection Operator Regression, Lasso Regression.
Regularization 25/42
Another popular regularization is the Least Absolute Shrinkage and Selection Operator Regression, Lasso Regression. Its objective function is: 1 N
N
[h(xi) − yi]2 + α
D
θ(j)
Regularization 25/42
Another popular regularization is the Least Absolute Shrinkage and Selection Operator Regression, Lasso Regression. Its objective function is: 1 N
N
[h(xi) − yi]2 + α
D
θ(j) The regularization is applying at learning time only.
Regularization 25/42
Another popular regularization is the Least Absolute Shrinkage and Selection Operator Regression, Lasso Regression. Its objective function is: 1 N
N
[h(xi) − yi]2 + α
D
θ(j) The regularization is applying at learning time only. α is a hyperparameter, with α = 0, Lasso Regression is equivalent to a Linear Regression.
Regularization 25/42
Another popular regularization is the Least Absolute Shrinkage and Selection Operator Regression, Lasso Regression. Its objective function is: 1 N
N
[h(xi) − yi]2 + α
D
θ(j) The regularization is applying at learning time only. α is a hyperparameter, with α = 0, Lasso Regression is equivalent to a Linear Regression. α D
1 θ(j) is the ℓ1-norm of the weight vector.
Regularization 25/42
Another popular regularization is the Least Absolute Shrinkage and Selection Operator Regression, Lasso Regression. Its objective function is: 1 N
N
[h(xi) − yi]2 + α
D
θ(j) The regularization is applying at learning time only. α is a hyperparameter, with α = 0, Lasso Regression is equivalent to a Linear Regression. α D
1 θ(j) is the ℓ1-norm of the weight vector.
Lasso regression favors sparse models (models with few terms with non-zero weights)
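The deck shows scikit-learn code for Ridge and Elastic Net; for symmetry, a sketch for Lasso (the value of alpha is illustrative):

from sklearn.linear_model import Lasso

lasso_reg = Lasso(alpha=0.1)
lasso_reg.fit(X, y)
# With a large enough alpha, many entries of lasso_reg.coef_ become
# exactly 0, the sparsity-inducing effect of the ell_1 penalty.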
[Figure omitted; source: [2] Figure 4.18]
“Your role as the data analyst is to find such a value of the hyperparameter [α] that doesn’t increase the bias too much but reduces the variance to a level reasonable for the problem at hand.” [3]
In practice, the ℓ1-norm (Lasso) produces models that are sparse, thus acting as a feature selection mechanism. However, the ℓ2-norm (Ridge) usually gives better results in practice. These norms are frequently used with other models/objective functions.
Elastic Net is a mixture of Ridge Regression and Lasso Regression:

(1/N) Σ_{i=1}^{N} [h(x_i) − y_i]^2 + r α Σ_{j=1}^{D} |θ^(j)| + ((1 − r)/2) α Σ_{j=1}^{D} (θ^(j))^2

It adds a second hyperparameter, r, to control the ratio of ℓ1 to ℓ2 regularization.
In all three cases, the summation starts at j = 1, i.e. the bias term (here, the intercept) is excluded from the regularization.
from sklearn.linear_model import ElasticNet
elastic_net = ElasticNet(alpha=0.1, l1_ratio=0.5)
elastic_net.fit(X, y)

Source: [2] §4
Early stopping: another way to regularize an iterative learning algorithm is to stop training as soon as the validation error reaches its minimum. Geoffrey Hinton called this the “beautiful free lunch”.

[Figure omitted; source: [2] Figure 4.20]
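A sketch of early stopping with warm-started stochastic gradient descent, adapted from [2]; X_train, y_train, X_val, y_val and any scaling are assumed to exist:

from copy import deepcopy
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error

# warm_start=True: each call to fit() resumes from the current weights
sgd_reg = SGDRegressor(max_iter=1, tol=None, warm_start=True, penalty=None,
                       learning_rate="constant", eta0=0.0005)

best_val_error = float("inf")
best_model = None
for epoch in range(1000):
    sgd_reg.fit(X_train, y_train.ravel())  # one more epoch
    val_error = mean_squared_error(y_val, sgd_reg.predict(X_val))
    if val_error < best_val_error:         # keep the best model seen so far
        best_val_error = val_error
        best_model = deepcopy(sgd_reg)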
The criteria used to drive the optimization (training) can be different from the criteria used for the hyperparameter selection procedure.
Regularized models are known to be sensitive to the scale of the features, thus the data should be “normalized”.
“(. . . ) the fewer degrees of freedom it has, the harder it will be for it to overfit the data.”
Logistic Regression
Despite its name, Logistic Regression is a classification algorithm.
The labels are binary values, y_i ∈ {0, 1}.
It is formulated to answer the question: “what is the probability that x_i is a positive example, i.e. y_i = 1?”
Just like Linear Regression, Logistic Regression computes a weighted sum of the input features:

θ_0 + θ_1 x_i^(1) + θ_2 x_i^(2) + . . . + θ_D x_i^(D)

The image of this function is −∞ to ∞!
In mathematics, a standard logistic function maps a real value (ℝ) to the interval (0, 1):

σ(t) = 1 / (1 + e^(−t))

[Plot of the logistic curve omitted; source: Wikipedia]
The Logistic Regression model, in its vectorized form, is:

h_θ(x_i) = σ(θ · x_i) = 1 / (1 + e^(−θ·x_i))

Predictions are made as follows:
ŷ_i = 0, if h_θ(x_i) < 0.5
ŷ_i = 1, if h_θ(x_i) ≥ 0.5

The values of θ are learnt using gradient descent.
The loss (objective) function minimized during training is the log loss:

J(θ) = −(1/N) Σ_{i=1}^{N} [ y_i log(h_θ(x_i)) + (1 − y_i) log(1 − h_θ(x_i)) ]
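A minimal NumPy sketch of batch gradient descent on this loss (illustrative names; X is assumed to carry a leading column of 1s for the bias):

import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit_logistic(X, y, eta=0.1, n_epochs=1000):
    # batch gradient descent; the gradient of the log loss is
    # (1/N) * X^T (sigmoid(X theta) - y)
    N, D = X.shape
    theta = np.zeros(D)
    for _ in range(n_epochs):
        gradient = X.T @ (sigmoid(X @ theta) - y) / N
        theta -= eta * gradient
    return theta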
from sklearn.linear_model import LogisticRegression
log_reg = LogisticRegression()
log_reg.fit(X, y)
# . . .
y_proba = log_reg.predict_proba(X_new)
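Note that scikit-learn’s LogisticRegression is regularized out of the box: the hyperparameter controlling the strength is C, the inverse of α, so a smaller C means a more regularized model [2].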
Prologue
Regularization is the idea of constraining a model to make it simpler, and thus less prone to overfitting.
Limiting the complexity of the model is one way to add regularization, e.g. limiting the degree of the polynomial in the case of a polynomial model.
Often, penalty terms are added to the objective (cost) function:
Ridge: an ℓ2-norm term is added to the objective function.
Lasso: an ℓ1-norm term is added to the objective function.
Elastic Net: both ℓ2- and ℓ1-norm terms are added to the objective function.
Early stopping is an effective and fairly general regularization technique; it can be applied to iterative learning algorithms, such as batch gradient descent.
Contrary to Principal Component Analysis, the above techniques are tuned based on their impact on the performance of the learning algorithm (on the validation set).
Next module: models related to decision trees.
References:

[1] Simon Dirmeier, Christiane Fuchs, Nikola S Mueller, and Fabian J Theis. netReg: network-regularized linear models for biological association studies. Bioinformatics, 34(5):896–898, 2018.
[2] Aurélien Géron. Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow. O’Reilly Media, 2nd edition, 2019.
[3] Andriy Burkov. The Hundred-Page Machine Learning Book. Andriy Burkov, 2019.
Marcel.Turcotte@uOttawa.ca
School of Electrical Engineering and Computer Science (EECS)
University of Ottawa