
SLIDE 1

Linear models

Oliver Stegle and Karsten Borgwardt
Machine Learning and Computational Biology Research Group, Max Planck Institute for Biological Cybernetics and Max Planck Institute for Developmental Biology, Tübingen

SLIDE 2

Motivation
Curve fitting

Tasks we are interested in:

◮ Making predictions
◮ Comparison of alternative models

[Figure: curve-fitting example — observed data (X, Y) and a new input x* at which a prediction is required.]


SLIDE 4

Motivation
Further reading, useful material

◮ Christopher M. Bishop: Pattern Recognition and Machine Learning.
◮ Good background; covers most of the course material and much more.
◮ This lecture is largely inspired by chapter 3 of the book.

SLIDE 5

Outline

SLIDE 6

Linear Regression
Outline

◮ Motivation
◮ Linear Regression
◮ Bayesian linear regression
◮ Model comparison and hypothesis testing
◮ Summary

SLIDE 7

Linear Regression
Regression: noise model and likelihood

◮ Given a dataset $\mathcal{D} = \{x_n, y_n\}_{n=1}^{N}$, where each $x_n = (x_{n,1}, \dots, x_{n,D})$ is $D$-dimensional, fit the parameters $\theta$ of a regressor $f$ with added Gaussian noise:
  $y_n = f(x_n; \theta) + \epsilon_n$, where $p(\epsilon \mid \sigma^2) = \mathcal{N}(\epsilon \mid 0, \sigma^2)$.
◮ Equivalent likelihood formulation:
  $p(\mathbf{y} \mid \mathbf{X}) = \prod_{n=1}^{N} \mathcal{N}\bigl(y_n \mid f(x_n), \sigma^2\bigr)$
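As an illustration of this noise model (not part of the original slides), the sketch below simulates data from a known linear function with Gaussian noise and evaluates the corresponding log-likelihood for two candidate weight vectors; all names (w_true, sigma, etc.) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth: y_n = f(x_n; theta) + eps_n with f linear
N, D = 50, 3
sigma = 0.5
w_true = np.array([1.0, -2.0, 0.5])

X = rng.normal(size=(N, D))
y = X @ w_true + rng.normal(scale=sigma, size=N)

def gaussian_log_likelihood(w, X, y, sigma2):
    """ln p(y | X, w, sigma^2) = sum_n ln N(y_n | w^T x_n, sigma^2)."""
    resid = y - X @ w
    return -0.5 * len(y) * np.log(2 * np.pi * sigma2) - 0.5 * resid @ resid / sigma2

print(gaussian_log_likelihood(w_true, X, y, sigma**2))       # high for the true weights
print(gaussian_log_likelihood(np.zeros(D), X, y, sigma**2))  # much lower for w = 0
```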

SLIDE 8

Linear Regression
Regression: choosing a regressor

◮ Choose f to be linear:
  $p(\mathbf{y} \mid \mathbf{X}) = \prod_{n=1}^{N} \mathcal{N}\bigl(y_n \mid \mathbf{w}^{\top} x_n + c, \sigma^2\bigr)$
◮ Consider the bias-free case, $c = 0$; otherwise include an additional column of ones in each $x_n$.

[Figure: equivalent graphical model of the linear regression likelihood.]


SLIDE 10

Linear Regression
Maximum likelihood

◮ Taking the logarithm, we obtain
  $\ln p(\mathbf{y} \mid \mathbf{w}, \mathbf{X}, \sigma^2) = \sum_{n=1}^{N} \ln \mathcal{N}\bigl(y_n \mid \mathbf{w}^{\top} x_n, \sigma^2\bigr) = -\frac{N}{2} \ln 2\pi\sigma^2 - \frac{1}{2\sigma^2} \underbrace{\sum_{n=1}^{N} \bigl(y_n - \mathbf{w}^{\top} x_n\bigr)^2}_{\text{sum of squares}}$
◮ The likelihood is maximized when the squared error is minimized.
◮ Least squares and maximum likelihood are equivalent.


SLIDE 13

Linear Regression
Linear Regression and Least Squares

[Figure: the least-squares error is the vertical distance between each target $y_n$ and the model value $f(x_n, \mathbf{w})$. (C.M. Bishop, Pattern Recognition and Machine Learning)]

$E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \bigl(y_n - \mathbf{w}^{\top} x_n\bigr)^2$

SLIDE 14

Linear Regression
Linear Regression and Least Squares

◮ Derivative w.r.t. a single weight entry $w_i$:
  $\frac{d}{dw_i} \ln p(\mathbf{y} \mid \mathbf{w}, \sigma^2) = \frac{d}{dw_i}\left[-\frac{1}{2\sigma^2} \sum_{n=1}^{N} (y_n - \mathbf{w}\cdot x_n)^2\right] = \frac{1}{\sigma^2} \sum_{n=1}^{N} (y_n - \mathbf{w}\cdot x_n)\, x_{n,i}$
◮ Set the gradient w.r.t. $\mathbf{w}$ to zero:
  $\nabla_{\mathbf{w}} \ln p(\mathbf{y} \mid \mathbf{w}, \sigma^2) = \frac{1}{\sigma^2} \sum_{n=1}^{N} (y_n - \mathbf{w}\cdot x_n)\, x_n^{\top} = 0 \;\Longrightarrow\; \mathbf{w}_{\mathrm{ML}} = \underbrace{(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}}_{\text{pseudo-inverse}}\,\mathbf{y}$
◮ Here, the matrix $\mathbf{X}$ is defined as
  $\mathbf{X} = \begin{pmatrix} x_{1,1} & \dots & x_{1,D} \\ \vdots & \ddots & \vdots \\ x_{N,1} & \dots & x_{N,D} \end{pmatrix}$
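The closed-form maximum-likelihood solution can be checked numerically. The sketch below (an illustration with made-up data, not from the slides) compares the normal-equation solution with NumPy's least-squares routine.

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 100, 4
X = rng.normal(size=(N, D))
w_true = rng.normal(size=D)
y = X @ w_true + rng.normal(scale=0.1, size=N)

# Normal equations: w_ML = (X^T X)^{-1} X^T y
w_ml = np.linalg.solve(X.T @ X, X.T @ y)

# Equivalent (and numerically preferable): least squares via SVD
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(w_ml, w_lstsq))  # True
```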

SLIDE 15

Linear Regression
Polynomial Curve Fitting

◮ Use the polynomials up to degree $K$ to construct new features from $x$:
  $f(x, \mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \dots + w_K x^K = \mathbf{w}^{\top}\phi(x)$, where we defined $\phi(x) = (1, x, x^2, \dots, x^K)$.
◮ Similarly, $\phi$ can be any feature mapping.
◮ Possible to show: the feature map $\phi$ can be expressed in terms of kernels (kernel trick).
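As a small illustration (not from the slides), polynomial features can be built explicitly and plugged into the same least-squares machinery; the degree K and the synthetic data below are arbitrary choices.

```python
import numpy as np

def polynomial_features(x, K):
    """phi(x) = (1, x, x^2, ..., x^K) for a 1-D input array x."""
    return np.vander(x, N=K + 1, increasing=True)

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(-1, 1, size=30))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)

K = 3
Phi = polynomial_features(x, K)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # least-squares fit in feature space
y_hat = Phi @ w                                # fitted curve at the training inputs
```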


SLIDE 17

Linear Regression
Polynomial Curve Fitting: overfitting

◮ The degree of the polynomial is crucial to avoid under- and overfitting (see the sketch below).

[Figures: polynomial fits of degree M = 0, 1, 3 and 9 to the same data; low degrees underfit, M = 9 overfits. (C.M. Bishop, Pattern Recognition and Machine Learning)]
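A quick way to see the effect of the degree (an illustration on synthetic data, not part of the slides) is to compare training and held-out error for the degrees shown in the figures:

```python
import numpy as np

rng = np.random.default_rng(3)

def make_data(n):
    x = rng.uniform(0, 1, size=n)
    return x, np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=n)

x_train, y_train = make_data(10)
x_test, y_test = make_data(100)

for M in (0, 1, 3, 9):
    Phi_train = np.vander(x_train, N=M + 1, increasing=True)
    Phi_test = np.vander(x_test, N=M + 1, increasing=True)
    w, *_ = np.linalg.lstsq(Phi_train, y_train, rcond=None)
    rmse = lambda Phi, y: np.sqrt(np.mean((y - Phi @ w) ** 2))
    print(f"M={M}: train RMSE {rmse(Phi_train, y_train):.3f}, "
          f"test RMSE {rmse(Phi_test, y_test):.3f}")
```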


SLIDE 21

Linear Regression
Regularized Least Squares

◮ Solutions to avoid overfitting:
  ◮ Intelligently choose K
  ◮ Regularize the regression weights w
◮ Construct a smoothed error function:
  $E(\mathbf{w}) = \underbrace{\frac{1}{2} \sum_{n=1}^{N} \bigl(y_n - \mathbf{w}^{\top}\phi(x_n)\bigr)^2}_{\text{squared error}} + \underbrace{\frac{\lambda}{2}\,\mathbf{w}^{\top}\mathbf{w}}_{\text{regularizer}}$
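Minimizing this regularized error also has a closed form, $\mathbf{w} = (\lambda\mathbf{I} + \Phi^{\top}\Phi)^{-1}\Phi^{\top}\mathbf{y}$. The sketch below (an illustration of that standard result on made-up data, not from the slides) implements it and shows how the regularizer shrinks the weights of a high-degree polynomial fit.

```python
import numpy as np

def ridge_fit(Phi, y, lam):
    """Minimize 0.5*||y - Phi w||^2 + 0.5*lam*||w||^2 in closed form."""
    D = Phi.shape[1]
    return np.linalg.solve(lam * np.eye(D) + Phi.T @ Phi, Phi.T @ y)

rng = np.random.default_rng(4)
x = rng.uniform(0, 1, size=10)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=10)
Phi = np.vander(x, N=10, increasing=True)     # degree-9 polynomial features

w_unreg = ridge_fit(Phi, y, lam=1e-12)        # essentially unregularized
w_reg = ridge_fit(Phi, y, lam=1e-3)
print(np.linalg.norm(w_unreg), np.linalg.norm(w_reg))  # regularization shrinks the weights
```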


SLIDE 23

Linear Regression
Regularized Least Squares: more general regularizers

◮ A more general regularization approach:
  $E(\mathbf{w}) = \underbrace{\frac{1}{2} \sum_{n=1}^{N} \bigl(y_n - \mathbf{w}^{\top}\phi(x_n)\bigr)^2}_{\text{squared error}} + \underbrace{\frac{\lambda}{2} \sum_{d=1}^{D} |w_d|^q}_{\text{regularizer}}$

[Figure: contours of the regularizer for q = 0.5, 1, 2, 4; q = 2 is the quadratic regularizer, q = 1 is the Lasso and yields sparse solutions. (C.M. Bishop, Pattern Recognition and Machine Learning)]
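For q = 2 and q = 1 ready-made solvers exist. A minimal sketch with scikit-learn (assuming it is installed; the alpha values and data are arbitrary) contrasts the dense ridge solution with the sparse Lasso solution.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(5)
N, D = 80, 20
X = rng.normal(size=(N, D))
w_true = np.zeros(D)
w_true[:3] = [2.0, -1.5, 0.7]          # only three informative features
y = X @ w_true + rng.normal(scale=0.1, size=N)

ridge = Ridge(alpha=1.0).fit(X, y)     # q = 2: all coefficients shrunk, none exactly zero
lasso = Lasso(alpha=0.05).fit(X, y)    # q = 1: many coefficients exactly zero (sparse)

print(np.sum(np.abs(ridge.coef_) < 1e-6), "ridge coefficients are zero")
print(np.sum(np.abs(lasso.coef_) < 1e-6), "lasso coefficients are zero")
```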


SLIDE 26

Linear Regression
Loss functions and other methods

◮ Even more general: vary the loss function
  $E(\mathbf{w}) = \underbrace{\frac{1}{2} \sum_{n=1}^{N} L\bigl(y_n - \mathbf{w}^{\top}\phi(x_n)\bigr)}_{\text{loss}} + \underbrace{\frac{\lambda}{2} \sum_{d=1}^{D} |w_d|^q}_{\text{regularizer}}$
◮ Many state-of-the-art machine learning methods can be expressed within this framework:
  ◮ Linear regression: squared loss, squared regularizer.
  ◮ Support vector machine: hinge loss, squared regularizer.
  ◮ Lasso: squared loss, L1 regularizer.
◮ Inference: minimize the cost function E(w), yielding a point estimate for w.
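When no closed form exists for a particular loss/regularizer pair, E(w) can simply be minimized numerically. The sketch below (an illustration, assuming SciPy is available; the Huber loss and all data are choices made here, not taken from the slides) does this for a robust loss with an L2 regularizer.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
N, D = 60, 3
X = rng.normal(size=(N, D))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + rng.normal(scale=0.1, size=N)
y[:5] += 5.0                      # a few outliers

def huber(r, delta=1.0):
    """Huber loss: quadratic near zero, linear in the tails."""
    return np.where(np.abs(r) <= delta, 0.5 * r**2, delta * (np.abs(r) - 0.5 * delta))

def objective(w, lam=0.1):
    r = y - X @ w
    return np.sum(huber(r)) + 0.5 * lam * w @ w

w_hat = minimize(objective, x0=np.zeros(D)).x
print(w_hat)                      # close to w_true despite the outliers
```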


SLIDE 29

Linear Regression
Regularized Least Squares: probabilistic equivalent

◮ So far: minimization of error functions. Back to probabilities?
  $E(\mathbf{w}) = \underbrace{\frac{1}{2} \sum_{n=1}^{N} \bigl(y_n - \mathbf{w}^{\top}\phi(x_n)\bigr)^2}_{\text{squared error}} + \underbrace{\frac{\lambda}{2}\,\mathbf{w}^{\top}\mathbf{w}}_{\text{regularizer}}$
  $= -\ln p(\mathbf{y} \mid \mathbf{w}, \Phi(\mathbf{X}), \sigma^2) - \ln p(\mathbf{w}) = -\sum_{n=1}^{N} \ln \mathcal{N}\bigl(y_n \mid \mathbf{w}^{\top}\phi(x_n), \sigma^2\bigr) - \ln \mathcal{N}\bigl(\mathbf{w} \mid \mathbf{0}, \tfrac{1}{\lambda}\mathbf{I}\bigr)$
◮ Similarly: most other choices of regularizers and loss functions can be mapped to an equivalent probabilistic representation.


SLIDE 33

Bayesian linear regression
Outline

◮ Motivation
◮ Linear Regression
◮ Bayesian linear regression
◮ Model comparison and hypothesis testing
◮ Summary

SLIDE 34

Bayesian linear regression

◮ Likelihood as before:
  $p(\mathbf{y} \mid \mathbf{X}, \mathbf{w}, \sigma^2) = \prod_{n=1}^{N} \mathcal{N}\bigl(y_n \mid \mathbf{w}^{\top}\phi(x_n), \sigma^2\bigr)$
◮ Define a conjugate prior over w:
  $p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_0, \mathbf{S}_0)$


SLIDE 36

Bayesian linear regression

◮ Posterior probability of w:
  $p(\mathbf{w} \mid \mathbf{y}, \mathbf{X}, \sigma^2) \propto \prod_{n=1}^{N} \mathcal{N}\bigl(y_n \mid \mathbf{w}^{\top}\phi(x_n), \sigma^2\bigr)\cdot \mathcal{N}(\mathbf{w} \mid \mathbf{m}_0, \mathbf{S}_0) = \mathcal{N}\bigl(\mathbf{y} \mid \Phi(\mathbf{X})\mathbf{w}, \sigma^2\mathbf{I}\bigr)\cdot \mathcal{N}(\mathbf{w} \mid \mathbf{m}_0, \mathbf{S}_0) = \mathcal{N}(\mathbf{w} \mid \boldsymbol{\mu}_{\mathbf{w}}, \boldsymbol{\Sigma}_{\mathbf{w}})$
◮ where
  $\boldsymbol{\mu}_{\mathbf{w}} = \boldsymbol{\Sigma}_{\mathbf{w}}\bigl(\mathbf{S}_0^{-1}\mathbf{m}_0 + \tfrac{1}{\sigma^2}\Phi(\mathbf{X})^{\top}\mathbf{y}\bigr), \qquad \boldsymbol{\Sigma}_{\mathbf{w}} = \bigl(\mathbf{S}_0^{-1} + \tfrac{1}{\sigma^2}\Phi(\mathbf{X})^{\top}\Phi(\mathbf{X})\bigr)^{-1}$
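These posterior formulas translate directly into code. The sketch below (illustration only, with an arbitrary prior, noise level, and synthetic data) computes μ_w and Σ_w.

```python
import numpy as np

def posterior(Phi, y, sigma2, m0, S0):
    """Posterior N(w | mu_w, Sigma_w) for Bayesian linear regression."""
    S0_inv = np.linalg.inv(S0)
    Sigma_w = np.linalg.inv(S0_inv + Phi.T @ Phi / sigma2)
    mu_w = Sigma_w @ (S0_inv @ m0 + Phi.T @ y / sigma2)
    return mu_w, Sigma_w

rng = np.random.default_rng(7)
N, D = 30, 2
Phi = rng.normal(size=(N, D))
w_true = np.array([0.5, -0.3])
sigma2 = 0.05
y = Phi @ w_true + rng.normal(scale=np.sqrt(sigma2), size=N)

mu_w, Sigma_w = posterior(Phi, y, sigma2, m0=np.zeros(D), S0=np.eye(D))
print(mu_w)        # close to w_true
print(Sigma_w)     # small posterior uncertainty
```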

SLIDE 37

Bayesian linear regression: prior choice

◮ A common choice is a prior that corresponds to regularized regression:
  $p(\mathbf{w}) = \mathcal{N}\bigl(\mathbf{w} \mid \mathbf{0}, \tfrac{1}{\lambda}\mathbf{I}\bigr)$
◮ In this case
  $\boldsymbol{\mu}_{\mathbf{w}} = \tfrac{1}{\sigma^2}\boldsymbol{\Sigma}_{\mathbf{w}}\Phi(\mathbf{X})^{\top}\mathbf{y}, \qquad \boldsymbol{\Sigma}_{\mathbf{w}} = \bigl(\lambda\mathbf{I} + \tfrac{1}{\sigma^2}\Phi(\mathbf{X})^{\top}\Phi(\mathbf{X})\bigr)^{-1}$


SLIDE 39

Bayesian linear regression: example

[Figures: Bayesian linear regression posterior and predictions after observing 0, 1, and 20 data points. (C.M. Bishop, Pattern Recognition and Machine Learning)]


SLIDE 42

Bayesian linear regression: making predictions

◮ Prediction for a fixed weight $\hat{\mathbf{w}}$ at input $x_\star$ is trivial:
  $p(y_\star \mid x_\star, \hat{\mathbf{w}}, \sigma^2) = \mathcal{N}\bigl(y_\star \mid \hat{\mathbf{w}}^{\top}\phi(x_\star), \sigma^2\bigr)$
◮ Integrate over w to take the posterior uncertainty into account:
  $p(y_\star \mid x_\star, \mathcal{D}) = \int_{\mathbf{w}} p(y_\star \mid x_\star, \mathbf{w}, \sigma^2)\, p(\mathbf{w} \mid \mathbf{X}, \mathbf{y}, \sigma^2)\, d\mathbf{w} = \int_{\mathbf{w}} \mathcal{N}\bigl(y_\star \mid \mathbf{w}^{\top}\phi(x_\star), \sigma^2\bigr)\, \mathcal{N}(\mathbf{w} \mid \boldsymbol{\mu}_{\mathbf{w}}, \boldsymbol{\Sigma}_{\mathbf{w}})\, d\mathbf{w} = \mathcal{N}\bigl(y_\star \mid \boldsymbol{\mu}_{\mathbf{w}}^{\top}\phi(x_\star),\; \sigma^2 + \phi(x_\star)^{\top}\boldsymbol{\Sigma}_{\mathbf{w}}\phi(x_\star)\bigr)$
◮ Key points:
  ◮ The prediction is again Gaussian.
  ◮ The predictive variance is increased due to the posterior uncertainty in w.
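The predictive mean and variance follow directly from μ_w and Σ_w; the sketch below (illustration only, with a hypothetical posterior such as the one returned by the earlier posterior sketch) evaluates them at a single test point.

```python
import numpy as np

def predict(phi_star, mu_w, Sigma_w, sigma2):
    """Predictive distribution N(y* | mean, var) at a feature vector phi(x*)."""
    mean = mu_w @ phi_star
    var = sigma2 + phi_star @ Sigma_w @ phi_star
    return mean, var

# Hypothetical posterior (e.g. as returned by the posterior sketch above)
mu_w = np.array([0.5, -0.3])
Sigma_w = np.array([[0.01, 0.0], [0.0, 0.02]])
sigma2 = 0.05

phi_star = np.array([1.0, 2.0])        # test point in feature space
mean, var = predict(phi_star, mu_w, Sigma_w, sigma2)
print(mean, np.sqrt(var))              # predictive mean and standard deviation
```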


SLIDE 45

Model comparison and hypothesis testing
Outline

◮ Motivation
◮ Linear Regression
◮ Bayesian linear regression
◮ Model comparison and hypothesis testing
◮ Summary

SLIDE 46

Model comparison and hypothesis testing
Model comparison: motivation

◮ What degree of polynomial describes the data best?
◮ Is the linear model at all appropriate?
◮ Association testing.

[Figure: genome-wide association testing — SNP genotypes across individuals (genome) are tested for association with phenotypes (phenome).]


SLIDE 48

Model comparison and hypothesis testing
Bayesian model comparison

◮ How do we choose among alternative models?
◮ Assume we want to choose among models $\mathcal{H}_0, \dots, \mathcal{H}_M$ for a dataset $\mathcal{D}$.
◮ Posterior probability for a particular model i:
  $p(\mathcal{H}_i \mid \mathcal{D}) \propto \underbrace{p(\mathcal{D} \mid \mathcal{H}_i)}_{\text{evidence}}\;\underbrace{p(\mathcal{H}_i)}_{\text{prior}}$


SLIDE 50

Model comparison and hypothesis testing
Bayesian model comparison: how to calculate the evidence

◮ The evidence is not the model likelihood!
  $p(\mathcal{D} \mid \mathcal{H}_i) = \int_{\boldsymbol{\theta}} p(\mathcal{D} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta})\, d\boldsymbol{\theta}$ for model parameters $\boldsymbol{\theta}$.
◮ Remember:
  $p(\boldsymbol{\theta} \mid \mathcal{H}_i, \mathcal{D}) = \frac{p(\mathcal{D} \mid \mathcal{H}_i, \boldsymbol{\theta})\, p(\boldsymbol{\theta})}{p(\mathcal{D} \mid \mathcal{H}_i)}, \qquad \text{posterior} = \frac{\text{likelihood}\cdot\text{prior}}{\text{evidence}}$
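For the linear-Gaussian model with a fixed noise variance and a zero-mean Gaussian prior on w, the evidence integral is itself Gaussian, $p(\mathbf{y}\mid\mathbf{X}) = \mathcal{N}(\mathbf{y}\mid\mathbf{0}, \sigma^2\mathbf{I} + \tfrac{1}{\lambda}\mathbf{X}\mathbf{X}^{\top})$. The sketch below (an illustration under those simplifying assumptions, which the slides do not spell out) evaluates its logarithm.

```python
import numpy as np

def log_evidence(X, y, sigma2, lam):
    """ln p(y | X) = ln N(y | 0, sigma2*I + (1/lam)*X X^T), assuming w ~ N(0, I/lam)."""
    N = len(y)
    C = sigma2 * np.eye(N) + X @ X.T / lam
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (N * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(C, y))

rng = np.random.default_rng(8)
X = rng.normal(size=(40, 2))
y = X @ np.array([1.0, -1.0]) + rng.normal(scale=0.3, size=40)
print(log_evidence(X, y, sigma2=0.09, lam=1.0))
```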


SLIDE 52

Model comparison and hypothesis testing
Bayesian model comparison: Occam's razor

◮ The evidence integral penalizes overly complex models.
◮ A model with few parameters and a lower maximum likelihood (H1) may win over a model with a peaked likelihood that requires many more parameters (H2).

[Figure: likelihood as a function of the MAP weight w for a simple model H1 and a more complex model H2. (C.M. Bishop, Pattern Recognition and Machine Learning)]


SLIDE 54

Model comparison and hypothesis testing
Application to GWA

◮ Consider an association study.
◮ H0: $p(\mathbf{y} \mid \mathcal{H}_0, \mathbf{X}, \boldsymbol{\theta}) = \mathcal{N}(\mathbf{y} \mid \mathbf{0}, \sigma^2\mathbf{I})$ (no association), $\boldsymbol{\theta} = \{\sigma^2\}$
◮ H1: $p(\mathbf{y} \mid \mathcal{H}_1, \mathbf{X}, \boldsymbol{\theta}) = \mathcal{N}(\mathbf{y} \mid \mathbf{X}\mathbf{w}, \sigma^2\mathbf{I})$ (linear association), $\boldsymbol{\theta} = \{\sigma^2, \mathbf{w}\}$
◮ Choosing conjugate priors for $\sigma^2$ and $\mathbf{w}$, the required integrals are tractable in closed form.


SLIDE 57

Model comparison and hypothesis testing
Application to GWA: scoring models

◮ The ratio of the evidences, the Bayes factor, is a common scoring metric to compare two models (here on a log scale):
  $\mathrm{BF} = \ln \frac{p(\mathcal{D} \mid \mathcal{H}_1)}{p(\mathcal{D} \mid \mathcal{H}_0)}$

[Figure: LOD/Bayes-factor scores along a region of chromosome 7 around SLC35B4, with the 0.01% FPR threshold indicated.]
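Under the fixed-σ² simplification used in the evidence sketch above (an illustration; the slides instead place conjugate priors on both σ² and w), the log Bayes factor for a single SNP is just the difference of two log evidences. All data and parameter values below are hypothetical.

```python
import numpy as np

def log_evidence_h1(x, y, sigma2, lam):
    """ln p(y | H1): linear effect with w ~ N(0, 1/lam) for a single SNP genotype vector x."""
    N = len(y)
    C = sigma2 * np.eye(N) + np.outer(x, x) / lam
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (N * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(C, y))

def log_evidence_h0(y, sigma2):
    """ln p(y | H0): no association, y ~ N(0, sigma2*I)."""
    N = len(y)
    return -0.5 * (N * np.log(2 * np.pi * sigma2) + y @ y / sigma2)

rng = np.random.default_rng(9)
x = rng.integers(0, 3, size=200).astype(float)    # hypothetical genotypes (0/1/2)
y = 0.4 * x + rng.normal(scale=1.0, size=200)     # phenotype with a real effect
y -= y.mean()                                     # centre the phenotype

log_bf = log_evidence_h1(x, y, sigma2=1.0, lam=1.0) - log_evidence_h0(y, sigma2=1.0)
print(log_bf)   # positive: evidence favours the association model H1
```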


SLIDE 59

Model comparison and hypothesis testing
Application to GWA: posterior probability of an association

◮ Bayes factors are useful; however, we would like a probabilistic answer to how certain an association really is.
◮ Posterior probability of H1:
  $p(\mathcal{H}_1 \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \mathcal{H}_1)\, p(\mathcal{H}_1)}{p(\mathcal{D})} = \frac{p(\mathcal{D} \mid \mathcal{H}_1)\, p(\mathcal{H}_1)}{p(\mathcal{D} \mid \mathcal{H}_1)\, p(\mathcal{H}_1) + p(\mathcal{D} \mid \mathcal{H}_0)\, p(\mathcal{H}_0)}$
◮ Here $p(\mathcal{H}_1 \mid \mathcal{D}) + p(\mathcal{H}_0 \mid \mathcal{D}) = 1$, and $p(\mathcal{H}_1)$ is the prior probability of observing a real association.
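Given a log Bayes factor and a prior probability of association, this posterior is a one-line computation. The sketch below (illustration only; the prior values are arbitrary) works on the log-odds scale, which stays numerically stable for large Bayes factors.

```python
import numpy as np

def posterior_h1(log_bf, prior_h1):
    """p(H1 | D) from the log Bayes factor ln[p(D|H1)/p(D|H0)] and the prior p(H1)."""
    log_prior_odds = np.log(prior_h1) - np.log(1.0 - prior_h1)
    # posterior odds = Bayes factor * prior odds; logistic maps log-odds to a probability
    return 1.0 / (1.0 + np.exp(-(log_bf + log_prior_odds)))

print(posterior_h1(log_bf=5.0, prior_h1=1e-4))   # strong BF, but a sceptical prior
print(posterior_h1(log_bf=12.0, prior_h1=1e-4))  # overwhelming evidence
```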


SLIDE 62

Summary
Outline

◮ Motivation
◮ Linear Regression
◮ Bayesian linear regression
◮ Model comparison and hypothesis testing
◮ Summary

SLIDE 63

Summary

◮ Curve fitting and linear regression.
◮ Maximum likelihood and least squares regression are identical.
◮ Construction of features using a mapping φ.
◮ Regularized least squares.
◮ Bayesian linear regression.
◮ Model comparison and Occam's razor.