SLIDE 1

Linear Regression

Aarti Singh

Machine Learning 10-701/15-781 Sept 27, 2010

SLIDE 2

Discrete to Continuous Labels

Classification (discrete labels):

  • X = Document → Y = Topic (Sports / Science / News)
  • X = Cell Image → Y = Diagnosis (Anemic cell / Healthy cell)

Regression (continuous labels):

  • Stock Market Prediction: X = Feb01 (data up to Feb 01) → Y = ? (a continuous value)

SLIDE 3

Regression Tasks

  • Weather Prediction: X = 7 pm → Y = Temperature
  • Estimating Contamination: X = new location → Y = sensor reading

SLIDE 4


Supervised Learning

Goal: learn a predictor f mapping X to Y from labeled training data.

  • Classification (e.g. X = Document → Y ∈ {Sports, Science, News}): minimize the Probability of Error, P(f(X) ≠ Y)
  • Regression (e.g. X = Feb01 → Y = ?): minimize the Mean Squared Error, E[(f(X) − Y)²]

SLIDE 5

Regression

Optimal predictor (under Mean Squared Error): the conditional mean,

$$f^*(x) = \mathbb{E}[Y \mid X = x]$$

Intuition: Signal plus (zero-mean) Noise model, Y = f*(X) + ε with E[ε] = 0, so the conditional mean recovers the signal.

SLIDE 6

Regression

Optimal predictor: f*(x) = E[Y | X = x]. Proof strategy: show that any other predictor f incurs at least as much risk.

Dropping subscripts for notational convenience, decompose for any f:

$$\mathbb{E}[(f(X) - Y)^2] = \mathbb{E}[(f(X) - f^*(X))^2] + \mathbb{E}[(f^*(X) - Y)^2]$$

(the cross term vanishes by conditioning on X, since E[Y − f*(X) | X] = 0). The first term is ≥ 0, so f* minimizes the mean squared error.

SLIDE 7

Regression

Optimal predictor: f*(x) = E[Y | X = x]

Problem: the conditional mean depends on the unknown distribution of (X, Y), so it cannot be computed directly; it must be estimated from training data.

Intuition: Signal plus (zero-mean) Noise model (Conditional Mean).

SLIDE 8

Regression algorithms

A learning algorithm maps training data to an estimated predictor. Common regression algorithms:

  • Linear Regression
  • Lasso, Ridge Regression (Regularized Linear Regression)
  • Nonlinear Regression
  • Kernel Regression
  • Regression Trees, Splines, Wavelet estimators, …

SLIDE 9

Empirical Risk Minimization (ERM)

Optimal predictor:

$$f^* = \arg\min_f \mathbb{E}[(f(X) - Y)^2]$$

Empirical Risk Minimizer, over a class of predictors F:

$$\hat{f}_n = \arg\min_{f \in \mathcal{F}} \frac{1}{n}\sum_{i=1}^{n} (f(X_i) - Y_i)^2$$

The empirical mean approximates the true risk by the Law of Large Numbers. More later…

SLIDE 10
ERM – you saw it before!

  • Learning Distributions: Max likelihood = Min negative log likelihood, i.e. minimizing an empirical risk
  • What is the class F? A class of parametric distributions, e.g. Bernoulli(θ), Gaussian(μ, σ²)

SLIDE 11

Linear Regression

  • Class of Linear functions

Uni-variate case:

$$f_\beta(x) = \beta_1 + \beta_2 x$$

where β₁ is the intercept and β₂ the slope.

Multi-variate case:

$$f_\beta(\mathbf{x}) = \mathbf{x}\boldsymbol{\beta}, \qquad \text{where } \mathbf{x} = [1 \;\; x^{(1)} \cdots x^{(p)}]$$

and the leading 1 absorbs the intercept.

Least Squares Estimator:

$$\hat{\beta} = \arg\min_\beta \sum_{i=1}^{n} (Y_i - X_i\beta)^2$$

SLIDE 12

Least Squares Estimator

In matrix form, with X the n × p design matrix and Y the n × 1 response vector,

$$\hat{\beta} = \arg\min_\beta \|Y - X\beta\|_2^2 = \arg\min_\beta J(\beta)$$

SLIDE 13

Least Squares Estimator

Setting the gradient of J(β) to zero,

$$\nabla_\beta J(\beta) = -2X^T(Y - X\beta) = 0,$$

leads to the normal equations on the next slide.

SLIDE 14

Normal Equations

$$\underbrace{X^T X}_{p \times p} \; \underbrace{\beta}_{p \times 1} = \underbrace{X^T Y}_{p \times 1}$$

If X^T X is invertible,

$$\hat{\beta} = (X^T X)^{-1} X^T Y$$

When is X^T X invertible? Recall: full rank matrices are invertible. What is the rank of X^T X? It equals the rank of X, so X must have p linearly independent columns (in particular, n ≥ p). What if X^T X is not invertible? Regularization (later).
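A minimal numpy sketch of the estimator above; the synthetic data, seed, and all names here are illustrative assumptions, not from the slides.

```python
import numpy as np

# Synthetic data: n = 100 samples, p = 3 features (illustrative only)
rng = np.random.default_rng(0)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # leading 1s absorb the intercept
beta_true = np.array([2.0, -1.0, 0.5])
Y = X @ beta_true + 0.1 * rng.normal(size=n)  # signal plus zero-mean noise

# Normal equations: solve (X^T X) beta = X^T Y rather than forming the inverse
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# np.linalg.lstsq handles the rank-deficient case as well
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(beta_hat)    # ≈ beta_true
print(beta_lstsq)  # same solution when X has full column rank
```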

SLIDE 15

Geometric Interpretation

The fitted values on the training set,

$$\hat{Y} = X\hat{\beta} = X(X^T X)^{-1} X^T Y,$$

are the orthogonal projection of Y onto the linear subspace spanned by the columns of X; the difference in prediction on the training set, Y − Ŷ, is orthogonal to that subspace.

SLIDE 16

Revisiting Gradient Descent

Even when X^T X is invertible, solving the normal equations can be computationally expensive if the matrix is huge. Gradient descent is an alternative:

  • Initialize: β⁰
  • Update: β^{t+1} = β^t − α ∇J(β^t), with ∇J(β) = −2X^T(Y − Xβ); the update is 0 if ∇J(β^t) = 0
  • Stop: when some criterion is met, e.g. a fixed # of iterations, or ‖β^{t+1} − β^t‖ < ε

Gradient descent finds the global minimum since J(β) is convex.
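A runnable sketch of this update rule; the function name is assumed, and the automatic step size derived from the Lipschitz constant of ∇J is an addition for safety, not part of the slide.

```python
import numpy as np

def gd_least_squares(X, Y, alpha=None, iters=1000, eps=1e-8):
    """Minimize J(beta) = ||Y - X beta||^2 by gradient descent (a minimal sketch)."""
    if alpha is None:
        # grad J is Lipschitz with constant L = 2*lambda_max(X^T X);
        # any step below 2/L converges, so default to 1/L
        alpha = 1.0 / (2 * np.linalg.eigvalsh(X.T @ X).max())
    beta = np.zeros(X.shape[1])                      # Initialize
    for _ in range(iters):
        grad = -2 * X.T @ (Y - X @ beta)             # gradient of J at current beta
        beta_next = beta - alpha * grad              # Update
        if np.linalg.norm(beta_next - beta) < eps:   # Stop when steps are tiny
            return beta_next
        beta = beta_next
    return beta
```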

SLIDE 17

Effect of step-size α

  • Large α ⇒ fast convergence, but larger residual error; oscillations are also possible
  • Small α ⇒ slow convergence, but small residual error

SLIDE 18

Least Squares and MLE

Intuition: Signal plus (zero-mean) Noise model,

$$Y = X\beta + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \sigma^2 I)$$

The log likelihood is then, up to additive constants,

$$\log p(Y \mid X, \beta) = -\frac{1}{2\sigma^2}\|Y - X\beta\|_2^2 + \text{const},$$

so maximizing it is the same as minimizing squared error: the Least Square Estimate is the same as the Maximum Likelihood Estimate under a Gaussian model!

SLIDE 19

Regularized Least Squares and MAP

What if X^T X is not invertible?

MAP estimate: log posterior = log likelihood + log prior.

I) Gaussian Prior: the prior belief that β is Gaussian with zero mean biases the solution toward "small" β:

$$\hat{\beta}_{MAP} = \arg\min_\beta \|Y - X\beta\|_2^2 + \lambda \|\beta\|_2^2$$

Ridge Regression

Closed form: HW
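The closed form is left as homework on the slide; the sketch below uses the standard ridge solution (X^T X + λI)⁻¹ X^T Y, stated here as a well-known result rather than the course's own derivation, with an assumed function name.

```python
import numpy as np

def ridge(X, Y, lam=1.0):
    """Ridge regression via the standard closed form
    beta = (X^T X + lam*I)^{-1} X^T Y."""
    p = X.shape[1]
    # X^T X + lam*I is always invertible for lam > 0, even when X^T X is not
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)
```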

SLIDE 20

Regularized Least Squares and MAP

What if X^T X is not invertible?

MAP estimate: log posterior = log likelihood + log prior.

II) Laplace Prior: the prior belief that β is Laplace with zero mean biases the solution toward "small" β:

$$\hat{\beta}_{MAP} = \arg\min_\beta \|Y - X\beta\|_2^2 + \lambda \|\beta\|_1$$

Lasso
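The slides give no algorithm for the lasso; this is a minimal iterative soft-thresholding (ISTA) sketch, one standard solver for the l1-penalized objective above. All names and the step-size choice are assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t*||.||_1: shrinks each coordinate toward 0 by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, Y, lam=1.0, iters=500):
    """Minimize ||Y - X beta||^2 + lam*||beta||_1 by iterative soft-thresholding."""
    step = 1.0 / (2 * np.linalg.eigvalsh(X.T @ X).max())  # safe step for the smooth part
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = -2 * X.T @ (Y - X @ beta)                  # gradient of the squared-error term
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta                                           # many coordinates end up exactly 0
```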

SLIDE 21

Ridge Regression vs Lasso

Ridge Regression: l2 penalty. Lasso: l1 penalty – HOT!

Lasso (l1 penalty) results in sparse solutions – vectors with more zero coordinates. Good for high-dimensional problems – you don't have to store all coordinates! Ideally one would use an l0 penalty, but then the optimization becomes non-convex.

[Figure: level sets of J(β) in the (β1, β2) plane, overlaid with βs of constant l1, l2, and l0 norm]

SLIDE 22

Beyond Linear Regression

  • Polynomial regression
  • Regression with nonlinear features/basis functions
  • Kernel regression – local/weighted regression
  • Regression trees – spatially adaptive regression

SLIDE 23

Polynomial Regression

Univariate (1-d) case:

$$f_\beta(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_m x^m = \sum_{k=0}^{m} \beta_k x^k$$

where the monomials x^k are nonlinear features and β_k is the weight of each feature. The model is still linear in β, so the least squares machinery applies unchanged.
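A sketch of polynomial regression as linear least squares on monomial features; the data and names are illustrative assumptions.

```python
import numpy as np

def poly_fit(x, y, degree=3):
    """Fit a degree-m polynomial by least squares on monomial features."""
    # Design matrix with columns [1, x, x^2, ..., x^degree]
    Phi = np.vander(x, degree + 1, increasing=True)
    beta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return beta

# Illustrative data: a noisy cubic
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 50)
y = 1 - 2 * x + 0.5 * x**3 + 0.1 * rng.normal(size=x.size)
print(poly_fit(x, y, degree=3))  # ≈ [1, -2, 0, 0.5]
```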

SLIDE 24

Polynomial Regression

Demo: http://mste.illinois.edu/users/exner/java.f/leastsquares/

SLIDE 25

Nonlinear Regression

Nonlinear features/basis functions with basis coefficients:

$$f_\beta(x) = \sum_k \beta_k \phi_k(x)$$

  • Fourier Basis – good representation for oscillatory functions
  • Wavelet Basis – good representation for functions localized at multiple scales
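A sketch of regression on a truncated Fourier basis, in the basis-function form above; the function name, the truncation level K, and the assumption that x is scaled to [0, 1] are mine, not the slides'.

```python
import numpy as np

def fourier_fit(x, y, K=5):
    """Least squares on a truncated Fourier basis (x assumed scaled to [0, 1])."""
    feats = [np.ones_like(x)]                     # constant basis function
    for k in range(1, K + 1):
        feats.append(np.sin(2 * np.pi * k * x))   # oscillatory basis functions
        feats.append(np.cos(2 * np.pi * k * x))
    Phi = np.column_stack(feats)
    beta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return Phi, beta
```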

SLIDE 26

Local Regression

Nonlinear features/basis functions with basis coefficients, as on the previous slide. Globally supported basis functions (polynomial, Fourier) will not yield a good representation when the function behaves differently in different regions.


SLIDE 28

What you should know

  • Linear Regression: Least Squares Estimator, Normal Equations, Gradient Descent, Geometric and Probabilistic Interpretation (connection to MLE)
  • Regularized Linear Regression (connection to MAP): Ridge Regression, Lasso
  • Polynomial Regression, Basis (Fourier, Wavelet) Estimators

Next time:

  • Kernel Regression (Localized)
  • Regression Trees