Linear Regression
CSCI 447/547 MACHINE LEARNING
Outline
- Linear Models
- 1D Ordinary Least Squares (OLS)
- Solution of OLS
- Interpretation
- Anscombe’s Quartet
- Multivariate OLS
- OLS Pros and Cons
Optional Reading
Terminology
- Features (Covariates or predictors)
- Labels (Variates or targets)
- Regression
- Classification
Types of Machine Learning
- Unsupervised
Finding structure in data
- Supervised
Predict outputs from labeled data
[Figure: two height-vs-weight scatter plots. Classification: categorical output data (women vs. men), handled by logistic regression. Prediction: continuous output data, handled by OLS regression.]
What is a Linear Model?
- Predict Housing Prices
Depends on: area, number of bedrooms, number of bathrooms
The hypothesis is that the relationship is linear:
Price = k_1(\text{Area}) + k_2(\#\text{bed}) + k_3(\#\text{bath})
In general: y_i = a_0 + a_1 x_1 + a_2 x_2 + \dots
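A minimal sketch of such a model in Python; the coefficient values below are made up purely for illustration:

```python
# Minimal sketch of a linear model prediction.
# The coefficient values are made up purely for illustration.

def predict_price(area, n_bed, n_bath,
                  k=(150.0, 10000.0, 5000.0), bias=20000.0):
    """Linear model: price = bias + k1*area + k2*n_bed + k3*n_bath."""
    return bias + k[0] * area + k[1] * n_bed + k[2] * n_bath

print(predict_price(area=1200, n_bed=3, n_bath=2))  # 240000.0
```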
Why Use Linear Models?
- Interpretable
Relationships are easy to see
- Low Complexity
Prevents overfitting
- Scalable
Scale up to more data, larger problems
- Baseline
Can benchmark other methods against them
Examples of Use
- Example: MNIST dataset of handwritten digits
Best performance: neural networks with regularization, 99.79% accurate; takes about a day to train; more difficult to build
Logistic regression: 92.5% accurate; takes seconds to train; can be built with less expertise (see the sketch below)
- Building Blocks of Later Techniques
Optional Reading
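As a quick stand-in experiment, here is a minimal sketch of a logistic-regression digit classifier using scikit-learn's small built-in digits dataset (8x8 images, not the full 28x28 MNIST set, so the accuracy will differ from the slide's figures):

```python
# Logistic regression as a fast linear baseline for digit classification.
# Uses scikit-learn's bundled 8x8 digits dataset as a stand-in for MNIST.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = LogisticRegression(max_iter=1000)  # trains in seconds
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```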
Definition of 1-Dimensional OLS
- The Problem Statement
i indexes observations; we have N of them, i = 1…N
x is the independent variable (feature)
y is the dependent variable (output variable)
y = ax + b, where a and b are constants
\hat{y}_i = a x_i + b, or equivalently y_i = a x_i + b + \varepsilon_i
Two unknowns: we want to solve for a and b
The Loss Function
- L = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2
- The goal is to minimize this function
- Substituting \hat{y}_i = a x_i + b, the equation becomes:
- L = \sum_{i=1}^{N} (y_i - a x_i - b)^2
- This is the equation we want to minimize
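As a minimal sketch, this squared-error loss can be computed directly in NumPy (the array values here are arbitrary illustrative data):

```python
import numpy as np

def ols_loss(a, b, x, y):
    """Sum of squared residuals: L = sum_i (y_i - a*x_i - b)**2."""
    residuals = y - (a * x + b)
    return np.sum(residuals ** 2)

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])  # roughly y = 2x + 1
print(ols_loss(2.0, 1.0, x, y))     # small loss near the true line
```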
Solution of OLS
- Derivation
L = \sum_{i=1}^{N} (y_i - a x_i - b)^2
- We want to minimize L
- Take the derivative of the loss function with respect to each variable and set it to zero:
\frac{\partial L}{\partial a} = 0, \quad \frac{\partial L}{\partial b} = 0
\frac{\partial L}{\partial a} = \sum_{i=1}^{N} 2 (y_i - a x_i - b)(-x_i) = 0
\Rightarrow \sum_{i=1}^{N} x_i y_i - a \sum_{i=1}^{N} x_i^2 - b \sum_{i=1}^{N} x_i = 0
Solution of OLS
- Derivation
\frac{\partial L}{\partial b} = \sum_{i=1}^{N} 2 (y_i - a x_i - b)(-1) = 0
\Rightarrow \sum_{i=1}^{N} y_i - a \sum_{i=1}^{N} x_i - bN = 0
b = \frac{1}{N} \sum_{i=1}^{N} y_i - \frac{a}{N} \sum_{i=1}^{N} x_i
This is the closed-form solution for b (the mean of y minus a times the mean of x)
Solution of OLS
- Derivation
From the first equation,
\frac{\partial L}{\partial a} = \sum_{i=1}^{N} x_i y_i - a \sum_{i=1}^{N} x_i^2 - b \sum_{i=1}^{N} x_i = 0
\Rightarrow \sum_{i=1}^{N} x_i y_i = a \sum_{i=1}^{N} x_i^2 + \sum_{i=1}^{N} x_i \left( \frac{1}{N} \sum_{i=1}^{N} y_i - \frac{a}{N} \sum_{i=1}^{N} x_i \right)
a = \frac{\sum_{i=1}^{N} x_i y_i - \frac{1}{N} \sum_{i=1}^{N} x_i \sum_{i=1}^{N} y_i}{\sum_{i=1}^{N} x_i^2 - \frac{1}{N} \left( \sum_{i=1}^{N} x_i \right)^2}
This is the closed form solution for a
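A minimal sketch of these closed-form expressions in NumPy (the sample data are arbitrary illustrative values, and ols_1d is a hypothetical helper name):

```python
import numpy as np

def ols_1d(x, y):
    """Closed-form 1-D OLS: returns slope a and intercept b."""
    n = len(x)
    a = (np.sum(x * y) - np.sum(x) * np.sum(y) / n) / \
        (np.sum(x ** 2) - np.sum(x) ** 2 / n)
    b = np.mean(y) - a * np.mean(x)
    return a, b

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])   # roughly y = 2x + 1
a, b = ols_1d(x, y)
print(a, b)                          # close to 2 and 1
# Cross-check against NumPy's built-in least-squares fit:
print(np.polyfit(x, y, 1))           # [a, b]
```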
Solution of OLS
- Optimal Choices
Interpretation
- Interpretation of a and b
a is the slope of the line (the tangent of the angle θ): the effect of the independent variable on the dependent variable
b is the intercept of the line
[Figure: fitted line \hat{y} = ax + b, with x the independent variable, y the dependent variable, and θ the angle the line makes with the x-axis]
Interpretation
- Interpretation of L
L = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2
Expresses how well the solution captures the variation in the data:
R^2 = 1 - \mathrm{MSE}/\mathrm{Var}(y), \quad R^2 \in [0, 1]
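A minimal sketch of computing R^2 for a fitted line, reusing the hypothetical ols_1d helper from the earlier sketch:

```python
import numpy as np

def r_squared(x, y, a, b):
    """R^2 = 1 - MSE / Var(y) for the fit y_hat = a*x + b."""
    y_hat = a * x + b
    mse = np.mean((y - y_hat) ** 2)
    return 1.0 - mse / np.var(y)

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
a, b = ols_1d(x, y)              # helper from the previous sketch
print(r_squared(x, y, a, b))     # close to 1 for near-linear data
```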
Anscombe's Quartet
[Figure: the four scatter plots of Anscombe's Quartet, each with the same best-fit line]
- Same values for mean, variance, and best-fit line
- R^2 values are the same for each example
- But linear regression may not be the best choice for the last three examples
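As a sketch of this point, fitting each of the four datasets gives nearly identical coefficients; this assumes seaborn's bundled "anscombe" example dataset is reachable (load_dataset fetches it from the seaborn-data repository):

```python
# Fit OLS to each of Anscombe's four datasets; the coefficients come out
# nearly identical even though the scatter plots look very different.
import numpy as np
import seaborn as sns

df = sns.load_dataset("anscombe")
for name, group in df.groupby("dataset"):
    a, b = np.polyfit(group["x"], group["y"], 1)
    print(f"dataset {name}: slope={a:.2f}, intercept={b:.2f}")
```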
Multivariable OLS
- Definition of Model
- Data Matrix
- The Loss Function
Multivariable OLS
- i = an observation
- N = number of observations
- i = 1…N
- M = number of features
- xi = [xi1, xi2, …, xiM]
- yi - dependent variable
- Data matrix: X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1M} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{NM} \end{bmatrix}
Multivariable OLS
- y = ax + b \cdot \mathbf{1}
- Add a column of all 1's to the left of the data matrix so the bias term is included
- y_i = B_0 + B_1 x_{i1} + B_2 x_{i2} + \dots + B_M x_{iM}
- \hat{y}_i = x_i \cdot B, where B = [B_0, \dots, B_M]^T, so \hat{y} = XB
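A minimal sketch of building the augmented design matrix and evaluating \hat{y} = XB (the numbers are arbitrary illustrative values):

```python
import numpy as np

# Arbitrary illustrative data: N=4 observations, M=2 features.
X_raw = np.array([[1.0, 2.0],
                  [2.0, 0.5],
                  [3.0, 1.5],
                  [4.0, 3.0]])

# Prepend a column of ones so B[0] acts as the bias term B_0.
X = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])

B = np.array([1.0, 2.0, -0.5])   # [B0, B1, B2], made-up coefficients
y_hat = X @ B                    # y_hat_i = B0 + B1*x_i1 + B2*x_i2
print(y_hat)
```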
Multivariable OLS
- Loss Function
L = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2
- We still want to minimize L:
L = \sum_{i=1}^{N} \left( y_i - (B_0 + B_1 x_{i1} + \dots + B_M x_{iM}) \right)^2
L = \sum_{i=1}^{N} (y_i - x_i B)^2
- In norm notation, L is the squared L2 norm of the residual vector:
L = \| y - XB \|_2^2 = (y - XB)^T (y - XB)
Optimization
- A Few Facts from Matrix Calculus
\frac{d(ax)}{dx} = a
\frac{d(ax^2)}{dx} = 2ax
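These scalar facts have vector analogues; as a note not stated on the slide, the standard matrix-calculus identities used in the next step (for a constant vector a and a symmetric matrix A) are:

```latex
% Matrix analogues of the scalar derivative facts above,
% for a constant vector a and a symmetric matrix A:
\frac{\partial\, (a^{T} B)}{\partial B} = a,
\qquad
\frac{\partial\, (B^{T} A B)}{\partial B} = 2 A B .
```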
Optimization
- Minimizing the Loss
L = (y - XB)^T (y - XB)
\frac{\partial L}{\partial B} = 0
\frac{\partial}{\partial B} (y - XB)^T (y - XB) = 0
\frac{\partial}{\partial B} \left( y^T y - y^T XB - B^T X^T y + B^T X^T X B \right) = 0 \qquad \text{(using } (XY)^T = Y^T X^T \text{)}
-(X^T y) - (X^T y) + 2 (X^T X) B = 0
X^T y = (X^T X) B
B = (X^T X)^{-1} X^T y
(assuming X^T X is invertible, which is true when X has full rank, i.e., its columns are linearly independent)
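A minimal sketch of the normal equations in NumPy; the true coefficients below are made up, and in practice np.linalg.solve (or lstsq) is preferred over forming the explicit inverse:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 1 + 2*x1 - 0.5*x2 + noise (made-up coefficients).
N, M = 100, 2
X_raw = rng.normal(size=(N, M))
y = 1.0 + 2.0 * X_raw[:, 0] - 0.5 * X_raw[:, 1] + 0.1 * rng.normal(size=N)

X = np.hstack([np.ones((N, 1)), X_raw])   # bias column

# Normal equations: (X^T X) B = X^T y.
B = np.linalg.solve(X.T @ X, X.T @ y)
print(B)  # approximately [1.0, 2.0, -0.5]

# Equivalent, numerically safer route:
B_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(B_lstsq)
```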
OLS Pros and Cons
- OLS
Pros
- Efficient to compute
- Unique minimum
- Stable under perturbation of the data
- Easy to interpret
Cons
- Influenced by outliers (see the sketch below)
- (X^T X)^{-1} may not exist, since features may not be linearly independent
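As a small sketch of the outlier sensitivity (arbitrary illustrative data), corrupting a single observation noticeably shifts the fitted line:

```python
import numpy as np

x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0                   # exact line y = 2x + 1

print(np.polyfit(x, y, 1))          # [2.0, 1.0]

y_outlier = y.copy()
y_outlier[-1] += 30.0               # corrupt one observation
print(np.polyfit(x, y_outlier, 1))  # slope and intercept shift noticeably
```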
Summary
- Linear Models
- 1D Ordinary Least Squares (OLS)
- Solution of OLS
- Interpretation
- Anscombe’s Quartet
- Multivariate OLS
- OLS Pros and Cons