  1. Linear Regression CSCI 447/547 MACHINE LEARNING

  2. Outline  Linear Models  1D Ordinary Least Squares (OLS)  Solution of OLS  Interpretation  Anscombe’s Quartet  Multivariate OLS  OLS Pros and Cons

  3. Optional Reading

  4. Terminology  Features (Covariates or predictors)  Labels (Variates or targets)  Regression  Classification

  5. Types of Machine Learning  Unsupervised: finding structure in data  Supervised: predicting an output from given data  [Figure: height vs. weight scatter plots, with separate clusters for women and men]  Classification (e.g., Logistic Regression): categorical output data  Regression (e.g., OLS): continuous output data (prediction)

  6. What is a Linear Model?  Predict Housing Prices  Depends on:  Area  # of bedrooms  # of bathrooms  Hypothesis is that the relationship is linear  Price = k_1·(Area) + k_2·(#bed) + k_3·(#bath)  In general: ŷ_i = a_0 + a_1 x_1 + a_2 x_2 + …
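To make the hypothesis concrete, here is a minimal sketch of this kind of linear prediction in NumPy; the coefficient values are made up purely for illustration:

```python
import numpy as np

# Hypothetical coefficients for the housing example (illustrative only):
# price = k1*(area) + k2*(#bed) + k3*(#bath)
k = np.array([150.0, 10000.0, 5000.0])  # assumed $/sqft, $/bedroom, $/bathroom

x = np.array([2000.0, 3.0, 2.0])        # area, bedrooms, bathrooms
price = k @ x                           # linear model: weighted sum of features
print(price)                            # 2000*150 + 3*10000 + 2*5000 = 340000.0
```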

  7. Why Use Linear Models?  Interpretable  Relationships are easy to see  Low Complexity  Prevents overfitting  Scalable  Scale up to more data, larger problems  Baseline  Can benchmark other methods against them

  8. Examples of Use  MNIST dataset – handwritten digits  Best performance – neural networks and regularization  99.79% accurate  Takes about a day to train  More difficult to build  Logistic Regression  92.5% accurate  Takes seconds to train  Can be built with less expertise  Building Blocks of Later Techniques
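As a rough illustration of the logistic-regression side of this comparison, the sketch below fits scikit-learn's LogisticRegression to its bundled 8×8 digits dataset. This is a small stand-in for full MNIST, so the accuracy will not match the 92.5% figure from the slide, but it does train in seconds:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load the small 8x8 digits dataset bundled with scikit-learn.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000)  # trains in seconds
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))         # typically around 0.95 on this dataset
```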

  9. Optional Reading

  10. Definition of 1-Dimensional OLS  The Problem Statement  i is an observation; we have N of them: i = 1…N  x is the independent variable (feature)  y is the dependent variable (output variable)  y = ax + b, where a and b are constants  ŷ_i = a x_i + b, OR y_i = a x_i + b + ε_i  Two unknowns – want to solve for a and b
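A small sketch of this data-generating model, with made-up values assumed for a, b, and the noise scale; the arrays x and y are reused in later snippets:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
a_true, b_true = 2.0, 1.0       # assumed "true" slope and intercept

x = rng.uniform(0, 10, size=N)  # independent variable (feature)
eps = rng.normal(0, 1, size=N)  # noise term
y = a_true * x + b_true + eps   # y_i = a*x_i + b + eps_i
```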

  11. The Loss Function  L = ∑_{i=1}^N (y_i − ŷ_i)^2  Goal is to minimize this function  Using ŷ_i = a x_i + b, the equation becomes:  L = ∑_{i=1}^N (y_i − a x_i − b)^2  So this is the equation we want to minimize
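The loss is direct to compute; a minimal sketch, assuming x and y are NumPy arrays as generated above:

```python
def ols_loss(a, b, x, y):
    """Sum of squared residuals: L = sum_i (y_i - (a*x_i + b))^2."""
    residuals = y - (a * x + b)
    return float((residuals ** 2).sum())
```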

  12. Solution of OLS  Derivation  L = ∑_{i=1}^N (y_i − a x_i − b)^2  Want to minimize L  Take the derivative of the loss function with respect to each variable:  ∂L/∂a = 0, ∂L/∂b = 0  ∂L/∂a = ∑_{i=1}^N 2(y_i − a x_i − b)(−x_i) = 0  ⇒ ∑_{i=1}^N x_i y_i − a ∑_{i=1}^N x_i^2 − b ∑_{i=1}^N x_i = 0

  13. Solution of OLS  Derivation  ∂L/∂b = ∑_{i=1}^N 2(y_i − a x_i − b)(−1) = 0  ⇒ ∑_{i=1}^N y_i − a ∑_{i=1}^N x_i − bN = 0  b = (1/N) ∑_{i=1}^N y_i − a (1/N) ∑_{i=1}^N x_i  This is the closed-form solution for b

  14. Solution of OLS  Derivation  From the first equation:  ∑_{i=1}^N x_i y_i − a ∑_{i=1}^N x_i^2 − b ∑_{i=1}^N x_i = 0  ⇒ ∑_{i=1}^N x_i y_i = a ∑_{i=1}^N x_i^2 + (∑_{i=1}^N x_i)·((1/N) ∑_{i=1}^N y_i − a (1/N) ∑_{i=1}^N x_i)  a = (∑_{i=1}^N x_i y_i − (1/N) ∑_{i=1}^N x_i ∑_{i=1}^N y_i) / (∑_{i=1}^N x_i^2 − (1/N)(∑_{i=1}^N x_i)^2)  This is the closed-form solution for a
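Putting the two closed-form expressions together, a minimal NumPy implementation (the function name ols_1d is ours):

```python
import numpy as np

def ols_1d(x, y):
    """Closed-form 1D OLS: returns slope a and intercept b."""
    N = len(x)
    a = (np.sum(x * y) - np.sum(x) * np.sum(y) / N) / \
        (np.sum(x ** 2) - np.sum(x) ** 2 / N)
    b = np.mean(y) - a * np.mean(x)  # b = y-bar - a * x-bar
    return a, b
```

Its output can be sanity-checked against np.polyfit(x, y, 1), which fits the same line.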

  15. Solution of OLS  Optimal Choices

  16. Interpretation  Interpretation of a and b  a is the slope of the line (the tangent of the angle θ)  it measures the effect of the independent variable on the dependent variable  b is the intercept of the line  [Figure: fitted line at angle θ; x-axis – independent variable, y-axis – dependent variable]

  17. Interpretation  Interpretation of L  L = ∑_{i=1}^N (y_i − ŷ_i)^2  Expresses how well the solution captures the variation in the data  R^2 = 1 − MSE/Var(y)  R^2 ∈ [0, 1]
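A minimal sketch of this statistic; since MSE and Var(y) share the same 1/N factor, it reduces to 1 minus the ratio of sums of squares:

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = 1 - MSE/Var(y), equivalently 1 - SS_res/SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)          # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)     # total variation in y
    return 1.0 - ss_res / ss_tot
```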

  18. Interpretation

  19. Anscombe’s Quartet

  20. Anscombe’s Quartet  Same values for mean, variance, and best-fit line  R^2 values are the same for each example  But … linear regression may not be the best choice for the last three examples
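seaborn ships a copy of Anscombe's quartet, so the matching summary statistics can be verified directly. A sketch (note that load_dataset fetches a small CSV, so it needs network access):

```python
import seaborn as sns
import numpy as np

df = sns.load_dataset("anscombe")        # columns: dataset, x, y
for name, g in df.groupby("dataset"):
    a, b = np.polyfit(g.x, g.y, 1)       # best-fit line for each panel
    print(name, round(g.x.mean(), 2), round(g.y.mean(), 2),
          round(a, 2), round(b, 2))      # nearly identical across all four
```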

  21. Multivariable OLS  Definition of Model  Data Matrix  The Loss Function

  22. Multivariable OLS  i = an observation  N = number of observations  i = 1…N  M = number of features  x_i = [x_i1, x_i2, …, x_iM]  y_i – dependent variable  Data matrix: X = [x_11 x_12 … x_1M; … ; x_N1 x_N2 … x_NM] (N rows of observations, M columns of features)

  23. Multivariable OLS  Data matrix: X = [x_11 x_12 … x_1M; … ; x_N1 x_N2 … x_NM]  In 1D: y = ax + b·1 (1 is a vector of ones)  Add a column of all 1’s to the left of the data matrix so the bias term is included  ŷ_i = B_0 + B_1 x_i1 + B_2 x_i2 + … + B_M x_iM  ŷ_i = x_i · B, where B = [B_0, …, B_M]^T, so ŷ = XB
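A minimal sketch of building the design matrix with the prepended column of ones, using a few made-up rows from the housing example:

```python
import numpy as np

# Hypothetical N x M raw feature matrix: area, bedrooms, bathrooms per house
x_feats = np.array([[2000.0, 3.0, 2.0],
                    [1500.0, 2.0, 1.0],
                    [2400.0, 4.0, 3.0]])

# Prepend a column of ones so B_0 acts as the bias/intercept term.
X = np.hstack([np.ones((x_feats.shape[0], 1)), x_feats])  # shape (N, M+1)
```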

  24. Multivariable OLS  Loss Function  L = ∑_{i=1}^N (y_i − ŷ_i)^2  Still want to minimize L  L = ∑_{i=1}^N (y_i − (B_0 + B_1 x_i1 + … + B_M x_iM))^2  L = ∑_{i=1}^N (y_i − x_i B)^2  In norm notation – the squared L2 norm of the residual vector:  L = ‖y − XB‖_2^2  L = (y − XB)^T (y − XB)

  25. Optimization  A Few Facts from Matrix Calculus  d(ax) = a dx  d(ax^2) = 2ax dx  (scalar analogues; the same rules carry over to the matrix derivatives on the next slide)

  26. Optimization  Minimizing the Loss  L = (y − XB)^T (y − XB)  ∂L/∂B = 0  ∂/∂B [(y − XB)^T (y − XB)] = 0  ∂/∂B [y^T y − y^T XB − B^T X^T y + B^T X^T XB] = 0 (using (XY)^T = Y^T X^T)  −(X^T y) − (X^T y) + 2(X^T X)B = 0  X^T y = (X^T X)B  B = (X^T X)^{-1} X^T y (assuming X^T X is invertible, which is true if X has full column rank, i.e., its columns are linearly independent)
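A direct NumPy translation of the normal equations, assuming X already includes the column of ones and has full column rank:

```python
import numpy as np

def ols_normal_equations(X, y):
    """Solve (X^T X) B = X^T y for B; assumes X^T X is invertible."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# In practice, np.linalg.lstsq is preferred for numerical stability:
# B, *_ = np.linalg.lstsq(X, y, rcond=None)
```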

  27. OLS Pros and Cons  OLS  Pros  Efficient to compute  Unique minimum  Stable under perturbation of data  Easy to interpret  Cons  Influenced by outliers  (X^T X)^{-1} may not exist  Features may not be linearly independent
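When (X^T X)^{-1} does not exist because of linearly dependent features, one common fallback (not covered on the slide) is the Moore–Penrose pseudoinverse, which returns the minimum-norm least-squares solution; a one-line sketch:

```python
import numpy as np

# Works even when X^T X is singular (linearly dependent columns):
def ols_pinv(X, y):
    return np.linalg.pinv(X) @ y  # minimum-norm least-squares solution
```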

  28. Summary  Linear Models  1D Ordinary Least Squares (OLS)  Solution of OLS  Interpretation  Anscombe’s Quartet  Multivariate OLS  OLS Pros and Cons
