  1. Machine Learning - Regressions Amir H. Payberah payberah@kth.se 07/11/2018

  2. The Course Web Page https://id2223kth.github.io 1 / 81

  3. Where Are We? 2 / 81

  4. Where Are We? 3 / 81

  5. Let’s Start with an Example 4 / 81

  6. The Housing Price Example (1/3)
     Living area   No. of bedrooms   Price
     2104          3                 400
     1600          3                 330
     2400          3                 369
     ...           ...               ...
     ◮ Given a dataset of m houses.
     ◮ Predict the prices of other houses as a function of their living area and number of bedrooms. 5 / 81

  7. The Housing Price Example (2/3)
     Living area   No. of bedrooms   Price
     2104          3                 400
     1600          3                 330
     2400          3                 369
     ...           ...               ...
     x^(1) = [2104, 3]⊺, y^(1) = 400    x^(2) = [1600, 3]⊺, y^(2) = 330    x^(3) = [2400, 3]⊺, y^(3) = 369
     X = [x^(1)⊺; x^(2)⊺; x^(3)⊺; ...] = [2104 3; 1600 3; 2400 3; ...]    y = [400; 330; 369; ...]
     ◮ x^(i) ∈ R^2: x_1^(i) is the living area, and x_2^(i) is the number of bedrooms of the i-th house in the training set. 6 / 81

  8. The Housing Price Example (3/3)
     Living area   No. of bedrooms   Price
     2104          3                 400
     1600          3                 330
     2400          3                 369
     ...           ...               ...
     ◮ Predict the prices of other houses ŷ as a function of the size of their living areas x_1 and number of bedrooms x_2, i.e., ŷ = f(x_1, x_2).
     ◮ E.g., what is ŷ, if x_1 = 4000 and x_2 = 4?
     ◮ As an initial choice: ŷ = f_w(x) = w_1 x_1 + w_2 x_2. 7 / 81

  9. Linear Regression 8 / 81

  10. Linear Regression (1/2)
      ◮ Our goal: to build a system that takes input x ∈ R^n and predicts output ŷ ∈ R.
      ◮ In linear regression, the output ŷ is a linear function of the input x:
        ŷ = f_w(x) = w_1 x_1 + w_2 x_2 + · · · + w_n x_n
        ŷ = w⊺x
        • ŷ: the predicted value
        • n: the number of features
        • x_i: the i-th feature value
        • w_j: the j-th model parameter (w ∈ R^n) 9 / 81
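
      To make the notation concrete, here is a minimal sketch of the prediction step using Breeze (the library used in the worked example later in the deck); the weight values are made up purely for illustration:

        import breeze.linalg._

        // Hypothetical weights for two features (living area, no. of bedrooms).
        val w = DenseVector(0.06, 100.0)
        // One input example x ∈ R^2.
        val x = DenseVector(2104.0, 3.0)

        // The prediction is the inner product w⊺x.
        val yHat = w dot x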

  11. Linear Regression (2/2)
      ◮ Linear regression often has one additional parameter, called intercept b:
        ŷ = w⊺x + b
      ◮ Instead of adding the bias parameter b, we can augment x with an extra entry that is always set to 1:
        ŷ = f_w(x) = w_0 x_0 + w_1 x_1 + w_2 x_2 + · · · + w_n x_n, where x_0 = 1 10 / 81
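
      A small sketch of the same bias trick in Breeze, again with made-up weights: once the input is augmented with a constant 1, the intercept is just w_0.

        import breeze.linalg._

        // Hypothetical weights, with the intercept absorbed as w0.
        val w = DenseVector(-70.0, 0.06, 100.0)   // (w0, w1, w2)

        // Augment the input with x0 = 1 so that w⊺x already includes the intercept.
        val x = DenseVector(2104.0, 3.0)
        val xAug = DenseVector.vertcat(DenseVector(1.0), x)

        val yHat = w dot xAug   // w0 * 1 + w1 * x1 + w2 * x2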

  12. Linear Regression - Model Parameters
      ◮ Parameters w ∈ R^n are values that control the behavior of the model.
      ◮ w is a set of weights that determine how each feature affects the prediction.
        • w_i > 0: increasing the value of the feature x_i increases the value of our prediction ŷ.
        • w_i < 0: increasing the value of the feature x_i decreases the value of our prediction ŷ.
        • w_i = 0: the value of the feature x_i has no effect on the prediction ŷ. 11 / 81

  13. How to Learn Model Parameters w ? 12 / 81

  14. Linear Regression - Cost Function (1/2)
      ◮ A reasonable model should make ŷ close to y, at least for the training dataset.
      ◮ Residual: the difference between the dependent variable y and the predicted value ŷ:
        r^(i) = y^(i) − ŷ^(i) 13 / 81
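
      As an illustration, the residuals can be computed element-wise over the whole training set; the predictions below are made up:

        import breeze.linalg._

        val y    = DenseVector(400.0, 330.0, 369.0)   // labels
        val yHat = DenseVector(380.0, 340.0, 365.0)   // hypothetical predictions

        // Residuals r(i) = y(i) - yHat(i).
        val r = y - yHat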

  15. Linear Regression - Cost Function (2/2)
      ◮ Cost function J(w):
        • For each value of w, it measures how close the ŷ^(i) is to the corresponding y^(i).
        • We can define J(w) as the mean squared error (MSE):
          J(w) = MSE(w) = (1/m) Σ_i (ŷ^(i) − y^(i))² = E[(ŷ − y)²] = (1/m) ||ŷ − y||²_2 14 / 81
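
      A minimal sketch of the MSE cost in Breeze, matching the formula above (the mse helper and the example values are made up for illustration):

        import breeze.linalg._

        // J(w) = (1/m) * ||yHat - y||^2, computed from the residual vector.
        def mse(yHat: DenseVector[Double], y: DenseVector[Double]): Double = {
          val r = yHat - y
          (r dot r) / y.length
        }

        val y    = DenseVector(400.0, 330.0, 369.0)
        val yHat = DenseVector(380.0, 340.0, 365.0)   // hypothetical predictions
        val cost = mse(yHat, y)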

  16. How to Learn Model Parameters? ◮ We want to choose w so as to minimize J ( w ). ◮ Two approaches to find w : • Normal equation • Gradient descent 15 / 81

  17. Normal Equation 16 / 81

  18. Derivatives and Gradient (1/3)
      ◮ The first derivative of f(x), shown as f′(x), gives the slope of the tangent line to the function at the point x.
      ◮ f(x) = x² ⇒ f′(x) = 2x
      ◮ If f(x) is increasing, then f′(x) > 0.
      ◮ If f(x) is decreasing, then f′(x) < 0.
      ◮ If f(x) is at a local minimum/maximum, then f′(x) = 0. 17 / 81
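
      A quick numerical sanity check of the example f(x) = x², f′(x) = 2x, using a central finite difference (plain Scala; the helper name is made up):

        val f: Double => Double = x => x * x

        // Central finite-difference approximation of f'(x).
        def numDerivative(f: Double => Double, x: Double, h: Double = 1e-6): Double =
          (f(x + h) - f(x - h)) / (2 * h)

        val x = 3.0
        val approx = numDerivative(f, x)   // ≈ 6.0
        val exact  = 2 * x                 //   6.0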

  19. Derivatives and Gradient (2/3)
      ◮ What if a function has multiple arguments, e.g., f(x_1, x_2, · · ·, x_n)?
      ◮ Partial derivatives: the derivative with respect to a particular argument.
        • ∂f/∂x_1: the derivative with respect to x_1
        • ∂f/∂x_2: the derivative with respect to x_2
      ◮ ∂f/∂x_i shows how much the function f will change, if we change x_i.
      ◮ Gradient: the vector of all partial derivatives for a function f:
        ∇_x f(x) = [∂f/∂x_1, ∂f/∂x_2, · · ·, ∂f/∂x_n]⊺ 18 / 81

  20. Derivatives and Gradient (3/3)
      ◮ What is the gradient of f(x_1, x_2, x_3) = x_1 − x_1 x_2 + x_3²?
        ∇_x f(x) = [∂f/∂x_1, ∂f/∂x_2, ∂f/∂x_3]⊺ = [1 − x_2, −x_1, 2 x_3]⊺ 19 / 81
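
      The same gradient can be coded directly and evaluated at an arbitrary point (a sketch; the grad helper is made up for illustration):

        import breeze.linalg._

        // Gradient of f(x1, x2, x3) = x1 - x1*x2 + x3^2.
        def grad(x: DenseVector[Double]): DenseVector[Double] =
          DenseVector(1.0 - x(1), -x(0), 2.0 * x(2))

        val g = grad(DenseVector(1.0, 2.0, 3.0))   // DenseVector(-1.0, -1.0, 6.0)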

  21. Normal Equation (1/2)
      ◮ To minimize J(w), we can simply solve for where its gradient is 0: ∇_w J(w) = 0.
      ◮ For a single example, ŷ = w⊺x. Stacking all m training examples as rows:
        X = [x^(1)⊺; x^(2)⊺; · · ·; x^(m)⊺], where x^(i)⊺ = [x_1^(i), x_2^(i), · · ·, x_n^(i)]
        ŷ = [ŷ^(1), ŷ^(2), · · ·, ŷ^(m)]⊺
      ◮ Then ŷ⊺ = w⊺X⊺, or equivalently ŷ = Xw. 20 / 81

  22. Normal Equation (2/2)
      ◮ To minimize J(w), we can simply solve for where its gradient is 0: ∇_w J(w) = 0.
        J(w) = (1/m) ||ŷ − y||²_2
        ∇_w J(w) = 0
        ⇒ ∇_w (1/m) ||ŷ − y||²_2 = 0
        ⇒ ∇_w (1/m) ||Xw − y||²_2 = 0
        ⇒ ∇_w (Xw − y)⊺(Xw − y) = 0
        ⇒ ∇_w (w⊺X⊺Xw − 2w⊺X⊺y + y⊺y) = 0
        ⇒ 2X⊺Xw − 2X⊺y = 0
        ⇒ w = (X⊺X)⁻¹X⊺y 21 / 81
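
      A practical aside, not from the slides: explicitly forming (X⊺X)⁻¹ can be numerically unstable when X⊺X is ill-conditioned, so a pseudo-inverse or a dedicated least-squares solver is usually preferred. A minimal sketch with Breeze:

        import breeze.linalg._

        // Least-squares fit via the Moore-Penrose pseudo-inverse instead of inv(X⊺X).
        def fitNormalEquation(X: DenseMatrix[Double], y: DenseVector[Double]): DenseVector[Double] =
          pinv(X) * y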

  23. Normal Equation - Example (1/7)
      Living area   No. of bedrooms   Price
      2104          3                 400
      1600          3                 330
      2400          3                 369
      1416          2                 232
      3000          4                 540
      ◮ Predict the value of ŷ, when x_1 = 4000 and x_2 = 4.
      ◮ We should find w_0, w_1, and w_2 in ŷ = w_0 + w_1 x_1 + w_2 x_2.
      ◮ w = (X⊺X)⁻¹X⊺y. 22 / 81

  24. Normal Equation - Example (2/7)
      Living area   No. of bedrooms   Price
      2104          3                 400
      1600          3                 330
      2400          3                 369
      1416          2                 232
      3000          4                 540

      X = [1 2104 3; 1 1600 3; 1 2400 3; 1 1416 2; 1 3000 4]    y = [400; 330; 369; 232; 540]

      import breeze.linalg._

      val X = new DenseMatrix(5, 3, Array(1.0, 1.0, 1.0, 1.0, 1.0,
                                          2104.0, 1600.0, 2400.0, 1416.0, 3000.0,
                                          3.0, 3.0, 3.0, 2.0, 4.0))
      val y = new DenseVector(Array(400.0, 330.0, 369.0, 232.0, 540.0))
      23 / 81

  25. Normal Equation - Example (3/7)
      X⊺X = [1 1 1 1 1; 2104 1600 2400 1416 3000; 3 3 3 2 4] × [1 2104 3; 1 1600 3; 1 2400 3; 1 1416 2; 1 3000 4]
          = [5 10520 15; 10520 23751872 33144; 15 33144 47]

      val Xt = X.t
      val XtX = Xt * X
      24 / 81

  26. Normal Equation - Example (4/7)
      (X⊺X)⁻¹ = [ 4.90366455e+00   7.48766737e-04  -2.09302326e+00;
                  7.48766737e-04   2.75281889e-06  -2.18023256e-03;
                 -2.09302326e+00  -2.18023256e-03   2.22674419e+00]

      val XtXInv = inv(XtX)
      25 / 81

  27. Normal Equation - Example (5/7)
      X⊺y = [1 1 1 1 1; 2104 1600 2400 1416 3000; 3 3 3 2 4] × [400; 330; 369; 232; 540]
          = [1871; 4203712; 5921]

      val Xty = Xt * y
      26 / 81

  28. Normal Equation - Example (6/7)
      w = (X⊺X)⁻¹X⊺y
        = [ 4.90366455e+00   7.48766737e-04  -2.09302326e+00;
            7.48766737e-04   2.75281889e-06  -2.18023256e-03;
           -2.09302326e+00  -2.18023256e-03   2.22674419e+00] × [1871; 4203712; 5921]
        = [-7.04346018e+01; 6.38433756e-02; 1.03436047e+02]

      val w = XtXInv * Xty
      27 / 81

  29. Normal Equation - Example (7/7)
      ◮ Predict the value of ŷ, when x_1 = 4000 and x_2 = 4:
        ŷ = −7.04346018e+01 + 6.38433756e−02 × 4000 + 1.03436047e+02 × 4 ≈ 599

      val test = new DenseVector(Array(1.0, 4000.0, 4.0))
      val yHat = w dot test
      28 / 81
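
      To sanity-check the fitted weights, one can also predict on the training matrix itself and look at the residuals (a sketch reusing the X, y, and w from the previous slides):

        // Predictions for the whole training set: yHat = Xw.
        val yHatTrain = X * w

        // Residuals and training MSE.
        val residuals = y - yHatTrain
        val trainMse = (residuals dot residuals) / y.length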

  30. Normal Equation in Spark

      case class house(x1: Long, x2: Long, y: Long)

      val trainData = Seq(house(2104, 3, 400), house(1600, 3, 330), house(2400, 3, 369),
                          house(1416, 2, 232), house(3000, 4, 540)).toDF
      val testData = Seq(house(4000, 4, 0)).toDF

      import org.apache.spark.ml.feature.VectorAssembler
      val va = new VectorAssembler().setInputCols(Array("x1", "x2")).setOutputCol("features")
      val train = va.transform(trainData)
      val test = va.transform(testData)

      import org.apache.spark.ml.regression.LinearRegression
      val lr = new LinearRegression().setFeaturesCol("features").setLabelCol("y").setSolver("normal")
      val lrModel = lr.fit(train)
      lrModel.transform(test).show
      29 / 81
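
      Two notes on this snippet, stated as assumptions rather than slide content: toDF needs the SparkSession implicits in scope (import spark.implicits._, available by default in spark-shell), and the fitted weights can be inspected directly on the model:

        // Assuming the lrModel fitted above:
        println(lrModel.coefficients)   // w1, w2 (should be close to the Breeze solution)
        println(lrModel.intercept)      // w0
        lrModel.transform(test).select("x1", "x2", "prediction").show()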
