
Machine Learning 1: Linear Regression. Stefano Ermon, March 31, 2016.



  1. Machine Learning 1: Linear Regression. Stefano Ermon, March 31, 2016.

  2. Plan for today: supervised machine learning (linear regression).

  3. Renewable electricity generation in the U.S. Source: Renewable Energy Data Book, NREL.

  4. Challenges for the grid. Wind and solar are intermittent, so we still need traditional power plants when the wind stops. Many power plants (e.g., nuclear) cannot be easily turned on/off or quickly ramped up/down. With more accurate forecasts, wind and solar power become more efficient alternatives. A few years ago, Xcel Energy (Colorado) ran ads opposing a proposal that it draw 10% of its power from renewable sources; thanks to wind-forecasting (ML) algorithms developed at NCAR, it now aims for 30%. Accurate forecasting saved the utility $6-$10 million per year.

  5. Motivation. Solar and wind are intermittent. Can we accurately forecast how much energy we will consume tomorrow? This is difficult to estimate from "a priori" models, but we have lots of data from which to build a model.

  6. Typical electricity consumption. [Figure: hourly demand (GW) vs. hour of day for Feb 9, Jul 13, and Oct 10. Data: PJM, http://www.pjm.com]

  7. Predict peak demand from high temperature. What will peak demand be tomorrow? If we know something else about tomorrow (like the high temperature), we can use this to predict peak demand. [Figure: peak hourly demand (GW) vs. high temperature (F). Data: PJM, Weather Underground (summer months, June-August).]

  8. A simple model. A linear model that predicts demand: predicted peak demand = θ_1 · (high temperature) + θ_2. [Figure: observed data with the linear regression prediction overlaid.] Parameters of the model: θ_1, θ_2 ∈ R (here θ_1 = 0.046, θ_2 = −1.46).

  9. A simple model. We can use a model like this to make predictions. What will the peak demand be tomorrow? I know from the weather report that the high temperature will be 80 °F (ignore, for the moment, that this too is a prediction). Then the predicted peak demand is: θ_1 · 80 + θ_2 = 0.046 · 80 − 1.46 ≈ 2.2 GW.
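The prediction on slide 9 as one line of MATLAB, using the parameters from slide 8:

      theta = [0.046; -1.46];              % [theta_1; theta_2] from slide 8
      y_hat = theta(1) * 80 + theta(2)     % about 2.2 GW at a high of 80 F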

  10. Formal problem setting. Input: x_i ∈ R^n, i = 1, ..., m. E.g., x_i ∈ R^1 = {high temperature for day i}. Output: y_i ∈ R (a regression task). E.g., y_i ∈ R = {peak demand for day i}. Model parameters: θ ∈ R^k. Predicted output: ŷ_i ∈ R. E.g., ŷ_i = θ_1 · x_i + θ_2.

  11. For convenience, we define a function that maps inputs to feature vectors, φ : R^n → R^k. For example, in our task above, if we define φ(x_i) = [x_i; 1] (here n = 1, k = 2), then we can write ŷ_i = Σ_{j=1}^k θ_j · φ_j(x_i) ≡ θ^T φ(x_i).
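Stacking the feature vectors of several inputs as rows gives predictions for a whole dataset at once; a minimal MATLAB sketch with hypothetical temperatures:

      phi = @(x) [x; 1];                   % phi: R^1 -> R^2 (n = 1, k = 2)
      theta = [0.046; -1.46];              % parameters from slide 8
      X = [68; 75; 80; 92];                % hypothetical high temperatures (F)
      Phi = [X ones(size(X,1),1)];         % row i is phi(X(i))'
      y_hat = Phi * theta                  % y_hat(i) = theta' * phi(X(i))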

  12. Loss functions. We want a model that performs "well" on the data we have, i.e., ŷ_i ≈ y_i for all i. We measure the "closeness" of ŷ_i and y_i using a loss function ℓ : R × R → R_+. Example: the squared loss, ℓ(ŷ_i, y_i) = (ŷ_i − y_i)^2.
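For concreteness, the squared loss as a MATLAB one-liner (the observed demand of 2.35 GW is a hypothetical value):

      loss = @(y_hat, y) (y_hat - y).^2;   % squared loss
      loss(2.2, 2.35)                      % ans = 0.0225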

  13. Finding model parameters, and optimization. We want to find model parameters that minimize the sum of losses over all input/output pairs: J(θ) = Σ_{i=1}^m ℓ(ŷ_i, y_i) = Σ_{i=1}^m (θ^T φ(x_i) − y_i)^2. We write our objective formally as: minimize_θ J(θ). This is a simple example of an optimization problem; these will dominate our development of algorithms throughout the course.

  14. How do we optimize a function? Search algorithm: start with an initial guess for θ, then keep changing θ (by a little bit) to reduce J(θ). Animation: https://www.youtube.com/watch?v=vWFjqgb-ylQ

  15. Gradient descent. Search algorithm: start with an initial guess for θ, then keep changing θ (by a little bit) to reduce J(θ) = Σ_{i=1}^m ℓ(ŷ_i, y_i) = Σ_{i=1}^m (θ^T φ(x_i) − y_i)^2. Gradient descent updates θ_j := θ_j − α ∂J(θ)/∂θ_j, for all j, where

      ∂J/∂θ_j = ∂/∂θ_j Σ_{i=1}^m (θ^T φ(x_i) − y_i)^2
              = Σ_{i=1}^m ∂/∂θ_j (θ^T φ(x_i) − y_i)^2
              = Σ_{i=1}^m 2 (θ^T φ(x_i) − y_i) · ∂/∂θ_j (θ^T φ(x_i) − y_i)
              = Σ_{i=1}^m 2 (θ^T φ(x_i) − y_i) φ_j(x_i)

  16. Gradient descent. Repeat until "convergence": θ_j := θ_j − α Σ_{i=1}^m 2 (θ^T φ(x_i) − y_i) φ_j(x_i), for all j. Demo: https://lukaszkujawa.github.io/gradient-descent.html. See also: stochastic gradient descent. A sketch of this update in code follows below.
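A minimal MATLAB sketch of the batch update; the data, step size, and iteration count are hypothetical choices, and the input is standardized so that plain gradient descent converges quickly:

      % hypothetical toy data: high temperatures (F) and peak demands (GW)
      X = [68; 75; 80; 88; 95];
      y = [1.7; 2.0; 2.2; 2.6; 2.9];
      x = (X - mean(X)) / std(X);          % standardize for better conditioning
      Phi = [x ones(size(x,1),1)];         % row i is phi(x_i)'
      alpha = 0.01;                        % step size (hypothetical)
      theta = zeros(2,1);                  % initial guess
      for t = 1:1000
          grad = 2 * Phi' * (Phi*theta - y);   % sum_i 2(theta'phi(x_i)-y_i)phi(x_i)
          theta = theta - alpha * grad;
      end
      theta                                % close to the exact solution Phi \ y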

  17. Let's write J(θ) a little more compactly using matrix notation. Define

      Φ ∈ R^{m×k} = [φ(x_1)^T; φ(x_2)^T; ...; φ(x_m)^T],   y ∈ R^m = [y_1; y_2; ...; y_m]

  (the rows of Φ are the feature vectors φ(x_i)^T). Then

      J(θ) = Σ_{i=1}^m (θ^T φ(x_i) − y_i)^2 = ‖Φθ − y‖_2^2

  where ‖z‖_2 is the ℓ_2 norm of a vector: ‖z‖_2 ≡ sqrt(Σ_i z_i^2) = sqrt(z^T z). J(θ) is called the least-squares objective function.
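A quick MATLAB check that the vectorized form equals the sum, using hypothetical values:

      Phi = [68 1; 75 1; 80 1];            % hypothetical feature matrix
      y = [1.7; 2.0; 2.2];
      theta = [0.046; -1.46];
      J_loop = 0;
      for i = 1:size(Phi,1)
          J_loop = J_loop + (theta' * Phi(i,:)' - y(i))^2;
      end
      J_vec = norm(Phi*theta - y)^2;       % ||Phi*theta - y||_2^2
      [J_loop J_vec]                       % the two values agree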

  18. How do we optimize a function? 1-D case (θ ∈ R): take J(θ) = θ^2 − 2θ − 1, so dJ/dθ = 2θ − 2. At the minimum θ*, dJ/dθ |_{θ*} = 0, i.e., 2θ* − 2 = 0, so θ* = 1. [Figure: plots of J(θ) and of dJ/dθ.]
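A numeric confirmation of the 1-D example (fminsearch is a generic minimizer, used here only as a check):

      J = @(th) th.^2 - 2*th - 1;          % J(theta) = theta^2 - 2 theta - 1
      th_star = fminsearch(J, 0)           % returns approximately 1
      % consistent with solving dJ/dtheta = 2*theta - 2 = 0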

  19. Multivariate case: θ ∈ R^k, J : R^k → R. Generalized condition: ∇_θ J(θ) |_{θ*} = 0, where ∇_θ J(θ) ∈ R^k denotes the gradient of J with respect to θ, i.e., the vector of partial derivatives [∂J/∂θ_1; ∂J/∂θ_2; ...; ∂J/∂θ_k]. Some important rules and common gradients:

      ∇_θ (a f(θ) + b g(θ)) = a ∇_θ f(θ) + b ∇_θ g(θ)   (a, b ∈ R)
      ∇_θ (θ^T A θ) = (A + A^T) θ                        (A ∈ R^{k×k})
      ∇_θ (b^T θ) = b                                    (b ∈ R^k)
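The quadratic-form rule can be verified by finite differences; a MATLAB sketch with a hypothetical A and θ:

      % check the rule grad(theta'*A*theta) = (A + A')*theta numerically
      k = 3;
      A = magic(k); theta = [1; -2; 0.5];  % hypothetical test values
      f = @(t) t' * A * t;
      g_exact = (A + A') * theta;
      h = 1e-6; g_num = zeros(k,1);
      for j = 1:k
          e = zeros(k,1); e(j) = 1;
          g_num(j) = (f(theta + h*e) - f(theta - h*e)) / (2*h);
      end
      [g_exact g_num]                      % columns agree up to rounding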

  20. Optimizing the least-squares objective:

      J(θ) = ‖Φθ − y‖_2^2 = (Φθ − y)^T (Φθ − y) = θ^T Φ^T Φ θ − 2 y^T Φ θ + y^T y

  Using the previous gradient rules:

      ∇_θ J(θ) = ∇_θ (θ^T Φ^T Φ θ − 2 y^T Φ θ + y^T y)
               = ∇_θ (θ^T Φ^T Φ θ) − 2 ∇_θ (y^T Φ θ) + ∇_θ (y^T y)
               = 2 Φ^T Φ θ − 2 Φ^T y

  Setting the gradient equal to zero:

      2 Φ^T Φ θ* − 2 Φ^T y = 0  ⟺  θ* = (Φ^T Φ)^{−1} Φ^T y

  known as the normal equations.

  21. Let's see how this looks in MATLAB code:

      X = load('high_temperature.txt');
      y = load('peak_demand.txt');
      n = size(X,2);
      m = size(X,1);
      Phi = [X ones(m,1)];
      theta = inv(Phi' * Phi) * Phi' * y;

      theta =
          0.0466
         -1.4600

  The normal equations are so common that MATLAB has a special operation for them:

      % same as inv(Phi' * Phi) * Phi' * y
      theta = Phi \ y;

  22. Higher-dimensional inputs. Input: x ∈ R^2 = [temperature; hour of day]. Output: y ∈ R = demand.

  23. [Figure]

  24. Features: φ(x) ∈ R^3 = [temperature; hour of day; 1]. Same matrices as before: Φ ∈ R^{m×k} with rows φ(x_i)^T, and y ∈ R^m = [y_1; ...; y_m]. Same solution as before: θ ∈ R^3 = (Φ^T Φ)^{−1} Φ^T y. A MATLAB sketch follows below.
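Following the pattern of slide 21, the higher-dimensional fit is the same two steps; a sketch with hypothetical file names:

      % hypothetical files: one column each of temperature and hour of day
      X = [load('temperature.txt') load('hour_of_day.txt')];
      y = load('demand.txt');
      m = size(X,1);
      Phi = [X ones(m,1)];                 % phi(x) = [temperature; hour of day; 1]
      theta = Phi \ y                      % theta in R^3, via the normal equations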

  25. [Figure]
