CSC 411: Lecture 02: Linear Regression


  1. CSC 411: Lecture 02: Linear Regression
     Class based on Raquel Urtasun & Rich Zemel’s lectures
     Sanja Fidler, University of Toronto, Jan 13, 2016
     (Most plots in this lecture are from Bishop’s book)
     Urtasun, Zemel, Fidler (UofT) CSC 411: 02-Regression Jan 13, 2016 1 / 22

  2. Problems for Today
     What should I watch this Friday?

  4. Problems for Today
     Goal: Predict movie rating automatically!

  5. Problems for Today
     Goal: How many followers will I get?

  6. Problems for Today
     Goal: Predict the price of the house

  14. Regression
      What do all these problems have in common?
      ◮ Continuous outputs, which we’ll call $t$ (e.g., a rating: a real number between 0 and 10, number of followers, house price)
      Predicting continuous outputs is called regression.
      What do I need in order to predict these outputs?
      ◮ Features (inputs), which we’ll call $x$ (or $\mathbf{x}$ if vectors)
      ◮ Training examples: many $x^{(i)}$ for which $t^{(i)}$ is known (e.g., many movies for which we know the rating)
      ◮ A model: a function that represents the relationship between $x$ and $t$
      ◮ A loss (also called a cost or an objective function), which tells us how well our model approximates the training examples
      ◮ Optimization: a way of finding the parameters of our model that minimize the loss function
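The ingredients listed on the Regression slide can be sketched in a few lines of Python. This is only an illustrative sketch: the toy data, the straight-line model, the mean squared error loss, and the gradient descent settings (`lr`, `steps`) are all assumptions made for the example, not choices stated on the slide.

```python
import numpy as np

# Training examples: hypothetical 1-D inputs x^(i) with known targets t^(i).
# The targets are generated from t = 1 + 2x with no noise (toy data).
x = np.array([0.0, 1.0, 2.0, 3.0])
t = np.array([1.0, 3.0, 5.0, 7.0])

def model(x, w):
    """Model: a function relating x to t (a straight line is assumed here)."""
    return w[0] + w[1] * x

def loss(w, x, t):
    """Loss: mean squared error over the training examples (an assumed choice)."""
    return np.mean((model(x, w) - t) ** 2)

def optimize(x, t, lr=0.05, steps=2000):
    """Optimization: plain gradient descent on the loss above."""
    w = np.zeros(2)
    for _ in range(steps):
        err = model(x, w) - t
        # Gradient of the mean squared error with respect to (w[0], w[1]).
        grad = np.array([2 * np.mean(err), 2 * np.mean(err * x)])
        w -= lr * grad
    return w

w = optimize(x, t)
print("parameters:", w, "loss:", loss(w, x, t))
```

Each function maps to one bullet: `model` is the model, `loss` is the objective, and `optimize` searches for the parameters that minimize it.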

  15. Today: Linear Regression
      Linear regression
      ◮ continuous outputs
      ◮ simple model (linear)
      Introduce key concepts:
      ◮ loss functions
      ◮ generalization
      ◮ optimization
      ◮ model complexity
      ◮ regularization

  19. Simple 1-D regression
      Circles are data points (i.e., training examples) that are given to us
      The data points are uniform in $x$, but may be displaced in $y$:
      $t(x) = f(x) + \epsilon$, with $\epsilon$ some noise
      In green is the "true" curve that we don’t know
      Goal: We want to fit a curve to these points
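A data set of the form $t(x) = f(x) + \epsilon$ can be simulated as follows. This is a sketch under stated assumptions: the slide does not say what $f$ is or how the noise is distributed, so the sine curve, the Gaussian noise with standard deviation 0.3, and the sample size `N = 10` are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

def f(x):
    # Hypothetical "true" curve; a sine is only an illustrative choice.
    return np.sin(2 * np.pi * x)

N = 10
x = np.linspace(0.0, 1.0, N)        # inputs spaced uniformly in x
eps = rng.normal(0.0, 0.3, size=N)  # noise term (Gaussian is an assumption)
t = f(x) + eps                      # observed targets, displaced from f(x)
```

In practice only the pairs `(x, t)` are observed; `f` and `eps` are unknown, which is why a curve has to be fitted.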

  23. Simple 1-D regression
      Key Questions:
      ◮ How do we parametrize the model?
      ◮ What loss (objective) function should we use to judge the fit?
      ◮ How do we optimize fit to unseen test data (generalization)?

  27. Example: Boston Housing data
      Estimate median house price in a neighborhood based on neighborhood statistics
      Look at first possible attribute (feature): per capita crime rate
      Use this to predict house prices in other neighborhoods
      Is this a good input (attribute) to predict house prices?

  30. Represent the Data
      Data is described as pairs $D = \{(x^{(1)}, t^{(1)}), \cdots, (x^{(N)}, t^{(N)})\}$
      ◮ $x \in \mathbb{R}$ is the input feature (per capita crime rate)
      ◮ $t \in \mathbb{R}$ is the target output (median house price)
      ◮ $(i)$ simply indexes the training examples (we have $N$ in this case)
      Here $t$ is continuous, so this is a regression problem
      Model outputs $y$, an estimate of $t$:
      $y(x) = w_0 + w_1 x$
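The linear model $y(x) = w_0 + w_1 x$ can be tried out on a handful of $(x, t)$ pairs. The numbers below are made up for illustration and are not the Boston Housing data. The slides up to this point have not chosen a loss, so fitting by least squares with `np.polyfit` is an assumption of this sketch.

```python
import numpy as np

# Hypothetical (x, t) pairs: x plays the role of the crime rate and
# t the median house price. These values are invented for the example.
x = np.array([0.1, 0.5, 1.0, 2.0, 4.0])
t = np.array([30.0, 28.0, 25.5, 21.0, 11.0])

def y(x, w0, w1):
    """Linear model: the estimate of t for input x."""
    return w0 + w1 * x

# Least squares fit of a degree-1 polynomial (assumed fitting criterion).
# np.polyfit returns coefficients from highest degree to lowest.
w1, w0 = np.polyfit(x, t, deg=1)

predictions = y(x, w0, w1)
residuals = t - predictions
print("w0 =", w0, "w1 =", w1)
```

With these toy numbers the fitted slope `w1` is negative, meaning the predicted price falls as the input grows.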
