CSC 411: Lecture 02: Linear Regression

Class based on Raquel Urtasun & Rich Zemel’s lectures

Sanja Fidler

University of Toronto

Jan 13, 2016

(Most plots in this lecture are from Bishop’s book)


Problems for Today

What should I watch this Friday?

Goal: Predict movie rating automatically!

Goal: How many followers will I get?

Goal: Predict the price of the house


Regression

What do all these problems have in common?

◮ Continuous outputs; we’ll call these t
(e.g., a rating: a real number between 0 and 10, the number of followers, a house price)

Predicting continuous outputs is called regression.

What do I need in order to predict these outputs?

◮ Features (inputs); we’ll call these x (or x if vectors)
◮ Training examples: many x(i) for which t(i) is known (e.g., many movies for which we know the rating)
◮ A model: a function that represents the relationship between x and t
◮ A loss (or cost, or objective) function, which tells us how well our model approximates the training examples
◮ Optimization: a way of finding the parameters of our model that minimize the loss function


Today: Linear Regression

Linear regression
◮ continuous outputs
◮ simple model (linear)

Introduce key concepts:
◮ loss functions
◮ generalization
◮ optimization
◮ model complexity
◮ regularization

Simple 1-D regression

Circles are data points (i.e., training examples) that are given to us
The data points are uniform in x, but may be displaced in y:

t(x) = f(x) + ε, with ε some noise

In green is the “true” curve, which we don’t know
Goal: we want to fit a curve to these points

Simple 1-D regression

Key Questions:
◮ How do we parametrize the model?
◮ What loss (objective) function should we use to judge the fit?
◮ How do we optimize fit to unseen test data (generalization)?

Example: Boston Housing data

Estimate the median house price in a neighborhood based on neighborhood statistics
Look at the first possible attribute (feature): per capita crime rate
Use this to predict house prices in other neighborhoods
Is this a good input (attribute) for predicting house prices?

Represent the Data

Data is described as pairs D = {(x(1), t(1)), · · · , (x(N), t(N))}
◮ x ∈ R is the input feature (per capita crime rate)
◮ t ∈ R is the target output (median house price)
◮ (i) simply indexes the training examples (we have N in this case)

Here t is continuous, so this is a regression problem

The model outputs y, an estimate of t:

y(x) = w0 + w1 x

What type of model did we choose?

Divide the dataset into training and testing examples
◮ Use the training examples to construct a hypothesis, or function approximator, that maps x to a predicted y
◮ Evaluate the hypothesis on the test set
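
A concrete sketch of this setup in NumPy; the crime-rate and price values below are invented for illustration and are not the real Boston data (which has N = 506 neighborhoods):

```python
import numpy as np

# Invented (x, t) pairs: per capita crime rate -> median house price
x = np.array([0.02, 0.09, 0.25, 1.30, 4.10, 8.70])
t = np.array([24.0, 21.6, 22.8, 18.2, 13.5, 10.2])

# Divide the dataset into training and testing examples
rng = np.random.default_rng(0)
idx = rng.permutation(len(x))
n_train = int(0.8 * len(x))                  # 80/20 split (a common choice)
x_train, t_train = x[idx[:n_train]], t[idx[:n_train]]
x_test, t_test = x[idx[n_train:]], t[idx[n_train:]]
```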

Noise

A simple model typically does not exactly fit the data – the lack of fit can be considered noise

Sources of noise:
◮ Imprecision in data attributes (input noise, e.g., noise in the per-capita crime rate)
◮ Errors in data targets (mislabeling, e.g., noise in house prices)
◮ Additional attributes, not taken into account by the data attributes, that affect the target values (latent variables). In the example, what else could affect house prices?
◮ The model may be too simple to account for the data targets

Least-Squares Regression

Define a model: y(x) = function(x, w). Linear:

y(x) = w0 + w1 x

The standard loss/cost/objective function measures the squared error between y and the true value t:

ℓ(w) = ∑_{n=1}^{N} [t(n) − y(x(n))]²

For the linear model:

ℓ(w) = ∑_{n=1}^{N} [t(n) − (w0 + w1 x(n))]²

For a particular hypothesis (a y(x) defined by a choice of w, drawn in red), what does the loss represent geometrically?
The loss for the red hypothesis is the sum of the squared vertical errors (the squared lengths of the green vertical lines)

How do we obtain the weights w = (w0, w1)? Find the w that minimizes the loss ℓ(w)
For the linear model, what kind of a function is ℓ(w)?
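
To make the loss concrete, here is a minimal NumPy sketch (mine, not the slides’) that evaluates ℓ(w) for a candidate (w0, w1) on a small invented dataset:

```python
import numpy as np

def squared_error_loss(w0, w1, x, t):
    """Loss of the linear model y(x) = w0 + w1*x."""
    y = w0 + w1 * x                 # predictions for all N examples
    return np.sum((t - y) ** 2)     # sum of squared vertical errors

# Invented toy data
x = np.array([0.0, 1.0, 2.0, 3.0])
t = np.array([1.1, 2.9, 5.2, 6.8])

print(squared_error_loss(0.0, 2.0, x, t))  # loss of the hypothesis y = 2x
```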

Optimizing the Objective

One straightforward method: gradient descent
◮ initialize w (e.g., randomly)
◮ repeatedly update w based on the gradient:

w ← w − λ ∂ℓ/∂w

λ is the learning rate

For a single training case, this gives the LMS update rule:

w ← w + 2λ (t(n) − y(x(n))) x(n)

where (t(n) − y(x(n))) is the error. Note: as the error approaches zero, so does the update (w stops changing)
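
A minimal sketch of these updates in code, on the same invented data and with a hand-picked learning rate; this is batch gradient descent on (w0, w1), not the course’s reference implementation:

```python
import numpy as np

def gradient_descent(x, t, lam=0.01, n_iters=1000):
    """Minimize l(w) = sum_n [t(n) - (w0 + w1*x(n))]^2 by gradient descent."""
    w0, w1 = 0.0, 0.0                        # initialize (zeros here; could be random)
    for _ in range(n_iters):
        error = t - (w0 + w1 * x)            # per-example errors t(n) - y(x(n))
        w0 += 2 * lam * np.sum(error)        # LMS update for the bias (its "input" is 1)
        w1 += 2 * lam * np.sum(error * x)    # LMS update for the slope
    return w0, w1

x = np.array([0.0, 1.0, 2.0, 3.0])
t = np.array([1.1, 2.9, 5.2, 6.8])
print(gradient_descent(x, t))  # approaches the least-squares fit, about (1.09, 1.94)
```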

Optimizing Across Training Set

Two ways to generalize this for all examples in the training set:

1. Batch updates: sum or average the updates across every example n, then change the parameter values

w ← w + 2λ ∑_{n=1}^{N} (t(n) − y(x(n))) x(n)

2. Stochastic/online updates: update the parameters for each training case in turn, according to its own gradients

Algorithm 1 Stochastic gradient descent
1: Randomly shuffle the examples in the training set
2: for i = 1 to N do
3:     Update: w ← w + 2λ(t(i) − y(x(i))) x(i)   (update for a linear model)
4: end for

◮ Underlying assumption: the samples are independent and identically distributed (i.i.d.)
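
A direct transcription of Algorithm 1 for the 1-D linear model, as a sketch on the same invented data; note that with a constant learning rate the stochastic updates hover near the optimum rather than converging exactly:

```python
import numpy as np

def sgd_epoch(w0, w1, x, t, lam=0.01):
    """One pass of Algorithm 1 over the training set."""
    for i in np.random.permutation(len(t)):   # 1: randomly shuffle the examples
        error = t[i] - (w0 + w1 * x[i])       # the error t(i) - y(x(i))
        w0 += 2 * lam * error                 # 3: update from this example alone
        w1 += 2 * lam * error * x[i]
    return w0, w1

x = np.array([0.0, 1.0, 2.0, 3.0])
t = np.array([1.1, 2.9, 5.2, 6.8])
w0, w1 = 0.0, 0.0
for _ in range(200):                          # repeat epochs
    w0, w1 = sgd_epoch(w0, w1, x, t)
print(w0, w1)                                 # near the batch solution (about 1.09, 1.94)
```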

Analytical Solution?

For some objectives we can also find the optimal solution analytically
This is the case for linear least-squares regression
How? Compute the derivatives of the objective w.r.t. w and set them equal to 0

Define t = [t(1), t(2), . . . , t(N)]ᵀ and let X be the matrix with one row per example:

X = [ 1  x(1)
      1  x(2)
      . . .
      1  x(N) ]

Then:

w = (XᵀX)⁻¹Xᵀt   (work it out!)
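
A sketch of the closed form in NumPy; solving the normal equations (XᵀX)w = Xᵀt with np.linalg.solve avoids forming the explicit inverse:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
t = np.array([1.1, 2.9, 5.2, 6.8])

X = np.column_stack([np.ones_like(x), x])   # n-th row is [1, x(n)]

# w = (X^T X)^{-1} X^T t, via the linear system (X^T X) w = X^T t
w = np.linalg.solve(X.T @ X, X.T @ t)
print(w)                                    # [w0, w1], about [1.09, 1.94]
```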

Multi-dimensional Inputs

One method of extending the model is to consider other input dimensions:

y(x) = w0 + w1 x1 + w2 x2

In the Boston housing example, we can look at the number of rooms

Linear Regression with Multi-dimensional Inputs

Imagine now we want to predict the median house price from these multi-dimensional observations
Each house is a data point n, with its observations indexed by j:

x(n) = ( x1(n), · · · , xj(n), · · · , xd(n) )

We can incorporate the bias w0 into w by using x0 = 1; then

y(x) = w0 + ∑_{j=1}^{d} wj xj = wᵀx

We can then solve for w = (w0, w1, · · · , wd). How?
We can use gradient descent to solve for each coefficient, or compute w analytically (how does the solution change?)
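
A sketch of the multi-dimensional case with invented feature values (crime rate and number of rooms): the only change to the analytical solution is that the design matrix X gains one column per input dimension.

```python
import numpy as np

# Invented observations per house: [per capita crime rate, number of rooms]
X_raw = np.array([[0.02, 6.5],
                  [0.25, 5.9],
                  [4.10, 6.1],
                  [8.70, 4.8]])
t = np.array([24.0, 20.1, 17.9, 10.4])         # invented median prices

X = np.column_stack([np.ones(len(t)), X_raw])  # absorb the bias: x0 = 1
w = np.linalg.solve(X.T @ X, X.T @ t)          # w = (w0, w1, ..., wd), same formula
y = X @ w                                      # predictions y(x) = w^T x
print(w, y)
```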

Fitting a Polynomial

What if our linear model is not good? How can we create a more complicated model?
We can create a more complicated model by defining input variables that are combinations of the components of x
Example: an M-th order polynomial function of a one-dimensional feature x:

y(x, w) = w0 + ∑_{j=1}^{M} wj x^j

where x^j is the j-th power of x
We can use the same approach to optimize for the weights w
How do we do that?
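
One possible answer, as a sketch: treat the powers x^j as the input dimensions, so the model stays linear in w and the analytical solution from before applies unchanged. The noisy sin(2πx) data below mimics Bishop’s running example.

```python
import numpy as np

def fit_polynomial(x, t, M):
    """Least-squares fit of y(x, w) = w0 + sum_{j=1..M} wj * x**j."""
    X = np.vander(x, M + 1, increasing=True)   # columns [1, x, x^2, ..., x^M]
    return np.linalg.solve(X.T @ X, X.T @ t)   # same normal equations as before

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(10)  # noisy sine
print(fit_polynomial(x, t, M=3))               # weights of a cubic fit
```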


Which Fit is Best?

(Figure from Bishop)

Generalization

Generalization = the model’s ability to predict held-out data
What is happening? Our model with M = 9 overfits the data (it models the noise as well)
This is not a problem if we have lots of training examples
Let’s look at the estimated weights for various M in the case of fewer examples
The weights become huge to compensate for the noise
One way of dealing with this is to encourage the weights to be small (this way no single input dimension will have too much influence on the prediction). This is called regularization.

Regularized Least Squares

Increasing the input features this way can complicate the model considerably
Goal: select the appropriate model complexity automatically
Standard approach: regularization

ℓ̃(w) = ∑_{n=1}^{N} [t(n) − (w0 + w1 x(n))]² + α wᵀw

Intuition: since we are minimizing the loss, the second term will encourage smaller values in w
The penalty on the squared weights is known as ridge regression in statistics
It leads to a modified update rule for gradient descent:

w ← w + 2λ [ ∑_{n=1}^{N} (t(n) − y(x(n))) x(n) − αw ]

It also has an analytical solution: w = (XᵀX + αI)⁻¹Xᵀt (verify!)
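
A sketch of the regularized closed form; alpha is the hyperparameter α, and the M = 9 polynomial features are exactly the setting where the unregularized weights blew up:

```python
import numpy as np

def ridge_fit(X, t, alpha):
    """Regularized least squares: w = (X^T X + alpha*I)^{-1} X^T t."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ t)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(10)
X = np.vander(x, 10, increasing=True)            # M = 9 polynomial features
print(np.round(ridge_fit(X, t, alpha=1e-3), 2))  # alpha > 0 tames the huge weights
```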

Regularized least squares

Better generalization
Choose α carefully

1-D regression illustrates key concepts

Data fits – is the linear model best (model selection)?
◮ Simple models may not capture all the important variations (signal) in the data: they underfit
◮ More complex models may overfit the training data (fit not only the signal but also the noise in the data), especially if there is not enough data to constrain the model

One method of assessing fit: test generalization = the model’s ability to predict held-out data
Optimization is essential: stochastic and batch iterative approaches; analytic when available


So...

Which movie will you watch?
