Linear Regression Machine Learning 10-601 Seyoung Kim - PowerPoint PPT Presentation

Linear Regression
Machine Learning 10-601
Seyoung Kim
Many of these slides are derived from Tom Mitchell. Thanks!

Regression
• So far, we’ve been interested in learning P(Y|X) where Y has discrete values (called ‘classification’)
• What if Y is continuous? (called ‘regression’)
  – predict weight from gender, height, age, …
  – predict Google stock price today from Google, Yahoo, MSFT prices yesterday
  – predict each pixel intensity in robot’s current camera image, from previous image and previous action

Supervised Learning
• Wish to learn f: X → Y, given observations for both X and Y in training data (supervised learning)
  – Classification: Y is discrete
  – Regression: Y is continuous

Regression
Wish to learn f: X → Y, where Y is real, given {<x^1, y^1> … <x^N, y^N>}
• Approach:
  1. choose some parameterized form for P(Y|X; θ) (θ is the vector of parameters)
  2. derive learning algorithm as MLE or MAP estimate for θ

1. Choose parameterized form for P(Y|X; θ)
[Figure: plot of Y against X]
• Assume Y is some deterministic f(X), plus random noise ε: $y = f(x) + \varepsilon$, where $\varepsilon \sim N(0, \sigma^2)$
• Therefore Y is a random variable that follows the distribution $p(y \mid x) = N(f(x), \sigma^2)$
• The expected value of y for any given x is $E_{p(y \mid x)}[y] = f(x)$

1. Choose parameterized form for P(Y|X; θ)
[Figure: plot of Y against X]
• Assume Y is some deterministic f(X), plus random noise ε: $y = f(x) + \varepsilon$, where $\varepsilon \sim N(0, \sigma^2)$
• Assume a linear function for f(x): $f(x) = w_0 + \sum_{i=1}^{J} w_i x_i$

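1. Choose parameterized form for P(Y|X; θ)
[Figure: plot of Y against X]
• Assume a linear function for f(x): $f(x) = w_0 + \sum_{i=1}^{J} w_i x_i$, so that $p(y \mid x; W) = N\!\left(w_0 + \sum_{i=1}^{J} w_i x_i,\ \sigma^2\right)$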
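The generative assumption on these slides (a linear mean plus Gaussian noise) can be illustrated by drawing synthetic data from it. The following is a minimal sketch, not part of the original slides; the function name `sample_linear_gaussian`, the weights `w_true`, and the value of `sigma` are hypothetical choices made for illustration.

```python
import numpy as np

def sample_linear_gaussian(X, w, sigma, rng):
    """Draw y = w[0] + X @ w[1:] + eps, with eps ~ N(0, sigma^2)."""
    mean = w[0] + X @ w[1:]                      # f(x), the conditional mean E[y | x]
    noise = rng.normal(0.0, sigma, size=X.shape[0])
    return mean + noise

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(10000, 2))      # N = 10000 samples, J = 2 features
w_true = np.array([0.5, 2.0, -1.0])              # hypothetical (w_0, w_1, w_2)
y = sample_linear_gaussian(X, w_true, sigma=0.3, rng=rng)

# The residual y - f(x) should look like the noise: mean near 0, std near sigma.
residual = y - (w_true[0] + X @ w_true[1:])
print(residual.mean(), residual.std())
```

Because the noise has zero mean, averaging y over many samples at the same x recovers f(x), which is the sense in which f(x) is the expected value of y given x.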

2. How Can We Learn Linear Regression Parameters?
• Given the linear regression model $p(y \mid x; W) = N\!\left(w_0 + \sum_i w_i x_i,\ \sigma^2\right)$
  – Notation: to make our parameters explicit, let’s write them using vector notation: $W = (w_0, w_1, \ldots, w_J)^T$
• Given a training dataset of N samples {<x^1, y^1> … <x^N, y^N>}
  – y^l: a univariate real value for the l-th sample
  – x^l: a vector of J features for the l-th sample
• How can we learn W from the training data?

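2. How Can We Learn Linear Regression Parameters?
• How can we learn W from the training data (y^l, x^l), where l = 1, …, N for N samples?
Maximum Conditional Likelihood Estimate!
$W_{MCLE} = \arg\max_W \prod_l p(y^l \mid x^l; W) = \arg\max_W \sum_l \ln p(y^l \mid x^l; W)$
where $p(y \mid x; W) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{1}{2}\left[\frac{y - f(x; W)}{\sigma}\right]^2\right)$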

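2. How Can We Learn Linear Regression Parameters?
• Learn Maximum Conditional Likelihood Estimate: $W_{MCLE} = \arg\max_W \sum_l \ln p(y^l \mid x^l; W)$, where $p(y \mid x; W) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{1}{2}\left[\frac{y - f(x; W)}{\sigma}\right]^2\right)$
• Thus, the conditional log-likelihood is given as
$\sum_l \left( \ln\frac{1}{\sqrt{2\pi\sigma^2}} - \frac{1}{2}\left[\frac{y^l - f(x^l; W)}{\sigma}\right]^2 \right)$
The first term, $\ln\frac{1}{\sqrt{2\pi\sigma^2}}$, is constant with respect to W.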

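2. How Can We Learn Linear Regression Parameters?
• Learn Maximum Conditional Likelihood Estimate: $W_{MCLE} = \arg\max_W \sum_l \ln p(y^l \mid x^l; W)$, where $p(y \mid x; W) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{1}{2}\left[\frac{y - f(x; W)}{\sigma}\right]^2\right)$
• Thus, the conditional log-likelihood is given as
$\sum_l \left( \ln\frac{1}{\sqrt{2\pi\sigma^2}} - \frac{1}{2}\left[\frac{y^l - f(x^l; W)}{\sigma}\right]^2 \right)$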

2. How Can We Learn Linear Regression Parameters?
• Learn Maximum Conditional Likelihood Estimate: dropping the terms of the log-likelihood that do not depend on W, $W_{MCLE} = \arg\min_W \sum_l \left(y^l - f(x^l; W)\right)^2$
• The maximum conditional likelihood estimate is also called the least squared-error estimate
• MLE provides a probabilistic interpretation of the least squared-error estimate
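The equivalence between maximizing the conditional likelihood and minimizing squared error can be checked numerically. The sketch below assumes Gaussian noise with a known, fixed `sigma`; the helper names `cond_log_likelihood` and `sse` and the randomly drawn candidate weight vectors are illustrative assumptions, not part of the slides.

```python
import numpy as np

def sse(w, X, y):
    """Sum of squared errors for f(x; w) = w[0] + x . w[1:]."""
    resid = y - (w[0] + X @ w[1:])
    return np.sum(resid ** 2)

def cond_log_likelihood(w, X, y, sigma):
    """Conditional log-likelihood under y | x ~ N(f(x; w), sigma^2)."""
    resid = y - (w[0] + X @ w[1:])
    const = y.shape[0] * np.log(1.0 / np.sqrt(2.0 * np.pi * sigma ** 2))
    return const - 0.5 * np.sum((resid / sigma) ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(0.0, 0.5, size=100)

# Score a handful of arbitrary candidate weight vectors both ways.
candidates = rng.normal(size=(20, 3))
ll = np.array([cond_log_likelihood(w, X, y, sigma=0.5) for w in candidates])
err = np.array([sse(w, X, y) for w in candidates])

# The candidate with the highest likelihood is the one with the lowest error.
print(np.argmax(ll), np.argmin(err))
```

The log-likelihood is a constant minus the squared error scaled by 1/(2σ²), so the ranking of candidates does not depend on the value of σ.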

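Vector/Matrix Representation
• Rewrite the linear regression model for training data using vector/matrix representation: $\mathbf{y} = X W + \boldsymbol{\varepsilon}$
$\mathbf{y} = \begin{pmatrix} y^1 \\ \vdots \\ y^N \end{pmatrix}, \quad X = \begin{pmatrix} 1 & x_1^1 & \cdots & x_J^1 \\ \vdots & \vdots & & \vdots \\ 1 & x_1^N & \cdots & x_J^N \end{pmatrix}, \quad W = \begin{pmatrix} w_0 \\ w_1 \\ \vdots \\ w_J \end{pmatrix}$
  – $\mathbf{y}$ has one entry per sample (N samples)
  – X has one row per sample (N samples) and one column per input feature (J input features), plus a leading column of ones: the augmented input feature corresponding to $w_0$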
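As a small illustration of the augmented input feature, the sketch below builds X from a raw N × J feature array by prepending a column of ones, so that the product X W includes the intercept w_0. The helper name `design_matrix` and the numbers are hypothetical.

```python
import numpy as np

def design_matrix(features):
    """Prepend a column of ones to an (N, J) feature array, giving (N, J + 1)."""
    features = np.asarray(features, dtype=float)
    ones = np.ones((features.shape[0], 1))       # augmented feature for w_0
    return np.hstack([ones, features])

features = np.array([[1.0, 2.0],
                     [3.0, 4.0],
                     [5.0, 6.0]])                # N = 3 samples, J = 2 features
X = design_matrix(features)
W = np.array([0.5, 2.0, -1.0])                   # (w_0, w_1, w_2)

# Each prediction is w_0 + w_1 * x_1 + w_2 * x_2 for one row of X.
pred = X @ W
print(X.shape, pred)
```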

2. How Can We Learn Linear Regression Parameters?
• Learn Maximum Conditional Likelihood Estimate: $W_{MCLE} = \arg\min_W\, (\mathbf{y} - XW)^T (\mathbf{y} - XW)$
Re-write using vector representations of N samples in data:
$\mathbf{y} = \begin{pmatrix} y^1 \\ \vdots \\ y^N \end{pmatrix}, \quad X = \begin{pmatrix} 1 & x_1^1 & \cdots & x_J^1 \\ \vdots & \vdots & & \vdots \\ 1 & x_1^N & \cdots & x_J^N \end{pmatrix}$
(N samples as rows, J input features as columns)

2. How Can We Learn Linear Regression Parameters?
• Learn Maximum Conditional Likelihood Estimate: $W_{MCLE} = \arg\min_W\, (\mathbf{y} - XW)^T (\mathbf{y} - XW)$
Re-write using vector representations of N samples in data:
$\mathbf{y} = \begin{pmatrix} y^1 \\ \vdots \\ y^N \end{pmatrix}, \quad X = \begin{pmatrix} 1 & x_1^1 & \cdots & x_J^1 \\ \vdots & \vdots & & \vdots \\ 1 & x_1^N & \cdots & x_J^N \end{pmatrix}$
(N samples as rows, J input features as columns)
Set the derivative with respect to W to zero: $\frac{\partial}{\partial W} (\mathbf{y} - XW)^T (\mathbf{y} - XW) = 0$

2. How Can We Learn Linear Regression Parameters?
• Learn Maximum Conditional Likelihood Estimate
$\frac{\partial}{\partial W} (\mathbf{y} - XW)^T (\mathbf{y} - XW) = -2 X^T (\mathbf{y} - XW) = 0$
Solving $X^T X W = X^T \mathbf{y}$ for W gives
$W_{MCLE} = (X^T X)^{-1} X^T \mathbf{y}$
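The closed-form estimate can be computed in a few lines of NumPy. This is a minimal sketch on synthetic data, assuming X already contains the leading column of ones and that X^T X is invertible; the function name `fit_normal_equations` and the data are hypothetical. It solves the linear system X^T X W = X^T y with `np.linalg.solve` rather than forming the inverse explicitly, which is a common numerical choice and not something the slides prescribe.

```python
import numpy as np

def fit_normal_equations(X, y):
    """Least squared-error estimate: solve (X^T X) W = X^T y for W."""
    return np.linalg.solve(X.T @ X, X.T @ y)

rng = np.random.default_rng(1)
features = rng.normal(size=(200, 3))                    # N = 200, J = 3
X = np.hstack([np.ones((200, 1)), features])            # augmented design matrix
w_true = np.array([1.0, -2.0, 0.5, 3.0])                # hypothetical weights
y = X @ w_true + rng.normal(0.0, 0.1, size=200)

w_hat = fit_normal_equations(X, y)

# At the solution, the gradient -2 X^T (y - X W) should vanish up to rounding.
grad = -2.0 * X.T @ (y - X @ w_hat)
print(w_hat, np.max(np.abs(grad)))
```

When X^T X is singular or badly conditioned (for example with duplicated features), `np.linalg.lstsq` is the usual fallback.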

Comments on Training Linear Regression Models
• Least squared error method: $W_{MCLE} = (X^T X)^{-1} X^T \mathbf{y}$
  – A single equation for computing the estimate (i.e., a closed-form solution for the MLE estimate)
  – When the dataset is extremely large, computing $X^T X$ and inverting it can be costly, especially for streaming data
• Alternatively, gradient descent method
  – Works well on large datasets

Training Linear Regression with Gradient Descent
• Learn Maximum Conditional Likelihood Estimate: $W_{MCLE} = \arg\min_W \sum_l \left(y^l - f(x^l; W)\right)^2$
• Can we derive a gradient descent rule for training?

Gradient Descent
Batch gradient: use error $E_D(W)$ over entire training set D
Do until satisfied:
  1. Compute the gradient $\nabla E_D(W)$
  2. Update the vector of parameters: $W \leftarrow W - \eta \nabla E_D(W)$
Stochastic gradient: use error $E_d(W)$ over single examples $d \in D$
Do until satisfied:
  1. Choose (with replacement) a random training example $d \in D$
  2. Compute the gradient just for d: $\nabla E_d(W)$
  3. Update the vector of parameters: $W \leftarrow W - \eta \nabla E_d(W)$
• Stochastic approximates Batch arbitrarily closely as $\eta \to 0$
• Stochastic can be much faster when D is very large
• Intermediate approach: use error over subsets of D
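The two loops can be sketched for the squared-error objective of linear regression as follows. This is an illustrative sketch, not the course's reference implementation: the step sizes, the fixed iteration counts (standing in for "until satisfied"), and the function names `grad_sse`, `batch_gd`, and `sgd` are all assumptions.

```python
import numpy as np

def grad_sse(W, X, y):
    """Gradient of sum_l (y^l - x^l . W)^2 with respect to W."""
    return -2.0 * X.T @ (y - X @ W)

def batch_gd(X, y, eta, n_steps):
    """Batch gradient descent: each step uses the error over all of D."""
    W = np.zeros(X.shape[1])
    for _ in range(n_steps):
        W = W - eta * grad_sse(W, X, y)
    return W

def sgd(X, y, eta, n_steps, rng):
    """Stochastic gradient descent: each step uses one random example d."""
    W = np.zeros(X.shape[1])
    for _ in range(n_steps):
        d = rng.integers(X.shape[0])             # sampled with replacement
        W = W - eta * grad_sse(W, X[d:d + 1], y[d:d + 1])
    return W

rng = np.random.default_rng(2)
features = rng.normal(size=(500, 2))
X = np.hstack([np.ones((500, 1)), features])     # augmented design matrix
w_true = np.array([1.0, 2.0, -3.0])              # hypothetical weights
y = X @ w_true + rng.normal(0.0, 0.1, size=500)

W_batch = batch_gd(X, y, eta=0.0005, n_steps=200)
W_sgd = sgd(X, y, eta=0.01, n_steps=20000, rng=rng)
print(W_batch, W_sgd)
```

The batch step size is small because the batch gradient sums over all 500 examples, while the per-example gradient is much smaller and tolerates a larger step. With a fixed step size, the stochastic iterate keeps fluctuating around the batch solution instead of settling exactly on it.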

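Training Linear Regression with Gradient Descent
• Learn Maximum Conditional Likelihood Estimate: $W_{MCLE} = \arg\min_W \sum_l \left(y^l - f(x^l; W)\right)^2$
• Can we derive a gradient descent rule for training?
• And if $f(x; W) = w_0 + \sum_i w_i x_i$, then $\frac{\partial}{\partial w_i} \sum_l \left(y^l - f(x^l; W)\right)^2 = -2 \sum_l \left(y^l - f(x^l; W)\right) x_i^l$, with $x_0^l = 1$ for the intercept $w_0$
• Gradient descent rule: $w_i \leftarrow w_i + 2\eta \sum_l \left(y^l - f(x^l; W)\right) x_i^l$ (the constant 2 can be absorbed into the step size η)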
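A derived gradient is easy to get wrong by a sign or a constant, so one common sanity check is to compare it against central finite differences. The sketch below does this for the squared-error objective, whose gradient in matrix form is -2 X^T (y - X W); the function names, the step `h`, and the random test point are hypothetical.

```python
import numpy as np

def sse(W, X, y):
    """Squared-error objective sum_l (y^l - x^l . W)^2."""
    resid = y - X @ W
    return resid @ resid

def analytic_grad(W, X, y):
    """Component i is -2 * sum_l (y^l - f(x^l; W)) * x_i^l."""
    return -2.0 * X.T @ (y - X @ W)

def numeric_grad(W, X, y, h=1e-6):
    """Central finite-difference approximation of the gradient."""
    grad = np.zeros_like(W)
    for i in range(W.size):
        step = np.zeros_like(W)
        step[i] = h
        grad[i] = (sse(W + step, X, y) - sse(W - step, X, y)) / (2.0 * h)
    return grad

rng = np.random.default_rng(3)
X = np.hstack([np.ones((50, 1)), rng.normal(size=(50, 3))])
y = rng.normal(size=50)
W = rng.normal(size=4)                           # arbitrary test point

g_analytic = analytic_grad(W, X, y)
g_numeric = numeric_grad(W, X, y)
print(np.max(np.abs(g_analytic - g_numeric)))    # small if the derivation is right
```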
