CS70: Lecture 35. Regression (contd.): Linear and Beyond
  1. CS70: Lecture 35. Regression (contd.): Linear and Beyond
Outline:
1. Review: Linear Regression (LR), LLSE
2. LR: Examples
3. Beyond LR: Quadratic Regression
4. Conditional Expectation (CE) and properties
5. Non-linear Regression: CE = Minimum Mean-Squared Error (MMSE)

Review: Linear Regression – Motivation
Example: 100 people. Let $(X_n, Y_n)$ = (height, weight) of person $n$, for $n = 1, \dots, 100$. (Scatter plot not reproduced.) The blue line is $Y = -114.3 + 106.5X$ ($X$ in meters, $Y$ in kg). Best linear fit: Linear Regression.

Review: Covariance
Definition: The covariance of $X$ and $Y$ is
$$\operatorname{cov}(X, Y) := E[(X - E[X])(Y - E[Y])].$$
Fact: $\operatorname{cov}(X, Y) = E[XY] - E[X]E[Y]$.

Review: Examples of Covariance
Note that $E[X] = 0$ and $E[Y] = 0$ in these examples (plots not reproduced). Thus, $\operatorname{cov}(X, Y) = E[XY]$.
When $\operatorname{cov}(X, Y) > 0$, the RVs $X$ and $Y$ tend to be large or small together: $X$ and $Y$ are said to be positively correlated.
When $\operatorname{cov}(X, Y) < 0$, $Y$ tends to be smaller when $X$ is larger: $X$ and $Y$ are said to be negatively correlated.
When $\operatorname{cov}(X, Y) = 0$, we say that $X$ and $Y$ are uncorrelated.

Review: Linear Regression – Non-Bayesian
Definition: Given the samples $\{(X_n, Y_n), n = 1, \dots, N\}$, the Linear Regression of $Y$ over $X$ is
$$\hat{Y} = a + bX$$
where $(a, b)$ minimize
$$\sum_{n=1}^{N} (Y_n - a - bX_n)^2.$$
Thus, $\hat{Y}_n = a + bX_n$ is our guess about $Y_n$ given $X_n$. The squared error is $(Y_n - \hat{Y}_n)^2$; the LR minimizes the sum of the squared errors. Note: this is a non-Bayesian formulation: there is no prior.

Review: Linear Least Squares Estimate (LLSE)
Definition: Given two RVs $X$ and $Y$ with known distribution $\Pr[X = x, Y = y]$, the Linear Least Squares Estimate of $Y$ given $X$ is
$$\hat{Y} = a + bX =: L[Y|X]$$
where $(a, b)$ minimize
$$g(a, b) := E[(Y - a - bX)^2].$$
Thus, $\hat{Y} = a + bX$ is our guess about $Y$ given $X$. The squared error is $(Y - \hat{Y})^2$; the LLSE minimizes its expected value. Note: this is a Bayesian formulation: there is a prior. (A numerical sketch of both formulations follows this slide.)
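As referenced above, here is a minimal sketch of both formulations, assuming NumPy is available; the height/weight data and the small joint pmf below are made up for illustration. It fits the non-Bayesian LR by solving the least-squares problem from samples, and computes the LLSE from a known joint distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Non-Bayesian LR: minimize sum_n (Y_n - a - b*X_n)^2 over N samples ---
N = 100
X = rng.uniform(1.5, 2.0, N)                    # hypothetical heights (m)
Y = -114.3 + 106.5 * X + rng.normal(0, 5, N)    # hypothetical weights (kg)

A = np.column_stack([np.ones(N), X])            # design matrix [1, X]
(a, b), *_ = np.linalg.lstsq(A, Y, rcond=None)  # solves min ||Y - (a + bX)||^2
print(f"LR fit: Y_hat = {a:.1f} + {b:.1f} X")   # close to -114.3 + 106.5 X

# Fact check with empirical moments: cov(X,Y) = E[XY] - E[X]E[Y] > 0 here,
# i.e. taller people tend to be heavier (positively correlated).
print(np.mean(X * Y) - X.mean() * Y.mean() > 0)

# --- Bayesian LLSE: minimize E[(Y - a - bX)^2] for a known joint pmf ---
xs  = np.array([-1.0, 0.0, 1.0, -1.0, 1.0])     # hypothetical support points
ys  = np.array([-1.0, 0.0, 1.0,  1.0, -1.0])
pmf = np.array([0.3, 0.2, 0.3, 0.1, 0.1])       # Pr[X = x, Y = y]
EX, EY = pmf @ xs, pmf @ ys
varX   = pmf @ xs**2 - EX**2
covXY  = pmf @ (xs * ys) - EX * EY
b_llse = covXY / varX                            # slope (theorem on next slide)
a_llse = EY - b_llse * EX
print(f"LLSE: Y_hat = {a_llse:.2f} + {b_llse:.2f} X")
```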

  2. Review: LLSE
Theorem: Consider two RVs $X, Y$ with a given distribution $\Pr[X = x, Y = y]$. Then,
$$L[Y|X] = \hat{Y} = E[Y] + \frac{\operatorname{cov}(X, Y)}{\operatorname{var}(X)}(X - E[X]).$$

Review: LR: Non-Bayesian or Uniform?
Observe that
$$\frac{1}{N}\sum_{n=1}^{N} (Y_n - a - bX_n)^2 = E[(Y - a - bX)^2]$$
where one assumes that $(X, Y) = (X_n, Y_n)$ w.p. $\frac{1}{N}$, for $n = 1, \dots, N$. Under this uniform distribution on the samples,
$$E[X] = \frac{1}{N}\sum_{n=1}^{N} X_n; \qquad E[Y] = \frac{1}{N}\sum_{n=1}^{N} Y_n;$$
$$\operatorname{var}[X] = E[X^2] - (E[X])^2 = \frac{1}{N}\sum_{n=1}^{N} (X_n)^2 - \Big(\frac{1}{N}\sum_{n=1}^{N} X_n\Big)^2;$$
$$\operatorname{cov}(X, Y) = E[XY] - E[X]E[Y] = \frac{1}{N}\sum_{n=1}^{N} X_n Y_n - \Big(\frac{1}{N}\sum_{n=1}^{N} X_n\Big)\Big(\frac{1}{N}\sum_{n=1}^{N} Y_n\Big).$$
That is, the non-Bayesian LR is equivalent to the Bayesian LLSE that assumes $(X, Y)$ is uniform on the set of observed samples. Thus, we can study the two cases LR and LLSE in one shot. However, the interpretations are different!

LR: Illustration
Note that
◮ the LR line goes through $(E[X], E[Y])$;
◮ its slope is $\operatorname{cov}(X, Y)/\operatorname{var}(X)$.
(A numerical check of this equivalence follows.)

Linear Regression: Examples
(Figure slide; plots not reproduced.)
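As referenced above, a quick numerical check of this equivalence (a sketch assuming NumPy; the data is arbitrary): solving the least-squares problem directly and plugging empirical moments into the LLSE formula give the same line, which indeed passes through $(E[X], E[Y])$.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1000
X = rng.normal(0, 1, N)
Y = 2.0 + 3.0 * X + rng.normal(0, 1, N)   # arbitrary linear-plus-noise data

# Non-Bayesian LR: solve min_{a,b} sum_n (Y_n - a - b*X_n)^2.
A = np.column_stack([np.ones(N), X])
(a, b), *_ = np.linalg.lstsq(A, Y, rcond=None)

# Bayesian LLSE under the uniform distribution on the samples:
# slope = cov(X,Y)/var(X), intercept chosen so the line hits (E[X], E[Y]).
EX, EY = X.mean(), Y.mean()
slope = (np.mean(X * Y) - EX * EY) / (np.mean(X**2) - EX**2)
intercept = EY - slope * EX

print(np.allclose([a, b], [intercept, slope]))  # True: same line
print(np.isclose(a + b * EX, EY))               # True: goes through (E[X], E[Y])
```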

  3. Linear Regression: Example 2
From the plotted distribution (not reproduced), we find:
$$E[X] = 0; \quad E[Y] = 0; \quad E[X^2] = 1/2; \quad E[XY] = 1/2;$$
$$\operatorname{var}[X] = E[X^2] - E[X]^2 = 1/2; \quad \operatorname{cov}(X, Y) = E[XY] - E[X]E[Y] = 1/2;$$
$$\text{LR: } \hat{Y} = E[Y] + \frac{\operatorname{cov}(X, Y)}{\operatorname{var}[X]}(X - E[X]) = X.$$

Linear Regression: Example 3
From the plotted distribution (not reproduced), we find:
$$E[X] = 0; \quad E[Y] = 0; \quad E[X^2] = 1/2; \quad E[XY] = -1/2;$$
$$\operatorname{var}[X] = E[X^2] - E[X]^2 = 1/2; \quad \operatorname{cov}(X, Y) = E[XY] - E[X]E[Y] = -1/2;$$
$$\text{LR: } \hat{Y} = E[Y] + \frac{\operatorname{cov}(X, Y)}{\operatorname{var}[X]}(X - E[X]) = -X.$$

Estimation Error
We saw that the LLSE of $Y$ given $X$ is
$$L[Y|X] = \hat{Y} = E[Y] + \frac{\operatorname{cov}(X, Y)}{\operatorname{var}(X)}(X - E[X]).$$
How good is this estimator? That is, what is the mean squared estimation error? We find
$$E[|Y - L[Y|X]|^2] = E[(Y - E[Y] - (\operatorname{cov}(X, Y)/\operatorname{var}(X))(X - E[X]))^2]$$
$$= E[(Y - E[Y])^2] - 2\frac{\operatorname{cov}(X, Y)}{\operatorname{var}(X)}E[(Y - E[Y])(X - E[X])] + \Big(\frac{\operatorname{cov}(X, Y)}{\operatorname{var}(X)}\Big)^2 E[(X - E[X])^2]$$
$$= \operatorname{var}(Y) - \frac{\operatorname{cov}(X, Y)^2}{\operatorname{var}(X)}.$$
Without observations, the estimate is $E[Y] = 0$ and the error is $\operatorname{var}(Y)$. Observing $X$ reduces the error. (A Monte Carlo check appears after this slide.)

Wrap-up of Linear Regression
1. Linear Regression: $L[Y|X] = E[Y] + \frac{\operatorname{cov}(X, Y)}{\operatorname{var}(X)}(X - E[X])$
2. Non-Bayesian: minimize $\sum_n (Y_n - a - bX_n)^2$
3. Bayesian: minimize $E[(Y - a - bX)^2]$

Beyond Linear Regression: Discussion
Goal: guess the value of $Y$ in the expected squared error sense. If we know nothing about $Y$ other than its distribution, our best guess is $E[Y]$. Now assume we make some observation $X$ related to $Y$. How do we use that observation to improve our guess about $Y$? Idea: use a function $g(X)$ of the observation to estimate $Y$. LR is the restriction to linear functions $g(X) = a + bX$. With no such constraint, what is the best $g(X)$? Answer: $E[Y|X]$. This is called the Conditional Expectation (CE).

Nonlinear Regression: Motivation
There are many situations where a good guess about $Y$ given $X$ is not linear. E.g., (diameter of object, weight), (school years, income), (PSA level, cancer risk). Our goal: explore estimates $\hat{Y} = g(X)$ for nonlinear functions $g(\cdot)$. (A sketch contrasting linear and nonlinear guesses follows below.)
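As referenced above, a Monte Carlo check of the estimation-error formula (a sketch assuming NumPy; the correlated pair below is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 200_000
X = rng.normal(0, 1, N)
Y = 0.8 * X + rng.normal(0, 0.6, N)   # some correlated pair, made up

EX, EY = X.mean(), Y.mean()
varX, varY = X.var(), Y.var()
covXY = np.mean(X * Y) - EX * EY

# LLSE and its empirical mean squared error:
L = EY + (covXY / varX) * (X - EX)
mse_llse = np.mean((Y - L) ** 2)

# Theory: var(Y) - cov(X,Y)^2 / var(X); without observing X the error is var(Y).
print(mse_llse, varY - covXY**2 / varX)  # approximately equal
print(mse_llse < varY)                   # True: observing X reduces the error
```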

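To see why going beyond linear functions can matter, here is a small sketch under assumptions not in the slides: NumPy, and an artificial model $Y = X^2 + \text{noise}$ with $X$ symmetric about 0. The covariance vanishes, so the LR line is flat and uninformative, while the nonlinear guess $g(X) = X^2$ has a much smaller squared error.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 200_000
X = rng.uniform(-1, 1, N)
Y = X**2 + rng.normal(0, 0.1, N)   # Y depends on X, but not linearly

covXY = np.mean(X * Y) - X.mean() * Y.mean()
print(round(covXY, 4))             # ~0: LR slope cov/var is ~0, line is flat

L = Y.mean() + (covXY / X.var()) * (X - X.mean())  # LR guess, essentially E[Y]
g = X**2                                            # nonlinear guess
print(np.mean((Y - L) ** 2))       # ~0.099, about var(Y): LR does not help
print(np.mean((Y - g) ** 2))       # ~0.01: nonlinear g(X) does much better
```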
  4. Quadratic Regression
Definition: The quadratic regression of $Y$ over $X$ is the random variable
$$Q[Y|X] = a + bX + cX^2$$
where $a, b, c$ are chosen to minimize $E[(Y - a - bX - cX^2)^2]$.
Derivation: We set to zero the derivatives w.r.t. $a, b, c$. We get
$$0 = E[Y - a - bX - cX^2]$$
$$0 = E[(Y - a - bX - cX^2)X]$$
$$0 = E[(Y - a - bX - cX^2)X^2]$$
We solve these three equations in the three unknowns $(a, b, c)$. (A sketch that does this numerically follows the slide.)

Conditional Expectation
Definition: Let $X$ and $Y$ be RVs on $\Omega$. The conditional expectation of $Y$ given $X$ is defined as
$$E[Y|X] = g(X), \quad \text{where } g(x) := E[Y | X = x] := \sum_y y \Pr[Y = y | X = x].$$

Deja vu, all over again?
Have we seen this before? Yes: the idea of defining $g(x) = E[Y|X = x]$ and then $E[Y|X] = g(X)$. Is anything new? Yes. Big deal? Quite! Simple but most convenient. Recall that $L[Y|X] = a + bX$ is a function of $X$. This is similar: $E[Y|X] = g(X)$ for some function $g(\cdot)$. In general, $g(X)$ is not linear, i.e., not $a + bX$. It could be that $g(X) = a + bX + cX^2$. Or that $g(X) = 2\sin(4X) + \exp\{-3X\}$. Or something else.

Properties of CE
Theorem:
(a) $X, Y$ independent $\Rightarrow E[Y|X] = E[Y]$;
(b) $E[aY + bZ | X] = aE[Y|X] + bE[Z|X]$;
(c) $E[Yh(X) | X] = h(X)E[Y|X]$, $\forall h(\cdot)$;
(d) $E[E[Y|X]] = E[Y]$.

Calculating E[Y|X]
Let $X, Y, Z$ be i.i.d. with mean 0 and variance 1. We want to calculate $E[2 + 5X + 7XY + 11X^2 + 13X^3Z^2 | X]$. Using the properties above, we find
$$E[2 + 5X + 7XY + 11X^2 + 13X^3Z^2 | X]$$
$$= 2 + 5X + 7XE[Y|X] + 11X^2 + 13X^3E[Z^2|X]$$
$$= 2 + 5X + 7XE[Y] + 11X^2 + 13X^3E[Z^2]$$
$$= 2 + 5X + 11X^2 + 13X^3(\operatorname{var}[Z] + E[Z]^2)$$
$$= 2 + 5X + 11X^2 + 13X^3.$$
(A numerical verification follows the slide.)

CE = MMSE (Conditional Expectation = Minimum Mean Squared Error)
Theorem: $g(X) := E[Y|X]$ is the function of $X$ that minimizes $E[(Y - g(X))^2]$. That is, $E[Y|X]$ is the 'best' guess about $Y$ based on $X$, in the mean squared error sense.
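As referenced above, a minimal sketch of the quadratic-regression derivation (assuming NumPy; the data model is made up): the three normal equations are linear in $(a, b, c)$, so with expectations replaced by sample averages they reduce to a $3 \times 3$ linear system.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 100_000
X = rng.uniform(-1, 1, N)
Y = 1.0 - 2.0 * X + 3.0 * X**2 + rng.normal(0, 0.1, N)  # hidden quadratic

# Normal equations: for k = 0, 1, 2,  E[(Y - a - bX - cX^2) X^k] = 0, i.e.
#   sum_j coef_j * E[X^(k+j)] = E[X^k * Y]  with coef = (a, b, c).
M = np.array([[np.mean(X**(k + j)) for j in range(3)] for k in range(3)])
rhs = np.array([np.mean(X**k * Y) for k in range(3)])
a, b, c = np.linalg.solve(M, rhs)
print(a, b, c)   # close to (1, -2, 3)
```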

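The worked calculation of $E[Y|X]$ above can also be checked exactly by enumeration. A sketch under one extra assumption not in the slides: take $X, Y, Z$ i.i.d. uniform on $\{-1, +1\}$, which indeed have mean 0 and variance 1. For each value $x$, average the expression over the equally likely values of $Y$ and $Z$ and compare with the closed form $2 + 5x + 11x^2 + 13x^3$.

```python
from itertools import product

# X, Y, Z i.i.d. uniform on {-1, +1}: mean 0, variance 1 (assumed special case).
vals = [-1, 1]

def cond_exp(x):
    """E[2 + 5X + 7XY + 11X^2 + 13X^3 Z^2 | X = x] by direct enumeration."""
    terms = [2 + 5*x + 7*x*y + 11*x**2 + 13*x**3 * z**2
             for y, z in product(vals, vals)]
    return sum(terms) / len(terms)   # Y, Z equally likely, independent of X

for x in vals:
    closed_form = 2 + 5*x + 11*x**2 + 13*x**3
    print(x, cond_exp(x), closed_form)   # the two agree for each x
    assert cond_exp(x) == closed_form
```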
  5. Summary: Linear and Non-Linear Regression; Conditional Expectation
◮ Linear Regression: $L[Y|X] = E[Y] + \frac{\operatorname{cov}(X, Y)}{\operatorname{var}(X)}(X - E[X])$
◮ Non-linear Regression (MMSE): $E[Y|X]$ minimizes $E[(Y - g(X))^2]$ over all $g(\cdot)$
◮ Definition: $E[Y|X] = g(X)$, where $g(x) := \sum_y y \Pr[Y = y | X = x]$
