
CS70: Lecture 35.

Regression (contd.): Linear and Beyond


  • 1. Review: Linear Regression (LR), LLSE
  • 2. LR: Examples
  • 3. Beyond LR: Quadratic Regression
  • 4. Conditional Expectation (CE) and properties
  • 5. Non-linear Regression: CE = Minimum Mean-Squared Error (MMSE)


Review: Linear Regression – Motivation

Example: 100 people. Let (Xn,Yn) = (height, weight) of person n, for n = 1,...,100:

[Figure: scatter plot of the 100 (X, Y) samples with the regression line.]

The blue line is Y = −114.3+106.5X. (X in meters, Y in kg.) Best linear fit: Linear Regression.


Review: Covariance

Definition: The covariance of X and Y is cov(X,Y) := E[(X − E[X])(Y − E[Y])].

Fact: cov(X,Y) = E[XY] − E[X]E[Y].
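The Fact is worth a one-line numeric check. A minimal Python sketch (the sample values below and the uniform-on-samples distribution are my own illustration, not from the slides):

```python
# Covariance computed two ways on a small invented sample,
# treating the empirical (uniform) distribution as the true one.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 1.0, 4.0, 5.0]

n = len(xs)
ex = sum(xs) / n                      # E[X]
ey = sum(ys) / n                      # E[Y]

# Definition: cov(X,Y) = E[(X - E[X])(Y - E[Y])]
cov_def = sum((x - ex) * (y - ey) for x, y in zip(xs, ys)) / n

# Fact: cov(X,Y) = E[XY] - E[X] E[Y]
exy = sum(x * y for x, y in zip(xs, ys)) / n
cov_fact = exy - ex * ey

print(cov_def, cov_fact)  # the two formulas agree
```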


Review: Examples of Covariance

Note that E[X] = 0 and E[Y] = 0 in these examples, so cov(X,Y) = E[XY].

When cov(X,Y) > 0, X and Y tend to be large or small together: X and Y are said to be positively correlated. When cov(X,Y) < 0, Y tends to be smaller when X is larger: X and Y are said to be negatively correlated. When cov(X,Y) = 0, we say that X and Y are uncorrelated.


Review: Linear Regression – Non-Bayesian

Definition: Given the samples {(Xn, Yn), n = 1,...,N}, the Linear Regression of Y over X is Ŷ = a + bX, where (a,b) minimize

∑_{n=1}^{N} (Yn − a − bXn)².

Thus, Ŷn = a + bXn is our guess about Yn given Xn. The squared error is (Yn − Ŷn)². The LR minimizes the sum of the squared errors. Note: this is a non-Bayesian formulation: there is no prior.


Review: Linear Least Squares Estimate (LLSE)

Definition: Given two RVs X and Y with known distribution Pr[X = x, Y = y], the Linear Least Squares Estimate of Y given X is Ŷ = a + bX =: L[Y|X], where (a,b) minimize g(a,b) := E[(Y − a − bX)²]. Thus, Ŷ = a + bX is our guess about Y given X. The squared error is (Y − Ŷ)². The LLSE minimizes the expected value of the squared error. Note: this is a Bayesian formulation: there is a prior.


Review: LR: Non-Bayesian or Uniform?

Observe that

(1/N) ∑_{n=1}^{N} (Yn − a − bXn)² = E[(Y − a − bX)²],

where one assumes that (X,Y) = (Xn, Yn) w.p. 1/N, for n = 1,...,N. That is, the non-Bayesian LR is equivalent to the Bayesian LLSE that assumes that (X,Y) is uniform on the set of observed samples.

Thus, we can study the two cases LR and LLSE in one shot. However, the interpretations are different!


Review: LLSE

Theorem: Consider two RVs X, Y with a given distribution Pr[X = x, Y = y]. Then

L[Y|X] = Ŷ = E[Y] + (cov(X,Y)/var(X)) (X − E[X]).

Non-Bayesian setting (all expectations are sample averages):

E[X] = (1/N) ∑_{n=1}^{N} Xn;   E[Y] = (1/N) ∑_{n=1}^{N} Yn;

var[X] = E[X²] − (E[X])² = (1/N) ∑_{n=1}^{N} Xn² − ((1/N) ∑_{n=1}^{N} Xn)²;

cov(X,Y) = E[XY] − E[X]E[Y] = (1/N) ∑_{n=1}^{N} XnYn − ((1/N) ∑_{n=1}^{N} Xn)((1/N) ∑_{n=1}^{N} Yn).
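In the non-Bayesian setting these sample averages are all the formula needs, so the whole regression fits in a few lines. A Python sketch (the data below is invented, chosen to lie exactly on a line so the fit is easy to check):

```python
# Linear regression via the LLSE formula:
#   L[Y|X] = E[Y] + (cov(X,Y)/var(X)) (X - E[X]),
# with expectations replaced by sample averages.
def linear_regression(xs, ys):
    n = len(xs)
    ex = sum(xs) / n
    ey = sum(ys) / n
    var_x = sum(x * x for x in xs) / n - ex ** 2
    cov_xy = sum(x * y for x, y in zip(xs, ys)) / n - ex * ey
    b = cov_xy / var_x          # slope
    a = ey - b * ex             # intercept: line passes through (E[X], E[Y])
    return a, b

# Invented data lying exactly on y = 2x + 1, so the fit recovers (1, 2).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a, b = linear_regression(xs, ys)
print(a, b)  # 1.0, 2.0
```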


LR: Illustration

Note that

◮ the LR line goes through (E[X], E[Y])
◮ its slope is cov(X,Y)/var(X).


Linear Regression: Examples


Linear Regression: Example 2

We find:

E[X] = 0; E[Y] = 0; E[X²] = 1/2; E[XY] = 1/2;
var[X] = E[X²] − E[X]² = 1/2; cov(X,Y) = E[XY] − E[X]E[Y] = 1/2;
LR: Ŷ = E[Y] + (cov(X,Y)/var[X]) (X − E[X]) = X.


Linear Regression: Example 3

We find:

E[X] = 0; E[Y] = 0; E[X²] = 1/2; E[XY] = −1/2;
var[X] = E[X²] − E[X]² = 1/2; cov(X,Y) = E[XY] − E[X]E[Y] = −1/2;
LR: Ŷ = E[Y] + (cov(X,Y)/var[X]) (X − E[X]) = −X.


Estimation Error

We saw that the LLSE of Y given X is L[Y|X] = Ŷ = E[Y] + (cov(X,Y)/var(X)) (X − E[X]). How good is this estimator? That is, what is the mean squared estimation error? We find

E[|Y − L[Y|X]|²] = E[(Y − E[Y] − (cov(X,Y)/var(X))(X − E[X]))²]
= E[(Y − E[Y])²] − 2(cov(X,Y)/var(X)) E[(Y − E[Y])(X − E[X])] + (cov(X,Y)/var(X))² E[(X − E[X])²]
= var(Y) − cov(X,Y)²/var(X).

Without observations, the best estimate is E[Y], and the error is var(Y). Observing X reduces the error.
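Under the uniform-on-samples distribution this identity holds exactly, which makes it easy to check in code. A Python sketch (data invented):

```python
# Check: mean squared residual of the fitted line equals
# var(Y) - cov(X,Y)^2 / var(X) for the empirical distribution.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 1.0, 3.0, 2.0, 5.0]
n = len(xs)
ex = sum(xs) / n
ey = sum(ys) / n
var_x = sum((x - ex) ** 2 for x in xs) / n
var_y = sum((y - ey) ** 2 for y in ys) / n
cov_xy = sum((x - ex) * (y - ey) for x, y in zip(xs, ys)) / n

b = cov_xy / var_x              # slope of the LLSE line
a = ey - b * ex                 # intercept
mse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / n

print(mse, var_y - cov_xy ** 2 / var_x)  # the two agree
```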


Wrap-up of Linear Regression

Linear Regression

  • 1. Linear Regression: L[Y|X] = E[Y] + (cov(X,Y)/var(X)) (X − E[X])
  • 2. Non-Bayesian: minimize ∑n (Yn − a − bXn)²
  • 3. Bayesian: minimize E[(Y − a − bX)²]

Beyond Linear Regression: Discussion

Goal: guess the value of Y, in the expected squared error sense.

We know nothing about Y other than its distribution. Our best guess is? E[Y].

Now assume we make some observation X related to Y. How do we use that observation to improve our guess about Y? Idea: use a function g(X) of the observation to estimate Y.

LR: restriction to linear functions g(X) = a + bX. With no such constraints, what is the best g(X)? Answer: E[Y|X]. This is called the Conditional Expectation (CE).


Nonlinear Regression: Motivation

There are many situations where a good guess about Y given X is not linear. E.g., (diameter of object, weight), (school years, income), (PSA level, cancer risk). Our goal: explore estimates Ŷ = g(X) for nonlinear functions g(·).


Quadratic Regression

Let X, Y be two random variables defined on the same probability space.

Definition: The quadratic regression of Y over X is the random variable Q[Y|X] = a + bX + cX², where a, b, c are chosen to minimize E[(Y − a − bX − cX²)²].

Derivation: We set to zero the derivatives w.r.t. a, b, c. We get

0 = E[Y − a − bX − cX²]
0 = E[(Y − a − bX − cX²) X]
0 = E[(Y − a − bX − cX²) X²]

We solve these three equations in the three unknowns (a, b, c).
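The three conditions are linear in (a, b, c): rearranged, they say E[Y] = a + bE[X] + cE[X²], E[XY] = aE[X] + bE[X²] + cE[X³], and E[X²Y] = aE[X²] + bE[X³] + cE[X⁴], a 3×3 system in the moments. A Python sketch under the uniform-on-samples interpretation (data invented; Cramer's rule is just one way to solve such a small system):

```python
# Quadratic regression Q[Y|X] = a + bX + cX^2 from the first-order conditions:
#   E[Y]     = a        + b E[X]   + c E[X^2]
#   E[XY]    = a E[X]   + b E[X^2] + c E[X^3]
#   E[X^2 Y] = a E[X^2] + b E[X^3] + c E[X^4]
# Moments are sample averages; the 3x3 system is solved by Cramer's rule.

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def quadratic_regression(xs, ys):
    n = len(xs)
    mom = lambda k: sum(x ** k for x in xs) / n                   # E[X^k]
    mxy = lambda k: sum(x ** k * y for x, y in zip(xs, ys)) / n   # E[X^k Y]
    M = [[1.0,    mom(1), mom(2)],
         [mom(1), mom(2), mom(3)],
         [mom(2), mom(3), mom(4)]]
    rhs = [mxy(0), mxy(1), mxy(2)]
    d = det3(M)
    coeffs = []
    for j in range(3):                 # Cramer's rule, column by column
        Mj = [row[:] for row in M]
        for i in range(3):
            Mj[i][j] = rhs[i]
        coeffs.append(det3(Mj) / d)
    return tuple(coeffs)               # (a, b, c)

# Invented data lying exactly on y = 1 - 2x + 3x^2, so the fit recovers it.
xs = [-1.0, 0.0, 1.0, 2.0]
ys = [1 - 2 * x + 3 * x ** 2 for x in xs]
a, b, c = quadratic_regression(xs, ys)
print(a, b, c)
```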


Conditional Expectation

Definition: Let X and Y be RVs on Ω. The conditional expectation of Y given X is defined as E[Y|X] = g(X), where

g(x) := E[Y|X = x] := ∑y y Pr[Y = y | X = x].
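For a distribution given as a table, g(x) is a direct sum: condition by restricting the table to X = x and renormalizing. A Python sketch with an invented joint pmf:

```python
# Conditional expectation g(x) = E[Y | X = x] from a joint distribution
# Pr[X = x, Y = y], given as a dictionary (probabilities invented).
joint = {(0, 0): 0.1, (0, 1): 0.3,
         (1, 0): 0.2, (1, 1): 0.4}

def cond_exp(joint, x):
    # Pr[X = x] by marginalizing; then E[Y|X=x] = sum_y y Pr[Y=y | X=x].
    px = sum(p for (xx, _), p in joint.items() if xx == x)
    return sum(y * p for (xx, y), p in joint.items() if xx == x) / px

print(cond_exp(joint, 0), cond_exp(joint, 1))  # g(0) and g(1)
```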


Deja vu, all over again?

Have we seen this before? Yes. Is anything new? Yes: the idea of defining g(x) = E[Y|X = x] and then E[Y|X] = g(X). Big deal? Quite! Simple but most convenient. Recall that L[Y|X] = a + bX is a function of X. This is similar: E[Y|X] = g(X) for some function g(·). In general, g(X) is not linear, i.e., not a + bX. It could be that g(X) = a + bX + cX², or that g(X) = 2 sin(4X) + exp{−3X}, or something else.


Properties of CE

E[Y|X = x] = ∑y y Pr[Y = y | X = x]

Theorem
(a) X, Y independent ⇒ E[Y|X] = E[Y];
(b) E[aY + bZ | X] = a E[Y|X] + b E[Z|X];
(c) E[Y h(X) | X] = h(X) E[Y|X], ∀ h(·);
(d) E[E[Y|X]] = E[Y].


Calculating E[Y|X]

Let X, Y, Z be i.i.d. with mean 0 and variance 1. We want to calculate E[2 + 5X + 7XY + 11X² + 13X³Z² | X]. We find

E[2 + 5X + 7XY + 11X² + 13X³Z² | X]
= 2 + 5X + 7X E[Y|X] + 11X² + 13X³ E[Z²|X]
= 2 + 5X + 7X E[Y] + 11X² + 13X³ E[Z²]
= 2 + 5X + 11X² + 13X³ (var[Z] + E[Z]²)
= 2 + 5X + 11X² + 13X³.


CE = MMSE

(Conditional Expectation = Minimum Mean Squared Error)

Theorem: g(X) := E[Y|X] is the function of X that minimizes E[(Y − g(X))²]. That is, E[Y|X] is the 'best' guess about Y based on X, in the mean squared error sense.
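A concrete instance of the theorem: with Y = X² and X symmetric around 0, cov(X,Y) = 0, so the LLSE is the useless constant E[Y], while the CE recovers Y exactly. A Python sketch on an invented discrete distribution:

```python
# Compare mean squared errors of the CE E[Y|X] and the LLSE L[Y|X]
# on an invented joint distribution where Y = X^2 and X is symmetric.
joint = {(-1, 1): 0.25, (0, 0): 0.5, (1, 1): 0.25}

ex = sum(x * p for (x, _), p in joint.items())
ey = sum(y * p for (_, y), p in joint.items())
var_x = sum((x - ex) ** 2 * p for (x, _), p in joint.items())
cov = sum((x - ex) * (y - ey) * p for (x, y), p in joint.items())

def llse(x):
    return ey + (cov / var_x) * (x - ex)   # here cov = 0: constant E[Y]

def ce(x):                                 # E[Y | X = x]
    px = sum(p for (xx, _), p in joint.items() if xx == x)
    return sum(y * p for (xx, y), p in joint.items() if xx == x) / px

mse_llse = sum((y - llse(x)) ** 2 * p for (x, y), p in joint.items())
mse_ce = sum((y - ce(x)) ** 2 * p for (x, y), p in joint.items())
print(mse_ce, mse_llse)  # CE error <= LLSE error
```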


Summary

Linear and Non-Linear Regression: Conditional Expectation

◮ Linear Regression: L[Y|X] = E[Y] + (cov(X,Y)/var(X)) (X − E[X])
◮ Non-linear Regression (MMSE): E[Y|X] minimizes E[(Y − g(X))²] over all g(·)
◮ Definition: E[Y|X] = g(X), where g(x) := ∑y y Pr[Y = y | X = x]