Fitting a Line, Residuals, and Correlation (PowerPoint presentation, October 28, 2019)

SLIDE 1

Fitting a Line, Residuals, and Correlation

October 28, 2019

October 28, 2019 1 / 36

SLIDE 2

Fitting a Line to Data

In this section, we will talk about fitting a line to data. Linear regression will allow us to look at relationships between two (or more) variables. This is a bit like ANOVA, but now we will be able to predict outcomes.

Section 8.1 October 28, 2019 2 / 36

SLIDE 3

Fitting a Line to Data

This relationship can be modeled perfectly with a straight line: y = 5 + 64.96x. That is, x and y are perfectly correlated.

SLIDE 4

Fitting a Line to Data

When we can model a relationship perfectly, y = 5 + 64.96x, we know the exact value of y just by knowing the value of x. However, this kind of perfect relationship is pretty unrealistic... it’s also pretty uninteresting.

SLIDE 5

Linear Regression

Linear regression takes this idea of fitting a line and allows for some error: y = β0 + β1x + ε. β0 and β1 are the model's parameters. The error is represented by ε.

SLIDE 6

Linear Regression

The parameters β0 and β1 are estimated using data. We denote these point estimates by b0 and b1, or sometimes β̂0 and β̂1.

SLIDE 7

Linear Regression

For a regression line y = β0 + β1x + ε, we make predictions about y using values of x. y is called the response variable. x is called the predictor variable.

SLIDE 8

Linear Regression

When we find our point estimates b0 and b1, we usually write the line as ŷ = b0 + b1x. We drop the error term because it is a random, unknown quantity. Instead we focus on ŷ, the predicted value for y.
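As a sketch of how these point estimates are computed, the standard least-squares formulas can be applied by hand. The data below are hypothetical, generated from the earlier perfect line y = 5 + 64.96x, so the fit should recover the intercept and slope exactly:

```python
# Least-squares estimates b0 and b1 computed directly from the formulas.
# Hypothetical data generated from the slides' perfect line y = 5 + 64.96x.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [5 + 64.96 * xi for xi in x]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Slope: b1 = sum((xi - x̄)(yi - ȳ)) / sum((xi - x̄)^2)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
     sum((xi - x_bar) ** 2 for xi in x)
# Intercept: b0 = ȳ - b1 * x̄
b0 = y_bar - b1 * x_bar

print(b0, b1)  # ≈ 5.0 and 64.96, since the data lie exactly on the line
```

With noisy data the same formulas give the line that best fits the cloud of points rather than an exact recovery.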

SLIDE 9

Linear Regression

As with any line, the intercept and slope are meaningful. The slope β1 is the change in y for every one-unit change in x. The intercept β0 is the predicted value for y when x = 0.

SLIDE 10

Clouds of Points

SLIDE 11

Clouds of Points

Think of this like the 2-dimensional version of a point estimate. The line gives our best estimate of the relationship. There is some variability in the data that will impact our confidence in our estimates. The true relationship is unknown.

SLIDE 12

Linear Trends

Sometimes, there is a clear relationship but simple linear regression won’t work! We will talk about this later in the term.

SLIDE 13

Prediction

Often, when we build a regression model our goal is prediction. We want to use information about the predictor variable to make predictions about the response variable.

SLIDE 14

Example: Possum Head Lengths

Remember our brushtail possums?

SLIDE 15

Example: Possum Head Lengths

Researchers captured 104 brushtail possums and took a variety of body measurements on each before releasing them back into the wild. We consider two measurements for each possum: total body length and head length.

SLIDE 16

Example: Possum Head Lengths

SLIDE 17

Example: Possum Head Lengths

The relationship isn’t perfectly linear. However, there does appear to be a linear relationship. We want to try to use body length to predict head length.

SLIDE 18

Example: Possum Head Lengths

The textbook gives the following linear relationship: ŷ = 41 + 0.59x. As always, the hat denotes an estimate of some unknown true value.

SLIDE 19

Example: Possum Head Lengths

Predict the head length for a possum with a body length of 80 cm.
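Plugging x = 80 into the textbook's fitted line ŷ = 41 + 0.59x gives the prediction directly. The function name below is chosen here for illustration:

```python
def predict_head_length(body_length_cm):
    """Predicted head length from the textbook's fitted line ŷ = 41 + 0.59x."""
    return 41 + 0.59 * body_length_cm

# A possum with a body length of 80 cm:
print(predict_head_length(80))  # ≈ 88.2
```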

SLIDE 20

Example: Possum Head Lengths

If we had more information (other variables), we could probably get a better estimate. We might be interested in including sex, region, diet, or others. Absent additional information, our prediction is a reasonable estimate.

SLIDE 21

Residuals

Residuals are the leftover variation in the data after accounting for model fit: data = prediction + residual Each observation will have its own residual.

SLIDE 22

Residuals

Formally, we define the residual of the ith observation (xi, yi) as the difference between the observed value (yi) and the expected value (ŷi): ei = yi − ŷi. We denote the residuals by ei and find ŷi by plugging xi into the fitted line.

SLIDE 23

Residuals

If an observation lands above the regression line, ei = yi − ŷi > 0. If it lands below, ei = yi − ŷi < 0.

SLIDE 24

Residuals

When we estimate the parameters for the regression, our goal is to get each residual as close to 0 as possible.
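One common way to make this goal precise, least squares, chooses b0 and b1 to minimize the sum of squared residuals; the slides do not spell out the criterion here, so treat this as a sketch on hypothetical data. Any other line produces a larger sum of squared residuals than the least-squares fit:

```python
# Compare the least-squares line against slightly perturbed lines
# on small hypothetical data (not the possum dataset).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
     sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

def sse(intercept, slope):
    """Sum of squared residuals for the line y = intercept + slope * x."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# The least-squares fit beats any perturbed version of itself.
assert sse(b0, b1) <= sse(b0 + 0.1, b1)
assert sse(b0, b1) <= sse(b0, b1 + 0.1)
```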

SLIDE 25

Example: Possum Head Lengths

The residual for each observation is the vertical distance between the line and the observation.

SLIDE 26

Example: Possum Head Lengths

The scatterplot is nice, but a calculation is always more precise. Let’s find the residual for the observation (77.0, 85.3).
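Carrying out that calculation with the fitted line ŷ = 41 + 0.59x, a sketch of the arithmetic:

```python
# Residual for the observation (x, y) = (77.0, 85.3) under ŷ = 41 + 0.59x
x_i, y_i = 77.0, 85.3
y_hat = 41 + 0.59 * x_i   # predicted value, ≈ 86.43
e_i = y_i - y_hat         # residual, ≈ -1.13

print(y_hat, e_i)
```

The residual is negative, so this possum's head length falls below the regression line.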

SLIDE 27

Residual Plots

Our goal is to get our residuals as close as possible to 0. Residuals are a good way to examine how well a linear model fits a data set. We can examine these quickly using a residual plot.

SLIDE 28

Residual Plots

Residual plots show the x-values plotted against their residuals.

SLIDE 29

Residual Plots

We use residual plots to identify characteristics or patterns that are still apparent even after fitting the model. Obvious patterns suggest problems with our model fit.
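As a sketch of what such a pattern looks like, fitting a straight line to clearly curved hypothetical data leaves residuals that are positive at both ends and negative in the middle, a telltale U-shape in the residual plot:

```python
# Fit a straight line to a quadratic trend; the residuals show a
# systematic pattern rather than random scatter around 0.
x = list(range(11))            # 0, 1, ..., 10
y = [xi ** 2 for xi in x]      # curved (quadratic) hypothetical data

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
     sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Positive at both ends, negative in the middle: leftover structure.
print(residuals[0], residuals[5], residuals[10])  # → 15.0 -10.0 15.0
```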

SLIDE 30

Residual Plots

SLIDE 31

Correlation

We’ve talked about the strength of linear relationships, but it would be nice to formalize this concept. The correlation between two variables describes the strength of their linear relationship. It always takes values between -1 and 1.

SLIDE 32

Correlation

We denote the correlation (or correlation coefficient) by R:

R = (1 / (n − 1)) Σⁿᵢ₌₁ ((xi − x̄) / sx) × ((yi − ȳ) / sy)

where sx and sy are the respective standard deviations for x and y.
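The formula can be computed step by step. The data below are hypothetical, chosen to show an upward-trending cloud:

```python
import math

# Correlation coefficient R computed directly from the slide's formula.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 2.5, 3.9, 4.1, 5.8]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Sample standard deviations (divide by n - 1)
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

# R averages the products of the standardized x and y deviations
R = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
        for xi, yi in zip(x, y)) / (n - 1)

print(R)  # between -1 and 1; close to +1 for this upward-trending data
```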

SLIDE 33

Correlation

Correlations close to −1 suggest strong, negative linear relationships. Correlations close to +1 suggest strong, positive linear relationships. Correlations close to 0 suggest little to no linear relationship.

SLIDE 34

Correlation

Note: the sign of the correlation will match the sign of the slope! If R < 0, there is a downward trend and b1 < 0. If R > 0, there is an upward trend and b1 > 0. If R ≈ 0, there is no linear relationship and b1 ≈ 0.

SLIDE 35

Correlation

SLIDE 36

Correlations

Correlations only represent linear trends!
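A classic illustration of this point, sketched on hypothetical data: y is perfectly determined by x through y = x², yet the correlation is essentially 0 because the relationship is not linear:

```python
import math

# A perfect but nonlinear relationship with correlation ~0.
# Symmetric hypothetical data: y = x^2 for x from -3 to 3.
x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [xi ** 2 for xi in x]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

R = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
        for xi, yi in zip(x, y)) / (n - 1)

print(R)  # ~0, even though y is exactly determined by x
```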
