
CS 147: Computer Systems Performance Analysis - Linear Regression Models



  1. CS 147: Computer Systems Performance Analysis - Linear Regression Models (title slide)

  2. Overview
     ◮ What is a (good) model?
     ◮ Estimating Model Parameters
     ◮ Allocating Variation
     ◮ Confidence Intervals for Regressions
       ◮ Parameter Intervals
       ◮ Prediction Intervals
     ◮ Verifying Regression

  3. What Is a (Good) Model?
     ◮ For correlated data, the model predicts the response given an input
     ◮ The model should be an equation that fits the data
     ◮ The standard definition of "fits" is least-squares:
       ◮ Minimize the squared error
       ◮ Keep the mean error zero
       ◮ Minimize the variance of the errors

  4. Least-Squared Error
     ◮ If $\hat{y} = b_0 + b_1 x$, then the error in the estimate for $x_i$ is $e_i = y_i - \hat{y}_i$
     ◮ Minimize the Sum of Squared Errors (SSE):
       $\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$
     ◮ Subject to the constraint:
       $\sum_{i=1}^{n} e_i = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0$
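The closed-form estimates on the next slide follow from this minimization. As a bridging step that the slides skip, setting the partial derivatives of SSE with respect to $b_0$ and $b_1$ to zero gives the usual normal equations:

```latex
% Standard least-squares derivation (not shown on the slides):
\frac{\partial \mathit{SSE}}{\partial b_0}
  = -2 \sum_{i=1}^{n} \left(y_i - b_0 - b_1 x_i\right) = 0
\qquad
\frac{\partial \mathit{SSE}}{\partial b_1}
  = -2 \sum_{i=1}^{n} x_i \left(y_i - b_0 - b_1 x_i\right) = 0
```

The first equation is exactly the zero-mean-error constraint above; solving the pair for $b_0$ and $b_1$ yields the formulas on the next slide.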

  5. Estimating Model Parameters
     ◮ The best regression parameters are
       $b_1 = \dfrac{\sum x_i y_i - n \bar{x} \bar{y}}{\sum x_i^2 - n \bar{x}^2}$, $\quad b_0 = \bar{y} - b_1 \bar{x}$
       where $\bar{x} = \frac{1}{n} \sum x_i$ and $\bar{y} = \frac{1}{n} \sum y_i$
     ◮ Note that the book may have errors in these equations!
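A minimal Python sketch of these two formulas (the function name fit_line and the argument names are mine, not from the slides):

```python
def fit_line(x, y):
    """Least-squares estimates for y-hat = b0 + b1*x, using the formulas on slide 5."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    num = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar  # sum(x*y) - n*xbar*ybar
    den = sum(xi * xi for xi in x) - n * x_bar * x_bar              # sum(x^2) - n*xbar^2
    b1 = num / den
    b0 = y_bar - b1 * x_bar
    return b0, b1
```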

  6. Parameter Estimation Example
     ◮ Execution time of a script for various loop counts:

       Loops | 3   | 5   | 7   | 9   | 10
       Time  | 1.2 | 1.7 | 2.5 | 2.9 | 3.3

     ◮ $\bar{x} = 6.8$, $\bar{y} = 2.32$, $\sum xy = 88.54$, $\sum x^2 = 264$
     ◮ $b_1 = \dfrac{88.54 - 5(6.8)(2.32)}{264 - 5(6.8)^2} = 0.29$
     ◮ $b_0 = 2.32 - (0.29)(6.8) = 0.35$
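Applying the hypothetical fit_line sketch above to the slide's table gives approximately the same fit; small differences come from rounding:

```python
loops = [3, 5, 7, 9, 10]
times = [1.2, 1.7, 2.5, 2.9, 3.3]
b0, b1 = fit_line(loops, times)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
# Prints roughly b0 = 0.28, b1 = 0.30 when computed from the rounded table values.
# The slide reports 0.35 and 0.29 because its sums (e.g. sum(xy) = 88.54) apparently
# come from unrounded measurements, and it rounds b1 to 0.29 before computing b0.
```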

  7. Graph of Parameter Estimation Example
     [Scatter plot of the five (loops, time) data points with the fitted line; x axis 0 to 12, y axis 0 to 3]
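The original slide is just the plot; a matplotlib sketch that would reproduce it (variable names and styling are my own assumptions):

```python
import matplotlib.pyplot as plt

loops = [3, 5, 7, 9, 10]
times = [1.2, 1.7, 2.5, 2.9, 3.3]
b0, b1 = 0.35, 0.29  # parameters estimated on slide 6

plt.scatter(loops, times, label="measured")
plt.plot([0, 12], [b0, b0 + b1 * 12], label="fit: 0.35 + 0.29x")
plt.xlim(0, 12)
plt.ylim(0, 3.5)
plt.xlabel("Loops")
plt.ylabel("Time")
plt.legend()
plt.show()
```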

  8. Allocating Variation
     Analysis of Variation (ANOVA):
     ◮ With no regression, the best guess of $y$ is $\bar{y}$
     ◮ Observed values of $y$ differ from $\bar{y}$, giving rise to errors (variance)
     ◮ Regression gives a better guess, but there are still errors
     ◮ We can evaluate the quality of the regression by allocating the sources of error

  9. The Total Sum of Squares
     Without regression, the squared error is
     $$\begin{aligned}
     \mathit{SST} &= \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} \left(y_i^2 - 2 y_i \bar{y} + \bar{y}^2\right) \\
                  &= \left(\sum_{i=1}^{n} y_i^2\right) - 2\bar{y}\left(\sum_{i=1}^{n} y_i\right) + n\bar{y}^2 \\
                  &= \left(\sum_{i=1}^{n} y_i^2\right) - 2\bar{y}(n\bar{y}) + n\bar{y}^2 \\
                  &= \left(\sum_{i=1}^{n} y_i^2\right) - n\bar{y}^2 \\
                  &= \mathit{SSY} - \mathit{SS0}
     \end{aligned}$$
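A quick numeric check of the last step, SST = SSY - SS0, on the example data (a sketch, not part of the slides):

```python
y = [1.2, 1.7, 2.5, 2.9, 3.3]
n = len(y)
y_bar = sum(y) / n
sst = sum((yi - y_bar) ** 2 for yi in y)   # definition: sum of (y_i - y_bar)^2
ssy = sum(yi ** 2 for yi in y)             # SSY = sum of y_i^2
ss0 = n * y_bar ** 2                       # SS0 = n * y_bar^2
assert abs(sst - (ssy - ss0)) < 1e-9       # SST = SSY - SS0
```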

  10. The Sum of Squares from Regression
      ◮ Recall that the regression error is $\mathit{SSE} = \sum e_i^2 = \sum (y_i - \hat{y}_i)^2$
      ◮ The error without regression is SST (previous slide)
      ◮ So the regression explains $\mathit{SSR} = \mathit{SST} - \mathit{SSE}$
      ◮ Regression quality is measured by the coefficient of determination
        $R^2 = \dfrac{\mathit{SSR}}{\mathit{SST}} = \dfrac{\mathit{SST} - \mathit{SSE}}{\mathit{SST}}$
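The slides define SSR as whatever part of SST the regression explains; why SST splits cleanly into an explained and an unexplained part is not shown there. The standard identity for a least-squares fit with an intercept (the cross term vanishes) is:

```latex
% Standard decomposition, assuming a least-squares fit with an intercept,
% where the cross term sum (y_i - yhat_i)(yhat_i - ybar) is zero:
\sum_{i=1}^{n} (y_i - \bar{y})^2
  = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2,
\qquad \text{i.e.} \qquad
\mathit{SST} = \mathit{SSR} + \mathit{SSE}
```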

  11. Evaluating Coefficient of Determination
      ◮ Compute $\mathit{SST} = \left(\sum y^2\right) - n\bar{y}^2$
      ◮ Compute $\mathit{SSE} = \sum y^2 - b_0 \sum y - b_1 \sum xy$
      ◮ Compute $R^2 = \dfrac{\mathit{SST} - \mathit{SSE}}{\mathit{SST}}$
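A minimal sketch of these three steps in Python (the function name coefficient_of_determination is my own):

```python
def coefficient_of_determination(x, y, b0, b1):
    """R^2 via the shortcut formulas on slide 11."""
    n = len(y)
    y_bar = sum(y) / n
    sum_y = sum(y)
    sum_y2 = sum(yi * yi for yi in y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sst = sum_y2 - n * y_bar ** 2              # SST = sum(y^2) - n*ybar^2
    sse = sum_y2 - b0 * sum_y - b1 * sum_xy    # SSE = sum(y^2) - b0*sum(y) - b1*sum(xy)
    return (sst - sse) / sst
```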

  12. Example of Coefficient of Determination
      For the previous regression example:

        Loops | 3   | 5   | 7   | 9   | 10
        Time  | 1.2 | 1.7 | 2.5 | 2.9 | 3.3

      ◮ $\sum y = 11.60$, $\sum y^2 = 29.79$, $\sum xy = 88.54$, $n\bar{y}^2 = 5(2.32)^2 = 26.9$
      ◮ $\mathit{SSE} = 29.79 - (0.35)(11.60) - (0.29)(88.54) = 0.05$
      ◮ $\mathit{SST} = 29.79 - 26.9 = 2.89$
      ◮ $\mathit{SSR} = 2.89 - 0.05 = 2.84$
      ◮ $R^2 = (2.89 - 0.05)/2.89 = 0.98$
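Recomputing the slide's example from its reported sums, as a check of the arithmetic (the slide rounds $n\bar{y}^2$ to 26.9, so its intermediate values differ slightly):

```python
# Sums as reported on the slide (they apparently reflect unrounded measurements).
sum_y, sum_y2, sum_xy = 11.60, 29.79, 88.54
n, y_bar, b0, b1 = 5, 2.32, 0.35, 0.29
sst = sum_y2 - n * y_bar ** 2            # 2.878 (slide rounds n*ybar^2 to 26.9 and reports 2.89)
sse = sum_y2 - b0 * sum_y - b1 * sum_xy  # about 0.053 (slide reports 0.05)
r2 = (sst - sse) / sst                   # about 0.98, matching the slide
```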

  13. Standard Deviation of Errors
      ◮ The variance of the errors is SSE divided by the degrees of freedom
      ◮ DOF is $n - 2$ because we've calculated 2 regression parameters from the data
      ◮ So the variance (mean squared error, MSE) is $\mathit{SSE}/(n-2)$
      ◮ The standard deviation of the errors is the square root: $s_e = \sqrt{\dfrac{\mathit{SSE}}{n-2}}$ (minor error in book)
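A one-line Python sketch of this (the function name stddev_of_errors is mine):

```python
import math

def stddev_of_errors(sse, n):
    """s_e = sqrt(SSE / (n - 2)); the n - 2 reflects the two estimated parameters."""
    return math.sqrt(sse / (n - 2))
```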

  14. Checking Degrees of Freedom
      Degrees of freedom always equate:
      ◮ SS0 has 1 (computed from $\bar{y}$)
      ◮ SST has $n - 1$ (computed from the data and $\bar{y}$, which uses up 1)
      ◮ SSE has $n - 2$ (needs 2 regression parameters)
      ◮ So
        $\mathit{SST} = \mathit{SSY} - \mathit{SS0} = \mathit{SSR} + \mathit{SSE}$
        $n - 1 = n - 1 = 1 + (n - 2)$

  15. Example of Standard Deviation of Errors
      ◮ For the regression example, SSE was 0.05, so MSE is $0.05/3 = 0.017$ and $s_e = 0.13$
      ◮ Note the high quality of our regression:
        ◮ $R^2 = 0.98$
        ◮ $s_e = 0.13$
      ◮ Why such a nice straight-line fit?
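Checking the slide's numbers with the hypothetical stddev_of_errors sketch above:

```python
mse = 0.05 / 3                  # about 0.017, as on the slide
se = stddev_of_errors(0.05, 5)  # about 0.13, as on the slide
```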
