CS 147: Computer Systems Performance Analysis - Summarizing Variability and Determining Distributions

  1. CS 147: Computer Systems Performance Analysis - Summarizing Variability and Determining Distributions (2015-06-15)

  2. Overview
     ◮ Introduction
     ◮ Indices of Dispersion
       ◮ Range
       ◮ Variance, Standard Deviation, C.V.
       ◮ Quantiles
       ◮ Miscellaneous Measures
       ◮ Choosing a Measure
     ◮ Identifying Distributions
       ◮ Histograms
       ◮ Kernel Density Estimation
       ◮ Quantile-Quantile Plots
     ◮ Statistics of Samples
       ◮ Meaning of a Sample
       ◮ Guessing the True Value

  3. Summarizing Variability
     ◮ A single number rarely tells the entire story of a data set
     ◮ Usually, you need to know how much the rest of the data set varies from that index of central tendency

  4. Why Is Variability Important?
     ◮ Consider two Web servers:
       ◮ Server A services all requests in 1 second
       ◮ Server B services 90% of all requests in 0.5 seconds
       ◮ But 10% in 5.5 seconds
     ◮ Both have mean service times of 1 second
     ◮ But which would you prefer to use?
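The two-server comparison can be checked numerically. The sketch below is only an illustration: it models ten requests per server and assumes the slow 10% of Server B's requests take 5.5 seconds, which is the value that yields the stated 1-second mean (0.9 × 0.5 + 0.1 × 5.5 = 1.0). The variable names are hypothetical.

```python
import statistics

# Server A: every request takes 1 second.
server_a = [1.0] * 10

# Server B: 9 of 10 requests take 0.5 s; 1 of 10 takes 5.5 s
# (assumed value, chosen so that the mean comes out to 1 second).
server_b = [0.5] * 9 + [5.5]

mean_a = statistics.mean(server_a)
mean_b = statistics.mean(server_b)

# Sample standard deviation: identical means, very different spread.
stdev_a = statistics.stdev(server_a)
stdev_b = statistics.stdev(server_b)

print(f"Server A: mean={mean_a:.2f} s, stdev={stdev_a:.2f} s")
print(f"Server B: mean={mean_b:.2f} s, stdev={stdev_b:.2f} s")
```

Both means agree, so only a measure of dispersion distinguishes the two servers.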

  5. Indices of Dispersion
     ◮ Measures of how much a data set varies
       ◮ Range
       ◮ Variance and standard deviation
       ◮ Percentiles
       ◮ Semi-interquartile range
       ◮ Mean absolute deviation

  6. Range
     ◮ Minimum & maximum values in data set
     ◮ Can be tracked as data values arrive
     ◮ Variability characterized by difference between minimum and maximum
     ◮ Often not useful, due to outliers
     ◮ Minimum tends to go to zero
     ◮ Maximum tends to increase over time
     ◮ Not useful for unbounded variables

  7. Example of Range
     ◮ For data set 2, 5.4, -17, 2056, 445, -4.8, 84.3, 92, 27, -10
     ◮ Maximum is 2056
     ◮ Minimum is -17
     ◮ Range is 2073
     ◮ While arithmetic mean is 268
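Because the range depends only on the smallest and largest values seen so far, it can be maintained incrementally as observations arrive. The following is a minimal sketch of that idea applied to the example data set; the `RunningRange` class is a hypothetical helper, not something from the course material.

```python
class RunningRange:
    """Tracks the minimum and maximum of a stream, one value at a time."""

    def __init__(self):
        self.minimum = None
        self.maximum = None

    def update(self, value):
        # Only two comparisons per observation; no history is stored.
        if self.minimum is None or value < self.minimum:
            self.minimum = value
        if self.maximum is None or value > self.maximum:
            self.maximum = value

    @property
    def range(self):
        # Undefined until at least one value has been seen.
        if self.minimum is None:
            return None
        return self.maximum - self.minimum


data = [2, 5.4, -17, 2056, 445, -4.8, 84.3, 92, 27, -10]

tracker = RunningRange()
for x in data:
    tracker.update(x)

print(tracker.minimum, tracker.maximum, tracker.range)
```

A single outlier such as 2056 fixes the maximum permanently, which is why the range tends to be dominated by extreme values.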

  8. Variance (and Its Cousins)
     ◮ Sample variance is $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
     ◮ Expressed in units of the measured quantity, squared
       ◮ Which isn't always easy to understand
     ◮ Standard deviation and coefficient of variation are derived from variance

  9. Variance Example
     ◮ For data set 2, 5.4, -17, 2056, 445, -4.8, 84.3, 92, 27, -10
     ◮ Variance is 413746.6
     ◮ You can see the problem with variance:
       ◮ Given a mean of 268, what does that variance indicate?
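The variance figure can be reproduced directly from the sample variance definition (sum of squared deviations from the mean, divided by n − 1). This is a small sketch; the cross-check against Python's standard `statistics.variance` is an addition for illustration, not part of the slides.

```python
import statistics

data = [2, 5.4, -17, 2056, 445, -4.8, 84.3, 92, 27, -10]

n = len(data)
mean = sum(data) / n

# Sample variance: squared deviations from the mean, divided by n - 1.
sample_variance = sum((x - mean) ** 2 for x in data) / (n - 1)

# Cross-check with the standard library's sample variance.
library_variance = statistics.variance(data)

print(f"mean = {mean:.1f}")
print(f"sample variance = {sample_variance:.1f}")
```

The result is in squared units of the measurement, which is why it is hard to relate to a mean of about 268.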

  10. Standard Deviation
     ◮ Square root of the variance
     ◮ In same units as units of metric
     ◮ So easier to compare to metric

  11. Standard Deviation Example
     ◮ For sample set we've been using, standard deviation is 643
     ◮ Given mean of 268, standard deviation clearly shows lots of variability from mean

  12. Coefficient of Variation
     ◮ Ratio of standard deviation to mean
     ◮ Normalizes units of these quantities into ratio or percentage
     ◮ Often abbreviated C.O.V. or C.V.

  13. Coefficient of Variation Example
     ◮ For sample set we've been using, standard deviation is 643
     ◮ Mean is 268
     ◮ So C.O.V. is 643 / 268 ≈ 2.4
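The standard deviation and coefficient of variation for the running example can be computed in a few lines. This sketch uses Python's standard `statistics` module (sample standard deviation, with n − 1 in the denominator); the variable names are illustrative.

```python
import statistics

data = [2, 5.4, -17, 2056, 445, -4.8, 84.3, 92, 27, -10]

mean = statistics.mean(data)
std_dev = statistics.stdev(data)  # square root of the sample variance

# Coefficient of variation: a unitless ratio of spread to mean.
cov = std_dev / mean

print(f"mean = {mean:.0f}, std dev = {std_dev:.0f}, C.O.V. = {cov:.1f}")
```

Note that the ratio is only meaningful when the mean is not close to zero.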

  14. Percentiles
     ◮ Specification of how observations fall into buckets
     ◮ E.g., 5-percentile is observation that is at the lower 5% of the set
     ◮ While 95-percentile is observation at the 95% boundary
     ◮ Useful even for unbounded variables

  15. Relatives of Percentiles
     ◮ Quantiles: fraction between 0 and 1
       ◮ Instead of percentage
       ◮ Also called fractiles
     ◮ Deciles: percentiles at 10% boundaries
       ◮ First is 10-percentile, second is 20-percentile, etc.
     ◮ Quartiles: divide data set into four parts
       ◮ 25% of sample below first quartile, etc.
       ◮ Second quartile is also median

  16. Calculating Quantiles
     To estimate α-quantile:
     ◮ First sort the set
     ◮ Then take [(n − 1)α + 1]th element
       ◮ 1-indexed
       ◮ Round to nearest integer index
     ◮ Exception: for small sets, may be better to choose "intermediate" value as is done for median

  17. Quartile Example
     ◮ For data set 2, 5.4, -17, 2056, 445, -4.8, 84.3, 92, 27, -10 (10 observations)
     ◮ Sort it: -17, -10, -4.8, 2, 5.4, 27, 84.3, 92, 445, 2056
     ◮ First quartile, Q1, is -4.8
     ◮ Third quartile, Q3, is 92
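The quantile rule (sort, then take the [(n − 1)α + 1]th element, 1-indexed, rounded to the nearest integer) can be sketched as below. The function name `estimate_quantile` is hypothetical, and rounding exact halves upward is an assumption, since the slides do not say how ties are broken. The sketch does not implement the "intermediate value" exception for small sets.

```python
import math


def estimate_quantile(values, alpha):
    """Estimate the alpha-quantile using the [(n - 1) * alpha + 1]th element rule."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must be between 0 and 1")

    ordered = sorted(values)
    n = len(ordered)

    # 1-indexed position in the sorted data.
    position = (n - 1) * alpha + 1

    # Round to the nearest integer index; exact halves round up (assumption).
    index = math.floor(position + 0.5)

    # Convert from 1-indexed to Python's 0-indexed lists.
    return ordered[index - 1]


data = [2, 5.4, -17, 2056, 445, -4.8, 84.3, 92, 27, -10]

q1 = estimate_quantile(data, 0.25)  # position 3.25 -> 3rd element
q3 = estimate_quantile(data, 0.75)  # position 7.75 -> 8th element

print(q1, q3)
```

For n = 10, the positions are 3.25 and 7.75, which round to the 3rd and 8th sorted elements, matching the Q1 and Q3 values in the example.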
