CS 147: Computer Systems Performance Analysis: Summarizing Data

CS147 2015-06-15

Overview
“Standard” Indices of Central Tendency
  Definitions
  Characteristics
  Selecting an Index
Other Indices
  Geometric Mean
  Harmonic Mean
Dealing with Ratios
  Case 1: Two Physical Meanings
  Case 1a: Constant Denominator
  Case 1b: Constant Numerator
  Case 2: Multiplicative Relationship

Summarizing Data With a Single Number
◮ Most condensed form of presentation of set of data
◮ Usually called the average
  ◮ Average isn’t necessarily the mean
◮ Must be representative of a major part of the data set

Indices of Central Tendency
◮ Mean
◮ Median
◮ Mode
◮ All specify center of location of distribution of observations in sample

Sample Mean
◮ Take sum of all observations
◮ Divide by number of observations
◮ More affected by outliers than median or mode
◮ Mean is a linear property
  ◮ Mean of sum is sum of means
  ◮ Not true for median and mode
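The linearity property can be checked numerically. The following is a minimal sketch; the two lists are hypothetical paired measurements (not data from the slides), chosen so that the mean of the sums matches the sum of the means while the medians do not.

```python
from statistics import mean, median

# Hypothetical paired measurements (ms) for two stages of the same three requests.
cpu_ms = [1, 2, 9]
io_ms = [9, 1, 2]

# Per-request totals: [10, 3, 11]
total_ms = [c + d for c, d in zip(cpu_ms, io_ms)]

# Linearity: the mean of the sum equals the sum of the means.
mean_of_sum = mean(total_ms)
sum_of_means = mean(cpu_ms) + mean(io_ms)

# The same identity does not hold for the median in general.
median_of_sum = median(total_ms)
sum_of_medians = median(cpu_ms) + median(io_ms)

print(mean_of_sum, sum_of_means)      # equal
print(median_of_sum, sum_of_medians)  # differ for this data
```

The median can happen to be additive for particular data sets, but unlike the mean it is not guaranteed to be.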

Sample Median
◮ Sort observations
◮ Take observation in middle of series
  ◮ If even number, split the difference
◮ More resistant to outliers
  ◮ But not all points given “equal weight”
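The procedure above can be sketched as a short function. The name `sample_median` and the example values are illustrative assumptions; Python's standard `statistics.median` does the same job.

```python
from statistics import mean

def sample_median(observations):
    """Return the middle observation, averaging the two middle ones if n is even."""
    ordered = sorted(observations)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Even count: split the difference between the two middle observations.
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical sample with one large outlier.
with_outlier = [1, 2, 3, 4, 1000]
print(sample_median(with_outlier))  # stays near the bulk of the data
print(mean(with_outlier))           # pulled far upward by the outlier
```

Only the rank of the outlier matters to the median, not its magnitude, which is the sense in which not all points carry equal weight.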

Sample Mode
◮ Plot histogram of observations
  ◮ Using existing categories
  ◮ Or dividing ranges into buckets
  ◮ Or using kernel density estimation
◮ Choose midpoint of bucket where histogram peaks
  ◮ For categorical variables, the most frequently occurring
◮ Effectively ignores much of the sample
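A rough sketch of the bucket-based approach follows. The function names, the equal-width bucketing, and the sample values are assumptions for illustration; kernel density estimation is not shown.

```python
from collections import Counter

def histogram_mode(observations, low, high, buckets):
    """Midpoint of the fullest equal-width bucket on [low, high]."""
    width = (high - low) / buckets
    counts = [0] * buckets
    for x in observations:
        if low <= x <= high:
            # Clamp so that x == high falls into the last bucket.
            idx = min(int((x - low) / width), buckets - 1)
            counts[idx] += 1
    # On ties, the lowest-numbered peak bucket is returned.
    peak = max(range(buckets), key=counts.__getitem__)
    return low + (peak + 0.5) * width

def categorical_mode(observations):
    """Most frequently occurring category."""
    return Counter(observations).most_common(1)[0][0]

print(histogram_mode([1, 2, 2.5, 2.7, 9], low=0, high=10, buckets=5))
print(categorical_mode(["disk", "cpu", "disk", "net"]))
```

The result depends on the bucket boundaries chosen, so a different bucket count can move the reported mode.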

Characteristics of Mean, Median, and Mode
◮ Mean and median always exist and are unique
◮ Mode may or may not exist
  ◮ If there is a mode, may be more than one
◮ Mean, median and mode may be identical
  ◮ Or may all be different
  ◮ Or some may be the same
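As a small illustration of non-unique modes, the sketch below uses `statistics.multimode` (Python 3.8+), which returns every most-frequent value. Both samples are made up for the example.

```python
from statistics import mean, median, multimode

bimodal = [1, 1, 2, 2, 3]   # two values tie for most frequent
no_repeats = [4, 5, 6]      # every value occurs once

modes_bimodal = multimode(bimodal)
modes_flat = multimode(no_repeats)

print(modes_bimodal)  # more than one mode
print(modes_flat)     # all values tie, so no single mode stands out
print(mean(bimodal), median(bimodal))  # each is still a single number
```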

Mean, Median, and Mode Identical
[Figure: pdf f(x) plotted against x, with the mean, median, and mode marked at the same point]

Median, Mean, and Mode All Different
[Figure: pdf f(x) plotted against x, with the mean, median, and mode marked at different points]

So, Which Should I Use?
◮ Depends on characteristics of the metric
◮ If data is categorical, use mode
◮ If a total of all observations makes sense, use mean
◮ If not (e.g., ratios), and distribution is skewed, use median
◮ Otherwise, use mean
. . . but think about what you’re choosing
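The decision rule above can be written out as a tiny function. This is only a sketch of the checklist; the function name and its boolean parameters are hypothetical, and it does not replace thinking about the metric.

```python
def choose_index(is_categorical, total_makes_sense, is_skewed):
    """Apply the selection checklist in the order given on the slide."""
    if is_categorical:
        return "mode"
    if total_makes_sense:
        return "mean"
    if is_skewed:
        return "median"
    return "mean"

# Categorical data: use the mode.
print(choose_index(is_categorical=True, total_makes_sense=False, is_skewed=False))
# A total makes sense: use the mean.
print(choose_index(is_categorical=False, total_makes_sense=True, is_skewed=True))
# No meaningful total and a skewed distribution: use the median.
print(choose_index(is_categorical=False, total_makes_sense=False, is_skewed=True))
```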


Some Examples
◮ Most-used resource in system
  ◮ Mode
◮ Interarrival times
  ◮ Mean
◮ Load
  ◮ Median

Don’t Always Use the Mean
◮ Means are often overused and misused
  ◮ Means of significantly different values
  ◮ Means of highly skewed distributions
  ◮ Multiplying means to get mean of a product
    ◮ Only works for independent variables
  ◮ Errors in taking ratios of means
  ◮ Means of categorical variables
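The point about multiplying means can be seen with two small made-up data sets: one where the two variables move together, and one where every value of x is paired with every value of y (so the variables are independent within the sample). All values are illustrative assumptions.

```python
from itertools import product
from statistics import mean

# Perfectly correlated pairs: y always equals x.
xs_dep = [1, 2, 3]
ys_dep = [1, 2, 3]
mean_of_product_dep = mean(x * y for x, y in zip(xs_dep, ys_dep))
product_of_means_dep = mean(xs_dep) * mean(ys_dep)

# Independent within the sample: every x is paired with every y.
pairs = list(product([1, 2, 3], [10, 20]))
xs_ind = [x for x, _ in pairs]
ys_ind = [y for _, y in pairs]
mean_of_product_ind = mean(x * y for x, y in pairs)
product_of_means_ind = mean(xs_ind) * mean(ys_ind)

print(mean_of_product_dep, product_of_means_dep)  # differ
print(mean_of_product_ind, product_of_means_ind)  # agree
```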

Geometric Means
◮ An alternative to the arithmetic mean

$$\dot{x} = \left( \prod_{i=1}^{n} x_i \right)^{1/n}$$

◮ Use geometric mean if product of observations makes sense
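A minimal sketch of the formula, computed through logarithms so that long products do not overflow. The helper name `geometric_mean_log` is an assumption; the result is compared against the standard library's `statistics.geometric_mean` (Python 3.8+). All observations must be positive.

```python
import math
from statistics import geometric_mean

def geometric_mean_log(observations):
    """n-th root of the product, computed as exp of the mean of the logs."""
    values = list(observations)
    if not values or any(x <= 0 for x in values):
        raise ValueError("geometric mean needs a non-empty sample of positive values")
    return math.exp(sum(math.log(x) for x in values) / len(values))

sample = [2, 8]  # product is 16, so the geometric mean is 16 ** (1/2)
print(geometric_mean_log(sample))
print(geometric_mean(sample))
```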

Good Places To Use Geometric Mean
◮ Layered architectures
◮ Performance improvements over successive versions
◮ Average error rate on multihop network path
◮ Year-to-year interest rates
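To illustrate the "successive versions" case, the sketch below uses hypothetical per-release speedup factors. Because the factors multiply, the geometric mean is the per-release factor that reproduces the overall speedup when compounded, while the arithmetic mean overstates it.

```python
import math
from statistics import geometric_mean, mean

# Hypothetical speedup of each release over the previous one.
speedups = [1.5, 1.2, 2.0]
overall = math.prod(speedups)  # combined speedup across all releases

g = geometric_mean(speedups)
a = mean(speedups)

print(overall)
print(g ** len(speedups))  # compounding g recovers the overall speedup
print(a ** len(speedups))  # compounding the arithmetic mean overshoots
```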

Harmonic Mean
◮ Harmonic mean of sample $\{x_1, x_2, \ldots, x_n\}$ is

$$\ddot{x} = \frac{n}{1/x_1 + 1/x_2 + \cdots + 1/x_n}$$

◮ Use when arithmetic mean of $1/x_i$ is sensible
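A short sketch of the definition follows. The helper name and the example rates are assumptions: two runs process the same amount of work at 40 and 60 units per second, so averaging the times per unit (the reciprocals) is the sensible operation. The result is checked against the standard library's `statistics.harmonic_mean`.

```python
import math
from statistics import harmonic_mean, mean

def harmonic_mean_direct(observations):
    """n divided by the sum of reciprocals; all values must be positive."""
    values = list(observations)
    if not values or any(x <= 0 for x in values):
        raise ValueError("harmonic mean needs a non-empty sample of positive values")
    return len(values) / sum(1 / x for x in values)

# Hypothetical rates (units of work per second) over equal amounts of work.
rates = [40, 60]
print(harmonic_mean_direct(rates))  # effective rate over both runs
print(mean(rates))                  # arithmetic mean is higher
```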
