Descriptive Statistics

ST 370 Probability and Statistics for Engineers

Observed data are at the heart of every application of statistics. We need tools for working with and describing data.

To quickly see the main features of a set of data, we need summaries:
- Numerical summaries, e.g. means and standard deviations;
- Graphical summaries, e.g. histograms and box-and-whisker plots.

Numerical Summaries of Data

The 8 measurements of pull-off force for 3/32-inch nylon connectors, in lbf, were

12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1.

We could describe these by saying that they are around 13 lbf, generally plus or minus about 0.5 lbf. That is, we give a typical value and an indication of dispersion around that typical value.

Sample mean

For observed values $x_1, x_2, \dots, x_n$, the most widely used typical value is the sample mean:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i.$$

The mean of the pull-off forces ($n = 8$) is 13.0 lbf.
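As a quick check of the formula, the following minimal sketch computes the sample mean of the eight pull-off forces both by hand and with R's built-in `mean()`. The variable names (`pull_off`, `xbar_by_hand`, `xbar_builtin`) are introduced here for illustration only.

```r
# Pull-off forces (lbf) as listed on the slide
pull_off <- c(12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1)

n <- length(pull_off)

# Sample mean written out as (x_1 + ... + x_n) / n
xbar_by_hand <- sum(pull_off) / n

# Built-in equivalent
xbar_builtin <- mean(pull_off)

print(c(by_hand = xbar_by_hand, builtin = xbar_builtin))
```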

Sample standard deviation

When the sample mean is used as the typical value, dispersion around it is almost always measured by the sample standard deviation $s = \sqrt{s^2}$, where $s^2$ is the sample variance:

$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2.$$

The standard deviation of the pull-off forces is 0.48 lbf.

Note: The sample variance is almost the average of the $n$ values $(x_1 - \bar{x})^2, (x_2 - \bar{x})^2, \dots, (x_n - \bar{x})^2$; it differs only in the divisor, $(n - 1)$ instead of $n$.
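The definition can be traced step by step in R. This is a sketch using the pull-off forces from the earlier slide; the variable names are illustrative. R's `var()` and `sd()` use the same $(n - 1)$ divisor, so they should agree with the hand calculation.

```r
# Pull-off forces (lbf) as listed on the slide
pull_off <- c(12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1)

n    <- length(pull_off)
xbar <- mean(pull_off)

# Sample variance: sum of squared deviations divided by (n - 1)
s2_by_hand <- sum((pull_off - xbar)^2) / (n - 1)

# Sample standard deviation: square root of the sample variance
s_by_hand <- sqrt(s2_by_hand)

# Built-in equivalents for comparison
print(c(var_by_hand = s2_by_hand, var_builtin = var(pull_off)))
print(c(sd_by_hand = s_by_hand, sd_builtin = sd(pull_off)))
```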

Sample median

An alternative typical value is the sample median: one half of the observations fall below the median, and one half fall above.

The ordered pull-off forces are

12.3, 12.6, 12.6, 12.9, 13.1, 13.4, 13.5, 13.6,

so any value between 12.9 and 13.1 could be the median; by convention, we use the midpoint, which happens to be the same as the sample mean, 13.0 lbf.

In general, the mean and the median will not be the same.
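The midpoint convention for an even number of observations can be sketched as follows. The names `sorted`, `middle_two`, and `med_by_hand` are illustrative, and the indexing shown assumes that $n$ is even, as it is here.

```r
# Pull-off forces (lbf) as listed on the slide
pull_off <- c(12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1)

sorted <- sort(pull_off)
n      <- length(sorted)

# With n even, the two middle order statistics are at positions n/2 and n/2 + 1
middle_two  <- sorted[c(n / 2, n / 2 + 1)]
med_by_hand <- mean(middle_two)   # midpoint convention

# Built-in equivalent
med_builtin <- median(pull_off)

print(middle_two)
print(c(by_hand = med_by_hand, builtin = med_builtin))
```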

Trimmed mean

In some areas, a trimmed mean is used:
- For some $k < n/2$, delete the $k$ highest values and the $k$ lowest values;
- The trimmed mean is the average of the remaining data.

Examples
- In many sports involving a panel of judges, the highest and lowest scores are omitted ($k = 1$).
- The LIBOR benchmark interest rate is found by averaging rates submitted by 18 banks, with the highest and lowest 4 submissions omitted ($n = 18$, $k = 4$).

If all but the middle one or two values are trimmed ($k \approx n/2$), the average is the median.
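A minimal sketch of the trimming rule, applied to the pull-off forces from the earlier slide. The helper `trimmed_mean()` is hypothetical and written only for this illustration. R's built-in `mean(x, trim = )` takes the fraction to trim from each end rather than the count $k$, so `trim = k / n` is used for comparison.

```r
# Pull-off forces (lbf) as listed on the slide
pull_off <- c(12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1)

# Hypothetical helper: drop the k lowest and k highest values, then average
trimmed_mean <- function(x, k) {
  n <- length(x)
  stopifnot(k >= 0, k < n / 2)
  sorted <- sort(x)
  mean(sorted[(k + 1):(n - k)])
}

n <- length(pull_off)

tm_k1 <- trimmed_mean(pull_off, k = 1)   # judges-panel style trimming
tm_k3 <- trimmed_mean(pull_off, k = 3)   # only the middle two values remain

# Built-in version: trim is a fraction per end, so use k / n
tm_k1_builtin <- mean(pull_off, trim = 1 / n)

print(c(k1 = tm_k1, k1_builtin = tm_k1_builtin, k3 = tm_k3, median = median(pull_off)))
```

With $k = 3$ only the two middle values are left, so the result coincides with the median, as the slide states.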

Quantiles and Percentiles

Recall that the median divides the data values in half: one half fall below, and one half fall above.

More generally, for any $0 \le p \le 1$, the $p$th quantile divides the data into a fraction $p$ falling below the quantile and a fraction $(1 - p)$ falling above it.

The $p$th quantile is also called the $(100p)$th percentile.

Quartiles

The most frequently used quantiles are the median ($p = 0.5$) and the quartiles: lower quartile ($p = 0.25$) and upper quartile ($p = 0.75$).

The quartiles of the pull-off forces are 12.60 and 13.42 lbf.

Interquartile range

The difference between the upper and lower quartiles is another measure of the dispersion of the data values. It is called the interquartile range (IQR).

For the pull-off forces, the IQR is 0.82 lbf.
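The quartiles and IQR can be computed with R's `quantile()` and `IQR()`. This sketch uses the pull-off forces from the earlier slide and R's default calculation (type 7); it gives 12.6 and 13.425 for the quartiles and 0.825 for the IQR, consistent with the slide's 12.60, 13.42 and 0.82 up to rounding. Variable names are illustrative.

```r
# Pull-off forces (lbf) as listed on the slide
pull_off <- c(12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1)

# Lower and upper quartiles (default type = 7)
quartiles <- quantile(pull_off, probs = c(0.25, 0.75))

# Interquartile range: upper quartile minus lower quartile
iqr_by_hand <- unname(quartiles[2] - quartiles[1])
iqr_builtin <- IQR(pull_off)

print(quartiles)
print(c(by_hand = iqr_by_hand, builtin = iqr_builtin))
```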

Calculation of quantiles

For a given data set size $n$ and a given fraction $p$, it may not be obvious how to divide the values into a fraction $p$ and the complementary fraction $(1 - p)$.

Many different suggestions have been made for the precise calculation.

The R function quantile() offers the choice of nine types of calculation; the definition given by Montgomery and Runger appears to be type = 6; the default in quantile() is type = 7.
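To see how much the choice of type matters in a small sample, the following sketch compares `type = 6` and `type = 7` on the pull-off forces from the earlier slide. The names `q6` and `q7` are illustrative.

```r
# Pull-off forces (lbf) as listed on the slide
pull_off <- c(12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1)

probs <- c(0.25, 0.75)

# Two of the nine calculation types offered by quantile()
q6 <- quantile(pull_off, probs = probs, type = 6)
q7 <- quantile(pull_off, probs = probs, type = 7)   # R's default

print(rbind(type6 = q6, type7 = q7))
```

For these data the lower quartile is the same under both types, because the second and third ordered values are both 12.6; only the upper quartile differs.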

Stem and Leaf

The stem-and-leaf plot is a device for sorting and binning a set of data values. It is a useful pencil-and-paper method, but irrelevant in computer-based analysis.

Example: Compressive strength of Al-Li alloy specimens

Compressive strength (psi) of 80 specimens of an aeronautical alloy.

    alloy <- read.csv("Data/Table-06-02.csv")$Strength
    stem(alloy, scale = 2)

The number of leaves on each stem is the count from which a histogram is constructed. The outline of the display is a (rotated) histogram.

Because the leaves are ordered, the order statistics can be read off from the display.
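The link between leaf counts and histogram counts can be sketched on a small data set. Because the alloy CSV file is not reproduced here, this example uses the eight pull-off forces from the earlier slide, and forms the stems by hand as the integer part of each value (an assumption for illustration; `stem()` chooses its own stem width).

```r
# Pull-off forces (lbf) as listed on the earlier slide
pull_off <- c(12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1)

# Text display produced by R (stem width chosen by stem() itself)
stem(pull_off)

# Manual stems: integer part of each value; count the leaves on each stem
leaves_per_stem <- table(floor(pull_off))

# Histogram counts over the matching bins [12, 13) and [13, 14]
h <- hist(pull_off, breaks = c(12, 13, 14), right = FALSE, plot = FALSE)

print(leaves_per_stem)
print(h$counts)

# The order statistics are simply the sorted values
print(sort(pull_off))
```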

Frequency Distributions and Histograms

The histogram is a display showing the frequency with which data values fall in various ranges.

Example: Compressive strength of Al-Li alloy specimens

    # alloy <- read.csv("Data/Table-06-02.csv")$Strength
    hist(alloy)
    # To match Figure 6-7, use some non-default options:
    hist(alloy, breaks = seq(from = 70, to = 250, by = 20), right = FALSE, col = "wheat")

The height of each bar is its "Frequency": the number of data values that fall in the corresponding "bin".

Variations of histogram

Sometimes the height of the bar is the relative frequency: the fraction of data values that fall in the bin, instead of the number.

Sometimes the bins are of different widths; in that case, the height of the bar is usually chosen so that the area of the bar is the relative frequency. Then the total area of the histogram bars is 1.
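The area convention for unequal bin widths can be checked numerically. This sketch uses the pull-off forces from the earlier slide with a set of unequal break points chosen here purely for illustration; `hist()` is called with `plot = FALSE` so that only the computed counts and heights are inspected.

```r
# Pull-off forces (lbf) as listed on the earlier slide
pull_off <- c(12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1)

# Hypothetical unequal-width bins, chosen only for this illustration
breaks <- c(12.0, 12.5, 13.0, 14.0)

h <- hist(pull_off, breaks = breaks, plot = FALSE)

# Relative frequency: fraction of values in each bin
rel_freq <- h$counts / sum(h$counts)

# Bar height chosen so that area (height * width) equals relative frequency
bin_width  <- diff(h$breaks)
bar_height <- rel_freq / bin_width

# Total area of all bars
total_area <- sum(bar_height * bin_width)

print(rbind(count = h$counts, rel_freq = rel_freq, height = bar_height))
print(total_area)
```

The heights computed this way match the `density` component that `hist()` returns, which is what R draws when bins have unequal widths.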

Box Plots

The histogram shows the distribution of the data values in some detail. We often need a display that summarizes the data more succinctly.

The box-and-whisker plot (or boxplot) shows principally:
- The extremes: lowest and highest values;
- The lower and upper quartiles;
- The median.

In R:

    # alloy <- read.csv("Data/Table-06-02.csv")$Strength
    boxplot(alloy)

The central box goes from the lower quartile to the upper quartile, and the median is shown by a line.

Some of the more extreme values may be flagged as outliers, and are shown individually.

Each whisker connects the box to the most extreme data point on its side that is not flagged as a possible outlier.
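The numbers behind a boxplot can be inspected with `boxplot.stats()`. In this sketch the pull-off forces from the earlier slide are extended with one made-up value, 16.0, added only to trigger the outlier flag; it is not part of the original data. Note that R computes the box ends as Tukey's hinges, which can differ slightly from the quartiles returned by `quantile()`, and by default flags points more than 1.5 box lengths beyond the box.

```r
# Pull-off forces (lbf) from the earlier slide
pull_off <- c(12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1)

# Hypothetical extra observation, added only to illustrate outlier flagging
with_extreme <- c(pull_off, 16.0)

b <- boxplot.stats(with_extreme)   # default coef = 1.5

# b$stats: lower whisker end, lower hinge, median, upper hinge, upper whisker end
print(b$stats)

# b$out: values flagged as possible outliers and drawn individually
print(b$out)
```

The upper whisker stops at 13.6, the most extreme value that is not flagged, while 16.0 is reported separately.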

Comparative boxplots

A boxplot of a single set of data is a useful graphical tool for displaying the key characteristics of the data:
- The typical value, represented by the median;
- The dispersion, represented by the IQR (interquartile range), which is the length of the box;
- The extreme values, including some that may be highlighted as outliers.

Boxplots are much more valuable when comparing more than one set of data, such as the pull-off strengths of the two types of nylon connector.

Example: strength of paper

The percentage of hardwood fiber affects the tearing strength of paper. Six test sheets were prepared and tested for each of four levels of hardwood content.

In R:

    paper <- read.csv("Data/Table-13-01.csv")
    boxplot(Strength ~ Hardwood, paper)

The boxplots show:
- The typical strength increases progressively as the hardwood content increases;
- The dispersion of strength does not change greatly;
- No test sheets were out of line with the rest of their sample.
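The same formula interface works for any grouped data. Since the paper-strength CSV file is not reproduced here, this sketch substitutes R's built-in `PlantGrowth` data set (plant weights under three treatment groups) as a stand-in; it is not the data from the slide. Per-group medians and IQRs are also computed to put numbers on the "typical value" and "dispersion" comparisons that the boxplots show.

```r
# Stand-in data: built-in PlantGrowth (columns weight and group),
# used here only because the paper-strength table is not available
bp <- boxplot(weight ~ group, data = PlantGrowth, ylab = "weight")

# Typical value per group: the median (the line inside each box)
group_median <- tapply(PlantGrowth$weight, PlantGrowth$group, median)

# Dispersion per group: the interquartile range
group_iqr <- tapply(PlantGrowth$weight, PlantGrowth$group, IQR)

print(group_median)
print(group_iqr)

# Values flagged as possible outliers, if any, across all groups
print(bp$out)
```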
