M2S2 - Distributions
Professor Jarad Niemi
STAT 226 - Iowa State University
August 29, 2018
Professor Jarad Niemi (STAT226@ISU) M2S2 - Distributions August 29, 2018 1 / 19
Outline
Population
  Location
  Spread
  Modality: unimodal, bimodal
  Skewness: symmetric, right-skewed, left-skewed
Sample
  Boxplot
  Histogram
  Summary statistics
  Outliers
Population
Definition: The population is the entire group of individuals that we want to say something about.
Definition: Individuals are the subjects/objects of interest.
Definition: A variable is any characteristic of an individual that we are interested in.
Population Distribution
Definition: The distribution of a variable is the collection of possible values the variable can take and how often each value occurs in the population.
Enumerating the values may be possible for categorical variables, but typically will not work for numerical variables. Instead we depict the distribution graphically, e.g.
[Figure: Example distribution]
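For a categorical variable the enumeration can be written out directly. The following is a minimal Python sketch, assuming a made-up population of ten individuals; the `eye_color` data and the `distribution` helper are illustrative and not part of the course material.

```python
from collections import Counter

# Hypothetical population: the eye color of each of 10 individuals.
eye_color = ["brown", "blue", "brown", "green", "brown",
             "blue", "brown", "hazel", "blue", "brown"]

def distribution(values):
    """Map each distinct value to the proportion of individuals that have it."""
    counts = Counter(values)
    n = len(values)
    return {value: count / n for value, count in counts.items()}

dist = distribution(eye_color)
for value, proportion in sorted(dist.items()):
    print(f"{value:6s} {proportion:.1f}")
```

The proportions sum to 1, which is what makes this a distribution rather than a list of counts.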
Population Distribution
[Figure: Location and spread, with the location and the spread of the distribution marked]
Population Modality
Definition: A unimodal distribution has one peak. A bimodal distribution has two peaks.
[Figure: two panels, Unimodal and Bimodal]
Population Skewness
Definition: A distribution is symmetric if there is a vertical line about which the graph is its own mirror image.
A distribution is right skewed if the tail of the distribution is longer to the right.
A distribution is left skewed if the tail of the distribution is longer to the left.
[Figure: three panels, Left-skewed, Symmetric, and Right-skewed, with the tail marked in the skewed panels]
Sample
We often try to infer details about the population from our sample. In particular, we use our sample to infer the distribution’s location, spread, modality, and skewness.
Sample Boxplot
A boxplot can be used to help infer location, spread, and skewness, e.g.
[Figure: four boxplots, Symmetric, Right skewed, Left skewed, and Bimodal]
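A boxplot is drawn from the five-number summary (min, Q1, median, Q3, max). The following is a minimal Python sketch of that summary, assuming a small made-up sample and the "inclusive" quantile method of the standard library; the `sample` values and the `five_number_summary` helper are hypothetical, and other software may use a different quantile rule.

```python
import statistics

def five_number_summary(sample):
    """Return (min, Q1, median, Q3, max) for a numerical sample."""
    data = sorted(sample)
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)

# Hypothetical sample with a long right tail.
sample = [2, 3, 3, 4, 5, 5, 6, 7, 9, 14, 21]
low, q1, median, q3, high = five_number_summary(sample)
print(low, q1, median, q3, high)

# In a right-skewed sample the upper half of the box and the upper
# whisker tend to be longer than the lower ones.
print("upper box longer:", (q3 - median) > (median - q1))
print("upper whisker longer:", (high - q3) > (q1 - low))
```

Location can be read from the median line, spread from the box and whisker lengths, and skewness from how unevenly the box and whiskers sit around the median.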
Sample Histogram
Definition: A histogram is a graphical display of numerical data that counts the number of observations in each bin, where the bins are determined by the user.
[Figure: two histograms of x, Count (y-axis: Frequency) and Proportion (y-axis: Density)]
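The two versions of the histogram differ only in the vertical scale. The following is a minimal Python sketch, assuming equal-width bins that are closed on the left, `[left, left + width)`; the `x` values and both helper functions are hypothetical.

```python
def histogram_counts(sample, start, width, n_bins):
    """Count the observations falling in each of n_bins equal-width bins."""
    counts = [0] * n_bins
    for value in sample:
        index = int((value - start) // width)
        if 0 <= index < n_bins:  # ignore values outside the binned range
            counts[index] += 1
    return counts

def histogram_density(counts, width):
    """Rescale counts so that the bar areas (height * width) sum to 1."""
    total = sum(counts)
    return [count / (total * width) for count in counts]

# Hypothetical sample and user-chosen bins: [-2,-1), [-1,0), ..., [2,3).
x = [-1.5, -0.7, -0.2, 0.1, 0.3, 0.4, 0.8, 1.1, 1.6, 2.4]
width = 1.0
counts = histogram_counts(x, start=-2.0, width=width, n_bins=5)
density = histogram_density(counts, width)
print(counts)
print(density)
```

The shape of the two displays is the same; the density scale is convenient when comparing samples of different sizes.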
Sample Histogram
A histogram can be used to help infer location, spread, skewness, and modality, e.g.
[Figure: four histograms: Unimodal, Symmetric; Unimodal, Right skewed; Unimodal, Left skewed; Bimodal]
Sample Histogram
Histograms are affected by the choice of bins
[Figure: four histograms drawn with a different choice of bins: Unimodal, Symmetric; Unimodal, Right skewed; Unimodal, Left skewed; Bimodal]
Sample Histogram
Histograms are affected by the choice of bins
[Figure: four histograms drawn with another choice of bins: Symmetric, Right skewed, Left skewed, Bimodal]
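The effect of the bin choice can be seen on a small example. The following is a minimal Python sketch, assuming a made-up sample with two clusters; the `sample` values and the `bin_counts` helper are hypothetical.

```python
import math
from collections import Counter

def bin_counts(sample, width):
    """Count observations per bin, where bin k covers [k*width, (k+1)*width)."""
    bins = Counter(math.floor(value / width) for value in sample)
    first, last = min(bins), max(bins)
    # Include empty bins between the first and last occupied bin.
    return [bins.get(k, 0) for k in range(first, last + 1)]

# Hypothetical sample with one cluster near 2 and another near 6.
sample = [1.1, 1.4, 1.6, 1.9, 2.2, 2.5, 5.1, 5.4, 5.6, 5.9, 6.3, 6.6]

narrow = bin_counts(sample, width=1)
wide = bin_counts(sample, width=4)
print("width 1:", narrow)  # two peaks separated by empty bins
print("width 4:", wide)    # the gap between the clusters is no longer visible
```

With bins of width 1 the two peaks are visible, while with bins of width 4 the same data look flat, so it is worth trying more than one bin width before judging modality.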
Sample Summary statistics
Distribution   min    Q1     median  mean   Q3     max
bimodal        NA     NA     0.16    0.57   NA     5.42
left skew      NA     4.36   7.14    5.24   4.36   9.76
right skew     0.18   1.39   2.84    4.89   1.39   34.23
symmetric      NA     0.14   0.86    0.97   0.14   3.09

Right-skew: mean > median
Left-skew: mean < median
Symmetric: mean ≈ median
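The mean-versus-median rule can be applied to the table values directly. The following is a minimal Python sketch, assuming that "mean ≈ median" means the two differ by less than 0.2 standard deviations; that threshold and the `shape_from_mean_median` helper are assumptions for illustration only, and the standard deviations are the ones reported for these samples in the spread table.

```python
def shape_from_mean_median(mean, median, sd, tolerance=0.2):
    """Rule of thumb: compare the mean and the median, in units of sd."""
    gap = (mean - median) / sd
    if gap > tolerance:
        return "right-skew"
    if gap < -tolerance:
        return "left-skew"
    return "symmetric"

# (mean, median, standard deviation) as reported on the slides.
samples = {
    "left skew": (5.24, 7.14, 5.12),
    "right skew": (4.89, 2.84, 5.62),
    "symmetric": (0.97, 0.86, 1.16),
}

results = {name: shape_from_mean_median(*values)
           for name, values in samples.items()}
for name, shape in results.items():
    print(f"{name:10s} -> {shape}")
```

The rule says nothing about modality, so the bimodal sample is left out of this check.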
Sample Summary statistics
Distribution   variance  standard deviation  range  interquartile range
bimodal        4.20      2.05                8.43   2.88
left skew      26.25     5.12                23.72  4.19
right skew     31.57     5.62                34.05  5.04
symmetric      1.35      1.16                4.54   1.67
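Each of these four measures of spread can be computed from a sample in a few lines. The following is a minimal Python sketch, assuming the sample variance (divisor n - 1) and the "inclusive" quantile method of the standard library; the `sample` values and the `spread_summary` helper are hypothetical.

```python
import math
import statistics

def spread_summary(sample):
    """Return variance, standard deviation, range, and interquartile range."""
    data = sorted(sample)
    variance = statistics.variance(data)  # sample variance, divisor n - 1
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "variance": variance,
        "standard deviation": math.sqrt(variance),
        "range": data[-1] - data[0],
        "interquartile range": q3 - q1,
    }

# Hypothetical sample.
sample = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
spread = spread_summary(sample)
for name, value in spread.items():
    print(f"{name:20s} {value:.2f}")
```

The standard deviation is the square root of the variance, which is also the relationship between the first two columns of the table above.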
Sample Example
[Figure: Boxplot of mpg and Histogram of mpg (x-axis: mpg, y-axis: Frequency)]
summary(dd$mpg)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  8.509  17.359  19.298  19.313  21.334  39.086
Sample Outliers
Definition: An outlier is an observation that is distant from other observations.
Sometimes, any observation below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR, where IQR = Q3 − Q1, is called an outlier.
[Figure: Boxplot of mpg]
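The 1.5 × IQR rule can be worked through with the quartiles from the mpg summary (Q1 = 17.359, Q3 = 21.334). The following is a minimal Python sketch; the `outlier_fences` and `is_outlier` helpers are hypothetical names, and only the summary values come from the slides.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the lower and upper cutoffs Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def is_outlier(value, lower, upper):
    """An observation is flagged if it lies outside [lower, upper]."""
    return value < lower or value > upper

# Quartiles, minimum, and maximum from summary(dd$mpg).
q1, q3 = 17.359, 21.334
mpg_min, mpg_max = 8.509, 39.086

lower, upper = outlier_fences(q1, q3)
print(f"lower fence {lower:.4f}, upper fence {upper:.4f}")
print("min flagged:", is_outlier(mpg_min, lower, upper))
print("max flagged:", is_outlier(mpg_max, lower, upper))
```

Here IQR = 3.975, so the cutoffs are about 11.40 and 27.30, and both the smallest and the largest mpg values fall outside them.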
Sample Outliers
Choice of an appropriate measure of location/spread depends on
  the shape of the distribution
  the presence of outliers
Generally,
  symmetric with no outliers ⇒ mean and standard deviation
  skewed and/or outliers ⇒ median, IQR, 5-number summary
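One way to see why the median and IQR are preferred when outliers are present is to add a single extreme value to a sample and compare how much each summary moves. The following is a minimal Python sketch, assuming made-up values and the "inclusive" quantile method; the data and the `summaries` helper are hypothetical.

```python
import statistics

def summaries(sample):
    """Return (mean, standard deviation, median, IQR) of a sample."""
    data = sorted(sample)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (statistics.mean(data), statistics.stdev(data), median, q3 - q1)

# Hypothetical sample, then the same sample with one extreme value added.
clean = [18.0, 19.0, 19.0, 20.0, 21.0]
with_outlier = clean + [39.0]

mean_a, sd_a, median_a, iqr_a = summaries(clean)
mean_b, sd_b, median_b, iqr_b = summaries(with_outlier)

print(f"mean   moves by {abs(mean_b - mean_a):.2f}")
print(f"median moves by {abs(median_b - median_a):.2f}")
print(f"sd     moves by {abs(sd_b - sd_a):.2f}")
print(f"IQR    moves by {abs(iqr_b - iqr_a):.2f}")
```

In this example the mean and standard deviation change far more than the median and IQR, which is the motivation for the guideline above.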