M2S2 - Distributions, Professor Jarad Niemi, STAT 226, Iowa State

SLIDE 1

M2S2 - Distributions

Professor Jarad Niemi

STAT 226 - Iowa State University

August 29, 2018

Professor Jarad Niemi (STAT226@ISU) M2S2 - Distributions August 29, 2018 1 / 19

SLIDE 2

Outline

Population
  • Location
  • Spread
  • Modality: unimodal, bimodal
  • Skewness: symmetric, right-skewed, left-skewed

Sample
  • Boxplot
  • Histogram
  • Summary statistics

Outliers

SLIDE 3

Population


Definition: The population is the entire group of individuals that we want to say something about.

Definition: Individuals are the subjects/objects of interest.

Definition: A variable is any characteristic of an individual that we are interested in.

SLIDE 4

Population Distribution

Distribution

Definition: The distribution of a variable is the collection of possible values the variable can take and how often each value occurs in the population.

Enumerating the values may be possible for categorical variables, but typically will not work for numerical variables. Instead we depict the distribution graphically, e.g.

[Figure: Example distribution, a density curve over values from −3 to 3]
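For a categorical variable, the enumeration in the definition can be written out directly. The following is a minimal Python sketch, assuming a small made-up population of ten eye colors; the data and the helper name `relative_frequencies` are illustrative and not part of the course material.

```python
from collections import Counter


def relative_frequencies(values):
    """Return {value: proportion} for a list of categorical observations."""
    counts = Counter(values)
    n = len(values)
    return {value: count / n for value, count in counts.items()}


# Hypothetical population: eye color recorded for ten individuals.
eye_color = ["brown", "blue", "brown", "green", "brown",
             "blue", "brown", "brown", "blue", "green"]

dist = relative_frequencies(eye_color)
for value, proportion in sorted(dist.items()):
    print(f"{value}: {proportion:.1f}")
```

The proportions sum to 1, which is what makes the collection of values and frequencies a distribution.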

SLIDE 5

Population Distribution

Distribution location and spread

[Figure: Location and spread, a density curve over values from −3 to 3 annotated with "Location" and "Spread"]

SLIDE 6

Population Modality

Modality

Definition: A unimodal distribution has one peak. A bimodal distribution has two peaks.

[Figure: two density curves, one labeled Unimodal and one labeled Bimodal]
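One rough way to make "number of peaks" concrete is to count local maxima in a list of bin counts. The sketch below is only an illustration in Python: the function name `count_peaks` and the two count vectors are hypothetical, and with real samples random noise can create spurious peaks, so the result is a hint rather than a verdict.

```python
from itertools import groupby


def count_peaks(counts):
    """Count local maxima in a sequence of non-negative bin counts."""
    # Collapse runs of equal neighbours so a flat-topped peak is counted once.
    collapsed = [value for value, _ in groupby(counts)]
    # Pad with -1 so the first and last bins can also be peaks.
    padded = [-1] + collapsed + [-1]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )


# Hypothetical bin counts for two samples.
unimodal_counts = [1, 2, 4, 2, 1]
bimodal_counts = [2, 4, 1, 5, 1]

print(count_peaks(unimodal_counts))  # one peak
print(count_peaks(bimodal_counts))   # two peaks
```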

SLIDE 7

Population Skewness

Skewness

Definition:
  • A distribution is symmetric if its graph is a mirror image of itself about some vertical line.
  • A distribution is right skewed if the tail of the distribution is longer to the right.
  • A distribution is left skewed if the tail of the distribution is longer to the left.

[Figure: three density curves labeled Left-skewed, Symmetric, and Right-skewed, with the long tail marked on the two skewed curves]

SLIDE 8

Sample


We never see the population!

Thus we often try to infer details about the population from our sample. We use our sample to infer the distribution’s location, spread, modality, and skewness.

SLIDE 9

Sample Boxplot

Vertical Boxplots

A boxplot can be used to help infer location, spread, and skewness, e.g.

[Figure: vertical boxplots in four panels: Symmetric, Right skewed, Left skewed, Bimodal]

SLIDE 10

Sample Boxplot

Horizontal Boxplots

A boxplot can be used to help infer location, spread, and skewness, e.g.

[Figure: horizontal boxplots in four panels: Symmetric, Right skewed, Left skewed, Bimodal]

SLIDE 11

Sample Histogram

Histogram

Definition: A histogram is a graphical display of numerical data that counts the number of observations in each bin, where the bins are determined by the user.

[Figure: two histograms of x. The panel titled Count has Frequency on the vertical axis; the panel titled Proportion has Density on the vertical axis]
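The two vertical scales differ only by a constant factor: a density is the bin count divided by (number of observations × bin width), so the bar areas sum to 1. A minimal Python sketch, assuming equal-width bins and ten made-up observations (the function names and data are illustrative):

```python
import math


def histogram_counts(values, start, width, n_bins):
    """Count observations in equal-width bins [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        i = math.floor((v - start) / width)
        if 0 <= i < n_bins:  # observations outside the bins are ignored
            counts[i] += 1
    return counts


def histogram_densities(counts, width):
    """Rescale counts so that the total bar area (density * width) equals 1."""
    total = sum(counts)
    return [c / (total * width) for c in counts]


# Hypothetical sample of ten observations.
x = [-1.5, -0.5, -0.2, 0.1, 0.3, 0.4, 0.8, 1.2, 1.6, 2.5]

counts = histogram_counts(x, start=-2, width=1, n_bins=5)
dens = histogram_densities(counts, width=1)
print(counts)
print(dens)
```

Both versions have the same shape; the density scale is convenient when comparing samples of different sizes.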

SLIDE 12

Sample Histogram

Histograms

A histogram can be used to help infer location, spread, skewness, and modality, e.g.

[Figure: histograms in four panels: Unimodal, Symmetric; Unimodal, Right skewed; Unimodal, Left skewed; Bimodal]

SLIDE 13

Sample Histogram

Histograms

Histograms are affected by the choice of bins

[Figure: histograms in four panels: Unimodal, Symmetric; Unimodal, Right skewed; Unimodal, Left skewed; Bimodal]

SLIDE 14

Sample Histogram

Histograms

Histograms are affected by the choice of bins

[Figure: histograms in four panels: Symmetric, Right skewed, Left skewed, Bimodal]
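The effect of bin choice can be seen by binning the same observations twice. In the Python sketch below, the thirteen data values, the helper `bin_counts`, and the two bin widths are all made up for illustration: narrow bins show two clusters, while wide bins hide them.

```python
def bin_counts(values, start, width, n_bins):
    """Count observations in n_bins equal-width bins beginning at start."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts


# Hypothetical sample with one cluster near 1 and another near 3.5.
sample = [0.6, 0.8, 1.1, 1.2, 1.3, 1.7, 2.4,
          3.1, 3.3, 3.6, 3.7, 3.9, 4.4]

narrow = bin_counts(sample, start=0, width=1, n_bins=5)
wide = bin_counts(sample, start=0, width=2.5, n_bins=2)

print(narrow)  # two separate peaks are visible
print(wide)    # the two clusters are merged into two similar bars
```

Neither binning is wrong; trying several bin widths before describing modality is a reasonable habit.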

SLIDE 15

Sample Summary statistics

Measures of location

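Distribution      min      Q1   median    mean      Q3     max
bimodal         −3.02   −0.90     0.16    0.57   −0.90    5.42
left skew      −13.96    4.36     7.14    5.24    4.36    9.76
right skew       0.18    1.39     2.84    4.89    1.39   34.23
symmetric       −1.45    0.14     0.86    0.97    0.14    3.09

Note: as printed, the Q3 column repeats the Q1 values. Adding the interquartile ranges from the next slide to Q1 gives Q3 ≈ 1.98 (bimodal), 8.55 (left skew), 6.43 (right skew), and 1.81 (symmetric).

  • Right-skew: mean > median
  • Left-skew: mean < median
  • Symmetric: mean ≈ median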
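The mean-versus-median rule can be checked against the table. The Python sketch below uses the means and medians printed on this slide and the interquartile ranges from the next slide; the helper `skew_hint` and its tolerance of 0.1 × IQR for "approximately equal" are assumptions made for illustration, not part of the course material.

```python
def skew_hint(mean, median, iqr, tol=0.1):
    """Classify skew from mean vs. median; tol * iqr is an assumed tolerance."""
    if abs(mean - median) <= tol * iqr:
        return "roughly symmetric"
    return "right-skewed" if mean > median else "left-skewed"


# (mean, median, IQR) as printed on the location and spread slides.
rows = {
    "left skew": (5.24, 7.14, 4.19),
    "right skew": (4.89, 2.84, 5.04),
    "symmetric": (0.97, 0.86, 1.67),
}

hints = {name: skew_hint(*stats) for name, stats in rows.items()}
for name, hint in hints.items():
    print(f"{name}: {hint}")
```

The bimodal row is left out on purpose: the rule describes skewness and says little about a distribution with two peaks.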

SLIDE 16

Sample Summary statistics

Measures of spread

Distribution   variance   standard deviation   range   interquartile range
bimodal            4.20                 2.05    8.43                  2.88
left skew         26.25                 5.12   23.72                  4.19
right skew        31.57                 5.62   34.05                  5.04
symmetric          1.35                 1.16    4.54                  1.67
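For reference, the four measures in the table can be computed with Python's standard `statistics` module. This is a sketch on seven made-up observations; it assumes the sample variance (divisor n − 1) and the "inclusive" quantile method, which interpolates the same way as R's default `quantile()` type.

```python
import statistics

# Hypothetical sample, already sorted for readability.
data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 11.0]

variance = statistics.variance(data)      # sample variance, divisor n - 1
std_dev = statistics.stdev(data)          # square root of the sample variance
data_range = max(data) - min(data)        # max minus min

# Quartiles via linear interpolation between order statistics.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                             # interquartile range

print(variance, std_dev, data_range, iqr)
```

The range depends only on the two most extreme observations, while the IQR ignores the outer quarters of the data, which is why the two can tell very different stories for skewed samples.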

SLIDE 17

Sample Example

Toyota Sienna Miles per Gallon

[Figure: Boxplot of mpg and Histogram of mpg (horizontal axis: mpg, vertical axis: Frequency)]

summary(dd$mpg)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  8.509  17.359  19.298  19.313  21.334  39.086

SLIDE 18

Sample Outliers

Outliers

Definition: An outlier is an observation that is distant from other observations.

Sometimes, any observation below Q1 − 1.5×IQR or above Q3 + 1.5×IQR is called an outlier.

[Figure: Boxplot of mpg]
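Applying the 1.5×IQR rule to the Toyota Sienna summary from the previous slide (Q1 = 17.359, Q3 = 21.334, minimum 8.509, maximum 39.086) gives the cutoffs below. This is a small Python sketch; the function name `outlier_fences` is illustrative.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower, upper) cutoffs Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles taken from the summary(dd$mpg) output on the previous slide.
lower, upper = outlier_fences(17.359, 21.334)

# The sample minimum and maximum from the same output.
min_is_outlier = 8.509 < lower
max_is_outlier = 39.086 > upper

print(f"lower fence: {lower:.4f}, upper fence: {upper:.4f}")
print(min_is_outlier, max_is_outlier)
```

With IQR = 3.975, the fences are about 11.40 and 27.30 mpg, so both the smallest and largest observations would be called outliers under this rule.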

SLIDE 19

Sample Outliers

Summary statistic choice

The choice of an appropriate measure of location/spread depends on
  • the shape of the distribution
  • the presence of outliers.

Generally,
  • symmetric with no outliers ⇒ mean and standard deviation
  • skewed and/or outliers ⇒ median, IQR, 5-number summary
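The guideline can be turned into a rough automatic check. The Python sketch below is one possible reading, not a procedure from the course: it flags outliers with the 1.5×IQR rule, treats a mean-median gap larger than an assumed 0.25 × IQR as a sign of skew, and then reports either mean and standard deviation or median and IQR. The function name `describe`, the threshold, and the data are all hypothetical.

```python
import statistics


def describe(values, skew_tol=0.25):
    """Pick mean/SD or median/IQR following the slide's rule of thumb."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    mean = statistics.mean(values)

    # 1.5 x IQR rule for outliers.
    has_outliers = any(v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr for v in values)
    # Assumed heuristic: a large mean-median gap suggests skew.
    looks_skewed = abs(mean - median) > skew_tol * iqr

    if has_outliers or looks_skewed:
        return {"location": ("median", median), "spread": ("IQR", iqr)}
    return {
        "location": ("mean", mean),
        "spread": ("standard deviation", statistics.stdev(values)),
    }


clean = describe([1.0, 2.0, 3.0, 4.0, 5.0])
contaminated = describe([1.0, 2.0, 3.0, 4.0, 50.0])
print(clean)
print(contaminated)
```

A plot remains the better judge of shape; a check like this is only a first pass.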
