Gov 51: Visualizing Distributions (Matthew Blackwell, Harvard University)

SLIDE 1

Gov 51: Visualizing Distributions

Matthew Blackwell

Harvard University

1 / 14

SLIDE 2

Studying political efficacy

  • 2002 WHO survey of people in China and Mexico.
  • Goal: determine feelings of political efficacy.
  • Question: “How much say do you have in getting the government to address issues that interest you?”
      1. No say at all
      2. Little say
      3. Some say
      4. A lot of say
      5. Unlimited say
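
The five response options can be stored in R as a labeled factor. The sketch below is only an illustration: the `responses` vector is made up, and the names `say_labels` and `efficacy` are hypothetical.

```r
# Hypothetical responses on the 1-5 scale (illustrative values, not survey data)
responses <- c(1, 2, 2, 5, 3, 1, 4)

# Labels follow the response options listed on the slide
say_labels <- c("No say at all", "Little say", "Some say",
                "A lot of say", "Unlimited say")

# Attach the labels to the numeric codes; levels with no responses are kept
efficacy <- factor(responses, levels = 1:5, labels = say_labels)

# Counts per labeled category
table(efficacy)
```

Listing all five levels explicitly keeps a category in the table even when nobody in the sample chose it.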

2 / 14

SLIDE 10

Data

  • Load the data:

    vignettes <- read.csv("data/vignettes.csv")
    head(vignettes)
    ##   self alison jane moses china age
    ## 1    1      5    5     2     0  31
    ## 2    1      1    5     5     0  54
    ## 3    2      3    1     1     0  50
    ## 4    2      4    2     1     0  22
    ## 5    2      3    3     3     0  52
    ## 6    1      3    1     5     0  50
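
After loading, it can help to check the structure of the data before tabulating anything. The sketch below is self-contained, so it writes a small toy file to a temporary path rather than reading data/vignettes.csv. The rows mirror the head() output above, the `china` value of 0 is an assumption, and the names `toy`, `path`, and `vignettes_toy` are hypothetical.

```r
# Toy rows for illustration; the real data live in data/vignettes.csv
toy <- data.frame(
  self   = c(1, 1, 2, 2, 2, 1),
  alison = c(5, 1, 3, 4, 3, 3),
  jane   = c(5, 5, 1, 2, 3, 1),
  moses  = c(2, 5, 1, 1, 3, 5),
  china  = c(0, 0, 0, 0, 0, 0),  # assumed to be 0 for these rows
  age    = c(31, 54, 50, 22, 52, 50)
)

# Write to a temporary CSV so that read.csv() has something to load
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)

vignettes_toy <- read.csv(path)

str(vignettes_toy)          # column names and types
dim(vignettes_toy)          # number of rows and columns
summary(vignettes_toy$age)  # quick numeric summary of one variable
```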

3 / 14

SLIDE 11

Contingency table

  • table() shows how many units are in each category of a variable:

    table(vignettes$self)
    ##
    ##   1   2   3   4   5
    ## 327 210 130  56  58

  • prop.table() converts these counts into proportions of units:

    prop.table(table(vignettes$self))
    ##
    ##      1      2      3      4      5
    ## 0.4187 0.2689 0.1665 0.0717 0.0743

  • A useful way to visualize this information: a barplot.
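
The arithmetic behind prop.table() can be checked without the data file. This sketch rebuilds a response vector from the counts printed above; the names `counts_on_slide`, `self`, `counts`, and `props` are chosen for illustration only.

```r
# Rebuild a response vector from the counts printed on the slide
counts_on_slide <- c(327, 210, 130, 56, 58)
self <- rep(1:5, times = counts_on_slide)

# Counts per category, then each count divided by the total
counts <- table(self)
props <- prop.table(counts)

round(props, 4)  # proportions rounded to four decimals
sum(props)       # proportions sum to 1, up to floating-point error
```

Each proportion is the category count divided by the total number of respondents, so the first entry is 327 / 781.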

4 / 14

SLIDE 18

Barplot example

[Figure: barplot of self-reported political efficacy (None, A little, Some, A lot, Unlimited); y-axis: Proportion of Respondents.]

5 / 14

SLIDE 19

Barplots in R

  • The barplot() function can help us visualize a categorical variable:

    barplot(height = prop.table(table(vignettes$self)),
            names = c("None", "A little", "Some", "A lot", "Unlimited"),
            xlab = "Self-reported political efficacy",
            ylab = "Proportion of Respondents")

  • Arguments:
      • height: height each bar should take (proportions in this case)
      • names: vector of labels for each category/bar (R partially matches this to the full argument name, names.arg)
      • xlab, ylab: axis labels
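
A self-contained variant of the call above, as a sketch: the proportions are recomputed from the counts shown earlier in the deck, so no data file is needed. The `ylim` value and the text() labels are additions for illustration, and `pdf(NULL)` is used only so the code also runs where no display is available.

```r
# Proportions recomputed from the counts shown earlier in the deck
counts <- c(327, 210, 130, 56, 58)
props <- counts / sum(counts)
bar_labels <- c("None", "A little", "Some", "A lot", "Unlimited")

pdf(NULL)  # null graphics device: draws nothing to disk

# barplot() returns the x-coordinates of the bar midpoints
mids <- barplot(height = props,
                names.arg = bar_labels,
                ylim = c(0, 0.5),
                xlab = "Self-reported political efficacy",
                ylab = "Proportion of Respondents")

# Print each rounded proportion just above its bar
text(x = mids, y = props, labels = round(props, 2), pos = 3)

invisible(dev.off())
```

Capturing the return value of barplot() is what makes it possible to position annotations on top of the bars.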

6 / 14

SLIDE 26

Histogram

  • Histograms visualize the density of a continuous/numeric variable.

[Figure: histogram titled "Distribution of Respondent's Age"; x-axis: Age, y-axis: Density.]

7 / 14

SLIDE 27

How to create histograms?

  • How to create a histogram by hand:
      1. Create bins along the variable of interest.
      2. Count the number of observations in each bin.
      3. Set the height of each bin to its density:

$$\text{density} = \frac{\text{proportion of observations in bin}}{\text{bin width}}$$

  • The area of each bin = proportion of observations in that bin.
      • ⇝ The areas of the blocks sum to 1 (100%).
  • Can lead to confusion: the height of a block can go above 1!
  • With equal-width bins, height is proportional to the proportion in the bin.
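
The by-hand recipe can be sketched in a few lines of R and compared against hist(). The ages and break points below are made up for illustration, and the names `ages`, `bin_breaks`, `dens`, and `narrow` are hypothetical.

```r
# Hypothetical ages (illustrative, not the survey data)
ages <- c(19, 22, 23, 31, 35, 38, 41, 44, 52, 58, 63, 71)
bin_breaks <- c(18, 25, 45, 65, 100)

# Steps 1 and 2: create bins and count the observations in each
bins <- cut(ages, breaks = bin_breaks, right = TRUE, include.lowest = TRUE)
counts <- as.numeric(table(bins))

# Step 3: density = proportion of observations in bin / bin width
proportions <- counts / length(ages)
widths <- diff(bin_breaks)
dens <- proportions / widths

# Areas (height times width) sum to 1
sum(dens * widths)

# hist() computes the same densities for the same breaks
h <- hist(ages, breaks = bin_breaks, plot = FALSE)
all.equal(dens, h$density)

# Narrow bins: heights exceed 1 even though the total area is still 1
narrow <- hist(c(0.1, 0.2, 0.3, 0.4), breaks = c(0, 0.25, 0.5), plot = FALSE)
narrow$density
```

The last call illustrates the point about confusion: with bins of width 0.25 that each hold half of the data, the height is 0.5 / 0.25.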

8 / 14

SLIDE 35

Histograms in R

  • In R, we use hist() with freq = FALSE:

    hist(x = vignettes$age, freq = FALSE, ylim = c(0, 0.04),
         xlab = "Age", main = "Distribution of Respondent's Age")

  • Other arguments:
      • ylim sets the range of the y-axis to show.
      • main sets the title for the figure.
  • We can also choose the bin locations on our own via:
      • breaks: location of the bin breaks, or
      • nclass (number of bins)

    hist(vignettes$age, freq = FALSE,
         breaks = c(0, 18, 25, 45, 65, 100),
         xlab = "Age", main = "Distribution of Respondent's Age")
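
The difference between asking for a number of bins and supplying explicit breaks can be inspected with plot = FALSE, which returns the histogram object without drawing it. This is a sketch on simulated ages, not the survey data; the seed and the names `h_auto` and `h_custom` are arbitrary.

```r
set.seed(51)  # arbitrary seed so the simulated ages are reproducible
ages <- sample(18:90, size = 200, replace = TRUE)  # simulated ages

# A single number is only a suggestion: hist() chooses "pretty" cut points
h_auto <- hist(ages, breaks = 5, plot = FALSE)
h_auto$breaks

# A vector of breaks is used as given and must cover the range of the data
h_custom <- hist(ages, breaks = c(0, 18, 25, 45, 65, 100), plot = FALSE)
h_custom$breaks
h_custom$counts

# On the density scale, the bar areas sum to 1 in both cases
sum(h_auto$density * diff(h_auto$breaks))
sum(h_custom$density * diff(h_custom$breaks))
```

With unequal-width bins such as these, the density scale (freq = FALSE) is the one that keeps bar areas comparable across bins.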

9 / 14

SLIDE 44

Creating our own bins

[Figure: histogram titled "Distribution of Respondent's Age" with custom bins; x-axis: Age, y-axis: Density.]

10 / 14

SLIDE 46

Boxplot

  • A boxplot can characterize the distribution of continuous variables.

[Figure: boxplot titled "Distribution of Age"; y-axis: Age.]

11 / 14

SLIDE 47

Boxplots in R

  • “Box” represents the range between the lower and upper quartiles.
  • “Whiskers” extend from each end of the box to the most extreme data point that lies within 1.5 × IQR of the box, so they stop at the max/min of the data when that is closer.
  • Points beyond the whiskers are outliers.
  • Use boxplot() in R:

    boxplot(vignettes$age, main = "Distribution of Age", ylab = "Age")
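
The numbers that boxplot() draws can be inspected with boxplot.stats(). The sketch below uses made-up ages with one extreme value; `ages`, `s`, `iqr`, and `upper_fence` are hypothetical names. Note that R computes the box edges as "hinges", which are close to, but not always identical to, the quartiles returned by quantile().

```r
# Hypothetical ages with one extreme value (illustrative, not survey data)
ages <- c(22, 25, 31, 34, 38, 41, 45, 50, 52, 95)

s <- boxplot.stats(ages)  # default coef = 1.5, the same rule boxplot() uses

s$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
s$out    # points beyond the whiskers, drawn individually as outliers

# The upper whisker stops at the largest value not above this fence
iqr <- s$stats[4] - s$stats[2]
upper_fence <- s$stats[4] + 1.5 * iqr
upper_fence
```

Here the upper whisker ends at an observed data point below the fence rather than at the fence itself, which is why whiskers on the two sides of a box often differ in length.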

12 / 14

SLIDE 53

Boxplot

[Figure: annotated boxplot titled "Distribution of Age" (y-axis: Age), labeling the median, upper quartile, lower quartile, IQR, and the 1.5 × IQR whisker range.]

13 / 14

SLIDE 54

Review

  • Visualizing single discrete/categorical variables: barplots
  • Visualizing continuous variables: histograms, boxplots

14 / 14
