Categorical Data Contingency Tables
STAT 113: Describing Categorical Data
Colin Reimer Dawson, Oberlin College
September 7, 2017
Frequency Tables
How can we display a data set with categorical values? One option is simply a frequency table.
Daily   Weekly   Monthly   Semesterly   N/R   Total
9       28       18        23           13    91
Table: Results of a Survey of College Students on Frequency of Video Game Playing (via Nolan and Speed, 2000)
Relative Frequency Tables
If we use proportions or percentages, we have a relative frequency table.
Daily    Weekly   Monthly   Semesterly   N/R      Total
0.0989   0.3077   0.1978    0.2527       0.1429   1.0000
Table: Results of a Survey of College Students on Frequency of Video Game Playing (via Nolan and Speed, 2000)
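A minimal sketch of how the relative frequencies follow from the counts, using only the numbers in the tables above (the variable names are illustrative): each proportion is the category count divided by the total count.

```python
# Counts from the frequency table (video game survey).
counts = {"Daily": 9, "Weekly": 28, "Monthly": 18, "Semesterly": 23, "N/R": 13}

total = sum(counts.values())  # total number of respondents

# Relative frequency = count in the category / total count.
rel_freq = {category: n / total for category, n in counts.items()}

# Print counts and proportions side by side.
for category, p in rel_freq.items():
    print(f"{category:<11}{counts[category]:>4}{p:>9.4f}")
print(f"{'Total':<11}{total:>4}{sum(rel_freq.values()):>9.4f}")
```

Multiplying each proportion by 100 gives the percentage version of the same table.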
Graphical Displays
Sometimes a chart is more effective than a table.
Figure: http://www.xkcd.com
Pie Charts
The pie chart is a popular choice for proportion data...
[Pie chart slices: Daily (9.9%), Weekly (30.8%), Monthly (19.8%), Semesterly (25.3%), N/R (14.3%)]
Figure: Results of a Survey of College Students on Frequency of Video Game Playing (via Nolan and Speed, 2000)
Pie Charts...
Figure: http://www.businessinsider.com/pie-charts-are-the-worst-2013-6
- “Pie charts are the Nickelback of data visualization.”
- “Pie charts are the Aquaman of data visualization.”
Bar Charts vs. Pie Charts
- Pie charts fail to convey anything useful with more than
maybe 4 categories.
- Even with few categories, it’s difficult to judge differences in
slice size.
- People are better at judging linear size than area.
Bar Chart > Pie Chart
- Even in situations ideally suited to pie charts, it’s probably still
better to use bar charts.
The Verdict on Pie Charts
The One Exception
Bar Charts
Much easier to see differences between categories.
[Bar plots over the categories Daily, Weekly, Monthly, Semesterly, N/R; y-axis label: % of respondents]
Figure: Two bar plots of Video Game data showing frequency (left) and percentages (right)
Bar Charts
What’s this bar chart telling us?
Figure: A Fair and Balanced Bar Chart (from FOX News, 8/9/12)
Bar Chart Tips
- The cardinal rule of bar charts: Ratios in area = ratios in value
- The y-axis must start at 0!
- Equal distances = equal differences
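A small sketch of why the y-axis must start at 0. The function name `apparent_ratio` and the two values are hypothetical, chosen only for illustration and not read off any figure in these slides. It compares the ratio of the drawn bar heights with the ratio of the underlying values for different axis baselines.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of drawn bar heights for values a and b when the y-axis starts at baseline."""
    return (b - baseline) / (a - baseline)

# Hypothetical values, for illustration only.
low, high = 35.0, 39.6

true_ratio = high / low                       # ratio of the values themselves
from_zero = apparent_ratio(low, high, 0.0)    # bars drawn from 0
truncated = apparent_ratio(low, high, 34.0)   # bars drawn from 34

print(f"ratio of values:       {true_ratio:.2f}")
print(f"bar ratio, axis at 0:  {from_zero:.2f}")
print(f"bar ratio, axis at 34: {truncated:.2f}")
```

With the baseline at 0 the bar ratio equals the value ratio; with a truncated axis the taller bar looks several times larger even though the values differ by only about 13%.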
Summary
Categorical Data
- Count occurrences of each value. Represent counts with a
frequency table, proportions with a relative frequency table.
- Pie charts are pretty useless. Prefer bar charts.
- Cardinal rule of bar charts: Ratios in area = ratios in value
- The y-axis must start at 0!
- Equal distances = equal differences
Contingency Tables
- Recall: with one categorical variable, we summarized by
counting the observations in each category
- With more than one variable, we do the same thing, but we
keep track of combinations.
- With two variables, we can store the counts in a two-way table
(also known as a contingency table).
A Simple Contingency Table
Student   Sex   Computer
1         M     PC
2         F     Mac
3         F     PC
4         M     PC
5         F     PC
6         F     Mac
7         M     Mac
8         M     PC
⇒
         Computer
Sex      PC    Mac
M        3     1
F        2     2
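A minimal sketch of the same bookkeeping in code, assuming the eight cases are stored as (sex, computer) pairs (the variable names are illustrative): counting each combination of values gives the cells of the two-way table.

```python
from collections import Counter

# The eight cases from the slide: (sex, computer) for students 1 through 8.
students = [
    ("M", "PC"), ("F", "Mac"), ("F", "PC"), ("M", "PC"),
    ("F", "PC"), ("F", "Mac"), ("M", "Mac"), ("M", "PC"),
]

# Count how often each combination of values occurs.
cell_counts = Counter(students)

sexes = ["M", "F"]
computers = ["PC", "Mac"]

# Print the two-way table: rows are sex, columns are computer type.
print(f"{'':<5}" + "".join(f"{c:>5}" for c in computers))
for s in sexes:
    print(f"{s:<5}" + "".join(f"{cell_counts[(s, c)]:>5}" for c in computers))
```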
Proportions in a Context
Sometimes we want to ask about proportions within particular subsets of the data.
Example: Driving While Black/Brown
Armentrout et al. (2007)¹ report data on a variety of outcomes related to traffic stops by the Los Angeles Police Department (LAPD). Two of the variables recorded are the racial category of the driver and whether or not the vehicle was searched.
Question of interest: Does the proportion of stops that lead to a search differ across racial categories?
¹ Armentrout, M., Goodrich, A., Nguyen, J., Ortega, L., Smith, L., & Khadjavi, L.S. (2007). Cops and stops: Racial profiling and a preliminary statistics analysis of Los Angeles police department traffic stops and searches. Retrieved from http://www.public.asu.edu/ etcamach/AMSSI/reports/copsnstops.pdf
Proportions in a Context
Question of interest: Does the proportion of stops that lead to a search differ across racial categories?
               Hisp./Lat.   White   Black   Asian   Others   Total
Searched       510          109     240     16      7        882
Not Searched   1826         2081    1008    486     104      5505
Total          2336         2190    1248    502     111      6387
Pairs (3 min.):
- Identify the cases and the population the cases are drawn from.
- How would you address this question using this data?
Conditional Proportions
               Hisp./Lat.   White   Black   Asian   Others   Total
Searched       510          109     240     16      7        882
Not Searched   1826         2081    1008    486     104      5505
Total          2336         2190    1248    502     111      6387
- We can group the cases according to one variable (e.g., driver
race), and look at the distribution of the other (searched or not) within each group.
- The resulting proportions are called conditional proportions:
proportions are computed within a context, i.e., cases that satisfy a certain condition.
Conditional Proportions
                      Searched
Driver Race    Yes        No          Total
Hisp./Lat.     510/2336   1826/2336   2336
White          109/2190   2081/2190   2190
Black          240/1248   1008/1248   1248
Asian          16/502     486/502     502
Others         7/111      104/111     111
Total          882/6387   5505/6387   6387
Conditional Proportions
                    Searched
Driver Race    Yes     No      Total
Hisp./Lat.     0.218   0.782   2336
White          0.050   0.950   2190
Black          0.192   0.808   1248
Asian          0.032   0.968   502
Others         0.063   0.937   111
Total          0.138   0.862   6387
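A sketch of how these conditional proportions can be computed from the counts, assuming the table is stored as a dictionary of (searched, not searched) pairs; the names `stops`, `rates`, and `share_of_searches` are illustrative. The last step conditions in the other direction (driver race given that a search occurred) to show that the two directions give different numbers.

```python
# Counts from the traffic-stop table: group -> (searched, not searched).
stops = {
    "Hisp./Lat.": (510, 1826),
    "White": (109, 2081),
    "Black": (240, 1008),
    "Asian": (16, 486),
    "Others": (7, 104),
}

# Proportion searched, conditional on driver race (divide by the row total).
rates = {g: s / (s + n) for g, (s, n) in stops.items()}

total_searched = sum(s for s, _ in stops.values())
total_stops = sum(s + n for s, n in stops.values())
overall_rate = total_searched / total_stops

# Reverse direction: proportion of each race among searched drivers.
share_of_searches = {g: s / total_searched for g, (s, _) in stops.items()}

for g in stops:
    print(f"{g:<12}{rates[g]:>8.3f}{share_of_searches[g]:>8.3f}")
print(f"{'Total':<12}{overall_rate:>8.3f}{1.0:>8.3f}")
```

For example, 510/2336 of stops of Hispanic/Latino drivers led to a search, while 510/882 of all searches involved Hispanic/Latino drivers; the numerator is the same but the context (denominator) differs.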
Conditional Proportions to Measure Association
Pairs, 1 min.: What do these conditional proportions tell us?
                   Grade Expected
Frequency    A      B      C      Total
Daily        0.56   0.33   0.11   9
Weekly       0.50   0.50   0.00   28
Monthly      0.11   0.67   0.22   18
Sem’ly       0.30   0.61   0.09   23
Total        0.36   0.55   0.09   78
Table: Results of a survey of college students on frequency of video game playing and expected grade in a stats class (Nolan and Speed, 2000)
A Three-Way Table
Figure: Religious Affiliation by Political Party, 2006 vs 2016 (via 538)
Religious and Political Affiliation
“In 2016, a whopping 35 percent of Republicans were white evangelical Protestants, 18 percent were white mainline Protestants, and 16 percent were white Catholics; together, those groups account for nearly 70 percent of the Republican base. But since 2006, the proportion of Americans identifying as white evangelical Protestant, white Catholic, and white mainline Protestant have all dropped by 5 or 6 percentage points.”
From America’s Shifting Religious Makeup Could Spell Trouble For Both Parties, FiveThirtyEight.com
Age Breakdown by Religion
“Religious minorities like Muslims, Hindus and Buddhists; the religiously unaffiliated; and Hispanic Protestants and Catholics all have significant numbers of followers under the age of 30. And all of these groups disproportionately identify as Democrats. This youth and diversity might seem like a gift to the Democratic Party, but it also presents a serious challenge for politicians hoping to present a compelling vision to voters who have a wide range of values and priorities.”
Example: Medical Diagnosis
A test for a rare disease (affecting about 1 in 10,000 people) has been developed and shown to have high accuracy: 99% of those with the disease test positive, and 99% of those without it test negative.
Pairs (3 min.): If you test positive, what are the chances you have the disease?
Hint: Construct a contingency table with 1,000,000 people who may or may not have the disease and may or may not test positive.
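One way to follow the hint, sketched under the assumption that the stated rates apply exactly to a hypothetical group of 1,000,000 people (variable names are illustrative): fill in the four cells of the contingency table, then compute the proportion with the disease among those who test positive.

```python
# Hypothetical population suggested by the hint.
population = 1_000_000

# About 1 in 10,000 people have the disease.
diseased = population // 10_000
healthy = population - diseased

# 99% of those with the disease test positive.
disease_pos = diseased * 99 // 100
disease_neg = diseased - disease_pos

# 99% of those without the disease test negative.
healthy_neg = healthy * 99 // 100
healthy_pos = healthy - healthy_neg

# Contingency table: rows are disease status, columns are test result.
print(f"{'':<12}{'Positive':>10}{'Negative':>10}{'Total':>10}")
print(f"{'Disease':<12}{disease_pos:>10}{disease_neg:>10}{diseased:>10}")
print(f"{'No disease':<12}{healthy_pos:>10}{healthy_neg:>10}{healthy:>10}")

# Conditional proportion: disease, given a positive test.
total_pos = disease_pos + healthy_pos
p_disease_given_pos = disease_pos / total_pos
print(f"P(disease | positive) = {disease_pos}/{total_pos} = {p_disease_given_pos:.4f}")
```

Under these assumptions only 99 of the 10,098 positive tests belong to people with the disease, so a positive result corresponds to a chance of a bit under 1%, because people without the disease vastly outnumber those with it.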
Summary
Two Categorical Variables
- A two-way table called a contingency table contains the
number of times each combination of values appears.
- The conditional proportion of x given y is the proportion of
the time x occurs in the context of y.
- Conditional proportions are not symmetric. Direction of
conditioning can make a big difference to the apparent pattern when base rates are very different.
- We can use stacked or grouped bar plots to visualize joint