Hypothesis Testing
February 25, 2020 Data Science CSCI 1951A Brown University Instructor: Ellie Pavlick HTAs: Josh Levin, Diane Mutako, Sol Zitter
Today
Two quick preliminaries: the Law of Large Numbers (LoLN) and the Central Limit Theorem (CLT)
Law of Large Numbers: if you repeat an experiment a large number of times, the average will converge to the expected value
…and uncorrelated, so will balance
https://en.wikipedia.org/wiki/Law_of_large_numbers
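A quick way to see the Law of Large Numbers is to simulate it. The sketch below is an illustration only: the fair six-sided die, the seed, and the function name `running_average_of_die_rolls` are assumptions, not part of the slides. The expected value of one roll is 3.5, and the running average should drift toward it as the number of rolls grows.

```python
import random


def running_average_of_die_rolls(num_rolls, seed=0):
    """Roll a fair die num_rolls times; return the running average after each roll."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0
    averages = []
    for i in range(1, num_rolls + 1):
        total += rng.randint(1, 6)  # one roll, uniform on 1..6
        averages.append(total / i)
    return averages


averages = running_average_of_die_rolls(100_000)
for n in (10, 100, 1_000, 100_000):
    # Early averages can be far from 3.5; later ones should be close to it.
    print(n, round(averages[n - 1], 3))
```

The exact numbers depend on the seed; only the trend toward 3.5 matters.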
[Figure: a histogram over the values 1 to 12, shown at several stages with the y-axis growing from 1 to 40; the final bars read 10, 20, 40, 20, 10]
I.e. test statistics are normally distributed…
Can apply statistical methods designed for normal distributions even when underlying distribution is not normal
Every year, I compute the mean grade in my class. I never change the material or my methods for evaluating because, lazy. Over the 439 years that I have been teaching this class, this has resulted in the below distribution. Which of these is most likely the typical distribution on any given year?
[Figures: histogram of the yearly mean grades, and candidate histograms for a single year's grades (x-axis: grade, 10 to 100; y-axis: count, 10 to 40)]
Central Limit Theorem: repeated measures of the mean will be (approximately) normally distributed. This doesn't assume the population over which you are taking the mean is normally distributed.
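The Central Limit Theorem can also be checked by simulation. This is a minimal sketch under assumptions that are not in the slides: the population is exponential with mean 1 (strongly skewed, so clearly not normal), each sample has 50 draws, and `sample_means` is a made-up helper name. If the sample means are roughly normal, about 68% of them should fall within one standard error of the population mean.

```python
import random
import statistics
from math import sqrt


def sample_means(num_samples, sample_size, seed=0):
    """Draw num_samples samples from a skewed population; return each sample's mean."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(num_samples)
    ]


sample_size = 50
means = sample_means(num_samples=5_000, sample_size=sample_size)

population_mean = 1.0  # mean of an exponential with rate 1
standard_error = 1.0 / sqrt(sample_size)  # population sd is 1, so sd of the mean is 1/sqrt(n)

within_one_se = sum(abs(m - population_mean) < standard_error for m in means) / len(means)
print("mean of sample means:", round(statistics.fmean(means), 3))
print("fraction within 1 standard error:", round(within_one_se, 3))
```

Plotting a histogram of `means` would show the bell shape directly; the 68% check is just a cheap stand-in for that picture.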
assumption
lead to an explosive headline and are really hoping is true but you are a good scientist, so you will look to the data to confirm
Thing you can model
Matters for how you compute p-values…more soon
deviate from status quo without good reason :)
highly unlikely, then we can say we “reject the null hypothesis”
null hypothesis
statistic (often, this work has been done for you)
that theoretical distribution
“I swear literally like 80% of the answers are just (b)”
p(B) = 0.8 {b, not b}
[Figure: cdf, x-axis 1 to 12; plotted values 1.00 0.93 0.73 0.44 0.21 0.07 0.02 0.00 0.00 0.00 0.00 0.00]
Labels added in turn: Hypothesis, Observation/Sample, Test Statistic, Theoretical Distribution
P(#(b)s ≤ 4) = 0.0006
p-value
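The 0.0006 can be reproduced from the binomial cdf. This sketch assumes a 12-question test (inferred from the 1 to 12 x-axis, not stated on the slide) and uses a hand-written `binomial_cdf` helper; under the hypothesis p(B) = 0.8, it asks how likely it is to see 4 or fewer (b) answers.

```python
from math import comb


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n_questions = 12  # assumption: taken from the 1..12 axis on the slide
p_b = 0.8         # hypothesis: 80% of the answers are (b)
observed_bs = 4   # test statistic: number of (b) answers observed

p_value = binomial_cdf(observed_bs, n_questions, p_b)
print(f"P(#(b)s <= {observed_bs}) = {p_value:.4f}")
```

With these assumptions the result rounds to 0.0006, which matches the slide.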
feature different between two populations
populated than red states, do CS work harder than other majors (::rolling_eyes::)
variable; is the distribution of some feature uniform across groups
features; do college majors differ in terms of sociodemographic features
exploratory analyses, rather than hypothesis testing)
http://www.censusscope.org/us/chart_age.html
Distribution of ages in the US
Hypothesis: Mean age is 35.
Thoughts about test statistics?
$\frac{\sigma^2}{n}$
expected
variation (comes from normal distribution)
Why can we use a normally-distributed test statistic to evaluate mean age of a population?
http://www.censusscope.org/us/chart_age.html
Distribution of mean ages given samples of US adults
$\frac{\sigma^2}{n}$
z = distance from mean in std units
z = (observed - expected) / standard deviation
test statistic
cumulative density = p-value
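As a rough sketch of the z computation for the mean-age hypothesis: every number below except the hypothesized mean of 35 is made up for illustration (the sample mean, the population standard deviation, and the sample size are assumptions). The standard deviation of a sample mean is sqrt(σ²/n), and `NormalDist().cdf` from the standard library gives the cumulative density.

```python
from math import sqrt
from statistics import NormalDist


def z_score(observed, expected, standard_deviation):
    """z = (observed - expected) / standard deviation."""
    return (observed - expected) / standard_deviation


hypothesized_mean = 35.0  # from the slide: "Mean age is 35"
sample_mean = 36.5        # hypothetical observation
population_sd = 22.0      # hypothetical sigma
n = 1000                  # hypothetical sample size

standard_error = population_sd / sqrt(n)  # sqrt(sigma^2 / n)
z = z_score(sample_mean, hypothesized_mean, standard_error)
cumulative = NormalDist().cdf(z)  # P(Z <= z) under the standard normal

print("z =", round(z, 3))
print("cumulative density =", round(cumulative, 4))
```

Which tail of the cumulative density counts as the p-value depends on the alternative hypothesis.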
“I swear literally like 80% of the answers are just (b)”
p(B) = 0.8 {b, not b}
[Figure: cdf, x-axis 1 to 12; plotted values 1.00 0.93 0.73 0.44 0.21 0.07 0.02 0.00 0.00 0.00 0.00 0.00]
P(#(b)s ≤ 4) = 0.0006
p-value: “assuming the null hypothesis were true, I would only expect to see a test statistic like this (or more extreme) 0.06% of the time. How surprising!”
z = (observed - expected) / standard deviation
cumulative density = p-value
significance level (set in advance)
Ha = test stat is less than expected: “one tailed”
Ha = test stat is different than expected: “two tailed”
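The choice of alternative hypothesis changes which part of the normal curve counts toward the p-value. A minimal sketch, assuming a z statistic of -1.8 and a significance level of 0.05 (both chosen for illustration); `p_value_from_z` and its `alternative` labels are made-up names, not a library API.

```python
from statistics import NormalDist


def p_value_from_z(z, alternative):
    """p-value of a z statistic under the standard normal for a given alternative."""
    std_normal = NormalDist()
    if alternative == "less":        # Ha: test stat is less than expected (one tailed)
        return std_normal.cdf(z)
    if alternative == "greater":     # Ha: test stat is greater than expected (one tailed)
        return 1.0 - std_normal.cdf(z)
    if alternative == "two-sided":   # Ha: test stat is different than expected (two tailed)
        return 2.0 * (1.0 - std_normal.cdf(abs(z)))
    raise ValueError(f"unknown alternative: {alternative!r}")


alpha = 0.05  # significance level, set in advance
z = -1.8      # hypothetical test statistic

for alternative in ("less", "two-sided"):
    p = p_value_from_z(z, alternative)
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(alternative, round(p, 4), decision)
```

The same z can be significant one tailed and not significant two tailed, which is why the alternative has to be fixed before looking at the data.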
http://www.censusscope.org/us/chart_age.html
H0 = mean age is 35
Ha = mean age is not 35
$\frac{\sigma^2}{n}$
standard normal
Student's t: closer to standard normal as n increases
Grades: 90 92 80 87 98 78
Null Hypothesis: The average grade is 85%.
mean: 87.5 s: 7.5
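The grades example can be worked as a one-sample t-test. This is a sketch, not necessarily the computation done in lecture: it uses only the standard library, computes the t statistic from the six grades, and compares it against the two-sided 5% critical value for 5 degrees of freedom (2.571, from a standard t table). The 5% significance level is an assumption.

```python
from math import sqrt
from statistics import mean, stdev

grades = [90, 92, 80, 87, 98, 78]
null_mean = 85.0  # null hypothesis: the average grade is 85%

n = len(grades)
sample_mean = mean(grades)  # 87.5
sample_sd = stdev(grades)   # sample standard deviation s (n - 1 in the denominator)

# t = (observed - expected) / (s / sqrt(n))
t_statistic = (sample_mean - null_mean) / (sample_sd / sqrt(n))
degrees_of_freedom = n - 1

# Two-sided critical value of Student's t for alpha = 0.05 and df = 5.
T_CRITICAL_DF5 = 2.571
reject_null = abs(t_statistic) > T_CRITICAL_DF5

print("mean:", sample_mean, "s:", round(sample_sd, 2))
print("t:", round(t_statistic, 3), "df:", degrees_of_freedom)
print("reject H0" if reject_null else "fail to reject H0")
```

If SciPy is available, `scipy.stats.ttest_1samp(grades, 85)` should return the same statistic along with an exact p-value.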
http://www.censusscope.org/us/chart_age.html
mean 1 - mean 2
zero
2020: 95 92 83 87 98 75
2019: 90 92 80 87 98 78
Null Hypothesis: The average grade is the same as last year.
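For two groups, the observed value is mean 1 - mean 2 and the expected value under the null is zero. The sketch below uses Welch's t statistic, which does not assume equal variances; that choice is an assumption, since the slides do not say which two-sample variant is intended. `welch_t` is a made-up helper name.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom for two independent samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a, var_b = variance(sample_a), variance(sample_b)  # sample variances
    se_sq_a, se_sq_b = var_a / n_a, var_b / n_b
    # observed = mean_a - mean_b, expected = 0 under the null hypothesis
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se_sq_a + se_sq_b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se_sq_a + se_sq_b) ** 2 / (se_sq_a**2 / (n_a - 1) + se_sq_b**2 / (n_b - 1))
    return t, df


grades_2020 = [95, 92, 83, 87, 98, 75]
grades_2019 = [90, 92, 80, 87, 98, 78]

t_statistic, df = welch_t(grades_2020, grades_2019)
print("t:", round(t_statistic, 3), "df:", round(df, 2))
```

A |t| this far below 2 is nowhere near the usual 5% cutoffs for about 10 degrees of freedom, so these two years give no reason to reject the null.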
Is this distribution significantly different than what we would expect by chance, assuming that in fact all answers are equally likely?
Xi = count of answer i
p(a) = p(b) = p(c) = p(d) = 0.25

          a  b  c  d
Observed  3  4  4  1
Expected  3  3  3  3

Want to model the difference between these
[Figure: bar chart over a, b, c, d of the difference between observed and expected]
Should I use the total difference between observed and expected as my summary statistic? I.e. $\sum_i (X_i - E(X_i))$
[Figures: bar charts over a, b, c, d for candidate per-answer statistics; plotted values 1, 1, then 0.11, 0.11, 0.44]
Thoughts?
[Figure: bar chart over a, b, c, d; plotted values 0.037, 0.037, 0.148]
$\sum_i \frac{(X_i - E(X_i))^2}{E(X_i)}$
cdf that we can compute explicitly
          a  b  c  d
Observed  3  4  4  1
Expected  3  3  3  3

not really remarkable
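The answer-count example can be checked numerically with the statistic sum over i of (Xi - E(Xi))^2 / E(Xi). This is a sketch using only the standard library; `chi_square_sf_df3` is a made-up helper that relies on the closed-form upper-tail probability of a chi-square distribution with exactly 3 degrees of freedom (4 categories minus 1), so it would not be valid for other table sizes.

```python
from math import erfc, exp, pi, sqrt


def chi_square_statistic(observed, expected):
    """Sum over categories of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_sf_df3(x):
    """P(chi-square with 3 degrees of freedom >= x); closed form valid only for df = 3."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)


observed = [3, 4, 4, 1]          # counts of answers a, b, c, d
total = sum(observed)
expected = [total * 0.25] * 4    # p(a) = p(b) = p(c) = p(d) = 0.25

statistic = chi_square_statistic(observed, expected)
p_value = chi_square_sf_df3(statistic)

print("chi-square statistic:", round(statistic, 3))
print("p-value:", round(p_value, 3))
```

A p-value above one half agrees with the slide's verdict: not really remarkable. With SciPy installed, `scipy.stats.chisquare(observed)` should give the same two numbers.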
Overall goal: Understand how cities differ in terms of professions.
Null Hypothesis: No difference between Providence and Boston

        Art  Tech  Medicine
PVD      35    33        32   “expected”
Boston   25    30        45   “observed”
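One way to run the city comparison, following the slide's labels, is to treat the Providence counts as “expected” and the Boston counts as “observed” and compute the sum over professions of (observed - expected)^2 / expected. This sketch rests on that reading of the labels, and on the fact that with 3 categories there are 2 degrees of freedom, where the chi-square upper tail has the simple form exp(-x/2). A test of independence on the full 2x3 table would be a different computation and is not shown here.

```python
from math import exp

professions = ["Art", "Tech", "Medicine"]
expected = [35, 33, 32]  # Providence counts, labeled "expected" on the slide
observed = [25, 30, 45]  # Boston counts, labeled "observed" on the slide

# Per-category contribution: (observed - expected)^2 / expected
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
statistic = sum(contributions)

# Chi-square with 2 degrees of freedom: P(X >= x) = exp(-x / 2).
p_value = exp(-statistic / 2)

for name, c in zip(professions, contributions):
    print(f"{name}: {c:.3f}")
print("chi-square statistic:", round(statistic, 3))
print("p-value:", round(p_value, 4))
```

Most of the statistic comes from Medicine, so that is where the two cities differ most under this framing.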
X
i
(Xi − E(Xi))2 E(Xi)
<latexit sha1_base64="+nxFYlAEC9qbXN+2+0COtMVmbg=">ACD3icbZBNS8MwGMdTX+d8q3r0EhzKdnC0Q9DjUASPE9wLrLWkWbqFJW1JUmGUfgMvfhUvHhTx6tWb38Z060E3Hwj58f8/D8nz92NGpbKsb2NpeWV1b20Ud7c2t7ZNf2OzJKBCZtHLFI9HwkCaMhaSuqGOnFgiDuM9L1x1e530gQtIovFOTmLgcDUMaUIyUljzxJEJ9yh0AoFwWu1pPIX+V2r3TeydIaZ1asujUtuAh2ARVQVMszv5xBhBNOQoUZkrJvW7FyUyQUxYxkZSeRJEZ4jIakrzFEnEg3ne6TwWOtDGAQCX1CBafq74kUcSkn3NedHKmRnPdy8T+vn6jgwk1pGCeKhHj2UJAwqCKYhwMHVBCs2EQDwoLqv0I8QjoZpSMs6xDs+ZUXodOo21bdvj2rNC+LOErgEByBKrDBOWiCG9ACbYDBI3gGr+DNeDJejHfjY9a6ZBQzB+BPGZ8/rjag=</latexit><latexit sha1_base64="+nxFYlAEC9qbXN+2+0COtMVmbg=">ACD3icbZBNS8MwGMdTX+d8q3r0EhzKdnC0Q9DjUASPE9wLrLWkWbqFJW1JUmGUfgMvfhUvHhTx6tWb38Z060E3Hwj58f8/D8nz92NGpbKsb2NpeWV1b20Ud7c2t7ZNf2OzJKBCZtHLFI9HwkCaMhaSuqGOnFgiDuM9L1x1e530gQtIovFOTmLgcDUMaUIyUljzxJEJ9yh0AoFwWu1pPIX+V2r3TeydIaZ1asujUtuAh2ARVQVMszv5xBhBNOQoUZkrJvW7FyUyQUxYxkZSeRJEZ4jIakrzFEnEg3ne6TwWOtDGAQCX1CBafq74kUcSkn3NedHKmRnPdy8T+vn6jgwk1pGCeKhHj2UJAwqCKYhwMHVBCs2EQDwoLqv0I8QjoZpSMs6xDs+ZUXodOo21bdvj2rNC+LOErgEByBKrDBOWiCG9ACbYDBI3gGr+DNeDJejHfjY9a6ZBQzB+BPGZ8/rjag=</latexit><latexit sha1_base64="+nxFYlAEC9qbXN+2+0COtMVmbg=">ACD3icbZBNS8MwGMdTX+d8q3r0EhzKdnC0Q9DjUASPE9wLrLWkWbqFJW1JUmGUfgMvfhUvHhTx6tWb38Z060E3Hwj58f8/D8nz92NGpbKsb2NpeWV1b20Ud7c2t7ZNf2OzJKBCZtHLFI9HwkCaMhaSuqGOnFgiDuM9L1x1e530gQtIovFOTmLgcDUMaUIyUljzxJEJ9yh0AoFwWu1pPIX+V2r3TeydIaZ1asujUtuAh2ARVQVMszv5xBhBNOQoUZkrJvW7FyUyQUxYxkZSeRJEZ4jIakrzFEnEg3ne6TwWOtDGAQCX1CBafq74kUcSkn3NedHKmRnPdy8T+vn6jgwk1pGCeKhHj2UJAwqCKYhwMHVBCs2EQDwoLqv0I8QjoZpSMs6xDs+ZUXodOo21bdvj2rNC+LOErgEByBKrDBOWiCG9ACbYDBI3gGr+DNeDJejHfjY9a6ZBQzB+BPGZ8/rjag=</latexit><latexit 
sha1_base64="+nxFYlAEC9qbXN+2+0COtMVmbg=">ACD3icbZBNS8MwGMdTX+d8q3r0EhzKdnC0Q9DjUASPE9wLrLWkWbqFJW1JUmGUfgMvfhUvHhTx6tWb38Z060E3Hwj58f8/D8nz92NGpbKsb2NpeWV1b20Ud7c2t7ZNf2OzJKBCZtHLFI9HwkCaMhaSuqGOnFgiDuM9L1x1e530gQtIovFOTmLgcDUMaUIyUljzxJEJ9yh0AoFwWu1pPIX+V2r3TeydIaZ1asujUtuAh2ARVQVMszv5xBhBNOQoUZkrJvW7FyUyQUxYxkZSeRJEZ4jIakrzFEnEg3ne6TwWOtDGAQCX1CBafq74kUcSkn3NedHKmRnPdy8T+vn6jgwk1pGCeKhHj2UJAwqCKYhwMHVBCs2EQDwoLqv0I8QjoZpSMs6xDs+ZUXodOo21bdvj2rNC+LOErgEByBKrDBOWiCG9ACbYDBI3gGr+DNeDJejHfjY9a6ZBQzB+BPGZ8/rjag=</latexit>107
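To make the formula concrete, here is a minimal Python sketch that plugs the table's counts into the sum. It assumes the PVD row supplies the "expected" counts E(X_i) and the Boston row the "observed" counts X_i, following the order of the labels on the slide. The function name chi_squared_statistic and the p-value step are illustrative additions, not something the slide specifies; the p-value uses the fact that, with 3 categories and so 2 degrees of freedom, the chi-squared tail probability is exp(-x/2).

```python
import math

# Counts from the slide's table, in the order Art, Tech, Medicine.
# Assumption: PVD is the "expected" row and Boston is the "observed" row.
expected = [35, 33, 32]
observed = [25, 30, 45]


def chi_squared_statistic(observed, expected):
    """Return the sum over categories i of (X_i - E(X_i))^2 / E(X_i)."""
    return sum((x - e) ** 2 / e for x, e in zip(observed, expected))


statistic = chi_squared_statistic(observed, expected)

# Assumption: 3 categories give 2 degrees of freedom. For exactly 2 degrees
# of freedom the chi-squared survival function has the closed form exp(-x/2).
p_value = math.exp(-statistic / 2)

print(f"chi-squared statistic = {statistic:.3f}")
print(f"p-value (2 d.o.f.)    = {p_value:.4f}")
```

Under these assumptions the statistic comes to about 8.41 and the p-value to roughly 0.015, so at a 0.05 threshold the null hypothesis of no difference would be rejected. For other numbers of categories the closed form above does not apply, and a library routine for the chi-squared distribution would be the safer choice.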