15-388/688 - Practical Data Science: Hypothesis testing and experimental design
- J. Zico Kolter
Carnegie Mellon University Fall 2019
Outline
Motivation
Background: sample statistics and central limit theorem
Basic hypothesis testing
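Population vs. sample statistics
Mean: population $\mu = \mathbf{E}[x]$; sample $\bar{x} = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}$
Variance: population $\sigma^2 = \mathbf{E}[(x - \mu)^2]$; sample $s^2 = \frac{1}{m-1}\sum_{i=1}^{m} (x^{(i)} - \bar{x})^2$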
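As a rough illustration of the sample mean and sample variance, and of the central limit theorem named in the outline, the sketch below computes both statistics by hand and compares them with NumPy's built-ins. The exponential population, the seed, the sample size `m = 50`, and the number of repetitions are all assumptions made for this example and do not come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, chosen only for reproducibility
m = 50                           # sample size (illustrative assumption)

# A deliberately non-Gaussian population: exponential with mean 2, variance 4
x = rng.exponential(scale=2.0, size=m)

# Sample mean and unbiased sample variance (note the m - 1 denominator)
xbar = np.sum(x) / m
s2 = np.sum((x - xbar) ** 2) / (m - 1)

# Same quantities via NumPy; ddof=1 selects the m - 1 denominator
xbar_np = np.mean(x)
s2_np = np.var(x, ddof=1)

# Central limit theorem check: the means of many independent samples of
# size m should cluster around 2 with variance close to 4 / m = 0.08
means = rng.exponential(scale=2.0, size=(20000, m)).mean(axis=1)
print(xbar, s2, means.mean(), means.var(ddof=1))
```

The sample mean is itself a random variable: rerunning with a different seed gives a different `xbar`, which is what makes the hypothesis tests later in the lecture necessary.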
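Student's t density with $\nu$ degrees of freedom: $p(x;\nu) \propto \left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}$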
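$\frac{\bar{x} - \mu}{(s^2/m)^{1/2}} \sim T_{m-1}$ (Student's t-distribution with $m-1$ degrees of freedom)
Test statistic under the null hypothesis $\mu = 0$: $t = \frac{\bar{x}}{(s^2/m)^{1/2}}$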
$p$ = Area under the Student's t density (with $m-1$ degrees of freedom) in the two tails beyond $\pm|t|$
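import numpy as np
import scipy.stats as st

m = 100  # sample size (illustrative value; the slide leaves m undefined)
x = np.random.randn(m)

# compute t statistic and p value
xbar = np.mean(x)
s2 = np.sum((x - xbar)**2)/(m-1)   # unbiased sample variance
std_err = np.sqrt(s2/m)            # standard error of the mean
t = xbar/std_err
t_dist = st.t(m-1)                 # Student's t with m-1 degrees of freedom
p = 2*t_dist.cdf(-np.abs(t))       # two-sided p value

# with scipy alone
t, p = st.ttest_1samp(x, 0)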
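One way to build intuition for the p value is to repeat the test many times on data for which the null hypothesis is true. The sketch below is a simulation under assumed settings (30 points per experiment, 5,000 experiments, a fixed seed), none of which come from the slides. Roughly 5% of the tests should report p < 0.05, since p values are uniformly distributed when the null hypothesis holds.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(1)   # fixed seed, chosen only for reproducibility
m, trials = 30, 5000             # illustrative sizes, not from the slides

# Each row is one experiment drawn with true mean 0, so the null hypothesis holds
samples = rng.standard_normal((trials, m))

# One-sample t test per row; axis=1 runs the test along each experiment
t, p = st.ttest_1samp(samples, 0, axis=1)

# Fraction of experiments that would be declared "significant" at the 0.05 level
false_positive_rate = np.mean(p < 0.05)
print(false_positive_rate)
```

The rejection rate stays near the chosen threshold regardless of the sample size, because the threshold is exactly the false positive rate the test is designed to have.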
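# simple confidence interval computation: returns the half-width of the (1-a)
# interval around the sample mean (s: sample std. dev., m: sample size, a: significance level)
CI = lambda s, m, a: s / np.sqrt(m) * st.t(m-1).ppf(1-a/2)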
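A usage sketch for the confidence interval computation above: it wraps the same half-width formula in a function that returns the interval endpoints, then cross-checks them against `scipy.stats.t.interval`. The helper name `mean_confidence_interval`, the simulated data, and the seed are hypothetical choices made for this example.

```python
import numpy as np
import scipy.stats as st

def mean_confidence_interval(x, a=0.05):
    """Return (lower, upper) bounds of the (1 - a) t-based interval for the mean."""
    x = np.asarray(x, dtype=float)
    m = x.size
    xbar = x.mean()
    s = x.std(ddof=1)   # sample standard deviation (m - 1 denominator)
    half_width = s / np.sqrt(m) * st.t(m - 1).ppf(1 - a / 2)
    return xbar - half_width, xbar + half_width

rng = np.random.default_rng(2)                 # fixed seed for reproducibility
x = rng.normal(loc=1.0, scale=2.0, size=40)    # illustrative data
lower, upper = mean_confidence_interval(x)

# Cross-check with SciPy: same t distribution, centered at the sample mean and
# scaled by the standard error of the mean
ref_lower, ref_upper = st.t.interval(0.95, x.size - 1, loc=x.mean(), scale=st.sem(x))
print(lower, upper)
```

The interval narrows as `m` grows, both through the `1/sqrt(m)` factor and because the t quantile approaches the Gaussian one.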
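Two groups of samples: $x_1^{(1)}, \ldots, x_1^{(m_1)}$ and $x_2^{(1)}, \ldots, x_2^{(m_2)}$
Compute the sample mean and variance $\bar{x}_1, s_1^2$ and $\bar{x}_2, s_2^2$ for each group
Test statistic: $t = \frac{\bar{x}_1 - \bar{x}_2}{\left(s_1^2/m_1 + s_2^2/m_2\right)^{1/2}}$
Degrees of freedom: $\nu = \frac{\left(s_1^2/m_1 + s_2^2/m_2\right)^2}{\frac{\left(s_1^2/m_1\right)^2}{m_1 - 1} + \frac{\left(s_2^2/m_2\right)^2}{m_2 - 1}}$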
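A two-sample comparison that keeps a separate variance for each group, as in the expressions above, is usually called Welch's t-test. The following is a minimal sketch under that assumption: the helper name `welch_t_test`, the simulated groups, and the seed are hypothetical, and the result is compared with `scipy.stats.ttest_ind(..., equal_var=False)` as a sanity check.

```python
import numpy as np
import scipy.stats as st

def welch_t_test(x1, x2):
    """Two-sided Welch t test; returns (t statistic, degrees of freedom, p value)."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    m1, m2 = x1.size, x2.size
    # Per-group variance of the sample mean: s^2 / m
    v1 = x1.var(ddof=1) / m1
    v2 = x2.var(ddof=1) / m2
    t = (x1.mean() - x2.mean()) / np.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    nu = (v1 + v2) ** 2 / (v1 ** 2 / (m1 - 1) + v2 ** 2 / (m2 - 1))
    p = 2 * st.t(nu).cdf(-abs(t))
    return t, nu, p

rng = np.random.default_rng(3)             # fixed seed for reproducibility
x1 = rng.normal(0.0, 1.0, size=30)         # illustrative group 1
x2 = rng.normal(0.5, 2.0, size=45)         # illustrative group 2, different variance
t, nu, p = welch_t_test(x1, x2)

# SciPy's built-in Welch test for comparison
t_ref, p_ref = st.ttest_ind(x1, x2, equal_var=False)
print(t, nu, p)
```

The degrees of freedom `nu` are generally not an integer; `scipy.stats.t` accepts fractional values, so no rounding is needed.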
Histogram of p values from ~3,500 published journal papers (from E. J. Masicampo and Daniel Lalande, A peculiar prevalence of p values just below .05, 2012)