CS109A Introduction to Data Science
Pavlos Protopapas and Kevin Rader
Advanced Section #1: Linear Algebra and Hypothesis Testing
Will Claybaugh
WARNING: This deck uses animations to focus attention and break apart complex ...
CS109A, PROTOPAPAS, RADER
same amount
[Figure: matrix multiplication worked example. A 5 by 3 matrix times a 3 by 2 matrix gives a 5 by 2 matrix. Each entry of the product is a dot product, e.g. (2,2,1) · (1,7,−2) = 14 and (1,5,2) · (3,−2,4) = 1.]
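A minimal NumPy sketch of the shape rule and the dot-product view of matrix multiplication. Only the two dot products (2,2,1) · (1,7,−2) and (1,5,2) · (3,−2,4) come from the slide; the matrices `A` and `B` below are hypothetical placeholders built around those vectors.

```python
import numpy as np

# Hypothetical 5x3 matrix; rows 0 and 1 reuse vectors from the slide,
# the remaining rows are made up for illustration.
A = np.array([
    [2, 2, 1],
    [1, 5, 2],
    [0, 1, 0],
    [3, 0, 1],
    [1, 1, 1],
])

# Hypothetical 3x2 matrix whose columns are (1, 7, -2) and (3, -2, 4).
B = np.array([
    [1, 3],
    [7, -2],
    [-2, 4],
])

C = A @ B  # (5 by 3) times (3 by 2) gives (5 by 2)
print(C.shape)

# Entry (i, j) of the product is the dot product of row i of A with column j of B.
entry_00 = np.dot(A[0], B[:, 0])  # (2,2,1) . (1,7,-2)
entry_11 = np.dot(A[1], B[:, 1])  # (1,5,2) . (3,-2,4)
print(entry_00, C[0, 0])
print(entry_11, C[1, 1])
```

The inner dimensions (3 and 3) must match; the outer dimensions (5 and 2) give the shape of the result.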
[Figures: worked numeric examples built from sums (+) and products (·) of vector and matrix entries.]
function in the infinite basis $1, x, x^2, x^3, \ldots$
functions in a basis built on sines and cosines
yet another basis
$\sin(x) = (0, 1, 0, -\tfrac{1}{6}, 0, \tfrac{1}{120}, \ldots)$
but we only have x itself
to approximate
Taylor approximations to y=sin(x)
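As a rough illustration of reading sin(x) as a coordinate vector in the basis 1, x, x^2, x^3, ..., the sketch below builds the first few Taylor coefficients and checks how the truncated sums approach `math.sin`. The helper names `taylor_sin_coeffs` and `evaluate`, and the test point x = 0.5, are illustrative choices and not part of the deck.

```python
import math

def taylor_sin_coeffs(n_terms):
    """Coordinates of sin(x) in the basis 1, x, x^2, ..., truncated to n_terms."""
    coeffs = []
    for k in range(n_terms):
        if k % 2 == 0:
            coeffs.append(0.0)  # even powers do not appear in sin(x)
        else:
            sign = -1.0 if (k // 2) % 2 else 1.0  # signs alternate: +, -, +, ...
            coeffs.append(sign / math.factorial(k))
    return coeffs

def evaluate(coeffs, x):
    """Evaluate the polynomial with the given coefficients at x."""
    return sum(c * x**k for k, c in enumerate(coeffs))

coeffs = taylor_sin_coeffs(6)  # coordinates for 1, x, x^2, x^3, x^4, x^5
x = 0.5
for n in (2, 4, 6):
    approx = evaluate(coeffs[:n], x)
    print(n, approx, abs(approx - math.sin(x)))
```

Keeping more coordinates gives a closer approximation near 0; with only the x coordinate, the approximation is the straight line y = x.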
[Figure: a vector $x$ and its transpose $x^T$; a matrix $A$ and its transpose $A^T$.]
How do we write (-2,1) in this basis? Just multiply $A^{-1}$ by (-2,1)
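A small sketch of that change of basis, assuming the basis vectors are stored as the columns of A. The particular basis below is hypothetical, since the slide's matrix did not survive in the text; only the vector (-2,1) is taken from the slide.

```python
import numpy as np

# Hypothetical basis: the columns of A are the basis vectors (not from the slide).
A = np.array([[1.0, 1.0],
              [0.0, 2.0]])
v = np.array([-2.0, 1.0])  # the vector (-2, 1) in standard coordinates

coords = np.linalg.inv(A) @ v         # literal reading of "multiply A^{-1} by (-2, 1)"
coords_solve = np.linalg.solve(A, v)  # same answer without forming the inverse

# Rebuilding v from its coordinates: a weighted sum of the basis vectors.
rebuilt = coords[0] * A[:, 0] + coords[1] * A[:, 1]
print(coords, rebuilt)
```

In practice `np.linalg.solve` is usually preferred over an explicit inverse, but both express the same idea: the new coordinates are the weights on the basis vectors that reproduce the original vector.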
directions
eigenvector; Here, (-2,5) is an eigenvector so (-4,10) is too
[Figure panels: "Original vectors" and "After multiplying by 2x2 matrix A".]
entirely)
matrix applies a rotation)
repetition (the matrix may scale some n-dimensional subspace)
missing (shears)
eigenvalues
multiplying by the matrix A is the same as just scaling by $\lambda$ (x is then an eigenvector matching eigenvalue $\lambda$)
diagonal of A
this produces a polynomial in $\lambda$ which can be solved to find eigenvalues
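The eigenvector ideas above can be checked numerically. The 2x2 matrix below is a hypothetical example, chosen only so that (-2,5) is an eigenvector (with eigenvalue 3); it is not the matrix from the slides.

```python
import numpy as np

# Hypothetical 2x2 matrix, chosen so that (-2, 5) is an eigenvector (eigenvalue 3).
A = np.array([[3.0, 0.0],
              [-5.0, 1.0]])

x = np.array([-2.0, 5.0])
print(A @ x, 3 * x)    # multiplying by A only rescales x

x2 = 2 * x             # (-4, 10): any rescaling of an eigenvector is also one
print(A @ x2, 3 * x2)

# Eigenvalues are the roots of det(A - lambda * I) = 0.
eigenvalues, eigenvectors = np.linalg.eig(A)
for lam in eigenvalues:
    print(lam, np.linalg.det(A - lam * np.eye(2)))
```

For this matrix the polynomial is (3 - λ)(1 - λ), so the eigenvalues are 3 and 1, and the determinant printed for each is zero up to rounding.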
Vector and Matrix dot product
Invertibility: $Ax = b$; $x = A^{-1}b$
Basis as a coordinate system for a space
Span
Other decompositions: M = UP or M = PU; $M = U\Sigma V^T$
Eigenvalues: $Ax = \lambda x$; $S = QDQ^{-1}$
columns
AFTER A BREAK
$\text{response} = \beta_1 \text{feature}_1 + \beta_2 \text{feature}_2 + \beta_3 \text{feature}_3 + \ldots$
$y = X\beta$
1. Drop the sqrt [why is that legal?]
2. Distribute the transpose
3. Distribute/FOIL all terms
4. Take the derivative with respect to $\beta$ (Matrix Cookbook (69) and (81): derivative of $\beta^T a$ is $a^T$, …)
5. Simplify and solve for $\beta$
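One way the five steps can play out, sketched under two assumptions that are not stated in the surviving text: the objective is the Euclidean length of $y - X\beta$, and $X^T X$ is invertible. The row-vector convention for derivatives matches the Matrix Cookbook rule quoted in step 4.

```latex
\begin{aligned}
\hat{\beta}
  &= \arg\min_{\beta} \sqrt{(y - X\beta)^T (y - X\beta)}
   = \arg\min_{\beta}\, (y - X\beta)^T (y - X\beta)
  && \text{(1) sqrt is increasing, so the minimizer is unchanged} \\
(y - X\beta)^T (y - X\beta)
  &= (y^T - \beta^T X^T)(y - X\beta)
  && \text{(2) distribute the transpose} \\
  &= y^T y - 2\beta^T X^T y + \beta^T X^T X \beta
  && \text{(3) } y^T X \beta = \beta^T X^T y \text{ is a scalar} \\
\frac{\partial}{\partial \beta}\left[ y^T y - 2\beta^T X^T y + \beta^T X^T X \beta \right]
  &= -2 y^T X + 2 \beta^T X^T X
  && \text{(4) derivative with respect to } \beta \\
0 = -2 y^T X + 2 \hat{\beta}^T X^T X
  &\;\Longrightarrow\; X^T X \hat{\beta} = X^T y
   \;\Longrightarrow\; \hat{\beta} = (X^T X)^{-1} X^T y
  && \text{(5) set to zero and solve}
\end{aligned}
```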
vectors and y?
features?
actual y vector
$\hat{y} = X\hat{\beta} = X (X^T X)^{-1} X^T y$
[Figure labels: "Observed response values"; "Best we can do with a linear combination".]
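A minimal sketch of the fitted values as a projection of y onto the columns of X. The data are simulated, and every number below (sample size, coefficients, noise level) is a hypothetical choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 50 observations, an intercept column plus 2 features.
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)

# beta_hat = (X^T X)^{-1} X^T y, computed by solving the normal equations.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat

# The residual is orthogonal to every column of X, which is what makes
# y_hat the closest point to y among all linear combinations of the columns.
residual = y - y_hat
print(beta_hat)
print(X.T @ residual)
```

The second printed vector is zero up to floating-point rounding: nothing that remains in the residual can be explained by any column of X.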
model format [y=Xβ]
from
different day, might we get that result instead?”
about where the data come from
[Figure: loss plotted against $\beta_1$ and $\beta_2$, with the optimal betas marked; labels: "Statistics", "Machine Learning".]
$y \sim N(X\beta, \sigma^2 I_N)$
Image from: http://bolt.mph.ufl.edu/6050-6052/unit-4b/module-15/
$\mu_i = x_i \beta$
$\sigma^2$: unknown constant
$\beta$: vector of unknown constants
$y_i \sim N(\mu_i, \sigma^2)$
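To make the data-generating story concrete, the sketch below draws one dataset from the model: each observation gets its own mean (row of X times β) and shares one noise variance. The values of `beta`, `sigma`, and the design matrix are hypothetical; in the model they are unknown constants.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical values; in the model, beta and sigma are unknown constants.
beta = np.array([2.0, -1.0])
sigma = 0.5

n = 200
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=n)])

mu = X @ beta                        # one mean per observation: mu_i = x_i beta
y = rng.normal(loc=mu, scale=sigma)  # y_i ~ N(mu_i, sigma^2), independent draws

residuals = y - mu
print(residuals.mean(), residuals.std())
```

Rerunning with a different seed gives a different y from the same model, which is the sense in which a "different day" could have produced different data.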
[Equation partly unrecoverable; the surviving term is $(y - X\beta)^T (y - X\beta)$.]
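Assuming the expression above belongs to the Gaussian likelihood implied by the model $y \sim N(X\beta, \sigma^2 I_N)$ (an assumption, since only the quadratic term is legible), the standard form and its logarithm are:

```latex
\begin{aligned}
L(\beta, \sigma^2)
  &= (2\pi\sigma^2)^{-N/2}
     \exp\!\left( -\frac{1}{2\sigma^2} (y - X\beta)^T (y - X\beta) \right) \\
\log L(\beta, \sigma^2)
  &= -\frac{N}{2} \log(2\pi\sigma^2)
     - \frac{1}{2\sigma^2} (y - X\beta)^T (y - X\beta)
\end{aligned}
```

For any fixed $\sigma^2$, maximizing the log-likelihood over $\beta$ is the same as minimizing $(y - X\beta)^T (y - X\beta)$, so the maximum likelihood estimate of $\beta$ coincides with the least-squares solution.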
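[Figure: a model generates Dataset A, Dataset B, Dataset C, and Dataset D, each summarized by a statistic (Stat A to Stat D). A histogram shows frequency against the value of the statistic, with the observed dataset's statistic marked.]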
simulation is a little prettier
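[Figure: T-test for $\beta_3 = 0$ by simulation. $\beta_{MLE} = [2.2, 5, 3, 1.6]$; $\beta_{Null} = [2.2, 5, 0, 1.6]$. Inputs $X_{obs}$ and $\sigma_{MLE}$; simulated responses $y_{sim1}, y_{sim2}, y_{sim3}, \ldots, y_{sim10{,}000}$; simulated estimates $\beta_{sim1}, \beta_{sim2}, \beta_{sim3}, \ldots, \beta_{sim10{,}000}$.]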
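A rough sketch of this simulation. The two coefficient vectors and the count of 10,000 simulations come from the slide; the design matrix `X_obs` and noise level `sigma_mle` below are hypothetical stand-ins, and for brevity the statistic is the raw estimate of the third coefficient rather than a full t statistic (which would also divide by a standard error).

```python
import numpy as np

rng = np.random.default_rng(2)

beta_null = np.array([2.2, 5.0, 0.0, 1.6])  # from the slide: third coefficient set to 0
beta_mle = np.array([2.2, 5.0, 3.0, 1.6])   # from the slide: fitted coefficients

# Hypothetical stand-ins for the observed design matrix and fitted noise level.
n_obs = 100
X_obs = rng.normal(size=(n_obs, 4))
sigma_mle = 2.0

n_sims = 10_000
# Each column of y_sims is one simulated response vector drawn under the null model.
noise = rng.normal(scale=sigma_mle, size=(n_obs, n_sims))
y_sims = (X_obs @ beta_null)[:, None] + noise

# Refit least squares on every simulated dataset at once; column k holds beta_sim_k.
beta_sims = np.linalg.solve(X_obs.T @ X_obs, X_obs.T @ y_sims)

beta3_sims = beta_sims[2]  # simulated estimates of the third coefficient
print(beta3_sims.mean(), beta3_sims.std())

# Fraction of null simulations at least as far from 0 as the observed estimate.
print(np.mean(np.abs(beta3_sims) >= abs(beta_mle[2])))
```

Under the null model the simulated estimates cluster around 0; how often they reach the observed value of 3 is what the comparison step measures.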
Define Model → Get Simulated Datasets/Statistics → Compare to Observed Data → Decision
produce data that looks like ours?
‘weird’ should mean
Jargon: p values are “The probability, assuming the null model is exactly true, of seeing a value of [your statistic] as extreme or more extreme than what was seen in the
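[Figure: histogram of results from simulation (frequency on the vertical axis), titled "Distribution of Simulation Results", with the result from the observed dataset marked and the region "Simulations weirder than the observed data" highlighted.]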
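A short sketch of turning simulation results into a p-value. The simulated statistics and the observed value are hypothetical, and "weirder" is taken here to mean "at least as far from the center of the simulated distribution", which is one possible definition (a two-sided one) among several.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical inputs: statistics from 10,000 simulated null datasets
# and the statistic computed on the observed dataset.
sim_stats = rng.standard_normal(10_000)
obs_stat = 2.1

# Two-sided notion of "weird": at least as far from the center of the
# simulated distribution as the observed value is.
center = np.mean(sim_stats)
is_weirder = np.abs(sim_stats - center) >= abs(obs_stat - center)

# The p-value is the share of simulations that are as weird or weirder.
p_value = is_weirder.mean()
print(p_value)
```

A one-sided definition would instead count only simulations above (or only below) the observed statistic; the choice should be made before looking at the data.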
[Figure: "Dawn of Time" leading to "Our Data" through candidate models β = -.4, -.3, -.2, -.1, 0, .1, .2, .3.]
negative value of beta?
each of those models (but there are infinitely many…)
[Figure: axis of β values with β=MLE and β=0 marked. "Can we rule these out?"]
β=0 will be closer to matching the data (in terms of t statistic) than any other model in the set*; we only need to test β=0
* Non-trivial; true for Student's t but not for other measures
[Figure labels: β = -.2, β = -.4]
pass them
[Figure: "Dawn of Time" leading to "Our Data" through candidate models β = -.4, -.3, -.1, 0, .1, .2, .3.]
some other threshold) against a no-effect model, meaning we reject the no-effect model
[Figure: three "Dawn of Time" diagrams leading to "Our Data". First: candidate models β = -.2, -.1, 0, .1, .2. Second: "All other betas have their MLE values" versus "Other betas have different values". Third: "World is not linear", "World has non-Gaussian noise", and "World is linear w/ MLE Gaussian noise", the last with candidate models β = -.2, -.1, 0, .1, .2.]