Dimensionality Reduction & Embedding (part 2/2)
Tufts COMP 135: Introduction to Machine Learning https://www.cs.tufts.edu/comp/135/2019s/
Many ideas/slides attributable to: Emily Fox (UW), Erik Sudderth (UCI)
Prof. Mike Hughes
[Recap figure: taxonomy of Supervised Learning, Unsupervised Learning, and Reinforcement Learning. Unsupervised learning starts from data examples $\{x_n\}_{n=1}^N$ alone, each task with its own summary and performance measure.]
[Recap figure: within Unsupervised Learning, embedding maps each example to a low-dimensional point (axes $x_1$, $x_2$).]
Unit Objectives
[Figure: PCA applied to human genetic variation data. Nature, 2008]
Where possible, we based the geographic origin on the observed country data for grandparents. We used a 'strict consensus' approach: if all observed grandparents originated from a single country, we used that country as the origin; if the grandparents originated from different countries, we excluded the individual. Where grandparental data were unavailable, we used the individual's country of birth.

Total sample size after exclusion: 1,387 subjects. Features: over half a million variable DNA sites in the human genome. (Nature, 2008)
Review: eigenvalues / eigenvectors. Source: https://textbooks.math.gatech.edu/ila/eigenvectors.html
Goal: each feature’s mean = 0.0
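As a concrete illustration, here is a minimal numpy sketch of this centering step; the toy matrix X is an assumption for illustration:

```python
import numpy as np

# Toy data matrix X with N rows (examples) and F columns (features).
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

m = X.mean(axis=0)        # per-feature mean, shape (F,)
X_centered = X - m        # subtract the mean from every row

# Each feature's mean is now 0.0 (up to floating-point error).
print(X_centered.mean(axis=0))   # -> [0. 0.]
```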
PCA provides a “reconstruction” of a dataset: approximate each high-dimensional vector with a low-dimensional vector.
Simplest case: reconstruct every example with a single $F$-dim vector $m$, chosen to minimize total squared error:

$$\min_{m \in \mathbb{R}^F} \; \sum_{n=1}^{N} (x_n - m)^T (x_n - m), \qquad m^* = \operatorname{mean}(x_1, \ldots, x_N)$$
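A quick numerical check of this claim; the toy dataset and the random perturbations are assumptions for illustration:

```python
import numpy as np

# Verify: among candidate vectors m, the per-feature mean minimizes
# sum_n (x_n - m)^T (x_n - m).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))     # N=100 examples, F=5 features

def reconstruction_cost(m):
    diffs = X - m
    return np.sum(diffs * diffs)

m_star = X.mean(axis=0)
# Any perturbation away from the mean increases the cost.
for _ in range(5):
    m_other = m_star + rng.normal(scale=0.1, size=5)
    assert reconstruction_cost(m_star) <= reconstruction_cost(m_other)
```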
PCA reconstruction: high-dim data ($F$-vector) $\approx$ basis ($F \times K$ matrix) $\cdot$ low-dim code ($K$-vector) $+$ mean ($F$-vector):

$$x_n \approx W z_n + m$$
Training step: .fit()
Transformation step: .transform()
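A minimal scikit-learn sketch of both steps; the toy training matrix is an assumption, and `inverse_transform` maps codes back to reconstructions $W z + m$:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 10))    # toy data: N=200 examples, F=10

pca = PCA(n_components=2)
pca.fit(X_train)                    # training step: learns mean m and basis W

Z = pca.transform(X_train)          # transformation: (N, F) -> (N, K) codes
X_rec = pca.inverse_transform(Z)    # reconstruction: z -> W z + m, back to (N, F)

print(Z.shape, X_rec.shape)         # (200, 2) (200, 10)
```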
[Figure credit: Erik Sudderth]
Example: take K=50 components for reconstruction.
Total variance of centered data:

$$\mathrm{Var}(X) \;=\; \frac{1}{N} \sum_{n=1}^{N} \sum_{f=1}^{F} x_{nf}^2 \;=\; \frac{1}{N} \sum_{n=1}^{N} x_n^T x_n$$
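Sanity-checking this identity in numpy on centered toy data (the data itself is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
X = X - X.mean(axis=0)                       # center first

var_by_entries = np.mean(np.sum(X ** 2, axis=1))   # (1/N) sum_n sum_f x_nf^2
var_by_norms = np.mean([x @ x for x in X])         # (1/N) sum_n x_n^T x_n
assert np.isclose(var_by_entries, var_by_norms)
```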
Variance captured when each $x_n$ is replaced by its $K$-component reconstruction (cross terms vanish because the $w_k$ are orthonormal):

$$\frac{1}{N} \sum_{n=1}^{N} x_n^T x_n
= \frac{1}{N} \sum_{n=1}^{N} (z_{n1} w_1 + \cdots + z_{nK} w_K)^T (z_{n1} w_1 + \cdots + z_{nK} w_K)
= \frac{1}{N} \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk}^2
= \sum_{k=1}^{K} \lambda_k$$

Just sum up the top K eigenvalues!
Proportion of Variance Explained by first K components
$$\frac{\sum_{k=1}^{K} \lambda_k}{\sum_{f=1}^{F} \lambda_f}$$
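In scikit-learn, `explained_variance_` holds the eigenvalue estimates $\lambda_k$ and `explained_variance_ratio_` holds the fractions above (scikit-learn normalizes by $N-1$ rather than $N$, which leaves the ratios unchanged). A sketch on toy data:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8)) @ rng.normal(size=(8, 8))   # correlated toy data

pca = PCA().fit(X)
lam = pca.explained_variance_              # eigenvalues lambda_1 >= lambda_2 >= ...
ratio = pca.explained_variance_ratio_      # lambda_k / sum_f lambda_f

print(np.cumsum(ratio))                    # fraction explained by first K components
# One common heuristic: smallest K whose cumulative ratio reaches 0.9.
K = int(np.searchsorted(np.cumsum(ratio), 0.9)) + 1
print(K)
```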
PCA: pros and cons
Now add a noise term: high-dim data ($F$-vector) $=$ basis ($F \times K$) $\cdot$ low-dim code ($K$-vector) $+$ mean ($F$-vector) $+$ noise ($F$-vector), with

$$\epsilon_i \sim \mathcal{N}(0, I_F)$$
In terms of matrix math: $x_i = W z_i + m + \epsilon_i$
Same model with a learned noise scale: data $=$ basis $\cdot$ code $+$ mean $+$ noise, now with $\epsilon_i \sim \mathcal{N}(0, \sigma^2 I_F)$ (one shared variance $\sigma^2$ for all $F$ features).
$\epsilon_i \sim \mathcal{N}(0, \sigma^2 I_F)$: Is this noise model realistic?
Factor analysis: allow a separate noise variance for each feature,

$$\epsilon_i \sim \mathcal{N}\!\big(0, \operatorname{diag}(\sigma_1^2, \sigma_2^2, \sigma_3^2, \ldots)\big)$$

This gives per-feature estimation of variance; the learned basis need not be orthogonal.
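A sketch contrasting the two noise models in scikit-learn on toy heteroscedastic data (the data-generating choices are assumptions): `PCA.noise_variance_` is a single shared $\sigma^2$, while `FactorAnalysis.noise_variance_` is one $\sigma_f^2$ per feature.

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 2))
W = rng.normal(size=(2, 5))
# Noise whose scale differs per feature: factor analysis's setting.
noise = rng.normal(size=(500, 5)) * np.array([0.1, 0.5, 1.0, 0.2, 0.8])
X = Z @ W + noise

ppca = PCA(n_components=2).fit(X)
print(ppca.noise_variance_)          # a single scalar sigma^2

fa = FactorAnalysis(n_components=2).fit(X)
print(fa.noise_variance_)            # one sigma_f^2 per feature (length 5)
```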
Matrix factorization for recommendations: the predicted score $\hat{y}_{ai}$ approximates the utility $y_{ai}$ of item $i$ for user $a$. The same user vector is shared across all ratings from the same user; the same item vector is shared across all scores given to the same item.
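A minimal stochastic-gradient sketch of this factorization; the toy ratings, learning rate, and regularization strength are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, K = 4, 5, 2
# Sparse observed ratings: (user, item) -> score.
ratings = {(0, 1): 5.0, (0, 3): 3.0, (1, 1): 4.0,
           (2, 0): 1.0, (3, 2): 2.0, (3, 4): 5.0}

U = 0.1 * rng.normal(size=(n_users, K))   # one K-vector per user
V = 0.1 * rng.normal(size=(n_items, K))   # one K-vector per item

lr, reg = 0.05, 0.01
for epoch in range(500):
    for (a, i), y in ratings.items():
        err = y - U[a] @ V[i]                  # residual for this rating
        grad_u = err * V[i] - reg * U[a]
        grad_v = err * U[a] - reg * V[i]
        U[a] += lr * grad_u
        V[i] += lr * grad_v

print(U[0] @ V[1])   # prediction for (user 0, item 1); should be near 5.0
```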
[Figure: singular value decomposition $X = U \Sigma V^T$. Credit: Wikipedia]
[Figure: truncated decomposition keeping only the top $K$ components]
Keep the top $K$ eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_K$ and eigenvectors $w_1, w_2, \ldots, w_K$.
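A numpy sketch of building a rank-$K$ approximation from the SVD; the toy matrix is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
K = 2

U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_K = U[:, :K] @ np.diag(s[:K]) @ Vt[:K, :]   # best rank-K approximation

# For centered X, the right singular vectors are PCA's eigenvectors w_k,
# and the eigenvalues satisfy lambda_k = s_k^2 / N.
print(np.linalg.norm(X - X_K))
```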
INPUT: each image represented by a 784-dimensional vector
Apply PCA transformation with K=2
OUTPUT: each image is a 2-dimensional vector
Credit: Luuk Derksen (https://medium.com/@luckylwk/visualising-high-dimensional-datasets-using-pca-and-t-sne-in-python-8ef87e7915b)
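A sketch of this pipeline using scikit-learn's built-in 8x8 digits (64-dimensional images standing in for 784-dimensional MNIST):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()
X, y = digits.data, digits.target          # X has shape (1797, 64)

Z = PCA(n_components=2).fit_transform(X)   # each image -> 2-dim vector

plt.scatter(Z[:, 0], Z[:, 1], c=y, cmap='tab10', s=8)
plt.xlabel('PC 1')
plt.ylabel('PC 2')
plt.show()
```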
https://distill.pub/2016/misread-tsne/
PCA to ~30 dims, then apply t-SNE
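A sketch of that recipe with scikit-learn, again using the digits data as a stand-in:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X = load_digits().data                         # (1797, 64)

X_30 = PCA(n_components=30).fit_transform(X)   # PCA to ~30 dims first
Z = TSNE(n_components=2, perplexity=30).fit_transform(X_30)
print(Z.shape)                                 # (1797, 2)
```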
Goal: map each word in vocabulary to an embedding vector
vec(swimming) − vec(swim) + vec(walk) ≈ vec(walking)
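The arithmetic can be illustrated with made-up vectors; the 4-dimensional embeddings and the cosine-similarity lookup below are assumptions, not trained word2vec outputs:

```python
import numpy as np

vec = {
    'swim':     np.array([1.0, 0.0, 0.2, 0.1]),
    'swimming': np.array([1.0, 1.0, 0.2, 0.1]),   # swim + an "-ing" direction
    'walk':     np.array([0.0, 0.0, 0.9, 0.3]),
    'walking':  np.array([0.0, 1.0, 0.9, 0.3]),
    'hammer':   np.array([0.5, 0.0, 0.0, 0.9]),   # unrelated distractor
}

query = vec['swimming'] - vec['swim'] + vec['walk']

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Nearest word (excluding the analogy's inputs) should be 'walking'.
best = max((w for w in vec if w not in ('swimming', 'swim', 'walk')),
           key=lambda w: cosine(query, vec[w]))
print(best)   # -> 'walking'
```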
Training
Reward embeddings that predict nearby words in the sentence. [Figure: embedding vectors for words such as 'tacos', 'staff', 'dinosaur', 'hammer'; embedding dimension typically 100-1000]
Goal: learn the weight matrix W
Credit: https://www.tensorflow.org/tutorials/representation/word2vec
[Figure: one row of $W$ per word in a fixed vocabulary, typically 1,000-100k words]
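A minimal training sketch with the gensim library (assuming gensim >= 4.0; the three-sentence corpus is purely illustrative and far too small for real use):

```python
from gensim.models import Word2Vec

sentences = [
    ['we', 'ate', 'tacos', 'for', 'dinner'],
    ['the', 'staff', 'ate', 'dinner', 'together'],
    ['a', 'hammer', 'is', 'a', 'tool'],
]

# vector_size sets the embedding dimension; window sets how many
# nearby words each embedding is rewarded for predicting.
model = Word2Vec(sentences, vector_size=100, window=2, min_count=1)
print(model.wv['tacos'].shape)    # -> (100,) embedding vector
```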
Credits: Ivanov & Burnaev, ICML 2018; Choi et al., KDD 2016