Machine Learning
Fall 2017
Professor Liang Huang
Unsupervised Learning
(Clustering: k-means, EM, mixture models)
(Chaps. 15-16 of CIML)
Roadmap
CIML Chaps. 3, 4, 5, 7, 11, 17, 18 so far:
- (large-margin) supervised learning
- online learning: …, SGD (Pegasos)
- Viterbi
[Figure: a binary classification example (y = −1 vs. y = +1) and a POS-tagging example, "the man bit the dog" → DT NN VBD DT NN; chapter groups labeled CIML Chaps. 3, 4, 5, 7, 11, 17, 18; CIML Chaps. 15, 16; and CIML Chaps. 1, 9, 10, 13]
CIML table of contents:
1. Decision Trees
2. Limits of Learning
3. Geometry and Nearest Neighbors
4. The Perceptron
5. Practical Issues
6. Beyond Binary Classification
7. Linear Models
8. Bias and Fairness
9. Probabilistic Modeling
10. Neural Networks
11. Kernel Methods
12. Learning Theory
13. Ensemble Methods
14. Efficient Learning
15. Unsupervised Learning
16. Expectation Maximization
17. Structured Prediction
18. Imitation Learning

Schedule so far: weeks 1, 2, 3–4, 5, 5b, 7–8a; next: weeks 8b–9a.

Extra topics covered: MIRA, aggressive MIRA, convex programming, quadratic programming, Pegasos, dual Pegasos, structured Pegasos. In retrospect: should have started with k-NN, and should have covered logistic regression (important in deep learning).
[Figure: k-means iterations on two-dimensional data, panels (a)–(i): initial centers, then alternating assignment and update steps until convergence; each panel's axes run from −2 to 2]
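The alternation in these panels is the whole algorithm: assign each point to its nearest center, then move each center to the mean of its points. A minimal numpy sketch (variable names and the random initialization are my own, not from the slides):

```python
import numpy as np

def kmeans(X, k, n_iters=20, seed=0):
    """Plain k-means on an (n, d) data matrix X.

    Alternates a hard assignment step with a centroid update step,
    and returns the final centers, assignments, and distortion J.
    """
    rng = np.random.default_rng(seed)
    # initialize centers at k distinct data points
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iters):
        # assignment step: nearest center for each point
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        z = d2.argmin(axis=1)
        # update step: mean of each cluster (keep old center if empty)
        for j in range(k):
            if (z == j).any():
                centers[j] = X[z == j].mean(axis=0)
    # final assignments and distortion objective J
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    z = d2.argmin(axis=1)
    J = ((X - centers[z]) ** 2).sum()
    return centers, z, J
```

On well-separated data this recovers the clusters in a handful of iterations; in general it only reaches a local minimum of J, so restarts with different seeds are common.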
[Figure: the k-means objective J plotted after each assignment step and each update step, iterations 1–4; y-axis ticks at 500 and 1000; J decreases monotonically]
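The monotone decrease of J is guaranteed: the assignment step minimizes J over the assignments with the centers fixed, and the update step minimizes J over the centers with the assignments fixed. A quick numerical check (my own helper, not from the slides):

```python
import numpy as np

def kmeans_trace(X, k, n_iters=10, seed=0):
    """Run k-means and record the distortion J after every half-step
    (after each assignment step and after each update step)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    trace = []
    for _ in range(n_iters):
        # assignment step: J can only drop with centers held fixed
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        z = d2.argmin(axis=1)
        trace.append(d2[np.arange(len(X)), z].sum())
        # update step: J can only drop with assignments held fixed
        for j in range(k):
            if (z == j).any():
                centers[j] = X[z == j].mean(axis=0)
        trace.append(((X - centers[z]) ** 2).sum())
    return trace
```

Every recorded value is less than or equal to the previous one, which is exactly the staircase shape of the plot.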
[Figure: the final k-means assignment, panel (i); and a discrete distribution p(x) over three outcomes with probabilities 0.5, 0.3, 0.2, panels (a) and (b), axes ticked at 0.5 and 1]
[Figure: EM for a mixture of Gaussians, panels (a)–(f): the initial configuration, the "fractional assignments" after the first E step, and the fit after L = 1, 2, 5, and 20 iterations; each panel's axes run from −2 to 2]
L = log P(data) = log ∏_j P(x_j) = ∑_j log P(x_j) = ∑_j log ∑_i P(c_i) P(x_j | c_i)
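Because the sum over components sits inside the log, this objective has no closed-form maximizer; EM alternates an E step (compute the posteriors P(c_i | x_j), the "fractional assignments") with an M step (re-estimate priors, means, and variances from those soft counts). A 1-D sketch in numpy (the quantile initialization and all names are my own, not from the slides):

```python
import numpy as np

def em_gmm_1d(x, k, n_iters=50):
    """EM for a 1-D Gaussian mixture. Returns priors, means, variances,
    and the log-likelihood L = sum_j log sum_i P(c_i) P(x_j | c_i)."""
    pi = np.full(k, 1.0 / k)                       # P(c_i)
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)  # spread initial means
    var = np.full(k, x.var())
    for _ in range(n_iters):
        # E step: fractional assignments gamma[j, i] = P(c_i | x_j)
        p = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) \
            / np.sqrt(2 * np.pi * var)
        gamma = p / p.sum(axis=1, keepdims=True)
        # M step: re-estimate parameters from the soft counts
        nk = gamma.sum(axis=0)
        pi = nk / len(x)
        mu = (gamma * x[:, None]).sum(axis=0) / nk
        var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    # log-likelihood under the final parameters
    p = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) \
        / np.sqrt(2 * np.pi * var)
    L = np.log(p.sum(axis=1)).sum()
    return pi, mu, var, L
```

With hard (0/1) assignments in place of gamma and a fixed shared variance, the same loop reduces to k-means.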
[Figure: the final k-means hard assignments, panel (i), side by side with the mixture-of-Gaussians fit after L = 20 EM iterations]
CS 562 - EM
EM optimizes an auxiliary function: a lower bound L(q, θ) on the log-likelihood ln p(X | θ). It converges to a local maximum of the likelihood.
The gap between the log-likelihood and the bound is a KL-divergence: ln p(X | θ) − L(q, θ) = D_KL(q ‖ p(Z | X, θ)).
[Figure: one EM update moves the parameters from θold to θnew; the bound L(q, θ) touches ln p(X | θ) at θold]
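Written out, the decomposition behind this picture is (standard EM derivation; q is any distribution over the latent assignments Z):

```latex
\ln p(X \mid \theta) \;=\; \mathcal{L}(q, \theta)
  \;+\; \mathrm{KL}\!\left(q \,\|\, p(Z \mid X, \theta)\right),
\qquad \text{where}
\mathcal{L}(q, \theta) = \sum_{Z} q(Z) \ln \frac{p(X, Z \mid \theta)}{q(Z)},
\qquad
\mathrm{KL}(q \,\|\, p) = -\sum_{Z} q(Z) \ln \frac{p(Z \mid X, \theta)}{q(Z)} \;\ge\; 0 .
```

Since the KL term is nonnegative, L(q, θ) is a lower bound on ln p(X | θ). The E step sets q(Z) = p(Z | X, θold), which makes the bound tight at θold; the M step maximizes L(q, θ) over θ, yielding θnew. Each full iteration therefore cannot decrease ln p(X | θ).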
[Figure: two-dimensional embedding of hand images; the recovered coordinates are wrist rotation and fingers extension]
from the Isomap paper (Tenenbaum, de Silva, Langford, Science 2000)
All of these are spectral methods, i.e., they work with eigenvalues and eigenvectors.
PCA: the variance under projection is maximal; equivalently, the projection yields the minimum-MSE reconstruction ("minimum reconstruction error").
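Both characterizations fall out of the eigendecomposition of the covariance matrix. A minimal sketch (numpy only; names are my own, not from the slides):

```python
import numpy as np

def pca(X, n_components):
    """PCA via eigendecomposition of the covariance matrix of X (n, d)."""
    mean = X.mean(axis=0)
    Xc = X - mean                           # center the data
    C = Xc.T @ Xc / len(X)                  # covariance matrix
    vals, vecs = np.linalg.eigh(C)          # eigh: ascending eigenvalues
    order = np.argsort(vals)[::-1][:n_components]
    W = vecs[:, order]                      # top principal directions
    Z = Xc @ W                              # max-variance projected coords
    X_rec = Z @ W.T + mean                  # minimum-MSE reconstruction
    return W, Z, X_rec
```

The top eigenvector is the direction of maximal projected variance, and reconstructing from the top components is the best rank-k linear reconstruction in mean squared error; those are the two equivalent views above.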
[Figures: embeddings of several example datasets, alternating results from LLE or Isomap with results from PCA on the same data]
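As a concrete example of one of these spectral methods, here is a minimal LLE sketch (numpy only; the neighbor count and regularization constant are my own choices): reconstruct each point from its nearest neighbors, then find low-dimensional coordinates preserved under the same weights via the bottom eigenvectors of (I − W)ᵀ(I − W).

```python
import numpy as np

def lle(X, n_neighbors=10, n_components=2, reg=1e-3):
    """Locally Linear Embedding of an (n, d) data matrix X."""
    n = len(X)
    # pairwise squared distances; nearest neighbors excluding the point itself
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(d2, axis=1)[:, 1:n_neighbors + 1]
    # step 1: reconstruction weights for each point from its neighbors
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[idx[i]] - X[i]                 # neighbors centered on x_i
        C = Z @ Z.T                          # local Gram matrix
        C += reg * np.trace(C) * np.eye(n_neighbors)  # regularize
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, idx[i]] = w / w.sum()           # weights sum to 1
    # step 2: embedding from the bottom eigenvectors of (I - W)^T (I - W)
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    vals, vecs = np.linalg.eigh(M)           # ascending eigenvalues
    return vecs[:, 1:n_components + 1]       # skip the constant eigenvector
```

Unlike PCA, the embedding is nonlinear in X: only the local neighborhood geometry is preserved, which is why LLE can "unroll" the curved datasets in these figures where PCA cannot.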