Lecture 24: Autoencoders, ICA
Aykut Erdem
December 2017, Hacettepe University

Last time: Dimensionality Reduction; Clustering (one way to summarize a complex real-valued data point with a single categorical variable)
Applications of PCA: Face Recognition, Image Compression, Noise Filtering
[Figure: images $x$, their projections onto the basis $U$, and the reconstructions $x'$.]
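As a concrete illustration of the compression idea, here is a minimal NumPy sketch (the random data and 12x12 patch size are assumptions for illustration, not from the slides): project an image onto the top-$k$ principal directions, then reconstruct.

```python
# Minimal sketch of PCA compression/reconstruction (illustrative data).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 144))          # e.g. 500 flattened 12x12 patches
X_mean = X.mean(axis=0)
Xc = X - X_mean

# Principal directions = top right-singular vectors of the centered data.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 16
U = Vt[:k].T                             # (144, k) basis of the top-k subspace

z = (X[0] - X_mean) @ U                  # compress: k numbers instead of 144
x_rec = X_mean + z @ U.T                 # reconstruct x' from the projection
print("reconstruction error:", np.linalg.norm(X[0] - x_rec))
```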
PCA doesn't know labels!
slide by Barnabás Póczos and Aarti Singh
Fisher Linear Discriminant
slide by Javier Hernandez Rivera
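The slide's figure is not reproduced here; as a reminder, the standard Fisher Linear Discriminant chooses the projection $w \propto S_W^{-1}(\mu_1 - \mu_0)$, where $S_W$ is the within-class scatter. A minimal sketch (synthetic two-class data is an assumption for illustration):

```python
# Fisher Linear Discriminant: w = Sw^-1 (mu1 - mu0), a label-aware projection.
import numpy as np

rng = np.random.default_rng(0)
X0 = rng.normal(loc=[0, 0], scale=1.0, size=(100, 2))   # class 0
X1 = rng.normal(loc=[2, 1], scale=1.0, size=(100, 2))   # class 1

mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
w = np.linalg.solve(Sw, mu1 - mu0)   # direction maximizing class separation

# Unlike PCA, this projection uses the labels.
print("projected class means:", (X0 @ w).mean(), (X1 @ w).mean())
```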
Autoencoder: $z = f(Wx), \quad \hat{x} = g(Vz)$
slide by Sanja Fidler
Training objective: $\min_{W,V} \; \frac{1}{2N} \sum_{n=1}^{N} \lVert x^{(n)} - \hat{x}^{(n)} \rVert^2$
slide by Sanja Fidler
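A minimal sketch of this setup in NumPy (the sigmoid encoder, linear decoder, dimensions, and toy data are illustrative assumptions; only the formulas $z = f(Wx)$, $\hat{x} = g(Vz)$ and the mean-squared objective come from the slides):

```python
# Autoencoder forward pass and reconstruction loss, mirroring the formulas.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def autoencode(x, W, V):
    z = sigmoid(W @ x)        # encoder: z = f(Wx)
    x_hat = V @ z             # decoder (linear g here): x_hat = g(Vz)
    return z, x_hat

def reconstruction_loss(X, W, V):
    # (1 / 2N) * sum_n ||x^(n) - x_hat^(n)||^2
    N = X.shape[0]
    return sum(np.sum((x - autoencode(x, W, V)[1]) ** 2) for x in X) / (2 * N)

rng = np.random.default_rng(0)
D, K = 20, 5                              # input dim, bottleneck dim (K < D)
W = rng.normal(scale=0.1, size=(K, D))    # encoder weights
V = rng.normal(scale=0.1, size=(D, K))    # decoder weights
X = rng.normal(size=(100, D))             # toy data
print(reconstruction_loss(X, W, V))
```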
With a linear encoder and decoder ($f$ and $g$ the identity), the same objective becomes $\min_{W,V} \; \frac{1}{2N} \sum_{n=1}^{N} \lVert x^{(n)} - VW x^{(n)} \rVert^2$
slide by Sanja Fidler
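In the linear case the optimal rank-$K$ map $VW$ is the projector onto the top-$K$ principal subspace, so a linear autoencoder matches PCA's reconstruction error. A sketch of this known result (the closed-form choice $W = U^T$, $V = U$ below is one optimum; gradient training would reach an equivalent subspace):

```python
# Linear autoencoder optimum = PCA projector onto the top-K subspace.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 10))  # correlated data
Xc = X - X.mean(axis=0)

K = 3
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
U = Vt[:K].T                              # top-K principal directions

W = U.T                                   # encoder weights (one optimal choice)
V = U                                     # decoder weights
err = np.mean(np.sum((Xc - Xc @ W.T @ V.T) ** 2, axis=1)) / 2
print("linear-autoencoder / PCA reconstruction error:", err)
```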
[Figure: reconstruction comparison. Rows: real data; 30-d deep autoencoder; 30-d logistic PCA; 30-d PCA.]
slide by Sanja Fidler
slide by Kornel Laskowski and Dave Touretzky
PCA gives the most faithful representation in a reconstruction-error sense (recall that we trained our autoencoder network using a mean-squared error on the input reconstruction). But minimizing squared error implicitly assumes Gaussianity, since it penalizes datapoints close to the mean less than those that are far away.
slide by Kornel Laskowski and Dave Touretzky
ICA instead seeks components which are statistically independent, rather than just uncorrelated: independence requires $\langle g_1(\xi_i)\, g_2(\xi_j) \rangle = \langle g_1(\xi_i) \rangle \langle g_2(\xi_j) \rangle$ for any functions $g_1$ and $g_2$.
slide by Kornel Laskowski and Dave Touretzky
Independence: $p(\xi_1, \xi_2, \cdots, \xi_N) = \prod_{i=1}^{N} p(\xi_i)$
Uncorrelatedness: $\langle \xi_i \xi_j \rangle - \langle \xi_i \rangle \langle \xi_j \rangle = 0, \quad i \neq j$
Independence implies: $\langle g_1(\xi_i)\, g_2(\xi_j) \rangle - \langle g_1(\xi_i) \rangle \langle g_2(\xi_j) \rangle = 0, \quad i \neq j$
ICA imposes the stronger requirement of independence, rather than mere uncorrelatedness. No closed-form solution (as there is for PCA) exists, so ICA is implemented using neural network models, with an objective function to descend/climb in. The independent components are directions in $N$-dimensional space; they need not be orthogonal. When does uncorrelatedness already give independent (principal) components? When the generative distribution is uniquely determined by its first and second moments, and this is true of only the Gaussian distribution.
slide by Kornel Laskowski and Dave Touretzky
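To see numerically that uncorrelated does not imply independent, here is a small check (the choice $\xi_2 = \xi_1^2$ is an illustrative assumption): the covariance is near zero, yet a nonlinear test function exposes strong dependence.

```python
# Uncorrelated but dependent: xi2 = xi1^2 with symmetric xi1.
import numpy as np

rng = np.random.default_rng(0)
xi1 = rng.normal(size=100_000)
xi2 = xi1 ** 2

cov = np.mean(xi1 * xi2) - np.mean(xi1) * np.mean(xi2)
test = np.mean(xi1**2 * xi2**2) - np.mean(xi1**2) * np.mean(xi2**2)
print(f"<xi1 xi2> - <xi1><xi2>           = {cov:+.3f}  (~0: uncorrelated)")
print(f"<g1 g2> - <g1><g2>, g1=g2=x^2    = {test:+.3f}  (!=0: dependent)")
```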
$\bar{y} = \dfrac{1}{1 + e^{-W^T \bar{\xi}}}$
The objective is information-theoretic: we're trying to maximize the enclosed area representing information quantities in the entropy diagram.
slide by Kornel Laskowski and Dave Touretzky
$H(p)$ = entropy of distribution $p$ of the first neuron's output
$H(p|q)$ = conditional entropy of $p$ given $q$
$I(p;q) = H(p) - H(p|q) = H(q) - H(q|p)$ = mutual information
slide by Kornel Laskowski and Dave Touretzky
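For reference, a sketch of one gradient-ascent step in the style of Bell and Sejnowski's infomax rule for this sigmoid network; the update form $\Delta W \propto (W^T)^{-1} + (1 - 2\bar{y})\,\bar{\xi}^T$ is from the standard infomax derivation, not from these slides.

```python
# One infomax-style update for a square sigmoid network (Bell-Sejnowski form).
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def infomax_step(W, xi, lr=0.01):
    # W must be square (as many outputs as inputs) for the inverse to exist.
    y = sigmoid(W @ xi)
    grad = np.linalg.inv(W.T) + np.outer(1.0 - 2.0 * y, xi)
    return W + lr * grad
```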
slide by Barnabás Póczos and Aarti Singh
ICA Estimation
[Figure: independent sources pass through a mixing process to produce the observations.]
Paris Smaragdis
http://paris.cs.illinois.edu/demos/index.html
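A minimal sketch of ICA estimation on synthetic mixtures, using scikit-learn's FastICA (a different estimator than the infomax network above; the signals and mixing matrix are made up):

```python
# Mix two known sources, then recover them with FastICA.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
S = np.column_stack([np.sin(2 * t),                  # source 1: sinusoid
                     np.sign(np.sin(3 * t))])        # source 2: square wave
A = np.array([[1.0, 0.5],
              [0.4, 1.0]])                           # unknown mixing matrix
X = S @ A.T                                          # observations (mixtures)

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)   # recovered sources, up to order/scale/sign
```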