Dimensionality Reduction
Jia-Bin Huang Virginia Tech
Spring 2019
ECE-5424G / CS-5824
Administrative
HW 3 due March 27. HW 4 out tonight.
J. Mark Sowers Distinguished Lecture: Michael Jordan, Pehong Chen Distinguished Professor, Department of Statistics and Electrical Engineering and Computer Sciences
autonomous driving
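$$\sum_i \log p(x^{(i)};\theta) = \sum_i \log \sum_{z^{(i)}} p(x^{(i)}, z^{(i)};\theta) = \sum_i \log \sum_{z^{(i)}} Q_i(z^{(i)}) \frac{p(x^{(i)}, z^{(i)};\theta)}{Q_i(z^{(i)})} \ge \sum_i \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)};\theta)}{Q_i(z^{(i)})}$$
Jensen's inequality: $f(E[X]) \ge E[f(X)]$ for concave $f$. Here $f = \log$ and $X = p(x^{(i)}, z^{(i)};\theta) / Q_i(z^{(i)})$ with $z^{(i)} \sim Q_i$, so the right-hand side is $E[f(X)]$.
The bound is tight when $X$ is constant:
$$\frac{p(x^{(i)}, z^{(i)};\theta)}{Q_i(z^{(i)})} = c$$
Because $\sum_{z} Q_i(z) = 1$, this gives
$$Q_i(z^{(i)}) = \frac{p(x^{(i)}, z^{(i)};\theta)}{\sum_{z} p(x^{(i)}, z;\theta)} = \frac{p(x^{(i)}, z^{(i)};\theta)}{p(x^{(i)};\theta)} = p(z^{(i)} \mid x^{(i)};\theta)$$
Repeat until convergence {
(E-step) For each $i$, set $Q_i(z^{(i)}) \leftarrow p(z^{(i)} \mid x^{(i)};\theta)$ (probabilistic inference)
(M-step) Set $\theta \leftarrow \arg\max_\theta \sum_i \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)};\theta)}{Q_i(z^{(i)})}$
}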
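A minimal numeric check of the lower bound, assuming a single observation x and a hypothetical discrete latent variable with three states (the joint probabilities p(x, z; θ) below are made-up numbers, not from the slides). Any distribution Q gives a value at most log p(x; θ); choosing Q(z) = p(z | x; θ), as in the E-step, makes the bound tight.

```python
import math

def log_likelihood(joint):
    """log p(x) = log sum_z p(x, z) for one observation x.

    `joint` is a list of p(x, z) values, one per latent state z.
    """
    return math.log(sum(joint))

def lower_bound(joint, q):
    """sum_z Q(z) log(p(x, z) / Q(z)); states with Q(z) = 0 contribute nothing."""
    return sum(qz * math.log(pz / qz) for pz, qz in zip(joint, q) if qz > 0)

def posterior(joint):
    """E-step: Q(z) = p(x, z) / sum_z' p(x, z') = p(z | x)."""
    total = sum(joint)
    return [pz / total for pz in joint]

# Hypothetical joint probabilities p(x, z) for one fixed x and three latent states.
joint = [0.10, 0.25, 0.05]

uniform_q = [1 / 3, 1 / 3, 1 / 3]
post_q = posterior(joint)

print(log_likelihood(joint))          # log p(x) = log 0.40
print(lower_bound(joint, uniform_q))  # strictly below log p(x) for this Q
print(lower_bound(joint, post_q))     # equals log p(x) up to rounding
```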
Expectation Maximization (EM) Algorithm
Goal:
$$\hat{\theta} = \arg\max_\theta \log\left(\sum_z p(x, z \mid \theta)\right)$$
Log of sums is intractable.
Jensen's inequality:
$$f(E[X]) \ge E[f(X)]$$
for concave functions f(x) (so we maximize the lower bound!)
See here for proof: www.stanford.edu/class/cs229/notes/cs229-notes8.ps
Expectation Maximization (EM) Algorithm
E-step: compute
$$E_{z \mid x, \theta^{(t)}}\left[\log p(x, z \mid \theta)\right] = \sum_z \log\left(p(x, z \mid \theta)\right) p(z \mid x, \theta^{(t)})$$
M-step: solve
$$\theta^{(t+1)} = \arg\max_\theta \sum_z \log\left(p(x, z \mid \theta)\right) p(z \mid x, \theta^{(t)})$$
The goal takes the log of an expectation of P(x|z); the EM objective instead takes the expectation of the log of P(x|z).
EM for Mixture of Gaussians - derivation
$$p(x_n \mid \mu, \sigma^2, \pi) = \sum_m p(x_n, z_n = m \mid \mu, \sigma^2, \pi) = \sum_m \frac{1}{\sqrt{2\pi\sigma_m^2}} \exp\left(-\frac{(x_n - \mu_m)^2}{2\sigma_m^2}\right) \pi_m$$
Parameters: $\theta = (\mu, \sigma^2, \pi)$
1. E-step:
$$E_{z \mid x, \theta^{(t)}}\left[\log p(x, z \mid \theta)\right] = \sum_z \log\left(p(x, z \mid \theta)\right) p(z \mid x, \theta^{(t)})$$
2. M-step:
$$\theta^{(t+1)} = \arg\max_\theta \sum_z \log\left(p(x, z \mid \theta)\right) p(z \mid x, \theta^{(t)})$$
EM for Mixture of Gaussians
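1. E-step:
$$\alpha_{nm} = p(z_n = m \mid x_n, \mu^{(t)}, \sigma^{2(t)}, \pi^{(t)}) = \frac{\pi_m^{(t)}\, \mathcal{N}(x_n \mid \mu_m^{(t)}, \sigma_m^{2(t)})}{\sum_{m'} \pi_{m'}^{(t)}\, \mathcal{N}(x_n \mid \mu_{m'}^{(t)}, \sigma_{m'}^{2(t)})}$$
2. M-step:
$$\hat{\mu}_m^{(t+1)} = \frac{1}{\sum_n \alpha_{nm}} \sum_n \alpha_{nm} x_n$$
$$\hat{\sigma}_m^{2(t+1)} = \frac{1}{\sum_n \alpha_{nm}} \sum_n \alpha_{nm} \left(x_n - \hat{\mu}_m^{(t+1)}\right)^2$$
$$\hat{\pi}_m^{(t+1)} = \frac{\sum_n \alpha_{nm}}{N}$$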
http://lasa.epfl.ch/teaching/lectures/ML_Phd/Notes/GP-GMM.pdf
Take derivative with respect to $\mu_m$
Take derivative with respect to $\pi_m$ (with a Lagrange multiplier for the constraint $\sum_m \pi_m - 1 = 0$)
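A sketch of these updates for 1-D data in Python/NumPy, assuming M components whose initial means, variances, and mixing weights are supplied by the caller; the function names and the synthetic data are illustrative, not from the slides. It also records the log-likelihood at each iteration, which EM should never decrease (up to floating-point rounding).

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    """Univariate normal density N(x | mu, var); broadcasts over arrays."""
    return np.exp(-((x - mu) ** 2) / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def em_gmm_1d(x, mu, var, pi, n_iter=50):
    """EM for a 1-D mixture of M Gaussians.

    x: (N,) data; mu, var, pi: (M,) initial means, variances, mixing weights.
    Returns the final (mu, var, pi) and log p(x | theta^(t)) evaluated at the
    start of each iteration.
    """
    x = np.asarray(x, dtype=float)
    mu, var, pi = (np.array(a, dtype=float) for a in (mu, var, pi))
    log_liks = []
    for _ in range(n_iter):
        # E-step: pi_m * N(x_n | mu_m, var_m) for every n, m -> shape (N, M)
        weighted = pi * gaussian_pdf(x[:, None], mu, var)
        totals = weighted.sum(axis=1, keepdims=True)   # p(x_n | theta^(t))
        log_liks.append(float(np.log(totals).sum()))
        alpha = weighted / totals                      # alpha_nm
        # M-step: closed-form updates (variance uses the new means)
        n_m = alpha.sum(axis=0)                        # sum_n alpha_nm
        mu = (alpha * x[:, None]).sum(axis=0) / n_m
        var = (alpha * (x[:, None] - mu) ** 2).sum(axis=0) / n_m
        pi = n_m / x.shape[0]
    return mu, var, pi, log_liks

# Usage on synthetic data drawn from two Gaussians (illustrative values).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 0.5, 200), rng.normal(3.0, 1.0, 300)])
mu, var, pi, log_liks = em_gmm_1d(x, mu=[-1.0, 1.0], var=[1.0, 1.0], pi=[0.5, 0.5])
```

The responsibilities are normalized row by row, so each data point distributes one unit of weight across the components; no safeguard against a component collapsing onto a single point is included.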
variables
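parameters of the machine learning model.
Reduce data from 2D to 1D:
$$x^{(1)} \in \mathbb{R}^2 \rightarrow z^{(1)} \in \mathbb{R}$$
$$x^{(2)} \in \mathbb{R}^2 \rightarrow z^{(2)} \in \mathbb{R}$$
$$\vdots$$
$$x^{(m)} \in \mathbb{R}^2 \rightarrow z^{(m)} \in \mathbb{R}$$
[Figure: 2D data (axes x1, x2) mapped to 1D (z1); 3D data (axes x1, x2, x3) mapped to 2D (z1, z2)]
Find a direction $u^{(1)}$ (or, to reduce to k dimensions, k directions $u^{(1)}, \ldots, u^{(k)}$) onto which to project the data, so as to minimize the projection error.
[Figure: projection directions u(1) and u(2) over axes x1, x2]
[Figure: plots of y vs. x1 and x2 vs. x1]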
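To make the 2D → 1D objective concrete, here is a small NumPy sketch that projects mean-centered points onto a chosen unit direction u and measures the average squared projection error; the data points and the helper name `project_1d` are hypothetical. A direction along the data's spread gives a much smaller error than one across it, and PCA searches for the direction that minimizes this error.

```python
import numpy as np

def project_1d(X, u):
    """Project each row of X (m x 2) onto the direction u.

    Returns z (m,) with z_i = u^T x_i and the average squared projection
    error (1/m) * sum_i ||x_i - u z_i||^2.
    """
    u = np.asarray(u, dtype=float)
    u = u / np.linalg.norm(u)          # make sure u is a unit vector
    z = X @ u                          # 1-D coordinates
    X_proj = np.outer(z, u)            # points on the line spanned by u
    err = np.mean(np.sum((X - X_proj) ** 2, axis=1))
    return z, err

# Hypothetical mean-centered 2-D data lying close to the line x2 = x1.
X = np.array([[-2.0, -1.9], [-1.0, -1.1], [0.0, 0.1], [1.0, 0.9], [2.0, 2.0]])
z_good, err_good = project_1d(X, [1.0, 1.0])    # along the data
z_bad, err_bad = project_1d(X, [1.0, -1.0])     # across the data
```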
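Compute the mean of each feature:
$$\mu_j = \frac{1}{m}\sum_{i=1}^{m} x_j^{(i)}$$
Replace each $x_j^{(i)}$ with $x_j^{(i)} - \mu_j$.
If different features are on different scales, scale features to have a comparable range of values:
$$x_j^{(i)} \leftarrow \frac{x_j^{(i)} - \mu_j}{s_j}$$
where $s_j$ measures the spread of feature j (for example, its standard deviation or its range).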
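A NumPy sketch of this preprocessing, assuming s_j is the standard deviation of feature j (the range max − min is another common choice); `normalize_features` is a hypothetical helper name and the demo matrix is made up.

```python
import numpy as np

def normalize_features(X, scale=True):
    """Mean-normalize (and optionally scale) the columns of X (m x n).

    Each x_j^(i) becomes (x_j^(i) - mu_j) / s_j, with s_j taken to be the
    standard deviation of feature j.
    """
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)                # mu_j for every feature j
    X_norm = X - mu                    # zero-mean features
    s = np.ones_like(mu)
    if scale:
        s = X.std(axis=0)
        s[s == 0] = 1.0                # leave constant features unscaled (avoid dividing by 0)
        X_norm = X_norm / s
    return X_norm, mu, s

# Two features on very different scales (illustrative values).
X_demo = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
X_norm, mu, s = normalize_features(X_demo)
```

Keeping `mu` and `s` matters: new examples must be transformed with the same statistics before they are projected.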
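Compute the covariance matrix
$$\Sigma = \frac{1}{m}\sum_{i=1}^{m} x^{(i)} \left(x^{(i)}\right)^\top$$
and its eigenvectors (Σ is symmetric positive semidefinite, so its SVD gives them):
[U, S, V] = svd(Sigma);
$$U = \left[u^{(1)}, u^{(2)}, \cdots, u^{(n)}\right] \in \mathbb{R}^{n \times n}$$
Principal components: $u^{(1)}, u^{(2)}, \cdots, u^{(k)} \in \mathbb{R}^n$
$$z^{(i)} = \left[u^{(1)}, u^{(2)}, \cdots, u^{(k)}\right]^\top x^{(i)} \in \mathbb{R}^k$$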
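The same steps as a NumPy sketch: `np.linalg.svd` plays the role of the Octave `svd(Sigma)` call and returns the singular values as a 1-D array in decreasing order. The function name `pca` and the random test data are illustrative assumptions; the input is assumed to be mean-normalized already.

```python
import numpy as np

def pca(X, k):
    """PCA on mean-normalized data X (m x n).

    Sigma = (1/m) X^T X, [U, S, V] = svd(Sigma), U_reduce = first k columns,
    z^(i) = U_reduce^T x^(i) (computed for all rows at once as X @ U_reduce).
    """
    m = X.shape[0]
    Sigma = (X.T @ X) / m                 # n x n covariance matrix
    U, S, Vt = np.linalg.svd(Sigma)       # columns of U are u^(1), ..., u^(n)
    U_reduce = U[:, :k]                   # n x k
    Z = X @ U_reduce                      # m x k, row i is z^(i)
    return Z, U_reduce, S

# Illustrative 3-D data with very different spreads along each axis.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) * np.array([3.0, 1.0, 0.1])
X = X - X.mean(axis=0)                    # mean normalization
Z, U_reduce, S = pca(X, k=2)
```

Since Σ is symmetric positive semidefinite, its singular values equal its eigenvalues, so the variance of the j-th column of Z equals S[j].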
After mean normalization (zero mean) and optionally feature scaling:
$$\Sigma = \frac{1}{m}\sum_{i=1}^{m} x^{(i)} \left(x^{(i)}\right)^\top$$
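$$x_{\text{approx}}^{(i)} = U_{\text{reduce}}\, z^{(i)}$$
$$x_{\text{approx}}^{(i)} \in \mathbb{R}^n, \quad U_{\text{reduce}} \in \mathbb{R}^{n \times k}, \quad z^{(i)} \in \mathbb{R}^{k \times 1}$$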
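A NumPy sketch of compression followed by reconstruction, assuming mean-normalized input; `compress_and_reconstruct` is a hypothetical helper and the data are made up. With k = n the reconstruction is exact; with k < n each point is replaced by its projection onto the span of the first k principal components.

```python
import numpy as np

def compress_and_reconstruct(X, k):
    """Compress mean-normalized X (m x n) to k dims and map back.

    z^(i) = U_reduce^T x^(i);  x_approx^(i) = U_reduce z^(i).
    """
    m = X.shape[0]
    U, _, _ = np.linalg.svd((X.T @ X) / m)
    U_reduce = U[:, :k]          # n x k
    Z = X @ U_reduce             # m x k compressed representation
    X_approx = Z @ U_reduce.T    # m x n, row i is U_reduce z^(i)
    return Z, X_approx

# Hypothetical mean-centered 2-D points close to the line x2 = x1.
X = np.array([[-2.0, -1.9], [-1.0, -1.1], [0.0, 0.1], [1.0, 0.9], [2.0, 2.0]])
Z, X_approx = compress_and_reconstruct(X, 1)
```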
A morphable model for the synthesis of 3D faces, SIGGRAPH 1999
SMPL: Skinned multi-person linear model, SIGGRAPH Asia 2015
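Average squared projection error: $\frac{1}{m}\sum_{i=1}^{m} \lVert x^{(i)} - x_{\text{approx}}^{(i)} \rVert^2$
Total variation in the data: $\frac{1}{m}\sum_{i=1}^{m} \lVert x^{(i)} \rVert^2$
Choose k to be the smallest value such that
$$\frac{\frac{1}{m}\sum_{i=1}^{m} \lVert x^{(i)} - x_{\text{approx}}^{(i)} \rVert^2}{\frac{1}{m}\sum_{i=1}^{m} \lVert x^{(i)} \rVert^2} \le 0.01 \quad (1\%)$$
"99% of variance is retained"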
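For a candidate k, compute $x_{\text{approx}}^{(1)}, x_{\text{approx}}^{(2)}, \cdots, x_{\text{approx}}^{(m)}$ and check whether
$$\frac{\frac{1}{m}\sum_{i=1}^{m} \lVert x^{(i)} - x_{\text{approx}}^{(i)} \rVert^2}{\frac{1}{m}\sum_{i=1}^{m} \lVert x^{(i)} \rVert^2} \le 0.01 ?$$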
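A direct (and deliberately naive) NumPy sketch of this check: for each candidate k, redo PCA, reconstruct the data, and stop at the first k whose ratio is at most 0.01. The helper names and the synthetic data are illustrative assumptions.

```python
import numpy as np

def projection_error_ratio(X, k):
    """(1/m) sum ||x - x_approx||^2 divided by (1/m) sum ||x||^2 for k-dim PCA."""
    m = X.shape[0]
    U, _, _ = np.linalg.svd((X.T @ X) / m)
    U_reduce = U[:, :k]
    X_approx = (X @ U_reduce) @ U_reduce.T
    avg_sq_err = np.mean(np.sum((X - X_approx) ** 2, axis=1))
    total_var = np.mean(np.sum(X ** 2, axis=1))
    return avg_sq_err / total_var

def smallest_k(X, max_ratio=0.01):
    """Try k = 1, 2, ... and return the first k with ratio <= max_ratio.

    Recomputes the SVD for every k, which is wasteful but mirrors the procedure.
    """
    n = X.shape[1]
    for k in range(1, n + 1):
        if projection_error_ratio(X, k) <= max_ratio:
            return k
    return n

# Illustrative data: most variance along the first two axes.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3)) * np.array([10.0, 3.0, 0.5])
X = X - X.mean(axis=0)
k = smallest_k(X)
```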
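$$S = \begin{bmatrix} s_{11} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & s_{nn} \end{bmatrix}$$
For a given k, the same check can be done with S alone:
$$1 - \frac{\sum_{i=1}^{k} s_{ii}}{\sum_{i=1}^{n} s_{ii}} \le 0.01 \quad\Longleftrightarrow\quad \frac{\sum_{i=1}^{k} s_{ii}}{\sum_{i=1}^{n} s_{ii}} \ge 0.99$$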
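Because this criterion only needs the diagonal of S, k can be chosen from a single SVD. A minimal sketch, assuming S is given as the 1-D array of singular values in decreasing order (the form `numpy.linalg.svd` returns); `choose_k` is a hypothetical name and the singular values in the usage lines are made up.

```python
import numpy as np

def choose_k(S, retain=0.99):
    """Smallest k with sum_{i<=k} s_ii / sum_i s_ii >= retain.

    S: 1-D array of singular values of Sigma, in decreasing order.
    """
    S = np.asarray(S, dtype=float)
    retained = np.cumsum(S) / S.sum()      # fraction of variance kept for k = 1..n
    # First index where the retained fraction reaches the target (k = index + 1);
    # min() guards against rounding leaving the last entry just below 1.0.
    k = int(np.searchsorted(retained, retain, side="left")) + 1
    return min(k, S.shape[0])

S_demo = [5.0, 3.0, 1.5, 0.5]    # hypothetical singular values
print(choose_k(S_demo))           # 99% needs all four components -> 4
print(choose_k(S_demo, 0.9))      # 90% is reached with three -> 3
```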
http://www.math.chalmers.se/Stat/Grundutb/GU/MSA220/S18/DimRed2.pdf