CS 6316 Machine Learning
Generative Models
Yangfeng Ji
Department of Computer Science University of Virginia
Basic Definition
Data generation process: An idealized process to illustrate the relations among domain set X, label set Y, and the
◮ AdaBoost (lecture 05)
◮ SVMs (lecture 07)
◮ Feed-forward neural network (lecture 08)
learning — we have no idea about the true data distribution
Data Generation Model    Generative Model
p(y)                     q(y)
p(x | y = +1)            q(x | y = +1)
p(x | y = −1)            q(x | y = −1)
$$\ell(\alpha) = \sum_{i=1}^{m} \log q(y_i; \alpha) = \sum_{i=1}^{m} \delta(y_i = +1) \log \alpha + \sum_{i=1}^{m} \delta(y_i = -1) \log(1 - \alpha)$$

$$\alpha = \frac{\sum_{i=1}^{m} \delta(y_i = +1)}{m}$$
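As a minimal sketch of the label-prior estimate, assuming q(y; α) is a Bernoulli distribution with α = q(y = +1) and labels coded as +1 / −1: the maximum-likelihood value of α is then the fraction of positive labels. The function name `estimate_alpha` and the toy labels are illustrative assumptions, not part of the lecture.

```python
def estimate_alpha(labels):
    """MLE of alpha = q(y = +1), assuming labels take values in {+1, -1}."""
    m = len(labels)
    if m == 0:
        raise ValueError("need at least one label")
    # Sum of the indicator delta(y_i = +1) over the data set.
    n_pos = sum(1 for y in labels if y == +1)
    return n_pos / m


# Hypothetical toy labels: three positives out of five examples.
labels = [+1, -1, +1, +1, -1]
alpha_hat = estimate_alpha(labels)
```
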
◮ µ+ ∈ R^d: d parameters
◮ Σ+ ∈ R^{d×d}: d^2 parameters
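The parameter count above can be made concrete with a small sketch that fits a Gaussian class-conditional by maximum likelihood, assuming the usual estimates (sample mean for µ+, biased sample covariance for Σ+). The helper names `fit_gaussian` and `count_parameters` and the toy points are assumptions made for illustration.

```python
def fit_gaussian(points):
    """MLE of (mean, covariance) for a list of d-dimensional points."""
    m = len(points)
    d = len(points[0])
    mean = [sum(p[j] for p in points) / m for j in range(d)]
    # Biased (divide-by-m) covariance, which is the maximum-likelihood estimate.
    cov = [
        [sum((p[j] - mean[j]) * (p[k] - mean[k]) for p in points) / m
         for k in range(d)]
        for j in range(d)
    ]
    return mean, cov


def count_parameters(d):
    """d entries for the mean plus d * d entries for the covariance."""
    return d + d * d


# Hypothetical positive-class examples in d = 2 dimensions.
x_pos = [[1.0, 2.0], [3.0, 2.0], [2.0, 5.0]]
mu_pos, sigma_pos = fit_gaussian(x_pos)
n_params = count_parameters(len(mu_pos))
```

The count follows the slide in treating all d^2 covariance entries as parameters; because Σ+ is symmetric, only d(d + 1)/2 of them are free.
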
Full covariance Σ+: a d × d matrix with entries indexed (1,1), ..., (1,d), ..., (d,1), ..., (d,d).
Diagonal covariance: only the entries (j,j), j = 1, ..., d, are kept.
◮ A mixture model with two Gaussian components
◮ E.g., the value of α depends on {µ_c, Σ_c}_{c=1}^{2}, and vice versa
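$$\ell(\theta) = \sum_{i=1}^{m} \log q(x_i, z_i)$$

$$= \sum_{i=1}^{m} \log \Big[ \alpha^{\delta(z_i = 1)} \cdot \mathcal{N}(x_i; \mu_1, \Sigma_1)^{\delta(z_i = 1)} \cdot (1 - \alpha)^{\delta(z_i = 2)} \cdot \mathcal{N}(x_i; \mu_2, \Sigma_2)^{\delta(z_i = 2)} \Big] \tag{40}$$

$$= \sum_{i=1}^{m} \Big[ \delta(z_i = 1) \log \alpha + \delta(z_i = 1) \log \mathcal{N}(x_i; \mu_1, \Sigma_1) + \delta(z_i = 2) \log(1 - \alpha) + \delta(z_i = 2) \log \mathcal{N}(x_i; \mu_2, \Sigma_2) \Big]$$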
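A minimal sketch of evaluating this complete-data log-likelihood when the component assignments z_i are known. To keep it dependency-free it assumes one-dimensional data, so scalar variances `var1` and `var2` stand in for Σ1 and Σ2; the function names and toy values are illustrative assumptions.

```python
import math


def log_normal(x, mu, var):
    """Log-density of a 1-D Gaussian N(x; mu, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def complete_log_likelihood(xs, zs, alpha, mu1, var1, mu2, var2):
    """Sum of log q(x_i, z_i) for a two-component mixture, z_i in {1, 2}."""
    total = 0.0
    for x, z in zip(xs, zs):
        if z == 1:
            # delta(z_i = 1) selects log(alpha) + log N(x_i; mu1, var1).
            total += math.log(alpha) + log_normal(x, mu1, var1)
        elif z == 2:
            # delta(z_i = 2) selects log(1 - alpha) + log N(x_i; mu2, var2).
            total += math.log(1.0 - alpha) + log_normal(x, mu2, var2)
        else:
            raise ValueError("z must be 1 or 2")
    return total


# Hypothetical data: each point sits exactly on its component mean.
xs = [0.0, 4.0]
zs = [1, 2]
ll = complete_log_likelihood(xs, zs, 0.5, 0.0, 1.0, 4.0, 1.0)
```
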
Data set (i = 1, ..., m) ⇔ θ = {α, µ1, Σ1, µ2, Σ2}
Given the data set (i = 1, ..., m), estimate the value of θ
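$$\ell(\theta) = \sum_{i=1}^{m} \Big[ \delta(z_i = 1) \log \alpha + \delta(z_i = 1) \log \mathcal{N}(x_i; \mu_1, \Sigma_1) + \delta(z_i = 2) \log(1 - \alpha) + \delta(z_i = 2) \log \mathcal{N}(x_i; \mu_2, \Sigma_2) \Big]$$

$$\alpha = \frac{\sum_{i=1}^{m} \delta(z_i = 1)}{\sum_{i=1}^{m} \big( \delta(z_i = 1) + \delta(z_i = 2) \big)} = \frac{1}{m} \sum_{i=1}^{m} \delta(z_i = 1)$$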
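$$\mu_1 = \frac{1}{\sum_{i=1}^{m} \gamma_i} \sum_{i=1}^{m} \gamma_i x_i \qquad \mu_2 = \frac{1}{\sum_{i=1}^{m} (1 - \gamma_i)} \sum_{i=1}^{m} (1 - \gamma_i) x_i$$

$$\Sigma_1 = \frac{1}{\sum_{i=1}^{m} \gamma_i} \sum_{i=1}^{m} \gamma_i (x_i - \mu_1)(x_i - \mu_1)^T \qquad \Sigma_2 = \frac{1}{\sum_{i=1}^{m} (1 - \gamma_i)} \sum_{i=1}^{m} (1 - \gamma_i)(x_i - \mu_2)(x_i - \mu_2)^T \tag{52}$$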
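The weighted updates for µ1, µ2, Σ1, Σ2 can be sketched as an EM loop for a two-component mixture. This sketch rests on several assumptions not confirmed by the surviving slide text: γ_i is taken to be the posterior probability q(z_i = 1 | x_i), α is updated as the mean of the γ_i, the data are one-dimensional (scalar variances replace Σ1 and Σ2), and a small variance floor is added for numerical safety. All names and the toy data are hypothetical.

```python
import math


def normal_pdf(x, mu, var):
    """Density of a 1-D Gaussian N(x; mu, var)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def e_step(xs, alpha, mu1, var1, mu2, var2):
    """gamma_i = q(z_i = 1 | x_i) under the current parameters."""
    gammas = []
    for x in xs:
        p1 = alpha * normal_pdf(x, mu1, var1)
        p2 = (1.0 - alpha) * normal_pdf(x, mu2, var2)
        gammas.append(p1 / (p1 + p2))
    return gammas


def m_step(xs, gammas, var_floor=1e-6):
    """Weighted means and variances, normalized by the total weights."""
    n1 = sum(gammas)
    n2 = len(xs) - n1
    mu1 = sum(g * x for g, x in zip(gammas, xs)) / n1
    mu2 = sum((1.0 - g) * x for g, x in zip(gammas, xs)) / n2
    var1 = sum(g * (x - mu1) ** 2 for g, x in zip(gammas, xs)) / n1
    var2 = sum((1.0 - g) * (x - mu2) ** 2 for g, x in zip(gammas, xs)) / n2
    alpha = n1 / len(xs)  # assumed update: average responsibility
    return alpha, mu1, max(var1, var_floor), mu2, max(var2, var_floor)


def fit_gmm(xs, alpha, mu1, var1, mu2, var2, n_iter=50):
    """Alternate E and M steps for a fixed number of iterations."""
    for _ in range(n_iter):
        gammas = e_step(xs, alpha, mu1, var1, mu2, var2)
        alpha, mu1, var1, mu2, var2 = m_step(xs, gammas)
    return alpha, mu1, var1, mu2, var2


# Hypothetical data: two well-separated clusters near 0 and 5.
xs = [-0.2, 0.1, 0.0, 0.3, 4.8, 5.1, 5.0, 5.3]
alpha, mu1, var1, mu2, var2 = fit_gmm(xs, 0.5, 0.0, 1.0, 5.0, 1.0)
```

A fixed iteration count keeps the sketch short; a practical implementation would more likely stop when the log-likelihood no longer improves.
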
Given the data set (i = 1, ..., m), maximize the log-likelihood
$$q(x) = \sum_{z} q(x, z)$$
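$$\mathrm{KL}(q' \,\|\, q) = \sum_{z} q'(z \mid x) \log \frac{q'(z \mid x)}{q(z \mid x)} \quad \text{(discrete)}$$

$$\mathrm{KL}(q' \,\|\, q) = \int_{z} q'(z \mid x) \log \frac{q'(z \mid x)}{q(z \mid x)} \, dz \quad \text{(continuous)}$$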
$$\mathrm{KL}(q' \,\|\, q) = \int_{z} q'(z \mid x) \log \frac{q'(z \mid x)}{q(z \mid x)} \, dz$$

$$= \int_{z} q'(z \mid x) \log \frac{q'(z \mid x) \, q(x)}{q(z, x)} \, dz$$

$$= \int_{z} q'(z \mid x) \log \frac{q'(z \mid x) \, q(x)}{q(x \mid z) \, q(z)} \, dz$$

$$= \int_{z} q'(z \mid x) \log \frac{q'(z \mid x)}{q(x \mid z) \, q(z)} \, dz + \log q(x)$$
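The decomposition of the KL divergence can be checked numerically for a discrete latent variable, where the integral becomes a sum. This is a minimal sketch: the probability values for q(z), q(x | z) at one fixed x, and q′(z | x) are arbitrary assumptions chosen only for illustration.

```python
import math

# Hypothetical two-state latent variable z and one fixed observation x.
q_z = [0.3, 0.7]            # prior q(z)
q_x_given_z = [0.2, 0.5]    # likelihood q(x | z) evaluated at the fixed x
q_prime = [0.6, 0.4]        # an arbitrary distribution q'(z | x)

# Marginal q(x) = sum_z q(x | z) q(z) and true posterior q(z | x).
q_x = sum(pz * px for pz, px in zip(q_z, q_x_given_z))
posterior = [pz * px / q_x for pz, px in zip(q_z, q_x_given_z)]

# Direct definition: sum_z q'(z | x) log( q'(z | x) / q(z | x) ).
kl_direct = sum(qp * math.log(qp / post)
                for qp, post in zip(q_prime, posterior))

# Expanded form: sum_z q'(z | x) log( q'(z | x) / (q(x | z) q(z)) ) + log q(x).
kl_expanded = sum(qp * math.log(qp / (px * pz))
                  for qp, px, pz in zip(q_prime, q_x_given_z, q_z))
kl_expanded += math.log(q_x)
```

The log q(x) term comes out of the sum because q′(z | x) sums to one over z.
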
Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.