CS480/680 Machine Learning (Winter 2020)
Lecture 12: February 13th, 2020
Expectation-Maximization
Zahra Sheikhbahaee, University of Waterloo
Outline:
- K-means Clustering
- Gaussian Mixture Model
- EM for Gaussian Mixtures
Application of K-means: partition an image into regions, each of which has a reasonably homogeneous visual appearance or which corresponds to objects or parts of objects.
Data compression: each pixel is an (R, G, B) triple, and each of the three values is stored with 8-bit precision.
- Total cost of the original image transmission: 24N bits (for N pixels).
- Transmitting the identity of the nearest centroid for each pixel has the total cost of N log₂ K bits.
- Transmitting the K centroid vectors requires 24K bits.
- The compressed image has the cost of 24K + N log₂ K bits.
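As a quick arithmetic check of the cost formula above, the sketch below compares the 24N-bit raw cost with 24K + N log₂ K. The image size is a hypothetical example, not one from the lecture.

```python
import math

def compressed_cost_bits(n_pixels: int, k: int) -> int:
    """Bits to send K centroids (24 bits each) plus one centroid
    index (ceil(log2 K) bits in practice) per pixel."""
    return 24 * k + n_pixels * math.ceil(math.log2(k))

n = 240 * 180                 # hypothetical image: 43,200 pixels
original = 24 * n             # 24 bits per (R, G, B) pixel
for k in (2, 3, 10):
    cost = compressed_cost_bits(n, k)
    print(f"K={k}: {cost} bits ({100 * cost / original:.1f}% of {original})")
```

The slide's N log₂ K term assumes ideal coding; a fixed-length code needs ⌈log₂ K⌉ bits per pixel, which is what the function uses.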
K-means clustering:
- Each cluster has a cluster center, called a centroid (μ_k, where k = 1, …, K).
- The sum of the squares of the distances of each data point to its closest centroid μ_k is a minimum.
- Each data point y_n has a corresponding set of binary indicator variables r_nk, which represent whether y_n belongs to cluster k or not:
  r_nk = 1 if y_n is assigned to cluster k, and 0 otherwise.
The objective (distortion measure) to minimize is
J = Σ_{n=1}^{N} Σ_{k=1}^{K} r_nk ‖y_n − μ_k‖².
K-means algorithm:
Initialize μ_1, …, μ_K.
Iterate:
- Minimize J with respect to r_nk, keeping the μ_k fixed, by assigning each data point to the closest centroid.
- Minimize J with respect to μ_k, keeping the r_nk fixed, by recomputing the centroids using the current cluster membership.
Repeat until convergence.
Algorithm:
Initialize μ_1, …, μ_K.
Iterate:
E step: minimize J with respect to r_nk, keeping the μ_k fixed, by assigning each data point to the closest centroid:
r_nk = 1 if k = arg min_j ‖y_n − μ_j‖², and 0 otherwise.
M step: minimize J with respect to μ_k, keeping the r_nk fixed, by recomputing the centroids using the current cluster membership:
∂J/∂μ_k = −2 Σ_{n=1}^{N} r_nk (y_n − μ_k) = 0 ⟹ μ_k = (Σ_n r_nk y_n) / (Σ_n r_nk).
Repeat until convergence.
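The two alternating steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the course's reference implementation; initializing the centroids at randomly chosen data points is one common choice.

```python
import numpy as np

def kmeans(Y, K, n_iters=100, seed=0):
    """Minimal K-means: alternate the r_nk assignment step and
    the centroid update mu_k = sum_n r_nk y_n / sum_n r_nk."""
    rng = np.random.default_rng(seed)
    mu = Y[rng.choice(len(Y), size=K, replace=False)]  # init at K data points
    for _ in range(n_iters):
        # assignment step: each point goes to its closest centroid
        d2 = ((Y[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        r = d2.argmin(axis=1)
        # update step: each centroid becomes the mean of its cluster
        new_mu = np.array([Y[r == k].mean(axis=0) if (r == k).any() else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):   # converged
            break
        mu = new_mu
    return mu, r

# two well-separated point clouds
Y = np.vstack([np.zeros((5, 2)), 10 + np.zeros((5, 2))])
mu, r = kmeans(Y, K=2)
```

On this toy input the centroids converge to (0, 0) and (10, 10), with each cloud assigned to one cluster.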
Mixture of Gaussians: introduce a latent variable z with a 1-of-K representation and prior
p(z) = Cat(z|π),
so that the marginal distribution of an observation is
p(y) = Σ_{k=1}^{K} π_k N(y|μ_k, Σ_k).
The responsibility of component k for an observation y:
γ(z_k) ≡ p(z_k = 1|y) = p(y|z_k = 1) p(z_k = 1) / Σ_{j=1}^{K} p(y|z_j = 1) p(z_j = 1)
                      = π_k N(y|μ_k, Σ_k) / Σ_{j=1}^{K} π_j N(y|μ_j, Σ_j).
Let us assume we have an i.i.d. data set. The log-likelihood function is
ln p(Y|π, μ, Σ) = ln Π_{n=1}^{N} Σ_{z_n} p(y_n|z_n) p(z_n) = Σ_{n=1}^{N} ln Σ_{k=1}^{K} π_k N(y_n|μ_k, Σ_k).
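The responsibility γ(z_k) can be evaluated directly from the formula above. A small NumPy sketch, where the component parameters are made-up illustrative values:

```python
import numpy as np

def gaussian_pdf(y, mu, Sigma):
    """Multivariate normal density N(y | mu, Sigma)."""
    d = len(mu)
    diff = y - mu
    quad = diff @ np.linalg.solve(Sigma, diff)   # (y-mu)^T Sigma^{-1} (y-mu)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

def responsibilities(y, pis, mus, Sigmas):
    """gamma_k = pi_k N(y|mu_k, Sigma_k) / sum_j pi_j N(y|mu_j, Sigma_j)."""
    w = np.array([p * gaussian_pdf(y, m, S)
                  for p, m, S in zip(pis, mus, Sigmas)])
    return w / w.sum()

# two equally weighted 2-D components (illustrative parameters)
pis = [0.5, 0.5]
mus = [np.zeros(2), np.full(2, 4.0)]
Sigmas = [np.eye(2), np.eye(2)]
gamma = responsibilities(np.zeros(2), pis, mus, Sigmas)  # y at component 0's mean
```

A point midway between the two means gets γ = (0.5, 0.5) by symmetry, and the responsibilities always sum to one.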
The log-likelihood function is
ln p(Y|π, μ, Σ) = Σ_{n=1}^{N} ln Σ_{k=1}^{K} π_k N(y_n|μ_k, Σ_k).
Two difficulties with maximizing it directly:
- Singularities: the likelihood can diverge whenever one of the Gaussian components collapses onto a specific data point.
- Identifiability: there are a total of K! equivalent solutions, because of the K! ways of assigning K sets of parameters to K components.
General EM algorithm:
Initialize θ^old.
Iterate:
E step: Evaluate the posterior distribution of the latent variables, p(Z|Y, θ^old), and compute
Q(θ, θ^old) = Σ_Z p(Z|Y, θ^old) ln p(Y, Z|θ).
M step: Evaluate θ^new = arg max_θ Q(θ, θ^old).
Check for the convergence of either the log-likelihood or the parameter values; otherwise θ^old ← θ^new.
The complete-data likelihood:
p(Y, Z|μ, Σ, π) = Π_{n=1}^{N} Π_{k=1}^{K} π_k^{z_nk} N(y_n|μ_k, Σ_k)^{z_nk}.
CS480/680 Winter 2020 Zahra Sheikhbahaee
π π, π π, Ξ£, π = 4
'#$ )
4
!#$ %
π!
(!$ πͺ(π¦'|π!, π―!)(!$
The log-likelihood β π π, π = ln π π, π π, Ξ£, π = 5
'#$ )
5
!#$ %
π¨'! {ln π! β 1 2 [ln 2π + ln Ξ£! + (π¦' β π!)*Ξ£!
+$(π¦'βπ!)]}
πβ ππ! = 5
'#$ )
π¨'!Ξ£!
+$(π¦'βπ!) = 0 βΆ π! =
β' π¨'!π¦' β' π¨'!
University of Waterloo
21
CS480/680 Winter 2020 Zahra Sheikhbahaee
π π, π π, Ξ£, π = -
./0 1
3
π2
4#$ πͺ(π¦.|π2, π―2)4#$
The log-likelihood β π π, π = ln π π, π π, Ξ£, π = A
./0 1
A
2/0 3
π¨.2 {ln π2 β 1 2 [ln 2π + ln Ξ£2 + (π¦. β π2)IΞ£2
J0(π¦.βπ2)]}
πβ πΞ£2 = Ξ£2
J0 A ./0 1
π¨.2 {1 β(π¦. β π2)IΞ£2
J0(π¦. βπ2)} = 0 βΆ Ξ£2 = β. π¨.2(π¦. β π2)I(π¦.βπ2)
β. π¨.2
University of Waterloo
22
For computing π_k, we add a constraint to the log-likelihood using a Lagrange multiplier:
ℓ(Y, Z) = ln p(Y, Z|μ, Σ, π) + λ (Σ_{k=1}^{K} π_k − 1),
∂ℓ/∂π_k = 0 ⟹ π_k = (Σ_n z_nk) / N.
EM for Gaussian mixtures:
Initialize θ^old.
Iterate:
E step: Evaluate γ_nk = E[z_nk] and compute
E_Z[ln p(Y, Z|θ^old)] = Σ_{n=1}^{N} Σ_{k=1}^{K} E[z_nk] {ln π_k + ln N(y_n|μ_k, Σ_k)}.
M step: Evaluate θ^new, where θ = {π_k, Σ_k, μ_k}.
Check for the convergence of either the log-likelihood or the parameter values; otherwise θ^old ← θ^new.
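Combining the E-step responsibilities with the M-step updates for μ_k, Σ_k, and π_k from the previous slides gives a compact EM loop. Below is a NumPy sketch assuming full covariance matrices; the small jitter added to Σ_k is a practical safeguard against the singularities mentioned earlier, not part of the lecture's derivation, and the deterministic initialization is one arbitrary choice.

```python
import numpy as np

def em_gmm(Y, K, n_iters=50):
    """EM for a Gaussian mixture.
    E step: gamma_nk = pi_k N(y_n|mu_k,Sigma_k) / sum_j pi_j N(y_n|mu_j,Sigma_j)
    M step: mu_k, Sigma_k, pi_k re-estimated from the responsibilities."""
    N, D = Y.shape
    # simple deterministic init: means spread along the first axis
    mu = Y[np.argsort(Y[:, 0])[np.linspace(0, N - 1, K).astype(int)]].copy()
    Sigma = np.stack([np.eye(D) for _ in range(K)])
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iters):
        # E step: log of pi_k N(y_n | mu_k, Sigma_k), then normalize over k
        logw = np.empty((N, K))
        for k in range(K):
            diff = Y - mu[k]
            _, logdet = np.linalg.slogdet(Sigma[k])
            quad = np.einsum('nd,dn->n', diff, np.linalg.solve(Sigma[k], diff.T))
            logw[:, k] = np.log(pi[k]) - 0.5 * (D * np.log(2 * np.pi) + logdet + quad)
        logw -= logw.max(axis=1, keepdims=True)   # numerical stability
        gamma = np.exp(logw)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M step: Nk, then mu_k, Sigma_k, pi_k
        Nk = gamma.sum(axis=0)
        mu = (gamma.T @ Y) / Nk[:, None]
        for k in range(K):
            diff = Y - mu[k]
            # jitter keeps Sigma_k invertible (practical safeguard)
            Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)
        pi = Nk / N
    return pi, mu, Sigma, gamma

rng = np.random.default_rng(1)
Y = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(8.0, 1.0, (100, 2))])
pi, mu, Sigma, gamma = em_gmm(Y, K=2)
```

On two well-separated synthetic clusters the recovered means land near (0, 0) and (8, 8) with mixing weights close to one half each.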
Decompose the log-likelihood as ln p(Y|θ) = L(q, θ) + KL(q‖p), where the KL divergence is nonnegative. So L(q, θ) is a lower bound for ln p(Y|θ).
ln p(Y|θ) ≥ L(q, θ), so L(q, θ) is a lower bound for ln p(Y|θ).
EM algorithm:
Initialize θ^old.
Iterate:
E step: The lower bound L(q, θ^old) is maximized with respect to q(Z) while holding θ^old fixed; the maximum is achieved when q(Z) ← p(Z|Y, θ^old).
E step: The lower bound L(q, θ^old) is maximized with respect to q(Z) while holding θ^old fixed; the maximum is achieved when q(Z) ← p(Z|Y, θ^old).
M step: The distribution q(Z) is held fixed and L(q, θ) is maximized with respect to θ; evaluate θ^new, where θ = {π_k, Σ_k, μ_k}.
Check for the convergence of either the log-likelihood or the parameter values; otherwise θ^old ← θ^new.
In the M step the lower bound increases, but now q(Z) ≠ p(Z|Y, θ^new), so the KL divergence becomes positive. Since the KL divergence is nonnegative, this causes the log-likelihood ln p(Y|θ) to increase by at least as much as the lower bound does.
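The decomposition behind this argument, ln p(Y|θ) = L(q, θ) + KL(q‖p), can be verified numerically for a single observation with a discrete latent variable; the joint probabilities below are made-up toy numbers.

```python
import numpy as np

# toy joint p(z = k, y | theta) for one fixed observation y (hypothetical numbers)
p_joint = np.array([0.10, 0.25, 0.05])
log_evidence = np.log(p_joint.sum())        # ln p(y | theta)
posterior = p_joint / p_joint.sum()         # p(z | y, theta)

q = np.array([0.2, 0.5, 0.3])               # an arbitrary distribution over z
L = np.sum(q * (np.log(p_joint) - np.log(q)))       # lower bound L(q, theta)
kl = np.sum(q * (np.log(q) - np.log(posterior)))    # KL(q || p(z|y, theta))

# the decomposition holds exactly, and KL >= 0 makes L a lower bound
assert np.isclose(L + kl, log_evidence) and kl >= 0.0
```

Setting q equal to the posterior drives the KL term to zero, which is exactly what the E step does.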
The free-energy view: the free energy F(q(Z), θ) is a lower bound on ℓ(θ).
EM algorithm:
Initialize θ^(0).
Iterate:
E step: Infer posterior distributions over hidden variables given the current parameter setting:
q_n^(t+1) ← arg max_{q_n} F(q_n(z), θ^(t)), ∀n ∈ {1, …, N},
q_n^(t+1) = p(z_n|y_n, θ^(t)).
M step: Maximize F with respect to θ:
θ^(t+1) ← arg max_θ F(q^(t+1)(z), θ),
θ^(t+1) ← arg max_θ Σ_n ∫ dz_n p(z_n|y_n, θ^(t)) ln p(z_n, y_n|θ).