CSC 411 Lectures 16–17: Expectation-Maximization
Roger Grosse, Amir-massoud Farahmand, and Juan Carrasquilla
University of Toronto
UofT CSC 411: 15-EM 1 / 33
A Generative View of Clustering

Last time: hard and soft k-means algorithm
Today: a generative (statistical) view of clustering
◮ This makes it possible to judge different methods
◮ It may help us decide on the number of clusters
We fit a generative model of the data:
◮ Then we adjust the model parameters to maximize the probability that the model would have produced exactly the data we observed
The Gaussian mixture model represents the data density as a weighted sum of K Gaussian components:

    p(x) = Σ_{k=1}^K πk N(x | µk, Σk)

where the mixing coefficients satisfy Σ_{k=1}^K πk = 1 and πk ≥ 0.
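As an illustration, the mixture density p(x) = Σ_k πk N(x | µk, Σk) can be evaluated numerically. This is a sketch with made-up parameter values (the function names and the two-component setup are ours, not the lecture's), assuming NumPy and implementing the Gaussian density directly:

```python
import numpy as np

def gauss_pdf(x, mu, Sigma):
    """Multivariate normal density N(x | mu, Sigma)."""
    d = len(mu)
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / norm

def gmm_density(x, pis, mus, Sigmas):
    """p(x) = sum_k pi_k N(x | mu_k, Sigma_k)."""
    return sum(pi * gauss_pdf(x, mu, S) for pi, mu, S in zip(pis, mus, Sigmas))

# Illustrative two-component mixture in 2-D (parameters chosen arbitrarily)
pis = [0.3, 0.7]
mus = [np.zeros(2), 3.0 * np.ones(2)]
Sigmas = [np.eye(2), np.eye(2)]
p = gmm_density(np.zeros(2), pis, mus, Sigmas)
```

At the origin the first component dominates, so p is close to 0.3 times a standard 2-D normal density at its mean.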
The maximum likelihood objective sums the log density over the N training cases:

    log p(X) = Σ_{n=1}^N log ( Σ_{k=1}^K πk N(x^(n) | µk, Σk) )

Problems with maximum likelihood for mixtures:
◮ Singularities: Arbitrarily large likelihood when a Gaussian explains a single data point
◮ Identifiability: Solution is invariant to permutations of the components
◮ Non-convex
We can formulate the mixture model with a latent variable z indicating which of the K components each datapoint came from:

    p(z = k) = πk
    p(x | z = k) = N(x | µk, Σk)

If the cluster assignments z^(n) were observed, maximum likelihood would give closed-form estimates:

    µk = Σ_{n=1}^N 1[z^(n) = k] x^(n) / Σ_{n=1}^N 1[z^(n) = k]

    Σk = Σ_{n=1}^N 1[z^(n) = k] (x^(n) − µk)(x^(n) − µk)^T / Σ_{n=1}^N 1[z^(n) = k]

    πk = (1/N) Σ_{n=1}^N 1[z^(n) = k]
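With observed labels, these indicator-weighted sums reduce to per-cluster empirical means, covariances, and frequencies. A minimal sketch (the function name is illustrative, assuming NumPy):

```python
import numpy as np

def mle_with_labels(X, z, K):
    """Closed-form MLE for (pi_k, mu_k, Sigma_k) when labels z^(n) are observed.

    X: (N, D) data matrix; z: (N,) integer labels in {0, ..., K-1}.
    """
    N = len(X)
    pis, mus, Sigmas = [], [], []
    for k in range(K):
        Xk = X[z == k]                           # points with 1[z^(n) = k] = 1
        mu = Xk.mean(axis=0)                     # mean of the points assigned to k
        diff = Xk - mu
        Sigmas.append(diff.T @ diff / len(Xk))   # empirical covariance
        mus.append(mu)
        pis.append(len(Xk) / N)                  # fraction of points assigned to k
    return pis, mus, Sigmas
```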
◮ In order to adjust the parameters, we must first solve the inference problem: which mixture component generated each datapoint?
◮ We cannot be sure, so we compute a distribution over all possibilities: the responsibilities

    r_k^(i) = p(z^(i) = k | x^(i); Θ)

◮ Each Gaussian gets a certain amount of posterior probability for each datapoint
◮ We fit each Gaussian to the weighted datapoints
◮ We can derive closed-form updates for all parameters
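The responsibilities follow from Bayes' rule: weight each component's density at x^(i) by its mixing coefficient and normalize over components. A sketch assuming NumPy, with an explicit Gaussian density helper (names are ours):

```python
import numpy as np

def gauss_pdf(x, mu, Sigma):
    """Multivariate normal density N(x | mu, Sigma)."""
    d = len(mu)
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / norm

def responsibilities(X, pis, mus, Sigmas):
    """Return (N, K) matrix R with R[i, k] = p(z^(i) = k | x^(i))."""
    R = np.array([[pi * gauss_pdf(x, mu, S)
                   for pi, mu, S in zip(pis, mus, Sigmas)] for x in X])
    return R / R.sum(axis=1, keepdims=True)   # normalize over components k
```

Each row of R sums to 1: it is a posterior distribution over the K components for that datapoint.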
For any distribution Q over the latent variable, Jensen's inequality gives a lower bound on the log-likelihood:

    log P(x^(i); Θ) = log Σ_j Q(z^(i) = j) · [ P(x^(i), z^(i) = j; Θ) / Q(z^(i) = j) ]
                    ≥ Σ_j Q(z^(i) = j) log [ P(x^(i), z^(i) = j; Θ) / Q(z^(i) = j) ]

◮ If we choose Q to be the posterior under the old parameters,

    Q(z^(i) = j) = p(z^(i) = j | x^(i), Θold) = P(x^(i), z^(i) = j; Θold) / P(x^(i); Θold)

then the inequality becomes an equality!
M-step: update the parameters by maximizing the expected complete-data log-likelihood:

    Θnew = argmax_Θ Q(Θ, Θold)
Marginalizing out the latent variable recovers the mixture density:

    p(x) = Σ_{j=1}^K p(z = j) p(x | z = j) = Σ_{j=1}^K πj N(x | µj, Σj)
For the Gaussian mixture model, the expected complete-data log-likelihood is

    Q(Θ, Θold) = Σ_i Σ_k r_k^(i) [ log(πk) + log(N(x^(i); µk, Σk)) ]

where r_k^(i) = p(z^(i) = k | x^(i); Θold) are the responsibilities.
Maximizing Q(Θ, Θold) = Σ_i Σ_k r_k^(i) [ log(πk) + log(N(x^(i); µk, Σk)) ] with respect to µk, Σk, and πk gives closed-form updates:

    µk = (1/Nk) Σ_{n=1}^N r_k^(n) x^(n)

    Σk = (1/Nk) Σ_{n=1}^N r_k^(n) (x^(n) − µk)(x^(n) − µk)^T

    πk = Nk / N,   where Nk = Σ_{n=1}^N r_k^(n)
EM algorithm for the Gaussian mixture model:
◮ E-step: Evaluate the responsibilities given current parameters

    r_k^(n) = πk N(x^(n) | µk, Σk) / Σ_{j=1}^K πj N(x^(n) | µj, Σj)

◮ M-step: Re-estimate the parameters given current responsibilities

    µk = (1/Nk) Σ_{n=1}^N r_k^(n) x^(n)

    Σk = (1/Nk) Σ_{n=1}^N r_k^(n) (x^(n) − µk)(x^(n) − µk)^T

    πk = Nk / N,   where Nk = Σ_{n=1}^N r_k^(n)

◮ Evaluate the log likelihood and check for convergence
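These steps can be assembled into a minimal EM loop. This is a sketch rather than the lecture's reference code: function names are illustrative, initial means are passed in explicitly, and a small ridge is added to each covariance for numerical stability (a crude guard against the singularities noted earlier).

```python
import numpy as np

def gauss_pdf(x, mu, Sigma):
    """Multivariate normal density N(x | mu, Sigma)."""
    d = len(mu)
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / norm

def em_gmm(X, K, mus_init, iters=100, tol=1e-6):
    """Fit a K-component GMM to X by EM, starting from the given means."""
    N, D = X.shape
    pis = np.full(K, 1.0 / K)
    mus = np.array(mus_init, dtype=float)
    Sigmas = np.array([np.cov(X.T) + 1e-6 * np.eye(D) for _ in range(K)])
    prev_ll = -np.inf
    for _ in range(iters):
        # E-step: responsibilities r[n, k] proportional to pi_k N(x^(n) | mu_k, Sigma_k)
        P = np.array([[pis[k] * gauss_pdf(x, mus[k], Sigmas[k])
                       for k in range(K)] for x in X])
        ll = np.log(P.sum(axis=1)).sum()      # log likelihood of the data
        R = P / P.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from responsibility-weighted data
        Nk = R.sum(axis=0)
        mus = (R.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mus[k]
            Sigmas[k] = (R[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)
        pis = Nk / N
        # Evaluate log likelihood and check for convergence
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pis, mus, Sigmas
```

On well-separated clusters with a sensible initialization, the recovered means land near the true cluster centers; in practice one would also try several random restarts, since the objective is non-convex.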
◮ Solution: Variational inference (see CSC412)