

SLIDE 1

EM, K-Means, GNG

4-6-16

SLIDE 2

Reading Quiz

Which of the following can be considered an instance of the EM algorithm?

a) Agglomerative clustering
b) Divisive clustering
c) K-means clustering
d) Growing neural gas

SLIDE 3

EM algorithm

E step: “expectation” … terrible name

  • Classify the data using the current model.

M step: “maximization” … slightly less terrible name

  • Generate the best model using the current classification of the data.

Initialize the model, then alternate E and M steps until convergence.
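In code, the alternation is just a loop. Below is a minimal sketch (not from the slides): e_step and m_step are hypothetical placeholders for the problem-specific steps, and convergence is detected when the E step stops changing the classification. It assumes e_step returns a plain list so an equality test works.

    # Minimal sketch of the generic EM loop described above.
    # `e_step` classifies the data under the current model;
    # `m_step` refits the model to that classification.
    def em(data, model, e_step, m_step, max_iters=100):
        assignment = None
        for _ in range(max_iters):
            new_assignment = e_step(data, model)   # E: classify with current model
            if new_assignment == assignment:       # converged: classification stable
                break
            assignment = new_assignment
            model = m_step(data, assignment)       # M: best model for the classification
        return model, assignment

The k-means and Gaussian-mixture slides that follow are concrete instances of exactly this loop.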

SLIDE 4

K-means algorithm

Model: k clusters, each represented by a centroid.

E step:

  • Assign each point to the closest centroid.

M step:

  • Move each centroid to the mean of the points assigned to it.

Convergence: an E step in which no point's assignment changed.
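A minimal NumPy sketch of these E/M steps, assuming data is an (n, d) array and centroids is a (k, d) float array that has already been initialized (initialization options are on the next slide):

    import numpy as np

    def kmeans(data, centroids, max_iters=100):
        assignment = None
        for _ in range(max_iters):
            # E step: assign each point to the closest centroid.
            dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
            new_assignment = dists.argmin(axis=1)
            if assignment is not None and np.array_equal(new_assignment, assignment):
                break  # convergence: an E step changed no assignments
            assignment = new_assignment
            # M step: move each centroid to the mean of its assigned points.
            for j in range(len(centroids)):
                members = data[assignment == j]
                if len(members) > 0:  # leave a centroid alone if its cluster is empty
                    centroids[j] = members.mean(axis=0)
        return centroids, assignment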

SLIDE 5

Initializing k-means

Reasonable options:

1) Start with a random E step.

  • Randomly assign each point to a cluster in {1, 2, …, k}.

2) Start with a random M step.

  a) Pick random centroids within the maximum range of the data.
  b) Pick random data points to use as initial centroids.
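Sketches of all three options, assuming NumPy and a numpy.random.Generator named rng; the function names are illustrative, not from the slides:

    import numpy as np

    def random_e_step(n_points, k, rng):
        # Option 1: randomly assign each point to a cluster in {0, ..., k-1}.
        return rng.integers(0, k, size=n_points)

    def random_centroids_in_range(data, k, rng):
        # Option 2a: random centroids within the range of the data.
        lo, hi = data.min(axis=0), data.max(axis=0)
        return rng.uniform(lo, hi, size=(k, data.shape[1]))

    def random_data_points(data, k, rng):
        # Option 2b: k distinct data points as initial centroids.
        idx = rng.choice(len(data), size=k, replace=False)
        return data[idx].astype(float)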

SLIDE 6

K-means in action

SLIDE 7

Other examples of EM

  • Naive Bayes soft clustering (from the reading)
  • Gaussian mixture model clustering
SLIDE 8

Gaussian mixture models

A Gaussian distribution is a multivariate generalization of a normal distribution (the classic bell curve).

A Gaussian mixture is a distribution comprised of several independent Gaussians. If we model our data as a Gaussian mixture, we’re saying that each data point was a random draw from one of several Gaussian distributions (but we may not know which).

SLIDE 9

EM for Gaussian mixture models

Model: data drawn from a mixture of k Gaussians.

E step:

  • Compute the (log) likelihood of the data.
    ○ Each point’s probability of being drawn from each Gaussian.

M step:

  • Update the mean and covariance of each Gaussian.
    ○ Weighted by how responsible that Gaussian was for each data point.
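A condensed sketch of one E/M round, using SciPy's multivariate_normal for the per-Gaussian densities. It assumes data is an (n, d) array and means, covs, and weights are already initialized (lists of length k); numerical safeguards such as covariance regularization are omitted.

    import numpy as np
    from scipy.stats import multivariate_normal

    def em_round(data, means, covs, weights):
        n, k = len(data), len(means)
        # E step: each point's weighted density under each Gaussian.
        resp = np.zeros((n, k))
        for j in range(k):
            resp[:, j] = weights[j] * multivariate_normal.pdf(data, means[j], covs[j])
        log_likelihood = np.log(resp.sum(axis=1)).sum()
        resp /= resp.sum(axis=1, keepdims=True)  # normalize to responsibilities
        # M step: refit each Gaussian, weighted by its responsibilities.
        for j in range(k):
            r = resp[:, j]
            weights[j] = r.mean()
            means[j] = (r[:, None] * data).sum(axis=0) / r.sum()
            diff = data - means[j]
            covs[j] = (r[:, None, None] * diff[:, :, None] * diff[:, None, :]).sum(axis=0) / r.sum()
        return means, covs, weights, log_likelihood

Repeating em_round until the returned log-likelihood stops improving gives the convergence check for this instance of EM.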

SLIDE 10

How do we pick k?

There’s no hard rule.

  • Sometimes the application for which the clusters will be used dictates k.
  • If k can be flexible, then we need to consider the tradeoffs:

    ○ Higher k will always decrease the error (increase the likelihood).
    ○ Lower k will always produce a simpler model.
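A quick illustration of the first tradeoff, reusing the kmeans and random_data_points sketches from the earlier slides (the data here is synthetic, purely for demonstration):

    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.normal(size=(200, 2))
    for k in range(1, 8):
        centroids, assignment = kmeans(data, random_data_points(data, k, rng))
        error = ((data - centroids[assignment]) ** 2).sum()
        print(k, error)  # error shrinks as k grows, while the model gets more complex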

SLIDE 11

Growing neural gas

0) Start with two random connected nodes, then repeat 1...9:

1) Pick a random data point.
2) Find the two closest nodes to the data point.
3) Increment the age of all edges from the closest node.
4) Add the squared distance to the error of the closest node.
5) Move the closest node and all of its neighbors toward the data point.
   • Move the closest node more than its neighbors.
6) Connect the two closest nodes or reset their edge age.
7) Remove old edges; if a node is isolated, delete it.
8) Every λ iterations, add a new node.
   • Between the highest-error node and its highest-error neighbor.
9) Decay all errors.
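A compact sketch of steps 0-9. The parameter values (eps_b, eps_n, max_age, lam, alpha, decay) are conventional choices, not from the slides, and the node-deletion part of step 7 is omitted to keep node indices stable; treat the whole routine as illustrative.

    import numpy as np

    def gng(data, n_iters=10000, eps_b=0.05, eps_n=0.006,
            max_age=50, lam=100, alpha=0.5, decay=0.995, max_nodes=100):
        rng = np.random.default_rng(0)
        nodes = [data[rng.integers(len(data))].astype(float) for _ in range(2)]  # 0)
        error = [0.0, 0.0]
        edges = {(0, 1): 0}  # maps an edge (i, j) with i < j to its age
        for it in range(1, n_iters + 1):
            x = data[rng.integers(len(data))]                  # 1) random data point
            d = [np.sum((node - x) ** 2) for node in nodes]
            s1, s2 = (int(i) for i in np.argsort(d)[:2])       # 2) two closest nodes
            for e in edges:                                    # 3) age s1's edges
                if s1 in e:
                    edges[e] += 1
            error[s1] += d[s1]                                 # 4) accumulate error
            nodes[s1] += eps_b * (x - nodes[s1])               # 5) move the winner...
            for a, b in edges:
                if s1 in (a, b):
                    nb = b if a == s1 else a
                    nodes[nb] += eps_n * (x - nodes[nb])       # ...its neighbors less
            edges[tuple(sorted((s1, s2)))] = 0                 # 6) connect / reset age
            edges = {e: age for e, age in edges.items() if age <= max_age}  # 7)
            if it % lam == 0 and len(nodes) < max_nodes:       # 8) insert a node
                q = max(range(len(nodes)), key=lambda i: error[i])
                nbrs = [b if a == q else a for a, b in edges if q in (a, b)]
                if nbrs:
                    f = max(nbrs, key=lambda i: error[i])
                    nodes.append((nodes[q] + nodes[f]) / 2)    # midpoint of q and f
                    error[q] *= alpha
                    error[f] *= alpha
                    error.append(error[q])
                    r = len(nodes) - 1
                    edges.pop(tuple(sorted((q, f))), None)
                    edges[(q, r)] = 0
                    edges[(f, r)] = 0
            error = [e * decay for e in error]                 # 9) decay all errors
        return np.array(nodes), edges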

SLIDE 12

Adjusting nodes based on one data point

SLIDE 13

Adjusting nodes based on one data point

(Figure annotations: these edges get aged; this node’s error increases.)

SLIDE 14

Every λ iterations, add a new node

(Figure annotations: highest-error node; highest-error neighbor.)

SLIDE 15

Growing neural gas

0) Start with two random connected nodes, then repeat 1...9:

1) Pick a random data point.
2) Find the two closest nodes to the data point.
3) Increment the age of all edges from the closest node.
4) Add the squared distance to the error of the closest node.
5) Move the closest node and all of its neighbors toward the data point.
   • Move the closest node more than its neighbors.
6) Connect the two closest nodes or reset their edge age.
7) Remove old edges.
8) Every λ iterations, add a new node.
   • Between the highest-error node and its highest-error neighbor.
9) Decay all errors.

SLIDE 16

Growing neural gas in action

SLIDE 17

Discussion question

What unsupervised learning problem is growing neural gas solving? Is it clustering? Is it dimensionality reduction? Is it something else?