

SLIDE 1

Clustering and Dimensionality Reduction

SLIDE 2

Preview

  • Clustering
    – K-means clustering
    – Mixture models
    – Hierarchical clustering

  • Dimensionality reduction
    – Principal component analysis
    – Multidimensional scaling
    – Isomap

SLIDE 3

Unsupervised Learning

  • Problem: Too much data!
  • Solution: Reduce it
  • Clustering: reduce the number of examples
  • Dimensionality reduction: reduce the number of dimensions

SLIDE 4

Clustering

  • Given set of examples
  • Divide them into subsets of “similar” examples
  • How to measure similarity?
  • How to evaluate quality of results?
SLIDE 5

K-Means Clustering

  • Pick random examples as initial means
  • Repeat until convergence:

– Assign each example to the nearest mean
– New mean = average of the examples assigned to it
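
A minimal NumPy sketch of this loop (illustrative code, not from the slides; the function and variable names are my own):

```python
import numpy as np

def k_means(X, k, n_iters=100, seed=0):
    """Cluster the rows of X into k groups by the iteration above."""
    rng = np.random.default_rng(seed)
    # Pick k random examples as the initial means.
    means = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign each example to the nearest mean (squared Euclidean distance).
        dists = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # New mean = average of the examples assigned to it.
        # (Empty clusters are not handled in this sketch.)
        new_means = np.array([X[labels == i].mean(axis=0) for i in range(k)])
        if np.allclose(new_means, means):  # converged
            break
        means = new_means
    return means, labels
```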

SLIDE 6

K-Means Works If …

  • Clusters are spherical
  • Clusters are well separated
  • Clusters are of similar volumes
  • Clusters have similar numbers of points
SLIDE 7

Mixture Models

$$P(x) = \sum_{i=1}^{n_c} P(c_i)\,P(x \mid c_i)$$

Objective function: log likelihood of the data

Naive Bayes:

$$P(x \mid c_i) = \prod_{j=1}^{n_d} P(x_j \mid c_i)$$

AutoClass: Naive Bayes with various xj models
Mixture of Gaussians: P(x|ci) = multivariate Gaussian
In general: P(x|ci) can be any distribution

SLIDE 8

Mixtures of Gaussians

[Figure: a one-dimensional mixture density p(x) plotted against x]

$$P(x \mid \mu_i) = \frac{1}{\sqrt{2\pi\sigma^2}}\,\exp\!\left(-\frac{1}{2}\left(\frac{x-\mu_i}{\sigma}\right)^2\right)$$
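
To make the two slides above concrete, here is a short sketch that evaluates a one-dimensional mixture-of-Gaussians density; the component weights, means, and variances are made-up example values:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma2):
    # P(x | mu_i) = exp(-(x - mu_i)^2 / (2 sigma^2)) / sqrt(2 pi sigma^2)
    return np.exp(-0.5 * (x - mu) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)

def mixture_pdf(x, weights, means, variances):
    # P(x) = sum_i P(c_i) P(x | c_i), here with Gaussian components
    return sum(w * gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))

# Example with two components (made-up parameters):
x = np.linspace(-5.0, 10.0, 200)
p = mixture_pdf(x, weights=[0.3, 0.7], means=[0.0, 4.0], variances=[1.0, 2.0])
```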

SLIDE 9

The EM Algorithm

Initialize parameters, ignoring the missing information
Repeat until convergence:
  E step: Compute expected values of the unobserved variables, assuming the current parameter values
  M step: Compute new parameter values to maximize the probability of the data (observed & estimated)
(Alternatively: initialize the expected values, ignoring the missing information)

SLIDE 10

EM for Mixtures of Gaussians

Initialization: Choose means at random, etc.

E step: For all examples xk:

$$P(\mu_i \mid x_k) = \frac{P(\mu_i)\,P(x_k \mid \mu_i)}{P(x_k)} = \frac{P(\mu_i)\,P(x_k \mid \mu_i)}{\sum_{i'} P(\mu_{i'})\,P(x_k \mid \mu_{i'})}$$

M step: For all components ci:

$$P(c_i) = \frac{1}{n_e} \sum_{k=1}^{n_e} P(\mu_i \mid x_k)$$

$$\mu_i = \frac{\sum_{k=1}^{n_e} x_k\,P(\mu_i \mid x_k)}{\sum_{k=1}^{n_e} P(\mu_i \mid x_k)}$$

$$\sigma_i^2 = \frac{\sum_{k=1}^{n_e} (x_k - \mu_i)^2\,P(\mu_i \mid x_k)}{\sum_{k=1}^{n_e} P(\mu_i \mid x_k)}$$
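
A compact NumPy sketch of these updates for a one-dimensional mixture (my own illustrative code; variable names follow the slide's notation where possible):

```python
import numpy as np

def em_gaussian_mixture(x, k, n_iters=50, seed=0):
    """EM for a 1-D mixture of k Gaussians; a sketch of the slide's updates."""
    rng = np.random.default_rng(seed)
    n_e = len(x)
    mu = rng.choice(x, size=k, replace=False)  # choose means at random
    sigma2 = np.full(k, x.var())               # common initial variance
    prior = np.full(k, 1.0 / k)                # P(c_i)
    for _ in range(n_iters):
        # E step: responsibilities P(mu_i | x_k), shape (k, n_e).
        lik = (np.exp(-0.5 * (x[None, :] - mu[:, None]) ** 2 / sigma2[:, None])
               / np.sqrt(2 * np.pi * sigma2[:, None]))   # P(x_k | mu_i)
        resp = prior[:, None] * lik
        resp /= resp.sum(axis=0, keepdims=True)          # normalize over i'
        # M step: re-estimate P(c_i), mu_i, and sigma_i^2.
        weight = resp.sum(axis=1)                        # sum_k P(mu_i | x_k)
        prior = weight / n_e
        mu = (resp * x[None, :]).sum(axis=1) / weight
        sigma2 = (resp * (x[None, :] - mu[:, None]) ** 2).sum(axis=1) / weight
    return prior, mu, sigma2
```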

SLIDE 11

Mixtures of Gaussians (cont.)

  • K-means clustering ≺ EM for mixtures of Gaussians
  • Mixtures of Gaussians ≺ Bayes nets
    (read "≺" as "is a special case of")
  • Also good for estimating the joint distribution of continuous variables

SLIDE 12

Hierarchical Clustering

  • Agglomerative clustering
    – Start with one cluster per example
    – Merge the two nearest clusters (criteria: min, max, avg, or mean distance)
    – Repeat until everything is in one cluster
    – Output a dendrogram

  • Divisive clustering
    – Start with all examples in one cluster
    – Split into two (e.g., by min-cut)
    – Etc.
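
Agglomerative clustering is available off the shelf; here is a short example using SciPy (assuming SciPy is installed), with average-link merging, one of the criteria listed above:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.default_rng(0).normal(size=(20, 2))  # toy data, 20 points in 2-D

# Agglomerative clustering with average-link merging;
# Z encodes the full merge tree (the dendrogram).
Z = linkage(X, method='average')

# Cut the tree to obtain, e.g., 3 flat clusters.
labels = fcluster(Z, t=3, criterion='maxclust')
# scipy.cluster.hierarchy.dendrogram(Z) would plot the tree (needs matplotlib).
```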

SLIDE 13

Dimensionality Reduction

  • Given data points in d dimensions
  • Convert them to data points in r < d dimensions
  • With minimal loss of information
SLIDE 14

Principal Component Analysis

Goal: Find the r-dimensional projection that best preserves variance

  • 1. Compute the mean vector µ and covariance matrix Σ of the original points
  • 2. Compute the eigenvectors and eigenvalues of Σ
  • 3. Select the top r eigenvectors
  • 4. Project the points onto the subspace they span:

y = A(x − µ)

where y is the new point, x is the old one, and the rows of A are the eigenvectors.
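
The four steps map directly onto a few lines of NumPy (an illustrative sketch, not a library routine):

```python
import numpy as np

def pca_project(X, r):
    """Project the rows of X onto the top-r principal components."""
    mu = X.mean(axis=0)                  # 1. mean vector mu
    Sigma = np.cov(X, rowvar=False)      #    and covariance matrix Sigma
    vals, vecs = np.linalg.eigh(Sigma)   # 2. eigenvalues/eigenvectors (ascending)
    top = np.argsort(vals)[::-1][:r]     # 3. indices of the top-r eigenvalues
    A = vecs[:, top].T                   #    rows of A are the top eigenvectors
    return (X - mu) @ A.T                # 4. y = A(x - mu) for every point
```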

SLIDE 15

Multidimensional Scaling

Goal: Find a projection that best preserves the inter-point distances

  xi: point in d dimensions
  yi: corresponding point in r < d dimensions
  δij: distance between xi and xj
  dij: distance between yi and yj

  • Define (e.g.)

$$E(y) = \sum_{i,j} \left(\frac{d_{ij} - \delta_{ij}}{\delta_{ij}}\right)^2$$

  • Find the yi's that minimize E by gradient descent (a sketch follows below)
  • Invariant to translations, rotations, and scalings
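
The slide calls for plain gradient descent; in the sketch below a quasi-Newton optimizer with numerically estimated gradients stands in for it, which is a substitution on my part:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import pdist

def mds(delta, r, seed=0):
    """Find n points in r dimensions whose pairwise distances d_ij
    approximate the given delta_ij by minimizing the stress E above."""
    n = delta.shape[0]
    target = delta[np.triu_indices(n, k=1)]   # delta_ij for i < j

    def stress(y_flat):
        d = pdist(y_flat.reshape(n, r))       # d_ij for the current y's
        return np.sum(((d - target) / target) ** 2)

    y0 = np.random.default_rng(seed).normal(size=n * r)
    res = minimize(stress, y0, method='L-BFGS-B')  # gradients estimated numerically
    return res.x.reshape(n, r)
```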
SLIDE 16

Isomap

Goal: Find projection onto nonlinear manifold

  • 1. Construct neighborhood graph G:
    For all xi, xj: if distance(xi, xj) < ε, then add edge (xi, xj) to G
  • 2. Compute shortest distances along the graph, δG(xi, xj) (e.g., by Floyd's algorithm)
  • 3. Apply multidimensional scaling to δG(xi, xj) (all three steps are sketched below)
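
A sketch of the three steps, reusing the mds() sketch from the previous slide and SciPy's shortest-path routine (which offers Floyd-Warshall among other methods) in place of a hand-written Floyd's algorithm:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import pdist, squareform

def isomap(X, r, eps):
    """Isomap sketch: ε-neighborhood graph -> graph distances -> MDS.
    Assumes the graph is connected (otherwise delta_G contains infinities)."""
    D = squareform(pdist(X))          # Euclidean distances between all x_i, x_j
    # 1. Neighborhood graph: keep edges with distance(x_i, x_j) < eps.
    G = np.where(D < eps, D, 0.0)     # 0 means "no edge" to the csgraph routines
    # 2. Shortest distances along the graph (Floyd-Warshall).
    delta_G = shortest_path(G, method='FW', directed=False)
    # 3. Apply multidimensional scaling to the graph distances.
    return mds(delta_G, r)            # mds() from the previous slide's sketch
```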
SLIDE 17

Summary

  • Clustering
    – K-means clustering
    – Mixture models
    – Hierarchical clustering

  • Dimensionality reduction
    – Principal component analysis
    – Multidimensional scaling
    – Isomap