SLIDE 1

Learning From Data Lecture 19 A Peek At Unsupervised Learning

  • k-Means Clustering
  • Probability Density Estimation
  • Gaussian Mixture Models

M. Magdon-Ismail
CSCI 4100/6100

slide-2
SLIDE 2

recap: Radial Basis Functions

Nonparametric RBF (no training):

$$g(\mathbf{x}) = \sum_{n=1}^{N} \frac{\alpha_n(\mathbf{x})}{\sum_{m=1}^{N} \alpha_m(\mathbf{x})} \, y_n, \qquad \alpha_n(\mathbf{x}) = \phi\!\left(\frac{\|\mathbf{x} - \mathbf{x}_n\|}{r}\right) \quad \text{(bump on } \mathbf{x}_n\text{)}$$

[Figure: nonparametric RBF fit, r = 0.05]

Parametric k-RBF-Network (a linear model given the µ_j; choose the µ_j as centers of k clusters of the data):

$$h(\mathbf{x}) = w_0 + \sum_{j=1}^{k} w_j \, \phi\!\left(\frac{\|\mathbf{x} - \boldsymbol{\mu}_j\|}{r}\right) = \mathbf{w}^{\mathsf{t}} \boldsymbol{\Phi}(\mathbf{x}) \quad \text{(bump on } \boldsymbol{\mu}_j\text{)}$$

[Figure: k-RBF-network fits, k = 4 with r = 1, and k = 10, regularized]
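As a concrete illustration, a minimal NumPy sketch of the nonparametric RBF estimate above, with a Gaussian bump for φ (the function and variable names are illustrative, not from the lecture):

```python
import numpy as np

def phi(z):
    # Gaussian bump; any normalizing constant cancels in the ratio below.
    return np.exp(-0.5 * z**2)

def rbf_predict(x, X, y, r):
    """Nonparametric RBF estimate g(x): a normalized, distance-weighted
    average of the N training targets. No training step is needed."""
    alpha = phi(np.linalg.norm(X - x, axis=1) / r)  # alpha_n(x), one bump per point
    return alpha @ y / alpha.sum()
```

Each query touches all N data points, which is what motivates the parametric k-RBF-network with k ≪ N bumps.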

SLIDE 3

Unsupervised Learning

  • Preprocessor to organize the data for supervised learning:
    • organize data for faster nearest-neighbor search;
    • determine centers for RBF bumps.
  • Important to be able to organize the data to identify patterns:
    • learn the patterns in data (e.g. the patterns in a language) before getting into a supervised setting;
    • amazon.com organizes books into categories.

SLIDE 4

Clustering Digits

[Figure: left, the 21-NN rule on the digits data (10 classes); right, a 10-clustering of the same data. Axes: Average Intensity vs. Symmetry.]

SLIDE 5

Clustering

A cluster is a collection of points S. A k-clustering is a partition of the data into k clusters S_1, ..., S_k:

$$\bigcup_{j=1}^{k} S_j = \mathcal{D}, \qquad S_i \cap S_j = \emptyset \ \text{ for } i \neq j.$$

Each cluster has a center µ_j.

SLIDE 6

How good is a clustering?

Points in a cluster should be similar (close to each other, and to the center).

Error in cluster j:

$$E_j = \sum_{\mathbf{x}_n \in S_j} \|\mathbf{x}_n - \boldsymbol{\mu}_j\|^2.$$

k-Means Clustering Error:

$$E_{\text{in}}(S_1, \dots, S_k; \boldsymbol{\mu}_1, \dots, \boldsymbol{\mu}_k) = \sum_{j=1}^{k} E_j = \sum_{n=1}^{N} \|\mathbf{x}_n - \boldsymbol{\mu}(\mathbf{x}_n)\|^2,$$

where µ(x_n) is the center of the cluster to which x_n belongs.
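A minimal NumPy sketch of this error, assuming the clustering is represented by an integer cluster label per point (illustrative names):

```python
import numpy as np

def k_means_error(X, labels, mu):
    """E_in: squared distance from each point to the center of its cluster.
    X: (N, d) data; labels: (N,) cluster index per point; mu: (k, d) centers."""
    return np.sum(np.linalg.norm(X - mu[labels], axis=1) ** 2)
```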

SLIDE 7

k-Means Clustering

You get to pick S_1, ..., S_k and µ_1, ..., µ_k to minimize E_in(S_1, ..., S_k; µ_1, ..., µ_k).

If the centers µ_j are known, picking the sets is easy: add to S_j all points closest to µ_j.

If the clusters S_j are known, picking the centers is easy: center µ_j is the centroid of cluster S_j,

$$\boldsymbol{\mu}_j = \frac{1}{|S_j|} \sum_{\mathbf{x}_n \in S_j} \mathbf{x}_n.$$

SLIDE 8

Lloyd’s Algorithm for k-Means Clustering

$$E_{\text{in}}(S_1, \dots, S_k; \boldsymbol{\mu}_1, \dots, \boldsymbol{\mu}_k) = \sum_{n=1}^{N} \|\mathbf{x}_n - \boldsymbol{\mu}(\mathbf{x}_n)\|^2$$

1: Initialize: pick well separated centers µ_j.
2: Update S_j to be all points closest to µ_j:
$$S_j \leftarrow \{\mathbf{x}_n : \|\mathbf{x}_n - \boldsymbol{\mu}_j\| \le \|\mathbf{x}_n - \boldsymbol{\mu}_\ell\| \text{ for } \ell = 1, \dots, k\}.$$
3: Update µ_j to the centroid of S_j:
$$\boldsymbol{\mu}_j \leftarrow \frac{1}{|S_j|} \sum_{\mathbf{x}_n \in S_j} \mathbf{x}_n.$$
4: Repeat steps 2 and 3 until E_in stops decreasing.


SLIDES 9-10

Lloyd's Algorithm for k-Means Clustering (continued)

[Slides 9 and 10 repeat the algorithm of Slide 8, illustrating the cluster-update (step 2) and center-update (step 3) iterations on the data.]
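A compact NumPy sketch of the algorithm as stated on Slide 8 (illustrative names; since the result depends on the initial centers, a practical run would try several restarts and keep the clustering with the smallest E_in):

```python
import numpy as np

def lloyd(X, k, max_iter=100, seed=0):
    """Lloyd's algorithm for k-means: alternate steps 2 and 3
    until E_in stops decreasing. X: (N, d) data."""
    rng = np.random.default_rng(seed)
    # Step 1: initialize centers (the slide suggests well separated centers;
    # k distinct random data points are a common simple choice).
    mu = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    prev_error = np.inf
    for _ in range(max_iter):
        # Step 2: assign each point to its closest center.
        dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)  # (N, k)
        labels = dists.argmin(axis=1)
        # Step 3: move each (non-empty) cluster's center to its centroid.
        for j in range(k):
            if np.any(labels == j):
                mu[j] = X[labels == j].mean(axis=0)
        # Step 4: stop once E_in no longer decreases.
        error = np.sum(np.linalg.norm(X - mu[labels], axis=1) ** 2)
        if error >= prev_error:
            break
        prev_error = error
    return labels, mu
```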

SLIDE 11

Application to k-RBF-Network

[Figure: 10-center RBF-network vs. 300-center RBF-network decision boundaries on the digits data]

Choosing k: knowledge of the problem (10 digits) or cross-validation (CV).

SLIDE 12

Probability Density Estimation

P(x) measures how likely it is to generate inputs similar to x. Estimating P(x) results in a 'softer/finer' representation than clustering:

clusters are regions of high probability.

SLIDE 13

Parzen Windows – RBF density estimation

Basic idea: put a bump of 'size' (volume) 1/N on each data point.

$$\hat{P}(\mathbf{x}) = \frac{1}{N r^d} \sum_{i=1}^{N} \phi\!\left(\frac{\|\mathbf{x} - \mathbf{x}_i\|}{r}\right), \qquad \phi(z) = \frac{1}{(2\pi)^{d/2}} \, e^{-z^2/2}$$

[Figure: data points x with the resulting density estimate P̂(x)]
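A small NumPy sketch of this Parzen-window estimate with the Gaussian φ above (illustrative names):

```python
import numpy as np

def parzen_density(x, X, r):
    """P_hat(x): the average of N Gaussian bumps of width r, one per data point."""
    N, d = X.shape
    z = np.linalg.norm(X - x, axis=1) / r                 # scaled distance to each point
    bumps = np.exp(-0.5 * z**2) / (2 * np.pi) ** (d / 2)  # phi(z)
    return bumps.sum() / (N * r**d)
```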

SLIDE 14

Digits Data

[Figure: RBF density estimate (left) and its density contours (right) on the digits data]

SLIDE 15

The Gaussian Mixture Model (GMM)

Instead of N bumps → k ≪ N bumps.

(Similar to nonparametric RBF → parametric k-RBF-network.)

Instead of uniform spherical bumps, each bump has its own shape. Bump centers: µ_1, ..., µ_k. Bump shapes: Σ_1, ..., Σ_k.

Gaussian formula for the bump:

$$N(\mathbf{x}; \boldsymbol{\mu}_j, \Sigma_j) = \frac{1}{(2\pi)^{d/2} |\Sigma_j|^{1/2}} \, e^{-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_j)^{\mathsf{t}} \Sigma_j^{-1} (\mathbf{x} - \boldsymbol{\mu}_j)}.$$

SLIDE 16

GMM Density Estimate

$$N(\mathbf{x}; \boldsymbol{\mu}_j, \Sigma_j) = \frac{1}{(2\pi)^{d/2} |\Sigma_j|^{1/2}} \, e^{-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_j)^{\mathsf{t}} \Sigma_j^{-1} (\mathbf{x} - \boldsymbol{\mu}_j)}$$

$$\hat{P}(\mathbf{x}) = \sum_{j=1}^{k} w_j \, N(\mathbf{x}; \boldsymbol{\mu}_j, \Sigma_j) \qquad \text{(sum of } k \text{ weighted bumps)}$$

$$w_j > 0, \qquad \sum_{j=1}^{k} w_j = 1.$$

You get to pick {w_j, µ_j, Σ_j}_{j=1,...,k}.
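A NumPy sketch of this mixture density (illustrative names; assumes each Σ_j is a full covariance matrix):

```python
import numpy as np

def gaussian_bump(x, mu, Sigma):
    """The Gaussian bump N(x; mu, Sigma)."""
    d = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / norm

def gmm_density(x, w, mus, Sigmas):
    """P_hat(x): a sum of k weighted bumps, with w_j > 0 and sum(w) = 1."""
    return sum(w[j] * gaussian_bump(x, mus[j], Sigmas[j]) for j in range(len(w)))
```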

SLIDE 17

Maximum Likelihood Estimation

Pick {w_j, µ_j, Σ_j}_{j=1,...,k} to best explain the data: maximize the likelihood of the data given {w_j, µ_j, Σ_j}_{j=1,...,k}, i.e. maximize $\prod_{n=1}^{N} \hat{P}(\mathbf{x}_n)$ (equivalently, $\sum_{n=1}^{N} \ln \hat{P}(\mathbf{x}_n)$).

(We saw this when we derived the cross-entropy error for logistic regression.)
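For concreteness, the objective as a sketch, reusing the hypothetical gmm_density helper from the previous slide:

```python
import numpy as np

def log_likelihood(X, w, mus, Sigmas):
    """The quantity maximized over {w_j, mu_j, Sigma_j}: sum of log densities."""
    return sum(np.log(gmm_density(x, w, mus, Sigmas)) for x in X)
```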

SLIDE 18

Expectation-Maximization: The E-M Algorithm

A simple algorithm to get to a local maximum of the likelihood:

  • Partition the variables into two sets; given one set, you can estimate the other.
  • 'Bootstrap' your way to a decent solution.

Lloyd's algorithm for k-means is an example, for 'hard clustering'.

SLIDE 19

Bump Memberships

Fraction of x_n belonging to bump j (a 'hidden variable'): γ_nj.

$$N_j = \sum_{n=1}^{N} \gamma_{nj} \qquad \text{('number' of points in bump } j\text{)}$$

$$w_j = \frac{N_j}{N} \qquad \text{(probability of bump } j\text{)}$$

$$\boldsymbol{\mu}_j = \frac{1}{N_j} \sum_{n=1}^{N} \gamma_{nj} \, \mathbf{x}_n \qquad \text{(centroid of bump } j\text{)}$$

$$\Sigma_j = \frac{1}{N_j} \sum_{n=1}^{N} \gamma_{nj} \, \mathbf{x}_n \mathbf{x}_n^{\mathsf{t}} - \boldsymbol{\mu}_j \boldsymbol{\mu}_j^{\mathsf{t}} \qquad \text{(covariance matrix of bump } j\text{)}$$
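These four updates as a NumPy sketch, for a membership matrix gamma of shape (N, k) (illustrative names):

```python
import numpy as np

def m_step(X, gamma):
    """Estimate (w_j, mu_j, Sigma_j) for every bump from memberships gamma."""
    N, d = X.shape
    Nj = gamma.sum(axis=0)                        # 'number' of points in each bump
    w = Nj / N                                    # bump probabilities
    mu = (gamma.T @ X) / Nj[:, None]              # weighted centroids, (k, d)
    Sigma = np.array([(gamma[:, j] * X.T) @ X / Nj[j] - np.outer(mu[j], mu[j])
                      for j in range(len(Nj))])   # weighted covariances, (k, d, d)
    return w, mu, Sigma
```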


SLIDE 20

Bump Memberships (continued)

[Slide 20 repeats Slide 19: given the memberships γ_nj, the parameters w_j, µ_j, Σ_j above follow.]

SLIDE 21

Re-Estimating Bump Memberships

$$\gamma_{nj} = \frac{w_j \, N(\mathbf{x}_n; \boldsymbol{\mu}_j, \Sigma_j)}{\sum_{\ell=1}^{k} w_\ell \, N(\mathbf{x}_n; \boldsymbol{\mu}_\ell, \Sigma_\ell)}$$

γ_nj is the probability that x_n came from bump j (Bayes' rule):

probability of bump j: w_j; probability density for x_n given bump j: N(x_n; µ_j, Σ_j).
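The corresponding re-estimation as a NumPy sketch, reusing the hypothetical gaussian_bump helper from the Slide 16 sketch:

```python
import numpy as np

def e_step(X, w, mus, Sigmas):
    """gamma[n, j]: posterior probability that x_n came from bump j (Bayes' rule)."""
    gamma = np.array([[w[j] * gaussian_bump(x, mus[j], Sigmas[j])
                       for j in range(len(w))] for x in X])  # unnormalized, (N, k)
    return gamma / gamma.sum(axis=1, keepdims=True)          # each row sums to 1
```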

SLIDE 22

E-M Algorithm

E-M Algorithm for GMMs:

1: Start with estimates for the bump memberships γ_nj.
2: Estimate w_j, µ_j, Σ_j given the bump memberships.
3: Update the bump memberships given w_j, µ_j, Σ_j.
4: Iterate to step 2 until convergence.
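Putting the pieces together, a minimal driver loop, assuming the m_step and e_step sketches from Slides 19 and 21 (a practical version would also regularize each Σ_j and stop when the log-likelihood converges):

```python
import numpy as np

def em_gmm(X, k, n_iter=100, seed=0):
    """E-M for a k-bump GMM: alternate parameter and membership updates."""
    rng = np.random.default_rng(seed)
    gamma = rng.dirichlet(np.ones(k), size=len(X))  # step 1: random initial memberships
    for _ in range(n_iter):
        w, mus, Sigmas = m_step(X, gamma)           # step 2: parameters given memberships
        gamma = e_step(X, w, mus, Sigmas)           # step 3: memberships given parameters
    return w, mus, Sigmas, gamma
```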

SLIDE 23

GMM on Digits Data

[Figure: 10-center GMM density estimate (left) and its density contours (right) on the digits data]
