SLIDE 1

Clustering and K-means

SLIDE 2

Root Mean Square Error (RMS)

Data: $\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_N \in \mathbb{R}^d$

Approximations: $\vec{z}_1, \vec{z}_2, \ldots, \vec{z}_N \in \mathbb{R}^d$

Root Mean Square error $= \sqrt{\frac{1}{N}\sum_{i=1}^{N} \lVert \vec{x}_i - \vec{z}_i \rVert_2^2}$
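A minimal numpy sketch of this error (array shapes and names are illustrative, not from the slides):

```python
import numpy as np

def rms_error(X, Z):
    """Root mean square error between data points X and their approximations Z.

    X, Z: arrays of shape (N, d), one point per row.
    """
    return np.sqrt(np.mean(np.sum((X - Z) ** 2, axis=1)))
```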

SLIDE 3

PCA based prediction

Data: $\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_N \in \mathbb{R}^d$

Mean vector: $\vec{\mu}$

Top $k$ eigenvectors: $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k$

Approximation of $\vec{x}_j$: $\vec{z}_j = \vec{\mu} + \sum_{i=1}^{k} (\vec{v}_i \cdot \vec{x}_j)\,\vec{v}_i$

RMS Error $= \sqrt{\frac{1}{N}\sum_{i=1}^{N} \lVert \vec{x}_i - \vec{z}_i \rVert_2^2}$
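A short numpy sketch of this approximation, following the formula above (variable names are illustrative):

```python
import numpy as np

def pca_approximation(X, mu, V):
    """Approximate each row of X with the mean vector mu plus its projection
    onto the top-k eigenvectors V (one eigenvector per row, shape (k, d))."""
    coeffs = X @ V.T          # (N, k): the dot products v_i . x_j
    return mu + coeffs @ V    # (N, d): mu + sum_i (v_i . x_j) v_i
```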

SLIDE 4

Regression based Prediction

Data: $(\vec{x}_1, y_1), (\vec{x}_2, y_2), \ldots, (\vec{x}_N, y_N)$ with $\vec{x}_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$

Input: $\vec{x} \in \mathbb{R}^d$; Output: $y \in \mathbb{R}$

Approximation of $y$ given $\vec{x}$: $\hat{y} = a_0 + \sum_{i=1}^{d} a_i x_i$

RMS Error $= \sqrt{\frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$
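As an illustration, a least-squares fit of the coefficients $a_0, a_1, \ldots, a_d$ and the corresponding RMS error in numpy (not prescribed by the slides):

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit of y ~ a0 + sum_i a_i x_i for data X of shape (N, d)."""
    X1 = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend a ones column for a0
    coeffs, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coeffs                                   # [a0, a1, ..., ad]

def rms_error_regression(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))
```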

SLIDE 5

K-means clustering

Data: $\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_N \in \mathbb{R}^d$

Model: $k$ representatives $\vec{r}_1, \vec{r}_2, \ldots, \vec{r}_k \in \mathbb{R}^d$

Approximation of $\vec{x}_j$: $\vec{z}_j = \arg\min_{\vec{r}_i} \lVert \vec{x}_j - \vec{r}_i \rVert_2^2$ = the representative closest to $\vec{x}_j$

RMS Error $= \sqrt{\frac{1}{N}\sum_{i=1}^{N} \lVert \vec{x}_i - \vec{z}_i \rVert_2^2}$
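A numpy sketch of this nearest-representative assignment and the resulting error (names illustrative):

```python
import numpy as np

def assign_to_representatives(X, R):
    """Index of the closest representative in R (k, d) for each row of X (N, d)."""
    dists = np.linalg.norm(X[:, None, :] - R[None, :, :], axis=2)  # (N, k) distances
    return np.argmin(dists, axis=1)

def kmeans_rms_error(X, R):
    Z = R[assign_to_representatives(X, R)]   # approximate each point by its representative
    return np.sqrt(np.mean(np.sum((X - Z) ** 2, axis=1)))
```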

SLIDE 6

K-means Algorithm

Initialize $k$ representatives $\vec{r}_1, \vec{r}_2, \ldots, \vec{r}_k \in \mathbb{R}^d$

Iterate until convergence:

  • a. Associate each $\vec{x}_i$ with its closest representative: $\vec{x}_i \rightarrow \vec{r}_j$
  • b. Replace each representative $\vec{r}_j$ with the mean of the points assigned to $\vec{r}_j$

Both the a step and the b step reduce the RMS error.
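A compact numpy sketch of these two steps (a plain Lloyd-style loop; the stopping rule and names are illustrative):

```python
import numpy as np

def kmeans(X, R, max_iters=100):
    """K-means iterations on data X (N, d) starting from representatives R (k, d)."""
    for _ in range(max_iters):
        # a. associate each point with its closest representative
        idx = np.argmin(np.linalg.norm(X[:, None, :] - R[None, :, :], axis=2), axis=1)
        # b. replace each representative with the mean of the points assigned to it
        new_R = np.array([X[idx == j].mean(axis=0) if np.any(idx == j) else R[j]
                          for j in range(len(R))])
        if np.allclose(new_R, R):        # converged: representatives stopped moving
            break
        R = new_R
    return R, idx
```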

SLIDE 7
SLIDE 8

Simple Initialization

Simplest initialization: choose representatives from the data points independently at random.

– Problem: some representatives are close to each other and some parts of the data have no representatives.
– K-means is a local search method – it can get stuck in local minima.
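For reference, this simplest scheme is just a uniform random draw of $k$ data points (numpy sketch; drawn without replacement here so the representatives are distinct):

```python
import numpy as np

def random_init(X, k, rng=np.random.default_rng()):
    """Pick k data points uniformly at random as the initial representatives."""
    return X[rng.choice(len(X), size=k, replace=False)].copy()
```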

SLIDE 9

Kmeans++

  • A different method for initializing representatives.
  • Spreads out the initial representatives.
  • Add representatives one by one.
  • Before adding a representative, define a distribution over the unselected data points.

Data: $\vec{x}_1, \ldots, \vec{x}_N$

Current representatives: $\vec{r}_1, \ldots, \vec{r}_j$

Distance of an example to the current representatives: $d(\vec{x}, \{\vec{r}_1, \ldots, \vec{r}_j\}) = \min_{1 \le i \le j} \lVert \vec{x} - \vec{r}_i \rVert$

Probability of selecting example $\vec{x}$ as the next representative: $P(\vec{x}) = \frac{1}{Z}\, d(\vec{x}, \{\vec{r}_1, \ldots, \vec{r}_j\})$
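A numpy sketch of this selection scheme (probabilities proportional to the distance defined above, with $Z$ as the normalizing sum; names illustrative):

```python
import numpy as np

def kmeanspp_init(X, k, rng=np.random.default_rng()):
    """Add representatives one by one; each new one is drawn with probability
    proportional to its distance from the already-chosen representatives."""
    reps = [X[rng.integers(len(X))]]                  # first representative: uniform at random
    while len(reps) < k:
        R = np.array(reps)
        d = np.min(np.linalg.norm(X[:, None, :] - R[None, :, :], axis=2), axis=1)
        reps.append(X[rng.choice(len(X), p=d / d.sum())])   # d.sum() plays the role of Z
    return np.array(reps)
```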
SLIDE 10

Example for Kmeans++

This is an unlikely initialization for kmeans++.

SLIDE 11

Parallelized Kmeans

  • Suppose the data points are partitioned randomly across several machines.
  • We want to perform the a and b steps with minimal communication between machines.
  • 1. Choose initial representatives and broadcast them to all machines.
  • 2. Each machine partitions its own data points according to the closest representative. This defines (key, value) pairs where key = index of the closest representative and value = the example.
  • 3. Compute the mean for each set by performing reduceByKey (most of the summing is done locally on each machine).
  • 4. Broadcast the new representatives to all machines.
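A hedged PySpark sketch of one such round, assuming the data is already an RDD of numpy vectors (function and variable names are illustrative, not from the slides):

```python
import numpy as np
from pyspark import SparkContext

def kmeans_round(sc: SparkContext, points_rdd, reps):
    """One a/b round: assign points to their closest representative, then
    recompute each representative as the mean of its assigned points."""
    reps_bc = sc.broadcast(np.array(reps))             # steps 1/4: broadcast representatives

    def to_pair(x):                                     # step 2: key = index of closest rep
        j = int(np.argmin(np.linalg.norm(reps_bc.value - x, axis=1)))
        return j, (x, 1)

    # step 3: sum vectors and counts per key (mostly locally), then divide to get the means
    sums = points_rdd.map(to_pair).reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1]))
    means = sums.mapValues(lambda s: s[0] / s[1]).collectAsMap()
    return np.array([means.get(j, reps[j]) for j in range(len(reps))])
```

Because reduceByKey pre-aggregates within each partition, only one partial sum per representative has to leave each machine per round.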
SLIDE 12

Clustering stability

SLIDE 13

Clustering stability

(Three panels: clustering using starting points 1, clustering using starting points 2, clustering using starting points 3.)

SLIDE 14

Measuring clustering stability

               x1  x2  x3  x4  x5  x6   …   …  xn
Clustering 1    1   1   3   1   3   2   2   2   3
Clustering 2    2   2   1   2   1   3   3   3   1
Clustering 3    2   2   3   2   3   1   1   1   3
Clustering 4    1   1   1   1   3   3   3   3   1

The entry in row "clustering j", column "xi" contains the index of the representative closest to xi under clustering j. The first three clusterings are completely consistent with each other. The fourth clustering has a disagreement in x5.
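One illustrative way to check this kind of consistency in code (not defined on the slides) is to compare whether two label vectors group every pair of points in the same way, ignoring the label names themselves:

```python
import numpy as np

def same_grouping(labels_a, labels_b):
    """True if two clusterings put every pair of examples together/apart identically."""
    a, b = np.asarray(labels_a), np.asarray(labels_b)
    return bool(np.all((a[:, None] == a[None, :]) == (b[:, None] == b[None, :])))
```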

SLIDE 15

How to quantify stability?

  • We say that a clustering is stable if the examples are always grouped in the same way.
  • When we have thousands of examples, we cannot expect all of them to always be grouped the same way.
  • We need a way to quantify the stability.
  • Basic idea: measure how much groupings differ between clusterings.

SLIDE 16

Entropy

A partition $G$ of the data defines a distribution over the parts: $p_1 + p_2 + \cdots + p_k = 1$

The information in this partition is measured by the entropy:

$H(G) = H(p_1, p_2, \ldots, p_k) = \sum_{i=1}^{k} p_i \log_2 \frac{1}{p_i}$

$H(G)$ is a number between $0$ (one part with probability $1$) and $\log_2 k$ ($p_1 = p_2 = \cdots = p_k = \frac{1}{k}$).
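A small numpy sketch of this quantity, with the part probabilities estimated from cluster sizes (names illustrative):

```python
import numpy as np

def partition_entropy(labels):
    """Entropy H(G) of a partition, given the part label of each example."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()                  # p_i = fraction of examples in part i
    return float(np.sum(p * np.log2(1.0 / p)))
```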

SLIDE 17

Entropy of a combined partition

The combined partition $(G_1, G_2)$ groups together the examples that fall in the same part under both clusterings.

If clustering 1 and clustering 2 partition the data in the exact same way, then $G_1 = G_2$ and $H(G_1, G_2) = H(G_1) = H(G_2)$.

If clustering 1 and clustering 2 are independent (they partition the data independently of each other), then $H(G_1, G_2) = H(G_1) + H(G_2)$.

Suppose we produce many clusterings using many starting points, and we plot $H(G_1), H(G_1, G_2), \ldots, H(G_1, G_2, \ldots, G_i), \ldots$ as a function of $i$. If the graph increases like $i \log_2 k$ then the clustering is completely unstable. If the graph stops increasing after some $i$ then we have reached stability.
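A sketch of this stability curve in numpy: the combined partition is represented by the tuple of labels each example receives across clusterings, and its entropy is computed as on the previous slide (names illustrative):

```python
import numpy as np

def combined_entropy(label_rows):
    """Entropy of the combined partition (G1, ..., Gi): an example's part is the
    tuple of labels it receives across the clusterings in label_rows."""
    combined = np.array(label_rows).T                       # (N, i): labels per example
    _, counts = np.unique(combined, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * np.log2(1.0 / p)))

def stability_curve(label_rows):
    """H(G1), H(G1,G2), ...: plot these values against i to judge stability."""
    return [combined_entropy(label_rows[:i]) for i in range(1, len(label_rows) + 1)]
```

If the returned values keep growing roughly like $i \log_2 k$, the clusterings are unstable; if they level off after some $i$, the grouping has stabilized.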