Clustering and Dimensionality Reduction Stony Brook University - - PowerPoint PPT Presentation





SLIDE 1

Clustering and Dimensionality Reduction

Stony Brook University CSE545, Fall 2017

SLIDE 2

Goal: Generalize to new data

Original Data → Model → New Data?

Does the model accurately reflect new data?

SLIDE 3

Supervised vs. Unsupervised

Supervised

  • Predicting an outcome: E[y | X]
  • Loss function used to characterize quality of prediction
SLIDE 4

Supervised vs. Unsupervised

Supervised

  • Predicting an outcome: E[y | X]
  • Loss function used to characterize quality of prediction

E[y | X]: the expected value of y (something we are trying to predict) given X (our features, or “evidence” for what y should be)

SLIDE 5

Supervised vs. Unsupervised

Supervised

  • Predicting an outcome
  • Loss function used to characterize quality of prediction

Unsupervised

  • No outcome to predict
  • Goal: Infer properties of the data without a supervised loss function.
  • Often larger data.
  • Don’t need to worry about conditioning on another variable.
SLIDE 6
SLIDE 7
SLIDE 8

Concept, In Matrix Form:

[Matrix: columns f1, f2, f3, f4, … fp (p features); rows 1 … N (N observations)]

SLIDE 9

Concept, In Matrix Form:

[Matrix: columns f1, f2, f3, f4, … fp (p features); rows 1 … N (N observations)]
SLIDE 10

Concept, In Matrix Form:

[Matrix: columns f1 … fp (p features); rows 1 … N] → [Matrix: columns c1 … cp′ (p′ components); rows 1 … N]

Try to best represent X, but with only p′ columns.

Dimensionality reduction

SLIDE 11

Concept, In Matrix Form:

[Matrix: columns f1 … fp; rows 1 … N, with rows grouped into Cluster 1, Cluster 2, Cluster 3]

Clustering: Group observations based on the features (i.e. like reducing the number of observations into K groups).

SLIDE 12

Concept: in 2-d (clustering)

[Scatter plot: Feature 1 vs. Feature 2; each point is an observation]

SLIDE 13

Concept: in 2-d (clustering)

[Scatter plot: Feature 1 vs. Feature 2]

SLIDE 14

Clustering

Typical formalization: Given:

  • set of points
  • distance metric (Euclidean, cosine, etc…)
  • number of clusters (not always provided)

Do: Group observations together that are similar. Ideally,

  • Members of same cluster are the “same”.
  • Members of different clusters are “different”.

Keep in mind: usually many more than 2 dimensions.
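As a concrete illustration of the two distance metrics named above, a minimal NumPy sketch (the vectors are toy values, not from the slides):

```python
import numpy as np

# Two observations in a 4-dimensional feature space (toy values).
x = np.array([1.0, 0.0, 2.0, 1.0])
y = np.array([2.0, 1.0, 0.0, 1.0])

# Euclidean distance: square root of the sum of squared differences.
euclidean = np.sqrt(np.sum((x - y) ** 2))  # sqrt(6) ≈ 2.449

# Cosine distance: 1 minus the cosine of the angle between the vectors.
cosine = 1.0 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))  # 0.5

print(euclidean, cosine)
```

Note the two metrics disagree on what "similar" means: Euclidean cares about absolute position, cosine only about direction.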

SLIDE 15

Often many dimensions and no clean separation.

Clustering

SLIDE 16

Often many dimensions and no clean separation.

Clustering

Supposes observations have a “true” cluster.

SLIDE 17

K-Means Clustering

Clustering: Group similar observations, often over unlabeled data. K-means: A “prototype” method (i.e. not based on an algebraic model).

Euclidean Distance: d(x, y) = sqrt(Σ_j (x_j − y_j)²)

SLIDE 18

K-Means Clustering

Clustering: Group similar observations, often over unlabeled data. K-means: A “prototype” method (i.e. not based on an algebraic model).

Euclidean Distance: d(x, y) = sqrt(Σ_j (x_j − y_j)²)

centers = a random selection of k cluster centers
until centers converge:

  • 1. For all x_i, find the closest center (according to d)
  • 2. Recalculate each center as the mean of the points assigned to it
SLIDE 19

K-Means Clustering

Clustering: Group similar observations, often over unlabeled data. K-means: A “prototype” method (i.e. not based on an algebraic model).

Euclidean Distance: d(x, y) = sqrt(Σ_j (x_j − y_j)²)

centers = a random selection of k cluster centers
until centers converge:

  • 1. For all x_i, find the closest center (according to d)
  • 2. Recalculate each center as the mean of the points assigned to it

Example: http://shabal.in/visuals/kmeans/6.html
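The loop above can be sketched directly in NumPy. This is a minimal illustration (the function name and defaults are mine, not from the slides), assuming no cluster ever becomes empty:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal k-means sketch: X is (N, p); returns (centers, assignments)."""
    rng = np.random.default_rng(seed)
    # centers = a random selection of k cluster centers (here: k data points)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # 1. For all x_i, find the closest center (Euclidean distance d)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        # 2. Recalculate each center as the mean of its assigned points
        #    (sketch assumes no cluster ever becomes empty)
        new_centers = np.array([X[assign == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):  # until centers converge
            break
        centers = new_centers
    return centers, assign
```

Production implementations (e.g. scikit-learn's KMeans) add restarts and smarter initialization (k-means++) because the result depends on the random starting centers.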

SLIDE 20

K-Means Clustering

Understanding K-Means

(source: Scikit-Learn)

SLIDE 21

The Curse of Dimensionality

Problems with high-dimensional spaces:

  • 1. All points (i.e. observations) are nearly equally far apart.
  • 2. The angle between any two vectors is almost always close to 90 degrees (i.e. they are nearly orthogonal).
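Both effects are easy to observe empirically. A small NumPy experiment (the dimensions and sample sizes are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
spread = {}
angle = {}
for p in (2, 1000):
    # 250 pairs of random points in the unit cube [0, 1]^p.
    a = rng.random((250, p))
    b = rng.random((250, p))

    # 1. Relative spread of pairwise distances shrinks as p grows:
    #    all points become nearly equally far apart.
    dists = np.linalg.norm(a - b, axis=1)
    spread[p] = dists.std() / dists.mean()

    # 2. Angles between (mean-centered) vectors concentrate near 90 degrees.
    ac, bc = a - 0.5, b - 0.5
    cos = np.sum(ac * bc, axis=1) / (
        np.linalg.norm(ac, axis=1) * np.linalg.norm(bc, axis=1))
    angle[p] = np.degrees(np.arccos(cos)).mean()

print(spread, angle)
```

At p = 1000 the distance spread collapses to a few percent of the mean and the average angle sits within a degree or two of 90.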

SLIDE 22

Hierarchical Clustering

[Matrix: columns f1 … fp; rows 1 … N, grouped into Cluster 1, Cluster 2, Cluster 3, Cluster 4]

SLIDE 23

Hierarchical Clustering

[Matrix: columns f1 … fp; rows 1 … N, grouped into Cluster 1 through Cluster 6]

SLIDE 24

Hierarchical Clustering

  • Agglomerative (bottom up):

○ Initially, each point is a cluster
○ Repeatedly combine the two “nearest” clusters into one

  • Divisive (top down):

○ Start with one cluster and recursively split it

SLIDE 25

Hierarchical Clustering

  • Agglomerative (bottom up):

○ Initially, each point is a cluster
○ Repeatedly combine the two “nearest” clusters into one

  • Divisive (top down):

○ Start with one cluster and recursively split it

  • Regular K-Means is

“Point assignment clustering”:

○ Maintain a set of clusters
○ Points belong to “nearest” cluster

SLIDE 26

Hierarchical Clustering

  • Agglomerative (bottom up):

○ Initially, each point is a cluster
○ Repeatedly combine the two “nearest” clusters into one

SLIDE 27

Hierarchical Clustering

  • Agglomerative (bottom up):

○ Initially, each point is a cluster
○ Repeatedly combine the two “nearest” clusters into one
○ Stop when reaching a threshold in
  ■ Distance between points in cluster, or
  ■ Maximum distance of points from “center”
  ■ Maximum number of points

SLIDE 28

Hierarchical Clustering

  • Agglomerative (bottom up):

○ Initially, each point is a cluster
○ Repeatedly combine the two “nearest” clusters into one
○ Stop when reaching a threshold in
  ■ Distance between points in cluster, or
  ■ Maximum distance from “center” (in Euclidean space)
  ■ Maximum number of points

SLIDE 29

Hierarchical Clustering

  • Agglomerative (bottom up):

○ Initially, each point is a cluster
○ Repeatedly combine the two “nearest” clusters into one

But what if we have no “centroid”? (such as when using cosine distance)
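Average linkage is one answer to the no-centroid problem: it needs only pairwise distances, never a centroid, so cosine distance works. A sketch with SciPy (the toy data and group directions are my own choices):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
# Two groups of points with clearly different directions from the origin,
# so cosine distance separates them.
X = np.vstack([rng.normal([5, 0, 0], 0.1, size=(5, 3)),
               rng.normal([0, 5, 0], 0.1, size=(5, 3))])

# Agglomerative clustering: start with 10 singleton clusters and repeatedly
# merge the pair with the smallest average pairwise cosine distance.
Z = linkage(X, method="average", metric="cosine")

# Cut the resulting tree into 2 flat clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

The linkage matrix `Z` records every merge, so the same tree can be cut at other thresholds without re-clustering.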

SLIDE 30

Clustering: Applications

SLIDE 31

Clustering: Applications

SLIDE 32

Clustering: Applications

SLIDE 33

Clustering: Applications

(musicmachinery.com)

SLIDE 34

Clustering: Applications

(musicmachinery.com)

SLIDE 35

Concept: Dimensionality Reduction in 3-D, 2-D, and 1-D

Data (or, at least, what we want from the data) may be accurately represented with fewer dimensions.

SLIDE 36

Concept, In Matrix Form:

[Matrix: columns f1 … fp (p features); rows 1 … N] → [Matrix: columns c1 … cp′ (p′ components); rows 1 … N]

Try to best represent X, but with only p′ columns.

Dimensionality reduction

SLIDE 37

Dimensionality Reduction

Rank: Number of linearly independent columns of A (i.e. columns that can’t be derived from the other columns through linear combination).

Q: What is the rank of this matrix?

  3 1 2
  5 2 3
  2 1 1

SLIDE 38

Dimensionality Reduction

Rank: Number of linearly independent columns of A (i.e. columns that can’t be derived from the other columns).

Q: What is the rank of this matrix?

  3 1 2
  5 2 3
  2 1 1

A: 2. The 1st column is just the sum of the second two columns, so we can represent every column as a linear combination of 2 vectors:

  (1, 2, 1) and (2, 3, 1)
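NumPy can check the rank claim directly. The transcript garbles the slide's exact numbers, so the matrix below is a reconstruction consistent with the stated answer (a 3×3 matrix whose first column is the sum of the other two):

```python
import numpy as np

# 3x3 matrix whose 1st column equals the sum of the other two columns
# (values reconstructed; the transcript garbles the slide's numbers).
A = np.array([[3.0, 1.0, 2.0],
              [5.0, 2.0, 3.0],
              [2.0, 1.0, 1.0]])

# Only two linearly independent columns, so the rank is 2.
print(np.linalg.matrix_rank(A))  # 2
```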

SLIDE 39

Dimensionality Reduction - PCA

Linear approximation of data in r dimensions. Found via Singular Value Decomposition:

X[n×p] = U[n×r] D[r×r] (V[p×r])^T

X: original matrix, U: “left singular vectors”, D: “singular values” (diagonal), V: “right singular vectors”
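In NumPy the factorization and its shapes look like this (random data purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((6, 4))  # n = 6 observations, p = 4 features

# Thin SVD: X = U @ diag(d) @ Vt, with r = min(n, p) = 4 here.
# NumPy returns V already transposed (Vt).
U, d, Vt = np.linalg.svd(X, full_matrices=False)
print(U.shape, d.shape, Vt.shape)  # (6, 4) (4,) (4, 4)

# Singular values are non-negative and sorted in decreasing order,
# and the factors multiply back to X.
assert np.all(d >= 0) and np.all(d[:-1] >= d[1:])
assert np.allclose(X, U @ np.diag(d) @ Vt)
```

Dropping all but the largest few singular values (and the matching columns of U and V) gives the r-dimensional approximation the slide describes.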

SLIDE 40

Dimensionality Reduction - PCA

Linear approximation of data in r dimensions. Found via Singular Value Decomposition:

X[n×p] = U[n×r] D[r×r] (V[p×r])^T

X: original matrix, U: “left singular vectors”, D: “singular values” (diagonal), V: “right singular vectors”

[Diagram: the n×p matrix X split into the factor shapes above]

SLIDE 41

Dimensionality Reduction - PCA - Example

X[n×p] = U[n×r] D[r×r] (V[p×r])^T

Users to movies matrix

SLIDE 42

Dimensionality Reduction - PCA - Example

X[n×p] = U[n×r] D[r×r] (V[p×r])^T

SLIDE 43

Dimensionality Reduction - PCA - Example

X[m×n] = U[m×r] D[r×r] (V[n×r])^T

SLIDE 44

Dimensionality Reduction - PCA - Example

X[m×n] = U[m×r] D[r×r] (V[n×r])^T

V = [matrix shown on slide]

SLIDE 45

Dimensionality Reduction - PCA - Example

X[m×n] = U[m×r] D[r×r] (V[n×r])^T

(UD)^T = [matrix shown on slide]

SLIDE 46

Dimensionality Reduction - PCA

Linear approximation of data in r dimensions. Found via Singular Value Decomposition:

X[n×p] = U[n×r] D[r×r] (V[p×r])^T

X: original matrix, U: “left singular vectors”, D: “singular values” (diagonal), V: “right singular vectors”

Projection (dimensionality reduced space) in 3 dimensions: U[n×3] D[3×3] (V[p×3])^T

To reduce features in a new dataset: Xnew V = Xnew_small
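The last step, reducing a new dataset with V, can be sketched as follows (random data; keeping r = 3 components as on the slide). Note that standard PCA first centers X by its column means; the slide applies V directly, which is what is shown here:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 10))  # training data: n = 100 observations, p = 10

# SVD of the training matrix; columns of V are the component directions.
U, d, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T

# Reduce new observations to r = 3 features: X_new_small = X_new @ V[:, :3]
X_new = rng.random((5, 10))
X_new_small = X_new @ V[:, :3]
print(X_new_small.shape)  # (5, 3)
```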

SLIDE 47

Dimensionality Reduction - PCA

Linear approximation of data in r dimensions. Found via Singular Value Decomposition:

X[n×p] = U[n×r] D[r×r] (V[p×r])^T

U, D, and V are unique. D: always positive.

SLIDE 48

Dimensionality Reduction v. Clustering

Clustering: Group n observations into k clusters.

Soft Clustering: Assign observations to k clusters with some weight or probability.

Dimensionality Reduction: Assign m features to p components with some weight or probability.
SLIDE 49

Dimensionality Reduction v. Clustering

Clustering: Group n observations into k clusters.

Soft Clustering: Assign observations to k clusters with some weight or probability.

Dimensionality Reduction: Assign m features to p components with some weight or probability.

Can often use one to do the other with one extra step. Examples:

  • From Dimensionality Reduction to Clusters:

○ Use U instead of V from SVD = mapping observations to soft clusters
○ Project based on V, apply a threshold on U = mapping observations to clusters
○ Threshold V (or use sparse PCA) = soft clustering of features

  • From Clusters to Dimensionality Reduction:

○ Use soft cluster ids as features
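The first conversion, reading U from the SVD as soft cluster memberships and then thresholding, can be sketched as follows (toy data with two obvious row groups; the threshold rule, argmax over the top 2 components, is my own choice):

```python
import numpy as np

rng = np.random.default_rng(0)
# 20 observations, 4 features: two groups with different feature patterns
# and different magnitudes, so the top two components separate them.
X = np.vstack([rng.normal([5, 5, 0, 0], 0.1, size=(10, 4)),
               rng.normal([0, 0, 3, 3], 0.1, size=(10, 4))])

U, d, Vt = np.linalg.svd(X, full_matrices=False)

# Each row of U weights the observation on every component: a soft
# cluster membership. Taking the strongest of the top 2 components
# (a simple threshold) turns it into a hard cluster id.
labels = np.abs(U[:, :2]).argmax(axis=1)
print(labels)
```

Thresholding columns of V the same way would instead give a (soft) clustering of the features, the third bullet above.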