

  1. Clustering and Dimensionality Reduction

  2. Preview
  • Clustering
    – K-means clustering
    – Mixture models
    – Hierarchical clustering
  • Dimensionality reduction
    – Principal component analysis
    – Multidimensional scaling
    – Isomap

  3. Unsupervised Learning
  • Problem: Too much data!
  • Solution: Reduce it
  • Clustering: Reduce the number of examples
  • Dimensionality reduction: Reduce the number of dimensions

  4. Clustering
  • Given a set of examples
  • Divide them into subsets of “similar” examples
  • How to measure similarity?
  • How to evaluate the quality of the results?

  5. K-Means Clustering
  • Pick k random examples as the initial means
  • Repeat until convergence:
    – Assign each example to the nearest mean
    – New mean = average of the examples assigned to it
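Below is a minimal NumPy sketch of this loop; the function name kmeans and the k, n_iters, and seed parameters are illustrative choices, not part of the slides.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal k-means sketch: X is an (n_examples, n_dims) array."""
    rng = np.random.default_rng(seed)
    # Pick k random examples as the initial means.
    means = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each example goes to its nearest mean.
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each mean becomes the average of the examples assigned to it.
        new_means = np.array([X[labels == i].mean(axis=0) if np.any(labels == i) else means[i]
                              for i in range(k)])
        if np.allclose(new_means, means):  # converged: the means stopped moving
            break
        means = new_means
    return means, labels
```

Empty clusters simply keep their previous mean in this sketch; restarting them at a random example is an equally common choice.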

  6. K-Means Works If...
  • Clusters are spherical
  • Clusters are well separated
  • Clusters have similar volumes
  • Clusters have similar numbers of points

  7. Mixture Models
  $P(x) = \sum_{i=1}^{n_c} P(c_i)\, P(x \mid c_i)$
  Objective function: log likelihood of the data
  • Naive Bayes: $P(x \mid c_i) = \prod_{j=1}^{n_d} P(x_j \mid c_i)$
  • AutoClass: Naive Bayes with various models for the $x_j$
  • Mixture of Gaussians: $P(x \mid c_i)$ is a multivariate Gaussian
  • In general: $P(x \mid c_i)$ can be any distribution
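As a small illustration of this objective (not from the slides), the mixture density and its log likelihood can be computed generically, with the component distributions passed in as callables; the names mixture_log_likelihood, priors, and component_pdfs are hypothetical.

```python
import numpy as np

def mixture_log_likelihood(X, priors, component_pdfs):
    """Log likelihood of the data under a mixture model.

    priors[i] plays the role of P(c_i); component_pdfs[i](x) plays the role of
    P(x | c_i) and can be any distribution (Naive Bayes, Gaussian, ...).
    """
    # P(x) = sum_i P(c_i) * P(x | c_i), evaluated for every example in X.
    px = sum(p * np.array([pdf(x) for x in X])
             for p, pdf in zip(priors, component_pdfs))
    return np.sum(np.log(px))
```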

  8. Mixtures of Gaussians
  [Figure: mixture density $p(x)$ plotted against $x$]
  $P(x \mid \mu_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{1}{2}\left(\frac{x - \mu_i}{\sigma}\right)^2\right)$

  9. The EM Algorithm
  Initialize parameters ignoring the missing information
  Repeat until convergence:
    – E step: Compute expected values of the unobserved variables, assuming the current parameter values
    – M step: Compute new parameter values to maximize the probability of the data (observed & estimated)
  (Alternatively: initialize the expected values ignoring the missing information)

  10. EM for Mixtures of Gaussians
  Initialization: Choose the means at random, etc.
  E step: For all examples $x_k$:
    $P(\mu_i \mid x_k) = \dfrac{P(\mu_i)\, P(x_k \mid \mu_i)}{P(x_k)} = \dfrac{P(\mu_i)\, P(x_k \mid \mu_i)}{\sum_{i'} P(\mu_{i'})\, P(x_k \mid \mu_{i'})}$
  M step: For all components $c_i$:
    $P(c_i) = \dfrac{1}{n_e} \sum_{k=1}^{n_e} P(\mu_i \mid x_k)$
    $\mu_i = \dfrac{\sum_{k=1}^{n_e} x_k\, P(\mu_i \mid x_k)}{\sum_{k=1}^{n_e} P(\mu_i \mid x_k)}$
    $\sigma_i^2 = \dfrac{\sum_{k=1}^{n_e} (x_k - \mu_i)^2\, P(\mu_i \mid x_k)}{\sum_{k=1}^{n_e} P(\mu_i \mid x_k)}$
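A sketch of these updates for a one-dimensional mixture of k Gaussians (matching the slide's per-component variance); the function names, the shared initial variance, and stopping after a fixed number of iterations are my own choices, not from the slides.

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    """P(x | mu_i) for a 1-D Gaussian with mean mu and variance var."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def em_gaussian_mixture(x, k, n_iters=100, seed=0):
    """EM for a 1-D mixture of k Gaussians (x is a 1-D NumPy array of examples)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    mu = rng.choice(x, size=k, replace=False)   # initialization: means chosen at random
    var = np.full(k, x.var())                   # start every variance at the data variance
    prior = np.full(k, 1.0 / k)                 # P(c_i)
    for _ in range(n_iters):
        # E step: responsibilities P(mu_i | x_k) ∝ P(mu_i) P(x_k | mu_i), normalized over i.
        resp = np.array([prior[i] * gaussian_pdf(x, mu[i], var[i]) for i in range(k)])
        resp /= resp.sum(axis=0, keepdims=True)
        # M step: re-estimate priors, means, and variances from the responsibilities.
        weight = resp.sum(axis=1)               # sum_k P(mu_i | x_k) for each component
        prior = weight / n
        mu = (resp @ x) / weight
        var = (resp * (x[None, :] - mu[:, None]) ** 2).sum(axis=1) / weight
    return prior, mu, var
```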

  11. Mixtures of Gaussians (cont.)
  • K-means clustering ≺ EM for mixtures of Gaussians
  • Mixtures of Gaussians ≺ Bayes nets
  • Also good for estimating the joint distribution of continuous variables

  12. Hierarchical Clustering
  • Agglomerative clustering
    – Start with one cluster per example
    – Merge the two nearest clusters (criteria: min, max, avg, or mean distance)
    – Repeat until everything is in one cluster
    – Output a dendrogram
  • Divisive clustering
    – Start with all examples in one cluster
    – Split it into two (e.g., by min-cut)
    – Etc.
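One way to run the agglomerative variant is with SciPy's hierarchical-clustering routines; the toy data and the average-link merge criterion below are illustrative choices.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# Toy data: two loose groups of 2-D points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (10, 2)), rng.normal(3, 0.5, (10, 2))])

# Agglomerative clustering; method='average' is the "avg distance" merge criterion
# (method='single' and method='complete' correspond to min and max distance).
Z = linkage(X, method='average')

labels = fcluster(Z, t=2, criterion='maxclust')  # cut the tree into 2 flat clusters
dendrogram(Z)                                    # the dendrogram the slide mentions
plt.show()
```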

  13. Dimensionality Reduction
  • Given data points in d dimensions
  • Convert them to data points in r < d dimensions
  • With minimal loss of information

  14. Principal Component Analysis
  Goal: Find the r-dimensional projection that best preserves variance
  1. Compute the mean vector $\mu$ and covariance matrix $\Sigma$ of the original points
  2. Compute the eigenvectors and eigenvalues of $\Sigma$
  3. Select the top r eigenvectors
  4. Project the points onto the subspace they span: $y = A(x - \mu)$, where y is the new point, x is the old one, and the rows of A are the eigenvectors
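A NumPy sketch of these four steps (the function name pca_project is mine):

```python
import numpy as np

def pca_project(X, r):
    """Project the rows of X onto the top-r principal components: y = A (x - mu)."""
    mu = X.mean(axis=0)                      # step 1: mean vector
    sigma = np.cov(X, rowvar=False)          # step 1: covariance matrix
    evals, evecs = np.linalg.eigh(sigma)     # step 2: eigenvalues/eigenvectors (Sigma is symmetric)
    order = np.argsort(evals)[::-1][:r]      # step 3: indices of the top r eigenvalues
    A = evecs[:, order].T                    # rows of A are the selected eigenvectors
    return (X - mu) @ A.T                    # step 4: y = A (x - mu) for every point
```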

  15. Multidimensional Scaling
  Goal: Find the projection that best preserves inter-point distances
  • $x_i$: point in d dimensions; $y_i$: corresponding point in r < d dimensions
  • $\delta_{ij}$: distance between $x_i$ and $x_j$; $d_{ij}$: distance between $y_i$ and $y_j$
  • Define (e.g.) $E(y) = \sum_{i,j} \left( \dfrac{d_{ij} - \delta_{ij}}{\delta_{ij}} \right)^2$
  • Find the $y_i$'s that minimize E by gradient descent
  • Invariant to translations, rotations and scalings
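A gradient-descent sketch for minimizing this E(y); the learning rate, iteration count, and the small constant guarding against zero distances are my own choices.

```python
import numpy as np

def mds(delta, r=2, lr=0.01, n_iters=2000, seed=0):
    """Find points y_i minimizing E(y) = sum_{i<j} ((d_ij - delta_ij)/delta_ij)^2.

    delta: (n, n) matrix of original inter-point distances delta_ij (zero diagonal).
    """
    rng = np.random.default_rng(seed)
    n = delta.shape[0]
    Y = rng.normal(size=(n, r))                   # random starting configuration
    i, j = np.triu_indices(n, k=1)                # count each pair once
    for _ in range(n_iters):
        diff = Y[i] - Y[j]
        d = np.linalg.norm(diff, axis=1) + 1e-12  # current low-dimensional distances d_ij
        # dE/dY_i = sum_j 2 (d_ij - delta_ij)/delta_ij^2 * (Y_i - Y_j)/d_ij
        coef = 2 * (d - delta[i, j]) / (delta[i, j] ** 2 * d)
        grad = np.zeros_like(Y)
        np.add.at(grad, i, coef[:, None] * diff)
        np.add.at(grad, j, -coef[:, None] * diff)
        Y -= lr * grad
    return Y
```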

  16. Isomap
  Goal: Find a projection onto a nonlinear manifold
  1. Construct the neighborhood graph G: for all $x_i$, $x_j$, if distance($x_i$, $x_j$) < $\epsilon$, add edge ($x_i$, $x_j$) to G
  2. Compute shortest distances along the graph, $\delta_G(x_i, x_j)$ (e.g., by Floyd's algorithm)
  3. Apply multidimensional scaling to the $\delta_G(x_i, x_j)$
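A sketch of the three steps, reusing the mds function from the previous slide's sketch; SciPy's Floyd-Warshall routine stands in for "Floyd's algorithm", and the eps threshold is whatever the caller supplies.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def isomap(X, eps, r=2):
    """Isomap sketch: eps-neighborhood graph -> graph distances -> MDS (mds() from above)."""
    # 1. Neighborhood graph G: keep an edge (x_i, x_j) only if its distance is below eps.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    graph = np.where(d < eps, d, 0.0)   # zero entries are treated as missing edges by csgraph
    # 2. Shortest distances along the graph, delta_G(x_i, x_j), via Floyd-Warshall.
    delta_G = shortest_path(graph, method='FW', directed=False)
    # 3. Apply multidimensional scaling to the graph distances.
    return mds(delta_G, r=r)
```

If eps is too small the graph can be disconnected, leaving infinite entries in delta_G; in that case eps must be increased (or a k-nearest-neighbor graph used instead).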

  17. Summary
  • Clustering
    – K-means clustering
    – Mixture models
    – Hierarchical clustering
  • Dimensionality reduction
    – Principal component analysis
    – Multidimensional scaling
    – Isomap
