
Learning From Data Lecture 19: A Peek At Unsupervised Learning



  1. Learning From Data, Lecture 19: A Peek At Unsupervised Learning. k-Means Clustering, Probability Density Estimation, Gaussian Mixture Models. M. Magdon-Ismail, CSCI 4100/6100.

  2. Recap: Radial Basis Functions

Nonparametric RBF:
$$g(x) = \sum_{n=1}^{N} \frac{\alpha_n(x)}{\sum_{m=1}^{N} \alpha_m(x)} \, y_n, \qquad \alpha_n(x) = \phi\!\left(\frac{\|x - x_n\|}{r}\right) \quad \text{(bump on each } x_n\text{)}$$

Parametric k-RBF-Network:
$$h(x) = w_0 + \sum_{j=1}^{k} w_j \, \phi\!\left(\frac{\|x - \mu_j\|}{r}\right) = w^t \Phi(x) \quad \text{(bump on each } \mu_j\text{)}$$

This is a linear model given the µ_j; choose the µ_j as centers of k clusters of the data.

[Figure: nonparametric RBF fit with r = 0.05 (no training) versus k-RBF-network fits with k = 4, r = 1 and k = 10, regularized.]
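To make the nonparametric estimate concrete, here is a minimal NumPy sketch (my own code, not from the lecture) of g(x) with a Gaussian bump φ(z) = e^{−z²/2}; the function name `rbf_estimate` is hypothetical:

```python
import numpy as np

def rbf_estimate(x, X, y, r):
    """Nonparametric RBF estimate g(x): a normalized, bump-weighted average
    of the training targets y_n, with one bump per training point x_n."""
    z = np.linalg.norm(X - x, axis=1) / r   # ||x - x_n|| / r for every n
    alpha = np.exp(-0.5 * z ** 2)           # Gaussian bump phi(z) = exp(-z^2 / 2)
    return alpha @ y / alpha.sum()          # sum_n alpha_n(x) y_n / sum_m alpha_m(x)
```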

  3. Unsupervised Learning
• A preprocessor that organizes the data for supervised learning: organize the data for faster nearest-neighbor search; determine centers for RBF bumps.
• Important in its own right: organize the data to identify patterns. Learn the patterns in the data, e.g. the patterns in a language, before getting into a supervised setting. amazon.com organizes books into categories.

  4. Clustering Digits
[Figure: digits data plotted by average intensity and symmetry; left: the 21-NN rule with 10 classes; right: a 10-clustering of the data.]

  5. Clustering
A cluster is a collection of points S.
A k-clustering is a partition of the data into k clusters S_1, ..., S_k:
$$\bigcup_{j=1}^{k} S_j = \mathcal{D}, \qquad S_i \cap S_j = \emptyset \ \text{ for } i \neq j.$$
Each cluster has a center µ_j.

  6. How good is a clustering?
Points in a cluster should be similar (close to each other, and to the center).
Error in cluster j:
$$E_j = \sum_{x_n \in S_j} \|x_n - \mu_j\|^2$$
k-Means clustering error:
$$E_{\text{in}}(S_1, \ldots, S_k; \mu_1, \ldots, \mu_k) = \sum_{j=1}^{k} E_j = \sum_{n=1}^{N} \|x_n - \mu(x_n)\|^2$$
where µ(x_n) is the center of the cluster to which x_n belongs.
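As a quick check of this definition, a minimal NumPy sketch (my own code; the names `kmeans_error`, `X`, `assignment`, `centers` are hypothetical) that evaluates E_in given a clustering:

```python
import numpy as np

def kmeans_error(X, assignment, centers):
    """k-means clustering error E_in: the sum of squared distances from
    each point x_n to the center mu(x_n) of the cluster it belongs to."""
    # X: (N, d) data, assignment: (N,) cluster index per point, centers: (k, d)
    diffs = X - centers[assignment]   # x_n - mu(x_n) for every n
    return float(np.sum(diffs ** 2))
```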

  7. k-Means Clustering
You get to pick S_1, ..., S_k and µ_1, ..., µ_k to minimize E_in(S_1, ..., S_k; µ_1, ..., µ_k).
If the centers µ_j are known, picking the sets is easy: add to S_j all points closest to µ_j.
If the clusters S_j are known, picking the centers is easy: the center µ_j is the centroid of cluster S_j,
$$\mu_j = \frac{1}{|S_j|} \sum_{x_n \in S_j} x_n.$$

  8. Lloyd's Algorithm for k-Means Clustering
$$E_{\text{in}}(S_1, \ldots, S_k; \mu_1, \ldots, \mu_k) = \sum_{n=1}^{N} \|x_n - \mu(x_n)\|^2$$
1: Initialize: pick well separated centers µ_j.
2: Update S_j to be all points closest to µ_j:
$$S_j \leftarrow \{\, x_n : \|x_n - \mu_j\| \le \|x_n - \mu_\ell\| \text{ for } \ell = 1, \ldots, k \,\}.$$
3: Update µ_j to the centroid of S_j:
$$\mu_j \leftarrow \frac{1}{|S_j|} \sum_{x_n \in S_j} x_n.$$
4: Repeat steps 2 and 3 until E_in stops decreasing.

  9. Lloyd's Algorithm for k-Means Clustering: updating the clusters (step 2 of the algorithm above).

  10. Lloyd's Algorithm for k-Means Clustering: updating the centers (step 3 of the algorithm above).
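A minimal NumPy sketch of Lloyd's algorithm (my own code, not the lecture's; it assumes Euclidean distance, initializes with k random data points rather than explicitly well separated centers, and assumes no cluster goes empty):

```python
import numpy as np

def lloyd_kmeans(X, k, max_iters=100, seed=0):
    """Lloyd's algorithm: alternate the cluster-update and center-update steps."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # step 1: initialize
    assignment = np.zeros(len(X), dtype=int)
    for _ in range(max_iters):
        # step 2: assign each point to its closest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assignment = dists.argmin(axis=1)
        # step 3: move each center to the centroid of its cluster
        new_centers = np.array([X[assignment == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):   # E_in has stopped decreasing
            break
        centers = new_centers
    return centers, assignment
```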

  11. Application to the k-RBF-Network
[Figure: a 10-center RBF-network versus a 300-center RBF-network.]
Choosing k: knowledge of the problem (10 digits), or CV.
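To connect the pieces, here is a minimal NumPy sketch (my own code, not from the lecture) of a k-RBF-network built on cluster centers: Gaussian bumps at the centers become the features Φ(x), and since the model is linear given the µ_j, the weights w are fit here by least squares. The `centers` could come from `lloyd_kmeans` above, and the bump width `r` is an assumed hyperparameter:

```python
import numpy as np

def rbf_features(X, centers, r):
    """Feature matrix Phi: one Gaussian bump phi(||x - mu_j|| / r) per center,
    plus a constant column for the bias weight w_0."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    Phi = np.exp(-0.5 * (dists / r) ** 2)
    return np.hstack([np.ones((len(X), 1)), Phi])

def fit_rbf_network(X, y, centers, r):
    """Fit the weights of h(x) = w_0 + sum_j w_j phi(||x - mu_j|| / r) = w^t Phi(x)."""
    w, *_ = np.linalg.lstsq(rbf_features(X, centers, r), y, rcond=None)
    return w

def rbf_network_predict(X, centers, r, w):
    return rbf_features(X, centers, r) @ w
```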

  12. Probability Density Estimation P(x)
P(x) measures how likely it is to generate inputs similar to x.
Estimating P(x) results in a 'softer/finer' representation than clustering: clusters are regions of high probability.

  13. Parzen Windows: RBF Density Estimation
Basic idea: put a bump of 'size' (volume) 1/N on each data point.
$$\hat{P}(x) = \frac{1}{N r^d} \sum_{i=1}^{N} \phi\!\left(\frac{\|x - x_i\|}{r}\right), \qquad \phi(z) = \frac{1}{(2\pi)^{d/2}} e^{-\frac{1}{2} z^2}$$
[Figure: a Parzen-window density estimate P(x) built from bumps on the data points.]
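A minimal NumPy sketch of this estimate (my own code; `parzen_density` is a hypothetical name), evaluating the density at one query point x:

```python
import numpy as np

def parzen_density(x, X, r):
    """Parzen-window (RBF) density estimate at a query point x.
    Each data point x_i contributes a spherical Gaussian bump of volume 1/N."""
    N, d = X.shape
    z = np.linalg.norm(X - x, axis=1) / r                    # ||x - x_i|| / r
    bumps = np.exp(-0.5 * z ** 2) / (2 * np.pi) ** (d / 2)   # phi(z)
    return bumps.sum() / (N * r ** d)
```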

  14. Digits Data
[Figure: RBF density estimate for the digits data, and its density contours.]

  15. The Gaussian Mixture Model (GMM)
Instead of N bumps → k ≪ N bumps (similar to going from the nonparametric RBF to the parametric k-RBF-network).
Instead of uniform spherical bumps → each bump has its own shape.
Bump centers: µ_1, ..., µ_k. Bump shapes: Σ_1, ..., Σ_k.
Gaussian formula for the bump:
$$N(x; \mu_j, \Sigma_j) = \frac{1}{(2\pi)^{d/2} |\Sigma_j|^{1/2}} \, e^{-\frac{1}{2}(x - \mu_j)^t \Sigma_j^{-1} (x - \mu_j)}.$$

  16. GMM Density Estimate
$$\hat{P}(x) = \sum_{j=1}^{k} w_j \, N(x; \mu_j, \Sigma_j) \quad \text{(sum of } k \text{ weighted bumps)}$$
with
$$N(x; \mu_j, \Sigma_j) = \frac{1}{(2\pi)^{d/2} |\Sigma_j|^{1/2}} \, e^{-\frac{1}{2}(x - \mu_j)^t \Sigma_j^{-1} (x - \mu_j)}, \qquad w_j > 0, \quad \sum_{j=1}^{k} w_j = 1.$$
You get to pick {w_j, µ_j, Σ_j}_{j=1,...,k}.
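A minimal NumPy sketch of evaluating this density (my own code; `gaussian_bump` and `gmm_density` are hypothetical names):

```python
import numpy as np

def gaussian_bump(x, mu, Sigma):
    """Multivariate Gaussian density N(x; mu, Sigma)."""
    d = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)   # (x - mu)^t Sigma^{-1} (x - mu)
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

def gmm_density(x, w, mus, Sigmas):
    """GMM density estimate: a sum of k weighted Gaussian bumps."""
    return sum(w_j * gaussian_bump(x, mu_j, Sigma_j)
               for w_j, mu_j, Sigma_j in zip(w, mus, Sigmas))
```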

  17. Maximum Likelihood Estimation
Pick {w_j, µ_j, Σ_j}_{j=1,...,k} to best explain the data: maximize the likelihood of the data given {w_j, µ_j, Σ_j}_{j=1,...,k}.
(We saw this when we derived the cross-entropy error for logistic regression.)

  18. Expectation-Maximization: The E-M Algorithm
A simple algorithm to get to a local maximum of the likelihood.
Partition the variables into two sets; given one set, you can estimate the other.
'Bootstrap' your way to a decent solution.
Lloyd's algorithm for k-means is an example, for 'hard' clustering.

  19. Bump Memberships
γ_nj: the fraction of x_n belonging to bump j (a 'hidden variable').
$$N_j = \sum_{n=1}^{N} \gamma_{nj} \quad \text{('number' of points in bump } j\text{)}$$
$$w_j = \frac{N_j}{N} \quad \text{(probability of bump } j\text{)}$$
$$\mu_j = \frac{1}{N_j} \sum_{n=1}^{N} \gamma_{nj} x_n \quad \text{(centroid of bump } j\text{)}$$
$$\Sigma_j = \frac{1}{N_j} \sum_{n=1}^{N} \gamma_{nj} x_n x_n^t - \mu_j \mu_j^t \quad \text{(covariance matrix of bump } j\text{)}$$

  20. Bump Memberships (continued): once the memberships γ_nj are known, the formulas above give the estimates of w_j, µ_j, and Σ_j.
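A minimal NumPy sketch of this parameter update, the M-step (my own code, with hypothetical names `m_step` and `gamma`):

```python
import numpy as np

def m_step(X, gamma):
    """Estimate w_j, mu_j, Sigma_j from the bump memberships gamma[n, j]."""
    N, d = X.shape
    k = gamma.shape[1]
    Nj = gamma.sum(axis=0)                 # 'number' of points in each bump
    w = Nj / N                             # bump probabilities
    mus = (gamma.T @ X) / Nj[:, None]      # membership-weighted centroids
    Sigmas = []
    for j in range(k):
        # sum_n gamma_nj x_n x_n^t / N_j  minus  mu_j mu_j^t
        second_moment = (gamma[:, j, None] * X).T @ X / Nj[j]
        Sigmas.append(second_moment - np.outer(mus[j], mus[j]))
    return w, mus, Sigmas
```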

  21. Re-Estimating Bump Memberships
$$\gamma_{nj} = \frac{w_j \, N(x_n; \mu_j, \Sigma_j)}{\sum_{\ell=1}^{k} w_\ell \, N(x_n; \mu_\ell, \Sigma_\ell)}$$
γ_nj is the probability that x_n came from bump j:
probability of bump j: w_j; probability density for x_n given bump j: N(x_n; µ_j, Σ_j).
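A minimal sketch of this update, the E-step (my own code, reusing the hypothetical `gaussian_bump` helper from the GMM density sketch above):

```python
import numpy as np

def e_step(X, w, mus, Sigmas):
    """Re-estimate the bump memberships gamma[n, j]: the probability that x_n
    came from bump j, proportional to w_j * N(x_n; mu_j, Sigma_j)."""
    k = len(w)
    gamma = np.array([[w[j] * gaussian_bump(x_n, mus[j], Sigmas[j]) for j in range(k)]
                      for x_n in X])
    return gamma / gamma.sum(axis=1, keepdims=True)   # normalize over the k bumps
```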

  22. E-M Algorithm
E-M Algorithm for GMMs:
1: Start with estimates for the bump memberships γ_nj.
2: Estimate w_j, µ_j, Σ_j given the bump memberships.
3: Update the bump memberships given w_j, µ_j, Σ_j.
4: Iterate to step 2 until convergence.
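Putting the two updates together, a minimal sketch of the full loop (my own code, built on the `e_step` and `m_step` sketches above; the random soft initialization and the fixed iteration count are my assumptions, since the slide only says to start with estimates and iterate until convergence):

```python
import numpy as np

def fit_gmm(X, k, n_iters=100, seed=0):
    """E-M for a GMM: alternate the parameter update (M-step) and the
    membership update (E-step)."""
    rng = np.random.default_rng(seed)
    gamma = rng.dirichlet(np.ones(k), size=len(X))   # step 1: initial memberships
    for _ in range(n_iters):
        w, mus, Sigmas = m_step(X, gamma)            # step 2: parameters from memberships
        gamma = e_step(X, w, mus, Sigmas)            # step 3: memberships from parameters
    return w, mus, Sigmas, gamma
```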

  23. GMM on the Digits Data
[Figure: a 10-center GMM density estimate and its density contours.]
