Large-Scale Clustering through Functional Embedding

Large-Scale Clustering through Functional Embedding - PowerPoint PPT Presentation

Large-Scale Clustering through Functional Embedding. Frédéric Ratle (IGAR, University of Lausanne), Jason Weston and Matthew L. Miller (NEC Labs America). ECML PKDD 2008.


1. Large-Scale Clustering through Functional Embedding
Frédéric Ratle ∗, Jason Weston †, Matthew L. Miller †
∗ IGAR - University of Lausanne, Switzerland; † NEC Labs America, Princeton NJ, USA
ECML PKDD 2008

2. Large-Scale Clustering
A new way of performing data clustering:
• Dimensionality reduction with direct optimization over discrete labels.
• Joint optimization of embedding and clustering → improved results.
• Training by stochastic gradient descent → fast and scalable.
• Implementation within a neural network → no out-of-sample problem.

3. Clustering - the usual way
Popular clustering algorithms such as spectral clustering are based on a two-stage approach:
1 Find a “good” embedding
2 Perform k-means (or a similar variant)
Also:
• K-means in feature space (e.g. Dhillon et al. 2004)
• Margin-based clustering (e.g. Ben-Hur et al. 2001)

4. Embedding Algorithms
Many existing embedding algorithms optimize
$$\min_{f_i \in \mathbb{R}^d} \sum_{i,j=1}^{U} L(f(x_i), f(x_j), W_{ij})$$
• MDS: minimize $(\|f_i - f_j\| - W_{ij})^2$.
• ISOMAP: same, but $W_{ij}$ is defined by the shortest path on a neighborhood graph.
• Laplacian Eigenmaps: minimize $\sum_{ij} W_{ij} \|f_i - f_j\|^2$ subject to the “balancing constraint” $f^\top D f = I$ and $f^\top D \mathbf{1} = 0$.
• Spectral clustering → add k-means on top.
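For concreteness, here is a minimal numpy/scipy sketch of the Laplacian Eigenmaps objective above (not from the slides; it assumes a dense, symmetric affinity matrix W with nonzero degrees):

```python
import numpy as np
from scipy.linalg import eigh

def laplacian_eigenmaps(W, d=2):
    """Minimize sum_ij W_ij ||f_i - f_j||^2 subject to f^T D f = I,
    via the generalized eigenproblem L f = lambda D f."""
    D = np.diag(W.sum(axis=1))
    L = D - W                      # unnormalized graph Laplacian
    vals, vecs = eigh(L, D)        # ascending generalized eigenvalues
    # Drop the first (constant) eigenvector; this enforces f^T D 1 = 0.
    return vecs[:, 1:d + 1]
```

Note that $f$ here is a lookup table (one row per training point), which is exactly the out-of-sample limitation the next slide addresses.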

5. Siamese Networks: functional embedding
Equivalent to Laplacian Eigenmaps, but $f(x)$ is a neural network.
DrLIM [Hadsell et al., ’06]:
$$L(f_i, f_j, W_{ij}) = \begin{cases} \|f_i - f_j\| & \text{if } W_{ij} = 1, \\ \max(0, m - \|f_i - f_j\|)^2 & \text{if } W_{ij} = 0. \end{cases}$$
→ neighbors are pulled close; all others are kept at a distance of at least $m$.
• Balancing handled by the $W_{ij} = 0$ case → easy optimization.
• $f(x)$ is not just a lookup table → control capacity, add prior knowledge, no out-of-sample problem.
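A small sketch of this pairwise loss, following the slide's form (the original DrLIM paper squares both branches; the margin value and function name here are illustrative):

```python
import numpy as np

def drlim_loss(fi, fj, w_ij, margin=1.0):
    """Pairwise embedding loss: pull neighbors together,
    push non-neighbors to a distance of at least `margin`."""
    dist = np.linalg.norm(fi - fj)
    if w_ij == 1:
        return dist                       # neighbors: shrink the distance
    return max(0.0, margin - dist) ** 2   # others: hinge on the margin
```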

6. NCut Embedding
• Many approaches exist to learn manifolds with functional models.
• We wish to learn the clustering task directly.
• The main idea is to train a classifier $f(x)$ to:
  • classify neighbors together;
  • classify non-neighbors apart.

7. Functional Embedding for Clustering
We use a general objective of this type:
$$\sum_{ij} \sum_{c} H(f(x_i), c)\, Y_c(f(x_i), f(x_j), W_{ij})$$
where $H(\cdot)$ is a classification-based loss function such as the hinge loss:
$$H(f(x), y) = \max(0, 1 - y f(x))$$

8. 2-class clustering
$Y_c(f(x_i), f(x_j), W_{ij})$ encodes the weight to assign to point $i$ being in cluster $c$. It can be expressed as follows:
$$Y_c(f_i, f_j, W_{ij}) = \begin{cases} \eta^{(+)} & \text{if } \operatorname{sign}(f_i + f_j) = c \text{ and } W_{ij} = 1, \\ -\eta^{(-)} & \text{if } \operatorname{sign}(f_j) = c \text{ and } W_{ij} = 0, \\ 0 & \text{otherwise.} \end{cases}$$
Optimization by stochastic gradient descent:
$$w_{t+1} \leftarrow w_t - \nabla L(f_i, f_j, 1)$$
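A direct transcription of this weighting into Python (a sketch; the η values are placeholders, not the paper's settings):

```python
import numpy as np

def hinge(fx, y):
    """H(f(x), y) = max(0, 1 - y f(x)), for labels y in {-1, +1}."""
    return max(0.0, 1.0 - y * fx)

def y_weight(fi, fj, w_ij, c, eta_pos=0.1, eta_neg=0.1):
    """Weight Y_c for point i belonging to cluster c (2-class case)."""
    if w_ij == 1 and np.sign(fi + fj) == c:
        return eta_pos        # neighbors reinforce their shared cluster
    if w_ij == 0 and np.sign(fj) == c:
        return -eta_neg       # non-neighbors: push i out of j's cluster
    return 0.0
```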

9. NCut Embedding Algorithm
Input: unlabeled data $x^*_i$ and affinity matrix $W$.
repeat
  Pick a random pair of neighbors $x^*_i$, $x^*_j$.
  Select the class $c_i = \operatorname{sign}(f_i + f_j)$.
  if BalancingConstraint($c_i$) then
    Gradient step for $L(x^*_i, x^*_j, 1)$.
  end if
  Pick a random pair $x^*_i$, $x^*_k$.
  Gradient step for $L(x^*_i, x^*_k, 0)$.
until stopping criterion
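A minimal runnable sketch of this loop, assuming a linear model $f(x) = w \cdot x$ in place of the paper's neural network, and the "hard" balancing constraint described on the next slide:

```python
import numpy as np
from collections import deque

def ncut_emb_2class(X, neighbor_pairs, n_steps=10_000, eta=0.01,
                    mem=100, xi=10, seed=0):
    """Sketch of the 2-class NCutEmb loop: linear model f(x) = w.x,
    hinge loss H, and a hard balancing constraint."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.normal(scale=0.01, size=d)
    recent = deque(maxlen=mem)               # last cluster predictions

    def hinge_grad(x, y):
        # subgradient of H(w.x, y) = max(0, 1 - y w.x) w.r.t. w
        return -y * x if y * (w @ x) < 1 else np.zeros_like(x)

    for _ in range(n_steps):
        # Neighbor pair: push both points into the same cluster.
        i, j = neighbor_pairs[rng.integers(len(neighbor_pairs))]
        c = np.sign(w @ X[i] + w @ X[j]) or 1.0
        if sum(p == c for p in recent) <= mem / 2 + xi:  # balancing
            w -= eta * (hinge_grad(X[i], c) + hinge_grad(X[j], c))
            recent.append(c)
        # Random pair: push x_i away from x_k's predicted cluster
        # (the negative weight -eta^(-) turns descent into ascent on H).
        i, k = rng.integers(n, size=2)
        c_k = np.sign(w @ X[k]) or 1.0
        w += eta * hinge_grad(X[i], c_k)
    return w
```

Each iteration touches only one neighbor pair and one random pair, which is what makes the method scale to large datasets.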

10. Balancing constraint - 2 class
Balancing constraints prevent the solution from getting trapped in a trivial assignment (all points in one cluster). Many ways are possible:
1 “Hard” constraint
  • Keep a list of the $N$ last predictions in memory.
  • Ignore examples of class $c_i$ if $\text{seen}(c_i) > N/2 + \xi$.
2 “Soft” constraint
  • Weight the learning rate for each class:
  • $\eta = \eta_0 / \text{seen}(c_i)$
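A sketch of the soft constraint as a small helper (the exact form $\eta = \eta_0 / \text{seen}(c_i)$ is reconstructed from the garbled slide, so treat it as an assumption):

```python
from collections import deque

class SoftBalancer:
    """Soft constraint: shrink the learning rate for a cluster the more
    often it was predicted recently (assumed form eta = eta0 / seen(c))."""
    def __init__(self, eta0=0.01, mem=100):
        self.eta0 = eta0
        self.recent = deque(maxlen=mem)
    def rate(self, c):
        seen = sum(p == c for p in self.recent) or 1  # avoid divide-by-zero
        self.recent.append(c)
        return self.eta0 / seen
```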

11. Multiclass algorithm
Two different flavours: MAX and ALL.
1 MAX approach: select the class $c_i$, with $i = \operatorname{argmax}(\max(f_i), \max(f_j))$.
2 ALL approach: one learning rate per class,
$$Y_c(f_i, f_j, W_{ij}) = \begin{cases} \eta_c & \text{if } W_{ij} = 1 \\ 0 & \text{otherwise} \end{cases}$$
where $\eta_c \leftarrow \eta^{(+)} f_c(x_i)$.
We use balancing constraints similar to those for 2-class clustering.
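The MAX selection rule as a one-line sketch (score vectors $f_i, f_j \in \mathbb{R}^C$, one output per cluster, are assumed):

```python
import numpy as np

def select_class_max(fi, fj):
    """MAX rule: the cluster is the argmax of whichever score vector
    (f_i or f_j) contains the single largest output."""
    return int(np.argmax(fi)) if fi.max() >= fj.max() else int(np.argmax(fj))
```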

12. Small-scale datasets

data set | classes | dims | points
g50c     | 2       | 50   | 550
text     | 2       | 7511 | 1946
bcw      | 2       | 9    | 569
ellips   | 4       | 50   | 1064
glass    | 6       | 10   | 214
usps     | 10      | 256  | 2007

Table: Small-scale datasets used throughout the experiments.

13. 2-class experiments
Clustering error (%):

method       | bcw  | g50c | text
k-means      | 3.89 | 4.64 | 7.26
spectral-rbf | 6.73 | 3.94 | 5.56
spectral-knn | 3.60 | 6.02 | 12.9
NCutEmb_h    | 3.63 | 4.59 | 7.03
NCutEmb_s    | 3.15 | 4.41 | 7.89

Out-of-sample error (%):

method    | bcw  | g50c | text
k-means   | 6.06 | 4.22 | 8.75
NCutEmb_h | 3.21 | 6.06 | 7.68
NCutEmb_s | 7.38 | 3.64 | 6.36

14. Multiclass experiments
Clustering error (%):

method       | ellips | glass | usps
k-means      | 20.29  | 25.71 | 30.34
spectral-rbf | 10.16  | 39.30 | 32.93
spectral-knn | 2.51   | 40.64 | 33.82
NCutEmb_max  | 4.76   | 24.58 | 19.36
NCutEmb_all  | 2.75   | 19.05 | 24.91

Out-of-sample error (%):

method      | ellips | glass | usps
k-means     | 20.85  | 28.52 | 29.44
NCutEmb_max | 5.11   | 25.16 | 20.80
NCutEmb_all | 2.88   | 24.96 | 17.31

15. MNIST experiments
[figure-only slide; the plots are not recoverable from this transcript]

16. Clustering MNIST

# clusters | method      | train | test
50         | k-means     | 18.46 | 17.70
           | NCutEmb_max | 13.82 | 14.23
           | NCutEmb_all | 18.67 | 18.37
20         | k-means     | 29.00 | 28.03
           | NCutEmb_max | 20.12 | 23.43
           | NCutEmb_all | 17.64 | 21.90
10         | k-means     | 40.98 | 39.89
           | NCutEmb_max | 21.93 | 24.37
           | NCutEmb_all | 24.10 | 24.90

Table: Clustering errors (%) on the MNIST database (60k train, 10k test). A one-hidden-layer network was used.

17. Training on Pairs?
Where do the neighbor pairs ($W_{ij} = 1$) come from? Some options (sketched below):
• k-NN
  • OK for small datasets.
  • Very slow otherwise, though many methods exist to speed it up.
• Sequences
  • video: frames $t$ and $t+1$ → same label
  • audio: consecutive audio frames → same speaker
  • text: two words close in the text → same topic
  • web: link information
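Two illustrative ways of generating such pairs (sketches; the sklearn k-NN helper is one possible choice, not necessarily the authors'):

```python
def pairs_from_sequence(n_frames, offset=1):
    """Neighbor pairs (W_ij = 1) from temporal order: frames t and
    t + offset of the same stream are assumed to share a label."""
    return [(t, t + offset) for t in range(n_frames - offset)]

def pairs_from_knn(X, k=5):
    """Neighbor pairs from a k-NN graph (fine for small datasets)."""
    from sklearn.neighbors import NearestNeighbors
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)              # idx[:, 0] is the point itself
    return [(i, j) for i, row in enumerate(idx) for j in row[1:]]
```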

18. Summary
• The joint optimization of clustering and embedding gives results that are better than, or at least comparable to, existing clustering methods.
• Functional embedding allows fast training and avoids the out-of-sample problem.
• Neural nets provide a scalable and flexible framework for clustering.
