  1. Certifying the Global Optimality of Graph Cuts via Semidefinite Programming: A Theoretic Guarantee for Spectral Clustering

     Shuyang Ling, Courant Institute of Mathematical Sciences, NYU
     Joint work with Prof. Thomas Strohmer at UC Davis
     Data Science Seminar, HKUST, Aug 13, 2018

  2. Outline

     - Motivation: data clustering
     - K-means and spectral clustering
     - A graph cut perspective of spectral clustering
     - Convex relaxation of ratio cuts and normalized cuts
     - Theory and applications

  3. Data clustering and unsupervised learning

     Question: Given a set of $N$ data points in $\mathbb{R}^d$, how do we partition them into $k$ clusters based on their similarity?

  4. K-means clustering

     Cluster the data by minimizing the k-means objective function:
     $$\min_{\{\Gamma_l\}_{l=1}^k} \sum_{l=1}^{k} \sum_{i\in\Gamma_l} \Big\| x_i - \underbrace{\frac{1}{|\Gamma_l|}\sum_{j\in\Gamma_l} x_j}_{\text{centroid}} \Big\|^2$$
     where $\{\Gamma_l\}_{l=1}^k$ is a partition of $\{1,\cdots,N\}$ and the objective is the within-cluster sum of squares.

     - Widely used in vector quantization, unsupervised learning, Voronoi tessellation, etc.
     - An NP-hard problem, even if $d = 2$ [Mahajan et al. 09].
     - Heuristic method: Lloyd's algorithm [Lloyd 82].
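Since the slides cite Lloyd's algorithm as the standard heuristic, here is a minimal numpy sketch of it; the function name and defaults are mine, not from the talk.

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Minimal sketch of Lloyd's algorithm for the k-means objective.

    Alternates between assigning each point to its nearest centroid and
    recomputing centroids; converges to a local (not global) minimum.
    """
    rng = np.random.default_rng(seed)
    # Initialize centroids as k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: nearest centroid in Euclidean distance.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: centroid = mean of its cluster (keep old if empty).
        new_centroids = np.array([
            X[labels == l].mean(axis=0) if np.any(labels == l) else centroids[l]
            for l in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```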

  5. Limitation of k-means

     K-means only works well for datasets whose individual clusters are:
     - isotropic and within convex boundaries
     - well-separated

  6. Kernel k-means and nonlinear embedding

     Goal: map the data into a feature space via a nonlinear map $\varphi$ so that the clusters are well-separated and k-means works.

     How: locally-linear embedding, isomap, multidimensional scaling, Laplacian eigenmaps, diffusion maps, etc.

     Focus: We will focus on Laplacian eigenmaps. Spectral clustering consists of a Laplacian eigenmap followed by k-means clustering.

  7. Graph Laplacian

     Suppose $\{x_i\}_{i=1}^N \subset \mathbb{R}^d$; construct a similarity (weight) matrix $W \in \mathbb{R}^{N\times N}$ via
     $$w_{ij} := \exp\Big(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\Big),$$
     where $\sigma$ controls the size of the neighborhood. In fact, $W$ represents a weighted undirected graph.

     Definition of graph Laplacian: The (unnormalized) graph Laplacian associated to $W$ is $L = D - W$, where $D = \operatorname{diag}(W 1_N)$ is the degree matrix.
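A small numpy sketch of this construction (the helper name `graph_laplacian` is mine); note that the self-similarity $w_{ii} = 1$ cancels in $L = D - W$:

```python
import numpy as np

def graph_laplacian(X, sigma=1.0):
    """Gaussian similarity matrix W and unnormalized Laplacian L = D - W."""
    # Pairwise squared distances ||x_i - x_j||^2, shape (N, N).
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    W = np.exp(-sq_dists / (2 * sigma ** 2))
    D = np.diag(W.sum(axis=1))  # degree matrix D = diag(W 1_N)
    return W, D - W
```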

  8. Properties of graph Laplacian

     The (unnormalized) graph Laplacian associated to $W$ is $L = D - W$, $D = \operatorname{diag}(W 1_N)$.

     Properties:
     - $L$ is positive semidefinite: $z^\top L z = \sum_{i<j} w_{ij}(z_i - z_j)^2 \ge 0$.
     - $1_N$ is in the null space of $L$, i.e., $\lambda_1(L) = 0$.
     - $\lambda_2(L) > 0$ if and only if the graph is connected.
     - The dimension of the null space equals the number of connected components.
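A quick numerical check of the first two properties, reusing the `graph_laplacian` sketch above:

```python
import numpy as np

rng = np.random.default_rng(0)
W, L = graph_laplacian(rng.standard_normal((20, 2)), sigma=1.0)
z = rng.standard_normal(20)

# z^T L z equals the weighted sum of squared differences over i < j.
pairwise = sum(W[i, j] * (z[i] - z[j]) ** 2
               for i in range(20) for j in range(i + 1, 20))
assert np.isclose(z @ L @ z, pairwise)

# 1_N lies in the null space of L, so lambda_1(L) = 0.
assert np.allclose(L @ np.ones(20), 0)
```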

  9. Laplacian eigenmaps and k-means

     For the graph Laplacian $L$, we define the Laplacian eigenmap by
     $$\begin{bmatrix} \varphi(x_1) \\ \vdots \\ \varphi(x_N) \end{bmatrix} := \underbrace{[u_1, \cdots, u_k]}_{U} \in \mathbb{R}^{N\times k}$$
     where $\{u_l\}_{l=1}^k$ are the eigenvectors w.r.t. the $k$ smallest eigenvalues. In other words, $\varphi : x_i \mapsto \varphi(x_i)$ maps data in $\mathbb{R}^d$ to a coordinate in $\mathbb{R}^k$ expressed in terms of eigenvectors. Then we apply k-means to $\{\varphi(x_i)\}_{i=1}^N$ to perform clustering.

  10. Algorithm of spectral clustering based on graph Laplacian

     Unnormalized spectral clustering:
     1. Input: the number of clusters $k$ and a dataset $\{x_i\}_{i=1}^N$; construct the similarity matrix $W$ from $\{x_i\}_{i=1}^N$.
     2. Compute the unnormalized graph Laplacian $L = D - W$.
     3. Compute the eigenvectors $\{u_l\}_{l=1}^k$ of $L$ w.r.t. the smallest $k$ eigenvalues. Let $U = [u_1, u_2, \cdots, u_k] \in \mathbb{R}^{N\times k}$.
     4. Perform k-means clustering on the rows of $U$ using Lloyd's algorithm.
     5. Obtain the partition from the outcome of k-means.

     For more details, see the excellent review by [Von Luxburg, 2007].
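Combining the pieces gives a compact sketch of the whole pipeline, using scipy's `eigh` for the smallest-$k$ eigenvectors and the `graph_laplacian` and `lloyd_kmeans` helpers sketched earlier:

```python
from scipy.linalg import eigh

def unnormalized_spectral_clustering(X, k, sigma=1.0):
    """Laplacian eigenmap followed by k-means, as in the algorithm above."""
    W, L = graph_laplacian(X, sigma)
    # Eigenvectors for the k smallest eigenvalues of L (columns of U).
    _, U = eigh(L, subset_by_index=[0, k - 1])
    # Cluster the rows of U, i.e., the embedded points phi(x_i) in R^k.
    labels, _ = lloyd_kmeans(U, k)
    return labels
```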

  11. Another variant of spectral clustering

     Normalized spectral clustering:
     1. Input: the number of clusters $k$ and a dataset $\{x_i\}_{i=1}^N$; construct the similarity matrix $W$ from $\{x_i\}_{i=1}^N$.
     2. Compute the normalized graph Laplacian $L_{\mathrm{sym}} = I_N - D^{-\frac{1}{2}} W D^{-\frac{1}{2}} = D^{-\frac{1}{2}} L D^{-\frac{1}{2}}$.
     3. Compute the eigenvectors $\{u_l\}_{l=1}^k$ of $L_{\mathrm{sym}}$ w.r.t. the smallest $k$ eigenvalues. Let $U = [u_1, u_2, \cdots, u_k] \in \mathbb{R}^{N\times k}$.
     4. Perform k-means clustering on the rows of $D^{-\frac{1}{2}} U$ using Lloyd's algorithm.
     5. Obtain the partition from the outcome of k-means.
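The normalized variant differs only in the Laplacian and in clustering the rows of $D^{-1/2} U$; a sketch under the same assumptions as above:

```python
import numpy as np
from scipy.linalg import eigh

def normalized_spectral_clustering(X, k, sigma=1.0):
    """Eigenvectors of L_sym = D^{-1/2} L D^{-1/2}, then k-means on D^{-1/2} U."""
    W, L = graph_laplacian(X, sigma)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # D^{-1/2} L D^{-1/2} via row and column scaling.
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    _, U = eigh(L_sym, subset_by_index=[0, k - 1])
    # Cluster the rows of D^{-1/2} U.
    labels, _ = lloyd_kmeans(d_inv_sqrt[:, None] * U, k)
    return labels
```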

  12. Comments on spectral clustering

     Pros:
     - Spectral clustering enjoys high popularity and conveniently applies to various settings.
     - Rich connections to random walks on graphs, spectral graph theory, and differential geometry.

     Cons:
     - Rigorous justifications of spectral clustering are still lacking. The two-step procedure complicates the analysis: how do we analyze the performance of the Laplacian eigenmap and the convergence of k-means?

     Our goal: we take a different route, looking at a convex relaxation of spectral clustering to better understand its performance.

  13. A graph cut perspective

     Key observation: The matrix $W$ is viewed as the weight matrix of a graph with $N$ vertices. Partitioning the dataset into $k$ clusters is equivalent to finding a $k$-way graph cut such that any pair of induced subgraphs is not well-connected.

     Graph cut: The cut is defined as the total weight of edges whose two ends are in different subsets,
     $$\operatorname{cut}(\Gamma, \Gamma^c) := \sum_{i\in\Gamma,\, j\in\Gamma^c} w_{ij}$$
     where $\Gamma$ is a subset of vertices and $\Gamma^c$ is its complement.
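In code, the cut is one masked sum over the weight matrix (a sketch; `gamma` is a boolean membership mask):

```python
import numpy as np

def cut(W, gamma):
    """cut(Gamma, Gamma^c): total weight of edges crossing the partition.

    W: (N, N) symmetric weight matrix; gamma: boolean mask of length N.
    """
    return W[np.ix_(gamma, ~gamma)].sum()
```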

  14. A graph cut perspective

     Recall $\operatorname{cut}(\Gamma, \Gamma^c) := \sum_{i\in\Gamma,\, j\in\Gamma^c} w_{ij}$. However, minimizing $\operatorname{cut}(\Gamma, \Gamma^c)$ may not lead to satisfactory results, since it is likely to produce an imbalanced cut.

  15. RatioCut

     The ratio cut of $\{\Gamma_a\}_{a=1}^k$ is given by
     $$\operatorname{RatioCut}(\{\Gamma_a\}_{a=1}^k) = \sum_{a=1}^{k} \frac{\operatorname{cut}(\Gamma_a, \Gamma_a^c)}{|\Gamma_a|}.$$
     In particular, if $k = 2$,
     $$\operatorname{RatioCut}(\Gamma, \Gamma^c) = \frac{\operatorname{cut}(\Gamma, \Gamma^c)}{|\Gamma|} + \frac{\operatorname{cut}(\Gamma, \Gamma^c)}{|\Gamma^c|}.$$
     It is worth noting that minimizing RatioCut is NP-hard. A possible solution: relax!
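Building on the `cut` helper above, RatioCut is a short sketch away (`labels` is an integer cluster assignment of length $N$):

```python
import numpy as np

def ratio_cut(W, labels, k):
    """RatioCut({Gamma_a}) = sum_a cut(Gamma_a, Gamma_a^c) / |Gamma_a|."""
    total = 0.0
    for a in range(k):
        gamma = (labels == a)            # membership mask of cluster a
        total += cut(W, gamma) / gamma.sum()
    return total
```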

  16. RatioCut and graph Laplacian

     Let $1_{\Gamma_a}$ be the indicator vector in $\mathbb{R}^N$ with
     $$1_{\Gamma_a}(l) = \begin{cases} 1, & l \in \Gamma_a, \\ 0, & l \notin \Gamma_a. \end{cases}$$

     Relating RatioCut to the graph Laplacian: there holds
     $$\operatorname{cut}(\Gamma_a, \Gamma_a^c) = 1_{\Gamma_a}^\top L 1_{\Gamma_a} = \langle L, 1_{\Gamma_a} 1_{\Gamma_a}^\top \rangle$$
     $$\operatorname{RatioCut}(\{\Gamma_a\}_{a=1}^k) = \sum_{a=1}^{k} \frac{1}{|\Gamma_a|} \langle L, 1_{\Gamma_a} 1_{\Gamma_a}^\top \rangle = \langle L, X_{\mathrm{rcut}} \rangle,$$
     where
     $$X_{\mathrm{rcut}} := \sum_{a=1}^{k} \frac{1}{|\Gamma_a|} 1_{\Gamma_a} 1_{\Gamma_a}^\top \quad \leftarrow \text{a block-diagonal matrix}$$
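A numerical sanity check of the identity $\langle L, X_{\mathrm{rcut}} \rangle = \operatorname{RatioCut}$, assuming the helpers sketched earlier:

```python
import numpy as np

def x_rcut(labels, k):
    """X_rcut = sum_a (1/|Gamma_a|) 1_{Gamma_a} 1_{Gamma_a}^T."""
    X = np.zeros((len(labels), len(labels)))
    for a in range(k):
        ind = (labels == a).astype(float)   # indicator vector 1_{Gamma_a}
        X += np.outer(ind, ind) / ind.sum()
    return X

rng = np.random.default_rng(1)
W, L = graph_laplacian(rng.standard_normal((15, 2)), sigma=1.0)
labels = np.repeat([0, 1, 2], 5)            # an arbitrary 3-way partition
# Frobenius inner product <L, X_rcut> matches RatioCut.
assert np.isclose(np.sum(L * x_rcut(labels, 3)), ratio_cut(W, labels, 3))
```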

  17. Spectral relaxation of RatioCut

     Spectral clustering is a relaxation based on two properties of $X_{\mathrm{rcut}}$:
     $$X_{\mathrm{rcut}} = U U^\top, \qquad U^\top U = I_k, \qquad U \in \mathbb{R}^{N\times k}.$$

     Spectral relaxation of RatioCut: substituting $X_{\mathrm{rcut}} = U U^\top$ and dropping the combinatorial structure results in
     $$\min_{U\in\mathbb{R}^{N\times k}} \langle L, U U^\top \rangle \quad \text{s.t.} \quad U^\top U = I_k,$$
     whose global minimizer is easily found by computing the eigenvectors w.r.t. the $k$ smallest eigenvalues of the graph Laplacian $L$. The spectral relaxation gives exactly the first step of unnormalized spectral clustering.
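One can check numerically that the eigenvector solution is feasible and attains the sum of the $k$ smallest eigenvalues, which is the optimal value of the relaxed problem (again reusing the earlier helpers):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(2)
W, L = graph_laplacian(rng.standard_normal((15, 2)), sigma=1.0)
k = 3
vals, U = eigh(L, subset_by_index=[0, k - 1])   # k smallest eigenpairs
assert np.allclose(U.T @ U, np.eye(k))          # feasibility: U^T U = I_k
# <L, U U^T> = trace(U^T L U) = sum of the k smallest eigenvalues.
assert np.isclose(np.sum(L * (U @ U.T)), vals.sum())
```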

  18. Normalized cut

     For normalized spectral clustering, we consider
     $$\operatorname{NCut}(\{\Gamma_a\}_{a=1}^k) := \sum_{a=1}^{k} \frac{\operatorname{cut}(\Gamma_a, \Gamma_a^c)}{\operatorname{vol}(\Gamma_a)}$$
     where $\operatorname{vol}(\Gamma_a) = 1_{\Gamma_a}^\top D 1_{\Gamma_a}$. Therefore,
     $$\operatorname{NCut}(\{\Gamma_a\}_{a=1}^k) = \langle L_{\mathrm{sym}}, X_{\mathrm{ncut}} \rangle.$$
     Here $L_{\mathrm{sym}} = D^{-\frac{1}{2}} L D^{-\frac{1}{2}}$ is the normalized Laplacian and
     $$X_{\mathrm{ncut}} := \sum_{a=1}^{k} \frac{1}{1_{\Gamma_a}^\top D 1_{\Gamma_a}} D^{\frac{1}{2}} 1_{\Gamma_a} 1_{\Gamma_a}^\top D^{\frac{1}{2}}.$$
     Relaxing $X_{\mathrm{ncut}} = U U^\top$ gives the spectral relaxation for the normalized graph Laplacian.
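The same sanity check works for the normalized cut; a sketch reusing the earlier `cut` and `graph_laplacian` helpers:

```python
import numpy as np

def x_ncut(W, labels, k):
    """X_ncut = sum_a D^{1/2} 1_a 1_a^T D^{1/2} / vol(Gamma_a)."""
    d = W.sum(axis=1)
    X = np.zeros_like(W)
    for a in range(k):
        ind = (labels == a).astype(float)
        v = np.sqrt(d) * ind                 # D^{1/2} 1_{Gamma_a}
        X += np.outer(v, v) / (d @ ind)      # vol(Gamma_a) = 1_a^T D 1_a
    return X

def ncut(W, labels, k):
    d = W.sum(axis=1)
    return sum(cut(W, labels == a) / d[labels == a].sum() for a in range(k))

rng = np.random.default_rng(3)
W, L = graph_laplacian(rng.standard_normal((15, 2)), sigma=1.0)
labels = np.repeat([0, 1, 2], 5)
d = W.sum(axis=1)
L_sym = (d ** -0.5)[:, None] * L * (d ** -0.5)[None, :]
# Frobenius inner product <L_sym, X_ncut> matches NCut.
assert np.isclose(np.sum(L_sym * x_ncut(W, labels, 3)), ncut(W, labels, 3))
```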
