  1. Certifying the Global Optimality of Graph Cuts via Semidefinite Programming: A Theoretic Guarantee for Spectral Clustering

     Shuyang Ling, Courant Institute of Mathematical Sciences, NYU
     Joint work with Prof. Thomas Strohmer at UC Davis
     Data Science Seminar, HKUST, Aug 13, 2018

  2. Outline

     - Motivation: data clustering
     - K-means and spectral clustering
     - A graph cut perspective of spectral clustering
     - Convex relaxation of ratio cuts and normalized cuts
     - Theory and applications

  3. Data clustering and unsupervised learning

     Question: Given a set of $N$ data points in $\mathbb{R}^d$, how do we partition them into $k$ clusters based on their similarity?

  4. K-means clustering

     Cluster the data by minimizing the k-means objective function:
     $$\min_{\{\Gamma_l\}_{l=1}^k} \sum_{l=1}^{k} \sum_{i\in\Gamma_l} \Big\| x_i - \underbrace{\frac{1}{|\Gamma_l|}\sum_{j\in\Gamma_l} x_j}_{\text{centroid}} \Big\|^2$$
     where $\{\Gamma_l\}_{l=1}^k$ is a partition of $\{1,\cdots,N\}$ and the objective is the within-cluster sum of squares.

     - Widely used in vector quantization, unsupervised learning, Voronoi tessellation, etc.
     - An NP-hard problem, even if $d = 2$ [Mahajan et al. 09].
     - Heuristic method: Lloyd's algorithm [Lloyd 82].
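Since the slides cite Lloyd's algorithm as the standard heuristic, here is a minimal numpy sketch of it; the function name and defaults are mine, not from the talk.

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Minimal sketch of Lloyd's algorithm for the k-means objective.

    Alternates between assigning each point to its nearest centroid and
    recomputing centroids; converges to a local (not global) minimum.
    """
    rng = np.random.default_rng(seed)
    # Initialize centroids as k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: nearest centroid in Euclidean distance.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: centroid = mean of its cluster (keep old if empty).
        new_centroids = np.array([
            X[labels == l].mean(axis=0) if np.any(labels == l) else centroids[l]
            for l in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```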

  5. Limitation of k-means

     K-means only works well for datasets whose individual clusters are:
     - isotropic and within convex boundaries
     - well-separated

  6. Kernel k-means and nonlinear embedding

     Goal: map the data into a feature space via a nonlinear map $\varphi$ so that the clusters are well-separated and k-means works.

     How: locally-linear embedding, isomap, multidimensional scaling, Laplacian eigenmaps, diffusion maps, etc.

     Focus: We will focus on Laplacian eigenmaps. Spectral clustering consists of a Laplacian eigenmap followed by k-means clustering.

  7. Graph Laplacian

     Suppose $\{x_i\}_{i=1}^N \subset \mathbb{R}^d$; construct a similarity (weight) matrix $W \in \mathbb{R}^{N\times N}$ via
     $$w_{ij} := \exp\Big(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\Big),$$
     where $\sigma$ controls the size of the neighborhood. In fact, $W$ represents a weighted undirected graph.

     Definition of graph Laplacian: The (unnormalized) graph Laplacian associated to $W$ is $L = D - W$, where $D = \operatorname{diag}(W 1_N)$ is the degree matrix.
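A small numpy sketch of this construction (the helper name `graph_laplacian` is mine); note that the self-similarity $w_{ii} = 1$ cancels in $L = D - W$:

```python
import numpy as np

def graph_laplacian(X, sigma=1.0):
    """Gaussian similarity matrix W and unnormalized Laplacian L = D - W."""
    # Pairwise squared distances ||x_i - x_j||^2, shape (N, N).
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    W = np.exp(-sq_dists / (2 * sigma ** 2))
    D = np.diag(W.sum(axis=1))  # degree matrix D = diag(W 1_N)
    return W, D - W
```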

  8. Properties of graph Laplacian

     The (unnormalized) graph Laplacian associated to $W$ is $L = D - W$, $D = \operatorname{diag}(W 1_N)$.

     Properties:
     - $L$ is positive semidefinite: $z^\top L z = \sum_{i<j} w_{ij}(z_i - z_j)^2 \ge 0$.
     - $1_N$ is in the null space of $L$, i.e., $\lambda_1(L) = 0$.
     - $\lambda_2(L) > 0$ if and only if the graph is connected.
     - The dimension of the null space equals the number of connected components.
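A quick numerical check of the first two properties, reusing the `graph_laplacian` sketch above:

```python
import numpy as np

rng = np.random.default_rng(0)
W, L = graph_laplacian(rng.standard_normal((20, 2)), sigma=1.0)
z = rng.standard_normal(20)

# z^T L z equals the weighted sum of squared differences over i < j.
pairwise = sum(W[i, j] * (z[i] - z[j]) ** 2
               for i in range(20) for j in range(i + 1, 20))
assert np.isclose(z @ L @ z, pairwise)

# 1_N lies in the null space of L, so lambda_1(L) = 0.
assert np.allclose(L @ np.ones(20), 0)
```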

  9. Laplacian eigenmaps and k-means

     For the graph Laplacian $L$, we define the Laplacian eigenmap by
     $$\begin{bmatrix} \varphi(x_1) \\ \vdots \\ \varphi(x_N) \end{bmatrix} := \underbrace{[u_1, \cdots, u_k]}_{U} \in \mathbb{R}^{N\times k}$$
     where $\{u_l\}_{l=1}^k$ are the eigenvectors w.r.t. the $k$ smallest eigenvalues. In other words, $\varphi : x_i \mapsto \varphi(x_i)$ maps data in $\mathbb{R}^d$ to a coordinate in $\mathbb{R}^k$ expressed in terms of eigenvectors. Then we apply k-means to $\{\varphi(x_i)\}_{i=1}^N$ to perform clustering.

  10. Algorithm of spectral clustering based on graph Laplacian

     Unnormalized spectral clustering:
     1. Input: the number of clusters $k$ and a dataset $\{x_i\}_{i=1}^N$; construct the similarity matrix $W$ from $\{x_i\}_{i=1}^N$.
     2. Compute the unnormalized graph Laplacian $L = D - W$.
     3. Compute the eigenvectors $\{u_l\}_{l=1}^k$ of $L$ w.r.t. the smallest $k$ eigenvalues. Let $U = [u_1, u_2, \cdots, u_k] \in \mathbb{R}^{N\times k}$.
     4. Perform k-means clustering on the rows of $U$ using Lloyd's algorithm.
     5. Obtain the partition from the outcome of k-means.

     For more details, see the excellent review by [Von Luxburg, 2007].
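Combining the pieces gives a compact sketch of the whole pipeline, using scipy's `eigh` for the smallest-$k$ eigenvectors and the `graph_laplacian` and `lloyd_kmeans` helpers sketched earlier:

```python
from scipy.linalg import eigh

def unnormalized_spectral_clustering(X, k, sigma=1.0):
    """Laplacian eigenmap followed by k-means, as in the algorithm above."""
    W, L = graph_laplacian(X, sigma)
    # Eigenvectors for the k smallest eigenvalues of L (columns of U).
    _, U = eigh(L, subset_by_index=[0, k - 1])
    # Cluster the rows of U, i.e., the embedded points phi(x_i) in R^k.
    labels, _ = lloyd_kmeans(U, k)
    return labels
```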

  11. Another variant of spectral clustering

     Normalized spectral clustering:
     1. Input: the number of clusters $k$ and a dataset $\{x_i\}_{i=1}^N$; construct the similarity matrix $W$ from $\{x_i\}_{i=1}^N$.
     2. Compute the normalized graph Laplacian $L_{\mathrm{sym}} = I_N - D^{-\frac{1}{2}} W D^{-\frac{1}{2}} = D^{-\frac{1}{2}} L D^{-\frac{1}{2}}$.
     3. Compute the eigenvectors $\{u_l\}_{l=1}^k$ of $L_{\mathrm{sym}}$ w.r.t. the smallest $k$ eigenvalues. Let $U = [u_1, u_2, \cdots, u_k] \in \mathbb{R}^{N\times k}$.
     4. Perform k-means clustering on the rows of $D^{-\frac{1}{2}} U$ using Lloyd's algorithm.
     5. Obtain the partition from the outcome of k-means.
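The normalized variant differs only in the Laplacian and in clustering the rows of $D^{-1/2} U$; a sketch under the same assumptions as above:

```python
import numpy as np
from scipy.linalg import eigh

def normalized_spectral_clustering(X, k, sigma=1.0):
    """Eigenvectors of L_sym = D^{-1/2} L D^{-1/2}, then k-means on D^{-1/2} U."""
    W, L = graph_laplacian(X, sigma)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # D^{-1/2} L D^{-1/2} via row and column scaling.
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    _, U = eigh(L_sym, subset_by_index=[0, k - 1])
    # Cluster the rows of D^{-1/2} U.
    labels, _ = lloyd_kmeans(d_inv_sqrt[:, None] * U, k)
    return labels
```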

  12. Comments on spectral clustering

     Pros:
     - Spectral clustering enjoys high popularity and conveniently applies to various settings.
     - Rich connections to random walks on graphs, spectral graph theory, and differential geometry.

     Cons:
     - Rigorous justifications of spectral clustering are still lacking. The two-step procedure complicates the analysis: how do we analyze the performance of the Laplacian eigenmap and the convergence of k-means?

     Our goal: we take a different route, looking at a convex relaxation of spectral clustering to better understand its performance.

  13. A graph cut perspective

     Key observation: The matrix $W$ is viewed as the weight matrix of a graph with $N$ vertices. Partitioning the dataset into $k$ clusters is equivalent to finding a $k$-way graph cut such that any pair of induced subgraphs is not well-connected.

     Graph cut: The cut is defined as the total weight of edges whose two ends are in different subsets,
     $$\operatorname{cut}(\Gamma, \Gamma^c) := \sum_{i\in\Gamma,\, j\in\Gamma^c} w_{ij}$$
     where $\Gamma$ is a subset of vertices and $\Gamma^c$ is its complement.
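In code, the cut is one masked sum over the weight matrix (a sketch; `gamma` is a boolean membership mask):

```python
import numpy as np

def cut(W, gamma):
    """cut(Gamma, Gamma^c): total weight of edges crossing the partition.

    W: (N, N) symmetric weight matrix; gamma: boolean mask of length N.
    """
    return W[np.ix_(gamma, ~gamma)].sum()
```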

  14. A graph cut perspective

     Recall $\operatorname{cut}(\Gamma, \Gamma^c) := \sum_{i\in\Gamma,\, j\in\Gamma^c} w_{ij}$. However, minimizing $\operatorname{cut}(\Gamma, \Gamma^c)$ may not lead to satisfactory results, since it is likely to produce an imbalanced cut.

  15. RatioCut

     The ratio cut of $\{\Gamma_a\}_{a=1}^k$ is given by
     $$\operatorname{RatioCut}(\{\Gamma_a\}_{a=1}^k) = \sum_{a=1}^{k} \frac{\operatorname{cut}(\Gamma_a, \Gamma_a^c)}{|\Gamma_a|}.$$
     In particular, if $k = 2$,
     $$\operatorname{RatioCut}(\Gamma, \Gamma^c) = \frac{\operatorname{cut}(\Gamma, \Gamma^c)}{|\Gamma|} + \frac{\operatorname{cut}(\Gamma, \Gamma^c)}{|\Gamma^c|}.$$
     It is worth noting that minimizing RatioCut is NP-hard. A possible solution: relax!
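Building on the `cut` helper above, RatioCut is a short sketch away (`labels` is an integer cluster assignment of length $N$):

```python
import numpy as np

def ratio_cut(W, labels, k):
    """RatioCut({Gamma_a}) = sum_a cut(Gamma_a, Gamma_a^c) / |Gamma_a|."""
    total = 0.0
    for a in range(k):
        gamma = (labels == a)            # membership mask of cluster a
        total += cut(W, gamma) / gamma.sum()
    return total
```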

  16. RatioCut and graph Laplacian

     Let $1_{\Gamma_a}$ be the indicator vector in $\mathbb{R}^N$ with
     $$1_{\Gamma_a}(l) = \begin{cases} 1, & l \in \Gamma_a, \\ 0, & l \notin \Gamma_a. \end{cases}$$

     Relating RatioCut to the graph Laplacian: there holds
     $$\operatorname{cut}(\Gamma_a, \Gamma_a^c) = 1_{\Gamma_a}^\top L 1_{\Gamma_a} = \langle L, 1_{\Gamma_a} 1_{\Gamma_a}^\top \rangle$$
     $$\operatorname{RatioCut}(\{\Gamma_a\}_{a=1}^k) = \sum_{a=1}^{k} \frac{1}{|\Gamma_a|} \langle L, 1_{\Gamma_a} 1_{\Gamma_a}^\top \rangle = \langle L, X_{\mathrm{rcut}} \rangle,$$
     where
     $$X_{\mathrm{rcut}} := \sum_{a=1}^{k} \frac{1}{|\Gamma_a|} 1_{\Gamma_a} 1_{\Gamma_a}^\top \quad \leftarrow \text{a block-diagonal matrix}$$
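A numerical sanity check of the identity $\langle L, X_{\mathrm{rcut}} \rangle = \operatorname{RatioCut}$, assuming the helpers sketched earlier:

```python
import numpy as np

def x_rcut(labels, k):
    """X_rcut = sum_a (1/|Gamma_a|) 1_{Gamma_a} 1_{Gamma_a}^T."""
    X = np.zeros((len(labels), len(labels)))
    for a in range(k):
        ind = (labels == a).astype(float)   # indicator vector 1_{Gamma_a}
        X += np.outer(ind, ind) / ind.sum()
    return X

rng = np.random.default_rng(1)
W, L = graph_laplacian(rng.standard_normal((15, 2)), sigma=1.0)
labels = np.repeat([0, 1, 2], 5)            # an arbitrary 3-way partition
# Frobenius inner product <L, X_rcut> matches RatioCut.
assert np.isclose(np.sum(L * x_rcut(labels, 3)), ratio_cut(W, labels, 3))
```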

  17. Spectral relaxation of RatioCut

     Spectral clustering is a relaxation based on two properties of $X_{\mathrm{rcut}}$:
     $$X_{\mathrm{rcut}} = U U^\top, \qquad U^\top U = I_k, \qquad U \in \mathbb{R}^{N\times k}.$$

     Spectral relaxation of RatioCut: substituting $X_{\mathrm{rcut}} = U U^\top$ and dropping the combinatorial structure results in
     $$\min_{U\in\mathbb{R}^{N\times k}} \langle L, U U^\top \rangle \quad \text{s.t.} \quad U^\top U = I_k,$$
     whose global minimizer is easily found by computing the eigenvectors w.r.t. the $k$ smallest eigenvalues of the graph Laplacian $L$. The spectral relaxation gives exactly the first step of unnormalized spectral clustering.
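One can check numerically that the eigenvector solution is feasible and attains the sum of the $k$ smallest eigenvalues, which is the optimal value of the relaxed problem (again reusing the earlier helpers):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(2)
W, L = graph_laplacian(rng.standard_normal((15, 2)), sigma=1.0)
k = 3
vals, U = eigh(L, subset_by_index=[0, k - 1])   # k smallest eigenpairs
assert np.allclose(U.T @ U, np.eye(k))          # feasibility: U^T U = I_k
# <L, U U^T> = trace(U^T L U) = sum of the k smallest eigenvalues.
assert np.isclose(np.sum(L * (U @ U.T)), vals.sum())
```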

  18. Normalized cut

     For normalized spectral clustering, we consider
     $$\operatorname{NCut}(\{\Gamma_a\}_{a=1}^k) := \sum_{a=1}^{k} \frac{\operatorname{cut}(\Gamma_a, \Gamma_a^c)}{\operatorname{vol}(\Gamma_a)}$$
     where $\operatorname{vol}(\Gamma_a) = 1_{\Gamma_a}^\top D 1_{\Gamma_a}$. Therefore,
     $$\operatorname{NCut}(\{\Gamma_a\}_{a=1}^k) = \langle L_{\mathrm{sym}}, X_{\mathrm{ncut}} \rangle.$$
     Here $L_{\mathrm{sym}} = D^{-\frac{1}{2}} L D^{-\frac{1}{2}}$ is the normalized Laplacian and
     $$X_{\mathrm{ncut}} := \sum_{a=1}^{k} \frac{1}{1_{\Gamma_a}^\top D 1_{\Gamma_a}} D^{\frac{1}{2}} 1_{\Gamma_a} 1_{\Gamma_a}^\top D^{\frac{1}{2}}.$$
     Relaxing $X_{\mathrm{ncut}} = U U^\top$ gives the spectral relaxation for the normalized graph Laplacian.
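The same sanity check works for the normalized cut; a sketch reusing the earlier `cut` and `graph_laplacian` helpers:

```python
import numpy as np

def x_ncut(W, labels, k):
    """X_ncut = sum_a D^{1/2} 1_a 1_a^T D^{1/2} / vol(Gamma_a)."""
    d = W.sum(axis=1)
    X = np.zeros_like(W)
    for a in range(k):
        ind = (labels == a).astype(float)
        v = np.sqrt(d) * ind                 # D^{1/2} 1_{Gamma_a}
        X += np.outer(v, v) / (d @ ind)      # vol(Gamma_a) = 1_a^T D 1_a
    return X

def ncut(W, labels, k):
    d = W.sum(axis=1)
    return sum(cut(W, labels == a) / d[labels == a].sum() for a in range(k))

rng = np.random.default_rng(3)
W, L = graph_laplacian(rng.standard_normal((15, 2)), sigma=1.0)
labels = np.repeat([0, 1, 2], 5)
d = W.sum(axis=1)
L_sym = (d ** -0.5)[:, None] * L * (d ** -0.5)[None, :]
# Frobenius inner product <L_sym, X_ncut> matches NCut.
assert np.isclose(np.sum(L_sym * x_ncut(W, labels, 3)), ncut(W, labels, 3))
```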
