  1. ECS 231: Introduction to Spectral Clustering

  2. Motivation: image segmentation in computer vision

  3. Motivation: community detection in network analysis

  4. Outline
     I. Graph and graph Laplacian
        - Graph
        - Weighted graph
        - Graph Laplacian
     II. Graph clustering
        - Graph clustering
        - Normalized cut
        - Spectral clustering

  5. I.1 Graph
     An (undirected) graph is G = (V, E), where
     - V = {v_i} is a set of vertices;
     - E = {(v_i, v_j) : v_i, v_j ∈ V} is a subset of V × V.
     Remarks:
     - An edge is a pair {v_i, v_j} with v_i ≠ v_j (no self-loops);
     - There is at most one edge between v_i and v_j (simple graph).

  6. I.1 Graph
     - For every vertex v_i ∈ V, the degree d(v_i) of v_i is the number of
       edges incident to v_i:
           d(v_i) = |{v_j ∈ V : {v_j, v_i} ∈ E}|.
     - With d_i = d(v_i), the degree matrix is D = D(G) = diag(d_1, ..., d_n).
       For the example graph:
           D = diag(2, 3, 3, 2).

  7. I.1 Graph
     - Given a graph G = (V, E) with |V| = n and |E| = m, the incidence
       matrix D̃(G) of G is an n × m matrix with
           d̃_ij = 1 if ∃k s.t. e_j = {v_i, v_k}, and 0 otherwise.
       For the example graph:

                     e_1 e_2 e_3 e_4 e_5
              v_1  [  1   1   0   0   0  ]
       D̃(G) = v_2  [  1   0   1   1   0  ]
              v_3  [  0   1   1   0   1  ]
              v_4  [  0   0   0   1   1  ]

  8. I.1 Graph
     - Given a graph G = (V, E) with |V| = n and |E| = m, the adjacency
       matrix A(G) of G is the symmetric n × n matrix with
           a_ij = 1 if {v_i, v_j} ∈ E, and 0 otherwise.
       For the example graph:

              [ 0 1 1 0 ]
       A(G) = [ 1 0 1 1 ]
              [ 1 1 0 1 ]
              [ 0 1 1 0 ]
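The small 4-vertex example running through these slides can be checked numerically. The sketch below (not from the slides) builds A and D from the edge list read off the incidence matrix, using 0-indexed vertices:

```python
import numpy as np

# Edge list of the 4-vertex example graph, read off the incidence
# matrix: e1={v1,v2}, e2={v1,v3}, e3={v2,v3}, e4={v2,v4}, e5={v3,v4}.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
n = 4

# Adjacency matrix A: a_ij = 1 iff {v_i, v_j} is an edge.
A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i, j] = A[j, i] = 1

# Degree matrix D = diag(d_1, ..., d_n), d_i = number of incident edges.
D = np.diag(A.sum(axis=1))

print(np.diag(D))  # [2 3 3 2], matching the slide
```

Both matrices match the ones shown on slides 6 and 8.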

  9. I.2 Weighted graph
     A weighted graph is G = (V, W), where
     - V = {v_i} is a set of vertices with |V| = n;
     - W ∈ R^{n×n} is the weight matrix, with w_ij = w_ji ≥ 0 for i ≠ j and
       w_ii = 0.
     The underlying graph of G is Ĝ = (V, E) with E = {{v_i, v_j} : w_ij > 0}.
     - If w_ij ∈ {0, 1}, then W = A, the adjacency matrix;
     - since w_ii = 0, there are no self-loops in Ĝ.

  10. I.2 Weighted graph
      - For every vertex v_i ∈ V, the degree d(v_i) of v_i is the sum of the
        weights of the edges incident to v_i:
            d(v_i) = Σ_{j=1}^n w_ij.
      - With d_i = d(v_i), the degree matrix is D = D(G) = diag(d_1, ..., d_n).
      - Let d = diag(D) and 1 = (1, ..., 1)^T; then d = W·1.

  11. I.2 Weighted graph
      - Given a subset of vertices A ⊆ V, define the volume
            vol(A) = Σ_{v_i ∈ A} d(v_i) = Σ_{v_i ∈ A} Σ_{j=1}^n w_ij.
      - If vol(A) = 0, all the vertices in A are isolated.
      Example: if A = {v_1, v_3}, then
            vol(A) = d(v_1) + d(v_3) = (w_12 + w_13) + (w_31 + w_32 + w_34).

  12. I.2 Weighted graph
      - Given two subsets of vertices A, B ⊆ V, define
            links(A, B) = Σ_{v_i ∈ A, v_j ∈ B} w_ij.
      Remarks:
      - A and B need not be disjoint (they may overlap or coincide);
      - since W is symmetric, links(A, B) = links(B, A);
      - vol(A) = links(A, V).

  13. I.2 Weighted graph
      - The quantity cut(A) is defined by cut(A) = links(A, V − A).
      - The quantity assoc(A) is defined by assoc(A) = links(A, A).
      Remarks:
      - cut(A) measures how much link weight escapes from A;
      - assoc(A) measures how much link weight stays within A;
      - cut(A) + assoc(A) = vol(A).
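The quantities links, vol, cut, and assoc translate directly into a few lines of NumPy. A sketch (not from the slides) verifying cut(A) + assoc(A) = vol(A) on a random symmetric weight matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random symmetric weight matrix with zero diagonal (no self-loops).
n = 6
W = rng.random((n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)

def links(W, A, B):
    """links(A, B) = sum of w_ij over v_i in A, v_j in B."""
    return W[np.ix_(A, B)].sum()

def vol(W, A):
    """vol(A) = sum of degrees d(v_i) over v_i in A."""
    return W[A, :].sum()

A = [0, 1, 2]
comp = [i for i in range(n) if i not in A]   # V - A
cut_A = links(W, A, comp)
assoc_A = links(W, A, A)

# cut(A) + assoc(A) = vol(A)
print(np.isclose(cut_A + assoc_A, vol(W, A)))  # True
```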

  14. I.3 Graph Laplacian
      Given a weighted graph G = (V, W), the (graph) Laplacian L of G is
      defined by
            L = D − W,
      where D is the degree matrix of G, i.e. D = diag(W·1).

  15. I.3 Graph Laplacian
      Properties of the Laplacian:
      1. x^T L x = (1/2) Σ_{i,j=1}^n w_ij (x_i − x_j)^2 for all x ∈ R^n;
      2. L ⪰ 0 (positive semidefinite) if w_ij ≥ 0 for all i, j;
      3. L·1 = 0;
      4. if the underlying graph of G is connected, then
         0 = λ_1 < λ_2 ≤ λ_3 ≤ ... ≤ λ_n, where λ_i are the eigenvalues of L;
      5. if the underlying graph of G is connected, then the nullspace of L
         has dimension 1.
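Properties 1–4 are easy to confirm numerically before reading the proofs. A sketch (not from the slides) on a small dense — hence connected — weighted graph:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
W = rng.random((n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)           # weighted graph, no self-loops

D = np.diag(W.sum(axis=1))
L = D - W                          # graph Laplacian

x = rng.standard_normal(n)

# Property 1: x^T L x = (1/2) * sum_ij w_ij (x_i - x_j)^2
quad = x @ L @ x
expand = 0.5 * sum(W[i, j] * (x[i] - x[j]) ** 2
                   for i in range(n) for j in range(n))
assert np.isclose(quad, expand)

# Property 3: L * 1 = 0
assert np.allclose(L @ np.ones(n), 0.0)

# Properties 2 and 4: eigenvalues satisfy 0 = lam_1 < lam_2 <= ...
# (the graph here is dense, hence connected)
lam = np.linalg.eigvalsh(L)
print(np.isclose(lam[0], 0.0), lam[1] > 1e-10)  # True True
```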

  16. I.3 Graph Laplacian
      Proof of Property 1. Since L = D − W, we have
        x^T L x = x^T D x − x^T W x
                = Σ_{i=1}^n d_i x_i^2 − Σ_{i,j=1}^n w_ij x_i x_j
                = (1/2) (Σ_{i=1}^n d_i x_i^2 − 2 Σ_{i,j=1}^n w_ij x_i x_j + Σ_{j=1}^n d_j x_j^2)
                = (1/2) (Σ_{i,j=1}^n w_ij x_i^2 − 2 Σ_{i,j=1}^n w_ij x_i x_j + Σ_{i,j=1}^n w_ij x_j^2)
                = (1/2) Σ_{i,j=1}^n w_ij (x_i − x_j)^2.

  17. I.3 Graph Laplacian
      Proof of Property 2.
      - Since L^T = D − W^T = D − W = L, L is symmetric.
      - Since x^T L x = (1/2) Σ_{i,j=1}^n w_ij (x_i − x_j)^2 and w_ij ≥ 0 for
        all i, j, we have x^T L x ≥ 0.

  18. I.3 Graph Laplacian
      Proof of Property 3.
            L·1 = (D − W)·1 = D·1 − W·1 = d − d = 0.
      Proofs of Properties 4 and 5 are skipped; see §2.2 of [Gallier'13].

  19. Outline
      I. Graph and graph Laplacian
         - Graph
         - Weighted graph
         - Graph Laplacian
      II. Graph clustering
         - Graph clustering
         - Normalized cut
         - Spectral clustering

  20. II.1 Graph clustering
      k-way partitioning: given a weighted graph G = (V, W), find a partition
      A_1, A_2, ..., A_k of V such that
      - A_1 ∪ A_2 ∪ ... ∪ A_k = V;
      - A_i ∩ A_j = ∅ for all i ≠ j;
      - for any i ≠ j, the edges between A_i and A_j have low weight, and the
        edges within each A_i have high weight.
      If k = 2, it is a two-way partitioning.

  21. II.1 Graph clustering
      Recall the (two-way) cut:
            cut(A) = links(A, V − A) = Σ_{v_i ∈ A, v_j ∈ V−A} w_ij.

  22. II.1 Graph clustering problems
      The mincut problem is
            min_A cut(A) = min_A Σ_{v_i ∈ A, v_j ∈ V−A} w_ij.
      In practice, the mincut typically yields unbalanced partitions.
      Example: min cut(A) = 1 + 2 = 3.

  23. II.2 Normalized cut
      The normalized cut [Jianbo Shi and Jitendra Malik, 2000] is defined by
            Ncut(A) = cut(A)/vol(A) + cut(Ā)/vol(Ā),
      where Ā = V − A.
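Ncut as defined above is only a few lines of code. A sketch (not from the slides), assuming a NumPy weight matrix, showing why the normalization favors cutting a weak bridge over carving off a single vertex:

```python
import numpy as np

def ncut(W, A):
    """Ncut(A) = cut(A)/vol(A) + cut(A_bar)/vol(A_bar)."""
    n = W.shape[0]
    mask = np.zeros(n, dtype=bool)
    mask[A] = True
    cut = W[np.ix_(mask, ~mask)].sum()   # cut(A) = cut(A_bar) by symmetry
    vol_A = W[mask, :].sum()
    vol_Abar = W[~mask, :].sum()
    return cut / vol_A + cut / vol_Abar

# Two heavy edges joined by a single weak bridge.
W = np.zeros((4, 4))
W[0, 1] = W[1, 0] = 3.0   # block {v0, v1}
W[2, 3] = W[3, 2] = 3.0   # block {v2, v3}
W[1, 2] = W[2, 1] = 1.0   # weak bridge

print(ncut(W, [0, 1]))    # small: cuts only the bridge
print(ncut(W, [0]))       # larger: cuts a heavy within-block edge
```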

  24. II.2 Normalized cut
      Minimal Ncut: min_A Ncut(A).
      Example:
            min Ncut(A) = 4/(3 + 6 + 6 + 3) + 4/9.

  25. II.2 Normalized cut
      Let x = (x_1, ..., x_n) be the indicator vector with
            x_i = 1 if v_i ∈ A, and x_i = −1 if v_i ∈ Ā = V − A.
      Then:
      1. (1 + x)^T D (1 + x) = 4 Σ_{v_i ∈ A} d_i = 4·vol(A);
      2. (1 + x)^T W (1 + x) = 4 Σ_{v_i ∈ A, v_j ∈ A} w_ij = 4·assoc(A);
      3. (1 + x)^T L (1 + x) = 4·(vol(A) − assoc(A)) = 4·cut(A);
      4. (1 − x)^T D (1 − x) = 4 Σ_{v_i ∈ Ā} d_i = 4·vol(Ā);
      5. (1 − x)^T W (1 − x) = 4 Σ_{v_i ∈ Ā, v_j ∈ Ā} w_ij = 4·assoc(Ā);
      6. (1 − x)^T L (1 − x) = 4·(vol(Ā) − assoc(Ā)) = 4·cut(Ā);
      7. vol(V) = 1^T D 1.
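These identities can be checked mechanically. A sketch (not from the slides) verifying identities 1–3 and 7 on a random weighted graph with a fixed partition; identities 4–6 follow by swapping A and Ā:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
W = rng.random((n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)
D = np.diag(W.sum(axis=1))
L = D - W
one = np.ones(n)

x = np.where(np.arange(n) < 3, 1.0, -1.0)   # indicator: A = {v_0, v_1, v_2}
in_A = x > 0

vol_A = W[in_A, :].sum()
assoc_A = W[np.ix_(in_A, in_A)].sum()
cut_A = vol_A - assoc_A

# Identity 1: (1+x)^T D (1+x) = 4 vol(A)
assert np.isclose((one + x) @ D @ (one + x), 4 * vol_A)
# Identity 2: (1+x)^T W (1+x) = 4 assoc(A)
assert np.isclose((one + x) @ W @ (one + x), 4 * assoc_A)
# Identity 3: (1+x)^T L (1+x) = 4 cut(A)
assert np.isclose((one + x) @ L @ (one + x), 4 * cut_A)
# Identity 7: vol(V) = 1^T D 1
assert np.isclose(one @ D @ one, W.sum())

print("all identities hold")
```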

  26. II.2 Normalized cut
      - With the above identities, Ncut(A) can now be written as
            Ncut(A) = (1 + x)^T L (1 + x) / (4k·(1^T D 1))
                    + (1 − x)^T L (1 − x) / (4(1 − k)·(1^T D 1))
                    = ((1 + x) − b(1 − x))^T L ((1 + x) − b(1 − x)) / (4b·(1^T D 1)),
        where k = vol(A)/vol(V), b = k/(1 − k), and vol(V) = 1^T D 1.
      - Letting y = (1 + x) − b(1 − x), we have
            Ncut(A) = y^T L y / (4b·(1^T D 1)),
        where y_i = 2 if v_i ∈ A, and y_i = −2b if v_i ∈ Ā.

  27. II.2 Normalized cut
      - Since b = k/(1 − k) = vol(A)/vol(Ā), we have
            (1/4)·y^T D y = Σ_{v_i ∈ A} d_i + b^2 Σ_{v_i ∈ Ā} d_i
                          = vol(A) + b^2·vol(Ā)
                          = b·(vol(Ā) + vol(A)) = b·(1^T D 1).
      - In addition,
            y^T D 1 = y^T d = 2 Σ_{v_i ∈ A} d_i − 2b Σ_{v_i ∈ Ā} d_i
                    = 2·vol(A) − 2b·vol(Ā) = 0.

  28. II.2 Normalized cut
      In summary, the minimal normalized cut solves the binary optimization

            min_y  y^T L y / (y^T D y)                          (1)
            s.t.   y_i ∈ {2, −2b},
                   y^T D 1 = 0.

      By relaxation, we instead solve

            min_y  y^T L y / (y^T D y)                          (2)
            s.t.   y ∈ R^n,
                   y^T D 1 = 0.
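The relaxed problem (2) is a generalized Rayleigh-quotient minimization, so it can be handed to a standard generalized symmetric eigensolver. A sketch (not from the slides) using scipy.linalg.eigh, which returns D-orthonormal eigenvectors in ascending eigenvalue order:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(3)
n = 6
W = rng.random((n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)
D = np.diag(W.sum(axis=1))
L = D - W

# Generalized symmetric eigenproblem L y = lambda D y
lam, Y = eigh(L, D)

# Smallest eigenpair is (0, multiple of the all-ones vector) ...
assert np.isclose(lam[0], 0.0, atol=1e-10)

# ... so the minimum of y^T L y / y^T D y subject to y^T D 1 = 0
# is attained at the second eigenvector:
y2 = Y[:, 1]
assert np.isclose(y2 @ D @ np.ones(n), 0.0, atol=1e-8)  # constraint holds
print(np.isclose((y2 @ L @ y2) / (y2 @ D @ y2), lam[1]))  # True
```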

  29. II.2 Normalized cut
      Variational principle:
      - Let A, B ∈ R^{n×n} with A^T = A, B^T = B > 0, and let
        λ_1 ≤ λ_2 ≤ ... ≤ λ_n be the eigenvalues of Au = λBu, with
        corresponding eigenvectors u_1, u_2, ..., u_n.
      - Then
            min_x x^T A x / (x^T B x) = λ_1,  attained at x = u_1,
        and
            min_{x^T B u_1 = 0} x^T A x / (x^T B x) = λ_2,  attained at x = u_2.
      - More general forms exist.

  30. II.2 Normalized cut
      - For the matrix pair (L, D), it is known that (λ_1, y_1) = (0, 1).
      - By the variational principle, the relaxed minimal Ncut (2) is
        equivalent to finding the second smallest eigenpair (λ_2, y_2) of
            L y = λ D y.                                        (3)
      Remarks:
      - L is extremely sparse and D is diagonal;
      - the precision requirement for the eigenvectors is low, say O(10^{-3}).
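Putting the pieces together: a two-way partition of a toy graph with two dense communities, splitting on the sign of the second generalized eigenvector. This is a sketch, not the slides' algorithm; a dense solver is used for clarity, whereas for a large sparse L one would use an iterative sparse eigensolver, consistent with the remarks above:

```python
import numpy as np
from scipy.linalg import eigh

# Two 4-vertex cliques (weight 1) joined by one weak edge (weight 0.1).
n = 8
W = np.zeros((n, n))
W[:4, :4] = 1.0
W[4:, 4:] = 1.0
np.fill_diagonal(W, 0.0)
W[3, 4] = W[4, 3] = 0.1

D = np.diag(W.sum(axis=1))
L = D - W

lam, Y = eigh(L, D)            # L y = lambda D y, ascending eigenvalues
y2 = Y[:, 1]                   # second smallest eigenvector

labels = (y2 > 0).astype(int)  # split at zero
print(labels)                  # the two cliques receive different labels
```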

  31. II.2 Normalized cut
      Image segmentation: original graph. [figure]

  32. II.2 Normalized cut
      Image segmentation: heatmap of eigenvectors. [figure]

  33. II.2 Normalized cut
      Image segmentation: result of min Ncut. [figure]

  34. II.3 Spectral clustering
      Remaining issues with Ncut:
      - Once the indicator vector is computed, how do we search for the
        splitting point whose resulting partition has the minimal Ncut(A)
        value?
      - How do we use the extremal eigenvectors to do k-way partitioning?
      Both problems are addressed by the spectral clustering algorithm.
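One common answer to the k-way question, in the Shi–Malik spirit, is to embed each vertex as the row of the first k generalized eigenvectors and cluster the rows with k-means. The sketch below is an illustration under that assumption, not the slides' algorithm; it uses scipy.cluster.vq.kmeans2 (the seed argument requires SciPy ≥ 1.7):

```python
import numpy as np
from scipy.linalg import eigh
from scipy.cluster.vq import kmeans2

# Three 3-vertex cliques, weakly linked in a chain.
n = 9
W = np.zeros((n, n))
for block in (range(0, 3), range(3, 6), range(6, 9)):
    for i in block:
        for j in block:
            if i != j:
                W[i, j] = 1.0
W[2, 3] = W[3, 2] = 0.05
W[5, 6] = W[6, 5] = 0.05

D = np.diag(W.sum(axis=1))
L = D - W

k = 3
lam, Y = eigh(L, D)          # L y = lambda D y
Z = Y[:, :k]                 # embed each vertex as a row in R^k

# k-means on the embedded rows yields the k-way partition
_, labels = kmeans2(Z, k, minit='++', seed=0)
print(labels)                # one label per clique (up to permutation)
```

The three cliques map to three well-separated point clouds in R^k, so k-means recovers them regardless of how the labels are permuted.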
