1-means clustering and conductance
  1. 1-means clustering and conductance. Twan van Laarhoven, Radboud University Nijmegen, The Netherlands; Institute for Computing and Information Sciences. November 11th, 2016.

  2. Outline: Network community detection with conductance; The relation to k-means clustering; Algorithms; Experiments; Conclusions.

  3. Network community detection. Global community detection: given a network, find all tightly connected sets of nodes (communities). Local community detection: given a network and a seed node, find the community or communities containing that seed, without inspecting the whole graph.

  4. Communities as optima. Graphs $G = (V, E)$, with $a_{ij} = a_{ji} = 1$ if $(i, j) \in E$ and $0$ otherwise. Score function $\phi_G : C(G) \to \mathbb{R}$. Note: sets and vectors are treated interchangeably, so $C(G) = \mathcal{P}(V)$ or $C(G) = \mathbb{R}^V$.

  5. Conductance. Definition: the fraction of incident edges that leave the community,
     $$\phi(c) = \frac{\#\{(i, j) \in E \mid i \in c,\ j \notin c\}}{\#\{(i, j) \in E \mid i \in c,\ j \in V\}},$$
     or equivalently, with $c_i \in \{0, 1\}$,
     $$\phi(c) = 1 - \frac{\sum_{i,j \in V} c_i a_{ij} c_j}{\sum_{i,j \in V} c_i a_{ij}}.$$
     A very popular objective for finding network communities.

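A minimal NumPy sketch of this definition (the function, variable names, and the toy graph are mine, not from the talk):

```python
import numpy as np

def conductance(A, c):
    """Conductance of the community given by the 0/1 indicator vector c:
    phi(c) = 1 - sum_ij c_i a_ij c_j / sum_ij c_i a_ij."""
    inside = c @ A @ c            # edge endpoints staying inside c (directed count)
    incident = c @ A.sum(axis=1)  # all edge endpoints incident to c
    return 1.0 - inside / incident

# Example: two triangles joined by a single edge.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
c = np.array([1.0, 1, 1, 0, 0, 0])
print(conductance(A, c))          # 1 - 6/7: one of 7 incident half-edges leaves
```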

  6. Continuous optimization. As an optimization problem:
     $$\operatorname{minimize}_c\ \phi(c) \quad \text{subject to}\quad 0 \le c_i \le 1 \text{ for all } i.$$
     Karush-Kuhn-Tucker conditions: $c$ is a local optimum if $0 \le c_i \le 1$ for all $i$, and
     $$\nabla\phi(c)_i \ge 0 \ \text{if}\ c_i = 0, \qquad \nabla\phi(c)_i = 0 \ \text{if}\ 0 < c_i < 1, \qquad \nabla\phi(c)_i \le 0 \ \text{if}\ c_i = 1.$$

  7. Local optima. Local optima are discrete: if $c$ is a strict local minimum of $\phi$, then $c_i \in \{0, 1\}$ for all $i$. Proof sketch: view $\phi$ as a function of a single coordinate $c_i$,
     $$\phi(c_i) = \frac{\alpha_1 + \alpha_2 c_i + \alpha_3 c_i^2}{\alpha_4 + \alpha_5 c_i}.$$
     If $0 < c_i < 1$ and $\phi'(c_i) = 0$, then $\phi''(c_i) = 2\alpha_3 / (\alpha_4 + \alpha_5 c_i)^3 \ge 0$.
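
The quadratic-over-linear shape is easy to verify symbolically. A SymPy sketch (my own toy example; node 0 gets a self-loop so that the quadratic coefficient in the numerator is nonzero):

```python
import sympy as sp

t = sp.symbols('t')
# 4-node graph with a self-loop on node 0; only c_0 = t varies.
A = sp.Matrix([[1, 1, 1, 0],
               [1, 0, 1, 1],
               [1, 1, 0, 1],
               [0, 1, 1, 0]])
c = sp.Matrix([t, 1, 1, 0])
inside = (c.T * A * c)[0]                  # sum_ij c_i a_ij c_j
incident = (c.T * A * sp.ones(4, 1))[0]    # sum_ij c_i a_ij
phi = sp.cancel(1 - inside / incident)
num, den = sp.fraction(phi)
print(sp.degree(num, t), sp.degree(den, t))  # 2 1: quadratic over linear in c_0
```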

  8. Outline: Network community detection with conductance; The relation to k-means clustering; Algorithms; Experiments; Conclusions.

  9. k-means clustering.
     $$\operatorname{minimize}_c\ \sum_{i=1}^n \sum_{j=1}^k c_{ij} \|x_i - \mu_j\|_2^2,$$
     subject to the constraint that exactly one $c_{ij}$ is 1 for every $i$. Weighted k-means clustering adds point weights $w_i$:
     $$\operatorname{minimize}_c\ \sum_{i=1}^n \sum_{j=1}^k w_i c_{ij} \|x_i - \mu_j\|_2^2.$$
     1-means clustering:
     $$\operatorname{minimize}_c\ \sum_i w_i \left( c_i \|x_i - \mu\|_2^2 + (1 - c_i)\,\lambda_i \right).$$

  10. k-means clustering (cont.). Optimal $\mu$: fix the cluster assignment $c$; then $\mu = \sum_i w_i c_i x_i \,/\, \sum_i w_i c_i$. Optimal $c$: fix $\mu$; then $c_i$ is 1 if $\|x_i - \mu\|_2^2 < \lambda_i$, and 0 otherwise.
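
The two update rules give a simple alternating minimization for 1-means. A minimal sketch, assuming the weights $w_i$, penalties $\lambda_i$, and an initial assignment are given (all names are mine):

```python
import numpy as np

def one_means(X, w, lam, c0, iters=100):
    """Alternating minimization of sum_i w_i (c_i ||x_i - mu||^2 + (1 - c_i) lam_i).
    X: (n, d) points, w: (n,) weights, lam: (n,) penalties, c0: (n,) 0/1 start."""
    c = c0.astype(float)
    for _ in range(iters):
        # Optimal mu for fixed c (assumes the cluster stays non-empty).
        mu = (w * c) @ X / (w * c).sum()
        # Optimal c for fixed mu: c_i = 1 iff ||x_i - mu||^2 < lam_i.
        c_new = (((X - mu) ** 2).sum(axis=1) < lam).astype(float)
        if np.array_equal(c_new, c):
            break  # assignments stopped changing
        c = c_new
    return c, mu
```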

  11. Kernel k-means clustering. Kernels: $K(i, j) = \langle x_i, x_j \rangle$, so $\|x_i - x_j\|_2^2 = K(i, i) + K(j, j) - 2 K(i, j)$. Implicit centroid: the centroid is a linear combination of the points, $\mu = \sum_i \mu_i x_i$, giving
     $$\|x_i - \mu\|_2^2 = K(i, i) - 2 \sum_j \mu_j K(i, j) + \sum_{j,k} \mu_j K(j, k)\, \mu_k.$$
     The optimal $\mu$ becomes $\mu_i = w_i c_i \,/\, \sum_j w_j c_j$.
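
In code, these distances need only the kernel matrix, the weights, and the current assignment. A small sketch of the two formulas above (names are mine):

```python
import numpy as np

def kernel_dist2(K, w, c):
    """||x_i - mu||^2 for every i, computed from the kernel alone:
    K(i,i) - 2 (K mu)_i + mu^T K mu, with mu_i = w_i c_i / sum_j w_j c_j."""
    mu = (w * c) / (w * c).sum()
    return np.diag(K) - 2.0 * (K @ mu) + mu @ K @ mu
```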

  12. Kernel k-means clustering (cont.). Substituting the implicit centroid into the 1-means objective,
     $$\operatorname{minimize}_c\ \sum_i w_i \left( c_i \|x_i - \mu\|_2^2 + (1 - c_i)\,\lambda_i \right)$$
     expands to
     $$\operatorname{minimize}_c\ \sum_i w_i c_i K(i, i) - 2 \sum_{i,j} w_i c_i \mu_j K(i, j) + \sum_i w_i c_i \sum_{j,k} \mu_j K(j, k)\, \mu_k + \sum_i w_i (1 - c_i)\, \lambda_i,$$
     which, with $\mu_i = w_i c_i / \sum_j w_j c_j$, simplifies to
     $$\operatorname{minimize}_c\ \sum_i w_i c_i \left( K(i, i) - \lambda_i \right) + \sum_i w_i \lambda_i - \frac{\sum_{i,j} w_i c_i w_j c_j K(i, j)}{\sum_i w_i c_i}.$$
     Taking $\lambda_i = K(i, i)$ makes the first term vanish and the second constant, leaving (up to that constant)
     $$\operatorname{minimize}_c\ 1 - \frac{\sum_{i,j} w_i c_i w_j c_j K(i, j)}{\sum_i w_i c_i}.$$

  13. What is the kernel? Idea: take $K = W^{-1} A W^{-1}$ with $w_i = \sum_j a_{ij}$. This turns the objective into
     $$\operatorname{minimize}_c\ 1 - \frac{\sum_{i,j} c_i c_j a_{ij}}{\sum_{i,j} c_i a_{ij}} = \phi(c).$$
     We get conductance! But this kernel is not positive definite.
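
A quick numerical check of this identity on a toy graph (my own sketch, two triangles joined by one edge):

```python
import numpy as np

A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
c = np.array([1.0, 1, 1, 0, 0, 0])       # one triangle as the community

w = A.sum(axis=1)                         # w_i = sum_j a_ij
K = A / np.outer(w, w)                    # K = W^{-1} A W^{-1}

wc = w * c
obj = 1 - wc @ K @ wc / wc.sum()          # kernel 1-means objective
phi = 1 - c @ A @ c / (c @ w)             # conductance, computed directly
print(np.isclose(obj, phi))               # True: the objective is phi(c)
print(np.min(np.linalg.eigvalsh(K)))      # negative: K is not positive definite
```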

  14. Positive definite kernel. Add a diagonal: $K = W^{-1} A W^{-1} + \sigma W^{-1}$. The objective becomes
     $$\operatorname{minimize}_c\ \phi_\sigma(c) = 1 - \frac{\sum_{i,j} c_i c_j a_{ij}}{\sum_{i,j} c_i a_{ij}} - \sigma\, \frac{\sum_{i,j} c_i^2 a_{ij}}{\sum_{i,j} c_i a_{ij}}.$$
     When $c_i \in \{0, 1\}$, $c_i^2 = c_i$, so the last term is constant.
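
Continuing the sketch above, $\phi_\sigma$ is a one-liner, and the constancy of the extra term on 0/1 vectors is visible directly (the function name is mine):

```python
def phi_sigma(A, c, sigma):
    """phi_sigma(c): conductance plus the sigma-weighted diagonal term."""
    w = A.sum(axis=1)                     # w_i = sum_j a_ij
    vol = c @ w                           # sum_ij c_i a_ij
    # For c_i in {0,1} we have c**2 == c, so the last term equals sigma exactly.
    return 1 - (c @ A @ c) / vol - sigma * ((c ** 2) @ w) / vol
```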

  15. A look at local optima. Relaxing the optimization problem:
     $$\operatorname{minimize}\ \phi_\sigma(c) \quad \text{subject to}\quad 0 \le c_i \le 1 \text{ for all } i \in V. \qquad (1)$$
     Theorem: when $\sigma \ge 2$, every discrete community $c$ is a local optimum of (1). In practice: higher $\sigma$ $\Rightarrow$ more clusters are local optima.
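
One way to see the theorem at work (my own probe, not the talk's experiment): test the KKT conditions of the relaxed problem at a discrete $c$ with one-sided finite differences, reusing phi_sigma and the toy graph A from the sketches above.

```python
import numpy as np

def kkt_holds(A, c, sigma, eps=1e-6, tol=1e-9):
    """Finite-difference KKT check for minimizing phi_sigma over 0 <= c_i <= 1
    at a 0/1 vector c: no coordinate step into the box may decrease phi_sigma."""
    base = phi_sigma(A, c, sigma)
    for i in range(len(c)):
        d = np.zeros(len(c))
        d[i] = eps
        if c[i] == 0 and phi_sigma(A, c + d, sigma) < base - tol:
            return False   # phi_sigma decreases when c_i grows from 0
        if c[i] == 1 and phi_sigma(A, c - d, sigma) < base - tol:
            return False   # phi_sigma decreases when c_i shrinks from 1
    return True

print(kkt_holds(A, c, sigma=2.0))                         # expect True: sigma >= 2
print(kkt_holds(A, np.array([1.0, 0, 1, 0, 1, 0]), 2.0))  # any 0/1 set should pass
```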
