SLIDE 1
Outline
Weighted Graph Cuts without Eigenvectors: A Multilevel Approach (PAMI 2007)
User-Guided Large Attributed Graph Clustering with Multiple Sparse Annotations (PAKDD 2016)
SLIDE 2
SLIDE 3
Problem Definition
Clustering nonlinearly separable data:
1. Kernel k-means
2. Spectral clustering
Goal: design a fast graph clustering method. Computing eigenvectors is expensive in large graphs.
SLIDE 4
k-MEANS
Given a set of vectors a_1, a_2, . . . , a_n, the k-means algorithm seeks clusters π_1, π_2, . . . , π_k that minimize the objective function
D({π_c}) = Σ_{c=1..k} Σ_{a_i ∈ π_c} ||a_i − m_c||²
where m_c = (Σ_{a_i ∈ π_c} a_i) / |π_c| is the centroid, or mean, of cluster π_c.
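A minimal NumPy sketch of this objective, for concreteness (function name and array layout are illustrative, not from the paper; rows of X are the vectors a_i):

```python
import numpy as np

def kmeans_objective(X, labels, k):
    """Sum of squared distances of each point to its cluster centroid."""
    total = 0.0
    for c in range(k):
        members = X[labels == c]       # points assigned to cluster pi_c
        if len(members) == 0:
            continue
        m_c = members.mean(axis=0)     # centroid m_c of cluster pi_c
        total += ((members - m_c) ** 2).sum()
    return total
```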
SLIDE 5
KERNEL k-MEANS
To allow nonlinear separators, we use a kernel (an implicit mapping φ to a higher-dimensional space). The squared distance ||φ(a_i) − m_c||² may be rewritten as
K_ii − (2 Σ_{a_j ∈ π_c} K_ij) / |π_c| + (Σ_{a_j, a_l ∈ π_c} K_jl) / |π_c|²
so φ is never needed explicitly; we just need the kernel matrix K, where K_ij = φ(a_i) · φ(a_j).
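A direct NumPy transcription of that identity (the helper name is mine, not from the paper):

```python
import numpy as np

def kernel_distance(K, i, members):
    """||phi(a_i) - m_c||^2 computed from kernel entries only.
    members: array of indices of the points in cluster pi_c."""
    n_c = len(members)
    cross = 2.0 * K[i, members].sum() / n_c
    within = K[np.ix_(members, members)].sum() / n_c ** 2
    return K[i, i] - cross + within
```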
SLIDE 6
KERNEL k-MEANS
SLIDE 7
Weighted KERNEL k-MEANS
The weights w_i are nonnegative, and the centroid becomes the weighted mean m_c = (Σ_{a_j ∈ π_c} w_j φ(a_j)) / (Σ_{a_j ∈ π_c} w_j). Then ||φ(a_i) − m_c||² can be written as
K_ii − (2 Σ_{a_j ∈ π_c} w_j K_ij) / (Σ_{a_j ∈ π_c} w_j) + (Σ_{a_j, a_l ∈ π_c} w_j w_l K_jl) / (Σ_{a_j ∈ π_c} w_j)².
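The same sketch generalizes with weights; it reduces to the unweighted formula above when all w_j = 1:

```python
import numpy as np

def weighted_kernel_distance(K, w, i, members):
    """Weighted analogue of kernel_distance (helper name is mine)."""
    wc = w[members]
    s = wc.sum()                       # total cluster weight w(pi_c)
    cross = 2.0 * (wc * K[i, members]).sum() / s
    within = (np.outer(wc, wc) * K[np.ix_(members, members)]).sum() / s ** 2
    return K[i, i] - cross + within
```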
SLIDE 8
SLIDE 9
Computational Complexity
The algorithm monotonically converges as long as K is positive semidefinite. The bottleneck is in step 2, computing the distances d(a_i, m_c): O(n) for every data point ⇒ O(n²) per iteration, or O(nz) with a sparse matrix K (nz = number of nonzero entries). The overall time complexity is therefore O(n²(τ + m)), where m is the original data dimension and τ is the number of iterations.
SLIDE 10
GRAPH CLUSTERING
Given a graph G = (V, E, A) with edge-weight (affinity) matrix A, partition the graph into k disjoint clusters V_1, . . . , V_k such that their union is V. For node sets A and B, links(A, B) is the sum of edge weights between nodes in A and nodes in B.
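A one-liner for links on a dense affinity matrix (names are illustrative):

```python
import numpy as np

def links(A, nodes_a, nodes_b):
    """Sum of edge weights between node index sets nodes_a and nodes_b."""
    return A[np.ix_(nodes_a, nodes_b)].sum()
```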
SLIDE 11
Different objectives (Ratio association)
Maximize within-cluster association relative to the size of the cluster.
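For reference, the ratio association objective in the paper's notation:

```latex
\mathrm{RAssoc}(G) = \max_{V_1,\dots,V_k} \sum_{c=1}^{k} \frac{\mathrm{links}(V_c, V_c)}{|V_c|}
```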
SLIDE 12
Different objectives (Ratio cut & Kernighan-Lin)
Ratio cut: minimize the cut between each cluster and the remaining vertices, relative to the size of the cluster. The Kernighan-Lin objective additionally requires equal-size partitions.
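The corresponding ratio cut objective:

```latex
\mathrm{RCut}(G) = \min_{V_1,\dots,V_k} \sum_{c=1}^{k} \frac{\mathrm{links}(V_c, V \setminus V_c)}{|V_c|}
```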
SLIDE 13
Different objectives (Normalized cut)
Minimizing the normalized cut is equivalent to maximizing the normalized association, since links(V_c, V \ V_c)/deg(V_c) = 1 − links(V_c, V_c)/deg(V_c); summing over the k clusters gives NCut = k − NAssoc.
SLIDE 14
Different objectives (General weighted graph cuts/association)
We introduce a weight w_i for each node of the graph, and for each cluster V_c we define w(V_c) = Σ_{i ∈ V_c} w_i.
Ratio association: all weights equal to one. Normalized association: each node's weight equals its degree.
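Both special cases are instances of the general weighted association objective:

```latex
\mathrm{WAssoc}(G) = \max_{V_1,\dots,V_k} \sum_{c=1}^{k} \frac{\mathrm{links}(V_c, V_c)}{w(V_c)}
```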
SLIDE 15
EQUIVALENCE OF THE OBJECTIVES
At first glance, the two approaches to clustering presented in the previous two sections appear to be unrelated. However, writing the kernel k-means objective as a trace maximization problem shows that it and the weighted graph association problem are equivalent.
SLIDE 16
EQUIVALENCE OF THE OBJECTIVES
Weighted kernel k-means as trace maximization:
max trace(Ỹᵀ W^{1/2} K W^{1/2} Ỹ)
where Ỹ is the orthonormal n × k matrix obtained by scaling the cluster indicator matrix by the square root of the weight matrix W.
Graph clustering as trace maximization:
max trace(Ỹᵀ W^{−1/2} A W^{−1/2} Ỹ)
The two coincide when K = W^{−1}AW^{−1}.
SLIDE 17
Enforcing Positive Definiteness
For weighted graph association, we define the matrix K = W^{−1}AW^{−1} to map the problem to weighted kernel k-means. A is an arbitrary adjacency matrix, so K is not necessarily positive definite. Given A, define instead K′ = σW^{−1} + W^{−1}AW^{−1}: for σ large enough, K′ is positive definite, and the shift changes the objective only by a constant, so the optimal clustering is unchanged.
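A small sketch of this construction (assuming dense NumPy arrays; choosing σ is left to the caller):

```python
import numpy as np

def shifted_kernel(A, w, sigma):
    """K' = sigma * W^{-1} + W^{-1} A W^{-1} for diagonal W = diag(w)."""
    W_inv = np.diag(1.0 / w)
    return sigma * W_inv + W_inv @ A @ W_inv

# sanity check that a candidate sigma is large enough:
# assert np.linalg.eigvalsh(shifted_kernel(A, w, sigma)).min() >= 0
```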
SLIDE 18
THE MULTILEVEL ALGORITHM
SLIDE 19
Coarsening Phase
Starting with the initial graph G_0, the coarsening phase repeatedly transforms the graph into smaller and smaller graphs G_1, G_2, . . . , G_m such that |V_0| > |V_1| > . . . > |V_m|. One popular approach works as follows (see the sketch below):
Start with all nodes unmarked and visit each vertex in a random order.
For each vertex x, if x is not marked, merge x with the unmarked vertex y that corresponds to the highest edge weight among all edges between x and unmarked vertices; then mark x and y.
If all neighbors of x have been marked, mark x and do not merge it with any vertex.
Once all vertices are marked, the coarsening for this level is complete.
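A minimal sketch of this heavy-edge matching step, assuming an adjacency-dict representation (all names are mine):

```python
import random

def coarsen_level(adj):
    """One coarsening level. adj: {u: {v: edge_weight}}.
    Returns the supernodes as merged pairs or singletons."""
    marked = set()
    supernodes = []
    nodes = list(adj)
    random.shuffle(nodes)                  # visit vertices in random order
    for x in nodes:
        if x in marked:
            continue
        # heaviest edge from x to a still-unmarked neighbor, if any
        candidates = [(w, y) for y, w in adj[x].items() if y not in marked]
        if candidates:
            _, y = max(candidates)
            marked.update((x, y))
            supernodes.append((x, y))      # merge x and y into a supernode
        else:
            marked.add(x)
            supernodes.append((x,))        # no unmarked neighbor: keep x alone
    return supernodes
```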
SLIDE 20
max-cut coarsening
Given a vertex x, instead of merging using the criterion of heavy edges, we look for the unmarked vertex y that maximizes
e(x, y)/w(x) + e(x, y)/w(y)
where e(x, y) is the edge weight between vertices x and y, and w(x) and w(y) are the weights of vertices x and y, respectively.
SLIDE 21
Base Clustering Phase
A parameter indicates how small we want the coarsest graph to be: for example, fewer than 5k nodes, where k is the number of desired clusters. Options for the base clustering:
region growing (no eigenvector computation)
spectral clustering
bisection method (no eigenvector computation)
SLIDE 22
Refinement
The final phase of the algorithm is the refinement phase. Given the clustering of a graph G_i, we initialize a clustering of the finer graph G_{i−1}: if a supernode in G_i is in cluster c, then all nodes in G_{i−1} formed from that supernode are placed in cluster c. We then improve this initialization using a refinement algorithm. (Optimized version: use only boundary nodes, i.e., nodes with a neighbor in another cluster.)
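The initialization step is just a label projection; a sketch (the mapping name is illustrative):

```python
def project_clustering(coarse_labels, supernode_of):
    """coarse_labels[s]: cluster of supernode s in G_i.
    supernode_of[v]: supernode that fine node v of G_{i-1} was merged into."""
    return {v: coarse_labels[s] for v, s in supernode_of.items()}
```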
SLIDE 23
Local Search
A common problem when running standard batch kernel k-means is that the algorithm has a tendency to get trapped in qualitatively poor local minima. An effective technique to counter this issue is to do a local search by incorporating an incremental strategy: a step of incremental kernel k-means attempts to move a single point from one cluster to another in order to improve the objective function.
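A generic sketch of one such incremental step, written against an objective callback rather than the kernel formulation itself (all names are mine):

```python
def incremental_step(points, labels, k, objective):
    """Apply the single-point move that most decreases objective(labels),
    if any such move improves it."""
    best_val, best_move = objective(labels), None
    for i in points:
        old = labels[i]
        for c in range(k):
            if c == old:
                continue
            labels[i] = c                  # trial move of point i to cluster c
            val = objective(labels)
            if val < best_val:
                best_val, best_move = val, (i, c)
        labels[i] = old                    # undo trial move
    if best_move is not None:
        i, c = best_move
        labels[i] = c                      # commit the best move
    return best_val
```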
SLIDE 24
EXPERIMENTAL RESULTS
Gene Network Analysis
SLIDE 25
SLIDE 26
SLIDE 27
SLIDE 28
SLIDE 29
Introduction
One of the key challenges in large attributed graph clustering is how to select representative attributes. A single user may only pick out the samples that s/he is familiar with while ignoring the others, so the selected samples are often biased. The setting here instead allows multiple individuals to select samples for a specific clustering task.
SLIDE 30
Problem
Given a large attributed graph G(V, E, F) with |V| = n nodes and |E| = m edges, where each node is associated with |F| = d attributes, we aim to extract a cluster C from G with the guidance of K users. Each user independently labels samples based on his/her own knowledge. The samples annotated by the k-th user are denoted U_k. For each set U_k, we assume that nodes inside it are similar to each other, and that they are dissimilar to the nodes outside the set.
SLIDE 31
Method CGMA
First, combine the annotations in an unbiased way to obtain the guidance information. Then, use a local clustering method to cluster the graph under the guidance of the combined annotations.
SLIDE 32
Annotations Combination
Since the annotations are sparse labels with little overlap, straightforward methods like majority voting may not effectively capture the relations among the annotations. Here, P^k_C and P^k_D denote the similar and dissimilar sets of the k-th annotation. The density of each point is computed with a cutoff kernel,
ρ_k(i) = Σ_j χ(d_ij − d_c)
where χ(x) = 1 if x < 0 and χ(x) = 0 otherwise, and d_c is a distance threshold. The algorithm is only sensitive to the relative magnitude of ρ_k at different points.
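This cutoff-kernel density is simple to compute; a sketch assuming a precomputed distance matrix D (names are mine):

```python
import numpy as np

def density(D, d_c):
    """rho(i) = sum_j chi(d_ij - d_c): the number of points whose
    distance to point i is below the threshold d_c."""
    return (D < d_c).sum(axis=1) - 1       # subtract 1 to exclude d_ii = 0
```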
SLIDE 33
Algorithm
SLIDE 34
Algorithm
SLIDE 35
Experiments
SLIDE 36
Experiments
SLIDE 37
Experiments
SLIDE 38