Weighted Graph Cuts without Eigenvectors: A Multilevel Approach (PowerPoint presentation)


SLIDE 1

Outlines

Weighted Graph Cuts without Eigenvectors: A Multilevel Approach (PAMI 2007) User-Guided Large Attributed Graph Clustering with Multiple Sparse Annotations (PAKDD 2016)

SLIDE 2
SLIDE 3

Problem Definition

Clustering nonlinearly separable data:

1. Kernel k-means
2. Spectral clustering

Goal: design a fast graph clustering method. Computing eigenvectors is expensive for large graphs.

SLIDE 4

k-MEANS

Given a set of vectors a1, a2, . . . , an, the k-means algorithm seeks clusters π1, π2, . . . , πk that minimize the objective function

D({πc}) = Σc=1..k Σai∈πc ||ai − mc||²,

where mc = (Σai∈πc ai) / |πc| is the centroid, or mean, of cluster πc.
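To make the objective concrete, here is a minimal NumPy sketch that evaluates it for a given assignment. The function and variable names are mine, not the paper's:

```python
import numpy as np

def kmeans_objective(points, assignments, k):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for c in range(k):
        members = points[assignments == c]
        if len(members) == 0:
            continue
        m_c = members.mean(axis=0)          # centroid (mean) of cluster c
        total += ((members - m_c) ** 2).sum()
    return total

# Two tight clusters; each contributes 2 * 1^2 to the objective.
pts = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
print(kmeans_objective(pts, labels, k=2))  # 4.0
```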

SLIDE 5

KERNEL k-MEANS

To allow nonlinear separators, we map the data to a higher-dimensional space with a nonlinear mapping φ. The squared distance ||φ(ai) − mc||² may be rewritten as

||φ(ai) − mc||² = Kii − (2 Σaj∈πc Kij) / |πc| + (Σaj,al∈πc Kjl) / |πc|²,

so we only need the kernel matrix K, where Kij = φ(ai) · φ(aj).
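A small sketch of this identity, using a linear kernel for the sanity check so the result can be compared against the ordinary squared distance (function names are hypothetical):

```python
import numpy as np

def kernel_distance_sq(K, i, cluster):
    """||phi(a_i) - m_c||^2 computed using only the kernel matrix K."""
    idx = np.asarray(cluster)
    n_c = len(idx)
    term2 = 2.0 * K[i, idx].sum() / n_c                # cross term with the cluster
    term3 = K[np.ix_(idx, idx)].sum() / n_c**2         # centroid self-similarity
    return K[i, i] - term2 + term3

# Sanity check with a linear kernel: must match the ordinary squared distance.
X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
K = X @ X.T
cluster = [1, 2]                       # centroid in input space: (1, 1)
print(kernel_distance_sq(K, 0, cluster))  # ||(0,0) - (1,1)||^2 = 2.0
```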

slide-6
SLIDE 6

KERNEL k-MEANS

slide-7
SLIDE 7

Weighted KERNEL k-MEANS

The weights wi are nonnegative, and the cluster centroid becomes mc = (Σaj∈πc wj φ(aj)) / (Σaj∈πc wj). The squared distance ||φ(ai) − mc||² can again be written purely in terms of entries of K:

||φ(ai) − mc||² = Kii − (2 Σaj∈πc wj Kij) / sc + (Σaj,al∈πc wj wl Kjl) / sc², where sc = Σaj∈πc wj.
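The weighted version differs only in how the cluster sums are weighted; a sketch under the same assumptions as before (all names hypothetical):

```python
import numpy as np

def weighted_kernel_distance_sq(K, w, i, cluster):
    """||phi(a_i) - m_c||^2 for m_c = sum_j w_j phi(a_j) / sum_j w_j."""
    idx = np.asarray(cluster)
    wc = w[idx]
    s = wc.sum()                                            # total cluster weight
    term2 = 2.0 * (wc * K[i, idx]).sum() / s
    term3 = (np.outer(wc, wc) * K[np.ix_(idx, idx)]).sum() / s**2
    return K[i, i] - term2 + term3

# With unit weights this reduces to plain kernel k-means.
X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
K = X @ X.T
w = np.ones(3)
print(weighted_kernel_distance_sq(K, w, 0, [1, 2]))  # 2.0, same as unweighted
```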

SLIDE 8
SLIDE 9

Computational Complexity

The algorithm converges monotonically as long as K is positive semidefinite. The bottleneck is step 2, computing the distances d(ai, mc): O(n) per data point, hence O(n²) per iteration, or O(nz) if K is sparse with nz nonzeros. Including the O(n²m) cost of forming K, the total time complexity is O(n²(τ + m)), where m is the original data dimension and τ is the number of iterations.

SLIDE 10

GRAPH CLUSTERING

Given a graph, partition its vertex set into k disjoint clusters V1, . . . , Vk whose union is V. For vertex sets A and B, links(A, B) is the sum of edge weights between nodes in A and nodes in B.
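links(A, B) is just a block sum of the weighted adjacency matrix; a minimal sketch on a toy graph (names are mine):

```python
import numpy as np

def links(W, A, B):
    """Sum of edge weights between node sets A and B in weighted adjacency W."""
    return W[np.ix_(A, B)].sum()

# Toy graph: triangle 0-1-2 plus a pendant edge 2-3 of weight 2.
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 2],
              [0, 0, 2, 0]], dtype=float)
print(links(W, [0, 1, 2], [3]))        # cut between {0,1,2} and {3}: 2.0
print(links(W, [0, 1, 2], [0, 1, 2]))  # within-cluster association: 6.0
```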

SLIDE 11

Different objectives (Ratio association)

Maximize within-cluster association relative to the size of the cluster.

SLIDE 12

Different objectives (Ratio cut & Kernighan-Lin)

Ratio cut: minimize the cut between each cluster and the remaining vertices, normalized by cluster size. Kernighan-Lin: the same cut objective, with the additional constraint that the partitions be of equal size.

SLIDE 13

Different objectives (Normalized cut)

Minimizing the normalized cut is equivalent to maximizing the normalized association, since links(Vc, V \ Vc)/deg(Vc) + links(Vc, Vc)/deg(Vc) = 1 for every cluster, so the two objectives sum to the constant k.

SLIDE 14

Different objectives (General weighted graph cuts/association)

We introduce a weight wi for each node of the graph and, for each cluster Vc, define w(Vc) = Σi∈Vc wi.

Ratio association: all weights equal to one. Normalized association: each weight equal to the node's degree.
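The weighted association objective, with ratio association and normalized association recovered by choosing unit weights or degrees, can be sketched as follows (function names are hypothetical):

```python
import numpy as np

def weighted_association(W, clusters, node_w):
    """sum_c links(V_c, V_c) / w(V_c), where w(V_c) = sum of node weights in V_c."""
    total = 0.0
    for idx in clusters:
        idx = np.asarray(idx)
        total += W[np.ix_(idx, idx)].sum() / node_w[idx].sum()
    return total

# Two disconnected pairs: 0-1 (weight 1) and 2-3 (weight 2).
W = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 2],
              [0, 0, 2, 0]], dtype=float)
clusters = [[0, 1], [2, 3]]
deg = W.sum(axis=1)
print(weighted_association(W, clusters, np.ones(4)))  # ratio association: 2/2 + 4/2 = 3.0
print(weighted_association(W, clusters, deg))         # normalized association: 2/2 + 4/4 = 2.0
```

Note that the normalized association equals k here because each cluster is a connected component with no cut edges.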

SLIDE 15

EQUIVALENCE OF THE OBJECTIVES

At first glance, the two approaches to clustering presented in the previous two sections appear unrelated. In fact, when each is written as a trace maximization problem, the weighted kernel k-means objective and the weighted graph association objective turn out to be equivalent.

SLIDE 16

EQUIVALENCE OF THE OBJECTIVES

Weighted kernel k-means as trace maximization: maximize trace(Ỹᵀ W^{1/2} K W^{1/2} Ỹ), where Ỹ is an orthonormal n × k matrix proportional to the square root of the weight matrix W times the cluster indicator matrix. Graph clustering as trace maximization: maximize trace(Ỹᵀ W^{−1/2} A W^{−1/2} Ỹ), so choosing K = W^{−1} A W^{−1} makes the two problems coincide.

SLIDE 17

Enforcing Positive Definiteness

For weighted graph association, we define the matrix K = W⁻¹AW⁻¹ to map the problem to weighted kernel k-means. Since A is an arbitrary adjacency matrix, K is not necessarily positive semidefinite. Given A, define instead K′ = σW⁻¹ + W⁻¹AW⁻¹; for a sufficiently large shift σ, K′ is positive semidefinite, and the shift changes the objective only by a constant.
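A small numerical illustration of the shift; the toy graph, the choice σ = 1, and all names are my own, not from the paper. The diagonal term σW⁻¹ pushes the spectrum of K up without changing which clustering is optimal:

```python
import numpy as np

# Toy star graph; W is the diagonal weight matrix (here, the degrees).
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
W_inv = np.diag(1.0 / A.sum(axis=1))
K = W_inv @ A @ W_inv
print(np.linalg.eigvalsh(K).min())          # negative: K is not PSD for this graph

sigma = 1.0  # large enough for this graph; in general pick sigma to clear the spectrum
K_shifted = sigma * W_inv + K
print(np.linalg.eigvalsh(K_shifted).min())  # now >= 0 (up to rounding): safe for kernel k-means
```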

SLIDE 18

THE MULTILEVEL ALGORITHM

SLIDE 19

Coarsening Phase

Starting with the initial graph G0, the coarsening phase repeatedly transforms the graph into smaller and smaller graphs G1, G2, . . . , Gm such that |V0| > |V1| > . . . > |Vm|. One popular approach: start with all nodes unmarked and visit the vertices in a random order. For each unmarked vertex x, merge x with the unmarked neighbor y connected by the heaviest edge, then mark both x and y. If all neighbors of x are already marked, mark x without merging it. Once all vertices are marked, the coarsening for this level is complete.
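This matching pass can be sketched as follows, assuming the graph is stored as a dict of neighbor-to-weight dicts (all names hypothetical):

```python
import random

def heavy_edge_matching(adj, seed=0):
    """One coarsening pass: adj maps node -> {neighbor: edge_weight}.
    Returns the merged pairs (or singletons) that form next-level supernodes."""
    rng = random.Random(seed)
    nodes = list(adj)
    rng.shuffle(nodes)                 # visit vertices in a random order
    marked, merges = set(), []
    for x in nodes:
        if x in marked:
            continue
        # heaviest edge from x to an unmarked neighbor, if any exists
        candidates = [(w, y) for y, w in adj[x].items() if y not in marked]
        if candidates:
            w, y = max(candidates)
            merges.append((x, y))
            marked.update({x, y})
        else:
            merges.append((x,))        # all neighbors marked: x stays a singleton
            marked.add(x)
    return merges

adj = {0: {1: 5, 2: 1}, 1: {0: 5}, 2: {0: 1, 3: 2}, 3: {2: 2}}
print(heavy_edge_matching(adj))
```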

SLIDE 20

Max-cut Coarsening

Given a vertex x, instead of merging along the heaviest edge, we look for the unmarked vertex y that maximizes e(x, y)/w(x) + e(x, y)/w(y), where e(x, y) is the edge weight between vertices x and y, and w(x) and w(y) are the weights of vertices x and y, respectively.

SLIDE 21

Base Clustering Phase

A parameter indicates how small we want the coarsest graph to be; for example, fewer than 5k nodes, where k is the number of desired clusters. Options for clustering the coarsest graph: region growing (no eigenvector computation), spectral clustering, or a bisection method (no eigenvector computation).

SLIDE 22

Refinement

The final phase of the algorithm is the refinement phase. Given a graph Gi, we form the finer graph Gi−1. Initialization: if a supernode in Gi is in cluster c, then all nodes of Gi−1 formed from that supernode are placed in cluster c. We then improve this clustering with a refinement algorithm; an optimized version runs it only on boundary nodes.

SLIDE 23

Local Search

A common problem when running standard batch kernel k-means is that the algorithm tends to get trapped in qualitatively poor local minima. An effective countermeasure is local search, incorporating an incremental strategy: a step of incremental kernel k-means attempts to move a single point from one cluster to another so as to improve the objective function.
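A brute-force sketch of one incremental step, with hypothetical names: it re-evaluates the kernel k-means objective for every single-point move and applies the first improving one. A real implementation would update the objective incrementally rather than recompute it; this version is only for clarity.

```python
import numpy as np

def kernel_objective(K, labels, k):
    """Kernel k-means objective via the identity
    sum_{i in c} ||phi(a_i) - m_c||^2 = sum_{i in c} K_ii - (1/|c|) sum_{i,j in c} K_ij."""
    total = 0.0
    for c in range(k):
        idx = np.where(labels == c)[0]
        if len(idx) == 0:
            continue
        total += K[idx, idx].sum() - K[np.ix_(idx, idx)].sum() / len(idx)
    return total

def local_search_step(K, labels, k):
    """Try single-point moves; apply the first one that lowers the objective."""
    base = kernel_objective(K, labels, k)
    for i in range(len(labels)):
        for c in range(k):
            if c == labels[i]:
                continue
            trial = labels.copy()
            trial[i] = c
            if kernel_objective(K, trial, k) < base - 1e-12:
                return trial, True
    return labels, False

# Linear kernel on 1-D points; the point at 1.9 is mis-assigned to the left cluster.
X = np.array([[0.0], [0.1], [1.9], [2.0], [2.1]])
K = X @ X.T
labels = np.array([0, 0, 0, 1, 1])
labels, moved = local_search_step(K, labels, 2)
print(moved, labels)  # True [0 0 1 1 1]
```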

SLIDE 24

EXPERIMENTAL RESULTS

Gene Network Analysis

SLIDE 25
SLIDE 26
SLIDE 27
SLIDE 28
SLIDE 29

Introduction

One of the key challenges in large attributed graph clustering is how to select representative attributes. A single user may pick out only the samples he or she is familiar with and ignore the others, so the selected samples are often biased. The proposed setting instead allows multiple individuals to select samples for a specific clustering task.

SLIDE 30

Problem

Given a large attributed graph G(V, E, F) with |V| = n nodes and |E| = m edges, where each node is associated with |F| = d attributes, we aim to extract a cluster C from G with the guidance of K users. Each user independently labels samples based on his or her own knowledge. The samples annotated by the k-th user are denoted Uk. For each set Uk, we assume that the nodes inside it are similar to each other and dissimilar to the nodes outside the set.

SLIDE 31

Method CGMA

First, combine the annotations in an unbiased way to obtain the guidance information. Then use a local clustering method to cluster the graph under the guidance of the combined annotations.

SLIDE 32

Annotations Combination

Since the annotations are sparse labels with little overlap, straightforward methods like majority voting may not effectively capture the relations among the annotations. Here, P^k_C and P^k_D denote the similar and dissimilar sets of the k-th annotation. A local density ρk is computed with a cutoff function: χ(x) = 1 if x < 0 and χ(x) = 0 otherwise, where dc is a distance threshold. The algorithm is sensitive only to the relative magnitude of ρk at different points.
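Assuming ρk is the usual cutoff-kernel local density (this reading is inferred from the χ and dc definitions on the slide; names are hypothetical), a minimal sketch:

```python
import numpy as np

def local_density(D, d_c):
    """rho[i] = number of other points closer than the cutoff d_c,
    i.e. chi(x) = 1 if x < 0 applied to D[i, j] - d_c and summed over j."""
    chi = (D - d_c) < 0
    np.fill_diagonal(chi, False)       # a point does not count toward its own density
    return chi.sum(axis=1)

# Pairwise distances for four 1-D points at 0, 1, 2, 10.
x = np.array([0.0, 1.0, 2.0, 10.0])
D = np.abs(x[:, None] - x[None, :])
print(local_density(D, d_c=1.5))  # [1 2 1 0]
```

Only the ranking of the densities matters for the algorithm, so the exact choice of dc is not critical.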

SLIDE 33

Algorithm

SLIDE 34

Algorithm

SLIDE 35

Experiments

SLIDE 36

Experiments

SLIDE 37

Experiments

SLIDE 38

Experiments