SLIDE 1
Outline
Weighted Graph Cuts without Eigenvectors: A Multilevel Approach (PAMI 2007)
User-Guided Large Attributed Graph Clustering with Multiple Sparse Annotations (PAKDD 2016)
SLIDE 2
SLIDE 3
Problem Definition
Clustering nonlinearly separable data:
1. Kernel k-means
2. Spectral clustering
Goal: design a fast graph clustering method. Computing eigenvectors is expensive in large graphs.
SLIDE 4
k-MEANS
Given a set of vectors a_1, a_2, . . . , a_n, the k-means algorithm seeks clusters π_1, π_2, . . . , π_k that minimize the objective function
D({π_c}) = Σ_{c=1..k} Σ_{a_i ∈ π_c} ||a_i − m_c||²
where m_c = (Σ_{a_i ∈ π_c} a_i) / |π_c| is the centroid, or mean, of cluster π_c.
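A minimal NumPy sketch of this objective, for concreteness (function name and array layout are illustrative, not from the paper; rows of X are the vectors a_i):

```python
import numpy as np

def kmeans_objective(X, labels, k):
    """Sum of squared distances of each point to its cluster centroid."""
    total = 0.0
    for c in range(k):
        members = X[labels == c]       # points assigned to cluster pi_c
        if len(members) == 0:
            continue
        m_c = members.mean(axis=0)     # centroid m_c of cluster pi_c
        total += ((members - m_c) ** 2).sum()
    return total
```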
SLIDE 5
KERNEL k-MEANS
To allow nonlinear separators, we use a kernel (an implicit mapping φ to a higher-dimensional space). The squared distance ||φ(a_i) − m_c||² may be rewritten as
K_ii − (2 Σ_{a_j ∈ π_c} K_ij) / |π_c| + (Σ_{a_j, a_l ∈ π_c} K_jl) / |π_c|²
so φ is never needed explicitly; we just need the kernel matrix K, where K_ij = φ(a_i) · φ(a_j).
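A direct NumPy transcription of that identity (the helper name is mine, not from the paper):

```python
import numpy as np

def kernel_distance(K, i, members):
    """||phi(a_i) - m_c||^2 computed from kernel entries only.
    members: array of indices of the points in cluster pi_c."""
    n_c = len(members)
    cross = 2.0 * K[i, members].sum() / n_c
    within = K[np.ix_(members, members)].sum() / n_c ** 2
    return K[i, i] - cross + within
```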
SLIDE 6
KERNEL k-MEANS
SLIDE 7
Weighted KERNEL k-MEANS
The weights w_i are nonnegative, and the centroid becomes the weighted mean m_c = (Σ_{a_j ∈ π_c} w_j φ(a_j)) / (Σ_{a_j ∈ π_c} w_j). Then ||φ(a_i) − m_c||² can be written as
K_ii − (2 Σ_{a_j ∈ π_c} w_j K_ij) / (Σ_{a_j ∈ π_c} w_j) + (Σ_{a_j, a_l ∈ π_c} w_j w_l K_jl) / (Σ_{a_j ∈ π_c} w_j)².
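The same sketch generalizes with weights; it reduces to the unweighted formula above when all w_j = 1:

```python
import numpy as np

def weighted_kernel_distance(K, w, i, members):
    """Weighted analogue of kernel_distance (helper name is mine)."""
    wc = w[members]
    s = wc.sum()                       # total cluster weight w(pi_c)
    cross = 2.0 * (wc * K[i, members]).sum() / s
    within = (np.outer(wc, wc) * K[np.ix_(members, members)]).sum() / s ** 2
    return K[i, i] - cross + within
```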
SLIDE 8
SLIDE 9
Computational Complexity
The algorithm monotonically converges as long as K is positive semidefinite. The bottleneck is in step 2, computing the distances d(a_i, m_c): O(n) for every data point ⇒ O(n²) per iteration, or O(nz) with a sparse matrix K (nz = number of nonzero entries). The overall time complexity is therefore O(n²(τ + m)), where m is the original data dimension and τ is the number of iterations.
SLIDE 10
GRAPH CLUSTERING
Given a graph G = (V, E, A) with edge-weight (affinity) matrix A, partition the graph into k disjoint clusters V_1, . . . , V_k such that their union is V. For node sets A and B, links(A, B) is the sum of edge weights between nodes in A and nodes in B.
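A one-liner for links on a dense affinity matrix (names are illustrative):

```python
import numpy as np

def links(A, nodes_a, nodes_b):
    """Sum of edge weights between node index sets nodes_a and nodes_b."""
    return A[np.ix_(nodes_a, nodes_b)].sum()
```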
SLIDE 11
Different objectives (Ratio association)
Maximize within-cluster association relative to the size of the cluster.
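For reference, the ratio association objective in the paper's notation:

```latex
\mathrm{RAssoc}(G) = \max_{V_1,\dots,V_k} \sum_{c=1}^{k} \frac{\mathrm{links}(V_c, V_c)}{|V_c|}
```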
SLIDE 12
Different objectives (Ratio cut & Kernighan-Lin)
Ratio cut: minimize the cut between each cluster and the remaining vertices, relative to the size of the cluster. The Kernighan-Lin objective additionally requires equal-size partitions.
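The corresponding ratio cut objective:

```latex
\mathrm{RCut}(G) = \min_{V_1,\dots,V_k} \sum_{c=1}^{k} \frac{\mathrm{links}(V_c, V \setminus V_c)}{|V_c|}
```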
SLIDE 13
Different objectives (Normalized cut)
Minimizing the normalized cut is equivalent to maximizing the normalized association, since links(V_c, V \ V_c)/deg(V_c) = 1 − links(V_c, V_c)/deg(V_c); summing over the k clusters gives NCut = k − NAssoc.
SLIDE 14
Different objectives (General weighted graph cuts/association)
We introduce a weight w_i for each node of the graph, and for each cluster V_c we define w(V_c) = Σ_{i ∈ V_c} w_i.
Ratio association: all weights equal to one. Normalized association: each node's weight equals its degree.
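Both special cases are instances of the general weighted association objective:

```latex
\mathrm{WAssoc}(G) = \max_{V_1,\dots,V_k} \sum_{c=1}^{k} \frac{\mathrm{links}(V_c, V_c)}{w(V_c)}
```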
SLIDE 15
EQUIVALENCE OF THE OBJECTIVES
At first glance, the two approaches to clustering presented in the previous two sections appear to be unrelated. However, writing the kernel k-means objective as a trace maximization problem shows that it and the weighted graph association problem are equivalent.
SLIDE 16
EQUIVALENCE OF THE OBJECTIVES
Weighted kernel k-means as trace maximization:
max trace(Ỹᵀ W^{1/2} K W^{1/2} Ỹ)
where Ỹ is the orthonormal n × k matrix obtained by scaling the cluster indicator matrix by the square root of the weight matrix W.
Graph clustering as trace maximization:
max trace(Ỹᵀ W^{−1/2} A W^{−1/2} Ỹ)
The two coincide when K = W^{−1}AW^{−1}.
SLIDE 17
Enforcing Positive Definiteness
For weighted graph association, we define the matrix K = W^{−1}AW^{−1} to map the problem to weighted kernel k-means. A is an arbitrary adjacency matrix, so K is not necessarily positive definite. Given A, define instead K′ = σW^{−1} + W^{−1}AW^{−1}: for σ large enough, K′ is positive definite, and the shift changes the objective only by a constant, so the optimal clustering is unchanged.
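A small sketch of this construction (assuming dense NumPy arrays; choosing σ is left to the caller):

```python
import numpy as np

def shifted_kernel(A, w, sigma):
    """K' = sigma * W^{-1} + W^{-1} A W^{-1} for diagonal W = diag(w)."""
    W_inv = np.diag(1.0 / w)
    return sigma * W_inv + W_inv @ A @ W_inv

# sanity check that a candidate sigma is large enough:
# assert np.linalg.eigvalsh(shifted_kernel(A, w, sigma)).min() >= 0
```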
SLIDE 18
THE MULTILEVEL ALGORITHM
SLIDE 19
Coarsening Phase
Starting with the initial graph G_0, the coarsening phase repeatedly transforms the graph into smaller and smaller graphs G_1, G_2, . . . , G_m such that |V_0| > |V_1| > . . . > |V_m|. One popular approach works as follows (see the sketch below):
Start with all nodes unmarked and visit each vertex in a random order.
For each vertex x, if x is not marked, merge x with the unmarked vertex y that corresponds to the highest edge weight among all edges between x and unmarked vertices; then mark x and y.
If all neighbors of x have been marked, mark x and do not merge it with any vertex.
Once all vertices are marked, the coarsening for this level is complete.
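A minimal sketch of this heavy-edge matching step, assuming an adjacency-dict representation (all names are mine):

```python
import random

def coarsen_level(adj):
    """One coarsening level. adj: {u: {v: edge_weight}}.
    Returns the supernodes as merged pairs or singletons."""
    marked = set()
    supernodes = []
    nodes = list(adj)
    random.shuffle(nodes)                  # visit vertices in random order
    for x in nodes:
        if x in marked:
            continue
        # heaviest edge from x to a still-unmarked neighbor, if any
        candidates = [(w, y) for y, w in adj[x].items() if y not in marked]
        if candidates:
            _, y = max(candidates)
            marked.update((x, y))
            supernodes.append((x, y))      # merge x and y into a supernode
        else:
            marked.add(x)
            supernodes.append((x,))        # no unmarked neighbor: keep x alone
    return supernodes
```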
SLIDE 20
max-cut coarsening
Given a vertex x, instead of merging using the criterion of heavy edges, we look for the unmarked vertex y that maximizes
e(x, y)/w(x) + e(x, y)/w(y)
where e(x, y) is the edge weight between vertices x and y, and w(x) and w(y) are the weights of vertices x and y, respectively.
SLIDE 21
Base Clustering Phase
A parameter indicates how small we want the coarsest graph to be: for example, fewer than 5k nodes, where k is the number of desired clusters. Options for the base clustering:
region growing (no eigenvector computation)
spectral clustering
bisection method (no eigenvector computation)
SLIDE 22
Refinement
The final phase of the algorithm is the refinement phase. Given the clustering of a graph G_i, we initialize a clustering of the finer graph G_{i−1}: if a supernode in G_i is in cluster c, then all nodes in G_{i−1} formed from that supernode are placed in cluster c. We then improve this initialization using a refinement algorithm. (Optimized version: use only boundary nodes, i.e., nodes with a neighbor in another cluster.)
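The initialization step is just a label projection; a sketch (the mapping name is illustrative):

```python
def project_clustering(coarse_labels, supernode_of):
    """coarse_labels[s]: cluster of supernode s in G_i.
    supernode_of[v]: supernode that fine node v of G_{i-1} was merged into."""
    return {v: coarse_labels[s] for v, s in supernode_of.items()}
```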
SLIDE 23
Local Search
A common problem when running standard batch kernel k-means is that the algorithm has a tendency to get trapped in qualitatively poor local minima. An effective technique to counter this issue is to do a local search by incorporating an incremental strategy: a step of incremental kernel k-means attempts to move a single point from one cluster to another in order to improve the objective function.
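A generic sketch of one such incremental step, written against an objective callback rather than the kernel formulation itself (all names are mine):

```python
def incremental_step(points, labels, k, objective):
    """Apply the single-point move that most decreases objective(labels),
    if any such move improves it."""
    best_val, best_move = objective(labels), None
    for i in points:
        old = labels[i]
        for c in range(k):
            if c == old:
                continue
            labels[i] = c                  # trial move of point i to cluster c
            val = objective(labels)
            if val < best_val:
                best_val, best_move = val, (i, c)
        labels[i] = old                    # undo trial move
    if best_move is not None:
        i, c = best_move
        labels[i] = c                      # commit the best move
    return best_val
```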
SLIDE 24
EXPERIMENTAL RESULTS
Gene Network Analysis
SLIDE 25
SLIDE 26
SLIDE 27
SLIDE 28
SLIDE 29
Introduction
One of the key challenges in large attributed graph clustering is how to select representative attributes. A single user may only pick out the samples that s/he is familiar with while ignoring the others, so the selected samples are often biased. The setting here instead allows multiple individuals to select samples for a specific clustering task.
SLIDE 30
Problem
Given a large attributed graph G(V, E, F) with |V| = n nodes and |E| = m edges, where each node is associated with |F| = d attributes, we aim to extract a cluster C from G with the guidance of K users. Each user independently labels samples based on his/her own knowledge. The samples annotated by the k-th user are denoted U_k. For each set U_k, we assume that nodes inside it are similar to each other, and that they are dissimilar to the nodes outside the set.
SLIDE 31
Method CGMA
First, combine the annotations in an unbiased way to obtain the guidance information. Then, use a local clustering method to cluster the graph under the guidance of the combined annotations.
SLIDE 32
Annotations Combination
Since the annotations are sparse labels with little overlap, straightforward methods like majority voting may not effectively capture the relations among the annotations. Here, P^k_C and P^k_D denote the similar and dissimilar sets of the k-th annotation. The density of each point is computed with a cutoff kernel,
ρ_k(i) = Σ_j χ(d_ij − d_c)
where χ(x) = 1 if x < 0 and χ(x) = 0 otherwise, and d_c is a distance threshold. The algorithm is only sensitive to the relative magnitude of ρ_k at different points.
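This cutoff-kernel density is simple to compute; a sketch assuming a precomputed distance matrix D (names are mine):

```python
import numpy as np

def density(D, d_c):
    """rho(i) = sum_j chi(d_ij - d_c): the number of points whose
    distance to point i is below the threshold d_c."""
    return (D < d_c).sum(axis=1) - 1       # subtract 1 to exclude d_ii = 0
```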
SLIDE 33
Algorithm
SLIDE 34
Algorithm
SLIDE 35
Experiments
SLIDE 36
Experiments
SLIDE 37
Experiments
SLIDE 38