Large-scale Spectral Clustering Methods for Image and Text Data



slide-1
SLIDE 1

Large-scale Spectral Clustering Methods

for Image and Text Data

Sponsor: Verizon Wireless Jeffrey Lee*, Scott Li*, Jiye Ding, Maham Niaz, Khiem Pham, Xin Xu, Zhengxia Yi, Xin Zhang May 23, 2018

slide-2
SLIDE 2

Outline

Background

  • Clustering Basics
  • Spectral Clustering
  • Limitations

Scalable Methods

  • Scalable Cosine
  • Landmark Based Methods
  • Bipartite Graph Models

Cluster Interpretation Comparisons Conclusion

slide-3
SLIDE 3

Background

Background

  • Verizon has a large amount of browsing data from their cell phone users.

  • Problem: How can we draw insights from this data?

CAMCOS Project - San José State University 3/82

slide-4
SLIDE 4

Background

CAMCOS

  • Spring 2017

– Proof-of-concept study based on a documents dataset
– Focused on a general framework: preprocessing, similarity measures, different clustering algorithms

  • Spring 2018

– Focused on speed improvements for different spectral clustering algorithms
– Understanding the content of the clusters


slide-5
SLIDE 5

Background

Clustering

  • Clustering is an unsupervised machine learning task that groups data such that:

– Data within a group are more similar to each other than to data in different groups

  • Possible applications for Verizon:

– Customer and market segmentation
– Grouping web pages


slide-6
SLIDE 6

Background

Clustering Components

  • Data matrix with rows $x_1, \ldots, x_n \in \mathbb{R}^d$
  • A specified number of clusters
  • Similarity measure
  • Criterion to evaluate the clusters


slide-7
SLIDE 7

Background

Similarity

  • Similarity describes how alike two observations are
  • $w_{ij} = S(x_i, x_j)$
  • Common similarity measures:

– Gaussian similarity
– Cosine similarity

[Figure: a weight matrix, W]


slide-8
SLIDE 8

Background

Spectral Clustering

Spectral clustering = graph cut! Weighted graphs are composed of:

  • Vertices: $x_i$
  • Edges: $x_i \leftrightarrow x_j$
  • Weights: $W = (w_{ij})$

New problem: Find the "best" cut


slide-9
SLIDE 9

Background

More Graph Terminology

  • Degree matrix: each degree sums the similarities for one observation

$D = \mathrm{diag}(W \mathbf{1})$

  • Transition matrix

$P = D^{-1} W$

Note: $P \mathbf{1} = \mathbf{1}$, so $\mathbf{1}$ is an eigenvector associated with the largest eigenvalue, 1.
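As a small numerical check of the definitions on this slide, the sketch below builds D and P for a made-up 3 × 3 weight matrix (the values are purely illustrative) and confirms that the rows of P sum to one.

```python
import numpy as np

# Toy symmetric weight matrix with zero diagonal (values invented for illustration).
W = np.array([[0.0, 0.9, 0.1],
              [0.9, 0.0, 0.2],
              [0.1, 0.2, 0.0]])

degrees = W @ np.ones(3)      # W · 1: each entry sums the similarities of one observation
D = np.diag(degrees)          # degree matrix
P = W / degrees[:, None]      # P = D^{-1} W, computed row by row

row_sums = P @ np.ones(3)     # P 1, expected to equal the all-ones vector
top_eigenvalue = np.max(np.linalg.eigvals(P).real)
```

Dividing each row by its degree avoids forming the inverse of D explicitly, which is the usual choice when D is diagonal.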


slide-10
SLIDE 10

Background

Spectral Clustering (Normalized Cut)

Criterion: $\min_{A,B} \mathrm{Ncut}(A, B) = \dfrac{\mathrm{Cut}(A, B)}{\mathrm{Vol}(A)} + \dfrac{\mathrm{Cut}(A, B)}{\mathrm{Vol}(B)}$

This can be shown to be approximated by solving an eigenvalue problem, $P v = \lambda v$, and using the eigenvector of the second largest eigenvalue for clustering. For k clusters, we would use the second to kth eigenvectors for k-means clustering.
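The two-cluster case can be illustrated on a toy graph. The sketch below is only an illustration under assumptions not taken from the slides: the graph (two triangles joined by one weak edge) is invented, the split is made by the sign of the second eigenvector, and the eigenvector of P is obtained through the symmetric matrix $D^{-1/2} W D^{-1/2}$, which has the same eigenvalues as P.

```python
import numpy as np

# Two tight groups {0,1,2} and {3,4,5} joined by one weak edge (toy weights).
W = np.zeros((6, 6))
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
W[2, 3] = W[3, 2] = 0.05
np.fill_diagonal(W, 0.0)

d = W.sum(axis=1)
P = W / d[:, None]                       # transition matrix D^{-1} W

W_sym = W / np.sqrt(np.outer(d, d))      # symmetric matrix with the same eigenvalues as P
vals, vecs = np.linalg.eigh(W_sym)       # eigenvalues in ascending order
v2 = vecs[:, -2] / np.sqrt(d)            # eigenvector of P for its second largest eigenvalue
labels = (v2 > 0).astype(int)            # split the vertices by sign

def ncut(W, labels):
    """Ncut(A, B) = Cut/Vol(A) + Cut/Vol(B) for a two-way labeling."""
    a = labels == 0
    b = ~a
    cut = W[np.ix_(a, b)].sum()
    return cut / W[a].sum() + cut / W[b].sum()

good_cut = ncut(W, labels)
bad_cut = ncut(W, np.array([0, 1, 0, 1, 0, 1]))   # arbitrary alternating split for comparison
```

On this graph the sign split separates the two triangles, and its Ncut value is far below that of the alternating split.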


slide-11
SLIDE 11

Background

Ng, Jordan, Weiss Spectral Clustering (NJW)

Other clustering algorithms use similar weight matrices for decomposition:

  • $\tilde{W} = D^{-1/2} W D^{-1/2}$ is similar to $P$ from Ncut
  • NJW uses the eigenvectors of $\tilde{W}$ for spectral clustering
  • Note: Diffusion maps is another clustering method. It uses the eigenvectors and eigenvalues of $P^t$ for clustering
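The similarity between $\tilde{W}$ and $P$ can be checked numerically. This is a minimal sketch on random toy data (the matrix sizes and the seed are arbitrary assumptions): both matrices share their eigenvalues, and $D^{-1/2} u$ is an eigenvector of $P$ whenever $u$ is an eigenvector of $\tilde{W}$.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((5, 3))                      # toy nonnegative data
W = X @ X.T
np.fill_diagonal(W, 0.0)                    # weight matrix with zero diagonal

d = W.sum(axis=1)
P = W / d[:, None]                          # D^{-1} W
W_tilde = W / np.sqrt(np.outer(d, d))       # D^{-1/2} W D^{-1/2}

eig_P = np.sort(np.linalg.eigvals(P).real)  # eigenvalues of P, ascending
vals, U = np.linalg.eigh(W_tilde)           # eigenvalues of W_tilde, ascending
V = U / np.sqrt(d)[:, None]                 # columns D^{-1/2} u
```

Because the eigenvalues of $P^t$ are the t-th powers of those of $P$, the same decomposition also gives the quantities that diffusion maps use.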


slide-12
SLIDE 12

Background

Spectral Clustering vs kmeans Clustering


slide-13
SLIDE 13

Background

Pros and Cons of Spectral Clustering

Pros

  • Relatively simple to implement
  • Equivalent to some graph cut problems
  • Handles arbitrarily shaped clusters

Cons

  • Computationally expensive for large datasets
  • $O(n^2)$ storage
  • $O(n^3)$ time


slide-14
SLIDE 14

Background

Project Overview

Goal: Each team focused on one idea for improving the scalability

  • Team 1

– Use cosine similarity and clever matrix manipulations to avoid the calculation of W

  • Team 2

– Use landmarks to find a sparse representation of the data

  • Team 3

– Use landmarks and given data to build bipartite graph models


slide-15
SLIDE 15

Background

Datasets Considered

Type    Dataset        Instances   Features   Classes
Text    20Newsgroups      18,768     55,570        20
Text    Reuters            8,067     18,933        30
Text    TDT2               9,394     36,771        30
Image   USPS               9,298        256        10
Image   Pendigits         10,992         16        10
Image   MNIST             70,000        784        10


slide-16
SLIDE 16

Background

Sample Text Data - Sparse

[Table: word counts, with one row per document (Document 1, ..., Document n) and one column per word (Word 1, ..., Word d). Only a few cells per row are filled: 6 for Document 1; 2, 1 and 2 for Document 2; 1 and 4 for Document 3; 8 for Document n.]


slide-17
SLIDE 17

Background

Sample Image Data - Low Dimension

Pixel Intensity

           Pixel 1   Pixel 2   Pixel 3   ...   Pixel d
Image 1         41       100         6   ...        80
Image 2         20       100        25   ...        70
Image 3         20        95        40   ...        44
...
Image n        100   ...   50


slide-18
SLIDE 18

Scalable Spectral Clustering using Cosine Similarity

Scalable Spectral Clustering using Cosine Similarity Team 1

Group Leader: Jeffrey Lee Team Members: Xin Xu, Xin Zhang, Zhengxia Yi


slide-19
SLIDE 19

Scalable Spectral Clustering using Cosine Similarity

Overview of NJW Spectral Clustering

Input: Data A, specified number k, α fraction cutoff for outliers

  • 1. $W = (w_{ij}) \in \mathbb{R}^{n \times n}$, where $w_{ij} = S(x_i, x_j)$
  • 2. $D = \mathrm{diag}(W \mathbf{1})$
  • 3. Symmetric normalization: $\tilde{W} = D^{-1/2} W D^{-1/2}$
  • 4. Compute the top k eigenvectors of $\tilde{W}$ and normalize the rows of the resulting $n \times k$ matrix to unit length to obtain $\tilde{U}$
  • 5. Run k-means on $\tilde{U}$ to cluster.

Output: Cluster labels
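The NJW steps listed on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the team's code: the small `kmeans` helper (Lloyd's iterations with a deterministic farthest-point start), the Gaussian similarity with σ = 1 and the two-blob toy data are all assumptions made here, the eigenvector rows are normalized to unit length as in the NJW paper, and the outlier cutoff α is left out.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Plain Lloyd's k-means with a deterministic farthest-point start (helper for this sketch)."""
    centers = [X[0]]
    for _ in range(k - 1):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def njw(W, k):
    d = W.sum(axis=1)                                  # step 2: degrees, D = diag(W 1)
    W_tilde = W / np.sqrt(np.outer(d, d))              # step 3: D^{-1/2} W D^{-1/2}
    _, vecs = np.linalg.eigh(W_tilde)                  # eigenvalues come back in ascending order
    U = vecs[:, -k:]                                   # step 4: top k eigenvectors
    U_tilde = U / np.linalg.norm(U, axis=1, keepdims=True)  # unit-length rows
    return kmeans(U_tilde, k)                          # step 5

# Toy data: two well-separated blobs of 10 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (10, 2)), rng.normal(5.0, 0.3, (10, 2))])
sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
W = np.exp(-sq_dist / 2.0)                             # step 1: Gaussian similarity, sigma = 1
np.fill_diagonal(W, 0.0)
labels = njw(W, k=2)
```

Note that the full n × n matrix W is built explicitly here, which is exactly the cost the scalable variants try to avoid.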


slide-20
SLIDE 20

Scalable Spectral Clustering using Cosine Similarity Setting for Scalable Spectral Clustering

  • Relevance of cosine similarity: Many clustering problems involve document data or image data. For these types of data, cosine similarity is appropriate to use.
  • Main idea: Although computing the similarity matrix is very expensive in spectral clustering, we can omit the similarity matrix calculation and still cluster under cosine similarity.
  • Assumptions:

– The data is sparse or low dimensional
– Cosine similarity is used: $W = AA^T - I$


slide-21
SLIDE 21

Scalable Spectral Clustering using Cosine Similarity

Cosine Similarity

$S(x, y) = \cos\theta = \dfrac{x \cdot y}{\|x\| \, \|y\|}$

  • Measures content overlap with the bag-of-words model
  • Removes influence of document length
  • Fast to compute
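A short sketch of the formula, using made-up word-count vectors: scaling a document by a constant (a "longer" copy of the same text) leaves the similarity at 1, and after L2-normalizing the rows of a data matrix A all pairwise similarities are simply $AA^T$.

```python
import numpy as np

def cosine_similarity(x, y):
    """S(x, y) = x · y / (||x|| ||y||)."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

doc = np.array([2.0, 1.0, 0.0, 3.0])       # invented word counts
longer_doc = 10 * doc                      # same content, ten times the length
other = np.array([0.0, 0.0, 5.0, 0.0])     # shares no words with doc

same = cosine_similarity(doc, longer_doc)
disjoint = cosine_similarity(doc, other)

# With unit-length rows, the full similarity matrix is one matrix product.
A = np.vstack([doc, longer_doc, other])
A = A / np.linalg.norm(A, axis=1, keepdims=True)
S = A @ A.T
```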


slide-22
SLIDE 22

Scalable Spectral Clustering using Cosine Similarity

Math derivation: If we plug in $W = AA^T - I$, we have:

  • 1. $D = \mathrm{diag}(W \mathbf{1}) = \mathrm{diag}((AA^T - I)\mathbf{1}) = \mathrm{diag}(A(A^T \mathbf{1}) - \mathbf{1})$, without the need for $W$
  • 2. $\tilde{W} = D^{-1/2}(AA^T - I)D^{-1/2} = D^{-1/2} AA^T D^{-1/2} - D^{-1} = \tilde{A}\tilde{A}^T - D^{-1}$, where $\tilde{A} = D^{-1/2} A$

If $D^{-1}$ has a constant diagonal, then the left singular vectors of $\tilde{A}$ are the eigenvectors of $\tilde{W}$. So, with just $A$, clustering is more efficient and does not rely on $W$.
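Both identities in the derivation can be verified on a small example. The sketch below uses random toy data with L2-normalized rows (sizes and seed are arbitrary assumptions); the explicit n × n matrix W is formed only to serve as a reference.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
A = rng.random((n, 4))
A /= np.linalg.norm(A, axis=1, keepdims=True)    # L2-normalize the rows

# Degrees without forming W: D = diag(A (A^T 1) - 1)
d_fast = A @ (A.T @ np.ones(n)) - 1.0

# Reference computation through the explicit n x n matrix
W = A @ A.T - np.eye(n)
d_ref = W.sum(axis=1)

A_tilde = A / np.sqrt(d_fast)[:, None]           # D^{-1/2} A
W_tilde = W / np.sqrt(np.outer(d_ref, d_ref))    # D^{-1/2} W D^{-1/2}
residual = W_tilde - (A_tilde @ A_tilde.T - np.diag(1.0 / d_fast))
```

The fast path costs two matrix-vector products with A, so it stays cheap when A is sparse or low dimensional.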


slide-23
SLIDE 23

Scalable Spectral Clustering using Cosine Similarity

Outlier Cutoff

[Figure: entries of $D^{-1}$ ordered from largest to smallest (USPS data)]

Discard outliers without changing the eigenspace of $\tilde{W}$


slide-24
SLIDE 24

Scalable Spectral Clustering using Cosine Similarity

Implementing the Scalable Spectral Clustering Algorithm

Input: Data A, specified number k, clustering method (NJW, Ncut or DM) and α fraction cutoff for outliers

  • 1. L2 normalize data A. Compute degree matrix D, remove outliers from D and A
  • 2. Compute $\tilde{A} = D^{-1/2} A$
  • 3. Compute $\tilde{U}$, the top k left singular vectors of $\tilde{A}$
  • 4. Convert $\tilde{U}$ according to the clustering method and run k-means

Output: Cluster labels, including a label for outliers
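The four steps can be sketched as follows for the NJW variant only. Several details are assumptions made for this illustration rather than facts from the slides: outliers are taken to be the α fraction of points with the smallest degree (the largest entries of $D^{-1}$), they receive the label -1, the "conversion" in step 4 is NJW row normalization, and `kmeans` is a small helper written here.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Plain Lloyd's k-means with a deterministic farthest-point start (helper for this sketch)."""
    centers = [X[0]]
    for _ in range(k - 1):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def scalable_njw(A, k, alpha=0.01):
    n = len(A)
    A = A / np.linalg.norm(A, axis=1, keepdims=True)       # step 1: L2 normalize
    d = A @ (A.T @ np.ones(n)) - 1.0                       # degrees, W is never formed
    n_out = int(np.floor(alpha * n))
    order = np.argsort(d)                                  # lowest degree first
    keep = np.sort(order[n_out:])                          # drop the assumed outliers
    A_tilde = A[keep] / np.sqrt(d[keep])[:, None]          # step 2: D^{-1/2} A
    U, _, _ = np.linalg.svd(A_tilde, full_matrices=False)  # step 3: left singular vectors
    U = U[:, :k]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)       # step 4: NJW row normalization
    labels = np.full(n, -1)                                # -1 marks outliers
    labels[keep] = kmeans(U, k)
    return labels

# Toy nonnegative data: 20 points near (1,1,0,0) and 20 near (0,0,1,1).
rng = np.random.default_rng(0)
base = np.repeat([[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]], 20, axis=0)
X = base + 0.05 * rng.random((40, 4))
labels = scalable_njw(X, k=2, alpha=0.05)
```

The SVD is taken of the n × d matrix $\tilde{A}$ rather than of an n × n similarity matrix, which is where the savings come from.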


slide-25
SLIDE 25

Scalable Spectral Clustering using Cosine Similarity

Experimental Settings

  • α = 1%
  • methods: NJW and Scalable NJW
  • both algorithms coded by our team
  • golub server at San José State University
  • six data sets (three image data, three text data)


slide-26
SLIDE 26

Scalable Spectral Clustering using Cosine Similarity

Benchmark - Accuracy Comparison

Scalable Spectral Clustering vs. Plain NJW Spectral Clustering

Accuracy (%)

Dataset        Scalable   Plain
20Newsgroup       64.40   64.95
Reuters           24.60   25.23
TDT2              51.20   51.80
USPS              67.53   67.47
Pendigits         73.56   73.56
Mnist             52.60   Out of Memory

  • Both methods are similar in accuracy. The Plain method is slightly more accurate on the three text datasets.


slide-27
SLIDE 27

Scalable Spectral Clustering using Cosine Similarity

Benchmark - Runtime Comparison

Scalable Spectral Clustering vs. Plain NJW Spectral Clustering

Runtime (Seconds)

Dataset        Scalable   Plain
20Newsgroup        57.7   154.9
Reuters             5.9    51.1
TDT2               25.3    53.9
USPS                1.1    52.9
Pendigits           3.4   102.0
Mnist              36.2   Out of Memory

  • The Scalable method is much faster than the Plain method.


slide-28
SLIDE 28

Scalable Spectral Clustering using Cosine Similarity

Robustness To Outliers (Accuracy)


slide-29
SLIDE 29

Scalable Spectral Clustering using Cosine Similarity

Robustness To Outliers (Runtime)


slide-30
SLIDE 30

Scalable Spectral Clustering using Cosine Similarity General Remarks and Results From Experiments

  • The scalable spectral clustering method is fast and comparably accurate.
  • It is generally insensitive to the choice of α.

Further Studies and Considerations

  • More experiments on other clustering methods (NCut, DM).
  • Extend our method to handle other similarities (Gaussian).


slide-31
SLIDE 31

Landmark-based Spectral Clustering

Landmark-based Spectral Clustering Team 2

Group Leader: Scott Li Team Members: Jiye Ding, Maham Niaz


slide-32
SLIDE 32

Landmark-based Spectral Clustering

Landmark-based Spectral Clustering (LSC) Steps:

Main Idea: Use landmarks to find a sparse representation of the data

  • Landmark selection
  • Affinity matrix computation
  • Nearest landmarks
  • Normalization, SVD, k-means


slide-33
SLIDE 33

Landmark-based Spectral Clustering

Landmark Selection

Random Selection

  • Very fast

k-means Selection

  • Very slow for larger datasets
  • Can be more representative


slide-34
SLIDE 34

Landmark-based Spectral Clustering

Affinity Matrix Computation

Gaussian Similarity

$S(x, y) = \exp\left(-\dfrac{\|x - y\|^2}{2\beta\sigma^2}\right)$

Cosine Similarity

$S(x, y) = \cos\theta = \dfrac{x \cdot y}{\|x\| \, \|y\|}$


slide-35
SLIDE 35

Landmark-based Spectral Clustering

Nearest Landmarks

  • The largest r entries in each row are kept. The rest are set to zero.
  • Makes the affinity matrix sparse, speeding up computations
  • Makes clustering more robust to noise
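The nearest-landmark step can be sketched as a row-wise top-r filter on the n × p affinity matrix. The function name `keep_largest_r` and the example values are hypothetical, chosen only for illustration.

```python
import numpy as np

def keep_largest_r(A, r):
    """Keep the r largest entries in each row of A and set the rest to zero."""
    idx = np.argpartition(-A, r - 1, axis=1)[:, :r]   # column indices of the r largest per row
    rows = np.arange(A.shape[0])[:, None]
    sparse = np.zeros_like(A)
    sparse[rows, idx] = A[rows, idx]
    return sparse

# Affinities between 2 data points (rows) and 4 landmarks (columns), invented values.
A = np.array([[0.9, 0.1, 0.5, 0.3],
              [0.2, 0.8, 0.7, 0.1]])
A_sparse = keep_largest_r(A, r=2)
```

In a real implementation the result would typically be stored in a sparse matrix format so that later products scale with n · r rather than n · p.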


slide-36
SLIDE 36

Landmark-based Spectral Clustering

Data Clustering

  • L1 row normalization, then √L1 column normalization on A

  • Find the top k left singular vectors (u1...uk)
  • k-means outputs cluster assignments on the data

Landmark Clustering - new method

  • Cluster landmarks based on the top k right singular vectors (v1...vk)
  • Use k-NN to classify the original data
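The data clustering path (affinity to landmarks, nearest landmarks, normalization, SVD, k-means) can be sketched as below. This is an illustration under stated assumptions: cosine affinity on nonnegative toy data, "√L1 column normalization" read as dividing each column by the square root of its L1 norm, a small `kmeans` helper written here, and randomly selected landmarks in the demo. The landmark clustering variant with k-NN is not covered.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Plain Lloyd's k-means with a deterministic farthest-point start (helper for this sketch)."""
    centers = [X[0]]
    for _ in range(k - 1):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def lsc_data_clustering(X, landmarks, k, r):
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Ln = landmarks / np.linalg.norm(landmarks, axis=1, keepdims=True)
    A = Xn @ Ln.T                                      # n x p cosine affinities
    idx = np.argpartition(-A, r - 1, axis=1)[:, :r]    # r nearest landmarks per point
    rows = np.arange(A.shape[0])[:, None]
    Z = np.zeros_like(A)
    Z[rows, idx] = A[rows, idx]
    Z = Z / Z.sum(axis=1, keepdims=True)               # L1 row normalization
    col = Z.sum(axis=0)
    col[col == 0] = 1.0                                # guard landmarks no point selected
    Z = Z / np.sqrt(col)                               # sqrt-L1 column normalization
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    return kmeans(U[:, :k], k)                         # cluster rows of top k left singular vectors

# Toy nonnegative data: 20 points near (1,1,0,0) and 20 near (0,0,1,1).
rng = np.random.default_rng(0)
base = np.repeat([[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]], 20, axis=0)
X = base + 0.05 * rng.random((40, 4))
landmarks = X[rng.choice(len(X), size=12, replace=False)]   # random landmark selection
labels = lsc_data_clustering(X, landmarks, k=2, r=3)
```

Only the n × p matrix Z is decomposed, so the cost grows linearly with the number of data points for a fixed number of landmarks.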


slide-37
SLIDE 37

Landmark-based Spectral Clustering

Experiments

  • 20 Seeds
  • Cosine Similarity
  • Compare Landmark Selection Method and Clustering Method

– p = 500, r = 6

  • Parameter Sensitivity

– Number of landmarks (p)
– Number of nearest landmarks (r)


slide-38
SLIDE 38

Landmark-based Spectral Clustering

Results

Accuracy (%)

               Random LM Selection                     k-means LM Selection
Dataset        Data Clustering   Landmark Clustering   Data Clustering   Landmark Clustering   NJW
20Newsgroups             65.51                 58.37             69.42                 60.69   63.36
Reuters                  25.37                 27.50             27.38                 31.21   25.68
TDT2                     59.85                 64.34             59.45                 65.69   44.38
USPS                     62.12                 66.70             67.83                 74.70   67.74
Pendigits                78.81                 78.76             77.94                 81.59   73.75
MNIST                    63.32                 59.41             69.43                 65.10   –


slide-39
SLIDE 39

Landmark-based Spectral Clustering

CPU Run-time (s)

               Random LM Selection                     k-means LM Selection
Dataset        Data Clustering   Landmark Clustering   Data Clustering   Landmark Clustering   NJW
20Newsgroups              5.95                  3.78             12.75                 11.16   150.96
Reuters                   7.38                  6.61            451.88                444.28   52.31
TDT2                     12.12                 11.67           1912.68               1862.29   49.46
USPS                      3.93                  3.56             11.65                 11.76   55.46
Pendigits                 2.70                  2.25              3.76                  3.63   95.13
MNIST                    31.05                 27.62            584.06                619.06   –


slide-40
SLIDE 40

Landmark-based Spectral Clustering

Parameter Sensitivity

Varying the Number of Landmarks - Accuracy


slide-41
SLIDE 41

Landmark-based Spectral Clustering

Varying the Number of Landmarks - CPU Run-time


slide-42
SLIDE 42

Landmark-based Spectral Clustering

Varying the Number of Nearest Landmarks - Accuracy


slide-43
SLIDE 43

Landmark-based Spectral Clustering

Conclusions

  • LSC techniques can improve the speed and accuracy over NJW
  • Random landmark selection is very efficient
  • Landmark clustering is often more accurate
  • Accuracy can be sensitive to the parameters


slide-44
SLIDE 44

Landmark-based Spectral Clustering

Spectral Clustering for Image Segmentation

Image Segmentation: Given an image, partition it into different regions for different objects.

Original Spectral Clustering

  • Input data: m × n pixels
  • Similarity measure: location and intensity


slide-45
SLIDE 45

Landmark-based Spectral Clustering

New Methods of Image Segmentation by LSC

  • NJW: W ∈ R(mn)×(mn)
  • A grid of representative pixels are landmarks
  • Only consider the pixels close to each landmark


slide-46
SLIDE 46

Landmark-based Spectral Clustering

Example 1

Image Size: 115 × 71

NJW Result

time = 28.02

LSC Result

time = 3.55


slide-47
SLIDE 47

Landmark-based Spectral Clustering

Example 2

Image Size: 125 × 75

NJW Result

time = 74.17

LSC Result

time = 6.85


slide-48
SLIDE 48

Landmark-based Bipartite Graph Spectral Clustering Landmark-based Bipartite Graph Spectral Clustering

Team 3

Team Member: Khiem Pham


slide-49
SLIDE 49

Landmark-based Bipartite Graph Spectral Clustering

Motivation

EVD of an $n \times n$ matrix: $O(n^3)$ time.

SVD of an $n \times m$ matrix, $m \ll n$: $O(nm^2 + m^3)$ time, linear in n.

Team 1: avoid forming affinity matrix

Team 2: dictionary learning + sparse coding feature

A more "native" approach?


slide-50
SLIDE 50

Landmark-based Bipartite Graph Spectral Clustering

Bipartite Graph

  • Pick representative landmarks


slide-51
SLIDE 51

Landmark-based Bipartite Graph Spectral Clustering

  • Form affinity matrix between landmarks and datapoints


slide-52
SLIDE 52

Landmark-based Bipartite Graph Spectral Clustering

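Proposition

Let $A \in \mathbb{R}^{n \times m}$ be the affinity matrix between n data points and m landmarks, and let $D_1$ ($D_2$) be the diagonal matrix of row (column) sums of $A$. Then the eigenvectors of

$P = \begin{pmatrix} D_1^{-1} & \\ & D_2^{-1} \end{pmatrix} \begin{pmatrix} & A \\ A^T & \end{pmatrix}$

are

$V = \begin{pmatrix} D_1^{-1/2} V_1 \\ D_2^{-1/2} V_2 \end{pmatrix}$,

where $V_1$ and $V_2$ are the left and right singular vectors of

$\bar{A} = D_1^{-1/2} A D_2^{-1/2} \in \mathbb{R}^{n \times m}$,

which can be computed in $O(nm^2 + m^3)$ time.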
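The proposition can be checked numerically on a small random example. In this sketch the sizes, the seed and the variable names are arbitrary assumptions; the (n + m) × (n + m) matrix P is built explicitly only to verify the claim, whereas the point of the proposition is that the SVD of the n × m matrix is enough.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 3
A = rng.random((n, m)) + 0.1               # positive toy affinities, data points x landmarks

d1 = A.sum(axis=1)                         # row sums, diagonal of D1
d2 = A.sum(axis=0)                         # column sums, diagonal of D2
A_bar = A / np.sqrt(np.outer(d1, d2))      # D1^{-1/2} A D2^{-1/2}

V1, sigma, V2t = np.linalg.svd(A_bar, full_matrices=False)
V = np.vstack([V1 / np.sqrt(d1)[:, None],      # D1^{-1/2} V1
               V2t.T / np.sqrt(d2)[:, None]])  # D2^{-1/2} V2

# Transition matrix of the bipartite graph: P = D^{-1} [[0, A], [A^T, 0]]
W = np.block([[np.zeros((n, n)), A],
              [A.T, np.zeros((m, m))]])
P = W / np.concatenate([d1, d2])[:, None]
```

Each column of V is an eigenvector of P, with the corresponding singular value of the normalized matrix as its eigenvalue; the largest one equals 1.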


slide-53
SLIDE 53

Landmark-based Bipartite Graph Spectral Clustering

Diffusion Map

  • Generate random walks on bipartite graph.
  • "Enhance" global affinity of far-away data points.

[Figure: two panels, (a) and (b), showing the given data and the landmarks]


slide-54
SLIDE 54

Landmark-based Bipartite Graph Spectral Clustering

  • For odd time step, co-clustering
  • For even time step, direct clustering or landmark clustering (with extension)

[Figure: three panels, for α = 1, α = 2q and α = 2q + 1]
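The odd/even distinction follows from the block structure of the bipartite transition matrix: a random walk alternates between data points and landmarks. A minimal sketch with random toy affinities (sizes and seed are assumptions) makes this visible.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 2
A = rng.random((n, m)) + 0.1                     # toy affinities, data points x landmarks

W = np.block([[np.zeros((n, n)), A],
              [A.T, np.zeros((m, m))]])          # bipartite graph: no data-data or landmark-landmark edges
P = W / W.sum(axis=1, keepdims=True)             # one-step transition matrix

P2 = np.linalg.matrix_power(P, 2)                # even number of steps
P3 = np.linalg.matrix_power(P, 3)                # odd number of steps

data_to_data_even = P2[:n, :n]                   # positive: data points reach data points
data_to_landmark_even = P2[:n, n:]               # zero block
data_to_data_odd = P3[:n, :n]                    # zero block
data_to_landmark_odd = P3[:n, n:]                # positive: data points reach landmarks
```

After an odd number of steps a walk started at a data point sits on a landmark, which relates the two sets (co-clustering); after an even number it is back among data points, which gives a data-to-data affinity.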


slide-55
SLIDE 55

Landmark-based Bipartite Graph Spectral Clustering

t=1, data points <-> landmarks


slide-56
SLIDE 56

Landmark-based Bipartite Graph Spectral Clustering

t=5, data points <-> landmarks


slide-57
SLIDE 57

Landmark-based Bipartite Graph Spectral Clustering

t=9, data points <-> landmarks


slide-58
SLIDE 58

Landmark-based Bipartite Graph Spectral Clustering

t=2, data points <-> data points


slide-59
SLIDE 59

Landmark-based Bipartite Graph Spectral Clustering

t=6, data points <-> data points


slide-60
SLIDE 60

Landmark-based Bipartite Graph Spectral Clustering

t=10, data points <-> data points


slide-61
SLIDE 61

Landmark-based Bipartite Graph Spectral Clustering

Experiment Results (accuracy)

LBDM(1): diffusion map, co-clustering, time step = 1
LBDM(2,X): diffusion map, direct clustering, time step = 2
LBDM(2,Y): diffusion map, landmark clustering, time step = 2

Dataset     Ncut    KASP    LSC     cSPEC   Dhillon   LBDM(1)   LBDM(2,X)   LBDM(2,Y)
usps        66.21   67.25   66.86   66.89   68.21     67.80     68.10       69.45
pendigits   69.73   68.45   77.93   67.93   73.20     72.95     74.70       73.22
letter      24.93   26.19   31.51   24.98   32.06     32.13     32.21       31.28
protein     43.68   43.85   43.85   44.84   43.35     43.55     43.16       45.88
shuttle     –       74.52   39.71   82.78   74.24     74.26     74.38       74.49
mnist       –       57.99   70.28   54.50   72.15     72.43     72.37       73.29

slide-62
SLIDE 62

Landmark-based Bipartite Graph Spectral Clustering

Experiment Results (Time)

LBDM(1): diffusion map, co-clustering, time step = 1
LBDM(2,X): diffusion map, direct clustering, time step = 2
LBDM(2,Y): diffusion map, landmark clustering, time step = 2

Dataset     Ncut (k-means)   KASP            LSC     cSPEC   Dhillon   LBDM(1)   LBDM(2,X)   LBDM(2,Y)
usps        131.78           7.46 + 0.61     4.44    7.89    4.45      4.39      4.17        1.95
pendigits   246.08           3.13 + 0.55     3.08    5.26    3.14      2.91      3.08        1.65
letter      1180.70          5.30 + 0.77     12.24   25.07   13.51     14.96     12.87       2.78
protein     2024.54          27.04 + 0.41    3.55    7.54    3.93      4.04      3.93        4.40
shuttle     –                23.89 + 1.23    8.49    61.68   12.35     15.09     12.15       5.88
mnist       –                299.74 + 0.63   25.07   39.26   27.17     25.69     25.83       16.67

slide-63
SLIDE 63

Landmark-based Bipartite Graph Spectral Clustering

Parameter Sensitivity

  • Investigate the influence of each parameter on MNIST and USPS
  • Baseline configuration:

– # landmarks = 500
– # nearest neighbors = 5
– random walk length/time step = 2


slide-64
SLIDE 64

Landmark-based Bipartite Graph Spectral Clustering

  • Varying number of landmarks

[Figure: two panels plotting accuracy (%) against m (# landmarks), from 200 to 1000; higher is better. Methods shown: LBDM2Y, LBDM2X, cSPEC, LSC, KASP]


slide-65
SLIDE 65

Landmark-based Bipartite Graph Spectral Clustering

  • Varying number of nearest landmark neighbors

[Figure: two panels plotting accuracy (%) against s (# nearest landmarks), from 2 to 10. Methods shown: LBDM2Y, LBDM2X, cSPEC, LSC, KASP]


slide-66
SLIDE 66

Landmark-based Bipartite Graph Spectral Clustering

  • Varying time step

[Figure: two panels plotting accuracy (%) against time step, from 10 to 40; higher is better. Methods shown: LBDM Y, LBDM X, LBDM]


slide-67
SLIDE 67

Landmark-based Bipartite Graph Spectral Clustering

Bipartite graph model of documents and words

  • Applicable to text data.
  • Each document is a bag of words (ignoring syntax)
  • Documents are data points (to be clustered), words are landmarks (not artificial landmarks).


slide-68
SLIDE 68

Landmark-based Bipartite Graph Spectral Clustering

  • Recall: eigenvectors are embeddings of data points and landmarks
  • Get embeddings of both documents and words
  • Great for dimensionality reduction and visualization (similar to Laplacian Eigenmaps¹)

¹ Belkin, Mikhail, and Partha Niyogi. "Laplacian eigenmaps for dimensionality reduction and data representation." Neural Computation 15, no. 6 (2003): 1373-1396.

slide-69
SLIDE 69

Landmark-based Bipartite Graph Spectral Clustering

Problem

  • 20 news accuracy: 26.09%
  • due to the sparse matrix: many low-degree words, several low-degree documents

  • can remove low degree nodes in graph, but lose information
  • ?


slide-70
SLIDE 70

alt.atheism comp.graphics rec.sport.baseball sci.electronics sci.med

slide-71
SLIDE 71

Landmark-based Bipartite Graph Spectral Clustering

Solution

  • Based on recent work on the degree-corrected stochastic block model, "inflate" the degree of each node:²

– $D_1 = D_1 + \tau_1 I$
– $D_2 = D_2 + \tau_2 I$
– $\bar{A} = D_1^{-1/2} A D_2^{-1/2}$

  • Accuracy: 63.94%
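The degree inflation can be sketched as a regularized version of the bipartite normalization. Everything specific in this example is an assumption: the function name `regularized_normalize`, the tiny document-word count matrix, and the choice of τ₁ and τ₂ as the average row and column degree (the slides do not say how τ was set).

```python
import numpy as np

def regularized_normalize(A, tau1, tau2):
    """Return (D1 + tau1 I)^{-1/2} A (D2 + tau2 I)^{-1/2}."""
    d1 = A.sum(axis=1) + tau1          # inflated row degrees (documents)
    d2 = A.sum(axis=0) + tau2          # inflated column degrees (words)
    return A / np.sqrt(np.outer(d1, d2))

# Toy document-word counts; the last word occurs only once (a low-degree node).
A = np.array([[3.0, 2.0, 0.0, 0.0, 0.0],
              [2.0, 4.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 5.0, 1.0, 0.0],
              [0.0, 0.0, 2.0, 3.0, 1.0]])

plain = regularized_normalize(A, 0.0, 0.0)
inflated = regularized_normalize(A,
                                 tau1=A.sum(axis=1).mean(),   # assumed choice of tau1
                                 tau2=A.sum(axis=0).mean())   # assumed choice of tau2
singular_values = np.linalg.svd(inflated, compute_uv=False)
```

Without inflation the single occurrence of the rare word receives a large normalized weight; adding τ shrinks it, so low-degree nodes have less influence on the leading singular vectors.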

² Rohe, Karl, and Bin Yu. "Co-clustering for directed graphs; the stochastic co-blockmodel and a spectral algorithm." stat 1050 (2012): 10.

slide-72
SLIDE 72

[Figure: embedding plot labeled with the words religion, she, god, atheists, atheism, medical, disease, doctor, keith, pitt, radio, electronics, voltage, circuit, image, baseball, program, files, year, graphics, thanks, team, games and game. Legend: alt.atheism, comp.graphics, rec.sport.baseball, sci.electronics, sci.med]

slide-73
SLIDE 73

Concluding Remarks

Concluding Remarks


slide-74
SLIDE 74

Concluding Remarks

Text Cluster Interpretation

Singular Value Decomposition: Take the first basis vector of each cluster

Frequency Ranking: Rank all words based on total frequency inside each cluster


slide-75
SLIDE 75

Concluding Remarks

Text Cluster Interpretation

  • After clustering, we use a rank-1 singular value decomposition to obtain the first basis vector of each cluster.
  • The top entries in each first basis vector represent important words in that cluster.


slide-76
SLIDE 76

Concluding Remarks

Text Cluster Interpretation

Rank all words based on the total frequency inside each cluster
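Both interpretation methods can be sketched on a tiny example. The vocabulary, counts and cluster labels below are invented, and reading "first basis vector" as the first right singular vector of each cluster's document-word submatrix is an assumption made for this illustration.

```python
import numpy as np

vocab = np.array(["god", "religion", "game", "team", "doctor"])   # hypothetical vocabulary
counts = np.array([[4, 3, 0, 0, 0],
                   [5, 2, 0, 1, 0],
                   [0, 0, 6, 4, 0],
                   [0, 1, 3, 4, 1]], dtype=float)                  # documents x words
labels = np.array([0, 0, 1, 1])                                    # cluster of each document

def top_words_svd(counts, labels, cluster, n_top=2):
    """Rank words by the first right singular vector of the cluster's rows."""
    M = counts[labels == cluster]
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    weights = np.abs(Vt[0])            # the sign of a singular vector is arbitrary
    return vocab[np.argsort(-weights)[:n_top]]

def top_words_freq(counts, labels, cluster, n_top=2):
    """Rank words by their total frequency inside the cluster."""
    totals = counts[labels == cluster].sum(axis=0)
    return vocab[np.argsort(-totals)[:n_top]]

svd_words = [list(top_words_svd(counts, labels, c)) for c in (0, 1)]
freq_words = [list(top_words_freq(counts, labels, c)) for c in (0, 1)]
```

On this toy input the two rankings agree; on real data the SVD ranking tends to favor words that co-occur across many documents of the cluster, while the frequency ranking can be dominated by a few long documents.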


slide-77
SLIDE 77

Concluding Remarks

Team Comparisons

               1. Cosine            2. Landmark          3. Bipartite
Dataset        Accuracy   (Time)    Accuracy   (Time)    Accuracy   (Time)
USPS           67.5       (1.1)     74.7       (11.8)    69.5       (9.4)
Pendigits      73.6       (3.4)     81.6       (3.6)     74.7       (6.2)
MNIST          52.6       (36.2)    69.4       (584.1)   73.3       (316.4)
TDT2           51.2       (25.3)    64.3       (11.7)    70.8       (38.1)
Reuters        24.6       (5.9)     27.5       (6.6)     38.3       (36.6)


slide-78
SLIDE 78

Concluding Remarks

Conclusion

  • We worked on three ideas for scalable spectral clustering methods
  • They are often faster and more accurate than older spectral clustering algorithms

  • Next: Clustering data provided by Verizon


slide-79
SLIDE 79

Concluding Remarks

Future Work

  • More Evaluation Metrics

– F1 score

  • Recursive Partitioning

– Finds a hierarchical structure
– Useful for determining the number of clusters

  • Clustering Browsing History with Demographic Data

– Categorical data


slide-80
SLIDE 80

Acknowledgements

  • We would like to thank Prof. Guangliang Chen for his guidance and supervision with this project and Prof. Slobodan Simic for helping to organize this project

  • Thanks to Verizon for their generous sponsorship
slide-81
SLIDE 81

Concluding Remarks

References

[1] A. Y. Ng, M. I. Jordan, Y. Weiss, "On Spectral Clustering: Analysis and an Algorithm", NIPS Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic, pp. 849-856, MIT Press, Cambridge, MA, USA, Dec 2001

[2] U. von Luxburg, "A tutorial on spectral clustering", Statistics and Computing, 17(4), pp. 395-416, 2007

[3] L. Zelnik-Manor, P. Perona, "Self-tuning spectral clustering", Advances in Neural Information Processing Systems, 2005

[4] G. Chen, "Scalable spectral clustering with cosine similarity", to appear in the Proceedings of the 24th International Conference on Pattern Recognition (ICPR), Beijing, China, 2018

[5] J. Fitch et al., "Adaptive Spectral Clustering for High-Dimensional Sparse Count Data", Dept. Math., San Jose State Univ., San Jose, CA, 2017

[6] D. Cai, X. Chen, "Large Scale Spectral Clustering Via Landmark-Based Sparse Representation", IEEE Trans. Cybernetics, Vol. 45, Issue 8, August 2015


slide-82
SLIDE 82

Questions?