Social Network Clustering: Kyle Luh, Peter Elliott, and Raymond Ahn


SLIDE 1

Preliminaries Attempted Solutions and Results Recommended Solution and Results Artificial Data Future Work

Social Network Clustering

Kyle Luh, Peter Elliott, and Raymond Ahn

University of California Los Angeles

August 2, 2011

Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

SLIDE 2

Outline

1. Preliminaries
2. Attempted Solutions and Results
3. Recommended Solution and Results
4. Artificial Data
5. Future Work

SLIDE 3

Hollenbeck Gang Activity

Hollenbeck covers approximately 15.2 square miles. In this area, 31 violent gangs reside. Hollenbeck is one of the three most violent LA policing regions. Gang violence in this region has existed since before WWII.

SLIDE 4

Hollenbeck Gang Activity

The LAPD has provided an Excel database of non-criminal stops they have made in the Hollenbeck area. The data includes: time of stop; location (gang territory and coordinates); gang affiliation; sex; age; ethnicity.

SLIDE 5

Goals

Use clustering techniques to predict unknown gang affiliations. Detect other social structures that may not be captured by gang affiliation.

SLIDE 6

Graph Models

Convert individuals into nodes. Edge weights indicate similarity. Unfortunately, data is sparse.

SLIDE 7

Choosing a Measure of Similarity

A function of Euclidean distance

Dot product of feature vectors: gang territory; individuals and their gang associations; individual-to-individual interactions
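
The two similarity measures named on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the Gaussian kernel is one assumed choice of "a function of Euclidean distance", and the coordinates, the binary feature vectors, and the `scale` parameter are hypothetical.

```python
import numpy as np

def distance_similarity(coords, scale=1.0):
    """Similarity that decays with Euclidean distance.

    The Gaussian kernel exp(-(d / scale)^2) is an assumed choice;
    the slides only say "a function of Euclidean distance".
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return np.exp(-(dist / scale) ** 2)

def dot_product_similarity(features):
    """Similarity as the dot product of each pair of feature vectors."""
    return features @ features.T

# Hypothetical data: three individuals with 2-D stop coordinates.
coords = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 4.0]])
# Hypothetical binary features, e.g. one column per gang territory.
features = np.array([[1, 0, 1],
                     [1, 0, 0],
                     [0, 1, 0]])

W_geo = distance_similarity(coords)        # edge weights from location
W_soc = dot_product_similarity(features)   # edge weights from shared features
```

Both functions return a symmetric matrix whose entry (i, j) is the edge weight between individuals i and j. With sparse features, most off-diagonal entries of `W_soc` are zero.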

SLIDE 8

Results

Actual Gang Clusters

SLIDE 10

K-means algorithm

Algorithm: Choose the number of clusters k and initial centers. Assign each point to its nearest center. Shift each center to the centroid of its assigned points. Repeat until the assignments no longer change.

Cons: The K-means algorithm only accounts for location. We hope to utilize more of the data.
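
The K-means loop can be sketched in a few lines. This is a minimal illustration under stated assumptions: the initial centers are drawn at random from the data, and the six 2-D points are hypothetical stand-ins for stop coordinates.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Plain K-means: alternate nearest-center assignment and centroid update."""
    rng = np.random.default_rng(seed)
    # Assumed initialization: k distinct data points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    labels = np.full(len(points), -1)
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=-1)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # equilibrium: assignments stopped changing
        labels = new_labels
        # Shift each center to the centroid of its assigned points.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels, centers

# Hypothetical data: two well-separated groups of locations.
points = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                   [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels, centers = kmeans(points, k=2)
```

Only the coordinates enter the computation, which is the limitation noted under Cons.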

SLIDE 11

Results: K-means Approach

SLIDE 12

A Metric for Cluster Evaluation

We define purity to be

$$\mathrm{purity}(\Omega, C) = \frac{1}{N} \sum_{k} \max_{j} |\omega_k \cap c_j|$$

where $\Omega = \{\omega_1, \dots, \omega_K\}$ are the clusters, $C = \{c_1, \dots, c_J\}$ are the actual classes, and $N$ is the number of data points. Another measure we used was Adjusted Mutual Information (AMI), which may be more appropriate since our gangs vary significantly in size.
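
As a worked example of the purity definition, the sketch below counts, for each cluster, the size of its largest overlap with any class and divides the total by the number of points. The cluster and gang labels are hypothetical.

```python
from collections import Counter

def purity(cluster_labels, class_labels):
    """purity = (1/N) * sum over clusters of the largest class overlap."""
    if len(cluster_labels) != len(class_labels):
        raise ValueError("label sequences must have the same length")
    # For each cluster, count how many members belong to each class.
    by_cluster = {}
    for cluster, cls in zip(cluster_labels, class_labels):
        by_cluster.setdefault(cluster, Counter())[cls] += 1
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(cluster_labels)

# Hypothetical example: two clusters, three actual gangs.
clusters = [0, 0, 0, 1, 1, 1]
gangs = ["a", "a", "b", "b", "b", "c"]
score = purity(clusters, gangs)  # largest overlaps are 2 and 2, so (2 + 2) / 6
```

Purity rewards many small clusters (one cluster per point gives purity 1), which is one reason to report a second measure alongside it. For AMI, scikit-learn provides `sklearn.metrics.adjusted_mutual_info_score`.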

SLIDE 13

Results: K-means Approach

Purity ≈ 0.4 and AMI ≈ 0.4

SLIDE 14

Modularity Maximization

Modularity compares the number of edges within a cluster to the number expected if edges were placed at random. Maximize modularity: calculate the change in modularity at each step and stop when the change is no longer positive. [M.E.J. Newman, 2006]
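
The quantity being maximized can be written as Q = (1/2m) Σ_ij (A_ij − k_i k_j / 2m) δ(c_i, c_j), where k_i is the degree of node i and m is the number of edges. The sketch below only evaluates Q for a given partition; it does not implement the step-by-step maximization. The toy graph (two triangles joined by one edge) is hypothetical.

```python
import numpy as np

def modularity(A, labels):
    """Newman modularity of a partition of an undirected graph."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    degrees = A.sum(axis=1)
    two_m = degrees.sum()                          # twice the number of edges
    expected = np.outer(degrees, degrees) / two_m  # expected weight under random placement
    same = labels[:, None] == labels[None, :]      # delta(c_i, c_j)
    return ((A - expected) * same).sum() / two_m

# Hypothetical graph: two triangles joined by the single edge (2, 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

q_good = modularity(A, [0, 0, 0, 1, 1, 1])  # one cluster per triangle
q_bad = modularity(A, [0, 1, 0, 1, 0, 1])   # clusters that cut across triangles
```

A greedy scheme would accept a move only when it raises Q, which matches the stopping rule of halting once the change is no longer positive.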

SLIDE 15

Results: Modularity Maximization

SLIDE 16

Convergence of Iterated Correlations (CONCOR)

Compute the correlations between the rows (or columns) of the adjacency matrix. Continue to calculate the correlations of the correlation matrix until every entry is +1 or −1; the signs split the nodes into two clusters. The method is repeated on each cluster to achieve a finer partition. [Wasserman, 1994]
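
One CONCOR split can be sketched as below. This is an illustration under assumptions rather than the authors' implementation: the toy matrix carries ones on its diagonal (an assumed self-similarity convention that makes clique members look alike), and the tolerance and iteration cap are arbitrary.

```python
import numpy as np

def concor_split(A, max_iter=100, tol=1e-8):
    """Split nodes into two blocks by iterating column correlations.

    Returns a boolean array: True for nodes in the same block as node 0.
    """
    # Correlation matrix of the columns of A.
    C = np.corrcoef(np.asarray(A, dtype=float), rowvar=False)
    for _ in range(max_iter):
        if np.all(np.abs(np.abs(C) - 1.0) < tol):
            break  # every entry is (numerically) +1 or -1
        # Correlate the columns of the correlation matrix itself.
        C = np.corrcoef(C, rowvar=False)
    return C[0] > 0

# Hypothetical similarity matrix: two groups of three joined by the edge (2, 3),
# with ones on the diagonal (assumption, see lead-in).
A = np.eye(6)
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

blocks = concor_split(A)
```

Applying `concor_split` again to the submatrix of each block gives the finer partition. A column with no variation has undefined correlation, so real data would need a guard that this sketch omits.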

SLIDE 17

Results: CONCOR

Purity ≈ 0.5 and AMI ≈ 0.46.

SLIDE 19

Spectral Clustering

Create a matrix whose columns are the leading eigenvectors of the adjacency matrix. The eigenvectors capture the axes which contain the most variation in the data. Run the k-means algorithm in this new space. [Ng et al., 2001]
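
A minimal sketch of these steps follows. It assumes the normalization used by Ng et al. (scaling the adjacency matrix by the inverse square roots of the degrees and normalizing the rows of the eigenvector matrix), which may differ from what was run for these slides. The farthest-point initialization in the small k-means helper and the toy graph are also assumptions.

```python
import numpy as np

def spectral_embedding(A, k):
    """Rows of the top-k eigenvector matrix of D^(-1/2) A D^(-1/2), unit-normalized."""
    A = np.asarray(A, dtype=float)
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))  # assumes no isolated nodes
    L = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L)             # eigenvalues in ascending order
    X = eigvecs[:, -k:]                        # eigenvectors of the k largest eigenvalues
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def kmeans(points, k, n_iter=100):
    """Small k-means with deterministic farthest-point initialization (assumed)."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Hypothetical graph: two triangles joined by the single edge (2, 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

embedding = spectral_embedding(A, k=2)
labels = kmeans(embedding, k=2)
```

K-means sees only the rows of `embedding`, so any information encoded in the edge weights, not just location, can influence the clusters.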

SLIDE 20

Eigenvalues

SLIDE 21

Eigenvector Plots: Distance Only

SLIDE 22

Eigenvector Plots: Social Information Only

SLIDE 23

Where to go from here?

The geographic data alone provides no insights. The social data is so sparse that its eigenvectors are useless on their own. We decided to combine the two adjacency matrices, αA + (1 − α)B, where α is a weighting parameter.
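
The combination αA + (1 − α)B is a weighted average of the two matrices. The sketch below assumes α lies in [0, 1] and uses two small hypothetical matrices; which of A and B is geographic and which is social is not specified here.

```python
import numpy as np

def combine_adjacency(A, B, alpha):
    """Weighted combination alpha * A + (1 - alpha) * B of two adjacency matrices."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is assumed to lie in [0, 1]")
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    if A.shape != B.shape:
        raise ValueError("A and B must have the same shape")
    return alpha * A + (1.0 - alpha) * B

# Hypothetical 3-node example: A is dense, B is sparse.
A = np.array([[1.0, 0.8, 0.2],
              [0.8, 1.0, 0.4],
              [0.2, 0.4, 1.0]])
B = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])

# Sweep the weighting parameter to see how the combined weights shift.
combined = {alpha: combine_adjacency(A, B, alpha) for alpha in (0.0, 0.25, 0.5, 1.0)}
```

At α = 1 the result is A and at α = 0 it is B. Intermediate values let the sparse matrix sharpen the structure of the dense one.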

SLIDE 24

Eigenvector Plots: Combined

SLIDE 25

Clustering Results

SLIDE 26

Clustering Results

SLIDE 27

Clustering Results

SLIDE 28

Results: Spectral Approach

Purity ≈ 0.7 and AMI ≈ 0.65

SLIDE 30

Artificial Data

SLIDE 31

Artificial Data

Inputs: number of people; number of communities; gang multiplier; threshold.

$$G(x, y) = \frac{(\eta_x + \eta_y)\,\sigma}{\mathrm{dist}(x, y)}\,(1 + M\delta_{ij})$$

[N. Masuda, 2005]

SLIDE 32

Artificial Data

Inputs: number of people; number of gangs; probability of an edge within gangs; probability of an edge between gangs. [E.N. Gilbert, 1959]

SLIDE 33

Artificial Data

Inputs: number of people; number of gangs; average degree; mixing parameter. [A. Lancichinetti, 2008]

SLIDE 34

Artificial Data

Inputs: number of people; number of gangs; average degree; mixing parameter.
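
To show how these four inputs can drive a generator, here is a simplified sketch. It is not the benchmark of Lancichinetti et al., which also draws degrees and community sizes from power laws. It assumes equal-sized gangs and places each edge independently, with within-gang and between-gang probabilities derived from the average degree and the mixing parameter (the expected fraction of a node's edges that leave its gang).

```python
import numpy as np

def planted_gangs(n_people, n_gangs, avg_degree, mixing, seed=0):
    """Random graph with equal-sized gangs (a simplified, assumed model)."""
    if n_people % n_gangs != 0:
        raise ValueError("this sketch assumes equal-sized gangs")
    size = n_people // n_gangs
    # Expected internal degree (1 - mixing) * avg_degree over size - 1 gang mates;
    # expected external degree mixing * avg_degree over everyone else.
    p_in = (1.0 - mixing) * avg_degree / (size - 1)
    p_out = mixing * avg_degree / (n_people - size)
    if p_in > 1.0 or p_out > 1.0:
        raise ValueError("avg_degree is too large for these sizes")
    rng = np.random.default_rng(seed)
    labels = np.arange(n_people) % n_gangs
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p_in, p_out)
    # Sample each unordered pair once (upper triangle), then symmetrize.
    upper = np.triu(rng.random((n_people, n_people)) < probs, k=1)
    A = (upper | upper.T).astype(int)
    return A, labels

A, labels = planted_gangs(n_people=200, n_gangs=4, avg_degree=10, mixing=0.1)
```

Raising `mixing` toward 1 blurs the gang structure, which gives a controlled way to test how robust a clustering method is.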

SLIDE 35

Artificial Data

SLIDE 37

Future Work

Matrix completion and link prediction; robustness of algorithms; artificial testing data.

SLIDE 38

Acknowledgements

We would like to thank Professor Yves van Gennip for advising us throughout the project. We are also grateful to Professors Blake Hunter and Allon Percus for their helpful advice.

SLIDE 39

References

Ng et al. (2001). On Spectral Clustering: Analysis and an Algorithm. Advances in Neural Information Processing Systems 14(2), 849-857.
Wasserman (1994). Social Network Analysis. Cambridge University Press.
M.E.J. Newman (2006). Modularity and community structure in networks. Proceedings of the National Academy of Sciences of the United States of America 103(23), 8577-8582.
Gilbert (1959). Random Graphs. The Annals of Mathematical Statistics.
Lancichinetti et al. (2008). Benchmark Graphs for Testing Community Detection Algorithms. Physical Review E.
