A Tensor Spectral Approach to Learning Mixed Membership Community Models - PowerPoint PPT Presentation



slide-1
SLIDE 1

A Tensor Spectral Approach to Learning Mixed Membership Community Models

Anima Anandkumar

U.C. Irvine

Joint work with Rong Ge, Daniel Hsu, Furong Huang, Niranjan UN, Mohammad Hakeem, Sham Kakade.

slide-2
SLIDE 2

Network Communities in Various Domains

Social Networks

Social ties: e.g. friendships, co-authorships

Biological Networks

Functional relationships: e.g. gene regulation, neural activity.

Recommendation Systems

Recommendations: e.g. Yelp reviews.

Community Detection: infer hidden communities from the observed network.

slide-3
SLIDE 3

Community Formation Models

Basic Intuition: Nodes connect due to their community memberships

slide-4
SLIDE 4

Community Formation Models

Basic Intuition: Nodes connect due to their community memberships

Classical: Stochastic Block Model

Edges conditionally independent given node community memberships.
Single membership model: nodes belong to at most one community.

slide-5
SLIDE 5

Community Formation Models


Modeling Overlapping Communities

People belong to multiple communities.
Community formation models? Detection algorithms? Computational/sample complexities?

[Figure: overlapping community memberships, e.g. MIT, Microsoft, UC Irvine, Cornell]

slide-6
SLIDE 6

Pure vs. Mixed Membership Community Models

Stochastic Block Model

slide-7
SLIDE 7

Pure vs. Mixed Membership Community Models

Stochastic Block Model Mixed Membership Model

slide-8
SLIDE 8

Mixed Membership Community Models

Node Membership Model

Mixed memberships: nodes can belong to multiple communities.
Fractional memberships: node memberships normalized to sum to one.

slide-11
SLIDE 11

Mixed Membership Community Models

Edge Formation Model

Edges conditionally independent given node community memberships.
Linearity: edge probability averaged over community memberships.

slide-12
SLIDE 12

Mixed Membership Dirichlet Model (Airoldi et al.)

Independent draws of community membership vectors {πu}u∈V from the Dirichlet distribution:

P[πu] ∝ ∏_{j=1}^{k} πu(j)^{αj − 1},  with ∑_{j=1}^{k} πu(j) = 1.

Dirichlet distribution Dir(α) supported over the simplex.
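The effect of the Dirichlet parameters on membership sparsity can be seen in a minimal NumPy sketch (variable names and values are illustrative, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
k = 4  # number of communities

# Draw membership vectors pi_u ~ Dir(alpha); each draw lies on the
# simplex: entries are nonnegative and sum to one.
for alpha_j in [0.025, 1.0, 25.0]:
    alpha = np.full(k, alpha_j)
    pis = rng.dirichlet(alpha, size=5)
    # Small alpha_j -> draws near the simplex corners (sparse memberships);
    # large alpha_j -> draws near the center (nearly uniform memberships).
    print(alpha_j, np.round(pis[0], 3))
```

Small αj yields nearly pure (one-hot) memberships, matching the regime of interest in the talk.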

slide-13
SLIDE 13

Mixed Membership Dirichlet Model (Airoldi et al.)

Shape of Dir(α) over the simplex: αj → 0 pushes mass to the corners (sparse, nearly pure memberships); αj = 1 gives the uniform distribution over the simplex; large αj (αj → ∞) concentrates mass at the center (dense, nearly equal memberships).

slide-18
SLIDE 18

Mixed Membership Dirichlet Model (Airoldi et al.)

Dirichlet concentration parameter: α0 := ∑_j αj.

Roughly, the level of sparsity in π is O(α0). Regime of interest: small αj.

slide-19
SLIDE 19

Employing Mixed Membership Models

Advantages

Generative model: can be used for prediction.
Dirichlet distribution: can model sparse memberships.
Sparsity level related to the concentration parameter α0 = ∑_i αi.
Stochastic block model is a special case (αi → 0).

slide-20
SLIDE 20

Employing Mixed Membership Models


Challenges in Learning Mixed Membership Models

Identifiability: when can parameters be estimated?
Guaranteed learning? What input is required?
Potentially large sample and computational complexities.

slide-21
SLIDE 21

Overview of the Approach

Method of Moments and Spectral Approach

Inverse moment method: solve equations relating parameters to observed moments.

Spectral approach: reduce equation solving to computing the “spectrum” of the observed moments.

Non-convex but computationally tractable approaches.

slide-22
SLIDE 22


Spectral Approach to Learning Mixed Membership Models

Edge and Subgraph Counts: moments of the observed network.
Tensor Spectral Approach: low-rank tensor form and efficient decomposition via the power method.
Parallel Implementation: linear algebraic operations and iterative tensor decomposition techniques.

slide-23
SLIDE 23

Outline

1. Introduction
2. Summary of Theoretical Guarantees
3. Graph Moments: Tensor Form of Subgraph Counts
4. Algorithms for Tensor Decomposition
5. GPU Implementation and Experimental Results
6. Conclusion

slide-24
SLIDE 24

Summary of Results

Contributions

First guaranteed learning method for overlapping (probabilistic) community models.
Correctness under exact moments: edge and 3-star counts.
Efficient sample and computational complexity.

slide-25
SLIDE 25

Summary of Results

Scaling Requirements

k communities, n nodes, uniform communities. Dirichlet concentration parameter: α0 := ∑_i αi. p, q: intra-/inter-community connectivity.

n = Ω̃(k²(α0 + 1)²),   (p − q)/√p = Ω̃((α0 + 1)k / √n).

For the stochastic block model (α0 = 0), the results are tight.
Performance degrades as α0 increases.
Efficient method for sparse community overlaps.

slide-27
SLIDE 27

Main Results: Recovery Guarantees

k communities, n nodes, uniform communities. Dirichlet concentration parameter: α0 := ∑_i αi.

p, q: intra-/inter-community connectivity.

Scaling Requirements

n = Ω̃(k²(α0 + 1)²),   (p − q)/√p = Ω̃((α0 + 1)k / √n).
slide-28
SLIDE 28

Main Results: Recovery Guarantees

Recovery Bounds (Anandkumar, Ge, Hsu, Kakade ‘13)

ε_{π,n} := (1/n) max_i ‖Π̂_i − Π_i‖₁ = Õ( (α0 + 1)^{3/2} √p / ((p − q) √n) )

ε_P := max_{i,j∈[n]} |P̂_{i,j} − P_{i,j}| = Õ( (α0 + 1)^{3/2} k √p / ((p − q) √n) )

slide-29
SLIDE 29

Support Recovery Guarantees (Homophilic Models)

k communities, n nodes, uniform communities.
εP: error in recovering P.
Π: the true community membership matrix.
Homophilic models: p > q.

slide-30
SLIDE 30


Support Recovery Guarantee (AGHK ‘13)

For a threshold ξ = Ω(εP), for all nodes j ∈ [n] and all communities i ∈ [k], the estimated support Ŝ satisfies (w.h.p.):

Π(i, j) ≥ ξ ⇒ Ŝ(i, j) = 1  and  Π(i, j) ≤ ξ/2 ⇒ Ŝ(i, j) = 0.

Zero-error support recovery of significant memberships of all nodes.
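The thresholding step behind this guarantee is simple to sketch in NumPy; `recover_support` and the membership values below are illustrative, not code or data from the talk:

```python
import numpy as np

def recover_support(Pi_hat, xi):
    """Threshold an estimated membership matrix Pi_hat (k x n):
    entries at least xi are declared significant memberships."""
    return (Pi_hat >= xi).astype(int)

# Hypothetical estimates: rows = communities, columns = nodes.
Pi_hat = np.array([[0.70, 0.05, 0.45],
                   [0.30, 0.90, 0.02],
                   [0.00, 0.05, 0.53]])
S = recover_support(Pi_hat, xi=0.2)
```

Memberships above ξ are recovered; those below ξ/2 are suppressed, which is exactly the two-sided condition in the guarantee.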

slide-31
SLIDE 31

Outline

1. Introduction
2. Summary of Theoretical Guarantees
3. Graph Moments: Tensor Form of Subgraph Counts
4. Algorithms for Tensor Decomposition
5. GPU Implementation and Experimental Results
6. Conclusion

slide-32
SLIDE 32

Subgraph Counts as Graph Moments

slide-33
SLIDE 33

Subgraph Counts as Graph Moments

slide-34
SLIDE 34

Subgraph Counts as Graph Moments

3-star counts for learning communities

slide-35
SLIDE 35

Subgraph Counts as Graph Moments

3-star counts for learning communities

3-Star Count Tensor

M3(a, b, c) = |X|⁻¹ · (# of 3-stars with center in X and leaves a, b, c) = |X|⁻¹ ∑_{x∈X} G(x, a) G(x, b) G(x, c)

M3 = |X|⁻¹ ∑_{x∈X} [G⊤_{x,A} ⊗ G⊤_{x,B} ⊗ G⊤_{x,C}]

[Figure: 3-star with center x ∈ X and leaves a ∈ A, b ∈ B, c ∈ C]
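Assuming NumPy and 0/1 biadjacency blocks for the partitions X, A, B, C, the 3-star count tensor above is a single `einsum`; this is a sketch, not the authors' implementation:

```python
import numpy as np

def three_star_tensor(G_XA, G_XB, G_XC):
    """M3(a,b,c) = |X|^{-1} sum_{x in X} G(x,a) G(x,b) G(x,c):
    normalized count of 3-stars centered in X with leaves a, b, c.
    G_XA is the |X| x |A| biadjacency block, and similarly for B, C."""
    nX = G_XA.shape[0]
    return np.einsum('xa,xb,xc->abc', G_XA, G_XB, G_XC) / nX
```

The einsum sums over the star centers x, exactly mirroring the formula on the slide.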

slide-36
SLIDE 36

Moments Under Stochastic Block Model

One-hot encoding: πu = ei if node u is in community i.

P[Gu,v = 1 | πu, πv] = πu⊤ P πv. E.g., if πu = ei and πv = ej, the probability is Pi,j.

slide-37
SLIDE 37

Moments Under Stochastic Block Model

Expected Edge Counts

Community matrix: ΠA := [πa]_{a∈A}

E[G⊤_{x,A} | Π] = πx⊤ P ΠA = ΠA⊤ P⊤ πx = FA πx,  where FA := ΠA⊤ P⊤

slide-38
SLIDE 38

Moments Under Stochastic Block Model

Expected 3-Star Tensor

λi = P[π = ei]: probability of community i

E[M3 | ΠA,B,C] = ∑_{i∈[k]} λi [(FA)i ⊗ (FB)i ⊗ (FC)i]

slide-39
SLIDE 39

Moments Under Stochastic Block Model

Multi-linear Model for Edge Generation
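The multi-linear edge-generation model can be sampled end to end in a few NumPy lines (Dirichlet memberships plus the linear edge-probability rule); the parameter values here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3
p, q = 0.6, 0.1
P = q * np.ones((k, k)) + (p - q) * np.eye(k)  # community connectivity

# Memberships pi_u ~ Dir(alpha); small alpha gives nearly pure nodes.
Pi = rng.dirichlet(np.full(k, 0.1), size=n)    # n x k

# Linearity: edge probability is the membership-averaged connectivity.
probs = Pi @ P @ Pi.T
G = (rng.random((n, n)) < probs).astype(int)
G = np.triu(G, 1)
G = G + G.T                                    # undirected, no self-loops
```

Setting αj → 0 recovers the stochastic block model: every row of Pi becomes one-hot and each edge probability is exactly Pi,j.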

slide-40
SLIDE 40

Geometric Picture for 3-Star Tensor Form

E[M3 | ΠA,B,C,X] = |X|⁻¹ ∑_{x∈X} [(FA πx) ⊗ (FB πx) ⊗ (FC πx)]

Linear Multiview Model

Each draw of πx, for x ∈ X, generates the three views FA · πx, FB · πx, FC · πx.

slide-41
SLIDE 41

Geometric Picture for 3-Star Tensor Form

E[M3 | ΠA,B,C] = ∑_{i∈[k]} λi [(FA)i ⊗ (FB)i ⊗ (FC)i]

Goal: recover FA, FB, FC, λ through CP tensor decomposition.

slide-42
SLIDE 42

Outline

1. Introduction
2. Summary of Theoretical Guarantees
3. Graph Moments: Tensor Form of Subgraph Counts
4. Algorithms for Tensor Decomposition
5. GPU Implementation and Experimental Results
6. Conclusion

slide-43
SLIDE 43

Low-rank Tensor Decomposition

[Figure: tensor E[M3|ΠA,B,C] = λ1(FA)1 ⊗ (FB)1 ⊗ (FC)1 + λ2(FA)2 ⊗ (FB)2 ⊗ (FC)2 + …]

E[M3 | ΠA,B,C] = ∑_{i∈[k]} λi [(FA)i ⊗ (FB)i ⊗ (FC)i]

Rank-k tensor decomposition, with typically k ≪ n. Here u ⊗ v ⊗ w is a rank-1 tensor whose (i, j, k)-th entry is ui vj wk.
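The rank-1 building block and the rank-k CP form can be written directly in NumPy (a sketch with made-up factors, not the talk's moments):

```python
import numpy as np

def outer3(u, v, w):
    """u ⊗ v ⊗ w: rank-1 tensor with (i, j, k)-th entry u_i v_j w_k."""
    return np.einsum('i,j,k->ijk', u, v, w)

rng = np.random.default_rng(0)
k = 2
lam = np.array([0.6, 0.4])
FA, FB, FC = (rng.random((5, k)) for _ in range(3))

# Rank-k CP form: sum_i lam_i (FA)_i ⊗ (FB)_i ⊗ (FC)_i
M3 = sum(lam[i] * outer3(FA[:, i], FB[:, i], FC[:, i]) for i in range(k))
```

Only nk + k numbers (the factor columns and weights) determine the full O(n³) tensor, which is what makes the dimensionality reduction on the next slides possible.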

slide-44
SLIDE 44

Low-rank Tensor Decomposition

Challenges

Guaranteed algorithm for tensor decomposition?
Efficient and scalable implementation?
Noisy tensor decomposition: exact moments are not available.
Sample complexity: how large must n be compared to k?

slide-45
SLIDE 45

Dimensionality Reduction for Tensor Decomposition

E[M3 | ΠA,B,C] = ∑_{i∈[k]} λi [(FA)i ⊗ (FB)i ⊗ (FC)i]

Rank-k tensor decomposition, with typically k ≪ n. M3 has size O(n³), but the number of free parameters is only nk + k.

First Step: Dimensionality Reduction

Convert M3 of size |A| × |B| × |C| to a tensor T of size k × k × k, then carry out the decomposition of T.

Advantages

Reduced computation.
Robustness to noise.

[Figure: tensor M3 converted to tensor T via multi-linear transforms]

slide-46
SLIDE 46

Dimensionality Reduction through Whitening

E[M3 | ΠA,B,C] = ∑_i λi [(FA)i ⊗ (FB)i ⊗ (FC)i]

Whitening: Conversion of (expected) M3 to an Orthogonal Symmetric Tensor T

Whitening: WA s.t. WA⊤ FA Diag(λ)^{1/2} = R, with R⊤ R = I ∈ ℝ^{k×k}. Similarly for WB, WC.

Symmetrization: W̃B := WB (WB⊤ FB Diag(λ) FA⊤ WA). Similarly W̃C.

SVD of the (expected) GX,A gives Col(FA): E[G⊤_{X,A} | ΠA,X] = FA ΠX.

slide-47
SLIDE 47

Dimensionality Reduction through Whitening

Multi-linear Transformation of the 3-star Tensor

T := E[M3 | ΠA,B,C](WA, W̃B, W̃C) = ∑_i ρi ri^{⊗3}

T is a symmetric orthogonal tensor: the {ri} are orthonormal.
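The whitening principle can be illustrated in a simplified single-view, symmetric setting: one factor matrix F in place of the three views FA, FB, FC, with the whitening matrix obtained from the second moment by eigendecomposition rather than the slide's three-view SVD construction. A sketch under those assumptions, not the talk's algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 3
F = rng.random((n, k))
lam = np.array([0.5, 0.3, 0.2])

# Symmetric second and third moments built from the factors.
M2 = (F * lam) @ F.T                                  # sum_i lam_i f_i f_i^T
M3 = np.einsum('i,ai,bi,ci->abc', lam, F, F, F)       # sum_i lam_i f_i^{x3}

# Whitening: W such that W.T @ M2 @ W = I_k, from the top-k eigenpairs.
vals, vecs = np.linalg.eigh(M2)
vals, vecs = vals[-k:], vecs[:, -k:]
W = vecs / np.sqrt(vals)

# Multi-linear transform: T = M3(W, W, W) is a k x k x k
# orthogonally decomposable tensor sum_i rho_i r_i^{x3}.
T = np.einsum('abc,ai,bj,ck->ijk', M3, W, W, W)
```

Here the orthonormal components are r_i = √λi · W⊤ f_i with weights ρi = λi^{−1/2}, so the decomposition of the small tensor T recovers the factors up to the known whitening transform.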

slide-48
SLIDE 48

Tensor Decomposition Through Eigen Analysis

Orthogonal symmetric tensor: T = ∑_i ρi ri^{⊗3}

T(I, ri, ri) = ∑_j ρj ⟨ri, rj⟩² rj = ρi ri

slide-49
SLIDE 49

Tensor Decomposition Through Eigen Analysis

Obtaining eigenvectors through power iterations:

u ← T(I, u, u) / ‖T(I, u, u)‖

slide-50
SLIDE 50

Tensor Decomposition Through Eigen Analysis

Basic Algorithm

Pick random initialization vectors.
Run power iterations.
Go with the winner; deflate and repeat.
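The basic algorithm above (random restarts, power iterations, go with the winner, deflate) can be sketched in plain NumPy; this is an illustration of the idea, not the robust implementation analyzed in the talk:

```python
import numpy as np

def tensor_power_method(T, k, n_restarts=10, n_iters=100, seed=0):
    """Recover eigenpairs (rho_i, r_i) of an orthogonally decomposable
    symmetric tensor T = sum_i rho_i r_i^{x3} via the power iteration
    u <- T(I, u, u) / ||T(I, u, u)||, with deflation between factors."""
    rng = np.random.default_rng(seed)
    T = T.copy()
    dim = T.shape[0]
    eigvals, eigvecs = [], []
    for _ in range(k):
        best_u, best_val = None, -np.inf
        for _ in range(n_restarts):            # random initializations
            u = rng.standard_normal(dim)
            u /= np.linalg.norm(u)
            for _ in range(n_iters):           # power iterations
                u = np.einsum('abc,b,c->a', T, u, u)
                u /= np.linalg.norm(u)
            val = np.einsum('abc,a,b,c->', T, u, u, u)
            if val > best_val:                 # go with the winner
                best_u, best_val = u, val
        eigvals.append(best_val)
        eigvecs.append(best_u)
        # Deflate: subtract the recovered rank-1 term and repeat.
        T -= best_val * np.einsum('a,b,c->abc', best_u, best_u, best_u)
    return np.array(eigvals), np.array(eigvecs).T
```

Each inner loop converges to one robust eigenvector of T; taking the best restart and deflating peels off the components one at a time.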

slide-51
SLIDE 51

Tensor Decomposition Algorithm

Analysis and Implementation of Tensor Power Method

Initialization: different from matrix eigen-analysis.
Robust implementation for noisy tensor decomposition.
Implicit manipulation of the tensor.
Stochastic updates for online tensor decomposition.

Summary of the Tensor Approach

Use the 3-star count tensor for community detection.
Dimensionality reduction through whitening (SVD operations).
Tensor power method (stochastic updates).
Post-processing (linear transformations and thresholding).

slide-52
SLIDE 52

Outline

1. Introduction
2. Summary of Theoretical Guarantees
3. Graph Moments: Tensor Form of Subgraph Counts
4. Algorithms for Tensor Decomposition
5. GPU Implementation and Experimental Results
6. Conclusion

slide-53
SLIDE 53

GPU-based implementation

Overview of GPUs

Hundreds of cores; parallelism for matrix/tensor operations.
Speed-up: order-of-magnitude gains.
Big-data challenge: GPU memory ≪ CPU memory ≪ hard disk. GPU memory is not expandable; CPU memory and hard disk are.

[Figure: storage hierarchy and partitioned matrix, with blocks moved between GPU memory, CPU memory, and hard disk]

slide-54
SLIDE 54

Summary of Results

Datasets

Facebook (friendship among users): n ∼ 20k
Yelp (users reviewing businesses): n ∼ 40k
DBLP (co-authorship among authors): n ∼ 120k

Error (E) and Recovery ratio (R)

Dataset           k̂    Method       Running Time   E        R
Facebook (k=360)  500   ours         468            0.0175   100%
Facebook (k=360)  500   variational  86,808         0.0308   100%
Yelp (k=159)      100   ours         190            0.0185   100%
Yelp (k=159)      100   variational  –              –        –
DBLP (k=250)      100   ours         7,664          0.0722   35.60%
DBLP (k=250)      100   variational  69,156         7.4527   98.80%
DBLP (k=250)      500   ours         10,157         0.1393   88.80%
DBLP (k=250)      500   variational  558,723        16.3874  99.20%

slide-55
SLIDE 55

Summary of Results - Yelp Dataset

Lowest error business categories & largest weight businesses

Rank   Category         Business                    Stars   Review Count
1      Latin American   Salvadoreno Restaurant      4.0     36
2      Gluten Free      P.F. Chang’s China Bistro   3.5     55
3      Hobby Shops      Make Meaning                4.5     14
4      Mass Media       KJZZ 91.5FM                 4.0     13
5      Yoga             Sutra Midtown               4.5     31

slide-56
SLIDE 56

Summary of Results - Yelp Dataset

Bridgeness: distance from the uniform vector [1/k̂, …, 1/k̂]⊤

Top-5 bridging nodes (businesses)

Business                 Categories
Four Peaks Brewing Co    Restaurants, Bars, American, Nightlife, Food, Pubs, Tempe
Pizzeria Bianco          Restaurants, Pizza, Phoenix
FEZ                      Restaurants, Bars, American, Nightlife, Mediterranean, Lounges, Phoenix
Matt’s Big Breakfast     Restaurants, Phoenix, Breakfast & Brunch
Cornish Pasty Company    Restaurants, Bars, Nightlife, Pubs, Tempe
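Bridgeness as defined above is a one-liner over the estimated membership matrix; the matrix below is hypothetical, chosen so one node is nearly uniform:

```python
import numpy as np

def bridgeness(Pi_hat):
    """Distance of each node's estimated membership vector from the
    uniform vector [1/k, ..., 1/k]; small distance = bridging node."""
    k = Pi_hat.shape[0]
    uniform = np.full(k, 1.0 / k)
    return np.linalg.norm(Pi_hat - uniform[:, None], axis=0)

# Hypothetical 3-community estimates for 4 nodes (columns).
Pi_hat = np.array([[1.0, 0.34, 0.0, 0.5],
                   [0.0, 0.33, 1.0, 0.5],
                   [0.0, 0.33, 0.0, 0.0]])
d = bridgeness(Pi_hat)
top = np.argsort(d)   # most bridging (smallest distance) first
```

Pure nodes sit at the simplex corners and score high; nodes spread evenly across communities score near zero and surface as bridges, like the businesses in the table.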

slide-57
SLIDE 57

Yelp Visualization

slide-58
SLIDE 58

Outline

1. Introduction
2. Summary of Theoretical Guarantees
3. Graph Moments: Tensor Form of Subgraph Counts
4. Algorithms for Tensor Decomposition
5. GPU Implementation and Experimental Results
6. Conclusion

slide-59
SLIDE 59

Conclusion

Mixed Membership Models

Can model overlapping communities.
Efficient to learn from low-order moments: edge counts and 3-star counts.

Tensor Spectral Method

Whitened 3-star count tensor is an orthogonal symmetric tensor.
Efficient decomposition through the power method.
Perturbation analysis: tight for the stochastic block model.
Zero-error support recovery guarantees.

[Logos: MIT, Microsoft, UC Irvine, Cornell]