Community Detection on Graphs with Side Information, Aria Nosratinia


slide-1
SLIDE 1

Motivation & Background Graph Models Side Information Belief Propagation Conclusion

Community Detection on Graphs with Side Information

Aria Nosratinia

ACOSIS 2019, Marrakech

November 20, 2019

Aria Nosratinia Community Detection with Side Information 1/ 40

slide-2
SLIDE 2

Network Science

Abstraction of data structures into networks:

Technology: Internet, telephone network, power grids, transportation networks
Information: World Wide Web, citation networks
Biology: metabolic networks, protein interaction networks, genetic regulatory networks, etc.
Social networks: friendships, business interactions, Facebook, financial networks, ...

Important for two reasons:

Prevalence in engineering, biology, sociology, and finance
Usefulness in modeling, analysis, and solving problems

slide-11
SLIDE 11

Why Study Networks

The pattern of interactions between parts of a system is important: it has a significant effect on the behavior of complex systems.
A network is an abstraction of the topology of interactions.
Networks have been studied in many ways, by scholars in different fields.
Key question: how does the structure of a network determine the properties we care about?

slide-14
SLIDE 14

Community Structure in Networks

One of the key features of many networks is the way the nodes relate to each other.
Many networks consist of clusters whose internal structure is consistent, but separate from the remainder of the network.

[Figure: a small network displaying community structure]

slide-15
SLIDE 15

Community Detection in Networks

Community detection is a central problem in machine learning and data science: which items belong together and have similar properties?
It can be a goal unto itself, or a first step toward other learning tasks.
Applications in biology, criminology, social networks, politics, link prediction, advertising
Studied in statistics, computer science, and theoretical statistical physics

slide-20
SLIDE 20

Graph Models

Graphs are modeled statistically.
Basic model: Erdős-Rényi graph:

Vertices are fixed
Each edge appears with probability p, independent of the others
Independence of edges makes analysis tractable

Refinement: nodes have types, and the edge probability depends on the node types.
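
As a minimal sketch of the basic model, the following samples one Erdős-Rényi graph using only the Python standard library. The function name `sample_erdos_renyi` and the edge-set representation are illustrative assumptions, not part of the talk.

```python
import random
from itertools import combinations

def sample_erdos_renyi(n, p, seed=None):
    """Return the edge set of one G(n, p) sample on vertices 0..n-1."""
    rng = random.Random(seed)
    # Each of the n*(n-1)/2 possible edges is included independently with probability p.
    return {(u, v) for u, v in combinations(range(n), 2) if rng.random() < p}

edges = sample_erdos_renyi(n=50, p=0.1, seed=0)
print(len(edges), "edges out of", 50 * 49 // 2, "possible")
```

Because every pair is an independent coin flip, the expected number of edges is simply p times the number of pairs, which is the tractability the slide refers to.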

slide-26
SLIDE 26

Binary Symmetric Stochastic Block Model G(n, P_X(x), p, q)

n: number of nodes
P_X(x): prior on the sizes of the two communities
p and q: intra- and inter-community connectivity probabilities, respectively

[Figure: a sample from G(20, 0.5, 0.8, 0.2)]
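
The model can be sketched in a few lines. This is an illustration under assumptions: labels are drawn i.i.d. in {+1, -1} with P(+1) equal to the prior, p is taken as the same-community edge probability and q as the cross-community one (consistent with the example G(20, 0.5, 0.8, 0.2)), and the name `sample_sbm` is hypothetical.

```python
import random
from itertools import combinations

def sample_sbm(n, prior, p, q, seed=None):
    """Sample labels in {+1, -1} and an edge set from a two-community SBM.

    prior: probability that a node gets label +1.
    p: edge probability for same-label pairs (assumed intra-community).
    q: edge probability for different-label pairs (assumed inter-community).
    """
    rng = random.Random(seed)
    labels = [1 if rng.random() < prior else -1 for _ in range(n)]
    edges = set()
    for u, v in combinations(range(n), 2):
        prob = p if labels[u] == labels[v] else q
        if rng.random() < prob:
            edges.add((u, v))
    return labels, edges

labels, edges = sample_sbm(20, 0.5, 0.8, 0.2, seed=0)
print(labels.count(1), "nodes in community +1,", len(edges), "edges")
```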

slide-29
SLIDE 29

Stochastic Block Models

Binary: two communities, each with internal structure
Single community, also known as hidden community
Multi-community
Overlapping communities

slide-30
SLIDE 30

Recovery Metrics

Let ε denote the fraction of misclassified nodes.

Exact Recovery: recovery of the label of every node, with probability tending to one (ε = 0 as n → ∞)
Weak Recovery: recovery of the labels of all but a vanishing fraction of nodes, with probability tending to one (ε → 0 as n → ∞)
Correlated Recovery: strictly better than random guessing, with probability tending to one (ε < 0.5 as n → ∞)
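
The quantity ε can be computed as sketched below. One assumption is made explicit here: without side information, the two communities of a symmetric model are identifiable only up to a global swap of the labels, so the sketch takes the better of the two alignments. The function name is illustrative.

```python
def misclassified_fraction(true_labels, estimated_labels):
    """Fraction of wrong labels, minimized over the global label swap.

    Labels are +1/-1. Both alignments (as given, and with the estimate
    negated) are considered, and the smaller error count is used.
    """
    n = len(true_labels)
    direct = sum(t != e for t, e in zip(true_labels, estimated_labels))
    # Negating every estimated label turns each match into a mismatch and vice versa.
    return min(direct, n - direct) / n

print(misclassified_fraction([1, 1, -1, -1], [1, -1, -1, -1]))
```

In these terms, exact recovery asks for a value of 0, weak recovery for a value that vanishes with n, and correlated recovery for a value bounded below 0.5.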

slide-33
SLIDE 33

Areas of Study: Fundamental Limits

Question: which community detection problems are solvable?

The answer depends in part on the recovery metric.
The question is answered via "regions" of SBM model parameters.
Roughly, if the connectivity within and across communities is different enough, the communities can be distinguished.
This is related to the notion of phase transition.

slide-36
SLIDE 36

Phase Transition

With low edge probability p, the graph is a disconnected collection of small graphs.
If p ∼ 1/n, a giant component emerges.
If p ∼ log(n)/n, the graph is connected with high probability.
This is a manifestation of concentration of measure.
Recovery also experiences a concentration effect.
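
The three regimes can be observed empirically with a small simulation. The sketch below samples an Erdős-Rényi graph and reports the fraction of nodes in its largest connected component, using a union-find structure; the constants 0.5, 2 and the graph size are arbitrary choices for illustration, and a single finite sample only hints at the asymptotic statements.

```python
import math
import random
from itertools import combinations

def largest_component_fraction(n, p, seed=None):
    """Sample G(n, p) and return (size of largest component) / n."""
    rng = random.Random(seed)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            parent[find(u)] = find(v)  # merge the two components

    sizes = {}
    for x in range(n):
        root = find(x)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n

n = 400
for name, p in [("0.5/n", 0.5 / n), ("2/n", 2 / n), ("2 log(n)/n", 2 * math.log(n) / n)]:
    print(name, round(largest_component_fraction(n, p, seed=0), 3))
```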

slide-41
SLIDE 41

Fundamental Recovery Limits

$p = \frac{a}{n}$, $q = \frac{b}{n}$, $d$: average degree of a node.

Exact Recovery [Abbe-Bandeira-Hall '14]: $d = \Omega(\log n)$ and $(\sqrt{a} - \sqrt{b})^2 > 2 \log n$
Weak Recovery [Mossel-Neeman-Sly '14]: $d = \Omega(1)$ and $\frac{(a-b)^2}{a+b} \to \infty$
Correlated Recovery [Mossel-Neeman-Sly '13]: $d = \Theta(1)$ and $\frac{(a-b)^2}{a+b} > 2$
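
For concreteness, two of these conditions can be evaluated numerically, as in the sketch below. The thresholds are asymptotic statements, so plugging in a finite n is only illustrative; the function names are assumptions, and p = a/n, q = b/n as on the slide.

```python
import math

def correlated_recovery_possible(a, b):
    """Check (a - b)^2 / (a + b) > 2 for p = a/n, q = b/n with constant a, b."""
    return (a - b) ** 2 / (a + b) > 2

def exact_recovery_possible(a, b, n):
    """Check (sqrt(a) - sqrt(b))^2 > 2 log(n) for p = a/n, q = b/n."""
    return (math.sqrt(a) - math.sqrt(b)) ** 2 > 2 * math.log(n)

n = 1000
print(correlated_recovery_possible(5, 1))                              # sparse regime
print(exact_recovery_possible(9 * math.log(n), 1 * math.log(n), n))    # logarithmic degree
```

The weak recovery condition is a limit statement about a sequence of parameters, so it has no single-point check of this kind.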

slide-44
SLIDE 44

Areas of Study: Efficient Algorithms

Local algorithm: belief propagation
Computation gap: it has been shown that the belief propagation threshold is different from the information-theoretic limits
Belief propagation results are shown via a tree approximation of a sub-graph.

slide-45
SLIDE 45

Side Information in Community Detection

Graphical models are powerful but do not capture everything.
Non-graph information always accompanies graph information:

Social networks: date of birth, nationality, school, ...
Citation networks: author names, keywords, ...

Non-graph information can be game changing:

Improve error performance
Enable operation on smaller graphs
Help efficient algorithms, e.g., belief propagation

slide-53
SLIDE 53

Objective

When and by how much does side information help in recovering the communities?

Mathematical analysis for a broad model class
Asymptotic in the size of the graph
Optimal (maximum likelihood) recovery/error results
Also, analysis of some local algorithms

slide-54
SLIDE 54

Binary Symmetric SBM with Side Information

Each node includes a side information observation from a finite-cardinality alphabet.
Side information is conditionally independent of the graph given the labels: P(G, S|X) = P(G|X)P(S|X)

[Figure: graph with side information attached to each node]

slide-55
SLIDE 55

Noisy-Label Side Information

[Figure: binary symmetric channel from the true label X ∈ {1, −1} to the observed label Y ∈ {1, −1}; a label is flipped with probability α and kept with probability 1 − α]

Let $p = \frac{a \log n}{n}$, $q = \frac{b \log n}{n}$, $M = \log\frac{a}{b}$ and $c = \log\frac{1-\alpha}{\alpha}$.

Node $i$ has $A_i$, $B_i$ edges to communities 1 and 2. For intuition, consider the scalar detection rule $M(A_i - B_i) + c\,y_i > 0$.

Need $c \to \infty$, i.e., $\alpha \to 0$. Question: how fast?

[Figure labels: network information, log(n); side information, how fast?; H(X) bits]
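
The scalar detection rule above can be written out directly. In this sketch the graph evidence is weighted by M = log(a/b) and the noisy label by c = log((1 − α)/α); mapping the decision to +1 for community 1 and −1 for community 2, and resolving ties toward −1, are assumptions made for the illustration.

```python
import math

def scalar_decision(A_i, B_i, y_i, a, b, alpha):
    """Return +1 (community 1) or -1 (community 2) for one node.

    A_i, B_i: number of edges from the node to communities 1 and 2.
    y_i: noisy label in {+1, -1}; alpha: crossover probability, 0 < alpha < 0.5.
    """
    M = math.log(a / b)                # weight of the graph evidence
    c = math.log((1 - alpha) / alpha)  # weight of the side information
    return 1 if M * (A_i - B_i) + c * y_i > 0 else -1

# A one-edge majority toward community 1 is overruled by a fairly reliable
# noisy label pointing to community 2; a three-edge majority is not.
print(scalar_decision(3, 2, -1, a=4, b=1, alpha=0.1))
print(scalar_decision(5, 2, -1, a=4, b=1, alpha=0.1))
```

The example shows the trade-off the slide asks about: the side information matters only when c is large relative to the typical edge-count difference, which is of order log(n).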

slide-56
SLIDE 56

Exact Recovery with Noisy-Label Side Information

Theorem: With noisy-label side information, exact recovery is possible if and only if:

$(\sqrt{a} - \sqrt{b})^2 > 2$, when $c = o(\log n)$
$\eta(a, b, \beta) > 2$, when $c = \beta \log n + o(\log n)$, $0 < \beta \le \frac{M(a-b)}{2}$
$\beta > 1$, when $c = \beta \log n + o(\log n)$, $\beta \ge \frac{M(a-b)}{2}$

where $\eta(a, b, \beta) = a + b + \beta - \frac{2\gamma}{M} + \frac{\beta}{M} \log\frac{\gamma+\beta}{\gamma-\beta}$ and $\gamma = \sqrt{\beta^2 + abM^2}$.

Even for $a, b$ with $(\sqrt{a} - \sqrt{b})^2 > 2$, side information is beneficial as it breaks symmetry.
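
A numerical sketch of the theorem's condition follows. It assumes the reading η(a, b, β) = a + b + β − 2γ/M + (β/M) log((γ+β)/(γ−β)) with γ = sqrt(β² + abM²) and M = log(a/b); under this reading η tends to (√a − √b)² as β → 0 and equals 2β at β = M(a−b)/2, so the three cases join continuously. Function names are illustrative.

```python
import math

def eta(a, b, beta):
    """eta(a, b, beta) with M = log(a/b); assumes a > b > 0 and beta > 0."""
    M = math.log(a / b)
    gamma = math.sqrt(beta ** 2 + a * b * M ** 2)
    return (a + b + beta - 2 * gamma / M
            + (beta / M) * math.log((gamma + beta) / (gamma - beta)))

def exact_recovery_with_noisy_labels(a, b, beta):
    """Evaluate the condition for c = beta*log(n) + o(log(n)), beta > 0."""
    M = math.log(a / b)
    if beta <= M * (a - b) / 2:
        return eta(a, b, beta) > 2
    return beta > 1

# With a = 4, b = 1 the graph alone is not enough: (sqrt(4) - sqrt(1))^2 = 1 < 2.
print(exact_recovery_with_noisy_labels(4, 1, 0.1))
print(exact_recovery_with_noisy_labels(4, 1, 1.5))
```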

slide-57
SLIDE 57

Exact Recovery with Noisy-Label Side Information

[Figure: error exponent $1 - 0.5\,\eta(a, b, \beta)$ vs. $1 - \beta$, plotted against the quality of side information (β); annotations: less discriminatory graph, recovery region, critical β]

slide-58
SLIDE 58

Achievability: Two-Stage Algorithm

Stage One:

Generate an Erdős-Rényi random graph $H_1$ with n nodes.
Partition G: $G_1 = G \cap H_1$ and $G_2 = G \cap H_1^c$.
Idea: $G_1$ is sparse for weak recovery, $G_2$ keeps most of the information of G.
Perform partial recovery [L. Massoulie '13] on $G_1$. This will provably produce weak recovery on $G_1$.
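
The partition step can be sketched as follows. Since the edges of the Erdős-Rényi graph H1 are independent, and assuming H1 is drawn independently of G, only the pairs that are edges of G need a coin flip. The parameter name `keep_prob` (the edge probability of H1) is a hypothetical choice; the weak recovery step itself is not shown.

```python
import random

def split_graph(edges, keep_prob, seed=None):
    """Split edge set G into G1 = G ∩ H1 and G2 = G ∩ H1^c.

    H1 is an Erdős-Rényi graph with edge probability keep_prob, assumed
    independent of G, so each edge of G lands in G1 with probability keep_prob.
    """
    rng = random.Random(seed)
    g1, g2 = set(), set()
    for edge in sorted(edges):  # sorted for reproducibility with a fixed seed
        (g1 if rng.random() < keep_prob else g2).add(edge)
    return g1, g2

G = {(0, 1), (1, 2), (2, 3), (0, 3), (1, 3)}
G1, G2 = split_graph(G, keep_prob=0.3, seed=5)
print(sorted(G1), sorted(G2))
```

A small `keep_prob` keeps G1 sparse while G2 retains most edges, which matches the stated idea of the stage.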

slide-59
SLIDE 59

Achievability: Stage One

[Figure: the original graph G is split into $G_1 \sim (\frac{Da}{n}, \frac{Db}{n})$, which is sparse and used for weak recovery, and $G_2 \approx G$, which is used for local modifications]

slide-60
SLIDE 60

Achievability: Two-Stage Algorithm

Stage Two:

Idea: use side information for a local (node-wise) scalar update of the weak recovery results.
For each node in each community: adjust the label estimate according to the difference between the numbers of $G_2$ edges to that node from the two communities, plus the LLR of the side information.
If this results in an unequal number of labels, abort.
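
A hedged sketch of the stage-two update for noisy-label side information is given below. It assumes labels in {+1, −1}, weights the edge-count difference by M = log(a/b) and the side information by c = log((1 − α)/α), and computes every node's update from the stage-one estimates (not sequentially); these details and all names are assumptions for illustration.

```python
import math

def local_update(n, g2_edges, weak_labels, side_labels, a, b, alpha):
    """Node-wise refinement of weak-recovery labels; returns None on abort.

    weak_labels, side_labels: lists of +1/-1; alpha: label-noise probability.
    """
    M = math.log(a / b)
    c = math.log((1 - alpha) / alpha)
    # balance[i] = (# G2 neighbours estimated +1) - (# G2 neighbours estimated -1)
    balance = [0] * n
    for u, v in g2_edges:
        balance[u] += weak_labels[v]
        balance[v] += weak_labels[u]
    refined = [1 if M * balance[i] + c * side_labels[i] > 0 else -1 for i in range(n)]
    # Abort if the two communities end up with unequal sizes.
    return refined if sum(refined) == 0 else None

# Node 3 is wrong after stage one; its G2 neighbour and its side information fix it.
print(local_update(4, {(0, 1), (2, 3)}, [1, 1, -1, 1], [1, 1, -1, -1], a=4, b=1, alpha=0.1))
```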

slide-61
SLIDE 61

Achievability: Two-Stage Algorithm

Error event for local modification

[Figure: sets A, B, A', B']

slide-62
SLIDE 62

Motivation & Background Graph Models Side Information Belief Propagation Conclusion

Converse

ML rule with side information: max [ M(# edges not crossing communities) + c x · y ].

Bound the error probability of ML by the error probability of scalar (node-wise) estimation.

Lemma: Denote by A and B the two communities and define

  • F ≜ {Maximum Likelihood fails}
  • FA ≜ {∃ i ∈ A : E[i, B] ≥ E[i, A] + c yi/M + 1}
  • FB ≜ {∃ j ∈ B : E[j, A] ≥ E[j, B] − c yj/M + 1}

Then FA ∩ FB ⇒ F.

slide-63
SLIDE 63

Converse

Theorem: ML fails in recovering the communities with probability bounded away from zero if:

  • (√a − √b)² < 2, when c = o(log(n))
  • η(a, b, β) < 2, when c = β log(n), 0 < β ≤ M(a − b)/2
  • β < 1, when c = β log(n), β ≥ M(a − b)/2

slide-64
SLIDE 64

Partially Revealed Labels

[Figure: binary erasure channel from label X to side information Y, with outputs 1, −1 and erasure e.]

Theorem: With binary erasure side information, exact recovery is possible if and only if:

  • (√a − √b)² > 2, when log(ε) = o(log(n))
  • (√a − √b)² + 2β > 2, when log(ε) = −β log(n) + o(log(n)), β > 0

slide-65
SLIDE 65

Partially Revealed Labels

[Figure: error exponent 1 − (0.5(√a − √b)² + β) plotted against the quality of side information (β). Annotations: recovery region, critical β. Colors from right to left indicate a more informative graph.]
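The erasure threshold and the plotted error exponent are simple enough to evaluate directly. The sketch below only restates the two formulas from these slides. The function names are hypothetical, and β = 0 is used to stand for the regime log(ε) = o(log(n)).

```python
import math


def erasure_exponent(a, b, beta):
    """Error exponent 1 - (0.5 * (sqrt(a) - sqrt(b))**2 + beta) from the plot."""
    return 1.0 - (0.5 * (math.sqrt(a) - math.sqrt(b)) ** 2 + beta)


def exact_recovery_possible(a, b, beta=0.0):
    """Threshold from the theorem: (sqrt(a) - sqrt(b))**2 + 2*beta > 2."""
    return (math.sqrt(a) - math.sqrt(b)) ** 2 + 2.0 * beta > 2.0


def critical_beta(a, b):
    """Smallest beta at which the exponent reaches zero (0 if the graph suffices)."""
    return max(0.0, 1.0 - 0.5 * (math.sqrt(a) - math.sqrt(b)) ** 2)
```

With a = 4 and b = 1, the graph alone gives (√a − √b)² = 1, so exact recovery needs β above the critical value 0.5. With a = 9 and b = 1 the graph alone already satisfies the threshold.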

slide-66
SLIDE 66

Partially Revealed Labels

Achievability: use the same weak recovery as for the BSC, then the following local modification:

  • If weak recovery contradicts the side information, change it.
  • If the side information is an erasure, count edges.
  • If this results in an unequal number of labels, abort.

Converse:

  • FA ≜ {∃ i ∈ A : E[i, B] ≥ E[i, A] + 1 and yi = 0}
  • FB ≜ {∃ j ∈ B : E[j, A] ≥ E[j, B] + 1 and yj = 0}
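The local modification for partially revealed labels can be sketched as below. This is an illustration under stated assumptions: side information is encoded as +1, −1 or 0, with 0 marking an erasure (matching the condition yi = 0 in the converse events), "count edges" is read as a majority vote over the weak labels of G2 neighbours, and ties keep the weak estimate. The function name and the tie rule are hypothetical.

```python
def erasure_local_update(n, g2_edges, weak_labels, side_info):
    """Local modification with erasure side information (0 = erased).

    Returns the updated labels, or None when the communities end up with
    unequal sizes (the "abort" case).
    """
    # (# neighbours labelled +1) - (# neighbours labelled -1) for each node.
    signed = [0] * n
    for u, v in g2_edges:
        signed[u] += weak_labels[v]
        signed[v] += weak_labels[u]

    updated = []
    for i in range(n):
        if side_info[i] != 0:
            updated.append(side_info[i])    # revealed label overrides weak recovery
        elif signed[i] > 0:
            updated.append(1)               # erased: majority of neighbours is +1
        elif signed[i] < 0:
            updated.append(-1)              # erased: majority of neighbours is -1
        else:
            updated.append(weak_labels[i])  # assumed tie rule: keep weak estimate

    if sum(updated) != 0:
        return None  # unequal number of labels: abort
    return updated
```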

slide-67
SLIDE 67

General Side Information

Motivation: general finite-cardinality alphabet for side information.

  • For each node, K side information observations (features).
  • Features are mutually independent conditioned on labels.

For a given node, define:

  • f1(n) ≜ LLR of all side information
  • f2(n) ≜ log-likelihood of all side information given x = +1
  • f3(n) ≜ log-likelihood of all side information given x = −1

The asymptotic behavior of f1, f2, f3 determines the role of side information for exact recovery.
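Since the features are conditionally independent, the three quantities are sums over the K features. The sketch below computes them for one node. The data layout (one dictionary of outcome probabilities per feature) and the sign convention f1 = f2 − f3 are assumptions made for illustration.

```python
import math


def side_info_statistics(observations, pmf_plus, pmf_minus):
    """Return (f1, f2, f3) for one node.

    observations : list of the K observed feature outcomes
    pmf_plus[k][y]  : P(feature k = y | x = +1)   (assumed layout)
    pmf_minus[k][y] : P(feature k = y | x = -1)   (assumed layout)
    Outcomes with zero probability make math.log raise ValueError.
    """
    f2 = sum(math.log(pmf_plus[k][y]) for k, y in enumerate(observations))
    f3 = sum(math.log(pmf_minus[k][y]) for k, y in enumerate(observations))
    f1 = f2 - f3  # LLR, with the assumed convention log P(y | +1) / P(y | -1)
    return f1, f2, f3
```

For a single noisy-label feature with flip probability 0.1 and observation +1, this gives f2 = log(0.9), f3 = log(0.1) and f1 = log(9).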

slide-68
SLIDE 68

Fixed Number of Features, Variable Quality

K is fixed; the side information LLR varies with the size of the graph n.

How informative is the side information outcome? (LLR)

  • Informative: f1 = O(log(n)) ≈ graph information
  • Non-informative: f1 = o(log(n))

How probable is the side information outcome? (likelihoods)

  • Rare: f2 or f3 = O(log(n))
  • Not rare: f2 or f3 = o(log(n))

slide-69
SLIDE 69

Intuition for K = 1

Combinations:

  • Worst: non-informative + not rare for both communities. Side information will not help.
  • Others (need conditions for them to help, next theorem):
    1. Informative + not rare for only one community.
    2. Informative + rare for both communities.
    3. Non-informative + rare for both communities.

These combinations are tested for all sequences of outcomes of side information as the size of the graph grows.

slide-70
SLIDE 70

Main Result

Define: ξi = lim_{n→∞} fi / log(n)

Theorem: Exact recovery is possible if and only if, for any sequence of side information such that:

  • ξ1 = ξ2 = ξ3 = 0, then (√a − √b)² > 2 must hold.
  • ξ1 = 0 and ξ2 = ξ3 = −β < 0, then (√a − √b)² + 2β > 2 must hold.
  • ξ1 = β < M(a − b)/2 and ξ2 = 0, then η(a, b, β) > 2 must hold.
  • ξ1 = β < M(a − b)/2 and ξ2 = β′, then η(a, b, β) + 2β′ > 2 must hold.
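As a sanity check, the erasure case can be read off from this theorem. The sketch below assumes an erasure probability ε = n^(−β) that is the same for both communities, and evaluates the ratios fi / log(n) for an erased outcome. The function name is hypothetical and the example is illustrative only.

```python
import math


def erased_outcome_ratios(n, beta):
    """Ratios (f1, f2, f3) / log(n) for an erased outcome when ε = n**(-beta)."""
    if n <= 1:
        raise ValueError("n must exceed 1 so that log(n) is positive")
    log_eps = -beta * math.log(n)
    f2 = log_eps  # log P(erasure | x = +1)
    f3 = log_eps  # log P(erasure | x = -1)
    f1 = f2 - f3  # an erasure carries no information about the label
    return tuple(f / math.log(n) for f in (f1, f2, f3))
```

The ratios equal (0, −β, −β) for every n, which is the case ξ1 = 0 and ξ2 = ξ3 = −β < 0, so the condition (√a − √b)² + 2β > 2 agrees with the theorem for binary erasure side information.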

slide-71
SLIDE 71

Varying Number of Fixed-Quality Features

K varies with n while the likelihoods of side information are fixed.

Theorem: Assume that all features are i.i.d. conditioned on the labels. Then, if K = o(log(n)), (√a − √b)² > 2 is necessary and sufficient for exact recovery.

This suggests that:

  1. Even if the features are not i.i.d., the theorem should hold.
  2. If K = O(log(n)), side information will help under certain conditions (Asadi et al.)

slide-72
SLIDE 72

Belief Propagation in Single-Community Detection

  • For the graph alone, the following SNR parameter is critical: λ = K²(p − q)² / ((n − K)q)
  • Hajek et al. demonstrated weak recovery if λ > 1/e ...
  • ... and exact recovery is possible with a local voting procedure.
  • We found that side information acts as an “SNR amplifier”.
  • With side information, weak recovery is possible if λ > 1/(Λe).
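The two thresholds can be compared numerically. In the sketch below, K is the community size, p and q are the in-community and background edge probabilities, and `amplification` stands in for Λ, whose definition is not given on this slide and is therefore treated as an input. The function names are hypothetical, and the threshold λ > 1/(Λe) is read from the slide as reconstructed from the extracted text.

```python
import math


def bp_snr(n, K, p, q):
    """SNR parameter λ = K² (p − q)² / ((n − K) q) for the graph alone."""
    return K ** 2 * (p - q) ** 2 / ((n - K) * q)


def weak_recovery_by_bp(lam, amplification=1.0):
    """Check λ > 1 / (Λ e); amplification = 1.0 means no side information."""
    return lam > 1.0 / (amplification * math.e)
```

For instance, λ = 0.2 is below 1/e ≈ 0.368, so the graph-only condition fails, but an amplification of 2 lowers the threshold to about 0.184 and the condition holds.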

slide-77
SLIDE 77

Belief Propagation Case Study

[Figure: plot over b (determines cross-community edges) and c (determines community size), showing three curves: the BP weak recovery limit without side information, the BP weak recovery limit with side information parameter 0.3, and the information-theoretic limit for exact recovery.]

Numbered regions:

  1. All is well without side information.
  2. BP exact recovery with side information, not without.
  3. BP weak recovery, not strong.
  4. BP weak recovery with side information, not without.
  5. Exact recovery, but BP fails.
  6. Weak recovery, but BP fails anyway.

slide-78
SLIDE 78

Conclusion

For the binary symmetric stochastic block model with average degree ≈ log(n):

  • If side information quality improves with o(log(n)), the exact recovery threshold remains unchanged.
  • If side information quality improves with O(log(n)), the exact recovery threshold is changed.
  • The weakest condition (outcome) dominates.

Belief propagation was studied in the single-community context:

  • A case study showed the importance of side information.

slide-84
SLIDE 84

Acknowledgments

  • My PhD students Hussein Saad and Mohammad Esmaeili
  • Support from UT-Dallas endowment

slide-85
SLIDE 85

THANK YOU!
