Optimization of Network Topology Elias Boutros Khalil, Bistra - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Scalable Diffusion-Aware Optimization of Network Topology

Elias Boutros Khalil, Bistra Dilkina, Le Song
Georgia Institute of Technology

slide-2
SLIDE 2

Problem

  • Given
      • G(V,E)
      • a set of source nodes X (infected nodes)
      • the Linear Threshold Model
  • Find a set of k edges to
      • remove, such that the spread of a certain substance is minimized
      • add, such that the spread of a certain substance is maximized

slide-3
SLIDE 3

Review: Diffusion Models

  • Linear Threshold Model
      • Each edge (u,v) has a weight Wuv
      • Each node v chooses a threshold uniformly at random in [0,1]
      • Node v will be infected if the total weight Wuv of its infected in-neighbors u reaches its threshold
  • Independent Cascade Model
      • Each edge (u,v) has a propagation probability Puv
      • Each infected node u has only one chance to infect its neighbor v, with probability Puv
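The two diffusion models reviewed on this slide can be sketched as small simulations. This is a minimal illustration, not code from the talk: the function names `simulate_lt` and `simulate_ic`, the edge-dictionary representation, and the example weights are all assumptions made for the sketch.

```python
import random

def simulate_lt(weights, sources, rng):
    """Linear Threshold: weights[(u, v)] = Wuv. Returns the final infected set."""
    nodes = sorted({n for edge in weights for n in edge} | set(sources))
    # Each node draws its threshold uniformly at random in [0, 1).
    threshold = {v: rng.random() for v in nodes}
    infected = set(sources)
    changed = True
    while changed:
        changed = False
        for v in nodes:
            if v in infected:
                continue
            # Total weight arriving from in-neighbors that are already infected.
            incoming = sum(w for (u, t), w in weights.items()
                           if t == v and u in infected)
            if incoming > 0 and incoming >= threshold[v]:
                infected.add(v)
                changed = True
    return infected

def simulate_ic(probs, sources, rng):
    """Independent Cascade: probs[(u, v)] = Puv. Returns the final infected set."""
    infected = set(sources)
    frontier = list(infected)
    while frontier:
        newly_infected = []
        for u in frontier:
            # u gets exactly one attempt on each still-healthy out-neighbor v.
            for (a, v), p in probs.items():
                if a == u and v not in infected and rng.random() < p:
                    infected.add(v)
                    newly_infected.append(v)
        frontier = newly_infected
    return infected

if __name__ == "__main__":
    edges = {(0, 1): 0.6, (1, 2): 0.6, (0, 2): 0.3}  # hypothetical example graph
    print(simulate_lt(edges, [0], random.Random(1)))
    print(simulate_ic(edges, [0], random.Random(1)))
```

Averaging the size of the returned set over many runs gives a Monte Carlo estimate of the expected spread from the source set.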

slide-4
SLIDE 4

Review: Influence Maximization

  • Given
      • G(V,E)
      • LT model or IC model
  • Find k nodes to activate so as to maximize the spread of a certain substance
  • Greedy algorithm
      • Objective function is submodular
      • (1-1/e)-approximation
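The greedy algorithm mentioned here can be sketched as follows, assuming the IC model and a Monte Carlo estimate of the spread. The names `ic_spread`, `expected_spread` and `greedy_seeds`, and the default of 200 runs, are choices made for this sketch rather than anything stated in the slides.

```python
import random

def ic_spread(probs, seeds, rng):
    """One Independent Cascade run; returns the number of infected nodes."""
    infected = set(seeds)
    frontier = list(infected)
    while frontier:
        nxt = []
        for u in frontier:
            for (a, v), p in probs.items():
                if a == u and v not in infected and rng.random() < p:
                    infected.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(infected)

def expected_spread(probs, seeds, runs=200, seed=0):
    """Monte Carlo estimate of the expected number of infected nodes."""
    rng = random.Random(seed)
    return sum(ic_spread(probs, seeds, rng) for _ in range(runs)) / runs

def greedy_seeds(probs, nodes, k, runs=200):
    """Repeatedly add the node with the largest estimated spread (k <= len(nodes))."""
    chosen = []
    for _ in range(k):
        candidates = [n for n in nodes if n not in chosen]
        best = max(candidates,
                   key=lambda n: expected_spread(probs, chosen + [n], runs))
        chosen.append(best)
    return chosen

if __name__ == "__main__":
    star = {(0, 1): 1.0, (0, 2): 1.0, (0, 3): 1.0}  # hypothetical example graph
    print(greedy_seeds(star, [0, 1, 2, 3], 1))
```

The (1-1/e) guarantee relies on the objective being monotone and submodular; with sampled estimates, as here, the guarantee holds only up to the estimation error.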

slide-5
SLIDE 5

Edge Deletion Problem

  • Given G, source set A,
  • Find k edges
  • Supermodular
  • Greedy algorithm provides (1-1/e)-approximation
  • Scaling up tricks

slide-6
SLIDE 6

Edge Addition Problem

  • Given G, source set A,
  • Find k edges
  • Still supermodular (equivalent to constrained submodular minimization)
  • Algorithm: maximize the lower bound

slide-7
SLIDE 7

Edge Addition Problem

  • Marginal gain is bounded
  • Apply an approach for constrained submodular minimization with approximation guarantees
      • R. Iyer, S. Jegelka, and J. Bilmes. Fast semidifferential-based submodular function optimization. In ICML, 2013.

slide-8
SLIDE 8

Experiments

  • Datasets
      • Synthetic datasets: generated by the Kronecker graph model
          • (1) CorePeriphery, (2) ErdosRenyi and (3) Hierarchical
      • Real datasets:

slide-9
SLIDE 9

Experiments

  • Competing heuristics
      • Random
      • Weights: k edges with the highest weights
      • Betweenness
      • Eigen: k edges that maximize the drop in the leading eigenvalue (eigendrop)
      • Degree: k edges whose destination nodes have the highest out-degrees [8]

slide-10
SLIDE 10

Experiments

[Figure: experimental results; panels: Edge deletion, Edge addition]

slide-11
SLIDE 11

Core Decomposition of Uncertain Graphs

Francesco Bonchi, Francesco Gullo, Andreas Kaltenbrunner, Yana Volkovich
Yahoo Labs, Spain

slide-12
SLIDE 12

Core decomposition

  • k-core of a graph
      • a maximal subgraph in which every vertex is connected to at least k other vertices within that subgraph
  • Core decomposition
      • The set of all k-cores of a graph G forms the core decomposition of G
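The definition above leads to the standard peeling procedure: repeatedly remove a vertex of minimum remaining degree, and record the largest minimum degree seen so far as its core index. The sketch below is a simple quadratic-time version written for readability; the function name `core_numbers` and the adjacency-dictionary input are assumptions of this sketch, and a linear-time variant would keep vertices in degree buckets instead of calling `min` each round.

```python
def core_numbers(adj):
    """adj: dict mapping each node to the set of its neighbours (undirected).

    Returns a dict mapping each node to its core index, i.e. the largest k
    such that the node belongs to the k-core.
    """
    degree = {v: len(neighbours) for v, neighbours in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel off a vertex with the smallest remaining degree.
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core

if __name__ == "__main__":
    # Hypothetical example: a triangle 0-1-2 with a pendant vertex 3 attached to 0.
    graph = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
    print(core_numbers(graph))
```

The k-core is then the subgraph induced by all vertices whose core index is at least k.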

slide-13
SLIDE 13

K-core under uncertain graphs

  • A maximal subgraph whose vertices have at least k neighbours in that subgraph with probability no less than η

slide-14
SLIDE 14

Example


slide-15
SLIDE 15

Motivation

  • Core decomposition can be computed efficiently in deterministic graphs
      • computed in linear time
  • However, efficiency is not guaranteed in uncertain graphs
      • even the simplest graph operations may become computationally intensive
  • Uncertain graph
      • edges are assigned a probability of existence
      • e.g., protein-interaction networks, the influence of one person on another

slide-16
SLIDE 16

Applications

  • Influence maximization
      • Idea: reduce the input graph G by keeping only the innermost η-shells
      • the higher the core index, the more likely the vertex is an influential spreader [17]
  • Task-driven team formation
      • Nodes: individuals; edges: a probabilistic topic model
      • Given a pair <T,Q>, where T is a set of terms and Q is a set of nodes
      • Goal: find a set of nodes A with Q⊆A that is a good team to perform the task in T
      • Solution: find a connected component of the (k,η)-core which contains A

slide-17
SLIDE 17

Algorithm framework

  • Follows the deterministic case
  • Uses, for each vertex v, the maximum degree such that the probability for v to have that degree is no less than η
  • Non-trivial to compute
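The quantity described above (the maximum degree that a vertex reaches with probability at least η) can be computed with a small dynamic program over the incident edges. The sketch rests on two assumptions that the slide does not state: edges exist independently of each other, and "to have that degree" is read as "to have at least that degree", in line with the "at least k neighbours" wording of the (k,η)-core definition. The names `degree_distribution` and `eta_degree` are chosen for this sketch.

```python
def degree_distribution(edge_probs):
    """Distribution of a vertex's degree given its incident edge probabilities.

    Returns a list dist where dist[d] is the probability that exactly d of the
    incident edges exist (edges assumed independent).
    """
    dist = [1.0]
    for p in edge_probs:
        nxt = [0.0] * (len(dist) + 1)
        for d, q in enumerate(dist):
            nxt[d] += q * (1.0 - p)   # this edge is absent
            nxt[d + 1] += q * p       # this edge is present
        dist = nxt
    return dist

def eta_degree(edge_probs, eta):
    """Largest k such that P[degree >= k] >= eta (0 if no such k is found)."""
    dist = degree_distribution(edge_probs)
    tail = 0.0
    for k in range(len(dist) - 1, -1, -1):
        tail += dist[k]               # tail is now P[degree >= k]
        if tail >= eta:
            return k
    return 0

if __name__ == "__main__":
    print(eta_degree([0.5, 0.5], 0.5))
```

The dynamic program takes time quadratic in the number of incident edges of the vertex, which is one reason the uncertain case is costlier than the deterministic one.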

slide-18
SLIDE 18

Experiments

[Figure: experimental results; panels: Influence Maximization, Task-driven Team-formation]

slide-19
SLIDE 19

Fast Influence-based Coarsening for Large Networks

KDD, New York City, August 26, 2014

Manish Purohit^, B. Aditya Prakash*, Chanhyun Kang^, Yao Zhang*, V S Subrahmanian^
*Virginia Tech
^University of Maryland

slide-20
SLIDE 20

Networks are getting huge!

  • Flickr (friendship network): 87 million users and 8 billion photos as of 2013
  • Amazon (friendship network): 237 million accounts as of 2013
  • Twitter (follower network): 271 million monthly active users
  • Facebook (friendship network): 829 million daily active users on average in June 2014

Purohit, Prakash, Kang, Zhang, Subrahmanian 2014

slide-21
SLIDE 21

Need for fast analysis

  • Ever-growing list of applications of network effects
      • Viral Marketing
      • Immunization
      • Information Diffusion

However, scaling traditional algorithms up to millions of nodes is hard

slide-22
SLIDE 22

How to handle large-scale networks

  • Approaches
      • Use faster / simpler algorithms
      • Perform analysis locally
          • i.e., divide the large network into smaller subgraphs
      • Zoom out of the network to obtain a smaller representation of the network (this paper)

slide-23
SLIDE 23

Bird’s eye view of a network


slide-24
SLIDE 24

Bird’s eye view of a network

  • “Zoom out” of the graph to get a quick picture
  • Called “coarsen” in this paper

[Figure: a big graph with nodes A-F is zoomed out to a small representation of the network]

slide-25
SLIDE 25

Outline

  • Motivation
  • Challenges
  • Problem Definition
  • Our Proposed Method
  • Experiments
  • Applications
  • Conclusion


slide-26
SLIDE 26

Challenges

  • C1: How do we maintain diffusive characteristics when coarsening networks?
  • C2: How do we merge nodes to get the coarse network?
  • C3: How do we find the best nodes to merge fast?

slide-27
SLIDE 27

C1: Information Diffusion

  • Cascading behavior in networks
  • A diffusion is a graph induced by a time-ordered propagation of information (edges)

[Figure: a blog network (blogs B1-B4, posts, links) and the resulting information cascade. Source: [McGlohon et al., SDM2007]]

slide-28
SLIDE 28

C1: Model information diffusion

  • Information spreads over networks
      • e.g., a rumor/meme spreads over the Twitter following network
  • Independent cascade model (IC) [Kempe+, KDD03]
      • Weights pij: propagation probability from i to j
      • Each node has only one chance to infect its neighbors

[Figure: meme spreading]

slide-29
SLIDE 29

C1: Diffusive characteristics

  • First eigenvalue λ1 (of adjacency matrix) is enough for most diffusion models. (Prakash et al. [ICDM’12])
  • λ1 is the epidemic threshold
  • Increasing λ1, increasing vulnerability

[Figure: networks labeled “Safe”, “Vulnerable” and “Deadly”]
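For readers who want to compute λ1 on a small example, a plain power iteration is enough. The sketch below is an illustration under assumptions, not the method used in the talk: it expects a non-negative adjacency matrix given as a list of rows, and it iterates on A + I rather than A so that bipartite graphs (where the plain iteration oscillates) still converge. The name `leading_eigenvalue` is made up for this sketch.

```python
def leading_eigenvalue(adj, iterations=500):
    """Estimate the first eigenvalue of a non-negative square matrix.

    adj is a non-empty list of rows. The iteration multiplies by (A + I) and
    subtracts the shift of 1 at the end.
    """
    n = len(adj)
    x = [1.0] * n
    lam = 1.0
    for _ in range(iterations):
        # y = (A + I) x
        y = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        lam = max(y)                  # max-norm of y, since all entries are positive
        x = [c / lam for c in y]      # rescale so the largest entry is 1
    return lam - 1.0

if __name__ == "__main__":
    triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
    star = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
    print(leading_eigenvalue(triangle), leading_eigenvalue(star))
```

A denser graph on the same number of nodes yields a larger λ1, which matches the "increasing λ1, increasing vulnerability" reading of the slide.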

slide-30
SLIDE 30

C1: maintain diffusive characteristics

  • Goal: maintain the diffusive characteristics of the original network in the coarsened network
  • Make the coarsened network have the least change in the first eigenvalue

[Figure: original network (nodes A-F) coarsened into the coarsened network]

slide-31
SLIDE 31

C2: How to merge nodes

  • Goal: Merge nodes of graph G to get the coarsened graph that “approximates” G with respect to diffusion
  • Merging b and a gives the least change of λ1
  • Is this correct? 0.375!
      • Influence from d to b: 0.5
      • Influence from d to a: 0.25
      • Average: 0.375

[Figure: original network]

slide-32
SLIDE 32

C2: How to merge nodes (details)

  • In general: merging a,b

slide-33
SLIDE 33

C3: which nodes to merge

  • Goal:
      • Find the best nodes to merge
      • Fast, scalable to large networks
  • Discussed later in the talk

[Figure: original network (nodes A-F) coarsened into the coarsened network]

slide-34
SLIDE 34

Outline

  • Motivation
  • Challenges
  • Problem Definition
  • Our Proposed Method
  • Experiments
  • Applications
  • Conclusion


slide-35
SLIDE 35

Problem Definition

Graph Coarsening Problem (GCP)

  • Given: a large graph G(V, E) and a reduction factor α
  • Find: the best set of edges to merge
  • Such that: |λG - λH| is minimized (i.e., H is the coarsened graph with the least change in the first eigenvalue)

slide-36
SLIDE 36

Naive Greedy Heuristic

Steps:

  • Score every edge by the change in eigenvalue
  • Greedily choose the edge (a,b) with the least score, and merge (a,b)
  • Re-evaluate the scores of every edge and repeat

  • Too slow! O(m²) time to score all edges
  • Loses the time benefits of analyzing the smaller graph
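The naive heuristic can be sketched in a few lines, which also makes its cost visible: every candidate edge triggers a fresh eigenvalue computation. Everything below is an illustrative assumption rather than the authors' code. In particular, the `merge` rule (contract b into a, average parallel edges, drop the a-b edge) is a simplified stand-in for the merge rule of the paper, and the eigenvalue is estimated with a shifted power iteration.

```python
def leading_eigenvalue(weights, nodes, iterations=300):
    """Shifted power iteration on A + I; weights[(u, v)] is the weight of u -> v."""
    x = {v: 1.0 for v in nodes}
    lam = 1.0
    for _ in range(iterations):
        y = dict(x)                          # the "+ I" part
        for (u, v), w in weights.items():
            y[u] += w * x[v]
        lam = max(y.values())
        x = {v: c / lam for v, c in y.items()}
    return lam - 1.0

def merge(weights, a, b):
    """Contract b into a. ASSUMPTION: parallel edges are averaged, a-b is dropped."""
    grouped = {}
    for (u, v), w in weights.items():
        u, v = (a if u == b else u), (a if v == b else v)
        if u != v:
            grouped.setdefault((u, v), []).append(w)
    return {edge: sum(ws) / len(ws) for edge, ws in grouped.items()}

def naive_greedy_coarsen(weights, nodes, target_size):
    """Merge one edge per round until target_size nodes are left."""
    weights, nodes = dict(weights), set(nodes)
    while len(nodes) > target_size and weights:
        base = leading_eigenvalue(weights, nodes)

        def score(edge):
            a, b = edge
            merged = merge(weights, a, b)
            return abs(base - leading_eigenvalue(merged, nodes - {b}))

        # Every edge is re-scored in every round: this is the expensive part.
        a, b = min(sorted(weights), key=score)
        weights = merge(weights, a, b)
        nodes.discard(b)
    return nodes, weights

if __name__ == "__main__":
    edges = {}
    for u, v in [(0, 1), (1, 2), (2, 0), (2, 3)]:   # hypothetical example graph
        edges[(u, v)] = edges[(v, u)] = 0.5
    print(naive_greedy_coarsen(edges, range(4), 2))
```

Each score costs at least one pass over all m edges, so scoring all m edges costs on the order of m² per round, which is the bottleneck the slide points out.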

slide-37
SLIDE 37

Outline

  • Motivation
  • Problem Definition
  • Challenges
  • Our Proposed Method
      • CoarseNet
  • Experiments
  • Applications
  • Conclusion

slide-38
SLIDE 38

CoarseNet: idea

  • Can we approximate the edge scores faster?
      • Yes!
  • Use matrix perturbation arguments to estimate (up to first-order terms) the score of an edge in constant time!
  • Score all edges in O(m) time
      • Naive heuristic: O(m²) time
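The matrix perturbation idea can be illustrated with the textbook first-order formula: if A changes by a small ΔA, then λ changes by approximately (vᵀ ΔA u) / (vᵀ u), where u and v are the right and left eigenvectors for λ. The sketch below applies it to a small change in edge weights only. It is not the CoarseNet score itself, which handles the merge of two nodes and is derived in the paper. The names `power_iteration` and `estimated_change` are assumptions of this sketch.

```python
def power_iteration(weights, nodes, transpose=False, iterations=500):
    """Return (first eigenvalue, eigenvector) of A, or of A transposed.

    weights[(i, j)] is the entry A[i][j]; the iteration runs on A + I.
    """
    x = {v: 1.0 for v in nodes}
    lam = 1.0
    for _ in range(iterations):
        y = dict(x)
        for (i, j), w in weights.items():
            if transpose:
                y[j] += w * x[i]
            else:
                y[i] += w * x[j]
        lam = max(y.values())
        x = {v: c / lam for v, c in y.items()}
    return lam - 1.0, x

def estimated_change(weights, nodes, delta):
    """First-order estimate of the change in λ when delta is added to weights."""
    _, u = power_iteration(weights, nodes)                  # right: A u = λ u
    _, v = power_iteration(weights, nodes, transpose=True)  # left: vᵀ A = λ vᵀ
    numerator = sum(d * v[i] * u[j] for (i, j), d in delta.items())
    denominator = sum(v[n] * u[n] for n in nodes)
    return numerator / denominator

if __name__ == "__main__":
    nodes = range(3)
    tri = {(i, j): 0.5 for i in nodes for j in nodes if i != j}
    delta = {(0, 1): 0.01, (1, 0): 0.01}   # hypothetical small perturbation
    perturbed = {e: w + delta.get(e, 0.0) for e, w in tri.items()}
    exact = power_iteration(perturbed, nodes)[0] - power_iteration(tri, nodes)[0]
    print(estimated_change(tri, nodes, delta), exact)
```

Once λ, u and v are known, each estimate touches only the entries named in `delta`, which is what makes a constant-time score per edge plausible.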

slide-39
SLIDE 39

CoarseNet: details

  • Corollary 5.1: Given the first eigenvalue λ and the corresponding eigenvectors u, v, the score of a node pair score(a, b) can be approximated in constant time.
      • (a,b) is a node pair
  • We want to characterize the change of λ after coarsening

[Figure: nodes a and b with neighbors f, g, e; coarsening merges (a,b) into c]

slide-40
SLIDE 40

CoarseNet (details)

  • A u = λ u
  • [Formula: score of merging (a,b); annotated terms: u(i), left eigenvector, right eigenvector, weight of (b,a), weight of (a,b), the out-adjacency vector of merged node c]
  • See paper for details

slide-41
SLIDE 41

CoarseNet: Complete algorithm

  • Steps
      • 1. Compute scores for all edge pairs
      • 2. Merge nodes with the smallest score
      • 3. Go to step 1 until αn nodes are left

[Figure: original network (weight=0.5), assigning scores, merging edges, coarsened network]

slide-42
SLIDE 42

CoarseNet: running time

  • Running time: O(m ln(m) + α n nθ)
      • m: number of edges
      • n: number of nodes
      • nθ: the maximum degree of any vertex during the merging process

slide-43
SLIDE 43

Outline

  • Motivation
  • Challenges
  • Problem Definition
  • Our Proposed Method
  • Experiments
  • Applications
  • Conclusion


slide-44
SLIDE 44

How do we perform?

  • The first eigenvalue gets preserved well up to large coarsening factors!

[Figure: results on Amazon and DBLP; see more results in the paper]

slide-45
SLIDE 45

Scalability w.r.t Reduction Factor (α)

  • Scales linearly with the desired reduction factor

[Figure: panels for Amazon (334,863 vertices) and DBLP (511,163 vertices); see more results in the paper]

slide-46
SLIDE 46

Scalability w.r.t Graph Size (n)

  • Scales linearly with the number of nodes
  • We extracted 6 connected components (with 500K to 1M vertices in steps of 100K) of the Flickr network

[Figure: results on Flickr]

slide-47
SLIDE 47

Outline

  • Motivation
  • Challenges
  • Problem Definition
  • Our Proposed Method
  • Experiments
  • Applications
  • Conclusion


slide-48
SLIDE 48

Application 1: Influence Maximization

  • How to market well?
      • Convince a subset of individuals to adopt a new product
      • Then, trigger a large cascade of further adoptions
  • Influence maximization problem [Kempe et al., KDD03]
      • Find the best set of seeds in a network to achieve the highest diffusion

[Figure: influence in a network: who is the most influential person?]

slide-49
SLIDE 49

Application 1: Influence Maximization

  • Our fast algorithm CSPIN:
      • Step 1: Coarsen the large social network using CoarseNet
      • Step 2: Solve influence maximization on the coarsened network
      • Step 3: Randomly select one node from each selected “supernode”

[Figure: Step 1 coarsens the network (nodes A-F); Step 2 solves influence maximization on the coarsened network; Step 3 randomly selects one node from supernode C]
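The three CSPIN steps compose as a simple pipeline. In the sketch below only Step 3 is implemented concretely; `coarsen` and `solve_influence_max` are hypothetical callables standing in for CoarseNet and for whichever influence maximization solver is used, and the `members` mapping from each supernode to its original nodes is an assumed data structure.

```python
import random

def expand_seeds(selected_supernodes, members, rng):
    """Step 3: pick one original node uniformly at random from each supernode."""
    return [rng.choice(sorted(members[s])) for s in selected_supernodes]

def cspin(graph, k, coarsen, solve_influence_max, rng):
    """Coarsen, solve on the small graph, then map the answer back.

    coarsen(graph) is assumed to return (coarse_graph, members);
    solve_influence_max(coarse_graph, k) is assumed to return k supernodes.
    """
    coarse_graph, members = coarsen(graph)            # Step 1
    selected = solve_influence_max(coarse_graph, k)   # Step 2
    return expand_seeds(selected, members, rng)       # Step 3

if __name__ == "__main__":
    # Stub callables, for illustration only.
    members = {"C": {"a", "b"}, "D": {"d"}}
    seeds = cspin(None, 2,
                  coarsen=lambda g: ("coarse graph", members),
                  solve_influence_max=lambda g, k: ["C", "D"][:k],
                  rng=random.Random(0))
    print(seeds)
```

Because the solver only ever sees the coarsened graph, its running time depends on the coarse size rather than on the original network.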

slide-50
SLIDE 50

Quality of CSPIN

  • We use and compare against the fast and popular PMIA algorithm (Chen et al. [KDD’07])
  • We obtain influence spread as good as that of PMIA

slide-51
SLIDE 51

Quality of CSPIN w.r.t α

We can merge up to 95% of the vertices without significantly affecting the influence spread!

slide-52
SLIDE 52

Scalability w.r.t number of seeds

  • Finds good solutions in minutes instead of hours!

[Figure: log-scale plot for Portland (1.5 million vertices); see more results in the paper]

slide-53
SLIDE 53

Application 2: Diffusion Characterization

  • Goal: use Graph Coarsening to understand information cascades
  • Dataset: Flixster
      • a friendship network with movie ratings
      • Cascade: the same movie rating from friends
  • Methodology
      • coarsen the network using CoarseNet with the reduction factor α=0.5
      • study the formed groups (supernodes)

slide-54
SLIDE 54

Diffusion observation

  • Observation 1: a very large fraction of movies propagate in a small number of groups
  • Observation 2: a multi-modal distribution
  • Stats:
      • 1891 groups
      • mean group size: 16.6
      • the largest group: 22061 nodes (roughly 40% of nodes)
  • Can get non-network surrogates for super-nodes

(See more results in the paper)

slide-55
SLIDE 55

Outline

  • Motivation
  • Challenges
  • Problem Definition
  • Our Proposed Method
  • Experiments
  • Applications
  • Conclusion


slide-56
SLIDE 56

Conclusion

Graph Coarsening Problem

  • Given: a large graph and the reduction factor
  • Find: "best" nodes to coarsen

CoarseNet

  • estimate edge score in constant time
  • Sub-quadratic

Applications

  • Influence Maximization
  • Diffusion Characterization

[Figure: original network (nodes A-F) coarsened into the coarsened network]

slide-57
SLIDE 57

Any Questions?

  • Code at: http://www.cs.vt.edu/~badityap/
  • Funding:

[Figure: original network (nodes A-F) coarsened into the coarsened network]