slide-1
SLIDE 1

How to Network in Online Social Networks

Giovanni Neglia, Xiuhui Ye (Politecnico di Torino), Maksym Gabielkov, Arnaud Legout (Inria)

Maestro Team 16 January 2014

slide-2
SLIDE 2

16 January 2014

  • G. Neglia – How to Network in Online Social Networks
  • 2

Outline

  • 1. Influence maximization problem (Kempe, Kleinberg and Tardos, 2003)
  • 2. How the problem changes for a user in an online social network
  • 3. Simulation results on Twitter’s complete graph (2012)
slide-3
SLIDE 3

Influence propagation

[Figure: state of the cascade at t=0. Legend: recruited node, influenced node. Edges are labeled p.]
slide-4
SLIDE 4

Influence propagation

[Figure: state of the cascade at t=1. Legend: recruited node, influenced node. Edges are labeled p.]
slide-5
SLIDE 5

Influence propagation

[Figure: state of the cascade at t=2. Legend: recruited node, influenced node. Edges are labeled p.]
slide-6
SLIDE 6

Influence maximization

Recruit a set A of K nodes to maximize the expected number of influenced nodes (σ(A)=E[|φ(A)|])

[Figure: example graph. Legend: recruited node, influenced node.]
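The objective σ(A)=E[|φ(A)|] can be estimated by averaging many simulated cascades. The sketch below is only an illustration: it assumes an independent cascade in which every edge succeeds with the same probability p, and it counts the recruited nodes as influenced. The names `simulate_cascade`, `estimate_sigma` and `toy_graph` are hypothetical.

```python
import random

def simulate_cascade(graph, seeds, p, rng):
    """One cascade: each newly influenced node gets a single chance to
    influence each out-neighbor, succeeding with probability p (assumption)."""
    influenced = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, ()):
                if v not in influenced and rng.random() < p:
                    influenced.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return influenced

def estimate_sigma(graph, seeds, p, runs=1000, seed=0):
    """Monte Carlo estimate of sigma(A): average size of phi(A) over `runs` cascades."""
    rng = random.Random(seed)
    total = sum(len(simulate_cascade(graph, seeds, p, rng)) for _ in range(runs))
    return total / runs

# Hypothetical toy graph: node -> list of out-neighbors.
toy_graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
```

With p=1 every node reachable from the recruited set is influenced, and with p=0 only the recruited nodes are, which gives two easy sanity checks for the estimator.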
slide-7
SLIDE 7

Kempe et al 2003

  • 1. Decreasing cascade model:
      - p_v(u,S) = probability that u can influence v, given that the nodes in S have already tried to influence v
      - p_v(u,S) ≥ p_v(u,T) if S ⊂ T

[Figure: node u trying to influence node v at t=2.]

slide-8
SLIDE 8

Kempe et al 2003

  • 2. Linear Threshold Model
      - Node v has a threshold θ_v drawn uniformly at random from [0,1], and link (i,j) has a weight b_ij
      - Node v is influenced if Σ_i b_iv 1(i is influenced) > θ_v

[Figure: nodes u and s linked to node v with weights b_uv and b_sv, at t=2.]
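The linear threshold rule (v becomes influenced once the total weight b_iv of its influenced in-neighbors exceeds θ_v) can be simulated in a few lines. This is a minimal sketch, not the authors' code; the data layout `in_weights` and the toy weights are assumptions made for illustration.

```python
import random

def linear_threshold(in_weights, seeds, rng):
    """in_weights[v] maps each in-neighbor i of v to the weight b_iv.
    Returns the set of influenced nodes once no further node can be added."""
    # One threshold per node, drawn uniformly at random from [0, 1).
    thresholds = {v: rng.random() for v in in_weights}
    influenced = set(seeds)
    changed = True
    while changed:
        changed = False
        for v, weights in in_weights.items():
            if v in influenced:
                continue
            pressure = sum(b for i, b in weights.items() if i in influenced)
            if pressure > thresholds[v]:
                influenced.add(v)
                changed = True
    return influenced

# Hypothetical example mirroring the slide: u and s both point to v.
toy_weights = {"u": {}, "s": {}, "v": {"u": 0.5, "s": 0.5}}
```

Nodes are updated one at a time rather than in synchronous rounds; because influence only grows, the final set is the same, only the intermediate time steps differ.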

slide-9
SLIDE 9

Kempe et al 2003

  • 2. General Threshold Model
      - Node v has a threshold θ_v drawn uniformly at random from [0,1]
      - Node v has a monotone activation function f_v: 2^V → [0,1] and is influenced at time t if f_v(S) > θ_v, where S is the set of influenced nodes at time t
slide-10
SLIDE 10

Kempe et al 2003

Their results:

  • I. The decreasing cascade model and the general threshold model are equivalent
      - For each {p_v(u,S)}, it is possible to find {f_v(S)} such that the probability distribution of φ(A) is the same
slide-11
SLIDE 11

Kempe et al 2003

Their results:

  • I. The decreasing cascade model and the general threshold model are equivalent
      - For each {p_v(u,S)}, it is possible to find {f_v(S)} such that the probability distribution of φ(A) is the same
  • II. The greedy algorithm achieves a (1-1/e) approximation ratio
      - This follows from a general result proven by Nemhauser, Wolsey and Fisher in 1978 for non-negative, monotone, submodular functions
slide-12
SLIDE 12

Monotonicity of σ(A)

q σ(A1)≤σ(A2) if

16 January 2014

  • G. Neglia – How to Network in Online Social Networks
  • 12

A

1 ⊂ A2

slide-13
SLIDE 13

Submodularity of σ(A)

q σ(A1U {v}) - σ(A1) ≥ σ(A2U {v}) - σ(A2) if

16 January 2014

  • G. Neglia – How to Network in Online Social Networks
  • 13

v

A

1 ⊂ A2

v
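On a small ground set, monotonicity and submodularity (diminishing marginal gains: adding v to a smaller set helps at least as much as adding it to a larger one) can be checked by brute force. The sketch below uses a hypothetical coverage function as a stand-in for σ; the helper names and the `reach` data are illustrative assumptions, not part of the original slides.

```python
from itertools import combinations

def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def is_monotone(f, ground):
    """f(A1) <= f(A2) whenever A1 is a subset of A2."""
    return all(f(a) <= f(b)
               for a in subsets(ground) for b in subsets(ground) if a <= b)

def is_submodular(f, ground):
    """Marginal gain of v on A1 is at least its gain on any superset A2."""
    for a2 in subsets(ground):
        for a1 in subsets(a2):
            for v in set(ground) - a2:
                if f(a1 | {v}) - f(a1) < f(a2 | {v}) - f(a2):
                    return False
    return True

# Hypothetical data: the nodes each candidate would reach in one fixed outcome.
reach = {"x": {1, 2}, "y": {2, 3}, "z": {3}}

def coverage(a):
    """Number of distinct nodes reached by the candidates in `a`."""
    covered = set()
    for node in a:
        covered |= reach[node]
    return len(covered)
```

The check is exponential in the size of the ground set, so it is only useful for building intuition on toy instances. A function such as |A|² fails the submodularity test, since its marginal gains grow with the set.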

slide-14
SLIDE 14

The greedy algorithm

start with A={}
for i = 1 to K
    let vi be the node maximizing the marginal gain σ(A ∪ {v}) - σ(A)
    set A := A ∪ {vi}

Question: how to calculate σ(A ∪ {v}) - σ(A)?
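The greedy loop translates almost line by line into code once σ is available as a function. The sketch below assumes σ is passed in as a callable `sigma` taking a frozenset; in practice it would be a (noisy) estimate. The toy coverage function `toy_sigma` and the `reach` data are hypothetical stand-ins.

```python
def greedy(candidates, sigma, k):
    """Pick up to k nodes, each time adding the one with the largest marginal gain."""
    chosen = []
    current = sigma(frozenset())
    for _ in range(k):
        best, best_gain = None, float("-inf")
        for v in candidates:
            if v in chosen:
                continue
            gain = sigma(frozenset(chosen) | {v}) - current
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:  # fewer than k candidates available
            break
        chosen.append(best)
        current += best_gain
    return chosen

# Hypothetical stand-in for sigma: number of distinct nodes covered.
reach = {"x": {1, 2, 3}, "y": {3, 4}, "z": {4, 5, 6, 7}}

def toy_sigma(a):
    covered = set()
    for node in a:
        covered |= reach[node]
    return len(covered)
```

Each of the K rounds evaluates σ once per remaining candidate, which is why the cost of a single σ evaluation dominates the running time.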
slide-15
SLIDE 15

2. How the problem changes in OSN

[Figure: Twitter graph with tweeting node u and retweeting node v; edges labeled p.]

v follows u’s tweets: v is a follower of u, and u is a following of v

slide-16
SLIDE 16

2. How the problem changes in OSN

Assumption: a user can only influence people through Twitter itself

[Figure: same graph as on the previous slide.]

slide-17
SLIDE 17

2. How the problem changes in OSN

The user can only choose which accounts to follow (up to K=2000)…

[Figure: same graph as on the previous slide.]

slide-18
SLIDE 18

2. How the problem changes in OSN

The user can only choose which accounts to follow (up to K=2000)… and hope that they follow back

[Figure: same graph as on the previous slide.]

slide-19
SLIDE 19

Our problem

Let the reciprocation probability r_v be known. How should the user select the set A of accounts to follow in order to maximize σ(A)=E[|φ(A)|]? (All the choices are made at t=0.)

[Figure: graph with nodes u and v; edges labeled p.]

slide-20
SLIDE 20

Map the new problem to the old one

[Figure: original graph with nodes u1, u2, u3, u4 and edges labeled p, in which the user selects K accounts to follow.]

slide-21
SLIDE 21

Map the new problem to the old one

[Figure: selecting K accounts to follow in the original graph (nodes u1 to u4, edges labeled p) is equivalent to recruiting K nodes in V' = {u'1, u'2, u'3, u'4} in an extended graph, where the new edges are labeled r.]

slide-22
SLIDE 22

Map the new problem to the old one

[Figure: same mapping as on the previous slide: selecting K accounts to follow is equivalent to recruiting K nodes in V'.]

The greedy algorithm has the same approximation ratio
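One way to read the mapping is that every node ui gets an extra node u'i, and recruiting u'i stands for following ui. The sketch below builds such an extended graph under the assumption that u'i points to ui with an edge of probability r (the reciprocation probability of ui); the figure is not fully recoverable, so this edge probability, the function name and the `("follow", u)` labels are all hypothetical.

```python
def add_follow_nodes(graph, reciprocation):
    """graph: node -> list of (out_neighbor, probability).
    reciprocation: node -> probability that the node follows back.
    Returns a new graph where ("follow", u) plays the role of u'."""
    extended = {u: list(edges) for u, edges in graph.items()}
    for u, r in reciprocation.items():
        # Assumed reading of the figure: u' activates u with probability r.
        extended[("follow", u)] = [(u, r)]
    return extended

# Hypothetical toy instance.
toy_graph = {"u1": [("u2", 0.1)], "u2": []}
toy_extended = add_follow_nodes(toy_graph, {"u1": 0.3, "u2": 0.8})
```

Because the extended graph is again an ordinary influence graph, any routine written for the original problem can be run on it with the candidate set restricted to the new nodes.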

slide-23
SLIDE 23

A 2nd twist: dynamic policies

q Following users is not expensive q Idea: replace non-reciprocating users q How to operate:

  • follow one user
  • if the user does not reciprocate by T
  • unfollow it and follow someone else

q It is now possible to follow over time more than K users, but only K at a given time instant

16 January 2014

  • G. Neglia – How to Network in Online Social Networks
  • 23
slide-24
SLIDE 24

An ideal policy

q Imagine to know who is going to reciprocate by T q The greedy algorithm with such knowledge would achieve an (1-1/e) approximation ratio

16 January 2014

  • G. Neglia – How to Network in Online Social Networks
  • 24

u2 u3 u1 p p p u4 p u2 u3 u1 p p p u4 p u'3 u'4 p p

slide-25
SLIDE 25

A practical greedy policy

start with A={}, D={}, i=0
while i < K
    let vi be the node in V-D maximizing the marginal gain σ(A ∪ {v}) - σ(A), given that it reciprocates
    follow vi
    if vi reciprocates by T:
        A := A ∪ {vi}, i := i+1
    else:
        D := D ∪ {vi}
slide-26
SLIDE 26

A practical greedy policy

Result for the policy on the previous slide: practical greedy = ideal greedy
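The practical policy can be sketched as a loop that keeps a node only if it follows back and otherwise discards it and tries the next best one. In this sketch `sigma` is assumed to return the spread conditional on the nodes in the set having reciprocated, and `reciprocates` is a hypothetical callback standing in for "wait until T and check"; the toy data are illustrative only.

```python
def practical_greedy(candidates, sigma, k, reciprocates):
    """Follow the best remaining node; keep it if it follows back, else discard it."""
    accepted, discarded = [], set()
    while len(accepted) < k:
        pool = [v for v in candidates if v not in discarded and v not in accepted]
        if not pool:
            break
        base = sigma(frozenset(accepted))
        best = max(pool, key=lambda v: sigma(frozenset(accepted) | {v}) - base)
        if reciprocates(best):   # observed only after following and waiting
            accepted.append(best)
        else:
            discarded.add(best)
    return accepted

# Hypothetical stand-in for sigma and for the reciprocation outcomes.
reach = {"x": {1, 2, 3}, "y": {3, 4}, "z": {4, 5, 6, 7}}
will_follow_back = {"x", "y"}

def toy_sigma(a):
    covered = set()
    for node in a:
        covered |= reach[node]
    return len(covered)

practical = practical_greedy(reach, toy_sigma, 2, lambda v: v in will_follow_back)
ideal = practical_greedy(sorted(will_follow_back), toy_sigma, 2, lambda v: True)
```

Running the same routine on the reciprocating nodes only, with a callback that always succeeds, plays the role of the ideal policy; on this toy instance both runs keep the same nodes.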
slide-27
SLIDE 27

#Readers vs #Retweeters (3rd twist)

[Figure: graph with nodes u and v; edges labeled p. Legend: retweeting (and reading) node, reading (non-retweeting) node.]

What if we consider #readers as the performance metric?

slide-28
SLIDE 28

Map the new problem to the old one

[Figure: each node ui (i = 1, ..., 4) of the original graph, with edges labeled p, gets a companion node u''i attached through edges labeled 1. Weights: w(ui)=0 and w(u''i)=1.]

Select K nodes to maximize E[Σ_i w(ui) 1(ui is active)]
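The weighted objective E[Σ w(ui) 1(ui is active)] can be simulated by adding one "reader" node per user. The sketch below assumes a particular wiring that the figure only suggests: whenever u is active, each follower v retweets with probability p, and the companion node of v (weight 1) becomes active with probability 1. The function names, the `("reader", v)` labels and the toy graph are hypothetical.

```python
import random

def add_reader_nodes(followers, p):
    """followers: node -> list of followers. Returns (edges, weights) where
    edges[u] is a list of (target, probability) and reader nodes have weight 1."""
    edges, weights = {}, {}
    for u, fans in followers.items():
        edges[u] = []
        for v in fans:
            edges[u].append((v, p))                 # v retweets with probability p
            edges[u].append((("reader", v), 1.0))   # v reads with probability 1
            weights[("reader", v)] = 1
    return edges, weights

def weighted_spread(edges, weights, seeds, rng):
    """One cascade; returns the total weight of the active nodes."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        u = frontier.pop()
        for v, prob in edges.get(u, ()):
            if v not in active and rng.random() < prob:
                active.add(v)
                frontier.append(v)
    return sum(weights.get(v, 0) for v in active)

# Hypothetical toy graph: u2 and u3 follow u1, u4 follows u2.
toy_followers = {"u1": ["u2", "u3"], "u2": ["u4"]}
```

Original nodes carry weight 0, so only readers are counted; averaging `weighted_spread` over many runs gives an estimate of the objective.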

slide-29
SLIDE 29

An ideal policy

q Is E[Σ w(ui) 1(ui is active)] submodular?

  • Yes it is (need to go carefully through the steps of

Kempe et al)

q then greedy is a (1-1/e) approximation algorithm

16 January 2014

  • G. Neglia – How to Network in Online Social Networks
  • 29
slide-31
SLIDE 31

Wrap up

q The point of view of a user in an OSN introduces new twists, but does not change fundamentally the problem

  • In particular the greedy algorithm guarantees a

(1-1/e) approximation ratio

q Limits:

  • need to know the whole topology, pv(u,S), rv
  • How to calculate the marginal gain? Montecarlo

simulations…

16 January 2014

  • G. Neglia – How to Network in Online Social Networks
  • 31
slide-32
SLIDE 32

Outline

  • 1. Influence maximization problem (Kempe, Kleinberg and Tardos, 2003)
  • 2. How the problem changes for a user in an online social network
  • 3. Simulation results on Twitter’s complete graph (2012)
slide-33
SLIDE 33

Know your enemy q Crawl of the whole Twitter in June 2012 q 500 million of nodes q 23 billion of arcs q 417GB as an edgelist

16 January 2014

  • G. Neglia – How to Network in Online Social Networks
  • 33
slide-34
SLIDE 34

Montecarlo simulations q Naive implementation

  • O(NKS) simulations,
  • where S is #simulations to achieve the

required confidence

  • ≈100GB to store the graph in RAM

16 January 2014

  • G. Neglia – How to Network in Online Social Networks
  • 34
slide-35
SLIDE 35

Trade RAM for Storage

  • Influenced nodes of a cascade = nodes reachable in the pruned graph
  • Need to store S * p * 417 GB
  • RAM is still a problem for p ≥ 1%

[Figure: the original graph (nodes u1 to u4, edges labeled p) is pruned into several sampled graphs on the same nodes.]
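The pruning idea can be sketched as follows: sample S pruned edge lists once, then answer every spread query by plain reachability on those samples. The sketch assumes each edge is kept independently with probability p, which is consistent with the S * p * 417 GB storage estimate but is not spelled out on the slide; all function names are hypothetical.

```python
import random
from collections import deque

def prune(edge_list, p, rng):
    """Keep each edge independently with probability p (assumption)."""
    return [(u, v) for (u, v) in edge_list if rng.random() < p]

def reachable(edge_list, sources):
    """Breadth-first search from `sources` over the given edges."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, []).append(v)
    seen = set(sources)
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def estimate_spread(edge_list, seeds, p, samples=100, seed=0):
    """Average number of nodes reachable from `seeds` over the pruned samples."""
    rng = random.Random(seed)
    pruned_graphs = [prune(edge_list, p, rng) for _ in range(samples)]
    return sum(len(reachable(g, seeds)) for g in pruned_graphs) / samples

# Hypothetical toy edge list.
toy_edges = [("u1", "u2"), ("u2", "u3"), ("u1", "u4")]
```

In a real setting the pruned samples would be written to disk once and reused for every candidate set. As a rough storage check, S = 100 samples at p = 1% amount to 100 * 0.01 * 417 GB = 417 GB.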

slide-36
SLIDE 36

Useful preprocessing

  • Reachability can also be calculated on the graph of the SCCs (strongly connected components)
  • For larger p we save memory, storage and computation

[Figure: calculation of the strongly connected components. The pruned graph on nodes u1 to u5 is collapsed into the SCCs’ graph with S1 = {u1, u2, u3} and S2 = {u4, u5}.]
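Collapsing each strongly connected component into a single node preserves reachability while shrinking the graph. The sketch below uses Kosaraju's two-pass algorithm, chosen here only for brevity (the slides do not say which SCC algorithm was used); the toy edges are an assumed wiring that yields the components {u1, u2, u3} and {u4, u5}.

```python
def strongly_connected_components(nodes, edges):
    """Kosaraju's algorithm (iterative). Returns node -> component representative."""
    adj = {u: [] for u in nodes}
    radj = {u: [] for u in nodes}
    for u, v in edges:
        adj[u].append(v)
        radj[v].append(u)
    # Pass 1: record nodes in order of DFS completion.
    order, visited = [], set()
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(adj[start]))]
        while stack:
            node, neighbors = stack[-1]
            for nxt in neighbors:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(adj[nxt])))
                    break
            else:  # all neighbors explored: node is finished
                order.append(node)
                stack.pop()
    # Pass 2: explore the reversed graph in decreasing completion order.
    component = {}
    for start in reversed(order):
        if start in component:
            continue
        component[start] = start
        stack = [start]
        while stack:
            node = stack.pop()
            for prev in radj[node]:
                if prev not in component:
                    component[prev] = start
                    stack.append(prev)
    return component

def condense(edges, component):
    """Edges of the SCC graph: one node per component, no self-loops."""
    return {(component[u], component[v])
            for u, v in edges if component[u] != component[v]}

# Assumed toy wiring: a 3-cycle, a 2-cycle, and one edge between them.
toy_nodes = ["u1", "u2", "u3", "u4", "u5"]
toy_edges = [("u1", "u2"), ("u2", "u3"), ("u3", "u1"),
             ("u3", "u4"), ("u4", "u5"), ("u5", "u4")]
toy_component = strongly_connected_components(toy_nodes, toy_edges)
```

Here six edges on five nodes collapse to a single edge between two components; the number of nodes reached is then the sum of the sizes of the reachable components.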

slide-37
SLIDE 37

How many samples? q We tried to estimate it analytically

  • Random configuration model
  • Subcritical branching process for small p
  • All-or-nothing supercritical branching process

for large p

q S≤100 for all the values of p

16 January 2014

  • G. Neglia – How to Network in Online Social Networks
  • 37
slide-38
SLIDE 38

Different algorithms

  • 1. Greedy
      - Needs to know the topology and the probabilities
  • 2. Highest degree
      - Needs to know the nodes’ degrees
  • 3. Random
      - Needs to know the nodes’ ids
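The two baselines need far less information than greedy, which shows in how little code they take. The sketch below assumes "degree" means the number of followers (out-degree in a node -> followers map); the function names and the toy graph are hypothetical.

```python
import random

def highest_degree(followers, k):
    """Pick the k nodes with the most followers (only degrees are needed)."""
    return sorted(followers, key=lambda u: len(followers[u]), reverse=True)[:k]

def random_selection(followers, k, seed=0):
    """Pick k node ids uniformly at random (only ids are needed)."""
    return random.Random(seed).sample(sorted(followers), k)

# Hypothetical toy graph: node -> list of followers.
toy_followers = {"a": ["b", "c", "d"], "b": ["c", "d"], "c": ["d"], "d": []}
```

Either selection can be fed to the same spread estimator as the greedy set, so that the three strategies are compared on equal terms.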
slide-39
SLIDE 39

[Figure: two panels showing #retweets vs. initial set size (1 to 200) for greedy, high-degree and random. Panel p = 0.001: y-axis in units of 10^6 (ticks from 0.5 to 4.5), <d>p ≈ 4×10^-2. Panel p = 0.0001: y-axis in units of 10^4 (ticks from 2 to 12), <d>p ≈ 4×10^-3.]

slide-40
SLIDE 40

[Figure: two panels showing #retweets vs. initial set size (1 to 200) for greedy, high-degree and random. Panel p = 0.1: y-axis in units of 10^8 (ticks from 0.5 to 2.5), <d>p ≈ 4. Panel p = 0.01: y-axis in units of 10^7 (ticks from 1 to 7), <d>p ≈ 0.4.]

High variability

slide-41
SLIDE 41

The effect of reciprocity

[Figure: #retweets vs. initial set size (1 to 110) for p = 0.01, comparing greedy, high-degree and random; y-axis in units of 10^7 (ticks from 1 to 7). An inset zooms in on initial set sizes 1 to 100, with #retweets between 6.145×10^7 and 6.155×10^7.]

Reciprocation probability: r = min( #followings / (#followers + 100), 1 )
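Reading the formula on this slide as r = min(#followings / (#followers + 100), 1), the reciprocation probability of an account can be computed as below. The fraction layout is reconstructed from garbled text, so treat this as an assumed reading; the function name is hypothetical.

```python
def reciprocation_probability(followings, followers):
    """r = min(#followings / (#followers + 100), 1), under the assumed reading."""
    return min(followings / (followers + 100), 1.0)

# Worked examples with hypothetical accounts.
r_new_account = reciprocation_probability(followings=50, followers=0)      # 50 / 100
r_celebrity = reciprocation_probability(followings=10, followers=999900)   # 10 / 1e6
r_follow_back = reciprocation_probability(followings=1000, followers=100)  # capped at 1
```

Under this reading, accounts that follow many users relative to their own audience are likely to follow back, while accounts with a very large audience almost never do.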

slide-42
SLIDE 42

#Readers vs #Retweeters

[Figure: #retweets or #readers vs. initial set size (1 to 40) for p = 0.01; y-axis in units of 10^8 (ticks from 0.5 to 3.5). Curves: greedy, high-degree and random, each for retweets and for readers.]

slide-43
SLIDE 43

Take Home Lesson q For sparse graphs, highest degree (1-hop ahead) works as well as greedy q For dense graphs, any strategy, even random, works as well as greedy q Only in the middle, greedy can outperform highest degree…

  • Remarks in Habiba and Berger-Wolf, 2011

q … but we do not observe it

16 January 2014

  • G. Neglia – How to Network in Online Social Networks
  • 43
slide-44
SLIDE 44

Thank you! Questions?