CS224W: Analysis of Networks
Jure Leskovec, Stanford University, http://cs224w.stanford.edu
• We are more influenced by our friends than strangers
  - 68% of consumers consult friends and family before purchasing home electronics
  - 50% do research online before purchasing electronics
10/23/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu
• Viral marketing: identify influential customers, convince them to adopt the product (offer discounts or free samples), and these customers then endorse the product among their friends
• Information epidemics:
  - Which are the influential users?
  - Which news sites create big cascades?
  - Where should we advertise?
• Which node shall we target?
• Independent Cascade Model
  - Directed finite graph $G = (V, E)$
  - Set $S$ starts out with the new behavior
    - Say nodes with this behavior are "active"
  - Each edge $(v, w)$ has a probability $p_{vw}$
  - If node $v$ is active, it gets one chance to make $w$ active, with probability $p_{vw}$
    - Each edge fires at most once
• Does scheduling matter? No
  - If $u$ and $v$ are both active at the same time, it doesn't matter which tries to activate $w$ first
  - But time moves in discrete steps
• Initially some nodes $S$ are active
• Each edge $(v, w)$ has probability (weight) $p_{vw}$
• When node $v$ becomes active:
  - It activates each out-neighbor $w$ with probability $p_{vw}$
• Activations spread through the network
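The process above can be simulated directly. The following is a minimal sketch, not the course's code. It assumes the graph is stored as a dict that maps each node $v$ to a list of `(w, p_vw)` out-edges. That representation and the name `simulate_ic` are hypothetical.

```python
import random
from collections import deque

def simulate_ic(edges, seeds, rng):
    """One run of the Independent Cascade model; returns the final set of active nodes.

    edges: dict mapping node v to a list of (w, p_vw) out-edges (assumed representation).
    """
    active = set(seeds)
    frontier = deque(seeds)      # FIFO queue: nodes are processed in the order they became active
    while frontier:
        v = frontier.popleft()
        for w, p_vw in edges.get(v, []):
            # v gets a single chance to activate each still-inactive out-neighbor w
            if w not in active and rng.random() < p_vw:
                active.add(w)
                frontier.append(w)
    return active

# Hypothetical example graph
edges = {"a": [("b", 0.4), ("c", 0.4)], "b": [("d", 0.3)], "c": [("d", 0.2)]}
print(simulate_ic(edges, {"a"}, random.Random(0)))
```

Each node leaves the queue exactly once, so each edge is tried at most once. The FIFO order is one arbitrary schedule. As noted above, the schedule does not change the distribution of the final active set.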
[Figure: example network on nodes a through i; each edge is labeled with its activation probability (0.2, 0.3 or 0.4)]
• Problem ($k$ is a user-specified parameter):
  - Most influential set of size $k$: the set $S$ of $k$ nodes producing the largest expected cascade size $f(S)$ if activated [Domingos-Richardson '01]
• Optimization problem:
  $\max_{S:\ |S| = k} f(S)$
[Figure: the example network, highlighting the influence set $X_a$ of node a and the influence set $X_d$ of node d]
Why "expected cascade size"? $X_a$ is the result of a random process. So in practice we would compute $X_a$ for many random realizations and then maximize the "average" value $f(S)$. For now, let's ignore this nuisance and simply assume that each node $u$ influences a fixed set of nodes $X_u$.
Random realizations $i$: $f(S) = \frac{1}{|I|} \sum_{i \in I} f_i(S)$
• $S$: the initial active set
• $f(S)$: the expected size of the final active set
  - $f(S)$ is the size of the union of the $X_u$: $f(S) = \left|\bigcup_{u \in S} X_u\right|$
• Set $S$ is more influential if $f(S)$ is larger, e.g. $f(\{a, b\}) < f(\{a, c\}) < f(\{a, d\})$
[Figure: graph G with the influence sets $X_u$ of nodes a, b, c and d]
• Problem: most influential set of $k$ nodes: the set $S$ of $k$ nodes producing the largest expected cascade size $f(S)$ if activated
• The optimization problem: $\max_{S:\ |S| = k} f(S)$
• How hard is this problem?
  - NP-hard!
  - Show that finding the most influential set is at least as hard as a set cover problem
• Set cover problem (a known NP-complete problem):
  - Given a universe of elements $U = \{u_1, \dots, u_n\}$ and sets $X_1, \dots, X_m \subseteq U$
  - Q: Are there $k$ sets among $X_1, \dots, X_m$ such that their union is $U$?
• Goal: encode set cover as an instance of $\max_{S:\ |S| = k} f(S)$
[Figure: universe $U$ covered by sets $X_1, X_2, X_3, X_4$]
• Given a set cover instance with sets $X_1, \dots, X_m$
• Build a bipartite "X-to-U" graph:
  - Construction: create a directed edge $(X_i, u)$ for every $X_i$ and every $u \in X_i$ (edges go from sets to their elements)
  - Put weight 1 on each edge (the activation is deterministic)
  - e.g. $X_1 = \{u_1, u_2, u_3\}$ gets edges to $u_1$, $u_2$ and $u_3$, each with weight 1
  [Figure: set nodes $X_1, X_2, X_3, \dots, X_m$ on one side, element nodes $u_1, u_2, u_3, \dots, u_n$ on the other]
• Set cover as influence maximization in the X-to-U graph: there exists a set $S$ of size $k$ with $f(S) = k + n$ (the $k$ seeds themselves plus all $n$ elements) iff there exists a set cover of size $k$
• Note: the optimal solution always consists of set nodes $X_i$ (we never need to seed element nodes $u$)
• This problem is hard in general, but there could be special cases that are easier.
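To make the reduction concrete, here is a small brute-force sketch. It is exponential in $m$ and meant for illustration only. It builds the X-to-U graph and tests the condition $f(S) = k + n$. The node labels `("X", i)` and `("U", u)`, the function names and the example instance are all hypothetical.

```python
from itertools import combinations

def x_to_u_graph(sets):
    """Edge (X_i, u) with weight 1 for every set X_i and every element u in X_i."""
    return {("X", i): [(("U", u), 1.0) for u in x] for i, x in enumerate(sets)}

def spread(graph, seeds):
    """With all weights 1 the cascade is deterministic: f(S) = number of nodes reachable from S."""
    active, stack = set(seeds), list(seeds)
    while stack:
        v = stack.pop()
        for w, _weight in graph.get(v, []):
            if w not in active:
                active.add(w)
                stack.append(w)
    return len(active)

def has_set_cover(sets, universe, k):
    """Brute force over k set nodes: a size-k cover exists iff some S reaches f(S) = k + n."""
    graph = x_to_u_graph(sets)
    n = len(universe)
    return any(spread(graph, [("X", i) for i in idx]) == k + n
               for idx in combinations(range(len(sets)), k))

# Hypothetical instance with U = {1, ..., 5}
sets = [{1, 2, 3}, {3, 4}, {4, 5}, {1, 5}]
universe = {1, 2, 3, 4, 5}
print(has_set_cover(sets, universe, 1))  # False
print(has_set_cover(sets, universe, 2))  # True: {1, 2, 3} ∪ {4, 5} = U
```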
• Extremely bad news:
  - Influence maximization is NP-hard
• Next, good news:
  - There exists an approximation algorithm!
  - For some inputs the algorithm won't find the globally optimal solution/set OPT
  - But we will also prove that the algorithm will never do too badly either. More precisely, the algorithm will find a set $S$ where $f(S) \ge 0.63 \cdot f(OPT)$ (exactly: $f(S) \ge (1 - 1/e) \cdot f(OPT)$), where OPT is the globally optimal set.
• Consider a Greedy Hill Climbing algorithm to find $S$:
  - Input: influence set $X_u$ of each node $u$: $X_u = \{v_1, v_2, \dots\}$
    - That is, if we activate $u$, nodes $\{v_1, v_2, \dots\}$ will eventually become active
  - Algorithm: at each iteration $i$, activate the node $u$ that gives the largest marginal gain: $\max_u f(S_{i-1} \cup \{u\})$
  - Notation: $S_i$ is the initially active set after iteration $i$; $f(S_i)$ is the size of the union of the $X_u$, $u \in S_i$
Algorithm:
• Start with $S_0 = \{\}$
• For $i = 1 \dots k$:
  - Activate the node $u$ that maximizes $f(S_{i-1} \cup \{u\})$
  - Let $S_i = S_{i-1} \cup \{u\}$
• Example:
  - Evaluate $f(\{a\}), \dots, f(\{e\})$, pick the argmax of them
  - Evaluate $f(\{d, a\}), \dots, f(\{d, e\})$, pick the argmax
  - Evaluate $f(\{d, b, a\}), \dots, f(\{d, b, e\})$, pick the argmax
[Figure: bar charts of $f(S_{i-1} \cup \{u\})$ for the candidate nodes a through e at each step]
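Here is a minimal Python sketch of the hill-climbing loop. It assumes the influence sets $X_u$ are already known and passed in as a dict of Python sets. The example data below is hypothetical and is chosen so that, as in the example above, node d is picked first. Ties go to the first node in dictionary order.

```python
def coverage(S, X):
    """f(S): size of the union of the influence sets X_u, u in S."""
    return len(set().union(*(X[u] for u in S)))

def greedy_hill_climbing(X, k):
    """Greedy: at step i add the node u that maximizes f(S_{i-1} ∪ {u})."""
    S = []                                        # S_0 = {}
    for _ in range(min(k, len(X))):               # i = 1 ... k
        candidates = [u for u in X if u not in S]
        # max() keeps the first of several tied nodes
        best = max(candidates, key=lambda u: coverage(S + [u], X))
        S.append(best)                            # S_i = S_{i-1} ∪ {u}
    return S

# Hypothetical influence sets
X = {"a": {"a", "b"}, "b": {"b", "c"}, "c": {"c"}, "d": {"d", "e", "f"}, "e": {"e"}}
S = greedy_hill_climbing(X, 2)
print(S, coverage(S, X))  # ['d', 'a'] 5
```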
• Claim: Hill climbing produces a solution $S$ where $f(S) \ge (1 - 1/e) \cdot f(OPT)$, i.e. $f(S) \ge 0.63 \cdot f(OPT)$ [Nemhauser, Fisher, Wolsey '78; Kempe, Kleinberg, Tardos '03]
• The claim holds for functions $f(\cdot)$ with 2 properties:
  - $f$ is monotone (activating more nodes doesn't hurt): if $S \subseteq T$ then $f(S) \le f(T)$, and $f(\{\}) = 0$
  - $f$ is submodular (activating each additional node helps less): adding an element to a set gives less improvement than adding it to one of its subsets. For all $S \subseteq T$:
    $f(S \cup \{u\}) - f(S) \ge f(T \cup \{u\}) - f(T)$
    (left: gain of adding a node to a small set; right: gain of adding a node to a large set)
• Diminishing returns:
  [Figure: $f(\cdot)$ plotted against set size $|S|, |T|$ for $S \subseteq T$; the step from $f(S)$ to $f(S \cup \{u\})$ is larger than the step from $f(T)$ to $f(T \cup \{u\})$]
  - For all $S \subseteq T$: $f(S \cup \{u\}) - f(S) \ge f(T \cup \{u\}) - f(T)$. Adding $u$ to $T$ helps less than adding it to $S$!
• Also see the hangout posted on the course website.
• We must show our $f(\cdot)$ is submodular: for all $S \subseteq T$, $f(S \cup \{u\}) - f(S) \ge f(T \cup \{u\}) - f(T)$
• Basic fact 1:
  - If $f_1(x), \dots, f_k(x)$ are submodular and $c_1, \dots, c_k \ge 0$, then $F(x) = \sum_i c_i \cdot f_i(x)$ is also submodular
  (A non-negative combination of submodular functions is a submodular function)
• Basic fact 2: a simple submodular function
  - Sets $X_1, \dots, X_m$
  - $f(S) = \left|\bigcup_{k \in S} X_k\right|$ (size of the union of the sets $X_k$, $k \in S$)
  - Claim: $f(S)$ is submodular!
  [Figure: sets in $S \subseteq T$ and a new set $u$; the gain of adding $u$ to the small set $S$, $f(S \cup \{u\}) - f(S)$, is at least the gain of adding it to the large set $T$, $f(T \cup \{u\}) - f(T)$]
  - The more sets you already have, the less new area a given set $u$ will cover
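Both basic facts can be checked by brute force on a tiny ground set. The sketch below enumerates every pair $S \subseteq T$, so it only works for a handful of elements. The sets `X` and `Y` and all helper names are hypothetical. `F` is a non-negative combination (Basic fact 1) of two union-size functions (Basic fact 2). The last line gives a counterexample whose gains grow with $|S|$.

```python
from itertools import chain, combinations

def subsets(ground):
    """All subsets of the ground set, as frozensets."""
    items = list(ground)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def is_monotone(f, ground):
    """f({}) == 0 and S ⊆ T implies f(S) <= f(T)."""
    subs = subsets(ground)
    return f(frozenset()) == 0 and all(f(S) <= f(T) for S in subs for T in subs if S <= T)

def is_submodular(f, ground, tol=1e-9):
    """For all S ⊆ T and u not in T: f(S ∪ {u}) - f(S) >= f(T ∪ {u}) - f(T)."""
    subs = subsets(ground)
    return all(
        f(S | {u}) - f(S) >= f(T | {u}) - f(T) - tol
        for S in subs for T in subs if S <= T
        for u in ground if u not in T
    )

# Hypothetical example sets (not from the lecture)
X = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d", "e"}}
Y = {1: {"p"}, 2: {"p", "q"}, 3: {"r"}}

def union_size(sets):
    """Basic fact 2: S -> |union of sets[k] for k in S|."""
    return lambda S: len(set().union(*(sets[k] for k in S)))

f = union_size(X)

def F(S):
    """Basic fact 1: non-negative combination 2*f_X + 0.5*f_Y."""
    return 2.0 * union_size(X)(S) + 0.5 * union_size(Y)(S)

print(is_monotone(f, X), is_submodular(f, X))  # True True
print(is_submodular(F, X))                      # True
print(is_submodular(lambda S: len(S) ** 2, X))  # False: gains grow with |S|
```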
• Proof strategy:
  - We will argue that influence maximization is an instance of the set cover problem:
    - Set cover problem: $f(S)$ is the size of the union of the nodes influenced by the active set $S$
  - Note that $f(S)$ is "random" (the result of a random process), so we need to be a bit careful
  - Principle of deferred decision to the rescue!
• We will create many parallel universes and then average over them
• Principle of deferred decision:
  - Flip all the coins at the beginning and record which edges fire successfully
  - Now we have a deterministic graph!
  - Def: an edge is live if it fired successfully
    - That is, we remove the edges that did not fire
• What is the influence set $X_u$ of node $u$?
  - The set reachable by live-edge paths from $u$
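A minimal sketch of the principle of deferred decision. It assumes the same hypothetical dict-of-out-edges representation as before. All coins are flipped up front, which yields one deterministic "live" graph per realization. Each $X_u^i$ is then found by breadth-first search. `sample_live_edges`, `influence_set` and the example graph are illustrative names and data only.

```python
import random
from collections import deque

def sample_live_edges(edges, rng):
    """Deferred decision: flip every edge's coin up front and keep only the live edges."""
    return {v: [w for w, p_vw in out if rng.random() < p_vw] for v, out in edges.items()}

def influence_set(live, u):
    """X_u^i: all nodes reachable from u along live-edge paths (breadth-first search)."""
    seen, queue = {u}, deque([u])
    while queue:
        x = queue.popleft()
        for w in live.get(x, []):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen

# Hypothetical graph: node -> list of (neighbor, p_vw)
edges = {"a": [("b", 0.4), ("c", 0.4)], "b": [("d", 0.3)], "c": [("d", 0.2)], "d": []}
live = sample_live_edges(edges, random.Random(7))   # one realization i
print({u: sorted(influence_set(live, u)) for u in edges})
```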
[Figure: one coin-flip realization $i$ of the example network on nodes a through i; only live edges are kept]
Influence sets for realization $i$: $X_a^i = \{a, f, c, g\}$, $X_b^i = \{b, c\}$, $X_c^i = \{c\}$, $X_d^i = \{d, e, h\}$, …
• What is an influence set $X_u$?
  - The set reachable by live-edge paths from $u$
• What is $f(S)$ now?
  - $f_i(S)$ = size of the set reachable by live-edge paths from the nodes in $S$
• For the $i$-th realization of coin flips:
  - $f_i(S = \{a, b\}) = |\{a, f, c, g\} \cup \{b, c\}| = 5$
  - $f_i(S = \{a, d\}) = |\{a, f, c, g\} \cup \{d, e, h\}| = 7$
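The two values above can be reproduced directly from the listed influence sets for realization $i$. Within one fixed realization, $f_i$ is just a union-size function. The name `f_i` is used here for illustration.

```python
# Influence sets for one fixed realization i, as listed above
X_i = {
    "a": {"a", "f", "c", "g"},
    "b": {"b", "c"},
    "c": {"c"},
    "d": {"d", "e", "h"},
}

def f_i(S):
    """f_i(S): size of the union of the influence sets X_u^i, u in S."""
    return len(set().union(*(X_i[u] for u in S)))

print(f_i({"a", "b"}))  # 5
print(f_i({"a", "d"}))  # 7
```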
• Fix an outcome $i \in I$ of the coin flips
• $X_v^i$ = set of nodes reachable from $v$ on live-edge paths
• $f_i(S)$ = size of the cascades from $S$ given the coin flips $i$
• $f_i(S) = \left|\bigcup_{v \in S} X_v^i\right| \Rightarrow f_i(S)$ is submodular!
  - The $X_v^i$ are sets, and $f_i(S)$ is the size of their union
• Expected influence set size:
  $f(S) = \frac{1}{|I|} \sum_{i \in I} f_i(S) \Rightarrow f(S)$ is submodular!
  - $f(S)$ is a non-negative linear combination of submodular functions (Basic fact 1)
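In practice, $I$ is a finite sample of realizations, so $f(S)$ is estimated by averaging the $f_i(S)$. The sketch below draws each "parallel universe" by flipping all edge coins, then counts the nodes reachable from $S$, which equals $|\bigcup_{v \in S} X_v^i|$. The graph format, the function names and the fixed seed are assumptions made for illustration.

```python
import random
from collections import deque

def reachable(live, seeds):
    """Union of X_v^i over v in seeds: nodes reachable from S along live edges."""
    seen, queue = set(seeds), deque(seeds)
    while queue:
        x = queue.popleft()
        for w in live.get(x, []):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen

def estimate_spread(edges, S, num_realizations, seed=0):
    """Monte Carlo estimate of f(S) = (1/|I|) * sum over i in I of f_i(S)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(num_realizations):
        # one "parallel universe": flip every edge's coin up front, keep the live edges
        live = {v: [w for w, p in out if rng.random() < p] for v, out in edges.items()}
        total += len(reachable(live, S))        # f_i(S)
    return total / num_realizations

# Hypothetical graph: node -> list of (neighbor, p_vw)
edges = {"a": [("b", 0.4), ("c", 0.4)], "b": [("d", 0.3)], "c": [("d", 0.2)], "d": []}
print(estimate_spread(edges, ["a", "d"], num_realizations=10_000))
```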
[Figure: activate edges by coin flipping; three realizations of a network on nodes a through f, showing the influence sets $X_a^1$, $X_a^2$, $X_a^3$ of node a]
• Find the most influential set $S$ of size $k$: largest expected cascade size $f(S)$ if set $S$ is activated
• Want to solve:
  $\arg\max_{|S| = k} f(S) = \frac{1}{|I|} \sum_{i \in I} f_i(S)$
[Figure: network where each edge $(u, v)$ activates with probability $p_{uv}$; activate edges by coin flipping; multiple realizations $i$, each realization a "parallel universe", highlighting the influence sets of nodes a and d]
Consider $S = \{a, d\}$: then $f_1(S) = 5$, $f_2(S) = 4$, $f_3(S) = 3$, and $f(S) = \frac{1}{3}(5 + 4 + 3) = 4$
Claim: When $f(S)$ is monotone and submodular, hill climbing produces an active set $S$ where $f(S) \ge \left(1 - \frac{1}{e}\right) \cdot f(OPT)$
  - In other words: $f(S) \ge 0.63 \cdot f(OPT)$
• The setting:
  - Keep adding nodes that give the largest gain
  - Start with $S_0 = \{\}$, produce sets $S_1, S_2, \dots, S_k$
  - Add elements one by one
  - Let $OPT = \{t_1, \dots, t_k\}$ be the optimal set of size $k$
• We need to show: $f(S) \ge \left(1 - \frac{1}{e}\right) f(OPT)$
• Define the marginal gain: $\delta_i = f(S_i) - f(S_{i-1})$
• Proof: 3 steps:
  - 0) Lemma: $f(A \cup B) - f(A) \le \sum_{j=1}^{k} [f(A \cup \{b_j\}) - f(A)]$
    - where $B = \{b_1, \dots, b_k\}$ and $f(\cdot)$ is submodular
  - 1) $\delta_{i+1} \ge \frac{1}{k} [f(OPT) - f(S_i)]$
  - 2) $f(S_{i+1}) \ge \left(1 - \frac{1}{k}\right) f(S_i) + \frac{1}{k} f(OPT)$
  - 3) $f(S_k) \ge \left(1 - \frac{1}{e}\right) f(OPT)$
• $f(A \cup B) - f(A) \le \sum_{j=1}^{k} [f(A \cup \{b_j\}) - f(A)]$
  - where $B = \{b_1, \dots, b_k\}$ and $f(\cdot)$ is submodular
• Proof:
  - Let $B_i = \{b_1, \dots, b_i\}$, so we have $B_1, B_2, \dots, B_k (= B)$, and let $B_0 = \{\}$
  - $f(A \cup B) - f(A) = \sum_{i=1}^{k} [f(A \cup B_i) - f(A \cup B_{i-1})]$
  - $= \sum_{i=1}^{k} [f(A \cup B_{i-1} \cup \{b_i\}) - f(A \cup B_{i-1})]$
  - $\le \sum_{i=1}^{k} [f(A \cup \{b_i\}) - f(A)]$
• Work out the sum: $[f(A \cup B_1) - f(A \cup B_0)] + [f(A \cup B_2) - f(A \cup B_1)] + [f(A \cup B_3) - f(A \cup B_2)] + \dots + [f(A \cup B_k) - f(A \cup B_{k-1})]$. Everything except $-f(A \cup B_0)$ and $f(A \cup B_k)$ cancels out, leaving $f(A \cup B) - f(A)$.
• The last inequality holds by submodularity, since $A \cup B_{i-1} \supseteq A$.
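Both the telescoping identity and the lemma can be checked exhaustively on a small, hypothetical coverage function. Such a function is submodular by Basic fact 2. `telescoping_sum` adds elements of $B$ one at a time in sorted order, which plays the role of $b_1, \dots, b_k$ in the proof. The sets and helper names are illustrative only.

```python
from itertools import chain, combinations

# Hypothetical sets; f is the union-size (coverage) function
X = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d"}, 4: {"a", "d", "e"}}

def f(S):
    return len(set().union(*(X[j] for j in S)))

def subsets(ground):
    items = list(ground)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def telescoping_sum(A, B):
    """Sum over i of f(A ∪ B_i) - f(A ∪ B_{i-1}), with B_i = {b_1, ..., b_i} and B_0 = {}."""
    total, prefix = 0, frozenset()
    for b in sorted(B):
        total += f(A | prefix | {b}) - f(A | prefix)
        prefix = prefix | {b}
    return total

def lemma_rhs(A, B):
    """Sum over j of f(A ∪ {b_j}) - f(A)."""
    return sum(f(A | {b}) - f(A) for b in B)

pairs = [(A, B) for A in subsets(X) for B in subsets(X)]
print(all(telescoping_sum(A, B) == f(A | B) - f(A) for A, B in pairs))  # True: the sum telescopes
print(all(f(A | B) - f(A) <= lemma_rhs(A, B) for A, B in pairs))        # True: the lemma holds
```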
• $f(OPT) \le f(S_i \cup OPT)$ (by monotonicity)
• $= f(S_i \cup OPT) - f(S_i) + f(S_i)$
• $\le \sum_{j=1}^{k} [f(S_i \cup \{t_j\}) - f(S_i)] + f(S_i)$ (by the lemma on the previous slide)
• $\le \sum_{j=1}^{k} \delta_{i+1} + f(S_i)$ (by the hill-climbing assumption below)
• $= f(S_i) + k \, \delta_{i+1}$
• Thus: $f(OPT) \le f(S_i) + k \, \delta_{i+1}$
• $\Rightarrow \delta_{i+1} \ge \frac{1}{k} [f(OPT) - f(S_i)]$
Here $OPT = \{t_1, \dots, t_k\}$, and $t_j$ is the $j$-th element of the optimal solution. Rather than choosing $t_j$, greedy chooses the best element, which gives a gain of $\delta_{i+1}$. So $f(S_i \cup \{t_j\}) - f(S_i) \le \delta_{i+1}$. This is the "hill-climbing" assumption.
Remember: $\delta_i = f(S_i) - f(S_{i-1})$
• We just showed: $\delta_{i+1} \ge \frac{1}{k} [f(OPT) - f(S_i)]$
• What is $f(S_{i+1})$?
  - $f(S_{i+1}) = f(S_i) + \delta_{i+1}$
  - $\ge f(S_i) + \frac{1}{k} [f(OPT) - f(S_i)]$
  - $= \left(1 - \frac{1}{k}\right) f(S_i) + \frac{1}{k} f(OPT)$
• What is $f(S_k)$?
• Claim: $f(S_i) \ge \left[1 - \left(1 - \frac{1}{k}\right)^i\right] f(OPT)$
Proof by induction:
• $i = 0$:
  - $f(S_0) = f(\{\}) = 0$
  - $\left[1 - \left(1 - \frac{1}{k}\right)^0\right] f(OPT) = 0$
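• Given that the claim is true for $S_i$, i.e. $f(S_i) \ge \left[1 - \left(1 - \frac{1}{k}\right)^i\right] f(OPT)$:
Proof by induction (inductive step):
• At $i + 1$:
  - $f(S_{i+1}) \ge \left(1 - \frac{1}{k}\right) f(S_i) + \frac{1}{k} f(OPT)$ (step 2, shown two slides ago)
  - $\ge \left(1 - \frac{1}{k}\right) \left[1 - \left(1 - \frac{1}{k}\right)^i\right] f(OPT) + \frac{1}{k} f(OPT)$ (by the claim for $S_i$)
  - $= \left[1 - \left(1 - \frac{1}{k}\right)^{i+1}\right] f(OPT)$, since $\left(1 - \frac{1}{k}\right) - \left(1 - \frac{1}{k}\right)^{i+1} + \frac{1}{k} = 1 - \left(1 - \frac{1}{k}\right)^{i+1}$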
• Thus:
  $f(S) = f(S_k) \ge \left[1 - \left(1 - \frac{1}{k}\right)^k\right] f(OPT)$
  - Apply the inequality $1 + x \le e^x$ with $x = -\frac{1}{k}$: $\left(1 - \frac{1}{k}\right)^k \le \left(e^{-1/k}\right)^k = \frac{1}{e}$
• So:
  $f(S_k) \ge \left(1 - \frac{1}{e}\right) f(OPT)$ qed.
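A quick numerical check of steps 2 and 3 follows. It iterates the recurrence from step 2 with equality, normalized so that $f(OPT) = 1$ and starting from $f(S_0) = 0$. It then compares the result with the closed form $1 - (1 - 1/k)^k$ from the induction and with the limit $1 - 1/e$. `greedy_lower_bound` is an illustrative name.

```python
import math

def greedy_lower_bound(k):
    """Iterate f(S_{i+1}) = (1 - 1/k) f(S_i) + (1/k) f(OPT) for i = 0..k-1, with f(S_0) = 0, f(OPT) = 1."""
    value = 0.0
    for _ in range(k):
        value = (1 - 1 / k) * value + 1 / k
    return value

for k in (1, 2, 5, 10, 100, 1000):
    closed_form = 1 - (1 - 1 / k) ** k        # the claim with i = k
    print(k, round(greedy_lower_bound(k), 4), round(closed_form, 4), round(1 - 1 / math.e, 4))
```

The bound decreases toward $1 - 1/e \approx 0.632$ as $k$ grows but never drops below it. That is why the guarantee holds for every $k$.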
We just proved:
• Hill climbing finds a solution $S$ with $f(S) \ge (1 - 1/e) \cdot f(OPT)$, i.e., $f(S) \ge 0.63 \cdot f(OPT)$
• This is a data-independent bound
  - This is a worst-case bound
  - No matter what the input data is, we know that hill climbing will never do worse than $0.63 \cdot f(OPT)$
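The worst-case bound can be checked empirically on small random coverage instances, where the true optimum can be found by brute force. In the sketch below, the instance generator (8 nodes, each covering 1 to 5 of 12 items), the seed and all function names are hypothetical. Over all trials, the ratio of greedy to OPT should never fall below $1 - 1/e$.

```python
import math
import random
from itertools import combinations

def coverage(S, X):
    return len(set().union(*(X[u] for u in S)))

def greedy(X, k):
    """Hill climbing: repeatedly add the node with the largest marginal gain."""
    S = []
    for _ in range(k):
        S.append(max((u for u in X if u not in S), key=lambda u: coverage(S + [u], X)))
    return S

def brute_force_opt(X, k):
    """f(OPT): best coverage over all size-k node sets (exponential; tiny inputs only)."""
    return max(coverage(S, X) for S in combinations(X, k))

rng = random.Random(42)
worst = 1.0
for _ in range(200):
    # hypothetical random instance: 8 nodes, each influencing 1 to 5 of 12 items
    X = {u: set(rng.sample(range(12), rng.randint(1, 5))) for u in range(8)}
    k = rng.randint(1, 4)
    worst = min(worst, coverage(greedy(X, k), X) / brute_force_opt(X, k))

print(f"worst greedy/OPT ratio: {worst:.3f}; guarantee: {1 - 1 / math.e:.3f}")
```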
• How to evaluate influence maximization $f(S)$?
  - It is still an open question how to compute it efficiently
• But: very good estimates by simulation
  - Repeating the diffusion process often enough (polynomial in $n$ and $1/\varepsilon$)
  - Achieves a $(1 \pm \varepsilon)$-approximation to $f(S)$
  - Generalization of the Nemhauser-Wolsey proof: the greedy algorithm is then a $(1 - 1/e - \varepsilon)$-approximation
• A collaboration network: co-authorships in papers of the arXiv high-energy physics theory:
  - 10,748 nodes, 53,000 edges
  - Example cascade process: spread of a new scientific terminology/method or a new research area
• Independent Cascade Model:
  - Case 1: uniform probability $p$ on each edge
  - Case 2: the edge from $v$ to $w$ has probability $1/\deg(w)$ of activating $w$
• Simulate the process 10,000 times for each targeted set
  - Every time, re-choose the edge outcomes randomly
• Compare with 3 other common heuristics:
  - Degree centrality: pick the nodes with the highest degree
  - Closeness centrality: pick the nodes in the "center" of the network
  - Random nodes: pick a random set of nodes
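The two edge-probability settings and the three baselines could be set up roughly as sketched below. This assumes an undirected adjacency dict, with each undirected edge used in both directions. The lecture does not say which closeness definition was used. Here it is assumed to be (reachable nodes − 1) / (sum of BFS distances), one common variant. All function names and the toy graph are hypothetical.

```python
import random
from collections import deque

def uniform_probs(adj, p):
    """Case 1: every directed edge (v, w) fires with the same probability p."""
    return {v: [(w, p) for w in nbrs] for v, nbrs in adj.items()}

def inverse_degree_probs(adj):
    """Case 2: the edge from v to w fires with probability 1/deg(w)."""
    return {v: [(w, 1.0 / len(adj[w])) for w in nbrs] for v, nbrs in adj.items()}

def top_degree(adj, k):
    """Degree centrality baseline: the k nodes with the highest degree."""
    return sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k]

def closeness(adj, v):
    """Assumed closeness variant: (reachable nodes - 1) / (sum of BFS distances from v)."""
    dist, queue = {v: 0}, deque([v])
    while queue:
        x = queue.popleft()
        for w in adj[x]:
            if w not in dist:
                dist[w] = dist[x] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

def top_closeness(adj, k):
    """Closeness centrality baseline: the k most "central" nodes."""
    return sorted(adj, key=lambda v: closeness(adj, v), reverse=True)[:k]

def random_nodes(adj, k, rng):
    """Random baseline: k nodes chosen uniformly at random."""
    return rng.sample(sorted(adj), k)

# Hypothetical undirected graph: a star centered at "h" plus the edge a-b
adj = {"h": ["a", "b", "c"], "a": ["h", "b"], "b": ["h", "a"], "c": ["h"]}
print(top_degree(adj, 1), top_closeness(adj, 1), random_nodes(adj, 2, random.Random(1)))
print(inverse_degree_probs(adj)["c"])  # edge c -> h fires with probability 1/deg(h) = 1/3
```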
[Figure: $f(S_k)$ vs. $k$ under a uniform edge firing probability, for $p_{uv} = 0.01$ and $p_{uv} = 0.10$]
[Figure: $f(S_k)$ vs. $k$ under the non-uniform edge firing probability $p_{uv} = 1/\deg(v)$]
• Notice: the greedy approach is slow!
  - For a given network $G$, repeat 10,000s of times:
    - Flip a coin for each edge and determine the influence sets under coin-flip realization $i$
    - Each node $u$ is associated with 10,000s of influence sets $X_u^i$
  - Greedy's complexity is $O(k \cdot n \cdot R \cdot m)$
    - $n$ … number of nodes in $G$
    - $k$ … number of nodes to be selected/influenced
    - $R$ … number of simulation rounds
    - $m$ … number of edges in $G$
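The $O(k \cdot n \cdot R \cdot m)$ cost comes from the nested loops below. There are $k$ greedy rounds. Each round estimates $f(S_{i-1} \cup \{u\})$ for up to $n$ candidates, using $R$ simulations of $O(m)$ work each. This is a hedged sketch: it re-simulates the cascade for every candidate instead of precomputing influence sets. The function names and the returned simulation counter are hypothetical additions for illustration.

```python
import random
from collections import deque

def simulate_spread(edges, seeds, rng):
    """One IC realization from the seeds; touches each edge at most once, so O(m) work."""
    active, queue = set(seeds), deque(seeds)
    while queue:
        v = queue.popleft()
        for w, p in edges.get(v, []):
            if w not in active and rng.random() < p:
                active.add(w)
                queue.append(w)
    return len(active)

def greedy_monte_carlo(edges, nodes, k, R, seed=0):
    """Greedy hill climbing where each f(S_{i-1} ∪ {u}) is estimated from R simulations.

    Returns the chosen seed list and the number of simulations run (about k * n * R).
    """
    rng = random.Random(seed)
    S, simulations = [], 0
    for _ in range(k):                                    # k greedy rounds
        best, best_value = None, float("-inf")
        for u in nodes:                                   # up to n candidate nodes
            if u in S:
                continue
            value = sum(simulate_spread(edges, S + [u], rng) for _ in range(R)) / R
            simulations += R                              # R simulations, each O(m)
            if value > best_value:
                best, best_value = u, value
        S.append(best)
    return S, simulations

# Hypothetical graph: node -> list of (neighbor, p_vw)
edges = {"a": [("b", 0.4), ("c", 0.4)], "b": [("d", 0.3)], "c": [("d", 0.2)], "d": [("e", 0.4)]}
seeds, sims = greedy_monte_carlo(edges, ["a", "b", "c", "d", "e"], k=2, R=1000)
print(seeds, sims)  # sims == (5 + 4) * 1000 == 9000
```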
• Many researchers have since proposed heuristics that work well in practice and run faster than the greedy algorithm [Chen, Wang, Yang]
• More realistic viral marketing:
  - Different marketing actions increase the likelihood of initial activation, for several nodes at once
• Study more general influence models:
  - Find trade-offs between generality and feasibility
• Deal with negative influences:
  - Model competing ideas
• Obtain more data (better models) about how activations occur in real social networks