Maximizing the Spread of Influence through a Social Network
Han Wang
Department of Computer Science ETH Zürich
Problem
Example 1: Spread of Rumor
"2012 = end!"
[Figure: social network with nodes A-F]
Example 2: Viral Marketing
"ezPad 1 beats iPad 3"
[Figure: social network with nodes A-F]
G: a social network (n nodes)
Model: spread process
S: initially active subset (k seeds)
$\sigma(S)$: number of finally active nodes (achievement)
Task: choose $S^*$
Goal: $\sigma(S^*) = \max_S \sigma(S)$ (NP-hard)
Realistic goal: approximate the maximum with a guarantee
Choose S: $\sigma(S) \ge r \cdot \sigma(S^*)$
Two Models
Independent Cascade Model
Each active node tries to activate its neighbors
u activates v with probability $p_{u,v}$ and fails with probability $1 - p_{u,v}$
Only a single chance
Example: $p_{C,D} = 0.2$, $p_{C,E} = 0.8$, $p_{C,F} = 0.6$
[Figure: network with nodes A-F; edge probabilities 0.2, 0.8, 0.7, 0.6, 0.3, 0.4]
$S = \{A, C\}$, $\sigma(S) = 5$
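The single-chance rule above can be sketched as a short simulation. This is a minimal sketch, not the presenter's code: the function name `independent_cascade` and the edges of `toy_graph` are assumptions for illustration (the slide figure does not survive in this transcript, so only the probabilities out of C are taken from the example).

```python
import random

def independent_cascade(graph, seeds, rng):
    """Run one Independent Cascade process and return the finally active nodes.

    graph maps u -> {v: p_uv}. A node attempts to activate each neighbor
    exactly once, in the round right after it became active itself.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, {}).items():
                # Single chance: u is in the frontier only once.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active

# Hypothetical network; only the probabilities out of C come from the slide.
toy_graph = {
    "A": {"B": 0.3},
    "B": {"F": 0.4},
    "C": {"D": 0.2, "E": 0.8, "F": 0.6},
    "E": {"D": 0.7},
}
final = independent_cascade(toy_graph, {"A", "C"}, random.Random(0))
print(sorted(final), len(final))
```

One run gives one sample of the number of finally active nodes; $\sigma(S)$ is the expectation over many such runs.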
Linear Threshold Model
Each inactive node v picks a random threshold $\theta_v \in [0,1]$
Active condition: $\sum_{u:\ \text{active neighbor of } v} b_{u,v} \ge \theta_v$
Example: $\theta_D = 0.3$, $b_{C,D} = 0.2$, $b_{E,D} = 0.7$
Iteration 2: 0.2 < 0.3, D stays inactive
Iteration 4: E becomes active
Iteration 5: 0.2 + 0.7 > 0.3, D becomes active
[Figure: network with nodes A-F; edge weights 0.2, 0.8, 0.7, 0.6, 0.3, 0.4; thresholds $\theta$ = 0.5, 0.6, 0.3, 0.9; iterations 1 and 2]
$S = \{A, C\}$, $\sigma(S) = 4$
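Once the thresholds are drawn, the Linear Threshold process is deterministic, which a few lines of code make concrete. The sketch below is an illustration under assumptions: the function name `linear_threshold`, the edge from C to E, and the threshold of E are hypothetical; only $\theta_D$, $b_{C,D}$ and $b_{E,D}$ are taken from the example.

```python
def linear_threshold(weights, thresholds, seeds):
    """Run the Linear Threshold process for fixed thresholds.

    weights maps v -> {u: b_uv} (incoming edges of v),
    thresholds maps v -> theta_v. Returns the finally active nodes.
    """
    active = set(seeds)
    while True:
        newly_active = set()
        for v, incoming in weights.items():
            if v in active:
                continue
            # Total weight of the neighbors of v that are already active.
            total = sum(b for u, b in incoming.items() if u in active)
            if total >= thresholds[v]:
                newly_active.add(v)
        if not newly_active:
            return active
        active |= newly_active

weights = {"D": {"C": 0.2, "E": 0.7}, "E": {"C": 0.8}}   # C -> E is hypothetical
thresholds = {"D": 0.3, "E": 0.5}                         # theta_E is hypothetical
print(sorted(linear_threshold(weights, thresholds, {"C"})))
```

With seed C alone, D is not activated at first (0.2 < 0.3); it activates only after E has joined, when the active weight reaches 0.2 + 0.7.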
Given a spread model: find S s.t. $\sigma(S) \ge r \cdot \sigma(S^*)$
???
f(S) non-negative, monotone, submodular: find S s.t. $f(S) \ge (1 - 1/e) \cdot f(S^*)$ (Nemhauser)
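Submodular Functions
U: a finite ground set
P(U): power set of U
$f(\cdot): P(U) \to \mathbb{R}^*$
Submodularity: for every node v and all $S \subseteq T$,
$f(S \cup \{v\}) - f(S) \ge f(T \cup \{v\}) - f(T)$
Example: f(S) = number of vertices reachable from vertices in S
[Figure: graph with nodes A, B, C, D, v, shown twice]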
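The diminishing-returns inequality can be checked by brute force on a small graph. The sketch below assumes a hypothetical edge set for `demo_graph` (the figure's edges are not recoverable here); `reachable`, `f` and `is_submodular` are illustrative names.

```python
from itertools import combinations

def reachable(graph, sources):
    """Return all vertices reachable from any vertex in sources."""
    seen = set(sources)
    stack = list(sources)
    while stack:
        u = stack.pop()
        for w in graph.get(u, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def f(graph, S):
    """f(S) = number of vertices reachable from vertices in S."""
    return len(reachable(graph, S))

def is_submodular(func, ground):
    """Check f(S+v) - f(S) >= f(T+v) - f(T) for all S subset of T, v not in T."""
    ground = list(ground)
    subsets = [frozenset(c) for r in range(len(ground) + 1)
               for c in combinations(ground, r)]
    for T in subsets:
        for S in subsets:
            if not S <= T:
                continue
            for v in ground:
                if v in T:
                    continue
                if func(S | {v}) - func(S) < func(T | {v}) - func(T):
                    return False
    return True

# Hypothetical edges on the node names used in the figure.
demo_graph = {"A": ["B"], "B": ["C"], "v": ["C", "D"], "C": [], "D": []}
print(is_submodular(lambda S: f(demo_graph, S), demo_graph))
```

In this graph, adding v to the empty set gains 3 vertices (v, C, D), while adding v to {A} gains only 2 (v, D), because C is already reachable from A.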
Prove: $\sigma(S)$ is submodular
To show for each model (Independent Cascade, Linear Threshold): $\sigma(S)$ is submodular, and maximizing it is NP-hard
Cascade Model
[Figure: network with nodes A-F; edge probabilities 0.2, 0.8, 0.7, 0.6, 0.3, 0.4]
Recall: flip a coin for each activation attempt
Why not flip all the coins at the beginning?
Successful flips give live edges and live paths; failed flips give blocked edges
X: one outcome of all the coin flips (e.g. $X_1$, $X_2$)
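$R_X(v)$: set of nodes reachable from v on live paths under outcome X
$R_{X_1}(A) = \{A, B\}$, $R_{X_1}(C) = \{C, D, E\}$
$\sigma_X(S) = \left| \bigcup_{v \in S} R_X(v) \right|$
$\sigma_{X_1}(\{A, C\}) = |\{A, B, C, D, E\}| = 5$
[Figure: network with nodes A-F under two coin-flip outcomes]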
Fix X: $\sigma_X(S)$ is submodular
A non-negative linear combination of submodular functions is still submodular
$\sigma(S) = \sum_X \mathrm{Prob}[X] \cdot \sigma_X(S)$
Active = has a live path from a seed, so $\sigma_X(S)$ is submodular
Hence $\sigma(S)$ is submodular
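The sum over coin-flip outcomes can be evaluated exactly on a tiny graph by enumerating every outcome X. The following is a minimal sketch with hypothetical names (`exact_spread`, `reachable`) and a hypothetical two-edge chain; it is exponential in the number of edges and only meant to mirror the formula.

```python
from itertools import product

def reachable(live_edges, sources):
    """Nodes reachable from sources using only live edges."""
    adjacency = {}
    for u, v in live_edges:
        adjacency.setdefault(u, []).append(v)
    seen = set(sources)
    stack = list(sources)
    while stack:
        u = stack.pop()
        for w in adjacency.get(u, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def exact_spread(edges, seeds):
    """sigma(S) = sum over outcomes X of Prob[X] * sigma_X(S).

    edges is a list of (u, v, p_uv); each outcome X marks every edge
    as live (probability p_uv) or blocked (probability 1 - p_uv).
    """
    total = 0.0
    for outcome in product([True, False], repeat=len(edges)):
        prob = 1.0
        live = []
        for (u, v, p), is_live in zip(edges, outcome):
            prob *= p if is_live else 1.0 - p
            if is_live:
                live.append((u, v))
        total += prob * len(reachable(live, seeds))
    return total

chain = [("A", "B", 0.5), ("B", "C", 0.5)]  # hypothetical example
print(exact_spread(chain, {"A"}))
```

For the chain, A is always active, B with probability 0.5 and C with probability 0.25, so the expected spread from {A} is 1.75.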
Simplified Cascade Model
Set Cover Problem: do k subsets cover all elements?
k=1: No, k=2: No, k=3: Yes, k=4: …
Solve Set Cover
Q: do 2 subsets cover all elements?
Influence maximization
Q: $|S| = 2$, $\sigma(S) \ge 2 + 5$?
[Figure: elements A-E and subsets S1, S2, S3, shown as a set system and as a network]
Influence Maximization Problem
is at least as difficult as
Set Cover Problem
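The reduction can be tried out on a small instance. The sketch below assumes the simplified cascade model means every edge succeeds with probability 1, and it builds one node per subset with an edge to each of its elements; the subset memberships are hypothetical, since the figure only names A-E and S1-S3.

```python
from itertools import combinations

def spread(graph, seeds):
    """Deterministic spread: every edge succeeds, so spread = reachability."""
    seen = set(seeds)
    stack = list(seeds)
    while stack:
        u = stack.pop()
        for w in graph.get(u, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def has_set_cover(universe, subsets, k):
    """Directly test whether some k subsets cover the universe."""
    return any(set().union(*(subsets[name] for name in combo)) >= set(universe)
               for combo in combinations(subsets, k))

def answer_via_influence(universe, subsets, k):
    """Ask instead: is there a seed set S with |S| = k and sigma(S) >= k + n?"""
    graph = {name: list(members) for name, members in subsets.items()}
    nodes = list(subsets) + list(universe)
    target = k + len(universe)
    return any(len(spread(graph, combo)) >= target
               for combo in combinations(nodes, k))

universe = ["A", "B", "C", "D", "E"]
# Hypothetical memberships for S1, S2, S3.
subsets = {"S1": {"A", "B"}, "S2": {"B", "C", "D"}, "S3": {"D", "E"}}
for k in (1, 2, 3):
    print(k, has_set_cover(universe, subsets, k),
          answer_via_influence(universe, subsets, k))
```

Subset nodes have no incoming edges, so reaching k + n active nodes forces all k seeds to be subset nodes that together cover every element.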
Linear Threshold Model
[Figure: network with nodes A-F; edge weights 0.2, 0.8, 0.7, 0.6, 0.3, 0.4; thresholds $\theta$ = 0.5, 0.6, 0.3, 0.9]
[Figure: node v with neighbors N1-N6 and edge weights 0.2, 0.15, 0.07, 0.23, 0.1, 0.14; v picks one of N1-N6 or None]
[Figure: in the example network, each node picks one of its neighbors or None]
Live edges, live paths
For node v:
$P(\text{active in iteration } t+1 \mid \text{inactive in iterations} \le t) = \dfrac{P(\text{active in iteration } t+1)}{P(\text{inactive in iterations} \le t)}$
[Figure: node v with neighbors N1-N6 and edge weights 0.2, 0.15, 0.07, 0.23, 0.1, 0.14; some neighbors were active before iteration 5, others become active in iteration 5]
$A_t$: nodes becoming active in iteration t
$P(\text{active in iteration } t+1 \mid \text{inactive in iterations} \le t) = \dfrac{\sum_{u \in A_t} b_{u,v}}{1 - \sum_{u \in A_1 \cup A_2 \cup \dots \cup A_{t-1}} b_{u,v}}$
Active = has a live path from a seed, so $\sigma_X(S)$ is submodular
Hence $\sigma(S)$ is submodular
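The live-edge view of the Linear Threshold model can be sketched as follows, assuming the standard construction in which each node v picks at most one incoming edge, edge (u, v) with probability $b_{u,v}$ and no edge with probability $1 - \sum_u b_{u,v}$. The function name `lt_live_edge_spread` and the edge from C to E are hypothetical; the expected spread is computed exactly by enumerating all picks.

```python
from itertools import product

def lt_live_edge_spread(weights, seeds):
    """Expected number of nodes reachable from seeds on live edges.

    weights maps v -> {u: b_uv} with the weights into v summing to at most 1.
    Each v independently picks one incoming edge (u, v) with probability
    b_uv, or no edge at all with the remaining probability.
    """
    nodes = list(weights)
    options = []
    for v in nodes:
        incoming = weights[v]
        picks = list(incoming.items())
        picks.append((None, 1.0 - sum(incoming.values())))
        options.append(picks)

    total = 0.0
    for combo in product(*options):
        prob = 1.0
        live = {}
        for v, (u, p) in zip(nodes, combo):
            prob *= p
            if u is not None:
                live.setdefault(u, []).append(v)
        # Count nodes that have a live path from some seed.
        seen = set(seeds)
        stack = list(seeds)
        while stack:
            x = stack.pop()
            for y in live.get(x, ()):
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        total += prob * len(seen)
    return total

weights = {"E": {"C": 0.8}, "D": {"C": 0.2, "E": 0.7}}  # C -> E is hypothetical
print(lt_live_edge_spread(weights, {"C"}))
```

For seed C, E is reached with probability 0.8 and D with probability 0.2 + 0.7 · 0.8 = 0.76, which matches what uniformly random thresholds give in the threshold process on this small graph.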
Linear Threshold Model
Vertex Cover Problem
k vertices (S)
each edge is incident to at least one vertex in S
Vertex Cover
Q: do 3 vertices cover all edges?
Influence maximization
Q: $|S| = 3$, $\sigma(S) = 6$?
[Figure: network with nodes A-F, shown as a vertex cover instance and as an influence network]
Q: $|S| = 2$, $\sigma(S) = 6$?
Influence Maximization Problem
is at least as difficult as
Vertex Cover Problem
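The correspondence between vertex covers and seed sets that activate all n nodes can be checked on a small graph. The sketch below rests on assumptions not stated on the slides: each edge weight is $b_{u,v} = 1/\deg(v)$, and "σ(S) = n" is read as "all nodes activate for every choice of thresholds", which is tested with the worst case $\theta_v = 1$. The 6-cycle on A-F is a hypothetical instance.

```python
from fractions import Fraction
from itertools import combinations

def is_vertex_cover(edges, S):
    """Every edge has at least one endpoint in S."""
    return all(u in S or v in S for u, v in edges)

def surely_activates_all(nodes, edges, seeds):
    """Linear Threshold with b_uv = 1/deg(v) and thresholds theta_v = 1.

    If every node activates even in this worst case, the seed set
    reaches all n nodes no matter which thresholds are drawn.
    """
    neighbors = {x: set() for x in nodes}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v in nodes:
            if v in active or not neighbors[v]:
                continue
            weight = sum(Fraction(1, len(neighbors[v]))
                         for u in neighbors[v] if u in active)
            if weight >= 1:  # theta_v = 1: all neighbors must be active
                active.add(v)
                changed = True
    return len(active) == len(nodes)

nodes = list("ABCDEF")
cycle = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F"), ("F", "A")]
for k in (2, 3):
    covers = [c for c in combinations(nodes, k) if is_vertex_cover(cycle, set(c))]
    spreads = [c for c in combinations(nodes, k) if surely_activates_all(nodes, cycle, c)]
    print(k, covers == spreads, covers)
```

On the 6-cycle, no 2 vertices cover all edges, while 3 alternating vertices do, and exactly those seed sets activate all 6 nodes.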
Influence Maximization Problem
For both models (Independent Cascade, Linear Threshold): $\sigma(S)$ is submodular, and maximizing it is NP-hard
f(S) non-negative, monotone, submodular: find S s.t. $f(S) \ge (1 - 1/e) \cdot f(S^*)$
Given a spread model: find S s.t. $\sigma(S) \ge (1 - 1/e - \epsilon) \cdot \sigma(S^*)$
Greedy Hill Climbing: repeatedly add the node v that maximizes $f(S \cup \{v\}) - f(S)$ (maximize marginal gain)
Prove: $\sigma(S)$ is submodular
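Greedy hill climbing on a set function can be sketched in a few lines. This is a minimal illustration, not the presenter's code: `greedy_hill_climbing` is a hypothetical name, and the coverage function used as f is a stand-in for $\sigma$, which would normally be estimated by simulation.

```python
import math
from itertools import combinations

def greedy_hill_climbing(f, ground, k):
    """Pick k elements, each time adding the one with maximum marginal gain."""
    S = set()
    for _ in range(k):
        candidates = [v for v in ground if v not in S]
        if not candidates:
            break
        best = max(candidates, key=lambda v: f(S | {v}) - f(S))
        S.add(best)
    return S

# Hypothetical coverage instance: f(S) = number of items covered by S.
coverage = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}

def f(S):
    return len(set().union(*(coverage[x] for x in S)))

chosen = greedy_hill_climbing(f, list(coverage), 2)
best_value = max(f(set(c)) for c in combinations(coverage, 2))
bound = (1 - 1 / math.e) * best_value
print(sorted(chosen), f(chosen), best_value)
```

Here the greedy choice is c first (gain 4) and then a (gain 3), which happens to be optimal; in general the guarantee is only that f of the greedy set is at least (1 - 1/e) times the optimum.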
Problem Description
Two Models: Independent Cascade Model, Linear Threshold Model
Submodular Functions
Proof of Approximation Guarantee
Proof of NP-Hardness