Maximizing the Spread of Influence through a Social Network



slide-1
SLIDE 1

Maximizing the Spread of Influence through a Social Network

Han Wang

Department of Computer Science ETH Zürich

slide-2
SLIDE 2

Problem Example 1: Spread of Rumor

• 2012 = end!

[Figure: social network with nodes A, B, C, D, E, F]

slide-3
SLIDE 3

Problem Example 2: Viral Marketing

• ezPad 1 beats iPad 3

[Figure: social network with nodes A, B, C, D, E, F]

slide-4
SLIDE 4

Problem Definition

• $G$: a social network ($n$ nodes)
• Model: spread process
• $S$: initially active subset ($k$ seeds)
• $\sigma(S)$: number of finally active nodes (achievement)
• Task: choose $S^*$
• Goal: $\sigma(S^*) = \max_S \sigma(S)$ (NP-hard)

Realistic goal: approximate the maximum with a guarantee. Choose $S$ such that $\sigma(S) \ge r \cdot \sigma(S^*)$

slide-5
SLIDE 5

Contents in This Talk

• $G$: a social network ($n$ nodes)
• Model: spread process
• $S$: initially active subset ($k$ seeds)
• $\sigma(S)$: number of finally active nodes (achievement)
• Task: choose $S^*$
• Goal: $\sigma(S^*) = \max_S \sigma(S)$ (NP-hard)

Realistic goal: approximate the maximum with a guarantee. Choose $S$ such that $\sigma(S) \ge r \cdot \sigma(S^*)$

Two Models Prove: Prove:

slide-6
SLIDE 6

Model 1: Independent Cascade Model

slide-7
SLIDE 7

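Model 1: Cascade Model

• Each active node tries to activate its neighbors
• An active node $u$ activates neighbor $v$ with probability $p_{u,v}$ and fails with probability $1 - p_{u,v}$
• Only a single chance

[Figure: node C with neighbors D, E, F]

$p_{C,D} = 0.2$, $p_{C,E} = 0.8$, $p_{C,F} = 0.6$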

slide-8
SLIDE 8

Model 1: Cascade Model

[Figure: network with nodes A, B, C, D, E, F; edge probabilities 0.2, 0.8, 0.7, 0.6, 0.3, 0.4]

slide-9
SLIDE 9

Model 1: Cascade Model

• $S = \{A, C\}$, $\sigma(S) = 5$

[Figure: network with nodes A, B, C, D, E, F; edge probabilities 0.2, 0.8, 0.7, 0.6, 0.3, 0.4]
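The cascade rule can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `simulate_independent_cascade` and the dictionary layout are hypothetical, and the graph contains just the three probabilities quoted for node C ($p_{C,D} = 0.2$, $p_{C,E} = 0.8$, $p_{C,F} = 0.6$), not the full six-node network from the figure.

```python
import random

def simulate_independent_cascade(probabilities, seeds, rng):
    """Run one cascade. `probabilities` maps (u, v) -> p_uv. Returns the final active set."""
    active = set(seeds)
    frontier = list(seeds)          # nodes that became active in the previous step
    while frontier:
        next_frontier = []
        for u in frontier:          # each newly active node gets exactly one round of attempts
            for (src, dst), p in probabilities.items():
                if src == u and dst not in active and rng.random() < p:
                    active.add(dst)
                    next_frontier.append(dst)
        frontier = next_frontier
    return active

# Only node C's three outgoing probabilities are used here (an assumption for the demo).
probabilities = {("C", "D"): 0.2, ("C", "E"): 0.8, ("C", "F"): 0.6}
rng = random.Random(0)
runs = 10_000
average = sum(
    len(simulate_independent_cascade(probabilities, {"C"}, rng)) for _ in range(runs)
) / runs
print(average)  # the exact expectation is 1 + 0.2 + 0.8 + 0.6 = 2.6
```

Because a node enters the frontier only once, every edge is tried at most once, which is the "single chance" rule.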

slide-10
SLIDE 10

Model 2: Linear Threshold Model

slide-11
SLIDE 11

Model 2: Threshold Model

• Each inactive node $v$ picks a random threshold $\theta_v \in [0, 1]$
• Activation condition:

$$\sum_{u:\ \text{active neighbor of } v} b_{u,v} \ge \theta_v$$

[Figure: node D with neighbors C and E]

$\theta_D = 0.3$, $b_{C,D} = 0.2$, $b_{E,D} = 0.7$

Iteration 2: 0.2 < 0.3
Iteration 4: E → active
Iteration 5: 0.2 + 0.7 > 0.3, D → active

slide-12
SLIDE 12

Model 2: Threshold Model

[Figure: network with nodes A, B, C, D, E, F; edge weights 0.2, 0.8, 0.7, 0.6, 0.3, 0.4; thresholds $\theta$ = 0.5, 0.6, 0.3, 0.9]

Iteration: 1, 2

slide-13
SLIDE 13

Model 2: Threshold Model

• $S = \{A, C\}$, $\sigma(S) = 4$

[Figure: network with nodes A, B, C, D, E, F; edge weights 0.2, 0.8, 0.7, 0.6, 0.3, 0.4]
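A minimal Python sketch of the threshold rule, reusing the small example with $\theta_D = 0.3$, $b_{C,D} = 0.2$ and $b_{E,D} = 0.7$. The function name `simulate_linear_threshold` and the data layout are hypothetical, and the threshold is fixed by hand here, whereas in the model each node draws it at random from $[0, 1]$.

```python
def simulate_linear_threshold(weights, thresholds, seeds):
    """`weights` maps (u, v) -> b_uv; `thresholds` maps v -> theta_v. Returns the final active set."""
    active = set(seeds)
    while True:
        newly_active = set()
        for v, theta in thresholds.items():
            if v in active:
                continue
            # Total weight arriving from neighbors that are already active.
            incoming = sum(b for (u, dst), b in weights.items() if dst == v and u in active)
            if incoming >= theta:
                newly_active.add(v)
        if not newly_active:        # no change in this iteration: the process has stopped
            return active
        active |= newly_active

weights = {("C", "D"): 0.2, ("E", "D"): 0.7}
thresholds = {"D": 0.3}             # fixed for the demo; random in the actual model
print(sorted(simulate_linear_threshold(weights, thresholds, {"C"})))       # D stays inactive: 0.2 < 0.3
print(sorted(simulate_linear_threshold(weights, thresholds, {"C", "E"})))  # D activates: 0.2 + 0.7 >= 0.3
```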

slide-14
SLIDE 14

How to Prove the Guarantee?

Given a spread model, find $S$ such that $\sigma(S) \ge r \cdot \sigma(S^*)$

???

$f(S)$: non-negative, monotone, submodular. Find $S$ such that $f(S) \ge (1 - 1/e) \cdot f(S^*)$

Nemhauser

slide-15
SLIDE 15

Submodularity

• $U$: a finite ground set
• $P(U)$: power set of $U$
• $f(\cdot): P(U) \to \mathbb{R}^*$
• Submodularity: $\forall$ node $v$, $\forall S \subseteq T$:

$$f(S \cup \{v\}) - f(S) \ge f(T \cup \{v\}) - f(T)$$

slide-16
SLIDE 16

Example: Submodularity

• $f(S)$: number of vertices reachable from vertices in $S$

[Figure: two copies of a graph with nodes A, B, C, D and v]
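The reachability example can be checked by brute force on a small graph. The sketch below is illustrative only: the helper names `reachable` and `is_submodular` are hypothetical, and the edges are made up, since the figure's edges are not recoverable from the text.

```python
from itertools import combinations

def reachable(graph, sources):
    """Return all vertices reachable from `sources` (the sources themselves included)."""
    seen = set(sources)
    stack = list(sources)
    while stack:
        u = stack.pop()
        for w in graph.get(u, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def is_submodular(f, ground_set):
    """Check f(S + v) - f(S) >= f(T + v) - f(T) for every S subset of T and every v."""
    items = sorted(ground_set)
    subsets = [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]
    for small in subsets:
        for large in subsets:
            if not small <= large:
                continue
            for v in items:
                gain_small = f(small | {v}) - f(small)
                gain_large = f(large | {v}) - f(large)
                if gain_small < gain_large:
                    return False
    return True

# Hypothetical directed graph on the vertex names A, B, C, D, v (edges are an assumption).
graph = {"A": ["B"], "B": ["C"], "v": ["C", "D"], "C": [], "D": []}
f = lambda s: len(reachable(graph, s))
print(is_submodular(f, graph.keys()))  # True
```

The check enumerates all pairs of subsets, so it is only practical for very small ground sets.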

slide-17
SLIDE 17

How to Prove the Guarantee?

Given a spread model, find $S$ such that $\sigma(S) \ge r \cdot \sigma(S^*)$

???

$f(S)$: non-negative, monotone, submodular. Find $S$ such that $f(S) \ge (1 - 1/e) \cdot f(S^*)$

Nemhauser

Prove: $\sigma(S)$ is submodular

slide-18
SLIDE 18

We Want to Prove…

For each model (Independent Cascade, Linear Threshold):

• $\sigma(S)$ is submodular
• Maximizing $\sigma(S)$ is NP-hard

slide-19
SLIDE 19

Prove: Submodularity

Cascade Model

slide-20
SLIDE 20

Submodularity (Cascade Model)

[Figure: network with nodes A, B, C, D, E, F; edge probabilities 0.2, 0.8, 0.7, 0.6, 0.3, 0.4]

• Recall: flip coin

slide-21
SLIDE 21

Submodularity (Cascade Model)

[Figure: network with nodes A, B, C, D, E, F; edge probabilities 0.2, 0.8, 0.7, 0.6, 0.3, 0.4]

• Why not flip all the coins at the beginning?

slide-22
SLIDE 22

Submodularity (Cascade Model)

• Live edges → live paths
• Blocked edges

[Figure: network with nodes A, B, C, D, E, F; edge probabilities 0.2, 0.8, 0.7, 0.6, 0.3, 0.4]

slide-23
SLIDE 23

Simplify Cascade Model

Node $v$ ends up active ⇔ there is a live path from some seed to $v$

slide-24
SLIDE 24

 X: coin flipping outcome

 e.g. X1, X2

 𝑆𝑌 𝑤

 𝑆𝑌1 𝐵 = 𝐵, 𝐶  𝑆𝑌1 𝐷 = 𝐷, 𝐸, 𝐹

 𝜏𝑌 𝑇 = |

𝑆𝑌 𝑤 |

𝑤∈𝑇

 𝜏𝑌1 *𝐵, 𝐷+ =

𝐵, 𝐶, 𝐷, 𝐸, 𝐹 = 5

C F E B D A C F E B D A

Achievement(Simplified Model)

slide-25
SLIDE 25

 Fix x, 𝜏𝑌 𝑇 is submodular  Linear combination of submodular functions

is still submodular

𝜏 𝑇 = 𝑄𝑠𝑝𝑐 𝑌 ∙ 𝜏𝑌 𝑇

𝑌

Submodularity (Cascade Model)
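The weighted sum over outcomes $X$ also suggests a simple way to estimate $\sigma(S)$: sample outcomes, count reachable nodes in each, and average. The following Python sketch assumes hypothetical names (`sample_live_edges`, `reach_count`, `estimate_sigma`) and a made-up two-edge chain rather than the network from the figures.

```python
import random

def sample_live_edges(probabilities, rng):
    """Flip every edge's coin up front; return adjacency lists of the live edges (one outcome X)."""
    live = {}
    for (u, v), p in probabilities.items():
        if rng.random() < p:
            live.setdefault(u, []).append(v)
    return live

def reach_count(live, seeds):
    """sigma_X(S): number of nodes reachable from the seeds along live edges."""
    seen, stack = set(seeds), list(seeds)
    while stack:
        u = stack.pop()
        for v in live.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen)

def estimate_sigma(probabilities, seeds, samples=5000, seed=0):
    """Monte Carlo average of sigma_X(S) over sampled outcomes X."""
    rng = random.Random(seed)
    total = sum(reach_count(sample_live_edges(probabilities, rng), seeds) for _ in range(samples))
    return total / samples

# Hypothetical chain A -> B -> C, each edge live with probability 0.5.
probabilities = {("A", "B"): 0.5, ("B", "C"): 0.5}
print(estimate_sigma(probabilities, {"A"}))  # the exact value is 1 + 0.5 + 0.25 = 1.75
```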

slide-26
SLIDE 26

Summary of the proof

Active = has a live path
⇒ $\sigma_X(S)$ is submodular
⇒ $\sigma(S)$ is submodular

slide-27
SLIDE 27

Prove: NP-hard

Simplified Cascade Model

slide-28
SLIDE 28

NP-Hard (Cascade Model)

• Set Cover Problem: do $k$ subsets cover all elements?
  - $k = 1$: No
  - $k = 2$: No
  - $k = 3$: Yes
  - $k = 4$: …

slide-29
SLIDE 29

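NP-Hard (Cascade Model)

• Solve Set Cover
  Q: Do 2 subsets cover all elements?
• Influence maximization
  Q: $|S| = 2$, $\sigma(S) \ge 2 + 5$?

[Figure: subsets S1, S2, S3 over elements A, B, C, D, E, shown as a Set Cover instance and as the corresponding network]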
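The correspondence between the two questions can be illustrated in Python. This is a sketch under assumptions: subset nodes point to their element nodes with activation probability 1, so the spread is deterministic; the function names are hypothetical; and the membership of S1, S2, S3 is invented, because the figure's edges are not recoverable. The check also assumes there are at least $k$ subsets.

```python
from itertools import combinations

def build_cascade_graph(subsets):
    """One node per subset and per element; each subset node points to its elements (probability 1)."""
    return {("set", name): [("elem", x) for x in members] for name, members in subsets.items()}

def sigma(graph, seeds):
    """Deterministic spread: the seeds plus every node they point to."""
    active = set(seeds)
    for s in seeds:
        active.update(graph.get(s, ()))
    return len(active)

def has_cover_via_influence(subsets, universe, k):
    """Answer 'do k subsets cover all?' by asking whether some k seeds reach k + |universe| nodes."""
    graph = build_cascade_graph(subsets)
    nodes = list(graph) + [("elem", x) for x in universe]
    return any(sigma(graph, seeds) >= k + len(universe) for seeds in combinations(nodes, k))

# Hypothetical instance: the subset contents are an assumption for illustration.
subsets = {"S1": {"A", "B", "C"}, "S2": {"C", "D"}, "S3": {"D", "E"}}
universe = {"A", "B", "C", "D", "E"}
print(has_cover_via_influence(subsets, universe, 2))  # True: S1 and S3 give 2 + 5 active nodes
print(has_cover_via_influence(subsets, universe, 1))  # False
```

An element node used as a seed activates only itself, so reaching $k + 5$ nodes forces all $k$ seeds to be subset nodes that together cover every element.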

slide-30
SLIDE 30

NP-Hard (Cascade Model)

Influence Maximization Problem

is at least as difficult as

Set Cover Problem

slide-31
SLIDE 31

Prove: Submodularity

Linear Threshold Model

slide-32
SLIDE 32

Recall: Threshold Model

[Figure: network with nodes A, B, C, D, E, F; edge weights 0.2, 0.8, 0.7, 0.6, 0.3, 0.4; thresholds $\theta$ = 0.5, 0.6, 0.3, 0.9]

slide-33
SLIDE 33

Gamble: Roulette

slide-34
SLIDE 34

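Gamble: Roulette

[Figure: roulette wheel with sectors N1, N2, N3, N4, N5, N6 and None; node v with in-neighbors N1 to N6 and edge weights 0.2, 0.15, 0.07, 0.23, 0.1, 0.14; threshold $\theta = 0.4$]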

slide-35
SLIDE 35

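Submodularity (Threshold Model)

[Figure: network with nodes A, B, C, D, E, F; edge weights 0.2, 0.8, 0.7, 0.6, 0.3, 0.4; thresholds $\theta$ = 0.5, 0.6, 0.3, 0.9; roulette wheels next to the nodes with sector labels A, None, C, E, None, C, None, C, None]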

slide-36
SLIDE 36

Submodularity (Threshold Model)

[Figure: network with nodes A, B, C, D, E, F; edge weights 0.2, 0.8, 0.7, 0.6, 0.3, 0.4; thresholds $\theta$ = 0.5, 0.6, 0.3, 0.9]

• Live edges → live paths

slide-37
SLIDE 37

Correctness of Simplification

For node $v$:

$$P(\text{active in iteration } t+1 \mid \text{inactive in iterations} \le t) = \frac{P(\text{active in iteration } t+1)}{P(\text{inactive in iterations} \le t)}$$

slide-38
SLIDE 38

Simplified Model

[Figure: roulette wheel with sectors N1, N2, N3, N4, N5, N6 and None; node v with in-neighbors N1 to N6 and edge weights 0.2, 0.15, 0.07, 0.23, 0.1, 0.14. Legend: active before iteration 5; becomes active in iteration 5]
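The roulette wheel of the simplified model can be sketched as one random draw per node: neighbor $u$ is picked with probability $b_{u,v}$, and "None" takes the leftover probability. In the Python sketch below, the function name `pick_live_edge` is hypothetical, and the assignment of the six weights to N1 through N6 is an assumption, since the figure's layout is not recoverable.

```python
import random

def pick_live_edge(incoming_weights, rng):
    """Spin the wheel once for node v.

    `incoming_weights` maps neighbor u -> b_uv, with the weights summing to at most 1.
    Returns the chosen neighbor, or None with the leftover probability 1 - sum(b_uv).
    """
    spin = rng.random()
    cumulative = 0.0
    for u, b in incoming_weights.items():
        cumulative += b
        if spin < cumulative:
            return u
    return None

# The six weights shown for node v; which neighbor carries which weight is assumed.
incoming = {"N1": 0.2, "N2": 0.15, "N3": 0.07, "N4": 0.23, "N5": 0.1, "N6": 0.14}
rng = random.Random(0)
trials = 20_000
counts = {}
for _ in range(trials):
    choice = pick_live_edge(incoming, rng)
    counts[choice] = counts.get(choice, 0) + 1
frequencies = {name: count / trials for name, count in counts.items()}
print(frequencies)  # each frequency should be close to its weight; None close to 0.11
```

The weights sum to 0.89, so the "None" sector covers the remaining 0.11 of the wheel.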

slide-39
SLIDE 39

Simplified Model

$A_t$: nodes becoming active in iteration $t$

$$\frac{\sum_{u \in A_t} b_{u,v}}{1 - \sum_{u \in A_1 \cup A_2 \cup \dots \cup A_{t-1}} b_{u,v}}$$

slide-40
SLIDE 40

Original Model

[Figure: node v with in-neighbors N1 to N6 and edge weights 0.2, 0.15, 0.07, 0.23, 0.1, 0.14; a bar divided into segments N2, N6, N4, N3, N1, N5 and None]

slide-41
SLIDE 41

Original Model

$A_t$: nodes becoming active in iteration $t$

$$\frac{\sum_{u \in A_t} b_{u,v}}{1 - \sum_{u \in A_1 \cup A_2 \cup \dots \cup A_{t-1}} b_{u,v}}$$

slide-42
SLIDE 42

Simplify Threshold Model

Node $v$ ends up active ⇔ there is a live path from some seed to $v$

slide-43
SLIDE 43

Similarly, we have…

Active = has a live path
⇒ $\sigma_X(S)$ is submodular
⇒ $\sigma(S)$ is submodular

slide-44
SLIDE 44

Prove: NP-hard

Linear Threshold Model

slide-45
SLIDE 45

NP-Hard (Threshold Model)

• Vertex Cover Problem
  - Find $k$ vertices (a set $S$) such that each edge is incident to at least one vertex in $S$

slide-46
SLIDE 46

NP-Hard (Threshold Model)

• Vertex Cover
  Q: Do 3 vertices cover all edges?
• Influence maximization
  Q: $|S| = 3$, $\sigma(S) = 6$?

[Figure: two copies of the network with nodes A, B, C, D, E, F]

slide-47
SLIDE 47

Influence Maximization

Q: $|S| = 3$, $\sigma(S) = 6$?

[Figure: two copies of the network with nodes A, B, C, D, E, F]

Q: $|S| = 2$, $\sigma(S) = 6$?

slide-48
SLIDE 48

NP-Hard (Threshold Model)

Influence Maximization Problem

is at least as difficult as

Vertex Cover Problem

slide-49
SLIDE 49

End of Proofs

• Influence Maximization Problem

For each model (Independent Cascade, Linear Threshold):

• $\sigma(S)$ is submodular
• Maximizing $\sigma(S)$ is NP-hard

slide-50
SLIDE 50

Initial Problem

Given a spread model, find $S$ such that $\sigma(S) \ge (1 - 1/e - \epsilon) \cdot \sigma(S^*)$

$f(S)$: non-negative, monotone, submodular. Find $S$ such that $f(S) \ge (1 - 1/e) \cdot f(S^*)$

Greedy Hill Climbing: $\max_v \, f(S \cup \{v\}) - f(S)$ (maximize marginal gain)

Prove: $\sigma(S)$ is submodular
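Greedy hill climbing can be sketched in Python as follows. The function name `greedy_hill_climbing` is hypothetical, and the coverage function `f` with its data is a made-up stand-in for $\sigma$. It is chosen because coverage functions are non-negative, monotone and submodular, and it assumes $k$ does not exceed the number of candidates.

```python
def greedy_hill_climbing(f, candidates, k):
    """Pick k elements, each time adding the one with the largest marginal gain f(S + v) - f(S)."""
    chosen = set()
    for _ in range(k):
        best = max(
            (v for v in sorted(candidates) if v not in chosen),
            key=lambda v: f(chosen | {v}) - f(chosen),
        )
        chosen.add(best)
    return chosen

# Hypothetical coverage data: each candidate "covers" a set of items.
covers = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6, 7}, "D": {1}}
f = lambda s: len(set().union(*(covers[v] for v in s)))
print(sorted(greedy_hill_climbing(f, covers, 2)))  # ['A', 'C']
```

In this run the first pick is C (gain 4) and the second is A (gain 3), which here also happens to be the best pair.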

slide-51
SLIDE 51

Summary

• Problem Description
• Two Models
  - Independent Cascade Model
  - Linear Threshold Model
• Submodular Functions
• Proof of Approximation Guarantee
• Proof of NP-Hardness

slide-52
SLIDE 52

Q&A