

SLIDE 1

CS224W: Analysis of Networks
Jure Leskovec, Stanford University
http://cs224w.stanford.edu

SLIDE 2

- We are more influenced by our friends than by strangers:
  - 68% of consumers consult friends and family before purchasing home electronics
  - 50% do research online before purchasing electronics

10/23/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu

SLIDE 3

- Identify influential customers
- Convince them to adopt the product, e.g., by offering discounts or free samples
- These customers then endorse the product among their friends

SLIDE 4

- Information epidemics:
  - Which are the influential users?
  - Which news sites create big cascades?
  - Where should we advertise?
- Which node shall we target?

SLIDE 5

- Independent Cascade Model:
  - Directed finite graph G = (V, E)
  - A set S of nodes starts out with the new behavior; nodes with this behavior are "active"
  - Each edge (v, w) has a probability p_vw
  - If node v is active, it gets one chance to make w active, with probability p_vw
  - Each edge fires at most once
- Does scheduling matter? No:
  - If u and v are both active at the same time, it does not matter which one tries to activate w first
  - But time moves in discrete steps

SLIDE 6

- Initially some set S of nodes is active
- Each edge (v, w) has a probability (weight) p_vw
- When node v becomes active, it activates each out-neighbor w with probability p_vw
- Activations spread through the network

[Figure: example graph on nodes a-i with edge probabilities between 0.2 and 0.4]
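The spreading process above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture; the adjacency-dict format, the node names, and the `independent_cascade` helper are assumptions made for the example.

```python
import random

def independent_cascade(edges, seeds, rng=random.random):
    """Run one realization of the Independent Cascade Model.

    edges: dict mapping node v -> list of (out-neighbor w, probability p_vw).
    seeds: the initially active set S.
    Returns the final active set.
    """
    active = set(seeds)
    frontier = list(seeds)            # newly active nodes still get their one chance
    while frontier:
        next_frontier = []
        for v in frontier:
            for w, p_vw in edges.get(v, []):
                # each edge fires at most once, with probability p_vw
                if w not in active and rng() < p_vw:
                    active.add(w)
                    next_frontier.append(w)
        frontier = next_frontier
    return active
```

Passing `rng=lambda: 0.0` makes every edge fire (useful for testing); `rng=lambda: 1.0` makes none fire, so the cascade stops at the seed set.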

SLIDE 7

Problem (k is a user-specified parameter):

- Most influential set of size k: the set S of k nodes producing the largest expected cascade size f(S) if activated [Domingos-Richardson '01]
- Optimization problem: max over S of size k of f(S)

[Figure: example graph on nodes a-i showing the influence set X_a of node a and the influence set X_d of node d]

Why "expected cascade size"? X_a is the result of a random process, so in practice we would compute X_a over many random realizations i and then maximize the average value f(S) = (1/|I|) Σ_{i∈I} f_i(S). For now, let's ignore this nuisance and simply assume that each node u influences a fixed set of nodes X_u.

SLIDE 8

- S: the initial active set
- f(S): the expected size of the final active set
  - f(S) is the size of the union of the X_u: f(S) = |∪_{u∈S} X_u|
- A set S is more influential if f(S) is larger
- Example: f({a, b}) < f({a, c}) < f({a, d})

[Figure: graph G with the influence set X_u of a node u highlighted]
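Under the simplifying assumption that each node u influences a fixed set X_u, f(S) is just the size of a union. A minimal sketch (the function name and the dict representation of the influence sets are assumptions for illustration):

```python
def influence(S, X):
    """f(S): the size of the union of the influence sets X_u over u in S.

    X: dict mapping each node u to its influence set X_u (a Python set,
    assumed to include u itself).
    """
    covered = set()
    for u in S:
        covered |= X[u]          # union of the influence sets
    return len(covered)
```

Overlap between influence sets is counted once, which is exactly why adding more seeds gives diminishing returns (submodularity, shown later).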

SLIDE 9
SLIDE 10

- Problem: Most influential set of k nodes: the set S of k nodes producing the largest expected cascade size f(S) if activated
- The optimization problem: max over S of size k of f(S)
- How hard is this problem?
  - NP-complete!
  - We show that finding the most influential set is at least as hard as the Set Cover problem

SLIDE 11

- Set Cover (a known NP-complete problem):
  - Given a universe of elements U = {u_1, ..., u_n} and sets X_1, ..., X_m ⊆ U
  - Q: Are there k sets among X_1, ..., X_m whose union is U?
- Goal: encode Set Cover as an instance of max over S of size k of f(S)

[Figure: universe U covered by sets X1-X4]

SLIDE 12

- Given a Set Cover instance with sets X_1, ..., X_m
- Build a bipartite "X-to-U" graph:
  - Construction: for every set X_i and every element u ∈ X_i, create a directed edge (X_i, u) from the set to the element
  - Put weight 1 on each edge (so activation is deterministic)
  - e.g., X_1 = {u_1, u_2, u_3}
- Set Cover as influence maximization in the X-to-U graph: there exists a set S of size k with f(S) = k + n iff there exists a size-k set cover

Note: The optimal solution is always a set of X_i nodes (we never seed the element nodes u). This problem is hard in general, but there could be special cases that are easier.
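The construction above is mechanical enough to sketch in code. This is an illustrative encoding, not from the lecture: set nodes are tagged as `("X", i)` tuples purely so they cannot collide with element names, and edges carry probability 1.0 to make activation deterministic.

```python
def set_cover_to_icm(sets_X, universe):
    """Encode a Set Cover instance as an ICM influence-maximization instance.

    sets_X: list of sets X_1, ..., X_m over the given universe.
    Returns an adjacency dict: a directed edge of weight 1.0 from each
    set node ("X", i) to every element u in X_i.
    """
    edges = {("X", i): [(u, 1.0) for u in X_i]
             for i, X_i in enumerate(sets_X)}
    for u in universe:
        edges.setdefault(u, [])   # element nodes have no outgoing edges
    return edges
```

Seeding k set nodes then activates exactly those sets plus every element they cover, so f(S) = k + n is achievable iff some k sets cover all n elements.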

SLIDE 13

- Extremely bad news:
  - Influence maximization is NP-complete
- Next, good news:
  - There exists an approximation algorithm!
  - For some inputs the algorithm won't find the globally optimal solution/set OPT
  - But we will also prove that the algorithm never does too badly: it finds a set S where f(S) ≥ 0.63 * f(OPT), where OPT is the globally optimal set

SLIDE 14

- Consider a Greedy Hill Climbing algorithm to find S:
  - Input: the influence set X_u of each node u: X_u = {v_1, v_2, ...}
  - That is, if we activate u, the nodes {v_1, v_2, ...} will eventually become active
  - Algorithm: at each iteration i, activate the node u that gives the largest marginal gain: max over u of f(S_{i-1} ∪ {u})

Notation: S_i ... the active set after iteration i; f(S_i) ... the size of the union of X_u over u ∈ S_i

SLIDE 15

Algorithm:

- Start with S_0 = {}
- For i = 1 ... k:
  - Activate the node u that maximizes f(S_{i-1} ∪ {u})
  - Let S_i = S_{i-1} ∪ {u}
- Example:
  - Evaluate f({a}), ..., f({e}); pick the argmax (here: d)
  - Evaluate f({d, a}), ..., f({d, e}); pick the argmax (here: b)
  - Evaluate f({d, b, a}), ..., f({d, b, e}); pick the argmax

[Figure: example graph on nodes a-e illustrating f(S_{i-1} ∪ {u})]
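The loop above can be sketched directly for the simplified setting where each node has a fixed influence set. This is a minimal illustration; the function name and the dict representation are assumptions, and the marginal gain is computed incrementally as the number of newly covered nodes.

```python
def greedy_hill_climbing(X, k):
    """Greedy Hill Climbing for influence maximization.

    X: dict mapping node u -> its fixed influence set X_u.
    k: budget (number of seeds to pick).
    At each step, add the node with the largest marginal gain in
    f(S) = |union of X_u for u in S|.  Returns (S, f(S)).
    """
    S, covered = [], set()
    for _ in range(k):
        best, best_gain = None, -1
        for u in X:
            if u in S:
                continue
            gain = len(X[u] - covered)   # marginal gain of adding u
            if gain > best_gain:
                best, best_gain = u, gain
        S.append(best)
        covered |= X[best]
    return S, len(covered)
```

Each iteration scans all candidate nodes once, which is exactly the "evaluate f(S_{i-1} ∪ {u}) for every u, pick the argmax" step from the slide.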

SLIDE 16

- Claim: Hill climbing produces a solution S where f(S) ≥ (1 - 1/e) * f(OPT), i.e., f(S) ≥ 0.63 * f(OPT)
  [Nemhauser, Fisher, Wolsey '78; Kempe, Kleinberg, Tardos '03]
- The claim holds for any function f(·) with two properties:
  - f is monotone (activating more nodes doesn't hurt): if S ⊆ T then f(S) ≤ f(T), and f({}) = 0
  - f is submodular (each additional node helps less): adding an element to a set gives less improvement than adding it to one of its subsets. For all S ⊆ T:
    f(S ∪ {u}) - f(S) ≥ f(T ∪ {u}) - f(T)
    (gain of adding a node to a small set ≥ gain of adding it to a large set)

SLIDE 17

- Diminishing returns: for all S ⊆ T,
  f(S ∪ {u}) - f(S) ≥ f(T ∪ {u}) - f(T)
  Adding u to T helps less than adding it to S!

[Figure: f(·) plotted against set size, marking f(S), f(S ∪ {u}), f(T), and f(T ∪ {u}); the gain of adding a node to a small set exceeds the gain of adding it to a large set]
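The diminishing-returns inequality can be checked numerically for the coverage function used in this lecture. This brute-force checker is an illustration I am adding (not from the slides); it exhaustively tests f(S ∪ {u}) - f(S) ≥ f(T ∪ {u}) - f(T) for every S ⊆ T and every u outside T.

```python
from itertools import combinations

def coverage(S, X):
    """f(S) = size of the union of the sets X_u, u in S."""
    out = set()
    for u in S:
        out |= X[u]
    return len(out)

def is_submodular(X):
    """Brute-force check of the diminishing-returns property of coverage:
    f(S u {u}) - f(S) >= f(T u {u}) - f(T) for all S subseteq T, u not in T."""
    nodes = list(X)
    for r in range(len(nodes) + 1):
        for T in combinations(nodes, r):
            Tset = set(T)
            for s in range(r + 1):
                for S in combinations(T, s):   # all subsets S of T
                    Sset = set(S)
                    for u in nodes:
                        if u in Tset:
                            continue
                        gain_S = coverage(Sset | {u}, X) - coverage(Sset, X)
                        gain_T = coverage(Tset | {u}, X) - coverage(Tset, X)
                        if gain_S < gain_T:
                            return False
    return True
```

Exhaustive enumeration is exponential, so this is only sensible for tiny instances; it is a sanity check of the property, not an algorithm.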

SLIDE 18

Also see the handout posted on the course website.

SLIDE 19

- We must show that our f(·) is submodular: for all S ⊆ T, f(S ∪ {u}) - f(S) ≥ f(T ∪ {u}) - f(T)
- Basic fact 1:
  - If f_1(x), ..., f_k(x) are submodular and c_1, ..., c_k ≥ 0, then F(x) = Σ_i c_i · f_i(x) is also submodular
  - (A non-negative combination of submodular functions is a submodular function)

SLIDE 20

- Basic fact 2: a simple submodular function
  - Sets X_1, ..., X_m
  - f(S) = |∪_{k∈S} X_k| (the size of the union of the sets X_k, k ∈ S)
  - Claim: f(S) is submodular!
- For all S ⊆ T: f(S ∪ {u}) - f(S) ≥ f(T ∪ {u}) - f(T)
  (gain of adding u to a small set ≥ gain of adding u to a large set)
- Intuition: the more sets you already have, the less new area a given set u will cover

[Figure: Venn-style diagram of the areas covered by S, T, and u]

SLIDE 21

- Proof strategy:
  - We will argue that influence maximization is an instance of the Set Cover problem:
  - Set Cover view: f(S) is the size of the union of the sets of nodes influenced by the active set S
  - Note f(S) is "random" (the result of a random process), so we need to be a bit careful
  - The principle of deferred decisions to the rescue!
- We will create many parallel universes (realizations i) and then average over them: f(S) = (1/|I|) Σ_{i∈I} f_i(S)

[Figure: example graph on nodes a-i]

SLIDE 22

- Principle of deferred decisions:
  - Flip all the coins at the beginning and record which edges fire successfully
  - Now we have a deterministic graph!
  - Def: an edge is "live" if it fired successfully
  - That is, we remove the edges that did not fire
- What is the influence set X_u of node u?
  - The set of nodes reachable by live-edge paths from u

Influence sets for realization i:
X_a^i = {a, f, c, g}, X_b^i = {b, c}, X_c^i = {c}, X_d^i = {d, e, h}

[Figure: example graph on nodes a-i with the live edges of realization i highlighted]
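The deferred-decision trick translates directly into code: flip every edge's coin once, then run a deterministic reachability search. This is an illustrative sketch (the adjacency format and helper names are assumptions), but the two steps are exactly the slide's "flip all coins" and "follow live-edge paths".

```python
import random

def live_edge_graph(edges, rng=random.random):
    """Principle of deferred decisions: flip every edge's coin up front and
    keep only the edges that fire ("live" edges).  The result is a
    deterministic graph for this one realization."""
    return {v: [w for (w, p) in out if rng() < p]
            for v, out in edges.items()}

def reachable(live, S):
    """Influence set of seed set S in one realization: every node
    reachable from S along live-edge paths (simple DFS)."""
    seen, stack = set(S), list(S)
    while stack:
        v = stack.pop()
        for w in live.get(v, []):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen
```

Because the coin flips are fixed before the cascade runs, f_i(S) for this realization is just `len(reachable(live, S))`, a deterministic set-union quantity.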

SLIDE 23

- What is an influence set X_u?
  - The set reachable by live-edge paths from u
- What is now f(S)? For the i-th realization of the coin flips:
  - f_i(S) = size of the set reachable by live-edge paths from the nodes in S
  - f_i(S = {a, b}) = |{a, f, c, g} ∪ {b, c}| = 5
  - f_i(S = {a, d}) = |{a, f, c, g} ∪ {d, e, h}| = 7

Influence sets for realization i:
X_a^i = {a, f, c, g}, X_b^i = {b, c}, X_c^i = {c}, X_d^i = {d, e, h}

[Figure: example graph on nodes a-i with the live edges of realization i]

SLIDE 24

- Fix an outcome i ∈ I of the coin flips
- X_u^i = the set of nodes reachable from u on live-edge paths
- f_i(S) = the size of the cascade from S given the coin flips i
- f_i(S) = |∪_{u∈S} X_u^i| ⇒ f_i(S) is submodular!
  - The X_u^i are sets, and f_i(S) is the size of their union (Basic fact 2)
- Expected influence set size: f(S) = (1/|I|) Σ_{i∈I} f_i(S) ⇒ f(S) is submodular!
  - f(S) is a non-negative linear combination of submodular functions (Basic fact 1)

[Figure: graph on nodes a-f; activating edges by coin flipping gives realizations with influence sets X_b^1, X_b^2, X_b^3]

SLIDE 25

- Find the most influential set S of size k: the largest expected cascade size f(S) if set S is activated
- Want to solve: arg max over S of size k of f(S), where f(S) = (1/|I|) Σ_{i∈I} f_i(S)

[Figure: network where each edge activates with prob. p_uv; coin flipping yields multiple realizations i, each a "parallel universe", with the influence sets of nodes a and d marked]

Consider S = {a, d}. Then f_1(S) = 5, f_2(S) = 4, f_3(S) = 3, and f(S) = 1/3 * (5 + 4 + 3) = 4
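Averaging f_i(S) over many "parallel universes" is a straightforward Monte Carlo estimate. The sketch below is an illustration I am adding (function name and graph format are assumptions); each run flips a fresh set of edge coins and measures the resulting cascade from S.

```python
import random

def estimate_influence(edges, S, runs=1000, seed=0):
    """Estimate f(S) = (1/|I|) * sum_i f_i(S) by simulating many
    independent realizations ("parallel universes") of the coin flips.

    edges: dict mapping node v -> list of (out-neighbor w, probability p_vw).
    """
    rng = random.Random(seed)     # seeded for reproducible estimates
    total = 0
    for _ in range(runs):
        # one realization: flip each edge's coin lazily during a DFS cascade
        active, stack = set(S), list(S)
        while stack:
            v = stack.pop()
            for w, p in edges.get(v, []):
                if w not in active and rng.random() < p:
                    active.add(w)
                    stack.append(w)
        total += len(active)
    return total / runs
```

The estimate converges to the true expectation as `runs` grows; the slides later note that polynomially many runs suffice for a (1 ± ε)-approximation.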
SLIDE 26
SLIDE 27

- Claim: When f(S) is monotone and submodular, Hill Climbing produces an active set S where f(S) ≥ (1 - 1/e) * f(OPT)
  - In other words: f(S) ≥ 0.63 * f(OPT)
- The setting:
  - Start with S_0 = {}; add elements one by one, producing sets S_1, S_2, ..., S_k
  - Keep adding the node that gives the largest marginal gain
  - Let OPT = {t_1, ..., t_k} be the optimal set of size k
- We need to show: f(S_k) ≥ (1 - 1/e) * f(OPT)

SLIDE 28

- Define the marginal gain: δ_i = f(S_i) - f(S_{i-1})
- Proof in 3 steps:
  - 0) Lemma: f(A ∪ B) - f(A) ≤ Σ_{j=1..k} [f(A ∪ {b_j}) - f(A)], where B = {b_1, ..., b_k} and f(·) is submodular
  - 1) δ_{i+1} ≥ (1/k) * [f(OPT) - f(S_i)]
  - 2) f(S_{i+1}) ≥ (1 - 1/k) * f(S_i) + (1/k) * f(OPT)
  - 3) f(S_k) ≥ (1 - 1/e) * f(OPT)

SLIDE 29

- Lemma: f(A ∪ B) - f(A) ≤ Σ_{j=1..k} [f(A ∪ {b_j}) - f(A)], where B = {b_1, ..., b_k} and f(·) is submodular
- Proof:
  - Let B_j = {b_1, ..., b_j}, so we have B_1, B_2, ..., B_k (= B)
  - f(A ∪ B) - f(A) = Σ_{j=1..k} [f(A ∪ B_j) - f(A ∪ B_{j-1})]
    (work out the sum: it telescopes, so everything but the first and last term cancels out)
  - = Σ_{j=1..k} [f(A ∪ B_{j-1} ∪ {b_j}) - f(A ∪ B_{j-1})]
  - ≤ Σ_{j=1..k} [f(A ∪ {b_j}) - f(A)]
    (by submodularity, since A ∪ B_{j-1} ⊇ A)

SLIDE 30

- f(OPT) ≤ f(S_i ∪ OPT)   (by monotonicity)
- = f(S_i ∪ OPT) - f(S_i) + f(S_i)
- ≤ Σ_{j=1..k} [f(S_i ∪ {t_j}) - f(S_i)] + f(S_i)   (by the previous slide's lemma, with OPT = {t_1, ..., t_k})
- ≤ Σ_{j=1..k} δ_{i+1} + f(S_i)
  (rather than choosing t_j, the greedy step chooses the best single element, which gives a gain of δ_{i+1}; so f(S_i ∪ {t_j}) - f(S_i) ≤ δ_{i+1} — this is the hill-climbing property)
- = f(S_i) + k · δ_{i+1}
- Thus: f(OPT) ≤ f(S_i) + k · δ_{i+1}
- ⇒ δ_{i+1} ≥ (1/k) * [f(OPT) - f(S_i)]

Remember: δ_i = f(S_i) - f(S_{i-1})

SLIDE 31

- We just showed: δ_{i+1} ≥ (1/k) * [f(OPT) - f(S_i)]
- What is f(S_{i+1})?
  - f(S_{i+1}) = f(S_i) + δ_{i+1}
  - ≥ f(S_i) + (1/k) * [f(OPT) - f(S_i)]
  - = (1 - 1/k) * f(S_i) + (1/k) * f(OPT)
- What is f(S_k)?

SLIDE 32

- Claim: f(S_i) ≥ [1 - (1 - 1/k)^i] * f(OPT)
- Proof by induction. Base case i = 0:
  - f(S_0) = f({}) = 0
  - [1 - (1 - 1/k)^0] * f(OPT) = 0
SLIDE 33

- Claim: f(S_i) ≥ [1 - (1 - 1/k)^i] * f(OPT)
- Inductive step: given that the claim holds for S_i, at step i + 1:
  - f(S_{i+1}) ≥ (1 - 1/k) * f(S_i) + (1/k) * f(OPT)   (shown two slides ago)
  - ≥ (1 - 1/k) * [1 - (1 - 1/k)^i] * f(OPT) + (1/k) * f(OPT)   (by the inductive hypothesis)
  - = [1 - (1 - 1/k)^(i+1)] * f(OPT)

SLIDE 34

- Thus: f(S) = f(S_k) ≥ [1 - (1 - 1/k)^k] * f(OPT)
- Apply the inequality 1 + x ≤ e^x with x = -1/k: (1 - 1/k)^k ≤ e^{-1} = 1/e
- So: f(S_k) ≥ (1 - 1/e) * f(OPT). qed.
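Written out in typeset form, the final chain of inequalities (restating the slide's steps in standard LaTeX) is:

```latex
f(S_k) \;\ge\; \Bigl[\,1-\Bigl(1-\tfrac{1}{k}\Bigr)^{k}\Bigr]\, f(\mathrm{OPT})
\;\ge\; \Bigl(1-\tfrac{1}{e}\Bigr)\, f(\mathrm{OPT}),
\qquad\text{since } 1+x \le e^{x}\ \text{with } x=-\tfrac{1}{k}
\ \Rightarrow\ \Bigl(1-\tfrac{1}{k}\Bigr)^{k} \le e^{-1}.
```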

SLIDE 35

We just proved:

- Hill climbing finds a solution S with f(S) ≥ (1 - 1/e) * f(OPT), i.e., f(S) ≥ 0.63 * f(OPT)
- This is a data-independent bound:
  - It is a worst-case bound
  - No matter what the input data is, Hill Climbing will never do worse than 0.63 * f(OPT)

SLIDE 36

- How do we evaluate the influence f(S)?
  - How to compute it exactly and efficiently is still an open question
- But: very good estimates by simulation
  - Repeat the diffusion process often enough (polynomial in n and 1/ε)
  - This achieves a (1 ± ε)-approximation to f(S)
  - By a generalization of the Nemhauser-Wolsey proof, the greedy algorithm is then a (1 - 1/e - ε)-approximation

SLIDE 37

- Find the most influential set S of size k: the largest expected cascade size f(S) if set S is activated
- Want to solve: arg max over S of size k of f(S), where f(S) = (1/|I|) Σ_{i∈I} f_i(S)

[Figure: network where each edge activates with prob. p_uv; coin flipping yields multiple realizations i, each a "parallel universe", with the influence sets of nodes a and d marked]

Consider S = {a, d}. Then f_1(S) = 5, f_2(S) = 4, f_3(S) = 3, and f(S) = 1/3 * (5 + 4 + 3) = 4
SLIDE 38
SLIDE 39

- A collaboration network: co-authorships in arXiv high-energy physics theory papers
  - 10,748 nodes, 53,000 edges
  - Example cascade process: the spread of new scientific terminology/methods or of a new research area
- Independent Cascade Model:
  - Case 1: uniform probability p on each edge
  - Case 2: the edge from v to w has probability 1/deg(w) of activating w

SLIDE 40

- Simulate the process 10,000 times for each targeted set
  - Every time, re-choose the edge outcomes randomly
- Compare with 3 other common heuristics:
  - Degree centrality: pick the nodes with the highest degree
  - Closeness centrality: pick nodes in the "center" of the network
  - Random nodes: pick a random set of nodes
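The degree-centrality baseline is simple enough to sketch. This is an illustrative helper (the adjacency format, function name, and alphabetical tie-breaking are my assumptions, not from the lecture):

```python
def top_k_by_degree(edges, k):
    """Degree-centrality heuristic: seed the k nodes with the highest
    out-degree.  Ties are broken alphabetically for determinism."""
    deg = {v: len(out) for v, out in edges.items()}
    return sorted(deg, key=lambda v: (-deg[v], v))[:k]
```

In the experiments on this deck's collaboration network, the greedy algorithm outperforms this and the other baselines, at a much higher computational cost.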

SLIDE 41

[Figure: f(S_k) vs. k under uniform edge firing probabilities p_uv = 0.01 and p_uv = 0.10]

SLIDE 42

[Figure: f(S_k) vs. k under non-uniform edge firing probability p_uv = 1/deg(v)]

SLIDE 43

- Notice: the greedy approach is slow!
  - For a given network G, repeat 10,000s of times:
    - Flip a coin for each edge and determine the influence sets under coin-flip realization i
    - Each node u is thus associated with 10,000s of influence sets X_u^i
  - Greedy's complexity is O(k · n · R · m), where:
    - n ... number of nodes in G
    - k ... number of nodes to be selected
    - R ... number of simulation rounds
    - m ... number of edges in G

SLIDE 44

- Many researchers have since proposed heuristics that work well in practice and run faster than the greedy algorithm [Chen, Wang, Yang]

SLIDE 45

- More realistic viral marketing:
  - Different marketing actions increase the likelihood of initial activation, for several nodes at once
- Study more general influence models:
  - Find trade-offs between generality and feasibility
- Deal with negative influences:
  - Model competing ideas
- Obtain more data (and better models) about how activations occur in real social networks