
http://cs224w.stanford.edu We are more influenced by our friends - PowerPoint PPT Presentation



  1. CS224W: Analysis of Networks Jure Leskovec, Stanford University http://cs224w.stanford.edu

  2. We are more influenced by our friends than by strangers.
     - 68% of consumers consult friends and family before purchasing home electronics.
     - 50% do research online before purchasing electronics.
     10/23/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu

  3. Identify influential customers, then convince them to adopt the product (offer a discount or free samples). These customers then endorse the product among their friends.

  4. Information epidemics:
     - Which are the influential users?
     - Which news sites create big cascades?
     - Where should we advertise?
     - Which node shall we target?

  5. Independent Cascade Model:
     - Directed finite graph $G = (V, E)$.
     - A set $S$ starts out with the new behavior. Nodes with this behavior are called "active".
     - Each edge $(v, w)$ has a probability $p_{vw}$.
     - If node $v$ is active, it gets one chance to make $w$ active, with probability $p_{vw}$.
     - Each edge fires at most once.
     Does scheduling matter? No.
     - If $u$ and $v$ are both active at the same time, it doesn't matter which one tries to activate $w$ first.
     - But time moves in discrete steps.

  6. Initially some nodes $S$ are active.
     - Each edge $(v, w)$ has a probability (weight) $p_{vw}$.
     - [Figure: example directed graph on nodes a to i with edge probabilities between 0.2 and 0.4.]
     - When node $v$ becomes active, it activates each out-neighbor $w$ with probability $p_{vw}$.
     - Activations spread through the network.
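The spreading process described above can be sketched in a few lines of Python. This is only an illustrative simulation, not code from the lecture: the adjacency format (`edges` as a dict of `(neighbor, probability)` lists), the function name `simulate_cascade`, and the toy graph are all assumptions made for the example.

```python
import random

def simulate_cascade(edges, seeds, rng):
    """Run one realization of the Independent Cascade model.

    edges: dict mapping node v -> list of (w, p_vw) out-edges (assumed format).
    seeds: iterable of initially active nodes S.
    rng:   a random.Random instance, so runs are reproducible.
    Returns the set of nodes that are active when the cascade stops.
    """
    active = set(seeds)
    frontier = list(active)  # nodes that became active in the previous step
    while frontier:
        next_frontier = []
        for v in frontier:
            # v appears in a frontier exactly once, so each edge fires at most once.
            for w, p in edges.get(v, []):
                if w not in active and rng.random() < p:
                    active.add(w)
                    next_frontier.append(w)
        frontier = next_frontier  # time moves in discrete steps
    return active

# Toy graph with deterministic edges (probability 1.0 or 0.0), chosen for illustration.
toy_edges = {"a": [("b", 1.0), ("c", 0.0)], "b": [("d", 1.0)]}
final_active = simulate_cascade(toy_edges, {"a"}, random.Random(0))
print(sorted(final_active))
```

With probabilities strictly between 0 and 1 the result differs from run to run, which is why the later slides talk about the expected cascade size.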

  7. Problem (k is a user-specified parameter):
     - Most influential set of size k: the set $S$ of k nodes producing the largest expected cascade size $f(S)$ if activated [Domingos-Richardson '01].
     - Optimization problem: $\max_{S \text{ of size } k} f(S)$
     - [Figure: example graph on nodes a to i with edge probabilities, with the influence set $X_a$ of a and the influence set $X_d$ of d marked.]
     - Why "expected cascade size"? $X_a$ is the result of a random process. So in practice we would want to compute $X_a$ for many random realizations $i$ and then maximize the "average" value $f(S) = \frac{1}{|I|} \sum_{i \in I} f_i(S)$. For now let's ignore this nuisance and simply assume that each node $u$ influences a set of nodes $X_u$.

  8. $S$ is the initial active set.
     - $f(S)$: the expected size of the final active set.
     - $f(S)$ is the size of the union of the sets $X_u$: $f(S) = \left| \bigcup_{u \in S} X_u \right|$
     - [Figure: graph G with nodes a, b, c, d, ... and the influence set $X_u$ of each node u.]
     - A set $S$ is more influential if $f(S)$ is larger: $f(\{a, b\}) < f(\{a, c\}) < f(\{a, d\})$
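Under the simplifying assumption that each node $u$ has a fixed influence set $X_u$, the function $f(S)$ is just the size of a union of sets. The sketch below illustrates this; the influence sets in `X` are made-up values, chosen only so that the ordering $f(\{a, b\}) < f(\{a, c\}) < f(\{a, d\})$ from the slide holds.

```python
def cascade_size(influence_sets, seeds):
    """f(S): size of the union of the influence sets X_u for u in S."""
    covered = set()
    for u in seeds:
        covered |= influence_sets[u]
    return len(covered)

# Hypothetical influence sets (not taken from the lecture figure).
X = {
    "a": {"a", "b"},
    "b": {"b"},
    "c": {"c", "d"},
    "d": {"d", "e", "f"},
}

f_ab = cascade_size(X, {"a", "b"})  # union is {a, b}
f_ac = cascade_size(X, {"a", "c"})  # union is {a, b, c, d}
f_ad = cascade_size(X, {"a", "d"})  # union is {a, b, d, e, f}
print(f_ab, f_ac, f_ad)
```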

  9. Problem: find the most influential set of k nodes, that is, the set $S$ of k nodes producing the largest expected cascade size $f(S)$ if activated.
     - The optimization problem: $\max_{S \text{ of size } k} f(S)$
     - How hard is this problem? NP-COMPLETE!
     - Show that finding the most influential set is at least as hard as a set cover problem.

  10. Set cover problem (a known NP-complete problem):
     - Given a universe of elements $U = \{u_1, \dots, u_n\}$ and sets $X_1, \dots, X_m \subseteq U$.
     - [Figure: universe U covered by overlapping sets $X_1$, $X_2$, $X_3$, $X_4$.]
     - Q: Are there k sets among $X_1, \dots, X_m$ such that their union is $U$?
     - Goal: encode set cover as an instance of $\max_{S \text{ of size } k} f(S)$.

  11. Given a set cover instance with sets $X_1, \dots, X_m$, build a bipartite "X-to-U" graph.
     - Construction: create an edge $(X_i, u)$ for all $X_i$ and all $u \in X_i$, that is, a directed edge from each set to its elements. For example, $X_1 = \{u_1, u_2, u_3\}$ gives edges from $X_1$ to $u_1$, $u_2$, and $u_3$.
     - Put weight 1 on each edge (the activation is deterministic).
     - [Figure: bipartite graph with set nodes $X_1, \dots, X_m$ on one side, element nodes $u_1, \dots, u_n$ on the other, and weight 1 on every edge.]
     - Set cover as influence maximization in the X-to-U graph: there exists a set $S$ of size k with $f(S) = k + n$ iff there exists a size-k set cover.
     - Note: the optimal solution is always a set of nodes $X_i$ (we never influence nodes "u").
     - This problem is hard in general, but there could be special cases that are easier.
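The reduction above can be checked on a small instance. The sketch below builds the X-to-U graph and tests whether some seed set of k set-nodes reaches $f(S) = k + n$. It is a brute-force illustration only (it tries every size-k combination, so it is exponential in general); the node labels `("X", i)` and `("u", e)`, the function names, and the example instance are assumptions.

```python
from itertools import combinations

def build_x_to_u_graph(sets):
    """Directed weight-1 edges from each set node ("X", i) to its element nodes ("u", e)."""
    return {("X", i): {("u", e) for e in elems} for i, elems in enumerate(sets)}

def influence(graph, seeds):
    """f(S) when every edge has weight 1: the seeds plus all of their out-neighbors."""
    active = set(seeds)
    for s in seeds:
        active |= graph.get(s, set())
    return len(active)

def has_size_k_cover(universe, sets, k):
    """True iff some k set-nodes achieve f(S) = k + n, i.e. a size-k set cover exists."""
    graph = build_x_to_u_graph(sets)
    n = len(universe)
    return any(influence(graph, S) == k + n for S in combinations(graph, k))

# Hypothetical instance: universe {1..5} and four candidate sets.
universe = {1, 2, 3, 4, 5}
sets = [{1, 2, 3}, {3, 4}, {4, 5}, {1, 5}]
print(has_size_k_cover(universe, sets, 1), has_size_k_cover(universe, sets, 2))
```

Here $\{1, 2, 3\}$ together with $\{4, 5\}$ covers the universe, so a size-2 cover exists while no single set suffices.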

  12. Extremely bad news:
     - Influence maximization is NP-complete.
     Next, good news:
     - There exists an approximation algorithm!
     - For some inputs the algorithm won't find the globally optimal solution/set OPT.
     - But we will also prove that the algorithm never does too badly either. More precisely, the algorithm finds a set $S$ with $f(S) > 0.63 \cdot f(OPT)$, where OPT is the globally optimal set.

  13. Consider a Greedy Hill Climbing algorithm to find $S$:
     - Input: the influence set $X_u$ of each node $u$: $X_u = \{v_1, v_2, \dots\}$. That is, if we activate $u$, nodes $\{v_1, v_2, \dots\}$ will eventually become active.
     - Algorithm: at each iteration $i$, activate the node $u$ that gives the largest marginal gain: $\max_u f(S_{i-1} \cup \{u\})$
     - $S_i$: the initially active set.
     - $f(S_i)$: the size of the union of $X_u$, $u \in S_i$.

  14. Algorithm:
     - Start with $S_0 = \{\}$.
     - For $i = 1 \dots k$:
       - Activate the node $u$ that maximizes $f(S_{i-1} \cup \{u\})$.
       - Let $S_i = S_{i-1} \cup \{u\}$.
     Example:
     - Evaluate $f(\{a\}), \dots, f(\{e\})$ and pick the argmax (here d).
     - Evaluate $f(\{d, a\}), \dots, f(\{d, e\})$ and pick the argmax (here b).
     - Evaluate $f(\{d, b, a\}), \dots, f(\{d, b, e\})$ and pick the argmax.
     - [Figure: overlapping influence sets of nodes a to e.]
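The greedy loop above can be sketched as follows, again under the simplifying assumption that each node has a fixed, known influence set. The function name, the tie-breaking rule (first candidate in dictionary order wins), and the example sets are assumptions made for illustration.

```python
def greedy_hill_climbing(influence_sets, k):
    """Pick k nodes greedily, each time maximizing f(S_{i-1} ∪ {u}).

    influence_sets: dict mapping node u -> set X_u of nodes u influences.
    Returns (chosen nodes in pick order, f(S) = size of the union of their X_u).
    """
    chosen = []
    covered = set()  # union of X_u over the nodes chosen so far
    for _ in range(k):
        best, best_gain = None, -1
        for u, x_u in influence_sets.items():
            if u in chosen:
                continue
            # Maximizing f(S ∪ {u}) is the same as maximizing the marginal gain.
            gain = len(x_u - covered)
            if gain > best_gain:
                best, best_gain = u, gain
        if best is None:  # fewer than k candidate nodes
            break
        chosen.append(best)
        covered |= influence_sets[best]
    return chosen, len(covered)

# Hypothetical influence sets for five nodes.
X = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1, 2}, "e": {8}}
picked, size = greedy_hill_climbing(X, 2)
print(picked, size)
```

In this example the first pick is c (gain 4), after which a has the largest marginal gain (3), so the greedy set covers 7 elements.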

  15. Claim: Hill climbing produces a solution $S$ where $f(S) \ge (1 - 1/e) \cdot f(OPT)$, that is, $f(S) > 0.63 \cdot f(OPT)$ [Nemhauser, Fisher, Wolsey '78; Kempe, Kleinberg, Tardos '03].
     The claim holds for functions $f(\cdot)$ with 2 properties:
     - $f$ is monotone (activating more nodes doesn't hurt): if $S \subseteq T$ then $f(S) \le f(T)$, and $f(\{\}) = 0$.
     - $f$ is submodular (activating each additional node helps less): adding an element to a set gives less improvement than adding it to one of its subsets: $\forall S \subseteq T$: $f(S \cup \{u\}) - f(S) \ge f(T \cup \{u\}) - f(T)$. The left side is the gain of adding a node to a small set; the right side is the gain of adding a node to a large set.

  16. Diminishing returns:
     - [Figure: plot of $f(\cdot)$ against set size $|S|$, $|T|$, marking $f(S)$, $f(S \cup \{u\})$, $f(T)$, and $f(T \cup \{u\})$ for $S \subseteq T$.]
     - Adding $u$ to $T$ helps less than adding it to $S$!
     - $\forall S \subseteq T$: $f(S \cup \{u\}) - f(S) \ge f(T \cup \{u\}) - f(T)$ (gain of adding a node to a small set versus gain of adding a node to a large set).

  17. Also see the hangout posted on the course website.

  18. We must show that our $f(\cdot)$ is submodular:
     - $\forall S \subseteq T$: $f(S \cup \{u\}) - f(S) \ge f(T \cup \{u\}) - f(T)$ (gain of adding a node to a small set versus gain of adding a node to a large set).
     - Basic fact 1: if $f_1(x), \dots, f_k(x)$ are submodular and $c_1, \dots, c_k \ge 0$, then $F(x) = \sum_i c_i \cdot f_i(x)$ is also submodular. (A non-negative combination of submodular functions is a submodular function.)
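Basic fact 1 follows in one line from the definition. The short derivation below is a reconstruction of that standard argument rather than text from the slides; it uses the same symbols as the statement above and applies the submodularity of each $f_i$ term by term, which is valid because every $c_i$ is non-negative.

```latex
\begin{align*}
F(S \cup \{u\}) - F(S)
  &= \sum_i c_i \bigl( f_i(S \cup \{u\}) - f_i(S) \bigr) \\
  &\ge \sum_i c_i \bigl( f_i(T \cup \{u\}) - f_i(T) \bigr)
     && \text{since } c_i \ge 0 \text{ and each } f_i \text{ is submodular} \\
  &= F(T \cup \{u\}) - F(T)
     && \text{for all } S \subseteq T .
\end{align*}
```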

  19. Recall submodularity: $\forall S \subseteq T$: $f(S \cup \{u\}) - f(S) \ge f(T \cup \{u\}) - f(T)$ (gain of adding $u$ to a small set versus gain of adding $u$ to a large set).
     Basic fact 2: a simple submodular function.
     - Sets $X_1, \dots, X_m$.
     - $f(S) = \left| \bigcup_{k \in S} X_k \right|$ (the size of the union of the sets $X_k$, $k \in S$).
     - Claim: $f(S)$ is submodular!
     - [Figure: regions S and T with $S \subseteq T$, and a new set u overlapping both.]
     - The more sets you already have, the less new area a given set u will cover.
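The claim in basic fact 2 can be sanity-checked by brute force on a small example. The sketch below enumerates every pair $S \subseteq T$ and every $u$ and tests the diminishing-returns inequality directly. This is a check on one instance, not a proof; the helper names and the example sets are assumptions, and the squared-size function is included only as a deliberately non-submodular contrast.

```python
from itertools import combinations

def coverage(sets, S):
    """f(S) = size of the union of sets[k] for k in S."""
    return len(set().union(*(sets[k] for k in S)))

def is_submodular(f, ground):
    """Check f(S ∪ {u}) - f(S) >= f(T ∪ {u}) - f(T) for all S ⊆ T and all u."""
    ground = list(ground)
    subsets = [frozenset(c)
               for r in range(len(ground) + 1)
               for c in combinations(ground, r)]
    for T in subsets:
        for S in subsets:
            if not S <= T:
                continue
            for u in ground:
                if f(S | {u}) - f(S) < f(T | {u}) - f(T):
                    return False
    return True

# Hypothetical sets X_1..X_4, keyed by index.
sets = {1: {"p", "q"}, 2: {"q", "r"}, 3: {"r", "s", "t"}, 4: {"p", "t"}}

coverage_ok = is_submodular(lambda S: coverage(sets, S), sets)
# Contrast: |S|^2 has increasing marginal gains, so it is not submodular.
square_ok = is_submodular(lambda S: len(S) ** 2, sets)
print(coverage_ok, square_ok)
```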

  20. Proof strategy:
     - We will argue that influence maximization is an instance of the set cover problem.
     - Set cover problem: $f(S)$ is the size of the union of the nodes influenced by the active set $S$.
     - Note that $f(S)$ is "random" (a result of a random process), so we need to be a bit careful.
     - Principle of deferred decision to the rescue! We will create many parallel universes and then average over them: $f(S) = \frac{1}{|I|} \sum_{i \in I} f_i(S)$, where $i$ ranges over the random realizations.
     - [Figure: example graph on nodes a to i.]
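One common way to read the "parallel universes" idea is sketched below: flip every edge's coin up front to get one realization $i$, count the nodes reachable from $S$ along the surviving edges to get $f_i(S)$, and average over many realizations. The slide text does not spell out this construction, so treat it as an assumption; the function names, the edge format, and the number of universes are likewise illustrative choices.

```python
import random

def sample_live_edges(edges, rng):
    """One universe: decide in advance, for every edge (v, w), whether it fires."""
    return {v: [w for w, p in out if rng.random() < p] for v, out in edges.items()}

def reachable(live, seeds):
    """Nodes reachable from the seeds along live edges (the cascade in this universe)."""
    seen = set(seeds)
    stack = list(seen)
    while stack:
        v = stack.pop()
        for w in live.get(v, []):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def estimate_f(edges, seeds, num_universes=1000, seed=0):
    """Average f_i(S) over num_universes sampled realizations."""
    rng = random.Random(seed)
    total = 0
    for _ in range(num_universes):
        live = sample_live_edges(edges, rng)
        total += len(reachable(live, seeds))  # f_i(S) in universe i
    return total / num_universes

# Toy graphs: one deterministic, one with a single 50/50 edge.
deterministic = {"a": [("b", 1.0)], "b": [("c", 0.0)]}
coin_flip = {"a": [("b", 0.5)]}
print(estimate_f(deterministic, {"a"}), estimate_f(coin_flip, {"a"}))
```

Within a single universe, $f_i(S)$ is the size of a union of reachable sets, which is the kind of function covered by basic fact 2; averaging with non-negative weights $1/|I|$ is the situation covered by basic fact 1.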
