Maximizing the Spread of Influence - PowerPoint PPT Presentation



SLIDE 1

Maximizing the Spread of Influence through a Social Network

By David Kempe, Jon Kleinberg, Eva Tardos
Report by Joe Abrams

SLIDE 2

Social Networks

SLIDE 3

Infectious disease networks

SLIDE 4

Viral Marketing

SLIDE 5

Viral Marketing

  • Example: Hotmail
  • Included service’s URL in every email sent by users
  • Grew from zero to 12 million users in 18 months with a small advertising budget

SLIDE 6

Domingos and Richardson (2001, 2002)

  • Introduction to maximization of influence over social networks
  • Intrinsic Value vs. Network Value
  • Expected Lift in Profit (ELP)
  • Epinions, “web of trust”, 75,000 users and 500,000 edges

SLIDE 7

Domingos and Richardson (2001, 2002)

  • Viral marketing (using greedy hill-climbing strategy) worked very well compared with direct marketing
  • Robust (69% of total lift knowing only 5% of edges)

SLIDE 8

Diffusion Model: Linear Threshold Model

  • Each node (consumer) is influenced by its set of neighbors and has a threshold Θ drawn from the uniform distribution on [0,1]
  • When combined influence reaches threshold, node becomes “active”
  • Active node can now influence its neighbors
  • Weighted edges
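The Linear Threshold bullets above can be sketched as a small simulation. This is a minimal illustration, not code from the paper or the slides: the function name `linear_threshold` and the `weights` layout (each node mapped to its in-neighbors and their edge weights, summing to at most 1) are assumptions made for the example.

```python
import random

def linear_threshold(weights, seeds, rng=None):
    """Run one Linear Threshold cascade and return the final active set.

    weights[v] maps each in-neighbor u of v to the edge weight w(u, v).
    Assumption: every node appears as a key and the weights into any
    node sum to at most 1.
    """
    rng = rng or random.Random()
    # Each node draws its threshold uniformly at random from [0, 1).
    thresholds = {v: rng.random() for v in weights}
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v, in_nbrs in weights.items():
            if v in active:
                continue
            # Combined influence of v's currently active neighbors.
            influence = sum(w for u, w in in_nbrs.items() if u in active)
            # Require some influence so an isolated node never self-activates.
            if influence > 0 and influence >= thresholds[v]:
                active.add(v)
                changed = True
    return active
```

Because activation is monotone (nodes never deactivate in this model), sweeping the nodes repeatedly until nothing changes yields the same final set as stepping through discrete time.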

SLIDE 9

Diffusion Model: Linear Threshold Model

SLIDE 10

Diffusion Model: Independent Cascade Model

  • Each active node has a probability p of activating a neighbor
  • At time t+1, all newly activated nodes try to activate their neighbors
  • Each node gets only one attempt on each target
  • Akin to turn-based strategy game?
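A minimal sketch of the Independent Cascade process described above, assuming a single uniform probability `p` and an adjacency-list graph. The function name `independent_cascade` and the `graph` layout are illustrative choices, not taken from the paper.

```python
import random

def independent_cascade(graph, seeds, p, rng=None):
    """Run one Independent Cascade and return the final active set.

    graph[u] is the list of nodes u can try to activate (assumed layout);
    p is the probability that a single attempt succeeds.
    """
    rng = rng or random.Random()
    active = set(seeds)
    frontier = list(active)  # nodes activated in the previous time step
    while frontier:
        newly_active = []
        for u in frontier:
            for v in graph.get(u, []):
                # u appears in the frontier exactly once, so it gets
                # exactly one attempt on each inactive neighbor v.
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return active
```

With `p = 1.0` the cascade reaches everything reachable from the seeds, and with `p = 0.0` it stops at the seeds, which makes the two extremes handy sanity checks.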

SLIDE 11

Influence Maximization

  • Using greedy hill-climbing strategy, can approximate optimum to within a factor of (1 - 1/e - ε), or ~63%
  • Proven using theories of submodular functions (diminishing returns)
  • Applies to both diffusion models
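One common reading of greedy hill-climbing here is: repeatedly add the node with the largest estimated marginal gain in spread. The sketch below assumes the Independent Cascade model and estimates spread by averaging repeated simulations; the names `greedy_seeds`, `estimate_spread`, and `simulate_ic`, and the default `runs` value, are illustrative assumptions rather than the paper's code.

```python
import random

def simulate_ic(graph, seeds, p, rng):
    """One Independent Cascade run; graph[u] lists the nodes u may activate."""
    active = set(seeds)
    frontier = list(active)
    while frontier:
        newly_active = []
        for u in frontier:
            for v in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return active

def estimate_spread(graph, seeds, p, runs, rng):
    """Average final cascade size over `runs` simulations."""
    return sum(len(simulate_ic(graph, seeds, p, rng)) for _ in range(runs)) / runs

def greedy_seeds(graph, k, p=0.1, runs=200, seed=0):
    """Pick up to k seeds, each time adding the node with the best estimated spread."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(k):
        best_node, best_spread = None, -1.0
        for v in graph:
            if v in chosen:
                continue
            spread = estimate_spread(graph, chosen + [v], p, runs, rng)
            if spread > best_spread:
                best_node, best_spread = v, spread
        if best_node is None:  # fewer than k nodes in the graph
            break
        chosen.append(best_node)
    return chosen
```

The ε in the approximation factor reflects that spread can only be estimated by sampling, so more `runs` per candidate tightens the estimate at a higher cost.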

SLIDE 12

Testing on network data

  • Co-authorship network
  • High-energy physics theory section of www.arxiv.org
  • 10,748 nodes (authors) and ~53,000 edges
  • Multiple co-authored papers listed as parallel edges (greater weight)

SLIDE 13

Testing on network data

  • Linear Threshold: influence weighted by # of parallel edges, inversely weighted by degree of target node: w = c_{u,v} / d_v
  • Independent Cascade: p set at 1% and 10%; total probability of u activating v is 1 - (1 - p)^{c_{u,v}}
  • Weighted Cascade: p = 1 / d_v
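The three parameter settings above reduce to short formulas. The sketch below evaluates them for a single edge; the function names are hypothetical, and it assumes that d_v counts parallel edges, so that the Linear Threshold weights into a node sum to 1.

```python
def lt_weight(c_uv, d_v):
    """Linear Threshold edge weight: w = c_{u,v} / d_v."""
    return c_uv / d_v

def ic_probability(c_uv, p):
    """Independent Cascade: chance that at least one of the c_{u,v}
    parallel edges fires, i.e. 1 - (1 - p)^c_{u,v}."""
    return 1 - (1 - p) ** c_uv

def wc_probability(d_v):
    """Weighted Cascade: per-edge probability p = 1 / d_v."""
    return 1 / d_v

# Example: v co-authored 3 papers with u1 and 1 paper with u2, so d_v = 4.
parallel_edges = {"u1": 3, "u2": 1}
d_v = sum(parallel_edges.values())
lt_weights = {u: lt_weight(c, d_v) for u, c in parallel_edges.items()}
ic_probs = {u: ic_probability(c, 0.10) for u, c in parallel_edges.items()}
```

In this example the Linear Threshold weights are 0.75 and 0.25, and with p = 10% the three parallel edges from u1 give a total activation probability of about 0.271.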

SLIDE 14

Algorithms

  • Greedy hill-climbing
  • High degree: nodes with greatest number of edges
  • Distance centrality: lowest average distance to other nodes
  • Random
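The two structural baselines in the list above can be sketched in a few lines. The function names are illustrative, the graph is assumed to be an undirected adjacency list, and charging unreachable nodes a distance of n (the node count) is an assumption made here so that averages stay finite.

```python
from collections import deque

def high_degree(graph, k):
    """Return the k nodes with the most edges."""
    return sorted(graph, key=lambda v: len(graph[v]), reverse=True)[:k]

def average_distance(graph, source):
    """Average shortest-path distance from source to every other node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    n = len(graph)
    # Assumption: each unreachable node is charged a distance of n.
    total = sum(dist.values()) + n * (n - len(dist))
    return total / (n - 1) if n > 1 else 0.0

def distance_central(graph, k):
    """Return the k nodes with the lowest average distance to other nodes."""
    return sorted(graph, key=lambda v: average_distance(graph, v))[:k]
```

Both heuristics rank nodes once and take the top k, so unlike the greedy strategy they ignore overlap between the audiences of the chosen nodes.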

SLIDE 15

Algorithms

SLIDE 16

Results: Linear Threshold Model

Greedy: ~40% better than central, ~18% better than high degree

SLIDE 17

Results: Weighted Cascade Model

SLIDE 18

Results: Independent Cascade, p = 1%

SLIDE 19

Results: Independent Cascade, p = 10%

SLIDE 20

Advantages of Random Selection

SLIDE 21

Generalized models

  • Generalized Linear Threshold: for node v, influence of neighbors not necessarily sum of individual influences
  • Generalized Independent Cascade: for node v, probability p depends on set of v’s neighbors that have previously tried to activate v
  • Models computationally equivalent, impossible to guarantee approximation

SLIDE 22

Non-Progressive Threshold Model

  • Active nodes can become inactive
  • Similar concept: at each time t, whether or not v becomes/stays active depends on whether influence meets threshold
  • Can “intervene” at different times; need not perform all interventions at t = 0
  • Answer to non-progressive model with graph G equivalent to progressive model with layered graph G^τ
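A rough sketch of how a layered graph G^τ might be built: one copy of every node per time step, with each edge of G pointing from one layer to the next. The function name `layered_graph`, the (node, t) tuple labels, and indexing layers from 0 to τ are assumptions made for illustration.

```python
def layered_graph(graph, tau):
    """Build a layered copy of `graph` with time steps 0..tau.

    graph[u] lists u's neighbors (every neighbor is assumed to be a key).
    Node (v, t) stands for "v at time t"; an edge u -> v in the original
    graph becomes (u, t) -> (v, t + 1) for each t < tau.
    """
    layered = {(v, t): [] for v in graph for t in range(tau + 1)}
    for t in range(tau):
        for u, nbrs in graph.items():
            for v in nbrs:
                layered[(u, t)].append((v, t + 1))
    return layered
```

Under this construction, intervening on node v at time t would correspond to seeding the copy `(v, t)` in the layered graph.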

SLIDE 23

General Marketing Strategies

  • Can divide up total budget κ into equal increments of size δ
  • For greedy hill-climbing strategy, can guarantee performance within factor of 1 - e^{-κγ / (κ + δn)}
  • As δ decreases relative to κ, result approaches 1 - e^{-1} ≈ 63%
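The guarantee above is easy to evaluate numerically. The slide does not define γ or n, so the values used below (γ = 1, n = 1000) are placeholders chosen only to show the trend; the function name `greedy_guarantee` is likewise illustrative.

```python
import math

def greedy_guarantee(kappa, delta, gamma, n):
    """Evaluate 1 - e^{-kappa*gamma / (kappa + delta*n)}."""
    return 1 - math.exp(-(kappa * gamma) / (kappa + delta * n))

# Shrinking the increment delta pushes the bound toward 1 - 1/e when gamma = 1.
for delta in (10, 1, 0.1, 0.001):
    bound = greedy_guarantee(kappa=100, delta=delta, gamma=1.0, n=1000)
    print(f"delta={delta}: guarantee = {bound:.3f}")
```

With these placeholder values the exponent tends to -γ as δn becomes small relative to κ, which is where the 63% figure comes from when γ = 1.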

SLIDE 24

Strengths of paper

  • Showed results in two complementary fashions: theoretical models and test results using a real dataset
  • Demonstrated that greedy hill-climbing strategy could guarantee results of at least ~63% of optimum
  • Used specific and generalized versions of two different diffusion models

SLIDE 25

Weaknesses of paper

  • Doesn’t fully explain methodology of greedy hill-climbing strategy
  • Lots of work not shown; simply refers to work done in other papers
  • Threshold value uniformly distributed?
  • Influence inversely weighted by degree of target?

SLIDE 26

Questions?