CS224W: Social and Information Network Analysis
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
Probabilistic models of network contagion
How contagions diffuse in real life:
- Viral marketing
- Blogs
- Group membership
10/13/2009 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 2
How do viruses/rumors propagate?
Will a flu-like virus linger, or will it become extinct?
(Virus) birth rate β:
- probability that an infected neighbor attacks
(Virus) death rate δ:
- probability that an infected node heals
[Figure: nodes N1, N2, N3 in healthy and infected states; an infected neighbor attacks with prob. β, an infected node heals with prob. δ]
General scheme for epidemic models:
S…susceptible, E…exposed, I…infected, R…recovered, Z…immune
Assuming perfect mixing, i.e., the network is a complete graph
The model dynamics:
[Plot: number of nodes vs. time for Susceptible, Infected and Recovered]
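The SIR equations themselves were not preserved in this copy of the slide. A minimal sketch, assuming the standard mean-field form with the birth rate β and death rate δ defined earlier: dS/dt = -βSI, dI/dt = βSI - δI, dR/dt = δI, where S, I and R are population fractions. The function name `simulate_sir`, the forward-Euler scheme and the step size `dt` are illustrative choices, not part of the lecture.

```python
def simulate_sir(beta, delta, s0, i0, dt=0.01, steps=10_000):
    """Forward-Euler integration of the (assumed) mean-field SIR equations.

    S, I, R are population fractions; returns a list of (S, I, R) tuples,
    one per time step, starting with the initial state.
    """
    s, i, r = s0, i0, 1.0 - s0 - i0
    trajectory = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i * dt  # flow S -> I
        recoveries = delta * i * dt         # flow I -> R
        s -= new_infections
        i += new_infections - recoveries
        r += recoveries
        trajectory.append((s, i, r))
    return trajectory


if __name__ == "__main__":
    traj = simulate_sir(beta=0.5, delta=0.1, s0=0.99, i0=0.01)
    peak_i = max(state[1] for state in traj)
    print(f"peak infected fraction:   {peak_i:.3f}")
    print(f"final recovered fraction: {traj[-1][2]:.3f}")
```

Forward Euler is the simplest integrator and is adequate here because dt is small relative to 1/β and 1/δ; a stiffer parameter regime would call for a smaller step or a higher-order method.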
Susceptible-Infective-Susceptible (SIS) model
Cured nodes immediately become susceptible
Virus “strength”: s = β / δ
[Figure: Susceptible → Infective (infected by neighbor with prob. β); Infective → Susceptible (cured internally with prob. δ)]
Assuming perfect mixing (complete graph):
$$\frac{dS}{dt} = -\beta S I + \delta I, \qquad \frac{dI}{dt} = \beta S I - \delta I$$
[Plot: number of nodes vs. time for Susceptible and Infected]
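Setting dI/dt = 0 in the SIS equations dS/dt = -βSI + δI, dI/dt = βSI - δI, with S = N - I, gives I(β(N - I) - δ) = 0. So the mean-field model either dies out (I = 0) or settles at I* = N - δ/β, which is positive only when βN > δ. The sketch below checks that fixed point numerically, assuming S and I are node counts and using forward Euler with a hypothetical step size; the function names are illustrative.

```python
def simulate_sis(beta, delta, n, i0, dt=0.01, steps=100_000):
    """Forward-Euler integration of the mean-field SIS equations.

    dS/dt = -beta*S*I + delta*I and dI/dt = beta*S*I - delta*I, with
    S + I = n conserved. Returns the final (S, I).
    """
    s, i = float(n - i0), float(i0)
    for _ in range(steps):
        flow = (beta * s * i - delta * i) * dt  # net S -> I flow per step
        s -= flow
        i += flow
    return s, i


def sis_fixed_point(beta, delta, n):
    """Endemic level I* = n - delta/beta if beta*n > delta, else 0 (virus dies out)."""
    return max(n - delta / beta, 0.0)


if __name__ == "__main__":
    for delta in (0.05, 0.2):
        s, i = simulate_sis(beta=0.001, delta=delta, n=100, i0=1)
        predicted = sis_fixed_point(0.001, delta, 100)
        print(f"delta={delta}: simulated I={i:.2f}, predicted I*={predicted:.2f}")
```

With β = 0.001 and N = 100, δ = 0.05 settles at I* = 50, while δ = 0.2 violates βN > δ and the infection dies out.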
Representing SIS epidemic as an SIR model
Epidemic threshold of a graph is a value of τ, such that:
- If strength s = β / δ < τ, an epidemic cannot happen (it eventually dies out)
Given a graph, compute its epidemic threshold
What should τ depend on?
- Avg. degree? And/or highest degree?
- And/or variance of degree?
- And/or third moment of degree?
- And/or diameter?
[Wang et al. 2003]
We have no epidemic if:
$$\frac{\beta}{\delta} < \tau = \frac{1}{\lambda_{1,A}}$$
- β: (virus) birth rate
- δ: (virus) death rate
- τ: epidemic threshold
- λ1,A: largest eigenvalue of adj. matrix A
► λ1,A alone captures the property of the graph!
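To compute τ = 1/λ1,A for a given graph, one option is power iteration on the adjacency matrix. The sketch below assumes an undirected graph stored as neighbor lists (a hypothetical input format) and iterates on A + I, a standard shift that keeps the method converging on bipartite graphs, whose spectrum contains both +λ1 and -λ1. For large graphs a sparse eigensolver would be the usual choice; this is only an illustration.

```python
import math


def largest_eigenvalue(adj, iters=10_000, tol=1e-12):
    """Largest adjacency eigenvalue via power iteration on A + I.

    `adj` is a list of neighbor lists for an undirected graph.
    """
    n = len(adj)

    def shifted_matvec(x):
        # Computes (A + I) x using the neighbor lists.
        return [x[v] + sum(x[u] for u in adj[v]) for v in range(n)]

    x = [1.0 / math.sqrt(n)] * n  # unit start vector, not orthogonal to the Perron vector
    mu = 0.0
    for _ in range(iters):
        y = shifted_matvec(x)
        norm = math.sqrt(sum(val * val for val in y))
        x = [val / norm for val in y]
        new_mu = sum(a * b for a, b in zip(x, shifted_matvec(x)))  # Rayleigh quotient
        if abs(new_mu - mu) < tol:
            mu = new_mu
            break
        mu = new_mu
    return mu - 1.0  # undo the +I shift


def epidemic_threshold(adj):
    """tau = 1 / lambda_{1,A}, the threshold in the no-epidemic condition."""
    return 1.0 / largest_eigenvalue(adj)


if __name__ == "__main__":
    k4 = [[1, 2, 3], [0, 2, 3], [0, 1, 3], [0, 1, 2]]
    star = [[1, 2, 3, 4], [0], [0], [0], [0]]
    print(epidemic_threshold(k4), epidemic_threshold(star))
```

For the complete graph on 4 nodes λ1,A = 3, so τ = 1/3; for a star with 4 leaves λ1,A = 2 and τ = 1/2.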
[Wang et al. 2003]
[Plot: number of infected nodes vs. time on the Oregon graph (10,900 nodes and 31,180 edges), β = 0.001, δ = 0.05, 0.06, 0.07; curves for β/δ > τ (above threshold), β/δ = τ (at the threshold) and β/δ < τ (below threshold)]
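A curve like the Oregon plot can be produced by stochastic simulation. This is a minimal sketch, assuming a synchronous discrete-time update in which each infected node attacks each susceptible neighbor independently with prob. β and then heals with prob. δ; the update order, the function name and the `seed` parameter are assumptions, not the exact setup of [Wang et al. 2003].

```python
import random


def simulate_sis_graph(adj, beta, delta, initially_infected, steps, seed=0):
    """Discrete-time SIS on a graph; returns the number of infected nodes per step.

    Assumed synchronous update: each infected node attacks each susceptible
    neighbor independently with prob. beta, then heals with prob. delta.
    """
    rng = random.Random(seed)
    infected = set(initially_infected)
    counts = [len(infected)]
    for _ in range(steps):
        next_infected = set()
        for v in infected:
            # Attack every susceptible neighbor with probability beta.
            for u in adj[v]:
                if u not in infected and rng.random() < beta:
                    next_infected.add(u)
            # Stay infected unless cured with probability delta.
            if rng.random() >= delta:
                next_infected.add(v)
        infected = next_infected
        counts.append(len(infected))
    return counts


if __name__ == "__main__":
    ring = [[(v - 1) % 50, (v + 1) % 50] for v in range(50)]
    print(simulate_sis_graph(ring, beta=0.3, delta=0.1, initially_infected={0}, steps=20))
```

Averaging the returned counts over many seeds, with β/δ set above and below 1/λ1,A of the chosen graph, should show the qualitative contrast in the plot: the infection persists above the threshold and dies out below it.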
Does it matter how many people are initially infected?
Prob. of adoption depends on the number of friends who have adopted [Bass ’69, Granovetter ’78]
- What is the shape?
- Distinction has consequences for models and algorithms
[Plots: prob. of adoption vs. k = number of friends adopting, for two candidate shapes]
Diminishing returns? Critical mass?
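The two candidate shapes can be made concrete with two hypothetical functions of k. Diminishing returns is illustrated by 1 - (1 - p)^k, the adoption probability when each adopting friend independently persuades with prob. p; critical mass is illustrated by a logistic S-curve centered at a hypothetical k0. Neither functional form nor the parameter values come from the lecture; the point is only the behavior of the marginal effect.

```python
import math


def diminishing_returns(k, p=0.1):
    """Each adopting friend independently persuades with prob. p: 1 - (1 - p)^k (concave in k)."""
    return 1.0 - (1.0 - p) ** k


def critical_mass(k, k0=5.0, steepness=1.5):
    """Logistic S-curve: little effect until about k0 friends adopt, then a sharp rise."""
    return 1.0 / (1.0 + math.exp(-steepness * (k - k0)))


def increments(f, k_max=10):
    """Marginal effect of the (k+1)-th adopting friend, for k = 0 .. k_max - 1."""
    return [f(k + 1) - f(k) for k in range(k_max)]


if __name__ == "__main__":
    print([round(d, 3) for d in increments(diminishing_returns)])
    print([round(d, 3) for d in increments(critical_mass)])
```

For the concave curve the marginal gain from each extra friend shrinks from the start; for the S-curve it grows until k is about k0 and shrinks afterwards.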
[Leskovec et al., TWEB ’07]
Senders and followers of recommendations receive discounts on products (10% credit, 10% off)
- Data – Incentivized Viral Marketing program
  - 16 million recommendations
  - 4 million people
  - 500,000 products
[Backstrom et al., KDD ’06]
Use social networks where people belong to explicitly defined groups
Each group defines a behavior that diffuses
Data – LiveJournal:
- On-line blogging community with friendship links and user-defined groups
- Over a million users update content each month
- Over 250,000 groups to join
[Leskovec et al., TWEB ’07]
DVD recommendations (8.2 million observations)
[Plot: probability of purchasing vs. # recommendations received (up to 40)]
[Backstrom et al., KDD ’06]
LiveJournal community membership
[Plot: prob. of joining vs. k (number of friends in the community)]
For viral marketing:
- We see that node v received the i-th recommendation and then purchased the product
For communities:
- At time t we see the behavior of node v’s friends
Questions:
- When did v become aware of recommendations or friends’ behavior?
- When did it translate into a decision by v to act?
- How long after this decision did v act?
Large anonymous online retailer (June 2001 to May 2003)
- 15,646,121 recommendations
- 3,943,084 distinct customers
- 548,523 products recommended
Products belonging to 4 product groups:
- books
- DVDs
- music
- VHS
[Figure: recommendation network; legend: purchase following a recommendation, customer recommending a product, customer not buying a recommended product]
The majority of recommendations cause neither purchases nor propagation
Notice many star-like patterns
Many disconnected components
[Figure: recommendations ordered in time, t1 < t2 < … < tn; legend: bought but didn’t receive a discount; bought and received a discount; received a recommendation but didn’t buy]
What role does the product category play?

group   products   customers   recommendations   edges       buy + get discount   buy + no discount
Book    103,161    2,863,977   5,741,611         2,097,809   65,344               17,769
DVD     19,829     805,285     8,180,393         962,341     17,232               58,189
Music   393,598    794,148     1,443,847         585,738     7,837                2,739
Video   26,131     239,583     280,270           160,683     909                  467
Full    542,719    3,943,084   15,646,121        3,153,676   91,322               79,164
There are relatively few DVD titles, but DVDs account for ~ 50% of recommendations.
Recommendations per person
- DVD: 10
- books and music: 2
- VHS: 1
Recommendations per purchase
- books: 69
- DVDs: 108
- music: 136
- VHS: 203
Overall there are 3.69 recommendations per node on 3.85 different products.
Music recommendations reached about the same number of people as DVDs but used only 1/5 as many recommendations
Book recommendations reached by far the most people – 2.8 million.
All networks have a very small number of unique edges. For books, videos and music the number of unique edges is smaller than the number of nodes – the networks are highly disconnected
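Most of the per-person and per-purchase figures above can be recomputed from the product-category table. In this sketch, "person" is taken to mean a customer in the table and "purchase" the sum of the two buy columns; both readings are assumptions, but they reproduce the quoted numbers. The names `TABLE` and `summarize` are illustrative.

```python
# Columns from the product-category table: customers, recommendations,
# edges, purchases with discount, purchases without discount.
TABLE = {
    "Book": (2_863_977, 5_741_611, 2_097_809, 65_344, 17_769),
    "DVD": (805_285, 8_180_393, 962_341, 17_232, 58_189),
    "Music": (794_148, 1_443_847, 585_738, 7_837, 2_739),
    "Video": (239_583, 280_270, 160_683, 909, 467),
}


def summarize(table):
    """Recommendations per customer and per purchase, and whether edges < customers."""
    rows = {}
    for group, (customers, recs, edges, buy_disc, buy_no_disc) in table.items():
        purchases = buy_disc + buy_no_disc
        rows[group] = {
            "recs_per_person": recs / customers,
            "recs_per_purchase": recs / purchases,
            "edges_below_nodes": edges < customers,
        }
    return rows


if __name__ == "__main__":
    for group, row in summarize(TABLE).items():
        print(group, {k: round(v, 2) if isinstance(v, float) else v for k, v in row.items()})
```

The per-purchase ratios come out as about 69.1, 108.5, 136.5 and 203.7, so the slide's figures match when truncated rather than rounded. DVDs are the only group whose unique edges outnumber its customers (962,341 vs. 805,285).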
Does sending more recommendations influence more purchases?
[Plots: number of purchases vs. outgoing recommendations, for BOOKS and DVDs]
What is the effectiveness of subsequent recommendations?
[Plots: probability of buying vs. exchanged recommendations, for BOOKS and DVDs]
Consider successful recommendations in terms of
- av. # senders of recommendations per book category
- av. # of recommendations accepted
Books overall have a 3% success rate
- (2% with discount, 1% without)
Lower than average success rate, in % (significant at p=0.01 level)
- fiction
  - romance (1.78), horror (1.81)
  - teen (1.94), children’s books (2.06)
  - comics (2.30), sci-fi (2.34), mystery and thrillers (2.40)
- nonfiction
  - sports (2.26)
  - home & garden (2.26)
  - travel (2.39)
Higher than average success rate, in % (statistically significant)
- professional & technical
  - medicine (5.68)
  - professional & technical (4.54)
  - engineering (4.10), science (3.90), computers & internet (3.61)
  - law (3.66), business & investing (3.62)
47,000 customers responsible for 2.5 million out of the 16 million recommendations in the system
- 29% success rate per recommender of an anime DVD
- Giant component covers 19% of the nodes
Overall, recommendations for DVDs are more likely to result in a purchase (7%), but the anime community stands out
Variable            Transformation   Coefficient
const                                -0.940 ***
# recommendations   ln(r)             0.426 ***
# senders           ln(ns)           -0.782 ***
# recipients        ln(nr)           -1.307 ***
product price       ln(p)             0.128 ***
# reviews           ln(v)            -0.011 ***
avg. rating         ln(t)            -0.027 *
R²                                    0.74
Significance at the 0.01 (***), 0.05 (**) and 0.1 (*) levels
[Plots: growth of the network and of its giant component by month m, each with a quadratic fit]
94% of users make their first recommendation without having received one previously
Size of giant connected component increases from 1% to 2.5% of the network (100,420 users) – small!
Some sub-communities are better connected
- 24% out of 18,000 users for westerns on DVD
- 26% of 25,000 for classics on DVD
- 19% of 47,000 for anime (Japanese animated film) on DVD
Others are just as disconnected
- 3% of 180,000 home and gardening
- 2-7% for children’s and fitness DVDs
Products suited for Viral Marketing:
- small and tightly knit community
- few reviews, senders, and recipients
  - but sending more recommendations helps
- pricey products
- rating doesn’t play as much of a role
Observations for future diffusion models:
- purchase decision more complex than threshold or simple infection
- influence saturates as the number of contacts expands
- links lose effectiveness if they are overused
Conditions for successful recommendations:
- professional and organizational contexts
- discounts on expensive items
- small, tightly knit communities
How big are cascades? What are the building blocks of cascades?
[Figure: example cascades for a medical guide book and a DVD]
Given a (social) network
A process spreading over the network creates a graph (a tree)
[Figure: social network and the cascade (propagation graph) it carries]
Let’s count cascades
General observations:
- DVDs have the richest cascades (most recommendations, most densely linked)
- Books have small cascades
- Music is 3 times larger than video but does not have much variety in cascades

group   number of cascades   vocabulary size (all “words”)
Book    122,657              959
DVD     289,055              87,614
Music   13,330               158
Video   1,928                109
[Figure: small cascade shapes (pictograms not preserved); each [shape] below stands for one]
[shape] is the most common cascade subgraph
- It accounts for ~75% of cascades in books, CD and VHS, only 12% of DVD cascades
[shape] is 6 (1.2 for DVD) times more frequent than [shape]
For DVDs, [shape] is more frequent than [shape]
Chains are more frequent than [shape]
[shape] is more frequent than a collision (but the collision has fewer edges)
A late split is more frequent than [shape]
Stars (“no propagation”)
Bipartite cores (“common friends”)
Nodes having same friends
Delete late recommendations
Count how many people are in a single cascade
Exclude nodes that did not buy
[Plot (books): count vs. cascade size; fit = 1.8e6 x^-4.98; steep drop-off, very few large cascades]
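The three preprocessing steps can be sketched as follows, under an assumed reading: recommendations are (sender, recipient, time) records, a recommendation is "late" if it arrives after the recipient has already bought, and a cascade is a connected group of buyers that remains after filtering. The input format and the function name `cascade_sizes` are hypothetical; a union-find structure keeps the grouping step cheap.

```python
from collections import Counter


def cascade_sizes(recommendations, purchase_time):
    """Sizes of cascades built from time-stamped recommendations (largest first).

    recommendations: iterable of (sender, recipient, time) tuples.
    purchase_time: dict mapping each buyer to the time of purchase.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for sender, recipient, t in recommendations:
        if sender not in purchase_time or recipient not in purchase_time:
            continue  # exclude nodes that did not buy
        if t > purchase_time[recipient]:
            continue  # delete late recommendations
        parent[find(sender)] = find(recipient)  # merge the two groups

    sizes = Counter(find(v) for v in list(parent))
    return sorted(sizes.values(), reverse=True)


if __name__ == "__main__":
    recs = [("a", "b", 1), ("b", "c", 2), ("c", "d", 10), ("x", "y", 1)]
    bought = {"a": 0, "b": 1.5, "c": 3, "d": 5, "x": 0, "y": 2}
    print(cascade_sizes(recs, bought))
```

Buyers with no qualifying recommendation edge do not appear in the output, so singleton cascades are not counted in this sketch.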
DVD cascades can grow large
Possibly as a result of websites where people sign up to exchange recommendations
[Plot (DVDs): count vs. x = cascade size (number of nodes); fit ~ x^-1.56; shallow drop-off (fat tail), a number of large cascades]
[Leskovec et al., SDM ’07]
[Figure: blog posts ordered in time and connected by hyperlinks form an information cascade]
Data – Blogs:
- We crawled 45,000 blogs for 1 year
- 10 million posts and 350,000 cascades
Cascade shapes (ranked by frequency)
The probability of observing a cascade on n nodes follows a Zipf distribution: p(n) ~ n^-2
[Plot: count vs. x = cascade size (number of nodes)]
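The slides do not state how the cascade-size exponents were fitted. One common alternative to fitting a line on a log-log plot is the continuous maximum-likelihood estimator α̂ = 1 + n / Σ ln(x_i / x_min), sketched below on a synthetic sample. Cascade sizes are integers, so for real data this continuous form is only an approximation (discrete variants exist); the function names and the synthetic data are illustrative.

```python
import math
import random


def powerlaw_alpha_mle(samples, x_min):
    """Continuous maximum-likelihood estimate of alpha for p(x) ~ x^-alpha, x >= x_min."""
    tail = [x for x in samples if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)


def sample_powerlaw(alpha, x_min, n, seed=0):
    """Inverse-transform sampling from a continuous power law with exponent alpha > 1."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], so every sample is >= x_min.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


if __name__ == "__main__":
    data = sample_powerlaw(alpha=2.0, x_min=1.0, n=100_000)
    print(f"estimated alpha: {powerlaw_alpha_mle(data, 1.0):.3f}")
```

The standard error of this estimate is roughly (α - 1)/√n, so 100,000 synthetic samples pin α = 2 down to within a few thousandths.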
Most cascades are trees
[Plots: number of edges and effective diameter vs. cascade size (number of nodes)]
Number of cascades per node also follows a power-law distribution.
[Plot: count vs. number of joined cascades per node]
Cascade sizes follow a heavy-tailed distribution
- Viral marketing:
  - Books: steep drop-off: power-law exponent -5
  - DVDs: larger cascades: exponent -1.5
- Blogs:
  - Power-law exponent -2
What’s a good model?
- What role does the underlying social network play?
- Can we make a step towards a more realistic cascade generation (propagation) model?
1) Randomly pick a blog to infect; add it to the cascade.
2) Infect each in-linked neighbor with probability β.
3) Add infected neighbors to the cascade.
4) Set the node infected in (i) to uninfected.
[Figure: the four steps illustrated on blogs B1 to B4]
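The four steps translate into a short simulation. This is a minimal sketch, assuming `in_links[v]` lists the blogs that link to blog v, that a node infected in one round attacks only in the next round and is then uninfected again (step 4), and that a step cap `max_steps` (hypothetical) guards against endless reinfection cycles. The function name and input format are illustrative.

```python
import random


def generate_cascade(in_links, beta, start=None, max_steps=10_000, seed=0):
    """Sketch of the blog cascade generation model.

    in_links[v] lists the blogs that link to v (its in-linked neighbors).
    Returns (cascade_nodes, cascade_edges), edges as (source, infected_neighbor).
    """
    rng = random.Random(seed)
    # 1) Randomly pick a blog to infect and add it to the cascade.
    infected = [start if start is not None else rng.choice(list(in_links))]
    cascade_nodes = set(infected)
    cascade_edges = set()
    for _ in range(max_steps):  # step cap is an assumption to guarantee termination
        if not infected:
            break
        next_infected = []
        for v in infected:
            # 2) Infect each in-linked neighbor with probability beta.
            for u in in_links[v]:
                if rng.random() < beta:
                    # 3) Add the infected neighbor to the cascade.
                    cascade_nodes.add(u)
                    cascade_edges.add((v, u))
                    next_infected.append(u)
        # 4) Nodes infected in this round become uninfected; only new ones spread next.
        infected = list(dict.fromkeys(next_infected))
    return cascade_nodes, cascade_edges


if __name__ == "__main__":
    links = {0: [1, 2], 1: [3], 2: [3, 4], 3: [], 4: [0]}
    print(generate_cascade(links, beta=0.5, start=0))
```

Because step 4 lets a blog be infected again later, the returned edge set need not form a tree; the plots that follow use β = 0.025.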
Generative model produces realistic cascades
[Plots (β=0.025): count vs. cascade size; count vs. cascade node in-degree; most frequent cascades; size of star cascade; size of chain cascade]
Blogs – information epidemics
- Which are the influential/infectious blogs?
Viral marketing
- Who are the trendsetters?
- Influential people?
Disease spreading
- Where to place monitoring stations to detect epidemics?