Cascades and Contagion, Prof. Srijan Kumar



SLIDE 1

Srijan Kumar, Georgia Tech, CSE6240 Spring 2020: Web Search and Text Mining

CSE 6240: Web Search and Text Mining. Spring 2020

Cascades and Contagion

  • Prof. Srijan Kumar

http://cc.gatech.edu/~srijan

SLIDE 2

Today’s Lecture

  • Introduction
  • Decision-based models of diffusion
– Single Adoption
– Multiple Adoption
  • Probabilistic models of diffusion
– SEIR model
– Independent cascade model

These slides are borrowed from Prof. Jure Leskovec’s CS224W class.

SLIDE 3

Epidemics vs Cascade Spreading

  • In decision-based models, nodes make decisions based on the pay-off benefits of adopting one strategy or the other.
  • In epidemic spreading:
– Lack of decision making
– Process of contagion is complex and unobservable
  • In some cases it involves (or can be modeled as) randomness

SLIDE 4

Simple model: Branching Process

  • First wave: A person carrying a disease enters the population and transmits it to everyone she meets with probability q. She meets d people, a portion of whom will be infected.
  • Second wave: Each of the d people goes and meets d different people. So we have a second wave of d ∗ d = d² people, a portion of whom will be infected.
  • Subsequent waves: same process
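
The wave-by-wave process above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the transmission probability is called q and the number of contacts per person d, the function names are hypothetical, and the simulation assumes that only people who were actually infected in one wave go on to meet d new people in the next.

```python
import random

def expected_wave_sizes(q, d, waves):
    """Expected number of infected people in waves 1..waves, i.e. (q*d)**n."""
    return [(q * d) ** n for n in range(1, waves + 1)]

def simulate_branching(q, d, waves, rng):
    """Simulate one outbreak and return the number infected in each wave."""
    infected = 1  # the person who first carries the disease
    sizes = []
    for _ in range(waves):
        contacts = infected * d  # each infected person meets d new people
        # each contact is infected independently with probability q
        infected = sum(1 for _ in range(contacts) if rng.random() < q)
        sizes.append(infected)
    return sizes

rng = random.Random(0)
print(expected_wave_sizes(q=0.5, d=3, waves=4))
print(simulate_branching(q=0.5, d=3, waves=4, rng=rng))
```

Under these assumptions the expected wave size grows when q*d > 1 and shrinks when q*d < 1.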
SLIDE 5

Example with k=3

SLIDE 6

Spreading Models of Viruses

Virus Propagation: 2 Parameters:

  • (Virus) Birth rate β:
– Probability that an infected neighbor attacks
  • (Virus) Death rate δ:
– Probability that an infected node heals

[Figure: infected and healthy nodes N, N1, N2, N3, labeled with Prob. β and Prob. δ]
SLIDE 7

More Generally: S+E+I+R Models

  • General scheme for epidemic models:
– Each node can go through phases: S (susceptible), E (exposed), I (infected), R (recovered), Z (immune)
– Transition probs. are governed by the model parameters

SLIDE 8

SIR Model

  • SIR model: Node goes through phases Susceptible → Infected → Recovered (S → I with prob. β, I → R with prob. δ)
– Models chickenpox or plague:
  • Once you heal, you can never get infected again
  • Assuming perfect mixing (the network is a complete graph), the model dynamics are:

dS/dt = −βSI
dI/dt = βSI − δI
dR/dt = δI

[Figure: number of nodes vs. time, with curves S(t), I(t), R(t)]
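
To see how the SIR dynamics on this slide behave, here is a minimal forward-Euler sketch in Python. It is an illustration under stated assumptions, not the lecture's code: S, I, R are treated as population fractions that sum to 1, and the function name, step size, and parameter values are made up for the example.

```python
def simulate_sir(beta, delta, s0, i0, r0, dt, steps):
    """Forward-Euler integration of
    dS/dt = -beta*S*I, dI/dt = beta*S*I - delta*I, dR/dt = delta*I."""
    s, i, r = s0, i0, r0
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i * dt  # flow from S to I
        recoveries = delta * i * dt         # flow from I to R
        s -= new_infections
        i += new_infections - recoveries
        r += recoveries
        history.append((s, i, r))
    return history

# Illustrative parameters only (assumed, not taken from the slides).
history = simulate_sir(beta=0.5, delta=0.1, s0=0.99, i0=0.01, r0=0.0,
                       dt=0.1, steps=1000)
s_end, i_end, r_end = history[-1]
print(f"final S={s_end:.3f} I={i_end:.3f} R={r_end:.3f}")
```

Because every unit leaving one compartment enters another, S + I + R stays constant throughout the run.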

SLIDE 9

SIS Model

  • Susceptible-Infective-Susceptible (SIS) model
  • Cured nodes immediately become susceptible
  • Virus “strength”: s = β / δ
  • Node state transition diagram:
– Susceptible → Infective: infected by neighbor with prob. β
– Infective → Susceptible: cured with prob. δ
SLIDE 10

SIS Model

  • Models flu:
– Susceptible node becomes infected
– The node then heals and becomes susceptible again
  • Assuming perfect mixing (a complete graph):

dS/dt = −βSI + δI
dI/dt = βSI − δI

[Figure: number of nodes vs. time, with curves S(t) and I(t)]
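
A small Python sketch can illustrate the perfect-mixing SIS dynamics. The assumptions here are mine, not the slides': S and I are population fractions with S + I = 1, integration is plain forward Euler, and the function name and parameter values are hypothetical. Under those assumptions, setting dI/dt = 0 gives an endemic level I* = 1 − δ/β when β > δ, and the infection dies out otherwise.

```python
def simulate_sis(beta, delta, i0, dt, steps):
    """Forward-Euler integration of
    dS/dt = -beta*S*I + delta*I, dI/dt = beta*S*I - delta*I."""
    s, i = 1.0 - i0, i0
    for _ in range(steps):
        net_flow = (beta * s * i - delta * i) * dt  # net change in I
        s -= net_flow
        i += net_flow
    return s, i

# Illustrative parameters only (assumed).
s_hi, i_hi = simulate_sis(beta=0.5, delta=0.2, i0=0.01, dt=0.1, steps=2000)
s_lo, i_lo = simulate_sis(beta=0.1, delta=0.2, i0=0.01, dt=0.1, steps=2000)
print(f"beta > delta: I settles near {i_hi:.3f}")
print(f"beta < delta: I decays to {i_lo:.2e}")
```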

SLIDE 11

Question: Epidemic threshold τ

  • SIS Model: The epidemic threshold of an arbitrary graph G is τ, such that:
– If virus “strength” s = β / δ < τ, the epidemic cannot happen (it eventually dies out)
  • Given a graph, what is its epidemic threshold?

SLIDE 12

Epidemic Threshold in SIS Model

  • Fact: We have no epidemic if:

β/δ < τ = 1/λ1,A

– β: (virus) birth rate
– δ: (virus) death rate
– τ: epidemic threshold
– λ1,A: largest eigenvalue of the adjacency matrix A of G

► λ1,A alone captures the property of the graph!

[Wang et al. 2003]
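
As a rough illustration of this condition, the sketch below estimates the largest eigenvalue of an adjacency matrix with power iteration and compares the virus strength against its reciprocal. Everything here is an assumption made for the example: the graph is undirected and connected, the matrix is a plain list of lists, and the function names are hypothetical. Power iteration is run on A + I (a standard shift that avoids oscillation on bipartite graphs) and 1 is subtracted at the end.

```python
import math

def largest_eigenvalue(adj, iterations=500):
    """Estimate the largest eigenvalue of a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    x = [1.0 / math.sqrt(n)] * n  # unit-length start vector
    eig = 0.0
    for _ in range(iterations):
        # y = (A + I) x
        y = [x[r] + sum(adj[r][c] * x[c] for c in range(n)) for r in range(n)]
        eig = sum(a * b for a, b in zip(x, y))  # Rayleigh quotient of A + I
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return eig - 1.0  # undo the shift

def epidemic_threshold(adj):
    """tau = 1 / (largest eigenvalue of the adjacency matrix)."""
    return 1.0 / largest_eigenvalue(adj)

def dies_out(beta, delta, adj):
    """True if the virus strength beta/delta is below the threshold."""
    return beta / delta < epidemic_threshold(adj)

# Complete graph on 4 nodes: largest eigenvalue 3, threshold 1/3.
k4 = [[0 if r == c else 1 for c in range(4)] for r in range(4)]
print(epidemic_threshold(k4), dies_out(0.1, 0.5, k4), dies_out(0.2, 0.5, k4))
```

For large graphs a sparse eigen-solver would be the usual choice; the dense loop above is only meant to make the condition concrete.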

SLIDE 13

Experiments on a Small Graph

[Figure: number of infected nodes vs. time on the Oregon Autonomous Systems graph (10,900 nodes and 31,180 edges), with β = 0.001 and δ = 0.05, 0.06, 0.07; curves are labeled s = β/δ > τ (above threshold), s = β/δ = τ (at the threshold), and s = β/δ < τ (below threshold)]

[Wang et al. 2003]

SLIDE 14

Experiments

  • Does it matter how many people are initially infected?

SLIDE 15

Modeling Ebola with SEIR

[Gomes et al., 2014]

[Gomes et al., Assessing the International Spreading Risk Associated with the 2014 West African Ebola Outbreak, PLOS Current Outbreaks, ‘14]

SLIDE 16

Example: Ebola

S: susceptible individuals, E: exposed individuals, I: infectious cases in the community, H: hospitalized cases, F: dead but not yet buried, R: individuals no longer transmitting the disease

[Gomes et al., Assessing the International Spreading Risk Associated with the 2014 West African Ebola Outbreak, PLOS Current Outbreaks, ‘14]

SLIDE 17

Application: Rumor spread modeling using SEIZ model

References:
1. Epidemiological Modeling of News and Rumors on Twitter. Jin et al., SNAKDD 2013
2. False Information on Web and Social Media: A survey. Kumar et al., arXiv:1804.08559

SLIDE 18

SEIZ model: Extension of SIS model

SLIDE 19

Recap: SIS model

SLIDE 20

Details of the SEIZ model

Notation:

– S = Susceptible
– I = Infected
– E = Exposed
– Z = Skeptics

SLIDE 21

Dataset

Tweets collected from eight stories: four rumors and four real events

[Table with two columns: REAL EVENTS, RUMORS]

SLIDE 22

Method: Fitting SEIZ model to data

  • SEIZ model is fit to each cascade to minimize the difference |I(t) – tweets(t)|:
– tweets(t) = number of rumor tweets
– I(t) = the estimated number of rumor tweets by the model
  • Use grid-search and find the parameters with minimum error
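
The fitting procedure, grid search over parameters to minimize the gap between the model's estimate and the observed tweet counts, can be sketched as follows. The SEIZ equations are not spelled out in this transcript, so the sketch deliberately uses a stand-in: a discrete-time SIS-style model on population fractions with two parameters. The model, the synthetic "observed" curve, the grid, and all function names are assumptions made for illustration only.

```python
def model_curve(beta, delta, i0, steps):
    """Stand-in model (NOT SEIZ): discrete-time SIS on population fractions."""
    i = i0
    curve = []
    for _ in range(steps):
        curve.append(i)
        i = i + beta * (1.0 - i) * i - delta * i
    return curve

def fit_error(predicted, observed):
    """Sum over t of |I(t) - tweets(t)|."""
    return sum(abs(p - o) for p, o in zip(predicted, observed))

def grid_search(observed, betas, deltas, i0):
    """Try every (beta, delta) pair and keep the one with minimum error."""
    best = None
    for beta in betas:
        for delta in deltas:
            err = fit_error(model_curve(beta, delta, i0, len(observed)), observed)
            if best is None or err < best[0]:
                best = (err, beta, delta)
    return best

grid = [k / 20 for k in range(1, 20)]  # candidate values 0.05 .. 0.95
true_beta, true_delta = grid[11], grid[3]
# Synthetic stand-in for tweets(t), generated from the same toy model.
observed = model_curve(true_beta, true_delta, 0.01, 30)
err, beta, delta = grid_search(observed, grid, grid, 0.01)
print(err, beta, delta)
```

With real data the minimum error would not be zero, and a finer grid or a continuous optimizer might be preferable; the exhaustive loop simply mirrors the grid-search step named on the slide.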

SLIDE 23

Fitting to “Boston Marathon Bombing”

SEIZ model better models the real data, especially at initial points

SLIDE 24

Fitting to “Pope resignation” data

SEIZ model better models the real data, especially at initial points

SLIDE 25

Rumor detection with SEIZ model

Notation: S = Susceptible, I = Infected, E = Exposed, Z = Skeptics

New metric:

All parameters learned by model fitting to real data (from previous slides)

SLIDE 26

Rumor detection by RSI

Parameters obtained by fitting the SEIZ model efficiently distinguish rumors from news

Rumors

SLIDE 27

Today’s Lecture

  • Introduction
  • Decision-based models of diffusion
– Single Adoption
– Multiple Adoption
  • Probabilistic models of diffusion
– SEIZ model
– Independent cascade model

SLIDE 28

Linear Threshold Model

  • A decision-based model
  • A node v has random threshold θ_v ~ U[0,1]
  • A node v is influenced by each neighbor w according to a weight b_{v,w} such that

Σ_{w neighbor of v} b_{v,w} ≤ 1

  • A node v becomes active when at least a (weighted) θ_v fraction of its neighbors are active:

Σ_{w active neighbor of v} b_{v,w} ≥ θ_v
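
The activation rule can be sketched in Python as below: each node has a threshold and per-neighbor influence weights that sum to at most 1, and a node becomes active once the total weight of its active neighbors reaches its threshold. The graph, weights, fixed thresholds, and function name are all hypothetical; in the model itself the thresholds would be drawn uniformly from [0, 1].

```python
def linear_threshold(weights, thresholds, seeds):
    """Run the Linear Threshold Model until no more nodes activate.

    weights[v][w] is the influence of neighbor w on node v
    (for each v the weights should sum to at most 1).
    """
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v, incoming in weights.items():
            if v in active:
                continue
            influence = sum(b for w, b in incoming.items() if w in active)
            if influence >= thresholds[v]:
                active.add(v)
                changed = True
    return active

# Hypothetical 4-node example with fixed thresholds for reproducibility.
weights = {
    "a": {},
    "b": {"a": 0.6},
    "c": {"a": 0.2, "b": 0.3},
    "d": {"c": 0.4},
}
thresholds = {"a": 0.9, "b": 0.5, "c": 0.4, "d": 0.7}
print(sorted(linear_threshold(weights, thresholds, seeds={"a"})))
```

Nodes are swept repeatedly rather than in synchronized rounds; because activation is monotone (active nodes never deactivate), the final active set does not depend on the sweep order.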

SLIDE 29

Linear Threshold Model

[Figure: step-by-step example on a small weighted graph; legend: Inactive Node, Active Node, Threshold, Active neighbors; the animation ends with “Stop!”]

SLIDE 30

Probabilistic Contagion

  • Independent Cascade Model
– Directed finite G = (V, E)
– Set S starts out with new behavior
  • Say nodes with this behavior are “active”
– Each edge (v, w) has a probability p_vw
– If node v is active, it gets one chance to make w active, with probability p_vw
  • Each edge fires at most once
  • Does scheduling matter? No
  • If u, v are both active at the same time, it doesn’t matter which tries to activate w first
– But the time moves in discrete steps

SLIDE 31

Independent Cascade Model

  • Initially some nodes S are active
  • Each edge (u,v) has probability (weight) p_uv
  • When node u becomes active/infected:
– It activates each out-neighbor v with prob. p_uv
  • Activations spread through the network!

[Figure: example directed graph on nodes a through i with edge probabilities between 0.2 and 0.4]
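
A minimal Python sketch of the Independent Cascade Model follows. It is an illustration rather than the lecture's implementation: the edge-probability dictionary, the example graph, and the function name are hypothetical. Each newly activated node gets exactly one chance to activate each inactive out-neighbor, so every edge fires at most once.

```python
import random

def independent_cascade(edges, seeds, rng):
    """Simulate one cascade.

    edges[u] maps each out-neighbor v of u to the edge probability p_uv.
    Returns the set of nodes that are active when the cascade stops.
    """
    active = set(seeds)
    frontier = list(seeds)  # nodes activated in the previous step
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in edges.get(u, {}).items():
                # u gets a single chance to activate each inactive out-neighbor
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active

# Hypothetical example graph with made-up probabilities.
edges = {
    "a": {"b": 0.4, "c": 0.2},
    "b": {"d": 0.3},
    "c": {"d": 0.4},
    "d": {},
}
print(sorted(independent_cascade(edges, seeds={"a"}, rng=random.Random(0))))
```

Since a single run is random, the expected cascade size is usually estimated by averaging many runs with different seeds for the random number generator.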

SLIDE 32

Independent Cascade Model

  • Independent cascade model is simple but requires many parameters!
– Estimating them from data is very hard [Goyal et al. 2010]
  • Solution: Make all edges have the same weight (which brings us back to the SIR model)
– Simple, but too simple
  • Can we do something better?

[Figure: example directed graph on nodes a through i with edge probabilities between 0.2 and 0.4]

SLIDE 33

Exposures and Adoptions

  • From exposures to adoptions
– Exposure: Node’s neighbor exposes the node to the contagion
– Adoption: The node acts on the contagion

[KDD ‘12]

SLIDE 34

Exposure Curves

  • Exposure curve:
– Probability of adopting new behavior depends on the total number of friends who have already adopted
  • What’s the dependence?

[Figure: two exposure curves plotting prob. of adoption against k = number of friends adopting: “Probabilistic” spreading (viruses, information) and critical mass (decision making)]
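
An empirical exposure curve can be estimated by grouping observations by the number of adopting friends and taking the fraction that adopted in each group. The sketch below assumes a hypothetical input format, a list of (k, adopted) pairs, and uses made-up data; it is only meant to make the definition concrete.

```python
from collections import defaultdict

def exposure_curve(observations):
    """Estimate P(adopt | k adopting friends) from (k, adopted) pairs."""
    exposed = defaultdict(int)  # number of nodes observed with k adopting friends
    adopted = defaultdict(int)  # how many of those went on to adopt
    for k, did_adopt in observations:
        exposed[k] += 1
        if did_adopt:
            adopted[k] += 1
    return {k: adopted[k] / exposed[k] for k in sorted(exposed)}

# Made-up observations for illustration.
observations = [
    (1, False), (1, False), (1, True), (1, False),
    (2, True), (2, False),
    (3, True),
]
print(exposure_curve(observations))
```

Plotting the resulting values against k shows whether the curve flattens out or has a critical-mass jump.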

SLIDE 35

Exposure Curves

  • From exposures to adoptions
– Exposure: Node’s neighbor exposes the node to information
– Adoption: The node acts on the information
  • Examples of different adoption curves:

[Figure: Prob(Infection) vs. # exposures; one curve where the probability of infection keeps increasing and one where nodes build resistance]

[KDD ‘12]

SLIDE 36

Diffusion in Viral Marketing

  • Senders and followers of recommendations receive discounts on products
  • Data: Incentivized Viral Marketing program
– 16 million recommendations
– 4 million people, 500k products

[Figure labels: 10% credit, 10% off]

SLIDE 37

Exposure Curve: Validation

[Figure: probability of purchasing vs. # recommendations received, for DVD recommendations (8.2 million observations)]

[Leskovec et al., TWEB ’07]

SLIDE 38

Exposure Curve: LiveJournal

  • Group memberships spread over the network:
– Red circles represent existing group members
– Yellow squares may join
  • Question:
– How does prob. of joining a group depend on the number of friends already in the group?

[Backstrom et al. KDD ‘06]

SLIDE 39

Exposure Curve: LiveJournal

  • LiveJournal group membership

[Figure: prob. of joining vs. k (number of friends in the group)]

[Backstrom et al., KDD ’06]

SLIDE 40

Today’s Lecture

  • Introduction
  • Decision-based models of diffusion
– Single Adoption
– Multiple Adoption
  • Probabilistic models of diffusion
– SEIZ model
– Independent cascade model