Generative Models for Rapid Propagation of Information



slide-1
SLIDE 1

Generative Models for Rapid Propagation of Information

Kirill Dyagilev (Technion & IBM), Shie Mannor (Technion), Elad Yom-Tov (IBM)

slide-2
SLIDE 2

Social Networks

The accessibility of large-scale social data has led to an explosion of research in the field of complex networks. Social data can be used for the following purposes:

  • Marketing campaign management (Hill et al.)
  • Fraud detection (Hill et al.)
  • “Churn” prediction (Nanavati et al., Richter et al.)

slide-3
SLIDE 3

Influential Subscribers

  • One of the central questions is the identification of influential subscribers in the network.
  • These subscribers can be used as seeds in marketing campaigns, sources of news items, etc.
  • Goldenberg et al. showed a significant role of well-connected individuals in disseminating information and in the adoption of innovations.
  • However, they considered a static graph of social relations rather than the dynamics of social interaction.

slide-4
SLIDE 4

Our contribution

  • We investigate the dynamics of information propagation, i.e., the actual sequences of information-passing events.
  • We introduce a notion of significance of nodes based on their dynamic behavior.
slide-5
SLIDE 5

Rapid Propagation of Information (“Gossip”)

We focus on rapid propagation of information (RPI). We look for sequences of interactions in which, once the information is received, it is

  • either transferred to somebody else within a relatively short period of time (say, T); or
  • not transferred to anyone at all.

slide-6
SLIDE 6

Additional Scenario of Gossip Propagation

slide-7
SLIDE 7

Outline

  • Algorithm for identification of events of rapid propagation of information
  • Observations in real-world data
  • Evidence for information propagation
  • Generative models of information propagation
  • Future work

slide-8
SLIDE 8

Rapid Propagation of Information

Goal: identify RPIs, i.e., sequences of calls involved in rapid propagation of information.

Calls C1 and C2 are T-connected if they share a common subscriber and the time interval between them is less than T minutes.

This notion scales up easily to several calls.

[Figure: two T-connected calls C1 and C2 among subscribers A, B and C, and a larger example with subscribers A to F and calls C1 to C5.]
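
The T-connectivity test can be sketched as follows. This is a minimal illustration, assuming each call is a record with a caller, a callee and a start time in minutes; the `Call` type and its field names are hypothetical, and the slides do not say whether the interval is measured between start times or from the end of the first call (start times are assumed here).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    caller: str
    callee: str
    start: float  # start time in minutes (assumed unit)


def t_connected(c1: Call, c2: Call, t: float) -> bool:
    """True if the calls share a subscriber and start less than t minutes apart."""
    shared = {c1.caller, c1.callee} & {c2.caller, c2.callee}
    return bool(shared) and abs(c2.start - c1.start) < t


# Toy calls: A calls B, then B calls C 12 minutes later; D and E are unrelated.
c1 = Call("A", "B", start=0.0)
c2 = Call("B", "C", start=12.0)
c3 = Call("D", "E", start=13.0)
```

With T = 20 the first two calls are T-connected through subscriber B, while the third call shares no subscriber with either of them.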

slide-9
SLIDE 9

Identification of RPI in Call Data

Build a line graph in which nodes correspond to calls and directed edges connect calls from the same RPI.

[Figure: calls C1 and C2 among subscribers A, B and C, and the corresponding line graph with nodes C1 and C2.]

Partition this graph into trees using the DFS algorithm. Define large-enough DFS trees (> 4 calls, > 4 subscribers) as RPIs.
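
A rough sketch of this procedure, under stated assumptions: calls are `(caller, callee, start_minutes)` tuples, an edge points from an earlier call to a later T-connected call (the slides do not specify the edge direction), and the DFS forest is rooted at the earliest unvisited call. The function name `find_rpis` and its parameters are illustrative, not the authors' implementation.

```python
from collections import defaultdict


def find_rpis(calls, t, min_calls=4, min_subs=4):
    """Return RPIs as sorted lists of call indices.

    calls: list of (caller, callee, start_minutes) tuples.
    A tree is kept only if it has > min_calls calls and > min_subs subscribers.
    """
    order = sorted(range(len(calls)), key=lambda i: calls[i][2])
    # Line graph: node = call, directed edge = earlier call -> later T-connected call.
    edges = defaultdict(list)
    for pos, i in enumerate(order):
        for j in order[pos + 1:]:
            if calls[j][2] - calls[i][2] >= t:
                break  # calls are sorted by time, so all later ones are too far away
            if {calls[i][0], calls[i][1]} & {calls[j][0], calls[j][1]}:
                edges[i].append(j)
    # DFS forest: each unvisited call (earliest first) becomes the root of a new tree.
    visited, rpis = set(), []
    for root in order:
        if root in visited:
            continue
        tree, stack = [], [root]
        visited.add(root)
        while stack:
            node = stack.pop()
            tree.append(node)
            for nxt in edges[node]:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append(nxt)
        subscribers = {s for k in tree for s in calls[k][:2]}
        if len(tree) > min_calls and len(subscribers) > min_subs:
            rpis.append(sorted(tree))
    return rpis


# Toy data: A calls five people at 5-minute intervals; X calls Y much later.
calls = [("A", "B", 0), ("A", "C", 5), ("A", "D", 10),
         ("A", "E", 15), ("A", "F", 20), ("X", "Y", 500)]
rpis = find_rpis(calls, t=20)
```

In the toy data the five calls made by A form one tree with five calls and six subscribers, so they qualify as an RPI; the isolated call from X to Y does not.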

slide-10
SLIDE 10

Interpretation of RPIs – Information Cascades

We then translate the set of calls in each RPI to an information cascade.

Namely, we produce a tree that describes the paths along which the information propagates from the source subscriber to all the others.

[Figure: example information cascade tree over subscribers A to G.]
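
The slides do not spell out how the tree is derived from the calls. One possible sketch, assuming that the caller of the earliest call is the source and that information can pass in either direction during a call, processes the calls in chronological order and attaches each newly reached subscriber to the party who already had the information. The function `build_cascade` and these rules are assumptions for illustration only.

```python
def build_cascade(calls):
    """Return {subscriber: parent}; the source subscriber maps to None.

    calls: list of (caller, callee, start_minutes) tuples from a single RPI.
    """
    ordered = sorted(calls, key=lambda c: c[2])
    source = ordered[0][0]  # assumption: the first caller is the source
    parent = {source: None}
    for caller, callee, _ in ordered:
        # Assumption: information flows in either direction during a call.
        if caller in parent and callee not in parent:
            parent[callee] = caller
        elif callee in parent and caller not in parent:
            parent[caller] = callee
    return parent


cascade = build_cascade([("A", "B", 0), ("B", "C", 5), ("A", "D", 7), ("E", "C", 9)])
```

A single chronological pass leaves out any subscriber whose calls all precede the moment their contact became informed; a fuller treatment would need the authors' actual rule.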

slide-11
SLIDE 11

Outline

  • Algorithm for identification of events of rapid propagation of information
  • Observations in real-world data
  • Evidence for information propagation
  • Generative models of information propagation
  • Future work

slide-12
SLIDE 12

Real-world data

We applied our algorithm to call data records (CDRs) of two large cellular operators from different parts of the world:

Operator 1:

  • 50 million calls over 24 days.
  • A total of 5.4 million distinct subscribers, of which approximately 2 million belonged to the analyzed operator.

Operator 2:

  • Twice as many calls in the same period of 24 days.
  • A similar number of subscribers.

slide-13
SLIDE 13

Real-world data (cont.)

The description of each call contains:

  • Obfuscated identities of the subscribers involved.
  • Beginning time of the call and its duration.

slide-14
SLIDE 14

Structural Properties of RPIs

Size distribution of RPIs (T = 20 min): the size distribution is almost identical for both data sets.

slide-15
SLIDE 15

Structural Properties of RPIs

Average number of RPIs by weekdays (T=20min):

slide-16
SLIDE 16

Properties of Information Cascades

We used clustering to isolate typical topologies of information cascades.

  • 1. Pure star.
  • 2. Initialization call + pure star.
  • 3. Pure star + single additional node.

These topologies cover over 60% of all RPIs. They all have one dominant node, the dissemination-leader.

slide-17
SLIDE 17

Properties of Information Cascades (cont.)

  • 4. Strings.
  • 5. Star + Strings.
  • 6. The rest of the trees.

[Pie chart: share of RPIs by topology. Star 34%, Init + Star 14%, Star + Node 18%, Strings 4%, Star + Strings 11%, Star + Other 19%.]
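
The slides obtain these topologies by clustering. As an illustration only, the sketch below swaps in hand-written rules instead; the precise definition of each topology is an assumption read off the names on the slides, and `classify` is a hypothetical helper that takes a cascade as a `{node: parent}` map.

```python
from collections import defaultdict


def classify(parent):
    """parent: {node: parent, or None for the root}. Returns a topology label."""
    children = defaultdict(list)
    root = None
    for node, par in parent.items():
        if par is None:
            root = node
        else:
            children[par].append(node)
    n = len(parent)
    deg = {v: len(children[v]) for v in parent}  # number of children per node
    if all(d <= 1 for d in deg.values()):
        return "strings"         # a single chain
    if deg[root] == n - 1:
        return "star"            # every other node is a child of the root
    if deg[root] == 1 and deg[children[root][0]] == n - 2:
        return "init + star"     # one initial call, then a star
    if deg[root] == n - 2:
        return "star + node"     # a star plus one node at depth 2
    if all(deg[v] <= 1 for v in parent if v != root):
        return "star + strings"  # chains hanging off the root
    return "other"


star = {"A": None, "B": "A", "C": "A", "D": "A"}
init_star = {"A": None, "B": "A", "C": "B", "D": "B", "E": "B"}
star_node = {"A": None, "B": "A", "C": "A", "D": "A", "E": "B"}
string = {"A": None, "B": "A", "C": "B"}
star_strings = {"A": None, "B": "A", "C": "A", "D": "B", "E": "D", "F": "C"}
other = {"A": None, "B": "A", "C": "A", "D": "B", "E": "B", "F": "C"}
```

The order of the checks matters: a short chain would otherwise also satisfy the "star + node" rule, so chains are tested first.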

slide-18
SLIDE 18

Dissemination-Leaders Vs. Hubs

We compared the set of hubs (subscribers in the top 5% by number of friends) with the set of dissemination-leaders. These sets overlap, but differ significantly:

  • 41% of hubs are also dissemination-leaders.
  • 64% of dissemination-leaders are hubs.
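
A comparison of this kind can be sketched as follows, assuming a hypothetical mapping from subscriber to number of friends and a precomputed set of dissemination-leaders. The helper names and the toy values are illustrative and are not taken from the data.

```python
def top_fraction(degree, fraction=0.05):
    """Subscribers in the top `fraction` by number of friends (ties broken arbitrarily)."""
    k = max(1, int(len(degree) * fraction))
    ranked = sorted(degree, key=degree.get, reverse=True)
    return set(ranked[:k])


def overlap_stats(hubs, leaders):
    """Return (share of hubs that are leaders, share of leaders that are hubs)."""
    both = hubs & leaders
    return len(both) / len(hubs), len(both) / len(leaders)


# Toy example with made-up subscribers.
degree = {"a": 10, "b": 3, "c": 7, "d": 1}
hubs = top_fraction(degree, fraction=0.5)
hubs_are_leaders, leaders_are_hubs = overlap_stats({1, 2, 3, 4}, {3, 4, 5})
```

The two shares differ whenever the two sets differ in size, which is why the slide reports both directions.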
slide-19
SLIDE 19

Outline

  • Algorithm for identification of events of rapid propagation of information
  • Observations in real-world data
  • Evidence for information propagation
  • Generative models of information propagation
  • Future work

slide-20
SLIDE 20

Do RPIs really propagate information?

Downside: without knowing the content of calls, it is impossible to verify that RPIs disseminate information.

Upside:

  • RPIs cover several intuitive scenarios of information propagation.
  • Basic properties of RPIs make sense.
  • We can provide certain circumstantial evidence for the hypothesis.

slide-21
SLIDE 21

Geographic Evidence for Information Propagation

The following experiment shows that some RPIs propagate geospatial information.

We can estimate the location of a subscriber using the number of the antenna (cell) that the subscriber's phone uses during the current call.

Consider the cells visited in a single day by a pair of socially connected subscribers, A and B.

[Figure: cells visited by A only, by both A and B, and by B only.]

slide-22
SLIDE 22

Geographic Evidence for Information Propagation

Consider 85,000 pairs of socially connected subscribers.

Count the number of “shared” cells:

  • on a day on which they appeared in the same RPI;
  • on a day on which their communication did not appear in an RPI.

The number of “shared” cells increases on the day these subscribers participate in the same RPI.
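
The counting step can be sketched as below, assuming the cells each subscriber used on a given day are available as lists of cell identifiers. The function names and the toy cell numbers are hypothetical, and the slides do not say which statistic was used for the comparison; a plain average is assumed here.

```python
from statistics import mean


def shared_cells(cells_a, cells_b):
    """Number of distinct cells visited by both subscribers on one day."""
    return len(set(cells_a) & set(cells_b))


def mean_shared(observations):
    """observations: list of (cells_a, cells_b), one entry per subscriber pair and day."""
    return mean(shared_cells(a, b) for a, b in observations)


# Made-up cell identifiers for two pairs of subscribers.
rpi_days = [([1, 2, 3], [2, 3, 4]), ([5, 6], [6])]
other_days = [([1, 2], [3]), ([5], [5, 7])]
rpi_avg = mean_shared(rpi_days)
other_avg = mean_shared(other_days)
```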

slide-23
SLIDE 23

Outline

  • Algorithm for identification of events of rapid propagation of information
  • Observations in real-world data
  • Evidence for information propagation
  • Generative models of information propagation
  • Future work

slide-24
SLIDE 24

Propagation Models

Day Generating Model:

  • Describes the emergence of sequences of calls that produce RPIs with the given size distribution.

Information Cascade Model:

  • Generates information cascades of different topologies.
  • Fits the given fraction of RPIs of each topology and the given size distribution.

slide-25
SLIDE 25

Day Generating Model - Assumptions

This model relies on the following assumptions:

  • There are two kinds of subscribers: regular subscribers and dissemination-leaders.
  • The fraction of dissemination-leaders is relatively small, so dissemination-leaders call only regular subscribers.

The model generates the calls made by a dissemination-leader during a single day.

The resulting topology is simplistic, but covers over 50% of the RPIs in the data.

slide-26
SLIDE 26

Day Generating Model – Some Details

  • The number of calls follows a Discrete Gaussian eXponential (DGX) distribution.
  • The beginning time of the first call is uniform over the day.
  • The time interval between consecutive calls depends on the total number of calls and is DGX distributed.
  • Callees are chosen uniformly from the set of regular subscribers.
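
The four ingredients above can be put together in a small simulation sketch. DGX is taken here to be the discretized lognormal, with P(k) proportional to exp(-(ln k - mu)^2 / (2 sigma^2)) / k for k = 1, 2, ...; the truncation at `k_max`, all parameter values, and the way the gap distribution depends on the number of calls are placeholders, since the slides give no fitted values.

```python
import math
import random


def dgx_sample(mu, sigma, rng, k_max=1000):
    """Draw k in 1..k_max with P(k) proportional to exp(-(ln k - mu)^2 / (2 sigma^2)) / k."""
    ks = list(range(1, k_max + 1))
    weights = [math.exp(-(math.log(k) - mu) ** 2 / (2 * sigma ** 2)) / k for k in ks]
    return rng.choices(ks, weights=weights)[0]


def generate_day(regular, rng, calls_mu=1.5, calls_sigma=0.8, gap_sigma=0.7):
    """Return [(start_minute, callee), ...] for one dissemination-leader on one day."""
    n_calls = dgx_sample(calls_mu, calls_sigma, rng)
    # Placeholder dependence: gaps get shorter as the number of calls grows.
    gap_mu = math.log(60.0 / n_calls + 1.0)
    t = rng.uniform(0, 24 * 60)  # first call starts uniformly over the day
    day = []
    for _ in range(n_calls):
        day.append((t, rng.choice(regular)))  # callee uniform over regular subscribers
        t += dgx_sample(gap_mu, gap_sigma, rng)  # gap in minutes, at least 1
    return day


rng = random.Random(0)
regular = ["r1", "r2", "r3"]
day = generate_day(regular, rng)
```

In this sketch, calls late in the day may spill past midnight; how the original model treats the day boundary is not stated on the slides.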

slide-27
SLIDE 27

The fit of the Day Generation Model to data

This model explains the size distribution of RPIs well (R-squared = 0.88).

The model admits combinatorial analysis, so the size distribution can be predicted theoretically.
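
For reference, R-squared in its usual form is 1 - SS_res / SS_tot. The sketch below computes it for an observed and a predicted size distribution; whether the authors computed it on raw or log-scaled frequencies is not stated, so raw values are assumed.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


perfect = r_squared([1, 2, 3], [1, 2, 3])    # prediction equals the data
mean_only = r_squared([1, 2, 3], [2, 2, 2])  # prediction is just the mean
```

A perfect prediction gives 1, and predicting the mean of the data everywhere gives 0.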

slide-28
SLIDE 28

Information Cascade Model

We use a branching process to model the information cascade; that is, the corresponding tree is built layer by layer.

Degree distributions are modeled by the Discrete Gaussian eXponential (DGX) distribution and depend on the following properties:

  • depth of the current node
  • degree of the root
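
A layer-by-layer construction of this kind might look like the sketch below. Because DGX has no mass at zero, the sketch adds a hypothetical probability `p_leaf` that a non-root node has no children, as well as a depth cap; these, the parameter values, and the formula that ties `mu` to the depth and the root degree are all placeholders rather than the authors' fitted model.

```python
import math
import random


def dgx_sample(mu, sigma, rng, k_max=200):
    """Draw k in 1..k_max with P(k) proportional to exp(-(ln k - mu)^2 / (2 sigma^2)) / k."""
    ks = list(range(1, k_max + 1))
    weights = [math.exp(-(math.log(k) - mu) ** 2 / (2 * sigma ** 2)) / k for k in ks]
    return rng.choices(ks, weights=weights)[0]


def generate_cascade(rng, root_mu=1.5, sigma=0.6, p_leaf=0.8, max_depth=4):
    """Return {node_id: parent_id}; node 0 is the root and maps to None."""
    parent = {0: None}
    root_degree = dgx_sample(root_mu, sigma, rng)
    layer = [(0, root_degree)]  # (node, number of children still to create)
    depth = 0
    while layer and depth < max_depth:
        next_layer = []
        for node, n_children in layer:
            for _ in range(n_children):
                child = len(parent)  # next unused node id
                parent[child] = node
                if rng.random() >= p_leaf:
                    # Placeholder: degree shrinks with depth and with the root degree.
                    mu = root_mu - (depth + 1) - 0.1 * math.log(root_degree)
                    next_layer.append((child, dgx_sample(mu, sigma, rng)))
        layer = next_layer
        depth += 1
    return parent


tree = generate_cascade(random.Random(1))
```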

slide-29
SLIDE 29

The fit of the Information Cascade Model to data (cont.)

The information cascade model predicts the fraction of RPIs belonging to each topology, using both theoretical results and simulation.

This model explains well the size distributions of RPIs of different topologies (R-squared > 0.95).

[Figure: bar chart of the fraction of RPIs per topology (Star, Init + Star, Star + Node, Strings, Star + Strings), model vs. data; vertical axis ticks from 0.1 to 0.4.]

slide-30
SLIDE 30

Outline

  • Algorithm for identification of events of rapid propagation of information
  • Observations in real-world data
  • Evidence for information propagation
  • Generative models of information propagation
  • Future work

slide-31
SLIDE 31

Future Work

  • More circumstantial evidence for information propagation.
  • Model unification: generation of both the sequences of calls that disseminate information and the topology of the information cascades.
  • Inter-day behavior of dissemination-leaders.
  • Application of our approach to other media, e.g., Twitter.