Outlines Topic-aware Social Influence Propagation Models by N - - PowerPoint PPT Presentation


Outlines Topic-aware Social Influence Propagation Models by N Barbieri and et al. , ICDM 2012 Scalable topic-specific influence analysis on microblogs by Bin Bi and et al., WSDM 2014 Introduction The problem of influence maximization


slide-1
SLIDE 1

Outlines

"Topic-aware Social Influence Propagation Models" by N. Barbieri et al., ICDM 2012

"Scalable topic-specific influence analysis on microblogs" by Bin Bi et al., WSDM 2014

slide-2
SLIDE 2
slide-3
SLIDE 3

Introduction

The problem of influence maximization has received a good deal of attention from the data mining research community in the last decade but, quite surprisingly, the characteristics of the item that is the subject of the viral marketing campaign have been left out of the picture.

Observations:

  • Users have different interests.
  • Items have different characteristics.
  • Similar items are likely to interest the same users.

slide-4
SLIDE 4

Outlines of the paper

  • Extending the IC and LT models to be topic-aware.
  • Using an EM approach to estimate the parameters of TIC.
  • Introducing a new influence propagation model, AIR (Authoritativeness-Interest-Relevance).
  • Devising a generalized expectation maximization (GEM) approach to learn the parameters of AIR.

slide-5
SLIDE 5

Background

Under both the IC and LT propagation models, influence maximization is NP-hard.

The expected spread $\sigma_m(S)$ is monotone, i.e. $\sigma_m(S) \le \sigma_m(T)$ whenever $S \subseteq T$, and submodular, i.e. $\sigma_m(S \cup \{w\}) - \sigma_m(S) \ge \sigma_m(T \cup \{w\}) - \sigma_m(T)$ whenever $S \subseteq T$.

There is a $(1 - 1/e - \phi)$-approximation algorithm for the influence maximization problem.
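The greedy strategy behind this guarantee can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `greedy_seed_selection`, the `spread` callback standing in for $\sigma_m$, and the toy `REACH` table are all assumptions made for the example. The toy oracle is a simple coverage function, which is monotone and submodular like the expected spread.

```python
def greedy_seed_selection(candidates, k, spread):
    """Pick up to k seeds, each time adding the node with the largest marginal gain.

    `spread` is a hypothetical oracle mapping a set of seeds to a number;
    in practice it would be an estimate of the expected spread.
    """
    seeds = []
    current = spread(frozenset())
    for _ in range(k):
        best_node, best_gain = None, float("-inf")
        for node in candidates:
            if node in seeds:
                continue
            gain = spread(frozenset(seeds) | {node}) - current
            if gain > best_gain:  # strict comparison: the first candidate wins ties
                best_node, best_gain = node, gain
        if best_node is None:  # no candidates left
            break
        seeds.append(best_node)
        current += best_gain
    return seeds


# Toy stand-in for the spread: each node deterministically "reaches" a fixed set.
REACH = {
    "a": {"a", "b", "c"},
    "b": {"b", "c"},
    "c": {"c"},
    "d": {"d", "e"},
    "e": {"e"},
}


def coverage_spread(seed_set):
    """Number of nodes reached by at least one seed."""
    covered = set()
    for s in seed_set:
        covered |= REACH[s]
    return len(covered)


chosen = greedy_seed_selection(list(REACH), 2, coverage_spread)
```

With this toy oracle the first pick is the node with the largest reach, and the second pick is the node that adds the most nodes not yet covered.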

slide-6
SLIDE 6

Topic-aware Independent Cascade Model (TIC)

In the topic-aware version of the IC model the user-to-user influence probabilities depend on the topic.

  • $p^z_{u,v}$ represents the strength of the influence exerted by user $v$ on user $u$ on topic $z$, for $(u, v) \in E$ and $z \in [1, K]$.
  • For each item $i$ that propagates in the network, we have a distribution over the topics: $\gamma^z_i = P(Z = z \mid i)$, with $\sum_{z=1}^{K} \gamma^z_i = 1$.

slide-7
SLIDE 7

Topic-aware Independent Cascade Model (TIC)

In this model a propagation happens as in the IC model: when a node $v$ first becomes active on item $i$, it has one chance of influencing each inactive neighbor $u$, independently of the history thus far. The attempt succeeds with a probability that is the weighted average of the link probabilities w.r.t. the topic distribution of the item $i$:

$p^i_{u,v} = \sum_{z=1}^{K} \gamma^z_i \, p^z_{u,v}$
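A small sketch of one TIC cascade may help make the mechanism concrete. The data layout is an assumption made for illustration: `out_edges[v]` lists the neighbors `u` that `v` can influence, each with a per-topic probability vector, and `gamma_i` is the topic distribution of the item. The function names are hypothetical.

```python
import random


def item_edge_probability(gamma_i, p_uv):
    """Item-specific link probability: sum over topics of gamma_i[z] * p_uv[z]."""
    return sum(g * p for g, p in zip(gamma_i, p_uv))


def simulate_tic(out_edges, gamma_i, seeds, rng):
    """Run one cascade for an item and return the set of active nodes.

    Each newly active node v gets exactly one attempt on each inactive
    neighbor u, succeeding with the item-specific probability.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        newly_active = []
        for v in frontier:
            for u, p_uv in out_edges.get(v, []):
                if u in active:
                    continue
                if rng.random() < item_edge_probability(gamma_i, p_uv):
                    active.add(u)
                    newly_active.append(u)
        frontier = newly_active
    return active


# Two topics; both links are certain on topic 0 and impossible on topic 1.
out_edges = {
    "v": [("u", [1.0, 0.0])],
    "u": [("w", [1.0, 0.0])],
}
on_topic = simulate_tic(out_edges, [1.0, 0.0], ["v"], random.Random(0))
off_topic = simulate_tic(out_edges, [0.0, 1.0], ["v"], random.Random(0))
```

The same graph spreads an item about topic 0 to every node, while an item about topic 1 never leaves the seed. Averaging the size of the returned set over many runs gives a Monte Carlo estimate of the expected spread.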

slide-8
SLIDE 8

Topic-aware Linear Threshold Model (TLT)

  • $p^z_{u,v}$ represents the weight of the edge $(u, v)$ on topic $z$. The sum of incoming weights at each node, for each topic, is no more than 1.
  • $\theta_u$ is the threshold of node $u$, chosen uniformly at random from $[0, 1]$.
  • Influence weight: $W^t_i(u) = \sum_{z=1}^{K} \sum_{v \in F_i(u,t)} \gamma^z_i \, p^z_{u,v}$.

$F_i(u, t)$ denotes the set of users that have a link to $u$ and that at time $t$ have already adopted the item $i$. If $W^t_i(u) \ge \theta_u$ then $u$ will activate on item $i$ at time $t + 1$.
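The activation rule above can be sketched in a few lines. The layout is an assumption for illustration: `in_edges[u]` lists the users `v` linking to `u` together with their per-topic weights, and thresholds are passed in explicitly so the run is reproducible (the model itself draws them uniformly from [0, 1]). Function names are hypothetical.

```python
def influence_weight(u, gamma_i, in_edges, adopted):
    """Influence weight of u: topic-weighted edge weights summed over adopted in-neighbors."""
    return sum(
        g * p
        for v, p_uv in in_edges.get(u, [])
        if v in adopted
        for g, p in zip(gamma_i, p_uv)
    )


def simulate_tlt(nodes, in_edges, gamma_i, seeds, thresholds):
    """Repeat synchronous steps until no further node reaches its threshold."""
    adopted = set(seeds)
    while True:
        newly = {
            u
            for u in nodes
            if u not in adopted
            and influence_weight(u, gamma_i, in_edges, adopted) >= thresholds[u]
        }
        if not newly:
            return adopted
        adopted |= newly


nodes = ["a", "b", "c"]
in_edges = {
    "b": [("a", [0.6, 0.1])],
    "c": [("a", [0.2, 0.0]), ("b", [0.3, 0.0])],
}
thresholds = {"a": 0.9, "b": 0.5, "c": 0.4}  # fixed here; random in the model

topic0 = simulate_tlt(nodes, in_edges, [1.0, 0.0], ["a"], thresholds)
topic1 = simulate_tlt(nodes, in_edges, [0.0, 1.0], ["a"], thresholds)
```

For an item purely about topic 0, `b` activates first and its added weight then pushes `c` over its threshold; for an item purely about topic 1 the weights are too small and only the seed stays active.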

slide-9
SLIDE 9

Observation: in both cases only the model parameters are topic-aware, while the overall mechanism of propagation does not change.

Proposition: The expected spread $\sigma_m(S)$ remains monotone and submodular for $m = \mathrm{TIC}$ or $m = \mathrm{TLT}$.

Corollary: The greedy algorithm provides a $(1 - 1/e - \phi)$-approximation for the influence maximization problem also under the TIC and TLT propagation models.

slide-10
SLIDE 10
slide-11
SLIDE 11

AIR (Authoritativeness-Interest-Relevance)

In TIC and TLT we have K(|E| + |I|) parameters. The AIR (Authoritativeness-Interest-Relevance) propagation model assumes that social influence depends on a user's authority in the context of a given topic and on the interest of the user's social neighborhood in that topic.

slide-12
SLIDE 12

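The AIR model has the following parameters:

  • Authoritativeness of a user in a topic: $p^z_v \in \mathbb{R}$; positive values indicate authoritativeness, negative values distrust.
  • Interest of a user for a topic: $\vartheta^z_u = P(Z = z \mid u)$, with $\sum_{z=1}^{K} \vartheta^z_u = 1$.
  • Relevance of an item for a topic: $\vec{\varphi}_z \in \mathbb{R}^{|I|}$, with $\varphi^z_i \in \mathbb{R}$.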

slide-13
SLIDE 13

The working principle of AIR is a general threshold model. At the beginning of the process each user $u$ chooses a threshold $\theta_u$ uniformly at random from $[0, 1]$. User $u$ activates on item $i$ at time $t$ when

$P(i \mid u, t) = \sum_{z} P(z \mid u) \, P(i \mid u, z, t) \ge \theta_u$, where $P(z \mid u) = \vartheta^z_u$.

$f_v(i, u, t)$ and $f(i, u, t)$ are scaling factors. $f_v(i, u, t) = 0$ if $v \in F_i(u, t)$
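The threshold test can be illustrated with a short sketch. The slide text does not give the form of the per-topic adoption probabilities, so they are taken here as given inputs (`per_topic_prob`); that, and the function names, are assumptions for the example.

```python
def adoption_probability(interest_u, per_topic_prob):
    """Mix per-topic adoption probabilities using the user's topic interests.

    interest_u[z]     : the user's interest in topic z (entries sum to 1)
    per_topic_prob[z] : the adoption probability given topic z, supplied by the caller
    """
    return sum(th * p for th, p in zip(interest_u, per_topic_prob))


def activates(interest_u, per_topic_prob, threshold_u):
    """True when the mixed adoption probability reaches the user's threshold."""
    return adoption_probability(interest_u, per_topic_prob) >= threshold_u


interest_u = [0.7, 0.3]      # the user mostly cares about topic 0
per_topic_prob = [0.9, 0.1]  # hypothetical per-topic adoption probabilities
mixed = adoption_probability(interest_u, per_topic_prob)
```

Here the mixture is 0.7 * 0.9 + 0.3 * 0.1, so a user with a threshold of 0.5 would activate while a user with a threshold of 0.7 would not.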

slide-14
SLIDE 14
slide-15
SLIDE 15

Influence Maximization in AIR

Although AIR is a general threshold model, the fact that user authoritativeness can be negative makes $\sigma_{\mathrm{AIR}}$ not submodular and not even monotone.

  • Greedy: at each iteration, add to the set of seeds $S$ the node $x$ that brings the largest marginal gain, i.e. the $x$ for which $\sigma_{\mathrm{AIR}}(S \cup \{x\}) - \sigma_{\mathrm{AIR}}(S)$ is maximal. $\sigma_{\mathrm{AIR}}(S)$ is estimated for a given $S$ by Monte Carlo simulation.
  • Top-k authorities: given the new item $i$ and its distribution over topics $\gamma^z_i$, select the top-k users $v$ w.r.t. $\sum_{z=1}^{K} \gamma^z_i \, p^z_v$.
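The top-k authorities heuristic amounts to one weighted sum per user followed by a partial sort. A minimal sketch, assuming authoritativeness is stored as a per-user list of topic scores (the `authority` dictionary and the function name are hypothetical):

```python
import heapq


def top_k_authorities(authority, gamma_i, k):
    """Return the k users with the highest topic-weighted authoritativeness.

    authority[v][z] : authoritativeness of user v on topic z (may be negative)
    gamma_i[z]      : topic distribution of the new item
    """
    def score(v):
        return sum(g * p for g, p in zip(gamma_i, authority[v]))

    return heapq.nlargest(k, authority, key=score)


authority = {
    "a": [0.9, -0.2],
    "b": [0.1, 0.8],
    "c": [-0.5, 0.3],
}
mixed_item = top_k_authorities(authority, [0.5, 0.5], 2)
topic0_item = top_k_authorities(authority, [1.0, 0.0], 2)
```

The ranking depends on the item: user `b` leads for an item split evenly between the two topics, while user `a` leads for an item purely about topic 0. Unlike the greedy method, this heuristic needs no simulation.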

slide-16
SLIDE 16

Dataset

The authors use two real-world, publicly available datasets, both containing a social graph G = (V, E) and a log of past propagations D = {(User, Item, Time)}. The DIGG social graph contains 11,142 users and 99,846 directed arcs, while FLIXSTER contains 6,353 users and 84,606 directed arcs.

slide-17
SLIDE 17

EXPERIMENTAL EVALUATION

slide-18
SLIDE 18

EXPERIMENTAL EVALUATION

slide-19
SLIDE 19

EXPERIMENTAL EVALUATION

slide-20
SLIDE 20
slide-21
SLIDE 21

Introduction

Although a few prior works do support topic-specific influence analysis, they either separate the analysis of content from that of network structure, or assume that content is the only cause of links, which is clearly an inappropriate assumption for microblog networks.

slide-22
SLIDE 22

Contributions

They propose a new Bayesian Bernoulli-Multinomial mixture model, FLDA, to jointly model both content and links in the same generative process. They discuss and implement a distributed Gibbs-sampling technique for training FLDA over large clusters.

slide-23
SLIDE 23

Followship-LDA (FLDA)

slide-24
SLIDE 24

  • Topic-Specific Influence: the influence of user $e$ on topic $x$ is measured by $\sigma_{e|x}$, which is the probability of $e$ being followed for topic $x$ in the FLDA model.
  • Content-Independent Popularity: the content-independent popularity of user $e$ is measured by $\pi_e$, which is the probability of $e$ being followed for any content-independent reason in the FLDA model.

slide-25
SLIDE 25

Gibbs Sampling for FLDA

Distributed FLDA is implemented using Spark, a large-scale distributed processing framework specifically targeted at iterative machine-learning workloads.

slide-26
SLIDE 26

QUERYING TOPICAL INFLUENCERS

They propose a general search framework for topic-specific key influencers, called SKIT. SKIT allows a user to freely express his/her interests by typing a set of keywords. SKIT then returns a list of key influencers that satisfy the user's intent, ordered by their influence scores.

$\mathrm{INFL}(t; u) = \sigma_{e=u \mid x=t}$

$W(t; q) = \theta_{z=t \mid m=q}$
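One plausible way to turn these two quantities into a ranking is sketched below. The slide text does not state how they are combined, so the weighted sum over topics used here is an assumption, as are the function name and the toy numbers.

```python
def rank_influencers(query_topic_weights, topic_influence, top_n):
    """Rank users by a query-weighted sum of their per-topic influence.

    query_topic_weights[t] : weight of topic t for the query, i.e. W(t; q)
    topic_influence[u][t]  : influence of user u on topic t, i.e. INFL(t; u)
    The weighted-sum combination is an assumption, not taken from the slides.
    """
    scores = {
        user: sum(w * infl for w, infl in zip(query_topic_weights, per_topic))
        for user, per_topic in topic_influence.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Toy per-topic influence values; each topic's column sums to 1 across users.
topic_influence = {
    "u1": [0.6, 0.1],
    "u2": [0.2, 0.7],
    "u3": [0.2, 0.2],
}
query_a = rank_influencers([0.9, 0.1], topic_influence, 2)  # query mostly about topic 0
query_b = rank_influencers([0.1, 0.9], topic_influence, 2)  # query mostly about topic 1
```

Under this assumption, a query dominated by topic 0 puts `u1` first, and a query dominated by topic 1 puts `u2` first.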

slide-27
SLIDE 27

Dataset

slide-28
SLIDE 28
slide-29
SLIDE 29