SLIDE 1 Outline
"Topic-aware Social Influence Propagation Models" by N. Barbieri et al.
"Scalable topic-specific influence analysis on microblogs" by Bin Bi et al., WSDM 2014
SLIDE 2
SLIDE 3
Introduction
The problem of influence maximization has received a good deal of attention from the data mining research community in the last decade but, quite surprisingly, the characteristics of the item that is the subject of the viral marketing campaign have been left out of the picture.
Observations:
- Users have different interests.
- Items have different characteristics.
- Similar items are likely to interest the same users.
SLIDE 4
Outline of the paper
- Extending the IC and LT models to be topic-aware (TIC and TLT).
- Using an EM approach to estimate the parameters of TIC.
- Introducing a new influence propagation model, AIR (Authoritativeness-Interest-Relevance).
- Devising a generalized expectation maximization (GEM) approach to learn the parameters of AIR.
SLIDE 5 Background
Under both the IC and LT propagation models, influence maximization is NP-hard.
The expected spread $\sigma_m(S)$ is monotone, i.e., $\sigma_m(S) \le \sigma_m(T)$ whenever $S \subseteq T$, and submodular, i.e., $\sigma_m(S \cup \{w\}) - \sigma_m(S) \ge \sigma_m(T \cup \{w\}) - \sigma_m(T)$ whenever $S \subseteq T$.
Because of these two properties, there is a $(1 - 1/e - \phi)$-approximation algorithm for the influence maximization problem.
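Written out, the guarantee compares the returned seed set with the best possible one. In the sketch below, $\hat{S}$ (the set returned by the approximation algorithm) and $S^*$ (an optimal seed set of the same size k) are symbols introduced here for illustration; they do not appear on the slides.

```latex
\[
  \sigma_m(\hat{S}) \;\ge\; \left(1 - \frac{1}{e} - \phi\right)\,\sigma_m(S^{*}),
  \qquad |\hat{S}| = |S^{*}| = k .
\]
```

The slack $\phi$ is usually attributed to the fact that $\sigma_m$ can only be estimated by sampling rather than computed exactly.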
SLIDE 6 Topic-aware Independent Cascade Model (TIC)
In the topic-aware version of the IC model the user-to-user influence probabilities depend on the topic.
- For each arc $(u, v) \in E$ and each topic $z \in [1, K]$, there is a probability $p^z_{u,v}$ representing the strength of the influence exerted by user v on user u on topic z.
- For each item i that propagates in the network, we have a distribution over the topics, $\gamma^z_i = P(Z = z \mid i)$, with $\sum_{z=1}^{K} \gamma^z_i = 1$.
SLIDE 7 Topic-aware Independent Cascade Model (TIC)
In this model a propagation happens as in the IC model: when a node v first becomes active on item i, it has one chance of influencing each inactive neighbor u, independently of the history thus far. The attempt succeeds with a probability that is the weighted average of the link probabilities with respect to the topic distribution of item i:
$p^i_{u,v} = \sum_{z=1}^{K} \gamma^z_i \, p^z_{u,v}$
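The propagation rule can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the `out_edges` adjacency layout, and the use of a seeded `random.Random` are all assumptions made for the example.

```python
import random

def item_edge_probability(gamma_i, p_topics):
    """Item-specific probability: sum over topics z of gamma_i[z] * p_topics[z]."""
    return sum(g * p for g, p in zip(gamma_i, p_topics))

def simulate_tic(out_edges, gamma_i, seeds, rng):
    """Run one TIC cascade for an item with topic distribution gamma_i.

    out_edges[v] is a list of (u, p_topics) pairs, where p_topics[z] is the
    probability that v influences u on topic z (hypothetical layout).
    Returns the set of nodes that are active when the cascade stops.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for v in frontier:
            # v was just activated, so it gets exactly one attempt per neighbor.
            for u, p_topics in out_edges.get(v, []):
                if u in active:
                    continue
                if rng.random() < item_edge_probability(gamma_i, p_topics):
                    active.add(u)
                    next_frontier.append(u)
        frontier = next_frontier
    return active

if __name__ == "__main__":
    edges = {"a": [("b", [0.9, 0.1]), ("c", [0.2, 0.8])], "b": [("d", [0.5, 0.5])]}
    print(simulate_tic(edges, [0.7, 0.3], {"a"}, random.Random(42)))
```

Keeping the per-topic probabilities on the edge and collapsing them only when an item arrives means the same graph can be reused for items with different topic mixes.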
SLIDE 8 Topic-aware Linear Threshold Model (TLT)
- For each arc $(u, v) \in E$ and each topic z, there is a weight $p^z_{u,v}$ representing the weight of the edge (u, v) on topic z. The sum of incoming weights at each node, for each topic, is no more than 1.
- $\theta_u$ is the threshold of each node u and is chosen uniformly at random from [0, 1].
- The influence weight on u at time t is $W^t_i(u) = \sum_{z=1}^{K} \sum_{v \in F_i(u,t)} \gamma^z_i \, p^z_{u,v}$, where $F_i(u, t)$ denotes the set of users that have a link to u and that at time t have already adopted item i.
- If $W^t_i(u) \ge \theta_u$, then u will activate on item i at time t + 1.
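The activation rule above can be sketched as a synchronous simulation. This is only an illustration under stated assumptions: the names `tlt_weight`, `simulate_tlt`, and `draw_thresholds`, and the `in_weights` layout, are hypothetical, and thresholds are passed in explicitly so that a run is reproducible.

```python
import random

def tlt_weight(gamma_i, in_weights_u, adopters):
    """W_i^t(u): sum over adopting in-neighbors v and topics z of gamma_i[z] * p^z_{u,v}."""
    return sum(
        g * p
        for v, p_topics in in_weights_u.items() if v in adopters
        for g, p in zip(gamma_i, p_topics)
    )

def draw_thresholds(nodes, rng):
    """Each node picks its threshold uniformly at random from [0, 1)."""
    return {u: rng.random() for u in nodes}

def simulate_tlt(in_weights, gamma_i, seeds, thresholds):
    """Run TLT until no new node activates.

    in_weights[u][v] is the list of per-topic weights of the link from v to u
    (hypothetical layout). Nodes whose weight reaches their threshold at
    time t become active at time t + 1.
    """
    active = set(seeds)
    while True:
        newly_active = {
            u for u in in_weights
            if u not in active
            and tlt_weight(gamma_i, in_weights[u], active) >= thresholds[u]
        }
        if not newly_active:
            return active
        active |= newly_active

if __name__ == "__main__":
    weights = {"b": {"a": [0.5, 0.5]}, "c": {"a": [0.25, 0.25], "b": [0.25, 0.25]}}
    print(simulate_tlt(weights, [0.5, 0.5], {"a"}, draw_thresholds("bc", random.Random(1))))
```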
SLIDE 9
Observation: in both cases only the model parameters are topic-aware, while the overall mechanism of propagation does not change.
Proposition: the expected spread $\sigma_m(S)$ remains monotone and submodular for m = TIC or m = TLT.
Corollary: the greedy algorithm provides a $(1 - 1/e - \phi)$-approximation for the influence maximization problem also under the TIC and TLT propagation models.
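A greedy seed selection under TIC might look like the sketch below. It is a plain, unoptimized version written for illustration: all function names, the `out_edges` layout, and the default of 100 Monte Carlo runs are assumptions, and the slides do not specify how the spread is estimated for TIC.

```python
import random

def simulate_tic(out_edges, gamma_i, seeds, rng):
    """One TIC cascade; out_edges[v] lists (u, per-topic probabilities) pairs."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        next_frontier = []
        for v in frontier:
            for u, p_topics in out_edges.get(v, []):
                p_item = sum(g * p for g, p in zip(gamma_i, p_topics))
                if u not in active and rng.random() < p_item:
                    active.add(u)
                    next_frontier.append(u)
        frontier = next_frontier
    return active

def estimate_spread(out_edges, gamma_i, seeds, runs, rng):
    """Monte Carlo estimate of the expected number of active nodes."""
    total = sum(len(simulate_tic(out_edges, gamma_i, seeds, rng)) for _ in range(runs))
    return total / runs

def greedy_seeds(out_edges, gamma_i, nodes, k, runs=100, rng_seed=0):
    """Pick k seeds, each time adding the node with the largest estimated spread.

    Maximizing the spread of chosen + [x] is the same as maximizing the
    marginal gain, because the spread of the current set does not depend on x.
    """
    rng = random.Random(rng_seed)
    chosen = []
    for _ in range(min(k, len(nodes))):
        candidates = [n for n in nodes if n not in chosen]
        best = max(
            candidates,
            key=lambda n: estimate_spread(out_edges, gamma_i, chosen + [n], runs, rng),
        )
        chosen.append(best)
    return chosen

if __name__ == "__main__":
    edges = {"a": [("b", [0.9, 0.2])], "b": [("c", [0.6, 0.6])], "e": [("f", [0.1, 0.9])]}
    print(greedy_seeds(edges, [0.8, 0.2], ["a", "b", "c", "d", "e", "f"], k=2))
```

The cost is one batch of simulations per candidate per iteration, which is why practical implementations usually add lazy evaluation of marginal gains.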
SLIDE 10
SLIDE 11
AIR (Authoritativeness-Interest-Relevance)
In TIC and TLT we have K(|E| + |I|) parameters. The AIR (Authoritativeness-Interest-Relevance) propagation model assumes that social influence depends on a user's authority in the context of a given topic and on the interest of the user's social neighborhood in that topic.
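SLIDE 12 The AIR model has the following parameters:
- Authoritativeness of a user in a topic: $p^z_v \in \mathbb{R}$; positive values denote authoritativeness, negative values denote distrust.
- Interest of a user for a topic: $\vartheta^z_u = P(Z = z \mid u)$, with $\sum_{z=1}^{K} \vartheta^z_u = 1$.
- Relevance of an item for a topic: $\vec{\varphi}^z \in \mathbb{R}^{|I|}$, with $\varphi^z_i \in \mathbb{R}$.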
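SLIDE 13 The working principle of AIR is a general threshold model. At the beginning of the process each user u chooses a threshold $\theta_u$ uniformly at random from [0, 1]. User u activates on item i at time t when
$P(i \mid u, t) = \sum_{z=1}^{K} P(z \mid u) \, P(i \mid u, z, t) \ge \theta_u$, where $P(z \mid u) = \vartheta^z_u$.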
$f_v(i, u, t)$ and $f(i, u, t)$ are scaling factors, with $f_v(i, u, t) = 0$ if $v \notin F_i(u, t)$, so that only neighbors who have already adopted item i contribute.
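The threshold test itself is a mixture over topics, which the following sketch makes concrete. The slides do not give the form of the per-topic term $P(i \mid u, z, t)$, so it is taken here as an input; the function names and the argument layout are assumptions for illustration only.

```python
def air_adoption_probability(interest_u, per_topic_prob):
    """P(i|u,t) as the interest-weighted mixture of per-topic probabilities.

    interest_u[z] plays the role of P(z|u); per_topic_prob[z] stands in for
    P(i|u,z,t), which the caller must supply (its form is not given here).
    """
    return sum(w * p for w, p in zip(interest_u, per_topic_prob))

def air_activates(interest_u, per_topic_prob, threshold_u):
    """User u activates once the mixture probability reaches the threshold."""
    return air_adoption_probability(interest_u, per_topic_prob) >= threshold_u

if __name__ == "__main__":
    print(air_activates([0.5, 0.5], [0.5, 0.25], threshold_u=0.3))
```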
SLIDE 14
SLIDE 15 Influence Maximization in AIR
Although AIR is a general threshold model, the fact that user authoritativeness can be negative makes $\sigma_{AIR}$ not submodular and not even monotone. Two seed-selection strategies:
- Greedy: at each iteration, add to the seed set S the node x that brings the largest marginal gain $\sigma_{AIR}(S \cup \{x\}) - \sigma_{AIR}(S)$, estimating $\sigma_{AIR}(S)$ for a given S by Monte Carlo simulation.
- Top-k authorities: given the new item i and its distribution over topics $\gamma^z_i$, select the top-k users v with respect to $\sum_{z=1}^{K} \gamma^z_i \, p^z_v$.
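The top-k authorities strategy needs no simulation at all, as the sketch below shows. The function name and the `authority` dictionary layout are hypothetical choices for this example.

```python
import heapq

def top_k_authorities(gamma_i, authority, k):
    """Return the k users with the largest topic-weighted authoritativeness.

    authority[v][z] holds p^z_v, which may be negative (distrust);
    gamma_i[z] is the item's weight on topic z.
    """
    def score(v):
        return sum(g * p for g, p in zip(gamma_i, authority[v]))

    return heapq.nlargest(k, authority, key=score)

if __name__ == "__main__":
    authority = {"a": [0.9, -0.5], "b": [0.1, 2.0], "c": [0.5, 0.0]}
    print(top_k_authorities([1.0, 0.0], authority, k=2))
```

Because the score depends only on the item's topic distribution, the same authority table can be reused for every new item.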
SLIDE 16
Dataset
The authors use two real-world, publicly available datasets, both containing a social graph G = (V, E) and a log of past propagations D = {(User, Item, Time)}. The DIGG social graph contains 11,142 users and 99,846 directed arcs, while FLIXSTER contains 6,353 users and 84,606 directed arcs.
SLIDE 17
EXPERIMENTAL EVALUATION
SLIDE 18
EXPERIMENTAL EVALUATION
SLIDE 19
EXPERIMENTAL EVALUATION
SLIDE 20
SLIDE 21
Introduction
Although a few prior works do support topic-specific influence analysis, they either separate the analysis of content from that of network structure, or assume that content is the only cause of links, which is clearly an inappropriate assumption for microblog networks.
SLIDE 22
Contributions
- They propose a new Bayesian Bernoulli-Multinomial mixture model, FLDA, to jointly model both content and links in the same generative process.
- They discuss and implement a distributed Gibbs-sampling technique for training FLDA over large clusters.
SLIDE 23
Followship-LDA (FLDA)
SLIDE 24
- Topic-Specific Influence: the influence of user e on topic x is measured by $\sigma_{e|x}$, which is the probability of e being followed for topic x in the FLDA model.
- Content-Independent Popularity: the content-independent popularity of user e is measured by $\pi_e$, which is the probability of e being followed for any content-independent reason in the FLDA model.
SLIDE 25
Gibbs Sampling for FLDA
Distributed FLDA is implemented using Spark, a large-scale distributed processing framework specifically targeted at iterative machine-learning workloads.
SLIDE 26 QUERYING TOPICAL INFLUENCERS
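They propose a general search framework for topic-specific key influencers, called SKIT. SKIT allows a user to freely express his/her interests by typing a set of keywords. Then, SKIT returns a list of key influencers that satisfy the user's intent, ordered by their influence scores. The scores are built from two quantities:
- $\mathrm{INFL}(t; u) = \sigma_{e=u \mid x=t}$, the influence of user u on topic t.
- $W(t; q) = \theta_{z=t \mid m=q}$, the weight of topic t for the keyword query q.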
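One way to turn these two quantities into a ranking is sketched below. The slides do not state how INFL and W are combined, so the weighted sum over topics used here is an assumption, as are the function name and the data layout; the topic weights for the query are taken as given.

```python
def rank_influencers(topic_weights, influence, top_n):
    """Rank users by an assumed score: sum over topics t of W(t; q) * INFL(t; u).

    topic_weights[t] stands for W(t; q) for one query q;
    influence[u][t] stands for INFL(t; u).
    Returns (user, score) pairs, best first.
    """
    scores = {
        u: sum(w * infl for w, infl in zip(topic_weights, infl_u))
        for u, infl_u in influence.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

if __name__ == "__main__":
    influence = {"u1": [0.5, 0.0], "u2": [0.25, 0.5], "u3": [0.0, 0.25]}
    for user, score in rank_influencers([0.5, 0.5], influence, top_n=2):
        print(user, score)
```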
SLIDE 27
Dataset
SLIDE 28
SLIDE 29