Dynamic Processes over Information Networks
Representation, Modeling, Learning and Inference
Le Song, College of Computing, Georgia Institute of Technology
1
1
2
World Wide Web, electrical networks, social networks, transportation networks, information networks, protein interactions
3
4
David 1:00 pm: Cool picture
Sophie 1:01 pm: Indeed
David 1:18 pm: Funny joke
Sophie 1:19 pm: Yes
David 1:30 pm: Dinner together?
Sophie 1:31 pm: OK

David 1:00 pm: Cool picture
Sophie 1:14 pm: Indeed
David 1:15 pm: Funny joke
Sophie 1:29 pm: Yes
David 1:30 pm: Dinner together?
Sophie 1:50 pm: OK
Timing is critically important for event data.
Discrete-time models artificially introduce epochs:
How long should each epoch be? How should events within an epoch be aggregated? What if no event falls within an epoch? Time is treated as an index or conditioning variable, which makes time-related queries hard to answer.
5
[Figure: David's events $t_1, t_2, t_3$ on a timeline $[0, T]$, partitioned into Epoch 1, Epoch 2, Epoch 3; 1:00 pm Cool picture, 1:18 pm Funny joke, 1:30 pm Dinner together?]
6
Information spread, epidemiology, cyber-security, healthcare analytics, smart cities, wildlife conservation
7
[Figure: diffusion network over Christine, Sophie, David, Jacob, Bob; an arrow D → S means S follows D; seed event at 1:00 pm]
Rodriguez et al. ICML 2011 Du et al. NIPS 2012
8
[Figure: the same network with Tina added; D → S means S follows D; events at 1:00 pm, 1:18 pm, 1:43 pm, 1:53 pm]
9
[Figure: network over Christine, Sophie, David, Jacob, Bob, Olivia; D → S means S follows D. 1 pm, D: Cool paper; 1:10 pm, @D: Indeed; 1:15 pm, @S @D: Classic; 1:18 pm, @S @D: Very useful; 1:35 pm, @B @S @D: Indeed brilliant; 1:45 pm: a new link forms; 2 pm, D: Nice idea; 2:03 pm, @D: Agree]
Farajtabar et al. NIPS 2015
10
[Figure: users Christine, Sophie, David, Jacob and items $p_1, p_2, p_3, p_4$]
Du et al. NIPS 2015
11
Representation: introduce intensity
1. Intensity function 2. Basic building blocks 3. Superposition
Modeling: incorporate domain specifics
1. Idea adoption 2. Network coevolution 3. Collaborative dynamics
Inference: temporal queries
1. Time-sensitive recommendation 2. Scalable influence estimation
Learning: efficient algorithms
1. Sparse hidden diffusion networks 2. Low-rank collaborative dynamics 3. Generic algorithm
12
13
[Figure: David's events $t_1, t_2, t_3$ on a timeline $[0, T]$; history $\mathcal{H}_t$; when will the next event $t$ occur?]
The counting process $N(t) \in \{0\} \cup \mathbb{Z}_+$ records the number of events up to time $t$.
14
[Figure: events $t_1, t_2, t_3$ with history $\mathcal{H}_t$ on a timeline $[0, T]$]
Conditional density: $f^*(t) := f(t \mid \mathcal{H}_t)$, so the next event falls in $[t, t + \mathrm{d}t]$ with probability $f^*(t)\,\mathrm{d}t$.
Survival function: $S^*(t) = 1 - \int_{t_3}^{t} f^*(\tau)\,\mathrm{d}\tau$, the probability that no event has occurred since the last event $t_3$.
15
[Figure: events $t_1, t_2, t_3$ on $[0, T]$]
Likelihood of the observed sequence: $f^*(t_1)\, f^*(t_2)\, f^*(t_3)\, S^*(T)$
16
[Figure: events $t_1, t_2, t_3$ on $[0, T]$]
Likelihood: $f^*(t_1)\, f^*(t_2)\, f^*(t_3)\, S^*(T)$
Parameterizing the density directly, e.g. $f^*(t) = \exp\langle w, \psi(t)\rangle$, gives
$\exp\langle w, \psi(t_1)\rangle\, \exp\langle w, \psi(t_2)\rangle\, \exp\langle w, \psi(t_3)\rangle\, \Big(1 - \int_0^T \exp\langle w, \psi(\tau)\rangle\,\mathrm{d}\tau\Big)$
Not concave in $w$!
17
[Figure: timeline with events $t_1, t_2, t_3$, history $\mathcal{H}_t$, survival $S^*(t)$, density $f^*(t)\,\mathrm{d}t$]
Intensity: the probability of an event in $[t, t + \mathrm{d}t]$ given no event before $t$:
$\lambda^*(t)\,\mathrm{d}t = \frac{f^*(t)\,\mathrm{d}t}{S^*(t)} > 0$
so $f^*(t) = \lambda^*(t)\, S^*(t)$ and $S^*(t) = \exp\Big(-\int_{t_3}^{t} \lambda^*(\tau)\,\mathrm{d}\tau\Big)$
18
Relations among the quantities; the intensity $\lambda^*(t)$ is the central quantity to parameterize:
$f^*(t) = \lambda^*(t)\, \exp\Big(-\int_{t_{j-1}}^{t} \lambda^*(\tau)\,\mathrm{d}\tau\Big)$
$F^*(t) = 1 - \exp\Big(-\int_{t_{j-1}}^{t} \lambda^*(\tau)\,\mathrm{d}\tau\Big)$
$S^*(t) = 1 - F^*(t) = \exp\Big(-\int_{t_{j-1}}^{t} \lambda^*(\tau)\,\mathrm{d}\tau\Big)$
$\lambda^*(t) = \frac{f^*(t)}{1 - F^*(t)}$
19
[Figure: events $t_1, t_2, t_3$ on $[0, T]$]
Likelihood: $\lambda^*(t_1)\, \lambda^*(t_2)\, \lambda^*(t_3)\, \exp\Big(-\int_0^T \lambda^*(\tau)\,\mathrm{d}\tau\Big)$
With a linear parameterization $\lambda^*(t) = \langle w, \phi(t)\rangle$:
$\langle w, \phi(t_1)\rangle\, \langle w, \phi(t_2)\rangle\, \langle w, \phi(t_3)\rangle\, \exp\Big(-\int_0^T \langle w, \phi(\tau)\rangle\,\mathrm{d}\tau\Big)$
Log-likelihood: $\sum_{j=1}^{n} \log\langle w, \phi(t_j)\rangle - \langle w, \Phi(T)\rangle$, where $\Phi(T) = \int_0^T \phi(\tau)\,\mathrm{d}\tau$. Concave in $w$!
20
Uniformly random occurrences: inter-event times follow an exponential distribution.
21
[Figure: events $t_1, t_2, t_3$ on David's timeline]
Homogeneous Poisson process: $\lambda^*(t) = \mu$
The intensity is independent of the history.
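A homogeneous Poisson process like this can be simulated by drawing i.i.d. exponential inter-event gaps. A minimal sketch (the rate and horizon values are illustrative, not from the slides):

```python
import random

def simulate_homogeneous_poisson(mu, T, rng=random.Random(0)):
    """Draw event times on [0, T] for a constant-intensity process.

    Inter-event gaps are i.i.d. Exponential(mu), so the intensity
    lambda*(t) = mu is independent of the history.
    """
    events, t = [], 0.0
    while True:
        t += rng.expovariate(mu)  # gap ~ Exp(mu), mean 1/mu
        if t > T:
            return events
        events.append(t)

events = simulate_homogeneous_poisson(mu=2.0, T=100.0)
# With mu = 2 events per unit time, we expect roughly 200 events on [0, 100].
print(len(events))
```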
22
[Figure: events $t_1, t_2, t_3$ on David's timeline]
Inhomogeneous Poisson process: $\lambda^*(t) = g(t)$, still independent of the history.
Let $\lambda^*(t)$ be a positive combination of basis functions.
23
[Figure: the intensity $\lambda^*(t)$ as a sum of bumps]
$\lambda^*(t) = \sum_{m=1}^{M} \beta_m\, k(\tau_m, t)$
with Gaussian RBF kernel $k(\tau, t) = \exp\Big(-\frac{(\tau - t)^2}{2\sigma^2}\Big)$, centers $\tau_1, \dots, \tau_7$ on $[0, T]$, and weights $\beta_1, \dots, \beta_7$.
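Evaluating such a basis-expanded intensity is straightforward. A minimal sketch (the centers, weights, and bandwidth below are illustrative assumptions):

```python
import math

def rbf_intensity(t, centers, weights, bandwidth=1.0):
    """Inhomogeneous Poisson intensity as a positive combination of
    Gaussian RBF basis functions: lambda(t) = sum_m beta_m * k(tau_m, t)."""
    assert all(b >= 0 for b in weights), "nonnegative weights keep the intensity positive"
    return sum(
        b * math.exp(-((c - t) ** 2) / (2.0 * bandwidth ** 2))
        for c, b in zip(centers, weights)
    )

centers = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]   # tau_1 .. tau_7 (illustrative)
weights = [0.5, 1.2, 0.8, 0.1, 0.9, 1.5, 0.3]   # beta_1 .. beta_7 (illustrative)
# The intensity peaks near centers with large weights:
print(rbf_intensity(6.0, centers, weights))
```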
Limited number of occurrences
24
[Figure: a single event on David's timeline]
Terminating process: $\lambda^*(t) = (1 - N(t))\, g(t)$
The intensity drops to zero after the first event.
Clustered occurrences
25
[Figure: events $t_1, t_2, t_3$ on David's timeline]
Hawkes process:
$\lambda^*(t) = \mu + \alpha \sum_{t_j \in \mathcal{H}_t} \exp(-|t - t_j|) = \mu + \alpha\, \exp(-t) \star \mathrm{d}N(t)$
The decaying exponential is the triggering kernel.
Thinning procedure (similar to rejection sampling)
26
[Figure: events $t_1, t_2, t_3$; current upper bound $\lambda_0 = \lambda^*(t_3)$]
$\lambda^*(t) = \mu + \alpha \sum_{t_j \in \mathcal{H}_t} \exp(-|t - t_j|)$
Sample $t$ from a homogeneous Poisson process with intensity $\lambda_0$, i.e. $t \sim -\frac{1}{\lambda_0} \ln U[0,1]$, then keep the sample with probability $\lambda^*(t)/\lambda_0$.
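The thinning procedure described here can be sketched as follows. The unit-rate exponential triggering kernel matches the slides, while the parameter values and the conservative choice of upper bound are illustrative assumptions:

```python
import math
import random

def hawkes_intensity(t, history, mu=0.5, alpha=0.8):
    """lambda*(t) = mu + alpha * sum_j exp(-(t - t_j)) over past events."""
    return mu + alpha * sum(math.exp(-(t - tj)) for tj in history if tj < t)

def simulate_by_thinning(T, mu=0.5, alpha=0.8, rng=random.Random(1)):
    """Ogata-style thinning: propose candidates from a homogeneous process
    whose rate upper-bounds the intensity, then accept each candidate with
    probability lambda*(t) / bound (like rejection sampling)."""
    events, t = [], 0.0
    while t < T:
        # Conservative upper bound: the intensity only decays between events,
        # and the extra `alpha` covers a jump at the current time.
        bound = hawkes_intensity(t, events, mu, alpha) + alpha
        t += rng.expovariate(bound)              # candidate from rate `bound`
        if t >= T:
            break
        if rng.random() <= hawkes_intensity(t, events, mu, alpha) / bound:
            events.append(t)                     # accept with prob lambda*/bound
    return events

events = simulate_by_thinning(T=50.0)
print(len(events))
```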
27
28
[Figure: one timeline per additive component: the baseline $\mu$ produces a candidate $\tau$, and each triggering term $\alpha \exp(-|t - t_j|)$ for $j = 1, 2, 3$ produces candidates $\tau_1, \tau_2, \tau_3$]
$t = \min(\tau, \tau_1, \tau_2, \tau_3)$: sampling each component intensity and taking the minimum samples from the additive intensity
$\lambda^*(t) = \mu + \alpha \sum_{t_j \in \mathcal{H}_t} \exp(-|t - t_j|)$
Clustered occurrences affected by neighbors
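The superposition idea, sampling each additive component and keeping the minimum, can be sketched as below. The inverse-transform step for the decaying kernel and all parameter values are illustrative assumptions:

```python
import math
import random

def next_event_by_superposition(t0, history, mu=0.5, alpha=0.8, rng=random.Random(2)):
    """Sample the next event time after t0 of a Hawkes process with
    lambda*(t) = mu + alpha * sum_j exp(-(t - t_j)) by sampling each
    additive component separately and taking the minimum candidate."""
    candidates = [t0 + rng.expovariate(mu)]  # baseline component, rate mu
    for tj in history:
        # Inverse-transform sampling from the decaying kernel alpha*exp(-(t - tj)):
        # its remaining mass after t0 is finite, so it may yield no candidate.
        u = rng.random()
        rhs = math.exp(-(t0 - tj)) + math.log(u) / alpha
        if rhs > 0:
            candidates.append(tj - math.log(rhs))
    return min(candidates)

history = [0.3, 1.1, 2.4]
t_next = next_event_by_superposition(2.5, history)
print(t_next)  # strictly later than t0 = 2.5
```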
29
[Figure: David's events $t_1^D, t_2^D, t_3^D$ and Sophie's events $t_1^S, t_2^S, t_3^S$ on parallel timelines]
Mutually-exciting process:
$\lambda_D^*(t) = \mu + \alpha_D \sum_{t_j^D \in \mathcal{H}_t^D} \exp(-|t - t_j^D|) + \alpha_{DS} \sum_{t_j^S \in \mathcal{H}_t^S} \exp(-|t - t_j^S|)$
Limited number of occurrences affected by neighbors
30
[Figure: David's and Sophie's timelines]
Terminating process driven by a neighbor:
$\lambda_D^*(t) = (1 - N_D(t))\, \Big(\alpha_{DS} \sum_{t_j^S \in \mathcal{H}_t^S} \exp(-|t - t_j^S|)\Big)$
31
32
[Figure: diffusion network over Christine, Sophie, David, Jacob, Bob; D → S means S follows D; seed event at 1:00 pm]
Rodriguez et al. ICML 2011; Du et al. NIPS 2012
33
[Figure: the same network; adoption events at 1:00 pm, 1:10 pm, 1:15 pm, 1:18 pm, 1:25 pm]
34
[Figure: Christine, Sophie, David, Jacob, Bob; D → S means S follows D; D is the source, so $N_D(t) = 1$; counting processes of the other users, with events at 1:00 pm, 1:10 pm, 1:15 pm, 1:18 pm, 1:25 pm]
Intensity of a follower $u$ with followees $v$:
$\lambda_u^*(t) = \sum_{v} \alpha_{uv}\, (1 - N_u(t))\, N_v(t)$
Terminating process: a user adopts the product only once; the intensity is positive only while $u$ has not yet adopted ($1 - N_u(t)$) and some followee has ($N_v(t)$).
35
[Figure: the same network; a cascade with events at 1:00 pm, 1:08 pm, 1:17 pm, 1:25 pm, 1:48 pm]
36
[Figure: two cascades over the same network, one from 1:00 pm to 1:48 pm and another from 2:00 pm to 2:53 pm]
37
[Figure: three cascades over the same network: 1:00 pm to 1:48 pm, 2:00 pm to 2:53 pm, and 7:00 pm to 7:50 pm]
38
[Figure: the three cascades over the same network]
A cascade is a sequence of (node, time) pairs for a particular piece of news. Cascades can start from different sources.
39
[Figure: the data as a matrix with one row per user $1, \dots, n$ and one column per cascade; each cascade is recorded as a vector of activation times $(t_1, t_2, t_3, \dots, t_n)$]
40
[Figure: a new user, Tina, joins the network]
41
[Figure: network over Christine, Sophie, David, Jacob, Bob, Olivia; D → S means S follows D. 1 pm, D: Cool paper; 1:10 pm, @D: Indeed; 1:15 pm, @S @D: Classic; 1:18 pm, @S @D: Very useful; 1:35 pm, @B @S @D: Indeed brilliant; 1:45 pm: a new link forms; 2 pm, D: Nice idea; 2:03 pm, @D: Agree]
Farajtabar et al. NIPS 2015
42
Tweet/retweet event sequence: 1 pm, D: Cool paper → (D, D, 1:00); 1:35 pm, @B @S @D: Indeed brilliant → (J, D, 1:35); 4 pm, B: It snows → (B, B, 4:00); 4:10 pm, @B: Beautiful → (J, B, 4:10); 5 pm, J: Going out → (J, J, 5:00). These live on the (user, source) grid (J, J), (J, D), (J, B), …
Link creation event sequence: 1:45 pm → (J, D, 1:45); 5:25 pm → (J, S, 5:25), on the grid (J, J), (J, D), (J, S), …
[Figure: the network over Christine, Sophie, David, Jacob, Bob]
43
[Figure: the network with link indicators and per-pair retweet counting processes for (D, D), (B, D), (C, D), (J, D)]
D's own initiative (tweets): $\lambda_D^*(t) = \eta$, a constant.
Retweet intensity of user $u$ for source $s$ (mutually-exciting process; high if followees retweet frequently):
$\lambda_{us}^*(t) = \beta \sum_{v} A_{uv}(t)\, \exp(-t) \star \mathrm{d}N_{vs}(t)$
where $A_{uv}(t) \in \{0,1\}$ indicates whether the link $u \to v$ exists at time $t$ and $N_{vs}(t)$ counts $v$'s retweets of source $s$.
44
Link creation intensity of user $u$ toward source $s$:
$\lambda_{us}^{\text{link}}(t) = (1 - A_{us}(t)) \cdot \Big( \mu_u + \alpha\, \exp(-t) \star \mathrm{d}N_{us}(t) \Big)$
Terminating process: the factor $1 - A_{us}(t)$ checks whether the link is already there. $\mu_u$ captures $u$'s random exploration; the self-exciting retweet term creates a link when there is no link yet and $u$ retweets $s$ often.
[Figure: Jacob links to David at 1:45 pm]
45
The diffusion network $A(t) \in \{0,1\}$ and the information diffusion process $N(t) \in \{0\} \cup \mathbb{Z}_+$ coevolve: diffusion events drive link creation, and links support and alter diffusion. Mutually-exciting process + terminating process.
46
47
When the link-creation excitation parameter is zero, Erdős–Rényi random networks emerge; when it is large, scale-free networks emerge.
The model generates networks with a small, shrinking diameter.
48
As small connected components merge, the diameter shrinks.
The model generates short, fat cascades as the excitation parameter increases.
49
$\beta = 0.2$
50
51
[Figure: users Christine, Sophie, David, Jacob and items $p_1, \dots, p_4$]
For each user-item pair $(u, i)$, a self-exciting process (one tends to go to the same store again and again):
$\lambda_{ui}^*(t) = \mu_{ui} + \alpha_{ui} \sum_{t_j^{ui} \in \mathcal{H}_t^{ui}} \exp(-|t - t_j^{ui}|)$
The base-rate matrix $(\mu_{ui})$ and the self-excitation matrix $(\alpha_{ui})$ are modeled as low rank.
52
53
54
[Figure: Christine, Sophie, David, Jacob, Bob; D → S means S follows D; D is the source, $N_D(t) = 1$; events at 1:00 pm, 1:10 pm, 1:15 pm, 1:18 pm, 1:25 pm]
$\lambda_u^*(t) = \sum_{v} \alpha_{uv}\, (1 - N_u(t))\, N_v(t)$
Terminating process: adopt the product only once; the intensity is positive only if a followee has adopted and $u$ has not yet.
Parameterization: $w = (\alpha_{uv})_v$, the vector of incoming edge weights of node $u$.
55
[Figure: events $t_1, t_2, t_3, \dots$ on $[0, T]$]
With $\lambda^*(t) = \langle w, \phi(t)\rangle$, minimize the $\ell_1$-regularized negative log-likelihood
$M(w) + \mu \|w\|_1 = \langle w, \Phi(T)\rangle - \sum_{j=1}^{n} \log\langle w, \phi(t_j)\rangle + \mu \|w\|_1$
This is an $\ell_1$-regularized likelihood estimation problem; solve one such problem for each node. Projected proximal gradient:
Set learning rate $\gamma$; $l = 0$; initialize $w$.
While $l \le L$, do
$w^{l+1} = \big[\, w^l - \gamma\, \nabla_w M(w^l) - \mu\gamma \,\big]_+$
$l = l + 1$
End while
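The projected proximal gradient loop can be sketched on a toy problem. The objective form follows the slides, while the dimensions, step size, and data below are illustrative assumptions:

```python
def soft_threshold_nonneg(x, thresh):
    """[x - thresh]_+ : prox of mu*||w||_1 combined with projection onto w >= 0."""
    return [max(v - thresh, 0.0) for v in x]

def prox_gradient(Phi, feats, mu=0.1, gamma=0.01, iters=500):
    """Minimize <w, Phi> - sum_j log<w, feats_j> + mu*||w||_1 over w >= 0.

    Phi:   integrated feature vector Phi(T) (list of floats)
    feats: list of per-event feature vectors phi(t_j)
    """
    d = len(Phi)
    w = [1.0] * d
    for _ in range(iters):
        grad = list(Phi)                       # gradient of the linear term
        for phi in feats:
            dot = sum(wk * pk for wk, pk in zip(w, phi))
            for k in range(d):
                grad[k] -= phi[k] / max(dot, 1e-12)   # gradient of -log<w, phi>
        w = soft_threshold_nonneg(
            [wk - gamma * gk for wk, gk in zip(w, grad)], mu * gamma
        )
    return w

# Toy problem: dimension 0 explains the three events, dimension 1 is noise,
# so the l1 penalty should drive w[1] exactly to zero.
w = prox_gradient(Phi=[2.0, 2.0], feats=[[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
print(w)
```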
56
Recovery conditions:
The eigenvalues of the Hessian $Q = \nabla_w^2 M$ are bounded in $[C_{\min}, C_{\max}]$ (network structure)
The gradient is upper bounded: $\|\nabla_w M\|_\infty \le C_1$ (parameter values)
The hazard is lower bounded: $\min_k w_k \ge C_2$ (source node distribution)
Incoherence condition: $\|Q_{S^c S}\, Q_{SS}^{-1}\|_\infty \le 1 - \epsilon$
Given $n > C_3 \cdot d^3 \log p$ cascades, and setting the regularization parameter $\mu \ge C_4 \cdot \frac{2-\epsilon}{\epsilon}\sqrt{\frac{\log p}{n}}$, the network structure can be recovered with probability at least $1 - 2\exp(-C''\mu^2 n)$.
57
58
59
Blogs Mainstream media
Nan et al. NIPS 2012
60
61
62
[Figure: users Christine, Sophie, David, Jacob and items $p_1, \dots, p_4$]
$\lambda_{ui}^*(t) = \mu_{ui} + \alpha_{ui} \sum_{t_j^{ui} \in \mathcal{H}_t^{ui}} \exp(-|t - t_j^{ui}|)$
Self-exciting process: one tends to go to the same store again and again.
Regularization: nuclear norms $\|(\mu_{ui})\|_*$ and $\|(\alpha_{ui})\|_*$ encourage both parameter matrices to be low rank.
63
64
[Figure: events $t_1, t_2, t_3$ on $[0, T]$]
Likelihood: $\lambda^*(t_1)\,\lambda^*(t_2)\,\lambda^*(t_3)\,\exp\big(-\int_0^T \lambda^*(\tau)\,\mathrm{d}\tau\big)$; with $\lambda^*(t) = \langle w, \phi(t)\rangle$ the log-likelihood is
$M(w) = \sum_{j=1}^{n} \log\langle w, \phi(t_j)\rangle - \langle w, \Phi(T)\rangle$
Concave in $w$!
65
[Figure: events $t_1, t_2, \dots, t_n$ on a timeline]
Negative log-likelihood:
$\min_{w \in \mathbb{R}_+^d}\; \langle w, \Phi(T)\rangle - \sum_{j=1}^{n} \log\langle w, \phi(t_j)\rangle + \mu\|w\|_1$
Existing first-order methods need $O(1/\epsilon^2)$ iterations: the $\log$ term is non-Lipschitz, so its gradient is unbounded near zero.
66
Negative log-likelihood:
$\min_{w \in \mathbb{R}_+^d}\; \langle w, \Phi(T)\rangle - \sum_{j=1}^{n} \log\langle w, \phi(t_j)\rangle + \mu\|w\|_1$
Fenchel dual of each $-\log$ term:
$-\log\langle w, \phi(t_j)\rangle = \max_{v_j > 0}\; -v_j\,\langle w, \phi(t_j)\rangle + \log v_j + 1$
which turns the problem into the saddle-point form (up to a constant)
$\min_{w \in \mathbb{R}_+^d} \max_{v_j > 0}\; \langle w, \Phi(T)\rangle - \sum_{j=1}^{n} v_j\,\langle w, \phi(t_j)\rangle + \sum_{j=1}^{n} \log v_j + \mu\|w\|_1$
He et al., arXiv 2016
67
$\min_{w \in \mathbb{R}_+^d} \max_{v_j > 0}\; \langle w, \Phi(T)\rangle - \sum_{j=1}^{n} v_j\,\langle w, \phi(t_j)\rangle + \sum_{j=1}^{n} \log v_j + \mu\|w\|_1$
The coupling term is a bilinear form $M(w, v)$. Given the current iterate $(w^t, \{v_j^t\})$:
Gradient step: $\hat{w}_k = w_k^t - \delta\, \nabla_{w_k} M(w^t, v^t)$, $\quad \hat{v}_j = v_j^t + \delta\, \nabla_{v_j} M(w^t, v^t)$
Proximal step: $w_k \leftarrow [\hat{w}_k - \mu\delta]_+$, $\quad v_j \leftarrow \dfrac{\hat{v}_j + (\hat{v}_j^2 + 4\delta)^{1/2}}{2}$
The second update is the closed-form proximal operator of $-\log v_j$.
68
$\min_{w \in \mathbb{R}_+^d} \max_{v_j > 0}\; \langle w, \Phi(T)\rangle - \sum_{j=1}^{n} v_j\,\langle w, \phi(t_j)\rangle + \sum_{j=1}^{n} \log v_j + \mu\|w\|_1$
Mirror-prox (extragradient) on the bilinear form $M(w, v)$: from $(w^t, \{v_j^t\})$, take a gradient-plus-proximal step to an intermediate point $(\hat{w}, \hat{v})$, then re-evaluate the gradients at $(\hat{w}, \hat{v})$ and apply the same gradient-plus-proximal update to $(w^t, v^t)$ to obtain $(w^{t+1}, v^{t+1})$.
This accelerated scheme needs only $O(1/\epsilon)$ iterations, versus $O(1/\epsilon^2)$ for standard first-order methods.
69
[Figure: convergence of the accelerated versus unaccelerated gradient methods]
70
71
[Figure: users Christine, Sophie, David, Jacob and items $p_1, \dots, p_4$]
Return-time prediction: when will David buy the item? Predict the expected return time $\int_t^{\infty} \tau\, f_{ui}^*(\tau)\,\mathrm{d}\tau$.
Next-item prediction: which item will David buy next? $\max_i\, \lambda_{ui}^*(t)$.
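The return-time query (when will the user buy the item again) can be sketched as a direct numerical integration of $t\,f^*(t)$. The truncation horizon and step size below are illustrative, and the constant-intensity sanity check is an assumption for testing only:

```python
import math

def expected_return_time(intensity, t0, horizon=200.0, dt=0.01):
    """Numerically evaluate E[t] = int_{t0}^{inf} t * f*(t) dt with
    f*(t) = lambda*(t) * exp(-int_{t0}^{t} lambda*(s) ds),
    truncating the integral at t0 + horizon."""
    total, cum = 0.0, 0.0   # cum approximates int_{t0}^{t} lambda*(s) ds
    t = t0
    while t < t0 + horizon:
        lam = intensity(t)
        total += t * lam * math.exp(-cum) * dt
        cum += lam * dt
        t += dt
    return total

# Sanity check: for a constant intensity mu the exact answer is t0 + 1/mu.
mu = 2.0
print(expected_return_time(lambda t: mu, t0=1.0))  # approximately 1.5
```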
Nan et al. NIPS 2015
Online records of music listening; the time unit is one hour. 1,000 users, 3,000 albums, 20,000 observed (user, album) pairs, and more than 1 million events.
72
[Figure: album prediction and return-time prediction results]
MIMIC II dataset: a collection of de-identified clinical visit records; the time unit is one week. 650 patients and 204 disease codes.
73
[Figure: diagnosis-code prediction and return-time prediction results]
74
75
[Figure: network over Christine, Sophie, David, Jacob, Bob; D → S means S follows D]
Influence estimation: can a piece of news spread to a million users in one month? $\sigma(s, t) := \mathbb{E}\big[\sum_{j \in V} N_j(t)\big]$ for source $s$.
Influence maximization: who is the most influential user? $\max_{s \in V}\, \sigma(s, t)$.
Source localization: where did the information originate? $\max_{s \in V,\, t \in [0, T]}$ likelihood of the partial cascade (e.g. observations at 1:18 pm, 1:30 pm, 2:00 pm).
Rodriguez et al. ICML 2012; Nan et al. NIPS 2013; Farajtabar et al. AISTATS 2015
76
[Figure: three sampled cascades from source D over Christine, Sophie, David, Jacob, Bob]
77
The sampled counts are $\sum_{j \in V} N_j(t) = 4$, $2$, and $3$, so
$\sigma(D, t) \approx \frac{4 + 2 + 3}{3} = 3$
78
[Figure: three sampled cascades from source C over the same network]
The sampled counts are $\sum_{j \in V} N_j(t) = 4$, $2$, and $2$, so
$\sigma(C, t) \approx \frac{4 + 2 + 2}{3} = 2.67$
80
[Figure: three sampled graphs over Christine, Sophie, David, Jacob, Bob]
Naive influence maximization $\max_{s \in V} \sigma(s, t)$: for each sampled graph and each node, run single-source shortest path, costing $O(|V|^2 + |E||V|)$ per graph and $O(m(|V|^2 + |E||V|))$ over $m$ samples.
Quadratic in $|V|$: not scalable!
83
[Figure: each node draws an i.i.d. label $r \sim e^{-r}$, e.g. 1.38, 0.33, 1.26, 0.29, 2.75; each source keeps the least label among the nodes it can reach, e.g. $r^*_D = 0.29$, $r^*_S = 0.29$, $r^*_B = 0.29$, $r^*_J = 1.26$, $r^*_C = 0.33$]
Computing all least labels takes time linear in the number of nodes and edges.
84
[Figure: a second independent draw of labels, 0.32, 3.70, 0.37, 1.97, 0.23; the least labels per source accumulate, e.g. $r^*_D = 0.29, 0.23$ and $r^*_C = 0.33, 3.70$]
Given $m$ i.i.d. samples $r \sim e^{-r}$, their minimum $r^*$ is distributed as $r^* \sim m\, e^{-m r}$. The influence can therefore be estimated from the least labels:
$\sigma(s, t) \approx \frac{m - 1}{\sum_{j=1}^{m} r^*_s(j)}$
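The least-label estimator can be sketched for static reachability. Note this is a simplified sketch (the full method samples transmission times and counts nodes reachable within a time window); the graph and sample counts below are illustrative:

```python
import random

def reachable(graph, source):
    """Nodes reachable from `source` (breadth-first search over a dict graph)."""
    seen, frontier = {source}, [source]
    while frontier:
        u = frontier.pop()
        for v in graph.get(u, []):
            if v not in seen:
                seen.add(v)
                frontier.append(v)
    return seen

def estimate_influence(graph, source, m=2000, rng=random.Random(3)):
    """Estimate |reachable set| with exponential labels: the minimum of |R|
    i.i.d. Exp(1) labels is Exp(|R|), so |R| ~= (m - 1) / (sum of m least labels)."""
    nodes = set(graph) | {v for vs in graph.values() for v in vs}
    R = reachable(graph, source)
    least_sum = 0.0
    for _ in range(m):
        labels = {v: rng.expovariate(1.0) for v in nodes}  # shared across sources
        least_sum += min(labels[v] for v in R)
    return (m - 1) / least_sum

graph = {"D": ["S", "C"], "S": ["J"], "C": [], "J": ["B"], "B": []}
# D reaches all 5 nodes, so the estimate should be close to 5.
print(estimate_influence(graph, "D"))
```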
85
[Figure: the sampled graphs]
Averaging over $C$ sampled graphs and $m$ label sets, each processed with one breadth-first search:
$\sigma(s, t) \approx \frac{1}{C} \sum_{c=1}^{C} \frac{m - 1}{\sum_{j=1}^{m} r^{*,c}_s(j)}$
Total cost $O(C\, m\, (|V| + |E|))$: linear, hence scalable.
86
Site: type of site
digg.com: popular news site
lxer.com: Linux and open-source news
exopolitics.blogs.com: political blog
mac.softpedia.com: Mac news and rumors
gettheflick.blogspot.com: pictures blog
urbanplanet.org: urban enthusiasts
givemeaning.blogspot.com: political blog
talkgreen.ca: environmental protection blog
curriki.org: educational site
pcworld.com: technology news
87
88
89
[Figure: timeline $0, t_1, t_2, \dots, t_j, \dots, t_n, T$ with a marker attached to each event]
Markers can be text, images, audio, or other simultaneously measured time series.
Nan et al. AISTATS 2013; Nan et al. KDD 2015
90
Bird migration, influenza spread, crime, smart cities
91
Time
Nan et al. KDD 2015
92
Recurrent Chinese Restaurant Process + Hawkes process = Dirichlet–Hawkes process:
$\theta_n \mid \theta_{1:n-1} \sim \sum_{k} \frac{\lambda_k(t_n)}{\sum_{k'} \lambda_{k'}(t_n) + \beta}\, \delta_{\theta_k} + \frac{\beta}{\sum_{k'} \lambda_{k'}(t_n) + \beta}\, H_0(\theta)$
where $\lambda_k$ is the triggering intensity of topic $k$ and $\beta$ is the concentration parameter.
93
Temporal dynamics are encoded in the triggering kernel, and each parametric form encodes prior knowledge: Poisson process, Hawkes process, self-correcting process, autoregressive conditional duration process.
Limitations: the model may be misspecified; it is hard to encode complex features or markers; it is hard to encode dependence structure.
94
Recurrent neural network + Marked temporal point processes
95
The hidden vector of the RNN learns a nonlinear dependency on the history; the times and markers of past events are fed in as input. The model outputs a general conditional density for the next event time and a multinomial distribution over the markers.
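One common way to combine an RNN with a point process (as in RMTPP-style models) is a conditional intensity of the form $\exp(v^\top h + w(t - t_j) + b)$, where $h$ is the hidden state after the last event. The sketch below is illustrative, with made-up hidden state and parameters:

```python
import math

def rmtpp_intensity(t, t_last, h, v, w, b):
    """RMTPP-style conditional intensity: the RNN hidden state h summarizes
    the past, and the exp keeps the intensity positive.
    lambda*(t) = exp( <v, h> + w * (t - t_last) + b )
    (v, w, b stand in for hypothetical learned parameters)."""
    return math.exp(sum(vi * hi for vi, hi in zip(v, h)) + w * (t - t_last) + b)

h = [0.2, -0.5, 0.1]            # hidden state after the last event (illustrative)
v, w, b = [1.0, 0.5, -0.3], -0.1, 0.2   # illustrative parameters
print(rmtpp_intensity(1.5, 1.0, h, v, w, b))
```

With $w < 0$ the intensity decays as time since the last event grows; with $w > 0$ it grows, so the same form covers both inhibiting and exciting dynamics.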
96
[Figure: learned intensity functions and time-prediction errors on data generated from autoregressive conditional duration (ACD), Hawkes, and self-correcting processes]
97
[Figure: time-prediction and marker-prediction results on the NYC Taxi, financial trading, Stack Overflow, and MIMIC-II datasets]
98
Representation
1. Intensity function 2. Basic building blocks 3. Superposition
Modeling
1. Idea adoption 2. Network coevolution 3. Collaborative dynamics
Inference
1. Time-sensitive recommendation 2. Scalable influence estimation
Learning
1. Sparse hidden diffusion networks 2. Low-rank collaborative dynamics 3. Generic algorithm
Learning sparse interdependency structure of continuous-time information diffusions
Scalable continuous-time influence estimation and maximization
Learning multivariate Hawkes processes with different structural constraints (sparse, low-rank, customized triggering kernels)
Learning low-rank Hawkes processes for time-sensitive recommendations
Efficient simulation of standard multivariate Hawkes processes
Learning multivariate self-correcting processes
Simulation of customized general temporal point processes
Basic residual analysis and model checking of customized temporal point processes
Visualization of triggering kernels, intensity functions, and simulated events
100
https://github.com/dunan/MultiVariatePointProcess
101
102
Sequence 1
Sequence 2
Sequence 3
103
104
105
https://github.com/dunan/MultiVariatePointProcess/blob/master/example/learning_network_structu re_exp_kernel.cc
106
107
$\phi_{kj}(t - t_k) = \sum_{m=1}^{M} \beta_m\, k(\tau_m, t - t_k)$
[Figure: RBF centers $\tau_1, \dots, \tau_7$ on $[0, T]$ with kernel $k(\tau, t - t_k)$]
108
https://github.com/dunan/MultiVariatePointProcess/blob/master/example/learning_network_struct ure_general_kernel.cc
109
110
https://github.com/dunan/MultiVariatePointProcess/blob/master/example/influence_maximizatio n.cc
111
For the demo, we assume pairwise Weibull transmission functions: each edge has its own scale parameter and shape parameter.
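Sampling a pairwise Weibull transmission time can be sketched with the standard library; the edge parameters below are illustrative assumptions:

```python
import random

def sample_weibull_transmission(scale, shape, rng=random.Random(4)):
    """Draw one transmission delay along an edge from Weibull(scale, shape)."""
    return rng.weibullvariate(scale, shape)

# Each edge carries its own (scale, shape) pair, e.g.:
edge_params = {("D", "S"): (1.0, 2.0), ("D", "C"): (0.5, 1.0)}
delays = {e: sample_weibull_transmission(a, b) for e, (a, b) in edge_params.items()}
print(delays)
```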
112
113
114
115
116
From to predict
https://github.com/dunan/MultiVariatePointProcess/blob/master/example/learning_lo wrank_hawkes.cc
117
user-id u, item-id i, time1, time2, time3, ……
time
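Reading that input format can be sketched as below; the field layout follows the line above, while the parsing details (comma separation, sorting the times) are assumptions:

```python
def parse_pair_events(lines):
    """Parse lines of the form 'user-id, item-id, time1, time2, ...' into
    a dict {(user, item): [t1, t2, ...]} with times sorted ascending."""
    data = {}
    for line in lines:
        parts = [p.strip() for p in line.split(",") if p.strip()]
        user, item = parts[0], parts[1]
        data[(user, item)] = sorted(float(t) for t in parts[2:])
    return data

sample = ["u1, i3, 0.5, 2.0, 1.2", "u2, i1, 4.0"]
events = parse_pair_events(sample)
print(events[("u1", "i3")])  # [0.5, 1.2, 2.0]
```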
118
119
120
121
122
123
Learning standard Hawkes processes
Support for customized triggering kernels for Hawkes processes
Learning standard self-correcting processes
Support for customized point processes
Basic residual analysis
Efficient simulations
……
Check out the project website:
http://www.cc.gatech.edu/%7Endu8/ptpack/html/index.html
124