Learning Temporal Point Processes via Reinforcement Learning (PowerPoint PPT Presentation)



SLIDE 1

Learning Temporal Point Processes via Reinforcement Learning

Shuang Li¹, Shuai Xiao², Shixiang Zhu¹, Nan Du³, Yao Xie¹, Le Song¹,²

¹Georgia Institute of Technology  ²Ant Financial  ³Google Brain

SLIDE 2

Motivation

- Event data: tweets/retweets, crime events, earthquakes, patient visits to hospital, financial transactions, ...
- Goal: learn the temporal pattern of event data.
- Event times are random.
- Events have a complex dependency structure.

[Figure: timeline of David's events t_1, t_2, t_3 with history ℋ_t and an unknown next event time t. Example: 1:00 pm "Cool picture", 1:18 pm "Funny joke", 1:30 pm "Dinner together?"]

SLIDE 3

Point Process Model

- Intensity function:

  λ(t | ℋ_t) dt = 𝔼[ N[t, t + dt) | ℋ_t ]

  where N[t, t + dt) is the number of events falling in the set [t, t + dt).

  Point Process            λ(t | ℋ_t) / temporal pattern
  Poisson                  constant
  Inhomogeneous Poisson    λ_0(t)
  Hawkes                   ν + β Σ_{t_i ∈ ℋ_t} exp(−|t − t_i|)

[Figure: example event timelines t_1, t_2, t_3, … for David under each model.]
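The Hawkes intensity in the table above can be evaluated directly from the event history. A minimal sketch (the function name and the values of the baseline ν and excitation weight β are illustrative, not from the slides):

```python
import numpy as np

def hawkes_intensity(t, history, nu=0.5, beta=0.8):
    """lambda(t | H_t) = nu + beta * sum_{t_i in H_t} exp(-(t - t_i))."""
    past = np.array([ti for ti in history if ti < t])
    return nu + beta * np.sum(np.exp(-(t - past)))

# Each past event excites the intensity, which then decays back toward nu.
events = [1.0, 2.5, 3.0]
rate = hawkes_intensity(3.5, events)
```

Before the first event the intensity equals the baseline ν; each event adds a bump that decays exponentially, which is what produces the clustered temporal pattern.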

SLIDE 4

Traditional Maximum-Likelihood Framework

- Model the conditional intensity λ_θ(t | ℋ_t) in a parametric or non-parametric form.
- Learn the model by maximizing the likelihood:

  L(t_1, t_2, …, t_n) = exp( − ∫_[0,T] λ_θ(t | ℋ_t) dt ) ∏_i λ_θ(t_i | ℋ_t)

- Danger: model misspecification!

[Figure: timeline of David's events t_1, …, t_5.]
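The log of the likelihood above is a sum of log-intensities at the observed events minus the integrated intensity (the compensator). A sketch with the integral approximated by a trapezoidal rule; the `intensity(t, events)` interface is an assumption for illustration and should itself condition only on events before t:

```python
import numpy as np

def log_likelihood(events, T, intensity, n_grid=1000):
    """log L = sum_i log lambda(t_i | H_t) - integral_0^T lambda(t | H_t) dt."""
    ll = sum(np.log(intensity(ti, events)) for ti in events)
    grid = np.linspace(0.0, T, n_grid)
    vals = np.array([intensity(t, events) for t in grid])
    ll -= np.sum((vals[:-1] + vals[1:]) / 2 * np.diff(grid))  # compensator
    return ll

# Sanity check against the closed form for a homogeneous Poisson process
# with rate mu on [0, T]: log L = n * log(mu) - mu * T.
poisson = lambda t, h: 0.5
ll = log_likelihood([1.0, 2.0], T=4.0, intensity=poisson)
```

Maximizing this quantity over θ is exactly the traditional framework the slides contrast against: the optimum is only as good as the assumed intensity family, hence the misspecification warning.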

SLIDE 5

New Reinforcement Learning Framework

- Learn a policy π_θ(a | ℋ_t) = q(t_i | t_{i−1}, …, t_1), where a ∈ ℝ_+ is the next event time, to maximize the cumulative reward.
- Learn a reward r(a) that guides the policy to imitate the observed event data (the expert).

[Figure: an LSTM policy q_θ(t | ℋ_t) reads the event history h_1, h_2, h_3 and generates event times to imitate the expert density q*(t) := q(t | ℋ_t).]
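A toy stand-in for rolling out such a policy: sample each inter-event time from an exponential distribution whose rate depends on the history so far. The history-dependent rate function here is an illustrative assumption, not the slides' LSTM policy:

```python
import numpy as np

def sample_trajectory(rate_fn, T, rng):
    """Generate event times on [0, T] from a history-dependent policy."""
    events, t = [], 0.0
    while True:
        rate = rate_fn(events)              # state-dependent "policy" rate
        t += rng.exponential(1.0 / rate)    # next event time a in R+
        if t > T:
            return events
        events.append(t)

rng = np.random.default_rng(0)
traj = sample_trajectory(lambda h: 1.0 + 0.2 * len(h), T=10.0, rng=rng)
```

In the actual framework the rate (or the full inter-event distribution) is produced by the LSTM's hidden state, so the policy can represent complex dependency structure rather than this simple count-based rule.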

SLIDE 6

Optimal Reward

- Inverse reinforcement learning:

  r* = max_{r ∈ ℱ} ( 𝔼_expert[ Σ_i r(t_i) ] − max_{π_θ} 𝔼_{π_θ}[ Σ_i r(a_i) ] )

- Choose ℱ to be the unit ball of a Reproducing Kernel Hilbert Space (RKHS); this yields an analytical optimal reward.
- Given L expert trajectories and M trajectories generated by the policy:

  r̂*(a) ∝ (1/L) Σ_{l=1}^{L} Σ_i k(t_i^(l), a) − (1/M) Σ_{m=1}^{M} Σ_i k(a_i^(m), a)

  where k(t, t′) is a universal RKHS kernel. The first term is the mean embedding of the expert intensity function; the second is the mean embedding of the policy intensity function.
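The analytical reward is just a difference of two kernel mean embeddings and is cheap to evaluate. A sketch using a Gaussian kernel (one choice of universal kernel); the bandwidth and the dropped proportionality constant are illustrative assumptions:

```python
import numpy as np

def empirical_reward(expert_trajs, policy_trajs, a, bandwidth=1.0):
    """r_hat*(a): expert mean embedding minus policy mean embedding at a."""
    k = lambda x, y: np.exp(-(x - y) ** 2 / (2 * bandwidth ** 2))
    expert = np.mean([sum(k(t, a) for t in traj) for traj in expert_trajs])
    policy = np.mean([sum(k(b, a) for b in traj) for traj in policy_trajs])
    return expert - policy

# Reward is positive at times where the expert puts more mass than the policy.
r = empirical_reward([[1.0, 2.0]], [[4.0, 5.0]], a=1.5)
```

When the policy's generated trajectories match the expert's, the two embeddings coincide and the reward signal vanishes, which is the fixed point the imitation procedure drives toward.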

SLIDE 7

Modeling Framework

Policy gradient: the policy π_θ(a | ℋ_t) generates event times a_1, a_2, a_3, a_4; the optimal reward r*(a_i) is updated from the expert trajectories t_1, t_2, t_3, and the policy is updated to maximize the cumulative reward.

[Figure: alternating updates between the policy and the optimal reward.]
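The policy update can be written as a REINFORCE step with the learned reward: θ ← θ + lr · 𝔼[ R(traj) · ∇_θ log p_θ(traj) ]. A sketch where `score_fn` (the policy's score function ∇_θ log p_θ) is an assumed interface, supplied by the policy model (an LSTM in the slides):

```python
import numpy as np

def reinforce_step(theta, trajs, reward, score_fn, lr=0.01):
    """One policy-gradient step averaged over sampled trajectories."""
    grad = np.zeros_like(theta)
    for traj in trajs:
        R = sum(reward(a) for a in traj)     # cumulative reward of rollout
        grad += R * score_fn(theta, traj)
    return theta + lr * grad / len(trajs)

# Example with a homogeneous exponential policy of rate theta on [0, T]:
# log p(traj) = n * log(theta) - theta * T, so the score is n / theta - T.
T = 2.0
score = lambda th, tr: np.array([len(tr) / th[0] - T])
theta = reinforce_step(np.array([1.0]), [[0.5, 1.5]], lambda a: 1.0, score)
```

Alternating this step with the closed-form reward update from the previous slide gives the overall training loop of the framework.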

SLIDE 8

Numerical Results

- Our method: RLPP
- Baselines:
  - State-of-the-art methods: RMTPP (Du et al., KDD 2016), WGANTPP (Xiao et al., NIPS 2017)
  - Parametric baselines: Inhomogeneous Poisson (IP), Hawkes (SE), Self-correcting (SC)
- Comparison of learned empirical intensity
- Comparison of runtime

[Figure: learned empirical intensity versus time index on three datasets, comparing Real, RLPP, WGAN, IP, SE, RMTPP, and SC.]

Runtime comparison:

  Method   RLPP   WGANTPP   RMTPP   SE   SC   IP
  Time     80m    1560m     60m     2m   2m   2m
  Ratio    40x    780x      30x     1x   1x   1x

SLIDE 9

Poster

- Tue, Dec 4th, 05:00 -- 07:00 PM
- Room 210 & 230 AB, poster #124