learning temporal point processes via reinforcement learning

Learning Temporal Point Processes via Reinforcement Learning - PowerPoint PPT Presentation



  1. Learning Temporal Point Processes via Reinforcement Learning. Shuang Li (1), Shuai Xiao (2), Shixiang Zhu (1), Nan Du (3), Yao Xie (1), Le Song (1,2). (1) Georgia Institute of Technology, (2) Ant Financial, (3) Google Brain.

  2. Motivation
  [Figure: timeline of one user's (David's) events at 1:00 pm ("Cool picture"), 1:18 pm ("Funny joke"), and 1:30 pm ("Dinner together?"), with the history \mathcal{H}_t up to the current time.]
  • Event data: tweets/retweets, crime events, earthquakes, patient visits to hospital, finance transactions, ...
  • Goal: learn the temporal pattern of event data.
  • Event times are random.
  • Complex dependency structure.

  3. Point Process Model
  • Intensity function: \lambda(t \mid \mathcal{H}_t)\,dt = \mathbb{E}\big[ N[t, t+dt) \mid \mathcal{H}_t \big], where N[t, t+dt) is the number of events falling in the set [t, t+dt).
  • Temporal patterns captured by different point processes:
      - Poisson: constant intensity \mu.
      - Inhomogeneous Poisson: time-varying intensity \mu(t).
      - Hawkes: \lambda(t \mid \mathcal{H}_t) = \nu + \beta \sum_{t_i \in \mathcal{H}_t} \exp(-|t - t_i|).
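  As a concrete illustration of the Hawkes intensity above, here is a minimal Python sketch; the event times, \nu, and \beta are made-up toy values, not numbers from the slides.

      import numpy as np

      def hawkes_intensity(t, history, nu=0.5, beta=0.8):
          """Exponential-kernel Hawkes intensity: nu + beta * sum_{t_i < t} exp(-(t - t_i))."""
          history = np.asarray(history)
          past = history[history < t]
          return nu + beta * np.exp(-(t - past)).sum()

      events = [1.0, 1.3, 2.7, 4.0]          # toy event times
      grid = np.linspace(0.0, 6.0, 7)
      print([round(hawkes_intensity(t, events), 3) for t in grid])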

  4. Traditional Maximum-Likelihood Framework
  • Model the conditional intensity \lambda_\theta(t \mid \mathcal{H}_t) in a parametric or non-parametric form, which risks model misspecification!
  • Learn the model by maximizing the likelihood
      L(t_1, t_2, \ldots, t_n) = \exp\!\Big( -\int \lambda_\theta(t \mid \mathcal{H}_t)\,dt \Big) \prod_i \lambda_\theta(t_i \mid \mathcal{H}_{t_i}).
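  A short sketch of this log-likelihood for the exponential-kernel Hawkes model from the previous slide (a toy illustration, not the paper's code); for this kernel the integral term has a closed form.

      import numpy as np

      def hawkes_loglik(events, T, nu, beta):
          """Log-likelihood of an exponential-kernel Hawkes process on [0, T]:
          sum_i log lambda(t_i) - integral_0^T lambda(t) dt. Events must be sorted."""
          events = np.asarray(events)
          loglik = 0.0
          for i, t in enumerate(events):
              lam = nu + beta * np.exp(-(t - events[:i])).sum()
              loglik += np.log(lam)
          # Closed-form compensator: nu*T + beta * sum_i (1 - exp(-(T - t_i)))
          compensator = nu * T + beta * (1.0 - np.exp(-(T - events))).sum()
          return loglik - compensator

      print(hawkes_loglik([1.0, 1.3, 2.7, 4.0], T=6.0, nu=0.5, beta=0.8))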

  5. New Reinforcement Learning Framework
  • Learn a policy \pi_\theta(a \mid s_t) = p(t_i \mid t_{i-1}, \ldots, t_1), where the action a \in \mathbb{R}_+ is the next event time, so as to maximize the cumulative reward.
  • Learn a reward r(a) that guides the policy to imitate the observed event data (the expert).
  [Figure: the policy \pi_\theta(a \mid s_t) is an LSTM whose hidden states h_1, h_2, h_3, ... summarize the generated history; each step outputs a density over the next event time, and the generated times a_1, a_2, a_3 imitate the expert history \mathcal{H}_t (David's events t_1, t_2, t_3).]
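  A minimal PyTorch sketch of such an LSTM policy, assuming (a simplification not stated on the slide) that each step emits the rate of an exponential distribution over the next inter-event gap; the class and method names are illustrative, not the authors'.

      import torch
      import torch.nn as nn

      class LSTMPolicy(nn.Module):
          """pi_theta(a | s_t): an LSTM reads the inter-event gaps generated so far and,
          at each step, outputs the rate of an exponential distribution over the next gap."""
          def __init__(self, hidden=32):
              super().__init__()
              self.hidden = hidden
              self.cell = nn.LSTMCell(1, hidden)
              self.rate = nn.Sequential(nn.Linear(hidden, 1), nn.Softplus())

          def sample(self, horizon=10.0):
              h = torch.zeros(1, self.hidden)
              c = torch.zeros(1, self.hidden)
              gap = torch.zeros(1, 1)              # placeholder "previous gap" for the first step
              t, times, logps = 0.0, [], []
              while True:
                  h, c = self.cell(gap, (h, c))
                  dist = torch.distributions.Exponential(self.rate(h).squeeze() + 1e-6)
                  gap_sample = dist.sample()
                  t += gap_sample.item()
                  if t > horizon:                  # stop once the rollout leaves the window
                      return times, logps
                  times.append(t)
                  logps.append(dist.log_prob(gap_sample))   # kept for the policy gradient
                  gap = gap_sample.view(1, 1)

      policy = LSTMPolicy()
      times, logps = policy.sample()
      print(times)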

  6. Optimal Reward via Inverse Reinforcement Learning
  • r^* = \max_{r \in \mathcal{F}} \Big( \mathbb{E}_{\text{expert}} \big[ \sum_i r(t_i) \big] - \max_{\pi_\theta} \mathbb{E}_{\pi_\theta} \big[ \sum_i r(a_i) \big] \Big).
  • Choose \mathcal{F} to be the unit ball in a Reproducing Kernel Hilbert Space (RKHS); then the optimal reward has an analytical form.
  • Given L expert trajectories and M trajectories generated by the policy,
      \hat{r}^*(a) \propto \frac{1}{L} \sum_{l=1}^{L} \sum_i k(t_i^{(l)}, a) - \frac{1}{M} \sum_{m=1}^{M} \sum_i k(a_i^{(m)}, a),
    i.e. the mean embedding of the expert intensity function minus the mean embedding of the policy intensity function, where k(t, t') is a universal RKHS kernel.
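  A sketch of this empirical reward with a Gaussian kernel standing in for the universal kernel k(t, t'); the bandwidth and the toy trajectories are assumptions for illustration only.

      import numpy as np

      def gaussian_kernel(x, y, sigma=1.0):
          # k(x, y) evaluated on all pairs of the two 1-D arrays
          return np.exp(-(np.asarray(x)[:, None] - np.asarray(y)[None, :]) ** 2 / (2 * sigma ** 2))

      def reward_hat(a, expert_seqs, policy_seqs, sigma=1.0):
          """r_hat*(a): mean kernel embedding of expert event times
          minus mean kernel embedding of policy-generated event times."""
          a = np.atleast_1d(a)
          expert = np.mean([gaussian_kernel(seq, a, sigma).sum(axis=0) for seq in expert_seqs], axis=0)
          policy = np.mean([gaussian_kernel(seq, a, sigma).sum(axis=0) for seq in policy_seqs], axis=0)
          return expert - policy

      expert_seqs = [[0.5, 1.1, 2.0], [0.7, 1.4, 2.2]]   # toy expert trajectories
      policy_seqs = [[0.3, 2.8], [1.9, 3.5]]             # toy policy rollouts
      print(reward_hat([1.0, 2.5], expert_seqs, policy_seqs))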

  7. Modeling Framework
  [Figure: the policy \pi_\theta(a \mid s_t) generates event times a_1, a_2, a_3, a_4; the optimal reward r^*(a_1), \ldots, r^*(a_4) is re-estimated against the expert times t_1, t_2, t_3; the policy is then updated by policy gradient, and the two steps alternate.]
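  A self-contained toy sketch of this alternating loop, using a one-parameter homogeneous Poisson process as the policy and a REINFORCE-style gradient; this is not the paper's implementation and is not tuned for convergence.

      import numpy as np

      rng = np.random.default_rng(0)
      T = 10.0

      def sample_poisson_seq(rate, horizon=T):
          """Toy policy: homogeneous Poisson process with the given rate on [0, horizon]."""
          t, seq = 0.0, []
          while True:
              t += rng.exponential(1.0 / rate)
              if t > horizon:
                  return np.array(seq)
              seq.append(t)

      def reward_hat(a, expert_seqs, policy_seqs, sigma=1.0):
          """Reward estimate at time a: expert minus policy kernel mean embeddings."""
          k = lambda seq: np.exp(-(np.asarray(seq) - a) ** 2 / (2 * sigma ** 2)).sum()
          return np.mean([k(s) for s in expert_seqs]) - np.mean([k(s) for s in policy_seqs])

      expert_seqs = [sample_poisson_seq(2.0) for _ in range(50)]   # stand-in "expert" data

      log_rate = np.log(0.5)                   # single policy parameter, theta = log rate
      for step in range(200):
          rate = np.exp(log_rate)
          policy_seqs = [sample_poisson_seq(rate) for _ in range(20)]   # roll out the policy
          grad = 0.0
          for seq in policy_seqs:
              R = sum(reward_hat(a, expert_seqs, policy_seqs) for a in seq)  # cumulative reward
              dlogp = len(seq) - rate * T      # d/d(log rate) of the Poisson log-likelihood on [0, T]
              grad += dlogp * R                # REINFORCE estimate of the policy gradient
          log_rate += 1e-3 * grad / len(policy_seqs)   # policy-gradient ascent step

      print("learned policy rate:", np.exp(log_rate))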

  8. Numerical Results
  • Our method: RLPP.
  • Baselines:
      - State-of-the-art methods: RMTPP (Du et al., KDD 2016), WGANTPP (Xiao et al., NIPS 2017).
      - Parametric baselines: Inhomogeneous Poisson (IP), Hawkes (SE), Self-correcting (SC).
  • Comparison of learned empirical intensity. [Figure: three panels of intensity vs. time index for the real data and for RLPP, WGAN, IP, SE, RMTPP, and SC.]
  • Comparison of runtime:
      Method   RLPP   WGANTPP   RMTPP   SE    SC    IP
      Time     80 m   1560 m    60 m    2 m   2 m   2 m
      Ratio    40x    780x      30x     1x    1x    1x

  9. Poster: Tue Dec 4th, 05:00 -- 07:00 PM @ Room 210 & 230 AB, #124
