

SLIDE 1

Local Independence Tests for Point Processes: Learning causality in event models

Nikolaj Thams, University of Copenhagen November 21st, 2019

Time to Event Data and Machine Learning Workshop Joint work with Niels Richard Hansen

SLIDE 2

  • Hawkes processes
  • Causality
  • Local independence test
  • Experimental results
  • Conclusion

SLIDE 3

Learning causality in event models?

[Figure: events of four processes a, b, c, h marked along a timeline from 0 to T]


SLIDE 5

Hawkes Processes

SLIDE 6

Point process

Point processes. A point process with marks V = {1, …, d} is a collection of random measures

    N^k = ∑_i δ_{T^k_i},

where T^k_i is the i'th event of type k. This defines counting processes t ↦ N^k_t := N^k(0, t].

[Figure: a sample path of N_t, jumping by one at the event times T_1, T_2, T_3]

If the compensator Λ^k_t of N^k_t equals ∫_0^t λ^k_s ds for some λ^k, then λ^k is the intensity of N^k. Observe that E N^k_t = ∫_0^t E λ^k_s ds.

Famous examples: the Poisson process (λ_t constant) and the Hawkes process (next slide).
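As a concrete illustration (our own, not from the slides), here is a minimal Python sketch of the simplest example above, the homogeneous Poisson process: i.i.d. Exp(λ) waiting times produce the event times T_i, and N_t counts the events in (0, t]. All names are illustrative.

```python
# Minimal sketch: simulate a homogeneous Poisson process with constant
# intensity lam on (0, T] via i.i.d. exponential waiting times.
import numpy as np

def simulate_poisson(lam, T, rng):
    """Return event times of a Poisson process with rate lam on (0, T]."""
    times = []
    t = rng.exponential(1.0 / lam)
    while t <= T:
        times.append(t)
        t += rng.exponential(1.0 / lam)
    return np.array(times)

rng = np.random.default_rng(0)
events = simulate_poisson(lam=0.5, T=20.0, rng=rng)
# N_T = number of events in (0, T]; here E[N_T] = lam * T = 10.
print(len(events), events[:3])
```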

SLIDE 7

Hawkes processes

Hawkes process. The process with intensity

    λ^k_t = β^k_0 + ∑_{v∈V} ∫_{−∞}^{t−} g_{vk}(t − s) N^v(ds) = β^k_0 + ∑_{v∈V} ∑_{T^v_i < t} g_{vk}(t − T^v_i)

is called the (linear) Hawkes process, with kernels g_{vk} for some integrable functions, e.g. g_{vk}(x) = β^{vk}_1 e^{−β^{vk}_2 x}.

This motivates using graphs for summarizing dependencies:

[Figure: intensity paths of two processes N^1 and N^2 over time, and the corresponding two-node dependence graph on {1, 2}]
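The conditional-intensity form above can be simulated by thinning. Below is a hedged Python sketch of Ogata-style thinning for a univariate linear Hawkes process with the exponential kernel g(x) = β_1 e^{−β_2 x} from the slide; the univariate restriction and all parameter names are our simplifications, and stability requires β_1/β_2 < 1.

```python
# Sketch of Ogata's thinning algorithm for a univariate linear Hawkes
# process with kernel g(x) = b1 * exp(-b2 * x).
import numpy as np

def simulate_hawkes(mu, b1, b2, T, rng):
    events, t = [], 0.0
    while t < T:
        # With an exponential kernel the intensity only decays between
        # events, so the current intensity is a valid upper bound.
        lam_bar = mu + sum(b1 * np.exp(-b2 * (t - s)) for s in events)
        t += rng.exponential(1.0 / lam_bar)
        if t >= T:
            break
        lam_t = mu + sum(b1 * np.exp(-b2 * (t - s)) for s in events)
        if rng.uniform() <= lam_t / lam_bar:  # accept with prob lam_t / lam_bar
            events.append(t)
    return np.array(events)

rng = np.random.default_rng(1)
ev = simulate_hawkes(mu=0.3, b1=0.5, b2=1.0, T=100.0, rng=rng)
print(len(ev))
```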


SLIDE 9

Causality


SLIDE 12

Causal inference

Static system. Structural Causal Models (SCMs) consist of functional assignments, summarized by parents in a graph:

    X_i := f_i(X_{pa(i)}, ε_i),   i ∈ V

[Figure: a graph on X_1, X_2, X_3, and the same graph under an intervention X_i := c]

Essential assumption: the SCM also describes the system under interventions X_i := c.

A graph G satisfies, in conjunction with a separation criterion ⊥:

  • the global Markov property, if A ⊥ B | C implies A ⊥⊥_P B | C;
  • faithfulness, if A ⊥⊥_P B | C implies A ⊥ B | C.

The global Markov property and faithfulness are the motivation for developing conditional independence tests in causality. See (Peters et al. 2017) for details.
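To make this concrete, here is a small sketch (our own, not from the talk) that samples from the linear SCM X1 → X2 → X3 and checks the implied conditional independence X1 ⊥⊥ X3 | X2 with a partial correlation; all function names are our own.

```python
# Data from the SCM X1 -> X2 -> X3; d-separation gives X1 ⊥ X3 | X2,
# which we check empirically via partial correlation.
import numpy as np

rng = np.random.default_rng(2)
n = 5000
X1 = rng.normal(size=n)
X2 = 0.8 * X1 + rng.normal(size=n)
X3 = 0.8 * X2 + rng.normal(size=n)

def partial_corr(x, y, z):
    """Correlation of x and y after regressing out z from both."""
    Z = np.column_stack([np.ones_like(z), z])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]

print(np.corrcoef(X1, X3)[0, 1])  # clearly nonzero: X1, X3 dependent
print(partial_corr(X1, X3, X2))   # near zero: X1 ⊥⊥ X3 | X2
```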

SLIDE 13

Causal inference: Dynamical system

Causal ideas have been generalized to the dynamical setting, e.g. (Didelez 2008; Mogensen, Malinsky, et al. 2018; Mogensen and Hansen 2018):

[Figure: a time-unrolled graph on nodes X^i_{t_j}, i = 1, 2, 3, at successive times t_1, t_2, t_3, …, together with the summary graph on X^1, X^2, X^3]

SLIDE 14

Local independence

Local independence. Let N be a marked point process. For subsets A, B, C ⊆ V, we say that B is locally independent of A given C if for every b ∈ B,

    λ^{b,A∪C}_t = E[λ^b_t | F^{A∪C}_t] has a version that is F^C_t-measurable,

and we write A ̸→ B | C. Heuristically, the intensity of b, when observing A ∪ C, depends only on events of C.

Under faithfulness assumptions, there exist algorithms for learning the causal graph (Meek 2014; Mogensen and Hansen 2018), by removing the edge a → b if a ̸→ b | C for some C. In practice, this requires an empirical test for independence!

SLIDE 15

Local independence test


SLIDE 17

Local independence test

We want to test

    H_0 : j ̸→ k | C,

equivalently, that λ^{k,C}_t is a version of λ^{k,C∪{j}}_t. We propose to fit

    λ^{k,C∪{j}}_t = β^k_0 + ∫^t g_{jk}(t − s) N^j(ds) + λ^{k,C}_t.

Then the test of H_0 : g_{jk} = 0 will have the right level if we estimate the true λ^{k,C}.

Problem: if there are latent variables, the marginalized model may not be a Hawkes process. So how do we estimate λ^{k,C} generally, to retain level?

[Figure: a graph on nodes j, k, c with a latent node h]

SLIDE 18

Volterra approximations

To develop a non-parametric fit for λ^C, we prove the following theorem, resembling Volterra series for continuous systems.

Theorem. Suppose that N is a stationary point process. There exists a sequence of functions h^α_N such that, letting

    λ^N_t = h^0_N + ∑_{n=1}^{N} ∑_{|α|=n} ∫_{−∞}^{t} ⋯ ∫_{−∞}^{t} h^α_N(t − s_1, …, t − s_n) N^{α_1}(ds_1) ⋯ N^{α_n}(ds_n),

we have λ^N_t → λ^C in probability as N → ∞.

SLIDE 19

Approximating intensity

λ^C approximations:

  • A1: Approximate by 2nd-order iterated integrals.
  • A2: Approximate kernels using tensor splines:

    h^α(x_1, …, x_n) ≈ ∑_{j_1=1}^{d} ⋯ ∑_{j_n=1}^{d} β^α_{j_1,…,j_n} b_{j_1}(x_1) ⋯ b_{j_n}(x_n)

In vector notation:

    λ^C_t(β) = β_0 + ∑_{v∈C} ∫_{−∞}^{t−} (β^v)^T Φ_1(t − s) N^v(ds) + ∑_{v_1,v_2∈C, v_2≥v_1} ∫_{−∞}^{t−} ∫_{−∞}^{t−} (β^{v_1v_2})^T Φ_2(t − s_1, t − s_2) N^{(v_1,v_2)}(ds_1, ds_2) =: β_C^T x^C_t

Expanding g_{jk} in the same spline basis, we get

    λ^{k,C∪{j}}_t = β^k_0 + ∫^t g_{jk}(t − s) N^j(ds) + λ^{k,C}_t = (β^j)^T x^j_t + β_C^T x^C_t =: β^T x_t
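A sketch of how the first-order part of the design vector x_t can be assembled: for each past event at lag t − s, evaluate a vector of basis functions and sum over events. We use Gaussian bumps on a lag grid as a stand-in for the spline basis Φ_1 of the talk; all names here are illustrative.

```python
# First-order features: x_t entries for one process are sum_{s<t} b_j(t-s).
import numpy as np

def basis(lag, centers, width=1.0):
    """Vector of basis functions (Gaussian bumps) evaluated at a lag."""
    return np.exp(-0.5 * ((lag - centers) / width) ** 2)

def first_order_features(t, event_times, centers, width=1.0):
    lags = t - event_times[event_times < t]
    if lags.size == 0:
        return np.zeros_like(centers)
    # broadcast: (num_events, 1) lags against (1, num_basis) centers
    return basis(lags[:, None], centers[None, :], width).sum(axis=0)

centers = np.linspace(0.5, 5.0, 6)     # lag grid carrying the basis
events = np.array([1.0, 2.5, 4.0])
x_t = first_order_features(t=5.0, event_times=events, centers=centers)
# lambda_t(beta) is then beta0 + beta @ x_t; second-order terms are
# built analogously from pairs of past events.
print(x_t.round(3))
```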

SLIDE 20

Maximum Likelihood Estimation

The likelihood is concave for linear intensities!

    log L_T(β) = ∫_0^T log(β^T x_t) N^k(dt) − β^T ∫_0^T x_t dt

We penalize with a roughness penalty:

    max_β  log L_T(β) − κ_0 β^T Ω β   s.t.  Xβ ≥ 0

The distribution of the maximum likelihood estimate is approximately normal:

    β̂ ∼ N( (I + 2κ_0 Ĵ_T^{−1} Ω) β_0 , Ĵ_T^{−1} K̂_T Ĵ_T^{−1} )   (approximately),

with K̂_T = ∫_0^T x_t x_t^T / (β̂^T x_t) dt and Ĵ_T = K̂_T − 2κ_0 Ω.
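A hedged sketch of this penalized, constrained fit, discretizing the integral term on a time grid. The design matrices X_ev (rows x_{t_i} at event times) and X_grid (rows x_t on the grid), the penalty matrix Ω and the scale κ_0 are assumed given; the starting point assumes nonnegative features so the constraint Xβ ≥ 0 holds initially.

```python
# Penalized MLE sketch for a linear intensity lambda_t = beta^T x_t.
import numpy as np
from scipy.optimize import minimize, LinearConstraint

def fit_penalized(X_ev, X_grid, dt, Omega, kappa0):
    p = X_ev.shape[1]

    def neg_pen_loglik(beta):
        lam_ev = X_ev @ beta
        # log-likelihood: log-intensity summed over events, minus the
        # compensator integral approximated on the grid
        ll = np.sum(np.log(np.maximum(lam_ev, 1e-12)))
        ll -= np.sum(X_grid @ beta) * dt
        return -(ll - kappa0 * beta @ Omega @ beta)

    # keep the intensity nonnegative at event and grid times: X beta >= 0
    X_all = np.vstack([X_ev, X_grid])
    cons = LinearConstraint(X_all, lb=0.0)
    res = minimize(neg_pen_loglik, np.full(p, 0.1),
                   method="trust-constr", constraints=[cons])
    return res.x
```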

SLIDE 21

Local Independence Test

Given the distribution of β̂ = (β̂_j, β̂_C), we can test the hypothesis H_0 : j ̸→ k | C. How do we test Φ^T β_j ≡ 0?

  • First idea: β̂_j is approximately normal, so test β_j = 0 directly.
  • Better idea (see Wood 2012): evaluate the basis Φ on a grid G = {x_1, …, x_M}. The fitted function values over the grid are then Φ(G)^T β̂_j. If β̂_j ∼ N(µ_j, Σ_j), the Wald test statistic for the null hypothesis Φ(G)^T µ_j = 0 is

    T_α = (β̂_j)^T Φ(G) [ Φ(G)^T Σ_j Φ(G) ]^{−1} Φ(G)^T β̂_j    (1)

This is χ²(M)-distributed, and we can test for significance of components!
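The statistic (1) translates directly into code. A minimal sketch, assuming β̂_j, Σ_j and the grid-evaluated basis Φ(G) are given; we use a plain linear solve where Wood (2012) works with a rank-adjusted (pseudo-)inverse.

```python
# Grid-based Wald test: T = f^T [Phi^T Sigma Phi]^{-1} f with f = Phi(G)^T beta.
import numpy as np
from scipy.stats import chi2

def wald_grid_test(beta_j, Sigma_j, Phi_G):
    """beta_j: (p,), Sigma_j: (p, p), Phi_G: (p, M) basis values on the grid."""
    f = Phi_G.T @ beta_j            # fitted function values over the grid
    V = Phi_G.T @ Sigma_j @ Phi_G   # their covariance
    T = f @ np.linalg.solve(V, f)   # Wald statistic, eq. (1)
    M = Phi_G.shape[1]
    return T, chi2.sf(T, df=M)      # p-value under H0: g_jk = 0
```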

SLIDE 22

Summary of test

We summarize our proposed test. To test j ̸→ k | C:

  • Approximate λ^C by the Volterra expansion at degree 2, with spline kernels.
  • Fit λ^{k,C∪{j}}(β) within this model class by penalized MLE.
  • Test Φ^T β_j ≡ 0 using grid evaluation and the Wald approximation.
  • If the test is accepted, conclude local independence.
SLIDE 23

Experimental results

SLIDE 24

Experiment 1: Testing various structures

In each of the following structures, we test a ̸→ b | {b} ∪ C:

[Figure: graph structures L1, L2, L3 and P1, P2, P3 on nodes a, b, c and a latent node h]

We obtain acceptance rates:

[Figure: bar chart of H_0 acceptance rates per structure; test outcomes marked accepted or rejected]


SLIDE 26

Causal discovery

We evaluate the performance of the test within the CA algorithm, which estimates the causal graph.

[Figure: event data for a, b, c, d on a timeline up to T; starting from the complete graph on {a, b, c, d}, the CA algorithm removes edges one by one, e.g. a → b after finding a ̸→ b | {b, c, d}]

SLIDE 27

Experiment 2: Causal discovery

We simulate random graphs, simulate a dataset from each graph, recover the graph from the dataset, and measure the Structural Hamming Distance (SHD) to the true graph.

SHD between G_1 and G_2: the minimum number of actions (flipping, adding or removing an edge) needed to turn G_1 into G_2.

[Figure: SHD to the true graph against the dimension of the graph, for the baseline and the LI test]
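A sketch of one common SHD convention for directed graphs, matching the definition above in that a flipped edge costs one action; the function and example are our own.

```python
# SHD: one action per unordered pair whose edges disagree (add, remove,
# or flip). A, B are boolean adjacency matrices with A[i, j] = (i -> j).
import numpy as np

def shd(A, B):
    d = A.shape[0]
    dist = 0
    for i in range(d):
        for j in range(i + 1, d):
            if (A[i, j], A[j, i]) != (B[i, j], B[j, i]):
                dist += 1
    return dist

A = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=bool)  # a -> b -> c
B = np.array([[0, 0, 0], [1, 0, 1], [0, 0, 0]], dtype=bool)  # a <- b -> c
print(shd(A, B))  # 1: the a-b edge is flipped
```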

SLIDE 28

Conclusion

SLIDE 29

Conclusion

  • Causal inference is possible in point process models, using conditional independence tests!
  • Facing latent components in a Hawkes model, the marginal process may not be Hawkes.
  • The Volterra expansions can overcome this model misspecification, by fitting a general functional form of intensities.
  • We propose a testing framework based on splines, with promising experimental results.

SLIDE 30

References i

Daley, Daryl J and David Vere-Jones (2007). An Introduction to the Theory of Point Processes: Volume II: General Theory and Structure. Springer Science & Business Media.

Didelez, Vanessa (2008). "Graphical models for marked point processes based on local independence". In: Journal of the Royal Statistical Society: Series B (Statistical Methodology) 70.1, pp. 245–264.

Meek, Christopher (2014). "Toward learning graphical and causal process models". In: Proceedings of the UAI 2014 Conference on Causal Inference: Learning and Prediction, Volume 1274. CEUR-WS.org, pp. 43–48.

SLIDE 31

References ii

Mogensen, Søren Wengel and Niels Richard Hansen (2018). "Markov equivalence of marginalized local independence graphs". In: arXiv preprint arXiv:1802.10163. To appear in Ann. Statist.

Mogensen, Søren Wengel, Daniel Malinsky, and Niels Richard Hansen (2018). "Causal learning for partially observed stochastic dynamical systems". In: 34th Conference on Uncertainty in Artificial Intelligence (UAI 2018). Association for Uncertainty in Artificial Intelligence (AUAI), pp. 350–360.

Peters, Jonas, Dominik Janzing, and Bernhard Schölkopf (2017). Elements of Causal Inference: Foundations and Learning Algorithms. MIT Press.

Wood, Simon N (2012). "On p-values for smooth components of an extended generalized additive model". In: Biometrika 100.1, pp. 221–228.

SLIDE 32

Questions?

SLIDE 33

Volterra: Sketch of proof I

First we show the representation at time 0, i.e. for λ_0.

  • 1. For any λ_0, use that 1_{|λ_0|<N} λ_0 → λ_0 in probability and 1_{|λ_0|<N} λ_0 ∈ L¹(F).
  • 2. Define F_τ = σ(T_1 ∧ τ, T_2 ∧ τ, …), and show that ∪_{τ≤0} L¹(F_τ) is dense in L¹(F), where F = σ(N_t, t < 0), via martingale convergence.
  • 3. Through a combinatorial argument, show that for λ_0 ∈ L¹(F_τ), 1_{N([τ,0])=1} λ_0 has an additive decomposition

        ∑_{n=1}^{N} β_n ∫_{[τ,0]} f(t_1) 1_{D_n} dN(t_n) → 1_{N([τ,0])=1} λ_0   a.s. as N → ∞

  • 4. Extend to 1_{N([τ,0])=M} λ_0 and sum all terms.
SLIDE 34

Volterra: Sketch of proof II

  • 5. Using time-homogeneity, λ(π, {N_s}_{s<π}) = λ(0, {N^π_s}_{s<0}), we extend the result to every time t.
  • 6. The extension to multivariate point processes is simple, using:

    ∫_{((−∞,0]×V)^n} h_n(t_1, v_1, …, t_n, v_n) N(dt_n × dv_n) = ∑_{|α|=n} ∫_{(−∞,0]^n} h^α(t_1, …, t_n) N^α(dt_n)

SLIDE 35

Local independence graphs

Local independence graph. For a point process with coordinates V = {1, …, d}, define the local independence graph G = (V, E) by

    E = {(a, b) | a → b | V \ {a}}

Example: [Figure: a local independence graph on three nodes a, b, c]

SLIDE 36

Graphs and µ-separation

[Figure: a graph on a, c_1, c_2, b, and a walk through it illustrating a collider]

µ-connection and separation. For G = (V, E), let a, b ∈ V and C ⊆ V. A µ-connecting walk p from a to b given C is a walk from a to b such that:

  • 1. p is non-trivial and its final edge points to b;
  • 2. a ∉ C;
  • 3. coll(p) ⊆ An(C);
  • 4. noncoll(p) ∩ C = ∅.

If no walks from a to b are µ-connecting given C, then a and b are µ-separated, and we write a ⊥_µ b | C.

SLIDE 37

Global Markov property

The following concepts relate local independence to a graph G:

Global Markov property: A ⊥_µ B | C implies A ̸→ B | C.
Faithfulness: A ̸→ B | C implies A ⊥_µ B | C.

The global Markov property makes the local independence graph "relevant" for understanding the underlying point process.

Recovering the graph using independence tests: assuming faithfulness and the global Markov property, (Meek 2014) proposes an algorithm which is guaranteed to return the true local independence graph, essentially by testing a ̸→ b | C for all a, b and sets C of increasing size.

SLIDE 38

Backup: The CA algorithm

Algorithm 1: Causal Analysis (CA) algorithm

    Initialize G = (V, E_CA) as a fully connected graph
    for v ∈ V do:
        n = 0
        while n < |pa(v)| do:
            for v′ ∈ pa(v) do:
                for C ⊆ pa(v) \ {v′} with |C| = n do:
                    if v′ ̸→ v | C then remove (v′, v) from E_CA
            n = n + 1
    return G = (V, E_CA)

In short: for pairs (v′, v), remove the edge v′ → v if there exists a set C such that v′ ̸→ v | C.
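The algorithm translates directly into Python. In the sketch below the local independence test is abstracted as a callback is_indep(u, v, C); this interface is our own, and whether v's own past is included in the conditioning is left to the callback.

```python
# Sketch of the CA algorithm: prune edges u -> v whenever some
# conditioning set C among v's remaining parents renders u locally
# independent of v.
from itertools import combinations

def ca_algorithm(V, is_indep):
    parents = {v: set(V) - {v} for v in V}   # start from the complete graph
    for v in V:
        n = 0
        while n < len(parents[v]):
            for u in sorted(parents[v]):     # snapshot; set may shrink
                for C in combinations(sorted(parents[v] - {u}), n):
                    if is_indep(u, v, set(C)):
                        parents[v].discard(u)  # remove edge u -> v
                        break
            n += 1
    return {(u, v) for v in V for u in parents[v]}
```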

SLIDE 39

Backup: P1 and P2

Definition.

  • ⊥ satisfies P1 if separation v′ ⊥ v | C for v′ ∉ C implies (v′, v) ∉ E.
  • ⊥ satisfies P2 if the lack of an edge (v′, v) implies the existence of a set C ⊆ pa(v) such that v′ ⊥ v | C.

The CA algorithm assumes both P1 and P2. d-separation satisfies P1, and δ- and µ-separation satisfy P1 and P2. We show that for any ⊥ satisfying P1 and P2, two graphs have the same separations exactly if they are equal.

SLIDE 40

Backup: Example of Local independence

Example: three children (a, b, c) throwing a ball, a → b → c → a → ⋯, each holding the ball for an Exp(1) waiting time. N^v counts the number of times child v has thrown the ball. This has intensities:

    λ^a_t = 1_{N^a_t = N^c_t},   λ^b_t = 1_{N^b_t < N^a_t},   λ^c_t = 1_{N^c_t < N^b_t}

We find b ̸→ a | {a, c} and a → b | {b, c}, because:

    λ^{a,{a,b,c}}_t = E[λ^a_t | F^{{a,b,c}}_t] = 1_{N^a_t = N^c_t} ∈ F^{{a,c}}_t
    λ^{b,{a,b,c}}_t = E[λ^b_t | F^{{a,b,c}}_t] = 1_{N^b_t < N^a_t} ∉ F^{{b,c}}_t

Also a → a | {b, c}, because λ^{a,{a,b,c}}_t = 1_{N^a_t = N^c_t} ∉ F^{{b,c}}_t.
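For intuition, a minimal simulation of this example (our own code): the current holder throws the ball after an Exp(1) waiting time, in the cycle a → b → c → a, and each throw is an event of the thrower's counting process.

```python
# Ball-throwing example: Exp(1) holding times, cycle a -> b -> c -> a.
import numpy as np

rng = np.random.default_rng(3)
T, t = 20.0, 0.0
holder, events = "a", []            # child a starts with the ball
nxt = {"a": "b", "b": "c", "c": "a"}
while True:
    t += rng.exponential(1.0)       # Exp(1) holding time
    if t > T:
        break
    events.append((t, holder))      # N^v jumps when child v throws
    holder = nxt[holder]

# At any time, N^a_t = N^c_t exactly when a holds the ball, which
# reproduces the indicator intensities on the slide.
print(events[:5])
```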

SLIDE 41

Backup: Runtime

[Figure: runtime in seconds against point count, for first- and second-order tests in structures S1, S3 and S4i]

Figure 1: Runtime of 300 invocations of the local empirical independence test. a ̸→ b | {b} ∪ C was tested 100 times in each of the structures S1, S3 and S4i.

SLIDE 42

Backup: Tuning κ0

[Figure: boxplots of test p-values against the roughness-penalty scale κ_0 (10⁻⁴ to 10⁴), for structures S1, S2, S3 and S4i–S4iv]

Figure 2: Boxplots of p-values from the 7 structures. From each structure, 100 Hawkes processes were simulated and the local empirical independence test was run, each with the roughness penalty at various levels of κ_0. Each simulation produced a p-value, which is plotted. The red dotted line shows the 5% level. The headers show the ground truth of whether a ̸→ b | {b} ∪ C. The dark-green line shows the fraction of the simulated p-values falling below the 5% level.
SLIDE 43

Backup: Latent experiment

[Figure: SHD of estimated graphs against the number of observed nodes, for the first- and second-order tests, with 0, 1 and 2 latent variables]

Figure 3: Structural Hamming Distances of graphs estimated using the ECA-algorithm with a first- and second-order local empirical independence test (the second being the standard one used above). Each of the panels 0, 1 and 2 indicates the number |V \ O| of latent variables. The lines represent the average SHD within each group.