

  1. Restarted Bayesian Online Change-point Detector achieves Optimal Detection Delay. Reda Alami, joint work with Odalric Maillard and Raphaël Féraud. reda.alami@total.com. Presented at ICML 2020.

  2. Overview
     ◮ A pruning version of the Bayesian Online Change-point Detector.
     ◮ High-probability guarantees in terms of:
        ◮ False alarm rate.
        ◮ Detection delay.
     ◮ The detection delay is asymptotically optimal (reaching the existing lower bound [Lai and Xing, 2010]).
     ◮ Empirical comparisons with the original BOCPD [Fearnhead and Liu, 2007] and the Improved Generalized Likelihood Ratio test [Maillard, 2019].

  3. Setting & Notations
     ◮ $\mathcal{B}(\mu_t)$: Bernoulli distribution with mean $\mu_t \in [0, 1]$.
     ◮ Piece-wise stationary process: $\forall c \in [1, C],\ \forall t \in \mathcal{T}_c = [\tau_c, \tau_{c+1}),\ \mu_t = \theta_c$.
     ◮ Sequence of observations: $x_{s:t} = (x_s, \ldots, x_t)$.
     ◮ Length: $n_{s:t} = t - s + 1$.
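  As an illustration of this setting, here is a minimal sketch that simulates a piecewise-stationary Bernoulli stream; the horizon, change-point positions and segment means below are made-up values, not taken from the slides.

  ```python
  # Simulate the piecewise-stationary Bernoulli setting (illustrative values only).
  import numpy as np

  rng = np.random.default_rng(0)

  T = 1000                      # horizon (assumed)
  tau = [1, 400, 700]           # change-points tau_1 < tau_2 < tau_3 (assumed)
  theta = [0.2, 0.8, 0.5]       # segment means theta_c (assumed)

  mu = np.empty(T)
  bounds = tau + [T + 1]
  for c in range(len(tau)):
      # mu_t = theta_c for all t in T_c = [tau_c, tau_{c+1})
      mu[bounds[c] - 1:bounds[c + 1] - 1] = theta[c]

  x = rng.binomial(1, mu)       # x_t ~ B(mu_t)
  ```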

  4. Bayesian Online Change-point Detector: Runlength inference
  Runlength $r_t$: number of time steps since the last change-point. For all $r_t \in [0, t-1]$:
  $$\underbrace{p(r_t \mid x_{1:t})}_{\text{Runlength distribution at } t} \;\propto\; \sum_{r_{t-1} \in [0,\, t-2]} \underbrace{p(r_t \mid r_{t-1})}_{\text{hazard}} \, \underbrace{p(x_t \mid r_{t-1}, x_{1:t-1})}_{\text{UPM}} \, p(r_{t-1} \mid x_{1:t-1})$$
  Constant hazard rate assumption ($h \in (0, 1)$, i.e. geometric inter-arrival times of change-points):
  $$\begin{cases} p(r_t = r_{t-1} + 1 \mid x_{1:t}) \;\propto\; (1 - h)\, p(x_t \mid r_{t-1}, x_{1:t-1})\, p(r_{t-1} \mid x_{1:t-1}) \\ p(r_t = 0 \mid x_{1:t}) \;\propto\; h \displaystyle\sum_{r_{t-1}} p(x_t \mid r_{t-1}, x_{1:t-1})\, p(r_{t-1} \mid x_{1:t-1}) \end{cases}$$
  The UPM $p(x_t \mid r_{t-1}, x_{1:t-1})$ is computed with the Laplace predictor:
  $$\mathrm{Lp}(x_{t+1} \mid x_{s:t}) := \begin{cases} \dfrac{\sum_{i=s}^{t} x_i + 1}{n_{s:t} + 2} & \text{if } x_{t+1} = 1 \\[6pt] \dfrac{\sum_{i=s}^{t} (1 - x_i) + 1}{n_{s:t} + 2} & \text{if } x_{t+1} = 0 \end{cases}$$
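  Below is a minimal sketch of this recursion (a generic BOCPD filter with the Laplace predictor as UPM, not the authors' implementation); `h` is the constant hazard rate and `x` a 0/1 array such as the one simulated above.

  ```python
  # Generic BOCPD runlength filtering with constant hazard h and the Laplace predictor.
  import numpy as np

  def bocpd(x, h):
      """Return the runlength posterior p(r_t | x_{1:t}) after each observation."""
      ones = np.array([0.0])    # number of 1s in the segment behind each runlength value
      n = np.array([0.0])       # length of that segment
      w = np.array([1.0])       # current runlength distribution (index = runlength)
      history = []
      for xt in x:
          # UPM: Laplace predictive probability of x_t under each candidate runlength
          upm = (ones + 1.0) / (n + 2.0) if xt == 1 else (n - ones + 1.0) / (n + 2.0)
          growth = (1.0 - h) * upm * w        # r_t = r_{t-1} + 1
          cp = h * np.sum(upm * w)            # r_t = 0 (a change-point occurred)
          w = np.append(cp, growth)
          w /= w.sum()                        # normalise (the recursion is up to proportionality)
          # update sufficient statistics; runlength 0 starts a fresh, empty segment
          ones = np.append(0.0, ones + xt)
          n = np.append(0.0, n + 1.0)
          history.append(w)
      return history
  ```

  One common way to turn this posterior into an alarm is to monitor where its mass concentrates; the specific restart rule analysed in the paper is not reproduced in this sketch.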

  5. Bayesian Online Change-point Detector: Forecaster Learning
  Instead of the runlength $r_t \in [0, t-1]$, use the notion of forecasters, one per candidate segment start $s$. Forecaster weight: $\forall s \in [1, t]$, $v_{s,t} := p(r_t = t - s \mid x_{s:t})$.
  $$v_{s,t} = \begin{cases} (1 - h)\, \exp(-\ell_{s,t})\, v_{s,t-1} & \forall s < t \\ h \displaystyle\sum_{i=1}^{t-1} \exp(-\ell_{i,t})\, v_{i,t-1} & s = t \end{cases} \qquad\text{equivalently}\qquad v_{s,t} = \begin{cases} (1 - h)^{n_{s:t}}\, h^{\mathbb{1}\{s \neq 1\}}\, \exp(-L_{s:t})\, V_s & \forall s < t \\ h\, V_t & s = t \end{cases}$$
  Instantaneous loss: $\ell_{s,t} := -\log \mathrm{Lp}(x_t \mid x_{s:t-1})$. Cumulative loss: $L_{s:t} := \sum_{t'=s}^{t} \ell_{s,t'}$, and $V_t := \sum_{s=1}^{t} v_{s,t}$.
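  The same filter can be expressed with one forecaster per candidate segment start, following the recursion above. The sketch below is an illustrative re-implementation (unnormalised weights, no restart rule), not the authors' code.

  ```python
  # Forecaster-weight form of the filter: one forecaster per segment start s.
  import numpy as np

  def forecaster_weights(x, h):
      """Return (v_{s,t})_{s=1..t} and V_t = sum_s v_{s,t} after processing x_{1:t}."""
      v = np.array([1.0])                 # v_{1,1}: single forecaster started at s = 1
      ones = np.array([float(x[0])])      # per-forecaster statistics of x_{s:t}
      n = np.array([1.0])
      for xt in x[1:]:
          # instantaneous losses l_{s,t} = -log Lp(x_t | x_{s:t-1})
          lp = (ones + 1.0) / (n + 2.0) if xt == 1 else (n - ones + 1.0) / (n + 2.0)
          loss = -np.log(lp)
          grown = (1.0 - h) * np.exp(-loss) * v     # v_{s,t} for s < t
          new = h * np.sum(np.exp(-loss) * v)       # v_{t,t}: forecaster started at s = t
          v = np.append(grown, new)
          ones = np.append(ones + xt, float(xt))    # new forecaster has only seen x_t
          n = np.append(n + 1.0, 1.0)
      return v, v.sum()
  ```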

  6. Main difficulty in providing the theoretical guarantees
  Lemma (Computing the initial weight $V_t$)
  $$V_t = (1 - h)^{t-2} \sum_{k=1}^{t-1} \left( \frac{h}{1 - h} \right)^{k-1} \tilde{V}_{k:t}, \quad \text{where}$$
  $$\tilde{V}_{k:t} = \sum_{i_1 = 1}^{t-k} \; \sum_{i_2 = i_1 + 1}^{t-(k-1)} \cdots \sum_{i_{k-1} = i_{k-2} + 1}^{t-2} \exp\!\left(-L_{1:i_1}\right) \times \prod_{j=1}^{k-2} \exp\!\left(-L_{i_j + 1 : i_{j+1}}\right) \times \exp\!\left(-L_{i_{k-1} + 1 : t-1}\right),$$
  $$\text{with } \binom{t-2}{k-1} = \sum_{i_1 = 1}^{t-k} \; \sum_{i_2 = i_1 + 1}^{t-(k-1)} \cdots \sum_{i_{k-1} = i_{k-2} + 1}^{t-2} 1.$$
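  A quick numerical check (illustration only) of the counting identity in the lemma: the nested sums range over all increasing tuples $1 \le i_1 < i_2 < \cdots < i_{k-1} \le t-2$, so they contain exactly $\binom{t-2}{k-1}$ terms. The values of `t` and `k` below are arbitrary.

  ```python
  # Verify that the nested summation in the lemma has C(t-2, k-1) terms.
  from itertools import combinations
  from math import comb

  t, k = 12, 4
  tuples = list(combinations(range(1, t - 1), k - 1))   # increasing (i_1, ..., i_{k-1}) in {1, ..., t-2}
  assert len(tuples) == comb(t - 2, k - 1)
  print(len(tuples), comb(t - 2, k - 1))                 # both print 120
  ```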
