On Some Mixture Models for INAR(1) Processes
Helton Graziadei, Paulo Marques and Hedibert Lopes
IME-USP & Insper
November 1st, 2019
Outline
1 Introduction
2 The AdINAR(1) Model
3 Learning the latent pattern of heterogeneity in time series of counts
4 Future work
Introduction
Time series of counts arise in a wide range of applications such as econometrics, public policy, and environmental studies. Traditional time series models consider continuously valued processes and are therefore not suitable for analyzing discrete data. We will not pursue the well-known class of generalized dynamic linear models. Instead, we assume a special autoregressive structure for discrete variables [Alzaid and Al-Osh, 1987, McKenzie, 1985], and we consider some mixture models on the innovation process as a means to improve forecasting accuracy.
INAR(1) process
Consider a Markov process $\{Y_t\}_{t \in \mathbb{N}}$ represented by the following functional form [McKenzie, 1985, Alzaid and Al-Osh, 1987]:

$$Y_t = \alpha \circ Y_{t-1} + Z_t,$$

where $Y_t$ is the count at time $t$, $\alpha \circ Y_{t-1}$ counts the survivors from time $t-1$, and $Z_t$ is the innovation at time $t$. The thinning term

$$M_t = \alpha \circ Y_{t-1} = \sum_{i=1}^{Y_{t-1}} B_i(t)$$

is referred to here as the maturation at time $t$, and $\{B_i(t)\}$ is a collection of independent Bernoulli($\alpha$) random variables. The original formulation assumes that $Z_t$ follows a parametric model, usually a Poisson or a Geometric distribution.
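For concreteness, the minimal sketch below simulates an INAR(1) path with binomial thinning, assuming Poisson($\lambda$) innovations; the function name and defaults are illustrative.

```python
# Minimal sketch: simulate Y_t = alpha ∘ Y_{t-1} + Z_t with binomial thinning
# and Poisson innovations (an assumption; the mixture models below replace Z_t).
import numpy as np

def simulate_inar1(T, alpha, lam, y0=10, seed=None):
    rng = np.random.default_rng(seed)
    y = np.empty(T, dtype=int)
    y[0] = y0
    for t in range(1, T):
        m_t = rng.binomial(y[t - 1], alpha)  # maturation: sum of Bernoulli(alpha)
        z_t = rng.poisson(lam)               # innovation Z_t
        y[t] = m_t + z_t
    return y

y = simulate_inar1(T=150, alpha=0.3, lam=7.0, seed=42)
```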
Our contributions
1 Model $Z_t$ via a Poisson-Geometric mixture to account for over-dispersion in time series of counts.
2 Develop a semi-parametric model based on the Dirichlet Process in order to learn the patterns of heterogeneity in time series of counts.
3 Investigate the Pitman-Yor process to robustify inference for the number of clusters.
The AdINAR(1) Model
The AdINAR(1) model is defined such that $Z_t$ is a mixture of a Geometric and a Poisson distribution:

$$z_t \mid \theta, \lambda, w \sim w \,\text{Geometric}(\theta) + (1 - w) \,\text{Poisson}(\lambda), \quad t = 2, \dots, T, \quad w \in [0, 1].$$

As $w$ becomes large, the innovation is increasingly contaminated by the Geometric component of the mixture, which increases the variability of the process.
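A sketch of a single innovation draw under this mixture is shown below; note that numpy's Geometric starts at 1, so we shift it to the support $\{0, 1, \dots\}$ used here.

```python
# Sketch: draw Z_t ~ w Geometric(theta) + (1 - w) Poisson(lam).
import numpy as np

def draw_adinar_innovation(theta, lam, w, rng):
    if rng.random() < w:
        return rng.geometric(theta) - 1  # shift to support {0, 1, ...}
    return rng.poisson(lam)
```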
Figure: Typical simulated series (Count vs. Time) for w = 0.1 and w = 0.9.
The joint distribution of $(Y_1, \dots, Y_T)$, given $\alpha$, $\theta$, $\lambda$, and $w$, can be written as

$$p(y_1, \dots, y_T \mid \alpha, \theta, \lambda, w) = \prod_{t=2}^{T} p(y_t \mid y_{t-1}, \alpha, \theta, \lambda, w).$$

The likelihood function of $y = (y_2, \dots, y_T)$ is directly derived; hence, the AdINAR(1) model likelihood function is given by

$$L_y(\alpha, \theta, \lambda, w) = \prod_{t=2}^{T} \sum_{m_t=0}^{\min\{y_{t-1}, y_t\}} \binom{y_{t-1}}{m_t} \alpha^{m_t} (1-\alpha)^{y_{t-1}-m_t} \times \left[ w \, \theta (1-\theta)^{y_t - m_t} + (1-w) \, \frac{e^{-\lambda} \lambda^{y_t - m_t}}{(y_t - m_t)!} \right].$$
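This likelihood can be evaluated directly by summing over the maturation at each step, as in the sketch below; scipy's shifted Geometric (`loc=-1`) matches the support $\{0, 1, \dots\}$ used here.

```python
# Sketch: evaluate the AdINAR(1) log-likelihood by summing over m_t.
import numpy as np
from scipy import stats

def adinar1_loglik(y, alpha, theta, lam, w):
    ll = 0.0
    for t in range(1, len(y)):
        m = np.arange(min(y[t - 1], y[t]) + 1)           # support of m_t
        terms = stats.binom.pmf(m, y[t - 1], alpha) * (
            w * stats.geom.pmf(y[t] - m, theta, loc=-1)  # Geometric on {0, 1, ...}
            + (1 - w) * stats.poisson.pmf(y[t] - m, lam))
        ll += np.log(terms.sum())
    return ll
```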
Reparameterization
Let us introduce some new items. Let $M = (M_2, \dots, M_T)$ be the set of maturations, and let the model be augmented by the latent variables $u = (u_2, \dots, u_T)$ such that

$$u_t = 1, \text{ if } z_t \mid \theta \sim \text{Geometric}(\theta), \quad \text{or} \quad u_t = 0, \text{ if } z_t \mid \lambda \sim \text{Poisson}(\lambda),$$

for $t = 2, \dots, T$.
Conditionally conjugate priors
Thinning: $\alpha \sim \text{Beta}(a_0^{(\alpha)}, b_0^{(\alpha)})$
Weight: $w \sim \text{Beta}(a_0^{(w)}, b_0^{(w)})$
Geometric: $\theta \sim \text{Beta}(a_0^{(\theta)}, b_0^{(\theta)})$
Poisson: $\lambda \sim \text{Gamma}(a_0^{(\lambda)}, b_0^{(\lambda)})$
Simpler conditional distributions
Postulate that:

$$p(y_t \mid m_t, u_t = 1) = \theta (1-\theta)^{y_t - m_t} \, I_{\{m_t, m_t+1, \dots\}}(y_t),$$
$$p(y_t \mid m_t, u_t = 0) = \frac{e^{-\lambda} \lambda^{y_t - m_t}}{(y_t - m_t)!} \, I_{\{m_t, m_t+1, \dots\}}(y_t),$$
$$p(m_t \mid \alpha, y_{t-1}) = \binom{y_{t-1}}{m_t} \alpha^{m_t} (1-\alpha)^{y_{t-1} - m_t},$$

for $t = 2, \dots, T$. It is possible to show that, using these conditional distributions, we recover the original likelihood.
Full conditionals
The full conditional distributions are simply derived:

$$(\alpha \mid \dots) \sim \text{Beta}\left(a^{(\alpha)} + \sum_{t=2}^{T} m_t, \; b^{(\alpha)} + \sum_{t=2}^{T} (y_{t-1} - m_t)\right),$$

$$(w \mid \dots) \sim \text{Beta}\left(a^{(w)} + \sum_{t=2}^{T} u_t, \; b^{(w)} + (T-1) - \sum_{t=2}^{T} u_t\right),$$

$$(\theta \mid \dots) \sim \text{Beta}\left(a^{(\theta)} + \sum_{t=2}^{T} u_t, \; b^{(\theta)} + \sum_{\{t : u_t = 1\}} (y_t - m_t)\right),$$

$$(\lambda \mid \dots) \sim \text{Gamma}\left(a^{(\lambda)} + \sum_{\{t : u_t = 0\}} (y_t - m_t), \; b^{(\lambda)} + (T-1) - \sum_{t=2}^{T} u_t\right).$$
Full conditionals
Additionally,

$$\Pr\{U_t = 1 \mid \dots\} \propto w \, \theta (1-\theta)^{y_t - m_t}, \qquad \Pr\{U_t = 0 \mid \dots\} \propto (1-w) \, \frac{e^{-\lambda} \lambda^{y_t - m_t}}{(y_t - m_t)!},$$

and

$$\Pr\{M_t = m_t \mid \dots\} \propto \begin{cases} \dfrac{1}{(y_{t-1} - m_t)! \, m_t!} \left[\dfrac{\alpha}{(1-\theta)(1-\alpha)}\right]^{m_t} & \text{if } u_t = 1, \\[2ex] \dfrac{1}{(y_t - m_t)! \, (y_{t-1} - m_t)! \, m_t!} \left[\dfrac{\alpha}{\lambda (1-\alpha)}\right]^{m_t} & \text{if } u_t = 0, \end{cases}$$

for $t = 2, \dots, T$ and $m_t = 0, 1, \dots, \min\{y_t, y_{t-1}\}$.
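A compact sketch of one full Gibbs sweep implied by these conditionals is given below; the 0-based array layout and the `hyp` dictionary of prior hyperparameters are illustrative assumptions.

```python
# Sketch of one Gibbs sweep. y holds y_1..y_T; m[i], u[i] refer to time t = i + 2.
import numpy as np
from scipy import stats
from scipy.special import gammaln

def gibbs_sweep(y, m, u, hyp, rng):
    z = y[1:] - m                     # innovations z_t = y_t - m_t
    n1, T1 = u.sum(), len(y) - 1      # Geometric count, number of transitions

    alpha = rng.beta(hyp["a_alpha"] + m.sum(), hyp["b_alpha"] + (y[:-1] - m).sum())
    w     = rng.beta(hyp["a_w"] + n1, hyp["b_w"] + T1 - n1)
    theta = rng.beta(hyp["a_theta"] + n1, hyp["b_theta"] + z[u == 1].sum())
    lam   = rng.gamma(hyp["a_lam"] + z[u == 0].sum(),
                      1.0 / (hyp["b_lam"] + T1 - n1))   # numpy scale = 1 / rate

    for i in range(T1):               # update u_t, then m_t, one t at a time
        p1 = w * stats.geom.pmf(z[i], theta, loc=-1)
        p0 = (1 - w) * stats.poisson.pmf(z[i], lam)
        u[i] = rng.random() < p1 / (p0 + p1)

        mt = np.arange(min(y[i], y[i + 1]) + 1)          # support of m_t
        rate = (1 - theta) if u[i] else lam
        logw = (mt * np.log(alpha / (rate * (1 - alpha)))
                - gammaln(mt + 1) - gammaln(y[i] - mt + 1))
        if not u[i]:
            logw = logw - gammaln(y[i + 1] - mt + 1)
        pw = np.exp(logw - logw.max())
        m[i] = rng.choice(mt, p=pw / pw.sum())
        z[i] = y[i + 1] - m[i]        # keep z in sync with the new m_t
    return alpha, w, theta, lam
```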
Directed acyclic graph

Figure: Directed acyclic graph of the AdINAR(1) model, with nodes w, θ, λ, α, and U_t, M_t, Y_t for t = 2, …, T.
This mixture distribution allows the model to account for overdispersion in a time series of counts and to accommodate zero inflation. In what follows, we extend the two-component mixture to a generalized, DP-based version of the INAR(1) model.
Learning the latent pattern of heterogeneity in time series of counts
The Dirichlet Process
Given a measurable space $(\mathcal{X}, \mathcal{B})$ and a probability space $(\Omega, \mathcal{F}, \Pr)$, a random probability measure $G$ is a mapping $G : \mathcal{B} \times \Omega \to [0, 1]$.

Definition (Ferguson, 1973): Let $\alpha$ be a finite non-null measure on $(\mathcal{X}, \mathcal{B})$. We say $G$ is a Dirichlet process if, for every measurable partition $\{B_1, \dots, B_k\}$ of $\mathcal{X}$, the random vector $(G(B_1), \dots, G(B_k))$ follows a Dirichlet distribution with parameter vector $(\alpha(B_1), \dots, \alpha(B_k))$.

Let $\tau = \alpha(\mathcal{X})$ be the concentration parameter and, for every $B \in \mathcal{B}$, let $G_0(B) = \alpha(B)/\alpha(\mathcal{X})$ be the base measure, which leads to a suitable parametrization in terms of a probability measure. Under this formulation, we denote $G \sim \text{DP}(\tau G_0)$.
The Dirichlet Process
1 $E(G(B)) = G_0(B)$.
2 $\text{Var}(G(B)) = \dfrac{G_0(B)(1 - G_0(B))}{\tau + 1}$.
3 Assume that, given a Dirichlet process $G$ with parameter $\alpha$, $X_1, \dots, X_n$ are conditionally independent and identically distributed such that $P(X_i \in B \mid G) = G(B)$, $i = 1, \dots, n$. Then $G \mid X_1, \dots, X_n \sim \text{DP}(\beta)$, where $\beta(C) = \alpha(C) + \sum_{i=1}^{n} I_C(X_i)$.
4 As shown by [Blackwell and MacQueen, 1973], the predictive distribution of $X_{n+1}$, $n \geq 1$, given $X_1, \dots, X_n$, may be obtained by integrating out $G$, which entails that
$$X_{n+1} \mid X_1, \dots, X_n \sim \frac{\tau}{\tau + n} G_0 + \frac{1}{\tau + n} \sum_{i=1}^{n} \delta_{X_i},$$
where $\delta_x$ denotes a point mass at $x$.
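This predictive rule can be sampled directly, as in the sketch below; the Gamma base measure is an illustrative choice.

```python
# Sketch: draw X_1..X_n marginally from G ~ DP(tau G0) via Blackwell-MacQueen.
import numpy as np

def polya_urn(n, tau, g0_draw, rng):
    x = []
    for i in range(n):                        # i past draws so far
        if rng.random() < tau / (tau + i):    # weight tau / (tau + i) on G0
            x.append(g0_draw(rng))            # fresh atom from the base measure
        else:
            x.append(x[rng.integers(i)])      # reuse a past value uniformly
    return np.array(x)

rng = np.random.default_rng(0)
x = polya_urn(100, tau=1.0, g0_draw=lambda r: r.gamma(2.0, 10.0), rng=rng)
```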
Dirichlet and Pitman-Yor Processes
The discrete component of the predictive distribution implies the clustering property of the Dirichlet process, which induces a probability distribution on the number of distinct values in $(X_1, \dots, X_n)$, which we denote by $K$. [Pitman and Yor, 1997] generalized the Dirichlet process by introducing a discount parameter $\sigma \in [0, 1]$. The predictive distribution for the Pitman-Yor process is given by

$$X_{n+1} \mid X_1, \dots, X_n \sim \frac{\tau + k\sigma}{\tau + n} G_0 + \frac{1}{\tau + n} \sum_{i=1}^{n} \left(1 - \frac{\sigma}{n_i}\right) \delta_{X_i},$$

where $n_i$ is the number of elements in $(X_1, \dots, X_n)$ equal to $X_i$.
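Sampling from this rule only requires tracking the distinct values and their counts, as in the sketch below; $\sigma = 0$ recovers the Dirichlet-process urn.

```python
# Sketch: sequential draws under the Pitman-Yor predictive rule.
import numpy as np

def pitman_yor_urn(n, tau, sigma, g0_draw, rng):
    atoms, sizes, x = [], [], []
    for i in range(n):
        k = len(atoms)
        probs = np.array([nj - sigma for nj in sizes] + [tau + k * sigma])
        j = rng.choice(k + 1, p=probs / probs.sum())   # weights sum to tau + i
        if j == k:                                     # open a new cluster
            atoms.append(g0_draw(rng)); sizes.append(1)
        else:
            sizes[j] += 1
        x.append(atoms[j])
    return np.array(x), len(atoms)                     # draws and K
```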
The Pitman-Yor process with high σ induces less informative prior distributions for K [Pitman and Yor, 1997, De Blasi et al., 2013].
Figure: Prior distributions p(k) of the number of clusters K for σ = 0, σ = 0.25, σ = 0.5, and σ = 0.75.
In the INAR(1) structure, we now assume the innovation process is time-varying, i.e., $E(Z_t) = \lambda_t$. From a realization of the process $y_1, \dots, y_T$, we want to learn the distribution of each $\lambda_t$ and quantify our uncertainty about the future steps $Y_{T+1}, \dots, Y_{T+h}$ in order to forecast them. We create clusters of innovation rates as a means to learn the latent patterns of heterogeneity in the count time series.
DAG
Figure: Directed acyclic graph of the DP-INAR(1) model, with nodes τ, G, α, the rates λ_2, …, λ_T, the maturations M_2, …, M_T, and the counts Y_1, …, Y_T.
Let $y = (y_1, \dots, y_T)$ and $m = (m_2, \dots, m_T)$. To obtain the posterior $p(\alpha, \lambda, m \mid y)$, we integrate out the random distribution $G$. From the parametric part in the graph, we have that:

$$p(y, m, \alpha, \lambda) = \int p(y, m, \alpha, \lambda \mid G) \, d\mu_G(G) = \left[\prod_{t=2}^{T} p(y_t \mid m_t, \lambda_t) \, p(m_t \mid y_{t-1}, \alpha)\right] \times \pi(\alpha) \times \int \prod_{t=2}^{T} p(\lambda_t \mid G) \, d\mu_G(G).$$

The random vector $(\lambda_2, \dots, \lambda_T)$ has an exchangeable distribution.
Therefore, the Pólya-Blackwell-MacQueen urn process yields the full conditional distribution of $\lambda_t$ as the mixture

$$\lambda_t \mid \text{all others} \sim w_0 \times \text{Gamma}(y_t - m_t + a^{(G_0)}, \, b^{(G_0)} + 1) + \sum_{r \neq t} \lambda_r^{y_t - m_t} e^{-\lambda_r} \, \delta_{\{\lambda_r\}},$$

in which

$$w_0 = \frac{\tau \, (b^{(G_0)})^{a^{(G_0)}} \, \Gamma(y_t - m_t + a^{(G_0)})}{\Gamma(a^{(G_0)}) \, (b^{(G_0)} + 1)^{y_t - m_t + a^{(G_0)}}}$$

and $\delta_{\{\lambda_r\}}$ denotes a point mass at $\lambda_r$. Recall that the full conditional of $\lambda_t$ is a combination of the joint prior $p(\lambda_2, \dots, \lambda_T)$ with $p(y_t \mid m_t, \lambda_t)$. The weights in the expression above are not normalized.
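A sketch of one draw from this full conditional follows, working on the log scale before normalizing the weights; variable names are illustrative.

```python
# Sketch: draw lambda_t from its DP full conditional (weights as above).
import numpy as np
from scipy.special import gammaln

def draw_lambda_t(t_idx, z, lam, tau, a0, b0, rng):
    """z = y_t - m_t; lam holds the current rates; t_idx indexes lambda_t."""
    log_w0 = (np.log(tau) + a0 * np.log(b0) + gammaln(z + a0)
              - gammaln(a0) - (z + a0) * np.log(b0 + 1.0))
    others = np.delete(lam, t_idx)                  # all lambda_r with r != t
    logw = np.append(z * np.log(others) - others,   # lambda_r^z e^{-lambda_r}
                     log_w0)
    w = np.exp(logw - logw.max())
    j = rng.choice(len(w), p=w / w.sum())           # normalize the weights
    if j == len(others):                            # "new value": conjugate update
        return rng.gamma(z + a0, 1.0 / (b0 + 1.0))
    return others[j]
```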
Choice of prior parameters
[Dorazio, 2009] chooses the parameters $a_0^{(\tau)}$ and $b_0^{(\tau)}$ of the $\tau$ prior by minimizing the Kullback-Leibler divergence between the prior distribution of the number of clusters $K$ and a discrete uniform distribution on a suitable range. The marginal probability function of $K$ can be computed as

$$\pi(k) = \int_0^\infty \Pr\{K = k \mid \tau\} \, \pi(\tau) \, d\tau = \frac{(b_0^{(\tau)})^{a_0^{(\tau)}} \, S(T-1, k)}{\Gamma(a_0^{(\tau)})} \, I(a_0^{(\tau)}, b_0^{(\tau)}; k),$$

for $k = 1, \dots, T-1$, in which $S(T-1, k)$ denotes an unsigned Stirling number of the first kind and

$$I(a_0^{(\tau)}, b_0^{(\tau)}; k) = \int_0^\infty \tau^{k + a_0^{(\tau)} - 1} e^{-b_0^{(\tau)} \tau} \, \frac{\Gamma(\tau)}{\Gamma(\tau + T - 1)} \, d\tau.$$
Choice of prior parameters
Let $q$ be the probability function of a discrete uniform distribution on $\{1, \dots, T-1\}$, that is,

$$q(k) = \frac{1}{T-1} \, I_{\{1, \dots, T-1\}}(k).$$

We find, by numerical integration and optimization, the values of $a_0^{(\tau)}$ and $b_0^{(\tau)}$ that minimize the Kullback-Leibler divergence

$$\text{KL}[q \,\|\, \pi] = \sum_{k=1}^{T-1} q(k) \log \frac{q(k)}{\pi(k)}.$$
Choice of prior parameters
Similarly, we choose the hyperparameters $a^{(G_0)}$ and $b^{(G_0)}$ of the base probability density $g_0$ by minimizing the Kullback-Leibler divergence between $g_0$ and a uniform density on a suitable range $[0, \lambda_{\max}]$, where $\lambda_{\max}$ is chosen by taking into consideration the available information on the studied phenomenon.
Choice of prior parameters
Let $h$ be a uniform density on $[0, \lambda_{\max}]$, that is, $h(\lambda) = \frac{1}{\lambda_{\max}} I_{[0, \lambda_{\max}]}(\lambda)$. We find, by numerical optimization, the values of $a^{(G_0)}$ and $b^{(G_0)}$ that minimize the Kullback-Leibler divergence

$$\text{KL}[h \,\|\, g_0] = \int_0^{\lambda_{\max}} \frac{1}{\lambda_{\max}} \log \frac{1/\lambda_{\max}}{g_0(\lambda)} \, d\lambda = -\log \lambda_{\max} - a^{(G_0)} \log b^{(G_0)} + \log \Gamma(a^{(G_0)}) - (a^{(G_0)} - 1)(\log \lambda_{\max} - 1) + \frac{b^{(G_0)} \lambda_{\max}}{2}.$$

Choosing the parameters for the $\alpha$ prior is more straightforward, with $a^{(\alpha)} = b^{(\alpha)} = 1$ being a natural choice.
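Given the closed form above, the minimization is a routine two-parameter optimization; the sketch below uses scipy, with $\lambda_{\max} = 50$ purely as an illustrative value. The same numerical recipe (objective plus bounded minimization) applies to the choice of $a_0^{(\tau)}$ and $b_0^{(\tau)}$.

```python
# Sketch: minimize KL[h || g0] over the Gamma hyperparameters (a, b).
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def kl_h_g0(params, lam_max):
    a, b = params
    return (-np.log(lam_max) - a * np.log(b) + gammaln(a)
            - (a - 1.0) * (np.log(lam_max) - 1.0) + b * lam_max / 2.0)

lam_max = 50.0                                   # e.g. the maximum observed count
res = minimize(kl_h_g0, x0=[1.0, 0.1], args=(lam_max,),
               bounds=[(1e-6, None), (1e-6, None)])
a_g0, b_g0 = res.x                               # KL-optimal hyperparameters
```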
Pitman-Yor case
The full conditional of each $\lambda_t$ is slightly modified:

$$\lambda_t \mid \text{all others} \sim w_0^* \times \text{Gamma}(y_t - m_t + a^{(G_0)}, \, b^{(G_0)} + 1) + \sum_{i \neq t} \left(1 - \frac{\sigma}{n_i}\right) \lambda_i^{y_t - m_t} e^{-\lambda_i} \, \delta_{\{\lambda_i\}},$$

in which

$$w_0^* = \frac{(\tau + k_{\setminus t} \, \sigma) \, (b^{(G_0)})^{a^{(G_0)}} \, \Gamma(y_t - m_t + a^{(G_0)})}{\Gamma(a^{(G_0)}) \, (b^{(G_0)} + 1)^{y_t - m_t + a^{(G_0)}}},$$

where $k_{\setminus t}$ is the number of distinct rates among $(\lambda_r)_{r \neq t}$.
To improve efficiency, we remix the vector of distinct rates $\lambda^*$ after every step of the sampler [Escobar and West, 1998]. Let $(\lambda_1^*, \dots, \lambda_k^*)$ be the $k$ unique values among $(\lambda_2, \dots, \lambda_T)$. Let $c_t = \sum_{j=1}^{k} j \cdot I_{\{\lambda_j^*\}}(\lambda_t)$ be the cluster indicator of $\lambda_t$, and define the number of occupants of cluster $j$ by $n_j = \sum_{t=2}^{T} I_{\{j\}}(c_t)$. Then

$$\lambda_j^* \mid \text{all others} \sim \text{Gamma}\left(a^{(G_0)} + \sum_{\substack{t=2 \\ c_t = j}}^{T} (y_t - m_t), \; b^{(G_0)} + n_j\right),$$

for $j = 1, \dots, k$.
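The remixing step is a set of independent conjugate Gamma draws, one per cluster, as sketched below; the 0-based cluster labels are an illustrative convention.

```python
# Sketch: redraw each distinct rate lambda*_j given the cluster labels c_t.
import numpy as np

def remix_rates(z, c, k, a0, b0, rng):
    """z[t] = y_t - m_t and c[t] in {0, ..., k-1}, both aligned on t = 2..T."""
    lam_star = np.empty(k)
    for j in range(k):
        members = (c == j)
        lam_star[j] = rng.gamma(a0 + z[members].sum(),       # shape
                                1.0 / (b0 + members.sum()))  # scale = 1 / rate
    return lam_star
```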
Also, the full conditionals for $\alpha$ and $m_t$ are:

$$\alpha \mid \text{all others} \sim \text{Beta}\left(a^{(\alpha)} + \sum_{t=2}^{T} m_t, \; b^{(\alpha)} + \sum_{t=2}^{T} (y_{t-1} - m_t)\right),$$

$$p(m_t \mid \text{all others}) \propto \frac{1}{m_t! \, (y_t - m_t)! \, (y_{t-1} - m_t)!} \left[\frac{\alpha}{\lambda_t (1 - \alpha)}\right]^{m_t} I_{\{0, 1, \dots, \min\{y_{t-1}, y_t\}\}}(m_t).$$

This Gibbs sampler yields, marginally, a sample $\{\alpha^{(n)}, \lambda^{(n)}\}_{n=1}^{N}$ from the posterior distribution.
The DP-INAR(1) Model
We extend [Freeland, 1998]'s original INAR(1) model.

Proposition. The probability function of $Y_{t+h}$ given $Y_t = y_t$ and $\theta = (\alpha, \lambda_{t+1}, \dots, \lambda_{t+h})$ can be written as the convolution of a $\text{Bin}(y_t, \alpha^h)$ distribution and a $\text{Poisson}(\mu_h)$ distribution:

$$p(y_{t+h} \mid y_t, \theta) = \sum_{m=0}^{\min\{y_t, y_{t+h}\}} \binom{y_t}{m} (\alpha^h)^m (1 - \alpha^h)^{y_t - m} \times \frac{\mu_h^{y_{t+h} - m} e^{-\mu_h}}{(y_{t+h} - m)!},$$

in which $\mu_h = \sum_{i=1}^{h} \alpha^{h-i} \lambda_{t+i}$.
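The proposition translates directly into code: a finite binomial-Poisson convolution, as in the sketch below.

```python
# Sketch: h-steps-ahead pmf as a Bin(y_t, alpha^h)-Poisson(mu_h) convolution.
import numpy as np
from scipy import stats

def predictive_pmf(y_future, y_t, alpha, lam_path, h):
    """lam_path = (lambda_{t+1}, ..., lambda_{t+h})."""
    mu_h = sum(alpha ** (h - i) * lam_path[i - 1] for i in range(1, h + 1))
    m = np.arange(min(y_t, y_future) + 1)
    return np.sum(stats.binom.pmf(m, y_t, alpha ** h)
                  * stats.poisson.pmf(y_future - m, mu_h))
```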
Pólya-Blackwell-MacQueen urn

Using the [Blackwell and MacQueen, 1973] urn process recursively, for $n = 1, \dots, N$, we draw a sample $\{\lambda_{T+1}^{(n)}, \dots, \lambda_{T+h}^{(n)}\}_{n=1}^{N}$ from $\prod_{i=1}^{h} p(\lambda_{T+i} \mid \lambda_2, \dots, \lambda_{T+i-1})$ sequentially as follows:

$$\lambda_{T+1}^{(n)} \sim \frac{\tau}{\tau + T} G_0 + \frac{1}{\tau + T} \sum_{t=2}^{T} \delta_{\{\lambda_t^{(n)}\}};$$

$$\lambda_{T+2}^{(n)} \sim \frac{\tau}{\tau + T + 1} G_0 + \frac{1}{\tau + T + 1} \sum_{t=2}^{T+1} \delta_{\{\lambda_t^{(n)}\}};$$

$$\vdots$$

$$\lambda_{T+h}^{(n)} \sim \frac{\tau}{\tau + T + h - 1} G_0 + \frac{1}{\tau + T + h - 1} \sum_{t=2}^{T+h-1} \delta_{\{\lambda_t^{(n)}\}}.$$
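One forward path of future rates can be drawn as sketched below, using the standard Blackwell-MacQueen weights $\tau/(\tau + n)$ with $n$ the number of rates drawn so far; the exact denominator indexing is our assumption.

```python
# Sketch: extend a posterior draw of (lambda_2, ..., lambda_T) by h future rates.
import numpy as np

def urn_forecast_rates(lam, tau, h, g0_draw, rng):
    lam = list(lam)
    for _ in range(h):
        n = len(lam)                            # rates drawn so far
        if rng.random() < tau / (tau + n):      # Blackwell-MacQueen weight on G0
            lam.append(g0_draw(rng))            # fresh draw from G0
        else:
            lam.append(lam[rng.integers(n)])    # reuse a past rate uniformly
    return np.array(lam[-h:])
```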
DP-INAR(1) posterior predictive
Combining these elements, we approximate the integral representation of the h-steps-ahead posterior predictive probability function by the Monte Carlo average

$$p(y_{T+h} \mid y_1, \dots, y_T) \approx \frac{1}{N} \sum_{n=1}^{N} p(y_{T+h} \mid y_T, \alpha^{(n)}, \lambda_{T+1}^{(n)}, \dots, \lambda_{T+h}^{(n)}),$$

for $y_{T+h} \geq 0$. As a pointwise forecast, we compute the generalized median of the h-steps-ahead posterior predictive distribution, defined by

$$\hat{y}_{T+h} = \underset{y_{T+h} \geq 0}{\arg\min} \left| 0.5 - \sum_{r=0}^{y_{T+h}} p(r \mid y_1, \dots, y_T) \right|.$$
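Combining the two previous sketches gives the forecasting recipe below; `y_max`, which caps the support of the argmin search, is an illustrative assumption.

```python
# Sketch: Monte Carlo predictive pmf and generalized-median point forecast.
import numpy as np

def forecast(y, alpha_draws, lam_draws, tau, h, g0_draw, rng, y_max=200):
    support = np.arange(y_max + 1)
    pmf = np.zeros(y_max + 1)
    for alpha_n, lam_n in zip(alpha_draws, lam_draws):   # posterior draws
        lam_path = urn_forecast_rates(lam_n, tau, h, g0_draw, rng)
        pmf += np.array([predictive_pmf(v, y[-1], alpha_n, lam_path, h)
                         for v in support])
    pmf /= len(alpha_draws)                              # Monte Carlo average
    cdf = np.cumsum(pmf)
    y_hat = support[np.argmin(np.abs(0.5 - cdf))]        # generalized median
    return y_hat, pmf
```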
Monthly counts of burglary events
We analyze monthly time series of burglary events in Pittsburgh, USA, from January 1990 to December 2001. In this dataset, each time series has a length of 144 months and corresponds to a certain patrol area. The Figure below shows the series for patrol area 58.
Figure: Monthly counts of burglary events (Count vs. Month) for patrol area 58.
http://www.forecastingprinciples.com/index.php/crimedata
Posterior distributions - AdINAR(1)
For the AdINAR(1) model hyperparameters, we make the choices $a^{(\alpha)} = b^{(\alpha)} = 1$, $a^{(\lambda)} = 1$, $b^{(\lambda)} = 0.01$, $a^{(\theta)} = b^{(\theta)} = 1$, and $a^{(w)} = b^{(w)} = 1$, which correspond to reasonably flat priors. The figure below displays the marginal posterior distributions of the AdINAR(1) model parameters. The posterior distribution of the thinning parameter $\alpha$ is fairly concentrated, with posterior mean 0.31, showing that the autoregressive component is not negligible for this patrol area. The posterior mean of $\lambda$ is 6.78, while the posterior mean of $\theta$ is 0.12. Also, the posterior distribution of $w$, with posterior mean 0.38, shows that the Geometric component of the mixture has less weight for this patrol area.
Posterior distributions - AdINAR(1)
Figure: Marginal posterior distributions of α, θ, λ, and w for patrol area 58.
Posterior distributions - DP-INAR(1)
For the DP-INAR(1) model, we specify the hyperparameters as follows. To determine $a_0^{(\tau)}$ and $b_0^{(\tau)}$, the optimization procedure described above, with $k_{\min} = 1$ and $k_{\max} = 143$, yields $a_0^{(\tau)} = 0.519$ and $b_0^{(\tau)} = 0.003$. Note that these values of $k_{\min}$ and $k_{\max}$ correspond, within our scheme, to the most spread-out choice for the prior distribution of the number of clusters $K$. We control the support of $G_0$ by choosing the value of $\lambda_{\max}$ to be the maximum observed count. The level curves of $\text{KL}[h \,\|\, g_0]$ are given below. The minimum is attained at $a^{(G_0)} = 1.778$ and $b^{(G_0)} = 0.096$. For the thinning parameter $\alpha$, we adopt a uniform prior, choosing $a^{(\alpha)} = b^{(\alpha)} = 1$.
Figure: Level curves of the Kullback-Leibler divergence associated with the optimization of the base measure hyperparameters for patrol area 58.
Posterior distributions - DP-INAR(1)
The marginal posterior distributions of parameters $\alpha$, $\lambda_3$, $\lambda_{18}$, and $\lambda_{96}$ are displayed below. The posterior distribution of the thinning parameter $\alpha$ is reasonably concentrated, with posterior mean 0.19, showing that the autoregressive component is not negligible. The posterior distributions of $\lambda_3$, $\lambda_{18}$, and $\lambda_{96}$ are fairly concentrated as well, with posterior means equal to 6.50, 13.61, and 32.01, respectively, showing that different regimes of innovation rates were captured in the learning process.
Posterior distributions - DP-INAR(1)
Figure: Marginal posterior distributions of parameters α, λ3, λ18, and λ96, for patrol area 58.
Posterior distributions - DP-INAR(1)
The posterior means of the innovation rates follow the same pattern of heterogeneity as the series.
Figure: Posterior means of the innovation rates (in grey) and the observed counts for patrol area 58 (in red), by month.