

SLIDE 1

On Some Mixture Models for INAR(1) Processes

Helton Graziadei, Paulo Marques and Hedibert Lopes

IME-USP & Insper

November 1st, 2019

SLIDE 2

Outline

1. Introduction
2. The AdINAR(1) Model
3. Learning the latent pattern of heterogeneity in time series of counts
4. Future work

SLIDE 3

Introduction

SLIDE 4

Time series of counts arise in a wide range of applications, such as econometrics, public policy, and environmental studies. Traditional time series models consider continuously valued processes, so they are not suitable for analyzing discrete count data. We do not pursue the well-known class of generalized dynamic linear models. Instead, we assume a special autoregressive structure for discrete variables [Alzaid and Al-Osh, 1987, McKenzie, 1985], and we consider some mixture models for the innovation process as a means to improve forecasting accuracy.

SLIDE 5

INAR(1) process

Consider a Markov process {Yt}t∈N represented by the following functional form [McKenzie, 1985, Alzaid and Al-Osh, 1987]:

Yt = α ◦ Yt−1 + Zt,

where Yt is the count at time t, α ◦ Yt−1 gives the survivors from t − 1, and Zt is the innovation at time t. Here

Mt = α ◦ Yt−1 = ∑_{i=1}^{Yt−1} Bi(t)

is referred to as the maturation at time t, and {Bi(t)} is a collection of independent Bernoulli(α) random variables. The original formulation assumes that Zt follows a parametric model, usually a Poisson or a Geometric distribution.

SLIDE 6

Our contributions

1. Model Zt via a Poisson-Geometric mixture to account for overdispersion in time series of counts.
2. Develop a semi-parametric model based on the Dirichlet Process in order to learn the patterns of heterogeneity in time series of counts.
3. Investigate the Pitman-Yor process to robustify inference for the number of clusters.

SLIDE 7

The AdINAR(1) Model

SLIDE 8

The AdINAR(1) model is defined such that Zt is a mixture of a Geometric and a Poisson distribution:

zt | θ, λ, w ∼ w Geometric(θ) + (1 − w) Poisson(λ), t = 2, …, T, w ∈ [0, 1].

As w becomes large, the innovation is increasingly contaminated by the Geometric component of the mixture, which increases the variability of the process.
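To make the data-generating mechanism concrete, here is a minimal simulation sketch (not the authors' code; function and parameter names are illustrative):

```python
import numpy as np

def simulate_adinar1(T, alpha, theta, lam, w, y1=10, seed=None):
    """Simulate an AdINAR(1) path: Y_t = alpha ∘ Y_{t-1} + Z_t, with
    Z_t ~ w Geometric(theta) + (1 - w) Poisson(lam) on {0, 1, ...}."""
    rng = np.random.default_rng(seed)
    y = np.empty(T, dtype=int)
    y[0] = y1
    for t in range(1, T):
        m = rng.binomial(y[t - 1], alpha)   # maturation: survivors from t-1
        if rng.random() < w:                # innovation drawn from the mixture
            z = rng.geometric(theta) - 1    # shift numpy's {1, 2, ...} to {0, 1, ...}
        else:
            z = rng.poisson(lam)
        y[t] = m + z
    return y

series = simulate_adinar1(T=100, alpha=0.3, theta=0.12, lam=6.8, w=0.4, seed=1)
```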

SLIDE 9

Figure: Typical simulated series (count versus time) for w = 0.1 and w = 0.9.

SLIDE 10

The joint distribution of (Y1, . . . , YT), given the parameters, can be written as

p(y1, …, yT | α, θ, λ, w) = ∏_{t=2}^{T} p(yt | yt−1, α, θ, λ, w).

SLIDE 11

The joint distribution of (Y1, . . . , YT), given the parameters, can be written as

p(y1, …, yT | α, θ, λ, w) = ∏_{t=2}^{T} p(yt | yt−1, α, θ, λ, w).

The likelihood function of y = (y2, …, yT) follows directly from this factorization. Hence, the AdINAR(1) model likelihood function is given by

Ly(α, θ, λ, w) = ∏_{t=2}^{T} ∑_{mt=0}^{min{yt−1, yt}} (yt−1 choose mt) α^{mt} (1 − α)^{yt−1−mt} × [w θ(1 − θ)^{yt−mt} + (1 − w) e^{−λ} λ^{yt−mt} / (yt − mt)!].
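For reference, this likelihood can be evaluated directly by summing the maturation out of each transition term; a sketch (illustrative names, assuming the Geometric support {0, 1, …} used throughout):

```python
import numpy as np
from scipy.stats import binom, geom, poisson

def adinar1_loglik(y, alpha, theta, lam, w):
    """Log-likelihood of y_2, ..., y_T given y_1 under the AdINAR(1) model."""
    ll = 0.0
    for t in range(1, len(y)):
        m = np.arange(min(y[t - 1], y[t]) + 1)      # possible maturations
        mat = binom.pmf(m, y[t - 1], alpha)         # P(M_t = m | y_{t-1})
        innov = (w * geom.pmf(y[t] - m + 1, theta)  # Geometric shifted to {0,1,...}
                 + (1 - w) * poisson.pmf(y[t] - m, lam))
        ll += np.log(np.sum(mat * innov))
    return ll
```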
SLIDE 12

Reparameterization

Let us introduce some new quantities. Let M = (M2, …, MT) be the set of maturations, and augment the model with the latent variables u = (u2, …, uT) such that

ut = 1 if zt | θ ∼ Geometric(θ), or ut = 0 if zt | λ ∼ Poisson(λ),

for t = 2, …, T.

SLIDE 13

Conditionally conjugate priors

Thinning: α ∼ Beta(a_0^{(α)}, b_0^{(α)})
Weight: w ∼ Beta(a_0^{(w)}, b_0^{(w)})
Geometric: θ ∼ Beta(a_0^{(θ)}, b_0^{(θ)})
Poisson: λ ∼ Gamma(a_0^{(λ)}, b_0^{(λ)})

SLIDE 14

Simpler conditional distributions

Postulate that

p(yt | mt, ut = 1) = θ(1 − θ)^{yt−mt} I{mt, mt+1, …}(yt),
p(yt | mt, ut = 0) = [e^{−λ} λ^{yt−mt} / (yt − mt)!] I{mt, mt+1, …}(yt),
p(mt | α, yt−1) = (yt−1 choose mt) α^{mt} (1 − α)^{yt−1−mt},

for t = 2, …, T. It is possible to show that, using these conditional distributions, we recover the original likelihood.

SLIDE 15

Full conditionals

The full conditional distributions are derived straightforwardly:

(α | …) ∼ Beta(a^{(α)} + ∑_{t=2}^{T} mt, b^{(α)} + ∑_{t=2}^{T} (yt−1 − mt)),
(w | …) ∼ Beta(a^{(w)} + ∑_{t=2}^{T} ut, b^{(w)} + (T − 1) − ∑_{t=2}^{T} ut),
(θ | …) ∼ Beta(a^{(θ)} + ∑_{t=2}^{T} ut, b^{(θ)} + ∑_{t: ut=1} (yt − mt)),
(λ | …) ∼ Gamma(a^{(λ)} + ∑_{t: ut=0} (yt − mt), b^{(λ)} + (T − 1) − ∑_{t=2}^{T} ut).

SLIDE 16

Full conditionals

Additionally,

Pr{Ut = 1 | …} ∝ w θ(1 − θ)^{yt−mt},
Pr{Ut = 0 | …} ∝ (1 − w) e^{−λ} λ^{yt−mt} / (yt − mt)!,

and

Pr{Mt = mt | …} ∝ [1 / ((yt−1 − mt)! mt!)] [α / ((1 − θ)(1 − α))]^{mt}, if ut = 1,
Pr{Mt = mt | …} ∝ [1 / ((yt − mt)! (yt−1 − mt)! mt!)] [α / (λ(1 − α))]^{mt}, if ut = 0,

for t = 2, …, T and mt = 0, 1, …, min{yt, yt−1}.

SLIDE 17

Directed acyclic graph

Figure: Directed acyclic graph of the AdINAR(1) model, showing the plate t = 2, …, T with nodes Ut, Mt, Yt and parameters w, θ, λ, α.

SLIDE 18

This mixture distribution allows the model to account for overdispersion in a time series of counts and to accommodate zero inflation. In what follows, we extend the two-component mixture to a generalized, DP-based version of the INAR(1) model.

SLIDE 19

Learning the latent pattern of heterogeneity in time series of counts

SLIDE 20

The Dirichlet Process

Given a measurable space (X, B) and a probability space (Ω, F, Pr), a random probability measure G is a mapping G : B × Ω → [0, 1].

Definition (Ferguson, 1973): Let α be a finite non-null measure on (X, B). We say G is a Dirichlet process if, for every measurable partition {B1, …, Bk} of X, the random vector (G(B1), …, G(Bk)) follows a Dirichlet distribution with parameter vector (α(B1), …, α(Bk)).

Let τ = α(X) be the concentration parameter and, for every B ∈ B, G0(B) = α(B)/α(X) the base measure, which leads to a suitable parametrization in terms of a probability measure. Under this formulation, we denote G ∼ DP(τ G0).

SLIDE 21

The Dirichlet Process

1. E(G(B)) = G0(B).
2. Var(G(B)) = G0(B)(1 − G0(B)) / (τ + 1).
3. Assume that, given a Dirichlet process G with parameter α, X1, …, Xn are conditionally independent and identically distributed with P(Xi ∈ B | G) = G(B), i = 1, …, n. Then G | X1, …, Xn ∼ DP(β), where β(C) = α(C) + ∑_{i=1}^{n} IC(Xi).
4. As shown by [Blackwell and MacQueen, 1973], the predictive distribution of Xn+1, n ≥ 1, given X1, …, Xn, may be obtained by integrating out G, which entails that

Xn+1 | X1, …, Xn ∼ [τ/(τ + n)] G0 + [1/(τ + n)] ∑_{i=1}^{n} δ_{Xi},

where δx denotes a point mass on x.

SLIDE 22

Dirichlet and Pitman-Yor Processes

The discrete part of the predictive distribution implies the clustering property of the Dirichlet process, which induces a probability distribution on the number of distinct values in (X1, …, Xn), denoted here by k. [Pitman and Yor, 1997] generalized the Dirichlet process by introducing a discount parameter σ. The predictive distribution for the Pitman-Yor process is given by

Xn+1 | X1, …, Xn ∼ [(τ + kσ)/(τ + n)] G0 + [1/(τ + n)] ∑_{i=1}^{n} (1 − σ/ni) δ_{Xi},

where ni is the number of elements in (X1, …, Xn) equal to Xi, and σ ∈ [0, 1].
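The two-parameter urn differs only in these weights and reduces to the DP when σ = 0. A sketch that tracks cluster sizes, which is all the induced prior on the number of clusters needs (τ and the repetition count are illustrative, not the calibrated values used later):

```python
import numpy as np

def py_num_clusters(n, tau, sigma, rng):
    """Simulate the Pitman-Yor urn and return the number of clusters k.
    A new cluster opens with prob (tau + k*sigma)/(tau + i); existing
    cluster j is joined with prob (n_j - sigma)/(tau + i)."""
    sizes = []
    for i in range(n):
        k = len(sizes)
        w = np.array([nj - sigma for nj in sizes] + [tau + k * sigma])
        j = rng.choice(k + 1, p=w / (tau + i))   # weights sum to tau + i
        if j == k:
            sizes.append(1)
        else:
            sizes[j] += 1
    return len(sizes)

rng = np.random.default_rng(1)
for sigma in (0.0, 0.25, 0.5, 0.75):
    ks = [py_num_clusters(143, 1.0, sigma, rng) for _ in range(1000)]
    print(sigma, np.mean(ks))   # higher sigma spreads the prior on k
```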

SLIDE 23

The Pitman-Yor process with high σ induces less informative prior distributions for K [Pitman and Yor, 1997, De Blasi et al., 2013].

Figure: Prior distributions p(k) of the number of clusters for σ = 0, 0.25, 0.5, and 0.75.

SLIDE 24

In the INAR(1) structure, we now assume the innovation process is time-varying, i.e., E(Zt) = λt. From a realization of the process y1, . . . , yT, we want to learn the distribution of each λt and represent our uncertainties about the future steps YT+1, . . . , YT+h in order to forecast them. We create clusters of innovation rates as a means to learn the latent patterns of heterogeneity in the count time series.

SLIDE 25

DAG

Figure: Directed acyclic graph of the DP-INAR(1) model, with nodes τ, G, the innovation rates λ2, …, λT, the maturations M2, …, MT, the counts Y1, …, YT, and α.

SLIDE 26

Let y = (y1, …, yT) and m = (m2, …, mT). To obtain the posterior of (α, λ, m), we integrate out the random distribution G. From the parametric part of the graph, we have

p(y, m, α, λ) = ∫ p(y, m, α, λ | G) dµG(G)
= [∏_{t=2}^{T} p(yt | mt, λt) p(mt | yt−1, α)] × π(α) × ∫ ∏_{t=2}^{T} p(λt | G) dµG(G).

The random vector (λ2, …, λT) has an exchangeable distribution.

SLIDE 27

Therefore, the Pólya-Blackwell-MacQueen urn process yields the full conditional distribution of λt as the mixture

λt | all others ∼ w0 × Gamma(yt − mt + a^{(G0)}, b^{(G0)} + 1) + ∑_{r≠t} λr^{yt−mt} e^{−λr} δ_{λr},

in which

w0 = τ (b^{(G0)})^{a^{(G0)}} Γ(yt − mt + a^{(G0)}) / [Γ(a^{(G0)}) (b^{(G0)} + 1)^{yt−mt+a^{(G0)}}]

and δ_{λr} denotes a point mass at λr. Recall that the full conditional of λt is a combination of the joint prior p(λ2, …, λT) with p(yt | mt, λt). The weights in the expression above are not normalized.
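In code, one draw from this unnormalized mixture could look like the sketch below (names illustrative; weights on the log scale for numerical stability):

```python
import numpy as np
from scipy.special import gammaln

def draw_lambda_t(z_t, lam_others, tau, a0, b0, rng):
    """Sample lambda_t | rest in the DP-INAR(1) model, where z_t = y_t - m_t
    and lam_others holds the rates lambda_r, r != t."""
    log_w0 = (np.log(tau) + a0 * np.log(b0) + gammaln(z_t + a0)
              - gammaln(a0) - (z_t + a0) * np.log(b0 + 1))
    log_wr = z_t * np.log(lam_others) - lam_others    # lambda_r^{z_t} e^{-lambda_r}
    logw = np.concatenate(([log_w0], log_wr))
    p = np.exp(logw - logw.max())
    j = rng.choice(len(p), p=p / p.sum())
    if j == 0:                                        # open a "new" value
        return rng.gamma(z_t + a0, 1.0 / (b0 + 1))
    return lam_others[j - 1]                          # tie to an existing rate
```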

SLIDE 28

Choice of prior parameters

[Dorazio, 2009] chooses the parameters a_0^{(τ)} and b_0^{(τ)} of the τ prior by minimizing the Kullback-Leibler divergence between the prior distribution of the number of clusters K and a discrete uniform distribution on a suitable range. The marginal probability function of K can be computed as

π(k) = ∫_0^∞ Pr{K = k | τ} π(τ) dτ = [(b_0^{(τ)})^{a_0^{(τ)}} S(T − 1, k) / Γ(a_0^{(τ)})] I(a_0^{(τ)}, b_0^{(τ)}; k),

for k = 1, …, T − 1, in which S(T − 1, k) is an unsigned Stirling number of the first kind and

I(a_0^{(τ)}, b_0^{(τ)}; k) = ∫_0^∞ τ^{k + a_0^{(τ)} − 1} e^{−b_0^{(τ)} τ} [Γ(τ) / Γ(τ + T − 1)] dτ.

SLIDE 29

Choice of prior parameters

Let q be the probability function of a discrete uniform distribution on {1, …, T − 1}, that is,

q(k) = [1/(T − 1)] I{1,…,T−1}(k).

We then find, by numerical integration and optimization, the values of a_0^{(τ)} and b_0^{(τ)} that minimize the Kullback-Leibler divergence

KL[π ‖ q] = ∑_{k=1}^{T−1} q(k) log [q(k) / π(k)].
SLIDE 30

Choice of prior parameters

Similarly, we choose the hyperparameters a^{(G0)} and b^{(G0)} of the base probability density g0 by minimizing the Kullback-Leibler divergence between g0 and a uniform distribution on a suitable range [0, λmax], where λmax is chosen taking into consideration the available information on the phenomena under study.

SLIDE 31

Choice of prior parameters

Let h be a uniform density on [0, λmax], that is,

h(λ) = (1/λmax) I[0,λmax](λ).

We find, by numerical optimization, the values of a^{(G0)} and b^{(G0)} that minimize the Kullback-Leibler divergence

KL[g0 ‖ h] = ∫_0^{λmax} (1/λmax) log [(1/λmax) / g0(λ)] dλ
= − log λmax − a^{(G0)} log b^{(G0)} + log Γ(a^{(G0)}) − (a^{(G0)} − 1)(log λmax − 1) + b^{(G0)} λmax / 2.

Choosing the parameters for the α prior is more straightforward, with a^{(α)} = b^{(α)} = 1 being a natural choice.

SLIDE 32

Pitman-Yor case

The full conditional of each λt is slightly modified:

λt | all others ∼ w0* × Gamma(yt − mt + a^{(G0)}, b^{(G0)} + 1) + ∑_{i≠t} (1 − σ/ni) λi^{yt−mt} e^{−λi} δ_{λi},

in which

w0* = (τ + k_{\t} σ) (b^{(G0)})^{a^{(G0)}} Γ(yt − mt + a^{(G0)}) / [Γ(a^{(G0)}) (b^{(G0)} + 1)^{yt−mt+a^{(G0)}}],

where k_{\t} denotes the number of distinct values among the rates other than λt.

SLIDE 33

To improve efficiency, we remix the vector of distinct rates λ* after every step of the sampler [Escobar and West, 1998]. Let (λ*1, …, λ*k) be the k unique values among (λ2, …, λT), let ct = ∑_{j=1}^{k} j · I{λ*j}(λt) be the cluster indicator of λt, and define the number of occupants of cluster j by nj = ∑_{t=2}^{T} I{j}(ct). Then

λ*j | all others ∼ Gamma(a^{(G0)} + ∑_{t: ct=j} (yt − mt), b^{(G0)} + nj),

for j = 1, …, k.

SLIDE 34

Also, the full conditionals for α and mt are

α | all others ∼ Beta(a^{(α)} + ∑_{t=2}^{T} mt, b^{(α)} + ∑_{t=2}^{T} (yt−1 − mt)),

p(mt | all others) ∝ [1 / (mt! (yt − mt)! (yt−1 − mt)!)] [α / (λt(1 − α))]^{mt} I{0,1,…,min{yt−1,yt}}(mt).

This Gibbs sampler yields, marginally, a sample {α^{(n)}, λ^{(n)}}_{n=1}^{N} from the posterior distribution.

SLIDE 35

The DP-INAR(1) Model

We extend [Freeland, 1998]'s results for the original INAR(1) model.

Proposition. The probability function of Yt+h, given Yt = yt and θ = (α, λt+1, …, λt+h), can be written as the convolution of a Binomial(yt, α^h) distribution and a Poisson(µh) distribution:

p(yt+h | yt, θ) = ∑_{m=0}^{min{yt, yt+h}} (yt choose m) (α^h)^m (1 − α^h)^{yt−m} × µh^{yt+h−m} e^{−µh} / (yt+h − m)!,

in which µh = ∑_{i=1}^{h} α^{h−i} λt+i.
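The proposition translates directly into code; a sketch of the convolution pmf (names illustrative):

```python
import numpy as np
from scipy.stats import binom, poisson

def h_step_pmf(y_future, y_t, alpha, lams, h):
    """p(y_{t+h} | y_t, alpha, lambda_{t+1}, ..., lambda_{t+h}): convolution
    of Binomial(y_t, alpha^h) and Poisson(mu_h); lams = [lambda_{t+1}, ...]."""
    mu_h = sum(alpha ** (h - i) * lams[i - 1] for i in range(1, h + 1))
    m = np.arange(min(y_t, y_future) + 1)
    return np.sum(binom.pmf(m, y_t, alpha ** h) * poisson.pmf(y_future - m, mu_h))
```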

SLIDE 36

Pólya-Blackwell-MacQueen urn

Using the [Blackwell and MacQueen, 1973] urn process recursively, for n = 1, …, N, we draw a sample {λ^{(n)}_{T+1}, …, λ^{(n)}_{T+h}} from ∏_{i=1}^{h} p(λ_{T+i} | λ2, …, λ_{T+i−1}) sequentially as follows:

λ^{(n)}_{T+1} ∼ [τ/(τ + T)] G0 + [1/(τ + T)] ∑_{t=2}^{T} δ_{λ^{(n)}_t};
λ^{(n)}_{T+2} ∼ [τ/(τ + T + 1)] G0 + [1/(τ + T + 1)] ∑_{t=2}^{T+1} δ_{λ^{(n)}_t};
…
λ^{(n)}_{T+h} ∼ [τ/(τ + T + h − 1)] G0 + [1/(τ + T + h − 1)] ∑_{t=2}^{T+h−1} δ_{λ^{(n)}_t}.

SLIDE 37

DP-INAR(1) posterior predictive

Combining these elements, we approximate the integral representation of the h-steps-ahead posterior predictive probability function by the Monte Carlo average

p(yT+h | y1, …, yT) ≈ (1/N) ∑_{n=1}^{N} p(yT+h | yT, α^{(n)}, λ^{(n)}_{T+1}, …, λ^{(n)}_{T+h}),

for yT+h ≥ 0. As a point forecast, we compute the generalized median of the h-steps-ahead posterior predictive distribution, defined by

ŷT+h = arg min_{yT+h ≥ 0} |0.5 − ∑_{r=0}^{yT+h} p(r | y1, …, yT)|.
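Putting the pieces together, the Monte Carlo predictive and its generalized median might be computed as follows (a sketch reusing h_step_pmf from above; the truncation y_max is an assumption):

```python
import numpy as np

def posterior_predictive(y_T, alphas, lam_paths, h, y_max=200):
    """Average the h-step pmf over posterior draws (alpha^(n), lambda^(n)_{T+1:T+h})
    on the grid {0, ..., y_max}; return the pmf and its generalized median."""
    grid = np.arange(y_max + 1)
    pmf = np.zeros(y_max + 1)
    for a, lams in zip(alphas, lam_paths):
        pmf += np.array([h_step_pmf(yy, y_T, a, lams, h) for yy in grid])
    pmf /= len(alphas)
    cdf = np.cumsum(pmf)
    y_hat = int(np.argmin(np.abs(0.5 - cdf)))   # generalized median forecast
    return pmf, y_hat
```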
SLIDE 38

Monthly counts of burglary events

We analyze monthly time series of burglary events in Pittsburgh, USA, from January 1990 to December 2001. In this dataset, each time series has a length of 144 months and corresponds to a certain patrol area. The Figure below shows the series for patrol area 58.

Figure: Monthly counts of burglary events for patrol area 58.

http://www.forecastingprinciples.com/index.php/crimedata

SLIDE 39

Posterior distributions - AdINAR(1)

For the AdINAR(1) model hyperparameters, we make the choices aα = 1, bα = 1, aλ = 1, bλ = 0.01, aθ = 1, bθ = 1, aw = 1, and bw = 1, which correspond to reasonably flat priors. The next figure displays the marginal posterior distributions of the AdINAR(1) model parameters. The posterior distribution of the thinning parameter α is fairly concentrated, with posterior mean 0.31, showing that the autoregressive component is not negligible for this patrol area. The posterior mean of λ is 6.78, while the posterior mean of θ is 0.12. Also, the posterior distribution of w, with posterior mean 0.38, shows that the geometric component of the mixture has less weight for this patrol area.

SLIDE 40

Posterior distributions - AdINAR(1)

Figure: Marginal posterior distributions of parameters α, θ, λ, and w for patrol area 58.

SLIDE 41

Posterior distributions - AdINAR(1)

SLIDE 42

Posterior distributions - DP-INAR(1)

For the DP-INAR(1) model, we specify the hyperparameters as follows. To determine aτ and bτ, the optimization procedure described above, with kmin = 1 and kmax = 143, yields aτ = 0.519 and bτ = 0.003. Note that these values of kmin and kmax correspond, within our scheme, to the most spread-out choice for the prior distribution of the number of clusters K. We control the support of G0 by choosing λmax to be the maximum observed count. The level curves of KL[g0 ‖ h] are given below; the minimum is attained at a^{(G0)} = 1.778 and b^{(G0)} = 0.096. For the thinning parameter α, we adopt a uniform prior, choosing a^{(α)} = b^{(α)} = 1.

SLIDE 43

Figure: Level curves of the Kullback-Leibler divergence associated with the optimization of the base measure hyperparameters for patrol area 58.

SLIDE 44

Posterior distributions - DP-INAR(1)

The marginal posterior distributions of parameters α, λ3, λ18, and λ96 are displayed below. The posterior distribution of the thinning parameter α is reasonably concentrated, with posterior mean 0.19, showing that the autoregressive component is not negligible. The posterior distributions of λ3, λ18 and λ96 are fairly concentrated as well, with posterior means equal to 6.50, 13.61 and 32.01, respectively, showing that different regimes of innovation rates were captured in the learning process.

SLIDE 45

Posterior distributions - DP-INAR(1)

Figure: Marginal posterior distributions of parameters α, λ3, λ18, and λ96, for patrol area 58.

SLIDE 46

Posterior distributions - DP-INAR(1)

The posterior means of the innovation rates follow the same pattern of heterogeneity of the series.

Figure: Posterior means of the innovation rates (in grey) and the observed counts for patrol area 58 (in red).
SLIDE 47

The Markov chains in the figure below indicate that proper mixing is achieved by the Gibbs sampler.

Figure: Markov chains associated with the marginal posterior distributions of parameters α, λ3, λ18, and λ96, for patrol area 58. The gray rectangles indicate the burn-in periods.

SLIDE 48

Cross-validation procedure for time series

We use a form of cross-validation to evaluate the forecasting performance of the model. For an observed time series y1, …, yT, pick some T* < T and proceed as follows (see the sketch after this list):

1. Treat the counts yT*, …, yT as a test sample.
2. For t ≥ T*, train the model using the values y1, …, yt−1 and make an h-steps-ahead prediction ŷt+h.
3. Compute the absolute deviation |ŷt+h − yt+h| of the generalized-median forecast.
4. Take the average over all predictions.
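A rolling-origin sketch of this procedure (the fit_and_forecast callback stands in for any of the models above; names are illustrative):

```python
import numpy as np

def rolling_mad(y, T_star, h, fit_and_forecast):
    """Mean absolute deviation of h-steps-ahead forecasts over the test window.
    fit_and_forecast(train, h) should return the point forecast h steps
    after the end of `train`."""
    devs = []
    for t in range(T_star, len(y) - h + 1):
        y_hat = fit_and_forecast(y[:t], h)      # train on the first t counts
        devs.append(abs(y_hat - y[t + h - 1]))  # compare with realized count
    return float(np.mean(devs))
```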

SLIDE 49

Illustration of the 2-step ahead forecasting


SLIDE 50

Forecasting performance

In terms of forecasting performance, the AdINAR(1) and DP-INAR(1) models outperform the INAR(1) model in 75% of the patrol areas, producing substantial relative gains in mean absolute deviation. The exceptions are five areas in which the INAR(1) performs better, but with smaller relative gains.

SLIDE 51

In terms of learning K, we fixed τ such that the prior expected number of clusters equals 3, 8, or 13. The Dirichlet process is highly influenced by the prior specification of τ; robustness is achieved in the Pitman-Yor case by specifying high values of σ.

              DP          Pitman-Yor
            σ = 0    σ = 0.25   σ = 0.50   σ = 0.75
E(K) = 3     4.67      5.76       6.41       7.44
E(K) = 8     9.37      8.37       7.89       7.31
E(K) = 13   13.93     10.76       9.25       7.15

SLIDE 52

Future work

SLIDE 53

Future work

Extend the DP-INAR(1) and PY-INAR(1) models to higher-order Markov processes.
Generalize these ideas to multivariate time series using Hierarchical Dirichlet Processes.
Incorporate covariates in the model.

SLIDE 54

References I

Alzaid, A. and Al-Osh, M. (1987). First-order integer-valued autoregressive (INAR(1)) processes. Journal of Time Series Analysis.

Blackwell, D. and MacQueen, J. B. (1973). Ferguson distributions via Pólya urn schemes. Annals of Statistics, 1(2).

De Blasi, P., Favaro, S., Lijoi, A., Mena, R. H., Prünster, I., and Ruggiero, M. (2013). Are Gibbs-type priors the most natural generalization of the Dirichlet process? IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(2):212–229.

SLIDE 55

References II

Dorazio, R. M. (2009). On selecting a prior for the precision parameter of Dirichlet process mixture models. Journal of Statistical Planning and Inference.

Escobar, M. and West, M. (1998). Computing nonparametric hierarchical models. In Dey, D., Müller, P., and Sinha, D., editors, Practical Nonparametric and Semiparametric Bayesian Statistics, chapter 1, pages 1–22. Springer-Verlag.

Freeland, R. K. (1998). Statistical analysis of discrete time series with application to the analysis of workers' compensation data. PhD thesis, University of British Columbia, Vancouver.

SLIDE 56

References III

McKenzie, E. (1985). Some simple models for discrete variate time series. JAWRA Journal of the American Water Resources Association, 21(4):645–650.

Pitman, J. and Yor, M. (1997). The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. Annals of Probability, 25:855–900.