Diagnostics tools for space-time point processes Giada Adelfio and - - PowerPoint PPT Presentation

diagnostics tools for space time point processes
SMART_READER_LITE
LIVE PREVIEW

Diagnostics tools for space-time point processes Giada Adelfio and - - PowerPoint PPT Presentation

GRASPA CONFERENCE 2008 Intermediate Meeting of GRASPA-PRIN 2006-2008 on Environmental Statistics 27 and 28 March 2008 Diagnostics tools for space-time point processes Giada Adelfio and Marcello Chiodi Dipartimento di Scienze Statistiche e


slide-1
SLIDE 1
  • First •Prev •Next •Last •Go Back •Full Screen •Close •Quit

GRASPA CONFERENCE 2008

Intermediate Meeting of GRASPA-PRIN 2006-2008 on Environmental Statistics 27 and 28 March 2008

Diagnostics tools for space-time point processes

Giada Adelfio and Marcello Chiodi ∗

∗Dipartimento di Scienze Statistiche e Matematiche “Silvio Vianelli” (DSSM), Universit`

a di Palermo

slide-2
SLIDE 2
  • First •Prev •Next •Last •Go Back •Full Screen •Close •Quit

Introduction

  • A new diagnostic approach is introduced; it is based on a transformed version of some second-order statistics

for general point processes.

  • Assume the existence of the conditional intensity function for each point in space and time.
slide-3
SLIDE 3
  • First •Prev •Next •Last •Go Back •Full Screen •Close •Quit

Introduction

  • A new diagnostic approach is introduced; it is based on a transformed version of some second-order statistics

for general point processes.

  • Assume the existence of the conditional intensity function for each point in space and time.
  • For a more realistic interpretation of real phenomena: Self-similarity, long-range dependence and fractal

behavior are introduced, analyzing second-order moments of the underlying process that can give infor- mation about interdependence among events.

slide-4
SLIDE 4
  • First •Prev •Next •Last •Go Back •Full Screen •Close •Quit

Conditional intensity function

The conditional intensity function of a space-time point process can be defined as: λ(t, x|Ht) = lim

∆t,∆x→0

E[N([t, t + ∆t) × [x, x + ∆x)|Ht)] |∆t∆x| where:

  • Ht is the space-time occurrence history of the process up to time t;
  • ∆t, ∆x are time and space infinitesimal increments;
  • E[N([t, t + ∆t) × [x, x + ∆x)|Ht)] is the history-dependent expected value of the number of events that
  • ccur in the volume {[t, t + ∆t) × [x, x + ∆x)}.
  • If the conditional intensity is independent of the history but dependent only on the current time and the

spatial locations this supplies a non-stationary Poisson process.

  • A constant conditional intensity provides a stationary Poisson process.
slide-5
SLIDE 5
  • First •Prev •Next •Last •Go Back •Full Screen •Close •Quit

Second-order statistics

  • R/S Statistic (Mandelbrot, 1965) for long-range dependence in time: log-log plot of R/S versus intervals
  • f time has a constant slope (0 < H < 1, Hurst constant) (Clegg (2006)).
  • Spectrum: a point process is said to be long-range dependent if its spectral density obeys:

fN(ω) ∼ C|ω|θ as ω → 0, for some C > 0 and some real θ ∈ (−1, 0), with θ = 1 − 2H. A process with a high spectral density at small frequencies has the greater part of the variance concentrated at low frequency cycles.

  • Correlation Integral: when fractal objects are self-scaling in a statistical sense (parts of the whole fit the

whole in distribution) ⇒ D2 (correlation dimension): D2 = lim

δ→0

log C2(δ) log(δ) with C2(δ) the correlation integral: numbers of pairs of points with Euclidean distance less than δ. Given X1, X2, . . . , Xn on R2 its estimator is (Grassberger and Procaccia, 1983): ˆ C2(δ) = 2 n(n − 1)

n−1

  • i=1

n

  • j=i+1

I(|Xi − Xj| ≤ δ) (1)

slide-6
SLIDE 6
  • First •Prev •Next •Last •Go Back •Full Screen •Close •Quit

Second-order residuals

For point processes, many methods designed to test whether the second-order properties of an observed point pattern are consistent with the stationary Poisson process have been proposed (Diggle (1983), Ripley (1976), Baddeley and Silverman (1984)). A better diagnostic tool: The intensities used in the thinning method to retain points (Schoenberg (2003)) are here used as weights in order to offset the inhomogeneity of the process. Residual process is defined just weighting second-order statistics by the inverse of the conditional intensity function (autocorrelation, K-function, spectrum, fractal dimension and R/S statistic). Main result: second-order statistics of Nw(·) behave as the ones of a homogeneous Poisson process.

slide-7
SLIDE 7
  • First •Prev •Next •Last •Go Back •Full Screen •Close •Quit

Second-order residuals

For point processes, many methods designed to test whether the second-order properties of an observed point pattern are consistent with the stationary Poisson process have been proposed (Diggle (1983), Ripley (1976), Baddeley and Silverman (1984)). A better diagnostic tool: The intensities used in the thinning method to retain points (Schoenberg (2003)) are here used as weights in order to offset the inhomogeneity of the process. Residual process is defined just weighting second-order statistics by the inverse of the conditional intensity function (autocorrelation, K-function, spectrum, fractal dimension and R/S statistic). Main result: second-order statistics of Nw(·) behave as the ones of a homogeneous Poisson process.

  • The approach includes all the observed points rather than the only ones retained after the application of

a random thinning.

  • This method can be applied to processes of any dimension, provided that those statistics exist.
  • It gives to second-order statistics a primary role in the diagnostic aim, to interpret dependence and at-

tractive/inhibitive features of observed point patterns.

slide-8
SLIDE 8
  • First •Prev •Next •Last •Go Back •Full Screen •Close •Quit

The weighted process

Let N a point process defined on S ∈ Rd, d ≥ 1, identified by λ(s|F) with respect to the filtration F on S, assumed bounded away from zero. Nw is a real valued random measure: for each S then Nw(S) =

  • S

1 λ∗(s)dN where

1 λ∗(s) = λmin λ(s) and such that there exists λmin ≤ min{λ(s); (s) ∈ S}.

Assumption: N is a simple space-time point process with Hi the past history up to ith event.

slide-9
SLIDE 9
  • First •Prev •Next •Last •Go Back •Full Screen •Close •Quit

The weighted process

Let N a point process defined on S ∈ Rd, d ≥ 1, identified by λ(s|F) with respect to the filtration F on S, assumed bounded away from zero. Nw is a real valued random measure: for each S then Nw(S) =

  • S

1 λ∗(s)dN where

1 λ∗(s) = λmin λ(s) and such that there exists λmin ≤ min{λ(s); (s) ∈ S}.

Assumption: N is a simple space-time point process with Hi the past history up to ith event. By theoretical results (Adelfio and Schoenberg (2007)) the covariance is cov[Nw(si, si + δ), Nw(sj, sj + δ)] = 0 (2) Therefore the spectral density of the weighted process is: fNw(ω) = λmin 2π ⇒ that is the power spectrum of a Poisson process with constant rate λmin Moreover Nw is a martingale (see Adelfio and Schoenberg (2007)).

slide-10
SLIDE 10
  • First •Prev •Next •Last •Go Back •Full Screen •Close •Quit

Weighted R/S statistic

Weighted version: Rw(t; d) = max

0≤u≤d[Nw(u + t) − Nw(t)] − u

d[Nw(t + d) − Nw(t)]− min

0≤u≤d[Nw(u + t) − Nw(t)] − u

d[Nw(t + d) − Nw(t)] and S2

w(t; d) = Nw(t + d) − Nw(t)

d − Nw(t + d) − Nw(t) d 2 then R/Sw = Rw(t; d) Sw(t; d) (3) ⇒ generalization of the Donsker’s functional theorem or following the martingale approach: as for homogenous Poisson processes (Mandelbrot, 1975) and for short-term dependence (3) converges in distribution to the range

  • f a Brownian bridge.
slide-11
SLIDE 11
  • First •Prev •Next •Last •Go Back •Full Screen •Close •Quit

Weighted correlation integral

Weighted correlation integral for a point process defined on A ∈ R2 with measure |A| is: ˆ CW(δ) = 1 (λmin|A|)2

  • i

ωi

  • j=i

ωjI(|si − sj| ≤ δ) (4) with λ(s) the conditional intensity function defined with respect the filtration F on A, ωk = λmin

λ(sk), ∀k.

Weighted correlation integral for a time process N with realizations t1, t2, . . . , tn on [0, T] ∈ R: ˆ CW(δ) = 1 (λminT)2

n

  • i

ωi

n

  • j=i

ωjI(|ti − tj| ≤ δ) (5) with ωk = λmin

λ(tk), ∀k and λ(t) the conditional intensity function with respect the filtration Ht on [0, T].

⇒ as for a homogeneous Poisson process it is asymptotically normally distributed (Adelfio and Schoenberg, 2007).

slide-12
SLIDE 12
  • First •Prev •Next •Last •Go Back •Full Screen •Close •Quit

Nonparametric estimation and residual analysis for seismic processes

  • General point processes for aggregated data:

– self-exciting point processes, to model events that are clustered together, – self-correcting processes when regularities are observed, e.g. the strain-release model (Daley and Vere- Jones, 2003) – A widely used model is ETAS model:

λ(x, y, t, m|Ht) = J(m)  µ(x, y) +

  • tj<t

g(t − tj)f(x − xj, y − yj|mj)   (6) ∗ xj, yj, tj, mj are the space-time-magnitude coordinates of the observed events up to time t, ∗ J(m) is the magnitude distribution; ∗ µ(x, y) is the spontaneous activity ∗ the triggered one is given by the product of the time and space (conditioned to magnitude) probability distributions. ∗ the background seismicity µ(x, y) is assumed stationary in time, while time triggering activity is represented by a non stationary Poisson process according to the modified Omori’s formula (Utsu, 1961): g(t − τ) = K (t − τ + c)p , with t > τ (7) with K a normalizing constant, c and p characteristic parameters of the seismic activity of the given region.

slide-13
SLIDE 13
  • First •Prev •Next •Last •Go Back •Full Screen •Close •Quit
  • The parametric models estimation suffers by many drawbacks:

– the definition of a reliable mathematical model from the geophysical theory – the sensitivity of statistical estimates to the composition of the sample, that is the space-time region under study. Many of the disadvantages of the parametric modelling can be avoided by making use of nonparametric techniques, such as kernel density methods (Silverman, 1986).

slide-14
SLIDE 14
  • First •Prev •Next •Last •Go Back •Full Screen •Close •Quit
  • We use second-order residuals also to improve the estimation procedure of a generalized version of ETAS

model.

  • This model is estimated by using nonparametric methods for earthquakes occurring in the east coast of

Japan from 1926 to 1995, with magnitude equal or larger than 4.5, to yield a more flexible estimate in presence of several data for which an obvious discrimination between principal and secondary events is not available.

  • This estimate is obtained through interpolation and smoothing procedures based on kernel estimators

(Silverman, 1986).

  • We provide the estimation of the intensity function of a inhomogeneous Poisson process identified by a

space-time Gaussian kernel density, assuming independence between spatial and temporal components (as in (6)): λ(t, x, y) = f(x, y)g(t) with f and g spatial and temporal densities, respectively, estimated by kernel estimators.

slide-15
SLIDE 15
  • First •Prev •Next •Last •Go Back •Full Screen •Close •Quit

Figure 1: Epicenters of earthquakes occurred in the east coast of the Tohoku District - Japan (fractal dimension 1.7), in the region defined by 36◦ ∼ 42◦ N and 141◦ ∼ 145◦ E for all depth and for the time span 1926-1995, with magnitude equal or larger than 4.5. (Red-stars: m > 7.5; Blue-stars: m = 7.5)

slide-16
SLIDE 16
  • First •Prev •Next •Last •Go Back •Full Screen •Close •Quit

Models definition

  • Model a: nonparametric ETAS model with constant bandwidth value estimated by following the Silver-

man’s rule, that for the one-dimensional case Silverman (1986) is: hopt = 1.06An−1/5 (8) with A = min{standard deviation, range-interquartile/1.34} , which optimizes the estimator asymptotic behavior in terms of mean integrated square error and provides valid results on a wide range of distributions.

  • Model b: nonparametric ETAS model with variable bandwidth values; according to this procedure, cal-

culate the bandwidth hj = (hj

x, hj y, hj t) for each event as the radius of the smallest circle centered at the

location of the jth event (xj, yj, tj) that includes at least np other events

slide-17
SLIDE 17
  • First •Prev •Next •Last •Go Back •Full Screen •Close •Quit

Figure 2: Weighted periodogram for constant bandwidth model (a), with 95%-bounds of a homogeneous Poisson process (dotted lines).

  • First a model with constant bandwidth values in space and time (model a) is estimated and its weighted

spectral density (fig. 2) is analyzed.

  • Because of the strong clustering nature of data, model a tends to smooth them out more than expected.
  • To account for this residual correlation, a nonparametric model with variable bandwidth values (model b;

see Zhuang et al. (2002)) is estimated.

slide-18
SLIDE 18
  • First •Prev •Next •Last •Go Back •Full Screen •Close •Quit

Figure 3: Weighted periodogram for varying bandwidth model (b), with 95%-bounds of a homogeneous Poisson process (dotted lines).

⇒ The introduction of variable bandwidth values (model b) seems to improve the fitting in time, accounting for the residual correlation (due to the presence of aftershocks) not taken into account by model a; ⇒ For model b, the estimated weighted spectral density behaves as the one of a temporal homogeneous Poisson process.

slide-19
SLIDE 19
  • First •Prev •Next •Last •Go Back •Full Screen •Close •Quit

⇒ For model b, the weighted correlation dimension estimated for model b, is not significatively different from 2, that is the expected value of the correlation dimension for a homogeneous Poisson process in space. Such behavior is not observed for the model with constant bandwidth values. ⇒ If events are strongly correlated, the introduction of a variable smoothing, accounting for the high production

  • f offsprings, seems to be preferable.
slide-20
SLIDE 20
  • First •Prev •Next •Last •Go Back •Full Screen •Close •Quit

⇒ For model b, the weighted correlation dimension estimated for model b, is not significatively different from 2, that is the expected value of the correlation dimension for a homogeneous Poisson process in space. Such behavior is not observed for the model with constant bandwidth values. ⇒ If events are strongly correlated, the introduction of a variable smoothing, accounting for the high production

  • f offsprings, seems to be preferable.

⇒ Analyze graphical departures from the parametric fitting of ETAS model.

Figure 4: Weighted periodogram for ETAS model, with 95%-bounds of a homogeneous Poisson process (dotted lines).

slide-21
SLIDE 21
  • First •Prev •Next •Last •Go Back •Full Screen •Close •Quit

Figure 5: Log-scale of space-time variable kernel density, ETAS intensity and their ratio, for points following the mainshock of 1968

slide-22
SLIDE 22
  • First •Prev •Next •Last •Go Back •Full Screen •Close •Quit

⇒ the variable kernel estimate provides a more realistic description of the seismic activity of the studied area.

  • Some differences with the parametric fitting are still observed.
  • Problems of the parametric fitting and how to improve its ability in the description of data:

– the use of constant p-values of (7) in space, that induces a too quick decay of the intensity function in correspondence of events that are strongly clustered in time; – or the use of a constant K-value of (7). Indeed, because of the strong clustering nature of data, the use of an higher value of K that account for the high production of offsprings in those cases could be preferable. ⇒ Very complex model!!

slide-23
SLIDE 23
  • First •Prev •Next •Last •Go Back •Full Screen •Close •Quit

Gaps identification

A seismic gap can be defined as a segment of an active geologic fault that has not produced seismic events for an unusually long time; gaps are often considered susceptible to future strong earthquakes occurrence and therefore their identification may be useful for predictive purposes.

  • By a graphical approach, we compare the ETAS model, estimated by ML with the kernel estimate of the

seismic activity of the South Tyrrhenian Sea from 1981 to 2005 (fig. 7 on the left, for latitude-time space), by considering their ratio (fig. 7 on the right).

  • Regions with ratio values smaller than one are identified by a brighter grey, while regions with a ratio

larger than 1 are identified by a darker grey: darker areas indicate that the observed seismicity is smaller than those calculated by the estimated space-time ETAS model.

slide-24
SLIDE 24
  • First •Prev •Next •Last •Go Back •Full Screen •Close •Quit

Figure 6: Epicenters of earthquakes occurred in the South Tyrrhenian Sea from 1981 to 2005, in the region defined by 37.9◦ ∼ 39.31◦ N and 11.52◦ ∼ 16◦ E for all depth and magnitude (Red-stars: m ≥ 5.5)

slide-25
SLIDE 25
  • First •Prev •Next •Last •Go Back •Full Screen •Close •Quit

Figure 7: Nonparametric intensity estimate of the seismic activity of the South Tyrrhenian Sea from 1981 to 2005 (on the left) and ratio between the parametric (ETAS) and nonparametric ones (on the right)

slide-26
SLIDE 26
  • First •Prev •Next •Last •Go Back •Full Screen •Close •Quit

⇒Darker grey area around the source region before large earthquakes that induced a big sequence of events may be observed. ⇒ It is just a starting analysis to deal with this kind of issue: it needs further data to be available and the application of more rigorous methodology. ⇒ On the other hand, we think that it may provide a useful starting-point for studies aimed to seismic hazard evaluation.

slide-27
SLIDE 27
  • First •Prev •Next •Last •Go Back •Full Screen •Close •Quit

Figure 8: Weighted periodogram for ETAS-model and kernel-model, with 95%-bounds of a homogeneous Poisson process (dotted lines).

slide-28
SLIDE 28
  • First •Prev •Next •Last •Go Back •Full Screen •Close •Quit

Conclusions

  • The proposed method for residual analysis in point processes is an useful tool for the comprehension of

point patterns properties, when attraction or dependence features are present.

  • The method, in comparison with others, has the advantage to analyze features of data without requiring

a random depletion of data or a rescaling along a fixed domain.

  • Moreover, the method incorporates the use of the second-order statistics of point processes, allowing an

immediate interpretation of attractive or repulsive characteristics of observed point patterns.

  • It must be emphasized that these techniques should be applicable for spatial-temporal point processes,

processes in higher dimensions, and processes in more general metric spaces.

  • Another important direction for future work would be a comparison between the various residual second-
  • rder statistics proposed here as well as comparison with other methods of residual analysis such as the

first-order residuals of Baddeley et al. (2000) or the second-order statistics of the residual processes such as those in Schoenberg (2003).

slide-29
SLIDE 29
  • First •Prev •Next •Last •Go Back •Full Screen •Close •Quit

References

Adelfio, G. and Schoenberg, F. P. (2007). Point process diagnostics based on weighted second-order statistics and their asymptotic properties. Annals of the Institute of Statistical Mathematics, pages –. Baddeley, A., Møller, J., and Waagepetersen, R. (2000). Non and semi-parametric estimation of interaction in inhomogeneous point patterns. Statistica Neerlandica, 54(3), 329–350. Baddeley, A., Turner, R., Møller, J., and Hazelton, M. (2005). Residual analysis for spatial point processes. Journal of the Royal Statistical Society, Series B, 67(5), 617–666. Baddeley, A. J. and Silverman, B. W. (1984). A cautionary example on the use of second-order methods for analyzing point patterns. Biometrics, 40, 1089–1093. Clegg, R. (2006). A practical guide to measuring the hurst parameter. International Journal of Simulation: Systems, Science and Technology, 7(2), 3–14. Daley, D. J. and Vere-Jones, D. (2003). An introduction to the theory of point processes. New York: Springer- Verlag, second edition. Diggle, P. J. (1983). Statistical analysis of spatial point patterns. Academic Press Inc., London.

slide-30
SLIDE 30
  • First •Prev •Next •Last •Go Back •Full Screen •Close •Quit

Georgii, H. (1976). Canonical and grand canonical gibbs states for continuum systems. Communications of Mathematical Physics, 48, 31–51. Grassberger, P. and Procaccia, I. (1983). Measuring the strangeness of a strange attractor. Physica D, 9, 189–208. Mandelbrot, B. B. (1965). Self-similar error clusters in communication systems and the concept of conditional

  • stationarity. IEEE Transactions on Communication Technology, COM-13, 71–90.

Mandelbrot, B. B. (1975). Limit theorems on the self-normalized range for weakly and strongly dependent

  • processes. Zeitschrift ftr. Wahrscheinlichkeitstheorie. und Verwandte Gebiete, 31, 271–285.

Meyer, P. (1971). D´ emonstration simplif´ ee d’un th´ eor` eme de knight. Lecture Notes in Mathematics, 191, 191–195. Ogata, Y. (1988). Statistical models for earthquake occurrences and residual analysis for point processes. Journal of the American Statistical Association, 83(401), 9–27. Ogata, Y. (1998). Space-time point-process models for earthquake occurrences. Annals of the Institute of Statistical Mathematics, 50(2), 379–402.

slide-31
SLIDE 31
  • First •Prev •Next •Last •Go Back •Full Screen •Close •Quit

Ripley, B. D. (1976). The second-order analysis of stationary point processes. Journal of Applied Probability, 13(2), 255–266. Schoenberg, F. P. (2003). Multi-dimensional residual analysis of point process models for earthquake occur-

  • rences. Journal American Statistical Association, 98(464), 789–795.

Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. Chapman and Hall, London. Zhuang, J. (2006). Second-order residual anaysis of spatio-timporal point processes and applications in model

  • evaluation. Journal of the Royal Statistical Society, Series B, 68(4), 635–653.

Zhuang, J., Ogata, Y., and Vere-Jones, D. (2002). Stochastic declustering of space-time earthquake occurrences. Journal of the American Statistical Association, 97(458), 369–379.