Dealing with unknown discontinuities in data and models
Kerry Gallagher, John Stephenson, Chris Holmes

Discontinuities occur in both data and processes in the Earth and Environmental Sciences
Spatial: faults, topography, lithology, phase, composition, …
Temporal: climate, seismicity, tectonics, …

What is the appropriate question?
What was the significance of the opening of the Aswan Dam?
[Figure: Nile discharge (×10^8 m^3) against year, 1860 to 1980; data from Cobb 1978]

When was the change?
ƒ(t) = μ_1 I(t ≤ t_c) + μ_2 I(t > t_c)
[Figure: Nile discharge (×10^8 m^3) against year, 1860 to 1980; after Denison et al. 2002]
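As a rough illustration of the single change-point model, the sketch below scores every candidate change time t_c for a step in the mean. It assumes Gaussian noise with a known standard deviation `sigma`, a flat prior on t_c, and replaces μ_1 and μ_2 by the segment averages (a profile likelihood); none of these choices come from the slides, and the data are synthetic rather than the Nile record.

```python
import math

def changepoint_posterior(t, y, sigma):
    """Normalised weights over candidate change times for one step in the mean.

    Assumptions (not from the slides): Gaussian noise with known sigma,
    flat prior on t_c, and mu_1, mu_2 set to the segment averages.
    """
    candidates, log_post = [], []
    for k in range(1, len(y)):          # change lies between t[k-1] and t[k]
        left, right = y[:k], y[k:]
        mu1 = sum(left) / len(left)     # level for t <= t_c
        mu2 = sum(right) / len(right)   # level for t > t_c
        rss = sum((v - mu1) ** 2 for v in left) + sum((v - mu2) ** 2 for v in right)
        candidates.append(t[k - 1])
        log_post.append(-0.5 * rss / sigma ** 2)
    peak = max(log_post)                # subtract the maximum for numerical stability
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return candidates, [w / total for w in weights]

# Synthetic series with a drop in level after 1909 (illustrative values only).
t = list(range(1900, 1920))
y = [1100.0 + (5 if i % 2 else -5) for i in range(10)] \
    + [850.0 + (5 if i % 2 else -5) for i in range(10)]
tc, post = changepoint_posterior(t, y, sigma=50.0)
best = tc[post.index(max(post))]
print(best)
```

The returned weights can be read as a probability for each candidate change time under the stated assumptions, which is closer to the question "when was the change?" than a single best-fit value.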

Data interpolation and prediction with discontinuities
Standard methods may be too smooth
[Figure: two panels, "Kriging model of synthetic step function" and "Kriging model of synthetic model", each showing ƒ(s) against s with a realisation of the true data, the true function and a kriging fit (Gaussian)]

Need a method that can deal with an unknown number of discontinuities in unknown locations
Partition Modelling: 1D, 2D
[Figure: "Partition model of synthetic data", ƒ(x) against x]

Formulating a Partition Model
How many discontinuities, and where are they?
• Space partitioned into discrete regions
• Partitions defined by Voronoi tessellation
• Regression function, ƒ, specified within each region
• Parameters: θ = (c_1,…,c_N, ƒ_1,…,ƒ_N, N, σ^2)
[Figure: ƒ(X) against X, with Voronoi centres c_1 to c_6]
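A minimal 1D sketch of this parameterisation: each point belongs to the region of its nearest Voronoi centre, and the regression function within a region is taken here to be a constant (the mean of the data in that region). The constant regression function and the helper names `nearest_centre`, `fit_partition` and `predict` are assumptions made for illustration.

```python
def nearest_centre(x, centres):
    """Index of the Voronoi cell (nearest centre) that contains x."""
    return min(range(len(centres)), key=lambda j: abs(x - centres[j]))

def fit_partition(xs, ys, centres):
    """Constant level per cell: the mean of the data falling in that cell."""
    sums = [0.0] * len(centres)
    counts = [0] * len(centres)
    for x, y in zip(xs, ys):
        j = nearest_centre(x, centres)
        sums[j] += y
        counts[j] += 1
    # A cell with no data has no estimated level.
    return [s / c if c else None for s, c in zip(sums, counts)]

def predict(x, centres, levels):
    """Evaluate the piecewise-constant partition model at x."""
    return levels[nearest_centre(x, centres)]

# Synthetic step function with a discontinuity at x = 0.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
centres = [-1.0, 1.0]                 # cell boundary falls midway, at x = 0
levels = fit_partition(xs, ys, centres)
print(levels, predict(-0.2, centres, levels), predict(0.3, centres, levels))
```

Moving, adding or removing a centre changes the position and number of the discontinuities, which is what makes the centres and their count N convenient parameters to sample.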

Generating Partition Models
Prediction: p(y|D) = ∫ p(y|θ,D) p(θ|D) dθ, where p(θ|D) is the posterior distribution
Monte Carlo integration: p(y|D) ≈ (1/N) ∑_{i=1}^{N} p(y|θ_i,D), with θ_i drawn from p(θ|D)
y = value to be predicted, D = observed data, θ = model parameters
Bayes’ Theorem: p(θ|D) ∝ p(D|θ) p(θ) (posterior ∝ likelihood × prior)
Use Markov chain Monte Carlo (MCMC) to sample the posterior distribution, p(θ|D)
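The Monte Carlo approximation of the predictive integral can be checked on a toy problem where the answer is known in closed form. The sketch below assumes a single unknown mean θ, Gaussian noise with known `sigma` and a flat prior, so the posterior is Gaussian and can be sampled directly; in a partition model the samples θ_i would come from MCMC instead. The data values are made up.

```python
import math
import random

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def predictive_mc(y, data, sigma, n_samples, seed=0):
    """Estimate p(y|D) as the average of p(y|theta_i) over posterior draws theta_i."""
    rng = random.Random(seed)
    n = len(data)
    post_mean = sum(data) / n           # flat prior: posterior is N(mean, sigma^2 / n)
    post_sd = sigma / math.sqrt(n)
    total = 0.0
    for _ in range(n_samples):
        theta_i = rng.gauss(post_mean, post_sd)
        total += normal_pdf(y, theta_i, sigma)
    return total / n_samples

data = [1.2, 0.8, 1.1, 0.9]             # illustrative observations
sigma = 0.5
mc = predictive_mc(1.0, data, sigma, n_samples=20000)

# Closed-form predictive for this toy case: N(mean, sigma^2 + sigma^2 / n).
exact = normal_pdf(1.0, sum(data) / len(data),
                   math.sqrt(sigma ** 2 + sigma ** 2 / len(data)))
print(mc, exact)
```

The two numbers should agree to within Monte Carlo error, and the estimate tightens as the number of posterior samples grows.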

Sampling with (transdimensional) MCMC
Initialise θ
Iterate:
• Propose new θ’
• Calculate likelihood with new θ’
• Accept new θ’ or retain current θ
Acceptance criterion: α(θ,θ’) = min{1, [p(θ’) p(D|θ’) p(θ|θ’)] / [p(θ) p(D|θ) p(θ’|θ)] × R × |J|}
p(θ) = prior, p(D|θ) = likelihood, p(θ’|θ) = model proposal, R = jump proposal, |J| = Jacobian
Distribution of accepted models: θ ~ p(θ|D)
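The propose/evaluate/accept loop can be sketched for the simplest fixed-dimension case. This is plain random-walk Metropolis, not the transdimensional sampler: with a symmetric proposal and no change of dimension, the proposal ratio, R and |J| all equal 1, so the acceptance criterion reduces to the posterior ratio. The target (one unknown mean, flat prior, Gaussian likelihood with known `sigma`) and all names are assumptions for illustration.

```python
import math
import random

def log_posterior(theta, data, sigma):
    """Flat prior plus Gaussian likelihood, up to an additive constant."""
    return -0.5 * sum((d - theta) ** 2 for d in data) / sigma ** 2

def metropolis(data, sigma, n_iter, step, seed=0):
    rng = random.Random(seed)
    theta = 0.0                                   # initialise theta
    logp = log_posterior(theta, data, sigma)
    samples, accepted = [], 0
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)   # propose new theta'
        logp_new = log_posterior(proposal, data, sigma)
        log_alpha = min(0.0, logp_new - logp)     # log of min(1, posterior ratio)
        if math.log(1.0 - rng.random()) < log_alpha:
            theta, logp = proposal, logp_new      # accept theta'
            accepted += 1
        samples.append(theta)                     # otherwise the current theta is retained
    return samples, accepted / n_iter

data = [1.2, 0.8, 1.1, 0.9]                       # illustrative observations
samples, rate = metropolis(data, sigma=0.5, n_iter=20000, step=0.5)
kept = samples[1000:]                             # discard an initial burn-in
post_mean = sum(kept) / len(kept)
print(post_mean, rate)
```

A transdimensional sampler adds moves that create or delete Voronoi centres, and for those moves the jump proposal and Jacobian terms in the acceptance criterion no longer cancel.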

Sampling Partition Models: natural parsimony
[Figure: likelihood; better data fit]

1D partition models for data interpolation
Atmospheric dust input to peat bogs: looking for a common signature in multiple systems
[Figure: maximum likelihood model and mean ± 95% C.I.; marked ages 8,850 yr, 38,500 yr and 45,500 yr]

Partition Models – 2D example function

Partition Sampling – 2D
Single realisation
Multiple realisations … ensemble average (smooth, but maintains discontinuities)
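The effect of averaging over realisations can be seen in a small 1D sketch. Each realisation is a piecewise-constant model defined by Voronoi centres and one level per cell; the three realisations below are hand-picked stand-ins for posterior samples, chosen so that the cell boundary moves slightly between them.

```python
def nearest_centre(x, centres):
    """Index of the Voronoi cell (nearest centre) that contains x."""
    return min(range(len(centres)), key=lambda j: abs(x - centres[j]))

def ensemble_average(x, realisations):
    """Average the piecewise-constant predictions of all realisations at x."""
    total = 0.0
    for centres, levels in realisations:
        total += levels[nearest_centre(x, centres)]
    return total / len(realisations)

# Hypothetical realisations: boundaries at x = 0, x = 0.1 and x = -0.1.
realisations = [
    ([-1.0, 1.0], [0.0, 1.0]),
    ([-1.0, 1.2], [0.0, 1.0]),
    ([-1.2, 1.0], [0.0, 1.0]),
]

far_left = ensemble_average(-1.0, realisations)   # all realisations agree
far_right = ensemble_average(1.0, realisations)   # all realisations agree
near_step = ensemble_average(0.05, realisations)  # realisations disagree here
print(far_left, near_step, far_right)
```

Away from the step every realisation gives the same value, so the average stays flat; only in the narrow band where the boundaries differ does the average take intermediate values, so the jump stays sharp to within the uncertainty on its location.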

Partition Model: Digital Elevation Model (DEM) example
[Figure: raw ERS sample image and contour plot of partition model; both axes in pixels (10 to 60), colour scale shows pixel value]

Partition Models
Application to spatially variable physical processes and parameters
Example from thermochronology

Thermochronology: data are sensitive to the temperature history experienced by the host rock, e.g. apatite fission track analysis
p(D|θ) = ƒ(T(t), φ)
Likelihood is a non-linear function of unknown parameters at each location within each partition

Model partition distribution and thermal histories

The problem is to find
(a) how to partition the samples in 2D
  (i) number of partitions
  (ii) location of the partitions
(b) the distribution of thermal histories in each partition

Inferred partition distribution and thermal histories
[Figure: true and inferred partitions in the (X_1, X_2) plane, both axes from 0 to 1, with probability scale P]
(Stephenson, Gallagher and Holmes 2006)

Summary
• Partition models allow for an unknown number of discontinuities with unknown geometry in variable dimensions
• Bayesian approach deals with the problem in terms of probabilities … intuitive for model choice
• Obtain probability distributions (partitions, model parameters, posterior predictions)
• Bayesian approach is naturally parsimonious
• Potential for self-adaptive/self-regularising model parameterisation

Sampling Partition Models: distribution on number of partitions
[Figure: p(k) against number of partitions k, for k = 1 to 6]
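Given a chain of sampled models, the distribution on the number of partitions is just the relative frequency of each k along the chain. A minimal sketch, using a made-up chain of k values rather than output from any real sampler:

```python
from collections import Counter

def partition_count_distribution(k_samples):
    """Estimate p(k) as the fraction of sampled models that have k partitions."""
    counts = Counter(k_samples)
    n = len(k_samples)
    return {k: counts[k] / n for k in sorted(counts)}

# Hypothetical number of partitions recorded at each retained MCMC iteration.
k_chain = [2, 2, 3, 2, 2, 4, 2, 3, 2, 2]
p_k = partition_count_distribution(k_chain)
print(p_k)
```

If models with more partitions fit the data no better, the sampler visits them less often, and p(k) concentrates on the smaller values of k.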

Traditionally, each sample is modelled independently, which ignores spatial relationships.

Ideally, we want to group samples with a common thermal history…

…but the spatial relationships may be unknown.
