

SLIDE 1

Emission tomography and Bayesian inverse problems

Peter Green

University of Bristol and UTS, Sydney

Bernoulli Lecture 12 July 2012, Istanbul

Peter Green (Bristol/UTS) Bayesian inverse problems Istanbul, July 2012 1 / 70

SLIDE 2

Outline

1. The Bernoullis
2. Inverse problems from a Bayesian perspective
3. Single-photon emission computed tomography
4. Concentration and approximation of posterior in non-regular inverse problems


SLIDE 3

The Bernoullis Family

The Bernoulli family


SLIDE 4


The Bernoulli family and the Bernoulli Society

The Bernoulli Society was formed in 1975 out of the International Association for Statistics in Physical Sciences, an autonomous section of the ISI, the International Statistical Institute.

... not out of an ISI section on Mathematical Statistics!


SLIDE 5


The Bernoulli family

Jacob: Ars Conjectandi – law of large numbers

SLIDE 6


The Bernoulli family

Johann: calculus and its application to physical problems

SLIDE 7


The Bernoulli family

Daniel: fluid mechanics, St Petersburg paradox

SLIDE 8


The Bernoulli family

Hesse: The Glass Bead Game

SLIDE 9

The Bernoullis Inverse problems

Principia (1687)


SLIDE 10


Central force motion

In edition 1 of the Principia (1687), Newton solved the direct problem of central forces: he proved that if a body follows a curve that is a conic section, it must be subject to an inverse-square law of force towards a focus of the curve.

(This is proved by an essentially geometric argument, starting from a proof that Kepler's area law holds if and only if the force is central.)


SLIDE 11


The inverse problem of central forces

Newton also asserts the converse, the solution to the inverse problem of central forces: a body subject to an inverse-square law of force towards a fixed point traverses a conic section with focus at that point. But his 1687 explanation was incomplete!


SLIDE 12


The inverse problem of central forces

In 1709, in time for the second edition, he had filled in the details – with an argument based on uniqueness.


SLIDE 13


The inverse problem of central forces

Johann Bernoulli took Leibniz’s side in the notorious controversy with Newton over priority in the calculus – and more importantly on methods of proof, the role of models, and the relationship between calculus and dynamics. By the time of Newton’s second edition, Johann had published (1710) two different and complete proofs. Newton’s corollary was “a supposition not a demonstration”.



SLIDE 15

Inverse problems from a Bayesian perspective Basic formulation

Linear inverse problem with exact data

In modern language, the linear inverse problem is to solve y = Ax for x given data y. Typically the problem is ill-posed: the columns of A are not linearly independent, so given y there are infinitely many solutions. To specify a particular solution, it is common to take a minimum-norm approach: x† = argmin_{Ax=y} ||x − x0||_B. Conventionally x0 = 0 and B = I; then x† = A†y, where A† is the Moore–Penrose inverse of A.
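As a concrete illustration (not from the slides), the minimum-norm solution with x0 = 0 and B = I can be computed with NumPy's pseudoinverse; the matrix here is arbitrary:

```python
import numpy as np

# Hypothetical underdetermined system: more unknowns than equations,
# so Ax = y has infinitely many solutions.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 5))
x_true = rng.normal(size=5)
y = A @ x_true                      # exact data

# Minimum-norm solution x† = A† y via the Moore-Penrose inverse.
x_dagger = np.linalg.pinv(A) @ y

# x† fits the data exactly, and no solution of Ax = y has smaller norm:
# x_true itself is a solution, so ||x†|| <= ||x_true||.
```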



SLIDE 17


Linear inverse problem with noise

More generally, to specify a particular solution: x⋆ = argmin_{Ax=y} pen(x). In practice we observe y with error; a statistical approach assumes a distribution p(y|x), giving x̂ = argmin{− log p(y|x) + ν pen(x)}. This can be interpreted as a MAP (maximum a posteriori) estimate under the posterior distribution p(x|y) = p(y|x)p(x)/p(y) with prior density ∝ exp{−ν pen(x)}. A full Bayesian view also allows assessment of uncertainty.
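For Gaussian noise with pen(x) = ||x||², the MAP estimate has a closed form; a minimal sketch (the quadratic penalty and noise level here are illustrative choices, not the Poisson model used later for SPECT):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(40, 60))                  # ill-posed: 60 unknowns, 40 data
x_true = rng.normal(size=60)
y = A @ x_true + 0.1 * rng.normal(size=40)     # observe y with error

# With Gaussian noise and pen(x) = ||x||^2, the MAP estimate
#   argmin { ||y - Ax||^2 + nu ||x||^2 }
# is the Tikhonov/ridge solution (A'A + nu I)^{-1} A'y,
# i.e. the posterior mode under the prior x ~ N(0, I/nu).
nu = 0.5
x_map = np.linalg.solve(A.T @ A + nu * np.eye(60), A.T @ y)
```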



SLIDE 19


Why take a Bayesian view?

Imposing smoothness or other 'regular' behaviour on the solution to an inverse problem is based on a prior assumption about the unknown x, known or assumed before the data are collected – using it accepts that the required solution combines the data with this prior information. Formulating a statistical inverse problem as one of inference in a Bayesian model has great appeal, notably for what this brings in terms of coherence, the interpretability of regularisation penalties, and the integration of all uncertainties, and because it supports principled elaboration of the model (e.g. allowing measurement error or indirect observation).



SLIDE 21


Why take a Bayesian view?

The Bayesian formulation comes close to the way that most scientists intuitively regard the inferential task, and in principle allows the free use of subject knowledge in probabilistic model building. Mathematical analysis of inverse problems is usually focussed on how well the true solution can be recovered, in the presence of noise – in a Bayesian analysis we correspondingly focus on the behaviour of the posterior distribution.



SLIDE 23

Inverse problems from a Bayesian perspective Applications

Cell microscopy & segmentation

Al-Awadhi, Jennison and Hurn


SLIDE 24


Electrical impedance tomography

Experimental set-up, MAP reconstruction, posterior mean and variance, under a circular-inclusions prior

Colin Fox and Geoff Nicholls


SLIDE 25


Deconvolution of Chandra X-ray images

Overlay of the Hubble optical image first with the raw Chandra data and second with the posterior mean reconstruction, highlighting black holes. van Dyk


SLIDE 26


Remote sensing

Josiane Zerubia, Xavier Descombes, C. Lacoste, M. Ortner, R. Stoica


SLIDE 27


Ocean circulation from tracer data

Posterior mean for the advection field, superimposed on reconstructed oxygen and salinity concentrations (left), compared to interpolated data.

McKeague, Nicholls, Speer and Herbei, J. Marine Research, 2005


SLIDE 28

Inverse problems from a Bayesian perspective Theory

Some recent themes in inverse problems theory

ℓ1 penalisation for sparse high-dimensional linear inverse problems: Candès & Tao (2007), Bickel, Ritov & Tsybakov (2009)

Cavalier & Tsybakov (2002) – sharp adaptation

Hofinger & Pikkarainen (2007, 2009) – convergence of the posterior distribution in linear inverse problems, via the Ky Fan metric

Lasanen (2007, 2012) – Bayesian formalism for infinite-dimensional inverse problems (more generally, on Banach spaces)

Knapik, van der Vaart & van Zanten (2011) – Bayesian inverse problems with Gaussian priors; recovery of initial conditions for the heat equation (today at 3pm!)

Pokern, Stuart & van Zanten (2012) – drift estimation for SDEs

Agapiou, Larsson & Stuart (2012) – Bayesian nonparametric linear inverse problems in a separable Hilbert space


SLIDE 29

Single-photon emission computed tomography Principles

SPECT


SLIDE 30


SPECT

SPECT is a medical imaging technique for creating a 3-dimensional visualisation of the pattern of concentration of a tracer of interest within the human body. As such, it images 'function', not 'form'. It is complementary to other techniques such as CT and MRI/fMRI; it is useful for specific studies, and comparatively very cheap. The patient is injected with/ingests/inhales a tracer whose pattern of uptake within body tissue has a known relationship to physiological function. The tracer has been radioactively labelled. Some of the photons subsequently emitted are detected in a gamma camera for subsequent processing and interpretation.



SLIDE 32


SPECT

Reproduced from the Journal of the American Statistical Association, March 1997, p. 50:

[Figure: gamma camera schematic – scintillation crystal, photomultiplier tubes, position/pulse-height module, collimator, patient's head]

Figure 1. Gamma Camera and Photon Interactions. (1) A photon rejected by the collimator; (2) a direct count; (3) a scattered undetected count; (4) an absorbed photon; (5) a scattered detected count; (6) an undetected count.

The {ats} depend on various factors: the geometry of the detection system, the strength of the isotope and exposure time, and the extent of attenuation and scattering between source and detector. Weir and Green (1994) discussed and compared two methods of constructing these weights; the first method uses simple geometric and physical arguments, and the second is an extension of the empirical model of Geman, Manbeck, and McClure (1991). The first method is found to be preferable and is used here. The weight modeling aims to be sufficiently accurate for reasonable reconstruction of x, without the need for auxiliary transmission experiments. The values involve only known physical constants and easily measured dimensions, and they do not depend on the data obtained during the scanning of the patient. It is assumed that the rate of elimination of the radiopharmaceutical from the organ is negligible within the duration of the data acquisition and that scattering can be neglected. Each ats term is assumed to be the product of three factors:

1. the proportion of radioactivity that has not decayed by the time at which photons are collected in detector t;
2. the proportion of emissions that survive attenuation;
3. the solid angle of view of the center of pixel s into detector t, which is treated as a cylindrical tube of known length and radius.

Attenuation is assumed to be at a constant rate per unit distance within an idealised elliptical boundary representing the patient's body. The dimensions of the ellipse can be easily measured from the patient, or estimated from a trial reconstruction from the data, presuming that the body outline can be readily distinguished. In my experience, five EM iterations are sufficient to determine the approximate center and length of axes of the ellipse. The joint probability function of the data given the isotope concentration is given by

p(y|x) = ∏_t exp(−Σ_s ats xs)(Σ_s ats xs)^{yt} / yt!    (1)

Shepp and Vardi (1982) proposed using the EM algorithm for obtaining its maximum likelihood estimator (MLE) solution. However, MLE solutions have the unattractive feature of being very noisy in appearance, because the reconstruction problem is ill-posed. To overcome this, Vardi, Shepp, and Kaufman (1985) suggested using a slightly smoothed version of the MLE, or running a limited number of EM iterations from a uniform start. With the latter approach, the progression from smooth to noisy can be stopped at a point that yields a satisfactory reconstruction.

Bayesian reconstruction from SPECT data was proposed by Geman and McClure (1985), following the general paradigm of Bayesian image analysis introduced by Besag (1986) and Geman and Geman (1984). This involves constructing a probability distribution on the space of true patterns of isotope concentration. Geman and McClure proposed a pairwise-difference Gibbs prior with a potential function that incorporates a smoothing and scale parameter. Reconstruction can then be accomplished by considering the posterior distribution that follows from Bayes's theorem on the two model components. Various reconstruction methods have been used. Geman and McClure [1985, 1987, 1991 (with Manbeck)] used iterated conditional modes (ICM) and Markov chain Monte Carlo (MCMC) methods, and Green (1990) proposed the one-step-late (OSL) algorithm as an adaptation of the EM algorithm, which is impractical for posterior maximization, and MCMC methods (1996).

Regardless of the method used for reconstruction, a Bayesian method could be fatally misleading with a bad choice of prior. A simple approach to choosing appropriate prior parameter values is by experimentation using training data sets. In earlier work (Weir 1993), I used simulated training sets from a suitably designed phantom and found the transition from simulated to real data effortless. A case can be made for using fixed parameters in clinical practice.

[Figure: cartoon of gamma camera geometry and camera bins]
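The ML-EM iteration of Shepp and Vardi mentioned above has a compact multiplicative form; a toy sketch with a made-up system matrix (dimensions and weights are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.uniform(0.1, 1.0, size=(30, 10))   # toy a_ts weights, all positive
x_true = rng.uniform(1.0, 5.0, size=10)
y = rng.poisson(A @ x_true)                # Poisson counts, mean sum_s a_ts x_s

# ML-EM (Shepp & Vardi 1982): multiplicative update from a uniform start.
# Stopping early gives the smooth-to-noisy progression described above.
x = np.ones(10)
sens = A.sum(axis=0)                       # sum over t of a_ts
for _ in range(50):
    ratio = y / (A @ x)                    # y_t / (sum_s a_ts x_s)
    x = x / sens * (A.T @ ratio)           # preserves nonnegativity
```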


SLIDE 33


Cartoon of SPECT data acquisition

Sinogram: raw data from a single slice, SPECT scan of pelvis



SLIDE 37

Single-photon emission computed tomography Modelling

Statistical modelling of SPECT

The Poisson linear model y ∼ Poisson(Ax) (componentwise, independently) is close to reality (there are some dead-time effects and other artifacts in recording).

x represents the spatial distribution of the isotope in question, typically discretised on a square/cubic grid, x = {xj};

y is the array of detected photons, also discretised by the recording process, y = {yi};

the array A = (aij) – a discrete Radon transform – quantifies the emission, transmission, attenuation, decay and recording process; aij is the mean number of photons recorded at i per unit concentration at pixel/voxel j.
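The model is simple to simulate; a minimal sketch with a hypothetical 2-pixel, 3-detector geometry (the aij values are made up):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical system matrix: a_ij = mean photons recorded at detector i
# per unit isotope concentration in pixel j.
A = np.array([[0.8, 0.1],
              [0.4, 0.4],
              [0.1, 0.7]])
x = np.array([50.0, 30.0])     # isotope concentration per pixel

# Componentwise independent Poisson counts with mean Ax.
y = rng.poisson(A @ x)
```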


SLIDE 38


Bayesian analysis of SPECT

The reconstruction problem is to infer the pattern of concentration x from the array of detected photon counts y: a statistical inverse problem, linear, but with (approximately) Poisson variation.

Earlier work – Geman & McClure (Proc ASA, 1985); PJG (JRSSB, 1990; IEEE-TMI, 1990); PJG & Weir (J. Appl. Stat., 1994); Weir (JASA, 1997) – discussed:

modelling of SPECT, particularly building the matrix A from physical, geometric and data-analytic considerations;

Bayesian methods of reconstruction, using both EM- and sampling-based computational methods;

assessment on simulated data and real gamma-camera images of both physical phantoms and actual patients.



SLIDE 40


Pelvis data

Sinogram: raw data from a single slice, SPECT scan of pelvis

[Sinogram image; axes: detector vs. angle]

SLIDE 41


Ill-posed character of SPECT

Matrix A: single slice, SPECT scan of pelvis


SLIDE 42


Eigenvalues

Single slice, SPECT scan of pelvis. Small-scale problem: 24 × 24 body space, 26 × 32 detector space, Gaussian approximation.

[Plot: log10(eigenvalues), falling from about 1 to below −3 across the ~500 eigenvalue indices]
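This rapid eigenvalue decay is what makes naive inversion hopeless; a generic illustration with a smoothing-type kernel matrix (a stand-in, not the actual SPECT matrix A):

```python
import numpy as np

# A Gaussian blurring matrix is a standard example of an ill-conditioned
# forward operator: its singular values decay over many orders of magnitude.
n = 64
t = np.arange(n)
A = np.exp(-((t[:, None] - t[None, :]) / 4.0) ** 2)

s = np.linalg.svd(A, compute_uv=False)
# The largest-to-smallest singular value ratio spans many decades, so tiny
# perturbations of y are hugely amplified in naive inversion of Ax = y.
print("singular value range:", s[0], s[-1])
```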


SLIDE 43

Single-photon emission computed tomography Reconstruction

Bayesian SPECT reconstruction

In practical work, we have typically used non-Gaussian pairwise-interaction Markov random field priors:

p(x) ∝ exp{ −βδ(1 + δ) Σ_{s∼s′} log cosh((xs − xs′)/δ) }

This has attractive properties:

it is log-convex;

it penalises less those reconstructions x with physically realistic abrupt boundaries (e.g. between tissue types), which are smoothed over with Gaussian priors;

it bridges the Gaussian (δ → ∞) and Laplace (δ → 0) MRFs;

and β and δ can also be inferred.
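The bridging behaviour is easy to check numerically; a sketch (the stable log cosh helper is my addition, not from the slides):

```python
import numpy as np

def log_cosh(z):
    # Numerically stable log cosh: avoids overflow in cosh for large |z|.
    z = np.abs(z)
    return z + np.log1p(np.exp(-2.0 * z)) - np.log(2.0)

def potential(u, beta, delta):
    # Pairwise term beta * delta * (1 + delta) * log cosh(u / delta).
    return beta * delta * (1.0 + delta) * log_cosh(u / delta)

u = 3.0
gaussian_limit = potential(u, 1.0, 1e3)    # delta -> inf: ~ u^2 / 2 = 4.5
laplace_limit = potential(u, 1.0, 1e-3)    # delta -> 0:   ~ |u|    = 3.0
```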


SLIDE 44


Bayesian SPECT reconstruction

Some demonstration reconstructions... approx MLE

SLIDE 45


Bayesian SPECT reconstruction

Some demonstration reconstructions... approx MAP using log cosh prior

OSL iteration 30


SLIDE 46


Bayesian SPECT reconstruction

Some demonstration reconstructions... approx MAP using log cosh prior, finer body-space resolution

OSL iteration 30


SLIDE 47

Single-photon emission computed tomography Current practice

Current practice

Standard clinical gamma-camera systems are still supplied with (non-probabilistic) filtered back-projection algorithms.

Smoothed/penalised EM methods are also now supplied as standard, increasingly used given attenuation information (e.g. from CT).

In PET, EM (in Hudson's 'ordered subsets' variant) is standard, due to improved noise characteristics in low-count regions.

There is interest in 4D problems that incorporate motion or tracer kinetics.

Modelling approaches have a particular advantage over FBP with low counts and/or irregular acquisition protocols.

In all these areas, algorithms rely on OSL to implement penalised/Bayesian methods. See Qi & Leahy, 2006; Hutton, 2011.
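Green's one-step-late (OSL) idea evaluates the penalty gradient at the current iterate so the EM-style multiplicative update still applies; a toy sketch with a made-up geometry and a simple quadratic roughness penalty (illustrative, not the log cosh prior):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.uniform(0.1, 1.0, size=(30, 10))   # toy system matrix
y = rng.poisson(A @ rng.uniform(1.0, 5.0, size=10))

nu = 0.1                                   # penalty weight
x = np.ones(10)
sens = A.sum(axis=0)
for _ in range(50):
    # One-step-late: the penalty gradient is evaluated at the *current* x,
    # turning penalised EM into a simple modification of the denominator.
    grad_pen = 2.0 * (2.0 * x - np.roll(x, 1) - np.roll(x, -1))
    ratio = y / (A @ x)
    x = x / (sens + nu * grad_pen) * (A.T @ ratio)
```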



SLIDE 50


SPECT-CT

Buck et al, 2008.


SLIDE 51


SPECT-MRI phantom study

Hutton, 2011.


SLIDE 52

Concentration and approximation of posterior Framework

Theoretical analysis of SPECT

This section is joint work with Natalia Bochkina (Edinburgh).

There is plenty of empirical evidence of the value of taking a Bayesian approach to inverse problems in emission tomography – what mathematical statements can we make about the results? Our framework includes:

studying frequentist properties of a Bayesian method;

the likelihood part of the model assumed to be true (cf. the nonparametric analysis of van der Vaart etc.);

treating the prior as the construct of the analyst, not a statement about nature;

a treatment that addresses non-Gaussianity, non-regularity and constraints.



SLIDE 55


Theoretical analysis of SPECT

Using SPECT reconstruction as an example for an asymptotic study, there are three directions in which it is very natural in practical terms to go to a limit:

exposure time becomes large;

resolution of the data becomes finer;

resolution of the reconstruction becomes finer.

We will concentrate on the first of these – thus we hold the dimensions of data and reconstruction fixed, but allow relative noise levels to decrease towards 0. If the exposure time is extended by a factor T, the model becomes Ty ∼ Poisson(TAx), T → ∞.
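The effect of the exposure factor T is the usual 1/√T concentration of Poisson rate estimates; a quick numerical check (the rate value is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
rate = 20.0                          # one component of the mean Ax

# counts / T estimates the rate; its standard deviation is sqrt(rate / T),
# so relative noise decreases like 1 / sqrt(T) as exposure grows.
stds = []
for T in (1, 100, 10000):
    counts = rng.poisson(T * rate, size=20000)
    stds.append((counts / T).std())
```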



slide-58
SLIDE 58

Concentration and approximation of posterior Framework

Small-variance SPECT

Our model for SPECT is an example of an ill-posed generalised linear inverse problem – p(y|x) depends on x only through the linear predictor Ax, where A has (numerically) linearly dependent columns. We study inference for x given observed y, in the limit as a noise parameter σ² (here, 1/T) goes to 0. We assume an ‘identity link function’, so that y becomes concentrated on Ax as σ² → 0. Because of the ill-posed/ill-conditioned character of the problem, we cannot expect consistency in inference about x based on the likelihood alone. Even as σ² → 0, so that y converges to ‘exact data’ yexact = Axtrue, we will not be able to distinguish between the elements of {x : Ax = Axtrue}.

Peter Green (Bristol/UTS) Bayesian inverse problems Istanbul, July 2012 39 / 70
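This non-identifiability under the likelihood alone can be shown numerically. In the sketch below (the matrix and vectors are hypothetical), the third column of A is the sum of the first two, so A has a non-trivial null space and two genuinely different activity patterns produce exactly the same ‘exact data’ Ax.

```python
import numpy as np

# Invented ill-posed system: column 3 of A equals column 1 + column 2,
# so A has a non-trivial null space and x is not identifiable from Ax alone.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
x_true = np.array([1.0, 2.0, 0.5])

null_dir = np.array([1.0, 1.0, -1.0])        # A @ null_dir == 0
x_alt = x_true + 0.3 * null_dir              # a genuinely different activity map

# Both produce exactly the same 'exact data' y_exact = A x_true
assert np.allclose(A @ x_true, A @ x_alt)
print(A @ x_true, A @ x_alt)
```

No amount of data distinguishes x_true from x_alt; this is exactly the ambiguity the prior must resolve.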


slide-61
SLIDE 61

Concentration and approximation of posterior Framework

Model formulation: GLIP

Rather than restricting to the SPECT model, our theoretical analysis is set in a framework of generalised linear inverse problems: specifically,
p(y|x) = cy(σ²) exp(−fy(Ax)/σ²)
for a given n × p matrix A and appropriate functions fy and cy. There is a continuous bijective link function G, and we write G(yexact) = Axtrue. We suppose the data are generated from this distribution, with x = xtrue, and attempt to recover xtrue as the dispersion parameter σ² → 0. We assume that for all µ such that G(µ) ∈ AX:
if y has the above distribution with Ax = G(µ), then y → µ in probability as σ → 0;
fµ(η) has a unique minimum over η ∈ AX at η = G(µ).

Peter Green (Bristol/UTS) Bayesian inverse problems Istanbul, July 2012 40 / 70


slide-63
SLIDE 63

Concentration and approximation of posterior Framework

Prior modelling

One of the roles of the prior in the Bayesian approach is to resolve this ambiguity (as well as generally to improve reconstruction through ‘regularisation’, even without σ² → 0). Our results apply for a wider class of suitably smooth priors, but for this talk we simply assume
p(x) ∝ exp(−‖x − x0‖²_B/(2γ²)) = exp(−(x − x0)ᵀB(x − x0)/(2γ²)),
subject to x ∈ X = {x ∈ Rᵖ : xj ≥ 0 ∀j}. Thus the posterior is proportional to p(y|x) × exp(−‖x − x0‖²_B/(2γ²)), subject to x ∈ X.

Peter Green (Bristol/UTS) Bayesian inverse problems Istanbul, July 2012 41 / 70


slide-66
SLIDE 66

Concentration and approximation of posterior Framework

Small-variance SPECT

The posterior is proportional to p(y|x) × exp(−‖x − x0‖²_B/(2γ²)), subject to x ∈ X. We study this in the σ → 0 limit; to obtain convergence to a point, we will need γ → 0 as well (though, as we will see, at a slower rate than σ). In particular, we will show that as σ → 0 and γ → 0 in such a way that σ/γ → 0, the posterior converges to the point
x⋆ = argmin{‖x − x0‖²_B : x ∈ X, Ax = yexact}.

Peter Green (Bristol/UTS) Bayesian inverse problems Istanbul, July 2012 42 / 70


slide-69
SLIDE 69

Concentration and approximation of posterior Geometry

Basic geometry

Suppose that A is a real n × p matrix, and B a real symmetric non-negative definite p × p matrix, both possibly of deficient rank. Assume that the p × (n + p) block matrix [Aᵀ ⋮ B] has full rank p. Then
x⋆ = argmin{‖x − x0‖²_B : x ∈ X, Ax = yexact}
is unique, and if σ → 0, γ → 0 and σ/γ → 0, then approaching the limit the posterior is a truncated Gaussian, with variance scaling differently in different directions. If q is the rank of A, then asymptotically the variance of the posterior distribution (before truncation) has q eigenvalues scaling like σ² and the remaining (p − q) like the (larger) γ².

Peter Green (Bristol/UTS) Bayesian inverse problems Istanbul, July 2012 43 / 70
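The two-speed eigenvalue scaling can be checked directly in the Gaussian special case, where (before truncation) the posterior precision is AᵀA/σ² + B/γ². The sketch below uses an invented rank-q matrix A with B = I; because B = I shares eigenvectors with AᵀA, q covariance eigenvalues are of order σ² and the remaining p − q are essentially equal to γ².

```python
import numpy as np

# Invented instance: p = 4 pixels, rank q = 2 design, prior precision B = I
A = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0, 0.0]])
p, q = 4, 2
B = np.eye(p)

for sigma, gamma in [(1e-2, 1e-1), (1e-3, 1e-2)]:
    # Gaussian-case posterior precision (before truncation): A^T A/s^2 + B/g^2
    H = A.T @ A / sigma**2 + B / gamma**2
    cov_eigs = np.sort(1.0 / np.linalg.eigvalsh(H))
    # q small eigenvalues of order sigma^2; the p - q others are ~gamma^2
    print(sigma, gamma, cov_eigs)
```

The q small eigenvalues track the data (likelihood) scale σ², while the (p − q) null directions of A are controlled only by the prior scale γ².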


slide-71
SLIDE 71

Concentration and approximation of posterior Geometry

Geometry

Visualisation of posterior when n = 1, p = 2, q = 1.

Peter Green (Bristol/UTS) Bayesian inverse problems Istanbul, July 2012 44 / 70


slide-73
SLIDE 73

Concentration and approximation of posterior Geometry

Geometry

Visualisation of posterior when n = 1, p = 3, q = 1. In this case x⋆ lies internal to X.

Peter Green (Bristol/UTS) Bayesian inverse problems Istanbul, July 2012 45 / 70


slide-75
SLIDE 75

Concentration and approximation of posterior Geometry

Geometry

Visualisation of posterior when n = 1, p = 3, q = 1. In this case x⋆ lies internal to X. σ and γ getting smaller.

Peter Green (Bristol/UTS) Bayesian inverse problems Istanbul, July 2012 46 / 70

slide-76
SLIDE 76

Concentration and approximation of posterior Geometry

Geometry

Visualisation of posterior when n = 1, p = 3, q = 1. In this case x⋆ lies on the boundary of X.

Peter Green (Bristol/UTS) Bayesian inverse problems Istanbul, July 2012 47 / 70


slide-78
SLIDE 78

Concentration and approximation of posterior Geometry

Geometry

Visualisation of posterior when n = 1, p = 3, q = 1. In this case x⋆ lies on the boundary of X. σ and γ getting smaller.

Peter Green (Bristol/UTS) Bayesian inverse problems Istanbul, July 2012 48 / 70

slide-79
SLIDE 79

Concentration and approximation of posterior Geometry

Karush–Kuhn–Tucker

We are interested in the solution to the constrained minimisation problem
x⋆ = argmin{‖x − x0‖²_B : x ∈ X, Ax = yexact}.
By the Karush–Kuhn–Tucker theory, for this particular problem it is necessary and sufficient to find (x⋆, ζ, λ) ∈ Rᵖ × Rᵖ × Rⁿ such that
B(x⋆ − x0) − ζ + Aᵀλ = 0, x⋆ ≥ 0, Ax⋆ = yexact, ζ ≥ 0, and for all j, ζj = 0 or x⋆j = 0.

Peter Green (Bristol/UTS) Bayesian inverse problems Istanbul, July 2012 49 / 70
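The KKT conditions can be verified numerically on a toy instance. The sketch below (hypothetical A, yexact and x0, with B = I) solves the constrained projection with scipy, then recovers candidate multipliers (ζ, λ) from the stationarity equation for the squared objective and checks feasibility and complementary slackness.

```python
import numpy as np
from scipy.optimize import minimize

# Invented instance with B = I: project x0 onto {x >= 0, Ax = y_exact}
A = np.array([[1.0, 1.0, 2.0]])
y_exact = np.array([2.0])
x0 = np.array([-1.0, 0.5, 0.5])              # chosen so one positivity bound binds

res = minimize(lambda x: np.sum((x - x0) ** 2),
               x0=np.ones(3),
               jac=lambda x: 2.0 * (x - x0),
               bounds=[(0.0, None)] * 3,
               constraints=[{"type": "eq", "fun": lambda x: A @ x - y_exact}],
               method="SLSQP")
x_star = res.x

# Stationarity for this objective: 2(x* - x0) - zeta + A^T lam = 0.
# On the inactive set (x*_j > 0), zeta_j = 0, so solve for lam there.
free = x_star > 1e-8
lam, *_ = np.linalg.lstsq(A.T[free], -2.0 * (x_star - x0)[free], rcond=None)
zeta = 2.0 * (x_star - x0) + A.T @ lam

assert np.allclose(A @ x_star, y_exact, atol=1e-6)   # primal feasibility
assert np.all(x_star >= -1e-9) and np.all(zeta >= -1e-5)
assert np.all(np.abs(zeta * x_star) < 1e-5)          # complementary slackness
print(x_star, zeta, lam)
```

For this instance the first positivity constraint is active (x⋆₁ = 0 with ζ₁ > 0), illustrating the boundary case discussed in the geometry slides.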

slide-80
SLIDE 80

Concentration and approximation of posterior Geometry

Karush–Kuhn–Tucker

Geometrical interpretation of KKT conditions when n = 1, p = 2, q = 1, in case B = I.

Peter Green (Bristol/UTS) Bayesian inverse problems Istanbul, July 2012 50 / 70


slide-82
SLIDE 82

Concentration and approximation of posterior Concentration of posterior

Metrics and convergence

To study convergence of the posterior simultaneously over all y = y(ω), we need a metric that metrises convergence in probability. The Ky Fan metric between two random variables ξ1 and ξ2 in a metric space (Y, dY) is ρK(ξ1, ξ2) = inf{ε > 0 : P{dY(ξ1(ω), ξ2(ω)) > ε} < ε}. Weak convergence of the posterior µpost = p(x ∈ · | y(ω)), as a random variable, to the point mass δx⋆ is equivalent to its convergence in the Ky Fan metric, where (Y, dY) is a space of distributions with the Prohorov metric.

Peter Green (Bristol/UTS) Bayesian inverse problems Istanbul, July 2012 51 / 70
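The Ky Fan metric has a direct empirical analogue: given Monte Carlo samples of the distance d(ξ1, ξ2), it is (approximately) the smallest ε with P(d > ε) < ε. A minimal sketch, using an illustrative Gaussian example rather than anything from the SPECT model:

```python
import numpy as np

def ky_fan(dist_samples):
    """Empirical Ky Fan distance: the smallest eps with P(d > eps) < eps,
    estimated from Monte Carlo samples of the distance d."""
    d = np.sort(dist_samples)
    n = d.size
    for i, eps in enumerate(d):
        if (n - i - 1) / n < eps:    # empirical P(d > eps) at eps = d[i]
            return eps
    return d[-1]

rng = np.random.default_rng(2)
# Illustrative example: d(Y, mu) = |Y - mu| with Y ~ N(mu, sigma^2)
for sigma in [0.5, 0.1, 0.01]:
    dist = np.abs(rng.normal(0.0, sigma, size=100_000))
    print(sigma, ky_fan(dist))       # decreases as sigma -> 0
```

The estimate shrinks with σ but more slowly than σ itself, reflecting the logarithmic factors in the bounds on the next slides.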

slide-83
SLIDE 83

Concentration and approximation of posterior Concentration of posterior

Concentration

We are able to give explicit upper bounds on ρK(µpost, δx⋆), depending on the spectral properties of the asymptotic information matrix and prior precision at x⋆, on ρK(y, yexact), and on the likelihood and prior dispersion parameters σ² and γ². For example, in the interior-point case, under conditions,
ρK(µpost, δx⋆) ≤ max{ 2ρK(y, yexact), c1ρK(y, yexact) + c2ν + √( (τ/λmin(Hν)) log(CP λmin(Hν)/τ) ) κP^(1/2) (1 + ∆⋆,K(δ)) }.

Peter Green (Bristol/UTS) Bayesian inverse problems Istanbul, July 2012 52 / 70


slide-85
SLIDE 85

Concentration and approximation of posterior Concentration of posterior

Concentration

Ky Fan metrics can be readily evaluated for specific distributions.
If Yt ∼ σ²Poisson(µt/σ²), then ρK(Y, µ) ∼ √(−σ²M log(σ²M)) as σ² → 0, where M = Σt µt.
If Y ∼ Np(µ, Σ), then there exist constants Cp, θp such that for any Σ with ‖Σ‖ < θp, ρK(Y, µ) ≤ √(−‖Σ‖ log{Cp‖Σ‖^max{1,p−2}}).
If Y ∼ Exp(λ/σ²), then ρK(Y, 0) ∼ −(σ²/λ) log(σ²/λ) (and is always ≤) as σ → 0.

Peter Green (Bristol/UTS) Bayesian inverse problems Istanbul, July 2012 53 / 70

slide-86
SLIDE 86

Concentration and approximation of posterior Concentration of posterior

Concentration in the Bayesian SPECT model

If x⋆ is an interior point of X, then for small enough σ² and γ²,
ρK(µpost, δx⋆) ≤ [ C1√(−σ² log σ) + C2σ²/γ² + C3σ^(1−α)γ^α √(−log(σ^(1−α)γ^α)) ] (1 + o(1)),
where α = 0 if Aᵀ∇²f̃yexact(Ax⋆)A is of full rank, and α = 1 otherwise.
If α = 0 (well-posed case), the fastest rate is σ√(−log σ), with γ ≥ σ^(1/2)[−log σ]^(−1/4).
If α = 1 (ill-posed case), the fastest rate is σ^(2/3)√(−log σ), with γ = σ^(2/3)[−log σ]^(−1/6).
If x⋆ is on the boundary of X, there are additional terms in ρK(µpost, δx⋆).

Peter Green (Bristol/UTS) Bayesian inverse problems Istanbul, July 2012 54 / 70


slide-89
SLIDE 89

Concentration and approximation of posterior Concentration of posterior

Geometry

Visualisation of posterior when n = 1, p = 3, q = 1. In this case x⋆ lies internal to X.

Peter Green (Bristol/UTS) Bayesian inverse problems Istanbul, July 2012 55 / 70

slide-90
SLIDE 90

Concentration and approximation of posterior Concentration of posterior

(Lack of) consistency

In summary, the posterior concentrates at the point x⋆ = argmin{‖x − x0‖²_B : x ≥ 0, Ax = yexact} – but this is usually not equal to the truth xtrue. We do have Ax⋆ = Axtrue, but (I − P_Aᵀ)x⋆ is determined solely by the prior and the constraints.

Peter Green (Bristol/UTS) Bayesian inverse problems Istanbul, July 2012 56 / 70

slide-91
SLIDE 91

Concentration and approximation of posterior Approximation of posterior

Bernstein–von Mises

Can we quantify probabilistically the variation of p(x|y) about the concentration point x⋆? More concretely, can x − x⋆ be (transformed and) scaled so that it has a non-trivial limit posterior law? Such questions are traditionally the subject of the Bernstein–von Mises (BvM) theorem, in various versions. Here we need a version of the BvM theorem that deals, as well as singularity of the prior, with the non-regular aspects of our set-up, namely positivity constraints ill-posed-ness of A (so x not identifiable) likelihood non-regularity, e.g. from the possibility of Poisson counts of 0

Peter Green (Bristol/UTS) Bayesian inverse problems Istanbul, July 2012 57 / 70


slide-93
SLIDE 93

Concentration and approximation of posterior Approximation of posterior

BvM: transformation of coordinates

Recall that the point of concentration of the posterior is x⋆ = argmin{‖x − x0‖²_B : x ≥ 0, Ax = yexact}. Let J = {j : x⋆j = 0}. When we scale x − x⋆ to get a non-degenerate limit as σ, γ → 0, the support of the scaled posterior is XJ = {x : xJ ≥ 0}. We can decompose
XJ = W0 ⊕ W1 ⊕ W2⁺ ⊕ W3⁺
to focus on the different characters of the limit in different directions: different scalings and different shapes.

Peter Green (Bristol/UTS) Bayesian inverse problems Istanbul, July 2012 58 / 70

slide-94
SLIDE 94

Concentration and approximation of posterior Approximation of posterior

BvM: transformation of coordinates

We can decompose XJ = W0 ⊕ W1 ⊕ W2⁺ ⊕ W3⁺. The four subsets can be characterised as follows:

W0: subspace, dimension p0, identifiable, projection of x⋆ interior, scaling σ
W1: subspace, dimension p1, not identifiable, projection interior, scaling γ
W2⁺: polyhedral cone, dimension p2, identifiable, projection on boundary, scaling σ²
W3⁺: polyhedral cone, dimension p3, not identifiable, projection on boundary, scaling γ²

Any of the {pk} can be 0, depending on the details of the problem, but p0 + p1 + p2 + p3 = p.

Peter Green (Bristol/UTS) Bayesian inverse problems Istanbul, July 2012 59 / 70

slide-95
SLIDE 95

Concentration and approximation of posterior Approximation of posterior

BvM: transformation of coordinates

W0 is the whole space when the problem is regular
W1 appears in an ill-posed linear Gaussian problem
W3⁺ arises from the positivity constraints on x⋆ being active
W2⁺ arises with an irregular likelihood (zero Poisson counts)

Peter Green (Bristol/UTS) Bayesian inverse problems Istanbul, July 2012 60 / 70

slide-97
SLIDE 97

Concentration and approximation of posterior Approximation of posterior

BvM theorem for non-regular GLIP

Visualisation of a case with n = 1, p = 3, in which x⋆1, x⋆2 > 0, x⋆3 = 0.

Peter Green (Bristol/UTS) Bayesian inverse problems Istanbul, July 2012 61 / 70

slide-99
SLIDE 99

Concentration and approximation of posterior Approximation of posterior

BvM theorem for non-regular GLIP

Visualisation of a case with n = 1, p = 3, in which x⋆1, x⋆2 > 0, x⋆3 = 0. Here p0 = p1 = p3 = 1, p2 = 0.

Peter Green (Bristol/UTS) Bayesian inverse problems Istanbul, July 2012 61 / 70

slide-100
SLIDE 100

Concentration and approximation of posterior Approximation of posterior

BvM theorem for non-regular GLIP

Suppose we fill the p × pk matrix Vk with column vectors forming a basis for Wk (resp. Wk⁺) for each k; we set V = [V0 ⋮ V1 ⋮ V2 ⋮ V3]; we assume σ → 0, γ → 0, σ = o(γ) and σ = O(γ²); and we write
S = [σV0 ⋮ γV1 ⋮ σ²V2 ⋮ γ²V3]⁻¹.
Then
S(x − x⋆) | y → µ⋆ = N(a0, Ω00⁻¹) × N(0, B11⁻¹) × Exp(a2) × Exp(a3)
in total variation norm.
Peter Green (Bristol/UTS) Bayesian inverse problems Istanbul, July 2012 62 / 70
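In the Gaussian special case this rescaling can be checked by simulation. The sketch below is an invented toy instance with n = 1, p = 2, A = [1 0], B = I and no active constraints, so V0 and V1 are the coordinate axes and S = diag(σ, γ)⁻¹: rescaling by S stabilises the posterior spread in both the identifiable (σ-scale) and unidentifiable (γ-scale) directions.

```python
import numpy as np

rng = np.random.default_rng(3)
# Toy Gaussian GLIP: n = 1, p = 2, A = [1 0], B = I, x* interior (invented)
A = np.array([[1.0, 0.0]])
x_star = np.array([1.0, 0.5])

for sigma in [1e-1, 1e-3]:
    gamma = sigma ** 0.5                       # sigma = o(gamma), sigma = O(gamma^2)
    # Gaussian-case posterior precision with no active constraints
    H = A.T @ A / sigma**2 + np.eye(2) / gamma**2
    cov = np.linalg.inv(H)
    xs = rng.multivariate_normal(x_star, cov, size=50_000)
    # rescale by S = diag(sigma, gamma)^(-1): both components become O(1)
    z = (xs - x_star) / np.array([sigma, gamma])
    print(sigma, z.std(axis=0))                # tends to (1, 1) as sigma -> 0
```

Here Ω00 = B11 = 1, so both rescaled components approach standard-normal spread, in line with the limit law above.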


slide-102
SLIDE 102

Concentration and approximation of posterior Approximation of posterior

BvM theorem for non-regular GLIP

The parameters in this limiting distribution are given by:
Ω00 = V0ᵀ∇²fyexact(x⋆)V0,  B11 = V1ᵀBV1,
a2 = V2ᵀ∇fyexact(x⋆),  a3 = V3ᵀζ,
a0(ω) = Ω00⁻¹ lim σ→0 [ (1/σ)V0ᵀ∇fy(ω)(x⋆) + (σ/γ²)V0ᵀB(x⋆ − x0) ].

Peter Green (Bristol/UTS) Bayesian inverse problems Istanbul, July 2012 63 / 70

slide-103
SLIDE 103

Concentration and approximation of posterior Approximation of posterior

Assumptions

We have to exclude degeneracy in the convex optimisation:
for all j, either ζj = 0 or x⋆j = 0, but not both;
for all i, either ∇if̃yexact(yexact) = 0 or yi = 0, but not both;
and also assume that Ω00 and B11 are of full rank; a0(ω) < ∞ for all ω; and γ → 0, σ/γ → 0 as σ → 0, with c = lim σ→0 σ/γ² < ∞.

Peter Green (Bristol/UTS) Bayesian inverse problems Istanbul, July 2012 64 / 70

slide-104
SLIDE 104

Concentration and approximation of posterior Approximation of posterior

Assumptions

We need an assumption about the concentration of the posterior: for some {δk} → 0 with (δ0, δ1, δ2, δ3) ≫ (σ, γ, σ², γ²) respectively,
P{x ∉ B(x⋆, δ) | y} → 0 as σ → 0,
where B(x⋆, δ) = x⋆ + S⁻¹ [B2,0(δ0) × B2,1(δ1) × B∞,2(δ2) × B∞,3(δ3)] and Br,k(δ) = {x ∈ R^pk : ‖x‖r < δ}.
We need continuity at x⋆: y → yexact and ∇ᵏfy(x) → ∇ᵏfyexact(x) in probability, k = 0, 1, 2, 3, as σ → 0; and smoothness: fy and fyexact both have bounded third derivatives on B(x⋆, δ).

Peter Green (Bristol/UTS) Bayesian inverse problems Istanbul, July 2012 65 / 70


slide-106
SLIDE 106

Concentration and approximation of posterior Approximation of posterior

Idea of proof

0. Split the total variation norm into three parts:
‖PS(x−x⋆)|y − µ⋆‖TV ≤ ‖PS(x−x⋆)|y1B − µ⋆1B‖TV + ‖PS(x−x⋆)|y − PS(x−x⋆)|y1B‖TV + ‖µ⋆ − µ⋆1B‖TV,
where 1B represents renormalised truncation onto the rescaled local neighbourhood B(x⋆, δ) of x⋆.
1. Use a quadratic approximation of the log posterior distribution on a neighbourhood of x⋆ to bound ‖PS(x−x⋆)|y1B − µ⋆1B‖TV. On W0 and W1 the leading term is quadratic, and on W2⁺ and W3⁺ the leading term is linear.

Peter Green (Bristol/UTS) Bayesian inverse problems Istanbul, July 2012 66 / 70

slide-107
SLIDE 107

Concentration and approximation of posterior Approximation of posterior

Idea of proof

2. Bound the effect of the local neighbourhood truncation ‖PS(x−x⋆)|y1B − PS(x−x⋆)|y‖TV using the concentration assumption.
3. Note that µ⋆ vanishes outside of the local neighbourhood, since B converges to the whole space.

Peter Green (Bristol/UTS) Bayesian inverse problems Istanbul, July 2012 67 / 70

slide-108
SLIDE 108

Concentration and approximation of posterior Approximation of posterior

Practical implications of the BvM approximation

Of course, on a realistic practical scale we cannot hope to construct and manipulate p × p matrices such as V. However, the theorem can be used to calculate approximate posterior probabilities using posterior means and covariances estimated by (e.g.) MCMC.

54 Journal of the American Statistical Association, March 1997 Table 2. Brain Phantom Major MCMC Run: Estimates of Posterior Means, Integrated Auto-Correlation Times, and Standard Errors (SE's)

Parameter Mean

T(f)

M ;(f)/N SE High 188.29 12.392 38 .01377 1.6785 Medium 93.524 8.1245 25 .00903 .8502 Low 41.905 12.653 38 .01406 1.4670 3.6189 x 10-3 6.0526 19 .00673 2.2004 x 10-4 3 2.3384 x 10-2 11.003 34 .01223 9.6926 x 10-5

  • 4f. The

fully Bayesian posterior mean is not as aesthetically pleasing as the posterior mode from the OSL using the fixed "best" prior parameters. The former is slightly noisier and has RMSE value of 6.3670, compared to 5.7003 from the fixed parameter posterior

  • mode. The RMSE value of the

fully Bayesian reconstruction is substantially less than the least possible

  • btainable

by ML/EM (8.2179). The MCMC run took 940.3 seconds

  • n a Sun SPARC

10/41 worksta- tion, compared to 30.0 seconds for OSL. Therefore, the comparative computational cheapness

  • f OSL

when "good" fixed values

  • f the prior

parameters are available discour- ages the use of MCMC to obtain reconstructions. However, in reality the variety

  • f organs

imaged and range

  • f photon

counts recorded may mean that these are not available. Us- ing other parameter values, the OSL algorithm invariably produced grossly oversmoothed

  • r noisy reconstructions.

Thus the automation and quality

  • f the fully Bayesian

re- construction is admirable. Furthermore, the fully Bayesian method also allows

  • ne

to go beyond mere reconstruction, because functionals

  • f

the body space can be easily obtained as a by-product

  • f

the MCMC. For instance, fully Bayesian posterior pixelwise modes could be obtained. These may be useful for examin- ing a specific region

  • f interest.

The OSL fixed-parameter solution would not be capable

  • f this,

as it is a mode for the whole body space. Fully Bayesian posterior medians also

a a

  • c

.

CL~~~~~~~~

0.

5 10 15 20 25 30 5 10 15 20 25 30 pixel pixel

(a) (b)

  • ?
  • ?

0-

a

LO

a LO

Cu

C

..z.

*.....

0.0

  • )

5 10 15 20 25 30 5 10 15 20 25 30 pixel pixel

(c) (d)

Figure

  • 5. Brain

Phantom: (a) Row 25 Truth and Posterior Mean

(

phantom;

  • --, Mean);

(b) Row 25 Truth and Credibility Bounds (---, 90% bounds;

(---, 95%

bounds);

(c) Row 16 Truth

and Posterior Mean

( , Phantom;

  • --, Mean);

(d) Row 16 Truth and Credibility Bounds

(---,

90% bounds; ---, 95% bounds).

can be easily obtained; here the reconstruction composed

  • f pixelwise medians

was found to be very similar in ap- pearance to the posterior mean. Bayesian interval estimators are also easily obtained. Here I report pixelwise quantiles; however, Besag et al. (1995)

  • utlined

the construction

  • f si-

multaneous bounds for the whole vector

  • x. Figure

5 graphs two transects

  • f the brain

phantom that cross distinct re- gions

  • f interest.

The posterior mean and 90% and 95% pix- elwise credibility bounds are superimposed

  • n these plots.

It can be seen that the posterior mean faithfully reproduces the general truth shape, the major departure being overes- timation

  • f the low metabolic

activity region. However, the credibility bounds fully bracket the truth. In general, there may be situations where the credibility bounds do not give full information about the truth. For instance, the ability to successfully reconstruct small area hot spots and edges will be affected by low count/noisy data sets, the sampling bin widths, the reconstruction discretization used, and the

accuracy of the {ats} weight modeling.

We can also use output from the MCMC to get posterior density estimates

  • f the prior

parameters. The density es- timates do not have pathological worth but do give insight into appropriate prior parameter values that could, for in- stance, be used to gain fixed-parameter OSL posterior mode reconstructions.

3.2 Real Data

The data relate to a horizontal slice through a brain and comprise 119,405 photon counts recorded by a system of 64 detectors at 64 angles. The filtered back-projection reconstruction provided by the Bristol Royal Infirmary is on a 64 × 64 grid of 0.48 cm squares. Figure 6a displays the central 48 × 48 pixels. The presence of spurious radial artifacts is a well-documented deficiency of filtered back-projection. Figures 6b and 6c show ML/EM reconstructions after 50 and 100 iterations. These reconstructions define the outline of the brain better, but they are noisier. Figure 6d displays the OSL solution using the prior parameter values chosen for the simulated data and reveals a clear isotope pattern that comprises several well-defined regions. Figures 6e and 6f present the final realization from the fully Bayesian MCMC run and the resulting estimate of the posterior mean. The fully Bayesian posterior mean is similar to the OSL fixed-parameter solution but is not as smooth and suggests greater contrast between several regions. Figure 7 displays the posterior mean and credibility bands of a cross-section; the widths of the credibility bounds are not constant, indicating varying confidence in different parts of the solution.
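The ML/EM reconstructions of Figures 6b and 6c come from the standard multiplicative EM update for Poisson emission data. A toy sketch with an invented small system matrix (real SPECT systems are far larger, and the weights come from the {a_ts} modeling):

```python
import numpy as np

# Illustrative ML/EM (MLEM) iteration for emission tomography.
# A: toy system matrix of (detector-bin x pixel) weights; y: Poisson counts.
rng = np.random.default_rng(2)
n_bins, n_pixels = 32, 16
A = rng.uniform(0.0, 1.0, size=(n_bins, n_pixels))
x_true = rng.gamma(4.0, 0.5, size=n_pixels)
y = rng.poisson(A @ x_true)                      # simulated photon counts

x = np.ones(n_pixels)                            # uniform positive start
sens = A.sum(axis=0)                             # pixel sensitivities, A^T 1
for _ in range(50):
    ratio = y / np.maximum(A @ x, 1e-12)         # data / current projection
    x = x / sens * (A.T @ ratio)                 # multiplicative EM update

# MLEM preserves positivity of the image at every iteration.
assert np.all(x >= 0.0)
```

Each update is multiplicative, so the reconstruction stays nonnegative, and after every sweep the fitted expected counts match the observed total; the noise amplification with increasing iterations is exactly what Figures 6b and 6c illustrate.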

4. DISCUSSION

The work presented here has demonstrated the ability to obtain representative Bayesian reconstructions of SPECT data without the need for prior parameter selection. The process is thus fully automated and can be applied without supervision. The reconstructions are of comparable quality to OSL ones when "good" fixed prior parameter values are used, and better when inappropriate values are used.

Peter Green (Bristol/UTS) Bayesian inverse problems Istanbul, July 2012 68 / 70

slide-109
SLIDE 109

Concentration and approximation of posterior Approximation of posterior

Practical implications of the BvM approximation

Of course, on a realistic practical scale we cannot hope to construct and manipulate p × p matrices such as V. However, the theorem can be used to calculate approximate posterior probabilities using posterior means and covariances estimated by (e.g.) MCMC. Inferential questions of real interest, such as quantitative inference about amounts of radio-labelled tracer within specified regions of interest, or tests of significance of apparent hot- or cold-spots, can be answered using approximate posterior distributions for linear combinations of x. We can use the MAP estimate x̂ = arg max_x p(x|y) to identify x⋆ and hence J; we anticipate p0 ≫ p1, p2, p3 in practical situations.
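A sketch of this idea for one region of interest, assuming only MCMC-estimated moments are available (all sizes, the region indicator w, and the threshold c are illustrative):

```python
import numpy as np
from math import erf, sqrt

# Approximate posterior inference for a linear combination w^T x (e.g. total
# tracer in a region of interest) using only the posterior mean and
# covariance estimated from MCMC, via a normal approximation.
rng = np.random.default_rng(3)
samples = rng.gamma(4.0, 0.5, size=(4000, 25))   # stand-in MCMC draws of x | y

mu = samples.mean(axis=0)                        # estimated posterior mean
Sigma = np.cov(samples, rowvar=False)            # estimated posterior covariance

w = np.zeros(25)
w[:6] = 1.0                                      # indicator of a 6-pixel region
m = w @ mu                                       # approx. posterior mean of w^T x
s = np.sqrt(w @ Sigma @ w)                       # approx. posterior s.d. of w^T x

# Approximate P(w^T x > c) under the normal approximation.
c = 10.0
prob_exceed = 0.5 * (1.0 - erf((c - m) / (s * sqrt(2.0))))
```

Only the p-vector of means and the quadratic form w^T Σ w are ever needed, so the full p × p matrix V never has to be formed explicitly for any single linear combination.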

Peter Green (Bristol/UTS) Bayesian inverse problems Istanbul, July 2012 69 / 70

slide-111
SLIDE 111

Acknowledgements/contact

I am particularly indebted to Natalia Bochkina, and also grateful to my earlier collaborators Iain Weir and Malcolm Hudson, and to Brian Hutton for his update on modern SPECT reconstruction. To follow up, contact P.J.Green@bristol.ac.uk

Peter Green (Bristol/UTS) Bayesian inverse problems Istanbul, July 2012 70 / 70