# Trans-dimensional What if the number of things you dont know is one - PDF document

## SAMSI Stochastic Computation Research Triangle, September 2002 Trans-dimensional Markov chain Monte Carlo Trans-dimensional What if the number of things you dont know is one of the things you dont know? Markov chain Monte Carlo

1. SAMSI Stochastic Computation Research Triangle, September 2002 Trans-dimensional Markov chain Monte Carlo Trans-dimensional What if ‘the number of things you don’t know is one of the things you don’t know’? Markov chain Monte Carlo Ubiquitous in statistical modelling, both by Peter Green (University of Bristol, � in traditional modelling situations such as P.J.Green@bristol.ac.uk variable selection in regression, and http://www.stats.bris.ac.uk/ � peter ). � in more novel methodologies such as object recognition, signal processing, and Bayesian (Thanks to all my collaborators on nonparametrics. trans-dimensional MCMC problems: Carmen Fern´ andez, Paolo Giudici, Miles Harkness, Formulate generically as joint inference about a David Hastie, Matthew Hodgson, Antonietta Mira, model indicator k and a parameter vector � k , where Agostino Nobile, Marco Pievatolo, the model indicator determines the dimension n k of Sylvia Richardson, Luisa Scaccia and the parameter, but this dimension varies from Claudia Tarantola.) model to model. � University of Bristol, 2002 c 1 2 Hierarchical model Suppose given � a prior p � k � over models k in a countable set K , and � for each k Usually in a frequentist setting, inference about – a prior distribution p � � j k � , and these two kinds of unknown is based on different k logical principles. – a likelihood p � Y j k � � � for the data Y . k There may be debate on what to do with it, but the For definiteness and simplicity, suppose that p � � j k � Bayesian needs only the joint posterior p � k � � j Y � . k k n is a density with respect to k -dimensional How should we compute it? Lebesgue measure, and that there are no other parameters, so that where there are parameters common to all models these are subsumed into each n � � R k . k Additional parameters, perhaps in additional layers of a hierarchy, are easily dealt with. Note that all probability distributions are proper. 3 4

2. The joint posterior Note the generality of this basic formulation: it embraces both p � k � p � � j k � p � Y j k � � � k k R p � k � � j Y � � P k � � � � � � p � k � p � � j k � p � Y j k � � � d � � genuine model-choice situations, where the k � �K k � k � k � variable k indexes the collection of discrete can always be factorised as models under consideration, but also p � k � � j Y � � p � k j Y � p � � j k � Y � k k � settings where there is really a single model, but one with a variable dimension parameter, for – the product of posterior model probabilities and example a functional representation such as a model-specific parameter posteriors. series whose number of terms is not fixed (in – very often the basis for reporting the inference, k is unlikely to be of direct which case, and in some of the methods mentioned below is inferential interest). also the basis for computation. 5 6 Compatibility across models Some would argue that responsible adoption of this Bayesian hierarchical model presupposes that, e.g., p � � j k � should be compatible in that inference about k functions of parameters that are meaningful in several models should be approximately invariant Across- and within-model simulation to k . Such compatibility could in principle be exploited Two main approaches: in the construction of MCMC methods (how?). � across : one MCMC simulation with states of the But it is philosophically tenable that no such form � k � � � k compatibility is present, and we shall not assume it. � within : separate simulations of � k . k for each Non-Bayesian uses Trans-dimensional MCMC has many applications other than to Bayesian statistics. Much of what follows will apply equally to them all; however, for simplicity, I shall use the Bayesian motivation and terminology throughout. 7 8

3. Across-model simulation Metropolis-Hastings on a general state space Reversible jump MCMC We wish to construct a Markov chain on a state The state space for an across-model simulation is space X with invariant distribution � . S n f � k � � � g � � f k g � R � . k k k �K As usual in MCMC we will consider only reversible Mathematically, this is not a particularly awkward P satisfies the chains, so the transition kernel object. But at least a little non-standard! detailed balance condition Z Z We use Metropolis-Hastings to build a suitable � � � � � d x � P � x� d x � � � � d x � P � x � d x � reversible chain. On the face of it, this requires measure-theoretic � (both integrals over � x� x � � A � B ), notation, which may be unwelcome! The point of for all Borel sets A� B � X . the ‘reversible jump’ framework is to render the Compare this with measure theory invisible, by means of a construction using only ordinary densities. Even � � � � � x � P � x� x � � � � x � P � x � x � the fact that we are jumping dimensions becomes essentially invisible! 9 10 � Now � � d x � q � x� d x � is dominated by a symmetric measure � on X � X ; let its density (Radon-Nikodym derivative) with respect to this � In Metropolis-Hastings, we make a transition by be f . Then DB requires � from the first drawing a candidate new state x Z � proposal measure q � x� d x � and then accepting it � � � � � x� x � f � x� x � � � d x� d x � � � x�x � � A � B with probability � � x� x � , to be derived below. � Z If we reject, we stay in the current state, so that � � � � � � x � x � f � x � x � � � d x � d x � � P � x� d x � has an atom at x . This contributes the � x�x � � A � B � R same quantity P � x� f x g � � � d x � to each side of � , this is clearly satisfied for Using the symmetry of A � B the DB equation; subtracting this leaves all Borel A� B if Z � � � f � x � x � � � � � d x � q � x� d x � � � x� x � � � � x� x � � min � � � f � x� x � � � x�x � � � A � B Z This might be written more informally in the � � � � � � d x � q � x � d x � � � x � x � � apparently familiar form � x�x � � � A � B � � � � � � d x � q � x � d x � � � � x� x � � min � � � � � d x � q � x� d x � � 11 12

4. A constructive representation in terms of random numbers A to B The equilibrium probability of jumping from Now let’s get rid of this abstraction! is then an integral with respect to � x� u � : Z Consider how the transition will be implemented; � � � x � g � u � � � x� x � d x du� we find the dominating measure and � x�x � h � x�u �� � A � B � Radon-Nikodym derivatives can be generated implicitly. The equilibrium probability of jumping from B to A d , and that � � Assume X � R � has a density (also is an integral with respect to � x � u � : denoted � ) with respect to d –dimensional Lebesgue Z � � � � � measure. � � x � g � u � � � x � x � d x du� � x � h � � x � �u � � �x � � � A � B At the current state x , we generate, say, r random numbers u from a known joint density g , and then � � If the transformation from � x� u � to � x � u � is a form the proposed new state as a deterministic diffeomorphism (the transformation and its inverse function of the current state and the random are differentiable), then we can apply the standard � numbers: x � h � x� u � , say. change-of-variable formula, to write this as an � to integral with respect to � x� u � . The reverse transition from x x would be made � giving � u � g with the aid of random numbers � � � x � h � x � u � . 13 14 What’s the point? Detailed balance says the two integrals are equal: it holds if Perhaps a little indirect! � � � � � � � � x � u � – but a flexible framework for constructing quite � � � � � � � � � x � g � u � � � x� x � � � � x � g � u � � � x � x � � � � � � x� u � complex moves using only elementary calculus. The possibility that r � d covers the typical case where the last factor is the Jacobian of the x � X , only a lower-dimensional subset that given � � diffeomorphism from � x� u � to � x � u � . of X is reachable in one step. � is Thus, a valid choice for (The Gibbs sampler is the best-known example of � � � � � � � � � � � � � x � g � u � � � x � u � � � this, since in that case only some of the components � � � x� x � � min � � � � � � � x � g � u � � � x� u � of the state vector are changed at a time, although the formulation here is more general as it allows the involving only ordinary joint densities. subset not to be parallel to the coordinate axes.) 15 16

Recommend

More recommend