Fundamental Issues in Bayesian Functional Data Analysis, Dennis D. Cox, Rice University (PowerPoint PPT Presentation)



SLIDE 1

Fundamental Issues in Bayesian Functional Data Analysis Dennis D. Cox Rice University

SLIDE 2

Introduction

  • Question: What are functional data?
  • Answer: Data that are functions of a continuous variable.
  • ... say we observe Yi(t), t ∈ [a, b], where Y1, Y2, . . . , Yn are i.i.d. N(µ, V ):

µ(t) = E[Y (t)], V (t, s) = Cov[Y (t), Y (s)].

  • Question: Do we ever really observe functional data?
  • Here are some examples of functional data:

SLIDE 3

[Figure: twelve panels of example curves, Intensity vs. Emission Wavelength (nm), approx. 400–550 nm.]

SLIDE 4

Introduction (cont.)

  • Question: But you don’t really observe continuous functions, do you?
  • Answer: Look closely at the data ...

SLIDE 5

[Figure: zoomed view of one curve, Intensity vs. Emission Wavelength (nm.), 455–470 nm; the “curve” resolves into discrete points joined by line segments.]

SLIDE 6

Introduction (cont.)

  • OK, so it is really a bunch of dots connected by line segments.
  • That is, we really have the data Yi(t) for t on a grid: t ∈ {395, 396, . . . , 660}.
  • But people doing functional data analysis like to pretend they are observing whole functions.
  • Is it just a way of sounding erudite? “Functional Data Analysis, not for the heathen and unclean.”
  • Some books on the subject: Functional Data Analysis and Applied Functional Data Analysis by Ramsay and Silverman; Nonparametric Functional Data Analysis: Theory and Practice by Ferraty and Vieu.

SLIDE 7

Functional Data (cont.):

  • Working with functional data requires some idealization.
  • E.g., the data are actually multivariate; they are stored as either
    (G) (Yi(t1), . . . , Yi(tm)), vectors of values on a grid, or
    (C) (ηi1, . . . , ηim), where Yi(t) = Σj=1..m ηij Bj(t) is a basis function expansion (e.g., B-splines).
  • Note that the order of approximation m is rather arbitrary.
  • Treating functional data as simply multivariate doesn’t make use of the additional “structure” implied by being a smooth function.
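The two storage formats can be sketched numerically. This is only an illustration, not the talk's code: the cosine basis below merely stands in for the B-splines mentioned above, and the particular curve and the order m = 8 are arbitrary choices.

```python
import numpy as np

m = 8                                  # order of approximation (arbitrary)
t = np.linspace(0.0, 1.0, 200)         # fine grid on [a, b] = [0, 1]

def basis(j, t):
    """j-th basis function: a cosine basis standing in for B-splines."""
    return np.ones_like(t) if j == 0 else np.sqrt(2) * np.cos(j * np.pi * t)

y = np.exp(-t) * np.sin(4 * t)         # one "functional datum" Y_i(t)

# (G): vector of values on a grid
grid_repr = y.copy()

# (C): least-squares coefficients eta_ij with Y_i(t) ~ sum_j eta_ij B_j(t)
design = np.column_stack([basis(j, t) for j in range(m)])
eta, *_ = np.linalg.lstsq(design, y, rcond=None)
y_hat = design @ eta                   # curve reconstructed from coefficients

print("max reconstruction error:", np.max(np.abs(y - y_hat)))
```

Both representations approximate the same curve; how good the coefficient representation is depends entirely on the arbitrary choice of m, which is the point of the slide.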

SLIDE 8

Functional Data (cont.):

  • Methods for Functional Data Analysis (FDA) should satisfy the Grid Refinement Invariance Principle (GRIP):
  • As the order of approximation becomes more exact (i.e., m → ∞), the method should approach the appropriate limiting analogue for true functional (infinite dimensional) observations.
  • Thus the statistical procedure will not be strongly dependent on the finite dimensional approximation.
  • Two general ways to mind the GRIP:
    (i) Direct: Devise a method for true functional data, then find a finite dimensional approximation (“projection”).
    (ii) Indirect: Devise a method for the finite dimensional data, then see if it has a limit as m → ∞.

SLIDE 9

See Lee & Cox, “Pointwise Testing with Functional Data Using the Westfall-Young Randomization Method,” Biometrika (2008) for a frequentist nonparametric approach to some testing problems with functional data.

SLIDE 10

Bayesian Functional Data Analysis:

  • Why Bayesian?
  • After all, Bayesian methods have a high “information requirement,” i.e. a likelihood and a prior.
  • In principle, statistical inference problems are not conceptually as difficult for Bayesians.
  • Of course, there is the problem of computing the posterior, even approximately (will MCMC be the downfall of statistics?).
  • And, priors have consequences.
  • So there are lots of opportunities for investigation into these consequences.

SLIDE 11
  • A Bayesian problem: develop priors for Bayesian functional data analysis.
  • Again assume the data are realizations of a Gaussian process: we observe Yi(t), t ∈ [a, b], where Y1, Y2, . . . , Yn are i.i.d. N(µ, V ):

µ(t) = E[Y (t)], V (t, s) = Cov[Y (t), Y (s)].

  • Denote the discretized data by Yi(m) = Yi = (Yi(t1), . . . , Yi(tm)), with corresponding mean vector µ and covariance matrix V, where Vij = V (ti, tj).
  • Prior distribution for µ: µ | V, k ∼ N(0, kV ).
  • But V ∼ ?????
  • What priors can we construct for covariance functions?

SLIDE 12

Requisite properties of covariance functions:

  • Symmetry: V (s, t) = V (t, s).
  • Positive definiteness: for any choice of k and distinct s1, . . . , sk in the domain, the matrix given by Vij = V (si, sj) is positive definite.
  • It is difficult to achieve this latter requirement.

SLIDE 13

Requirements on Covariance Priors:

  • Our first requirement in constructing a prior for covariance functions is that we mind the GRIP.
  • One may wish to use the conjugate inverse Wishart prior: V −1 ∼ Wishart(dm, Wm) for some m × m matrix Wm,
  • ... where, e.g., Wm is obtained by discretizing a standard covariance function.
  • Under what conditions (if any) on m and dm will this converge to a probability measure on the space of covariance operators?
  • This would be an indirect approach to satisfying the GRIP. More on this later.

SLIDE 14

Requirements on Covariance Priors (cont.):

  • An easier way to satisfy the GRIP requirement is to construct a prior on the space of covariance functions and then project it down to the finite dimensional approximation.
  • For example, using grid values, Vij = V (ti, tj).
  • I.e., the direct approach.
  • We (joint work with Hong Xiao Zhu of MDACC) did come up with something that works, sort of.

SLIDE 15

A proposed approach that does work (sort of):

  • Suppose Z1, Z2, . . . are i.i.d. realizations of a Gaussian random process (mean 0, covariance function B(s, t)).
  • Consider V (s, t) = Σi wi Zi(s) Zi(t), where w1, w2, . . . are nonnegative constants satisfying Σi wi < ∞.
  • One can show that this gives a random covariance function, and that its distribution “fills out” the space of covariance functions.
  • Can we compute with it?

SLIDE 16

A proposed approach that sort of works (cont.):

  • Thus, if we can compute with this proposed prior, we will have satisfied the three requirements: a valid prior on covariance functions that “fills out the space” of covariance functions, and is useful in practice.
  • Assuming we use values on a grid for the finite dimensional representation, let Zi = (Zi(t1), . . . , Zi(tm)). Then V = Σi wi Zi Zi^T.
  • How to compute with this? One idea is to write out the characteristic function and use Fourier inversion. That works well for weighted sums of χ2 distributions (Fortran code available from StatLib).
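A single draw from the discretized prior V = Σi wi Zi Zi^T can be sketched as follows. This is an illustration, not the talk's code: the Ornstein-Uhlenbeck covariance for the Zi (which the talk uses later) and the geometric weights wi = 2^−i are assumed choices.

```python
import numpy as np

rng = np.random.default_rng(0)

m, J = 30, 50                          # grid size and truncation (illustrative)
t = np.linspace(0.0, 1.0, m)
B = np.exp(-np.abs(t[:, None] - t[None, :]))   # O-U covariance for the Z_i
L = np.linalg.cholesky(B + 1e-10 * np.eye(m))  # small jitter for stability

w = 0.5 ** np.arange(1, J + 1)         # nonnegative weights with sum_i w_i < inf

# One prior draw: V = sum_i w_i Z_i Z_i^T with Z_i ~ N(0, B) on the grid
Z = L @ rng.standard_normal((m, J))    # columns are the Z_i
V = (Z * w) @ Z.T

# The two requisite properties hold by construction (up to roundoff)
assert np.allclose(V, V.T)                        # symmetry
assert np.min(np.linalg.eigvalsh(V)) >= -1e-8     # positive semi-definiteness
print("smallest eigenvalue:", np.min(np.linalg.eigvalsh(V)))
```

Each term wi Zi Zi^T is rank one and positive semi-definite, so the (truncated) sum is automatically a valid covariance matrix, which is exactly what makes this construction attractive.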

SLIDE 17

A proposed approach that sort of works (cont.):

  • Another approach: use the Zi directly. We will further approximate V by truncating the series: V (m,j) = Σi=1..j wi Zi Zi^T.
  • We devised a Metropolis-Hastings algorithm to sample the Zi.
  • Can use rank-1 QR updating to do fairly efficient computing (update each Zi one at a time).
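The rank-1 QR primitive behind this kind of updating is available in SciPy. This sketches only the linear-algebra step, not the talk's sampler: when a proposal changes one Zi, the factorization can be patched in O(n²) rather than refactored from scratch in O(n³).

```python
import numpy as np
from scipy.linalg import qr, qr_update

rng = np.random.default_rng(1)
n = 6

A = rng.standard_normal((n, n))
Q, R = qr(A)                           # full QR factorization of A

# A rank-1 change A + u v^T, e.g. from perturbing a single Z_i
u = rng.standard_normal(n)
v = rng.standard_normal(n)
Q1, R1 = qr_update(Q, R, u, v)         # O(n^2) update of the factorization

assert np.allclose(Q1 @ R1, A + np.outer(u, v))
print("updated QR reproduces A + u v^T")
```

Replacing one Zi in V(m,j) actually changes V by a rank-2 perturbation (remove the old term, add the new one), which can be handled as two successive rank-1 updates of this kind.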

SLIDE 18

A proposed approach that sort of works (cont.):

  • There are a couple of minor modifications:
  • 1. We include an additional scale parameter k in V (s, t) = k Σi wi Zi(s) Zi(t), where k has an independent inverse Γ prior.
  • 2. We integrate out µ and k, and use the marginal unnormalized posterior f(Z | Y1, . . . , Yn) in a Metropolis-Hastings MCMC algorithm.
  • The algorithm has been implemented in Matlab.

SLIDE 19

Some results with simulated data:

  • Generated data from Brownian motion (easy to do!)
  • n = 50 and various values of m and j
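Generating such data is indeed easy: Brownian motion on a grid is just a cumulative sum of independent Gaussian increments. A minimal sketch (the grid size here is illustrative, and the sample covariance is compared to the known truth V(s, t) = min(s, t)):

```python
import numpy as np

rng = np.random.default_rng(42)

n, m = 50, 100                          # n curves on an m-point grid of (0, 1]
dt = 1.0 / m
# Brownian motion: cumulative sums of independent N(0, dt) increments
Y = np.cumsum(np.sqrt(dt) * rng.standard_normal((n, m)), axis=1)

# Sample covariance estimate; the true covariance is V(s, t) = min(s, t)
t = dt * np.arange(1, m + 1)
V_true = np.minimum(t[:, None], t[None, :])
V_hat = np.cov(Y, rowvar=False, bias=True)

print("mean abs error of sample covariance:", np.mean(np.abs(V_hat - V_true)))
```

With n = 50 curves the raw sample covariance is already a serviceable but noisy estimate, which is the benchmark the Bayes estimates below are compared against.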

SLIDE 20

[Figure: simulated Brownian motion sample paths, N = 10, m = 1000.]

SLIDE 21

First, the True Covariance function for Brownian Motion.

SLIDE 22

[Figure: surface plot of the true covariance function (Sigma) over [0, 1]².]

SLIDE 23

The covariance function used to generate the Zi is the Ornstein-Uhlenbeck correlation: B(s, t) = exp[−α|s − t|] with α = 1. This process goes by a number of other names (the Gauss-Markov process, Continuous-Time Autoregression of order 1, etc.)
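The "Continuous-Time Autoregression of order 1" name can be made concrete: on a regular grid, exact samples of the O-U process satisfy an AR(1) recursion with coefficient ρ = exp(−α Δt). A short check (grid size illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

alpha, m = 1.0, 200
t = np.linspace(0.0, 1.0, m)
dt = t[1] - t[0]

# O-U correlation matrix B(s, t) = exp(-alpha |s - t|)
B = np.exp(-alpha * np.abs(t[:, None] - t[None, :]))

# On a regular grid the lag-j correlation is exactly rho^j with
# rho = exp(-alpha * dt) -- the AR(1) / Gauss-Markov structure
rho = np.exp(-alpha * dt)
assert np.allclose(B[0, 5], rho ** 5)

# So Z ~ N(0, B) can be drawn by the AR(1) recursion, no Cholesky needed
z = np.empty(m)
z[0] = rng.standard_normal()
for k in range(1, m):
    z[k] = rho * z[k - 1] + np.sqrt(1 - rho**2) * rng.standard_normal()
print("O-U path drawn via AR(1); lag-1 correlation:", rho)
```

The Markov structure is what the "Gauss-Markov process" name refers to: conditional on Z(t_k), the future of the path is independent of its past.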

SLIDE 24

[Figure: surface plot of the Ornstein-Uhlenbeck covariance function B(s, t) over [0, 1]².]

SLIDE 25

The Bayesian posterior mean estimate with m = 10, j = 20.

SLIDE 26

[Figure: Bayes estimated covariance function (Sigmaest), m = 10, j = 20.]

SLIDE 27

The sample covariance estimate with m = 10.

SLIDE 28

[Figure: sample estimate of the covariance function (Sigmasample), m = 10.]

SLIDE 29

Now the Bayes posterior mean estimate with m = 30, j = 60.

SLIDE 30

[Figure: Bayes estimated covariance function (Sigmaest), m = 30, j = 60.]

SLIDE 31

The sample covariance estimate with m = 30.

SLIDE 32

[Figure: sample estimate of the covariance function (Sigmasample), m = 30.]

SLIDE 33

Some results with simulated data:

  • Mean squared error results (averaged over the grid points):

     m    j    MSE Bayes    MSE Sample
    10   20    0.017        0.026
    30   60    0.065        0.054

SLIDE 34

Problems with the proposed approach that sort of works:

  • The problem is way over-parameterized in terms of the Zj, 1 ≤ j ≤ J, where J ≫ m.
  • Computations are very time intensive, and the MCMC seems to not mix well; it seems to converge to different values depending on the start.
  • Caused by complex non-identifiability in the model? The posterior “mode” is a complicated manifold in a very high dimensional space.

SLIDE 35

Another approach (work in progress):

  • It would be very nice if we could construct a conjugate prior like the inverse Wishart in finite dimensions.
  • This seems problematic. The main difficulty is that the inverse of a covariance operator (obtained from a covariance function) is not bounded.
  • For example, let Y (t) be Brownian motion considered as taking values in L2[0, 1]. Then v(s, t) = Cov(Y (t), Y (s)) = min{s, t}.
  • The operator V is defined by V f(s) = ∫0^1 v(s, t) f(t) dt.
  • Compute V −1g by solving (for f) the integral equation g(s) = ∫0^1 v(s, t) f(t) dt.

SLIDE 36

Inverse Wishart (cont.):

  • With a little calculus,

g(s) = ∫0^1 min(s, t) f(t) dt = ∫0^s t f(t) dt + s ∫s^1 f(t) dt.

  • We see g is absolutely continuous and g(0) = 0. Differentiating,

g′(s) = s f(s) − s f(s) + ∫s^1 f(t) dt = ∫s^1 f(t) dt.

  • We see g′ is absolutely continuous and g′(1) = 0. Differentiating again, g′′(s) = −f(s).
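The calculus above can be checked numerically: pick a g satisfying g(0) = 0 and g′(1) = 0, set f = −g″, and verify that applying the integral operator recovers g. The particular choice g(s) = sin(πs/2) is an assumption made here for the check.

```python
import numpy as np

# g(s) = sin(pi s / 2): g(0) = 0 and g'(1) = (pi/2) cos(pi/2) = 0
m = 2001
t = np.linspace(0.0, 1.0, m)
dt = t[1] - t[0]
g = np.sin(np.pi * t / 2)
f = (np.pi / 2) ** 2 * np.sin(np.pi * t / 2)   # f = -g''

# (V f)(s) = integral of min(s, t) f(t) dt, via trapezoid quadrature
K = np.minimum(t[:, None], t[None, :])          # kernel v(s, t) = min(s, t)
w = np.full(m, dt)
w[0] = dt / 2
w[-1] = dt / 2
Vf = K @ (f * w)

print("max |V f - g| =", np.max(np.abs(Vf - g)))
assert np.max(np.abs(Vf - g)) < 1e-5
```

The agreement confirms that V inverts to −d²/ds² exactly on functions obeying the two boundary conditions, and on no larger class.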

SLIDE 37

Inverse Wishart (cont.):

  • Thus, in the Brownian motion case, V is invertible at g iff g′ is absolutely continuous and satisfies the two boundary conditions. Thus, V is certainly not invertible on all of L2[0, 1].
  • We can understand the problem in general by using the spectral representation: V = Σi λi φi ⊗ φi.
  • Thus V x = Σi λi ⟨x, φi⟩ φi.
  • Then, if V −1x exists, it is given by V −1x = Σi λi^−1 ⟨x, φi⟩ φi.
  • This converges in H iff Σi λi^−2 ⟨x, φi⟩² < ∞, which is a pretty strict condition on x since Σi λi < ∞.
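The spectral picture can be seen numerically for the Brownian covariance: the eigenvalues of min(s, t) on [0, 1] are known to be λk = ((k − 1/2)π)^−2, so λk^−2 grows like k⁴ and V^−1 is unbounded in the limit. A sketch (grid size and quadrature scaling are illustrative choices):

```python
import numpy as np

# Discretize the integral operator with kernel min(s, t) on a regular grid;
# dividing by m supplies the quadrature weight 1/m.
m = 400
t = np.arange(1, m + 1) / m
V = np.minimum(t[:, None], t[None, :]) / m
lam = np.linalg.eigvalsh(V)[::-1]              # eigenvalues, descending

# Known continuum eigenvalues: lambda_k = 1 / ((k - 1/2) pi)^2
k = np.arange(1, 11)
lam_exact = 1.0 / ((k - 0.5) * np.pi) ** 2

print("numeric :", np.round(lam[:10], 5))
print("exact   :", np.round(lam_exact, 5))
assert np.allclose(lam[:10], lam_exact, rtol=1e-2)
```

Since λk ~ k^−2 is summable (trace class), the inverse's coefficients λk^−1 ~ k² diverge, which is exactly the strict condition on x noted above.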

SLIDE 38

Inverse Wishart (cont.):

  • So, even though it looks like it is going to be very difficult to make it work, is there some way to do so?
  • Instead of trying to guess a prior for which an inverse Wishart will be a good finite dimensional approximant, let’s try another approach.
  • Let’s see if we can choose dm so that as m → ∞, InverseWishart(dm, Bm) converges (in some sense).
  • It is very difficult working with the inverse Wishart: no m.g.f., and the ch.f. is unknown.

SLIDE 39

Inverse Wishart (cont.):

  • In order to obtain our results, we define “sampling” and interpolation operators:

fm = (f(t1), . . . , f(tm)), I fm = linear interpolant of fm.

  • Here, (t1, . . . , tm) is a regular grid. Note that f → fm is an operator from continuous functions to m-dimensional space, and I goes the other way.
  • Define an analogous sampling operator for functions of two variables: Bm is an m × m matrix with (i, j) entry equal to B(ti, tj).

SLIDE 40

Inverse Wishart (cont.):

  • Moment results: suppose Vm ∼ InverseWishart(dm, sm Bm) and fm is obtained by “sampling” a continuous function f.
  • Then as long as dm/m → a > 1,

E[I Vm fm] / (dm − m) → Bf / (a − 1), where Bf(s) = ∫ B(s, t) f(t) dt.

SLIDE 41

Inverse Wishart (cont.):

  • Second moment results: suppose Vm ∼ InverseWishart(dm, sm Bm) and fm and gm are obtained by “sampling” continuous functions f and g.
  • Again as long as dm/m → a > 1,

E[I Vm fm gm^T Vm] / (dm − m)² → Bf ⊗ Bg / (a − 1)².

  • Thus, in some sense, we can get first and second moments to converge if we have dm/m converging (e.g., take dm = 2m).

SLIDE 42

The Bayesian posterior mean estimate under inverse-Wishart prior with m = 50, dm = 100 obtained by Monte-Carlo.

SLIDE 43

[Figure: posterior mean of the covariance using the inverse Wishart prior.]

SLIDE 44

Further research:

  • Main interesting problem in the direct approach: find ways to approximate the prior using mixtures of inverse Wisharts.
  • For the indirect approach: a nearly complete proof of weak convergence in the space of S-operators, but using a basis function expansion rather than grid evaluations.
  • Must check the properties of this limiting measure.

SLIDE 45

The End
