Fundamental Issues in Bayesian Functional Data Analysis Dennis D. Cox Rice University
Introduction

Question: What are functional data?
Answer: Data that are functions of a continuous variable, e.g., we observe curves Yi(t), t ∈ [a, b], with

µ(t) = E[Y (t)], V (t, s) = Cov[Y (t), Y (s)].
[Figure: twelve example spectra; Intensity vs. Emission Wavelength (nm), roughly 400–550 nm.]
Introduction (cont.)
do you?
[Figure: zoomed-in view of one spectrum; Emission Wavelength 455–470 nm, Intensity about 0.199–0.204.]
Introduction (cont.)
In practice the data are observed on a finite grid, e.g. t ∈ {395, 396, . . . , 660}, but we treat them as though we are observing whole functions.
Analysis, not for the heathen and unclean.”
References: Applied Functional Data Analysis by Ramsay and Silverman; Nonparametric Functional Data Analysis: Theory and Practice by Ferraty and Vieu.
Functional Data (cont.):
In practice the data are finite dimensional, either of
(G) (Yi(t1), . . . , Yi(tm)), vectors of values on a grid, or
(C) (ηi1, . . . , ηim), where Yi(t) = Σ_{j=1}^m ηij Bj(t) is a basis function expansion (e.g., B-splines).
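To make the two representations concrete, here is a small numpy sketch (the piecewise-linear "hat" basis and all names are illustrative choices of mine, not the author's code) converting a curve sampled on a grid, representation (G), into basis coefficients, representation (C), and back:

```python
import numpy as np

# Representation (G): a curve observed at m grid points
t = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * t) + 0.05 * np.random.default_rng(1).normal(size=t.size)

# A piecewise-linear "hat" basis (linear B-splines) on a coarser knot set.
# Linearly interpolating the j-th unit vector over the knots gives B_j(t).
knots = np.linspace(0.0, 1.0, 12)
B = np.column_stack([np.interp(t, knots, np.eye(knots.size)[j])
                     for j in range(knots.size)])

# Representation (C): least-squares coefficients eta with Y(t) ~ sum_j eta_j B_j(t)
eta, *_ = np.linalg.lstsq(B, y, rcond=None)
yhat = B @ eta                       # back to the grid representation
print(eta.size, np.max(np.abs(yhat - y)))
```

With 12 knots the piecewise-linear fit tracks the smooth sine closely; the residual is dominated by the added noise.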
We want to make use of the additional “structure” implied by being a smooth function.
Functional Data (cont.):
We propose the Grid Refinement Invariance Principle (GRIP): as the grid is refined (m → ∞), the method should approach the appropriate limiting analogue for true functional (infinite dimensional) observations.
(i) Direct: Devise a method for true functional data, then find a finite dimensional approximation (“projection”).
(ii) Indirect: Devise a method for the finite dimensional data, then see if it has a limit as m → ∞.
See Lee & Cox, “Pointwise Testing with Functional Data Using the Westfall-Young Randomization Method,” Biometrika (2008) for a frequentist nonparametric approach to some testing problems with functional data.
Bayesian Functional Data Analysis:
requirement,” i.e. a likelihood and a prior.
as difficult for Bayesians.
even approximately (will MCMC be the downfall of statistics?).
consequences.
data analysis.
say we observe Yi(t), t ∈ [a, b], where

µ(t) = E[Y (t)], V (t, s) = Cov[Y (t), Y (s)].

On a grid t1, . . . , tm, write

Yi^(m) = Yi = (Yi(t1), . . . , Yi(tm))

and the corresponding mean vector and covariance matrix µ and V, where Vij = V (ti, tj).
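As a concrete illustration of the grid representation (simulated curves and all sizes are my own choices, not the slides' code), the mean vector and covariance matrix have direct sample analogues:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 40                                  # n curves, m grid points
t = np.linspace(0.0, 1.0, m)

# Simulate Y_i(t_1), ..., Y_i(t_m) as Brownian motion: V(t, s) = min(t, s)
V_true = np.minimum.outer(t, t) + 1e-10 * np.eye(m)
Y = rng.multivariate_normal(np.zeros(m), V_true, size=n)   # shape (n, m)

mu_hat = Y.mean(axis=0)                        # estimate of mu(t_i)
V_hat = np.cov(Y, rowvar=False)                # (m, m) estimate of V(t_i, t_j)
print(mu_hat.shape, V_hat.shape)
```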
Requisite properties of covariance functions:
For any finite set of points s1, . . . , sk in the domain, the matrix given by Vij = V (si, sj) is positive definite.
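This property can be checked numerically for a candidate covariance function; a small sketch (my own illustration) using the Brownian kernel min(s, t) at random points:

```python
import numpy as np

rng = np.random.default_rng(42)
s = rng.uniform(0.0, 1.0, size=25)             # arbitrary points in the domain
V = np.minimum.outer(s, s)                     # V_ij = min(s_i, s_j)

assert np.allclose(V, V.T)                     # symmetry
eigvals = np.linalg.eigvalsh(V)                # all eigenvalues should be >= 0
print(eigvals.min())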
Requirements on Covariance Priors:
The basic requirement on priors for covariance functions is that we mind the GRIP: for each m we have a prior on the m × m matrices induced by the covariance function. Does this sequence of priors converge (in some sense) to a probability measure on the space of covariance operators?
Requirements on Covariance Priors (cont.):
Direct approach: put a prior on the space of covariance functions and then project it down to the finite dimensional approximation Vij = V (ti, tj).
with something that works, sort of.
A proposed approach that does work (sort of):
Let Z1, Z2, . . . be i.i.d. copies of a Gaussian process (mean 0, covariance function B(s, t)). Define

V (s, t) = Σi wiZi(s)Zi(t),

where w1, w2, . . . are nonnegative constants satisfying Σi wi < ∞.
One can show that V is a valid covariance function and that its distribution “fills out” the space of covariance functions.
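A sketch of drawing one V from this prior (the Ornstein–Uhlenbeck choice for B matches the simulations later in the deck; the geometric weights wi = 2^−i and truncation level are my own assumptions):

```python
import numpy as np

rng = np.random.default_rng(7)
m, J = 60, 200                                 # grid size, series truncation
t = np.linspace(0.0, 1.0, m)

# Covariance of the Gaussian processes Z_i: Ornstein-Uhlenbeck with alpha = 1
B = np.exp(-np.abs(t[:, None] - t[None, :]))
L = np.linalg.cholesky(B + 1e-10 * np.eye(m))

w = 0.5 ** np.arange(1, J + 1)                 # nonnegative, sum_i w_i < 1 < inf
Z = (L @ rng.normal(size=(m, J))).T            # rows are i.i.d. Z_i on the grid

# One draw from the prior: V = sum_i w_i Z_i Z_i^T (symmetric, PSD by construction)
V = (Z * w[:, None]).T @ Z
print(V.shape)
```

Each draw is automatically a valid covariance matrix on the grid, which is the point of the construction.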
A proposed approach that sort of works (cont.):
So we have satisfied the three requirements: a valid prior on covariance functions, one that “fills out the space” of covariance functions, and one that is useful in practice.
For the grid representation, let Zi = (Zi(t1), . . . , Zi(tm)). Then

V = Σi wi Zi Zi^T.

In principle one could compute the density of V via its characteristic function and use Fourier inversion. That works well for weighted sums of χ2 distributions (Fortran code available from StatLib).
A proposed approach that sort of works (cont.):
Instead we work with the Zi directly. We will further approximate V by truncating the series:

V = Σ_{i=1}^{j} wi Zi Zi^T

(update each Zi one at a time).
A proposed approach that sort of works (cont.):
Add an overall scale k: V (s, t) = k Σi wiZi(s)Zi(t), where k has an independent inverse Γ prior. We evaluate the unnormalized posterior f(Z | Y1, . . . , Yn) in a Metropolis–Hastings MCMC algorithm.
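A toy Metropolis–Hastings sketch in this spirit, updating each Zi one at a time. The tiny grid, jitter term, prior-shaped random-walk proposal, step size, and omission of the scale k are all my own simplifications, not the slides' implementation:

```python
import numpy as np

rng = np.random.default_rng(3)
m, J, n = 8, 10, 25                            # grid size, truncation, sample size
t = np.linspace(0.0, 1.0, m)

B = np.exp(-np.abs(t[:, None] - t[None, :]))   # prior covariance of each Z_i (OU)
Lb = np.linalg.cholesky(B + 1e-9 * np.eye(m))
Binv = np.linalg.inv(B + 1e-9 * np.eye(m))
w = 0.5 ** np.arange(1, J + 1)

# Synthetic data from the Brownian-motion covariance min(s, t)
V_true = np.minimum.outer(t, t) + 1e-6 * np.eye(m)
Y = rng.multivariate_normal(np.zeros(m), V_true, size=n)

def log_post(Z):
    """Unnormalized log f(Z | Y_1..Y_n): Gaussian likelihood + GP prior on rows."""
    V = (Z * w[:, None]).T @ Z + 1e-4 * np.eye(m)   # jitter keeps V invertible
    _, logdet = np.linalg.slogdet(V)
    quad = np.sum(np.linalg.solve(V, Y.T) * Y.T)    # sum_i y_i^T V^{-1} y_i
    prior = -0.5 * np.sum((Z @ Binv) * Z)           # sum_i z_i^T B^{-1} z_i
    return -0.5 * (n * logdet + quad) + prior

Z = (Lb @ rng.normal(size=(m, J))).T
lp = log_post(Z)
accepted = 0
for sweep in range(200):
    for i in range(J):                              # update each Z_i one at a time
        prop = Z.copy()
        prop[i] += 0.1 * (Lb @ rng.normal(size=m))  # random-walk proposal
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            Z, lp = prop, lp_prop
            accepted += 1
print(accepted / (200 * J))
```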
Some results with simulated data:
[Figure: N = 10 simulated Brownian motion sample paths on m = 1000 grid points.]
First, the true covariance function for Brownian motion.
[Figure: surface plot of the true Brownian motion covariance function.]
The covariance function used to generate the Zi is the Ornstein-Uhlenbeck correlation: B(s, t) = exp[−α|s − t|] with α = 1. This process goes by a number of other names (the Gauss-Markov process, Continuous-Time Autoregression of order 1, etc.)
[Figure: surface plot of the Ornstein–Uhlenbeck correlation function B(s, t).]
The Bayesian posterior mean estimate with m = 10, j = 20.
[Figure: surface plot of the Bayes estimated covariance function, m = 10.]
The sample covariance estimate with m = 10.
[Figure: surface plot of the sample estimate of the covariance function, m = 10.]
Now the Bayes posterior mean estimate with m = 30, j = 60.
[Figure: surface plot of the Bayes estimated covariance function, m = 30.]
The sample covariance estimate with m = 30.
[Figure: surface plot of the sample estimate of the covariance function, m = 30.]
Some results with simulated data:
m    j    MSE (Bayes)   MSE (sample)
10   20   0.017         0.026
30   60   0.065         0.054
Problems with the proposed approach that sort of works (cont.):
Zj, 1 ≤ j ≤ J, where J ≫ m.
The MCMC does not mix well: it seems to converge to different values depending on the start. The posterior “mode” is a complicated manifold in a very high dimensional space.
Another approach (work in progress):
We would like an analogue of the inverse Wishart, which works in finite dimensions. One problem: the inverse of a covariance operator is not bounded.
Example: let Y be standard Brownian motion, taking values in L2[0, 1]. Then v(s, t) = Cov(Y (t), Y (s)) = min{s, t}.
The covariance operator is

V f(s) = ∫₀¹ v(s, t) f(t) dt.

To invert it, set g(s) = ∫₀¹ v(s, t) f(t) dt and solve for f.
Inverse Wishart (cont.):
g(s) = ∫₀¹ min(s, t) f(t) dt = ∫₀ˢ t f(t) dt + s ∫ₛ¹ f(t) dt.

Differentiating,

g′(s) = s f(s) − s f(s) + ∫ₛ¹ f(t) dt = ∫ₛ¹ f(t) dt.

Differentiating again, g′′(s) = −f(s).
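This computation can be sanity-checked numerically: the functions φk(t) = sin((k − 1/2)πt) satisfy −φk′′ = ((k − 1/2)π)² φk with φk(0) = 0 and φk′(1) = 0, so they should be eigenfunctions of V with eigenvalues ((k − 1/2)π)⁻². A quadrature sketch (the check itself is my own, not from the slides):

```python
import numpy as np

m = 1001
t = np.linspace(0.0, 1.0, m)
h = t[1] - t[0]
qw = np.full(m, h)
qw[0] = qw[-1] = h / 2.0                       # trapezoid quadrature weights
K = np.minimum.outer(t, t)                     # v(s, t) = min(s, t)

for k in (1, 2, 3):
    lam = 1.0 / ((k - 0.5) * np.pi) ** 2
    phi = np.sin((k - 0.5) * np.pi * t)
    Vphi = K @ (qw * phi)                      # (V phi)(s) = integral of min(s,t) phi(t) dt
    print(k, np.max(np.abs(Vphi - lam * phi))) # small: phi is an eigenfunction
```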
Inverse Wishart (cont.):
Thus V⁻¹g = −g′′, provided g′ is absolutely continuous and g satisfies the two boundary conditions g(0) = 0 and g′(1) = 0. In general, consider the spectral representation:

V = Σi λi φi ⊗ φi,  so V x = Σi λi ⟨x, φi⟩ φi.

Then

V⁻¹x = Σi λi⁻¹ ⟨x, φi⟩ φi,

which is defined only when Σi λi⁻² ⟨x, φi⟩² < ∞, a pretty strict condition on x since Σi λi < ∞.
Inverse Wishart (cont.):
make it work, is there some way to do so?
will be a good finite dimensional approximant, let’s try another approach.
Idea: check whether Vm ∼ InverseWishart(dm, Bm) converges (in some sense) as m → ∞.
the ch.f. is unknown.
Inverse Wishart (cont.):
Define sampling and interpolation operators:

fm = (f(t1), . . . , f(tm)),  I fm = linear interpolant of fm.

So fm is a finite dimensional “sample” of the function f, and I goes the other way. Similarly for grid variables: Bm is an m × m matrix with (i, j) entry equal to B(ti, tj).
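A minimal sketch of the two operators (linear interpolation via np.interp; the function f and the OU choice for B are illustrative):

```python
import numpy as np

m = 6
t = np.linspace(0.0, 1.0, m)

f = lambda s: np.sin(2.0 * s) + s              # a continuous function
f_m = f(t)                                      # sampling: f -> (f(t_1), ..., f(t_m))
If_m = lambda s: np.interp(s, t, f_m)           # I: linear interpolant of f_m

B = lambda s, u: np.exp(-np.abs(s - u))         # OU covariance, as in earlier slides
B_m = B(t[:, None], t[None, :])                 # (i, j) entry equal to B(t_i, t_j)

print(np.allclose(If_m(t), f_m))               # interpolant reproduces grid values
```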
Inverse Wishart (cont.):
Suppose Vm ∼ InverseWishart(dm, sm Bm) and fm is obtained by “sampling” a continuous function f. Then

E[I Vm fm] / (dm − m) → Bf/(a − 1), where Bf(s) = ∫₀¹ B(s, t) f(t) dt.
Inverse Wishart (cont.):
Similarly, if Vm ∼ InverseWishart(dm, sm Bm) and fm and gm are obtained by sampling continuous functions f and g, then

E[I Vm fm gmᵀ Vm] / (dm − m)² → Bf ⊗ Bg/(a − 1)².

These limits converge if we have dm/m converging (e.g., take dm = 2m).
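The moment fact driving these limits is the inverse Wishart mean formula E[V] = S/(d − m − 1). A numpy-only Monte Carlo check of that identity (sampling V ∼ InverseWishart(d, S) by inverting a Wishart draw; the sizes, and the OU scale matrix, are arbitrary choices of mine — this verifies the mean formula only, since the slides' scaling sm is not specified here):

```python
import numpy as np

rng = np.random.default_rng(11)
m, d, N = 4, 30, 20000
t = np.linspace(0.0, 1.0, m)
S = np.exp(-np.abs(t[:, None] - t[None, :]))    # scale matrix (an OU covariance)

# If W ~ Wishart(d, S^{-1}) then W^{-1} ~ InverseWishart(d, S).
C = np.linalg.cholesky(np.linalg.inv(S))
acc = np.zeros((m, m))
for _ in range(N):
    Nrm = rng.normal(size=(m, d))
    W = C @ Nrm @ Nrm.T @ C.T                   # Wishart(d, S^{-1}) draw
    acc += np.linalg.inv(W)                     # InverseWishart(d, S) draw
V_bar = acc / N

print(np.max(np.abs(V_bar - S / (d - m - 1))))  # should be small (MC error only)
```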
The Bayesian posterior mean estimate under the inverse Wishart prior with m = 50, dm = 100, obtained by Monte Carlo.
[Figure: posterior mean of the covariance using the inverse Wishart prior.]
Further research:
Approximate the prior using mixtures of inverse Wisharts.
Study convergence in the space of S-operators, but using a basis function expansion rather than grid evaluations.