

SLIDE 1

Some Areas of Recent Research

Michael Stein

University of Chicago

Department Retreat, October 2012

Michael Stein Some Areas of Recent Research

SLIDE 2

Funders & Collaborators

◮ NSF (STATMOS), US Department of Energy
◮ Faculty: Mihai Anitescu, Liz Moyer
◮ Postdocs: Jie Chen, Bill Leeds, Ying Sun
◮ Grad students: Stefano Castruccio, Michael Horrell, Andy Poppick
◮ Undergrads: Peter Hansen, Grant Wilder

SLIDE 3

Preconditioning and fitting Gaussian process models

A Gaussian process Z is determined by its mean and covariance functions:

◮ EZ(x) = μ(x)
◮ cov{Z(x), Z(y)} = K(x, y)

Assume the mean is 0 and the covariance structure is known up to a parameter θ. Let K(θ) be the covariance matrix of the observations Z(x1), …, Z(xn) given θ. Then the loglik is (ignoring an additive constant)

ℓ(θ) = −(1/2) log|K(θ)| − (1/2) Z′K(θ)⁻¹Z.

Problem: How to compute ℓ(θ)? Particularly log|K(θ)|?
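For moderate n, ℓ(θ) can be evaluated exactly through a Cholesky factorization; a minimal sketch (the exponential covariance and all parameter values here are illustrative assumptions, not from the talk):

```python
import numpy as np

def gaussian_loglik(Z, K):
    """-0.5*log|K| - 0.5*Z'K^{-1}Z, computed via chol(K)."""
    L = np.linalg.cholesky(K)                 # K = L L'
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    w = np.linalg.solve(L, Z)                 # w = L^{-1} Z, so w'w = Z'K^{-1}Z
    return -0.5 * logdet - 0.5 * w @ w

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, size=200))     # irregularly sited 1-d locations
theta = 1.5                                   # made-up range parameter
K = np.exp(-np.abs(x[:, None] - x[None, :]) / theta)
Z = np.linalg.cholesky(K) @ rng.standard_normal(200)  # simulate Z ~ N(0, K)
print(gaussian_loglik(Z, K))
```

The Cholesky route is both the standard exact method and, as the later slides note, the O(n³)-flop, O(n²)-memory bottleneck.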

SLIDE 4

Important aside: Even when the loglik can be computed exactly, maximizing it (or sampling from a posterior) may not be easy. Consider 400 evenly spaced observations on ℝ with Z fractional Brownian motion with variogram

(1/2) E{Z(x) − Z(y)}² = Γ(α/2) ℓ |x − y|^α

with ℓ = 10 and α = 1.5.

◮ Neither parameter is estimated well, although there is strong evidence the parameters lie along a curve in (ℓ, α) space.

◮ The problem is worse if the factor Γ(α/2) is left out.

◮ I am unaware of any transformation, independent of the observation locations, that would give a concave loglikelihood. This kind of function makes some people in the optimization community unhappy.

◮ Things only get worse with more complex models.

SLIDE 5

[Figure: log-likelihood surface over (ℓ, α), with ℓ running from 10 to 50 and α from 1 to 2.]

SLIDE 6

Computing exact MLE

Exact computation of the likelihood function for n irregularly sited observations generally requires O(n³) flops and O(n²) memory for the Cholesky decomposition of the covariance matrix.

◮ Computation is becoming cheap much faster than memory is.
◮ Increasing emphasis on “matrix-free” methods, in which one never has to store an n × n matrix, even if more computation is required.

SLIDE 7

Iterative solution of linear equations

Computing the quadratic form in the likelihood is best done by solving systems like Kx = y, not by forming K⁻¹. Iterative methods: for K positive definite, solving Kx = y is equivalent to minimizing

(1/2) x′Kx − x′y,

which can be done by, for example, conjugate gradient. The main computation is multiplying vectors by K. This is fast for

◮ sparse K
◮ some structured (e.g., Toeplitz) matrices

But even for dense unstructured matrices, the iterative solution is matrix-free and may require many fewer flops than a Cholesky decomposition: O(n² × # iterations) vs. O(n³). The number of iterations needed for an accurate solution is related to the condition number (ratio of largest to smallest eigenvalue) κ(K) of K.
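A minimal matrix-free sketch using SciPy’s conjugate gradient: K (here an exponential covariance on made-up irregular 1-d locations) is accessed only through a routine that builds rows in blocks, so the full n × n matrix is never stored:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
n = 500
s = np.cumsum(rng.uniform(0.05, 0.15, n))        # irregular 1-d locations

def matvec(v):
    # Multiply by the exponential covariance without storing all of K:
    # form 100 rows at a time, apply them to v, then discard them.
    out = np.empty(n)
    for i in range(0, n, 100):
        rows = np.exp(-np.abs(s[i:i + 100, None] - s[None, :]))
        out[i:i + 100] = rows @ v
    return out

K_op = LinearOperator((n, n), matvec=matvec)
y = rng.standard_normal(n)
x, info = cg(K_op, y, maxiter=2000)              # info == 0 on convergence
print(info, np.linalg.norm(matvec(x) - y) / np.linalg.norm(y))
```

Peak memory here is one 100 × n block rather than n × n, at the cost of rebuilding covariance entries on every iteration, which is exactly the computation-for-memory trade the slide describes.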

SLIDE 8

When nearby observations are strongly correlated, κ(K) can be very large, so we need to precondition:

◮ Find a matrix P such that P′K(θ)P is well-conditioned for θ in the vicinity of the MLE and multiplying a vector by P is fast.

◮ Let Y = P′Z. Then the loglik (with Z as data, but written in terms of Y) equals

ℓ(θ) = −(1/2) log|P′K(θ)P| + log|P| − (1/2) Y′{P′K(θ)P}⁻¹Y.

◮ It is okay for P to depend on θ as long as this formula is used.
◮ Can ignore log|P| if it doesn’t depend on θ (even if P does).
◮ What to do about log|P′K(θ)P|?
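The preconditioned loglik identity can be checked numerically; a hedged sketch with a small random positive definite K and an arbitrary invertible P, both made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
A = rng.standard_normal((n, n))
K = A @ A.T + n * np.eye(n)                        # stand-in for K(theta)
P = rng.standard_normal((n, n)) + 10 * np.eye(n)   # some invertible preconditioner
Z = rng.standard_normal(n)
Y = P.T @ Z

def logabsdet(M):
    return np.linalg.slogdet(M)[1]                 # log|det M|

plain = -0.5 * logabsdet(K) - 0.5 * Z @ np.linalg.solve(K, Z)
M = P.T @ K @ P
precond = -0.5 * logabsdet(M) + logabsdet(P) - 0.5 * Y @ np.linalg.solve(M, Y)
print(plain - precond)
```

The agreement follows from log|P′KP| = log|K| + 2 log|P| and P(P′KP)⁻¹P′ = K⁻¹, so the formula holds for any invertible P, θ-dependent or not.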

SLIDE 9

Solve score equations instead? (Ignore preconditioning for convenience.) Writing Ki(θ) for ∂K(θ)/∂θi, the score equations are (assuming the mean is 0)

Z′K(θ)⁻¹Ki(θ)K(θ)⁻¹Z = tr{K(θ)⁻¹Ki(θ)} for i = 1, …, p.

The left side requires only one solve. Instead of a log determinant, we need, for each component of θ,

tr{K(θ)⁻¹Ki(θ)},

which requires n solves for exact calculation. Approximate it by the unbiased estimate (Hutchinson, 1990)

(1/N) Σⱼ₌₁ᴺ Uj′K(θ)⁻¹Ki(θ)Uj,

where Uj = (Uj1, …, Ujn)′ is a random vector with the Ujk’s iid and Pr(Ujk = 1) = Pr(Ujk = −1) = 1/2.

◮ Yields unbiased estimating equations.
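A hedged sketch of the Hutchinson trace estimator, with made-up stand-ins for K(θ) and Ki(θ) (not a real covariance model); each probe vector costs one solve with K:

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 200, 200
A = rng.standard_normal((n, n))
K = A @ A.T + n * np.eye(n)                 # stand-in for K(theta), pos. def.
B = rng.standard_normal((n, n))
Ki = B + B.T                                # stand-in for dK/d(theta_i)

exact = np.trace(np.linalg.solve(K, Ki))    # exact trace: n simultaneous solves

U = rng.choice([-1.0, 1.0], size=(N, n))    # iid Rademacher probe vectors
est = np.mean([u @ np.linalg.solve(K, Ki @ u) for u in U])
print(exact, est)
```

Rademacher probes give E{U′MU} = tr(M) for any fixed M, and among iid probes they minimize the variance of the estimate.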

SLIDE 10

The statistical inefficiency of the procedure can be bounded in terms of κ(K). Thus, if one can find a decent preconditioner for K, moderate N works well.

◮ Don’t need N comparable to n!

Preconditioning helps in two ways:

◮ Reduces the number of iterations needed in the iterative solver.
◮ Reduces the need for large N.

There is scope for further improvement by choosing the Uj’s to be dependent.

◮ Design of experiments!

Stein, Chen and Anitescu (under revision).

SLIDE 11

Some other interests

◮ When low-rank approximations to covariance matrices don’t work.

◮ Won’t discuss this here, but the work is likely to annoy some who have been advocating this approach for massive spatial datasets.

◮ Modeling and computation for massive (as opposed to large) space-time datasets.

◮ Without assuming covariance (or inverse covariance) matrices are low rank or sparse.

◮ Climate model emulation.

SLIDE 12

One-pass methods

Look at the data block by block and summarize the information about K(θ) from each block so that one never has to go back to the raw data (Anitescu, Horrell). Simple example:

◮ Divide the data into B blocks.
◮ Within each block, approximate the loglik (or score) function.

◮ Are the MLE of θ and the observed information matrix an adequate approximation?
◮ If not, store a more complete representation of the loglik function. Does adding logliks across blocks reduce storage with little loss of information?

◮ Save a few observations (or other summaries) from each block.
◮ Add the within-block approximate logliks to the loglik of the sparse observations.

For truly massive (petascale, exascale) data, more than two “layers” will be needed.
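The first idea in its crudest form is an independent-blocks loglikelihood accumulated in one pass over the data; a hedged sketch in which the exponential covariance, block size, and true parameter are all illustrative assumptions:

```python
import numpy as np

def block_loglik(theta, x_blocks, z_blocks):
    total = 0.0
    for xb, zb in zip(x_blocks, z_blocks):    # one pass: each block seen once
        K = np.exp(-np.abs(xb[:, None] - xb[None, :]) / theta)
        L = np.linalg.cholesky(K)
        w = np.linalg.solve(L, zb)
        total += -np.sum(np.log(np.diag(L))) - 0.5 * w @ w
    return total                              # cross-block dependence is ignored

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 100, 1000))
K_full = np.exp(-np.abs(x[:, None] - x[None, :]) / 2.0)   # truth: theta = 2
z = np.linalg.cholesky(K_full) @ rng.standard_normal(1000)
xb, zb = np.array_split(x, 10), np.array_split(z, 10)
print({t: block_loglik(t, xb, zb) for t in (1.0, 2.0, 4.0)})
```

Each 100-point block contributes O(100³) work and only a scalar summary; the refinements above (storing a richer loglik representation, keeping a few observations per block) aim to recover the between-block information this crude version throws away.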

SLIDES 13–19

[Figure sequence: a plot built up by adding “+” markers slide by slide; no caption recoverable.]
SLIDE 20

Climate model emulation

Reproducing some of the output of a GCM under some forcing scenario without actually running it (Castruccio, Leeds, Moyer, Wilder).

◮ Or, better yet, producing accurate simulations of actual climate under some forcing scenario.

GCM runs we have:

◮ NCAR Community Climate System Model version 3 (CCSM3), T31 resolution (approx. 3.75° × 3.75° grid cells)
◮ Input is CO2; output is temperature T(t) and precipitation P(t), where t is the year
◮ 18 forcing scenarios, 53 realizations, > 10,000 model years

SLIDE 21

Statistical emulation of mean

Separate time series model for each of 47 regions:

T(t) = β0 + β1 (1/2){log[CO2](t) + log[CO2](t−1)} + β2 Σᵢ₌₂ wᵢ log[CO2](t−i) + ε(t)

where

◮ ε(t) is an autoregressive model of order 1
◮ wᵢ = (1 − ρ)ρⁱ.

Fit with a small number of scenarios and a few realizations per scenario. Compare to the standard computer-model emulation approach, in which one views (CO2(1), …, CO2(n)) as input and (T(1), …, T(n)) as output.
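A hedged sketch simulating the regional emulator above. All parameter values (the β’s, ρ, the AR(1) coefficient, the noise scale) and the 1%/yr CO2 ramp are made up for illustration, not fitted values from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
years = 250
co2 = 280.0 * 1.01 ** np.arange(years)          # made-up 1%/yr CO2 ramp
x = np.log(co2)

beta0, beta1, beta2, rho, phi, sd = 14.0, 2.5, 1.0, 0.9, 0.5, 0.1
T = np.empty(years)
eps = 0.0
for t in range(years):
    fast = 0.5 * (x[t] + x[max(t - 1, 0)])                # short-term response
    w = np.array([(1 - rho) * rho ** i for i in range(2, t + 1)])
    slow = w @ x[t - 2::-1] if t >= 2 else 0.0            # geometric CO2 memory
    eps = phi * eps + sd * rng.standard_normal()          # AR(1) noise
    T[t] = beta0 + beta1 * fast + beta2 * slow + eps
print(T[:3], T[-3:])
```

The point of the parametric form is data efficiency: a handful of coefficients per region, fit from a few runs, versus a nonparametric emulator over the whole CO2 trajectory.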

SLIDE 22

Total column ozone

OMI (the Ozone Monitoring Instrument, successor to TOMS) is aboard the satellite EOS Aura:

◮ Polar-orbiting.
◮ Sun-synchronous, so the satellite is always at local noon.
◮ Each orbit takes about 100 minutes, or 14.1 orbits a day.
◮ From the raw data (photon counts in multiple frequency bands), levels of many trace constituents of the atmosphere are deduced, including ozone.
◮ Over 80,000 observations per orbit, so over 10⁶ a day.
◮ Near-global coverage (no data during polar nights, some missing data).

How might statistical models be used to produce a better Level-3 (gridded) product than what NASA currently does?

SLIDE 23

Observation locations from 2 orbits

[Figure: observation locations from two orbits in longitude–latitude coordinates; the date line is marked.]
SLIDE 24

Scope for fruitful interaction between statistics and numerical analysis. Information flows in both directions.

◮ Statistical problems produce new challenges in applied/computational math.
◮ Statistical/probabilistic thinking can yield new algorithms and theory for numerical analysis.

SLIDE 25

STATMOS

Statistics in the Atmospheric and Oceanic Sciences, an NSF-supported network.

◮ For anyone interested in this area, I have money for travel to the rest of the network (NC State, U of Washington, NCAR, etc.).
◮ For any graduate student interested in this area, I can also pay your salary while you are visiting another member of the network.
◮ If someone has postdoc money, I may be able to split the cost of a postdoc for research related to network goals.
