A Two-Level Toeplitz Model for Large-Scale Simultaneous Hypothesis Testing

SLIDE 1

Empirical Bayes Methods for Simultaneous Hypothesis Testing Estimating Level II Covariance Matrix Data Results Conclusion

A Two-Level Toeplitz Model for Large-Scale Simultaneous Hypothesis Testing

Dan Cervone

Advisor: Carl Morris

December 10, 2012

SLIDE 2

1. Empirical Bayes Methods for Simultaneous Hypothesis Testing
2. Estimating Level II Covariance Matrix
3. Data Results
4. Conclusion

SLIDE 3

Efron’s fdr [3]

Suppose we have $M$ test statistics (assumed to be z scores):

$$z_i \mid \mu_i \overset{\text{ind}}{\sim} N(\mu_i, 1), \quad i = 1, \dots, M,$$
$$\mu_i \overset{\text{iid}}{\sim} p_0 \delta_0 + (1 - p_0)\, g(\mu_i),$$
$$z_i \overset{\text{iid}}{\sim} f(z_i) \quad \text{(marginally)}.$$

Define

$$\mathrm{fdr}(z) = P(\mu_i = 0 \mid z_i = z) = \frac{p_0 \varphi(z)}{f(z)}, \qquad \widehat{\mathrm{fdr}}(z) = \frac{p_0 \varphi(z)}{\hat{f}(z)},$$

where $\varphi$ is the standard normal density and $\hat{f}$ is an estimate of the density function $f$.

Declare the $i$th test statistic nonnull if $\widehat{\mathrm{fdr}}(z_i) \le q$.

The Bayesian posterior probability interpretation relies on independence!
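A minimal sketch of the local fdr rule above, in Python with NumPy. The simulated z scores, the values of p0 and q, the point-mass alternative at 4, and the helper names `kde` and `local_fdr` are all assumptions made for illustration. In particular, a plain Gaussian kernel density estimate stands in for $\hat{f}$; the slides do not say which density estimator is used.

```python
import numpy as np

def normal_pdf(x):
    """Standard normal density phi(x)."""
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def kde(points, data, bandwidth):
    """Gaussian kernel density estimate of f evaluated at `points` (assumed estimator)."""
    u = (points[:, None] - data[None, :]) / bandwidth
    return normal_pdf(u).mean(axis=1) / bandwidth

def local_fdr(z, p0, bandwidth=0.3):
    """Estimated fdr(z_i) = p0 * phi(z_i) / f_hat(z_i), capped at 1."""
    f_hat = kde(z, z, bandwidth)
    return np.minimum(1.0, p0 * normal_pdf(z) / f_hat)

# Hypothetical data: 90% nulls (mu_i = 0), 10% non-nulls with mu_i = 4.
rng = np.random.default_rng(0)
M, p0, q = 1000, 0.9, 0.2
is_null = rng.random(M) < p0
mu = np.where(is_null, 0.0, 4.0)
z = rng.normal(mu, 1.0)

fdr_hat = local_fdr(z, p0)
nonnull = fdr_hat <= q          # cases declared nonnull
```

Here p0 is treated as known; in practice it would also have to be estimated or bounded.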

SLIDE 4

Graphical summary of HIV data z scores

[Figure: HIV data z scores. x-axis: (gene) index; y-axis: z score.]

z scores obtained by transforming test statistics for 2-sample t-tests (4 HIV patients, 4 non-HIV patients) [5].

SLIDE 5

Autocorrelation of HIV z scores

[Figure: two panels of autocorrelation (y-axis, 0 to 1) against lag (x-axis), one for lags up to 20 and one for lags up to 200.]
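The quantity plotted above is a sample autocorrelation by lag. A minimal sketch of one common estimator follows, assuming the usual biased (divide by the lag-0 sum of squares) normalisation. The function name and the AR(1) test series are hypothetical, since the HIV z scores themselves are not part of this transcript.

```python
import numpy as np

def sample_autocorrelation(z, max_lag):
    """Sample autocorrelation of z at lags 0..max_lag."""
    z = np.asarray(z, dtype=float)
    centered = z - z.mean()
    denom = np.dot(centered, centered)
    return np.array([
        np.dot(centered[: len(z) - k], centered[k:]) / denom
        for k in range(max_lag + 1)
    ])

# Stand-in series: AR(1) with coefficient 0.8, so the lag-k autocorrelation is about 0.8**k.
rng = np.random.default_rng(1)
x = np.zeros(5000)
for t in range(1, len(x)):
    x[t] = 0.8 * x[t - 1] + rng.normal()

acf = sample_autocorrelation(x, 20)
```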

SLIDE 6

Alternative two-level model

We assume the following two-level model for z scores:

$$z_i \mid \mu_i \overset{\text{ind}}{\sim} N(\mu_i, V = 1), \quad i = 1, \dots, M,$$
$$\mu \sim N_M(0, \Sigma),$$

where $\Sigma$ is assumed to be of symmetric Toeplitz form:

$$\Sigma = \begin{pmatrix}
\sigma_0 & \sigma_1 & \sigma_2 & \cdots & \cdots & \sigma_{M-1} \\
\sigma_1 & \sigma_0 & \sigma_1 & \ddots & & \vdots \\
\sigma_2 & \sigma_1 & \sigma_0 & \ddots & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & \ddots & \sigma_2 \\
\vdots & & \ddots & \ddots & \ddots & \sigma_1 \\
\sigma_{M-1} & \cdots & \cdots & \sigma_2 & \sigma_1 & \sigma_0
\end{pmatrix}$$
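To make the two levels concrete, the sketch below builds a symmetric Toeplitz $\Sigma$ from a vector of autocovariances and draws one data set from the model. The geometric autocovariance sequence, the value of M, and the helper name `toeplitz_cov` are assumptions chosen for illustration only.

```python
import numpy as np

def toeplitz_cov(sigma):
    """Symmetric Toeplitz matrix whose (i, j) entry is sigma[|i - j|]."""
    sigma = np.asarray(sigma, dtype=float)
    n = len(sigma)
    lags = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    return sigma[lags]

M, V = 200, 1.0
sigma = 2.0 * 0.7 ** np.arange(M)    # hypothetical sigma_0, ..., sigma_{M-1}
Sigma = toeplitz_cov(sigma)

rng = np.random.default_rng(2)
mu = rng.multivariate_normal(np.zeros(M), Sigma)   # level II: mu ~ N_M(0, Sigma)
z = rng.normal(mu, np.sqrt(V))                     # level I: z_i | mu_i ~ N(mu_i, V)
```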

SLIDE 8

Inference

Inferential model:

$$\mu \mid z, \Sigma \sim N_M(Bz/V,\; B), \qquad B = (\Sigma^{-1} + V^{-1} I_M)^{-1},$$
$$z \sim N_M(0,\; \Sigma + V I_M).$$

Empirical Bayes approach: estimate $\Sigma$ from the marginal likelihood and plug the estimate into the posterior.

Compared to existing approaches for handling dependent test statistics, ours has the following benefits:

  • The decision rule is not monotonic in the size of the test statistic.

  • Generic covariance structure, but at the cost of assuming normality.
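The posterior above can be computed without inverting $\Sigma$, using the identity $B = (\Sigma^{-1} + V^{-1} I_M)^{-1} = V\,(\Sigma + V I_M)^{-1}\Sigma$. A minimal sketch follows; the function name `posterior`, the small 5 x 5 covariance, and the z values are hypothetical, and $\Sigma$ is taken as known here rather than estimated.

```python
import numpy as np

def posterior(z, Sigma, V=1.0):
    """Mean and covariance of mu | z, Sigma under the two-level normal model."""
    M = len(z)
    marginal = Sigma + V * np.eye(M)            # covariance of z
    B = V * np.linalg.solve(marginal, Sigma)    # V (Sigma + V I)^{-1} Sigma
    B = 0.5 * (B + B.T)                         # remove round-off asymmetry
    return B @ z / V, B

# Hypothetical example with sigma_k = 2 * 0.7**k.
lags = np.abs(np.arange(5)[:, None] - np.arange(5)[None, :])
Sigma = 2.0 * 0.7 ** lags
z = np.array([0.5, 2.5, 3.0, 2.0, -0.2])

post_mean, B = posterior(z, Sigma)
post_sd = np.sqrt(np.diag(B))
```

Because each posterior mean is a weighted combination of all the z scores, a moderate $z_i$ surrounded by large neighbours can receive stronger evidence than a larger isolated one, which is one way to read the non-monotonicity remark.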

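SLIDE 11

Data augmentation

Consider the following data augmentation [2]: Let $y^T = (z^T \;\; z_{\text{mis}}^T)$, where $z_{\text{mis}}$ is an $(M - 1) \times 1$ vector of missing observations. Assume $y \sim N_L(0, \Sigma_C + V I_L)$ with $\Sigma_C$ (symmetric) circulant and $L = 2M - 1$.

Example: Assume $M = 4$, so $L = 7$. $\Sigma_C$ has the form:

$$\Sigma_C = \begin{pmatrix}
\sigma_0 & \sigma_1 & \sigma_2 & \sigma_3 & \sigma_3 & \sigma_2 & \sigma_1 \\
\sigma_1 & \sigma_0 & \sigma_1 & \sigma_2 & \sigma_3 & \sigma_3 & \sigma_2 \\
\sigma_2 & \sigma_1 & \sigma_0 & \sigma_1 & \sigma_2 & \sigma_3 & \sigma_3 \\
\sigma_3 & \sigma_2 & \sigma_1 & \sigma_0 & \sigma_1 & \sigma_2 & \sigma_3 \\
\sigma_3 & \sigma_3 & \sigma_2 & \sigma_1 & \sigma_0 & \sigma_1 & \sigma_2 \\
\sigma_2 & \sigma_3 & \sigma_3 & \sigma_2 & \sigma_1 & \sigma_0 & \sigma_1 \\
\sigma_1 & \sigma_2 & \sigma_3 & \sigma_3 & \sigma_2 & \sigma_1 & \sigma_0
\end{pmatrix}$$

The upper left $M \times M$ block is (unconstrained) symmetric Toeplitz.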
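The embedding in the $M = 4$ example can be reproduced numerically. The sketch below builds the $(2M - 1) \times (2M - 1)$ circulant from $\sigma_0, \dots, \sigma_{M-1}$ and reads its eigenvalues off the discrete Fourier transform of the first row. The function name `circulant_embedding` and the numeric values 4, 3, 2, 1 (standing in for $\sigma_0, \dots, \sigma_3$) are assumptions made for readability.

```python
import numpy as np

def circulant_embedding(sigma):
    """Symmetric circulant with first row sigma_0..sigma_{M-1}, sigma_{M-1}..sigma_1."""
    sigma = np.asarray(sigma, dtype=float)
    first_row = np.concatenate([sigma, sigma[:0:-1]])
    L = len(first_row)
    shift = (np.arange(L)[None, :] - np.arange(L)[:, None]) % L   # (j - i) mod L
    return first_row[shift]

C = circulant_embedding([4.0, 3.0, 2.0, 1.0])   # stand-ins for sigma_0..sigma_3
toeplitz_block = C[:4, :4]                      # upper-left M x M block
eigenvalues = np.fft.fft(C[0]).real             # real because the first row is symmetric
```

The eigenvectors are the Fourier vectors regardless of the $\sigma$ values, so only the eigenvalues depend on the parameters.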

SLIDE 12

EM algorithm

Using the augmented (complete) data $y$, we use the EM algorithm to estimate $\hat{\Sigma}_C$.

E-step:

$$Q(\Sigma_C \mid z, \hat{\Sigma}_C^{(k)}) = -\log(|\Sigma_C|) - \mathrm{Tr}(\Sigma_C^{-1} S^{(k)}),$$

with $S^{(k)}$ derived from $z$, $\hat{\Sigma}_C^{(k)}$ using MVN properties.

M-step:

$$\hat{\Sigma}_C^{(k+1)} = \arg\max_{\Sigma_C} Q(\Sigma_C \mid z, \hat{\Sigma}_C^{(k)}).$$

This has a closed-form solution, since all $\Sigma_C$ have constant, known eigenvectors (entries consist of powers of complex roots of unity).

Some (minor) technical considerations are needed to ensure convergence of $\hat{\Sigma}^{(k)}$ to a local maximum [4].
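A minimal sketch of an EM iteration of this kind, not the author's implementation. Assumptions: the code works with the total covariance $C = \Sigma_C + V I_L$ (also circulant), takes $S^{(k)} = E[y y^T \mid z]$ from the usual multivariate normal conditioning formulas, and keeps every eigenvalue of $C$ at or above $V$ so that $\Sigma_C$ stays nonnegative definite. That floor is a guess at one of the "technical considerations", and all function names and the simulated data are hypothetical.

```python
import numpy as np

def circulant_from_eigs(lam):
    """Real symmetric circulant with eigenvalues lam (requires lam[j] == lam[L - j])."""
    first_col = np.fft.ifft(lam).real
    L = len(lam)
    shift = (np.arange(L)[:, None] - np.arange(L)[None, :]) % L
    return first_col[shift]

def e_step(z, C):
    """S = E[y y^T | z] for y = (z, z_mis) ~ N(0, C)."""
    M = len(z)
    C11, C12, C22 = C[:M, :M], C[:M, M:], C[M:, M:]
    A = np.linalg.solve(C11, C12)            # C11^{-1} C12
    y_hat = np.concatenate([z, A.T @ z])     # z and E[z_mis | z]
    S = np.outer(y_hat, y_hat)
    S[M:, M:] += C22 - C12.T @ A             # conditional covariance of z_mis
    return S

def m_step(S, V):
    """Eigenvalues maximising -log|C| - Tr(C^{-1} S) over circulants with eigenvalues >= V."""
    L = S.shape[0]
    F = np.fft.fft(np.eye(L)) / np.sqrt(L)   # unitary DFT matrix: the known eigenvectors
    s = np.real(np.einsum("ja,ab,jb->j", F, S, F.conj()))
    s = 0.5 * (s + np.roll(s[::-1], 1))      # enforce s[j] == s[L - j] exactly
    return np.maximum(s, V)

def observed_loglik(z, C):
    """Log-likelihood of z ~ N(0, C[:M, :M]), up to an additive constant."""
    C11 = C[:len(z), :len(z)]
    return -0.5 * (np.linalg.slogdet(C11)[1] + z @ np.linalg.solve(C11, z))

def em_circulant(z, V=1.0, n_iter=30):
    """Return the estimate of Sigma_C and the observed log-likelihood path."""
    L = 2 * len(z) - 1
    lam = np.full(L, V + np.mean(z**2))      # start from a white covariance
    history = []
    for _ in range(n_iter):
        C = circulant_from_eigs(lam)
        history.append(observed_loglik(z, C))
        lam = m_step(e_step(z, C), V)
    C = circulant_from_eigs(lam)
    history.append(observed_loglik(z, C))
    return C - V * np.eye(L), np.array(history)

# Hypothetical data: one draw of z with Toeplitz Sigma (sigma_k = 2 * 0.7**k) and V = 1.
rng = np.random.default_rng(3)
M = 30
lags = np.abs(np.arange(M)[:, None] - np.arange(M)[None, :])
z = rng.multivariate_normal(np.zeros(M), 2.0 * 0.7 ** lags + np.eye(M))

Sigma_C_hat, loglik = em_circulant(z)
Sigma_hat = Sigma_C_hat[:M, :M]              # Toeplitz estimate of Sigma
```

The M-step needs no numerical optimisation: in the Fourier basis the objective separates into one term per eigenvalue, each maximised at the corresponding diagonal entry of the rotated $S^{(k)}$.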

SLIDE 13

Large sample properties

The MLE $\hat{\Sigma}$ is not consistent in the usual sense, as the number of parameters is the same as the number of observations ($M$). However,

  • Information for all unknown parameters increases with each new observation.

  • Geometric constraints of Toeplitz form, positive definiteness.

  • Simulation results show good approximation of EB posterior to oracle posterior.

  • If the autocovariances form a convergent sum, their estimates have variance $O(1/L)$.

  • Related results from the literature involving conditions of sparsity, or smoothness of spectral density [1][6].

SLIDE 15

HIV data

[Figure: HIV data z scores. x-axis: (gene) index; y-axis: z score.]

SLIDE 16

HIV data

[Figure: fdr vs. posterior probability in (−2,2). x-axis: posterior probability in (−2,2); y-axis: fdr value; both axes run from 0 to 1.]

9 cases identified as non-null by both; 99.4% of cases identified as null by both.
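The horizontal axis above is the posterior probability that $\mu_i$ lies in (−2,2). Under the normal posterior of the two-level model this is a difference of two normal CDF values, sketched below. The function name and the example means and standard deviations are hypothetical, and the slides do not state the cutoff used to call a case non-null, so none is applied here.

```python
import math

def prob_in_interval(mean, sd, lo=-2.0, hi=2.0):
    """P(lo < mu < hi) when mu ~ N(mean, sd**2)."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))
    return cdf(hi) - cdf(lo)

p_null_like = prob_in_interval(0.0, 1.0)    # posterior centred at 0
p_shifted = prob_in_interval(5.0, 0.5)      # posterior concentrated far from 0
```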

SLIDE 19

HIV data

[Figure: Discrepancies: fdr vs. EB posterior. x-axis: index (about 40 to 80); y-axis: z score. Legend: observed z score; posterior mean +/− 2SD. Annotation: fdr = 1.]

SLIDE 20

HIV data

[Figure: Discrepancies: fdr vs. EB posterior. x-axis: index (about 2600 to 2640); y-axis: z score. Legend: observed z score; posterior mean +/− 2SD. Annotation: fdr = 0.269.]

SLIDE 22

Areas of further work

  • Theoretical conditions for consistency of the Toeplitz MLE.

  • Full Bayes: prior on matrix parameters.

  • What can this model tell us if the Toeplitz covariance is correctly specified but without normality?

  • Scalability: the $O(L^3)$ implementation can, in theory, be dramatically improved.
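On the scalability point, one standard ingredient (offered as an illustration, not as the improvement the author has in mind) is that products and solves with a circulant matrix cost $O(L \log L)$ via the FFT instead of $O(L^2)$ or $O(L^3)$ with dense algebra. The function names and the example values are assumptions.

```python
import numpy as np

def circulant_matvec(first_col, x):
    """C @ x for the circulant C with first column `first_col`, via the FFT."""
    return np.fft.ifft(np.fft.fft(first_col) * np.fft.fft(x)).real

def circulant_solve(first_col, b):
    """Solve C x = b for the same C (all eigenvalues of C must be nonzero)."""
    return np.fft.ifft(np.fft.fft(b) / np.fft.fft(first_col)).real

# Hypothetical check against dense algebra for M = 4, L = 7, V = 1.
sigma = 2.0 * 0.7 ** np.arange(4)
first_col = np.concatenate([sigma, sigma[:0:-1]])
first_col[0] += 1.0                                   # Sigma_C + V * I
L = len(first_col)
dense = first_col[(np.arange(L)[:, None] - np.arange(L)[None, :]) % L]

rng = np.random.default_rng(4)
x = rng.normal(size=L)
fast_product = circulant_matvec(first_col, x)
fast_solution = circulant_solve(first_col, x)
```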

SLIDE 23

[1] R. Dahlhaus. Efficient parameter estimation for self-similar processes. The Annals of Statistics, pages 1749–1766, 1989.

[2] A. Dembo, C.L. Mallows, and L.A. Shepp. Embedding nonnegative definite Toeplitz matrices in nonnegative definite circulant matrices, with application to covariance estimation. Information Theory, IEEE Transactions on, 35(6):1206–1212, 1989.

[3] B. Efron, R. Tibshirani, J.D. Storey, and V. Tusher. Empirical Bayes analysis of a microarray experiment. Journal of the American Statistical Association, 96(456):1151–1160, 2001.

SLIDE 24

[4] D.R. Fuhrmann and M.I. Miller. On the existence of positive-definite maximum-likelihood estimates of structured covariance matrices. Information Theory, IEEE Transactions on, 34(4):722–729, 1988.

[5] A.B. Van’t Wout, G.K. Lehrman, S.A. Mikheeva, G.C. O’Keeffe, M.G. Katze, R.E. Bumgarner, G.K. Geiss, and J.I. Mullins. Cellular gene expression upon human immunodeficiency virus type 1 infection of CD4+-T-cell lines. Journal of Virology, 77(2):1392–1402, 2003.

[6] H. Xiao and W.B. Wu. Covariance matrix estimation for stationary time series. The Annals of Statistics, 40(1):466–493, 2012.