Covariance Matrix Estimation for the Cryo-EM Heterogeneity Problem - - PowerPoint PPT Presentation

covariance matrix estimation for the cryo em
SMART_READER_LITE
LIVE PREVIEW

Covariance Matrix Estimation for the Cryo-EM Heterogeneity Problem - - PowerPoint PPT Presentation

Covariance Matrix Estimation for the Cryo-EM Heterogeneity Problem Amit Singer Princeton University Department of Mathematics and Program in Applied and Computational Mathematics July 25, 2014 Joint work with Gene Katsevich (Princeton) and


slide-1
SLIDE 1

Covariance Matrix Estimation for the Cryo-EM Heterogeneity Problem

Amit Singer

Princeton University Department of Mathematics and Program in Applied and Computational Mathematics

July 25, 2014

Joint work with Gene Katsevich (Princeton) and Alexander Katsevich (UCF)

Amit Singer (Princeton University) July 2014 1 / 41

slide-2
SLIDE 2

Single Particle Cryo-Electron Microscopy

Drawing of the imaging process:

Amit Singer (Princeton University) July 2014 2 / 41

slide-3
SLIDE 3

The X-ray Transform

(Psφ)(x, y) =

  • R

φ(RT

s r)dz,

where r = (x, y, z)T , Rs ∈ SO(3), and φ ∈ L2(B) is the electric potential

  • f the molecule.

Projection Is Molecule φ Electron source Rs =   | | | R1

s

R2

s

R3

s

| | |   ∈ SO(3)

Amit Singer (Princeton University) July 2014 3 / 41

slide-4
SLIDE 4

The cryo-EM problem

If S : L2(D) → Rq is discretization onto N × N grid (q ≈ π

4 N2) then

the microscope produces Is = SPsφ + ǫs, s = 1, . . . , n, where ǫs is noise. Nyquist ⇒ can reconstruct only up to a bandlimit W . Natural search space for φ: space B of band limited functions. Suppose dim(B) = p. Cryo-EM problem: Given Is, estimate Rs and find approximation of φ in B.

Amit Singer (Princeton University) July 2014 4 / 41

slide-5
SLIDE 5

The heterogeneity problem

What if the molecule has more than one possible structure?

(Image source: H. Liao and J. Frank, Classification by bootstrapping in single particle methods, Proceedings of the 2010 IEEE international conference on biomedical imaging, 2010.) Amit Singer (Princeton University) July 2014 5 / 41

slide-6
SLIDE 6

The heterogeneity problem (discrete conformational space)

Problem (1)

φ : Ω × R3 → R, with P[φ = φc] = pc, c = 1, . . . , C. φ1, . . . , φn are i.i.d. samples from φ Let Is = SPsφs + ǫs, where P1, . . . , Pn are obtained from i.i.d. rotations Rs (independent

  • f the φs) drawn from a distribution over SO(3),

and ǫs ∼ N(0, σ2Ip), independent of φs, Ps. Given Rs, Is, find the number of classes C, approximations of the molecular structures φc in B, and their probabilities pc.

Amit Singer (Princeton University) July 2014 6 / 41

slide-7
SLIDE 7

Previous work

Common lines (e.g., Shatsky et al, 2010) Maximum likelihood (e.g., Wang et al, 2013) Bootstrapping and PCA (e.g., Penczek et al, 2011)

Amit Singer (Princeton University) July 2014 7 / 41

slide-8
SLIDE 8

The heterogeneity problem: finite-dimensional formulation

For purposes of this talk, suppose that φc ∈ B. Note that Ps := SPs|B is a linear operator Ps : Rp → Rq. If Xs is basis representation of φs, then Is = PsXs + ǫs. X1, . . . , Xn are i.i.d samples of a random vector X in Rp. When X has C classes, Σ = Var[X] has rank C − 1. Penczek (2011) proposed a (suboptimal) method to use the top C − 1 eigenvectors of Σ to solve the heterogeneity problem. How to estimate Σ?

Amit Singer (Princeton University) July 2014 8 / 41

slide-9
SLIDE 9

Covariance Estimation from Projected Data

Consider the following general statistical estimation problem:

Problem (2)

X is a random vector on Cp, with E[X] = µ and Var(X) = Σ. X1, . . . , Xn are i.i.d. samples from X Let Is = PsXs + ǫs, s = 1, . . . , n, P1, . . . , Pn : Cp → Cq are i.i.d. samples (independent of the Xs) from a distribution over Cq×p (q ≤ p) ǫs are i.i.d. noises such that E[ǫs] = 0, Var[ǫs] = σ2Iq, independent of Xs, Ps Given Is, Ps, and σ2, estimate µ and Σ.

Amit Singer (Princeton University) July 2014 9 / 41

slide-10
SLIDE 10

PCA from projected data: Well-known special cases

Is = PsXs + ǫs, s = 1, . . . , n. If p = q and Ps = Ip, we have the usual covariance estimation problem. When Ps are coordinate-selection operators, we have a missing data statistics problem (Little and Rubin, 1987). A special case of the matrix sensing problem (Recht, Fazel, Parrilo, 2010) Here we do not attempt to impute the missing entries / estimate Xs assuming Σ has low rank. Instead we shall try to estimate Σ and its rank directly.

Amit Singer (Princeton University) July 2014 10 / 41

slide-11
SLIDE 11

Obtaining an estimator

Is = PsXs + ǫs. Note that E[Is] = Psµ; Var[Is] = PsΣP∗

s + σ2I.

Can construct estimators µn and Σn as solutions of optimization problem: µn = argmin

µ n

  • s=1

Is − Psµ2 Σn = argmin

Σ n

  • s=1
  • (Is − Psµn)(Is − Psµn)∗ − (PsΣP∗

s + σ2I)

  • 2

F .

Σn is not constraint to be positive semidefinite: we are typically interested only in the few leading eigenvectors of Σn.

Amit Singer (Princeton University) July 2014 11 / 41

slide-12
SLIDE 12

Resulting linear equations

These lead to linear equations for µn and Σn: Anµn := 1 n n

  • s=1

P∗

s Ps

  • µn = 1

n

n

  • s=1

P∗

s Is

LnΣn := 1 n

n

  • s=1

P∗

s PsΣnP∗ s Ps

= 1 n

n

  • s=1

P∗

s (Is − Psµn)(Is − Psµn)∗Ps − σ2

n

n

  • s=1

P∗

s Ps.

Amit Singer (Princeton University) July 2014 12 / 41

slide-13
SLIDE 13

Connection to missing-data statistics

We recover available-case estimators of mean and covariance Known to be consistent under the missing completely at random (MCAR) assumption

Amit Singer (Princeton University) July 2014 13 / 41

slide-14
SLIDE 14

PCA in high dimensions

For the sample covariance matrix Σn = 1 n

n

  • i=1

xixT

i ,

X ∼ N(0, Ip) the asymptotic regime p, n → ∞ with p/n → γ (“high dimensional statistics”) is by now well understood (Johnstone, ICM 2006) Marcenko-Pastur distribution, Tracy-Widom distribution, spiked model (Σ = Ip + σ2vv T), phase transition.

Amit Singer (Princeton University) July 2014 14 / 41

slide-15
SLIDE 15

Connection to high-dimensional PCA

1 n

n

  • s=1

P∗

s PsΣnP∗ s Ps = 1

n

n

  • s=1

P∗

s (Is − Psµn)(Is − Psµn)∗Ps − σ2

n

n

  • s=1

P∗

s Ps.

When Ps = I, we recover the sample covariance matrix How does the spectrum of Σn behave as a function of n, p, q and the geometry of Ps? Empirical spectral density? Distribution of largest eigenvalue? Spiked model?

Amit Singer (Princeton University) July 2014 15 / 41

slide-16
SLIDE 16

Theoretical results about Σn: Preliminary Study

Proposition

[Asymptotic Consistency, fixed p] Suppose An → A and Ln → L, where A, L are invertible. Suppose Ps is uniformly bounded and that X has bounded fourth moments. Then, E[Σn] − Σ = O 1 n

  • ;

Var[Σn] = O 1 n

  • ,

Proposition

[Finite Sample Bounds] Suppose X ≤ C1, ǫs ≤ C2, Ps ≤ C3 a.s. Assume that X is centered. If λ = λmin(Ln) and B = C 2

3 C1 + C3C2, then

P {Σ − Σn ≥ t} ≤ p exp −3nλ2t2/2 B4 + 2B2λt

  • ,

t ≥ 0.

Amit Singer (Princeton University) July 2014 16 / 41

slide-17
SLIDE 17

Regularization

Extreme example: q = 2, Ps are random coordinate selection

  • perators.

Coupon collector: n p2 log p for Ln to be invertible. Regularization? min

Σ n

  • s=1
  • (Is − Psµn)(Is − Psµn)∗ − (PsΣP∗

s + σ2I)

  • 2

F + λΣ2 F .

This would set unattainable entries of Σn to 0. While other regularization strategies are possible, e.g., W ΣW ∗1 to promote sparsity in some basis W , they come at a price of increased computational complexity.

Amit Singer (Princeton University) July 2014 17 / 41

slide-18
SLIDE 18

Back to Heterogeneity: Fourier Slice Theorem

If ˆ Ps is the Fourier-domain version of Ps, then ˆ Ps simply restricts Fourier volumes to central planes perpendicular to the viewing direction.

Amit Singer (Princeton University) July 2014 18 / 41

slide-19
SLIDE 19

The slice theorem and covariance matrix estimation

The Fourier slice theorem: Better work in Fourier domain ˆ φs = Fφs, ˆ Is = FIs Estimate ˆ µ, ˆ Σ and then inverse transform to get µ, Σ. ˆ Ps is a restriction to a central plane corresponding to Rs. In 3D, for any pair of frequencies we can find a central plane that passes through them — coordinate selection operators in the continuous limit. The covariance of 2D objects from their 1D line projections cannot be estimated.

Amit Singer (Princeton University) July 2014 19 / 41

slide-20
SLIDE 20

The ramp filter

ˆ Anˆ µn = 1

n

n

s=1 ˆ

P∗

s ˆ

Ps

  • ˆ

µn In the continuous limit: replace Ps with Ps and average with integral

  • ver SO(3) with respect to the Haar measure.

Fourier slice theorem: ( ˆ P∗

s ˆ

Ps ˆ φ)(ρ) = ˆ φ(ρ)δ(ρ · R3

s ).

ˆ Aˆ µ(ρ) = ˆ µ(ρ) 1 4π

  • S2 δ(ρ · θ) dθ = ˆ

µ(ρ) |ρ| 1 4π

  • S2 δ( ρ

|ρ| · θ) dθ = ˆ µ(ρ) 2|ρ| .

Amit Singer (Princeton University) July 2014 20 / 41

slide-21
SLIDE 21

The triangular area filter

ˆ Ln ˆ Σn := 1

n

n

s=1 ˆ

P∗

s ˆ

Ps ˆ Σn ˆ P∗

s ˆ

Ps Fourier slice theorem: ( ˆ P∗

s ˆ

Ps ˆ φ)(ρ) = ˆ φ(ρ)δ(ρ · R3

s ).

( ˆ P∗

s ˆ

Ps ˆ Σ ˆ P∗

s ˆ

Ps)(ρ1, ρ2) = ˆ Σ(ρ1, ρ2)δ(ρ1 · R3

s )δ(ρ2 · R3 s )

Integrate over R3

s :

( ˆ Lˆ Σ)(ρ1, ρ2) = ˆ Σ(ρ1, ρ2)K(ρ1, ρ2). ˆ L is a diagonal operator, with K(ρ1, ρ2) = 1 4π

  • S2 δ(ρ1 · θ)δ(ρ2 · θ) dθ = 1

4π 2 |ρ1 × ρ2|. K is the density of planes that go through frequencies ρ1, ρ2 The filter 1/K is proportional to the area of a triangle with nodes at ρ1, ρ2, and the origin.

Amit Singer (Princeton University) July 2014 21 / 41

slide-22
SLIDE 22

The triangular area filter

( ˆ Lˆ Σ)(ρ1, ρ2) = ˆ Σ(ρ1, ρ2)K(ρ1, ρ2). K(ρ1, ρ2) = 1 4π

  • S2 δ(ρ1 · θ)δ(ρ2 · θ) dθ = 1

4π 2 |ρ1 × ρ2|.

Amit Singer (Princeton University) July 2014 22 / 41

slide-23
SLIDE 23

Challenge: discretization

Now that we are in the Fourier domain, how do we discretize? If we using a Cartesian grid and use nearest neighbors interpolation then we incur large discretization error. If the volumes are 100 × 100 × 100 voxels, then Σ is of size 106 × 106. Low pass filter approximation might be unavoidable. We need to respect rotation and restriction to planes (i.e., spherical coordinates), but then ˆ Ln is no longer diagonal and we can easily run into computational problems with the inversion of ˆ Ln. We need a representation of volumes in which ˆ Ln is sparse.

Amit Singer (Princeton University) July 2014 23 / 41

slide-24
SLIDE 24

Finding the right basis

Idea: Find image and volume domains ˆ I and ˆ V (in Fourier space) so that ˆ Ps(ˆ V) ⊂ ˆ I. Construction: ˆ I = span

  • 1

√ 2π fk(r)eimϕ

  • ;

ˆ V = span({fk(r)Y m

ℓ (θ, ϕ)}).

Note: resulting matrix ˆ Ps is block-diagonal. This is a bit unusual construction, since typically the radial functions depend on the angular frequency ℓ (e.g., Spherical-Bessel, 3D Slepian) We have to be very careful with our construction so that our basis approximately spans B.

Amit Singer (Princeton University) July 2014 24 / 41

slide-25
SLIDE 25

Fourier linear equations

The linear equations in the Fourier domain are essentially the same: ˆ Anˆ µn := n

  • s=1

ˆ P∗

s ˆ

Ps

  • ˆ

µn = ˆ a; ˆ Ln ˆ Σn :=

n

  • s=1

ˆ P∗

s ˆ

Ps ˆ Σn ˆ P∗

s ˆ

Ps = ˆ B, Note: ˆ Ln is block-diagonal (sparsity!), acting on the blocks ˆ Σk1,k2

  • separately. Denote each “block” of ˆ

Ln by ˆ Lk1,k2

n

.

Amit Singer (Princeton University) July 2014 25 / 41

slide-26
SLIDE 26

Challenge: size of ˆ Ln

We still must deal with the enormous size of ˆ Ln, where ˆ Ln ˆ Σn =

n

  • s=1

ˆ P∗

s ˆ

Ps ˆ Σn ˆ P∗

s ˆ

Ps.

Amit Singer (Princeton University) July 2014 26 / 41

slide-27
SLIDE 27

Sum to integral

If rotations are uniformly distributed, we have ˆ Ln ˆ Σn = 1 n

n

  • s=1

ˆ P∗

s ˆ

Ps ˆ Σn ˆ P∗

s ˆ

Ps → 1 4π

  • S2

ˆ P(θ)∗ ˆ P(θ)ˆ Σn ˆ P(θ)∗ ˆ P(θ)dθ = ˆ Lˆ Σn, a.s.

Amit Singer (Princeton University) July 2014 27 / 41

slide-28
SLIDE 28

What is ˆ L?

Let us view ˆ Σk1,k2 : S2 × S2 → C as bandlimited bilinear form. It turns out that ˆ Lk1,k2 ˆ Σk1,k2 = LPFk1,k2(ˆ Σk1,k2 · K), where K(α, β) = 1 4π 2 |α × β|. Thus, ˆ L is no longer diagonal... ...but, ˆ L turns out to be sparse! This can be shown using Clebsch-Gordan coefficients for products of spherical harmonics.

Amit Singer (Princeton University) July 2014 28 / 41

slide-29
SLIDE 29

Sparsity of ˆ L

Only about 6% of ˆ L15,15 are nonzero.

Figure : Sparsity structure of (block of) ˆ L

Amit Singer (Princeton University) July 2014 29 / 41

slide-30
SLIDE 30

Condition number and computational complexity

1

ˆ L is self-adjoint and positive semidefinite

2 λmin(ˆ

L) ≥ 2 (implies consistency)

3 Conjecture: λmax(ˆ

Lk1,k2) grows linearly with min(k1, k2)

4 Conjugate gradient requires O

  • κ(ˆ

Lk1,k2)

  • iterations, where the

cost of each iteration is nnz(ˆ Lk1,k2).

5 Computational complexity: (here Nres = 2

πwmax)

O(nN2N2

res + N9.5 res )

but all steps can be run in parallel (e.g., ˆ L is block-diagonal).

Amit Singer (Princeton University) July 2014 30 / 41

slide-31
SLIDE 31

Numerical results: Two classes

Synthetic dataset: simulate heterogeneity using two “toy” volumes.

Figure : Slices of the toy volumes.

Note: ideal covariance matrix should have rank 1.

Amit Singer (Princeton University) July 2014 31 / 41

slide-32
SLIDE 32

Numerical results: Two classes

Noisy projection images:

(a) Clean (b) SNR = 0.05 (c) SNR = 0.01

Amit Singer (Princeton University) July 2014 32 / 41

slide-33
SLIDE 33

Numerical results: Two classes

Reconstructions from noisy data:

(a) Clean (b) SNR = 0.01 (c) SNR = 0.006 (d) SNR = 0.002 Figure : Reconstructions of mean and top eigenvector for three different SNR

  • values. The top row contains clean and reconstructed slices of the mean; the

bottom row contains slices of the top eigenvector.

Amit Singer (Princeton University) July 2014 33 / 41

slide-34
SLIDE 34

Numerical results: Two classes

Figure : Correlations of true and computed top eigenvectors for various SNRs.

Amit Singer (Princeton University) July 2014 34 / 41

slide-35
SLIDE 35

Numerical results: Two classes

Figure : Corresponding eigenvalue histograms.

Amit Singer (Princeton University) July 2014 35 / 41

slide-36
SLIDE 36

Numerical results: Three classes

Ideal Σ has rank 2.

Figure : Eigenvalue histograms of reconstructed covariance matrix in the three class case for three SNR values. Note that the noise distribution first engulfs the second eigenvalue, and eventually the top eigenvalue as well.

Amit Singer (Princeton University) July 2014 36 / 41

slide-37
SLIDE 37

Numerical results: Three classes – Clustering

Least squares and a Gaussian mixture model: ˆ Xs ≈ ˆ µn +

C−1

  • c′=1

αs,c′ ˆ V c′.

C−1

  • c′=1

αs,c′ ˆ Ps ˆ V c′ ≈ ˆ Ps ˆ Xs − ˆ Ps ˆ µn ≈ ˆ Is − ˆ Ps ˆ µn.

Figure : The coordinates αs for the three class case, colored according to true class.

Amit Singer (Princeton University) July 2014 37 / 41

slide-38
SLIDE 38

Numerical results: Continuous Variability

Amit Singer (Princeton University) July 2014 38 / 41

slide-39
SLIDE 39

Summary

It is possible to do PCA from projected data and estimate rank without imputation. Open problems for the distribution of eigenvalues of Σn in the “large n, large p” regime (also need consider q and distribution of Ps) Generalized the ramp filter of classical computerized tomography to the triangular area filter for tomographic inversion of structural variability. A careful selection of bases and the discovery of a sparse structure allowed us to tractably solve the heterogeneity problem. We began the study of the projection covariance transform L, an

  • perator of interest in tomographic problems involving variability.

Amit Singer (Princeton University) July 2014 39 / 41

slide-40
SLIDE 40

Future research directions

Random matrix theory investigation for high dimensional PCA of projected data Compare approach with matrix sensing/completion methods Study ˆ L further Include contrast transfer function (CTF) in the forward model Add and examine regularization schemes Relax uniform rotations assumption Test on real datasets Other applications (e.g., 4D tomography)

Amit Singer (Princeton University) July 2014 40 / 41

slide-41
SLIDE 41

Thank You!

Funding: NIH/NIGMS R01GM090200 AFOSR FA9550-12-1-0317 Simons Foundation LTR DTD 06-05-2012 DARPA N66001-13-1-4052

  • G. Katsevich, A. Katsevich, A. Singer (submitted). Covariance Matrix Estimation

for the Cryo-EM Heterogeneity Problem. http://arxiv.org/abs/1309.1737

Amit Singer (Princeton University) July 2014 41 / 41