

SLIDE 1

3D Structure Determination using Cryo-Electron Microscopy — Computational Challenges

Amit Singer

Princeton University Department of Mathematics and Program in Applied and Computational Mathematics

July 28, 2015 Mathematics in Data Science

Amit Singer (Princeton University) July 2015 1 / 28

SLIDE 2

Single Particle Reconstruction using cryo-EM

Schematic drawing of the imaging process: The cryo-EM problem:

SLIDE 3

New detector technology: Exciting times for cryo-EM

W. Kühlbrandt, "The Resolution Revolution", Science (Biochemistry), vol. 343, p. 1443, 28 March 2014, www.sciencemag.org

Advances in detector technology and image processing are yielding high-resolution electron cryo-microscopy structures of biomolecules.

Precise knowledge of the structure of macromolecules in the cell is essential for understanding how they function. Structures of large macromolecules can now be obtained at near-atomic resolution by averaging thousands of electron microscope images recorded before radiation damage accumulates. This is what Amunts et al. have done in their research article on page 1485 of this issue (1), reporting the structure of the large subunit of the mitochondrial ribosome at 3.2 Å resolution by electron cryo-microscopy (cryo-EM). Together with other recent high-resolution cryo-EM structures (2–4) (see the figure), this achievement heralds the beginning of a new era in molecular biology, where structures at near-atomic resolution are no longer the prerogative of x-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy.

Ribosomes are ancient, massive protein-RNA complexes that translate the linear genetic code into three-dimensional proteins. Mitochondria, semi-autonomous organelles ...

Figure (panels A, B, C): Near-atomic resolution with cryo-EM. (A) The large subunit of the yeast mitochondrial ribosome at 3.2 Å reported by Amunts et al. In the detailed view below, the base pairs of an RNA double helix and a magnesium ion (blue) are clearly resolved. (B) TRPV1 ion channel at 3.4 Å (2), with a detailed view of residues lining the ion pore on the four-fold axis of the tetrameric channel. (C) F420-reducing [NiFe] hydrogenase at 3.36 Å (3). The detail shows an α helix in the FrhA subunit with resolved side chains. The maps are not drawn to scale.

SLIDE 4

Big “Movie” Data, Publicly Available

http://www.ebi.ac.uk/pdbe/emdb/empiar/

SLIDE 5

Image Formation Model and Inverse Problem

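(Figure labels: electron source, molecule $\phi$, projection $I_i$.)

$$R_i = \begin{pmatrix} - & (R_i^1)^T & - \\ - & (R_i^2)^T & - \\ - & (R_i^3)^T & - \end{pmatrix} \in SO(3)$$

Projection images:

$$I_i(x, y) = \int_{-\infty}^{\infty} \phi(x R_i^1 + y R_i^2 + z R_i^3)\, dz + \text{"noise"}.$$

$\phi : \mathbb{R}^3 \to \mathbb{R}$ is the electric potential of the molecule.

Cryo-EM problem: Find $\phi$ and $R_1, \ldots, R_n$ given $I_1, \ldots, I_n$.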
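The forward model above can be illustrated with a small noiseless sketch. Everything specific here is an assumption made for illustration: the potential is a single Gaussian blob (the hypothetical `phi`), the integral is approximated by a Riemann sum over a finite z-range, and a rotation is stored as a list of its three rows, matching the row layout of $R_i$ on the slide.

```python
import math

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

def rotation_z(theta):
    """Rotation by theta about the z-axis, stored as a list of three rows."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def phi(p, center=(0.5, 0.0, 0.0), sigma=0.4):
    """Hypothetical potential: one Gaussian blob centred away from the origin."""
    d2 = sum((a - b) ** 2 for a, b in zip(p, center))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def project(R, x, y, z_max=4.0, n_z=801):
    """Approximate I(x, y) = int phi(x R^1 + y R^2 + z R^3) dz by a Riemann sum.

    R^1, R^2, R^3 are the rows of R; noise is omitted.
    """
    dz = 2.0 * z_max / (n_z - 1)
    total = 0.0
    for k in range(n_z):
        z = -z_max + k * dz
        p = [x * R[0][a] + y * R[1][a] + z * R[2][a] for a in range(3)]
        total += phi(p) * dz
    return total

# With the identity rotation the blob appears at (x, y) = (0.5, 0) in the image;
# after a 90-degree rotation about z it appears at (x, y) = (0, 0.5).
peak_identity = project(IDENTITY, 0.5, 0.0)
peak_rotated = project(rotation_z(math.pi / 2), 0.0, 0.5)
print(peak_identity, peak_rotated)
```

For an isotropic Gaussian the line integral has the closed form $\sqrt{2\pi}\,\sigma\, e^{-((x - 0.5)^2 + y^2)/(2\sigma^2)}$ under the identity rotation, which gives a simple check on the numerical projection.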

SLIDE 6

Toy Example

SLIDE 7

E. coli 50S ribosomal subunit

27,000 particle images provided by Dr. Fred Sigworth, Yale Medical School.

3D reconstruction by A. Singer, Lanhui Wang, and Jane Zhao.

SLIDE 8

Main Algorithmic Challenges

1. Orientation assignment
2. Heterogeneity (resolving structural variability)
3. 2D class averaging (de-noising)
4. Particle picking
5. Symmetry detection
6. Motion correction

SLIDE 9

The heterogeneity problem

What if the molecule has more than one possible structure?

(Image source: H. Liao and J. Frank, Classification by bootstrapping in single particle methods, ISBI, 2010.)

Katsevich, Katsevich, Singer (SIAM Journal on Imaging Sciences, 2015): covariance matrix estimation of the 3-D structures from their 2-D projections (high-dimensional statistics, random matrices, low-rank matrix completion).

SLIDE 10

Experimental Data: 70S Ribosome

Dataset of 10,000 images (130-by-130), courtesy of Joachim Frank (Columbia University)

(Figure panels: Class 1, Class 2.)

Morphing video by A. Singer, Joakim Andén, and Eugene Katsevich

SLIDE 11

Class averaging for image denoising

  • Rotation-invariant representation (steerable PCA, bispectrum)
  • Vector diffusion maps (Singer, Wu 2012), a generalization of Laplacian Eigenmaps (Belkin, Niyogi 2003) and Diffusion Maps (Coifman, Lafon 2006)
  • Graph Connection Laplacian

Experimental images (70S) courtesy of Dr. Joachim Frank (Columbia)

Class averages by vector diffusion maps (averaging with 20 nearest neighbors) (Zhao, Singer 2014)

SLIDE 12

Orientation Estimation

The standard procedure is iterative refinement: alternating minimization or expectation-maximization, starting from an initial guess $\phi_0$ for the 3-D structure.

$$I_i = P(R_i \cdot \phi) + \epsilon_i, \quad i = 1, \ldots, n.$$

  • $R_i \cdot \phi(r) = \phi(R_i^{-1} r)$ is the left group action.
  • $P$ is integration in the $z$-direction followed by grid sampling.

Iterative refinement converges to a local optimum, not necessarily the global one. Model bias is a well-known pitfall.

Is "reference-free" orientation assignment and reconstruction possible?

SLIDE 13

3D Puzzle

Optimization problem:

$$\min_{g_1, g_2, \ldots, g_n \in G} \sum_{i,j=1}^{n} f_{ij}(g_i g_j^{-1})$$

$G = SO(3)$ is the group of rotations in space.

The parameter space $G \times G \times \cdots \times G$ is exponentially large.

SLIDE 14

Non-Unique Games over Compact Groups

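$$\min_{g_1, g_2, \ldots, g_n \in G} \sum_{i,j=1}^{n} f_{ij}(g_i g_j^{-1})$$

For $G = \mathbb{Z}_2$ this encodes Max-Cut.

(Figure: weighted graph on vertices 1 to 5 with edge weights $w_{12}, w_{14}, w_{24}, w_{25}, w_{34}, w_{35}, w_{45}$.)

Max-2-Lin($\mathbb{Z}_L$) formulation of Unique Games (Khot et al. 2005): find $x_1, \ldots, x_n \in \mathbb{Z}_L$ that satisfy as many difference equations as possible:

$$x_i - x_j = b_{ij} \bmod L, \quad (i, j) \in E.$$

This corresponds to $G = \mathbb{Z}_L$ and $f_{ij}(x_i - x_j) = -\mathbf{1}_{\{x_i - x_j = b_{ij}\}}$.

Our games are non-unique in general, and the group is not necessarily finite.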
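To make the Max-2-Lin($\mathbb{Z}_L$) objective concrete, here is a minimal brute-force sketch. The instance (a planted assignment over $\mathbb{Z}_3$ with one corrupted equation) and the function names are made up for illustration. Exhaustive search over $L^n$ assignments is only feasible for tiny problems, which is exactly why relaxations are needed.

```python
from itertools import product

def satisfied(x, equations, L):
    """Count the equations x_i - x_j = b_ij (mod L) that assignment x satisfies."""
    return sum(1 for (i, j), b in equations.items() if (x[i] - x[j]) % L == b)

def solve_brute_force(n, equations, L):
    """Exhaustive search over Z_L^n. Maximizing the number of satisfied
    equations is the same as minimizing the sum of f_ij = -1{x_i - x_j = b_ij}."""
    best = max(product(range(L), repeat=n),
               key=lambda x: satisfied(x, equations, L))
    return best, satisfied(best, equations, L)

# Hypothetical instance: planted assignment over Z_3, one corrupted measurement.
L = 3
planted = (0, 1, 2, 0)
edges = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2)]
equations = {(i, j): (planted[i] - planted[j]) % L for i, j in edges}
equations[(0, 2)] = (equations[(0, 2)] + 1) % L  # corrupt one equation

best, count = solve_brute_force(4, equations, L)
print(best, count)
```

Only differences $x_i - x_j$ enter the cost, so any global shift of a solution is also a solution. This is the discrete analogue of the global rotation ambiguity in orientation estimation.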

SLIDE 15

Orientation Estimation: Fourier projection-slice theorem

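(Figure: projections $I_i$ and $I_j$ and their Fourier transforms $\hat{I}_i$ and $\hat{I}_j$, shown as slices in 3D Fourier space; the common line is marked at $(x_{ij}, y_{ij})$ in image $i$ and at $(x_{ji}, y_{ji})$ in image $j$.)

$$c_{ij} = (x_{ij}, y_{ij}, 0)^T, \qquad R_i c_{ij} = R_j c_{ji}$$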

SLIDE 16

Angular Reconstitution (Vainshtein and Goncharov 1986, Van Heel 1987)

SLIDE 17

Least Squares Approach

(Figure: slices $\hat{I}_i$ and $\hat{I}_j$ in 3D Fourier space and their common line.)

$$c_{ij} = (x_{ij}, y_{ij}, 0)^T, \qquad R_i c_{ij} = R_j c_{ji}$$

$$\min_{R_1, R_2, \ldots, R_n \in SO(3)} \sum_{i \neq j} \left\| R_i c_{ij} - R_j c_{ji} \right\|^2$$

  • Search space is exponentially large and non-convex.
  • Spectral and semidefinite programming relaxations (non-commutative Grothendieck, Max-Cut).
  • Singer, Shkolnisky (SIAM Journal on Imaging Sciences, 2011)
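A short worked step shows why this least squares cost lends itself to relaxation. It assumes the common-line vectors are normalized, $\|c_{ij}\| = \|c_{ji}\| = 1$ (an assumption about the convention, consistent with treating them as directions), and uses the fact that rotations preserve norms:

```latex
\begin{aligned}
\left\| R_i c_{ij} - R_j c_{ji} \right\|^2
  &= \| R_i c_{ij} \|^2 + \| R_j c_{ji} \|^2
     - 2 \left\langle R_i c_{ij},\, R_j c_{ji} \right\rangle \\
  &= 2 - 2\, c_{ij}^T R_i^T R_j\, c_{ji},
\end{aligned}
\qquad\text{so}\qquad
\min_{R_1, \ldots, R_n} \sum_{i \neq j} \left\| R_i c_{ij} - R_j c_{ji} \right\|^2
\;\Longleftrightarrow\;
\max_{R_1, \ldots, R_n} \sum_{i \neq j} c_{ij}^T R_i^T R_j\, c_{ji}.
```

The objective on the right depends on the rotations only through the products $R_i^T R_j$, so it is linear in the entries of the block Gram matrix with blocks $R_i^T R_j$.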

SLIDE 18

MLE

  • The images contain more information than that expressed by optimal pairwise matching of common lines.
  • Algorithms based on pairwise matching can succeed only at "high" SNR.
  • (Quasi) Maximum Likelihood: we would like to try all possible rotations $R_1, \ldots, R_n$ and choose the combination for which the agreement on the common lines (implied by the rotations), as observed in the images, is maximal.
  • Computationally intractable: exponentially large search space, complicated cost function.

SLIDE 19

Quasi MLE

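Common line equation:

$$R_i c_{ij} = R_j c_{ji} = \frac{R_i e_3 \times R_j e_3}{\| R_i e_3 \times R_j e_3 \|}, \qquad e_3 = (0, 0, 1)^T.$$

$$c_{ij} = R_i^{-1} \frac{R_i e_3 \times R_j e_3}{\| R_i e_3 \times R_j e_3 \|} = \frac{e_3 \times R_i^{-1} R_j e_3}{\| e_3 \times R_i^{-1} R_j e_3 \|}$$

$$c_{ji} = R_j^{-1} \frac{R_i e_3 \times R_j e_3}{\| R_i e_3 \times R_j e_3 \|} = \frac{R_j^{-1} R_i e_3 \times e_3}{\| R_j^{-1} R_i e_3 \times e_3 \|}$$

Quasi MLE:

$$\min_{R_1, \ldots, R_n \in SO(3)} \sum_{i,j=1}^{n} \left\| \hat{I}_i(\cdot, c_{ij}) - \hat{I}_j(\cdot, c_{ji}) \right\|^2 \iff \min_{g_1, \ldots, g_n \in G} \sum_{i,j=1}^{n} f_{ij}(g_i g_j^{-1})$$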
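The common line equation can be checked numerically. The sketch below computes $c_{ij}$ and $c_{ji}$ from two rotations via the cross product of the viewing directions $R_i e_3$ and $R_j e_3$. The helper names and the two example rotations are hypothetical, and rotations are treated as 3-by-3 matrices acting on column vectors.

```python
import math

E3 = [0.0, 0.0, 1.0]

def matvec(R, v):
    return [sum(R[a][b] * v[b] for b in range(3)) for a in range(3)]

def matmul(A, B):
    return [[sum(A[a][c] * B[c][b] for c in range(3)) for b in range(3)]
            for a in range(3)]

def transpose(R):
    return [[R[b][a] for b in range(3)] for a in range(3)]

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def common_lines(Ri, Rj):
    """Return (c_ij, c_ji) for rotations Ri, Rj.

    The shared 3D direction is the normalized cross product of the two
    viewing directions; it is undefined when they are parallel (norm == 0).
    """
    q = cross(matvec(Ri, E3), matvec(Rj, E3))
    norm = math.sqrt(sum(c * c for c in q))
    q = [c / norm for c in q]
    # For a rotation matrix the inverse equals the transpose.
    return matvec(transpose(Ri), q), matvec(transpose(Rj), q)

# Two arbitrary example rotations (assumed values, for illustration only).
Ri = matmul(rot_z(0.3), rot_x(0.7))
Rj = matmul(rot_x(-0.5), rot_z(1.1))
c_ij, c_ji = common_lines(Ri, Rj)
print(c_ij, c_ji)
```

Both vectors come out with a zero third coordinate, as required by $c_{ij} = (x_{ij}, y_{ij}, 0)^T$, because the shared direction is orthogonal to each viewing direction.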

SLIDE 20

Fourier transform over G

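Recall for $G = SO(2)$:

$$f(\alpha) = \sum_{k=-\infty}^{\infty} \hat{f}(k) e^{\imath k \alpha}, \qquad \hat{f}(k) = \frac{1}{2\pi} \int_0^{2\pi} f(\alpha) e^{-\imath k \alpha}\, d\alpha$$

In general, for a compact group $G$:

$$f(g) = \sum_{k=0}^{\infty} d_k \mathrm{Tr}\left[ \hat{f}(k) \rho_k(g) \right], \qquad \hat{f}(k) = \int_G f(g) \rho_k(g)^* \, dg$$

Here

  • $\rho_k$ are the unitary irreducible representations of $G$
  • $d_k$ is the dimension of the representation $\rho_k$ (e.g., $d_k = 1$ for $SO(2)$, $d_k = 2k + 1$ for $SO(3)$)
  • $dg$ is the Haar measure on $G$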
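For $SO(2)$ the transform pair above is the ordinary Fourier series, which is easy to verify numerically. In this sketch the test function `f` is an arbitrary trigonometric polynomial chosen for illustration, and the integral is replaced by a uniform Riemann sum, which is exact here up to rounding because the degree is far below the number of samples.

```python
import cmath
import math

def fourier_coefficient(f, k, n_samples=256):
    """Approximate (1 / 2pi) * int_0^{2pi} f(a) e^{-ika} da by a uniform sum."""
    total = 0j
    for s in range(n_samples):
        a = 2.0 * math.pi * s / n_samples
        total += f(a) * cmath.exp(-1j * k * a)
    return total / n_samples

def synthesize(coeffs, a):
    """Evaluate sum_k fhat(k) e^{ika} for a dict {k: fhat(k)}."""
    return sum(c * cmath.exp(1j * k * a) for k, c in coeffs.items())

def f(a):
    """Hypothetical test function on the circle."""
    return 1.0 + 2.0 * math.cos(a) + 0.5 * math.sin(3.0 * a)

coeffs = {k: fourier_coefficient(f, k) for k in range(-3, 4)}
print(coeffs[0], coeffs[1], coeffs[3])
print(synthesize(coeffs, 0.7), f(0.7))
```

Since $\cos\alpha = (e^{\imath\alpha} + e^{-\imath\alpha})/2$ and $\sin 3\alpha = (e^{3\imath\alpha} - e^{-3\imath\alpha})/(2\imath)$, the nonzero coefficients are $\hat{f}(0) = 1$, $\hat{f}(\pm 1) = 1$ and $\hat{f}(\pm 3) = \mp 0.25\,\imath$.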

SLIDE 21

Linearization of the cost function

Introduce matrix variables

$$X_{ij}^{(k)} = \rho_k(g_i g_j^{-1})$$

Fourier expansion of $f_{ij}$:

$$f_{ij}(g) = \sum_{k=0}^{\infty} d_k \mathrm{Tr}\left[ \hat{f}_{ij}(k) \rho_k(g) \right]$$

Linear cost function:

$$f(g_1, \ldots, g_n) = \sum_{i,j=1}^{n} f_{ij}(g_i g_j^{-1}) = \sum_{i,j=1}^{n} \sum_{k=0}^{\infty} d_k \mathrm{Tr}\left[ \hat{f}_{ij}(k) X_{ij}^{(k)} \right]$$
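The linearization can be checked in the simplest case $G = SO(2)$, where $d_k = 1$, $\rho_k(\alpha) = e^{\imath k \alpha}$, the index $k$ runs over the integers, and $X_{ij}^{(k)} = e^{\imath k(\alpha_i - \alpha_j)}$. The pairwise cost $f_{ij}(\alpha) = -\cos(\alpha - \theta_{ij})$ below is a made-up example with only two nonzero Fourier coefficients, $\hat{f}_{ij}(\pm 1) = -\tfrac{1}{2} e^{\mp \imath \theta_{ij}}$. The offsets `theta` are likewise hypothetical.

```python
import cmath
import math

def direct_cost(alphas, theta):
    """Sum over i, j of f_ij(alpha_i - alpha_j) with f_ij(a) = -cos(a - theta_ij)."""
    n = len(alphas)
    return sum(-math.cos(alphas[i] - alphas[j] - theta[i][j])
               for i in range(n) for j in range(n))

def linear_cost(alphas, theta):
    """The same cost written as a linear function of X^(k), for k = -1 and k = +1."""
    n = len(alphas)
    total = 0j
    for k in (-1, 1):
        # X^(k)_ij = rho_k(g_i g_j^{-1}) = exp(i k (alpha_i - alpha_j))
        X = [[cmath.exp(1j * k * (alphas[i] - alphas[j])) for j in range(n)]
             for i in range(n)]
        for i in range(n):
            for j in range(n):
                fhat = -0.5 * cmath.exp(-1j * k * theta[i][j])
                total += fhat * X[i][j]
    return total.real

# Hypothetical noiseless offsets generated from "true" angles.
true_angles = [0.0, 1.0, 2.5]
theta = [[a - b for b in true_angles] for a in true_angles]

trial = [0.1, 1.3, 2.9]
print(direct_cost(trial, theta), linear_cost(trial, theta))
print(direct_cost(true_angles, theta))
```

All the nonlinearity in the angles is absorbed into the matrices $X^{(k)}$; the cost itself is linear in their entries. At the true angles every term equals $-1$, so the cost is $-n^2$.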

SLIDE 22

Constraints on the variables $X_{ij}^{(k)} = \rho_k(g_i g_j^{-1})$

1. $X^{(k)} \succeq 0$
2. $X_{ii}^{(k)} = I_{d_k}$, for $i = 1, \ldots, n$
3. $\mathrm{rank}(X^{(k)}) = d_k$

$$X_{ij}^{(k)} = \rho_k(g_i g_j^{-1}) = \rho_k(g_i) \rho_k(g_j^{-1}) = \rho_k(g_i) \rho_k(g_j)^*$$

$$X^{(k)} = \begin{pmatrix} \rho_k(g_1) \\ \rho_k(g_2) \\ \vdots \\ \rho_k(g_n) \end{pmatrix} \begin{pmatrix} \rho_k(g_1)^* & \rho_k(g_2)^* & \cdots & \rho_k(g_n)^* \end{pmatrix}$$

We drop the non-convex rank constraint.

The relaxation is too loose, as we can have $X_{ij}^{(k)} = 0$ (for $i \neq j$).

Even with the rank constraint, nothing ensures that $X_{ij}^{(k)}$ and $X_{ij}^{(k')}$ correspond to the same group element $g_i g_j^{-1}$.
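The three constraints, and the looseness that appears once the rank constraint is dropped, can be seen in a small $SO(2)$ sketch where $d_k = 1$ and $\rho_k(\alpha) = e^{\imath k \alpha}$. The angles, the test vector and the helper names are assumptions for illustration. Rank one is checked through vanishing 2-by-2 minors, and positive semidefiniteness is only spot-checked on one quadratic form.

```python
import cmath

def gram(alphas, k):
    """X^(k) with entries rho_k(g_i) * conj(rho_k(g_j)) = exp(i k (a_i - a_j))."""
    z = [cmath.exp(1j * k * a) for a in alphas]
    return [[zi * zj.conjugate() for zj in z] for zi in z]

def quadratic_form(X, v):
    """v^* X v, which is real and non-negative when X is Hermitian PSD."""
    n = len(v)
    return sum(v[i].conjugate() * X[i][j] * v[j]
               for i in range(n) for j in range(n)).real

def max_minor(X):
    """Largest absolute 2x2 minor; it is zero exactly when rank(X) <= 1."""
    n = len(X)
    return max(abs(X[i][j] * X[k][l] - X[i][l] * X[k][j])
               for i in range(n) for k in range(i + 1, n)
               for j in range(n) for l in range(j + 1, n))

alphas = [0.2, 1.5, 4.0]          # hypothetical in-plane angles
X = gram(alphas, 2)
v = [1 + 2j, -0.5j, 0.3 + 0j]     # arbitrary test vector

identity = [[1 + 0j if i == j else 0j for j in range(3)] for i in range(3)]

print(quadratic_form(X, v), max_minor(X))
print(quadratic_form(identity, v), max_minor(identity))
```

The identity matrix satisfies constraints 1 and 2 but has zero off-diagonal entries and rank 3, so it carries no information about the relative angles. This is the looseness noted above.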

SLIDE 23

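Additional constraints on $X_{ij}^{(k)} = \rho_k(g_i g_j^{-1})$

The delta function for $G = SO(2)$:

$$\delta(\alpha) = \sum_{k=-\infty}^{\infty} e^{\imath k \alpha}$$

Shifting the delta function to $\alpha_i - \alpha_j$:

$$\delta(\alpha - (\alpha_i - \alpha_j)) = \sum_{k=-\infty}^{\infty} e^{\imath k \alpha} e^{-\imath k (\alpha_i - \alpha_j)} = \sum_{k=-\infty}^{\infty} e^{\imath k \alpha} X_{ij}^{(k)*}$$

The delta function is non-negative and integrates to 1:

$$\sum_{k=-\infty}^{\infty} e^{\imath k \alpha} X_{ij}^{(k)*} \geq 0, \quad \forall \alpha \in [0, 2\pi)$$

$$\frac{1}{2\pi} \int_0^{2\pi} \sum_{k=-\infty}^{\infty} e^{\imath k \alpha} X_{ij}^{(k)*} \, d\alpha = X_{ij}^{(0)*} = 1$$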

SLIDE 24

Finite truncation via Fejér kernel

In practice, we cannot use an infinite number of representations to compose the delta function.

Simple truncation leads to the Dirichlet kernel, which changes sign:

$$D_m(\alpha) = \sum_{k=-m}^{m} e^{\imath k \alpha}$$

This is also the source of the Gibbs phenomenon and the non-uniform convergence of the Fourier series.

The Fejér kernel is non-negative and leads to uniform convergence:

$$F_m(\alpha) = \frac{1}{m} \sum_{k=0}^{m-1} D_k(\alpha) = \sum_{k=-m}^{m} \left( 1 - \frac{|k|}{m} \right) e^{\imath k \alpha}$$

The Fejér kernel is the first order Cesàro mean of the Dirichlet kernel.
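The sign behaviour of the two kernels is easy to confirm numerically. The sketch below evaluates $D_m$ and $F_m$ on a uniform grid (the grid size and $m = 5$ are arbitrary choices) using the real forms obtained by pairing the $k$ and $-k$ terms. It also checks that the weighted-sum form of $F_m$ agrees with the Cesàro mean of Dirichlet kernels.

```python
import math

def dirichlet(m, a):
    """D_m(a) = sum_{k=-m}^{m} e^{ika} = 1 + 2 * sum_{k=1}^{m} cos(k a)."""
    return 1.0 + 2.0 * sum(math.cos(k * a) for k in range(1, m + 1))

def fejer(m, a):
    """F_m(a) = sum_{k=-m}^{m} (1 - |k|/m) e^{ika}, written in real form."""
    return 1.0 + 2.0 * sum((1.0 - k / m) * math.cos(k * a)
                           for k in range(1, m + 1))

def fejer_as_mean(m, a):
    """F_m(a) as the first order Cesaro mean (1/m) * sum_{k=0}^{m-1} D_k(a)."""
    return sum(dirichlet(k, a) for k in range(m)) / m

m = 5
grid = [2.0 * math.pi * s / 400 for s in range(400)]

min_dirichlet = min(dirichlet(m, a) for a in grid)
min_fejer = min(fejer(m, a) for a in grid)
print(min_dirichlet, min_fejer)
```

On this grid the Dirichlet kernel takes negative values while the Fejér kernel stays non-negative up to rounding, consistent with the closed form $F_m(\alpha) = \frac{1}{m} \left( \frac{\sin(m\alpha/2)}{\sin(\alpha/2)} \right)^2$.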

SLIDE 25

Finite truncation via Fejér-Riesz factorization

Non-negativity constraints over $SO(2)$:

$$\sum_{k=-m}^{m} \left( 1 - \frac{|k|}{m} \right) e^{\imath k \alpha} X_{ij}^{(k)*} \geq 0, \quad \forall \alpha \in [0, 2\pi)$$

Fejér-Riesz: $P$ is a non-negative trigonometric polynomial over the circle, i.e. $P(e^{\imath \alpha}) \geq 0$ for all $\alpha \in [0, 2\pi)$, iff $P(e^{\imath \alpha}) = |Q(e^{\imath \alpha})|^2$ for some polynomial $Q$.

Leads to semidefinite constraints on $\left( X_{ij}^{(k)} \right)_k$ for each $i, j$.

Similar non-negativity constraints hold for general $G$ using the delta function over $G$:

$$\delta(g) = \sum_{k=0}^{\infty} d_k \mathrm{Tr}\left[ \rho_k(g) \right]$$

For example, Fejér proved that for $SO(3)$ the second order Cesàro mean of the Dirichlet kernel is non-negative.
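A small worked instance of the Fejér-Riesz factorization, chosen here for illustration, is the Fejér kernel with $m = 2$. It shows how non-negativity of a trigonometric polynomial turns into a positive semidefinite Gram matrix, which is the mechanism behind the semidefinite constraints:

```latex
\begin{aligned}
F_2(\alpha) &= \sum_{k=-2}^{2} \Bigl( 1 - \tfrac{|k|}{2} \Bigr) e^{\imath k \alpha}
             = 1 + \tfrac{1}{2} e^{\imath \alpha} + \tfrac{1}{2} e^{-\imath \alpha}
             = 1 + \cos\alpha, \\[4pt]
Q(z) &= \tfrac{1}{\sqrt{2}} (1 + z)
\;\Longrightarrow\;
|Q(e^{\imath \alpha})|^2
  = \tfrac{1}{2} (1 + e^{\imath \alpha})(1 + e^{-\imath \alpha})
  = 1 + \cos\alpha = F_2(\alpha), \\[4pt]
F_2(\alpha) &= v(\alpha)^* M \, v(\alpha),
\qquad v(\alpha) = \begin{pmatrix} 1 \\ e^{\imath \alpha} \end{pmatrix},
\qquad M = \tfrac{1}{2} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \succeq 0 .
\end{aligned}
```

The coefficients of the polynomial are sums along the diagonals of $M$ (here $1 = M_{11} + M_{22}$ and $\tfrac{1}{2} = M_{12} = M_{21}$), so requiring a positive semidefinite $M$ with prescribed diagonal sums is a semidefinite constraint on the coefficients.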

SLIDE 26

Tightness of the semidefinite program

  • We solve an SDP for the matrices $X^{(0)}, \ldots, X^{(m)}$.
  • Numerically, the solution of the SDP has the desired ranks up to a certain level of noise (w.h.p.).
  • In other words, even though the search space is exponentially large, we typically find the MLE in polynomial time.
  • This is a viable alternative to heuristic methods such as EM and alternating minimization.
  • The SDP gives a certificate whenever it finds the MLE.

SLIDE 27

Final Remarks

  • Loss of handedness ambiguity in cryo-EM: if $g_1, \ldots, g_n \in SO(3)$ is the solution, then so is $J g_1 J^{-1}, \ldots, J g_n J^{-1}$ for $J = \mathrm{diag}(-1, -1, 1)$. Define

$$X_{ij}^{(k)} = \frac{1}{2} \left[ \rho_k(g_i g_j^{-1}) + \rho_k(J g_i g_j^{-1} J^{-1}) \right]$$

    This splits the representation: $2k + 1 = d_k = k + (k + 1)$, which reduces the computation.
  • Point group symmetry (cyclic, dihedral, etc.): reduces the dimension of the representation (invariant polynomials).
  • Translations and rotations simultaneously: $SE(3)$ is a non-compact group, but we can map it to $SO(4)$.

SLIDE 28

References

  • A. Singer, Y. Shkolnisky, “Three-Dimensional Structure Determination from Common Lines in Cryo-EM by Eigenvectors and Semidefinite Programming”, SIAM Journal on Imaging Sciences, 4 (2), pp. 543–572 (2011).

  • A. Singer, H.-T. Wu, “Vector Diffusion Maps and the Connection Laplacian”, Communications on Pure and Applied Mathematics, 65 (8), pp. 1067–1144 (2012).

  • Z. Zhao, A. Singer, “Rotationally Invariant Image Representation for Viewing Direction Classification in Cryo-EM”, Journal of Structural Biology, 186 (1), pp. 153–166 (2014).

  • A. S. Bandeira, M. Charikar, A. Singer, A. Zhu, “Multireference Alignment using Semidefinite Programming”, 5th Innovations in Theoretical Computer Science (ITCS 2014).

  • A. S. Bandeira, Y. Chen, A. Singer, “Non-Unique Games over Compact Groups and Orientation Estimation in Cryo-EM”, http://arxiv.org/abs/1505.03840.

  • G. Katsevich, A. Katsevich, A. Singer, “Covariance Matrix Estimation for the Cryo-EM Heterogeneity Problem”, SIAM Journal on Imaging Sciences, 8 (1), pp. 126–185 (2015).

  • J. Andén, E. Katsevich, A. Singer, “Covariance estimation using conjugate gradient for 3D classification in Cryo-EM”, 12th IEEE International Symposium on Biomedical Imaging (ISBI 2015).
