Ab initio methods: how/why do they work
D.Svergun
Small-angle scattering in structural biology
[Figure: experimental setup and data analysis. An incident beam with wave vector k, k = 2π/λ, is scattered by the sample in solvent. The scattered beam k1 at angle 2θ is recorded by a detector, giving the scattering curve I(s) (lg I, relative, vs s, nm⁻¹; resolution 3.1, 1.6, 1.0 and 0.8 nm). Data analysis covers shape determination, rigid body modelling, missing fragments, oligomeric mixtures, hierarchical systems and flexible systems.]
Radiation sources: X-ray tube (λ = 0.1-0.2 nm), synchrotron (λ = 0.05-0.5 nm), thermal neutrons (λ = 0.1-1 nm).
Complementary techniques: EM, crystallography, NMR, homology models, atomic models, bioinformatics, MS, EPR, FRET, AUC, biochemistry.
Additional information: distances, orientations, interfaces.
… not believe that SAS yields more than the radius of gyration.
… number of users are attracted by new possibilities of SAS and want rapid answers to more and more complicated questions.
… perform numerous cumbersome actions during the experiment and data analysis to obtain each of the answers.
Now we shall go through the major steps required along the way.
The momentum transfer is written in two conventions:
[q, Q, s, h, μ, κ …] = 4π sin(θ)/λ; Guinier law: $I(s) = I_0 \exp(-s^2 R_g^2/3)$, valid for $sR_g < 1.3$
[s, S, k, …] = 2 sin(θ)/λ; Guinier law: $I(s) = I_0 \exp(-(2\pi s R_g)^2/3)$, valid for $sR_g < 1.3/(2\pi)$
[Figure: scattering curve I(s), lg I (relative) vs s, nm⁻¹; resolution 3.1, 1.6, 1.0 and 0.8 nm.]
GNOM or CRYSOL input: angular units in the input file:
(1) 4*pi*sin(theta)/lambda [1/angstrom]
(2) 4*pi*sin(theta)/lambda [1/nm]
(3) 2*sin(theta)/lambda [1/angstrom]
(4) 2*sin(theta)/lambda [1/nm]
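The two conventions differ only by a factor of 2π, so a Guinier analysis can be done on either axis after conversion. Below is a minimal sketch, assuming NumPy, that converts s = 2 sin(θ)/λ to q = 4π sin(θ)/λ and estimates Rg and I(0) by a straight-line fit of ln I against q² below a user-chosen q_max. The function names and the synthetic test curve are hypothetical, and real data need a carefully selected fitting range.

```python
import numpy as np

def q_from_s(s):
    """Convert s = 2 sin(theta)/lambda to q = 4 pi sin(theta)/lambda (same length unit)."""
    return 2.0 * np.pi * s

def guinier_fit(q, intensity, q_max):
    """Fit ln I(q) = ln I(0) - (Rg^2 / 3) q^2 over the points with q < q_max.

    Returns (Rg, I0, q_max * Rg); the last value should stay below ~1.3
    for the Guinier approximation to hold.
    """
    mask = q < q_max
    slope, intercept = np.polyfit(q[mask] ** 2, np.log(intensity[mask]), 1)
    rg = np.sqrt(-3.0 * slope)
    return rg, np.exp(intercept), q_max * rg

# Synthetic Guinier curve on an s = 2 sin(theta)/lambda axis (1/nm)
s = np.linspace(0.001, 0.05, 50)
q = q_from_s(s)
rg_true, i0_true = 3.0, 100.0
intensity = i0_true * np.exp(-(q * rg_true) ** 2 / 3.0)

rg, i0, qrg = guinier_fit(q, intensity, q_max=0.4)
print(f"Rg = {rg:.3f} nm, I(0) = {i0:.1f}, q_max*Rg = {qrg:.2f}")
```

Fitting in q and converting the validity limit (q·Rg < 1.3 is the same as s·Rg < 1.3/(2π)) avoids carrying the 2π factor into the exponent.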
The scattering is proportional to that of a single particle averaged over all orientations, which allows one to determine the size, shape and internal structure of the particle at low (1-10 nm) resolution.
Radius of gyration Rg (Guinier, 1939): $I(s) \cong I(0)\exp\left(-\tfrac{1}{3}R_g^2 s^2\right)$
Maximum size Dmax: p(r) = 0 for r > Dmax
Molecular mass (from I(0))
Excluded particle volume (Porod, 1952): $V = 2\pi^2 I(0)/Q$, where $Q = \int_0^\infty s^2 I(s)\,ds$
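Below is a minimal sketch of the Porod volume estimate. It assumes NumPy, a homogeneous particle, no constant background, and q = 4π sin(θ)/λ (the s of the formula above). Because a measured curve ends at a finite q, the integral is completed by a K/q_max tail term, with the Porod constant K estimated from the last points. This correction and all names are illustrative assumptions. A uniform sphere, whose volume is known, serves as a check.

```python
import numpy as np

def trapezoid(y, x):
    """Plain trapezoidal rule (avoids NumPy-version differences in trapz/trapezoid)."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def porod_volume(q, intensity, i0, tail_fraction=0.2):
    """Excluded volume V = 2 pi^2 I(0) / Q with Q = integral of q^2 I(q) dq.

    The integral beyond q[-1] is approximated by K / q[-1], where the Porod
    constant K = <q^4 I(q)> is averaged over the last `tail_fraction` of points.
    """
    invariant = trapezoid(q ** 2 * intensity, q)
    n_tail = max(1, int(len(q) * tail_fraction))
    porod_k = np.mean(q[-n_tail:] ** 4 * intensity[-n_tail:])
    invariant += porod_k / q[-1]
    return 2.0 * np.pi ** 2 * i0 / invariant

# Test object: homogeneous sphere of radius 2 nm, I(0) = 1
radius = 2.0
q = np.linspace(1e-3, 100.0, 200_001)
x = q * radius
intensity = (3.0 * (np.sin(x) - x * np.cos(x)) / x ** 3) ** 2

v = porod_volume(q, intensity, i0=1.0)
print(f"Porod volume {v:.2f} nm^3, exact {4 / 3 * np.pi * radius ** 3:.2f} nm^3")
```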
[Figure: scattering curves, lg I(s) (relative) vs s, nm⁻¹ (0-0.5), of geometrical bodies: solid sphere, hollow sphere, dumbbell, long rod, flat disc.]
[Diagram: a 3D search model with M parameters is fitted to the 1D scattering data by trial-and-error (non-linear search).]
Lack of 3D information inevitably leads to ambiguous interpretation, and additional information is always required.
The SAS intensity is $I(s) = \langle I(\mathbf{s})\rangle_\Omega = \langle |\mathcal{F}[\rho(\mathbf{r})]|^2\rangle_\Omega$, where $\mathcal{F}$ denotes the Fourier transform, $\langle\,\rangle_\Omega$ stands for the spherical average, and $\mathbf{s} = (s, \Omega)$ is the scattering vector. Expanding $\rho(\mathbf{r})$ in spherical harmonics,
$$\rho(\mathbf{r}) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l}\rho_{lm}(r)\,Y_{lm}(\omega),$$
the scattering intensity is expressed as
$$I(s) = 2\pi^2\sum_{l=0}^{\infty}\sum_{m=-l}^{l}|A_{lm}(s)|^2,$$
where the partial amplitudes $A_{lm}(s)$ are the Hankel transforms of the radial functions,
$$A_{lm}(s) = \sqrt{\frac{2}{\pi}}\,i^l\int_0^\infty \rho_{lm}(r)\,j_l(sr)\,r^2\,dr,$$
and $j_l(sr)$ are the spherical Bessel functions.
Stuhrmann, H.B. Acta Cryst., A26 (1970) 297.
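As a numerical check of the expansion, a homogeneous sphere has only the l = 0 term. With $Y_{00} = 1/\sqrt{4\pi}$, the radial function is $\rho_{00}(r) = \sqrt{4\pi}$ inside the sphere, and $j_0(x) = \sin x / x$. The sketch below assumes NumPy and simple trapezoidal integration. It computes $2\pi^2|A_{00}(s)|^2$ and compares it with the closed-form sphere intensity $[V\cdot 3(\sin x - x\cos x)/x^3]^2$, $x = sR$. All names are hypothetical.

```python
import numpy as np

def trapezoid(y, x):
    """Plain trapezoidal rule."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def partial_amplitude_00(rho00, r, s):
    """A_00(s) = sqrt(2/pi) * i^0 * integral of rho_00(r) j_0(s r) r^2 dr."""
    j0 = np.sinc(s * r / np.pi)                  # j_0(x) = sin(x)/x, equal to 1 at x = 0
    return np.sqrt(2.0 / np.pi) * trapezoid(rho00 * j0 * r ** 2, r)

radius = 2.0                                     # nm, hypothetical
r = np.linspace(0.0, radius, 20_001)
rho00 = np.full_like(r, np.sqrt(4.0 * np.pi))    # unit density: rho = rho_00 * Y_00
volume = 4.0 / 3.0 * np.pi * radius ** 3

results = []
for s in (0.5, 1.0, 2.0):
    i_harmonics = 2.0 * np.pi ** 2 * partial_amplitude_00(rho00, r, s) ** 2
    x = s * radius
    i_exact = (volume * 3.0 * (np.sin(x) - x * np.cos(x)) / x ** 3) ** 2
    results.append((i_harmonics, i_exact))
    print(f"s = {s}: harmonic sum {i_harmonics:.4f}, closed form {i_exact:.4f}")
```

The agreement follows analytically: $2\pi^2 \cdot \frac{2}{\pi}\cdot 4\pi\,J^2 = (4\pi J)^2$ with $J = \int_0^R j_0(sr) r^2 dr$, which is exactly the squared sphere amplitude.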
[Figure: prohead and mature virus; SAXS model (1982) compared with the cryo-EM reconstruction (2005).]
Svergun, D.I., Feigin, L.A. & Schedrin, B.M. (1982) Acta Cryst. A38, 827; Agirrezabala, J.M. et al. & Carrascosa, J.L. (2005) EMBO J. 24, 3820.
Homogeneous particle. The scattering density in spherical coordinates $(r, \omega) = (r, \theta, \varphi)$ may be described by the envelope function $F(\omega)$:
$$\rho(\mathbf{r}) = \begin{cases} 1, & 0 \le r \le F(\omega) \\ 0, & r > F(\omega) \end{cases}$$
Shape parameterization by a limited series of spherical harmonics:
$$F(\omega) \cong F_L(\omega) = \sum_{l=0}^{L}\sum_{m=-l}^{l} f_{lm}\,Y_{lm}(\omega),$$
where $Y_{lm}(\omega)$ are orthogonal spherical harmonics and $f_{lm}$ are the parameterization coefficients.
The small-angle scattering intensity from the entire particle is calculated as the sum of scattering from the partial harmonics:
$$I_{theor}(s) = 2\pi^2\sum_{l=0}^{L}\sum_{m=-l}^{l}|A_{lm}(s)|^2$$
Stuhrmann, H.B. (1970) Z. …, 177-198. Svergun, D.I. et al. (1996) Acta …
[Figure: homogeneous particle. The envelope function is decomposed into harmonics with coefficients f00, f11, f20, f22, …, and the intensity into the corresponding partial terms A00(s), A11(s), A20(s), A22(s), ….]
Spatial resolution: $\delta = \pi R/(L+1)$, where R is the radius of an equivalent sphere. The number of model parameters $f_{lm}$ is $(L+1)^2$. One can easily impose symmetry by selecting appropriate harmonics in the sum; this significantly reduces the number of parameters describing F(ω) for a given L.
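The trade-off between resolution and model size can be tabulated directly from the two formulas above. Below is a minimal sketch in plain Python; the 3 nm equivalent-sphere radius is a hypothetical example value.

```python
import math

def envelope_model_size(radius, lmax):
    """Return (resolution delta = pi R / (L + 1), number of f_lm = (L + 1)^2)."""
    return math.pi * radius / (lmax + 1), (lmax + 1) ** 2

# Hypothetical particle with an equivalent-sphere radius of 3 nm
for lmax in (4, 7, 10):
    delta, n_params = envelope_model_size(radius=3.0, lmax=lmax)
    print(f"L = {lmax:2d}: delta = {delta:.2f} nm, {n_params} parameters")
```

For R = 3 nm this gives δ ≈ 1.88, 1.18 and 0.86 nm with 25, 64 and 121 coefficients. The resolution improves only as 1/(L+1), while the number of unknowns grows as (L+1)², which is why discarding harmonics through symmetry is attractive.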
A sphere of radius Dmax is filled by densely packed beads of radius $r_0 \ll D_{max}$. The vector of model parameters (phase assignments) is
$$x(j) = \begin{cases} 1, & \text{bead } j \text{ belongs to the particle} \\ 0, & \text{bead } j \text{ belongs to the solvent} \end{cases}$$
The number of model parameters, $M \approx (D_{max}/r_0)^3 \approx 10^3$, is too large for conventional minimization methods, so Monte-Carlo-like approaches have to be used.
But: this model is able to describe rather complex shapes.
Chacón, P. et al. (1998) Biophys. J. 74, 2760-2775. Svergun, D.I. (1999) Biophys. J. 76, 2879-2886.
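To make the parameterization concrete, here is a minimal sketch assuming NumPy. Bead centres are placed on a simple cubic lattice with spacing 2r0 inside the search sphere. This is a simplification, since the slide speaks of densely packed beads. A 0/1 phase vector x selects the particle beads, and the intensity is evaluated with the Debye (1915) formula for identical point scatterers. Bead form factors and solvent effects are ignored. All names, and `search_radius` standing in for the slide's Dmax, are hypothetical.

```python
import numpy as np

def bead_grid(search_radius, r0):
    """Centres of beads of radius r0 on a simple cubic lattice (spacing 2*r0)
    that lie inside a sphere of radius search_radius."""
    axis = np.arange(-search_radius, search_radius + 1e-9, 2.0 * r0)
    xyz = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 3)
    return xyz[np.linalg.norm(xyz, axis=1) <= search_radius + 1e-9]

def debye_intensity(coords, s):
    """Debye formula for identical point scatterers:
    I(s) = sum_i sum_j sin(s r_ij) / (s r_ij), with the i = j terms equal to 1."""
    r_ij = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1).ravel()
    # np.sinc(t) = sin(pi t) / (pi t), hence sin(y) / y = np.sinc(y / pi)
    return np.array([np.sinc(si * r_ij / np.pi).sum() for si in s])

grid = bead_grid(search_radius=5.0, r0=0.5)        # search volume, M beads
x = np.zeros(len(grid), dtype=int)                 # phase assignments: 0 = solvent
x[np.linalg.norm(grid, axis=1) <= 3.0] = 1         # hypothetical particle: a ball
s = np.array([0.0, 0.5, 1.0])
intensity = debye_intensity(grid[x == 1], s)
print(f"M = {len(grid)} beads, {x.sum()} in the particle, I(s) = {intensity}")
```

The full pairwise sum costs O(N²) per s value, which is one reason why fast intensity updates matter when millions of bead flips are evaluated.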
Pure Monte Carlo runs the risk of being trapped in a local minimum. Solution: use a global minimization method such as simulated annealing or a genetic algorithm.
Local search always goes to a better point and can thus be trapped in a local minimum. Pure Monte-Carlo search always goes to the closest local minimum (nature: rapid quenching and vitreous ice formation).
To get out of local minima, a global search must be able to (sometimes) go to a worse point. Slower annealing allows one to search for the global minimum (nature: normal, e.g. slow freezing of water and ice formation).
Aim: find a vector of M variables {x} minimizing a function f(x).
1. Start from a random configuration x at a "high" temperature T.
2. Make a small step (random modification of the configuration) x → x' and compute the difference Δ = f(x') − f(x).
3. If Δ < 0, accept the step; if Δ > 0, accept it with probability $e^{-\Delta/T}$.
4. Make another step from the old configuration (if the previous step has been rejected) or from the new one (if it has been accepted).
5. Anneal the system at this temperature, i.e. repeat steps 2-4 "many" times (say, 100M tries or 10M successful tries, whichever comes first), then decrease the temperature (T' = cT, c < 1).
6. Continue cooling the system until no improvement in f(x) is observed.
Shape determination: M ≈ 10³ variables (e.g. 0 or 1 bead assignments in DAMMIN).
Rigid body methods: M ≈ 10¹ variables (positional and rotational parameters).
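The six steps map directly onto a short loop. Below is a minimal sketch in plain Python. It assumes binary variables (as for bead assignments), single-variable flips as the "small step", and the 100M tries / 10M successes schedule quoted above. Reading step 6 as "stop after a temperature with no downhill step" is an assumption, and the toy goal function is hypothetical.

```python
import math
import random

def simulated_annealing(f, x0, t0=1.0, cooling=0.9, seed=0):
    """Minimise f over a vector of M binary variables (steps 1-6 above).

    A step flips one randomly chosen variable. Each temperature lasts
    100*M tries or 10*M accepted steps, whichever comes first; the run stops
    once a whole temperature passes without a single downhill step.
    """
    rng = random.Random(seed)
    m = len(x0)
    x, fx = list(x0), f(x0)                  # step 1: start at a "high" T
    best_x, best_f = list(x), fx
    t = t0
    while True:
        accepted, downhill = 0, False
        for _ in range(100 * m):
            j = rng.randrange(m)
            x[j] ^= 1                        # step 2: small random change x -> x'
            delta = f(x) - fx
            if delta < 0 or rng.random() < math.exp(-delta / t):
                fx += delta                  # step 3: accept the step
                accepted += 1
                downhill = downhill or delta < 0
                if fx < best_f:
                    best_x, best_f = list(x), fx
                if accepted >= 10 * m:
                    break
            else:
                x[j] ^= 1                    # step 4: rejected, continue from old x
        if not downhill:                     # step 6: no improvement observed
            return best_x, best_f
        t *= cooling                         # step 5: T' = c T, c < 1

# Hypothetical toy goal: number of mismatches with a target bead pattern
target = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1] * 3

def mismatches(x):
    return sum(a != b for a, b in zip(x, target))

best_x, best_f = simulated_annealing(mismatches, [0] * len(target))
print(best_f)
```

Recomputing f from scratch at every step keeps the sketch short; for a real scattering goal function, the cost of each evaluation dominates the run time.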
f(x) is always (Discrepancy + Penalty)
Using simulated annealing, DAMMIN finds a compact dummy-atom configuration X that fits the scattering data by minimizing
$$f(X) = \chi^2[I_{exp}(s), I(s, X)] + \alpha P(X),$$
where $\chi^2$ is the discrepancy between the experimental and calculated curves, P(X) is the penalty ensuring compactness and connectivity, and α > 0 is its weight.
[Figure: examples of compact, loose and disconnected models.]
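Below is a minimal sketch of such a goal function, assuming NumPy. Here χ² is taken as a commonly used reduced form with an optimal scale factor c between the curves, $\chi^2 = \frac{1}{N-1}\sum_j [(I_{exp}(s_j) - c\,I(s_j, X))/\sigma(s_j)]^2$. P(X) is a hypothetical looseness penalty (the fraction of beads with fewer than two contacts), not DAMMIN's actual penalty term.

```python
import numpy as np

def chi2(i_exp, sigma, i_calc):
    """Reduced chi^2 between data and a model curve, using the scale factor c
    that minimises it (linear least squares)."""
    w = 1.0 / sigma ** 2
    c = np.sum(w * i_exp * i_calc) / np.sum(w * i_calc ** 2)
    resid = (i_exp - c * i_calc) / sigma
    return np.sum(resid ** 2) / (len(i_exp) - 1)

def looseness_penalty(coords, contact):
    """Hypothetical compactness penalty: fraction of beads with fewer than
    two neighbours closer than `contact` (isolated or dangling beads)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    n_neighbours = (d < contact).sum(axis=1) - 1     # exclude the bead itself
    return float(np.mean(n_neighbours < 2))

def goal_function(i_exp, sigma, i_calc, coords, alpha, contact):
    """f(X) = chi^2[I_exp(s), I(s, X)] + alpha * P(X)."""
    return chi2(i_exp, sigma, i_calc) + alpha * looseness_penalty(coords, contact)

# Hypothetical usage: the same curve (up to scale) for two bead arrangements
s = np.linspace(0.1, 2.0, 50)
i_exp = np.exp(-s ** 2)
sigma = 0.01 * i_exp + 1e-4
cube = np.array([[a, b, d] for a in (0, 1) for b in (0, 1) for d in (0, 1)], float)
scattered = 5.0 * cube                               # same beads, no contacts left
f_compact = goal_function(i_exp, sigma, 3.0 * i_exp, cube, alpha=0.1, contact=1.01)
f_scattered = goal_function(i_exp, sigma, 3.0 * i_exp, scattered, alpha=0.1, contact=1.01)
print(f_compact, f_scattered)
```

Both arrangements fit the curve equally well here, so only the penalty separates them. This is the role P(X) plays when many bead configurations reproduce the same data.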
The 3D model is required not only to fit the data but also to fulfill (often stringent) physical and/or biochemical constraints.
Program DAMMIN, slow mode: bovine serum albumin, molecular mass 66 kDa, no symmetry imposed.
Program DAMMIN, slow mode: bovine serum albumin, comparison of the ab initio model with the crystal structure of human serum albumin.
DAMMIF is a complete reimplementation of DAMMIN written in object-oriented code. It is … faster than DAMMIN (in fast mode, it takes about 1-2 min).
Franke, D. & Svergun, D. I. (2009)
Very low resolution
Ambiguity of the models
Accounts for a restricted portion of the data
[Figure: lg I(s) vs s, nm⁻¹ (0-15; resolution 2.00 to 0.33 nm), computed from the shape and from the atomic structure.]
How to construct ab initio models accounting for higher resolution data?
Proteins typically consist of folded polypeptide chains composed of amino acid residues. At a resolution of 0.5 nm, a protein can be represented by an ensemble of K dummy residues centered at the Cα positions with coordinates {ri}. Scattering from such a model is computed using the Debye (1915) formula. Starting from a random model, simulated annealing is employed, similar to DAMMIN.
Excluded volume effects and local interactions lead to a characteristic distribution of nearest neighbors around a given residue in a polypeptide chain.
[Figure: number of neighbours (1-6) vs shell radius (0.2-1.0 nm).]
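The kind of histogram shown in the figure can be computed from any set of residue coordinates. Below is a minimal sketch, assuming NumPy and coordinates in nm. It only measures the distribution and makes no claim about how the distribution is used as a restraint. As a toy check it uses a straight chain with the 0.38 nm Cα-Cα spacing of a polypeptide; the straight geometry is hypothetical.

```python
import numpy as np

def neighbour_distribution(coords, shell_width=0.2, r_max=1.0):
    """Average number of neighbours per residue in spherical shells
    [0, w), [w, 2w), ... up to r_max (nm). Returns (outer shell radii, counts)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    pair_d = d[np.triu_indices(len(coords), k=1)]       # each pair once
    edges = np.arange(0.0, r_max + 1e-9, shell_width)
    counts, _ = np.histogram(pair_d, bins=edges)
    return edges[1:], 2.0 * counts / len(coords)         # a pair counts for both residues

# Toy chain: three residues on a line, 0.38 nm apart
chain = np.column_stack([0.38 * np.arange(3), np.zeros(3), np.zeros(3)])
radii, n_neighbours = neighbour_distribution(chain)
print(radii, n_neighbours)
```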
[Figure: envelope, bead and dummy residue models, and their fits to the experimental data (log I, relative, vs s, nm⁻¹).]
Comparison with the crystal structure of lysozyme: SASHA (1996), DAMMIN (1999), GASBOR (2001).
[Figure: domain organization of titin (26 926 aa, 1.2 μm, N- to C-terminus) across the Z-disc, I-band, A-band and H-zone. The domains Z1Z2, Z7, I1, I27, Ax, TK and M5 (folds IG, EF, FN-III, kinase) have been solved by X-ray or NMR.]
NMR data: Pastore lab; X-ray data: Wilmanns lab.
Z1Z2 includes two modules at the N-terminus of titin in the Z-disc and interacts with telethonin.
Shape of Z1Z2 and localization of the His-tag; cross-linking function.
[Figure: native Z1Z2, His-Z1Z2, Tele90-Z1Z2.]
Zou, P., Gautel, M., Geerlof, M., Wilmanns, M., Koch, M.H.J. & Svergun, D.I. (2003)
Zou, P., Pinotsis, N., Lange, S., Song, Y.H., Popov, A., Mavridis, I., Mayans, O.M., Gautel, M. & Wilmanns, M. (2006) Nature 439, 229-233.
One component, one scattering pattern: "normal" shape determination.
Chacón, P. et al. (1998) Biophys. J. 74, 2760-2775; Svergun, D.I. (1999) Biophys. J. 76, 2879-2886.
[Diagram: complex A+B and its components A and B.]
Many components, many scattering patterns: shape and internal structure.
Svergun, D.I. (1999) Biophys. J. 76, 2879-2886; Svergun, D.I. & Nierhaus, K.H. (2000) J. Biol. Chem. 275, 14432-14439.
[Figure: ab initio shapes of the C subunit, the EG subunit and the EG+C complex, obtained from the scattering of the free subunits and their complex in solution.]
3D map of the yeast V-ATPase by electron microscopy.
In solution, EG forms an L-shaped assembly with subunit C. This model is supported by the EM map showing three copies of EG, two of them linked by C. The data further indicate a conformational change of EGC during regulatory assembly/disassembly.
Diepholz, M. et al. (2008) Structure 16, 1789-1798.
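[Figure: neutron scattering curves, lg I (relative) vs s, nm⁻¹, in 0%, 40%, 55%, 75% and 100% D2O.]
$$I_m(s) = \sum_{j}\left(\Delta\rho_j^{(m)}\right)^2 I_j(s) + 2\sum_{j<k}\Delta\rho_j^{(m)}\Delta\rho_k^{(m)}\,I_{jk}(s),$$
where m labels the data set (contrast), $\Delta\rho_j^{(m)}$ is the contrast of phase j, $I_j(s)$ is the scattering from the volume occupied by phase j, and $I_{jk}(s)$ is the cross-term between phases j and k.
Start: random phase assignments within the search volume, no fit to the experimental data.
Finish: condensed multiphase model with minimum interfacial area fitting multiple data sets.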
Program MONSA, Svergun, D.I. (1999) Biophys. J. 76, 2879; Petoukhov, M.V. & Svergun, D. I. (2006) Eur. Biophys. J. 35, 567.
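The contrast-weighted sum can be evaluated for any number of phases once the partial terms are known. Below is a minimal sketch assuming NumPy. Each phase is represented by point scatterers, so the partial terms $I_j(s)$ and $I_{jk}(s)$ follow from the Debye formula. The coordinates and the contrast values (standing in for different D2O conditions) are hypothetical, and the program's actual bead-based implementation is not reproduced.

```python
import numpy as np

def debye_cross(a, b, s):
    """sum_{i in a} sum_{j in b} sin(s r_ij) / (s r_ij) for point scatterers."""
    r = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1).ravel()
    return np.array([np.sinc(si * r / np.pi).sum() for si in s])

def multiphase_intensity(phases, contrasts, s):
    """I(s) = sum_k drho_k^2 I_k(s) + 2 sum_{j<k} drho_j drho_k I_jk(s)."""
    total = np.zeros_like(s)
    for k, (pk, ck) in enumerate(zip(phases, contrasts)):
        total += ck ** 2 * debye_cross(pk, pk, s)                 # I_k(s) term
        for pj, cj in zip(phases[k + 1:], contrasts[k + 1:]):
            total += 2.0 * ck * cj * debye_cross(pk, pj, s)       # cross-term
    return total

# Hypothetical two-phase particle (e.g. protein and RNA) and three "contrasts"
rng = np.random.default_rng(1)
protein = rng.normal(size=(30, 3))
rna = rng.normal(loc=2.0, size=(20, 3))
s = np.linspace(0.0, 2.0, 5)
results = {}
for contrasts in ((1.0, 2.0), (0.5, -0.3), (0.0, 1.0)):
    results[contrasts] = multiphase_intensity([protein, rna], contrasts, s)
    print(contrasts, results[contrasts])
```

Setting one contrast to zero (contrast matching) leaves only the other phase visible. This is the reason why measuring several D2O contents constrains the internal structure.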
Ran (structure known), tRNA (structure known), Exportin-t (tentative homology model).
[Figure: lg I (relative) vs s, nm⁻¹, for the ternary complex, Ran and tRNA, with fits.]
One X-ray scattering pattern from the ternary complex fitted by DAMMIN.
Fukuhara, N., Fernandez, E., Ebert, J., Conti, E. & Svergun, D. I. (2004) J. Biol. Chem. 279, 2176
X-ray scattering from Exportin-t, Ran, tRNA: 3 curves.
Neutron scattering:
ternary complex with protonated Ran in 0, 40, 55, 75, 100% D2O: 5 curves;
ternary complex with deuterated Ran in 0, 40, 55, 70, 100% D2O: 5 curves.
TOTAL: 13 curves.
[Figure: X-ray curves (ternary complex, Ran, tRNA) and neutron curves (0-100% D2O) with fits, lg I (relative) vs s, nm⁻¹.]
Three X-ray and five neutron data sets fitted by MONSA.
[Figure: X-ray curves (ternary complex, Ran, tRNA) and two series of neutron curves (0-100% D2O) with fits, lg I (relative) vs s, nm⁻¹.]
Three X-ray and ten neutron data sets fitted by MONSA.
[Figure: fits to the X-ray and neutron data sets, lg I (relative) vs s, nm⁻¹.]
High resolution models of the components docked into the three-phase ab initio model of the complex, based on X-ray and neutron scattering from selectively deuterated particles.
Domain and quaternary structure; complexes and assemblies:
Dcp1/Dcp2 complex (She et al., Mol. Cell 2008)
S-layer proteins (Fagan et al., Mol. Microbiol. 2009)
Toxin B (Albesa-Jové et al., JMB 2010)
α-synuclein oligomers (Giehm et al., PNAS USA 2011)
Structural transitions; flexible/transient systems:
Src kinase (Bernado et al., JMB 2008)
Cytochrome/adrenodoxin (Xu et al., JACS 2008)
Complement factor H (Morgan et al., NSMB 2011)
Genetic algorithm DALAI_GA (Chacón et al., 1998, 2000)
'Give-n-take' procedure SAXS3D (Bada et al., 2000)
Spheres modelling program GA_STRUCT (Heller et al., 2002)
Envelope models: SASHA(1) (Svergun et al., 1996)
Dummy atoms: DAMMIN(1,4) & MONSA(1,2) (Svergun, 1999)
Dummy residues: GASBOR(1,3) (Petoukhov et al., 2001)
(1) Able to impose symmetry and anisometry constraints
(2) Multiphase inhomogeneous models
(3) Accounts for higher resolution data
(4) DAMMIF is 30 times faster (D. Franke & D. Svergun, 2009)
Always remember the ambiguity!
Shape determination of 5S RNA: a variety of DAMMIN models yielding identical fits.
Funari, S., Rapp, G., Perbandt, M., Dierks, K., Vallazza, M., Betzel, Ch., Erdmann, V. A. & Svergun, D. I. (2000) J. Biol. Chem. 275, 31283-31288.
Program SUPCOMB: a tool to align and conquer.
Aligns heterogeneous high- and low-resolution models and provides a dissimilarity measure (NSD).
For shape determination, it allows one to find common features in a series of independent reconstructions.
Kozin, M.B. & Svergun, D.I. (2001) J. Appl. Crystallogr. 34, 33-41
1. Find a set of solutions starting from random initial models and superimpose all pairs of models with SUPCOMB.
2. Find the most probable model (the one that is on average least different from all the others) and align all the other models with this reference.
3. Remap all models onto a common grid to obtain the solution spread region and compute the spatial occupancy density of the grid points.
4. Reduce the spread region by rejecting the knots with the lowest occupancy to find the most populated volume.
5. These steps are performed automatically by the DAMAVER package if all the solutions are placed in one directory.
Program DAMAVER, Volkov & Svergun (2003) J. Appl. Crystallogr. 36, 860
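Steps 2 and 4 reduce to simple array operations once the pairwise NSD values and the remapped models are available. Below is a minimal sketch assuming NumPy. The NSD matrix and occupancy maps are hypothetical toy inputs, and SUPCOMB alignment and grid remapping (steps 1 and 3) are not reproduced. Keeping as many knots as an average model occupies is an assumed stopping rule for step 4.

```python
import numpy as np

def most_probable_model(nsd):
    """Step 2: index of the model with the smallest mean NSD to all other models."""
    n = nsd.shape[0]
    mean_nsd = (nsd.sum(axis=1) - np.diag(nsd)) / (n - 1)
    return int(np.argmin(mean_nsd)), mean_nsd

def most_populated_volume(maps):
    """Steps 3-4 for models already remapped onto a common grid.

    maps: boolean array (n_models, n_knots), True where a model occupies a knot.
    Returns the spread region, the occupancy of every knot and the most
    populated volume, which keeps as many knots (highest occupancy first)
    as an average model occupies.
    """
    occupancy = maps.mean(axis=0)
    spread = occupancy > 0
    n_keep = int(round(maps.sum(axis=1).mean()))
    order = np.argsort(-occupancy, kind="stable")
    core = np.zeros(maps.shape[1], dtype=bool)
    core[order[:n_keep]] = True
    return spread, occupancy, core

# Hypothetical toy input: three models, five grid knots
nsd = np.array([[0.0, 0.5, 0.6],
                [0.5, 0.0, 0.7],
                [0.6, 0.7, 0.0]])
reference, mean_nsd = most_probable_model(nsd)
maps = np.array([[1, 1, 1, 0, 0],
                 [1, 1, 0, 1, 0],
                 [1, 1, 1, 0, 0]], dtype=bool)
spread, occupancy, core = most_populated_volume(maps)
print(reference, mean_nsd, spread, occupancy, core)
```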
[Figure: solution spread region; most populated volume; the final model obtained within the solution spread region.]
[Figure: ab initio reconstructions of model bodies: cube; prism 1:2:4; cylinder 2:5; cylinder 1:10; ring 1:3:1; a two-body object. Each panel shows the simulated data (I vs s) with the SASHA and DAMMIN fits, the solution spread region and the most probable volume; average NSD ≈ 0.5 and ≈ 0.9 are indicated.]
Volkov, V.V. & Svergun, D.I. (2003) J. Appl. Crystallogr. 36, 860-864.
[Figure: reconstructions of a disk 5:1 and a disk 10:1 (data I vs s with SASHA and DAMMIN fits, spread region, most probable volume).]
A very long search may provide a more accurate model.
This structure cannot be restored without the use of additional information (average NSD > 1).
[Figure: original body; typical solution with P5 symmetry; typical solution with no symmetry; spread region and most probable volume for each.]
However, symmetry biases the results and must also be used with caution. Always run in P1 first!
[Figure: models obtained in P1, P3, and P3 with a prolate constraint.]
Svergun, D.I., Konrad, S., Huss, M., Koch, M.H.J., Wieczorek, H., Altendorf, K.-H., Volkov, V.V. & Grueber, G. (1998) Biochemistry 37, 17659-17663.
Ab initio tutorial: M. Petoukhov, D. Franke