Studies of flexible systems
D.Svergun, EMBL-Hamburg
DOGMA: No function without structure! Not true!
Flexibility enables more function
Hub proteins in networks
Interaction specialists
P. Tompa et al. Intrinsically disordered proteins: ... in Structural Biology 35 (2015): 49-59.
For monodisperse systems, the scattering is proportional to that of a single particle averaged over all orientations, which allows one to determine the size, shape and internal structure of the particle at low (1-10 nm) resolution. For equilibrium and non-equilibrium mixtures, solution scattering makes it possible to determine the number of components and, given their scattering intensities Ik(s), also the volume fractions vk.
Dealing with flexibility is not easy but possible
Most conveniently represented using the so-called Kratky plot of s²I(s) vs s. This plot provides a sensitive means of monitoring the degree of compactness of a protein as a function of a given parameter.
Globular particle: bell-shaped curve
Gaussian chain: plateau at large s-values, but beware: a plateau does not imply a Gaussian chain
Folded: relatively small Rg and Dmax, bell-shaped Kratky plot (e.g. for folded α-amylase (448 AAs), Rg = 2.4 nm)
Disordered: large Rg and Dmax, increasing Kratky plot (e.g. for IUP tau (441 AAs), Rg = 6.5 nm)
The bell shape vanishes as folded domains disappear and flexibility increases
Receveur-Brechot V. & Durand D. (2012), Curr. Protein Pept. Sci., 13, 55-75
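The two limiting Kratky shapes can be reproduced with a small numerical sketch. It is only an illustration under stated assumptions: the compact particle is represented by the Guinier approximation (strictly valid only at small s), the chain by the Debye formula for an ideal Gaussian chain, and the function names and the s-grid are hypothetical choices, not part of any SAXS package.

```python
import math

def guinier_intensity(s, rg):
    """Guinier approximation, used here as a crude stand-in for a compact particle."""
    return math.exp(-(s * rg) ** 2 / 3.0)

def debye_intensity(s, rg):
    """Debye formula for an ideal Gaussian chain, normalised so that I(0) = 1."""
    x = (s * rg) ** 2
    if x < 1e-8:
        return 1.0  # limit for s -> 0
    return 2.0 * (math.exp(-x) + x - 1.0) / x ** 2

def kratky(s_values, intensity, rg):
    """Return the Kratky representation s^2 I(s) for each s."""
    return [s * s * intensity(s, rg) for s in s_values]

s_values = [0.01 * i for i in range(1, 501)]  # s in nm^-1 (assumed grid)
rg = 2.4                                      # nm, the value quoted for folded alpha-amylase

compact = kratky(s_values, guinier_intensity, rg)
chain = kratky(s_values, debye_intensity, rg)

# Bell shape: a maximum followed by decay towards zero.
peak_s = s_values[compact.index(max(compact))]
print(f"compact model: maximum near s = {peak_s:.2f} nm^-1, last value {compact[-1]:.2e}")

# Plateau: s^2 I(s) levels off close to 2 / Rg^2.
print(f"Gaussian chain: last value {chain[-1]:.3f}, plateau 2/Rg^2 = {2.0 / rg ** 2:.3f}")
```

Plotting `compact` and `chain` against `s_values` gives the bell and the plateau; as the slide warns, seeing a plateau in real data does not by itself prove a Gaussian chain.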
Monodisperse system: $I(s) = 4\pi \int_0^{D} p(r) \frac{\sin(sr)}{sr}\,dr$
Mixture: $I(s) = \sum_k v_k I_k(s)$
vk = volume fraction, Ik(s) = scattering intensity from the k-th component
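Both relations on this slide, the Fourier transform of p(r) and the volume-fraction-weighted sum over components, are easy to evaluate numerically. The sketch below is a minimal illustration assuming a tabulated p(r) and trapezoidal integration; the function names are hypothetical, and the homogeneous sphere is used only because its p(r) is known in closed form.

```python
import math

def intensity_from_pr(s, r_values, p_values):
    """I(s) = 4*pi * integral_0^D p(r) sin(sr)/(sr) dr, by the trapezoidal rule."""
    def integrand(r, p):
        x = s * r
        return p if abs(x) < 1e-12 else p * math.sin(x) / x
    total = 0.0
    for i in range(len(r_values) - 1):
        dr = r_values[i + 1] - r_values[i]
        total += 0.5 * dr * (integrand(r_values[i], p_values[i])
                             + integrand(r_values[i + 1], p_values[i + 1]))
    return 4.0 * math.pi * total

def mixture_intensity(volume_fractions, component_intensities):
    """I(s) = sum_k v_k I_k(s), evaluated at a single s value."""
    return sum(v * i_k for v, i_k in zip(volume_fractions, component_intensities))

def sphere_pr(r, d):
    """Unnormalised p(r) of a homogeneous sphere of diameter d."""
    x = r / d
    return r * r * (1.0 - 1.5 * x + 0.5 * x ** 3) if 0.0 <= r <= d else 0.0

d_max = 10.0  # nm (assumed example size)
r_values = [d_max * i / 1000 for i in range(1001)]
p_values = [sphere_pr(r, d_max) for r in r_values]

i_zero = intensity_from_pr(0.0, r_values, p_values)
i_s = intensity_from_pr(0.3, r_values, p_values)
print(f"I(0.3) / I(0) for the sphere: {i_s / i_zero:.4f}")

# Two-component mixture at one s value (made-up intensities, for illustration only).
print(mixture_intensity([0.25, 0.75], [4.0, 8.0]))
```

The ratio I(s)/I(0) can be checked against the analytical sphere form factor, which is a convenient sanity test before applying the same routine to an experimental p(r).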
SAXS curves
Analysis of the overall size descriptors (Rg, p(r), Kratky)
Modelling: ab initio (DAMMIN/DAMMIF) and Rigid body (BUNCH/CORAL) Analysis of the differences
Rigid Scenario Flexible Scenario
Go for flexibility!
Detection of Flexibility: A Crucial Issue PolyUbiquitin Molecules
2,3,4 and 5 Ubiquitin (72 AA) domains connected by 20 AA linker (RanCH)
Flexible Rigid
Flexible multidomain proteins present fewer features in the SAXS curve than their rigid counterparts
Bernadó Eur. Biophys. J. 2009, 39, 769
Flexible proteins have large Dmax values and a smooth decay of p(r) to p(Dmax) = 0
No more than two peaks, indicating distal correlation between folded domains, appear in the p(r) in the flexible scenario
Additional peaks are only present in the rigid scenario
Bernadó Eur. Biophys. J. 2009, 39, 769
Flexible Rigid
Good fits are obtained in both ab initio and rigid body modelling
No structural variation is observed between solutions
Homogeneous densities are
There is a systematic decrease of resolution
Domains appear isolated, no interdomain contacts are observed in BUNCH solutions
Modelling the Flexible Scenario with Single Conformation Strategies
Bernadó Eur. Biophys. J. 2009, 39, 769
► Smooth scattering profiles and featureless Kratky plots
► Large Rg and Dmax values
► Absence of correlation peaks in the p(r) function
► Low correlation densities in ab initio reconstructions
► Isolated domains in rigid body modelling
► Prediction of disorder using bioinformatics tools
http://www.idpbynmr.eu/home/science/research‐tools.html
A general approach for characterization of flexible systems: Ensemble Optimization Method (EOM)
Problem of standard methods: notoriously flexible systems, e.g.
could often not be interpreted in terms of a single model (either no fit to the data or irreproducibility of reconstructions)
Solution: take flexibility into account by allowing for the co-existence of multiple conformations, which are selected from a large initial random pool
Bernadó, P., Mylonas, E., Petoukhov, M.V., Blackledge, M., & Svergun, D. I . (2007)
Generate a pool (~10^4-10^5) of random models and compute their scattering patterns
Select sub-ensemble(s) (~10^0-10^1) such that their mixture fits the available experimental data (also from deletion mutants, if available)
Analyse the selected ensembles to characterize the flexibility of the macromolecule
Bernadó, P., Mylonas, E., Petoukhov, M.V., Blackledge, M., & Svergun, D. I . (2007)
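The selection step can be illustrated on a toy scale. The sketch below is not the EOM program: it assumes a tiny pool of synthetic Guinier-type curves instead of curves computed from atomic models, a constant error estimate, and an exhaustive search over pairs instead of a genetic algorithm. All names are hypothetical.

```python
import math
from itertools import combinations

def ensemble_intensity(curves):
    """Average the curves of N conformers point by point."""
    n = len(curves)
    return [sum(column) / n for column in zip(*curves)]

def chi_squared(i_exp, sigma, i_calc):
    """Reduced discrepancy after applying a least-squares scale factor to i_calc."""
    num = sum(e * c / sg ** 2 for e, c, sg in zip(i_exp, i_calc, sigma))
    den = sum(c * c / sg ** 2 for c, sg in zip(i_calc, sigma))
    scale = num / den
    resid = sum(((e - scale * c) / sg) ** 2 for e, c, sg in zip(i_exp, i_calc, sigma))
    return resid / (len(i_exp) - 1)

# Toy pool: one synthetic curve per conformer, labelled by its Rg (assumed values).
s_values = [0.02 * i for i in range(1, 26)]
pool_rg = [2.0, 3.0, 4.0, 5.0, 6.0]
pool = [[math.exp(-(s * rg) ** 2 / 3.0) for s in s_values] for rg in pool_rg]

# Synthetic "experimental" data: a mixture of the most compact and most extended
# conformer, on an arbitrary intensity scale.
i_exp = [7.0 * value for value in ensemble_intensity([pool[0], pool[4]])]
sigma = [0.01] * len(s_values)

best_pair = min(
    combinations(range(len(pool)), 2),
    key=lambda pair: chi_squared(i_exp, sigma, ensemble_intensity([pool[k] for k in pair])),
)
print("selected conformers have Rg:", [pool_rg[k] for k in best_pair])
```

With a realistic pool of 10^4-10^5 models an exhaustive search is no longer feasible, which is why a stochastic search is used for the selection.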
Scheme: pool generation, Crysol, genetic algorithm (chromosomes; mutation, crossing and elitism from Generation 1 to Generation 2)
Ensemble scattering: $I(s) = \frac{1}{N} \sum_{n=1}^{N} I_n(s)$
[Figure: selected conformers labelled Rg 1, Rg 2, Rg 3, Rg 4, Rg 5, ... and their Rg distribution]
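The mutation, crossing and elitism operators named in the scheme can be sketched as follows. This is a minimal illustration, not the EOM implementation: the pool consists of synthetic Guinier-type curves rather than Crysol output, the fitness is a plain sum of squared differences rather than a chi-square against experimental errors, and every name and parameter value is a hypothetical choice.

```python
import math
import random

def ensemble_curve(pool, members):
    """Average curve of the conformers listed in `members` (indices into the pool)."""
    return [sum(pool[m][j] for m in members) / len(members)
            for j in range(len(pool[0]))]

def discrepancy(target, curve):
    """Sum of squared differences, a stand-in for the chi-square used with real data."""
    return sum((t - c) ** 2 for t, c in zip(target, curve))

def run_ga(pool, target, ensemble_size=5, population=20, generations=40, seed=0):
    rng = random.Random(seed)

    def fitness(chromosome):
        return discrepancy(target, ensemble_curve(pool, chromosome))

    # Each chromosome is one candidate ensemble: a list of pool indices.
    chromosomes = [[rng.randrange(len(pool)) for _ in range(ensemble_size)]
                   for _ in range(population)]
    history = []
    for _ in range(generations):
        offspring = []
        for chromosome in chromosomes:
            # Mutation: replace one conformer by a random one from the pool.
            mutant = list(chromosome)
            mutant[rng.randrange(ensemble_size)] = rng.randrange(len(pool))
            offspring.append(mutant)
            # Crossing: splice this chromosome with a randomly chosen mate.
            mate = rng.choice(chromosomes)
            cut = rng.randrange(1, ensemble_size)
            offspring.append(chromosome[:cut] + mate[cut:])
        # Elitism: parents and offspring compete, the best survive.
        chromosomes = sorted(chromosomes + offspring, key=fitness)[:population]
        history.append(fitness(chromosomes[0]))
    return chromosomes[0], history

pool_rng = random.Random(1)
s_values = [0.02 * i for i in range(1, 26)]
pool = [[math.exp(-(s * rg) ** 2 / 3.0) for s in s_values]
        for rg in (pool_rng.uniform(2.0, 8.0) for _ in range(200))]
target = ensemble_curve(pool, [0, 1, 2, 3, 4])  # synthetic "data"

best, history = run_ga(pool, target)
print("first generation:", history[0], "last generation:", history[-1])
```

Because parents are kept in the competition, the best discrepancy can never get worse from one generation to the next; repeating the run with different seeds gives a feeling for the reproducibility of the selection.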
Kohn et al. PNAS, 2004, 101, 12491
$R_g = R_0 \cdot N^{\nu}$ (N: number of residues; R0: persistence length; ν: solvent 'quality')
Several experimental and theoretical studies establish ν = 0.598 as an indication of the 'random coil' in chemically denatured (urea or GuHCl) proteins.
Theoretical distribution of the bond and dihedral angles for random chains Quasi Cα -Cα Ramachandran plot
coordinates alone, JMB, 1997, 273, 371‐376
Bond angles vs. Dihedral angles
R0 = 1.927 ± 0.270, ν = 0.598 ± 0.028: 'fully disordered'
IDPs: R0 = 2.54 ± 0.01, ν = 0.522 ± 0.010: 'less disordered'
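As a worked example, the power law relating Rg to chain length can be evaluated with the two parameter sets quoted here. The sketch assumes that R0 is expressed in ångström, so that Rg comes out in ångström; the chain length of 441 residues is the one quoted earlier for tau, and the function name is hypothetical.

```python
def flory_rg(n_residues, r0, nu):
    """Rg = R0 * N**nu; the result has the units of R0 (assumed to be angstrom)."""
    return r0 * n_residues ** nu

N_TAU = 441  # residues, as quoted for tau on the Kratky slide

rg_fully_disordered = flory_rg(N_TAU, r0=1.927, nu=0.598)
rg_less_disordered = flory_rg(N_TAU, r0=2.54, nu=0.522)

print(f"'fully disordered' parameters: Rg = {rg_fully_disordered:.1f} A")
print(f"'less disordered' parameters:  Rg = {rg_less_disordered:.1f} A")
```

The two predictions bracket the Rg of 6.5 nm (65 Å) quoted for tau, which shows how sensitive the expected size is to the choice of exponent.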
Inputs for using EOM: sequence.seq, curve.dat
sequence.seq:
> CYLSRKLMLDARENLKLLDRMNRLSPHSCL
QDRKDFGLPQEMVEGDQLQKDQAFPVLYE
MLQQSFNLFYTEHSSAAWDTTLLEQLCTGL
QQQLDHLDTCRGQVMGEEDSELGNMDPIV
TVKKYF
[Figure: Rg [Å] and Dmax [Å] distributions, pool vs. selected ensembles; Rg = 45.05 and Rg = 32.96, Dmax = 140.38 and Dmax = 101.02]
MRIGMV……..GGVQSHVLQ…..VLRDAGHEVS…….PHVKLPDYVS
missing loop 30 AA
Kratky Plot
apoferritin
vs.
pool
Nter.pdb Cter.pdb seq.seq curve.dat
[Figure: Rg [Å] distribution, pool vs. selected ensembles; Rg = 24.46 and Rg = 24.28]
$I(s) = \sum_k v_k I_k(s)$
❏Symmetric core ❏Symmetric linkers/termini ❏Symmetric core ❏Asymmetric linkers/termini
??missing??
31 AA N-terminal tail
??missing??
pool
high resolution (MX) N-terminal pentamer domain
(full length protein measured in two buffers, with low and high ionic strength respectively)
high resolution (MX) C-terminal monomer domain
??missing??
122 AA inter-domains linker
??missing??
[Figure: Rg (Å) and Dmax (Å) distributions for the pool and for the ensembles selected at high and low ionic strength]
Multi-curves fitting
pool
(full length protein measured in two buffers with low and high ionic strength respectively)
Case extra: dodecamer (P62, 2 domains) + tRNA
158 AA N-terminal tail
high resolution (MX) N-terminal monomer domain (141 AA)
9 AA inter-domains linker
high resolution (MX) C-terminal monomer domain (270 AA)
30 N single strand tRNA
subUnit contact residues range max distance in
[Figure: Rg density distributions of pools with 10, 100, 1 000, 5 000, 10 000 and 64 790 chains]
Generate a pool, select two subpopulations from it and calculate scattering curve for their mixture
[Figure: Rg (Å) distributions for wide subpopulations and for narrow subpopulations]
Comparison of Rg distributions showing that subpopulations of conformers can be identified from a large ensemble if the difference between their mean Rg is greater than approximately two times the standard deviations of the original pool. The Rg values of the two subpopulations are indicated as vertical lines on each plot.
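The resolution criterion stated in this caption can be written down as a simple check. The sketch below is only an illustration: the pool is a made-up Gaussian distribution of Rg values rather than a pool of real conformers, subpopulations are picked by a simple window around a target Rg, and the factor of two is the approximate value from the caption.

```python
import random
import statistics

def resolvable(mean_a, mean_b, pool_sd, factor=2.0):
    """True if two subpopulation means differ by more than ~2 pool standard deviations."""
    return abs(mean_a - mean_b) > factor * pool_sd

def subpopulation(pool, centre, half_width):
    """Select the pool members whose Rg lies within a window around `centre`."""
    return [rg for rg in pool if abs(rg - centre) <= half_width]

rng = random.Random(0)
pool_rg = [rng.gauss(35.0, 8.0) for _ in range(10000)]  # hypothetical pool, Rg in angstrom
pool_sd = statistics.pstdev(pool_rg)

compact = subpopulation(pool_rg, 25.0, 2.0)
extended = subpopulation(pool_rg, 50.0, 2.0)
nearby = subpopulation(pool_rg, 40.0, 2.0)

print("compact vs extended:", resolvable(statistics.mean(compact),
                                         statistics.mean(extended), pool_sd))
print("extended vs nearby: ", resolvable(statistics.mean(extended),
                                         statistics.mean(nearby), pool_sd))
```

The first pair is separated by roughly three pool standard deviations and passes the criterion; the second pair is separated by little more than one and does not.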
EOM resolution and multimodal distributions
EOM 2.0 G. Tria, H. Mertens, M. Kachala, D.I. Svergun, IUCr J. (2015) 2, p. 207
EOM 2.0
Tau protein “structure”
$I(s) = \sum_k v_k I_k(s)$
Mylonas et al. (2008), Biochemistry 47:10345‐10353
NGF (Nerve Growth Factor) is involved in the maintenance and growth of specific neuronal populations, both in the central and peripheral nervous system, and exerts its biological role by binding to different receptors (antibodies). ProNGF consists of a compact dimeric NGF part (2 × 13 kDa) and a flexible pro‐region of 2 × 113 residues. The X‐ray structure of the NGF homodimer (MM = 26 kDa) is known, whereas there is no structural information about the pro‐region of rm‐proNGF.
Structural and functional properties of mouse proNGF
Ab initio models BUNCH models
The analysis by EOM
‐ Pro‐regions are more compact than random chains.
‐ One pro‐region is folded, the other is extended.
SAXS experiments exploited the effect of chemical chaperones. One of them, ammonium sulfate (AMS), decreased the Rg from 2.9 nm for the “naked” protein to 2.65 nm. The increase of the proNGF compactness upon AMS addition confirms the likely nature of proNGF as an intrinsically disordered protein.
(A) without AMS (B) with AMS: BUNCH model of ProNGF with AMS (green pro‐peptides) together with EOM model of ProNGF without AMS (red pro‐peptides).
J.Falces, I.Arregi, P.V.Konarev, M.A.Urbaneja, D.I.Svergun, S.G.Taneva & S.Bañuelos Biochemistry (2010) 49, 9756-9769
Nuclear import of the pentameric nucleoplasmin (NP1) is mediated by importin α, which recognizes its nuclear localization sequence (NLS), and importin β, which interacts with α and is in charge of the translocation of the NP/α/β complex through the nuclear pore. According to ITC measurements, the NP pentamer can bind 5 importin α/β heterodimers with high affinity. The solution structures of the α/β heterodimer, NP/α and NP/α/β were reconstructed using SAXS data.
[Figure labels: Importin, NP pentamer core]
DAMMIF and SASREF models: α/β (1:1), NP/α (1:5), NP/α/β (1:5:5)
EOM analysis: the formed multi-domain complex shows an extended shape, and remains stable by virtue of two attachment points: recognition of the NLS by importin α and recognition of the IBB domain by importin β, which allow for conformational flexibility. This modular and articulated architecture might facilitate the passage of such a big particle through the nuclear pore complex.
► Many biological functions, such as transcription, regulation and cell cycle control, require extensive flexibility
► Flexibility is more common in higher organisms, which have to perform more and more controlled functions. Disorder is correlated with complexity
► High selectivity and moderate affinity properties are often linked to flexibility
► …and not just proteins!!! RNAs are often highly flexible biomolecules
The hexameric Hfq (HfqEc) is involved in riboregulation of target mRNAs by small trans‐encoded RNAs. Hfq proteins of different bacteria comprise an evolutionarily conserved core, whereas the C‐terminus is variable in length. By bioinformatics, NMR, CD and SAXS, the C‐termini are demonstrated to be flexible and to extend laterally away from the hexameric core. The flexible C‐terminal moiety is capable of tethering long and structurally diverse RNA molecules.
Beich‐Frandsen M, Vecerek B, Konarev PV, Sjöblom B, Kloiber K, Hämmerle H, Rajkowitsch L, Miles AJ, Kontaxis G, Wallace BA, Svergun DI, Konrat R, Bläsi U and Djinovic‐Carugo K. (2011) Nucleic Acids Res. 39, 4900‐15
Dynamics and function of the C‐terminus of the E. coli RNA chaperone Hfq
Almeida Ribeiro, E., Beich‐Frandsen, M., Konarev, P. V., Shang, W., Vecerek, B., Kontaxis, G., Hammerle, H., Peterlik,H., Svergun, D. I., Blasi, U. & Djinovic‐Carugo, K. (2012) Nucleic Acids Res. 40, 8072‐8084.
DsrA domain II bound to the RNA chaperone Hfq
SAXS on truncated and full length Hfq complexes reveals 1:1 complexes with a limited flexibility
the sRNA conformational space
Crystal and solution structures of substrate‐bound chitinase from Moritella marina
Chitinases break down glycosidic bonds in chitin, and only a few crystal structures are reported because of the flexibility of these enzymes. The dimeric crystal structure (at BESSY) of chitinase 60 from M. marina (MmChi60) contains four domains: catalytic, two Ig-like, and chitin-binding (ChBD). There are, however, indications of flexibility in the ligand-binding region
SAXS (at EMBL) demonstrates that MmChi60 is monomeric and flexible in
to probe the surface of chitin.
Catalytic domain ChBD
Resolving a SAXS/FRET controversy for IDPs
Chemically denatured proteins and intrinsically disordered proteins (IDPs) populate heterogeneous conformational ensembles in solution. SAXS measures their average size as the radius of gyration (Rg). Single-molecule FRET (smFRET) provides the mean dye-to-dye distance (RE) for proteins with fluorescently labeled termini. Several studies reported inconsistencies between SAXS and smFRET on native and chemically denatured IDPs.
SAXS: Rg changes only marginally upon chemical denaturation of an IDP
smFRET: RE increases significantly when an IDP is chemically unfolded, suggesting that a native IDP is in a “collapsed” state
Fuertes G, Banterle N, Ruff KM, Chowdhury A, Mercadante D, Koehler C, Kachala M, Estrada Girona G, Milles S, Mishra A, Onck PR, Gräter F, Esteban-Martín S, Pappu RV, Svergun DI, Lemke EA. (2017) Proc Natl Acad Sci USA, 114: E6342-E6351
SAXS provides not just the Rg but also the overall shape!
In the above normalized plot A, a map of “asphericity” is given: the further to the right, the more anisometric the average shape is (given the same Rg).
Red: native IDP
Blue: chemically denatured IDP
The native IDP ensembles populate more isometric states compared to the unfolded IDPs
Atomistic simulations of the IDPs were conducted using the CAMPARI force-field with the ABSINTH implicit solvation shell. The ensembles were reweighted to agree with the FRET and SAXS observations, and the chemically denatured ensembles clearly displayed a more anisometric appearance. Therefore, the observed increase in RE is simply a consequence of the higher anisometry of the chemically unfolded IDPs compared to the native ones.
Fuertes G, Banterle N, Ruff KM, Chowdhury A, Pappu RV, Svergun DI, Lemke EA. (2018) Science, 361 pii: eaau8230.
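A toy calculation shows why RE and Rg need not change together: at comparable Rg, a more anisometric chain has a larger end-to-end distance. The sketch below is purely illustrative and unrelated to the atomistic simulations cited above; it compares an ensemble of ideal random walks with a straight rod, both built from unit steps, and all names are hypothetical.

```python
import math
import random

def radius_of_gyration(coords):
    """Root-mean-square distance of the points from their centroid."""
    n = len(coords)
    centre = [sum(c[k] for c in coords) / n for k in range(3)]
    return math.sqrt(sum(sum((c[k] - centre[k]) ** 2 for k in range(3))
                         for c in coords) / n)

def end_to_end(coords):
    """Distance between the first and the last point of the chain."""
    return math.dist(coords[0], coords[-1])

def random_walk(n_steps, rng):
    """Ideal chain: unit steps in uniformly random directions."""
    coords = [(0.0, 0.0, 0.0)]
    for _ in range(n_steps):
        z = rng.uniform(-1.0, 1.0)
        phi = rng.uniform(0.0, 2.0 * math.pi)
        rho = math.sqrt(1.0 - z * z)
        x0, y0, z0 = coords[-1]
        coords.append((x0 + rho * math.cos(phi), y0 + rho * math.sin(phi), z0 + z))
    return coords

def rod(n_steps):
    """Fully extended chain: unit steps along one axis."""
    return [(float(i), 0.0, 0.0) for i in range(n_steps + 1)]

rng = random.Random(0)
walks = [random_walk(200, rng) for _ in range(500)]
mean_re2 = sum(end_to_end(w) ** 2 for w in walks) / len(walks)
mean_rg2 = sum(radius_of_gyration(w) ** 2 for w in walks) / len(walks)
ratio_walk = mean_re2 / mean_rg2

straight = rod(200)
ratio_rod = end_to_end(straight) ** 2 / radius_of_gyration(straight) ** 2

print(f"random walks: <RE^2>/<Rg^2> = {ratio_walk:.2f}")
print(f"rod:           RE^2/Rg^2    = {ratio_rod:.2f}")
```

The ratio is about 6 for the isotropic random walks and close to 12 for the rod, so a shift towards more anisometric conformations raises RE much more than Rg.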
FAQ about EOM for Unstructured Proteins
Structural meaning of the selected conformations: none
An unstructured system cannot be described with only 50 conformations… but SAXS data can.
It is tempting to look at the structures at atomic/residue level… Don’t do that: SAXS is a low resolution technique and the information content is limited.
The structures collected are simply a TOOL to describe the shape distributions.
If a certain structure is collected at each run, it does not mean that it is prevalent in solution.
EOM allows one to quantitatively characterize the flexibility of a particle (which conformations the protein prefers in solution)
Intrinsically unfolded proteins (IDP) can be easily modelled with EOM
SAXS in solution can be used as a complementary technique to model flexible systems, disordered regions, etc.
Basis-Set Supported SAXS (S. Yang, L. Blachowicz, L. Makowski and B. Roux, (2010) PNAS, 107, 15757)
A basis set comprising a small number of assembly states is generated from coarse-grained simulations. From these, a relative population of the different assembly states is determined via a Bayesian-based Monte Carlo procedure seeking to optimize the theoretical scattering profiles against experimental SAXS data.
EROS (S. Yang and B. Roux, (2011) Structure, 19, 3)
Combines SAXS with the results of coarse-grained computer simulations for multi-domain proteins.
Minimal Ensemble Search (M. Pelikan, G. L. Hura and M. Hammel, (2009), Gen. Physiol. Biophys., 28, 174)
A genetic algorithm is used to identify the minimal ensemble required to best fit the experimental data from a pool of models. For these models, molecular dynamics (MD) simulations are used to explore the conformational space.
Basis-Set Supported SAXS Reconstruction
Yang, Blachowicz, Makowski & Roux PNAS 2010, 107, 15757
Large pool of conformations using a physically meaningful force field
Clustering the structures (25 families)
Clustering SAXS curves (9 families)
$I_{calc}(q) = \sum_{i=1}^{N_s} P_i I_i(q)$
$\chi^2 = \sum_{q=q_{min}}^{q_{max}} \frac{1}{\sigma^2(q)} \left[ \log I_{calc}(q) - \log I_{exp}(q) \right]^2$
Bayesian Statistics Pi and Pi
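The idea of fitting populations of a few basis states on a logarithmic intensity scale can be sketched as below. This is a simplified illustration, not the published procedure: the Bayesian Monte Carlo sampling is replaced by a plain grid scan over the population of the first of two states, the basis curves are synthetic, the error estimate is constant, and all names are hypothetical.

```python
import math

def calc_intensity(populations, basis_curves):
    """Population-weighted sum of the basis-state curves."""
    return [sum(p * curve[j] for p, curve in zip(populations, basis_curves))
            for j in range(len(basis_curves[0]))]

def log_chi_squared(i_calc, i_exp, sigma):
    """Sum over q of ((log I_calc - log I_exp) / sigma)^2."""
    return sum(((math.log(c) - math.log(e)) / sg) ** 2
               for c, e, sg in zip(i_calc, i_exp, sigma))

# Two synthetic basis states (assumed Rg of 2 and 5 nm) and synthetic "data"
# generated with populations 0.3 and 0.7.
q_values = [0.02 * i for i in range(1, 26)]
basis = [[math.exp(-(q * rg) ** 2 / 3.0) for q in q_values] for rg in (2.0, 5.0)]
i_exp = calc_intensity([0.3, 0.7], basis)
sigma = [0.05] * len(q_values)

# Grid scan over P1 with P2 = 1 - P1, standing in for the Monte Carlo search.
grid = [i / 20 for i in range(21)]
best_p1 = min(grid, key=lambda p1: log_chi_squared(
    calc_intensity([p1, 1.0 - p1], basis), i_exp, sigma))
print(f"best population of state 1: {best_p1:.2f}")
```

A sampling approach additionally yields an uncertainty for each population, which a single best-fit value from a grid scan does not provide.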
BILBOMD – MES Strategy
Pelikan et al. Gen. Physiol. Biophys. 2009, 28, 174
Uses high temperature MD simulations to generate the pool
Applies size and Rg restrictions in this calculation
The pool is not used as a threshold
A GA is applied using 2, 3, 4 and 5 conformations to describe the curve
Analysis of Rg and Dmax of the conformations collected to decide whether the protein is flexible or rigid
model and normal modes
fit the experimental data
Dealing with flexibility is not easy But the results can be really very nice and interesting
Conclusions
► With SAS one can meaningfully address flexibility. In fact it is the most powerful technique to study large amplitude motions in solution.
► Discerning between flexible and rigid scenarios is fundamental
► Ensemble Methods (EOM, MES, BSS‐SAXS etc.) are appropriate tools to study (potentially) flexible molecules.
► Unique structural information based on distributions can be obtained with ensemble methods