Protein Structure Prediction 1 Ram Samudrala, University of Washington

Rationale for Understanding Protein Structure and Function
Protein sequence
- large numbers of sequences, including whole genomes
From protein sequence to protein structure: structure determination, structure prediction
Protein structure
- three dimensional
- complicated
- mediates function
From protein structure to protein function: homology, rational mutagenesis, biochemical analysis, model studies
Protein function
- rational drug design and treatment of disease
- protein and genetic engineering
- build networks to model cellular pathways
- study organismal function and evolution

Protein Folding
mRNA: …-CUA-AAA-GAA-GGU-GUU-AGC-AAG-GUU-…
protein sequence: …-L-K-E-G-V-S-K-V-… (each letter is one amino acid)
unfolded protein: not unique, mobile, inactive, expanded, irregular
spontaneous self-organization (~1 second)
native state: unique shape, precisely ordered, stable/functional, globular/compact, helices and sheets

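Protein Folding Landscape
Large multi-dimensional space of changing conformations
[Figure: free energy along the folding reaction, from the unfolded state through the molten globule to the native state, with barrier height $\Delta G^{*}$; annotated with $J = 10^{-3}$ s and $J = 10^{-8}$ s]
$J \propto e^{-\Delta G^{*}/RT}$, where $J$ is the jump time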

Protein Primary Structure
twenty types of amino acids
two amino acids join by forming a peptide bond
each residue in the main chain has two degrees of freedom (φ and ψ)
the amino acid side chains can have up to four degrees of freedom (χ1 to χ4)
[Figure: chemical structures of two amino acids joining through a peptide bond, and of a polypeptide chain with the φ, ψ, and χ angles marked]
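
The φ, ψ, and χ degrees of freedom above are dihedral (torsion) angles, each defined by four consecutive atoms. The following is a minimal sketch of how such an angle can be computed from coordinates; the function name `dihedral_degrees` and the tuple-based coordinates are illustrative assumptions, not part of the original slides. For residue i, φ is measured over the atoms C(i-1), N(i), Cα(i), C(i) and ψ over N(i), Cα(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0, p1, p2, p3):
    """Torsion angle (degrees, -180..180) about the p1-p2 bond.

    Hypothetical helper: each point is an (x, y, z) tuple.
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond.
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

Passing the coordinates of four backbone atoms in order gives one φ or ψ value; the side-chain χ angles can be computed the same way from the appropriate side-chain atoms.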

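Protein Secondary Structure
many φ,ψ combinations are not possible
[Figure: φ/ψ (Ramachandran) plot with both axes running from -180 to +180 and regions labeled β, α, and L; drawings of an anti-parallel β sheet, a parallel β sheet, and an α helix, with N and C termini marked]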

Protein Tertiary and Quaternary Structures
[Figures: Ribonuclease inhibitor (2bnh), Haemoglobin (1hbh), Hemagglutinin (1hgd)]

Methods for Determining Protein Structure
Protein sequence
- large numbers of sequences, including whole genomes
From protein sequence to protein structure: X-ray crystallography, NMR spectroscopy (expensive and slow)
Protein structure
- three dimensional
- complicated
- mediates function
From protein structure to protein function: homology, rational mutagenesis, biochemical analysis, model studies
Protein function
- rational drug design and treatment of disease
- protein and genetic engineering
- build networks to model cellular pathways
- study organismal function and evolution

A Naïve Approach
• Use first principles to produce the native conformation of a protein (ab initio!)
• not only the correct structure, but the entire energy landscape
• it would explain the dynamic behavior of a protein
Let's see how this could work…
• there are only 5 atom types (C, H, O, N, S), so if we can accurately model the interactions between them, we could get to the solution of the folding problem
So why is it so complicated?
• atomic interactions cannot be modeled with sufficient accuracy (and proteins are only marginally stable)
• some phenomena are highly non-linear (for example, Van der Waals forces)
• large number of degrees of freedom, plus the need to model water molecules
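
To illustrate how non-linear a Van der Waals term can be, here is a minimal sketch using the Lennard-Jones 12-6 potential, a common textbook model for such interactions. The slides do not say which functional form is meant, so the choice of model, the function name `lennard_jones`, and the reduced units (ε = σ = 1) are all assumptions.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones 12-6 energy at separation r.

    Assumed model: E(r) = 4 * epsilon * ((sigma / r)**12 - (sigma / r)**6).
    epsilon is the well depth and sigma the distance at which E = 0.
    """
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


# The energy is zero at r = sigma and reaches its minimum (-epsilon)
# at r = 2**(1/6) * sigma; at shorter distances it rises very steeply.
r_min = 2.0 ** (1.0 / 6.0)
energies = {r: lennard_jones(r) for r in (0.5, 1.0, r_min, 2.0)}
```

In these reduced units, halving the separation from r = 1.0 to r = 0.5 takes the energy from 0 to about 16128, which shows why small coordinate errors can dominate an all-atom energy.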

Predictions Needed NOW!!!
• A pure ab initio approach will be out of reach for a long time
• We must adopt a less purist approach
What should we do?
• use approximations
• use all available information
  - vast number of sequences
  - large number of structures
  - functional site information

Methods for Predicting Protein Structure
Protein sequence
- large numbers of sequences, including whole genomes
From protein sequence to protein structure: comparative modeling, fold recognition, ab initio prediction
Protein structure
- three dimensional
- complicated
- mediates function
From protein structure to protein function: homology, rational mutagenesis, biochemical analysis, model studies
Protein function
- rational drug design and treatment of disease
- protein and genetic engineering
- build networks to model cellular pathways
- study organismal function and evolution

Overall Approach
Protein Sequence
→ Database Searching → Multiple Sequence Alignment
→ Domain Assignment
→ Secondary Structure and Disorder Prediction
Homologue in PDB?
- Yes → Comparative Modelling
- No → Fold Recognition
Predicted Fold?
- Yes → Sequence-Structure Alignment
- No → Ab-initio Structure Prediction
→ 3-D Protein Model
modified from http://bioinf.cs.ucl.ac.uk
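
The two decision points in this flowchart can be sketched as a small function. This is only an illustration of the branching logic as read from the slide; the function name `choose_method` and its boolean inputs are hypothetical, and a real pipeline would derive them from database searches and fold-recognition scores.

```python
def choose_method(homologue_in_pdb, fold_predicted):
    """Pick a modelling route from the two yes/no questions in the flowchart.

    Both arguments are assumed to be booleans computed elsewhere.
    """
    if homologue_in_pdb:
        # A related protein of known structure exists.
        return "comparative modelling"
    if fold_predicted:
        # No clear homologue, but fold recognition found a plausible fold.
        return "sequence-structure alignment"
    # Neither a homologue nor a recognisable fold.
    return "ab initio structure prediction"
```

For example, `choose_method(False, True)` returns the fold-recognition route, while `choose_method(False, False)` falls through to ab initio prediction.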

Comparative (Homology) Modeling of Protein Structure
• Aims to produce protein models with high accuracy
• Proteins that have similar sequences (i.e., related by evolution) have similar three-dimensional structures
• A model of a protein whose structure is not known can be constructed if the structure of a related protein has been determined by experimental methods
• Similarity must be obvious and significant for good models to be built
• Need ways to build regions that are not similar between the two related proteins
• Need ways to move the model closer to the native structure

Comparative Modeling of Protein Structure
scan → align → build initial model → construct non-conserved side chains and main chains → refine
…KDHPFGFAVPTKNPDGTMNLMNWECAIP
…KDPPAGIGAPQDN----QNIMLWNAVIP
 ** * *   *  *     * * *   **
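
The identity marks in the alignment on this slide can be reproduced with a few lines of code. The sketch below counts identical and aligned (non-gap) positions for the two sequences shown; the function name `alignment_identity` is an assumption, and the choice of denominator for percent identity (aligned columns rather than full alignment length) is one common convention, not something the slides specify.

```python
def alignment_identity(seq_a, seq_b, gap="-"):
    """Return (identical, aligned) column counts for two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    identical = 0
    aligned = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # gap columns are not counted as aligned
        aligned += 1
        if a == b:
            identical += 1
    return identical, aligned


seq_1 = "KDHPFGFAVPTKNPDGTMNLMNWECAIP"
seq_2 = "KDPPAGIGAPQDN----QNIMLWNAVIP"
identical, aligned = alignment_identity(seq_1, seq_2)
percent_identity = 100.0 * identical / aligned
```

For this pair there are 11 identical positions among 24 aligned columns, matching the 11 marks under the alignment.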

Let's Look Closer at the Steps of Homology Modeling
1. Template recognition and initial alignment
2. Alignment correction
3. Backbone generation
4. Loop modeling
5. Side-chain modeling
6. Model optimization
7. Model validation

1. Template Recognition
Recognition of similarity between the target and the template.
Target: protein with unknown structure.
Template: protein with known structure.
Main difficulty: deciding which template to pick when multiple template structures are available.
Template structures can be found by searching the PDB using sequence-sequence alignment methods.

Two Zones of Sequence Alignment
[Figure: sequence identity (%, axis up to 100) plotted against alignment length (axis up to 200); the upper region is the safe homology modeling zone and the lower region is the twilight zone]

3. Backbone Generation
1. Once the alignment between target and template is ready, copy the backbone coordinates of those template residues that are aligned.
2. If two aligned residues are the same, copy their side-chain coordinates as well.
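
The two copying rules above can be sketched as follows. The data layout is an assumption made for illustration: the alignment is a pair of gapped strings, and each template residue is a dict with an `"atoms"` mapping from atom name to coordinates. The name `build_initial_model` and the backbone atom set are likewise hypothetical choices, not taken from the slides.

```python
BACKBONE_ATOMS = ("N", "CA", "C", "O")  # assumed backbone atom names


def build_initial_model(target_aln, template_aln, template_residues):
    """Copy coordinates from template to target along an alignment.

    Returns one dict per target residue; residues with no aligned
    template residue get an empty atom set (left for loop modeling).
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    model = []
    t_idx = 0  # index into template_residues
    for tgt, tpl in zip(target_aln, template_aln):
        tpl_res = None
        if tpl != "-":
            tpl_res = template_residues[t_idx]
            t_idx += 1
        if tgt == "-":
            continue  # template residue has no counterpart in the target
        if tpl_res is None:
            model.append({"name": tgt, "atoms": {}})  # insertion in target
        elif tgt == tpl:
            # Identical residues: copy backbone and side chain.
            model.append({"name": tgt, "atoms": dict(tpl_res["atoms"])})
        else:
            # Different residues: copy backbone only.
            atoms = {k: v for k, v in tpl_res["atoms"].items()
                     if k in BACKBONE_ATOMS}
            model.append({"name": tgt, "atoms": atoms})
    return model
```

Residues that come back with an empty atom set mark the insertions that the loop-modeling step has to build.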

4. Loop Modeling
Insertions and deletions:
AHYATPTTT
AH---TPSS
They occur mostly between secondary structure elements, in the loop regions.
Loop conformations are difficult to predict.
Approaches to loop modeling:
- knowledge-based: searches the PDB for loops with known structure
- energy-based: an energy function is used to evaluate the quality of a loop, with energy minimization or Monte Carlo
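
As a rough illustration of the energy-based route, the sketch below shows a Metropolis Monte Carlo search over a single coordinate. It is a toy under stated assumptions: the names `metropolis_accept` and `monte_carlo`, the one-dimensional move set, and the parameter values are all hypothetical, and a real loop modeler would sample torsion angles against a molecular energy function.

```python
import math
import random


def metropolis_accept(delta_e, kT, rng=random):
    """Metropolis criterion: always accept downhill moves, accept
    uphill moves with probability exp(-delta_e / kT)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / kT)


def monte_carlo(energy, x0, steps=2000, step_size=0.5, kT=0.1, seed=0):
    """Random-walk search; returns the lowest-energy point visited."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(steps):
        trial = x + rng.uniform(-step_size, step_size)
        e_trial = energy(trial)
        if metropolis_accept(e_trial - e, kT, rng):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


# Toy energy with a single minimum at x = 2.
best_x, best_e = monte_carlo(lambda x: (x - 2.0) ** 2, x0=10.0)
```

The occasional acceptance of uphill moves is what lets this kind of search escape shallow local minima, which plain energy minimization cannot do.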

4. Loop Modeling – Database Approach
Scan the database for protein fragments with the correct number of residues and the correct end-to-end distance.
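
The database scan described here amounts to a filter on fragment length and end-to-end distance. The sketch below assumes each fragment is a list of (x, y, z) Cα coordinates and that a fixed distance tolerance is acceptable; the function name `find_loop_candidates` and the default tolerance of 0.5 Å are illustrative assumptions.

```python
import math


def find_loop_candidates(fragments, n_residues, anchor_distance, tolerance=0.5):
    """Return fragments with the required length whose end-to-end
    distance matches the gap between the two anchor residues."""
    hits = []
    for frag in fragments:
        if len(frag) != n_residues:
            continue  # wrong number of residues
        end_to_end = math.dist(frag[0], frag[-1])
        if abs(end_to_end - anchor_distance) <= tolerance:
            hits.append(frag)
    return hits
```

In practice the surviving candidates would still need to be ranked, for example by how well they fit the anchor residues and avoid clashes with the rest of the model.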

5. Side-Chain Modeling
Side-chain conformations are called rotamers.
In similar proteins, side chains have similar conformations.
If % identity is high, side-chain conformations can be copied from template to target.
If % identity is not very high, side chains are modeled using libraries of rotamers, and the different rotamers are scored with energy functions: $E = \min(E_1, E_2, E_3)$.
Problem: side-chain conformations depend on the backbone conformation, which is predicted, not real.
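
Choosing among rotamers by energy, as in E = min(E1, E2, E3), can be sketched in a few lines. The rotamer labels and energy values below are invented placeholders for illustration only, and `best_rotamer` is a hypothetical helper; a real implementation would take rotamers from a rotamer library and score them with a molecular energy function.

```python
def best_rotamer(rotamers, energy):
    """Return the rotamer with the lowest energy under the given scorer."""
    return min(rotamers, key=energy)


# Placeholder energies for three candidate rotamers of one side chain.
toy_energies = {"rotamer_1": 4.2, "rotamer_2": 1.3, "rotamer_3": 2.8}
chosen = best_rotamer(toy_energies, toy_energies.get)
```

Because the score depends on the surrounding backbone and neighboring side chains, this selection usually has to be repeated as the rest of the model changes.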

6. Model Optimization
• Energy optimization of the entire structure.
• Since the conformation of the backbone depends on the conformations of the side chains and vice versa, an iterative approach is used: predict rotamers → shift in backbone → repeat.

6. Model Optimization???
CASP5 assessors, homology modeling category: “We are forced to draw the disappointing conclusion that, similarly to what observed in previous editions of the experiment, no model resulted to be closer to the target structure than the template to any significant extent.”
The consensus is not to refine the model, as refinement usually pulls the model away from the native structure!

Historical Perspective on Comparative Modeling
               BC          CASP1
alignment      excellent   poor
side chain     ~80%        ~50%
short loops    1.0 Å       ~3.0 Å
longer loops   2.0 Å       >5.0 Å

Prediction for CASP4 target T128/sodm: Cα RMSD of 1.0 Å for 198 residues (PID 50%)

Prediction for CASP4 target T122/trpa: Cα RMSD of 2.9 Å for 241 residues (PID 33%)

Prediction for CASP4 target T125/sp18: Cα RMSD of 4.4 Å for 137 residues (PID 24%)

Prediction for CASP4 target T112/dhso: Cα RMSD of 4.9 Å for 348 residues (PID 24%)

Prediction for CASP4 target T92/yeco: Cα RMSD of 5.6 Å for 104 residues (PID 12%)
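
The Cα RMSD values quoted for these predictions measure the root-mean-square distance between corresponding Cα atoms of the model and the experimental structure. The sketch below shows only the distance calculation and assumes the two coordinate sets are already superimposed and listed in the same residue order; reported values are normally computed after an optimal superposition, which this sketch does not perform. The function name `ca_rmsd` is an assumption.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) C-alpha coordinates.

    Assumes the structures are already superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for a, b in zip(coords_a, coords_b):
        total += sum((x - y) ** 2 for x, y in zip(a, b))
    return math.sqrt(total / len(coords_a))
```

As a sanity check, shifting every atom of a structure by 1 Å along one axis gives an RMSD of exactly 1.0 Å against the original.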
