Protein Structure Bioinformatics: Introduction, Secondary Structure

Introduction to Protein Structure Bioinformatics
29.9.2004

Protein Structure Bioinformatics
Introduction, Secondary Structure Prediction & Fold Recognition
EMBnet course, Basel, September 29, 2004
Lorenza Bordoli, Swiss Institute of Bioinformatics

Overview
• Introduction
• Secondary Structure Prediction
• Fold Recognition

Principles of protein structure
• Primary Structure
• Secondary Structure
• Tertiary Structure (Fold)
• Quaternary Structure

Principles of protein structure
Protein structure includes:
• Core region:
  – Secondary structure elements packed in close proximity in a hydrophobic environment
  – Limited amino acid substitution
• Outside the core:
  – Loops and structural elements in contact with water, membrane or other proteins
  – Amino acid substitution: not as restricted as in the core

[Figure: PDB Holdings]

Protein Structure Databases
• PDB: http://www.pdb.org
  – X-ray, NMR => atom coordinates of the proteins are deposited in the PDB, the worldwide repository for 3-D biological macromolecular structure data.
• EBI-MSD: http://www.ebi.ac.uk/msd/ (2003)
  – Suite of web-based search and retrieval interfaces for macromolecular structure research.

Protein Structure Databases
• http://www.wwpdb.org/

Introduction
• Goal: What is the relationship between amino acid sequence and three-dimensional structure in proteins? Can we predict the structure from the sequence?
• Currently: comparative (homology) modeling; see the lecture on Thursday (Torsten)

Homology Modeling
• Homology modeling = comparative protein modeling
• Structure is better conserved than sequence
• Similar sequence → similar structure
• Idea: use experimental 3D structures of related family members (templates) to calculate a model for a new sequence (target).

Flow chart: analyze a new protein sequence
[Figure: flow chart. Box labels recovered from the slide:]
• Protein sequence
• Database similarity search (BLAST)
• Does sequence align with a protein of known structure?
• Protein family search (Pfam)
• Relationship to known structure?
• Homology modeling
• Predicted 3D structural model
• Structure prediction (secondary structure, fold recognition)
• Hints for domain assignment? Function?
• 3D structural analysis in laboratory

Secondary structure assignment
• DSSP
  – Dictionary of Secondary Structure of Proteins (Kabsch & Sander, 1983)
  – Based on recognition of hydrogen-bonding patterns in known structures
• Automated assignment of secondary structure
  – Interprets backbone hydrogen bonds
  – Uses a Coulomb approximation for the hydrogen bond energy (-0.5 kcal/mol cut-off)
  – Secondary structures are assigned to consecutive segments of residues with hydrogen bonds
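The Coulomb approximation mentioned for DSSP can be sketched as follows. The slide only gives the -0.5 kcal/mol cut-off; the partial charges (0.42e, 0.20e) and the factor 332 are taken from the Kabsch & Sander definition and are not stated on the slide. The function names and the test geometry are hypothetical, chosen only for illustration.

```python
import math

# Assumed constants from the Kabsch & Sander (1983) definition (not on the slide):
Q1 = 0.42        # partial charge on the C=O group, in units of e
Q2 = 0.20        # partial charge on the N-H group, in units of e
F = 332.0        # dimensional factor, kcal * Angstrom / (mol * e^2)
CUTOFF = -0.5    # kcal/mol, the cut-off quoted on the slide


def hbond_energy(c, o, n, h):
    """Electrostatic energy (kcal/mol) between a C=O group and an N-H group.

    Each argument is an (x, y, z) coordinate tuple in Angstrom.
    """
    r_on = math.dist(o, n)
    r_ch = math.dist(c, h)
    r_oh = math.dist(o, h)
    r_cn = math.dist(c, n)
    return Q1 * Q2 * F * (1 / r_on + 1 / r_ch - 1 / r_oh - 1 / r_cn)


def is_hbond(c, o, n, h):
    """True if the energy is below the -0.5 kcal/mol cut-off."""
    return hbond_energy(c, o, n, h) < CUTOFF


# Hypothetical collinear geometry: C=O ... H-N with an O...N distance of 2.9 A.
C, O, H, N = (0.0, 0, 0), (1.23, 0, 0), (3.13, 0, 0), (4.13, 0, 0)
print(round(hbond_energy(C, O, N, H), 2), is_hbond(C, O, N, H))
```

With this idealized geometry the energy comes out near -3 kcal/mol, well below the cut-off; moving the N-H group far away brings the energy close to zero and the bond is no longer counted.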

Secondary structure assignment
• DSSP secondary structure elements
• 8 secondary structure classes
  – H (α-helix) → H
  – G (3₁₀-helix) → H
  – I (π-helix) → H
  – E (extended strand) → E
  – B (residue in isolated β-bridge) → E
  – T (turn) → L
  – S (bend) → L
  – " " (blank = other) → L

Secondary Structure prediction
• What is protein secondary structure prediction?
  – Simplification of the prediction problem
  – 3D → 1D
• Why do we need it?
  – As a starting point for 3D modeling:
    • Improve sequence alignments
    • Use in fold recognition (discover family/superfamily relationships)
    • Definition of loops / core regions
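The reduction of the 8 DSSP classes to 3 states listed above can be written as a simple lookup. This is a minimal sketch; the function name is hypothetical, and treating any unlisted character as loop (L) is an assumption, not something the slide specifies.

```python
# Mapping of the 8 DSSP classes to the 3-state alphabet, as listed on the slide.
DSSP_TO_3STATE = {
    "H": "H",  # alpha-helix
    "G": "H",  # 3-10 helix
    "I": "H",  # pi-helix
    "E": "E",  # extended strand
    "B": "E",  # residue in isolated beta-bridge
    "T": "L",  # turn
    "S": "L",  # bend
    " ": "L",  # blank = other
}


def reduce_dssp(ss8):
    """Convert a DSSP 8-state string into a 3-state (H/E/L) string.

    Assumption: characters not in the table are mapped to 'L'.
    """
    return "".join(DSSP_TO_3STATE.get(state, "L") for state in ss8)


print(reduce_dssp("HGIEBTS "))
```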

Secondary Structure prediction
• Assumption:
  – There should be a correlation between amino acid sequence and secondary structure
• What can we predict?
  – α-helix
  – β-strand
  – Loop (coil)

Secondary Structure prediction
• Projection onto strings of structural assignments
• "Secondary Structure" 3-state model:
  – (S) β-strand (E in DSSP notation)
  – (H) α-helix
  – (L) loop

SEQ MRIILLGAPGAGKGTQAQFIMEKYGIPQISTGDMLRAAVKSGSELGKQAK
SS  SSSSSSLLLLLLHHHHHHHHHHHLLLSSSLHHHHHHHHHHHLLLLLLHHH
SS  SSSSSS      HHHHHHHHHHH   SSS HHHHHHHHHHH      HHH

Accuracy of prediction
• 3-state-per-residue accuracy:
  – Gives the % of correctly predicted residues in α, β or other state
  – $Q_3 = 100 \cdot \frac{\sum_i c_i}{N}$
    • $N$ = total number of residues
    • $c_i$ = number of correctly predicted residues in state $i$ (H, E, L)

Performance Evaluation
• Assumption: there should be a correlation* between amino acid sequence and secondary structure
• Systematic performance testing is a prerequisite for the reliability of a method

[Diagram: the PDB dataset is split into a training set (PDB subset: derive correlation*) and a test set (PDB subset: => Q3)]
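The Q3 measure defined above can be computed directly from an observed and a predicted 3-state string. This is a minimal sketch; the function name and the two example strings are hypothetical.

```python
def q3(observed, predicted):
    """Three-state per-residue accuracy: Q3 = 100 * sum_i(c_i) / N.

    c_i is the number of residues correctly predicted in state i (H, E, L)
    and N is the total number of residues.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("strings must be non-empty and of equal length")
    correct = {"H": 0, "E": 0, "L": 0}
    for obs, pred in zip(observed, predicted):
        if obs == pred and obs in correct:
            correct[obs] += 1
    return 100.0 * sum(correct.values()) / len(observed)


# Hypothetical example: 8 of 10 residues are predicted correctly.
print(q3("HHHHEEEELL", "HHHLEEELLL"))  # 80.0
```

In a real evaluation the observed string would come from the test set only, so that the statistics derived from the training set are not scored on the data they were derived from.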

[Figure: Conformational Preferences (α, β, RT). Source: Biochimica et Biophysica Acta 916: 200-204 (1987).]

1st Generation secondary structure prediction
• 1st generation based on single amino acid propensities
  – Chou and Fasman, 1974
  – Robson, 1976
  – GOR-1: Garnier, Osguthorpe, and Robson, 1978
• Preference of particular residues for certain secondary structure elements:
  – Single-residue statistics: analysis of the frequency of each of the 20 amino acids in α-helices, β-strands or coils
  – Databases of very limited size
  – < 55% Q3 accuracy

1st Generation secondary structure prediction
• Chou and Fasman (partial table):

Amino Acid  Pα    Pβ    Pt
Glu         1.51  0.37  0.74
Met         1.45  1.05  0.60
Ala         1.42  0.83  0.66
Val         1.06  1.70  0.50
Ile         1.08  1.60  0.50
Tyr         0.69  1.47  1.14
Pro         0.57  0.55  1.52
Gly         0.57  0.75  1.56

Chou-Fasman Pij values

Name           P(H)  P(E)  P(turn)  f(i)   f(i+1)  f(i+2)  f(i+3)
Alanine        142   83    66       0.06   0.076   0.035   0.058
Arginine       98    93    95       0.07   0.106   0.099   0.085
Aspartic Acid  101   54    146      0.147  0.11    0.179   0.081
Asparagine     67    89    156      0.161  0.083   0.191   0.091
Cysteine       70    119   119      0.149  0.05    0.117   0.128
Glutamic Acid  151   37    74       0.056  0.06    0.077   0.064
Glutamine      111   110   98       0.074  0.098   0.037   0.098
Glycine        57    75    156      0.102  0.085   0.19    0.152
Histidine      100   87    95       0.14   0.047   0.093   0.054
Isoleucine     108   160   47       0.043  0.034   0.013   0.056
Leucine        121   130   59       0.061  0.025   0.036   0.07
Lysine         114   74    101      0.055  0.115   0.072   0.095
Methionine     145   105   60       0.068  0.082   0.014   0.055
Phenylalanine  113   138   60       0.059  0.041   0.065   0.065
Proline        57    55    152      0.102  0.301   0.034   0.068
Serine         77    75    143      0.12   0.139   0.125   0.106
Threonine      83    119   96       0.086  0.108   0.065   0.079
Tryptophan     108   137   96       0.077  0.013   0.064   0.167
Tyrosine       69    147   114      0.082  0.065   0.114   0.125
Valine         106   170   50       0.062  0.048   0.028   0.053

Chou-Fasman
How it works:
a. Assign all of the residues the appropriate set of parameters.
b. Identify α-helix and β-sheet regions. Extend the regions in both directions.
c. If structures overlap, compare average values for P(H) and P(E) and assign secondary structure based on the best scores.
d. Turns are modeled as tetrapeptides using 2 different probability values.

Assign Pij values
1. Assign all of the residues the appropriate set of parameters

         T    S    P    T    A    E    L    M    R    S    T    G
P(H)     69   77   57   69   142  151  121  145  98   77   69   57
P(E)     147  75   55   147  83   37   130  105  93   75   147  75
P(turn)  114  143  152  114  66   74   59   60   95   143  114  156
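Step 1 (parameter assignment) amounts to a table lookup per residue. The sketch below uses the P(H), P(E) and P(turn) columns of the Chou-Fasman table from the slides, keyed by standard one-letter codes; the dictionary and function names are hypothetical. Note that the slide's example row lists 69/147/114 under T, which coincide with the Tyrosine row of the table; the sketch follows the table's Threonine row, so its values for T differ from the slide.

```python
# (P(H), P(E), P(turn)) per residue, from the Chou-Fasman table on the slides.
CHOU_FASMAN = {
    "A": (142, 83, 66),   "R": (98, 93, 95),    "D": (101, 54, 146),
    "N": (67, 89, 156),   "C": (70, 119, 119),  "E": (151, 37, 74),
    "Q": (111, 110, 98),  "G": (57, 75, 156),   "H": (100, 87, 95),
    "I": (108, 160, 47),  "L": (121, 130, 59),  "K": (114, 74, 101),
    "M": (145, 105, 60),  "F": (113, 138, 60),  "P": (57, 55, 152),
    "S": (77, 75, 143),   "T": (83, 119, 96),   "W": (108, 137, 96),
    "Y": (69, 147, 114),  "V": (106, 170, 50),
}


def assign_parameters(sequence):
    """Return three parallel lists: P(H), P(E) and P(turn) for each residue."""
    rows = [CHOU_FASMAN[residue] for residue in sequence]
    p_helix = [row[0] for row in rows]
    p_strand = [row[1] for row in rows]
    p_turn = [row[2] for row in rows]
    return p_helix, p_strand, p_turn


p_h, p_e, p_t = assign_parameters("TSPTAELMRSTG")
print("P(H)   ", p_h)
print("P(E)   ", p_e)
print("P(turn)", p_t)
```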

Scan peptide for α-helix regions
2. Identify regions where 4/6 aa have a P(H) > 100: "alpha-helix nucleus"

      T    S    P    T    A    E    L    M    R    S    T    G
P(H)  69   77   57   69   142  151  121  145  98   77   69   57

Extend α-helix nucleus
3. Extend helix in both directions until a set of four residues have an average P(H) < 100.

      T    S    P    T    A    E    L    M    R    S    T    G
P(H)  69   77   57   69   142  151  121  145  98   77   69   57

Repeat steps 1-3 for entire peptide
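Steps 2 and 3 can be sketched as a sliding-window scan followed by an extension loop. The P(H) values are copied from the slide's example row. How exactly the four-residue window is placed during extension is not spelled out on the slide; here it is assumed to cover the candidate residue plus the three adjacent residues already in the helix. Merging overlapping segments is also an assumption, and all function names are hypothetical.

```python
SEQ = "TSPTAELMRSTG"
# P(H) row exactly as shown on the slide.
PH = [69, 77, 57, 69, 142, 151, 121, 145, 98, 77, 69, 57]


def find_helix_nuclei(ph, window=6, min_hits=4, threshold=100):
    """Start indices of windows in which at least min_hits values exceed threshold."""
    return [start for start in range(len(ph) - window + 1)
            if sum(v > threshold for v in ph[start:start + window]) >= min_hits]


def extend(ph, start, end, probe=4, threshold=100):
    """Grow the half-open segment [start, end) one residue at a time.

    Assumption: growth stops when the four residues at the growing end
    (the candidate plus its three neighbours inside the helix) average
    below the threshold.
    """
    def mean(values):
        return sum(values) / len(values)

    while end < len(ph) and mean(ph[end - probe + 1:end + 1]) >= threshold:
        end += 1
    while start > 0 and mean(ph[start - 1:start - 1 + probe]) >= threshold:
        start -= 1
    return start, end


def predict_helices(ph, window=6):
    """Extend every nucleus and merge overlapping segments."""
    segments = []
    for nucleus_start in find_helix_nuclei(ph, window):
        seg = extend(ph, nucleus_start, nucleus_start + window)
        if segments and seg[0] <= segments[-1][1]:
            last = segments[-1]
            segments[-1] = (min(last[0], seg[0]), max(last[1], seg[1]))
        else:
            segments.append(seg)
    return segments


for start, end in predict_helices(PH):
    print(start, end, SEQ[start:end])  # 2 10 PTAELMRS
```

On the slide's values, three overlapping windows qualify as nuclei, and each extends to the same segment. Under the assumptions above, the rightward extension stops where the window M-R-S-T averages 97.25.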
