
Outline Day 1 & 2 Introduction: The protein structure knowledge - PDF document



  1. Swiss Institute of Bioinformatics
     SIB Doctoral School in Bioinformatics
     Advanced Course Protein Structure: Prediction and Analysis
     Day 2: Protein Structure Modeling
     Lausanne, September 1-5, 2008

     Torsten Schwede
     Biozentrum - Universität Basel
     Swiss Institute of Bioinformatics
     Klingelbergstr 50-70
     CH - 4056 Basel, Switzerland
     Tel: +41-61 267 15 81

     Outline Day 1 & 2
     • Introduction: The protein structure knowledge gap
     • Recap: Basic principles of proteins and their 3-dimensional structures
     • Protein structure modeling and prediction
       • Comparative protein structure modeling
       • What happened to fold recognition?
       • De novo prediction
     • Evaluation and assessment of protein structure model quality

     Practicals & Tutorials:
     • Tutorial: Structure Visualization with DeepView (Nicolas Guex, SIB Lausanne)
     • Practical: Examples of comparative modeling and model evaluation

     Exam / credit points:
     • Student presentations on one of the evaluation examples

  2. The number of distinct protein domains in nature is limited.
     • Chothia (1992) Proteins. One thousand families for the molecular biologist. Nature 357: 543-4.
     • Idea: determine structures of representative proteins and then derive most other structures by homology modeling.
     • Structural Genomics (Protein Structure Initiative, Riken in Japan, SPINE in Europe)

     Fold Classification
     • Fold classification is important for systematically studying protein structure evolution.
     • Multi-domain proteins have to be divided into domains prior to classification.
     • There is no consensus on how to delineate the domains.
     • Three main protein structure classification databases are commonly used:
       • SCOP: manual classification based on evolutionary information
       • CATH: semi-automatic classification based on geometric criteria
       • FSSP: automatic classification based on direct structural similarity

  3. Fold Classification Databases
     • The CATH database is a hierarchical domain classification of protein structures in the Brookhaven protein databank.
     • UCL, Janet Thornton & Christine Orengo
     • Clusters proteins semi-automatically at four major levels:
       • Class (C)
       • Architecture (A)
       • Topology (T)
       • Homologous superfamily (H)
     [ http://www.biochem.ucl.ac.uk/bsm/cath_new/ ]

     Protein Structure Classification
     • Class (C): derived from secondary structure content; assigned automatically.
     • Architecture (A): describes the gross orientation of secondary structures, independent of connectivity.
     • Topology (T): clusters structures according to their topological connections and numbers of secondary structures.
     • Homologous Superfamily (H): groups together protein domains which are thought to share a common ancestor and can therefore be described as homologous.
     [ http://www.cathdb.info ]

  4. Fold Classification Databases
     • Top of the hierarchy
     Example: 1EWF

  5. Example: 1EWF

     Fold Classification Databases
     • Structural Classification of Proteins: SCOP
     • MRC Cambridge (UK), Alexey Murzin, Brenner S. E., Hubbard T., Chothia C.
     • Hierarchical classification of protein domain structures
     • Created by manual inspection
     • Comprehensive description of the structural and evolutionary relationships
     • Organized as a hierarchical structure:
       • Class
       • Fold
       • Superfamily
       • Family
       • Species
     [ http://scop.mrc-lmb.cam.ac.uk/scop/ ]

  6. Fold Classification Databases
     The different major levels in the hierarchy are:
     • Fold: Major structural similarity
       Proteins are defined as having a common fold if they have the same major secondary structures in the same arrangement and with the same topological connections.
     • Superfamily: Probable common evolutionary origin
       Proteins that have low sequence identities, but whose structural and functional features suggest that a common evolutionary origin is probable, are placed together in superfamilies.
     • Family: Clear evolutionary relationship
       Proteins clustered together into families are clearly evolutionarily related. Generally, this means that pairwise residue identities between the proteins are 30% and greater.
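The 30% pairwise residue identity guideline for SCOP families can be illustrated with a minimal sketch. It assumes the two sequences are already aligned (equal length, gaps written as "-") and that identity is counted over columns where both sequences have a residue; other denominators are in use, and the function name `percent_identity` is hypothetical, not part of SCOP.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns that contain no gap.

    Assumption: the denominator is the number of residue-residue columns.
    Other conventions (e.g. length of the shorter sequence) exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Toy example: 9 of 10 aligned residues are identical.
identity = percent_identity("ACDEFGHIKL", "ACDEFGHIKV")
same_family_by_rule_of_thumb = identity >= 30.0
```

The threshold comparison only mirrors the rule of thumb quoted on the slide; actual SCOP family assignment is made by manual inspection.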

  7. Fold Classification Databases [figure slides; images not preserved]

  8. qCOPS and Fold Space Navigator
     • Quantitative classification of protein structures, navigation through fold space and visualization of pairwise structure similarities.
     http://www.came.sbg.ac.at

     Protein Structure / Fold Databases
     • PDB: http://www.pdb.org
     • EBI-MSD: http://www.ebi.ac.uk/msd/
     • SCOP: http://scop.mrc-lmb.cam.ac.uk/scop/
     • CATH: http://www.biochem.ucl.ac.uk/bsm/cath_new/
     • FSSP: http://ekhidna.biocenter.helsinki.fi/dali/start

  9. Comparing Protein Structures
     Why do we want to compare protein structures?
     • Classify structures
     • Identify structural movements (induced fit, NMR, etc.)
     • Analyze evolutionary relationships
     • Identify recurring structural motifs
     • Assess quality of predicted models
     • …

     Comparing Protein Structures
     What do we need to compare structures?
     • Protein sequences can be treated as linear strings of letters. For a given similarity matrix, sequences can be aligned optimally using dynamic programming.
     • Protein structures are 3-dimensional objects. We need to find algorithms (analogous to DP for sequences) which find an optimal match for two shapes, given a certain similarity measure.
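As a reminder of what "aligned optimally using dynamic programming" means for sequences, here is a minimal sketch of global alignment in the Needleman-Wunsch style. The scoring is an assumption made for brevity: a flat match/mismatch score stands in for a real similarity matrix, and the gap penalty is linear. The function name and parameter values are illustrative only.

```python
def needleman_wunsch(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment.

    Assumption: simple match/mismatch scores replace a similarity matrix.
    """
    n, m = len(seq_a), len(seq_b)
    # score[i][j] = best score for aligning seq_a[:i] with seq_b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,   # align two residues
                              score[i - 1][j] + gap,     # gap in seq_b
                              score[i][j - 1] + gap)     # gap in seq_a

    # Trace back from the bottom-right corner to recover one alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                out_a.append(seq_a[i - 1])
                out_b.append(seq_b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best_score, top, bottom = needleman_wunsch("ACGT", "AGT")
```

For structures there is no such direct recurrence, because the score of matching one pair of residues depends on how the rest of the structure is superposed; that is the motivation for the dedicated comparison algorithms discussed on the following slides.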

  10. Comparing Protein Structures
      What do we need to compare structures?
      1. Structural feature description
      2. Comparison / superposition algorithms
      3. Distance / similarity measure
      [Figure: Description A and Description B (1) are compared (2) to give a similarity / distance measure (3)]

      Comparing Protein Structures
      Local or global comparison?
      • Global: n = 5
      • Local: n = 4

  11. Comparing Protein Structures
      Distance measure: root mean square deviation
      • Comparing two sets of points (= atoms in structures) $A = \{a_1 \dots a_n\}$ and $B = \{b_1 \dots b_n\}$, with
        $a_i$: position vector of atom $i$ in structure A
        $n$: number of equivalent atoms
      • We need to define a 1:1 correspondence for atoms in A and B.
      • The root mean square distance is calculated from the squared Euclidean distances between corresponding points:

        $$\mathrm{rmsd} = \sqrt{\frac{\sum_{i=1}^{n} (a_i - b_i)^2}{n}}$$

      Comparing Protein Structures
      Distances in Euclidean Space
      • For two points $x = (x_1, x_2, x_3, \dots, x_n)$ and $y = (y_1, y_2, y_3, \dots, y_n)$, a p-norm distance is defined as:

        1-norm distance: $$\sum_{i=1}^{n} |x_i - y_i|$$

        2-norm distance: $$\left( \sum_{i=1}^{n} |x_i - y_i|^2 \right)^{1/2}$$

        p-norm distance: $$\left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$$

      • In Euclidean space $\mathbb{R}^n$, distances are normally given as the Euclidean distance (= 2-norm distance), which is a generalization of the Pythagorean theorem.
      • p need not be an integer, but cannot be less than 1.
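The two definitions on this slide translate almost directly into code. The sketch below is a plain-Python illustration, assuming coordinates are given as tuples of floats and that the 1:1 atom correspondence is simply the order of the two lists; no superposition is performed, so the rmsd is computed for the coordinates as given. The function names are illustrative.

```python
import math


def p_norm_distance(x, y, p=2):
    """p-norm distance between two points of equal dimension (p >= 1)."""
    if p < 1:
        raise ValueError("p must not be less than 1")
    if len(x) != len(y):
        raise ValueError("points must have the same dimension")
    return sum(abs(xi - yi) ** p for xi, yi in zip(x, y)) ** (1.0 / p)


def rmsd(coords_a, coords_b):
    """Root mean square deviation between corresponding atoms.

    Assumption: coords_a[i] corresponds to coords_b[i].
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum((ai - bi) ** 2
                  for a, b in zip(coords_a, coords_b)
                  for ai, bi in zip(a, b))
    return math.sqrt(squared / len(coords_a))


manhattan = p_norm_distance((0.0, 0.0), (3.0, 4.0), p=1)   # 1-norm
euclidean = p_norm_distance((0.0, 0.0), (3.0, 4.0), p=2)   # 2-norm
deviation = rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                 [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)])
```

In the last call only the second atom is displaced, by 2 units, so the squared distances sum to 4 and the rmsd over n = 2 atoms is the square root of 2.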

  12. Comparing Protein Structures
      Distances
      • Mathematically, "distance" is a function which meets the following criteria:
        • One can find the distance between any two points.
        • That distance is a distinct real number.
        • It is positive definite: $d(x, y) \ge 0$, and $d(x, y) = 0$ if and only if $x = y$.
        • It is symmetric: $d(x, y) = d(y, x)$.
        • It satisfies the triangle inequality: $d(x, z) \le d(x, y) + d(y, z)$.
      • Such a function is known as a metric.
      • Geometrically, the right-hand part of the triangle inequality states that the sum of the lengths of any two sides of a triangle is greater than the length of the remaining side.
      [Figure: triangle with vertices X, Y, Z]

      Comparing Protein Structures
      Protein Structure Superposition
      • Assume that we know the correspondence set between A and B (e.g. NMR ensembles, induced fit).
      • Task: Find a rigid transformation RT which best superposes $A = \{a_1 \dots a_n\}$ and $B = \{b_1 \dots b_n\}$.
      • Many solutions in image analysis, mechanical engineering, …
      • Find a rigid transformation RT which minimizes the error function E:

        $$E = \min_{RT} \sum_{i=1}^{n} \left( RT(a_i) - b_i \right)^2$$
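The metric criteria listed on this slide can be spot-checked numerically for a candidate distance function. The sketch below is only a finite test on a handful of sample points, not a proof; the helper name `check_metric_axioms` and the tolerance are assumptions for illustration. It uses `math.dist` (Python 3.8 or later) as the Euclidean distance.

```python
import itertools
import math


def check_metric_axioms(points, dist, tol=1e-12):
    """Return True if dist behaves like a metric on the given sample points."""
    for x, y in itertools.product(points, repeat=2):
        d = dist(x, y)
        if d < 0:
            return False                      # must be non-negative
        if (d <= tol) != (x == y):
            return False                      # zero exactly when x == y
        if abs(d - dist(y, x)) > tol:
            return False                      # symmetry
    for x, y, z in itertools.product(points, repeat=3):
        if dist(x, z) > dist(x, y) + dist(y, z) + tol:
            return False                      # triangle inequality
    return True


def squared_euclidean(x, y):
    """Squared Euclidean distance, used here as a counter-example."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


sample = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (1.0, 2.0, 3.0)]
euclidean_ok = check_metric_axioms(sample, math.dist)

# Three collinear points: d(0,2) = 4 exceeds d(0,1) + d(1,2) = 2.
line = [(0.0,), (1.0,), (2.0,)]
squared_ok = check_metric_axioms(line, squared_euclidean)
```

The counter-example shows why the square root matters: the squared distance is non-negative and symmetric but violates the triangle inequality.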

  13. Comparing Protein Structures
      Protein Structure Superposition
      • The rigid transformation RT has a translational component T and a rotational component R: $RT(a) = R(a) + T$
      • The error function to be minimized becomes:

        $$E = \min_{RT} \sum_{i=1}^{n} \left( R(a_i) - b_i + T \right)^2$$

      Comparing Protein Structures
      1. The translational component
      • The error function is at its minimum with respect to the translation when:

        $$\frac{\partial E}{\partial T} = 2 \sum_{i=1}^{n} \left( R(a_i) - b_i + T \right) = 0$$

        $$T = \frac{1}{n} \left( -R\left( \sum_{i=1}^{n} a_i \right) + \sum_{i=1}^{n} b_i \right)$$

      • If both structures A and B are centered on the coordinate origin, $\sum a_i$ and $\sum b_i$ become 0, and then also $T = 0$.
      • In the first step, we translate the centers of structures A and B to the origin of the coordinate system.
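The translation step can be sketched in a few lines. Setting the derivative of E with respect to T to zero gives T = centroid(B) - R(centroid(A)), which vanishes once both structures are centered on the origin. The code below assumes 3D coordinates as tuples, a rotation given as a 3x3 nested list, and a known 1:1 correspondence by list order; it does not search for the optimal rotation, and all function names are illustrative.

```python
def centroid(coords):
    """Mean position of a list of 3D points."""
    n = len(coords)
    return tuple(sum(p[k] for p in coords) / n for k in range(3))


def center(coords):
    """Translate the points so that their centroid lies at the origin."""
    c = centroid(coords)
    return [tuple(p[k] - c[k] for k in range(3)) for p in coords]


def rotate(R, p):
    """Apply a 3x3 rotation matrix (nested lists) to a point."""
    return tuple(sum(R[r][k] * p[k] for k in range(3)) for r in range(3))


def optimal_translation(coords_a, coords_b, R):
    """T minimizing E for a fixed rotation R: centroid(B) - R(centroid(A))."""
    ra = rotate(R, centroid(coords_a))
    cb = centroid(coords_b)
    return tuple(cb[k] - ra[k] for k in range(3))


def superposition_error(coords_a, coords_b, R, T):
    """E = sum over i of (R(a_i) + T - b_i)^2."""
    total = 0.0
    for a, b in zip(coords_a, coords_b):
        ra = rotate(R, a)
        total += sum((ra[k] + T[k] - b[k]) ** 2 for k in range(3))
    return total


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
A = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (2.0, 2.0, 2.0)]
B = [(x + 5.0, y - 3.0, z + 2.0) for x, y, z in A]   # A shifted, not rotated

T_raw = optimal_translation(A, B, identity)
T_centered = optimal_translation(center(A), center(B), identity)
E_raw = superposition_error(A, B, identity, T_raw)
```

Here B is a pure translation of A, so the recovered T equals the applied shift and the residual error is zero; after centering, the optimal T is the zero vector, which leaves only the rotation to be determined.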
