Protein threading



  1. Protein Threading
     • Basic premise
       • Structure is better conserved than sequence
       • The number of unique structural (domain) folds in nature is fairly small (possibly a few thousand)
       • Structure can adopt a wide range of mutations.
       • Physical forces favor certain structures.
       • Number of folds is limited. Currently ~700; total: 1,000 to 10,000
     • Statistics from Protein Data Bank (~35,000 structures)
       • 90% of new structures submitted to PDB in the past three years have similar structural folds in PDB
     [Figure: TIM barrel]

     Concept of Threading
     o Thread (align or place) a query protein sequence onto a template structure in an "optimal" way
     o Good alignment gives approximate backbone structure
     Query sequence: MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE
     [Figure: template set]

     Threading problem
     • Threading: Given a sequence and a fold (template), compute the optimal alignment score between the sequence and the fold.
     • If we can solve the above problem, then
       • Given a sequence, we can try each known fold and find the best fold that fits this sequence.
       • Because there are only a few thousand folds, we can find the correct fold for the given sequence.
     • Threading is NP-hard (when pairwise energy terms are included).

     Components of Threading
     • Template library
       • Use structures from DB classification categories (PDB)
     • Scoring function
       • Single and pairwise energy terms
     • Alignment
       • Consideration of pairwise terms leads to NP-hardness
       • Heuristics
     • Confidence assessment
       • Z-score, P-value similar to sequence alignment statistics
     • Improvements
       • Local threading, multi-structure threading

     Protein Threading – structure database
     • Build a template database
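The fold-recognition idea on this page (thread the query onto each known template and keep the best-scoring fold) can be sketched in a few lines of Python. This is a toy illustration under strong assumptions, not the method of any program named in these slides: it uses gapless threading and a made-up single-residue term only (hydrophobic residues prefer buried sites), so it sidesteps the pairwise terms that make the general problem NP-hard. The names `singleton_energy`, `thread_gapless`, `recognize_fold` and the buried/exposed template strings are hypothetical.

```python
from typing import Dict, Tuple

# Assumption: a crude hydrophobic residue set, for illustration only.
HYDROPHOBIC = set("AVLIMFWC")


def singleton_energy(residue: str, environment: str) -> float:
    """Toy single-residue term: -1 if the residue suits the site, +1 otherwise.

    environment is 'B' (buried) or 'E' (exposed) at a template position.
    """
    buried = environment == "B"
    hydrophobic = residue in HYDROPHOBIC
    return -1.0 if buried == hydrophobic else 1.0


def thread_gapless(sequence: str, template_env: str) -> Tuple[float, int]:
    """Slide the sequence along the template without gaps.

    Returns (lowest energy, offset); (inf, -1) if the template is too short.
    """
    best_energy, best_offset = float("inf"), -1
    for offset in range(len(template_env) - len(sequence) + 1):
        energy = sum(
            singleton_energy(res, template_env[offset + i])
            for i, res in enumerate(sequence)
        )
        if energy < best_energy:
            best_energy, best_offset = energy, offset
    return best_energy, best_offset


def recognize_fold(sequence: str, library: Dict[str, str]) -> Tuple[str, Dict[str, float]]:
    """Try every template and return the name with the lowest energy."""
    scores = {name: thread_gapless(sequence, env)[0] for name, env in library.items()}
    return min(scores, key=scores.get), scores


# Hypothetical two-template library described by buried/exposed patterns.
library = {"fold_a": "EBEBEBE", "fold_b": "EEEEEEE"}
best_fold, all_scores = recognize_fold("LKVEA", library)
print(best_fold, all_scores)
```

Because gaps and pairwise terms are left out, the search here is a simple scan over offsets; a realistic scoring function would need dynamic programming or the heuristics listed under "Components of Threading".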

  2. Protein Threading – energy function
     Query sequence: MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE
     • E_p: how preferable it is to put two particular residues nearby
     • E_s: how well a residue fits a structural environment
     • E_g: alignment gap penalty
     • Total energy: E_p + E_s + E_g
     • Find a sequence-structure alignment that minimizes the energy function

     Assessing Prediction Reliability
     Query sequence: MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE
     • Scores against four candidate templates: Score = -1500, Score = -720, Score = -1120, Score = -900
     • Which one is the correct structural fold for the target sequence, if any?
     • The one with the highest score?

     Prediction of Protein Structures
     • Examples – a few good examples
     [Figure: actual vs. predicted structures]

     Prediction of Protein Structures
     • Not so good example
     [Figure: actual vs. predicted structure]

     Existing Prediction Programs
     • PROSPECT
       • https://csbl.bmb.uga.edu/protein_pipeline
     • FUGUE
       • http://www-cryst.bioc.cam.ac.uk/~fugue/prfsearch.html
     • THREADER
       • http://bioinf.cs.ucl.ac.uk/threader/
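To make the reliability question concrete, the sketch below sums the three energy terms and turns the four raw scores from the slide into Z-scores. It is only an illustration: the slides do not say how the Z-score background is built, so using the template scores themselves as the background is an assumption (a background of scores from shuffled sequences is a common alternative). Lower energy is treated as better, in line with "minimize the energy function". The function and template names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict


def total_energy(e_pair: float, e_single: float, e_gap: float) -> float:
    """Total threading energy: E_p + E_s + E_g."""
    return e_pair + e_single + e_gap


def z_scores(scores: Dict[str, float]) -> Dict[str, float]:
    """Z-score of each raw score against the whole set (assumed background)."""
    values = list(scores.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("all scores are identical; Z-score is undefined")
    return {name: (s - mu) / sigma for name, s in scores.items()}


# The four scores shown on the slide; the template names are placeholders.
library_scores = {
    "template_1": -1500.0,
    "template_2": -720.0,
    "template_3": -1120.0,
    "template_4": -900.0,
}

z = z_scores(library_scores)
best = min(z, key=z.get)  # most negative Z-score = most favorable energy
print(best, round(z[best], 2))
```

A raw score alone does not say whether any template is correct; a Z-score (or P-value) expresses how far the best score stands out from the background, which is the point of the confidence-assessment step.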

  3. CASP/CAFASP
     • CASP: Critical Assessment of Structure Prediction
     • CAFASP: Critical Assessment of Fully Automated Structure Prediction
     [Figure: CASP predictor vs. CAFASP predictor; the automated predictor (1) won't get tired and (2) is high-throughput]

     CASP6/CAFASP4
     • 64 targets
     • Resources for predictors
       • No X-ray, NMR machines (of course)
       • CAFASP4 predictors: no manual intervention
       • CASP6 predictors: anything (servers, google, …)
     • Evaluation:
       • CASP6 assessed by experts + computer
       • CAFASP4 evaluated by a computer program.
       • Predicted structures are superimposed on the experimental structures.
     • CASP7 will be held this year (November)

     Protein structure databases
     • PDB
       • 3D structures
     • SCOP
       • Murzin, Brenner, Hubbard, Chothia
       • Classification
         • Class (mostly alpha, mostly beta, alpha/beta (interspersed), alpha+beta (segregated), multi-domain, membrane)
         • Fold (similar structure)
         • Superfamily (homology, distant sequence similarity)
         • Family (homology and close sequence similarity)
     [Figure: (a) myoglobin (b) hemoglobin (c) lysozyme (d) transfer RNA (e) antibodies (f) viruses (g) actin (h) the nucleosome (i) myosin (j) ribosome. Courtesy of David Goodsell, TSRI]

     Protein databases
     • CATH
       • Orengo et al
       • Class (alpha, beta, alpha/beta, few SSEs)
       • Architecture (orientation of SSEs but ignoring connectivity)
       • Topology (orientation and connectivity, based on SSAP = fold of SCOP)
       • Homology (sequence similarity = superfamily of SCOP)
       • S level (high sequence similarity = family of SCOP)
       • SSAP alignment tool (dynamic programming)

     The SCOP Database
     Structural Classification Of Proteins
     • FAMILY: proteins that are >30% similar, or >15% similar and have similar known structure/function
     • SUPERFAMILY: proteins whose families have some sequence and function/structure similarity suggesting a common evolutionary origin
     • COMMON FOLD: superfamilies that have the same secondary structures in the same arrangement, probably resulting from physics and chemistry
     • CLASS: alpha, beta, alpha–beta, alpha+beta, multidomain

  4. Protein databases
     • FSSP
       • DALI structure alignment tool (distance matrix)
       • Holm and Sander
     • MMDB
       • VAST structure comparison (hierarchical)
       • Madej, Bryant et al

     Protein structure comparison
     • Levels of structure description
       • Atom/atom group
       • Residue
       • Fragment
       • Secondary structure element (SSE)
     • Basis of comparison
       • Geometry/architecture of coordinates/relative positions
       • Sequential order of residues along backbone, ...
       • Physicochemical properties of residues, …

     How to compare?
     • Key problem: find an optimal correspondence between the arrangements of atoms in two molecular structures (say A and B) in order to align them in 3D
     • Optimality of the alignment is determined using a root mean square measure of the distances between corresponding atoms in the two molecules
     • Complication: it is not known a priori which atom in molecule B corresponds to a given atom in molecule A (the two molecules may not even have the same number of atoms)

     Structure Analysis – Basic Issues
     • Coordinates for representing 3D structures
       • Cartesian
       • Other (e.g. dihedral angles)
     • Basic operations
       • Translation in 3D space
       • Rotation in 3D space
     • Comparing 3D structures
       • Root mean square distances between points of two molecules are typically used as a measure of how well they are aligned
       • Efficient ways to compute minimal RMSD once correspondences are known (O(n) algorithm)
         • Using eigenvalue analysis of correlation matrix of points
       • Due to the high computational complexity, practical algorithms rely on heuristics

     Structure Analysis – Basic Issues
     • Sequence order dependent approaches
       • Computationally this is easier
       • Interest in motifs preserving sequence order
     • Sequence order independent approaches
       • More general
       • Active sites may involve non-local AAs
       • Searching with structural information

     Find the optimal alignment
     [Figure]
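The two basic operations listed above, translation and rotation in 3D, can be sketched with plain Cartesian tuples. This is a minimal illustration, not a superposition algorithm: it only rotates about the z-axis and does not search for the optimal rotation (the eigenvalue analysis mentioned on the slide is not implemented here). The helper names `translate`, `rotate_z`, `centroid` and `center` are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def translate(points: List[Point], shift: Point) -> List[Point]:
    """Move every point by the same (dx, dy, dz) vector."""
    dx, dy, dz = shift
    return [(x + dx, y + dy, z + dz) for x, y, z in points]


def rotate_z(points: List[Point], angle_rad: float) -> List[Point]:
    """Rotate every point about the z-axis by angle_rad (counter-clockwise)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


def centroid(points: List[Point]) -> Point:
    """Arithmetic mean of the coordinates."""
    n = len(points)
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )


def center(points: List[Point]) -> List[Point]:
    """Translate the points so that their centroid sits at the origin."""
    cx, cy, cz = centroid(points)
    return translate(points, (-cx, -cy, -cz))


atoms = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
centered = center(atoms)                      # centroid moved to the origin
turned = rotate_z(centered, math.pi / 2)      # quarter turn about z
print(centered, turned)
```

Centering both structures on their centroids is a typical first step before looking for the rotation that minimizes RMSD, since it removes the translational part of the problem.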

  5. Optimal Alignment
     • Which atom in structure A corresponds to which atom in structure B?

         THESESENTENCESALIGN--NICELY
         |||  ||  |||| |||||  ||||||
         THE--SEQUENCE-ALIGNEDNICELY

     Structure Comparison
     • Find the highest number of atoms aligned with the lowest RMSD (Root Mean Squared Deviation)
     • Find a balance between local regions with very good alignments and overall alignment

     Structural Alignment
     • Methods to superimpose structures by translation and rotation
     • Translation: (x1, y1, z1) → (x1 + d, y1, z1); (x2, y2, z2) → (x2 + d, y2, z2); (x3, y3, z3) → (x3 + d, y3, z3)
     • Rotation [figure]

     Structure Comparison
     • [Figure] An optimal superposition of myoglobin and beta-hemoglobin, which are structural neighbors. However, their sequence homology is only 8.5%.

     Structure Comparison
     • Scoring system to find optimal alignment
     • Answer: Root Mean Square Deviation (RMSD)
       $\mathrm{RMSD} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} d_i^{2}}$
       • n = number of atoms
       • d_i = distance between 2 corresponding atoms i in 2 structures

     Root Mean Square Deviation
     • Example with 5 corresponding atom pairs (red and blue structures):
       $\mathrm{RMS} = \sqrt{\frac{1}{5}\sum_{i=1}^{5}\left(X_{\mathrm{RED}\,i} - X_{\mathrm{BLUE}\,i}\right)^{2}} \sim \frac{d_1 + d_2 + d_3 + d_4 + d_5}{5}$
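The RMSD formula above translates directly into code once the atom correspondence is known. The sketch below assumes the two structures are given as equal-length lists of already paired, already superimposed coordinates; it does not search for the correspondence or for the best rotation. The function name `rmsd` and the sample coordinates are illustrative.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def rmsd(a: List[Point], b: List[Point]) -> float:
    """Root mean square deviation between paired atoms a[i] and b[i]."""
    if len(a) != len(b) or not a:
        raise ValueError("structures must be non-empty and have the same number of atoms")
    # Sum of squared distances d_i^2 over all corresponding atom pairs.
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(a, b)
    )
    return math.sqrt(squared / len(a))


# Five made-up atoms and a copy translated by d = 3 along x,
# mirroring the translation example on the slide.
red = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 1.0, 0.0), (3.0, 1.0, 1.0), (4.0, 2.0, 1.0)]
blue = [(x + 3.0, y, z) for x, y, z in red]

print(rmsd(red, red))   # identical structures
print(rmsd(red, blue))  # every d_i equals the shift
```

When every pairwise distance is the same, as in the translated copy, the RMSD equals that distance and so matches the plain average of the d_i; when the distances differ, RMSD weights the large deviations more heavily than the average does.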
