Kpax Protein Structure Alignment




  1. Kpax – Protein Structure Alignment
     Dave Ritchie
     Team Orpailleur, Inria Nancy – Grand Est

     Outline
     - Overview of Protein Sequences and Structures
     - Structural Alignment Using Dynamic Programming
     - The Kpax Algorithm Explained
     - Demo: Using Kpax on Linux
     - Practical: Homology Modeling Using Kpax + Modeler

     Protein Sequences and Structures
     Source: "The Gam protein of bacteriophage Mu is an orthologue of eukaryotic Ku", F.A. di Fagagna et al., EMBO Reports (2003), 4, 47–52

     Comparing Two Strings
     Q. Suppose we have two strings, e.g. EXPONENTIAL and POLYNOMIAL. How do we measure their similarity?
     A1. In information theory, the edit distance measures the cost of transforming one string into another using one-character edits.
     A2. Match 3 letters and then give a score for each pair...

          POLYNOMIAL
                 |||
         EXPONENTIAL

     Q. Suppose gaps are allowed. What is the best possible alignment?
     A. How about

         --POLYNOM-IAL        --POLYNOMIAL
           || |   |||    or     || |  |||
         EXPO--NENTIAL        EXPONEN-TIAL

     Q. Which is better?
     A1. The second one? (6 matches + 3 gaps vs 6 matches + 5 gaps)
     A2. ... It depends on the score for each pair and the penalty for a gap.
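     To make answer A2 concrete, here is a minimal Python sketch that scores the two gapped alignments above. The function name `score_alignment` and the numeric values for match, mismatch and gap are illustrative assumptions; the slides do not give any.

```python
def score_alignment(top, bottom, match=1.0, mismatch=0.0, gap=0.5):
    """Score two equal-length aligned strings; '-' marks a gap.

    The default score values are assumptions chosen for illustration.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have equal length")
    total = 0.0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total -= gap        # every gap position pays one penalty
        elif a == b:
            total += match      # identical letters
        else:
            total += mismatch   # aligned but different letters
    return total


# The two candidate alignments from the slide.
first = score_alignment("--POLYNOM-IAL", "EXPO--NENTIAL")
second = score_alignment("--POLYNOMIAL", "EXPONEN-TIAL")
print(first, second)
```

     With these defaults the second alignment wins (4.5 against 3.5). With `mismatch=-1.0` and `gap=0.25` the order flips (2.25 against 2.75), which is why the answer depends on the chosen scores and penalties.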

  2. Dynamic Programming
     Dynamic programming (DP) is a method of dividing a problem into smaller sub-problems. It was first described by Richard Bellman in the 1940s. But instead of using recursion, it uses a table ("memoisation" in 1940s language).
     Goal: find similarity E(n, m) between two strings: x[1:n] and y[1:m]
     Sub-goal: find E(i, j) between two prefixes: x[1:i] and y[1:j]
     Observation: the best alignment must end with one of three columns: x[i] over y[j], x[i] over a gap, or a gap over y[j]
     Method: build similarity table with scores S(i, j) and penalties P(i):

     $$E(i,j) = \max \begin{cases} E(i-1,\,j-1) + S(i,j) \\ E(i,\,j-1) - P(i) \\ E(i-1,\,j) - P(j) \end{cases}$$

     Then, "trace back" from E(n, m) to E(1, 1) to extract the alignment.

     Back-Tracking Through The DP Scoring Table
     [Figure: DP scoring table with POLYNOMIAL along the columns (0 to 11) and EXPONENTIAL down the rows (0 to 12), with back-tracking pointers in each cell]
     This gives the desired optimal alignment:

         --POLYNOMIAL
           || |  |||
         EXPONEN-TIAL

     3D Least-Squares Fitting
     Least-squares fitting finds the 3D rotation/translation matrix M that minimises the sum of squared distances:

     $$F = \sum_{i=1}^{N} \left( x^{A}_{i} - M \cdot x^{B}_{i} \right)^{2}$$

     For proteins, the x_i are normally Cα atom coordinates.
     The translational part is easy – shift centres of mass to the origin.
     The rotation can be found using eigenvector or quaternion methods.
     The residual error (RMSD) is then given by

     $$\mathrm{RMSD} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( x^{A}_{i} - M \cdot x^{B}_{i} \right)^{2} }$$

     So, given a list of aligned Cα's, we can fit optimally to some RMSD.

     So, What's The Problem?
     DP is "perfect" for 1D string matching.
     Least-squares fitting is "perfect" for 3D superposition.
     BUT
     - Proteins are not made of 1D symbols or 3D points. They are made of complex 3D chemical components (amino acid residues). It is difficult to write a good scoring function to compare residues...
     - Similar 1D protein sub-sequences can have different 3D shapes (α-helices, β-strands), i.e. global environment can affect local shape.
     - We don't know a priori the right 1D pairings for 3D fitting...
     - Proteins are globally flexible. Even if many local 1D regions "match", not all of them might simultaneously superpose well in 3D space...
     ADDITIONALLY!
     - Proteins can contain multiple repeats and/or transpositions...
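     The DP recurrence and trace-back described above can be sketched in a few lines of Python. This is a minimal illustration, not Kpax code: the function name `align`, the integer scores, and the use of a single constant gap penalty in place of P(i) and P(j) are all assumptions. The trace-back here runs to cell (0, 0) so that leading gaps are emitted.

```python
def align(x, y, match=2, mismatch=0, gap=1):
    """Global alignment of strings x and y by dynamic programming.

    Scores and the constant gap penalty are illustrative assumptions.
    Returns (score, aligned_x, aligned_y).
    """
    n, m = len(x), len(y)
    # E[i][j] = best score for aligning the prefixes x[:i] and y[:j]
    E = [[0] * (m + 1) for _ in range(n + 1)]
    # move[i][j] remembers which of the three cases produced E[i][j]
    move = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        E[i][0], move[i][0] = -gap * i, "up"
    for j in range(1, m + 1):
        E[0][j], move[0][j] = -gap * j, "left"

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            E[i][j], move[i][j] = max(
                (E[i - 1][j - 1] + s, "diag"),  # x[i] paired with y[j]
                (E[i][j - 1] - gap, "left"),    # gap in x, letter of y
                (E[i - 1][j] - gap, "up"),      # letter of x, gap in y
            )

    # Trace back from the bottom-right corner to the origin.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if move[i][j] == "diag":
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif move[i][j] == "up":
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return E[n][m], "".join(reversed(ax)), "".join(reversed(ay))


score, top, bottom = align("EXPONENTIAL", "POLYNOMIAL")
print(score)
print(top)
print(bottom)
```

     Under these assumed scores the slide's alignment with 6 matches and 3 gaps is worth 9, so the optimal score returned is at least 9. Ties between equally good moves are broken arbitrarily, so the printed alignment may differ from the one on the slide while scoring the same or better.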

  3. Over 100 Structure Alignment Algorithms in 25 Years
     http://en.wikipedia.org/wiki/Structural_alignment_software
     90 more...

     Quick List of Structural Alignment Approaches
     - "elastic" Gaussian scoring
     - "double dynamic programming" on Cα distance matrices
     - triples or higher fragments (8-tuples) of Cα atoms
     - backbone Cα vectors
     - backbone torsion angles
     - secondary structure elements
     - geometric hashing
     - Voronoi tessellations
     - structural alphabets
     - Lagrangian contact map optimisation
     - eigenvector analysis of distance matrices
     - Fourier correlations
     - Gaussian fragments
     - ...

     Introducing Kpax
     http://kpax.loria.fr/
     - Dynamic programming with Gaussian scores
     - Uses NO sequence similarity OR secondary structure information
     - Very fast database search (CATH, SCOP, Pfam, ..., user-defined)
     - Rigid and flexible structural alignments
     - Multiple flexible alignments coming soon...

     Defining Local Coordinate Frames
     All Cα atoms have highly conserved tetrahedral geometry.
     Exploit this to define a "canonical" Cα–C–N orientation, e.g. put Cα at origin; C on -ve z axis; N in +ve xz plane.
     Now, ALL α-helices and β-strands look the same at the origin.
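     The canonical orientation described above (Cα at the origin, C on the negative z axis, N in the positive-x half of the xz plane) can be built from three atom positions. The sketch below is an assumption-laden illustration in plain Python: the helper names, the right-handed choice for the y axis, and the sample coordinates are made up and are not taken from Kpax.

```python
import math


def sub(a, b):
    return tuple(ai - bi for ai, bi in zip(a, b))


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a):
    norm = math.sqrt(dot(a, a))
    return tuple(ai / norm for ai in a)


def canonical_frame(ca, c, n):
    """Return local axes (ex, ey, ez) for one residue.

    ez points from C towards CA, so C lies on the negative z axis.
    ex is the part of the CA->N vector orthogonal to ez, so N lies in
    the xz plane with positive x. ey completes a right-handed frame
    (an assumption; the slides do not state handedness).
    """
    ez = unit(sub(ca, c))
    v = sub(n, ca)
    vz = dot(v, ez)
    ex = unit(tuple(vi - vz * ezi for vi, ezi in zip(v, ez)))
    ey = cross(ez, ex)
    return ex, ey, ez


def to_local(point, ca, frame):
    """Express a point in the local frame whose origin is the CA atom."""
    d = sub(point, ca)
    return tuple(dot(d, axis) for axis in frame)


# Made-up coordinates, for illustration only.
ca = (1.0, 2.0, 3.0)
c = (1.0, 2.0, 4.5)
n = (2.0, 2.5, 2.8)
frame = canonical_frame(ca, c, n)
local_c = to_local(c, ca, frame)
local_n = to_local(n, ca, frame)
print(local_c, local_n)
```

     Applying `to_local` to the neighbouring Cα atoms of each residue gives the per-residue local coordinates that the later slides compare.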

  4. Comparing Structural Fragments
     In the canonical frame, similar structures have similar distances between their up-stream and down-stream Cα atoms.
     [Figure: structural fragments compared in the canonical frame]
     But how to combine all the distances into a single score?

     Representing Local Geometry as a Product of Gaussians
     Calculate Gaussian distribution of all Cα atoms in CATH:
     [Figure: scatter of CATH Cα positions in the local frame; axes x, y, z, ticks from -3 to +3]
     Gives Gaussian width σ_k for each up-stream and down-stream Cα.
     Then, represent residue i as a product of Gaussians:

     $$\psi_i = \phi_i^{-n}(x_{i-n}) \cdots \phi_i^{-1}(x_{i-1})\, \phi_i^{+1}(x_{i+1}) \cdots \phi_i^{+n}(x_{i+n})$$

     Each individual Gaussian function has the form:

     $$\phi_i^{k}(x_{i+k}) = N_k\, e^{-\beta_k r_k^2 / 2\sigma_k^2}$$

     Calculating a Per-Residue Local Similarity Score
     Calculate the local-frame similarity, K^local_ij, as an overlap integral:

     $$K^{\mathrm{local}}_{ij} = \int \psi_i\, \psi_j \; \mathrm{d}x_{-n \ldots +n}$$

     With products of Gaussians, this reduces to a simple sum:

     $$K^{\mathrm{local}}_{ij} = \exp\!\left( -\sum_{k=-n}^{n} \frac{\beta_k R^2_{i+k,\,j+k}}{4\sigma_k^2} \right)$$

     In identical α-helices, β-strands, and even loops, K^local_ij = 1.
     Nice, but how to match correctly a short α-helix with a longer one?

     Detecting Secondary Structure Elements
     By sliding a model α-helix and β-strand along a structure, Kpax detects its secondary structure elements (SSEs) automatically (it does not distinguish π or 3₁₀ helices or detect β-turns). Here are some examples:
     [Figure: examples of detected secondary structure elements]
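     The closed-form local similarity score above is straightforward to evaluate once each residue's neighbouring Cα atoms are expressed in its own local frame. The sketch below assumes that R_{i+k,j+k} is the distance between the offset-k neighbour of residue i and the offset-k neighbour of residue j, each in local-frame coordinates. The function name `k_local` and the β and σ values are made up for illustration; they are not Kpax's fitted parameters.

```python
import math


def k_local(frag_i, frag_j, beta, sigma):
    """Evaluate exp(-sum_k beta_k * R_k^2 / (4 * sigma_k^2)).

    frag_i, frag_j: local-frame C-alpha coordinates, one (x, y, z)
    tuple per offset k = -n..+n, listed in the same order.
    beta, sigma: one weight and one Gaussian width per offset.
    """
    if not (len(frag_i) == len(frag_j) == len(beta) == len(sigma)):
        raise ValueError("all inputs must have one entry per offset k")
    exponent = 0.0
    for p, q, b, s in zip(frag_i, frag_j, beta, sigma):
        r2 = sum((pc - qc) ** 2 for pc, qc in zip(p, q))  # squared distance
        exponent += b * r2 / (4.0 * s * s)
    return math.exp(-exponent)


# Made-up three-point fragments (offsets k = -1, 0, +1), for illustration.
frag_a = [(-1.0, 0.5, 2.0), (0.0, 0.0, 0.0), (1.2, -0.3, 1.9)]
frag_b = [(-1.0, 0.5, 2.0), (0.0, 0.0, 0.0), (3.2, -0.3, 1.9)]
beta = [1.0, 1.0, 1.0]
sigma = [1.0, 1.0, 1.0]

same = k_local(frag_a, frag_a, beta, sigma)
different = k_local(frag_a, frag_b, beta, sigma)
print(same, different)
```

     Identical fragments give a score of exactly 1, matching the statement on the slide, and the score decays towards 0 as corresponding neighbours move apart.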
