RECOGNITION OF PROTEIN FUNCTION

RECOGNITION OF PROTEIN FUNCTION USING THE LOCAL SIMILARITY
Kirill E. Alexandrov, Dmitry A. Filimonov, Boris N. Sobolev, Vladimir V. Poroikov
Institute of Biomedical Chemistry of Russian Academy of Medical Sciences, Moscow, Russia

Agenda
1. History of Problem
2. Sequence Local Similarity
3. Algorithm of Similarity Calculation
4. Local Similarity Approach Paradigm
5. Algorithm of Protein Function Recognition
6. Prediction Accuracy Estimation
7. Results of Local Similarity Approach Evaluation
8. Acknowledgements

The central dogma of SAR/QSAR/QSPR: Property = Function(Structure)
Continuity hypothesis: the smaller the difference between structures, the smaller the difference between properties
$y_{pred} = x_0 + \sum_i x_i F_i(S)$
$F_i(S) = \mathrm{LogP}, \dots, (\mathrm{LogP})^2, \dots$ – traditional QSAR
$F_i(S) = \mathrm{Sim}(S, S_i)$ – similarity based QSAR
MLR – multiple linear regression
PLS – projections to latent structures
ANN – artificial neural network
SVM – support vector machine
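As a minimal sketch of the similarity based form of the model, the Python below evaluates y_pred = x_0 + sum_i x_i Sim(S, S_i) for one structure. It assumes structures are represented as sets of descriptor strings and that Sim is the Tanimoto coefficient; both choices, and the names `tanimoto` and `predict`, are illustrative assumptions rather than anything stated on the slide. The coefficients x_i would come from a fitting method such as MLR or PLS and are simply passed in here.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two descriptor sets (an assumed choice of Sim)."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def predict(structure, references, coeffs, intercept):
    """Evaluate y_pred = x_0 + sum_i x_i * Sim(S, S_i).

    structure  -- descriptor set of the query structure S
    references -- descriptor sets of the reference structures S_i
    coeffs     -- fitted coefficients x_i, one per reference structure
    intercept  -- fitted constant term x_0
    """
    return intercept + sum(
        x * tanimoto(structure, ref) for x, ref in zip(coeffs, references)
    )


# Hypothetical toy data: two reference structures and made-up coefficients.
refs = [{"a", "b"}, {"c"}]
y = predict({"a", "b"}, refs, coeffs=[2.0, 4.0], intercept=1.0)
```

A query identical to the first reference and disjoint from the second contributes only the first coefficient, which is the continuity hypothesis in its simplest form.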

The local similarity principle
QSAR with CoMFA
Tripos' patented Comparative Molecular Field Analysis (CoMFA) has been used as the method of choice in hundreds of published QSAR studies.

Neighborhoods of atoms descriptors
MOLECULAR BIOLOGY, QUANTUM CHEMISTRY, QUANTUM FIELD THEORY:
$M = V + VgM = V + VgV + VgVgV + VgVgVgV + \dots$
$M_i = V_i + V_i g M = V_i + V_i g (M_1 + M_2 + \dots + M_m)$
All descriptors are based on describing each atom of a molecule together with its neighborhood:
MNA - multilevel neighborhoods of atoms
RMNA - reaction multilevel neighborhoods of atoms
QNA - quantitative neighborhoods of atoms
FNA - fuzzy neighborhoods of atoms
[authors and journal garbled in source] (2006), L(2), 66-75.

Multilevel neighborhoods of atoms descriptors – MNA
[Figure: structural formula of the example molecule, repeated for each level]
MNA/0: C
MNA/1: C(CN-H)
MNA/2: C(C(CC-H)N(CC)-H(C))
[authors and journal garbled in source] (2006), L(2), 66-75.

Multilevel neighborhoods of atoms descriptors – MNA
[Figure: structural formula of the example molecule]
MNA/2:
C(C(CC-H)C(CC-C)-H(C))
C(C(CC-H)C(CN-H)-H(C))
C(C(CC-H)C(CN-H)-C(C-O-O))
C(C(CC-H)N(CC)-H(C))
C(C(CC-C)N(CC)-H(C))
N(C(CN-H)C(CN-H))
-H(C(CC-H))
-H(C(CN-H))
-H(-O(-H-C))
-C(C(CC-C)-O(-H-C)-O(-C))
-O(-H(-O)-C(C-O-O))
-O(-C(C-O-O))
[authors and journal garbled in source] (2006), L(2), 66-75.

Prediction of activity spectra for organic compounds
According to the Bayes formula, the probability $P(A|S)$ that compound S has activity A is:
$P(A|S) = P(S|A) \cdot P(A) / P(S)$
If the descriptors $D_1, \dots, D_m$ of an organic compound are mutually independent, then:
$P(S|A) = P(D_1, \dots, D_m | A) = \prod_i P(D_i | A)$
P(A) and P(A|D_i) are calculated as sums over all organic compounds of the training set.
[authors and journal garbled in source] (2006), L(2), 66-75.
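The Bayes formula with independent descriptors can be illustrated by a generic naive Bayes sketch in Python. It estimates P(A) and P(D_i|A) from counts over a training set and expands P(S) as P(S|A)P(A) + P(S|not A)P(not A). The add-one (Laplace) smoothing, the data layout and the names `train` and `posterior` are assumptions made for illustration; this is not necessarily the estimator used by the authors.

```python
from collections import Counter


def train(compounds):
    """Count descriptor occurrences among active and inactive compounds.

    compounds -- list of (descriptor_set, is_active) pairs (assumed layout)
    """
    n_act = sum(1 for _, active in compounds if active)
    n_inact = len(compounds) - n_act
    act_counts, inact_counts = Counter(), Counter()
    for descriptors, active in compounds:
        (act_counts if active else inact_counts).update(set(descriptors))
    return n_act, n_inact, act_counts, inact_counts


def posterior(descriptors, model):
    """P(A|S) = P(S|A) P(A) / P(S), with P(S|A) = prod_i P(D_i|A)."""
    n_act, n_inact, act_counts, inact_counts = model
    total = n_act + n_inact
    joint_act = n_act / total      # starts as the prior P(A)
    joint_inact = n_inact / total  # starts as the prior P(not A)
    for d in set(descriptors):
        # Add-one smoothing (an assumption) keeps unseen descriptors from
        # forcing a probability of exactly zero.
        joint_act *= (act_counts[d] + 1) / (n_act + 2)
        joint_inact *= (inact_counts[d] + 1) / (n_inact + 2)
    return joint_act / (joint_act + joint_inact)


# Hypothetical toy training set: descriptor "x" marks actives, "y" inactives.
model = train([({"x"}, True), ({"x"}, True), ({"y"}, False), ({"y"}, False)])
p_active = posterior({"x"}, model)
```

In practice the products are usually accumulated as sums of logarithms to avoid underflow when a compound has many descriptors.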

Quantitative neighborhoods of atoms descriptors – QNA
$Q_i = a_i \sum_k [g(C)]_{ik} b_k$
$a_i$ and $b_k$ are parameters of atoms i and k; g(C) is a function of the connectivity matrix C
$P_i = B_i^{-1/2} \sum_k [\exp(-\tfrac{1}{2} C)]_{ik} B_k^{-1/2}$
$Q_i = B_i^{-1/2} \sum_k [\exp(-\tfrac{1}{2} C)]_{ik} B_k^{-1/2} A_k$
$A = \tfrac{1}{2}(IP + EA)$, $B = IP - EA$, where IP is the first ionization potential and EA is the electron affinity.
Feynman R. Ph. Phys. Rev., 1939, 56, 340-343.
Robert G. Parr et al. J. Chem. Phys., 1978, 68(8), 3801-3807.
Gasteiger J, Marsili M. Tetrahedron, 1980, 36, 3219-3228.
Rappe A K and W A Goddard III. J. Ph. Ch., 1991, 95, 3358-3363.
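A small Python sketch of the QNA calculation follows, reading the formulas as P_i = B_i^(-1/2) sum_k [Exp(-C/2)]_ik B_k^(-1/2) and Q_i = B_i^(-1/2) sum_k [Exp(-C/2)]_ik B_k^(-1/2) A_k, with A = (IP + EA)/2 and B = IP - EA. That reading of the exponents is an assumption, as are the function names, the truncated Taylor series used for the matrix exponential, and the input values, which are toy numbers and not real atomic parameters.

```python
import math


def mat_mul(x, y):
    """Product of two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(m, terms=30):
    """Matrix exponential by truncated Taylor series (fine for small matrices)."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for order in range(1, terms + 1):
        term = [[v / order for v in row] for row in mat_mul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def qna(connectivity, ip, ea):
    """Return (P, Q) lists, one value per atom, under the assumed reading."""
    n = len(connectivity)
    e = mat_exp([[-0.5 * c for c in row] for row in connectivity])
    a = [(ip[k] + ea[k]) / 2 for k in range(n)]
    b = [(ip[k] - ea[k]) ** -0.5 for k in range(n)]  # B_k^(-1/2)
    p = [b[i] * sum(e[i][k] * b[k] for k in range(n)) for i in range(n)]
    q = [b[i] * sum(e[i][k] * b[k] * a[k] for k in range(n)) for i in range(n)]
    return p, q


# Toy diatomic with identical atoms; IP and EA are made-up values chosen so
# that A = 1 and B = 1 for both atoms.
P, Q = qna([[0, 1], [1, 0]], ip=[1.5, 1.5], ea=[0.5, 0.5])
```

For this toy case Exp(-C/2) can be written in closed form, so both atoms get P = Q = exp(-1/2), which gives a quick sanity check of the series.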

Quantitative neighborhoods of atoms descriptors – QNA
ChemNavigator DataBase in QNA Space
976,545,026 QNA descriptors of 24,621,668 molecules
[Figure panels: Initial QNA Space; Normalized QNA Space]

Quantitative neighborhoods of atoms descriptors – QNA
[Figure panels: Nicotinic Acid; Aspirin; Sulfathiazole]

GUSAR – QNA based prediction of quantitative properties of organic compounds

GUSAR – QNA based prediction of quantitative properties of organic compounds
CDK2 inhibitors
DHFR inhibitors
ACE inhibitors
Vibrio fischeri
Chlorella vulgaris
Tetrahymena pyriformis

GUSAR – QNA based prediction of quantitative properties of organic compounds
[Figure: delta R2 test, delta Q2 and delta R2 for PLS, MLR, GFA, HQSAR, CoMFA, EVA, CoMSIA, 3D Cerius2 and 2D Cerius2; axis from -0.10 to 0.20]

OK. But how can local similarity be used for recognition of protein function?

Pairwise sequence alignment
1996, Autumn
For a long time, homology-derived annotation based on pairwise sequence alignment was the general way to predict protein function.

Sequence Local Similarity. Frame 20, shift from -8 to +8
AANRDPSQFPDPHRFDVTRDTRGHLSFGQGIHFCMGRPLAKLEGEVA 2
ANRDPSQFPDPHRFDVTRDTRGHLSFGQGIHFCMGRPLAKLEGEVAL 1
NRDPSQFPDPHRFDVTRDTRGHLSFGQGIHFCMGRPLAKLEGEVALR 1
RDPSQFPDPHRFDVTRDTRGHLSFGQGIHFCMGRPLAKLEGEVALRA 0
DPSQFPDPHRFDVTRDTRGHLSFGQGIHFCMGRPLAKLEGEVALRAL 1
PSQFPDPHRFDVTRDTRGHLSFGQGIHFCMGRPLAKLEGEVALRALF 2
SQFPDPHRFDVTRDTRGHLSFGQGIHFCMGRPLAKLEGEVALRALFG 1
QFPDPHRFDVTRDTRGHLSFGQGIHFCMGRPLAKLEGEVALRALFGR 1
FPDPHRFDVTRDTRGHLSFGQGIHFCMGRPLAKLEGEVALRALFGRF 2
PDPHRFDVTRDTRGHLSFGQGIHFCMGRPLAKLEGEVALRALFGRFP 0
DPHRFDVTRDTRGHLSFGQGIHFCMGRPLAKLEGEVALRALFGRFPA 1
PHRFDVTRDTRGHLSFGQGIHFCMGRPLAKLEGEVALRALFGRFPAL 0
The best match: HRFDVTRDTRGHLSFGQGIHFCMGRPLAKLEGEVALRALFGRFPALS 9
RFDVTRDTRGHLSFGQGIHFCMGRPLAKLEGEVALRALFGRFPALSL 0
FDVTRDTRGHLSFGQGIHFCMGRPLAKLEGEVALRALFGRFPALSLG 3
DVTRDTRGHLSFGQGIHFCMGRPLAKLEGEVALRALFGRFPALSLGI 1
VTRDTRGHLSFGQGIHFCMGRPLAKLEGEVALRALFGRFPALSLGID 1
TRDTRGHLSFGQGIHFCMGRPLAKLEGEVALRALFGRFPALSLGIDA 2
Query sequence: GTAINKPLSEKMMLFGMGKRRCIGEVLAKWEIFLFLAILLQQLEFSV 9
R_i = 9

Sequence Local Similarity. Algorithm of Similarity Calculation
i is the position number in the query sequence A
a and b are amino acid residues in sequence A and sequence B
m is the current shift between sequence A and sequence B
F is the frame size
R_i is the primary similarity value
S_i is the local similarity value for position i in the query sequence A with sequence B
About 1000 sequences per second.
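A minimal Python sketch of the primary similarity value is given here, assuming that R_i is the largest number of identical residues a = b found inside a frame of F positions over all shifts m. That reading is an assumption. It is consistent with the frame-20 example, where the best-matching fragment and the query share 9 identical residues in one 20-residue frame. The way S_i is derived from R_i is not specified here, so it is not computed. The function names and the 0-based frame start arguments are hypothetical.

```python
def frame_identities(a, b, start_a, start_b, frame):
    """Count positions k in the frame where a[start_a + k] == b[start_b + k].

    Positions that fall outside either sequence are skipped.
    """
    count = 0
    for k in range(frame):
        ia, ib = start_a + k, start_b + k
        if 0 <= ia < len(a) and 0 <= ib < len(b) and a[ia] == b[ib]:
            count += 1
    return count


def primary_similarity(a, b, i, j, frame=20, max_shift=8):
    """Assumed R_i: best identity count for the frame starting at position i
    of query a, against frames of b starting at j + m, m in [-max_shift, max_shift].
    """
    return max(
        frame_identities(a, b, i, j + m, frame)
        for m in range(-max_shift, max_shift + 1)
    )


# The query and the best-matching fragment from the frame-20 example.
QUERY = "GTAINKPLSEKMMLFGMGKRRCIGEVLAKWEIFLFLAILLQQLEFSV"
BEST = "HRFDVTRDTRGHLSFGQGIHFCMGRPLAKLEGEVALRALFGRFPALS"

# Frame of 20 residues starting at 0-based position 14, no extra shift.
r = primary_similarity(QUERY, BEST, 14, 14, frame=20, max_shift=0)
```

Because each frame is compared without gaps, the cost per pair of sequences grows with the sequence length times the number of shifts, which fits the kind of throughput quoted above.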

Sequence Local Similarity. 13.11.1996

“If there exists correspondence between similarity of substrates and protein sequences in cytochrome P450 superfamily?”
[Figure: proportion of homologs (0 to 1) versus number of clusters (0 to 150), panels CYP4 and CYP7; curves: real data, average random data, confidence interval]
The results of substrate-based clustering correspond to homology-based classification for families CYP 1, 2, 3, 4, 5, 6, 7, 11.
For other families of P450 (CYP 8, 17, 19, 21, 24, 26, 27), substrate-based clustering leads to contradictions with the traditional classification.
Borodina Yu.V., Lisitsa A.V., Poroikov V.V., Filimonov D.A., Sobolev B.N., Archakov A.A. Nova Acta Leopoldina, 2003, 87(329), 47-55.

“Quantifying the Relationships among Drug Classes”
A subset of the MDDR database containing 65 367 compounds organized in 249 sets that associate with a specific biological target
“By multiple criteria, bioinformatics and chemoinformatics networks differed substantially, and only occasionally did a high sequence similarity correspond to a high ligand-set similarity.”
Hert, J., Keiser, M. J., Irwin, J. J., Oprea, T. I., Shoichet, B. K. “Quantifying the Relationships among Drug Classes” J. Chem. Inf. Model., 2008, 48(4), 755-765.

Fundamental theory: Unique law of nature, Ab initio principles, Molecular Modelling
Machine Learning: Learning by example, Partial estimate, Homology

Protein function recognition based on learning by example
[Diagram: sets A, ¬A, B, C]
It is based on a data set of sequences with known properties. This data set must be subdivided into “positive” and “negative” examples: group A and its complement ¬A.

Is a universal similarity measure reasonable?
