Predicting Secondary Structure of All-Helical Proteins



  1. Predicting Secondary Structure of All-Helical Proteins Using Hidden Markov Support Vector Machines
     Blaise Gassend, Charles W. O'Donnell, William Thies, Andrew Lee, Marten van Dijk, and Srinivas Devadas
     Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology
     Workshop on Pattern Recognition in Bioinformatics, August 20, 2006

  2. Protein Structure Prediction
     • Classical problem: given a sequence, predict its structure [figure: sequence → structure]
     • High-level approaches
       1. Energy-minimization (ab initio) techniques: elegant, but often lack correct parameters
       2. Homology-based techniques: useful, but hard to apply to new proteins
     • Our approach: use energy minimization, but learn the parameters from existing proteins

  3. Our Framework (Training)
     [Flowchart: the Protein Data Bank supplies amino-acid sequences and their correct structures. The Prediction Algorithm takes a sequence and the current Energy Parameters and outputs a predicted structure. If the prediction is correct: Done! If it is incorrect: the Learning Algorithm adds the constraint energy(incorrect) > energy(correct) and updates the Energy Parameters.]

  4. Our Framework (Testing)
     [Flowchart: the Prediction Algorithm takes an amino-acid sequence and the Energy Parameters and outputs the predicted structure.]

  5. Initial Focus: Secondary Structure
     • Classify each residue as alpha helix, beta strand, or coil
       - In this paper, restrict to all-alpha proteins
     • Applications:
       - Informing tertiary structure predictors
       - Identification of homologous proteins
       - Identification of active sites (coils)

  6. Secondary Structure Predictors
     [Plot axes: prediction accuracy (Q3), 50% to 100%, against year, 1975 to 2010.]

  7. Secondary Structure Predictors
     [Plot: prediction accuracy (Q3), 50% to 100%, against year, 1975 to 2010, divided into "Sequence Only" and "Sequence + Alignment" regions. Statistical methods shown: Chou/Fasman, GOR, Zvelebil et al., DSC.]

  8. Secondary Structure Predictors
     [Plot: accuracy (Q3) against year as on slide 7. This build adds neural networks: Qian/Sejnowski, PHD, Riis/Krogh, SSPro, PSIPred (two entries), Peterson, SSPro4, Porter.]

  9. Secondary Structure Predictors
     [Plot: accuracy (Q3) against year as on slide 8. This build adds SVMs: Hua/Sun, Casbon, Ceroni, Kim, Ward, Nguyen, Hu.]

  10. Secondary Structure Predictors
      [Plot: accuracy (Q3) against year as on slide 9. This build adds HMMs: Schmidler et al., Martin (two entries), Nguyen, Won, HMMSTR.]

  11. Secondary Structure Predictors
      [Plot: accuracy (Q3) against year as on slide 10. This build adds annotations: neural networks, 1400-2900 parameters; SVMs, 680 MB of support vectors; HMMs, 471 parameters.]
      • HMMs exploit biochemical models
      • HMMs offer biological insight

  12. Secondary Structure Predictors
      [Plot: accuracy (Q3) against year as on slide 11. This build adds a point labeled "THIS PAPER", annotated "302 params".]
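The Q3 score on the vertical axis of these plots is the fraction of residues whose state (helix, strand, or coil) is predicted correctly. A minimal sketch of that computation follows; the function name `q3` and the one-letter labels H, E, and C are illustrative choices, not taken from the slides.

```python
def q3(predicted: str, actual: str) -> float:
    """Fraction of residues whose secondary-structure label matches.

    Both arguments are per-residue label strings, e.g. 'H' (helix),
    'E' (strand), 'C' (coil). The label alphabet is an assumption.
    """
    if len(predicted) != len(actual):
        raise ValueError("label strings must have the same length")
    if not actual:
        raise ValueError("label strings must be non-empty")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


# Made-up example: 7 of 8 residues agree.
print(q3("CCHHHHCC", "CHHHHHCC"))
```

For an all-alpha protein, as in this talk, only the helix and coil labels occur, but the score is computed the same way.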

  13. Our Framework Applied to Helix Prediction
      [Flowchart: the training loop of slide 3, specialized to alpha helices from the Protein Data Bank. Example input: amino-acid sequence MNIFEMLRIDEGL with correct structure HHHHHHHHH. The Prediction Algorithm is a Hidden Markov Model; the Learning Algorithm is Support Vector Machines. An incorrect prediction adds the constraint energy(incorrect) > energy(correct).]
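The slide names a Hidden Markov Model as the prediction algorithm but does not show its states. As a rough illustration of how a minimum-energy helix/coil labelling can be found by dynamic programming, the sketch below uses a much simpler model than the talk's: two states (H and C), a per-residue helix energy, and a single per-helix start cost standing in for the cap terms. The function name, the start cost, and all numeric values are assumptions.

```python
from typing import Dict, List


def predict_helices(seq: str, helix_energy: Dict[str, float],
                    helix_start_cost: float) -> str:
    """Minimum-energy H/C labelling under a simplified two-state model."""
    inf = float("inf")
    # best[s]: lowest energy of any labelling of the prefix ending in state s.
    best = {"C": 0.0, "H": inf}  # virtual coil before the first residue
    back: List[Dict[str, str]] = []  # back[i][s]: best previous state
    for res in seq:
        h = helix_energy.get(res, 0.0)  # unknown residues cost 0 (assumption)
        choice: Dict[str, str] = {}
        new_best: Dict[str, float] = {}
        # Coil residues contribute no energy in this simplified model.
        choice["C"] = min("CH", key=lambda s: best[s])
        new_best["C"] = best[choice["C"]]
        # Helix: either continue a helix or start one and pay the start cost.
        cont = best["H"] + h
        start = best["C"] + helix_start_cost + h
        choice["H"] = "H" if cont <= start else "C"
        new_best["H"] = min(cont, start)
        back.append(choice)
        best = new_best
    # Trace back from the cheaper final state.
    state = min("CH", key=lambda s: best[s])
    labels: List[str] = []
    for choice in reversed(back):
        labels.append(state)
        state = choice[state]
    return "".join(reversed(labels))


# Hypothetical parameters: A, L, E favor helices; G and P disfavor them.
params = {"A": -1.0, "L": -1.0, "E": -0.5, "G": 2.0, "P": 3.0}
print(predict_helices("GPALEALGP", params, helix_start_cost=1.5))
```

With these made-up values the helix covers ALEAL, because its total energy (-4.5) outweighs the start cost (1.5), while extending it over G or P would raise the energy.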

  14. Energy Parameters
      • H_R: energy of residue R in a helix (20 parameters)
      • N_{R,i}: energy of residue R at offset i (-3…3) from the N-cap (140 parameters)
      • C_{R,i}: energy of residue R at offset i (-3…3) from the C-cap (140 parameters)
      • Penalty for coils of length 1 or 2 (2 parameters)
      • Total: 302 parameters

  15. Energy Parameters (parameter table as on slide 14)
      • Example:
        Sequence:  MNIFELRIDEGL
        Structure: HHHHHH (the helix covers residues FELRID, positions 4 to 9)
        Energy = (terms added on the following slides)

  16. Energy Parameters (parameter table as on slide 14)
      • Example:
        Sequence:  MNIFELRIDEGL
        Structure: HHHHHH (the helix covers residues FELRID, positions 4 to 9)
        Energy = H_F + H_E + H_L + H_R + H_I + H_D   (helix)

  17. Energy Parameters (parameter table as on slide 14)
      • Example:
        Sequence:  MNIFELRIDEGL
        Structure: HHHHHH (the helix covers residues FELRID, positions 4 to 9)
        Energy = H_F + H_E + H_L + H_R + H_I + H_D   (helix)
               + N_{M,-3} + N_{N,-2} + N_{I,-1} + N_{F,0} + N_{E,1} + N_{L,2} + N_{R,3}   (N-cap)

  18. Energy Parameters (parameter table as on slide 14)
      • Example:
        Sequence:  MNIFELRIDEGL
        Structure: HHHHHH (the helix covers residues FELRID, positions 4 to 9)
        Energy = H_F + H_E + H_L + H_R + H_I + H_D   (helix)
               + N_{M,-3} + N_{N,-2} + N_{I,-1} + N_{F,0} + N_{E,1} + N_{L,2} + N_{R,3}   (N-cap)
               + C_{L,-3} + C_{R,-2} + C_{I,-1} + C_{D,0} + C_{E,1} + C_{G,2} + C_{L,3}   (C-cap)

  19. Energy Parameters (same table and complete example as slide 18)
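The worked example on the Energy Parameters slides can be reproduced mechanically. The sketch below lists the terms that contribute to the energy of a single helix, assuming the helix covers residues 4 to 9 (FELRID), which is what the helix terms on the slide imply. The function names, the `H[F]` / `N[M,-3]` term naming, and the choice to skip cap offsets that fall off the ends of the sequence are assumptions made for illustration.

```python
from typing import Dict, List


def energy_terms(seq: str, helix_start: int, helix_end: int) -> List[str]:
    """Names of the energy terms for one helix seq[helix_start..helix_end].

    Indices are 0-based and inclusive. Offset 0 of the N-cap is the first
    helix residue; offset 0 of the C-cap is the last helix residue.
    """
    # One helix term per residue inside the helix.
    terms = [f"H[{seq[i]}]" for i in range(helix_start, helix_end + 1)]
    # Seven cap terms (offsets -3..3) around each end of the helix.
    for name, cap in (("N", helix_start), ("C", helix_end)):
        for offset in range(-3, 4):
            pos = cap + offset
            if 0 <= pos < len(seq):  # assumption: ignore off-sequence offsets
                terms.append(f"{name}[{seq[pos]},{offset}]")
    return terms


def energy(terms: List[str], params: Dict[str, float]) -> float:
    """Sum the parameter values of the listed terms (missing terms count 0)."""
    return sum(params.get(t, 0.0) for t in terms)


terms = energy_terms("MNIFELRIDEGL", helix_start=3, helix_end=8)
print(" + ".join(terms))
```

The parameter count on the slide follows the same structure: 20 helix terms, 7 offsets × 20 residues for each cap, and 2 coil penalties give 20 + 140 + 140 + 2 = 302.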

  20. Learning the Parameters
      [Figure: feature space with axes A (# of alanines in helices) and G (# of glycines in helices). Points mark legal structures; one of them is the correct structure.]
      Energy(structure) = H_A * A + H_G * G = w · [A G], where w = [H_A H_G] represents the energy parameters.
      Highest energy lies in the direction of the energy parameters w.

  21. Learning the Parameters
      [Figure and energy definition as on slide 20, with the vector w drawn in the feature space.]
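To make the two-feature picture concrete, the sketch below counts the alanines and glycines that fall inside helices and takes the dot product with w = [H_A, H_G]. The sequence, structure string, and parameter values are made up for illustration; only the form Energy = w · [A G] comes from the slide.

```python
from typing import Sequence, Tuple


def helix_counts(seq: str, structure: str) -> Tuple[int, int]:
    """Return (A, G): alanines and glycines at positions labelled 'H'."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    in_helix = [res for res, state in zip(seq, structure) if state == "H"]
    return in_helix.count("A"), in_helix.count("G")


def energy(w: Sequence[float], features: Sequence[float]) -> float:
    """Energy as the dot product of parameters and feature counts."""
    return sum(wi * fi for wi, fi in zip(w, features))


features = helix_counts("MAAGLAGKAA", "CHHHHHCCCC")  # hypothetical protein
w = (-1.0, 2.0)  # hypothetical [H_A, H_G]
print(features, energy(w, features))
```

Each legal structure for a sequence maps to one such point (A, G), which is why the slide can draw structures as points and the parameters as a direction in the same plane.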

  22. Learning the Parameters
      1. Predict structure
      [Figure: feature space as on slide 20 (axes A and G, legal structures, correct structure, vector w), with the predicted structure marked.]

  23. Learning the Parameters
      1. Predict structure
      [Figure: as on slide 22, without the vector w.]

  24. Learning the Parameters
      1. Predict structure
      2. Refine parameters
      [Figure: as on slide 23, with a separating hyperplane drawn.]

  25. Learning the Parameters
      1. Predict structure
      2. Refine parameters
      [Figure: as on slide 24, with the vector w drawn alongside the separating hyperplane.]

  26. Learning the Parameters
      1. Predict structure
      2. Refine parameters
      [Figure: as on slide 25, showing the vector w without the separating hyperplane.]
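The predict-then-refine loop on these slides can be sketched on a toy version of the two-feature example. Note the substitution: the talk refines the parameters with a support vector machine, whereas this sketch uses a simple perceptron-style update, chosen only because it fits in a few lines. Prediction here picks the lowest-energy legal structure, consistent with the constraint energy(incorrect) > energy(correct). The feature points, learning rate, and function names are all hypothetical.

```python
from typing import List, Sequence, Tuple

Features = Tuple[int, int]  # (A, G): alanines and glycines in helices


def energy(w: Sequence[float], f: Features) -> float:
    """Energy of a structure: dot product of parameters and features."""
    return w[0] * f[0] + w[1] * f[1]


def predict(w: Sequence[float], legal: List[Features]) -> Features:
    """Step 1: pick the legal structure with the lowest energy."""
    return min(legal, key=lambda f: energy(w, f))


def learn(legal: List[Features], correct: Features,
          lr: float = 1.0, max_iters: int = 100) -> List[float]:
    """Alternate prediction and refinement until the prediction is correct."""
    w = [0.0, 0.0]
    for _ in range(max_iters):
        predicted = predict(w, legal)
        if predicted == correct:
            break  # Done: no legal structure has lower energy
        # Step 2 (perceptron stand-in for the SVM): raise the energy of the
        # wrong prediction relative to the correct structure.
        w = [wi + lr * (p - c) for wi, p, c in zip(w, predicted, correct)]
    return w


# Hypothetical legal structures for one sequence, as (A, G) points.
legal = [(0, 1), (1, 3), (2, 2), (3, 1), (4, 0)]
correct = (4, 0)
w = learn(legal, correct)
print(w, predict(w, legal))
```

On this toy data a single update suffices: the learned w assigns negative energy to alanines in helices and positive energy to glycines, so the correct structure becomes the minimum. An SVM would additionally choose w to maximize the margin between the correct structure and the others.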
