Improving protein secondary structure prediction based on short - PowerPoint PPT Presentation

Improving protein secondary structure prediction based on short subsequences with local structure similarity Hsin-Nan Lin, Ting-Yi Sung, Shinn-Ying Ho, Wen-Lian Hsu Bioinformatics Program, TIGP (Taiwan International Graduate Program), Academia Sinica, Taiwan The author Hsin-Nan Lin wishes to acknowledge, with thanks, the Taiwan International Graduate Program (TIGP) of Academia Sinica for financial support towards attending this conference.

Outline
• Introduction
  ◦ Protein secondary structure predictions
  ◦ Existing PSS methods
• Methods
  ◦ Synonymous words
  ◦ Compilation of a synonymous dictionary
  ◦ Prediction model
• Results
  ◦ Experiment results
  ◦ Two factors that affect prediction performance
• Conclusions

Protein Secondary Structure Prediction
• Protein secondary structure (PSS) elements
  ◦ The local conformation of amino acids
  ◦ 3 secondary structure states: helix (H), strand (E), coil (C).
  Ref: http://bioweb.wku.edu/courses/biol22000/3AAprotein/images/F03-08C.GIF
• Protein secondary structure prediction
  ◦ Assign one of the states to each amino acid.
  ◦ Useful for protein 3D structure prediction, function prediction, subcellular localization prediction, etc.

Existing PSS methods
• Template based methods
  Ref: Rajkumar Bondugula, Dong Xu, 2006
• Sequence profile based methods
  Ref: http://bioinfo.se/kurser/swell/secstrpred.html

Outline
• Introduction
  ◦ Protein Secondary Structure Predictions
  ◦ Existing PSS methods
• Methods
  ◦ Synonymous words
  ◦ Construction of a Synonymous Dictionary
  ◦ Prediction Algorithm
• Results
  ◦ Experiment results
  ◦ Two factors that affect prediction performance
• Conclusions

A Dictionary based approach -- SymPred
• Treating proteomic data as a language
  ◦ A protein structure is encoded by its amino acid sequence.
  ◦ protein sequence ↔ text
  ◦ protein structure ↔ meaning
• Treating PSS prediction as a translation problem
  ◦ protein sequence → secondary structure state sequence
• A general approach for analyzing protein sequences
  ◦ It can be applied to protein subcellular localization (PSL) prediction, function prediction, remote homology detection, sequence alignment, etc.

Synonymous words in protein sequences
• Protein language remains a mystery
• Structure robustness
  ◦ Structures are more conserved than sequences
  ◦ Proteins with 40% or higher sequence identity are highly similar in structure
• A significant local pairwise alignment of two proteins implies two similar paragraphs.
  ◦ Define synonymous words in protein sequences
• Definition
  ◦ A synonymous word is an n-gram of a protein sequence that is aligned with an n-gram of the other protein in the alignment.
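The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `synonymous_words`, the default word length of 4, and the choice to skip any window that contains a gap character are all assumptions made for the example.

```python
def synonymous_words(query_aln, hit_aln, n=4):
    """Return (query_word, hit_word) pairs from one gapped pairwise alignment.

    Both arguments are the two rows of a single alignment, so they must
    have the same length. "-" marks a gap.
    """
    if len(query_aln) != len(hit_aln):
        raise ValueError("aligned strings must have equal length")
    pairs = []
    for i in range(len(query_aln) - n + 1):
        query_word = query_aln[i:i + n]
        hit_word = hit_aln[i:i + n]
        # Assumption: windows containing a gap are not treated as words.
        if "-" in query_word or "-" in hit_word:
            continue
        pairs.append((query_word, hit_word))
    return pairs


# Toy alignment (hypothetical sequences, chosen only for illustration).
pairs = synonymous_words("AEWQLK", "SDFDMR", n=4)
```

Each returned pair is one n-gram of the query aligned with one n-gram of the other protein, which is the relation the slide calls synonymous.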

Synonymous words (cont.)
[Figure: alignment example showing the words EWQL and DFDM and the structure string HHHH]

Compilation of Synonymous Dictionary
A protein sequence → PSI-BLAST → sequence alignments → synonymous words
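The last step of this pipeline, turning alignments into dictionary entries, might look like the sketch below. The PSI-BLAST search is not shown. The sketch assumes the alignments have already been parsed into ungapped (sequence segment, structure string) pairs, and it assumes an entry simply counts how often a word was seen with each structure string. The name `compile_dictionary` and this entry layout are hypothetical and are not taken from the slides.

```python
from collections import Counter, defaultdict


def compile_dictionary(aligned_segments, n=4):
    """Map each n-gram to counts of the structure strings seen with it.

    aligned_segments: iterable of (sequence, structure) pairs of equal
    length, where the structure uses the states H, E, and C.
    """
    dictionary = defaultdict(Counter)
    for seq, ss in aligned_segments:
        if len(seq) != len(ss):
            raise ValueError("sequence and structure must have equal length")
        for i in range(len(seq) - n + 1):
            # Record the structure string that accompanies this word.
            dictionary[seq[i:i + n]][ss[i:i + n]] += 1
    return dictionary


# Toy input (hypothetical segments and structures).
toy = compile_dictionary([("DFDMR", "HHHHC"), ("DFDMK", "HHHHH")])
```

Keeping counts per word, rather than a single structure, leaves room for the voting step to weigh conflicting evidence.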

An example of a synonymous word entry
Flexibility: PSL, functions, 3D structure, etc.

Properties of synonymous words
• Protein dependency
  ◦ Synonymous words are generated from significant sequence alignments (context-sensitive).
    ▪ Two similar protein words are not necessarily synonymous.
  ◦ The material used to generate synonymous words depends on the query protein sequence.
• Sequence identity independency
  ◦ Protein A ↔ Protein B (SI = 50%)
  ◦ Protein B ↔ Protein C (SI = 40%)
  ◦ Protein A ↔ Protein C (SI = 20%)
[Figure: similar proteins of A, similar proteins of B, similar proteins of C]

Translation model
Obtain the final structure through voting.
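The slide gives no details of the voting scheme, so the following is only a sketch of one plausible per-residue vote. It assumes each dictionary match contributes a start position, a structure string, and a weight, and that residues with no votes default to coil. The function name `vote`, the weights, and the coil default are all assumptions.

```python
from collections import Counter


def vote(length, matches):
    """Combine overlapping word matches into one structure string.

    matches: iterable of (start, structure_string, weight) tuples.
    """
    tallies = [Counter() for _ in range(length)]
    for start, ss, weight in matches:
        for offset, state in enumerate(ss):
            pos = start + offset
            if 0 <= pos < length:
                tallies[pos][state] += weight
    # Assumption: a residue covered by no match is predicted as coil (C).
    return "".join(
        tally.most_common(1)[0][0] if tally else "C" for tally in tallies
    )


# Three hypothetical matches over a 6-residue protein.
prediction = vote(6, [(0, "HHHH", 2), (1, "HHHC", 1), (2, "CCCC", 1)])
```

Because every vote records which match produced it, a scheme like this keeps the prediction traceable to the words behind it.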

Datasets
• DSSP Database
  ◦ A database of PSS assignments
  ◦ DsspNr-25
    ▪ A non-redundant subset of DSSP
    ▪ 8,297 protein chains
• EVA benchmark datasets
  ◦ A platform for analyzing PSS predictors
  ◦ EVA_Set1: 80 protein chains
  ◦ EVA_Set2: 212 protein chains

Translation Performance on DsspNr-25 (8,297 proteins)

Method     Q3     Q3H    Q3E    Q3C    SOV
SymPred    81.0   84.3   71.6   77.7   76.0
PROSP      75.1   79.7   67.6   71.3   68.7

SymPred vs. PROSP: +5.9% in Q3, +7.3% in SOV.
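For readers unfamiliar with the column headers, Q3 is the percentage of residues whose three-state label is predicted correctly. The sketch below also computes a per-state score, assuming Q3H, Q3E, and Q3C follow the common convention of dividing by the number of observed residues in that state. The slides do not define these columns, so that convention and the function names are assumptions. SOV is a segment-overlap measure and is not reproduced here.

```python
def q3(predicted, observed):
    """Percentage of residues with a correctly predicted state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def q3_state(predicted, observed, state):
    """Percentage of observed residues in `state` that were predicted as it."""
    if len(predicted) != len(observed):
        raise ValueError("need two strings of equal length")
    positions = [i for i, o in enumerate(observed) if o == state]
    if not positions:
        return None  # the state does not occur in the observed structure
    correct = sum(predicted[i] == state for i in positions)
    return 100.0 * correct / len(positions)


# Hypothetical 6-residue example with one wrong residue.
overall = q3("HHHECC", "HHEECC")
strand = q3_state("HHHECC", "HHEECC", "E")
```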

Two factors affect translation performance
• Word length
  ◦ A trade-off between specificity and sensitivity
    ▪ Long words: increase specificity, lose sensitivity
    ▪ Short words: lose specificity, increase sensitivity
  ◦ Exact matching vs. inexact matching
    ▪ Exact matching: WGPV ↔ WGPV (exactly the same)
    ▪ Inexact matching: WGPV ↔ WGPV, *GPV, W*PV, WG*V, WGP* (at most one mismatched character)
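The inexact-matching rule can be illustrated with a short sketch that generates the wildcard patterns listed on the slide and tests two words for at most one mismatch. The function names are hypothetical, and nothing here is claimed to be how SymPred performs the lookup.

```python
def one_mismatch_patterns(word, wildcard="*"):
    """Return the word plus every pattern with one position wildcarded."""
    patterns = [word]
    for i in range(len(word)):
        patterns.append(word[:i] + wildcard + word[i + 1:])
    return patterns


def inexact_match(word, candidate):
    """True if the two words have equal length and differ in at most one position."""
    if len(word) != len(candidate):
        return False
    mismatches = sum(a != b for a, b in zip(word, candidate))
    return mismatches <= 1


patterns = one_mismatch_patterns("WGPV")
```

Indexing a dictionary by these patterns is one way to find inexact matches with a plain lookup instead of comparing the query word against every entry.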

Two factors affect translation performance (cont.)
• Template pool size
SymPred has the potential to improve further as the number of proteins with known structures continues to increase.

Performance Comparison on EVA_Set1 (80 proteins)

Method       Q3     ERRsig Q3   SOV    ERRsig SOV
SymPred      78.8   ± 1.4       76.4   ± 1.9
SAM-T99sec   77.2   ± 1.2       74.6   ± 1.5
PSIPRED      76.8   ± 1.4       75.4   ± 2.0
PROFsec      75.5   ± 1.4       74.9   ± 1.9
PHDpsi       73.4   ± 1.4       69.5   ± 1.9

Performance Comparison on EVA_Set2 (212 proteins)

Method     Q3     ERRsig Q3   SOV    ERRsig SOV
SymPred    79.2   ± 0.9       76.0   ± 1.2
PSIPRED    77.8   ± 0.8       75.4   ± 1.1
PROFsec    76.7   ± 0.8       74.8   ± 1.1
PHDpsi     75.0   ± 0.8       70.9   ± 1.2

Confidence Level vs. Q3
PCC = 0.992

Conclusions
• Local similarities in protein sequences exhibit conserved structures.
• With the increasing number of protein sequences of known structures, SymPred can further improve prediction accuracy.
• The prediction result is traceable.
• Our dictionary based approach is general for various protein related problems.
• Synonymous words provide an alternative sequence analysis method.

Thank you! Please visit our web server: http://bio-cluster.iis.sinica.edu.tw/~bioapp/SymPred/
