SLIDE 1 A Grammatical Inference approach to Transmembrane domain prediction.
Piedachu Peris, Damián López, M. Campos
Departamento de Sistemas Informáticos y Computación
Universidad Politécnica de Valencia. pperis@dsic.upv.es dlopez@dsic.upv.es mcampos@dsic.upv.es
SLIDE 2 Introduction
Transmembrane proteins are involved in:
- Communication between cells
- Transport of ions and nutrients
- Reception of viruses
- Diabetes, hypertension, depression, arthritis, cancer...
SLIDE 3 Introduction
Prediction of transmembrane regions in proteins. Different approaches:
- Hidden Markov Models:
- Sonnhammer E. et al.: TMHMM
- Neural Networks:
- Fariselli P. et al.: HTP
- Statistical analysis:
- Pasquier C. et al.: PRED-TMR
Our approach (igTM): Based on Grammatical Inference.
SLIDE 4
Preliminary concepts (I)
Alphabet:
  Σ = {a, b, c, d, e, f, g}
  Δ = {A, B, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y, Z}
Word:
  u = abababab
  w = abcddabfgedfc
  v = MNYIFDLSILLVVA
Language:
  L1 = {a^n b^n : n ≥ 1}
  L2 = {transmembrane protein sequences}
  L3 = {d f^m a^n : m ≥ 1, n ≥ 0}
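Membership in formal languages such as L1 and L3 can be checked mechanically. A minimal sketch (function names are ours, not part of the slides):

```python
import re

def in_L1(w):
    """L1 = {a^n b^n : n >= 1}: a run of a's followed by an equal run of b's."""
    n = len(w) // 2
    return n >= 1 and w == "a" * n + "b" * n

def in_L3(w):
    """L3 = {d f^m a^n : m >= 1, n >= 0}, a regular language."""
    return re.fullmatch(r"df+a*", w) is not None

print(in_L1("aabb"))  # True
print(in_L1("aab"))   # False
print(in_L3("dffa"))  # True
```

Note that L1 is context-free (a regular expression cannot count matching runs), while L3 is regular, which is why a plain regex suffices for it.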
SLIDE 5 Preliminary concepts (II)
Finite automaton:
[Figure: example finite automaton with transitions labelled a, a, b.]
Transducer:
[Figure: example probabilistic transducer with transitions labelled input/output (a/0, a/1, b/1) and probabilities p = 1, p = 0.8, p = 0.2.]
SLIDE 6
Grammatical Inference (GI)
Goal: Learn a language from a sample of words.
S = {aab, aaaab, aaaaaab}
Different GI algorithms → different languages:
  La = {a^n b : n ≥ 1}
  Lb = {a^n b : n ≥ 2}
  Lc = {(aa)^n b : n ≥ 1}
A larger alphabet makes a language more difficult to learn.
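The ambiguity above is easy to verify: every word of the sample S is accepted by each of the three candidate languages, so the sample alone cannot distinguish them. A quick sketch (regex encodings of the languages are ours):

```python
import re

S = {"aab", "aaaab", "aaaaaab"}

# Three candidate languages, all consistent with the sample S:
La = lambda w: re.fullmatch(r"a+b", w) is not None     # a^n b,    n >= 1
Lb = lambda w: re.fullmatch(r"aa+b", w) is not None    # a^n b,    n >= 2
Lc = lambda w: re.fullmatch(r"(aa)+b", w) is not None  # (aa)^n b, n >= 1

print(all(La(w) for w in S))  # True
print(all(Lb(w) for w in S))  # True
print(all(Lc(w) for w in S))  # True
```

Each inference algorithm embodies a different bias for choosing among such consistent hypotheses.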
SLIDE 7 Method
- 1. Words: Set of proteins (sequences of amino acids)
W = {MDAIKKM, GDAVKK, MDAAIKKM}
- 2. Alphabet reduction: Dayhoff
MDAIKKM → ecbedde
GDAVKK → bcbedd
MDAAIKKM → ecbbedde
- 3. Domain and topology annotation:
ecbedde → iiMMMoo
bcbedd → ooMMMi
ecbbedde → iiiMMooo
Amino acid → Dayhoff class:
  C → a
  G, S, T, A, P → b
  D, E, N, Q → c
  R, H, K → d
  L, V, M, I → e
  Y, F, W → f
  B, Z → g
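The alphabet reduction in step 2 is a simple character-by-character mapping. A sketch of the Dayhoff encoding from the table above:

```python
# Dayhoff alphabet reduction (table above); B and Z are the
# ambiguity codes Asx and Glx.
DAYHOFF = {}
for aas, cls in [("C", "a"), ("GSTAP", "b"), ("DENQ", "c"),
                 ("RHK", "d"), ("LVMI", "e"), ("YFW", "f"), ("BZ", "g")]:
    for aa in aas:
        DAYHOFF[aa] = cls

def encode(protein):
    """Map an amino-acid sequence onto the 7-symbol Dayhoff alphabet."""
    return "".join(DAYHOFF[aa] for aa in protein)

print(encode("MDAIKKM"))  # ecbedde
```

Reducing the 20-letter amino-acid alphabet to 7 classes makes the inference task easier, in line with the observation that a larger alphabet makes a language harder to learn.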
SLIDE 8 Method (II)
- 4. GI process: Inference of a probabilistic transducer:
input: protein + annotation (each symbol paired with its label):
  [ei][ci][bM][eM][dM][do][eo]
  [bo][co][bM][eM][dM][di]
  [ei][ci][bi][bM][eM][do][do][eo]
[Figure: inferred probabilistic transducer; transitions labelled symbol/label with probabilities, e.g. e/i 2/3, b/o 1/3, b/M 1/2, b/i 1/2, d/o 1/3, e/o 2/3, d/i 1/2.]
- output: annotation of words (proteins): iiMMMoo ooMMMi iiiMMooo
- 5. Test phase: returns the most likely transduction (annotation) for the input string.
input: MDAIKKKHL → ecbedddde
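The test phase is a most-likely-path search over the inferred transducer, which can be done with Viterbi-style dynamic programming. A toy sketch with made-up states and probabilities (this is not the transducer actually inferred by igTM):

```python
from math import log

# Toy probabilistic transducer:
# state -> input symbol -> list of (next_state, output_label, prob).
# States and probabilities here are illustrative only.
T = {
    0: {"e": [(1, "i", 1.0)], "b": [(1, "o", 1.0)]},
    1: {"c": [(2, "i", 0.6), (2, "o", 0.4)]},
    2: {"b": [(3, "M", 1.0)]},
    3: {"e": [(3, "M", 0.7), (4, "o", 0.3)], "d": [(4, "o", 1.0)]},
    4: {"d": [(4, "o", 1.0)], "e": [(4, "o", 1.0)]},
}

def viterbi(word, start=0):
    """Return the most probable annotation the transducer assigns to word."""
    best = {start: (0.0, "")}  # state -> (log-prob, annotation so far)
    for sym in word:
        nxt = {}
        for st, (lp, ann) in best.items():
            for st2, lab, p in T.get(st, {}).get(sym, []):
                cand = (lp + log(p), ann + lab)
                if st2 not in nxt or cand[0] > nxt[st2][0]:
                    nxt[st2] = cand
        best = nxt
    return max(best.values())[1] if best else None

print(viterbi("ecbedd"))  # iiMMoo
```

Keeping only the best partial annotation per state makes the search linear in the word length times the number of transitions.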
SLIDE 9
Databases
We used three datasets to train and test our method:
- TMHMM database: set of 160 transmembrane proteins, available at http://www.cbs.dtu.dk/~krogh/TMHMM
- TMPDB: set of 302 transmembrane proteins, available at http://www.genome.jp/SIT/tsegdir/whatis tmpdb.html
- 101-pred-TMR db: set of 101 transmembrane proteins, used to develop the pred-TMR prediction method. We downloaded each of the proteins from the Uniprot web page.
SLIDE 10 Performance measures
Sensitivity (Sn):
  Sn = TP / (TP + FN)
Specificity (Sp):
  Sp = TP / (TP + FP)
Correlation coefficient (CC):
  CC = ((TP × TN) − (FN × FP)) / √((TP + FN) × (TN + FP) × (TP + FP) × (TN + FN))
Average conditional probability (ACP):
  ACP = (1/4) × (TP/(TP + FN) + TP/(TP + FP) + TN/(TN + FP) + TN/(TN + FN))
Approximated correlation (AC):
  AC = (ACP − 0.5) × 2
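The five measures follow directly from the per-residue confusion counts. A sketch (the counts in the example call are made up for illustration):

```python
from math import sqrt

def measures(TP, FP, TN, FN):
    """Sensitivity, specificity, correlation coefficient and
    approximated correlation, as defined above."""
    Sn = TP / (TP + FN)
    Sp = TP / (TP + FP)
    CC = ((TP * TN) - (FN * FP)) / sqrt(
        (TP + FN) * (TN + FP) * (TP + FP) * (TN + FN))
    ACP = (TP / (TP + FN) + TP / (TP + FP)
           + TN / (TN + FP) + TN / (TN + FN)) / 4
    AC = (ACP - 0.5) * 2
    return Sn, Sp, CC, AC

print(measures(80, 20, 80, 20))
```

AC rescales ACP from [0.5, 1] to [0, 1], giving a correlation-like score that, unlike CC, is defined even when one of the four marginals is zero in fewer cases.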
SLIDE 11 Experimentation
Encoding and annotation of an example sequence for each experimental configuration:
Sequence: MRVTAPRTLLLLLWGAVALTETWAGSHSMR Dayhoff: edebbbdbeeeeefbbebebcbfbbbdbed TM domains: 4-10, 20-25 exp1: edebbbdbeeeeefbbebebcbfbbbdbed...MMMMMMM.........MMMMMM..... exp2: edebbbdbeeeeefbbebebcbfbbbdbedoooMMMMMMMiiiiiiiiiMMMMMMooooo exp3: edebbbdbeeeeefbbebebcbfbbbdbedoooNNNNNNNiiiiiiiiiPPPPPPooooo exp4: edebbbdbeeeeefbbebebcbfbbbdbedOOONNNNNNNiiiiIIIIIPPPPPPooooo exp5: edebbbdbeeeeefbbebebcbfbbbdbedooCMMMMMMMDiiiiiiiAMMMMMMBoooo exp6: MRVTAPRTLLLLLWGAVALTETWAGSHSMRoooMMMMMMMiiiiiiiiiMMMMMMooooo
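The annotation strings above are derived from the TM domain intervals. A sketch that produces the exp1-style labelling ('M' inside a domain, '.' elsewhere; the function name is ours):

```python
def annotate(seq_len, domains):
    """Label each position 'M' if it lies in a transmembrane domain
    (1-based, inclusive bounds) and '.' otherwise, as in exp1."""
    labels = ["."] * seq_len
    for start, end in domains:
        for i in range(start - 1, end):
            labels[i] = "M"
    return "".join(labels)

print(annotate(30, [(4, 10), (20, 25)]))
# ...MMMMMMM.........MMMMMM.....
```

The other configurations only vary the label set: exp2 distinguishes inside (i) and outside (o) loops, exp3–exp5 further split the domain and loop labels, and exp6 keeps the original amino-acid alphabet instead of the Dayhoff encoding.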
SLIDE 12
Results - TMHMM database
TMHMM database      Sn     Sp     AC
igTM  exp2          0.795  0.808  0.692
      exp3          0.820  0.794  0.703
      exp4          0.748  0.801  0.656
      exp5          0.808  0.810  0.702
      exp6          0.819  0.796  0.707
TMHMM               0.900  0.879  0.827
Pred-TMR            0.786  0.898  0.767
S-TMHMM             0.832  0.854  0.768
SLIDE 13
Results - TMPDB
TMPDB               Sn     Sp     AC
igTM  exp1          0.675  0.757  0.538
      exp2          0.690  0.751  0.542
      exp3          0.670  0.741  0.530
      exp4          0.601  0.735  0.476
      exp5          0.683  0.750  0.539
      exp6          0.710  0.759  0.557
TMHMM               0.739  0.831  0.659
Pred-TMR            0.777  0.899  0.756
S-TMHMM             0.737  0.829  0.659
SLIDE 14 Results - 101-PRED-TMR-DB
101-PRED-TMR-DB     Sn     Sp     CC     AC
igTM  exp2          0.810  0.811  0.702  0.702
      exp3          0.758  0.781  0.667  0.652
      exp4          0.693  0.795  0.640  0.618
      exp5          0.793  0.821  0.697  0.692
      exp6          0.801  0.820  0.855  0.709
TMHMM               0.899  0.871  0.822  0.817
Pred-TMR            0.814  0.909  0.792  0.795
WaveTM              0.831  0.840  0.772  0.760
SLIDE 15 Conclusions and future work
- Results are in line with those existing in the literature.
- The system does not need any biological knowledge.
- The method can be tested online at http://esparta.dsic.upv.es:8080/code/igtm.php
Future work:
- combine this method with another one, based on HMMs, to improve performance.
- train this method on other (larger, if possible) databases (e.g. http://opm.phar.umich.edu/)
SLIDE 16
Thank you!
Any questions?