
ENGINEDB: A repository of functional analogue gene products



  1. ENGINEDB: A repository of functional analogue gene products
     Giulia De Sario, Angelica Tulipano, Andreas Gisel
     Istituto di Tecnologie Biomediche, Sede Bari, CNR, Via Amendola 122/D, Bari, Italy
     BITS 2009, Genova Andreas Gisel ITB-Bari CNR

  2. Functional Analogues
     PAI1_HUMAN: Plasminogen activator inhibitor, 402 AA
       InterPro: Protease_inhib_I4_serpin
     UROK_HUMAN: Urokinase-type plasminogen activator, 431 AA
       InterPro: EGF, EGF_3, EGF_like_reg_CS, Kringle, Peptidase_S1_S6, Peptidase_S1A
     Sequence identity: 0.0510441
     Gene Ontology:
       serine-type endopeptidase activity 0.00139635
       blood coagulation 5.07353e-05
       fibrinolysis 2.5489e-06

  3. How do we find these functional analogues?

  4. Gene Ontology
     • GO is an international standard to annotate genes:
       – www.geneontology.org
     • It is structured as a directed acyclic graph with three independent branches, whose top-level terms are
       – ‘molecular function’,
       – ‘biological process’ and
       – ‘cellular component’
     • Data are available in a public database, GODB: www.godatabase.org/dev
     • More than 4,800,000 gene products are described by the GO terms
     • More than 27,800 GO terms, resulting in more than 24,700,000 associations
     • Updated about every two months

  5. Gene Ontology
     [Figure labels: path; indirectly associated term; directly associated term]
     Semantic similarity measurement:
     P(term) = (# gene products associated to the term or any of its children) / (# total associations between all GO terms and gene products)
     Resnik P, J Artif Intelligence Res 1999, 11: 95-130.
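The P(term) ratio above can be illustrated with a minimal Python sketch. Everything in it is a toy assumption: the three term names, the four gene products, and the choice to count only direct annotations in the denominator (the slide does not say whether indirect associations are included).

```python
from collections import defaultdict

# Hypothetical three-term GO fragment: child term -> parent terms.
PARENTS = {
    "T_leaf": ["T_mid"],
    "T_mid": ["T_root"],
    "T_root": [],
}

# Hypothetical direct annotations: gene product -> directly associated terms.
DIRECT = {
    "GP1": {"T_leaf"},
    "GP2": {"T_mid"},
    "GP3": {"T_leaf"},
    "GP4": {"T_root"},
}

def with_ancestors(term):
    """Return the term plus every term on a path from it to the root."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen

def term_probabilities(direct):
    """P(term) = gene products on the term or its descendants / total associations."""
    products = defaultdict(set)
    for gene_product, terms in direct.items():
        for term in terms:
            # A direct annotation also counts for every ancestor of the term.
            for reached in with_ancestors(term):
                products[reached].add(gene_product)
    # Assumption: the denominator counts direct associations only.
    total = sum(len(terms) for terms in direct.values())
    return {term: len(gps) / total for term, gps in products.items()}

probabilities = term_probabilities(DIRECT)
```

In this toy graph the leaf term gets the smallest P(term) and the root the largest, which is why 1-p(term) can serve as a specificity weight.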

  6. Algorithm
     • Through a χ² statistical test we compare two gene products A and B:
       – we count the GO terms, directly or indirectly associated, that are common and uncommon to the two gene products;
       – we weight each term with 1-p(term), giving more importance to specific terms.

     +---------------------+-----------------+---------------------+
     |                     | # GO terms in A | # GO terms not in A |
     +---------------------+-----------------+---------------------+
     | # GO terms in B     | O11             | O12                 |
     | # GO terms not in B | O21             | O22                 |
     +---------------------+-----------------+---------------------+

     • The higher the χ² value, the higher the probability of functional dependence between the two gene products A and B.
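One way to read the weighted χ² comparison is sketched below in Python. This is an illustration under stated assumptions, not the authors' implementation: each cell of the 2x2 table is taken to be the sum of 1-p(term) over the terms that fall in it, the statistic is the standard Pearson χ² for a 2x2 table without continuity correction, and the `universe` argument (all GO terms considered) is a hypothetical parameter needed to fill O22.

```python
def weighted_chi_square(terms_a, terms_b, p, universe):
    """Pearson chi-square of a 2x2 table whose cells are weighted term counts.

    terms_a, terms_b: sets of GO terms (direct and indirect) of A and B.
    p: dict mapping each term to P(term).
    universe: set of all GO terms considered (assumption, used for O22).
    """
    def weight(terms):
        # Specific terms (small P) contribute more than general ones.
        return sum(1.0 - p[t] for t in terms)

    o11 = weight(terms_a & terms_b)             # in A and in B
    o12 = weight(terms_b - terms_a)             # in B, not in A
    o21 = weight(terms_a - terms_b)             # in A, not in B
    o22 = weight(universe - terms_a - terms_b)  # in neither
    n = o11 + o12 + o21 + o22
    denominator = (o11 + o12) * (o21 + o22) * (o11 + o21) * (o12 + o22)
    if denominator == 0:
        return 0.0  # degenerate table: a row or column is empty
    return n * (o11 * o22 - o12 * o21) ** 2 / denominator

# Toy data: four hypothetical terms, all with P(term) = 0.5.
P = {"t1": 0.5, "t2": 0.5, "t3": 0.5, "t4": 0.5}
UNIVERSE = set(P)
identical = weighted_chi_square({"t1", "t2"}, {"t1", "t2"}, P, UNIVERSE)
half_overlap = weighted_chi_square({"t1", "t2"}, {"t1", "t3"}, P, UNIVERSE)
```

With these toy weights, two identical term sets score higher than two sets that overlap only as much as chance would suggest.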

  7. Data Analysis
     BCL2_HUMAN
     Non-redundant list of GO terms → Description of gene product
     3.7 million gene products (UniProt) are described by 170925 descriptions
     27000 CPU hours
     Gene analogue finder: a GRID solution for finding functionally analogous gene products. Tulipano A, Donvito G, Licciulli F, Maggi G, Gisel A. BMC Bioinformatics. 2007 Sep 3;8:329.

  8. Results

     +-------------------+---------------------------+-------------+--------------+-----------------+
     | Analogue          | Genus - Species           | Chi Square  | Common Terms | No Common Terms |
     +-------------------+---------------------------+-------------+--------------+-----------------+
     | PAI1_HUMAN        | Homo - sapiens            | 26350.67188 | 47           | 0               |
     | FIBR_EISFO        | Eisenia - fetida          | 17928.62500 | 32           | 0               |
     | PLMN_CAPHI        | Capra - hircus            | 17407.75586 | 33           | 2               |
     | CEKI_CAEEC        | Caesalpinia - echinata    | 16403.94922 | 30           | 1               |
     | Serpinf2          | Rattus - norvegicus       | 15237.88184 | 33           | 7               |
     | Tmprss6_predicted | Rattus - norvegicus       | 15092.10352 | 36           | 12              |
     | UROK_HUMAN        | Homo - sapiens            | 14343.85547 | 38           | 18              |
     | NVSP_NERVI        | Nereis - virens           | 14327.84277 | 33           | 10              |
     | PLMN_STRCA        | Struthio - camelus        | 14327.76074 | 33           | 10              |
     | FA12_PIG          | Sus - scrofa              | 14327.75977 | 33           | 10              |
     | FA12_BOVIN        | Bos - taurus              | 14327.75977 | 33           | 10              |
     | FA12_CAVPO        | Cavia - porcellus         | 14327.75977 | 33           | 10              |
     | FIBC_LUMRU        | Lumbricus - rubellus      | 14129.44141 | 32           | 9               |
     | PLMN_PIG          | Sus - scrofa              | 13983.45605 | 33           | 11              |
     | PLMN_PETMA        | Petromyzon - marinus      | 13983.45605 | 33           | 11              |
     | FA12_HUMAN        | Homo - sapiens            | 13983.43848 | 33           | 11              |
     | TFPI1_HUMAN       | Homo - sapiens            | 13799.77637 | 26           | 1               |
     | ANTA_HYDMA        | Hydra - magnipapillata    | 13774.76465 | 29           | 5               |
     | ANTA_HAEOF        | Haementeria - officinalis | 13774.76465 | 29           | 5               |
     | KLKB1_HUMAN       | Homo - sapiens            | 13655.81934 | 33           | 12              |
     | Klkb1             | Mus - musculus            | 13655.81934 | 33           | 12              |
     | KLKB1_BOVIN       | Bos - taurus              | 13655.81934 | 33           | 12              |
     | Klkb1             | Mus - musculus            | 13655.81934 | 33           | 12              |
     | DISA_AGKCO        | Agkistrodon - contortrix  | 13589.35840 | 27           | 3               |
     | DISB_VIPLE        | Macrovipera - lebetina    | 13589.35840 | 27           | 3               |
     | VSP2_TRIEL        | Protobothrops - elegans   | 13346.19141 | 33           | 13              |
     | VSP1_TRIEL        | Protobothrops - elegans   | 13346.19141 | 33           | 13              |
     | f7i               | Danio - rerio             | 13272.62012 | 33           | 13              |
     | PLMN_ERIEU        | Erinaceus - europaeus     | 13218.69531 | 34           | 15              |
     | PLMN_MACEU        | Macropus - eugenii        | 13218.69531 | 34           | 15              |
     +-------------------+---------------------------+-------------+--------------+-----------------+

     GO term tables shown alongside the list:

     +----------------------------------------------+-------------+------+
     | name                                         | p_value     | code |
     +----------------------------------------------+-------------+------+
     | protease binding                             | 1.8611e-06  | IPI  |
     | protease binding                             | 1.8611e-06  | IPI  |
     | serine-type endopeptidase activity           | 0.00139635  | IEA  |
     | serine-type endopeptidase inhibitor activity | 0.000221876 | IEA  |
     | serine-type endopeptidase inhibitor activity | 0.000221876 | EXP  |
     | serine-type endopeptidase inhibitor activity | 0.000221876 | EXP  |
     | protein binding                              | 0.0116756   | IPI  |
     | protein binding                              | 0.0116756   | IPI  |
     | blood coagulation                            | 5.07353e-05 | TAS  |
     | fibrinolysis                                 | 2.5489e-06  | TAS  |
     | regulation of angiogenesis                   | 8.17267e-06 | IEA  |
     | extracellular region                         | 0.00467537  | NAS  |
     | extracellular region                         | 0.00467537  | EXP  |
     | extracellular region                         | 0.00467537  | EXP  |
     | plasma membrane                              | 0.0123278   | EXP  |
     +----------------------------------------------+-------------+------+

     +------------------------------------+-------------+------+
     | name                               | p_value     | code |
     +------------------------------------+-------------+------+
     | serine-type endopeptidase activity | 0.00139635  | IEA  |
     | peptidase activity                 | 0.0169393   | IEA  |
     | response to hypoxia                | 1.81255e-05 | IEA  |
     | proteolysis                        | 0.00855986  | TAS  |
     | chemotaxis                         | 0.0009295   | TAS  |
     | signal transduction                | 0.0121321   | TAS  |
     | blood coagulation                  | 5.07353e-05 | IEA  |
     | smooth muscle cell migration       | 1.3756e-06  | IEA  |
     | fibrinolysis                       | 2.5489e-06  | IEA  |
     | extracellular region               | 0.00467537  | IEA  |
     | plasma membrane                    | 0.0123278   | EXP  |
     +------------------------------------+-------------+------+

  9. Data filtering
     Comparing 170925 descriptions would produce 14.6 × 10^9 results. We introduced two thresholds to limit the data to significant results:
     a) on the χ² value (chart label: 77)
        [Figure: plot with y-axis from 0 to 25000 and x-axis from 1 to 97]
     b) on the average p-value of common terms
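The pair count and the two-threshold filter can be sketched in Python as follows. The record layout (`chi2`, `avg_p`), the threshold values, and the direction of the p-value cut (keeping pairs whose common terms have a low average p-value, i.e. are specific) are assumptions for illustration; the slide gives only the two criteria.

```python
import math

N_DESCRIPTIONS = 170925

# Number of unordered pairs of distinct descriptions.
n_pairs = math.comb(N_DESCRIPTIONS, 2)  # 14607592350, about 14.6 x 10^9

def filter_results(results, min_chi2, max_avg_p):
    """Keep pairs that pass both thresholds.

    Assumption: a pair is significant when its chi-square value is at least
    min_chi2 and the average p-value of its common terms is at most max_avg_p.
    """
    return [
        r for r in results
        if r["chi2"] >= min_chi2 and r["avg_p"] <= max_avg_p
    ]

# Hypothetical comparison results and thresholds.
results = [
    {"pair": ("A", "B"), "chi2": 120.0, "avg_p": 0.001},
    {"pair": ("A", "C"), "chi2": 12.0, "avg_p": 0.001},
    {"pair": ("B", "C"), "chi2": 300.0, "avg_p": 0.2},
]
kept = filter_results(results, min_chi2=100.0, max_avg_p=0.01)
```

Only the first toy record passes both cuts: the second fails the χ² threshold and the third fails the average p-value threshold.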

  10. The Access http://spank.ba.itb.cnr.it/engine/

  11. The Access http://spank.ba.itb.cnr.it/engine/

  12. The Access http://spank.ba.itb.cnr.it/engine/

  13. The Access Webservice http://spank.ba.itb.cnr.it/docs/engineDB.wsdl

      Request:
      <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:fun="http://cathdb.info/FuncNet_1_0/">
        <soapenv:Header/>
        <soapenv:Body>
          <fun:ScorePairwiseRelations>
            <proteins1>
              <p>A3EXL0</p> <p>Q8NFN7</p> <p>O75865</p> <p>Q5SRD3</p> <p>Q9Y5G3</p>
              <p>O60486</p> <p>P19012</p> <p>Q9NWG8</p> <p>P30273</p> <p>Q92817</p>
            </proteins1>
            <proteins2>
              <p>Q5SR05</p> <p>Q9H8H3</p> <p>P22676</p> <p>O00241</p> <p>O14498</p>
              <p>P78552</p> <p>Q8NF37</p> <p>Q8NGM6</p> <p>Q0ZAJ7</p> <p>Q6PIM1</p>
            </proteins2>
          </fun:ScorePairwiseRelations>
        </soapenv:Body>
      </soapenv:Envelope>

      Result entry:
      <s>
        <p1>P30273</p1>
        <p2>O00241</p2>
        <rs>14321.45508</rs>
        <pv>0.18345472940256</pv>
      </s>

      FuncNet is an open platform for the prediction and comparison of protein function, funded by the European Union’s EMBRACE Network of Excellence, and developed in partnership with the ENFIN project.
      It is designed to answer questions like: Given one set of proteins which are known to share a particular biological function… which of these other proteins also share that function?
      http://funcnet.eu/
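As a rough illustration of how a client might assemble such a request and read one result entry, here is a Python sketch using only the standard library. It performs no network call. The element names and namespaces are copied from the slide; the helper names `build_request` and `parse_score` are hypothetical, and the meaning of `rs` and `pv` is not stated on the slide, so they are returned as plain floats.

```python
import xml.etree.ElementTree as ET

SOAPENV = "http://schemas.xmlsoap.org/soap/envelope/"
FUN = "http://cathdb.info/FuncNet_1_0/"

def build_request(proteins1, proteins2):
    """Serialise a ScorePairwiseRelations envelope for two accession lists."""
    ET.register_namespace("soapenv", SOAPENV)
    ET.register_namespace("fun", FUN)
    envelope = ET.Element(f"{{{SOAPENV}}}Envelope")
    ET.SubElement(envelope, f"{{{SOAPENV}}}Header")
    body = ET.SubElement(envelope, f"{{{SOAPENV}}}Body")
    operation = ET.SubElement(body, f"{{{FUN}}}ScorePairwiseRelations")
    for tag, accessions in (("proteins1", proteins1), ("proteins2", proteins2)):
        group = ET.SubElement(operation, tag)
        for accession in accessions:
            ET.SubElement(group, "p").text = accession
    return ET.tostring(envelope, encoding="unicode")

def parse_score(fragment):
    """Read one <s> result entry into (p1, p2, rs, pv)."""
    entry = ET.fromstring(fragment)
    return (
        entry.findtext("p1"),
        entry.findtext("p2"),
        float(entry.findtext("rs")),
        float(entry.findtext("pv")),
    )

request_xml = build_request(["P30273", "A3EXL0"], ["O00241", "Q5SR05"])
score = parse_score(
    "<s><p1>P30273</p1><p2>O00241</p2>"
    "<rs>14321.45508</rs><pv>0.18345472940256</pv></s>"
)
```

Sending the envelope would additionally require the endpoint and SOAP action described in the WSDL, which are not shown on the slide.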

  14. The Access Webservice http://spank.ba.itb.cnr.it/docs/engineDB.wsdl

  15. In Future
      • Compatible with different gene product identifiers
      • Sequence comparison
      • Domain comparison
      • Select specific organisms
      • Search with user defined keywords
