

SLIDE 1

BITS 2009, Genova Andreas Gisel ITB-Bari CNR

ENGINEDB: A repository of functional analogue gene products

Giulia De Sario, Angelica Tulipano, Andreas Gisel

Istituto di Tecnologie Biomediche, Sede Bari, CNR, Via Amendola 122/D, Bari, Italy

SLIDE 2

Functional Analogues

UROK_HUMAN: Urokinase-type plasminogen activator (431 AA)
InterPro: EGF, EGF_3, EGF_like_reg_CS, Kringle, Peptidase_S1_S6, Peptidase_S1A

PAI1_HUMAN: Plasminogen activator inhibitor (402 AA)
InterPro: Protease_inhib_I4_serpin

Sequence identity: 0.0510441

Gene Ontology terms annotated to both (term p-value):

  • serine-type endopeptidase activity: 0.00139635
  • blood coagulation: 5.07353e-05
  • fibrinolysis: 2.5489e-06

SLIDE 3

How do we find these functional analogues?

SLIDE 4

Gene Ontology

  • GO is an international standard to annotate genes:
      – www.geneontology.org
  • It is structured as a directed acyclic graph with three independent branches, with the top-level terms:
      – ‘molecular function’
      – ‘biological process’
      – ‘cellular component’
  • Data are available in a public database, GODB:
      – www.godatabase.org/dev
  • More than 4,800,000 gene products are described by the GO terms
  • More than 27,800 GO terms, resulting in more than 24,700,000 associations
  • Updated about every two months

SLIDE 5

Gene Ontology

Semantic similarity measurement

$$P(\mathrm{term}) = \frac{\#\,\text{gene products associated to the term or any of its children}}{\#\,\text{total associations between all GO terms and gene products}}$$

[Figure: path through the GO graph; legend: directly associated term, indirectly associated term]

Resnik P, J Artif Intelligence Res 1999, 11:95-130.
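The definition of P(term) can be illustrated with a minimal Python sketch. The toy ontology, the gene product names and the function names are hypothetical, chosen only for illustration; the real values come from GODB. A direct association to a term is propagated to all of its ancestors, which gives the indirect associations.

```python
from collections import defaultdict


def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def term_probabilities(parents, associations):
    """P(term) = gene products at the term or any of its children / total associations."""
    products = defaultdict(set)
    for gene_product, term in associations:
        # A direct association also counts for every ancestor (indirect association).
        for t in {term} | ancestors(term, parents):
            products[t].add(gene_product)
    total = len(associations)
    return {t: len(gps) / total for t, gps in products.items()}


# Hypothetical toy ontology (child -> parents) and direct associations.
toy_parents = {"A": {"root"}, "B": {"root"}, "A1": {"A"}}
toy_associations = [("g1", "A1"), ("g2", "A1"), ("g2", "B"), ("g3", "A")]
p = term_probabilities(toy_parents, toy_associations)
```

In this toy case the specific term A1 gets a lower probability than its parent A, so specific terms are the rarer, more informative ones.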

SLIDE 6

Algorithm

  • Through a χ² statistical test we compare two gene products, A and B:
      – we count the GO terms, directly or indirectly associated, that are common and uncommon to the two gene products;
      – we weight each term with 1 − P(term), giving more importance to specific terms.
  • The higher the χ² value, the higher the probability of functional dependence between the two gene products A and B.

+---------------------+-----------------+---------------------+
|                     | # GO terms in A | # GO terms not in A |
+---------------------+-----------------+---------------------+
| # GO terms in B     | O11             | O12                 |
| # GO terms not in B | O21             | O22                 |
+---------------------+-----------------+---------------------+
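The slide does not spell out the exact χ² formula, so the following is only a sketch under stated assumptions: each cell of the 2×2 table is taken as the sum of the weights 1 − P(term) of the terms falling in it, O22 covers every other term of the ontology, and the standard Pearson χ² for a 2×2 contingency table is applied. The function name and the toy data are hypothetical.

```python
def weighted_chi_square(terms_a, terms_b, p):
    """Pearson chi-square on a weighted 2x2 table of GO terms.

    terms_a, terms_b: sets of GO terms (direct and indirect) of A and B.
    p: dict mapping every term of the ontology to P(term).
    """
    def weight(terms):
        # Specific (rare) terms have a small P(term) and thus a large weight.
        return sum(1.0 - p[t] for t in terms)

    universe = set(p)
    o11 = weight(terms_a & terms_b)             # in A and in B
    o12 = weight(terms_b - terms_a)             # in B, not in A
    o21 = weight(terms_a - terms_b)             # in A, not in B
    o22 = weight(universe - terms_a - terms_b)  # in neither (assumption)

    n = o11 + o12 + o21 + o22
    denominator = (o11 + o12) * (o21 + o22) * (o11 + o21) * (o12 + o22)
    if denominator == 0:
        return 0.0
    return n * (o11 * o22 - o12 * o21) ** 2 / denominator


# Hypothetical ontology of six terms, all with P(term) = 0.5.
toy_p = {t: 0.5 for t in ("t1", "t2", "t3", "t4", "t5", "t6")}
partial = weighted_chi_square({"t1", "t2", "t3"}, {"t1", "t2", "t4"}, toy_p)
identical = weighted_chi_square({"t1", "t2", "t3"}, {"t1", "t2", "t3"}, toy_p)
```

With these toy inputs, two gene products with identical term sets score higher than two that share only part of their terms.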

SLIDE 7

Data Analysis

BCL2_HUMAN

Non-redundant list of GO terms = description of gene product

3.7 million gene products (UniProt) are described by 170,925 descriptions

27,000 CPU hours

Gene analogue finder: a GRID solution for finding functionally analogous gene products. Tulipano A, Donvito G, Licciulli F, Maggi G, Gisel A. BMC Bioinformatics. 2007 Sep 3;8:329.
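The reduction from millions of gene products to a much smaller number of descriptions can be sketched as a grouping by term set. This is a minimal illustration, not the pipeline of the cited paper; the function name and the GO identifiers in the toy input are placeholders.

```python
from collections import defaultdict


def group_by_description(annotations):
    """Group gene products that share the same non-redundant set of GO terms.

    annotations: dict mapping a gene product name to an iterable of GO term ids.
    Returns a dict mapping each distinct description (a frozenset) to its gene products.
    """
    groups = defaultdict(list)
    for gene_product, terms in annotations.items():
        # frozenset removes duplicate terms and ignores their order.
        groups[frozenset(terms)].append(gene_product)
    return dict(groups)


# Toy input with placeholder GO ids; two products share one description.
toy_annotations = {
    "FA12_PIG": ["GO:1", "GO:2", "GO:2"],
    "FA12_BOVIN": ["GO:2", "GO:1"],
    "BCL2_HUMAN": ["GO:3"],
}
descriptions = group_by_description(toy_annotations)
```

Only one comparison per pair of descriptions is then needed, and the result applies to every gene product sharing those descriptions.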

SLIDE 8

Results

+-------------------+---------------------------+-------------+--------------+-----------------+
| Analogues         | Genus - Species           | Chi Square  | Common Terms | No Common Terms |
+-------------------+---------------------------+-------------+--------------+-----------------+
| PAI1_HUMAN        | Homo - sapiens            | 26350.67188 | 47           |                 |
| FIBR_EISFO        | Eisenia - fetida          | 17928.62500 | 32           |                 |
| PLMN_CAPHI        | Capra - hircus            | 17407.75586 | 33           | 2               |
| CEKI_CAEEC        | Caesalpinia - echinata    | 16403.94922 | 30           | 1               |
| Serpinf2          | Rattus - norvegicus       | 15237.88184 | 33           | 7               |
| Tmprss6_predicted | Rattus - norvegicus       | 15092.10352 | 36           | 12              |
| UROK_HUMAN        | Homo - sapiens            | 14343.85547 | 38           | 18              |
| NVSP_NERVI        | Nereis - virens           | 14327.84277 | 33           | 10              |
| PLMN_STRCA        | Struthio - camelus        | 14327.76074 | 33           | 10              |
| FA12_PIG          | Sus - scrofa              | 14327.75977 | 33           | 10              |
| FA12_BOVIN        | Bos - taurus              | 14327.75977 | 33           | 10              |
| FA12_CAVPO        | Cavia - porcellus         | 14327.75977 | 33           | 10              |
| FIBC_LUMRU        | Lumbricus - rubellus      | 14129.44141 | 32           | 9               |
| PLMN_PIG          | Sus - scrofa              | 13983.45605 | 33           | 11              |
| PLMN_PETMA        | Petromyzon - marinus      | 13983.45605 | 33           | 11              |
| FA12_HUMAN        | Homo - sapiens            | 13983.43848 | 33           | 11              |
| TFPI1_HUMAN       | Homo - sapiens            | 13799.77637 | 26           | 1               |
| ANTA_HYDMA        | Hydra - magnipapillata    | 13774.76465 | 29           | 5               |
| ANTA_HAEOF        | Haementeria - officinalis | 13774.76465 | 29           | 5               |
| KLKB1_HUMAN       | Homo - sapiens            | 13655.81934 | 33           | 12              |
| Klkb1             | Mus - musculus            | 13655.81934 | 33           | 12              |
| KLKB1_BOVIN       | Bos - taurus              | 13655.81934 | 33           | 12              |
| Klkb1             | Mus - musculus            | 13655.81934 | 33           | 12              |
| DISA_AGKCO        | Agkistrodon - contortrix  | 13589.35840 | 27           | 3               |
| DISB_VIPLE        | Macrovipera - lebetina    | 13589.35840 | 27           | 3               |
| VSP2_TRIEL        | Protobothrops - elegans   | 13346.19141 | 33           | 13              |
| VSP1_TRIEL        | Protobothrops - elegans   | 13346.19141 | 33           | 13              |
| f7i               | Danio - rerio             | 13272.62012 | 33           | 13              |
| PLMN_ERIEU        | Erinaceus - europaeus     | 13218.69531 | 34           | 15              |
| PLMN_MACEU        | Macropus - eugenii        | 13218.69531 | 34           | 15              |
+-------------------+---------------------------+-------------+--------------+-----------------+

+----------------------------------------------+-------------+------+
| name                                         | p_value     | code |
+----------------------------------------------+-------------+------+
| protease binding                             | 1.8611e-06  | IPI  |
| protease binding                             | 1.8611e-06  | IPI  |
| serine-type endopeptidase activity           | 0.00139635  | IEA  |
| serine-type endopeptidase inhibitor activity | 0.000221876 | IEA  |
| serine-type endopeptidase inhibitor activity | 0.000221876 | EXP  |
| serine-type endopeptidase inhibitor activity | 0.000221876 | EXP  |
| protein binding                              | 0.0116756   | IPI  |
| protein binding                              | 0.0116756   | IPI  |
| blood coagulation                            | 5.07353e-05 | TAS  |
| fibrinolysis                                 | 2.5489e-06  | TAS  |
| regulation of angiogenesis                   | 8.17267e-06 | IEA  |
| extracellular region                         | 0.00467537  | NAS  |
| extracellular region                         | 0.00467537  | EXP  |
| extracellular region                         | 0.00467537  | EXP  |
| plasma membrane                              | 0.0123278   | EXP  |
+----------------------------------------------+-------------+------+

+------------------------------------+-------------+------+
| name                               | p_value     | code |
+------------------------------------+-------------+------+
| serine-type endopeptidase activity | 0.00139635  | IEA  |
| peptidase activity                 | 0.0169393   | IEA  |
| response to hypoxia                | 1.81255e-05 | IEA  |
| proteolysis                        | 0.00855986  | TAS  |
| chemotaxis                         | 0.0009295   | TAS  |
| signal transduction                | 0.0121321   | TAS  |
| blood coagulation                  | 5.07353e-05 | IEA  |
| smooth muscle cell migration       | 1.3756e-06  | IEA  |
| fibrinolysis                       | 2.5489e-06  | IEA  |
| extracellular region               | 0.00467537  | IEA  |
| plasma membrane                    | 0.0123278   | EXP  |
+------------------------------------+-------------+------+
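As a small worked example, the terms common to the two annotation tables above and their average p-value can be computed directly from the listed values. The sketch uses only the directly listed terms (indirectly associated terms are not shown on the slide), and the variable names are illustrative.

```python
# p-values of the terms in the first annotation table (duplicates collapsed).
first_table = {
    "protease binding": 1.8611e-06,
    "serine-type endopeptidase activity": 0.00139635,
    "serine-type endopeptidase inhibitor activity": 0.000221876,
    "protein binding": 0.0116756,
    "blood coagulation": 5.07353e-05,
    "fibrinolysis": 2.5489e-06,
    "regulation of angiogenesis": 8.17267e-06,
    "extracellular region": 0.00467537,
    "plasma membrane": 0.0123278,
}

# p-values of the terms in the second annotation table.
second_table = {
    "serine-type endopeptidase activity": 0.00139635,
    "peptidase activity": 0.0169393,
    "response to hypoxia": 1.81255e-05,
    "proteolysis": 0.00855986,
    "chemotaxis": 0.0009295,
    "signal transduction": 0.0121321,
    "blood coagulation": 5.07353e-05,
    "smooth muscle cell migration": 1.3756e-06,
    "fibrinolysis": 2.5489e-06,
    "extracellular region": 0.00467537,
    "plasma membrane": 0.0123278,
}

# Terms present in both tables, and the mean of their p-values.
common_terms = sorted(set(first_table) & set(second_table))
average_p = sum(first_table[t] for t in common_terms) / len(common_terms)
```

Five of the listed terms are common to both tables, including the three highlighted on slide 2.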

SLIDE 9

Data filtering

Comparing 170,925 descriptions pairwise would produce 14.6 × 10⁹ results. We introduced two thresholds to limit the data to significant results:

a) on the χ² value
b) on the average p-value of the common terms

[Figure: plot with one axis running from 5000 to 25000 and the other from 1 to 97; the value 77 is marked]
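The size of the comparison and the two-threshold filter can be sketched in a few lines of Python. The pair count follows from the number of descriptions; the threshold values in the example call are hypothetical, since the slide does not state the ones actually used, and the direction of the p-value cut (keeping low averages, i.e. specific terms) is an assumption.

```python
from math import comb

N_DESCRIPTIONS = 170925

# Number of unordered pairs of distinct descriptions: n * (n - 1) / 2.
n_pairs = comb(N_DESCRIPTIONS, 2)


def is_significant(chi_square, average_p, min_chi_square, max_average_p):
    """Keep a pair only if it passes both thresholds.

    Assumption: a high chi-square and a low average p-value of the
    common terms are the signs of a significant result.
    """
    return chi_square >= min_chi_square and average_p <= max_average_p


# Hypothetical thresholds, for illustration only.
kept = is_significant(14343.85547, 0.0037, min_chi_square=10000.0, max_average_p=0.01)
dropped = is_significant(120.0, 0.2, min_chi_square=10000.0, max_average_p=0.01)
```

Here `n_pairs` evaluates to 14,607,592,350, which matches the 14.6 × 10⁹ quoted on the slide.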

SLIDE 10

The Access

http://spank.ba.itb.cnr.it/engine/

SLIDE 11

The Access

http://spank.ba.itb.cnr.it/engine/

SLIDE 12

The Access

http://spank.ba.itb.cnr.it/engine/

SLIDE 13

The Access

http://spank.ba.itb.cnr.it/docs/engineDB.wsdl

Webservice

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:fun="http://cathdb.info/FuncNet_1_0/">
  <soapenv:Header/>
  <soapenv:Body>
    <fun:ScorePairwiseRelations>
      <proteins1>
        <p>A3EXL0</p>
        <p>Q8NFN7</p>
        <p>O75865</p>
        <p>Q5SRD3</p>
        <p>Q9Y5G3</p>
        <p>O60486</p>
        <p>P19012</p>
        <p>Q9NWG8</p>
        <p>P30273</p>
        <p>Q92817</p>
      </proteins1>
      <proteins2>
        <p>Q5SR05</p>
        <p>Q9H8H3</p>
        <p>P22676</p>
        <p>O00241</p>
        <p>O14498</p>
        <p>P78552</p>
        <p>Q8NF37</p>
        <p>Q8NGM6</p>
        <p>Q0ZAJ7</p>
        <p>Q6PIM1</p>
      </proteins2>
    </fun:ScorePairwiseRelations>
  </soapenv:Body>
</soapenv:Envelope>

<s>
  <p1>P30273</p1>
  <p2>O00241</p2>
  <rs>14321.45508</rs>
  <pv>0.18345472940256</pv>
</s>

FuncNet is an open platform for the prediction and comparison of protein function, funded by the European Union’s EMBRACE Network of Excellence and developed in partnership with the ENFIN project. It is designed to answer questions like: given one set of proteins known to share a particular biological function, which of these other proteins also share that function?

http://funcnet.eu/
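A client for the request and response shown on this slide could be sketched in Python with the standard library. The sketch only builds the envelope and parses one result element; it does not contact the service, because the endpoint address is not given here. The function names are hypothetical, and reading `rs` as the raw score and `pv` as the p-value is an assumption.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
FUN_NS = "http://cathdb.info/FuncNet_1_0/"


def build_request(proteins1, proteins2):
    """Build a ScorePairwiseRelations envelope like the one on the slide."""
    ET.register_namespace("soapenv", SOAP_NS)
    ET.register_namespace("fun", FUN_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{FUN_NS}}}ScorePairwiseRelations")
    for tag, accessions in (("proteins1", proteins1), ("proteins2", proteins2)):
        group = ET.SubElement(call, tag)
        for accession in accessions:
            ET.SubElement(group, "p").text = accession
    return ET.tostring(envelope, encoding="unicode")


def parse_score(xml_fragment):
    """Parse one <s> result element into a dict."""
    s = ET.fromstring(xml_fragment)
    return {
        "p1": s.findtext("p1"),
        "p2": s.findtext("p2"),
        "raw_score": float(s.findtext("rs")),  # assumed meaning of <rs>
        "p_value": float(s.findtext("pv")),    # assumed meaning of <pv>
    }


request_xml = build_request(["P30273", "Q92817"], ["O00241", "P22676"])
score = parse_score(
    "<s><p1>P30273</p1><p2>O00241</p2>"
    "<rs>14321.45508</rs><pv>0.18345472940256</pv></s>"
)
```

The resulting `request_xml` string would be sent as the body of an HTTP POST to the endpoint described by the WSDL.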

SLIDE 14

The Access

http://spank.ba.itb.cnr.it/docs/engineDB.wsdl

Webservice

SLIDE 15

In Future

  • Compatible with different gene product identifiers
  • Sequence comparison
  • Domain comparison
  • Select specific organisms
  • Search with user-defined keywords

SLIDE 16

Acknowledgement

  • Giulia De Sario (1)
  • Angelica Tulipano (1)
  • Giacinto Donvito (2)
  • Giorgio Maggi (2, 3)
  • Andreas Gisel* (1)

(1) Istituto di Tecnologie Biomediche, Sede Bari, CNR, Via Amendola 122/D, Bari, Italy
(2) INFN Bari, Via Amendola 173, Bari, Italy
(3) Dipartimento Interateneo di Fisica, Università e Politecnico di Bari, Via Amendola 173, Bari, Italy

SLIDE 17

Thank you for your attention!