GO2PUB PubMed Query Tool Based on Semantic Expansion of Gene Ontology Terms, a Lipid Metabolism Case Study



  1. GO2PUB PubMed Query Tool Based on Semantic Expansion of Gene Ontology Terms, a Lipid Metabolism Case Study
      Charles Bettembourg, Christian Diot, Anita Burgun and Olivier Dameron
      INRA UMR598 - INSERM U936
      03/07/2012
      — Bettembourg, Diot, Burgun, Dameron — (INRA/INSERM)GO2PUB 03/07/2012 1 / 39

  4. Introduction / Context: Literature search
      PubMed: more than 20 million citations, with fast, continuous growth
      Numerous queries to build and relevant results to select
      Hence the need for automatic search tools

  8. Introduction / Requirements
      Precise and complex queries
      Low silence and noise
      [Figure: the available articles (relevant and non-relevant) against the search results. Silence: relevant articles that are not retrieved. Noise: retrieved articles that are not relevant.]
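Silence and noise can be expressed as set differences between the relevant articles and the search results. A minimal sketch, assuming each article is identified by a string ID such as a PMID; the function name and sample IDs are hypothetical.

```python
def silence_and_noise(relevant: set[str], retrieved: set[str]) -> tuple[set[str], set[str]]:
    """Split the errors of one query into silence and noise.

    silence: relevant articles that the search did not return
    noise:   returned articles that are not relevant
    """
    silence = relevant - retrieved
    noise = retrieved - relevant
    return silence, noise


if __name__ == "__main__":
    # Hypothetical article IDs, for illustration only.
    relevant = {"a1", "a2", "a3", "a4"}
    retrieved = {"a2", "a3", "a5"}
    silence, noise = silence_and_noise(relevant, retrieved)
    print("silence:", sorted(silence))  # silence: ['a1', 'a4']
    print("noise:", sorted(noise))      # noise: ['a5']
```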

  9. Introduction / Requirements / Relevance measures: Precision
      [Figure: PubMed search results, with relevant and non-relevant articles, silence and noise.]
      $$\mathrm{Precision} = \frac{\text{relevant retrieved documents}}{\text{all retrieved documents}}$$
      Precision is the ratio between the relevant retrieved documents and all the results returned by the search tool for a query.
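The precision formula translates directly into code. A minimal sketch, assuming the relevant and retrieved articles are given as sets of IDs; `precision` is a hypothetical helper, and returning 0.0 for an empty result set is a convention chosen here, not one stated on the slide.

```python
def precision(relevant: set[str], retrieved: set[str]) -> float:
    """Relevant retrieved documents divided by all retrieved documents."""
    if not retrieved:
        # Assumed convention: an empty result set scores 0.0 (precision is undefined).
        return 0.0
    return len(relevant & retrieved) / len(retrieved)


if __name__ == "__main__":
    # Hypothetical IDs: 2 of the 3 retrieved articles are relevant.
    print(precision({"a1", "a2", "a3", "a4"}, {"a2", "a3", "a5"}))
```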

  10. Introduction / Requirements / Relevance measures: Recall
      [Figure: PubMed search results, with relevant and non-relevant articles, silence and noise.]
      $$\mathrm{Recall} = \frac{\text{relevant retrieved documents}}{\text{all relevant documents}}$$
      Recall is the ratio between the relevant retrieved documents and all the relevant documents available in the database.
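Recall differs from precision only in its denominator. A minimal sketch under the same assumptions: sets of article IDs, a hypothetical `recall` helper, and 0.0 returned when there are no relevant documents (an assumed convention).

```python
def recall(relevant: set[str], retrieved: set[str]) -> float:
    """Relevant retrieved documents divided by all relevant documents."""
    if not relevant:
        # Assumed convention: with no relevant documents, recall is reported as 0.0.
        return 0.0
    return len(relevant & retrieved) / len(relevant)


if __name__ == "__main__":
    # Hypothetical IDs: 2 of the 4 relevant articles were retrieved.
    print(recall({"a1", "a2", "a3", "a4"}, {"a2", "a3", "a5"}))  # 0.5
```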

  11. Introduction / Requirements / Relevance measures: F-score
      The F-score is a measure combining precision and recall:
      $$F_\beta = \frac{(1 + \beta^2) \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\beta^2 \cdot \mathrm{Precision} + \mathrm{Recall}}$$
      With $\beta = 1$:
      $$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
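A direct transcription of the $F_\beta$ formula, as a sketch: `f_score` is a hypothetical helper, and returning 0.0 when the denominator vanishes is an assumed convention. Values of $\beta$ above 1 weight recall more heavily, values below 1 weight precision, and $\beta = 1$ reproduces $F_1$.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)."""
    b2 = beta * beta
    denominator = b2 * precision + recall
    if denominator == 0:
        # Avoid dividing by zero (e.g. precision = recall = 0).
        return 0.0
    return (1 + b2) * precision * recall / denominator


if __name__ == "__main__":
    p, r = 2 / 3, 0.5
    print(f_score(p, r))            # F1, equal to 2PR / (P + R) = 4/7
    print(f_score(p, r, beta=2.0))  # F2 leans towards recall
```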

  12. Introduction / Problems / Domain-specific vocabulary
      Application: literature search for species-specific metabolisms
      ◮ e.g. lipid metabolism in chicken
      Several methods and tools
      ◮ Interfaces to set filters on PubMed queries
      ◮ Text-mining approaches
        ⋆ Natural Language Processing
        ⋆ Latent Semantic Analysis
      BUT: these need a corpus of domain-specific vocabulary

  14. Introduction / Problems / Complex querying process
      Writing exhaustive and complex queries relies on domain-specific knowledge
      ◮ e.g. lipid metabolism
      A complex query needs many keywords
      ◮ which contradicts the user-friendliness requirement
      Automatic query enrichment using ontologies reconciles both requirements
      Ontology definition (Bard, 2004): "An ontology is a formal way of representing knowledge in which concepts are described both by their meaning and their relationship to each other."

  16. Semantic expansion and query enrichment / Gene Ontology
      Controlled vocabulary
      Hierarchy with inheritance
      More than 34,000 terms to describe:
      ◮ Biological Processes
      ◮ Molecular Functions
      ◮ Cellular Components

  18. Semantic expansion and query enrichment / Gene Ontology Annotations
      Functional annotation of genes
      Multi-species
      One GO term may annotate many genes
      Each gene can be used as a PubMed keyword
      Main idea: the genes annotated by a GO term of interest or by one of its descendants can be used as keywords in a PubMed query.
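The annotation data can be pictured as an index from each GO term to the genes it annotates, restricted to one species. A minimal sketch, assuming annotations arrive as (gene symbol, GO term, species) records; all gene symbols and term IDs below are made-up placeholders, not real GO annotations, and `genes_by_term` is a hypothetical helper.

```python
from collections import defaultdict

# Toy (gene symbol, GO term, species) annotation records.
# Gene symbols and term IDs are hypothetical placeholders, not real GO annotations.
ANNOTATIONS = [
    ("GENE_A", "TERM_1", "chicken"),
    ("GENE_B", "TERM_1", "chicken"),
    ("GENE_B", "TERM_2", "human"),
    ("GENE_C", "TERM_2", "chicken"),
]


def genes_by_term(annotations, species):
    """Index annotations as term -> set of gene symbols, keeping one species only."""
    index = defaultdict(set)
    for gene, term, record_species in annotations:
        if record_species == species:
            index[term].add(gene)
    return dict(index)


if __name__ == "__main__":
    index = genes_by_term(ANNOTATIONS, "chicken")
    # One term may annotate several genes: TERM_1 -> GENE_A, GENE_B.
    for term, genes in sorted(index.items()):
        print(term, sorted(genes))
```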

  20. Semantic expansion and query enrichment / Expansion illustration
      [Figure: a GO hierarchy fragment with Regulation of fatty acid metabolic process (GO:0019217) at the top, Regulation of fatty acid biosynthetic process (GO:0042304) below it, and Negative regulation of fatty acid biosynthetic process (GO:0045717) and Positive regulation of fatty acid biosynthetic process (GO:0045723) at the bottom; the genes PPAR, CAV1, ChREBP, APOA1 and BRCA1 are attached to these terms.]
      Extension to the descendants is important; it is not supported by other tools.
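The expansion in the figure amounts to collecting a term and all of its descendants, then taking the union of their annotated genes. A minimal sketch: the child lists reproduce the hierarchy fragment from the figure, while the term-to-gene annotations are hypothetical placeholders (the figure's exact gene placements are not recoverable here), as are the helper names. The traversal is breadth-first; because GO is a directed acyclic graph in which a term may have several parents, the `seen` set keeps each term from being visited twice.

```python
from collections import deque

# Child lists for the GO hierarchy fragment shown in the figure.
CHILDREN = {
    "GO:0019217": ["GO:0042304"],                # regulation of fatty acid metabolic process
    "GO:0042304": ["GO:0045717", "GO:0045723"],  # regulation of fatty acid biosynthetic process
    "GO:0045717": [],                            # negative regulation of fatty acid biosynthetic process
    "GO:0045723": [],                            # positive regulation of fatty acid biosynthetic process
}

# Hypothetical term -> genes annotations (placeholders, not real GO data).
GENES = {
    "GO:0019217": {"GENE_A"},
    "GO:0042304": {"GENE_B"},
    "GO:0045717": {"GENE_C"},
    "GO:0045723": {"GENE_C", "GENE_D"},
}


def descendants(term, children):
    """Return the term together with all of its descendants (breadth-first)."""
    seen = {term}
    queue = deque([term])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def expanded_genes(term, children, genes):
    """Union of the genes annotated by the term or by any of its descendants."""
    result = set()
    for t in descendants(term, children):
        result |= genes.get(t, set())
    return result


if __name__ == "__main__":
    term = "GO:0019217"
    print("term only:     ", sorted(GENES.get(term, set())))
    print("with expansion:", sorted(expanded_genes(term, CHILDREN, GENES)))
```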

  21. Semantic expansion and query enrichment / Example
      GO:0019217 (Regulation of fatty acid metabolic process): 14 genes, giving 57 symbols, names and synonyms used as keywords
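Turning genes into keywords means gathering each gene's symbol, name and synonyms. A minimal sketch, assuming a simple record type; the `Gene` class, its field names and the case-insensitive deduplication rule are assumptions, not a description of how GO2PUB derives its 57 keywords.

```python
from dataclasses import dataclass, field


@dataclass
class Gene:
    """Hypothetical gene record; the field names are illustrative assumptions."""
    symbol: str
    name: str
    synonyms: list[str] = field(default_factory=list)


def keywords(genes: list[Gene]) -> list[str]:
    """Collect symbols, names and synonyms as keywords.

    Duplicates are dropped case-insensitively, keeping the first spelling seen.
    """
    seen: set[str] = set()
    result: list[str] = []
    for gene in genes:
        for kw in [gene.symbol, gene.name, *gene.synonyms]:
            key = kw.strip().casefold()
            if key and key not in seen:
                seen.add(key)
                result.append(kw.strip())
    return result


if __name__ == "__main__":
    genes = [
        Gene("GENE_A", "gene A protein", ["GA1", "geneA"]),
        Gene("GENE_B", "gene B protein", ["ga1", "GB"]),  # "ga1" duplicates "GA1"
    ]
    print(keywords(genes))
```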

  24. Semantic expansion and query enrichment / Example / Semantic expansion is useful
      GO:0019217 (Regulation of fatty acid metabolic process), chicken
      Without query expansion:
      ◮ 2 articles concerning only 1 gene
      With query expansion:
      ◮ 9 articles concerning 7 genes
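One way to turn an expanded keyword list into a species-restricted PubMed query is to OR the keywords together and AND the result with a MeSH species heading. A sketch under stated assumptions: the `[tiab]` (title/abstract) and `[mh]` (MeSH terms) field tags, the MeSH heading "Chickens" and the NCBI E-utilities `esearch.fcgi` endpoint are real, but this query layout is only an illustration and not necessarily the one GO2PUB generates; the helper names and keywords are hypothetical. The code builds the URL and sends no request.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint; the URL is only built here, not requested.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def quote_keyword(keyword: str) -> str:
    """Quote a keyword as a phrase searched in titles and abstracts ([tiab])."""
    return '"' + keyword.replace('"', "") + '"[tiab]'


def build_query(keywords: list[str], species_mesh: str) -> str:
    """OR the keywords together, then AND the result with a MeSH species term ([mh])."""
    alternatives = " OR ".join(quote_keyword(kw) for kw in keywords)
    return f"({alternatives}) AND {species_mesh}[mh]"


def esearch_url(query: str, retmax: int = 100) -> str:
    """Build an esearch URL that would run the query against PubMed."""
    params = {"db": "pubmed", "term": query, "retmax": retmax}
    return ESEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical keywords standing in for the expanded gene symbols and names.
    query = build_query(["GENE_A", "gene A protein"], "chickens")
    print(query)  # ("GENE_A"[tiab] OR "gene A protein"[tiab]) AND chickens[mh]
    print(esearch_url(query))
```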

  25. GO2PUB website
      http://go2pub.genouest.org

  26. GO2PUB website / Fill in the form
      [Screenshot: the prompt "Please enter a GO term", with "lipi" typed in.]

  27. GO2PUB website / Query example

  30. Relevance of GO2PUB / Method / Relevance analysis
      Comparison of GO2PUB with the PubMed query system and with GoPubMed
      Qualitative analysis
      ◮ Selection of the relevant results of 3 very specific queries sent to GO2PUB, PubMed and GoPubMed
      ◮ Calculation of precision, recall and F-score for each tool
      Generalization study
      ◮ Comparison of the results obtained with 20 random GO terms
      ◮ Selection of the relevant results and computation of precision, recall and F-score for GO2PUB and GoPubMed
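Recall needs the set of all relevant articles, which is unknown for PubMed as a whole. A common workaround, assumed here rather than taken from the slides, is to pool the results of the compared tools and treat every article judged relevant in that pool as the relevant set. A minimal sketch with a hypothetical `evaluate` helper and made-up tool names and IDs (not the study's data):

```python
def evaluate(results: dict[str, set[str]], relevant: set[str]) -> dict[str, tuple[float, float, float]]:
    """Return (precision, recall, F1) for each tool.

    results:  tool name -> set of retrieved article IDs
    relevant: article IDs judged relevant, pooled over all tools (an assumption)
    """
    scores = {}
    for tool, retrieved in results.items():
        hits = len(retrieved & relevant)
        p = hits / len(retrieved) if retrieved else 0.0
        r = hits / len(relevant) if relevant else 0.0
        f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
        scores[tool] = (p, r, f1)
    return scores


if __name__ == "__main__":
    # Made-up tools and article IDs; these are NOT results from the study.
    results = {
        "tool_A": {"a1", "a2", "a3", "a4"},
        "tool_B": {"a1", "a5"},
    }
    # Pool: every article judged relevant among all tools' results.
    relevant = {"a1", "a2", "a3"}
    for tool, (p, r, f1) in evaluate(results, relevant).items():
        print(f"{tool}: P={p:.2f} R={r:.2f} F1={f1:.2f}")
```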
