BioFacets: Solution Towards Leveraging the Wealth of Online Biological Databases

  1. DILS 2006, July 20, 2006
     BioFacets: Solution Towards Leveraging the Wealth of Online Biological Databases
     Malika Mahoui, Zina Ben Miled, Amey Godse, Harshad Kulkarni, Nianhua Li
     Presented by: Malika Mahoui

  2. Biological Domain
     • Data-intensive domain
     • Gene
       – GenBank
       – EMBL
     • Protein
       – SwissProt
       – PIR
       – PDB
     [Chart: number of biological databases per year, 1999 to 2007]

  3. Biological Research
     • Characteristics of biological databases:
       – Representational heterogeneity
       – Diversity of biological data
       – Large result sets
     • Research tasks:
       1. Querying remote databases
       2. Integrating multiple databases
       3. Representing result sets

  4. BioFacets Solution
     • Features:
       – Meta-search engine for biological databases
       – Wrapper-mediator approach for data integration
       – Dynamic faceted approach for results classification
       – Results presentation and query refinement based on faceted classification
       – Cache management and query optimization to support system performance
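
The wrapper-mediator, meta-search, and caching features listed on this slide can be illustrated with a minimal Python sketch. It assumes that each wrapper maps one source's native record layout onto a shared dictionary schema, and that the mediator fans a query out to every wrapper, merges the results, and caches them by normalized query string. The wrapper functions, field names, record IDs, and the `Mediator` class are hypothetical; the wrappers return canned records rather than contacting SwissProt or GenBank, and this is not the BioFacets implementation.

```python
from typing import Callable, Dict, List

# A record in the mediator's shared schema (hypothetical field names).
Record = Dict[str, str]
# A wrapper takes a normalized query and returns records in the shared schema.
Wrapper = Callable[[str], List[Record]]


def swissprot_wrapper(query: str) -> List[Record]:
    # Hypothetical stand-in: canned records in a SwissProt-like layout instead of a remote call.
    raw = [
        {"AC": "SP-EXAMPLE-1", "OS": "Homo sapiens", "DE": "Hemoglobin subunit alpha"},
        {"AC": "SP-EXAMPLE-2", "OS": "Mus musculus", "DE": "Insulin"},
    ]
    # Map source-specific keys onto the shared schema.
    return [
        {"id": r["AC"], "organism": r["OS"], "title": r["DE"], "source": "SwissProt"}
        for r in raw
        if query in r["DE"].lower()
    ]


def genbank_wrapper(query: str) -> List[Record]:
    # Hypothetical stand-in with a different native layout, mapped onto the same schema.
    raw = [
        {"accession": "GB-EXAMPLE-1", "organism": "Homo sapiens", "definition": "hemoglobin alpha mRNA"},
    ]
    return [
        {"id": r["accession"], "organism": r["organism"], "title": r["definition"], "source": "GenBank"}
        for r in raw
        if query in r["definition"].lower()
    ]


class Mediator:
    """Fans a query out to all registered wrappers, merges the results, and caches them."""

    def __init__(self, wrappers: List[Wrapper]) -> None:
        self.wrappers = wrappers
        self.cache: Dict[str, List[Record]] = {}
        self.wrapper_calls = 0  # counts wrapper invocations so the cache's effect is visible

    def search(self, query: str) -> List[Record]:
        key = query.strip().lower()  # normalize so equivalent queries share a cache entry
        if key in self.cache:
            return self.cache[key]
        merged: List[Record] = []
        for wrapper in self.wrappers:
            self.wrapper_calls += 1
            merged.extend(wrapper(key))
        self.cache[key] = merged
        return merged


if __name__ == "__main__":
    mediator = Mediator([swissprot_wrapper, genbank_wrapper])
    print(len(mediator.search("hemoglobin")), mediator.wrapper_calls)  # 2 2
    mediator.search("  Hemoglobin ")  # served from the cache
    print(mediator.wrapper_calls)  # 2
```

Keeping source-specific parsing inside each wrapper means that, in this sketch, adding a database only requires a new wrapper; the mediator and any faceting layer above it see only the shared schema.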

  5. BioFacets Architecture
     [Figure: BioFacets architecture diagram]

  6. Faceted Classification
     • A concept well established in digital libraries
     • Assigns multiple classifications to each result record
     • Example: the Flamenco framework for image search
     • Limitation: existing approaches assume that the data/metadata needed for classification exist a priori
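
Assigning several classifications to one record, and counting how many records fall under each facet value, can be sketched in a few lines of Python. Like the digital-library frameworks mentioned on this slide, the sketch assumes facet values are already attached to each record, which is the limitation listed above. The records and facet names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical result records; each one is classified along several facets at once.
records: List[Dict[str, str]] = [
    {"id": "r1", "data_type": "protein", "organism": "Homo sapiens"},
    {"id": "r2", "data_type": "gene", "organism": "Homo sapiens"},
    {"id": "r3", "data_type": "protein", "organism": "Mus musculus"},
]


def facet_counts(records: List[Dict[str, str]], facets: List[str]) -> Dict[str, Counter]:
    """Count, for every facet, how many records carry each value of that facet."""
    counts: Dict[str, Counter] = {facet: Counter() for facet in facets}
    for record in records:
        for facet in facets:
            if facet in record:  # records without a value for this facet are skipped
                counts[facet][record[facet]] += 1
    return counts


counts = facet_counts(records, ["data_type", "organism"])
print(counts["data_type"])  # Counter({'protein': 2, 'gene': 1})
print(counts["organism"])   # Counter({'Homo sapiens': 2, 'Mus musculus': 1})
```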

  7. Facet & Facet Specification
     • A facet is a method of classification, defined by:
       – Facet name
       – Assignment of value: static or dynamic
       – Level: non-hierarchical or hierarchical
     • Example facets: Data Type, Protein Length, Gene Function, Organism Lineage
     • Example specification:
         <Facet fName="data_type" type="static" isHierarchical="false">
         </Facet>
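
Reading a facet specification like the one on this slide into a typed object can be sketched with Python's standard `xml.etree.ElementTree` module. The attribute names `fName`, `type`, and `isHierarchical` come from the slide; treating `"dynamic"` as the only other legal `type` value and `"true"` as the hierarchical flag are assumptions, and the `FacetSpec` class, `parse_facet` function, and the `organism` example are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class FacetSpec:
    name: str
    is_static: bool        # static: fixed value per data source; dynamic: computed per record
    is_hierarchical: bool  # e.g. organism lineage versus a flat list of values


def parse_facet(xml_text: str) -> FacetSpec:
    """Parse one <Facet> element into a FacetSpec, rejecting unknown facet types."""
    elem = ET.fromstring(xml_text)
    if elem.tag != "Facet":
        raise ValueError(f"expected <Facet>, got <{elem.tag}>")
    facet_type = elem.get("type", "")
    if facet_type not in ("static", "dynamic"):  # assumed set of legal values
        raise ValueError(f"unknown facet type: {facet_type!r}")
    return FacetSpec(
        name=elem.get("fName", ""),
        is_static=facet_type == "static",
        is_hierarchical=elem.get("isHierarchical", "false").lower() == "true",
    )


print(parse_facet('<Facet fName="data_type" type="static" isHierarchical="false"></Facet>'))
# FacetSpec(name='data_type', is_static=True, is_hierarchical=False)
print(parse_facet('<Facet fName="organism" type="dynamic" isHierarchical="true"/>'))
# FacetSpec(name='organism', is_static=False, is_hierarchical=True)
```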

  8. Classification Rules (facet type, rule type, and XML specification)
     • Static facet, fixed value rule:
         <Rule>
           <ruleFacetName>data_type</ruleFacetName>
           <ruleMethod>fixed</ruleMethod>
           <Value>protein_data</Value>
         </Rule>
     • Dynamic facet, field value rule:
         <Rule>
           <ruleFacetName>organism</ruleFacetName>
           <ruleMethod>fieldvalue</ruleMethod>
           <Field>scientific_name</Field>
         </Rule>
     • Dynamic facet, lookup value rule:
         <Rule>
           <ruleFacetName>organism</ruleFacetName>
           <ruleMethod>lookup</ruleMethod>
           <DataSource>newt</DataSource>
           <LookupBaseURL>http://www.ebi.ac.uk/newt/display?from=au&amp;match=taxonomy+identifier&amp;search=</LookupBaseURL>
           <LookupField>tax_id</LookupField>
           <ValueField>scientific_name</ValueField>
         </Rule>
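
How the three rule types might be evaluated against a single result record is sketched below in Python. The XML element names come from the rules on this slide; everything else is an assumption: rules are tried in order and the first non-empty value for a facet wins, and the NEWT lookup is replaced by a hypothetical offline dictionary (`OFFLINE_NEWT`, keyed by NCBI taxonomy ID) instead of an HTTP request to `LookupBaseURL`, which is therefore omitted from the sample rule.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]
# A lookup takes (DataSource, key) and returns the matching entry, or None if absent.
Lookup = Callable[[str, str], Optional[Dict[str, str]]]

RULES_XML = [
    "<Rule><ruleFacetName>data_type</ruleFacetName><ruleMethod>fixed</ruleMethod>"
    "<Value>protein_data</Value></Rule>",
    "<Rule><ruleFacetName>organism</ruleFacetName><ruleMethod>fieldvalue</ruleMethod>"
    "<Field>scientific_name</Field></Rule>",
    "<Rule><ruleFacetName>organism</ruleFacetName><ruleMethod>lookup</ruleMethod>"
    "<DataSource>newt</DataSource><LookupField>tax_id</LookupField>"
    "<ValueField>scientific_name</ValueField></Rule>",
]

# Hypothetical offline stand-in for the NEWT taxonomy service (9606 is the human taxon ID).
OFFLINE_NEWT: Dict[str, Dict[str, str]] = {"9606": {"scientific_name": "Homo sapiens"}}


def offline_lookup(data_source: str, key: str) -> Optional[Dict[str, str]]:
    return OFFLINE_NEWT.get(key) if data_source == "newt" else None


def parse_rule(xml_text: str) -> Dict[str, str]:
    """Flatten a <Rule> element into {child tag: text}."""
    return {child.tag: (child.text or "").strip() for child in ET.fromstring(xml_text)}


def apply_rule(rule: Dict[str, str], record: Record, lookup: Lookup) -> Optional[str]:
    method = rule["ruleMethod"]
    if method == "fixed":  # static facet: same value for every record
        return rule["Value"]
    if method == "fieldvalue":  # dynamic facet: copy a field from the record
        return record.get(rule["Field"])
    if method == "lookup":  # dynamic facet: resolve a record field via an external source
        key = record.get(rule["LookupField"])
        entry = lookup(rule["DataSource"], key) if key is not None else None
        return entry.get(rule["ValueField"]) if entry is not None else None
    raise ValueError(f"unknown ruleMethod: {method!r}")


def classify(record: Record, rules: List[Dict[str, str]], lookup: Lookup) -> Dict[str, str]:
    """Assign facet values to one record; the first rule yielding a value for a facet wins."""
    facets: Dict[str, str] = {}
    for rule in rules:
        name = rule["ruleFacetName"]
        if name in facets:
            continue
        value = apply_rule(rule, record, lookup)
        if value:
            facets[name] = value
    return facets


RULES = [parse_rule(x) for x in RULES_XML]
print(classify({"tax_id": "9606"}, RULES, offline_lookup))
# {'data_type': 'protein_data', 'organism': 'Homo sapiens'}
```

Because the organism facet has two rules here, a record that already carries `scientific_name` is classified directly, and only records that carry just a `tax_id` fall through to the (simulated) lookup.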

  9. Start Page
     [Screenshot: BioFacets start page]

  10. Faceted Browsing
     [Screenshot with labeled regions: facet, facet item, item count, selection facet, results, history]
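
The browsing loop this screen suggests, where selecting a facet item narrows the results, item counts are recomputed over what remains, and a history of selections lets the user step back, can be sketched as follows. The `FacetedBrowser` class, its method names, the sample records, and the reading of the history panel as a stack of selections are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

Record = Dict[str, str]


class FacetedBrowser:
    """Hypothetical sketch of drill-down browsing over facet-classified results."""

    def __init__(self, results: List[Record], facets: List[str]) -> None:
        self.results = results
        self.facets = facets
        self.history: List[Tuple[str, str]] = []  # (facet, value) selections, oldest first

    def current(self) -> List[Record]:
        """Results that match every selection made so far."""
        return [r for r in self.results if all(r.get(f) == v for f, v in self.history)]

    def item_counts(self) -> Dict[str, Counter]:
        """Per-facet value counts over the current (narrowed) results."""
        visible = self.current()
        return {f: Counter(r[f] for r in visible if f in r) for f in self.facets}

    def select(self, facet: str, value: str) -> None:
        self.history.append((facet, value))

    def back(self) -> None:
        """Undo the most recent selection, if any."""
        if self.history:
            self.history.pop()


records = [
    {"id": "r1", "data_type": "protein", "organism": "Homo sapiens"},
    {"id": "r2", "data_type": "gene", "organism": "Homo sapiens"},
    {"id": "r3", "data_type": "protein", "organism": "Mus musculus"},
]
browser = FacetedBrowser(records, ["data_type", "organism"])
browser.select("organism", "Homo sapiens")
print([r["id"] for r in browser.current()])  # ['r1', 'r2']
browser.select("data_type", "protein")
print([r["id"] for r in browser.current()])  # ['r1']
browser.back()
print(len(browser.current()))  # 2
```

Recomputing counts from the narrowed result set, rather than from the full result set, keeps every displayed item count consistent with what the user would get by clicking that facet item next.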

  11. Demonstration
     • Demo

  12. Conclusions
     • BioFacets has the potential to become the “Google” for biologists, enhanced with a dynamic faceted classification approach to results presentation

  13. Acknowledgments
     • NSF CAREER DBI-0133946
     • NSF DBI-0110854
