
NoSQL working group Use case: Network of Life, Mario David (LIP)



  1. NoSQL working group Use case: Network of Life. Mario David (LIP), with contributions from Miguel Porto and Rui Figueira (CIBIO Portugal). EGI-Engage 1 www.egi.eu

  2. Outline
     • GBIF and Atlas of Living Australia web portal
     • From GBIF to Network of Life
     • Graph DBs: ArangoDB
     • Current status and first tests

  3. Challenges of GBIF (Global Biodiversity Information Facility) biodiversity data
     • 570 million records with many dimensions.
     • Need to support different spatial scales and levels of information detail in the same platform.
     • Ensure confidence: users need to be able to scrutinize every detail of the information.
     • The rate at which new data are added is not fully predictable.
     • Crossing the data with other types of information (remote sensing, climatic) is also resource-demanding.
     Rui Figueira (CIBIO)

  4. Atlas of Living Australia: a platform for web portals and services for societal uses in biodiversity. It provides:
     • Efficient organization and management of biodiversity information, including finding, accessing and visualizing data;
     • Integration with genetic, habitat, ecosystem and geographical data;
     • Different facets, e.g., for Invasive Alien Species, threatened species, nature conservation;
     • Web data services through an API.
     Rui Figueira (CIBIO)

  5-9. One platform, many facets (thematic, regional, national), different user communities
     Rui Figueira (CIBIO)

  10. Advantages of cloud solutions. They provide:
     • Scalable allocation of resources.
     • Sharing of infrastructure and capacity between members of the GBIF network.
     • Persistence and availability of big volumes of data.
     Rui Figueira (CIBIO)

  11. GBIF ⇒ Net of Life: biologists' point of view
     [Figure: GBIF]

  12. GBIF ⇒ Net of Life: biologists' point of view
     [Figure: Network of Life, pollination]

  13. GBIF ⇒ Net of Life: maths/computer scientist's point of view
     Graph ⇒ GraphDB: G = (V, E), with vertices V = {v1, v2, …} and edges E = {{v1, v2}, {v1, v3}, …}
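The set notation G = (V, E) maps directly onto plain data structures. A minimal Python sketch, using the slide's placeholder vertex names v1, v2, v3, stores V as a set and E as a set of unordered pairs (frozensets), then derives each vertex's neighbours, which is the basic operation a graph database is optimized for:

```python
# Vertices and undirected edges, mirroring V = {v1, v2, ...} and E = {{v1, v2}, {v1, v3}, ...}.
V = {"v1", "v2", "v3"}
E = {frozenset({"v1", "v2"}), frozenset({"v1", "v3"})}


def adjacency(vertices, edges):
    """Build a neighbour map from a vertex set and a set of 2-element edges."""
    adj = {v: set() for v in vertices}
    for edge in edges:
        a, b = tuple(edge)  # an unordered pair has no fixed direction
        adj[a].add(b)
        adj[b].add(a)
    return adj


adj = adjacency(V, E)
print(sorted(adj["v1"]))  # ['v2', 'v3']
```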

  14. GBIF ⇒ Net of Life: maths/computer scientist's point of view
     GraphDB + Documents ⇒ ArangoDB
     [Figure: a graph of vertices and edges, each of which is a document]

  15. ArangoDB - I
     • Multi-model database: document, graph, key-value
     • Open source: https://github.com/arangodb/arangodb
     • Document model:
        • Data stored as linked JSON-like documents, organized in collections
        • No schema enforced, but a set of indexes can be defined for each collection
        • Fields can store subdocuments and references to independent documents
     Miguel Porto (CIBIO)
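To make the document model concrete, here is an in-memory Python sketch. It is not the ArangoDB API: each collection is a dict of JSON-like documents keyed by `_key`, documents need not share a schema, one field holds a subdocument, another holds a reference to a document in a different collection using ArangoDB's `collection/key` form of `_id`, and a per-collection index maps a field value to document keys. The collection names, field names and the `ensure_index` helper are hypothetical.

```python
class Collection:
    """In-memory stand-in for a schema-free document collection (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.docs = {}      # _key -> document
        self.indexes = {}   # field -> {value -> set of _keys}

    def insert(self, key, doc):
        # Attach _key and an _id of the form "collection/key".
        doc = dict(doc, _key=key, _id=f"{self.name}/{key}")
        self.docs[key] = doc
        for field, index in self.indexes.items():
            if field in doc:
                index.setdefault(doc[field], set()).add(key)
        return doc["_id"]

    def ensure_index(self, field):
        # Hypothetical helper: build a hash-style index over one field.
        index = {}
        for key, doc in self.docs.items():
            if field in doc:
                index.setdefault(doc[field], set()).add(key)
        self.indexes[field] = index

    def by_field(self, field, value):
        # Indexed lookup instead of scanning every document.
        return [self.docs[k] for k in sorted(self.indexes[field].get(value, set()))]


db = {"taxa": Collection("taxa"), "records": Collection("records")}
taxon_id = db["taxa"].insert("t1", {"name": "Apis mellifera", "rank": "species"})
db["records"].ensure_index("country")
# No shared schema: only the second record carries a nested location subdocument.
db["records"].insert("r1", {"taxon": taxon_id, "country": "PT"})
db["records"].insert("r2", {"taxon": taxon_id, "country": "PT",
                            "location": {"lat": 38.7, "lon": -9.1}})


def resolve(ref):
    """Follow a 'collection/key' reference to the referenced document."""
    coll, key = ref.split("/", 1)
    return db[coll].docs[key]


hits = db["records"].by_field("country", "PT")
print([r["_key"] for r in hits])          # ['r1', 'r2']
print(resolve(hits[0]["taxon"])["name"])  # Apis mellifera
```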

  16. ArangoDB - II
     • Graph model:
        • An “interpretation” built upon the document model:
           • defined by a set of document collections representing the vertices,
           • and another set of collections representing the edges connecting the vertices.
        • Vertices and edges are both documents.
     • Native support for traversal queries:
        • Highly customizable behaviour.
        • No need for “infinite” JOINs.
     • Indexes:
        • Graph traversal indexes (edge-vertex connections)
        • Geo indexes (constructed from latitude-longitude fields)
        • Full-text, hash, etc.
     Miguel Porto (CIBIO)
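In ArangoDB, edge documents carry `_from` and `_to` attributes holding vertex `_id`s, and a traversal follows them without writing a chain of JOINs. The Python sketch below illustrates the idea with invented taxa and interaction edges. It builds an index from `_from` to outgoing edges (standing in for the edge index) and performs an outbound traversal between a minimum and maximum depth, roughly what an AQL query such as `FOR v IN 1..2 OUTBOUND 'taxa/plantA' interactions RETURN v` expresses. AQL traversals are configurable (order, uniqueness); this sketch fixes one choice: breadth-first, each vertex visited at most once.

```python
from collections import deque

# Hypothetical edge documents; _from/_to reference vertex _ids as in ArangoDB.
interactions = [
    {"_from": "taxa/plantA", "_to": "taxa/beeX", "type": "pollination"},
    {"_from": "taxa/beeX", "_to": "taxa/plantB", "type": "pollination"},
    {"_from": "taxa/plantB", "_to": "taxa/beetleY", "type": "herbivory"},
    {"_from": "taxa/plantC", "_to": "taxa/beeX", "type": "pollination"},
]


def outbound_index(edges):
    """Edge index: _from -> outgoing edges, so each hop is a lookup, not a scan."""
    index = {}
    for e in edges:
        index.setdefault(e["_from"], []).append(e)
    return index


def traverse(start, edges, min_depth, max_depth):
    """Breadth-first OUTBOUND traversal; returns (vertex, depth) pairs with
    min_depth <= depth <= max_depth, visiting each vertex at most once."""
    index = outbound_index(edges)
    seen = {start}
    queue = deque([(start, 0)])
    result = []
    while queue:
        vertex, depth = queue.popleft()
        if depth >= min_depth:
            result.append((vertex, depth))
        if depth == max_depth:
            continue  # do not expand beyond the maximum depth
        for e in index.get(vertex, []):
            if e["_to"] not in seen:
                seen.add(e["_to"])
                queue.append((e["_to"], depth + 1))
    return result


print(traverse("taxa/plantA", interactions, 1, 2))
# [('taxa/beeX', 1), ('taxa/plantB', 2)]
```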

  17. ArangoDB - III
     • AQL query language:
        • SQL-like, but with very different logic:
           • Entirely JSON-based.
           • No tables.
        • Rather complete set of functions to work with documents:
           • Data aggregation.
           • Filtering (including geo functions), etc.
           • Document and array manipulation.
           • Graph traversal and shortest path functions.
        • Easy querying, processing and output of results in the desired data format.
        • Very flexible in chaining and nesting query statements.
     “Powerful and Fast”
     Miguel Porto (CIBIO)
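As a sketch of how an AQL pipeline chains filtering and aggregation over documents, the Python function below mirrors, step by step, a query like `FOR r IN records FILTER r.year >= @min_year COLLECT species = r.species WITH COUNT INTO n SORT n DESC RETURN { species: species, n: n }`. The records and field names are hypothetical, and this emulates the query's logic in Python; it does not call ArangoDB.

```python
from collections import Counter

# Hypothetical occurrence records (illustrative values only).
records = [
    {"species": "Apis mellifera", "year": 2012},
    {"species": "Bombus terrestris", "year": 1998},
    {"species": "Apis mellifera", "year": 2015},
    {"species": "Bombus terrestris", "year": 2003},
    {"species": "Apis mellifera", "year": 1995},
]


def count_by_species(docs, min_year):
    # FOR r IN records FILTER r.year >= @min_year
    filtered = (r for r in docs if r["year"] >= min_year)
    # COLLECT species = r.species WITH COUNT INTO n
    counts = Counter(r["species"] for r in filtered)
    # SORT n DESC (ties broken by name here), then return JSON-like objects
    return [{"species": s, "n": n}
            for s, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


print(count_by_species(records, 2000))
# [{'species': 'Apis mellifera', 'n': 2}, {'species': 'Bombus terrestris', 'n': 1}]
```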

  18. Network of Life: Architecture
     [Architecture diagram. Components: Network of Life frontends (Web, R), a Java server and an ArangoDB server. Labels on the connections: JSON data, web services, AQL queries, graph traversal. Other labels: parallelized computations, native data analysis modules.]
     Exposed services:
        • querying interaction data at different levels of aggregation
        • downloading raw data
        • submitting data analysis jobs
        • uploading new data
     Supported tasks: visualization, data aggregation, network queries, network data analysis, hypothesis testing, data downloading, ...
     Miguel Porto (CIBIO)
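One of the listed services is querying interaction data at different levels of aggregation. A minimal Python sketch of that idea, assuming a hypothetical species-to-family lookup and invented pollinator-plant pairs, rolls species-level interactions up to family level and serializes the result as JSON, the data format named in the architecture. Given that the application logic is meant to live in AQL, the real system would presumably do this aggregation on the database side; the sketch only shows the logic.

```python
import json
from collections import Counter

# Hypothetical species -> family lookup (illustrative; not from the source data).
FAMILY = {
    "Apis mellifera": "Apidae",
    "Bombus terrestris": "Apidae",
    "Eristalis tenax": "Syrphidae",
}

# Hypothetical pollinator-plant interaction records at species level.
interactions = [
    ("Apis mellifera", "Lavandula stoechas"),
    ("Bombus terrestris", "Lavandula stoechas"),
    ("Eristalis tenax", "Lavandula stoechas"),
    ("Apis mellifera", "Cistus ladanifer"),
]


def aggregate(pairs, level):
    """Count interactions per (pollinator, plant) at 'species' or 'family' level."""
    def label(species):
        return species if level == "species" else FAMILY[species]

    counts = Counter((label(p), plant) for p, plant in pairs)
    return [{"pollinator": p, "plant": pl, "n": n}
            for (p, pl), n in sorted(counts.items())]


payload = json.dumps(aggregate(interactions, "family"))
print(payload)
```

Switching `level` between `"species"` and `"family"` changes only the grouping key, which is the point of serving the same interaction data at several levels of detail.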

  19. Some first tests
     • A simple ArangoDB instance running on a desktop machine.
     • Good query performance, in particular for queries involving geographic indexes and graph traversal.
     • ArangoDB's integrated geo indexes match the use case nicely.
     • The application logic should be implemented in the AQL queries.
     Miguel Porto (CIBIO)
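The first tests highlighted queries on geographic indexes. As a rough illustration of what such a query computes (not how ArangoDB implements it), the Python sketch below keeps the hypothetical occurrence records that lie within a given radius of a point, using the haversine great-circle distance on a spherical Earth; the 6,371 km mean radius is an assumption of the sketch. It scans every record, and a geo index built from the latitude-longitude fields is what lets the database avoid that linear scan.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (assumed spherical Earth)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within(docs, lat, lon, radius_m):
    """Linear-scan radius filter; a geo index exists to avoid exactly this scan."""
    return [d for d in docs
            if haversine_m(lat, lon, d["lat"], d["lon"]) <= radius_m]


# Hypothetical occurrence records with latitude/longitude fields.
records = [
    {"_key": "r1", "lat": 38.72, "lon": -9.14},  # Lisbon area
    {"_key": "r2", "lat": 41.15, "lon": -8.61},  # Porto area
    {"_key": "r3", "lat": 38.75, "lon": -9.20},  # Lisbon area
]

print([d["_key"] for d in within(records, 38.72, -9.14, 10_000)])  # ['r1', 'r3']
```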

  20. Test deployment
     • ArangoDB in cluster mode ⇒ allows sharding.
     • Deployed 2 VMs in the INCD OpenStack cloud.
     • Each VM runs 2 types of processes:
        • Coordinators: receive requests, distribute them to the DBServers, execute AQL queries and return the results to the clients. The coordinator also exposes information about cluster health and cluster statistics.
        • DBServers: can store both sharded and non-sharded collections.
     • A DBServer and a coordinator can live on the same server.
     • And… learning the business :)
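The coordinator/DBServer split can be illustrated with a toy model of key-based sharding. ArangoDB shards collections by `_key` by default, but this Python sketch is not its implementation: the shard count, the CRC32 hash, the round-robin placement of shards on the two VMs and the server names are all hypothetical. It shows why a lookup by key touches a single shard, while a filter on another field has to be sent to every shard, with the coordinator merging the partial results.

```python
import zlib

# Hypothetical layout: 2 DBServers (one per test VM) holding 4 shards in total.
DB_SERVERS = ["dbserver-vm1", "dbserver-vm2"]
NUM_SHARDS = 4


def shard_for(key, num_shards=NUM_SHARDS):
    """Deterministic hash-based shard choice from the document key (CRC32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def server_for(shard):
    """Round-robin placement of shards onto DBServers (an assumed policy)."""
    return DB_SERVERS[shard % len(DB_SERVERS)]


class Coordinator:
    """Routes writes to the owning shard and fans filtered reads out to all shards."""

    def __init__(self):
        self.storage = {s: {} for s in range(NUM_SHARDS)}  # shard -> {key: doc}

    def insert(self, key, doc):
        shard = shard_for(key)
        self.storage[shard][key] = doc
        return server_for(shard)  # which DBServer would hold this document

    def get(self, key):
        # A key lookup only needs to touch one shard.
        return self.storage[shard_for(key)].get(key)

    def scan(self, predicate):
        # A filter on a non-shard-key field must ask every shard, then merge.
        return [d for shard in self.storage.values() for d in shard.values()
                if predicate(d)]


coord = Coordinator()
for i in range(10):
    coord.insert(f"r{i}", {"_key": f"r{i}", "year": 2000 + i})
print(coord.get("r3"))                                # {'_key': 'r3', 'year': 2003}
print(len(coord.scan(lambda d: d["year"] >= 2005)))  # 5
```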
