
Beyond Macrobenchmarks: Microbenchmark-based Graph Database Evaluation



  1. Beyond Macrobenchmarks: Microbenchmark-based Graph Database Evaluation. Matteo Lissandrini, Martin Brugnara, Yannis Velegrakis. Universiteit Utrecht

  2. Graphs are everywhere: social networks, knowledge graphs, protein interaction networks, road networks. Graph Databases Evaluation – Matteo Lissandrini

  3. Property graphs: edge-labelled multigraphs.
     Example: node01 {Name: Matteo, Role: Post-doc, Interests: Graphs} presents (edge01, On: 2019-08-26) node02 {Title: Beyond…, Topic: GraphDB}, which is in (edge03) node03 {name: VLDB’19, year: 2019}; a further edge, edge02, carries the label "references".
     Model: G = ⟨V, E, L, ℓ⟩, with identifiers ID: V/E ↦ ℕ, labeling ℓ: E ↦ L, and properties V/E ↦ {⟨key, value⟩, …}.
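As a rough illustration of the model above, here is a minimal Python sketch of an edge-labelled multigraph with identifiers, edge labels, and key/value properties on both nodes and edges, loosely mirroring the slide's example. The class and method names (`PropertyGraph`, `add_node`, `add_edge`) are hypothetical, and drawing node and edge IDs from a single counter is just one possible reading of ID: V/E ↦ ℕ.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Any, Dict


@dataclass
class Node:
    id: int
    props: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    id: int
    src: int
    dst: int
    label: str
    props: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical edge-labelled multigraph G = <V, E, L, l> with properties on V and E."""

    def __init__(self) -> None:
        self._ids = count(1)  # assumption: one shared ID space for nodes and edges
        self.nodes: Dict[int, Node] = {}
        self.edges: Dict[int, Edge] = {}

    def add_node(self, **props: Any) -> Node:
        node = Node(next(self._ids), dict(props))
        self.nodes[node.id] = node
        return node

    def add_edge(self, src: Node, dst: Node, label: str, **props: Any) -> Edge:
        # Multigraph: parallel edges between the same pair of nodes are allowed.
        edge = Edge(next(self._ids), src.id, dst.id, label, dict(props))
        self.edges[edge.id] = edge
        return edge

    def labels(self) -> set:
        # L, restricted to the labels actually used by some edge.
        return {e.label for e in self.edges.values()}


g = PropertyGraph()
matteo = g.add_node(Name="Matteo", Role="Post-doc", Interests="Graphs")
talk = g.add_node(Title="Beyond…", Topic="GraphDB")
venue = g.add_node(name="VLDB'19", year=2019)
g.add_edge(matteo, talk, "presents", On="2019-08-26")
g.add_edge(talk, venue, "in")
print(sorted(g.labels()))  # ['in', 'presents']
```

Keeping properties as plain dictionaries reflects the schema-free nature of the model: any node or edge can carry any set of keys.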

  4. Graph databases, e.g., Oracle Graph, CosmosDB, Neptune.

  5. Where to store a graph?
     • OLAP* graph processing (GraphLab, Giraph/Pregel, GraphX): business intelligence, batch processing, graph algorithms, statistics, mining
     • OLTP graph databases (ArangoDB, Blazegraph, Neo4j, OrientDB, Sparksee, Titan/Janus): complex queries, pathfinding, connectivity, export/import, updates, transactions, selectivity, indices, user interaction, concurrency, availability. This is our focus.
     * [Ammar and Özsu, VLDB’18]

  6. How to choose the right system? Among graph databases such as ArangoDB, Blazegraph, Neo4j, OrientDB, Sparksee, and Titan/Janus, which solution works best for complex queries, pathfinding, connectivity, export/import, OLTP updates, transactions, selectivity, indices, user interaction, concurrency, and availability?

  7. There is no silver bullet: different data characteristics, different query types, different use cases, different data organization, different indexing/optimizations, different query-processing strategies.

  8. Graph database architectures: how to implement a graph database? Query processing can be native (specialized query processing and algorithms) or non-native; storage can be native (specialized data structures and indexes) or non-native.
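The slide does not spell out what "native" means in detail. One common way to picture the storage side of the distinction is a neighbour lookup: a non-native store keeps edges in a generic table that must be scanned (or separately indexed), while a native store lets each node reach its neighbours directly, often called index-free adjacency. The sketch below is a deliberately simplified Python illustration under those assumptions; `edge_table` and `adjacency` are hypothetical names and do not describe any of the systems on the slides.

```python
from typing import Dict, List, Tuple

# Non-native storage (hypothetical): edges are rows in a generic table,
# so finding a node's neighbours means scanning (or indexing) that table.
edge_table: List[Tuple[int, int]] = [(1, 2), (1, 3), (2, 3), (3, 1)]


def out_neighbours_table(v: int) -> List[int]:
    # Full scan: O(|E|) per lookup when no index exists.
    return [dst for src, dst in edge_table if src == v]


# Native storage (hypothetical): each node record holds direct references
# to its adjacent nodes, so expanding a node costs O(deg(v)).
adjacency: Dict[int, List[int]] = {}
for src, dst in edge_table:
    adjacency.setdefault(src, []).append(dst)


def out_neighbours_native(v: int) -> List[int]:
    return adjacency.get(v, [])


assert out_neighbours_table(1) == out_neighbours_native(1) == [2, 3]
```

Both functions return the same answer; the difference a microbenchmark would expose is how their cost grows with |E| versus with the degree of the node being expanded.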

  9. Goal: understand graph database performance factors.
     (1) Factors: system architecture, query workload, data characteristics.
     (2) Outcome: evaluate the pros and cons of each design decision; identify the causes of underperforming operations.

  10. Macro-benchmarks vs. micro-benchmarks
     Macro-benchmarks
     • Goals: a predefined, realistic(?) domain and application; study specific use cases; test complex operations.
     • Techniques: queries based on the structure of the data and on the output of previous queries.
     • Limitations: they test the query planner but hide single-operator performance; they are domain specific.
     Micro-benchmarks (our proposal)
     • Goals: applicable over different domains/datasets; test basic and common operations.
     • Techniques: decompose complex queries; identify ubiquitous operators; test the same operations under different conditions.
     • Advantages: domain/data independent; generalizable; allow the identification of weak operators.
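To make "test the same operations under different conditions" concrete, the following sketch times one isolated operator (an edge count, in the spirit of g.E.count()) on random graphs of increasing size and reports the median of several runs. It is only an illustration: the graph generator, the sizes, and the repeat count are arbitrary assumptions, and `time_operation`/`make_graph` are hypothetical helpers, whereas the framework itself runs Gremlin queries against real systems.

```python
import random
import statistics
import time
from typing import Callable, Dict, List


def time_operation(op: Callable[[], object], repeats: int = 5) -> float:
    """Run one isolated operation `repeats` times; return the median wall time in seconds."""
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        op()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def make_graph(n_nodes: int, n_edges: int, seed: int = 0) -> Dict[int, List[int]]:
    """Hypothetical stand-in dataset: a random directed graph as an adjacency list."""
    rng = random.Random(seed)
    adj: Dict[int, List[int]] = {v: [] for v in range(n_nodes)}
    for _ in range(n_edges):
        adj[rng.randrange(n_nodes)].append(rng.randrange(n_nodes))
    return adj


# Same operator (count all edges), different condition (graph size).
results: Dict[int, float] = {}
for size in (1_000, 10_000, 100_000):
    adj = make_graph(size, size * 3)
    # The lambda is called right away inside this iteration, so it sees the current `adj`.
    results[size] = time_operation(lambda: sum(len(out) for out in adj.values()))

for size, secs in results.items():
    print(f"{size:>7} nodes: {secs * 1e3:.3f} ms")
```

Taking the median rather than the mean reduces the influence of one-off outliers such as a garbage-collection pause during a single run.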

  11. Micro-benchmarking graph operations
     • CRUD (create, read, update, delete): insertions, updates, and retrievals, both for values stored on nodes and edges and for structural elements (add/remove/retrieve nodes/edges).
     • Graph queries (edges and traversals): access the local structure around a node, verify reachability, and search for nodes with specific structural characteristics.

  12. Micro-benchmarking graph operations: CRUD (create, read, update, delete)
     Insertions, updates, and retrievals, both for values stored on nodes and edges and for structural elements (add/remove/retrieve nodes/edges).
     Create, update, delete:
     • Create a new node with property P {Name: Value}
     • Add an edge from v1 to v2 (plus some properties P)
     • Add property P {Name: Value} to node v or to edge e
     • Add a new node, and then edges from it to other nodes
     • Update the value of property P {Name: Value}
     • Delete a node/edge
     • Delete property P from a node/edge
     Read:
     • Find the node/edge with a specific ID
     • Find nodes/edges with property P {Name: Value}
     • Find edges with a specific label
     • Count edges/nodes
     • Count distinct edge labels
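The CRUD operations listed above map onto a very small API. The sketch below is a hypothetical in-memory store (`TinyGraphStore` and its method names are illustrative, not part of the framework) that implements a representative subset of them. It assumes that deleting a node also deletes its incident edges; that cascade is a design choice of this sketch, not something the slide specifies.

```python
from itertools import count
from typing import Any, Dict, List


class TinyGraphStore:
    """Hypothetical in-memory store exposing some of the CRUD micro-operations."""

    def __init__(self) -> None:
        self._ids = count(1)
        self.nodes: Dict[int, Dict[str, Any]] = {}  # node id -> properties
        self.edges: Dict[int, Dict[str, Any]] = {}  # edge id -> {src, dst, label, props}

    # Create
    def create_node(self, **props: Any) -> int:
        nid = next(self._ids)
        self.nodes[nid] = dict(props)
        return nid

    def add_edge(self, v1: int, v2: int, label: str, **props: Any) -> int:
        eid = next(self._ids)
        self.edges[eid] = {"src": v1, "dst": v2, "label": label, "props": dict(props)}
        return eid

    # Read
    def find_nodes(self, name: str, value: Any) -> List[int]:
        return [n for n, p in self.nodes.items() if p.get(name) == value]

    def find_edges_by_label(self, label: str) -> List[int]:
        return [e for e, d in self.edges.items() if d["label"] == label]

    def count_distinct_labels(self) -> int:
        return len({d["label"] for d in self.edges.values()})

    # Update (also serves as "add property": inserts or overwrites the key)
    def set_node_property(self, nid: int, name: str, value: Any) -> None:
        self.nodes[nid][name] = value

    # Delete
    def delete_node(self, nid: int) -> None:
        del self.nodes[nid]
        # Assumed cascade: drop every edge that touches the removed node.
        for eid in [e for e, d in self.edges.items() if nid in (d["src"], d["dst"])]:
            del self.edges[eid]

    def delete_node_property(self, nid: int, name: str) -> None:
        self.nodes[nid].pop(name, None)


s = TinyGraphStore()
a = s.create_node(name="a")
b = s.create_node(name="b")
s.add_edge(a, b, "knows", since=2019)
s.add_edge(b, a, "cites")
s.set_node_property(a, "name", "a2")
print(s.find_nodes("name", "a2"), s.count_distinct_labels())  # [1] 2
s.delete_node(b)
print(len(s.edges))  # 0
```

Note that the read operations here are full scans; a real system would typically answer property lookups through an index, which is exactly the kind of difference the micro-benchmarks are meant to surface.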

  13. Micro-benchmarking graph operations: graph queries (edges and traversals)
     Access the local structure around a node, verify reachability, and search for nodes with specific structural characteristics:
     • Find directly connected nodes (all incoming/outgoing edges)
     • Find only certain connections (filter by label)
     • Degree-based search: e.g., high-degree nodes, only inbound connections
     • Find all nodes reachable in K or fewer steps (BFS)
     • Find a list of shortest paths between two nodes
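The two reachability queries above can be sketched with plain breadth-first search. The Python below is an illustrative version on a hypothetical toy edge list: it treats edges as undirected (like Gremlin's both()), optionally keeps only some labels, and returns a single shortest path rather than the full list of shortest paths mentioned on the slide. The helper names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical labelled edge list: (source, target, label).
Edge = Tuple[str, str, str]


def neighbours(edges: List[Edge], labels: Optional[Set[str]] = None) -> Dict[str, Set[str]]:
    """Undirected adjacency (like both()), optionally restricted to the given labels."""
    adj: Dict[str, Set[str]] = {}
    for s, t, l in edges:
        if labels is None or l in labels:
            adj.setdefault(s, set()).add(t)
            adj.setdefault(t, set()).add(s)
    return adj


def within_k_hops(adj: Dict[str, Set[str]], start: str, k: int) -> Set[str]:
    """Nodes reachable from `start` in at most k steps (level-by-level BFS), excluding start."""
    seen = {start}
    frontier = [start]
    for _ in range(k):
        nxt = []
        for v in frontier:
            for w in adj.get(v, ()):
                if w not in seen:
                    seen.add(w)
                    nxt.append(w)
        frontier = nxt
    return seen - {start}


def shortest_path(adj: Dict[str, Set[str]], src: str, dst: str) -> Optional[List[str]]:
    """One unweighted shortest path via BFS with parent pointers; None if unreachable."""
    parent: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        if v == dst:
            path = []
            node: Optional[str] = v
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for w in adj.get(v, ()):
            if w not in parent:
                parent[w] = v
                queue.append(w)
    return None


edges = [("a", "b", "knows"), ("b", "c", "knows"), ("c", "d", "cites"), ("a", "d", "cites")]
adj = neighbours(edges)
print(sorted(within_k_hops(adj, "a", 1)))                     # ['b', 'd']
print(shortest_path(neighbours(edges, {"knows"}), "a", "c"))  # ['a', 'b', 'c']
```

Filtering by label when building the adjacency is what turns the general query into its label-restricted variant.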

  14. Our framework: selected operations (35 distinct concrete operators)
     • Coverage of all the required operations
     • Complex queries can be composed from these operators
     • Domain agnostic

     Load (L)
     Q1. g.loadGraphSON("/path"): load the dataset into the graph g
     Create (C)
     Q2. g.addVertex(p[]): create a new node with properties p
     Q3. g.addEdge(v1, v2, l): add an edge with label l from v1 to v2
     Q4. g.addEdge(v1, v2, l, p[]): same as Q3, but with properties p
     Q5. v.setProperty(Name, Value): add property Name = Value to node v
     Q6. e.setProperty(Name, Value): add property Name = Value to edge e
     Q7. g.addVertex(...); g.addEdge(...): add a new node, and then edges to it
     Read (R)
     Q8. g.V.count(): total number of nodes
     Q9. g.E.count(): total number of edges
     Q10. g.E.label.dedup(): existing edge labels (no duplicates)
     Q11. g.V.has(Name, Value): nodes with property Name = Value
     Q12. g.E.has(Name, Value): edges with property Name = Value
     Q13. g.E.has('label', l): edges with label l
     Q14. g.V(id): the node with identifier id
     Q15. g.E(id): the edge with identifier id
     Update (U)
     Q16. v.setProperty(Name, Value): update property Name for node v
     Q17. e.setProperty(Name, Value): update property Name for edge e
     Delete (D)
     Q18. g.removeVertex(id): delete the node identified by id
     Q19. g.removeEdge(id): delete the edge identified by id
     Q20. v.removeProperty(Name): remove node property Name from v
     Q21. e.removeProperty(Name): remove edge property Name from e
     Traversal (T)
     Q22. v.in(): nodes adjacent to v via incoming edges
     Q23. v.out(): nodes adjacent to v via outgoing edges
     Q24. v.both('l'): nodes adjacent to v via edges labeled l
     Q25. v.inE.label.dedup(): labels of incoming edges of v (no duplicates)
     Q26. v.outE.label.dedup(): labels of outgoing edges of v (no duplicates)
     Q27. v.bothE.label.dedup(): labels of edges of v (no duplicates)
     Q28. g.V.filter{it.inE.count()>=k}: nodes with in-degree at least k
     Q29. g.V.filter{it.outE.count()>=k}: nodes with out-degree at least k
     Q30. g.V.filter{it.bothE.count()>=k}: nodes with degree at least k
     Q31. g.V.out.dedup(): nodes having an incoming edge
     Q32. v.as('i').both().except(vs).store(vs).loop('i'): nodes reached via breadth-first traversal from v
     Q33. v.as('i').both(*ls).except(vs).store(vs).loop('i'): nodes reached via breadth-first traversal from v on labels ls
     Q34. v1.as('i').both().except(j).store(j).loop('i'){!it.object.equals(v2)}.retain([v2]).path(): unweighted shortest path from v1 to v2
     Q35. Shortest path on 'l': same as Q34, but only following label l
     Notation: [ ] denotes a HashMap; g is the graph; v and e are a node and an edge.
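Several of the traversal operators above reduce to simple aggregations over the edge list. Purely as an illustration, the sketch below gives rough Python counterparts of Q25 to Q31 on a hypothetical toy graph (`EDGES`, `labels_of`, `min_degree`, and `having_incoming_edge` are made-up names); it is not how any of the benchmarked systems evaluate these queries.

```python
from collections import Counter
from typing import List, Set, Tuple

# Hypothetical directed, labelled edge list: (source, target, label).
EDGES: List[Tuple[str, str, str]] = [
    ("a", "b", "knows"), ("a", "c", "knows"), ("b", "c", "cites"),
    ("d", "c", "cites"), ("c", "a", "likes"),
]
NODES: Set[str] = {"a", "b", "c", "d"}

in_deg = Counter(t for _, t, _ in EDGES)
out_deg = Counter(s for s, _, _ in EDGES)


def labels_of(v: str, direction: str) -> Set[str]:
    """Q25-Q27: distinct labels of v's incoming ('in'), outgoing ('out') or all ('both') edges."""
    return {l for s, t, l in EDGES
            if (direction in ("in", "both") and t == v)
            or (direction in ("out", "both") and s == v)}


def min_degree(k: int, direction: str) -> Set[str]:
    """Q28-Q30: nodes whose in-, out- or total degree is at least k."""
    deg = {"in": in_deg, "out": out_deg, "both": in_deg + out_deg}[direction]
    return {v for v in NODES if deg[v] >= k}  # Counter returns 0 for missing nodes


def having_incoming_edge() -> Set[str]:
    """Q31 (g.V.out.dedup()): every edge target, i.e. the nodes with an incoming edge."""
    return {t for _, t, _ in EDGES}


print(sorted(labels_of("c", "in")))    # ['cites', 'knows']
print(sorted(min_degree(3, "in")))     # ['c']
print(sorted(having_incoming_edge()))  # ['a', 'b', 'c']
```

Degree filters such as Q28 to Q30 touch every node, so their cost depends heavily on whether a system stores degrees or has to count edges on the fly.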

  15. Our framework: batteries included
     • Experimental environment: various sizes and domains, with real and synthetic datasets
     • Ready-to-go systems and configurations: the most popular systems are already integrated and ready to use
     • Previous tests used only 1M nodes

     Dataset  |V|    |E|    |L|   #CC    MaxCC  Density     Modularity  AvgDeg  MaxDeg  Δ
     Yeast    2.3K   7.1K   167   101    2.2K   1.34∗10⁻³   3.66∗10⁻²   6.1     66      11
     MiCo     100K   1.1M   106   1.3K   93K    1.10∗10⁻⁶   5.45∗10⁻³   21.6    1.3K    23
     Frb-O    1.9M   4.3M   424   133K   1.6M   1.19∗10⁻⁶   9.82∗10⁻¹   4.3     92K     48
     Frb-S    0.5M   0.3M   1814  0.16M  20K    1.20∗10⁻⁶   9.91∗10⁻¹   1.3     13K     4
     Frb-M    4M     3.1M   2912  1.1M   1.4M   1.94∗10⁻⁷   7.97∗10⁻¹   1.5     139K    37
     Frb-L    28.4M  31.2M  3821  2M     23M    3.87∗10⁻⁸   2.12∗10⁻¹   2.2     1.4M    33
     ldbc     184K   1.5M   15    1      184K   4.43∗10⁻⁵   0           16.6    48K     10
     (#CC / MaxCC: number of connected components / size of the largest one; AvgDeg / MaxDeg: average / maximum degree.)
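The slide does not define its statistics. The sketch below computes table-style summaries from an edge list under stated assumptions: density = |E| / (|V|·(|V| − 1)), average degree = 2|E|/|V|, and connected components computed on the undirected (weakly connected) view with union-find. With the rounded counts, the density formula gives about 1.34∗10⁻³ for Yeast (7.1K / (2.3K · 2.3K)), but the exact definitions used by the authors are an assumption here. `graph_stats` is a hypothetical helper, and every edge endpoint must appear in `nodes`.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def graph_stats(nodes: Iterable[int], edges: List[Tuple[int, int]]) -> Dict[str, float]:
    """Hypothetical dataset summary; definitions are assumptions, see the lead-in."""
    nodes = list(nodes)
    n, m = len(nodes), len(edges)
    parent = {v: v for v in nodes}

    def find(v: int) -> int:
        # Union-find root lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    degree: Counter = Counter()
    for s, t in edges:
        degree[s] += 1
        degree[t] += 1
        parent[find(s)] = find(t)  # merge components, ignoring edge direction

    comp_sizes = Counter(find(v) for v in nodes)
    return {
        "density": m / (n * (n - 1)) if n > 1 else 0.0,
        "avg_degree": 2 * m / n if n else 0.0,
        "max_degree": max(degree.values(), default=0),
        "components": len(comp_sizes),
        "largest_component": max(comp_sizes.values(), default=0),
    }


# Toy multigraph: two components, {0, 1, 2} and {3, 4} (with a parallel edge).
stats = graph_stats(range(5), [(0, 1), (1, 2), (3, 4), (3, 4)])
print(stats)
```

For the toy graph this yields a density of 0.2, an average degree of 1.6, a maximum degree of 2, and two components, the larger with three nodes.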

  16. Our framework: extensibility
     • Reproducible!
     • Common query language
     • Easy to add new queries, new systems, and new datasets
     • Plug-and-play setup and controlled environment
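One way to get the "easy to add new queries, systems, and datasets" property is a set of registries whose cross product defines the experiment matrix. The Python below is a hypothetical sketch of that pattern only: the registry names, the decorators, and the two toy "systems" are illustrative assumptions, and a real suite would wrap actual database installations and Gremlin query templates instead of Python objects.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical registries; a real suite would hold system setups,
# query templates, and dataset files instead of Python objects.
SYSTEMS: Dict[str, Callable[[List[Tuple[int, int]]], object]] = {}
QUERIES: Dict[str, Callable[[object], object]] = {}
DATASETS: Dict[str, List[Tuple[int, int]]] = {}


def system(name: str):
    """Decorator registering a loader that turns an edge list into a toy 'database'."""
    def register(loader):
        SYSTEMS[name] = loader
        return loader
    return register


def query(name: str):
    """Decorator registering a query function that runs against a loaded 'database'."""
    def register(fn):
        QUERIES[name] = fn
        return fn
    return register


@system("edge-list")
def load_edge_list(edges):
    return list(edges)


@system("adjacency")
def load_adjacency(edges):
    adj = {}
    for s, t in edges:
        adj.setdefault(s, []).append(t)
    return adj


@query("edge-count")
def edge_count(db):
    return len(db) if isinstance(db, list) else sum(map(len, db.values()))


DATASETS["toy"] = [(0, 1), (1, 2), (2, 0)]


def run_all() -> Dict[Tuple[str, str, str], object]:
    """Run every (system, dataset, query) combination; one new entry extends the whole matrix."""
    return {(s, d, q): QUERIES[q](SYSTEMS[s](DATASETS[d]))
            for s in SYSTEMS for d in DATASETS for q in QUERIES}


print(run_all())
```

Because every query runs on every system and dataset, registering one more item of any kind adds a full row or column of comparable results without touching existing code.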
