
Cypher for Apache Spark: Graph processing workloads on OLAP and OLTP



  1. Cypher for Apache Spark: Graph processing workloads on OLAP and OLTP
     Mats Rydberg, mats@neotechnology.com
     opencypher.org | opencypher@googlegroups.com

  2. Cypher for Apache Spark
     ● Apache Spark: computational platform (OLAP)
     ● Neo4j: transactional graph database (OLTP)
       ○ Query language: Cypher

     Wouldn't it be lovely to be able to execute a Spark job on a Neo4j graph?
     How do we integrate? What is a graph when it isn't in Neo4j anymore?
     ==> Cypher is the bridge!

  3. Schematic dataflow
     [Diagram not recoverable; surviving labels: :Cypher, :Cypher]

  4. Example use case
     ● Graph of financial transactions
     ● Snapshot subgraph of transactions made during the last month
     ● Do computationally heavy graph analytics on transaction patterns
       ○ Consume results as a report (for humans)
       ○ Feed results back as new data to the original graph
       ○ Deploy results as a new graph
     ● Neo4j stays operational for incoming transactions because analytics are off-loaded to Spark
     ● Fully integrated OLTP + OLAP

  5. Apache Spark -- overview / characteristics
     ● DataFrames are abstractions of tables
       ○ Based on RDDs (Resilient Distributed Datasets)
       ○ SQL type system deployed in a non-type-safe way (Scala code)
     ● SQL and API that compile to lazily executed plans
       ○ Catalyst plan optimiser
     ● Distributed architecture for scalability
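
The "lazily executed plans" point can be illustrated with a toy sketch in plain Python. This is not Spark's API: `LazyTable`, its methods, and the sample rows are hypothetical names made up for illustration, and there is no optimiser here. It only shows that building a query records steps, and that work happens when results are requested.

```python
class LazyTable:
    """Toy stand-in for a DataFrame (hypothetical, not Spark's API)."""

    def __init__(self, rows, plan=()):
        self.rows = rows    # list of dicts, one per table row
        self.plan = plan    # tuple of pending (operation, argument) steps

    def filter(self, predicate):
        # Record the step only; no row is inspected yet.
        return LazyTable(self.rows, self.plan + (("filter", predicate),))

    def select(self, *columns):
        # Record the projection only.
        return LazyTable(self.rows, self.plan + (("select", columns),))

    def collect(self):
        # The recorded plan is executed here, step by step.
        result = list(self.rows)
        for op, arg in self.plan:
            if op == "filter":
                result = [row for row in result if arg(row)]
            else:
                result = [{c: row[c] for c in arg} for row in result]
        return result


calls = []  # records every row the predicate actually sees


def is_large(row):
    calls.append(row["amount"])
    return row["amount"] > 100


table = LazyTable([{"id": 1, "amount": 50}, {"id": 2, "amount": 500}])
query = table.filter(is_large).select("id")
assert calls == []          # building the plan evaluated nothing

result = query.collect()    # execution happens here
print(result)               # [{'id': 2}]
```

In Spark the recorded plan is additionally rewritten by Catalyst before execution; the sketch skips that step and runs the steps in the order they were written.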

  6. Key developments
     ● Extend Cypher with the ability to return graphs
       ○ Cypher becomes closed over graphs
       ○ True compositionality of queries
     ● Modelling the dynamic Cypher type system on strict, table-based, SQL-aligned Spark DataFrames
       ○ Using DataFrames to make use of the Catalyst optimiser
       ○ No support for type inheritance (compare Cypher's ANY type)

  7. Key developments -- type system
     ● Represent entities as flat maps
       ○ One column per property and label / rel type
       ○ Requires exact type information of all properties
         ➢ Acquired during import of graph
         ➢ Read-only setting allows immutable schema
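
A minimal sketch of the flat-map idea, in plain Python rather than Spark. The column naming (`label:…`, `prop:…`), the input shape, and the functions `infer_schema` and `flatten` are assumptions made for illustration; the slides do not specify the prototype's actual layout. The schema is collected in one pass over the data, mirroring "acquired during import", and a property with two different types is rejected because each column needs one exact type.

```python
def infer_schema(nodes):
    """Collect all labels and the exact type of every property (hypothetical helper)."""
    labels = sorted({label for node in nodes for label in node["labels"]})
    props = {}
    for node in nodes:
        for key, value in node["properties"].items():
            type_name = type(value).__name__
            # A strict table needs exactly one type per column.
            if props.setdefault(key, type_name) != type_name:
                raise TypeError(f"property {key!r} has conflicting types")
    return labels, dict(sorted(props.items()))


def flatten(nodes):
    """Turn nodes into flat rows: one column per label and per property."""
    labels, props = infer_schema(nodes)
    rows = []
    for node in nodes:
        row = {"id": node["id"]}
        for label in labels:
            row[f"label:{label}"] = label in node["labels"]   # boolean column
        for prop in props:
            row[f"prop:{prop}"] = node["properties"].get(prop)  # None when absent
        rows.append(row)
    return (labels, props), rows


nodes = [
    {"id": 1, "labels": ["Person"], "properties": {"name": "Alice"}},
    {"id": 2, "labels": ["Account"], "properties": {"balance": 10.5}},
]
schema, rows = flatten(nodes)
for row in rows:
    print(row)
```

Every row ends up with the same set of columns, which is what lets such rows sit in a table with a fixed, immutable schema.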

  8. Key developments -- return graphs
     ● Interpret query results as a graph rather than table
       ○ Round-trip: graph to graph; can execute another query
       ○ No focus on syntax
     ● Pipeline of queries lazily evaluated on top of one another
       ○ Maximum utilisation of Catalyst to reorder operations
     ● Complementary API for injecting other operations in-between queries
       ○ Based on Spark DataFrame API
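
The graph-to-graph round trip can be sketched in plain Python. None of this is the prototype's API: `Graph`, `rels_of_type`, `nodes_where`, and `pipeline` are hypothetical names, and ordinary functions stand in for Cypher queries. The point is only that when every query takes a graph and returns a graph, queries compose freely, and a pipeline can be assembled first and evaluated later.

```python
class Graph:
    """Toy property graph (hypothetical structure for illustration)."""

    def __init__(self, nodes, rels):
        self.nodes = nodes   # {node_id: properties dict}
        self.rels = rels     # list of (source_id, rel_type, target_id)


def rels_of_type(rel_type):
    """Query: keep relationships of one type plus their endpoint nodes."""
    def query(graph):
        rels = [r for r in graph.rels if r[1] == rel_type]
        ids = {r[0] for r in rels} | {r[2] for r in rels}
        return Graph({i: graph.nodes[i] for i in ids}, rels)
    return query


def nodes_where(predicate):
    """Query: keep matching nodes and the relationships between them."""
    def query(graph):
        nodes = {i: p for i, p in graph.nodes.items() if predicate(p)}
        rels = [r for r in graph.rels if r[0] in nodes and r[2] in nodes]
        return Graph(nodes, rels)
    return query


def pipeline(*queries):
    """Compose graph-to-graph queries; nothing runs until the result is called."""
    def run(graph):
        for query in queries:
            graph = query(graph)   # each output graph feeds the next query
        return graph
    return run


graph = Graph(
    {1: {"country": "SE"}, 2: {"country": "SE"}, 3: {"country": "US"}},
    [(1, "PAID", 2), (2, "PAID", 3), (1, "KNOWS", 3)],
)
swedish_payments = pipeline(
    rels_of_type("PAID"),
    nodes_where(lambda props: props["country"] == "SE"),
)
out = swedish_payments(graph)
print(sorted(out.nodes), out.rels)   # [1, 2] [(1, 'PAID', 2)]
```

Because the intermediate result is itself a graph, another operation could be injected between the two queries without changing either of them.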

  9. Demo of prototype
