Cypher for Apache Spark: Graph processing workloads on OLAP and OLTP

SLIDE 1
  • opencypher.org | opencypher@googlegroups.com

Cypher for Apache Spark

Graph processing workloads on OLAP and OLTP

Mats Rydberg mats@neotechnology.com

SLIDE 2

Cypher for Apache Spark

  • Apache Spark: computational platform (OLAP)
  • Neo4j: transactional graph database (OLTP)

○ Query language: Cypher

Wouldn't it be lovely to be able to execute a Spark job on a Neo4j graph?
How do we integrate?
What is a graph when it isn't in Neo4j anymore?
==> Cypher is the bridge!

SLIDE 3

Schematic dataflow

[Diagram: schematic dataflow, with two elements labelled :Cypher]

SLIDE 4

Example use case

  • Graph of financial transactions
  • Snapshot subgraph of transactions made during last month
  • Do computationally heavy graph analytics on transaction patterns

○ Consume results as report (for humans)
○ Feed back results as new data to original graph
○ Deploy results as new graph

  • Neo4j remains operational for incoming transactions, because the analytics are off-loaded to Spark
  • Fully integrated OLTP + OLAP
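
The snapshot step in this use case can be pictured with a small in-memory sketch. It is plain Python, not Cypher or Spark; the Transaction record, its field names, the sample data, and the 30-day window are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Transaction:
    """Hypothetical relationship: money moving between two accounts."""
    sender: str
    receiver: str
    amount: float
    day: date


def snapshot(transactions, today, window_days=30):
    """Return the transactions inside the window and the accounts they touch."""
    cutoff = today - timedelta(days=window_days)
    recent = [t for t in transactions if cutoff <= t.day <= today]
    accounts = {t.sender for t in recent} | {t.receiver for t in recent}
    return accounts, recent


# Made-up sample data: two recent transactions and one old one.
transactions = [
    Transaction("alice", "bob", 120.0, date(2017, 3, 28)),
    Transaction("bob", "carol", 75.5, date(2017, 3, 2)),
    Transaction("carol", "dave", 10.0, date(2017, 1, 15)),
]
accounts, recent = snapshot(transactions, today=date(2017, 3, 31))
```

The snapshot is a separate value, so the original list (standing in for the live graph) is left untouched while the analytics run on the copy.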

SLIDE 5

Apache Spark -- overview / characteristics

  • DataFrames are abstractions of tables

○ Based on RDDs (Resilient Distributed Datasets)
○ SQL type system, deployed in a non-type-safe way (in Scala code)

  • SQL and API that compiles to lazily executed plans

○ Catalyst plan optimiser

  • Distributed architecture for scalability
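
What "lazily executed plans" means can be shown with a toy model. The LazyTable class below is a hypothetical stand-in written in plain Python, not the Spark API: it only records operations and runs them when collect() is called, and it does no optimisation of the kind Catalyst performs.

```python
class LazyTable:
    """Toy stand-in for a DataFrame: records steps, runs them on collect()."""

    def __init__(self, rows, plan=()):
        self._rows = rows
        self._plan = plan  # tuple of (operation name, function) steps

    def filter(self, predicate):
        # Only extend the plan; the predicate is not called here.
        return LazyTable(self._rows, self._plan + (("filter", predicate),))

    def select(self, *columns):
        def project(row):
            return {c: row[c] for c in columns}
        return LazyTable(self._rows, self._plan + (("select", project),))

    def explain(self):
        """List the recorded operation names without executing anything."""
        return [name for name, _ in self._plan]

    def collect(self):
        """Execute the recorded plan, step by step, and return the rows."""
        rows = iter(self._rows)
        for name, fn in self._plan:
            rows = filter(fn, rows) if name == "filter" else map(fn, rows)
        return list(rows)


table = LazyTable([{"id": 1, "amount": 50}, {"id": 2, "amount": 500}])
plan = table.filter(lambda row: row["amount"] > 100).select("id")
result = plan.collect()  # the filter and projection run only now
```

Because the whole plan is known before anything runs, a real engine has the opportunity to rewrite it first; this sketch simply executes the steps in the order they were recorded.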

SLIDE 6

Key developments

  • Extend Cypher with the ability to return graphs

○ Cypher becomes closed over graphs
○ True compositionality of queries

  • Modelling dynamic Cypher type system on strict table-based, SQL-aligned Spark DataFrames

○ Using DataFrames to make use of Catalyst optimiser
○ No support for type inheritance (compare Cypher's ANY type)

SLIDE 7

Key developments -- type system

  • Represent entities as flat maps

○ One column per property and per label / relationship type
○ Requires exact type information of all properties
➢ Acquired during import of graph
➢ Read-only setting allows immutable schema
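
A minimal sketch of the flat-map idea, in plain Python rather than Spark. The node layout and the "label:" / "prop:" column names are hypothetical, and only nodes are shown; relationships would get one column per relationship type in the same way.

```python
def infer_schema(nodes):
    """Collect every label and the exact type of every property (import step)."""
    labels, prop_types = set(), {}
    for node in nodes:
        labels |= set(node["labels"])
        for key, value in node["properties"].items():
            seen = prop_types.setdefault(key, type(value))
            if seen is not type(value):
                # A strict column cannot hold two different types.
                raise TypeError(f"property {key!r} has conflicting types")
    return sorted(labels), dict(sorted(prop_types.items()))


def flatten(node, labels, prop_types):
    """Turn one node into a flat row with the same columns as every other row."""
    row = {"id": node["id"]}
    for label in labels:
        row[f"label:{label}"] = label in node["labels"]  # boolean column
    for key in prop_types:
        row[f"prop:{key}"] = node["properties"].get(key)  # None models NULL
    return row


# Made-up sample graph with two differently shaped nodes.
nodes = [
    {"id": 1, "labels": ["Person"], "properties": {"name": "Alice", "age": 34}},
    {"id": 2, "labels": ["Account"], "properties": {"balance": 12.5}},
]
labels, prop_types = infer_schema(nodes)
rows = [flatten(n, labels, prop_types) for n in nodes]
```

Since the schema is computed once from the imported data and never changes afterwards, every row can share one fixed set of typed columns, which is what a read-only setting makes possible.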

SLIDE 8

Key developments -- return graphs

  • Interpret query results as a graph rather than table

○ Round-trip: graph to graph; can execute another query
○ No focus on syntax

  • Pipeline of queries lazily evaluated on top of one another

○ Maximum utilisation of Catalyst to reorder operations

  • Complementary API for injecting other operations in-between queries

○ Based on Spark DataFrame API
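
The graph-to-graph round trip can be sketched as function composition. This is plain Python under stated assumptions, not the prototype's API: Graph, induced, drop_isolated, and pipeline are hypothetical names, each "query" is modelled as a function from graph to graph, and deferral here is ordinary function composition with no plan reordering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Graph:
    nodes: frozenset  # node ids
    edges: frozenset  # (source, target) pairs


def induced(graph, keep):
    """Query stand-in: keep nodes passing `keep` and the edges between them."""
    nodes = frozenset(n for n in graph.nodes if keep(n))
    edges = frozenset((s, t) for s, t in graph.edges if s in nodes and t in nodes)
    return Graph(nodes, edges)


def drop_isolated(graph):
    """Non-query operation: remove nodes that have no edges."""
    connected = frozenset(n for edge in graph.edges for n in edge)
    return Graph(connected, graph.edges)


def pipeline(*steps):
    """Compose graph-to-graph steps; nothing runs until the result is called."""
    def run(graph):
        for step in steps:
            graph = step(graph)
        return graph
    return run


g = Graph(frozenset({1, 2, 3, 4, 5, 6}),
          frozenset({(1, 2), (2, 3), (3, 4), (5, 6)}))
query = pipeline(
    lambda gr: induced(gr, lambda n: n != 3),  # first "query"
    drop_isolated,                             # operation injected in between
    lambda gr: induced(gr, lambda n: n < 6),   # second "query" on the result
)
result = query(g)
```

Because every step both accepts and returns a Graph, any step's output is a valid input to the next, which is the property the slides call being closed over graphs.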

SLIDE 9

Demo of prototype