#NODES #2k19 Earth (Milky Road), 10/10/2019 larus-ba.it/neo4j



SLIDE 1

larus-ba.it/neo4j @AgileLARUS

Andrea Santurbano / @santand84

#NODES #2k19

Earth (Milky Road), 10/10/2019

Streaming Graph Data with Kafka

SLIDE 2

Agenda

SLIDE 3

LARUS Business Automation Srl Italy’s #1 Neo4j Partner

Agenda

  • Introduction
    ○ Partnership between Neo4j and LARUS
  • What is Neo4j Streams?
    ○ What is Apache Kafka?
    ○ How we combined Neo4j and Kafka
  • DEMO
    ○ Real-time Polyglot Persistence with Elastic, Kafka and Neo4j
  • Hunger Games
SLIDE 4

(LARUS)-[:LOVES]->(Neo4j)

SLIDE 5

WHO ARE WE?

Andrea [:WORKS_AT] [:LOVES] [:INTEGRATOR_LEADER_FOR]

SLIDE 6

WHO’S LARUS?

LARUS BUSINESS AUTOMATION

  • Founded in 2004
  • Headquartered in Venice, ITALY
  • Delivering services Worldwide
  • Mission: “Bridging the gap between Business and IT”

#1 Solution Partner in Italy since 2013

  • Creator of the Neo4j JDBC Driver
  • Creator of the Neo4j Apache Zeppelin Interpreter
  • Creator of the Neo4j ETL Tool
  • Developed 90+ APOC procedures

VENICE [:BASED_IN]

SLIDE 7

COLLABORATING FOR NEO4J USERS

2011: First spikes in retail for articles' clustering
2016: Neo4j JDBC Driver
2018: Neo4j APOC, ETL, Spark, Zeppelin, Kafka
2019: Kafka commercial, GraphQL

SLIDE 8

Widely used open-source, scalable streaming infrastructure

Apache Kafka

SLIDE 9

What is Apache Kafka?

A DISTRIBUTED STREAMING PLATFORM

Has three key capabilities:

  • Publish and subscribe to streams of records;
  • Store streams of records in a fault-tolerant, durable way;
  • Process streams of records as they occur.
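The three capabilities above can be pictured with a toy append-only log: records are appended ("published"), retained ("stored"), and each consumer reads from its own offset ("processed"). This is a plain-Python sketch for intuition only, not how Kafka is implemented.

```python
# Toy append-only log illustrating publish / store / process.
# Real Kafka is distributed, fault-tolerant and persists records to disk.

class ToyLog:
    def __init__(self):
        self.records = []          # the "stored" stream of records

    def publish(self, record):
        self.records.append(record)

    def consume(self, offset):
        """Read all records from a consumer-managed offset onwards."""
        return self.records[offset:], len(self.records)

log = ToyLog()
log.publish({"id": 1})
log.publish({"id": 2})
batch, next_offset = log.consume(0)   # each consumer tracks its own offset
```

Because consumers only track an offset into a retained log, many independent readers can process the same stream at their own pace.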
SLIDE 10

What is Apache Kafka?

HOW DOES IT WORK?

  • 1. TOPICS: a topic is a category or feed name to which records are published.
  • 2. PARTITIONS: for each topic, the Kafka cluster maintains a partitioned, distributed, persistent log.
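How records land in partitions can be sketched as follows: a keyed record is hashed to a fixed partition, so all events for one key stay in order within that partition. This is a simplified model of Kafka's default keyed partitioning; the real producer uses a murmur2 hash, not CRC32.

```python
# Simplified keyed partitioning: the same key always maps to the same
# partition, which preserves per-key ordering. Illustrative only.
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    return zlib.crc32(key.encode("utf-8")) % num_partitions

p1 = partition_for("customer-42", 3)
p2 = partition_for("customer-42", 3)   # always identical to p1
```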

SLIDE 11

What is Apache Kafka?

HOW IS IT USED?

Kafka is generally used for two classes of applications:

  • Building real-time streaming data pipelines;
  • Building real-time streaming applications.
SLIDE 12

Enables Kafka Streaming on Neo4j!

What is Neo4j Streams?

SLIDE 13

What is Neo4j Streams?

Andrea [:AUTHOR_OF] [:CREATOR_OF] Michael

ENABLES DATA STREAM ON NEO4J

The project is a Neo4j plugin composed of several parts:

  • Neo4j Streams Change Data Capture;
  • Neo4j Streams Sink;
  • Neo4j Streams Procedures.

We also provide a Kafka Connect plugin:

  • Kafka Connect Sink plugin.
SLIDE 14

Benefits

  • Avoid custom "hacky" solutions
  • Deployed by Neo4j Field Engineering
  • Used by many customers (hardened)
  • Continuous development
  • Quick response to issues
  • Officially (enterprise) supported by Confluent and Neo4j through Larus
SLIDE 15

Neo4j - Kafka Integration - Use Cases

HOW CAN IT BE USED?

  • Write/read data directly from Neo4j operations to/from Kafka;
  • Change data capture: stream graph changes into larger architectures, e.g. to feed microservices or other databases;
  • Exchange data/updates between distinct Neo4j installations, e.g. from analytics;
  • Integrate with customers' existing Kafka architectures;
  • Use other Kafka connectors to offer more Neo4j integrations;
  • Build just-in-time data warehouses with Spark & Hadoop.
SLIDE 16

Stream database changes!

Neo4j Streams: Change Data Capture

SLIDE 17

Neo4j Streams: Change Data Capture

Change data “what”?

In databases, Change Data Capture (CDC) is a set of software design patterns used to determine (and track) the data that has changed so an action can be taken using the changed data.

Well-suited use cases?

  • CDC solutions occur most often in data-warehouse environments;
  • CDC allows replicating a database without much performance impact on its operation.
SLIDE 18

Neo4j Streams: Change Data Capture

How does it work?

Each transaction communicates its changes to our event listener, which:

  • exposes creations, updates and deletes of Nodes, Relationships and Properties;
  • provides before-and-after information;
  • provides schema information;
  • allows configuring property filtering for each topic.

Those events are sent asynchronously to Kafka, so the commit path is not affected.
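A change event as described above can be pictured as a before/after envelope. The sketch below follows the general shape of Neo4j Streams CDC messages (operation metadata, payload with before/after state, schema info), but the exact field names here are a simplified illustration, not the plugin's precise schema.

```python
# Simplified CDC event for a node update: before-and-after state plus
# minimal schema information (illustrative shape, not the exact format).
def node_update_event(node_id, before_props, after_props, labels):
    return {
        "meta": {"operation": "updated"},
        "payload": {
            "id": node_id,
            "type": "node",
            "before": {"labels": labels, "properties": before_props},
            "after": {"labels": labels, "properties": after_props},
        },
        "schema": {"properties": sorted(after_props)},
    }

evt = node_update_event("123", {"name": "Andrea"},
                        {"name": "Andrea", "city": "Venice"}, ["Person"])
```

A downstream consumer can diff `before` against `after` to decide what action to take, which is the essence of CDC.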

SLIDE 19

Ingest data into Neo4j directly from the Stream!

Neo4j Streams: Sink

SLIDE 20

Neo4j Streams: Sink

INGEST YOUR DATA, WITH YOUR RULES

The sink provides several ways to ingest data from Kafka:

  • via Cypher template;
  • via CDC events published by another Neo4j instance through the CDC module;
  • via projection of a JSON/AVRO event into Nodes/Relationships by providing an extraction pattern;
  • via the CUD file format.

(event)-[:TO]->(graph)

SLIDE 21

Neo4j Streams: Sink

HOW WE MANAGE BAD DATA

The Neo4j Streams Sink module provides a Dead Letter Queue mechanism that, if activated, re-routes all "bad data" to a configured topic. What do we mean by "bad data"?

  • Deserialization errors, i.e. badly formatted JSON:

{id: 1, "name": "Andrea", "surname": "Santurbano"}

  • Transient errors while ingesting data into the DB (e.g. MERGE on null values).
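The dead-letter routing above amounts to: try to deserialize each record, and on failure forward the raw record to the DLQ instead of failing the whole stream. A minimal sketch, where the in-memory `dlq` list stands in for the configured Kafka topic:

```python
import json

def ingest(raw_records, dlq):
    """Deserialize records; route malformed ones ("bad data") to the DLQ."""
    good = []
    for raw in raw_records:
        try:
            good.append(json.loads(raw))
        except json.JSONDecodeError:
            dlq.append(raw)        # re-routed instead of breaking the stream
    return good

dlq = []
records = ['{"name": "Andrea"}',
           '{id: 1, "name": "Andrea"}']   # unquoted key: the bad JSON above
good = ingest(records, dlq)
```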
SLIDE 22

Interact with Apache Kafka directly from Cypher!

Neo4j Streams: Procedures

SLIDE 23

Neo4j Streams: Streams Procedures

CONSUME/PRODUCE DATA DIRECTLY FROM CYPHER

The Neo4j Streams project ships with two procedures:

  • streams.publish: allows custom message streaming from Neo4j to the configured environment using the underlying configured producer;
  • streams.consume: allows consuming messages from a given topic.
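From an application, these procedures are invoked as ordinary Cypher `CALL`s. The sketch below only builds the Cypher strings (so it runs anywhere); the topic name and the idea of passing the payload as a `$payload` parameter are illustrative assumptions, while the procedure names match those listed above.

```python
# Build the Cypher statements that call the Neo4j Streams procedures.
# In a real application you would run these through a Neo4j driver session.

def publish_query(topic: str) -> str:
    """Cypher that publishes a payload ($payload parameter) to a topic."""
    return f"CALL streams.publish('{topic}', $payload)"

def consume_query(topic: str) -> str:
    """Cypher that consumes events from a topic and returns them."""
    return f"CALL streams.consume('{topic}') YIELD event RETURN event"

q = publish_query("sales")
```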
SLIDE 24

Run Neo4j Integration in your Kafka Infrastructure

Confluent Connect Neo4j Plugin

SLIDE 25

Kafka Connect

WHAT IS KAFKA CONNECT?

An open-source component of Apache Kafka, Kafka Connect is a framework for connecting Kafka with external systems such as databases, key-value stores, search indexes, and file systems.

SLIDE 26

Neo4j Streams: Kafka Connect Sink

HOW DOES IT WORK?

It works in exactly the same way as the Neo4j Streams Sink plugin, so you can provide your own ingestion setup for each topic. You can download it from the Confluent Hub!

SLIDE 27

Real-time Polyglot Persistence with Elastic, Kafka and Neo4j

DEMO

SLIDE 28

RT Polyglot Persistence with Elastic, Kafka & Neo4j

SLIDE 29

RT Polyglot Persistence with Elastic, Kafka & Neo4j

SLIDE 30

Neo4j Streams: Lessons learned

THE POWER OF THE STREAM!

  • We have seen how to use the CDC module to stream transaction events from Neo4j to other systems;
  • We have seen how to use the SINK to ingest data into Neo4j by providing our own business rules;
  • We have seen how to use the Streams PROCEDURES to consume/produce data directly from Cypher;
  • We demonstrated how to create a simple polyglot workflow with Apache Kafka Connect.

SLIDE 31

Read More

SLIDE 32

CODE

DEMO CODE:

github.com/conker84/nodes-2k19

NEO4J STREAMS REPOSITORY:

github.com/neo4j-contrib/neo4j-streams

SLIDE 33

FEEDBACK

Please use the integration in your organization and share your experience

SLIDE 34

Hunger Games Questions for "Streaming Graph Data with Kafka"


  1. Easy: What is the behaviour of the streams procedures?
     a. They consume/produce data from/to another Neo4j instance via Bolt
     b. They consume/produce data from/to Apache Kafka within Neo4j
     c. They consume/produce data via Amazon Kinesis within Neo4j
  2. Medium: How many ingestion ways does the Sink support?
     a. 4
     b. 1
     c. 3
  3. Hard: What kind of information is exposed via the CDC module?

Answer here: r.neo4j.com/hunger-games

SLIDE 35

THANKS! @santand84 Questions?

#NODES #2k19

Earth (Milky Road), 10/10/2019

SLIDE 36

Neo4j Streams: Sink

INGESTION VIA CYPHER TEMPLATE

Configure an import statement for each Kafka topic:

streams.sink.topic.cypher.<TOPIC>=<CYPHER_STATEMENT>

For example:

streams.sink.topic.cypher.sales= \
  MATCH (c:Customer {id: event.start.id}) \
  MATCH (p:Product {id: event.end.id}) \
  MERGE (c)-[:PLACED]->(o:Order)-[:FOR]->(p) \
  SET o += event.properties

SLIDE 37

Neo4j Streams: Sink

INGESTION VIA CDC EVENT FROM ANOTHER NEO4J INSTANCE

We allow ingesting the data in two ways:

  • The SourceId strategy, which merges the nodes/relationships by the CDC event `id` field (related to the Neo4j physical ID):

streams.sink.topic.cdc.sourceId=<TOPICS_SEPARATED_BY_SEMICOLON>

  • The Schema strategy, which merges the nodes/relationships by the constraints (UNIQUENESS, NODE_KEY) defined in your graph model:

streams.sink.topic.cdc.schema=<TOPICS_SEPARATED_BY_SEMICOLON>

SLIDE 38

Neo4j Streams: Sink

INGESTION VIA JSON PROJECTION

You can extract nodes and relationships from a JSON event by providing an extraction pattern. Each property can be prefixed with:

  • !: identifies the id (it can be more than one property); it's *mandatory*;
  • -: excludes the property from the extraction.

Labels can be chained via `:`.
SLIDE 39

Neo4j Streams: Sink

INGESTION VIA JSON PROJECTION - NODE PATTERN EXTRACTION

Given:

{"userId": 1, "name": "Andrea", "surname": "Santurbano", "address": {"city": "Venice", "cap": "30100"}}

you can transform it into a node by specifying one of these patterns:

  • User:Actor{!userId} or User:Actor{!userId,*}
    → (User:Actor {userId: 1, name: 'Andrea', surname: 'Santurbano', `address.city`: 'Venice', `address.cap`: '30100'})
  • User{!userId, surname}
    → (User {userId: 1, surname: 'Santurbano'})
  • User{!userId, surname, address.city}
    → (User {userId: 1, surname: 'Santurbano', `address.city`: 'Venice'})
  • User{!userId, -address}
    → (User {userId: 1, name: 'Andrea', surname: 'Santurbano'})
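The pattern rules above (flatten nested JSON with dotted keys, keep the `!` id properties, then either an explicit list, everything via `*`, or everything minus `-` exclusions) can be sketched in a few lines. This is an illustrative re-implementation for intuition, not the plugin's actual code, and `project_node` is a name introduced here.

```python
# Illustrative node projection: flatten a JSON event, then select
# properties according to a simplified pattern property list.

def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted keys, e.g. address.city."""
    out = {}
    for k, v in obj.items():
        key = f"{prefix}{k}"
        if isinstance(v, dict):
            out.update(flatten(v, key + "."))
        else:
            out[key] = v
    return out

def project_node(event, props):
    """props: e.g. ["!userId", "surname"] or ["!userId", "-address"]."""
    flat = flatten(event)
    ids = [p[1:] for p in props if p.startswith("!")]
    excluded = [p[1:] for p in props if p.startswith("-")]
    explicit = [p for p in props if p[0] not in "!-" and p != "*"]
    if explicit:
        keep = set(ids) | set(explicit)        # ids plus listed properties
        return {k: v for k, v in flat.items() if k in keep}
    # wildcard (or exclusion-only) mode: keep everything not excluded
    return {k: v for k, v in flat.items()
            if not any(k == e or k.startswith(e + ".") for e in excluded)}

event = {"userId": 1, "name": "Andrea", "surname": "Santurbano",
         "address": {"city": "Venice", "cap": "30100"}}
```

For example, `project_node(event, ["!userId", "-address"])` drops both nested `address.*` keys while keeping the id and the remaining top-level properties.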

SLIDE 40

Neo4j Streams: Sink

INGESTION VIA JSON PROJECTION - RELATIONSHIP PATTERN EXTRACTION

Given:

{"userId": 1, "productId": 100, "price": 10, "currency": "€", "shippingAddress": {"city": "Venice", "cap": "30100"}}

you can transform it into a relationship by specifying one of these patterns:

  • (User{!userId})-[:BOUGHT]->(Product{!productId}) or (User{!userId})-[:BOUGHT{price, currency}]->(Product{!productId})
    → (User {userId: 1})-[:BOUGHT {price: 10, currency: '€', `shippingAddress.city`: 'Venice', `shippingAddress.cap`: '30100'}]->(Product {productId: 100})
  • (User{!userId})-[:BOUGHT{price}]->(Product{!productId})
    → (User {userId: 1})-[:BOUGHT {price: 10}]->(Product {productId: 100})

SLIDE 41

Neo4j Streams: Sink

INGESTION VIA CUD FILE FORMAT

It's a JSON format that represents graph entities (Nodes/Relationships) and how to manage them in terms of Create/Update/Delete operations. For example, the event:

{ "op": "merge", "properties": { "foo": "value", "key": 1 }, "ids": {"key": 1, "otherKey": "foo"}, "labels": ["Foo","Bar"], "type": "node", "detach": true }

is turned into a statement like:

UNWIND [..., {"op": "merge", "properties": {"foo": "value", "key": 1}, "ids": {"key": 1, "otherKey": "foo"}, "labels": ["Foo","Bar"], "type": "node", "detach": true}, ...] AS event
MERGE (n:Foo:Bar {key: event.key, otherKey: event.otherKey})
SET n += event.properties
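Producing such CUD events from an application is just a matter of serializing a dict with the fields shown above. The sketch below follows the slide's field names; the `cud_node_event` helper and its validation of the `op` value are simplifying assumptions, not the plugin's full specification.

```python
# Sketch: composing a CUD ("Create/Update/Delete") node event to publish
# to the sink's topic. Field names follow the example above.
import json

def cud_node_event(op, ids, properties, labels):
    allowed_ops = {"create", "merge", "update", "delete"}
    if op not in allowed_ops:
        raise ValueError(f"unsupported op: {op}")
    return {
        "op": op,
        "properties": properties,
        "ids": ids,            # keys used to match the node (e.g. a unique key)
        "labels": labels,
        "type": "node",
    }

event = cud_node_event("merge", {"key": 1, "otherKey": "foo"},
                       {"foo": "value", "key": 1}, ["Foo", "Bar"])
payload = json.dumps(event)   # this JSON string is what gets published
```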