Graph Databases and Neo4j: Rishabh Gupta, Syed Salman Abbas Baqri (PowerPoint presentation)



SLIDE 1

Graph Databases and Neo4j

Rishabh Gupta, Syed Salman Abbas Baqri, Nilanjan Debnath, Vanshi Mishra

SLIDE 2

Data Relationships and Connectedness

  • “It’s not what you know, it’s who you know.”
  • Data volume is increasing, but the connections between discrete pieces of data are going to increase at an even faster clip.
  • You have 2 million Facebook likes on a post. At the same time, there are 2,000 committed shoppers on your e-commerce site. But that's not enough: are you drawing the relationships between Facebook promotions and the items purchased by your most loyal shoppers?

SLIDE 3

RDBMS

  • Relational databases are not effective at handling data relationships, especially when those relationships are added or adjusted on an ad hoc basis.
  • The RDBMS schema model is inflexible and cannot keep up with the increasing volume of data and of relationships between data.
  • To compensate, we can leave certain columns empty, but this approach requires more code to handle the greater number of exceptions in the data.
  • Moreover, JOIN operations become expensive once we start making multiple hops across tables.

SLIDE 4

To discover which products a customer bought, developers need to write queries that JOIN several tables, which can significantly slow down the application.
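To make the cost concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customer/orders/line_item/product schema and its rows are hypothetical, not taken from the slides. Answering "what did this customer buy?" already requires three JOINs, and every extra hop in the question adds another.

```python
import sqlite3

# Hypothetical schema: customers place orders, orders contain line items,
# and each line item refers to a product.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER REFERENCES customer(id));
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE line_item (order_id INTEGER REFERENCES orders(id),
                        product_id INTEGER REFERENCES product(id));
INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO product VALUES (100, 'Laptop'), (101, 'Mouse'), (102, 'Desk');
INSERT INTO line_item VALUES (10, 100), (10, 101), (11, 101), (12, 102);
""")

def products_bought_by(name):
    """Walk customer -> orders -> line_item -> product: three JOINs for one question."""
    rows = conn.execute("""
        SELECT DISTINCT p.name
        FROM customer c
        JOIN orders o     ON o.customer_id = c.id
        JOIN line_item li ON li.order_id   = o.id
        JOIN product p    ON p.id          = li.product_id
        WHERE c.name = ?
        ORDER BY p.name
    """, (name,)).fetchall()
    return [r[0] for r in rows]

print(products_bought_by("Alice"))  # ['Laptop', 'Mouse']
```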

SLIDE 5

NoSQL

  • NoSQL databases enabled us to query more data, faster, than relational databases.
  • However, because they store data as sets of disconnected documents, it is still hard to harness the connections between data properly.
  • Developers add data relationships to NoSQL stores by using foreign keys, but this later becomes just as prohibitively expensive as in an RDBMS. These foreign keys have another weak point too: they only “point” in one direction, making reciprocal queries too time-consuming to run.
  • Instead of ACID transactions as in an RDBMS, NoSQL stores generally favour BASE (Basically Available, Soft state, Eventual consistency) semantics for scalability and availability.
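The one-directional foreign key problem can be sketched in a few lines of Python. The document store below is a plain dict with hypothetical data, standing in for any document database: the forward lookup is a single key access, while the reciprocal question has no pointer to follow and must scan every document.

```python
# Hypothetical document store: each user document embeds the IDs of the
# products it "likes", i.e. a foreign key that points in one direction only.
users = {
    "u1": {"name": "Ana", "likes": ["p1", "p2"]},
    "u2": {"name": "Ben", "likes": ["p2"]},
    "u3": {"name": "Cai", "likes": []},
}

def liked_by_user(user_id):
    # Forward direction: a single key lookup.
    return users[user_id]["likes"]

def users_who_like(product_id):
    # Reverse direction: no pointer exists, so every document must be scanned.
    return [uid for uid, doc in users.items() if product_id in doc["likes"]]

print(liked_by_user("u1"))   # ['p1', 'p2']
print(users_who_like("p2"))  # ['u1', 'u2']
```

Keeping a second, reverse index would avoid the scan, but then the application has to keep both directions in sync by hand.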

SLIDE 6

Graph Database

  • Graph databases store data relationships as relationships, i.e. as first-class records rather than as foreign keys. This explicit storage of relationship data means fewer disconnects between the evolving schema and the actual database.
  • The flexibility of the graph data model allows us to add new nodes and relationships without compromising the existing network or expensively migrating data.
  • With data relationships at their center, graph databases are efficient at traversal queries, even deep and complex ones, because following a relationship does not require a JOIN.
  • Many graph databases, including Neo4j, support ACID transactions.
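A minimal sketch of why traversal is cheap when relationships are stored with the nodes: below, a hypothetical social graph is kept as Python adjacency lists (an analogy for a native graph store, not Neo4j's actual storage format), and a breadth-first search finds everyone within a given number of hops. Each hop is a direct list lookup rather than a table JOIN.

```python
from collections import deque

# Hypothetical social graph stored as adjacency lists: each node keeps direct
# references to its neighbours.
knows = {
    "Ann": ["Bob", "Cat"],
    "Bob": ["Dan"],
    "Cat": ["Dan", "Eve"],
    "Dan": ["Fay"],
    "Eve": [],
    "Fay": [],
}

def within_hops(start, max_hops):
    """Breadth-first traversal: people reachable from start in 1..max_hops hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if seen[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in knows[person]:  # each hop is a direct lookup, not a JOIN
            if friend not in seen:
                seen[friend] = seen[person] + 1
                queue.append(friend)
    return {p: d for p, d in seen.items() if p != start}

print(within_hops("Ann", 2))  # {'Bob': 1, 'Cat': 1, 'Dan': 2, 'Eve': 2}
```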
SLIDE 7
SLIDE 8

Nodes

  • Nodes are the instances of discrete data points. There are five nodes in the previous slide, meaning five data points.
  • Nodes have labels; think of a label as a type. Person is a label. A node can carry more than one label, such as Person and Student: the node is both a person and a student.
  • Each node has properties, like name and date of birth for a person. For a house, properties can be surface area, number of rooms, etc.
  • Each node can have a different number of properties.
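As a rough illustration of these three ideas (labels as types, several labels per node, and a free-form property map), here is a toy Python node. The `Node` class and the sample data are hypothetical and only mirror the concepts; they are not Neo4j's data structures.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Toy property-graph node: a set of labels plus a free-form property map."""
    labels: set[str]
    properties: dict[str, object] = field(default_factory=dict)

    def has_label(self, label: str) -> bool:
        return label in self.labels

# Hypothetical sample data: one node that is both a Person and a Student,
# and a House node with a different number of properties.
alice = Node({"Person", "Student"}, {"name": "Alice", "born": "2001-04-12"})
house = Node({"House"}, {"surface_m2": 120, "rooms": 4, "garden": True})

print(alice.has_label("Student"))                    # True
print(len(alice.properties), len(house.properties))  # 2 3
```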
SLIDE 9

Relationships

  • The arrows between the nodes are the relationships. Relationships are directed: a person owns a house, but that does not mean the house owns the person.
  • Relationships have properties too. The 'knows' relationship between two persons can have a 'since' property. Similarly, a 'friends' relationship can have 'begin date' and 'end date' properties.
  • To add a relationship, you just add a new arrow and set its properties. There can be multiple arrows between two nodes, in either or both directions.
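The same points about relationships (direction, properties, several relationships between one pair of nodes) can be sketched with a toy Python record. The `Relationship` class, the `outgoing` helper and the data are hypothetical illustrations, not a Neo4j API.

```python
from dataclasses import dataclass, field

@dataclass
class Relationship:
    """Toy directed relationship: start node -> end node, with a type and properties."""
    start: str
    rel_type: str
    end: str
    properties: dict = field(default_factory=dict)

# Hypothetical sample data keyed by node name. Ann and Bob are linked by
# three different relationships, in both directions.
rels = [
    Relationship("Ann", "OWNS", "House1"),
    Relationship("Ann", "KNOWS", "Bob", {"since": 2015}),
    Relationship("Bob", "KNOWS", "Ann", {"since": 2015}),
    Relationship("Ann", "FRIENDS", "Bob", {"begin": "2016-01-01", "end": "2019-06-30"}),
]

def outgoing(node, rel_type):
    """Follow relationships of one type in their stored direction only."""
    return [r.end for r in rels if r.start == node and r.rel_type == rel_type]

print(outgoing("Ann", "OWNS"))     # ['House1']
print(outgoing("House1", "OWNS"))  # []  (the house does not own Ann)
print(len([r for r in rels if {r.start, r.end} == {"Ann", "Bob"}]))  # 3
```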

SLIDE 10

Intuitiveness

  • The whiteboard model is the physical model: the way you represent data on a whiteboard is exactly how it is stored in the database.
  • Data is created and maintained in a logical fashion.
  • There is less translation friction, both from code to database calls and from business people describing the application requirements to developers.

SLIDE 11

Speed

  • Speed naturally increases when the data model is intuitive.
  • A graph database can deliver results to your application quickly enough to support real-time decisions.
  • Processes that were previously performed in batch can now be performed in real time.

SLIDE 12

Agility

  • You can add constraints to maintain data integrity, while keeping the flexibility to add or remove data on the fly.
  • The schema is optional.
  • Modern graph databases are built for frictionless development and graceful system maintenance.

SLIDE 13
SLIDE 14

Leading Graph Databases

SLIDE 15
SLIDE 16

What is Neo4j

  • Neo4j is a graph database management system.
  • Neo4j is ACID compliant.
  • Neo4j is implemented in Java but can be accessed from software written in other languages using the Cypher query language.

SLIDE 17

The founders

Johan Svensson, Emil Eifrem, Peter Neubauer

SLIDE 18

Neo4j’s partner in China, We-Yun, has built an application atop the Neo4j database that allows Chinese citizens to do a “self-assessment” by checking whether they came into contact with a known carrier of the virus that causes Covid-19.

Epidemic Search was developed by Neo4j’s Chinese business partner, We-Yun.

SLIDE 19

ACID compliance

ACID is a set of principles governing how changes are applied to a database. In very simplified terms, it states:

  • (A) Atomicity: when you change a database, the change should succeed or fail as a whole.
  • (C) Consistency: the database should remain consistent (this is a pretty broad topic).
  • (I) Isolation: if other things are going on at the same time, they should not be able to see things mid-update.
  • (D) Durability: if the system blows up (hardware or software), the database needs to be able to pick itself back up; and if it says it finished applying an update, that update must not be lost.
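Atomicity is the easiest of the four properties to demonstrate. The sketch below uses SQLite through Python's standard sqlite3 module purely as a convenient ACID engine; it is not Neo4j code, and the account table and transfer are hypothetical. The credit runs first, the debit then violates a CHECK constraint, and the whole transaction is rolled back, so the credit disappears too.

```python
import sqlite3

# Hypothetical bank-transfer example; SQLite is used only to illustrate atomicity.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(src, dst, amount):
    """Credit dst and debit src as one unit: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK failed, so the credit above was undone as well

def balances():
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))

print(transfer("alice", "bob", 30), balances())   # True {'alice': 70, 'bob': 30}
print(transfer("alice", "bob", 500), balances())  # False {'alice': 70, 'bob': 30}
```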

SLIDE 20

ACID compliance

NoSQL databases were usually not ACID compliant. According to an older Wikipedia article, NoSQL is a movement promoting a loosely defined class of non-relational data stores that break with a long history of relational databases and ACID guarantees. The name was an attempt to describe the emergence of a growing number of non-relational, distributed data stores that often did not attempt to provide ACID guarantees.

SLIDE 21

Setting up and using Neo4j Sandbox

Go to https://neo4j.com/sandbox/ for a free demo of the software before making a commitment. Neo4j Sandbox provides various datasets to experiment with.

SLIDE 22

After setting up Neo4j Sandbox with the FIFA Women’s World Cup example dataset, we are redirected to this page.

SLIDE 23

Using the Cypher code shown above, we can extract the teams that played in the 2019 FIFA Women’s World Cup.

SLIDE 24

Using the Cypher code shown above, we show the number of teams that participated in each World Cup, from 1991 (China) to 2019 (France).

SLIDE 25

Clicking on the Italy node shows three options: unlock node on the left, hide node on the right, and expand node below.

SLIDE 26

Expanding the Italy node shows all the nodes connected to Italy in the database.

SLIDE 27

Setting up and installing Neo4j

SLIDE 28

Setting up and installing Neo4j

SLIDE 29

Graph Database Query Language

SLIDE 30

Cypher Query Language

  • Cypher is a declarative graph query language that allows for expressive and efficient data querying in a property graph.
  • It was invented by Andrés Taylor while working for Neo4j, Inc. in 2011.
  • Cypher was originally intended to be used with the Neo4j graph database, but it was opened up through the openCypher project in October 2015.
  • It was designed with SQL in mind, but based on the components and needs of a database built upon the concepts of graph theory.

SLIDE 31

Graph Query Language (GQL)

  • Even though Cypher takes inspiration from SQL, it is not a standardized graph query language.
  • With the openCypher project, an effort began to standardize Cypher as the query language for graph processing.
  • After a series of meetings, there was a consensus to work towards Cypher becoming a significant input into a wider project for an international standard Graph Query Language called GQL.
  • In September 2019, the proposal for this was approved.

https://standardsdevelopment.bsigroup.com/projects/9019-02970
https://www.iso.org/standard/76120.html

SLIDE 32
SLIDE 33
SLIDE 34
SLIDE 35
SLIDE 36
SLIDE 37

Graph Model

The graph above can be explained simply as: Jennifer likes Graphs. Jennifer is friends with Michael. Jennifer works for Neo4j. Since Cypher is designed to be human-readable, its constructs are based on English prose and iconography to make the syntax visual and easy to understand.

SLIDE 38

Cypher Syntax

  • Cypher is based on the Property Graph Model, which organizes data into nodes and edges (called “relationships” in Cypher):

//node
(variable:Label {propertyKey: 'propertyValue'})
//relationship
-[variable:RELATIONSHIP_TYPE]->
//relationship can also have properties
-[variable:RELATIONSHIP_TYPE {propertyKey: 'propertyValue'}]->

  • It also depicts patterns of nodes and relationships and filters those patterns based on labels and properties:

//Cypher pattern
(node1:LabelA)-[rel1:RELATIONSHIP_TYPE]->(node2:LabelB)

SLIDE 39

Keywords in Cypher

  • We need to be able to create, read, update, and delete data in Neo4j, and keywords help us accomplish that.
  • Similar to other query languages, Cypher contains a variety of keywords for specifying patterns, filtering patterns, and returning results.
  • Among the most common are MATCH, WHERE, and RETURN. These operate slightly differently from SELECT and WHERE in SQL; however, they have similar purposes.

SLIDE 40

MATCH: The MATCH keyword in Cypher searches for an existing node, relationship, label, property, or pattern in the database. If you are familiar with SQL, MATCH plays roughly the role of the FROM and JOIN parts of a SELECT query, while RETURN corresponds to the SELECT list.

RETURN: The RETURN keyword specifies which values or results you want to return from a Cypher query. You can tell Cypher to return nodes, relationships, node and relationship properties, or patterns in your query results. RETURN is not required for write queries, but is needed for reads.

WHERE: WHERE adds constraints to the patterns in a MATCH or OPTIONAL MATCH clause, or filters the results of a WITH clause. This is similar to WHERE in SQL. WHERE is not a clause in its own right; rather, it is part of MATCH, OPTIONAL MATCH, and WITH. In the case of WITH, WHERE simply filters the results.
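To show how the three keywords divide the work, here is a toy in-memory imitation in Python of `MATCH (p:Person)-[:WORKS_FOR]->(c:Company) WHERE p.name = 'Jennifer' RETURN c.name`. The node IDs, data, and helper names are hypothetical, and this is only an analogy for the semantics, not how Neo4j executes Cypher: `match` finds pattern instances, a filter plays WHERE, and a projection plays RETURN.

```python
# Hypothetical graph: node id -> labels and properties; edges are (start, type, end).
nodes = {
    1: {"labels": {"Person"}, "props": {"name": "Jennifer"}},
    2: {"labels": {"Person"}, "props": {"name": "Michael"}},
    3: {"labels": {"Company"}, "props": {"name": "Neo4j"}},
    4: {"labels": {"Technology"}, "props": {"type": "Graphs"}},
}
edges = [(1, "WORKS_FOR", 3), (1, "LIKES", 4), (1, "IS_FRIENDS_WITH", 2)]

def match(start_label, rel_type, end_label):
    """MATCH: yield every (start, end) pair fitting (:start_label)-[:rel_type]->(:end_label)."""
    for s, t, e in edges:
        if t == rel_type and start_label in nodes[s]["labels"] and end_label in nodes[e]["labels"]:
            yield nodes[s], nodes[e]

def query(name):
    rows = match("Person", "WORKS_FOR", "Company")
    rows = (r for r in rows if r[0]["props"]["name"] == name)  # WHERE p.name = name
    return [c["props"]["name"] for _, c in rows]               # RETURN c.name

print(query("Jennifer"))  # ['Neo4j']
print(query("Michael"))   # []
```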

SLIDE 41

MATCH (p:Person) RETURN p

SLIDE 42

MATCH (:Person {name: 'Jennifer'})-[:WORKS_FOR]->(company:Company) RETURN company

SLIDE 43

QUERYING USING RELATIONSHIPS

If data is stored with one relationship direction and a query specifies the wrong direction, Cypher will not return any results. In cases where you are not sure of the direction, it is better to use an undirected relationship and retrieve some results.

//data stored with this direction
CREATE (p:Person)-[:LIKES]->(t:Technology)

//querying the relationship backwards will not return results
MATCH (p:Person)<-[:LIKES]-(t:Technology)
RETURN p, t

While a direction must be specified when a relationship is inserted into the database, it can be matched with an undirected relationship, in which case Cypher ignores the direction and retrieves the relationship and connected nodes, no matter what the physical direction is.

//better to query with an undirected relationship unless sure of the direction
MATCH (p:Person)-[:LIKES]-(t:Technology)
RETURN p, t
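The effect of arrow direction can be reproduced with a toy Python matcher. The labels, edge list and `match` helper are hypothetical, chosen to mirror the `CREATE (p:Person)-[:LIKES]->(t:Technology)` example: the backwards pattern finds nothing because the Person would have to be the end node, while the undirected pattern tries both orientations.

```python
# Hypothetical data standing in for CREATE (p:Person)-[:LIKES]->(t:Technology).
labels = {"Jennifer": "Person", "Graphs": "Technology"}
edges = [("Jennifer", "LIKES", "Graphs")]  # stored direction: Person -> Technology

def match(p_label, rel_type, t_label, direction):
    """Return (p, t) pairs for (p:p_label)-[:rel_type]-(t:t_label).

    direction is 'out' for ->, 'in' for <-, or 'any' for an undirected pattern.
    """
    results = []
    for start, typ, end in edges:
        if typ != rel_type:
            continue
        # Candidate (p, t) bindings allowed by the requested arrow direction.
        candidates = []
        if direction in ("out", "any"):
            candidates.append((start, end))
        if direction in ("in", "any"):
            candidates.append((end, start))
        for p, other in candidates:
            if labels[p] == p_label and labels[other] == t_label:
                results.append((p, other))
    return results

print(match("Person", "LIKES", "Technology", "out"))  # [('Jennifer', 'Graphs')]
print(match("Person", "LIKES", "Technology", "in"))   # []
print(match("Person", "LIKES", "Technology", "any"))  # [('Jennifer', 'Graphs')]
```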

SLIDE 44

ADDING DATA IN CYPHER (CREATE):

Adding data in Cypher works very similarly to any other data access language’s insert statement. Instead of the INSERT keyword like in SQL, though, Cypher uses CREATE. You can use CREATE to insert nodes, relationships, and patterns into Neo4j.

CREATE (friend:Person {name: 'Mark'}) RETURN friend

SLIDE 45

We can also add new relationships using CREATE:

MATCH (jennifer:Person {name: 'Jennifer'})
MATCH (mark:Person {name: 'Mark'})
CREATE (jennifer)-[rel:IS_FRIENDS_WITH]->(mark)

SLIDE 46

Updating Data with Cypher using SET:

We may have a node or relationship in the data whose properties we want to modify. We can do this by matching the pattern we want to find and using the SET keyword to add, remove, or update properties:

MATCH (p:Person {name: 'Jennifer'})
SET p.birthdate = date('1980-01-01')
RETURN p

We can also update relationships using SET. Suppose we want Jennifer’s WORKS_FOR relationship with her company to include the year she started working there. To do this, we can use syntax similar to the node update above:

MATCH (:Person {name: 'Jennifer'})-[rel:WORKS_FOR]-(:Company {name: 'Neo4j'})
SET rel.startYear = date({year: 2018})
RETURN rel

SLIDE 47

DELETING DATA

Cypher uses the DELETE keyword for deleting nodes and relationships. It is very similar to deleting data in other languages like SQL, with one exception: because Neo4j keeps the graph consistent, you cannot delete a node if it still has relationships. If you could, you might end up with a relationship pointing to nothing and an incomplete graph.

Deleting a Relationship:

MATCH (j:Person {name: 'Jennifer'})-[r:IS_FRIENDS_WITH]->(m:Person {name: 'Mark'})
DELETE r

Deleting a Node:

We can delete a node that does not have any relationships:

MATCH (m:Person {name: 'Mark'})
DELETE m

Using the DETACH DELETE syntax tells Cypher to delete any relationships the node has, as well as the node itself:

MATCH (m:Person {name: 'Mark'})
DETACH DELETE m
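A toy Python model makes the difference between the two delete forms explicit. `ToyGraph` and its method names are hypothetical; the class only mimics the behaviour described above (refuse to orphan relationships on a plain delete, remove them first on a detach delete) and says nothing about Neo4j's implementation.

```python
class ToyGraph:
    """Minimal in-memory graph illustrating DELETE vs DETACH DELETE semantics."""

    def __init__(self):
        self.nodes = set()
        self.rels = set()  # (start, type, end) triples

    def delete_node(self, node):
        # Like DELETE: refuse if any relationship still touches the node.
        if any(node in (s, e) for s, _, e in self.rels):
            raise ValueError(f"cannot delete {node!r}: it still has relationships")
        self.nodes.discard(node)

    def detach_delete_node(self, node):
        # Like DETACH DELETE: drop the node's relationships first, then the node.
        self.rels = {r for r in self.rels if node not in (r[0], r[2])}
        self.nodes.discard(node)

g = ToyGraph()
g.nodes |= {"Jennifer", "Mark"}
g.rels.add(("Jennifer", "IS_FRIENDS_WITH", "Mark"))

try:
    g.delete_node("Mark")
except ValueError as err:
    print(err)  # cannot delete 'Mark': it still has relationships

g.detach_delete_node("Mark")
print(g.nodes, g.rels)  # {'Jennifer'} set()
```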

SLIDE 48

Learn more about Cypher: https://neo4j.com/docs/cypher-manual/current/

SLIDE 49

Graph Database Use Cases

  • Fraud detection
  • Real-time recommendation engines
  • Master data management (MDM)
  • Network and IT operations
  • Identity and access management (IAM)
SLIDE 50

Fraud Detection

  • Banks, merchants, and credit card processing companies lose billions of dollars every year to credit card fraud. Credit card data can be stolen by criminals using a variety of methods.
  • Graph databases can help find credit card thieves faster. By representing transactions as a graph, we can look for the common denominator in the fraud cases and find the point of origin of the scam.

SLIDE 51

Credit Card Fraud Graph Data Model

  • A series of credit card transactions can be represented as a graph. Each transaction involves two nodes: a person (the customer) and a merchant. The nodes are linked by the transaction itself. A transaction has a date and a status.
  • Legitimate transactions have the status "Undisputed". Fraudulent transactions are "Disputed".
  • The graph data model represents how the data looks as a graph.
SLIDE 52

Credit Card Fraud Graph Data Model

SLIDE 53

Credit Card Fraud

  • Identify the fraudulent transactions.
  • The criminal we are looking for is involved in a legitimate transaction during which he captures his victims’ credit card numbers. After that, he can execute his illegitimate transactions. That means we want not only the illegitimate transactions but also the transactions that happened before the theft.
  • Now we want to find the common denominator: is there a common merchant in all of these seemingly innocuous transactions?
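The search described above can be sketched in plain Python over a list of (customer, merchant, date, status) transactions, following the slide's "Disputed"/"Undisputed" model. The data and merchant names are hypothetical. For every victim, the sketch collects the merchants where they had an undisputed transaction before their first disputed one, then intersects those sets across victims; in a graph database the same question is a short traversal from the disputed transactions back through the customers.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (customer, merchant, date, status).
transactions = [
    ("Paul",  "Merchant A", date(2024, 3, 1),  "Undisputed"),
    ("Paul",  "Merchant X", date(2024, 3, 9),  "Disputed"),
    ("Maria", "Merchant A", date(2024, 3, 2),  "Undisputed"),
    ("Maria", "Merchant B", date(2024, 3, 3),  "Undisputed"),
    ("Maria", "Merchant Y", date(2024, 3, 10), "Disputed"),
    ("Ken",   "Merchant B", date(2024, 3, 4),  "Undisputed"),
    ("Ken",   "Merchant A", date(2024, 3, 5),  "Undisputed"),
    ("Ken",   "Merchant Z", date(2024, 3, 11), "Disputed"),
    ("Lee",   "Merchant B", date(2024, 3, 6),  "Undisputed"),  # not a victim
]

def common_merchants(transactions):
    """Merchants where every fraud victim shopped legitimately before their first disputed charge."""
    first_fraud = {}
    for cust, _, day, status in transactions:
        if status == "Disputed":
            first_fraud[cust] = min(day, first_fraud.get(cust, day))
    if not first_fraud:
        return []
    before = defaultdict(set)
    for cust, merchant, day, status in transactions:
        if cust in first_fraud and status == "Undisputed" and day < first_fraud[cust]:
            before[cust].add(merchant)
    shared = set.intersection(*(before[c] for c in first_fraud))
    return sorted(shared)

print(common_merchants(transactions))  # ['Merchant A']
```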

SLIDE 54

In each instance of a fraudulent transaction, the credit card holder had visited Walmart in the days just prior.

SLIDE 55

Case Study: Walmart E-Commerce

  • Walmart deals with almost 250 million customers weekly through its 11,000 stores across 27 countries, and through its retail websites in 10 countries.
  • It wanted to optimize its online recommendations.
  • As Walmart software developer Marcos explained: “A relational database wasn’t satisfying our requirements about performance and simplicity, due [to] the complexity of our queries.”
  • Walmart’s e-commerce group chose Neo4j to help Walmart understand the behavior and preferences of these online buyers.

SLIDE 56

Case Study: Walmart E-Commerce

  • By design, graph databases can quickly query customers’ past purchases, as well as instantly capture any new interests shown in the customer’s current online visit, which is essential for making real-time recommendations.
  • Matching historical and session data in this way is trivial for graph databases like Neo4j, which lets them outperform relational and other ‘NoSQL’ data products.
  • Walmart is now using Neo4j to make sense of online shoppers’ behavior in order to optimize up-selling and cross-selling of major product lines in core markets.
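The idea of matching historical purchases with the current session can be sketched as a two-hop "customers who bought this also bought" query. This is a minimal Python illustration under stated assumptions: the purchase data is hypothetical, and counting co-purchases is just one simple ranking scheme, not Walmart's or Neo4j's actual recommendation logic.

```python
from collections import Counter

# Hypothetical purchase history: customer -> products bought (BOUGHT relationships).
bought = {
    "c1": {"tent", "stove", "lantern"},
    "c2": {"tent", "sleeping bag"},
    "c3": {"tent", "stove"},
    "c4": {"laptop", "mouse"},
}

def recommend(session_items, k=2):
    """Rank products co-purchased with anything viewed in the current session.

    Two hops: session item <-BOUGHT- other customer -BOUGHT-> other product.
    """
    scores = Counter()
    for products in bought.values():
        if products & session_items:  # this customer shares an interest
            for p in products - session_items:
                scores[p] += 1
    # Sort by descending score, then by name, so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [p for p, _ in ranked[:k]]

print(recommend({"tent"}))  # ['stove', 'lantern']
```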

SLIDE 57

Other Case Studies

  • eBay is using Neo4j to improve the ways shoppers search for the items they seek. Further details can be found at: https://neo4j.com/case-studies/ebay/
  • The US Army is using Neo4j to track equipment maintenance. Further details can be found at: https://neo4j.com/case-studies/us-army/
  • NBC News analyzed hundreds of thousands of Russian troll tweets using Neo4j. Further details can be found at: https://neo4j.com/case-studies/nbc-news/