Graph Databases and Neo4j Rishabh Gupta Syed Salman Abbas Baqri - PowerPoint PPT Presentation

Graph Databases and Neo4j
Rishabh Gupta, Syed Salman Abbas Baqri, Nilanjan Debnath, Vanshi Mishra

Data Relationships and Connectedness
• "It's not what you know, it's who you know." Data volume is increasing, but the connections between discrete pieces of data are going to increase at an even faster clip.
• You have 2 million Facebook likes on a post. At the same time, there are 2,000 committed shoppers on your e-commerce site. But that's not enough. Are you drawing the relationships between your Facebook promotions and the items purchased by your most loyal shoppers?

RDBMS
• Relational databases are not effective at handling data relationships, especially when those relationships are added or adjusted on an ad hoc basis.
• The RDBMS schema model is inflexible and struggles to keep up with the growing volume of data and of relationships between data.
• To compensate, we can leave certain columns empty (nullable), but this approach requires more code to handle the greater number of exceptions in the data.
• Moreover, the JOIN operation becomes quite expensive once queries make multiple hops across tables.

To discover what products a customer bought, developers would need to write several JOINs across tables, which significantly slows down the application.
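To make the cost of multiple hops concrete, here is a toy Python sketch (hypothetical tables and names, naive nested-loop joins, no indexes) that answers "which products did this customer buy?" twice: once by joining customer, order, order-item and product tables, and once by following stored links in an adjacency map. Real relational databases use indexes and smarter join algorithms, so this exaggerates the gap; the point is only that every hop is a separate join step in the relational version, while the graph version walks from the customer straight to their orders and then to the products.

```python
# Toy "relational" tables as lists of row tuples (hypothetical schema).
customers = [(1, "Alice"), (2, "Bob")]             # (customer_id, name)
orders = [(10, 1), (11, 1), (12, 2)]               # (order_id, customer_id)
order_items = [(10, 100), (11, 101), (12, 100)]    # (order_id, product_id)
products = [(100, "Laptop"), (101, "Mouse")]       # (product_id, name)


def products_via_joins(customer_name):
    """Answer the question with three chained nested-loop joins."""
    result = set()
    for cust_id, name in customers:
        if name != customer_name:
            continue
        for order_id, o_cust in orders:              # join customers -> orders
            if o_cust != cust_id:
                continue
            for oi_order, prod_id in order_items:    # join orders -> order_items
                if oi_order != order_id:
                    continue
                for p_id, p_name in products:        # join order_items -> products
                    if p_id == prod_id:
                        result.add(p_name)
    return result


# The same data as an adjacency map: each node lists the nodes it points to.
graph = {
    "Alice": ["order:10", "order:11"],
    "Bob": ["order:12"],
    "order:10": ["Laptop"],
    "order:11": ["Mouse"],
    "order:12": ["Laptop"],
}


def products_via_traversal(customer_name):
    """Follow links two hops out from the customer node."""
    return {product
            for order in graph.get(customer_name, [])
            for product in graph.get(order, [])}


print(sorted(products_via_joins("Alice")))      # ['Laptop', 'Mouse']
print(sorted(products_via_traversal("Alice")))  # ['Laptop', 'Mouse']
```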

NoSQL
• NoSQL stores let us query larger volumes of data at higher speed than relational databases.
• However, since they hold sets of disconnected documents, it was still hard to harness the connections between data properly.
• Developers add data relationships to NoSQL by using foreign keys, but following them later becomes just as prohibitively expensive as in an RDBMS. These foreign keys have another weak point too: they only "point" in one direction, which makes reciprocal queries too time-consuming to run.
• Instead of ACID transactions as in an RDBMS, NoSQL stores favour BASE semantics (Basically Available, Soft state, Eventual consistency) for scalability and availability.
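A toy Python sketch of the one-directional foreign key problem, assuming a hypothetical dict-based document store in which each order document embeds a `customer_id`. The forward lookup is a single key access, but the reverse question ("which orders belong to this customer?") has no stored back-pointer and must scan every document. Real NoSQL systems can add a secondary index for the reverse direction, at the cost of extra writes and storage.

```python
# Hypothetical document store: each order embeds a foreign key that points
# from the order to its customer, and nothing points back.
orders = {
    "o1": {"customer_id": "c1", "total": 30},
    "o2": {"customer_id": "c2", "total": 15},
    "o3": {"customer_id": "c1", "total": 50},
}


def customer_of(order_id):
    """Forward direction: one key lookup."""
    return orders[order_id]["customer_id"]


def orders_of(customer_id):
    """Reverse direction: no back-pointer, so every document is scanned."""
    return sorted(oid for oid, doc in orders.items()
                  if doc["customer_id"] == customer_id)


print(customer_of("o3"))  # c1
print(orders_of("c1"))    # ['o1', 'o3']
```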

Graph Database
• Graph databases store data relationships as relationships. This explicit storage of relationship data means fewer disconnects between the evolving schema and the actual database.
• The flexibility of the graph data model allows us to add new nodes and relationships without compromising the existing network or expensively migrating the data.
• With data relationships at their center, graph databases can answer queries quickly, even deep and complex multi-hop queries, because following a relationship does not require a JOIN.
• Graph databases such as Neo4j support ACID transactions.
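One way to picture "storing relationships as relationships" is a hypothetical in-memory structure in which each relationship is a record of its own, indexed from both of its endpoints, so it can be followed in either direction without a scan. This is a loose Python analogy, not a description of how Neo4j lays out records on disk; the class and method names are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rel:
    start: str
    rel_type: str
    end: str


class TinyGraph:
    """Stores each relationship once and indexes it from both endpoints."""

    def __init__(self):
        self.outgoing = defaultdict(list)  # node -> relationships starting there
        self.incoming = defaultdict(list)  # node -> relationships ending there

    def relate(self, start, rel_type, end):
        rel = Rel(start, rel_type, end)
        self.outgoing[start].append(rel)   # reachable from the start node...
        self.incoming[end].append(rel)     # ...and from the end node
        return rel

    def targets(self, node, rel_type):
        """Follow relationships forwards."""
        return [r.end for r in self.outgoing[node] if r.rel_type == rel_type]

    def sources(self, node, rel_type):
        """Follow relationships backwards, with no full scan needed."""
        return [r.start for r in self.incoming[node] if r.rel_type == rel_type]


g = TinyGraph()
g.relate("c1", "PLACED", "o1")
g.relate("c1", "PLACED", "o3")
g.relate("c2", "PLACED", "o2")
print(g.targets("c1", "PLACED"))  # ['o1', 'o3']
print(g.sources("o2", "PLACED"))  # ['c2']
```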

Nodes
• Nodes are the instances of discrete data points. The graph on the previous slide has five nodes, meaning five data points.
• Nodes have labels; think of them as types. Person is a label. Labels can also express subtypes such as Student: a node can be both a Person and a Student.
• Each node has properties, such as name and date of birth for a person. For a house, the properties might be surface area, number of rooms, etc.
• Different nodes can have different numbers of properties.

Relationships
• The arrows between the nodes are the relationships, and they are directional: a person owns a house, but that doesn't mean the house also owns the person.
• Relationships have properties too. A 'knows' relationship between two persons can have a 'since when' property. Similarly, a 'friends' relationship can have 'begin date' and 'end date' properties.
• To add a relationship, you just add a new arrow and write its properties. There can be multiple arrows between two nodes, in either or both directions.
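The node and relationship concepts from the two slides above can be sketched as plain Python dataclasses. This is a hypothetical model for illustration (`Node`, `Relationship` and `related` are not part of any Neo4j API, and the property values are made up): nodes carry a set of labels and a free-form property map, relationships have a type, a direction and their own properties, and nothing stops two nodes from being linked by several relationships.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                     # e.g. {"Person", "Student"}
    properties: dict = field(default_factory=dict)  # any number of key/values


@dataclass
class Relationship:
    rel_type: str
    start: Node                                     # direction: start -> end
    end: Node
    properties: dict = field(default_factory=dict)


# A node can carry several labels and a different set of properties.
alice = Node({"Person", "Student"}, {"name": "Alice", "born": 2001})
bob = Node({"Person"}, {"name": "Bob"})
house = Node({"House"}, {"surface_m2": 120, "rooms": 4})

rels = [
    Relationship("OWNS", bob, house),
    # Relationships have properties too.
    Relationship("KNOWS", alice, bob, {"since": 2019}),
    # Several relationships may link the same pair, in either direction.
    Relationship("KNOWS", bob, alice, {"since": 2019}),
    Relationship("FRIENDS", alice, bob, {"begin_date": "2020-01-01"}),
]


def related(start, rel_type):
    """Follow relationships of one type, in their stored direction only."""
    return [r.end for r in rels if r.start is start and r.rel_type == rel_type]


print([n.properties["name"] for n in related(alice, "KNOWS")])  # ['Bob']
print(related(house, "OWNS"))  # [] because the house does not own Bob
```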

Intuitiveness
• The whiteboard model is the physical model: the way you represent data on a whiteboard is exactly how it is stored in the database.
• Data is created and maintained in a logical fashion.
• There is less translation friction, both from code to database calls and from business people describing the app requirements to developers.

Speed
• Speed naturally increases when the data model is intuitive.
• It can quickly deliver results to your application so it can take real-time decisions.
• Processes that were performed in batch can now be performed in real time.

Agility
• You can add constraints to maintain data integrity, but you still have the flexibility to add or remove data on the fly.
• The schema is optional.
• Modern graph databases are equipped for frictionless development and graceful system maintenance.
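Schema-optional storage with opt-in constraints can be illustrated with a toy Python store: nodes are free-form property maps, and only a uniqueness rule that you explicitly declare is enforced on writes. Neo4j itself declares constraints through Cypher constraint statements whose exact syntax depends on the version; the class below is a hypothetical sketch of the behaviour only, simplified to one label per node.

```python
class ConstraintError(Exception):
    """Raised when a write would violate a declared constraint."""


class SchemaOptionalStore:
    def __init__(self):
        self.nodes = []      # (label, properties) pairs; no fixed schema
        self.unique = set()  # declared (label, property_key) uniqueness rules

    def add_unique_constraint(self, label, key):
        self.unique.add((label, key))

    def create(self, label, **props):
        for c_label, key in self.unique:
            if c_label != label or key not in props:
                continue  # the rule does not apply to this node
            if any(lbl == label and key in existing and existing[key] == props[key]
                   for lbl, existing in self.nodes):
                raise ConstraintError(f"{label}.{key} = {props[key]!r} already exists")
        self.nodes.append((label, props))
        return props

    def delete(self, label, **match):
        """Remove every node with this label whose properties include `match`."""
        before = len(self.nodes)
        self.nodes = [(lbl, p) for lbl, p in self.nodes
                      if not (lbl == label and all(p.get(k) == v for k, v in match.items()))]
        return before - len(self.nodes)


store = SchemaOptionalStore()
store.add_unique_constraint("Person", "email")
store.create("Person", name="Ann", email="ann@example.org")
store.create("Person", name="Bo")   # no email: allowed, schema is optional
store.create("House", rooms=4)      # completely different shape: allowed
```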

Leading Graph Databases

What is Neo4j
• Neo4j is a graph database management system.
• Neo4j is ACID compliant.
• Neo4j is implemented in Java, but it is accessible from software written in other languages using the Cypher query language.

The founders: Peter, Emil Eifrem, Johan Teleman

Neo4j's business partner in China, We-Yun, has built an application called Epidemic Search atop the Neo4j database. It allows Chinese citizens to do a "self-assessment" by checking whether they came into contact with a known carrier of the virus that causes Covid-19.

ACID compliance
ACID provides principles governing how changes are applied to a database. In a very simplified way, it states:
• (A) Atomicity: when you change a database, the change should succeed or fail as a whole.
• (C) Consistency: the database should remain consistent (this is a pretty broad topic).
• (I) Isolation: if other things are going on at the same time, they shouldn't be able to see things mid-update.
• (D) Durability: if the system blows up (hardware or software), the database needs to be able to pick itself back up; and if it says it finished applying an update, it needs to be certain.
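Atomicity, the (A) in the list above, can be sketched in a few lines of Python: a toy store applies every operation of a transaction to a private copy and publishes the copy only if all operations succeed. This is an illustration under simplifying assumptions (single-threaded, in memory), so it says nothing about isolation or durability, and it is not how Neo4j implements transactions; `TinyTransactionalStore` and `transfer` are hypothetical names.

```python
import copy


class TinyTransactionalStore:
    """Toy illustration of atomicity: changes apply all together or not at all."""

    def __init__(self):
        self.data = {}

    def run_transaction(self, operations):
        """Apply every operation to a private copy, then publish it in one step.

        If any operation raises, the copy is discarded and self.data is untouched.
        """
        working = copy.deepcopy(self.data)
        for op in operations:
            op(working)          # may raise, aborting the whole transaction
        self.data = working      # commit: swap in the fully updated state


def transfer(store, src, dst, amount):
    def credit(d):
        d[dst] += amount

    def debit(d):
        if d[src] < amount:
            raise ValueError("insufficient funds")
        d[src] -= amount

    # Credit first on purpose: if the debit then fails, the credit must vanish too.
    store.run_transaction([credit, debit])
```

Crediting before debiting is deliberate: when the debit fails, the earlier credit disappears with it, which is exactly the "work or fail as a whole" rule.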

ACID compliance
NoSQL databases were usually not ACID compliant. According to an older Wikipedia article, NoSQL is a movement promoting a loosely defined class of non-relational data stores that break with a long history of relational databases and ACID guarantees. The name was an attempt to describe the emergence of a growing number of non-relational, distributed data stores that often did not attempt to provide ACID guarantees.

Setting up and using Neo4j sandbox
Go to https://neo4j.com/sandbox/ for a free demo of the software before making a commitment to Neo4j. Neo4j sandbox gives us various datasets to play around with.

After setting up the Neo4j sandbox with the FIFA Women's World Cup example dataset, we are redirected to this page.

Using the Cypher code shown above, we are able to extract the teams that played in the 2019 FIFA Women's World Cup.

Using the Cypher code shown above, we can show the number of teams that participated in each World Cup, from 1991 (China) to 2019 (France).
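The exact Cypher on the two slides above is not reproduced here, and the sandbox's real labels and relationship types may differ. As a hedged sketch, assuming a hypothetical schema `(:Team)-[:PARTICIPATED_IN]->(:Tournament {year})`, the two queries amount to a filter and a grouped count, which Python can mimic over placeholder data (invented team names, not the sandbox dataset):

```python
from collections import Counter

# Placeholder (team, tournament_year) pairs standing in for hypothetical
# (:Team)-[:PARTICIPATED_IN]->(:Tournament {year}) relationships.
participations = [
    ("Team A", 1991), ("Team B", 1991),
    ("Team A", 2015), ("Team C", 2015), ("Team D", 2015),
    ("Team B", 2019), ("Team C", 2019),
]


def teams_in(year):
    """Roughly: MATCH (t:Team)-[:PARTICIPATED_IN]->(:Tournament {year: $year})
    RETURN t.name"""
    return sorted(team for team, y in participations if y == year)


def teams_per_tournament():
    """Roughly: MATCH (t:Team)-[:PARTICIPATED_IN]->(tour:Tournament)
    RETURN tour.year, count(t) ORDER BY tour.year"""
    counts = Counter(year for _, year in participations)
    return sorted(counts.items())


print(teams_in(2019))          # ['Team B', 'Team C']
print(teams_per_tournament())  # [(1991, 2), (2015, 3), (2019, 2)]
```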

Clicking the Italy node shows three options: unlock node on the left, hide node on the right, and expand node below.

Expanding the Italy node reveals all the nodes connected to Italy in the database.
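Expanding a node amounts to fetching its one-hop neighbourhood: every node linked to it by a relationship in either direction, roughly what an undirected pattern such as `MATCH (n {name: 'Italy'})--(m) RETURN m` returns, assuming a `name` property. A minimal Python sketch over hypothetical `(start, type, end)` triples (placeholder names and relationship types, not the sandbox data):

```python
# Hypothetical relationships as (start, type, end) triples.
edges = [
    ("Italy", "PLAYED_IN", "Match 1"),
    ("Team X", "PLAYED_IN", "Match 1"),
    ("Italy", "PARTICIPATED_IN", "Tournament T"),
    ("Coach C", "COACHES", "Italy"),
]


def expand(node):
    """Return every node one relationship away, ignoring direction."""
    neighbours = set()
    for start, _rel_type, end in edges:
        if start == node:
            neighbours.add(end)    # outgoing relationship
        if end == node:
            neighbours.add(start)  # incoming relationship
    return neighbours


print(sorted(expand("Italy")))  # ['Coach C', 'Match 1', 'Tournament T']
```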

Setting up and installing Neo4j

Graph Database Query Language

Cypher Query Language
• Cypher is a declarative graph query language that allows for expressive and efficient data querying in a property graph.
• It was invented by Andrés Taylor while working for Neo4j, Inc. in 2011.
• Cypher was originally intended to be used with the graph database Neo4j, but it was opened up through the openCypher project in October 2015.
• It was designed with SQL in mind, but it was based on the components and needs of a database built upon the concepts of graph theory.

Graph Query Language (GQL)
• Even though Cypher takes inspiration from SQL, it is not a standardized graph query language.
• With the openCypher project, an effort began to standardize Cypher as the query language for graph processing.
• After a series of meetings, there was a consensus to work towards Cypher becoming a significant input into a wider project for an international standardized Graph Query Language called GQL.
• In September 2019, the proposal for this was approved.
https://standardsdevelopment.bsigroup.com/projects/9019-02970
https://www.iso.org/standard/76120.html

Graph Model
Since Cypher is designed to be human-readable, its constructs are based on English prose and iconography to make the syntax visual and easily understood. The graph above can be explained simply as: Jennifer likes Graphs. Jennifer is friends with Michael. Jennifer works for Neo4j.

Cypher Syntax
• Cypher is based on the Property Graph Model, which organizes data into nodes and edges (called "relationships" in Cypher).
    //node
    (variable:Label {propertyKey: 'propertyValue'})
    //relationship
    -[variable:RELATIONSHIP_TYPE]->
    //relationships can also have properties
    -[variable:RELATIONSHIP_TYPE {propertyKey: 'propertyValue'}]->
• It also depicts patterns of nodes and relationships and filters those patterns based on labels and properties.
    //Cypher pattern
    (node1:LabelA)-[rel1:RELATIONSHIP_TYPE]->(node2:LabelB)
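To show what matching a pattern such as `(node1:LabelA)-[rel1:RELATIONSHIP_TYPE]->(node2:LabelB)` means, here is a small Python sketch over the Jennifer example from the Graph Model slide. The labels `Person`, `Technology` and `Company` and the relationship types `LIKES`, `IS_FRIENDS_WITH` and `WORKS_FOR` are assumptions chosen for the example; the function only checks labels, relationship type and direction, which is the core of what the pattern expresses.

```python
# The Jennifer example as (labels, properties); label names are assumptions.
nodes = {
    "jennifer": ({"Person"}, {"name": "Jennifer"}),
    "michael": ({"Person"}, {"name": "Michael"}),
    "graphs": ({"Technology"}, {"type": "Graphs"}),
    "neo4j": ({"Company"}, {"name": "Neo4j"}),
}
rels = [
    ("jennifer", "LIKES", "graphs"),
    ("jennifer", "IS_FRIENDS_WITH", "michael"),
    ("jennifer", "WORKS_FOR", "neo4j"),
]


def match(label_a, rel_type, label_b):
    """Return (node1, node2) id pairs for (node1:label_a)-[:rel_type]->(node2:label_b)."""
    return [(start, end) for start, t, end in rels
            if t == rel_type
            and label_a in nodes[start][0]
            and label_b in nodes[end][0]]


# Roughly: MATCH (p:Person)-[:WORKS_FOR]->(c:Company) RETURN p, c
print(match("Person", "WORKS_FOR", "Company"))  # [('jennifer', 'neo4j')]
```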

Keywords in Cypher
• We need to be able to create, read, update, or delete data in Neo4j, and keywords help us accomplish that functionality.
• Similar to other query languages, Cypher contains a variety of keywords for specifying patterns, filtering patterns, and returning results.
• Among the most common are MATCH, WHERE, and RETURN. These operate slightly differently from SELECT and WHERE in SQL; however, they have similar purposes.
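A rough Python analogue of the three clauses, assuming hypothetical in-memory nodes with made-up names and ages: MATCH selects candidates by label, WHERE filters them with a predicate, and RETURN projects the requested properties. It mirrors the Cypher query `MATCH (p:Person) WHERE p.age > 30 RETURN p.name` in spirit only; real Cypher planning and semantics are far richer.

```python
# Hypothetical nodes as (labels, properties).
nodes = [
    ({"Person"}, {"name": "Alice", "age": 35}),
    ({"Person"}, {"name": "Bob", "age": 28}),
    ({"Company"}, {"name": "Neo4j"}),
]


def run_query(label, predicate, columns):
    """Rough analogue of MATCH (n:label) WHERE predicate(n) RETURN columns."""
    matched = [props for labels, props in nodes if label in labels]       # MATCH
    filtered = [props for props in matched if predicate(props)]           # WHERE
    return [tuple(props.get(c) for c in columns) for props in filtered]   # RETURN


# Roughly: MATCH (p:Person) WHERE p.age > 30 RETURN p.name
print(run_query("Person", lambda p: p["age"] > 30, ["name"]))  # [('Alice',)]
```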
