How graph databases started the multi-model revolution, Luca Garulli


How graph databases started the multi-model revolution. Luca Garulli, Author and CEO @OrientDB. QCon Sao Paulo, March 26, 2015. Welcome to Big Data: “90% of the data in the world today has been created in the last two years alone.”


slide-1
SLIDE 1

How graph databases started the multi-model revolution

Luca Garulli

Author and CEO @OrientDB

QCon Sao Paulo - March 26, 2015

slide-2
SLIDE 2

“90% of the data 
 in the world today 
 has been created 
 in the last two years alone.”

  • IBM

Welcome to Big Data

slide-3
SLIDE 3

Just Data

Order #134

(Order)

Luca

(Provider)

Commodore Amiga 1200 (Product)

Jill

(Customer)

Monitor 40” (Product) Mouse (Product)

Bruno

(Provider)

slide-4
SLIDE 4

Just Data

Order #134

(Order)

Luca

(Provider)

Commodore Amiga 1200 (Product)

Jill

(Customer)

Monitor 40” (Product) Mouse (Product)

Bruno

(Provider)

Data by itself has little value; it’s the relationships between data that give it incredible value

slide-5
SLIDE 5

Relationships give data “meaning”

Order #134

(Order)

Luca

(Provider)

Commodore Amiga 1200 (Product)

(Sells) Jill

(Customer)

(Has) (Makes)

Monitor 40” (Product)

(Sells) (Has)

Mouse (Product)

Bruno

(Provider)

(Sells) (Has)

slide-6
SLIDE 6

Top NoSQL categories

Key/Value Databases Document Databases Graph Databases Column Databases

slide-7
SLIDE 7

Top NoSQL categories

Key/Value Databases Document Databases Graph Databases Column Databases

slide-8
SLIDE 8

Why do most NoSQL products avoid managing relationships?

slide-9
SLIDE 9

Joins Are Evil

Customer
ID  Name
10  John
11  John
24  Mike
28  Mike

CustomerAddress
ID  Address
10  24
10  33
32  44

Address
ID  Location
24  Milan
33  London
18  Paris
18  Madrid
44  Moscow

Is this familiar?

slide-10
SLIDE 10

Why is the join so slow?

slide-11
SLIDE 11

Index Lookup: how does it work?

Imagine an Address Book where we want to find Luca’s phone number

A-Z -> A-L, M-Z

slide-12
SLIDE 12

Index Lookup: how does it work?

Index algorithms are all similar and based on balanced trees

A-Z -> A-L, M-Z
A-L -> A-D, E-L
M-Z -> M-R, S-Z

slide-13
SLIDE 13

Index Lookup: how does it work?

A-Z -> A-L, M-Z
A-L -> A-D, E-L
M-Z -> M-R, S-Z
A-D -> A-B, C-D
E-L -> E-G, H-L

slide-14
SLIDE 14

Index Lookup: how does it work?

A-Z -> A-L, M-Z
A-L -> A-D, E-L
M-Z -> M-R, S-Z
A-D -> A-B, C-D
E-L -> E-G, H-L
E-G -> E-F, G
H-L -> H-J, K-L

slide-15
SLIDE 15

Index Lookup: how does it work?

A-Z -> A-L, M-Z
A-L -> A-D, E-L
M-Z -> M-R, S-Z
A-D -> A-B, C-D
E-L -> E-G, H-L
E-G -> E-F, G
H-L -> H-J, K-L

Luca

Found!
This lookup took 5 steps (A-Z, A-L, E-L, H-L, K-L). The depth of a balanced tree grows with log N: with millions of indexed records a binary tree like this one is about 20 levels deep, and a join repeats the lookup for every record it crosses.
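The address-book lookup can be sketched as a binary search over a sorted list, which halves the candidate range at each step in the same way the tree above does. This is only an illustration: the names in the list and the helper functions `lookup_steps` and `max_steps` are made up for this sketch, and a real database index would typically be a B-tree variant with a much wider fan-out.

```python
import math

def lookup_steps(sorted_names, target):
    """Binary search over a sorted list; returns (found, number_of_steps)."""
    lo, hi = 0, len(sorted_names) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_names[mid] == target:
            return True, steps
        if sorted_names[mid] < target:
            lo = mid + 1   # continue in the upper half
        else:
            hi = mid - 1   # continue in the lower half
    return False, steps

def max_steps(n):
    """Worst-case number of steps for a binary search over n entries."""
    return math.floor(math.log2(n)) + 1 if n > 0 else 0

# Hypothetical address book (names are illustrative only).
names = sorted(["Anna", "Bruno", "Jill", "Luca", "Marko", "Sara", "Zoe"])
found, steps = lookup_steps(names, "Luca")
print(found, steps, "worst case for this book:", max_steps(len(names)))
print("worst case for 1,000,000 entries:", max_steps(1_000_000))
```

The point of the sketch is the growth rate: each lookup costs O(log N) steps, and the cost is paid again on every lookup.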

slide-16
SLIDE 16

Joins Kill Performance

Customer
ID  Name
10  John
11  John
24  Mike
28  Mike

CustomerAddress
ID  Address
10  24
10  33
32  44

Address
ID  Location
24  Milan
33  London
18  Paris
18  Madrid
44  Moscow

Joins are executed every time you cross relationships. Querying millions of records while joining 3-4 tables could generate billions of combinations.
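To see where the combinations come from, here is a minimal sketch of a naive three-table join over the rows shown on the slide. It assumes that CustomerAddress maps a customer ID to an address ID, which the slide implies but does not state. Real engines use indexes and smarter join strategies rather than a full cross product, so treat this as the worst case.

```python
from itertools import product

# Rows copied from the slide: (ID, Name), (customer ID, address ID), (ID, Location).
customer = [(10, "John"), (11, "John"), (24, "Mike"), (28, "Mike")]
customer_address = [(10, 24), (10, 33), (32, 44)]  # assumed meaning of the columns
address = [(24, "Milan"), (33, "London"), (18, "Paris"), (18, "Madrid"), (44, "Moscow")]

def naive_join():
    """Cross every row of the three tables and keep the matching ones."""
    examined = 0
    rows = []
    for (cid, name), (ca_cid, ca_aid), (aid, location) in product(
        customer, customer_address, address
    ):
        examined += 1
        if cid == ca_cid and ca_aid == aid:
            rows.append((name, location))
    return rows, examined

rows, examined = naive_join()
print(rows)      # matching (name, location) pairs
print(examined)  # combinations examined: 4 * 3 * 5
```

With 4, 3 and 5 rows the cross product is already 60 combinations; the same multiplication applied to tables with thousands of rows each is what produces billions.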

slide-17
SLIDE 17

This is why query performance suffers as the database increases in size: every index lookup costs O(log N)

slide-18
SLIDE 18

[Chart: performance (vertical axis) against database size (horizontal axis)]

RDBMS performance on traversal

slide-19
SLIDE 19

In a world that’s becoming more connected, we need a better way to store data and manage relationships

Read: Data is important, but relationships are even more fundamental today

slide-20
SLIDE 20

“A graph database is any storage system that provides index-free adjacency”

  • Marko Rodriguez

(author of TinkerPop Blueprints)

slide-21
SLIDE 21

Every developer knows the Relational Model, but who knows the Graph one?

slide-22
SLIDE 22

Back to school: Graph Theory crash course

slide-23
SLIDE 23

Basic Graph

Luca --Visited--> Sao Paulo

slide-24
SLIDE 24

Property Graph Model*

Luca
company: OrientTechnologies

Visited
on: 2015

Sao Paulo
people: 12,000,000

Vertices and Edges can have properties
Edges are directed

* https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model
slide-25
SLIDE 25

1-N and N-M Relationships

Luca
Sao Paulo

Visited
on: 2015

Worked
on: 2015

An Edge connects only 2 vertices.
Use multiple edges to represent 1-N and N-M relationships.
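The property graph model from the last two slides can be sketched in a few lines. The `Vertex` and `Edge` classes and the `outgoing` helper are hypothetical names for this illustration, not a TinkerPop or OrientDB API, and the sketch assumes the "Worked" edge connects Luca to Sao Paulo just like "Visited" does.

```python
from dataclasses import dataclass, field

@dataclass
class Vertex:
    label: str
    properties: dict = field(default_factory=dict)

@dataclass
class Edge:
    label: str
    out_vertex: Vertex  # vertex the edge starts from
    in_vertex: Vertex   # vertex the edge points to
    properties: dict = field(default_factory=dict)

luca = Vertex("Luca", {"company": "OrientTechnologies"})
sao_paulo = Vertex("Sao Paulo", {"people": 12_000_000})

# Each edge connects exactly two vertices; several edges between the
# same pair express 1-N and N-M relationships.
# Assumption: "Worked" links the same two vertices as "Visited".
edges = [
    Edge("Visited", luca, sao_paulo, {"on": 2015}),
    Edge("Worked", luca, sao_paulo, {"on": 2015}),
]

def outgoing(vertex, all_edges):
    """Edges that start from the given vertex (edges are directed)."""
    return [e for e in all_edges if e.out_vertex is vertex]

print([e.label for e in outgoing(luca, edges)])
```

Both vertices and edges carry their own property dictionaries, which is the only thing that separates a property graph from a plain directed graph.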

slide-26
SLIDE 26

Congrats! This is your diploma in «Graph Theory»

slide-27
SLIDE 27

Graph theory is so simple, yet so powerful

slide-28
SLIDE 28

How does a true* Graph Database manage relationships?

*a “Graph” layer on top of a DBMS doesn’t qualify as a true GraphDB

slide-29
SLIDE 29

Luca (Vertex) #13:55

Visited (Edge) #22:11
on: 2015

Sao Paulo (Vertex) #15:99

Each element in the Graph has its own immutable Record ID

slide-30
SLIDE 30

Connections use persistent pointers

Luca (Vertex) #13:55
out = #22:11

Visited (Edge) #22:11
on: 2015
out = #13:55
in = #15:99

Sao Paulo (Vertex) #15:99
in = #22:11

slide-31
SLIDE 31

Luca (Vertex) #13:55
out = #22:11

Visited (Edge) #22:11
on: 2015
out = #13:55
in = #15:99

Sao Paulo (Vertex) #15:99
in = #22:11

slide-32
SLIDE 32

Luca (Vertex) #13:55
out = #22:11

Visited (Edge) #22:11
on: 2015
out = #13:55
in = #15:99

Sao Paulo (Vertex) #15:99
in = #22:11

slide-33
SLIDE 33

A Graph Database creates the relationship just once, when the edge is created. An RDBMS computes the relationship every time you query the database.

slide-34
SLIDE 34

When you move from an RDBMS to a Graph Database, the cost of crossing a relationship drops from O(log N) to near O(1). With a Graph Database, traversal time is not affected by database size! This is huge in the Big Data age.
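A minimal sketch of index-free adjacency, using the three records from the earlier slides. Python object references stand in for the persistent pointers here, which is an assumption of this sketch and not how OrientDB stores records on disk. The class names `VertexRecord` and `EdgeRecord` and the `neighbours` helper are hypothetical.

```python
class VertexRecord:
    def __init__(self, rid, name):
        self.rid = rid        # immutable Record ID, e.g. "#13:55"
        self.name = name
        self.out_edges = []   # direct references to outgoing edges
        self.in_edges = []    # direct references to incoming edges

class EdgeRecord:
    def __init__(self, rid, label, out_vertex, in_vertex):
        self.rid = rid
        self.label = label
        self.out_vertex = out_vertex
        self.in_vertex = in_vertex
        # The relationship is materialised once, when the edge is created.
        out_vertex.out_edges.append(self)
        in_vertex.in_edges.append(self)

luca = VertexRecord("#13:55", "Luca")
sao_paulo = VertexRecord("#15:99", "Sao Paulo")
visited = EdgeRecord("#22:11", "Visited", luca, sao_paulo)

def neighbours(vertex, label):
    """Follow stored references; no index lookup and no table scan."""
    return [e.in_vertex for e in vertex.out_edges if e.label == label]

print([v.name for v in neighbours(luca, "Visited")])
```

The work done by `neighbours` depends on how many edges the starting vertex has, not on how many records the database holds, which is the sense in which traversal is near O(1) per hop.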

slide-35
SLIDE 35

Graph Databases Easily Manage Complex Relationships

Traversing relationships is cheap, which suits:

  • Recommendation engines
  • Social Applications
  • Spatial Apps
  • Master Data Management
  • Information Clustering

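[Diagram: a graph of John, genres (Thriller, Comedy), movies (Pulp Fiction, Mr Bean), theaters (Theater A, Theater B, Theater C) and cities (NYC, San José), connected by "Lives in" and "Likes" edges]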

slide-36
SLIDE 36

GraphDB Database Quadrant

[Quadrant chart. Axes: Relationships Complexity, Data Complexity. Categories plotted: Relational, Key Value, Column, Graph, Document]

slide-37
SLIDE 37

GraphDB Database Quadrant

[Quadrant chart. Axes: Relationships Complexity, Data Complexity. Categories plotted: Relational, Key Value, Column, Graph, Document]

These were 1st generation NoSQL products, where each tool was only good at a few use cases
slide-38
SLIDE 38
slide-39
SLIDE 39

1st Generation NoSQL: Scenario

[Diagram: an Application on top of Oracle (RDBMS, the Primary DB), Redis or Memcached (Key/Value), MongoDB (DocDB) and Neo4j (GraphDB), kept in sync through ETL]

slide-40
SLIDE 40

1st Generation NoSQL: Fact

In more than 90% of use cases, NoSQL products are used as a second DBMS

slide-41
SLIDE 41

1st Generation NoSQL: Problems

[Diagram: an Application on top of Oracle (RDBMS), Redis or Memcached (Key/Value), MongoDB (DocDB) and Neo4j (GraphDB), kept in sync through ETL]

  • No standard between NoSQL products
  • Multiple vendors = multiple skills
  • ETL + synchronization code is costly to write and maintain
  • Performance and Reliability are hard to predict

slide-42
SLIDE 42

2nd Generation NoSQL is Multi-Model

slide-43
SLIDE 43

What’s a Multi-Model DBMS?

Graph, Document, Object, Key/Value

Multi-Model represents the intersection of multiple models in just one product

slide-44
SLIDE 44

What’s a Multi-Model DBMS?

Graph, Document, Object, Key/Value

Multi-Model represents the intersection of multiple models in just one product

  • Just one product to learn and maintain
  • Just one vendor relationship to manage
  • No ETL, no synchronization required
  • Performance and Reliability are easy to test from the beginning

slide-45
SLIDE 45

Relationships give data “meaning”

Order #134

(Order)

Luca

(Provider)

Commodore Amiga 1200 (Product)

(Sells) Jill

(Customer)

(Has) (Makes)

Monitor 40” (Product)

(Sells) (Has)

3 Wheel Mouse (Product)

Bruno

(Provider)

(Sells) (Has)

slide-46
SLIDE 46

Multi-Model domain schema

Vertex classes:
Actor (name: string, surname: string)
Customer (inherits Actor)
Provider (inherits Actor)
Product (name: string, qty: int)
Order (number: int, date: datetime)

Edge classes:
Sells (price: decimal)
Makes
Has (price: decimal)

Legend: V = Vertex, Edge, Inherits

slide-47
SLIDE 47

Vertices and Edges are Documents

{
  "@rid": "12:382",
  "@class": "Customer",
  "name": "Jill",
  "surname": "Raggio",
  "phone": "+39 33123212",
  "details": {
    "city": "London",
    "tags": "millennial"
  }
}

Jill --Makes--> Order

General purpose solution:

  • JSON
  • Schema-less
  • Schema-full
  • Schema-hybrid
  • Nested documents
  • Rich indexing and querying
  • Developer friendly
slide-48
SLIDE 48

Polymorphic queries

SELECT * FROM Customer
Jill (Customer)

SELECT * FROM Provider
Luca (Provider)
Bruno (Provider)

SELECT * FROM Actor
Luca (Provider)
Jill (Customer)
Bruno (Provider)
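The behaviour of the three queries can be mimicked with ordinary class inheritance. This is a sketch of the idea only: `select_from` is a made-up helper, not OrientDB's query engine, and it assumes, as the domain schema suggests, that Customer and Provider both inherit from Actor.

```python
class Actor:
    def __init__(self, name):
        self.name = name

class Customer(Actor):
    pass

class Provider(Actor):
    pass

# Records taken from the slides.
records = [Provider("Luca"), Customer("Jill"), Provider("Bruno")]

def select_from(cls):
    """Return the names of all records of the class or any of its subclasses."""
    return [r.name for r in records if isinstance(r, cls)]

print(select_from(Customer))
print(select_from(Provider))
print(select_from(Actor))
```

Querying the parent class returns the records of every subclass, which is what makes the query polymorphic.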

slide-49
SLIDE 49

Multi-Model complex domains schema

Classes: Band, Genre, Account, MusicTaste, Location

Edges: Likes, Performs, Plays

Legend: V = Vertex, Edge, Inherits

slide-50
SLIDE 50

Multi-Model complex domains

Snow Patrol

(Band)

Luca

(Account)

Indie

(Genre) 123, 1st Street Austin, TX (Location)

(Performs) April 7, 2015 9pm-11.30pm

(Likes) Jill

(Account)

(Likes) (Likes) Rock

(Genre)

(Likes) (Plays)

slide-51
SLIDE 51

Multi-Model Database Quadrant

[Quadrant chart. Axes: Relationships Complexity, Data Complexity. Categories plotted: Relational, Key Value, Column, Graph, Document, Multi-Model]

slide-52
SLIDE 52

Multi-Model Solutions

slide-53
SLIDE 53

There are a few DBMSs that claim to be Multi-Model, but they do not have a true Graph Engine. The “Graph” is only a layer on top of the engine.

Under the hood they do JOINs, which means traversal time is affected by database size.

slide-54
SLIDE 54

Meet OrientDB

The First Ever Multi-Model Database Combining Flexibility of Documents with Connectedness of Graphs

slide-55
SLIDE 55

With a true Graph, Document, Key/Value and Object Oriented engine

slide-56
SLIDE 56

FEATURES: ORIENTDB, MONGODB, NEO4J, MYSQL (RDBMS)

Operational Database: X X X
Graph Database: X X
Document Database: X X
Object-Oriented Concepts: X
Schema-full, Schema-less, Schema mix: X
User and Role & Record Level Security: X
Record Level Locking: X X X
SQL: X X
ACID Transaction: X X X
Relationships (Linked Documents): X X X
Custom Data Types: X X X
Embedded Documents: X X
Multi-Master Zero Configuration Replication: X
Sharding: X X
Server Side Functions: X X X
Native HTTP Rest/JSON: X X
Embeddable with No Restrictions: X

OrientDB features

slide-57
SLIDE 57

DEMO

slide-58
SLIDE 58

API & Standards

  • Support for TinkerPop standard for Graph DB: Gremlin language and Blueprints API
  • SQL + extensions for graphs
  • JDBC driver to connect any BI tool
  • HTTP/JSON support
  • Drivers in Java, Node.js, Python, PHP, .NET, Perl, C/C++ and more

slide-59
SLIDE 59

Availability and Integrity

  • Atomic, Consistent, Isolated and Durable (ACID) multi-statement transactions

Multi-master Replication

[Diagram: two Master Nodes and seven nodes labeled C]

slide-60
SLIDE 60

Scalability and Performance

  • Multi-Master Replication, Sharding and Auto-Discovery to Simplify Ops
  • +200k TPS on Commodity Hardware

[Diagram: two Master Nodes, seven nodes labeled C, and an Auto-Discovered Node]

slide-61
SLIDE 61

Some numbers

50,000

Downloads per Month from 200+ countries.

70+

Committers contributing to the product

1000s

Users from SMBs to Fortune 10 Companies.

17+

Years of Research have been put in the product

slide-62
SLIDE 62

A Bright Future

Graph DBMSs increased their popularity by 500% within the last 2 years.

Document DBMSs are the 3rd fastest growing category.

slide-63
SLIDE 63

Some of Our Customers

slide-64
SLIDE 64

Get Started for Free

OrientDB Community Edition is FREE for any purpose (Apache 2 license)

Udemy Getting Started Training is ★★★★★ and Free

http://www.orientechnologies.com/getting-started

OrientDB Enterprise is Free for Development

slide-65
SLIDE 65

Thank you.

Ask your questions on Twitter for the Big Data Panel using #QCONBIGDATA

Luca Garulli @lgarulli

www.orientdb.com