How graph databases started the multi-model revolution
Luca Garulli
Author and CEO @OrientDB
QCon Sao Paulo - March 26, 2015
Welcome to Big Data
90% of the data in the world today has been created in the last two years alone.
Just Data
Order #134 (Order)
Luca (Provider)
Commodore Amiga 1200 (Product)
Jill (Customer)
Monitor 40” (Product)
Mouse (Product)
Bruno (Provider)
Relationships give data “meaning”
[Diagram: Order #134 (Order), Luca (Provider), Bruno (Provider), Jill (Customer), Commodore Amiga 1200 (Product), Monitor 40” (Product) and Mouse (Product), now connected by (Sells), (Has) and (Makes) edges]
Top NoSQL categories
Key/Value Databases
Document Databases
Graph Databases
Column Databases
Joins are Evil
Customer
ID  Name
10  John
11  John
24  Mike
28  Mike
CustomerAddress
ID  Address
10  24
10  33
32  44
Address
ID  Location
24  Milan
33  London
18  Paris
18  Madrid
44  Moscow
Is this familiar?
Imagine an address book where we want to find Luca’s phone number
A-Z
  A-L
  M-Z
Index Lookup: how does it work?
Index algorithms are all similar and based on balanced trees
A-Z
  A-L
    A-D
      A-B
      C-D
    E-L
      E-G
        E-F
        G
      H-L
        H-J
        K-L  <- Luca
  M-Z
    M-R
    S-Z
Found! This lookup took 5 steps. The depth of a balanced tree grows with the logarithm of the number of indexed records, so lookups get slower as the database grows, and every lookup pays that cost again.
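The address-book walk above can be sketched as a descent through a tree of letter ranges. This is only an illustration, assuming a hand-built tree that mirrors the ranges on the slides; `Node` and `lookup` are hypothetical names and not part of any database API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """One index page covering an inclusive range of first letters."""
    low: str
    high: str
    children: List["Node"] = field(default_factory=list)

    def covers(self, name: str) -> bool:
        return self.low <= name[0].upper() <= self.high


def lookup(root: Node, name: str) -> Tuple[List[str], int]:
    """Descend from the root to the leaf range that covers `name`.

    Returns the ranges visited and the number of steps taken.
    """
    path = []
    node = root
    while True:
        path.append(f"{node.low}-{node.high}")
        # Pick the child whose range contains the name, if any.
        nxt = next((c for c in node.children if c.covers(name)), None)
        if nxt is None:
            return path, len(path)
        node = nxt


# The tree from the slides (only the branches shown there are expanded).
tree = Node("A", "Z", [
    Node("A", "L", [
        Node("A", "D", [Node("A", "B"), Node("C", "D")]),
        Node("E", "L", [
            Node("E", "G", [Node("E", "F"), Node("G", "G")]),
            Node("H", "L", [Node("H", "J"), Node("K", "L")]),
        ]),
    ]),
    Node("M", "Z", [Node("M", "R"), Node("S", "Z")]),
])

path, steps = lookup(tree, "Luca")
print(" > ".join(path), f"({steps} steps)")
```

Each level visited is one step, so the cost of a lookup is the depth of the tree.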
Joins Kill Performance
Joins are executed every time you cross relationships.
Querying millions of records while joining 3-4 tables could generate billions of combinations.
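To make the combinatorial cost concrete, here is a minimal sketch that joins the three example tables with a naive nested-loop join and counts how many row combinations it examines. It assumes that `CustomerAddress` rows pair a customer ID with an address ID, and that no index is used; real RDBMS engines use indexes and smarter join strategies, so this shows the worst case only.

```python
from itertools import product

# The three tables from the slide, as (column, column) tuples.
customer = [(10, "John"), (11, "John"), (24, "Mike"), (28, "Mike")]
customer_address = [(10, 24), (10, 33), (32, 44)]  # assumed: (customer ID, address ID)
address = [(24, "Milan"), (33, "London"), (18, "Paris"), (18, "Madrid"), (44, "Moscow")]


def nested_loop_join():
    """Join Customer -> CustomerAddress -> Address without any index.

    Returns the matching (name, location) rows and the number of
    row combinations that had to be examined to find them.
    """
    examined = 0
    rows = []
    for (cid, name), (link_cid, link_aid), (aid, location) in product(
        customer, customer_address, address
    ):
        examined += 1
        if cid == link_cid and link_aid == aid:
            rows.append((name, location))
    return rows, examined


rows, examined = nested_loop_join()
print(rows)
print("combinations examined:", examined)
```

The number of combinations is the product of the table sizes (4 x 3 x 5 here), which is why the same query over millions of rows explodes.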
RDBMS performance on traversal
[Chart: performance plotted against database size]
Read: Data is important, but relationships are even more fundamental today
(author of TinkerPop Blueprints)
Luca --Visited--> Sao Paulo
Vertices and Edges can have properties
Edges are directed
* https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model
Sao Paulo
people: 12,000,000
Luca
company: OrientTechnologies
Visited
: 2 1 5
An Edge connects only 2 vertices
Use multiple edges to represent 1-N and N-M relationships
Worked
: 2 1 5
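The property graph model described above can be sketched in a few lines. This is an illustration rather than the Blueprints API: `Vertex` and `Edge` are hypothetical classes, the `year` edge property stands in for the edge property shown on the slide, and the `Rome` vertex is invented to show a 1-N relationship built from multiple edges.

```python
class Vertex:
    """A vertex with arbitrary properties and lists of connected edges."""

    def __init__(self, name, **properties):
        self.name = name
        self.properties = properties
        self.out_edges = []  # edges leaving this vertex
        self.in_edges = []   # edges arriving at this vertex


class Edge:
    """A directed edge connecting exactly two vertices, with its own properties."""

    def __init__(self, label, source, target, **properties):
        self.label = label
        self.source = source
        self.target = target
        self.properties = properties
        source.out_edges.append(self)
        target.in_edges.append(self)


luca = Vertex("Luca", company="OrientTechnologies")
sao_paulo = Vertex("Sao Paulo", people=12_000_000)
rome = Vertex("Rome")  # hypothetical second city, not in the slides

# One edge per relationship: a 1-N relationship is simply several edges.
Edge("Visited", luca, sao_paulo, year=2015)  # `year` is an assumed property name
Edge("Visited", luca, rome)

for edge in luca.out_edges:
    print(luca.name, edge.label, edge.target.name, edge.properties)
```

Because edges are directed, each vertex keeps outgoing and incoming edges separately, so a relationship can be followed from either end.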
*a “Graph” layer on top of a DBMS doesn’t qualify as a true GraphDB
Luca (Vertex) #13:55
Visited (Edge) #22:11
Sao Paulo (Vertex) #15:99
Each element in the Graph has its own immutable Record ID
Connections use persistent pointers
Visited (Edge) #22:11: in = #15:99
Sao Paulo (Vertex) #15:99: in = #22:11
A Graph Database creates the relationship just once (when the edge is created), whereas an RDBMS computes the relationship every time you query the database.
When you move from an RDBMS to a Graph Database, the cost of crossing a relationship drops from O(log N) to near O(1).
With a Graph Database, the traversal time is not affected by database size! This is huge in the Big Data age.
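The difference between the two costs can be sketched by counting steps. This is a simplified model and not how any particular engine is implemented: a binary search over sorted keys stands in for an index lookup, and reading a list slot by its stored position stands in for following a persistent pointer. Both function names are hypothetical.

```python
def index_lookup_steps(sorted_keys, key):
    """Binary search for `key`, returning how many probes it took.

    The probe count grows like log2(N) as the key set grows.
    """
    lo, hi, steps = 0, len(sorted_keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1
        if sorted_keys[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    return steps


def pointer_hop_steps(records, position):
    """Follow a stored position directly: one step, whatever the size."""
    _ = records[position]
    return 1


results = {}
for n in (1_000, 1_000_000):
    keys = list(range(n))
    results[n] = (index_lookup_steps(keys, n - 1), pointer_hop_steps(keys, n - 1))
    print(f"N={n}: index probes={results[n][0]}, pointer hops={results[n][1]}")
```

The index probe count rises as N grows while the pointer hop stays at one, which is the O(log N) versus near O(1) contrast the slide describes.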
Graph Databases Easily Manage Complex Relationships
No costs to traverse relationships:
[Diagram: vertices John, Thriller, Comedy, Pulp Fiction, Mr Bean, Theater A, Theater B, Theater C, NYC and San Josè, connected by edges such as Lives in and Likes]
GraphDB Database Quadrant
[Chart: axes Relationships Complexity and Data Complexity; categories plotted: Relational, Key Value, Column, Graph, Document]
1st Generation NoSQL: Scenario
[Diagram: an Application using Oracle (RDBMS), Redis or Memcache (Key/Value), MongoDB (DocDB) and Neo4j (GraphDB), with a Primary DB and ETL between the stores]
1st Generation NoSQL: Fact
1st Generation NoSQL: Problems
products
is costly to write and maintain
hard to predict
What’s a Multi-Model DBMS?
Graph Document Object Key/Value
Multi Model represents the intersection
product
beginning
Relationships give data “meaning”
[Diagram: Order #134 (Order), Luca (Provider), Bruno (Provider), Jill (Customer), Commodore Amiga 1200 (Product), Monitor 40” (Product) and 3 Wheel Mouse (Product), connected by (Sells), (Has) and (Makes) edges]
Multi-Model domain schema
Vertex classes:
  Actor (name: string, surname: string)
  Customer (inherits Actor)
  Provider (inherits Actor)
  Product (name: string, qty: int)
  Order (number: int, date: datetime)
Edge classes:
  Sells (price: decimal)
  Makes
  Has (price: decimal)
Legend: V = Vertex; arrows mark Inherits and Edge
Vertices and Edges are Documents
{
  "@rid": "12:382",
  "@class": "Customer",
  "name": "Jill",
  "surname": "Raggio",
  "phone": "+39 33123212",
  "details": {
    "city": "London",
    "tags": "millennial"
  }
}
Jill --Makes--> Order
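Since a vertex is just a document, it can be read like any JSON record, nested fields included. The sketch below parses the Customer document with Python's standard `json` module and builds a matching edge document. The edge layout (`out`/`in` fields) and the Order record ID `20:1` are assumptions made for illustration and are not taken from the slides.

```python
import json

# The Customer vertex from the slide, as a JSON document.
raw = """
{
  "@rid": "12:382",
  "@class": "Customer",
  "name": "Jill",
  "surname": "Raggio",
  "phone": "+39 33123212",
  "details": {"city": "London", "tags": "millennial"}
}
"""

customer = json.loads(raw)

# Nested (embedded) fields are reachable like any other document field.
print(customer["name"], "lives in", customer["details"]["city"])

# A hypothetical "Makes" edge document pointing from Jill to an order.
# "20:1" is a made-up record ID used only for this example.
makes = {"@class": "Makes", "out": customer["@rid"], "in": "20:1"}
print(makes)
```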
General purpose solution:
Polymorphic queries
SELECT * FROM Customer -> Jill (Customer)
SELECT * FROM Provider -> Luca (Provider), Bruno (Provider)
SELECT * FROM Actor -> Bruno (Provider), Jill (Customer), Luca (Provider)
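The idea behind polymorphic queries can be sketched with ordinary class inheritance: selecting from a parent class also returns records of its subclasses. The `select_from` helper is hypothetical and only mimics the behaviour of the three `SELECT` statements; it is not OrientDB's query engine.

```python
class Actor:
    def __init__(self, name):
        self.name = name


class Customer(Actor):
    pass


class Provider(Actor):
    pass


records = [Provider("Luca"), Customer("Jill"), Provider("Bruno")]


def select_from(cls, records):
    """Return the names of all records whose class is `cls` or a subclass of it."""
    return [r.name for r in records if isinstance(r, cls)]


print("Customer:", select_from(Customer, records))
print("Provider:", select_from(Provider, records))
print("Actor:   ", select_from(Actor, records))
```

Querying the `Actor` superclass returns customers and providers alike, without a UNION over separate tables.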
Multi-Model complex domains schema
Vertex classes: Band, Genre, Account, MusicTaste, Location
Edge classes: Likes, Performs, Plays
Legend: V = Vertex; arrows mark Inherits and Edge
Multi-Model complex domains
[Diagram: Snow Patrol (Band), Luca (Account), Jill (Account), Indie (Genre), Rock (Genre) and 123, 1st Street Austin, TX (Location), connected by (Likes) and (Plays) edges and a (Performs) edge dated April 7, 2015 9pm-11.30pm]
Multi-Model Database Quadrant
[Chart: axes Relationships Complexity and Data Complexity; categories plotted: Relational, Key Value, Column, Graph, Document, Multi-Model]
There are a few DBMSs that claim to be Multi-Model, but they do not have a true Graph Engine. The “Graph” is only a layer on top.
Under the hood they do JOINs, which means traversal time is affected by database size.
Meet OrientDB
With a true Graph, Document, Key/Value and Object Oriented engine
FEATURES | ORIENTDB | MONGODB | NEO4J | MYSQL (RDBMS)
Operational Database X X X
Graph Database X X
Document Database X X
Object-Oriented Concepts X
Schema-full, Schema-less, Schema mix X
User and Role & Record Level Security X
Record Level Locking X X X
SQL X X
ACID Transaction X X X
Relationships (Linked Documents) X X X
Custom Data Types X X X
Embedded Documents X X
Multi-Master Zero Configuration Replication X
Sharding X X
Server Side Functions X X X
Native HTTP REST/JSON X X
Embeddable with No Restrictions X
OrientDB features
for Graph DB: Gremlin language and Blueprints API
PHP, .NET, Perl, C/C++ and more
API & Standards
Availability and Integrity
multi-statement transactions
[Diagram: Multi-master Replication, with two Master Nodes and several nodes labeled C]
Scalability and Performance
Discovery to Simplify Ops
[Diagram: two Master Nodes, several nodes labeled C and an Auto-Discovered Node]
Some numbers
50,000
Downloads per Month from 200+ countries.
70+
Committers contributing to the product
1000s
Users from SMBs to Fortune 10 Companies.
17+
Years of Research have been put in the product
A Bright Future
Graph DBMSs increased their popularity by 500% within the last 2 years.
Document DBMSs are the 3rd fastest growing category.
Some of Our Customers
Get Started for Free
OrientDB Community Edition is FREE for any purpose (Apache 2 license)
Udemy Getting Started Training is ★★★★★ and Free
http://www.orientechnologies.com/getting-started
OrientDB Enterprise is Free for Development
Thank you.
Ask your questions on Twitter for the Big Data Panel using #QCONBIGDATA
Luca Garulli @lgarulli