An Intro to Graphs Stefan Armbruster
Neo Technology
Agenda
Introduction
– NO-SQL context
– What is Neo4j?
– When/why should I use it?
Graph Queries
– Cypher query language
– Create and query data
Technical Overview
– Deployment modes
– Java APIs
– Other libraries
Introduction
Relational all the things
The Relational Crossroads
Four NOSQL Categories
arising from the “relational crossroads”
Denormalise: KV (key-value), CF (column family), Doc (document)
Normalise: Graph
What is a graph?
Vertex
Edge
What is a graph?
Node
Relationship
http://en.wikipedia.org/wiki/File:Leonhard_Euler_2.jpg
Meet Leonhard Euler
Theory (1736)
Königsberg (Prussia) - 1736
[Figure: the bridges of Königsberg drawn as a graph, with land masses A, B, C, D as nodes and bridges 1 to 7 as edges]
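The Königsberg question is whether a single walk can cross each of the seven bridges exactly once. The sketch below counts the bridges touching each land mass and applies Euler's condition: a connected graph has such a walk only if zero or two nodes have an odd number of edges. The assignment of bridges to pairs of land masses follows the usual textbook layout and is an assumption, since the slide's figure is not reproduced here. The function names are illustrative.

```python
from collections import Counter

# Assumed layout: A is the island, B and C are the river banks,
# D is the land mass to the east. Each tuple is one bridge.
BRIDGES = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]


def degrees(edges):
    """Count how many edges touch each node (parallel edges count separately)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


def has_euler_walk(edges):
    """True if a connected multigraph has a walk using every edge exactly once.

    Connectivity is assumed, not checked.
    """
    odd_nodes = sum(1 for d in degrees(edges).values() if d % 2 == 1)
    return odd_nodes in (0, 2)
```

With this layout all four land masses have an odd degree, so `has_euler_walk(BRIDGES)` is false, which is the result Euler published in 1736.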
What are graphs good for?
Data Complexity
complexity = f(size, semi-structure, connectedness)
Size
The Real Complexity
complexity = f(size, semi-structure, connectedness)
Semi-Structure
Email: mark.needham@neotechnology.com
Email: m.h.needham@gmail.com
Twitter: @markhneedham
Skype: mk_jnr1984
USER CONTACT
CONTACT_TYPE
FIRST_NAME | LAST_NAME | USER_ID | EMAIL_1 | EMAIL_2 | TWITTER | FACEBOOK | SKYPE
Mark | Needham | 315 | mark.needham@neotechnology.com | m.h.needham@gmail.com | @markhneedham | NULL | mk_jnr1984
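One way to read the USER row above: every contact channel gets its own column, so a channel the user does not have shows up as NULL. A minimal sketch of the alternative, assuming each contact becomes its own (type, value) entry attached to the user, as separate nodes would be in a graph. The row values come from the slide; the column-to-type mapping and the function name are hypothetical.

```python
# The sparse USER row from the slide, with NULL represented as None.
USER_ROW = {
    "FIRST_NAME": "Mark",
    "LAST_NAME": "Needham",
    "USER_ID": 315,
    "EMAIL_1": "mark.needham@neotechnology.com",
    "EMAIL_2": "m.h.needham@gmail.com",
    "TWITTER": "@markhneedham",
    "FACEBOOK": None,
    "SKYPE": "mk_jnr1984",
}

# Hypothetical mapping from contact column to contact type.
CONTACT_COLUMNS = {
    "EMAIL_1": "Email",
    "EMAIL_2": "Email",
    "TWITTER": "Twitter",
    "FACEBOOK": "Facebook",
    "SKYPE": "Skype",
}


def contacts_from_row(row):
    """Turn the contact columns of a row into (type, value) pairs, skipping NULLs."""
    return [
        (contact_type, row[column])
        for column, contact_type in CONTACT_COLUMNS.items()
        if row.get(column) is not None
    ]
```

A third email address would then be one more entry rather than a new EMAIL_3 column, and the missing Facebook account simply has no entry.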
The Real Complexity
complexity = f(size, semi-structure, connectedness)
Social Network
Network Impact Analysis
Route Finding
Recommendations
Logistics
Access Control
Fraud Analysis
Neo4j is a Graph Database
When Should I Use Graph Databases?
domains
– Lots of join tables? Connectedness
– Lots of sparse tables? Semi-structure
Graph Modeling
Labeled Property Graph Data Model
Relationships (continued)
– Nodes can have more than one relationship
– Self relationships are allowed
– Nodes can be connected by more than one relationship
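The three rules above can be illustrated with a small in-memory model. This is only a sketch of the labeled property graph idea in plain Python, not Neo4j's API: the class names, the node data and the relationship types are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    labels: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: int          # id of the start node
    type: str           # relationship type, e.g. "KNOWS"
    end: int            # id of the end node
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy labeled property graph: nodes by id plus a list of relationships."""

    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, node_id, *labels, **properties):
        self.nodes[node_id] = Node(node_id, set(labels), properties)
        return self.nodes[node_id]

    def relate(self, start, rel_type, end, **properties):
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("both nodes must exist before they are related")
        rel = Relationship(start, rel_type, end, properties)
        self.relationships.append(rel)
        return rel

    def relationships_of(self, node_id):
        """All relationships that start or end at the given node."""
        return [r for r in self.relationships if node_id in (r.start, r.end)]


# Hypothetical example data.
graph = PropertyGraph()
graph.add_node(1, "Person", name="Ann")
graph.add_node(2, "Person", name="Bob")
graph.relate(1, "KNOWS", 2, since=2010)   # a node can have several relationships
graph.relate(1, "WORKS_WITH", 2)          # two nodes, more than one relationship
graph.relate(1, "FOLLOWS", 1)             # a self relationship
```

Node 1 ends up with three relationships, two of them to the same neighbour and one to itself.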
Graph Queries
Querying a Graph
– Contextualized “ego-centric” queries
– Start node(s)
– 2 million+ joins per second
Index-free adjacency
Queries: Pattern Matching
[Figure sequence: a pattern is laid over the graph from a start node, showing three matches followed by two non-matches]
Pattern not anchored to start node
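The match and non-match cases can be sketched in a few lines. The example assumes a pattern is just a sequence of relationship types followed outward from the start node, which is far simpler than what a query language such as Cypher supports. The graph data and the function name are hypothetical.

```python
# Hypothetical graph: node -> list of (relationship type, target node).
GRAPH = {
    "alice": [("KNOWS", "bob"), ("KNOWS", "carol")],
    "bob": [("WORKS_AT", "acme")],
    "carol": [("KNOWS", "dave")],
    "dave": [("WORKS_AT", "acme")],
}


def match_pattern(graph, start, rel_types):
    """Return every path from `start` whose relationship types equal `rel_types` in order."""
    paths = [[start]]
    for rel_type in rel_types:
        next_paths = []
        for path in paths:
            # Follow only the relationships stored with the last node of the path.
            for found_type, target in graph.get(path[-1], []):
                if found_type == rel_type:
                    next_paths.append(path + [target])
        paths = next_paths
    return paths


anchored = match_pattern(GRAPH, "alice", ["KNOWS", "WORKS_AT"])
elsewhere = match_pattern(GRAPH, "carol", ["KNOWS", "WORKS_AT"])
```

Here `anchored` holds the single path alice, bob, acme. The path carol, dave, acme has the same shape and exists in the graph, but it is not a match for a query that starts at alice because it is not anchored to that start node.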
Other models to look at
https://github.com/neo4j-contrib/graphgist/wiki
http://docs.neo4j.org/chunked/milestone/data-modeling-examples.html
Technical Overview
Embedded
Server
High Availability
throughput
– Scale vertically for writes
– Every instance is a full copy of the store
– Master is immediately consistent
– Cluster is eventually consistent
Neo4j Architecture
Other Libraries
– Shortest Path
– Shortest Weighted Path
– A*
– Dijkstra
– Custom cost evaluators
– Available in the core distribution
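As an illustration of the shortest weighted path item in the list above, here is a minimal Dijkstra sketch in plain Python. It does not use or mirror the library shipped with Neo4j. The graph layout, the costs and the function name are assumptions made for the example, and all costs must be non-negative.

```python
import heapq

# Hypothetical weighted graph: node -> list of (neighbour, cost).
ROADS = {
    "A": [("B", 4), ("C", 1)],
    "B": [("D", 5)],
    "C": [("B", 2)],
}


def shortest_weighted_path(graph, source, target):
    """Dijkstra's algorithm; returns (total_cost, path) or None if unreachable."""
    queue = [(0, source, [source])]   # (cost so far, node, path taken)
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in settled:
            continue                  # a cheaper route to this node was already expanded
        settled.add(node)
        for neighbour, weight in graph.get(node, []):
            if neighbour not in settled:
                heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return None


result = shortest_weighted_path(ROADS, "A", "D")
```

In this graph the direct route A, B, D costs 9, while the detour through C costs 8, so `result` is the detour. A custom cost evaluator would correspond to computing `weight` from relationship properties instead of reading a fixed number.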
– Geospatial data
– 3rd party library
– Used in Telco production systems
– https://github.com/neo4j/spatial
Spring Data Neo4j
– Object state persisted to graph and SQL database
– Distributed transactions
Case Studies
Background Business problem
carriers and couriers. Calculate multiple routing
possible routes
same-day delivery, consumer-to-consumer shipping (www.shutl.it) and more predictable delivery times
Solution & Benefits
possible routes in real time for every order
than the prior MySQL solution
to-market & code quality
previously not possible, and to easily extend the platform
Industry: Retail
Use case: Retail & C2C Delivery
San Francisco & London
Quick & predictable delivery is an important competitive cornerstone
acquired U.K.-based Shutl to form the core of a new delivery service, launching eBay Now (www.ebay.com/now) prior to Christmas 2013
same-day delivery, with 70% of the market
Background Business problem Solution & Benefits
second screen applications to end-users, advertisers and broadcasters
reinvent TV since the advent of … TV.
implement and query their electronic program guide data
representation
channels/broadcasters/programs does not complicate the model unnecessarily
milliseconds (Neo4j 2.0 traversal)
Industry: Media
Use case: Master Data Management (Television EPG Data)
London, UK
broadcasters and more shows were being added
applications - a key strategic disadvantage in a fast-moving industry
to explode
“make or break” with respect to Zeebox's offering and market position
Industry: Online Job Search
Use case: Social / Recommendations
anonymized inside information to job seekers
Business problem
through personal & professional connections
competitive market
Solution & Benefits
their network of Facebook friends
graph
brought online as graph size and load have increased
[Figure: graph of Person and Company nodes connected by KNOWS and WORKS_AT relationships]
Background
Sausalito, CA
Industry: Communications
Use case: Network Management
Background
Business problem
because of the need to model network impacts
resilience during unplanned network outages
for additional redundancy
daily changes to network infrastructure
Solution & Benefits
modeling, aggregation & troubleshooting
network
new applications to access network data
mapping between the real world and the graph
requirements
[Figure: network graph of Service, Router, Switch, Fiber Link and Oceanfloor Cable nodes connected by DEPENDS_ON and LINKED relationships]
Paris, France
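Network impact analysis on a graph like this comes down to following DEPENDS_ON relationships backwards from a failed element. A minimal sketch in plain Python, assuming a made-up topology: the element names and the function name are hypothetical, and only the relationship type and the kinds of element come from the slide.

```python
from collections import deque

# Hypothetical topology: each pair reads "first DEPENDS_ON second".
DEPENDS_ON = [
    ("service-1", "router-1"),
    ("router-1", "switch-1"),
    ("router-1", "switch-2"),
    ("switch-1", "fiber-link-1"),
    ("switch-2", "fiber-link-2"),
    ("fiber-link-1", "oceanfloor-cable-1"),
]


def impacted_by(failed, depends_on):
    """Return every element that directly or transitively depends on `failed`."""
    # Invert the relationship so each element maps to the things that depend on it.
    dependents = {}
    for element, dependency in depends_on:
        dependents.setdefault(dependency, []).append(element)

    impacted = set()
    queue = deque([failed])
    while queue:
        for element in dependents.get(queue.popleft(), []):
            if element not in impacted:
                impacted.add(element)
                queue.append(element)
    return impacted


cable_outage = impacted_by("oceanfloor-cable-1", DEPENDS_ON)
```

In this toy topology a cut in the ocean-floor cable reaches the fiber link, one switch, the router and the service, while `fiber-link-2` and `switch-2` are unaffected.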
Background Business Problem Solution & Benefits
clustering
routing
much better than relational
Industry: Logistics
Use case: Parcel routing
to any point
Learning more
http://stackoverflow.com/questions/tagged/neo4j
http://groups.google.com/group/neo4j
Free Online Course
http://www.neo4j.org/learn/online_course
Graph Databases Book
www.graphdatabases.com
Neo4j 2.0 by Michael Hunger
http://info.neotechnology.com/Neo4j20_de.html
Any questions?