A Little Graph Theory for the Busy Developer: Dr. Jim Webber, Chief Scientist, Neo Technology



slide-1
SLIDE 1

A Little Graph Theory for the Busy Developer

  • Dr. Jim Webber

Chief Scientist, Neo Technology @jimwebber

slide-2
SLIDE 2

Roadmap

  • Imprisoned data
  • Graph models
  • Graph theory

– Local properties, global behaviours
– Predictive techniques

  • Graph matching

– Predictive, real-time analytics for fun and profit

  • Fin
slide-3
SLIDE 3
slide-4
SLIDE 4
slide-5
SLIDE 5

http://www.flickr.com/photos/crazyneighborlady/355232758/

slide-6
SLIDE 6

http://gallery.nen.gov.uk/image82582-.html

slide-7
SLIDE 7

http://www.xtranormal.com/watch/6995033/mongo-db-is-web-scale

slide-8
SLIDE 8

Aggregate-Oriented Data

http://martinfowler.com/bliki/AggregateOrientedDatabase.html

“There is a significant downside - the whole approach works really well when data access is aligned with the aggregates, but what if you want to look at the data in a different way? Order entry naturally stores orders as aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the database is that it allows you to slice and dice your data different ways for different audiences. This is why aggregate-oriented stores talk so much about map-reduce.”

slide-9
SLIDE 9
slide-10
SLIDE 10

complexity = f(size, connectedness, uniformity)

slide-11
SLIDE 11

http://www.bbc.co.uk/london/travel/downloads/tube_map.html

slide-12
SLIDE 12

Property graphs

  • Property graph model:

– Nodes with properties
– Named, directed relationships with properties
– Relationships have exactly one start and end node

  • Which may be the same node
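The model above can be sketched in a few lines of Python. This is only an illustration of the three rules on the slide, not how Neo4j stores data; the class names, the `relate` and `outgoing` helpers, and the `LOVES` and `KNOWS` relationship names are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Node:
    # Nodes carry arbitrary key/value properties.
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    name: str    # relationships are named, e.g. "LOVES" (hypothetical)
    start: Node  # exactly one start node
    end: Node    # exactly one end node; may be the same object as start
    properties: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    def __init__(self) -> None:
        self.nodes: List[Node] = []
        self.relationships: List[Relationship] = []

    def add_node(self, **properties: Any) -> Node:
        node = Node(dict(properties))
        self.nodes.append(node)
        return node

    def relate(self, start: Node, name: str, end: Node, **properties: Any) -> Relationship:
        rel = Relationship(name, start, end, dict(properties))
        self.relationships.append(rel)
        return rel

    def outgoing(self, node: Node, name: str) -> List[Node]:
        # Follow relationships in their stored direction only.
        return [r.end for r in self.relationships
                if r.start is node and r.name == name]


graph = PropertyGraph()
doctor = graph.add_node(name="the Doctor", species="Time Lord")
rose = graph.add_node(first_name="Rose", last_name="Tyler")
graph.relate(doctor, "LOVES", rose)    # directed: doctor -> rose
graph.relate(doctor, "KNOWS", doctor)  # start and end are the same node
```

Direction is part of the data here: `graph.outgoing(rose, "LOVES")` is empty, because the relationship was stored from the Doctor to Rose.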
slide-13
SLIDE 13

Property Graph Model

slide-14
SLIDE 14

Property Graph Model

slide-15
SLIDE 15

Property Graph Model

name: the Doctor age: 907 species: Time Lord
first name: Rose last name: Tyler
vehicle: tardis model: Type 40

slide-16
SLIDE 16

Labeled Property Graph Model

(Neo4j 2.0)

name: the Doctor age: 907 species: Time Lord
first name: Rose last name: Tyler
vehicle: tardis model: Type 40
Label:human Label:timelord

slide-17
SLIDE 17

Property graphs are very whiteboard-friendly

slide-18
SLIDE 18

http://blogs.adobe.com/digitalmarketing/analytics/predictive-analytics/predictive-analytics-and-the-digital-marketer/

slide-19
SLIDE 19

http://en.wikipedia.org/wiki/File:Leonhard_Euler_2.jpg

Meet Leonhard Euler

  • Swiss mathematician
  • Inventor of Graph Theory (1736)

slide-20
SLIDE 20

http://en.wikipedia.org/wiki/Seven_Bridges_of_Königsberg

slide-21
SLIDE 21
slide-22
SLIDE 22

Triadic Closure

name: Kyle name: Stan name: Kenny

slide-23
SLIDE 23

Triadic Closure

name: Kyle name: Stan name: Kenny name: Kyle name: Stan name: Kenny

FRIEND
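As a rough sketch of triadic closure as a predictive technique: find every "open" triad, where one person is friends with two others who are not yet linked, and treat the missing link as a likely future FRIEND relationship. The function name and the choice of Stan as the shared friend are assumptions for illustration; the slides only show the three names.

```python
from itertools import combinations


def open_triads(friends):
    """Return (a, b, c) where b is friends with both a and c,
    but a and c are not yet friends with each other."""
    result = []
    for b, neighbours in friends.items():
        for a, c in combinations(sorted(neighbours), 2):
            if c not in friends.get(a, set()):
                result.append((a, b, c))
    return result


# Assumed starting point: Stan is friends with Kyle and with Kenny.
friends = {
    "Stan": {"Kyle", "Kenny"},
    "Kyle": {"Stan"},
    "Kenny": {"Stan"},
}

print(open_triads(friends))  # [('Kenny', 'Stan', 'Kyle')]
```

Each tuple is a prediction: the first and last person are likely to become friends through the one in the middle.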

slide-24
SLIDE 24

Structural Balance

name: Cartman name: Craig name: Tweek

slide-25
SLIDE 25

Structural Balance

name: Cartman name: Craig name: Tweek name: Cartman name: Craig name: Tweek

FRIEND

slide-26
SLIDE 26

Structural Balance

name: Cartman name: Craig name: Tweek name: Cartman name: Craig name: Tweek

ENEMY

slide-27
SLIDE 27

Structural Balance

name: Kyle name: Stan name: Kenny name: Kyle name: Stan name: Kenny

FRIEND

slide-28
SLIDE 28

Structural Balance is a key predictive technique

And it’s domain-agnostic
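A minimal sketch of the balance rule for a single triangle, assuming the usual encoding of FRIEND as +1 and ENEMY as -1: a triangle is balanced when the product of its three signs is positive, that is, three mutual friends, or two friends with a common enemy. The constant and function names are illustrative.

```python
FRIEND, ENEMY = 1, -1


def is_balanced_triangle(ab, bc, ac):
    """True when the three signed relationships form a balanced triangle."""
    return ab * bc * ac > 0


print(is_balanced_triangle(FRIEND, FRIEND, FRIEND))  # True: all friends
print(is_balanced_triangle(FRIEND, ENEMY, ENEMY))    # True: two friends, one common enemy
print(is_balanced_triangle(FRIEND, FRIEND, ENEMY))   # False: friend's friend is an enemy
print(is_balanced_triangle(ENEMY, ENEMY, ENEMY))     # False: three mutual enemies
```

Nothing in the rule refers to people, which is why the same check works in any domain with positive and negative relationships.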

slide-29
SLIDE 29

Allies and Enemies

UK Germany France Russia Italy Austria

slide-30
SLIDE 30

Allies and Enemies

UK Germany France Russia Italy Austria

slide-31
SLIDE 31

Allies and Enemies

UK Germany France Russia Italy Austria

slide-32
SLIDE 32

Allies and Enemies

UK Germany France Russia Italy Austria

slide-33
SLIDE 33

Allies and Enemies

UK Germany France Russia Italy Austria

slide-34
SLIDE 34

Allies and Enemies

UK Germany France Russia Italy Austria

slide-35
SLIDE 35

Predicting WWI

[Easley and Kleinberg]
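The same balance test can be applied to a whole network by checking every triangle. The sketch below assumes the well-known final alignment of the six powers into two blocs (UK, France and Russia on one side; Germany, Austria and Italy on the other) and encodes ally as +1 and enemy as -1. The intermediate configurations from the slides are not reproduced, and the function name is illustrative.

```python
from itertools import combinations

countries = ["UK", "France", "Russia", "Germany", "Austria", "Italy"]
entente = {"UK", "France", "Russia"}  # assumed bloc membership

# Allies (+1) if both countries are in the same bloc, enemies (-1) otherwise.
signs = {}
for a, b in combinations(countries, 2):
    same_side = (a in entente) == (b in entente)
    signs[frozenset((a, b))] = 1 if same_side else -1


def unbalanced_triangles(nodes, signs):
    """Return every triangle whose sign product is negative."""
    bad = []
    for a, b, c in combinations(nodes, 3):
        product = (signs[frozenset((a, b))]
                   * signs[frozenset((b, c))]
                   * signs[frozenset((a, c))])
        if product < 0:
            bad.append((a, b, c))
    return bad


print(unbalanced_triangles(countries, signs))  # []
```

Two opposing blocs are always balanced, so the list is empty. Flipping a single relationship, for example making UK and Germany allies, would leave every triangle containing that pair unbalanced.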

slide-36
SLIDE 36

Strong Triadic Closure

If a node has strong relationships to two neighbours, then these neighbours must have at least a weak relationship between them. [Wikipedia]
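One way to read the property as code: a node violates it when two of its strong neighbours have no relationship at all between them. The sketch below is an assumption-laden illustration; the `ties` structure, the "strong"/"weak" labels and the specific ties between the characters are made up for the example.

```python
from itertools import combinations


def stc_violations(ties):
    """ties maps frozenset({a, b}) to "strong" or "weak".
    Returns (node, x, y) where node has strong ties to x and y,
    but x and y share no tie of any strength."""
    strong_neighbours = {}
    for pair, strength in ties.items():
        if strength == "strong":
            a, b = tuple(pair)
            strong_neighbours.setdefault(a, set()).add(b)
            strong_neighbours.setdefault(b, set()).add(a)

    violations = []
    for node, neighbours in sorted(strong_neighbours.items()):
        for x, y in combinations(sorted(neighbours), 2):
            if frozenset((x, y)) not in ties:
                violations.append((node, x, y))
    return violations


# Assumed ties: Stan is strongly tied to both Kenny and Cartman.
ties = {
    frozenset(("Stan", "Kenny")): "strong",
    frozenset(("Stan", "Cartman")): "strong",
}
print(stc_violations(ties))  # [('Stan', 'Cartman', 'Kenny')]

# A weak tie between Kenny and Cartman is enough to satisfy the property.
ties[frozenset(("Kenny", "Cartman"))] = "weak"
print(stc_violations(ties))  # []
```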

slide-37
SLIDE 37

Triadic Closure

(weak relationship)

name: Kenny name: Stan name: Cartman

slide-38
SLIDE 38

Triadic Closure

(weak relationship)

name: Kenny name: Stan name: Cartman name: Kenny name: Stan name: Cartman

FRIEND 50%

slide-39
SLIDE 39

Weak relationships

  • Relationships can have “strength” as well as intent

– Think: weighting on a relationship in a property graph

  • Weak links play another super-important structural role in graph theory

– They bridge neighbourhoods

slide-40
SLIDE 40

Local Bridges

FRIEND

name: Kenny name: Stan name: Kyle

FRIEND FRIEND

name: Sally name: Bebe name: Wendy

FRIEND FRIEND 50%

name: Cartman

FRIEND ENEMY

slide-41
SLIDE 41

Local Bridge Property

“If a node A in a network satisfies the Strong Triadic Closure Property and is involved in at least two strong relationships, then any local bridge it is involved in must be a weak relationship.” [Easley and Kleinberg]
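A local bridge can be found mechanically: it is a relationship whose two endpoints have no neighbours in common, so removing it would push them more than two hops apart. The sketch below assumes an undirected friendship graph; the function name and the exact edges (two tight groups joined by a single Stan to Wendy link) are hypothetical and not read off the slide.

```python
def local_bridges(adjacency):
    """Return every edge whose endpoints share no common neighbour."""
    bridges = set()
    for a, neighbours in adjacency.items():
        for b in neighbours:
            if not (adjacency[a] & adjacency[b]):
                bridges.add(frozenset((a, b)))
    return bridges


# Hypothetical edges: two triangles of friends plus one link between them.
edges = [
    ("Kenny", "Stan"), ("Stan", "Kyle"), ("Kyle", "Kenny"),
    ("Sally", "Bebe"), ("Bebe", "Wendy"), ("Wendy", "Sally"),
    ("Stan", "Wendy"),
]
adjacency = {}
for a, b in edges:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)

print(local_bridges(adjacency))  # {frozenset({'Stan', 'Wendy'})}
```

Every edge inside a triangle has the third member as a common neighbour, so only the link between the two groups qualifies.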

slide-42
SLIDE 42

University Karate Club

slide-43
SLIDE 43

Graph Partitioning

  • (NP) Hard problem

– Recursively remove the spanning links between dense regions
– Or recursively merge nodes into ever larger “subgraph” nodes
– Choose your algorithm carefully – some are better than others for a given domain

  • Can use to (almost exactly) predict the break up of the karate club!
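The first approach, removing spanning links between dense regions, can be sketched with edge betweenness in the style of the Girvan-Newman method: repeatedly delete the edge that the most shortest paths run through until the graph falls apart. The slides do not say which algorithm was used for the karate club, so treat this as one plausible instance. The toy graph and all function names are assumptions; it is not the karate club data.

```python
from collections import deque


def edge_betweenness(adj):
    """Brandes-style edge betweenness for an unweighted, undirected graph."""
    score = {frozenset((a, b)): 0.0 for a in adj for b in adj[a]}
    for source in adj:
        # Breadth-first search, counting shortest paths from source.
        dist, paths = {source: 0}, {source: 1}
        preds = {n: [] for n in adj}
        order, queue = [], deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    paths[w] = 0
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Push path credit back from the farthest nodes towards source.
        credit = {n: 0.0 for n in order}
        for w in reversed(order):
            for v in preds[w]:
                share = paths[v] / paths[w] * (1.0 + credit[w])
                score[frozenset((v, w))] += share
                credit[v] += share
    return score


def components(adj):
    """Connected components as a list of node sets."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            n = stack.pop()
            if n not in group:
                group.add(n)
                stack.extend(adj[n] - group)
        seen |= group
        result.append(group)
    return result


def split_once(adj):
    """Remove highest-betweenness edges until the graph splits."""
    adj = {n: set(ns) for n, ns in adj.items()}  # work on a copy
    while len(components(adj)) == 1 and any(adj.values()):
        score = edge_betweenness(adj)
        a, b = max(score, key=score.get)
        adj[a].discard(b)
        adj[b].discard(a)
    return components(adj)


# Toy graph: two triangles joined by the single edge c-d.
graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
print(sorted(sorted(part) for part in split_once(graph)))
# [['a', 'b', 'c'], ['d', 'e', 'f']]
```

Calling `split_once` again on each part gives the recursive version described in the bullet.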

slide-44
SLIDE 44

University Karate Clubs

(predicted by Graph Theory)


slide-45
SLIDE 45

University Karate Clubs

(what actually happened!)

slide-46
SLIDE 46
slide-47
SLIDE 47

Cypher

  • Declarative graph pattern matching language

– “SQL for graphs”
– Columnar results

  • Supports graph matching queries

– And aggregation, ordering and limit, etc.
– Mutation

slide-48
SLIDE 48

Cypher is Declarative

  • Imperative

– follow relationship
– breadth-first vs depth-first
– explicit algorithm

  • Declarative

– specify starting point
– specify desired outcome
– algorithm adaptable, based on query
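For contrast with the declarative style, here is what the imperative side looks like when written by hand: the code itself decides to follow relationships breadth-first and spells out the algorithm. This is a generic sketch over a plain adjacency dictionary, not a Neo4j traversal API; the function name and the sample graph are illustrative.

```python
from collections import deque


def breadth_first_depths(adjacency, start):
    """Explicit breadth-first traversal: map each reachable node to its hop count."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in depths:  # first visit is the shortest route
                depths[neighbour] = depths[node] + 1
                queue.append(neighbour)
    return depths


adjacency = {"A": ["B", "C"], "B": ["C", "D"], "C": [], "D": []}
print(breadth_first_depths(adjacency, "A"))
# {'A': 0, 'B': 1, 'C': 1, 'D': 2}
```

Switching to depth-first would mean rewriting the loop; in a declarative query that choice is left to the engine.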

slide-49
SLIDE 49

Cypher is a pattern matching language

A C B

slide-50
SLIDE 50

Un-named Nodes & Rels

() --> ()

slide-51
SLIDE 51

Un-named Relationship

(A) --> (B)

B A

slide-52
SLIDE 52

ASCII Art Patterns

A -[:LOVES]-> B

LOVES

B A

slide-53
SLIDE 53

ASCII Art Patterns

A --> B --> C

C B A

slide-54
SLIDE 54

ASCII Art Patterns

A --> B --> C, A --> C
A --> B --> C <-- A

A C B
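To make the idea of pattern matching concrete, the sketch below finds every binding of (a, b, c) that satisfies `a --> b --> c, a --> c` in a small directed graph. It is a naive nested-loop matcher written for illustration, not how Cypher evaluates patterns, and it simplifies by requiring three distinct nodes. The function name and sample data are assumptions.

```python
def match_triangle(out_edges):
    """Bindings (a, b, c) for the pattern a-->b-->c, a-->c."""
    matches = []
    for a in sorted(out_edges):
        for b in sorted(out_edges[a]):
            for c in sorted(out_edges.get(b, set())):
                # The third leg a-->c must exist; nodes must be distinct.
                if c in out_edges[a] and len({a, b, c}) == 3:
                    matches.append((a, b, c))
    return matches


out_edges = {
    "A": {"B", "C"},
    "B": {"C"},
    "C": set(),
}
print(match_triangle(out_edges))  # [('A', 'B', 'C')]
```

Both spellings on the slide describe the same shape, so they produce the same bindings.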

slide-55
SLIDE 55

Variable Length Paths

A -[*]-> B

A B A B A B

slide-56
SLIDE 56

Optional Relationships

A -[?]-> B

A B

slide-57
SLIDE 57

Example Query

  • The top 5 most frequently appearing companions:

start doctor=node:characters(character = 'Doctor')
match (doctor)<-[:COMPANION_OF]-(companion)-[:APPEARED_IN]->(episode)
return companion.character, count(episode)
order by count(episode) desc
limit 5

– Start node from index
– Subgraph pattern
– Accumulates rows by episode
– Limit returned rows

slide-58
SLIDE 58
slide-59
SLIDE 59

Firstname: Mickey Surname: Smith DoB: 19781006
SKU: 5e175641 Product: Badgers Nadgers Ale
SKU: 2555f258 Product: Peewee Pilsner
Category: beer
SKU: 49d102bc Product: Baby Dry Nights
Category: nappies
Category: baby
Category: alcoholic drinks
SKU: 49d102bc Product: XBox 360
Category: consumer electronics
Category: console

BOUGHT BOUGHT MEMBER_OF MEMBER_OF MEMBER_OF MEMBER_OF MEMBER_OF

slide-60
SLIDE 60
slide-61
SLIDE 61

Firstname: * Surname: * DoB: 1996 > x > 1972 Category: beer Category: nappies

BOUGHT

Category: game console

slide-62
SLIDE 62

Firstname: * Surname: * DoB: 1996 > x > 1972 Category: beer Category: nappies

!BOUGHT

Category: game console

slide-63
SLIDE 63

(beer) (nappies) (console) (daddy) () () ()

slide-64
SLIDE 64

Flatten the graph

(daddy)-[:BOUGHT]->()-[:MEMBER_OF]->(nappies)
(daddy)-[:BOUGHT]->()-[:MEMBER_OF]->(beer)
(daddy)-[b:BOUGHT]->()-[:MEMBER_OF]->(console)

slide-65
SLIDE 65

Wrap in a Cypher MATCH clause

MATCH (daddy)-[:BOUGHT]->()-[:MEMBER_OF]->(nappies),
      (daddy)-[:BOUGHT]->()-[:MEMBER_OF]->(beer),
      (daddy)-[b:BOUGHT]->()-[:MEMBER_OF]->(console)

slide-66
SLIDE 66

Cypher WHERE clause

MATCH (daddy)-[:BOUGHT]->()-[:MEMBER_OF]->(nappies),
      (daddy)-[:BOUGHT]->()-[:MEMBER_OF]->(beer),
      (daddy)-[b:BOUGHT]->()-[:MEMBER_OF]->(console)
WHERE b is null

slide-67
SLIDE 67

Full Cypher query

START beer=node:categories(category='beer'),
      nappies=node:categories(category='nappies'),
      xbox=node:products(product='xbox 360')
MATCH (daddy)-[:BOUGHT]->()-[:MEMBER_OF]->(beer),
      (daddy)-[:BOUGHT]->()-[:MEMBER_OF]->(nappies),
      (daddy)-[b?:BOUGHT]->(xbox)
WHERE b is null
RETURN distinct daddy
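The logic of the query can be mirrored in plain Python to show what is being asked: find people who bought something in the beer category and something in the nappies category, but have no BOUGHT relationship to the XBox 360. The purchase data, the dictionary layout and the function name below are hypothetical and exist only to exercise the logic; the product and category names are taken from the slides.

```python
# Hypothetical purchases: person -> set of products bought.
bought = {
    "Rory Williams": {"Peewee Pilsner", "Baby Dry Nights"},
    "Mickey Smith": {"Badgers Nadgers Ale", "Baby Dry Nights", "XBox 360"},
    "Amy Pond": {"Baby Dry Nights"},
}

# Hypothetical MEMBER_OF data: product -> set of categories.
member_of = {
    "Peewee Pilsner": {"beer"},
    "Badgers Nadgers Ale": {"beer"},
    "Baby Dry Nights": {"nappies"},
    "XBox 360": {"console"},
}


def daddies(bought, member_of, excluded_product="XBox 360"):
    """People who bought beer and nappies but not the excluded product."""
    result = []
    for person, products in sorted(bought.items()):
        categories = set()
        for product in products:
            categories |= member_of.get(product, set())
        wants = {"beer", "nappies"} <= categories      # both patterns match
        has_console = excluded_product in products      # optional b is not null
        if wants and not has_console:
            result.append(person)
    return result


print(daddies(bought, member_of))  # ['Rory Williams']
```

The Cypher version says the same thing as a shape to match, and leaves the looping and set-building to the database.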

slide-68
SLIDE 68

Results

==> +---------------------------------------------+
==> | daddy                                       |
==> +---------------------------------------------+
==> | Node[15]{name:"Rory Williams",dob:19880121} |
==> +---------------------------------------------+
==> 1 row
==> 6 ms
==> neo4j-sh (0)$

slide-69
SLIDE 69
slide-70
SLIDE 70

What are graphs good for?

  • Recommendations
  • Business intelligence
  • Social computing
  • Geospatial
  • MDM
  • Systems management
  • Web of things
  • Genealogy
  • Time series data
  • Product catalogue
  • Web analytics
  • Scientific computing (especially bioinformatics)
  • Indexing your slow RDBMS
  • And much more!
slide-71
SLIDE 71

Free O’Reilly eBook!

Visit: http://GraphDatabases.com

slide-72
SLIDE 72

Thanks for listening

Neo4j: http://neo4j.org
Neo Technology: http://neotechnology.com
Me: @jimwebber

slide-73
SLIDE 73

Neo4j Meetup in Hilversum Next Week