A Little Graph Theory for the Busy Developer, Dr. Jim Webber, Chief Scientist, Neo Technology



slide-1
SLIDE 1

A Little Graph Theory for the Busy Developer

  • Dr. Jim Webber

Chief Scientist, Neo Technology @jimwebber

Session code: 6191

slide-2
SLIDE 2

Roadmap

  • Imprisoned data
  • Graph models
  • Graph theory

– Local properties, global behaviours
– Predictive techniques

  • Graph matching

– Predictive, real-time analytics for fun and profit

  • Fin
slide-3
SLIDE 3
slide-4
SLIDE 4
slide-5
SLIDE 5

http://www.flickr.com/photos/crazyneighborlady/355232758/

slide-6
SLIDE 6

http://gallery.nen.gov.uk/image82582-.html

slide-7
SLIDE 7

http://www.xtranormal.com/watch/6995033/mongo-db-is-web-scale

slide-8
SLIDE 8

Aggregate-Oriented Data

http://martinfowler.com/bliki/AggregateOrientedDatabase.html

“There is a significant downside - the whole approach works really well when data access is aligned with the aggregates, but what if you want to look at the data in a different way? Order entry naturally stores orders as aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the database is that it allows you to slice and dice your data different ways for different audiences. This is why aggregate-oriented stores talk so much about map-reduce.”

slide-9
SLIDE 9
slide-10
SLIDE 10

complexity = f(size, connectedness, uniformity)

slide-11
SLIDE 11

http://www.bbc.co.uk/london/travel/downloads/tube_map.html

slide-12
SLIDE 12

Property graphs

  • Property graph model:

– Nodes with properties
– Named, directed relationships with properties
– Relationships have exactly one start and end node

  • Which may be the same node
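The model above is small enough to sketch as two record types. This is only an illustration in Python, not how Neo4j stores data; the relationship names OWNS and KNOWS are hypothetical, since the slides do not label them.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Nodes carry arbitrary key/value properties.
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    name: str    # relationships are named
    start: Node  # directed: exactly one start node
    end: Node    # and exactly one end node
    properties: dict = field(default_factory=dict)


doctor = Node({"name": "the Doctor", "age": 907, "species": "Time Lord"})
rose = Node({"first name": "Rose", "last name": "Tyler"})
tardis = Node({"vehicle": "tardis", "model": "Type 40"})

relationships = [
    Relationship("COMPANION_OF", rose, doctor),
    Relationship("OWNS", doctor, tardis),    # hypothetical name
    Relationship("KNOWS", doctor, doctor),   # hypothetical; start and end may be the same node
]

# Relationships whose start and end are the very same node object.
self_loops = [r.name for r in relationships if r.start is r.end]
```

Keeping properties as plain dictionaries mirrors the schema-free flavour of the model: two nodes need not share any keys.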
slide-13
SLIDE 13

Property Graph Model

slide-14
SLIDE 14

Property Graph Model

slide-15
SLIDE 15

Property Graph Model

name: the Doctor, age: 907, species: Time Lord
first name: Rose, last name: Tyler
vehicle: tardis, model: Type 40

slide-16
SLIDE 16

Property graphs are very whiteboard-friendly

slide-17
SLIDE 17

http://blogs.adobe.com/digitalmarketing/analytics/predictive-analytics/predictive-analytics-and-the-digital-marketer/

slide-18
SLIDE 18

http://en.wikipedia.org/wiki/File:Leonhard_Euler_2.jpg

Meet Leonhard Euler

  • Swiss mathematician
  • Inventor of Graph Theory (1736)

slide-19
SLIDE 19

http://en.wikipedia.org/wiki/Seven_Bridges_of_Königsberg

slide-20
SLIDE 20
slide-21
SLIDE 21

Triadic Closure

name: Kyle name: Stan name: Kenny

slide-22
SLIDE 22

Triadic Closure

name: Kyle name: Stan name: Kenny name: Kyle name: Stan name: Kenny

FRIEND
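Triadic closure is the tendency for two nodes that share a neighbour to become connected themselves. A minimal sketch of using it predictively follows: list every "open" triad, since each is a candidate for a future relationship. The starting friendships are an assumption, as the slide's diagram edges do not survive in this transcript.

```python
from itertools import combinations


def open_triads(friends):
    """Return (a, b, via) triples where a and b share the neighbour `via`
    but are not yet connected to each other."""
    found = []
    for via, neighbours in friends.items():
        for a, b in combinations(sorted(neighbours), 2):
            if b not in friends[a]:
                found.append((a, b, via))
    return sorted(found)


# Assumed starting point: Stan is friends with both Kyle and Kenny.
friends = {
    "Stan": {"Kyle", "Kenny"},
    "Kyle": {"Stan"},
    "Kenny": {"Stan"},
}

# Each open triad predicts a relationship that is likely to form.
predicted = open_triads(friends)
```

Here the only prediction is that Kenny and Kyle will become connected via Stan.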

slide-23
SLIDE 23

Structural Balance

name: Cartman name: Craig name: Tweek

slide-24
SLIDE 24

Structural Balance

name: Cartman name: Craig name: Tweek name: Cartman name: Craig name: Tweek

FRIEND

slide-25
SLIDE 25

Structural Balance

name: Cartman name: Craig name: Tweek name: Cartman name: Craig name: Tweek

ENEMY

slide-26
SLIDE 26

Structural Balance

name: Kyle name: Stan name: Kenny name: Kyle name: Stan name: Kenny

FRIEND

slide-27
SLIDE 27

Structural Balance is a key predictive technique

And it’s domain-agnostic
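One common way to state structural balance: a triangle is balanced when all three relationships are friendly, or when exactly one is (two friends with a shared enemy). Equivalently, the product of the three signs is positive. The sketch below checks that on a fully connected signed graph; the particular signs given to Cartman, Craig and Tweek are assumptions for illustration.

```python
from itertools import combinations

FRIEND, ENEMY = 1, -1


def unbalanced_triangles(nodes, signs):
    """Return triangles whose sign product is negative.

    `signs` maps frozenset({a, b}) to FRIEND or ENEMY and is assumed to
    contain every pair of nodes (a complete signed graph).
    """
    bad = []
    for a, b, c in combinations(sorted(nodes), 3):
        product = (signs[frozenset((a, b))]
                   * signs[frozenset((b, c))]
                   * signs[frozenset((a, c))])
        if product < 0:
            bad.append((a, b, c))
    return bad


nodes = {"Cartman", "Craig", "Tweek"}
signs = {
    frozenset(("Cartman", "Craig")): FRIEND,
    frozenset(("Cartman", "Tweek")): FRIEND,
    frozenset(("Craig", "Tweek")): ENEMY,   # assumed: two friends of Cartman who are enemies
}

before = unbalanced_triangles(nodes, signs)    # the triangle is under tension
signs[frozenset(("Craig", "Tweek"))] = FRIEND  # one way to resolve it
after = unbalanced_triangles(nodes, signs)
```

The predictive use is that unbalanced triangles tend not to last: some relationship in them is expected to flip.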

slide-28
SLIDE 28

Allies and Enemies

UK Germany France Russia Italy Austria

slide-29
SLIDE 29

Allies and Enemies

UK Germany France Russia Italy Austria

slide-30
SLIDE 30

Allies and Enemies

UK Germany France Russia Italy Austria

slide-31
SLIDE 31

Allies and Enemies

UK Germany France Russia Italy Austria

slide-32
SLIDE 32

Allies and Enemies

UK Germany France Russia Italy Austria

slide-33
SLIDE 33

Allies and Enemies

UK Germany France Russia Italy Austria

slide-34
SLIDE 34

Predicting WWI

[Easley and Kleinberg]

slide-35
SLIDE 35

Strong Triadic Closure

If a node has strong relationships to two neighbours, then these neighbours must have at least a weak relationship between them. [Wikipedia]
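The property can be checked mechanically. The sketch below reports every node with two strong relationships whose other ends have no relationship at all; the choice of Stan as the node with two strong ties is an assumption, not something the slides state.

```python
from itertools import combinations


def stc_violations(strong, weak):
    """Return (node, n1, n2) where node has strong ties to n1 and n2,
    but n1 and n2 share neither a strong nor a weak tie.

    `strong` and `weak` are sets of frozenset({a, b}) pairs.
    """
    strong_neighbours = {}
    for edge in strong:
        a, b = sorted(edge)
        strong_neighbours.setdefault(a, set()).add(b)
        strong_neighbours.setdefault(b, set()).add(a)

    linked = strong | weak
    violations = []
    for node in sorted(strong_neighbours):
        for n1, n2 in combinations(sorted(strong_neighbours[node]), 2):
            if frozenset((n1, n2)) not in linked:
                violations.append((node, n1, n2))
    return violations


# Assumed: Stan has strong relationships with Kenny and with Cartman.
strong = {frozenset(("Stan", "Kenny")), frozenset(("Stan", "Cartman"))}

violations_before = stc_violations(strong, weak=set())
violations_after = stc_violations(
    strong, weak={frozenset(("Kenny", "Cartman"))}
)
```

A single weak Kenny/Cartman relationship is enough to satisfy the property.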

slide-36
SLIDE 36

Triadic Closure

(weak relationship)

name: Kenny name: Stan name: Cartman

slide-37
SLIDE 37

Triadic Closure

(weak relationship)

name: Kenny name: Stan name: Cartman name: Kenny name: Stan name: Cartman

FRIEND 50%

slide-38
SLIDE 38

Weak relationships

  • Relationships can have “strength” as well as intent

– Think: weighting on a relationship in a property graph

  • Weak links play another super-important structural role in graph theory

– They bridge neighbourhoods

slide-39
SLIDE 39

Local Bridges

FRIEND

name: Kenny name: Stan name: Kyle

FRIEND FRIEND

name: Sally name: Bebe name: Wendy

FRIEND FRIEND 50%

name: Cartman

FRIEND ENEMY

slide-40
SLIDE 40

Local Bridge Property

“If a node A in a network satisfies the Strong Triadic Closure Property and is involved in at least two strong relationships, then any local bridge it is involved in must be a weak relationship.” [Easley and Kleinberg]
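A local bridge is a relationship whose two endpoints have no neighbours in common, so removing it leaves them more than two hops apart. The sketch below finds such edges; the edge list is a hypothetical simplification of the diagram (two tight groups joined by one link), since the actual wiring is not recoverable from this transcript.

```python
def local_bridges(edges):
    """Return edges (as sorted tuples) whose endpoints share no neighbour."""
    adjacent = {}
    for a, b in edges:
        adjacent.setdefault(a, set()).add(b)
        adjacent.setdefault(b, set()).add(a)
    return sorted(
        tuple(sorted((a, b)))
        for a, b in edges
        if not (adjacent[a] & adjacent[b])
    )


# Hypothetical edges: two triangles joined by a single Kyle/Wendy link.
edges = [
    ("Kenny", "Stan"), ("Stan", "Kyle"), ("Kenny", "Kyle"),
    ("Sally", "Bebe"), ("Bebe", "Wendy"), ("Sally", "Wendy"),
    ("Kyle", "Wendy"),
]

bridges = local_bridges(edges)
```

Only the Kyle/Wendy link qualifies, which fits the claim that weak links are the ones bridging neighbourhoods.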

slide-41
SLIDE 41

University Karate Club

slide-42
SLIDE 42

Graph Partitioning

  • (NP) Hard problem

– Recursively remove the spanning links between dense regions
– Or recursively merge nodes into ever larger “subgraph” nodes
– Choose your algorithm carefully: some are better than others for a given domain

  • Can use to (almost exactly) predict the break up of the karate club!
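The "remove the spanning links" approach can be sketched in the spirit of the Girvan-Newman method: repeatedly delete the edge that the most shortest paths run through until the graph falls apart. This is a toy illustration on a hypothetical six-member club, not the karate club data and not a tuned implementation.

```python
from collections import deque


def edge_betweenness(adj):
    """Shortest-path edge betweenness (Brandes-style) for an unweighted,
    undirected graph. Each pair is counted from both ends."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for source in adj:
        dist = {source: 0}
        paths = {n: 0 for n in adj}   # number of shortest paths from source
        paths[source] = 1
        preds = {n: [] for n in adj}
        order = []
        queue = deque([source])
        while queue:                  # breadth-first search
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        credit = {n: 0.0 for n in adj}
        for w in reversed(order):     # push path credit back toward source
            for v in preds[w]:
                share = paths[v] / paths[w] * (1 + credit[w])
                score[frozenset((v, w))] += share
                credit[v] += share
    return score


def components(adj):
    """Connected components as a list of sets."""
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        part, stack = set(), [start]
        while stack:
            n = stack.pop()
            if n not in part:
                part.add(n)
                stack.extend(adj[n] - part)
        seen |= part
        parts.append(part)
    return parts


def split_in_two(edges):
    """Remove highest-betweenness edges until the graph disconnects."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    while len(components(adj)) < 2 and any(adj.values()):
        score = edge_betweenness(adj)
        a, b = max(score, key=score.get)
        adj[a].discard(b)
        adj[b].discard(a)
    return components(adj)


# Hypothetical club: two tight groups joined only by the c-d link.
EDGES = [("a", "b"), ("b", "c"), ("a", "c"),
         ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]
factions = split_in_two(EDGES)
```

On this toy graph the c-d link carries every path between the two groups, so it is removed first and the two triangles remain as the predicted factions.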

slide-43
SLIDE 43

University Karate Clubs

slide-44
SLIDE 44
slide-45
SLIDE 45

Cypher

  • Declarative graph pattern matching language

– “SQL for graphs”
– Columnar results

  • Supports graph matching queries

– And aggregation, ordering and limit, etc.
– Mutation

slide-46
SLIDE 46

Cypher is Declarative

  • Imperative

– follow relationship
– breadth-first vs depth-first
– explicit algorithm

  • Declarative

– specify starting point
– specify desired outcome
– algorithm adaptable, based on query

slide-47
SLIDE 47

Cypher is a pattern matching language

A C B

slide-48
SLIDE 48

Un-named Nodes & Rels

() --> ()

slide-49
SLIDE 49

Un-named Relationship

(A) --> (B)

B A

slide-50
SLIDE 50

ASCII Art Patterns

A -[:LOVES]-> B

LOVES

B A

slide-51
SLIDE 51

ASCII Art Patterns

A --> B --> C

C B A

slide-52
SLIDE 52

ASCII Art Patterns

A --> B --> C, A --> C
A --> B --> C <-- A

A C B
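To see what such a pattern asks for, here is a rough imperative equivalent in Python: find every binding of (a, b, c) such that a points to b, b points to c, and a also points to c. It illustrates the idea only and is not how Cypher evaluates patterns; the edge set is made up.

```python
def match_triangle(edges):
    """Return bindings (a, b, c) for the pattern a --> b --> c, a --> c."""
    outgoing = {}
    for start, end in edges:
        outgoing.setdefault(start, set()).add(end)

    matches = []
    for a in sorted(outgoing):
        for b in sorted(outgoing[a]):
            for c in sorted(outgoing.get(b, ())):
                if c in outgoing[a]:      # the extra a --> c leg
                    matches.append((a, b, c))
    return matches


# Made-up directed edges; only A, B, C form the pattern.
edges = {("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")}
matches = match_triangle(edges)
```

The nested loops are exactly the "explicit algorithm" that a declarative pattern lets you leave to the query engine.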

slide-53
SLIDE 53

Variable Length Paths

A -[*]-> B

A B A B A B

slide-54
SLIDE 54

Optional Relationships

A -[?]-> B

A B

slide-55
SLIDE 55

Example Query

  • The top 5 most frequently appearing companions:

start doctor=node:characters(character = 'Doctor')
match (doctor)<-[:COMPANION_OF]-(companion)-[:APPEARED_IN]->(episode)
return companion.character, count(episode)
order by count(episode) desc
limit 5

– Start node from index
– Subgraph pattern
– Accumulates rows by episode
– Limit returned rows
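Read as plain data processing, the query above counts episodes per companion, sorts by that count in descending order, and keeps five rows. A sketch of the same steps in Python, over hypothetical toy data (the episode identifiers and appearance counts are invented):

```python
from collections import Counter

# Hypothetical stand-in for (companion)-[:COMPANION_OF]->(doctor).
companions_of_doctor = {"Rose Tyler", "Mickey Smith", "Rory Williams"}

# Hypothetical stand-in for (companion)-[:APPEARED_IN]->(episode).
appeared_in = [
    ("Rose Tyler", "ep1"), ("Rose Tyler", "ep2"), ("Rose Tyler", "ep3"),
    ("Mickey Smith", "ep1"), ("Mickey Smith", "ep3"),
    ("Rory Williams", "ep4"),
]

# return companion.character, count(episode)
counts = Counter(
    companion
    for companion, _episode in appeared_in
    if companion in companions_of_doctor
)

# order by count(episode) desc, limit 5
top5 = counts.most_common(5)
```

With fewer than five companions in the toy data, all three come back, most frequent first.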

slide-56
SLIDE 56
slide-57
SLIDE 57

Firstname: Mickey, Surname: Smith, DoB: 19781006
SKU: 5e175641, Product: Badgers Nadgers Ale
SKU: 2555f258, Product: Peewee Pilsner
Category: beer
SKU: 49d102bc, Product: Baby Dry Nights
Category: nappies
Category: baby
Category: alcoholic drinks
SKU: 49d102bc, Product: XBox 360
Category: consumer electronics
Category: console

BOUGHT BOUGHT MEMBER_OF MEMBER_OF MEMBER_OF MEMBER_OF MEMBER_OF

slide-58
SLIDE 58
slide-59
SLIDE 59

Firstname: * Surname: * DoB: 1996 > x > 1972 Category: beer Category: nappies

BOUGHT

Category: game console

slide-60
SLIDE 60

Firstname: * Surname: * DoB: 1996 > x > 1972 Category: beer Category: nappies

!BOUGHT

Category: game console

slide-61
SLIDE 61

(beer) (nappies) (console) (daddy) () () ()

slide-62
SLIDE 62

Flatten the graph

(daddy)-[:BOUGHT]->()-[:MEMBER_OF]->(nappies)
(daddy)-[:BOUGHT]->()-[:MEMBER_OF]->(beer)
(daddy)-[b:BOUGHT]->()-[:MEMBER_OF]->(console)

slide-63
SLIDE 63

Wrap in a Cypher MATCH clause

MATCH (daddy)-[:BOUGHT]->()-[:MEMBER_OF]->(nappies),
      (daddy)-[:BOUGHT]->()-[:MEMBER_OF]->(beer),
      (daddy)-[b:BOUGHT]->()-[:MEMBER_OF]->(console)

slide-64
SLIDE 64

Cypher WHERE clause

MATCH (daddy)-[:BOUGHT]->()-[:MEMBER_OF]->(nappies),
      (daddy)-[:BOUGHT]->()-[:MEMBER_OF]->(beer),
      (daddy)-[b:BOUGHT]->()-[:MEMBER_OF]->(console)
WHERE b is null

slide-65
SLIDE 65

Full Cypher query

START beer=node:categories(category='beer'),
      nappies=node:categories(category='nappies'),
      xbox=node:products(product='xbox 360')
MATCH (daddy)-[:BOUGHT]->()-[:MEMBER_OF]->(beer),
      (daddy)-[:BOUGHT]->()-[:MEMBER_OF]->(nappies),
      (daddy)-[b?:BOUGHT]->(xbox)
WHERE b is null
RETURN distinct daddy
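The shape of this query is two required paths plus one relationship that must be absent. A sketch of the same logic in Python over hypothetical purchase data follows; who bought what is invented for illustration, and only the product and category names come from the slides.

```python
# Hypothetical stand-in for (person)-[:BOUGHT]->(product).
bought = {
    "Mickey Smith": {"Badgers Nadgers Ale", "Baby Dry Nights", "XBox 360"},
    "Rory Williams": {"Peewee Pilsner", "Baby Dry Nights"},
}

# Stand-in for (product)-[:MEMBER_OF]->(category).
member_of = {
    "Badgers Nadgers Ale": {"beer"},
    "Peewee Pilsner": {"beer"},
    "Baby Dry Nights": {"nappies"},
    "XBox 360": {"console"},
}


def bought_from(person, category):
    """True if the person bought at least one product in the category."""
    return any(category in member_of[p] for p in bought[person])


# Required: beer and nappies. Must be absent: a BOUGHT link to the XBox
# (the role played by `b?` with WHERE b is null).
daddies = sorted(
    person
    for person in bought
    if bought_from(person, "beer")
    and bought_from(person, "nappies")
    and "XBox 360" not in bought[person]
)
```

In this toy data only Rory Williams matches, because Mickey Smith already owns the console.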

slide-66
SLIDE 66

Results

==> +---------------------------------------------+
==> | daddy                                       |
==> +---------------------------------------------+
==> | Node[15]{name:"Rory Williams",dob:19880121} |
==> +---------------------------------------------+
==> 1 row
==> 6 ms
==> neo4j-sh (0)$

slide-67
SLIDE 67
slide-68
SLIDE 68

What are graphs good for?

  • Recommendations
  • Business intelligence
  • Social computing
  • Geospatial
  • MDM
  • Systems management
  • Web of things
  • Genealogy
  • Time series data
  • Product catalogue
  • Web analytics
  • Scientific computing (especially bioinformatics)
  • Indexing your slow RDBMS
  • And much more!
slide-69
SLIDE 69

Free O’Reilly eBook!

Visit: http://GraphDatabases.com

slide-70
SLIDE 70

Thanks for listening

Neo4j: http://neo4j.org
Neo Technology: http://neotechnology.com
Me: @jimwebber

slide-71
SLIDE 71

Neo4j Meetup in Hilversum Next Week