Hej! @ryguyrg ABOUT ME Developed web apps for 5 years including - - PowerPoint PPT Presentation



SLIDE 1

Hej!

@ryguyrg
SLIDE 2

ABOUT ME

  • Developed web apps for 5 years, including e-commerce, business workflow, and more.
  • Worked at Google for 8 years on Google Apps and Cloud Platform.
  • Technologies: Python, Java, BigQuery, Oracle, MySQL, OAuth

ryan@neo4j.com @ryguyrg
SLIDE 3

Carpe Diem Data

SLIDE 4
SLIDE 5
SLIDE 6
SLIDE 7
SLIDE 8

Why YOU are (hopefully) here today

SLIDE 9

Power of Graph Algorithms to Understand Your Data

SLIDE 10

Power of Graph Algorithms to Understand Your Data

SLIDE 11

Graph Algorithms on ACID

SLIDE 12

Graph Algorithms on ACID

Graph Algorithms +

ACID-compliant native graph database

SLIDE 13
[Figure: example property graph. Nodes: Bank US, Account 123, Person A, Acme Inc, Bank Bahamas, Address X, Person B. Relationships: HAS_ACCOUNT, REGISTERED, IS_OFFICER_OF, WITH, LIVES_AT (twice). Legend: NODE, RELATIONSHIP.]
SLIDE 14
SLIDE 15
SLIDE 16

Anti-Money Laundering

SLIDE 17

Product Recommendations

SLIDE 18
SLIDE 19
SLIDE 20

Sports

SLIDE 21

Literature

SLIDE 22

Urban Planning

SLIDE 23

Toxic Waste Management

SLIDE 24

Historical Tooling

SLIDE 25
SLIDE 26
SLIDE 27
SLIDE 28
SLIDE 29
SLIDE 30

NumPy

OR

OR

SLIDE 31

The New World

SLIDE 32
SLIDE 33
SLIDE 34
Pipeline components:
  • Twitter Streaming API
  • Python tweet collection (includes user data)
  • RabbitMQ
  • MongoDB
  • Neo4j
  • R scripts (graph stats, community detection)
  • MySQL
  • Graph export (.graphml)
  • Tableau
  • Graph visualization
SLIDE 35
  • Hit a wall with igraph/R
  • Need to scale graph algorithms
SLIDE 36
SLIDE 37
SLIDE 38

OPTIMIZED FOR

SLIDE 39

OLTP

SLIDE 40

GREAT FOR

SLIDE 41

Subgraph Queries

SLIDE 42

WORKING ON

SLIDE 43

Global Queries

SLIDE 44

IN

SLIDE 45
SLIDE 46

Neo4j Graph Algorithms

SLIDE 47
Neo4j Native Graph Database
  • Analytics Integrations
  • Cypher Query Language
  • Wide Range of APOC Procedures
  • Optimized Graph Algorithms
SLIDE 48
  • Pathfinding: finds the optimal path or evaluates route availability and quality
  • Community detection: evaluates how a group is clustered or partitioned
  • Centrality: determines the importance of distinct nodes in the network
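The community-detection category can be illustrated with label propagation, one common technique in that family. The following is a minimal pure-Python sketch on an undirected adjacency dict, not Neo4j's implementation; nodes are visited in sorted order and ties go to the smallest label, both assumptions made here so the output is deterministic.

```python
from collections import Counter


def label_propagation(adj, max_iterations=10):
    """Assign each node the most frequent label among its neighbours.

    adj: dict mapping node -> iterable of neighbour nodes (undirected graph).
    Returns a dict mapping node -> community label.
    """
    # Every node starts in its own community, labelled by its own id.
    labels = {node: node for node in adj}
    for _ in range(max_iterations):
        changed = False
        # Visit nodes in a fixed (sorted) order so runs are reproducible.
        for node in sorted(adj):
            neighbours = adj[node]
            if not neighbours:
                continue
            counts = Counter(labels[n] for n in neighbours)
            best = max(counts.values())
            # Break ties by picking the smallest of the most frequent labels.
            new_label = min(label for label, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break  # labels are stable: converged
    return labels
```

On two disconnected triangles, each triangle settles on a single shared label, giving two communities.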
SLIDE 49

Usage

  1. Call as a Cypher procedure.
  2. Pass in the specification (Label, Prop, Query) and configuration.
  3. The .stream variant returns (a lot of) results:
       CALL algo.<name>.stream('Label','TYPE',{conf})
       YIELD nodeId, score
  4. The non-stream variant writes results to the graph and returns statistics:
       CALL algo.<name>('Label','TYPE',{conf})
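To make the stream/write distinction concrete, here is a loose Python analogy, not the procedure API. The hypothetical `stream_scores` generator yields one `(node_id, score)` row per node, the way a `.stream` call returns rows, while the hypothetical `write_scores` stores each score as a node property and returns only summary statistics. Degree is used as the scoring function purely as a placeholder, and the `writeMillis` key merely borrows the statistic name from the PageRank example.

```python
import time


def degree(adj, node):
    """Placeholder score: number of neighbours."""
    return len(adj[node])


def stream_scores(adj, score_fn=degree):
    """Analogue of a .stream variant: yield one (node_id, score) row per node."""
    for node_id in adj:
        yield node_id, score_fn(adj, node_id)


def write_scores(adj, properties, write_property, score_fn=degree):
    """Analogue of a write variant: store scores on nodes, return statistics only.

    properties: hypothetical property store, dict node_id -> dict of properties.
    """
    start = time.perf_counter()
    count = 0
    for node_id, score in stream_scores(adj, score_fn):
        properties.setdefault(node_id, {})[write_property] = score
        count += 1
    elapsed_ms = (time.perf_counter() - start) * 1000
    return {"nodes": count, "writeMillis": elapsed_ms}
```

The trade-off mirrors the slide: streaming hands every row back to the caller, while writing keeps the results next to the data and returns only a small summary.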

SLIDE 50

What about Virtual Graphs?

Pass in Cypher statements that return the node list and the relationship list:

  CALL algo.<name>(
    'MATCH ... RETURN id(n) AS id',
    'MATCH (n)-->(m)
     RETURN id(n) AS source, id(m) AS target',
    {graph:'cypher'})
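A Cypher projection boils down to two result sets: a list of node ids and a list of `(source, target)` pairs. As a rough Python illustration (the function name `build_graph` is hypothetical), those two lists are enough to build the in-memory adjacency structure an algorithm runs on. Dropping relationships whose endpoints are not in the node list is an assumption of this sketch.

```python
def build_graph(node_ids, relationships):
    """Build a directed adjacency list from projected node ids and (source, target) rows."""
    adj = {node_id: [] for node_id in node_ids}
    for source, target in relationships:
        # Skip relationships that point outside the projected node set (sketch assumption).
        if source in adj and target in adj:
            adj[source].append(target)
    return adj
```

For example, projecting nodes `[1, 2, 3]` with relationships `[(1, 2), (2, 3), (3, 4)]` keeps the first two edges and ignores the edge to the unprojected node 4.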
SLIDE 51

Supported Centrality Algos

  • PageRank (baseline)
  • Betweenness
  • Closeness
  • Degree
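Two of the simpler measures in this list can be sketched in plain Python for an unweighted, undirected graph given as an adjacency dict. This is illustrative only and is not how Neo4j computes them. Closeness uses the common "reachable nodes" variant, an assumption chosen so that disconnected graphs do not divide by zero.

```python
from collections import deque


def degree_centrality(adj):
    """Degree centrality: the number of neighbours of each node."""
    return {node: len(neighbours) for node, neighbours in adj.items()}


def closeness_centrality(adj):
    """Closeness centrality for an unweighted graph.

    For each node, run BFS and return (number of reachable nodes) divided by
    (sum of their distances); in a connected graph this is (n - 1) / sum(distances).
    """
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in adj[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        total = sum(dist.values())  # the source contributes distance 0
        reachable = len(dist) - 1
        result[source] = reachable / total if total > 0 else 0.0
    return result
```

On the path a-b-c, the middle node b gets the highest score on both measures.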
SLIDE 52

Supported Centrality Algos

// Stream variant: return the 20 highest-ranked pages
CALL algo.pageRank.stream('Page', 'LINKS', {iterations:20, dampingFactor:0.85})
YIELD node, score
RETURN node, score ORDER BY score DESC LIMIT 20

// Write variant: store each score in the "pagerank" property and return statistics
CALL algo.pageRank('Page', 'LINKS',
  {iterations:20, dampingFactor:0.85, write: true, writeProperty:"pagerank"})
YIELD nodes, loadMillis, computeMillis, writeMillis
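For intuition about what the `iterations` and `dampingFactor` parameters control, here is a textbook power-iteration PageRank in plain Python, plus a `top_k` helper playing the role of `ORDER BY score DESC LIMIT 20`. It uses the normalized formulation (scores sum to 1) and spreads the rank of dangling nodes evenly across all nodes; both choices are assumptions of this sketch, and the library's own scaling may differ.

```python
def pagerank(out_links, iterations=20, damping_factor=0.85):
    """Textbook PageRank by power iteration.

    out_links: dict node -> list of nodes it links to (every target must be a key).
    Rank held by dangling nodes (no out-links) is spread evenly over all nodes.
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        # Teleport share plus redistributed dangling rank, identical for every node.
        base = (1.0 - damping_factor) / n + damping_factor * dangling / n
        new_rank = {node: base for node in nodes}
        for node in nodes:
            targets = out_links[node]
            if targets:
                share = damping_factor * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


def top_k(scores, k=20):
    """Equivalent of ORDER BY score DESC LIMIT k."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]
```

A page with no incoming links keeps only the teleport share `(1 - d) / n`, while pages that collect many links rise to the top.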
SLIDE 53

Supported Pathfinding Algos

  • Single-Source Shortest Path
  • All-Nodes SSP
  • Parallel BFS / DFS
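As a reference point for the pathfinding entries, here is a single-threaded sketch: Dijkstra's algorithm for single-source shortest paths with non-negative weights, and a plain breadth-first traversal. The library's versions are parallel; this only illustrates the underlying techniques on a graph given as `node -> [(neighbour, weight), ...]`, a representation assumed here.

```python
import heapq
from collections import deque


def single_source_shortest_path(graph, source):
    """Dijkstra's algorithm for non-negative edge weights.

    graph: dict node -> list of (neighbour, weight) pairs.
    Returns dict node -> shortest distance from source (unreachable nodes omitted).
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry; a shorter path was already found
        for neighbour, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return dist


def bfs_order(graph, source):
    """Breadth-first traversal order from source, ignoring edge weights."""
    seen = {source}
    order = []
    queue = deque([source])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour, _weight in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return order
```

In a graph where A-B costs 1, B-C costs 2, and A-C costs 4, Dijkstra reaches C at distance 3 via B, while BFS visits B and C at the same depth.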
SLIDE 54
  • Combine data from sources into one graph
  • Project to relevant subgraphs
  • Enrich data with algorithms
  • Traverse, collect, filter, and aggregate with queries

  • Visualize, Explore, Decide, Export
  • From all APIs and Tools

Goal: Iterate Quickly

SLIDE 55

A note on Performance

[Chart: runtime in seconds (axis 0 to 500) for Union-Find (Connected Components) and PageRank, Spark GraphX vs. Neo4j. Bar values shown: 251, 152, 416, 124 seconds.]

Neo4j is significantly faster.

Spark GraphX (results publicly available)
  • Amazon EC2 cluster running 64-bit Linux
  • 128 CPUs with 68 GB of memory, 2 hard disks

Neo4j configuration
  • Physical machine running 64-bit Linux
  • 128 CPUs with 55 GB RAM, SSDs

Twitter 2010 dataset
  • 1.47 billion relationships
  • 41.65 million nodes
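Union-Find, the first benchmark above, is the classic disjoint-set technique for connected components. A minimal single-threaded Python sketch (not the benchmarked implementation), using path halving and union by size:

```python
def connected_components(nodes, edges):
    """Label each node with a representative of its connected component."""
    parent = {node: node for node in nodes}
    size = {node: 1 for node in nodes}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return
        # Attach the smaller tree under the larger one.
        if size[root_a] < size[root_b]:
            root_a, root_b = root_b, root_a
        parent[root_b] = root_a
        size[root_a] += size[root_b]

    for a, b in edges:
        union(a, b)
    return {node: find(node) for node in nodes}
```

Each edge is processed once with near-constant amortized cost, which is why connected components scales well even on a graph the size of the Twitter 2010 dataset.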
SLIDE 56

What Does the Future Look Like?

SLIDE 57

Improved Performance & Testing

SLIDE 58

Improved Performance & Testing

Scaling via Parallel Processing

SLIDE 59

Scaling Across the Cluster

SLIDE 60

THANK YOU!

ryan@neo4j.com @ryguyrg