APOC Pearls
Michael Hunger, Developer Relations Engineering, Neo4j
APOC Unicorns
Michael Hunger Developer Relations Engineering, Neo4j Follow @mesirii
All Images by TeeTurtle.com & Unstable Unicorns
Power Up
Backercorns:
https://unstable-unicorns.backerkit.com/hosted_preorders/project_updates?page=4
https://www.kickstarter.com/projects/ramybadie/unstable-unicorns-control-and-chaos-the-backercorn/posts/2271771
Extending Neo4j
User Defined Procedures let you write custom code that is:
- Written in any JVM language
- Deployed to the Database
- Accessed by applications via Cypher
Diagram labels: Applications, Bolt, Neo4j Execution Engine, User Defined Procedure
APOC History
- My Unicorn Moment
- Neo4j 3.0 was about to have User Defined Procedures
- Add the missing utilities
- Grew quickly: 50, then 150, then 450 procedures and functions
- Active OSS project
- Many contributors
Agenda
why and how of user defined extensions
- procedures, functions, aggregation functions
- history of apoc
- 5 pearls -> come to the training if you want to see more
- apoc.help() + doc & videos
- 1 3x utilities - text, map and collection functions
- 2 aggregation functions
- 3 data integration - load json
- 4 handling large updates - periodic iterate
- 5 graph refactoring
- 6 path expanders
- 7 triggers
- 8 time to live
- 9 graph grouping
- 10 cypher functions
- Neo4j Sandbox
- Neo4j Desktop
- Neo4j Cloud
Available On
Install
- Utilities & Converters
- Data Integration
- Import / Export
- Graph Generation / Refactoring
- Transactions / Jobs / TTL
- much more ...
What's in the Box?
- Videos
- Documentation
- Browser Guide
- APOC Training
- Neo4j Community Forum
- apoc.help()
Where can I learn more?
If you learn one thing: call apoc.help("keyword")
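As a small illustration of the advice above, the following sketch shows two ways of calling apoc.help. The keywords 'json' and 'periodic' are only examples; the YIELD columns are the ones apoc.help is assumed to return in the APOC version you have installed.

```cypher
// List the APOC procedures and functions that match the keyword "json".
CALL apoc.help('json');

// Same idea, but keep only the columns that are most useful while exploring.
CALL apoc.help('periodic') YIELD type, name, text, signature
RETURN type, name, text, signature
ORDER BY name;
```

The signature column is the quickest way to check argument order and defaults before using a procedure.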
APOC Video Series
Youtube Playlist: r.neo4j.com/apoc-videos
APOC Docs
- installation instructions
- videos
- searchable overview table
- detailed explanation
- examples
neo4j-contrib.github.io/neo4j-apoc-procedures
Browser Guide
:play apoc
- live examples
The Pearls - That give you Superpowers
Data Integration
- Relational / Cassandra
- MongoDB, Couchbase, ElasticSearch
- JSON, XML, CSV, XLS
- Cypher, GraphML
- ...
Data Integration
apoc.load.json
- load json from web-apis and files
- JSON Path
- streaming JSON
- compressed data
neo4j-contrib.github.io/neo4j-apoc-procedures/#_load_json
WITH "https://api.stackexchange.com/2.2/questions?pagesize=100..." AS url
CALL apoc.load.json(url) YIELD value
UNWIND value.items AS q
MERGE (question:Question {id:q.question_id})
  ON CREATE SET question.title = q.title,
                question.share_link = q.share_link,
                question.favorite_count = q.favorite_count
MERGE (owner:User {id:q.owner.user_id})
  ON CREATE SET owner.display_name = q.owner.display_name
MERGE (owner)-[:ASKED]->(question)
FOREACH (tagName IN q.tags |
  MERGE (tag:Tag {name:tagName})
  MERGE (question)-[:TAGGED]->(tag))
…
StackOverflow data model
Huge Transactions
apoc.periodic.iterate
- driving statement
- executing statement
- batching
- parallel execution
- handling retries
neo4j-contrib.github.io/neo4j-apoc-procedures/#_apoc_periodic_iterate
Run large scale imports
CALL apoc.periodic.iterate(
  'LOAD CSV … AS row RETURN row',
  'MERGE (n:Node {id:row.id}) SET n.name = row.name',
  {batchSize:10000})
CALL apoc.periodic.iterate(
  'UNWIND range(1,1000000) as id return id',
  'CREATE (n:Node {id:id,name:"an "+id})',
  {batchSize:10000, parallel:true})
YIELD batches, total, timeTaken;
+-------------------------------+
| batches | total   | timeTaken |
+-------------------------------+
| 100     | 1000000 | 1         |
+-------------------------------+
1 row available after 1868 ms, consumed after another 0 ms
Run large scale updates
CALL apoc.periodic.iterate(
  'MATCH (n:Person) RETURN n',
  'SET n.name = n.firstName + " " + n.lastName',
  {batchSize:10000, parallel:true})
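The batching and retry options listed for apoc.periodic.iterate can be combined with a look at the failure columns it returns. This is a sketch only: the Person properties mirror the update example, the batch size and retry count are arbitrary, and the YIELD columns are assumed to be available in your APOC version.

```cypher
// Driving statement selects the work, executing statement runs once per row,
// committed in batches of 1000. Failed batches are retried up to 3 times.
CALL apoc.periodic.iterate(
  'MATCH (n:Person) WHERE n.name IS NULL RETURN n',
  'SET n.name = n.firstName + " " + n.lastName',
  {batchSize: 1000, parallel: false, retries: 3})
YIELD batches, total, failedBatches, errorMessages
RETURN batches, total, failedBatches, errorMessages;
```

Keeping parallel:false is the cautious choice when batches might touch the same nodes or relationships, since concurrent batches can then block each other on locks.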
Utilities
Text Functions - apoc.text.*
- indexOf, indexesOf
- split, replace, regexpGroups
- format
- capitalize, decapitalize
- random, lpad, rpad
- snakeCase, camelCase, upperCase
- charAt, hexCode
- base64, md5, sha1
https://neo4j-contrib.github.io/neo4j-apoc-procedures/#_text_functions
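A few of the text functions listed above, called inline in a single RETURN. The input strings are made up for illustration, and the argument order shown is the one assumed from the APOC documentation; apoc.help('text') will show the exact signatures for your version.

```cypher
RETURN
  apoc.text.capitalize('neo4j')                      AS capitalized,
  apoc.text.lpad('42', 5, '0')                       AS padded,      // pad on the left to length 5
  apoc.text.replace('Hello World!', '[^a-zA-Z]', '') AS lettersOnly, // regex replace
  apoc.text.split('a, b,c', ', *')                   AS parts,       // regex split
  apoc.text.camelCase('FOO_BAR')                     AS camel,
  apoc.text.indexOf('graphs are everywhere', 'are')  AS position;
```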
Collection Functions - apoc.coll.*
- sum, avg, min, max, stdev
- zip, partition, pairs
- sort, toSet, contains, split
- indexOf, different
- occurrences, frequencies, flatten
- disjunct, subtract, union, …
- set, insert, remove
- randomItem(s)
https://github.com/neo4j-contrib/neo4j-apoc-procedures/blob/3.4/docs/overview.adoc#collection-functions
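The collection functions are plain functions on lists, so they can be tried without any data in the graph. A minimal sketch with made-up input lists:

```cypher
WITH [3, 1, 2, 3, 1] AS xs
RETURN
  apoc.coll.sum(xs)                       AS total,
  apoc.coll.toSet(xs)                     AS distinctValues,
  apoc.coll.sort(xs)                      AS sorted,
  apoc.coll.frequencies(xs)               AS frequencies,  // one entry per distinct item with its count
  apoc.coll.pairs(xs)                     AS pairs,        // adjacent items as pairs
  apoc.coll.zip(['a','b','c'], [1, 2, 3]) AS zipped,
  apoc.coll.flatten([[1, 2], [3]])        AS flat,
  apoc.coll.contains(xs, 2)               AS hasTwo;
```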
Map Functions - apoc.map.*
- .fromNodes, .fromPairs, .fromLists, .fromValues
- .merge
- .setKey, .removeKey
- .clean(map,[keys],[values])
- .groupBy(Multi)
https://github.com/neo4j-contrib/neo4j-apoc-procedures/blob/3.4/docs/overview.adoc#map-functions
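A short sketch of the map functions listed above. The keys and values ('name', 'age', 'nick', 'tmp') are hypothetical and only serve to show how each function builds or modifies a map.

```cypher
RETURN
  apoc.map.fromPairs([['name', 'Joe'], ['age', 42]])  AS fromPairs,
  apoc.map.fromLists(['name', 'age'], ['Joe', 42])    AS fromLists,
  apoc.map.fromValues(['name', 'Joe', 'age', 42])     AS fromValues,  // alternating key, value
  apoc.map.merge({name: 'Joe'}, {age: 42})            AS merged,
  apoc.map.setKey({name: 'Joe'}, 'age', 42)           AS withAge,
  apoc.map.removeKey({name: 'Joe', age: 42}, 'age')   AS withoutAge,
  // drop the key 'tmp' and every entry whose value is the empty string
  apoc.map.clean({name: 'Joe', nick: '', tmp: 1}, ['tmp'], ['']) AS cleaned;
```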
JSON - apoc.convert.*
- .toJson([1,2,3])
- .fromJsonList('[1,2,3]')
- .fromJsonMap('{"a":42,"b":"foo","c":[1,2,3]}')
- .toTree([paths],[lowerCaseRels=true])
- .getJsonProperty(node,key)
- .setJsonProperty(node,key,complexValue)
(JSON)-[:IS]->(everywhere)-[:LIKE]->(graphs)
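The conversion functions can be chained into a round trip, which is a convenient way to see what they do. A minimal sketch with a made-up map:

```cypher
// Serialize a map to a JSON string, then parse it back into a Cypher map.
WITH apoc.convert.toJson({a: 42, b: 'foo', c: [1, 2, 3]}) AS json
WITH json, apoc.convert.fromJsonMap(json) AS map
RETURN json,
       map.b AS b,
       map.c AS c,
       apoc.convert.fromJsonList('[1,2,3]') AS list;
```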
Graph Refactoring
- .cloneNodes
- .mergeNodes
- .extractNode
- .collapseNode
- .categorize
Relationship Modifications
- .to(rel, endNode)
- .from(rel, startNode)
- .invert(rel)
- .setType(rel, 'NEW-TYPE')
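The relationship modifications above are procedures in the apoc.refactor namespace. The sketch below assumes hypothetical KNOWS and MANAGES relationships between Person nodes, and assumes the procedures yield input (the id of the original relationship) and output (the resulting relationship).

```cypher
// Change the type of existing relationships.
MATCH (:Person)-[r:KNOWS]->(:Person)
CALL apoc.refactor.setType(r, 'FRIEND_OF') YIELD input, output
RETURN input, output;

// Reverse the direction of existing relationships.
MATCH (:Person)-[r:MANAGES]->(:Person)
CALL apoc.refactor.invert(r) YIELD input, output
RETURN input, output;
```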
Graph Refactoring - apoc.refactor.*
apoc.refactor.mergeNodes
MATCH (n:Person)
WITH n.email AS email, collect(n) as people
WHERE size(people) > 1
CALL apoc.refactor.mergeNodes(people) YIELD node
RETURN node
apoc.create.addLabels
MATCH (n:Movie)
CALL apoc.create.addLabels( id(n), [ n.genre ] ) YIELD node
REMOVE node.genre
RETURN node
Triggers
Triggers
CALL apoc.trigger.add( name, statement,{phase:before/after})
- apoc.trigger.pause/resume/list/remove
- Transaction-Event-Handler calls Cypher statement
- parameters:
- createdNodes, assignedNodeProperties, deletedNodes,...
- utility functions to extract entities/properties from update-records
- triggers stored in graph, restored at startup
https://medium.com/neo4j/streaming-graph-loading-with-neo4j-and-apoc-triggers-188ed4dd40d5
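A minimal sketch of registering, listing and removing a trigger. The trigger name and the created property are hypothetical. It assumes triggers are enabled in the configuration (apoc.trigger.enabled=true) and that your APOC version uses the $createdNodes parameter syntax; older 3.x releases wrote parameters as {createdNodes}.

```cypher
// Stamp every newly created node with a creation time, inside the same transaction.
CALL apoc.trigger.add(
  'set-created-timestamp',
  'UNWIND $createdNodes AS n SET n.created = timestamp()',
  {phase: 'before'});

// Show the registered triggers.
CALL apoc.trigger.list();

// Remove the trigger again.
CALL apoc.trigger.remove('set-created-timestamp');
```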
Time to Live
- enable in config: apoc.ttl.enabled=true
- Label :TTL
- apoc.date.expire(In)(node, time, unit)
- Creates Index on :TTL(ttl)
Time To Live TTL
background job (every 60s, configurable) that runs:
MATCH (n:TTL) WHERE n.ttl < timestamp()
WITH n LIMIT 1000
DETACH DELETE n
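To mark a node for expiry, the procedure named on the previous slide can be called on a matched node. This is a sketch under assumptions: the Session label and id property are hypothetical, 'h' is assumed to be an accepted unit string for hours, and apoc.ttl.enabled=true must be set for the background job to delete anything.

```cypher
// Let this (hypothetical) session node expire 24 hours from now.
// The procedure adds the :TTL label and sets the ttl property on the node.
MATCH (s:Session {id: 'abc'})
CALL apoc.date.expireIn(s, 24, 'h')
RETURN s;
```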
Aggregation Functions
Aggregation Function - apoc.agg.*
- more efficient variants of collect(x)[a..b]
- .nth,.first,.last,.slice
- .median(x)
- .percentiles(x,[0.5,0.9])
- .product(x)
- .statistics() provides a full numeric statistic
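The aggregation functions can be tried on generated rows without touching the graph. In this sketch the input is simply the numbers 1 to 10; the offsets passed to nth and slice are arbitrary examples.

```cypher
UNWIND range(1, 10) AS x
RETURN
  apoc.agg.first(x)                   AS first,
  apoc.agg.last(x)                    AS last,
  apoc.agg.nth(x, 2)                  AS nth,          // row at the given offset
  apoc.agg.slice(x, 2, 3)             AS slice,
  apoc.agg.median(x)                  AS median,
  apoc.agg.percentiles(x, [0.5, 0.9]) AS percentiles,
  apoc.agg.product(x)                 AS product,
  apoc.agg.statistics(x)              AS stats;        // map of summary statistics
```

Compared with collect(x)[a..b], these avoid building the full list in memory before picking out a few elements.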
Graph Grouping
Graph Grouping
MATCH (p:Person) SET p.decade = p.born / 10;
MATCH (p1:Person)-->()<--(p2:Person)
WITH p1, p2, count(*) AS c
MERGE (p1)-[r:INTERACTED]-(p2)
  ON CREATE SET r.count = c;
CALL apoc.nodes.group(['Person'],['decade'])
YIELD node, relationship
RETURN *;
Cypher Procedures
apoc.custom.asProcedure/asFunction (name,statement, columns, params)
- Register statements as real procedures & functions
- 'custom' namespace prefix
- Pass parameters, configure result columns
- Stored in graph and distributed across cluster
Custom Procedures (WIP)
call apoc.custom.asProcedure('neighbours',
  'MATCH (n:Person {name:$name})-->(nb) RETURN nb AS neighbour',
  [['neighbour','NODE']], [['name','STRING']]);
call custom.neighbours('Joe') YIELD neighbour;
Report Issues Contribute!
Ask Questions
neo4j.com/slack community.neo4j.com