Tuning Cypher Andrew Bowman Customer Success Engineer NODES 2019

(andrew)-[:FREQUENTS]-> 2

How to get good at Cypher tuning Curiosity, patience, flexibility • Sometimes it’s not the query that must bend... • • Know your tools and resources And knowing is half the battle • Learn esoteric secrets key principles of Cypher execution • Row cardinality, cheap vs expensive operations • Practice! • 3

Goal: Do less work Avoid redundant work and operations • Cut out work that’s going to be filtered out anyway • Recognize cheaper ways to do what you want • 4

Curiosity, patience, and flexibility 5

Patience in the fight against... Cypher! (or misunderstandings of its behavior) Sometimes, the planner itself 6

Sometimes it is not the query that must bend... ...maybe you have to change your model… ...maybe you have to use APOC… ...maybe you need to write a stored procedure... 7

Tuning Tools and Resources 8

Reference documentation Neo4j Refcard • Official Cypher manual • Query tuning section • Execution plan descriptions • Cypher knowledge base • 9

Query logs For finding queries already-being used that need tuning Enable in neo4j.conf, successful queries exceeding threshold are logged. Options for finer grain timing breakdowns: Time spent planning, executing, and waiting on locks 2017-11-22 12:38 ... INFO 3 ms: bolt-session bolt johndoe neo4j-javascript/1.4.1 ... 2017-11-22 22:38 ... INFO 61 ms: (planning: 0, cpu: 58, waiting: 0) - 6164496 B - 0 page hits, 1 page faults ... 2017-11-22 12:38 ... INFO 78 ms: (planning: 40, cpu: 74, waiting: 0) - 6347592 B - 0 page hits, 0 page faults ... 2017-11-22 12:38 ... INFO 44 ms: (planning: 9, cpu: 25, waiting: 0) - 1311384 B - 0 page hits, 0 page faults ... 2017-11-22 12:38 ... INFO 6 ms: (planning: 2, cpu: 6, waiting: 0) - 420872 B - 0 page hits, 0 page faults - ... 10

Query log analyzer Graph app in Neo4j Desktop, analyzes a query.log file 11

Graph, schema, and counts overview Graph structure CALL db.schema() OR CALL apoc.meta.graph() Indexes and constraints CALL db.indexes OR :schema Counts store CALL db.stats.retrieve('GRAPH COUNTS') OR CALL apoc.meta.stats() 12

EXPLAIN and PROFILE Prefix before the query, produces query plans EXPLAIN - does not execute the query No db hits, only rough estimate row counts PROFILE - executes the query Includes valuable db hit and row counts 13

How to walk the tree Start node: from result 1. go up always turning left until leaf node Execute down until we reach a branch 2. We have the lhs, now need rhs 3. Go up one on rhs, then repeat from 1 4. 14

Cypher-shell plans - inverted tree Read from bottom-up 1. Start node: from result go 2. down always turning left until leaf node Follow until we reach a 3. branch We have lhs, need rhs 4. Go down one on rhs, 5. repeat from 1 15

DB hits Abstract unit of db work Not comparable between operations Not always something that can be optimized Smoke, let it draw your attention to... 16

Rows Inputs to operations Multiplies out db hits The more rows, the more work Watch for spikes: 44 -> 200 -> 38 17

Key principles of Cypher execution 18

Row Cardinality Operators produce result records/rows • Operators execute for each input record/row • Implications: Reduce rows across the query, reduce total work • MATCHes are not independent • 19

A simple demonstration PROFILE MATCH (m:Movie {title:'The Matrix'}) MATCH (dir:Person)-[:DIRECTED]->(m) MATCH (actor:Person)-[:ACTED_IN]->(m) RETURN m.title as title, collect(DISTINCT actor.name) as actors, collect(DISTINCT dir.name) as directors Cypher version: CYPHER 3.5, planner: COST, runtime: SLOTTED. 141 total db hits in 1 ms. 20

MATCHes are not independent MATCH (m:Movie {title:’The Matrix’}) MATCH MATCH (p:Person)-[:DIRECTED]->(m) (p:Person)-[:ACTED_IN]->(m) 2 rows 5 rows RETURN m.title as title, collect (DISTINCT p.name) as actors, collect (DISTINCT dir.name) as directors 21

What actually happens (best case) MATCH (m:Movie {title:’The Matrix’}) executed 1 time executed 2 times 2 rows MATCH MATCH (p:Person)-[:DIRECTED]->(m) (p:Person)-[:ACTED_IN]->(m) 10 rows (2 * 5) RETURN m.title as title, collect (DISTINCT p.name) as actors, collect (DISTINCT dir.name) as directors 22

It could be worse MATCH (m:Movie {title:’The Matrix’}) executed 5 times executed 1 times 5 rows MATCH MATCH (p:Person)-[:DIRECTED]->(m) (p:Person)-[:ACTED_IN]->(m) 10 rows (5 * 2) RETURN m.title as title, collect (DISTINCT p.name) as actors, collect (DISTINCT dir.name) as directors 23

Same exact plan PROFILE MATCH (dir:Person)-[:DIRECTED]-> (m:Movie {title:'The Matrix'}) <-[:ACTED_IN]-(actor:Person) RETURN m.title as title, collect(actor.name) as actors, collect(dir.name) as directors Cypher version: CYPHER 3.5, planner: COST, runtime: SLOTTED. 141 total db hits in 1 ms. 24

What increases cardinality? MATCHes and OPTIONAL MATCHes • (watch for back-to-back, even WITH-separated) UNWIND operations • Procedure results • (only if they YIELD something) 25

How do we shrink cardinality? Aggregation • (the grouping key will become distinct) WITH DISTINCT … • (applies to the entire row, not just a single variable) LIMIT • (in special cases) (limits ALL rows, not results per row) (WITH on its own does not shrink cardinality) • 26

How do we fix this? PROFILE MATCH (m:Movie {title:'The Matrix'}) MATCH (dir:Person)-[:DIRECTED]->(m) MATCH (actor:Person)-[:ACTED_IN]->(m) RETURN m.title as title, collect(actor.name) as actors, collect(dir.name) as directors these are happening too late... 27

Aggregate earlier PROFILE MATCH (m:Movie {title:'The Matrix'}) MATCH (dir:Person)-[:DIRECTED]->(m) WITH m, collect(dir.name) as directors MATCH (actor:Person)-[:ACTED_IN]->(m) WITH m, directors, collect(actor.name) as actors RETURN m.title as title, directors, actors Cypher version: CYPHER 3.5, planner: COST, runtime: SLOTTED. 101 total db hits in 1 ms. (vs 134) 28

Or use pattern comprehension PROFILE MATCH (m:Movie {title:'The Matrix'}) RETURN m.title, [(dir:Person)-[:DIRECTED]->(m) | dir] as directors, [(actor:Person)-[:ACTED_IN]->(m) | actor] as actors It’s like an (optional) MATCH and collect Cypher version: CYPHER 3.5, planner: COST, runtime: SLOTTED. 94 total db hits in 1 ms. 29

The cost of things...starting node lookup How do you find your starting node in the graph? NodeByLabelScan - Visit every node of a label • If there are property predicates, filter A label must be present in the pattern PROFILE MATCH (m:Movie {title:'The Matrix'}) RETURN m 30

The most expensive, the one to avoid AllNodesScan - Visit every node in your graph • If there are property predicates, filter Used when no labels are present PROFILE MATCH (m {title:'The Matrix'}) RETURN m 31

The fast, preferred lookup NodeIndexSeek - Lookup via index • Used when a label and at least 1 property present An index must have been created for the label/property combo: CREATE INDEX ON :Movie(title); Variations! WHERE m.title STARTS WITH ‘THE MATRIX’ WHERE m.title IN [‘The Matrix’, ‘Toy Story 4’] PROFILE MATCH (m:Movie {title:'The Matrix'}) RETURN m 32

The fastest lookup (that you can rarely use) NodeByIdSeek - Lookup via graph id • You must know the graph id ahead of time! Not good to cache outside of Neo4j What if the node was deleted? Also ids of deleted things get reused, so it may be different Could be useful in a multi-query transaction, or within a query PROFILE MATCH (m) WHERE id(m) = 6 RETURN m 33

Fulltext schema index Procedures for fulltext lookup • Named indexes Can index over multiple labels, multiple properties Can create on relationships too! Case insensitive searching, wildcard searching… Can even use custom lucene analyzers PROFILE CALL db.index.fulltext.queryNodes("titlesAndDescriptions", "matrix") YIELD node, score RETURN node.title, node.description, score 34

The costs of access and filtering Properties (most expensive) • WHERE m.released = 1999 Node labels • WHERE actor:Person Relationship types • WHERE type(r) = ‘ACTED_IN’ Relationship type selection • -[:ACTED_IN | DIRECTED]-> 35

Delay access until the node is distinct ... … WITH DISTINCT m WITH m, collect(actor) as actors WHERE m.released = 1999 WHERE m.released = 1999 … ... 36

Wait until filter / LIMIT / fewer rows ... … WITH m, actors WITH m WHERE size(actors) > 5 LIMIT 10 WITH m.released as released WITH m.released as released, ... actors ... 37

If need to use multiple times, project out … WITH m, m.title as title WHERE title IN $listOfTitles OR title CONTAINS ‘Matrix OR size(title) > 10 ... 38

It is sometimes more efficient to project before an expansion … WITH m, m.title as title MATCH (m)-[:ACTED_IN]-(actor) RETURN title, actor.name as actor ... 39

Tuning Cypher Andrew Bowman Customer Success Engineer NODES 2019 - PowerPoint PPT Presentation

Tuning Cypher Andrew Bowman Customer Success Engineer NODES 2019 (andrew)-[:FREQUENTS]-> 2 How to get good at Cypher tuning Curiosity, patience, flexibility Sometimes its not the query that must bend... Know your tools

Cypher Knowledge Graphs slide 1 of 14 Cypher overview Cypher is a family of query languages for

Cypher.PL Prolog Cypher Implementation SoCIM, 10th of May 2017 London Jan Posiadaa

The Cypher Language 2017 Presentation to the LDBC Query Language Task Force Neo Technology Cypher

Intercepting Suspicious Chrome Extension Actions Michael Cypher Department of Computing

Multiple graphs and composable queries in Cypher for Apache Spark Max Kieling openCypher

Cypher for Apache Spark Graph processing workloads on OLAP and OLTP Mats Rydberg

Designing Cypher (a graph query language) Narrated by Tobias Lindaaker, Developer at Neo

PAC PACE AUT AUTO-WER WERKS KS Vehicle Tuning Services Performance tuning with fuel

TUNING Russia: Development of master programmes in engineering education using the Tuning

Parameters vs hyperparameters Dr. Shirin Glander Data Scientist DataCamp Hyperparameter Tuning

SELF TUNING MEMORY MANAGEMENT FOR DATA SERVERS By Sangeetha Sivaprakasam Introduction : 1)

CHAPTER 9: PID TUNING Process Solve the tuning Apply, is the reaction curve problem. Requires

Hyperparameter tuning in caret Dr. Shirin Glander Data Scientist DataCamp Hyperparameter

Elementary Particles Lecture 4 Niels Tuning Harry van der Graaf Niels Tuning (1) Thanks

Tuning tuning curves So far: Receptive fields Representation of stimuli Population vectors

A Java Based Interactive Control Design and Tuning Platform Aaron Radke 7/9/3 Aaron Radke

Virtual Memory CS 3410 Computer System Organization & Programming [K. Bala, A. Bracy, E.

Virtual Memory Programmer can assume he/she has infinite amount of physical memory

Encoder Decoder Models Antonios Anastasopoulos Site https://phontron.com/class/mtandseq2seq2019/

BU CS 332 Theory of Computation Lecture 9: Reading: Midterm I review Sipser Ch 0 2.3

Using DNS for Mapping Using DNS for Mapping Host Identifiers to Locators Host Identifiers to

ELECTRONIC COMMUNICATIONS & PUBLIC RECORDS FOR LEGISLATORS Todd Everts Pad McCracken

Freedom of Information Request No: 092-16-17 Safeguarding Training slides:

Preparing Deposit and Service Reports 50-928C, 4/20/E GoToWebinar employer_education@strsoh.org