Tuning Cypher Andrew Bowman Customer Success Engineer NODES 2019 - - PowerPoint PPT Presentation

tuning cypher
SMART_READER_LITE
LIVE PREVIEW

Tuning Cypher Andrew Bowman Customer Success Engineer NODES 2019 - - PowerPoint PPT Presentation

Tuning Cypher Andrew Bowman Customer Success Engineer NODES 2019 (andrew)-[:FREQUENTS]-> 2 How to get good at Cypher tuning Curiosity, patience, flexibility Sometimes its not the query that must bend... Know your tools


slide-1
SLIDE 1

Tuning Cypher

Andrew Bowman Customer Success Engineer NODES 2019

slide-2
SLIDE 2

2

(andrew)-[:FREQUENTS]->

slide-3
SLIDE 3
  • Curiosity, patience, flexibility
  • Sometimes it’s not the query that must bend...
  • Know your tools and resources
  • And knowing is half the battle
  • Learn esoteric secrets key principles of Cypher execution
  • Row cardinality, cheap vs expensive operations
  • Practice!

3

How to get good at Cypher tuning

slide-4
SLIDE 4
  • Avoid redundant work and operations
  • Cut out work that’s going to be filtered out anyway
  • Recognize cheaper ways to do what you want

4

Goal: Do less work

slide-5
SLIDE 5

Curiosity, patience, and flexibility

5

slide-6
SLIDE 6

Patience in the fight against...

6

Cypher!

(or misunderstandings of its behavior)

Sometimes, the planner itself

slide-7
SLIDE 7

Sometimes it is not the query that must bend...

7

...maybe you have to change your model… ...maybe you have to use APOC… ...maybe you need to write a stored procedure...

slide-8
SLIDE 8

Tuning Tools and Resources

8

slide-9
SLIDE 9

Reference documentation

9

  • Neo4j Refcard
  • Official Cypher manual
  • Query tuning section
  • Execution plan descriptions
  • Cypher knowledge base
slide-10
SLIDE 10

Query logs

For finding queries already-being used that need tuning Enable in neo4j.conf, successful queries exceeding threshold are logged. Options for finer grain timing breakdowns: Time spent planning, executing, and waiting on locks

10

2017-11-22 12:38 ... INFO 3 ms: bolt-session bolt johndoe neo4j-javascript/1.4.1 ... 2017-11-22 22:38 ... INFO 61 ms: (planning: 0, cpu: 58, waiting: 0) - 6164496 B - 0 page hits, 1 page faults ... 2017-11-22 12:38 ... INFO 78 ms: (planning: 40, cpu: 74, waiting: 0) - 6347592 B - 0 page hits, 0 page faults ... 2017-11-22 12:38 ... INFO 44 ms: (planning: 9, cpu: 25, waiting: 0) - 1311384 B - 0 page hits, 0 page faults ... 2017-11-22 12:38 ... INFO 6 ms: (planning: 2, cpu: 6, waiting: 0) - 420872 B - 0 page hits, 0 page faults - ...

slide-11
SLIDE 11

Graph app in Neo4j Desktop, analyzes a query.log file

11

Query log analyzer

slide-12
SLIDE 12

Graph structure CALL db.schema() OR CALL apoc.meta.graph() Indexes and constraints CALL db.indexes OR :schema Counts store CALL db.stats.retrieve('GRAPH COUNTS') OR CALL apoc.meta.stats()

12

Graph, schema, and counts overview

slide-13
SLIDE 13

Prefix before the query, produces query plans EXPLAIN - does not execute the query No db hits, only rough estimate row counts PROFILE - executes the query Includes valuable db hit and row counts

13

EXPLAIN and PROFILE

slide-14
SLIDE 14

14

How to walk the tree

1. Start node: from result go up always turning left until leaf node 2. Execute down until we reach a branch 3. We have the lhs, now need rhs 4. Go up one on rhs, then repeat from 1

slide-15
SLIDE 15

1. Read from bottom-up 2. Start node: from result go down always turning left until leaf node 3. Follow until we reach a branch 4. We have lhs, need rhs 5. Go down one on rhs, repeat from 1

15

Cypher-shell plans - inverted tree

slide-16
SLIDE 16

DB hits

Abstract unit of db work Not comparable between operations Not always something that can be

  • ptimized

Smoke, let it draw your attention to...

16

slide-17
SLIDE 17

Rows

Inputs to operations Multiplies out db hits The more rows, the more work Watch for spikes: 44 -> 200 -> 38

17

slide-18
SLIDE 18

Key principles of Cypher execution

18

slide-19
SLIDE 19

Row Cardinality

19

  • Operators produce result records/rows
  • Operators execute for each input record/row

Implications:

  • Reduce rows across the query, reduce total work
  • MATCHes are not independent
slide-20
SLIDE 20

A simple demonstration

20

PROFILE MATCH (m:Movie {title:'The Matrix'}) MATCH (dir:Person)-[:DIRECTED]->(m) MATCH (actor:Person)-[:ACTED_IN]->(m) RETURN m.title as title, collect(DISTINCT actor.name) as actors, collect(DISTINCT dir.name) as directors

Cypher version: CYPHER 3.5, planner: COST, runtime: SLOTTED. 141 total db hits in 1 ms.

slide-21
SLIDE 21

MATCH (p:Person)-[:ACTED_IN]->(m) MATCH (p:Person)-[:DIRECTED]->(m)

21

MATCHes are not independent

MATCH (m:Movie {title:’The Matrix’}) RETURN m.title as title, collect(DISTINCT p.name) as actors, collect(DISTINCT dir.name) as directors

2 rows 5 rows

slide-22
SLIDE 22

MATCH (p:Person)-[:DIRECTED]->(m) MATCH (p:Person)-[:ACTED_IN]->(m)

22

What actually happens (best case)

MATCH (m:Movie {title:’The Matrix’}) RETURN m.title as title, collect(DISTINCT p.name) as actors, collect(DISTINCT dir.name) as directors

2 rows 10 rows (2 * 5) executed 2 times executed 1 time

slide-23
SLIDE 23

MATCH (p:Person)-[:DIRECTED]->(m) MATCH (p:Person)-[:ACTED_IN]->(m)

23

It could be worse

MATCH (m:Movie {title:’The Matrix’}) RETURN m.title as title, collect(DISTINCT p.name) as actors, collect(DISTINCT dir.name) as directors

10 rows (5 * 2) 5 rows executed 1 times executed 5 times

slide-24
SLIDE 24

Same exact plan

24

PROFILE MATCH (dir:Person)-[:DIRECTED]-> (m:Movie {title:'The Matrix'}) <-[:ACTED_IN]-(actor:Person) RETURN m.title as title, collect(actor.name) as actors, collect(dir.name) as directors

Cypher version: CYPHER 3.5, planner: COST, runtime: SLOTTED. 141 total db hits in 1 ms.

slide-25
SLIDE 25
  • MATCHes and OPTIONAL MATCHes

(watch for back-to-back, even WITH-separated)

  • UNWIND operations
  • Procedure results

(only if they YIELD something)

25

What increases cardinality?

slide-26
SLIDE 26
  • Aggregation

(the grouping key will become distinct)

  • WITH DISTINCT …

(applies to the entire row, not just a single variable)

  • LIMIT

(in special cases) (limits ALL rows, not results per row)

  • (WITH on its own does not shrink cardinality)

26

How do we shrink cardinality?

slide-27
SLIDE 27

How do we fix this?

27

PROFILE MATCH (m:Movie {title:'The Matrix'}) MATCH (dir:Person)-[:DIRECTED]->(m) MATCH (actor:Person)-[:ACTED_IN]->(m) RETURN m.title as title, collect(actor.name) as actors, collect(dir.name) as directors these are happening too late...

slide-28
SLIDE 28

Aggregate earlier

28

PROFILE MATCH (m:Movie {title:'The Matrix'}) MATCH (dir:Person)-[:DIRECTED]->(m) WITH m, collect(dir.name) as directors MATCH (actor:Person)-[:ACTED_IN]->(m) WITH m, directors, collect(actor.name) as actors RETURN m.title as title, directors, actors

Cypher version: CYPHER 3.5, planner: COST, runtime: SLOTTED. 101 total db hits in 1 ms. (vs 134)

slide-29
SLIDE 29

Or use pattern comprehension

29

PROFILE MATCH (m:Movie {title:'The Matrix'}) RETURN m.title,

[(dir:Person)-[:DIRECTED]->(m) | dir] as directors, [(actor:Person)-[:ACTED_IN]->(m) | actor] as actors It’s like an (optional) MATCH and collect

Cypher version: CYPHER 3.5, planner: COST, runtime: SLOTTED. 94 total db hits in 1 ms.

slide-30
SLIDE 30

How do you find your starting node in the graph?

  • NodeByLabelScan - Visit every node of a label

If there are property predicates, filter A label must be present in the pattern

PROFILE MATCH (m:Movie {title:'The Matrix'}) RETURN m

30

The cost of things...starting node lookup

slide-31
SLIDE 31
  • AllNodesScan - Visit every node in your graph

If there are property predicates, filter Used when no labels are present

PROFILE MATCH (m {title:'The Matrix'}) RETURN m

31

The most expensive, the one to avoid

slide-32
SLIDE 32
  • NodeIndexSeek - Lookup via index

Used when a label and at least 1 property present An index must have been created for the label/property combo: CREATE INDEX ON :Movie(title); Variations! WHERE m.title STARTS WITH ‘THE MATRIX’ WHERE m.title IN [‘The Matrix’, ‘Toy Story 4’]

PROFILE MATCH (m:Movie {title:'The Matrix'}) RETURN m

32

The fast, preferred lookup

slide-33
SLIDE 33
  • NodeByIdSeek - Lookup via graph id

You must know the graph id ahead of time! Not good to cache outside of Neo4j What if the node was deleted? Also ids of deleted things get reused, so it may be different Could be useful in a multi-query transaction, or within a query

PROFILE MATCH (m) WHERE id(m) = 6 RETURN m

33

The fastest lookup (that you can rarely use)

slide-34
SLIDE 34
  • Procedures for fulltext lookup

Named indexes Can index over multiple labels, multiple properties Can create on relationships too! Case insensitive searching, wildcard searching… Can even use custom lucene analyzers

PROFILE CALL db.index.fulltext.queryNodes("titlesAndDescriptions", "matrix") YIELD node, score RETURN node.title, node.description, score

34

Fulltext schema index

slide-35
SLIDE 35
  • Properties (most expensive)

WHERE m.released = 1999

  • Node labels

WHERE actor:Person

  • Relationship types

WHERE type(r) = ‘ACTED_IN’

  • Relationship type selection
  • [:ACTED_IN | DIRECTED]->

35

The costs of access and filtering

slide-36
SLIDE 36

... WITH DISTINCT m WHERE m.released = 1999 …

36

Delay access until the node is distinct

… WITH m, collect(actor) as actors WHERE m.released = 1999 ...

slide-37
SLIDE 37

... WITH m, actors WHERE size(actors) > 5 WITH m.released as released, actors ...

37

Wait until filter / LIMIT / fewer rows

… WITH m LIMIT 10 WITH m.released as released ...

slide-38
SLIDE 38

… WITH m, m.title as title WHERE title IN $listOfTitles OR title CONTAINS ‘Matrix OR size(title) > 10 ...

38

If need to use multiple times, project out

slide-39
SLIDE 39

… WITH m, m.title as title MATCH (m)-[:ACTED_IN]-(actor) RETURN title, actor.name as actor ...

39

It is sometimes more efficient to project before an expansion

slide-40
SLIDE 40

… WITH m MATCH (m)-[:ACTED_IN]-(actor:Person) WITH m.id as id, collect(actor) as actors // NO ...

40

If aggregating by a unique property, aggregate by the node instead

… WITH m MATCH (m)-[:ACTED_IN]-(actor:Person) WITH m, collect(actor) as actors WITH m.id as id, actors // YES ...

slide-41
SLIDE 41

Move ORDER BY and LIMIT up, when possible

MATCH (m:Movie)<-[:ACTED_IN]-(actor) WITH m, collect(actor) as actors RETURN m.title as title, actors ORDER BY m.released DESC LIMIT 10

297 total db hits 41

MATCH (m:Movie) WITH m ORDER BY m.released DESC LIMIT 10 MATCH (m)<-[:ACTED_IN]-(actor) WITH m, collect(actor) as actors RETURN m.title as title, actors

138 total db hits

slide-42
SLIDE 42

MATCH (p:Person) RETURN p ORDER BY p.name

42

Index-backed ORDER BY

MATCH (p:Person) WITH p WHERE p.name > '' RETURN p ORDER BY p.name

requires order by indexed property possibility of lookup by indexed property type hint

slide-43
SLIDE 43

MATCH (m:Movie)<-[:ACTED_IN]-(a) RETURN m, count(a) as degree ORDER BY degree DESC LIMIT 3

43

Avoid expansions with degrees

MATCH (m:Movie) RETURN m, size((m)<-[:ACTED_IN]-()) as degree ORDER BY degree DESC LIMIT 3

slide-44
SLIDE 44

Aim to be lazy

  • Leverage node indexes for starting nodes, when possible

Use EXPLAIN to double-check

  • Mind the cardinality

Watch for row spikes, check if it makes sense vs counts Get those rows down! Aggregate early, use pattern comprehensions

  • Be frugal with property access!

Strike when cardinality is low

44

slide-45
SLIDE 45

45

But what about practice?

slide-46
SLIDE 46

Hunger Games Questions for "Tuning Cypher"

1. Easy: Which operation for looking up a starting node is typically fastest?

a. NodeByDirectReference b. NodeIndexSeek c. NodeByPrimaryKey

2. Medium: What is the name of the feature that lets us expand a pattern, apply a WHERE clause, then project the results into a list? 3. Hard: If aggregating with respect to a non-unique node property, you should:

a. Aggregate with respect to the node itself instead of its property, it’s more efficient b. Not aggregate with respect to the node itself, as that may give incorrect results c. Aggregate with respect to the node’s label instead, since label access is cheaper

Answer here: r.neo4j.com/hunger-games