Aggregation semantics - József Marton




SLIDE 1

2017-05-10, oCIM2@London József Marton - Aggregation semantics 1

Aggregation semantics

József Marton Budapest University of Technology and Economics

CIR-2017-219

SLIDE 2


Aggregation in openCypher

  • Partition tuples based on values for grouping key
  • Return a single resulting tuple for each partition
  • In openCypher: WITH/RETURN clauses
  • E.g. count nodes in each class (.class property)

MATCH (n) RETURN n.class, count(*)
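The partition-then-collapse behaviour described above can be sketched in Python. This is only an illustration of the idea, not how any openCypher engine is implemented; the `nodes` list and the `count_by_key` helper are hypothetical names introduced here.

```python
from collections import defaultdict

# Hypothetical input: each dict stands for one node matched by MATCH (n).
nodes = [
    {"class": "A"},
    {"class": "B"},
    {"class": "A"},
    {"class": "C"},
    {"class": "A"},
]

def count_by_key(tuples, key):
    """Partition tuples by key(t), then return one count per partition."""
    partitions = defaultdict(list)
    for t in tuples:
        partitions[key(t)].append(t)
    # One resulting entry per partition, mirroring count(*).
    return {k: len(members) for k, members in partitions.items()}

# Rough analogue of: MATCH (n) RETURN n.class, count(*)
result = count_by_key(nodes, key=lambda n: n["class"])
print(result)  # {'A': 3, 'B': 1, 'C': 1}
```

The grouping key is passed in explicitly here; the rest of the talk is about how a query language should derive it implicitly from the WITH/RETURN items.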

SLIDE 3


Implicit grouping key

  • The result definition of a query (step) defines the aggregation
  • The Neo4j 3.1 documentation says:

RETURN n, count(*)

We have two return expressions: n, and count(*). The first, n, is not an aggregate function, and so it will be the grouping key. The latter, count(*) is an aggregate expression.

  • What happens when a single item mixes aggregate and non-aggregate expressions, e.g. the weighted sum query?

RETURN n.weight * sum(n.value)
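To make the open question concrete, the following Python sketch classifies return items as grouping keys, plain aggregates, or mixed expressions. The tuple-based expression encoding, the `AGGREGATES` set and all function names are assumptions made for this illustration; this is not an openCypher parser.

```python
# Hypothetical, incomplete list of aggregate function names.
AGGREGATES = {"count", "sum", "avg", "min", "max", "collect"}

# Toy expression trees (an encoding assumed for this sketch):
#   ("var", name), ("prop", var, key), ("star",),
#   ("call", function_name, [args]), ("mul", left, right)
def contains_aggregate(expr):
    """True if an aggregate function appears anywhere in the expression."""
    kind = expr[0]
    if kind == "call":
        return expr[1] in AGGREGATES or any(contains_aggregate(a) for a in expr[2])
    if kind == "mul":
        return contains_aggregate(expr[1]) or contains_aggregate(expr[2])
    return False

def is_outermost_aggregate(expr):
    """True if the item is a single aggregate call with no nested aggregate."""
    return (
        expr[0] == "call"
        and expr[1] in AGGREGATES
        and not any(contains_aggregate(a) for a in expr[2])
    )

def classify(expr):
    if not contains_aggregate(expr):
        return "grouping key"
    if is_outermost_aggregate(expr):
        return "aggregate"
    return "mixed"

n = ("var", "n")
count_star = ("call", "count", [("star",)])
weighted_sum = ("mul", ("prop", "n", "weight"),
                ("call", "sum", [("prop", "n", "value")]))

print(classify(n))             # grouping key
print(classify(count_star))    # aggregate
print(classify(weighted_sum))  # mixed
```

The first two items are covered by the quoted documentation; the third is the case the quoted rule does not settle.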

SLIDE 4


Grouping key selection options

  • 1. The grouping key is the tuple built from all variables (*) that appear outside of aggregate functions of a particular WITH/RETURN clause

*: node, relationship, their properties or variables chained from a previous query step

Pros: clear in all situations, more flexible than option 2
Cons: would change current Neo4j behavior

  • 2. Each item of the expression list in WITH/RETURN is forced to contain either

i. no aggregate function, or
ii. a single aggregate function at the outermost level

(this is the approach in #188, #218). The grouping key is the tuple built from items of type (i), i.e. those with no aggregates.

Pros: in line with current Neo4j behavior and the grouping operator in Ullman's Database Systems: The Complete Book, 2009
Cons: poses restrictions on WITH/RETURN clauses; can't handle the weighted sum query

RETURN n.weight * sum(n.value)

without rewriting it as

WITH n.weight AS weight, sum(n.value) AS sum_val
RETURN weight * sum_val
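As a rough illustration, the weighted sum query can be simulated in Python under both options. The sample `nodes` and all function names are hypothetical. The sketch assumes that Option 1 groups on n.weight (the only variable outside the aggregate) and that Option 2 needs the two-step WITH/RETURN form.

```python
from collections import defaultdict

# Hypothetical node data, invented for this sketch.
nodes = [
    {"weight": 2, "value": 10},
    {"weight": 2, "value": 5},
    {"weight": 3, "value": 1},
    {"weight": 3, "value": 4},
]

def sum_value_by_weight(rows):
    """Group rows on weight and compute sum(value) for each group."""
    sums = defaultdict(int)
    for n in rows:
        sums[n["weight"]] += n["value"]
    return dict(sums)

def option1(rows):
    # RETURN n.weight * sum(n.value)
    # Grouping key: n.weight, the variable outside the aggregate.
    return sorted(w * s for w, s in sum_value_by_weight(rows).items())

def option2_rewrite(rows):
    # WITH n.weight AS weight, sum(n.value) AS sum_val
    with_rows = [
        {"weight": w, "sum_val": s}
        for w, s in sum_value_by_weight(rows).items()
    ]
    # RETURN weight * sum_val (no aggregation in this step)
    return sorted(r["weight"] * r["sum_val"] for r in with_rows)

print(option1(nodes))          # [15, 30]
print(option2_rewrite(nodes))  # [15, 30]
```

Under these assumptions both forms produce the same rows; the difference is only in how much the query author has to spell out.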

SLIDE 5


TODO: Choose

  • Neither option restricts expressiveness, though some queries might need rewriting
  • Option 1 seems clear and flexible enough for practical queries
  • Option 2 is in Neo4j, but complex aggregation and non-aggregation expressions might yield counter-intuitive results

Restricting how aggregations and non-aggregations can be mixed in complex expressions is a safety net for beginners, but cumbersome for more complex queries.

SLIDE 6


Feel the difference

MATCH (n) RETURN abs(n.weight) AS abs, count(*) AS cnt

Option 2 gives:

╒═════╤═════╕
│"abs"│"cnt"│
╞═════╪═════╡
│"2"  │"4"  │
├─────┼─────┤
│"1"  │"4"  │
├─────┼─────┤
│"0"  │"2"  │
└─────┴─────┘

Option 1 gives:

╒═════╤═════╕
│"abs"│"cnt"│
╞═════╪═════╡
│"2"  │"2"  │
├─────┼─────┤
│"1"  │"2"  │
├─────┼─────┤
│"2"  │"2"  │
├─────┼─────┤
│"0"  │"2"  │
├─────┼─────┤
│"1"  │"2"  │
└─────┴─────┘

Input graph: ten nodes, two for each weight -2, -1, 0, 1, 2

Modelling Option 2 in Option 1:

MATCH (n) WITH abs(n.weight) AS abs, n RETURN abs, count(*) AS cnt
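The two result sets can be reproduced with a small Python simulation of the ten-node input graph. This is a sketch of the two proposed semantics as described on the slides, not of any engine. The function names are hypothetical, and row order is not meaningful.

```python
from collections import Counter

# Input graph from the slide: two nodes for each weight -2, -1, 0, 1, 2.
weights = [w for w in (-2, -1, 0, 1, 2) for _ in range(2)]

def option2(ws):
    # Grouping key is the whole non-aggregate item abs(n.weight).
    return dict(Counter(abs(w) for w in ws))

def option1(ws):
    # Grouping key is the variable outside the aggregate, n.weight;
    # abs() is applied only when each group's row is produced.
    groups = Counter(ws)
    return [(abs(w), cnt) for w, cnt in groups.items()]

def option2_modelled_in_option1(ws):
    # WITH abs(n.weight) AS abs, n   (plain projection, no aggregation)
    with_rows = [abs(w) for w in ws]
    # RETURN abs, count(*)           (grouping key is the variable abs)
    return dict(Counter(with_rows))

print(option2(weights))          # {2: 4, 1: 4, 0: 2}
print(sorted(option1(weights)))  # [(0, 2), (1, 2), (1, 2), (2, 2), (2, 2)]
print(option2_modelled_in_option1(weights) == option2(weights))  # True
```

Option 1 yields five rows because -2 and 2 (and likewise -1 and 1) stay in separate groups even though they print the same abs value.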

SLIDE 7


Let's get loud

  • 1. The grouping key is the tuple built from all variables (*) that appear outside of aggregate functions of a particular WITH/RETURN clause

*: node, relationship, their properties or variables chained from a previous query step

  • 2. Each item of the expression list in WITH/RETURN is forced to contain either

i. no aggregate function, or
ii. a single aggregate function at the outermost level

(this is the approach in #188, #218). The grouping key is the tuple built from items of type (i), i.e. those with no aggregates.

SLIDE 8


That's all