Formal Semantics of SQL (and Cypher) Paolo Guagliardo SQL - - PowerPoint PPT Presentation

formal semantics of sql
SMART_READER_LITE
LIVE PREVIEW

Formal Semantics of SQL (and Cypher) Paolo Guagliardo SQL - - PowerPoint PPT Presentation

Formal Semantics of SQL (and Cypher) Paolo Guagliardo SQL Standard query language for relational databases $30B/year business Implemented in all major RDBMSs (free and commercial) First standardized in 1986 (ANSI) and 1987 (ISO)


slide-1
SLIDE 1

Formal Semantics of SQL

(and Cypher)

Paolo Guagliardo

slide-2
SLIDE 2

SQL

  • Standard query language for relational databases
  • $30B/year business
  • Implemented in all major RDBMSs (free and commercial)
  • First standardized in 1986 (ANSI) and 1987 (ISO)
  • Several revision afterwards

(SQL-89, SQL-92, SQL:1999, SQL:2003, SQL:2006, SQL:2008, SQL:2011, SQL:2016)

“The nice thing about standards is that you have so many to choose from” — Andrew S. Tanenbaum

slide-3
SLIDE 3

How standard is SQL?

SELECT * FROM R WHERE EXISTS ( SELECT * FROM ( SELECT R.A, R.A FROM R ) S )

Both PostgreSQL and Oracle output R

SELECT * FROM ( SELECT R.A, R.A FROM R ) S

PostgreSQL outputs a table with two columns named “A” Oracle throws an ERROR: reference to column “A” is ambiguous

slide-4
SLIDE 4

Who is right?

Let’s have a look at the standard!

  • A. If the <select list> * is simply contained in a <subquery> that is

immediately contained in an <exists predicate>, then the <select list> is equivalent to a <value expression> that is an arbitrary <literal>.

  • B. Otherwise, the <select list> * is equivalent to a <value expression>

sequence in which each <value expression> is a column reference that references a column of T and each column of T is referenced exactly once. The columns are referenced in the ascending sequence

  • f their ordinal position within T.
slide-5
SLIDE 5

… which means

SELECT * FROM R WHERE EXISTS ( SELECT * FROM ( SELECT R.A, R.A FROM R ) S ) SELECT * FROM ( SELECT R.A, R.A FROM R ) S SELECT S.A, S.A FROM ( SELECT R.A, R.A FROM R ) S SELECT R.A FROM R WHERE EXISTS ( SELECT 1 FROM ( SELECT R.A, R.A FROM R ) S )

≡ ≡

slide-6
SLIDE 6

The Need for a Formal Semantics

  • Avoid ambiguity of natural language
  • Clearly defined and not subject to interpretation
  • Easy to understand and implement

Previous attempts

  • Many simplifying assumptions: no bags, no nulls
  • No justification of correctness
slide-7
SLIDE 7

SELECT R.A FROM R WHERE R.A NOT IN ( SELECT S.A FROM S) SELECT R.A FROM R EXCEPT SELECT S.A FROM S SELECT R.A FROM R WHERE NOT EXISTS ( SELECT S.A FROM S WHERE S.A=R.A )

A 1 NULL A NULL

R S

A 1 A 1 NULL A

Answer

slide-8
SLIDE 8

Core SQL fragment

⌧ := (T1, . . . , Tk), := (N1, . . . , Nk), k > 0 ↵ := (A1, . . . , Am), 0 := (N 0

1, . . . , N 0 m), m > 0

Queries: Q := SELECT [DISTINCT] (↵:0 | *) FROM ⌧ : WHERE ✓ | Q (UNION | INTERSECT | EXCEPT) [ ALL ] Q Conditions: ✓ := TRUE | ¯ t (= | 6=) ¯ t | t IS [ NOT ] NULL | ¯ t [ NOT ] IN Q | EXISTS Q | ✓ AND ✓ | ✓ OR ✓ | NOT ✓

Essentially SQL without arithmetic, grouping and aggregation

slide-9
SLIDE 9

Formal Semantics: Challenges

Data model

  • Base relations / query outputs / intermediate results
  • Primitive data manipulation operations

Attribute references

  • Binding rules in subqueries
  • Environment collects and propagates bindings
slide-10
SLIDE 10

Proposed Semantics

JRKD,η = RD J⌧ : KD,η = J(T1, . . . , Tk) : (N1, . . . , Nk)KD,η = N1.JT1KD,η ⇥ · · · ⇥ Nk.JTkKD,η s

FROM

⌧ :

WHERE

✓ {

D,η

=

  • ¯

a 2 J⌧ : KD,η | J✓KD,η;η¯

a = t

t SELECT ⇤

FROM

⌧ :

WHERE

✓ |

D,η

= s

FROM

⌧ :

WHERE

✓ {

D,η

: 1 t SELECT ↵ : 0

FROM

⌧ :

WHERE

✓ |

D,η

= ⇡α s

FROM

⌧ :

WHERE

✓ {

D,η

! : 0 t SELECT DISTINCT ↵ : 0 | ⇤

FROM

⌧ :

WHERE

✓ |

D,η

= " @ t SELECT ↵ : 0 | ⇤

FROM

⌧ :

WHERE

✓ |

D,η

1 A JTRUEKD,η = t JtKD,η = ⇢⌘(A) if t = A t if t 2 C or t = NULL Jt1 = t2KD,η = 8 < : t if Jt1KD,η = Jt2KD,η and Jt1KD,η 6= NULL and Jt2KD,η 6= NULL f if Jt1KD,η 6= Jt2KD,η and Jt1KD,η 6= NULL and Jt2KD,η 6= NULL u if Jt1KD,η = NULL or Jt2KD,η = NULL Jt IS NULLKD,η = ⇢t if JtKD,η = NULL f if JtKD,η 6= NULL Jt IS NOT NULLKD,η = ¬Jt IS NULLKD,η J(t1, . . . tn) = (t0

1, . . . , t0 n)KD,η = n

^

i=1

Jti = t0

iKD,η

J(t1, . . . tn) 6= (t0

1, . . . , t0 n)KD,η = n

_

i=1

Jti 6= t0

iKD,η

J¯ t IN QKD,η = 8 < : t if 9¯ r 2 JQKD,η : J¯ t = ⌫(¯ r)KD,η = t f if 8¯ r 2 JQKD,η : J¯ t = ⌫(¯ r)KD,η = f u if @¯ r 2 JQKD,η : J¯ t = ⌫(¯ r)KD,η = t and 9¯ r 2 JQKD,η : J¯ t = ⌫(¯ r)KD,η 6= f J¯ t NOT IN QKD,η = ¬J¯ t IN QKD,η JEXISTS QKD,η = ⇢t if JQKD,η 6= ? f if JQKD,η = ? J✓1 AND ✓2KD,η = J✓1KD,η ^ J✓2KD,η J✓1 OR ✓2KD,η = J✓1KD,η _ J✓2KD,η JNOT ✓KD,η = ¬J✓KD,η JQ1 UNION ALL Q2KD,η = JQ1KD,η [ JQ2KD,η : `(JQ1K) JQ1 INTERSECT ALL Q2KD,η = JQ1KD,η \ JQ2KD,η : `(JQ1K) JQ1 EXCEPT ALL Q2KD,η = JQ1KD,η JQ2KD,η : `(JQ1K) JQ1 ? Q2KD,η = " JQ1 ? ALL Q2KD,η

  • ,

? 2 {UNION, INTERSECT} JQ1 EXCEPT Q2KD,η = "(JQ1KD,η) JQ2KD,η : `(JQ1K)

  • Fits in one page
  • Non-ambiguous
  • Easy to understand
  • Easy to implement
  • Easy to modify
slide-11
SLIDE 11

Formal Semantics: Validation

  • Cannot prove that semantics is correct
  • Provide sufficient experimental evidence
  • Implemented in Python
  • Validated on 100000+ random SQL queries
slide-12
SLIDE 12

Formal Semantics of Cypher

  • Collaboration between Neo Technology

and the University of Edinburgh

  • Preliminary meeting in December
  • Legal agreements finalized recently
  • Neo Technology sponsors a researcher

(Nadime Francis)

slide-13
SLIDE 13

Challenges

  • Getting the (abstract) data model right
  • Intermediate representation (QUIL?)
  • Identify core fragment
  • Language constantly evolving
  • Follow the footsteps of SQL? (nulls)