formal semantics of sql
play

Formal Semantics of SQL (and Cypher) Paolo Guagliardo SQL - PowerPoint PPT Presentation

Formal Semantics of SQL (and Cypher) Paolo Guagliardo SQL Standard query language for relational databases $30B/year business Implemented in all major RDBMSs (free and commercial) First standardized in 1986 (ANSI) and 1987 (ISO)


  1. Formal Semantics of SQL (and Cypher) Paolo Guagliardo

  2. SQL • Standard query language for relational databases • $30B/year business • Implemented in all major RDBMSs (free and commercial) • First standardized in 1986 (ANSI) and 1987 (ISO) • Several revision afterwards (SQL-89, SQL-92, SQL:1999, SQL:2003, SQL:2006, SQL:2008, SQL:2011, SQL:2016) “The nice thing about standards is that you have so many to choose from” — Andrew S. Tanenbaum

  3. How standard is SQL? SELECT * FROM ( SELECT R.A, R.A FROM R ) S PostgreSQL outputs a table with two columns named “ A ” Oracle throws an ERROR : reference to column “ A ” is ambiguous SELECT * FROM R WHERE EXISTS ( SELECT * FROM ( SELECT R.A, R.A FROM R ) S ) Both PostgreSQL and Oracle output R

  4. Who is right? Let’s have a look at the standard! A. If the <select list> * is simply contained in a <subquery> that is immediately contained in an <exists predicate> , then the <select list> is equivalent to a <value expression> that is an arbitrary <literal> . B. Otherwise, the <select list> * is equivalent to a <value expression> sequence in which each <value expression> is a column reference that references a column of T and each column of T is referenced exactly once . The columns are referenced in the ascending sequence of their ordinal position within T.

  5. … which means SELECT * SELECT S.A, S.A ≡ FROM ( SELECT R.A, R.A FROM ( SELECT R.A, R.A FROM R ) S FROM R ) S SELECT * FROM R SELECT R.A FROM R WHERE EXISTS ( WHERE EXISTS ( ≡ SELECT * SELECT 1 FROM ( SELECT R.A, R.A FROM ( SELECT R.A, R.A FROM R ) S FROM R ) S ) )

  6. The Need for a Formal Semantics • Avoid ambiguity of natural language • Clearly defined and not subject to interpretation • Easy to understand and implement Previous attempts • Many simplifying assumptions: no bags , no nulls • No justification of correctness

  7. A A R S 1 NULL Answer NULL SELECT R.A FROM R A EXCEPT 1 SELECT S.A FROM S SELECT R.A FROM R A WHERE R.A NOT IN ( SELECT S.A FROM S) SELECT R.A FROM R A WHERE NOT EXISTS ( 1 SELECT S.A FROM S NULL WHERE S.A=R.A )

  8. Core SQL fragment ⌧ := ( T 1 , . . . , T k ) , � := ( N 1 , . . . , N k ) , k > 0 ↵ := ( A 1 , . . . , A m ) , � 0 := ( N 0 1 , . . . , N 0 m ) , m > 0 Queries: Q := SELECT [ DISTINCT ] ( ↵ : � 0 | * ) FROM ⌧ : � WHERE ✓ | Q ( UNION | INTERSECT | EXCEPT ) [ ALL ] Q Conditions: ✓ := TRUE | ¯ t (= | 6 =) ¯ t | t IS [ NOT ] NULL | ¯ t [ NOT ] IN Q | EXISTS Q | ✓ AND ✓ | ✓ OR ✓ | NOT ✓ Essentially SQL without arithmetic, grouping and aggregation

  9. Formal Semantics: Challenges Data model • Base relations / query outputs / intermediate results • Primitive data manipulation operations Attribute references • Binding rules in subqueries • Environment collects and propagates bindings

  10. Proposed Semantics J R K D, η = R D J ⌧ : � K D, η = J ( T 1 , . . . , T k ) : ( N 1 , . . . , N k ) K D, η = N 1 . J T 1 K D, η ⇥ · · · ⇥ N k . J T k K D, η s { ⌧ : � FROM � = ¯ a 2 J ⌧ : � K D, η | J ✓ K D, η ; η ¯ a = t ✓ WHERE D, η t SELECT | s { ⇤ ⌧ : � FROM : � � 1 ⌧ : � = FROM ✓ WHERE ✓ WHERE D, η D, η t SELECT | • Fits in one page s { ↵ : � 0 ! ⌧ : � FROM ⌧ : � = ⇡ α : � 0 FROM ✓ WHERE ✓ WHERE D, η D, η t SELECT DISTINCT | t SELECT | ↵ : � 0 | ⇤ ↵ : � 0 | ⇤ 0 1 ⌧ : � ⌧ : � = " FROM FROM @ A ✓ ✓ • Non-ambiguous WHERE WHERE D, η D, η J TRUE K D, η = t ⇢ ⌘ ( A ) if t = A J t K D, η = t if t 2 C or t = NULL • Easy to understand 8 t if J t 1 K D, η = J t 2 K D, η and J t 1 K D, η 6 = NULL and J t 2 K D, η 6 = NULL < J t 1 = t 2 K D, η = f if J t 1 K D, η 6 = J t 2 K D, η and J t 1 K D, η 6 = NULL and J t 2 K D, η 6 = NULL : if J t 1 K D, η = NULL or J t 2 K D, η = NULL u ⇢ t if J t K D, η = NULL J t IS NULL K D, η = if J t K D, η 6 = NULL f • Easy to implement J t IS NOT NULL K D, η = ¬ J t IS NULL K D, η n n ^ _ J ( t 1 , . . . t n ) = ( t 0 1 , . . . , t 0 J t i = t 0 J ( t 1 , . . . t n ) 6 = ( t 0 1 , . . . , t 0 J t i 6 = t 0 n ) K D, η = i K D, η n ) K D, η = i K D, η i =1 i =1 r 2 J Q K D, η : J ¯ 8 t if 9 ¯ t = ⌫ (¯ r ) K D, η = t • Easy to modify < J ¯ r 2 J Q K D, η : J ¯ t IN Q K D, η = f if 8 ¯ t = ⌫ (¯ r ) K D, η = f r 2 J Q K D, η : J ¯ r 2 J Q K D, η : J ¯ if @ ¯ : u t = ⌫ (¯ r ) K D, η = t and 9 ¯ t = ⌫ (¯ r ) K D, η 6 = f J ¯ t NOT IN Q K D, η = ¬ J ¯ t IN Q K D, η ⇢ t if J Q K D, η 6 = ? J EXISTS Q K D, η = if J Q K D, η = ? f J ✓ 1 AND ✓ 2 K D, η = J ✓ 1 K D, η ^ J ✓ 2 K D, η J ✓ 1 OR ✓ 2 K D, η = J ✓ 1 K D, η _ J ✓ 2 K D, η J NOT ✓ K D, η = ¬ J ✓ K D, η J Q 1 UNION ALL Q 2 K D, η = J Q 1 K D, η [ J Q 2 K D, η : ` ( J Q 1 K ) J Q 1 INTERSECT ALL Q 2 K D, η = J Q 1 K D, η \ J Q 2 K D, η : ` ( J Q 1 K ) J Q 1 EXCEPT ALL Q 2 K D, η = J Q 1 K D, η � J Q 2 K D, η : ` ( J Q 1 K ) J Q 1 ? Q 2 K D, η = " � � J Q 1 ? ALL Q 2 K D, η ? 2 { UNION , INTERSECT } , J Q 1 EXCEPT Q 2 K D, η = " ( J Q 1 K D, η ) � J Q 2 K D, η : ` ( J Q 1 K )

  11. Formal Semantics: Validation • Cannot prove that semantics is correct • Provide sufficient experimental evidence • Implemented in Python • Validated on 100000+ random SQL queries

  12. Formal Semantics of Cypher • Collaboration between Neo Technology and the University of Edinburgh • Preliminary meeting in December • Legal agreements finalized recently • Neo Technology sponsors a researcher ( Nadime Francis )

  13. Challenges • Getting the (abstract) data model right • Intermediate representation (QUIL?) • Identify core fragment • Language constantly evolving • Follow the footsteps of SQL? (nulls)

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend