Neues in Open-Source-SQL-Datenbanken
@MarkusWinand • @ModernSQL
http://www.almaden.ibm.com/cs/people/chamberlin/sequel-1974.pdf
Neues in Open-Source-SQL-Datenbanken @MarkusWinand @ModernSQL - - PowerPoint PPT Presentation
Neues in Open-Source-SQL-Datenbanken @MarkusWinand @ModernSQL http://www.almaden.ibm.com/cs/people/chamberlin/sequel-1974.pdf Neues in Open-Source-SQL-Datenbanken @MarkusWinand @ModernSQL
Neues in Open-Source-SQL-Datenbanken
@MarkusWinand • @ModernSQL
http://www.almaden.ibm.com/cs/people/chamberlin/sequel-1974.pdf
Neues in Open-Source-SQL-Datenbanken
@MarkusWinand • @ModernSQL
http://www.almaden.ibm.com/cs/people/chamberlin/sequel-1974.pdf
CHECK Constraints
CHECK Constraints Since SQL-92
CREATE TABLE order_lines ( … qty INTEGER NOT NULL CHECK (qty > 0), … )
CHECK Constraints Since SQL-92
CREATE TABLE order_lines ( … qty INTEGER NOT NULL CHECK (qty > 0), … ) INSERT … (…, qty, …) VALUES (…, 1, …)
CHECK Constraints Since SQL-92
CREATE TABLE order_lines ( … qty INTEGER NOT NULL CHECK (qty > 0), … ) INSERT … (…, qty, …) VALUES (…, 1, …) INSERT … (…, qty, …) VALUES (…, 3, …)
CHECK Constraints Since SQL-92
CREATE TABLE order_lines ( … qty INTEGER NOT NULL CHECK (qty > 0), … ) INSERT … (…, qty, …) VALUES (…, 1, …) INSERT … (…, qty, …) VALUES (…, 3, …) INSERT … (…, qty, …) VALUES (…, 0, …)
CHECK Constraints Since SQL-92
CREATE TABLE order_lines ( … qty INTEGER NOT NULL CHECK (qty > 0), … ) INSERT … (…, qty, …) VALUES (…, 1, …) INSERT … (…, qty, …) VALUES (…, 3, …) INSERT … (…, qty, …) VALUES (…, 0, …)
Before MySQL 8.0.16 and MariaDB 10.2:
Syntax accepted, Constraint ignored
CHECK Constraints Since SQL-92
1999 2001 2003 2005 2007 2009 2011 2013 2015 2017
10.2
MariaDB
8.0.16 MySQL 8.3
PostgreSQL
3.5.7
SQLite
9.7
DB2 LUW
11gR1
Oracle
2008R2
SQL Server
INTERSECT and EXCEPT
Since SQL-92 INTERSECT & EXCEPT
UNION [ALL] Concatenates two results
Since SQL-92 INTERSECT & EXCEPT
UNION [ALL] Concatenates two results INTERSECT [ALL] Common rows from two results
Since SQL-92 INTERSECT & EXCEPT
UNION [ALL] Concatenates two results INTERSECT [ALL] Common rows from two results EXCEPT [ALL] Remove rows from first result
INTERSECT & EXCEPT Since SQL-92
1999 2001 2003 2005 2007 2009 2011 2013 2015 2017
10.3
[0]MariaDB MySQL
8.3
PostgreSQL
3.5.7
[0]SQLite
9.7
DB2 LUW
11gR1[0]
Oracle
2008R2[0]
SQL Server
[0]Not [all]LATERAL
Select-list sub-queries must be scalar[0]:
LATERAL Before SQL:1999
SELECT … , (SELECT column_1 FROM t1 WHERE t1.x = t2.y ) AS c FROM t2 …
(an atomic quantity that can hold only one value at a time[1])
[0] Neglecting row values and other workarounds here; [1] https://en.wikipedia.org/wiki/ScalarSelect-list sub-queries must be scalar[0]:
LATERAL Before SQL:1999
SELECT … , (SELECT column_1 FROM t1 WHERE t1.x = t2.y ) AS c FROM t2 …
(an atomic quantity that can hold only one value at a time[1])
[0] Neglecting row values and other workarounds here; [1] https://en.wikipedia.org/wiki/Scalar, column_2 More than
⇒Syntax error
Select-list sub-queries must be scalar[0]:
LATERAL Before SQL:1999
SELECT … , (SELECT column_1 FROM t1 WHERE t1.x = t2.y ) AS c FROM t2 …
(an atomic quantity that can hold only one value at a time[1])
[0] Neglecting row values and other workarounds here; [1] https://en.wikipedia.org/wiki/Scalar, column_2 More than
⇒Syntax error
More than
⇒Runtime error!
SELECT * FROM t1 CROSS JOIN LATERAL (SELECT * FROM t2 WHERE t2.x = t1.x ) derived_table ON (true)
LATERAL Since SQL:1999
Lateral derived queries can see table names defined before:
SELECT * FROM t1 CROSS JOIN LATERAL (SELECT * FROM t2 WHERE t2.x = t1.x ) derived_table ON (true)
LATERAL Since SQL:1999
V a l i d d u e t
A T E R A L k e y w
d
Lateral derived queries can see table names defined before:
SELECT * FROM t1 CROSS JOIN LATERAL (SELECT * FROM t2 WHERE t2.x = t1.x ) derived_table ON (true)
LATERAL Since SQL:1999
V a l i d d u e t
A T E R A L k e y w
d Useless, but still required
Lateral derived queries can see table names defined before:
SELECT * FROM t1 CROSS JOIN LATERAL (SELECT * FROM t2 WHERE t2.x = t1.x ) derived_table ON (true)
LATERAL Since SQL:1999
V a l i d d u e t
A T E R A L k e y w
d
Lateral derived queries can see table names defined before:
Use CROSS JOIN to omit the ON clause
But WHY?
Use-Cases LATERAL
inside a lateral derived table FETCH FIRST (or LIMIT, TOP) applies per row from left tables.
Use-Cases LATERAL
FROM t JOIN LATERAL (SELECT … FROM … WHERE t.c=… ORDER BY … LIMIT 10 ) derived_table
inside a lateral derived table FETCH FIRST (or LIMIT, TOP) applies per row from left tables.
Use-Cases LATERAL
FROM t JOIN LATERAL (SELECT … FROM … WHERE t.c=… ORDER BY … LIMIT 10 ) derived_table
inside a lateral derived table FETCH FIRST (or LIMIT, TOP) applies per row from left tables.
Use-Cases LATERAL
Add proper index for Top-N query
http://use-the-index-luke.com/sql/partial-results/top-n-queriesFROM t JOIN LATERAL (SELECT … FROM … WHERE t.c=… ORDER BY … LIMIT 10 ) derived_table
inside a lateral derived table FETCH FIRST (or LIMIT, TOP) applies per row from left tables.
news from several subscribed topics (“multi-source top-N”).
Use-Cases LATERAL
Add proper index for Top-N query
http://use-the-index-luke.com/sql/partial-results/top-n-queriesFROM t JOIN LATERAL (SELECT … FROM … WHERE t.c=… ORDER BY … LIMIT 10 ) derived_table
inside a lateral derived table FETCH FIRST (or LIMIT, TOP) applies per row from left tables.
news from several subscribed topics (“multi-source top-N”).
(TABLE often implies LATERAL)
Use-Cases LATERAL
FROM t JOIN TABLE (your_func(t.c))
LATERAL is the "for each" loop of SQL LATERAL plays well with outer and cross joins LATERAL is great for Top-N subqueries LATERAL can join table functions (unnest!)
LATERAL In a Nutshell
LATERAL Availability
1999 2001 2003 2005 2007 2009 2011 2013 2015 2017
5.1
MariaDB
8.0.14 MySQL 9.3
PostgreSQL SQLite
9.1
DB2 LUW
11gR1[0] 12cR1
Oracle
2005[1]
SQL Server
[0]Undocumented. Requires setting trace event 22829. [1]LATERAL is not supported as of SQL Server 2016 but [CROSS|OUTER] APPLY can be used for the same effect.BOOLEAN Tests
Before we start
SQL uses a three-valued logic. Boolean values are either true, false or unknown(=null).
See: http://modern-sql.com/concept/three-valued-logic
BOOLEAN Aggregates
BOOLEAN Tests
Similar to is null, there are tests for each Boolean value (of which there are three: true, false, unknown/null) IS [NOT] [TRUE|FALSE|UNKNOWN]
Since SQL:2003
CREATE TABLE prices ( … valid_from DATE NOT NULL, valid_to DATE, -- null: open end … CHECK (valid_from < valid_to), ); WHERE valid_from < CURRENT_DATE AND (valid_to <= CURRENT_DATE) IS NOT FALSE
BOOLEAN Tests Since SQL:2003
CREATE TABLE prices ( … valid_from DATE NOT NULL, valid_to DATE, -- null: open end … CHECK (valid_from < valid_to), ); WHERE valid_from < CURRENT_DATE AND (valid_to <= CURRENT_DATE) IS NOT FALSE
BOOLEAN Tests Since SQL:2003
CREATE TABLE prices ( … valid_from DATE NOT NULL, valid_to DATE, -- null: open end … CHECK (valid_from < valid_to), ); WHERE valid_from < CURRENT_DATE AND (valid_to <= CURRENT_DATE) IS NOT FALSE
BOOLEAN Tests Since SQL:2003
UNKNOWN if VALID_TO is NULL
CREATE TABLE prices ( … valid_from DATE NOT NULL, valid_to DATE, -- null: open end … CHECK (valid_from < valid_to), ); WHERE valid_from < CURRENT_DATE AND (valid_to <= CURRENT_DATE) IS NOT FALSE
BOOLEAN Tests Since SQL:2003
UNKNOWN if VALID_TO is NULL Takes TRUE and UNKNOWN
BOOLEAN Tests Since SQL:2003
1999 2001 2003 2005 2007 2009 2011 2013 2015 2017
5.1
MariaDB
5.0.51a
MySQL
8.3
PostgreSQL
3.23.0
[0]SQLite DB2 LUW Oracle SQL Server
[0]No IS [NOT] UNKNOWN. Use IS [NOT] NULL instead.BOOLEAN Type
BOOLEAN Type
CREATE TABLE … ( … deleted BOOLEAN NOT NULL, … )
Since SQL:2003
BOOLEAN Type
CREATE TABLE … ( … deleted BOOLEAN NOT NULL, … )
Since SQL:2003
Without NOT NULL it is a three-valued Boolean
BOOLEAN Type
CREATE TABLE … ( … deleted BOOLEAN NOT NULL, … )
Since SQL:2003
SELECT … FROM … WHERE NOT(deleted)
BOOLEAN Type
CREATE TABLE … ( … deleted BOOLEAN NOT NULL, … )
Since SQL:2003
BOOLEAN Type
CREATE TABLE … ( … deleted BOOLEAN NOT NULL, … )
Since SQL:2003
Alias for int
MySQL MariaDB SQLite
BOOLEAN Type
CREATE TABLE … ( … deleted BOOLEAN NOT NULL, … )
Since SQL:2003
Alias for int
MySQL MariaDB SQLite
INSERT … (…, deleted, …) VALUES (…, true, …)
BOOLEAN Type
CREATE TABLE … ( … deleted BOOLEAN NOT NULL, … )
Since SQL:2003
Alias for int
MySQL MariaDB SQLite
INSERT … (…, deleted, …) VALUES (…, true, …) INSERT … (…, deleted, …) VALUES (…, false, …)
BOOLEAN Type
CREATE TABLE … ( … deleted BOOLEAN NOT NULL, … )
Since SQL:2003
Alias for int
MySQL MariaDB SQLite
INSERT … (…, deleted, …) VALUES (…, true, …) INSERT … (…, deleted, …) VALUES (…, false, …) INSERT … (…, deleted, …) VALUES (…, 42, …)
BOOLEAN Type
CREATE TABLE … ( … deleted BOOLEAN NOT NULL, … )
Since SQL:2003
Alias for int
MySQL MariaDB SQLite
INSERT … (…, deleted, …) VALUES (…, true, …) INSERT … (…, deleted, …) VALUES (…, false, …) INSERT … (…, deleted, …) VALUES (…, 42, …) UNIQUE,
BOOLEAN Type
CREATE TABLE … ( … deleted BOOLEAN NOT NULL, … )
Since SQL:2003
Alias for int
MySQL MariaDB SQLite
INSERT … (…, deleted, …) VALUES (…, true, …) INSERT … (…, deleted, …) VALUES (…, false, …) INSERT … (…, deleted, …) VALUES (…, 42, …) UNIQUE, +----+ | de | +----+ | 1 | | 0 | | 42 | +----+
BOOLEAN Type Since SQL:2003
Note that boolean in base tables is often questionable:
BOOLEAN Type Since SQL:2003
Note that boolean in base tables is often questionable:
BOOLEAN Type Since SQL:2003
Note that boolean in base tables is often questionable:
consider using an enum instead (can be evolved)
BOOLEAN Type Since SQL:2003
Note that boolean in base tables is often questionable:
consider using an enum instead (can be evolved) See: 3 Reasons I Hate Booleans In Databases by Jeff Potter https://medium.com/@jpotts18/646d99696580
BOOLEAN Type Since SQL:2003
1999 2001 2003 2005 2007 2009 2011 2013 2015 2017
5.1
[0]MariaDB
5.0.51a
[0]MySQL
8.4
PostgreSQL
3.23.0
[0]SQLite DB2 LUW Oracle SQL Server
[0]BOOLEAN, TRUE, FALSE are aliases for TINYINT(1), 1, 0 respectivley.OVER
and
PARTITION BY
OVER (PARTITION BY) The Problem
Two distinct concepts could not be used independently:
OVER (PARTITION BY) The Problem
Two distinct concepts could not be used independently:
OVER (PARTITION BY) The Problem
Two distinct concepts could not be used independently:
OVER (PARTITION BY) The Problem
OVER (PARTITION BY) The Problem
SELECT c1 , c2 FROM t SELECT c1 , c2 FROM t
OVER (PARTITION BY) The Problem
Yes ⇠ Merge rows ⇢ No SELECT c1 , c2 FROM t SELECT c1 , c2 FROM t
OVER (PARTITION BY) The Problem
Yes ⇠ Merge rows ⇢ No SELECT c1 , c2 FROM t SELECT DISTINCT c1 , c2 FROM t SELECT c1 , c2 FROM t
OVER (PARTITION BY) The Problem
Yes ⇠ Merge rows ⇢ No No ⇠ Aggregate ⇢ Yes SELECT c1 , c2 FROM t SELECT DISTINCT c1 , c2 FROM t SELECT c1 , c2 FROM t
SELECT c1 , SUM(c2) tot FROM t GROUP BY c1
OVER (PARTITION BY) The Problem
Yes ⇠ Merge rows ⇢ No No ⇠ Aggregate ⇢ Yes SELECT c1 , c2 FROM t SELECT DISTINCT c1 , c2 FROM t SELECT c1 , c2 FROM t SELECT c1 , SUM(c2) tot FROM t GROUP BY c1
SELECT c1 , SUM(c2) tot FROM t GROUP BY c1
OVER (PARTITION BY) The Problem
Yes ⇠ Merge rows ⇢ No No ⇠ Aggregate ⇢ Yes SELECT c1 , c2 FROM t SELECT DISTINCT c1 , c2 FROM t SELECT c1 , c2 FROM t SELECT c1 , SUM(c2) tot FROM t GROUP BY c1
SELECT c1 , SUM(c2) tot FROM t GROUP BY c1
OVER (PARTITION BY) The Problem
Yes ⇠ Merge rows ⇢ No No ⇠ Aggregate ⇢ Yes SELECT c1 , c2 FROM t SELECT DISTINCT c1 , c2 FROM t SELECT c1 , c2 FROM t
JOIN ( ) ta ON (t.c1=ta.c1)
SELECT c1 , SUM(c2) tot FROM t GROUP BY c1
SELECT c1 , SUM(c2) tot FROM t GROUP BY c1
OVER (PARTITION BY) The Problem
Yes ⇠ Merge rows ⇢ No No ⇠ Aggregate ⇢ Yes SELECT c1 , c2 FROM t SELECT DISTINCT c1 , c2 FROM t SELECT c1 , c2 FROM t
JOIN ( ) ta ON (t.c1=ta.c1)
SELECT c1 , SUM(c2) tot FROM t GROUP BY c1SELECT c1 , SUM(c2) tot FROM t GROUP BY c1
OVER (PARTITION BY) The Problem
Yes ⇠ Merge rows ⇢ No No ⇠ Aggregate ⇢ Yes SELECT c1 , c2 FROM t SELECT DISTINCT c1 , c2 FROM t SELECT c1 , c2 FROM t
JOIN ( ) ta ON (t.c1=ta.c1)
SELECT c1 , SUM(c2) tot FROM t GROUP BY c1, tot
SELECT c1 , SUM(c2) tot FROM t GROUP BY c1
OVER (PARTITION BY) The Problem
Yes ⇠ Merge rows ⇢ No No ⇠ Aggregate ⇢ Yes SELECT c1 , c2 FROM t SELECT DISTINCT c1 , c2 FROM t SELECT c1 , c2 FROM t
JOIN ( ) ta ON (t.c1=ta.c1)
SELECT c1 , SUM(c2) tot FROM t GROUP BY c1, tot
SELECT c1 , SUM(c2) tot FROM t GROUP BY c1
OVER (PARTITION BY) Since SQL:2003
Yes ⇠ Merge rows ⇢ No No ⇠ Aggregate ⇢ Yes SELECT c1 , c2 FROM t SELECT DISTINCT c1 , c2 FROM t SELECT c1 , c2 FROM t FROM t
SELECT c1 , SUM(c2) tot FROM t GROUP BY c1
OVER (PARTITION BY) Since SQL:2003
Yes ⇠ Merge rows ⇢ No No ⇠ Aggregate ⇢ Yes SELECT c1 , c2 FROM t SELECT DISTINCT c1 , c2 FROM t SELECT c1 , c2 FROM t FROM t , SUM(c2) OVER (PARTITION BY c1)
SELECT dep, salary, SUM(salary) OVER() FROM emp dep salary 1 1000 6000 22 1000 6000 22 1000 6000 333 1000 6000 333 1000 6000 333 1000 6000
OVER (PARTITION BY) How it works
SELECT dep, salary, SUM(salary) OVER() FROM emp dep salary 1 1000 6000 22 1000 6000 22 1000 6000 333 1000 6000 333 1000 6000 333 1000 6000
OVER (PARTITION BY) How it works
Look here
SELECT dep, salary, SUM(salary) OVER() FROM emp dep salary 1 1000 6000 22 1000 6000 22 1000 6000 333 1000 6000 333 1000 6000 333 1000 6000
OVER (PARTITION BY) How it works
SELECT dep, salary, SUM(salary) OVER() FROM emp dep salary 1 1000 6000 22 1000 6000 22 1000 6000 333 1000 6000 333 1000 6000 333 1000 6000
OVER (PARTITION BY) How it works
SELECT dep, salary, SUM(salary) OVER() FROM emp dep salary 1 1000 6000 22 1000 6000 22 1000 6000 333 1000 6000 333 1000 6000 333 1000 6000
OVER (PARTITION BY) How it works
SELECT dep, salary, SUM(salary) OVER() FROM emp dep salary 1 1000 6000 22 1000 6000 22 1000 6000 333 1000 6000 333 1000 6000 333 1000 6000
OVER (PARTITION BY) How it works
dep salary 1 1000 6000 22 1000 6000 22 1000 6000 333 1000 6000 333 1000 6000 333 1000 6000 SELECT dep, salary, SUM(salary) OVER() FROM emp
OVER (PARTITION BY) How it works
)
SELECT dep, salary, SUM(salary) OVER() FROM emp dep salary 1 1000 1000 22 1000 2000 22 1000 2000 333 1000 3000 333 1000 3000 333 1000 3000
OVER (PARTITION BY) How it works
)
SELECT dep, salary, SUM(salary) OVER() FROM emp dep salary 1 1000 1000 22 1000 2000 22 1000 2000 333 1000 3000 333 1000 3000 333 1000 3000
OVER (PARTITION BY) How it works
) PARTITION BY dep
SELECT dep, salary, SUM(salary) OVER() FROM emp dep salary 1 1000 1000 22 1000 2000 22 1000 2000 333 1000 3000 333 1000 3000 333 1000 3000
OVER (PARTITION BY) How it works
) PARTITION BY dep
SELECT dep, salary, SUM(salary) OVER() FROM emp dep salary 1 1000 1000 22 1000 2000 22 1000 2000 333 1000 3000 333 1000 3000 333 1000 3000
OVER (PARTITION BY) How it works
) PARTITION BY dep
OVER
and
ORDER BY
(Framing & Ranking)
acnt id value balance 1 1 +10 +10 22 2 +20 +30 22 3
+20 333 4 +50 +70 333 5
+40 333 6
+20
OVER (ORDER BY) The Problem
SELECT id, value, FROM transactions t
acnt id value balance 1 1 +10 +10 22 2 +20 +30 22 3
+20 333 4 +50 +70 333 5
+40 333 6
+20
OVER (ORDER BY) The Problem
SELECT id, value, FROM transactions t
acnt id value balance 1 1 +10 +10 22 2 +20 +30 22 3
+20 333 4 +50 +70 333 5
+40 333 6
+20
OVER (ORDER BY) The Problem
SELECT id, value, FROM transactions t
acnt id value balance 1 1 +10 +10 22 2 +20 +30 22 3
+20 333 4 +50 +70 333 5
+40 333 6
+20
OVER (ORDER BY) The Problem
SELECT id, value, (SELECT SUM(value) FROM transactions t2 WHERE t2.id <= t.id) FROM transactions t
acnt id value balance 1 1 +10 +10 22 2 +20 +30 22 3
+20 333 4 +50 +70 333 5
+40 333 6
+20
OVER (ORDER BY) The Problem
SELECT id, value, (SELECT SUM(value) FROM transactions t2 WHERE t2.id <= t.id) FROM transactions t
Range segregation (<=) not possible with GROUP BY or PARTITION BY
OVER (ORDER BY) Since SQL:2003
SELECT id, value, (SELECT SUM(value) FROM transactions t2 WHERE t2.id <= t.id) FROM transactions t acnt id value balance 1 1 +10 +10 22 2 +20 +30 22 3
+20 333 4 +50 +70 333 5
+40 333 6
+20
OVER (ORDER BY) Since SQL:2003
SELECT id, value, FROM transactions t acnt id value balance 1 1 +10 +10 22 2 +20 +30 22 3
+20 333 4 +50 +70 333 5
+40 333 6
+20
OVER (ORDER BY) Since SQL:2003
SELECT id, value, FROM transactions t SUM(value) OVER ( ) acnt id value balance 1 1 +10 +10 22 2 +20 +30 22 3
+20 333 4 +50 +70 333 5
+40 333 6
+20
OVER (ORDER BY) Since SQL:2003
SELECT id, value, FROM transactions t SUM(value) OVER ( ) acnt id value balance 1 1 +10 +10 22 2 +20 +30 22 3
+20 333 4 +50 +70 333 5
+40 333 6
+20 ORDER BY id
OVER (ORDER BY) Since SQL:2003
SELECT id, value, FROM transactions t SUM(value) OVER ( ) acnt id value balance 1 1 +10 +10 22 2 +20 +30 22 3
+20 333 4 +50 +70 333 5
+40 333 6
+20 ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING
OVER (ORDER BY) Since SQL:2003
SELECT id, value, FROM transactions t SUM(value) OVER ( ) acnt id value balance 1 1 +10 +10 22 2 +20 +30 22 3
+20 333 4 +50 +70 333 5
+40 333 6
+20 ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
OVER (ORDER BY) Since SQL:2003
SELECT id, value, FROM transactions t SUM(value) OVER ( ) acnt id value balance 1 1 +10 +10 22 2 +20 +30 22 3
+20 333 4 +50 +70 333 5
+40 333 6
+20 ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
OVER (ORDER BY) Since SQL:2003
SELECT id, value, FROM transactions t SUM(value) OVER ( ) acnt id value balance 1 1 +10 +10 22 2 +20 +30 22 3
+20 333 4 +50 +70 333 5
+40 333 6
+20 ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
OVER (ORDER BY) Since SQL:2003
SELECT id, value, FROM transactions t SUM(value) OVER ( ) acnt id value balance 1 1 +10 +10 22 2 +20 +30 22 3
+20 333 4 +50 +70 333 5
+40 333 6
+20 ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
OVER (ORDER BY) Since SQL:2003
SELECT id, value, FROM transactions t SUM(value) OVER ( ) acnt id value balance 1 1 +10 +10 22 2 +20 +30 22 3
+20 333 4 +50 +70 333 5
+40 333 6
+20 ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
OVER (ORDER BY) Since SQL:2003
SELECT id, value, FROM transactions t SUM(value) OVER ( ) acnt id value balance 1 1 +10 +10 22 2 +20 +30 22 3
+20 333 4 +50 +70 333 5
+40 333 6
+20 ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
OVER (ORDER BY) Since SQL:2003
SELECT id, value, FROM transactions t SUM(value) OVER ( ) acnt id value balance 1 1 +10 +10 22 2 +20 +30 22 3
+20 333 4 +50 +70 333 5
+40 333 6
+20 ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
OVER (ORDER BY) Since SQL:2003
SELECT id, value, FROM transactions t SUM(value) OVER ( ) acnt id value balance 1 1 +10 +10 22 2 +20 +30 22 3
+20 333 4 +50 +70 333 5
+40 333 6
+20 ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
OVER (ORDER BY) Since SQL:2003
SELECT id, value, FROM transactions t SUM(value) OVER ( ) acnt id value balance 1 1 +10 22 2 +20 22 3
333 4 +50 333 5
333 6
ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
OVER (ORDER BY) Since SQL:2003
SELECT id, value, FROM transactions t SUM(value) OVER ( ) acnt id value balance 1 1 +10 +10 22 2 +20 +20 22 3
+10 333 4 +50 +50 333 5
+20 333 6
ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW PARTITION BY acnt
Use Cases OVER (SQL:2003)
moving averages
Use Cases
AVG(…) OVER(ORDER BY … ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING) moving_avg
OVER (SQL:2003)
moving averages
Use Cases
AVG(…) OVER(ORDER BY … ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING) moving_avg
OVER (SQL:2003)
moving averages
Use Cases
SELECT * FROM (SELECT ROW_NUMBER() OVER(PARTITION BY … ORDER BY …) rn , t.* FROM t) numbered_t WHERE rn <= 3 AVG(…) OVER(ORDER BY … ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING) moving_avg
OVER (SQL:2003)
moving averages
Use Cases
SELECT * FROM (SELECT ROW_NUMBER() OVER(PARTITION BY … ORDER BY …) rn , t.* FROM t) numbered_t WHERE rn <= 3 AVG(…) OVER(ORDER BY … ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING) moving_avg
OVER (SQL:2003)
moving averages
[… many more …]
Use Cases
SELECT * FROM (SELECT ROW_NUMBER() OVER(PARTITION BY … ORDER BY …) rn , t.* FROM t) numbered_t WHERE rn <= 3 AVG(…) OVER(ORDER BY … ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING) moving_avg
OVER (SQL:2003)
1999 2001 2003 2005 2007 2009 2011 2013 2015 2017
5.1 10.2
MariaDB
8.0
MySQL
8.4
PostgreSQL
3.25.0
SQLite
7.0
DB2 LUW
8i
Oracle
2005[0] 2012
SQL Server
[0]No framingOVER (SQL:2003) Availability
1999 2001 2003 2005 2007 2009 2011 2013 2015 2017
5.1 10.2
MariaDB
8.0
MySQL
8.4
PostgreSQL
3.25.0
SQLite
7.0
DB2 LUW
8i
Oracle
2005[0] 2012
SQL Server
[0]No framingOVER (SQL:2003) Availability
Impala Spark NuoDB BigQuery Hive
OVER
SQL:2003 frame exclusion
Since SQL:2003
OVER (ORDER BY … BETWEEN … exclude [ no others | current row | group | ties ] )
OVER (frame exclusion)
Since SQL:2003
OVER (ORDER BY … BETWEEN … exclude [ no others | current row | group | ties ] ) default
OVER (frame exclusion)
no others
Since SQL:2003
x 1 2 2 2 3
OVER (ORDER BY … BETWEEN … exclude [ no others | current row | group | ties ] ) default
OVER (frame exclusion)
no others
current row
Since SQL:2003
x 1 2 2 2 3
OVER (ORDER BY … BETWEEN … exclude [ no others | current row | group | ties ] ) default
OVER (frame exclusion)
current row
group
x = current_row
current row
Since SQL:2003
x 1 2 2 2 3
OVER (ORDER BY … BETWEEN … exclude [ no others | current row | group | ties ] ) default
OVER (frame exclusion)
group
ties group
x = current_row
current row
Since SQL:2003
x 1 2 2 2 3
OVER (ORDER BY … BETWEEN … exclude [ no others | current row | group | ties ] ) default
OVER (frame exclusion)
ties
OVER (frame exclusion) Since SQL:2011
1999 2001 2003 2005 2007 2009 2011 2013 2015 2017
5.1
MariaDB MySQL
11
PostgreSQL
3.28.0 SQLite
DB2 LUW Oracle SQL Server
FILTER
FILTER Before we start
In SQL, most aggregate functions* drop null arguments prior to the aggregation.
*Exceptions: Some aggregate functions that return structured data: array_agg, json_objectagg, xmlagg See: http://modern-sql.com/concept/null#aggregates
SELECT YEAR, SUM(CASE WHEN MONTH = 1 THEN revenue ELSE 0 END) JAN, SUM(CASE WHEN MONTH = 2 THEN revenue END) FEB, … FROM sales GROUP BY YEAR
FILTER The Problem
Pivot table: Years on the Y axis, month on X:
SELECT YEAR, SUM(CASE WHEN MONTH = 1 THEN revenue ELSE 0 END) JAN, SUM(CASE WHEN MONTH = 2 THEN revenue END) FEB, … FROM sales GROUP BY YEAR
FILTER The Problem
Pivot table: Years on the Y axis, month on X:
SELECT YEAR, SUM(CASE WHEN MONTH = 1 THEN revenue ELSE 0 END) JAN, SUM(CASE WHEN MONTH = 2 THEN revenue END) FEB, … FROM sales GROUP BY YEAR
FILTER The Problem
Pivot table: Years on the Y axis, month on X:
SELECT YEAR, SUM(CASE WHEN MONTH = 1 THEN revenue ELSE 0 END) JAN, SUM(CASE WHEN MONTH = 2 THEN revenue END) FEB, … FROM sales GROUP BY YEAR
FILTER The Problem
Pivot table: Years on the Y axis, month on X: Optional: ELSE NULL is default
SELECT YEAR, SUM(CASE WHEN MONTH = 1 THEN revenue ELSE 0 END) JAN, SUM(CASE WHEN MONTH = 2 THEN revenue END) FEB, … FROM sales GROUP BY YEAR
FILTER The Problem
Pivot table: Years on the Y axis, month on X: Optional: ELSE NULL is default Aggregates ignore NULL*
*Exceptions:array_agg, json_objectagg, xmlagg See: https://modern-sql.com/concept/null#aggregates
SELECT YEAR, SUM(CASE WHEN MONTH = 1 THEN revenue ELSE 0 END) JAN, SUM(CASE WHEN MONTH = 2 THEN revenue END) FEB, … FROM sales GROUP BY YEAR
FILTER The Problem
Pivot table: Years on the Y axis, month on X:
SELECT YEAR, SUM(revenue) FILTER (WHERE MONTH = 1) JAN, SUM(revenue) FILTER (WHERE MONTH = 2) FEB, … FROM sales GROUP BY YEAR;
FILTER Since SQL:2003
SQL:2003 allows FILTER (WHERE…) after aggregates:
FILTER Since SQL:2003
Year 2016 2016 2016 2016 2016 Month 1 2 3 ... 12 Revenue 1 23 345 ... 1234 Year 2016 Jan 1 Feb 23 Mar 345 ... ... Dec 1234 S U M ( … ) F I L T E R ( W H E R E … ) S U M ( … ) F I L T E R ( W H E R E m
t h = 2 ) S U M ( r e v e n u e ) F I L T E R ( W H E R E m
t h = 3 ) S U M ( r e v e n u e ) F I L T E R ( W H E R E m
t h = … ) S U M ( r e v e n u e ) F I L T E R ( W H E R E m
t h = 1 2 )
Pivot in SQL
to combine rows
rows per column
See: https://modern-sql.com/use-case/pivot
FILTER Availability
1999 2001 2003 2005 2007 2009 2011 2013 2015 2017
5.1
MariaDB MySQL
9.4
PostgreSQL
3.25.0
[0]SQLite DB2 LUW Oracle SQL Server
[0]Only with OVER clauseInverse Distribution Functions (percentiles)
Inverse Distribution Functions
The Problem
Grouped rows cannot be ordered prior to aggregation.
(how to get the middle value (median) of a set)
SELECT d1.val FROM data d1 JOIN data d2 ON (d1.val < d2.val OR (d1.val=d2.val AND d1.id<d2.id)) GROUP BY d1.val HAVING count(*) = (SELECT FLOOR(COUNT(*)/2) FROM data d3)
Inverse Distribution Functions
The Problem
Grouped rows cannot be ordered prior to aggregation.
(how to get the middle value (median) of a set)
SELECT d1.val FROM data d1 JOIN data d2 ON (d1.val < d2.val OR (d1.val=d2.val AND d1.id<d2.id)) GROUP BY d1.val HAVING count(*) = (SELECT FLOOR(COUNT(*)/2) FROM data d3)
Inverse Distribution Functions
The Problem
Grouped rows cannot be ordered prior to aggregation.
(how to get the middle value (median) of a set)
Number rows
SELECT d1.val FROM data d1 JOIN data d2 ON (d1.val < d2.val OR (d1.val=d2.val AND d1.id<d2.id)) GROUP BY d1.val HAVING count(*) = (SELECT FLOOR(COUNT(*)/2) FROM data d3)
Inverse Distribution Functions
The Problem
Grouped rows cannot be ordered prior to aggregation.
(how to get the middle value (median) of a set)
Number rows Pick middle one
SELECT d1.val FROM data d1 JOIN data d2 ON (d1.val < d2.val OR (d1.val=d2.val AND d1.id<d2.id)) GROUP BY d1.val HAVING count(*) = (SELECT FLOOR(COUNT(*)/2) FROM data d3)
Inverse Distribution Functions
The Problem
Grouped rows cannot be ordered prior to aggregation.
(how to get the middle value (median) of a set)
Number rows Pick middle one
SELECT d1.val FROM data d1 JOIN data d2 ON (d1.val < d2.val OR (d1.val=d2.val AND d1.id<d2.id)) GROUP BY d1.val HAVING count(*) = (SELECT FLOOR(COUNT(*)/2) FROM data d3)
Inverse Distribution Functions
The Problem
Grouped rows cannot be ordered prior to aggregation.
(how to get the middle value (median) of a set)
Number rows Pick middle one
Since SQL:2003
Inverse Distribution Functions
SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data
Since SQL:2003
Inverse Distribution Functions
SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data
Median
Since SQL:2003
Inverse Distribution Functions
SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data
Median Which value?
Since SQL:2003
Inverse Distribution Functions
SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data
Since SQL:2003
Inverse Distribution Functions
SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data
Since SQL:2003
Inverse Distribution Functions
Two variants:
(categories)
(linear interpolation)
1 2 3 4
SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data
Since SQL:2003
Inverse Distribution Functions
Two variants:
(categories)
(linear interpolation)
1 2 3 4
0.25 0.5 0.75 1
1 2 3 4
SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data
Since SQL:2003
Inverse Distribution Functions
Two variants:
(categories)
(linear interpolation)
1 2 3 4
0.25 0.5 0.75 1
1 2 3 4
0.25 0.5 0.75 1
1 2 3 4 PERCENTILE_DISC
SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data
Since SQL:2003
Inverse Distribution Functions
Two variants:
(categories)
(linear interpolation)
1 2 3 4
0.25 0.5 0.75 1
1 2 3 4
0.25 0.5 0.75 1
1 2 3 4 PERCENTILE_DISC
0.25 0.5 0.75 1
1 2 3 4 PERCENTILE_DISC
SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data
Since SQL:2003
Inverse Distribution Functions
Two variants:
(categories)
(linear interpolation)
1 2 3 4
0.25 0.5 0.75 1
1 2 3 4
0.25 0.5 0.75 1
1 2 3 4 PERCENTILE_DISC
0.25 0.5 0.75 1
1 2 3 4 PERCENTILE_DISC
0.25 0.5 0.75 1
1 2 3 4 PERCENTILE_DISC(0.5)
SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data
Since SQL:2003
Inverse Distribution Functions
Two variants:
(categories)
(linear interpolation)
1 2 3 4
0.25 0.5 0.75 1
1 2 3 4
0.25 0.5 0.75 1
1 2 3 4 PERCENTILE_DISC
0.25 0.5 0.75 1
1 2 3 4 PERCENTILE_DISC
0.25 0.5 0.75 1
1 2 3 4 PERCENTILE_DISC(0.5)
0.25 0.5 0.75 1
1 2 3 4 PERCENTILE_CONT PERCENTILE_DISC(0.5)
SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data
Since SQL:2003
Inverse Distribution Functions
Two variants:
(categories)
(linear interpolation)
1 2 3 4
0.25 0.5 0.75 1
1 2 3 4
0.25 0.5 0.75 1
1 2 3 4 PERCENTILE_DISC
0.25 0.5 0.75 1
1 2 3 4 PERCENTILE_DISC
0.25 0.5 0.75 1
1 2 3 4 PERCENTILE_DISC(0.5)
0.25 0.5 0.75 1
1 2 3 4 PERCENTILE_CONT PERCENTILE_DISC(0.5)
0.25 0.5 0.75 1
1 2 3 4 PERCENTILE_CONT(0.5) PERCENTILE_DISC(0.5)
SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data
Since SQL:2003
Inverse Distribution Functions
Two variants:
(categories)
(linear interpolation)
Inverse Distribution Functions
Availability
1999 2001 2003 2005 2007 2009 2011 2013 2015 2017
5.1 10.3
[0]MariaDB MySQL
9.4
PostgreSQL SQLite
11.1
DB2 LUW
9iR1
Oracle
2012[0]
SQL Server
[0]Only as window function (requires OVER clause)OVER
SQL:2011 groups option
OVER (groups option) Since SQL:2011
ORDER BY x <frame unit> between 1 preceding and 1 following
OVER (groups option) Since SQL:2011
ORDER BY x <frame unit> between 1 preceding and 1 following rows, range
OVER (groups option) Since SQL:2011
ORDER BY x <frame unit> between 1 preceding and 1 following rows, range
x 1 3 3.5 3.5 4
CURRENT ROW
OVER (groups option) Since SQL:2011
ORDER BY x <frame unit> between 1 preceding and 1 following rows, range
rows
count(*)
x 1 3 3.5 3.5 4
CURRENT ROW
OVER (groups option) Since SQL:2011
ORDER BY x <frame unit> between 1 preceding and 1 following rows, range
range
x between current_row - 1 and current_row + 1
rows
count(*)
x 1 3 3.5 3.5 4
CURRENT ROW
OVER (groups option) Since SQL:2011
ORDER BY x <frame unit> between 1 preceding and 1 following rows, range
New in SQL:2011
groups
range
x between current_row - 1 and current_row + 1
rows
count(*)
x 1 3 3.5 3.5 4
CURRENT ROW
groups
count(distinct x)
OVER (groups option) Since SQL:2011
ORDER BY x <frame unit> between 1 preceding and 1 following rows, range
New in SQL:2011
groups
range
x between current_row - 1 and current_row + 1
rows
count(*)
x 1 3 3.5 3.5 4
CURRENT ROW
Since SQL:2011 OVER (groups option)
1999 2001 2003 2005 2007 2009 2011 2013 2015 2017
5.1
MariaDB MySQL
11
PostgreSQL
3.28.0 SQLite
DB2 LUW Oracle SQL Server
System Versioning
(Time Traveling)
INSERT UPDATE DELETE are
DESTRUCTIVE
System Versioning The Problem
System Versioning Since SQL:2011
Table can be system versioned, application versioned or both.
CREATE TABLE t (...,
System Versioning Since SQL:2011
Table can be system versioned, application versioned or both.
CREATE TABLE t (..., start_ts TIMESTAMP(9) GENERATED ALWAYS AS ROW START,
System Versioning Since SQL:2011
Table can be system versioned, application versioned or both.
CREATE TABLE t (..., start_ts TIMESTAMP(9) GENERATED ALWAYS AS ROW START, end_ts TIMESTAMP(9) GENERATED ALWAYS AS ROW END,
System Versioning Since SQL:2011
Table can be system versioned, application versioned or both.
CREATE TABLE t (..., start_ts TIMESTAMP(9) GENERATED ALWAYS AS ROW START, end_ts TIMESTAMP(9) GENERATED ALWAYS AS ROW END, PERIOD FOR SYSTEM_TIME (start_ts, end_ts)
System Versioning Since SQL:2011
Table can be system versioned, application versioned or both.
CREATE TABLE t (..., start_ts TIMESTAMP(9) GENERATED ALWAYS AS ROW START, end_ts TIMESTAMP(9) GENERATED ALWAYS AS ROW END, PERIOD FOR SYSTEM_TIME (start_ts, end_ts) ) WITH SYSTEM VERSIONING
System Versioning Since SQL:2011
Table can be system versioned, application versioned or both.
ID Data start_ts end_ts 1 X 10:00:00
UPDATE ... SET DATA = 'Y' ...
ID Data start_ts end_ts 1 X 10:00:00 11:00:00 1 Y 11:00:00
DELETE ... WHERE ID = 1 INSERT ... (ID, DATA) VALUES (1, 'X')
System Versioning Since SQL:2011
ID Data start_ts end_ts 1 X 10:00:00
UPDATE ... SET DATA = 'Y' ...
ID Data start_ts end_ts 1 X 10:00:00 11:00:00 1 Y 11:00:00
DELETE ... WHERE ID = 1 INSERT ... (ID, DATA) VALUES (1, 'X')
System Versioning Since SQL:2011
ID Data start_ts end_ts 1 X 10:00:00
UPDATE ... SET DATA = 'Y' ...
ID Data start_ts end_ts 1 X 10:00:00 11:00:00 1 Y 11:00:00
DELETE ... WHERE ID = 1 INSERT ... (ID, DATA) VALUES (1, 'X')
System Versioning Since SQL:2011
ID Data start_ts end_ts 1 X 10:00:00
UPDATE ... SET DATA = 'Y' ...
ID Data start_ts end_ts 1 X 10:00:00 11:00:00 1 Y 11:00:00
DELETE ... WHERE ID = 1
ID Data start_ts end_ts 1 X 10:00:00 11:00:00 1 Y 11:00:00 12:00:00
System Versioning Since SQL:2011
Although multiple versions exist, only the “current”
After 12:00:00, SELECT * FROM t doesn’t return anything anymore.
ID Data start_ts end_ts 1 X 10:00:00 11:00:00 1 Y 11:00:00 12:00:00
System Versioning Since SQL:2011
ID Data start_ts end_ts 1 X 10:00:00 11:00:00 1 Y 11:00:00 12:00:00
With FOR … AS OF you can query anything you like: SELECT * FROM t FOR SYSTEM_TIME AS OF TIMESTAMP '2019-05-02 10:30:00'
ID Data start_ts end_ts 1 X 10:00:00 11:00:00
System Versioning Since SQL:2011
System Versioning Since SQL:2011
1999 2001 2003 2005 2007 2009 2011 2013 2015 2017
5.1 10.3
MariaDB MySQL PostgreSQL SQLite
10.1
DB2 LUW
10gR1[0] 11gR1[1]
Oracle
2016
SQL Server
[0]Short term using Flashback. [1]Flashback Archive. Proprietery syntax.FOR PORTION OF
(UPDATE, DELETE)
CREATE TABLE t ( ...,
Since SQL:2011
Table can be system versioned, application versioned or both.
FOR PORTION OF
CREATE TABLE t ( ..., start_ts TIMESTAMP(9),
Since SQL:2011
Table can be system versioned, application versioned or both.
FOR PORTION OF
CREATE TABLE t ( ..., start_ts TIMESTAMP(9), end_ts TIMESTAMP(9),
Since SQL:2011
Table can be system versioned, application versioned or both.
FOR PORTION OF
CREATE TABLE t ( ..., start_ts TIMESTAMP(9), end_ts TIMESTAMP(9), PERIOD FOR app (start_ts, end_ts) )
Since SQL:2011
Table can be system versioned, application versioned or both.
FOR PORTION OF
ID Data start_ts end_ts 1 X 10:00:00 12:00:00
UPDATE t FOR PORTION OF app FROM '10:30:00' TO '11:30:00' SET DATA = 'Y'
ID Data start_ts end_ts 1 X 10:00:00 10:30:00 1 Y 10:30:00 11:30:00 1 X 11:30:00 12:00:00
INSERT t (ID, DATA, start_ts , end_ts) VALUES ( 1, 'X', '10:00:00', '12:00:00')
Since SQL:2011 FOR PORTION OF
ID Data start_ts end_ts 1 X 10:00:00 12:00:00
UPDATE t FOR PORTION OF app FROM '10:30:00' TO '11:30:00' SET DATA = 'Y'
ID Data start_ts end_ts 1 X 10:00:00 10:30:00 1 Y 10:30:00 11:30:00 1 X 11:30:00 12:00:00
INSERT t (ID, DATA, start_ts , end_ts) VALUES ( 1, 'X', '10:00:00', '12:00:00')
Since SQL:2011 FOR PORTION OF
ID Data start_ts end_ts 1 X 10:00:00 12:00:00
UPDATE t FOR PORTION OF app FROM '10:30:00' TO '11:30:00' SET DATA = 'Y'
ID Data start_ts end_ts 1 X 10:00:00 10:30:00 1 Y 10:30:00 11:30:00 1 X 11:30:00 12:00:00
INSERT t (ID, DATA, start_ts , end_ts) VALUES ( 1, 'X', '10:00:00', '12:00:00')
Since SQL:2011 FOR PORTION OF
FOR PORTION OF Since SQL:2011
1999 2001 2003 2005 2007 2009 2011 2013 2015 2017
5.1
MariaDB MySQL PostgreSQL SQLite
10.5
DB2 LUW Oracle SQL Server
10.4
FOR PORTION OF Since SQL:2011
1999 2001 2003 2005 2007 2009 2011 2013 2015 2017
5.1
MariaDB MySQL PostgreSQL SQLite
10.5
DB2 LUW Oracle SQL Server
10.4
Since SQL:2011
Application Versioning
For a useful application versioning, WITHOUT OVERLAPS constraints are required.
Apparently this was moved to MariaDB 10.5: https://jira.mariadb.org/browse/MDEV-16978
https://commons.wikimedia.org/wiki/File:Tamias_striatus_CT.jpg
Mühsam ernährt sich das Eichhörnchen
@ModernSQL modern-sql.com
My other website:
This is the training you are looking for: https://winand.at/