Advanced SQL II Advanced Aggregation and OLAP 5DV120 Database - PowerPoint PPT Presentation

Advanced SQL II — Advanced Aggregation and OLAP 5DV120 — Database System Principles Ume˚ a University Department of Computing Science Stephen J. Hegner hegner@cs.umu.se http://www.cs.umu.se/~hegner Advanced SQL II — Advanced Aggregation and OLAP 20130417 Slide 1 of 35

Aggregation Operators — Review Query: For each department, find the minimum, maximum, average salary, as well as the number of employees. SELECT dept_name , MIN(salary) AS min_sal , MAX(salary) AS max_sal , CAST (AVG(salary) as NUMERIC (8 ,2)) AS avg_sal , COUNT (*) AS n_emp FROM instructor GROUP BY dept_name UNION SELECT dept_name , 0, 0, 0, 0 FROM department as D WHERE (NOT EXISTS (SELECT * FROM instructor NATURAL JOIN department AS DI WHERE D.dept_name=DI.dept_name )); An important rule: The attributes which are listed in the GROUP BY clause must include those which occur in the SELECT and are not aggregated. • Will not work (even though it clearly should): SELECT dept_name , building , MIN(salary) AS min_sal , MAX(salary) AS max_sal , CAST (AVG(salary) as NUME02RIC (8 ,2)) AS avg_sal , COUNT (*) AS n_emp FROM instructor NATURAL JOIN department GROUP BY dept_name UNION ... Advanced SQL II — Advanced Aggregation and OLAP 20130417 Slide 2 of 35

Omission of GROUP BY — Review • If the GROUP BY clause is omitted, the aggregation is over the entire table. SELECT MIN(salary) AS min_sal , MAX(salary) AS max_sal , CAST (AVG(salary) as NUMERIC (8 ,2)) AS avg_sal , COUNT (*) AS n_emp FROM instructor ; • In this case, there must be no non-aggregated attributes in the SELECT clause. • Does not work: SELECT dept_name , MIN(salary) AS min_sal , MAX(salary) AS max_sal , CAST (AVG(salary) as NUMERIC (8 ,2)) AS avg_sal , COUNT (*) AS n_emp FROM instructor ; • In the above case, dept name must appear in the GROUP BY clause. Advanced SQL II — Advanced Aggregation and OLAP 20130417 Slide 3 of 35

The HAVING Clause — Review Query: For each department with at least two distinct instructors, find the minimum, maximum, average salary, as well as the number of employees. • The following does not work: SELECT dept_name , MIN(salary) AS min_sal , MAX(salary) AS max_sal , CAST (AVG(salary) as NUMERIC (8 ,2)) AS avg_sal , COUNT (*) AS n_emp FROM instructor WHERE (COUNT (*) >= 2); GROUP BY dept_name; • The problem is that the WHERE clause is evaluated before the aggregation. • The solution is to use a HAVING clause, which is evaluated after the aggregation. SELECT dept_name , MIN(salary) AS min_sal , MAX(salary) AS max_sal , CAST (AVG(salary) as NUMERIC (8 ,2)) AS avg_sal , COUNT (*) AS n_emp FROM instructor GROUP BY dept_name HAVING (COUNT (*) >= 2); • The HAVING clause must come after the GROUP BY clause. Advanced SQL II — Advanced Aggregation and OLAP 20130417 Slide 4 of 35

Embedded Queries in the HAVING Clause — Review Query: Find the department(s) with the greatest number of instructors. SELECT I1.dept_name , COUNT(I1 .*) FROM instructor as I1 GROUP BY dept_name HAVING NOT EXISTS (SELECT I2.dept_name , COUNT(I2 .*) FROM instructor AS I2 GROUP BY I2.dept_name HAVING (COUNT(I2 .*) > COUNT(I1 .*)) ); • It is also possible to do this with an embedded subquery in the FROM clause. SELECT I.dept_name , I.n_instr FROM (SELECT dept_name , COUNT (*) AS n_instr FROM instructor GROUP BY dept_name ) AS I WHERE I.n_instr >= ALL (SELECT n_instr FROM (SELECT dept_name , COUNT (*) AS n_instr FROM instructor GROUP BY dept_name ) AS Pointless ); • Note the alias Pointless which is required by SQL rules. Advanced SQL II — Advanced Aggregation and OLAP 20130417 Slide 5 of 35

Window Functions II • When a window function is used more than once, it is helpful to declare it by name. SELECT dept_name , building , budget , CAST ( AVG(budget) OVER (PARTITION BY building) AS NUMERIC (8 ,2) ) AS avg_for_bldg , CAST ( budget - AVG(budget) OVER (PARTITION BY building) AS NUMERIC (8 ,2) ) As deviation FROM department ; • The above query is represented more compactly as: SELECT dept_name , building , budget , CAST (AVG(budget) OVER W AS NUMERIC (8 ,2)) AS avg_for_bldg , CAST (budget - AVG(budget) OVER W AS NUMERIC (8 ,2)) AS deviation FROM department WINDOW W AS (PARTITION BY building ); dept_name | building | budget | avg_for_bldg | deviation -- ----------+----------+-----------+--------------+----------- Music | Packard | 80000.00 | 80000.00 | 0.00 Finance | Painter | 120000.00 | 85000.00 | 35000.00 History | Painter | 50000.00 | 85000.00 | -35000.00 Comp. Sci. | Taylor | 100000.00 | 92500.00 | 7500.00 Elec. Eng. | Taylor | 85000.00 | 92500.00 | -7500.00 Biology | Watson | 90000.00 | 80000.00 | 10000.00 Physics | Watson | 70000.00 | 80000.00 | -10000.00 Advanced SQL II — Advanced Aggregation and OLAP 20130417 Slide 7 of 35

Window Functions III • Here is a more complex query which provides, for each instructor salary, its percentage of the average within the department. SELECT ID , name , dept_name , salary , CAST (( salary *100)/ AVG(salary) OVER (PARTITION BY dept_name) AS NUMERIC (5 ,2) ) AS pct_avg_in_dept FROM instructor NATURAL JOIN department ; id | name | dept_name | salary | pct_avg_in_dept -- -----+------------+------------+----------+----------------- 76766 | Crick | Biology | 72000.00 | 100.00 10101 | Srinivasan | Comp. Sci. | 65000.00 | 84.05 45565 | Katz | Comp. Sci. | 75000.00 | 96.98 83821 | Brandt | Comp. Sci. | 92000.00 | 118.97 98345 | Kim | Elec. Eng. | 80000.00 | 100.00 12121 | Wu | Finance | 90000.00 | 105.88 76543 | Singh | Finance | 80000.00 | 94.12 32343 | El Said | History | 60000.00 | 98.36 58583 | Califieri | History | 62000.00 | 101.64 15151 | Mozart | Music | 40000.00 | 100.00 33456 | Gold | Physics | 87000.00 | 95.60 22222 | Einstein | Physics | 95000.00 | 104.40 Advanced SQL II — Advanced Aggregation and OLAP 20130417 Slide 8 of 35

Window Functions IV • Window functions are also very useful for computing running averages. • Suppose that the following simple table of years and sales amounts is given. CREATE TABLE Sales (year NUMERIC (4 ,0) NOT NULL , amount NUMERIC (8 ,2) NOT NULL , PRIMARY KEY (year) ); • The following query provides a moving average of sales for the current year and two preceding. SELECT Year , Amount , CAST ( AVG(Amount) OVER (ORDER BY year ROWS 2 PRECEDING) AS NUMERIC (8 ,2) ) AS mvg_avg_3_yr FROM Sales; year | amount | mvg_avg_3_yr year | amount | mvg_avg_3_yr -- ----+-----------+-------------- -- ----+-----------+-------------- 1990 | 52134.00 | 52134.00 2001 | 95000.00 | 93333.33 1991 | 66000.00 | 59067.00 2002 | 98000.00 | 96000.00 1992 | 75000.00 | 64378.00 2003 | 96000.00 | 96333.33 1993 | 65000.00 | 68666.67 2004 | 101000.00 | 98333.33 1994 | 70000.00 | 70000.00 2005 | 99000.00 | 98666.67 1995 | 80000.00 | 71666.67 2006 | 104000.00 | 101333.33 1996 | 85000.00 | 78333.33 2007 | 120000.00 | 107666.67 1997 | 82000.00 | 82333.33 2008 | 125000.00 | 116333.33 1998 | 88000.00 | 85000.00 2009 | 115000.00 | 120000.00 1999 | 90000.00 | 86666.67 2010 | 130000.00 | 123333.33 2000 | 95000.00 | 91000.00 2011 | 175000.00 | 140000.00 Advanced SQL II — Advanced Aggregation and OLAP 20130417 Slide 9 of 35

Window Functions V • Other options for the moving average are also possible. • All preceding years as well as the current: SELECT Year , Amount , CAST ( AVG(Amount) OVER (ORDER BY year ROWS UNBOUNDED PRECEDING) AS NUMERIC (8 ,2) ) AS ubd_prec_mvg_avg FROM Sales; • A three-year running average which includes the preceding as well as the following year: SELECT Year , Amount , CAST ( AVG(Amount) OVER (ORDER BY year ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS NUMERIC (8 ,2) ) AS moving_avg_1p_1f FROM Sales; Advanced SQL II — Advanced Aggregation and OLAP 20130417 Slide 10 of 35

Advanced SQL II Advanced Aggregation and OLAP 5DV120 Database - PowerPoint PPT Presentation

Advanced SQL II Advanced Aggregation and OLAP 5DV120 Database System Principles Ume a University Department of Computing Science Stephen J. Hegner hegner@cs.umu.se http://www.cs.umu.se/~hegner Advanced SQL II Advanced

OLAP & Data Mining OLAP & Data Mining Agenda Agenda SQL Server Features (in short) SQL

2 Workloa d? 3 OLTP 4 OLAP OLTP 4 OLAP OLTP Streaming 4 Scan- OLAP OLTP Streaming

Data Warehouse and OLAP II Data Warehouse and OLAP II Week 6 1 Team Homework Assignment #8

Fjord Bridging the between Gap OLAP SQL and Jens-Peter Dittrich 1, *

SQL SQL SQL = Structured Query Language Standard query language for relational

SQL and JS Pitfalls Assignment 2 Preparation SQL Concepts SQL vs. NoSQL

What is SQL? SQL stands for Structured Query Language SQL lets you access and manipulate

BASIC SQL CHAPTER 4 (6/E) CHAPTER 8 (5/E) 1 CHAPTER 4 OUTLINE SQL Data Definition and

This Lecture SQL The SQL language SQL, the relational model, and E/R diagrams SQL Data

A1 (Part 2): Injection SQL Injection SQL injection is prevalent SQL injection is impactful Why a

Intermezzo: A typical database architecture 136 A typical database architecture SQL SQL SQL

Basic SQL Lecture 2 1 Outline Data in SQL Simple Queries in SQL Queries with more

Advanced SQL 01 The Core of SQL Torsten Grust Universitt Tbingen, Germany 1 The Core

Data Warehouse and OLAP Data Warehouse and OLAP Week 5 1 Midterm I Midterm I Friday, March

OLAP and Data Mining Chapter 17 OLTP Compared With OLAP On Line Transaction Processing

Saiku Saiku taking taking OLAP OLAP databases databases into into 21 21st st

15-721 DATABASE SYSTEMS Lecture #09 OLAP Indexes Andy Pavlo / / Carnegie Mellon University

Distributed OLAP Databases Lecture # 24 Database Systems Andy Pavlo AP AP Computer Science

Next Generation Is Islamic Fin inance Business Model - The Islamic Finance House - Facts

ELEN E6884 - Topics in Signal Processing Recap Topic: Speech Recognition Gaussian Mixture

Outline What is de c ision suppor t An Overview of Data What is a data war e house

DATA MANAGEMENT FOR BUSINESS INTELLIGENCE OLAP: On-Line Analytical Processing Salvatore Ruggieri

Do#the#middle#letters#of#OLAP#stand#for# Linear#Algebra#(LA)? ! Speaker: Lus A.

Multi-dimensional index structures Part I: motivation 144 Motivation: Data Warehouse A de fi

Advanced SQL II Advanced Aggregation and OLAP 5DV120 Database - PowerPoint PPT Presentation

Advanced SQL II Advanced Aggregation and OLAP 5DV120 Database System Principles Ume a University Department of Computing Science Stephen J. Hegner hegner@cs.umu.se http://www.cs.umu.se/~hegner Advanced SQL II Advanced

OLAP &amp; Data Mining OLAP &amp; Data Mining Agenda Agenda SQL Server Features (in short) SQL

2 Workloa d? 3 OLTP 4 OLAP OLTP 4 OLAP OLTP Streaming 4 Scan- OLAP OLTP Streaming

Data Warehouse and OLAP II Data Warehouse and OLAP II Week 6 1 Team Homework Assignment #8

Fjord Bridging the between Gap OLAP SQL and Jens-Peter Dittrich 1, *

SQL SQL SQL = Structured Query Language Standard query language for relational

SQL and JS Pitfalls Assignment 2 Preparation SQL Concepts SQL vs. NoSQL

What is SQL? SQL stands for Structured Query Language SQL lets you access and manipulate

BASIC SQL CHAPTER 4 (6/E) CHAPTER 8 (5/E) 1 CHAPTER 4 OUTLINE SQL Data Definition and

This Lecture SQL The SQL language SQL, the relational model, and E/R diagrams SQL Data

A1 (Part 2): Injection SQL Injection SQL injection is prevalent SQL injection is impactful Why a

Intermezzo: A typical database architecture 136 A typical database architecture SQL SQL SQL

Basic SQL Lecture 2 1 Outline Data in SQL Simple Queries in SQL Queries with more

Advanced SQL 01 The Core of SQL Torsten Grust Universitt Tbingen, Germany 1 The Core

Data Warehouse and OLAP Data Warehouse and OLAP Week 5 1 Midterm I Midterm I Friday, March

OLAP and Data Mining Chapter 17 OLTP Compared With OLAP On Line Transaction Processing

Saiku Saiku taking taking OLAP OLAP databases databases into into 21 21st st

15-721 DATABASE SYSTEMS Lecture #09 OLAP Indexes Andy Pavlo / / Carnegie Mellon University

Distributed OLAP Databases Lecture # 24 Database Systems Andy Pavlo AP AP Computer Science

Next Generation Is Islamic Fin inance Business Model - The Islamic Finance House - Facts

ELEN E6884 - Topics in Signal Processing Recap Topic: Speech Recognition Gaussian Mixture

Outline What is de c ision suppor t An Overview of Data What is a data war e house

DATA MANAGEMENT FOR BUSINESS INTELLIGENCE OLAP: On-Line Analytical Processing Salvatore Ruggieri

Do#the#middle#letters#of#OLAP#stand#for# Linear#Algebra#(LA)? ! Speaker: Lus A.

Multi-dimensional index structures Part I: motivation 144 Motivation: Data Warehouse A de fi

OLAP & Data Mining OLAP & Data Mining Agenda Agenda SQL Server Features (in short) SQL