advanced sql ii advanced aggregation and olap
play

Advanced SQL II Advanced Aggregation and OLAP 5DV120 Database - PowerPoint PPT Presentation

Advanced SQL II Advanced Aggregation and OLAP 5DV120 Database System Principles Ume a University Department of Computing Science Stephen J. Hegner hegner@cs.umu.se http://www.cs.umu.se/~hegner Advanced SQL II Advanced


  1. Advanced SQL II — Advanced Aggregation and OLAP 5DV120 — Database System Principles Ume˚ a University Department of Computing Science Stephen J. Hegner hegner@cs.umu.se http://www.cs.umu.se/~hegner Advanced SQL II — Advanced Aggregation and OLAP 20130417 Slide 1 of 35

  2. Aggregation Operators — Review Query: For each department, find the minimum, maximum, average salary, as well as the number of employees. SELECT dept_name , MIN(salary) AS min_sal , MAX(salary) AS max_sal , CAST (AVG(salary) as NUMERIC (8 ,2)) AS avg_sal , COUNT (*) AS n_emp FROM instructor GROUP BY dept_name UNION SELECT dept_name , 0, 0, 0, 0 FROM department as D WHERE (NOT EXISTS (SELECT * FROM instructor NATURAL JOIN department AS DI WHERE D.dept_name=DI.dept_name )); An important rule: The attributes which are listed in the GROUP BY clause must include those which occur in the SELECT and are not aggregated. • Will not work (even though it clearly should): SELECT dept_name , building , MIN(salary) AS min_sal , MAX(salary) AS max_sal , CAST (AVG(salary) as NUME02RIC (8 ,2)) AS avg_sal , COUNT (*) AS n_emp FROM instructor NATURAL JOIN department GROUP BY dept_name UNION ... Advanced SQL II — Advanced Aggregation and OLAP 20130417 Slide 2 of 35

  3. Omission of GROUP BY — Review • If the GROUP BY clause is omitted, the aggregation is over the entire table. SELECT MIN(salary) AS min_sal , MAX(salary) AS max_sal , CAST (AVG(salary) as NUMERIC (8 ,2)) AS avg_sal , COUNT (*) AS n_emp FROM instructor ; • In this case, there must be no non-aggregated attributes in the SELECT clause. • Does not work: SELECT dept_name , MIN(salary) AS min_sal , MAX(salary) AS max_sal , CAST (AVG(salary) as NUMERIC (8 ,2)) AS avg_sal , COUNT (*) AS n_emp FROM instructor ; • In the above case, dept name must appear in the GROUP BY clause. Advanced SQL II — Advanced Aggregation and OLAP 20130417 Slide 3 of 35

  4. The HAVING Clause — Review Query: For each department with at least two distinct instructors, find the minimum, maximum, average salary, as well as the number of employees. • The following does not work: SELECT dept_name , MIN(salary) AS min_sal , MAX(salary) AS max_sal , CAST (AVG(salary) as NUMERIC (8 ,2)) AS avg_sal , COUNT (*) AS n_emp FROM instructor WHERE (COUNT (*) >= 2); GROUP BY dept_name; • The problem is that the WHERE clause is evaluated before the aggregation. • The solution is to use a HAVING clause, which is evaluated after the aggregation. SELECT dept_name , MIN(salary) AS min_sal , MAX(salary) AS max_sal , CAST (AVG(salary) as NUMERIC (8 ,2)) AS avg_sal , COUNT (*) AS n_emp FROM instructor GROUP BY dept_name HAVING (COUNT (*) >= 2); • The HAVING clause must come after the GROUP BY clause. Advanced SQL II — Advanced Aggregation and OLAP 20130417 Slide 4 of 35

  5. Embedded Queries in the HAVING Clause — Review Query: Find the department(s) with the greatest number of instructors. SELECT I1.dept_name , COUNT(I1 .*) FROM instructor as I1 GROUP BY dept_name HAVING NOT EXISTS (SELECT I2.dept_name , COUNT(I2 .*) FROM instructor AS I2 GROUP BY I2.dept_name HAVING (COUNT(I2 .*) > COUNT(I1 .*)) ); • It is also possible to do this with an embedded subquery in the FROM clause. SELECT I.dept_name , I.n_instr FROM (SELECT dept_name , COUNT (*) AS n_instr FROM instructor GROUP BY dept_name ) AS I WHERE I.n_instr >= ALL (SELECT n_instr FROM (SELECT dept_name , COUNT (*) AS n_instr FROM instructor GROUP BY dept_name ) AS Pointless ); • Note the alias Pointless which is required by SQL rules. Advanced SQL II — Advanced Aggregation and OLAP 20130417 Slide 5 of 35

  6. Window Functions I Review Query: For each building, find the average budget for all departments in that building. SELECT building , AVG(budget) AS avg_for_bldg FROM department GROUP BY building ORDER BY AVG(budget) DESC; • To provide this averaged information while still providing full information for each individual department, a window function is very useful. SELECT dept_name , building , budget , CAST ( AVG(budget) OVER (PARTITION BY building) AS NUMERIC (8 ,2) ) AS avg_for_bldg FROM department ORDER BY building , budget DESC ; dept_name | building | budget | avg_for_bldg -- ----------+----------+-----------+-------------- Music | Packard | 80000.00 | 80000.00 Finance | Painter | 120000.00 | 85000.00 History | Painter | 50000.00 | 85000.00 Comp. Sci. | Taylor | 100000.00 | 92500.00 Elec. Eng. | Taylor | 85000.00 | 92500.00 Biology | Watson | 90000.00 | 80000.00 Physics | Watson | 70000.00 | 80000.00 • Note that a GROUP BY clause is not required, despite the aggregation. Advanced SQL II — Advanced Aggregation and OLAP 20130417 Slide 6 of 35

  7. Window Functions II • When a window function is used more than once, it is helpful to declare it by name. SELECT dept_name , building , budget , CAST ( AVG(budget) OVER (PARTITION BY building) AS NUMERIC (8 ,2) ) AS avg_for_bldg , CAST ( budget - AVG(budget) OVER (PARTITION BY building) AS NUMERIC (8 ,2) ) As deviation FROM department ; • The above query is represented more compactly as: SELECT dept_name , building , budget , CAST (AVG(budget) OVER W AS NUMERIC (8 ,2)) AS avg_for_bldg , CAST (budget - AVG(budget) OVER W AS NUMERIC (8 ,2)) AS deviation FROM department WINDOW W AS (PARTITION BY building ); dept_name | building | budget | avg_for_bldg | deviation -- ----------+----------+-----------+--------------+----------- Music | Packard | 80000.00 | 80000.00 | 0.00 Finance | Painter | 120000.00 | 85000.00 | 35000.00 History | Painter | 50000.00 | 85000.00 | -35000.00 Comp. Sci. | Taylor | 100000.00 | 92500.00 | 7500.00 Elec. Eng. | Taylor | 85000.00 | 92500.00 | -7500.00 Biology | Watson | 90000.00 | 80000.00 | 10000.00 Physics | Watson | 70000.00 | 80000.00 | -10000.00 Advanced SQL II — Advanced Aggregation and OLAP 20130417 Slide 7 of 35

  8. Window Functions III • Here is a more complex query which provides, for each instructor salary, its percentage of the average within the department. SELECT ID , name , dept_name , salary , CAST (( salary *100)/ AVG(salary) OVER (PARTITION BY dept_name) AS NUMERIC (5 ,2) ) AS pct_avg_in_dept FROM instructor NATURAL JOIN department ; id | name | dept_name | salary | pct_avg_in_dept -- -----+------------+------------+----------+----------------- 76766 | Crick | Biology | 72000.00 | 100.00 10101 | Srinivasan | Comp. Sci. | 65000.00 | 84.05 45565 | Katz | Comp. Sci. | 75000.00 | 96.98 83821 | Brandt | Comp. Sci. | 92000.00 | 118.97 98345 | Kim | Elec. Eng. | 80000.00 | 100.00 12121 | Wu | Finance | 90000.00 | 105.88 76543 | Singh | Finance | 80000.00 | 94.12 32343 | El Said | History | 60000.00 | 98.36 58583 | Califieri | History | 62000.00 | 101.64 15151 | Mozart | Music | 40000.00 | 100.00 33456 | Gold | Physics | 87000.00 | 95.60 22222 | Einstein | Physics | 95000.00 | 104.40 Advanced SQL II — Advanced Aggregation and OLAP 20130417 Slide 8 of 35

  9. Window Functions IV • Window functions are also very useful for computing running averages. • Suppose that the following simple table of years and sales amounts is given. CREATE TABLE Sales (year NUMERIC (4 ,0) NOT NULL , amount NUMERIC (8 ,2) NOT NULL , PRIMARY KEY (year) ); • The following query provides a moving average of sales for the current year and two preceding. SELECT Year , Amount , CAST ( AVG(Amount) OVER (ORDER BY year ROWS 2 PRECEDING) AS NUMERIC (8 ,2) ) AS mvg_avg_3_yr FROM Sales; year | amount | mvg_avg_3_yr year | amount | mvg_avg_3_yr -- ----+-----------+-------------- -- ----+-----------+-------------- 1990 | 52134.00 | 52134.00 2001 | 95000.00 | 93333.33 1991 | 66000.00 | 59067.00 2002 | 98000.00 | 96000.00 1992 | 75000.00 | 64378.00 2003 | 96000.00 | 96333.33 1993 | 65000.00 | 68666.67 2004 | 101000.00 | 98333.33 1994 | 70000.00 | 70000.00 2005 | 99000.00 | 98666.67 1995 | 80000.00 | 71666.67 2006 | 104000.00 | 101333.33 1996 | 85000.00 | 78333.33 2007 | 120000.00 | 107666.67 1997 | 82000.00 | 82333.33 2008 | 125000.00 | 116333.33 1998 | 88000.00 | 85000.00 2009 | 115000.00 | 120000.00 1999 | 90000.00 | 86666.67 2010 | 130000.00 | 123333.33 2000 | 95000.00 | 91000.00 2011 | 175000.00 | 140000.00 Advanced SQL II — Advanced Aggregation and OLAP 20130417 Slide 9 of 35

  10. Window Functions V • Other options for the moving average are also possible. • All preceding years as well as the current: SELECT Year , Amount , CAST ( AVG(Amount) OVER (ORDER BY year ROWS UNBOUNDED PRECEDING) AS NUMERIC (8 ,2) ) AS ubd_prec_mvg_avg FROM Sales; • A three-year running average which includes the preceding as well as the following year: SELECT Year , Amount , CAST ( AVG(Amount) OVER (ORDER BY year ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS NUMERIC (8 ,2) ) AS moving_avg_1p_1f FROM Sales; Advanced SQL II — Advanced Aggregation and OLAP 20130417 Slide 10 of 35

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend