GROUP BY AND HAVING So far, weve applied aggregation operatorsto all - - PDF document

group by and having
SMART_READER_LITE
LIVE PREVIEW

GROUP BY AND HAVING So far, weve applied aggregation operatorsto all - - PDF document

M ATH IN SQL A GGREGATION O PERATORS Operators on sets of tuples. Significant extension of relational algebra. SUM ( [DISTINCT] A): the sum of all (unique) values in attribute A. AVG ( [DISTINCT] A): the average of all (unique)


slide-1
SLIDE 1

MATH IN SQL

2

AGGREGATION OPERATORS

 Operators on sets of tuples.  Significant extension of relational algebra.  SUM ( [DISTINCT] A): the sum of all (unique) values in

attribute A.

 AVG ( [DISTINCT] A): the average of all (unique) values

in attribute A.

SELECT AVG ( DISTINCT S.age) FROM Sailors S WHERE S.rating=10; SELECT AVG (S.age) FROM Sailors S;

3 3 3

AGGREGATION OPERATORS

 Operators on sets of tuples.  Significant extension of relational algebra.  MAX (A): the maximum value in attribute A.  MIN (A): the minimum value in attribute A. SELECT S.sname FROM Sailors S WHERE S.rating= (SELECT MAX(S2.rating) FROM Sailors S2); SELECT MAX(rating) FROM Sailors;

slide-2
SLIDE 2

4

AGGREGATION OPERATORS

 Operators on sets of tuples.  Significant extension of relational algebra.  COUNT (*): the number of tuples.

SELECT COUNT (*) FROM Sailors S

5

AGGREGATION OPERATORS

 Operators on sets of tuples.  Significant extension of relational algebra.  COUNT ( [DISTINCT] A): the number of (unique) values

in attribute A. SELECT COUNT (DISTINCT S.rating) FROM Sailors S WHERE S.sname=‘Bob’;

6 6 6

AGGREGATION OPERATORS

 Find name and age of

the oldest sailor(s).

 The first query looks

correct, but is illegal.

 Thoughts as to why?  The second query is a

correct and legal solution.

SELECT S.sname, MAX (S.age) FROM Sailors S; SELECT S.sname, S.age FROM Sailors S WHERE S.age =

(SELECT MAX (S2.age)

FROM Sailors S2);

slide-3
SLIDE 3

7

GROUP BY AND HAVING

 So far, we’ve applied aggregation operatorsto all

(qualifying) tuples. Sometimes, we want to apply them to each of several groups of tuples.

 Find the age of the youngest sailor for each rating

value.

 Suppose we know that rating values go from 1 to 10;

we can write ten (!) queries that look like this:

 But in general, we don’t know how many rating values

exist, and what these rating values are.

 Plus, it’s a waste of time to write so many queries

SELECT MIN (S.age) FROM Sailors S WHERE S.rating= i; For i = 1, 2, ... , 10:

8

GROUP BY AND HAVING

 A group is a set of tuples that have the same

value for all attributes grouping-list.

 The target-list contains  attribute names  terms with aggregation operations.  Attribute list must be a subset of grouping-list.  Each answer tuple correspondsto a group, and

  • utput attributes must have a single value per

group.

SELECT [DISTINCT] target-list FROM

relation-list

WHERE

qualification

GROUP BY grouping-list HAVING

group-qualification Notice the notation

9 9 9

CONCEPTUAL EVALUATION

 Given: SELECT S.rating, MIN(S.age) as minage FROM Sailors S WHERE S.age >= 18 GROUP BY S.rating HAVING COUNT(*) > 1  Step 1  The cross-product of relation-list is computed  In this instance, it’s only Sailors

slide-4
SLIDE 4

10 10 10

CONCEPTUAL EVALUATION

 Given: SELECT S.rating, MIN(S.age) as minage FROM Sailors S WHERE S.age >= 18 GROUP BY S.rating HAVING COUNT(*) > 1  Step 2  Tuples that fail qualification are discarded  ‘unnecessary’ attributes are deleted

11 11 11

CONCEPTUAL EVALUATION

 Given: SELECT S.rating, MIN(S.age) as minage FROM Sailors S WHERE S.age >= 18 GROUP BY S.rating HAVING COUNT(*) > 1  Step 3  Remaining tuples are partitioned into

groups by the value of attributes in grouping-list

12 12 12

CONCEPTUAL EVALUATION

 Given: SELECT S.rating, MIN(S.age) as minage FROM Sailors S WHERE S.age >= 18 GROUP BY S.rating HAVING COUNT(*) > 1  Step 4  The group-qualification is then applied to

eliminate groups that do not satisfy this condition.

slide-5
SLIDE 5

13 13 13

CONCEPTUAL EVALUATION

 Given: SELECT S.rating, MIN(S.age) as minage FROM Sailors S WHERE S.age >= 18 GROUP BY S.rating HAVING COUNT(*) > 1  Step 5  One answer tuple is generated per qualifying group by

applying the aggregation operator.

14 14 14

GROUP BY AND HAVING

 Find the age of the youngest

sailor with age 18, for each rating with at least 2 such sailors.

SELECT S.rating, MIN (S.age) FROM Sailors S WHERE S.age >= 18 GROUP BY S.rating HAVING COUNT (*) > 1;

 Only S.rating and S.age are

mentioned in the SELECT,

GROUP BY or HAVING clauses;

  • ther attributes `unnecessary’.

 2nd column of result is

unnamed

What to do?

sid sname rating age 22 dustin 7 45.0 31 lubber 8 55.5 71 zorba 10 16.0 64 horatio 7 35.0 29 brutus 1 33.0 58 rusty 10 35.0 rating age 1 33.0 7 45.0 7 35.0 8 55.5 10 35.0 rating 7 35.0 Answer relation

15 15 15

GROUP BY AND HAVING

 For each red boat, find the number of

reservations for this boat.

SELECT B.bid, COUNT (*) AS scount FROM Sailors S, Boats B, Reserves R WHERE S.sid=R.sid AND R.bid=B.bid AND B.color=‘red’ GROUP BY B.bid;

 Grouping over a join of three relations.  What do we get if we remove B.color=‘red’

from the WHERE clause and add a HAVING clause with this condition?

 What if we drop Sailors and the condition

involving S.sid?

slide-6
SLIDE 6

16 16 16

GROUP BY AND HAVING

 Find the age of the youngest sailor with age > 18, for each

rating with at least 2 sailors (of any age).

SELECT S.rating, MIN (S.age) FROM Sailors S WHERE S.age > 18 GROUP BY S.rating HAVING 1 < (SELECT COUNT (*) FROM Sailors S2 WHERE S.rating=S2.rating);

 Shows HAVINGclause can also contain a subquery.  What if HAVING clause is replaced by:

HAVINGCOUNT(*) >1

17 17 17

GROUP BY AND HAVING

 Find those ratings for which the average age is the minimum

  • ver all ratings.

 Aggregation operations cannot be nested!  WRONG:

SELECT S.rating FROM Sailors S WHERE S.age = (SELECT MIN (AVG (S2.age)) FROM Sailors S2);

 Correctsolution:

SELECT Temp.rating, Temp.avgage FROM (SELECT S.rating, AVG (S.age) AS avgage FROM Sailors S GROUP BY S.rating) AS Temp WHERE Temp.avgage = (SELECT MIN (Temp.avgage) FROM Temp);

ORDERING & TOP/BOTTOM

slide-7
SLIDE 7

19 19 19

ORDER BY

 The ORDER BY keyword is used to sort the

result-set by a specified column.

 The ORDER BY keyword sort the recordsin

ascending order by default.

 If you want to sort the records in a descending

  • rder, you can use the DESC keyword.

20 20 20

TOP/BOTTOM

 The TOP clause is used to specify the number of

records to return.

 The TOP clause can be very useful on large tables

with thousands of records

 Returning a large number of records can impact on

performance

 Can ‘sample’ the table using TOP  Not all database systems support the TOP clause

  • r implement it in different fashion

21 21 21

TOP/BOTTOM

SQL Server SELECT TOP number|percent column_name(s) FROM table_name Ex: SELECT TOP 5 * FROM Persons MySQL SELECT column_name(s) FROM table_name LIMIT number Ex: SELECT * FROM Persons LIMIT 5

slide-8
SLIDE 8

22 22 22

TOP/BOTTOM

Oracle SELECT column_name(s) FROM table_name WHERE ROWNUM <= number Ex: SELECT * FROM Persons WHERE ROWNUM <=5 DB2 SELECT column_name(s) FROM table_name FETCH FIRST number ROWS ONL Y Ex: SELECT * FROM Persons FETCH FIRST 5 ROWS ONL Y

23 23 23

TOP/BOTTOM

Can specify  Fixed number

SELECT TOP 10 * …

 A percent

SELECT TOP 10 PERCENT * … 24 24 24

TOP/BOTTOM

 How to return the oldest 5 rentals?  How to return the newest 5 rentals?

slide-9
SLIDE 9

25 25 25

TOP/BOTTOM

 How to return the 3rd newest rental?

26 26 26

SUMMARY

 SQL was an important factor in the early

acceptance of the relational model; more natural than earlier, procedural query languages.

 All queries that can be expressed in relational

algebra can also be formulated in SQL.

 In addition, SQL has significantly more

expressive power than relational algebra, in particular aggregation operationsand grouping.

 Many alternative ways to write a query; query

  • ptimizer looks for most efficient evaluation plan.

 In practice, users need to be aware of how queries

are optimized and evaluated for most efficient results.