More SQL January 30, 2020 Data Science CSCI 1951A Brown University - - PowerPoint PPT Presentation

more sql
SMART_READER_LITE
LIVE PREVIEW

More SQL January 30, 2020 Data Science CSCI 1951A Brown University - - PowerPoint PPT Presentation

More SQL January 30, 2020 Data Science CSCI 1951A Brown University Instructor: Ellie Pavlick HTAs: Josh Levin, Diane Mutako, Sol Zitter 1 Announcements People with overrides-Friday EOD to use codes, then we will give then to someone


slide-1
SLIDE 1

More SQL

January 30, 2020 Data Science CSCI 1951A Brown University Instructor: Ellie Pavlick HTAs: Josh Levin, Diane Mutako, Sol Zitter

1

slide-2
SLIDE 2

Announcements

  • People with overrides—-Friday EOD to use codes,

then we will give then to someone else. No new codes going out.

  • Project mixer—tomorrow at 4, 4th floor
  • Sign the collab policy!

2

slide-3
SLIDE 3

Outline

  • Last time: SQL for creating/manipulating data

tables

  • Today: SQL for querying databases
  • Keywords
  • NULLs
  • Execution Order, Nested Queries, Optimization

3

slide-4
SLIDE 4

Basic Template

SELECT <attribute list> FROM <table list> [ WHERE <condition> ] [ GROUP BY <attribute list> ] [ HAVING <condition> ] [ ORDER BY <attribute list> ];

4

slide-5
SLIDE 5

SELECT * FROM TWEET WHERE Text = “hey”

ID Time Text 389472 2019-01-01 12:34:56 hey 127890 2019-01-04 17:30:07 hey

TWEET

ID Time Text 389472 2019-01-01 12:34:56 hey 123794 2019-01-01 12:34:57 lol 596208 2019-01-02 3:14:15 :-D 782138 2019-01-04 15:04:57 1951A 4 lyfe 127890 2019-01-04 17:30:07 hey 173902 2019-01-05 3:34:18 i <3 1951A 893110 2019-01-06 12:21:53 i <3 1951A

SELECT <attribute list> FROM <table list> WHERE <condition>;

5

slide-6
SLIDE 6

SELECT ID, Time FROM TWEET WHERE Text = “hey”

ID Time 389472 2019-01-01 12:34:56 127890 2019-01-04 17:30:07

TWEET

ID Time Text 389472 2019-01-01 12:34:56 hey 123794 2019-01-01 12:34:57 lol 596208 2019-01-02 3:14:15 :-D 782138 2019-01-04 15:04:57 1951A 4 lyfe 127890 2019-01-04 17:30:07 hey 173902 2019-01-05 3:34:18 i <3 1951A 893110 2019-01-06 12:21:53 i <3 1951A

SELECT <attribute list> FROM <table list> WHERE <condition>;

6

slide-7
SLIDE 7

TWEET

ID Time Text 389472 2019-01-01 12:34:56 hey 123794 2019-01-01 12:34:57 lol 596208 2019-01-02 3:14:15 :-D 782138 2019-01-04 15:04:57 1951A 4 lyfe 127890 2019-01-04 17:30:07 hey 173902 2019-01-05 3:34:18 i <3 1951A 893110 2019-01-06 12:21:53 i <3 1951A

SELECT ID, Text FROM Tweet, Author WHERE ID = Tweet

AUTHOR

Person Tweet s 389472 d 123794 j 596208 d 782138 d 127890 j 173902 s 893110

7

slide-8
SLIDE 8

TWEET

ID Time Text 389472 2019-01-01 12:34:56 hey 123794 2019-01-01 12:34:57 lol 596208 2019-01-02 3:14:15 :-D 782138 2019-01-04 15:04:57 1951A 4 lyfe 127890 2019-01-04 17:30:07 hey 173902 2019-01-05 3:34:18 i <3 1951A 893110 2019-01-06 12:21:53 i <3 1951A

SELECT ID, Person FROM Tweet, Author WHERE ID = Tweet

ID Time Text Person Tweet 389472 2019-01-01 12:34:56 hey s 389472 123794 2019-01-01 12:34:57 lol d 123794 596208 2019-01-02 3:14:15 :-D j 596208 782138 2019-01-04 15:04:57 1951A 4 lyfe d 782138 127890 2019-01-04 17:30:07 hey d 127890 173902 2019-01-05 3:34:18 i <3 1951A j 173902 893110 2019-01-06 12:21:53 i <3 1951A s 893110

AUTHOR

Person Tweet s 389472 d 123794 j 596208 d 782138 d 127890 j 173902 s 893110

8

slide-9
SLIDE 9

TWEET

ID Time Text 389472 2019-01-01 12:34:56 hey 123794 2019-01-01 12:34:57 lol 596208 2019-01-02 3:14:15 :-D 782138 2019-01-04 15:04:57 1951A 4 lyfe 127890 2019-01-04 17:30:07 hey 173902 2019-01-05 3:34:18 i <3 1951A 893110 2019-01-06 12:21:53 i <3 1951A

SELECT ID, Person FROM Tweet, Author WHERE ID = Tweet “Join Condition”

AUTHOR

Person Tweet s 389472 d 123794 j 596208 d 782138 d 127890 j 173902 s 893110 ID Time Text Person Tweet 389472 2019-01-01 12:34:56 hey s 389472 123794 2019-01-01 12:34:57 lol d 123794 596208 2019-01-02 3:14:15 :-D j 596208 782138 2019-01-04 15:04:57 1951A 4 lyfe d 782138 127890 2019-01-04 17:30:07 hey d 127890 173902 2019-01-05 3:34:18 i <3 1951A j 173902 893110 2019-01-06 12:21:53 i <3 1951A s 893110

9

slide-10
SLIDE 10

TWEET

ID Time Text 389472 2019-01-01 12:34:56 hey 123794 2019-01-01 12:34:57 lol 596208 2019-01-02 3:14:15 :-D 782138 2019-01-04 15:04:57 1951A 4 lyfe 127890 2019-01-04 17:30:07 hey 173902 2019-01-05 3:34:18 i <3 1951A 893110 2019-01-06 12:21:53 i <3 1951A

AUTHOR

Person Tweet s 389472 d 123794 j 596208 d 782138 d 127890 j 173902 s 893110

SELECT ID, Person FROM Tweet, Author

ID Time Text Person Tweet 389472 2019-01-01 12:34:56 hey s 389472 389472 2019-01-01 12:34:56 hey d 123794 389472 2019-01-01 12:34:56 hey j 596208 389472 2019-01-01 12:34:56 hey d 782138 389472 2019-01-01 12:34:56 hey d 127890 389472 2019-01-01 12:34:56 hey j 173902 389472 2019-01-01 12:34:56 hey s 893110 123794 2019-01-01 12:34:57 lol s 389472 … … … … …

No condition -> cross product

10

slide-11
SLIDE 11

TWEET

ID Time Text 389472 2019-01-01 12:34:56 hey 123794 2019-01-01 12:34:57 lol 596208 2019-01-02 3:14:15 :-D 782138 2019-01-04 15:04:57 1951A 4 lyfe 127890 2019-01-04 17:30:07 hey 173902 2019-01-05 3:34:18 i <3 1951A 893110 2019-01-06 12:21:53 i <3 1951A

AUTHOR

Person Tweet s 389472 d 123794 j 596208 d 782138 d 127890 j 173902 s 893110

SELECT ID, Person FROM Tweet, Author WHERE ID = Tweet

ID Time Text Person Tweet 389472 2019-01-01 12:34:56 hey s 389472 123794 2019-01-01 12:34:57 lol d 123794 596208 2019-01-02 3:14:15 :-D j 596208 782138 2019-01-04 15:04:57 1951A 4 lyfe d 782138 127890 2019-01-04 17:30:07 hey d 127890 173902 2019-01-05 3:34:18 i <3 1951A j 173902 893110 2019-01-06 12:21:53 i <3 1951A s 893110

*actually, even with join condition, will do cross product first

11

slide-12
SLIDE 12

TWEET

ID Time Text 389472 2019-01-01 12:34:56 hey 123794 2019-01-01 12:34:57 lol 596208 2019-01-02 3:14:15 :-D 782138 2019-01-04 15:04:57 1951A 4 lyfe 127890 2019-01-04 17:30:07 hey 173902 2019-01-05 3:34:18 i <3 1951A 893110 2019-01-06 12:21:53 i <3 1951A

aliasing (to avoid ambiguity) SELECT ID, Text FROM Tweet AS t, Author AS a WHERE t.ID = a.Tweet

AUTHOR

Person Tweet s 389472 d 123794 j 596208 d 782138 d 127890 j 173902 s 893110

12

slide-13
SLIDE 13

TWEET

ID Time Text 389472 2019-01-01 12:34:56 hey 123794 2019-01-01 12:34:57 lol 596208 2019-01-02 3:14:15 :-D 782138 2019-01-04 15:04:57 1951A 4 lyfe 127890 2019-01-04 17:30:07 hey 173902 2019-01-05 3:34:18 i <3 1951A 893110 2019-01-06 12:21:53 i <3 1951A

aliasing (to avoid ambiguity) SELECT ID, Text FROM Tweet AS t, Author AS a WHERE t.ID = a.ID

AUTHOR

Person ID s 389472 d 123794 j 596208 d 782138 d 127890 j 173902 s 893110

13

slide-14
SLIDE 14

TWEET

ID Time Text 389472 2019-01-01 12:34:56 hey 123794 2019-01-01 12:34:57 lol 596208 2019-01-02 3:14:15 :-D 782138 2019-01-04 15:04:57 1951A 4 lyfe 127890 2019-01-04 17:30:07 hey 173902 2019-01-05 3:34:18 i <3 1951A 893110 2019-01-06 12:21:53 i <3 1951A

SELECT ID, Text FROM Tweet AS t, Author AS a WHERE t.ID = a.Tweet AND a.Person = “d”

AUTHOR

Person Tweet s 389472 d 123794 j 596208 d 782138 d 127890 j 173902 s 893110

14

slide-15
SLIDE 15

TWEET

ID Time Text 389472 2019-01-01 12:34:56 hey 123794 2019-01-01 12:34:57 lol 596208 2019-01-02 3:14:15 :-D 782138 2019-01-04 15:04:57 1951A 4 lyfe 127890 2019-01-04 17:30:07 hey 173902 2019-01-05 3:34:18 i <3 1951A 893110 2019-01-06 12:21:53 i <3 1951A ID Time Text Person Tweet 389472 2019-01-01 12:34:56 hey s 389472 123794 2019-01-01 12:34:57 lol d 123794 596208 2019-01-02 3:14:15 :-D j 596208 782138 2019-01-04 15:04:57 1951A 4 lyfe d 782138 127890 2019-01-04 17:30:07 hey d 127890 173902 2019-01-05 3:34:18 i <3 1951A j 173902 893110 2019-01-06 12:21:53 i <3 1951A s 893110

SELECT ID, Text FROM Tweet AS t, Author AS a WHERE t.ID = a.Tweet AND a.Person = “d”

AUTHOR

Person Tweet s 389472 d 123794 j 596208 d 782138 d 127890 j 173902 s 893110

15

slide-16
SLIDE 16

TWEET

ID Time Text 389472 2019-01-01 12:34:56 hey 123794 2019-01-01 12:34:57 lol 596208 2019-01-02 3:14:15 :-D 782138 2019-01-04 15:04:57 1951A 4 lyfe 127890 2019-01-04 17:30:07 hey 173902 2019-01-05 3:34:18 i <3 1951A 893110 2019-01-06 12:21:53 i <3 1951A

AUTHOR

Person Tweet s 389472 d 123794 j 596208 d 782138 d 127890 j 173902 s 893110 ID Time Text Person Tweet 123794 2019-01-01 12:34:57 lol d 123794 782138 2019-01-04 15:04:57 1951A 4 lyfe d 782138 127890 2019-01-04 17:30:07 hey d 127890

SELECT ID, Text FROM Tweet AS t, Author AS a WHERE t.ID = a.Tweet AND a.Person = “d”

16

slide-17
SLIDE 17

TWEET

ID Time Text 389472 2019-01-01 12:34:56 hey 123794 2019-01-01 12:34:57 lol 596208 2019-01-02 3:14:15 :-D 782138 2019-01-04 15:04:57 1951A 4 lyfe 127890 2019-01-04 17:30:07 hey 173902 2019-01-05 3:34:18 i <3 1951A 893110 2019-01-06 12:21:53 i <3 1951A

AUTHOR

Person Tweet s 389472 d 123794 j 596208 d 782138 d 127890 j 173902 s 893110 ID Text 123794 lol 782138 1951A 4 lyfe 127890 hey

SELECT ID, Text FROM Tweet AS t, Author AS a WHERE t.ID = a.Tweet AND a.Person = “d”

17

slide-18
SLIDE 18

Clicker Question!

18

slide-19
SLIDE 19

PERSON

Handle Name s Sol d Diane j Josh

RETWEET

Person Tweet s 1 s 2 d 1

Find names of people who have retweeted.

Clicker Question!

19

slide-20
SLIDE 20
  • Pst. Hint —>

Person Sol Sol Diane

Clicker Question!

PERSON

Handle Name s Sol d Diane j Josh

RETWEET

Person Tweet s 1 s 2 d 1

20

slide-21
SLIDE 21

SELECT Name FROM PERSON AS p, RETWEET AS r WHERE r.Person = p.Name

Clicker Question!

PERSON

Handle Name s Sol d Diane j Josh

RETWEET

Person Tweet s 1 s 2 d 1 SELECT * FROM PERSON AS p, RETWEET AS r WHERE r.Person = p.Handle SELECT Name FROM PERSON AS p, RETWEET AS r WHERE r.Person = p.Handle

(a) (b) (c)

21

slide-22
SLIDE 22

Clicker Question!

PERSON

Handle Name s Sol d Diane j Josh

RETWEET

Person Tweet s 1 s 2 d 1 SELECT * FROM PERSON AS p, RETWEET AS r WHERE r.Person = p.Handle SELECT Name FROM PERSON AS p, RETWEET AS r WHERE r.Person = p.Handle

(a) (b) (c)

SELECT Name FROM PERSON AS p, RETWEET AS r WHERE r.Person = p.Name

22

slide-23
SLIDE 23

JOINS

SELECT ID, Text FROM TWEET, AUTHOR WHERE ID = Tweet AND Person = “d”

23

slide-24
SLIDE 24

JOINS

SELECT ID, Text FROM TWEET, AUTHOR WHERE ID = Tweet AND Person = “d” SELECT ID, Text FROM (TWEET JOIN AUTHOR ON ID = Tweet) WHERE Person = “d” =

24

slide-25
SLIDE 25

JOINS

SELECT ID, Text FROM TWEET, AUTHOR WHERE ID = Tweet AND Person = “d” SELECT ID, Text FROM (TWEET JOIN AUTHOR ON ID = Tweet) WHERE Person = “d” =

25

slide-26
SLIDE 26

JOINS

SELECT ID, Text FROM (TWEET JOIN AUTHOR ON ID = Tweet)

TWEET

ID Text 389472 hey 596208 :-D 782138 1951A 4 lyfe 173902 i <3 1951A 893110 i <3 1951A

AUTHOR

Person Tweet s 389472 j 596208 j 173902 s 893110 d 672109 Person Tweet ID Text s 389472 389472 hey j 596208 596208 :-D j 173902 173902 i <3 1951A s 893110 893110 i <3 1951A d 672109 782138 1951A 4 lyfe

26

slide-27
SLIDE 27

JOINS

SELECT ID, Text FROM (TWEET JOIN AUTHOR ON ID = Tweet)

TWEET

ID Text 389472 hey 596208 :-D 782138 1951A 4 lyfe 173902 i <3 1951A 893110 i <3 1951A

AUTHOR

Person Tweet s 389472 j 596208 j 173902 s 893110 d 672109 Person Tweet ID Text s 389472 389472 hey j 596208 596208 :-D j 173902 173902 i <3 1951A s 893110 893110 i <3 1951A d 672109 782138 1951A 4 lyfe

27

slide-28
SLIDE 28

JOINS

SELECT ID, Text FROM (TWEET JOIN AUTHOR ON ID = Tweet)

TWEET

ID Text 389472 hey 596208 :-D 782138 1951A 4 lyfe 173902 i <3 1951A 893110 i <3 1951A

AUTHOR

Person Tweet s 389472 j 596208 j 173902 s 893110 d 672109 Person Tweet ID Text s 389472 389472 hey j 596208 596208 :-D j 173902 173902 i <3 1951A s 893110 893110 i <3 1951A d 672109 782138 1951A 4 lyfe

28

slide-29
SLIDE 29

JOINS

SELECT ID, Text FROM (TWEET JOIN AUTHOR ON ID = Tweet)

TWEET

ID Text 389472 hey 596208 :-D 782138 1951A 4 lyfe 173902 i <3 1951A 893110 i <3 1951A

Inner Join

Person Tweet ID Text s 389472 389472 hey j 596208 596208 :-D j 173902 173902 i <3 1951A s 893110 893110 i <3 1951A d 672109 782138 1951A 4 lyfe

AUTHOR

Person Tweet s 389472 j 596208 j 173902 s 893110 d 672109

29

slide-30
SLIDE 30

JOINS

SELECT ID, Text FROM (TWEET LEFT OUTER JOIN AUTHOR ON ID = Tweet)

TWEET

ID Text 389472 hey 596208 :-D 782138 1951A 4 lyfe 173902 i <3 1951A 893110 i <3 1951A

Left Outer Join

Person Tweet ID Text s 389472 389472 hey j 596208 596208 :-D j 173902 173902 i <3 1951A s 893110 893110 i <3 1951A d 672109 NULL NULL 782138 1951A 4 lyfe

AUTHOR

Person Tweet s 389472 j 596208 j 173902 s 893110 d 672109

30

slide-31
SLIDE 31

JOINS

TWEET

ID Text 389472 hey 596208 :-D 782138 1951A 4 lyfe 173902 i <3 1951A 893110 i <3 1951A

Right Outer Join

Person Tweet ID Text s 389472 389472 hey j 596208 596208 :-D j 173902 173902 i <3 1951A s 893110 893110 i <3 1951A d 672109

NULL NULL

782138 1951A 4 lyfe

SELECT ID, Text FROM (TWEET RIGHT OUTER JOIN AUTHOR ON ID = Tweet)

AUTHOR

Person Tweet s 389472 j 596208 j 173902 s 893110 d 672109

31

slide-32
SLIDE 32

JOINS

TWEET

ID Text 389472 hey 596208 :-D 782138 1951A 4 lyfe 173902 i <3 1951A 893110 i <3 1951A

Full Outer Join

Person Tweet ID Text s 389472 389472 hey j 596208 596208 :-D j 173902 173902 i <3 1951A s 893110 893110 i <3 1951A d 672109

NULL NULL

NULL NULL 782138 1951A 4 lyfe

SELECT ID, Text FROM (TWEET FULL OUTER JOIN AUTHOR ON ID = Tweet)

AUTHOR

Person Tweet s 389472 j 596208 j 173902 s 893110 d 672109

32

slide-33
SLIDE 33

JOINS

TWEET

ID Text 389472 hey 596208 :-D 782138 1951A 4 lyfe 173902 i <3 1951A 893110 i <3 1951A

Natural Join

SELECT ID, Text FROM (TWEET JOIN AUTHOR)

assumes condition is ALL PAIRS of attributes with matching names

AUTHOR

Person Tweet s 389472 j 596208 j 173902 s 893110 d 672109

33

slide-34
SLIDE 34

JOINS

TWEET

ID Text 389472 hey 596208 :-D 782138 1951A 4 lyfe 173902 i <3 1951A 893110 i <3 1951A

Natural Join

SELECT ID, Text FROM (TWEET JOIN AUTHOR)

if no matches, forms full cross product

AUTHOR

Person Tweet s 389472 j 596208 j 173902 s 893110 d 672109

34

slide-35
SLIDE 35

JOINS

TWEET

ID Text 389472 hey 596208 :-D 782138 1951A 4 lyfe 173902 i <3 1951A 893110 i <3 1951A

Natural Join

SELECT ID, Text FROM (TWEET AS t(tweetid, text) JOIN AUTHOR AS a(person, tweetid)

Person tweetid tweetid Text s 389472 389472 hey j 596208 596208 :-D j 173902 173902 i <3 1951A s 893110 893110 i <3 1951A d 672109

NULL NULL

NULL NULL 782138 1951A 4 lyfe

AUTHOR

Person Tweet s 389472 j 596208 j 173902 s 893110 d 672109

35

slide-36
SLIDE 36

JOINS

TWEET

ID Text 389472 hey 596208 :-D 782138 1951A 4 lyfe 173902 i <3 1951A 893110 i <3 1951A

AUTHOR

Person Tweet s 389472 j 596208 j 173902 s 893110 d 672109

Natural (Inner) Join

SELECT ID, Text FROM (TWEET AS t(tweetid, text) JOIN AUTHOR AS a(person, tweetid)

Person tweetid tweetid Text s 389472 389472 hey j 596208 596208 :-D j 173902 173902 i <3 1951A s 893110 893110 i <3 1951A

36

slide-37
SLIDE 37

JOINS

TWEET

ID Text 389472 hey 596208 :-D 782138 1951A 4 lyfe 173902 i <3 1951A 893110 i <3 1951A

AUTHOR

Person Tweet s 389472 j 596208 j 173902 s 893110 d 672109

Natural (Inner) Join

SELECT ID, Text FROM (TWEET AS t(tweetid, text) JOIN AUTHOR AS a(person, tweetid)

Person tweetid Text s 389472 hey j 596208 :-D j 173902 i <3 1951A s 893110 i <3 1951A

37

slide-38
SLIDE 38

JOINS

TWEET

ID Text 389472 hey 596208 :-D 782138 1951A 4 lyfe 173902 i <3 1951A 893110 i <3 1951A

AUTHOR

Person Tweet s 389472 j 596208 j 173902 s 893110 d 672109

Natural (Inner) Join

SELECT ID, Text FROM (TWEET AS t(tweetid, foo) JOIN AUTHOR AS a(foo, tweetid)

foo tweetid foo s 389472 hey j 596208 :-D j 173902 i <3 1951A s 893110 i <3 1951A

38

slide-39
SLIDE 39

JOINS

TWEET

ID Text 389472 hey 596208 :-D 782138 1951A 4 lyfe 173902 i <3 1951A 893110 i <3 1951A

AUTHOR

Person Tweet s 389472 j 596208 j 173902 s 893110 d 672109

Natural (Inner) Join

SELECT ID, Text FROM (TWEET AS t(tweetid, foo) JOIN AUTHOR AS a(foo, tweetid)

foo tweetid

39

slide-40
SLIDE 40

Clicker Question! (x2)

40

slide-41
SLIDE 41

Clicker Question!

STUDENT

ID Name 1 Diane 2 Sol J 3 Josh 4 Karlly 5 Mounika

GRADES

Student Course Grade 1 32 A 2 1951A A 6 32 A

SELECT Name, Course FROM (STUDENT LEFT OUTER JOIN GRADES ON ID = Student)

Name Course Diane 32 Sol 1951A NULL 32 Name Course Diane 32 Sol J 1951A Josh NULL Karlly NULL Mounika NULL

(a) (b)

41

slide-42
SLIDE 42

Clicker Question!

STUDENT

ID Name 1 Diane 2 Sol J 3 Josh 4 Karlly 5 Mounika

GRADES

Student Course Grade 1 32 A 2 1951A A 6 32 A

SELECT Name, Course FROM (STUDENT LEFT OUTER JOIN GRADES ON ID = Student)

Name Course Diane 32 Sol 1951A NULL 32 Name Course Diane 32 Sol J 1951A Josh NULL Karlly NULL Mounika NULL

(a) (b)

42

slide-43
SLIDE 43

Clicker Question!

STUDENT

ID Name 1 Diane 2 Sol J 3 Josh 4 Karlly 5 Mounika

GRADES

Student Course Grade 1 32 A 2 1951A A 6 32 A

SELECT Name, Course FROM (STUDENT RIGHT OUTER JOIN GRADES ON ID = Student)

Name Course Diane 32 Sol J 1951A NULL 32

(a)

SELECT Name, Course FROM (STUDENT NATURAL JOIN GRADES ON ID = Student)

(b)

Target ->

43

slide-44
SLIDE 44

Clicker Question!

STUDENT

ID Name 1 Diane 2 Sol J 3 Josh 4 Karlly 5 Mounika

GRADES

Student Course Grade 1 32 A 2 1951A A 6 32 A

SELECT Name, Course FROM (STUDENT RIGHT OUTER JOIN GRADES ON ID = Student)

Name Course Diane 32 Sol J 1951A NULL 32

(a)

SELECT Name, Course FROM (STUDENT NATURAL JOIN GRADES ON ID = Student)

(b)

Target ->

SELECT Name, Course FROM (STUDENT RIGHT OUTER JOIN GRADES ON ID = Student)

44

slide-45
SLIDE 45

And now…a laundry list

  • f keywords…

45

slide-46
SLIDE 46

ORDER BY

TWEET

ID Time Text 782138 2019-01-04 15:04:57 1951A 4 lyfe 389472 2019-01-01 12:34:56 hey 123794 2019-01-01 12:34:57 lol 127890 2019-01-04 17:30:07 hey 893110 2019-01-06 12:21:53 i <3 1951A 596208 2019-01-02 3:14:15 :-D 173902 2019-01-05 3:34:18 i <3 1951A

SELECT Text FROM Tweet ORDER BY Time

Text hey lol :-D 1951A 4 lyfe hey i <3 1951A i <3 1951A

46

slide-47
SLIDE 47

ORDER BY

TWEET

ID Time Text 782138 2019-01-04 15:04:57 1951A 4 lyfe 389472 2019-01-01 12:34:56 hey 123794 2019-01-01 12:34:57 lol 127890 2019-01-04 17:30:07 hey 893110 2019-01-06 12:21:53 i <3 1951A 596208 2019-01-02 3:14:15 :-D 173902 2019-01-05 3:34:18 i <3 1951A

SELECT Text FROM Tweet ORDER BY ID

Text lol hey i <3 1951A hey :-D 1951A 4 lyfe i <3 1951A

47

slide-48
SLIDE 48

GROUP BY

TWEET

ID Likes Text 782138 1,000 1951A 4 lyfe 389472 10 hey 123794 100 lol 127890 hey 893110 8,000,000 i <3 1951A 596208 1 :-D 173902 1,000,000,000 i <3 1951A

SELECT Text, Count(*), AVG(Likes) FROM Tweet GROUP BY Text

Text Count(*) AVG(Likes) lol 1 100 hey 2 5 i <3 1951A 2 504,000,000 :-D 1 1 1951A 4 lyfe 1 1,000

48

slide-49
SLIDE 49

GROUP BY

TWEET

ID Likes Text 782138 1,000 1951A 4 lyfe 389472 10 hey 123794 100 lol 127890 hey 893110 8,000,000 i <3 1951A 596208 1 :-D 173902 1,000,000,000 i <3 1951A

SELECT Text, Count(*), AVG(Likes) FROM Tweet GROUP BY Text

Text Count(*) AVG(Likes) lol 1 100 hey 2 5 i <3 1951A 2 504,000,000 :-D 1 1 1951A 4 lyfe 1 1,000

SUM, MIN, MAX, COUNT, AVG

49

slide-50
SLIDE 50

HAVING

TWEET

ID Likes Text 782138 1,000 1951A 4 lyfe 389472 10 hey 123794 100 lol 127890 hey 893110 8,000,000 i <3 1951A 596208 1 :-D 173902 1,000,000,000 i <3 1951A

SELECT Text, Count(*), AVG(Likes) FROM Tweet GROUP BY Text HAVING COUNT(*) > 1

Text Count(*) AVG(Likes) hey 2 5 i <3 1951A 2 504,000,000

50

slide-51
SLIDE 51

HAVING

TWEET

ID Likes Text 782138 1,000 1951A 4 lyfe 389472 10 hey 123794 100 lol 127890 hey 893110 8,000,000 i <3 1951A 596208 1 :-D 173902 1,000,000,000 i <3 1951A

SELECT Text, Count(*), AVG(Likes) FROM Tweet GROUP BY Text HAVING COUNT(*) > 1

Text Count(*) AVG(Likes) hey 2 5 i <3 1951A 2 504,000,000

similar behavior to “WHERE”, but only used with aggregations/GROUP BYs

51

slide-52
SLIDE 52

LIKE

TWEET

ID Likes Text 782138 1,000 1951A 4 lyfe 389472 10 hey 123794 100 lol 127890 hey 893110 8,000,000 i <3 1951A 596208 1 :-D 173902 1,000,000,000 i <3 1951A

SELECT Text, Count(*), AVG(Likes) FROM Tweet WHERE Text LIKE ‘%1951A%’ GROUP BY Text

Text Count(*) AVG(Likes) 1951A 4 lyfe 1 1,000 i <3 1951A 2 504,000,000

52

slide-53
SLIDE 53

IN

SELECT Name FROM STUDENT WHERE ID IN (SELECT Student FROM GRADES WHERE Course = 1951A )

STUDENT

ID Name 1 Diane 2 Sol J 3 Josh 4 Karlly 5 Mounika

GRADES

Student Course Grade 1 32 A 2 1951A A 6 32 A

Find names of students in 1951A

53

slide-54
SLIDE 54

IN

SELECT Name FROM STUDENT WHERE ID IN (SELECT Student FROM GRADES WHERE Course = 1951A )

STUDENT

ID Name 1 Diane 2 Sol J 3 Josh 4 Karlly 5 Mounika

GRADES

Student Course Grade 1 32 A 2 1951A A 6 32 A

Find names of students in 1951A “Subquery” (More later, get excited)

54

slide-55
SLIDE 55

IN

SELECT Name FROM STUDENT WHERE ID IN (SELECT Student FROM GRADES WHERE Course = 1951A )

STUDENT

ID Name 1 Diane 2 Sol J 3 Josh 4 Karlly 5 Mounika

GRADES

Student Course Grade 1 32 A 2 1951A A 6 32 A

Find names of students in 1951A Returns “bag”

  • f student IDs

55

slide-56
SLIDE 56

IN

SELECT Name FROM STUDENT WHERE ID IN (SELECT Student FROM GRADES WHERE Course = 1951A )

STUDENT

ID Name 1 Diane 2 Sol J 3 Josh 4 Karlly 5 Mounika

GRADES

Student Course Grade 1 32 A 2 1951A A 6 32 A

Find names of students in 1951A Returns True if ID is in that bag

56

slide-57
SLIDE 57

SELECT Grade FROM GRADES WHERE Course = “1951A” AND Grade >= ALL (SELECT Grade FROM GRADES WHERE Course = 1951A )

ALL/ANY

STUDENT

ID Name 1 Diane 2 Sol J 3 Josh 4 Karlly 5 Mounika

GRADES

Student Course Grade 1 1951A 3.5 2 1951A 3.5 6 1951A 2.8

What is the highest grade in 1951A?

57

slide-58
SLIDE 58

SELECT Grade FROM GRADES WHERE Course = “1951A” AND Grade >= ALL (SELECT Grade FROM GRADES WHERE Course = 1951A )

ALL/ANY

STUDENT

ID Name 1 Diane 2 Sol J 3 Josh 4 Karlly 5 Mounika

GRADES

Student Course Grade 1 1951A 3.5 2 1951A 3.5 6 1951A 2.8

Returns True if condition holds for all tuples in bag

What is the highest grade in 1951A?

58

slide-59
SLIDE 59

SELECT Grade FROM GRADES WHERE Course = “1951A” AND Grade > ANY (SELECT Grade FROM GRADES WHERE Course = 1951A )

ALL/ANY

STUDENT

ID Name 1 Diane 2 Sol J 3 Josh 4 Karlly 5 Mounika

GRADES

Student Course Grade 1 1951A 3.5 2 1951A 3.5 6 1951A 2.8

???

59

slide-60
SLIDE 60

SELECT Grade FROM GRADES WHERE Course = “1951A” AND Grade > ANY (SELECT Grade FROM GRADES WHERE Course = 1951A )

ALL/ANY

STUDENT

ID Name 1 Diane 2 Sol J 3 Josh 4 Karlly 5 Mounika

GRADES

Student Course Grade 1 1951A 3.5 2 1951A 3.5 6 1951A 2.8

Return all grades except the lowest one.

60

slide-61
SLIDE 61

SELECT Grade FROM GRADES WHERE Course = “1951A” AND Grade > NOT ANY (SELECT Grade FROM GRADES WHERE Course = 1951A )

ALL/ANY

STUDENT

ID Name 1 Diane 2 Sol J 3 Josh 4 Karlly 5 Mounika

GRADES

Student Course Grade 1 1951A 3.5 2 1951A 3.5 6 1951A 2.8

Return the lowest grade.

61

slide-62
SLIDE 62

SELECT Grade FROM GRADES WHERE Course = “1951A” AND Grade >= ALL (SELECT Grade FROM GRADES WHERE Course = 1951A )

STUDENT

ID Name 1 Diane 2 Sol J 3 Josh 4 Karlly 5 Mounika

GRADES

Student Course Grade 1 1951A 3.5 2 1951A 3.5 6 1951A 2.8

ALL/ANY

Grade 3.5 3.5

62

slide-63
SLIDE 63

SELECT DISTINCT Grade FROM GRADES WHERE Course = “1951A” AND Grade >= ALL (SELECT Grade FROM GRADES WHERE Course = 1951A )

STUDENT

ID Name 1 Diane 2 Sol J 3 Josh 4 Karlly 5 Mounika

GRADES

Student Course Grade 1 1951A 3.5 2 1951A 3.5 6 1951A 2.8

DISTINCT

Grade 3.5

63

slide-64
SLIDE 64

SELECT DISTINCT Grade FROM GRADES WHERE Course = “1951A” AND Grade >= ALL (SELECT Grade FROM GRADES WHERE Course = 1951A )

STUDENT

ID Name 1 Diane 2 Sol J 3 Josh 4 Karlly 5 Mounika

GRADES

Student Course Grade 1 1951A 3.5 2 1951A 3.5 6 1951A 2.8

DISTINCT

Grade 3.5

Set operations (Union, Intersection, etc.) remove duplicates by default.

64

slide-65
SLIDE 65

SELECT NAME FROM STUDENT s WHERE NOT EXISTS (SELECT * FROM GRADES WHERE Course = 1951A AND Student = s.ID )

EXISTS

STUDENT

ID Name 1 Diane 2 Sol J 3 Josh 4 Karlly 5 Mounika

GRADES

Student Course Grade 1 1951A 3.5 2 1951A 3.5 6 1951A 2.8

???

65

slide-66
SLIDE 66

SELECT NAME FROM STUDENT s WHERE NOT EXISTS (SELECT * FROM GRADES WHERE Course = 1951A AND Student = s.ID )

EXISTS

STUDENT

ID Name 1 Diane 2 Sol J 3 Josh 4 Karlly 5 Mounika

GRADES

Student Course Grade 1 1951A 3.5 2 1951A 3.5 6 1951A 2.8

???

True as long as bag is not empty

66

slide-67
SLIDE 67

SELECT NAME FROM STUDENT s WHERE NOT EXISTS (SELECT * FROM GRADES WHERE Course = 1951A AND Student = s.ID )

EXISTS

STUDENT

ID Name 1 Diane 2 Sol J 3 Josh 4 Karlly 5 Mounika

GRADES

Student Course Grade 1 1951A 3.5 2 1951A 3.5 6 1951A 2.8

Students who are not in 1951A

67

slide-68
SLIDE 68

68

slide-69
SLIDE 69

Outline

  • Last time: SQL for creating/manipulating data

tables

  • Today: SQL for querying databases
  • Keywords
  • NULLs
  • Execution Order, Nested Queries, Optimization

69

slide-70
SLIDE 70

NULL!

  • Black hole! NULL is NULL is NULL and there is no coming back from it…
  • If an operand is NULL, the result is NULL:
  • NULL + 1 = NULL
  • NULL * 0 = NULL
  • Comparisons: All comparisons that involve a null value, evaluate to unknown
  • NULL = NULL -> Unknown
  • NULL != NULL -> Unknown
  • NULL < 13 -> Unknown
  • NULL > NULL -> Unknown

70

slide-71
SLIDE 71

NULL!

p q p OR q p AND q p = q TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

71

slide-72
SLIDE 72

NULL!

p q p OR q p AND q p = q TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE UNK TRUE UNK UNK FALSE UNK UNK FALSE UNK UNK TRUE TRUE UNK UNK UNK FALSE UNK FALSE UNK UNK UNK UNK UNK UNK

72

slide-73
SLIDE 73

NULL!

SELECT COUNT(*) FROM TWEET WHERE Likes > 10 WHERE: Only tuples which evaluate to true are part of the query result. (I.e. unknown and false treated equivalently.)

TWEET

ID Text Likes 389472 NULL 100 123794 NULL 3 596208 :-D NULL 782138 1951A 4 lyfe NULL 173902 i <3 1951A 19 893110 i <3 1951A 7539 Count(*) 3

73

slide-74
SLIDE 74

NULL!

SELECT Text, COUNT(*) FROM TWEET GROUP BY Text GROUP BY: If NULL exists, then there is a group for NULL.

TWEET

ID Text Likes 389472 NULL 100 123794 NULL 3 596208 :-D NULL 782138 1951A 4 lyfe NULL 173902 i <3 1951A 19 893110 i <3 1951A 7539 Text Count(*) NULL 2 :-D 1 1951A 4 lyfe 1 i <3 1951A 2

74

slide-75
SLIDE 75

NULL!

SELECT Text ID FROM TWEET WHERE Text = NULL For predicates with NULL, use IS (as opposed to “=“)

TWEET

ID Text Likes 389472 NULL 100 123794 NULL 3 596208 :-D NULL 782138 1951A 4 lyfe NULL 173902 i <3 1951A 19 893110 i <3 1951A 7539 ID

75

slide-76
SLIDE 76

NULL!

SELECT Text ID FROM TWEET WHERE Text IS NULL

TWEET

ID Text Likes 389472 NULL 100 123794 NULL 3 596208 :-D NULL 782138 1951A 4 lyfe NULL 173902 i <3 1951A 19 893110 i <3 1951A 7539 ID 389472 123794

For predicates with NULL, use IS (as opposed to “=“)

76

slide-77
SLIDE 77

NULL!

  • count(att): NULL is ignored
  • sum(att): NULL is ignored
  • avg(att): results from SUM and COUNT
  • min(att) and max(att): NULL is ignored
  • Exception! If NULL is the only value in the column,

then sum/avg/min/max all return “NULL”

77

slide-78
SLIDE 78

Clicker Question! (x2)

78

slide-79
SLIDE 79

Clicker Question! (a) (b)

What will be the result of this query?

SELECT COUNT(*) FROM TWEET

Count(*) 100

SELECT COUNT(*) FROM TWEET WHERE Text = “:)”

Count(*) 15

(c) I…don’t…know…

SELECT COUNT(*) FROM TWEET WHERE Text != “:)”

Count(*) 100 Count(*) 85

79

slide-80
SLIDE 80

Clicker Question! (a) (b)

What will be the result of this query?

SELECT COUNT(*) FROM TWEET

Count(*) 100

SELECT COUNT(*) FROM TWEET WHERE Text = “:)”

Count(*) 15

(c) I…don’t…know…

SELECT COUNT(*) FROM TWEET WHERE Text != “:)”

Count(*) 100 Count(*) 85

Can’ t say how many are NULL

80

slide-81
SLIDE 81

Clicker Question!

What will be the result of the below query?

SELECT COUNT(*) FROM RUNNERS WHERE ID NOT IN SELECT(Winner_ID FROM RACES)

RUNNERS

ID Name 1 Diane 2 Sol 3 Josh 4 Jon

RACES

Event_ID Event Winner_ID 1 PVD Marathon 2 2 PVD Half 3 3 PVD 2 yard jump 2 4 Race4NULL:
 Raising Awareness NULL

(a) (b)

Count(*) Count(*) 1

(c)

Count(*) 2

81

slide-82
SLIDE 82

Clicker Question!

What will be the result of the below query?

SELECT COUNT(*) FROM RUNNERS WHERE ID NOT IN SELECT(Winner_ID FROM RACES)

RUNNERS

ID Name 1 Diane 2 Sol 3 Josh 4 Jon

(a) (b)

Count(*) Count(*) 1

(c)

Count(*) 2 ID NOT IN (2,3,NULL) is the same as ID!=2 AND ID!=3 and ID!=NULL

RACES

Event_ID Event Winner_ID 1 PVD Marathon 2 2 PVD Half 3 3 PVD 2 yard jump 2 4 Race4NULL:
 Raising Awareness NULL

82

slide-83
SLIDE 83

83

slide-84
SLIDE 84

Outline

  • Last time: SQL for creating/manipulating data

tables

  • Today: SQL for querying databases
  • Keywords
  • NULLs
  • Execution Order, Nested Queries, Optimization

✔ ✔

84

slide-85
SLIDE 85

ID Text 389472 hey

TWEET

ID Time Text 389472 12:34:56 hey 123794 12:34:57 lol 596208 3:14:15 :-D 782138 15:04:57 1951A 4 lyfe 173902 3:34:18 i <3 1951A 893110 12:21:53 i <3 1951A

SQL Execution Tree SELECT ID, Text FROM TWEET WHERE Text = “hey”

TWEET WHERE SELECT

Execution Order

85

slide-86
SLIDE 86

ID Text 389472 hey

TWEET

ID Time Text 389472 12:34:56 hey 123794 12:34:57 lol 596208 3:14:15 :-D 782138 15:04:57 1951A 4 lyfe 173902 3:34:18 i <3 1951A 893110 12:21:53 i <3 1951A

SQL Execution Tree SELECT ID, Text FROM TWEET WHERE Text = “hey”

FROM WHERE SELECT

Execution Order

86

slide-87
SLIDE 87

ID Text 389472 hey

TWEET

ID Time Text 389472 12:34:56 hey 123794 12:34:57 lol 596208 3:14:15 :-D 782138 15:04:57 1951A 4 lyfe 173902 3:34:18 i <3 1951A 893110 12:21:53 i <3 1951A

SQL Execution Tree SELECT ID, Text FROM TWEET WHERE Text = “hey”

FROM WHERE SELECT

Execution Order

87

slide-88
SLIDE 88

ID Text 389472 hey

TWEET

ID Time Text 389472 12:34:5 hey 123794 12:34:57 lol 596208 3:14:15 :-D 782138 15:04:57 1951A 4 lyfe 173902 3:34:18 i <3 1951A 893110 12:21:53 i <3 1951A

SQL Execution Tree SELECT ID, Text FROM TWEET WHERE Text = “hey”

FROM WHERE SELECT

Execution Order

88

slide-89
SLIDE 89

SQL

FROM SELECT WHERE

ID Text 389472 hey

Execution Tree

TWEET

ID Time Text 389472 12:34:56 hey 123794 12:34:57 lol 596208 3:14:15 :-D 782138 15:04:57 1951A 4 lyfe 173902 3:34:18 i <3 1951A 893110 12:21:53 i <3 1951A

SELECT ID, Text FROM TWEET WHERE Text = “hey”

Execution Order

A query can have multiple “equivalent” trees

89

slide-90
SLIDE 90

SQL

FROM SELECT WHERE

ID Text 389472 hey

Execution Tree

TWEET

ID Time Text 389472 12:34:5 hey 123794 12:34:5 7 lol 596208 3:14:15 :-D 782138 15:04:5 7 1951A 4 lyfe 173902 3:34:18 i <3 1951A 893110 12:21:5 i <3 1951A

SELECT ID, Text FROM TWEET WHERE Text = “hey”

Execution Order

A query can have multiple “equivalent” trees

90

slide-91
SLIDE 91

SQL

FROM SELECT WHERE

ID Text 389472 hey

Execution Tree

TWEET

ID Time Text 389472 12:34:5 hey 123794 12:34:5 lol 596208 3:14:15 :-D 782138 15:04:5 7 1951A 4 lyfe 173902 3:34:18 i <3 1951A 893110 12:21:5 i <3 1951A

SELECT ID, Text FROM TWEET WHERE Text = “hey”

Execution Order

A query can have multiple “equivalent” trees

91

slide-92
SLIDE 92

Clicker Question!

92

slide-93
SLIDE 93

Clicker Question! (a) (b)

WHERE(SELECT(FROM)) SELECT(WHERE(FROM))

Which is better? (c) I…don’t…know…it depends

93

slide-94
SLIDE 94

Clicker Question! (a) (b)

WHERE(SELECT(FROM)) SELECT(WHERE(FROM))

Which is better? (c) I…don’t…know…it depends

94

slide-95
SLIDE 95

TWEET

ID Time Text 389472 12:34:56 hey 123794 12:34:57 lol 596208 3:14:15 :-D 782138 15:04:57 1951A 4 lyfe 173902 3:34:18 i <3 1951A 893110 12:21:53 i <3 1951A

SQL SELECT ID, Text FROM TWEET WHERE Text = “hey”

Execution Order

WHERE(SELECT(FROM)) SELECT(WHERE(FROM)) Execution Tree

95

slide-96
SLIDE 96

TWEET

ID Time Text 389472 12:34:56 hey 123794 12:34:57 lol 596208 3:14:15 :-D 782138 15:04:57 1951A 4 lyfe 173902 3:34:18 i <3 1951A 893110 12:21:53 i <3 1951A

SQL SELECT ID, Time FROM TWEET WHERE Text = “hey” WHERE(SELECT(FROM)) SELECT(WHERE(FROM)) Execution Tree

Execution Order

96

slide-97
SLIDE 97

TWEET

ID Time Text 389472 12:34:56 hey 123794 12:34:57 lol 596208 3:14:15 :-D 782138 15:04:57 1951A 4 lyfe 173902 3:34:18 i <3 1951A 893110 12:21:53 i <3 1951A

SQL

Execution Order

WHERE(SELECT(FROM)) SELECT(WHERE(FROM)) Execution Tree SELECT ID, Time FROM TWEET WHERE Text = “hey”

97

slide-98
SLIDE 98

ID Time Text 389472 12:34:56 hey 123794 12:34:57 lol 596208 3:14:15 :-D 782138 15:04:57 1951A 4 lyfe 173902 3:34:18 i <3 1951A 893110 12:21:53 i <3 1951A

TWEET

ID Time Text 389472 12:34:56 hey 123794 12:34:57 lol 596208 3:14:15 :-D 782138 15:04:57 1951A 4 lyfe 173902 3:34:18 i <3 1951A 893110 12:21:53 i <3 1951A

SQL SELECT ID, Time FROM TWEET WHERE Text = “hey”

Execution Order

WHERE(SELECT(FROM)) SELECT(WHERE(FROM)) Execution Tree

98

slide-99
SLIDE 99

ID Time 389472 12:34:56 123794 12:34:57 596208 3:14:15 782138 15:04:57 173902 3:34:18 893110 12:21:53

TWEET

ID Time Text 389472 12:34:56 hey 123794 12:34:57 lol 596208 3:14:15 :-D 782138 15:04:57 1951A 4 lyfe 173902 3:34:18 i <3 1951A 893110 12:21:53 i <3 1951A

SQL SELECT ID, Time FROM TWEET WHERE Text = “hey”

Execution Order

WHERE(SELECT(FROM)) SELECT(WHERE(FROM)) Execution Tree

99

slide-100
SLIDE 100

Execution Order

ID Time 389472 12:34:56 123794 12:34:57 596208 3:14:15 782138 15:04:57 173902 3:34:18 893110 12:21:53

TWEET

ID Time Text 389472 12:34:56 hey 123794 12:34:57 lol 596208 3:14:15 :-D 782138 15:04:57 1951A 4 lyfe 173902 3:34:18 i <3 1951A 893110 12:21:53 i <3 1951A

SQL SELECT ID, Time FROM TWEET WHERE Text = “hey” WHERE(SELECT(FROM)) SELECT(WHERE(FROM)) Execution Tree

100

slide-101
SLIDE 101

TWEET

ID Time Text 389472 12:34:56 hey 123794 12:34:57 lol 596208 3:14:15 :-D 782138 15:04:57 1951A 4 lyfe 173902 3:34:18 i <3 1951A 893110 12:21:53 i <3 1951A

SQL SELECT ID, Time FROM TWEET WHERE Text = “hey”

ID Time Text 389472 12:34:56 hey 123794 12:34:57 lol 596208 3:14:15 :-D 782138 15:04:57 1951A 4 lyfe 173902 3:34:18 i <3 1951A 893110 12:21:53 i <3 1951A

Execution Order

WHERE(SELECT(FROM)) Execution Tree SELECT(WHERE(FROM))

101

slide-102
SLIDE 102

TWEET

ID Time Text 389472 12:34:56 hey 123794 12:34:57 lol 596208 3:14:15 :-D 782138 15:04:57 1951A 4 lyfe 173902 3:34:18 i <3 1951A 893110 12:21:53 i <3 1951A

SQL SELECT ID, Time FROM TWEET WHERE Text = “hey”

ID Time Text 389472 12:34:56 hey

Execution Order

WHERE(SELECT(FROM)) Execution Tree SELECT(WHERE(FROM))

102

slide-103
SLIDE 103

TWEET

ID Time Text 389472 12:34:56 hey 123794 12:34:57 lol 596208 3:14:15 :-D 782138 15:04:57 1951A 4 lyfe 173902 3:34:18 i <3 1951A 893110 12:21:53 i <3 1951A

SQL SELECT ID, Time FROM TWEET WHERE Text = “hey”

ID Time 389472 12:34:56

Execution Order

WHERE(SELECT(FROM)) Execution Tree SELECT(WHERE(FROM))

103

slide-104
SLIDE 104

Execution Order

SELECT A1…An FROM R1…Rk WHERE P “Canonical Execution Order”

FROM WHERE SELECT

SELECT(WHERE(FROM))

104

slide-105
SLIDE 105

Clicker Question!

105

slide-106
SLIDE 106

Clicker Question! How much memory do I need?

R1 R2 Rk … WHERE SELECT × × ×

say each R has O(m) tuples

SELECT A1…An FROM R1…Rk WHERE P

O(mk) O(m x k) O(m + k) O(mk-n) (a) (b) (c) (d)

106

slide-107
SLIDE 107

Clicker Question! How much memory do I need?

R1 R2 Rk … WHERE SELECT × × ×

say each R has O(m) tuples

SELECT A1…An FROM R1…Rk WHERE P

O(mk) O(m x k) O(m + k) O(mk-n) (a) (b) (c) (d)

107

slide-108
SLIDE 108

Clicker Question! How much memory do I need?

R1 R2 Rk … WHERE SELECT × × ×

say each R has O(m) tuples

SELECT A1…An FROM R1…Rk WHERE P

O(mk) O(m x k) O(m + k) O(mk-n) (a) (b) (c) (d)

mxm

108

slide-109
SLIDE 109

Clicker Question! How much memory do I need?

R1 R2 Rk … WHERE SELECT × × ×

say each R has O(m) tuples

SELECT A1…An FROM R1…Rk WHERE P

O(mk) O(m x k) O(m + k) O(mk-n) (a) (b) (c) (d)

mxm (mxm)xm

109

slide-110
SLIDE 110

Clicker Question! How much memory do I need?

R1 R2 Rk … WHERE SELECT × × ×

say each R has O(m) tuples

SELECT A1…An FROM R1…Rk WHERE P

O(mk) O(m x k) O(m + k) O(mk-n) (a) (b) (c) (d)

mxm (mxm)xm mk

110

slide-111
SLIDE 111

Clicker Question! How much memory do I need?

R1 R2 Rk … WHERE SELECT × × ×

say each R has O(m) tuples

SELECT A1…An FROM R1…Rk WHERE P

O(mk) O(m x k) O(m + k) O(mk-n) (a) (b) (c) (d)

mxm (mxm)xm mk

m = 1000, k = 3 —> 1 billion tuples

111

slide-112
SLIDE 112

Execution Order

TWEET AUTHOR WHERE (A.TWEET = T.ID)⋀(T.Date=“1/1/19”) ⋀(A.Person =“BarakckObama”) SELECT TWEET.Time

×

SELECT TWEET.Time FROM TWEET, AUTHOR WHERE AUTHOR.TWEET = TWEET.ID and TWEET.Date == ’01/01/2019‘ and AUTHOR.Person = “BarackObama”

“Canonical Execution Order” (FROM WHERE SELECT)

112

slide-113
SLIDE 113

Execution Order

6,000 /second = 500M/day = Billions and billions

http://www.internetlivestats.com/twitter-statistics/ https://www.omnicoreagency.com/twitter-statistics/

SELECT TWEET.Time FROM TWEET, AUTHOR WHERE AUTHOR.TWEET = TWEET.ID and TWEET.Date == ’01/01/2019‘ and AUTHOR.Person = “BarackObama” TWEET AUTHOR WHERE (A.TWEET = T.ID)⋀(T.Date=“1/1/19”) ⋀(A.Person =“BarakckObama”) SELECT TWEET.Time

×

113

slide-114
SLIDE 114

TWEET AUTHOR WHERE (A.TWEET = T.ID)⋀(T.Date=“1/1/19”) ⋀(A.Person =“BarakckObama”) SELECT TWEET.Time

×

Execution Order

6,000 /second = 500M/day = Billions and billions

http://www.internetlivestats.com/twitter-statistics/ https://www.omnicoreagency.com/twitter-statistics/

100s of millions SELECT TWEET.Time FROM TWEET, AUTHOR WHERE AUTHOR.TWEET = TWEET.ID and TWEET.Date == ’01/01/2019‘ and AUTHOR.Person = “BarackObama”

114

slide-115
SLIDE 115

Execution Order

O(really ****ing big)

SELECT TWEET.Time FROM TWEET, AUTHOR WHERE AUTHOR.TWEET = TWEET.ID and TWEET.Date == ’01/01/2019‘ and AUTHOR.Person = “BarackObama” TWEET AUTHOR WHERE (A.TWEET = T.ID)⋀(T.Date=“1/1/19”) ⋀(A.Person =“BarakckObama”) SELECT TWEET.Time

×

115

slide-116
SLIDE 116

Execution Order

SELECT TWEET.Time FROM TWEET, AUTHOR WHERE AUTHOR.TWEET = TWEET.ID and TWEET.Date == ’01/01/2019‘ and AUTHOR.Person = “BarackObama” TWEET AUTHOR WHERE (A.TWEET = T.ID)⋀(T.Date=“1/1/19”) ⋀(A.Person =“BarakckObama”) SELECT TWEET.Time

×

O(kind of tiny)

116

slide-117
SLIDE 117

Execution Order

SELECT TWEET.Time FROM TWEET, AUTHOR WHERE AUTHOR.TWEET = TWEET.ID and TWEET.Date == ’01/01/2019‘ and AUTHOR.Person = “BarackObama” TWEET AUTHOR WHERE (A.TWEET = T.ID)⋀(T.Date=“1/1/19”) ⋀(A.Person =“BarakckObama”) SELECT TWEET.Time

×

O(kind of tiny)

Thoughts??

117

slide-118
SLIDE 118

Execution Order

SELECT TWEET.Time FROM TWEET, AUTHOR WHERE AUTHOR.TWEET = TWEET.ID and TWEET.Date == ’01/01/2019‘ and AUTHOR.Person = “BarackObama” TWEET AUTHOR WHERE (A.TWEET = T.ID)⋀(T.Date=“1/1/19”) ⋀(A.Person =“BarakckObama”) SELECT TWEET.Time

×

Refactor!

118

slide-119
SLIDE 119

Execution Order

SELECT TWEET.Time FROM TWEET, AUTHOR WHERE AUTHOR.TWEET = TWEET.ID and TWEET.Date == ’01/01/2019‘ and AUTHOR.Person = “BarackObama” TWEET WHERE (A.TWEET = T.ID)⋀(A.Person =“BarakckObama”) SELECT TWEET.Time

×

WHERE(T.Date=“1/1/19”) AUTHOR

119

slide-120
SLIDE 120

Execution Order

SELECT TWEET.Time FROM TWEET, AUTHOR WHERE AUTHOR.TWEET = TWEET.ID and TWEET.Date == ’01/01/2019‘ and AUTHOR.Person = “BarackObama” TWEET WHERE (A.TWEET = T.ID) SELECT TWEET.Time

×

WHERE(T.Date=“1/1/19”) AUTHOR WHERE(A.Person=“BarakckObama”))

120

slide-121
SLIDE 121

Clicker Question!

121

slide-122
SLIDE 122

Clicker Question! (Demand?) Optimize this.

SELECT Grade FROM STUDENT, GRADES WHERE STUDENT.ID = GRADES.Student and GRADES.Course == ’1951A‘ and STUDENT.Year < GRADES.Tgt_Yr

STUDENT GRADES

WHERE (ID = Student) ⋀(Course = 1951A) ⋀(Year < Tgt_Yr) SELECT Grade

×

STUDENT

ID Name Year 1 Diane 4 2 Sol J 5 3 Josh 5 4 Karlly 4 5 Mounik a 4

GRADES

Student Course Grade Tgt_Yr 1 32 A 1 2 1951A A 3 6 32 A 1

Find grades of students taking 1951A ahead of schedule

122

slide-123
SLIDE 123

STUDENT GRADES WHERE (ID = Student) SELECT Grade

×

WHERE (Year < Tgt_Yr) WHERE (Course = 1951A) STUDENT GRADES WHERE (ID = Student) ⋀(Year < Tgt_Yr) SELECT Grade

×

WHERE (Course = 1951A)

(a) (b) (c)

STUDENT GRADES WHERE (Year < Tgt_Yr) SELECT Grade

×

WHERE (ID = Student) WHERE (Course = 1951A)

123

slide-124
SLIDE 124

STUDENT GRADES WHERE (ID = Student) SELECT Grade

×

WHERE (Year < Tgt_Yr) WHERE (Course = 1951A) STUDENT GRADES WHERE (ID = Student) ⋀(Year < Tgt_Yr) SELECT Grade

×

WHERE (Course = 1951A)

(a) (b) (c)

STUDENT GRADES WHERE (Year < Tgt_Yr) SELECT Grade

×

WHERE (ID = Student) WHERE (Course = 1951A)

124

slide-125
SLIDE 125

STUDENT GRADES WHERE (ID = Student) SELECT Grade

×

WHERE (Year < Tgt_Yr) WHERE (Course = 1951A) STUDENT GRADES WHERE (ID = Student) ⋀(Year < Tgt_Yr) SELECT Grade

×

WHERE (Course = 1951A)

(a) (b) (c)

STUDENT GRADES WHERE (Year < Tgt_Yr) SELECT Grade

×

WHERE (ID = Student) WHERE (Course = 1951A)

Depends on

  • utput of

join

125

slide-126
SLIDE 126

Nested Queries

SELECT s.Name FROM STUDENT s WHERE NOT EXISTS( SELECT * FROM GRADES g WHERE s.ID = g.Student )

Find names students who are not in any classes.

STUDENT

ID Name Year 1 Diane 4 2 Sol J 5 3 Josh 5 4 Karlly 4 5 Mounik a 4

GRADES

Studen Cours GPA Tgt_Yr 1 32 4.0 1 2 1951A 3.5 3 6 32 2.8 1

126

slide-127
SLIDE 127

SELECT s.Name FROM STUDENT s WHERE NOT EXISTS( SELECT * FROM GRADES g WHERE s.ID = g.Student )

Find names students who are not in any classes.

STUDENT

ID Name Year 1 Diane 4 2 Sol J 5 3 Josh 5 4 Karlly 4 5 Mounik a 4

GRADES

Studen Cours GPA Tgt_Yr 1 32 4.0 1 2 1951A 3.5 3 6 32 2.8 1

Outer Query Inner Query

Nested Queries

127

slide-128
SLIDE 128

SELECT s.Name FROM STUDENT s WHERE NOT EXISTS( SELECT * FROM GRADES g WHERE s.ID = g.Student )

Find names students who are not in any classes.

STUDENT

ID Name Year 1 Diane 4 2 Sol J 5 3 Josh 5 4 Karlly 4 5 Mounik a 4

GRADES

Studen Cours GPA Tgt_Yr 1 32 4.0 1 2 1951A 3.5 3 6 32 2.8 1

Correlated! Inner query will return differently for every row…

Nested Queries

128

slide-129
SLIDE 129

SELECT s.Name FROM STUDENT s WHERE s.ID NOT IN( SELECT Student FROM GRADES )

Find names students who are not in any classes.

STUDENT

ID Name Year 1 Diane 4 2 Sol J 5 3 Josh 5 4 Karlly 4 5 Mounik a 4

GRADES

Studen Cours GPA Tgt_Yr 1 32 4.0 1 2 1951A 3.5 3 6 32 2.8 1

Not correlated! Inner query will always return the same thing.

Nested Queries

129

slide-130
SLIDE 130

Clicker Question! (x2)

130

slide-131
SLIDE 131

SELECT s.ID, s.Name, (SELECT COUNT(*) FROM GRADES g WHERE s.ID = g.Student) FROM STUDENT s

How many courses is each student taking?

STUDENT

ID Name Year 1 Diane 4 2 Sol J 5 3 Josh 5 4 Karlly 4 5 Mounik a 4

GRADES

Studen Cours GPA Tgt_Yr 1 32 4.0 1 2 1951A 3.5 3 6 32 2.8 1

Clicker Question! Is this query correlated? (a) uh huh (b) nuh uh

131

slide-132
SLIDE 132

SELECT s.ID, s.Name, (SELECT COUNT(*) FROM GRADES g WHERE s.ID = g.Student) FROM STUDENT s

How many courses is each student taking?

STUDENT

ID Name Year 1 Diane 4 2 Sol J 5 3 Josh 5 4 Karlly 4 5 Mounik a 4

GRADES

Studen Cours GPA Tgt_Yr 1 32 4.0 1 2 1951A 3.5 3 6 32 2.8 1

Clicker Question! Is this query correlated? (a) uh huh (b) nuh uh

Yes! This value will be different for every row (i.e. for every s.ID)

132

slide-133
SLIDE 133

How many courses is each student taking?

STUDENT

ID Name Year 1 Diane 4 2 Sol J 5 3 Josh 5 4 Karlly 4 5 Mounik a 4

GRADES

Studen Cours GPA Tgt_Yr 1 32 4.0 1 2 1951A 3.5 3 6 32 2.8 1

SELECT s.ID, s.Name, c.num_courses FROM STUDENT s, (SELECT Student, COUNT(*) AS num_courses FROM GRADES GROUP BY Student) c WHERE s.ID = c.Student

Clicker Question! Is this query correlated? (a) yeah sure (b) not really

133

slide-134
SLIDE 134

How many courses is each student taking?

STUDENT

ID Name Year 1 Diane 4 2 Sol J 5 3 Josh 5 4 Karlly 4 5 Mounik a 4

GRADES

Studen Cours GPA Tgt_Yr 1 32 4.0 1 2 1951A 3.5 3 6 32 2.8 1

SELECT s.ID, s.Name, c.num_courses FROM STUDENT s, (SELECT Student, COUNT(*) AS num_courses FROM GRADES GROUP BY Student) c WHERE s.ID = c.Student

Clicker Question! Is this query correlated? (a) yeah sure (b) not really

This value is always the same, regardless

  • f the row

134

slide-135
SLIDE 135

(non)Clicker Question!

135

slide-136
SLIDE 136

Find students taking courses that are above their level.

STUDENT

ID Name Year 1 Wennie 4 2 Maulik 5 3 Gurnaa z 5 4 Jens 4 5 Erin 4

GRADES

Studen Cours GPA Tgt_Yr 1 32 4.0 1 2 1951A 3.5 3 6 32 2.8 1

SELECT s.Name FROM STUDENT s WHERE EXISTS( SELECT * FROM GRADES WHERE s.ID = GRADES.Student AND s.Year < GRADES.Tgt_Yr )

(non)Clicker Question! Rewrite to remove the subquery altogether?

136

slide-137
SLIDE 137

Find students taking courses that are above their level.

STUDENT

ID Name Year 1 Wennie 4 2 Maulik 5 3 Gurnaa z 5 4 Jens 4 5 Erin 4

GRADES

Studen Cours GPA Tgt_Yr 1 32 4.0 1 2 1951A 3.5 3 6 32 2.8 1

SELECT s.Name FROM STUDENT s WHERE EXISTS( SELECT * FROM GRADES WHERE s.ID = GRADES.Student AND s.Year < GRADES.Tgt_Yr )

(non)Clicker Question!

HINT! Use a Join Condition

Rewrite to remove the subquery altogether?

137

slide-138
SLIDE 138

Find students taking courses that are above their level.

STUDENT

ID Name Year 1 Wennie 4 2 Maulik 5 3 Gurnaa z 5 4 Jens 4 5 Erin 4

GRADES

Studen Cours GPA Tgt_Yr 1 32 4.0 1 2 1951A 3.5 3 6 32 2.8 1

SELECT s.Name FROM STUDENT s, GRADES g WHERE s.ID = g.Student AND s.Year < g.Tgt_Yr

(non)Clicker Question! Rewrite to remove the subquery altogether?

138

slide-139
SLIDE 139

Enough SQL…what about… NoSQL?

  • NoSQL: no schema! just key-value stores
  • dictionaries instead of tables
  • (basically, jsons)
  • Good for: fast development, flexibility, messy data
  • Bad for: data integrity, safety guarantees
  • “What about for my final project?”….uh, ask me later

139

slide-140
SLIDE 140

👌

140