NCSS: Databases and SQL Tim Dawborn Lecture 2, January, 2016 - - PowerPoint PPT Presentation

ncss databases and sql
SMART_READER_LITE
LIVE PREVIEW

NCSS: Databases and SQL Tim Dawborn Lecture 2, January, 2016 - - PowerPoint PPT Presentation

NCSS: Databases and SQL Tim Dawborn Lecture 2, January, 2016 Python/sqlite3 DB Design API JOINs 2 Outline 1 Connecting to an SQLite database using Python 2 What is a good database design? 3 A nice API 4 More on joins and subqueries...


slide-1
SLIDE 1

NCSS: Databases and SQL

Tim Dawborn Lecture 2, January, 2016

slide-2
SLIDE 2

Python/sqlite3 DB Design API JOINs 2

Outline

1 Connecting to an SQLite database using Python 2 What is a ‘good’ database design? 3 A nice API 4 More on joins and subqueries...

Tim Dawborn Databases & SQL Lecture 2, January, 2016

slide-3
SLIDE 3

Python/sqlite3 DB Design API JOINs 3

Database Application Programming

  • Python standard library contains a module for interacting with

SQLite databases: sqlite3

  • You work primarily with Connection and Cursor objects
  • The connect method creates a Connection object to a SQLite

database

1

>>> import sqlite3

2

>>> conn = sqlite3.connect('sports.db')

3

>>> conn

4

<sqlite3.Connection object at 0x312720>

5

>>>

  • Once you are finished, the connection should be closed

6

>>> conn.close()

7

>>>

Tim Dawborn Databases & SQL Lecture 2, January, 2016

slide-4
SLIDE 4

Python/sqlite3 DB Design API JOINs 4

Cursor Objects

  • Cursors are the standard way to interact with a database from

within a programming language

  • Cursor objects allow you to execute a query and iterate

through the results of a query

  • A Connection object allows you to obtain a cursor which

points into the database

1

>>> cur = conn.cursor()

2

>>> cur

3

<sqlite3.Cursor object at 0x387b00>

4

>>>

Tim Dawborn Databases & SQL Lecture 2, January, 2016

slide-5
SLIDE 5

Python/sqlite3 DB Design API JOINs 5

Executing Queries

  • The cursor has an execute method which allows you to

execute one SQL query at a result

  • The cursor object itself can then be iterated through to obtain

the resultant rows

1

>>> cur.execute('SELECT * FROM events;')

2

>>> for row in cur:

3

... print(row)

4 5

(0, '100m', 16, 'M', '09:10')

6

(1, '200m', 16, 'M', '09:15')

7

(2, '100m', 17, 'M', '09:00')

8

(3, '100m', 17, 'F', '09:05')

9

>>>

Tim Dawborn Databases & SQL Lecture 2, January, 2016

slide-6
SLIDE 6

Python/sqlite3 DB Design API JOINs 6

Executing Queries

  • The fetchone method returns one row at a result, or None
  • The previous is equivalent to the following

1

>>> cur.execute('SELECT * FROM events;')

2

>>> while True:

3

... row = cur.fetchone()

4

... if row is None:

5

... break

6

... print(row)

7 8

(0, '100m', 16, 'M', '09:10')

9

(1, '200m', 16, 'M', '09:15')

10

(2, '100m', 17, 'M', '09:00')

11

(3, '100m', 17, 'F', '09:05')

12

>>>

Tim Dawborn Databases & SQL Lecture 2, January, 2016

slide-7
SLIDE 7

Python/sqlite3 DB Design API JOINs 7

Data Types: Python vs. SQL

  • We know SQLite has 5 data types
  • However, Python has a lot more than these five ...
  • str, int, float, list, dict, set, tuple, ...
  • How do we coerce Python data types to SQLite and

vice-versa?

Tim Dawborn Databases & SQL Lecture 2, January, 2016

slide-8
SLIDE 8

Python/sqlite3 DB Design API JOINs 7

Data Types: Python vs. SQL

  • We know SQLite has 5 data types
  • However, Python has a lot more than these five ...
  • str, int, float, list, dict, set, tuple, ...
  • How do we coerce Python data types to SQLite and

vice-versa? SQLite Python

NULL None INTEGER int REAL float TEXT str BLOB bytes

Tim Dawborn Databases & SQL Lecture 2, January, 2016

slide-9
SLIDE 9

Python/sqlite3 DB Design API JOINs 8

Joining tables

  • Why do we need to join tables together?

Tim Dawborn Databases & SQL Lecture 2, January, 2016

slide-10
SLIDE 10

Python/sqlite3 DB Design API JOINs 8

Joining tables

  • Why do we need to join tables together?
  • Why not store all data in just one huge table?

Tim Dawborn Databases & SQL Lecture 2, January, 2016

slide-11
SLIDE 11

Python/sqlite3 DB Design API JOINs 8

Joining tables

  • Why do we need to join tables together?
  • Why not store all data in just one huge table?

id 1 2 3 4 5 6 7 8 fname Barry Barry Prue Andrew Mathew Mara Scott Alec Karen Grant lname Schultz Schultz Robinson Varvel Nemes Barber Herdman Newton Barber Ovzinsky gender M M F M M F M M F M age 16 16 17 16 13 17 16 16 14 17 ename 100m 200m 100m 100m 100m 100m 100m 100m 200m eage 16 16 17 16 13 17 16 14 17 egender M M F M M F M M M at 09:10 09:15 09:05 09:10 09:10 09:05 09:10 09:10 09:00 result 00:15 00:40 00:20 00:17 00:20 00:21

Tim Dawborn Databases & SQL Lecture 2, January, 2016

slide-12
SLIDE 12

Python/sqlite3 DB Design API JOINs 8

Joining tables

  • Why do we need to join tables together?
  • Why not store all data in just one huge table?

id 1 2 3 4 5 6 7 8 fname Barry Barry Prue Andrew Mathew Mara Scott Alec Karen Grant lname Schultz Schultz Robinson Varvel Nemes Barber Herdman Newton Barber Ovzinsky gender M M F M M F M M F M age 16 16 17 16 13 17 16 16 14 17 ename 100m 200m 100m 100m 100m 100m 100m 100m 200m eage 16 16 17 16 13 17 16 14 17 egender M M F M M F M M M at 09:10 09:15 09:05 09:10 09:10 09:05 09:10 09:10 09:00 result 00:15 00:40 00:20 00:17 00:20 00:21

  • Problem: A lot of data is repeated and too many NULLs...

Tim Dawborn Databases & SQL Lecture 2, January, 2016

slide-13
SLIDE 13

Python/sqlite3 DB Design API JOINs 9

Central goal: minimizing redundancy

  • Central goal of database design: minimizing data redundancy
  • There’s a large body of theoretical work on this
  • Data Normalization — in a nutshell:
  • Keep a table about data of just one concept.

(such as persons or events)

  • Use foreign keys to link tables in your schema

Tim Dawborn Databases & SQL Lecture 2, January, 2016

slide-14
SLIDE 14

Python/sqlite3 DB Design API JOINs 9

Central goal: minimizing redundancy

  • Central goal of database design: minimizing data redundancy
  • There’s a large body of theoretical work on this
  • Data Normalization — in a nutshell:
  • Keep a table about data of just one concept.

(such as persons or events)

  • Use foreign keys to link tables in your schema

people id fname lname gender age results event person result events id name age gender at

Tim Dawborn Databases & SQL Lecture 2, January, 2016

slide-15
SLIDE 15

Python/sqlite3 DB Design API JOINs 10

A well-done database schema

  • The Good:
  • each table is about a well defined concept only
  • updates affect (typically) a single row only
  • The Bad:
  • many tables
  • tendency to introduce unique IDs
  • a lot of joins...
  • The Ugly:
  • it’s not always possible and very experience-driven

Tim Dawborn Databases & SQL Lecture 2, January, 2016

slide-16
SLIDE 16

Python/sqlite3 DB Design API JOINs 11

Piecing it all together

  • How are we going to integrate all of this into our NCSSBook?
  • Model-View-Controller mentality
  • Loading and writing data to and from a SQLite database
  • Database is stored in a single flatfile
  • Ease of setup – no server required
  • Convenient for version control
  • Correct data modelling is important

Tim Dawborn Databases & SQL Lecture 2, January, 2016

slide-17
SLIDE 17

Python/sqlite3 DB Design API JOINs 12

Example: Account table

  • What do you think about the following users table design?

id 1 2 3 4 5 6 user steve elaine miranda jesse mathew tim pass 1234 abcdef foobar 1Two3 156ytfv1h8 joshua email s@my.home eora@yahoo.com m123@gmail.com jesse@town.org mat@hotmail.com tim@iinet.net.au school

  • St. Mary’s
  • St. Mary’s

X Public School Y State High Y State High Mercedes College city Sydney Sydney Glebe Townsville Townsville Perth friends 2,3 1 1,4,5 3,5 3,4,6,10 5,10

Tim Dawborn Databases & SQL Lecture 2, January, 2016

slide-18
SLIDE 18

Python/sqlite3 DB Design API JOINs 12

Example: Account table

  • What do you think about the following users table design?

id 1 2 3 4 5 6 user steve elaine miranda jesse mathew tim pass 1234 abcdef foobar 1Two3 156ytfv1h8 joshua email s@my.home eora@yahoo.com m123@gmail.com jesse@town.org mat@hotmail.com tim@iinet.net.au school

  • St. Mary’s
  • St. Mary’s

X Public School Y State High Y State High Mercedes College city Sydney Sydney Glebe Townsville Townsville Perth friends 2,3 1 1,4,5 3,5 3,4,6,10 5,10

  • friends should be a separate table rather than a CSV-string
  • Better also have schools in a separate table
  • Unencrypted passwords? Seriously!?

Tim Dawborn Databases & SQL Lecture 2, January, 2016

slide-19
SLIDE 19

Python/sqlite3 DB Design API JOINs 13

Example: Login code

  • What do you think about the following login code?

1

import sqlite3

2

def login(username, password):

3

conn = sqlite3.connect('ncssbook.db')

4

cur = conn.execute('SELECT user, pass FROM users')

5

found = False

6

for row in cur:

7

if row[0] == username and row[1] == password:

8

found = True

9

conn.close()

10

return found

Tim Dawborn Databases & SQL Lecture 2, January, 2016

slide-20
SLIDE 20

Python/sqlite3 DB Design API JOINs 13

Example: Login code

  • What do you think about the following login code?

1

import sqlite3

2

def login(username, password):

3

conn = sqlite3.connect('ncssbook.db')

4

cur = conn.execute('SELECT user, pass FROM users')

5

found = False

6

for row in cur:

7

if row[0] == username and row[1] == password:

8

found = True

9

conn.close()

10

return found

  • Do not scan a table and filter in Python – use SQL!
  • row[0] or row[1] relies on the positions; use row["user"]
  • Use just one connection for the whole program
  • Store passwords hashed or encrypted!

Tim Dawborn Databases & SQL Lecture 2, January, 2016

slide-21
SLIDE 21

Python/sqlite3 DB Design API JOINs 14

Example: Login code – a better approach

1

import sqlite3

2 3

conn = sqlite3.connect('ncssbook.db')

4 5

def login(username, password):

6

cur = conn.execute('''

7

SELECT id

8

FROM users

9

WHERE user=? AND pass=?

10

''', (username, password))

11

row = cur.fetchone()

12

user_id = None if row is None else row['id']

13

conn.commit()

14

return user_id

Tim Dawborn Databases & SQL Lecture 2, January, 2016

slide-22
SLIDE 22

Python/sqlite3 DB Design API JOINs 15

Making others’ lives easier

  • There are some operations that we’ll do a lot:
  • Create a (user) row in a table
  • Find a (user) row in a table
  • Update a (user) row in a table
  • Delete a (user) row in a table
  • It’s painful to have to write SQL every time we need to do

this

  • for every single type of table in the database!

Tim Dawborn Databases & SQL Lecture 2, January, 2016

slide-23
SLIDE 23

Python/sqlite3 DB Design API JOINs 16

Idea 1: Let’s use functions

1

def find_user(username, conn):

2

cur = conn.execute('''SELECT * FROM users

3

WHERE user=?''', (username,))

4

row = cur.fetchone()

5

return row

  • What’s wrong with this?

Tim Dawborn Databases & SQL Lecture 2, January, 2016

slide-24
SLIDE 24

Python/sqlite3 DB Design API JOINs 16

Idea 1: Let’s use functions

1

def find_user(username, conn):

2

cur = conn.execute('''SELECT * FROM users

3

WHERE user=?''', (username,))

4

row = cur.fetchone()

5

return row

  • What’s wrong with this?
  • The user of the function has to process the tuple
  • Exposes changes in the database schema
  • This is ugly and prone to failure

Tim Dawborn Databases & SQL Lecture 2, January, 2016

slide-25
SLIDE 25

Python/sqlite3 DB Design API JOINs 17

Idea 2: Let’s use classes and objects

1

class User:

2

def __init__(self, username, fname, lname):

3

self.username = username

4

self.fname = fname

5

self.lname = lname

6 7

@staticmethod

8

def find(username):

9

cur = conn.execute('''SELECT * FROM users

10

WHERE user=?''', (username,))

11

row = cur.fetchone()

12

if row is None:

13

raise UserNotFound('{} does not exist'.format(username))

14

return User(row[0], row[1], row[2])

Tim Dawborn Databases & SQL Lecture 2, January, 2016

slide-26
SLIDE 26

Python/sqlite3 DB Design API JOINs 18

Idea 2: Let’s use classes and objects

1

@staticmethod

2

def create(username, fname, lname):

3

cur = conn.execute('''INSERT INTO users

4

VALUES (?, ?, ?)''', (username, fname, lname))

5

return User(username, fname, lname)

6 7

@staticmethod

8

def delete(username):

9

cur = conn.execute('''DELETE FROM users

10

WHERE username = ?''', (username,))

Tim Dawborn Databases & SQL Lecture 2, January, 2016

slide-27
SLIDE 27

Python/sqlite3 DB Design API JOINs 19

An API abstracts away from the database

  • Defining the User object makes life much nicer for other

programmers

  • No one else has to write SQL (like the function approach)
  • Internalise the tuple processing and produce a standardized
  • bject representation
  • We can transparently make changes to the database schema

without affecting anyone else

  • You should consider creating an object like this for each table

in your database

  • And if you’re really good, you could try and figure out how to

avoid rewriting for every new database table

Tim Dawborn Databases & SQL Lecture 2, January, 2016

slide-28
SLIDE 28

Python/sqlite3 DB Design API JOINs 20

More Joins

  • Okay, back to more complex SQL
  • We can join more than two tables together
  • Example: List the names of all females who ran the 100m in

alphabetical order.

Tim Dawborn Databases & SQL Lecture 2, January, 2016

slide-29
SLIDE 29

Python/sqlite3 DB Design API JOINs 20

More Joins

  • Okay, back to more complex SQL
  • We can join more than two tables together
  • Example: List the names of all females who ran the 100m in

alphabetical order.

1

SELECT fname, lname

2

FROM people p

3

JOIN results r ON p.id = r.person

4

JOIN events e ON e.id = r.event

5

WHERE e.name = '100m'

6

AND e.gender = 'F'

7

ORDER BY lname;

Tim Dawborn Databases & SQL Lecture 2, January, 2016

slide-30
SLIDE 30

Python/sqlite3 DB Design API JOINs 21

Nested SQL queries

  • Remember: Every SQL query returns its result as a table.
  • This means we can nest SQL queries: an outer query can

check the results of an inner (nested) sub-query.

  • We can build complex query from smaller building blocks.

Tim Dawborn Databases & SQL Lecture 2, January, 2016

slide-31
SLIDE 31

Python/sqlite3 DB Design API JOINs 22

Nesting SQL Queries (continued)

  • Example: Which females ran in the 100m?

Tim Dawborn Databases & SQL Lecture 2, January, 2016

slide-32
SLIDE 32

Python/sqlite3 DB Design API JOINs 22

Nesting SQL Queries (continued)

  • Example: Which females ran in the 100m?

1

SELECT r.person

2

FROM results r

3

JOIN events e ON e.id = r.event

4

WHERE e.name = '100m' AND e.gender = 'F'

Tim Dawborn Databases & SQL Lecture 2, January, 2016

slide-33
SLIDE 33

Python/sqlite3 DB Design API JOINs 22

Nesting SQL Queries (continued)

  • Example: Which females ran in the 100m?

1

SELECT r.person

2

FROM results r

3

JOIN events e ON e.id = r.event

4

WHERE e.name = '100m' AND e.gender = 'F'

  • Use this as a sub-query to answer the original question:

1

SELECT fname, lname

2

FROM people

3

WHERE id IN (

4

SELECT r.person

5

FROM results r

6

JOIN events e ON e.id = r.event

7

WHERE e.name = '100m' AND e.gender = 'F'

8

)

9

ORDER BY lname;

Tim Dawborn Databases & SQL Lecture 2, January, 2016

slide-34
SLIDE 34

Python/sqlite3 DB Design API JOINs 23

Scientific Databases: How complex SQL can become...

  • Part of our work here at Sydney Uni is to look at how

databases and SQL can help answering scientific questions.

  • For example in the context of genomics:
  • Another good example: SkyServer

http://cas.sdss.org/dr6/en/tools/chart/navi.asp

Tim Dawborn Databases & SQL Lecture 2, January, 2016