NCSS: Databases and SQL Tim Dawborn Lecture 2, January, 2016 - - PowerPoint PPT Presentation
NCSS: Databases and SQL Tim Dawborn Lecture 2, January, 2016 - - PowerPoint PPT Presentation
NCSS: Databases and SQL Tim Dawborn Lecture 2, January, 2016 Python/sqlite3 DB Design API JOINs 2 Outline 1 Connecting to an SQLite database using Python 2 What is a good database design? 3 A nice API 4 More on joins and subqueries...
Python/sqlite3 DB Design API JOINs 2
Outline
1 Connecting to an SQLite database using Python 2 What is a ‘good’ database design? 3 A nice API 4 More on joins and subqueries...
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 3
Database Application Programming
- Python standard library contains a module for interacting with
SQLite databases: sqlite3
- You work primarily with Connection and Cursor objects
- The connect method creates a Connection object to a SQLite
database
1
>>> import sqlite3
2
>>> conn = sqlite3.connect('sports.db')
3
>>> conn
4
<sqlite3.Connection object at 0x312720>
5
>>>
- Once you are finished, the connection should be closed
6
>>> conn.close()
7
>>>
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 4
Cursor Objects
- Cursors are the standard way to interact with a database from
within a programming language
- Cursor objects allow you to execute a query and iterate
through the results of a query
- A Connection object allows you to obtain a cursor which
points into the database
1
>>> cur = conn.cursor()
2
>>> cur
3
<sqlite3.Cursor object at 0x387b00>
4
>>>
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 5
Executing Queries
- The cursor has an execute method which allows you to
execute one SQL query at a result
- The cursor object itself can then be iterated through to obtain
the resultant rows
1
>>> cur.execute('SELECT * FROM events;')
2
>>> for row in cur:
3
... print(row)
4 5
(0, '100m', 16, 'M', '09:10')
6
(1, '200m', 16, 'M', '09:15')
7
(2, '100m', 17, 'M', '09:00')
8
(3, '100m', 17, 'F', '09:05')
9
>>>
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 6
Executing Queries
- The fetchone method returns one row at a result, or None
- The previous is equivalent to the following
1
>>> cur.execute('SELECT * FROM events;')
2
>>> while True:
3
... row = cur.fetchone()
4
... if row is None:
5
... break
6
... print(row)
7 8
(0, '100m', 16, 'M', '09:10')
9
(1, '200m', 16, 'M', '09:15')
10
(2, '100m', 17, 'M', '09:00')
11
(3, '100m', 17, 'F', '09:05')
12
>>>
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 7
Data Types: Python vs. SQL
- We know SQLite has 5 data types
- However, Python has a lot more than these five ...
- str, int, float, list, dict, set, tuple, ...
- How do we coerce Python data types to SQLite and
vice-versa?
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 7
Data Types: Python vs. SQL
- We know SQLite has 5 data types
- However, Python has a lot more than these five ...
- str, int, float, list, dict, set, tuple, ...
- How do we coerce Python data types to SQLite and
vice-versa? SQLite Python
NULL None INTEGER int REAL float TEXT str BLOB bytes
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 8
Joining tables
- Why do we need to join tables together?
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 8
Joining tables
- Why do we need to join tables together?
- Why not store all data in just one huge table?
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 8
Joining tables
- Why do we need to join tables together?
- Why not store all data in just one huge table?
id 1 2 3 4 5 6 7 8 fname Barry Barry Prue Andrew Mathew Mara Scott Alec Karen Grant lname Schultz Schultz Robinson Varvel Nemes Barber Herdman Newton Barber Ovzinsky gender M M F M M F M M F M age 16 16 17 16 13 17 16 16 14 17 ename 100m 200m 100m 100m 100m 100m 100m 100m 200m eage 16 16 17 16 13 17 16 14 17 egender M M F M M F M M M at 09:10 09:15 09:05 09:10 09:10 09:05 09:10 09:10 09:00 result 00:15 00:40 00:20 00:17 00:20 00:21
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 8
Joining tables
- Why do we need to join tables together?
- Why not store all data in just one huge table?
id 1 2 3 4 5 6 7 8 fname Barry Barry Prue Andrew Mathew Mara Scott Alec Karen Grant lname Schultz Schultz Robinson Varvel Nemes Barber Herdman Newton Barber Ovzinsky gender M M F M M F M M F M age 16 16 17 16 13 17 16 16 14 17 ename 100m 200m 100m 100m 100m 100m 100m 100m 200m eage 16 16 17 16 13 17 16 14 17 egender M M F M M F M M M at 09:10 09:15 09:05 09:10 09:10 09:05 09:10 09:10 09:00 result 00:15 00:40 00:20 00:17 00:20 00:21
- Problem: A lot of data is repeated and too many NULLs...
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 9
Central goal: minimizing redundancy
- Central goal of database design: minimizing data redundancy
- There’s a large body of theoretical work on this
- Data Normalization — in a nutshell:
- Keep a table about data of just one concept.
(such as persons or events)
- Use foreign keys to link tables in your schema
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 9
Central goal: minimizing redundancy
- Central goal of database design: minimizing data redundancy
- There’s a large body of theoretical work on this
- Data Normalization — in a nutshell:
- Keep a table about data of just one concept.
(such as persons or events)
- Use foreign keys to link tables in your schema
people id fname lname gender age results event person result events id name age gender at
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 10
A well-done database schema
- The Good:
- each table is about a well defined concept only
- updates affect (typically) a single row only
- The Bad:
- many tables
- tendency to introduce unique IDs
- a lot of joins...
- The Ugly:
- it’s not always possible and very experience-driven
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 11
Piecing it all together
- How are we going to integrate all of this into our NCSSBook?
- Model-View-Controller mentality
- Loading and writing data to and from a SQLite database
- Database is stored in a single flatfile
- Ease of setup – no server required
- Convenient for version control
- Correct data modelling is important
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 12
Example: Account table
- What do you think about the following users table design?
id 1 2 3 4 5 6 user steve elaine miranda jesse mathew tim pass 1234 abcdef foobar 1Two3 156ytfv1h8 joshua email s@my.home eora@yahoo.com m123@gmail.com jesse@town.org mat@hotmail.com tim@iinet.net.au school
- St. Mary’s
- St. Mary’s
X Public School Y State High Y State High Mercedes College city Sydney Sydney Glebe Townsville Townsville Perth friends 2,3 1 1,4,5 3,5 3,4,6,10 5,10
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 12
Example: Account table
- What do you think about the following users table design?
id 1 2 3 4 5 6 user steve elaine miranda jesse mathew tim pass 1234 abcdef foobar 1Two3 156ytfv1h8 joshua email s@my.home eora@yahoo.com m123@gmail.com jesse@town.org mat@hotmail.com tim@iinet.net.au school
- St. Mary’s
- St. Mary’s
X Public School Y State High Y State High Mercedes College city Sydney Sydney Glebe Townsville Townsville Perth friends 2,3 1 1,4,5 3,5 3,4,6,10 5,10
- friends should be a separate table rather than a CSV-string
- Better also have schools in a separate table
- Unencrypted passwords? Seriously!?
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 13
Example: Login code
- What do you think about the following login code?
1
import sqlite3
2
def login(username, password):
3
conn = sqlite3.connect('ncssbook.db')
4
cur = conn.execute('SELECT user, pass FROM users')
5
found = False
6
for row in cur:
7
if row[0] == username and row[1] == password:
8
found = True
9
conn.close()
10
return found
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 13
Example: Login code
- What do you think about the following login code?
1
import sqlite3
2
def login(username, password):
3
conn = sqlite3.connect('ncssbook.db')
4
cur = conn.execute('SELECT user, pass FROM users')
5
found = False
6
for row in cur:
7
if row[0] == username and row[1] == password:
8
found = True
9
conn.close()
10
return found
- Do not scan a table and filter in Python – use SQL!
- row[0] or row[1] relies on the positions; use row["user"]
- Use just one connection for the whole program
- Store passwords hashed or encrypted!
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 14
Example: Login code – a better approach
1
import sqlite3
2 3
conn = sqlite3.connect('ncssbook.db')
4 5
def login(username, password):
6
cur = conn.execute('''
7
SELECT id
8
FROM users
9
WHERE user=? AND pass=?
10
''', (username, password))
11
row = cur.fetchone()
12
user_id = None if row is None else row['id']
13
conn.commit()
14
return user_id
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 15
Making others’ lives easier
- There are some operations that we’ll do a lot:
- Create a (user) row in a table
- Find a (user) row in a table
- Update a (user) row in a table
- Delete a (user) row in a table
- It’s painful to have to write SQL every time we need to do
this
- for every single type of table in the database!
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 16
Idea 1: Let’s use functions
1
def find_user(username, conn):
2
cur = conn.execute('''SELECT * FROM users
3
WHERE user=?''', (username,))
4
row = cur.fetchone()
5
return row
- What’s wrong with this?
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 16
Idea 1: Let’s use functions
1
def find_user(username, conn):
2
cur = conn.execute('''SELECT * FROM users
3
WHERE user=?''', (username,))
4
row = cur.fetchone()
5
return row
- What’s wrong with this?
- The user of the function has to process the tuple
- Exposes changes in the database schema
- This is ugly and prone to failure
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 17
Idea 2: Let’s use classes and objects
1
class User:
2
def __init__(self, username, fname, lname):
3
self.username = username
4
self.fname = fname
5
self.lname = lname
6 7
@staticmethod
8
def find(username):
9
cur = conn.execute('''SELECT * FROM users
10
WHERE user=?''', (username,))
11
row = cur.fetchone()
12
if row is None:
13
raise UserNotFound('{} does not exist'.format(username))
14
return User(row[0], row[1], row[2])
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 18
Idea 2: Let’s use classes and objects
1
@staticmethod
2
def create(username, fname, lname):
3
cur = conn.execute('''INSERT INTO users
4
VALUES (?, ?, ?)''', (username, fname, lname))
5
return User(username, fname, lname)
6 7
@staticmethod
8
def delete(username):
9
cur = conn.execute('''DELETE FROM users
10
WHERE username = ?''', (username,))
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 19
An API abstracts away from the database
- Defining the User object makes life much nicer for other
programmers
- No one else has to write SQL (like the function approach)
- Internalise the tuple processing and produce a standardized
- bject representation
- We can transparently make changes to the database schema
without affecting anyone else
- You should consider creating an object like this for each table
in your database
- And if you’re really good, you could try and figure out how to
avoid rewriting for every new database table
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 20
More Joins
- Okay, back to more complex SQL
- We can join more than two tables together
- Example: List the names of all females who ran the 100m in
alphabetical order.
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 20
More Joins
- Okay, back to more complex SQL
- We can join more than two tables together
- Example: List the names of all females who ran the 100m in
alphabetical order.
1
SELECT fname, lname
2
FROM people p
3
JOIN results r ON p.id = r.person
4
JOIN events e ON e.id = r.event
5
WHERE e.name = '100m'
6
AND e.gender = 'F'
7
ORDER BY lname;
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 21
Nested SQL queries
- Remember: Every SQL query returns its result as a table.
- This means we can nest SQL queries: an outer query can
check the results of an inner (nested) sub-query.
- We can build complex query from smaller building blocks.
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 22
Nesting SQL Queries (continued)
- Example: Which females ran in the 100m?
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 22
Nesting SQL Queries (continued)
- Example: Which females ran in the 100m?
1
SELECT r.person
2
FROM results r
3
JOIN events e ON e.id = r.event
4
WHERE e.name = '100m' AND e.gender = 'F'
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 22
Nesting SQL Queries (continued)
- Example: Which females ran in the 100m?
1
SELECT r.person
2
FROM results r
3
JOIN events e ON e.id = r.event
4
WHERE e.name = '100m' AND e.gender = 'F'
- Use this as a sub-query to answer the original question:
1
SELECT fname, lname
2
FROM people
3
WHERE id IN (
4
SELECT r.person
5
FROM results r
6
JOIN events e ON e.id = r.event
7
WHERE e.name = '100m' AND e.gender = 'F'
8
)
9
ORDER BY lname;
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 23
Scientific Databases: How complex SQL can become...
- Part of our work here at Sydney Uni is to look at how
databases and SQL can help answering scientific questions.
- For example in the context of genomics:
- Another good example: SkyServer
http://cas.sdss.org/dr6/en/tools/chart/navi.asp
Tim Dawborn Databases & SQL Lecture 2, January, 2016