EGTDC Database Course 2004 Introduction to Databases Tim Booth : - - PowerPoint PPT Presentation

egtdc database course 2004 introduction to databases
SMART_READER_LITE
LIVE PREVIEW

EGTDC Database Course 2004 Introduction to Databases Tim Booth : - - PowerPoint PPT Presentation

EGTDC Database Course 2004 Introduction to Databases Tim Booth : tbooth@ceh.ac.uk Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk Course Overview What you will take home: What is a relational database? Why


slide-1
SLIDE 1

Environmental Genomics Thematic Programme Data Centre

http://envgen.nox.ac.uk

EGTDC Database Course 2004 Introduction to Databases

Tim Booth : tbooth@ceh.ac.uk

slide-2
SLIDE 2

Environmental Genomics Thematic Programme Data Centre

http://envgen.nox.ac.uk

Course Overview

What you will take home:

  • What is a relational database?
  • Why would you use one?
  • How do you interact with one?
  • How do you design and build one?
  • What is the theory behind good relational design?
  • What software is available?
  • How can I build user interfaces, or display my data on the web?
slide-3
SLIDE 3

Environmental Genomics Thematic Programme Data Centre

http://envgen.nox.ac.uk

What is a database?

  • Most general definition:

“A database is an integrated collection of data organised to meet the needs of one or more users”

  • Databases stored on computer may be of various types:
  • Flat file database such as a comma-separated

file

  • Indexed file database such as BLAST dbs
  • 'Flat' relational database such as Access .mdb
  • Database managed by a dedicated application

(RDBMS) such as PostgreSQL or Oracle.

slide-4
SLIDE 4

Environmental Genomics Thematic Programme Data Centre

http://envgen.nox.ac.uk

Benefits of a Relational Approach

  • Database structure reflects the structure of the data
  • Eliminate redundant data
  • Data validation
  • Powerful querying via SQL
  • Security, data integrity and multi-user access
  • Compatibility with a huge range of existing software and development

tools

slide-5
SLIDE 5

Environmental Genomics Thematic Programme Data Centre

http://envgen.nox.ac.uk

What is a Relational Database?

  • A relational database is a collection of tables. Each table has rows and

columns, and looks rather like a spreadsheet.

  • The movie table in the example database:
slide-6
SLIDE 6

Environmental Genomics Thematic Programme Data Centre

http://envgen.nox.ac.uk

The Example Database

  • The 'bighit' database represents a video store.
  • Each person has access to an individual copy of this database, and we

will explore it as the course progresses.

  • The next slide shows a selection of tables in the database, and links

between them.

slide-7
SLIDE 7

Environmental Genomics Thematic Programme Data Centre

http://envgen.nox.ac.uk

The Example Database

slide-8
SLIDE 8

Environmental Genomics Thematic Programme Data Centre

http://envgen.nox.ac.uk

More about database tables

  • Tables are properly called 'relations'.
  • A table holds a collection of records or 'tuples'. These are shown as rows

when you view the data.

  • Each record has a fixed set of fields or 'attributes', each of which contains

data of a specified type. These appear as columns in the data view.

  • Inside the database, the order of the rows and columns is not significant.

The database sorts them when you ask to display the data.

  • A database can have many different tables, corresponding to things we

want to store data about.

  • Other entities can also live in your database – we shall meet them later!
slide-9
SLIDE 9

Environmental Genomics Thematic Programme Data Centre

http://envgen.nox.ac.uk

Access the sample database

  • Each user has a private copy of the sample database. We will log in now.
  • Access the command line by right-clicking the desktop and selecting 'New

Terminal'.

  • To start the PGAdminIII tool type:

pgadmin &

  • Select the server in the tree view, login with user/pass = ivgws0?/sushi
  • In the tree browser, open 'databases' then 'ivgws0?'.
  • Open the 'bighit' schema, then the 'tables' link.
  • Select the 'movie' table – note the definition which appears in the right

pane.

  • Right click and select 'view data'.
slide-10
SLIDE 10

Environmental Genomics Thematic Programme Data Centre

http://envgen.nox.ac.uk

Introducing SQL

  • SQL is a language we will use to get at the data in the database
  • Generally human readable
  • Retrieve information with SELECT statements
  • eg. SELECT * FROM MOVIE
  • A result set comes back.
slide-11
SLIDE 11

Environmental Genomics Thematic Programme Data Centre

http://envgen.nox.ac.uk

Bring up an SQL Editor Window

  • Click the pencil icon to get a new SQL editor window.
  • In the new window, use the eraser icon to clear any text already in the

input box.

slide-12
SLIDE 12

Environmental Genomics Thematic Programme Data Centre

http://envgen.nox.ac.uk

Talking SQL

  • Type the query shown (1) and hit the green arrow (2) to run it.

2 2 1 1

slide-13
SLIDE 13

Environmental Genomics Thematic Programme Data Centre

http://envgen.nox.ac.uk

More about SQL

  • SQL is the Structured Query Language
  • The language and structure of databases are intimately linked.
  • Sometimes pronounced 'sequel' – especially by Microsoft devotees.
  • Invented by Dr. Edgar Frank 'Ted' Codd (1924-2003), who was from

Oxford, and set down the foundations of the relational database concept while working in the IBM San Jose Research Laboratory.

  • The classic paper 'A relational model of data for large shared data banks'

appeared in the ACM (Association for Computing Machinery) magazine, June 1970

slide-14
SLIDE 14

Environmental Genomics Thematic Programme Data Centre

http://envgen.nox.ac.uk

Being More Specific

  • The general form of basic SELECT statements:

SELECT title, year FROM movie WHERE year > 1990 ORDER BY year SELECT WHAT FROM WHICH TABLE WHERE SOME TEST ORDER BY WHICH COLUMN

  • The last two parts are both optional.
  • SELECT * is a special case to get all columns.
slide-15
SLIDE 15

Environmental Genomics Thematic Programme Data Centre

http://envgen.nox.ac.uk

Modifying Data

  • INSERT, DELETE and UPDATE are the commands to modify data.

DELETE FROM reservation where accountid = 23 DELETE FROM TABLE WHERE CONDITION INSERT INTO reservation (accountid, movieid, datereserved) VALUES (23, 15, '12 June') INSERT INTO TABLE (COLUMNS) VALUES (VALUES)

slide-16
SLIDE 16

Environmental Genomics Thematic Programme Data Centre

http://envgen.nox.ac.uk

Modifying Data 2

  • INSERT, DELETE and UPDATE are the commands to modify data.

UPDATE movie SET genre = 'fantasy', rating = 'PG' WHERE id = 15 UPDATE TABLE SET COLUMN1 = VALUE1, COLUMN2 = VALUE2 WHERE CONDITION

slide-17
SLIDE 17

Environmental Genomics Thematic Programme Data Centre

http://envgen.nox.ac.uk

Getting Exactly What You Want

  • In a RDB you try not to store redundant or derived data.
  • For example, we store the year of release of a movie, but not the age.
  • We can specify a calculation within the SELECT statement to calculate

the age of a movie on-the-fly. A) select title, year from movie B) select title, extract(YEAR FROM now()) - year from movie

  • 'extract' and 'now' are functions.
slide-18
SLIDE 18

Environmental Genomics Thematic Programme Data Centre

http://envgen.nox.ac.uk

Name That Column

  • Column aliases allow you to specify the name of a column retrieved with

SELECT.

  • Running the previous example, we see that the second column gets

called '?column?', because the database has no idea what to call the result of the formula.

  • We can simply say:

select title, extract(YEAR FROM now()) - year as age from movie

slide-19
SLIDE 19

Environmental Genomics Thematic Programme Data Centre

http://envgen.nox.ac.uk

Learning by Example

  • SQL allows you to derive data by supplying a variety of functions and

constructs.

  • Rather than just talking through them, we will introduce some by

example in the exercises.

slide-20
SLIDE 20

Environmental Genomics Thematic Programme Data Centre

http://envgen.nox.ac.uk

Relational Databases in Biology

  • Some databases, such as the NCBI Taxonomy, can be downloaded and

set up on a local machine. ftp://ftp.ncbi.nih.gov/pub/taxonomy/

  • Ensembl permits direct access to the core database, which we will look at
  • later. The document describing how to get access is here:

http://cvsweb.sanger.ac.uk/cgi−bin/cvsweb.cgi/ ~checkout~/ensembl/docs/tutorial/ensembl_tutorial.pdf

  • For large datasets, deploying an RDBMS can provide fast, flexible access

to the data. Most data-portal websites have such a database at their heart.

slide-21
SLIDE 21

Environmental Genomics Thematic Programme Data Centre

http://envgen.nox.ac.uk

Credits

  • The Bighit database is adapted from the website of Greg Riccardi -

http://riccardi.pearsoncmg.com/dbmgmt/.

  • Examples and quotes from Database Design and Management, by Rob

Eadie (Continuum, 1998).

  • Programming the Perl DBI, Alligator Descartes & Tim Bunce, (O'Reilly

2000)