nebc database course 2008 implementing a relational
play

NEBC Database Course 2008 Implementing a Relational Database Tim - PowerPoint PPT Presentation

NEBC Database Course 2008 Implementing a Relational Database Tim Booth : tbooth@ceh.ac.uk Implementing a relational database establish requirements Data Requirements data analysis Conceptual Data Model database design Logical Schema


  1. NEBC Database Course 2008 Implementing a Relational Database Tim Booth : tbooth@ceh.ac.uk

  2. Implementing a relational database establish requirements Data Requirements data analysis Conceptual Data Model database design Logical Schema Specification implement schema and database

  3. Our design so far: relation PublicDatabase DatabaseName: String DataType: String URL: String primary key: DatabaseName relation Sequence AccessionNumber: String ID: String primary key: AccessionNumber foreign key: SourceDatabase references PublicDatabase foreign key: SourceOrganism references Organism relation Organism OrganismNo: Integer Species: String Strain: String GenomeSeq: Boolean CommonName: String primary key: OrganismNo relation Feature FeatID: String Name: String primary key: FeatID foreign key: SourceSequence references Sequence

  4. Implementing our database • Most of hard work is already done • Create relations using SQL • Define the data types for our columns • Define primary and foreign keys • Add constraints • Add any appropriate default values

  5. Our design so far: relation PublicDatabase DatabaseName: String DataType: String URL: String primary key: DatabaseName

  6. Create Table relation PublicDatabase DatabaseName: String DataType: String URL: String primary key: DatabaseName CREATE TABLE publicdatabase ( databasename , datatype , url );

  7. Add data types Reminder: • Numerical • integer,float,numeric • String/Text • varchar,text • Date/Time • timestamp,date • Boolean

  8. Add data types relation PublicDatabase DatabaseName: String DataType: String URL: String primary key: DatabaseName CREATE TABLE publicdatabase ( databasename varchar(50), datatype varchar(20), url varchar(200) );

  9. Primary Keys relation PublicDatabase DatabaseName: String DataType: String URL: String primary key: DatabaseName CREATE TABLE publicdatabase ( databasename varchar(50), datatype varchar(20), url varchar(200), primary key (databasename) );

  10. Foreign Keys relation Sequence AccessionNumber: String ID: String primary key: AccessionNumber foreign key: SourceDatabase references PublicDatabase foreign key: SourceOrganism references Organism CREATE TABLE sequence ( accessionnumber varchar(50), id varchar(50), sourcedatabase varchar(50), sourceorganism integer, primary key (accessionnumber), foreign key (sourcedatabase) references publicdatabase, foreign key (sourceorganism) references organism );

  11. Foreign Keys relation Sequence AccessionNumber: String ID: String primary key: AccessionNumber foreign key: SourceDatabase references PublicDatabase foreign key: SourceOrganism references Organism CREATE TABLE sequence ( accessionnumber varchar(50), id varchar(50), sourcedatabase varchar(30), sourceorganism integer, primary key (accessionnumber), foreign key (sourcedatabase) references publicdatabase(databasename), foreign key (sourceorganism) references organism(organismnumber) );

  12. feature table CREATE TABLE feature ( featid varchar(50), name varchar(100), sourcesequence varchar(50), primary key (featid), foreign key (sourcesequence) references sequence(accessionnumber) );

  13. organism table CREATE TABLE organism ( organismnumber integer species varchar(100), strain varchar(100), genomeseq boolean, commonname varchar(100), primary key (organismnumber) );

  14. Constraints • Constraints restrict the values that can be inserted or updated in columns • Types of constraints • NOT NULL • UNIQUE • Simply add to column definition url varchar(100) NOT NULL or url varchar(100) UNIQUE • NOT NULL and UNIQUE implicit on primary key

  15. Constraints • CHECK constraint numberoflegs integer CHECK (numberoflegs>2)

  16. publicdatabase table CREATE TABLE publicdatabase ( databasename varchar(50), datatype varchar(20), url varchar(200) UNIQUE, primary key (databasename) );

  17. Constraints • To keep links between tables working you need to preserve the matching values – referential integrity • automatically set up when you declare the primary and foreign keys • This will prevent you from deleting a record with a primary key before you have deleted all the child foreign key records

  18. Constraints • Example from 'BigHit' database

  19. Constraints

  20. organism table CREATE TABLE organism ( organismnumber integer, species varchar(100), strain varchar(100), genomeseq boolean, commonname varchar(100), primary key (organismnumber) ); How shall we create the unique primary key values for organismnumber?

  21. Sequences • Sequence is a database object in PostgreSQL which gives you an automatically incrementing numeric value (equivalent to 'autonumber' in Access) CREATE SEQUENCE my_seq (can specify increment,min and max) SELECT NEXTVAL('my_seq')

  22. Default values • Still don't want to have to select the value each time • Can set a default value for column which is automatically filled in every time a record is inserted CREATE TABLE organism ( organismnumber integer DEFAULT NEXTVAL('my_seq'), species varchar(100), strain varchar(100), genomeseq boolean, commonname varchar(100), primary key (organismnumber) );

  23. Create your database • To create your database run your SQL table and other object creation statements in a single script • Example - demodatabase.sql • Be sure create tables in the right order • Can't create table that refers to a primary key in a table that doesn't exist yet • You also need data...

  24. Populate your database • Insert data using INSERT sql statements INSERT INTO organism (species,strain,genomeseq,commonname) VALUES ('Oryctolagus cuniculus',NULL,'false','rabbit'); • Default values will inserted automatically CREATE TABLE organism ( organismnumber integer DEFAULT NEXTVAL('my_seq'), ...

  25. Populate your database • Be sure to insert data in correct order • Don't try and insert a foreign key value when the primary key value hasn't been inserted yet • Run the demodatabase.sql script

  26. Querying your database • Now that your database is set up and data has been inserted we can query it

  27. Database Sequence Type ID Accession Organism Name Number Swissprot Protein Phosphorylase B KPB1_Rabit Rabbit kinase alpha regulatory chain UniProt Protein Troponin T Trt3_rabit Swissprot Proten Glycogen PHS2_RABIT rabbit phosphorylase Swissprot protein TRIC_RABIT Troponin I EMBL Nucleotide rabbit muscle OCPHOS2 rabbit phosphorylase mrna dbEST Nucleotide CK829726 Rabbit TrEMBL protein Q8MJF7 Rabbit pol protein EMBL Nucleotide OXPKA Rabbit Rabbit phosphorylase PDB protein_structure Glycogen 1ABB rabit Phosphorylase

  28. Querying your database

  29. Querying your database

  30. Querying your database

  31. Querying your database • What have we gained? • No data redundancy • Data is consistent • Enforced quality control – no missing data • Only have to change data once • Flexibility to run a variety of queries

  32. Views • Views are queries that are saved in the database as objects • Appear much like a table which can be queried in the same way • Good if underlying query is very complex CREATE VIEW viewname AS query

  33. Views

  34. Views

  35. Indexes • Searching data by scanning is slow • Indexes make this searching faster • Implicit indexes are set up for primary keys as these are used a lot for searching data

  36. Indexes • An index can be created on any column CREATE INDEX orgname_idx on organism (commonname) • An index is helpful on a column that is regularly searched on (i.e. Used in the WHERE clause)

  37. Index worked example • There are more movies in the file demodata/moremovies.csv • These are already loaded into the database table 'demodata.moremovies' • in PGAdmin3: • INSERT INTO bighit.movie (SELECT * FROM demodata.moremovies); • SELECT * FROM movie where rating = 'U';

  38. Index worked example • Explain the query • Now make an index: • CREATE INDEX myindex ON movie (rating); • Now explain the original query again • This works for very complex queries!

  39. Complex Query Analysis

  40. The “MART” Strategy • Normalisation is the process of removing data redundancy from your database design • But it adds complexity • Views can make querying simpler • Indexes can make querying faster • But... Sometimes this is not enough. Maintaining a summary table for quick querying is known as 'denormalisation' • Many large databases (eg. EnsEMBL) resort to this

  41. More features... • stored procedures • triggers • cascading updates • custom types • custom functions • extension modules • load balancing • replication • ...

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend