implementing a relational database
play

Implementing a Relational Database Joe Wood anjw@ceh.ac.uk - PowerPoint PPT Presentation

Implementing a Relational Database Joe Wood anjw@ceh.ac.uk Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk Implementing a relational database establish requirements Data Requirements data analysis


  1. Implementing a Relational Database Joe Wood anjw@ceh.ac.uk Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk

  2. Implementing a relational database establish requirements Data Requirements data analysis Conceptual Data Model database design Logical Schema Specification implement schema and database Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk

  3. Our design so far: relation PublicDatabase DatabaseName: String DataType: String URL: String primary key: DatabaseName relation Sequence AccessionNumber: String ID: String primary key: AccessionNumber foreign key: SourceDatabase references PublicDatabase foreign key: SourceOrganism references Organism relation Organism OrganismNo: Integer Species: String Strain: String GenomeSeq: Boolean CommonName: String primary key: OrganismNo relation Feature FeatID: String Name: String primary key: FeatID foreign key: SourceSequence references Sequence Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk

  4. Implementing our database • Most of hard work is already done • Create relations using SQL • Define the data types for our columns • Define primary and foreign keys • Add constraints • Add any appropriate default values Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk

  5. Our design so far: relation PublicDatabase DatabaseName: String DataType: String URL: String primary key: DatabaseName Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk

  6. Create Table relation PublicDatabase DatabaseName: String DataType: String URL: String primary key: DatabaseName CREATE TABLE publicdatabase ( databasename datatype url ); Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk

  7. Add data types Reminder: • Numerical • integer,float,numerical • String/Text • varchar,text • Date/Time • timestamp,date • Boolean Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk

  8. Add data types relation PublicDatabase DatabaseName: String DataType: String URL: String primary key: DatabaseName CREATE TABLE publicdatabase ( databasename varchar(50), datatype varchar(20), url varchar(200) ); Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk

  9. Primary Keys relation PublicDatabase DatabaseName: String DataType: String URL: String primary key: DatabaseName CREATE TABLE publicdatabase ( databasename varchar(50), datatype varchar(20), url varchar(200), primary key (databasename) ); Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk

  10. Foreign Keys relation Sequence AccessionNumber: String ID: String primary key: AccessionNumber foreign key: SourceDatabase references PublicDatabase foreign key: SourceOrganism references Organism CREATE TABLE sequence ( accessionnumber varchar(50), id varchar(50), sourcedatabase varchar(30), sourceorganism integer, primary key (accessionnumber), foreign key (sourcedatabase) references publicdatabase(databasename), foreign key (sourceorganism) references organism(organismnumber) ); Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk

  11. Foreign Keys relation Sequence AccessionNumber: String ID: String primary key: AccessionNumber foreign key: SourceDatabase references PublicDatabase foreign key: SourceOrganism references Organism CREATE TABLE sequence ( accessionnumber varchar(50), id varchar(50), sourcedatabase varchar(30), sourceorganism integer, primary key (accessionnumber), foreign key (sourcedatabase) references publicdatabase, foreign key (sourceorganism) references organism ); Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk

  12. feature table CREATE TABLE feature ( featid varchar(50), name varchar(100), sourcesequence varchar(50), primary key (featid), foreign key (sourcesequence) references sequence(accessionnumber) ); Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk

  13. organism table CREATE TABLE organism ( organismnumber integer species varchar(100), strain varchar(100), genomeseq boolean, commonname varchar(100), primary key (organismnumber) ); Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk

  14. Constraints • Constraints restrict the values that can be inserted or updated in columns • Types of constraints • NOT NULL • UNIQUE • Simply add to column definition url varchar(100) NOT NULL or url varchar(100) UNIQUE Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk

  15. Constraints • NOT NULL and UNIQUE implicit when primary key • CHECK constraint numberoflegs integer check (numberoflegs>2) Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk

  16. publicdatabase table CREATE TABLE publicdatabase ( databasename varchar(50), datatype varchar(20), url varchar(200) UNIQUE, primary key (databasename) ); Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk

  17. Constraints • Foreign key constraints • To keep links between tables working you need to preserve the matching values – referential integrity • These constraints automatically set up for you by RDBMS when you declare the primary and foreign keys • This will prevent you from deleting a record with a primary key before you have deleted all the child foreign key records Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk

  18. Constraints • Example from 'BigHit' database Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk

  19. Constraints Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk

  20. organism table CREATE TABLE organism ( organismnumber integer, species varchar(100), strain varchar(100), genomeseq boolean, commonname varchar(100), primary key (organismnumber) ); How shall we create the unique primary key values for organismnumber? Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk

  21. Sequences • Could do complicated checks to work out the last number entered, add 1 and insert for each new record • Sequence is a database object in PostgreSQL which essentially an automatically incrementing numeric value (equivalent to 'autonumber' in Access) CREATE SEQUENCE 'my_seq' (can specify increment,min and max) SELECT NEXTVAL('my_seq') Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk

  22. Default values • Still don't want to have to select the value each time • Can set a default value for column which is automatically filled in every time a record is inserted CREATE TABLE organism ( organismnumber integer DEFAULT NEXTVAL('my_seq'), species varchar(100), strain varchar(100), genomeseq boolean, commonname varchar(100), primary key (organismnumber) ); Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk

  23. Create your database • To create your database run your SQL table and other object creation statements in a single script • Example - demodatabase.sql • Be sure create tables in the right order • Can't create table that refers to a primary key in a table that doesn't exist yet • You also need data... Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk

  24. Populate your database • Insert data using INSERT sql statements INSERT INTO organism (species,strain,genomeseq,commonname) VALUES ('Oryctolagus cuniculus',NULL,'false','rabbit'); • Default values will inserted automatically CREATE TABLE organism ( organismnumber integer DEFAULT NEXTVAL('my_seq'), ... Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk

  25. Populate your database • Be sure to insert data in correct order • Don't try and insert a foreign key value when the primary key value hasn't been inserted yet • Run the demodatabase.sql script Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk

  26. Querying your database • Now that your database is set up and data has been inserted we can query it Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk

  27. Database Sequence Type ID Accession Organism Name Number Swissprot Protein Phosphorylase B KPB1_Rabit Rabbit kinase alpha regulatory chain UniProt Protein Troponin T Trt3_rabit Swissprot Proten Glycogen PHS2_RABIT rabbit phosphorylase Swissprot protein TRIC_RABIT Troponin I OCPHOS2 rabbit EMBL Nucleotide rabbit muscle phosphorylase mrna dbEST Nucleotide CK829726 Rabbit TrEMBL protein Q8MJF7 Rabbit pol protein EMBL Nucleotide OXPKA Rabbit Rabbit phosphorylase PDB protein_structure Glycogen 1ABB rabit Phosphorylase Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk

  28. Querying your database Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk

  29. Querying your database Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk

  30. Querying your database Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend