Modelling and Designing a Database Bela Tiwari btiwari@ceh.ac.uk - PowerPoint PPT Presentation

Modelling and Designing a Database Bela Tiwari btiwari@ceh.ac.uk Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk

Modelling and designing a database A well thought out database will consist of a co-ordinated set of tables that satisfy a set of requirements specified by the intended users .

Modelling and designing a database • Define clearly the purpose of the database • Define the data requirements • Prepare a data model • Translate the data model into a database design • Implement the design

Modelling and designing a database PEN and PAPER • Define the purpose of the database • Define the data requirements • Prepare a data model • Translate the data model into a database design • Implement the design

Modelling and designing a database • Define clearly the purpose of the database • Define the data requirements • Prepare a data model • Translate the data model into a database design • Implement the design COMPUTER

What’s wrong with a spreadsheet? • Redundant data • Inconsistent data • Quality control • Control over effects of changing the data • Inflexible • Limited sharing of data

My made-up data table Database Sequence Type Name ID Organism Name Swissprot Protein Phosphorylase KPB1_Rabit Rabbit B kinase alpha regulatory chain UniProt Protein Troponin T Trt3_rabit Swissprot Proten Glycogen PHS2_RABIT rabbit phosphorylase Swissprot protein Troponin I TRIC_RABIT EMBL Nucleotide rabbit muscle OCPHOS2 rabbit phosphorylase mrna dbEST Nucleotide CK829726 Rabbit TrEMBL protein pol protein Q8MJF7 Rabbit EMBL Nucleotide Rabbit OXPKA Rabbit phosphorylase PDB protein_structure Glycogen 1ABB rabit Phosphorylase

Redundant data Database Sequence Type Name ID Organism Name Swissprot Protein Phosphorylase KPB1_Rabit Rabbit B kinase alpha regulatory chain UniProt Protein Troponin T Trt3_rabit Swissprot Peptide Glycogen PHS2_RABIT rabbit phosphorylase Swissprot protein Troponin I TRIC_RABIT EMBL Nucleotide rabbit muscle OCPHOS2 rabbit phosphorylase mrna dbEST Nucleotide CK829726 Rabbit TrEMBL protein pol protein Q8MJF7 Rabbit EMBL Nucleotide Rabbit OXPKA Rabbit phosphorylase PDB protein_structure Glycogen 1ABB rabit Phosphorylase

Inconsistent data Database Sequence Type Name ID Organism Name Swissprot Protein Phosphorylase KPB1_Rabit Rabbit B kinase alpha regulatory chain UniProt Protein Troponin T Trt3_rabit Swissprot Peptide Glycogen PHS2_RABIT rabbit phosphorylase Swissprot protein Troponin I TRIC_RABIT EMBL Nucleotide rabbit muscle OCPHOS2 rabbit phosphorylase mrna dbEST Nucleotide CK829726 Rabbit TrEMBL protein pol protein Q8MJF7 Rabbit EMBL Nucleotide Rabbit OXPKA Rabbit phosphorylase PDB protein_structure Glycogen 1ABB rabit Phosphorylase

Quality control Database Sequence Type Name ID Organism Name Swissprot Protein Phosphorylase KPB1_Rabit Rabbit B kinase alpha regulatory chain UniProt Protein Troponin T Trt3_rabit Swissprot Peptide Glycogen PHS2_RABIT rabbit phosphorylase Swissprot protein Troponin I TRIC_RABIT EMBL Nucleotide rabbit muscle OCPHOS2 rabbit phosphorylase mrna dbEST Nucleotide CK829726 Rabbit TrEMBL protein pol protein Q8MJF7 Rabbit EMBL Nucleotide Rabbit OXPKA Rabbit phosphorylase PDB protein_structure Glycogen 1ABB rabit Phosphorylase

Control over the effects of changing data Database Sequence Type Name ID Organism Name Swissprot Protein Phosphorylase KPB1_Rabit Rabbit B kinase alpha regulatory chain UniProt Protein Troponin T Trt3_rabit Swissprot Peptide Glycogen PHS2_RABIT rabbit phosphorylase Swissprot protein Troponin I TRIC_RABIT EMBL Nucleotide rabbit muscle OCPHOS2 rabbit phosphorylase mrna dbEST Nucleotide CK829726 Rabbit TrEMBL protein pol protein Q8MJF7 Rabbit EMBL Nucleotide Rabbit OXPKA Rabbit phosphorylase PDB protein_structure Glycogen 1ABB rabit Phosphorylase

Modelling and designing a database • Define clearly the purpose of the database • Define the data requirements • Analyse the requirements and prepare a data model • Translate the data model into a database design • Implement the design

Modelling and designing a database establish requirements Data Requirements data analysis Conceptual Data Model database design Logical Schema Specification implement schema and database

Modelling and designing a database establish requirements Data Requirements data analysis Conceptual Data Model database design Logical Schema Specification implement PEN and PAPER schema and database

A systematic approach will lead to a database that meets your needs! poor design == tears later

Establishing requirements Consult with the users of your system and agree upon • what the users need the database to do • what data needs to be stored to accomplish this Include in your considerations the following: • Data that can be derived should not be stored e.g. if you are storing the birth year of a person, you should not also store their age; it can easily be derived • The way you choose to organise the data can be affected by issues such as whether the data may be needed to answer a number of different questions, and whether you anticipate that additional types of data may be added in the future. In the end, you want your database to be useful for what it needs to do!

An artificial example Mouse Feed A Feed B Feed C m1 20g 20g m2 40g m3 10g 10g 20g Mouse FeedA Mouse FeedB Mouse FeedC m1 20g m3 10g m1 20g m3 10g m2 40g m3 10g Mouse FeedType Amount m1 FeedA 20g m1 FeedB 20g m2 FeedB 40g m3 FeedA 10g m3 FeedB 10g m3 FeedC 20g

Define the data requirements What persistent data is to be stored in the database? Write a description of what is required of the database, including the kinds of data that should be stored and what questions users would like to be able to ask. Information about the limits on the data and any issues like whether a certain type of data is required or not should also be included.

Define the data requirements Example: Initial Statement of Database Requirements A database is required that will hold information about entries from public databases of interest along with the type of features in that sequence. The users should be able to search for a database accession number, organism, or public database name and retrieve a listing of all relevant sequences. They should also be able to retrieve any sequences containing particular feature types. For each public database sequence listed, the users would like to see the URL for that database. Questions: • Does a sequence from a particular public database have to be stored in order for an entry for that public database to be added? • Do they want to store the sequence data? (implications for size and keeping up to date – can you set up an automated retrieval system from a local or remote copy of the public database?) • Do the users want to search for just organism names, or do they wish to search for strains as well? • Do they wish to store only formal names, or common names also? • Is there additional information the users would like to retrieve about the organisms, databases, features, etc that are not stated in this initial description?

Does a sequence from a particular public database have to be stored in order for an entry for that public database to be added? No Do they want to store the sequence data? No – automated retrieval preferred Do the users want to search for just organism names, or do they wish to search for strains as well? Strains as well Do they wish to store only formal names, or common names also? Common names as well Is there additional information the users would like to retrieve about the organisms, databases, features, etc that are not stated in this initial description? Yes – they would like to be able to see retrieve a list of public database names stored, along with the type of data that is stored in that database. They would also like to see if the genome sequence of an organism is available. Revised Statement of Database Requirements A database is required that will hold information about entries from public databases of interest along with the type of features in that sequence. The users should be able to search for a database accession number, organism name, formal or common , and/or strain , or public database name and retrieve a listing of all relevant sequences as well as whether the full genome sequence of the organism in question is publicly available . They should also be able to retrieve any sequences containing particular feature types. For each public database sequence listed, the users would like to see the URL for that database and the type of data it stores . The sequence data itself does not need to be stored in the database.

Modelling and Designing a Database Bela Tiwari btiwari@ceh.ac.uk - PowerPoint PPT Presentation

Modelling and Designing a Database Bela Tiwari btiwari@ceh.ac.uk Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk Modelling and designing a database A well thought out database will consist of a co-ordinated set

Modelling and Designing a Database Bela Tiwari btiwari@ceh.ac.uk Modelling and designing a

Designing for Designing for Greenspace Greenspace Greenspace Designing for Designing for

Database Utilities 10/17/2007 DC/Win Database Utilities Opening Database Utilities From File on

Designing Database Applications Walid G. Aref Roadmap for Designing Database Applications 1.

Class 14 Slides SLIDE what is the designing principle how does designing principle

NEBC Database Course 2008 Database Servers Database Interfaces Tim Booth : tbooth@ceh.ac.uk

Entity-Relationship Modelling The first step in designing a database schema is to model the

Designing Your Fashion Portfolio From Concept To Presentation Designing Your Fashion Portfolio

The Modelling and Simulation Process 1. History of Modelling and Simulation 2. Modelling and

(Modelling) Semantics of Modelling Languages Hans Vangheluwe 7 September 2010, Lisboa, Portugal

Modelling with Differential Equations Modelling with Differential Equations Modelling with

CSc 337 LECTURE 24: CREATING A DATABASE AND MORE JOINS Creating a database In the command line

DATABASE SECURITY CS4750 Database Systems Prof. Nada Basit Email: basit@virginia.edu Fall

DATABASE SECURITY CS4750 Database Systems Prof. Nada Basit Email: basit@virginia.edu Fall

Advanced Database CS 525: Organization? Advanced Database =Database Implementation

National Address Database National Address Database What is a National Address Database?

c p e c Writing Message-Passing Parallel Programs with MPI 1 Edinburgh Parallel Computing

Further abstraction techniques Abstract classes and interfaces 2.0 Main concepts to be covered

Draft Report on Strategic USG Engagement in International Standardization to Achieve U.S.

For South Africa Alison Hughes, Bruno Merven Energy Research Centre University of Cape Town 1

Lecture #24: Programming Languages and Programs Metalinguistic Abstraction A programming

SE320 Software Verification and Validation Requirements Testing, Code Testing Frameworks Prof.

Program Analysis Techniques: System Zoos Perspective Kwangkeun Yi Programming Research

Natural Language Processing CSCI 4152/6509 Lecture 27 Parsing with Prolog Instructor: Vlado

Sambuz

Useful Links

Newsletter

Mail Us