Modelling and Designing a Database Bela Tiwari btiwari@ceh.ac.uk - - PowerPoint PPT Presentation

modelling and designing a database
SMART_READER_LITE
LIVE PREVIEW

Modelling and Designing a Database Bela Tiwari btiwari@ceh.ac.uk - - PowerPoint PPT Presentation

Modelling and Designing a Database Bela Tiwari btiwari@ceh.ac.uk Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk Modelling and designing a database A well thought out database will consist of a co-ordinated set


slide-1
SLIDE 1

Modelling and Designing a Database

Bela Tiwari btiwari@ceh.ac.uk

Environmental Genomics Thematic Programme Data Centre

http://envgen.nox.ac.uk

slide-2
SLIDE 2

A well thought out database will consist of a co-ordinated set of tables that satisfy a set of requirements specified by the intended users.

Modelling and designing a database

slide-3
SLIDE 3

Modelling and designing a database

  • Define clearly the purpose of the database
  • Define the data requirements
  • Prepare a data model
  • Translate the data model into a database design
  • Implement the design
slide-4
SLIDE 4
  • Define clearly the purpose of the database
  • Define the data requirements
  • Prepare a data model
  • Translate the data model into a database design
  • Implement the design

Modelling and designing a database

slide-5
SLIDE 5
  • Define the purpose of the database
  • Define the data requirements
  • Prepare a data model
  • Translate the data model into a database design
  • Implement the design

PEN and PAPER

Modelling and designing a database

slide-6
SLIDE 6
  • Define clearly the purpose of the database
  • Define the data requirements
  • Prepare a data model
  • Translate the data model into a database design
  • Implement the design

COMPUTER

Modelling and designing a database

slide-7
SLIDE 7

What’s wrong with a spreadsheet?

  • Redundant data
  • Inconsistent data
  • Quality control
  • Control over effects of changing the data
  • Inflexible
  • Limited sharing of data
slide-8
SLIDE 8

Rabbit OXPKA

Rabbit phosphorylase

Nucleotide EMBL

TRIC_RABIT

Troponin I

protein Swissprot

rabbit PHS2_RABIT

Glycogen phosphorylase

Proten Swissprot

rabit 1ABB

Glycogen Phosphorylase

protein_structure

PDB

Rabbit Q8MJF7

pol protein

protein TrEMBL

Rabbit CK829726

Nucleotide dbEST

rabbit OCPHOS2

rabbit muscle phosphorylase mrna

Nucleotide EMBL

Trt3_rabit

Troponin T

Protein UniProt

Rabbit KPB1_Rabit

Phosphorylase B kinase alpha regulatory chain

Protein Swissprot

Organism ID Name Sequence Type Database Name

My made-up data table

slide-9
SLIDE 9

Rabbit OXPKA

Rabbit phosphorylase

Nucleotide EMBL

TRIC_RABIT

Troponin I

protein Swissprot

rabbit PHS2_RABIT

Glycogen phosphorylase

Peptide Swissprot

rabit 1ABB

Glycogen Phosphorylase

protein_structure

PDB

Rabbit Q8MJF7

pol protein

protein TrEMBL

Rabbit CK829726

Nucleotide dbEST

rabbit OCPHOS2

rabbit muscle phosphorylase mrna

Nucleotide EMBL

Trt3_rabit

Troponin T

Protein UniProt

Rabbit KPB1_Rabit

Phosphorylase B kinase alpha regulatory chain

Protein Swissprot

Organism ID Name Sequence Type Database Name

Redundant data

slide-10
SLIDE 10

Rabbit OXPKA

Rabbit phosphorylase

Nucleotide EMBL

TRIC_RABIT

Troponin I

protein Swissprot

rabbit PHS2_RABIT

Glycogen phosphorylase

Peptide Swissprot

rabit 1ABB

Glycogen Phosphorylase

protein_structure

PDB

Rabbit Q8MJF7

pol protein

protein TrEMBL

Rabbit CK829726

Nucleotide dbEST

rabbit OCPHOS2

rabbit muscle phosphorylase mrna

Nucleotide EMBL

Trt3_rabit

Troponin T

Protein UniProt

Rabbit KPB1_Rabit

Phosphorylase B kinase alpha regulatory chain

Protein Swissprot

Organism ID Name Sequence Type Database Name

Inconsistent data

slide-11
SLIDE 11

Rabbit OXPKA

Rabbit phosphorylase

Nucleotide EMBL

TRIC_RABIT

Troponin I

protein Swissprot

rabbit PHS2_RABIT

Glycogen phosphorylase

Peptide Swissprot

rabit 1ABB

Glycogen Phosphorylase

protein_structure

PDB

Rabbit Q8MJF7

pol protein

protein TrEMBL

Rabbit CK829726

Nucleotide dbEST

rabbit OCPHOS2

rabbit muscle phosphorylase mrna

Nucleotide EMBL

Trt3_rabit

Troponin T

Protein UniProt

Rabbit KPB1_Rabit

Phosphorylase B kinase alpha regulatory chain

Protein Swissprot

Organism ID Name Sequence Type Database Name

Quality control

slide-12
SLIDE 12

Rabbit OXPKA

Rabbit phosphorylase

Nucleotide EMBL

TRIC_RABIT

Troponin I

protein Swissprot

rabbit PHS2_RABIT

Glycogen phosphorylase

Peptide Swissprot

rabit 1ABB

Glycogen Phosphorylase

protein_structure

PDB

Rabbit Q8MJF7

pol protein

protein TrEMBL

Rabbit CK829726

Nucleotide dbEST

rabbit OCPHOS2

rabbit muscle phosphorylase mrna

Nucleotide EMBL

Trt3_rabit

Troponin T

Protein UniProt

Rabbit KPB1_Rabit

Phosphorylase B kinase alpha regulatory chain

Protein Swissprot

Organism ID Name Sequence Type Database Name

Control over the effects of changing data

slide-13
SLIDE 13
  • Define clearly the purpose of the database
  • Define the data requirements
  • Analyse the requirements and prepare a data model
  • Translate the data model into a database design
  • Implement the design

Modelling and designing a database

slide-14
SLIDE 14

Modelling and designing a database

establish requirements Data Requirements data analysis Conceptual Data Model database design Logical Schema Specification implement

schema and database

slide-15
SLIDE 15

Modelling and designing a database

establish requirements Data Requirements data analysis Conceptual Data Model database design Logical Schema Specification implement

schema and database

PEN and PAPER

slide-16
SLIDE 16

A systematic approach will lead to a database that meets your needs! poor design == tears later

slide-17
SLIDE 17

Establishing requirements

Consult with the users of your system and agree upon

  • what the users need the database to do
  • what data needs to be stored to accomplish this

Include in your considerations the following:

  • Data that can be derived should not be stored

e.g. if you are storing the birth year of a person, you should not also store their age; it can easily be derived

  • The way you choose to organise the data can be affected by issues such

as whether the data may be needed to answer a number of different questions, and whether you anticipate that additional types of data may be added in the future. In the end, you want your database to be useful for what it needs to do!

slide-18
SLIDE 18

An artificial example

Feed C Feed B Feed A Mouse 20g 10g 10g m3 40g m2 20g 20g m1 20g m1 10g m3 FeedA Mouse 40g m2 20g m1 10g m3 FeedB Mouse 10g m3 FeedC Mouse 40g FeedB m2 10g FeedA m3 10g FeedB m3 20g FeedC m3 20g FeedB m1 20g FeedA m1 Amount FeedType Mouse

slide-19
SLIDE 19

Define the data requirements

What persistent data is to be stored in the database? Write a description of what is required of the database, including the kinds of data that should be stored and what questions users would like to be able to ask. Information about the limits on the data and any issues like whether a certain type of data is required or not should also be included.

slide-20
SLIDE 20

Define the data requirements

Example: Initial Statement of Database Requirements A database is required that will hold information about entries from public databases of interest along with the type of features in that sequence. The users should be able to search for a database accession number, organism, or public database name and retrieve a listing of all relevant sequences. They should also be able to retrieve any sequences containing particular feature types. For each public database sequence listed, the users would like to see the URL for that database. Questions:

  • Does a sequence from a particular public database have to be stored in order for an

entry for that public database to be added?

  • Do they want to store the sequence data? (implications for size and keeping up to

date – can you set up an automated retrieval system from a local or remote copy of the public database?)

  • Do the users want to search for just organism names, or do they wish to search for

strains as well?

  • Do they wish to store only formal names, or common names also?
  • Is there additional information the users would like to retrieve about the organisms,

databases, features, etc that are not stated in this initial description?

slide-21
SLIDE 21

Does a sequence from a particular public database have to be stored in order for an entry for that public database to be added? No Do they want to store the sequence data? No – automated retrieval preferred Do the users want to search for just organism names, or do they wish to search for strains as well? Strains as well Do they wish to store only formal names, or common names also? Common names as well Is there additional information the users would like to retrieve about the organisms, databases, features, etc that are not stated in this initial description? Yes – they would like to be able to see retrieve a list of public database names stored, along with the type of data that is stored in that database. They would also like to see if the genome sequence of an organism is available.

Revised Statement of Database Requirements

A database is required that will hold information about entries from public databases of interest along with the type of features in that sequence. The users should be able to search for a database accession number, organism name, formal or common, and/or strain, or public database name and retrieve a listing of all relevant sequences as well as whether the full genome sequence of the organism in question is publicly available. They should also be able to retrieve any sequences containing particular feature types. For each public database sequence listed, the users would like to see the URL for that database and the type of data it stores. The sequence data itself does not need to be stored in the database.

slide-22
SLIDE 22

A way of formally representing your database requirements.

Your model will include a list of entity types, constraints, and assumptions and an E-R diagram. Entity: a representation of a thing that there is a need to record data about Relationship: an association between entities that needs to be recorded E-R diagram: a pictorial representation of your data requirements

Entity Relationship Modelling

slide-23
SLIDE 23

A good first step to defining the entities – underline the nouns in the requirements statement

Statement of Database Requirements A database is required that will hold information about sequences from public databases of interest along with the type of features in that sequence. The users should be able to search for a database accession number,

  • rganism name, formal or common, and/or strain, or public database name and retrieve a listing of all

relevant sequences as well as whether the full genome sequence of the organism in question is publicly

  • available. They should also be able to retrieve any sequences containing particular feature types. For each

public database sequences listed, the users would like to see the URL for that database and the type of data it stores. The sequence data itself does not need to be stored in the database. Possible entities: PublicDatabase Sequence Organism Feature

Other possibilities: GenomeSequence……………………..(should we go back to the users and ask about this?) CommonNames……………………......(should we go back to the users and ask about this?)

Entity Relationship Modelling

slide-24
SLIDE 24

Representing Entities and Relationships in an E-R diagram

Entity:

Entity Relationship Modelling

EntityName

Relationship

EntityTypeOne EntityTypeTwo

RelationshipName

slide-25
SLIDE 25

Relationships

E.g. A PublicDatabase may contain many Sequences Holds An Organism may have many Sequences recorded ContainsSeqs or ComesFrom

Entity Relationship Modelling

slide-26
SLIDE 26

Representing Entities and Relationships in an E-R diagram Different types of Relationships 1:1

Entity Relationship Modelling

EntityTypeOne EntityTypeTwo

RelationshipName

1:n (1 to many) m:n (many to many)

EntityTypeOne EntityTypeTwo

RelationshipName

EntityTypeOne EntityTypeTwo

RelationshipName

slide-27
SLIDE 27

Representing Entities and Relationships in an E-R diagram Different types of Relationships

Entity Relationship Modelling

  • ptional

mandatory

PublicDatabase Sequence

Holds

PublicDatabase Sequence

Holds

slide-28
SLIDE 28

Based on our Database Requirements

We defined our entities as: PublicDatabase Sequence Organism Feature What type of information do we need to store about each of these data types? What is the relationship between these entities? What type of relationship associates each entity? What will we use to identify each instance of each entity? What assumptions have we made in making our model? What constraints are there on our data?

Entity Relationship Modelling

slide-29
SLIDE 29

What type of information do we need to store about each of these data types? PublicDatabase(DatabaseName, DataType,URL) Sequence (AccessionNumber, ID) Organism (Species, Strain, GenomeSeq, CommonName) Feature (FeatID, Name)

Entity Relationship Modelling

slide-30
SLIDE 30

What will we use to identify each instance of each entity? PublicDatabase(DatabaseName, DataType, URL) Sequence (AccessionNumber, ID) Organism (Species, Strain, GenomeSeq, CommonName) Feature (FeatID, Name) What assumptions have we made so far? Every public database has a unique name The AccessionNumbers of sequences are unique across all databases Feature ID’s are unique Every species has a strain Each species has at most one common name

Entity Relationship Modelling

slide-31
SLIDE 31

What will we use to identify each instance of each entity? PublicDatabase(DatabaseName, DataType, URL) Sequence (AccessionNumber, ID) Organism (Species, Strain, GenomeSeq, CommonName) Feature (FeatID, Name) What assumptions have we made so far? Every public database has a unique name The AccessionNumbers of sequences are unique across all databases Feature ID’s are unique Each species has at most one common name

Currently we have no unique identifier for the Organism entity. We will deal with this during the design stage.

Entity Relationship Modelling

slide-32
SLIDE 32

What is the relationship between these entities? PublicDatabase(DatabaseName, DataType, URL) Sequence (AccessionNumber, ID) Organism (Species, Strain, GenomeSeq, CommonName) Feature (FeatID, Name)

Entity Relationship Modelling

PublicDatabase Sequence Organism

Holds Contains ComesFrom

Feature

slide-33
SLIDE 33

What type of relationship associates each entity? PublicDatabase(DatabaseName, DataType, URL) Sequence (AccessionNumber, ID) Organism (Species, Strain, GenomeSeq, CommonName) Feature (FeatID, Name)

Entity Relationship Modelling

PublicDatabase Sequence Organism

Holds Contains ComesFrom

Feature

slide-34
SLIDE 34

What assumptions have we made? Every public database has a unique name The AccessionNumbers of sequences are unique across all databases Feature ID’s are unique Each species has at most one common name Feature types are not recorded unless they are found in a sequence in a database Information about a public database can be recorded even if no sequences from it are in our database Sequences do not require that you have recorded information about the organism they come from Organism information is only recorded if a sequence from that organism is present in the database

Entity Relationship Modelling

PublicDatabase Sequence Organism

Holds Contains ComesFrom

Feature

slide-35
SLIDE 35

Constraints

Not all aspects of the data can be represented by the model. Things that cannot be represented should be listed as constraints Examples: if modelling university course applications, a constraint might be that each student can only take a total of 5 courses in a year If modelling student records, a constraint might be that grades can be

  • nly in the range 0 to 100.
slide-36
SLIDE 36

Our model

PublicDatabase Sequence Organism

Holds Contains ComesFrom

Feature

Entity types: PublicDatabase(DatabaseName, DataType, URL) Sequence (AccessionNumber, ID) Organism (Species, Strain, GenomeSeq, CommonName) Feature (FeatID, Name) Constraints: Assumptions: Every public database has a unique name The AccessionNumbers of sequences are unique across all databases Feature ID’s are unique Each species has at most one common name Feature types are not recorded unless they are found in a sequence in a database Information about a public database can be recorded even if no sequences from it are in our database Sequences do not require that you have recorded information about the organism they come from Organism information is only recorded if a sequence from that organism is present in the database

slide-37
SLIDE 37

Modelling and designing a database

establish requirements Data Requirements data analysis Conceptual Data Model database design Logical Schema Specification implement

schema and database

slide-38
SLIDE 38

Modelling and designing a database

establish requirements Data Requirements data analysis Conceptual Data Model database design Logical Schema Specification implement

schema and database

slide-39
SLIDE 39

The next step…..design

A conceptual data model and the “logical schema specification”

  • f database design are different:

A conceptual data model like an E-R model represents the data independently of any concerns about implementation as a database. A logical schema, or design describes a set of tables and more closely describes how data will be stored in a database.

slide-40
SLIDE 40

Database Design

How are we going to represent our E-R model in the database? How do we represent the entities? How do we represent the relationships? How do we represent the constraints?

slide-41
SLIDE 41

Database Design

How do we represent the entities?

As a first step, we can represent each entity as a “relation” (a table), having attributes (columns) corresponding to the attributes of the entity.

Entity types: PublicDatabase(DatabaseName, DataType, URL) Sequence (AccessionNumber, ID) Organism (Species, Strain, GenomeSeq, CommonName) Feature (FeatID, Name) Relations: PublicDatabase(DatabaseName, DataType, URL) Sequence (AccessionNumber, ID) Organism (Species, Strain, GenomeSeq, CommonName) Feature (FeatID, Name)

The identifier from an Entity has become the primary key for the corresponding Relation.

slide-42
SLIDE 42

Database Design

What about the Organism entity?

Recall: We had no clear identifier for instances of this Entity. Organism (Species, Strain, GenomeSeq, CommonName) We can create a surrogate key for Organism. A surrogate key is a unique value that is not derived from any data in the database; its only significance is to act as the primary key of a relation. Organism (OrganismNo, Species, Strain, GenomeSeq, CommonName)

OrganismNo is a surrogate key in the Organism Relation

slide-43
SLIDE 43

Database Design

So, the Entities from our model are now translated into corresponding Relations in our design:

Entity types:

PublicDatabase(DatabaseName, DataType, URL) Sequence (AccessionNumber, ID) Organism (Species, Strain, GenomeSeq, CommonName) Feature (FeatID, Name)

Relations:

PublicDatabase(DatabaseName, DataType, URL) Sequence (AccessionNumber, ID) Organism (OrganismNo, Species, Strain, GenomeSeq, CommonName) Feature (FeatID, Name) and each Relation has a designated primary key.

slide-44
SLIDE 44

Primary Key

A primary key is a unique and minimal identifier for instances within a relation.

For example: PublicDatabase(DatabaseName, DataType, URL)

DatabaseName DatabaseName, DataType DatabaseName, DataType, URL

Each of these uniquely identify any given instance of PublicDatabase. Only DatabaseName is both unique and minimal

slide-45
SLIDE 45

Database Design

For each relation, we need to define its name and its primary key. For each attribute, we will need to determine what the data type is.

relation PublicDatabase DatabaseName: String DataType: String URL: String primary key: DatabaseName Recall: this is not the database implementation stage!! SQL has no type “String”. For implementation there will be a number of data types to choose from that are strings, but have varying conditions

  • r restrictions.

At the stage, you can describe the data type you with any descriptive name you

  • like. At implementation stage, you will choose between the available types in

the database system you are using.

Some other common data types for attributes include Integer, Real, Date, Time, etc.

In fact, in this database, most of the data types are “string”.

slide-46
SLIDE 46

Our design so far:

relation PublicDatabase DatabaseName: String DataType: String URL: String primary key: DatabaseName relation Sequence AccessionNumber: String ID: String primary key: AccessionNumber relation Organism OrganismNo: Integer Species: String Strain: String GenomeSeq: Boolean CommonName: String primary key: OrganismNo relation Feature FeatID: string Name: string primary key: FeatID So, now the entities have been represented. What about the relationships?

slide-47
SLIDE 47

PublicDatabase(DatabaseName, DataType, URL) Sequence (AccessionNumber, ID) Organism (OrganismNo, Species, Strain, GenomeSeq, CommonName) Feature (FeatID, Name)

Entity Relationship Modelling

PublicDatabase Sequence Organism

Holds Contains ComesFrom

Feature

slide-48
SLIDE 48

Representing 1:n relationships

PublicDatabase Sequence

Holds

AB117611 Q04917 AE017138 AK004199 Q919P1 2.103764336-104764335 EMBL SwissProt EnsEMBL E.g. Here, you would wish to record in the Sequence relation that sequence: AB117611 is held in the EMBL database Q04917 is held in the SwissProt database AE017138 is held in the EMBL database and so on.

slide-49
SLIDE 49

Representing 1:n relationships

I.e. we want to record the primary key information from the PublicDatabase relation in the appropriate rows of the Sequence relation. The information we wish to record about a Sequence in the Sequence relation might look something like: The description of the Sequence relation would have a new line added: relation Sequence AccessionNumber: String ID: String primary key: AccessionNumber foreign key: SourceDatabase references PublicDatabase

EMBL Thang3 AE017138 SwissProt Anotherthing Q04917 EMBL Something AB117611 SourceDatabase ID AccessionNumber

slide-50
SLIDE 50

Representing 1:n relationships

Terminology

We declare a foreign key in the relation on the :n side of the relationship. foreign key: SourceDatabase references PublicDatabase This foreign key refers to a unique attribute (usually the primary key) in the relation on the 1: side of the relationship. foreign key: SourceDatabase references PublicDatabase

EMBL Thang3 AE017138 SwissProt Anotherthing Q04917 EMBL Something AB117611 SourceDatabase ID AccessionNumber

slide-51
SLIDE 51

Representing 1:n relationships

EnsEMBL 2.103764336- 104764335 2.103764336- 104764335 SwissProt 3MPH_AERPE Q919P1 EMBL AK004199 AK004199 EMBL AE107138 AE107138 SwissProt 143F_HUMAN Q04917 EMBL AB117611 AB117611 SourceDatabase ID AccessionNumber

www.ensembl.org

nucleotide EnsEMBL

ca.expasy.org/sprot/

peptide SwissProt

www.ebi.ac.uk/embl/index.html

nucleotide EMBL URL DataType DatabaseName 2.103764336-104764335 2.103764336-104764335 3MPH_AERPE Q919P1 AK004199 AK004199 AE107138 AE107138 143F_HUMAN Q04917 AB117611 AB117611 ID AccessionNumber

Original Sequence Relation The Holds relationship is now represented by the posting of SourceDatabase as a foreign key in the Sequence relation.

slide-52
SLIDE 52

Representing 1:n relationships

EnsEMBL 2.103764336- 104764335 2.103764336- 104764335 SwissProt 3MPH_AERPE Q919P1 EMBL AK004199 AK004199 EMBL AE107138 AE107138 SwissProt 143F_HUMAN Q04917 EMBL AB117611 AB117611 SourceDatabase ID AccessionNumber

Revised Sequence Relation

relation Sequence AccessionNumber: string ID: string primary key: AccessionNumber foreign key: SourceDatabase references PublicDatabase

You now need to represent all the other relationships to complete your design.

slide-53
SLIDE 53

Complete the design for this model…

PublicDatabase Sequence Organism

Holds Contains ComesFrom

Feature

Entity types: PublicDatabase(DatabaseName, DataType, URL) Sequence (AccessionNumber, ID) Organism (OrganismNo, Species, Strain, GenomeSeq, CommonName) Feature (FeatID, Name) E.g. for each relation, you will have a listing like: relation Sequence AccessionNumber: String ID: String primary key: AccessionNumber foreign key: SourceDatabase references PublicDatabase

slide-54
SLIDE 54

Our design so far:

relation PublicDatabase DatabaseName: String DataType: String URL: String primary key: DatabaseName relation Sequence AccessionNumber: String ID: String primary key: AccessionNumber foreign key: SourceDatabase references PublicDatabase foreign key: SourceOrganism references Organism relation Organism OrganismNo: Integer Species: String Strain: String GenomeSeq: Boolean CommonName: String primary key: OrganismNo relation Feature FeatID: String Name: String primary key: FeatID foreign key: SourceSequence references Sequence

slide-55
SLIDE 55

More on Relationships

EntityTypeOne EntityTypeTwo

RelationshipName

m:n (many to many)

EntityTypeOne EntityTypeTwo

RelationshipName

1:1

Remember these?

slide-56
SLIDE 56

Representing 1:1 relationships

A picture is worth a thousand words…

Team PostDoc

ConsistsOf

  • M. Smith
  • J. Jones
  • G. Hallelujah
  • M. Hunt
  • B. Dale
  • C. Cook

TeamA TeamB

HeadedBy

slide-57
SLIDE 57

Representing 1:1 relationships

Team PostDoc

ConsistsOf

  • M. Smith
  • J. Jones
  • G. Hallelujah
  • M. Hunt
  • B. Dale
  • C. Cook

TeamA TeamB

HeadedBy

slide-58
SLIDE 58

Representing 1:1 relationships

Team PostDoc

ConsistsOf HeadedBy relation Team TeamName: String TeamLoc: String TeamPhone: String primary key: TeamName relation PostDoc PostDocId: String Name: String SpecialistSubject: String primary key: PostDocId foreign key: TeamName references Team

How do we represent the relationship HeadedBy?

slide-59
SLIDE 59

Representing 1:1 relationships

A picture is worth a thousand words…

Team PostDoc

ConsistsOf HeadedBy

How do we represent the relationship HeadedBy?

Foreign key!

Because this relationship is symmetrical, you can choose to post the foreign key in Team or PostDoc. Make the decision based on what the most natural grouping of the data is.

slide-60
SLIDE 60

Representing 1:1 relationships

Team PostDoc

ConsistsOf HeadedBy

Here, you would probably choose to post the foreign key into the Team relation

relation Team TeamName: String TeamLoc: String TeamPhone: String primary key: TeamName foreign key: TeamHead references PostDocs

Reasoning:

It seems a natural attribute of a team to have a team leader. Each team will have a leader. Every PostDoc will not also be a team leader, so if you chose to put the foreign key in the PostDoc relation, many rows will contain null values for the TeamHead attribute.

slide-61
SLIDE 61

Representing 1:1 relationships

Team PostDoc

ConsistsOf HeadedBy

So far, this is the same as dealing with 1:n relationships.

You now need a mechanism to ensure that values of TeamHead in the Team relation are unique to represent this as a 1:1 relationship. In a model you can represent this by declaring this attribute as an alternate key of Team.

Alternate keys, like primary keys, constrain the set of valid states of a relation – i.e. no two rows can contain the same entry for this attribute.

relation Team TeamName: String TeamLoc: String TeamPhone: String primary key: TeamName alternate key: TeamHead foreign key: TeamHead references PostDocs

slide-62
SLIDE 62

Representing m:n relationships

SoftwareCategory BioinfProgram

BelongsTo

Here we have bioinformatics programs belonging software categories; each program can belong to one or more categories. Each software category can contain one or more bioinformatics programs. A picture is worth a thousand words…

NucleicRestriction NucleicTranslation ProteinComposition Display remap backtranseq restrict showseq transeq pepstats compseq abiview

slide-63
SLIDE 63

Representing m:n relationships

You can represent m:n relationships by using another relation to represent the relationship itself.

SoftwareCategory BioinfProgram

BelongsTo

SoftwareCategory BioinfProgram BelongsTo

You now have your m:n relationship represented as two 1:n relationships (and a new relation). You can represent these relationships in your model using the foreign key mechanism introduced earlier.

slide-64
SLIDE 64

SoftwareCategory BioinfProgram BelongsTo SoftwareCategory BioinfProgram

BelongsTo relation SoftwareCategory CategoryName: String Description: String primary key: CategoryName relation BioinfProgram ProgramName: String Description: String primary key: ProgramName relation SoftwareCategory CategoryName: String Description: String primary key: CategoryName relation BioinfProgram ProgramName: String Description: String primary key: ProgramName relation BelongsTo CategoryName: String ProgramName: String primary key: (CategoryName, ProgramName) foreign key: CategoryName references SoftwareCategory foreign key: ProgramName references BioinfProgram

slide-65
SLIDE 65

Modelling and designing a database

establish requirements Data Requirements data analysis Conceptual Data Model database design Logical Schema Specification implement

schema and database