Modern database systems & their applications Spring 2012 - - PowerPoint PPT Presentation

modern database systems their applications spring 2012
SMART_READER_LITE
LIVE PREVIEW

Modern database systems & their applications Spring 2012 - - PowerPoint PPT Presentation

Modern database systems & their applications Spring 2012 Lecturer: Serafim Dahl serafim@csc.kth.se DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 1 / 48 General information Course credits 7,5 hp


slide-1
SLIDE 1

Modern database systems & their applications Spring 2012 Lecturer: Serafim Dahl

serafim@csc.kth.se

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 1 / 48

slide-2
SLIDE 2

General information

  • Course credits 7,5 hp
  • In this course we elaborate on databases
  • You must have basic knowledge about relational database concepts,

relational algebra, relational calculus, functional dependencies and database modelling.

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 2 / 48

slide-3
SLIDE 3

What will we do on the course?

We’ll look at:

  • problems with relational databases,
  • attempts to solve the problems and
  • how the problems actually were solved (if at all . . . ).
  • Modelling,
  • new data types (e.g. multimedia data types) and
  • indexes for new data types
  • Object / Object relational databases.
  • Database driven applications / web applications with db back-end

(architecture, how to build them)

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 3 / 48

slide-4
SLIDE 4

What we will actually do . . .

We will:

  • model, sometimes with inadequate tools, to get a grip on what tools that are adequate

in different situations,

  • practice some on XML and indexes and
  • build layered applications with database back-end

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 4 / 48

slide-5
SLIDE 5

Literature

Any textbook from the basic courses on database technology will do, e.g. Elmasri, Navathe: ”Fundamentals of Database Systems” Connolly, Begg: ”Database Systems”, Addison-Wesley Silberschatz, et al: ”Database System Concepts”, McGraw-Hill They focus on slightly different things but cover most of the issues on this course. The slides and the web will do for what the books don’t cover

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 5 / 48

slide-6
SLIDE 6

Database systems

user/appl user/appl user/appl

OS

DBMS

communication database

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 6 / 48

slide-7
SLIDE 7

Database systems . . .

...

conceptual internal schema Data definition manipulation Data Data external schema 1 schema 2 external external schema n administration schema

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 7 / 48

slide-8
SLIDE 8

Database applications

swingGUI applet DBMS DBMS DBMS jdbc esql wrapper

  • rm/map

HTTP HTTP RMI/IIOP front−end back−end web browser layer logical RMI/IIOP HTTP/XML RMI/IIOP RMI/IIOP RMI/IIOP

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 8 / 48

slide-9
SLIDE 9

Web database applications

Client appl Applet HTML XHTML XML Web server HTTP Sessions Streams Serialization Client tion layer Application layer JSP engine Servlets Servlet engine Data layer JDBC Java Mail JNDI RMI IIOP DB Jini Application server Presenta− Sockets Error log Security manager Connection pool File system Java− Spaces Bean Bean Bean Bean Session beans Entity beans EJB server PHP engine Bean Bean JSP JVM Mail server Catalogue service Java application CORBA application

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 9 / 48

slide-10
SLIDE 10

what is a relational database (on this course)?

A relational database is a set of schemas, {R1, R2, . . . , Rn}. Each schema Ri has more than one interpretation. It defines a type for the tuples that the corresponding relation may contain and the type for the relation as well and the set of attributes {A1, A2, . . . , Ak} that identify the values of the tuples The type for the tuples is defined by the fact that each attribute name Aj is associated with a domain Thus, a tuple define a mapping that to each attribute sets a value from its domain

  • r as a function that to each tuple in a database identifies a value from the power

domain of the database universe.

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 10 / 48

slide-11
SLIDE 11

A relational database . . .

So, a relational database is a set of sets (of sets . . . ). It is important to note that an “SQL database” without restrictions is not really a relational database as it is possible to store duplicates. Thus, the SQL language defines relational databases as if the type was “bag”. To enforce the set type we need primary key restrictions. But that is not all . . .

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 11 / 48

slide-12
SLIDE 12

A relational database . . .

A relational database is a variable, logically consistent set of interrelated data,

  • rganized to reduce redundancy to a minimum and to facilitate maintenance of

data integrity. The set of data is collected for a specific purpose and the database shall

  • 1. correspond well with the “reality” it is supposed to represent
  • 2. not allow storing data that is impossible in the corresponding “reality”
  • 3. not allow storing undesirable data
  • 4. contain a minimum of redundant data
  • 5. not contain contradictory data (redundant data must match)

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 12 / 48

slide-13
SLIDE 13

A relational database . . .

Thus we must normalise the database 1NF All domains are atomic 2NF Every non-prime attribute is fully functionally dependent of every candidate key 3NF Every non-prime attribute is non-transitively dependent of every candidate key. 3NF , Alt For each FD X → A at least one of the following holds

  • X → A is trivial
  • X is a super-key
  • A is a prime attribute

BCNF Every determinant is a super-key (which implies the first two rules for 3NF)

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 13 / 48

slide-14
SLIDE 14

Important relational database properties

  • All FDs are preserved in a 3NF normalized DB
  • All search is performed by matching stored values
  • Keys, candidate keys & foreign keys

Restrictions

  • Domains, predefined and user defined
  • Unique for candidate keys
  • Triggers and assertions
  • NOT NULL

Design

  • Conceptual
  • Logical
  • Physical

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 14 / 48

slide-15
SLIDE 15

Functional dependencies

  • Used to describe restrictions
  • Transitive closures for verification of keys
  • Starting-point for normalization

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 15 / 48

slide-16
SLIDE 16

Normalization

  • Analysis
  • Find all candidate keys
  • Split the (universal) relation until the result satisfies the demands for 1NF, 2NF,

3NF/BCNF

  • Synthesis
  • Find a minimal set of FDs
  • Rewrite FD set on canonical form
  • The obtained set of FDs implicates the database structure
  • Analysis vs Synthesis
  • Analysis requires that we find all candidate keys (NP-complete), for synthesis
  • ne is enough
  • Synthesis require good mathematical knowledge about FDs, while analysis can

be performed mechanically

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 16 / 48

slide-17
SLIDE 17

Query languages, mathematics

Relational algebra: σ, π, ×, ⊲

⊳, ∪, \

Tuple calculus: {t|R(t) ∧ φ(t)} Domain calculus: {a, b, c|R(a, b, c) ∧ φ(a, b, c)}

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 17 / 48

slide-18
SLIDE 18

SQL, DML

select [ distinct ] list-of-attributes // Projection from list-of-base-relations // Cartesian product [ where list-of-predicates ] // Selection [ group by second-list-of-attributes // Partitioning [ having another-list-of-predicates ] ] // Demands on each partition [ order by third-list-of-attributes ] // Sorting

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 18 / 48

slide-19
SLIDE 19

SQL, DML . . .

We also have union, except, intersect, join, natural join, outer join But must distinguish between SQL(-86,-87,-89), SQL-92 (SQL2) – major revision of SQL (mostly sharpening semantics defs) and SQL3, which is still evolving: SQL:1999 introduced regular expressions, recursive queries, triggers, support for programming features (procedures, flow-control), non-scalar types, and some

  • bject-oriented features.

SQL:2003 introduced XML-stuff, window functions, standardized sequences SQL:2006 expanded XML features SQL:2008 expanded use of ORDER BY, added INSTEAD OF triggers, added TRUNCATE statement (delete all data in a relation).. . . )

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 19 / 48

slide-20
SLIDE 20

Communication with databases

May be achieved through a variety of techniques

embedded SQL (ESQLC, ESQLP , . . . ) SQL statements embedded into a standard programming language – works with nearly all programming languages JSQL SQL embedded into a Java program ODBC Object DataBase Connectivity – works with most any programming language JDBC Java DataBase Connectivity JSP , PHP , Perl, Python and other scripting languages for database access from web applications Perl, tk/tcl, Python for access from traditional applications.

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 20 / 48

slide-21
SLIDE 21

The relational model

Introduced because of shortcomings in earlier models Advantages:

  • Easy to understand as a database may be viewed as a set of tables
  • Good theoretical foundation. Simple mathematics (extensions to well established

mathematical disciplines)

  • High degree of data independence (as data is managed through a DBMS)
  • High level languages for data manipulation
  • Good security, good data integrity
  • Optimizers allow for “ad hoc”-queries
  • Simple control to ensure that there are no duplicates

(as, with a tiny twist, everything are sets)

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 21 / 48

slide-22
SLIDE 22

The relational model – drawbacks

  • Bad correspondence between database representation and “reality”
  • Semantically weak – basically only one data type, the relation.
  • Weak support for maintaining data integrity (better in SQL3).
  • Complex structures are “flattened”, dynamical become static.
  • Hard to manage the transition from conceptual schema to external schema as the

conceptual schema describes logical structure but not dynamic change.

  • Some interesting queries can not be asked (recursive queries though works in SQL3).
  • Demands from new application areas are not easily satisfied.

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 22 / 48

slide-23
SLIDE 23

The remedies

A number of models have been suggested as a remedy for the shortcomings of the relational model:

  • Semantic models. In reality a modelling refinement for traditional relational databases.

New modelling features are introduced to allow for a more fine grained semantic

  • structure. Grouping of attributes and relations between the groups allow for a primitive

sub-classing and inheritance. An important step towards modern databases.

  • Functional models. Builds upon (typed) λ-calculus. Everything is a function and

querying is performed using function application. Mainly traditional databases (network or relational) wrapped in a functional “shell”. Common language for application, definition and querying. An important step towards rich type systems.

  • Logical models. Models built on predicate calculus (thus relational). Relationships that

are not generally obvious in the relational model are represented by production rules. Tuples, parts of tuples and domains are represented as axioms. Also standard databases wrapped in a shell. Prolog or “prolog-like” languages for querying using logical inference. Form the basis for expert systems and deductive systems.

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 23 / 48

slide-24
SLIDE 24

The remedies . . .

  • Nested relational model. (Complex-value-model). The name stems from the fact that

relations may contain relational attributes. Thus, not even 1NF. Demands extensions and modification of relational mathematics. Important step towards object and object-relational databases.

  • Object models. Builds upon class mechanisms in object oriented languages.

Difference between passive systems where the database contain only what is necessary to recreate objects and active systems where binary, runnable objects are stored able to respond to method calls (= execute in database active memory).

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 24 / 48

slide-25
SLIDE 25

Other important research and development

  • Persistent programming. “Persistence” – a radical change of “scope”.

Normally, the values of program variables survive as long as they are “in scope” and as long as the program is running. “Persistence” expands the “life time” of variables so that a persistent variable may conserve its values until outside of the running program and keep its value until the next time the program runs. You need to make a difference between transient (short-lived or ephemeral) objects and persistent (long-lived) objects or to be able to dynamically control the life-time of individual objects. Introduced in PS-Algol and Galileo, further developed in Napier and PJama. People from the PS-Algol, Galileo and Napier projects were employed by Sun for the PJama project.

  • Object-relational models. In object systems the class notion has replaced the relation

as central concept pushing back relations to be one of many data types (at most) while on object-relational systems the relation remains the central concept though the type system is richer than in traditional databases and, typically, extendible.

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 25 / 48

slide-26
SLIDE 26

Modelling

When you create complex systems, systems that others will support or systems that are modelled from other systems, you must build models. Many tools exist, most of them graphical in some sense. IRM, ER, UML,. . . . The basic concepts in relational database modelling are object classes and relationship classes (here according to the ER method)

  • bject class

class relationship

Both may be decorated with attributes in ovals which with growing complexity are conveniently eliminated from the graphical model and placed in an attribute matrix. Here is the sample database (the warehouse) used in the introductory courses with the above notation:

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 26 / 48

slide-27
SLIDE 27

Modelling . . .

item name salary manager floor dept department works_on volume supply supplier company address itemno type volume sales employee

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 27 / 48

slide-28
SLIDE 28

Modelling . . .

The model contains the elements of the most basic level of database modelling and, with some common sense and some experience, quite enough for most modelling situations. My opinion is that it enough for an introductory course on database technology. You have the tools to discuss most of the problems that occur. With a couple of small extensions we can express the foreign key restriction stating that a person may only be employed at an already existing department.

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 28 / 48

slide-29
SLIDE 29

Semantic modelling – step 1

item name salary manager floor dept department works_on volume supply supplier company address itemno type volume sales

employee

We have now reached a level normally referred to as semantic modelling

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 29 / 48

slide-30
SLIDE 30

Semantic modelling – step 2

Sometimes you find base-relations with a considerable overlap in the schema, like Staff (ssn, firstName, lastName, salary, phone, streetAddress, streetNo, postalCode, city) and Client(ssn, firstName, lastName, phone, streetAddress, streetNo, postalCode, city) with the common attributes

{ssn, firstName, lastName, phone, streetAddress, streetNo, postalCode, city}

where the overlap is not on foreign keys. This might be viewed as the common part being an object class, quite similar to the superclass notion in object oriented programming languages. We may picture this as

Person Staff Client Person Staff Client

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 30 / 48

slide-31
SLIDE 31

Semantic modelling – step 2 . . . (2)

We may put the common part in its own base-relation which means that the parts

  • f an object will be distributed over a set of base-relations. Thus it is necessary for

a special notation for “different kinds of inheritance” that are thinkable in the context of databases. Here I follow Elmasri & Navathe as it is clear (though slightly cumbersome in terms of graphics). Notation for the previous example:

d Person Client Staff

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 31 / 48

slide-32
SLIDE 32

Semantic modelling – step 2 . . . (3)

Let’s say that people in the staff are either permanently employed (perm) or employed by the hour (hour) and that she/he is either a salesperson (sales), an engineer (eng) or administrator (admin) and that, in addition may be an executive (exec).

admin d d sales eng exec perm hour staff

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 32 / 48

slide-33
SLIDE 33

Semantic modelling – step 2 . . . (4)

You may be an admin executive (multiple inheritance)

admin d d sales eng exec perm hour staff exec admin

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 33 / 48

slide-34
SLIDE 34

Semantic modelling – step 2 . . . (5)

In a university it is possible to be both a student and a teacher at the same time. Also, you may be a student representative and / or a student assistant. In addition, if you have your first degree you may be a post graduate student and belong to

  • ne faculty as a student and another as a teacher.

A complex situation, impossible to represent in a standard ER-model. But not impossible in a semantic model

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 34 / 48

slide-35
SLIDE 35

Semantic modelling – step 2 . . . (6)

d merit student d person systgrp faculty stud.ass. ex stud staff grad not grad O

ssn name familyname

jobtype

address amount date salary firstname no pcode street

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 35 / 48

slide-36
SLIDE 36

Semantic modelling – step 2 . . . (7)

Our analysis so far has been top-down If you have to use a bottom-up analysis you get other kind of model constructs and you no longer talk about classes and subclasses but rather about categories and subcategories. You often get a mix of classes, subclasses, categories and subcategories in semantic models. Even categories may be partial or total

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 36 / 48

slide-37
SLIDE 37

Semantic modelling – step 2 . . . (8)

U d magazine book book publication magazine ISSN ISBN publication

We have reached the advanced level in semantic modelling. A slightly extended example:

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 37 / 48

slide-38
SLIDE 38

Semantic modelling – step 2 . . . (9)

locations lname address sex salary minit name fname bdate ssn dependents_of dependent name works_for works_on manages start_date number_of_employees sex birth_date relationship Employee supervision hours name number location project controls department name number N N N N 1 1 1 1 1 1 supevisor supervisee N M

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 38 / 48

slide-39
SLIDE 39

Semantic modelling – step 2 . . . (10)

More graphical symbols means more semantic information but also more complicated to transform to a logical model or database structure

  • 1. For every regular (strong) entity type we create a relation. Include all simple attributes and only

the components of composite attributes. Choose one of the keys as primary key. employee (ssn, fname, minit, lname, bdate, address, salary, sex) department (name, number) project (number, name, location)

  • 2. Do the same for weak entity types. Include the primary key in the owner relation as partial key

attribute(s) and skip the identifying relationship type. I place the owner entity type primary key first and rename it for clarity (not necessary). If there are attributes in the identifying relationship type, include them too. dependent (e ssn, name, sex, birth date, relationship)

  • 3. For every 1:1 relationship type, if both related entity types have total participation, merge them,

including all attributes in the relationship type. If one has total participation, treat the relationship type as if it was of type 1:N with the “N-side” in the same direction as the total participation. Normally this minimizes null values (my choise for this example). If neither entity type has total participation you may treat the relationship as if it was of type M:N

  • r as one of the above cases. It takes thorough analysis to determine which method is the best.

department (name, number, m ssn, m start date)

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 39 / 48

slide-40
SLIDE 40

Semantic modelling – step 2 . . . (11)

  • 4. For every 1:N relationship type, include in the relation representing the 1-side entity type all

relationship type attributes plus the key attribute(s) of the entity type on N-side. Note that we already skipped “dependents of”. If there are more than one candidate key on the 1-side, you have a choice. It is customary to rename some or all included attributes for clarity. employee (ssn, fname, minit, lname, bdate, address, salary, sex, s ssn, d no) project (number, name, location, d no)

  • 5. For every M:N relationship type we create a base-relation where the key attributes from the entity

types become the key attributes in the base-relation. You may choose which of the candidate/primary keys to use. The chosen key attributes become foreign keys as well. Here we must use CASCADE both for update and delete works on (e ssn, p number, hours)

  • 6. For every multivalued attribute we create a base-relation where we include in the key both the

multivalued attribute itself and the key of the entity type that it belongs to and we also denote the entity type key as a foreign key dept locations (d number, d location) Here we must also use CASCADE both for update and delete

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 40 / 48

slide-41
SLIDE 41

Semantic modelling – step 2 . . . (12)

  • 7. Let every category be represented by a surrogate key and if the category itself contains

attributes, create a base-relation where the surrogate key becomes the primary key and a foreign key in the base-relations that make up the category there are no categories in the model

  • 8. For every N-ary relationship type (N > 2) create a base-relation where the key is formed by the

key attributes from the associated entity types and also make them foreign keys there are no N-ary relationship types in the model Final result employee (ssn, fname, minit, lname, bdate, address, salary, sex, s ssn, d no) department (name, number, m ssn, m start date) project (number, name, location, d no) dependent (e ssn, name, sex, birth date, relationship) works on (e ssn, p number, hours) dept locations(d number, d location) In this simple example we don’t get much more than in a regular ER-model. The only difference is that e ssn in dependent would have been just a foreign key in a regular ER-model and that we cannot get a good result from reverse engineering. But in more complex cases it helps.

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 41 / 48

slide-42
SLIDE 42

Complex values

The objective is to be able to represent a larger variety of data structures than what is possible with the relational (ER) model.

  • Hierarchy
  • List
  • Continuous (array)
  • Discontinuous (linked)
  • Set
  • Bag

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 42 / 48

slide-43
SLIDE 43

Complex values . . .

Motivation – the “worlds” that we want to map onto the database often contain non-relational structures and partial structures Relational representation generally result in a fragmentation of the representation

  • f objects which, in turn, means complex queries using a lot of resources

The only mathematical demand on relational databases is 1NF meaning that all domains are atomic (in some sense) Possible representation: allow relation-valued or set-valued attributes in base-relations = nested relations.

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 43 / 48

slide-44
SLIDE 44

Nested relations

Example: a document retrieval system where, for each document we store information about

  • Title
  • Author(s)
  • Date
  • Search words
  • Language code
  • Language

Relations must be in at least 1NF. Document Title Author Date Search-words Lang code Lang DBTheory Lindqvist 940322 database 46 swedish Dahl relation normalform ODBMS Johnson 940312 persistent english Peterson transient

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 44 / 48

slide-45
SLIDE 45

Nested relations . . .

The relation “document” is not in 1NF. We may have these dependencies: title

→ →

author title

date title

→ →

searchword author

langcode langcode

language Normalise to 3NF: (title, date) (title, searchword) (title, author) (author, langcode) (langcode, language) We may have to continue to higher normal forms (BCNF , 4NF, 5NF)

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 45 / 48

slide-46
SLIDE 46

Nested relations . . .

If we continue we may get into problems as FDs can not be preserved Some steps in the decomposition process can be NP-complete We may not be able to maintain a good correspondence between model and database structure. One solution is to abandon 1NF and accept non-atomic domains and relation-valued attributes

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 46 / 48

slide-47
SLIDE 47

Nested relations . . .

The document database can have the schema: Document (title, author, date, searchword) author (name, language) searchword (word) where “title” and “date” are atomic attributes while “author” and “searchword” are relation-valued attributes. Such a relational schema is denoted “nested” or NF2 (Non First Normal Form – NFNF)

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 47 / 48

slide-48
SLIDE 48

Nested relations . . .

A relation over a universe U is defined as flat) if in R(A1, A2, . . . , An)

A1, A2, . . . , An are distinct attributes in U

nested if in R(A1, A2, . . . , An, R1, R2, . . . , Rm)

A1, A2, . . . , An are distinct attributes in U and R1, R2, . . . , Rm are distinct relations

If a relational schema occur in another schema we call it internal and if not, we call it external. This means that a single tuple may contain an entire database. There are theories on how to build complete databases from a universal relation both with nested relations and with ordinary (flat) relations.

DD2471 (Lecture 01) Modern database systems & their applications Spring 2012 48 / 48