Overview of Database Systems Kanda Runapongsa (krunapon@kku.ac.th) - - PowerPoint PPT Presentation

overview of database systems
SMART_READER_LITE
LIVE PREVIEW

Overview of Database Systems Kanda Runapongsa (krunapon@kku.ac.th) - - PowerPoint PPT Presentation

Overview of Database Systems Kanda Runapongsa (krunapon@kku.ac.th) Dept of Computer Engineering Khon Kaen University Overview What is a Database, in particular, a relational DBMS? Why should we consider a DBMS to manage data? How


slide-1
SLIDE 1

Overview of Database Systems

Kanda Runapongsa (krunapon@kku.ac.th) Dept of Computer Engineering Khon Kaen University

slide-2
SLIDE 2

2

Overview

 What is a Database, in particular, a relational

DBMS?

 Why should we consider a DBMS to manage

data?

 How is application data represented in a

DBMS?

 How is data in a DBMS retrieved and

manipulated?

 How does a DBMS support concurrent access

and protect data during system failures?

 What are the main components of a DBMS?  Who is involved with databases in real life?

slide-3
SLIDE 3

3

Why Study Database?

The success of an organization

depends

Being able to acquire accurate data Being able to acquire timely data Being able to manage data effectively Being able to analyze data

Information processing is a rapidly

growing multibillion Bahts industry

We can acquire more and more data,

but does it mean that it is always good for us?

slide-4
SLIDE 4

4

What is a DBMS?

A database

A very large, integrated collection of

data

Models real-world enterprise

Entities (e.g., students, courses) Relationships

A Database Management System

(DBMS)

A software package designed to store

and manage databases

slide-5
SLIDE 5

5

File Systems vs. DBMS (1/4)

 A motivating scenario

 A company has a large collection (500 GB)

  • f data on employees, departments,

products, sales, and so on

 This data is accessed concurrently by several

employees

 Questions about the data must be answered

quickly

 Changes made to the data by different users

must be applied consistently

 Access to certain parts of the data must be

restricted

slide-6
SLIDE 6

6

File Systems vs. DBMS (2/4)

We can try to store the data in files This approach has many drawbacks

We probably do not have 500 GB of

main memory to hold all the data

We must therefore store data in a

storage device

Even if we have 500 GB of main

memory, on computer systems with 32- bit addressing, we cannot refer directly to all data items

slide-7
SLIDE 7

7

File Systems vs. DBMS (3/4)

Using Files to store and manage data

We have to write special programs to

answer each question that users may want to ask

These programs are likely to complex

because of large volume of data to be searched

We must protect the data from

inconsistent changes made by different users accessing the data concurrently

slide-8
SLIDE 8

8

File Systems vs. DBMS (4/4)

Using Files to store and manage data

We must ensure that data is restored to

a consistent state if the system crashes while changes are being made

Operating systems provide only a

password mechanism for security.

This is not sufficiently flexible to enforce

security policies in which different users have permission to access different subsets of the data

slide-9
SLIDE 9

9

Why Use a DBMS? (1/3)

Reduced application development

time

This can be done by using queries

Data independence

The DBMS can provide an abstract view

  • f the data to insulate application code

from such details

Efficient data access

A DBMS utilizes a variety of

sophisticated techniques to store and retrieve data efficiently

slide-10
SLIDE 10

10

Why Use a DBMS? (2/3)

Concurrent access

Users can think of the data as being

accessed by only one user at a time

Crash recovery

The DBMS protects users from the

effects of system failures

Uniform data administration

When several users share the data,

centralizing the administration of data can offer significant improvements, such as fine-tuning the storage of the data

slide-11
SLIDE 11

11

Why Use a DBMS? (3/3)

Data integrity

The DBMS can enforce integrity

constraints on the data

For example, before inserting salary

information for an employee, the DBMS can check that the department budget is not exceeded

Security

The DBMS can enforce access controls

that govern what data is visible to different classes of users

slide-12
SLIDE 12

12

Data Models (1/2)

A data model is a collection of high-

level data description constructs that hide many low-level storage details

A DBMS allows a user to define the

data to be stored in terms of a data model

What is the model that most DBMSs

today are based on?

The relational data model

slide-13
SLIDE 13

13

Data Models (2/2)

 The data models of the DBMS are closer

to how the DBMS stores data than to how a user thinks about the underlying enterprise

 A semantic data model is a more abstract,

high-level data model that makes it easier for a user to come up with a good initial description of the data in an enterprise

 What is a widely used semantic data

model?

 The entity-relationship model

slide-14
SLIDE 14

14

The Relational Data Model (1/2)

Most widely DBMS data model today Central data construct: a relation

A set of records A table with rows and columns

Every relation has a schema, which

describes the columns, or fields

A schema specifies the relation’s

name, the name of each filed (or attribute or column), and the type of each field

slide-15
SLIDE 15

15

The Relational Data Model (2/2)

The integrity constraints are

conditions that the records in a relation must satisfy

For example

We could specify that every student has

a unique sid value

The ability to specify uniqueness of the

values in a field increases the accuracy with which we can describe data

slide-16
SLIDE 16

16

Other Data Models

 Relational data model

 IBM’s DB2, Informix, Oracle, Sybase, Microsoft’s

Access, FoxBase, Paradox, Tandem, and Teradata

 Hierarchical model

 IBM’s IMS DMS

 Network model

 IDS and IDMS

 Object-oriented model

 ObjectStore and Versant

 Object-relational model

 IBM’s DB2, Informix, ObjectStore, Oracle, Versant

slide-17
SLIDE 17

17

Levels of Abstraction

The database description consists of

a schema at each of these three levels of abstraction

Physical schemas Conceptual schemas External schemas

A Data Definition Language (DDL) is

used to define the external and conceptual schemas

slide-18
SLIDE 18

18

Levels of Abstraction (Cont.)

All DBMS vendors also support SQL

commands to describe aspects of the physical schema

External  Conceptual

Conceptual  Physical

Physical  Disk

slide-19
SLIDE 19

19

Conceptual Schema

The conceptual schema (the logical

schema) describes the stored data in terms of the data model of the DBMS

In a relational DBMS,

The conceptual schema describes all

relations that are stored in the database

The choice of relations, and the

choice of fields for each relation is selected in the process of a conceptual design

slide-20
SLIDE 20

20

Physical Schema

The physical schema summarizes

how the relations described in the conceptual schema are actually stored on secondary storage devices such as disks and taps

We must decide what file

  • rganizations to use to store the

relations, and create auxiliary data structures called indexes to speed up data retrieval operations

slide-21
SLIDE 21

21

External Schema (1/2)

 External schemas allow data access to

be customized (and authorized) at the level of individual users or groups of users

 Any given database has exactly one

conceptual schema and one physical schema, but it may have several external schemas

 Each external schema consists of a

collection of a collection of one or more views and relations from the conceptual schema

slide-22
SLIDE 22

22

External Schema (2/2)

A view is conceptually a relation, but

the records in a view are not stored in the DBMS

Rather, they are computed using a

definition for the view, in terms of relations stored in the DBMS

A user can treat a view just like a

relation

Even though the records in the view

are not stored explicitly, they are computed as needed

slide-23
SLIDE 23

23

Example: University Database

 Conceptual schema:

 Students(sid:string, name:string, login:string,

age:integer, gpa:real)

 Courses(cid:string, cname:string,

credit:integer)

 Enrolled(sid:string, cid:string, grade:string)

 Physical schema:

 Relations stored as unordered files  Index on first column of Students

 External schema (view):

 Course_info(cid:string, enrollment:integer)

slide-24
SLIDE 24

24

Data Independence (1/2)

Application programs are insulated

from changes in the way the data is structured and stored

Data independence is achieved

through use of the three levels of data abstraction; in particular, the conceptual schema and external schema

One of the most important benefits of

using a DBMS

slide-25
SLIDE 25

25

Data Independence (2/2)

Physical data independence

 Protection from changes in physical structure

  • f data

Logical data independence

 The conceptual schema insulates users from

changes in the physical storage of data

 Protection from changes in logical structure of

data

 The users can be shielded from changes in the

logical structure of the data, or changes in the choice of relations to be stored

slide-26
SLIDE 26

26

Queries in a DBMS

Questions involving the data stored in

a DBMS are called queries

A DBMS provides a specification

language, called the query language, in which queries can be posed

A DBMS enables users to create,

modify, and query data through a Data Manipulation Language (DML)

slide-27
SLIDE 27

27

Concurrency Control (1/3)

An important task of a DBMS is to

schedule concurrent access to data so that each user can safely ignore the fact that others are accessing the data concurrently

A DBMS allows users to think of their

programs as if they were executing in isolation, one after the other in some

  • rder chosen by the DBMS
slide-28
SLIDE 28

28

Concurrency Control (2/3)

 Concurrent execution of user programs is

essential for good DBMS performance

 Because disk accesses are frequent, and

relatively slow, it is important to keep the cpu humming by working on several user programs concurrently

 Interleaving actions of different user programs

can lead to inconsistency :e.g., check is cleared while account balance is being computed

 DBMS ensures such problems don’t arise: users

can pretend they are using a single-user system

slide-29
SLIDE 29

29

Concurrency Control (3/3)

A locking protocol

A set of rules to be followed by each

transaction (and enforced by the DBMS)

To ensure that even though actions of

several transactions might be interleaved, the net effect is identical to executing all transactions in some serial

  • rder

A lock is a mechanism used to control

access to database objects

slide-30
SLIDE 30

30

Locks

Two kinds of locks are commonly

supported by a DBMS

Shared locks on an object can be held by

two different transactions at the same time

When do we want to use shared locks?

Read an object

Exclusive locks on an object ensures that

no other transactions hold any lock on this

  • bject

When do we want to use exclusive locks?

Write an object

slide-31
SLIDE 31

31

Transaction Management

A transaction is any one execution of

a user program in a DBMS

This is the basic unit of change as

seen by the DBMS

Partial transactions are not allowed The effect of a group of transactions

is equivalent to some serial execution

  • f all transactions
slide-32
SLIDE 32

32

Incomplete Transactions

 Transactions can be interrupted before

running to completion for a variety of reasons, e.g., a system crash

 A DBMS must ensure that the changes

made by such incomplete transactions are removed from the database

 For example, if the DBMS is in the middle of

transferring money form account A to account B, but the crash occurs, both accounts must be in the same state before the transfer

slide-33
SLIDE 33

33

Transaction: Key Concept

A transaction is an atomic sequence

  • f database actions (reads/writes)

“atomic sequence”: all instructions in

the sequence need to be executed together

There is no effect of the execution of

  • nly a single instruction for the

sequence that has multiple instructions

Each transaction, executed

completely, must leave the DB in a consistent state if DB is consistent when the transaction begins

slide-34
SLIDE 34

34

Scheduling Concurrent Transactions

 DBMS ensures that execution of {T1, .., Tn} is

equivalent to some serial execution of T1’,…, Tn’

 Before reading/writing an object, a transaction

requests a lock on the object, and waits till the DBMS gives it the lock. All locks are released at the end of the transaction

 Idea; If an action of Ti (say, writing X) affects Tj

(which perhaps reading X), one of them, say Ti, will obtain the lock on X first and Tj is forced to wait until Ti completes; this effectively orders the transactions

 What if Tj already has a lock on Y and Ti later

requests a lock on Y?

slide-35
SLIDE 35

35

Ensuring Atomicity (1/2)

DBMS ensures atomicity (all-or-

nothing property) even if system crashes in the middle of a Xact

Idea: keep a log (history) of all

actions carried out by the DBMS while executing a set of Xacts

slide-36
SLIDE 36

36

Ensuring Atomicity (2/2)

Before a change is made to the

database, the corresponding log entry is forced to a safe location (WAL protocol; OS support for this is often inadequate)

After a crash, the effects of partially

executed transactions are undone using the log (Thanks to WAL, if log entry wasn’t saved before the crash, corresponding change was not applied to database!)

slide-37
SLIDE 37

37

A Log (1/3)

 To do a database recovery  The DBMS maintains a log of all writes to the

database

 A crucial property of the log  Each write action must be recorded in the log

(on disk) before the corresponding change is reflected in the database itself

 Why it must be recorded on disk first?

If the system crashes just after making the

change in the database but before the change is recorded in the log, the DBMS would be unable to detect and undo this change

slide-38
SLIDE 38

38

A Log (2/3)

 The property that the change in database

must be recorded in the log first before making the change in the database itself is called Write-Ahead Log or WAL

 To ensure this property, the DBMS must

be able to selectively force a page in memory to disk

 The log is used to ensure that the

changes made by a successfully completed transaction are not lost due to a system crash

slide-39
SLIDE 39

39

A Log (3/3)

The following actions are recorded in

the log:

Ti writes an object: the old value and the

new value

Log record must go to disk before the

change in DBMS

Ti commits/aborts: a log record

indicating this action

Log records chained together by Xact

id, so it’s easy to undo a specific Xact (e.g., to resolve a deadlock)

slide-40
SLIDE 40

40

Recovery (1/2)

Bringing the database to a consistent

state after a system crash can be a slow process

Why is it slow? The DBMS must ensure that the effects

  • f all transactions that complete prior to

the crash are restored

The DBMS must also ensure that the

effects of incomplete transactions are undone

slide-41
SLIDE 41

41

Recovery (2/2)

We can speed up this slow process by?

Periodically forcing some information to

disk

The periodic operation is called a

checkpoint

Periodic checkpointing can reduce the

time needed to recover from a crash

However, checkpointing too often also

slows down normal execution

slide-42
SLIDE 42

42

Structure of a DBMS (1/6)

A typical DBMS has a layered

architecture

The DBMS accepts SQL commands

generated from a variety of user interfaces, produces query evaluation plans, execute these plans against the database, and returns the answers

When a user issues a query, the parsed

query is present to a query optimizer

slide-43
SLIDE 43

43

Structure of a DBMS (2/6)

A query optimizer uses information

about how the data is stored to produce an efficient execution plan for evaluating the query

An execution plan is a blueprint for

evaluating a query, and is usually presented as a tree of relational

  • perators
slide-44
SLIDE 44

44

Structure of a DBMS (3/6)

The codes that implements relational

  • perators sits on top of the file and

access methods layer

A file in a DBMS is a collection of

pages or a collection of records

The files and access methods layer

code sits on top of the buffer manager, which brings pages in from disk to main memory as needed in response to read requests

slide-45
SLIDE 45

45

Structure of a DBMS (4/6)

The lowest layer of the DBMS software

deals with management of space on disk, where the data is stored

Higher layers allocate, deallocate, read,

and write pages through this layer, called the disk space manager

The DBMS supports concurrency and

crash recovery by carefully scheduling user requests and maintaining a log of all changes to the database

slide-46
SLIDE 46

46

Structure of a DBMS (5/6)

The transaction manager ensures

that transactions request and release locks according to a suitable locking protocol and schedules the execution transactions

The lock manager keeps tracks of

requests for lock and grant locks on database objects when they become available

slide-47
SLIDE 47

47

Structure of a DBMS (6/6)

The recovery manager is responsible

for maintaining a log, and restoring the system to a consistent state after a crash

The disk space manager, buffer

manager, and file and access methods layers must interact with these components

slide-48
SLIDE 48

48

Databases make these folks happy…

End users and DBMS vendors DB application programmers

E.g., smart webmasters

Database administrator (DBA)

Designs logical/physical schemas Handles security and authorization Data availability, crash recovery Database tuning as needs evolve

slide-49
SLIDE 49

49

Summary (1/2)

DBMS used to maintain, query large

datasets

Benefits include recovery from

system crashes, concurrent access, quick application development, data integrity and security

Levels of abstraction give data

independence

slide-50
SLIDE 50

50

Summary (2/2)

A DBMS typically has a layered

architecture

DBAs hold responsible jobs and are

all well-paid

DBMS R&D is one of the broadest,

most exciting areas in CS

slide-51
SLIDE 51

51

Reference

Database Management Systems,

Third Edition by Ramakrishnan and Gehrke