Overview of Database Systems Kanda Runapongsa (krunapon@kku.ac.th) - - PowerPoint PPT Presentation
Overview of Database Systems Kanda Runapongsa (krunapon@kku.ac.th) - - PowerPoint PPT Presentation
Overview of Database Systems Kanda Runapongsa (krunapon@kku.ac.th) Dept of Computer Engineering Khon Kaen University Overview What is a Database, in particular, a relational DBMS? Why should we consider a DBMS to manage data? How
2
Overview
What is a Database, in particular, a relational
DBMS?
Why should we consider a DBMS to manage
data?
How is application data represented in a
DBMS?
How is data in a DBMS retrieved and
manipulated?
How does a DBMS support concurrent access
and protect data during system failures?
What are the main components of a DBMS? Who is involved with databases in real life?
3
Why Study Database?
The success of an organization
depends
Being able to acquire accurate data Being able to acquire timely data Being able to manage data effectively Being able to analyze data
Information processing is a rapidly
growing multibillion Bahts industry
We can acquire more and more data,
but does it mean that it is always good for us?
4
What is a DBMS?
A database
A very large, integrated collection of
data
Models real-world enterprise
Entities (e.g., students, courses) Relationships
A Database Management System
(DBMS)
A software package designed to store
and manage databases
5
File Systems vs. DBMS (1/4)
A motivating scenario
A company has a large collection (500 GB)
- f data on employees, departments,
products, sales, and so on
This data is accessed concurrently by several
employees
Questions about the data must be answered
quickly
Changes made to the data by different users
must be applied consistently
Access to certain parts of the data must be
restricted
6
File Systems vs. DBMS (2/4)
We can try to store the data in files This approach has many drawbacks
We probably do not have 500 GB of
main memory to hold all the data
We must therefore store data in a
storage device
Even if we have 500 GB of main
memory, on computer systems with 32- bit addressing, we cannot refer directly to all data items
7
File Systems vs. DBMS (3/4)
Using Files to store and manage data
We have to write special programs to
answer each question that users may want to ask
These programs are likely to complex
because of large volume of data to be searched
We must protect the data from
inconsistent changes made by different users accessing the data concurrently
8
File Systems vs. DBMS (4/4)
Using Files to store and manage data
We must ensure that data is restored to
a consistent state if the system crashes while changes are being made
Operating systems provide only a
password mechanism for security.
This is not sufficiently flexible to enforce
security policies in which different users have permission to access different subsets of the data
9
Why Use a DBMS? (1/3)
Reduced application development
time
This can be done by using queries
Data independence
The DBMS can provide an abstract view
- f the data to insulate application code
from such details
Efficient data access
A DBMS utilizes a variety of
sophisticated techniques to store and retrieve data efficiently
10
Why Use a DBMS? (2/3)
Concurrent access
Users can think of the data as being
accessed by only one user at a time
Crash recovery
The DBMS protects users from the
effects of system failures
Uniform data administration
When several users share the data,
centralizing the administration of data can offer significant improvements, such as fine-tuning the storage of the data
11
Why Use a DBMS? (3/3)
Data integrity
The DBMS can enforce integrity
constraints on the data
For example, before inserting salary
information for an employee, the DBMS can check that the department budget is not exceeded
Security
The DBMS can enforce access controls
that govern what data is visible to different classes of users
12
Data Models (1/2)
A data model is a collection of high-
level data description constructs that hide many low-level storage details
A DBMS allows a user to define the
data to be stored in terms of a data model
What is the model that most DBMSs
today are based on?
The relational data model
13
Data Models (2/2)
The data models of the DBMS are closer
to how the DBMS stores data than to how a user thinks about the underlying enterprise
A semantic data model is a more abstract,
high-level data model that makes it easier for a user to come up with a good initial description of the data in an enterprise
What is a widely used semantic data
model?
The entity-relationship model
14
The Relational Data Model (1/2)
Most widely DBMS data model today Central data construct: a relation
A set of records A table with rows and columns
Every relation has a schema, which
describes the columns, or fields
A schema specifies the relation’s
name, the name of each filed (or attribute or column), and the type of each field
15
The Relational Data Model (2/2)
The integrity constraints are
conditions that the records in a relation must satisfy
For example
We could specify that every student has
a unique sid value
The ability to specify uniqueness of the
values in a field increases the accuracy with which we can describe data
16
Other Data Models
Relational data model
IBM’s DB2, Informix, Oracle, Sybase, Microsoft’s
Access, FoxBase, Paradox, Tandem, and Teradata
Hierarchical model
IBM’s IMS DMS
Network model
IDS and IDMS
Object-oriented model
ObjectStore and Versant
Object-relational model
IBM’s DB2, Informix, ObjectStore, Oracle, Versant
17
Levels of Abstraction
The database description consists of
a schema at each of these three levels of abstraction
Physical schemas Conceptual schemas External schemas
A Data Definition Language (DDL) is
used to define the external and conceptual schemas
18
Levels of Abstraction (Cont.)
All DBMS vendors also support SQL
commands to describe aspects of the physical schema
External Conceptual
Conceptual Physical
Physical Disk
19
Conceptual Schema
The conceptual schema (the logical
schema) describes the stored data in terms of the data model of the DBMS
In a relational DBMS,
The conceptual schema describes all
relations that are stored in the database
The choice of relations, and the
choice of fields for each relation is selected in the process of a conceptual design
20
Physical Schema
The physical schema summarizes
how the relations described in the conceptual schema are actually stored on secondary storage devices such as disks and taps
We must decide what file
- rganizations to use to store the
relations, and create auxiliary data structures called indexes to speed up data retrieval operations
21
External Schema (1/2)
External schemas allow data access to
be customized (and authorized) at the level of individual users or groups of users
Any given database has exactly one
conceptual schema and one physical schema, but it may have several external schemas
Each external schema consists of a
collection of a collection of one or more views and relations from the conceptual schema
22
External Schema (2/2)
A view is conceptually a relation, but
the records in a view are not stored in the DBMS
Rather, they are computed using a
definition for the view, in terms of relations stored in the DBMS
A user can treat a view just like a
relation
Even though the records in the view
are not stored explicitly, they are computed as needed
23
Example: University Database
Conceptual schema:
Students(sid:string, name:string, login:string,
age:integer, gpa:real)
Courses(cid:string, cname:string,
credit:integer)
Enrolled(sid:string, cid:string, grade:string)
Physical schema:
Relations stored as unordered files Index on first column of Students
External schema (view):
Course_info(cid:string, enrollment:integer)
24
Data Independence (1/2)
Application programs are insulated
from changes in the way the data is structured and stored
Data independence is achieved
through use of the three levels of data abstraction; in particular, the conceptual schema and external schema
One of the most important benefits of
using a DBMS
25
Data Independence (2/2)
Physical data independence
Protection from changes in physical structure
- f data
Logical data independence
The conceptual schema insulates users from
changes in the physical storage of data
Protection from changes in logical structure of
data
The users can be shielded from changes in the
logical structure of the data, or changes in the choice of relations to be stored
26
Queries in a DBMS
Questions involving the data stored in
a DBMS are called queries
A DBMS provides a specification
language, called the query language, in which queries can be posed
A DBMS enables users to create,
modify, and query data through a Data Manipulation Language (DML)
27
Concurrency Control (1/3)
An important task of a DBMS is to
schedule concurrent access to data so that each user can safely ignore the fact that others are accessing the data concurrently
A DBMS allows users to think of their
programs as if they were executing in isolation, one after the other in some
- rder chosen by the DBMS
28
Concurrency Control (2/3)
Concurrent execution of user programs is
essential for good DBMS performance
Because disk accesses are frequent, and
relatively slow, it is important to keep the cpu humming by working on several user programs concurrently
Interleaving actions of different user programs
can lead to inconsistency :e.g., check is cleared while account balance is being computed
DBMS ensures such problems don’t arise: users
can pretend they are using a single-user system
29
Concurrency Control (3/3)
A locking protocol
A set of rules to be followed by each
transaction (and enforced by the DBMS)
To ensure that even though actions of
several transactions might be interleaved, the net effect is identical to executing all transactions in some serial
- rder
A lock is a mechanism used to control
access to database objects
30
Locks
Two kinds of locks are commonly
supported by a DBMS
Shared locks on an object can be held by
two different transactions at the same time
When do we want to use shared locks?
Read an object
Exclusive locks on an object ensures that
no other transactions hold any lock on this
- bject
When do we want to use exclusive locks?
Write an object
31
Transaction Management
A transaction is any one execution of
a user program in a DBMS
This is the basic unit of change as
seen by the DBMS
Partial transactions are not allowed The effect of a group of transactions
is equivalent to some serial execution
- f all transactions
32
Incomplete Transactions
Transactions can be interrupted before
running to completion for a variety of reasons, e.g., a system crash
A DBMS must ensure that the changes
made by such incomplete transactions are removed from the database
For example, if the DBMS is in the middle of
transferring money form account A to account B, but the crash occurs, both accounts must be in the same state before the transfer
33
Transaction: Key Concept
A transaction is an atomic sequence
- f database actions (reads/writes)
“atomic sequence”: all instructions in
the sequence need to be executed together
There is no effect of the execution of
- nly a single instruction for the
sequence that has multiple instructions
Each transaction, executed
completely, must leave the DB in a consistent state if DB is consistent when the transaction begins
34
Scheduling Concurrent Transactions
DBMS ensures that execution of {T1, .., Tn} is
equivalent to some serial execution of T1’,…, Tn’
Before reading/writing an object, a transaction
requests a lock on the object, and waits till the DBMS gives it the lock. All locks are released at the end of the transaction
Idea; If an action of Ti (say, writing X) affects Tj
(which perhaps reading X), one of them, say Ti, will obtain the lock on X first and Tj is forced to wait until Ti completes; this effectively orders the transactions
What if Tj already has a lock on Y and Ti later
requests a lock on Y?
35
Ensuring Atomicity (1/2)
DBMS ensures atomicity (all-or-
nothing property) even if system crashes in the middle of a Xact
Idea: keep a log (history) of all
actions carried out by the DBMS while executing a set of Xacts
36
Ensuring Atomicity (2/2)
Before a change is made to the
database, the corresponding log entry is forced to a safe location (WAL protocol; OS support for this is often inadequate)
After a crash, the effects of partially
executed transactions are undone using the log (Thanks to WAL, if log entry wasn’t saved before the crash, corresponding change was not applied to database!)
37
A Log (1/3)
To do a database recovery The DBMS maintains a log of all writes to the
database
A crucial property of the log Each write action must be recorded in the log
(on disk) before the corresponding change is reflected in the database itself
Why it must be recorded on disk first?
If the system crashes just after making the
change in the database but before the change is recorded in the log, the DBMS would be unable to detect and undo this change
38
A Log (2/3)
The property that the change in database
must be recorded in the log first before making the change in the database itself is called Write-Ahead Log or WAL
To ensure this property, the DBMS must
be able to selectively force a page in memory to disk
The log is used to ensure that the
changes made by a successfully completed transaction are not lost due to a system crash
39
A Log (3/3)
The following actions are recorded in
the log:
Ti writes an object: the old value and the
new value
Log record must go to disk before the
change in DBMS
Ti commits/aborts: a log record
indicating this action
Log records chained together by Xact
id, so it’s easy to undo a specific Xact (e.g., to resolve a deadlock)
40
Recovery (1/2)
Bringing the database to a consistent
state after a system crash can be a slow process
Why is it slow? The DBMS must ensure that the effects
- f all transactions that complete prior to
the crash are restored
The DBMS must also ensure that the
effects of incomplete transactions are undone
41
Recovery (2/2)
We can speed up this slow process by?
Periodically forcing some information to
disk
The periodic operation is called a
checkpoint
Periodic checkpointing can reduce the
time needed to recover from a crash
However, checkpointing too often also
slows down normal execution
42
Structure of a DBMS (1/6)
A typical DBMS has a layered
architecture
The DBMS accepts SQL commands
generated from a variety of user interfaces, produces query evaluation plans, execute these plans against the database, and returns the answers
When a user issues a query, the parsed
query is present to a query optimizer
43
Structure of a DBMS (2/6)
A query optimizer uses information
about how the data is stored to produce an efficient execution plan for evaluating the query
An execution plan is a blueprint for
evaluating a query, and is usually presented as a tree of relational
- perators
44
Structure of a DBMS (3/6)
The codes that implements relational
- perators sits on top of the file and
access methods layer
A file in a DBMS is a collection of
pages or a collection of records
The files and access methods layer
code sits on top of the buffer manager, which brings pages in from disk to main memory as needed in response to read requests
45
Structure of a DBMS (4/6)
The lowest layer of the DBMS software
deals with management of space on disk, where the data is stored
Higher layers allocate, deallocate, read,
and write pages through this layer, called the disk space manager
The DBMS supports concurrency and
crash recovery by carefully scheduling user requests and maintaining a log of all changes to the database
46
Structure of a DBMS (5/6)
The transaction manager ensures
that transactions request and release locks according to a suitable locking protocol and schedules the execution transactions
The lock manager keeps tracks of
requests for lock and grant locks on database objects when they become available
47
Structure of a DBMS (6/6)
The recovery manager is responsible
for maintaining a log, and restoring the system to a consistent state after a crash
The disk space manager, buffer
manager, and file and access methods layers must interact with these components
48
Databases make these folks happy…
End users and DBMS vendors DB application programmers
E.g., smart webmasters
Database administrator (DBA)
Designs logical/physical schemas Handles security and authorization Data availability, crash recovery Database tuning as needs evolve
49
Summary (1/2)
DBMS used to maintain, query large
datasets
Benefits include recovery from
system crashes, concurrent access, quick application development, data integrity and security
Levels of abstraction give data
independence
50
Summary (2/2)
A DBMS typically has a layered
architecture
DBAs hold responsible jobs and are
all well-paid
DBMS R&D is one of the broadest,
most exciting areas in CS
51
Reference
Database Management Systems,