Overview of Database Systems CS3860 - Jay Urbain, PhD Introduction - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Overview of Database Systems

CS3860 - Jay Urbain, PhD Introduction to Database Systems

1

slide-2
SLIDE 2

2

UFR – Uniform Financial Report (I think)

slide-3
SLIDE 3

What is a database?

3

slide-4
SLIDE 4

What is a database?

Collection of data – Files, notes, to do list, patient records, play lists, etc.

4

slide-5
SLIDE 5

What is a Database Management System?

DBMS?

5

My family room

slide-6
SLIDE 6

Database Management System

DBMS: A Database Management System (DBMS) is a software package designed to store and manage databases.

  • Typically large, integrated collection of data
  • Models real-world enterprise

– Entities (e.g., students, courses, professors)
– Relationships (e.g., Bono is enrolled in CS3860; Urbain is teaching CS3860)

6

slide-7
SLIDE 7

Files vs. DBMS

Why not just use files?

7

slide-8
SLIDE 8

Why use a DBMS?

  • Data independence
  • Efficient access
  • Reduced application development time
  • Data integrity
  • Data security
  • Uniform data administration
  • Concurrent access, recovery from crashes
  • Distributability, scalability
  • Aggregate functions

8

slide-9
SLIDE 9

Why Study Databases?

  • Shift from computation to information

– At the low end: scramble to web-space, big data
– At the high end: scientific applications, data analytics, collective intelligence, eScience, machine learning

  • Shift to functional computing on large datasets.
  • Datasets increasing in diversity & volume
  • Digital libraries, interactive video, Human Genome, the web…
  • Integration of structured and unstructured data
  • Polyglot persistence

9

slide-10
SLIDE 10

Why Study Databases?

  • DBMSs encompass most of CS

– OS, languages, theory, computational complexity, data structures, algorithms, AI, multimedia, logic

10

slide-11
SLIDE 11

11

Big science is data driven. IceCube Neutrino Observatory.

slide-12
SLIDE 12

12

Data analysis in the fight against human trafficking. All of society is online.

The New York DA uses MEMEX data for all trafficking investigations this year.

slide-13
SLIDE 13

13

Increasingly many companies see themselves as data driven.

slide-14
SLIDE 14

14

https://www.youtube.com/watch?v=OvfU1NpCJQQ
https://www.youtube.com/watch?v=3xGoBlI_fdg
https://www.youtube.com/watch?v=OpDIEJrog3s

EVEN MORE “TRADITIONAL” COMPANIES…

slide-15
SLIDE 15

THE WORLD IS INCREASINGLY DRIVEN BY DATA…

15

This class teaches the basics of how to use & manage data.

slide-16
SLIDE 16

Big Data Landscape… Infrastructure is Changing

16

http://www.bigdatalandscape.com/

New tech. Same Principles.

slide-17
SLIDE 17

Why should you study databases?

  • Mercenary: make more $$$

– Startups need DB talent right away = low employee #
– Massive industry…

  • Intellectual:

– Science: data poor to data rich

  • No idea how to handle the data!

– Fundamental ideas to/from all of CS:

  • Systems, theory, AI, logic, stats, analysis….

17

Many great computer systems ideas started in DB.

slide-18
SLIDE 18

Data Models

  • A data model is a collection of concepts for describing data.
  • The relational data model is one of the most popular models.
  • Main concept: relation, basically a table with rows (records) and columns (attributes).
  • Every relation has a schema, which describes the relation name, the names of the columns (or fields), and the field types.
  • A schema is a description of a particular collection of data, using the given data model.
  • A semantic data model is a more abstract, high-level representation that makes it easier to come up with an initial description.

18

slide-19
SLIDE 19

Levels of Abstraction

  • Many views (external schema)

– Describe how users see the data

  • Single conceptual (logical) schema

– Defines logical structure

  • Single physical schema

– Describes the files (pages, blocks) and indexes used.

19

slide-20
SLIDE 20

Example: University Database

  • Conceptual schema:

– Students(sid: string, name: string, login: string, age: integer, gpa: real);
– Courses(cid: string, cname: string, credit: integer);
– Enroll(sid: string, cid: string, grade: string);

  • Physical schema:

– Relations stored as ordered/unordered files
– Index on first column of Students

  • External schema (View):

– Course_Info(cid: string, enrollment: integer);
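The three schema levels above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the primary keys, the sample rows, and the definition of Course_Info as an enrollment count per course are not stated on the slide, and SQLite's automatic primary-key index stands in for the "index on first column of Students".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conceptual schema (primary keys are an assumption, not on the slide)
CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT, login TEXT,
                       age INTEGER, gpa REAL);
CREATE TABLE Courses  (cid TEXT PRIMARY KEY, cname TEXT, credit INTEGER);
CREATE TABLE Enroll   (sid TEXT REFERENCES Students(sid),
                       cid TEXT REFERENCES Courses(cid),
                       grade TEXT);

-- External schema: a view (assumed here to count enrollments per course)
CREATE VIEW Course_Info AS
    SELECT c.cid AS cid, COUNT(e.sid) AS enrollment
    FROM Courses c LEFT JOIN Enroll e ON e.cid = c.cid
    GROUP BY c.cid;
""")

# Hypothetical sample data
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
                 [("s1", "Bono", "bono", 20, 3.5),
                  ("s2", "Edge", "edge", 21, 3.9)])
conn.executemany("INSERT INTO Courses VALUES (?, ?, ?)",
                 [("CS3860", "Database Systems", 4),
                  ("CS2852", "Data Structures", 4)])
conn.executemany("INSERT INTO Enroll VALUES (?, ?, ?)",
                 [("s1", "CS3860", "A"), ("s2", "CS3860", "B")])

# Users of the view never see the underlying tables or files
course_info = conn.execute(
    "SELECT cid, enrollment FROM Course_Info ORDER BY cid").fetchall()
print(course_info)
```

The view is recomputed from the base tables on every query, so it stays current as students enroll.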

20

slide-21
SLIDE 21

Data Independence

Applications insulated from how data is structured and stored.

  • Logical data independence: Protection from changes in the logical structure (conceptual schema) of data.
  • Physical data independence: Protection from changes in physical structure of data.
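A minimal sketch of both kinds of independence, using Python's sqlite3 module. The table contents, the index name idx_students_gpa, and the added email column are hypothetical. The same application query runs unchanged after a physical change (a new index) and after a simple logical change (a new column).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT, gpa REAL)")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?)",
                 [("s1", "Bono", 3.5), ("s2", "Edge", 3.9), ("s3", "Adam", 2.8)])

# The application's query: it names only the columns it needs
QUERY = "SELECT name FROM Students WHERE gpa > 3.0 ORDER BY name"

before = conn.execute(QUERY).fetchall()

# Physical change: add an index. Storage changes, the query text does not.
conn.execute("CREATE INDEX idx_students_gpa ON Students(gpa)")
after_index = conn.execute(QUERY).fetchall()

# Logical change: extend the schema with a column the application ignores.
conn.execute("ALTER TABLE Students ADD COLUMN email TEXT")
after_column = conn.execute(QUERY).fetchall()

print(before, after_index, after_column)
```

Larger logical changes, such as splitting a table, usually need a view that presents the old shape to existing applications.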

21

slide-22
SLIDE 22

Concurrency Control

  • Concurrent execution (threads) of user programs is required for good DBMS performance.
  • Since disk access is frequent and slow, it is important to keep the CPU humming by working on several user programs concurrently.
  • Interleaving actions of different user programs can lead to inconsistency, e.g., a check clears while the account balance is still being computed.
  • DBMS ensures such problems do not happen: users can pretend they are using a single-user system.
  • Alternatively, other database models allow eventual consistency.
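The interleaving problem above can be sketched in plain Python. The first function replays one bad schedule by hand, so the lost update is deterministic. The second uses a threading.Lock as a stand-in for the DBMS's concurrency control; real systems use locking protocols or multiversioning rather than a single lock. The Account class and the amounts are hypothetical.

```python
import threading

def interleaved_lost_update(balance):
    """Replay a bad schedule: both programs read before either writes."""
    t1_read = balance          # T1 reads the balance
    t2_read = balance          # T2 reads the same (soon stale) balance
    balance = t1_read + 100    # T1 writes a deposit of 100
    balance = t2_read - 30     # T2 clears a check for 30, overwriting T1
    return balance

class Account:
    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def apply(self, delta):
        # Read-modify-write happens as one unit while the lock is held
        with self._lock:
            current = self.balance
            self.balance = current + delta

def run_with_lock(balance):
    acct = Account(balance)
    threads = [threading.Thread(target=acct.apply, args=(d,))
               for d in (100, -30)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return acct.balance

lost = interleaved_lost_update(500)   # the deposit disappears
safe = run_with_lock(500)             # both updates survive
print(lost, safe)
```

With the lock, each program behaves as if it were the only user, which is the illusion the DBMS provides.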

22

slide-23
SLIDE 23

Transaction

  • A transaction is an execution of a DB program.
  • Key concept of transaction: an atomic sequence of database actions (reads/writes).
  • Each transaction, executed completely, must leave the DB in a consistent state (provided the DB was consistent when the transaction began).
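Atomicity can be sketched with sqlite3, whose connection object works as a context manager that commits on success and rolls back on an exception. The Accounts table, the transfer function, and the fail_midway flag are hypothetical names for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO Accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount, fail_midway=False):
    """Move money as one atomic unit: both updates happen or neither does."""
    try:
        with conn:  # commit on normal exit, rollback on exception
            conn.execute(
                "UPDATE Accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
            if fail_midway:
                raise RuntimeError("simulated crash between the two writes")
            conn.execute(
                "UPDATE Accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
        return True
    except RuntimeError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM Accounts"))

failed = transfer(conn, "alice", "bob", 30, fail_midway=True)
after_failure = balances(conn)   # the debit was rolled back
succeeded = transfer(conn, "alice", "bob", 30)
after_success = balances(conn)
print(after_failure, after_success)
```

In both outcomes the total across accounts stays at 150, so the database moves from one consistent state to another.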

23

slide-24
SLIDE 24

Transactions

  • Users can specify some simple integrity constraints on the data and the DBMS will enforce these constraints.
  • The DBMS does not really understand the semantics (meaning) of the data.

– E.g., how interest is calculated on student overdue accounts.

  • So, ensuring that a transaction (run alone) preserves consistency is ultimately the user’s responsibility!
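A short sqlite3 sketch of the division of labor described above. The DBMS enforces declared constraints (here a primary key and a CHECK on gpa), but it cannot tell whether a value that passes them is actually right. The table definition, the 0.0 to 4.0 range, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Students (
        sid  TEXT PRIMARY KEY,
        name TEXT NOT NULL,
        gpa  REAL CHECK (gpa BETWEEN 0.0 AND 4.0)
    )
""")

def try_insert(conn, row):
    """Return 'accepted' or 'rejected' depending on the declared constraints."""
    try:
        with conn:
            conn.execute("INSERT INTO Students VALUES (?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

results = [
    try_insert(conn, ("s1", "Bono", 3.5)),   # satisfies every constraint
    try_insert(conn, ("s2", "Edge", 4.7)),   # violates the CHECK on gpa
    try_insert(conn, ("s1", "Adam", 3.0)),   # duplicate primary key
    try_insert(conn, ("s3", "Larry", 0.4)),  # passes, even if 4.0 was meant
]
print(results)
```

The last row shows the limit: a mistyped but in-range value is accepted, so correctness of the data's meaning remains with the user's program.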

24

slide-25
SLIDE 25

Polyglot Persistence

  • Using different data storage technologies to handle varying data storage needs.
  • An application that talks to different databases, using each for what it is best at, to achieve an end goal.
  • http://www.dummies.com/programming/big-data/engineering/big-data-and-polyglot-persistence/

  • http://www.informit.com/articles/article.aspx?p=1930511
  • https://martinfowler.com/bliki/PolyglotPersistence.html
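As a rough sketch of the idea, the application below keeps short-lived shopping carts in a key-value store and durable orders in a relational store. Everything here is an assumption for illustration: a plain dict stands in for a real key-value database, sqlite3 stands in for the relational one, and the function and table names are hypothetical.

```python
import sqlite3

# Relational store: durable, queryable order records
orders_db = sqlite3.connect(":memory:")
orders_db.execute("""
    CREATE TABLE Orders (
        order_id INTEGER PRIMARY KEY,
        user TEXT, item TEXT, qty INTEGER
    )
""")

# Key-value store stand-in: fast lookups by user, no schema needed
session_store = {}

def add_to_cart(user, item, qty):
    cart = session_store.setdefault(user, {})
    cart[item] = cart.get(item, 0) + qty

def checkout(user):
    """Move the user's cart from the key-value store into relational orders."""
    cart = session_store.pop(user, {})
    with orders_db:  # one transaction for the whole order
        orders_db.executemany(
            "INSERT INTO Orders (user, item, qty) VALUES (?, ?, ?)",
            [(user, item, qty) for item, qty in cart.items()])
    return len(cart)

add_to_cart("bono", "guitar", 1)
add_to_cart("bono", "strings", 2)
add_to_cart("bono", "strings", 1)
lines_written = checkout("bono")
order_rows = orders_db.execute(
    "SELECT item, qty FROM Orders ORDER BY item").fetchall()
print(lines_written, order_rows)
```

Each store handles the access pattern it suits: the cart needs quick per-user reads and writes, while orders need transactions and ad hoc queries.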

25

slide-26
SLIDE 26

A Note on DBMSs: there are many

26