CMPSCI 645 Database Design & Implementation Instructor: Gerome - - PowerPoint PPT Presentation
CMPSCI 645 Database Design & Implementation Instructor: Gerome - - PowerPoint PPT Presentation
Welcome to CMPSCI 645 Database Design & Implementation Instructor: Gerome Miklau Overview of Databases Gerome Miklau CMPSCI 645 Database Design & Implementation UMass Amherst Jan 29, 2008 Some slide content courtesy of Zack
Overview of Databases
Gerome Miklau
CMPSCI 645 – Database Design & Implementation UMass Amherst
Jan 29, 2008
Some slide content courtesy of Zack Ives, Ramakrishnan & Gehrke, Dan Suciu, Ullman & Widom
Today
- Student information form
- Overview of databases
- Course topics
- Course requirements
Databases & DBMS’s
- A database is a large, integrated collection of
data.
- A database management system (DBMS) is a
collection of software designed to store and manage databases, allowing:
– Define the kind of data stored – Querying/updating interface – Reliable storage & recovery of 100s of GB – Control access to data from many concurrent users
Can filesystems do it?
- Schema for files is limited
- No query language for data in files
- Files can store large amounts of data, but
– no efficient access to items within file – no recovery from failure
- Concurrent access not safe
No
Evolution
- Early DBMS’s (1960’s), evolved from
file systems.
- Data with many small items & many
queries or modifications:
– Airline reservations – Banking
Early DB systems
- Tree-based hierarchical data model
- Graph-based network data model
- Encouraged users to think about data
the way it was stored.
- No high level query language
Data model
The data model includes basic assumptions about what’s an “item” of data, how to represent it and interpret it.
The Relational Model
- The relational data model (Codd, 1970):
– Data independence: details of physical storage are hidden from users – High-level declarative query language
- say what you want, not how to compute it.
- mathematical foundation
– A theory of normalization guides the design of relations Side-note: Turing Awards in Databases 1973: Bachman, networked data model 1981: Codd, relational model 1998: Jim Gray, transaction processing
DBMS Benefit #1: Generality and Declarativity
- The programmer or user does not need to
know details like indices, sort orders, machine speeds, disk speeds, concurrent users, etc.
- Instead, the programmer/user programs with
a logical model in mind
- The DBMS “makes it happen” based on an
understanding of relative costs of different methods
Benefit #2: Efficiency and Scale
- Efficient storage of hundreds of GBs of data
- Efficient access to data
- Rapid processing of transactions
Benefit #3: Management of Concurrency and Reliability
- Simultaneous transactions handled safely.
- Recovery of system data after system failure.
- More formally: the ACID properties
– Atomicity - all or nothing – Consistency - sensible state not violated – Isolation - separated from effects – Durability - once completed, never lost
How Does One Build a Database?
- Start with a conceptual model
- Design & implement schema
- Write applications using DBMS and other
tools
– Many ways of doing this (DBMS, API writers, library authors, web server, etc.) – Common applications include PHP/JSP/servlet- driven web sites
- The DBMS takes care of query optimization
and execution
Conceptual Design
STUDENT COURSE Takes name sid cid name PROFESSOR Teaches semester fid name
Designing a Schema (Set of Relations)
- Convert to tables +
constraints
- Then need to do
“physical” design: the layout on disk, indices, etc.
sid name 1 Jill 2 Bo 3 Maya fid name 1 Diao 2 Saul 8 Weems sid cid 1 645 1 683 3 635 cid name sem 645 DB F05 683 AI S05 635 Arch F05 fid cid 1 645 2 683 8 635
STUDENT Takes COURSE PROFESSOR Teaches
Queries
- Find all courses that “Mary” takes
- What happens behind the scene ?
– Query processor figures out how to answer the query efficiently. SELECT C.name FROM Students S, Takes T, Courses C WHERE S.name=“Mary” and S.sid = T.sid and T.cid = C.cid
Queries, behind the scene
Query execution plan: SELECT C.name FROM Students S, Takes T, Courses C WHERE S.name=“Mary” and S.sid = T.sid and T.cid = C.cid Declarative SQL query
Students Takes
sid=sid sname name=“Mary” cid=cid
Courses
The optimizer chooses the best execution plan for a query
An Issue: 80% of the World’s Data is Not in a DB!
Examples:
– Scientific data (large images, complex programs that analyze the data) – Personal data – WWW and email (some of it is stored in something resembling a DBMS)
Data management is expanding to tackle these problems
DBMSs in the Real World
A huge industry for 20% of the world’s data!
- Big, mature relational databases
– IBM DB2, Oracle, Microsoft SQL Server – Adding advanced features, including “native XML” support
- “Middleware” above these systems
– SAP, Siebel, PeopleSoft, dozens of special-purpose apps
- Integration and warehousing systems
– BEA AquaLogic, DB2 Information Integrator
- Current trends:
– Web services; XML everywhere – Smarter, self-tuning systems – Distributed databases, column-stores
Database Research
- One of the broadest, most exciting areas in CS!
- A microcosm of CS in general
- languages, operating systems, concurrent
programming, data structures, algorithms, theory, distributed systems, statistical techniques.
- Theory and systems well-integrated.
Recent Trends in Databases
- XML
– Relational databases with XML support – Middleware between XML and relational databases – Large-scale XML message systems
- Main memory database systems
- Peer data management
- Stream data management
- Model management, provenance
- Security and privacy
- Modeling uncertainty, probabilistic databases
What is the Field of Databases ?
- To an applied researcher (SIGMOD/VLDB/ICDE)
– Query optimization – Query processing (yet-another join algorithm) – Transaction processing, recovery (but most stuff is already done) – Novel applications: data mining, high-dimensional search
- To a theoretical researcher (PODS/ICDT/LICS)
– Focus on the query languages – Query language = logic = complexity classes
Course topics
- Fundamentals: relational design, query
languages.
- Theory: expressiveness of query languages,
static analysis, complexity.
- Database internals: storage, indexing, query
processing, query optimization, transaction management.
- XML and semi-structured data models.
- Security: access control, privacy.
- Advanced topics: incomplete/probabilistic
DBs, parallel and distributed DBs.
Prerequisites
- Official: undergrad course in DB or OS
- Also:
– Elementary complexity theory
Grading
- Homework: 30%
- Paper reviews & participation: 15%
- Midterm: 30%
- Project: 25%
Homework: 30%
- ~ 4 assignments throughout the course
– written problem sets – practical experience with SQL, XQuery
Paper Reviews & Participation: 15%
- Approximately 5 classic papers will be
assigned
- Short written reviews are due before
the day of class. Email to:
– cs645-reviews@cs.umass.edu
First paper review: Read thru 1.4 of Codd’s paper Due Wed Feb 5th
Project: 25%
- General theme: apply database principles to a new
problem
- Suggested topics will be discussed next Tuesday
- Groups of 2 preferred. 3 possible.
- Project work will include:
– Reading some of the research literature – Implementation – Written report – In-class presentation
- Periodic consultation with the instructor
Midterm Exam (30%)
- Midterm scheduled for Apr 17th at 7pm
- (No Final!)
Textbook
Database Management Systems
Ramakrishnan and Gehrke
Other useful resources
- Database systems: the complete book (Ullman,
Widom and Garcia-Molina)
- Readings in Database Systems (Stonebraker and
Hellerstein)
- Foundations of Databases (Abiteboul, Hull, Vianu)
- Data on the Web (Abiteboul, Buneman, Suciu)
- Parallel and Distributed DBMS (Ozsu and Valduriez)
- Transaction Processing (Gray and Reuter)
- Data and Knowledge based Systems (volumes I, II)
(Ullman)
- Proceedings of SIGMOD, VLDB, PODS conferences.
Communication
- Instructor
– Office hours: by appointment – Email: miklau at cs dot umass dot edu
- Check the course webpage often
- You should have been added to the
mailing list.
31
Questions about the course?
32