cmpsci 645 database design implementation
play

CMPSCI 645 Database Design & Implementation Instructor: Gerome - PowerPoint PPT Presentation

Welcome to CMPSCI 645 Database Design & Implementation Instructor: Gerome Miklau Overview of Databases Gerome Miklau CMPSCI 645 Database Design & Implementation UMass Amherst Jan 29, 2008 Some slide content courtesy of Zack


  1. Welcome to CMPSCI 645 Database Design & Implementation Instructor: Gerome Miklau

  2. Overview of Databases Gerome Miklau CMPSCI 645 – Database Design & Implementation UMass Amherst Jan 29, 2008 Some slide content courtesy of Zack Ives, Ramakrishnan & Gehrke, Dan Suciu, Ullman & Widom

  3. Today • Student information form • Overview of databases • Course topics • Course requirements

  4. Databases & DBMS’s • A database is a large, integrated collection of data. • A database management system (DBMS) is a collection of software designed to store and manage databases, allowing: – Define the kind of data stored – Querying/updating interface – Reliable storage & recovery of 100s of GB – Control access to data from many concurrent users

  5. Can filesystems do it? No • Schema for files is limited • No query language for data in files • Files can store large amounts of data, but – no efficient access to items within file – no recovery from failure • Concurrent access not safe

  6. Evolution • Early DBMS’s (1960’s), evolved from file systems. • Data with many small items & many queries or modifications: – Airline reservations – Banking

  7. Early DB systems Data model The data model includes basic assumptions about what’s an “item” of data, how to represent it and interpret it. • Tree-based hierarchical data model • Graph-based network data model • Encouraged users to think about data the way it was stored. • No high level query language

  8. The Relational Model •The relational data model (Codd, 1970): – Data independence: details of physical storage are hidden from users – High-level declarative query language • say what you want, not how to compute it. • mathematical foundation – A theory of normalization guides the design of relations Side-note: Turing Awards in Databases 1973: Bachman, networked data model 1981: Codd, relational model 1998: Jim Gray, transaction processing

  9. DBMS Benefit #1: Generality and Declarativity • The programmer or user does not need to know details like indices, sort orders, machine speeds, disk speeds, concurrent users, etc. • Instead, the programmer/user programs with a logical model in mind • The DBMS “makes it happen” based on an understanding of relative costs of different methods

  10. Benefit #2: Efficiency and Scale • Efficient storage of hundreds of GBs of data • Efficient access to data • Rapid processing of transactions

  11. Benefit #3: Management of Concurrency and Reliability • Simultaneous transactions handled safely. • Recovery of system data after system failure. • More formally: the ACID properties – Atomicity - all or nothing – Consistency - sensible state not violated – Isolation - separated from effects – Durability - once completed, never lost

  12. How Does One Build a Database? • Start with a conceptual model • Design & implement schema • Write applications using DBMS and other tools – Many ways of doing this (DBMS, API writers, library authors, web server, etc.) – Common applications include PHP/JSP/servlet- driven web sites • The DBMS takes care of query optimization and execution

  13. Conceptual Design fid PROFESSOR name Teaches STUDENT COURSE Takes cid name semester sid name

  14. Designing a Schema (Set of Relations) STUDENT COURSE Takes sid name cid name sem sid cid 1 Jill 645 DB F05 1 645 2 Bo 683 AI S05 1 683 3 Maya 635 Arch F05 3 635 PROFESSOR Teaches • Convert to tables + fid name fid cid constraints 1 Diao 1 645 • Then need to do 2 Saul 2 683 “physical” design: the layout on disk, indices, 8 Weems 8 635 etc.

  15. Queries • Find all courses that “Mary” takes SELECT C.name FROM Students S, Takes T, Courses C WHERE S.name=“Mary” and S.sid = T.sid and T.cid = C.cid • What happens behind the scene ? – Query processor figures out how to answer the query efficiently.

  16. Queries, behind the scene Declarative SQL query Query execution plan: sname SELECT C.name FROM Students S, Takes T, Courses C WHERE S.name=“Mary” and S.sid = T.sid and T.cid = C.cid cid=cid sid=sid name=“Mary” Courses Takes Students The optimizer chooses the best execution plan for a query

  17. An Issue: 80% of the World’s Data is Not in a DB! Examples: – Scientific data (large images, complex programs that analyze the data) – Personal data – WWW and email (some of it is stored in something resembling a DBMS) Data management is expanding to tackle these problems

  18. DBMSs in the Real World A huge industry for 20% of the world’s data! • Big, mature relational databases – IBM DB2, Oracle, Microsoft SQL Server – Adding advanced features, including “native XML” support • “Middleware” above these systems – SAP, Siebel, PeopleSoft, dozens of special-purpose apps • Integration and warehousing systems – BEA AquaLogic, DB2 Information Integrator • Current trends: – Web services; XML everywhere – Smarter, self-tuning systems – Distributed databases, column-stores

  19. Database Research • One of the broadest, most exciting areas in CS! • A microcosm of CS in general • languages, operating systems, concurrent programming, data structures, algorithms, theory, distributed systems, statistical techniques. • Theory and systems well-integrated.

  20. Recent Trends in Databases • XML – Relational databases with XML support – Middleware between XML and relational databases – Large-scale XML message systems • Main memory database systems • Peer data management • Stream data management • Model management, provenance • Security and privacy • Modeling uncertainty, probabilistic databases

  21. What is the Field of Databases ? • To an applied researcher (SIGMOD/VLDB/ICDE) – Query optimization – Query processing (yet-another join algorithm) – Transaction processing, recovery (but most stuff is already done) – Novel applications: data mining, high-dimensional search • To a theoretical researcher (PODS/ICDT/LICS) – Focus on the query languages – Query language = logic = complexity classes

  22. Course topics • Fundamentals : relational design, query languages. • Theory : expressiveness of query languages, static analysis, complexity. • Database internals : storage, indexing, query processing, query optimization, transaction management. • XML and semi-structured data models. • Security: access control, privacy. • Advanced topics : incomplete/probabilistic DBs, parallel and distributed DBs.

  23. Prerequisites • Official: undergrad course in DB or OS • Also: – Elementary complexity theory

  24. Grading • Homework: 30% • Paper reviews & participation: 15% • Midterm: 30% • Project: 25%

  25. Homework: 30% • ~ 4 assignments throughout the course – written problem sets – practical experience with SQL, XQuery

  26. Paper Reviews & Participation: 15% • Approximately 5 classic papers will be assigned • Short written reviews are due before the day of class. Email to: – cs645-reviews@cs.umass.edu First paper review: Read thru 1.4 of Codd’s paper Due Wed Feb 5th

  27. Project: 25% • General theme: apply database principles to a new problem • Suggested topics will be discussed next Tuesday • Groups of 2 preferred. 3 possible. • Project work will include: – Reading some of the research literature – Implementation – Written report – In-class presentation • Periodic consultation with the instructor

  28. Midterm Exam (30%) • Midterm scheduled for Apr 17th at 7pm • (No Final!)

  29. Textbook Database Management Systems Ramakrishnan and Gehrke

  30. Other useful resources • Database systems: the complete book (Ullman, Widom and Garcia-Molina) • Readings in Database Systems (Stonebraker and Hellerstein) • Foundations of Databases (Abiteboul, Hull, Vianu) • Data on the Web (Abiteboul, Buneman, Suciu) • Parallel and Distributed DBMS (Ozsu and Valduriez) • Transaction Processing (Gray and Reuter) • Data and Knowledge based Systems (volumes I, II) (Ullman) • Proceedings of SIGMOD, VLDB, PODS conferences.

  31. Communication • Instructor – Office hours: by appointment – Email: miklau at cs dot umass dot edu • Check the course webpage often • You should have been added to the mailing list. 31

  32. Questions about the course? 32

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend