What is a database system?

Course introduction
Introduction to databases
CSCC43 Winter 2013
Ryan Johnson
Thanks to Arnold Rosenbloom and Renee Miller for material in these slides

What is a database system?
• Database: a large, integrated collection of data.
• Models a real-world enterprise
  – Entities (teams, games)
  – Relationships (Orhan Pamuk received the Nobel Prize)
  – Constraints (at least one doctor on duty during off-hours)
  – More recently, active components (“business logic”)
• Database Management System (DBMS): a software system designed to store, manage, and facilitate access to databases.

In the beginning…
• There was The Mainframe
  – Cost: millions
  – Watts: millions
  – Size: acres
  – Speed: 40kHz
  – Memory: 2kB
  – Storage: 3.5MB (tape)
[Image: UNIVAC (1951)]
Few organizations could afford two!

Early computing challenges
• Time sharing
  – ~100 terminals per mainframe
  – Users share hardware
  – Want to share data, too
  => “The Database”
• Bare hardware
  – No OS
  – No device drivers
  – No file system
  => File Management System
[Images: SAGE (1954), SABRE (1960)]
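The “What is a database system?” slide names three ingredients of a model: entities, relationships, and constraints. A minimal sketch with Python's built-in `sqlite3` module makes them concrete. The schema (tables `team` and `game`), the team names, and the helper functions are hypothetical, invented for illustration: teams and games are entities, each game relates two teams through foreign keys, and a `CHECK` constraint forbids a team playing itself. The point is that the DBMS, not application code, rejects rows that break the rules.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    """In-memory database with a hypothetical teams/games schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE team (
            name TEXT PRIMARY KEY                        -- entity
        );
        CREATE TABLE game (
            id   INTEGER PRIMARY KEY,                    -- entity
            home TEXT NOT NULL REFERENCES team(name),    -- relationship to a team
            away TEXT NOT NULL REFERENCES team(name),    -- relationship to a team
            CHECK (home <> away)                         -- constraint: no self-play
        );
    """)
    # Hypothetical sample teams.
    conn.executemany("INSERT INTO team(name) VALUES (?)", [("Leafs",), ("Canadiens",)])
    conn.commit()
    return conn

def try_insert_game(conn: sqlite3.Connection, home: str, away: str) -> bool:
    """Return True if the DBMS accepted the game, False if a constraint rejected it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO game(home, away) VALUES (?, ?)", (home, away))
        return True
    except sqlite3.IntegrityError:
        return False

if __name__ == "__main__":
    db = make_db()
    print(try_insert_game(db, "Leafs", "Canadiens"))  # valid game
    print(try_insert_game(db, "Leafs", "Leafs"))      # violates the CHECK constraint
    print(try_insert_game(db, "Leafs", "Senators"))   # unknown team: violates the foreign key
```

A cross-row rule such as “at least one doctor on duty during off-hours” cannot be written as a simple per-row `CHECK`; that kind of constraint typically needs triggers or application-level enforcement.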

“The Database”
• Abstract concept dating back to the 1950’s
  – Centralized repository for all the enterprise’s data
  – Realtime updates from many sources
  – Concurrent access by many users
  – Interactive (ad-hoc) exploration and reporting
• Semi Automatic Ground Environment (SAGE)
  – Computer-aided tracking and interception of aircraft
  – Dozens of SAGE installations (big one in North Bay)
  – Hundreds of radar stations throughout North America
  – Thousands of operators
Goal: all relevant information at your fingertips

File management systems (FMS)
• File management ca. 1935
  – File: box of punchcards
  – Metadata: label on the box
  – Ad-hoc report: no big deal
  – Hardware change: no big deal
• File management ca. 1955
  – File: several km of magnetic tape
  – Metadata: embedded in application logic
  – Ad-hoc report: hire a couple programmers
  – Hardware change: hire a dozen programmers…
Huge need for portability, abstraction

Database Management System
• File management systems meet The Database
  – Protect users from each other (isolation, consistency)
  – Protect application from data changes (at logical level)
  – Protect data from hardware changes (at physical level)
• Split personality remains to this day
  – Theory/applications (declarative access to changing data)
  – Systems (make it run fast on ever-changing hardware)
• A practical discipline spanning much of CS
  – OS, languages, theory, AI, multimedia, logic
  – Yet with a focus on real-world apps
This semester: the theory/application side

Why study databases??
• Shift from computation to information
  – always true for corporate computing
  – Web made this point for personal computing
  – more and more true for scientific computing
• Need for DBMS has exploded
  – Corporate: retail swipe/clickstreams, “customer relationship mgmt”, “supply chain mgmt”, “data warehouses”, etc.
  – Scientific: digital libraries, Human Genome project, Sloan Digital Sky Survey, physical sensors, grid physics network
• Why so important?
  – Rate of change of DB applications is incredibly slow
  – $d(\mathrm{app})/dt \ll d(\mathrm{platform})/dt$
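“Protect application from data changes (at logical level)” can be illustrated with a view. In this minimal sketch, again using Python's `sqlite3`, the application issues one fixed query against a view named `customer_v`; the table layout underneath is then restructured (a flat table in version 1, a customer/address split in version 2) and only the view definition changes. All table, view, and column names, and the sample rows, are hypothetical. The two versions are built in separate in-memory databases rather than migrated in place, to keep the sketch short.

```python
import sqlite3

# The application's only query: it names the view, never the base tables.
APP_QUERY = "SELECT name, city FROM customer_v ORDER BY name"

def schema_v1(conn: sqlite3.Connection) -> None:
    """Original layout (hypothetical): one flat table."""
    conn.executescript("""
        CREATE TABLE customer (name TEXT PRIMARY KEY, city TEXT);
        INSERT INTO customer VALUES ('Ada', 'Toronto'), ('Ben', 'Ottawa');
        CREATE VIEW customer_v AS SELECT name, city FROM customer;
    """)

def schema_v2(conn: sqlite3.Connection) -> None:
    """Restructured layout (hypothetical): addresses moved to their own table."""
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE address (customer_id INTEGER REFERENCES customer(id), city TEXT);
        INSERT INTO customer VALUES (1, 'Ada'), (2, 'Ben');
        INSERT INTO address VALUES (1, 'Toronto'), (2, 'Ottawa');
        CREATE VIEW customer_v AS
            SELECT c.name, a.city FROM customer c JOIN address a ON a.customer_id = c.id;
    """)

def run_app(build_schema) -> list:
    """Build a fresh database with the given layout and run the unchanged app query."""
    conn = sqlite3.connect(":memory:")
    build_schema(conn)
    return conn.execute(APP_QUERY).fetchall()

if __name__ == "__main__":
    print(run_app(schema_v1))
    print(run_app(schema_v2))  # same rows, same query text
```

The application code is identical in both runs; only the logical mapping (the view) absorbed the schema change.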

What’s the intellectual content?
• Representing information
  – data modeling
• Languages and systems for querying data
  – complex queries with real semantics*
  – over massive data sets
• Concurrency control for data manipulation
  – controlling concurrent access
  – ensuring transactional semantics
• Reliable data storage
  – maintain data semantics even if the lights go out
* semantics: the meaning or relationship of meanings of a sign or set of signs

Is the WWW a DBMS?
• Fairly sophisticated search available
  – Crawler indexes pages on the web
  – Keyword-based search for pages
• But…
  – Data is mostly unstructured and untyped
  – Search only (can’t modify, summarize, analyze, correlate, …)
  – Few (zero) guarantees of freshness, accuracy, durability, consistency
  – DBMS lurking behind most Web sites provides these functions
• The picture is changing
  – New standards like XML can help data modeling
  – The WWW/DB boundary is blurry!

“Search” vs. Query
• What if you wanted to find out which actors donated to Stephen Harper’s campaign?
• Try “actors donate to harper campaign” in your favorite search engine.
• Stephen Harper (politician) or Hill Harper (actor)?
• Did Harper give or receive the donation?
• Year? Comparison with other donations?

Is my file system a DBMS?
• Strong shared heritage
  – Direct descendant of file management system
  – Excellent insulator against hardware changes
• But…
  – Data is mostly unstructured and untyped
  – No concept of constraints, relationships
  – Minimal support for atomicity, isolation, consistency
• The picture is changing
  – File systems adopting database concepts (logging, transactions)
  – Object-oriented file systems provide finer grain data model
  – The FS/DBMS boundary is blurry!
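The “Search” vs. Query slide lists the ambiguities a keyword search cannot resolve: which Harper, who gave to whom, and when. With typed, structured data a query states each of these explicitly. The sketch below uses a hypothetical `person`/`donation` schema and entirely invented rows (no real donation records); both people are simply named `Harper`, so only the `occupation` attribute tells them apart. `NAIVE` mimics a keyword match on “Harper”, while `QUERY` pins down donor role, recipient identity, and direction.

```python
import sqlite3

def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, occupation TEXT);
        CREATE TABLE donation (
            donor_id     INTEGER REFERENCES person(id),
            recipient_id INTEGER REFERENCES person(id),
            year INTEGER, amount INTEGER);
        -- Hypothetical rows, invented for illustration only.
        INSERT INTO person VALUES
            (1, 'Harper', 'politician'),
            (2, 'Harper', 'actor'),
            (3, 'Actor A', 'actor'),
            (4, 'Actor B', 'actor');
        INSERT INTO donation VALUES
            (3, 1, 2011, 500),  -- actor gives to the politician Harper
            (2, 4, 2011, 100),  -- the actor Harper gives to another actor
            (1, 4, 2010, 50),   -- the politician Harper gives (not receives)
            (4, 2, 2012, 75);   -- actor gives to the *actor* Harper
    """)
    return conn

# Keyword-style match: every donation that mentions "Harper" on either side.
NAIVE = """
    SELECT COUNT(*) FROM donation x
    JOIN person d ON d.id = x.donor_id
    JOIN person r ON r.id = x.recipient_id
    WHERE d.name = 'Harper' OR r.name = 'Harper'
"""

# Query with real semantics: actors as donors, the politician Harper as recipient.
QUERY = """
    SELECT d.name, x.year, x.amount FROM donation x
    JOIN person d ON d.id = x.donor_id
    JOIN person r ON r.id = x.recipient_id
    WHERE d.occupation = 'actor'
      AND r.name = 'Harper' AND r.occupation = 'politician'
"""

if __name__ == "__main__":
    conn = build_db()
    print(conn.execute(NAIVE).fetchone()[0])  # all four rows mention Harper
    print(conn.execute(QUERY).fetchall())     # only the one relevant donation
```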

Database vs. file system
• Thought experiment #1
  – You and your project partner are editing the same file.
  – You both save it at the same time.
  – Whose changes survive? A) Yours B) Partner’s C) Both D) Neither E) Who knows
• Thought experiment #2
  – You’re updating a file when the lights go out
  – Which of your changes survive? A) All B) None C) All since last save D) Who knows
• How to code against “who knows”???
  – Very, very carefully…

OS support for data management
• Again, strong shared heritage
  – Another direct descendant of file management system
  – Powerful API abstractions
  – Bring your favorite programming language
  – Enforces protections on files, objects
• But…
  – Scheduling, resource management inadequate for big data
  – Error handling: “program terminated with SIGSEGV”
  – Ad-hoc query? Hire a programmer…
  – Concurrency? Write code very, very carefully…

DBMS vs. {OS, FS, WWW}
• Key services missing from some or all
  – Recovery, isolation, consistency
  – Support for ad-hoc queries
  – Effective concurrency control
  – Preserve semantics across crashes, outages
• SMOP? Simple matter of programming?
  – Not really (we’ll see this semester)
  – In fact, OS/FS often get in the way (next semester)
  – Analogy: Memory management in C++ vs. Java
• Misquoting Greenspun’s tenth rule: Any sufficiently complex data processing system resembles a buggy, half-implemented, and poorly performing DBMS

Concept: transaction
• “Business transaction”
  – Old idea: withdraw money, reserve seats, escrow, etc.
  – Atomic: I deliver and you pay, or neither
  – Consistent: Sell each seat to only one person
  – Isolated: Doctor doesn’t talk about the patient next door
  – Durable: Sales receipt, confirmation number, etc.
• Database transaction
  – Sequence of reads and writes to underlying data
  – Writes [appear to] take effect atomically
  – Each transaction moves the system between consistent states**
  – Transactions can’t see (or interfere with) each other
  – Once the system returns success it will not lose the data
Formalized into an entire programming model
** The user is responsible for writing sane transactions
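The atomic and consistent properties of a transaction can be seen from the application side with a money transfer, in the spirit of “I deliver and you pay, or neither”. This minimal sketch uses Python's `sqlite3`, whose connection context manager commits on success and rolls back on an exception. The `account` table, the `transfer` helper, and the `crash_midway` flag are hypothetical; the “crash” is just a Python exception raised between the two writes, so the sketch shows rollback, not durability across a real power failure. The `CHECK (balance >= 0)` constraint plays the role of the consistency rule.

```python
import sqlite3

def open_bank() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (owner TEXT PRIMARY KEY, "
                  "balance INTEGER NOT NULL CHECK (balance >= 0))")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("buyer", 100), ("seller", 0)])
    conn.commit()
    return conn

def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int,
             crash_midway: bool = False) -> bool:
    """Credit dst and debit src as one transaction: both writes commit, or neither does."""
    try:
        with conn:  # commit if the block finishes, roll back if it raises
            conn.execute("UPDATE account SET balance = balance + ? WHERE owner = ?",
                         (amount, dst))
            if crash_midway:
                raise RuntimeError("simulated failure between the two writes")
            # An overdraft violates the CHECK constraint and raises IntegrityError.
            conn.execute("UPDATE account SET balance = balance - ? WHERE owner = ?",
                         (amount, src))
        return True
    except (RuntimeError, sqlite3.IntegrityError):
        return False

def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT owner, balance FROM account"))

if __name__ == "__main__":
    bank = open_bank()
    print(transfer(bank, "buyer", "seller", 30), balances(bank))
    print(transfer(bank, "buyer", "seller", 30, crash_midway=True), balances(bank))
    print(transfer(bank, "buyer", "seller", 500), balances(bank))  # would overdraw
```

In the failed cases the seller's credit, already written inside the transaction, is undone by the rollback, so money is never created or destroyed.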

Concept: concurrency control
• Concurrent execution: key to high performance.
  – Disk accesses frequent, pretty slow
  – Keep the CPU working on several programs concurrently
• Interleaving two programs’ actions: trouble!
  – Print statements during active account transfer
  – He and She both withdraw the last $100 from the ATM
• DBMS ensures “anomalies” don’t arise
  – Give users/programmers illusion of a single-user system
  – Thank goodness! Don’t have to program “very, very carefully”.

Concept: data models
• Data model: a collection of concepts for describing data.
• Schema: a description of a particular collection of data, using a given data model.
• Many possible data models
  – Network, hierarchical, relational, object-oriented, …
  – The relational model is the most widely used today
A good data model is key to data independence

Concept: data independence
• FMS (1950’s)
  – File, metadata management
  – Hardware abstraction layer
• CODASYL/DBTG (1965)
  – Decouple application from schema
  – Decouple schema from physical data layout
• Edgar Codd (1970)
  – Relational algebra
  – Move from procedural to declarative
• Charles Bachman (1973)
  – Programmer navigates data instead of (merely) writing code
  – Move from machine-centric to data-centric programming
• Fast forward to today
  – SQL, ODBC/JDBC, federation, web services, …
  – Data integration, cleaning, performance tuning, …
Big Deal™… but still a work in progress

Advantages of a DBMS
• Data independence
• Efficient data access
• Data integrity & security
• Data administration
• Concurrent access, crash recovery
• Reduced application development time
• So why not use them always?
  – Expensive/complicated to set up & maintain
  – Cost & complexity must be offset by need
  – General-purpose, not suited for special-purpose tasks (e.g. text search!)
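The “He and She both withdraw the last $100” anomaly from the concurrency-control slide is a lost update, and it can be reproduced deterministically by replaying a schedule of steps. In this sketch (all names hypothetical) each withdrawal is two steps: `read` the shared balance, then `write`, which dispenses cash and stores a new balance if the earlier read showed enough money. Run serially, the second withdrawal is refused; interleaved, both reads see $100, so $200 leaves a $100 account. A DBMS's concurrency control exists to admit only schedules whose outcome matches some serial order.

```python
def run_schedule(schedule, balance=100, amount=100):
    """Replay (session, step) pairs for ATM withdrawals sharing one balance.

    Returns (final_balance, total_cash_dispensed).
    """
    seen = {}       # balance value each session observed at its 'read' step
    dispensed = 0
    for session, step in schedule:
        if step == "read":
            seen[session] = balance
        elif step == "write":
            if seen[session] >= amount:
                # Writes back a value computed from a possibly stale read.
                balance = seen[session] - amount
                dispensed += amount
    return balance, dispensed

SERIAL      = [("he", "read"), ("he", "write"), ("she", "read"), ("she", "write")]
INTERLEAVED = [("he", "read"), ("she", "read"), ("he", "write"), ("she", "write")]

if __name__ == "__main__":
    print(run_schedule(SERIAL))       # second withdrawal refused
    print(run_schedule(INTERLEAVED))  # lost update: twice the money dispensed
```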
