storing data review
play

Storing Data Review Data collection is an important issue Dirty - PDF document

Storing Data Review Data collection is an important issue Dirty data Multiple sources Data collection Data consolidation Data cleaning Data integration CMPT 354: Database I -- Storing Data 2 What Is the Next


  1. Storing Data

  2. Review • Data collection is an important issue – Dirty data – Multiple sources • Data collection • Data consolidation – Data cleaning – Data integration CMPT 354: Database I -- Storing Data 2

  3. What Is the Next After We Get Data? • Store data – Holders: a paper notebook, a spreadsheet, a relational database, … – We focus on relational tables in this course • Does a database need some structure? If so, how to construct the structure? • How can a user store data into a database? • Can we prevent two students sharing a same student-id from happening? CMPT 354: Database I -- Storing Data 3

  4. Outline • How to create a relational table? • How to insert data into a table? • Integration constraints: making the data correct • Data first or schema first? CMPT 354: Database I -- Storing Data 4

  5. Storing Data in Tables • For each web page, store the information about – URL, language, last update time, number of outgoing hyperlinks • A spreadsheet works! URL Language Last update # of out links www.sfu.ca HTML Sept. 5 45 CMPT 354: Database I -- Storing Data 5

  6. Tables • Relational databases store data in tables – A database can have many tables • Schema: WebPage(URL, language, update_date, n_out_links) CMPT 354: Database I -- Storing Data 6

  7. Create a Table • CREATE statement CREATE TABLE WebPage ( URL CHAR(256), language CHAR(40), update_date DATETIME, n_out_links INT ) DATETIME is the DATE type in SQL Server CMPT 354: Database I -- Storing Data 7

  8. Data Types (1) • Character strings – Fixed length: CHAR(n) – Varying length, up to n characters: VARCHAR(n) • Bit strings – Fixed length: BIT(n) – Varying length, up to n bits: BIT VARYING(n) • Boolean values BOOLEAN – TRUE, FALSE, or UNKNOWN CMPT 354: Database I -- Storing Data 8

  9. Data Types (2) • Integers INT or INTEGER – SHORTINT is also supported in some systems • Floating-point numbers – FLOAT or REAL – DOUBLE PRECESION – Real numbers with a fixed decimal point: DECIMAL(n, d) • 123.45 is a possible value of type DECIMAL(5, 2) • Dates and times – Exact time: DATE and TIME – Durations of time: INTERVAL CMPT 354: Database I -- Storing Data 9

  10. Table and Data • Table itself is not user data! – Is table data? • Table is a container of data • Data is represented as tuples in tables – Tuple (‘www.sfu.ca’, ‘HTML’, 9/5/2006, 45) CMPT 354: Database I -- Storing Data 10

  11. Insert a Tuple into a Table INSERT INTO WebPage(URL, language, update_date, n_out_links) VALUES (‘SFU’, ‘HTML’, ’9/5/2007’, 45) • The attributes and the corresponding values must be in the same order CMPT 354: Database I -- Storing Data 11

  12. Storing Complex Data • Multimedia objects are often huge – Music, video, e-books, … • BLOB: binary large object block • Complex data can be stored as BLOB’s – Applications are in charge of handling and explanation • Databases only store and retrieve BLOB’s as whole CMPT 354: Database I -- Storing Data 12

  13. Example • Extend the WebPage table to store web page content ALTER TABLE WebPage ADD content VARCHAR(MAX) CMPT 354: Database I -- Storing Data 13

  14. Storage in SQL Server – In Row Figure from http://www.microsoft.com/technet/prodtechnol/sql/2000/reskit/part3/c1161.mspx?mfr=true CMPT 354: Database I -- Storing Data 14

  15. Storage in SQL Server – Out Row Figure from http://www.microsoft.com/technet/prodtechnol/sql/2000/reskit/part3/c1161.mspx?mfr=true CMPT 354: Database I -- Storing Data 15

  16. Tips on Straightforward Design • In many cases, we can start with one table containing all attributes we want to store – Universal table as will be explained later • NULL value: value unknown, unavailable or non-applicable ALTER TABLE WebPage ADD title CHAR(256) INSERT INTO WebPage(URL, language, update_date, n_out_links, title) VALUES (‘www.my_page.net’, ‘HTML’, ‘9/5/2007’, 2, NULL) CMPT 354: Database I -- Storing Data 16

  17. Integrity Constraints – NOT NULL • Each web page must have a unique URL – NULL is not allowed in attribute URL CREATE TABLE WebPage ( URL CHAR(256) NOT NULL, language CHAR(40), update_date DATETIME NOT NULL, n_out_links INT NOT NULL ) CMPT 354: Database I -- Storing Data 17

  18. Integrity Constraints – Primary Key • Each web page must have a unique URL, an update time, and the number of out links – The language of an empty page can be undetermined CREATE TABLE WebPage ( URL CHAR(256) NOT NULL, language CHAR(40), update_date DATETIME NOT NULL, n_out_links INT NOT NULL, primary key (URL) ) CMPT 354: Database I -- Storing Data 18

  19. Data Integrity and Constraints • Integrity constraints: ensure accuracy and consistency of data in a database • Referential integrity: the properties that should be satisfied when data entries are referenced – Not NULL – Primary key – More will be discussed later CMPT 354: Database I -- Storing Data 19

  20. Schema-First or Data-First • Schema first approaches – Design a schema, collect data according to the schema – Traditional, well controlled applications: student information systems, library loan systems • Data first approaches – Data exists before a database is designed, or evolving data – More and more large applications: integrating data sources on the web (e.g., a database of blogs) CMPT 354: Database I -- Storing Data 20

  21. Summary • Creating tables in relational databases • Inserting tuples into a table • Specifying integrity constraints CMPT 354: Database I -- Storing Data 21

  22. To-Do List • Check the data types in SQL Sever – Some data types in standard SQL may not be available in SQL Server. Please find their correspondences • Create a table for web pages in SQL Server – Record URL, language, update time, number of outgoing links, and title – Use VARCHAR to store content CMPT 354: Database I -- Storing Data 22

  23. Advanced To-Do List • How can you extend the web page table to store the outgoing links? – Each outgoing link is a URL • SQL Server 2005 comes with a database AdventureWorks. Try to understand what data is stored in the database CMPT 354: Database I -- Storing Data 23

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend