Storing Data Review Data collection is an important issue Dirty - - PDF document
Storing Data Review Data collection is an important issue Dirty - - PDF document
Storing Data Review Data collection is an important issue Dirty data Multiple sources Data collection Data consolidation Data cleaning Data integration CMPT 354: Database I -- Storing Data 2 What Is the Next
CMPT 354: Database I -- Storing Data 2
Review
- Data collection is an important issue
– Dirty data – Multiple sources
- Data collection
- Data consolidation
– Data cleaning – Data integration
CMPT 354: Database I -- Storing Data 3
What Is the Next After We Get Data?
- Store data
– Holders: a paper notebook, a spreadsheet, a relational database, … – We focus on relational tables in this course
- Does a database need some structure? If
so, how to construct the structure?
- How can a user store data into a database?
- Can we prevent two students sharing a
same student-id from happening?
CMPT 354: Database I -- Storing Data 4
Outline
- How to create a relational table?
- How to insert data into a table?
- Integration constraints: making the data
correct
- Data first or schema first?
CMPT 354: Database I -- Storing Data 5
Storing Data in Tables
- For each web page, store the information
about
– URL, language, last update time, number of
- utgoing hyperlinks
- A spreadsheet works!
URL Language Last update # of out links www.sfu.ca HTML
- Sept. 5
45
CMPT 354: Database I -- Storing Data 6
Tables
- Relational databases store data in tables
– A database can have many tables
- Schema: WebPage(URL, language,
update_date, n_out_links)
CMPT 354: Database I -- Storing Data 7
Create a Table
- CREATE statement
CREATE TABLE WebPage ( URL CHAR(256), language CHAR(40), update_date DATETIME, n_out_links INT ) DATETIME is the DATE type in SQL Server
CMPT 354: Database I -- Storing Data 8
Data Types (1)
- Character strings
– Fixed length: CHAR(n) – Varying length, up to n characters: VARCHAR(n)
- Bit strings
– Fixed length: BIT(n) – Varying length, up to n bits: BIT VARYING(n)
- Boolean values BOOLEAN
– TRUE, FALSE, or UNKNOWN
CMPT 354: Database I -- Storing Data 9
Data Types (2)
- Integers INT or INTEGER
– SHORTINT is also supported in some systems
- Floating-point numbers
– FLOAT or REAL – DOUBLE PRECESION – Real numbers with a fixed decimal point: DECIMAL(n, d)
- 123.45 is a possible value of type DECIMAL(5, 2)
- Dates and times
– Exact time: DATE and TIME – Durations of time: INTERVAL
CMPT 354: Database I -- Storing Data 10
Table and Data
- Table itself is not user data!
– Is table data?
- Table is a container of data
- Data is represented as tuples in tables
– Tuple (‘www.sfu.ca’, ‘HTML’, 9/5/2006, 45)
CMPT 354: Database I -- Storing Data 11
Insert a Tuple into a Table
INSERT INTO WebPage(URL, language, update_date, n_out_links) VALUES (‘SFU’, ‘HTML’, ’9/5/2007’, 45)
- The attributes and the corresponding values
must be in the same order
CMPT 354: Database I -- Storing Data 12
Storing Complex Data
- Multimedia objects are often huge
– Music, video, e-books, …
- BLOB: binary large object block
- Complex data can be stored as BLOB’s
– Applications are in charge of handling and explanation
- Databases only store and retrieve BLOB’s
as whole
CMPT 354: Database I -- Storing Data 13
Example
- Extend the WebPage table to store web
page content
ALTER TABLE WebPage ADD content VARCHAR(MAX)
CMPT 354: Database I -- Storing Data 14
Storage in SQL Server – In Row
Figure from http://www.microsoft.com/technet/prodtechnol/sql/2000/reskit/part3/c1161.mspx?mfr=true
CMPT 354: Database I -- Storing Data 15
Storage in SQL Server – Out Row
Figure from http://www.microsoft.com/technet/prodtechnol/sql/2000/reskit/part3/c1161.mspx?mfr=true
CMPT 354: Database I -- Storing Data 16
Tips on Straightforward Design
- In many cases, we can start with one table
containing all attributes we want to store
– Universal table as will be explained later
- NULL value: value unknown, unavailable or
non-applicable
ALTER TABLE WebPage ADD title CHAR(256) INSERT INTO WebPage(URL, language, update_date, n_out_links, title) VALUES (‘www.my_page.net’, ‘HTML’, ‘9/5/2007’, 2, NULL)
CMPT 354: Database I -- Storing Data 17
Integrity Constraints – NOT NULL
- Each web page must have a unique URL
– NULL is not allowed in attribute URL CREATE TABLE WebPage ( URL CHAR(256) NOT NULL, language CHAR(40), update_date DATETIME NOT NULL, n_out_links INT NOT NULL )
CMPT 354: Database I -- Storing Data 18
Integrity Constraints – Primary Key
- Each web page must have a unique URL, an
update time, and the number of out links
– The language of an empty page can be undetermined CREATE TABLE WebPage ( URL CHAR(256) NOT NULL, language CHAR(40), update_date DATETIME NOT NULL, n_out_links INT NOT NULL, primary key (URL) )
CMPT 354: Database I -- Storing Data 19
Data Integrity and Constraints
- Integrity constraints: ensure accuracy and
consistency of data in a database
- Referential integrity: the properties that
should be satisfied when data entries are referenced
– Not NULL – Primary key – More will be discussed later
CMPT 354: Database I -- Storing Data 20
Schema-First or Data-First
- Schema first approaches
– Design a schema, collect data according to the schema – Traditional, well controlled applications: student information systems, library loan systems
- Data first approaches
– Data exists before a database is designed, or evolving data – More and more large applications: integrating data sources on the web (e.g., a database of blogs)
CMPT 354: Database I -- Storing Data 21
Summary
- Creating tables in relational databases
- Inserting tuples into a table
- Specifying integrity constraints
CMPT 354: Database I -- Storing Data 22
To-Do List
- Check the data types in SQL Sever
– Some data types in standard SQL may not be available in SQL Server. Please find their correspondences
- Create a table for web pages in SQL Server
– Record URL, language, update time, number of
- utgoing links, and title
– Use VARCHAR to store content
CMPT 354: Database I -- Storing Data 23
Advanced To-Do List
- How can you extend the web page table to
store the outgoing links?
– Each outgoing link is a URL
- SQL Server 2005 comes with a database
- AdventureWorks. Try to understand what