Storing Data Review Data collection is an important issue Dirty - - PDF document

storing data review
SMART_READER_LITE
LIVE PREVIEW

Storing Data Review Data collection is an important issue Dirty - - PDF document

Storing Data Review Data collection is an important issue Dirty data Multiple sources Data collection Data consolidation Data cleaning Data integration CMPT 354: Database I -- Storing Data 2 What Is the Next


slide-1
SLIDE 1

Storing Data

slide-2
SLIDE 2

CMPT 354: Database I -- Storing Data 2

Review

  • Data collection is an important issue

– Dirty data – Multiple sources

  • Data collection
  • Data consolidation

– Data cleaning – Data integration

slide-3
SLIDE 3

CMPT 354: Database I -- Storing Data 3

What Is the Next After We Get Data?

  • Store data

– Holders: a paper notebook, a spreadsheet, a relational database, … – We focus on relational tables in this course

  • Does a database need some structure? If

so, how to construct the structure?

  • How can a user store data into a database?
  • Can we prevent two students sharing a

same student-id from happening?

slide-4
SLIDE 4

CMPT 354: Database I -- Storing Data 4

Outline

  • How to create a relational table?
  • How to insert data into a table?
  • Integration constraints: making the data

correct

  • Data first or schema first?
slide-5
SLIDE 5

CMPT 354: Database I -- Storing Data 5

Storing Data in Tables

  • For each web page, store the information

about

– URL, language, last update time, number of

  • utgoing hyperlinks
  • A spreadsheet works!

URL Language Last update # of out links www.sfu.ca HTML

  • Sept. 5

45

slide-6
SLIDE 6

CMPT 354: Database I -- Storing Data 6

Tables

  • Relational databases store data in tables

– A database can have many tables

  • Schema: WebPage(URL, language,

update_date, n_out_links)

slide-7
SLIDE 7

CMPT 354: Database I -- Storing Data 7

Create a Table

  • CREATE statement

CREATE TABLE WebPage ( URL CHAR(256), language CHAR(40), update_date DATETIME, n_out_links INT ) DATETIME is the DATE type in SQL Server

slide-8
SLIDE 8

CMPT 354: Database I -- Storing Data 8

Data Types (1)

  • Character strings

– Fixed length: CHAR(n) – Varying length, up to n characters: VARCHAR(n)

  • Bit strings

– Fixed length: BIT(n) – Varying length, up to n bits: BIT VARYING(n)

  • Boolean values BOOLEAN

– TRUE, FALSE, or UNKNOWN

slide-9
SLIDE 9

CMPT 354: Database I -- Storing Data 9

Data Types (2)

  • Integers INT or INTEGER

– SHORTINT is also supported in some systems

  • Floating-point numbers

– FLOAT or REAL – DOUBLE PRECESION – Real numbers with a fixed decimal point: DECIMAL(n, d)

  • 123.45 is a possible value of type DECIMAL(5, 2)
  • Dates and times

– Exact time: DATE and TIME – Durations of time: INTERVAL

slide-10
SLIDE 10

CMPT 354: Database I -- Storing Data 10

Table and Data

  • Table itself is not user data!

– Is table data?

  • Table is a container of data
  • Data is represented as tuples in tables

– Tuple (‘www.sfu.ca’, ‘HTML’, 9/5/2006, 45)

slide-11
SLIDE 11

CMPT 354: Database I -- Storing Data 11

Insert a Tuple into a Table

INSERT INTO WebPage(URL, language, update_date, n_out_links) VALUES (‘SFU’, ‘HTML’, ’9/5/2007’, 45)

  • The attributes and the corresponding values

must be in the same order

slide-12
SLIDE 12

CMPT 354: Database I -- Storing Data 12

Storing Complex Data

  • Multimedia objects are often huge

– Music, video, e-books, …

  • BLOB: binary large object block
  • Complex data can be stored as BLOB’s

– Applications are in charge of handling and explanation

  • Databases only store and retrieve BLOB’s

as whole

slide-13
SLIDE 13

CMPT 354: Database I -- Storing Data 13

Example

  • Extend the WebPage table to store web

page content

ALTER TABLE WebPage ADD content VARCHAR(MAX)

slide-14
SLIDE 14

CMPT 354: Database I -- Storing Data 14

Storage in SQL Server – In Row

Figure from http://www.microsoft.com/technet/prodtechnol/sql/2000/reskit/part3/c1161.mspx?mfr=true

slide-15
SLIDE 15

CMPT 354: Database I -- Storing Data 15

Storage in SQL Server – Out Row

Figure from http://www.microsoft.com/technet/prodtechnol/sql/2000/reskit/part3/c1161.mspx?mfr=true

slide-16
SLIDE 16

CMPT 354: Database I -- Storing Data 16

Tips on Straightforward Design

  • In many cases, we can start with one table

containing all attributes we want to store

– Universal table as will be explained later

  • NULL value: value unknown, unavailable or

non-applicable

ALTER TABLE WebPage ADD title CHAR(256) INSERT INTO WebPage(URL, language, update_date, n_out_links, title) VALUES (‘www.my_page.net’, ‘HTML’, ‘9/5/2007’, 2, NULL)

slide-17
SLIDE 17

CMPT 354: Database I -- Storing Data 17

Integrity Constraints – NOT NULL

  • Each web page must have a unique URL

– NULL is not allowed in attribute URL CREATE TABLE WebPage ( URL CHAR(256) NOT NULL, language CHAR(40), update_date DATETIME NOT NULL, n_out_links INT NOT NULL )

slide-18
SLIDE 18

CMPT 354: Database I -- Storing Data 18

Integrity Constraints – Primary Key

  • Each web page must have a unique URL, an

update time, and the number of out links

– The language of an empty page can be undetermined CREATE TABLE WebPage ( URL CHAR(256) NOT NULL, language CHAR(40), update_date DATETIME NOT NULL, n_out_links INT NOT NULL, primary key (URL) )

slide-19
SLIDE 19

CMPT 354: Database I -- Storing Data 19

Data Integrity and Constraints

  • Integrity constraints: ensure accuracy and

consistency of data in a database

  • Referential integrity: the properties that

should be satisfied when data entries are referenced

– Not NULL – Primary key – More will be discussed later

slide-20
SLIDE 20

CMPT 354: Database I -- Storing Data 20

Schema-First or Data-First

  • Schema first approaches

– Design a schema, collect data according to the schema – Traditional, well controlled applications: student information systems, library loan systems

  • Data first approaches

– Data exists before a database is designed, or evolving data – More and more large applications: integrating data sources on the web (e.g., a database of blogs)

slide-21
SLIDE 21

CMPT 354: Database I -- Storing Data 21

Summary

  • Creating tables in relational databases
  • Inserting tuples into a table
  • Specifying integrity constraints
slide-22
SLIDE 22

CMPT 354: Database I -- Storing Data 22

To-Do List

  • Check the data types in SQL Sever

– Some data types in standard SQL may not be available in SQL Server. Please find their correspondences

  • Create a table for web pages in SQL Server

– Record URL, language, update time, number of

  • utgoing links, and title

– Use VARCHAR to store content

slide-23
SLIDE 23

CMPT 354: Database I -- Storing Data 23

Advanced To-Do List

  • How can you extend the web page table to

store the outgoing links?

– Each outgoing link is a URL

  • SQL Server 2005 comes with a database
  • AdventureWorks. Try to understand what

data is stored in the database