Data Warehouse and OLAP Data Warehouse and OLAP Week 5 1 Midterm - - PowerPoint PPT Presentation

data warehouse and olap data warehouse and olap
SMART_READER_LITE
LIVE PREVIEW

Data Warehouse and OLAP Data Warehouse and OLAP Week 5 1 Midterm - - PowerPoint PPT Presentation

Data Warehouse and OLAP Data Warehouse and OLAP Week 5 1 Midterm I Midterm I Friday, March 4 Scope Homework assignments 1 4 Open book Team Homework Assignment #7 Team Homework Assignment #7 Read pp. 121 139, 146


slide-1
SLIDE 1

Data Warehouse and OLAP Data Warehouse and OLAP

Week 5

1

slide-2
SLIDE 2

Midterm I Midterm I

  • Friday, March 4
  • Scope

– Homework assignments 1 – 4 – Open book

slide-3
SLIDE 3

Team Homework Assignment #7 Team Homework Assignment #7

R d 121 139 146 150 f h b k

  • Read pp. 121 – 139, 146 – 150 of the text book.
  • Do Examples 3.8, 3.10 and Exercise 3.4 (b) and (c). Prepare for

the results of the homework assignment. the results of the homework assignment.

  • Due date

– beginning of the lecture on Friday March11th.

slide-4
SLIDE 4

Topics Topics

  • Definition of data warehouse
  • Multidimensional data model
  • Data warehouse architecture
  • From data warehousing to data mining
slide-5
SLIDE 5

What is Data Warehouse? (1) What is Data Warehouse? (1)

  • A data warehouse is a repository of information

collected from multiple sources, stored under a unified schema, and that usually resides at a single site

  • A data warehouse is a semantically consistent data

store that serves as a physical implementation of a decision support data model and stores the information on which an enterprise need to make strategic decisions

slide-6
SLIDE 6

What is Data Warehouse? (2) What is Data Warehouse? (2)

  • Data warehouses provide on‐line analytical

Data warehouses provide on line analytical processing (OLAP) tools for the interactive analysis of multidimensional data of varied granularities, which facilitate effective data generalization and data mining

  • Many other data mining functions, such as

association, classification, prediction, and clustering, b i t t d ith OLAP ti t h can be integrated with OLAP operations to enhance interactive mining of knowledge at multiple levels of abstraction abstraction

slide-7
SLIDE 7

What is Data Warehouse? (3) What is Data Warehouse? (3)

  • A decision support database that is maintained

separately from the organization’s operational database

  • “A data warehouse is a subject‐oriented, integrated,

time‐variant, and nonvolatile collection of data in support of management’s decision‐making process [Inm96].”—W. H. Inmon

slide-8
SLIDE 8

Data Ware ho use F rame wo rk Data Ware ho use F rame wo rk

data mining Figure 1.7 Typical framework of a data warehouse for AllElectronics

8

slide-9
SLIDE 9

Data Warehouse is S bj O i d Subject-Oriented

  • Organized around major subjects, such as customer,

product, sales, etc.

  • Focusing on the modeling and analysis of data for

decision makers, not on daily operations or transaction processing

  • Provide a simple and concise view around particular

subject issues by excluding data that are not useful in the decision support process

slide-10
SLIDE 10

Data Warehouse is I t t d Integrated

  • Constructed by integrating multiple, heterogeneous data

sources sources – relational databases, flat files, on‐line transaction records

  • Data cleaning and data integration techniques are applied

Data cleaning and data integration techniques are applied. – Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources

  • E.g., Hotel price: currency, tax, breakfast covered, etc.
slide-11
SLIDE 11

D t W h i Ti V i t Data Warehouse is Time Variant

  • The time horizon for the data warehouse is significantly

longer than that of operational systems g p y – Operational database: current value data – Data warehouse data: provide information from a historical perspective (e.g., past 5‐10 years)

  • Every key structure in the data warehouse

– Contains an element of time, explicitly or implicitly

slide-12
SLIDE 12

D t W h i N l til Data Warehouse is Nonvolatile

  • A physically separate store of data transformed from the
  • perational environment
  • Operational update of data does not occur in the data

warehouse environment – Does not require transaction processing, recovery, and concurrency control mechanisms R i l t ti i d t i – Requires only two operations in data accessing:

  • initial loading of data and access of data
slide-13
SLIDE 13

OL T P vs OL AP OL T P vs. OL AP

T a ble 3.1 Co mpariso n be twe e n OL

T P and OL AP

13

slide-14
SLIDE 14

Why Separate is Data Warehouse Needed? (1) (1)

  • Why not perform on‐line analytical processing directly on
  • perational databases instead of spending additional time
  • perational databases instead of spending additional time

and resources to construct a separate data warehouse?

slide-15
SLIDE 15

Why Separate is Data Warehouse N d d? ( ) Needed? (2)

  • High performance for both systems

g p y – DBMS— tuned for OLTP: searching for particular records, indexing, hashing, concurrency control, recovery – Warehouse—tuned for OLAP: complex OLAP queries, multidimensional view, consolidation (summarization and ti ) aggregation)

slide-16
SLIDE 16

Topics Topics

  • Definition of data warehouse
  • Multidimensional data model
  • Data warehouse architecture
  • From data warehousing to data mining
slide-17
SLIDE 17

From Tables and Spreadsheets to D C b Data Cubes

  • A data warehouse is based on a multidimensional

A data warehouse is based on a multidimensional data model

  • This model views data in the form of a data cube

This model views data in the form of a data cube

  • A data cube allows data to be modeled and viewed in

multiple dimensions multiple dimensions

slide-18
SLIDE 18

From Tables and Spreadsheets to Data C b (1) Cubes (1)

  • A data cube is defined by facts and dimensions

A data cube is defined by facts and dimensions – Facts are data which data warehouse focus on

  • Fact tables contain numeric measures (such as

Fact tables contain numeric measures (such as dollars_sold) and keys to each of the related dimension tables – Dimensions are perspectives with respect to fact

  • Dimension tables describe the dimension with
  • attributes. For example, item (item_name, brand, type),
  • r time(day week month quarter year)
  • r time(day, week, month, quarter, year)
slide-19
SLIDE 19

F ig

fro m

ure 1.6. F

ra m a re latio agme nts o

  • nal datab
  • f re latio ns

base fo r AllE le c tro nic s

19

slide-20
SLIDE 20

F ro m T able s and Spre adshe e ts t D t C b (2) to Data Cube s (2)

dimensions

T a ble 3.2 A 2-D vie w o f sale s data fo r AllE

le c tro nic s ac c o rding to the di i ti d it h th l f b h l t d i

Facts (numerical measures)

dime nsio ns time and ite m, whe re the sale s are fro m branc he s lo c ate d in the c ity o f Vanc o uve r. T he me asure displaye d is do llar_so ld (in tho usands).

20

slide-21
SLIDE 21

F ro m T able s and Spre adshe e ts t D t C b (3) to Data Cube s (3)

T a ble 3.3 A 3-D vie w o f sale s data fo r AllE

le c tro nic s ac c o rding to the dime nsio ns time , ite m, and lo c atio n. T he me asure displaye d is do llar_so ld (in tho usands).

21

slide-22
SLIDE 22

F ro m T able s and Spre adshe e ts t D t C b (4) to Data Cube s (4)

F ig ure 3.1 A 3-D data c ube re pre se ntatio n o f the data in T

able 3.3, ac c o rding to the dime nsio ns time , ite m, and lo c atio n. T he me asure displaye d is do llar_so ld (in tho usands).

22

slide-23
SLIDE 23

F ro m T able s and Spre adshe e ts t D t C b (5) to Data Cube s (5)

F ig ure 3.2 A 4-D data c ube re pre se ntatio n, ac c o rding to the dime nsio ns

time , ite m, lo c atio n, and supplie r. T he me asure displaye d is do llar so ld (in time , ite m, lo c atio n, and supplie r. T he me asure displaye d is do llar_so ld (in tho usands).

23

slide-24
SLIDE 24

Cuboid Cuboid

  • A data cube is a lattice of cuboids
  • The total number of cuboids
  • The apex cuboid
  • The base cuboid

24

slide-25
SLIDE 25

F ig ure 3.14 L

attic e o f c ubo ids, making up a 3-D data c ube . E ac h

25

g

g p c ubo id re pre se nts a diffe re nt gro up-by. T he base c ubo id c o ntains the thre e dime nsio ns c ity, ite m, and ye ar.

slide-26
SLIDE 26

The Curse of Dimensionality The Curse of Dimensionality

  • How many cuboids are there in a n‐dimensional data cube?
  • How many cuboids are there in a n dimensional data cube
  • How many cuboids are there in a n‐dimensional data cube

and each dimension (i) has the number of level, (Li)?

26

slide-27
SLIDE 27

Conceptual Modeling of Data W h Warehouses

M d li d t h di i &

  • Modeling data warehouses: dimensions & measures

– Star schema: A fact table in the middle connected to a set

  • f dimension tables
  • f dimension tables

– Snowflake schema: A refinement of star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to snowflake F t t ll ti M lti l f t t bl h di i – Fact constellations: Multiple fact tables share dimension tables, viewed as a collection of stars, therefore called galaxy schema or fact constellation g y

slide-28
SLIDE 28

Star Sc he ma Star Sc he ma

time key time item _ y day day_of_the_week month quarter item_key item_name brand type Sales Fact Table time key q year type supplier_type branch_key item_key time_key dollars_sold location_key branch_key branch_name location_key street cit branch location unit_sold branch_type city province_or_street country

F ig ure 3.4 Star sc he ma o f a data ware ho use fo r sale s.

28

slide-29
SLIDE 29

Sno wflake Sc he ma Sno wflake Sc he ma

supplier key supplier time_key time item Sales Fact Table time_key supplier_key supplier_type day day_of_the_week month quarter item_key item_name brand type location key branch_key item_key year type supplier_key units_sold dollars_sold location_key branch_key branch_name b h t branch location_key location city branch_type street city city_key city province_or_street country

F ig ure 3.4 Sno wflake sc he ma o f a data ware ho use fo r sale s.

country

29

slide-30
SLIDE 30

F ac t Co nste llatio n F ac t Co nste llatio n

ti k item_key Shipping Fact Table time_key time item Sales Fact Table from_location shipper_key time_key day day_of_the_week month quarter item_key item_name brand item Sales Fact Table k time_key unit shipped dollars_sold to_location location key year type supplier_type branch_key item_key _ pp unit_sold dollars_sold location_key branch_key branch_name branch type location_key street branch location hi shipper_key shipper_name location key shipper branch_type city province_or_street country shipper

F ig ure 3.5 F

ac t c o nste llatio n sc he ma o f a data ware ho use fo r sale s and shipping.

location_key shipper_type

30

slide-31
SLIDE 31

Exercise Exercise

  • Exercise 3 5 (a) – page 153

Exercise 3.5 (a) page 153

31