Outline What is de c ision suppor t An Overview of Data What is a - - PDF document

outline
SMART_READER_LITE
LIVE PREVIEW

Outline What is de c ision suppor t An Overview of Data What is a - - PDF document

Outline What is de c ision suppor t An Overview of Data What is a data war e house Warehousing and OLAP Why we ne e d it, and how it diffe r s fr om a re gular RDBMS Technology Diffe r e nc e be twe e n OL AP and OL T


slide-1
SLIDE 1

1

An Overview of Data Warehousing and OLAP Technology

Pre se nte r: Otto Disc ussion: Jim Ma r. 07 2006

2

  • What is de c ision suppor

t

  • What is a data war

e house

  • Why we ne e d it, and how it diffe r

s fr

  • m a

re gular RDBMS

  • Diffe r

e nc e be twe e n OL AP and OL T P

  • T

ypic al OL AP ar c hite c tur e

  • Database De sign Me thodology
  • Star

and Snowflake sc he ma s

  • Challe nge s of mate r

ialize d vie ws

  • Imple me ntation of the OL

AP Se r ve r

  • Me tadata r

e quir e me nts

Outline

3

Motive for a Data Warehouse

  • Busine sse s ha ve a lot of data, ope r

ational data and fac ts.

  • T

his da ta is usually in diffe re nt da ta ba se s and in diffe r e nt physic al plac e s.

  • De c ision make r

s ne e d to ac c e ss infor mation (data that ha s be e n summar ize d) vir tually on the single site .

  • T

his ac c e ss ne e ds to be fa st re gar dle ss of the size of the data, and how old the data is.

4

What is decision support

  • De c ision suppor

t syste ms are a c lass of c ompute r ize d infor mation syste ms that suppor t de c ision making ac tivitie s.

  • De c ision support syste ms usually re quire

c onsolidating data form ma ny he te r

  • ge ne ous sour

c e s: the se might inc lude e xte r nal sour c e s.

  • Suc h us stoc k mar

ke t fe e ds.

5 Retains the history Major subject areas Data from different data sources. Changes as new data trickle in

What is data warehouse

  • Data war

e housing pr

  • vide s ar

c hite c tur e s and tools for busine ss e xe c utive s to syste matic ally or ganize , unde r stand a nd use the ir data to make str ate gic de c isions. – Jiawe i Han

  • A data war

e house is a subje c t- or ie nte d, inte gr ate d, time - var iant, a nd non-vola tile c olle c tion of data in suppor t of manage me nt’s de c ision ma king pr

  • c e ss.

6

Difference between OLAP and OLTP

Que r y thr

  • ug hput

T ransac tion throughput

Me tr ic

100 GB- T B 100 MB-GB

DB size

Hundr e ds thousands

# use rs

Millions te ns

# re c ac c e sse d

Comple x que r y Shor t, simple tr ansa c tion

Unit of work

L

  • ts of sc a ns

Re ad/ wr ite

Ac c e ss

Ad- hoc r e pe titive

Usage

Histor ic al, summar ize d, multidime nsional,… Cur re nt, up-to-date de taile d.

Data

Subje c t-or ie nte d Applic ation- or ie nte d

DB De sign

De c ision suppor t Day to day ope rations

F unc tion

Knowle dg e wor ke r Cle r k, IT pr

  • fe ssional

Use r s OL AP OL T P

slide-2
SLIDE 2

2

7

Why Do we Separate DW From DB ?

  • Pe r

for manc e re asons: – OL

AP r e quir e s spe c ia l data organization that suppor ts multidime nsional vie ws.

– OL

AP que rie s would de grade ope r ationa l DB.

– OL

AP is r e ad only.

– No c onc urre nc y c ontrol a nd re c ove ry.

  • De c ision suppor

t r e quir e s histor ic al data.

  • De c ision suppor

t r e quir e s c onsolidate d data.

8

Typical OLAP architecture

9

Database Design Methodology

  • Most data wa re house s use a star sc he ma

to re pre se nt the multi-dime nsional mode l.

  • E

ac h dime nsion is r e pr e se nte d by a dime nsion-table that de sc r ibe s it.

  • A fac t- table c onne c ts to all dime nsion-

table s with a multiple join. E ac h tuple in the fac t-table c onsists of a pointe r to e ac h of the dime nsion-table s.

  • T

he links be twe e n the fac t-table in the c e ntr e and the dime nsion-table s form a shape like a star. (Star Sc he ma)

10

Example of Star Schema

11

Database Design Methodology (con’t)

  • E

ac h dime nsion is r e pr e se nte d by one table .

Un-nor

malize d (intr

  • duc e s r

e dundanc y)

E x: (Vanc ouve r, BC, Cana da, North Ame r ic a) (Vic toria , BC, Ca nada, North Ame r ic a)

Nor malize dime nsion table s Snowflake Sc he ma

12

Example of Snowflake Schema

slide-3
SLIDE 3

3

13

What I s the Best Design ??

Pe r for manc e be nc hmar king c an be de te r mine what is the be st de sign.

  • Snowflake sc he ma: E

a sie r to maintain dime nsion table s whe n dime nsion table ar e ve r y lar ge (r e duc e s ove r all spac e ).

  • Star

sc he ma : Mor e e ffe c tive for data c ube br

  • wsing (le ss joins).

14

T he c halle nge s in e xploiting mate r ialize d vie ws a re like in using indic e s.

– Ide ntify the vie ws to mate r

ialize .

– E

xploit the mate r ialize d vie ws to answe r que r ie s.

– E

ffic ie ntly upda te the ma te r ia lize d vie ws during loa d and r e fr e sh.

T he c ur r e ntly industr ia l solution- Vie ws c onsist of joins of the fa c t table with a subse t

  • f dime nsion ta ble s. But it’s a little more c omple x,

if using the snowflake sc he ma . Why ? ? ( muc h joins )

Challenges of Materialized views

15

I mplementation of the OLAP Server

ROL AP: Re la tiona l OL AP – da ta is store d in table s in r e lational database or e xte nde d- r e lational database s. T he y use an RDBMS to manage the war e house data and aggr e gations using ofte n a star sc he ma.

  • T

he y suppor t e xte nsions to SQL . Advantage : Sc alable . Disadvantage : No dir e c t ac c e ss to c e lls.

16

I mplementation of the OLAP Server

MOL AP: Multidime nsiona l OL AP - imple me nts the multidime nsional vie w by stor ing data in spe c ia l multidime nsiona l da ta struc ture s. Advantage : F ast inde xing to pr e -c ompute d aggr e gations. Only value s are stor e d. Disadvantage : Not ve r y sc alable .

17

  • Administr

ative me tadata − Sour

c e data base and the ir c onte nts

− Ba c k- e nd a nd front- e nd tools − De finitions of the wa re house sc he ma − Pr

e - de fine d que rie s and r e por ts

− Da ta ma rk loc ations a nd c onte nts − Data r

e fr e sh and purging polic ie s

− Use r profile s a nd use r a c c e ss c ontr

  • l polic ie s

Metadata requirements

18

  • Busine ss me ta data

– Busine ss te r

ms a nd de finitions

– Owne r

ship of data

– Charging polic ie s

  • Ope r

ational me tadata – Data line age : histor

y of migr a te d data and se que nc e of tra nsforma tions a pplie d

– Curre nc y of da ta : a c tive , a rc hive d, purg e d – Monitoring informa tion: wa re house usa g e

statistic s, e r r

  • r r

e por ts, audit trails

Metadata requirements

slide-4
SLIDE 4

4

19

Discussion

  • How doe s the he te rog e ne ity in da ta wa re house s

diffe r fr

  • m the topic s tha t we ’ve disc usse d in

data inte gration? What ar e some a pplic ations that you would use data inte gr a tion for? A data war e house ? Can you think of any a pplic ations for whic h “both” would be a g ood solution?

  • Do you think that mate r

ia lize d vie ws ar e mor e impor tant or

  • n an RDBMS? Why? Whic h one do

you think is e asie r to use mate r ia lize d vie ws in? why?

  • Do you think that star sc he mas are more use ful

in da ta war e house s than in RDMMSs? Why or why not?