DATA MANAGEMENT FOR BUSINESS INTELLIGENCE OLAP: On-Line Analytical - - PowerPoint PPT Presentation

data management for business intelligence
SMART_READER_LITE
LIVE PREVIEW

DATA MANAGEMENT FOR BUSINESS INTELLIGENCE OLAP: On-Line Analytical - - PowerPoint PPT Presentation

DATA MANAGEMENT FOR BUSINESS INTELLIGENCE OLAP: On-Line Analytical Processing Salvatore Ruggieri Computer Science Department, University of Pisa Master in Big Data Analytics and Social Mining BI Architecture 2 Cube, A. Albano WHICH DBMS FOR


slide-1
SLIDE 1

DATA MANAGEMENT FOR BUSINESS INTELLIGENCE OLAP: On-Line Analytical Processing

Salvatore Ruggieri

Computer Science Department, University of Pisa Master in Big Data Analytics and Social Mining

slide-2
SLIDE 2

Cube, A. Albano

BI Architecture

2

slide-3
SLIDE 3

Cube, A. Albano

WHICH DBMS FOR DATAWAREHOUSING?

3

slide-4
SLIDE 4

Cube, A. Albano

ON-LINE ANALYTICAL PROCESSING (OLAP)

  • An OLAP server provides a multidimensional view starting from a

datawarehouse

  • The multidimensional view can be navigated through pivot tables,

reports, 2-D or 3-D plots, or it can be queried using a query language (eg., MDX – MultiDimensional eXpressions)

data cube

DATA WAREHOUSE OLAP SERVER

slide-5
SLIDE 5

Cube, A. Albano 5

MULTIDIMENSIONAL MODEL (CUBE) The multidimensional model is useful to understand interactive data analysis, and how to improve the execution performance.

S1 S2 S3 * *

D3 D2 D1

P1 *

P3

P2

StoreId ProductId DateId

slide-6
SLIDE 6

Cube, A. Albano 6

2-D CUBE Fact Table 2-D Cube CROSS TABULATION M

slide-7
SLIDE 7

Cube, A. Albano 7

3-D CUBE Fact Table 3-D Cube

slide-8
SLIDE 8

Cube, A. Albano 8

CUBE OPERATOR: SLICE Sales SLICE FOR DateId = ‘D1’;

SLICE

slide-9
SLIDE 9

Cube, A. Albano 9

CUBE OPERATOR: DICE Sales DICE FOR DateId = ‘D1’ StoreId IN (‘S1’, ‘S2’);

DICE

slide-10
SLIDE 10

Cube, A. Albano 10

CUBE OPERATOR: PIVOT

Rotate: reorient the cube, visualization, 3D to series of 2D planes

PIVOT (Sales SLICE FOR DateId = ‘D1’);

SLICE PIVOT

slide-11
SLIDE 11

Cube, A. Albano 11

CUBE OPERATORS: ROLL-UP and DRILL-DOWN SALES ROLL-UP ON DateId

Roll-up aggregates data by dimension reduction or by navigating attribute hierarchy (Drill-down is the reverse of roll-up)

Hypothesis: one measure and aggregations by sum. (total Qty by ProductId and by StoreId)

slide-12
SLIDE 12

Cube, A. Albano 12

CUBE OPERATORS: ROLL-UP and DRILL-DOWN 0-D cube or Apex-cube

slide-13
SLIDE 13

Cube, A. Albano 13

CUBE OPERATORS: DRILL THROUGH

Drill-through produces the facts that satisfy a cell coordinate

slide-14
SLIDE 14

Cube, A. Albano

product

Branch manager look at sales

  • f his/her stores

for any product and any period Product managers look at sales

  • f some products

in any period and in any market Finance manager look at sales

  • f a period compared to the previous period

for any product and any market

time CUBE NAVIGATION BY DIFFERENT USERS

14

slide-15
SLIDE 15

Cube, A. Albano 15

TEXTUAL NOTATION FOR CUBE OPERATORS Sales(StoreId, ProductId, DateId) is the cube with dimensions StoreId, ProdottoId, DataId, and measure M A cube operation is denoted by substituting a dimension with a value Hypothesis: one measure and aggregations by sum.

slide-16
SLIDE 16

Cube, A. Albano 16

TEXTUAL NOTATION FOR CUBE OPERATORS (cont) dice slice dice Sales(StoreId, ProductId, ‘D1’) Sales(‘S1’, ProductId, ‘D1’) Sales(‘S1’, ‘P1’, ‘D1’)

slide-17
SLIDE 17

Cube, A. Albano 17

TEXTUAL NOTATION FOR CUBE OPERATORS (cont.) Each dimension domain is extended with the value “*”, that means summarize data (sum) by all the dimension values. Sales(StoreId, ProductId, *) Sales by roll-up on DateId with sum(M)

slide-18
SLIDE 18

Cube, A. Albano 18

CUBE OPERATORS: EXAMPLES Sales(StoreId, ProductId, DateId) = Sales(StoreId, ProductId, *) = Sales(StoreId, *, *) = Sales(*, *, *) =

slide-19
SLIDE 19

Cube, A. Albano

  • What is

19

CUBE OPERATORS: EXAMPLES

Sales(StoreId, ‘P1’, *) =

slide-20
SLIDE 20

Cube, A. Albano 20

EXTENDED CUBE A data cube is extended with the value ‘*’ for each dimensions, and in the corresponding cells is stored the sum of the measure.

slide-21
SLIDE 21

Cube, A. Albano 21

EXTENDED CUBE With the ‘*’ values, the cube becames a set of cuboids:

  • white cells are the data cube
  • gray cells are roll-up by a

dimension,

  • dark gray cells are roll-up by

two dimensions

  • black cells are roll-up by all

dimensions. S1 S2 S3 * * D3 D2 D1 P1 * P3 P2

slide-22
SLIDE 22

Cube, A. Albano 22

EXTENDED CROSS TABULATION CROSS TABULATION EXTENDED CROSS TABULATION

slide-23
SLIDE 23

Cube, A. Albano 23

DW LATTICE: A LATTICE OF CUBOIDES On the set of cuboids is defined the following partial order relation: C1 ≤ C2 if C1 dimensions are included in C2 dimensions.

slide-24
SLIDE 24

Cube, A. Albano

HOW MANY CUBOIDS? HOW MANY CELLS?

  • D = {d1, …, dN} dimensions (degenerate or flat)
  • 2N cuboids
  • Let #di = number of values for dimension di
  • How many cells in total?

∑𝐷 𝐸↑▒∏𝑒∈𝐷↑▒#d = ∏𝑗=1..𝑂↑▒(#​𝑒↓𝑗 +1)

24

slide-25
SLIDE 25

Cube, A. Albano 25

CUBOIDS MATERIALIZATION Complete

Partial

slide-26
SLIDE 26

Cube, A. Albano 26

AGGREGATION FUNCTIONS TYPES

Distributive E.g., sum(), min(), max(), count() sum(V) = sum(V1) + sum(V2) sum({v}) = v count(V) = count(V1) + count(V2) count({v}) = 1

V = V1 ∪ V2 V1 ∩ V2 = ∅

Holistic E.g., median(), mode(), rank(). Algebraic E.g., avg(), standard_deviation() avg(V) = sum(V)/count(V) var(V) = sum(V2) – sum(V)2 / count(V) sum({v}2) = v*v count(V)-1

slide-27
SLIDE 27

Cube, A. Albano 27

CUBOIDS MATERIALIZATION Complete

Partial

If the materialization is partial, which cuboids do we select?

slide-28
SLIDE 28

Cube, A. Albano 28

OLAP SYSTEMS SOLUTIONS

We will see how report developers use SQL to write queries, but there are business intelligence tools that allows a user or a developer to make data analysis and to build beautiful reports without any knowledge of SQL... which is generated automatically. OLAP refers to the technique of performing complex business analysis

  • ver the information stored in a data warehouse.
slide-29
SLIDE 29

Cube, A. Albano 29

OLAP SYSTEMS: SOLUTION 1

The DW is managed by a specialized RDBMS (Relational Data Server) The OLAP Client provides presentation and reporting tools to deal with data analysis and visualization, and interacts with the Data Server.

slide-30
SLIDE 30

Cube, A. Albano 30

OLAP SYSTEMS: SOLUTION 2

The OLAP Client interacts with an OLAP Server, that supports multidimensional data and operations, and can be one of the following type:

  • MOLAP, which stores in the local memory both the data cube, taken from the Data

Server, and the aggregates of the extended cube, using a specialized data structure. A MOLAP server does not support SQL, but MDX.

  • ROLAP which stores both the data and the aggregates of the extended cube in the

Data Server. ROLAP servers may also implement functionalities not supported in

the SQL of the Data server.

  • HOLAP which stores the data in the Data Server, and the aggregates of the

extended cube in the local memory.

slide-31
SLIDE 31

Cube, A. Albano

The ROLAP case

slide-32
SLIDE 32

Cube, A. Albano 32

OLAP SYSTEMS: SOLUTION 3

The OLAP client interacts with a local DOLAP system (Desktop OLAP) which manages small amount of data extracted from the OLAP server, the Data server or an operational DBMS. It a good choice for those who travel and move extensively, by using portable computers. E.g., Microsoft Power Pivot (Add-in of Excel)

slide-33
SLIDE 33

Cube, A. Albano 33

MULTIDIMENSIONAL MODEL (CUBE) The multidimensional model is useful to understand interactive data analysis, and how to improve the execution performance.

S1 S2 S3 * *

D3 D2 D1

P1 *

P3

P2

StoreId ProductId DateId

slide-34
SLIDE 34

Cube, A. Albano 34

OLAP SYSTEMS: SOLUTION 3

The OLAP client interacts with a local DOLAP system (Desktop OLAP) which manages small amount of data extracted from the OLAP server, the Data server or an operational DBMS. It a good choice for those who travel and move extensively, by using portable computers. DEMO WITH Microsoft Power Pivot (Add-in of Excel)

slide-35
SLIDE 35

Cube, A. Albano

OLAP SYSTEMS

35

PowerBI