overview multidimensional databases
play

Overview Multidimensional Databases Cubes: Dimensions, Facts, - PowerPoint PPT Presentation

Overview Multidimensional Databases Cubes: Dimensions, Facts, Measures OLAP queries Relational design Redundancy Strengths and weaknesses of the multidimensional model Original slides were written by Torben Bach Pedersen


  1. Overview Multidimensional Databases • Cubes: Dimensions, Facts, Measures • OLAP queries • Relational design • Redundancy • Strengths and weaknesses of the multidimensional model Original slides were written by Torben Bach Pedersen Aalborg University 2007 - DWML course 2 Why not ER model? The multidimensional model • One purpose • ER model: many purposes � Flexible � Data analysis � General • Better at that purpose • All types of data are “equal”, no difference between: � Less flexible � What is important � Not suited for OLTP systems � What just describes the important • ER models are large • More built in “meaning” � 50-1000 entities/relations � What is important � Hard to get an overview � What describes the important • ER models implemented in RDBMSes � What we want to optimize � Normalized databases spread information � easy for querying � When analyzing data, the information must be integrated again Aalborg University 2007 - DWML course 3 Aalborg University 2007 - DWML course 4

  2. The multidimensional model Cube Example • Data is divided into: � Facts � Dimensions • Facts are the important entity: a sale � Facts have measures that can be aggregated: sales price • Dimensions describe facts � A sale has the dimensions Product, Store and Time • Goal for dimensional modeling: Dimension � Surround facts with as much context (dimensions) as possible values � But you should not try to model all relationships in the data (unlike ER modeling!) Cell (aggregated measure) Aalborg University 2007 - DWML course 5 Aalborg University 2007 - DWML course 6 Cubes Dimensions • A “cube” may have many dimensions! • Dimensions: core of multidimensional databases � Theoretically no limit for the number of dimensions • Dimensions are used for � Typical cubes have 4-12 dimensions � Selection of data • But only 2-3 dimensions can be viewed at a time � Grouping of data at the right level of detail � Dimensionality reduced by queries via projection/aggregation • Dimensions consist of dimension values • A cube consists of cells � Product dimension values: “milk”, “cream”, … � A given combination of dimension values � Time dimension values: “1/1/2001”, “2/1/2001”,… � empty cell = no data for this combination • Dimension values may have an ordering � sparse cube: few non-empty cells � Used for comparing cube data across values � dense cube: many non-empty cells � Especially used for Time dimension � Cubes become sparse at high dimensionality Aalborg University 2007 - DWML course 7 Aalborg University 2007 - DWML course 8

  3. Dimensions Dimension Example • Dimensions have hierarchies with levels � Typically 3-5 levels (of detail) � Dimension values are organized in a tree structure ◆ Product : Product � Type � Category ◆ Store : Store � Area � City � County ◆ Time : Day � Month � Quarter � Year � Dimensions have a bottom level and a top level (ALL) • Levels may have attributes � Simple, non-hierarchical information � Day has Workday as attribute • Dimensions should contain much information � Time dimensions may contain holiday, season, events,… Schema Instance � Good dimensions have 50-100 or more attributes/levels Aalborg University 2007 - DWML course 9 Aalborg University 2007 - DWML course 10 Dimension Example (cont’) • Not necessarily total order • Why we need hierarchy in dimension values? • Can be partial order � Hint: Compare the following schemas from the user’s view • Why each dimension should contain many attributes? � Hint: What happens if the user cannot find holiday, season, T T events,… from the Time dimension? Category Year Week T Type Month Category T Day Product Product Type Product Schema Time Schema Product Product Schema A Product Schema B Aalborg University 2007 - DWML course 11 Aalborg University 2007 - DWML course 12

  4. Facts Types Of Facts • Event fact (transaction) • Facts represent the subject of the desired analysis � A fact for every business event (sale) � The “important” in the business that should be analyzed • “ Fact-less ” facts • A fact is most often identified via its dimension values � A fact per event (customer contact) � A fact is a non-empty cell � No numerical measures � Some models give facts an explicit identity � An event has happened for a given dimension value combination • Generally a fact should • Snapshot fact � Be attached to exactly one dimension value in each dimension � A fact for every dimension combination at given time intervals � Only be attached to dimension values in the bottom levels � Captures current status (inventory) • Cumulative snapshot facts � A fact for every dimension combination at given time intervals � Captures cumulative status up to now (sales in year to date) Aalborg University 2007 - DWML course 13 Aalborg University 2007 - DWML course 14 Granularity Measures • Granularity of facts is important � What does a single fact mean? • Measures represent the fact property that the users � Level of detail want to study and optimize � Given by combination of bottom levels � Example: total sales price � Example: “total sales per store per day per product” • A measure has two components • Important for number of facts � Numerical value : (sales price) � Scalability � Aggregation formula (SUM): used for aggregating/combining • Often the granularity is a single business transaction a number of measure values into one � Example: sale � Measure value determined by dimension value combination � Sometimes the data is aggregated ( total sales per store per day � Measure value is meaningful for all aggregation levels per product) • Most multidimensional models have measures � Might be necessary due to scalability • Generally, transaction detail can be handled � Except perhaps huge clickstreams etc. Aalborg University 2007 - DWML course 15 Aalborg University 2007 - DWML course 16

  5. Types Of Measures Schema Documentation Measure type Property Example Occurence Additive Can be aggregated Often occur in • No well-defined standard sales price over all dimensions event facts • Our own notation Semi-additive Cannot be Often occur in Seen to the right inventory � aggregated over snapshot facts T level corresponds to ALL � some dimensions - • Modeling and OLAP tools typically time have their own notation Non-additive Cannot be Occur in all average aggregated over types of facts sales price any dimensions Aalborg University 2007 - DWML course 17 Aalborg University 2007 - DWML course 18 (Relational) OLAP Queries OLAP Queries • Two kinds of queries Slice/Dice : Starting level selection, (City, Year, Product) � Navigation queries examine one dimension Year=2000 ◆ SELECT DISTINCT l FROM d [WHERE p] � Aggregation queries summarize fact data ◆ SELECT d1.l1,d2.l2,SUM(f.m) FROM d1,d2,f WHERE f.dk1=d1.dk1 AND f.dk2=d2.dk2 [AND p] Roll-up: get Drilll-down: GROUP BY d1.l1,d2.l2 overview more detail • Fast, interactive analysis of large amounts of data • Spreadsheet on a cube All Time Aalborg University 2007 - DWML course 19 Aalborg University 2007 - DWML course 20

  6. ROLAP MOLAP • Relational OLAP • Multidimensional OLAP • Data stored in relational tables • Data stored in special multidimensional data structures � Star (or snowflake) schemas used for modeling • Pros � SQL used for querying � Less storage use (“foreign keys” not stored) • Pros � Faster query response times � Leverages investments in relational technology • Cons � Scalable (billions of facts) � Up till now not so good scalability (changing) � Flexible, designs easier to change � Less flexible, e.g., cube must be re-computed when design � New, performance enhancing techniques adapted from MOLAP changes ◆ Indices, materialized views, special treatment of star schemas � Does not reuse an existing investment (but often bundled with RDBMS) • Cons � Not as open technology � Storage use (often 3-4 times MOLAP) � Response times Aalborg University 2007 - DWML course 21 Aalborg University 2007 - DWML course 22 HOLAP Relational Implementation • Hybrid OLAP • The cube is often implemented in an RDBMS • Detail data stored in relational tables (ROLAP) • Fact table stores facts � One column for each measure • Aggregates stored in multidimensional structures (MOLAP) � One column for each dimension (foreign key to dimension table) • Pros � Dimensions keys make up composite primary key � Scalable (as ROLAP) • Dimension table stores dimension � Fast (as MOLAP) � Integer key column (surrogate keys) • Cons � Don’t use production keys/codes! Why? � Complexity • Goal for dimensional modeling: surround the facts with as much context (dimensions) as we can • Granularity of the fact table is important � What does one fact table row represent? � Important for the size of the fact table � Often corresponding to a single business transaction (sale) � But it can be aggregated (sales per product per day per store) Aalborg University 2007 - DWML course 23 Aalborg University 2007 - DWML course 24

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend