Value-driven Approach Designing Extended Data Warehouses



SLIDE 1

Value-driven Approach Designing Extended Data Warehouses

Laboratoire d’Informatique et d’Automatique pour les Systèmes

DOLAP’2019, Lisbon, March 26, 2019

Nabila BERKANI & Selma KHOURI

ESI Algiers, Algeria

(n_berkani, s_khouri)@esi.dz

Ladjel BELLATRECHE

LIAS/ISAE-ENSMA Poitiers, France

bellatreche@ensma.fr

Carlos ORDONEZ

University of Houston USA

carlos@central.uh.edu

SLIDE 2

Impact of Big Data on DW

[Figure: timelines of the DaWaK Conference and the DOLAP Workshop. Date ranges shown: 2017-Present, 1999-2014, 2015-Present, 1998-2015. Annotation: “2016: Thinking”.]

SLIDE 3

30 years of existence: Maturity

[Figure: DW design life-cycle. Phases: Data Sources and Requirements, Mappings, Multidimensional Modeling, Extract-Transform-Load, Deployment, Exploitation. ETL steps: Extract, Load to DSA (Temp1, Temp2), Join, Filter, Store (Relational). Cross-phase design: Mappings Sources/Requirements, Instance Extraction, DW Schema Definition. Annotations: Variety, Quest of Value. Schema labels: Field, Origin, Year, Author, Discipline.]

1. Design life-cycle well identified

2. Diversity of Actors

☛ Actors of the Design: Designers, Data Preparators, Architects, Administrators, “Deployers”

☛ Actors of Exploitation: Data Analysts

→ Augmentation of DW by Big Data Vs

SLIDE 4

Agenda

  • Value & Variety (2Vs)
  • Augmenting DW by Linked Open Data
  • 2Vs-driven Design Approach
  • Case Study
  • Summary

SLIDE 5

Value: # places

[Figure: DW life-cycle. Labels: Sources, Requirements, Integration, Deployment, Exploitation, Decision Makers. Exploitation outputs: Queries, Statistics, …, Visualisation, Decision Analysis.]

▪ Variety & Value ▪ LOD & DW ▪ 2Vs Design Approach ▪ Case Study ▪ Summary

CM Value

  • Value → user feedback: N. W. Paton, N. Konstantinou
  • Value → offered services [D. Bork]
  • Value → usage of modern architectures: Teradata
  • Value → new programming paradigms: Spark
  • Value → integration of new resources (LOD, …)
  • FR (functional requirements): Value → money [A. G. Sutcliffe’2018]
  • NfR (non-functional requirements): Value → satisfaction of qualities (security, privacy, …)

→ Interdependencies between value (phases) & value (operational DW)

→ Recent efforts on building value ontologies:

  • T. P. Sales, F. A. Baião, G. Guizzardi, J. P. A. Almeida, N. Guarino, J. Mylopoulos: The Common Ontology of Value and Risk. ER 2018: 121-135
SLIDE 6

Value increases Variety

VALUE

Person in Charge (PiC) of Value

  • Examples of requirements related to value [1]:
      ▪ Media: Has the coverage of media changed over time?
      ▪ Politics: Speeches in the EU parliament that contain “human rights”, by country
      ▪ Finance: Evolution of debates related to the Greece crisis, by country
  • Measurement of the value depends on the studied domain

➡ Interaction between designers and PiC of value: multidisciplinarity in DW

VARIETY

Designer

[Figure labels: Internal Sources, External Sources, Libraries, DB, Newspapers]

→ Usage of Linked Open Data:

  • Traditional Management of Variety

+

  • Variety of Formalisms (graphs)

[1] http://www.talkofeurope.eu/data

SLIDE 7

Augmenting DW by 2Vs

[Figure: DW design life-cycle augmented by the 2Vs. Phases: Data Sources and Requirements, Mappings, Multidimensional Modeling, ETL (Variety), Deployment (Variety), Exploitation (Value). ETL steps: Extract, Load to DSA (Temp1, Temp2), Join, Filter, Store. Deployment: Store 1 … Store n (labels: Relational, Graph). Cross-phase design: Mappings Sources/Requirements, Instance Extraction, DW Schema Definition. Schema labels: Field, Origin, Year, Author, Discipline.]

High Variety of Sources || Global Processing

2. Diversity of Actors

☛ Actors of the Design: Designers, Data Preparators, Architects, Administrators, “Deployers”

☛ Actors of Exploitation: Data Analysts, “PiC of value”

SLIDE 8

Formalisation

  • Inputs:
      1. Set of internal sources: SInt = {SI1, SI2, …, SIm}
      2. Set of external resources: SExt = {SE1, SE2, …, SEn}
      3. Each source (internal/external) Si has:
           ▪ its own physical format Fi
           ▪ its conceptual model CMi
           ▪ a related discipline D (medicine, engineering, etc.)
      4. Set of requirements to be satisfied
      5. [Optional] An operational DW ([Ravat et al. 2017]), with:
           ▪ its conceptual model CMDW
           ▪ its format(s) Format(SDW) = {f1, f2, …, fk} → polystore storage

  • Objective:
      ▪ Definition of all phases of the DW, augmenting its value

  • Challenges:
      ▪ Metrics of value: Value(DW) = Operator(1 ≤ i ≤ n+m) [Weight(Si, D) * Value(Si)], where Si ∈ SInt ∪ SExt [Ballou et al.]*

* Ballou, D. P., & Tayi, G. K. (1999). Enhancing data quality in data warehouse environments. Communications of the ACM, 42(1), 73-78.
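The Value(DW) aggregate can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Source` record, the `dw_value` function, and the sample scores are hypothetical names and numbers, Value(Si) is assumed to be already measured on a 0 to 1 scale, and the operator is left pluggable because the slide does not fix it here.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Iterable, Sequence


@dataclass(frozen=True)
class Source:
    """One internal or external source S_i (hypothetical record layout)."""
    name: str
    physical_format: str   # F_i, e.g. "relational" or "graph"
    discipline: str        # D, e.g. "medicine"
    value: float           # Value(S_i), assumed already measured in [0, 1]
    weight: float = 1.0    # Weight(S_i, D), assumed supplied by the designer


def dw_value(sources: Iterable[Source],
             operator: Callable[[Sequence[float]], float] = mean) -> float:
    """Value(DW) = Operator over all S_i of Weight(S_i, D) * Value(S_i)."""
    weighted = [s.weight * s.value for s in sources]
    if not weighted:
        raise ValueError("at least one source is required")
    return operator(weighted)


# Made-up sources and scores, for illustration only.
internal = [Source("SI1", "relational", "education", 0.25)]
external = [Source("SE1", "graph", "education", 0.75)]

print(dw_value(internal + external))        # average as the operator
print(dw_value(internal + external, max))   # any other aggregate fits too
```

Passing the operator as a parameter keeps the sketch neutral about how per-source values are combined; averaging is just the default chosen here.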


SLIDE 9

# Scenarios

[Figure: three design scenarios, (a) Serial Design, (b) Parallel Design, (c) Query-driven Design. Common elements: Users, Requirements, Internal Sources, LOD, ETL, DW. Further labels: Synchronise, ETL-LOD, Internal Cube, LOD Cube, Data Cube, OLAP Queries, Graph Query, MD Query, Pool of Results, Merge, Materialize, “On-demand LOD graphs materialized”, “Query results visualized”. ETL detail (Internal ETL and External ETL): Extract, Load to DSA (Temp1, Temp2), Join, Filter, Store.]

3 Scenarios

☛ Serial Design: LOD is seen as a source

☛ Parallel Design: two parallel ETL

☛ Query-driven Design: on-demand ETL, where data are extracted from the existing DW and the LOD, then potentially loaded into the DW → (requirement satisfaction)

Challenges?

1. Pivot schema: generic schema vs. LOD schema (graph)
2. Redefinition of operators (overloading)
3. Synchronisation of internal and external data: 3 scenarios
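The on-demand ETL of the query-driven scenario can be pictured with a small Python sketch. Everything here is an assumption made for illustration: `query_driven`, the two query callbacks, and the row layout are hypothetical, the multidimensional and graph queries are stubbed with in-memory lists, and "merge" is reduced to dropping duplicate rows.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def query_driven(requirement: str,
                 md_query: Callable[[str], List[Row]],
                 graph_query: Callable[[str], List[Row]],
                 warehouse: List[Row],
                 materialize: bool = False) -> List[Row]:
    """Answer one requirement from the DW and the LOD, then merge the results."""
    # Pool of results: MD query on the DW plus graph query on the LOD.
    pool = md_query(requirement) + graph_query(requirement)

    # Merge step, simplified here to removing duplicate rows.
    merged: List[Row] = []
    for row in pool:
        if row not in merged:
            merged.append(row)

    # Optional load of the new rows into the DW.
    if materialize:
        new_rows = [row for row in merged if row not in warehouse]
        warehouse.extend(new_rows)
    return merged


# Made-up data standing in for the DW and for a LOD endpoint.
dw_rows = [{"author": "A", "field": "DB"}]
lod_rows = [{"author": "A", "field": "DB"}, {"author": "B", "field": "AI"}]
warehouse = list(dw_rows)

result = query_driven("authors by field",
                      lambda req: dw_rows,
                      lambda req: lod_rows,
                      warehouse,
                      materialize=True)
print(result)
print(warehouse)
```

With `materialize=False` the warehouse is left untouched and the merged rows are only returned for visualisation, which mirrors the "potentially loaded" wording of the scenario.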

SLIDE 10

Value Metrics

  • Three metrics related to:

      1. Requirement satisfaction

         Value(Req, Si) = (number of responses of requirement on Si) / (number of responses of all requirements)

      2. Conceptual modelling (multidimensional concepts)

         Value(Concepts, Si) = (number of concepts of DW schema by integrating Si) / (total number of the DW concepts)

      3. Target DW population

         Value(Instances, Si) = (number of instances of DW by integrating Si) / (total number of instances of the DW)

Value(DW) = Operator(1 ≤ i ≤ n+m) [Weight(Si, D) * Value(Si)], where Si ∈ SInt ∪ SExt
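Each of the three metrics on this slide is a ratio of a count obtained with source Si to a total count, so they can be computed the same way. The sketch below assumes the counts are already available; the function names, the dictionary keys, and the sample numbers are hypothetical.

```python
def ratio(part: float, total: float) -> float:
    """Share of `total` contributed by `part`; 0.0 when the total is zero."""
    if part < 0 or total < 0:
        raise ValueError("counts must be non-negative")
    return part / total if total else 0.0


def source_value_metrics(responses_on_si: int, responses_all: int,
                         concepts_with_si: int, concepts_total: int,
                         instances_with_si: int, instances_total: int) -> dict:
    """Return the three per-source value metrics as fractions in [0, 1]."""
    return {
        # Value(Req, Si): requirement satisfaction
        "requirements": ratio(responses_on_si, responses_all),
        # Value(Concepts, Si): multidimensional concepts of the DW schema
        "concepts": ratio(concepts_with_si, concepts_total),
        # Value(Instances, Si): population of the target DW
        "instances": ratio(instances_with_si, instances_total),
    }


# Made-up counts, for illustration only.
metrics = source_value_metrics(9, 12, 6, 8, 500, 2000)
print(metrics)
```

Returning 0.0 for an empty total is a choice of this sketch, made so that a source can be scored before the DW holds any data.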


SLIDE 11

Case Study

☛ University Research Analysis

  ▪ 4 internal sources generated from the LUBM benchmark
  ▪ 15 initial requirements

  • Analysis:
      ▪ 6 requirements are not satisfied by internal sources (Oracle 12c Release 1) → External source: DBpedia

SLIDE 12

Experiments

Sources \ Metrics     Dimensions/Measures   Value(S*) MD   Value(S*) Req.   Value(S*) Instances   Instances     Response time
Internal Sources      6/1                   31%            6%               10%                   550K          1.1
Serial Design         10/7                  71%            80%              94%                   7.7 × 10^6    3.2
Parallel Design       11/8                  73%            84%              85%                   3.1 × 10^6    2.6
Query-driven Design   12/8                  74%            96%              84%                   2.9 × 10^6    1.7

* All sources have the same weight
* Operator: Avg

Augmented Schema

SLIDE 13

Summary

✔ 2Vs for the DW renaissance
✔ Value = pool of multidisciplinary expertise
✔ DW life cycle design revisited (new formalization)
✔ 3 augmented scenarios
☛ Veracity & 2V
☛ More automation (query rewriting)
☛ Value Query Language (Thanks, Patrick)

Special issue on: Business Intelligence and Analytics for Value Creation in the Era of Big Data and Linked Open Data: International Journal of Information Management, Elsevier (Q1; IF=4.810)