A Generic Solution for A Generic Solution for Warehousing Business - - PowerPoint PPT Presentation

a generic solution for a generic solution for warehousing
SMART_READER_LITE
LIVE PREVIEW

A Generic Solution for A Generic Solution for Warehousing Business - - PowerPoint PPT Presentation

A Generic Solution for A Generic Solution for Warehousing Business Warehousing Business Process Data Process Data Malu Castellanos Malu Castellanos Joint work with Fabio Casati, Umesh Umesh Dayal, Norman Salazar Dayal, Norman Salazar


slide-1
SLIDE 1

A Generic Solution for A Generic Solution for Warehousing Business Warehousing Business Process Data Process Data

Malu Castellanos Malu Castellanos Joint work with Fabio Casati, Joint work with Fabio Casati, Umesh Umesh Dayal, Norman Salazar Dayal, Norman Salazar Hewlett Hewlett-

  • Packard Laboratories

Packard Laboratories VLDB 2007 VLDB 2007 September 25, 20007 September 25, 20007 Vienna Vienna

slide-2
SLIDE 2
  • !""

!""

#$ #$

  • $"$

$"$

$ $

  • %&$$

%&$$

slide-3
SLIDE 3

'()$ '()$

#$*$ #$*$ " " +,-.&' +,-.&' #$*$ #$*$ " " +,-.&' +,-.&'

  • /

/$)$0 $)$0

slide-4
SLIDE 4

1)$23 1)$23

+$" +$"4 4) )4 4 " "

  • 5( '.

5( '.

+ ) " + ) "

  • 6$

6$

7" 7" $ $ " "

  • ) ""

) ""

  • &

& )$ )$

slide-5
SLIDE 5

1)$28 1)$28

7" $" 7" $"

  • )$)

)$)

  • %&9':" ""(

%&9':" ""(

"

+"" :):

slide-6
SLIDE 6

1)$2; 1)$2;

1 14 4" " " " <$'. <$'.

  • =)$""

=)$"" $ $

  • )$

)$

>?" " >?" "4 4) )

slide-7
SLIDE 7

Objective Objective

+$$" +$$" ()$) ()$)

  • ?))$

?))$

  • ) " $

) " $ "$ "$

  • %"

%" @' @'A A

  • '.$

'.$

slide-8
SLIDE 8

Main solution ingredients Main solution ingredients

# () # ()

  • 5 ")

5 ")

  • $ ))"

$ ))"

B,)%) B,)%)

  • % $"

% $"

  • 1"

1"

  • """

"""4 4 )$ )$

  • )

)

  • =(:)$)(

=(:)$)( $ $

B5&' B5&'

  • :"$

:"$

#"'$ #"'$

  • !$$ "?

!$$ "?

slide-9
SLIDE 9

Main ingredients Main ingredients

  • 5 ")

5 ")

  • $ ))"

$ ))"

B,)%) B,)%)

  • % $"

% $"

  • 1"

1"

  • """

"""4 4 )$ )$

  • )

)

  • =(:)$)(

=(:)$)( $ $

B5&' B5&'

  • :"$

:"$

#"'$ #"'$

  • !$$ "?

!$$ "?

slide-10
SLIDE 10

Common reporting requirements Common reporting requirements

Process metrics: based on process progression Process metrics: based on process progression data data

Process statistics Process statistics Time intervals Time intervals Path & outcomes Path & outcomes Correlation with previous step Correlation with previous step

Resource metrics Resource metrics

Performance of resources Performance of resources Correlation between resources and process metrics Correlation between resources and process metrics

Business data metrics Business data metrics

Correlation of business data with process data Correlation of business data with process data Correlation of business data with resources Correlation of business data with resources

Defined & computed on abstracted versions of a Defined & computed on abstracted versions of a process process

slide-11
SLIDE 11

Main ingredients Main ingredients

# () # ()

  • 5 ")

5 ")

  • $ ))"

$ ))"

  • % $"

% $"

  • 1"

1"

  • """

"""4 4 )$ )$

  • )

)

  • =(:)$)(

=(:)$)( $ $

B5&' B5&'

  • :"$

:"$

#"'$ #"'$

  • !$$ "?

!$$ "?

slide-12
SLIDE 12

Process warehouse model Process warehouse model

Challenges for a generic model Challenges for a generic model

– – Multi Multi-

  • level instance data

level instance data

Step level facts, process instance level facts, data Step level facts, process instance level facts, data-

  • related facts

related facts Facts may have to be self Facts may have to be self-

  • correlated

correlated

– – Business data complexities Business data complexities

Different from process to process Different from process to process Complex structures Complex structures Can change at every step during the process Can change at every step during the process

  • representation hard to generalize

representation hard to generalize

– – Process and steps executions go through a lifecycle Process and steps executions go through a lifecycle

Step status changes (created, activated, completed, etc Step status changes (created, activated, completed, etc --

  • ->

> process events mark progression); num of states can be process events mark progression); num of states can be unlimited (suspend/reactivate) unlimited (suspend/reactivate) Different systems supporting the execution have different Different systems supporting the execution have different lifecycle phases lifecycle phases

slide-13
SLIDE 13

Main elements of the generic Main elements of the generic warehouse model warehouse model

Single granularity for steps (rather than at the Single granularity for steps (rather than at the level of status changes) level of status changes) Single fact table for any step of any process Single fact table for any step of any process

– – Enables analyses across processes Enables analyses across processes – – Includes aggregation of most common step event Includes aggregation of most common step event measures measures

Correlation with previous step data handled via Correlation with previous step data handled via additional columns additional columns Separate business data tables for each process Separate business data tables for each process type type Blind links to handle step/process correlation Blind links to handle step/process correlation with business data with business data

slide-14
SLIDE 14

Process warehouse schema Process warehouse schema

Calendar_Date Calendar_Date_Key System_Date Day_of_W eek Day_Num ber_O f_W eek Day_Num ber_O f_Month Day_Num ber_In_Year W eek_Ending_Date W eek_Number_In_Month Employee Employee_Key Employee_UEID F irst_Name Last_Name Display_Name Employee_email Employee_Phone Process_Instance W ork_O bject_Key W ork_O bject_Id Process_Start_Date_Key (FK) Process_End_Date_Key (F K) Process_Duration Process_Start_Time_Key (FK) Process_End_Time_Key (F K) Process_Instance_Status_Key (F K) Entity_Table_Key Process_Instance_Status_Info Process_Instance_Status_Key Has_Exception_O ccurred Process_Instance_Status Proc_Inst_Last_Step_Created Proc_Inst_Last_Step_Completed Process_Type Process_Type_Key Process_Name Entity_Data_Table_Name Process_G roup Q ueue Q ueue_Key Q ueue_Type Q ueue_Name Task_Action_Info Task_Action_Key Task_Action Task_Action_Reason Task_Execution Top_W ork_O bject_Key (F K) Task_Type_Key (F K) Process_Type_Key (F K) Prev ious_Task_Type_Key (F K) Prev ious_Task_Action_Id (F K) State_New_Time_Id (F K) State_Active_Time_Id (FK) State_New_Date_Key (F K) State_Active_Date_Key (F K) Task_Action_Key (F K) New_To_Ended_Duration_Secs New_To_Active_Duration_Secs Activ e_To_Ended_Duration_Secs Task_F lag_Key (F K) Entity_Data_Key This_Q ueue_Key (F K) Prev ious_Q ueue_Key (FK) F inal_Proc_Key (F K) Initial_Proc_Key (F K) Prev ious_F inal_Proc_Key (F K) State_Ended_Date_Key (F K) State_Ended_Time_Key (F K) Message_Sent_Count Task_F lags Task_F lag_Key Is_F irst Is_Last Task_Status Has_Deadline_Expired Task_Type Task_Type_Key Task_Type_Desc Task_Max_Duration Entity_Data_Table_Name Time Tim e_Key Tim estamp_In_Seconds Hour Minutes Seconds AM _PM_F lag Invoice_Business_Data Inv

  • ice_Key

Inv

  • ice_Num ber

Receiv e_Date_Key Inv

  • ice_Status_Info_Key

Business_Center_Key Payment_Terms_Key Inv

  • ice_Reversal_Date_Key

Scan_Date_Key Inv

  • ice_Date_Key

Due_Date_Key Posting_Date_Key BaseLine_Date_Key Payment_Date_Key Vendor_Key Transaction_Currency_Key

slide-15
SLIDE 15

Main ingredients Main ingredients

# () # ()

  • 5 ")

5 ")

  • $ ))"

$ ))"

B,)%) B,)%)

  • % $"

% $"

  • 1"

1"

  • """

"""4 4 )$ )$

  • =(:)$)(

=(:)$)( $ $

B5&' B5&'

  • :"$

:"$

#"'$ #"'$

  • !$$ "?

!$$ "?

slide-16
SLIDE 16

Mapping events to abstract processes Mapping events to abstract processes

Two facets to provide abstracted process Two facets to provide abstracted process representations representations

1. 1. A way to A way to model model the abstraction the abstraction

Describe the high level process Describe the high level process Describe how its progression maps to underlying Describe how its progression maps to underlying IT events IT events

2.

  • 2. ETL mechanism to load warehouse with

ETL mechanism to load warehouse with abstracted process execution data abstracted process execution data

slide-17
SLIDE 17

Modeling abstract processes Modeling abstract processes

Describe the process flow & relevant biz data Describe the process flow & relevant biz data Specify how abstracted biz data is populated & Specify how abstracted biz data is populated & maintained maintained

– – Mappings between IT events and biz data Mappings between IT events and biz data – – Correlation logic between events and business data Correlation logic between events and business data instances & indirectly to correct process instance instances & indirectly to correct process instance

Specify how process progression is computed Specify how process progression is computed

– – Mappings between changes to business data and Mappings between changes to business data and start and completion of process steps start and completion of process steps

Associate steps to resources based on Associate steps to resources based on mappings to business data mappings to business data HP HP-

  • BPI

BPI

: 5 " )$

  • '

:

slide-18
SLIDE 18

Invoice (abstract data)

IT Event captured by a probe Progression information maps to process data changes Audit Msg

Invoice_ID=123 Amount=$100 response=OK

Mapping from IT event to Mapping from IT event to process progression process progression

IT event maps to process data changes

slide-19
SLIDE 19

Why indirect mapping of IT events to Why indirect mapping of IT events to process progression through changes process progression through changes to business data? to business data?

Many different events may cause the same Many different events may cause the same change to a business data item change to a business data item Same business data can be used to support and Same business data can be used to support and mark progression of instances of different mark progression of instances of different process types process types In practice, for abstract processes the In practice, for abstract processes the progression often depends on biz data changes progression often depends on biz data changes Benefits Benefits

– – Reduces specification & maintenance effort Reduces specification & maintenance effort – – Specs are more robust to changes in the info sources Specs are more robust to changes in the info sources (event specs updated but no need for biz data or (event specs updated but no need for biz data or progression info) progression info)

slide-20
SLIDE 20

Main ingredients Main ingredients

# () # ()

  • 5 ")

5 ")

  • $ ))"

$ ))"

B,)%) B,)%)

  • % $"

% $"

  • 1"

1"

  • """

"""4 4 )$ )$

  • )

)

  • =(:)$)(

=(:)$)( $ $

  • :"$

:"$

#"'$ #"'$

  • !$$ "?

!$$ "?

slide-21
SLIDE 21

Loading process data Loading process data

Modeling specs used by the ETL to map Modeling specs used by the ETL to map across levels of abstraction across levels of abstraction

IT events captured with probes and logged with IT events captured with probes and logged with timestamps timestamps ETL reads event tables in logs and orders them by ETL reads event tables in logs and orders them by time time Events are mapped to biz data changes Events are mapped to biz data changes Biz data changes are Biz data changes are ‘ ‘replayed replayed’ ’ in order and in order and relevant changes are detected for computing relevant changes are detected for computing process progression process progression Process progression creates records for the step Process progression creates records for the step execution data which are loaded into the execution data which are loaded into the warehouse warehouse

slide-22
SLIDE 22

Web server

Application server

ERP system Message broker … probe probe probe Data extraction Process progression and business data Event log Event log ERP log Message broker log Event log Business data changes Process mapping

Process to data mapping defs

probe probe

Data to event mapping defs

Event data Business data mapping Staging Area

Extraction & abstraction of process data Extraction & abstraction of process data

slide-23
SLIDE 23

ETL generation ETL generation

Automates Automates staging area staging area creation & creation & maintenance maintenance Automates generation of executable Automates generation of executable transformation scripts transformation scripts

– – Indirection of mappings from IT events to Indirection of mappings from IT events to process progression process progression Two Two-

  • phased

phased transformation stage transformation stage

Phase 1: IT events mapped to biz data changes Phase 1: IT events mapped to biz data changes Phase 2: biz data changes mapped to process Phase 2: biz data changes mapped to process progression progression

slide-24
SLIDE 24

Staging area Staging area

Three types of tables Three types of tables

Landing tables Landing tables

– – Buffering of extracted IT events data Buffering of extracted IT events data – – Checks for errors in the extraction Checks for errors in the extraction – – Refreshes at every cycle Refreshes at every cycle

Image tables Image tables

– – Keep an image of the IT events records extracted since the first Keep an image of the IT events records extracted since the first extraction extraction – – Input to first transformation phase Input to first transformation phase

Comparisons between landing & image tables Comparisons between landing & image tables

– – To detect duplicates To detect duplicates – – Determine manipulation operation (I, d, u) Determine manipulation operation (I, d, u)

Intermediate tables Intermediate tables

– – Output of first transformation phase Output of first transformation phase – – Business data changes Business data changes – – Input to second transformation phase Input to second transformation phase

slide-25
SLIDE 25

Intermediate tables Intermediate tables

Alternative design: 2 separate ETL Alternative design: 2 separate ETL processes but processes but … …

– – Inefficient Inefficient

Extraction and staging of business data changes Extraction and staging of business data changes Additional tables to keep all biz data changes to Additional tables to keep all biz data changes to mark process progression mark process progression

– – DW only stores the last version of a biz data instance DW only stores the last version of a biz data instance

%$$

  • :

: / ' +,

  • +

)$ %$$

  • 5

& 5

' +, ' $

  • &
slide-26
SLIDE 26

ETLTransformation ETLTransformation phases phases

  • É
  • !

"" ! "

  • ""

#"$

  • %""&
slide-27
SLIDE 27

Executable mapping generation Executable mapping generation

How to execute the transformations? How to execute the transformations?

– – Agnostic to underlying tool Agnostic to underlying tool – – Modeling: Modeling: declarative declarative mappings mappings – – Mapping Generator derives Mapping Generator derives prescriptive prescriptive mappings mappings

Two phases Two phases

– – Prescriptive Prescriptive logical logical mappings mappings Canonical language to express executable Canonical language to express executable semantics (pseudo semantics (pseudo-

  • SQL)

SQL) – – Prescriptive Prescriptive executable executable mappings mappings Specific translators (or manually) Specific translators (or manually) – – Orthogonal to the two transformation phases Orthogonal to the two transformation phases

slide-28
SLIDE 28

Mapping Generator Mapping Generator

Core: mapping templates Core: mapping templates

– – Parameterized logical scripts in canonical language Parameterized logical scripts in canonical language

Capture executable semantics Capture executable semantics

– – Factor out commonalities of mapping between the layers of Factor out commonalities of mapping between the layers of abstraction abstraction – – Exploits DW semantics Exploits DW semantics – – Captures other correspondences not specified by the Captures other correspondences not specified by the declarative mapping (e.g., duration) declarative mapping (e.g., duration)

Parameters: event Parameters: event-

  • , biz entity

, biz entity-

  • , process step

, process step-

  • related

related Templates instantiated by declarative mappings Templates instantiated by declarative mappings Different template types (e.g., Different template types (e.g., bizEntity_to_endStep bizEntity_to_endStep) ) Not executable Not executable Canonical language translator Canonical language translator

slide-29
SLIDE 29

Mapping generation phases Mapping generation phases

  • ""
  • ""

"" ' "" "" %"" " &

  • &
  • (

( '

  • )
  • (
  • *

#"

  • "

"

slide-30
SLIDE 30

Main ingredients Main ingredients

# () # ()

  • 5 ")

5 ")

  • $ ))"

$ ))"

B,)%) B,)%)

  • % $"

% $"

  • 1"

1"

  • """

"""4 4 )$ )$

  • )

)

  • =(:)$)(

=(:)$)( $ $

B5&' B5&'

  • :"$

:"$

  • !$$ "?

!$$ "?

slide-31
SLIDE 31

Prototyping Prototyping

Co Co-

  • development

development

– – Source data not available until very late Source data not available until very late – – Sources and data stores change frequently Sources and data stores change frequently – – Wrong reporting requirements initially Wrong reporting requirements initially – – Hard to begin BI development test before Hard to begin BI development test before completion of source application completion of source application – – Essential to rapidly prototype warehouse Essential to rapidly prototype warehouse solution solution

slide-32
SLIDE 32

Prototyping via emulation Prototyping via emulation

Testing requirements Testing requirements

– – Realistic data generation Realistic data generation – – Flexibility to simulate different conditions (e.g., Flexibility to simulate different conditions (e.g., resource unavailability, poor performances) resource unavailability, poor performances) – – Actually test the complete ETL process Actually test the complete ETL process – – Only by emulating the process Only by emulating the process-

  • based

based application application

slide-33
SLIDE 33

Emulation Emulation

Emulation environment that supports Emulation environment that supports

– – Events and data in the sources generated according Events and data in the sources generated according to correct process logic to correct process logic – – Data on resources that contribute to the step Data on resources that contribute to the step executions and correctly correlated to step execution executions and correctly correlated to step execution – – Meaningful business data associated with the process Meaningful business data associated with the process

Two main components Two main components

– – Process execution engine Process execution engine

Models process & controls its flow Models process & controls its flow

– – Data generator (web) service Data generator (web) service

Produces events of different types Produces events of different types

slide-34
SLIDE 34

Data generation Web service

getPoolValue(context, poolName, duration [, sql]) setEventGeneration(distribution, callbackURL, template )

Pool settings

getContext(distribution, callbackURL, template )

Emulated event data

Process emulation Process emulation

slide-35
SLIDE 35

Conclusions Conclusions

Our generic solution is original Our generic solution is original

– – Workflow analysis systems don Workflow analysis systems don’ ’t provide capabilities to t provide capabilities to

Generate a warehouse that is dependent of the business Generate a warehouse that is dependent of the business process process Collect & aggregate data coming from sources Collect & aggregate data coming from sources Support for process abstraction Support for process abstraction Support rapid prototyping Support rapid prototyping

– – Other mapping generation efforts exclusively match the Other mapping generation efforts exclusively match the users specified correspondences users specified correspondences

Solution can be implemented with a variety of Solution can be implemented with a variety of DBs DBs, ETL tools, reporting tools , ETL tools, reporting tools One caveat: abstraction can only apply when it is One caveat: abstraction can only apply when it is possible to associate process progression with IT possible to associate process progression with IT events events

slide-36
SLIDE 36

)? )?