Ontology-based Data Access for Extracting Event Logs from Legacy Data: The onprom Tool and Methodology

slide-1
SLIDE 1

Ontology-based Data Access for Extracting Event Logs from Legacy Data: The onprom Tool and Methodology

D. Calvanese¹, T. E. Kalayci¹, M. Montali¹, S. Tinella²

¹ KRDB Research Centre for Knowledge and Data, Free University of Bozen-Bolzano (Italy)
² EBITmax srl (Italy)

slide-2
SLIDE 2

Table of contents

  • 1. Introduction
  • 2. Case Study and Motivation
  • 3. The onprom Tool and Methodology
  • 4. Conclusions and Future Work

20th Int. Conf. on Business Information Systems, 30 June 2017

D. Calvanese, T. E. Kalayci, M. Montali, S. Tinella

slide-3
SLIDE 3

Introduction

slide-4
SLIDE 4

Introduction

Organizations are increasingly recognizing the importance of analyzing their business processes for

  • quality assurance
  • optimization
  • continuous improvement

Process Mining [1]

  • The most promising and effective framework to tackle this need
  • It is at the intersection of model-driven engineering and data science

slide-5
SLIDE 5

Introduction

Insights are automatically extracted from event data, which represent the footprint of process executions inside the company, in order to [2]

  • discover and enrich process models
  • provide operational support
  • check compliance
  • analyse bottlenecks
  • compare process variants
  • suggest improvements.

A plethora of process mining techniques and technologies is available across several application domains¹.

¹ http://tinyurl.com/ovedwx4

slide-6
SLIDE 6

Applicability

The applicability of process mining depends on two crucial factors:

  • the availability of high-quality event data
  • the representation in a format that is understandable by process mining algorithms

slide-7
SLIDE 7

First Setting

Adopting a business process or enterprise management system that logs cases, events and corresponding attributes explicitly

  • XESame [3], ProMimport [4], Disco², Celonis³, and Minit⁴ support the conversion from CSV or spreadsheet files into XES
  • Techniques for the extraction of event logs from redo-logs of relational databases [5]
  • Techniques that leverage relational technology to access the event log directly, instead of materializing it into XML [6]

² https://fluxicon.com/disco/
³ http://www.celonis.de/en/
⁴ http://www.minitlabs.com
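
To make the CSV-to-XES conversion in this first setting concrete, here is a minimal Python sketch. It is not how any of the tools above work internally. The column names (`case_id`, `activity`, `timestamp`) and the sample rows are assumptions, and the output is a stripped-down XES document that omits extension declarations.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import defaultdict


def csv_to_xes(csv_text, case_col="case_id", activity_col="activity", time_col="timestamp"):
    """Group CSV rows by case and serialize them as a minimal XES document."""
    traces = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        traces[row[case_col]].append(row)

    log = ET.Element("log", {"xes.version": "1.0"})
    for case_id, rows in traces.items():
        trace = ET.SubElement(log, "trace")
        ET.SubElement(trace, "string", {"key": "concept:name", "value": case_id})
        # ISO 8601 timestamps with the same offset sort correctly as plain strings.
        for row in sorted(rows, key=lambda r: r[time_col]):
            event = ET.SubElement(trace, "event")
            ET.SubElement(event, "string", {"key": "concept:name", "value": row[activity_col]})
            ET.SubElement(event, "date", {"key": "time:timestamp", "value": row[time_col]})
    return ET.tostring(log, encoding="unicode")


# Hypothetical sample data: the file already lists cases and events explicitly.
SAMPLE_CSV = """case_id,activity,timestamp
po-1,SubmitOrder,2015-01-10T09:00:00+01:00
po-1,PaySupplier,2015-02-01T12:00:00+01:00
po-2,SubmitOrder,2015-01-11T08:30:00+01:00
"""

xes_xml = csv_to_xes(SAMPLE_CSV)
```

The conversion is this simple only because the input already names the case, the activity, and the timestamp of every event.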

slide-8
SLIDE 8

Second Setting

Adopting a more general management system, configuring it for its own specific needs, and combining it with domain-specific databases and other legacy data sources

  • Cases and events may not be explicitly stored, but instead implicitly present inside the company information system
  • There is no single notion of case and related events; they change depending on the perspective of interest, the focus of the company, etc.
  • There are not enough techniques, methodologies, and tools to support domain experts and process analysts in such a setting
  • In that case:
      • Logs are extracted manually (in the style of extract-transform-load (ETL))
      • This is a redundant, labor-intensive, and error-prone process

slide-9
SLIDE 9

Our Proposal

  • An approach based on conceptual modeling to semi-automate the extraction of event logs from legacy information systems by leveraging the technique first presented in [7]
  • In our approach (onprom), humans focus only on the conceptual issues involved in the extraction:
      • Which are the relevant concepts and relations?
      • How do such concepts/relations map to the underlying information system?
      • Which concepts/relations relate to the notion of case, event, and event attributes?
  • Once this information is provided, the log extraction process is handled in a fully automated way, leveraging the ontology-based data access (OBDA) [8, 9, 10] paradigm.

slide-10
SLIDE 10

Case Study and Motivation

slide-11
SLIDE 11

Case Study

EBITmax⁵ provides consultancy services in program management and BPM for a number of small and large enterprises.

  • Recently, they incorporated process mining to complement their standard consultancy services, enriching and comparing models with fine-grained insights automatically extracted from data, and accounting for how business processes are executed in reality.
  • They ran a pilot project on the service provisioning and financial processes of Markas⁶: the analysis of the Accounts Payable Process (App), which Markas uses to handle payments to external suppliers and the corresponding invoices.
  • For the internal management of the App, Markas does not employ a workflow management system, but relies on shared guidelines on how to handle payments, and on an ERP system to track the executed operations.
  • Markas management would like to understand whether the App is executed as expected and, if not, where deviations appear for the orders created in 2015.

⁵ http://www.ebitmax.it
⁶ http://www.markas.com/en/home.html

slide-12
SLIDE 12

Traditional Methodology

[Flowchart of the traditional methodology: Create data model · Choose perspective · Extract relevant tables · Design views with relevant attributes · Design composite views · Design log view · Export to XES/CSV · Do process mining · Other perspective? (Y/N)]

Preparing Conceptual Data Model

The first step is the preparation of a conceptual data model that accounts for the data maintained in the ERP at a higher level of abstraction. The model

  • supports discussion with managers and domain experts about the semantics of such data
  • provides the basis to understand where and how the data are stored within the ERP

slide-15
SLIDE 15

Traditional Methodology

Choosing a perspective

The second step consists in combining the research question with the data model, so as to choose a perspective for the analysis, and in particular deciding:

  1. the subject of the analysis, i.e., which notion of case to adopt
  2. which relevant events should be considered in the evolution of cases
  3. which event attributes should be included

slide-16
SLIDE 16

Data Model Excerpt of App

[Figure: UML class diagram excerpt of the App data model, with case and event annotations]

Classes:

  • PO (poNo : int, poCrTime : ts)
  • ActivePO (subTime : ts)
  • TD (tdNo : int, regTime : ts)
  • Invoice (invNo : int, invCrTime : ts, payTime : ts [0..1])

Associations: "refers to" and "is for" (multiplicities 0..1)

Annotations:

  • Case
  • Event SubmitOrder (Timestamp: subTime; Case: this)
  • Event GetTD (Timestamp: regTime; Case: refers to⁻)
  • Event RegisterInvoice (Timestamp: invCrTime; Case: is for)
  • Event PaySupplier (Timestamp: payTime; Case: is for)

slide-21
SLIDE 21

Traditional Methodology

Manual Construction of Data

EBITmax started a fine-grained analysis of the ERP system and its underlying database to extract the desired information manually:

  1. identification of the relevant tables
  2. creation of filter views that maintain, and clean up, the relevant information
  3. merging of filter views into composite views that provide a higher level of abstraction
  4. creation of a single log view that coherently rearranges the information present in accordance with the chosen perspective for case, events, and attributes
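
The four manual steps can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the tables `purchase` and `invoice`, their columns, and the sample rows are hypothetical and far simpler than a real ERP schema, and the views are not the ones EBITmax wrote.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Step 1: the relevant tables (hypothetical stand-ins for ERP tables).
CREATE TABLE purchase (order_no TEXT, posting_date TEXT);
CREATE TABLE invoice (inv_no TEXT, order_no TEXT, created TEXT, paid TEXT);
INSERT INTO purchase VALUES ('15-001', '2015-01-10'), ('15-002', NULL), ('14-900', '2014-12-01');
INSERT INTO invoice VALUES ('I-1', '15-001', '2015-01-20', '2015-02-01');

-- Step 2: filter views keep and clean up the relevant rows.
CREATE VIEW v_orders AS
    SELECT order_no, posting_date AS submitted
    FROM purchase WHERE order_no LIKE '15%' AND posting_date IS NOT NULL;
CREATE VIEW v_invoices AS
    SELECT order_no, created, paid FROM invoice WHERE order_no LIKE '15%';

-- Step 3: a composite view joins the filter views.
CREATE VIEW v_order_invoice AS
    SELECT o.order_no, o.submitted, i.created, i.paid
    FROM v_orders o LEFT JOIN v_invoices i ON i.order_no = o.order_no;

-- Step 4: the log view rearranges everything as (case, activity, timestamp).
CREATE VIEW v_log AS
    SELECT order_no AS case_id, 'SubmitOrder' AS activity, submitted AS ts FROM v_order_invoice
    UNION ALL
    SELECT order_no, 'RegisterInvoice', created FROM v_order_invoice WHERE created IS NOT NULL
    UNION ALL
    SELECT order_no, 'PaySupplier', paid FROM v_order_invoice WHERE paid IS NOT NULL;
""")

log_rows = conn.execute("SELECT case_id, activity, ts FROM v_log ORDER BY case_id, ts").fetchall()
```

Every view hard-codes the choice of the order as the case, so adopting a different case notion means rewriting the whole stack of views.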

slide-22
SLIDE 22

Traditional Methodology

Log Extraction and Process Mining

Finally, EBITmax converted the log view into a CSV file and analysed it using the Disco process mining toolkit⁷.

⁷ https://fluxicon.com/disco/

slide-24
SLIDE 24

Results and Experienced Issues

Results

  • One of the most interesting, and quite common, deviations: orders submitted and paid without registering the transport document
  • Markas realized (by looking at the data) that it was caused by the introduction of digital invoices in the Italian market

Experienced Issues

  • The manual creation of views requires detailed knowledge of the ERP tables and is a demanding and error-prone task.
  • Whenever the perspective of the analysis changes, it is necessary to go through all the data preparation phases again. This contrasts with process mining best practices:
      1. quality assurance of the input event log
      2. feasibility of quickly going through several batches of analysis by changing perspective on the company’s data

slide-25
SLIDE 25

The onprom Tool and Methodology

slide-26
SLIDE 26

The onprom Methodology

  • We assume the existence of a legacy information system I = ⟨R, D⟩, with schema R and set D of facts about the domain of interest.
  • In the typical case where the information system is a relational database:
      • R accounts for the schema of the tables and their columns
      • D is a set of data structured according to such tables
  • On top of I, our methodology is centered on the usage of conceptual models in two respects:
      1. As documentation artifacts that explicitly capture not only knowledge about the domain of interest, but also how legacy information systems relate to that knowledge.
      2. As computational artifacts to automate the extraction process as much as possible.

slide-27
SLIDE 27

The onprom Methodology

[Flowchart of the onprom methodology: High-level IS? (Y/N) · Create data model · Create mappings · Bootstrap model + mappings · Enrich model + mappings · Choose perspective · Create case + event annotations · Export to XES/CSV · Do process mining · Other perspective? (Y/N)]

Conceptual data model T

Captures the structural knowledge of the domain of interest, using

  • UML class diagrams as a concrete language, and
  • a corresponding logic-based, formal encoding in terms of OWL 2 QL

slide-28
SLIDE 28

The onprom Methodology

The mapping specification M

  • Explicitly links I to T
  • Consists of a set of logical implications that map patterns of data over schema R to high-level facts over T
  • Patterns over the data D are expressed as queries over R (e.g., SQL SELECT statements, when R is relational), while facts over T are expressed as logical terms involving objects

slide-29
SLIDE 29

Mapping Specification Examples

  • order/{oid} poNo {oid} .

    SELECT No AS oid FROM "Markas-Purchase" WHERE No LIKE '15%'

    Each value oid that is stored in the No column of Markas-Purchase and that begins with 15 corresponds to an object term order/oid in the ontology.

  • order/{oid} rdf:type ActivePO .

    SELECT No AS oid FROM "Markas-Purchase" WHERE No LIKE '15%' AND "Posting Date" IS NOT NULL

    Each order tuple in Markas-Purchase identified by value oid is mapped to an object order/oid of type ActivePO in T whenever its posting date has a non-null value.
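
As a rough illustration of how such assertions can be read, the sketch below applies the two mappings to a toy table and materializes the resulting facts as triples. This is a simplification under stated assumptions: the sample rows are invented, the double-quoted identifiers are an assumption about how the table and column names would be quoted, and `materialize` is a hypothetical helper, not part of onprom or ontop.

```python
import sqlite3

# Each mapping pairs a source SQL query with a triple template over the ontology.
# The two entries mirror the assertions above; identifier quoting is an assumption.
MAPPINGS = [
    ('SELECT "No" AS oid FROM "Markas-Purchase" WHERE "No" LIKE \'15%\'',
     ("order/{oid}", "poNo", "{oid}")),
    ('SELECT "No" AS oid FROM "Markas-Purchase" '
     'WHERE "No" LIKE \'15%\' AND "Posting Date" IS NOT NULL',
     ("order/{oid}", "rdf:type", "ActivePO")),
]


def materialize(conn, mappings):
    """Run each source query and instantiate its template once per answer row."""
    facts = set()
    for sql, template in mappings:
        cursor = conn.execute(sql)
        columns = [d[0] for d in cursor.description]
        for row in cursor:
            binding = dict(zip(columns, row))
            facts.add(tuple(part.format(**binding) for part in template))
    return facts


# Hypothetical data: one posted 2015 order, one unposted 2015 order, one 2014 order.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Markas-Purchase" ("No" TEXT, "Posting Date" TEXT)')
conn.executemany(
    'INSERT INTO "Markas-Purchase" VALUES (?, ?)',
    [("150001", "2015-01-10"), ("150002", None), ("140900", "2014-12-01")],
)
facts = materialize(conn, MAPPINGS)
```

On this data only order 150001 becomes an ActivePO, while both 2015 orders receive a poNo fact.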

slide-30
SLIDE 30

Ontology-based Data Access

  • ⟨I, T, M⟩ constitutes an OBDA system
  • When a user poses a conceptual query Q (using SPARQL) over T, the OBDA system
      1. leverages T and M to automatically reformulate Q as a corresponding concrete query Q′ over I;
      2. submits Q′ to I; and
      3. automatically translates the so-obtained answers into meaningful answers over the vocabulary of T.
  • This approach is conceptually identical to the one in which the mapping M is used à la ETL to materialize data from D as facts over T, with the advantage that:
      (i) users do not need to code procedures for data extraction
      (ii) data are not replicated
      (iii) data are retrieved using the standard query engine of the information system
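
The three steps can be illustrated with a deliberately tiny Python sketch that handles only queries of the form "all instances of class C". It is not the algorithm used by ontop or any other OBDA engine, which support full SPARQL and optimize the generated SQL. The subclass relation between ActivePO and PO, the sample rows, and the helper names `reformulate` and `answer` are assumptions made for illustration.

```python
import sqlite3

# Toy mapping M: ontology class -> SQL query returning the object identifiers.
# Class and table names follow the App example; the data below is made up.
CLASS_MAPPINGS = {
    "ActivePO": 'SELECT "No" AS oid FROM "Markas-Purchase" '
                'WHERE "No" LIKE \'15%\' AND "Posting Date" IS NOT NULL',
}
# Toy ontology T: subclass -> superclass (an assumption for illustration).
SUBCLASS_OF = {"ActivePO": "PO"}


def reformulate(class_name):
    """Rewrite the conceptual query 'all instances of class_name' into SQL (Q to Q')."""
    # Step 1a: use T to collect the class and all of its subclasses.
    relevant = {class_name}
    changed = True
    while changed:
        changed = False
        for sub, sup in SUBCLASS_OF.items():
            if sup in relevant and sub not in relevant:
                relevant.add(sub)
                changed = True
    # Step 1b: use M to unfold each mapped class into its source query.
    parts = [CLASS_MAPPINGS[c] for c in sorted(relevant) if c in CLASS_MAPPINGS]
    return " UNION ".join(parts)


def answer(conn, class_name, template="order/{oid}"):
    sql = reformulate(class_name)
    if not sql:
        return []
    # Step 2: submit Q' to I. Step 3: translate rows back into ontology terms.
    return sorted(template.format(oid=row[0]) for row in conn.execute(sql))


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Markas-Purchase" ("No" TEXT, "Posting Date" TEXT)')
conn.executemany('INSERT INTO "Markas-Purchase" VALUES (?, ?)',
                 [("150001", "2015-01-10"), ("150002", None)])
purchase_orders = answer(conn, "PO")
```

The query asks for PO, yet no mapping mentions PO: the answer comes from combining the ontology (ActivePO is a kind of PO) with the ActivePO mapping, and no facts are ever copied out of the database.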

slide-31
SLIDE 31

Ontology-based Data Access

[Figure: OBDA architecture in three layers. Ontology: global vocabulary, conceptual view. Mappings: how to populate the ontology. Data sources: external and heterogeneous. Arrow labels: query, result.]

Logical transparency in accessing data: the user does not know where and how the data is stored, and can only see a conceptual view of the data.

slide-32
SLIDE 32

Annotating the Data Model

Once the OBDA system is ready, the user chooses a perspective, which corresponds to annotating T with a set L of event data annotations that indicate:

  1. which class in T (possibly with additional restrictions) represents a case,
  2. which events are present in T and to which classes they refer,
  3. which attributes are attached to events, and where they are located in T.

slide-33
SLIDE 33

Annotating the Data Model

Case Annotation

  • The case annotation specifies the class that constitutes the reference point for the analysis
  • Each object instantiating the case class represents an instance of the process according to the chosen perspective, and provides the basis for correlating events
  • The set of all events referring to the same case object forms a trace for that object
  • Additional restrictions on which instances to consider can be applied

slide-34
SLIDE 34

Annotating the Data Model

Event Annotation

  • Specifies that the annotated class provides information about the occurrence(s) of a type of event that is relevant for the chosen perspective
  • To discover which classes in T may be subject to an event annotation, our methodology combines two constraints:
      1. each event class has to be directly or indirectly linked to the case class
      2. each event class has to be directly or indirectly linked to a timestamp attribute, providing the information on when instances of such an event occurred
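
One way to picture the two constraints is as reachability checks over the class diagram seen as a graph. The sketch below is an assumption-laden illustration, not the tool's implementation: the association edges, the extra classes Supplier and Address, and the decision to treat every link as navigable in both directions are all hypothetical.

```python
from collections import deque

# Toy class diagram: undirected links plus timestamp attributes per class.
# Class names follow the App excerpt; the exact edges are an assumption.
ASSOCIATIONS = [("TD", "ActivePO"), ("Invoice", "ActivePO"), ("ActivePO", "PO"),
                ("Supplier", "Address")]
TIMESTAMPS = {"PO": ["poCrTime"], "ActivePO": ["subTime"], "TD": ["regTime"],
              "Invoice": ["invCrTime", "payTime"]}


def reachable(start):
    """Return every class linked to start directly or indirectly (breadth-first search)."""
    neighbours = {}
    for a, b in ASSOCIATIONS:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in neighbours.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def event_candidates(case_class):
    """Classes satisfying both constraints: linked to the case class and to a timestamp."""
    linked_to_case = reachable(case_class)
    return sorted(c for c in linked_to_case
                  if any(TIMESTAMPS.get(r) for r in reachable(c)))


candidates = event_candidates("ActivePO")
```

With ActivePO as the case class, the candidates are ActivePO, Invoice, PO, and TD. Supplier and Address are excluded because they are not linked to the case class, and choosing Supplier as the case class yields no candidates at all, since no timestamp is reachable from it.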

slide-35
SLIDE 35

Annotating the Data Model

Attribute Annotations

  • They decorate event annotations with information about their features.
  • Mandatory attributes are:
      • case (how to reach the case class from the event class)
      • timestamp (how to reach the timestamp attribute for the event class)
      • activity (a constant string or a navigational path specifying the activity to which the event refers)
  • Optional attributes are related to resources

slide-36
SLIDE 36

Automatic Log Extraction

An event log represented in the XES standard is automatically extracted from I:

  • The mapping specification is combined with the annotations, and a series of queries is answered to find:
      • all the cases present in the information system
      • for each case, all the events referring to that case, together with the corresponding attribute values
  • The expert works at the conceptual level, does not need to code ad-hoc views, and can carry out multi-perspective process mining
  • Each new perspective requires a new set of annotations, but T and M remain unchanged
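
The effect of annotations on extraction can be sketched in a few lines of Python. This sketch works on in-memory objects rather than through query rewriting, so it illustrates only the idea that a perspective is just data. The object values, the dictionary layout of the annotations, and the function `extract_log` are hypothetical.

```python
# Toy instance data, keyed by class (hypothetical values standing in for OBDA answers).
OBJECTS = {
    "ActivePO": [{"id": "order/1", "subTime": "2015-01-10"}],
    "Invoice": [
        {"id": "inv/7", "invCrTime": "2015-01-20", "payTime": "2015-02-01", "is for": "order/1"},
        {"id": "inv/8", "invCrTime": "2015-03-05", "payTime": None, "is for": "order/1"},
    ],
}

# One perspective = one case annotation plus a set of event annotations.
CASE_CLASS = "ActivePO"
EVENT_ANNOTATIONS = [
    {"class": "ActivePO", "activity": "SubmitOrder", "timestamp": "subTime", "case": "id"},
    {"class": "Invoice", "activity": "RegisterInvoice", "timestamp": "invCrTime", "case": "is for"},
    {"class": "Invoice", "activity": "PaySupplier", "timestamp": "payTime", "case": "is for"},
]


def extract_log(objects, case_class, annotations):
    """Return {case id: [(timestamp, activity), ...]} with events sorted by time."""
    traces = {obj["id"]: [] for obj in objects.get(case_class, [])}
    for ann in annotations:
        for obj in objects.get(ann["class"], []):
            ts, case_id = obj.get(ann["timestamp"]), obj.get(ann["case"])
            # An event exists only if its timestamp is present and its case is known.
            if ts is not None and case_id in traces:
                traces[case_id].append((ts, ann["activity"]))
    return {case_id: sorted(events) for case_id, events in traces.items()}


log = extract_log(OBJECTS, CASE_CLASS, EVENT_ANNOTATIONS)

# A second perspective reuses the same data and changes only the annotations.
invoice_log = extract_log(
    OBJECTS, "Invoice",
    [{"class": "Invoice", "activity": "PaySupplier", "timestamp": "payTime", "case": "id"}],
)
```

Switching from orders to invoices as the case notion touches only the annotation list; the objects, which play the role of T and M here, stay the same.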

slide-37
SLIDE 37

The onprom Tool

  • We have developed a toolchain named onprom, consisting of various plug-ins of the ProM extensible process mining framework⁸ [3]:
      1. a UML Editor, to design the domain ontology
      2. an Annotation Editor, allowing the domain expert to specify the event data annotations
      3. a Log Extractor, used to extract the XES event log from the underlying database, based on the annotated domain ontology and the mapping specification.
  • They are implemented as separate projects in Java
  • They exchange data relying on the mechanisms built into ProM, when used as ProM plug-ins
  • They can also be used as stand-alone tools that operate using files for input and output (as an onprom toolchain)

⁸ http://www.promtools.org/

slide-38
SLIDE 38

Annotated App Data Model in Annotation Editor

slide-39
SLIDE 39

Conclusions and Future Work

slide-40
SLIDE 40

Conclusions

  • We proposed a framework for the extraction of event logs from legacy information systems and compared it with manual extraction using a real case study
  • It comes with a methodology centered around conceptual models
      • to capture domain knowledge,
      • to link such knowledge to the underlying data, and
      • to annotate such knowledge with event-related information, reflecting the chosen perspective for process mining.
  • The framework is supported by a toolchain
      • to handle such conceptual models, and
      • to automatically extract an XES event log in accordance with the chosen perspective, leveraging ontology-based data access (OBDA) techniques.
  • The toolchain exploits OBDA features offered by ontop and is fully integrated with the ProM process mining framework.

slide-41
SLIDE 41

Future Work

  • We are currently investigating (together with EBITmax) the concrete application of onprom to the Markas case study
  • We are also experimenting with onprom on a conference submission system use case
  • We are actively working on extending the annotation editor
  • Currently, we rely on what is natively offered by ontop for the specification of mappings.
  • A natural next step is to manage that within our toolchain, leveraging recent approaches to the graphical specification of mappings developed within the recently concluded Optique EU Project⁹

⁹ http://optique-project.eu

slide-42
SLIDE 42

Questions and Thanks

Web Site

Please visit http://onprom.inf.unibz.it for more information and related papers, to download onprom, and to watch screencasts.

Acknowledgement

This research has been partially supported by the Euregio IPN12 “KAOS: Knowledge-Aware Operational Support” project, which is funded by the “European Region Tyrol-South Tyrol-Trentino” (EGTC), and by the UNIBZ internal project “OnProm”. We thank Ario Santoso for the development of the log extraction plug-in of onprom, and Wil van der Aalst for the interesting discussions and insights on the problem of extracting event logs from legacy information systems.

slide-43
SLIDE 43

Additional Slides

slide-45
SLIDE 45

Bootstrapping

  • The creation of a suitable data model and mapping specification is a labor-intensive and challenging task
  • If the information system has a “high-level” structure, such a phase can be (partially) automated through bootstrapping techniques [11]
  • Bootstrapping synthesizes a conceptual data model that mirrors the structure of the information system, together with suitable mappings
  • The result of bootstrapping can be manually improved and enriched towards the creation of the final OBDA system
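
A naive flavor of bootstrapping can be sketched as follows: read the relational schema, create one class per table, and emit one mapping per non-key column. This is only an illustration under assumptions, not the technique of [11]. The `invoice` table, the triple-template format, and the function `bootstrap` are hypothetical, and the sketch relies on SQLite's `sqlite_master` catalog and `PRAGMA table_info`.

```python
import sqlite3


def bootstrap(conn):
    """Derive one class per table and one attribute mapping per column (naive sketch)."""
    model, mappings = {}, []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        pk = next((c[1] for c in columns if c[5]), None)
        if pk is None:
            continue  # without a key we cannot mint object identifiers
        model[table] = [c[1] for c in columns if c[1] != pk]
        for attr in model[table]:
            mappings.append((
                f'SELECT "{pk}" AS id, "{attr}" AS val FROM "{table}" WHERE "{attr}" IS NOT NULL',
                (f"{table}/{{id}}", attr, "{val}"),
            ))
    return model, mappings


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (inv_no INTEGER PRIMARY KEY, created TEXT, paid TEXT)")
model, mappings = bootstrap(conn)
```

The result mirrors the database one-to-one, which is why the enrichment step matters: renaming classes, adding associations, and refining mappings is where domain knowledge enters.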

slide-46
SLIDE 46

UML Editor

  • We have developed a graphical editor for UML class diagrams to support domain experts in the design of ontologies expressed in OWL 2 QL. The editor can import standard OWL 2 QL ontologies, in order to modify and enhance an already existing or independently developed OWL 2 QL ontology.
  • The developed UML class diagram can be saved in a proprietary JSON format for further processing and as input for the Annotation Editor. It can also be exported as a standard OWL 2 QL ontology, hence ready to be processed by ontop.
  • The graphical layout information is maintained in the form of OWL 2 annotations, resulting in an ontology fully compliant with the W3C standard.

slide-47
SLIDE 47

UML Editor

To keep the UML Editor lightweight, and to guarantee at the same time that the designed UML class diagrams can indeed be expressed in OWL 2 QL, the following simplifying assumptions have been made on the form of UML class diagrams supported by the tool:

  1. we do not support completeness of UML generalization hierarchies, since the presence of such a construct would fundamentally undermine the virtual OBDA approach based on query reformulation [9]
  2. in line with Semantic Web languages, we explicitly support only associations of arity 2, and currently do not support association classes
  3. multiplicities in associations (resp., of attributes) are restricted to be either 0 or 1. Hence, we can express functionality and mandatory participation
  4. we do not support ISA between associations
  5. we ignore all those features of UML class diagrams that are more relevant for the software engineering perspective, and less for the conceptual perspective of UML (stereotypes, method specifications, and aggregations)

slide-48
SLIDE 48

Annotation Editor

  • We have developed an Annotation Editor that supports the different forms of annotation.
  • It gives process mining experts a simple, intuitive way to specify the event data annotations over the domain-specific ontology T.
  • The editor supports some advanced operations to simplify the annotation task:
      1. Properties and paths can be chosen using navigational selection over the diagram via mouse-click operations.
      2. The editor takes into account multiplicities on associations and attributes; when the user is selecting properties of the case and of events (in particular the timestamp), the editor enables only navigation paths that are functional, thus guaranteeing that the properties to include in the extracted log are uniquely determined.
  • The annotated domain ontology can be exported using a proprietary JSON format, which can then be imported by the log extraction plug-in.
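
The functionality check on navigation paths can be sketched as below. This is a guess at the underlying idea rather than the editor's code: the multiplicity table, the association name `invoices`, and the function `is_functional` are hypothetical.

```python
# Maximum multiplicity of each association or attribute, read from the first class.
# "1" means functional, "*" means many. Names follow the App excerpt; values are assumed.
MAX_MULTIPLICITY = {
    ("Invoice", "is for"): "1",
    ("Invoice", "payTime"): "1",
    ("ActivePO", "subTime"): "1",
    ("ActivePO", "invoices"): "*",
}
# Target class reached by following an association (attributes stay on the same class).
TARGET = {("Invoice", "is for"): "ActivePO", ("ActivePO", "invoices"): "Invoice"}


def is_functional(start_class, path):
    """A path is functional if every step yields at most one value."""
    current = start_class
    for step in path:
        if MAX_MULTIPLICITY.get((current, step)) != "1":
            return False
        current = TARGET.get((current, step), current)
    return True


invoice_to_submission = is_functional("Invoice", ["is for", "subTime"])
order_to_payment = is_functional("ActivePO", ["invoices", "payTime"])
```

Under these assumed multiplicities the first path is functional, so an invoice determines a single submission time, while the second is not, because an order may have several invoices and therefore several payment times.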

slide-49
SLIDE 49

Log Extraction Plugin

  • The two previous plug-ins support the design phase of the log extraction framework
  • This plug-in is deployed in the event log extraction phase to support the automated extraction of event logs that are compatible with XES
  • This plug-in makes use of the following inputs:
      1. the information system I = ⟨R, D⟩, with the corresponding database schema R;
      2. the domain ontology T, e.g., as generated via the UML Editor;
      3. the mapping specification M between T and R;
      4. the annotations L, which are created using the Annotation Editor.
  • It exploits the query rewriting functionalities provided by ontop to generate from the above inputs a new mapping specification M_log, which establishes a direct correspondence between the data D in I and the elements of XES (i.e., trace, event, ...), essentially bypassing the domain ontology T.
  • An ontology capturing the main elements of XES, together with M_log and I, constitutes a new OBDA system that is then used for the event log extraction, by relying on the data access functionalities of ontop.

slide-50
SLIDE 50

References i

  [1] W. van der Aalst et al., “Process mining manifesto,” in Proc. of the Business Process Management (BPM) Int. Workshops, vol. 99 of LNBIP, pp. 169–194, Springer, 2012.
  [2] W. M. P. van der Aalst, Process Mining - Data Science in Action. Springer, 2nd ed., 2016.
  [3] H. M. W. Verbeek, J. C. A. M. Buijs, B. F. van Dongen, and W. M. P. van der Aalst, “XES, XESame, and ProM 6,” in Information Systems Evolution – Selected Extended Papers of the CAiSE Forum, vol. 72 of LNBIP, pp. 60–75, Springer, 2010.
  [4] C. W. Günther and W. M. P. van der Aalst, “A generic import framework for process event logs,” in Proc. of the Business Process Management (BPM) Int. Workshops, vol. 4103 of LNCS, pp. 81–92, Springer, 2006.

slide-51
SLIDE 51

References ii

  [5] W. M. P. van der Aalst, Extracting Event Data from Databases to Unleash Process Mining, pp. 105–128. Springer, 2015.
  [6] A. Syamsiyah, B. F. van Dongen, and W. M. P. van der Aalst, “DB-XES: enabling process discovery in the large,” in Proc. of the 6th Int. Symposium on Data-driven Process Discovery and Analysis (SIMPDA), vol. 1757 of CEUR, ceur-ws.org, pp. 63–77, 2016.
  [7] D. Calvanese, M. Montali, A. Syamsiyah, and W. M. P. van der Aalst, “Ontology-driven extraction of event logs from relational databases,” in Proc. of BPI, vol. 256 of LNBIP, pp. 140–153, Springer, 2016.
  [8] A. Poggi, D. Lembo, D. Calvanese, G. De Giacomo, M. Lenzerini, and R. Rosati, “Linking data to ontologies,” J. on Data Semantics, vol. X, pp. 133–173, 2008.

slide-52
SLIDE 52

References iii

  [9] D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, A. Poggi, M. Rodriguez-Muro, and R. Rosati, “Ontologies and databases: The DL-Lite approach,” in RW Tutorial Lectures, vol. 5689 of LNCS, pp. 255–356, Springer, 2009.
  [10] D. Calvanese, B. Cogrel, S. Komla-Ebri, R. Kontchakov, D. Lanti, M. Rezk, M. Rodriguez-Muro, and G. Xiao, “Ontop: Answering SPARQL queries over relational databases,” Semantic Web J., vol. 8, no. 3, pp. 471–487, 2017.
  [11] E. Jiménez-Ruiz, E. Kharlamov, D. Zheleznyakov, I. Horrocks, C. Pinkel, M. G. Skjæveland, E. Thorstensen, and J. Mora, “BootOX: Bootstrapping OWL 2 Ontologies and R2RML Mappings from Relational Databases,” in Proc. of ISWC Posters & Demonstrations Track, vol. 1486 of CEUR, ceur-ws.org, 2015.

slide-53
SLIDE 53

References iv

  [12] B. Motik, B. Cuenca Grau, I. Horrocks, Z. Wu, A. Fokoue, and C. Lutz, “OWL 2 Web Ontology Language profiles (second edition),” W3C Recommendation, W3C, Dec. 2012. Available at http://www.w3.org/TR/owl2-profiles/.
  [13] IEEE Computational Intelligence Society, “IEEE Standard for eXtensible Event Stream (XES) for Achieving Interoperability in Event Logs and Event Streams,” IEEE Std 1849-2016, pp. i–50, 2016.
  [14] N. Antonioli, F. Castanò, S. Coletta, S. Grossi, D. Lembo, M. Lenzerini, A. Poggi, E. Virardi, and P. Castracane, “Ontology-based data management for the Italian public debt,” in Proc. of FOIS, vol. 267 of Frontiers in Artificial Intelligence and Applications, pp. 372–385, IOS Press, 2014.