Data-Centric Workflow and Business Processes (Victor Vianu and Jianwen Su)



slide-1
SLIDE 1

Data-Centric Workflow and Business Processes

Victor Vianu and Jianwen Su

slide-2
SLIDE 2

Outline

• Introduction (Jianwen)
• Theoretical work on analysis of data-driven workflows (Victor)
• Survey of issues in practical data-driven workflows (Jianwen)
• Discussions of Further Research Challenges (Jianwen & Victor)

20160413 2 Foundations of Data Management

slide-3
SLIDE 3

Business Processes & Workflow Management

• A BP is an assembly of (human) tasks to accomplish an objective
  - E.g., obtaining a permit
• Each workflow model matches a BP model
• Each workflow activity is a software program that interfaces with one task in the BP
• A WfM system manages executions, resources, documents, etc.
• Will be sloppy: BP ≈ Process ≈ Workflow; BPMS ≈ WfMS

[Figure: a BP (Application, Init review, Review, Approval, Fee, Certificate, Delivery) and its workflow, run by a Workflow Management (WfM) System]

slide-4
SLIDE 4

Rising Demand of WfMSs

• BPs/workflows are ubiquitous
  - old (traditional business apps) to new (e-science, healthcare, digital governmental apps)
  - all sizes (by # of tasks, # of enactments, …)
  - the number of workflow apps is rapidly increasing
• Digitization/IT causes pressure to scale and automate
  - E-documents, ability to get a lot of inputs: enhanced productivity → raised expectations of workflow
  - Rapidly changing environments: shorter idea-to-system time
• Rapidly growing market: “Suites” alone $2.7B in 2015 (Gartner)
• Workflow management remains an art
  - design, development (implementation), and making changes are mostly ad hoc and rely on human creativity

slide-5
SLIDE 5

5 Findings [2009 NSF Workshop on Data-Centric Workflows]

• The need for workflow management is ubiquitous
• Current workflow technologies provide inadequate support for a variety of essential functionalities
• For transactional workflow, a key inhibitor is the lack of intuitively clear ways to combine the various aspects of workflow
• There is a “long tail” phenomenon in applications that need and/or use workflow management technologies
• Application areas of business, digital government, healthcare, and scientific workflow face many common/overlapping problems, but are developing paradigms, techniques and tools largely in isolation

http://dcw2009.cs.ucsb.edu/

slide-6
SLIDE 6

Key Application Challenges

• Complexity
  - Detailed complexity, dynamic complexity
• Facilitate human interactions
  - Design stakeholders, client/performer
• Extra-functional aspects
  - Compliance, guarantees, security/privacy, …
• Variation, evolution, and long tail
  - E.g., legacy systems, …
• Workflow interoperation

Research Challenges

• Unifying holistic conceptual model for Wf/BP management [CHEVI]
• Design and runtime issues [CEVI]
• Reasoning [CVI]
• Workflow analytics/discovery/improvement [CHEV]
• Provenance [CE]
• Process mining [CHEV]
• Interoperation [CI]

http://dcw2009.cs.ucsb.edu/

slide-7
SLIDE 7

Five Classes of Biz Process Models

• Data agnostic: data mostly not present
  - WF (Petri) nets, BPMN, UML activity diagrams, …
• Data-aware: data (variables) present but missing storage and management
  - BPEL, YAWL, BPMN 2.0 (?), …
• Storage-aware: persistent stores, but mappings to/from biz process data managed ad hoc
  - jBPM, Activiti, JTang, …
• Artifact-centric: entity (biz data) and lifecycle
  - GSM (Barcelona), (D)EZ-Flow, …
• Universal artifact: add automation for modeling all five types, data-storage mapping
  - Universal Artifacts (UA)

slide-8
SLIDE 8

Outline

• Introduction (Jianwen)
• Theoretical work on analysis of data-driven workflows (Victor)
• Survey of issues in practical data-driven workflows (Jianwen)
• Discussions of Further Research Challenges (Jianwen & Victor)

slide-9
SLIDE 9

Outline

• Introduction (Jianwen)
• Theoretical work on analysis of data-driven workflows (Victor)
• Survey of issues in practical data-driven workflows (Jianwen)
  - Independence of Data and Execution Management
  - Workflow Management
  - Workflow Interoperation
• Discussions of Further Research Challenges (Jianwen & Victor)

slide-10
SLIDE 10

Enterprise System Architectures

• Evolve around information/database systems
• Multiple applications with overlapping data

[Figure: architecture stacks labeled 1970’s (Application, OS) and 1980’s (Application, DBMS, OS), followed by separate HR Application, Purchase Order Manag., and Warehouse Management systems with their own stores (HR DB, POM DB, WH data files)] [Weske 2012]

slide-11
SLIDE 11

Enterprise Resource Planning (ERP)

• Composed of (extended) database systems and application-specific software for applications since 2000’s
• Typically too complex to integrate or interoperate: very challenging

[Figure, labeled 1990’s: Enterprise Resource Planning with application systems (product planning, marketing & sales, inventory management, ...) on a Database System (standard DBMS technology), alongside Supply Chain Management and Customer Relation Management]

slide-12
SLIDE 12

Key Obstacle: Ad Hoc Workflow Management in ERP

• Most enterprise applications include business processes/workflows
• WfMSs: handcrafted, out-sourced, or standalone (jBPM, Activiti, …)
• BPs are key assets/functions of enterprises

[Figure: the ERP architecture (application systems for product planning, marketing & sales, inventory management, ... on a Database System with standard DBMS technology), annotated “Workflow Management: ad hoc, handcrafted, many context dependent decisions”]

slide-13
SLIDE 13

Typical Workflow Management System Architecture

• Used in YAWL, jBPM, Activiti, Barcelona, …, and possibly systems from major vendors
• During execution, data can be held in each of the shaded shapes → creates many problems

[Figure: a WfMS with an Execution Engine, a Local data store, and Task wrappers, connected to an Enterprise database. Annotation: includes all data required for control flow decisions, correlations, …] [van der Aalst-van Hee 2004]

slide-14
SLIDE 14

Example: Enterprise Database Fails

• DBMS does recovery, but data may not be consistent with data in the local store, engine, and wrappers
• Other failures are similar: local data store
  - Engine (workflow “log”?)
  - Wrapper (more trouble if keeping persistent data, e.g., MVC)
• Other difficulties: process change, compliance, …

[Figure: the WfMS architecture (Execution Engine, Local data store, Task wrappers, Enterprise database) with a failing Enterprise database; labels “ShippingAddress: undefined” and “Update ShippingAddress”]

[S.-Yang EVL-BP15]

slide-15
SLIDE 15

Independence of Data Management and Execution Management

• Clean separation of responsibilities
  - WfMS: Execution
  - DBMS: Data
• Allows divide-and-conquer for management functions
  - Helps in many aspects

Execution independence: the freedom to change the process execution system while leaving conceptual BP models unchanged, and vice versa

[Sun-S.-Yang BPM14 TMIS16]

slide-16
SLIDE 16

Intersections of Databases & Software Engineering

• Could benefit a wide range of data-intensive or data-centric software applications

Currently lacks:

• Models, frameworks, and principles
• Theoretical foundations
• Tools and techniques

slide-17
SLIDE 17

Conceptualizing Running Workflows (or Other Software)

• Each workflow (BP) instance consists of a universal artifact and a lifecycle
• Data mappings are ad hoc in current development practice
• Primitive mapping support: ADO.NET Entity Framework [Melnik-Adya-Bernstein TODS08]

[Figure: workflow instances mapped to a database]

slide-18
SLIDE 18

Example: Enterprise Database (& Lifecycle)

• Includes keys, foreign keys, and a cardinality specification on each foreign key

Lifecycle tasks (w = write, r = read):
  - Repair Application: w(ID), w(Customer Name), r(Customer Address), . . .
  - Application Review: . . .
  - Repairperson Assignment: w(Service ID), w(Repairperson Name), w(Repairperson Phone), . . .
  - Document Archive: . . .
  - On-site Repair: w(Material ID), w(Material), . . .
  - Post-repair Visit: . . .

Database tables:
  - tUser: tLastName, tFirstName, tPhone, tAddress, . . .
  - tRepair: tRepairID, tCustomerLN, tCustomerFN, tReason, tDate, . . .
  - tServiceInfo: tServiceID, tRerpairID_SI, tTime, . . .
  - tRepairPerson: tServiceID_P, tRerpairpersonLN, tRepairpersonFN, . . .
  - tMaterialInfo: tMaterialID, tServiceID_MI, tMaterial, . . .
  - tReview: tReviewID, tServiceID_R, tReviewResult, . . .

Foreign-key cardinality annotations shown in the figure: *, *, *, *, +, ?

slide-19
SLIDE 19

Example: The (Universal) Artifact

Tuple and (nested) set constructs

[Figure: the nested artifact structure. Attribute names shown: aID (unique), aRepairInfo, aCustomer, aRepair, aReason, aDate, aCust_Name, aCust_Addr, aCust_Last_Name, aCust_First_Name, aService (unique), aTime, aRepairPerson, aService_Info, aRP LastName, aRP FirstName, aRP Phone, aMaterial (unique in aMaterial Info), aMaterial_Info, aReviewID (unique), aResult, aReview_Info]

slide-20
SLIDE 20

Data Mapping Idea

Database tables:

ServiceInfo
SIID  Date   RepairID
S01   11/15  R101
S02   11/29  R102
S03   12/17  R101

RepairInfo
RIID  CustName
R101  David
R102  Peter

User
UserName  Address
Peter     A2
David     A1
James     A3

Artifact Repair: RID, Customer (CName, Addr), Services { (SID, Date), . . . }
Instance shown: RID = R101, Customer = (David, A1), Services = { (S01, 11/29), . . ., (S03, 12/17) }

Mapping rule: Addr = Addr.Customer.RID @RepairInfo(RIID).CustName @User(UserName).Address

[Sun-S.-Wu-Yang ICDE14]
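The mapping rule for Addr can be read as a chain of key lookups: from the artifact's RID into RepairInfo, then from CustName into User. Below is a minimal Python sketch of that reading, assuming each table is a dictionary keyed by its primary key. The function name `build_repair_artifact` and the dictionary layout are illustrative assumptions, not the cited mapping language. The service dates are taken from the ServiceInfo table as listed.

```python
# Toy tables from the slide, each keyed by its primary key (layout is an assumption).
repair_info = {"R101": {"CustName": "David"}, "R102": {"CustName": "Peter"}}
user = {
    "Peter": {"Address": "A2"},
    "David": {"Address": "A1"},
    "James": {"Address": "A3"},
}
service_info = {
    "S01": {"Date": "11/15", "RepairID": "R101"},
    "S02": {"Date": "11/29", "RepairID": "R102"},
    "S03": {"Date": "12/17", "RepairID": "R101"},
}


def build_repair_artifact(rid):
    """Assemble one nested Repair artifact by following key lookups (hypothetical helper)."""
    cname = repair_info[rid]["CustName"]   # RID @RepairInfo(RIID).CustName
    addr = user[cname]["Address"]          # ... @User(UserName).Address
    services = [
        {"SID": sid, "Date": row["Date"]}
        for sid, row in sorted(service_info.items())
        if row["RepairID"] == rid          # nested set: all services of this repair
    ]
    return {"RID": rid, "Customer": {"CName": cname, "Addr": addr}, "Services": services}


artifact = build_repair_artifact("R101")
print(artifact)
```

Keying each table by its primary key keeps every step of the path a single dictionary lookup, which mirrors the `@Table(key).attribute` notation.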

slide-21
SLIDE 21

Example: Cross Reference Paths

• aID : tRepair.tRepairID
• aReason = aReason.aRepairInfo.aID@tRepair(tRepairID).tReason
• aCustAddr = aCustAddr.aCust_Name.[aCust_Last_Name, aCust_First_Name]@tUser(tLastName, tFirstName).tAddress

[Figure: artifact attributes (aID unique, aRepairInfo, aCustomer, aRepair, aReason, aDate, aCust_Name, aCust_Addr, aCust_Last_Name, aCust_First_Name) next to tables tUser (tLastName, tFirstName, tPhone, tAddress, . . .) and tRepair (tRepairID, tCustomerLN, tCustomerFN, tReason, tDate, . . .)]

[Sun-S.-Wu-Yang ICDE14]

slide-22
SLIDE 22

Updatability

• Database updatability: for each update Δd on d, there is an update Δe such that Δe(µ(d)) = µ(Δd(d))
• Artifact updatability: for each update Δe on µ(d), there is an update Δd such that µ(Δd(d)) = Δe(µ(d))

[Figure: commuting diagram. µ maps DB d to the workflow data e = µ(d); Δd takes d to Δd(d); Δe takes µ(d) to Δe(µ(d)); µ maps Δd(d) to Δe(µ(d))]
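The two updatability conditions both say that the square formed by µ, Δd, and Δe commutes. Here is a minimal Python sketch that checks the equation µ(Δd(d)) = Δe(µ(d)) on a toy database. The mapping `mu`, the two update functions, and the data are all illustrative assumptions. The second check shows one candidate translation failing when two artifacts share the same user row. It does not search over all possible Δd.

```python
import copy


def mu(db):
    """Toy mapping: one artifact per RepairInfo row, with the customer's address looked up."""
    return {
        rid: {"CName": row["CustName"], "Addr": db["User"][row["CustName"]]["Address"]}
        for rid, row in db["RepairInfo"].items()
    }


def delta_d(db, user_name, new_addr):
    """Database update: change one user's address (returns a new database)."""
    out = copy.deepcopy(db)
    out["User"][user_name]["Address"] = new_addr
    return out


def delta_e(artifacts, rid, new_addr):
    """Artifact update: change Addr of a single artifact instance."""
    out = copy.deepcopy(artifacts)
    out[rid]["Addr"] = new_addr
    return out


d = {
    "RepairInfo": {"R101": {"CustName": "David"}, "R102": {"CustName": "Peter"}},
    "User": {"David": {"Address": "A1"}, "Peter": {"Address": "A2"}},
}

# Each user backs exactly one artifact: the artifact update translates to a DB update.
commutes = mu(delta_d(d, "David", "A9")) == delta_e(mu(d), "R101", "A9")

# Two artifacts now share David's row: this candidate translation also changes R103.
d2 = copy.deepcopy(d)
d2["RepairInfo"]["R103"] = {"CustName": "David"}
commutes_overlap = mu(delta_d(d2, "David", "A9")) == delta_e(mu(d2), "R101", "A9")

print(commutes, commutes_overlap)
```

The failing case illustrates why overlap between artifacts makes the backward direction hard.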

slide-23
SLIDE 23

Updatability & View Update

• Database updatability: forward, can always be done
• Artifact updatability: backward, often not possible

Very closely related to the database view update problem [Bancilhon-Spyratos TODS81]
  - View complement [BS81] [Lechtenbörger et al PODS03]
  - Clean source [Dayal-Bernstein TODS82] [Wang et al DKE06]

• For cross-ref-path expressions: every non-overlapping UA map is updatable [Sun-S.-Wu-Yang ICDE14]

slide-24
SLIDE 24

Isolation of BP Instances

• µ is isolating if each update on a single artifact (instance) does not affect write (and/or read) attributes of other artifact instances
• Main result: isolation can be tested
  - Testing “conflicting” updates
  - EXPTIME with conditional updates

[Figure: µ maps a snapshot of DB d to artifact instances]

[Sun-S.-Wu-Yang ICDE14]

slide-25
SLIDE 25

Connecting Artifacts and Databases

• Fundamentals
  - What are these mappings?
  - Updatability, what else?
  - Mapping languages
• Design principles
  - Isolation, for lifecycles? runtime mechanisms?
  - Data design completeness, needs ontology
  - Implementability: translating IOPEs on artifact to DB
• Transactions
  - Workflow vs databases

slide-26
SLIDE 26

Outline

• Introduction (Jianwen)
• Theoretical work on analysis of data-driven workflows (Victor)
• Survey of issues in practical data-driven workflows (Jianwen)
  - Independence of Data and Execution Management
  - Workflow Management
  - Workflow Interoperation
• Discussions of Further Research Challenges (Jianwen & Victor)

slide-27
SLIDE 27

Four Stages of Biz Process Life Cycle

• Need to manage process models (schemas)

[Figure: life cycle with four stages: (re-)design, implementation/configuration, enactment/monitoring, diagnosis/requirements]

slide-28
SLIDE 28

Workflow Management

• Examples of what’s needed:
  - Design tools, discovery, analytics, runtime support, changes, composition, constructing views, …
• A rough classification of management operations:

Output \ Input    | A repository of workflow schemas | (Raw) log (a sequence of records)
Workflow schemas  | Model manipulations              | Process mining
Execution traces  | Static analysis                  | Log query
Data tables etc.  | Resource planning                | Process mining, BI

slide-29
SLIDE 29

Model Manipulations

• Why query a model repository?
  - Find workflows (schemas) that do “this” from a repository (e.g., ~200K schemas in CNR)
  - “This” may be a single activity, a sequence of activities, a graph of activities, or even one with cycles
    • Could involve data contents, e.g., purchases including an X-ray machine, without manager’s approval, …
• Views/model translations
  - Behavior preserving, and: quality of models, understandability
• Workflow change/evolution

slide-30
SLIDE 30

Repository Queries

• A classification of about 20 approaches: [Wang-Jin-Wong-Wen WWWJ13]
  - Exact match vs similarity based
  - Graph vs behaviors
    • e.g., [Deutch-Milo JCSS12], VisTrail [Scheidegger-Vo-Koop-Freire-Silva SIGMOD08]
  - Operation semantics (ontology)
• May return models/schemas or “top” traces
• Overlapping vs containment (equivalence)
• Data not included

slide-31
SLIDE 31

Types of Process Mining

• The following types of process mining can be distinguished:
  - Determine basic performance metrics
  - Determine process model
  - Determine organizational model
  - Analyze social network (i.e., relations between actors)
  - Analyze performance characteristics (i.e., derive rules explaining performance)

[Figure: an example process (Start, Register order, Prepare shipment, Ship goods, (Re)send bill, Receive payment, Contact customer, Archive order, End) and an “If … then …” rule]

slide-32
SLIDE 32

Determine Basic Performance Metrics

• Process/control-flow perspective: flow time, waiting time, processing time, synchronization time, e.g.
  - What is the average flow time of orders?
  - What is the maximum waiting time for activity approve?
  - What percentage of requests is handled within 10 days?
  - What is the minimum processing time of activity reject?
  - What is the average time between scheduling an activity and actually starting it?
• Resource perspective: frequencies, time, utilization, and variability, e.g.
  - How many times did Sue complete activity reject claim?
  - How many times did John withdraw activity go shopping?
  - How many times did Clare suspend some running activity?
  - How much time did Peter work on instances of activity reject claim?
  - How much time did people with role Manager work on this process?
  - What is the utilization of John?
  - What is the average utilization of people with role Manager?
  - How many times did John work for more than 2 hours without interruption?
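Several of the control-flow questions reduce to simple aggregations once each log record carries a case id, an activity name, and start and end times. The following is a minimal Python sketch under that assumption. The log contents, the time unit (hours), the 10-hour threshold, and the function names are invented for illustration and do not come from any particular tool.

```python
# Hypothetical event log: (case id, activity, start hour, end hour).
events = [
    ("c1", "register", 0, 2), ("c1", "approve", 5, 6),
    ("c2", "register", 1, 2), ("c2", "reject", 10, 13),
    ("c3", "register", 3, 4), ("c3", "approve", 4, 8),
]


def flow_times(events):
    """Flow time per case: end of its last activity minus start of its first."""
    first_start, last_end = {}, {}
    for case, _activity, start, end in events:
        first_start[case] = min(start, first_start.get(case, start))
        last_end[case] = max(end, last_end.get(case, end))
    return {case: last_end[case] - first_start[case] for case in first_start}


def processing_times(events, activity):
    """Processing time of every occurrence of one activity."""
    return [end - start for _case, act, start, end in events if act == activity]


ft = flow_times(events)
avg_flow = sum(ft.values()) / len(ft)                      # average flow time
share_within_10 = sum(t <= 10 for t in ft.values()) / len(ft)  # fraction handled within 10 hours
min_reject = min(processing_times(events, "reject"))       # minimum processing time of "reject"

print(ft, avg_flow, share_within_10, min_reject)
```

Resource-perspective questions follow the same pattern once each record also names the resource that performed the activity.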

slide-33
SLIDE 33

Example (ARIS PPM)

• IDS Scheer’s ARIS Process Performance Manager

slide-34
SLIDE 34

Determine Organizational Model

• Discover the organizational model (i.e., roles, departments, etc.) without prior knowledge about the structure of the organization

slide-35
SLIDE 35

Analyze Social Network

• Social Network Analysis (SNA)
• Based on:
  - Handover of work
  - Subcontracting
  - Working together
  - Reassignments
  - Doing similar tasks
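As an illustration of the first basis, handover of work, the sketch below counts how often one resource is directly followed by a different resource within the same case. It assumes log records of the form (case, activity, resource), listed in execution order. The data and the function name `handover_counts` are hypothetical, and real SNA metrics are typically more refined than this raw count.

```python
from collections import Counter, defaultdict

# Hypothetical log in execution order: (case id, activity, resource).
log = [
    ("c1", "register", "Sue"), ("c1", "check", "John"), ("c1", "decide", "Clare"),
    ("c2", "register", "Sue"), ("c2", "check", "John"), ("c2", "decide", "John"),
]


def handover_counts(log):
    """Count direct handovers (a -> b, a != b) between consecutive events of a case."""
    resources_by_case = defaultdict(list)
    for case, _activity, resource in log:
        resources_by_case[case].append(resource)
    counts = Counter()
    for resources in resources_by_case.values():
        for giver, receiver in zip(resources, resources[1:]):
            if giver != receiver:  # the same person continuing is not a handover
                counts[(giver, receiver)] += 1
    return counts


handovers = handover_counts(log)
print(handovers)
```

The resulting counts can serve as edge weights of a directed graph over resources.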

slide-36
SLIDE 36

Analyze Performance Characteristics

• Each case (process/workflow instance) has a number of properties:
  - Resource that worked on a specific activity
  - Value of a characteristic data element (e.g., size of order, age of customer, etc.)
  - Performance metrics of case (e.g., flow time)
• Using machine-learning techniques it is possible to find relevant relations between these properties
• Examples:
  - If John and Mike work together, it takes longer
  - Expensive cases require less processing
  - . . .

caseid | Act A | Act B | ... | Act Z | Data D1 | Data D2 | ... | Data D9 | Proc time | Wait time | Flow time
1      | John  | Mike  | ... | Anne  | $50     | 20y     | ... | 80%     | 12h       | 3d        | 3.5d
2      | Clare | Jim   | ... | Ike   | $75     | 15y     | ... | 75%     | 6h        | 3d        | 3.25d
3      | John  | Mike  | ... | Clare | $55     | 20y     | ... | 80%     | 18h       | 4d        | 4.75d
...    | ...   | ...   | ... | ...   | ...     | ...     | ... | ...     | ...       | ...       | ...

slide-37
SLIDE 37

Process Mining: Determine Process Model

• Discover a process model (e.g., in terms of a Petri net or workflow net) without prior knowledge about the structure of the process
• Useful for:
  - Process discovery (What is the process?)
  - Delta analysis (Are we doing what was specified?)
  - Performance analysis (How can we improve?)

Activity log (in recorded order):
case 1 : task A, case 2 : task A, case 3 : task A, case 3 : task B, case 1 : task B, case 1 : task C,
case 2 : task C, case 4 : task A, case 2 : task B, case 2 : task D, case 5 : task E, case 4 : task C,
case 1 : task D, case 3 : task C, case 3 : task D, case 4 : task B, case 5 : task F, case 4 : task D

[Figure: the discovered process model over tasks A, B, C, D, E, F]

slide-38
SLIDE 38

Process Discovery Criteria

• Fitness: the discovered model should allow for the behavior seen in the event log
• Precision: the discovered model should not allow for behavior completely unrelated to what was seen in the event log
• Generalization: the discovered model should generalize the example behavior seen in the event log
• Simplicity: the discovered model should be as simple as possible

slide-39
SLIDE 39

Alpha Algorithm

• Minimal information in log: case id’s and task id’s
• Additional data ignored: event type, time, resources, and data
• This example log has three possible sequences (traces):
  - ABCD
  - ACBD
  - EF
• Alternative log representation: [ABCD², ACBD², EF]

Example log (in recorded order):
case 1 : task A, case 2 : task A, case 3 : task A, case 3 : task B, case 1 : task B, case 1 : task C,
case 2 : task C, case 4 : task A, case 2 : task B, case 2 : task D, case 5 : task E, case 4 : task C,
case 1 : task D, case 3 : task C, case 3 : task D, case 4 : task B, case 5 : task F, case 4 : task D

[Figure: the α algorithm takes an event log as input and produces a process model; the illustration is a large discovered model with Dutch task labels]
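The step from the interleaved log to the trace multiset can be sketched in a few lines of Python: group the records by case id, keep the recorded order within each case, and count identical traces. The log below is the example log from the slide. The variable and function names are illustrative assumptions.

```python
from collections import Counter, defaultdict

# The example log from the slide, in recorded order: (case id, task id).
raw_log = [
    (1, "A"), (2, "A"), (3, "A"), (3, "B"), (1, "B"), (1, "C"),
    (2, "C"), (4, "A"), (2, "B"), (2, "D"), (5, "E"), (4, "C"),
    (1, "D"), (3, "C"), (3, "D"), (4, "B"), (5, "F"), (4, "D"),
]


def traces_of(log):
    """Group records by case id, preserving order, and join task ids into a trace."""
    tasks_by_case = defaultdict(list)
    for case, task in log:
        tasks_by_case[case].append(task)
    return {case: "".join(tasks) for case, tasks in tasks_by_case.items()}


traces = traces_of(raw_log)
trace_multiset = Counter(traces.values())  # trace -> number of cases with that trace
print(trace_multiset)
```

Cases 1 and 3 yield ABCD, cases 2 and 4 yield ACBD, and case 5 yields EF, which is the multiset written as [ABCD², ACBD², EF].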

slide-40
SLIDE 40

Four Ordering Relations >, →, ||, #

• Direct succession: x > y iff in some case x is directly followed by y
• Causality: x → y iff x > y ∧ y ≯ x
• Parallel: x || y iff x > y ∧ y > x
• Choice: x # y iff x ≯ y ∧ y ≯ x

Example, for the log [ABCD², ACBD², EF]:
  >  : A>B, A>C, B>C, B>D, C>B, C>D, E>F
  →  : A→B, A→C, B→D, C→D, E→F
  || : B||C, C||B
  #  : A#D, A#E, …

• Footprint:

      A    B    C    D    E    F
  A   #    →    →    #    #    #
  B   ←    #    ||   →    #    #
  C   ←    ||   #    →    #    #
  D   #    ←    ←    #    #    #
  E   #    #    #    #    #    →
  F   #    #    #    #    ←    #
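The four relations and the footprint follow mechanically from the set of direct successions. Here is a minimal Python sketch that derives them for the traces ABCD, ACBD, and EF. The function name `footprint` and the ASCII symbols `->` and `<-` (standing for → and ←) are choices made for this sketch.

```python
from itertools import product

traces = ["ABCD", "ACBD", "EF"]  # distinct traces of the example log


def footprint(traces):
    """Return the tasks, the direct-succession pairs (x > y), and the footprint table."""
    tasks = sorted({task for trace in traces for task in trace})
    # x > y iff y directly follows x in some trace
    succ = {(x, y) for trace in traces for x, y in zip(trace, trace[1:])}
    table = {}
    for x, y in product(tasks, repeat=2):
        if (x, y) in succ and (y, x) in succ:
            table[x, y] = "||"   # parallel
        elif (x, y) in succ:
            table[x, y] = "->"   # causality x -> y
        elif (y, x) in succ:
            table[x, y] = "<-"   # reverse causality y -> x
        else:
            table[x, y] = "#"    # choice (never directly adjacent)
    return tasks, succ, table


tasks, succ, table = footprint(traces)
for x in tasks:
    print(x, " ".join(f"{table[x, y]:>2}" for y in tasks))
```

Only the distinct traces matter here, because the relations ignore how often a trace occurs.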

slide-41
SLIDE 41

α Algorithm and Process Mining

• Given a sound and structured workflow net N, a log L of N is complete if (1) L contains all possible > relations in N and (2) every transition of N occurs in L
• Given a sound & structured workflow net N such that no two transitions share input places and share output places, and a complete log L of N, α(L) = N
• Can’t handle short loops (size < 3); an improvement: α+
  - Given a complete log, α/α+ can mine any sound & structured wf-net
• Not considered: frequencies, data
• Other approaches: causal nets, Markov model (state machines)
  - Genetic algorithms, “region-based” mining (e.g., state sets or languages)

[vanderAalst+ TKDE04]

slide-42
SLIDE 42

Mining Processes with Data

• The majority of existing process mining methods cannot be applied directly to artifact-centric BPs
  - Existing process mining techniques are tailored to classical monolithic processes, where each process execution can be described just by the flow of activities
  - For processes with n-to-m relationships (among log records), classical process mining techniques yield incomprehensible results
• Possible approach:
  - A chain of methods that can be applied to discover artifact lifecycle models in GSM notation

[Popova-Fahland-Dumas IJCIS15]

slide-43
SLIDE 43

Artifact Discovery: The Main Idea

• All events may be recorded together in a single log, without any case/artifact-based grouping

Timestamp   | EventType        | Data (attribute-value pairs)
11-24,17:12 | ReceivePO        | items=(it0), POrderID=1
11-24,17:13 | CreateMO         | supplier=supp6, items=(it0), POrderID=1, MOrderID=1
11-24,19:56 | ReceiveMO        | supplier=supp6, items=(it0), POrderID=1, MOrderID=1
11-24,19:57 | ReceiveSupplResp | supplier=supp6, items=(it0), POrderID=1, MOrderID=1, answer=accept
11-25,07:20 | ReceiveItems     | supplier=supp6, items=(it0), POrderID=1, MOrderID=1
11-25,08:31 | Assemble         | items=(it0), POrderID=1, MOrderID=1
11-25,08:53 | ReceivePO        | items=(it0,it1,it2,it3), POrderID=2

[Popova-Fahland-Dumas IJCIS15]
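One way to picture what a case-less log means is to project it onto each candidate identifier attribute. The Python sketch below does that for the log excerpt above. It assumes the identifier attributes (POrderID, MOrderID) are already known, whereas the cited approach has to discover them. The function name `project_by_id` is hypothetical, and only the identifier attributes of each record are kept.

```python
from collections import defaultdict

# The log excerpt from the slide, reduced to its identifier attributes.
events = [
    ("11-24,17:12", "ReceivePO", {"POrderID": 1}),
    ("11-24,17:13", "CreateMO", {"POrderID": 1, "MOrderID": 1}),
    ("11-24,19:56", "ReceiveMO", {"POrderID": 1, "MOrderID": 1}),
    ("11-24,19:57", "ReceiveSupplResp", {"POrderID": 1, "MOrderID": 1}),
    ("11-25,07:20", "ReceiveItems", {"POrderID": 1, "MOrderID": 1}),
    ("11-25,08:31", "Assemble", {"POrderID": 1, "MOrderID": 1}),
    ("11-25,08:53", "ReceivePO", {"POrderID": 2}),
]


def project_by_id(events, id_attr):
    """Group event types by the value of one identifier attribute (assumed known)."""
    per_instance = defaultdict(list)
    for _timestamp, event_type, data in events:
        if id_attr in data:  # events that do not mention this entity are skipped
            per_instance[data[id_attr]].append(event_type)
    return dict(per_instance)


po_traces = project_by_id(events, "POrderID")
mo_traces = project_by_id(events, "MOrderID")
print(po_traces)
print(mo_traces)
```

The same event can land in the projections of several entity types, which is the n-to-m situation that classical case-based mining does not handle.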

slide-44
SLIDE 44

Artifact Structure Discovery

• Does not assume any database schemas
• Log records have attribute-value pairs
• Entity precedence: creation time of entities (instances)

[Popova-Fahland-Dumas IJCIS15]

slide-45
SLIDE 45

Outline

• Introduction (Jianwen)
• Theoretical work on analysis of data-driven workflows (Victor)
• Survey of issues in practical data-driven workflows (Jianwen)
  - Independence of Data and Execution Management
  - Workflow Management
  - Workflow Interoperation
• Discussions of Further Research Challenges (Jianwen & Victor)

slide-46
SLIDE 46

Web Services Collaboration Models (Extreme Cases)

• Orchestration: hub-and-spoke or mediated
• Choreography: peer to peer

[Figure: Store, Seller, Warehouse, and Bank collaborating directly, and the same four collaborating through a Mediator]

Web Services approaches are fundamentally process centric. A data-centric approach can enable
  • Explicit modeling of correlations
  • Mediation at scale

slide-47
SLIDE 47

Choreography of Web Services

• A choreography defines how biz processes should collaborate to achieve a business goal
• Goal: support for choreography languages:
  - Design “correctness”, auto realization, mechanisms for monitoring, ...
  - However: the notion of correlation is not explicitly modeled

[Figure: a choreography among Store, Seller, Warehouse, and Bank]

slide-48
SLIDE 48

A Classification of Collaboration Models

Two dimensions: control flow and data.

Control flow:
• Distributed (choreography): WSCDL, Let’s Dance, Conv. Protocol, Choreography4Artifacts
• Centralized (orchestration): BPEL, BPMN, YAWL, Roman Services, HUB

Data:
• No logical data model
• Logical data model, centralized data
• Logical data model, distributed data

slide-49
SLIDE 49

Two Key Aspects of Choreography Languages

• Data: if the amount is less than $100, then a fulfillment instance can ship even if the check has not been received
• Instance-level correlation: which process instances are correlated at runtime? Who sends messages to whom?
• Use of artifacts at each node can provide systematic support for correlations [Sun-Xu-S. ICSOC12]

[Figure: process instances O1, P1, P2, P3, M1, F1, F2 of types Order, Purchase, Payment, Fulfillment]

slide-50
SLIDE 50

Correlation Diagrams

• Two artifact instances are correlated if they are involved in a common collaborative BP instance
  - Messaging only between correlated instances
• Correlations of a collaborative BP are defined in a diagram, with one artifact as the root or primary process
  - Directed edge indicates creation of BP instance(s)
  - Cardinality constraints are also defined
  - Some syntactic restrictions (acyclic, “1” on root, …)
• Correlations can also be derived

[Figure: correlation diagram over the artifacts Payment, Purchase, Fulfillment, and Order, for the participants store, bank, warehouse, and seller, with edge cardinalities 1 and m]

slide-51
SLIDE 51

Interoperation Hub Example

• Participants’ interactions with the hub:
  - Read data from the hub
  - Perform tasks for hub that progress artifacts along their lifecycle
  - Subscribe for notifications based on conditions, transitions, etc.
• Performed using SOA/REST, but in an unconventional manner
• Data-centric, supports rather than controls

[Figure: Hiring Interoperation Hub with artifacts Job Opening and Job Application; stakeholders: Candidate Pool, Travel Agencies, Hiring Departments, Evaluator Pool, Human Resources (HR), Reimbursement. Annotation: Stakeholder Organizations of type “Hiring Dept”: System also tracks individual people in each stakeholder organization]

[Hull et al ICSOC09, SRIIGC12]

slide-52
SLIDE 52

Outline

• Introduction (Jianwen)
• Theoretical work on analysis of data-driven workflows (Victor)
• Survey of issues in practical data-driven workflows (Jianwen)
  - Independence of Data and Execution Management
  - Workflow Management
  - Workflow Interoperation
• Discussions of Further Research Challenges (Jianwen & Victor)

slide-53
SLIDE 53

Discussions and Further Research Challenges

• Unified holistic conceptual models (data + process)
• Design tools, reasoning, views/transformation, process quality
• Runtime issues: data consistency, wf transactions, logging
• Process mining
• Process improvement
• Provenance
• Interoperation

slide-54
SLIDE 54

Reasoning and Synthesis Problems

• Verification: given all four elements
• Incremental checking
  - E.g., for process changes
• Synthesis: given artifacts, activities, and a property, construct process model
• Functional properties and non-functional properties

[Figure: Artifacts (data model) + Activities (services, tasks) + Process Model (e.g., GSM, with rules such as “if C enable …”) ⊨ Property ϕ]

slide-55
SLIDE 55

Interoperation/Collaboration

• Very similar:
• Realization problem: construct process models given the other 3
• Also interesting to verify, e.g., LTL+FO properties of collaborative BPs

[Figure: Artifacts (data model) + Activities (services, tasks) + Process Models with communication (rules such as “if C enable …”) ⊨ C (choreography)]

slide-56
SLIDE 56

Process Model Management (Data-Centric Wfs)

• Static analysis is most studied; approximate or incremental verification needed
• BI in the DB community, in absence of workflow models
• Very little research in other categories
• Wf log: “fuzzy” notion but central to many problems

Output \ Input    | A repository of workflow schemas | (Raw) log (a sequence of records)
Workflow schemas  | Model manipulations              | Process mining
Execution traces  | Static analysis                  | Log query
Data tables etc.  | Resource planning                | Process mining, BI

slide-57
SLIDE 57

Execution Management

• Data management: data mappings
  - Specification, maintenance of consistency
• Workflow transactions and db transactions
• Logging
• Combining workflow and analytics:
  - Workflow execution → Wrangling → Machine Learning → Wf change