Processes @ your Service Using Process Mining to Turn Big Data into - - PowerPoint PPT Presentation



slide-1
SLIDE 1

33

PAGE 0

Processes @ your Service

Using Process Mining to Turn Big Data into Real Value

prof.dr.ir. Wil van der Aalst

slide-2
SLIDE 2

Web Engineering

PAGE 1

model, specify, configure, implement processes

observe behavior (e.g., event data)

slide-3
SLIDE 3

PAGE 2

slide-4
SLIDE 4

PAGE 3

Big Data ?

slide-5
SLIDE 5

PAGE 4

Big … or fast and efficient?

slide-6
SLIDE 6

PAGE 5

The future is bright, but how to get started? What are the main pitfalls of process modeling? What is process mining? Why is process discovery difficult? What are the main research challenges?

slide-7
SLIDE 7

PAGE 6

The future is bright, but how to get started? What are the main pitfalls of process modeling? What is process mining? Why is process discovery difficult? What are the main research challenges?

slide-8
SLIDE 8

[Figure: process models in several notations for a request-handling example. Activities: a = register request, b = examine thoroughly, c = examine casually, d = check ticket, e = decide, f = reinitiate request, g = pay compensation, h = reject request. Other labels: start, end, new information, e1 to e6, AND, OR, XOR, OR-split, OR-join, c1 to c5, and the states [start], [c1,c2], [c1,c4], [c2,c3], [c3,c4], [c5], [end].]
slide-9
SLIDE 9

PAGE 8

  • enormous investments in process models
  • large collections of "dead" process models
  • not taken seriously, unrelated to reality
slide-10
SLIDE 10

PAGE 9

problem #1 Aiming for one model that suits all purposes

slide-11
SLIDE 11

PAGE 10

slide-12
SLIDE 12

PAGE 11

problem #2 Straightjacketing smaller interacting processes into one monolithic model

slide-13
SLIDE 13

PAGE 12

register request examine casually examine thoroughly check ticket decide pay compensation reject request reinitiate request start end

slide-14
SLIDE 14

What is the process instance?

PAGE 13

Order: Customer : CustID, Amount : Euro, Created : DateTime, Paid : DateTime, Completed : DateTime

Orderline: Product : ProdID, NofItems : PosInt, TotalWeight : Weight, Entered : DateTime, BackOrdered : DateTime, Secured : DateTime

Delivery: DelAddress : Address, Contact : PhoneNo

Attempt: When : DateTime, Successful : Bool

Association multiplicities: 1, 1..*, 0..*, 1, 0..1, 1..*

Identifier attributes: OrderID : OrderID, OrderID : OrderID, OrderLineID : OrderLineID, DelID : DelID, DelID : DelID, DelID : DelID
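The question "What is the process instance?" can be made concrete with a small sketch: the same events yield different traces depending on whether an order, an order line, or a delivery is taken as the case. The event records, activity names, and identifiers below are hypothetical and only loosely follow the class diagram.

```python
from collections import defaultdict

# Hypothetical events: (activity, order id, order line id, delivery id).
# None means the event is not related to that kind of object.
events = [
    ("create order", "o1", None, None),
    ("enter orderline", "o1", "l1", None),
    ("enter orderline", "o1", "l2", None),
    ("secure item", "o1", "l1", "d1"),
    ("secure item", "o1", "l2", "d2"),
    ("attempt delivery", "o1", "l1", "d1"),
    ("attempt delivery", "o1", "l2", "d2"),
    ("pay order", "o1", None, None),
]

# Position of each candidate case identifier inside an event tuple.
FIELD = {"order": 1, "orderline": 2, "delivery": 3}


def traces_by(case_notion):
    """Group activities into traces, using the chosen identifier as case id."""
    idx = FIELD[case_notion]
    traces = defaultdict(list)
    for event in events:
        case_id = event[idx]
        if case_id is not None:  # skip events unrelated to this case notion
            traces[case_id].append(event[0])
    return dict(traces)


print(traces_by("order"))     # one long trace for order o1
print(traces_by("delivery"))  # two short traces, one per delivery
```

Choosing the order as the case gives a single trace in which the order lines are interleaved; choosing the delivery gives shorter traces that no longer contain the order-level events.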

slide-15
SLIDE 15

PAGE 14

problem #3 Using a static hierarchical decomposition as the only abstraction mechanism

slide-16
SLIDE 16

PAGE 15

  • most process modeling notations assume a fixed hierarchy
  • no seamless zoom-in and zoom-out!
  • traditional hierarchy concepts don't support "Google Maps" abstraction

slide-17
SLIDE 17

PAGE 16

problem #4 Modeling humans as if they are machines doing a single task

slide-18
SLIDE 18

PAGE 17

"My processes are unique, my people are artists!"

slide-19
SLIDE 19

PAGE 18

?

slide-20
SLIDE 20

PAGE 19

problem #5 Being vague about vagueness

slide-21
SLIDE 21

PAGE 20

register request examine casually examine thoroughly check ticket decide pay compensation reject request reinitiate request start end

slide-22
SLIDE 22

PAGE 21

problem #6 Abstracting from the things that matter

slide-23
SLIDE 23

PAGE 22

slide-24
SLIDE 24

PAGE 23

The future is bright, but how to get started? What are the main pitfalls of process modeling? What is process mining? Why is process discovery difficult? What are the main research challenges?

slide-25
SLIDE 25

Positioning Process Mining

PAGE 24

process mining

data-oriented analysis

(data mining, machine learning, business intelligence)

process model analysis

(simulation, verification, optimization, gaming, etc.)

performance-oriented questions, problems and solutions

compliance-oriented questions, problems and solutions

slide-26
SLIDE 26

PAGE 25

www.olifantenpaadjes.nl

slide-27
SLIDE 27

PAGE 26

slide-28
SLIDE 28

PAGE 27

Let us take a step back and see how models and behavior relate: Let's play!

slide-29
SLIDE 29

Play-Out

PAGE 28

event log process model

slide-30
SLIDE 30

[Figure: Petri net with transitions A, B, C, D, E and places start, p1, p2, p3, p4, end]

Play-Out (Classical use of models)

PAGE 29

A B C D, A C B D, A B C D, A E D, A C B D, A C B D, A E D, A E D
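Play-out can be sketched as enumerating the firing sequences of the net. The arcs are not recoverable from the slide text, so the structure below is an assumption chosen to be consistent with the traces shown (A B C D, A C B D, A E D): A splits into p1 and p2, B and C run in parallel, E replaces both, and D joins p3 and p4.

```python
# Assumed net structure (not given on the slide):
# transition -> (input places, output places).
NET = {
    "A": ({"start"}, {"p1", "p2"}),
    "B": ({"p1"}, {"p3"}),
    "C": ({"p2"}, {"p4"}),
    "E": ({"p1", "p2"}, {"p3", "p4"}),
    "D": ({"p3", "p4"}, {"end"}),
}


def play_out(marking=frozenset({"start"}), prefix=()):
    """Enumerate all firing sequences that lead from the marking to {end}."""
    if marking == frozenset({"end"}):
        return {prefix}
    traces = set()
    for transition, (inputs, outputs) in NET.items():
        if inputs <= marking:  # transition is enabled
            successor = (marking - inputs) | outputs
            traces |= play_out(successor, prefix + (transition,))
    return traces


for trace in sorted(play_out()):
    print(" ".join(trace))
```

Markings are plain sets here, which only works because this small net never puts two tokens in one place; a general simulator would need multisets.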

slide-31
SLIDE 31

Play-In

PAGE 30

event log process model

slide-32
SLIDE 32

[Figure: Petri net with transitions A, B, C, D, E and places start, p1, p2, p3, p4, end]

Play-In

PAGE 31

A C B D, A B C D, A E D, A C B D, A C B D, A E D, A E D, A B C D
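Play-in goes the other way: from example traces to a model. A minimal first step, sketched here under the assumption that ordering relations are derived from "directly follows" pairs (as the α algorithm listed later in the deck does), is to separate causal from parallel activity pairs. This is only the relation-extraction step, not a complete discovery algorithm.

```python
# The eight traces from the slide, one character per activity.
log = ["ACBD", "ABCD", "AED", "ACBD", "ACBD", "AED", "AED", "ABCD"]


def footprint(log):
    """Derive causal and parallel relations from directly-follows pairs."""
    follows = {(x, y) for trace in log for x, y in zip(trace, trace[1:])}
    # x causes y if y directly follows x but never the other way round.
    causal = {(x, y) for (x, y) in follows if (y, x) not in follows}
    # x and y are parallel if they directly follow each other in both orders.
    parallel = {(x, y) for (x, y) in follows if (y, x) in follows}
    return causal, parallel


causal, parallel = footprint(log)
print(sorted(causal))    # A before B, C, E; B, C, E before D
print(sorted(parallel))  # B and C in either order
```

The result suggests that B and C are concurrent and that E is an alternative to both, which matches the shape of the net on the slide.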

slide-33
SLIDE 33

Example Process Discovery

(Vestia, Dutch housing agency, 208 cases, 5987 events)

PAGE 32

slide-34
SLIDE 34

Example Process Discovery

(ASML, test process lithography systems, 154966 events)

PAGE 33

slide-35
SLIDE 35

Example Process Discovery

(AMC, 627 gynecological oncology patients, 24331 events)

PAGE 34

slide-36
SLIDE 36

Replay

PAGE 35

event log process model · extended model showing times, frequencies, etc. · diagnostics · predictions · recommendations

slide-37
SLIDE 37

[Figure: Petri net with transitions A, B, C, D, E and places start, p1, p2, p3, p4, end]

Replay

PAGE 36

A B C D

slide-38
SLIDE 38

[Figure: Petri net with transitions A, B, C, D, E and places start, p1, p2, p3, p4, end]

Replay

PAGE 37

A E D

slide-39
SLIDE 39

[Figure: Petri net with transitions A, B, C, D, E and places start, p1, p2, p3, p4, end]

Replay can detect problems

PAGE 38

A C D

Problem! missing token Problem! token left behind
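The two problems reported for the trace A C D can be reproduced with a small token-replay sketch. The net structure is again an assumption (the slide text does not list the arcs), and the bookkeeping is deliberately minimal: a token that is needed but absent counts as missing, and whatever is left after consuming the final token from end counts as remaining.

```python
from collections import Counter

# Assumed net structure (not given on the slide):
# transition -> (input places, output places).
NET = {
    "A": ({"start"}, {"p1", "p2"}),
    "B": ({"p1"}, {"p3"}),
    "C": ({"p2"}, {"p4"}),
    "E": ({"p1", "p2"}, {"p3", "p4"}),
    "D": ({"p3", "p4"}, {"end"}),
}


def replay(trace):
    """Replay a trace; return (missing tokens, remaining tokens) per place."""
    marking = Counter({"start": 1})
    missing = Counter()

    def consume(place):
        if marking[place] == 0:
            missing[place] += 1  # token had to be created artificially
        else:
            marking[place] -= 1

    for activity in trace:
        inputs, outputs = NET[activity]
        for place in inputs:
            consume(place)
        for place in outputs:
            marking[place] += 1
    consume("end")  # a completed case should leave exactly one token in end
    return dict(missing), dict(+marking)  # unary + drops zero counts


print(replay("ABCD"))  # fits: nothing missing, nothing left behind
print(replay("ACD"))   # B was skipped: p3 is missing, p1 is left behind
```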

slide-40
SLIDE 40

Conformance Checking

(WOZ objections Dutch municipality, 745 objections, 9583 events, f = 0.988)

PAGE 39

slide-41
SLIDE 41

[Figure: Petri net with transitions A, B, C, D, E and places start, p1, p2, p3, p4, end]

Replay can extract timing information

PAGE 40

A5 B8 C9 D13

5 8 9 13

3 4 5 4

3 2 6 5 8 7 6 4 7 7 4 3
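One way to read the numbers above: with events A at time 5, B at 8, C at 9 and D at 13, the values 3, 4, 5, 4 are consistent with the time each token spends in places p1, p2, p3 and p4. The sketch below computes these durations by replaying one timed trace; the net structure is an assumption, and the code only handles traces that fit the model.

```python
# Assumed net structure (not given on the slide):
# transition -> (input places, output places).
NET = {
    "A": ({"start"}, {"p1", "p2"}),
    "B": ({"p1"}, {"p3"}),
    "C": ({"p2"}, {"p4"}),
    "E": ({"p1", "p2"}, {"p3", "p4"}),
    "D": ({"p3", "p4"}, {"end"}),
}


def place_times(timed_trace):
    """Return how long the token stayed in each place for a fitting trace."""
    arrival = {}    # place -> time at which its token was produced
    durations = {}  # place -> time between production and consumption
    for activity, time in timed_trace:
        inputs, outputs = NET[activity]
        for place in inputs:
            if place in arrival:  # the initial token in start has no arrival time
                durations[place] = time - arrival.pop(place)
        for place in outputs:
            arrival[place] = time
    return durations


print(place_times([("A", 5), ("B", 8), ("C", 9), ("D", 13)]))
```

Collecting such durations over all cases in a log gives the per-place samples from which averages and bottlenecks can be derived.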

slide-42
SLIDE 42

PAGE 41

Performance Analysis Using Replay

(WOZ objections Dutch municipality, 745 objections, 9583 events, f = 0.988)

slide-43
SLIDE 43

PAGE 42

Models are like the glasses required to see and understand event data!

slide-44
SLIDE 44

PAGE 43

slide-45
SLIDE 45

PAGE 44

  • conformance checking to diagnose deviations
  • squeezing reality into the model to do model-based analysis

Alignments are essential!

slide-46
SLIDE 46

PAGE 45

process model, event log

synchronous move, move on model only, move on log only
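The three kinds of moves can be illustrated with a small alignment sketch. It makes two simplifying assumptions that are not from the slides: the model is represented by a finite list of allowed traces, and every non-synchronous move costs 1. Practical alignment techniques search the state space of the model rather than enumerating its traces.

```python
from functools import lru_cache

SKIP = ">>"  # marks "no move" on one side of the alignment


def align(trace, model_trace):
    """Cheapest alignment of a log trace with one model trace."""

    @lru_cache(maxsize=None)
    def best(i, j):
        if i == len(trace) and j == len(model_trace):
            return 0, ()
        options = []
        if i < len(trace) and j < len(model_trace) and trace[i] == model_trace[j]:
            cost, moves = best(i + 1, j + 1)
            options.append((cost, ((trace[i], trace[i]),) + moves))  # synchronous move
        if i < len(trace):
            cost, moves = best(i + 1, j)
            options.append((cost + 1, ((trace[i], SKIP),) + moves))  # move on log only
        if j < len(model_trace):
            cost, moves = best(i, j + 1)
            options.append((cost + 1, ((SKIP, model_trace[j]),) + moves))  # move on model only
        return min(options, key=lambda option: option[0])

    return best(0, 0)


def best_alignment(trace, model_traces):
    """Cheapest alignment against any of the traces the model allows."""
    return min((align(trace, m) for m in model_traces), key=lambda r: r[0])


cost, moves = best_alignment("ACD", ["ABCD", "ACBD", "AED"])
print(cost)
for log_step, model_step in moves:
    print(log_step, model_step)
```

For the deviating trace A C D the cheapest alignment needs a single move on model only (the skipped B); all other steps are synchronous moves.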
slide-47
SLIDE 47

Example: BPI Challenge 2012

(Dutch financial institute, doi:10.4121/uuid:3926db30-f712-4394-aebc-75976070e91f)

PAGE 46

  • “O_DECLINED” and “W_Wijzigen contractgegevens” are often skipped
  • Many moves on log of “O_CANCELLED”, “O_CREATED”, “O_SELECTED”, “O_SENT” occurred with the same frequency (i.e., 60) before the parallel branch
  • Many moves on log of “W_Afhandelen leads” (> 2200 times) occurred at the end of traces
  • Loops of “W_Completeren aanvraag” and “W_Nabellen offertes” are often performed

Work of Arya Adriansyah (Replay project)

slide-48
SLIDE 48

PAGE 47

  • Synchronous moves of “Completeren aanvraag”
  • Move on log of “Completeren aanvraag”
  • Moves on model towards end of traces
  • Move on log of “O_CANCELLED” and “A_CANCELLED”

  • “O_ACCEPTED” has an average sojourn time of 27.07 minutes, while “A_REGISTERED”, “A_ACTIVATED”, and “A_APPROVED” have an average sojourn time of 29.56 minutes
  • Activity “W_Wijzigen contractgegevens” is the bottleneck, but it occurred rarely (only 4 times)
  • The average waiting time for the input place of “W_Nabellen offertes+START” is very long (2.83 days) compared to the average waiting time of other places

slide-49
SLIDE 49

PAGE 48

The future is bright, but how to get started? What are the main pitfalls of process modeling? What is process mining? Why is process discovery difficult? What are the main research challenges?

slide-50
SLIDE 50

Language identification in the limit (Mark Gold 1967)

PAGE 49

Language identification in the limit by E Mark Gold, Information and Control, 10(5):447–474, 1967.

Examples: abc, abd. Hypotheses: abc? ab(c|d)?

More examples: ad, abbc, ac, … Hypotheses: (ad)|(ab(c|d))? ab*(c|d)?

A language is learnable in the limit if there exists a perfect child that generates only finitely many hypotheses.
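The guessing game on this slide can be replayed with regular expressions. The sketch checks which of the four hypotheses stay consistent as more example sentences arrive; the helper names are illustrative, and treating the hypotheses as Python regular expressions is an assumption about the intended notation.

```python
import re

hypotheses = ["abc", "ab(c|d)", "(ad)|(ab(c|d))", "ab*(c|d)"]
examples = ["abc", "abd", "ad", "abbc", "ac"]


def consistent(hypothesis, seen):
    """A hypothesis survives if it accepts every example seen so far."""
    return all(re.fullmatch(hypothesis, sentence) for sentence in seen)


def surviving(seen):
    return [h for h in hypotheses if consistent(h, seen)]


for n in range(1, len(examples) + 1):
    print(examples[:n], "->", surviving(examples[:n]))
```

With only positive examples a hypothesis is never refuted for being too general, which is one reason why learning from an event log alone is hard.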

slide-51
SLIDE 51

Learning is not easy …

  • Even simple languages like regular languages are not learnable in the limit.
  • Many settings: evil or well-behaving mothers, with or without negative examples, frequencies, etc.

PAGE 50

sentence ↔ trace in event log

language ↔ process model

slide-52
SLIDE 52

PAGE 51

At the start of the century, process mining emerged as a new research topic.

Remarkable progress over a relatively short period.

See keynote at Process Mining Camp 2013, http://fluxicon.com/camp/2013/

slide-53
SLIDE 53

Process discovery challenge

(oversimplified: no resources, data, etc.)

PAGE 52

[Figure: process model with places start, c1 to c9, end and transitions t1 to t11]

a = register request, b = examine file, c = check ticket, d = decide, e = reinitiate request, f = send acceptance letter, g = pay compensation, h = send rejection letter

event log: a,c,d,f,g; a,b,c,d,e,c,d,g,f; a,c,d,h

slide-54
SLIDE 54

Process discovery algorithms

(small selection)

PAGE 53

α algorithm, α++ algorithm, α# algorithm, language-based regions, state-based regions, genetic mining, heuristic mining, hidden Markov models, neural networks, automata-based learning, stochastic task graphs, conformal process graph, mining block structures, multi-phase mining, partial-order based mining, fuzzy mining, LTL mining, ILP mining, distributed genetic mining

slide-55
SLIDE 55

Problem

PAGE 54

real process event data process model

record process discovery conformance checking process discovery conformance checking

“unknown” “only examples”

  • real process is unknown
  • event log covers only a fraction of all possible behavior
  • model needs to provide an abstraction: Murphy's Law of Process Mining
  • only positive examples
slide-56
SLIDE 56

We only have example behavior (event log) and do not know the real process …

PAGE 55

M0 M0 M1

ideal or desired model based on perfect knowledge of real process

FN TP FP TN

descriptive or normative model (man-made or discovered)

L

event log

  • regular behavior in log covered by model
  • exceptional behavior in log covered by model
  • exceptional behavior in log not covered by the model
  • regular behavior in log not covered by model

Problem I: event log does not provide information about the whole universe of traces, only a selected part

Problem II: in practice it is unclear where this line is

W.M.P. van der Aalst. Mediating Between Modeled and Observed Behavior: The Quest for the "Right" Process. In IEEE International Conference on Research Challenges in Information Science (RCIS 2013), pages 31-43. IEEE Computing Society, 2013.

slide-57
SLIDE 57

Balance four forces

PAGE 56

Process Mining

  • fitness: ability to explain observed behavior
  • precision: avoiding underfitting
  • generalization: avoiding overfitting
  • simplicity: Occam’s Razor

(compare the four forces of flight: lift, gravity, thrust, drag)

slide-58
SLIDE 58

Example: one log four models

PAGE 57

#     trace
455   acdeh
191   abdeg
177   adceh
144   abdeh
111   acdeg
82    adceg
56    adbeh
47    acdefdbeh
38    adbeg
33    acdefbdeh
14    acdefbdeg
11    acdefdbeg
9     adcefcdeh
8     adcefdbeh
5     adcefbdeg
3     acdefbdefdbeg
2     adcefdbeg
2     adcefbdefbdeg
1     adcefdbefbdeh
1     adbefbdefdbeg
1     adcefdbefcdefdbeg
1391  (total)

[Figure: four process models N1 to N4 for this log. Activities: a = register request, b = examine thoroughly, c = examine casually, d = check ticket, e = decide, f = reinitiate request, g = pay compensation, h = reject request. N2 contains only a, c, d, e, h.]

N1 : fitness = +, precision = +, generalization = +, simplicity = +

N2 : fitness = -, precision = +, generalization = -, simplicity = +

N3 : fitness = +, precision = -, generalization = +, simplicity = +

N4 : fitness = +, precision = +, generalization = -, simplicity = -

… (all 21 variants seen in the log)
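To make the fitness column concrete, the sketch below computes a simple trace-level fitness: the fraction of cases in the log that a model can replay from start to end. Two things are assumptions rather than facts from the slides: the frequencies are paired with the traces in the order listed, and N2 is taken to be a purely sequential model that allows only the trace acdeh. The + and - ratings on the slide are qualitative and need not be based on this measure.

```python
traces = (
    "acdeh abdeg adceh abdeh acdeg adceg adbeh acdefdbeh adbeg acdefbdeh "
    "acdefbdeg acdefdbeg adcefcdeh adcefdbeh adcefbdeg acdefbdefdbeg adcefdbeg "
    "adcefbdefbdeg adcefdbefbdeh adbefbdefdbeg adcefdbefcdefdbeg"
).split()
counts = [455, 191, 177, 144, 111, 82, 56, 47, 38, 33,
          14, 11, 9, 8, 5, 3, 2, 2, 1, 1, 1]
log = dict(zip(traces, counts))  # trace variant -> number of cases


def trace_fitness(accepts, log):
    """Fraction of cases whose trace is accepted by the model."""
    total = sum(log.values())
    fitting = sum(n for trace, n in log.items() if accepts(trace))
    return fitting / total


def n2_accepts(trace):
    return trace == "acdeh"  # assumption: N2 allows only this sequence


def n4_accepts(trace):
    return trace in log  # N4 enumerates exactly the variants seen in the log


print(sum(log.values()))
print(round(trace_fitness(n2_accepts, log), 3))
print(trace_fitness(n4_accepts, log))
```

Under these assumptions N2 explains only the most frequent variant, about a third of all cases, while N4 explains every case but cannot replay any trace that was not already observed.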

slide-59
SLIDE 59

Model N1

PAGE 58

[Event log as on slide 58: 21 trace variants, 1391 cases]

[Figure: model N1 with activities a = register request, b = examine thoroughly, c = examine casually, d = check ticket, e = decide, f = reinitiate request, g = pay compensation, h = reject request]

N1 : fitness = +, precision = +, generalization = +, simplicity = +

slide-60
SLIDE 60

Model N2

PAGE 59

[Event log as on slide 58: 21 trace variants, 1391 cases]

[Figure: model N2 with activities a = register request, c = examine casually, d = check ticket, e = decide, h = reject request]

N2 : fitness = -, precision = +, generalization = -, simplicity = +

slide-61
SLIDE 61

Model N3

PAGE 60

[Event log as on slide 58: 21 trace variants, 1391 cases]

[Figure: model N3 with activities a = register request, b = examine thoroughly, c = examine casually, d = check ticket, e = decide, f = reinitiate request, g = pay compensation, h = reject request]

N3 : fitness = +, precision = -, generalization = +, simplicity = +

slide-62
SLIDE 62

Model N4

PAGE 61

[Event log as on slide 58: 21 trace variants, 1391 cases]

N4 : fitness = +, precision = +, generalization = -, simplicity = -

[Figure: model N4 with a separate branch per trace variant, built from the activities register request, examine casually, examine thoroughly, check ticket, decide, reject request, pay compensation]

… (all 21 variants seen in the log)

slide-63
SLIDE 63

PAGE 62

1. formal (not just a picture)
2. fast (should not take years)
3. sound (result should at least be free of deadlocks, etc.)
4. ability to balance all conformance dimensions (fitness, precision, generalization, and simplicity) incl. noise
5. provide guarantees (not just a best effort)

slide-64
SLIDE 64

PAGE 63

The future is bright, but how to get started? What are the main pitfalls of process modeling? What is process mining? Why is process discovery difficult? What are the main research challenges?

slide-65
SLIDE 65

PAGE 64

Finding sheep with five legs

we are getting close…

slide-66
SLIDE 66

PAGE 65

Distributing process mining problems to cope with big data

slide-67
SLIDE 67

PAGE 66

On-the-fly process mining Operational support

slide-68
SLIDE 68

Concept drift

PAGE 67

Concept drift

slide-69
SLIDE 69

Cross-organizational mining

PAGE 68

cross-organizational / comparative process mining

slide-70
SLIDE 70

PAGE 69

context aware process mining

slide-71
SLIDE 71

PAGE 70

Supporting the process of process mining

slide-72
SLIDE 72

PAGE 71

The future is bright, but how to get started? What are the main pitfalls of process modeling? What is process mining? Why is process discovery difficult? What are the main research challenges?

slide-73
SLIDE 73

PAGE 72

slide-74
SLIDE 74

The Sexiest Job of the 21st century (thanks to Moore's Law)

73

1965

slide-75
SLIDE 75

PAGE 74

How to get started?

slide-76
SLIDE 76

600+ plug-ins available covering the whole process mining spectrum

75

Download from: www.processmining.org

open-source (L-GPL)
slide-77
SLIDE 77

Commercial Alternatives

  • Disco (Fluxicon)
  • Perceptive Process Mining (before Futura Reflect and BPM|one)
  • ARIS Process Performance Manager
  • QPR ProcessAnalyzer
  • Interstage Process Discovery (Fujitsu)
  • Discovery Analyst (StereoLOGIC)
  • XMAnalyzer (XMPro)

76

slide-78
SLIDE 78

How to Get Started?

Collect event data

  • Minimal requirement: events referring to an activity name and a process instance.
  • Good to have: timestamps, resource information, additional data elements.
  • Challenges: scoping and sometimes correlation.

Collect questions

  • What kind of problems would you like to address (cost, time, risk, compliance, service, etc.)?
  • Related to discovery, conformance, enhancement?
  • Iterative process: can be “curiosity driven” initially.
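The minimal requirement above (each event refers to an activity name and a process instance) is enough to turn raw records into traces. A minimal sketch, assuming a CSV export with hypothetical column names case_id, activity and timestamp and made-up rows:

```python
import csv
import io
from collections import defaultdict

# Hypothetical export; the column names and rows are assumptions.
RAW = """case_id,activity,timestamp
1,register request,2013-01-01T09:00
2,register request,2013-01-01T09:05
1,examine casually,2013-01-01T10:00
2,check ticket,2013-01-01T10:30
1,check ticket,2013-01-01T11:00
1,decide,2013-01-01T12:00
"""


def to_traces(csv_text):
    """Group events by case and order them by timestamp."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # ISO 8601 timestamps in one fixed format sort correctly as strings.
    rows.sort(key=lambda row: row["timestamp"])
    traces = defaultdict(list)
    for row in rows:
        traces[row["case_id"]].append(row["activity"])
    return dict(traces)


for case_id, trace in to_traces(RAW).items():
    print(case_id, trace)
```

The timestamp is used only for ordering here; resource information and other data elements would simply be additional columns.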

77

slide-79
SLIDE 79

Conclusion

PAGE 78

process mining

data-oriented analysis (data mining, machine learning, business intelligence)

process model analysis (simulation, verification, etc.)

performance-oriented questions, problems and solutions

compliance-oriented questions, problems and solutions