Process Mining
prof.dr.ir. Wil van der Aalst
www.processmining.org
Tutorial Computational Intelligence in HealthCare, 20-24 September 2010, Eindhoven, the Netherlands
Focus of most modeling and analysis techniques is on right-hand side …
[Figure: life cycle with the phases diagnosis/requirements, (re)design, configuration/implementation, enactment/monitoring, and adjustment, relating models and data; uses of models: insight, discussion, verification, performance analysis, animation, specification, documentation, configuration]
Let’s play …
Play-In: event log → process model
Play-Out: process model → event log
Replay: event log + process model, showing times, frequencies, etc.
Menu
1. Introduction to process mining
2. Two types of processes:
3. The Alpha algorithm
4. Over/underfitting
5. Replay
6. Process mining Software (BI versus ProM)
http://www.canarypete.be/
Growth of data
Data Mining
[Figure: decision tree over the attributes Smoker, Drinker, and Weight (<81.5 / ≥81.5) with leaves Short (91/10), Long (30/1), Long (150/20), and Short (321/25)]
Process Mining = Process Analysis
[Figure: Petri net with tasks start, register, initial conditions, check_A needed?, check_A, modify conditions, check_B needed?, check_B, check_C needed?, check_C, assess risk, decline, make+ and places c1 to c13]
Process Mining
"What is really happening?"
"Do we do what was agreed upon?"
"Where are the bottlenecks?"
"Will this case be late?"
"How to redesign this process?"
Process Discovery Example
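>,→,||,# relations
Direct succession: x>y iff for some case x is directly followed by y.
Causality: x→y iff x>y and not y>x.
Parallel: x||y iff x>y and y>x.
Choice: x#y iff not x>y and not y>x.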
case 1 : task A case 2 : task A case 3 : task A case 3 : task B case 1 : task B case 1 : task C case 2 : task C case 4 : task A case 2 : task B case 2 : task D case 5 : task A case 5 : task E case 4 : task C case 1 : task D case 3 : task C case 3 : task D case 4 : task B case 5 : task D case 4 : task D
A>B A>C A>E B>C B>D C>B C>D E>D
A→B A→C A→E B→D C→D E→D
B||C C||B
ABCD ACBD AED
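The derivation of the relations from the example log can be sketched in a few lines of Python. This is a minimal illustration, not the ProM implementation: the names `EVENTS`, `traces_by_case`, and `footprint` are made up for this sketch, and the log is the list of (case, task) pairs shown above.

```python
from collections import defaultdict

# The example log as (case id, task) pairs, in the order in which they were recorded.
EVENTS = [
    (1, "A"), (2, "A"), (3, "A"), (3, "B"), (1, "B"), (1, "C"), (2, "C"),
    (4, "A"), (2, "B"), (2, "D"), (5, "A"), (5, "E"), (4, "C"), (1, "D"),
    (3, "C"), (3, "D"), (4, "B"), (5, "D"), (4, "D"),
]


def traces_by_case(events):
    """Group the events per case, keeping the recorded order within a case."""
    cases = defaultdict(list)
    for case, task in events:
        cases[case].append(task)
    return {case: "".join(tasks) for case, tasks in cases.items()}


def footprint(traces):
    """Return the >, →, || and # relations as sets of (x, y) pairs."""
    traces = list(traces)
    activities = {a for t in traces for a in t}
    # x>y: x is directly followed by y in at least one case
    follows = {(t[i], t[i + 1]) for t in traces for i in range(len(t) - 1)}
    # x→y: x>y and not y>x
    causal = {(x, y) for (x, y) in follows if (y, x) not in follows}
    # x||y: x>y and y>x
    parallel = {(x, y) for (x, y) in follows if (y, x) in follows}
    # x#y: not x>y and not y>x
    choice = {(x, y) for x in activities for y in activities
              if (x, y) not in follows and (y, x) not in follows}
    return follows, causal, parallel, choice


traces = traces_by_case(EVENTS)
follows, causal, parallel, choice = footprint(traces.values())
print(sorted(set(traces.values())))      # ['ABCD', 'ACBD', 'AED']
print(sorted(x + "→" + y for x, y in causal))
```

Only the distinct traces matter for the relations; how often a trace occurs plays no role here.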
Basic Idea Used by α Algorithm (1)
a b (a) sequence pattern: a→b
Basic Idea Used by α Algorithm (2)
a b c (b) XOR-split pattern: a→b, a→c, and b#c
a b c (c) XOR-join pattern: a→c, b→c, and a#b
Basic Idea Used by α Algorithm (3)
a b c (d) AND-split pattern: a→b, a→c, and b||c
a b c (e) AND-join pattern: a→c, b→c, and a||b
Example Revisited
[Figure: Petri net with transitions A, B, C, D, E and places start, p1, p2, p3, p4, end]
B#E C#E …
Result produced by α algorithm
A>B A>C A>E B>C B>D C>B C>D E>D
A→B A→C A→E B→D C→D E→D
B||C C||B
Process mining: Linking events to models
[Figure: the “world” (business processes, people, machines, components) is supported/controlled by a software system that records events (e.g., messages, transactions) in event logs; a (process) model models and analyzes the “world” and specifies, configures, implements, and analyzes the software system; event logs and models are linked by discovery, conformance, and extension]
Old Toolset
ProMimport MXML
New Toolset
XESame, ProM 5.2 ► 6.0, MXML ► XES
Motivation for changes
distribution
decoupling logic and UI
dealing with hundreds of plug-ins
extendible semantics
no programming
mapping problems
Where did we apply process mining?
Justitieel Incasso Bureau, Justice department)
(e.g., Philips Healthcare, ASML, Ricoh, Thales)
Example of a Lasagna Process
Example: WMO Harderwijk
Maatschappelijke Ondersteuning” (WMO) Harderwijk
handicaps, elderly, etc.).
Event log
(796 applications, 5187 events)
Helicopter view of 1.5 years
Huge variance in durations
Process discovered using Genetic Miner
Various representations
Fuzzy Miner
Seamless abstraction
more detailed, more abstract
Fuzzy Replay
Conformance checking using Replay
= should not have happened but did
= should have happened but did not
Performance analysis using Replay
Performance information
Prediction based
Spaghetti Processes
Balancing Between Underfitting and Overfitting
Process spectrum
structured (Lasagna) unstructured (Spaghetti)
How can process mining help?
measurement
improvements
(recommendation and prediction)
problems
by PowerPoint
“analytics”
Learning processes: The Alpha Algorithm
Process Mining: The alpha algorithm
[Figure: process model produced by the α algorithm; labels (translated from Dutch): 1 start; 2 collective or private; 3 check completeness/correctness; 9 determine follow-up 1; 10 register; 12 determine whether offer is standard or not; 13 enter, first check, print (standard); 14 final check, sign (standard); 17 determine follow-up; 18 register offer closed; 22 archive and end; start of process; approved offer; ready for entry; ready for check; complete/correct; ready for registration; to registration]
Alpha algorithm
Without transactional information (just completes)
Starting point: event logs
event logs, audit trails, databases, message logs, etc.
www.xes-standard.org
XES (compatible with MXML)
Event log consists of:
− events
[Figure: annotated XES fragment. Annotations: extensions loaded; every trace has a name; every event has a name and a transition; classifier = name + transition; start of trace (i.e., process instance); name of trace; name of event (activity name); resource; transition; timestamp]
[Figure: annotated XES fragment, continued. Annotations: start of trace; name of trace; name of event (activity name); resource; data associated to event; timestamp; end of trace (i.e., process instance)]
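To make the annotations concrete, the following sketch reads a tiny XES-like fragment with Python's standard library. The attribute keys `concept:name`, `org:resource`, `lifecycle:transition`, and `time:timestamp` are the usual keys of the standard XES extensions; the fragment itself (case name, resource names, timestamps) is invented for illustration, and the XML namespace and extension declarations of a real XES file are left out to keep the parsing short. The function name `read_traces` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented mini log: one trace (process instance) with two events.
XES_SNIPPET = """
<log xes.version="1.0">
  <trace>
    <string key="concept:name" value="case 1"/>
    <event>
      <string key="concept:name" value="A"/>
      <string key="org:resource" value="Pete"/>
      <string key="lifecycle:transition" value="complete"/>
      <date key="time:timestamp" value="2010-09-20T09:00:00"/>
    </event>
    <event>
      <string key="concept:name" value="B"/>
      <string key="org:resource" value="Sue"/>
      <string key="lifecycle:transition" value="complete"/>
      <date key="time:timestamp" value="2010-09-20T09:30:00"/>
    </event>
  </trace>
</log>
"""


def read_traces(xes_text):
    """Map each trace name to its list of events (each event is a key/value dict)."""
    root = ET.fromstring(xes_text)
    traces = {}
    for trace in root.iter("trace"):
        # Trace-level attributes are all children that are not events.
        trace_attrs = {c.get("key"): c.get("value") for c in trace if c.tag != "event"}
        events = [{c.get("key"): c.get("value") for c in event}
                  for event in trace.findall("event")]
        traces[trace_attrs["concept:name"]] = events
    return traces


log = read_traces(XES_SNIPPET)
for name, events in log.items():
    print(name, [(e["concept:name"], e["lifecycle:transition"]) for e in events])
```

Combining the event name with the transition, as in the last line, corresponds to the classifier "name + transition" mentioned in the annotations.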
Example log
id’s and task id’s.
type, time, resources, and data.
possible sequences:
case 1 : task A case 2 : task A case 3 : task A case 3 : task B case 1 : task B case 1 : task C case 2 : task C case 4 : task A case 2 : task B case 2 : task D case 5 : task E case 4 : task C case 1 : task D case 3 : task C case 3 : task D case 4 : task B case 5 : task F case 4 : task D
>,→,||,# relations
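Direct succession: x>y iff for some case x is directly followed by y.
Causality: x→y iff x>y and not y>x.
Parallel: x||y iff x>y and y>x.
Choice: x#y iff not x>y and not y>x.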
case 1 : task A case 2 : task A case 3 : task A case 3 : task B case 1 : task B case 1 : task C case 2 : task C case 4 : task A case 2 : task B case 2 : task D case 5 : task E case 4 : task C case 1 : task D case 3 : task C case 3 : task D case 4 : task B case 5 : task F case 4 : task D
A>B A>C B>C B>D C>B C>D E>F
A→B A→C B→D C→D E→F
B||C C||B
ABCD ACBD EF
Basic idea (1) x y
Basic idea (2)
x z y
Basic idea (3)
x z y
Basic idea (4)
x y z
Basic idea (5)
x y z
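It is not that simple!
Basic Alpha algorithm
Let W be a workflow log over T. α(W) is defined as follows.
1. T_W = { t ∈ T | ∃σ ∈ W: t ∈ σ },
2. T_I = { t ∈ T | ∃σ ∈ W: t = first(σ) },
3. T_O = { t ∈ T | ∃σ ∈ W: t = last(σ) },
4. X_W = { (A,B) | A ⊆ T_W ∧ B ⊆ T_W ∧ ∀a ∈ A ∀b ∈ B: a →_W b ∧ ∀a1,a2 ∈ A: a1 #_W a2 ∧ ∀b1,b2 ∈ B: b1 #_W b2 },
5. Y_W = { (A,B) ∈ X_W | ∀(A′,B′) ∈ X_W: A ⊆ A′ ∧ B ⊆ B′ ⟹ (A,B) = (A′,B′) },
6. P_W = { p(A,B) | (A,B) ∈ Y_W } ∪ { i_W, o_W },
7. F_W = { (a,p(A,B)) | (A,B) ∈ Y_W ∧ a ∈ A } ∪ { (p(A,B),b) | (A,B) ∈ Y_W ∧ b ∈ B } ∪ { (i_W,t) | t ∈ T_I } ∪ { (t,o_W) | t ∈ T_O }, and
8. α(W) = (P_W, T_W, F_W).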
Example revisited
W:
case 1 : task A case 2 : task A case 3 : task A case 3 : task B case 1 : task B case 1 : task C case 2 : task C case 4 : task A case 2 : task B case 2 : task D case 5 : task E case 4 : task C case 1 : task D case 3 : task C case 3 : task D case 4 : task B case 5 : task F case 4 : task D
[Figure: discovered Petri net over the tasks A, B, C, D, E, F]
α(W)
A>B A>C B>C B>D C>B C>D E>F
A→B A→C B→D C→D E→F
B||C C||B
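The construction of α(W) for this example can be reproduced with a small brute-force sketch. It follows the textbook definition of the basic α algorithm (T_W, T_I, T_O, X_W, Y_W, P_W, F_W) and enumerates all subsets of tasks, so it is only suitable for toy logs. The function name `alpha` and the place naming `p(A,B)` as strings are choices made for this sketch, not the tutorial's or ProM's code.

```python
from itertools import combinations


def alpha(traces):
    """Return (places, transitions, flow) of the net discovered from the traces."""
    activities = sorted({a for t in traces for a in t})   # T_W
    first = {t[0] for t in traces}                        # T_I
    last = {t[-1] for t in traces}                        # T_O
    follows = {(t[i], t[i + 1]) for t in traces for i in range(len(t) - 1)}
    causal = {(x, y) for (x, y) in follows if (y, x) not in follows}

    def unrelated(group):
        # all members pairwise in the # relation (this also rules out a>a)
        return all((a, b) not in follows and (b, a) not in follows
                   for a in group for b in group)

    subsets = [frozenset(c) for r in range(1, len(activities) + 1)
               for c in combinations(activities, r)]
    candidates = [s for s in subsets if unrelated(s)]
    # X_W: every a in A causes every b in B
    x_w = [(A, B) for A in candidates for B in candidates
           if all((a, b) in causal for a in A for b in B)]
    # Y_W: keep only the maximal pairs
    y_w = [(A, B) for (A, B) in x_w
           if not any((A, B) != (A2, B2) and A <= A2 and B <= B2 for (A2, B2) in x_w)]

    def name(A, B):
        return "p(" + "".join(sorted(A)) + "," + "".join(sorted(B)) + ")"

    places = {name(A, B) for A, B in y_w} | {"i_W", "o_W"}          # P_W
    flow = ({(a, name(A, B)) for A, B in y_w for a in A}
            | {(name(A, B), b) for A, B in y_w for b in B}
            | {("i_W", t) for t in first}
            | {(t, "o_W") for t in last})                           # F_W
    return places, set(activities), flow


places, transitions, flow = alpha(["ABCD", "ACBD", "EF"])
print(sorted(places))
# ['i_W', 'o_W', 'p(A,B)', 'p(A,C)', 'p(B,D)', 'p(C,D)', 'p(E,F)']
```

For the earlier log with the traces ABCD, ACBD, and AED the same function yields the four internal places p(A,BE), p(A,CE), p(BE,D), and p(CE,D), because B#E and C#E allow E to share places with B and with C.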
Exercise (1)
consisting only of the following traces?
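Direct succession: x>y iff for some case x is directly followed by y.
Causality: x→y iff x>y and not y>x.
Parallel: x||y iff x>y and y>x.
Choice: x#y iff not x>y and not y>x.
Let W be a workflow log over T. α(W) is defined as follows.
1. T_W = { t ∈ T | ∃σ ∈ W: t ∈ σ },
2. T_I = { t ∈ T | ∃σ ∈ W: t = first(σ) },
3. T_O = { t ∈ T | ∃σ ∈ W: t = last(σ) },
4. X_W = { (A,B) | A ⊆ T_W ∧ B ⊆ T_W ∧ ∀a ∈ A ∀b ∈ B: a →_W b ∧ ∀a1,a2 ∈ A: a1 #_W a2 ∧ ∀b1,b2 ∈ B: b1 #_W b2 },
5. Y_W = { (A,B) ∈ X_W | ∀(A′,B′) ∈ X_W: A ⊆ A′ ∧ B ⊆ B′ ⟹ (A,B) = (A′,B′) },
6. P_W = { p(A,B) | (A,B) ∈ Y_W } ∪ { i_W, o_W },
7. F_W = { (a,p(A,B)) | (A,B) ∈ Y_W ∧ a ∈ A } ∪ { (p(A,B),b) | (A,B) ∈ Y_W ∧ b ∈ B } ∪ { (i_W,t) | t ∈ T_I } ∪ { (t,o_W) | t ∈ T_O }, and
8. α(W) = (P_W, T_W, F_W).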
Another example taken step-by-step ...
A>B A>C A>E B>C B>D C>B C>D E>D
A→B A→C A→E B→D C→D E→D
B||C C||B
A and B need to be non-empty.
Exercise (2)
consisting only of the following traces?
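Direct succession: x>y iff for some case x is directly followed by y.
Causality: x→y iff x>y and not y>x.
Parallel: x||y iff x>y and y>x.
Choice: x#y iff not x>y and not y>x.
Let W be a workflow log over T. α(W) is defined as follows.
1. T_W = { t ∈ T | ∃σ ∈ W: t ∈ σ },
2. T_I = { t ∈ T | ∃σ ∈ W: t = first(σ) },
3. T_O = { t ∈ T | ∃σ ∈ W: t = last(σ) },
4. X_W = { (A,B) | A ⊆ T_W ∧ B ⊆ T_W ∧ ∀a ∈ A ∀b ∈ B: a →_W b ∧ ∀a1,a2 ∈ A: a1 #_W a2 ∧ ∀b1,b2 ∈ B: b1 #_W b2 },
5. Y_W = { (A,B) ∈ X_W | ∀(A′,B′) ∈ X_W: A ⊆ A′ ∧ B ⊆ B′ ⟹ (A,B) = (A′,B′) },
6. P_W = { p(A,B) | (A,B) ∈ Y_W } ∪ { i_W, o_W },
7. F_W = { (a,p(A,B)) | (A,B) ∈ Y_W ∧ a ∈ A } ∪ { (p(A,B),b) | (A,B) ∈ Y_W ∧ b ∈ B } ∪ { (i_W,t) | t ∈ T_I } ∪ { (t,o_W) | t ∈ T_O }, and
8. α(W) = (P_W, T_W, F_W).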
Exercise (3)
consisting only of the following traces?
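Direct succession: x>y iff for some case x is directly followed by y.
Causality: x→y iff x>y and not y>x.
Parallel: x||y iff x>y and y>x.
Choice: x#y iff not x>y and not y>x.
Let W be a workflow log over T. α(W) is defined as follows.
1. T_W = { t ∈ T | ∃σ ∈ W: t ∈ σ },
2. T_I = { t ∈ T | ∃σ ∈ W: t = first(σ) },
3. T_O = { t ∈ T | ∃σ ∈ W: t = last(σ) },
4. X_W = { (A,B) | A ⊆ T_W ∧ B ⊆ T_W ∧ ∀a ∈ A ∀b ∈ B: a →_W b ∧ ∀a1,a2 ∈ A: a1 #_W a2 ∧ ∀b1,b2 ∈ B: b1 #_W b2 },
5. Y_W = { (A,B) ∈ X_W | ∀(A′,B′) ∈ X_W: A ⊆ A′ ∧ B ⊆ B′ ⟹ (A,B) = (A′,B′) },
6. P_W = { p(A,B) | (A,B) ∈ Y_W } ∪ { i_W, o_W },
7. F_W = { (a,p(A,B)) | (A,B) ∈ Y_W ∧ a ∈ A } ∪ { (p(A,B),b) | (A,B) ∈ Y_W ∧ b ∈ B } ∪ { (i_W,t) | t ∈ T_I } ∪ { (t,o_W) | t ∈ T_O }, and
8. α(W) = (P_W, T_W, F_W).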
More on Process Discovery
Examples of process discovery techniques
Genetic Mining
(Ana Karla Alves de Medeiros et al.)
Design choices
representation fitness crossover mutation
Properties of Genetic Mining
tasks, invisible tasks, etc.
combinations with other approaches (heuristics post-optimization, etc.).
Challenge: Balancing Between Underfitting and Overfitting
The essence
A B C D E
ABCD ACBD AED ABCD ABCD AED ACBD ...
But ...
A B C D E
Any log containing activities A, B, C, D, and E.
start end
Finding a balance
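[Figure: four models (a) to (d) over the activities A, B, C, D, E, ordered by "more behavior"; example logs: ACD BCE ... and ACD ACE BCE BCD ...]
[Figure: two models over A, B, C, D, E next to a log over the traces ACD, ACE, BCE, BCD with frequencies 99 and 85]
[Figure: two models over A, B, C, D, E with trace frequencies ACD 99, ACE 88, BCE 85, BCD 78]
[Figure: two models over A, B, C, D, E with trace frequencies ACD 99, ACE 2, BCE 85, BCD 3]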
Structure: Is this the simplest model (Occam's Razor)?
Fitness: Is the event log possible according to the model?
Precision: Is the model not underfitting (allow for too much)?
Generalization: Is the model not overfitting (only allow for the “accidental” examples)?
Evaluating process mining results
Representing process models
Highlights more important paths
More significant nodes are emphasized
Aggregation
Clustering of coherent, less significant structures
Abstraction
Removing isolated, less significant structures
More to learn from maps...
Fuzzy miner
Showing reality
Back to the future …
Predict: When will I be home? At 11.26!
Recommend: How to get home ASAP? Take a left turn!
Detect: You drive too fast!
Operational Support: Detect, Predict, and Recommend
[Figure: (simulation) models are learned (discovered and enhanced) from historic data and applied to current data to detect, predict, and recommend, producing alerts, predictions, and recommendations]
Operational Support and Conformance Checking Based on Replay
[Figure: Petri net with transitions A, B, C, D, E and places start, p1, p2, p3, p4, end]
Play Out (Classical use of models)
ABCD ACBD ABCD AED ACBD ACBD AED AED
Play In (Process Discovery)
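[Figure: Petri net with transitions A, B, C, D, E and places start, p1, p2, p3, p4, end]
ABCD ACBD AED ACBD AED ABCD …
a process discovery algorithm like the α algorithm
[Figure: Petri net with transitions A, B, C, D, E and places start, p1, p2, p3, p4, end]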
Replay
ABCD
[Figure: Petri net with transitions A, B, C, D, E and places start, p1, p2, p3, p4, end]
Replay can detect problems
ACD
Problem! missing token
Problem! token left behind
[Figure: Petri net with transitions A, B, C, D, E and places start, p1, p2, p3, p4, end]
Replay can extract timing information
A5 B8 C9 D13
5 8 9 13
3 4 5 4
3 2 6 5 8 7 6 4 7 7 4 3
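How replay yields timing information can be sketched as follows. The net structure is the one the α algorithm produces for the traces ABCD, ACBD, and AED; which internal place carries which name (p1 to p4) is an assumption of this sketch, as are the names `PLACES` and `sojourn_times`. Each place records when a token was produced and when it was consumed again.

```python
# Assumed structure: place -> (transitions that produce a token, transitions that consume it).
PLACES = {
    "p1": ({"A"}, {"B", "E"}),
    "p2": ({"A"}, {"C", "E"}),
    "p3": ({"B", "E"}, {"D"}),
    "p4": ({"C", "E"}, {"D"}),
}


def sojourn_times(timed_trace):
    """Replay a fitting trace of (activity, timestamp) pairs; return waiting time per place."""
    produced_at = {}
    waits = {}
    for activity, time in timed_trace:
        # Consume: the token has been waiting since it was produced.
        for place, (_, consumers) in PLACES.items():
            if activity in consumers and place in produced_at:
                waits[place] = time - produced_at.pop(place)
        # Produce: remember when the new token arrived.
        for place, (producers, _) in PLACES.items():
            if activity in producers:
                produced_at[place] = time
    return waits


print(sojourn_times([("A", 5), ("B", 8), ("C", 9), ("D", 13)]))
# {'p1': 3, 'p2': 4, 'p3': 5, 'p4': 4}
```

Under the assumed place naming, the result for the trace A5 B8 C9 D13 is 3, 4, 5, 4. Repeating this for every trace in the log gives a sample of waiting times per place, from which averages and variances can be computed.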
Example: Conformance Checker
Conformance checker
(Anne Rozinat et al.)
Fitness by replay
m=missing,r=remaining,c=consumed,p=produced
No problem (m=0, r=0)
Another (impossible) trace
Fitness calculation
Examples
f=1.000 f=0.995 f=0.540
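A minimal token-replay sketch shows how the four counters arise. The fitness formula used here, f = ½(1 − m/c) + ½(1 − r/p), is the commonly published token-based fitness; it is stated as an assumption because the slide text does not spell it out. The net is the running example, and the assignment of the names p1 to p4 to places is likewise assumed. `PLACES`, `replay`, and `fitness` are illustrative names.

```python
from collections import Counter

# Assumed structure: place -> (transitions that produce a token, transitions that consume it).
PLACES = {
    "start": (set(), {"A"}),
    "p1": ({"A"}, {"B", "E"}),
    "p2": ({"A"}, {"C", "E"}),
    "p3": ({"B", "E"}, {"D"}),
    "p4": ({"C", "E"}, {"D"}),
    "end": ({"D"}, set()),
}


def replay(trace):
    """Replay one trace; return (missing, remaining, consumed, produced)."""
    marking = Counter({"start": 1})   # the environment produces the initial token
    produced, consumed, missing = 1, 0, 0
    for activity in trace:
        for place, (_, consumers) in PLACES.items():
            if activity in consumers:
                if marking[place] == 0:
                    missing += 1      # token had to be added artificially
                else:
                    marking[place] -= 1
                consumed += 1
        for place, (producers, _) in PLACES.items():
            if activity in producers:
                marking[place] += 1
                produced += 1
    # The environment consumes the token from the end place.
    if marking["end"] == 0:
        missing += 1
    else:
        marking["end"] -= 1
    consumed += 1
    remaining = sum(marking.values())
    return missing, remaining, consumed, produced


def fitness(traces):
    """Token-based fitness over all traces of a log."""
    m, r, c, p = (sum(x) for x in zip(*(replay(t) for t in traces)))
    return 0.5 * (1 - m / c) + 0.5 * (1 - r / p)


print(replay("ABCD"))   # (0, 0, 6, 6): no problem
print(replay("ACD"))    # (1, 1, 5, 5): one missing token, one token left behind
print(fitness(["ABCD", "ACBD", "AED"]))   # 1.0
```

For the trace ACD, the missing token is the one D needs from the place filled by B, and the remaining token is the one A produced for B that was never consumed.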
Diagnostics
Other Metrics
needed such as behavioral and structural appropriateness, etc.
Another Replay Example: Time-based Operational Support (TOS)
Architecture
Step 1: Learn Transition System
Prefix-set-activity abstraction is used here. Other abstractions possible.
Step 2: Replay Log for Time Information
e=elapsed, r=remaining, and s=sojourn.
Step 3: Calculate statistics
Step 4: Start Operational Support
A B C D
known past, current state, unknown future
detect (A B): B does not fit the model (not allowed, too late, etc.)
predict (A B ? ?): some prediction is made about the future (e.g. completion date or outcome), here T=10
recommend (A B C ?): based on past experiences C is recommended (e.g., to minimize costs)
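The four steps can be sketched in a few lines, assuming that the abstraction maps a prefix to the set of activities seen so far and that the statistic of interest is the average remaining time per state. The timed traces below are invented for illustration, and `learn_remaining_times` and `predict_remaining` are hypothetical names, not the TOS implementation.

```python
from collections import defaultdict

# Invented historic log: each trace is a list of (activity, timestamp) pairs.
HISTORY = [
    [("A", 0), ("B", 3), ("C", 4), ("D", 8)],
    [("A", 0), ("C", 2), ("B", 5), ("D", 6)],
    [("A", 0), ("E", 1), ("D", 4)],
]


def learn_remaining_times(timed_traces):
    """Steps 1-3: build states (sets of seen activities) and average the remaining times."""
    samples = defaultdict(list)
    for trace in timed_traces:
        end_time = trace[-1][1]
        seen = set()
        for activity, time in trace:
            seen.add(activity)
            # remaining time measured when this state is entered
            samples[frozenset(seen)].append(end_time - time)
    return {state: sum(v) / len(v) for state, v in samples.items()}


def predict_remaining(model, partial_trace):
    """Step 4: look up the state of a running case; None means the state was never seen."""
    state = frozenset(activity for activity, _ in partial_trace)
    return model.get(state)


model = learn_remaining_times(HISTORY)
print(predict_remaining(model, [("A", 10), ("B", 12)]))   # 5.0
print(predict_remaining(model, [("A", 10), ("D", 11)]))   # None
```

A `None` result corresponds to the detect case: the running case has reached a state that the learned transition system does not contain. Elapsed and sojourn times could be collected per state in the same loop.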
Business Intelligence Tools?
Process Mining Software
ARIS Process Performance Manager
Interstage Automated Business Process Discovery & Visualization
Process Discovery Focus
Futura Reflect
Enterprise Visualization Suite
Comprehend
BPM|one
fluxicon/nitro
ProcessGold
Conclusion
More Information
IEEE Task Force on Process Mining