PRETSA: Event Log Sanitization for Privacy-aware Process Discovery (PowerPoint presentation)

SLIDE 1

PRETSA: Event Log Sanitization for Privacy-aware Process Discovery

Stephan A. Fahrenkrog-Petersen, Han van der Aa & Matthias Weidlich

SLIDE 2

hu-berlin.de/pda

Motivation

SLIDE 3

Related Work

Figure (labels only): Information System, Individual Event Data, Event Log Sanitization, Sanitized Event Data, Privatized Process Mining, Process Mining, Process Mining Artifact; phases: Data Extraction, Data Contribution

[Mannhardt et al., 2019] [Monreale et al., 2014] [Sweeney et al., 2002]

SLIDE 4

Research Problem

  • Use case: process discovery with performance data
  • Privacy issue: surveillance of individual process workers (illegal, e.g., in Germany)
  • Goal: preserve as much utility as possible

SLIDE 5

Attack Model

  • Trace linkage attack: link a trace with background knowledge
  • Risks: identity disclosure, membership disclosure, attribute disclosure
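A minimal Python sketch of the linkage idea, assuming the released log is summarized as activity-sequence variants with trace counts and that the attacker's background knowledge is a prefix of one individual's activity sequence (both assumptions for illustration; `ORDER_TO_CASH` and `matching_traces` are hypothetical names). The data are the Order-to-Cash variants from the walkthrough on Slide 9.

```python
# Order-to-Cash variants from Slide 9: activity sequence -> number of traces
ORDER_TO_CASH = {
    ("create_po", "update_po", "receive_gd", "check_in", "pay_in"): 10,
    ("create_po", "update_po", "receive_gd", "check_in", "reject_in"): 5,
    ("create_po", "receive_gd", "update_po", "check_in", "pay_in"): 7,
    ("create_po", "receive_gd", "update_po", "check_in", "reject_in"): 5,
    ("create_po", "receive_gd", "update_po", "update_po", "check_in", "pay_in"): 1,
}


def matching_traces(variants, known_prefix):
    """Number of traces that start with the attacker's known activity prefix.

    These traces are indistinguishable to the attacker; a result of 1 links the
    background knowledge to exactly one trace (and to its performance data).
    """
    known_prefix = tuple(known_prefix)
    return sum(n for seq, n in variants.items() if seq[: len(known_prefix)] == known_prefix)


if __name__ == "__main__":
    print(matching_traces(ORDER_TO_CASH, ["create_po", "update_po"]))  # 15 candidate traces
    print(matching_traces(ORDER_TO_CASH, ["create_po", "receive_gd", "update_po", "update_po"]))  # 1: unique
```

A count of 1 means the known prefix singles out one trace; k-anonymity asks that this count never drop below k.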

SLIDE 6

Background: k-anonymity

SLIDE 7

Background: t-closeness

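  • Extension of k-anonymity
  • Limits the difference between the global distribution of a sensitive attribute and its local distribution within each equivalence class
  • Earth Mover's Distance (EMD) as the distance measure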
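A minimal sketch of the t-closeness check for a numeric sensitive attribute such as durations. It computes the Earth Mover's Distance between the values in one equivalence class (local) and in the whole log (global) as the 1-Wasserstein distance between the two empirical distributions. Whether and how PRETSA normalizes this distance is not stated on the slides; here it stays in the attribute's own unit (e.g. hours), so t must use that unit too. `emd_1d` and `satisfies_t_closeness` are hypothetical names.

```python
from bisect import bisect_right


def emd_1d(local, global_):
    """Earth Mover's Distance (1-Wasserstein) between two non-empty samples of numbers.

    Computed as the area between the two empirical CDFs; the result is in the
    unit of the values themselves (no normalization).
    """
    a, b = sorted(local), sorted(global_)
    points = sorted(set(a) | set(b))
    distance = 0.0
    for left, right in zip(points, points[1:]):
        # both empirical CDFs are constant on the interval [left, right)
        cdf_a = bisect_right(a, left) / len(a)
        cdf_b = bisect_right(b, left) / len(b)
        distance += abs(cdf_a - cdf_b) * (right - left)
    return distance


def satisfies_t_closeness(local, global_, t):
    """True if the class-local distribution is within distance t of the global one."""
    return emd_1d(local, global_) <= t


if __name__ == "__main__":
    # hypothetical durations (hours): one equivalence class vs. the whole log
    print(emd_1d([2.0, 2.0], [2.0, 4.0]))                     # 1.0
    print(satisfies_t_closeness([2.0, 2.0], [2.0, 4.0], 0.5))  # False
```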

SLIDE 8

PRETSA:

PREfix-Tree based event log SAnitization

Event Log → PRETSA → Event Log with k-anonymity & t-closeness → Process Discovery → Process Model with Performance Data

SLIDE 9

PRETSA - Walkthrough

  • Example with an Order-to-Cash process
  • Assume k=8

Variant   Sequence                                                        # traces
σ1        create po, update po, receive gd, check in, pay in              10
σ2        create po, update po, receive gd, check in, reject in           5
σ3        create po, receive gd, update po, check in, pay in              7
σ4        create po, receive gd, update po, check in, reject in           5
σ5        create po, receive gd, update po, update po, check in, pay in   1

SLIDE 10

PRETSA - Prefix tree

  • PRETSA generates a prefix tree from an event log
  • Each node in the tree is an equivalence class

Root
  create_po (28)
    update_po (15)
      receive_gd (15)
        check_in (15)
          pay_in (10)
          reject_in (5)
    receive_gd (13)
      update_po (13)
        check_in (12)
          pay_in (7)
          reject_in (5)
        update_po (1)
          check_in (1)
            pay_in (1)
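The prefix tree above can be rebuilt from the variant table on Slide 9 with a few lines of Python. In this sketch (hypothetical names `build_prefix_tree` and `render`), each node stores the number of traces whose sequence starts with the path from the root, i.e. the size of that node's equivalence class.

```python
# Order-to-Cash variants from Slide 9: activity sequence -> number of traces
ORDER_TO_CASH = {
    ("create_po", "update_po", "receive_gd", "check_in", "pay_in"): 10,
    ("create_po", "update_po", "receive_gd", "check_in", "reject_in"): 5,
    ("create_po", "receive_gd", "update_po", "check_in", "pay_in"): 7,
    ("create_po", "receive_gd", "update_po", "check_in", "reject_in"): 5,
    ("create_po", "receive_gd", "update_po", "update_po", "check_in", "pay_in"): 1,
}


def build_prefix_tree(variants):
    """Nested dicts; each node counts the traces whose sequence starts with its path."""
    root = {"count": 0, "children": {}}
    for seq, n in variants.items():
        root["count"] += n
        node = root
        for activity in seq:
            node = node["children"].setdefault(activity, {"count": 0, "children": {}})
            node["count"] += n
    return root


def render(node, depth=0):
    """One indented line per node, children in the order they first appear in the log."""
    lines = []
    for activity, child in node["children"].items():
        lines.append("  " * depth + f"{activity} ({child['count']})")
        lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(render(build_prefix_tree(ORDER_TO_CASH))))
```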

SLIDE 11

PRETSA - Walkthrough

(Prefix tree as on Slide 10)

k=8

  • Traverse the tree until a violation is found (here: a node with fewer than k=8 traces)
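A sketch of this traversal step, assuming a depth-first, pre-order walk that visits children in the order they first appear in the log and reports the first node with fewer than k traces. The actual PRETSA implementation may traverse differently; `build_tree` and `first_violation` are hypothetical names. On the Order-to-Cash example with k=8 it reports the `reject_in (5)` node under the `update_po` branch.

```python
# Order-to-Cash variants from Slide 9: activity sequence -> number of traces
ORDER_TO_CASH = {
    ("create_po", "update_po", "receive_gd", "check_in", "pay_in"): 10,
    ("create_po", "update_po", "receive_gd", "check_in", "reject_in"): 5,
    ("create_po", "receive_gd", "update_po", "check_in", "pay_in"): 7,
    ("create_po", "receive_gd", "update_po", "check_in", "reject_in"): 5,
    ("create_po", "receive_gd", "update_po", "update_po", "check_in", "pay_in"): 1,
}


def build_tree(variants):
    """Nested dicts; each node counts the traces whose sequence starts with its path."""
    root = {"count": 0, "children": {}}
    for seq, n in variants.items():
        root["count"] += n
        node = root
        for activity in seq:
            node = node["children"].setdefault(activity, {"count": 0, "children": {}})
            node["count"] += n
    return root


def first_violation(node, k, path=()):
    """Depth-first, pre-order search for the first node with fewer than k traces.

    Returns (path, count) for that node, or None if every node has at least k traces.
    """
    for activity, child in node["children"].items():
        child_path = path + (activity,)
        if child["count"] < k:
            return child_path, child["count"]
        found = first_violation(child, k, child_path)
        if found is not None:
            return found
    return None


if __name__ == "__main__":
    print(first_violation(build_tree(ORDER_TO_CASH), k=8))
```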

SLIDE 12

PRETSA - Walkthrough

Root
  create_po (28)
    update_po (15)
      receive_gd (15)
        check_in (15)
          pay_in (15)
    receive_gd (13)
      update_po (13)
        check_in (12)
          pay_in (7)
          reject_in (5)
        update_po (1)
          check_in (1)
            pay_in (1)

k=8

  • PRETSA deletes the branch with the violation
  • The traces of that branch are moved into the most similar branch
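A simplified, hypothetical sketch of the delete-and-move step, enforcing k-anonymity only (t-closeness is left out). Assumptions: the violating node with the fewest traces is handled first (ties go to the node closest to the root, then to log order), its whole subtree is pruned, and each pruned variant's traces are added to the remaining variant with the smallest Levenshtein distance. This is not the authors' code, so the order of intermediate steps need not match the slides; on the Order-to-Cash example it ends in the same tree as Slide 13 (15 and 13 traces).

```python
from collections import Counter

# Order-to-Cash variants from Slide 9: activity sequence -> number of traces
ORDER_TO_CASH = {
    ("create_po", "update_po", "receive_gd", "check_in", "pay_in"): 10,
    ("create_po", "update_po", "receive_gd", "check_in", "reject_in"): 5,
    ("create_po", "receive_gd", "update_po", "check_in", "pay_in"): 7,
    ("create_po", "receive_gd", "update_po", "check_in", "reject_in"): 5,
    ("create_po", "receive_gd", "update_po", "update_po", "check_in", "pay_in"): 1,
}


def levenshtein(a, b):
    """Edit distance between two activity sequences (unit-cost insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def prefix_counts(variants):
    """Traces per prefix-tree node, i.e. per non-empty prefix of some variant."""
    counts = Counter()
    for seq, n in variants.items():
        for i in range(1, len(seq) + 1):
            counts[seq[:i]] += n
    return counts


def sanitize_k(variants, k):
    """Simplified PRETSA-style loop enforcing k-anonymity only (hypothetical sketch)."""
    variants = dict(variants)  # never modify the caller's log
    while True:
        counts = prefix_counts(variants)
        violating = [p for p, n in counts.items() if n < k]
        if not violating:
            return variants
        # assumption: smallest violating node first; ties go to the node closest to the root
        node = min(violating, key=lambda p: (counts[p], len(p)))
        # prune the node's whole subtree, i.e. every variant that starts with it
        pruned = {s: n for s, n in variants.items() if s[: len(node)] == node}
        remaining = {s: n for s, n in variants.items() if s not in pruned}
        if not remaining:
            return {}  # nothing left that could absorb the pruned traces
        # move each pruned variant's traces to the most similar remaining variant
        # (ties: first variant in log order)
        for seq, n in pruned.items():
            target = min(remaining, key=lambda r: levenshtein(seq, r))
            remaining[target] += n
        variants = remaining


if __name__ == "__main__":
    for seq, n in sanitize_k(ORDER_TO_CASH, k=8).items():
        print(n, " -> ".join(seq))
```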

SLIDE 13

PRETSA - Result

Root
  create_po (28)
    update_po (15)
      receive_gd (15)
        check_in (15)
          pay_in (15)
    receive_gd (13)
      update_po (13)
        check_in (13)
          pay_in (13)

k=8

  • Resulting tree: every node now contains at least k=8 traces

SLIDE 14

Evaluation Setup

  • Question: does sanitizing with PRETSA retain more utility than a baseline?
  • PRETSA vs. baseline
  • Datasets: Traffic Fines, Sepsis & CoSeLog

SLIDE 15

Experimental Setup

  • Compare…
  • …the generated event logs → number of variants
  • …fitness/precision of the discovered process models
  • …relative error of the performance annotations
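The slides do not spell out how the relative error is aggregated. A minimal sketch, assuming performance annotations are numbers attached to model elements (e.g. average activity durations) and that per-element relative errors are averaged; `mean_relative_error` is a hypothetical name and the durations are made up.

```python
def mean_relative_error(original, sanitized):
    """Mean of |sanitized - original| / original over the annotations present in both.

    `original` and `sanitized` map a model element (e.g. an activity) to a
    positive performance value such as an average duration.
    """
    shared = [key for key in original if key in sanitized and original[key] != 0]
    if not shared:
        raise ValueError("no comparable annotations")
    return sum(abs(sanitized[key] - original[key]) / original[key] for key in shared) / len(shared)


if __name__ == "__main__":
    # hypothetical average durations (hours) before and after sanitization
    print(mean_relative_error({"check_in": 2.0, "pay_in": 4.0},
                              {"check_in": 2.5, "pay_in": 4.0}))  # 0.125
```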

SLIDE 16

Utility Evaluation - Baseline

  • Only release variants that fulfill k-anonymity and t-closeness
  • Delete all other variants

(Same variant table as on Slide 9)
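A minimal sketch of this baseline for the k-anonymity part only (the t-closeness condition is omitted); `baseline` and `ORDER_TO_CASH` are hypothetical names. If every released variant has at least k traces, every prefix of it does too, so the check reduces to one count per variant. With k=8 only σ1 (10 traces) survives, while the PRETSA result on Slide 13 keeps all 28 traces.

```python
# Order-to-Cash variants from Slide 9: activity sequence -> number of traces
ORDER_TO_CASH = {
    ("create_po", "update_po", "receive_gd", "check_in", "pay_in"): 10,
    ("create_po", "update_po", "receive_gd", "check_in", "reject_in"): 5,
    ("create_po", "receive_gd", "update_po", "check_in", "pay_in"): 7,
    ("create_po", "receive_gd", "update_po", "check_in", "reject_in"): 5,
    ("create_po", "receive_gd", "update_po", "update_po", "check_in", "pay_in"): 1,
}


def baseline(variants, k):
    """Release only variants with at least k traces; all other variants are deleted.

    The t-closeness condition of the baseline is omitted in this sketch.
    """
    return {seq: n for seq, n in variants.items() if n >= k}


if __name__ == "__main__":
    released = baseline(ORDER_TO_CASH, k=8)
    print(len(released), sum(released.values()))  # 1 variant, 10 of 28 traces
```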

SLIDE 17

Evaluation - Event Logs

SLIDE 18

Evaluation - Process Models

SLIDE 19

Evaluation - Performance Annotations

SLIDE 20

PRETSA…

  • …ensures privacy (k-anonymity & t-closeness) for event logs
  • …uses a prefix tree representation of the event log
  • …provides event logs with high utility for process discovery
  • …is available on GitHub under the MIT license: github.com/samadeusfp/PRETSA

Questions? Reach out to fahrenks@hu-berlin.de