Authorship Obfuscation Using Heuristic Search: Master’s Thesis Defence

slide-1
SLIDE 1

Authorship Obfuscation

Using Heuristic Search

Master’s Thesis Defence by Janek Bevendorff on 20 June 2018

Supervisors: Prof. Dr. Benno Stein, PD Dr. Andreas Jakoby

slide-2
SLIDE 2
  • Unmasking for short texts
  • Obfuscation against unmasking
  • Obfuscation against compression models
  • Authorship verification quality measure proposal
  • Obfuscation safety analysis and definitions
  • Side effect analysis
  • JS∆ as authorship metric
  • Adaptive obfuscation
  • Design of an admissible obfuscation heuristic
  • Analysis of consistency and monotonicity properties
  • Design and implementation of an efficient obfuscation framework
  • Development of obfuscation operators
  • Inspection of search space challenges and solutions

20.06.2018 2

slide-4
SLIDE 4

Authorship

slide-5
SLIDE 5

slide-8
SLIDE 8

?

slide-9
SLIDE 9

Koppel and Schler, Authorship verification as a one-class problem, 2004

slide-15
SLIDE 15

Koppel and Schler, Authorship verification as a one-class problem, 2004

slide-17
SLIDE 17

Same author Different authors

slide-19
SLIDE 19

the

Treasure

Island

Dr

Livesey

gentlemen

rest

having

slide-20
SLIDE 20

I

will

begin

the

story

adventures

my

certain

morning

of

the

Treasure

Island

Dr

Livesey

gentlemen

rest

having

slide-21
SLIDE 21

slide-22
SLIDE 22

[Figure: unmasking curves; accuracy (0.5 to 1.0) over rounds (3 to 21) for same-author and different-author pairs]

slide-23
SLIDE 23

[Figure: unmasking curves; accuracy (0.5 to 1.0) over rounds (3 to 21) for same-author and different-author pairs]

slide-24
SLIDE 24

Training Test

Confidence Level   Threshold   Precision   % Classified
Very High          0.9         1.00          6.2
                   0.8         1.00         12.5
                   0.7         1.00         13.8
High               0.6         1.00         18.8
                   0.5         1.00         30.0
Moderate           0.4         0.93         43.8
                   0.3         0.83         55.0
                   0.2         0.68         70.0
Low                0.1         0.82         87.5
                   0.0         0.76        100.0
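The table reports, per confidence threshold, the precision and the share of test pairs the verifier decides on. A minimal sketch of how such rows could be computed, assuming each test pair comes with a confidence score, a predicted label and a true label, and that a pair counts as classified when its confidence is at least the threshold; the function name and data layout are hypothetical, not taken from the thesis.

```python
from typing import List, Tuple

# Each case: (confidence, predicted_same_author, truly_same_author).
# Hypothetical layout chosen for illustration only.
Case = Tuple[float, bool, bool]


def precision_at_threshold(cases: List[Case], threshold: float) -> Tuple[float, float]:
    """Return (precision, percent classified) over cases with confidence >= threshold."""
    decided = [case for case in cases if case[0] >= threshold]
    if not decided:
        # Nothing is classified at this threshold; precision is undefined, report 0.0.
        return 0.0, 0.0
    correct = sum(1 for _, predicted, truth in decided if predicted == truth)
    precision = correct / len(decided)
    percent_classified = 100.0 * len(decided) / len(cases)
    return precision, percent_classified


if __name__ == "__main__":
    demo = [(0.95, True, True), (0.55, False, False), (0.35, True, False), (0.05, False, True)]
    for t in (0.9, 0.5, 0.3, 0.0):
        p, share = precision_at_threshold(demo, t)
        print(f"threshold={t:.1f} precision={p:.2f} classified={share:.1f}%")
```

Lowering the threshold can only add cases, so the share classified grows monotonically while precision may fall, which is the trade-off the table shows.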

slide-25
SLIDE 25

Obfuscation

slide-26
SLIDE 26

slide-28
SLIDE 28

[Figure: unmasking curves; accuracy (0.5 to 1.0) over rounds (3 to 21) for same-author and different-author pairs]

slide-30
SLIDE 30

[Figure: unmasking curves; accuracy (0.5 to 1.0) over rounds (3 to 21) for same-author and different-author pairs]

slide-31
SLIDE 31

$\mathrm{KLD}(P \parallel Q) = \sum_i P[i] \log_2 \frac{P[i]}{Q[i]}$
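The KLD sums, over all n-grams i, the term P[i] log2(P[i]/Q[i]), where P and Q are the n-gram distributions of the two texts. A minimal sketch, assuming both distributions are aligned lists of relative frequencies over a shared vocabulary; the name `kld` is illustrative.

```python
import math
from typing import Sequence


def kld(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence KLD(P || Q) in bits.

    Terms with P[i] == 0 contribute nothing (0 * log 0 is taken as 0).
    If Q[i] == 0 while P[i] > 0, the divergence is infinite.
    """
    if len(p) != len(q):
        raise ValueError("P and Q must be defined over the same n-gram vocabulary")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue
        if q_i == 0.0:
            return math.inf
        total += p_i * math.log2(p_i / q_i)
    return total
```

The infinite case is one practical reason to prefer a bounded, symmetric variant when comparing short texts whose n-gram vocabularies differ.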

slide-33
SLIDE 33

$\mathrm{JSD}(P \parallel Q) = \frac{\mathrm{KLD}(P \parallel M) + \mathrm{KLD}(Q \parallel M)}{2}, \qquad M = \frac{P + Q}{2}$

$\mathrm{KLD}(P \parallel Q) = \sum_i P[i] \log_2 \frac{P[i]}{Q[i]}$

→ maximize
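The Jensen-Shannon divergence averages the KLD of each distribution against their mixture M = (P + Q)/2. A self-contained sketch, assuming P and Q are aligned lists of relative frequencies over a shared n-gram vocabulary; the helper names are illustrative.

```python
import math
from typing import List, Sequence


def _kld(p: Sequence[float], q: Sequence[float]) -> float:
    # KLD in bits; zero-probability terms of p are skipped (0 * log 0 = 0).
    return sum(p_i * math.log2(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0.0)


def jsd(p: Sequence[float], q: Sequence[float]) -> float:
    """JSD(P || Q) = (KLD(P || M) + KLD(Q || M)) / 2 with M = (P + Q) / 2."""
    m: List[float] = [(p_i + q_i) / 2.0 for p_i, q_i in zip(p, q)]
    return (_kld(p, m) + _kld(q, m)) / 2.0
```

Because M[i] > 0 wherever P[i] > 0 or Q[i] > 0, the JSD stays finite even for n-grams that occur in only one text; it is symmetric, and with base-2 logarithms it lies between 0 and 1.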

slide-36
SLIDE 36

$\frac{\partial}{\partial Q[i]} \left( P[i] \log_2 \frac{P[i]}{Q[i]} \right) = -\frac{P[i]}{Q[i] \ln 2}, \qquad R_{\mathrm{KL}}(i) = \frac{P[i]}{Q[i]}$

→ maximize
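The derivative says that lowering Q[i] raises the i-th KLD term at a rate proportional to R_KL(i) = P[i]/Q[i], so the n-grams with the largest ratio are the most effective ones to change. A short numerical check of the closed form against a central finite difference; the probe values are arbitrary illustrations.

```python
import math


def kld_term(p_i: float, q_i: float) -> float:
    """One KLD summand, P[i] * log2(P[i] / Q[i])."""
    return p_i * math.log2(p_i / q_i)


def analytic_derivative(p_i: float, q_i: float) -> float:
    """Closed form: d/dQ[i] of the summand is -P[i] / (Q[i] * ln 2)."""
    return -p_i / (q_i * math.log(2))


def numeric_derivative(p_i: float, q_i: float, step: float = 1e-6) -> float:
    """Central finite difference of the summand with respect to Q[i]."""
    return (kld_term(p_i, q_i + step) - kld_term(p_i, q_i - step)) / (2 * step)


def r_kl(p_i: float, q_i: float) -> float:
    """Ranking score R_KL(i) = P[i] / Q[i]; the derivative's magnitude is R_KL(i) / ln 2."""
    return p_i / q_i


if __name__ == "__main__":
    for p_i, q_i in [(0.02, 0.01), (0.01, 0.05), (0.3, 0.2)]:
        print(p_i, q_i, analytic_derivative(p_i, q_i), numeric_derivative(p_i, q_i), r_kl(p_i, q_i))
```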

slide-37
SLIDE 37

[Figure: n-gram frequencies of Text 1 and Text 2 (to be obfuscated); n-grams ranked left to right: gre, ly_, par, bor, y_h, hel, eme, ny_, dis, gro]
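A minimal sketch of the ranking idea, assuming character 3-grams with whitespace runs written as "_" (as in n-grams like "ly_"), P as the profile of Text 1 and Q as the profile of Text 2, the text to be obfuscated. Ranking only n-grams that occur in both texts is a simplification chosen here, not necessarily what the thesis does, and all function names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple


def char_ngram_profile(text: str, n: int = 3) -> Dict[str, float]:
    """Relative frequencies of character n-grams; every whitespace run becomes '_'."""
    normalized = re.sub(r"\s+", "_", text)
    counts = Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


def rank_by_rkl(p: Dict[str, float], q: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank n-grams present in both profiles by R_KL = P[i] / Q[i], largest first.

    Per the KLD derivative, lowering Q[i] raises the KLD term fastest where this ratio is largest.
    """
    shared = p.keys() & q.keys()
    scored = [(gram, p[gram] / q[gram]) for gram in shared]
    # Sort by descending ratio; ties are broken alphabetically for a deterministic order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    text_1 = "the grey heron and the greedy gull"
    text_2 = "a grey day by the harbour"
    print(rank_by_rkl(char_ngram_profile(text_1), char_ngram_profile(text_2))[:5])
```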

slide-46
SLIDE 46

$\mathrm{JS}\Delta = \sqrt{2 \cdot \mathrm{JSD}(P \parallel Q)}$

[Figure: JS distance (JSΔ, 0.4 to 1.4) over text length (2^7 to 2^14 characters) for same-author and different-author pairs, with thresholds ε_0 and ε_0.5]
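The plot's JS distance can be computed directly from two texts. A sketch assuming JSΔ = √(2 · JSD(P ∥ Q)), the Jensen-Shannon distance, whose maximum of √2 ≈ 1.41 matches the top of the plot's axis, and character 3-gram profiles with whitespace written as "_"; the length-dependent thresholds ε_0 and ε_0.5 are not reproduced because their values are not given on the slide. Function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict


def ngram_profile(text: str, n: int = 3) -> Dict[str, float]:
    """Relative character n-gram frequencies; each whitespace run becomes '_'."""
    s = re.sub(r"\s+", "_", text)
    counts = Counter(s[i:i + n] for i in range(len(s) - n + 1))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}


def js_distance(text_a: str, text_b: str, n: int = 3) -> float:
    """JSΔ = sqrt(2 * JSD(P || Q)) with log base 2, over the union of both vocabularies."""
    p, q = ngram_profile(text_a, n), ngram_profile(text_b, n)
    jsd = 0.0
    for gram in p.keys() | q.keys():
        p_i, q_i = p.get(gram, 0.0), q.get(gram, 0.0)
        m_i = (p_i + q_i) / 2.0  # mixture distribution M = (P + Q) / 2
        # JSD = (KLD(P || M) + KLD(Q || M)) / 2, accumulated term by term.
        if p_i > 0.0:
            jsd += 0.5 * p_i * math.log2(p_i / m_i)
        if q_i > 0.0:
            jsd += 0.5 * q_i * math.log2(q_i / m_i)
    # Clamp tiny negative rounding errors before taking the square root.
    return math.sqrt(max(2.0 * jsd, 0.0))
```

Taking the square root turns the divergence into a proper metric, which is what makes a single distance threshold meaningful.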

slide-48
SLIDE 48

After obfuscation:

Confidence Level   Threshold   Precision   % Classified
Very High          0.9         0.00          2.5
                   0.8         0.00          5.0
                   0.7         0.00          8.7
High               0.6         0.00         17.5
                   0.5         0.00         27.5
Moderate           0.4         0.00         42.5
                   0.3         0.67         66.7
                   0.2         0.50         70.0
Low                0.1         0.42         85.0
                   0.0         0.53        100.0

Before obfuscation:

Confidence Level   Threshold   Precision   % Classified
Very High          0.9         1.00          6.2
                   0.8         1.00         12.5
                   0.7         1.00         13.8
High               0.6         1.00         18.8
                   0.5         1.00         30.0
Moderate           0.4         0.93         43.8
                   0.3         0.83         55.0
                   0.2         0.68         70.0
Low                0.1         0.82         87.5
                   0.0         0.76        100.0

slide-49
SLIDE 49

Heuristic Search

slide-52
SLIDE 52

s

CLOSED OPEN

slide-53
SLIDE 53

s

$f(n)$

slide-59
SLIDE 59

$f(n) = g(n) + h(n), \qquad h(n) \le h^*(n)$
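Best-first search expands the OPEN node with the smallest f(n) = g(n) + h(n); if h never overestimates (h(n) ≤ h*(n)), the first goal taken from OPEN is a cheapest one. A generic A* sketch, not the thesis's framework: the node type, successor function and costs are placeholders (in the obfuscation setting a node would presumably be a modified text and an edge one operator application).

```python
import heapq
import math
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple


def a_star(
    start: Hashable,
    is_goal: Callable[[Hashable], bool],
    successors: Callable[[Hashable], Iterable[Tuple[Hashable, float]]],
    h: Callable[[Hashable], float],
) -> Optional[Tuple[List[Hashable], float]]:
    """Best-first search ordered by f(n) = g(n) + h(n); returns (path, cost) or None."""
    g: Dict[Hashable, float] = {start: 0.0}
    parent: Dict[Hashable, Optional[Hashable]] = {start: None}
    # OPEN as a heap of (f, tie-breaker, g at push time, node); the counter avoids comparing nodes.
    open_heap = [(h(start), 0, 0.0, start)]
    counter = 1
    while open_heap:
        _, _, g_at_push, node = heapq.heappop(open_heap)
        if g_at_push > g[node]:
            continue  # stale entry: a cheaper path to this node was found after it was pushed
        if is_goal(node):
            path = [node]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return path[::-1], g[node]
        # The node is now expanded (CLOSED). It is pushed onto OPEN again only if a cheaper
        # path to it appears, which keeps the result optimal for admissible, inconsistent h.
        for succ, cost in successors(node):
            new_g = g[node] + cost
            if new_g < g.get(succ, math.inf):
                g[succ] = new_g
                parent[succ] = node
                heapq.heappush(open_heap, (new_g + h(succ), counter, new_g, succ))
                counter += 1
    return None
```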

slide-61
SLIDE 61

$h_{\mathrm{prior}}(n) = \varepsilon - \mathrm{JS}\Delta_n, \qquad g_{\mathrm{norm}}(n) = \frac{g(n)}{\mathrm{JS}\Delta_n - \mathrm{JS}\Delta_0}$

slide-62
SLIDE 62

$h(n) = h_{\mathrm{prior}}(n) \cdot g_{\mathrm{norm}}(n)$
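The heuristic h(n) = h_prior(n) · g_norm(n) = (ε − JSΔ_n) · g(n) / (JSΔ_n − JSΔ_0) estimates the remaining cost as the JSΔ still missing to reach ε, multiplied by the cost spent so far per unit of JSΔ gained. A minimal sketch of that formula; returning 0 when no gain has been made yet (for example at the start node, where JSΔ_n = JSΔ_0 and g_norm is undefined) is an assumption made here, chosen because 0 never overestimates.

```python
def heuristic(js_delta_n: float, js_delta_0: float, g_n: float, epsilon: float) -> float:
    """h(n) = h_prior(n) * g_norm(n), where
    h_prior(n) = epsilon - JSΔ_n               (distance still to cover)
    g_norm(n)  = g(n) / (JSΔ_n - JSΔ_0)        (cost spent per unit of JSΔ gained so far)
    """
    h_prior = epsilon - js_delta_n
    if h_prior <= 0.0:
        return 0.0  # threshold already reached: nothing left to do
    gained = js_delta_n - js_delta_0
    if gained <= 0.0:
        # No measurable gain yet, so g_norm is undefined; 0 is an assumed, admissible fallback.
        return 0.0
    g_norm = g_n / gained
    return h_prior * g_norm
```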

slide-66
SLIDE 66

Linear Gain

[Figure: h(n), g(n) and JSΔ, with the threshold ε]
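Why linear gain is the boundary case for h(n) = (ε − JSΔ_n) · g(n) / (JSΔ_n − JSΔ_0), under the idealising assumption that every operation costs c and raises JSΔ by the same amount a: after k operations, g(n) = kc and JSΔ_n − JSΔ_0 = ka, so

```latex
g_{\mathrm{norm}}(n) = \frac{g(n)}{\mathrm{JS}\Delta_n - \mathrm{JS}\Delta_0} = \frac{k c}{k a} = \frac{c}{a},
\qquad
h(n) = (\varepsilon - \mathrm{JS}\Delta_n)\,\frac{c}{a}.
```

Reaching ε takes (ε − JSΔ_n)/a further operations of cost c, so h*(n) = h(n) when that number is an integer; rounding it up only increases h*(n), so h(n) ≤ h*(n) holds either way.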

slide-68
SLIDE 68

Sublinear Gain

[Figure: h(n), g(n) and JSΔ, with the threshold ε]
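With sublinear gain, later operations raise JSΔ less than earlier ones, so the average cost per unit of gain observed so far understates the cost of the remaining steps, and h(n) = (ε − JSΔ_n) · g(n) / (JSΔ_n − JSΔ_0) stays below the true remaining cost. A small simulation with a hypothetical, geometrically shrinking gain per operation and unit cost per operation illustrates this; the numbers are invented and are not measurements from the thesis.

```python
from typing import List, Tuple


def simulate(gains: List[float], epsilon: float, js_delta_0: float = 0.0) -> List[Tuple[int, float, int]]:
    """Return (k, h, h_star) for every state k before the threshold is reached.

    Each operation costs 1, so g = k, and gains are assumed positive.
    h follows (epsilon - JSΔ_k) * k / (JSΔ_k - JSΔ_0); h_star is the true
    number of further operations needed to reach epsilon.
    """
    cumulative = [js_delta_0]
    for gain in gains:
        cumulative.append(cumulative[-1] + gain)
    goal = next((i for i, value in enumerate(cumulative) if value >= epsilon), None)
    if goal is None:
        raise ValueError("the given gains never reach epsilon")
    results = []
    for k in range(1, goal):
        js_k = cumulative[k]
        h = (epsilon - js_k) * k / (js_k - js_delta_0)
        results.append((k, h, goal - k))
    return results


if __name__ == "__main__":
    shrinking = [0.1 * 0.8 ** k for k in range(20)]  # hypothetical sublinear gain curve
    for k, h, h_star in simulate(shrinking, epsilon=0.4):
        print(f"after {k} ops: h = {h:.2f}, true remaining cost = {h_star}")
```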

slide-72
SLIDE 72

[Figure: h(n), g(n), JSΔ and stepwise JSΔ over the number of operations (100 to 400), with the threshold ε]

slide-74
SLIDE 74

  • n-gram removal: abcdefg → abfg
  • character flip: wi ard z → wia rd z
  • character map: The End. → The End!
  • synonym: house → home
  • Netspeak: author → author of
slide-84
SLIDE 84

[Figure: h(n), g(n), JSΔ and stepwise JSΔ over the number of operations (20 to 120), with the threshold ε]

slide-85
SLIDE 85

‘With a furtive glance around him, he clapped the other half of the clay sphere over the filled hemisphere and then stood up. The patients lined up at the door, waiting for the walk back across the green hills to the main hospital. The attendants made a quick count and then unlocked the door. The group shuffled out into the warm, afternoon sunlight and the door closed behind them. Miss Abercrombie gazed around the cluttered room and picked up her chart book of patient progress. Moving slowly down the line of benches, she made short, precise notes on the day’s work accomplished by each patient. [...]’

A Filbert Is a Nut by Rick Raphael

slide-86
SLIDE 86

‘With a furtive glance around him, he clapped the other half of the clay sphere over the filled hemisphere and then stood up. The patients lined up at the door, waiting for the walk back across the site hills to the main hospital. The attendants made a quick investigation and then unlocked the door. The group shuffled out into the warm, daylight sunlight and the door closed behind them. Miss Abercrombie gazed around the cluttered room and picked up her chart forward f patient progress. Moving slowly down the line of bens, she made parcel, precise notes on the day’s work accomplishedb y aehc patient. [...]’

A Filbert Is a Nut by Rick Raphael

slide-94
SLIDE 94
  • Unmasking can be attacked by KLD obfuscation.
  • JS∆ is an effective authorship metric:
      • important building block: length-invariant thresholds.
  • Design of an admissible heuristic search function:
      • significant reduction of text modifications at the same effect,
      • better text quality.

slide-95
SLIDE 95

Thank you for your attention

Image Credits: Min An, Pexels.com