SLIDE 1

Taming the Devil:

Techniques for Evaluating Anonymized Network Data

Scott Coull¹, Charles Wright¹, Angelos Keromytis², Fabian Monrose¹, Michael Reiter³
¹ Johns Hopkins University  ² Columbia University  ³ University of North Carolina at Chapel Hill

SLIDE 2

The Network Data Sanitization Problem

Anonymize a packet trace or flow log such that:
  • 1. Researchers gain maximum utility
  • 2. Adversaries with auxiliary information do not learn sensitive information

[Diagram: Network Data → Anonymization → Anonymized Network Data]

SLIDE 3

Methods of Sanitization

Pseudonyms for IPs:
  • Strict prefix-preserving [FXAM04]
  • Partial prefix-preserving [PAPL06]
  • Transaction-specific [OBA05]

Other data fields are anonymized in reaction to attacks:
  • e.g., timestamps are quantized due to the clock-skew attack [KBC05]

SLIDE 4

Notable Attacks

Several active and passive attacks exist:
  • Active probing [BA05, BAO05, KAA06]
  • Host profiling [CWCMR07, RCMT08]
  • Identifying web pages [KAA06, CCWMR07]

SLIDE 5

The Underlying Problem

Attacks can be generalized as follows:
  • 1. Identifying information is encoded in the anonymized data
       (e.g., host behaviors for profiling attacks)
  • 2. Adversary has external information on true identities
       (e.g., public information on services offered by a host)
  • 3. Adversary maps true identities to pseudonyms
SLIDE 6

Our Goals

  • 1. Find objects at risk of deanonymization
  • 2. Compare anonymization systems and policies
  • 3. Model hypothetical attack scenarios

Focus on ‘natural’ sources of information leakage
SLIDE 7

Related Work

Definitions of Anonymity:
  • k-Anonymity [SS98], l-Diversity [MGKV05], and t-Closeness [LLV07]

Information-theoretic metrics:
  • Analysis of anonymity in mixnets [SD02, DSCP02]

An orthogonal method for evaluating network data [RCMT08]

SLIDE 8

Outline

  • Adversarial Model
  • Defining Objects
  • Auxiliary Information
  • Calculating Anonymity
  • Evaluation

SLIDE 9

Adversarial Model

Adversary’s goal: map an anonymized object to its unanonymized counterpart

[Diagram: the anonymized host 50.20.2.1 is matched against the unanonymized candidates 10.0.0.2, 10.0.0.100, and 10.0.0.1 with probabilities 20%, 75%, and 5%]

SLIDE 10

Defining Objects

Consider network data as a database:
  • n rows, m columns
  • Each row is a packet (or flow) record
  • Each column is a data field (e.g., source port)

Fields can induce a probability distribution:
  • Sample space defined by the values in the field
  • Represented by random variables in our analysis

SLIDE 11

Defining Objects

ID | Local IP | Local Port | Remote IP    | Remote Port
 1 | 10.0.0.1 | 80         | 192.168.2.5  | 1052
 2 | 10.0.0.2 | 3069       | 10.0.1.5     | 80
 3 | 10.0.0.1 | 80         | 192.168.2.10 | 4059
 4 | 10.0.0.1 | 21         | 192.168.6.11 | 5024

SLIDE 12

Defining Objects

[Bar chart of the distribution induced by the Local IP field: 10.0.0.1 = 0.75, 10.0.0.2 = 0.25]
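
As a rough sketch of this step (the record layout and field names below are ours, not from the talk), the distribution above can be recomputed directly from the example table on Slide 11:

    from collections import Counter

    # Flow records from the example table on Slide 11.
    records = [
        {"local_ip": "10.0.0.1", "local_port": 80},
        {"local_ip": "10.0.0.2", "local_port": 3069},
        {"local_ip": "10.0.0.1", "local_port": 80},
        {"local_ip": "10.0.0.1", "local_port": 21},
    ]

    def field_distribution(records, field):
        """Empirical distribution induced by a single data field."""
        counts = Counter(r[field] for r in records)
        return {value: count / len(records) for value, count in counts.items()}

    print(field_distribution(records, "local_ip"))
    # {'10.0.0.1': 0.75, '10.0.0.2': 0.25}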

SLIDE 13

Defining Objects

Combinations of fields can leak information even if the fields are indistinguishable in isolation.

A real-world adversary has a directed plan of attack on a certain subset of fields; our analysis must consider a much larger set of potential fields.

Use feature selection methods based on mutual information to find related fields:
  • Limits computational requirements

SLIDE 14

Defining Objects

A feature is a group of correlated fields:
  • Calculate the normalized mutual information between fields
  • Group fields into pairs if their mutual information > t
  • Merge groups that share a field into a feature

A feature distribution is the joint distribution over the fields in the feature
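
A minimal sketch of the grouping step (the talk does not specify the normalization or the threshold, so dividing by the larger field entropy and t = 0.7 are our assumptions):

    import math
    from collections import Counter
    from itertools import combinations

    def entropy(values):
        """Shannon entropy (bits) of a column's empirical distribution."""
        counts = Counter(values)
        n = len(values)
        return -sum(c / n * math.log2(c / n) for c in counts.values())

    def normalized_mi(xs, ys):
        """Mutual information of two columns, scaled to [0, 1].
        Normalizing by the larger entropy is our choice, not the authors'."""
        mi = entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))
        norm = max(entropy(xs), entropy(ys))
        return mi / norm if norm > 0 else 0.0

    def group_features(columns, t=0.7):
        """Pair fields whose normalized MI exceeds t, then merge groups
        that share a field into a single feature."""
        groups = [{a, b} for a, b in combinations(columns, 2)
                  if normalized_mi(columns[a], columns[b]) > t]
        merged = True
        while merged:
            merged = False
            for g1, g2 in combinations(groups, 2):
                if g1 & g2:              # groups share a field: merge them
                    groups.remove(g2)
                    g1 |= g2
                    merged = True
                    break
        # Fields correlated with nothing become singleton features.
        ungrouped = set(columns) - set().union(*groups)
        return groups + [{f} for f in ungrouped]

Fields that never clear the threshold remain singleton features, so every field still contributes a distribution to the analysis.
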
SLIDE 15

Defining Objects

ID | Local IP | Local Port | Remote IP    | Remote Port
 1 | 10.0.0.1 | 80         | 192.168.2.5  | 1052
 2 | 10.0.0.2 | 3069       | 10.0.1.5     | 80
 3 | 10.0.0.1 | 80         | 192.168.2.10 | 4059
 4 | 10.0.0.1 | 21         | 192.168.6.11 | 5024

SLIDE 16

Defining Objects

[Bar chart of the joint (Local IP, Local Port) feature distribution: (10.0.0.1, 80) = 0.5, (10.0.0.2, 3069) = 0.25, (10.0.0.1, 21) = 0.25]
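
Continuing the sketch (same hypothetical record layout as before), the joint distribution above comes straight from counting value combinations:

    from collections import Counter

    records = [  # example table from Slide 15
        {"local_ip": "10.0.0.1", "local_port": 80},
        {"local_ip": "10.0.0.2", "local_port": 3069},
        {"local_ip": "10.0.0.1", "local_port": 80},
        {"local_ip": "10.0.0.1", "local_port": 21},
    ]

    def feature_distribution(records, fields):
        """Joint empirical distribution over the fields of a feature."""
        counts = Counter(tuple(r[f] for f in fields) for r in records)
        return {combo: c / len(records) for combo, c in counts.items()}

    print(feature_distribution(records, ("local_ip", "local_port")))
    # {('10.0.0.1', 80): 0.5, ('10.0.0.2', 3069): 0.25, ('10.0.0.1', 21): 0.25}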

SLIDE 17

Defining Objects

An object is a set of feature distributions over the records produced due to its presence

e.g., host objects: the feature distributions induced by records sent from or received by a given host
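
A sketch of how a host object might be assembled (grouping on the Local IP field follows the talk's host-object example; the helper is the one from the previous sketch):

    from collections import Counter, defaultdict

    def feature_distribution(records, fields):
        counts = Counter(tuple(r[f] for f in fields) for r in records)
        return {combo: c / len(records) for combo, c in counts.items()}

    def host_objects(records, features, host_field="local_ip"):
        """One object per host: the feature distributions induced by the
        records that host appears in. Features are tuples of field names."""
        by_host = defaultdict(list)
        for r in records:
            by_host[r[host_field]].append(r)
        return {host: {f: feature_distribution(recs, f) for f in features}
                for host, recs in by_host.items()}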

SLIDE 18

Defining Objects

ID | Local IP | Local Port | Remote IP    | Remote Port
 1 | 10.0.0.1 | 80         | 192.168.2.5  | 1052
 2 | 10.0.0.2 | 3069       | 10.0.1.5     | 80
 3 | 10.0.0.1 | 80         | 192.168.2.10 | 4059
 4 | 10.0.0.1 | 21         | 192.168.6.11 | 5024

SLIDE 19

Defining Objects

ID | Local IP | Local Port | Remote IP    | Remote Port
 1 | 10.0.0.1 | 80         | 192.168.2.5  | 1052
 3 | 10.0.0.1 | 80         | 192.168.2.10 | 4059
 4 | 10.0.0.1 | 21         | 192.168.6.11 | 5024

[Bar chart of the (Local IP, Local Port) feature for host 10.0.0.1: (10.0.0.1, 80) = 0.66, (10.0.0.1, 21) = 0.33]

SLIDE 20

Defining Objects

ID | Local IP | Local Port | Remote IP    | Remote Port
 1 | 10.0.0.1 | 80         | 192.168.2.5  | 1052
 3 | 10.0.0.1 | 80         | 192.168.2.10 | 4059
 4 | 10.0.0.1 | 21         | 192.168.6.11 | 5024

[Bar charts of the host object for 10.0.0.1: (Local IP, Local Port) feature: (10.0.0.1, 80) = 0.66, (10.0.0.1, 21) = 0.33; (Remote IP, Remote Port) feature: (192.168.2.5, 1052) = 0.33, (192.168.2.10, 4059) = 0.33, (192.168.6.11, 5024) = 0.33]

SLIDE 21

Adversarial Model

[Diagram (as on Slide 9): the anonymized host 50.20.2.1 is matched against the unanonymized candidates 10.0.0.2 (20%), 10.0.0.100 (75%), and 10.0.0.1 (5%)]

SLIDE 22

Adversarial Model

[Diagram: the feature distributions of the anonymized object 50.20.2.1 are compared against the host objects of the unanonymized candidates 10.0.0.2, 10.0.0.100, and 10.0.0.1, yielding match probabilities of 20%, 75%, and 5%]

SLIDE 23

Auxiliary Information

Auxiliary information captures the adversary’s external knowledge:
  • Initially, the adversary only has knowledge obtained from meta-data
  • As the adversary deanonymizes objects, new knowledge is gained
  • This knowledge is used to iteratively refine the mapping between anonymized and unanonymized objects

SLIDE 24

Auxiliary Information

Local IP: Prefix-Preserving

Anonymized Values | Unanonymized (candidate) Values
50.20.2.1         | {10.0.0.1, …, 10.0.0.255}
50.20.2.2         | {10.0.0.1, …, 10.0.0.255}
50.20.2.3         | {10.0.0.1, …, 10.0.0.255}

SLIDE 25

Auxiliary Information

Local IP: Prefix-Preserving

Anonymized Values | Unanonymized (candidate) Values
50.20.2.1         | {10.0.0.1}
50.20.2.2         | {10.0.0.2, 10.0.0.3}
50.20.2.3         | {10.0.0.2, 10.0.0.3}
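
One simple consistency rule that reproduces the narrowing above (the refinement possible under prefix-preserving schemes is richer than this, and restricting the candidate pools to the three addresses actually seen in the data is our assumption): once an anonymized value is pinned to its true value, that true value can be struck from every other candidate set.

    def refine_candidates(candidates, anon_value, true_value):
        """Pin a learned mapping and remove its true value from the
        candidate sets of all other anonymized values."""
        return {anon: ({true_value} if anon == anon_value
                       else cands - {true_value})
                for anon, cands in candidates.items()}

    # Candidate pools restricted to the addresses actually seen in the data.
    candidates = {anon: {"10.0.0.1", "10.0.0.2", "10.0.0.3"}
                  for anon in ("50.20.2.1", "50.20.2.2", "50.20.2.3")}
    print(refine_candidates(candidates, "50.20.2.1", "10.0.0.1"))
    # 50.20.2.1 -> {10.0.0.1}; the other two -> {10.0.0.2, 10.0.0.3}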

SLIDE 26

Adversarial Model

[Diagram (as on Slide 22): anonymized host 50.20.2.1 compared against candidates 10.0.0.2, 10.0.0.100, and 10.0.0.1 with probabilities 20%, 75%, and 5%]

SLIDE 27

Adversarial Model

[Diagram: the same comparison with port auxiliary information added: anonymized local ports 19, 32, and 50 each have the candidate set {1, …, 1024}]

SLIDE 28

Adversarial Model

[Diagram: after refinement, anonymized port 19 maps to {80}, port 32 to {1, …, 1024}, and port 50 to {21}; the candidate probabilities sharpen to 10%, 88%, and 2%]

SLIDE 29

Calculating Anonymity

Compare each feature distribution of an anonymized object against those of all unanonymized objects:
  • Use the L1 similarity measure as a count to approximate a probability distribution over candidates
  • Use the information entropy of that distribution as the object's anonymity with respect to the feature
  • Auxiliary information dictates how the features are compared
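
A sketch of this computation, consistent with the worked example on Slides 38-40 (we take "L1 similarity" to mean 2 minus the L1 distance, so identical distributions score 2.0; the alignment of anonymized to unanonymized values is dictated by auxiliary information and is assumed already applied here):

    import math

    def l1_similarity(p, q):
        """2 minus the L1 distance between two feature distributions;
        identical distributions score 2.0, disjoint ones score 0.0."""
        support = set(p) | set(q)
        return 2.0 - sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)

    def feature_anonymity(anon_dist, candidate_dists):
        """Normalize similarity scores into a distribution over candidates
        and return its entropy (bits) as the per-feature anonymity.
        Assumes at least one candidate has nonzero similarity."""
        sims = {c: l1_similarity(anon_dist, d)
                for c, d in candidate_dists.items()}
        total = sum(sims.values())
        probs = {c: s / total for c, s in sims.items()}
        return probs, -sum(p * math.log2(p) for p in probs.values() if p > 0)

High entropy means the candidates look alike and the object stays hidden; low entropy flags an object at risk.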

SLIDE 30

Calculating Anonymity

  • The sum of entropy across all features gives the overall object anonymity
       (features are assumed independent due to the mutual information correlation criterion)

  • Calculate the conditional anonymity of an object via a greedy algorithm:
    1. Choose the lowest-entropy object and assume it has been deanonymized
    2. Reverse the anonymization to learn mappings
    3. Recalculate object anonymity with the new auxiliary information
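
In sketch form (object_anonymity and learn_mappings are hypothetical stand-ins, passed as callables, for the entropy computation above and the scheme-specific reversal step):

    def conditional_anonymity(anon_objects, unanon_objects, aux,
                              object_anonymity, learn_mappings):
        """Greedy sketch of the conditional-anonymity loop."""
        results = {}
        remaining = dict(anon_objects)
        while remaining:
            # 1. Score every remaining object under the current aux info.
            scores = {a: object_anonymity(o, unanon_objects, aux)
                      for a, o in remaining.items()}
            # 2. Assume the lowest-entropy object has been deanonymized.
            weakest = min(scores, key=scores.get)
            results[weakest] = scores[weakest]
            # 3. Reverse its anonymization to refine the aux info.
            aux = learn_mappings(aux, weakest, remaining.pop(weakest))
        return results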

SLIDE 31

Evaluation

Captured flow logs at the edge of the JHUISI network:
  • 24 hours of data
  • 27,753 flows
  • 237 hosts on three subnets
  • Anonymized with tcpmkpub [PAPL06]

Analysis of host objects:
  • Defined by unique Local IPs
  • 19 features generated from the fields: start time, end time, local IP, local port, local size, remote IP, remote port, remote size, and protocol

SLIDE 32

Evaluation

CDF of Overall Entropy: [figure]

SLIDE 33

Evaluation

CDF of the three worst features: [figure]

SLIDE 34

Evaluation

Comparison of prefix-preserving schemes using conditional anonymity:

  • CryptoPAn [FXAM04]: if n bits of a prefix are shared between unanonymized IPs, n bits will be shared in the anonymized IPs
  • Pang et al. [PAPL06]: uses a pseudorandom permutation to anonymize the host and subnet portions separately
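
A toy illustration of the CryptoPAn property (this mimics the bit-by-bit structure of the real scheme but substitutes SHA-256 for its keyed block cipher, so it demonstrates prefix preservation rather than implementing CryptoPAn):

    import hashlib

    def toy_prefix_preserving(ip, key):
        """Output bit i = input bit i XOR a keyed function of bits
        0..i-1, so any shared prefix is preserved exactly."""
        n = int.from_bytes(bytes(int(o) for o in ip.split(".")), "big")
        bits = [int(b) for b in format(n, "032b")]
        out = []
        for i, b in enumerate(bits):
            prefix = "".join(map(str, bits[:i]))
            f = hashlib.sha256(f"{key}|{prefix}".encode()).digest()[0] & 1
            out.append(b ^ f)
        m = int("".join(map(str, out)), 2)
        return ".".join(str((m >> s) & 0xFF) for s in (24, 16, 8, 0))

    def shared_prefix_len(a, b):
        to_bits = lambda ip: format(
            int.from_bytes(bytes(int(o) for o in ip.split(".")), "big"), "032b")
        ba, bb = to_bits(a), to_bits(b)
        return next((i for i in range(32) if ba[i] != bb[i]), 32)

    x, y = "10.0.0.1", "10.0.0.100"   # share a 25-bit prefix
    ax, ay = toy_prefix_preserving(x, "key"), toy_prefix_preserving(y, "key")
    assert shared_prefix_len(ax, ay) == shared_prefix_len(x, y)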

SLIDE 35

Evaluation

CryptoPAn vs. Pang et al: [figure]

SLIDE 36

Evaluation

Conditional anonymity can also be used to evaluate the impact of known attacks.

Simulate the behavioral profiling attack [CWCMR07]:
  • Determine the hosts that are susceptible
  • Determine the impact of deanonymizing those hosts on the hosts that remain

SLIDE 37

Conclusion

Privacy risks are due to information encoded within the anonymized network data.

We provide one of the first methods for evaluating anonymized data for information leakage:
  • Discover objects at risk of deanonymization
  • Compare anonymization policies and techniques
  • Simulate hypothetical attack scenarios

SLIDE 38

Calculating Anonymity

Local IP, Local Port Feature

Anonymized Object: 50.20.2.1
[Bar chart: (50.20.2.1, 19) = 0.66, (50.20.2.1, 50) = 0.33]

Unanonymized Object: 10.0.0.1
[Bar chart: (10.0.0.1, 80) = 0.66, (10.0.0.1, 21) = 0.33]

Unanon. Object | Similarity
10.0.0.1       | 2.0

SLIDE 39

Calculating Anonymity

Local IP, Local Port Feature

Anonymized Object: 50.20.2.1
[Bar chart: (50.20.2.1, 19) = 0.66, (50.20.2.1, 50) = 0.33]

Unanonymized Object: 10.0.0.2
[Bar chart: (10.0.0.2, 25) = 0.5, (10.0.0.2, 21) = 0.5]

Unanon. Object | Similarity
10.0.0.1       | 2.0
10.0.0.2       | 1.66

SLIDE 40

Calculating Anonymity

Local IP, Local Port Feature

Anonymized Object: 50.20.2.1
[Bar chart: (50.20.2.1, 19) = 0.66, (50.20.2.1, 50) = 0.33]

Unanonymized Object: 10.0.0.100
[Bar chart: (10.0.0.100, 80) = 0.25, (10.0.0.100, 25) = 0.45, (10.0.0.100, 21) = 0.30]

Unanon. Object | Similarity
10.0.0.1       | 2.0
10.0.0.2       | 1.66
10.0.0.100     | 1.51

Normalized into a probability distribution over the candidates:

Unanon. Object | Probability
10.0.0.1       | 38.7%
10.0.0.2       | 32.1%
10.0.0.100     | 29.2%
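
Closing the loop on the worked example: normalizing the three similarity scores reproduces the final percentages, and (per Slide 29) the entropy of that distribution is the anonymity of 50.20.2.1 with respect to this feature:

    import math

    # Similarity scores from the worked example (Slides 38-40).
    sims = {"10.0.0.1": 2.0, "10.0.0.2": 1.66, "10.0.0.100": 1.51}

    total = sum(sims.values())
    probs = {host: s / total for host, s in sims.items()}
    print({h: f"{p:.1%}" for h, p in probs.items()})
    # {'10.0.0.1': '38.7%', '10.0.0.2': '32.1%', '10.0.0.100': '29.2%'}

    anonymity = -sum(p * math.log2(p) for p in probs.values())
    print(round(anonymity, 2))
    # ~1.57 bits, close to the log2(3) ~ 1.58 bits of a uniform 3-way guess,
    # so this feature alone barely distinguishes the three candidates.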