SLIDE 1

Bruno Ribeiro, Gerome Miklau, Don Towsley

UMass Amherst

Weifeng Chen

California University of Pennsylvania

Analyzing Privacy in Enterprise Packet Trace Anonymization

SLIDE 2

Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization

Motivation

[Figure: Internet, Enterprise (university), Monitor, Packets]

src address   dest address   src port   dest port   …
14.1.1.1      11.0.0.3       6738       80          …
18.0.0.1      11.0.0.1       2434       22          …
11.0.0.1      20.0.0.3       6913       80          …

Packet header traces
  • Used for networking research
  • Many public repositories (UMass, CAIDA, LBNL, …)

Raw trace may violate user privacy
  • If enterprise IP addresses can be tied to individuals

SLIDE 3

Motivation

Trace repositories anonymize IP addresses

Two most widely used schemes
  • Full prefix preservation (Xu et al., 2001)
  • Partial prefix preservation (Pang et al., 2006)

Original trace

src addr.    dest addr.    src port   dest port   …
14.1.1.1     11.0.0.3      6738       80          …
11.0.0.1     20.0.0.3      7913       22          …

(anonymization mapping)

Anonymized trace

src addr.    dest addr.    src port   dest port   …
200.0.1.2    128.0.64.2    6738       80          …
128.0.64.0   5.0.4.5       7913       22          …

SLIDE 4

Adversary

Adversarial model:

Goal: de-anonymize enterprise IP addresses in the trace

  • 1. Probes (scans) the enterprise network
  • 2. Collects similar information from the trace

De-anonymizes trace IPs by matching (1) with (2)

SLIDE 5

Outline

  • Our contributions
  • New attack on IP anonymization:
      • Attack overview
      • Defined as a tree edit distance problem
  • Worst-case analysis:
      • From a set of trace labels (information)
      • Assesses worst-case attack
  • Related work
  • Conclusions

SLIDE 6

Proposed attack overview

Adversary provides:

  • Labeled tree constructed from the anonymized trace
  • Labeled tree constructed from probing the enterprise
  • A cost (or distance) function (to deal with “mismatched” labels)

Our algorithm finds:

All de-anonymizations that comply with prefix preservation restrictions and have minimum total cost

An instance of the tree edit distance problem

SLIDE 7

Full prefix preserving anonymization

Full prefix preservation

If two real addresses share their first X bits, then their anonymized addresses also share their first X bits

This imposes restrictions on the real IP → anonymized IP mapping
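
As an illustration of the property, here is a minimal Python sketch of a prefix-preserving mapping. It is not the scheme of Xu et al.: it swaps in HMAC-SHA256 as the keyed pseudorandom function, and the names `anonymize`, `common_prefix_len`, and the key value are hypothetical. Each output bit is the input bit XORed with a keyed function of the bits before it, so two addresses that agree on their first X bits receive identical flips on those bits.

```python
import hashlib
import hmac
import ipaddress


def anonymize(addr: str, key: bytes) -> str:
    """Toy prefix-preserving mapping of one IPv4 address (illustrative only)."""
    bits = format(int(ipaddress.IPv4Address(addr)), "032b")
    out = []
    for i in range(32):
        # The flip for bit i depends only on the i bits that precede it.
        prefix = bits[:i].encode()
        flip = hmac.new(key, prefix, hashlib.sha256).digest()[0] & 1
        out.append(str(int(bits[i]) ^ flip))
    return str(ipaddress.IPv4Address(int("".join(out), 2)))


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits shared by two IPv4 addresses."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


key = b"hypothetical-secret-key"
x, y = anonymize("11.0.0.1", key), anonymize("11.0.0.3", key)
print(common_prefix_len("11.0.0.1", "11.0.0.3"), common_prefix_len(x, y))
```

Under these assumptions, 11.0.0.1 and 11.0.0.3 share a 30-bit prefix, and so do their anonymized counterparts, whatever the key.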

SLIDE 8

Labeled trees

[Figure: trace tree and probed tree, each with leaves 00, 01, 10, 11]

Trace IP leaf labels
  • Traffic on port 80
  • No traffic on port 80

Probed IP leaf labels
  • Web server
  • Not a Web server

Match sets:
  • 00 maps to {01}
  • 10 maps to {10, 11}

Match set:
  • 00 maps to {00, 01, 10, 11}
  • 10 maps to {00, 01, 10, 11}

SLIDE 9

Imperfect information

[Figure: trace tree and probed tree, each with leaves 00, 01, 10, 11, with the correct mapping marked. Trace IP leaf labels: Traffic on port 80, No traffic on port 80. Probed IP leaf labels: Web server, Not a Web server, Backup Web server.]

Other sources of imperfect labels: dynamic IP addresses, host shutdown, etc.

SLIDE 10

Mapping costs

Assign a cost to mapping two IPs with different labels
  • The cost is zero if the labels are equal

Mapping cost
  • Sum of all individual costs

Example:

[Figure: trace tree and probed tree with individual mapping costs marked (Cost = 0, Cost = 1, Cost = 1); Total cost = 1]
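
The mapping cost described on this slide can be sketched in a few lines of Python. The function names, the 0/1 cost, and the label values `"web"` and `"other"` are assumptions made for illustration; the slide only fixes that equal labels cost zero and that the total is a sum over individual costs.

```python
def naive_cost(trace_label, probed_label):
    """Zero when the two labels are equal, one otherwise (assumed 0/1 cost)."""
    return 0 if trace_label == probed_label else 1


def mapping_cost(mapping, trace_labels, probed_labels, cost=naive_cost):
    """Sum the individual costs over every (trace IP -> probed IP) pair."""
    return sum(cost(trace_labels[t], probed_labels[p]) for t, p in mapping.items())


# Hypothetical labels: "web" = traffic on port 80 / Web server, "other" = neither.
trace_labels = {"00": "web", "01": "other", "10": "other"}
probed_labels = {"00": "other", "01": "web", "10": "web"}
mapping = {"00": "01", "01": "00", "10": "10"}

print(mapping_cost(mapping, trace_labels, probed_labels))
```

In this made-up example two pairs have matching labels and one pair is mismatched, so the total cost is 1.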

SLIDE 11

Proposed attack

Finds all minimum cost mappings (over the whole network)

Because the anonymization is prefix-preserving
  • Every de-anonymization limits future de-anonymizations

And our algorithm is fast
  • 10 seconds (on this laptop) for all mappings of a network with 2^16 addresses

[Figure: probed tree and trace tree, each with leaves 00, 01, 10, 11; the mapping between them is marked “?”]
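
To make the idea concrete, the following Python sketch enumerates minimum cost prefix-preserving mappings on small trees. It is not the authors' algorithm, which these slides do not spell out. It assumes both trees are complete binary trees with a label for every address, and it relies on the fact that a full prefix-preserving mapping can only keep or swap the two subtrees below each node. The function name `min_cost_mappings` and the labels are hypothetical, and the running time is quadratic in the number of addresses, so it is meant for toy inputs only.

```python
from functools import lru_cache


def min_cost_mappings(trace, probed, cost):
    """Return (minimum total cost, match set per trace leaf).

    trace[i] / probed[i] hold the label of address i; len must be a power of 2.
    """
    n = len(trace)
    assert n == len(probed) and n > 0 and n & (n - 1) == 0

    @lru_cache(maxsize=None)
    def best(t_lo, p_lo, size):
        # Cheapest way to map trace subtree [t_lo, t_lo+size) onto the
        # probed subtree [p_lo, p_lo+size).
        if size == 1:
            return cost(trace[t_lo], probed[p_lo])
        h = size // 2
        keep = best(t_lo, p_lo, h) + best(t_lo + h, p_lo + h, h)
        swap = best(t_lo, p_lo + h, h) + best(t_lo + h, p_lo, h)
        return min(keep, swap)

    match = {i: set() for i in range(n)}

    def collect(t_lo, p_lo, size):
        # Follow every choice (keep or swap) that attains the minimum.
        if size == 1:
            match[t_lo].add(p_lo)
            return
        h = size // 2
        target = best(t_lo, p_lo, size)
        if best(t_lo, p_lo, h) + best(t_lo + h, p_lo + h, h) == target:
            collect(t_lo, p_lo, h)
            collect(t_lo + h, p_lo + h, h)
        if best(t_lo, p_lo + h, h) + best(t_lo + h, p_lo, h) == target:
            collect(t_lo, p_lo + h, h)
            collect(t_lo + h, p_lo, h)

    collect(0, 0, n)
    return best(0, 0, n), match


# Hypothetical 2-bit example: index 0..3 stands for addresses 00, 01, 10, 11.
trace = ["web", "other", "other", "other"]
probed = ["other", "web", "other", "other"]
total, match = min_cost_mappings(trace, probed, lambda a, b: 0 if a == b else 1)
print(total, match)
```

With these made-up labels the minimum cost is 0, trace address 00 can only map to 01, and trace address 10 can map to either 10 or 11.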

SLIDE 12

Experiment

Network: class B (64K addresses)

Labels
  • “Active host”
  • Active ports: FTP, SSH, Telnet, E-mail, Time, DNS, Web, POP3, SOCKS

Trace IP labels
  • “Active host” label: recorded any outgoing traffic
  • “Active ports”: recorded traffic from ports 80, 22, …

Probed IP labels
  • Probed over the whole network
  • “Active host” label: PING
  • “Active ports”: TCP SYN ACK reply from ports 80, 22, …

Naïve cost function: zero if labels are equal, one otherwise
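
The trace-side labels could be derived roughly as follows. This is a sketch under stated assumptions rather than the procedure used in the experiment: the port numbers are the standard assignments for the services named above (with “Time” taken to mean port 37), the enterprise network `11.0.0.0/16` and the function name `build_trace_labels` are hypothetical, and records are assumed to be `(src, dst, src port, dst port)` tuples.

```python
import ipaddress

# Assumed ports for FTP, SSH, Telnet, E-mail (SMTP), Time, DNS, Web, POP3, SOCKS.
SERVICE_PORTS = frozenset({21, 22, 23, 25, 37, 53, 80, 110, 1080})


def build_trace_labels(records, enterprise="11.0.0.0/16"):
    """Map each active enterprise IP to the set of service ports it sent from.

    An IP that appears as a key is an "active host" (it has outgoing traffic).
    """
    net = ipaddress.ip_network(enterprise)
    ports_by_ip = {}
    for src, dst, sport, dport in records:
        if ipaddress.ip_address(src) in net:       # outgoing traffic only
            ports = ports_by_ip.setdefault(src, set())
            if sport in SERVICE_PORTS:             # traffic *from* a service port
                ports.add(sport)
    return {ip: frozenset(p) for ip, p in ports_by_ip.items()}


records = [
    ("14.1.1.1", "11.0.0.3", 6738, 80),
    ("18.0.0.1", "11.0.0.1", 2434, 22),
    ("11.0.0.1", "20.0.0.3", 6913, 80),
    ("11.0.0.3", "14.1.1.1", 80, 6738),   # hypothetical reply from a Web server
]
print(build_trace_labels(records))
```

Probed labels would be built the same way from PING and TCP SYN results, so that the two label sets can be compared directly by the cost function.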

SLIDE 13

Experiment results

Trace collected: June 18th, 2007 (9097 active IPs)
Network probed: June 18th, 2007

[Figure: cumulative fraction of hosts in the trace (0% to 60%) versus size of matching set (1 to 8), with curves for correct matches and incorrect matches; annotations: “Uniquely re-identified”, “BAD” (data publisher’s view)]
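
A curve of this kind can be computed from the matching sets with a short helper. The sketch below is only an illustration: the function name `cumulative_fraction` and the sample data are hypothetical, and it does not separate correct from incorrect matches as the plot does.

```python
from collections import Counter


def cumulative_fraction(match_sets, max_size=8):
    """Fraction of trace hosts whose matching set has size <= k, for k = 1..max_size."""
    total = len(match_sets)
    if total == 0:
        return [0.0] * max_size
    sizes = Counter(len(s) for s in match_sets.values())
    curve, running = [], 0
    for k in range(1, max_size + 1):
        running += sizes.get(k, 0)
        curve.append(running / total)
    return curve


# Hypothetical matching sets for four trace hosts.
match_sets = {"a": {1}, "b": {1, 2}, "c": {3, 4}, "d": set(range(10))}
print(cumulative_fraction(match_sets, max_size=3))
```

The first entry of the returned list is the fraction of uniquely re-identified hosts (matching set of size 1).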

SLIDE 14

Worst-case analysis

Given a labeled trace tree
  • Find the best de-anonymization

We provide an algorithm that
  • Obtains the worst-case attack matching set size
      • For each IP address in the trace
      • For any label mismatch cost function
      • For any labeled probed tree

SLIDE 15

Worst-case experiment

Full prefix preservation, June 18th experiment

[Figure: cumulative fraction of hosts in the trace (0% to 100%) versus size of matching set (1 to 8), with curves for the naïve attack and the worst-case attack; annotations: “Uniquely re-identified”, “BAD” (data publisher’s view)]

SLIDE 16

Partial prefix preservation

  • Does not retain part of the address structure
  • Used in Pang et al., 2006
  • Solution also formulated as an instance of the tree edit distance problem

[Figure: anonymization mapping between the probed tree and the anonymized tree; below each root the tree has two levels of 8 bits each, with up to 256 addresses]
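
One way to picture such a two-level scheme is the Python sketch below. It is an assumption-laden illustration, not the scheme of Pang et al.: it treats the third octet of a class B address as the subnet and the fourth as the host, permutes the 256 subnet values at random, and uses an independent random permutation of the 256 host values inside each subnet. The name `make_partial_mapping` and the seed are hypothetical.

```python
import random


def make_partial_mapping(seed=0):
    """Build a toy (subnet, host) -> (subnet, host) anonymization function."""
    rng = random.Random(seed)
    subnet_perm = list(range(256))
    rng.shuffle(subnet_perm)
    host_perms = []
    for _ in range(256):                 # one independent permutation per subnet
        perm = list(range(256))
        rng.shuffle(perm)
        host_perms.append(perm)

    def anonymize(subnet, host):
        # Hosts in the same subnet stay together, but bit prefixes inside
        # each 8-bit block are not preserved.
        return subnet_perm[subnet], host_perms[subnet][host]

    return anonymize


anonymize = make_partial_mapping(seed=1)
a, b, c = anonymize(5, 1), anonymize(5, 3), anonymize(6, 1)
print(a, b, c)
```

In this sketch two hosts of the same subnet always land in the same anonymized subnet, while nothing finer than the 8-bit boundary is kept, which is the sense in which part of the address structure is not retained.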

SLIDE 17

Partial vs. Full prefix preservation

Intuition: Partial is much safer than full prefix preservation

[Figure: cumulative fraction of hosts in the trace (0% to 100%) versus size of matching set (1 to 8), with curves for the worst case under full prefix preservation and the worst case under partial prefix preservation; annotation: “BAD” (data publisher’s view)]

SLIDE 18

Worst-case analysis (II)

Uniquely re-identified
  • Full prefix preservation: 2713 active IP addresses in the trace
  • Partial prefix preservation: 113 active IP addresses in the trace

Partial prefix preservation is safer but not completely safe

SLIDE 19

Related work

  • “Playing Devil's Advocate: Inferring Sensitive Information from Anonymized Network Traces”, Scott Coull, Charles Wright, Fabian Monrose, Michael Collins and Michael Reiter, NDSS 2007
      • An attack on partial prefix preservation
  • “Taming the Devil: Techniques for Evaluating Anonymized Network Data”, Scott Coull, Charles Wright, Fabian Monrose, Angelos Keromytis and Michael Reiter, NDSS 2008
      • Comes right after this talk ☺

SLIDE 20

Conclusions

Attack
  • Includes global mapping restrictions
  • An instance of the tree edit distance problem
  • Indicates that full prefix preservation has flaws
  • Impact of late probing on the de-anonymization

Worst-case analysis
  • Can help future anonymization schemes
  • A tool for data publishers

Experiments indicate that:
  • Partial is much safer than full prefix preservation
  • But still not completely safe

SLIDE 21

Thanks

  • Jim Kurose, UMass Amherst
  • Edmundo de Souza e Silva, Federal University of Rio de Janeiro
  • Kyoungwon Suh, Illinois State University
  • Anonymous NDSS’08 reviewers
  • Niels Provos, Google Inc.