You cannot hide for long: De-anonymization of real-world dynamic behaviour




slide-1
SLIDE 1

You cannot hide for long: De-anonymization of real-world dynamic behaviour

George Danezis (University College London) Carmela Troncoso (Gradiant)

slide-2
SLIDE 2

Privacy beyond confidentiality

  • Common belief: “if I encrypt my data, then the data is private”
  • Encryption works and gets more and more efficient!
  • But it does not hide all data:
  • Origin and destination
  • Timing
  • Frequency
  • Location
slide-3
SLIDE 3

Anonymization

  • Decouple user identity from actions
  • Enabler for privacy-preserving technologies
  • Anonymous credentials
  • eVoting
  • Privacy-preserving statistics computation

[Figure: anonymizer diagram]

slide-4
SLIDE 4

Anonymity in reality

  • Difficult to guarantee perfect anonymity due to constraints
  • Observations allow for inferences (e.g., behavioural profiles)

[Figure: anonymizer behaviour and prior info on users lead to the inference “Alice is speaking to Bob with probability X”]

State of the art limitation: static behavior

slide-5
SLIDE 5

A model for dynamic behaviour

  • Users
  • Anonymizer
  • Divided in batches (n batches per epoch)
  • Perfect anonymity

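[Figure: Alice (A), Bob (B) and the other users (O) exchange messages through the anonymizer in each epoch t]

  • Alice sends messages to Bob at rate λAB
  • Alice sends messages to others at rate λAO
  • Others send messages to Bob at rate λOB
  • Others send messages to others at rate λOO
  • Dynamism: epochs t of stationary behaviour
  • Profile evolution probability

$$
\begin{aligned}
V_A &\leftarrow \mathrm{Pois}(\lambda_{AB} + \lambda_{AO}) \\
V_O &\leftarrow \mathrm{Pois}(\lambda_{OB} + \lambda_{OO}) \\
V_B &\leftarrow \mathrm{Pois}(\lambda_{AB} + \lambda_{OB}) \\
V_{O'} &\leftarrow \mathrm{Pois}(\lambda_{AO} + \lambda_{OO})
\end{aligned}
$$

Visible: the volumes VA, VO, VB, VO’. Hidden: the rates λAB, λAO, λOB, λOO.

Given the observation… what is λAB?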
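To make the model concrete, here is a minimal simulation sketch. It assumes the four flows (Alice to Bob, Alice to others, others to Bob, others to others) are independent Poisson counts within an epoch, which is how the four volume equations read. The function names, the dictionary keys and the example rates are illustrative and do not come from the slides.

```python
import math
import random

def sample_poisson(rate, rng):
    """Draw one Poisson(rate) sample with Knuth's method (adequate for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_epoch(rates, rng):
    """Sample the hidden flows, then return only the volumes an observer sees."""
    flows = {name: sample_poisson(rate, rng) for name, rate in rates.items()}
    return {
        "V_A": flows["AB"] + flows["AO"],        # sent by Alice
        "V_O": flows["OB"] + flows["OO"],        # sent by the others
        "V_B": flows["AB"] + flows["OB"],        # received by Bob
        "V_O_prime": flows["AO"] + flows["OO"],  # received by the others
    }

# Hypothetical rates, chosen only for illustration.
rates = {"AB": 2.0, "AO": 3.0, "OB": 1.0, "OO": 4.0}
print(simulate_epoch(rates, random.Random(1)))
```

The observer sees the four volumes but never the individual flows, which is why λAB has to be inferred rather than read off.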

slide-6
SLIDE 6

Sequential Monte Carlo aka. Particle Filters

  • Inferring hidden parameters of sequential models
  • Our case: modeling λAB at t depends on λAB at t-1
  • Core idea:
  • Particles representing sample hidden states (λAB, λOB)
  • Distributed following the posterior distribution given the evidence (VX)
  • From Bayes' theorem:

$$
\Pr[(\lambda_{AB}^{t}, \lambda_{OB}^{t}) \mid V^{*}] \propto L(V^{*} \mid \lambda^{*}) \cdot E(\lambda_{AB}^{t} \mid \lambda_{AB}^{t-1}) \cdot \Pr[(\lambda_{AB}^{t-1}, \lambda_{OB}^{t-1})]
$$

Left-hand side: probability at epoch t. L: likelihood of the observation given the hidden state. E: probability of evolving to the current λAB. Last factor: prior (epoch t-1).

Particles allow for the computation of statistics (mean, std, …) of the hidden variables.

slide-7
SLIDE 7

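Toy example

[Figure: particles λt-1 at epoch t-1 and particles λt at epoch t, with the observed volumes VA, VO, VB, VO’]

  • 1. Propose new particles
  • 2. Likelihood given the observation and the previous state. Weight particles by:
  • i. Likelihood
  • ii. Evolution
  • iii. Proposal
  • 3. Re-sample

After re-sampling, the particles are distributed according to the posterior:

$$
\Pr[(\lambda_{AB}^{t}, \lambda_{OB}^{t}) \mid V^{*}]
$$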

slide-8
SLIDE 8

In pseudocode

[Pseudocode listing; annotations in order:]

  • Take observations in all epochs
  • Initialize particles
  • Propose current state given the observation
  • Likelihood of the observation given the current and previous state
  • Reweighting of proposal likelihood given proposal distributions
  • Resampling to obtain new particles according to the posterior
  • All types of samples
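The steps annotated on this slide can be sketched as a small particle filter. This is a simplified illustration, not the authors' algorithm: it tracks a single hidden rate with one Poisson observation per epoch, uses a log-normal random walk as a stand-in for the evolution model, and proposes by perturbing the previous particle instead of conditioning on the observation. All names and parameter values (`sigma_evolve`, `sigma_propose`, the initial range) are assumptions.

```python
import math
import random

def log_poisson(count, rate):
    """Log-probability of observing `count` messages under Poisson(rate)."""
    return count * math.log(rate) - rate - math.lgamma(count + 1)

def log_normal_pdf(x, sigma):
    """Log-density of a zero-mean Gaussian with standard deviation sigma."""
    return -0.5 * (x / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)

def particle_filter(observations, n_particles=2000,
                    sigma_evolve=0.2, sigma_propose=0.5, seed=0):
    rng = random.Random(seed)
    # Initialize particles: candidate rates spread over an arbitrary broad range.
    particles = [rng.uniform(0.1, 20.0) for _ in range(n_particles)]
    posterior_means = []
    for count in observations:  # one observed volume per epoch
        proposed, log_weights = [], []
        for previous in particles:
            # 1. Propose a new rate by perturbing the previous one in log space.
            step = rng.gauss(0.0, sigma_propose)
            current = previous * math.exp(step)
            # 2. Weight = likelihood * evolution / proposal (in log space).
            log_weights.append(log_poisson(count, current)
                               + log_normal_pdf(step, sigma_evolve)
                               - log_normal_pdf(step, sigma_propose))
            proposed.append(current)
        top = max(log_weights)  # subtract the maximum for numerical stability
        weights = [math.exp(lw - top) for lw in log_weights]
        # 3. Re-sample so that the new particles follow the posterior.
        particles = rng.choices(proposed, weights=weights, k=n_particles)
        posterior_means.append(sum(particles) / n_particles)
    return particles, posterior_means

particles, means = particle_filter([5, 6, 4, 5, 7, 5, 4, 6, 5, 5])
print(round(means[-1], 2))
```

Because the surviving particles approximate the posterior, any statistic of the hidden rate (mean, standard deviation, quantiles) can be computed directly from them.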

slide-9
SLIDE 9
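
The likelihood function

  • How likely is an observation V* given sending rates λ*?

$$
\begin{aligned}
V_A &\leftarrow \mathrm{Pois}(\lambda_{AB} + \lambda_{AO}) \\
V_O &\leftarrow \mathrm{Pois}(\lambda_{OB} + \lambda_{OO}) \\
V_B &\leftarrow \mathrm{Pois}(\lambda_{AB} + \lambda_{OB}) \\
V_{O'} &\leftarrow \mathrm{Pois}(\lambda_{AO} + \lambda_{OO})
\end{aligned}
$$

[Figure: visible volumes VA, VO, VB, VO’ and hidden sending rates in epoch t]

$$
L(V^{*} \mid \lambda^{*})
$$

  • Probability of the total volume in the epoch given λ* (just Poisson)
  • Probability of each of the rounds (Binomial): pab is just the probability that A sent to B, pab = λAB / (λAB + λOB)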
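The split into a Poisson term for the total volume and a binomial term with pab rests on a standard identity: if the Alice-to-Bob and others-to-Bob flows are independent Poisson counts, their joint probability equals the Poisson probability of Bob's total times a binomial split with pab = λAB / (λAB + λOB). The sketch below checks that identity numerically. It is an illustration under that independence assumption, not the exact likelihood used in the talk, and the function names and example numbers are hypothetical. In the real attack the number of messages from Alice is hidden; it appears here as an input only to show the decomposition.

```python
import math

def poisson_pmf(count, rate):
    """Probability of `count` events under Poisson(rate)."""
    return math.exp(count * math.log(rate) - rate - math.lgamma(count + 1))

def binomial_pmf(successes, trials, p):
    """Probability of `successes` out of `trials` with success probability p."""
    return math.comb(trials, successes) * p ** successes * (1 - p) ** (trials - successes)

def bob_likelihood(from_alice, received, rate_ab, rate_ob):
    """Poisson term for Bob's total volume times binomial term for Alice's share."""
    p_ab = rate_ab / (rate_ab + rate_ob)
    volume_term = poisson_pmf(received, rate_ab + rate_ob)
    split_term = binomial_pmf(from_alice, received, p_ab)
    return volume_term * split_term

# Direct product of the two independent Poisson flows, for comparison.
direct = poisson_pmf(2, 1.5) * poisson_pmf(5, 4.0)
factored = bob_likelihood(from_alice=2, received=7, rate_ab=1.5, rate_ob=4.0)
print(abs(direct - factored) < 1e-12)
```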

slide-10
SLIDE 10

The profile evolution probability

$$
E(\lambda_{AB}^{t} \mid \lambda_{AB}^{t-1})
$$

  • Probability of λAB at t given λAB at t-1
  • Two stages:
  • 1) Probability of transitions silent-communication
  • 2) Probability of the given difference: mixture with heavy tails

slide-11
SLIDE 11

Evaluation

  • Three datasets:
  • eMail: Enron dataset, ~0.5M emails, 150 users
  • Mailing list: Indymedia, ~300K posts from 28237 senders to 693 lists
  • Location: Gowalla dataset, ~6.5M check-ins from ~200K users
  • Parameters empirically inferred using EM
  • Two sets:
  • Communication
  • Silent
  • Anonymity system
  • 1 day delay (anonymity vs delay trade-off given 1 week epochs)
  • Thresholds: eMail/Mailing ~100, Location ~15K

[Figure: inferred parameters: prior silent; transitions (stop talking, stay silent); mixture evolution]
slide-12
SLIDE 12

Evaluation - an example trace (Avg(Batch) = 244)

  • State of the art: Statistical Disclosure Attack
  • Background traffic:
  • Use background to estimate volume in her rounds
  • Assumes static behaviour: short and long term

[Figure: example trace]
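For orientation, here is a sketch of the classic Statistical Disclosure Attack estimate in its textbook threshold-mix form. It assumes Alice sends exactly one message in each of her rounds and that the background distribution is estimated from rounds without her. The variant evaluated in the talk may differ, and the function name and toy data are hypothetical.

```python
def sda_estimate(alice_rounds, background_rounds, batch_size):
    """Estimate Alice's recipient distribution as b * O - (b - 1) * U."""
    n_recipients = len(alice_rounds[0])

    def mean_distribution(rounds):
        # Share of all observed messages that went to each recipient.
        totals = [sum(r[i] for r in rounds) for i in range(n_recipients)]
        grand_total = sum(totals)
        return [t / grand_total for t in totals]

    observed = mean_distribution(alice_rounds)         # O: rounds where Alice sends
    background = mean_distribution(background_rounds)  # U: rounds without Alice
    return [batch_size * o - (batch_size - 1) * u
            for o, u in zip(observed, background)]

# Toy data: 3 recipients, batches of 4, Alice always writes to recipient 0.
alice_rounds = [[1, 2, 1], [1, 1, 2]]
background_rounds = [[0, 2, 2], [0, 2, 2]]
print(sda_estimate(alice_rounds, background_rounds, batch_size=4))
```

The estimate averages over all rounds as if Alice's recipient distribution never changed, which is the static-behaviour assumption the slide points out.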

slide-13
SLIDE 13

Evaluation – estimation accuracy as Squared error

[Figure: squared error by epoch and trace; annotations as extracted: MSEComm = 12 84 13 3.7K 20 83K 0.7 2.8 0.8 1.2 2.3K 36 MSESilent]

slide-14
SLIDE 14

Evaluation – communication detection

  • Are Alice and Bob communicating?
  • Base rate fallacy!
  • Use particles distribution
  • Use rate directly

slide-15
SLIDE 15

Conclusions

  • Structured model for traffic analysis based on known Bayesian inference techniques
  • easy to extend
  • allows assessment of inference quality
  • avoids base rate fallacy
  • Attacks on real world traces
  • can be effective for rather low action rates
  • can be effective over a much shorter period of time than previously thought
  • can be effective for secure configurations of the anonymity system
  • Rethink current evaluations and figures of merit
slide-16
SLIDE 16

Thanks!! ctroncoso@gradiant.org g.danezis@ucl.ac.uk