Bayesian Inference and Traffic Analysis
Carmela Troncoso
George Danezis
September-November 2008, Microsoft Research Cambridge / KU Leuven (COSIC)
Anonymous Communications
“Tell me who your friends are...” => anonymous communications hide who the communication partners are.
High-latency systems (e.g. anonymous remailers) use mixes [Chaum 81], which hide the relationship between their input and output messages.
[Figure: messages routed through a network of mixes]
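To make "hide the input/output relationship" concrete, here is a toy threshold mix (the threshold strategy appears later in the slides). It is a minimal sketch under stated assumptions: the class name `ThresholdMix` and its methods are hypothetical, and a real mix also cryptographically transforms each message so outputs cannot be matched to inputs by content, which this sketch omits.

```python
import random


class ThresholdMix:
    """Toy threshold mix: buffers incoming messages and, once `threshold`
    of them have arrived, flushes the whole batch in a random order."""

    def __init__(self, threshold, seed=None):
        self.threshold = threshold
        self.pool = []
        self.rng = random.Random(seed)

    def receive(self, message):
        """Add a message; return the flushed batch, or [] while waiting."""
        self.pool.append(message)
        if len(self.pool) < self.threshold:
            return []
        batch, self.pool = self.pool, []
        # Random output order hides which input became which output.
        self.rng.shuffle(batch)
        return batch


mix = ThresholdMix(threshold=3, seed=7)
out = []
for msg in ["from A", "from B", "from C"]:
    out.extend(mix.receive(msg))
print(out)  # the three messages, in a random order
```

Nothing leaves the mix until the third message arrives, so an observer only learns that each output came from one of the three inputs.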
Attacks on mix networks
Restricted routes [Dan03]
Bridging and fingerprinting [DanSyv08]
Social information: Disclosure Attack [Kes03], Statistical Disclosure Attack [Dan03], Perfect Matching Disclosure Attacks [Tron08]
These attacks rely on heuristics and attack-specific models.
Determine the input-output probability distributions
[Figure: senders A, B, C; MIX 1, MIX 2, MIX 3 (threshold = 3); receivers Q, R, S. Messages are labelled with probability distributions over the senders (A, B, C), such as (3/8, 3/8, 1/4) and (1/4, 1/4, 1/2). A second version adds constraints, e.g. path length = 2, which change these distributions.]
Non-trivial given the observation!
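For comparison, the sketch below propagates sender distributions hop by hop under a naive assumption: a mix makes each output equally likely to be any of its inputs, so an output's distribution is the average of its inputs' distributions. The topology (mixes with two inputs each) and the helper names are hypothetical. It happens to reproduce the vectors (1/4, 1/4, 1/2) and (3/8, 3/8, 1/4), but it is not claimed to be the network in the figure. This propagation ignores constraints such as path length, which is exactly where it stops being valid.

```python
from fractions import Fraction

SENDERS = ("A", "B", "C")


def fresh(sender):
    """Distribution of a message that has just left `sender`."""
    return tuple(Fraction(int(s == sender)) for s in SENDERS)


def mix(inputs):
    """Naive model: each output is equally likely to be any input, so its
    sender distribution is the average of the input distributions."""
    n = len(inputs)
    return tuple(sum(d[i] for d in inputs) / n for i in range(len(SENDERS)))


# Hypothetical topology: M1 <- {A, B}; M2 <- {C, one M1 output};
# M3 <- {the other M1 output, one M2 output}.
m1 = mix([fresh("A"), fresh("B")])
m2 = mix([fresh("C"), m1])
m3 = mix([m1, m2])
print([str(p) for p in m2])  # ['1/4', '1/4', '1/2']
print([str(p) for p in m3])  # ['3/8', '3/8', '1/4']
```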
How can these probabilities be computed systematically?
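Find the “hidden state” $HS$
[Figure: senders A, B, C; mixes M1, M2, M3; receivers Q, R, S]
$$\Pr(HS \mid O, C) = \frac{\Pr(O \mid HS, C)\,\Pr(HS \mid C)}{\sum_{HS} \Pr(O, HS \mid C)}$$
Here $O$ is the observation, $C$ the constraints, and $\Pr(HS \mid C)$ the prior information. The sum in the denominator runs over all hidden states: too many to enumerate!
$$\Pr(O \mid HS, C) = K \quad \text{(a constant)}$$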
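A quick count shows why the denominator cannot be enumerated. This sketch assumes, for illustration only, that every flush of a threshold mix independently matches its $t$ inputs to its $t$ outputs, so $R$ flushes give at most $(t!)^R$ hidden states. Constraints rule some of them out, so this is an upper bound. The function name and the values $t = 20$, $R = 50$ are hypothetical.

```python
from math import factorial


def num_hidden_states(threshold, flushes):
    """Upper bound on the number of hidden states when `flushes` batches
    pass through threshold mixes: each batch of `threshold` messages can be
    matched to its outputs in threshold! ways (constraints rule some out)."""
    return factorial(threshold) ** flushes


print(num_hidden_states(3, 3))              # 216
print(len(str(num_hidden_states(20, 50))))  # decimal digits of (20!)**50
```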
“Hidden state” + observation = paths
[Figure: P1: A → M1 → M2 → M3 → R; P2: B → M1 → M3 → Q; P3: C → M2 → S]
$$\Pr(HS \mid O, C) \propto \Pr(O \mid HS, C)\,\Pr(\text{Paths} \mid C) = K \cdot \Pr(\text{Paths} \mid C)$$
Actually, we want marginal probabilities, but we cannot obtain them directly:
$$\Pr(A \to Q \mid O, C) = \sum_{HS_j} I_{A \to Q}(HS_j)\,\Pr(HS_j \mid O, C)$$
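On a toy instance small enough to enumerate, the marginal can be computed exactly from Bayes' rule. This is a minimal sketch under assumptions not in the slides. It uses a single threshold-3 mix, so each hidden state is one matching of senders A, B, C to receivers Q, R, S, and all matchings are consistent with the observation. It also uses made-up sending profiles as the prior $\Pr(HS \mid C)$.

```python
from itertools import permutations

SENDERS = ("A", "B", "C")
RECEIVERS = ("Q", "R", "S")
# Hypothetical sending profiles Pr(Sen -> Rec), used as the prior Pr(HS | C).
PROFILE = {
    "A": {"Q": 0.6, "R": 0.3, "S": 0.1},
    "B": {"Q": 0.2, "R": 0.5, "S": 0.3},
    "C": {"Q": 0.2, "R": 0.2, "S": 0.6},
}


def prior(hs):
    """Pr(HS | C) up to a constant: product of each sender's profile entry."""
    p = 1.0
    for sender, receiver in zip(SENDERS, hs):
        p *= PROFILE[sender][receiver]
    return p


# One threshold-3 mix: every matching of senders to receivers is a hidden
# state consistent with the observation O, so Pr(O | HS, C) = K for all.
states = list(permutations(RECEIVERS))
norm = sum(prior(hs) for hs in states)  # the denominator: feasible only because this is tiny
posterior = {hs: prior(hs) / norm for hs in states}

# Marginal: Pr(A -> Q | O, C) = sum_j I_{A->Q}(HS_j) * Pr(HS_j | O, C)
p_aq = sum(p for hs, p in posterior.items() if hs[0] == "Q")
print(round(p_aq, 4))  # 0.7606
```

Here the sum has only 3! = 6 terms. For a realistic network it is astronomically large, which motivates sampling instead.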
If we obtain samples $HS_1, HS_2, HS_3, HS_4, \ldots, HS_j$, each one tells us whether $A \to Q$ holds (e.g. 0, 1, 0, 1, …, 1).
What does $\Pr(\text{Paths} \mid C)$ look like?
Markov Chain Monte Carlo methods: the Metropolis-Hastings algorithm
$$\Pr(A \to Q \mid O, C) = \sum_{HS_j} I_{A \to Q}(HS_j)\,\Pr(HS_j \mid O, C)$$
$$\Pr(HS \mid O, C) \propto \Pr(\text{Paths} \mid C), \qquad HS_j \sim \Pr(HS \mid O, C)$$
Length restrictions, with any distribution, e.g. uniform over $L_{\min}, \ldots, L_{\max}$:
$$\Pr(L = l \mid C) = \frac{1}{L_{\max} - L_{\min} + 1}$$
Node choice restrictions: choose $l$ out of the $N_{mix}$ mix nodes available (without repetition), from an allowed set:
$$\Pr(M_x \mid L = l, C) = \frac{1}{P(N_{mix}, l)}, \qquad P(N_{mix}, l) = \frac{N_{mix}!}{(N_{mix} - l)!}$$
$$\Pr(P_x \mid C) = \Pr(L = l \mid C)\,\Pr(M_x \mid L = l, C)\,I_{set}(M_x)$$
Users decide independently, so
$$\Pr(\text{Paths} \mid C) = \prod_x \Pr(P_x \mid C)$$
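The path prior above translates almost line by line into code. This is a minimal sketch assuming the uniform length distribution and an ordered choice of distinct mixes. The function names, the allowed set, and the constraint values are hypothetical. The example paths are the mix sequences of P1, P2, P3 from the figure.

```python
from math import perm, prod


def pr_length(l, l_min, l_max):
    """Pr(L = l | C): uniform over the integers l_min..l_max."""
    return 1 / (l_max - l_min + 1) if l_min <= l <= l_max else 0.0


def pr_nodes(l, n_mix):
    """Pr(M_x | L = l, C): an ordered choice of l distinct mixes out of n_mix."""
    return 1 / perm(n_mix, l)


def pr_path(mixes, n_mix, l_min, l_max, allowed):
    """Pr(P_x | C) = Pr(L = l | C) * Pr(M_x | L = l, C) * I_set(M_x)."""
    l = len(mixes)
    i_set = set(mixes) <= allowed and len(set(mixes)) == l  # allowed, no repeats
    return pr_length(l, l_min, l_max) * pr_nodes(l, n_mix) if i_set else 0.0


def pr_paths(paths, **constraints):
    """Users decide independently, so Pr(Paths | C) is a product over paths."""
    return prod(pr_path(p, **constraints) for p in paths)


# Mix sequences of P1, P2, P3 from the figure; the constraint values are made up.
paths = [["M1", "M2", "M3"], ["M1", "M3"], ["M2"]]
print(pr_paths(paths, n_mix=3, l_min=1, l_max=3, allowed={"M1", "M2", "M3"}))
```

With these values each path contributes 1/18, 1/18 and 1/9, so the product is 1/2916 (about 3.43e-4).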
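Unknown destinations: if only the first $l_x$ hops of path $x$ are observed, sum over all the lengths it may have:
$$\Pr(P_x \mid C) = \sum_{l = l_x}^{L_{\max}} \Pr(L = l \mid C)\,\Pr(M_x \mid L = l, C)\,I_{set}(M_x)$$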
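A sketch of the unknown-destination sum, assuming $M_x$ is the observed prefix of $l_x$ distinct mixes and lengths are uniform as before. Under those assumptions each full length-$l$ path has probability $1/P(N_{mix}, l)$, and there are $P(N_{mix} - l_x, l - l_x)$ completions of the prefix, so $\Pr(M_x \mid L = l, C) = 1/P(N_{mix}, l_x)$ for every $l \ge l_x$. Function names and values are hypothetical.

```python
from math import perm


def pr_length(l, l_min, l_max):
    """Pr(L = l | C): uniform over the integers l_min..l_max."""
    return 1 / (l_max - l_min + 1) if l_min <= l <= l_max else 0.0


def pr_prefix(l_x, l, n_mix):
    """Pr(M_x | L = l, C) when M_x is only the first l_x of l distinct mixes:
    P(n - l_x, l - l_x) completions, each with probability 1 / P(n, l)."""
    if l > n_mix:
        return 0.0  # cannot pick more distinct mixes than exist
    return perm(n_mix - l_x, l - l_x) / perm(n_mix, l)


def pr_path_unknown_dest(l_x, n_mix, l_min, l_max, in_set=True):
    """Pr(P_x | C) = sum_{l=l_x}^{L_max} Pr(L = l | C) Pr(M_x | L = l, C) I_set(M_x)."""
    if not in_set:
        return 0.0
    return sum(pr_length(l, l_min, l_max) * pr_prefix(l_x, l, n_mix)
               for l in range(l_x, l_max + 1))


# A message seen at l_x = 2 mixes whose destination is unknown (values made up).
print(pr_path_unknown_dest(l_x=2, n_mix=3, l_min=1, l_max=3))
```

Both possible lengths (2 and 3) contribute 1/18 here, giving 1/9.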
Bridging: known nodes, $I_{bridging}(M_x)$
Non-compliant clients (with probability $p_{cp}$):
Do not respect the length restrictions, using their own range $(L^{cp}_{\min}, L^{cp}_{\max})$
Choose $l$ out of the $N_{mix}$ mix nodes available, allowing repetitions:
$$\Pr(M_x \mid L = l, C, I_{cp}(P_x)) = \frac{1}{P_r(N_{mix}, l)}, \qquad P_r(N_{mix}, l) = N_{mix}^{\,l}$$
$$\Pr(\text{Paths} \mid C) = \prod_{i \in P_{cp}} p_{cp}\,\Pr(P_i \mid C, I_{cp}(P_i)) \prod_{j \notin P_{cp}} (1 - p_{cp})\,\Pr(P_j \mid C)$$
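The mixture over compliant and non-compliant paths can be sketched as below. It assumes each path carries a hidden flag $I_{cp}$, as part of the hidden state. It also assumes, beyond what the slides state, that non-compliant lengths are uniform over their own range. The bridging indicator is omitted for brevity, and all names and values are hypothetical.

```python
from math import perm, prod


def pr_compliant(mixes, n_mix, l_min, l_max):
    """Pr(P_x | C): uniform length in l_min..l_max, l distinct mixes."""
    l = len(mixes)
    if not (l_min <= l <= l_max) or len(set(mixes)) != l or l > n_mix:
        return 0.0
    return 1 / (l_max - l_min + 1) / perm(n_mix, l)


def pr_noncompliant(mixes, n_mix, l_min_cp, l_max_cp):
    """Pr(P_x | C, I_cp): own length range; mixes drawn with repetition,
    so there are n_mix ** l equally likely sequences."""
    l = len(mixes)
    if not (l_min_cp <= l <= l_max_cp):
        return 0.0
    return 1 / (l_max_cp - l_min_cp + 1) / n_mix ** l


def pr_paths(paths, cp_flags, p_cp, n_mix, l_min, l_max, l_min_cp, l_max_cp):
    """Pr(Paths | C) with a hidden compliance flag I_cp per path."""
    return prod(
        p_cp * pr_noncompliant(m, n_mix, l_min_cp, l_max_cp) if cp
        else (1 - p_cp) * pr_compliant(m, n_mix, l_min, l_max)
        for m, cp in zip(paths, cp_flags)
    )


# A non-compliant path reusing M1, and a compliant length-2 path (made-up values).
print(pr_paths([["M1", "M1"], ["M2", "M3"]], [True, False], p_cp=0.1,
               n_mix=3, l_min=2, l_max=2, l_min_cp=1, l_max_cp=4))
```

The repeated M1 is impossible for a compliant client but allowed for a non-compliant one, which is why the flag has to be part of the state the sampler explores.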
Assuming we know the sending profiles:
$$\Pr(P_x \mid C) = \Pr(L = l \mid C)\,\Pr(M_x \mid L = l, C)\,I_{set}(M_x)\,\Pr(\text{Sen}_x \to \text{Rec}_x)$$
Other constraints: unknown origin, dummies, other mixing strategies…
Sample from a distribution that is difficult to sample from directly
Key advantages:
Requires a generative model (we know how to compute it!)
Good estimation of errors
No false positives or negatives
Systematic
Constructs a Markov chain whose stationary distribution is $\Pr(HS \mid O, C)$
Current state $HS_{current}$, candidate state $HS_{candidate}$:
1. Compute
$$\alpha = \frac{\Pr(HS_{candidate})\,Q(HS_{current} \mid HS_{candidate})}{\Pr(HS_{current})\,Q(HS_{candidate} \mid HS_{current})}$$
2. If $\alpha \ge 1$, set $HS_{current} \leftarrow HS_{candidate}$; else draw $u \sim U(0, 1)$: if $u \le \alpha$, set $HS_{current} \leftarrow HS_{candidate}$; else keep $HS_{current}$.
Transition Q: swap operation
More complicated transitions are needed for non-compliant clients.
[Figure: senders A, B, C; mixes M1, M2, M3; receivers Q, R, S]
$$\Pr(HS \mid O, C) = \frac{\Pr(\text{Paths} \mid C)}{Z}$$
$$\alpha = \frac{\Pr(\text{Paths}_{candidate})\,Q(\text{Paths}_{current} \mid \text{Paths}_{candidate})}{\Pr(\text{Paths}_{current})\,Q(\text{Paths}_{candidate} \mid \text{Paths}_{current})}$$
The normalizing constant $Z$ cancels in $\alpha$, so it never has to be computed.
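The sketch below runs Metropolis-Hastings on a toy problem: one threshold-3 mix with made-up sending profiles. The state is a matching of senders to receivers, and the proposal swaps the receivers of two senders. This is a simplified stand-in for the swap transition on whole paths, not the authors' implementation. Because the swap is symmetric, the $Q$ terms cancel and $\alpha$ is the ratio of unnormalized target probabilities. Burn-in and thinning (every 10th state) are illustrative choices that make kept samples less dependent. For these profiles the exact answer is $0.216/0.284 \approx 0.761$.

```python
import random

# Hypothetical sending profiles Pr(Sen -> Rec), assumed known.
PROFILE = {
    "A": {"Q": 0.6, "R": 0.3, "S": 0.1},
    "B": {"Q": 0.2, "R": 0.5, "S": 0.3},
    "C": {"Q": 0.2, "R": 0.2, "S": 0.6},
}
SENDERS = ["A", "B", "C"]


def weight(assign):
    """Unnormalized target Pr(Paths | C): product of profile entries."""
    w = 1.0
    for s, r in zip(SENDERS, assign):
        w *= PROFILE[s][r]
    return w


def swap(assign, rng):
    """Proposal Q: swap the receivers of two distinct senders (symmetric)."""
    i, j = rng.sample(range(len(assign)), 2)
    cand = list(assign)
    cand[i], cand[j] = cand[j], cand[i]
    return cand


def metropolis_hastings(n_samples, burn_in=1000, thin=10, seed=1):
    rng = random.Random(seed)
    current = ["Q", "R", "S"]  # any state consistent with the observation
    samples = []
    for step in range(burn_in + n_samples * thin):
        cand = swap(current, rng)
        # Symmetric Q: alpha is the ratio of target weights; Z cancels.
        alpha = weight(cand) / weight(current)
        if alpha >= 1 or rng.random() < alpha:
            current = cand
        if step >= burn_in and (step - burn_in) % thin == 0:
            samples.append(list(current))
    return samples


samples = metropolis_hastings(5000)
p_aq = sum(s[0] == "Q" for s in samples) / len(samples)
print(f"Pr(A -> Q) ~= {p_aq:.3f}")
```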
Consecutive samples are dependent, so keep only samples that are sufficiently separated in the chain, such that
$$\Pr(\text{Paths}_i \mid \text{Paths}_j) \approx \Pr(\text{Paths}_i)$$
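[Figure: samples P1, P2, P3, P4 with indicator values 1, 0, 1, 0 for the event A → Q]
$$\Pr(A \to Q) \approx \frac{1}{n}\sum_{j=1}^{n} I_{A \to Q}(\text{Paths}_j)$$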
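Error estimation
Each sample gives a Bernoulli outcome $I_{A \to Q}(\text{Paths}_i)$; with a Beta(1, 1) prior (i.e. uniform), the posterior yields confidence intervals:
$$\Pr[\Pr(A \to Q) \mid \text{Paths}_1, \text{Paths}_2, \text{Paths}_3, \ldots] \propto \Pr[\text{Paths}_1, \text{Paths}_2, \text{Paths}_3, \ldots \mid \Pr(A \to Q)]\,\Pr[\Pr(A \to Q)]$$
$$\Pr(A \to Q) \sim \mathrm{Beta}\Big(\sum_i I_{A \to Q}(\text{Paths}_i) + 1,\; \sum_i \big(1 - I_{A \to Q}(\text{Paths}_i)\big) + 1\Big)$$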
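The Beta posterior turns the sampled indicators into an estimate with an error bar. Python's standard library has no Beta quantile function, so this sketch approximates an equal-tailed interval by drawing from `random.betavariate`. The function name and defaults are hypothetical. Like the formula, it treats the samples as independent Bernoulli draws, which is why only well-separated chain samples should be used.

```python
import random


def beta_interval(indicators, level=0.95, draws=20000, seed=0):
    """Posterior Beta(k + 1, n - k + 1) for Pr(A -> Q) from n indicator
    samples with k ones (uniform Beta(1, 1) prior). Returns the posterior
    mean and an equal-tailed credible interval estimated by sampling."""
    n, k = len(indicators), sum(indicators)
    a, b = k + 1, n - k + 1
    rng = random.Random(seed)
    xs = sorted(rng.betavariate(a, b) for _ in range(draws))
    lo = xs[int((1 - level) / 2 * draws)]
    hi = xs[int((1 + level) / 2 * draws) - 1]
    return a / (a + b), (lo, hi)


mean, (lo, hi) = beta_interval([1, 0, 1, 0] * 25)
print(f"mean {mean:.3f}, 95% interval ({lo:.3f}, {hi:.3f})")
```

With 100 samples of which 50 are ones, the posterior is Beta(51, 51), centered on 0.5; more samples narrow the interval.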
1. Create an instance of a network
2. Run the sampler
3. Choose a target sender and a receiver
4. Estimate the probability
5. Check whether Sen actually chose Rec as receiver
6. Choose a new network and repeat
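Events should happen with the estimated probability:
$$\Pr(\text{Sen} \to \text{Rec}) \approx \frac{1}{n}\sum_{j=1}^{n} I_{\text{Sen} \to \text{Rec}}(\text{Paths}_j)$$
$$E\big(I_{\text{Sen} \to \text{Rec}}(\text{network})\big) = \Pr(\text{Sen} \to \text{Rec})$$
where $I_{\text{Sen} \to \text{Rec}}(\text{network})$ indicates whether Sen actually sent to Rec in the simulated network.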
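The check "events should happen with the estimated probability" is a calibration test: among cases estimated near $p$, the event should occur about a fraction $p$ of the time. This sketch buckets (estimate, outcome) pairs and compares the two. It does not run the sampler. The pairs are synthetic stand-ins, generated to be perfectly calibrated, and are not results from the slides. The function name and bin count are hypothetical.

```python
import random


def calibration(pairs, bins=5):
    """Bucket (estimated probability, observed 0/1 outcome) pairs by the
    estimate and compare the mean estimate with the observed frequency."""
    buckets = [[] for _ in range(bins)]
    for p, outcome in pairs:
        buckets[min(int(p * bins), bins - 1)].append((p, outcome))
    rows = []
    for b, group in enumerate(buckets):
        if group:
            mean_p = sum(p for p, _ in group) / len(group)
            freq = sum(o for _, o in group) / len(group)
            rows.append((b / bins, (b + 1) / bins, len(group), mean_p, freq))
    return rows


# Synthetic stand-in for the experiment: each "network" yields an estimate p
# and an outcome that really occurs with probability p (a calibrated analyst).
rng = random.Random(42)
pairs = []
for _ in range(10_000):
    p = rng.random()
    pairs.append((p, int(rng.random() < p)))

for lo, hi, n, mean_p, freq in calibration(pairs):
    print(f"[{lo:.1f}, {hi:.1f}) n={n:5d} estimated={mean_p:.3f} observed={freq:.3f}")
```

With real experiment data, a bucket whose observed frequency sits far from its mean estimate would point to a biased sampler or model.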
Nmix   t    Nmsg    Samples   RAM (MB)
3      3    10      500       16
3      3    50      500       18
5      10   100     500       19
10     20   1000    1000      24
10     20   10000   1000      125
Memory grows with the size of the network and population, since results are kept in memory during the simulation, and as the number of samples collected increases.
Nmix   t    Nmsg   iter   Full analysis (min)   One sample (ms)
3      3    10     6011   2.33                  267.68
3      3    50     6011   2.55                  306.00
5      10   100    4011   1.58                  190.35
10     20   1000   7011   3.16                  379.76
Operations should be O(1)
Writing the results to a file
Different numbers of iterations
Traffic analysis is non-trivial when there are constraints
Probabilistic model: incorporates most attacks
Non-compliant clients
Markov Chain Monte Carlo methods to extract marginal probabilities
Future work:
SDA based on Bayesian inference
Added value?
Carmela.Troncoso@esat.kuleuven.be
Microsoft technical report coming soon…