SLIDE 1

Rare event simulation for a static distribution

F. Cérou, P. Del Moral, T. Furon, A. Guyader. RESIM 2008, Rennes

This work was partially supported by the French Agence Nationale de la Recherche (ANR), project Nebbiano, number ANR-06-SETI-009

SLIDE 2

Introduction

X ∼ µ, with µ a probability measure on 𝒳 (ℝ^d, or a discrete space). We know how to draw samples from µ. Given a function S : 𝒳 → ℝ, we look at the rare event R = {S(X) > τ}. We want to compute $\mu(R) = \mathbb{P}(X \in R)$, and to draw samples from

$\mu_R(dx) = \frac{1}{\mu(R)}\, \mathbf{1}_R(x)\, \mu(dx).$
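For scale, a crude Monte Carlo sketch (hypothetical Python, not from the slides): estimating µ(R) by direct sampling needs on the order of 1/µ(R) draws per hit, which is exactly what fails when R is rare.

```python
import numpy as np

def crude_mc(sample_mu, S, tau, N=1_000_000, seed=0):
    """Naive Monte Carlo estimate of mu(R) = P(S(X) > tau)."""
    rng = np.random.default_rng(seed)
    X = sample_mu(rng, N)            # N i.i.d. draws from mu
    return np.mean(S(X) > tau)       # fraction of samples falling in R

# Toy check: X ~ N(0,1), S(x) = x, tau = 4, so alpha ~ 3.2e-5:
# about 1/alpha ~ 30,000 draws are needed for a single hit.
alpha_hat = crude_mc(lambda rng, n: rng.standard_normal(n), lambda x: x, 4.0)
```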
SLIDE 3

Motivation and examples

Watermarking of digital contents: embedding/hiding information in a digital file (typically audio or video), such that the change is not noticed and is very hard to remove (robust to any kind of transformation, coding, compression, ...). Used for copy protection or fingerprinting.

Figure 1: Watermarking (an image I and a watermark W go through encoding; a detection box answers yes/no).

Our rare event occurs when the detection box answers "yes" but the content is not watermarked.

SLIDE 4

Zero-bit watermarking

Figure 2: Zero-bit watermarking (the detection region is a cone around the secret direction u).

• u ∈ ℝ^d is a fixed and normalized secret vector.
• A content X is deemed watermarked if $S(X) = \frac{\langle X, u \rangle}{\|X\|} > \tau$.
• Classic assumption: an unwatermarked content X has a radially symmetric pdf. As S is also radially symmetric, we choose X ∼ N(0, I).
• False detection: $P_{fd} = \mathbb{P}(S(X) > \tau \mid X \text{ unwatermarked})$.

Toy example used to validate the algorithm (a short sketch follows).
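A minimal sketch of this toy model (hypothetical code, assuming only the definitions above): the score S and a crude Monte Carlo estimate of $P_{fd}$, workable here only because τ is kept small.

```python
import numpy as np

rng = np.random.default_rng(0)
d, tau = 20, 0.5                   # small tau so crude MC still sees hits
u = np.zeros(d); u[0] = 1.0        # fixed, normalized secret vector

def score(X):
    """Zero-bit watermarking score S(X) = <X, u> / ||X|| (rows of X)."""
    return X @ u / np.linalg.norm(X, axis=-1)

X = rng.standard_normal((1_000_000, d))   # unwatermarked contents, X ~ N(0, I)
p_fd_hat = np.mean(score(X) > tau)        # crude estimate of P_fd
```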

SLIDE 5

Probabilistic fingerprinting codes

Fingerprinting:

• Principle: some personal identification sequence $F_i \in \{0, 1\}^m$ is hidden in the copy of each user.
• Benefit: find a dishonest user via his fingerprint.
• False detections: accusing an innocent (false alarm) or accusing none of the colluders (false negative).

Tardos probabilistic codes:

• Fingerprint: $X = [X_1, \ldots, X_m]$, with $X_i \sim \mathcal{B}(p_i)$ and $p_i \sim f(p)$ (the same $p_i$'s for all users).
• Pirated copy: $y = [y_1, \ldots, y_m] \in \{0, 1\}^m$.
• Accusation procedure: $S(X) = \sum_{i=1}^{m} y_i\, g_i(X_i) \gtrless \tau$.

The choice of f and the g_i's is crucial (but not discussed here); a standard choice is sketched below.
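As an illustration only (the slides deliberately leave f and the g_i unspecified), here is the standard Tardos choice: p_i arcsine-distributed on [t, 1−t] and the usual square-root scores; the names and constants below are assumptions, not the talk's.

```python
import numpy as np

rng = np.random.default_rng(0)
m, t = 1000, 0.01                        # code length and cutoff (illustrative)

# Tardos: p_i = sin^2(theta_i), theta_i uniform, so p_i in [t, 1-t]
theta = rng.uniform(np.arcsin(np.sqrt(t)), np.arcsin(np.sqrt(1 - t)), m)
p = np.sin(theta) ** 2

X = (rng.random(m) < p).astype(int)      # one user's fingerprint, X_i ~ B(p_i)
y = (rng.random(m) < p).astype(int)      # stand-in for a pirated copy

# Usual choice: g_i(1) = sqrt((1-p_i)/p_i), g_i(0) = -sqrt(p_i/(1-p_i))
g = np.where(X == 1, np.sqrt((1 - p) / p), -np.sqrt(p / (1 - p)))
S = np.sum(y * g)                        # accuse this user if S > tau
```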

SLIDE 6

Collusions

Figure 3: Collusion (users 1, …, N hold copies I_1, …, I_N of image I carrying fingerprints F_1, …, F_N; the "bad boys" forge a new copy I′ that is fed to the accusation procedure).

Several users compare their digital contents: they are not exactly the same. Strategies to build up a new file, different from all the users' ones (a short sketch of the first two follows the list):

  • majority vote
  • random choice on parts
  • put the detected bits equal to 0
  • ...
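A minimal sketch of the first two strategies, assuming c colluders hold bit sequences of length m (hypothetical code, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
c, m = 5, 1000
copies = (rng.random((c, m)) < 0.5).astype(int)   # the colluders' marked bits

majority = (copies.sum(axis=0) > c / 2).astype(int)    # majority vote (c odd)
pick = copies[rng.integers(0, c, m), np.arange(m)]     # random choice per part
```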
SLIDE 7

Multilevel approach

Figure 4: Multilevel (pdf of S(X) with levels $L_1 < L_2 < \cdots < L_M = \tau$ and the rare set R; e.g. $p_2 = \mathbb{P}(S(X) > L_2 \mid S(X) > L_1)$).

• Ingredients: fix M and $L_1 < \cdots < L_M = \tau$ so that each $p_i = \mathbb{P}(S(X) > L_i \mid S(X) > L_{i-1})$ is not too small.
• Bayes decomposition: $\alpha = p_1 p_2 \cdots p_M$.
• Unrealistic case: suppose you can estimate each $p_i$ independently with classic Monte Carlo: $p_i \approx \hat p_i = N_i / N$.
• Multilevel estimator: $\hat\alpha_N = \hat p_1 \hat p_2 \cdots \hat p_M$.
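To see the Bayes decomposition at work, a toy check (hypothetical code, for a standard Gaussian score): the conditional probabilities telescope, so their product is exactly α.

```python
from scipy.stats import norm

tau, M = 6.0, 6
L = [tau * (k + 1) / M for k in range(M)]     # levels L_1 < ... < L_M = tau
p = [norm.sf(L[0])]                           # p_1 = P(S(X) > L_1)
p += [norm.sf(L[k]) / norm.sf(L[k - 1]) for k in range(1, M)]

alpha = 1.0
for pk in p:
    alpha *= pk                               # alpha = p_1 p_2 ... p_M
print(alpha, norm.sf(tau))                    # both ~ 9.87e-10
```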

SLIDE 8

The Shaker

• Recall: X ∼ µ on 𝒳.
• Ingredient: a µ-reversible transition kernel K(x, dx′) on 𝒳:

$\forall (x, x') \in \mathcal{X}^2 \quad \mu(dx)\, K(x, dx') = \mu(dx')\, K(x', dx).$

• Consequence: µK = µ.
• Example: if X ∼ N(0, 1) then $X' = \frac{X + \sigma W}{\sqrt{1+\sigma^2}} \sim N(0, 1)$, i.e.

$K(x, dx') = \mathcal{N}\!\left(\tfrac{x}{\sqrt{1+\sigma^2}},\ \tfrac{\sigma^2}{1+\sigma^2}\right)(dx')$

is a "good shaker".
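A minimal sketch of this Gaussian shaker (hypothetical code): one kernel step leaves N(0, 1) invariant, which is easy to check empirically.

```python
import numpy as np

def shake(x, sigma, rng):
    """One step of the mu-reversible kernel K for mu = N(0,1):
    x' = (x + sigma * W) / sqrt(1 + sigma^2), with W ~ N(0,1)."""
    w = rng.standard_normal(x.shape)
    return (x + sigma * w) / np.sqrt(1.0 + sigma**2)

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)        # X ~ N(0,1)
y = shake(x, sigma=0.3, rng=rng)
print(y.mean(), y.var())                  # ~0 and ~1: N(0,1) is preserved
```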

SLIDE 9

Feynman-Kac representation

$A_k = S^{-1}(\,]L_k, +\infty[\,)$

$M_k^K(x, dy) = K(x, dy)\, \mathbf{1}_{A_k}(y) + K(x, A_k^c)\, \delta_x(dy)$

$\mu_k(dx) = \frac{1}{\mu(A_k)}\, \mathbf{1}_{A_k}(x)\, \mu(dx)$, the normalized restriction of µ to $A_k$

$\mu_k$ invariant by $M_k^K$

$X_k$ Markov chain with initial distribution µ and transitions $M_k^K$

For every test function ϕ, for k ∈ {0, . . . , n}, we have the following Feynman-Kac representation:

$\mu_{k+1}(\varphi) = \frac{\mathbb{E}\!\left[\varphi(X_k)\, \prod_{j=0}^{k} \mathbf{1}_{A_{j+1}}(X_j)\right]}{\mathbb{E}\!\left[\prod_{j=0}^{k} \mathbf{1}_{A_{j+1}}(X_j)\right]}.$
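The kernel $M_k^K$ is just K with any proposal leaving $A_k$ rejected in place; a minimal sketch for N particles (hypothetical code, with the Gaussian shaker standing in for K):

```python
import numpy as np

def M_k(X, S, L_k, sigma, rng):
    """One step of M_k^K on N particles (rows of X): propose with K,
    keep a move only if the proposal stays in A_k = {S > L_k},
    otherwise the particle does not move."""
    W = rng.standard_normal(X.shape)
    prop = (X + sigma * W) / np.sqrt(1.0 + sigma**2)   # proposals from K
    keep = S(prop) > L_k                               # indicator of A_k
    return np.where(keep[:, None], prop, X)
```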

SLIDE 10

Algorithm

• Initialization: simulate an i.i.d. sample $\xi_0^1, \ldots, \xi_0^N \sim \mu$.
• Estimate $\hat p_1 = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}_{A_1}(\xi_0^i)$.
• Selection: $\hat\xi_0^i = \xi_0^i$ if $S(\xi_0^i) > L_1$, else pick at random among the $N_1$ selected particles.
• Mutation: $\tilde\xi_1^i \sim M(\hat\xi_0^i, dx')$ and

$\forall i \in \{1, \ldots, N\} \quad \xi_1^i = \begin{cases} \tilde\xi_1^i & \text{if } S(\tilde\xi_1^i) > L_1 \\ \hat\xi_0^i & \text{if } S(\tilde\xi_1^i) \le L_1 \end{cases}$

• Consider the next level and iterate until the rare event is reached (a compact sketch follows).
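Putting the pieces together, a compact sketch of the whole procedure (hypothetical code with fixed levels and the Gaussian shaker; the adaptive choice of levels is discussed on the implementation slide):

```python
import numpy as np

def multilevel_estimator(sample_mu, S, levels, N, sigma, n_shake=10, seed=0):
    """Multilevel splitting estimate of alpha = P(S(X) > levels[-1]).
    sample_mu(rng, n) must return n i.i.d. particles from mu as rows."""
    rng = np.random.default_rng(seed)
    xi = sample_mu(rng, N)                    # initialization: xi ~ mu
    alpha_hat = 1.0
    for L in levels:
        above = S(xi) > L
        p_hat = above.mean()                  # hat p_k = N_k / N
        if p_hat == 0.0:
            return 0.0                        # extinction: no survivor
        alpha_hat *= p_hat
        # selection: survivors stay, dead particles are redrawn among them
        survivors = xi[above]
        dead = ~above
        xi[dead] = survivors[rng.integers(0, len(survivors), dead.sum())]
        # mutation: a few steps of the rejected-move kernel M_k^K
        for _ in range(n_shake):
            W = rng.standard_normal(xi.shape)
            prop = (xi + sigma * W) / np.sqrt(1.0 + sigma**2)
            keep = S(prop) > L
            xi = np.where(keep[:, None], prop, xi)
    return alpha_hat
```

For the zero-bit watermarking toy model one would call it with sample_mu = lambda rng, n: rng.standard_normal((n, 20)), S = score from the earlier sketch, and a rising grid of levels ending at τ.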
SLIDES 11-16

Algorithm (illustration)

Figures: successive snapshots of the particle cloud. With N = 8 particles, 4 of them pass level $L_1$, so $\hat p_1 = 4/8$; after selection and mutation at $L_1$, 3 of the 8 pass $L_2$, so $\hat p_2 = 3/8$.

SLIDE 17

Implementation issues

Choice of K. Depends on the model: Metropolis-Hastings, Gibbs sampler, Gaussian case (zero-bit watermarking), i.i.d. redraws on some random sites (Tardos codes). Trade-off between two drawbacks:

• "shaking effect" too large: most proposed mutations are refused.
• "shaking effect" too small: particles almost don't move.

Levels $L_k$. Adaptive levels with a fixed rate of success: take the $p_0$-quantile of the scores $S(\xi_k^j)$ to set $L_{k+1}$; $p_0$ = 0.75 or 0.8 is a good choice (a sketch follows below).

Less dependent sample. We can iterate the kernel $M_k^K$ several times to improve the variability of the particles. By well-known results on Metropolis-Hastings, the sample becomes less and less dependent. Rule: iterate until 90 or 95% of the particles have actually moved through an accepted transition.
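A sketch of the adaptive level rule (hypothetical code): inside the level loop of multilevel_estimator above, replace the fixed level by an empirical quantile of the current scores.

```python
import numpy as np

def next_level(scores, p0=0.75, tau=None):
    """Adaptive level: the (1 - p0)-quantile of the current scores, so that
    a fraction ~p0 of the particles survives; never overshoot tau."""
    L = np.quantile(scores, 1.0 - p0)
    return L if tau is None else min(L, tau)
```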

SLIDE 18

Asymptotic variance

Best achievable asymptotic variance:

• Multilevel estimator: $\hat\alpha_N = \hat p_1 \hat p_2 \cdots \hat p_M$.
• Fluctuations: if the $\hat p_i$'s are independent, then

$\sqrt{N}\, \frac{\hat\alpha_N - \alpha}{\alpha} \xrightarrow[N \to \infty]{\mathcal{L}} \mathcal{N}\Big(0,\ \sum_{i=1}^{M} \frac{1 - p_i}{p_i}\Big).$

• Constrained minimization:

$\arg\min_{p_1, \ldots, p_M} \sum_{i=1}^{M} \frac{1 - p_i}{p_i} \quad \text{s.t.} \quad \prod_{i=1}^{M} p_i = \alpha.$

• Optimum: $p_1 = \cdots = p_M = \alpha^{1/M}$ (illustrated numerically below).
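A quick numerical illustration (hypothetical code) of the optimal variance $M(1 - \alpha^{1/M})/\alpha^{1/M}$ for the α used on the next slide: spreading the work over more levels lowers the asymptotic relative variance towards $-\ln \alpha \approx 23.8$.

```python
alpha = 4.704e-11
for M in (10, 20, 50, 82, 200):
    p = alpha ** (1.0 / M)          # optimal equal conditional probabilities
    var = M * (1.0 - p) / p         # asymptotic relative variance
    print(M, round(p, 3), round(var, 1))
```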
SLIDE 19

Simulations: The Model

Figure 2 (repeated): the detection region is a cone around the secret vector u.

• The model: X ∼ N(0, I₂₀).
• Rare event: $\alpha = \mathbb{P}\big(\frac{\langle X, u \rangle}{\|X\|} > 0.95\big)$.
• Numerical computation: α = 4.704 · 10⁻¹¹.
• Parameter: p = 3/4, so that $\alpha = r \times p^M = 0.83 \times (3/4)^{82}$.
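The exact value can be checked with the regularized incomplete beta function: for X ∼ N(0, I_d), the squared score ⟨X, u⟩²/‖X‖² follows a Beta(1/2, (d−1)/2) law. A sketch (hypothetical code; it computes the two-sided probability P(|⟨X, u⟩|/‖X‖ > τ), the convention that reproduces the 4.704 · 10⁻¹¹ quoted above):

```python
from scipy.special import betainc

d, tau = 20, 0.95
# <X,u>^2 / ||X||^2 ~ Beta(1/2, (d-1)/2), hence
# P(|<X,u>| / ||X|| > tau) = I_{1 - tau^2}((d-1)/2, 1/2)
alpha = betainc((d - 1) / 2, 0.5, 1.0 - tau**2)
print(alpha)                        # ~ 4.70e-11
```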
SLIDE 20

Numerical results

credit: V. Bahuon

Figure 5: Relative standard deviation as a function of the number N of particles (log-log plot, N from 10² to 10⁵).

SLIDE 21

Figure 6: Relative bias as a function of the number N of particles (log-log plot, N from 10² to 10⁵).

SLIDE 22

Perspectives

Find other similar applications (e.g. probabilistic counting algorithms in a large discrete set).

Work in progress: non-asymptotic variance estimates.