SLIDE 1

Rare event simulation for a static distribution

F. Cérou, P. Del Moral, T. Furon, A. Guyader. RESIM 2008, Rennes

This work was partially supported by the French Agence Nationale de la Recherche (ANR), project Nebbiano, number ANR-06-SETI-009

SLIDE 2

Introduction

X ∼ µ, with µ a probability measure on 𝒳 (ℝ^d, or a discrete space). We know how to draw samples from µ. Given a function S : 𝒳 → ℝ, we look at the rare event R = {S(X) > τ}. We want to compute $\mu(R) = \mathbb{P}(X \in R)$, and to draw samples from

$\mu_R(dx) = \frac{1}{\mu(R)}\, \mathbf{1}_R(x)\, \mu(dx).$
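For scale, a crude Monte Carlo sketch (hypothetical Python, not from the slides): estimating µ(R) by direct sampling needs on the order of 1/µ(R) draws per hit, which is exactly what fails when R is rare.

```python
import numpy as np

def crude_mc(sample_mu, S, tau, N=1_000_000, seed=0):
    """Naive Monte Carlo estimate of mu(R) = P(S(X) > tau)."""
    rng = np.random.default_rng(seed)
    X = sample_mu(rng, N)            # N i.i.d. draws from mu
    return np.mean(S(X) > tau)       # fraction of samples falling in R

# Toy check: X ~ N(0,1), S(x) = x, tau = 4, so alpha ~ 3.2e-5:
# about 1/alpha ~ 30,000 draws are needed for a single hit.
alpha_hat = crude_mc(lambda rng, n: rng.standard_normal(n), lambda x: x, 4.0)
```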
SLIDE 3

Motivation and examples

Watermarking of digital contents: embedding/hiding information in a digital file (typically audio or video), such that the change is not noticed and is very hard to remove (robust to any kind of transformation, coding, compression, ...). Used for copy protection or fingerprinting.

Figure 1: Watermarking (an image I and a watermark W go through encoding; a detection box answers yes/no).

Our rare event occurs when the detection box answers "yes" but the content is not watermarked.

SLIDE 4

Zero-bit watermarking

Figure 2: Zero-bit watermarking (the detection region is a cone around the secret direction u).

• u ∈ ℝ^d is a fixed and normalized secret vector.
• A content X is deemed watermarked if $S(X) = \frac{\langle X, u \rangle}{\|X\|} > \tau$.
• Classic assumption: an unwatermarked content X has a radially symmetric pdf. As S is also radially symmetric, we choose X ∼ N(0, I).
• False detection: $P_{fd} = \mathbb{P}(S(X) > \tau \mid X \text{ unwatermarked})$.

Toy example used to validate the algorithm (a short sketch follows).
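A minimal sketch of this toy model (hypothetical code, assuming only the definitions above): the score S and a crude Monte Carlo estimate of $P_{fd}$, workable here only because τ is kept small.

```python
import numpy as np

rng = np.random.default_rng(0)
d, tau = 20, 0.5                   # small tau so crude MC still sees hits
u = np.zeros(d); u[0] = 1.0        # fixed, normalized secret vector

def score(X):
    """Zero-bit watermarking score S(X) = <X, u> / ||X|| (rows of X)."""
    return X @ u / np.linalg.norm(X, axis=-1)

X = rng.standard_normal((1_000_000, d))   # unwatermarked contents, X ~ N(0, I)
p_fd_hat = np.mean(score(X) > tau)        # crude estimate of P_fd
```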

SLIDE 5

Probabilistic fingerprinting codes

Fingerprinting:

• Principle: some personal identification sequence $F_i \in \{0, 1\}^m$ is hidden in the copy of each user.
• Benefit: find a dishonest user via his fingerprint.
• False detections: accusing an innocent (false alarm) or accusing none of the colluders (false negative).

Tardos probabilistic codes:

• Fingerprint: $X = [X_1, \ldots, X_m]$, with $X_i \sim \mathcal{B}(p_i)$ and $p_i \sim f(p)$ (the same $p_i$'s for all users).
• Pirated copy: $y = [y_1, \ldots, y_m] \in \{0, 1\}^m$.
• Accusation procedure: $S(X) = \sum_{i=1}^{m} y_i\, g_i(X_i) \gtrless \tau$.

The choice of f and the g_i's is crucial (but not discussed here); a standard choice is sketched below.
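As an illustration only (the slides deliberately leave f and the g_i unspecified), here is the standard Tardos choice: p_i arcsine-distributed on [t, 1−t] and the usual square-root scores; the names and constants below are assumptions, not the talk's.

```python
import numpy as np

rng = np.random.default_rng(0)
m, t = 1000, 0.01                        # code length and cutoff (illustrative)

# Tardos: p_i = sin^2(theta_i), theta_i uniform, so p_i in [t, 1-t]
theta = rng.uniform(np.arcsin(np.sqrt(t)), np.arcsin(np.sqrt(1 - t)), m)
p = np.sin(theta) ** 2

X = (rng.random(m) < p).astype(int)      # one user's fingerprint, X_i ~ B(p_i)
y = (rng.random(m) < p).astype(int)      # stand-in for a pirated copy

# Usual choice: g_i(1) = sqrt((1-p_i)/p_i), g_i(0) = -sqrt(p_i/(1-p_i))
g = np.where(X == 1, np.sqrt((1 - p) / p), -np.sqrt(p / (1 - p)))
S = np.sum(y * g)                        # accuse this user if S > tau
```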

SLIDE 6

Collusions

Figure 3: Collusion (users 1, …, N hold copies I_1, …, I_N of image I carrying fingerprints F_1, …, F_N; the "bad boys" forge a new copy I′ that is fed to the accusation procedure).

Several users compare their digital contents: they are not exactly the same. Strategies to build up a new file, different from all the users' ones (a short sketch of the first two follows the list):

  • majority vote
  • random choice on parts
  • put the detected bits equal to 0
  • ...
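A minimal sketch of the first two strategies, assuming c colluders hold bit sequences of length m (hypothetical code, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
c, m = 5, 1000
copies = (rng.random((c, m)) < 0.5).astype(int)   # the colluders' marked bits

majority = (copies.sum(axis=0) > c / 2).astype(int)    # majority vote (c odd)
pick = copies[rng.integers(0, c, m), np.arange(m)]     # random choice per part
```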
SLIDE 7

Multilevel approach

Figure 4: Multilevel (pdf of S(X) with levels $L_1 < L_2 < \cdots < L_M = \tau$ and the rare set R; e.g. $p_2 = \mathbb{P}(S(X) > L_2 \mid S(X) > L_1)$).

• Ingredients: fix M and $L_1 < \cdots < L_M = \tau$ so that each $p_i = \mathbb{P}(S(X) > L_i \mid S(X) > L_{i-1})$ is not too small.
• Bayes decomposition: $\alpha = p_1 p_2 \cdots p_M$.
• Unrealistic case: suppose you can estimate each $p_i$ independently with classic Monte Carlo: $p_i \approx \hat p_i = N_i / N$.
• Multilevel estimator: $\hat\alpha_N = \hat p_1 \hat p_2 \cdots \hat p_M$.
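To see the Bayes decomposition at work, a toy check (hypothetical code, for a standard Gaussian score): the conditional probabilities telescope, so their product is exactly α.

```python
from scipy.stats import norm

tau, M = 6.0, 6
L = [tau * (k + 1) / M for k in range(M)]     # levels L_1 < ... < L_M = tau
p = [norm.sf(L[0])]                           # p_1 = P(S(X) > L_1)
p += [norm.sf(L[k]) / norm.sf(L[k - 1]) for k in range(1, M)]

alpha = 1.0
for pk in p:
    alpha *= pk                               # alpha = p_1 p_2 ... p_M
print(alpha, norm.sf(tau))                    # both ~ 9.87e-10
```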

SLIDE 8

The Shaker

• Recall: X ∼ µ on 𝒳.
• Ingredient: a µ-reversible transition kernel K(x, dx′) on 𝒳:

$\forall (x, x') \in \mathcal{X}^2 \quad \mu(dx)\, K(x, dx') = \mu(dx')\, K(x', dx).$

• Consequence: µK = µ.
• Example: if X ∼ N(0, 1) then $X' = \frac{X + \sigma W}{\sqrt{1+\sigma^2}} \sim N(0, 1)$, i.e.

$K(x, dx') = \mathcal{N}\!\left(\tfrac{x}{\sqrt{1+\sigma^2}},\ \tfrac{\sigma^2}{1+\sigma^2}\right)(dx')$

is a "good shaker".
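A minimal sketch of this Gaussian shaker (hypothetical code): one kernel step leaves N(0, 1) invariant, which is easy to check empirically.

```python
import numpy as np

def shake(x, sigma, rng):
    """One step of the mu-reversible kernel K for mu = N(0,1):
    x' = (x + sigma * W) / sqrt(1 + sigma^2), with W ~ N(0,1)."""
    w = rng.standard_normal(x.shape)
    return (x + sigma * w) / np.sqrt(1.0 + sigma**2)

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)        # X ~ N(0,1)
y = shake(x, sigma=0.3, rng=rng)
print(y.mean(), y.var())                  # ~0 and ~1: N(0,1) is preserved
```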

SLIDE 9

Feynman-Kac representation

$A_k = S^{-1}(\,]L_k, +\infty[\,)$

$M_k^K(x, dy) = K(x, dy)\, \mathbf{1}_{A_k}(y) + K(x, A_k^c)\, \delta_x(dy)$

$\mu_k(dx) = \frac{1}{\mu(A_k)}\, \mathbf{1}_{A_k}(x)\, \mu(dx)$, the normalized restriction of µ to $A_k$

$\mu_k$ invariant by $M_k^K$

$X_k$ Markov chain with initial distribution µ and transitions $M_k^K$

For every test function ϕ, for k ∈ {0, . . . , n}, we have the following Feynman-Kac representation:

$\mu_{k+1}(\varphi) = \frac{\mathbb{E}\!\left[\varphi(X_k)\, \prod_{j=0}^{k} \mathbf{1}_{A_{j+1}}(X_j)\right]}{\mathbb{E}\!\left[\prod_{j=0}^{k} \mathbf{1}_{A_{j+1}}(X_j)\right]}.$
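The kernel $M_k^K$ is just K with any proposal leaving $A_k$ rejected in place; a minimal sketch for N particles (hypothetical code, with the Gaussian shaker standing in for K):

```python
import numpy as np

def M_k(X, S, L_k, sigma, rng):
    """One step of M_k^K on N particles (rows of X): propose with K,
    keep a move only if the proposal stays in A_k = {S > L_k},
    otherwise the particle does not move."""
    W = rng.standard_normal(X.shape)
    prop = (X + sigma * W) / np.sqrt(1.0 + sigma**2)   # proposals from K
    keep = S(prop) > L_k                               # indicator of A_k
    return np.where(keep[:, None], prop, X)
```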

SLIDE 10

Algorithm

• Initialization: simulate an i.i.d. sample $\xi_0^1, \ldots, \xi_0^N \sim \mu$.
• Estimate $\hat p_1 = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}_{A_1}(\xi_0^i)$.
• Selection: $\hat\xi_0^i = \xi_0^i$ if $S(\xi_0^i) > L_1$, else pick at random among the $N_1$ selected particles.
• Mutation: $\tilde\xi_1^i \sim M(\hat\xi_0^i, dx')$ and

$\forall i \in \{1, \ldots, N\} \quad \xi_1^i = \begin{cases} \tilde\xi_1^i & \text{if } S(\tilde\xi_1^i) > L_1 \\ \hat\xi_0^i & \text{if } S(\tilde\xi_1^i) \le L_1 \end{cases}$

• Consider the next level and iterate until the rare event is reached (a compact sketch follows).
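Putting the pieces together, a compact sketch of the whole procedure (hypothetical code with fixed levels and the Gaussian shaker; the adaptive choice of levels is discussed on the implementation slide):

```python
import numpy as np

def multilevel_estimator(sample_mu, S, levels, N, sigma, n_shake=10, seed=0):
    """Multilevel splitting estimate of alpha = P(S(X) > levels[-1]).
    sample_mu(rng, n) must return n i.i.d. particles from mu as rows."""
    rng = np.random.default_rng(seed)
    xi = sample_mu(rng, N)                    # initialization: xi ~ mu
    alpha_hat = 1.0
    for L in levels:
        above = S(xi) > L
        p_hat = above.mean()                  # hat p_k = N_k / N
        if p_hat == 0.0:
            return 0.0                        # extinction: no survivor
        alpha_hat *= p_hat
        # selection: survivors stay, dead particles are redrawn among them
        survivors = xi[above]
        dead = ~above
        xi[dead] = survivors[rng.integers(0, len(survivors), dead.sum())]
        # mutation: a few steps of the rejected-move kernel M_k^K
        for _ in range(n_shake):
            W = rng.standard_normal(xi.shape)
            prop = (xi + sigma * W) / np.sqrt(1.0 + sigma**2)
            keep = S(prop) > L
            xi = np.where(keep[:, None], prop, xi)
    return alpha_hat
```

For the zero-bit watermarking toy model one would call it with sample_mu = lambda rng, n: rng.standard_normal((n, 20)), S = score from the earlier sketch, and a rising grid of levels ending at τ.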
SLIDES 11-16

Algorithm (illustration)

Figures: successive snapshots of the particle cloud. With N = 8 particles, 4 of them pass level $L_1$, so $\hat p_1 = 4/8$; after selection and mutation at $L_1$, 3 of the 8 pass $L_2$, so $\hat p_2 = 3/8$.

SLIDE 17

Implementation issues

Choice of K. Depends on the model: Metropolis-Hastings, Gibbs sampler, Gaussian case (zero-bit watermarking), i.i.d. redraws on some random sites (Tardos codes). Trade-off between two drawbacks:

• "shaking effect" too large: most proposed mutations are refused.
• "shaking effect" too small: particles almost don't move.

Levels $L_k$. Adaptive levels with a fixed rate of success: take the $p_0$-quantile of the scores $S(\xi_k^j)$ to set $L_{k+1}$; $p_0$ = 0.75 or 0.8 is a good choice (a sketch follows below).

Less dependent sample. We can iterate the kernel $M_k^K$ several times to improve the variability of the particles. By well-known results on Metropolis-Hastings, the sample becomes less and less dependent. Rule: iterate until 90 or 95% of the particles have actually moved through an accepted transition.
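A sketch of the adaptive level rule (hypothetical code): inside the level loop of multilevel_estimator above, replace the fixed level by an empirical quantile of the current scores.

```python
import numpy as np

def next_level(scores, p0=0.75, tau=None):
    """Adaptive level: the (1 - p0)-quantile of the current scores, so that
    a fraction ~p0 of the particles survives; never overshoot tau."""
    L = np.quantile(scores, 1.0 - p0)
    return L if tau is None else min(L, tau)
```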

SLIDE 18

Asymptotic variance

Best achievable asymptotic variance:

• Multilevel estimator: $\hat\alpha_N = \hat p_1 \hat p_2 \cdots \hat p_M$.
• Fluctuations: if the $\hat p_i$'s are independent, then

$\sqrt{N}\, \frac{\hat\alpha_N - \alpha}{\alpha} \xrightarrow[N \to \infty]{\mathcal{L}} \mathcal{N}\Big(0,\ \sum_{i=1}^{M} \frac{1 - p_i}{p_i}\Big).$

• Constrained minimization:

$\arg\min_{p_1, \ldots, p_M} \sum_{i=1}^{M} \frac{1 - p_i}{p_i} \quad \text{s.t.} \quad \prod_{i=1}^{M} p_i = \alpha.$

• Optimum: $p_1 = \cdots = p_M = \alpha^{1/M}$ (illustrated numerically below).
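A quick numerical illustration (hypothetical code) of the optimal variance $M(1 - \alpha^{1/M})/\alpha^{1/M}$ for the α used on the next slide: spreading the work over more levels lowers the asymptotic relative variance towards $-\ln \alpha \approx 23.8$.

```python
alpha = 4.704e-11
for M in (10, 20, 50, 82, 200):
    p = alpha ** (1.0 / M)          # optimal equal conditional probabilities
    var = M * (1.0 - p) / p         # asymptotic relative variance
    print(M, round(p, 3), round(var, 1))
```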
SLIDE 19

Simulations: The Model

Figure 2 (repeated): the detection region is a cone around the secret vector u.

• The model: X ∼ N(0, I₂₀).
• Rare event: $\alpha = \mathbb{P}\big(\frac{\langle X, u \rangle}{\|X\|} > 0.95\big)$.
• Numerical computation: α = 4.704 · 10⁻¹¹.
• Parameter: p = 3/4, so that $\alpha = r \times p^M = 0.83 \times (3/4)^{82}$.
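The exact value can be checked with the regularized incomplete beta function: for X ∼ N(0, I_d), the squared score ⟨X, u⟩²/‖X‖² follows a Beta(1/2, (d−1)/2) law. A sketch (hypothetical code; it computes the two-sided probability P(|⟨X, u⟩|/‖X‖ > τ), the convention that reproduces the 4.704 · 10⁻¹¹ quoted above):

```python
from scipy.special import betainc

d, tau = 20, 0.95
# <X,u>^2 / ||X||^2 ~ Beta(1/2, (d-1)/2), hence
# P(|<X,u>| / ||X|| > tau) = I_{1 - tau^2}((d-1)/2, 1/2)
alpha = betainc((d - 1) / 2, 0.5, 1.0 - tau**2)
print(alpha)                        # ~ 4.70e-11
```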
SLIDE 20

Numerical results

credit: V. Bahuon

Figure 5: Relative standard deviation as a function of the number N of particles (log-log plot, N from 10² to 10⁵).

SLIDE 21

Figure 6: Relative bias as a function of the number N of particles (log-log plot, N from 10² to 10⁵).

SLIDE 22

Perspectives

Find other similar applications (e.g. probabilistic counting algorithms in a large discrete set).

Work in progress: non-asymptotic variance estimates.