Improved Boltzmann sampling for the Hadamard product of - - PowerPoint PPT Presentation

improved boltzmann sampling for the hadamard product of
SMART_READER_LITE
LIVE PREVIEW

Improved Boltzmann sampling for the Hadamard product of - - PowerPoint PPT Presentation

Improved Boltzmann sampling for the Hadamard product of distributions Andrea Sportiello CNRS and LIPN, Universit e Paris Nord, Villetaneuse GASCom 2016, La Marana, Corsica June 4th 2016 Andrea Sportiello Sampling the Hadamard product of


slide-1
SLIDE 1

Improved Boltzmann sampling for the Hadamard product of distributions

Andrea Sportiello CNRS and LIPN, Universit´ e Paris Nord, Villetaneuse GASCom 2016, La Marana, Corsica June 4th 2016

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-2
SLIDE 2

The problem

We have two probability distributions, p(x) and q(x)

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-3
SLIDE 3

The problem

We have two probability distributions, p(x) and q(x) Define f (x) = p(x)q(x)

  • y p(y)q(y)

(the Hadamard product of p and q)

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-4
SLIDE 4

The problem

We have two probability distributions, p(x) and q(x) Define f (x) = p(x)q(x)

  • y p(y)q(y)

(the Hadamard product of p and q) We have two black-box algorithms that sample from p and from q

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-5
SLIDE 5

The problem

We have two probability distributions, p(x) and q(x) Define f (x) = p(x)q(x)

  • y p(y)q(y)

(the Hadamard product of p and q) We have two black-box algorithms that sample from p and from q We want to sample from f

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-6
SLIDE 6

The problem

We have two probability distributions, p(x) and q(x) Define f (x) = p(x)q(x)

  • y p(y)q(y)

(the Hadamard product of p and q) We have two black-box algorithms that sample from p and from q We want to sample from f Fast

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-7
SLIDE 7

The obvious rejection algorithm

The rules of the game are clear. We have no analytic information whatsoever on p and q. We only have the black boxes The obvious rejection algorithm seems to be the only candidate: Algorithm 1: Obvious rejection begin repeat x p ; y q ; until x = y; return x end Call (p, q) =

x p(x)q(x)

The cycle is repeated on average 1/(p, q) times If we have a size parameter n, this may be large, and we want to make better

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-8
SLIDE 8

Scaling of the size parameter

Assume that (p, p), (p, q), (q, q) are all Θ(n−2α) (by Cauchy–Schwarz, that’s (p, p), (q, q) = Θ(n−2α) and p, q not asymptotically orthogonal) and that

x(p(x)3 + q(x)3) ≪ (p, q)

(that’s normally the case, anyhow we have variants of the algorithm that work if

x(p(x)k + q(x)k) ≪ (p, q), for any finite k)

The complexity of the obvious rejection algorithm is Θ(n2α) We want to go down to Θ(nα ln n) or Θ(nα)

✲ ✛

Θ(n2α)

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-9
SLIDE 9

A complexity paradigm

In algorithm complexity you normally just count operations. Nonetheless, when you have black boxes, it is wise to count separately operations and black-box queries, which are generally much more expensive that a single operation Here we have two black boxes, for simplicity we will assume they have similar complexity Nice notation:

  • for black-box queries

for operations Then, e.g., Θ(nx + ny ) simplifies to Θ(nx ) if x ≥ y, but stays as is if x < y, as we do not specify the / ratio, except for it being > 1 The complexity of the obvious rejection algorithm is Θ(n2α ) We want to go down to Θ(nα + nα ln(n) )

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-10
SLIDE 10

The na¨ ıve ‘birthday paradox’ algorithm

Let us consider the following algorithm Algorithm 2: Birthday paradox, first try begin repeat (x1, . . . , xk) p ; (y1, . . . , yk) q ; until ∃! (i, j) | xi = yj; return xi end Best hope: at each run there are ∼ Poissk2(p,q) pairs (i, j) such that xi = yj, so if we tune k ∼ 1/

  • (p, q)

the cycle costs Θ(k ), and is repeated on average Θ(1) times The complexity drops down to Θ(nα )

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-11
SLIDE 11

An easy win?

An easy win?

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-12
SLIDE 12

An easy win?

An easy win? . . . No!

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-13
SLIDE 13

An easy win?

An easy win? . . . No! besides a bunch of solvable minor issues (is the number of good pairs really distributed as a Poissonian?) (na¨ ıvely searching for a good pair takes time k2) there is the one big problem:

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-14
SLIDE 14

An easy win?

An easy win? . . . No! besides a bunch of solvable minor issues (is the number of good pairs really distributed as a Poissonian?) (na¨ ıvely searching for a good pair takes time k2) there is the one big problem: The resulting probability distribution is wrong!

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-15
SLIDE 15

An easy win?

An easy win? . . . No! besides a bunch of solvable minor issues (is the number of good pairs really distributed as a Poissonian?) (na¨ ıvely searching for a good pair takes time k2) there is the one big problem: The resulting probability distribution is wrong! The average number of good pairs (i, j) with xi = yj = z is in fact proportional to f (z), and boosted by a factor k2, which is good. . . . . . but knowing that you have a unique good pair gives a bias! However the whole idea reamins valuable, because, if the average number of good pairs is small, having no further pairs is ‘normal’, and this bias shall also be small.

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-16
SLIDE 16

Poissonisation of the na¨ ıve algorithm

Analysing the bias of the previous algorithm is complicated. It gets easier if we ‘Poissonise’ k, i.e. we rather consider: Algorithm 3: Birthday paradox, Poissonised begin repeat sample kp and kq with Poissk; (x1, . . . ,xkp) p ; (y1, . . . ,ykq) q ; until ∃! (i, j) | xi = yj; return xi end Call νz = (az, bz), with az = #{i | xi = z} and bz = #{j | yj = z} The good fact of Poissonisation is that the νz’s are independent random variables so that we can easily analyse what is the probability of returning z

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-17
SLIDE 17

A formula for the bias

Fact: prob(az, bz) = Poisskp(z)(az)Poisskq(z)(bz) thus the probability that there is a single pair for z is Poisskp(z)(1)Poisskq(z)(1) = e−k(p(z)+q(z))k2p(z)q(z) and the probability that there are no pairs for values z′ = z is 1 −

  • 1 − Poisskp(z′)(0)
  • 1 − Poisskq(z′)(0)
  • =

e−k(p(z′)+q(z′)) ekp(z′) + ekq(z′) − 1

  • as a result, the probability of returning z is proportional to†

f (z) ekp(z) + ekq(z) − 1 and we would be fine if we could devise a final rejection procedure for the spurious factor Bias(z) := 1/(ekp(z) + ekq(z) − 1)

† I.e., up to a factor, like z′ φ(z′), which is the same for all z,

although possibly complicated to calculate

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-18
SLIDE 18

A philosophical digression

So, we want to add a final procedure for correcting the spurious factor Bias(z) := 1/(ekp(z) +ekq(z) −1). In other words, for some C > maxz(ekp(z) + ekq(z) − 1), but still with

z f (z) ekp(z)+ekq(z)−1 C

= Θ(1), I want to sample a Bernoulli random variable of parameter ekp(z)+ekq(z)−1

C

However, recall that we have no analytic information on p and q (we only have the black boxes!) Can we ever sample Bernξ without evaluating ξ? This is not impossible a priori, just remember the famous algorithm for Bernπ/4 that makes no use of π. . .

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-19
SLIDE 19

A first solution with static lists

Algorithm 4: Hadamard product, with static lists repeat sample kp and kq with Poissk; (x1, . . . , xk) p ; (y1, . . . , yk) q ; sort the lists above, produce the list of {(z, az, bz)}; if N• = 0 and N• = 1 then ν ← az + bz − 1 for the only z ∈ Ω•; win ← Bernν/3; return z; until win=true; 1 2 3 4 1 2 3 4

1 3 2 3 2 3

1 1

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-20
SLIDE 20

Why this algorithm is unbiased

Let us first see why this algorithm is correct Call for short φ(z) = e−k(p(z)+q(z)) The algorithm may return z only if no z′ = z arrives out of the Ω• region, each z′ makes a factor φ(z′)

  • 1 + kp(z′) + (kp(z′))2

2

+ kq(z′) + (kq(z′))2

2

  • 1

2 3 4 1 2 3 4

1 3 2 3 2 3

1 1

Then, it shall also happen that z falls within the Ω• region this makes a factor k2p(z)q(z) φ(z)

  • 1 +

kp(z) 2

+

(kp(z))2 6

+

kq(z) 2

+

(kq(z))2 6

  • We would have done

if the two factors in parenthesis were proportional They are not. For this reason we put a final Bernoulli rejection, that corrects for the factorials

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-21
SLIDE 21

Why this algorithm is unbiased

Let us first see why this algorithm is correct Call for short φ(z) = e−k(p(z)+q(z)) The algorithm may return z only if no z′ = z arrives out of the Ω• region, each z′ makes a factor φ(z′)

  • 1 + kp(z′) + (kp(z′))2

2

+ kq(z′) + (kq(z′))2

2

  • 1

2 3 4 1 2 3 4

1 3 2 3 2 3

1 1

Then, it shall also happen that z falls within the Ω• region this makes a factor k2p(z)q(z) φ(z)

  • 1

3·1 + 2 3· kp(z) 2

+ 1· (kp(z))2

6

+ 2

3· kq(z) 2

+ 1· (kq(z))2

6

  • We would have done

if the two factors in parenthesis were proportional They are not. For this reason we put a final Bernoulli rejection, that corrects for the factorials

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-22
SLIDE 22

Why this algorithm is fast

We have taken the apparently risky choice of making a very large rejection region Ω•. If any z falls here, we restart no matter what. Does this affect the complexity in an important way, or the estimates from the na¨ ıve birthday heuristics are still valid? One run costs on average 2k + k ln(k) (for sorting), i.e. already the seeked Θ(nα + nα ln(n) ), so we are done iff the acceptance rate of one run is Θ(1) This rate is easily deduced from the previous calculation, to be

z

1 3k2p(z)q(z)

w

φ(w)

  • 1 + kp(w) + (kp(w))2

2 + kq(w) + (kq(w))2 2

  • = 1

3k2(p, q)

  • w

Φ(w) . . . we claim that

  • w

Φ(w) ≃ e−k2(p,q)

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-23
SLIDE 23

Why this algorithm is fast

If this is true, the average complexity can be made as small as complexity + Θ(ln n) = min

k

2k

1 3k2(p, q)e−k2(p,q)

= 1

  • (p, q)

min

x

2√x

x 3 exp(−x) =

1

  • (p, q)

min

x

6ex √x = 6 √ 2e

  • (p, q)

The reason why this is true is that 1 Φ(w) = 1 + k2p(w)q(w) 1 + kp(w) + (kp(w))2

2

+ kq(w) + (kq(w))2

2

thus

w Φ(w)−1 ≃ w(1 + k2p(w)q(w)) ≃

  • w exp(k2p(w)q(w)) = exp(k2(p, q))

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-24
SLIDE 24

Why this algorithm is fast

If this is true, the average complexity can be made as small as complexity + Θ(ln n) = min

k

2k

1 3k2(p, q)e−k2(p,q)

= 1

  • (p, q)

min

x

2√x

x 3 exp(−x) =

1

  • (p, q)

min

x

6ex √x = 6 √ 2e

  • (p, q)

The reason why this is true is that 1 Φ(w) = 1 + k2p(w)q(w)+ k3

3! (p(w)3 + q(w)3) + · · ·

1 + kp(w) + (kp(w))2

2

+ kq(w) + (kq(w))2

2

thus

w Φ(w)−1 ≃ w(1 + k2p(w)q(w)) ≃

  • w exp(k2p(w)q(w)) = exp(k2(p, q))

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-25
SLIDE 25

Why this algorithm is fast

If this is true, the average complexity can be made as small as complexity + Θ(ln n) = min

k

2k

1 3k2(p, q)e−k2(p,q)

= 1

  • (p, q)

min

x

2√x

x 3 exp(−x) =

1

  • (p, q)

min

x

6ex √x = 6 √ 2e

  • (p, q)

The reason why this is true is that 1 Φ(w) = 1 + k2p(w)q(w)+ k3

3! (p(w)3 + q(w)3) + · · · negligible by

  • ur assumption on
  • z(p(z)3 + q(z)3)

1 + kp(w) + (kp(w))2

2

+ kq(w) + (kq(w))2

2

thus

w Φ(w)−1 ≃ w(1 + k2p(w)q(w)) ≃

  • w exp(k2p(w)q(w)) = exp(k2(p, q))

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-26
SLIDE 26

Why this talk isn’t finished yet

So we have reached our complexity goal I personally believe that, within this framework, we cannot do better by more than a constant The algorith is so simple that it fit in one slide. . . . . . so why should we search for something better? Reason 1: When we reject a run because we find two potential solution, we have the feeling that we have wasted our time. . . Reason 2: We need to tune the value of k, i.e. we need an estimate of (p, q), and this may be not available Reason 3: Sampling from a Poissonian of large average is a pain. for this reason we now give a variant of the algorithm that, instead of sampling kp values {xi} and kq values {yj} all at once, uses incremental lists

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-27
SLIDE 27

Why this talk isn’t finished yet

So we have reached our complexity goal I personally believe that, within this framework, we cannot do better by more than a constant The algorith is so simple that it fit in one slide. . . . . . so why should we search for something better? Reason 1: When we reject a run because we find two potential solution, we have the feeling that we have wasted our time. . . Reason 2: We need to tune the value of k, i.e. we need an estimate of (p, q), and this may be not available Reason 3: Sampling from a Poissonian of large average is a pain. for this reason we now give a variant of the algorithm that, instead of sampling kp values {xi} and kq values {yj} all at once, uses incremental lists

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-28
SLIDE 28

Why this talk isn’t finished yet

So we have reached our complexity goal I personally believe that, within this framework, we cannot do better by more than a constant The algorith is so simple that it fit in one slide. . . . . . so why should we search for something better? Reason 1: When we reject a run because we find two potential solution, we have the feeling that we have wasted our time. . . Reason 2: We need to tune the value of k, i.e. we need an estimate of (p, q), and this may be not available Reason 3: Sampling from a Poissonian of large average is a pain. for this reason we now give a variant of the algorithm that, instead of sampling kp values {xi} and kq values {yj} all at once, uses incremental lists

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-29
SLIDE 29

A family of algorithms

For any point ν ∈ N2 let we have four probabilities, (cν, wν, sν, rν), which stand for continue, warning, stop and restart. Let us first describe a pseudo-algorithm Algorithm 5: (You can’t really program this) begin ⋆ freshstart; νz = (0, 0) ∀z; repeat with prob. 1

2p(z): add (1, 0) to νz

with prob. 1

2q(z): add (0, 1);

; sample event (cν, wν, sν, rν) with ν = νz; until event is s or r; if event is r or you ever sampled a w on a z′ = z then goto freshstart; return z end

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-30
SLIDE 30

(Why this was not a real program?)

Why you should not program the algorithm above? (I mean, besides the use of a ‘goto’?)

https://xkcd.com/292/ by Randall Munroe, 20/07/2007

Because you cannot initialise all νz’s (the set of values is maybe infinite) and, besides this, for the same reason you cannot check if there are still warnings standing on some z′ = z Of course, for properly coding this algorithm you need to use tree data structures

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-31
SLIDE 31

Our ultimate algorithm

Algorithm 6: Hadamard product, with incremental lists begin repeat initialise BST1 and BST2 ; win ← false ; repeat ; insert (z, p) or (z, q) in BST1 with probs. 1

2p(z) and 1 2q(z);

sample event (cν, wν, sν, rν) with ν = νz; if event is w then insert (z, ν) in BST2; until event is s or r; if event is s then explore BST2 for z′ = z; if none is found then win ← true ; until win=true ; return z end

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-32
SLIDE 32

How to design the (c, w, s, r) rates

The algorithm is better analysed by Poissonising also the time, i.e. we have a fictitious real parameter t, which is increased by an exponential random variable at each iteration of the internal loop. Acceptance rates of the algorithm are now evaluated through standard formulas for Poisson Point Processes, and again we have a factor for sampling z, and one factor per z′ = z i.e., something of the form P(sampling z at time t) dt =

z′=z

Φ(z′; t)

  • Ψ(z; t) dt

and the rates are valid iff Ψ(z; t) = Φ(z; t)U(t).

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-33
SLIDE 33

How to design the (c, w, s, r) rates

In the factor Ψ(z; t) for sampling z, we count walks from (0, 0) to any (a, b). We have the overall factor from PPP,

1 (a+b−1)! exp(−t(p(z) + q(z)))ta+b−1p(z)aq(z)b

From the final vertex, we have a factor sν, From each of the other vertices, we have factors cν + wν. In the factor Φ(z; t) for not reaching r with any other z′, we also count walks from (0, 0) to any (a, b). We have the overall PPP factor

1 (a+b)! exp(−t(p(z) + q(z)))ta+bp(z)aq(z)b

at all the vertices we have a factor cν One easily sees that there is a unique possibility: Ψ(z; t)/Φ(z; t) ∝ t and walks to (a, b) in Φ(z; t) shall match to walks to (a + 1, b + 1) in Ψ(z; t).

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-34
SLIDE 34

How to design the (c, w, s, r) rates

In summary: Ψ(a,b) =

  • walks w

(0,0)→(a,b)

  • ν∈w(a,b)
  • cν + wν
  • s(a,b)

Φ(a,b) =

  • walks w

(0,0)→(a,b)

  • ν∈w

cν and we need to solve (a + b + 1) Ψ(a+1,b+1) = U Φ(a,b) ∀ (a, b) within the polytope cν, wν, sν and rν ≥ 0 and cν + wν + sν + rν = 1, while maximising the acceptance rate

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-35
SLIDE 35

The solution

The asymptotically optimal solution is the following: 1 2 3 4 1 2 3 4

1 3 4 7 4 7 2 3 3 7 3 7

The resulting complexity is

3 √ 3π

(p,q)

  • + Θ(ln n)
  • a gain of a factor

8e = 0.658 . . . over the algorithm with static lists

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-36
SLIDE 36

The solution for arbitrary β/α > 0

What if we only have

x(p(x)k+1 + q(x)k+1) ≪ (p, q), with k > 2?

We have to push the diagram to a larger support, and set w1,b =

2b k(b+1)−b(b−1), for b ≤ k + 1,

s1,b = 1 − w1,b, c0,b = 1 for b ≤ k, w0,b+1 = 1, ra,b = 1 at all other a ≤ b e.g.: 1 2 3 4 5 6 1 2 3 4 5 6

1 5 4 13 3 7 8 13 4 13 3 7 8 13 4 5 9 13 4 7 5 13 9 13 4 7 5 13 Andrea Sportiello Sampling the Hadamard product of two distributions

slide-37
SLIDE 37

One application: random walks in random media

Consider a bistochastic digraph (a graph with uniform in- and out-degree) which has some ‘translational invariance on average’ (I choose here a model of plaquettes i.i.d. oriented clockwise or counterclockwise)

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-38
SLIDE 38

One application: random walks in random media

Consider a bistochastic digraph (a graph with uniform in- and out-degree) which has some ‘translational invariance on average’ (I choose here a model of plaquettes i.i.d. oriented clockwise or counterclockwise)

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-39
SLIDE 39

One application: random walks in random media

Consider a bistochastic digraph (a graph with uniform in- and out-degree) which has some ‘translational invariance on average’ (I choose here a model of plaquettes i.i.d. oriented clockwise or counterclockwise)

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-40
SLIDE 40

One application: random walks in random media

Consider a bistochastic digraph (a graph with uniform in- and out-degree) which has some ‘translational invariance on average’ (I choose here a model of plaquettes i.i.d. oriented clockwise or counterclockwise) Now we have a random instance of a graph. On top of this, we will consider random walks starting at the origin

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-41
SLIDE 41

One application: random walks in random media

Consider a bistochastic digraph (a graph with uniform in- and out-degree) which has some ‘translational invariance on average’ (I choose here a model of plaquettes i.i.d. oriented clockwise or counterclockwise) Now we have a random instance of a graph. On top of this, we will consider random walks starting at the origin

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-42
SLIDE 42

One application: random walks in random media

Consider a bistochastic digraph (a graph with uniform in- and out-degree) which has some ‘translational invariance on average’ (I choose here a model of plaquettes i.i.d. oriented clockwise or counterclockwise) Now we have a random instance of a graph. On top of this, we will consider random walks starting at the origin What’s the complexity for the uniform sampling of walks W (n)

0→0, of length 2n, starting

and arriving at the origin?

Andrea Sportiello Sampling the Hadamard product of two distributions

slide-43
SLIDE 43

One application: random walks in random media

Sampling random walks of length 2n, with no prescribed endpoint, is trivially linear. . . but the endpoint x is a random variable, probably at distance ∼ √n, and the probability that x = 0 is roughly n− D

2

Na¨ ıve rejection algorithm: complexity n1+ D

2

Now, grow the first half of the path, of length n, up to its endpoint x (which has distribution p(x)), and grow the last half of the path, again starting from the origin, and walking on the digraph with reversed orientation (this has distribution q(x)) If you use the algorithm described here, with these p and q, the complexity goes from n1+ D

2 to n1+ D 4 Andrea Sportiello Sampling the Hadamard product of two distributions

slide-44
SLIDE 44

Conclusions

We have described two algorithms for sampling from the Hadamard product of two distributions p, q, valid under mild hypothesis, in particular assuming almost no analytic informations on p and q, and using only black boxes for generating from p and q These algorithms improve on the obvious rejection strategy, passing from

1 (p,q) to 1

(p,q)

They are elegant, simple to code, and involve small constant factors They generalise easily to the product of several distributions However, a second generalisation is still unsolved If I interpret p(x), q(x) as f1(x) and f2(n − x), I may now want to generalise to sampling (x1, . . . , xk−1) with probability proportional to f1(x1)f2(x2) · · · fk−1(xk−1)fk(n − x1 − · · · − xk−1), but I do not know still how to correct the bias in this case.

Andrea Sportiello Sampling the Hadamard product of two distributions