

SLIDE 1

Lattice Gaussian Sampling with Markov Chain Monte Carlo (MCMC)

Cong Ling

Imperial College London aaa joint work with Zheng Wang (Huawei Technologies Shanghai) aaa

September 20, 2016

Cong Ling (ICL) MCMC September 20, 2016 1 / 23

SLIDE 2

Outline

1. Background
2. Markov Chain Monte Carlo (MCMC)
3. Convergence Analysis
4. Open Questions

SLIDE 3

Lattice Gaussian Distribution

Lattice: Λ = L(B) = {Bx : x ∈ Zⁿ}

Continuous Gaussian distribution:

  ρσ,c(z) = (1/(√(2π)σ)ⁿ) · e^(−‖z−c‖²/(2σ²))

Discrete Gaussian distribution over the lattice Λ:

  D_Λ,σ,c(x) = ρσ,c(Bx)/ρσ,c(Λ) = e^(−‖Bx−c‖²/(2σ²)) / Σ_{x∈Zⁿ} e^(−‖Bx−c‖²/(2σ²)),

where ρσ,c(Λ) = Σ_{Bx∈Λ} ρσ,c(Bx).

[Fig. 1: Discrete Gaussian distribution over Z².]
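As a concrete illustration of the definition above (not part of the slides), here is a minimal Python sketch that tabulates D_Λ,σ,c over a truncated window of Z². The basis B, the parameters σ and c, and the truncation radius are all illustrative choices; truncation is harmless because the Gaussian tail decays very fast.

```python
import math

def discrete_gaussian_pmf(B, sigma, c, radius=20):
    """Normalized discrete Gaussian D_{Lambda,sigma,c} over lattice points Bx
    with x restricted to the box [-radius, radius]^2 (2-D toy setting)."""
    pmf = {}
    total = 0.0
    for x1 in range(-radius, radius + 1):
        for x2 in range(-radius, radius + 1):
            # lattice point Bx for a 2x2 basis B given as nested lists
            p1 = B[0][0] * x1 + B[0][1] * x2
            p2 = B[1][0] * x1 + B[1][1] * x2
            w = math.exp(-((p1 - c[0])**2 + (p2 - c[1])**2) / (2 * sigma**2))
            pmf[(x1, x2)] = w
            total += w
    return {x: w / total for x, w in pmf.items()}

# illustrative example: D_{Z^2, sigma=1, c=0}, as in Fig. 1
pmf = discrete_gaussian_pmf([[1, 0], [0, 1]], sigma=1.0, c=(0.0, 0.0))
```

Note how the shape matches the figure: the mass peaks at the lattice point closest to c and is symmetric around it.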

SLIDE 4

Why Does It Matter?

Decoding: the shape of D_Λ,σ,c(x) means that a lattice point Bx closer to c is sampled with higher probability. This can be used to
• solve the CVP and SVP problems [Aggarwal et al. 2015, Stephens-Davidowitz 2016]
• decode MIMO systems [Liu, Ling and Stehlé 2011]

Closest Vector Problem (CVP): given a lattice basis B ∈ Rⁿˣⁿ and a target point c ∈ Rⁿ, find the lattice point Bx closest to c.

Shortest Vector Problem (SVP): given a lattice basis B ∈ Rⁿˣⁿ, find the shortest nonzero vector of the lattice L(B).

SLIDE 5

Why Does It Matter?

Mathematics
• prove the transference theorem of lattices [Banaszczyk 1993]

Coding
• obtain the full shaping gain in lattice coding [Forney, Wei 1989; Kschischang, Pasupathy 1993]
• capacity-achieving distribution in information theory: Gaussian channel [Ling, Belfiore 2013], Gaussian wiretap channel [Ling, Luzzi, Belfiore and Stehlé 2013], fading and MIMO channels [Campello, Ling, Belfiore 2016]...

Cryptography
• lattice-based cryptosystems built on worst-case hardness assumptions [Micciancio, Regev 2004]
• underpins fully-homomorphic encryption for cloud computing [Gentry 2009]

How to sample from the lattice Gaussian distribution? This is the problem that lattice Gaussian sampling aims to solve.

SLIDE 6

State of the Art

Klein’s algorithm [Klein 2000]: works if σ ≥ ω(√(log n)) · max_{1≤i≤n} ‖b̂i‖, where the b̂i are the Gram-Schmidt vectors [Gentry, Peikert, Vaikuntanathan 2008].

Aggarwal et al. 2015: works for arbitrary σ; exponential time, exponential space.

Markov chain Monte Carlo [Wang, Ling 2014]: arbitrary σ, polynomial space — but how much time?

For special lattices: Construction A, B, etc. are very fast [Campello, Belfiore 2016]; polar lattices have quasilinear complexity [Yan et al. 2014].

SLIDE 7

Klein Sampling

By sequentially sampling from the 1-dimensional conditional Gaussian distribution D_Z,σi,x̃i in backward order from xₙ to x₁, Klein's algorithm outputs x with probability

  P_Klein(x) = Π_{i=1}^n D_Z,σi,x̃i(xi) = ρσ,c(Bx) / Π_{i=1}^n ρσi,x̃i(Z).   (1)

Klein's Algorithm
Input: B, σ, c
Output: Bx ∈ Λ
1. let B = QR and c′ = Qᵀc
2. for i = n, ..., 1 do
3.   let x̃i = (c′i − Σ_{j=i+1}^n r_{i,j} xj) / r_{i,i} and σi = σ/|r_{i,i}|
4.   sample xi from D_Z,σi,x̃i
5. end for
6. return Bx

P_Klein(x) was shown in [GPV 2008] to be within a negligible statistical distance of D_Λ,σ,c(x) if σ ≥ ω(√(log n)) · max_{1≤i≤n} ‖b̂i‖. Klein's algorithm runs in polynomial time O(n²), excluding the QR decomposition.
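The six steps above can be sketched in Python roughly as follows. `sample_z` is a naive enumeration-based 1-D discrete-Gaussian sampler standing in for a proper D_{Z,σ,μ} sampler, and its truncation window is an illustrative assumption, not part of Klein's algorithm.

```python
import math
import random
import numpy as np

def sample_z(sigma, mu, tail=10):
    """Naive sampler for the 1-D discrete Gaussian D_{Z,sigma,mu}:
    enumerate integers within `tail` standard deviations of mu."""
    lo = int(mu - tail * sigma) - 1
    hi = int(mu + tail * sigma) + 1
    zs = list(range(lo, hi + 1))
    ws = [math.exp(-(z - mu)**2 / (2 * sigma**2)) for z in zs]
    return random.choices(zs, weights=ws)[0]

def klein_sample(B, sigma, c):
    """Klein's algorithm: QR-decompose B, then sample x_n, ..., x_1
    backward from the 1-D conditionals D_{Z,sigma_i,x~_i}."""
    n = B.shape[1]
    Q, R = np.linalg.qr(B)
    cp = Q.T @ c                      # c' = Q^T c
    x = np.zeros(n, dtype=int)
    for i in range(n - 1, -1, -1):
        # x~_i = (c'_i - sum_{j>i} r_ij x_j) / r_ii,  sigma_i = sigma / |r_ii|
        mu = (cp[i] - sum(R[i, j] * x[j] for j in range(i + 1, n))) / R[i, i]
        x[i] = sample_z(sigma / abs(R[i, i]), mu)
    return x
```

With a very small σ the sampler concentrates on the lattice point nearest c, which matches the decoding intuition from Slide 4.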

SLIDE 8

Markov Chain Monte Carlo (MCMC)

Markov chain Monte Carlo (MCMC) methods were introduced into lattice Gaussian sampling for the range of σ beyond the reach of Klein's algorithm [Wang, Ling and Hanrot, 2014]. MCMC methods sample from an intractable target distribution by building a Markov chain that randomly generates the next sample conditioned on the previous one.

SLIDE 9

Gibbs Sampling

Gibbs sampling: at each Markov move, resample a single component of x from its conditional distribution

  P(x_i^{t+1} | x^t_{[−i]}) = e^(−‖Bx^{t+1}−c‖²/(2σ²)) / Σ_{x_i^{t+1}∈Z} e^(−‖Bx^{t+1}−c‖²/(2σ²)),

where x^t_{[−i]} = (x^t_1, ..., x^t_{i−1}, x^t_{i+1}, ..., x^t_n).

Gibbs-Klein sampling algorithm [Wang, Ling and Hanrot, 2014]: at each Markov move, resample a block of components of x, while keeping the complexity at the same level as componentwise sampling:

  P(x^{t+1}_block | x^t_{[−block]}) = e^(−‖Bx^{t+1}−c‖²/(2σ²)) / Σ_{x^{t+1}_block∈Z^m} e^(−‖Bx^{t+1}−c‖²/(2σ²)).
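A minimal sketch of one componentwise Gibbs move, not from the slides. It samples the exact conditional by enumerating candidate integers in a finite window around the current value; the window size is an illustrative assumption (the tail mass outside it is negligible for reasonable σ).

```python
import math
import random
import numpy as np

def gibbs_move(B, sigma, c, x, i, support=30):
    """One Gibbs move: resample coordinate i of x from its conditional
    P(x_i | x_{[-i]}) under D_{Lambda,sigma,c}, enumerating candidates
    in the window [x_i - support, x_i + support]."""
    cand = list(range(x[i] - support, x[i] + support + 1))
    ws = []
    y = x.copy()
    for v in cand:
        y[i] = v
        r = B @ y - c                       # residual B x - c
        ws.append(math.exp(-float(r @ r) / (2 * sigma**2)))
    x[i] = random.choices(cand, weights=ws)[0]
    return x

# with a tiny sigma the conditional concentrates on the best integer
x_out = gibbs_move(np.eye(2), 0.05, np.array([0.2, 0.0]), np.array([5, 0]), 0)
```

Sweeping i over 1..n (or over blocks, as in Gibbs-Klein) yields one full scan of the chain.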

SLIDE 10

Metropolis-Hastings Sampling

In the 1970s, the original Metropolis sampling was extended to the more general Metropolis-Hastings (MH) sampling, which can be summarized as follows:
• Given the current state x of the Markov chain X_t, a candidate y for the next Markov move X_{t+1} is generated from the proposal distribution q(x, y).
• The acceptance ratio α for y is computed:

  α(x, y) = min{1, π(y)q(y, x) / (π(x)q(x, y))},   (2)

where π(x) is the target invariant distribution.
• X_{t+1} is set to y with probability α, and stays at x with probability 1 − α.

In MH sampling, q(x, y) can be any fixed distribution. However, as the dimension grows, finding a suitable q(x, y) can be difficult.

SLIDE 11

Independent Metropolis-Hastings-Klein Sampling

In [Wang, Ling 2015], Klein's algorithm is used to generate the candidate y for the Markov move X_{t+1}, namely q(x, y) = P_Klein(y).
• The generation of y for X_{t+1} does not depend on the previous state X_t.
• q(x, y) = q(y) is a special case of MH sampling known as independent MH sampling [Tierney, 1991].

Independent MHK sampling algorithm
1. Sample from the independent proposal distribution through Klein's algorithm to obtain the candidate state y for X_{t+1}:

  q(x, y) = q(y) = P_Klein(y) = ρσ,c(By) / Π_{i=1}^n ρσi,ỹi(Z),  where y ∈ Zⁿ.

2. Calculate the acceptance ratio

  α(x, y) = min{1, π(y)q(y, x) / (π(x)q(x, y))} = min{1, π(y)q(x) / (π(x)q(y))},

where π = D_Λ,σ,c.
3. Based on α(x, y), decide whether to accept X_{t+1} = y or keep X_{t+1} = x.
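The three steps above can be sketched generically in Python. Here `pi` and `q` are (possibly unnormalized) densities passed in as callables and `sample_q` draws from the proposal; in the lattice setting, `q` and `sample_q` would be Klein's probability P_Klein and Klein's sampler. The normalization constants cancel in the ratio, which is why unnormalized densities suffice.

```python
import random

def imhk_step(x, pi, q, sample_q):
    """One independent MHK move: draw y ~ q, then accept with
    probability alpha = min(1, pi(y) q(x) / (pi(x) q(y)))."""
    y = sample_q()
    alpha = min(1.0, pi(y) * q(x) / (pi(x) * q(y)))
    return y if random.random() < alpha else x

# degenerate sanity check: when q == pi, alpha == 1 and every
# candidate is accepted (the chain mixes in one step)
pi_toy = lambda z: 0.5          # toy uniform target on {0, 1}
demo = imhk_step(0, pi_toy, pi_toy, lambda: 1)
```

This degenerate case mirrors the convergence analysis below: the closer Klein's proposal is to D_Λ,σ,c, the larger δ and the faster the mixing.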

SLIDE 12

Ergodicity

A Markov chain is ergodic if there exists a limiting distribution π(·) such that

  lim_{t→∞} ‖P^t(x, ·) − π(·)‖_TV = 0,

where ‖·‖_TV is the total variation distance. All the aforementioned Markov chains are ergodic.

Modes of ergodicity (f(t) a polynomial function of t, M < ∞, 0 < δ < 1):
• Polynomial ergodicity: ‖P^t(x, ·) − π(·)‖_TV ≤ M · 1/f(t)
• Uniform ergodicity: ‖P^t(x, ·) − π(·)‖_TV ≤ M(1 − δ)^t
• Geometric ergodicity: ‖P^t(x, ·) − π(·)‖_TV ≤ M(x)(1 − δ)^t

Mixing time of a Markov chain:

  t_mix(ε) = min{t : max_x ‖P^t(x, ·) − π(·)‖_TV ≤ ε}.

SLIDE 13

Ergodicity of Independent MHK

The transition probability P(x, y) of the independent MHK algorithm is

  P(x, y) = q(x, y)·α(x, y) = { min{q(y), π(y)q(x)/π(x)}                        if y ≠ x,
                               q(x) + Σ_{z≠x} max{0, q(z) − π(z)q(x)/π(x)}     if y = x.   (3)

Lemma 1: Given the invariant distribution D_Λ,σ,c, the Markov chain induced by the independent MHK algorithm is ergodic:

  lim_{t→∞} ‖P^t(x, ·) − D_Λ,σ,c‖_TV = 0

for all states x ∈ Zⁿ.

For a Markov chain on a countably infinite state space, ergodicity follows from irreducibility, aperiodicity and reversibility.
Proof: The Markov chain produced by the proposed algorithm is inherently reversible (for x ≠ y):

  π(x)P(x, y) = π(x)q(x, y)α(x, y) = min{π(x)q(y), π(y)q(x)} = π(y)P(y, x).   (4)
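Equations (3) and (4) are easy to check numerically. Here is a quick sanity check on a toy 3-state chain (the target π and proposal q below are arbitrary illustrative numbers, not lattice distributions): build P(x, y) from (3) and verify detailed balance and that each row sums to 1.

```python
# toy target pi and independent proposal q on the states {0, 1, 2}
pi = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

def P(x, y):
    """Transition probability (3) of the independent MH(K) chain."""
    if y != x:
        return min(q[y], pi[y] * q[x] / pi[x])
    # rejected moves fold their probability mass back into staying at x
    return q[x] + sum(max(0.0, q[z] - pi[z] * q[x] / pi[x])
                      for z in range(len(pi)) if z != x)
```

Detailed balance π(x)P(x, y) = π(y)P(y, x) holds because min{π(x)q(y), π(y)q(x)} is symmetric in x and y, exactly as in (4).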

SLIDE 14

Uniform Ergodicity

Lemma 2: In the independent MHK algorithm for lattice Gaussian sampling, there exists a constant δ > 0 such that q(x)/π(x) ≥ δ for all x ∈ Zⁿ.

Proof:

  q(x)/π(x) = [ρσ,c(Bx) / Π_{i=1}^n ρσi,x̃i(Z)] · [ρσ,c(Λ) / ρσ,c(Bx)]
            = ρσ,c(Λ) / Π_{i=1}^n ρσi,x̃i(Z)
            ≥ ρσ,c(Λ) / Π_{i=1}^n ρσi(Z),   (5)

where the right-hand side (RHS) of (5) is completely independent of x.

SLIDE 15

Uniform Ergodicity

Theorem 1: Given the invariant lattice Gaussian distribution D_Λ,σ,c, the Markov chain induced by the independent MHK algorithm is uniformly ergodic:

  ‖P^t(x, ·) − D_Λ,σ,c(·)‖_TV ≤ (1 − δ)^t

for all x ∈ Zⁿ, where δ is given in Lemma 2.

Proof: By Lemma 2,

  P(x, y) ≥ δπ(y).   (6)

By the coupling technique, every Markov move makes the coupled chains X and X′ equal with probability at least δ:

  P(X = X′) ≥ δ.   (7)

Therefore, over t consecutive Markov moves, the probability that X and X′ have not yet met satisfies

  P(X_t ≠ X′_t) ≤ (1 − P(X = X′))^t ≤ (1 − δ)^t.   (8)

Invoking the coupling inequality,

  ‖P^t(x, ·) − π(·)‖_TV ≤ P(X_t ≠ X′_t) ≤ (1 − δ)^t.   (9)

SLIDE 16

Convergence Parameter δ (when c = 0)

Analysis of the convergence parameter δ: in the case c = 0, δ can be expressed by the theta series Θ_Λ and the Jacobi theta function ϑ₃ as

  q(x)/π(x) ≥ ρσ,0(Λ) / Π_{i=1}^n ρσi(Z) = Σ_{x∈Zⁿ} e^(−‖Bx‖²/(2σ²)) / Π_{i=1}^n ρσi(Z) = Θ_Λ(1/(2πσ²)) / Π_{i=1}^n ϑ₃(1/(2πσi²)) = δ.   (10)

The theta series Θ_Λ and the Jacobi theta function ϑ₃ are

  Θ_Λ(τ) = Σ_{λ∈Λ} e^(−πτ‖λ‖²),   (11)

  ϑ₃(τ) = Σ_{n=−∞}^{+∞} e^(−πτn²),   (12)

with Θ_Z = ϑ₃.

SLIDE 17

Convergence Parameter δ (when c = 0)

Given the lattice basis B, the value of δ can be calculated.

[Fig. 2: 1/δ of the D₄ lattice versus 2πσ² (dB), in the case c = 0.]

Therefore, the mixing time of the Markov chain can be estimated as

  t_mix(ε) = ln ε / ln(1 − δ) < (−ln ε) · (1/δ),  ε < 1.
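A sketch of how δ from (10) and the mixing-time bound could be evaluated numerically for a 2-D basis, using brute-force truncated sums for (11) and (12). The cutoffs `R` and `N` are illustrative assumptions; for B = I (the lattice Z²) Klein's proposal is exact, so the sketch returns δ ≈ 1 and the bound collapses to −ln ε.

```python
import math
import numpy as np

def theta3(tau, N=100):
    """Jacobi theta function (12), truncated at |n| <= N."""
    return 1.0 + 2.0 * sum(math.exp(-math.pi * tau * n * n)
                           for n in range(1, N + 1))

def theta_lattice(B, tau, R=30):
    """Theta series (11) of a 2-D lattice, enumerating |x_i| <= R."""
    s = 0.0
    for x1 in range(-R, R + 1):
        for x2 in range(-R, R + 1):
            v = B @ np.array([x1, x2])
            s += math.exp(-math.pi * tau * float(v @ v))
    return s

def delta_and_mixing_time(B, sigma, eps=0.01):
    """delta from (10) with sigma_i = sigma/|r_ii|, plus the bound
    t_mix(eps) < (-ln eps) / delta from Slide 17."""
    _, R = np.linalg.qr(B)
    tau = 1.0 / (2 * math.pi * sigma**2)
    num = theta_lattice(B, tau)
    den = 1.0
    for i in range(B.shape[1]):
        si = sigma / abs(R[i, i])
        den *= theta3(1.0 / (2 * math.pi * si**2))
    delta = num / den
    return delta, (-math.log(eps)) / delta

d_eye, t_eye = delta_and_mixing_time(np.eye(2), 1.0)       # Z^2: delta ~ 1
d_shear, _ = delta_and_mixing_time(np.array([[1.0, 0.0],
                                             [0.5, 1.0]]), 1.0)
```

By Lemma 2, δ is a lower bound on the ratio of two probability distributions, so 0 < δ ≤ 1 always; plotting 1/δ against 2πσ² reproduces the qualitative behavior of Figs. 2-4.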

SLIDE 18

Convergence Parameter δ (when c = 0)

Lemma 3: In the case of c = 0, the coefficient δ of an isodual lattice has a multiplicative symmetry point at σ = 1/√(2π) (i.e., 2πσ² = 1), and converges to 1 on both sides asymptotically as σ → 0 and σ → ∞.

[Fig. 3: 1/δ of the E₈ lattice versus 2πσ² (dB), in the case c = 0.]

[Fig. 4: 1/δ of the Leech lattice versus 2πσ² (dB), in the case c = 0.]

SLIDE 19

Metropolis-Hastings Sampling

How does the MH algorithm work? First, sample from the proposal density T(x; y) to get a candidate. Then, based on the quantity α, decide whether to accept this candidate as x^{t+1}:

  α = min{1, π(y)T(y; x) / (π(x)T(x; y))},   (13)

where π(·) denotes the target distribution. The art of the MH algorithm lies in choosing an appropriate proposal density. One can use Klein's algorithm to generate the proposal density, which turns out to be symmetric: T(x; y) = T(y; x). Here,

  T(x^t; x*) = (1 / Π_{i=1}^n ρσi,x̃ti(Z)) · e^(−‖Bx^t − Bx*‖²/σ²),   (14)

where ρσ,c(z) = e^(−‖z−c‖²/(2σ²)) is a symmetric Gaussian function. The acceptance ratio α then reduces to

  α = min{1, π(x*)/π(x^t)} = min{1, e^(−(‖c−Bx*‖²−‖c−Bx^t‖²)/(2σ²))}.   (15)
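Because the proposal terms cancel, the acceptance probability (15) depends only on the target ratio and is cheap to compute. A minimal sketch (the helper name and its argument layout are illustrative, not from the slides):

```python
import math
import numpy as np

def smh_accept_prob(B, sigma, c, x, y):
    """Acceptance probability (15) of the symmetric MH algorithm:
    min(1, exp(-(||c - By||^2 - ||c - Bx||^2) / (2 sigma^2)))."""
    dx = c - B @ x
    dy = c - B @ y
    return min(1.0, math.exp(-(float(dy @ dy) - float(dx @ dx))
                             / (2 * sigma**2)))

# a move toward c is always accepted; a move away is accepted
# with probability exp(-Delta/(2 sigma^2))
p_closer = smh_accept_prob(np.eye(2), 1.0, np.zeros(2),
                           np.array([3.0, 0.0]), np.array([1.0, 0.0]))
p_away = smh_accept_prob(np.eye(2), 1.0, np.zeros(2),
                         np.zeros(2), np.array([1.0, 0.0]))
```

This matches the decoding picture: candidates closer to c are never rejected, so the chain drifts toward the target center.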

SLIDE 20

Geometric Ergodicity

Definition: A Markov chain with stationary distribution π(·) is geometrically ergodic if there exist 0 < δ < 1 and M(x) < ∞ such that for all x

  ‖P^t(x, ·) − π(·)‖_TV ≤ M(x)(1 − δ)^t.

In MCMC, the drift condition is the usual way to prove geometric ergodicity [Roberts and Tweedie 1996].

Definition: A Markov chain with discrete state space Ω satisfies the drift condition if there are constants 0 < λ < 1 and b < ∞, and a function V: Ω → [1, ∞], such that

  Σ_y P(x, y)V(y) ≤ λV(x) + b·1_C(x)

for all x ∈ Ω, where C is a small set and 1_C(x) equals 1 when x ∈ C and 0 otherwise.

SLIDE 21

Geometric Ergodicity

Theorem: Given the invariant lattice Gaussian distribution D_Λ,σ,c, the Markov chain established by the symmetric Metropolis-Hastings algorithm satisfies the drift condition; therefore, it is geometrically ergodic.

Overall, the exponential convergence can be interpreted in two ways:
• when x_i ∉ C, the Markov chain shrinks geometrically towards the small set C
• when x_i ∈ C, the Markov chain converges exponentially fast to the stationary distribution
• there is a trade-off between these two convergence rates depending on the size of C; however, as C is chosen artificially, this trade-off is hard to analyze.

The small set C means that there exist k > 0, 0 < δ < 1 and a probability measure ν on Ω such that

  P^k(x, B) ≥ δν(B)  for all x ∈ C

and all measurable subsets B ⊆ Ω. For independent MHK, the entire state space Ω is small.

SLIDE 22

Open Questions

• Fast convergence requires strong lattice reduction. How to integrate MCMC and lattice reduction?
• Is the complexity of MCMC exponential or super-exponential? It would be a breakthrough even if it is exponential.
• What about other MCMC algorithms? How to design fast-mixing MCMC for discrete Gaussian sampling?
• What about quantum algorithms?
