KLS conjecture and volume computation - Alexander Tarasov - PowerPoint PPT Presentation


SLIDE 1

KLS conjecture and volume computation

Alexander Tarasov

Saint-Petersburg State University

May 19, 2019

SLIDE 2

Table of contents

◮ KLS conjecture
◮ Origins from TCS
◮ Computation of volume is complex
◮ Probabilistic approach to the volume computation
◮ Gaussian cooling and the O∗(n³) algorithm

SLIDE 3

Isoperimetric inequality

Let us begin with the classical isoperimetric inequality in R^n, which states that for every bounded Borel set A ⊆ R^n

m⁺(A) ≥ C · m(A)^((n−1)/n),

where m is the n-dimensional Lebesgue measure in R^n and m⁺ is the outer Minkowski content, defined by

m⁺(A) = lim inf_{ε→0} (m(A_ε) − m(A)) / ε
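As a quick numerical illustration (not from the slides), the definition of m⁺ can be checked for a disk in R², where the outer Minkowski content recovers the perimeter 2πr, and the isoperimetric inequality holds with equality on balls:

```python
import math

def disk_area(r):
    # m(A) for a disk of radius r in R^2
    return math.pi * r * r

# Outer Minkowski content approximated by (m(A_eps) - m(A)) / eps;
# for a disk of radius r this should approach the perimeter 2*pi*r.
r = 1.0
for eps in (1e-2, 1e-4, 1e-6):
    approx = (disk_area(r + eps) - disk_area(r)) / eps
    print(eps, approx)

# Here n = 2, so m+(A) >= C * m(A)^{1/2}; on the disk itself
# m+(A) = 2*pi*r and m(A)^{1/2} = sqrt(pi)*r, giving C = 2*sqrt(pi).
```

The approximation 2πr + πε converges linearly in ε, which is why the printed values approach 2π ≈ 6.283.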

SLIDE 4

Log-concave probability distribution

We say that µ is log-concave if its density with respect to the Lebesgue measure is e^(−V) for some convex function V : R^n → (−∞, ∞]. In particular, the uniform probability measure on a convex body in R^n is log-concave.

Definition

We say that µ verifies Cheeger’s isoperimetric inequality with constant C if for every Borel set A ⊆ R^n

µ⁺(A) ≥ C min{µ(A), µ(Aᶜ)},

where

µ⁺(A) = lim inf_{ε→0} (µ(A_ε) − µ(A)) / ε.

SLIDE 5

Cheeger’s isoperimetric inequality

We can define C_µ as

C_µ = min_A µ⁺(A) / min{µ(A), µ(Aᶜ)}

We will encounter expressions of this form later, in applications of KLS-type statements.

SLIDE 6

Isotropic measures

Definition

We say that a log-concave probability measure µ is isotropic if:

◮ the barycenter is the origin, i.e., E_µ x = 0, and
◮ the covariance matrix M is the identity, i.e., E_µ x_i x_j = δ_{ij} for 1 ≤ i, j ≤ n.

The KLS conjecture (restricted to isotropic measures) can be formulated as

Conjecture

There exists an absolute constant C, independent of µ and n, such that µ⁺(A) ≥ C µ(A) for any Borel set A with µ(A) ≤ 1/2.

SLIDE 7

Variance conjecture

A particular case of KLS, but one that is more easily formulated, is the following Variance conjecture.

Conjecture

For isotropic log-concave probability measures,

Var_µ |x|² ≤ Cn

holds for some absolute constant C > 0.

SLIDE 8

Slicing conjecture

It is known that the Variance conjecture implies one of the main problems of convex geometry, the so-called Slicing (or Hyperplane) conjecture.

Conjecture

For any convex set A ⊂ R^n of n-dimensional volume 1 there exists a hyperplane H such that Vol_{n−1}(A ∩ H) ≥ c for some absolute constant c.

SLIDE 9

Origins of the KLS conjecture

One of the problems treated in theoretical computer science is the design of an algorithm to compute the volume of an n-dimensional convex body, i.e., an algorithm that receives as input an n-dimensional convex body K, a point x₀ ∈ K and an error parameter ε, and returns as output a real number A such that

(1 − ε)|K| ≤ A ≤ (1 + ε)|K|

SLIDE 10

Body as an oracle

The convex body is given as an oracle. Typically, this will be the membership oracle which, given a point x ∈ R^n, tells whether the point x belongs to K or not. The complexity of such an algorithm is measured by both the number of calls to the oracle and the number of arithmetic operations.

SLIDE 11

Complexity of deterministic algorithm

Any deterministic algorithm for computing the volume of a convex body has been proved to have exponential complexity:

◮ G. Elekes, A geometric inequality and the complexity of computing volume, Discrete and Computational Geometry, 1, 289–292, (1986)
◮ I. Bárány and Z. Füredi, Computing the volume is difficult, Proceedings of the Eighteenth Annual ACM Symposium on Theory of Computing, 442–447, (1986)

SLIDE 12

Exponential complexity

Let us have a look at one of these articles. Let S be a ball in R^n of volume 1. Choose any m points P₁, P₂, …, P_m ∈ S. Denote by C_m the convex hull of {P_i : 1 ≤ i ≤ m}, and by v(n, m) the maximum volume of C_m over all possible sets of points.

Theorem

v(n, m) ≤ m / 2^n

SLIDE 13

Proof of the theorem

Let O be the center of S. Denote by S_i (1 ≤ i ≤ m) the ball with diameter OP_i. Naturally, vol(S_i) ≤ 1/2^n for all i.

Statement

C_m ⊆ ∪{S_i : 1 ≤ i ≤ m}

This clearly implies the Theorem.
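The covering claim can be checked numerically (an illustration of ours, not from the slides): with O at the origin, a point Q lies in the ball S_i with diameter OP_i exactly when (Q − O)·(Q − P_i) ≤ 0, i.e. the angle ∠OQP_i is at least π/2. A sketch in R³ with random points:

```python
import random

random.seed(0)
n, m = 3, 8  # dimension and number of points, chosen arbitrarily for the demo

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def random_point_in_ball(n):
    # rejection sampling from the unit ball
    while True:
        p = [random.uniform(-1, 1) for _ in range(n)]
        if dot(p, p) <= 1:
            return p

pts = [random_point_in_ball(n) for _ in range(m)]

for _ in range(1000):
    # a random convex combination of the P_i gives a point Q of C_m
    w = [random.random() for _ in range(m)]
    s = sum(w)
    q = [sum(w[j] * pts[j][k] for j in range(m)) / s for k in range(n)]
    # Q must belong to at least one S_i, i.e. Q·(Q - P_i) <= 0 for some i
    assert min(dot(q, [qk - pk for qk, pk in zip(q, p)]) for p in pts) <= 1e-12
print("all sampled hull points are covered by the balls S_i")
```

The equivalence used in the membership test is exactly the one from the proof on the next slide: Q ∈ S_i means |Q − P_i/2| ≤ |P_i|/2, which expands to Q·(Q − P_i) ≤ 0.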

SLIDE 14

Proof of the statement

Suppose that the claim is false, i.e., there is a point Q of C_m which is not contained in any S_i. This is equivalent to the property that ∠OQP_i < π/2 for all 1 ≤ i ≤ m.

Consider the hyperplane H orthogonal to OQ and passing through Q. The condition ∠OQP_i < π/2 implies that for all 1 ≤ i ≤ m, P_i lies in the same open halfspace determined by H as O. Hence Q, which is on H, cannot be in the convex hull, a contradiction.

SLIDE 15

Updated oracle

Consider a well-guaranteed separation oracle that in addition gives us:

◮ a ball S containing K
◮ a ball S′ contained in K
◮ in case of a “NO” answer, a hyperplane that separates the query point from the body K

SLIDE 16

Theorem

Theorem

Suppose that an algorithm has access to a well-guaranteed separation oracle encoding K. If for some c = c(n) < 1 the algorithm can give an estimate v₀ for Vol(K) up to a factor c, i.e., c · v₀ ≤ Vol(K) ≤ v₀, then its running time is at least c · 2^n − (n + 1).

SLIDE 17

Proof of the theorem

Suppose the oracle answers “yes” iff P ∈ S and shows a hyperplane separating P from S otherwise. Suppose, moreover, that it gives the vertices of a regular simplex inscribed in S to be in K (without being asked for these points). Note that this yields an inscribed ball S′ of K. If the algorithm asks fewer than c · 2^n − (n + 1) other points, then it will “know” only m ≤ c · 2^n points P₁, …, P_m to be in K. Their convex hull C_m will have vol(C_m) ≤ Vol(S) · m/2^n < Vol(S) · c by Theorem 1. So the algorithm cannot conclude that K, which may still be as large as S itself or as small as C_m, has volume either at least c · Vol(S) or less than Vol(S).

SLIDE 18

Corollary

Corollary

If an algorithm has access to a well-guaranteed separation oracle and can compute the volume of K up to a factor (2 − ε)^n, then its running time is exponential.

Proof: From the last theorem we have, for c = (2 − ε)^(−n), that the running time is at least [2/(2 − ε)]^n − (n + 1).

SLIDE 19

Randomized approach to the volume computation

Against the backdrop of these complexity estimates for deterministic algorithms (computing the volume of even an explicitly given polytope is a #P-hard problem), the breakthrough result of Dyer, Frieze and Kannan established a randomized polynomial-time algorithm for estimating the volume to within any desired accuracy.

SLIDE 20

Randomized approach to the volume computation

The DFK algorithm for computing the volume of a convex body K in R^n given by a membership oracle uses a sequence of convex bodies K₀, K₁, …, K_m = K, starting with the unit ball fully contained in K and ending with K. Each successive body K_i = 2^(i/n) B_n ∩ K is a slightly larger ball intersected with K. Using random sampling, the algorithm estimates the ratios of volumes of consecutive bodies. The product of these ratios times the volume of the unit ball is the estimate of the volume of K.

SLIDE 21

Random sampling

Sampling is achieved by a random walk in the convex body. There were many technical issues to be addressed, but the central challenge was to exhibit a random walk that “mixed” rapidly, i.e. converged to its stationary distribution in a polynomial number of steps. The overall complexity of the algorithm was O∗(n²³) oracle calls.

SLIDE 22

Markov chain of body K

Consider in R^n a regular grid of size δ and the corresponding cubes. Each cube that intersects the body K is a state of the Markov chain; from each cube the walk jumps to each neighboring cube with probability 1/(4n) and stays put with probability 1/2.

SLIDE 23

Ergodic Markov chain

It is easy to see that our Markov chain is “irreducible”, i.e. for each pair of states i, j there is a natural number s such that p^(s)_{ij} is nonzero. This follows since the graph of the natural random walk is connected. Also, the Markov chain can be seen to be aperiodic, i.e. gcd{s : p^(s)_{ij} > 0} = 1 for all i, j. This follows from the facts that the graph is connected and each cube has a self-loop. Hence, the chain is “ergodic” and there exist “stationary” probabilities π₁, π₂, …, π_N > 0 such that

lim_{s→∞} p^(s)_{ij} = π_j for all i, j

This means that if we sample a point after a large number of steps of the random walk, it will be close to a sample from the uniform measure.
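A toy version of this chain (our illustration, not from the slides) in dimension n = 1: states are N cubes in a row, the walk stays put with probability 1/2 and moves to each neighbour with probability 1/(4n) = 1/4, with blocked moves at the ends turned into extra self-loops. The transition matrix is symmetric, so the stationary distribution is uniform, and iterating the chain shows convergence:

```python
# Lazy grid walk on N cubes in a row (n = 1 case of the slide's chain).
N = 10
P = [[0.0] * N for _ in range(N)]
for i in range(N):
    P[i][i] = 0.5                    # laziness: stay with probability 1/2
    for j in (i - 1, i + 1):
        if 0 <= j < N:
            P[i][j] = 0.25           # jump to each neighbour with prob 1/(4n)
        else:
            P[i][i] += 0.25          # blocked move becomes a self-loop

dist = [1.0] + [0.0] * (N - 1)       # start concentrated in the first cube
for _ in range(2000):
    dist = [sum(dist[i] * P[i][j] for i in range(N)) for j in range(N)]

# after many steps the distribution is essentially uniform, 1/N per cube
print(max(abs(d - 1.0 / N) for d in dist))
```

The symmetry of P (doubly stochastic matrix) is what forces the uniform stationary distribution, mirroring the uniform measure on K in the slide.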

SLIDE 24

Speed of convergence and conductance

Definition

The conductance φ of a Markov chain with state space K, next-step distribution P_x and stationary distribution Q is defined as:

φ = min_{S⊂K} [ ∫_S P_x(K \ S) dQ(x) ] / min{Q(S), Q(K \ S)}

On the next slide we will see how the speed of convergence to the stationary distribution depends on φ. For now, note the similarity of this expression to C_µ:

C_µ = min_A µ⁺(A) / min{µ(A), µ(Aᶜ)}

SLIDE 25

Speed of convergence and conductance

Here is just one example, showing that the speed of convergence of a Markov chain depends strongly on its conductance.

Theorem

For a time-reversible ergodic Markov chain with all π_j equal, and p_{i,i} ≥ 1/2 for all i,

|p^(t)_{i,j} − π_j| ≤ (1 − φ²/2)^t
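The bound can be verified numerically on a tiny chain (our check, not from the slides): take the lazy reflecting walk on 4 states (uniform stationary distribution, p_{i,i} ≥ 1/2), compute φ exactly by enumerating all cuts, and compare |p^(t)_{i,j} − π_j| against (1 − φ²/2)^t:

```python
from itertools import combinations

# Lazy reflecting walk on 4 states: uniform pi, p_{i,i} >= 1/2.
N = 4
P = [[0.5 if i == j else (0.25 if abs(i - j) == 1 else 0.0) for j in range(N)]
     for i in range(N)]
for i in (0, N - 1):
    P[i][i] += 0.25                  # reflecting ends: blocked move -> self-loop
pi = [1.0 / N] * N

def conductance(P, pi):
    # exact conductance by enumerating every nontrivial cut S
    phi = float("inf")
    states = range(len(pi))
    for size in range(1, len(pi)):
        for S in combinations(states, size):
            S = set(S)
            flow = sum(pi[i] * P[i][j] for i in S for j in states if j not in S)
            phi = min(phi, flow / min(sum(pi[i] for i in S),
                                      1 - sum(pi[i] for i in S)))
    return phi

phi = conductance(P, pi)
Pt = [row[:] for row in P]           # Pt = P^t, starting at t = 1
for t in range(1, 50):
    bound = (1 - phi * phi / 2) ** t
    assert all(abs(Pt[i][j] - pi[j]) <= bound + 1e-12
               for i in range(N) for j in range(N))
    Pt = [[sum(Pt[i][k] * P[k][j] for k in range(N)) for j in range(N)]
          for i in range(N)]
print("conductance:", phi, "- bound holds for t = 1..49")
```

For this chain the bottleneck cut is the middle edge, giving φ = 1/8, and the actual spectral decay is much faster than the conductance bound, as expected.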

SLIDE 26

Sketch of the algorithm

Now we have a sketch of the volume-computing algorithm for a body K containing the unit ball B_n:

◮ Consider the sets K_i = 2^(i/n) B_n ∩ K, i = 1 … m
◮ Using random sampling, estimate Vol(K_{i+1}) / Vol(K_i)
◮ Take the product of all these estimates times the volume of B_n

In the initial article the complexity of O∗(n²³) was shown. Later, using the same approach, the complexity O∗(n⁴) was achieved.
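The telescoping scheme can be sketched in code (our toy version, not the slides' algorithm): here K is the square [−2, 2]² in R², which contains the unit ball, and we sample uniformly from each K_i by simple rejection sampling instead of the random walk the real algorithm uses:

```python
import math
import random

random.seed(1)

# Toy DFK telescoping in R^2 for K = [-2, 2]^2 (contains the unit ball).
n = 2
R = 2 * math.sqrt(2)                 # K is contained in the ball R*B_n
m = math.ceil(n * math.log2(R))      # number of phases, K_m = K

def in_K(x, y):
    return abs(x) <= 2 and abs(y) <= 2

def sample_Ki(i, num):
    # uniform samples from K_i = 2^{i/n} B_n ∩ K via rejection sampling
    # (stand-in for the grid/ball walk of the actual algorithm)
    r = 2 ** (i / n)
    pts = []
    while len(pts) < num:
        x, y = random.uniform(-2, 2), random.uniform(-2, 2)
        if x * x + y * y <= r * r and in_K(x, y):
            pts.append((x, y))
    return pts

vol = math.pi                        # Vol(K_0) = Vol(unit ball in R^2)
for i in range(m):
    pts = sample_Ki(i + 1, 20000)    # sample from the larger body K_{i+1}
    r = 2 ** (i / n)
    frac = sum(1 for x, y in pts if x * x + y * y <= r * r) / len(pts)
    vol /= frac                      # Vol(K_{i+1}) = Vol(K_i) / frac

print("estimated Vol(K):", vol, " true:", 16.0)
```

Each phase estimates Vol(K_i)/Vol(K_{i+1}) as the fraction of K_{i+1}-samples that land in K_i; since K_i ⊆ K_{i+1} and the ratio is bounded below by a constant, a modest number of samples per phase suffices.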

SLIDE 27

New approach

The next approach relies on sampling a sequence of log-concave distributions, akin to simulated annealing, starting with one that is highly concentrated around a point deep inside the convex body and ending with the uniform distribution. Although the total complexity of O∗(n⁴) was not improved in the original work, the idea was developed further, and we now have an O∗(n³) algorithm for volume computation, constructed in

◮ Ben Cousins, Santosh Vempala, Gaussian Cooling and O∗(n³) Algorithms for Volume and Gaussian Volume

SLIDE 28

Main theorem

Let us formulate the main theorem of the article.

Theorem

There is an algorithm that, for any ε > 0, p > 0 and convex body K in R^n that contains the unit ball and has E_K(||X||²) = O(n), with probability 1 − p approximates the volume of K within relative error ε and has complexity

O( (n³/ε²) log²(n) log²(1/ε) log²(n/ε) log(1/p) ) = O∗(n³)

in the membership oracle model.

SLIDE 29

One more sketch

We combine the ideas of concentrated distributions and of a sequence of distributions in the following way. Consider the family of functions

f(σ², x) = exp(−||x||²/(2σ²)) · 1_{x∈K}

and their integrals

F(σ²) = ∫_K f(σ², x) dx

For large enough σ the function f(σ², ·) is quite close to 1_{x∈K}. For an increasing sequence of σ_i we estimate the ratios

W_i = F(σ²_{i+1}) / F(σ²_i)

Taking at the end F(σ²₀) · W₁ ⋯ W_m, we obtain an estimate of Vol(K).

SLIDE 30

Estimation for Wi

We can estimate the value of W_i if we have samples from µ_i ∼ f(σ²_i, ·). Indeed, let X be the random variable distributed as µ_i and Y = f(σ²_{i+1}, X)/f(σ²_i, X). Then

E Y = ∫_K exp( ||x||²/(2σ²_i) − ||x||²/(2σ²_{i+1}) ) dµ_i(x)

= ∫_K exp( ||x||²/(2σ²_i) − ||x||²/(2σ²_{i+1}) ) · exp(−||x||²/(2σ²_i)) / F(σ²_i) dx

= (1/F(σ²_i)) ∫_K exp( −||x||²/(2σ²_{i+1}) ) dx = F(σ²_{i+1}) / F(σ²_i)
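This identity is easy to confirm by Monte Carlo in one dimension (our check, not from the slides): take K = [−1, 1], where F(σ²) has the closed form σ√(2π)·erf(1/(σ√2)), sample from µ_i by rejection, and compare the empirical mean of Y with the exact ratio:

```python
import math
import random

random.seed(2)

# K = [-1, 1]: F(s2) = integral of exp(-x^2/(2*s2)) over K, in closed form.
def F(s2):
    s = math.sqrt(s2)
    return s * math.sqrt(2 * math.pi) * math.erf(1 / (s * math.sqrt(2)))

s2_i, s2_next = 0.25, 0.30           # two consecutive variances (arbitrary demo values)

def sample_mu(s2, num):
    # sample from the Gaussian restricted to K by rejection
    out, s = [], math.sqrt(s2)
    while len(out) < num:
        x = random.gauss(0, s)
        if -1 <= x <= 1:
            out.append(x)
    return out

xs = sample_mu(s2_i, 200000)
# Y = f(s2_next, X) / f(s2_i, X)
ys = [math.exp(x * x / (2 * s2_i) - x * x / (2 * s2_next)) for x in xs]
est = sum(ys) / len(ys)
print("empirical E[Y]:", est, " exact ratio:", F(s2_next) / F(s2_i))
```

Because σ²_{i+1} is close to σ²_i, the ratio Y stays in a narrow range, so the empirical mean converges quickly; this is exactly why small cooling steps keep the estimator cheap.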

SLIDE 31

Estimation for Wi

Our goal is to estimate E(Y) within some target relative error. The algorithm estimates the quantity E(Y) by taking random sample points X₁, …, X_k and computing the empirical estimate from the corresponding Y₁, …, Y_k:

W_i = (1/k) Σ_{j=1}^{k} Y_j = (1/k) Σ_{j=1}^{k} f_{i+1}(X_j)/f_i(X_j)

The variance of Y divided by its expectation squared gives a bound on how many independent samples X_j are needed to estimate E(Y) within the target accuracy. Denoting σ² = σ²_{i+1} and σ²_i = σ²/(1 + α), we obtain

E(Y²)/E(Y)² = F(σ²/(1+α)) · F(σ²/(1−α)) / F(σ²)²
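The second-moment formula can also be checked numerically in one dimension (our check, with K = [−1, 1] and the same closed form for F as before): the Monte Carlo estimate of E(Y²)/E(Y)² should match F(σ²/(1+α))·F(σ²/(1−α))/F(σ²)²:

```python
import math
import random

random.seed(5)

def F(s2):
    # F(s2) for K = [-1, 1], via the error function
    s = math.sqrt(s2)
    return s * math.sqrt(2 * math.pi) * math.erf(1 / (s * math.sqrt(2)))

s2, a = 0.5, 0.2                     # sigma^2_{i+1} and the cooling parameter alpha
s2_i = s2 / (1 + a)                  # sigma^2_i

xs = []
while len(xs) < 200000:              # rejection sampling from mu_i
    x = random.gauss(0, math.sqrt(s2_i))
    if abs(x) <= 1:
        xs.append(x)

# Y = f_{i+1}(x)/f_i(x) = exp(a * x^2 / (2 * s2))
ys = [math.exp(x * x * a / (2 * s2)) for x in xs]
lhs = (sum(y * y for y in ys) / len(ys)) / (sum(ys) / len(ys)) ** 2
rhs = F(s2 / (1 + a)) * F(s2 / (1 - a)) / F(s2) ** 2
print("Monte Carlo E(Y^2)/E(Y)^2:", lhs, " formula:", rhs)
```

Note the formula requires α < 1 (otherwise F(σ²/(1−α)) is not defined), which is the quantitative reason the cooling steps must stay moderate.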

SLIDE 32

Cooling rate

The algorithm has two parts, and the cooling rate α_i is different for them. In the first part, starting with a Gaussian of variance σ² = 1/(4n), which has almost all its measure inside the ball contained in K, we increase σ² by a fixed factor of 1 + 1/n in each phase until the variance σ² reaches 1. For each σ, we sample random points from the corresponding distribution and estimate the ratio of the densities for the current phase and the next phase by averaging over samples. The total complexity for the first part is

◮ O∗(n) phases
◮ O∗(1) samples per phase
◮ O∗(n²) time per sample

which gives O∗(n³) in total.

SLIDE 33

Cooling rate

In the second part, we increase the variance until it reaches C²n, after which one final phase suffices to compare with the target uniform distribution. However, we cannot afford to cool at the same rate of 1 + 1/n, because the time per sample grows to O∗(σ²n²) for σ > 1. By the end of this part we would be using O∗(n³) time per sample, and the overall complexity would be O∗(n⁴). Instead, we observe that we can cool at a faster rate of 1 + σ²/(2C²n) and still maintain that the variance of the ratio estimator is a constant.

SLIDE 34

Ball walk

Now we come back to the sampling part of the algorithm. Consider some non-negative function f restricted to the body K and a corresponding random walk that we will use as a sampling tool.
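A minimal sketch of such a walk (our illustration, assuming a Metropolis-filtered ball walk and, hypothetically, the cube [−1, 1]^n as K given by a membership test): propose a uniform point in a small ball around the current point, reject moves that leave K, and accept the rest with the Metropolis probability for the target density f:

```python
import math
import random

random.seed(3)

def in_K(x):
    # membership oracle for the demo body K = [-1, 1]^n
    return all(abs(c) <= 1 for c in x)

def f(x, s2):
    # target density (up to normalization): Gaussian restricted to K
    return math.exp(-sum(c * c for c in x) / (2 * s2))

def ball_walk_step(x, delta, s2):
    n = len(x)
    # propose a uniform point in the ball of radius delta around x
    while True:
        step = [random.uniform(-delta, delta) for _ in range(n)]
        if sum(c * c for c in step) <= delta * delta:
            break
    y = [a + b for a, b in zip(x, step)]
    if not in_K(y):
        return x                     # proposal outside K: the walk wastes a step
    if random.random() < min(1.0, f(y, s2) / f(x, s2)):
        return y                     # Metropolis accept
    return x

n, s2, delta = 3, 1.0, 0.5
x = [0.0] * n
samples = []
for t in range(60000):
    x = ball_walk_step(x, delta, s2)
    if t > 10000:                    # discard a burn-in prefix
        samples.append(x)

# sanity check: the target is centrally symmetric, so coordinate means are ~0
mean0 = sum(s[0] for s in samples) / len(samples)
print("empirical mean of first coordinate:", mean0)
```

The "waste a step" branch is exactly the local-conductance problem mentioned on the next slide: near a sharp corner almost all proposals leave K, so the walk barely moves.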

SLIDE 35

Sampling part

To show that the random walk quickly reaches its stationary distribution, we use the standard method of bounding the conductance. For the ball walk this runs into a hurdle: the local conductance of points near sharp corners of the body can be arbitrarily small, so the walk can get stuck and waste a large number of steps. To avoid this, we start the walk from a random point chosen from a distribution sufficiently close to the target distribution. Again, the closeness of the distributions is guaranteed by the smallness of the cooling rate α_i.

SLIDE 36

The whole algorithm again

Summarizing all of the above:

1. First we sample points from the initial distribution f(σ²₀, ·).
2. Then, while σ²_i ≤ C²n:
◮ estimate W_i = F(σ²_{i+1}) / F(σ²_i);
◮ apply the ball walk to our points until they are close to the next distribution f(σ²_{i+1}, ·).
3. Return F(σ²₀) · W₁ ⋯ W_m as the answer.

It is interesting that in this setting the problem of sampling from the uniform distribution has the same complexity as volume computation.
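The three steps above can be sketched end to end in a toy setting (our illustration, not the paper's implementation): K = [−1, 1]², rejection sampling standing in for the ball walk, a hypothetical cooling factor 1 + 1/(2√n) instead of the paper's schedule, and the closed form of F used only for the exactly computable starting value:

```python
import math
import random

random.seed(4)

n = 2
def in_K(x):
    return all(abs(c) <= 1 for c in x)   # K = [-1, 1]^2, Vol(K) = 4

def F(s2):
    # exact F for the cube, used for the starting value F(sigma_0^2)
    s = math.sqrt(s2)
    return (s * math.sqrt(2 * math.pi) * math.erf(1 / (s * math.sqrt(2)))) ** n

def sample_mu(s2, num):
    # rejection sampling stands in for the ball walk of the real algorithm
    out, s = [], math.sqrt(s2)
    while len(out) < num:
        x = [random.gauss(0, s) for _ in range(n)]
        if in_K(x):
            out.append(x)
    return out

s2 = 1 / (4 * n)
vol = F(s2)                              # step 1: start from f(sigma_0^2, .)
while s2 < 4 * n:                        # step 2: cool until nearly flat on K
    s2_next = s2 * (1 + 1 / (2 * math.sqrt(n)))  # toy cooling schedule
    xs = sample_mu(s2, 5000)
    W = sum(math.exp(sum(c * c for c in x) * (1 / (2 * s2) - 1 / (2 * s2_next)))
            for x in xs) / len(xs)       # estimate F(s2_next)/F(s2)
    vol *= W
    s2 = s2_next
# final phase: compare with the uniform distribution (density 1 on K)
xs = sample_mu(s2, 5000)
W = sum(math.exp(sum(c * c for c in x) / (2 * s2)) for x in xs) / len(xs)
vol *= W                                 # step 3: telescoping product = Vol(K)

print("estimated Vol(K):", vol, " true:", 4.0)
```

The product telescopes: F(σ²₀) times the ratio estimates gives F(σ²_m), and the final phase multiplies by an estimate of Vol(K)/F(σ²_m), so everything cancels except Vol(K).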

SLIDE 37

Thanks for your attention!
