SLIDE 1

PECULIARITIES OF LARGE DIMENSIONS and some repercussions

Anatoly Zhigljavsky

Cardiff University, MATHS

Cardiff, November 7, 2018

SLIDE 2

Plan

  • I. Large dimensions
  • II. Applications to global optimization
  • III. Other repercussions
  • IV. Conclusions
SLIDE 3–4

Chapter I. Large dimensions

where we learn that our intuition often deceives us

SLIDE 5

Dimension

R^d

Small dimension: d = 1, 2, 3
Medium dimension: d = 10, 20 (MANY)
Large dimension: d = 100 (REALLY MANY)

SLIDE 6

Volume of the d-dimensional unit ball B(0, 1) = {x ∈ R^d : ‖x‖ ≤ 1}

V_d = vol(B(0, 1)) = π^{d/2} / Γ(d/2 + 1)

SLIDE 7

Volume of the d-dimensional unit ball

Figure: log_10 V_d as a function of d. E.g., V_100 ≃ 2.368 · 10^{−40}.
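
As a numerical cross-check (my sketch, not part of the deck), the formula above can be evaluated directly with math.gamma; it reproduces V_100 ≃ 2.368 · 10^{−40}:

```python
import math

def unit_ball_volume(d: int) -> float:
    """V_d = pi^(d/2) / Gamma(d/2 + 1), the volume of the unit ball in R^d."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

for d in (1, 2, 3, 10, 20, 100):
    print(f"V_{d} = {unit_ball_volume(d):.4g}")
# V_1 = 2, V_2 = pi, V_3 = 4*pi/3, ... and V_100 ~ 2.368e-40:
# in high dimension the unit ball has essentially no volume.
```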

SLIDE 8

d-dimensional ball

Almost all the volume is near the equator:

  • Th. For any c > 0, the fraction of the volume of the unit ball above the plane x_1 = c/√(d − 1) is less than (2/c) exp{−c^2/2}.

SLIDE 9

d-dimensional ball

Almost all the volume is also near the boundary, in B(0, 1) \ B(0, 1 − ε) with ε = c/d: indeed, vol(B(0, 1 − ε))/vol(B(0, 1)) = (1 − ε)^d ≃ 0 for ε = c/d, large d and c fixed but large enough. The radius of a uniform random point has density p_d(r) = d r^{d−1}, 0 ≤ r ≤ 1.
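
A small Monte Carlo sketch (my addition, assuming only the density p_d(r) = d r^{d−1} above) makes the thin shell visible; the radius is sampled exactly by inverse transform, r = u^{1/d} with u uniform on (0, 1):

```python
import random

def shell_fraction(d: int, c: float, n: int = 100_000) -> float:
    """Fraction of uniform points in the unit ball of R^d whose radius
    exceeds 1 - c/d; radii are drawn from p_d(r) = d*r**(d-1) via r = u**(1/d)."""
    eps = c / d
    return sum(random.random() ** (1 / d) > 1 - eps for _ in range(n)) / n

# Theory: the fraction inside radius 1 - c/d is (1 - c/d)^d -> exp(-c),
# so for c = 2 about 86% of the volume lies in a shell of width 2/d.
for d in (10, 100, 1000):
    print(d, shell_fraction(d, 2.0))
```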

SLIDE 10–11

Random points in a 100-d ball; projection to 2 dimensions

B(0, 1) = {x ∈ R^d : x_1^2 + x_2^2 + · · · + x_d^2 ≤ 1}

SLIDE 12

d-dimensional cube and ball

Unit cube: {x = (x_1, . . . , x_d) ∈ R^d : |x_i| ≤ 1/2}
Unit ball: B(0, 1) = {x ∈ R^d : ‖x‖ ≤ 1}
Length of the cube's half-diagonal: √( (1/2)^2 + (1/2)^2 + · · · + (1/2)^2 ) = √d / 2

SLIDE 13

d-dimensional cube

SLIDE 14

Shape of the d-dimensional cube

[−1/2, 1/2]^2    [−1/2, 1/2]^3    [−1/2, 1/2]^8

SLIDE 20–22

Volume of the largest ball inscribed into the unit cube

Volume of the cube = 1;  v_d = π^{d/2} / (2^d Γ(1 + d/2))  (volume of the ball of radius 1/2)

v_2 = π/4 ≃ 0.78,  v_3 = π/6 ≃ 0.52,  v_10 ≃ 0.0025,  v_20 ≃ 0.25 · 10^{−7},  v_100 ≃ 10^{−70}
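
The same kind of one-line check (mine, not the deck's) reproduces these values of v_d:

```python
import math

def inscribed_ball_volume(d: int) -> float:
    """v_d = pi^(d/2) / (2^d * Gamma(1 + d/2)): ball of radius 1/2 in R^d."""
    return math.pi ** (d / 2) / (2 ** d * math.gamma(1 + d / 2))

for d in (2, 3, 10, 20, 100):
    print(f"v_{d} = {inscribed_ball_volume(d):.2g}")
# v_2 = pi/4 ~ 0.785, v_3 = pi/6 ~ 0.524, v_10 ~ 0.0025,
# v_20 ~ 2.5e-8, v_100 ~ 1.9e-70: the inscribed ball all but vanishes.
```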

SLIDE 23

small ball in-between large ones, d = 2

SLIDE 24

small ball in-between large ones, d = 3

SLIDE 25–29

‘small’ ball in-between ‘large’ ones, d ≥ 3

Cube [−1, 1]^d; the centers of the ‘large’ balls of radius 1/2 are (±1/2, . . . , ±1/2). Therefore the radius of the ‘small’ ball is r_d = (√d − 1)/2.
E.g., r_1 = 0, r_2 ≃ 0.207, r_3 ≃ 0.366, r_4 = 1/2, r_9 = 1, r_100 = 4.5.
For d > 1205, the volume of the ‘small’ ball is larger than 2^d!
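
The d > 1205 threshold can be verified in the log domain (a sketch of mine, using math.lgamma to avoid overflow):

```python
import math

def log_small_ball_volume(d: int) -> float:
    """log of vol(B(0, r_d)) with r_d = (sqrt(d) - 1)/2 in R^d,
    via log V_d = (d/2)*log(pi) - lgamma(d/2 + 1), plus d*log(r_d)."""
    r = (math.sqrt(d) - 1) / 2
    return (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1) + d * math.log(r)

# First dimension where the 'small' ball outgrows the cube [-1, 1]^d:
d = 2
while log_small_ball_volume(d) <= d * math.log(2):
    d += 1
print(d)  # prints 1206: the ball beats the cube for d > 1205, as claimed
```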

SLIDE 30

Covering of the space (Conway & Sloane)

Θ_d (thickness) = average number of balls that contain a random point. Some values of this thickness are: Θ_2 ≃ 1.2092, Θ_3 ≃ 1.4635, Θ_10 ≃ 5.2517, Θ_20 ≃ 31.14.

SLIDE 31

Packing (Conway & Sloane)

∆_d (density) = proportion of the space occupied by the balls. Some values of this density are: ∆_2 ≃ 0.906, ∆_3 ≃ 0.74, ∆_10 ≃ 0.099, ∆_20 ≃ 0.0032.

SLIDE 32–34

Covering and packing, d = 100

Θ_d (thickness of covering) = average number of balls that contain a random point.
∆_d (packing density) = proportion of the space occupied by the balls.
Θ_2 ≃ 1.2092, Θ_3 ≃ 1.4635, Θ_10 ≃ 5.2517, Θ_20 ≃ 31.14, Θ_100 ≃ 4.28 · 10^7 (an average point is covered more than 40 million times!)
∆_2 ≃ 0.906, ∆_3 ≃ 0.74, ∆_10 ≃ 0.099, ∆_20 ≃ 0.0032, ∆_100 < 10^{−26} (less than 0.000000000000000000000001% of the space is occupied by the balls!)
SLIDE 35

Uniform random points on a square

SLIDE 36

Uniform points in a cube are at almost the same distance from each other

The distribution of the distances ‖x − y‖ = √( Σ_{i=1}^d (x_i − y_i)^2 ) is concentrated around its expected value, which is approximately √(d/6).

Similar results hold for the unit ball and for distributions other than the uniform.
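
To see the concentration numerically, here is a plain Monte Carlo sketch (my addition): the mean distance tracks √(d/6) while the standard deviation stays O(1), so the relative spread vanishes as d grows:

```python
import math
import random

def distance_stats(d: int, n_pairs: int = 5_000):
    """Mean and std of ||x - y|| for independent x, y uniform in [0, 1]^d."""
    dists = [math.sqrt(sum((random.random() - random.random()) ** 2
                           for _ in range(d)))
             for _ in range(n_pairs)]
    mean = sum(dists) / n_pairs
    sd = math.sqrt(sum((t - mean) ** 2 for t in dists) / n_pairs)
    return mean, sd

for d in (2, 20, 200):
    mean, sd = distance_stats(d)
    print(d, round(mean, 3), round(math.sqrt(d / 6), 3), round(sd, 3))
```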

SLIDE 37

Gaussian distribution (density function)

SLIDE 38

Gaussian random vectors

If x is Gaussian N(0, I_d), then the distance from the origin r = √( Σ_{i=1}^d x_i^2 ) is very close to √d. More precisely, for any 0 < β < √d,

Pr{ √d − β ≤ r ≤ √d + β } ≥ 1 − 3 exp{−β^2/64}.

Two i.i.d. Gaussian vectors are almost orthogonal to each other. Similar results hold for uniform random vectors in a ball and in a cube.
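
A direct simulation (my sketch, not from the deck) shows both effects: the norm concentrating at √d and the near-orthogonality of independent vectors:

```python
import math
import random

d = 10_000
x = [random.gauss(0.0, 1.0) for _ in range(d)]
y = [random.gauss(0.0, 1.0) for _ in range(d)]

norm_x = math.sqrt(sum(v * v for v in x))
norm_y = math.sqrt(sum(v * v for v in y))
cos_xy = sum(a * b for a, b in zip(x, y)) / (norm_x * norm_y)

print(norm_x, math.sqrt(d))  # both close to 100: r concentrates at sqrt(d)
print(cos_xy)                # O(1/sqrt(d)), i.e. nearly orthogonal
```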

SLIDE 39

Random projections

Johnson–Lindenstrauss Lemma. For any 0 < ε < 1 and any integer n, let k ≥ c ε^{−2} log n for some constant c > 0. For any set of n points in R^d, the (suitably scaled) random projection f : R^d → R^k has the property that, with probability at least 1 − 3/(2n), for all pairs of points v_i and v_j,

(1 − ε) ‖v_i − v_j‖ ≤ ‖f(v_i) − f(v_j)‖ ≤ (1 + ε) ‖v_i − v_j‖.
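
A bare-hands illustration of the lemma (mine; the 1/√k scaling is one standard normalization choice, and the sizes n = 10, d = 1000, k = 300 are arbitrary):

```python
import math
import random

def random_projection(points, k):
    """Multiply by a k x d Gaussian matrix scaled by 1/sqrt(k), so that
    squared distances are preserved in expectation (JL-style projection)."""
    d = len(points[0])
    R = [[random.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(d)]
         for _ in range(k)]
    return [[sum(row[i] * p[i] for i in range(d)) for row in R] for p in points]

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

random.seed(0)
pts = [[random.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(10)]
proj = random_projection(pts, k=300)

ratios = [dist(proj[i], proj[j]) / dist(pts[i], pts[j])
          for i in range(10) for j in range(i + 1, 10)]
print(min(ratios), max(ratios))  # all pairwise distances distorted only mildly
```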

SLIDE 40–41

Chapter II. Applications to global optimization

where we do not see many reasons for optimism

SLIDE 42

Global optimization

f(x) → min_{x ∈ A} ;   x* = arg min_{x ∈ A} f(x)

SLIDE 43

Random points in a ball; projection to 2 dimensions

SLIDE 44

How far are the points from the boundary? d ∈ [5, 200]

Figure: The difference y_{1,n} − f* for n = 10^6 (solid) and n = 10^10 (dashed), where y_{1,n} is the record of evaluations of the function f(x) = e_1^T x at points x_1, . . . , x_n with uniform distribution in the unit ball, as the dimension d varies in [5, 200].

SLIDE 45

Are quasi-random points any better?

Figure: Boxplots of y_{1,n} and y_{4,n} for 500 runs with points generated from the Sobol low-dispersion sequence (left) and the uniform distribution (right), d = 20.

SLIDE 46

Rate of convergence of simple random search

The number of points n_γ required to hit a ball of radius ε centered at the minimizer, with probability ≥ 1 − γ, for different dimensions d:

  d  |          γ = 0.1                  |          γ = 0.05
     | ε = 0.5    ε = 0.2     ε = 0.1    | ε = 0.5    ε = 0.2     ε = 0.1
   1 |               5           11      |               6           14
   2 |    2         18           73      |    2         23           94
   3 |    4         68          549      |    5         88          714
   5 |   13       1366        43743      |   17       1788        56911
  10 |  924     8.8·10^6    9.0·10^9     | 1202     1.1·10^7    1.2·10^10
  20 | 9.4·10^7  8.5·10^15   8.9·10^21   | 1.2·10^8  1.1·10^16   1.2·10^22
  50 | 1.5·10^28 1.2·10^48   1.3·10^63   | 1.9·10^28 1.5·10^48   1.7·10^63
 100 | 1.2·10^70 7.7·10^109  9.7·10^139  | 1.6·10^70 1.0·10^110  1.3·10^140

n_γ is roughly ε^{−d}/V_d (multiplied by −ln γ); recall V_100 ≃ 10^{−40}.
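
The table entries can be reproduced exactly (my sketch) from n_γ = ⌈ln γ / ln(1 − p)⌉ with hitting probability p = V_d ε^d, assuming the ε-ball lies inside the unit cube:

```python
import math

def n_gamma(d: int, eps: float, gamma: float) -> int:
    """Sample size so that uniform points in [0, 1]^d hit a ball of radius
    eps (fully inside the cube) with probability at least 1 - gamma."""
    p = math.pi ** (d / 2) / math.gamma(d / 2 + 1) * eps ** d  # p = V_d*eps^d
    if p >= 1.0:
        return 1  # the very first point hits with certainty
    return math.ceil(math.log(gamma) / math.log(1.0 - p))

print(n_gamma(2, 0.2, 0.1))   # 18, as in the table
print(n_gamma(5, 0.1, 0.05))  # 56911, as in the table
```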

SLIDE 47

Convergence: Borel–Cantelli lemma

A global random search algorithm converges if

Σ_{j=1}^∞ inf P_j(B(x*, ε)) = ∞    (1)

for any ε > 0, where B(x*, ε) = {x ∈ A : ‖x − x*‖ ≤ ε}; the infimum in (1) is taken over all possible previous points and the results of the objective function evaluations at them.

Standard choice of probability distributions to guarantee convergence:

P_{j+1} = α_{j+1} P_U + (1 − α_{j+1}) Q_j ,   with Σ_j α_j = ∞ .

SLIDE 48

Example: P_{j+1} = α_{j+1} P_U + (1 − α_{j+1}) Q_j ,  α_j = 1/j .

Using the approximation Σ_{j=1}^n α_j ≃ ln n and solving P_U(B) ln n = −ln γ for n, we obtain n_γ ≃ exp{−ln γ / P_U(B)}. If A = [0, 1]^d and B = B(x*, ε), this gives n_γ ≃ exp{const · ε^{−d}}, where const = (−ln γ)/V_d (if x* lies closer to the boundary of A than ε, then n_γ is even larger). For example, for γ = 0.1, d = 10 and ε = 0.1, n_γ is a number larger than 10^{1000000000}. Even for d = 3, γ = 0.1 and ε = 0.1, the value of n_γ is huge: n_γ ≃ 10^{238}.
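
Checking these two magnitudes takes a few lines in the log domain (my sketch):

```python
import math

gamma, eps = 0.1, 0.1
for d in (3, 10):
    V_d = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    log10_n = (-math.log(gamma) / V_d) * eps ** (-d) / math.log(10)
    print(d, log10_n)  # d=3: ~239, so n ~ 10^238; d=10: ~3.9e9, so n > 10^(10^9)
```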

SLIDE 49–50

Simulated Annealing (SA) and Quantum Annealing (QA)

can they help us in getting faster convergence?

SLIDE 51

SA and Gibbs densities

SA accepts the move x_k → x_{k+1} with probability 1 if f(x_{k+1}) ≤ f(x_k), and with probability exp{−(f(x_{k+1}) − f(x_k))/(K t_k)} if f(x_{k+1}) > f(x_k).

π_β(x) = exp{−β f(x)} / ∫_A exp{−β f(z)} dz ,   β = 1/(K t) .

(A) Graph of the objective function f; (B) Gibbs densities with β = 1 (dotted line) and β = 3 (solid line).
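
A minimal runnable sketch of this acceptance rule (my illustration: the neighbor move and the cooling schedule are placeholder choices, not prescribed by the deck):

```python
import math
import random

def simulated_annealing(f, x0, temps, step=0.1, K=1.0):
    """Accept an improving move always; accept a worsening move with
    probability exp(-(f(x_new) - f(x)) / (K * t)), as on the slide."""
    x = x0
    for t in temps:
        x_new = x + random.uniform(-step, step)  # placeholder neighbor move
        delta = f(x_new) - f(x)
        if delta <= 0 or random.random() < math.exp(-delta / (K * t)):
            x = x_new
    return x

f = lambda x: x * x + math.sin(5 * x)                   # toy multimodal objective
temps = [1.0 / math.log(k + 2) for k in range(20_000)]  # logarithmic-type cooling
print(simulated_annealing(f, x0=3.0, temps=temps))
```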

SLIDE 52

SA, convergence

Geman, S., Geman, D. (1984). "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images." IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721–741. Cited more than 22,000 times. Introduction, p. 3:

SLIDE 53

Geman & Geman (the theorem)

SLIDE 54

Geman & Geman (comment after the theorem)

T(k) = N/log k ≤ t ⇒ k ≥ exp(N/t). With N = 20000 and t = 1/2, k = exp(40000) ≃ 6 · 10^{17371}.

Travelling salesman with 10 cities: N = 10! = 3628800 ⇒ k = exp(3628800) ≃ 6.5 · 10^{1575967}; taking log_2 of this number gives ≃ 5 · 10^6. For 20 cities we get 20! = 2432902008176640000 and ≃ 7 · 10^{18}.
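
These magnitudes are easy to reproduce in the log domain (my check, not the deck's):

```python
import math

N, t = 20000, 0.5
print(N / t / math.log(10))  # ~17371.8, so k = exp(40000) ~ 6 * 10^17371

N = math.factorial(10)       # 3628800
print(N / math.log(10))      # ~1575967.7, so k ~ 6.5 * 10^1575967
print(N / math.log(2))       # ~5.2e6 = log2(k)
```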

SLIDE 55

SA, convergence

The formula T(k) = c/log k with c = N∆ for the temperature decrease in SA is one of the most famous formulas in optimization; see, e.g., the 24th minute of the celebrated Google talk by Hidetoshi Nishimori, "Theory of Quantum Annealing": https://www.youtube.com/watch?v=OQ91L96YWCk

SLIDE 57

My comment on SA in 1985/1991

AZ (1985, 1991):

SLIDE 58

QA versus SA

SLIDE 59

QA in words

QA uses a quantum field instead of a thermal gradient. To explore the landscape of the objective function, SA and its variants use "thermal" fluctuations associated with temperature gradients, while QA uses quantum fluctuations. When QA is applied to a minimization problem, the current state is replaced by a "neighbor state" chosen randomly (or by a more sophisticated method). The main area where QA may be efficient is combinatorial optimization, e.g. the classical Traveling Salesman Problem.

SLIDE 60

QA

Main idea: Hamiltonian at time t:

H(t) = (1 − t/T) H_0 + (t/T) H_q ,   0 ≤ t ≤ T .

Suited to QUBO (Quadratic Unconstrained Binary Optimization):

Σ_{i,j=1}^n Q_{ij} x_i x_j → min over x ∈ {−1, +1}^n
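
For intuition about why this problem is hard classically, here is a brute-force QUBO solver (my sketch); its cost is 2^n, which is exactly the regime annealers target:

```python
import itertools
import random

def qubo_brute_force(Q):
    """Exact minimizer of sum_{i,j} Q[i][j]*x_i*x_j over x in {-1, +1}^n."""
    n = len(Q)
    def energy(x):
        return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    return min(itertools.product((-1, 1), repeat=n), key=energy)

random.seed(1)
n = 12
Q = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
print(qubo_brute_force(Q))  # 2^12 = 4096 candidates; hopeless already for n ~ 60
```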

SLIDE 61

QA versus SA

SLIDE 62

Quantum computer D-Wave

SLIDE 63

What have we reached with quantum computers so far?

Factorization into prime factors: 21 = 3 × 7 (this was a record in 2012; now it is slightly larger, e.g. 56153 = 233 × 241).

QUBO with D-Wave:

Σ_{i,j=1}^n Q_{ij} x_i x_j → min over x ∈ {−1, +1}^n

Largest n?

SLIDE 64

Can DNA computers help?

SLIDE 65

Can the infinity computer help?

Roughly, the grossone-based infinity computer operates with infinitesimals as fast as with ordinary numbers. It’s not built yet.

SLIDE 66

Some references

  • A. Blum, J. Hopcroft, R. Kannan (2018). Foundations of Data Science
  • K. Ball (1997). An elementary introduction to modern convex geometry
  • AZ (1991). Theory of Global Random Search. Kluwer
  • AZ, A. Zilinskas (2001). Stochastic Global Optimization. Springer
  • A. Zilinskas, A. Pepelyshev, AZ (2017). Performance of global random search algorithms for large dimensions. JOGO

SLIDE 67–71

Thank you very much for listening, for participating in this meeting, for your interest in the area of big data