SLIDE 1

The Johnson-Lindenstrauss Lemma in Linear Programming

Leo Liberti, Vu Khac Ky, Pierre-Louis Poirion

CNRS LIX Ecole Polytechnique, France Aussois COW 2016

SLIDE 2

The gist

  • Goal: solving very large LPs

min{c⊤x | Ax = b ∧ x ≥ 0}

  • Trade-off: an answer that is approximate, or wrong with low probability, is OK
  • Means: project cols of Ax = b to random subspace T, get

Ax = b ∧ x ≥ 0 ⇔ TAx = Tb ∧ x ≥ 0 with high probability

  • Bisection: solve LP using [TAx = Tb ∧ x ≥ 0] as oracle
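The gist can be sketched in a few lines of numpy (my own illustration, not code from the talk): by linearity, any x feasible for Ax = b remains feasible for the projected system TAx = Tb, which has only k ≪ m rows.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 500, 700, 60           # original rows, variables, projected rows

A = rng.standard_normal((m, n))
x_feas = rng.random(n)           # some nonnegative point
b = A @ x_feas                   # b chosen so that Ax = b, x >= 0 is feasible

# Random projector T: entries sampled i.i.d. from N(0, 1/sqrt(k))
T = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, m))

# Linearity: any solution of Ax = b also solves the much smaller TAx = Tb
TA, Tb = T @ A, T @ b
assert np.allclose(TA @ x_feas, Tb)
print(TA.shape)                  # k equations instead of m
```

The interesting (and probabilistic) direction is the converse, which the rest of the talk addresses.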

SLIDE 3

Plan

  • Restricted Linear Membership
  • Johnson-Lindenstrauss Lemma
  • Applying JLL to RLM
  • Towards solving LPs

SLIDE 4

Restricted Linear Membership

SLIDE 5

Linear feasibility with constrained multipliers

Restricted Linear Membership (RLM): given vectors A1, . . . , An, b ∈ Rᵐ and X ⊆ Rⁿ, is there x ∈ X s.t. b = ∑_{i≤n} xiAi ?

RLM_X is a fundamental problem class, which subsumes:

  • Linear Feasibility Problem (LFP) with X = Rⁿ₊
  • Integer Feasibility Problem (IFP) with X = Zⁿ₊
  • Efficient solution of LFP/IFP yields a solution of LP/IP via bisection

SLIDE 6

The shape of a set of points

  • Lose dimensions but not too much accuracy

Given A1, . . . , An ∈ Rᵐ, find k ≪ m and points A′1, . . . , A′n ∈ Rᵏ s.t. A and A′ “have almost the same shape”

  • What is the shape of a set of points?

Congruent sets have the same shape.

  • Approximate congruence: A, A′ have almost the same shape if

∀i < j ≤ n  (1 − ε)‖Ai − Aj‖ ≤ ‖A′i − A′j‖ ≤ (1 + ε)‖Ai − Aj‖

for some small ε > 0

Assume all norms are Euclidean.

SLIDE 7

Losing dimensions in the RLM

Given X ⊆ Rⁿ and b, A1, . . . , An ∈ Rᵐ, find k ≪ m and b′, A′1, . . . , A′n ∈ Rᵏ such that, with high probability:

∃x ∈ X  b = ∑_{i≤n} xiAi   (high dimensional)

iff

∃x ∈ X  b′ = ∑_{i≤n} xiA′i   (low dimensional)

  • If this is possible, then solve RLM_X(b′, A′)
  • Since k ≪ m, solving RLM_X(b′, A′) should be faster
  • RLM_X(b′, A′) = RLM_X(b, A) with high probability

SLIDE 8

Losing dimensions = “projection”

[Figure: in the plane, projecting the points onto a line is hopeless; in 3D, projecting onto a plane is no better]

SLIDE 9

The Johnson-Lindenstrauss Lemma

SLIDE 10

Johnson-Lindenstrauss Lemma

Thm. Given A ⊆ Rᵐ with |A| = n and ε > 0, there is k ∼ O(ε⁻² ln n) and a k × m matrix T s.t.

∀x, y ∈ A  (1 − ε)‖x − y‖ ≤ ‖Tx − Ty‖ ≤ (1 + ε)‖x − y‖

If the k × m matrix T is sampled componentwise from N(0, 1/√k), then A and TA have almost the same shape.

Discrete approximations of N(0, 1/√k) can also be used, e.g.

P(Tij = √3/√k) = P(Tij = −√3/√k) = 1/6,  P(Tij = 0) = 2/3

(this makes T sparser).
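The lemma is easy to check empirically (an illustrative numpy sketch of my own; the constant 1.8 in the choice of k is the empirical estimate quoted later in the talk):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, eps = 1000, 50, 0.3
k = int(np.ceil(1.8 / eps**2 * np.log(n)))   # ~ C eps^-2 ln n with C ~ 1.8

A = rng.standard_normal((m, n))              # n points, one per column
T = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, m))
TA = T @ A

# Ratio ||Tx - Ty|| / ||x - y|| over all pairs of points
worst_low, worst_high = 1.0, 1.0
for i in range(n):
    for j in range(i + 1, n):
        r = np.linalg.norm(TA[:, i] - TA[:, j]) / np.linalg.norm(A[:, i] - A[:, j])
        worst_low, worst_high = min(worst_low, r), max(worst_high, r)

print(k, worst_low, worst_high)   # distortion typically stays near [1-eps, 1+eps]
```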

SLIDE 11

Sampling to desired accuracy

  • Distortion has low probability:

∀x, y ∈ A  P(‖Tx − Ty‖ ≤ (1 − ε)‖x − y‖) ≤ 1/n²
∀x, y ∈ A  P(‖Tx − Ty‖ ≥ (1 + ε)‖x − y‖) ≤ 1/n²

  • Probability that some pair x, y ∈ A distorts the Euclidean distance: union bound over the (n choose 2) pairs:

P(¬(A and TA have almost the same shape)) ≤ (n choose 2) · 2/n² = 1 − 1/n
P(A and TA have almost the same shape) ≥ 1/n

⇒ re-sampling T gives the JLL with arbitrarily high probability
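The re-sampling step is plain arithmetic on the bound P(one sampled T works) ≥ 1/n: after t independent draws the failure probability is at most (1 − 1/n)ᵗ ≤ e^(−t/n). An illustrative computation (the numbers are my own, not from the talk):

```python
import math

n, delta = 1000, 1e-6

# t re-samplings fail with probability at most (1 - 1/n)^t <= exp(-t/n),
# so t = ceil(n * ln(1/delta)) draws push the failure probability below delta.
t = math.ceil(n * math.log(1 / delta))
print(t, math.exp(-t / n))        # failure bound is now below delta
```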

SLIDE 12

Sketch of a possible JLL proof

[Figure: concentration of measure on the sphere for n = 3, 11, 101 (90% of the mass in a thin band); sketch of Sᵐ⁻¹, the origin O, and the projection Tu]

Thm. Let T be a k × m matrix with each component sampled from N(0, 1/√k), and let u ∈ Rᵐ with ‖u‖ = 1. Then E(‖Tu‖²) = 1.
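The expectation claim is easy to verify by Monte Carlo (an illustrative numpy sketch of mine, not part of the original proof):

```python
import numpy as np

rng = np.random.default_rng(2)
m, k, trials = 200, 20, 2000

u = rng.standard_normal(m)
u /= np.linalg.norm(u)                        # unit vector

# Each trial: fresh T with entries N(0, 1/sqrt(k)); record ||Tu||^2
vals = []
for _ in range(trials):
    T = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, m))
    vals.append(np.linalg.norm(T @ u) ** 2)

print(np.mean(vals))                          # close to 1
```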

SLIDE 13

In practice

  • Empirical estimation of C in k = C ε⁻² ln n: C ≈ 1.8  [Venkatasubramanian & Wang 2011]
  • Empirically, it suffices to sample T very few times (e.g. once will do!)
  • On average ‖Tx − Ty‖ ≈ ‖x − y‖, and the distortion decreases exponentially with n

We only need a logarithmic number of dimensions as a function of the number of points.
Surprising fact: k is independent of the original number of dimensions m.
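To illustrate these magnitudes (the function name and sample values below are mine; C = 1.8 is the empirical estimate above):

```python
import math

def jll_dim(n: int, eps: float, C: float = 1.8) -> int:
    """Projected dimension k = C * eps^-2 * ln(n); note m does not appear."""
    return math.ceil(C / eps**2 * math.log(n))

# k grows only logarithmically with the number of points n ...
print(jll_dim(10**3, 0.1), jll_dim(10**6, 0.1), jll_dim(10**9, 0.1))
# ... and the original dimension m plays no role at all
```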

SLIDE 14

Typical applications of JLL

Problems involving Euclidean distances only

  • Euclidean clustering

k-means, k-nearest neighbors

  • Linear regression

min_x ‖Ax − b‖₂ where A is m × n with m ≫ n
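The regression application can be sketched in numpy (my own example: solve the sketched problem min ‖T(Ax − b)‖ instead of the full one; k = 500 is an arbitrary sketch size, not a tuned choice):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, k = 5000, 10, 500            # tall regression problem, sketch size k << m

A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n) + 0.1 * rng.standard_normal(m)

x_full, *_ = np.linalg.lstsq(A, b, rcond=None)

# Sketch: solve min ||T(Ax - b)|| over the k projected rows only
T = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, m))
x_sketch, *_ = np.linalg.lstsq(T @ A, T @ b, rcond=None)

print(np.linalg.norm(x_sketch - x_full) / np.linalg.norm(x_full))  # small
```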

SLIDE 15

Applying the JLL to the RLM

SLIDE 16

Projecting infeasibility

Thm. Let T : Rᵐ → Rᵏ be a JLL random projection and b, A1, . . . , An ∈ Rᵐ an RLM_X instance. For any given vector x ∈ X, we have:

(i) If b = ∑_{i=1}^n xiAi then Tb = ∑_{i=1}^n xiTAi

(ii) If b ≠ ∑_{i=1}^n xiAi then P(Tb ≠ ∑_{i=1}^n xiTAi) ≥ 1 − 2e^(−Ck)

(iii) If b ≠ ∑_{i=1}^n yiAi for all y ∈ X ⊆ Rⁿ, where |X| is finite, then

P(∀y ∈ X  Tb ≠ ∑_{i=1}^n yiTAi) ≥ 1 − 2|X| e^(−Ck)

for some constant C > 0 (independent of n, k).

[VPL, arXiv:1507.00990v1/math.OC]
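Parts (i) and (ii) can be illustrated directly (a numpy sketch of mine; the instance is random, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, k = 800, 20, 50

A = rng.standard_normal((m, n))            # columns A1..An
T = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, m))

x = rng.random(n)
b_yes = A @ x                              # b IS the combination sum_i x_i A_i
b_no = rng.standard_normal(m)              # a generic b: almost surely not

# (i): equality is preserved exactly, by linearity of T
assert np.allclose(T @ b_yes, (T @ A) @ x)

# (ii): inequality survives the projection with high probability
print(np.linalg.norm(T @ b_no - (T @ A) @ x))   # bounded away from 0
```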

SLIDE 17

Proof (ii)

Cor. ∀ε ∈ (0, 1) and z ∈ Rᵐ there is a constant C such that
P((1 − ε)‖z‖ ≤ ‖Tz‖ ≤ (1 + ε)‖z‖) ≥ 1 − 2e^(−Cε²k)
Proof: by the JLL.

Lemma. If z ≠ 0, there is a constant C such that P(Tz ≠ 0) ≥ 1 − 2e^(−Ck).
Proof: consider the events A : Tz ≠ 0 and B : (1 − ε)‖z‖ ≤ ‖Tz‖ ≤ (1 + ε)‖z‖.
Then Aᶜ ∩ B = ∅: otherwise Tz = 0 ⇒ (1 − ε)‖z‖ ≤ ‖Tz‖ = 0 ⇒ z = 0, a contradiction.
⇒ B ⊆ A ⇒ P(A) ≥ P(B) ≥ 1 − 2e^(−Cε²k) by the Corollary.
This holds ∀ε ∈ (0, 1), hence the result.

Now it suffices to apply the Lemma to z = Ax − b.

SLIDE 18

Consequences of the main theorem

  • (i) and (ii): checking certificates

given x, with high probability b =

i xiAi ⇔ Tb = i xiTAi

  • (iii) RLMX whenever |X| is polynomially bounded

e.g. knapsack set {x ∈ {0, 1}n |

  • i≤n

αixi ≤ d} for a fixed d with α > 0

  • (iii) hints that LFP case is more complicated

as X = Rn

+ is not polynomially bounded

SLIDE 19

Separating hyperplanes

When |X| is large, project separating hyperplanes instead

  • Convex C ⊆ Rᵐ and x ∉ C: then ∃ hyperplane c separating x from C
  • In particular, true if C = cone(A1, . . . , An) for A ⊆ Rm
  • We aim to show x ∈ C ⇔ Tx ∈ TC with high probability
  • As above, if x ∈ C then Tx ∈ TC by linearity of T

real issue is proving the converse

SLIDE 20

Projecting the separation

Thm. Given c, b, A1, . . . , An ∈ Rᵐ of unit norm s.t. b ∉ cone{A1, . . . , An}, which is pointed, ε > 0, c⊤b < −ε, c⊤Ai ≥ ε (i ≤ n), and T a random projector:

P(Tb ∉ cone{TA1, . . . , TAn}) ≥ 1 − 4(n + 1)e^(−C(ε² − ε³)k)

for some constant C.

Proof. Let A be the event that T approximately preserves ‖c − χ‖² and ‖c + χ‖² for all χ ∈ {b, A1, . . . , An}. Since A consists of 2(n + 1) events, by the (squared) JLL Corollary and the union bound, P(A) ≥ 1 − 4(n + 1)e^(−C(ε² − ε³)k). Now take χ = b:

⟨Tc, Tb⟩ = (1/4)(‖T(c + b)‖² − ‖T(c − b)‖²)
         ≤ (1/4)(‖c + b‖² − ‖c − b‖²) + (ε/4)(‖c + b‖² + ‖c − b‖²)
         = c⊤b + ε < 0

and similarly ⟨Tc, TAi⟩ ≥ 0 for each i.

[VPL, arXiv:1507.00990v1/math.OC]
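A toy check of the statement (my own construction: unit-norm generators clustered around e1 and a point b on the other side, so c = e1 separates with a comfortable margin, and the projected hyperplane Tc still separates):

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, k = 400, 50, 80

# Generators near e1, b pointing the other way; c = e1 is a separating
# hyperplane with margin well above 0.
A = 0.05 * rng.standard_normal((m, n)); A[0, :] += 1.0
A /= np.linalg.norm(A, axis=0)             # unit-norm columns
b = 0.05 * rng.standard_normal(m); b[0] -= 1.0
b /= np.linalg.norm(b)
c = np.zeros(m); c[0] = 1.0

T = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, m))

# Tc still separates Tb from cone(TA1..TAn): negative on Tb, positive on TAi
print((T @ c) @ (T @ b), ((T @ c) @ (T @ A)).min())
```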

SLIDE 21

Is this useful?

Previous results look like:

  • orig. LFP infeasible ⇒ P(proj. LFP infeasible) ≥ 1 − p(n)e^(−C r(ε) k), where p, r are two polynomials
  • Pick a suitable δ > 0
  • Choose k ∼ O((1/(C r(ε)))(ln p(n) + ln(1/δ))), so that the RHS is ≥ 1 − δ
  • Infeasibility is then preserved with probability ≥ 1 − δ
  • Useful when m ≤ n is large enough that k ≪ m
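Plugging in illustrative values (p(n) = n + 1 ≈ n, r(ε) = ε² − ε³ as in the separation theorem, and C = 1, since the true constant is unspecified; the function and its numbers are mine):

```python
import math

def projected_dim(n: int, eps: float, delta: float, C: float = 1.0) -> int:
    """k ~ (1 / (C r(eps))) * (ln p(n) + ln 1/delta), with the illustrative
    choices p(n) = n and r(eps) = eps**2 - eps**3."""
    r = eps**2 - eps**3
    return math.ceil((math.log(n) + math.log(1 / delta)) / (C * r))

# A million columns, 20% distortion, 0.1% failure probability:
print(projected_dim(10**6, 0.2, 1e-3))
```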

SLIDE 22

Consequences of projecting separations

  • Applicable to LFP
  • Probability depends on ε (the larger the better)
  • Largest ε given by LP

max{ε ≥ 0 | c⊤b ≤ −ε ∧ ∀i ≤ n (c⊤Ai ≥ ε)}

  • If cone(A1, . . . , An) is almost non-pointed, ε can be very small
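The margin LP above can be written down directly with scipy.optimize.linprog (a sketch; the box ‖c‖∞ ≤ 1 is my normalization, since the LP is otherwise unbounded by scaling c):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(6)
m, n = 30, 15

A = 0.1 * rng.standard_normal((m, n)); A[0, :] += 1.0   # cone leaning along e1
b = rng.standard_normal(m); b[0] = -1.0                  # b on the far side

# Variables (c, eps): maximize eps s.t. c.b <= -eps, c.Ai >= eps, |c_j| <= 1
obj = np.zeros(m + 1); obj[-1] = -1.0                    # minimize -eps
G = np.zeros((n + 1, m + 1))
G[0, :m], G[0, m] = b, 1.0                               # c.b + eps <= 0
G[1:, :m], G[1:, m] = -A.T, 1.0                          # -c.Ai + eps <= 0
h = np.zeros(n + 1)
bounds = [(-1, 1)] * m + [(0, None)]
res = linprog(obj, A_ub=G, b_ub=h, bounds=bounds, method="highs")
print(res.x[-1])                                          # best margin eps
```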

SLIDE 23

Projecting minimum distances to a cone

  • Thm.: minimum distance to a cone is approximately preserved
  • This result also works with non-pointed cones

Trade-off: need larger k, m, n

  • We appear to be all set for LFPs
  • Using bisection and LFP, also for LPs

SLIDE 24

Main theorem for LFP projections

Established so far:

Thm. Given δ > 0, there is a sufficiently large m ≤ n such that, for any LFP input A, b where A is m × n, we can sample a random k × m matrix T with k ≪ m and
P(orig. LFP feasible ⇔ proj. LFP feasible) ≥ 1 − δ

SLIDE 25

Towards solving LPs

SLIDE 26

Some results on uniform dense LFP

  • Matrix product TA takes too long

(call this an “implementation detail” and don’t count it)

  • Infeasible instances (sizes from 1000 × 1500 to 2000 × 2400):

Uniform   ε     k ≈     CPU saving   accuracy
(−1, 1)   0.1   0.5m    30%          50%
(−1, 1)   0.15  0.25m   92%          0%
(−1, 1)   0.2   0.12m   99.2%        0%
(0, 1)    0.1   0.5m    10%          100%
(0, 1)    0.15  0.25m   90%          100%
(0, 1)    0.2   0.12m   97%          100%

  • Feasible instances:

– similar CPU savings – obviously 100% accuracy

SLIDE 27

Certificates

  • Ax = b ⇒ TAx = Tb by linearity, however
  • Thm.: for x ≥ 0 s.t. TAx = Tb, we have Ax = b with probability 0
  • Can’t get certificate for original LFP using projected LFP!

SLIDE 28

Can we solve LPs by bisection?

– Projected certificate is infeasible in the original problem
– We only get an approximate optimal objective function value
– No bound on the error; no idea how large m, n should be
– Validated on “large enough” NetLib instances (with k ≈ 0.95m)

SLIDE 29

Certificate retrieval from dual LFP

Breaking news!

  • Primal min{c⊤x | Ax = b ∧ x ≥ 0} ⇒ dual max{b⊤y | A⊤y ≤ c}
  • Run bisection on the projected LFP, with threshold vℓ at iteration ℓ
  • Proj. LFP infeasible ⇒ ∃ unbounded dual ray λℓ s.t. (Tb)⊤λℓ > vℓ ∧ (TA)⊤λℓ ≤ c
    ⇒ b⊤(T⊤λℓ) > vℓ ∧ A⊤(T⊤λℓ) ≤ c
    ⇒ T⊤λℓ is a certificate for the original dual
  • Let L be the set of iteration indices s.t. the projected LFP is infeasible:
    lim_{ℓ∈L, ℓ→∞} T⊤λℓ = λ* (optimal dual solution)
  • In practice: at the last iteration ℓ*, take λ* ≈ T⊤λℓ*
  • Complementarity conditions ⇒ m basic columns of A; solve for x*
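The lifting step relies only on the identity (TA)⊤λ = A⊤(T⊤λ) (a numpy sketch of mine; λ below is a random stand-in for the actual dual ray a solver would return):

```python
import numpy as np

rng = np.random.default_rng(7)
m, n, k = 300, 500, 60

A = rng.standard_normal((m, n))
T = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, m))

lam = rng.standard_normal(k)              # stand-in for a projected dual ray

# Any lam satisfying (TA)^T lam <= c also gives A^T (T^T lam) <= c,
# i.e. T^T lam is dual-feasible for the ORIGINAL problem.
y = T.T @ lam
assert np.allclose((T @ A).T @ lam, A.T @ y)
print(y.shape)                            # a dual vector in the original space
```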

SLIDE 30

Current work

  • Implementation of certificate retrieval from dual
  • Random projections directly on dual LP

allows explicit feasibility & optimality guarantees

  • Results on projecting Integer Programs

we have many of these already!

  • Certificate retrieval using Plotkin-Shmoys-Tardos algorithm

SLIDE 31

Some references

  • K. Vu, P.-L. Poirion, L. Liberti, Using the Johnson-Lindenstrauss lemma in linear and integer programming, arXiv report 1507.00990v1/math.OC, 2015
  • K. Vu, P.-L. Poirion, L. Liberti, Gaussian random projections for Euclidean membership problems, arXiv report 1509.00630v1/math.OC, 2015
  • W. Johnson, J. Lindenstrauss, Extensions of Lipschitz mappings into a Hilbert space, Contemporary Mathematics, 26:189-206, 1984
  • S. Dasgupta, A. Gupta, An elementary proof of a theorem of Johnson and Lindenstrauss, Random Structures and Algorithms, 22(1):60-65, 2003
  • S. Venkatasubramanian, Q. Wang, The Johnson-Lindenstrauss transform: an empirical study, Proceedings of ALENEX, 2011
  • J. Nash, C¹ isometric embeddings, Annals of Mathematics, 60(3):383-396, 1954
