

1. The Johnson-Lindenstrauss Lemma in Linear Programming
Leo Liberti, Vu Khac Ky, Pierre-Louis Poirion
CNRS LIX, École Polytechnique, France
Aussois COW 2016

2. The gist
• Goal: solve very large LPs min { c⊤x | Ax = b ∧ x ≥ 0 }
• Trade-off: approximate answers, or wrong ones with low probability: OK
• Means: project the columns of A (and b) into a random subspace via a matrix T, so that Ax = b ∧ x ≥ 0 ⇔ TAx = Tb ∧ x ≥ 0 with high probability (see the sketch below)
• Bisection: solve the LP using [TAx = Tb ∧ x ≥ 0] as a feasibility oracle
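A minimal numpy sketch of the projection step; the sizes, variable names, and the feasible-by-construction b are illustrative choices of ours, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 2000, 2400, 200                    # illustrative sizes, k << m
A = rng.standard_normal((m, n))
b = A @ np.abs(rng.standard_normal(n))       # b in cone(A_1, ..., A_n): LFP feasible

# Sample T componentwise from N(0, 1/sqrt(k)) and project the constraints.
T = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, m))
TA, Tb = T @ A, T @ b                        # projected system: TAx = Tb, x >= 0
```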

3. Plan
• Restricted Linear Membership
• Johnson-Lindenstrauss Lemma
• Applying the JLL to the RLM
• Towards solving LPs

4. Restricted Linear Membership

5. Linear feasibility with constrained multipliers
Restricted Linear Membership (RLM): given vectors A_1, …, A_n, b ∈ R^m and X ⊆ R^n, is there x ∈ X s.t. b = Σ_{i≤n} x_i A_i ?
RLM_X is a fundamental problem class, which subsumes:
• the Linear Feasibility Problem (LFP), with X = R^n_+
• the Integer Feasibility Problem (IFP), with X = Z^n_+
• Efficient solution of the LFP/IFP yields a solution of the LP/IP via bisection (one realization of the LFP oracle is sketched below)
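One way to realize the LFP oracle is a zero-objective (phase-I style) LP; a hedged scipy sketch, where the function name `lfp_feasible` is ours:

```python
import numpy as np
from scipy.optimize import linprog

def lfp_feasible(A, b):
    """Is { x >= 0 | Ax = b } nonempty? Solve a zero-objective LP."""
    n = A.shape[1]
    res = linprog(np.zeros(n), A_eq=A, b_eq=b, bounds=[(0, None)] * n)
    return res.status == 0  # status 0: an optimal (hence feasible) point was found
```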

6. The shape of a set of points
• Lose dimensions, but not too much accuracy: given A_1, …, A_n ∈ R^m, find k ≪ m and points A′_1, …, A′_n ∈ R^k s.t. A and A′ "have almost the same shape"
• What is the shape of a set of points? Congruent sets have the same shape
• Approximate congruence: A, A′ have almost the same shape if
  (1 − ε) ‖A_i − A_j‖ ≤ ‖A′_i − A′_j‖ ≤ (1 + ε) ‖A_i − A_j‖   ∀ i < j ≤ n
  for some small ε > 0 (a numerical check is sketched below)
• All norms are assumed Euclidean
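The approximate-congruence condition is easy to test numerically. A small sketch (the helper is ours; it assumes the points are the columns of the two arrays):

```python
import numpy as np
from itertools import combinations

def max_distortion(A, Ap):
    """Worst relative change of pairwise Euclidean distances between
    the columns of A (points in R^m) and of Ap (their images in R^k)."""
    worst = 0.0
    for i, j in combinations(range(A.shape[1]), 2):
        d = np.linalg.norm(A[:, i] - A[:, j])
        dp = np.linalg.norm(Ap[:, i] - Ap[:, j])
        worst = max(worst, abs(dp - d) / d)
    return worst  # "almost the same shape" for a given eps iff worst <= eps
```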

7. Losing dimensions in the RLM
Given X ⊆ R^n and b, A_1, …, A_n ∈ R^m, find k ≪ m and b′, A′_1, …, A′_n ∈ R^k such that, with high probability,
  ∃ x ∈ X  b = Σ_{i≤n} x_i A_i   (high dimensional)
  ⇔  ∃ x ∈ X  b′ = Σ_{i≤n} x_i A′_i   (low dimensional)
• If this is possible, then solve RLM_X(b′, A′) instead
• Since k ≪ m, solving RLM_X(b′, A′) should be faster
• RLM_X(b′, A′) = RLM_X(b, A) with high probability

8. Losing dimensions = "projection"
• In the plane: hopeless (figure: projecting points onto line 1 or line 2)
• In 3D: no better

9. The Johnson-Lindenstrauss Lemma

10. Johnson-Lindenstrauss Lemma
Thm. Given A ⊆ R^m with |A| = n and ε > 0, there is k ∼ O((1/ε²) ln n) and a k × m matrix T s.t.
  ∀ x, y ∈ A  (1 − ε) ‖x − y‖ ≤ ‖Tx − Ty‖ ≤ (1 + ε) ‖x − y‖
If the k × m matrix T is sampled componentwise from N(0, 1/√k), then A and TA have almost the same shape.
Discrete approximations of N(0, 1/√k) can also be used, e.g.
  P(T_ij = √3/√k) = P(T_ij = −√3/√k) = 1/6,   P(T_ij = 0) = 2/3
(this makes T sparser)
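Both samplings are one-liners in numpy; a sketch (function names ours):

```python
import numpy as np

def gaussian_T(k, m, rng):
    # Componentwise N(0, 1/sqrt(k)): entries have variance 1/k,
    # so E(||Tu||^2) = 1 for any unit vector u.
    return rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, m))

def sparse_T(k, m, rng):
    # Discrete approximation: +/- sqrt(3)/sqrt(k) w.p. 1/6 each, 0 w.p. 2/3.
    # Same variance 1/k per entry, but two thirds of the entries are zero.
    vals = np.sqrt(3.0 / k) * np.array([1.0, 0.0, -1.0])
    return rng.choice(vals, size=(k, m), p=[1/6, 2/3, 1/6])
```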

11. Sampling to desired accuracy
• Distortion has low probability:
  ∀ x, y ∈ A  P(‖Tx − Ty‖ ≤ (1 − ε) ‖x − y‖) ≤ 1/n²
  ∀ x, y ∈ A  P(‖Tx − Ty‖ ≥ (1 + ε) ‖x − y‖) ≤ 1/n²
• Probability that some pair x, y ∈ A distorts Euclidean distance: a union bound over the (n choose 2) pairs gives
  P(¬(A and TA have almost the same shape)) ≤ 2 (n choose 2) / n² = 1 − 1/n
  P(A and TA have almost the same shape) ≥ 1/n
⇒ re-sampling T gives the JLL with arbitrarily high probability

12. Sketch of a possible JLL proof
Thm. Let T be a k × m matrix with each component sampled from N(0, 1/√k), and u ∈ R^m s.t. ‖u‖ = 1. Then E(‖Tu‖²) = 1
(figure: concentration of measure on the sphere S^{m−1}, shown for n = 3, 11, 101)

13. In practice
• Empirical estimation of C in k = (C/ε²) ln n: C ≈ 1.8 [Venkatasubramanian & Wang 2011]
• Empirically, sample T very few times (e.g. once will do!): on average ‖Tx − Ty‖ ≈ ‖x − y‖, and the distortion probability decreases exponentially with n
• We only need a number of dimensions logarithmic in the number of points (see the check below)
• Surprising fact: k is independent of the original number of dimensions m
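A quick empirical check of the C ≈ 1.8 rule with a single sample of T; sizes, pair count, and seed are our choices:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, eps, C = 10_000, 500, 0.1, 1.8
k = int(np.ceil(C / eps**2 * np.log(n)))     # depends on n, not on m
A = rng.standard_normal((m, n))
T = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, m))
TA = T @ A

# Distortion over random pairs of points, from one sample of T.
i, j = rng.integers(n, size=(2, 200))
mask = i != j
d = np.linalg.norm(A[:, i[mask]] - A[:, j[mask]], axis=0)
dp = np.linalg.norm(TA[:, i[mask]] - TA[:, j[mask]], axis=0)
print(np.max(np.abs(dp - d) / d))            # typically around or below eps
```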

14. Typical applications of JLL
Problems involving Euclidean distances only:
• Euclidean clustering: k-means, k-nearest neighbors
• Linear regression: min_x ‖Ax − b‖₂ where A is m × n with m ≫ n (a sketched-regression example follows)
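For regression, the sketch-and-solve recipe is: project both A and b, then solve the small least-squares problem. A hedged numpy example (sizes and noise level are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k = 20_000, 50, 500                    # m >> n, and n << k << m
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n) + 0.01 * rng.standard_normal(m)

T = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, m))
x_full, *_ = np.linalg.lstsq(A, b, rcond=None)          # m x n problem
x_proj, *_ = np.linalg.lstsq(T @ A, T @ b, rcond=None)  # k x n problem
print(np.linalg.norm(x_full - x_proj))       # small: the sketch is close
```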

15. Applying the JLL to the RLM

16. Projecting infeasibility
Thm. Let T: R^m → R^k be a JLL random projection and b, A_1, …, A_n ∈ R^m an RLM_X instance. For any given vector x ∈ X:
(i) if b = Σ_{i=1}^n x_i A_i, then Tb = Σ_{i=1}^n x_i TA_i
(ii) if b ≠ Σ_{i=1}^n x_i A_i, then P(Tb ≠ Σ_{i=1}^n x_i TA_i) ≥ 1 − 2e^{−Ck}
(iii) if b ≠ Σ_{i=1}^n y_i A_i for all y ∈ X ⊆ R^n, where |X| is finite, then
  P(∀ y ∈ X  Tb ≠ Σ_{i=1}^n y_i TA_i) ≥ 1 − 2|X| e^{−Ck}
for some constant C > 0 (independent of n, k). [VPL, arXiv:1507.00990v1/math.OC]

17. Proof of (ii)
Cor. ∀ ε ∈ (0, 1) and z ∈ R^m there is a constant C such that
  P((1 − ε) ‖z‖ ≤ ‖Tz‖ ≤ (1 + ε) ‖z‖) ≥ 1 − 2e^{−Cε²k}
Proof: by the JLL.
Lemma. If z ≠ 0, there is a constant C such that P(Tz ≠ 0) ≥ 1 − 2e^{−Ck}
Proof. Consider the events A: Tz ≠ 0 and B: (1 − ε) ‖z‖ ≤ ‖Tz‖ ≤ (1 + ε) ‖z‖
⇒ Aᶜ ∩ B = ∅: otherwise Tz = 0 ⇒ (1 − ε) ‖z‖ ≤ ‖Tz‖ = 0 ⇒ z = 0, contradiction
⇒ B ⊆ A ⇒ P(A) ≥ P(B) ≥ 1 − 2e^{−Cε²k} by the Corollary
This holds ∀ ε ∈ (0, 1), hence the result.
Now it suffices to apply the Lemma to Ax − b.

18. Consequences of the main theorem
• (i) and (ii): checking certificates: given x, with high probability b = Σ_i x_i A_i ⇔ Tb = Σ_i x_i TA_i (see the sketch below)
• (iii): settles RLM_X whenever |X| is polynomially bounded, e.g. the knapsack set { x ∈ {0, 1}^n | Σ_{i≤n} α_i x_i ≤ d } for a fixed d and α > 0
• (iii) hints that the LFP case is more complicated, as X = R^n_+ is not polynomially bounded
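Once TA and Tb are precomputed, the certificate check in the projected space costs O(kn) instead of O(mn); a sketch (function name and tolerance are ours):

```python
import numpy as np

def certificate_holds_projected(TA, Tb, x, tol=1e-8):
    # With high probability this agrees with checking b = sum_i x_i A_i in
    # the original space; accepting a wrong certificate has prob. <= 2e^{-Ck}.
    return np.linalg.norm(TA @ x - Tb) <= tol
```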

19. Separating hyperplanes
When |X| is large, project separating hyperplanes instead
• Convex C ⊆ R^m and x ∉ C: then ∃ a hyperplane c separating x and C
• In particular, this is true if C = cone(A_1, …, A_n) for A ⊆ R^m
• We aim to show x ∈ C ⇔ Tx ∈ TC with high probability
• As above, if x ∈ C then Tx ∈ TC by linearity of T; the real issue is proving the converse

20. Projecting the separation
Thm. Given b, A_1, …, A_n ∈ R^m of unit norm s.t. b ∉ cone{A_1, …, A_n} pointed, ε > 0, c ∈ R^m of unit norm s.t. c⊤b < −ε and c⊤A_i ≥ ε (i ≤ n), and T a random projector:
  P(Tb ∉ cone{TA_1, …, TA_n}) ≥ 1 − 4(n + 1) e^{−C(ε² − ε³)k}
for some constant C.
Proof. Let 𝒜 be the event that T approximately preserves ‖c − χ‖² and ‖c + χ‖² for all χ ∈ {b, A_1, …, A_n}. Since 𝒜 consists of 2(n + 1) events, the (squared) JLL Corollary and the union bound give
  P(𝒜) ≥ 1 − 4(n + 1) e^{−C(ε² − ε³)k}
Now consider χ = b:
  ⟨Tc, Tb⟩ = (1/4)(‖T(c + b)‖² − ‖T(c − b)‖²)
           ≤ (1/4)(‖c + b‖² − ‖c − b‖²) + (ε/4)(‖c + b‖² + ‖c − b‖²)   (by JLL)
           = c⊤b + ε < 0
and similarly ⟨Tc, TA_i⟩ ≥ 0. [VPL, arXiv:1507.00990v1/math.OC]

21. Is this useful?
The previous results look like:
  orig. LFP infeasible ⇒ P(proj. LFP infeasible) ≥ 1 − p(n) e^{−C r(ε) k}
where p, r are two polynomials.
• Pick a suitable δ > 0
• Choose k ∼ O((1/(C r(ε))) (ln p(n) + ln(1/δ))) so that the RHS is ≥ 1 − δ
• This preserves infeasibility with probability ≥ 1 − δ
• Useful for m ≤ n large enough that k ≪ m (see the helper below)
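As a helper, the k prescribed for the separation theorem of slide 20, where p(n) = 4(n + 1) and r(ε) = ε² − ε³; the constant C is not pinned down by the theory, so the default below is a placeholder of ours:

```python
import numpy as np

def choose_k(n, eps, delta, C=1.0):
    # k ~ (ln p(n) + ln(1/delta)) / (C r(eps)), with p(n) = 4(n + 1)
    # and r(eps) = eps**2 - eps**3; C = 1.0 is a placeholder constant.
    return int(np.ceil((np.log(4 * (n + 1)) + np.log(1 / delta))
                       / (C * (eps**2 - eps**3))))
```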

22. Consequences of projecting separations
• Applicable to the LFP
• The probability depends on ε (the larger, the better)
• The largest ε is given by the LP max { ε ≥ 0 | c⊤b ≤ −ε ∧ ∀ i ≤ n (c⊤A_i ≥ ε) } (sketched below)
• If cone(A_1, …, A_n) is almost non-pointed, ε can be very small
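The largest-ε problem is itself a small LP. A hedged scipy sketch; since linprog cannot impose ‖c‖ = 1, we normalize with box constraints |c_j| ≤ 1 instead (our choice, not the slide's unit-norm assumption):

```python
import numpy as np
from scipy.optimize import linprog

def largest_eps(A, b):
    """max { eps >= 0 | c.b <= -eps, c.A_i >= eps for all i }, over (c, eps),
    with |c_j| <= 1 replacing the unit-norm normalization of c."""
    m, n = A.shape                             # columns of A are A_1, ..., A_n
    obj = np.append(np.zeros(m), -1.0)         # maximize eps
    A_ub = np.vstack([np.append(b, 1.0),       # c.b + eps <= 0
                      np.hstack([-A.T, np.ones((n, 1))])])  # -c.A_i + eps <= 0
    res = linprog(obj, A_ub=A_ub, b_ub=np.zeros(n + 1),
                  bounds=[(-1, 1)] * m + [(0, None)])
    return -res.fun
```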

23. Projecting minimum distances to a cone
• Thm.: the minimum distance to a cone is approximately preserved
• This result also works with non-pointed cones; trade-off: it needs larger k, m, n
• We appear to be all set for LFPs
• Using bisection and the LFP, also for LPs

24. Main theorem for LFP projections
Established so far:
Thm. Given δ > 0, there are sufficiently large m ≤ n such that, for any LFP input (A, b) where A is m × n, we can sample a random k × m matrix T with k ≪ m and
  P(orig. LFP feasible ⇔ proj. LFP feasible) ≥ 1 − δ

25. Towards solving LPs

26. Some results on uniform dense LFPs
• The matrix product TA takes too long (call this an "implementation detail" and don't count it)
• Infeasible instances (sizes from 1000 × 1500 to 2000 × 2400):

  Uniform   ε      k ≈      CPU saving   accuracy
  (−1, 1)   0.1    0.5 m    30%          50%
  (−1, 1)   0.15   0.25 m   92%          0%
  (−1, 1)   0.2    0.12 m   99.2%        0%
  (0, 1)    0.1    0.5 m    10%          100%
  (0, 1)    0.15   0.25 m   90%          100%
  (0, 1)    0.2    0.12 m   97%          100%

• Feasible instances: similar CPU savings, and obviously 100% accuracy

27. Certificates
• Ax = b ⇒ TAx = Tb by linearity; however:
• Thm.: for x ≥ 0 s.t. TAx = Tb, Ax = b holds with probability 0
• We can't get a certificate for the original LFP from the projected LFP! (illustrated below)
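A numerical illustration of why projected solutions are not certificates (sizes and data are ours): the projected residual vanishes while the original one does not:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
m, n, k = 1000, 1200, 100
A = rng.standard_normal((m, n))
b = A @ np.abs(rng.standard_normal(n))        # the original LFP is feasible
T = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, m))

res = linprog(np.zeros(n), A_eq=T @ A, b_eq=T @ b, bounds=[(0, None)] * n)
print(np.linalg.norm(T @ A @ res.x - T @ b))  # ~0: solves the projected system
print(np.linalg.norm(A @ res.x - b))          # > 0 almost surely: no certificate
```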

28. Can we solve LPs by bisection?
– The projected certificate is infeasible in the original problem
– We only get an approximate optimal objective function value
– No bound on the error; no idea how large m, n should be
– Validated on "large enough" NetLib instances (with k ≈ 0.95 m); the bisection scheme is sketched below
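A sketch of the bisection scheme under these caveats; everything here (function names, the slack reformulation of c⊤x ≤ θ, the choice of k and of the initial bracket [lo, hi]) is our assumption, not the authors' code:

```python
import numpy as np
from scipy.optimize import linprog

def projected_feasible(A, b, k, rng):
    # Oracle: sample T componentwise from N(0, 1/sqrt(k)) and test
    # feasibility of { TAx = Tb, x >= 0 } with a zero-objective LP.
    T = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, A.shape[0]))
    res = linprog(np.zeros(A.shape[1]), A_eq=T @ A, b_eq=T @ b,
                  bounds=[(0, None)] * A.shape[1])
    return res.status == 0

def bisect_lp(c, A, b, k, lo, hi, tol=1e-3, seed=0):
    # min { c.x | Ax = b, x >= 0 } by bisection on the level theta:
    # append c.x + s = theta with a slack s >= 0, then project and test.
    rng = np.random.default_rng(seed)
    m, n = A.shape
    A_aug = np.block([[A, np.zeros((m, 1))],
                      [c.reshape(1, -1), np.ones((1, 1))]])
    while hi - lo > tol:
        theta = 0.5 * (lo + hi)
        if projected_feasible(A_aug, np.append(b, theta), k, rng):
            hi = theta                       # some (projected) point achieves theta
        else:
            lo = theta
    return 0.5 * (lo + hi)                   # approximate optimal value only
```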
