

1. The Johnson-Lindenstrauss Lemma in Linear Programming
Leo Liberti, Vu Khac Ky, Pierre-Louis Poirion
CNRS LIX, École Polytechnique, France
Aussois COW 2016

2. The gist
• Goal: solve very large LPs min { c⊤x | Ax = b ∧ x ≥ 0 }
• Trade-off: approximate answers, or wrong ones with low probability: OK
• Means: project the columns of A (and b) into a random subspace via a matrix T, so that Ax = b ∧ x ≥ 0 ⇔ TAx = Tb ∧ x ≥ 0 with high probability (see the sketch below)
• Bisection: solve the LP using [TAx = Tb ∧ x ≥ 0] as a feasibility oracle
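A minimal numpy sketch of the projection step; the sizes, variable names, and the feasible-by-construction b are illustrative choices of ours, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 2000, 2400, 200                    # illustrative sizes, k << m
A = rng.standard_normal((m, n))
b = A @ np.abs(rng.standard_normal(n))       # b in cone(A_1, ..., A_n): LFP feasible

# Sample T componentwise from N(0, 1/sqrt(k)) and project the constraints.
T = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, m))
TA, Tb = T @ A, T @ b                        # projected system: TAx = Tb, x >= 0
```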

3. Plan
• Restricted Linear Membership
• Johnson-Lindenstrauss Lemma
• Applying the JLL to the RLM
• Towards solving LPs

4. Restricted Linear Membership

5. Linear feasibility with constrained multipliers
Restricted Linear Membership (RLM): given vectors A_1, …, A_n, b ∈ R^m and X ⊆ R^n, is there x ∈ X s.t. b = Σ_{i≤n} x_i A_i ?
RLM_X is a fundamental problem class, which subsumes:
• the Linear Feasibility Problem (LFP), with X = R^n_+
• the Integer Feasibility Problem (IFP), with X = Z^n_+
• Efficient solution of the LFP/IFP yields a solution of the LP/IP via bisection (one realization of the LFP oracle is sketched below)
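One way to realize the LFP oracle is a zero-objective (phase-I style) LP; a hedged scipy sketch, where the function name `lfp_feasible` is ours:

```python
import numpy as np
from scipy.optimize import linprog

def lfp_feasible(A, b):
    """Is { x >= 0 | Ax = b } nonempty? Solve a zero-objective LP."""
    n = A.shape[1]
    res = linprog(np.zeros(n), A_eq=A, b_eq=b, bounds=[(0, None)] * n)
    return res.status == 0  # status 0: an optimal (hence feasible) point was found
```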

6. The shape of a set of points
• Lose dimensions, but not too much accuracy: given A_1, …, A_n ∈ R^m, find k ≪ m and points A′_1, …, A′_n ∈ R^k s.t. A and A′ "have almost the same shape"
• What is the shape of a set of points? Congruent sets have the same shape
• Approximate congruence: A, A′ have almost the same shape if
  (1 − ε) ‖A_i − A_j‖ ≤ ‖A′_i − A′_j‖ ≤ (1 + ε) ‖A_i − A_j‖   ∀ i < j ≤ n
  for some small ε > 0 (a numerical check is sketched below)
• All norms are assumed Euclidean
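The approximate-congruence condition is easy to test numerically. A small sketch (the helper is ours; it assumes the points are the columns of the two arrays):

```python
import numpy as np
from itertools import combinations

def max_distortion(A, Ap):
    """Worst relative change of pairwise Euclidean distances between
    the columns of A (points in R^m) and of Ap (their images in R^k)."""
    worst = 0.0
    for i, j in combinations(range(A.shape[1]), 2):
        d = np.linalg.norm(A[:, i] - A[:, j])
        dp = np.linalg.norm(Ap[:, i] - Ap[:, j])
        worst = max(worst, abs(dp - d) / d)
    return worst  # "almost the same shape" for a given eps iff worst <= eps
```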

7. Losing dimensions in the RLM
Given X ⊆ R^n and b, A_1, …, A_n ∈ R^m, find k ≪ m and b′, A′_1, …, A′_n ∈ R^k such that, with high probability,
  ∃ x ∈ X  b = Σ_{i≤n} x_i A_i   (high dimensional)
  ⇔  ∃ x ∈ X  b′ = Σ_{i≤n} x_i A′_i   (low dimensional)
• If this is possible, then solve RLM_X(b′, A′) instead
• Since k ≪ m, solving RLM_X(b′, A′) should be faster
• RLM_X(b′, A′) = RLM_X(b, A) with high probability

8. Losing dimensions = "projection"
• In the plane: hopeless (figure: projecting points onto line 1 or line 2)
• In 3D: no better

9. The Johnson-Lindenstrauss Lemma

10. Johnson-Lindenstrauss Lemma
Thm. Given A ⊆ R^m with |A| = n and ε > 0, there is k ∼ O((1/ε²) ln n) and a k × m matrix T s.t.
  ∀ x, y ∈ A  (1 − ε) ‖x − y‖ ≤ ‖Tx − Ty‖ ≤ (1 + ε) ‖x − y‖
If the k × m matrix T is sampled componentwise from N(0, 1/√k), then A and TA have almost the same shape.
Discrete approximations of N(0, 1/√k) can also be used, e.g.
  P(T_ij = √3/√k) = P(T_ij = −√3/√k) = 1/6,   P(T_ij = 0) = 2/3
(this makes T sparser)
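Both samplings are one-liners in numpy; a sketch (function names ours):

```python
import numpy as np

def gaussian_T(k, m, rng):
    # Componentwise N(0, 1/sqrt(k)): entries have variance 1/k,
    # so E(||Tu||^2) = 1 for any unit vector u.
    return rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, m))

def sparse_T(k, m, rng):
    # Discrete approximation: +/- sqrt(3)/sqrt(k) w.p. 1/6 each, 0 w.p. 2/3.
    # Same variance 1/k per entry, but two thirds of the entries are zero.
    vals = np.sqrt(3.0 / k) * np.array([1.0, 0.0, -1.0])
    return rng.choice(vals, size=(k, m), p=[1/6, 2/3, 1/6])
```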

11. Sampling to desired accuracy
• Distortion has low probability:
  ∀ x, y ∈ A  P(‖Tx − Ty‖ ≤ (1 − ε) ‖x − y‖) ≤ 1/n²
  ∀ x, y ∈ A  P(‖Tx − Ty‖ ≥ (1 + ε) ‖x − y‖) ≤ 1/n²
• Probability that some pair x, y ∈ A distorts Euclidean distance: a union bound over the (n choose 2) pairs gives
  P(¬(A and TA have almost the same shape)) ≤ 2 (n choose 2) / n² = 1 − 1/n
  P(A and TA have almost the same shape) ≥ 1/n
⇒ re-sampling T gives the JLL with arbitrarily high probability

12. Sketch of a possible JLL proof
Thm. Let T be a k × m matrix with each component sampled from N(0, 1/√k), and u ∈ R^m s.t. ‖u‖ = 1. Then E(‖Tu‖²) = 1
(figure: concentration of measure on the sphere S^{m−1}, shown for n = 3, 11, 101)

13. In practice
• Empirical estimation of C in k = (C/ε²) ln n: C ≈ 1.8 [Venkatasubramanian & Wang 2011]
• Empirically, sample T very few times (e.g. once will do!): on average ‖Tx − Ty‖ ≈ ‖x − y‖, and the distortion probability decreases exponentially with n
• We only need a number of dimensions logarithmic in the number of points (see the check below)
• Surprising fact: k is independent of the original number of dimensions m
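A quick empirical check of the C ≈ 1.8 rule with a single sample of T; sizes, pair count, and seed are our choices:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, eps, C = 10_000, 500, 0.1, 1.8
k = int(np.ceil(C / eps**2 * np.log(n)))     # depends on n, not on m
A = rng.standard_normal((m, n))
T = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, m))
TA = T @ A

# Distortion over random pairs of points, from one sample of T.
i, j = rng.integers(n, size=(2, 200))
mask = i != j
d = np.linalg.norm(A[:, i[mask]] - A[:, j[mask]], axis=0)
dp = np.linalg.norm(TA[:, i[mask]] - TA[:, j[mask]], axis=0)
print(np.max(np.abs(dp - d) / d))            # typically around or below eps
```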

14. Typical applications of JLL
Problems involving Euclidean distances only:
• Euclidean clustering: k-means, k-nearest neighbors
• Linear regression: min_x ‖Ax − b‖₂ where A is m × n with m ≫ n (a sketched-regression example follows)
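For regression, the sketch-and-solve recipe is: project both A and b, then solve the small least-squares problem. A hedged numpy example (sizes and noise level are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k = 20_000, 50, 500                    # m >> n, and n << k << m
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n) + 0.01 * rng.standard_normal(m)

T = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, m))
x_full, *_ = np.linalg.lstsq(A, b, rcond=None)          # m x n problem
x_proj, *_ = np.linalg.lstsq(T @ A, T @ b, rcond=None)  # k x n problem
print(np.linalg.norm(x_full - x_proj))       # small: the sketch is close
```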

15. Applying the JLL to the RLM

16. Projecting infeasibility
Thm. Let T: R^m → R^k be a JLL random projection and b, A_1, …, A_n ∈ R^m an RLM_X instance. For any given vector x ∈ X:
(i) if b = Σ_{i=1}^n x_i A_i, then Tb = Σ_{i=1}^n x_i TA_i
(ii) if b ≠ Σ_{i=1}^n x_i A_i, then P(Tb ≠ Σ_{i=1}^n x_i TA_i) ≥ 1 − 2e^{−Ck}
(iii) if b ≠ Σ_{i=1}^n y_i A_i for all y ∈ X ⊆ R^n, where |X| is finite, then
  P(∀ y ∈ X  Tb ≠ Σ_{i=1}^n y_i TA_i) ≥ 1 − 2|X| e^{−Ck}
for some constant C > 0 (independent of n, k). [VPL, arXiv:1507.00990v1/math.OC]

17. Proof of (ii)
Cor. ∀ ε ∈ (0, 1) and z ∈ R^m there is a constant C such that
  P((1 − ε) ‖z‖ ≤ ‖Tz‖ ≤ (1 + ε) ‖z‖) ≥ 1 − 2e^{−Cε²k}
Proof: by the JLL.
Lemma. If z ≠ 0, there is a constant C such that P(Tz ≠ 0) ≥ 1 − 2e^{−Ck}
Proof. Consider the events A: Tz ≠ 0 and B: (1 − ε) ‖z‖ ≤ ‖Tz‖ ≤ (1 + ε) ‖z‖
⇒ Aᶜ ∩ B = ∅: otherwise Tz = 0 ⇒ (1 − ε) ‖z‖ ≤ ‖Tz‖ = 0 ⇒ z = 0, contradiction
⇒ B ⊆ A ⇒ P(A) ≥ P(B) ≥ 1 − 2e^{−Cε²k} by the Corollary
This holds ∀ ε ∈ (0, 1), hence the result.
Now it suffices to apply the Lemma to Ax − b.

18. Consequences of the main theorem
• (i) and (ii): checking certificates: given x, with high probability b = Σ_i x_i A_i ⇔ Tb = Σ_i x_i TA_i (see the sketch below)
• (iii): settles RLM_X whenever |X| is polynomially bounded, e.g. the knapsack set { x ∈ {0, 1}^n | Σ_{i≤n} α_i x_i ≤ d } for a fixed d and α > 0
• (iii) hints that the LFP case is more complicated, as X = R^n_+ is not polynomially bounded
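Once TA and Tb are precomputed, the certificate check in the projected space costs O(kn) instead of O(mn); a sketch (function name and tolerance are ours):

```python
import numpy as np

def certificate_holds_projected(TA, Tb, x, tol=1e-8):
    # With high probability this agrees with checking b = sum_i x_i A_i in
    # the original space; accepting a wrong certificate has prob. <= 2e^{-Ck}.
    return np.linalg.norm(TA @ x - Tb) <= tol
```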

19. Separating hyperplanes
When |X| is large, project separating hyperplanes instead
• Convex C ⊆ R^m and x ∉ C: then ∃ a hyperplane c separating x and C
• In particular, this is true if C = cone(A_1, …, A_n) for A ⊆ R^m
• We aim to show x ∈ C ⇔ Tx ∈ TC with high probability
• As above, if x ∈ C then Tx ∈ TC by linearity of T; the real issue is proving the converse

20. Projecting the separation
Thm. Given b, A_1, …, A_n ∈ R^m of unit norm s.t. b ∉ cone{A_1, …, A_n} pointed, ε > 0, c ∈ R^m of unit norm s.t. c⊤b < −ε and c⊤A_i ≥ ε (i ≤ n), and T a random projector:
  P(Tb ∉ cone{TA_1, …, TA_n}) ≥ 1 − 4(n + 1) e^{−C(ε² − ε³)k}
for some constant C.
Proof. Let 𝒜 be the event that T approximately preserves ‖c − χ‖² and ‖c + χ‖² for all χ ∈ {b, A_1, …, A_n}. Since 𝒜 consists of 2(n + 1) events, the (squared) JLL Corollary and the union bound give
  P(𝒜) ≥ 1 − 4(n + 1) e^{−C(ε² − ε³)k}
Now consider χ = b:
  ⟨Tc, Tb⟩ = (1/4)(‖T(c + b)‖² − ‖T(c − b)‖²)
           ≤ (1/4)(‖c + b‖² − ‖c − b‖²) + (ε/4)(‖c + b‖² + ‖c − b‖²)   (by JLL)
           = c⊤b + ε < 0
and similarly ⟨Tc, TA_i⟩ ≥ 0. [VPL, arXiv:1507.00990v1/math.OC]

21. Is this useful?
The previous results look like:
  orig. LFP infeasible ⇒ P(proj. LFP infeasible) ≥ 1 − p(n) e^{−C r(ε) k}
where p, r are two polynomials.
• Pick a suitable δ > 0
• Choose k ∼ O((1/(C r(ε))) (ln p(n) + ln(1/δ))) so that the RHS is ≥ 1 − δ
• This preserves infeasibility with probability ≥ 1 − δ
• Useful for m ≤ n large enough that k ≪ m (see the helper below)
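As a helper, the k prescribed for the separation theorem of slide 20, where p(n) = 4(n + 1) and r(ε) = ε² − ε³; the constant C is not pinned down by the theory, so the default below is a placeholder of ours:

```python
import numpy as np

def choose_k(n, eps, delta, C=1.0):
    # k ~ (ln p(n) + ln(1/delta)) / (C r(eps)), with p(n) = 4(n + 1)
    # and r(eps) = eps**2 - eps**3; C = 1.0 is a placeholder constant.
    return int(np.ceil((np.log(4 * (n + 1)) + np.log(1 / delta))
                       / (C * (eps**2 - eps**3))))
```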

22. Consequences of projecting separations
• Applicable to the LFP
• The probability depends on ε (the larger, the better)
• The largest ε is given by the LP max { ε ≥ 0 | c⊤b ≤ −ε ∧ ∀ i ≤ n (c⊤A_i ≥ ε) } (sketched below)
• If cone(A_1, …, A_n) is almost non-pointed, ε can be very small
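The largest-ε problem is itself a small LP. A hedged scipy sketch; since linprog cannot impose ‖c‖ = 1, we normalize with box constraints |c_j| ≤ 1 instead (our choice, not the slide's unit-norm assumption):

```python
import numpy as np
from scipy.optimize import linprog

def largest_eps(A, b):
    """max { eps >= 0 | c.b <= -eps, c.A_i >= eps for all i }, over (c, eps),
    with |c_j| <= 1 replacing the unit-norm normalization of c."""
    m, n = A.shape                             # columns of A are A_1, ..., A_n
    obj = np.append(np.zeros(m), -1.0)         # maximize eps
    A_ub = np.vstack([np.append(b, 1.0),       # c.b + eps <= 0
                      np.hstack([-A.T, np.ones((n, 1))])])  # -c.A_i + eps <= 0
    res = linprog(obj, A_ub=A_ub, b_ub=np.zeros(n + 1),
                  bounds=[(-1, 1)] * m + [(0, None)])
    return -res.fun
```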

23. Projecting minimum distances to a cone
• Thm.: the minimum distance to a cone is approximately preserved
• This result also works with non-pointed cones; trade-off: it needs larger k, m, n
• We appear to be all set for LFPs
• Using bisection and the LFP, also for LPs

24. Main theorem for LFP projections
Established so far:
Thm. Given δ > 0, there are sufficiently large m ≤ n such that, for any LFP input (A, b) where A is m × n, we can sample a random k × m matrix T with k ≪ m and
  P(orig. LFP feasible ⇔ proj. LFP feasible) ≥ 1 − δ

25. Towards solving LPs

26. Some results on uniform dense LFPs
• The matrix product TA takes too long (call this an "implementation detail" and don't count it)
• Infeasible instances (sizes from 1000 × 1500 to 2000 × 2400):

  Uniform   ε      k ≈      CPU saving   accuracy
  (−1, 1)   0.1    0.5 m    30%          50%
  (−1, 1)   0.15   0.25 m   92%          0%
  (−1, 1)   0.2    0.12 m   99.2%        0%
  (0, 1)    0.1    0.5 m    10%          100%
  (0, 1)    0.15   0.25 m   90%          100%
  (0, 1)    0.2    0.12 m   97%          100%

• Feasible instances: similar CPU savings, and obviously 100% accuracy

27. Certificates
• Ax = b ⇒ TAx = Tb by linearity; however:
• Thm.: for x ≥ 0 s.t. TAx = Tb, Ax = b holds with probability 0
• We can't get a certificate for the original LFP from the projected LFP! (illustrated below)
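A numerical illustration of why projected solutions are not certificates (sizes and data are ours): the projected residual vanishes while the original one does not:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
m, n, k = 1000, 1200, 100
A = rng.standard_normal((m, n))
b = A @ np.abs(rng.standard_normal(n))        # the original LFP is feasible
T = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, m))

res = linprog(np.zeros(n), A_eq=T @ A, b_eq=T @ b, bounds=[(0, None)] * n)
print(np.linalg.norm(T @ A @ res.x - T @ b))  # ~0: solves the projected system
print(np.linalg.norm(A @ res.x - b))          # > 0 almost surely: no certificate
```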

28. Can we solve LPs by bisection?
– The projected certificate is infeasible in the original problem
– We only get an approximate optimal objective function value
– No bound on the error; no idea how large m, n should be
– Validated on "large enough" NetLib instances (with k ≈ 0.95 m); the bisection scheme is sketched below
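A sketch of the bisection scheme under these caveats; everything here (function names, the slack reformulation of c⊤x ≤ θ, the choice of k and of the initial bracket [lo, hi]) is our assumption, not the authors' code:

```python
import numpy as np
from scipy.optimize import linprog

def projected_feasible(A, b, k, rng):
    # Oracle: sample T componentwise from N(0, 1/sqrt(k)) and test
    # feasibility of { TAx = Tb, x >= 0 } with a zero-objective LP.
    T = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, A.shape[0]))
    res = linprog(np.zeros(A.shape[1]), A_eq=T @ A, b_eq=T @ b,
                  bounds=[(0, None)] * A.shape[1])
    return res.status == 0

def bisect_lp(c, A, b, k, lo, hi, tol=1e-3, seed=0):
    # min { c.x | Ax = b, x >= 0 } by bisection on the level theta:
    # append c.x + s = theta with a slack s >= 0, then project and test.
    rng = np.random.default_rng(seed)
    m, n = A.shape
    A_aug = np.block([[A, np.zeros((m, 1))],
                      [c.reshape(1, -1), np.ones((1, 1))]])
    while hi - lo > tol:
        theta = 0.5 * (lo + hi)
        if projected_feasible(A_aug, np.append(b, theta), k, rng):
            hi = theta                       # some (projected) point achieves theta
        else:
            lo = theta
    return 0.5 * (lo + hi)                   # approximate optimal value only
```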
