ECLIPSE: An Extreme-Scale Linear Program Solver for Web-Applications - PowerPoint PPT Presentation


  1. ECLIPSE: An Extreme-Scale Linear Program Solver for Web-Applications. Kinjal Basu (LinkedIn AI), Amol Ghoting (LinkedIn AI), Rahul Mazumder (MIT), Yao Pan (LinkedIn AI)

  2. Agenda: 1. Overview; 2. ECLIPSE: Extreme Scale LP Solver; 3. Applications; 4. System Architecture; 5. Experimental Results

  3. Overview

  4. Introduction: Large-scale linear programs (LPs) have several applications on the web.

  5. Problems of Extreme Scale: billions to trillions of variables
  - Ad-hoc solutions: splitting the problem into smaller sub-problems, with no guarantee of optimality.
  - Exploit the structure of the problem: solve a perturbation of the primal problem, giving a smooth gradient and efficient computation.

  6. Motivating Example: Friend or Connection Matching Problem
  - Maximize value, subject to: (a) total invites sent is greater than a threshold; (b) a limit on invitations per member, to prevent overwhelming members.
  - x_ij: probability of showing member j to member i; the objective and constraints use scores from a value model and an invitation model.
  - Scale: I members x J candidates gives n = I*J ~ 10^12 (1 trillion decision variables).

  7. General Framework

  min_x c^T x   s.t.   A x <= b,   x_i in C_i,  i in [I]

  - Users i, items j, and x_ij is the association between (i, j); n = I*J can range from 100s of millions to 10s of trillions.
  - The C_i are simple constraints (i.e., they allow efficient projections).
  - A stacks global constraints on top of item-level constraints:

    A = [ A^(1) ; A^(2) ]

    where A^(1) holds the global / cohort-level constraints (e.g., the total-invite constraint), and A^(2) holds the item-level constraints (e.g., limits on invitations per user) in block form:

    A^(2) = ( D_11  ...  D_1I
              ...        ...
              D_{m_2,1} ... D_{m_2,I} )
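To make the block structure concrete, here is a minimal NumPy sketch of the stacked constraint matrix for a toy instance; the sizes, thresholds, and the use of dense arrays are illustrative assumptions (a real instance would never materialize A densely):

```python
import numpy as np

# Toy sizes (assumptions): I = 3 users, J = 2 items each, so n = I*J = 6.
I, J = 3, 2
n = I * J

# A^(1): one global (cohort-level) constraint, e.g. total invites above a
# threshold, written as -sum(x) <= -threshold to keep the Ax <= b convention.
A1 = -np.ones((1, n))
b1 = np.array([-1.0])  # assumed threshold: at least 1 expected invite overall

# A^(2): item-level constraints with block structure: one row per user
# capping that user's expected invitations, i.e. blockdiag of 1xJ blocks D_i.
A2 = np.zeros((I, n))
for i in range(I):
    A2[i, i * J:(i + 1) * J] = np.ones(J)  # block D_i
b2 = np.full(I, 0.8)  # assumed per-user cap

# Stack into the full LP data: min c^T x  s.t.  A x <= b,  x_i in C_i = [0,1]^J.
A = np.vstack([A1, A2])
b = np.concatenate([b1, b2])
print(A.shape)  # (4, 6): m = m1 + m2 rows, far fewer rows than columns n
```

The point of the sketch is that the number of rows (constraints) stays tiny relative to n, which is exactly why a dual method with a short λ pays off.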

  8. ECLIPSE: Extreme Scale LP Solver

  9. Solving The Problem

  Primal LP:   P*_0 := min_x c^T x   s.t.   A x <= b,  x_i in C_i, i in [I]

  Old idea: perturbation of the LP (Mangasarian & Meyer '79; Nesterov '05; Osher et al. '11, ...)

  Primal QP:   P*_gamma := min_x c^T x + (gamma/2) x^T x   s.t.   A x <= b,  x_i in C_i, i in [I]

  Dualize:

  Dual QP:   g_gamma(lambda) := min_{x in Prod C_i} { c^T x + (gamma/2) x^T x + lambda^T (A x - b) }

  Solve the Dual QP:   g*_gamma := max_{lambda >= 0} g_gamma(lambda) = P*_gamma   (strong duality)

  Key observation: length(lambda) is small.

  10. Solving The Problem

  Primal:   P*_0 := min_x c^T x   s.t.   A x <= b,  x_i in C_i, i in [I]
            x*_gamma in argmin_x c^T x + (gamma/2) x^T x   s.t.   A x <= b,  x_i in C_i, i in [I]

  - Observation 1: Exact regularization (Mangasarian & Meyer '79; Friedlander & Tseng '08). There exists gamma_bar > 0 such that x*_gamma solves the LP for all gamma <= gamma_bar.

  Dual:   g_gamma(lambda) := min_{x in Prod C_i} { c^T x + (gamma/2) x^T x + lambda^T (A x - b) },   g*_gamma := max_{lambda >= 0} g_gamma(lambda)

  - Observation 2: Error bound (Nesterov '05). |g*_gamma - P*_0| = O(gamma).

  11. Solving The Problem:  g*_gamma = max_{lambda >= 0} g_gamma(lambda)

  ECLIPSE Algorithm: proximal-gradient-based methods (acceleration, restarts) with optimal convergence rates.
  - Observation 1: the dual objective is smooth (implicitly defined) [Nesterov '05]: lambda -> g_gamma(lambda) is O(1/gamma)-smooth.
  - Observation 2: gradient expression (Danskin's theorem):

    grad g_gamma(lambda) = A x_hat(lambda) - b,   where  x_hat(lambda) in argmin_{x in Prod C_i} { c^T x + (gamma/2) x^T x + lambda^T (A x - b) }

    has the closed form  x_hat_i(lambda) = Proj_{C_i}( -(1/gamma) (A^T lambda + c)_i ).
  - Key bottleneck: the matrix-vector multiplication; the projection operation itself is simple.
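The closed-form primal and the Danskin gradient can be sketched in a few lines of NumPy, under the simplifying assumption that each C_i is the box [0,1]^J (in general C_i only needs to admit an efficient projection); the data below is a toy assumption:

```python
import numpy as np

def primal_from_dual(A, c, lam, gamma):
    """x_hat(lambda): project -(A^T lam + c)/gamma onto the box [0,1]^n.
    Assumes every C_i is the box [0,1]^J, so the projection is a clip."""
    return np.clip(-(A.T @ lam + c) / gamma, 0.0, 1.0)

def dual_gradient(A, b, c, lam, gamma):
    """grad g_gamma(lambda) = A x_hat(lambda) - b (Danskin's theorem)."""
    return A @ primal_from_dual(A, c, lam, gamma) - b

# Tiny sanity check (toy data): 2 variables, 1 constraint x1 + x2 <= 1.
A = np.array([[1.0, 1.0]]); b = np.array([1.0]); c = np.array([-1.0, -2.0])
lam = np.zeros(1); gamma = 0.01
print(primal_from_dual(A, c, lam, gamma))  # [1. 1.] (both clipped to the cap)
print(dual_gradient(A, b, c, lam, gamma))  # [1.] (constraint violated by 1)
```

Note that the expensive parts are exactly A^T lam and A x_hat, the matrix-vector products the slide calls the key bottleneck; the clip is O(n).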

  12. Overall Algorithm
  Input: A, b, c, regularization gamma, initial dual lambda_0 >= 0.
  At iteration k:
  - Get primal:  x_hat(lambda_k)_i = Proj_{C_i}( -(1/gamma) (A^T lambda_k + c)_i )
  - Compute gradient:  grad g_gamma(lambda_k) = A x_hat(lambda_k) - b
  - Update dual for the next iteration:
    GD:  lambda_{k+1} = Proj_{lambda >= 0}( lambda_k + eta * grad g_gamma(lambda_k) )
    AGD: the same step, taken from a Nesterov-extrapolated point.
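A minimal single-machine sketch of this loop, using plain projected gradient ascent (GD) on the dual with box sets C_i = [0,1] and a fixed step size gamma/||A||^2 matching the O(1/gamma) smoothness; acceleration and restarts are omitted, and the problem data is a toy assumption:

```python
import numpy as np

def eclipse_gd(A, b, c, gamma, iters=2000):
    """Projected gradient ascent on the smooth dual g_gamma.
    Step size = gamma / ||A||_2^2, the inverse of the smoothness constant."""
    m, n = A.shape
    lam = np.zeros(m)
    step = gamma / np.linalg.norm(A, 2) ** 2
    for _ in range(iters):
        x_hat = np.clip(-(A.T @ lam + c) / gamma, 0.0, 1.0)  # primal
        grad = A @ x_hat - b                                  # gradient
        lam = np.maximum(lam + step * grad, 0.0)              # dual update
    return np.clip(-(A.T @ lam + c) / gamma, 0.0, 1.0), lam

# Toy LP: min -x1 - 2*x2  s.t.  x1 + x2 <= 1,  x in [0,1]^2.
# The LP optimum is the vertex x = (0, 1) with value -2; exact regularization
# says a small enough gamma recovers it from the perturbed QP.
A = np.array([[1.0, 1.0]]); b = np.array([1.0]); c = np.array([-1.0, -2.0])
x, lam = eclipse_gd(A, b, c, gamma=0.01)
print(np.round(x, 3))  # [0. 1.]
```

The iteration only ever touches lambda (length m, small) globally; everything indexed by n stays inside the primal/gradient computation, which is what makes the distributed version possible.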

  13. Applications

  14. Volume Optimization
  - Maximize sessions, subject to:
  - total number of emails / notifications bounded,
  - clicks above a threshold,
  - disablement below a threshold.
  Generalized from global systems to cohort-level and member-level systems.

  15. Multi-Objective Optimization
  - Maximize Metric 1
  - Metric 2 is greater than a minimum
  - Metric 3 is bounded
  - ...
  Covers most product applications:
  - engagement vs. revenue,
  - sessions vs. notification / email volume,
  - member value vs. annoyance.

  16. System Infrastructure

  17. System Architecture
  - Data is collected from different sources and restructured to form the input A, b, c.

  18. System Architecture
  - Data is collected from different sources and restructured to form the input A, b, c.
  - The solver is called and runs the overall iterations.
  - The data is split across multiple executors, which perform the matrix-vector multiplications in parallel.
  - The driver collects the dual and broadcasts it back to continue the iterations.

  19. System Architecture
  - Data is collected from different sources and restructured to form the input A, b, c.
  - The solver is called and runs the overall iterations.
  - The data is split across multiple executors, which perform the matrix-vector multiplications in parallel.
  - The driver collects the dual and broadcasts it back to continue the iterations.
  - On convergence, the final duals are returned and used in online serving.

  20. Detailed Spark Implementation
  - Data representation: a customized DistributedMatrix API built on the BlockMatrix API from Apache Spark MLlib, leveraging the diagonal structure, plus a DistributedVector API implemented using RDD[(index, Vector)].
  - Estimating the primal: component-wise matrix multiplications and projections are done in parallel; the overall complexity to get the primal is O(J).
  - Estimating the gradient: the most computationally expensive step, with worst-case complexity O(n), n = I*J. We cache A in the executors and broadcast the duals to minimize communication cost.
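The executor pattern can be mimicked on a single machine: partition the columns of A by user block, cache each partition with its slice of c, and have each "executor" turn the broadcast dual into its partial contribution to A x_hat. This is a toy stand-in for the Spark BlockMatrix/RDD implementation, not the actual API; all names and sizes are assumptions:

```python
import numpy as np

# Toy data (assumptions): m global constraints, I user blocks of J columns.
rng = np.random.default_rng(0)
m, I, J = 3, 4, 5
n = I * J
A = rng.random((m, n)); c = rng.random(n); lam = rng.random(m); gamma = 0.1

# "Executors": each caches its column block A_i and the matching slice of c.
partitions = [(A[:, i*J:(i+1)*J], c[i*J:(i+1)*J]) for i in range(I)]

def executor_contribution(A_i, c_i, lam):
    """Given the broadcast dual, compute this block's primal x_hat_i
    (a clip, since C_i = [0,1]^J here) and its partial product A_i x_hat_i."""
    x_i = np.clip(-(A_i.T @ lam + c_i) / gamma, 0.0, 1.0)
    return A_i @ x_i

# Driver: sum the partial products collected from all executors.
Ax_distributed = sum(executor_contribution(A_i, c_i, lam)
                     for A_i, c_i in partitions)

# Matches the single-machine computation.
x_full = np.clip(-(A.T @ lam + c) / gamma, 0.0, 1.0)
print(np.allclose(Ax_distributed, A @ x_full))  # True
```

Only the length-m dual travels driver-to-executor and only length-m partial sums travel back, which is why caching A and broadcasting lambda keeps communication cheap.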

  21. Experimental Results

  22. Comparative Results
  - We compare with the state-of-the-art (SOTA) technique of splitting the problem; please see the full paper for other comparisons.

  23. Real Data Results
  - Tested on large-scale volume optimization and matching problems.
  - Spark 2.3 with up to 800 executors.
  - The 1-trillion-variable use case converged within 12 hours.
  - SCS baseline: O'Donoghue et al. (2016).

  24. Key Takeaways

  25. Key Takeaways
  - A framework for solving structured LP problems arising in several applications in the internet industry.
  - Most multi-objective optimization problems can be framed through it.
  - Given the computational resources, we can scale to extremely large problems: up to 1 trillion variables on real data.

  26. Thank you
