  1. Between Discrete and Continuous Optimization: Submodularity & Optimization
  Stefanie Jegelka, MIT
  Simons Bootcamp, Aug 2017

  2. Submodularity
  set function F: 2^V → ℝ on subsets S ⊆ V
  • submodularity = "diminishing returns": ∀ S ⊆ T, a ∉ T:
  F(S ∪ {a}) − F(S) ≥ F(T ∪ {a}) − F(T)

  3. Submodularity
  set function F: 2^V → ℝ
  • diminishing returns: ∀ S ⊆ T, a ∉ T: F(S ∪ {a}) − F(S) ≥ F(T ∪ {a}) − F(T)
  • equivalent general definition: ∀ A, B ⊆ V: F(A) + F(B) ≥ F(A ∪ B) + F(A ∩ B)
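To make the two equivalent definitions concrete, here is a minimal brute-force check on a toy coverage function; the ground set and the covered-element sets below are made-up illustrative data, not from the talk.

```python
# Brute-force check of both submodularity definitions for a toy
# coverage function F(S) = |union of the sets indexed by S|.
from itertools import chain, combinations

areas = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1, 6}}
V = set(areas)

def F(S):
    return len(set().union(*(areas[e] for e in S))) if S else 0

def subsets(elems):
    s = list(elems)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

# Diminishing returns: F(S + a) - F(S) >= F(T + a) - F(T) for S ⊆ T, a ∉ T.
for T in map(set, subsets(V)):
    for S in map(set, subsets(T)):
        for a in V - T:
            assert F(S | {a}) - F(S) >= F(T | {a}) - F(T)

# Equivalent lattice form: F(A) + F(B) >= F(A ∪ B) + F(A ∩ B).
for A in map(set, subsets(V)):
    for B in map(set, subsets(V)):
        assert F(A) + F(B) >= F(A | B) + F(A & B)

print("coverage is submodular on this instance")
```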

  4. Why is this interesting? Importance of convex functions (Lovász, 1983):
  • "occur in many models in economy, engineering and other sciences", "often the only nontrivial property that can be stated in general"
  • preserved under many operations and transformations: larger effective range of results
  • sufficient structure for a "mathematically beautiful and practically useful theory"
  • efficient minimization
  "It is less apparent, but we claim and hope to prove to a certain extent, that a similar role is played in discrete optimization by submodular set-functions." […]


  5. Examples of submodular set functions • linear functions • discrete entropy • discrete mutual information • matrix rank functions • matroid rank functions (“combinatorial rank”) • coverage • diffusion in networks • volume (by log determinant) • graph cuts • …
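To pick one entry from this list, here is a quick numeric sanity check that volume via the log determinant is submodular; the random positive definite kernel below is an assumed toy instance, not from the talk.

```python
# Numeric check that F(S) = log det(K_S) is submodular for a positive
# definite kernel K (the "volume" example from the list above).
import numpy as np
from itertools import chain, combinations

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
K = A @ A.T + 4 * np.eye(4)          # well-conditioned positive definite matrix
V = range(4)

def F(S):
    idx = sorted(S)
    return np.linalg.slogdet(K[np.ix_(idx, idx)])[1] if idx else 0.0

def subsets(elems):
    s = list(elems)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

for S in map(set, subsets(V)):
    for T in map(set, subsets(V)):
        assert F(S) + F(T) >= F(S | T) + F(S & T) - 1e-9
print("log det is submodular on this instance")
```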

  6. 
 
 Roadmap • Optimizing submodular set functions: 
 discrete optimization via continuous optimization 
 • Submodularity more generally: 
 continuous optimization via discrete optimization 
 • Further connections

  7. 
 
 Roadmap • Optimizing submodular set functions 
 via continuous optimization 
 Key Question: 
 Submodularity = Discrete Convexity or Discrete Concavity? 
 (Lovász, Fujishige, Murota, …)

  8. Continuous extensions
  min_{S ⊆ V} F(S) ⇔ min_{x ∈ {0,1}^n} F(x)
  • LP relaxation? nonlinear cost function: exponentially many variables…
  instead: extend F: {0,1}^n → ℝ to f: [0,1]^n → ℝ

  9. Nonlinear extensions & optimization
  nonlinear extension/optimization: extend F: {0,1}^n → ℝ to f: [0,1]^n → ℝ
  min_{x ∈ C ⊆ {0,1}^n} F(x)  →  min_{z ∈ conv(C) ⊆ [0,1]^n} f(z)

  10. Generic construction
  extend F: {0,1}^n → ℝ to f: [0,1]^n → ℝ
  [figure: a discrete set T = {a, d} as the indicator vector (1, 0, 0, 1) next to a continuous vector z = (.5, .5, 0, .8) over coordinates a, b, c, d]
  • Define a probability measure over subsets (joint over coordinates) such that the marginals agree with z: P(i ∈ S) = z_i
  • Extension: f(z) = E[F(S)]
  • for discrete z: f(z) = F(z)

  11. Independent coordinates
  f(z) = E[F(S)] with P(S) = ∏_{i ∈ S} z_i · ∏_{j ∉ S} (1 − z_j)
  • f(z) is a multilinear polynomial: the multilinear extension
  • neither convex nor concave…
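A sketch of this product-measure construction: estimate f(z) = E[F(S)] by sampling each coordinate independently. The small function below is an assumed toy example; it happens to be multilinear itself, so plugging z in directly gives the exact extension value to compare against.

```python
# Monte Carlo estimate of the multilinear extension f(z) = E[F(S)],
# sampling each coordinate i independently with probability z_i.
import numpy as np

def F(x):
    a, b, c = x
    return 2*a + 3*b + c - a*b - 2*b*c   # submodular: cross terms ≤ 0

def multilinear(z, n_samples=100_000, seed=0):
    rng = np.random.default_rng(seed)
    X = (rng.random((n_samples, len(z))) < z).astype(float)
    return np.mean([F(x) for x in X])

z = np.array([0.5, 0.5, 0.8])
# Since this F is multilinear, f(z) = F(z) exactly; the estimate should match.
print(multilinear(z), "vs exact", F(z))
```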

  12. Lovász extension
  P(i ∈ S) = z_i, f(z) = E[F(S)]
  • "coupled" distribution defined by level sets: for z = (.5, .5, 0, .8) over a, b, c, d: S_0 = {}, S_1 = {d}, S_2 = {a, b, d}, S_3 = {a, b, c, d}
  • = Choquet integral of F
  Theorem (Lovász 1983): f(z) is convex iff F(S) is submodular.
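A short sketch of this coupled construction: draw a single shared threshold θ ~ Uniform[0,1] and take the level set S_θ = {i : z_i > θ}; sorting z turns E_θ[F(S_θ)] into an exact weighted sum over the chain of level sets. The coverage-style F and the vector z are assumed toy data.

```python
# Lovász extension via level sets: f(z) = E_θ[F({i : z_i > θ})], θ ~ U[0,1],
# computed exactly as a weighted sum over the sorted level sets of z.
import numpy as np

def F(S):
    areas = [{1, 2}, {2, 3}, {3, 4}, {1, 4, 5}]
    return len(set().union(*(areas[i] for i in S))) if S else 0

def lovasz(z, F):
    z = np.asarray(z, dtype=float)
    order = np.argsort(-z)                       # coordinates by decreasing z_i
    theta = np.concatenate(([1.0], z[order], [0.0]))
    # Level set S_j = top-j coordinates, active while θ ∈ (θ_{j+1}, θ_j].
    return sum((theta[j] - theta[j + 1]) * F(frozenset(order[:j].tolist()))
               for j in range(len(theta) - 1))

z = [0.5, 0.5, 0.0, 0.8]
print(lovasz(z, F))

# Monte Carlo cross-check with a shared threshold θ.
rng = np.random.default_rng(0)
draws = [F(frozenset(np.flatnonzero(np.array(z) > rng.random()).tolist()))
         for _ in range(50_000)]
print(np.mean(draws))
```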

  13. Convexity and subgradients
  if F is submodular (Edmonds 1971, Lovász 1983):
  f(z) = E[F(S)] = max_{s ∈ B_F} ⟨s, z⟩, where B_F is the base polytope of F
  [figure: the vector z = (.5, .5, 0, .8) and the base polytope of F]
  • can compute a subgradient of f(z) in O(n log n)
  • rounding: use one of the level sets of z*
  min_{z ∈ [0,1]^n} f(z) = min_{S ⊆ V} F(S): an exact convex relaxation!
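The O(n log n) subgradient is a vertex of B_F produced by Edmonds' greedy algorithm: sort the coordinates of z decreasingly and record marginal gains. Below is a sketch with a brute-force check of membership in the base polytope; the toy F is the same assumed coverage-style function as in the previous sketch.

```python
# Edmonds' greedy: with z sorted decreasingly, s_i = marginal gain of i.
# The result is a vertex s ∈ B_F maximizing ⟨s, z⟩, i.e. a subgradient of f at z.
import numpy as np
from itertools import chain, combinations

def F(S):
    areas = [{1, 2}, {2, 3}, {3, 4}, {1, 4, 5}]
    return len(set().union(*(areas[i] for i in S))) if S else 0

def greedy_subgradient(z, F):
    order = np.argsort(-np.asarray(z))
    s, prev, S = np.zeros(len(z)), 0.0, set()
    for i in order:
        S.add(int(i))
        val = F(S)
        s[i], prev = val - prev, val
    return s

z = np.array([0.5, 0.5, 0.0, 0.8])
s = greedy_subgradient(z, F)
print("f(z) = <s, z> =", s @ z)

# Membership in B_F: s(A) ≤ F(A) for every A ⊆ V, with equality at A = V.
n = len(z)
subs = chain.from_iterable(combinations(range(n), r) for r in range(n + 1))
assert all(s[list(A)].sum() <= F(set(A)) + 1e-9 for A in subs)
assert abs(s.sum() - F(set(range(n)))) < 1e-9
```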

  14. Submodular minimization: a brief overview
  min_{z ∈ [0,1]^n} f(z)
  convex optimization:
  • ellipsoid method (Grötschel-Lovász-Schrijver 81)
  • subgradient method (improved: Chakrabarty-Lee-Sidford-Wong 16)
  combinatorial optimization:
  • network-flow based (Schrijver 00, Iwata-Fleischer-Fujishige 01); O(n⁴ T + n⁵ log M) (Iwata 03), O(n⁶ + n⁵ T) (Orlin 09)
  convex + combinatorial:
  • cutting planes (Lee-Sidford-Wong 15): O(n² T log nM + n³ log^c nM), O(n³ T log² n + n⁴ log^c n)
  (T: time per evaluation-oracle call, M: bound on the function values)
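As a minimal sketch of the subgradient route (none of the faster methods cited above), the loop below runs projected subgradient descent on the Lovász extension over [0,1]^n and then rounds with the best level set; the toy cut-minus-modular instance and the step size are assumptions.

```python
# Unconstrained submodular minimization via the Lovász extension:
# projected subgradient descent on [0,1]^n, then round by a level set.
import numpy as np

EDGES = [(0, 1), (1, 2), (2, 3)]            # path graph (assumed toy instance)
U = np.array([3.0, -2.0, -1.0, 3.0])        # modular reward for chosen elements
n = 4

def F(S):
    cut = sum(1 for i, j in EDGES if (i in S) != (j in S))
    return cut - sum(U[i] for i in S)       # cut − modular: submodular

def greedy_subgradient(z):
    order = np.argsort(-z)
    s, prev, S = np.zeros(n), 0.0, set()
    for i in order:
        S.add(int(i))
        val = F(S)
        s[i], prev = val - prev, val
    return s

z = np.full(n, 0.5)
for t in range(1, 2001):
    z = np.clip(z - 0.1 / np.sqrt(t) * greedy_subgradient(z), 0.0, 1.0)

# Rounding: some level set of z is as good as f(z) (the relaxation is exact).
candidates = [set()] + [{i for i in range(n) if z[i] >= th} for th in np.unique(z)]
S_star = min(candidates, key=F)
print(S_star, F(S_star))                    # expect {0, 3} with value -4
```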

  15. How far does relaxation go?
  • strongly convex version: min_{z ∈ ℝ^n} f(z) + ½‖z‖²; dual: min_{s ∈ B_F} ½‖s‖²
  • Fujishige-Wolfe / minimum-norm point algorithm
  • actually solves parametric submodular minimization
  • But: no relaxation is tight for constrained minimization; typically hard to approximate
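The dual min-norm problem can be approached with plain Frank-Wolfe, because linear optimization over B_F is again Edmonds' greedy. This is a rough sketch only, not the Fujishige-Wolfe algorithm the slide names (which adds affine-hull "minor cycle" steps); the instance is the same assumed toy function as above.

```python
# Frank-Wolfe on min_{s ∈ B_F} ½‖s‖²; the LMO over B_F is Edmonds' greedy.
import numpy as np

EDGES = [(0, 1), (1, 2), (2, 3)]
U = np.array([3.0, -2.0, -1.0, 3.0])
n = 4

def F(S):
    cut = sum(1 for i, j in EDGES if (i in S) != (j in S))
    return cut - sum(U[i] for i in S)

def lmo(g):
    """argmin_{s ∈ B_F} ⟨g, s⟩: greedy over coordinates in increasing g order."""
    order = np.argsort(g)
    s, prev, S = np.zeros(n), 0.0, set()
    for i in order:
        S.add(int(i))
        val = F(S)
        s[i], prev = val - prev, val
    return s

s = lmo(np.zeros(n))
for t in range(1, 5001):
    v = lmo(s)                       # gradient of ½‖s‖² at s is s itself
    s += 2.0 / (t + 2.0) * (v - s)   # standard Frank-Wolfe step size

# Fujishige's theorem: {i : s*_i < 0} is the minimal minimizer of F.
print({i for i in range(n) if s[i] < 0}, s.round(3))
```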

  16. Submodular maximization (NP-hard)
  max_{|S| ≤ k} F(S) (*), max_{S ⊆ V} F(S)
  • simple cases ((*), monotone): discrete greedy algorithm is optimal (Nemhauser-Wolsey-Fisher 1978)
  • more complex cases (complicated constraints, non-monotone): continuous extension f: [0,1]^n → ℝ of F: {0,1}^n → ℝ + rounding
  the concave envelope is intractable, but …
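For the simple monotone case, a minimal sketch of that greedy: repeatedly add the element with the largest marginal gain, which achieves the optimal (1 − 1/e) factor under a cardinality constraint. The coverage instance is assumed toy data.

```python
# Greedy for max_{|S| ≤ k} F(S), F monotone submodular: (1 − 1/e)-approximation.
AREAS = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1, 6, 7}}

def F(S):
    return len(set().union(*(AREAS[e] for e in S))) if S else 0

def greedy(V, k):
    S = set()
    for _ in range(k):
        gain = lambda e: F(S | {e}) - F(S)      # marginal gain of adding e
        S.add(max((e for e in V if e not in S), key=gain))
    return S

print(greedy(["a", "b", "c", "d"], k=2))        # picks "a" then "c": covers 6
```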

  17. Independent coordinates
  f(z) = E[F(S)], P(S) = ∏_{i ∈ S} z_i · ∏_{j ∉ S} (1 − z_j)
  • ∂²f/∂x_i∂x_j ≤ 0 for all i, j
  • f(z) is concave in increasing directions (diminishing returns)
  • f(z) is convex in "swap" directions
  • continuous maximization (monotone) works despite nonconvexity! (Calinescu-Chekuri-Pál-Vondrák 2007, Feldman-Naor-Schwartz 2011, …, Hassani-Soltanolkotabi-Karbasi 2017, …)
  • similar approach for non-monotone functions (Buchbinder-Naor-Feldman 2012, …)

  18. "Continuous greedy" as Frank-Wolfe
  Initialize z_0 = 0; for t = 1, …, T:
  s_t ∈ argmax_{s ∈ P} ⟨s, ∇f(z_t)⟩
  z_{t+1} = z_t + α_t s_t
  • concavity in positive directions: for all z ∈ [0,1]^n there is a v ∈ P with ⟨v, ∇f(z)⟩ ≥ OPT − f(z)
  • Analysis:
  f(z_{t+1}) ≥ f(z_t) + α ⟨s_t, ∇f(z_t)⟩ − (C/2) α²
  ≥ f(z_t) + α [OPT − f(z_t)] − (C/2) α²
  ⇒ OPT − f(z_{t+1}) ≤ (1 − α) [OPT − f(z_t)] + (C/2) α²
  • with α = 1/T: f(z_T) ≥ (1 − (1 − 1/T)^T) OPT − C/(2T) ≥ (1 − 1/e) OPT − C/(2T)
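A compact sketch of this loop for a monotone coverage function over the cardinality polytope P = {z ∈ [0,1]^n : Σ_i z_i ≤ k}: the gradient of the multilinear extension is estimated by sampling, the LMO over P just takes the top-k coordinates, and α = 1/T keeps z feasible. Sample sizes, the instance, and the final rounding are assumptions.

```python
# Continuous greedy (Frank-Wolfe) on the multilinear extension of a coverage F.
import numpy as np

AREAS = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6, 7}]
n, k, T = 4, 2, 40
rng = np.random.default_rng(0)

def F(x):
    S = [i for i in range(n) if x[i] > 0.5]
    return len(set().union(*(AREAS[i] for i in S))) if S else 0

def grad_estimate(z, samples=300):
    """∂f/∂z_i = E[F(S ∪ {i}) − F(S − {i})], S sampled from the product measure."""
    g = np.zeros(n)
    for _ in range(samples):
        x = (rng.random(n) < z).astype(float)
        for i in range(n):
            hi, lo = x.copy(), x.copy()
            hi[i], lo[i] = 1.0, 0.0
            g[i] += F(hi) - F(lo)
    return g / samples

z = np.zeros(n)
for t in range(T):
    g = grad_estimate(z)
    s = np.zeros(n)
    s[np.argsort(-g)[:k]] = 1.0      # LMO over {z ∈ [0,1]^n : Σ z_i ≤ k}
    z += s / T                       # α = 1/T keeps z ∈ [0,1]^n

print(z, set(np.argsort(-z)[:k].tolist()))   # crude rounding; pipage/swap in general
```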

  19. Binary / set function optimization
  Minimization: Lovász extension, convexity, exact convex relaxation. But: constrained minimization is hard.
  Maximization: multilinear extension, diminishing returns, NP-hard. But: constant-factor approximations even with constraints.

  20. 
 
 Roadmap • Optimizing submodular set functions: 
 discrete optimization via continuous optimization 
 • Submodularity more generally: 
 continuous optimization via discrete optimization 
 • Further connections

  21. Submodularity beyond sets
  • sets: for all subsets A, B ⊆ V: F(A) + F(B) ≥ F(A ∪ B) + F(A ∩ B)
  • replace sets by vectors: F(x) + F(y) ≥ F(x ∨ y) + F(x ∧ y)
  • or: all off-diagonal entries of the Hessian are ≤ 0 (Topkis 1978): ∂²F/∂x_i∂x_j ≤ 0 for i ≠ j

  22. Examples
  F(x) + F(y) ≥ F(x ∨ y) + F(x ∧ y), ∂²F/∂x_i∂x_j ≤ 0
  a submodular function can be convex, concave, or neither!
  • any separable function F(x) = ∑_{i=1}^n F_i(x_i)
  • F(x) = g(x_i − x_j) for convex g
  • F(x) = h(∑_i x_i) for concave h
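A quick numeric check of the vector inequality for the last two examples, with the assumed choices g(u) = u² (convex) and h(u) = √u (concave) on random test points.

```python
# Check F(x) + F(y) ≥ F(x ∨ y) + F(x ∧ y) on random points.
import numpy as np

rng = np.random.default_rng(0)

def check(F, trials=10_000, n=4):
    for _ in range(trials):
        x, y = rng.random(n), rng.random(n)
        hi, lo = np.maximum(x, y), np.minimum(x, y)   # x ∨ y, x ∧ y
        assert F(x) + F(y) >= F(hi) + F(lo) - 1e-9

check(lambda x: (x[0] - x[1]) ** 2)   # convex g of a difference: submodular
check(lambda x: np.sqrt(x.sum()))     # concave h of a sum: submodular
print("both examples pass")
```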

  23. Maximization
  • General case: diminishing returns is stronger than submodularity
  • DR-submodular function: ∂²F/∂x_i∂x_j ≤ 0 for all i, j
  • with DR, many results generalize (including "continuous greedy")
  (Kapralov-Post-Vondrák 2010, Soma et al. 2014-15, Ene & Nguyen 2016, Bian et al. 2016, Gottschalk & Peis 2016)

  24. Minimization
  • discretize continuous functions: factor O(1/ε)
  • Option I: transform into set function optimization (Birkhoff 1937, Schrijver 2000, Orlin 2007); better for DR-submodular (Ene & Nguyen 2016)
  • Option II: convex extension for integer submodular functions (Bach 2015)

  25. Convex extension
  • Set functions: efficient minimization via the convex extension, F: {0,1}^n → ℝ, f: [0,1]^n → ℝ, f(z) = E[F(S)]
  [figure: a binary vector, e.g. (1, 0, 0, 1), next to a fractional z = (.5, .5, 0, .8)]
  • Integer vectors: F: {0, …, k}^n → ℝ, distribution over {0, …, k} for each coordinate: f(z) = E[F(x)]
  [figure: an integer vector, e.g. (1, 4, 0, 2)]

  26. Applications
  • robust optimization of bipartite influences (Staib-Jegelka 2017): max_{y ∈ B} min_{p ∈ P} I(y; p)
  • non-convex isotonic regression (Bach 2017): min_{x ∈ [0,1]^n} ∑_{i=1}^n G(x_i − z_i) s.t. x_i ≥ x_j ∀ (i, j) ∈ E

  27. 
 
 Roadmap • Optimizing submodular set functions: 
 discrete optimization via continuous optimization 
 • Submodularity more generally: 
 continuous optimization via discrete optimization 
 • Further connections

  28. Log-sub/supermodular distributions
  P(S) ∝ exp(F(S)), P(x) ∝ exp(F(x))
  • −F(S) submodular: multivariate totally positive, FKG lattice condition
  • implies positive association: for all monotonically increasing G, H:
  E[G(S) H(S)] ≥ E[G(S)] E[H(S)]
  • F(S) submodular?
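A brute-force illustration of positive association on an assumed toy model: attractive pairwise couplings make F supermodular (so −F is submodular) and P ∝ exp(F) satisfies the FKG lattice condition; the monotone test functions G, H are also assumed examples.

```python
# Positive association for P(S) ∝ exp(F(S)) with F supermodular,
# checked by exhaustive enumeration over all subsets of a 4-element ground set.
from itertools import combinations
import math

n = 4
PAIRS = {(0, 1): 1.0, (1, 2): 0.5, (2, 3): 1.5}   # attractive couplings

def F(S):
    return sum(w for (i, j), w in PAIRS.items() if i in S and j in S)

subsets = [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]
Z = sum(math.exp(F(S)) for S in subsets)
P = {S: math.exp(F(S)) / Z for S in subsets}

def E(G):
    return sum(P[S] * G(S) for S in subsets)

G = lambda S: float(len(S))                    # monotone increasing
H = lambda S: float(0 in S) + float(3 in S)    # monotone increasing

assert E(lambda S: G(S) * H(S)) >= E(G) * E(H)
print(E(lambda S: G(S) * H(S)), ">=", E(G) * E(H))
```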

  29. Negative association and stable polynomials
  • a sub-class satisfies negative association: for all monotonically increasing G, H with disjoint support:
  E[G(S) H(S)] ≤ E[G(S)] E[H(S)]
  • sufficient condition (implies even conditional negative association): the generating polynomial
  q(z) = ∑_{S ⊆ V} P(S) ∏_{i ∈ S} z_i, z ∈ ℂ^n,
  should be real stable: Strongly Rayleigh measures (Borcea, Brändén, Liggett 2009)

