
Optimal approximation for unconstrained non-submodular minimization - PowerPoint PPT Presentation



  1. Optimal approximation for unconstrained non-submodular minimization. Marwa El Halabi, Stefanie Jegelka (CSAIL, MIT). ICML 2020.

  2. Set function minimization. Goal: select a collection S of items in V that minimizes the cost H(S).

  3. Set function minimization in machine learning. [Figures: structured sparse learning (linear model y = A x♮ + ε) and batch Bayesian optimization; from Mairal et al., 2010 and Krause et al., 2008.]

  4-7. Set function minimization. Ground set V = {1, ..., d}, set function H : 2^V → ℝ; solve min_{S ⊆ V} H(S).
  ◮ Assume: H(∅) = 0, and a black-box oracle to evaluate H.
  ◮ NP-hard to approximate in general.
  ◮ Submodularity helps, via the diminishing returns (DR) property: H(A ∪ {i}) − H(A) ≥ H(B ∪ {i}) − H(B) for all A ⊆ B and i ∉ B.
  ◮ Submodularity enables efficient minimization; a toy example follows.
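
As a toy illustration (not on the slides), a coverage function is a standard submodular example, so it satisfies the DR inequality with α = 1; the sets C_i below are invented for the demo:

```python
# Toy coverage function H(S) = |union of C_i over i in S|: submodular,
# so the DR inequality holds with alpha = 1. The C_i are invented.
C = {0: {1, 2}, 1: {2, 3}, 2: {3, 4, 5}}

def H(S):
    return len(set().union(*(C[i] for i in S))) if S else 0

# DR check for A = {0} subset of B = {0, 1}, adding i = 2:
assert H({0, 2}) - H({0}) >= H({0, 1, 2}) - H({0, 1})  # 3 >= 2
```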

  8-9. Set function minimization in machine learning. [Figures: structured sparse learning and Bayesian optimization; from Mairal et al., 2010 and Krause et al., 2008.] In these applications, H is not submodular, but it is "close" ...

  10-11. Approximately submodular functions. What if the objective is not submodular, but "close"?
  ◮ Several works study non-submodular maximization [Das and Kempe, 2011; Bian et al., 2017; Kuhnle et al., 2018; Horel and Singer, 2016; Hassidim and Singer, 2018].
  ◮ For minimization, only the constrained non-submodular setting has been studied [Wang et al., 2019; Bai et al., 2016; Qian et al., 2017; Sviridenko et al., 2017].

  12. Approximately submodular functions. Can submodular minimization algorithms extend to such non-submodular functions?

  13. Overview of main results. Can submodular minimization algorithms extend to such non-submodular functions? Yes!
  ◮ First approximation guarantee for this problem.
  ◮ An efficient, simple algorithm: the projected subgradient method.
  ◮ Extension to the noisy oracle setting.
  ◮ A matching lower bound showing optimality.

  14-17. Weakly DR-submodular functions. H is α-weakly DR-submodular [Lehmann et al., 2006], with α > 0, if
  H(A ∪ {i}) − H(A) ≥ α ( H(B ∪ {i}) − H(B) ) for all A ⊆ B and i ∉ B.
  ◮ H is submodular ⇒ α = 1.
  ◮ Caveat: H should be monotone. If H is non-decreasing (H(A) ≤ H(B) whenever A ⊆ B), then α ≤ 1; if H is non-increasing (H(A) ≥ H(B)), then α ≥ 1.
  A brute-force check of α is sketched below.
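
To make the constant concrete, here is a brute-force sketch (my illustration, not from the talk) that computes the largest valid α on a small ground set; it assumes H is non-decreasing and runs in exponential time, so it is only a sanity check for tiny d:

```python
from itertools import combinations

def weak_dr_alpha(H, d):
    """Largest alpha such that H(A + {i}) - H(A) >= alpha * (H(B + {i}) - H(B))
    for all A subset of B and i not in B. Brute force over all such pairs:
    exponential time, so tiny d only. Assumes H is non-decreasing, so every
    marginal gain is nonnegative."""
    alpha = float("inf")
    for r in range(d + 1):
        for B in combinations(range(d), r):
            for i in set(range(d)) - set(B):
                denom = H(set(B) | {i}) - H(set(B))
                if denom <= 0:
                    continue  # a zero B-marginal constrains no alpha here
                for q in range(r + 1):
                    for A in combinations(B, q):
                        num = H(set(A) | {i}) - H(set(A))
                        alpha = min(alpha, num / denom)
    return alpha  # 1.0 for submodular H; smaller values measure the violation
```

On the toy coverage function above, weak_dr_alpha(H, 3) returns 1.0, matching "H is submodular ⇒ α = 1".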

  18. Problem set-up. min_{S ⊆ V} H(S) := F(S) − G(S), where
  ◮ F and G are both non-decreasing,
  ◮ F is α-weakly DR-submodular,
  ◮ G is β-weakly DR-supermodular, i.e., G(B ∪ {i}) − G(B) ≥ β ( G(A ∪ {i}) − G(A) ) for all A ⊆ B and i ∉ B,
  ◮ F(∅) = G(∅) = 0.

  19. What set functions have this form? min_{S ⊆ V} H(S) := F(S) − G(S). Objectives in several applications take this form: structured sparse learning, variance reduction in Bayesian optimization, Bayesian A-optimality in experimental design [Bian et al., 2017], and column subset selection [Sviridenko et al., 2017].

  20. What set functions have this form? min_{S ⊆ V} H(S) := F(S) − G(S). Decomposition result: given any set function H and any α, β ∈ (0, 1] with αβ < 1, we can write H(S) = F(S) − G(S), where
  ◮ F is non-decreasing and α-weakly DR-submodular,
  ◮ G is non-decreasing and β-weakly DR-supermodular.

  21-22. Submodular function minimization. min_{S ⊆ V} H(S) = min_{s ∈ [0,1]^d} h_L(s) (with |V| = d), where h_L is the Lovász extension of H.
  ◮ H is submodular ⇔ its Lovász extension is convex [Lovász, 1983].
  ◮ Subgradients are easy to compute [Edmonds, 2003]: sorting + d function evaluations of H; see the sketch below.
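
Here is a minimal sketch of that computation (Edmonds' greedy algorithm), assuming H takes a Python set and H(∅) = 0; the function name is mine:

```python
import numpy as np

def lovasz_subgradient(H, s):
    """Subgradient of the Lovász extension h_L of H at s in [0,1]^d:
    sort the coordinates of s in decreasing order, then take the marginal
    gains of H along the resulting chain of sets (d evaluations of H)."""
    order = np.argsort(-np.asarray(s, dtype=float))  # decreasing order of s
    kappa = np.zeros(len(s))
    chain, prev = set(), 0.0                         # H(empty set) = 0
    for i in order:
        chain = chain | {int(i)}
        cur = H(chain)
        kappa[int(i)] = cur - prev                   # marginal gain of adding i
        prev = cur
    return kappa
```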

  23-25. Non-submodular function minimization. Can we use the same strategy? Not exactly: the Lovász extension h_L is no longer convex. Almost, though. Write min_{S ⊆ V} H(S) := F(S) − G(S) = min_{s ∈ [0,1]^d} h_L(s), with h_L(s) := f_L(s) − g_L(s).
  Main result:
  ◮ An approximate subgradient κ is easy to compute (it equals the subgradient from the submodular case): (1/α) f_L(s′) − β g_L(s′) ≥ h_L(s) + ⟨κ, s′ − s⟩ for all s′ ∈ [0, 1]^d.
  ◮ H approximately submodular ⇒ h_L is approximately convex.

  26-29. Projected subgradient method (PGM). Apply s_{t+1} = Π_{[0,1]^d}(s_t − η κ_t) (PGM) to min_{S ⊆ V} H(S) := F(S) − G(S), where κ_t is an approximate subgradient of h_L at s_t.
  ✓ PGM does not need to know α, β, F, or G; it only queries H.
  Approximation guarantee: after T iterations of PGM + rounding, we obtain H(Ŝ) ≤ (1/α) F(S*) − β G(S*) + O(1/√T), where S* is an optimal solution.
  ✓ The result extends to the noisy oracle setting, where evaluations Ĥ satisfy P( |Ĥ(S) − H(S)| ≤ ε ) ≥ 1 − δ.
  A sketch of PGM with rounding follows.
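
A minimal sketch of PGM with rounding, reusing lovasz_subgradient from above; the constant step size η, the iteration budget T, and rounding each iterate to its best superlevel set are illustrative choices rather than the paper's exact schedule:

```python
import numpy as np

def pgm_minimize(H, d, T=300, eta=0.1):
    """Projected subgradient method on [0,1]^d for min_S H(S), rounding
    each iterate to its best superlevel set. Only queries H itself."""
    s = np.full(d, 0.5)
    best_S, best_val = set(), 0.0                # start from H(empty set) = 0
    for _ in range(T):
        kappa = lovasz_subgradient(H, s)         # approximate subgradient
        s = np.clip(s - eta * kappa, 0.0, 1.0)   # projection onto the box
        for theta in np.unique(s):               # candidate superlevel sets
            S = {int(i) for i in np.flatnonzero(s >= theta)}
            val = H(S)
            if val < best_val:
                best_S, best_val = S, val
    return best_S, best_val
```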

  30. Can we do better? General set function minimization (in the value oracle model): min_{S ⊆ V} H(S) := F(S) − G(S). Inapproximability result: for any δ > 0, no (deterministic or randomized) algorithm achieves E[H(Ŝ)] ≤ (1/α) F(S*) − β G(S*) − δ with fewer than exponentially many queries.

  31. Experiment: structured sparse learning. Problem: learn x♮ ∈ ℝ^d, whose support is an interval, from noisy linear Gaussian measurements y = A x♮ + ε, with A ∈ ℝ^{n×d}. Objective: min_{S ⊆ V} H(S) := λ F(S) − G(S).
  ◮ Regularizer: F(S) = d + max(S) − min(S) for S ≠ ∅, with F(∅) = 0; here α = 1.
  ◮ Loss term: G(S) = ℓ(0) − min_{supp(x) ⊆ S} ℓ(x), where ℓ is the least-squares loss; G is β-weakly DR-supermodular with β > 0.
  A sketch of this objective follows.
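
Below is a minimal sketch of this experimental objective, assuming ℓ(x) = ½‖y − Ax‖², so that the inner minimization min_{supp(x) ⊆ S} ℓ(x) is a least-squares fit on the columns indexed by S; make_objective and the λ default are placeholders of mine:

```python
import numpy as np

def make_objective(A, y, lam=0.1):
    """H(S) = lam * F(S) - G(S) for the structured sparsity experiment:
    F is the interval regularizer from the slide, and G is the gain of the
    least-squares loss over vectors x supported on S."""
    n, d = A.shape
    loss0 = 0.5 * float(y @ y)                    # loss at x = 0
    def F(S):
        return d + max(S) - min(S) if S else 0.0
    def G(S):
        if not S:
            return 0.0
        cols = sorted(S)
        x, *_ = np.linalg.lstsq(A[:, cols], y, rcond=None)
        resid = y - A[:, cols] @ x
        return loss0 - 0.5 * float(resid @ resid)
    return lambda S: lam * F(S) - G(S)

# Hypothetical usage with the PGM sketch above:
#   H = make_objective(A, y); S_hat, val = pgm_minimize(H, d=A.shape[1])
```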
