Refined bounds for algorithm configuration: The knife-edge of dual class approximability

Nina Balcan, Tuomas Sandholm, Ellen Vitercik


  1. Refined bounds for algorithm configuration: The knife-edge of dual class approximability Nina Balcan, Tuomas Sandholm, Ellen Vitercik

  2. Algorithms typically come with many tunable parameters, which have a significant impact on runtime, solution quality, and more. Hand-tuning them is time-consuming, tedious, and error-prone.

  3. Automated algorithm configuration. Goal: automate algorithm configuration via machine learning, i.e., algorithmically find good parameter settings using a training set of “typical” inputs from the application at hand.

  4. Automated configuration procedure: 1. Fix a parameterized algorithm (e.g., CPLEX). 2. Receive a training set 𝒯 of “typical” problem instances drawn from an unknown distribution. 3. Return a parameter setting with good average performance (runtime, solution quality, memory usage, etc.) over 𝒯.
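As an illustration, the configuration procedure above can be sketched as a simple grid search over candidate parameters. The quadratic `utility` function, the parameter grid, and the Gaussian instance distribution are toy assumptions for the sketch, not part of the talk:

```python
import random

# Toy "parameterized algorithm": utility of parameter s on instance y.
# In practice g_s(y) would be, e.g., the (negated) runtime of CPLEX run
# with parameter s on integer program y; this quadratic is an assumption.
def utility(s, y):
    return 1.0 - (s - y) ** 2

def configure(instances, candidate_params):
    """Return the candidate parameter with the best average utility
    over the training set of instances (step 3 of the procedure)."""
    return max(
        candidate_params,
        key=lambda s: sum(utility(s, y) for y in instances) / len(instances),
    )

random.seed(0)
train = [random.gauss(0.5, 0.1) for _ in range(50)]  # "typical" inputs
grid = [i / 100 for i in range(101)]                 # parameter grid in [0, 1]
best = configure(train, grid)
```

For this toy utility, the best parameter is pulled toward the mean of the training instances; the open question the talk addresses is whether `best` also performs well in expectation over unseen instances.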

  5. Automated configuration procedure. The training instances are seen; future problem instances are unseen. Key question (focus of talk): will the returned parameters have good expected performance on unseen instances?

  6. Overview of main result. Key question (focus of talk): will the returned parameters have good expected performance? “Yes” when algorithmic performance, as a function of the parameters, can be approximated by a simple function. [Figure: algorithmic performance g*(s) and a simple approximating function h*(s), plotted against the parameter s.]

  7. Overview of main result. We observe this structure, e.g., in integer programming algorithm configuration.

  8. Overview of main result: a dichotomy. If the approximation holds under the L∞-norm, i.e., sup_s |g*(s) − h*(s)| is small, we provide strong guarantees.

  9. Overview of main result: a dichotomy. If the approximation holds under the L∞-norm, i.e., sup_s |g*(s) − h*(s)| is small, we provide strong guarantees. If the approximation only holds under the L_q-norm for q < ∞, i.e., only (∫ |g*(s) − h*(s)|^q ds)^{1/q} is small, it is not possible to provide strong guarantees in the worst case.
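A toy example of why the choice of norm is a knife-edge: a performance function with a single narrow spike (assumed here purely for illustration, not the paper's construction) is extremely close to a constant function in the L1 sense, yet the sup-distance stays 1:

```python
# Toy illustration: a spike of height 1 and width W makes the L1 distance
# tiny while the L-infinity (sup) distance stays 1.
W = 1e-4  # spike width (assumed)

def g_star(s):
    # "performance": 0 everywhere except a narrow spike around s = 0.5
    return 1.0 if abs(s - 0.5) < W / 2 else 0.0

def h_star(s):
    # simple approximating function that misses the spike entirely
    return 0.0

l1_error = W * 1.0                          # ∫|g* − h*| ds = width × height
sup_error = abs(g_star(0.5) - h_star(0.5))  # the spike alone shows sup = 1
```

Note also that a uniform grid of samples would likely miss the spike entirely, which is the intuition for why L_q-closeness alone cannot certify generalization.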

  10. Model

  11. Model. 𝒴: set of all inputs (e.g., integer programs). ℝ^d: set of all parameter settings (e.g., CPLEX parameters). Standard assumption: there is an unknown distribution 𝒟 over inputs, e.g., representing the scheduling problem an airline solves day-to-day.

  12. “Algorithmic performance”: g_𝒔(y) = utility of the algorithm parameterized by 𝒔 ∈ ℝ^d on input y, e.g., runtime, solution quality, memory usage, etc. Assume g_𝒔(y) ∈ [−1, 1]; this can be generalized to g_𝒔(y) ∈ [−H, H].

  13. Generalization bounds

  14.–16. Generalization bounds. Key question: for any parameter setting 𝒔, does good average utility on the training set imply good expected utility? Formally: given samples y_1, …, y_N ~ 𝒟, for any 𝒔, how small is

    |(1/N) ∑_{i=1}^N g_𝒔(y_i) − E_{y∼𝒟}[g_𝒔(y)]| ?

The first term is the empirical average utility; the second is the expected utility. Typically, this question is answered by bounding the intrinsic complexity of the class ℱ = {g_𝒔 : 𝒔 ∈ ℝ^d}.
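The gap between empirical average utility and expected utility can be simulated directly. The quadratic utility and the uniform instance distribution are toy assumptions chosen so the expectation has a closed form:

```python
import random

random.seed(1)

def g(s, y):
    # toy utility (assumed), bounded when s, y lie in [0, 1]
    return 1.0 - (s - y) ** 2

s = 0.4
# Instances y ~ Uniform[0, 1], so the expected utility has a closed form:
# E[1 - (s - y)^2] = 1 - (s^2 - s + 1/3).
expected = 1.0 - (s * s - s + 1.0 / 3.0)

def gap(n):
    """|empirical average utility - expected utility| on n fresh samples."""
    sample = [random.random() for _ in range(n)]
    empirical = sum(g(s, y) for y in sample) / n
    return abs(empirical - expected)

small_n_gap, large_n_gap = gap(10), gap(100_000)
```

As the training set grows, the gap typically shrinks at the O(1/√N) rate that generalization bounds formalize uniformly over all parameter settings.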

  17. Generalization bounds. Challenge: the class ℱ = {g_𝒔 : 𝒴 → ℝ | 𝒔 ∈ ℝ^d} is gnarly. E.g., in integer programming algorithm configuration: • Each domain element is an IP • It is unclear how to plot or visualize the functions g_𝒔 • There are no obvious notions of Lipschitzness or smoothness to rely on

  18. Dual functions

  19. Dual classes. g_𝒔(y) = utility of the algorithm parameterized by 𝒔 ∈ ℝ^d on input y. “Primal” function class: ℱ = {g_𝒔 : 𝒴 → ℝ | 𝒔 ∈ ℝ^d}. The dual function g*_y(𝒔) = g_𝒔(y) is the utility as a function of the parameters; the “dual” function class is ℱ* = {g*_y : ℝ^d → ℝ | y ∈ 𝒴}. • Dual functions have a simple, Euclidean domain • They often have ample structure we can use to bound the complexity of ℱ
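The primal and dual classes are just the two ways of slicing the same two-argument utility. A minimal sketch, with the quadratic utility as an assumed stand-in:

```python
# Primal vs. dual view of one utility function g(s, y) (toy, assumed).
def g(s, y):
    return 1.0 - (s - y) ** 2

def primal(s):
    # Fix a parameter s: a function of the instance y, with gnarly domain 𝒴.
    return lambda y: g(s, y)

def dual(y):
    # Fix an instance y: a function of the parameter s, with Euclidean domain.
    return lambda s: g(s, y)

# Both are slices of the same g: primal(s)(y) == dual(y)(s) for all s, y.
check = primal(0.3)(0.8) == dual(0.8)(0.3)
```

The dual slice `dual(y)` is a plain real-valued function of a real parameter, which is what makes its structure (here, a downward parabola) easy to analyze.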

  20. Dual function approximability. Let ℱ = {g_𝒔 : 𝒔 ∈ ℝ^d} and 𝒣 = {h_𝒔 : 𝒔 ∈ ℝ^d} be sets of functions mapping 𝒴 to ℝ. The dual class 𝒣* (δ, q)-approximates ℱ* if for all y ∈ 𝒴,

    ‖g*_y − h*_y‖_q = (∫_{ℝ^d} |g*_y(𝒔) − h*_y(𝒔)|^q d𝒔)^{1/q} ≤ δ.
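For one-dimensional parameters, the L_q and L∞ distances in the definition above can be estimated numerically. The two dual functions below are assumed toy examples (a piecewise-linear performance curve and a smooth approximation), and the Riemann-sum integration is a sketch, not the paper's machinery:

```python
# Toy dual functions on the parameter interval [0, 1] (assumed).
def g_star(s):
    return abs(s - 0.5)           # piecewise-linear "performance"

def h_star(s):
    return (s - 0.5) ** 2 + 0.2   # smooth approximating function

def lq_error(f, h, q, lo=0.0, hi=1.0, n=10_000):
    """Approximate (∫ |f − h|^q ds)^(1/q) with a Riemann sum."""
    ds = (hi - lo) / n
    total = sum(abs(f(lo + i * ds) - h(lo + i * ds)) ** q for i in range(n)) * ds
    return total ** (1.0 / q)

def linf_error(f, h, lo=0.0, hi=1.0, n=10_000):
    """Approximate sup |f − h| on a grid."""
    ds = (hi - lo) / n
    return max(abs(f(lo + i * ds) - h(lo + i * ds)) for i in range(n + 1))

l1 = lq_error(g_star, h_star, 1)
l2 = lq_error(g_star, h_star, 2)
linf = linf_error(g_star, h_star)
```

On an interval of length 1 the errors are ordered L1 ≤ L2 ≤ L∞, which is why the L∞ requirement is the strictest form of (δ, q)-approximability.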

  21. Main result: Upper bound

  22.–24. Generalization upper bound. Let ℱ = {g_𝒔 : 𝒔 ∈ ℝ^d} and 𝒣 = {h_𝒔 : 𝒔 ∈ ℝ^d} be sets of functions mapping 𝒴 to ℝ. With high probability over the draw of the training set 𝒯 ~ 𝒟^N, for any 𝒔,

    |(1/N) ∑_{y∈𝒯} g_𝒔(y) − E_{y∼𝒟}[g_𝒔(y)]| ≤ (1/N) ∑_{y∈𝒯} ‖g*_y − h*_y‖_∞ + ℜ_𝒯(𝒣) + Õ(1/√N),

where the left-hand side compares the average utility over the training set with the expected utility, and ℜ_𝒯(𝒣) is the empirical Rademacher complexity of 𝒣. If 𝒣 is not too complex and 𝒣* (δ, ∞)-approximates ℱ*, the bound approaches δ as N → ∞.

  25. Main result: Lower bound

  26. Lower bound. For any δ and q < ∞, there exist function classes ℱ and 𝒣 such that: • The dual class 𝒣* (δ, q)-approximates ℱ* • 𝒣 is very simple (its Rademacher complexity is 0) • ℱ is very complex (its Rademacher complexity is maximal) • It is not possible to provide generalization bounds in the worst case

  27. Experiments

  28. Experiments: Integer programming. We tune integer programming solver parameters, a setting also studied by Balcan, Dick, Sandholm, Vitercik [ICML’18], on distributions over combinatorial auction IPs [Leyton-Brown, Pearson, Shoham, EC’00]. [Figure: generalization error vs. number of training instances; our bound compared with the bound of BDSV’18.]

  29. Conclusion

  30. Conclusion • Provided generalization bounds for algorithm configuration • They apply whenever utility, as a function of the parameters, is “approximately simple” • The connection between learnability and approximability is balanced on a knife-edge: if the approximation holds under the L∞-norm, we can provide strong bounds; if it holds only under the L_q-norm for q < ∞, it is not possible to provide bounds in the worst case • Experiments demonstrate the strength of these bounds
