Bandit Optimisation with Approximations

Bandit Optimisation with Approximations Kirthevasan Kandasamy - PowerPoint PPT Presentation

Bandit Optimisation with Approximations. Kirthevasan Kandasamy, Carnegie Mellon University. Ecole Polytechnique, Paris, April 27, 2017. Slides: www.cs.cmu.edu/~kkandasa/misc/ecole-slides.pdf


  1. Multi-fidelity Bandit Optimisation in 2 Fidelities (1 Approximation) (Kandasamy et al., NIPS 2016b). At each time t, determine the point x_t ∈ X and the fidelity m_t ∈ {1, 2} at which to query. The end goal is to maximise f^{(2)}; we do not care about the maximum of f^{(1)}. Simple regret: S(Λ) = f^{(2)}(x_⋆) − max_{t : m_t = 2} f^{(2)}(x_t), with S(Λ) = +∞ if we haven't queried f^{(2)} yet. The idea: use f^{(1)} to guide the search for x_⋆ at f^{(2)}.
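To make the objective concrete, here is a minimal sketch (not from the slides; all names are illustrative) of how simple regret is computed in this 2-fidelity setting:

```python
import numpy as np

def simple_regret(f2_at_xstar, fidelities, f2_values):
    """S(Lambda) = f2(x*) - max over fidelity-2 queries of f2(x_t).

    fidelities: m_t in {1, 2} for each query made so far.
    f2_values:  hypothetical f2(x_t) for each query; only entries
                with m_t == 2 count towards the regret.
    """
    top = [v for m, v in zip(fidelities, f2_values) if m == 2]
    if not top:
        return np.inf          # no fidelity-2 query yet: S(Lambda) = +inf
    return f2_at_xstar - max(top)

# Three queries: the first at the cheap fidelity, the rest at fidelity 2.
print(simple_regret(1.0, [1, 2, 2], [0.2, 0.6, 0.9]))  # -> 0.1
```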

  2–12. Challenges (built up over several slides). [Figures: f^{(2)}, the band f^{(2)} ± ζ^{(1)}, the approximation f^{(1)}, and the optima x_⋆ and x_⋆^{(1)}.]
  ◮ f^{(1)} is not just a noisy version of f^{(2)}.
  ◮ We cannot just maximise f^{(1)}: its maximiser x_⋆^{(1)} is suboptimal for f^{(2)}.
  ◮ We need to explore f^{(2)} sufficiently well around the high-valued regions of f^{(1)}, but not over too large a region.
  Key Message: We will explore X using f^{(1)} and use f^{(2)} mostly in a promising region X_α.

  13–16. MF-GP-UCB (Kandasamy et al., NIPS 2016b): Multi-fidelity Gaussian Process Upper Confidence Bound. [Figure: upper confidence bounds over f^{(1)} and f^{(2)} at t = 14.]
  ◮ Construct an upper confidence bound ϕ_t for f^{(2)} and choose the point x_t = argmax_{x∈X} ϕ_t(x), where
    ϕ^{(1)}_t(x) = μ^{(1)}_{t−1}(x) + β_t^{1/2} σ^{(1)}_{t−1}(x) + ζ^{(1)},
    ϕ^{(2)}_t(x) = μ^{(2)}_{t−1}(x) + β_t^{1/2} σ^{(2)}_{t−1}(x),
    ϕ_t(x) = min{ ϕ^{(1)}_t(x), ϕ^{(2)}_t(x) }.
  ◮ Choose the fidelity: m_t = 1 if β_t^{1/2} σ^{(1)}_{t−1}(x_t) > γ^{(1)}, and m_t = 2 otherwise.
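A minimal sketch of one MF-GP-UCB step on a 1-D toy problem, assuming a separate GP posterior per fidelity (scikit-learn here; the toy functions and the β, γ, ζ constants are illustrative, not the authors' implementation):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Toy 1-D problem: f2 is the expensive target, f1 a cheap biased
# approximation with |f2 - f1| <= zeta1 everywhere.
f2 = lambda x: (np.sin(3 * x) * x).ravel()
f1 = lambda x: f2(x) + 0.2 * np.cos(5 * x).ravel()
zeta1, beta_sqrt, gamma1 = 0.2, 2.0, 0.1

# GP posteriors fit on whatever has been queried so far at each fidelity.
X1 = rng.uniform(0, 1, (6, 1))    # six cheap fidelity-1 queries
X2 = rng.uniform(0, 1, (2, 1))    # two expensive fidelity-2 queries
gp1 = GaussianProcessRegressor(RBF(0.2)).fit(X1, f1(X1))
gp2 = GaussianProcessRegressor(RBF(0.2)).fit(X2, f2(X2))

grid = np.linspace(0, 1, 200).reshape(-1, 1)
mu1, s1 = gp1.predict(grid, return_std=True)
mu2, s2 = gp2.predict(grid, return_std=True)

phi1 = mu1 + beta_sqrt * s1 + zeta1   # UCB on f2 through the approximation
phi2 = mu2 + beta_sqrt * s2           # direct UCB on f2
phi = np.minimum(phi1, phi2)          # min of two valid upper bounds

t = int(np.argmax(phi))                          # [1] next point x_t
m_t = 1 if beta_sqrt * s1[t] > gamma1 else 2     # [2] fidelity rule
print(f"x_t = {grid[t, 0]:.3f}, fidelity m_t = {m_t}")
```

The fidelity rule queries the cheap approximation until its posterior is confident at x_t, and only then spends capital on f^{(2)}.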

  17–19. Theoretical Results for MF-GP-UCB. Let n_Λ = ⌊Λ/λ^{(2)}⌋, and let Ψ_n(A) denote the maximum information gain from n queries in A; it scales with vol(A).
  GP-UCB (Srinivas et al. 2010): w.h.p., S(Λ) ≲ √( Ψ_{n_Λ}(X) / n_Λ ).
  MF-GP-UCB (Kandasamy et al., NIPS 2016b): w.h.p., for all α > 0,
    S(Λ) ≲ √( Ψ_{n_Λ}(X_α) / n_Λ ) + √( Ψ_{n_Λ}(X_α^c) ) / n_Λ^{2−α},
  where X_α = { x : f^{(2)}(x_⋆) − f^{(1)}(x) ≤ C_α ζ^{(1)} }. A good approximation ⟹ vol(X_α) ≪ vol(X) ⟹ Ψ_{n_Λ}(X_α) ≪ Ψ_{n_Λ}(X).
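To see why this helps, an informal step (ours, not the slides', under the slide's premise that Ψ_n(A) scales with vol(A)): if vol(X_α) = ε · vol(X) with ε ≪ 1, and α is small enough that the second term is lower order, then roughly

```latex
\[
S_{\mathrm{MF}}(\Lambda)
  \;\lesssim\; \sqrt{\frac{\Psi_{n_\Lambda}(\mathcal{X}_\alpha)}{n_\Lambda}}
  \;\approx\; \sqrt{\varepsilon}\,\sqrt{\frac{\Psi_{n_\Lambda}(\mathcal{X})}{n_\Lambda}}
  \;\ll\; \sqrt{\frac{\Psi_{n_\Lambda}(\mathcal{X})}{n_\Lambda}},
\]
```

i.e. the multi-fidelity bound improves on the single-fidelity GP-UCB rate by a factor of about √ε.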

  20–24. Proof Sketches (fidelity 1 is cheap, fidelity 2 expensive: λ^{(1)} < λ^{(2)}). Recall the MF-GP-UCB bound: w.h.p., S(Λ) ≲ √( Ψ_{n_Λ}(X_α) / n_Λ ) + √( Ψ_{n_Λ}(X_α^c) ) / n_Λ^{2−α}, with X_α = { x : f^{(2)}(x_⋆) − f^{(1)}(x) ≤ C_α ζ^{(1)} }; a good approximation implies vol(X_α) ≪ vol(X) and hence Ψ_{n_Λ}(X_α) ≪ Ψ_{n_Λ}(X).
  Let N be the (random) number of queries made after capital Λ; then n_Λ = Λ/λ^{(2)} ≤ N ≤ Λ/λ^{(1)} (accounting spelled out below), but we show N ∈ O(n_Λ). Writing T^{(m)}_N(A) for the number of fidelity-m queries in A,
    N = T^{(1)}_N(X_α) + T^{(1)}_N(X_α^c) + T^{(2)}_N(X_α) + T^{(2)}_N(X_α^c),
  and the slides bound all but the useful term T^{(2)}_N(X_α): the remaining terms are at most N_α, polylog(N), and sublinear in N, respectively.
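The bracketing of N is simple capital accounting, spelled out here for completeness (treating the budget as exactly exhausted and ignoring the floor in n_Λ): each of the N queries costs either λ^{(1)} or λ^{(2)}, so

```latex
\[
N\,\lambda^{(1)} \;\le\; \sum_{t=1}^{N} \lambda^{(m_t)} \;=\; \Lambda \;\le\; N\,\lambda^{(2)}
\quad\Longrightarrow\quad
n_\Lambda \approx \frac{\Lambda}{\lambda^{(2)}} \;\le\; N \;\le\; \frac{\Lambda}{\lambda^{(1)}}.
\]
```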

  25–30. Bounding T^{(2)}_N(X_α^c) ≤ N_α for all α > 0. [Figure: posterior at t = 50.] For x ∈ X_α we have f^{(2)}(x_⋆) − f^{(1)}(x) ≤ C_α ζ^{(1)}, so f^{(1)} is small in X_α^c. Recall the two rules:
    ϕ^{(1)}_t(x) = μ^{(1)}_{t−1}(x) + β_t^{1/2} σ^{(1)}_{t−1}(x) + ζ^{(1)}, ϕ^{(2)}_t(x) = μ^{(2)}_{t−1}(x) + β_t^{1/2} σ^{(2)}_{t−1}(x),
    ϕ_t(x) = min{ ϕ^{(1)}_t(x), ϕ^{(2)}_t(x) }, x_t = argmax_{x∈X} ϕ_t(x) → [1].
    m_t = 1 if β_t^{1/2} σ^{(1)}_{t−1}(x_t) > γ^{(1)}, and m_t = 2 if β_t^{1/2} σ^{(1)}_{t−1}(x_t) ≤ γ^{(1)} → [2].
  Argument: if x_t ∈ X_α^c in [1], then m_t = 2 is unlikely in [2]. Indeed, m_t = 2 ⟹ σ^{(1)}_{t−1}(x_t) is small ⟹ several f^{(1)} queries near x_t ⟹ μ^{(1)}_{t−1}(x_t) ≈ f^{(1)}(x_t) ⟹ ϕ^{(1)}_t(x_t) is small ⟹ x_t won't be the arg-max.

  31–32. MF-GP-UCB with multiple approximations. Things work out.

  33. Experiment: Viola & Jones Face Detection. 22 threshold values for each cascade (d = 22). Fidelities with dataset sizes (300, 3000) (M = 2). [Results plot omitted.]

  34. Experiment: Cosmological Maximum Likelihood Inference.
  ◮ Type Ia supernovae data.
  ◮ Maximum likelihood inference for 3 cosmological parameters: the Hubble constant H_0, the dark energy fraction Ω_Λ, and the dark matter fraction Ω_M.
  ◮ Likelihood: Robertson-Walker metric (Robertson 1936); requires numerical integration for each point in the dataset.

  35. Experiment: Cosmological Maximum Likelihood Inference. 3 cosmological parameters (d = 3). Fidelities: integration on grids of sizes (10², 10⁴, 10⁶) (M = 3). [Results plot omitted; a sketch of the grid-size fidelity follows below.]
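To illustrate what a grid-size fidelity means here, a hedged sketch (the slides do not give the likelihood code; the model and constants below are the standard flat-ΛCDM luminosity-distance integral, used purely as an illustration):

```python
import numpy as np

def luminosity_distance(z, H0=70.0, Om=0.3, OL=0.7, grid_size=100):
    """Flat-LCDM luminosity distance via trapezoidal integration.

    grid_size plays the role of the fidelity: a coarser grid is a
    cheaper, less accurate approximation of the same quantity.
    """
    c = 299792.458                          # speed of light, km/s
    zs = np.linspace(0.0, z, grid_size)
    E = np.sqrt(Om * (1 + zs) ** 3 + OL)    # dimensionless Hubble rate
    dc = np.trapz(1.0 / E, zs) * c / H0     # comoving distance, Mpc
    return (1 + z) * dc

for n in (10**2, 10**4, 10**6):             # the three fidelities on the slide
    print(n, luminosity_distance(1.0, grid_size=n))
```

The coarse grid already lands close to the fine-grid answer at a tiny fraction of the cost, which is exactly what makes it a useful low fidelity.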

  36. MF-GP-UCB Synthetic Experiment: Hartmann-3D (d = 3, M = 3). [Histogram: "Query frequencies for Hartmann-3D", number of queries at each fidelity m = 1, 2, 3 against the value of f^{(3)}(x).]

  37–40. Multi-fidelity Optimisation with Continuous Approximations.
  - Use an arbitrary amount of data?
  - Iterative algorithms: use an arbitrary number of iterations?
  E.g. train an ML model with N• data points and T• iterations, but use N < N• data and T < T• iterations to approximate the cross-validation performance. The approximations come from a continuous 2D "fidelity space" of (N, T) values (see the sketch below).
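A minimal sketch of such a continuous-fidelity objective (all names and the model are hypothetical, not from the slides): a trainer that accepts (N, T) and returns an approximation to the full validation score.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

def evaluate(alpha, N, T):
    """Validation accuracy after training on N examples for T epochs.

    (N, T) is the point in the 2-D fidelity space; z* = (len(X_tr), T_max)
    recovers the target function f. alpha is the hyperparameter 'x'.
    """
    model = SGDClassifier(alpha=alpha, max_iter=T, tol=None, random_state=0)
    model.fit(X_tr[:N], y_tr[:N])           # subsampled data = lower fidelity
    return model.score(X_val, y_val)

# Cheap low-fidelity probe vs. the expensive target evaluation.
print(evaluate(1e-4, N=500,  T=5))    # fast, rough estimate of f(x)
print(evaluate(1e-4, N=3750, T=100))  # close to f(x) = g(z*, x)
```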

  41–47. Multi-fidelity Optimisation with Continuous Approximations (Kandasamy et al., arXiv 2017). [Figure: g(z, x) over the fidelity space Z and domain X, with the slice f(x) = g(z•, x).]
  ◮ A fidelity space Z ⊂ R^p and a domain X ⊂ R^d, with g : Z × X → R.
  ◮ We wish to optimise f(x) = g(z•, x), where z• ∈ Z. Previous example: Z = all (N, T) values, z• = [N•, T•].
  ◮ A cost function λ : Z → R+; e.g. λ(z) = λ(N, T) = O(N²T).
  ◮ x_⋆ = argmax_x f(x).
  ◮ Simple regret: S(Λ) = f(x_⋆) − max_{t : z_t = z•} f(x_t).

  48–53. Multi-fidelity Optimisation with Continuous Approximations (Kandasamy et al., arXiv 2017), continued.
  ◮ Model: g ∼ GP(0, κ) with κ : (Z × X)² → R, taken to be a product kernel κ([z, x], [z′, x′]) = κ_X(x, x′) · κ_Z(z, z′). [Figure: SE kernels with bandwidths h = 0.05 and h = 0.5.]
  ◮ Information gap ξ : Z → R: measures the price (in information) for querying at z ≠ z•. For the SE kernel, ξ(z) ≲ ‖z − z•‖ / h. (A small sketch of both definitions follows below.)
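A small sketch of the product kernel and the SE-kernel information gap, directly transcribing the definitions above (an illustration, not the authors' implementation):

```python
import numpy as np

def se_kernel(a, b, h):
    """Squared-exponential kernel with bandwidth h."""
    d = np.linalg.norm(np.atleast_1d(a) - np.atleast_1d(b))
    return np.exp(-0.5 * (d / h) ** 2)

def kappa(z, x, z2, x2, h_Z=0.5, h_X=0.05):
    """Product kernel over the joint fidelity-domain space Z x X."""
    return se_kernel(x, x2, h_X) * se_kernel(z, z2, h_Z)

def xi(z, z_star, h_Z=0.5):
    """SE-kernel information-gap bound: xi(z) <~ ||z - z*|| / h."""
    return np.linalg.norm(np.atleast_1d(z) - np.atleast_1d(z_star)) / h_Z

z_star = np.array([1.0, 1.0])                     # z*, e.g. rescaled [N*, T*]
print(kappa([0.5, 0.5], [0.2], z_star, [0.2]))    # same x, lower fidelity
print(xi([0.5, 0.5], z_star))                     # price of querying off z*
```

A wide fidelity bandwidth h_Z makes low-fidelity queries highly informative about g(z•, ·), which is reflected in a small ξ(z).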
