Dispersion for Data-Driven Algorithm Design, Online Learning, and Private Optimization (Ellen Vitercik, PowerPoint PPT presentation)

SLIDE 1

Dispersion for Data-Driven Algorithm Design, Online Learning, and Private Optimization

Ellen Vitercik. Northwestern Quarterly Theory Workshop. Joint work with Nina Balcan and Travis Dick.

SLIDE 2

Many problems have fast, optimal algorithms

  • E.g., sorting, shortest paths
SLIDE 3

Many problems have fast, optimal algorithms

  • E.g., sorting, shortest paths

Many problems don’t

  • E.g., integer programming, subset selection
  • Many approximation and heuristic techniques
  • Best method depends on the application
  • Which to use?
SLIDE 4

Practitioners repeatedly solve problems that maintain the same structure but differ in the underlying data.

  • There should be an algorithm that is good across all instances.
  • Use information about prior instances to choose algorithms for future instances.
  • Use ML to automate algorithm design.

SLIDE 5

Automated algorithm design

Use ML to automate algorithm design. Large body of empirical work:

  • Comp bio [DeBlasio and Kececioglu, ‘18]
  • AI [Xu, Hutter, Hoos, and Leyton-Brown, ’08]

This work: formal guarantees for this approach.

SLIDE 6

Simple example: knapsack

Problem instance:

  • n items; item j has value v_j and size s_j
  • Knapsack with capacity K

Goal: find the most valuable set of items that fit.

Algorithm (parameterized by ρ ≥ 0): add items in decreasing order of v_j / s_j^ρ. [Gupta and Roughgarden, '17]

How to set ρ?
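As a concrete illustration, a minimal Python sketch of this parameterized greedy rule (the function and variable names are ours, not from the talk):

```python
# Greedy knapsack rule parameterized by rho (illustrative sketch):
# sort items by v_j / s_j**rho and add them greedily while they fit.
def greedy_knapsack(values, sizes, capacity, rho):
    order = sorted(range(len(values)),
                   key=lambda j: values[j] / sizes[j] ** rho,
                   reverse=True)
    total_value, remaining = 0.0, capacity
    chosen = []
    for j in order:
        if sizes[j] <= remaining:
            chosen.append(j)
            remaining -= sizes[j]
            total_value += values[j]
    return total_value, chosen
```

Different values of ρ can select very different item sets, which is exactly why the parameter is worth tuning.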

SLIDE 7

Application domain: stealing jewelry

SLIDE 8

Day 1

Online algorithm configuration

Knapsack algorithm parameter ρ = 0.95

SLIDE 9

Day 2

Online algorithm configuration

Knapsack algorithm parameter ρ = 0.45

SLIDE 10

Day 3

Online algorithm configuration

Knapsack algorithm parameter ρ = 0.45. [Plot: value of items in knapsack vs. parameter ρ]

SLIDE 11

Day 3

Online algorithm configuration

Knapsack algorithm parameter ρ = 0.45. [Plot: u_3(ρ), the algorithm utility on the 3rd instance, vs. parameter ρ]

SLIDE 12

Day 4

Online algorithm configuration

Knapsack algorithm parameter ρ = 0.75. [Plot: u_4(ρ), the algorithm utility on the 4th instance, vs. parameter ρ]

SLIDE 13

Online algorithm configuration

Goal: compete with the best fixed parameter in hindsight, i.e., minimize regret.

SLIDE 14

Optimizing piecewise Lipschitz functions

Configuration ⇔ optimizing sums of piecewise Lipschitz functions. Worst case: impossible to optimize online!

[Plot: algorithm utility on the t-th instance vs. parameter ρ]

SLIDE 15

Our contributions

A structural property, dispersion, implies strong guarantees for:

  • Online optimization of PWL functions
  • Uniform convergence in statistical settings
  • Differentially private optimization

Dispersion is satisfied in real problems under very mild assumptions.

SLIDE 16

Outline

  • 1. Online learning setup
  • 2. Dispersion
  • 3. Regret bounds
  • 4. Examples of dispersion
  • 5. Other applications of dispersion
  • 6. Conclusion
SLIDE 17

Online piecewise Lipschitz optimization

For each round t ∈ {1, …, T}:

  • 1. Learner chooses ρ_t ∈ ℝ^d
  • 2. Adversary chooses a piecewise L-Lipschitz function u_t: ℝ^d → ℝ
  • 3. Learner gets reward u_t(ρ_t)
  • 4. Full information: learner observes the function u_t

Bandit feedback: learner only observes u_t(ρ_t).

Goal: minimize regret = max_{ρ ∈ ℝ^d} Σ_{t=1}^T u_t(ρ) − Σ_{t=1}^T u_t(ρ_t).

Want regret sublinear in T, so that average regret (regret / T) → 0.

[Plot: u_t(ρ) = algorithm utility on the t-th instance]

SLIDE 20

Prior work on PWL online optimization

  • Gupta and Roughgarden ['17]: max-weight independent set algorithm configuration
  • Cohen-Addad and Kanade ['17]: 1D piecewise constant functions

SLIDE 21

Mean adversary

There exists an adversary choosing piecewise constant functions such that every full-information online algorithm has linear regret.

Round 1: the adversary chooses one of two functions with equal probability.

SLIDE 26

Mean adversary

There exists an adversary choosing piecewise constant functions such that every full-information online algorithm has linear regret. Each round, the adversary repeatedly halves the optimal region.

Learner's expected reward: T/2. Reward of the best point in hindsight: T. Expected regret = T/2.

SLIDE 27

Outline

  • 1. Online learning setup
  • 2. Dispersion
  • 3. Regret bounds
  • 4. Examples of dispersion
  • 5. Other applications of dispersion
  • 6. Conclusion
SLIDE 28

Dispersion

The mean adversary concentrates discontinuities near the maximizer ρ*, so even points very close to ρ* have low utility!

Definition: u_1, …, u_T are (w, k)-dispersed at a point ρ if the ℓ2-ball B(ρ, w) contains discontinuities of at most k of u_1, …, u_T.

Example: if the ball of radius w about ρ contains discontinuities of 2 of the functions, they are (w, 2)-dispersed at ρ.
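In one dimension the definition can be checked directly. A small sketch (our own helper, assuming each function's discontinuity locations are known):

```python
# (w, k)-dispersion check at a point rho, in one dimension: count how many
# functions have at least one discontinuity strictly within distance w of rho.
def is_dispersed_at(discontinuities_per_fn, rho, w, k):
    hit = sum(
        1 for discs in discontinuities_per_fn
        if any(abs(c - rho) < w for c in discs)
    )
    return hit <= k
```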

SLIDE 29

Sums of piecewise dispersed functions

Given u_1, …, u_T, plot of the sum Σ_{t=1}^T u_t:

[Two plots vs. ρ: not dispersed (many discontinuities in an interval) vs. dispersed (few discontinuities in every interval)]

SLIDE 30

Key property of dispersed functions

If u_1, …, u_T: ℝ^d → [0, 1] are

  • 1. Piecewise L-Lipschitz
  • 2. (w, k)-dispersed at the maximizer ρ*,

then for every ρ ∈ B(ρ*, w): Σ_{t=1}^T u_t(ρ) ≥ Σ_{t=1}^T u_t(ρ*) − TLw − k.

Proof idea: for each u_t, ask: is u_t L-Lipschitz on B(ρ*, w)?

  • Yes (≤ T functions): |u_t(ρ) − u_t(ρ*)| ≤ Lw
  • No (≤ k functions): |u_t(ρ) − u_t(ρ*)| ≤ 1
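The two cases above combine into the stated bound; writing the difference of sums and bounding each term:

```latex
\sum_{t=1}^{T} u_t(\rho) - \sum_{t=1}^{T} u_t(\rho^*)
  = \sum_{t:\ u_t \text{ is } L\text{-Lipschitz on } B(\rho^*, w)} \bigl(u_t(\rho) - u_t(\rho^*)\bigr)
  + \sum_{t:\ u_t \text{ has a discontinuity in } B(\rho^*, w)} \bigl(u_t(\rho) - u_t(\rho^*)\bigr)
  \;\ge\; -\,T L w \;-\; k,
```

since at most T terms lose Lw each, and by dispersion at most k terms lose the full range 1.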

SLIDE 33

Outline

  • 1. Online learning setup
  • 2. Dispersion
  • 3. Regret bounds

  1. Full information
  2. Bandit feedback

  • 4. Examples of dispersion
  • 5. Other applications of dispersion
  • 6. Conclusion
SLIDE 34

Full information online learning

Exponentially Weighted Forecaster [Cesa-Bianchi & Lugosi '06]: at round t, sample ρ from the distribution with density f_t(ρ) ∝ exp(λ Σ_{s=1}^{t−1} u_s(ρ)).

[Plot: u_t(ρ) vs. ρ]
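A discretized sketch of this sampler in Python (the grid approximation and the names are our assumptions; the actual forecaster samples from the continuous density):

```python
import math
import random

# Exponentially weighted forecaster, discretized: at round t, sample rho from
# a grid with probability proportional to exp(lam * sum of past utilities).
def ewf_sample(grid, past_utilities, lam, rng=random):
    scores = [lam * sum(u(rho) for u in past_utilities) for rho in grid]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]  # shift by max for stability
    return rng.choices(grid, weights=weights, k=1)[0]
```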

SLIDE 35

Full information online learning

Theorem: If u_1, …, u_T: B(0, 1) ⊆ ℝ^d → [0, 1] are:

  • 1. Piecewise L-Lipschitz
  • 2. (w, k)-dispersed at ρ*,

then EWF has regret O(√(Td log(1/w)) + TLw + k).

Intuition: every ρ ∈ B(ρ*, w) has utility ≥ OPT − TLw − k, so EWF can compete with B(ρ*, w) up to an O(√(Td log(1/w))) factor.

When is this a good bound? For w = 1/(L√T) and k = Õ(√T), the regret is Õ(√(Td)).

SLIDE 38

Matching lower bound

Theorem: For any algorithm, there exist piecewise constant u_1, …, u_T such that the algorithm's regret is Ω(inf_{(w,k)} {√(Td log(1/w)) + k}), where the infimum ranges over all dispersion parameters (w, k) that u_1, …, u_T satisfy at ρ*.

The upper bound is O(inf_{(w,k)} {√(Td log(1/w)) + k}).

SLIDE 39

Outline

  • 1. Online learning setup
  • 2. Dispersion
  • 3. Regret bounds

  1. Full information
  2. Bandit feedback

  • 4. Examples of dispersion
  • 5. Other applications of dispersion
  • 6. Conclusion
SLIDE 40

Bandit feedback

Theorem: If u_1, …, u_T: B(0, 1) ⊆ ℝ^d → [0, 1] are:

  • 1. Piecewise L-Lipschitz
  • 2. (w, k)-dispersed at ρ*,

there is a bandit algorithm with regret Õ(√(Td (1/w)^d) + TLw + k).

[Plot: u_t(ρ) vs. ρ]

SLIDE 41

Bandit feedback

Theorem: There exists an algorithm with regret Õ(√(Td (1/w)^d) + TLw + k).

Proof idea:

  • Let ρ_1, …, ρ_N be a w-net of B(0, 1) (can take N ≈ 1/w^d).
  • Reduce to an N-armed bandit.
  • Use EXP3 on the discretization → regret O(√(TN log N)).
  • The ball of radius w around ρ* must contain some ρ_i.
  • The regret of ρ_i compared to ρ* is ≤ TLw + k.

When is this a good bound? If d = 1, w = T^(−1/3), and k = Õ(T^(2/3)), the regret is Õ(L·T^(2/3)).

[Plot: u_t(ρ) vs. ρ]
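The reduction in the proof idea can be sketched as an EXP3-style algorithm run over the net (a simplified variant without explicit exploration mixing; names and parameters are illustrative):

```python
import math
import random

# Run an EXP3-style bandit over a w-net of the parameter space, treating each
# grid point as an arm; only the chosen point's reward is observed each round.
def exp3_on_net(grid, reward_fns, eta, rng=random):
    weights = [1.0] * len(grid)
    total = 0.0
    for u in reward_fns:  # one piecewise Lipschitz function per round
        s = sum(weights)
        probs = [wt / s for wt in weights]
        i = rng.choices(range(len(grid)), weights=probs, k=1)[0]
        r = u(grid[i])  # bandit feedback: only this value is revealed
        total += r
        weights[i] *= math.exp(eta * r / probs[i])  # importance-weighted update
    return total
```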

SLIDE 42

Bandit feedback

Theorem: There exists an algorithm with regret Õ(√(Td (1/w)^d) + TLw + k).

Proof idea:

  • Let ρ_1, …, ρ_N be a w-net of B(0, 1) (can take N ≈ 1/w^d).
  • Reduce to an N-armed bandit.
  • Use EXP3 on the discretization → regret O(√(TN log N)).
  • The ball of radius w around ρ* must contain some ρ_i.
  • The regret of ρ_i compared to ρ* is ≤ TLw + k.

When is this a good bound? If w = T^((d+1)/(d+2)−1) and k = Õ(T^((d+1)/(d+2))), then the regret is Õ(T^((d+1)/(d+2)) (√(d·3^d) + L)).

SLIDE 43

Outline

  • 1. Online learning setup
  • 2. Dispersion
  • 3. Regret bounds
  • 4. Examples of dispersion
  • 5. Other applications of dispersion
  • 6. Conclusion
SLIDE 45

Smooth adversaries and dispersion

Adversary chooses threshold functions u_t: [0, 1] → [0, 1]. Each discontinuity τ is "smoothed" by adding Gaussian noise Z ~ N(0, σ²).

Lemma: W.h.p., for all w simultaneously, u_1, …, u_T are (w, Õ(Tw/σ + √T))-dispersed.

Corollary: w = σ/√T ⇒ full-information regret = O(√(T log(T/σ))).

[Plot: threshold function jumping at τ + Z]

SLIDE 46

Smooth adversaries and dispersion

Adversary chooses threshold functions u_t: [0, 1] → [0, 1]. Each discontinuity τ is "smoothed" by adding Gaussian noise Z ~ N(0, σ²).

Lemma: W.h.p., for all w simultaneously, u_1, …, u_T are (w, Õ(Tw/σ + √T))-dispersed.

Proof idea: for any width-w interval, E[#discontinuities] = O(Tw/σ).

  • VC dimension ⇒ w.h.p., every interval contains Õ(Tw/σ + √T) discontinuities.

[Plot: threshold function jumping at τ + Z]
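The lemma can be explored empirically: smooth each threshold with Gaussian noise and count the worst-case number of discontinuities in any width-w window (the parameters and names here are illustrative, not from the talk):

```python
import random

# Smoothed thresholds: perturb each threshold by N(0, sigma^2) noise, then
# report the maximum number of smoothed thresholds in any width-w window.
def max_discontinuities_in_window(thresholds, sigma, w, rng):
    smoothed = sorted(t + rng.gauss(0.0, sigma) for t in thresholds)
    best = 0
    for i, lo in enumerate(smoothed):
        j = i
        while j < len(smoothed) and smoothed[j] <= lo + w:
            j += 1
        best = max(best, j - i)
    return best
```

Even when the adversary puts every threshold at the same point, the noise spreads the discontinuities out, which is the heart of the dispersion argument.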

SLIDE 48

Simple example: knapsack

Problem instance:

  • n items; item j has value v_j and size s_j
  • Knapsack with capacity K

Goal: find the most valuable set of items that fit.

Algorithm (parameterized by ρ ≥ 0): add items in decreasing order of v_j / s_j^ρ. [Gupta and Roughgarden, '17]

[Plot: u_t(ρ), the algorithm utility on the t-th instance, vs. ρ]

SLIDE 49

Dispersion for knapsack

Theorem: If instances are randomly distributed such that on each round:

  • 1. Each value v_j is independent of its size s_j
  • 2. Every pair v_j, v_k has a κ-bounded joint density,

then w.h.p., for any β ≥ 1/2, u_1, …, u_T are (Õ(T^(1−β)/κ), Õ((# items)²·T^β))-dispersed.

Corollary: full-information regret = Õ((# items)²·√T).

Corollary: bandit-feedback regret = Õ(T^(2/3)·(1/κ + (# items)²)).

[Plot: u_t(ρ), the algorithm utility on the t-th instance, vs. ρ]

SLIDE 50

More Results for Algorithm Configuration

Prove dispersion under smoothness assumptions for:

  • Maximum weight independent set

Under no assumptions, we show dispersion for:

  • Integer quadratic programming approximation algorithms
  • Based on semidefinite programming relaxations
  • s-linear rounding [Feige & Langberg '06]
  • Outward rotations [Zwick '99]
  • Both are generalizations of the Goemans-Williamson max-cut algorithm ['95].
SLIDE 51

Outline

  • 1. Online learning setup
  • 2. Dispersion
  • 3. Regret bounds
  • 4. Examples of dispersion
  • 5. Other applications of dispersion
  • 6. Conclusion
SLIDE 52

Uniform convergence for batch learning

Theorem: If u_1, …, u_T: ℝ^d → [0, 1] are:

  • 1. Independently drawn from a distribution D
  • 2. Piecewise L-Lipschitz
  • 3. Globally (w, k)-dispersed,

then w.h.p., for every ρ ∈ ℝ^d,

|(1/T) Σ_{t=1}^T u_t(ρ) − E_{u∼D}[u(ρ)]| = Õ(√((d/T) log(1/w)) + Lw + k/T).

SLIDE 53

Differentially private optimization

Given u_1, …, u_T: B(0, 1) ⊆ ℝ^d → [0, 1] up front.

Goal:

  • Find an (approximate) maximizer of (1/T) Σ_{t=1}^T u_t.
  • Preserve ε-differential privacy w.r.t. changing any one function.

The exponential mechanism [McSherry-Talwar '07] has suboptimality Õ((d/(Tε)) log(1/w) + Lw + k/T). Matching lower bounds!
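A finite-grid sketch of the exponential mechanism for this problem (the grid discretization and names are our assumptions; since each u_t maps into [0, 1], replacing one function shifts each score by at most 1, which is the sensitivity used below):

```python
import math
import random

# Exponential mechanism over a candidate grid: sample rho with probability
# proportional to exp(eps * score(rho) / 2), where score = sum of utilities
# and has sensitivity 1 w.r.t. replacing any one utility function.
def exponential_mechanism(grid, utility_fns, eps, rng=random):
    scores = [sum(u(rho) for u in utility_fns) for rho in grid]
    m = max(scores)
    weights = [math.exp(eps * (s - m) / 2) for s in scores]  # stable shift
    return rng.choices(grid, weights=weights, k=1)[0]
```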

SLIDE 54

Outline

  • 1. Online learning setup
  • 2. Dispersion
  • 3. Regret bounds
  • 4. Examples of dispersion
  • 5. Other applications of dispersion
  • 6. Conclusion
SLIDE 56

Conclusions and open questions

  • Introduced dispersion.
  • Measures concentration of discontinuities of PWL functions.
  • Implies regret bounds for online optimization of PWL functions.
  • Batch learning and private optimization guarantees.
  • Examples of dispersion in real problems.

Open Questions:

  • Are there bad properties beyond discontinuities?
  • Feedback settings between full information and bandit: can we provide algorithms?