Differentially-Private Federated Linear Bandits (PowerPoint PPT Presentation)



SLIDE 1

Differentially-Private Federated Linear Bandits, Dubey and Pentland, June 2020

Outline:

Introduction: Federated Learning, Contextual Bandits, Summary

Background: Contextual Bandits, Federated Bandits, Optimism, Cooperation, Differential Privacy

Method: Algorithm Design, Algorithm, Regret Guarantees

Conclusion

Differentially-Private Federated Linear Bandits

Abhimanyu Dubey and Alex Pentland

Media Lab and Institute for Data Systems and Society (IDSS) Massachusetts Institute of Technology dubeya@mit.edu

June 2020

SLIDE 2

Federated Learning

Figure: Federated Learning (courtesy blogs.nvidia.com).

SLIDE 3

Federated Learning

Advantages:
◮ Agents have small personal datasets, resulting in weak local models.
◮ The federated learning model allows each agent to leverage the stronger joint model trained on data from all agents.
◮ Federated learning is designed to be private:
  ◮ No raw data leaves any agent.
  ◮ All messages sent to the server must keep user data private.

Challenges:
◮ Communication-utility tradeoff: frequent communication can be expensive and non-private, but grants higher utility.
◮ Performance guarantees are non-trivial to obtain for private algorithms.

SLIDE 4

Multi-Armed Bandits

Figure: Multi-armed bandit (courtesy lilianweng.github.io).

SLIDE 5

Contextual Bandits

◮ The most fundamental reinforcement learning problem and a basic framework for studying sequential decision-making.
◮ Contextual bandits have numerous applications:
  ◮ Recommender systems in e-commerce.
  ◮ Portfolio selection and management.
  ◮ Channel selection in distributed communication systems.
  ◮ Information retrieval and caching.
  ◮ Power schedules for current limiting in electric vehicle batteries.

SLIDE 6

Summary of Contributions

◮ We study the contextual bandit in a differentially-private federated setting.
◮ We provide the first differentially-private algorithms for both centralized and decentralized federated learning for the multi-agent contextual bandit.
◮ We prove rigorous bounds on the utility of our algorithms, matching near-optimal rates in terms of regret (utility) and lying only a factor of O(√(1/ε)) from the optimal rate in terms of privacy.
◮ We additionally shed some light on the communication-utility tradeoff, and provide design guidelines for practitioners in real-world settings.

SLIDE 7

Single-Agent Contextual Bandits

◮ In each round t, the agent is given a decision set Dt.
◮ They select an action xt ∈ Dt and obtain a reward yt such that yt = (θ∗)⊤xt + εt, where εt is i.i.d. noise and θ∗ is an unknown (but fixed) parameter vector.
◮ The objective of the problem is to minimize regret:

  R(T) = Σ_{t=1}^{T} [ (θ∗)⊤x∗_t − (θ∗)⊤xt ],  where x∗_t = arg max_{x∈Dt} x⊤θ∗.
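The regret definition above can be sketched directly in code. Everything below (the parameter theta_star, the three-action decision set, the random policy) is a hypothetical toy instance, not from the paper; it only illustrates that a policy which keeps picking suboptimal actions accumulates regret.

```python
import random

# Hypothetical toy instance with d = 2 and a decision set that is the
# same every round (an assumption made only to keep the sketch short).
theta_star = [0.8, 0.6]
decision_set = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def regret_of_actions(actions, T):
    """Cumulative regret R(T) = sum over t of <theta*, x*_t> - <theta*, x_t>."""
    best = max(dot(theta_star, x) for x in decision_set)  # value of x*_t
    return sum(best - dot(theta_star, x) for x in actions[:T])

random.seed(0)
# A uniformly random policy: each suboptimal pick adds a constant gap,
# so its regret grows linearly in T.
actions = [random.choice(decision_set) for _ in range(1000)]
regret = regret_of_actions(actions, 1000)
```

Playing the optimal action every round gives zero regret, which is a quick sanity check on the definition.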

SLIDE 8

Federated Contextual Bandits

◮ M agents are each solving the same contextual bandit in parallel.
◮ Each agent m ∈ [M] receives their own (unique) decision sets, and selects actions independently of other agents.
◮ Agents communicate with each other following fixed protocols:
  ◮ Centralized Setting: Agents synchronize via a central server, i.e., they send synchronization requests to the server, and the server acts as an intermediary.
  ◮ Decentralized Setting: Agents directly communicate with each other over an undirected network via peer-to-peer messages.
◮ The objective of the problem is to minimize group regret:

  RM(T) = Σ_{m∈[M]} Σ_{t=1}^{T} [ (θ∗)⊤x∗_{m,t} − (θ∗)⊤x_{m,t} ],  where x∗_{m,t} = arg max_{x∈D_{m,t}} x⊤θ∗.

SLIDE 9

The Upper Confidence Bound (UCB) Algorithm

◮ “Optimism in the face of uncertainty” strategy, i.e., be optimistic about an arm when we are uncertain of its utility.
◮ In the multi-armed setting, for each arm k we compute:

  UCBk(t) = ( Σ_{i=1}^{nk(t−1)} r_k^i ) / nk(t−1)  [empirical mean]  +  √( 2 ln(t−1) / nk(t−1) )  [exploration bonus].

◮ Choose the arm with the largest UCBk(t).
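A minimal sketch of this index, assuming rewards are stored per arm (the reward histories and round number below are made up for illustration):

```python
import math

def ucb_index(rewards_for_arm, t):
    """UCB_k(t): empirical mean of arm k plus its exploration bonus.
    rewards_for_arm holds the n_k(t-1) rewards observed from this arm so far."""
    n_k = len(rewards_for_arm)
    if n_k == 0:
        return float("inf")  # unexplored arms are maximally optimistic
    empirical_mean = sum(rewards_for_arm) / n_k
    exploration_bonus = math.sqrt(2 * math.log(t - 1) / n_k)
    return empirical_mean + exploration_bonus

# Pick the arm maximizing the index over two hypothetical reward histories.
history = {0: [1.0, 0.0, 1.0], 1: [0.0, 0.0]}
t = 6  # current round
chosen = max(history, key=lambda k: ucb_index(history[k], t))
```

Note how the bonus shrinks as an arm is pulled more often, so well-explored arms are judged mostly by their empirical mean.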

SLIDE 10

The Upper Confidence Bound (UCB) Algorithm

◮ In the contextual bandit case, we construct an analog to the UCB in the form of a confidence set Et.
◮ Et is a region of Rd that contains θ∗ with high probability.
◮ The action is taken optimistically with respect to Et, i.e.,

  xt = arg max_{x∈Dt} max_{θ∈Et} ⟨x, θ⟩.
SLIDE 11

The Upper Confidence Bound (UCB) Algorithm

◮ How do we construct a reasonable Et?
◮ We look to the classic linear prediction problem: linear regression. Given X<t = [x1 x2 ... x_{t−1}]⊤ and y<t = [y1 y2 ... y_{t−1}]⊤, consider:

  θ̂t := arg min_{θ∈Rd} ‖X<t θ − y<t‖₂² + θ⊤Htθ.

◮ The regression solution is given by θ̂t := (Gt + Ht)⁻¹ X<t⊤ y<t, where Gt = X<t⊤ X<t is the Gram matrix of actions, and Ht is a regularizer.
◮ Since we know the finite-sample behavior of linear regression, we can center Et around the estimate θ̂t to obtain a reasonable algorithm.
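The closed-form estimate θ̂t = (Gt + Ht)⁻¹ X<t⊤ y<t can be sketched directly. This toy version fixes d = 2 and takes Ht = λI (a standard ridge choice, assumed here only for concreteness); the data is noise-free so the estimate visibly approaches the true parameter:

```python
def gram(X):
    """G = X^T X for a list of 2-dimensional row vectors."""
    return [[sum(x[i] * x[j] for x in X) for j in range(2)] for i in range(2)]

def ridge_estimate(X, y, lam=1.0):
    """theta_hat = (G + lam*I)^{-1} X^T y via an explicit 2x2 inverse."""
    G = gram(X)
    V = [[G[0][0] + lam, G[0][1]], [G[1][0], G[1][1] + lam]]  # G + H
    b = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(2)]  # X^T y
    det = V[0][0] * V[1][1] - V[0][1] * V[1][0]
    inv = [[V[1][1] / det, -V[0][1] / det], [-V[1][0] / det, V[0][0] / det]]
    return [inv[i][0] * b[0] + inv[i][1] * b[1] for i in range(2)]

# Noise-free rewards from theta* = (2, -1): the regularizer shrinks the
# estimate toward 0, but it approaches theta* as more samples arrive.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]] * 50
y = [2.0 * a + (-1.0) * b for a, b in X]
theta_hat = ridge_estimate(X, y)
```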

SLIDE 12

The Upper Confidence Bound (UCB) Algorithm

◮ We can therefore set Et as follows (for some fixed βt):

  Et := { θ ∈ Rd : (θ − θ̂t)⊤ (Gt + Ht) (θ − θ̂t) ≤ βt }.

◮ Et is an ellipsoid centered at θ̂t, and βt determines its “radius”.
◮ The UCB can be given as

  UCBt(x) = ⟨θ̂t, x⟩ + βt √( x⊤ (Gt + Ht)⁻¹ x ).
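The optimistic score can be sketched as follows. The matrix, parameter estimate, βt, and decision set below are hand-picked illustrative values, not from the paper; the diagonal (Gt + Ht)⁻¹ is chosen so that one direction is clearly less explored than the other:

```python
import math

def ucb_score(x, theta_hat, V_inv, beta):
    """UCB_t(x) = <theta_hat, x> + beta * sqrt(x^T (G_t + H_t)^{-1} x)."""
    mean = sum(th * xi for th, xi in zip(theta_hat, x))
    quad = sum(x[i] * sum(V_inv[i][j] * x[j] for j in range(2)) for i in range(2))
    return mean + beta * math.sqrt(quad)

theta_hat = [0.9, 0.2]
V_inv = [[0.5, 0.0], [0.0, 0.05]]  # inverse of diag(2, 20): direction 2 is well-explored
beta = 1.0

# The optimistic action maximizes the score over the decision set.
decisions = [[1.0, 0.0], [0.0, 1.0]]
best = max(decisions, key=lambda x: ucb_score(x, theta_hat, V_inv, beta))
```

The uncertainty term √(x⊤(Gt + Ht)⁻¹x) is exactly the width of the ellipsoid Et in direction x, which is why maximizing the score is equivalent to being optimistic over Et.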
SLIDE 13

Fundamentals of Cooperation

◮ For n samples (x1, y1), ..., (xn, yn), the error in the linear regression estimate decreases at the rate O(√(1/n)).
◮ At any time t, each agent has t local samples with which to make their linear regression estimate, i.e., the error rate is O(√(1/t)).
◮ However, there are M agents in total; therefore, if the agents share all their observations with each other, they can achieve a rate of O(√(1/(Mt))).
◮ This can be achieved if agents synchronize every round, but privacy and computational constraints make this infeasible.
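The O(√(1/n)) scaling is easy to observe numerically. The snippet below uses scalar mean estimation as an illustrative stand-in for the regression setting (all constants are arbitrary): pooling M agents' samples visibly shrinks the average error.

```python
import random

# Toy check of the O(sqrt(1/n)) rate: estimate theta* = 0.5 from noisy
# samples y_i = theta* + noise, averaging the absolute error over trials.
random.seed(1)
theta_star = 0.5

def mean_abs_error(n, trials=100):
    """Average |estimate - theta*| over several independent trials."""
    total = 0.0
    for _ in range(trials):
        samples = [theta_star + random.gauss(0.0, 1.0) for _ in range(n)]
        total += abs(sum(samples) / n - theta_star)
    return total / trials

t, M = 100, 9
err_local = mean_abs_error(t)       # one agent working alone: t samples
err_pooled = mean_abs_error(M * t)  # all M agents pooling: M*t samples
```

With M = 9 the pooled error should come out roughly 3 times smaller, matching the √M improvement.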

SLIDE 14

(ε, δ)-Differential Privacy

◮ Differential privacy is the widely-accepted standard for statistical privacy.
◮ Let D and D′ be two datasets that differ in one entry, and let A be an algorithm whose outputs lie in a set S.
◮ A is (ε, δ)-differentially private with respect to its inputs if, for any subset S′ of S and any such pair of datasets D, D′, we have:

  P(A(D) ∈ S′) ≤ e^ε · P(A(D′) ∈ S′) + δ.

◮ Basically, the likelihood of A producing any specific output should not fluctuate by more than a factor of e^ε ≈ (1 + ε) due to the presence of any specific entry in its input.
◮ If the dataset D contains data from different users, this implies that A is sufficiently insensitive to the presence of any specific user's data.
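One standard way to achieve (ε, δ)-differential privacy for a real-valued statistic is the Gaussian mechanism. The paper privatizes the Gram matrix and bias vector with a tree-based Gaussian-noise scheme; the scalar sketch below only illustrates the basic noise calibration, with all concrete numbers chosen for illustration:

```python
import math
import random

def gaussian_mechanism(value, sens, eps, delta):
    """Add N(0, sigma^2) noise calibrated to L2-sensitivity `sens`, using the
    standard sigma = sens * sqrt(2 ln(1.25/delta)) / eps calibration."""
    sigma = sens * math.sqrt(2.0 * math.log(1.25 / delta)) / eps
    return value + random.gauss(0.0, sigma)

random.seed(0)
true_sum = 42.0  # e.g. a sum of rewards, each bounded in [0, 1], so sensitivity 1
private_sum = gaussian_mechanism(true_sum, sens=1.0, eps=1.0, delta=1e-5)
```

Smaller ε (stronger privacy) forces larger σ, which is exactly the privacy-utility tension that shows up as the 1/ε factor in the regret bounds later.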

SLIDE 15

Algorithm Design

◮ The algorithm for any agent depends entirely on the regularized Gram matrix Vt = Gt + Ht and the bias vector ut = X<t⊤ y<t = Σ_{τ<t} xτ · yτ.
◮ The cooperative versions of these parameters enjoy linearity, i.e., they can be given by Vt = Σ_{m∈[M]} Vm,t and ut = Σ_{m∈[M]} um,t.
◮ To achieve cooperation, the server can, at each round, simply sum up the individual parameters to obtain Vt and ut. There are two major issues with this, however:
  ◮ O(T) communication complexity.
  ◮ The algorithm is not differentially private.

SLIDE 16

Algorithm Design

◮ However, if each agent m ∈ [M] constructs alternate Ṽm,t and ũm,t that are (ε, δ)-differentially private, then we can construct estimates Ṽt = Σ_{m∈[M]} Ṽm,t and ũt = Σ_{m∈[M]} ũm,t that are also private.
◮ Moreover, we can reduce synchronization to O(log T) rounds by carefully selecting the synchronization rounds, without degrading the regret too much.

SLIDE 17

Algorithm

◮ Each agent m ∈ [M] takes actions using their local Vm,t and um,t at each round, and obtains xm,t and ym,t.
◮ If log det(Vm,t + xm,t xm,t⊤) ≥ Dt for some sequence Dt, then agent m sends a synchronization request to the central server.
◮ When the server receives a synchronization request, it obtains the privatized Ṽm,t and ũm,t from each agent, sums them up to obtain Ṽt and ũt, and transmits these back to each agent.
◮ Each agent then uses the new (privatized) Gram matrix and bias parameters.
◮ By carefully selecting Dt, we can obtain the desired level of privacy and communication.
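The trigger-and-reset loop above can be sketched for a single agent in d = 2. The constant threshold, action sequence, and identity reset are all arbitrary illustrative choices (a real run would use the paper's sequence Dt and receive the pooled, privatized parameters from the server):

```python
import math

def logdet2(V):
    """log det of a 2x2 matrix."""
    return math.log(V[0][0] * V[1][1] - V[0][1] * V[1][0])

def updated(V, x):
    """Rank-one update V + x x^T for a 2x2 matrix and 2-vector."""
    return [[V[i][j] + x[i] * x[j] for j in range(2)] for i in range(2)]

def should_sync(V_local, x, threshold):
    """The slide's trigger: sync when log det(V + x x^T) crosses a threshold."""
    return logdet2(updated(V_local, x)) >= threshold

V = [[1.0, 0.0], [0.0, 1.0]]  # fresh regularized Gram matrix
syncs = 0
# Repeated plays of the same direction grow det(V) until a sync triggers.
for _ in range(10):
    x = [1.0, 0.0]
    if should_sync(V, x, threshold=math.log(8.0)):
        syncs += 1
        V = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for receiving the pooled, privatized V
    V = updated(V, x)
```

Because det(V) grows multiplicatively with information gain, a log-det trigger fires only O(log T) times, which is where the communication bound comes from.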

SLIDE 18

Regret Guarantees

FedUCB obtains O(d^{3/4} √(MT/ε)) regret in the centralized setting and O(d^{3/4} √((χ̄(Gγ) · γ) MT/ε)) regret in the decentralized setting.
◮ ε controls the level of privacy, i.e., for a perfectly private system, ε → 0, leading to O(T) regret. A lower bound exists which implies that such a dependency is inevitable.
◮ In the non-private centralized case, our algorithm obtains O(√(MT)) regret, which matches the optimal rate.
◮ In the decentralized case, our bound depends on the clique number χ̄(Gγ) of the γ-power of the communication graph G:
  ◮ If diam(G) ≤ γ, this factor is 1 (since there is only one clique).
  ◮ If G is a line graph (the worst case), this factor is O(M/γ).

SLIDE 19

Conclusion

We provide new results on federated bandit learning under differential privacy. Future research directions include:
◮ Our analysis is still sub-optimal in the decentralized setting owing to a subsampling argument. Future work can use an alternative technique to obtain sharper rates.
◮ The current setting assumes full cooperation (i.e., all agents have the same bandit problem); this can be generalized to non-identical bandit problems (currently in preparation).
◮ The far more ubiquitous problem is supervised learning, and extending our results to that setting is a natural next step (currently in preparation).

SLIDE 20

Thank You!