

  1. Federated Learning Min Du Postdoc, UC Berkeley

  2. Outline ❑ Preliminary: deep learning and SGD ❑ Federated learning: FedSGD and FedAvg ❑ Related research in federated learning ❑ Open problems

  3. Outline ❑ Preliminary: deep learning and SGD ❑ Federated learning: FedSGD and FedAvg ❑ Related research in federated learning ❑ Open problems

  4. The goal of deep learning • Find a function which produces a desired output given a particular input; $w$ is the set of parameters contained by the function. Example tasks (given input → desired output): image classification (an image of a handwritten digit → "8"); next-word prediction ("Looking forward to your ?" → "reply"); playing Go (the current board → the next move).

  5. Finding the function: model training • Given one input-output sample pair $(x_1, y_1)$, the goal of deep learning model training is to find a set of parameters $w$ that maximizes the probability of outputting $y_1$ given $x_1$. Given input: $x_1$. Maximize: $p(5 \mid x_1, w)$.

  6. Finding the function: model training • Given a training dataset containing $n$ input-output pairs $(x_i, y_i)$, $i \in [1, n]$, the goal of deep learning model training is to find a set of parameters $w$ such that the average of $p(y_i \mid x_i, w)$ is maximized.

  7. Finding the function: model training • Given a training dataset containing $n$ input-output pairs $(x_i, y_i)$, $i \in [1, n]$, the goal of deep learning model training is to find a set of parameters $w$ such that the average of $p(y_i \mid x_i, w)$ is maximized. • That is, maximize $\frac{1}{n}\sum_{i=1}^{n} p(y_i \mid x_i, w)$, which is equivalent to minimize $\frac{1}{n}\sum_{i=1}^{n} -\log p(y_i \mid x_i, w)$. The basic component of the loss function for a given sample $(x_i, y_i)$ is $\ell(x_i, y_i, w) = -\log p(y_i \mid x_i, w)$. Let $f_i(w) = \ell(x_i, y_i, w)$ denote the loss function.
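As a concrete illustration (not part of the slides), here is a minimal numpy sketch of the per-sample loss $f_i(w) = -\log p(y_i \mid x_i, w)$ and the averaged training objective; the probability values are made up for the example:

```python
import numpy as np

def nll_loss(probs, y):
    """Per-sample loss f_i(w) = -log p(y_i | x_i, w) for a batch of samples.

    probs: (batch, num_classes) predicted probabilities p(. | x_i, w)
    y:     (batch,) integer labels y_i
    """
    return -np.log(probs[np.arange(len(y)), y])

# Example: two samples, three classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])
y = np.array([0, 2])
print(nll_loss(probs, y))         # per-sample losses f_i(w)
print(nll_loss(probs, y).mean())  # training objective: average loss
```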

  8. Deep learning model training For a training dataset containing $n$ samples $(x_i, y_i)$, $1 \le i \le n$, the training objective is: $\min_{w \in \mathbb{R}^d} f(w)$ where $f(w) \stackrel{\text{def}}{=} \frac{1}{n}\sum_{i=1}^{n} f_i(w)$ and $f_i(w) = \ell(x_i, y_i, w)$ is the loss of the prediction on example $(x_i, y_i)$. No closed-form solution: in a typical deep learning model, $w$ may contain millions of parameters. Non-convex: multiple local minima exist. [Figure: the loss $f(w)$ plotted against $w$.]

  9. Solution: Gradient Descent Start from a randomly initialized weight $w$, compute the gradient $\nabla f(w)$ of the loss $f(w)$, and update $w_{t+1} = w_t - \eta \nabla f(w_t)$ (gradient descent), where the learning rate $\eta$ controls the step size. How to stop? When the update is small enough, i.e. the process has converged: $\| w_{t+1} - w_t \| \le \epsilon$ or $\| \nabla f(w_t) \| \le \epsilon$; at a local minimum, $\nabla f(w)$ is close to 0. Problem: usually the number of training samples $n$ is large, and every step computes the gradient over all of them – slow convergence.
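A minimal sketch (my own illustration, using least-squares regression as a stand-in for the loss $f$; the data, learning rate, and threshold are arbitrary) of the gradient-descent loop and the stopping rule above:

```python
import numpy as np

# Toy data: n samples, d features; f(w) = (1/n) * sum_i (x_i . w - y_i)^2
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ np.array([1.0, -2.0, 0.5, 3.0, 0.0]) + 0.1 * rng.normal(size=1000)

def grad_f(w):
    # Gradient of the mean squared error over the full dataset (all n samples).
    return 2.0 / len(X) * X.T @ (X @ w - y)

eta, eps = 0.1, 1e-6       # learning rate and stopping threshold
w = rng.normal(size=5)     # randomly initialized weights
while True:
    w_next = w - eta * grad_f(w)                    # w_{t+1} = w_t - eta * grad f(w_t)
    converged = np.linalg.norm(w_next - w) <= eps   # stop when the update is small
    w = w_next
    if converged:
        break
print(w)
```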

  10. Solution: Stochastic Gradient Descent (SGD) ● At each step, instead of computing the gradient over all training samples, randomly pick a small subset (mini-batch) of training samples $(x_b, y_b)$: $w_{t+1} \leftarrow w_t - \eta \nabla f(w_t; x_b, y_b)$ ● Compared to gradient descent, SGD takes more steps to converge, but each step is much faster.
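Continuing the same toy least-squares example (batch size and step count are my own choices), a sketch of mini-batch SGD where each step touches only a random subset of the data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ np.array([1.0, -2.0, 0.5, 3.0, 0.0]) + 0.1 * rng.normal(size=1000)

def grad_on_batch(w, idx):
    # Gradient of the mean squared error on a mini-batch (x_b, y_b) only.
    Xb, yb = X[idx], y[idx]
    return 2.0 / len(idx) * Xb.T @ (Xb @ w - yb)

eta, batch_size = 0.05, 32
w = rng.normal(size=5)
for step in range(2000):
    idx = rng.choice(len(X), size=batch_size, replace=False)  # random mini-batch
    w = w - eta * grad_on_batch(w, idx)  # cheap, noisy step toward the minimum
print(w)
```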

  11. Outline q Preliminary: deep learning and SGD q Federated learning: FedSGD and FedAvg q Related research in federated learning q Open problems

  12. The importance of data for ML "The biggest obstacle to using advanced data analysis isn't skill base or technology; it's plain old access to the data." – Edd Wilder-James, Harvard Business Review

  13. "Data is the New Oil"

  14. Google, Apple, ... build ML models for tasks such as image classification (e.g. to predict which photos are most likely to be viewed multiple times in the future) and language models (e.g. voice recognition, next-word prediction, and auto-reply in Gmail). Private data involved: all the photos a user takes and everything they type on their mobile keyboard, including passwords, URLs, messages, etc.

  15. Instead of uploading the raw data, each device trains a model locally and uploads the model; the central server performs model aggregation. Addressing privacy: model parameters will never contain more information than the raw training data. Addressing network overhead: the size of the model is generally smaller than the size of the raw training data.

  16. Federated optimization Characteristics (major challenges) ● Non-IID ○ The data generated by each user are quite different ● Unbalanced ○ Some users produce significantly more data than others ● Massively distributed ○ # mobile device owners >> avg # training samples on each device ● Limited communication ○ Unstable mobile network connections

  17. A new paradigm – Federated Learning: a synchronous update scheme that proceeds in rounds of communication. McMahan, H. Brendan, Eider Moore, Daniel Ramage, and Seth Hampson. "Communication-efficient learning of deep networks from decentralized data." AISTATS, 2017.

  18. Federated learning – overview In round number i, the central server holds the global model M(i); each participating device holds its local data, a copy of model M(i), and computes gradient updates for M(i) on that data. Deployed by Google, Apple, etc. [Figure: a central server with global model M(i) surrounded by clients, each with local data, model M(i), and gradient updates for M(i).]

  19. Federated learning – overview Still in round number i, each client sends its updates of M(i) (the gradient updates computed on its local data) to the central server, which performs model aggregation to produce M(i+1). [Figure: clients send updates of M(i) to the central server; aggregation yields M(i+1).]

  20. Federated learning – overview The central server broadcasts the new global model M(i+1) to the clients, and round number i+1 continues. [Figure: the central server sends model M(i+1) to each client with its local data.]

  21. Federated learning – detail For efficiency, at the beginning of each round, a random fraction C of clients is selected, and the server sends the current model parameters to each of these clients.
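A tiny sketch of this per-round client selection (the client IDs and the fraction C are hypothetical):

```python
import random

clients = list(range(100))   # hypothetical client IDs known to the server
C = 0.1                      # fraction of clients selected per round
m = max(int(C * len(clients)), 1)
selected = random.sample(clients, m)   # the server contacts only these clients
print(selected)
```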

  22. Federated learning – detail ● Recall, in traditional deep learning model training: ○ For a training dataset containing $n$ samples $(x_i, y_i)$, $1 \le i \le n$, the training objective is $\min_{w \in \mathbb{R}^d} f(w)$ where $f(w) \stackrel{\text{def}}{=} \frac{1}{n}\sum_{i=1}^{n} f_i(w)$ and $f_i(w) = \ell(x_i, y_i, w)$ is the loss of the prediction on example $(x_i, y_i)$. ○ Deep learning optimization relies on SGD and its variants, applied through mini-batches: $w_{t+1} \leftarrow w_t - \eta \nabla f(w_t; x_b, y_b)$

  23. Federated learning – detail ● In federated learning: ○ Suppose the $n$ training samples are distributed across $K$ clients, where $\mathcal{P}_k$ is the set of indices of data points on client $k$, and $n_k = |\mathcal{P}_k|$. ○ The training objective is still $\min_{w \in \mathbb{R}^d} f(w)$, but now $f(w) = \sum_{k=1}^{K} \frac{n_k}{n} F_k(w)$ where $F_k(w) \stackrel{\text{def}}{=} \frac{1}{n_k}\sum_{i \in \mathcal{P}_k} f_i(w)$
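A small numpy check (my own illustration, with a squared-error loss standing in for $f_i$) that the weighted client objectives $\sum_{k} \frac{n_k}{n} F_k(w)$ recover the centralized objective $f(w)$:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
y = rng.normal(size=1000)
w = rng.normal(size=5)

def f_i(w, i):
    # Per-sample loss f_i(w) = l(x_i, y_i, w), here a squared error.
    return (X[i] @ w - y[i]) ** 2

n = len(X)
f_global = np.mean([f_i(w, i) for i in range(n)])   # f(w) = (1/n) sum_i f_i(w)

# Partition the sample indices across K clients (sizes may differ).
K = 4
partitions = np.array_split(rng.permutation(n), K)
F = [np.mean([f_i(w, i) for i in P]) for P in partitions]        # F_k(w)
f_fed = sum(len(P) / n * Fk for P, Fk in zip(partitions, F))     # sum_k n_k/n * F_k(w)

assert np.isclose(f_global, f_fed)
```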

  24. A baseline – FederatedSGD (FedSGD) ● A randomly selected client that has $n_k$ training data samples in federated learning ≈ a randomly selected sample in traditional deep learning. ● Federated SGD (FedSGD): a single step of gradient descent is done per round. ● Recall that in federated learning, a C-fraction of clients is selected at each round: ○ C = 1: full-batch (non-stochastic) gradient descent ○ C < 1: stochastic gradient descent (SGD)

  25. A baseline – FederatedSGD (FedSGD) Learning rate: $\eta$; total #samples: $n$; total #clients: $K$; #samples on a client k: $n_k$; client fraction $C = 1$. ● In a round t: ○ The central server broadcasts the current model $w_t$ to each client; each client k computes the gradient $g_k = \nabla F_k(w_t)$ on its local data. ■ Approach 1: Each client k submits $g_k$; the central server aggregates the gradients to generate a new model: ● $w_{t+1} \leftarrow w_t - \eta \nabla f(w_t) = w_t - \eta \sum_{k=1}^{K} \frac{n_k}{n} g_k$ (recall $f(w) = \sum_{k=1}^{K} \frac{n_k}{n} F_k(w)$). ■ Approach 2: Each client k computes $w_{t+1}^k \leftarrow w_t - \eta g_k$; the central server performs aggregation: ● $w_{t+1} \leftarrow \sum_{k=1}^{K} \frac{n_k}{n} w_{t+1}^k$. Running the local update step multiple times before aggregating ⟹ FederatedAveraging (FedAvg).
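A hedged simulation of FedSGD Approach 1 with C = 1 (the client datasets, least-squares loss, and hyperparameters are my own stand-ins, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
K, d, eta = 4, 5, 0.1

# Hypothetical per-client datasets of unequal size, for a least-squares loss.
client_data = []
for k in range(K):
    n_k = rng.integers(50, 200)
    Xk = rng.normal(size=(n_k, d))
    yk = Xk @ rng.normal(size=d)
    client_data.append((Xk, yk))
n = sum(len(Xk) for Xk, _ in client_data)

def local_gradient(w, Xk, yk):
    # g_k = grad F_k(w), computed on client k's local data only.
    return 2.0 / len(Xk) * Xk.T @ (Xk @ w - yk)

w = np.zeros(d)
for t in range(100):   # one gradient step per communication round (C = 1)
    grads = [local_gradient(w, Xk, yk) for Xk, yk in client_data]
    # Server aggregation: w <- w - eta * sum_k (n_k / n) * g_k
    w = w - eta * sum(len(Xk) / n * g for (Xk, _), g in zip(client_data, grads))
print(w)
```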

  26. Federated learning – dealing with limited communication ● Increase computation: ○ Select more clients for training between each communication round ○ Increase computation on each client

  27. Federated learning – FederatedAveraging (FedAvg) Learning rate: $\eta$; total #samples: $n$; total #clients: $K$; #samples on a client k: $n_k$; client fraction $C$. ● In a round t: ○ The central server broadcasts the current model $w_t$ to each selected client; each client k computes the gradient $g_k = \nabla F_k(w_t)$ on its local data. ■ Approach 2: ● Each client k computes, for $E$ local epochs: $w_{t+1}^k \leftarrow w_t - \eta g_k$ ● The central server performs aggregation: $w_{t+1} \leftarrow \sum_{k=1}^{K} \frac{n_k}{n} w_{t+1}^k$ ● With local mini-batch size $B$, the number of local updates on client k in each round is $u_k = E \cdot \frac{n_k}{B}$.
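A compact simulation of FedAvg along these lines (client data, loss, and hyperparameters are illustrative assumptions; the aggregation weights are normalized over the selected clients, one common convention):

```python
import numpy as np

rng = np.random.default_rng(3)
K, d = 10, 5
eta, C, E, B = 0.05, 0.5, 5, 32   # learning rate, client fraction, local epochs, batch size

# Hypothetical, non-identical client datasets for a least-squares loss.
clients = []
for k in range(K):
    n_k = rng.integers(100, 400)
    Xk = rng.normal(loc=0.1 * k, size=(n_k, d))   # per-client shift -> non-IID data
    yk = Xk @ rng.normal(size=d)
    clients.append((Xk, yk))

def client_update(w, Xk, yk):
    # Run E epochs of local mini-batch SGD, starting from the global model w.
    w = w.copy()
    for _ in range(E):
        order = rng.permutation(len(Xk))
        for start in range(0, len(Xk), B):
            idx = order[start:start + B]
            Xb, yb = Xk[idx], yk[idx]
            w -= eta * 2.0 / len(idx) * Xb.T @ (Xb @ w - yb)
    return w

w = np.zeros(d)                                   # shared initialization
for t in range(50):                               # communication rounds
    m = max(int(C * K), 1)
    selected = rng.choice(K, size=m, replace=False)       # random C-fraction of clients
    n_sel = sum(len(clients[k][0]) for k in selected)
    local_models = {k: client_update(w, *clients[k]) for k in selected}
    # Server aggregation: weighted average of the returned local models.
    w = sum(len(clients[k][0]) / n_sel * local_models[k] for k in selected)
print(w)
```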

  28. Federated learning – FederatedAveraging (FedAvg) Model initialization ● Two choices: ○ On the central server (all clients start each round from a shared model) ○ On each client (independent initializations) ● Shared initialization works better in practice. [Figure: the loss on the full MNIST training set for models generated by the parameter mixture $\theta w + (1 - \theta) w'$.]
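To illustrate what the figure measures, a brief sketch (my own, with a quadratic stand-in for the MNIST training loss) of evaluating the loss of the parameter mixture $\theta w + (1 - \theta) w'$ for two models $w$ and $w'$:

```python
import numpy as np

def interpolation_losses(w, w_prime, loss_fn, thetas=np.linspace(-0.2, 1.2, 15)):
    # Loss of the mixed model theta * w + (1 - theta) * w_prime for several theta.
    return [(theta, loss_fn(theta * w + (1 - theta) * w_prime)) for theta in thetas]

# Toy usage with a quadratic loss; the slide uses the loss on the full MNIST training set.
rng = np.random.default_rng(4)
w, w_prime = rng.normal(size=5), rng.normal(size=5)
for theta, loss in interpolation_losses(w, w_prime, lambda v: float(v @ v)):
    print(f"theta={theta:+.2f}  loss={loss:.3f}")
```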
