  1. Agnostic federated learning
     Mehryar Mohri¹·², Gary Sivek¹, Ananda Theertha Suresh¹
     ¹Google Research, ²Courant Institute
     June 11, 2019

  2. Federated learning scenario [McMahan et al., '17]
     [Figure: a centralized model aggregating updates from clients A, B, …, Z]
     - Data from a large number of clients (phones, sensors)
     - Data remains distributed over the clients
     - A centralized model is trained based on that data
     What is the loss function?

  3. Standard federated learning
     Setting
     - Merge samples from all clients and minimize the loss
     - Domains: clusters of clients
     - Clients belong to p domains: D_1, D_2, …, D_p
     Training procedure
     - D̂_k: empirical distribution of D_k, with m_k samples
     - Û: uniform distribution over all observed samples,
         Û = Σ_{k=1}^p (m_k / Σ_{i=1}^p m_i) · D̂_k
     - Minimize the loss over the uniform distribution: min_{h∈H} L_Û(h)
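The uniform objective on this slide weights each domain by its sample share, m_k / Σ_i m_i, which is exactly what pooling all samples and averaging does. A minimal numeric sketch, assuming hypothetical per-domain average losses L_k(h):

```python
import numpy as np

def uniform_loss(domain_losses, sample_counts):
    """Loss of a model under the uniform distribution U-hat over all
    observed samples: each domain loss L_k is weighted by m_k / sum_i m_i."""
    m = np.asarray(sample_counts, dtype=float)
    weights = m / m.sum()                      # m_k / sum_i m_i
    return float(np.dot(weights, domain_losses))

# Example: three domains with very unequal sample counts; the large
# domain dominates the objective regardless of the others' losses.
print(uniform_loss([0.2, 0.5, 0.1], [100, 10, 890]))
```

Note how the 890-sample domain dominates: this sample-share weighting is what the agnostic formulation on the next slides replaces.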

  4. The loss function?
     [Figure: training distribution vs. inference distribution]
     - The inference distribution is not the same as the training distribution
     - E.g., training happens only when the phone is connected to wifi and is being charged
     - Other sources of mismatch: permissions, hardware compatibility, network constraints

  5. Agnostic federated learning
     [Figure: D_λ as a mixture of D_1, D_2, …, D_p]
     - Learn a model that performs well over any mixture of the domains
     - D_λ = Σ_{k=1}^p λ_k · D̂_k
     - λ is unknown and belongs to Λ ⊆ Δ_p
     - Minimize the agnostic loss: min_{h∈H} max_{λ∈Λ} L_{D_λ}(h)
     - Fairness implications
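Since the mixture loss L_{D_λ}(h) = Σ_k λ_k L_k(h) is linear in λ, the agnostic loss over a finite Λ can be computed by enumerating the candidate mixtures. A small sketch, again assuming hypothetical per-domain losses L_k(h):

```python
import numpy as np

def agnostic_loss(domain_losses, Lambda):
    """max over mixture weights lambda in a finite set Lambda (subset of the
    simplex) of the mixture loss L_{D_lambda}(h) = sum_k lambda_k * L_k(h)."""
    losses = np.asarray(domain_losses, dtype=float)
    return max(float(np.dot(lam, losses)) for lam in Lambda)

# With Lambda = the vertices of the simplex, the agnostic loss reduces to
# the worst single-domain loss, since the maximum of a linear function
# over the simplex is attained at a vertex.
vertices = [np.eye(3)[k] for k in range(3)]
print(agnostic_loss([0.2, 0.5, 0.1], vertices))  # → 0.5
```

Contrast this with the uniform objective: the worst-off domain now determines the loss, which is the source of the fairness implications noted on the slide.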

  6. Theoretical results
     Generalization bound. Assume L is bounded by M. For any δ > 0, with probability at least 1 − δ, for all h ∈ H and λ ∈ Λ,
         L_{D_λ}(h) ≤ L_{D̂_λ}(h) + 2ℜ_m(G, λ) + Mε + M √( (s(λ‖m̄) / (2m)) · log(|Λ_ε| / δ) )
     - ℜ_m(G, λ): weighted Rademacher complexity
     - s(λ‖m̄): skewness parameter, 1 + χ²(λ‖m̄), with m̄_k = m_k / m
     - Regularization based on the generalization bound
     Efficient algorithms?
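The skewness term is the part of the bound that penalizes mixtures λ far from the empirical sample proportions m̄. A small sketch of that quantity as defined on the slide (the example mixtures are illustrative, not from the paper):

```python
import numpy as np

def skewness(lam, m):
    """Skewness s(lambda || m-bar) = 1 + chi^2(lambda || m-bar), where
    m-bar_k = m_k / m are the empirical sample proportions. It equals 1
    when lambda matches the sample proportions and grows as it deviates."""
    lam = np.asarray(lam, dtype=float)
    mbar = np.asarray(m, dtype=float) / np.sum(m)
    chi2 = np.sum((lam - mbar) ** 2 / mbar)      # chi-squared divergence
    return 1.0 + chi2

print(skewness([0.5, 0.5], [50, 50]))  # 1.0: lambda equals the sample proportions
print(skewness([0.5, 0.5], [90, 10]))  # larger: few samples from domain 2
```

Algebraically, 1 + χ²(λ‖m̄) simplifies to Σ_k λ_k² / m̄_k, which makes the penalty for up-weighting an under-sampled domain explicit.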

  7. Stochastic optimization as a two-player game
     Algorithm Stochastic-AFL
       Initialization: w_0 ∈ W and λ_0 ∈ Λ.
       Parameters: step sizes γ_w > 0 and γ_λ > 0.
       For t = 1 to T:
         1. Compute stochastic gradients δ_w L(w_{t−1}, λ_{t−1}) and δ_λ L(w_{t−1}, λ_{t−1})
         2. w_t = Project(w_{t−1} − γ_w δ_w L(w_{t−1}, λ_{t−1}), W)
         3. λ_t = Project(λ_{t−1} + γ_λ δ_λ L(w_{t−1}, λ_{t−1}), Λ)
       Output: w_A = (1/T) Σ_{t=1}^T w_t and λ_A = (1/T) Σ_{t=1}^T λ_t
     Results
     - O(1/√T) convergence
     - Extensions to stochastic mirror descent
     - Experimental validation of the above results
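The descent-in-w / projected-ascent-in-λ loop above can be sketched on a toy problem. This is an illustrative instance, not the paper's setup: a scalar model w with per-domain losses L_k(w) = (w − c_k)², Λ the full simplex, and exact gradients standing in for the stochastic estimates δ_w L and δ_λ L:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto the probability simplex (sort-and-threshold)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u > css / (np.arange(len(v)) + 1.0))[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

c = np.array([0.0, 1.0])  # domain centers: L_k(w) = (w - c_k)^2

def stochastic_afl(T=2000, gamma_w=0.05, gamma_l=0.05):
    w, lam = 0.2, np.array([0.5, 0.5])
    w_sum, lam_sum = 0.0, np.zeros(2)
    for _ in range(T):
        losses = (w - c) ** 2                 # L_k(w); the gradient in lambda
        grad_w = np.dot(lam, 2.0 * (w - c))   # gradient of sum_k lam_k L_k(w)
        w = w - gamma_w * grad_w              # descent step in w
        lam = project_simplex(lam + gamma_l * losses)  # projected ascent in lambda
        w_sum += w
        lam_sum += lam
    return w_sum / T, lam_sum / T             # averaged iterates w_A, lam_A

w_A, lam_A = stochastic_afl()
print(w_A, lam_A)  # w_A near 0.5, the point equalizing the two domain losses
```

The averaged iterates converge to the minimax point w = 0.5, where both domain losses are equal; averaging is what gives the O(1/√T) guarantee cited on the slide.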

  8. Thank you! More at poster #172.
