

SLIDE 1

Federated Machine Learning via Over-the-Air Computation

Yuanming Shi, ShanghaiTech University

SLIDE 2

Outline

 Motivations

  • Big data, IoT, AI

 Three vignettes:

  • Federated machine learning: federated model aggregation
  • Over-the-air computation: joint device selection and beamforming design
  • Sparse and low-rank optimization: difference-of-convex programming algorithm

SLIDE 3

Intelligent IoT ecosystem

Internet of Things · Mobile Internet · Tactile Internet (Internet of Skills)

Develop computation, communication & AI technologies to enable smart IoT applications that make low-latency decisions on streaming data

SLIDE 4

Intelligent IoT applications

Autonomous vehicles · Smart health · Smart agriculture · Smart home · Smart city · Smart drones

SLIDE 5

Challenges

 Retrieve or infer information from high-dimensional/large-scale data

  • Devices have limited processing ability (computation, storage, ...)
  • 2.5 exabytes of data were generated every day as of 2012, and volumes keep growing (exabytes, zettabytes, yottabytes, ...)
  • We are interested in the information rather than the data itself

Challenges:

 High computational cost
 Only limited memory is available
 Do NOT want to compromise statistical accuracy

SLIDE 6

High-dimensional data analysis

(big) data

Models: (deep) machine learning

Methods:
  1. Large-scale optimization
  2. High-dimensional statistics
  3. Device-edge-cloud computing

SLIDE 7

Deep learning: next wave of AI

image recognition · speech recognition · natural language processing

SLIDE 8

Cloud-centric machine learning

SLIDE 9

The model lives in the cloud

SLIDE 10

We train models in the cloud

SLIDE 11

SLIDE 12

Make predictions in the cloud

SLIDE 13

Gather training data in the cloud

SLIDE 14

And make the models better

SLIDE 15

Why edge machine learning?

SLIDE 16

Learning on the edge

 Emerging high-stakes AI applications demand low latency and privacy, …

phones · drones · robots · glasses · self-driving cars: where to compute?

SLIDE 17

Mobile edge AI

 Processing at the “edge” instead of the “cloud”

SLIDE 18

Edge computing ecosystem

 “Device-edge-cloud” computing system for mobile AI applications

[Figure: architecture spanning user devices (on-device computing), MEC servers (mobile edge computing), and the cloud center (cloud computing)]

Shannon (communication) meets Turing (computing)

SLIDE 19

Edge machine learning

 Edge ML: both ML inference and training are pushed down into the network edge

SLIDE 20

On-device inference

SLIDE 21

Deep model compression

 Layer-wise deep neural network pruning via sparse optimization (a sketch follows)

[Ref] T. Jiang, X. Yang, Y. Shi, and H. Wang, “Layer-wise deep neural network pruning via iteratively reweighted optimization,” in Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), Brighton, UK, May 2019.
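As a rough illustration of the idea (not the exact ICASSP formulation), a minimal sketch of iteratively reweighted l1 pruning of one layer's weights via proximal gradient steps; all parameter values here are illustrative assumptions:

```python
import numpy as np

def reweighted_l1_prune(W0, lam=0.1, eps=1e-3, n_outer=5, n_inner=50, lr=0.5):
    """Iteratively reweighted l1 pruning of a weight matrix (sketch).

    Fits X to the pretrained weights W0 under a weighted l1 penalty;
    entries near zero receive larger penalties and are driven to zero.
    """
    X = W0.copy()
    for _ in range(n_outer):
        weights = 1.0 / (np.abs(X) + eps)   # reweighting: small entries penalized more
        for _ in range(n_inner):
            X = X - lr * (X - W0)           # gradient step on 0.5*||X - W0||_F^2
            thresh = lr * lam * weights     # proximal soft-thresholding step
            X = np.sign(X) * np.maximum(np.abs(X) - thresh, 0.0)
    return X

# Example: prune a random layer and report the achieved sparsity.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))
W_pruned = reweighted_l1_prune(W)
print("sparsity:", float(np.mean(W_pruned == 0.0)))
```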

SLIDE 22

Edge distributed inference

 Wireless MapReduce for the on-device distributed inference process

[Ref] K. Yang, Y. Shi, and Z. Ding, “Data shuffling in wireless distributed computing via low-rank optimization,” IEEE Trans. Signal Process., vol. 67, no. 12, pp. 3087-3099, Jun. 2019.

SLIDE 23

This talk: On-device training

SLIDE 24

Vignette A: Federated machine learning

SLIDE 25

Federated computation and learning

 Goal: imbue mobile devices with state-of-the-art machine learning systems without centralizing data and with privacy by default

 Federated computation: a server coordinates a fleet of participating devices to compute aggregations of the devices’ private data

 Federated learning: a shared global model is trained via federated computation
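To make the workflow illustrated on the following slides concrete, a minimal sketch of one federated averaging round; the least-squares local update and all function names are illustrative assumptions, not the talk's exact algorithm:

```python
import numpy as np

def local_update(global_model, data, lr=0.1, epochs=1):
    """One device's local training step (illustrative: plain least-squares SGD)."""
    w = global_model.copy()
    X, y = data
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # gradient of 0.5*||Xw - y||^2 / n
        w -= lr * grad
    return w

def federated_round(global_model, devices):
    """Server aggregates local models, weighting by local data size."""
    updates, sizes = [], []
    for data in devices:
        updates.append(local_update(global_model, data))
        sizes.append(len(data[1]))
    return np.average(updates, axis=0, weights=np.asarray(sizes, dtype=float))

# Toy run: 5 devices jointly fit a shared linear model.
rng = np.random.default_rng(1)
w_true = rng.normal(size=3)
devices = []
for _ in range(5):
    X = rng.normal(size=(20, 3))
    devices.append((X, X @ w_true))
w = np.zeros(3)
for _ in range(50):
    w = federated_round(w, devices)
print("error:", np.linalg.norm(w - w_true))
```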

SLIDE 26-32

Federated learning

[Figure-only slides: a sequence of animation frames under the same title, illustrating the federated learning workflow]

SLIDE 33

Federated learning: applications

 Applications: settings where the data is generated at the mobile devices and is undesirable or infeasible to transmit to centralized servers

financial services · smart retail · smart healthcare · keyboard prediction

SLIDE 34

Federated learning over wireless networks

 Goal: train a shared global model via wireless federated computation

System challenges

  • Massively distributed
  • Node heterogeneity

Statistical challenges

  • Unbalanced data
  • Non-IID data
  • Underlying structure

[Figure: n-device distributed federated learning system]

SLIDE 35

How to efficiently aggregate models over wireless networks?

SLIDE 36

Vignette B: Over-the-air computation

SLIDE 37

Model aggregation via over-the-air computation

 Aggregating local updates from mobile devices

  • the target is a weighted sum of the devices’ messages
  • multiple mobile devices transmit to one multi-antenna base station
  • only a selected subset of devices participates in each aggregation
  • the aggregation weights are given by the data size at each device

Over-the-air computation: exploit the signal superposition property of a wireless multiple-access channel for model aggregation

SLIDE 38

Over-the-air computation

 The estimated value before post-processing at the BS

  • determined by each device’s transmitter scalar, the receive beamforming vector, and a normalizing factor
  • target function to be estimated: the weighted sum of local updates
  • recovered aggregation vector entry obtained via post-processing

 Model aggregation error: the MSE between the estimate and the target function

  • the optimal transmitter scalar minimizes this MSE for a given receive beamformer
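A sketch of the standard over-the-air computation model in the spirit of the cited TWC paper; the notation here ($s_k$: normalized local update symbol at device $k$, $b_k$: transmitter scalar, $\mathbf{h}_k$: uplink channel, $\mathbf{m}$: receive beamforming vector, $\eta$: normalizing factor, $\mathbf{n}$: noise with power $\sigma^2$, $\mathcal{S}$: selected device set) is an assumption rather than a quote from the slide:

$$
\hat{g} = \frac{1}{\sqrt{\eta}}\,\mathbf{m}^{\mathsf H}\Big(\sum_{k\in\mathcal S}\mathbf{h}_k b_k s_k + \mathbf{n}\Big),
\qquad
\mathrm{MSE} = \sum_{k\in\mathcal S}\Big|\frac{\mathbf{m}^{\mathsf H}\mathbf{h}_k b_k}{\sqrt{\eta}}-1\Big|^2 + \frac{\sigma^2\|\mathbf{m}\|^2}{\eta}.
$$

Zero-forcing the signal-misalignment term gives the optimal transmitter scalar

$$
b_k = \sqrt{\eta}\,\frac{(\mathbf{m}^{\mathsf H}\mathbf{h}_k)^{\mathsf H}}{|\mathbf{m}^{\mathsf H}\mathbf{h}_k|^2},
\qquad\text{leaving}\qquad
\mathrm{MSE}(\mathbf{m}) = \frac{\sigma^2\|\mathbf{m}\|^2}{\eta}.
$$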

SLIDE 39

Problem formulation

 Key observations:

  • Selecting more devices yields a faster convergence rate for the training process
  • Aggregation error degrades model prediction accuracy

SLIDE 40

Problem formulation

 Goal: maximize the number of selected devices under a target MSE constraint (stated in symbols below)

  • Joint device selection and receive beamforming vector design
  • Improves the convergence rate of the training process while guaranteeing prediction accuracy in the inference process
  • A mixed combinatorial optimization problem
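Under the notation assumed above, the design problem takes the form

$$
\underset{\mathcal S,\,\mathbf{m}}{\text{maximize}}\;\; |\mathcal S|
\qquad \text{subject to} \;\; \mathrm{MSE}(\mathbf{m},\mathcal S)\le\gamma,
$$

where $\gamma$ is the target MSE threshold; the cardinality objective together with the MSE constraint is what makes the problem mixed combinatorial.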

SLIDE 41

Vignette C: Sparse and low-rank optimization

SLIDE 42

Sparse and low-rank optimization

 Sparse and low-rank optimization for on-device federated learning

multicasting duality · sum of feasibilities · matrix lifting

SLIDE 43

Problem analysis

 Goal: induce sparsity while satisfying a fixed-rank constraint

 Limitations of existing methods

  • Sparse optimization: iteratively reweighted algorithms are parameter-sensitive
  • Low-rank optimization: the semidefinite relaxation (SDR) approach (i.e., dropping the rank-one constraint) is poor at returning rank-one solutions

SLIDE 44

Difference-of-convex functions representation

 Ky Fan k-norm [Fan, PNAS’1951]: the sum of the k largest absolute values; it is a convex function (definition spelled out below)
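A standard statement of the definition, with notation assumed here: for $\mathbf{x}\in\mathbb{R}^n$ and a permutation $\pi$ of $\{1,\dots,n\}$ such that $|x_{\pi(1)}|\ge|x_{\pi(2)}|\ge\cdots\ge|x_{\pi(n)}|$,

$$
\|\mathbf{x}\|_{(k)} := \sum_{i=1}^{k} |x_{\pi(i)}|,
$$

the sum of the $k$ largest absolute values; as a pointwise maximum of linear functions it is convex.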

SLIDE 45

Difference-of-convex functions representation

 DC representation for the sparsity function

 DC representation for a rank-one positive semidefinite matrix

[Ref] J.-y. Gotoh, A. Takeda, and K. Tono, “DC formulations and algorithms for sparse optimization problems,” Math. Program., vol. 169, pp. 141-176, May 2018.

algorithmic advantages?
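The two representations referenced above, in the standard forms from the cited Gotoh-Takeda-Tono line of work (notation assumed as before):

$$
\|\mathbf{x}\|_0 \le k \;\Longleftrightarrow\; \|\mathbf{x}\|_1 - \|\mathbf{x}\|_{(k)} = 0,
$$

and, for a positive semidefinite matrix $\mathbf{X}\succeq\mathbf{0}$,

$$
\mathrm{rank}(\mathbf{X}) \le 1 \;\Longleftrightarrow\; \mathrm{Tr}(\mathbf{X}) - \|\mathbf{X}\|_2 = 0,
$$

where $\|\mathbf{X}\|_2$ is the spectral norm (largest eigenvalue). Both left-hand sides are differences of convex functions, which is exactly what a DC algorithm can exploit.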

SLIDE 46

A DC representation framework

 A two-step framework for device selection

 Step 1: obtain the sparse solution such that the DC objective value achieves zero, by gradually increasing the sparsity parameter

SLIDE 47

A DC representation framework

 Step 2: feasibility detection

  • order the entries of the sparse solution in descending order
  • increase the number of tentatively selected devices one at a time, choosing the devices with the largest entries

 Feasibility detection via DC programming (a sketch of the two-step loop follows)
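A minimal sketch of the two-step selection loop; `solve_sparse_dc` and `dc_feasibility_check` are hypothetical stand-ins for the actual DC subproblem solvers and are stubbed here so the sketch runs end to end:

```python
import numpy as np

def solve_sparse_dc(K):
    """Hypothetical Step 1: solve the sparsity-inducing DC program and
    return a score per device (larger = easier to serve). Stubbed with
    random scores."""
    rng = np.random.default_rng(0)
    return rng.random(K)

def dc_feasibility_check(devices):
    """Hypothetical Step 2 subproblem: DC-programming feasibility check
    for a candidate device set under the MSE constraint. Stubbed with a
    size cap standing in for the real check."""
    return len(devices) <= 7

def select_devices(K):
    # Step 1: sparse solution -> per-device scores.
    scores = solve_sparse_dc(K)
    # Order devices by score, in descending order.
    order = np.argsort(-scores)
    # Step 2: grow the candidate set one device at a time and keep the
    # largest set that still passes the feasibility check.
    selected = []
    for k in range(1, K + 1):
        candidate = order[:k]
        if dc_feasibility_check(candidate):
            selected = candidate
        else:
            break
    return selected

print("selected devices:", select_devices(10))
```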

SLIDE 48

DC algorithm with convergence guarantees

 Minimize the difference of two strongly convex functions

 The DC algorithm: linearize the concave part at each iteration

  • converges to a critical point (see the sketch below)
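A generic difference-of-convex algorithm (DCA) iteration, demonstrated on a toy quadratic DC problem; the particular functions g, h and the closed-form inner minimizer are illustrative assumptions:

```python
import numpy as np

def dca(grad_h, solve_linearized, x0, n_iter=100, tol=1e-10):
    """Generic DCA for minimizing f(x) = g(x) - h(x) with g, h convex.

    Each iteration linearizes the concave part -h(x) at the current
    iterate and exactly minimizes the convex surrogate
        g(x) - <grad_h(x_t), x>.
    """
    x = x0
    for _ in range(n_iter):
        y = grad_h(x)                # (sub)gradient of h at the iterate
        x_new = solve_linearized(y)  # argmin_x g(x) - <y, x>
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x

# Toy DC program: g(x) = ||x||^2 + 0.5*||x - a||^2, h(x) = ||x - c||^2.
# f = g - h stays coercive (net quadratic coefficient 0.5 > 0) with the
# unique minimizer x* = a - 2c, which DCA should find.
a = np.array([1.0, -2.0])
c = np.array([0.5, 0.5])
grad_h = lambda x: 2.0 * (x - c)
# Surrogate minimizer: grad g(x) = 3x - a, so 3x - a - y = 0.
solve_linearized = lambda y: (a + y) / 3.0

x = dca(grad_h, solve_linearized, x0=np.zeros(2))
print("DCA solution:", x, "expected:", a - 2 * c)
```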

SLIDE 49

Numerical results

 Convergence of the proposed DC algorithm

SLIDE 50

Numerical results

 Probability of feasibility with different algorithms

SLIDE 51

Numerical results

 Average number of selected devices with different algorithms

SLIDE 52

Numerical results

 Performance of the proposed fast model aggregation in federated learning

  • Training an SVM classifier on the CIFAR-10 dataset

SLIDE 53

Concluding remarks

 Wireless communication meets machine learning

  • Over-the-air computation for fast model aggregation

 Sparse and low-rank optimization framework

  • Joint device selection and beamforming design

 A unified DC programming framework

  • DC representation for sparse and low-rank functions


SLIDE 54

Future directions

 Federated learning

  • security, provable guarantees, …

 Over-the-air computation

  • channel uncertainty, synchronization, …

 Sparse and low-rank optimization via DC programming

  • optimality, scalability, …

SLIDE 55

To learn more…

 Papers:

  K. Yang, T. Jiang, Y. Shi, and Z. Ding, “Federated learning via over-the-air computation,” IEEE Trans. Wireless Commun., DOI: 10.1109/TWC.2019.2961673, Jan. 2020.

  K. Yang, T. Jiang, Y. Shi, and Z. Ding, “Federated learning based on over-the-air computation,” in Proc. IEEE Int. Conf. Commun. (ICC), Shanghai, China, May 2019.

http://shiyuanming.github.io/home.html

SLIDE 56

Thanks