Optimizing Federated Learning on Non-IID Data with Reinforcement Learning - PowerPoint PPT Presentation



SLIDE 1

Optimizing Federated Learning on Non-IID Data with Reinforcement Learning

Hao Wang*, Zakhary Kaplan*, Di Niu^, Baochun Li*
*University of Toronto, ^University of Alberta

INFOCOM’20

SLIDE 2

Siri, Alexa

SLIDE 3

Machine Learning

SLIDE 4

Federated Learning

SLIDE 5

SLIDE 6

Federated Averaging Algorithm (FedAvg)
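The round structure that the next slides walk through (random device selection, local training on private data, weighted averaging on the server) can be written as a minimal sketch. `local_train` and the `clients` list are hypothetical stand-ins, not the paper's implementation:

```python
# Minimal sketch of one FedAvg communication round, assuming models are
# plain NumPy weight vectors and `local_train(w, data)` is a hypothetical
# client routine that runs a few local SGD steps and returns new weights.
import numpy as np

def fedavg_round(global_w, clients, local_train, k=10, rng=None):
    """clients: list of (local_data, n_samples) tuples.

    k devices are chosen uniformly at random, each trains locally,
    and the server averages the results weighted by sample counts.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    chosen = rng.choice(len(clients), size=k, replace=False)
    updates, sizes = [], []
    for i in chosen:
        data, n = clients[i]
        updates.append(local_train(global_w, data))  # local training step
        sizes.append(n)
    # weighted average of the returned local models
    return np.average(updates, axis=0, weights=np.asarray(sizes, float))
```

With equal sample counts this reduces to a plain mean of the selected local models.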

SLIDE 7

Local data, local model, random selection

SLIDE 8

SLIDE 9

Thank you for the feedback! Local data, local model

SLIDE 10

ML algorithms assume the training data is independent and identically distributed (IID)

SLIDE 11

Federated learning reuses existing ML algorithms, but on non-IID data

SLIDE 12

SLIDE 13

SLIDE 14

Non-IID data introduces bias into training, leading to slow convergence and even training failures

SLIDE 15

MNIST

http://yann.lecun.com/exdb/mnist/

SLIDE 16

Figure: test accuracy (%) vs. communication round on MNIST, comparing FedAvg-IID and FedAvg-non-IID

SLIDE 17

Build IID training data? No, we don’t have any access to the data on your phone.

SLIDE 18

Figure: illustration of the data-sharing strategy, where each device combines an α fraction of a globally shared dataset with its private data

Zhao, Yue, et al. "Federated Learning with Non-IID Data." arXiv preprint arXiv:1806.00582 (2018).

SLIDE 19

Optimizing Federated Learning on Non-IID Data with Reinforcement Learning [INFOCOM’20]

SLIDE 20

Build IID training data? No. Instead, probe the bias of non-IID data

  • Peek into the data distribution on each device without violating data privacy

SLIDE 21

Carefully select devices to balance the bias introduced by non-IID data

SLIDE 22

Probing the data distribution

SLIDE 23

Initial model → local models

A two-layer CNN with 431,080 parameters; 100 devices, each with 600 samples; non-IID data: 80% of a device’s data has the same label, e.g., “6”
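The experimental setup described above can be sketched as a data partition. The `labels` array and the device-to-dominant-label assignment are illustrative assumptions, not the exact partition used in the paper:

```python
# Sketch of the non-IID setting on this slide: 100 devices with 600
# samples each, 80% of a device's data sharing one dominant label.
# `labels` is assumed to be a 1-D array of class labels (0-9).
import numpy as np

def non_iid_partition(labels, n_devices=100, per_device=600,
                      dominant_frac=0.8, seed=0):
    rng = np.random.default_rng(seed)
    by_label = {c: list(np.flatnonzero(labels == c)) for c in range(10)}
    for idx in by_label.values():
        rng.shuffle(idx)
    n_dom = int(per_device * dominant_frac)
    parts = []
    for d in range(n_devices):
        c = d % 10                        # dominant label for this device
        take = by_label[c][:n_dom]        # 80% of samples from one class
        by_label[c] = by_label[c][n_dom:]
        # remaining 20% drawn uniformly from the whole pool
        rest = rng.choice(len(labels), per_device - n_dom, replace=False)
        parts.append(np.concatenate([take, rest]))
    return parts
```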

SLIDE 24

We apply Principal Component Analysis (PCA) to reduce dimensionality

431,080-dimensional model weights → 2-dimensional space
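A minimal sketch of this projection with `sklearn.decomposition.PCA`. Random weights stand in for the real local models, and a smaller dimension is used to keep the sketch light (the actual models have 431,080 parameters):

```python
# Project high-dimensional local model weights to 2-D with PCA so the
# per-device updates can be plotted. One row per device.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
local_weights = rng.normal(size=(100, 10_000))  # stand-in for 431,080-dim weights

points = PCA(n_components=2).fit_transform(local_weights)
print(points.shape)  # (100, 2)
```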

SLIDE 25

Figure: local model weights projected onto the first two principal components (C0, C1)

SLIDE 26

An implicit connection between model weights and data distribution

SLIDE 27

Probing the data distribution
Selecting devices for federated learning

SLIDE 28

SLIDE 29

Figure: device model weights in the 2-D PCA space (C0, C1)

SLIDE 30

K-Center Clustering
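One way to realize this step is the classic greedy 2-approximation for k-center, sketched here over 2-D points such as the PCA projections; the choice of the first center is arbitrary:

```python
# Greedy k-center sketch: repeatedly pick the point farthest from all
# centers chosen so far, so the centers spread out across the space.
import numpy as np

def k_center(points, k, first=0):
    points = np.asarray(points, dtype=float)
    centers = [first]
    # distance of every point to its nearest chosen center
    dist = np.linalg.norm(points - points[first], axis=1)
    while len(centers) < k:
        nxt = int(np.argmax(dist))      # farthest point becomes a center
        centers.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return centers
```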

SLIDE 31

Random Selection from Groups

SLIDE 32

Figure: test accuracy (%) vs. communication round for FedAvg-IID, FedAvg-non-IID, and K-Center-non-IID

SLIDE 33

Probing the data distribution
Selecting devices for federated learning: how to select devices to speed up training?

SLIDE 34

It is difficult to select the appropriate subset of devices

  • Model weights → device selection choice
  • A dynamic and nondeterministic problem

Reinforcement Learning (RL)

SLIDE 35

The RL loop: the agent takes an action on the environment (the FL server) and observes the resulting state and reward

Episode: (…, state, action, reward, state′, action′, …, end)

SLIDE 36

Many episodes of (…, state, action, reward, state′, action′, …, end)

Learn to maximize sum(reward)

SLIDE 37

States

Global weights and local model weights (a 100-dimension vector)

SLIDE 38

Actions

Select K devices from a pool of N devices: a huge action space. Selecting 10 devices from a pool of 100 devices leads to 1.7310309 × 10^13 possible actions
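The count quoted above is the binomial coefficient C(100, 10):

```python
# Number of ways to choose K = 10 devices out of N = 100.
import math

n_actions = math.comb(100, 10)
print(n_actions)  # 17310309456440, i.e. about 1.73e13
```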

SLIDE 39

Modify the RL training algorithm

SLIDE 40

Selecting the Top K Devices

Only one device is selected during RL training; the action space is now {1, 2, …, N}, instead of selecting K devices from N devices

SLIDE 41

Evaluating Each Device

Scores: 0.3, 0.5, 0.1, …, 0.2 → select the top K
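The selection rule above, sketched with placeholder scores (in the real system each score would come from the agent's value estimate for that device):

```python
# Score every device, then pick the K highest-scoring ones.
import numpy as np

scores = np.array([0.3, 0.5, 0.1, 0.4, 0.2])   # one score per device
k = 2
top_k = np.argsort(scores)[-k:][::-1]          # indices of the K best, best first
print(top_k)  # [1 3]
```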

SLIDE 42

Rewards

r_t = Ξ^(ω_t − Ω) − 1, where Ξ is a positive constant, ω_t is the training accuracy at communication round t, and Ω is the target accuracy.

Since 0 ⩽ ω_t ⩽ Ω ⩽ 1, we have r_t ∈ (−1, 0].

  • Accuracy increases: ω_t ⬆ → r_t ⬆
  • More communication rounds: t ⬆ → sum(r_t) ⬇
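The per-round reward, evaluated numerically. Ξ = 64 and Ω = 0.99 are illustrative choices for the sketch, not values fixed by these slides:

```python
# r_t = Xi**(w_t - Omega) - 1: negative while accuracy is below target,
# exactly 0 once the target accuracy Omega is reached.
XI, OMEGA = 64.0, 0.99   # illustrative constants

def reward(w_t):
    return XI ** (w_t - OMEGA) - 1.0

print(reward(0.99))  # 0.0 at the target accuracy
```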

SLIDE 43

Training the DRL Agent

Maximize the cumulative discounted return

R = Σ_{t=1}^{T} γ^(t−1) r_t = Σ_{t=1}^{T} γ^(t−1) (Ξ^(ω_t − Ω) − 1), where γ ∈ (0, 1) is the discount factor

Look for a function that identifies the actions leading to the maximum cumulative return under a particular state
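The cumulative discounted return as a one-liner; γ = 0.95 and the reward sequence are illustrative:

```python
# R = sum over t >= 1 of gamma**(t-1) * r_t.
GAMMA = 0.95   # illustrative discount factor

def discounted_return(rewards):
    # enumerate starts at 0, which matches the gamma**(t-1) exponent
    return sum(GAMMA ** t * r for t, r in enumerate(rewards))

print(discounted_return([-0.5, -0.3, 0.0]))
```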

SLIDE 44

The DDQN agent observes the state s_{t−1} and reward r_t from the environment (the FL server), extracts features, and outputs an action a_t through a softmax
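A numerical sketch of the Double DQN (DDQN) target that such an agent is trained against, assuming per-action Q-value vectors from the online and target networks; the numbers are illustrative:

```python
# Double DQN target: the online network *chooses* the next action,
# the target network *scores* it, which reduces overestimation bias.
import numpy as np

def ddqn_target(r, q_online_next, q_target_next, gamma=0.95, done=False):
    if done:
        return r
    a_star = int(np.argmax(q_online_next))    # action picked by online net
    return r + gamma * q_target_next[a_star]  # value from target net

print(ddqn_target(-0.3, np.array([0.1, 0.8]), np.array([0.5, 0.4])))
```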

SLIDE 45

Figure: cumulative discounted reward vs. training episode while training the DRL agent

SLIDE 46

DRL agent workflow: check-in → probing → selection → update → weight update

SLIDE 47

Evaluating Our Solution

Benchmarks: MNIST, FashionMNIST, CIFAR-10
Non-IID levels: level 1, half-and-half, 80%, 50%

SLIDE 48

Figure: communication rounds on MNIST, FashionMNIST, and CIFAR-10 (FedAvg vs. K-Center vs. Favor), non-IID level 1

SLIDE 49

Figure: communication rounds on MNIST, FashionMNIST, and CIFAR-10 (FedAvg vs. K-Center vs. Favor), non-IID level half-and-half

SLIDE 50

Figure: communication rounds on MNIST, FashionMNIST, and CIFAR-10 (FedAvg vs. K-Center vs. Favor), non-IID level 80%

SLIDE 51

Figure: communication rounds on MNIST, FashionMNIST, and CIFAR-10 (FedAvg vs. K-Center vs. Favor), non-IID level 50%

SLIDE 52

Figure: trajectories of local and global weights (w_init, w_1, w_2, …) in the 2-D PCA space (C1, C2), FedAvg vs. Favor

SLIDE 53

Indirect data distribution probing; DRL-based device selection. Communication rounds can be reduced by up to

  • 49% on MNIST
  • 23% on FashionMNIST
  • 42% on CIFAR-10