SLIDE 1

Fair Resource Allocation in Federated Learning

Tian Li (CMU), Maziar Sanjabi (Facebook AI), Ahmad Beirami (Facebook AI), Virginia Smith (CMU)

tianli@cmu.edu


SLIDE 2

Federated Learning

Privacy-preserving training in heterogeneous, (potentially) massive networks

SLIDE 10

Challenges

• Objective: $\min_w \; p_1 F_1(w) + p_2 F_2(w) + \cdots + p_N F_N(w)$
• No accuracy guarantee for individual devices
• Model performance can vary widely across devices

Can we devise an efficient federated optimization method to encourage a more fair (i.e., more uniform) distribution of the model performance across devices?
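As a concrete illustration of the first challenge (the helper name and toy numbers are mine, not from the talk): the objective weights each device's loss by $p_k$, typically proportional to its number of samples, so a small device can be badly served while the objective stays low.

```python
import numpy as np

def federated_objective(device_losses, weights):
    """Standard federated objective: p_1 F_1(w) + ... + p_N F_N(w),
    with p_k typically proportional to device k's number of samples."""
    F = np.asarray(device_losses, dtype=float)
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()                      # normalize the p_k to sum to 1
    return float(p @ F)

# Three well-served devices with many samples and one badly served device
# with few samples: the objective stays low even though device 4 fails.
losses = [0.1, 0.1, 0.1, 5.0]
weights = [100, 100, 100, 3]
print(federated_objective(losses, weights))  # ~0.15, hiding device 4's loss of 5.0
```

This is exactly the "no accuracy guarantee for individual devices" problem: the average can look fine while one device is far worse off.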

SLIDE 15

Fair Resource Allocation Objective

Previous objective: $\min_w \; p_1 F_1(w) + p_2 F_2(w) + \cdots + p_N F_N(w)$

q-FFL: $\min_w \; \frac{1}{q+1} \left( p_1 F_1^{q+1}(w) + p_2 F_2^{q+1}(w) + \cdots + p_N F_N^{q+1}(w) \right)$

• Inspired by α-fairness for fair resource allocation in wireless networks
• A tunable framework (q = 0: previous objective; q = ∞: minimax fairness)
• Theory: generalization guarantees; increasing q results in more 'uniform' accuracy distributions (in terms of various uniformity measures, such as variance)
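A minimal sketch of the q-FFL objective as written above (the function name and toy values are illustrative): raising each loss to the power q + 1 amplifies the largest per-device losses, so tuning q interpolates between the plain weighted average (q = 0) and worst-case emphasis (q → ∞).

```python
import numpy as np

def q_ffl_objective(device_losses, weights, q):
    """q-FFL: sum_k p_k * F_k(w)**(q + 1) / (q + 1).

    q = 0 recovers the plain weighted-average objective; as q grows, the
    devices with the largest losses dominate, approaching minimax fairness.
    """
    F = np.asarray(device_losses, dtype=float)
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()                      # normalize the p_k to sum to 1
    return float(np.sum(p * F ** (q + 1)) / (q + 1))

losses, weights = [0.1, 0.1, 0.1, 5.0], [1, 1, 1, 1]
print(q_ffl_objective(losses, weights, q=0))  # plain average of the losses
print(q_ffl_objective(losses, weights, q=5))  # dominated by the worst device
```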

SLIDE 23

Fair Resource Allocation Objective

q-FFL: $\min_w \; \frac{1}{q+1} \left( p_1 F_1^{q+1}(w) + p_2 F_2^{q+1}(w) + \cdots + p_N F_N^{q+1}(w) \right)$

[Figure: histograms of the number of devices vs. test accuracy (0.2-0.8), comparing the Baseline and q-FFL accuracy distributions]
slide-28
SLIDE 28

Efficient Solver

slide-29
SLIDE 29

Efficient Solver

Challenges

slide-30
SLIDE 30

Efficient Solver

Challenges

Different fairness/accuracy tradeoffs: different q’s

slide-31
SLIDE 31

Efficient Solver

Challenges

Different fairness/accuracy tradeoffs: different q’s

slide-32
SLIDE 32

Efficient Solver

Challenges

Different fairness/accuracy tradeoffs: different q’s Heterogeneous networks, expensive communication

slide-33
SLIDE 33

Efficient Solver

Challenges

Different fairness/accuracy tradeoffs: different q’s Heterogeneous networks, expensive communication

slide-34
SLIDE 34

Efficient Solver

Challenges

Different fairness/accuracy tradeoffs: different q’s Heterogeneous networks, expensive communication

slide-35
SLIDE 35

Efficient Solver

Challenges

Different fairness/accuracy tradeoffs: different q’s Heterogeneous networks, expensive communication High level ideas

slide-36
SLIDE 36

Efficient Solver

Challenges

Different fairness/accuracy tradeoffs: different q’s Heterogeneous networks, expensive communication Dynamically estimate the step sizes associated with different ’s

q

High level ideas

slide-37
SLIDE 37

Efficient Solver

Challenges

Different fairness/accuracy tradeoffs: different q’s Heterogeneous networks, expensive communication Dynamically estimate the step sizes associated with different ’s

q

High level ideas

slide-38
SLIDE 38

Efficient Solver

Challenges

Different fairness/accuracy tradeoffs: different q’s Heterogeneous networks, expensive communication Dynamically estimate the step sizes associated with different ’s

q

Allow for low device participation, local updating High level ideas
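The high-level ideas above can be sketched as a server update in the spirit of the paper's q-FedSGD method. This is my reconstruction under stated assumptions, not the reference implementation: the toy quadratic devices, the `solve` helper, and the constant L = 4 are all illustrative.

```python
import numpy as np

def q_fedsgd_step(w, devices, loss_fn, grad_fn, q, L):
    """One server round in the spirit of q-FedSGD (a sketch).

    Each selected device k reports F_k(w) and grad F_k(w). The server forms
    reweighted updates Delta_k = F_k^q * grad F_k and divides their sum by
    sum_k h_k, where h_k = q * F_k^(q-1) * ||grad F_k||^2 + L * F_k^q: a
    dynamically estimated step size, so a single Lipschitz estimate L can
    serve every choice of q without re-tuning.
    """
    deltas, h = [], []
    for k in devices:
        Fk, gk = loss_fn(k, w), grad_fn(k, w)
        deltas.append(Fk**q * gk)
        curvature = q * Fk ** (q - 1) * float(gk @ gk) if q > 0 else 0.0
        h.append(curvature + L * Fk**q)
    return w - sum(deltas) / sum(h)

# Toy network: device 0 has a steep loss near w = 0 (the "majority");
# device 1's optimum sits at w = 4, so plain averaging favors device 0.
loss_fn = lambda k, w: (2.0 * float(w @ w), 0.5 * float((w - 4.0) @ (w - 4.0)))[k]
grad_fn = lambda k, w: (4.0 * w, w - 4.0)[k]

def solve(q, rounds=40, L=4.0):
    w = np.array([0.0])
    for _ in range(rounds):
        w = q_fedsgd_step(w, [0, 1], loss_fn, grad_fn, q=q, L=L)
    return float(w[0])

print(solve(q=0))  # ~0.80: the average-loss model sits close to device 0
print(solve(q=1))  # ~1.14: larger q pulls the model toward the worse-off device
```

Note how the same L works for both runs; only q changes, which is the point of estimating the step size from the reported losses.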

SLIDE 39

Empirical Results

Benchmark: LEAF (leaf.cmu.edu)

Test accuracy (%) across devices; variance is over per-device accuracies:

Dataset       Objective   Average   Worst 10%   Best 10%   Variance
Synthetic     q = 0       80.8      18.8        100.0      724
              q = 1       79.0      31.1        100.0      472
Vehicle       q = 0       87.3      43.0        95.7       291
              q = 5       87.7      69.9        94.0       48
Sent140       q = 0       65.1      15.9        100.0      697
              q = 1       66.5      23.0        100.0      509
Shakespeare   q = 0       51.1      39.7        72.9       82
              q = .001    52.1      42.1        69.0       54

• Similar average accuracy
• Decreases variance significantly
• Increases the accuracy of the worst 10% of devices
• Slightly decreases the accuracy of the best devices
• On average, reduces the variance of accuracy across all devices by 45%, while solving the objective orders of magnitude more quickly than other baselines
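For context on how a row of this table can be produced, here is a small sketch of the four reported statistics. It assumes an unweighted distribution of per-device accuracies; the paper may weight or bucket devices differently.

```python
import numpy as np

def accuracy_stats(per_device_acc):
    """Average, worst-10%, best-10%, and variance of per-device test
    accuracies (in percent), as in the table above."""
    a = np.sort(np.asarray(per_device_acc, dtype=float))
    k = max(1, len(a) // 10)             # number of devices in each 10% tail
    return {
        "average": float(a.mean()),
        "worst_10%": float(a[:k].mean()),   # mean accuracy of the worst tail
        "best_10%": float(a[-k:].mean()),   # mean accuracy of the best tail
        "variance": float(a.var()),
    }

print(accuracy_stats([18.8, 50.0, 62.0, 75.0, 80.0, 85.0, 90.0, 92.0, 95.0, 100.0]))
```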

SLIDE 49

q-FFL Extended: Meta-Learning

Fair meta-learning (q-FFL + MAML): fair initializations across tasks

Dataset    Objective   Average   Worst 10%   Best 10%   Variance
Omniglot   q = 0       79.1      61.2        94.0       93
           q = .1      79.3      62.5        93.8       86

• Many other scenarios
• More broadly, q-FFL offers an alternative to, and a generalization of, minimax optimization

SLIDE 53

Code & paper: OpenReview / cs.cmu.edu/~litian

Thanks!