SLIDE 1

FedSel: Federated SGD under Local Differential Privacy with Top-k Dimension Selection

Ruixuan Liu¹, Yang Cao², Masatoshi Yoshikawa², Hong Chen¹

¹Renmin University of China, ²Kyoto University

DASFAA, 2020

SLIDE 2

Federated Learning Overview

Sensitive information: age, job, location, etc.


SLIDE 11

Federated Learning Privacy Vulnerabilities

Possible privacy attacks…

  • Membership Inference

“Has the data of a target victim been used to train the model?”

  • Reconstruction attack

Given a gender classifier, “What does a male look like?”

  • Unintended inference attack

Given a gender classifier, “What is the race of people in Bob’s photos?”

SLIDE 12

Differential Privacy for Federated Learning

  • +noise: the server adds noise to the aggregated updates
  • Requires a trusted server

SLIDE 15

Local Differential Privacy for Federated Learning

  • +noise at each user: every user perturbs its own update locally before sending it
  • No need to worry about an untrusted server
  • LDP is a natural privacy definition for FL

SLIDE 17

Local Differential Privacy for Federated Learning

A randomized mechanism M satisfies ε-LDP if, for any pair of inputs v, v′ and any output s:

  Pr[M(v) = s] ≤ e^ε · Pr[M(v′) = s]
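Not part of the slides: a minimal concrete instance of an ε-LDP mechanism, binary randomized response (the function name is illustrative). Keeping the true bit with probability e^ε/(e^ε + 1) makes the two conditional output distributions differ by at most the factor e^ε required by the definition above.

```python
import math
import random

def randomized_response(bit: int, eps: float) -> int:
    """Report the true bit with probability e^eps / (e^eps + 1), else flip it.

    For either output value, the ratio of its probability under input 0
    versus input 1 is at most e^eps -- exactly the eps-LDP guarantee.
    """
    p_keep = math.exp(eps) / (math.exp(eps) + 1.0)
    return bit if random.random() < p_keep else 1 - bit
```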
SLIDE 19

Challenges of LDP in Federated Learning

For a d-dimensional vector, the metric is the maximum error of the estimated per-dimension means:

  • Given a local privacy budget ε for the whole vector,
  • the error in the estimated mean of each dimension depends on how ε is allocated.

If the local privacy budget is split over all d dimensions [1]:

  • The error is super-linear in d, and can be excessive when d is large.

An asymptotically optimal conclusion [1]:

  1. Randomly sample k of the d dimensions
      • increases the privacy budget for each dimension from ε/d to ε/k
      • reduces the noise variance incurred
  2. Perturb each sampled dimension with budget ε/k
  3. Aggregate, and scale the estimates up by the factor d/k (a minimal sketch follows the reference)

[1] Wang N, Xiao X, Yang Y, et al. Collecting and analyzing multidimensional data with local differential privacy[C]//2019 IEEE 35th International Conference on Data Engineering (ICDE). IEEE, 2019: 638-649.
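A minimal sketch of this sample-then-scale strategy, not taken from the slides: it assumes k = 1 and uses Duchi et al.'s one-dimensional mechanism for the per-dimension perturbation (the helper names are mine). Each user spends the whole budget ε on a single sampled dimension, and scaling by d keeps the aggregated mean estimate unbiased.

```python
import math
import random

def duchi_1d(v, eps):
    """Duchi et al.'s eps-LDP mechanism for a single value v in [-1, 1]:
    outputs +c or -c with c = (e^eps + 1) / (e^eps - 1), with probabilities
    chosen so that the expected output equals v (unbiased but noisy)."""
    c = (math.exp(eps) + 1.0) / (math.exp(eps) - 1.0)
    p_pos = 0.5 + v / (2.0 * c)  # probability of reporting +c
    return c if random.random() < p_pos else -c

def report(vec, eps):
    """Sample ONE of the d dimensions and perturb it with the full budget eps
    (instead of eps/d per dimension); the factor d makes averaging the
    reports across many users an unbiased estimate of each dimension's mean."""
    d = len(vec)
    j = random.randrange(d)  # uniformly sampled dimension
    return j, d * duchi_1d(vec[j], eps)
```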

SLIDE 20

Challenges of LDP in Federated Learning

Typical orders of magnitude:

  • d: 100 to 1,000,000s of dimensions
  • m: 100 to 1,000s of users per round
  • ε: a smaller privacy budget = stronger privacy

The dimension curse!

SLIDE 21

Our Intuition

The dimension curse is a common bottleneck in other settings, too:

  • Distributed learning

Data are partitioned and distributed to accelerate training; gradient vectors are transmitted among separate workers, so the communication cost scales with the bits needed to represent one real value times the number of transmitted dimensions.

  • Gradient sparsification

Reduce communication costs by only transmitting important dimensions

  • Intuition

Dimensions with larger absolute magnitudes are more important => efficient dimension reduction for LDP (a minimal Top-k sketch follows)
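As a concrete illustration of magnitude-based selection (not from the slides; the helper name is mine), a Top-k sparsifier keeps only the k dimensions with the largest absolute gradients:

```python
import numpy as np

def top_k_indices(grad, k):
    """Indices of the k entries of `grad` with the largest absolute
    magnitude -- the 'important' dimensions worth transmitting."""
    return np.argpartition(np.abs(grad), -k)[-k:]

# Example: only 2 of 6 gradient dimensions survive sparsification.
g = np.array([0.1, -2.0, 0.3, 1.5, -0.2, 0.05])
print(np.sort(top_k_indices(g, 2)))  # -> [1 3]
```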

SLIDE 22

Our Intuition

Common focus on selecting the Top-k dimensions:

  • communication resources vs. utility / learning performance
  • privacy budget vs. utility / learning performance

SLIDE 24

Two-stage Framework: FedSel

  • Top-k dimension selection is data-dependent

Local vector = Top-k information + value information

  • Two-stage framework

Private selection + value perturbation (a client-side sketch follows the diagram summary)

  • Sequential composition
  • The Top-k selection is ε₁-LDP
  • The value perturbation is ε₂-LDP
  • => The mechanism is ε-LDP, with ε = ε₁ + ε₂

[Framework diagram] User: pull the global parameters; calculate gradients with local data; select Top-k dimensions privately; perturb the selected value; update the local accumulated vector; push the noisy vector to the server. Server: average the gradients; update the global parameters.
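Not the paper's exact algorithm, just a minimal client-side sketch under sequential composition: `select` and `perturb` stand in for any ε₁-LDP dimension-selection mechanism (EXP/PE/PS below) and any ε₂-LDP single-value perturbation, and clipping values to [-1, 1] is an assumption of this sketch.

```python
import numpy as np

def fedsel_client_step(grad, acc, k, eps1, eps2, select, perturb):
    """One FedSel client update (sketch).

    grad    -- this round's local gradient
    acc     -- the local accumulated vector (carries unselected mass forward)
    select  -- an eps1-LDP Top-k dimension-selection mechanism
    perturb -- an eps2-LDP perturbation for a single clipped value

    By sequential composition, the report (j, noisy_value) is (eps1 + eps2)-LDP.
    """
    acc = acc + grad                              # update the local accumulated vector
    j = select(acc, k, eps1)                      # stage 1: private Top-k selection
    clipped = float(np.clip(acc[j], -1.0, 1.0))   # clipping range is an assumption
    noisy_value = perturb(clipped, eps2)          # stage 2: value perturbation
    acc[j] = 0.0                                  # reset the reported dimension
    return j, noisy_value, acc
```

The server would then average the received (dimension, value) reports per dimension and apply the averaged gradient to the global parameters.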


SLIDE 26

Methods: Exponential Mechanism (EXP)

  1. Sort the dimensions by absolute value; the result is a ranking of all d dimensions
  2. Sample one dimension unevenly, with probability growing exponentially in its rank (a sketch follows)

[Figure: six example dimension values, ranked 1 3 6 2 4 5]
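A minimal sketch of rank-based exponential-mechanism selection, not the paper's exact formulation: here a dimension's rank by absolute magnitude serves as its utility score, and the sensitivity of that score is left as an explicit parameter `sens`.

```python
import numpy as np

def exp_select(vec, eps, sens):
    """Exponential mechanism over dimensions (sketch): a dimension's rank
    by absolute magnitude (0 = smallest, d-1 = largest) is its utility,
    and Pr[select j] is proportional to exp(eps * rank_j / (2 * sens)),
    so larger-magnitude dimensions are exponentially more likely."""
    d = len(vec)
    ranks = np.empty(d)
    ranks[np.argsort(np.abs(vec))] = np.arange(d)  # rank of each dimension
    weights = np.exp(eps * ranks / (2.0 * sens))
    return int(np.random.choice(d, p=weights / weights.sum()))
```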


SLIDE 28

Methods: Perturbed Encoding Mechanism (PE)

  1. Sort the dimensions and encode each dimension's Top-k status as a bit (1 = Top-k, 0 = non-top)
  2. For each dimension, retain its status bit with a larger probability; flipping it has a smaller probability
  3. Sample a dimension from the set whose perturbed bit is 1 (a sketch follows)

[Figure: six example dimension values, ranked 1 3 6 2 4 5]
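A minimal sketch of the Perturbed Encoding idea, with simplified per-bit probabilities (the paper's exact budget accounting may differ): each Top-k status bit goes through randomized response, and the report is drawn from the bits that remain set.

```python
import math
import numpy as np

def pe_select(vec, k, eps):
    """Perturbed Encoding (sketch): encode Top-k status as a bit vector,
    flip each bit via randomized response, then sample uniformly from
    the dimensions whose perturbed bit is 1."""
    d = len(vec)
    status = np.zeros(d, dtype=bool)
    status[np.argpartition(np.abs(vec), -k)[-k:]] = True  # true Top-k bits
    p_keep = math.exp(eps) / (math.exp(eps) + 1.0)        # retain a bit w.p. e^eps/(e^eps+1)
    noisy = status ^ (np.random.random(d) >= p_keep)      # flip each bit w.p. 1/(e^eps+1)
    candidates = np.flatnonzero(noisy)
    if candidates.size == 0:                              # degenerate case: fall back to uniform
        return int(np.random.randint(d))
    return int(np.random.choice(candidates))
```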


SLIDE 31

Methods: Perturbed Sampling Mechanism (PS)

  1. Sort the dimensions and record each dimension's Top-k status
  2. Sample a dimension from either the Top-k dimension set, with a larger probability, or the non-top dimension set, with a smaller probability (a sketch follows)

[Figure: six example dimension values, ranked 1 3 6 2 4 5]
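A minimal sketch of the Perturbed Sampling idea; the set probability k·e^ε / (k·e^ε + d − k) used here is one natural ε-LDP choice, not necessarily the paper's exact value.

```python
import math
import numpy as np

def ps_select(vec, k, eps):
    """Perturbed Sampling (sketch): pick the Top-k set with probability
    k*e^eps / (k*e^eps + d - k), otherwise the non-top set, then draw a
    dimension uniformly inside the chosen set. Any single dimension's
    output probability changes by at most a factor of e^eps when its
    Top-k status changes."""
    d = len(vec)
    top = np.argpartition(np.abs(vec), -k)[-k:]
    non_top = np.setdiff1d(np.arange(d), top)
    p_top = k * math.exp(eps) / (k * math.exp(eps) + (d - k))
    if np.random.random() < p_top:
        return int(np.random.choice(top))      # larger probability: Top-k set
    return int(np.random.choice(non_top))      # smaller probability: non-top set
```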

SLIDE 32

Empirical results

  • Even a small budget in dimension selection helps to increase the learning accuracy
  • Private Top-k selection helps to improve the learning utility, independent of the mechanism for perturbing one dimension

SLIDE 33

Empirical results

What we gain is much larger than what we lose from private and efficient Top-k selection

SLIDE 34

Summary

Conclusion

  • We propose a two-stage framework for locally differentially private federated SGD
  • We propose three private selection mechanisms for efficient dimension reduction under LDP

Takeaway

  • Private mechanisms can be specialized for sparse vectors
  • Private Top-k dimension selection can improve learning utility under a given privacy level

Future work

  • Optimal hyper-parameter tuning
SLIDE 35

Privacy + Utility

Thanks