FedSel: Federated SGD under Local Differential Privacy with Top-k Dimension Selection


  1. FedSel: Federated SGD under Local Differential Privacy with Top-k Dimension Selection. Ruixuan Liu (1), Yang Cao (2), Masatoshi Yoshikawa (2), Hong Chen (1). (1) Renmin University of China, (2) Kyoto University. DASFAA 2020.

  2–7. Federated Learning Overview. Sensitive information: age, job, location, etc.

  8–10. Federated Learning: Privacy Vulnerabilities. Sensitive information: age, job, location, etc.

  11. Federated Learning: Privacy Vulnerabilities. Possible privacy attacks:
  • Membership inference: "Has the data of a target victim been used to train the model?"
  • Reconstruction attack: given a gender classifier, "What does a male face look like?"
  • Unintended inference attack: given a gender classifier, "What is the race of the people in Bob's photos?"

  12–14. Differential Privacy for Federated Learning. The server adds noise to the aggregated updates before releasing them; this protects individual contributions but requires a trusted server.

  15–16. Local Differential Privacy for Federated Learning. Each user adds noise to their own update before sending it, so there is no need to trust the server. LDP is a natural privacy definition for FL.

  17. Local Differential Privacy for Federated Learning. A randomized mechanism satisfies ε-LDP if its output distribution changes by at most a factor of e^ε when the user's local input changes, no matter which two inputs are compared.
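For reference, a minimal statement of the guarantee the slide illustrates, written in my own notation (the slide itself only shows an input/output diagram):

```latex
% epsilon-LDP: for a randomized mechanism M, any two local inputs v, v',
% and any possible output o, the output probabilities stay within e^eps of each other.
\forall\, v, v' \in \mathcal{D},\ \forall\, o \in \mathrm{Range}(\mathcal{M}):\quad
\Pr[\mathcal{M}(v) = o] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(v') = o]
```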

  18–19. Challenges of LDP in Federated Learning. For a d-dimensional gradient vector with a local privacy budget ε for the whole vector, splitting the budget across the d dimensions [1] leaves only ε/d per dimension; the error of the estimated mean of each dimension then grows super-linearly in d and can be excessive when d is large.
  An asymptotically optimal approach [1]:
  1. Randomly sample k of the d dimensions, which increases the per-dimension privacy budget (to roughly ε/k) and reduces the noise variance incurred.
  2. Perturb each sampled dimension with a one-dimensional LDP mechanism.
  3. Aggregate the reports and scale the result up by a factor of d/k.
  [1] Wang N., Xiao X., Yang Y., et al. Collecting and analyzing multidimensional data with local differential privacy. ICDE 2019: 638–649.
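A minimal sketch of this sample-then-scale estimator, assuming values are clipped to [-1, 1] and using a Duchi-style one-dimensional perturbation as the per-dimension mechanism; the function names and the usage numbers are illustrative, not taken from [1].

```python
import numpy as np

def duchi_perturb(x, eps, rng):
    """Unbiased Duchi-style 1-D LDP perturbation of a value x in [-1, 1]."""
    c = (np.exp(eps) + 1) / (np.exp(eps) - 1)
    p = 0.5 + x / (2 * c)            # probability of reporting +c
    return c if rng.random() < p else -c

def user_report(v, k, eps, rng):
    """Each user samples k of the d dimensions, spends eps/k on each,
    and reports (index, noisy value) pairs."""
    d = len(v)
    dims = rng.choice(d, size=k, replace=False)
    return [(int(j), duchi_perturb(v[j], eps / k, rng)) for j in dims]

def estimate_mean(reports, d, k, n):
    """Server side: sum the noisy reports and scale by d/k so that each
    dimension's mean estimate is unbiased despite the sampling."""
    acc = np.zeros(d)
    for rep in reports:
        for j, y in rep:
            acc[j] += y
    return acc * (d / k) / n

# Illustrative usage (all numbers hypothetical).
rng = np.random.default_rng(0)
n, d, k, eps = 2000, 50, 5, 1.0
data = rng.uniform(-1, 1, size=(n, d))
reports = [user_report(v, k, eps, rng) for v in data]
est = estimate_mean(reports, d, k, n)        # compare against data.mean(axis=0)
```

Sampling k dimensions instead of splitting ε over all d is what keeps the per-dimension noise from blowing up as d grows; the d/k rescaling keeps the aggregate unbiased.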

  20. Challenges of LDP in Federated Learning. Typical orders of magnitude:
  • d: 100 to 1,000,000s of dimensions
  • m: 100 to 1,000s of users per round
  • ε: a smaller privacy budget means stronger privacy
  The dimension curse!

  21. Our Intuition. The dimension curse is a common bottleneck elsewhere, too:
  • Distributed learning: data are partitioned and distributed to accelerate training, and gradient vectors are transmitted among separate workers; the communication cost is the number of bits needed to represent each real value.
  • Gradient sparsification: reduce communication costs by transmitting only the important dimensions.
  • Intuition: dimensions with larger absolute magnitudes are more important, which suggests an efficient dimension reduction for LDP.

  22–23. Our Intuition. Both settings share a common focus on selecting the Top-k dimensions: gradient sparsification trades communication resources for utility / learning performance, while LDP trades the privacy budget for utility / learning performance.

  24–25. Two-stage Framework: FedSel. Top-k dimension selection is data-dependent, so it must itself be made private. A local vector is viewed as Top-k information + value, and the local randomizer runs in two stages (a code sketch follows below).
  User side, each round:
  • Pull the current parameters from the server.
  • Calculate gradients with local data and update the locally accumulated vector.
  • Dimension Selection: privately select a Top-k dimension.
  • Value Perturbation: perturb the selected value.
  • Push the noisy sparse vector to the server.
  Server side: average the gradient information and update the global parameters.
  Sequential composition: the Top-k selection is ε₁-LDP and the value perturbation is ε₂-LDP, so the whole mechanism is ε-LDP with ε = ε₁ + ε₂.
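A schematic of one user's local step under the stated composition, assuming gradients are clipped to [-1, 1] and that a single dimension is pushed per round; private_select is a placeholder to be swapped for any of the EXP / PE / PS mechanisms on the next slides, and the residual bookkeeping is my assumption rather than the paper's exact procedure.

```python
import numpy as np

def perturb_value(x, eps, rng):
    """Duchi-style unbiased 1-D perturbation of x in [-1, 1] with budget eps."""
    c = (np.exp(eps) + 1) / (np.exp(eps) - 1)
    return c if rng.random() < 0.5 + x / (2 * c) else -c

def private_select(acc, k, eps1, rng):
    """Placeholder for the eps1-LDP dimension-selection stage: swap in any of
    the EXP / PE / PS mechanisms sketched on the following slides."""
    return int(np.argmax(np.abs(acc)))       # non-private stand-in

def fedsel_local_update(grad, residual, k, eps1, eps2, rng):
    """One user's local step: accumulate the gradient, select a dimension with
    budget eps1, perturb its value with budget eps2, and push a 1-sparse update.
    Once private_select is eps1-LDP, sequential composition makes the whole
    step (eps1 + eps2)-LDP."""
    acc = residual + grad                    # update the locally accumulated vector
    j = private_select(acc, k, eps1, rng)
    update = np.zeros_like(acc)
    update[j] = perturb_value(float(np.clip(acc[j], -1.0, 1.0)), eps2, rng)
    residual = acc.copy()
    residual[j] = 0.0                        # assumption: unsent mass carries over
    return update, residual
```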

  26–27. Methods: Exponential Mechanism (EXP)
  1. Sort the dimensions by absolute accumulated value and denote the resulting ranking by {r₁, …, r_d}.
  2. Sample one dimension unevenly, with probability growing with its rank, so larger-magnitude dimensions are exponentially more likely to be chosen. A sketch follows below.
  (The slide figure shows a toy 6-dimensional vector and its uneven sampling probabilities.)
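A minimal sketch of rank-based exponential-mechanism selection; the utility score and scaling below are a generic instantiation chosen so the step is ε₁-LDP, not necessarily the paper's exact constants.

```python
import numpy as np

def exp_select(acc, eps1, rng):
    """Exponential-mechanism dimension selection (sketch): a dimension's sampling
    probability grows exponentially with the rank of its absolute value.  Ranks
    are always a permutation of 1..d, so the normaliser is input-independent and
    the ratio of selection probabilities across any two inputs is at most e^eps1."""
    d = len(acc)
    ranks = np.empty(d)
    ranks[np.argsort(np.abs(acc))] = np.arange(1, d + 1)   # 1 = smallest, d = largest
    scores = eps1 * ranks / (d - 1)                        # assumed utility scaling
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return int(rng.choice(d, p=probs))
```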

  28–30. Methods: Perturbed Encoding Mechanism (PE)
  1. Sort the dimensions and encode the Top-k status of each one as a bit vector {s₁, …, s_d}.
  2. For each dimension, retain its status s_j with a larger probability and flip it with a smaller probability (randomized response).
  3. Sample the reported dimension from the set of dimensions whose perturbed status is 1. A sketch follows below.
  (The slide figure shows the same toy 6-dimensional example.)
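A minimal sketch of this encode-flip-sample idea; the per-bit keep probability below is one way to make the released encoding ε₁-LDP (two Top-k status vectors differ in at most 2k bits), and is an assumption rather than the paper's exact parameterization.

```python
import numpy as np

def pe_select(acc, k, eps1, rng):
    """Perturbed Encoding (sketch): mark the Top-k status of every dimension,
    flip each status bit by randomized response, then sample a dimension whose
    perturbed bit is 1.  Keeping each bit with probability
    e^{eps1/(2k)} / (1 + e^{eps1/(2k)}) bounds the probability ratio of the
    released encoding by e^eps1; the final index is post-processing."""
    d = len(acc)
    status = np.zeros(d, dtype=bool)
    status[np.argsort(np.abs(acc))[-k:]] = True            # true Top-k positions
    keep = np.exp(eps1 / (2 * k)) / (1 + np.exp(eps1 / (2 * k)))
    flipped = np.where(rng.random(d) < keep, status, ~status)
    candidates = np.flatnonzero(flipped)
    if candidates.size == 0:                                # fallback (assumption)
        return int(rng.integers(d))
    return int(rng.choice(candidates))
```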

  31. Methods: Perturbed Sampling Mechanism (PS)
  1. Sort the dimensions and mark the Top-k status of each one.
  2. Sample one dimension: from the Top-k dimension set with a larger probability, or from the non-top dimension set with a smaller probability. A sketch follows below.
  (The slide figure shows the same toy 6-dimensional example.)
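A minimal sketch of this two-set sampling; the specific boost probability p below is a derivation that makes the step ε₁-LDP for the sketch, not a value quoted from the paper.

```python
import numpy as np

def ps_select(acc, k, eps1, rng):
    """Perturbed Sampling (sketch): pick the Top-k set with a boosted
    probability p, otherwise the non-top set, then draw uniformly inside the
    chosen set.  With p = k*e^eps1 / (k*e^eps1 + d - k), any single dimension's
    selection probability changes by at most a factor e^eps1 across inputs."""
    d = len(acc)
    order = np.argsort(np.abs(acc))
    topk, nontop = order[-k:], order[:-k]
    p = k * np.exp(eps1) / (k * np.exp(eps1) + d - k)
    pool = topk if rng.random() < p else nontop
    return int(rng.choice(pool))
```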

  32. Empirical results. Even a small budget spent on dimension selection helps to increase learning accuracy: private Top-k selection improves learning utility regardless of which mechanism is used to perturb the selected value.

  33. Empirical results. What we gain from private and efficient Top-k selection is much larger than what we lose.

  34. Summary
  Conclusion:
  • We propose a two-stage framework for locally differentially private federated SGD.
  • We propose three private selection mechanisms for efficient dimension reduction under LDP.
  Takeaways:
  • Private mechanisms can be specialized for sparse vectors.
  • Private Top-k dimension selection can improve learning utility under a given privacy level.
  Future work:
  • Optimal hyper-parameter tuning.

  35. Thanks (closing figure: the utility vs. privacy trade-off axes)
