

SLIDE 1

privacy-preserving decentralized learning of personalized models and collaboration graphs

Aurélien Bellet (Inria)

Includes work with:

  • M. Tommasi, P. Vanhaesebrouck (University of Lille & Inria)
  • R. Guerraoui, M. Taziki (EPFL)
  • V. Zantedeschi (University of Saint-Etienne)

Workshop on Optimization for Machine Learning Centre International de Rencontres Mathématiques, Marseille March 10, 2020

SLIDE 2

connected devices: pervasive or invasive?

  • Connected devices are spreading rapidly and collect increasingly personal data
  • Ex: browsing logs, health, speech, accelerometer, geolocation...
  • Opportunity to provide personalized services but also a potential threat to privacy
  • A first step to try and reconcile the two: keep and process data on the user device
  • Training on the edge: train ML model on data from many devices

SLIDE 3

training on the edge: challenges

  • How to deal with imbalanced and non-i.i.d. local datasets
  • How to scale to a large number of devices
  • How to provide formal privacy guarantees
  • ...

SLIDE 4

federated vs fully decentralized training

Standard federated learning

  • Coordination by a central server
  • Single point of failure; the server may become a bottleneck

Fully decentralized learning

  • Device-to-device communication in a sparse network graph
  • Naturally scales to many devices

See [Kairouz et al., 2019] for a detailed overview of federated/decentralized ML

SLIDE 5

global model vs personalized models

Global model

  • One-size-fits-all: the same model makes predictions for all devices
  • Model should be trained on data from all users
  • A large model may be needed to capture the specificities of each user

Personalized models

  • One model per device
  • Model should be trained on data from that user and from similar users
  • Smaller models may be sufficient

SLIDE 6
our approach

We propose to learn personalized models in a fully decentralized setting:

  • Learn “who to communicate with” by inferring a graph of similarities between users
  • Collaboratively learn personalized models over this graph
  • Jointly optimize the models and the graph, in an alternating fashion

SLIDE 7

problem formulation

SLIDE 8

users and local datasets

  • A set of n users (devices) with common feature space X and label space Y
  • User i has a local dataset Si = {(x_i^j, y_i^j)}_{j=1}^{mi} drawn from a personal distribution, and wants to learn a model θi ∈ R^p which generalizes well to future local data

  • Let ℓ : Rp × X × Y → R be a loss function, differentiable in first argument
  • In isolation, user i can learn a model by minimizing a local objective Li(θ; Si), e.g.,

    Li(θ; Si) = (1/mi) ∑_{j=1}^{mi} ℓ(θ; x_i^j, y_i^j) + λi ∥θ∥², with λi ≥ 0

  • This will generalize poorly when local data is scarce → need to collaborate
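The local objective Li(θ; Si) above can be sketched in a few lines. A minimal sketch, taking logistic loss as a concrete (illustrative) choice of ℓ; the data and λ value are made up:

```python
import numpy as np

def local_objective(theta, X, y, lam):
    """L_i(theta; S_i): average loss over the local dataset plus an L2 penalty.
    Logistic loss stands in for the generic differentiable loss l."""
    margins = y * (X @ theta)
    loss = np.mean(np.log1p(np.exp(-margins)))   # (1/m_i) sum_j l(theta; x_i^j, y_i^j)
    return loss + lam * theta @ theta            # + lambda_i ||theta||^2

# A user with only m_i = 5 local examples: minimizing this in isolation
# overfits, which is what motivates collaborating with similar users.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = np.sign(X @ np.array([1.0, -1.0, 0.5]))
print(local_objective(np.zeros(3), X, y, lam=0.1))  # log(2) at theta = 0
```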

SLIDE 9

decentralized setting

  • Asynchronous time model: each user becomes active at random times, asynchronously and in parallel (we use a global counter t to denote the t-th activation)
  • Communication model: all users can exchange messages, but we want to restrict communication to pairs of most similar users
  • We model this by a collaboration graph: a sparse weighted graph with edge weight wij ≥ 0 reflecting similarity between the learning tasks of users i and j

SLIDE 10

joint optimization problem

  • Learn personalized models Θ ∈ R^{n×p} and graph weights w ∈ R^{n(n−1)/2}_{≥0} as solutions to

    min_{Θ ∈ R^{n×p}, w ∈ R^{n(n−1)/2}_{≥0}}  J(Θ, w) = ∑_{i=1}^{n} di ci Li(θi; Si) + (µ/2) ∑_{i<j} wij ∥θi − θj∥² + λ g(w)

  • ci ∈ (0, 1] ∝ mi: “confidence” of user i; di = ∑_{j≠i} wij: degree of i
  • Trade-off between accurate models on local data and smooth models over the graph
  • Term g(w): avoid trivial collaboration graph, encourage sparsity
  • Flexible relationships: hyperparameter µ ≥ 0 interpolates between learning purely local models and a shared model per connected component
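The objective J(Θ, w) can be evaluated term by term. A minimal sketch; the quadratic local losses and all constants in the usage example are illustrative, not from the talk:

```python
import numpy as np

def joint_objective(Theta, W, local_losses, c, mu, lam, g):
    """Evaluate J(Theta, w): degree/confidence-weighted data-fitting term,
    graph smoothness term, and graph regularizer g on the edge weights."""
    n = len(c)
    d = W.sum(axis=1)                                  # degrees d_i = sum_j w_ij
    fit = sum(d[i] * c[i] * local_losses[i](Theta[i]) for i in range(n))
    iu = np.triu_indices(n, k=1)                       # edges with i < j
    sq = np.sum((Theta[:, None, :] - Theta[None, :, :])**2, axis=-1)
    smooth = 0.5 * mu * np.sum(W[iu] * sq[iu])
    return fit + smooth + lam * g(W[iu])

# Two users, scalar models, quadratic local losses (purely illustrative).
Theta = np.array([[1.0], [3.0]])
W = np.array([[0.0, 2.0], [2.0, 0.0]])
J = joint_objective(Theta, W, [lambda t: t @ t] * 2, c=[1.0, 1.0],
                    mu=1.0, lam=0.0, g=lambda w: 0.0)
print(J)  # fit 2*1 + 2*9 = 20, smoothness 0.5*1*(2*4) = 4, total 24.0
```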

SLIDE 11
outline of the proposed algorithm

We design an alternating optimization procedure over Θ and w:

  • 1. A decentralized algorithm to learn the models given the graph
  • 2. A decentralized algorithm to learn a graph given the models

SLIDE 12

learning models given the graph

SLIDE 13

properties of objective function

  • For fixed graph weights w, denote f(Θ) := J(Θ, w)
  • Assume each local loss Li has L_i^loc-Lipschitz continuous gradient
  • Then ∇f is Li-Lipschitz w.r.t. block θi, with Li = di (µ + ci L_i^loc)
  • Can also assume that Li is σ_i^loc-strongly convex, with σ_i^loc > 0
  • Then f is σ-strongly convex, with σ ≥ min_{1≤i≤n} [di ci σ_i^loc] > 0

SLIDE 14

decentralized algorithm

  • Denote neighborhood of user i by Ni = {j : wij > 0}
  • Initialize models Θ(0) ∈ Rn×p
  • At step t ≥ 0, a random user i becomes active:
  • 1. User i updates its model based on its local dataset Si and the information from its neighbors:

    θi(t+1) = θi(t) − 1/(µ + ci L_i^loc) · ( ci ∇Li(θi(t); Si) + µ θi(t) − µ ∑_{j∈Ni} (wij/di) θj(t) )

  • 2. User i sends its updated model θi(t+1) to its neighborhood Ni
  • This is an instance of block coordinate descent!
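The block coordinate descent step can be simulated end to end. A minimal sketch assuming quadratic local losses Li(θ) = ½∥θ − ti∥² (so ∇Li(θ) = θ − ti and L_i^loc = 1); the targets, graph, and µ are illustrative:

```python
import numpy as np

def cd_step(i, Theta, W, grads, c, L_loc, mu):
    """One activation of user i: a block coordinate descent step on
    f(Theta) = sum_i d_i c_i L_i(theta_i) + (mu/2) sum_{i<j} w_ij ||theta_i - theta_j||^2
    with step size 1/L_i, where L_i = d_i (mu + c_i L_loc_i)."""
    d_i = W[i].sum()
    nbr_avg = (W[i] @ Theta) / d_i                       # sum_j (w_ij / d_i) theta_j
    direction = c[i] * grads[i](Theta[i]) + mu * (Theta[i] - nbr_avg)
    Theta = Theta.copy()
    Theta[i] -= direction / (mu + c[i] * L_loc[i])       # user i then broadcasts Theta[i] to N_i
    return Theta

# Three users on a complete graph, quadratic losses pulling toward targets t_i.
rng = np.random.default_rng(1)
targets = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
W = np.ones((3, 3)) - np.eye(3)
grads = [lambda th, t=t: th - t for t in targets]        # gradient of 0.5||theta - t||^2
Theta = np.zeros((3, 2))
for _ in range(300):
    Theta = cd_step(rng.integers(3), Theta, W, grads, c=[1.0]*3, L_loc=[1.0]*3, mu=0.5)
# Each model ends up between its own target (mu -> 0) and the global mean (mu -> infinity).
```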

SLIDE 15

convergence rate

Proposition ([Bellet et al., 2018]) For any T > 0, let (Θ(t))_{t=1}^T be the sequence of iterates generated by the algorithm running for T iterations from an initial point Θ(0). When the local losses Li are strongly convex, we have:

    E[f(Θ(T)) − f⋆] ≤ (1 − σ/(n Lmax))^T (f(Θ(0)) − f⋆),

where Lmax = max_i Li and σ are the smoothness and strong convexity parameters.

  • With a constant number of updates per user (i.e., T proportional to n), the optimality gap stays roughly constant in n
  • Makes the algorithm naturally scalable to many users

SLIDE 16

what about privacy?

  • In some applications, data may be sensitive and users may not want to reveal it
  • In our algorithms, users never communicate their local data, but they exchange sequences of models computed from data
  • Consider an adversary observing all the information sent over the network (but not the internal memory of users)
  • Goal: formally quantify how much information is leaked about the local dataset

SLIDE 17

differential privacy

ϵ-Differential Privacy [Dwork, 2006] Let M be a randomized mechanism taking a dataset as input, and let ϵ > 0. We say that M is ϵ-differentially private if, for all datasets S, S′ differing in a single data point and for all sets of possible outputs O ⊆ range(M), we have:

    Pr(M(S) ∈ O) ≤ e^ϵ Pr(M(S′) ∈ O).

  • Output of M almost the same regardless of whether a particular data point was used
  • Information-theoretic (no computational assumptions)
  • Robust to background knowledge that adversary may have
  • Composition property: the combined output of two ϵ-DP mechanisms run on the same dataset is 2ϵ-DP
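A standard way to achieve ϵ-DP for a numeric output is the Laplace mechanism: add noise scaled to (L1-sensitivity)/ϵ, which is also the kind of noise injected in the private algorithm later in the talk. A generic sketch; the mean query and all constants are purely illustrative:

```python
import numpy as np

def laplace_mechanism(value, sensitivity, eps, rng):
    """Release value + Laplace(sensitivity/eps) noise: eps-DP for any query
    whose L1-sensitivity over neighboring datasets is at most `sensitivity`."""
    return value + rng.laplace(0.0, sensitivity / eps, size=np.shape(value))

# Example: the mean of m points in [0, 1] changes by at most 1/m when one
# point changes, so its sensitivity is 1/m -- the noise needed for a fixed
# eps shrinks as the dataset grows.
rng = np.random.default_rng(0)
data = rng.uniform(size=1000)
release = laplace_mechanism(data.mean(), sensitivity=1 / len(data), eps=0.5, rng=rng)
```

By the composition property above, releasing T such outputs computed on the same dataset is Tϵ-DP, which is why the per-release budget matters when an iterative algorithm publishes many models.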

SLIDE 18

differentially private algorithm

  • 1. Replace the update of the algorithm by

    θ̃i(t+1) = θ̃i(t) − 1/(µ + ci L_i^loc) · ( ci ( ∇Li(θ̃i(t); Si) + ηi ) + µ θ̃i(t) − µ ∑_{j∈Ni} (wij/di) θ̃j(t) ),  where ηi ∼ Laplace(0, si)^p ∈ R^p

  • 2. User i then broadcasts the noisy iterate θ̃i(t+1) to its neighbors

SLIDE 19

privacy guarantee

Theorem ([Bellet et al., 2018]) Let i ∈ [n] and assume:

  • ℓ(·; x, y) is L0-Lipschitz w.r.t. the L1-norm for all (x, y) ∈ X × Y
  • User i wakes up Ti times and uses noise scale si = L0 / (mi ϵi)
  • The mechanism Mi(Si) releases the sequence of user i’s models

Then, for any Θ(0) independent of Si, Mi(Si) is ϵ̄i-DP with ϵ̄i = Ti ϵi.

  • Follows from sensitivity analysis of the update
  • Can be improved by strong composition [Kairouz et al., 2015] (under relaxed DP)

SLIDE 20

privacy/utility trade-off

Theorem ([Bellet et al., 2018]) For any T > 0, let (Θ̃(t))_{t=1}^T be the sequence of iterates generated by T iterations of the private algorithm. For σ-strongly convex f, we have:

    E[f(Θ̃(T)) − f⋆] ≤ (1 − σ/(n Lmax))^T (f(Θ̃(0)) − f⋆) + (1/(n Lmin)) ∑_{t=0}^{T−1} ∑_{i=1}^{n} (1 − σ/(n Lmax))^t [di ci si(t)]²,

where Lmin = min_{1≤i≤n} Li.

  • Users with less data add more noise but their contribution to the error is smaller
  • T controls a trade-off between optimization error and noise error
  • A good (differentially private) warm start can help a lot
  • See paper for details on warm start strategy and how to scale noise across iterations

SLIDE 21

extension: personalized l1-adaboost

  • Consider a set of base models H = {hk : X → R}_{k=1}^K (e.g., pre-trained on proxy data)
  • Find personalized ensembles θ1, …, θn ∈ R^K as solutions to:

    min_{∥θ1∥1 ≤ β, …, ∥θn∥1 ≤ β; w ∈ R^{n(n−1)/2}_{≥0}}  ∑_{i=1}^{n} di ci log( ∑_{j=1}^{mi} exp( −(Ai θi)_j ) ) + (µ/2) ∑_{i<j} wij ∥θi − θj∥² + λ g(w)

  • Ai ∈ R^{mi×K}: margins of the base models on each data point of user i
  • Use block coordinate Frank-Wolfe → communication cost logarithmic in K
  • More details in [Zantedeschi et al., 2020]
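The logarithmic communication cost comes from a property of Frank-Wolfe over an L1 ball: the linear minimization step returns a single signed vertex β·(±e_k), so each update touches one coordinate and can be communicated as an index. A sketch on the local (exponential-surrogate) part of the objective only, with illustrative data, β, and step sizes:

```python
import numpy as np

def ensemble_loss(theta, A):
    """log sum_j exp(-(A theta)_j), with A in R^{m x K} holding the margins
    of the K base models on each of the m local examples."""
    z = -(A @ theta)
    zmax = z.max()
    return zmax + np.log(np.sum(np.exp(z - zmax)))     # numerically stable log-sum-exp

def frank_wolfe_step(theta, A, beta, gamma):
    """One Frank-Wolfe step over {||theta||_1 <= beta}: the linear minimizer
    is a single signed vertex, hence a one-coordinate (sparse) update."""
    z = -(A @ theta)
    p = np.exp(z - z.max()); p /= p.sum()
    grad = -A.T @ p                                    # gradient of ensemble_loss
    k = np.argmax(np.abs(grad))
    vertex = np.zeros_like(theta)
    vertex[k] = -beta * np.sign(grad[k])               # minimizes <grad, v> over the ball
    return (1 - gamma) * theta + gamma * vertex

rng = np.random.default_rng(2)
A = rng.normal(size=(20, 8))                           # 20 examples, K = 8 base models
theta = np.zeros(8)
for t in range(100):
    theta = frank_wolfe_step(theta, A, beta=1.0, gamma=2 / (t + 2))
```

Transmitting only the chosen coordinate index (O(log K) bits) plus its value is what keeps per-update communication logarithmic in the number of base models.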

SLIDE 22

learning the graph given models

SLIDE 23

regularization on the graph weights

Recall the joint problem:

    min_{Θ ∈ R^{n×p}, w ∈ R^{n(n−1)/2}_{≥0}}  J(Θ, w) = ∑_{i=1}^{n} di ci Li(θi; Si) + (µ/2) ∑_{i<j} wij ∥θi − θj∥² + λ g(w)

  • Inspired by [Kalofolias, 2016], we can define

    g(w) = β∥w∥² − 1ᵀ log(d + δ)  (with δ a small constant)

  • Log barrier on the degree vector d to avoid isolated users, and L2 penalty on the weights to control the graph sparsity
  • For fixed models Θ, the resulting objective h(w) := J(Θ, w) is strongly convex

SLIDE 24

decentralized algorithm

  • We rely on decentralized peer sampling [Jelasity et al., 2007] to let users communicate with a set of κ random peers
  • Initialize weights w(0); choose parameter κ ∈ {1, …, n − 1}
  • At each step t ≥ 0, a random user i becomes active:
  • 1. Draw a set K of κ users and request their model, loss and degree
  • 2. Update the associated weights w(t+1)_{i,K} = (w(t+1)_{ij})_{j∈K} ∈ R^κ:

    w(t+1)_{i,K} ← max( 0, w(t)_{i,K} − (1/Lκ) [∇h(w(t))]_{i,K} ),  where Lκ = 2µ( (κ+1)/δ² + β ) is the block Lipschitz constant of ∇h(w)

  • 3. Send each updated weight w(t+1)_{ij} to the associated user j ∈ K
  • This is proximal block coordinate descent with an overlapping block structure
  • Can be extended to any weight/degree-separable g
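For fixed models, the partial derivative of h(w) with respect to a single weight wij has a closed form involving only quantities user i can request from peer j (its loss value and degree, plus the model distance). A sketch assuming the g(w) defined above; `weight_grad`/`weight_step` are hypothetical helper names and all constants in use are illustrative:

```python
import numpy as np

def weight_grad(i, j, W, losses, sq_dists, c, mu, lam, beta, delta):
    """d h / d w_ij for fixed models, where
    h(w) = sum_i d_i c_i L_i + (mu/2) sum_{i<j} w_ij ||theta_i - theta_j||^2
           + lam * (beta ||w||^2 - 1^T log(d + delta))."""
    d = W.sum(axis=1)                                    # degrees d_i
    return (c[i] * losses[i] + c[j] * losses[j]          # w_ij enters degrees d_i and d_j
            + 0.5 * mu * sq_dists[i, j]                  # smoothness term
            + lam * (2 * beta * W[i, j]                  # L2 penalty
                     - 1 / (d[i] + delta) - 1 / (d[j] + delta)))  # log barrier

def weight_step(i, peers, W, losses, sq_dists, c, mu, lam, beta, delta, kappa):
    """One activation of user i: projected gradient step on (w_ij)_{j in peers},
    using the block Lipschitz constant from the slide."""
    L_kappa = 2 * mu * ((kappa + 1) / delta**2 + beta)
    gs = [weight_grad(i, j, W, losses, sq_dists, c, mu, lam, beta, delta)
          for j in peers]                                # all gradients taken at w(t)
    W = W.copy()
    for j, g in zip(peers, gs):
        W[i, j] = W[j, i] = max(0.0, W[i, j] - g / L_kappa)  # projection onto w_ij >= 0
    return W
```

Each updated wij is then sent back to peer j, keeping the two copies of the symmetric weight consistent.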

SLIDE 25

convergence rate

Theorem ([Zantedeschi et al., 2020]) For any T > 0, let (w(t))_{t=1}^T be the sequence of iterates generated by the algorithm running for T iterations from an initial point w(0). We have:

    E[h(w(T)) − h⋆] ≤ ρ^T (h(w(0)) − h⋆),  where ρ = 1 − (4/(n(n−1))) · κβδ² / (κ + 1 + 2βδ²)

  • κ can be used to trade off communication cost against convergence speed
  • Communication cost per iteration is linear in κ, while its impact on ρ fades quickly (due to the worst-case dependence of Lκ on κ)
  • κ = 1 minimizes total communication cost if moderate precision is sufficient, while larger values reduce the number of rounds

SLIDE 26

numerical experiments

SLIDE 27

synthetic linear classification task

  • Results when using the “oracle” graph

[Figure: test accuracy vs. dimension p (20–100), comparing purely local models, non-private CD, and private CD at three privacy levels (1.05, 0.55, 0.15)]

SLIDE 28

synthetic linear classification task

  • We approximately recover the ground-truth cluster structure
  • Prediction accuracy is close to that of the oracle graph

[Figure: similarity matrices on the synthetic task — “Oracle” (ground-truth graph) vs. “Dada-Learned” (graph learned by our approach)]

SLIDE 29

real datasets

  • Real datasets that are naturally collected at the user/device level
  • Number of users n from 23 to 190, no task similarity available
  • Linear models and nonlinear ensembles
  • Our approach clearly outperforms both global and purely local models

Dataset    Global-lin  Local-lin  Ours-lin  Global-nonlin  Local-nonlin  Ours-nonlin
Harws      93.64       92.69      96.31     94.34          93.16         95.70
Vehicle    87.11       90.38      91.37     88.02          90.59         90.81
Computer   62.18       60.68      69.08     69.16          66.61         72.09
School     57.06       70.43      71.92     69.16          66.61         72.22

(bold blue = best, regular blue = second best)

SLIDE 30

lots to do at the intersection of optimization and privacy!

  • So far, differentially private optimization algorithms remain somewhat naive adaptations of standard optimizers
  • A lot of tricks are needed to make them work in practice (e.g., gradient clipping)
  • There is room for better coupling between DP and optimization, e.g., by designing optimization algorithms with DP constraints in mind
  • Federated and decentralized settings: how to accommodate efficient crypto primitives which can improve the privacy-utility trade-off and robustness

SLIDE 31

Thank you for your attention! Questions?

SLIDE 32

references I

[Bellet et al., 2018] Bellet, A., Guerraoui, R., Taziki, M., and Tommasi, M. (2018). Personalized and Private Peer-to-Peer Machine Learning. In AISTATS.

[Dwork, 2006] Dwork, C. (2006). Differential Privacy. In ICALP.

[Jelasity et al., 2007] Jelasity, M., Voulgaris, S., Guerraoui, R., Kermarrec, A.-M., and van Steen, M. (2007). Gossip-based peer sampling. ACM Trans. Comput. Syst., 25(3).

[Kairouz et al., 2019] Kairouz, P., McMahan, H. B., Avent, B., Bellet, A., Bennis, M., Bhagoji, A. N., Bonawitz, K., Charles, Z., Cormode, G., Cummings, R., D’Oliveira, R. G. L., Rouayheb, S. E., Evans, D., Gardner, J., Garrett, Z., Gascón, A., Ghazi, B., Gibbons, P. B., Gruteser, M., Harchaoui, Z., He, C., He, L., Huo, Z., Hutchinson, B., Hsu, J., Jaggi, M., Javidi, T., Joshi, G., Khodak, M., Konečný, J., Korolova, A., Koushanfar, F., Koyejo, S., Lepoint, T., Liu, Y., Mittal, P., Mohri, M., Nock, R., Özgür, A., Pagh, R., Raykova, M., Qi, H., Ramage, D., Raskar, R., Song, D., Song, W., Stich, S. U., Sun, Z., Suresh, A. T., Tramèr, F., Vepakomma, P., Wang, J., Xiong, L., Xu, Z., Yang, Q., Yu, F. X., Yu, H., and Zhao, S. (2019). Advances and Open Problems in Federated Learning. Technical report, arXiv:1912.04977.

SLIDE 33

references II

[Kairouz et al., 2015] Kairouz, P., Oh, S., and Viswanath, P. (2015). The Composition Theorem for Differential Privacy. In ICML.

[Kalofolias, 2016] Kalofolias, V. (2016). How to learn a graph from smooth signals. In AISTATS.

[Vanhaesebrouck et al., 2017] Vanhaesebrouck, P., Bellet, A., and Tommasi, M. (2017). Decentralized Collaborative Learning of Personalized Models over Networks. In AISTATS.

[Zantedeschi et al., 2020] Zantedeschi, V., Bellet, A., and Tommasi, M. (2020). Fully decentralized joint learning of personalized models and collaboration graphs. In AISTATS.