Differentially Private k-Means
with Constant Multiplicative Error
Uri Stemmer Ben-Gurion University
joint work with Haim Kaplan
What is k-Means Clustering?
Given: Data points $S = (x_1, \dots, x_n) \in (\mathbb{R}^d)^n$ and parameter $k$
Identify $k$ centers $C = (c_1, \dots, c_k)$ minimizing $\mathrm{cost}(C) = \sum_{i} \min_{\ell} \|x_i - c_\ell\|^2$
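A minimal sketch of the objective, for concreteness: each point pays the squared Euclidean distance to its nearest center, and the cost is the sum of these payments. The function name `kmeans_cost` and the toy data are illustrative assumptions, not part of the talk.

```python
def kmeans_cost(points, centers):
    """k-means cost: sum over points of the squared Euclidean
    distance to the nearest center."""
    total = 0.0
    for x in points:
        # Squared distance from x to each center; keep the smallest.
        total += min(
            sum((xj - cj) ** 2 for xj, cj in zip(x, c))
            for c in centers
        )
    return total


# Hypothetical toy input: two tight pairs of points in the plane.
S = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
C = [(0.0, 0.5), (4.0, 0.5)]  # one center per pair

print(kmeans_cost(S, C))  # each point is 0.5 from its center: 4 * 0.25 = 1.0
```

Minimizing this quantity over all choices of $k$ centers is the (non-private) k-means problem.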
- Probably the most well-studied clustering problem
- Tons of applications
- Super popular
What is Differentially Private k-Means?
[Dwork, McSherry, Nissim, Smith 06] (informal)
- Every data point $x_i$ represents the (private) information of one individual
- Goal: the output (the set of centers) does not reveal information that is specific to any single individual
- Requirement: the output distribution is insensitive to any arbitrary change of a single input point (an algorithm satisfying this requirement is differentially private)
Why is that a good privacy definition?
Even if an observer knows every data point but mine and then sees the outcome, she still learns (almost) nothing about my data point
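For reference, the informal requirement is usually formalized as follows. The privacy parameters $\varepsilon, \delta$ and the names $\mathcal{A}$, $S'$, $T$ do not appear on the slides; this is the standard textbook statement rather than a quote from the talk. A randomized algorithm $\mathcal{A}$ is $(\varepsilon, \delta)$-differentially private if, for every two inputs $S, S'$ that differ in a single point and every set $T$ of possible outputs:

```latex
\Pr[\mathcal{A}(S) \in T] \;\le\; e^{\varepsilon} \cdot \Pr[\mathcal{A}(S') \in T] + \delta
```

For small $\varepsilon$ and $\delta$, the two output distributions are nearly indistinguishable, which is the sense in which the output is "insensitive" to one individual's data.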
Requirement: the output distribution is insensitive to any arbitrary change of a single input point
Observe: With privacy we must have additive error, which scales with the diameter $\Lambda$ of the input domain
⟹ We assume that input points come from the unit ball
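An informal illustration (not a proof) of why some additive error is unavoidable and why it grows with the diameter. Take $n$ points at the origin and a neighboring input in which one point is moved to distance $\Lambda$. With $k = 2$ the neighboring input has optimal cost 0, but a private algorithm must behave almost the same on both inputs, so it cannot reliably spend a center on the single moved point. Centers that ignore that point pay about $\Lambda^2$, which no multiplicative factor times an optimum of 0 can cover. The helper names `kmeans_cost` and `neighbor_gap` are hypothetical.

```python
def kmeans_cost(points, centers):
    """Sum over points of the squared distance to the nearest center."""
    return sum(
        min(sum((xj - cj) ** 2 for xj, cj in zip(x, c)) for c in centers)
        for x in points
    )


def neighbor_gap(n, diameter):
    """Return (optimal cost, cost of centers that ignore the moved point)
    on the input with n - 1 points at the origin and one point at
    distance `diameter`, for k = 2."""
    origin = (0.0, 0.0)
    far = (diameter, 0.0)
    moved = [origin] * (n - 1) + [far]
    opt_cost = kmeans_cost(moved, [origin, far])        # one center per location
    blind_cost = kmeans_cost(moved, [origin, origin])   # both centers at the origin
    return opt_cost, blind_cost


for lam in (1.0, 2.0, 10.0):
    print(lam, neighbor_gap(100, lam))
```

The gap between the two costs grows like the square of the diameter, so bounding the domain (for example, by the unit ball) is what makes a meaningful additive error bound possible.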
Ref | Model | Runtime | Bounds
GLMRT'10 | differential privacy | … | … · OPT + O(… · …)
NCBN'16 | differential privacy | poly | (… log …) · OPT + O(…)
FXZR'17 | differential privacy | poly | O(… log …) · OPT + O(… · …)
BDLMZ'17 | differential privacy | poly | O(log …) · OPT + O(… + …)
NS'18 | differential privacy | poly | O(…) · OPT + O(… · …)
New | differential privacy | poly | O(1) · OPT + O(… · … + …)