Differentially Private k-Means with Constant Multiplicative Error, Uri Stemmer, Ben-Gurion University

SLIDE 1

Differentially Private k-Means

with Constant Multiplicative Error

Uri Stemmer Ben-Gurion University

joint work with Haim Kaplan

SLIDE 3

What is k-Means Clustering?

Given: Data points $S = (x_1, \dots, x_n) \in (\mathbb{R}^d)^n$ and parameter $k$
Identify $k$ centers $C = (u_1, \dots, u_k)$ minimizing
$\mathrm{cost}(C) = \sum_i \min_\ell \|x_i - u_\ell\|^2$

✓ Probably the most well-studied clustering problem
✓ Tons of applications
✓ Super popular
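The cost function on this slide (for each data point, the squared distance to its nearest center, summed over all points) can be sketched in a few lines of Python. This is only an illustration: the name `kmeans_cost` and the toy data are assumptions made here, not part of the talk.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def kmeans_cost(points, centers):
    """Sum over all points of the squared distance to the nearest center."""
    return sum(min(dist(x, u) ** 2 for u in centers) for x in points)


# Hypothetical toy data: two tight pairs of points in the plane.
S = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
C = [(0.0, 0.5), (4.0, 0.5)]  # one center in the middle of each pair

# Each of the 4 points is at distance 0.5 from its center, so this prints 1.0.
print(kmeans_cost(S, C))
```

Placing a center on every data point (possible whenever k is at least n) drives the cost to 0, which is the situation used later in the additive-error argument.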

SLIDE 6

What is k-Means Clustering?

Given: Data points $S = (x_1, \dots, x_n) \in (\mathbb{R}^d)^n$ and parameter $k$
Identify $k$ centers $C = (u_1, \dots, u_k)$ minimizing
$\mathrm{cost}(C) = \sum_i \min_\ell \|x_i - u_\ell\|^2$

What is Differentially Private k-Means?

[Dwork, McSherry, Nissim, Smith 06] (informal)

✓ Every data point $x_i$ represents the (private) information of one individual
✓ Goal: the output (the set of centers) does not reveal information that is specific to any single individual
✓ Requirement: the output distribution is insensitive to any arbitrary change of a single input point (an algorithm satisfying this requirement is differentially private)

Why is that a good privacy definition?

Even if an observer knows all other data points but mine, and now she sees the outcome of the computation, then she still cannot learn “anything” about my data point
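
For reference, the informal requirement on this slide is usually made precise as follows. This is the standard textbook formulation rather than text from the talk; the symbols $\mathcal{A}$ (the algorithm), $S'$ (a neighboring input), $T$ (a set of possible outputs) and $\varepsilon$ (the privacy parameter) are notation introduced here.

```latex
% A randomized algorithm \mathcal{A} is \varepsilon-differentially private if,
% for every two inputs S, S' that differ in a single point and every set T of outputs,
\[
  \Pr[\mathcal{A}(S) \in T] \;\le\; e^{\varepsilon} \cdot \Pr[\mathcal{A}(S') \in T].
\]
```

A commonly used relaxation adds a small slack term $\delta$ to the right-hand side.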

SLIDE 11

Differentially Private k-Means Clustering

Given: Data points $S = (x_1, \dots, x_n) \in (\mathbb{R}^d)^n$ and parameter $k$
Identify $k$ centers $C = (u_1, \dots, u_k)$ minimizing
$\mathrm{cost}(C) = \sum_i \min_\ell \|x_i - u_\ell\|^2$

Requirement: the output distribution is insensitive to any arbitrary change of a single input point

Observe: With privacy we must have additive error

[Figure: point configuration, with a distance marked $\Lambda$]

  • Assume $k = n = 3$
  • OPT's cost = 0
  • Move one point
  • OPT's cost = 0
  • Each solution must remain approx. equally likely
  • On at least one of these inputs our cost is $\approx \Lambda^2$

⟹ We assume that input points come from the unit ball
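
The additive-error argument above can be checked numerically on a small instance. The sketch below is an assumption-laden illustration, not the construction from the slide (whose figure did not survive extraction): it places three points on a line at spacing `LAM`, moves one of them by `LAM`, and searches a grid for a single set of three centers that is good on both inputs. It treats the "approximately equally likely" requirement in a simplified, deterministic way by asking for one fixed solution.

```python
from itertools import combinations_with_replacement


def kmeans_cost(points, centers):
    """k-means cost on the real line: squared distance to the nearest center."""
    return sum(min((x - u) ** 2 for u in centers) for x in points)


LAM = 1.0  # hypothetical spacing, playing the role of Lambda

S = [0.0, LAM, 2 * LAM]        # k = n = 3, so OPT puts a center on each point
S_moved = [0.0, LAM, 3 * LAM]  # neighboring input: the last point has moved

# OPT costs 0 on both inputs.
assert kmeans_cost(S, S) == 0.0 and kmeans_cost(S_moved, S_moved) == 0.0

# Search for one fixed set of 3 centers that does well on both inputs at once.
grid = [i * LAM / 4 for i in range(13)]  # candidates 0, LAM/4, ..., 3*LAM
worst_case = min(
    max(kmeans_cost(S, C), kmeans_cost(S_moved, C))
    for C in combinations_with_replacement(grid, 3)
)

# Best achievable worst-case cost, as a multiple of LAM squared: prints 0.25.
print(worst_case / LAM**2)
```

On this instance no fixed solution gets below a constant fraction of `LAM ** 2` on both inputs, even though the optimum is 0 on each. Since the loss grows with the square of the scale, bounding the domain (the unit-ball assumption) is what keeps the additive error bounded.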

SLIDE 12

Previous and New Bounds

Ref        Model                  Runtime  Bounds
GLMRT'10   differential privacy   $n^d$    $O(1) \cdot \mathrm{OPT} + O(k^2 \cdot d)$
NCBN'16    differential privacy   poly     $O(\log k) \cdot \mathrm{OPT} + O(n)$
FXZR'17    differential privacy   poly     $O(k \log n) \cdot \mathrm{OPT} + O(k^{3/2} \cdot d)$
BDLMZ'17   differential privacy   poly     $O(\log^3 n) \cdot \mathrm{OPT} + O(k^2 + d)$
NS'18      differential privacy   poly     $O(k) \cdot \mathrm{OPT} + O(k^{1.51} \cdot d^{0.51})$
New        differential privacy   poly     $O(1) \cdot \mathrm{OPT} + O(k^{1.01} \cdot d^{0.51} + k^{3/2})$