SLIDE 1

Locally private learning without interaction requires separation

Vitaly Feldman (Research)

with Amit Daniely (Hebrew University)

SLIDE 2

Local Differential Privacy (LDP)

[KLNRS '08]: A protocol is ε-LDP if for every user i, message j is sent using a local ε_{i,j}-DP randomizer A_{i,j}, and Σ_j ε_{i,j} ≤ ε.

[Diagram: users holding data z_1, z_2, z_3, …, z_n each send randomized messages to a server.]
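The canonical example of a local ε-DP randomizer is binary randomized response. A minimal sketch, assuming one-bit user data (the function name is illustrative, not from the slides):

```python
import math
import random

def randomized_response(bit: int, epsilon: float) -> int:
    # Report the true bit with probability e^eps / (e^eps + 1), else flip it.
    # The two output distributions differ by a factor of at most e^eps,
    # which is exactly the epsilon-DP guarantee for a single message.
    p_true = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if random.random() < p_true else 1 - bit

# Each user randomizes locally; the server only ever sees the noisy bits.
epsilon = 1.0
user_bits = [1, 0, 1, 1, 0]
messages = [randomized_response(b, epsilon) for b in user_bits]
```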

SLIDE 3

Non-interactive LDP

[Diagram: the same users and server, but each user sends a single randomized message, with no further rounds of interaction.]

SLIDE 4

PAC learning

PAC model [Valiant '84]: Let C be a set of binary classifiers over X. An algorithm A is a PAC learning algorithm for C if for every f ∈ C and every distribution D over X, given i.i.d. examples (x_i, f(x_i)) with x_i ∼ D, A outputs g such that w.h.p. Pr_{x∼D}[g(x) ≠ f(x)] ≤ α.

Distribution-specific learning: D is fixed and known to A.

SLIDE 5

Statistical query model [Kearns β€˜93]

An SQ algorithm asks queries ϕ_1, …, ϕ_q, where each ϕ_i: Z → [0,1] and Z = X × {±1}. The SQ oracle for P, the distribution of (x, f(x)) with x ∼ D, answers each query with any value v_i satisfying |v_i − E_{z∼P}[ϕ_i(z)]| ≤ τ. Here τ is the tolerance of the query; τ = 1/√n corresponds to n samples.

[KLNRS '08] Simulation with success prob. 1 − β (for ε ≤ 1):

  • ε-LDP with m messages ⇒ O(m) queries with tolerance τ = Ω(β/√m)
  • q queries with tolerance τ ⇒ ε-LDP with n = O(q log(q/β) / (τε)²) samples/messages

Non-interactive if and only if the queries are non-adaptive.

[Diagram: the SQ algorithm sends queries ϕ_1, ϕ_2, …, ϕ_q to the SQ oracle for P and receives answers v_1, v_2, …, v_q.]
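To make the oracle concrete: any answer within τ of the true expectation is legal, so an empirical mean over a sample, perturbed by at most τ, is one valid implementation. A hypothetical sketch (the names and the uniform perturbation are illustrative, not from the slides):

```python
import random

def sq_oracle(sample, phi, tau):
    # Answer a statistical query phi: Z -> [0,1], where Z = X x {-1,+1}.
    # Any value within tau of E[phi] is a legal answer; here we return the
    # empirical mean over an i.i.d. sample, perturbed by at most tau.
    empirical = sum(phi(z) for z in sample) / len(sample)
    return empirical + random.uniform(-tau, tau)

# Non-adaptive use: all queries are fixed before any answer is received.
sample = [((0.2, 1.0), 1), ((0.7, -0.3), -1), ((0.5, 0.1), 1)]
queries = [lambda z: (z[1] + 1) / 2,     # mean of the label, mapped to [0,1]
           lambda z: abs(z[0][0])]       # a label-independent statistic
answers = [sq_oracle(sample, phi, tau=0.1) for phi in queries]
```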

SLIDE 6

Known results

C is SQ-learnable efficiently (non-adaptively) if and only if it is learnable efficiently with ε-LDP (non-interactively) [KLNRS '08].

[KLNRS '08]: There exists C (masked parity) that
  1. is SQ/LDP-learnable efficiently over the uniform distribution on {0,1}^d, but
  2. requires an exponential number of samples to learn non-interactively by an LDP algorithm.

Examples (efficient SQ learnability):

  • Yes: halfspaces/linear classifiers [Dunagan, Vempala '04]
  • No: parity functions [Kearns '93]
  • Yes, non-adaptively: Boolean conjunctions

Does the separation hold for distribution-independent learning?

SLIDE 7

Margin Complexity

Margin complexity of C over X, MC(C): the smallest M such that there exists an embedding Ψ: X → B_d(1) (the d-dimensional unit ball) under which every f ∈ C is linearly separable with margin γ ≥ 1/M.

[Figure: positive examples {Ψ(x) : f(x) = +1} and negative examples {Ψ(x) : f(x) = −1} separated by a hyperplane with a margin.]
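Written as a single formula (a standard way to make "linearly separable with margin γ" precise; the quantifier over the unit-ball normal w is my formalization, not verbatim from the slide):

```latex
\mathrm{MC}(C) \;=\; \min\Bigl\{\, M \;:\;
  \exists\, d \in \mathbb{N},\ \exists\, \Psi : X \to \mathbb{B}_d(1)\ \
  \forall f \in C\ \ \exists\, w \in \mathbb{B}_d(1)\ \
  \forall x \in X :\ \ f(x)\,\langle w, \Psi(x)\rangle \;\ge\; \tfrac{1}{M}
\,\Bigr\}
```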

SLIDE 8

Lower bound

Thm: Let C be a negation-closed set of classifiers. Any non-interactive 1-LDP algorithm that learns C with error α < 1/2 and success probability Ω(1) needs n = Ω(MC(C)^{2/3}).

Corollaries:

  • Decision lists over {0,1}^d: n = 2^{Ω(d^{1/3})} [Buhrman, Vereshchagin, de Wolf '07]; (interactively) learnable with n = poly(d/(αε)) [Kearns '93]
  • Linear classifiers over {0,1}^d: n = 2^{Ω(d)} [Goldmann, Håstad, Razborov '92; Sherstov '07]; (interactively) learnable with n = poly(d/(αε)) [Dunagan, Vempala '04]

SLIDE 9

Upper bound

Thm: For any C and distribution D there exists a non-adaptive ε-LDP algorithm that learns C over D with error α and success probability 1 − β using n = poly(MC(C) · log(1/β) / (αε)).

Instead of a fixed D, it suffices to have

  • access to public unlabeled samples from D, or
  • (interactive) LDP access to unlabeled samples from D.

The lower bound holds against this hybrid model.

SLIDE 10

Lower bound technique

Thm: Let C be a negation-closed set of classifiers. If there exists a non-adaptive SQ algorithm that uses q queries of tolerance 1/q to learn C with error α < 1/2 and success probability Ω(1), then MC(C) = O(q^{3/2}).

Correlation dimension of C, CSQdim(C) [F. '08]: the smallest d for which there exist d functions g_1, …, g_d: X → [−1, 1] such that for every f ∈ C and every distribution D there exists i with |E_{x∼D}[f(x) g_i(x)]| ≥ 1/d.

Thm [F. '08; Kallweit, Simon '11]: MC(C) ≤ CSQdim(C)^{3/2}.

SLIDE 11

Proof

Let ϕ_1, …, ϕ_q: X × {±1} → [0,1] be the (non-adaptive) queries of A. Decompose

  ϕ(x, y) = (ϕ(x, 1) + ϕ(x, −1))/2 + ((ϕ(x, 1) − ϕ(x, −1))/2) · y,

so that E_{x∼D}[ϕ_i(x, f(x))] = E_{x∼D}[ĝ_i(x)] + E_{x∼D}[f(x) g_i(x)], where ĝ_i(x) = (ϕ_i(x, 1) + ϕ_i(x, −1))/2 and g_i(x) = (ϕ_i(x, 1) − ϕ_i(x, −1))/2.

If |E_{x∼D}[f(x) g_i(x)]| ≤ 1/q, then E_{x∼D}[ϕ_i(x, f(x))] ≈ E_{x∼D}[ϕ_i(x, −f(x))]. If this holds for all i ∈ [q], the algorithm cannot distinguish between f and −f, and hence cannot achieve error < 1/2.

Conclusion: if there exists a non-adaptive SQ algorithm A that uses q queries of tolerance 1/q to learn C with error α < 1/2, then CSQdim(C) ≤ q.
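The decomposition is an exact identity for y ∈ {±1}; a quick numerical sanity check (the particular query ϕ is arbitrary):

```python
import random

def decompose(phi):
    # Split phi(x, y) into a label-independent part and a correlation part:
    # phi(x, y) = g_hat(x) + g(x) * y for y in {-1, +1}.
    g_hat = lambda x: (phi(x, 1) + phi(x, -1)) / 2
    g = lambda x: (phi(x, 1) - phi(x, -1)) / 2
    return g_hat, g

phi = lambda x, y: 0.5 + 0.3 * x * y   # an arbitrary query into [0.2, 0.8]
g_hat, g = decompose(phi)
for _ in range(1000):
    x, y = random.uniform(-1, 1), random.choice([-1, 1])
    assert abs(phi(x, y) - (g_hat(x) + g(x) * y)) < 1e-12
```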

SLIDE 12

Upper bound

Thm: For any C and distribution D there exists a non-adaptive ε-LDP algorithm that learns C over D with error α < 1/2 and success probability 1 − β using n = poly(MC(C) · log(1/β) / (αε)).

Recall: MC(C) is the smallest M such that there exists an embedding Ψ: X → B_d(1) under which every f ∈ C is linearly separable with margin γ ≥ 1/M.

Thm [Arriaga, Vempala '99; Ben-David, Eiron, Simon '02]: For every f ∈ C, a random projection into B_d(1) with d = O(MC(C)² log(1/β)) ensures that with prob. 1 − β, a 1 − β fraction of the points is linearly separable with margin γ ≥ 1/(2 MC(C)).

SLIDE 13

Algorithm

Perceptron: if sign(⟨w_t, x⟩) ≠ y, then update w_{t+1} ← w_t + y x.

Expected update:

  E_{(x,y)∼P}[y x | sign(⟨w_t, x⟩) ≠ y]
    = E_{(x,y)∼P}[y x · 𝟙(sign(⟨w_t, x⟩) ≠ y)] / Pr_{(x,y)∼P}[sign(⟨w_t, x⟩) ≠ y]
    = E_{(x,y)∼P}[x (y − sign(⟨w_t, x⟩))] / (2 Pr_{(x,y)∼P}[sign(⟨w_t, x⟩) ≠ y])
    = (E_{(x,y)∼P}[x y] − E_{(x,y)∼P}[x sign(⟨w_t, x⟩)]) / (2 Pr_{(x,y)∼P}[sign(⟨w_t, x⟩) ≠ y])

The denominator is a scalar ≥ α (it only rescales the update); the second expectation is independent of the label; and the first is a single fixed query, so all label-dependent measurements are non-adaptive. It remains to estimate each mean vector with small ℓ2 error:

  • LDP [Duchi, Jordan, Wainwright '13]
  • SQs [F., Guzman, Vempala '15]
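A sketch of the resulting algorithm, with a simple Gaussian-noise oracle standing in for the actual LDP/SQ mean-estimation procedures of [DJW '13] and [FGV '15] (the noise model, step size, and all names are my simplifications, not the paper's construction):

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_mean(vecs, sigma):
    # Stand-in for private/SQ mean estimation: empirical mean plus
    # Gaussian noise of expected l2 norm roughly sigma.
    m = vecs.mean(axis=0)
    return m + rng.normal(scale=sigma / np.sqrt(len(m)), size=m.shape)

def measurement_perceptron(X, y, X_pub, steps=100, sigma=0.01):
    # Perceptron driven by the expected update, which is proportional to
    # E[x*y] - E[x*sign(<w, x>)].  The label-dependent term E[x*y] is
    # measured privately once (hence non-adaptively); the label-independent
    # term is recomputed each step from public unlabeled samples X_pub.
    exy = noisy_mean(X * y[:, None], sigma)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        exs = (X_pub * np.sign(X_pub @ w)[:, None]).mean(axis=0)
        w = w + (exy - exs) / 2
    return w

# Toy usage on linearly separable Gaussian data.
d, n = 20, 2000
w_star = rng.normal(size=d); w_star /= np.linalg.norm(w_star)
X = rng.normal(size=(n, d)); y = np.sign(X @ w_star)
X_pub = rng.normal(size=(n, d))
w_hat = measurement_perceptron(X, y, X_pub)
train_acc = np.mean(np.sign(X @ w_hat) == y)
```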

SLIDE 14

Conclusions

  • New approach to lower bounds for non-interactive LDP: reduction to margin-complexity lower bounds
  • Lower bounds for classical learning problems
  • Same results for communication-constrained protocols (also equivalent to SQ)
  • Interaction is necessary for learning
  • Open:
      • Distribution-independent learning in poly(MC(C))
      • Lower bounds against 2+-round protocols
      • Stochastic convex optimization

https://arxiv.org/abs/1809.09165