Private Equilibrium Computation for Analyst Privacy

slide-1
SLIDE 1

Private Equilibrium Computation for Analyst Privacy

Justin Hsu¹, Aaron Roth¹, Jonathan Ullman²

¹ University of Pennsylvania, ² Harvard University

June 2, 2013

slide-7
SLIDE 7

A market survey scenario

Requirements

  • Data privacy: protect the consumer’s privacy
  • Analyst privacy [DNV’12]: protect the analyst’s privacy
slide-8
SLIDE 8

(Standard) Differential privacy [DMNS’06]

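[Figure: a database D with records Alice, Bob, Chris, Donna, Ernie, and Xavier is passed to an algorithm; the ratio of the output probabilities Pr[r] is bounded. Credit: Dwork-McSherry-Nissim-Smith 06.]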

slide-9
SLIDE 9

More formally

Definition (DMNS’06)

Let $M$ be a randomized mechanism from databases to range $R$, and let $D, D'$ be databases differing in one record. $M$ is $\epsilon$-differentially private if for every such pair $D, D'$ and every $r \in R$, $$\Pr[M(D) = r] \le e^{\epsilon} \cdot \Pr[M(D') = r].$$

Useful properties

  • Very strong, worst-case privacy guarantee
  • Well-behaved under composition, post-processing
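
To make the definition concrete, here is a minimal sketch that checks the inequality for randomized response on a database holding a single bit. Randomized response is a standard textbook mechanism chosen purely for illustration; it does not appear in the talk, and the function names are hypothetical.

```python
import math

def randomized_response_probs(bit, epsilon):
    """Output distribution of randomized response on a one-bit record.

    The true bit is reported with probability e^eps / (1 + e^eps),
    and the flipped bit otherwise.
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return {bit: p_truth, 1 - bit: 1.0 - p_truth}

def max_privacy_ratio(epsilon):
    """Largest Pr[M(D) = r] / Pr[M(D') = r] over outputs r and neighbors D, D'."""
    worst = 0.0
    for d in (0, 1):
        p = randomized_response_probs(d, epsilon)
        p_neighbor = randomized_response_probs(1 - d, epsilon)
        for r in (0, 1):
            worst = max(worst, p[r] / p_neighbor[r])
    return worst

epsilon = 0.5
ratio = max_privacy_ratio(epsilon)
# The worst-case ratio meets the bound e^eps from the definition.
print(ratio, math.exp(epsilon))
```

For this mechanism the worst-case ratio equals e^ε exactly, so the bound in the definition is tight.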
slide-10
SLIDE 10

Many-to-one-analyst privacy [DNV’12]

Intuition

  • A single analyst can’t tell if other analysts change their queries
slide-15
SLIDE 15

One-query-to-many-analyst privacy (Today)

Intuition

  • All but one analyst (possibly colluding) can’t tell if the last analyst changes one of their queries

slide-24
SLIDE 24

The query release problem

Basic problem

  • Analysts want accurate answers to a large set Q of counting (linear) queries, e.g. “What fraction of records satisfy P?”

  • Privately construct synthetic database to answer queries

Prior work

  • Long line of work [BLR’08, RR’09, HR’10, ...]: data private
  • Stateful mechanisms: not analyst private
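
As a small illustration of what a counting query computes, the sketch below evaluates two predicates on a toy database. The records, predicate names, and the `counting_query` helper are all made up for this example.

```python
def counting_query(predicate, database):
    """Fraction of records in the database that satisfy the predicate."""
    return sum(1 for record in database if predicate(record)) / len(database)

# Toy database: each record is a pair of binary attributes.
database = [(1, 0), (1, 1), (0, 1), (1, 1)]

queries = {
    "first_attribute_set": lambda rec: rec[0] == 1,
    "both_attributes_set": lambda rec: rec == (1, 1),
}

answers = {name: counting_query(pred, database) for name, pred in queries.items()}
print(answers)
```

Every answer lies in [0, 1], which is why an error α much smaller than 1 is the meaningful target.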
slide-25
SLIDE 25

Accuracy

Theorem

Suppose the analysts ask queries $Q$, and let the database have $n$ records from $X$. There exists an $\epsilon$ analyst and data private mechanism which achieves error $\alpha$ on all queries in $Q$, where

$$\alpha = O\left(\frac{\operatorname{polylog}(|X|, |Q|)}{\epsilon \sqrt{n}}\right).$$
slide-26
SLIDE 26

Plan for rest of the talk

Outline

  • Interpretation of query release as a game
  • Privately solving the query release game
  • Analyst private query release
slide-31
SLIDE 31

The query release game

Data player plays: record r

Query player plays: query q

Data player’s loss: q(r) − q(D)

Query player’s loss: −(q(r) − q(D))

(D is true database)
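
The game can be written down as a loss matrix with one row per query and one column per record. The sketch below builds that matrix for a toy instance; the universe, database, and predicates are hypothetical choices for illustration.

```python
def true_answer(predicate, database):
    """q(D): fraction of database records satisfying the predicate."""
    return sum(1 for x in database if predicate(x)) / len(database)

def data_player_loss(predicate, record, database):
    """Data player's loss q(r) - q(D); the query player's loss is its negation."""
    return float(predicate(record)) - true_answer(predicate, database)

universe = [0, 1, 2, 3]   # all possible records
database = [0, 0, 1, 3]   # the true database D

predicates = [
    lambda x: x % 2 == 0,  # "is the record even?"
    lambda x: x >= 2,      # "is the record at least 2?"
]

# loss_matrix[i][j] is the data player's loss for query i versus record j.
loss_matrix = [
    [data_player_loss(pred, r, database) for r in universe]
    for pred in predicates
]
for row in loss_matrix:
    print(row)
```

Because the game is zero-sum, a single matrix describes both players: the data player wants small entries and the query player wants large ones.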

slide-35
SLIDE 35

From strategies to query release

Database as a distribution

  • Think of true database D as a distribution over records
  • $\hat{D}$ is data player’s distribution over records (mixed strategy)
  • Versus a counting query q, data player’s expected loss:

$$\mathbb{E}_{r \sim \hat{D}}[q(r) - q(D)] = q(\hat{D}) - q(D)$$

  • D is mixed strategy with zero loss (equilibrium strategy)
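
The expected-loss identity follows from linearity of expectation. A short derivation, assuming $q(\hat{D})$ denotes the average of $q$ under $\hat{D}$, that is $q(\hat{D}) = \sum_r \hat{D}(r)\, q(r)$ (the slide does not spell this out):

```latex
\begin{align*}
\mathbb{E}_{r \sim \hat{D}}[q(r) - q(D)]
  &= \sum_{r} \hat{D}(r)\, q(r) \;-\; q(D) \sum_{r} \hat{D}(r) \\
  &= q(\hat{D}) - q(D).
\end{align*}
```

Taking $\hat{D} = D$ gives loss $q(D) - q(D) = 0$ against every query, which is why the true database is itself a zero-loss strategy.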

slide-41
SLIDE 41

From strategies to query release

What if small expected loss?

  • Suppose data player’s expected loss is less than α for all queries (an α-approximate equilibrium)
  • Data distribution (synthetic database) answers all queries with error at most α: query release!

  • But how to compute this?
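
One step is implicit here: a bound on the loss is one-sided, while error is two-sided. A sketch of how the gap closes, under the assumption (not stated on the slides) that the query set contains the negation $\bar{q}(r) = 1 - q(r)$ of every query $q$:

```latex
\begin{align*}
q(\hat{D}) - q(D) &\le \alpha, \\
\bar{q}(\hat{D}) - \bar{q}(D) = -\bigl(q(\hat{D}) - q(D)\bigr) &\le \alpha
\quad\Longrightarrow\quad
\bigl|\, q(\hat{D}) - q(D) \,\bigr| \le \alpha .
\end{align*}
```

The middle equality uses $\bar{q}(\hat{D}) = 1 - q(\hat{D})$ and $\bar{q}(D) = 1 - q(D)$, which hold because both are averages over distributions.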
slide-46
SLIDE 46

Computing the equilibrium privately

Known approach: repeated game

  • Players maintain distributions over actions
  • Loop:
      • Sample and play action
      • Receive loss for all actions
      • Update distribution with multiplicative weights (MW): increase probability of better actions
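
The loop above can be sketched for a single player as follows. The learning rate `eta`, the fixed loss vector, and the function name are assumptions made for this illustration; the slides do not give them.

```python
import math
import random

def mw_update(dist, losses, eta):
    """One multiplicative weights step: scale each probability by
    exp(-eta * loss), then renormalize so the result sums to 1."""
    weights = [p * math.exp(-eta * loss) for p, loss in zip(dist, losses)]
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(0)
losses = [1.0, 0.0, 0.5]          # toy example: action 1 is the best action
dist = [1.0 / 3] * 3              # start from the uniform distribution

for _ in range(20):
    # Sample and play an action from the current distribution.
    action = rng.choices(range(3), weights=dist)[0]
    # Receive the loss of every action (full information), then update.
    dist = mw_update(dist, losses, eta=0.5)

print(dist)
```

With a fixed loss vector the distribution concentrates on the lowest-loss action; in the game the loss vector changes each round with the opponent's play.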

slide-49
SLIDE 49

Computing equilibrium strategy privately

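[Figure: the data player plays record r and the query player plays query q; each player updates its distribution with multiplicative weights (MW).]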

slide-57
SLIDE 57

Computing equilibrium strategy privately

Idea: use distribution over plays [FS’96]

  • Both players use multiplicative weights
  • MW distributions converge to approximate equilibrium (not private)
  • Empirical distributions (distributions of actual plays) also converge to approximate equilibrium

  • Samples from MW distribution: private?
  • Depends on losses: what if we change database or query?
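
The idea of both players running MW and keeping only the sampled plays can be sketched as below. This is a plain, non-private simulation on a toy game; the learning rate, round count, seed, and all names are assumptions for illustration, and nothing here enforces a privacy guarantee.

```python
import math
import random

def simulate_mw_game(loss, rounds, eta, seed=0):
    """Both players run MW; returns the empirical distribution of the
    data player's sampled records. loss[q][r] is the data player's loss."""
    rng = random.Random(seed)
    n_queries, n_records = len(loss), len(loss[0])
    data_dist = [1.0 / n_records] * n_records
    query_dist = [1.0 / n_queries] * n_queries
    play_counts = [0] * n_records
    for _ in range(rounds):
        r = rng.choices(range(n_records), weights=data_dist)[0]
        q = rng.choices(range(n_queries), weights=query_dist)[0]
        play_counts[r] += 1
        # Data player shifts weight away from records with high loss versus q.
        data_w = [p * math.exp(-eta * loss[q][j]) for j, p in enumerate(data_dist)]
        # Query player shifts weight toward queries that hurt the played record r.
        query_w = [p * math.exp(eta * loss[i][r]) for i, p in enumerate(query_dist)]
        data_total, query_total = sum(data_w), sum(query_w)
        data_dist = [w / data_total for w in data_w]
        query_dist = [w / query_total for w in query_w]
    return [c / rounds for c in play_counts]

universe = [0, 1, 2, 3]
database = [0, 0, 1, 3]
# Toy query set, closed under negation.
predicates = [lambda x: x % 2 == 0, lambda x: x % 2 == 1,
              lambda x: x >= 2, lambda x: x < 2]

def true_answer(pred):
    return sum(1 for x in database if pred(x)) / len(database)

loss = [[float(pred(r)) - true_answer(pred) for r in universe] for pred in predicates]
empirical = simulate_mw_game(loss, rounds=5000, eta=0.05)
# Worst expected loss of the empirical distribution over all queries.
worst_loss = max(sum(p * row[j] for j, p in enumerate(empirical)) for row in loss)
print(empirical, worst_loss)
```

Only the sampled records enter the output, which is the property the next slides examine: each sample depends on the losses, and the losses depend on both the database and the queries.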
slide-60
SLIDE 60

Privacy for games

Data privacy

[Figure: loss matrix with entry q(r) − q(D) for record r and query q; on a neighboring database D′ the same entry becomes q(r) − q(D′).]

  • Changing a record in database changes all losses only a little
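
A quick numeric check of this claim for one counting query: replacing a single record moves q(D) by at most 1/n, so every loss q(r) − q(D) shifts by at most 1/n. The database and predicate below are made up for the example.

```python
def counting_query(predicate, database):
    """Fraction of records satisfying the predicate."""
    return sum(1 for rec in database if predicate(rec)) / len(database)

def is_even(rec):
    return rec % 2 == 0

universe = range(10)
database = list(range(10))   # n = 10 records
neighbor = database.copy()
neighbor[1] = 2              # change exactly one record (1 -> 2)

losses = [is_even(r) - counting_query(is_even, database) for r in universe]
losses_neighbor = [is_even(r) - counting_query(is_even, neighbor) for r in universe]

# Largest change in any loss entry caused by the single-record change.
max_shift = max(abs(a - b) for a, b in zip(losses, losses_neighbor))
print(max_shift, 1 / len(database))
```

Every entry of the row shifts by the same small amount, in contrast with the analyst-privacy case where swapping a query can rewrite the whole row.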
slide-63
SLIDE 63

Privacy for games

Analyst privacy

[Figure: loss matrix with entry q(r) − q(D) for record r and query q; replacing the query, q ↦ q′, changes the entry to q′(r) − q′(D).]

  • Changing a query changes losses for an entire row (maybe by a lot)

slide-67
SLIDE 67

Query release mechanism

Plan

  • Private inputs: database D, set of all queries Q from analysts
  • Simulate repeated play of query release game
  • Publish: empirical distribution on data player’s plays
  • Analysts compute answers by using this as synthetic database
slide-69
SLIDE 69

Analyst private query release

Requirement: Analyst privacy

  • If query changed, synthetic database shouldn’t change much

Obstacle: query player can’t play a query too often

  • Changing it might drastically change synthetic database
slide-74
SLIDE 74

A closer look at the MW update

Data player’s update

  • Versus query q, update probability of record r:

$$p_r := p_r \cdot \exp\{-(q(r) - q(D))\}$$

  • After queries $q^{(1)}, q^{(2)}, \ldots, q^{(T)}$:

$$p_r \propto \exp\left\{-\left(q^{(1)}(r) - q^{(1)}(D)\right) - \cdots - \left(q^{(T)}(r) - q^{(T)}(D)\right)\right\}$$
  • Very sensitive to changing a query if query played many times
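
The sensitivity can be seen numerically. The sketch below computes the data player's MW distribution after the same query is played k times, then swaps that query for a different one and measures the largest log-ratio between the two distributions. The two-record universe and the queries are hypothetical, and the update uses the slide's rule with no learning rate.

```python
import math

def mw_distribution(queries_played, universe, database):
    """p_r proportional to exp(-sum_t (q_t(r) - q_t(D))), starting from uniform."""
    exponents = []
    for r in universe:
        total = 0.0
        for q in queries_played:
            q_of_D = sum(1 for x in database if q(x)) / len(database)
            total += float(q(r)) - q_of_D
        exponents.append(-total)
    shift = max(exponents)  # subtract the max for numerical stability
    weights = [math.exp(e - shift) for e in exponents]
    z = sum(weights)
    return [w / z for w in weights]

def max_log_ratio(p, p_other):
    """Largest |log(p_r / p'_r)| over records r."""
    return max(abs(math.log(a / b)) for a, b in zip(p, p_other))

universe = [0, 1]
database = [0, 1]

def q(x):
    return x == 1

def q_changed(x):
    return x == 0

log_ratios = {}
for k in (1, 5):
    p = mw_distribution([q] * k, universe, database)
    p_changed = mw_distribution([q_changed] * k, universe, database)
    log_ratios[k] = max_log_ratio(p, p_changed)

print(log_ratios)
```

In this toy instance the log-ratio grows linearly in the number of times the query was played, which is the obstacle the capped query distribution is meant to address.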
SLIDE 77

Analyst private query release

Requirement: Analyst privacy

  • If query changed, synthetic database shouldn’t change much

Obstacle: query player can’t play a query too often

  • Changing it might drastically change synthetic database
  • Project query distribution so probabilities are capped: no query is played too often
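
The slides do not say which projection is used. As one plausible reading, the sketch below applies the relative-entropy (KL) projection onto distributions whose entries are all at most `cap`: it clips the largest entries to `cap` and rescales the rest proportionally. The function name is hypothetical, and the code assumes strictly positive input probabilities (as MW produces) and `cap * n >= 1`.

```python
def cap_distribution(dist, cap):
    """Project dist onto {p : sum(p) = 1, p_i <= cap} by clipping large
    entries to cap and rescaling the remaining entries proportionally."""
    n = len(dist)
    if cap * n < 1.0 - 1e-12:
        raise ValueError("cap is too small: cap * n must be at least 1")
    capped = set()
    while True:
        # Mass left for the uncapped entries, and their current total.
        free_mass = 1.0 - cap * len(capped)
        free_total = sum(dist[i] for i in range(n) if i not in capped)
        scale = free_mass / free_total
        over = [i for i in range(n)
                if i not in capped and dist[i] * scale > cap + 1e-12]
        if not over:
            return [cap if i in capped else dist[i] * scale for i in range(n)]
        # Rescaling only grows in later rounds, so these entries stay capped.
        capped.update(over)

projected = cap_distribution([0.7, 0.2, 0.1], cap=0.5)
print(projected)
```

Here the first entry is clipped to 0.5 and the other two keep their 2:1 ratio while absorbing the freed mass.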

slide-84
SLIDE 84

Putting it all together

Analyst private mechanism

  • Maintain distributions over records and queries
  • Loop:
      • Draw actions (record and query) from distributions
      • Calculate loss defined by the plays
      • Update distributions (MW)
      • Project query distribution to cap probabilities
  • Output data’s empirical distribution: synthetic database
slide-88
SLIDE 88

Changing the game

Mishandled queries

  • What if only a few queries with high error?
  • Query player might not be able to put high probability on these queries (probabilities are capped!)

  • At equilibrium, a few queries might have high error
slide-90
SLIDE 90

Putting it all together

Analyst private mechanism

  • Maintain distributions over records and queries
  • Loop:
      • Draw actions (record and query) from distributions
      • Calculate loss defined by the plays
      • Update distributions (MW)
      • Project query distribution to cap probabilities
  • Output data’s empirical distribution: synthetic database
  • Find and answer queries where synthetic data performs poorly
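
The final step can be pictured with the sketch below, which lists the queries whose synthetic answer is off by more than α. This check is deliberately non-private: it compares against the true data directly, whereas the mechanism in the talk would have to do this step privately, and the slides do not describe how. All names and data are hypothetical.

```python
def answer(predicate, distribution):
    """Answer a counting query on a distribution given as {record: probability}."""
    return sum(p for record, p in distribution.items() if predicate(record))

def mishandled_queries(queries, synthetic, true_dist, alpha):
    """Names of queries whose synthetic answer errs by more than alpha.

    NOT private: reads the true distribution directly (illustration only).
    """
    return [
        name for name, pred in queries.items()
        if abs(answer(pred, synthetic) - answer(pred, true_dist)) > alpha
    ]

true_dist = {0: 0.5, 1: 0.25, 3: 0.25}   # true database as a distribution
synthetic = {0: 0.5, 1: 0.5}             # a synthetic database that misses record 3

queries = {
    "is_even": lambda x: x % 2 == 0,
    "at_least_two": lambda x: x >= 2,
}

bad = mishandled_queries(queries, synthetic, true_dist, alpha=0.1)
print(bad)
```

In this toy case the synthetic database answers the parity query exactly but misses the threshold query by 0.25, so only the latter is flagged.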
slide-94
SLIDE 94

Accuracy

Theorem

Suppose the analysts ask queries $Q$, and let the database have $n$ records from $X$. There exists an $\epsilon$ analyst and data private mechanism which achieves error $\alpha$ on all queries in $Q$, where

$$\alpha = O\left(\frac{\operatorname{polylog}(|X|, |Q|)}{\epsilon \sqrt{n}}\right).$$

Notes

  • Counting queries, so error α ≪ 1 is nontrivial
  • Improved dependence on n compared to $O(1/n^{1/4})$ [DNV’12], but analyst privacy guarantees are incomparable
  • $O(1/\sqrt{n})$ is nearly optimal dependence on n, even for data privacy only
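
To get a feel for the two rates, here is a back-of-the-envelope comparison that ignores constants, the polylog factor, and $\epsilon$; the value $n = 10^4$ is an arbitrary choice for illustration:

```latex
\[
n = 10^{4}: \qquad \frac{1}{\sqrt{n}} = 10^{-2},
\qquad \frac{1}{n^{1/4}} = 10^{-1}.
\]
```

Put differently, halving the error takes 4 times as many records at rate $1/\sqrt{n}$ but 16 times as many at rate $1/n^{1/4}$.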

slide-95
SLIDE 95

Additional results

Extensions

  • One-analyst-to-many-analyst private mechanism: one analyst is allowed to change all of their queries

  • Analyst private online mechanism
  • Analyst private mechanism for general low-sensitivity queries
slide-97
SLIDE 97

Wrapping up

Our contributions

  • Interpretation of query release as zero-sum game
  • Method for privately computing the approximate equilibrium
  • Nearly optimal error for one-query-to-many-analyst privacy

Ongoing/Future Work

  • Inherent gap between analyst privacy and just data privacy?
  • Other applications of privately solving zero-sum games?
  • Solving linear programs?