Do Computer Science Definitions of Privacy Satisfy Legal Definitions of Privacy? The Case of FERPA and Differential Privacy
Kobbi Nissim, Ben-Gurion University and Center for Research on Computation and Society, Harvard University


SLIDE 1

Do Computer Science Definitions of Privacy Satisfy Legal Definitions of Privacy? The Case of FERPA and Differential Privacy

Privacy Enhancing Technologies for Biometric Data, Haifa University, Jan 17, 2016


Kobbi Nissim

Ben-Gurion University and Center for Research on Computation and Society, Harvard University

SLIDE 2

This Work: a Collaboration

Product of a working group (meeting since Nov 2014). Contributing to this project:

  • Center for Research on Computation and Society (CRCS): Kobbi Nissim, Aaron Bembenek, Mark Bun, Marco Gaboardi, Thomas Steinke, and Salil Vadhan
  • Berkman Center for Internet & Society: David O’Brien, Alexandra Wood, and Urs Gasser


SLIDE 3

SLIDE 4

Privacy Tools for Sharing Research Data

  • Goal: help social scientists share privacy-sensitive research data via a collection of technological and legal tools
  • A problem: privacy protection techniques have repeatedly been shown to fail to provide reasonable privacy

SLIDE 5

Data Privacy

  • Studied (at least) since the ’60s
  • Approaches: De-identification, redaction, auditing, noise addition, synthetic datasets …
  • Focus on how to provide privacy, not on what privacy protection is
  • May have been suitable for the pre-internet era
  • Re-identification [Sweeney ’00, …]
  • GIS data, health data, clinical trial data, DNA, Pharmacy data, text data, registry information, …
  • Blatant non-privacy [Dinur, Nissim ‘03], …
  • Auditors [Kenthapadi, Mishra, Nissim ’05]
  • AOL Debacle ‘06
  • Genome-Wide association studies (GWAS) [Homer et al. ’08]
  • Netflix Prize [Narayanan, Shmatikov ‘09]
  • Netflix canceled second contest
  • Social networks [Backstrom, Dwork, Kleinberg ‘11]
  • Genetic research studies [Gymrek, McGuire, Golan, Halperin, Erlich ‘11]
  • Microtargeted advertising [Korolova 11]
  • Recommendation systems [Calandrino, Kilzer, Narayanan, Felten, Shmatikov ’11]
  • Israeli CBS [Mukatren, Nissim, Salman, Tromer ’14]
  • Attack on statistical aggregates [Homer et al.’08] [Dwork, Smith, Steinke, Vadhan ‘15]

Slide idea stolen shamelessly from Or Sheffet

SLIDE 6

Privacy Tools for Sharing Research Data

  • Goal: help social scientists share privacy-sensitive research data via a collection of technological and legal tools
  • A problem: privacy protection techniques have repeatedly been shown to fail to provide reasonable privacy

  • Differential privacy [Dwork, McSherry, N, Smith 2006]
  • A formal mathematical privacy concept
  • Addresses weaknesses of traditional schemes (and more)
  • Has a rich theory, in first steps of implementation and testing
SLIDE 7

$M : X^n \to T$ satisfies $\varepsilon$-differential privacy if for all $x, x' \in X^n$ s.t. $\mathrm{dist}(x, x') = 1$ and for all $S \subseteq T$,

$$\Pr[M(x) \in S] \le e^{\varepsilon} \cdot \Pr[M(x') \in S].$$

The Protagonists

  • Differential Privacy: a mathematical definition of privacy
  • FERPA (the Family Educational Rights and Privacy Act): a legal standard of privacy

SLIDE 8

A use case: The Privacy Tools for Sharing Research Data Project

* http://privacytools.seas.harvard.edu/

SLIDE 9

Dataverse Network*

Alice uploads her cool MOOC data, which contains student info protected by FERPA. Bob: “Alice’s data, please.” Restricted! The Privacy Tools offer access to Alice’s data with differential privacy. Bob: “Should I apply for access??? Other researchers may find it useful… Does DP satisfy FERPA? IRB policies, terms of use … is it worth the trouble?”

* http://dataverse.org/  http://privacytools.seas.harvard.edu/

SLIDE 10

Short digression: Motivating Differential Privacy

SLIDE 11

Trusted Party

Just before this talk, an interesting discussion: “How many Justin Bieber fans attend the workshop?” “Highly sensitive personal info; how can this be done?” The trusted party: “I will do the survey… I will only publish the result… and immediately forget the data!” “Yay!” “Yay!” “Great!” “Hooray!”

SLIDE 12

3 #JustinBieber fans attend #Haifa-privacy-workshop

SLIDE 13

Trusted Party

A few minutes later, I come in: “What are you doing?” “A survey! How many JB fans attend the workshop?” “I will do the survey… publish the result… and forget the data!” “Me too!”

SLIDE 14

3 #JustinBieber fans attend #Haifa-privacy-workshop
(after @Kobbi joins): 4 #JustinBieber fans attend #Haifa-privacy-workshop

“The tweet hides my info!” Each count alone gives only a 4/100 chance that Kobbi is a JB fan. But comparing the two tweets: “Aha!”

SLIDE 15

Composition

  • Differencing attack:
  • How is my privacy affected when an attacker sees an analysis before and after I join/leave? (A concrete sketch follows this list.)
  • More generally: composition
  • How is my privacy affected when an attacker combines results from two or more privacy-preserving analyses?
  • Fundamental law of information: the more information we extract from our data, the more is learned about individuals!
  • So, privacy will deteriorate as we use our data more and more
  • Best desiderata:
  • Deterioration is quantifiable and controllable
  • Not abrupt
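To make the differencing attack concrete, here is a minimal sketch in Python (the names and data are hypothetical, not from the talk): two exact counts, each harmless on its own, combine to reveal one person’s bit.

```python
# Differencing attack on exact counts: a minimal sketch (hypothetical data).
# Each published aggregate looks harmless; their difference isolates one person.

fans_before = {"Alice": 1, "Bob": 0, "Carol": 1, "Dana": 1}  # 1 = JB fan
fans_after = dict(fans_before, Kobbi=1)                      # Kobbi joins

count_before = sum(fans_before.values())  # published: 3
count_after = sum(fans_after.values())    # published: 4

# The attacker sees only the two published counts, never a single record,
# yet the difference reveals Kobbi's bit exactly:
print("Kobbi is a JB fan:", count_after - count_before == 1)
```

Under ε-differentially private counts the two releases still compose, but the combined leakage is bounded (by 2ε): exactly the quantifiable, non-abrupt deterioration asked for above.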
SLIDE 16

The Protagonists:

Differential Privacy

SLIDE 17

My Privacy Desiderata

Real world: Kobbi’s data → analysis (computation) → outcome.
My ideal world: data with my info removed → analysis (computation) → outcome.
Desideratum: the same outcome in both worlds.

SLIDE 18

Things to Note

  • In this talk, we only consider the outcome of analyses
  • Security flaws, hacking, implementation errors, …
  • Very important but very different questions
  • My privacy desiderata would hide whether I’m a JB fan!
  • Resilient to differencing attacks
  • Does not mean I’m fully protected
  • I’m only protected to the extent I’m protected in my ideal world
  • Some harm could happen to me even in my ideal world
  • Bob smokes in public
  • Study teaches that smoking causes cancer
  • Bob’s health insurer raises his premium
  • Bob is harmed even if he does not participate in the study!
SLIDE 19

Our Privacy Desiderata

Real world: data → analysis (computation) → outcome.
My ideal world: data with my info removed → analysis (computation) → outcome.
Same outcome: the analysis should ignore Kobbi’s info.

SLIDE 20

Our Privacy Desiderata

Real world: data → analysis (computation) → outcome.
Gert’s ideal world: data with Gert’s info removed → analysis (computation) → outcome.
Same outcome: should ignore Kobbi’s info and Gertrude’s!

SLIDE 21

Our Privacy Desiderata

Real world: data → analysis (computation) → outcome.
Mark’s ideal world: data with Mark’s info removed → analysis (computation) → outcome.
Same outcome: should ignore Kobbi’s info and Gertrude’s! and Mark’s!

SLIDE 22

Our Privacy Desiderata

Real world: data → analysis (computation) → outcome.
Each individual’s ideal world: data with that individual’s info removed → analysis (computation) → outcome.
Same outcome: should ignore Kobbi’s info and Gertrude’s! and Mark’s! … and everybody’s!

SLIDE 23

A Realistic Privacy Desiderata

Real world: data → analysis (computation) → outcome.
Each individual’s ideal world: data with that individual’s info removed → analysis (computation) → outcome.
Requirement relaxed: the two outcomes need only be ε-”similar”, not identical.

SLIDE 24

Differential Privacy [Dwork McSherry N Smith 06]

Real world: data → analysis (computation) → outcome.
Each individual’s ideal world: data with that individual’s info removed → analysis (computation) → outcome.
The two outcome distributions are ε-”similar”.

*See also: Differential Privacy: An Introduction for Social Scientists.
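One way to read ε-”similar” (anticipating the formal definition two slides ahead, stated here as a sketch): for every set S of possible outcomes, the real-world and ideal-world probabilities are within a factor e^ε of each other,

$$e^{-\varepsilon} \;\le\; \frac{\Pr[\text{real-world outcome} \in S]}{\Pr[\text{ideal-world outcome} \in S]} \;\le\; e^{\varepsilon}.$$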

SLIDE 25

Why Differential Privacy?

  • DP: strong, quantifiable, composable mathematical privacy guarantee
  • Provably resilient to known and unknown attack modes!
  • Natural interpretation: I am protected (almost) to the extent I’m protected in my privacy-ideal scenario
  • Theoretically, DP enables many computations with personal data while preserving personal privacy
  • Practicality in first stages of validation
SLIDE 26

Differential Privacy [Dwork McSherry N Smith 06]

$M : X^n \to T$ satisfies $\varepsilon$-differential privacy if for all $x, x' \in X^n$ s.t. $\mathrm{dist}(x, x') = 1$ and for all $S \subseteq T$,

$$\Pr[M(x) \in S] \le e^{\varepsilon} \cdot \Pr[M(x') \in S].$$
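To connect the definition to practice, here is a minimal sketch (mine, not from the slides) of the standard Laplace mechanism for a counting query: neighboring datasets change a count by at most 1, so noise of scale 1/ε yields ε-DP.

```python
import random

def dp_count(bits, epsilon):
    """Answer a counting query with ε-differential privacy.

    A counting query has sensitivity 1: datasets at distance 1 change the
    true count by at most 1, so Laplace noise of scale 1/ε ensures
    Pr[M(x) in S] <= e^ε * Pr[M(x') in S] for all neighboring x, x'.
    """
    # The difference of two iid Exponential(ε) draws is Laplace(0, 1/ε).
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return sum(bits) + noise

# Neighboring datasets: the workshop without and with Kobbi (hypothetical).
without_kobbi = [1, 0, 1, 1]        # true count 3
with_kobbi = without_kobbi + [1]    # true count 4
print(dp_count(without_kobbi, 0.1), dp_count(with_kobbi, 0.1))
```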

SLIDE 27

It’s Real!

SLIDE 28

How is Differential Privacy Achieved?

  • Careful addition of random noise into the computation:
  • Randomized response [W65] (sketched below), framework of global sensitivity [DMNS06], framework of smooth sensitivity [NRS07], sample and aggregate [NRS07], exponential mechanism [MT07], propose-test-release [DL09], sparse vector technique [DNRRV09], private multiplicative weights [HR10], matrix mechanism [LHRMM10], choosing mechanism [BNS13], large margin mechanism [CHS14], dual query mechanism [GGHRW14], …
  • Differentially private algorithms exist for many tasks:
  • Statistics, machine learning, private data release, …
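As one concrete instance from the list above, a sketch (under the standard formulation, not a quotation from the talk) of randomized response [W65]: answering truthfully with probability e^ε/(1+e^ε) and lying otherwise satisfies ε-DP for a single bit, and the population count can still be estimated.

```python
import math
import random

def randomized_response(bit, epsilon):
    """Answer truthfully with probability p = e^ε / (1 + e^ε), else lie.
    For one bit this is ε-DP: Pr[out=b | bit=b] / Pr[out=b | bit!=b] = e^ε."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if random.random() < p_truth else 1 - bit

def estimate_count(noisy_answers, epsilon):
    """Unbiased estimate of the true count: E[sum] = n(1-p) + (2p-1)*true."""
    n = len(noisy_answers)
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return (sum(noisy_answers) - n * (1.0 - p)) / (2.0 * p - 1.0)

answers = [randomized_response(b, 1.0) for b in [1, 0, 1, 1] * 250]  # n = 1000
print(estimate_count(answers, 1.0))  # ≈ 750, the true count
```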
SLIDE 29

SLIDE 30

SLIDE 31

SLIDE 32

SLIDE 33

Some Other Efforts to Bring DP to Practice

  • Microsoft Research “PINQ”
  • CMU-Cornell-PennState “Integrating Statistical and Computational Approaches to Privacy” (see http://onthemap.ces.census.gov/)
  • UCSD “Integrating Data for Analysis, Anonymization, and Sharing” (iDASH)
  • UT Austin “Airavat: Security & Privacy for MapReduce”
  • UPenn “Putting Differential Privacy to Work”
  • Stanford-Berkeley-Microsoft “Towards Practicing Privacy”
  • Duke-NISS “Triangle Census Research Network”
  • MIT/CSAIL/ALFA “MoocDB Privacy tools for Sharing MOOC data”
SLIDE 34

The Protagonists:

FERPA

SLIDE 35

Introduction to FERPA

The Family Educational Rights and Privacy Act of 1974 governs the disclosure of personal information contained in education records.

Our interpretation is derived from multiple sources:

  • The text of the statute (20 U.S.C. 1232g)
  • The implementing regulations (34 C.F.R. Part 99), which are typically what we are referring to when we cite FERPA
  • The 2008 final rule amending the FERPA regulations, especially the preamble discussion (the “Preamble”)
  • Guidance documents from the Department of Education’s Privacy Technical Assistance Center

SLIDE 36

FERPA: Directory/Non-Directory Information

  • Directory information: information from education records that can be made public; designated by each school (34 C.F.R. § 99.37)
  • Examples include names, addresses, telephone numbers, photographs, dates and places of birth, degrees, honors, awards . . .
  • Non-directory personally identifiable information: information contained in education records that can only be disclosed without consent under certain exceptions (34 C.F.R. § 99.31)

SLIDE 37

What/whom does FERPA Protect Against?

FERPA’s definition of PII (and “attacker”): “information that, alone or in combination, is linked or linkable to a specific student that would allow a reasonable person in the school community, who does not have personal knowledge of the relevant circumstances, to identify the student with reasonable certainty”

(34 C.F.R. § 99.3)

  • Objective standard: a “hypothetical, rational, prudent, average individual”
  • Standard based on the knowledge of a member of the school community (stronger than one based on the knowledge of any reasonable person)

SLIDE 38

Points of Mismatch

FERPA*
  1. Highly sector- and context-specific
  2. Focuses on personally identifiable information and microdata
  3. Endorses de-identification techniques; does not provide clear guidance w.r.t. techniques other than de-identification
  4. Refers to the obvious extreme cases, not to the more difficult “gray areas”
  5. Imprecise, not rigorous/formal from a technical standpoint

Differential Privacy
  1. Generic; does not refer to a type of data
  2. Does not focus on specific information or a specific representation of data
  3. Technology-“neutral”; does not endorse a specific technique; gives clear guidance w.r.t. all techniques
  4. Applies to all analyses; does not leave “gray areas”
  5. A mathematically rigorous definition

* Other laws (e.g., HIPAA) share these characteristics

SLIDE 39

Importance of Bridging the Gaps

For the future of differential privacy

  • Not conforming to regulatory requirements would be a serious barrier to adoption

For the future of law and regulation

  • Policymakers need to understand the technology and its guarantees in order to approve and support the use of formal privacy models (and ensure robust privacy protection)

SLIDE 40

But how can we bridge these very different languages?

$M : X^n \to T$ satisfies $\varepsilon$-differential privacy if for all $x, x' \in X^n$ s.t. $\mathrm{dist}(x, x') = 1$ and for all $S \subseteq T$,

$$\Pr[M(x) \in S] \le e^{\varepsilon} \cdot \Pr[M(x') \in S].$$

“Sifting through all these words for something to hold on to, everything will disintegrate!”

“It is inconceivable that this sequence of mathematical symbols can capture a social concept like privacy!”

SLIDE 41

Doesn’t a Solution Already Exist?

  • HIPAA’s* Expert Determination method: obtain confirmation from a qualified statistician that the risk of identification is very small
  • A brilliant idea: let’s add such a clause to FERPA and every other regulation!
  • Who is an expert?
  • U.S. Dept. of Health & Human Services guidance for HIPAA: “There is no specific professional degree or certification program for designating who is an expert at rendering health information de-identified.”
  • How should s/he determine that the risk is small?

* HIPAA: Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule, 45 C.F.R. Part 160 and Subparts A and E of Part 164.

SLIDE 42

Proving that Differential Privacy Satisfies FERPA

SLIDE 43

A CS View of the Legal Definitions

(Diagram: the space of all analyses. One extreme, “copy input to output”, is clearly BAD; another, “redact this”, is good? It depends...)

SLIDE 44

SLIDE 45

SLIDE 46

(These slides repeat the diagram from slide 43.)

SLIDE 47

DP satisfies one interpretation of FERPA.
DP satisfies other interpretations!

SLIDE 48

SLIDE 49

Our Methodology in a Picture

(Diagram: the space of all analyses again, with “copy input to output” marked BAD, “redact this” marked “Good?”, and the region of DP analyses marked out.)

SLIDE 50

Our Methodology in More Detail

SLIDE 51

Bridging FERPA & Differential Privacy

Two arguments to be made:

  • 1. The FERPA privacy standard is relevant for analyses computed with DP
  • A legal argument supported by a technical argument
  • 2. Differential privacy satisfies the FERPA privacy standard
  • A technical argument supported by a legal argument

  • FERPA allows dissemination of de-identified information ⇒ it is sufficient to show that DP analyses result in an outcome that is not identifiable
  • Extract a mathematical definition of privacy from FERPA and provide a mathematical proof that DP satisfies this definition

SLIDE 52

CS Paradigm of Security Definitions

Security/privacy is defined as a game with an attacker (a.k.a. adversary); a sketch in code follows this list.

  • Attacker defined by:
  • Computational resources it can spend
  • Knowledge it can bring from “outside the system”
  • Not uniquely specified, but a large family of potential attackers
  • To capture all plausible attackers
  • The game defines:
  • Access to the system
  • What it means for an attacker to win
  • The system is secure/private:
  • If no attacker can win “too much”
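A hypothetical sketch of this paradigm in code (the names and the toy instantiation are illustrative, not from the talk): the system is private if no attacker in the allowed family wins noticeably more often than the trivial baseline.

```python
import math
import random

def security_game(system, attacker, sample_secret, trials=100_000):
    """Generic security game: the attacker interacts with the system and
    tries to guess a secret; count how often it wins."""
    wins = 0
    for _ in range(trials):
        secret = sample_secret()           # knowledge from "outside the system"
        view = system(secret)              # the attacker's access to the system
        wins += attacker(view) == secret   # what it means to win
    return wins / trials                   # secure if ≈ baseline for all attackers

# Toy instantiation: one secret bit, a randomized-response system (ε = 0.1),
# and an attacker that simply trusts the noisy answer.
eps = 0.1
p = math.exp(eps) / (1 + math.exp(eps))
rate = security_game(system=lambda b: b if random.random() < p else 1 - b,
                     attacker=lambda view: view,
                     sample_secret=lambda: random.randint(0, 1))
print(f"win rate ≈ {rate:.3f} vs. baseline 0.5")  # ≈ p ≈ 0.525
```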
SLIDE 53

Towards a FERPA Security Game

The attacker, given directory information, attempts to identify a student from the computation result; the attacker wins if the identification is successful.

Modeling questions:
  • What is in the directory and the non-directory student info?
  • What does “identify a student” mean technically?
  • How much winning is “too much” (i.e., a privacy breach)?
  • What is the attacker’s computational power?
  • What external information does the attacker have?

SLIDE 54

Principle: Always Err on the Conservative Side of Law

Ambitious goal:

  • Show that DP satisfies every reasonable interpretation of FERPA

How? Strengthen the attacker!

  • Whenever the regulation is ambiguous, let the attacker decide
  • May sound ridiculous to the untrained ear …
  • … but we are just making a stronger claim than necessary
  • Proving privacy w.r.t. such an attacker ⇒ proving privacy w.r.t. any interpretation of the regulation!

“Thank you so much … suckers!”

SLIDE 55

Example 1: Modeling Directory Information

  • The definition of directory information has some ambiguity (e.g., the definition varies between schools)
  • We could make assumptions on what directory information is
  • This would be fragile and would limit the reach of our argument
  • Instead, we let the attacker choose what constitutes directory information

SLIDE 56

Example 2: Modeling the Attacker’s Knowledge

FERPA: a reasonable person in the school community
Our interpretation: the attacker has knowledge of statistics (a distribution) over students in the school

  • Statistics vary between school communities; we do not know how to define them exactly
  • We could make assumptions about the statistics (e.g., age is distributed according to a Gaussian distribution)
  • This would limit the reach of our argument
  • Instead, let the attacker choose the statistics
SLIDE 57

More Issues to Address:

Including:

  • A mathematical interpretation of what identifiability means
  • When there is ambiguity, let the attacker decide!
  • What level of attacker success is permissible?
  • Some other subtleties in the modeling
SLIDE 58

Our FERPA Security Game (simplified)

  • The attacker chooses statistics over the students (expressed as a probability distribution P)
  • Directory and non-directory student info are generated according to P
  • The attacker chooses a target student S and receives S’s directory info
  • The attacker sees the computation result and makes a guess G
  • The attacker wins if the guess G is correct for student S

(A schematic rendering in code follows.)
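A schematic rendering of the simplified game; all names, types, and the trial loop below are illustrative assumptions, not the paper’s formalization.

```python
def ferpa_game(mechanism, attacker, trials=10_000):
    """Schematic FERPA security game (illustrative, not the paper's model).

    The attacker is deliberately strong: it chooses the population
    statistics P and the target student, sees the target's directory info
    and the mechanism's output, and wins by guessing the target's
    non-directory info."""
    wins = 0
    for _ in range(trials):
        P = attacker.choose_statistics()                  # attacker picks P
        students = P.sample_school()                      # school drawn from P
        target = attacker.choose_target(students)         # target student S
        result = mechanism([s.non_directory for s in students])
        guess = attacker.guess(target.directory, result)  # guess G
        wins += guess == target.non_directory             # win if G correct for S
    return wins / trials
```

The requirement on the next slide then reads: for every attacker, this winning rate may exceed the attacker’s no-access baseline by at most 0.01.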

SLIDE 59

Our FERPA Security Game

  • What should be the “allowable” winning probability?
  • If the secret information is male/female, then even without access to the mechanism the attacker can win with probability ½!
  • Our choice:
  • Winning probability is at most 1% higher than without access to the system
  • Theorem: any DP mechanism (with ε = 0.01) satisfies this requirement (a short calculation below sketches why this parameter choice works)
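A sketch of the arithmetic behind that parameter choice (the paper’s precise statement and proof may differ): ε-DP bounds how much access to the mechanism can inflate any event’s probability, and for ε = 0.01 that inflation is about 1%.

```latex
% For any winning event, ε-DP gives a multiplicative bound between the
% real world and the ideal (no-access) world:
\Pr[\text{win with access}] \;\le\; e^{\varepsilon}\,\Pr[\text{win without access}].
% With ε = 0.01:
e^{0.01} \approx 1.01005, \quad\text{so}\quad
\Pr[\text{win with access}] - \Pr[\text{win without access}]
\;\le\; (e^{0.01} - 1)\,\Pr[\text{win without access}] \;\le\; 0.0101.
% i.e., the winning probability rises by at most about 1%.
```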
SLIDE 60

Methodology

  • 1. Search the regulation for explicit requirements and hints about the attacker model
  • E.g., FERPA “defines” the attacker as “a reasonable person in the school community that does not have personal knowledge of the relevant circumstances”
  • Directory information can be made public
  • Attacker’s goal: identification of sensitive (non-directory) data
  • Etc.
  • 2. Create a formal mathematical attacker model for the regulation
  • Always err on the conservative side
  • 3. Provide a formal mathematical proof
  • I.e., differential privacy satisfies the resulting security definition
  • 4. Suggest how to set the privacy parameter ε
  • Based on the regulation

Provide an explanation suitable for CS and legal scholars alike! (Work in progress.)

SLIDE 61

Summary

  • Showed how to make a combined mathematical-legal formal claim that differential privacy satisfies FERPA
  • A hyper-quantitative approach to the law
  • At the heart: extracting a formal, very conservative attacker model for the regulation
  • Paper in preparation
  • An instance of a general methodology for answering a broader question: when a new technology emerges, can we claim it satisfies the existing regulation?
  • We take a distant enough view of FERPA to allow abstraction
  • A similar argument should work for other legal privacy standards
  • HIPAA, CIPSEA, Title 13, …

* HIPAA: Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule, 45 C.F.R. Part 160 and Subparts A and E of Part 164. CIPSEA: Confidential Information Protection and Statistical Efficiency Act. Title 13 of the United States Code governs the role of the U.S. Census Bureau.

SLIDE 62

Thank You

Contributing to this project:
Center for Research on Computation and Society (CRCS), Harvard: Kobbi Nissim, Aaron Bembenek, Mark Bun, Marco Gaboardi, Thomas Steinke, and Salil Vadhan
Berkman Center for Internet & Society, Harvard: David O’Brien, Alexandra Wood, and Urs Gasser
Thanks: Ann Kristin Glenster (HLS), Deborah Hurley (IQSS), George Kellaris (CRCS), Michel Reymond (U. Geneva), Or Sheffet (U. Ottawa), Olga Slobodyanyuk (HLS)
