Privacy Cognizant Information Systems




slide-1
SLIDE 1

Privacy Cognizant Information Systems

Rakesh Agrawal, IBM Almaden Research Center

  • Joint work with Srikant, Kiernan, Xu & Evfimievski

slide-2
SLIDE 2

Thesis

  • There is an increasing need to build information systems that
      – protect the privacy and ownership of information
      – do not impede the flow of information
  • Cross-fertilization of ideas from the security and database research communities can lead to the development of innovative solutions.

slide-3
SLIDE 3

Outline

  • Motivation
  • Privacy Preserving Data Mining
  • Privacy Aware Data Management
  • Information Sharing Across Private Databases
  • Conclusions

slide-4
SLIDE 4

Drivers

  • Policies and Legislations
      – U.S. and international regulations
      – Legal proceedings against businesses
  • Consumer Concerns
      – “Consumer privacy apprehensions continue to plague the Web … these fears will hold back roughly $15 billion in e-Commerce revenue.” Forrester Research, 2001
      – Most consumers are “privacy pragmatists.” Westin Surveys
  • Moral Imperative
      – The right to privacy: the most cherished of human freedoms -- Warren & Brandeis, 1890

slide-5
SLIDE 5

Outline

  • Motivation
  • Privacy Preserving Data Mining
  • Privacy Aware Data Management
  • Information Sharing Across Private Databases
  • Conclusions

slide-6
SLIDE 6

Data Mining and Privacy

  • The primary task in data mining:
      – development of models about aggregated data.
  • Can we develop accurate models, while protecting the privacy of individual records?

slide-7
SLIDE 7

Setting

  • Application scenario: A central server interested in building a data mining model using data obtained from a large number of clients, while preserving their privacy
      – Web-commerce, e.g. recommendation service
  • Desiderata:
      – Must not slow down the speed of client interaction
      – Must scale to very large number of clients
  • During the application phase
      – Ship model to the clients
      – Use oblivious computations

slide-8
SLIDE 8

World Today

[Figure: Alice, Bob, and Chris each send their true record to the Recommendation Service.]

Alice   35   95,000   J.S. Bach    painting   nasa
Bob     45   60,000   B. Spears    baseball   cnn
Chris   42   85,000   B. Marley    camping    microsoft

slide-9
SLIDE 9

World Today

[Figure: Alice, Bob, and Chris each send their true record to the Recommendation Service, where a Mining Algorithm builds the Data Mining Model.]

Alice   35   95,000   J.S. Bach    painting   nasa
Bob     45   60,000   B. Spears    baseball   cnn
Chris   42   85,000   B. Marley    camping    microsoft

slide-10
SLIDE 10

New Order: Randomization to Protect Privacy

[Figure: each user randomizes their record before sending it to the Recommendation Service.]

        True record                                    Randomized record sent
Alice   35   95,000   J.S. Bach   painting   nasa        50   65,000   Metallica   painting   nasa
Bob     45   60,000   B. Spears   baseball   cnn         38   90,000   B. Spears   soccer     fox
Chris   42   85,000   B. Marley   camping    microsoft   32   55,000   B. Marley   camping    linuxware

  • 35 becomes 50 (35+15)
  • Per-record randomization without considering other records
  • Randomization parameters common across users
  • Randomization techniques differ for numeric and categorical data
  • Each attribute randomized independently
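The numeric case on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the uniform noise, the `SPREADS` values, and the attribute names are hypothetical, not taken from the talk, and categorical attributes (which the slide says need a different technique) are passed through unchanged.

```python
import random

def randomize_numeric(value, spread, rng):
    """Return value plus noise drawn uniformly from [-spread, +spread]."""
    return value + rng.uniform(-spread, spread)

def randomize_record(record, spreads, rng):
    """Perturb each listed numeric attribute independently of the others.

    Only this one record is consulted (per-record randomization), and the
    same `spreads` are meant to be shared by every client.
    """
    noisy = dict(record)
    for attribute, spread in spreads.items():
        noisy[attribute] = randomize_numeric(record[attribute], spread, rng)
    return noisy

# Hypothetical parameters, common across users.
SPREADS = {"age": 30, "salary": 50_000}

rng = random.Random(0)  # seeded only so the demo is repeatable
alice = {"age": 35, "salary": 95_000, "music": "J.S. Bach"}
noisy_alice = randomize_record(alice, SPREADS, rng)  # what leaves the client
```

Running the perturbation on the client is what lets the true record stay with the user; the server only ever receives `noisy_alice`.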

slide-11
SLIDE 11

New Order: Randomization to Protect Privacy

[Figure: same as the previous slide. Alice, Bob, and Chris keep their true records and send only the randomized records to the Recommendation Service.]

True values never leave the user!

slide-12
SLIDE 12

New Order: Randomization Protects Privacy

[Figure: the Recommendation Service receives the randomized records, runs a Recovery step, and feeds the result to the Mining Algorithm, which builds the Data Mining Model.]

Recovery of distributions, not individual records

slide-13
SLIDE 13

Reconstruction Problem (Numeric Data)

  • Original values $x_1, x_2, \ldots, x_n$
      – from probability distribution X (unknown)
  • To hide these values, we use $y_1, y_2, \ldots, y_n$
      – from probability distribution Y
  • Given
      – $x_1+y_1, x_2+y_2, \ldots, x_n+y_n$
      – the probability distribution of Y
    estimate the probability distribution of X.

slide-14
SLIDE 14

Reconstruction Algorithm

$f_X^0$ := Uniform distribution
j := 0
repeat
    $f_X^{j+1}(a) := \frac{1}{n} \sum_{i=1}^{n} \frac{f_Y((x_i+y_i)-a)\, f_X^j(a)}{\int_{-\infty}^{\infty} f_Y((x_i+y_i)-z)\, f_X^j(z)\, dz}$    (Bayes' Rule)
    j := j+1
until (stopping criterion met)

(R. Agrawal & R. Srikant, SIGMOD 2000)

  • Converges to maximum likelihood estimate.
      – D. Agrawal & C.C. Aggarwal, PODS 2001.
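A discretized version of the iteration above might look like the following Python sketch. Several choices here are assumptions rather than the talk's method: the density is estimated on a fixed grid, the integral is replaced by a Riemann sum, the noise is uniform, and the stopping criterion is simply a fixed number of iterations.

```python
import random

def uniform_noise_density(width):
    """Density f_Y of noise drawn uniformly from [-width, +width]."""
    def f_y(d):
        return 1.0 / (2 * width) if -width <= d <= width else 0.0
    return f_y

def reconstruct(w, f_y, grid, iterations=30):
    """Estimate f_X at the grid points from randomized values w_i = x_i + y_i."""
    step = grid[1] - grid[0]
    f_x = [1.0 / (step * len(grid))] * len(grid)  # f_X^0: uniform
    for _ in range(iterations):
        updated = [0.0] * len(grid)
        for w_i in w:
            # Numerator f_Y(w_i - a) * f_X^j(a) at every grid point a.
            terms = [f_y(w_i - a) * p for a, p in zip(grid, f_x)]
            denom = sum(terms) * step  # Riemann sum for the integral
            if denom == 0.0:
                continue  # w_i is not explained by any grid point; skip it
            for k, term in enumerate(terms):
                updated[k] += term / denom
        f_x = [value / len(w) for value in updated]
    return f_x

# Hypothetical data: ages around 40, hidden with uniform noise of +/- 20.
rng = random.Random(1)
xs = [rng.gauss(40, 5) for _ in range(500)]
ws = [x + rng.uniform(-20, 20) for x in xs]
grid = [float(a) for a in range(0, 81, 2)]
estimate = reconstruct(ws, uniform_noise_density(20), grid)
```

Only `ws` and the noise density are passed to `reconstruct`; the original `xs` are never used, which mirrors the setting where the server sees randomized values only.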

slide-15
SLIDE 15

Works Well

[Figure: Number of People (0 to 1200) versus Age (20 to 60), comparing the Original, Randomized, and Reconstructed distributions.]

slide-16
SLIDE 16

Decision Tree Example

Age   Salary   Repeat Visitor?
23    50K      Repeat
17    30K      Repeat
43    40K      Repeat
68    50K      Single
32    70K      Single
20    20K      Repeat

[Figure: decision tree]
Age < 25?
    Yes: Repeat
    No:  Salary < 50K?
             Yes: Repeat
             No:  Single
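As a quick check that the tree on this slide agrees with its training table, the two tests can be written out directly. The function name and the representation of salary in thousands are choices made for this sketch only.

```python
def classify(age, salary_k):
    """Apply the slide's tree: first test Age < 25, then Salary < 50K."""
    if age < 25:
        return "Repeat"
    if salary_k < 50:
        return "Repeat"
    return "Single"

# (age, salary in thousands, label) rows from the slide's table.
ROWS = [
    (23, 50, "Repeat"),
    (17, 30, "Repeat"),
    (43, 40, "Repeat"),
    (68, 50, "Single"),
    (32, 70, "Single"),
    (20, 20, "Repeat"),
]

predictions = [classify(age, salary) for age, salary, _ in ROWS]
```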

slide-17
SLIDE 17

Algorithms

  • Global
      – Reconstruct for each attribute once at the beginning
  • By Class
      – For each attribute, first split by class, then reconstruct separately for each class.
  • Local
      – Reconstruct at each node

See SIGMOD 2000 paper for details.

slide-18
SLIDE 18

Experimental Methodology

  • Compare accuracy against
      – Original: unperturbed data without randomization.
      – Randomized: perturbed data but without making any corrections for randomization.
  • Test data not randomized.
  • Synthetic benchmark from [AGI+92].
  • Training set of 100,000 records, split equally between the two classes.

slide-19
SLIDE 19

Decision Tree Experiments

[Figure: Accuracy (50 to 100) for functions Fn 1 through Fn 5, comparing Original, Randomized, and Reconstructed at 100% Randomization Level.]

slide-20
SLIDE 20

Accuracy vs. Randomization

[Figure: Accuracy (40 to 100) versus Randomization Level (10 to 200) for Fn 3, comparing Original, Randomized, and Reconstructed.]

slide-21
SLIDE 21

More on Randomization

  • Privacy-Preserving Association Rule Mining Over Categorical Data
      – Rizvi & Haritsa [VLDB 02]
      – Evfimievski, Srikant, Agrawal, & Gehrke [KDD-02]
  • Privacy Breach Control: Probabilistic limits on what one can infer with access to the randomized data as well as mining results
      – Evfimievski, Srikant, Agrawal, & Gehrke [KDD-02]
      – Evfimievski, Gehrke & Srikant [PODS-03]

slide-22
SLIDE 22

Related Work: Private Distributed ID3

  • How to build a decision-tree classifier on the union of two private databases (Lindell & Pinkas [Crypto 2000])
  • Basic Idea:
      – Find attribute with highest information gain privately
      – Independently split on this attribute and recurse
  • Selecting the Split Attribute
      – Given v1 known to DB1 and v2 known to DB2, compute (v1 + v2) log (v1 + v2) and output random shares of the answer
      – Given random shares, use Yao's protocol [FOCS 84] to compute information gain.
  • Trade-off
      + Accuracy
      – Performance & scaling

slide-23
SLIDE 23

Related Work: Purdue Toolkit

  • Partitioned databases (horizontally + vertically)
  • Secure Building Blocks
  • Algorithms (using building blocks):
      – Association rules
      – EM Clustering
  • C. Clifton et al. Tools for Privacy Preserving Data Mining. SIGKDD Explorations 2003.

slide-24
SLIDE 24

Related Work: Statistical Databases

  • Provide statistical information without compromising sensitive information about individuals (AW89, Sho82)
  • Techniques
      – Query Restriction
      – Data Perturbation
  • Negative Results: cannot give high quality statistics and simultaneously prevent partial disclosure of individual information [AW89]

slide-25
SLIDE 25

Summary

  • Promising technical direction & results
  • Much more needs to be done, e.g.
      – Trade-off between the amount of privacy breach and performance
      – Examination of other approaches (e.g. randomization based on swapping)

slide-26
SLIDE 26

Outline

  • Motivation
  • Privacy Preserving Data Mining
  • Privacy Aware Data Management
  • Information Sharing Across Private Databases
  • Conclusions

slide-27
SLIDE 27

Hippocratic Databases

  • Hippocratic Oath, 8 (circa 400 BC)
      – What I may see or hear in the course of treatment … I will keep to myself.
  • What if the database systems were to embrace the Hippocratic Oath?
  • Architecture derived from privacy legislations.
      – US (FIPA, 1974), Europe (OECD, 1980), Canada (1995), Australia (2000), Japan (2003)

Agrawal, Kiernan, Srikant & Xu: VLDB 2002.

slide-28
SLIDE 28

Architectural Principles

  • Purpose Specification
    Associate with data the purposes for collection
  • Consent
    Obtain donor’s consent on the purposes
  • Limited Collection
    Collect minimum necessary data
  • Limited Use
    Run only queries that are consistent with the purposes
  • Limited Disclosure
    Do not release data without donor’s consent
  • Limited Retention
    Do not retain data beyond necessary
  • Accuracy
    Keep data accurate and up-to-date
  • Safety
    Protect against theft and other misappropriations
  • Openness
    Allow donor access to data about the donor
  • Compliance
    Verifiable compliance with the above principles

slide-29
SLIDE 29

Architecture: Policy

[Diagram components: Privacy Policy, Privacy Metadata Creator, Privacy Metadata, Store]

  • Privacy Metadata Creator: converts privacy policy into privacy metadata tables.
  • Privacy Metadata: for each purpose & piece of information (attribute):
      – External recipients
      – Retention period
      – Authorized users
    Different designs possible.
  • Principles addressed: Limited Disclosure, Limited Retention

slide-30
SLIDE 30

Privacy Policies Table

Purpose          Table     Attribute  External-recipients      Retention  Authorized-users
purchase         customer  name       {delivery, credit-card}  1 month    {shipping, charge}
purchase         customer  email      empty                    1 month    {shipping}
register         customer  name       empty                    3 years    {registration}
register         customer  email      empty                    3 years    {registration}
recommendations  order     book       empty                    10 years   {mining}

slide-31
SLIDE 31

Architecture: Data Collection

[Diagram components: Data Collection, Privacy Constraint Validator, Audit Info, Audit Trail, Privacy Metadata, Store]

  • Privacy Constraint Validator: Privacy policy compatible with user’s privacy preference?
  • Audit Trail: Audit trail for compliance.
  • Principles addressed: Consent, Compliance

slide-32
SLIDE 32

Architecture: Data Collection

[Diagram components: Data Collection, Privacy Constraint Validator, Data Accuracy Analyzer, Record Access Control, Audit Info, Audit Trail, Privacy Metadata, Store]

  • Data Accuracy Analyzer: Data cleansing, e.g., errors in address.
  • Record Access Control: Associate set of purposes with each record.
  • Principles addressed: Purpose Specification, Accuracy

slide-33
SLIDE 33

Architecture: Queries

[Diagram components: Queries, Attribute Access Control, Record Access Control, Privacy Metadata, Store]

  • 1. Telemarketing cannot issue query tagged “charge”.
  • 2. Query tagged “telemarketing” cannot see credit card info.
  • 3. Telemarketing query only sees records that include “telemarketing” in set of purposes.
  • Principles addressed: Safety, Limited Use
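The three checks on this slide can be illustrated with a small Python sketch. Everything concrete here is hypothetical: the metadata dictionaries, the user and attribute names, and the `run_query` function are assumptions made for illustration, not the design from the Hippocratic Databases paper.

```python
# Hypothetical privacy metadata: who may issue queries for a purpose,
# and which attributes a query tagged with that purpose may read.
AUTHORIZED_USERS = {"charge": {"billing"}, "telemarketing": {"telemarketing"}}
VISIBLE_ATTRIBUTES = {
    "charge": {"name", "credit_card"},
    "telemarketing": {"name", "phone"},
}

def run_query(user, purpose, attributes, records):
    """Return the requested attributes, enforcing the slide's three checks."""
    # 1. The user must be authorized for the purpose the query is tagged with.
    if user not in AUTHORIZED_USERS.get(purpose, set()):
        raise PermissionError(f"{user} may not issue queries tagged {purpose}")
    # 2. Attribute access control: the purpose limits the visible attributes.
    blocked = set(attributes) - VISIBLE_ATTRIBUTES.get(purpose, set())
    if blocked:
        raise PermissionError(f"{purpose} queries cannot read {sorted(blocked)}")
    # 3. Record access control: keep only records whose purposes include this one.
    return [
        {name: record[name] for name in attributes}
        for record in records
        if purpose in record["purposes"]
    ]

# Hypothetical records, each tagged with the purposes its donor consented to.
RECORDS = [
    {"name": "Ann", "phone": "555-0100", "credit_card": "4111",
     "purposes": {"charge", "telemarketing"}},
    {"name": "Raj", "phone": "555-0101", "credit_card": "4222",
     "purposes": {"charge"}},
]

calls = run_query("telemarketing", "telemarketing", ["name", "phone"], RECORDS)
```

Here `calls` contains only Ann's row, since Raj's record does not list "telemarketing" among its purposes.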

slide-34
SLIDE 34

Architecture: Queries

[Diagram components: Queries, Query Intrusion Detector, Attribute Access Control, Record Access Control, Audit Info, Audit Trail, Privacy Metadata, Store]

  • Query Intrusion Detector: e.g., a telemarketing query that asks for all phone numbers.
  • Audit Trail:
      – Compliance
      – Training data for query intrusion detector
  • Principles addressed: Safety, Compliance

slide-35
SLIDE 35

Architecture: Other

[Diagram components: Other, Data Retention Manager, Encryption Support, Data Collection Analyzer, Privacy Metadata, Store]

  • Data Retention Manager: Delete items in accordance with privacy policy.
  • Encryption Support: Additional security for sensitive data.
  • Data Collection Analyzer: Analyze queries to identify unnecessary collection, retention & authorizations.
  • Principles addressed: Limited Retention, Limited Collection, Safety

slide-36
SLIDE 36

Architecture

[Diagram: the complete architecture around the Store.]

  • Privacy Policy: Privacy Metadata Creator
  • Data Collection: Privacy Constraint Validator, Data Accuracy Analyzer, Audit Info
  • Queries: Query Intrusion Detector, Attribute Access Control, Record Access Control, Audit Info
  • Other: Data Retention Manager, Encryption Support, Data Collection Analyzer
  • Store: Privacy Metadata, Audit Trail

slide-37
SLIDE 37

Related Work: Statistical & Secure Databases

  • Statistical Databases
      – Provide statistical information (sum, count, etc.) without compromising sensitive information about individuals, [AW89]
  • Multilevel Secure Databases
      – Multilevel relations, e.g., records tagged “secret”, “confidential”, or “unclassified”, e.g. [JS91]
  • Need to protect privacy in transactional databases that support daily operations.
      – Cannot restrict queries to statistical queries.
      – Cannot tag all the records “top secret”.

slide-38
SLIDE 38

Some Interesting Problems

  • Privacy enforcement requires cell-level decisions (which may be different for different queries)
      – How to minimize the cost of privacy checking?
  • Encryption to avoid data theft
      – How to index encrypted data for range queries?
  • Intrusive queries from authorized users
      – Query intrusion detection?
  • Identifying unnecessary data collection
      – Assets info needed only if salary is below a threshold
      – Queries only ask “Salary > threshold” for rent application
  • Forgetting data after the purpose is fulfilled
      – Databases designed not to lose data
      – Interaction with compliance

Solutions must scale to database-size problems!

slide-39
SLIDE 39

Outline

  • Motivation
  • Privacy Preserving Data Mining
  • Privacy Aware Data Management
  • Information Sharing Across Private Databases
  • Conclusions

slide-40
SLIDE 40

Today’s Information Sharing Systems

[Figure: three architectures (Mediator, Federated, Centralized), each receiving a query Q and returning a result R.]

Assumption: Information in each database can be freely shared.

slide-41
SLIDE 41

Minimal Necessary Information Sharing

  • Compute queries across databases so that no more information than necessary is revealed (without using a trusted third party).
  • Need is driven by several trends:
      – End-to-end integration of information systems across companies.
      – Simultaneously compete and cooperate.
      – Security: need-to-know information sharing
  • Agrawal, Evfimievski & Srikant: SIGMOD 2003.

slide-42
SLIDE 42

Selective Document Sharing

  • R is shopping for technology.
  • S has intellectual property it may want to license.
  • First find the specific technologies where there is a match, and then reveal further information about those.

[Figure: R holds a Shopping List; S holds a Technology List.]

Example 2: Govt. agencies sharing information on a need-to-know basis.

slide-43
SLIDE 43

Medical Research

  • Validate hypothesis between adverse reaction to a drug and a specific DNA sequence.
  • Researchers should not learn anything beyond 4 counts:

                    Adverse Reaction   No Adv. Reaction
Sequence Present    ?                  ?
Sequence Absent     ?                  ?

[Figure: Mayo Clinic; separate DNA Sequences and Drug Reactions databases.]

slide-44
SLIDE 44

Minimal Necessary Sharing

[Figure: R = {a, u, v, x}, S = {b, u, v, y}]

Intersection: R ∩ S = {u, v}
  • R must not know that S has b & y
  • S must not know that R has a & x

Intersection size: Count(R ∩ S)
  • R & S do not learn anything except that the result is 2.

slide-45
SLIDE 45

Problem Statement: Minimal Sharing

  • Given:
      – Two parties (honest-but-curious): R (receiver) and S (sender)
      – Query Q spanning the tables R and S
      – Additional (pre-specified) categories of information I
  • Compute the answer to Q and return it to R without revealing any additional information to either party, except for the information contained in I
      – For intersection, intersection size & equijoin, I = { |R| , |S| }
      – For equijoin size, I also includes the distribution of duplicates & some subset of information in R S

slide-46
SLIDE 46

A Possible Approach

  • Secure Multi-Party Computation
      – Given two parties with inputs x and y, compute f(x,y) such that the parties learn only f(x,y) and nothing else.
      – Can be solved by building a combinatorial circuit, and simulating that circuit [Yao86].
  • Prohibitive cost for database-size problems.
      – Intersection of two relations of a million records each would require 144 days

slide-47
SLIDE 47

Intersection Protocol: Intuition

  • Want to encrypt the values in R and S and compare the encrypted values.
  • However, want an encryption function such that it can only be jointly computed by R and S, not separately.

slide-48
SLIDE 48

Commutative Encryption

Commutative encryption F is a computable function $f : \mathrm{Key}\,F \times \mathrm{Dom}\,F \to \mathrm{Dom}\,F$, satisfying:

  – For all $e, e' \in \mathrm{Key}\,F$, $f_e \circ f_{e'} = f_{e'} \circ f_e$
    (The result of encryption with two different keys is the same, irrespective of the order of encryption)
  – Each $f_e$ is a bijection.
    (Two different values will have different encrypted values)
  – The distribution of $\langle x, f_e(x), y, f_e(y) \rangle$ is indistinguishable from the distribution of $\langle x, f_e(x), y, z \rangle$; $x, y, z \in_r \mathrm{Dom}\,F$ and $e \in_r \mathrm{Key}\,F$.
    (Given a value x and its encryption $f_e(x)$, for a new value y, we cannot distinguish between $f_e(y)$ and a random value z. Thus we cannot encrypt y nor decrypt $f_e(y)$.)

slide-49
SLIDE 49

Example Commutative Encryption

  • $f_e(x) = x^e \bmod p$ where
      – p: safe prime number, i.e., both p and q=(p-1)/2 are primes
      – encryption key $e \in \{1, 2, \ldots, q-1\}$
      – Dom F: all quadratic residues modulo p
  • Commutativity: powers commute
    $(x^d \bmod p)^e \bmod p = x^{de} \bmod p = (x^e \bmod p)^d \bmod p$
  • Indistinguishability follows from Decisional Diffie-Hellman Hypothesis (DDH)
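The commutativity and bijection properties can be checked numerically with a toy safe prime. The prime p = 23 (so q = 11) and the keys below are chosen only for illustration; a prime this small offers no security, and the indistinguishability property is not something a short script can demonstrate.

```python
P = 23            # toy safe prime: Q = (P - 1) // 2 = 11 is also prime
Q = (P - 1) // 2

def f(key, x):
    """Commutative encryption f_key(x) = x^key mod P."""
    return pow(x, key, P)

# Dom F: the quadratic residues modulo P (squares of 1..P-1).
DOMAIN = sorted({pow(g, 2, P) for g in range(1, P)})

d, e = 3, 7       # two keys from {1, ..., Q-1}, picked arbitrarily

# Commutativity: encrypting with d then e equals e then d, for every x.
commutes = all(f(e, f(d, x)) == f(d, f(e, x)) == pow(x, d * e, P) for x in DOMAIN)

# Bijection: f_e permutes the domain, so distinct values stay distinct.
is_bijection = sorted(f(e, x) for x in DOMAIN) == DOMAIN
```

The bijection holds because the quadratic residues form a group of prime order q, and every key in {1, ..., q-1} is coprime to q.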

slide-50
SLIDE 50

Intersection Protocol

[Diagram: R holds secret key r; S holds secret key s. S computes $f_s(S)$.]

  • $f_s(S)$ is shorthand for $\{ f_s(x) \mid x \in S \}$.
  • We apply $f_s$ on h(S), where h is a hash function, not directly on S.

slide-51
SLIDE 51

Intersection Protocol

[Diagram: S sends $f_s(S)$ to R. R computes $f_r(f_s(S))$, which equals $f_s(f_r(S))$ by the commutative property.]

slide-52
SLIDE 52

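Intersection Protocol

[Diagram: R sends $f_r(R)$ to S. S returns $\langle y, f_s(y) \rangle$ for $y \in f_r(R)$.]

  • Since R knows $\langle x, y = f_r(x) \rangle$, R obtains $\langle x, f_s(f_r(x)) \rangle$ for $x \in R$.
  • R already holds $f_s(f_r(S))$, so x is in the intersection exactly when $f_s(f_r(x))$ appears in that set.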
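The message flow on the three slides above can be simulated in one Python process. This is a sketch under several assumptions, not the paper's implementation: both parties run in the same function, the hash h is SHA-256 squared modulo p to land in the quadratic residues, the safe prime is found by trial division and is far too small for real use, and the helper names are invented for this example.

```python
import hashlib
import random

def is_prime(n):
    """Trial division; adequate only for the toy sizes used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def find_safe_prime(start):
    """Return the first safe prime p = 2q + 1 with q prime and q >= start."""
    q = start
    while not (is_prime(q) and is_prime(2 * q + 1)):
        q += 1
    return 2 * q + 1

def h(value, p):
    """Hash a string to a quadratic residue modulo p."""
    digest = int.from_bytes(hashlib.sha256(value.encode()).digest(), "big")
    return pow(digest % (p - 1) + 1, 2, p)

def f(key, x, p):
    """Commutative encryption f_key(x) = x^key mod p."""
    return pow(x, key, p)

def intersect(R, S, p, rng):
    """Return the intersection as R would learn it in the protocol."""
    q = (p - 1) // 2
    r = rng.randrange(1, q)  # R's secret key
    s = rng.randrange(1, q)  # S's secret key
    # S -> R: f_s(h(S)), sorted so the order reveals nothing.
    fs_S = sorted(f(s, h(v, p), p) for v in S)
    # R -> S: f_r(h(R)); R remembers which x produced which y.
    fr_of = {x: f(r, h(x, p), p) for x in R}
    fr_R = sorted(fr_of.values())
    # S -> R: pairs <y, f_s(y)> for each y in f_r(h(R)).
    pairs = {y: f(s, y, p) for y in fr_R}
    # R: encrypt S's values with r, then match on doubly encrypted values.
    fr_fs_S = {f(r, y, p) for y in fs_S}
    return {x for x in R if pairs[fr_of[x]] in fr_fs_S}

P = find_safe_prime(2**30)
common = intersect({"a", "u", "v", "x"}, {"b", "u", "v", "y"}, P, random.Random(0))
```

With the sets from the earlier example slide, `common` is {"u", "v"}, and neither simulated party ever handles the other's unencrypted values.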

slide-53
SLIDE 53

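Intersection Size Protocol

[Diagram: R holds secret key r; S holds secret key s. S sends $f_s(S)$ to R, and R computes $f_r(f_s(S))$. R sends $f_r(R)$ to S, and S returns $f_s(f_r(R))$, not $\langle y, f_s(y) \rangle$ for $y \in f_r(R)$.]

  • R cannot map $z \in f_r(f_s(R))$ back to $x \in R$.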
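The difference from the intersection protocol, returning a re-sorted set instead of pairs, can be seen in a short simulation. The simplifications here are assumptions for the sketch: items are small integers encoded directly as squares modulo a toy safe prime p = 2039 (the slides hash the values instead), and both parties run in one function.

```python
import random

P = 2039          # toy safe prime: Q = (P - 1) // 2 = 1019 is also prime
Q = (P - 1) // 2

def encode(item):
    """Map an integer item in 1..Q to a distinct quadratic residue modulo P."""
    if not 1 <= item <= Q:
        raise ValueError("item must be in 1..Q for this toy encoding")
    return pow(item, 2, P)

def f(key, x):
    """Commutative encryption f_key(x) = x^key mod P."""
    return pow(x, key, P)

def intersection_size(R, S, rng):
    """Return |R ∩ S| as R would learn it, without identifying the members."""
    r = rng.randrange(1, Q)  # R's secret key
    s = rng.randrange(1, Q)  # S's secret key
    fs_S = sorted(f(s, encode(v)) for v in S)  # S -> R
    fr_R = sorted(f(r, encode(v)) for v in R)  # R -> S
    # S -> R: doubly encrypted values, re-sorted so that R cannot line them
    # up with the values it sent (this replaces the <y, f_s(y)> pairs).
    fs_fr_R = sorted(f(s, y) for y in fr_R)
    # R: encrypt S's values with r and count the matches.
    fr_fs_S = {f(r, y) for y in fs_S}
    return len(fr_fs_S.intersection(fs_fr_R))

size = intersection_size({1, 2, 3, 4}, {3, 4, 5}, random.Random(0))
```

R ends up with two sets of doubly encrypted values and can count the overlap, but has no pairing that ties a matching value back to a particular x.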

slide-54
SLIDE 54

Equi Join and Join Size

  • See SIGMOD 2003 paper
  • Also gives the cost analysis of protocols

slide-55
SLIDE 55

Related Work

  • [NP99]: Protocols for list intersection problem
      – Oblivious evaluation of n polynomials of degree n each.
      – Oblivious evaluation of $n^2$ polynomials.
  • [HFH99]: find people with common preferences, without revealing the preferences.
      – Intersection protocols are similar to ours, but do not provide proofs of security.

slide-56
SLIDE 56

Challenges

  • Models of minimal disclosure and corresponding protocols for
      – other database operations
      – combination of operations
  • Faster protocols
  • Tradeoff between efficiency and
      – the additional information disclosed
      – approximation

slide-57
SLIDE 57

Closing Thoughts

  • Solutions to complex problems such as privacy require a mix of legislations, societal norms, market forces & technology
  • By advancing technology, we can change the mix and improve the overall quality of the solution
  • Gold mine of challenging research problems (besides being useful)!

slide-58
SLIDE 58

References

http://www.almaden.ibm.com/software/quest/

  • M. Bawa, R. Bayardo, R. Agrawal. Privacy-preserving indexing of Documents on the Network. 29th Int'l Conf. on Very Large Databases (VLDB), Berlin, Sept. 2003.
  • R. Agrawal, A. Evfimievski, R. Srikant. Information Sharing Across Private Databases. ACM Int’l Conf. On Management of Data (SIGMOD), San Diego, California, June 2003.
  • A. Evfimievski, J. Gehrke, R. Srikant. Limiting Privacy Breaches in Privacy Preserving Data Mining. PODS, San Diego, California, June 2003.
  • R. Agrawal, J. Kiernan, R. Srikant, Y. Xu. An XPath Based Preference Language for P3P. 12th Int'l World Wide Web Conf. (WWW), Budapest, Hungary, May 2003.
  • R. Agrawal, J. Kiernan, R. Srikant, Y. Xu. Implementing P3P Using Database Technology. 19th Int'l Conf. on Data Engineering (ICDE), Bangalore, India, March 2003.
  • R. Agrawal, J. Kiernan, R. Srikant, Y. Xu. Server Centric P3P. W3C Workshop on the Future of P3P, Dulles, Virginia, Nov. 2002.
  • R. Agrawal, J. Kiernan, R. Srikant, Y. Xu. Hippocratic Databases. 28th Int'l Conf. on Very Large Databases (VLDB), Hong Kong, August 2002.
  • R. Agrawal, J. Kiernan. Watermarking Relational Databases. 28th Int'l Conf. on Very Large Databases (VLDB), Hong Kong, August 2002. Expanded version in VLDB Journal 2003.
  • A. Evfimievski, R. Srikant, R. Agrawal, J. Gehrke. Mining Association Rules Over Privacy Preserving Data. 8th Int'l Conf. on Knowledge Discovery in Databases and Data Mining (KDD), Edmonton, Canada, July 2002.
  • R. Agrawal, R. Srikant. Privacy Preserving Data Mining. ACM Int’l Conf. On Management of Data (SIGMOD), Dallas, Texas, May 2000.