What is (and is not) Privacy? CompSci 590.03



SLIDE 1

What is (and is not) Privacy?

CompSci 590.03
Instructor: Ashwin Machanavajjhala

Lecture 3: 590.03 Fall 16

SLIDE 2

Outline of lecture

  • Recap: Differential Privacy
  • Exercise: Differentially Private K-means Clustering
      – Consistency
  • Privacy Problem Statement
  • What privacy is not …
  • What is privacy?

SLIDE 3

Differential Privacy

For every pair of inputs D1, D2 that differ in one row, and for every output O, the adversary should not be able to distinguish between any D1 and D2 based on any O:

$$\log \frac{\Pr[A(D_1) = O]}{\Pr[A(D_2) = O]} \le \varepsilon \quad (\varepsilon > 0)$$

[Dwork ICALP 2006]

SLIDE 4

Privacy Parameter ε

For every pair of inputs D1, D2 that differ in one row, and for every output O:

$$\Pr[A(D_1) = O] \le e^{\varepsilon} \, \Pr[A(D_2) = O]$$

ε controls the degree to which D1 and D2 can be distinguished. The smaller ε is, the stronger the privacy (and the worse the utility, since more noise is needed).

SLIDE 5

Laplace Mechanism

[Figure: the researcher sends query q to the database; instead of the true answer q(D), the database returns q(D) + η, where η is noise drawn from the Laplace distribution Lap(λ). The plot shows the Lap(λ) density for η between -10 and 10.]

$$h(\eta) \propto \exp(-|\eta| / \lambda)$$

Privacy depends on the λ parameter.
Mean: 0, Variance: 2λ²

SLIDE 6

How much noise for privacy?

Sensitivity: Consider a query q: I → R. S(q) is the smallest number s.t. for any neighboring tables D, D',

    |q(D) – q(D')| ≤ S(q)

Thm: If the sensitivity of the query is S, then adding Laplace noise with the following λ guarantees ε-differential privacy.

    λ = S/ε

[Dwork et al., TCC 2006]
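The theorem above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the function names `laplace_noise` and `laplace_mechanism` are hypothetical, and the sketch assumes the caller supplies the correct sensitivity S for the query. Laplace noise is drawn as the difference of two exponential samples, which has the Lap(λ) distribution.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Lap(scale) as the difference of two i.i.d. exponentials."""
    # random.expovariate takes the rate (1 / mean), so the mean of each draw is `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(true_answer, sensitivity, epsilon, rng):
    """Return q(D) + eta with eta ~ Lap(S / epsilon)."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    scale = sensitivity / epsilon  # lambda = S / epsilon
    return true_answer + laplace_noise(scale, rng)


# Hypothetical usage: a counting query (sensitivity 1) whose true answer is 120.
rng = random.Random(0)
noisy_count = laplace_mechanism(120.0, sensitivity=1.0, epsilon=0.5, rng=rng)
```

With S = 1 and ε = 0.5 the noise scale is λ = 2, so the variance of the added noise is 2λ² = 8.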

SLIDE 7

Sequential Composition

  • If M1, M2, ..., Mk are algorithms that access a private database D such that each Mi satisfies εi-differential privacy,
    then the combination of their outputs satisfies ε-differential privacy with ε = ε1 + ... + εk

SLIDE 8

Parallel Composition

  • If M1, M2, ..., Mk are algorithms that access disjoint databases D1, D2, …, Dk such that each Mi satisfies εi-differential privacy,
    then the combination of their outputs satisfies ε-differential privacy with ε = max{ε1, ..., εk}

SLIDE 9

Postprocessing

  • If M1 is an ε-differentially private algorithm that accesses a private database D,
    then outputting M2(M1(D)) also satisfies ε-differential privacy.
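The two composition rules amount to simple budget arithmetic. The sketch below is illustrative only: the function names and the example ε values are assumptions, not part of the lecture.

```python
def sequential_composition(epsilons):
    """Total epsilon when every mechanism accesses the same private database D."""
    return sum(epsilons)


def parallel_composition(epsilons):
    """Total epsilon when the mechanisms access disjoint databases D1, ..., Dk."""
    return max(epsilons)


# Hypothetical workload: three queries answered on the same table.
same_table = [0.1, 0.2, 0.2]
total_sequential = sequential_composition(same_table)

# Hypothetical workload: one noisy count per disjoint bucket of the data.
disjoint_buckets = [0.5, 0.5, 0.5]
total_parallel = parallel_composition(disjoint_buckets)
```

Postprocessing adds nothing to either total: any function applied to an already released output costs no additional ε.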

SLIDE 10

Outline of lecture

  • Recap: Differential Privacy
  • Exercise: Differentially Private K-means Clustering
      – Consistency
  • Privacy Problem Statement
  • What privacy is not …
  • What is privacy?

SLIDE 11

Case Study: K-means Clustering

SLIDE 12

K-means

  • Partition a set of points x1, x2, …, xn into k clusters S1, S2, …, Sk such that the following is minimized:

$$\sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - \mu_i \rVert^2$$

    where μi is the mean of the cluster Si.

SLIDE 13

K-means

Algorithm:

  • Initialize a set of k centers
  • Repeat
      Assign each point to its nearest center
      Recompute the set of centers
    Until convergence …
  • Output final set of k centers

Tutorial: Differential Privacy in the Wild, Module 2
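The (non-private) algorithm above can be sketched as follows. This is a minimal illustration under stated assumptions: points are tuples of floats, the initial centers are supplied by the caller, `max_iters` is a hypothetical safety cap, and an empty cluster simply keeps its previous center.

```python
import math


def kmeans(points, centers, max_iters=100):
    """Lloyd's iteration: assign each point to its nearest center, then recompute centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        new_centers = []
        for old_center, members in zip(centers, clusters):
            if members:
                # Coordinate-wise mean of the cluster members.
                new_centers.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centers.append(old_center)
        if new_centers == centers:  # converged: centers no longer move
            break
        centers = new_centers
    return centers


# Hypothetical toy data: two well-separated groups of two points each.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
final_centers = kmeans(toy_points, centers=[(0.0, 0.0), (10.0, 10.0)])
```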

SLIDE 14

Differentially Private K-means

  • Suppose we fix the number of iterations to T
  • In each iteration (given a set of centers):
      1. Assign each point to its nearest center to form clusters
      2. Noisily compute the size of each cluster
      3. Compute noisy sums of points in each cluster

[BDMN05]

SLIDE 15

Differentially Private K-means

  • Suppose we fix the number of iterations to T
  • In each iteration (given a set of centers):
      1. Assign each point to its nearest center to form clusters
      2. Noisily compute the size of each cluster
      3. Compute noisy sums of points in each cluster

Each iteration uses ε/T privacy budget, total privacy loss is ε

SLIDE 16

Differentially Private K-means

  • Suppose we fix the number of iterations to T
  • In each iteration (given a set of centers):
      1. Assign each point to its nearest center to form clusters
      2. Noisily compute the size of each cluster
      3. Compute noisy sums of points in each cluster

Exercise: Which of these steps expends privacy budget?

SLIDE 17

Differentially Private K-means

  • Suppose we fix the number of iterations to T
  • In each iteration (given a set of centers):
      1. Assign each point to its nearest center to form clusters (NO)
      2. Noisily compute the size of each cluster (YES)
      3. Compute noisy sums of points in each cluster (YES)

Exercise: Which of these steps expends privacy budget? Answers are shown next to each step.

SLIDE 18

Differentially Private K-means

  • Suppose we fix the number of iterations to T
  • In each iteration (given a set of centers):
      1. Assign each point to its nearest center to form clusters
      2. Noisily compute the size of each cluster (sensitivity: 1)
      3. Compute noisy sums of points in each cluster (sensitivity: domain size)

What is the sensitivity? Answers are shown next to steps 2 and 3.

SLIDE 19

Differentially Private K-means

  • Suppose we fix the number of iterations to T
  • In each iteration (given a set of centers):
      1. Assign each point to its nearest center to form clusters
      2. Noisily compute the size of each cluster: add Laplace(2T/ε) noise
      3. Compute noisy sums of points in each cluster: add Laplace(2T |dom|/ε) noise

Each iteration uses ε/T privacy budget, total privacy loss is ε
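One way to read the noise scales above: each iteration's ε/T is split evenly between the size query (sensitivity 1) and the sum query (sensitivity |dom|), giving Lap(2T/ε) and Lap(2T|dom|/ε). The sketch below follows that reading for one-dimensional points in [0, dom_size]. It is an illustration under those assumptions, not the [BDMN05] implementation; the names `dp_kmeans_1d` and `laplace` are hypothetical, and clamping or skipping an update based only on noisy values is postprocessing.

```python
import random


def laplace(scale, rng):
    """Sample Lap(scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_kmeans_1d(points, centers, epsilon, T, dom_size, rng):
    """Run T noisy k-means iterations on points assumed to lie in [0, dom_size]."""
    centers = list(centers)
    count_scale = 2.0 * T / epsilon            # Laplace(2T / eps) for cluster sizes
    sum_scale = 2.0 * T * dom_size / epsilon   # Laplace(2T |dom| / eps) for cluster sums
    for _ in range(T):
        sizes = [0] * len(centers)
        sums = [0.0] * len(centers)
        # Step 1: assignment touches the data but releases nothing, so it costs no budget.
        for x in points:
            i = min(range(len(centers)), key=lambda j: abs(x - centers[j]))
            sizes[i] += 1
            sums[i] += x
        # Steps 2 and 3: release noisy sizes and noisy sums, then divide.
        for i in range(len(centers)):
            noisy_size = sizes[i] + laplace(count_scale, rng)
            noisy_sum = sums[i] + laplace(sum_scale, rng)
            if noisy_size >= 1.0:
                centers[i] = min(max(noisy_sum / noisy_size, 0.0), dom_size)
    return centers


# Hypothetical data: two large, well-separated clusters in the domain [0, 10].
data = [1.0] * 5000 + [9.0] * 5000
private_centers = dp_kmeans_1d(data, [2.0, 8.0], epsilon=1.0, T=5, dom_size=10.0,
                               rng=random.Random(0))
```

Because the sum noise grows with |dom|, small clusters are dominated by noise while large clusters like these are barely affected.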

SLIDE 20

Results (T = 10 iterations, random initialization)

[Figure: clusters found by the original K-means algorithm and by the Laplace K-means algorithm]

  • Even though we noisily compute centers, Laplace k-means can distinguish clusters that are far apart.
  • Since we add noise to the sums with sensitivity proportional to |dom|, Laplace k-means can't distinguish small clusters that are close by.

SLIDE 21

Consistency

SLIDE 22

Outline of lecture

  • Recap: Differential Privacy
  • Exercise: Differentially Private K-means Clustering
      – Consistency
  • Privacy Problem Statement
  • What privacy is not …
  • What is privacy?

SLIDE 23

Statistical Databases

[Figure: data flows from individuals to data collectors to data analysts]

  • Individuals with sensitive data: Person 1 (r1), Person 2 (r2), Person 3 (r3), …, Person N (rN)
  • Data Collectors: Google DB, Census DB, Hospital DB
  • Data Analysts: Doctors, Medical Researchers, Economists, Information Retrieval Researchers, Recommendation Algorithms

SLIDE 24

Statistical Database Privacy

[Figure: Person 1 … Person N contribute records r1 … rN to the server's DB; the analyst submits a function and receives its output]

  • Function provided by the analyst
  • Output can disclose sensitive information about individuals

SLIDE 25

Statistical Database Privacy

[Figure: Person 1 … Person N contribute records r1 … rN to the server's DB]

Privacy for individuals (controlled by a parameter ε)

SLIDE 26

Statistical Database Privacy

[Figure: Person 1 … Person N contribute records r1 … rN to the server's DB]

Utility for analyst

SLIDE 27

Statistical Database Privacy (untrusted collector)

[Figure: Person 1 … Person N send records r1 … rN to the server, which computes f( … ) over its DB]

  • Individuals do not want server to infer their records
  • Server wants to compute f

SLIDE 28

Statistical Database Privacy (untrusted collector)

[Figure: Person 1 … Person N send perturbed records to the server, which computes f( … ) over the perturbed database DB*]

Perturb records to ensure privacy for individuals and utility for server

SLIDE 29

Statistical Databases in real-world applications

Application | Data Collector | Private Information | Analyst | Function (utility)
Medical | Hospital | Disease | Epidemiologist | Correlation between disease and geography
Genome analysis | Hospital | Genome | Statistician/Researcher | Correlation between genome and disease
Advertising | Google/FB/Y! | Clicks/Browsing | Advertiser | Number of clicks on an ad by age/region/gender …
Social Recommendations | Facebook | Friend links / profile | Another user | Recommend other users or ads to users based on social network

SLIDE 30

Statistical Databases in real-world applications

  • Settings where data collector may not be trusted

Application | Data Collector | Private Information | Function (utility)
Location Services | Verizon/AT&T | Location | Traffic prediction
Recommendations | Amazon/Google | Purchase history | Recommendation model
Traffic Shaping | Internet Service Provider | Browsing history | Traffic pattern of groups of users

SLIDE 31

Privacy is not …

SLIDE 32

Statistical Database Privacy is not …

  • Encryption:

SLIDE 33

Statistical Database Privacy is not …

  • Encryption:
    Alice sends a message to Bob such that Trudy (attacker) does not learn the message. Bob should get the correct message …
  • Statistical Database Privacy:
    Bob (attacker) can access a database
      – Bob must learn aggregate statistics, but
      – Bob must not learn new information about individuals in database.

SLIDE 34

Statistical Database Privacy is not …

  • Computation on Encrypted Data:

SLIDE 35

Statistical Database Privacy is not …

  • Computation on Encrypted Data:
      – Alice stores encrypted data on a server controlled by Bob (attacker).
      – Server returns correct query answers to Alice, without Bob learning anything about the data.
  • Statistical Database Privacy:
      – Bob is allowed to learn aggregate properties of the database.

SLIDE 36

Statistical Database Privacy is not …

  • The Millionaires Problem:

SLIDE 37

Statistical Database Privacy is not …

  • Secure Multiparty Computation:
      – A set of agents each having a private input xi …
      – … Want to compute a function f(x1, x2, …, xk)
      – Each agent can learn the true answer, but must learn no other information than what can be inferred from their private input and the answer.
  • Statistical Database Privacy:
      – Function output must not disclose individual inputs.

SLIDE 38

Statistical Database Privacy is not …

  • Access Control:

SLIDE 39

Statistical Database Privacy is not …

  • Access Control:
      – A set of agents want to access a set of resources (could be files or records in a database)
      – Access control rules specify who is allowed to access (or not access) certain resources.
      – 'Not access' usually means no information must be disclosed
  • Statistical Database:
      – A single database and a single agent
      – Want to release aggregate statistics about a set of records without allowing access to individual records

SLIDE 40

Privacy Problems

  • In today's cloud context a number of privacy problems arise:
      – Encryption when communicating data across an insecure channel
      – Secure Multiparty Computation when different parties want to compute a function on their private data without using a centralized third party
      – Computing on encrypted data when one wants to use an insecure cloud for computation
      – Access control when different users own different parts of the data
  • Statistical Database Privacy:
    Quantifying (and bounding) the amount of information disclosed about individual records by the output of a valid computation.

SLIDE 41

What is privacy?

SLIDE 42

The Massachusetts Governor Privacy Breach

  • Medical Data Release: Name, SSN, Visit Date, Diagnosis, Procedure, Medication, Total Charge, Zip, Birth date, Sex

[S02]

SLIDE 43

The Massachusetts Governor Privacy Breach

  • Medical Data Release: Name, SSN, Visit Date, Diagnosis, Procedure, Medication, Total Charge
  • Voter List: Name, Address, Date Registered, Party affiliation, Date last voted
  • Common to both: Zip, Birth date, Sex

[S02]

SLIDE 44

Linkage Attack

  • Medical Data Release: Name, SSN, Visit Date, Diagnosis, Procedure, Medication, Total Charge
  • Voter List: Name, Address, Date Registered, Party affiliation, Date last voted
  • Common to both: Zip, Birth date, Sex
  • Governor of MA uniquely identified using ZipCode, Birth Date, and Sex.
    Name linked to Diagnosis

[S02]

SLIDE 45

Linkage Attack

  • Medical Data Release: Name, SSN, Visit Date, Diagnosis, Procedure, Medication, Total Charge
  • Voter List: Name, Address, Date Registered, Party affiliation, Date last voted
  • Common to both: Zip, Birth date, Sex
  • Governor of MA uniquely identified using ZipCode, Birth Date, and Sex.
    Quasi-identifier: the combination (ZipCode, Birth Date, Sex) uniquely identifies 87% of the US population

[S02]
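A linkage attack is essentially a join on the quasi-identifier. The sketch below is illustrative only: every record in it is invented toy data (not from [S02]), and the function name `linkage_attack` is hypothetical. It links a medical row to a voter only when exactly one voter shares the row's (zip, birth date, sex).

```python
QUASI_IDENTIFIER = ("zip", "birth_date", "sex")


def linkage_attack(medical_rows, voter_rows):
    """Return (name, diagnosis) pairs for medical rows that match exactly one voter."""
    voters_by_qi = {}
    for voter in voter_rows:
        key = tuple(voter[a] for a in QUASI_IDENTIFIER)
        voters_by_qi.setdefault(key, []).append(voter)
    linked = []
    for row in medical_rows:
        matches = voters_by_qi.get(tuple(row[a] for a in QUASI_IDENTIFIER), [])
        if len(matches) == 1:  # unique match: the name is linked to the diagnosis
            linked.append((matches[0]["name"], row["diagnosis"]))
    return linked


# Invented toy data: names and identifiers removed from the medical release.
medical = [
    {"zip": "02138", "birth_date": "1950-01-01", "sex": "M", "diagnosis": "Diagnosis A"},
    {"zip": "02139", "birth_date": "1980-05-05", "sex": "F", "diagnosis": "Diagnosis B"},
    {"zip": "02139", "birth_date": "1975-03-03", "sex": "F", "diagnosis": "Diagnosis C"},
]
voters = [
    {"name": "Voter One", "zip": "02138", "birth_date": "1950-01-01", "sex": "M"},
    {"name": "Voter Two", "zip": "02139", "birth_date": "1980-05-05", "sex": "F"},
    {"name": "Voter Three", "zip": "02139", "birth_date": "1980-05-05", "sex": "F"},
]
linked_pairs = linkage_attack(medical, voters)
```

In this toy data only the first medical row is re-identified: the second matches two voters and the third matches none.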

SLIDE 46

Privacy Breach: Informal Definition

A privacy mechanism M(D) that allows an unauthorized party to learn sensitive information about any individual in D, which the party could not have learnt without access to M(D).

SLIDE 47

Alice

Alice has Cancer

Is this a privacy breach? NO

SLIDE 48

Privacy Breach: Revised Definition

A privacy mechanism M(D) that allows an unauthorized party to learn sensitive information about any individual Alice in D, which the party could not have learnt without access to M(D) if Alice was not in the dataset.

SLIDE 49

K-Anonymity: Avoiding Linkage Attacks

  • If every row corresponds to one individual …
    … every row should look like k-1 other rows based on the quasi-identifier attributes

[S02]

SLIDE 50

K-Anonymity

Original table:

Zip | Age | Nationality | Disease
13053 | 28 | Russian | Heart
13068 | 29 | American | Heart
13068 | 21 | Japanese | Cancer
13053 | 23 | American | Cancer
14853 | 50 | Indian | Cancer
14853 | 55 | Russian | Heart
14850 | 47 | American | Flu
14850 | 59 | American | Flu
13053 | 31 | American | Cancer
13053 | 37 | Indian | Cancer
13068 | 36 | Japanese | Cancer
13068 | 32 | American | Cancer

Generalized table:

Zip | Age | Nationality | Disease
130** | <30 | * | Heart
130** | <30 | * | Heart
130** | <30 | * | Cancer
130** | <30 | * | Cancer
1485* | >40 | * | Cancer
1485* | >40 | * | Heart
1485* | >40 | * | Flu
1485* | >40 | * | Flu
130** | 30-40 | * | Cancer
130** | 30-40 | * | Cancer
130** | 30-40 | * | Cancer
130** | 30-40 | * | Cancer
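The k-anonymity condition can be checked by grouping rows on the quasi-identifier columns and looking at the smallest group. The sketch below uses the two tables from this slide; the function name `is_k_anonymous` is a hypothetical helper, and (Zip, Age, Nationality) is assumed to be the quasi-identifier.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifier_columns, k):
    """True if every combination of quasi-identifier values appears in at least k rows."""
    groups = Counter(tuple(row[c] for c in quasi_identifier_columns) for row in rows)
    return all(size >= k for size in groups.values())


# Columns: Zip, Age, Nationality, Disease (values copied from the slide).
original = [
    ("13053", "28", "Russian", "Heart"), ("13068", "29", "American", "Heart"),
    ("13068", "21", "Japanese", "Cancer"), ("13053", "23", "American", "Cancer"),
    ("14853", "50", "Indian", "Cancer"), ("14853", "55", "Russian", "Heart"),
    ("14850", "47", "American", "Flu"), ("14850", "59", "American", "Flu"),
    ("13053", "31", "American", "Cancer"), ("13053", "37", "Indian", "Cancer"),
    ("13068", "36", "Japanese", "Cancer"), ("13068", "32", "American", "Cancer"),
]
generalized = (
    [("130**", "<30", "*", d) for d in ("Heart", "Heart", "Cancer", "Cancer")]
    + [("1485*", ">40", "*", d) for d in ("Cancer", "Heart", "Flu", "Flu")]
    + [("130**", "30-40", "*", "Cancer")] * 4
)

QI_COLUMNS = (0, 1, 2)
generalized_is_4_anonymous = is_k_anonymous(generalized, QI_COLUMNS, 4)
original_is_2_anonymous = is_k_anonymous(original, QI_COLUMNS, 2)
```

The generalized table passes for k = 4 because each of its three groups has four rows, while every row of the original table is unique on the quasi-identifier.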

SLIDE 51

Problem: Background knowledge

Zip | Age | Nationality | Disease
130** | <30 | * | Heart
130** | <30 | * | Heart
130** | <30 | * | Cancer
130** | <30 | * | Cancer
1485* | >40 | * | Cancer
1485* | >40 | * | Heart
1485* | >40 | * | Flu
1485* | >40 | * | Flu
130** | 30-40 | * | Cancer
130** | 30-40 | * | Cancer
130** | 30-40 | * | Cancer
130** | 30-40 | * | Cancer

Adversary has prior knowledge about Umeko:

Name | Zip | Age | Nat.
Umeko | 13053 | 25 | Japan

Umeko falls in the (130**, <30) group, whose diseases are Heart and Cancer. Knowing that heart disease is rare among Japanese patients, the adversary rules out Heart.

Adversary learns Umeko has Cancer

[MKGV06]

SLIDE 52

A privacy mechanism must be able to protect individuals' privacy from attackers who may possess background knowledge

SLIDE 53

Healthcare Cost and Utilization Project

SLIDE 54

Hospital discharges in NJ of ovarian cancer patients, 2009

Age | #discharges | White | Black | Hispanic | Asian/Pacific Islander | Native American | Other | Missing
#discharges | 735 | 535 | 82 | 58 | 18 | * | 19 | 22
1-17 | * | * | * | * | * | * | * | *
18-44 | 70 | 40 | 13 | * | * | * | * | *
45-64 | 330 | 236 | 31 | 32 | * | * | 11 | *
65-84 | 298 | 229 | 35 | 13 | * | * | * | *
85+ | 34 | 29 | * | * | * | * | * | *

Counts less than k are suppressed, achieving k-anonymity

SLIDE 55

Hospital discharges in NJ of ovarian cancer patients, 2009

Age | #discharges | White | Black | Hispanic | Asian/Pacific Islander | Native American | Other | Missing
#discharges | 735 | 535 | 82 | 58 | 18 | 1 | 19 | 22
1-17 | 3 | 1 | * | * | * | * | * | *
18-44 | 70 | 40 | 13 | * | * | * | * | *
45-64 | 330 | 236 | 31 | 32 | * | * | 11 | *
65-84 | 298 | 229 | 35 | 13 | * | * | * | *
85+ | 34 | 29 | * | * | * | * | * | *

White, age 1-17: 1 = 535 – (40 + 236 + 229 + 29)
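The subtraction above generalizes to any row or column with a published total and exactly one suppressed cell. A minimal sketch, using the counts from the slide (the helper name `recover_single_suppressed` is hypothetical and `None` stands for a suppressed "*" cell):

```python
def recover_single_suppressed(total, cells):
    """If exactly one cell is suppressed (None), return (label, total - sum of known cells)."""
    missing = [label for label, value in cells.items() if value is None]
    if len(missing) != 1:
        return None  # zero or several suppressed cells: this simple subtraction does not apply
    known = sum(value for value in cells.values() if value is not None)
    return missing[0], total - known


# "White" column: published total 535, age group 1-17 suppressed.
white_by_age = {"1-17": None, "18-44": 40, "45-64": 236, "65-84": 229, "85+": 29}
white_1_17 = recover_single_suppressed(535, white_by_age)

# "#discharges" column: published total 735, age group 1-17 suppressed.
all_by_age = {"1-17": None, "18-44": 70, "45-64": 330, "65-84": 298, "85+": 34}
all_1_17 = recover_single_suppressed(735, all_by_age)

# Totals row: overall total 735, "Native American" suppressed.
totals_by_race = {"White": 535, "Black": 82, "Hispanic": 58, "Asian/Pacific Islander": 18,
                  "Native American": None, "Other": 19, "Missing": 22}
native_american_total = recover_single_suppressed(735, totals_by_race)
```

These three recovered cells are the values 1, 3, and 1 filled into the table on this slide.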

SLIDE 56

Hospital discharges in NJ of ovarian cancer patients, 2009

Age | #discharges | White | Black | Hispanic | Asian/Pacific Islander | Native American | Other | Missing
#discharges | 735 | 535 | 82 | 58 | 18 | 1 | 19 | 22
1-17 | 3 | 1 | [0-2] | [0-2] | [0-2] | [0-2] | [0-2] | [0-2]
18-44 | 70 | 40 | 13 | * | * | * | * | *
45-64 | 330 | 236 | 31 | 32 | * | * | 11 | *
65-84 | 298 | 229 | 35 | 13 | * | * | * | *
85+ | 34 | 29 | * | * | * | * | * | *

SLIDE 57

Hospital discharges in NJ of ovarian cancer patients, 2009

Age | #discharges | White | Black | Hispanic | Asian/Pacific Islander | Native American | Other | Missing
#discharges | 735 | 535 | 82 | 58 | 18 | 1 | 19 | 22
1-17 | 3 | 1 | [0-2] | [0-2] | [0-2] | [0-2] | [0-2] | [0-2]
18-44 | 70 | 40 | 13 | * | * | * | * | *
45-64 | 330 | 236 | 31 | 32 | * | * | 11 | *
65-84 | 298 | 229 | 35 | 13 | * | * | * | *
85+ | 34 | 29 | [1-3] | * | * | * | * | *

SLIDE 58

Can reconstruct tight bounds on rest of data

Age | #discharges | White | Black | Hispanic | Asian/Pacific Islander | Native American | Other | Missing
#discharges | 735 | 535 | 82 | 58 | 18 | 1 | 19 | 22
1-17 | 3 | 1 | [0-2] | [0-2] | [0-1] | [0] | [0-1] | [0-1]
18-44 | 70 | 40 | 13 | [9-10] | [0-6] | [0] | [0-6] | [1-8]
45-64 | 330 | 236 | 31 | 32 | [10] | [0] | 11 | [10]
65-84 | 298 | 229 | 35 | 13 | [2-8] | [1] | [2-8] | [4-10]
85+ | 34 | 29 | [1-3] | [1-4] | [0-1] | [0] | [0-1] | [0-1]

[VSJO13]

SLIDE 59

Multiple Release problem

  • Privacy preserving access to data must necessarily release some information about individual records (to ensure utility)
  • However, k-anonymous algorithms can reveal individual level information even with two releases.

SLIDE 60

A bound on the number of queries

  • In order to ensure utility, a statistical database must leak some information about each individual
  • We can only hope to bound the amount of disclosure
  • Hence, there is a limit on number of queries that can be answered

Lecture 2: 590.03 Fall 16

SLIDE 61

Dinur Nissim Result

  • A vast majority of records in a database of size n can be reconstructed when n log² n queries are answered by a statistical database …
    … even if each answer has been arbitrarily altered to have up to o(√n) error.

[Dinur-Nissim PODS 2003]

SLIDE 62

Outline of Dinur Nissim Result

  • Baseline for Privacy: Blatant Non-Privacy
  • Exponential Time Adversaries
  • Polynomial Time Adversaries

Lecture 7: 590.03 Fall 13

SLIDE 63

Model

  • Database of bits:
  • Queries: Subset sums
      – Consider
  • Perturbed Answer returned by a private algorithm:
      – Error:

SLIDE 64

Blatant Non-Privacy

  • dist(c, d) = Hamming distance
               = number of positions where databases c and d differ.
  • neg(n):
  • Meaning of the definition:
    A database d along with a perturbed access mechanism A is t(n)-non-private if an attacker can "decode" the database with high probability using query-(perturbed) answer pairs in t(n) time.

SLIDE 65

Outline of Dinur Nissim Result

  • Baseline for Privacy: Blatant Non-Privacy
  • Exponential Time Adversaries
  • Polynomial Time Adversaries

SLIDE 66

Exponential Time Adversary

Exponential number of query, answer pairs

SLIDE 67

Exponential Time Adversary

Attack always terminates (why?)

  • Algorithm considers all databases in the weeding phase.
  • Original database d is never weeded out.

SLIDE 68

Exponential Time Adversary

Database c would not have passed the weeding phase

SLIDE 69

Exponential Time Adversary

With an exponential number of queries, an adversary can reconstruct the entire database even if error in each query is o(n)
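The exponential-time attack can be sketched for a tiny database. The model used here is an assumption based on the cited Dinur-Nissim paper, since the slide's own notation did not survive: the database is a string of n bits, a query is a subset of positions, the true answer is the number of 1-bits in the subset, and every perturbed answer is within `max_error` of the truth. All function names are hypothetical. Databases and queries are encoded as integer bitmasks.

```python
import random


def subset_sum(db_bits, query_mask):
    """Number of 1-bits of the database inside the queried subset."""
    return bin(db_bits & query_mask).count("1")


def perturbed_answers(d, n, max_error, rng):
    """Answer every subset-sum query with integer noise bounded by max_error."""
    return {q: subset_sum(d, q) + rng.randint(-max_error, max_error) for q in range(2 ** n)}


def exponential_attack(answers, n, max_error):
    """Weeding phase: return the first candidate consistent with every perturbed answer."""
    for candidate in range(2 ** n):
        if all(abs(subset_sum(candidate, q) - a) <= max_error for q, a in answers.items()):
            return candidate
    return None  # not reached when every answer is within max_error: d itself survives


def hamming(c, d):
    """Number of positions where databases c and d differ."""
    return bin(c ^ d).count("1")


n, max_error = 8, 1
secret = 0b10110010
answers = perturbed_answers(secret, n, max_error, random.Random(0))
guess = exponential_attack(answers, n, max_error)
```

The attack terminates because the true database is never weeded out, and any surviving candidate differs from it in at most 4 × `max_error` positions.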

SLIDE 70

Exponential Time Adversary

  • What about Θ(n) error?
  • Error = n/2
      – Trivial …
      – Always answer n/2
      – No utility
  • Error = n/40
      – Hint: Using the proof of the theorem …
      – Can reconstruct 9/10 of the database!
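The n/40 case can be worked out from the bound used in the Dinur-Nissim proof. As an assumption carried over from that paper (the theorem statement is not reproduced on these slides): if every perturbed answer is within E of the true subset sum, any candidate database c that survives the weeding phase satisfies dist(c, d) ≤ 4E.

```latex
\begin{align*}
\mathrm{dist}(c, d) &\le 4E \\
E = \frac{n}{40} \;&\Longrightarrow\; \mathrm{dist}(c, d) \le \frac{4n}{40} = \frac{n}{10}
\end{align*}
```

So c agrees with d on at least 9/10 of the positions. For E = n/2 the bound is vacuous (4E = 2n ≥ n), which matches the observation that a mechanism always answering n/2 reveals nothing.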

SLIDE 71

Summary of Exponential Adversary

  • An adversary who can ask all queries can reconstruct a large fraction of the database with probability 1.
  • What if the adversary is only allowed to ask a small set of queries?

SLIDE 72

Outline

  • Baseline for Privacy: Blatant Non-Privacy
  • Exponential Time Adversaries
  • Polynomial Time Adversaries

SLIDE 73

Polynomial Time Adversaries

SLIDE 74

Polynomial Time Adversaries

With n log² n queries, an adversary can reconstruct the entire database even if error in each query is o(√n)

SLIDE 75

Summary of negative results

  • Attackers can ask multiple questions to the database to learn sensitive information, even when each query answer is perturbed
  • General result
      – Perturbation need not be independent for each query (no assumption on how noise is infused)
      – Subset sum queries are quite general. Just use a random set of queries …
      – Both exponential time and polynomial time attacks
  • Need to think of privacy as a budget-constrained problem
      – Given a perturbation level, there is an upper bound on the number of queries that can be answered.
      – Once the limit is reached, no more queries can be answered

SLIDE 76

A privacy mechanism must satisfy composition …

… or allow a graceful degradation of privacy with multiple invocations on the same data.

[DN03, GKS08]

SLIDE 77

Postprocessing the output of a privacy mechanism must not change the privacy guarantee

[KL10, MK15]

SLIDE 78

Privacy must not be achieved through obscurity.

Attacker must be assumed to know the algorithm used as well as all parameters

SLIDE 79

Simulatability

"The enemy knows the system", Claude Shannon

Lecture 6: 590.03 Fall 13

SLIDE 80

Query Auditing

Database has numeric values (say salaries of employees).
Should not release the exact value of any record.

Database either truthfully answers a question or denies answering.
MIN, MAX, SUM queries over subsets of the database.

Question: When to allow/deny queries?

[Figure: the researcher sends a query to the database; if it is safe to publish (Yes) the query is answered, otherwise (No) it is denied]

SLIDE 81

Why should we deny queries?

Name | Grad | International | Sensitive value
NR | Y | Y | 1
AK | Y | N | 3
SR | N | N | 1
KL | N | N | 2
YC | Y | Y | 1
SY | Y | Y | 2
HC | Y | Y | 1

  • Q1: AK's sensitive value?
      – DENY
  • Q2: Max sensitive value of ugrads?
      – ANSWER: 2
  • Q3: Max sensitive value of US students?
      – ANSWER: 3
  • But Q3 + Q2 => AK = 3
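The inference "Q3 + Q2 => AK = 3" can be replayed in code. The sketch below copies the table from this slide and assumes the attacker knows who is a grad student and who is international (only the sensitive value is private). The helper name `infer_from_two_max_answers` is hypothetical.

```python
# name -> (grad, international, sensitive value), copied from the slide's table.
students = {
    "NR": (True, True, 1),
    "AK": (True, False, 3),
    "SR": (False, False, 1),
    "KL": (False, False, 2),
    "YC": (True, True, 1),
    "SY": (True, True, 2),
    "HC": (True, True, 1),
}


def max_over(names):
    """Truthful answer to a MAX query over the named students."""
    return max(students[name][2] for name in names)


ugrads = {name for name, (grad, _, _) in students.items() if not grad}
us_students = {name for name, (_, intl, _) in students.items() if not intl}

q2 = max_over(ugrads)       # Q2: max sensitive value of ugrads
q3 = max_over(us_students)  # Q3: max sensitive value of US students


def infer_from_two_max_answers(set_a, answer_a, set_b, answer_b):
    """If max(set_b) exceeds max(set_a), the maximum of set_b lies in set_b - set_a."""
    if answer_b > answer_a:
        candidates = set_b - set_a
        if len(candidates) == 1:
            return next(iter(candidates)), answer_b
    return None


# The attacker uses only group memberships and the two released answers.
leak = infer_from_two_max_answers(ugrads, q2, us_students, q3)
```

AK is the only US student who is not an undergraduate, so the larger maximum must be AK's value even though Q1 was denied.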

SLIDE 82

Value-Based Auditing

  • Let a1, a2, …, ak be the answers to previous queries Q1, Q2, …, Qk.
  • Let ak+1 be the answer to Qk+1.

$$a_i = f(c_{i1} x_1, c_{i2} x_2, \ldots, c_{in} x_n), \quad i = 1 \ldots k+1$$

    c_{im} = 1 if Qi depends on xm

Check if any xj has a unique solution.

SLIDE 83

Value-Based Auditing

  • Data Values: {x1, x2, x3, x4, x5}, Queries: MAX.
  • Allow query if value of xi can't be inferred.

x1  x2  x3  x4  x5

SLIDE 84

Value-Based Auditing

  • Data Values: {x1, x2, x3, x4, x5}, Queries: MAX.
  • Allow query if value of xi can't be inferred.

x1  x2  x3  x4  x5

Query: max(x1, x2, x3, x4, x5)    Ans: 10

-∞ ≤ x1 … x5 ≤ 10

SLIDE 85

Value-Based Auditing

  • Data Values: {x1, x2, x3, x4, x5}, Queries: MAX.
  • Allow query if value of xi can't be inferred.

x1  x2  x3  x4  x5

Query: max(x1, x2, x3, x4, x5)    Ans: 10
Query: max(x1, x2, x3, x4)        Ans: 8    DENY

-∞ ≤ x1 … x4 ≤ 8 => x5 = 10

SLIDE 86

Value-Based Auditing

  • Data Values: {x1, x2, x3, x4, x5}, Queries: MAX.
  • Allow query if value of xi can't be inferred.

x1  x2  x3  x4  x5

Query: max(x1, x2, x3, x4, x5)    Ans: 10
Query: max(x1, x2, x3, x4)        Ans: 8    DENY

Denial means some value can be compromised!

SLIDE 87

Value-Based Auditing

  • Data Values: {x1, x2, x3, x4, x5}, Queries: MAX.
  • Allow query if value of xi can't be inferred.

x1  x2  x3  x4  x5

Query: max(x1, x2, x3, x4, x5)    Ans: 10
Query: max(x1, x2, x3, x4)        Ans: 8    DENY

What could max(x1, x2, x3, x4) be?

SLIDE 88

Value-Based Auditing

  • Data Values: {x1, x2, x3, x4, x5}, Queries: MAX.
  • Allow query if value of xi can't be inferred.

x1  x2  x3  x4  x5

Query: max(x1, x2, x3, x4, x5)    Ans: 10
Query: max(x1, x2, x3, x4)        Ans: 8    DENY

From first answer, max(x1, x2, x3, x4) ≤ 10

SLIDE 89

Value-Based Auditing

  • Data Values: {x1, x2, x3, x4, x5}, Queries: MAX.
  • Allow query if value of xi can't be inferred.

x1  x2  x3  x4  x5

Query: max(x1, x2, x3, x4, x5)    Ans: 10
Query: max(x1, x2, x3, x4)        Ans: 8    DENY

If max(x1, x2, x3, x4) = 10, then no privacy breach

SLIDE 90

Value-Based Auditing

  • Data Values: {x1, x2, x3, x4, x5}, Queries: MAX.
  • Allow query if value of xi can't be inferred.

x1  x2  x3  x4  x5

Query: max(x1, x2, x3, x4, x5)    Ans: 10
Query: max(x1, x2, x3, x4)        Ans: 8    DENY

Hence, max(x1, x2, x3, x4) < 10 => x5 = 10!

SLIDE 91

Value-Based Auditing

  • Data Values: {x1, x2, x3, x4, x5}, Queries: MAX.
  • Allow query if value of xi can't be inferred.

x1  x2  x3  x4  x5

Query: max(x1, x2, x3, x4, x5)    Ans: 10
Query: max(x1, x2, x3, x4)        Ans: 8    DENY

Hence, max(x1, x2, x3, x4) < 10 => x5 = 10!

Denials leak information.
Attack occurred since privacy analysis did not assume that attacker knows the algorithm.

SLIDE 92

Simulatable Auditing [Kenthapadi et al PODS '05]

  • An auditor is simulatable if the decision to deny a query Qk is made based on information already available to the attacker.
      – Can use queries Q1, Q2, …, Qk and answers a1, a2, …, a_{k-1}
      – Cannot use ak or the actual data to make the decision.
  • Denials provably do not leak information
      – Because the attacker could equivalently determine whether the query would be denied.
      – Attacker can mimic or simulate the auditor.

SLIDE 93

Simulatable Auditing Algorithm

  • Data Values: {x1, x2, x3, x4, x5}, Queries: MAX.
  • Allow query if value of xi can't be inferred.

x1  x2  x3  x4  x5

Query: max(x1, x2, x3, x4, x5)    Ans: 10
Query: max(x1, x2, x3, x4)        DENY

Before computing answer:
    Ans > 10 => not possible
    Ans = 10 => -∞ ≤ x1 … x4 ≤ 10    (SAFE)
    Ans < 10 => x5 = 10              (UNSAFE)
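The case analysis above can be sketched as a small auditor for MAX queries. This is a simplified illustration, not the algorithm from the cited paper: it assumes that a value is compromised exactly when some answered query has a single element able to attain its maximum, and that it suffices to test one representative candidate answer below, at, between, and above the past answers. All names are hypothetical. The decision never looks at the data or at the true answer to the new query.

```python
def upper_bounds(history):
    """Tightest known upper bound on each x_i implied by answered MAX queries."""
    bounds = {}
    for members, answer in history:
        for i in members:
            bounds[i] = min(bounds.get(i, answer), answer)
    return bounds


def classify(history):
    """Return 'impossible', 'unsafe', or 'safe' for a list of (index set, MAX answer) pairs."""
    bounds = upper_bounds(history)
    status = "safe"
    for members, answer in history:
        attainers = [i for i in members if bounds[i] == answer]
        if not attainers:
            return "impossible"  # no element can reach the claimed maximum
        if len(attainers) == 1:
            status = "unsafe"    # exactly one element must equal the answer
    return status


def candidate_answers(history):
    """Representative answers: below, at, between, and above the past answers."""
    past = sorted({answer for _, answer in history})
    if not past:
        return [0.0]
    mids = [(a + b) / 2 for a, b in zip(past, past[1:])]
    return [past[0] - 1] + past + mids + [past[-1] + 1]


def simulatable_decision(history, new_query):
    """Deny if any candidate answer consistent with the past would compromise a value."""
    for candidate in candidate_answers(history):
        if classify(history + [(new_query, candidate)]) == "unsafe":
            return "DENY"
    return "ANSWER"


q_all = frozenset({1, 2, 3, 4, 5})
q_first_four = frozenset({1, 2, 3, 4})
first_decision = simulatable_decision([], q_all)
second_decision = simulatable_decision([(q_all, 10)], q_first_four)
```

On the slide's example the first query is answered and the second is denied before its answer is ever computed, so the denial itself tells the attacker nothing new.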

SLIDE 94

Summary of Simulatable Auditing

  • Decision to deny answers must be based on past queries answered in some (many!) cases.
  • Denials can leak information if the adversary does not know all the information that is used to decide whether to deny the query.

SLIDE 95

Summary

  • Statistical database privacy is the problem of releasing aggregates while not disclosing individual records
  • The problem is distinct from encryption, secure computation and access control.
  • Defining privacy is non-trivial
      – Desiderata include resilience to background knowledge and composition and closure under postprocessing.

SLIDE 96

References

[S02] Sweeney, "K-anonymity", IJUFKS 2002
[DN03] Dinur, Nissim, "Revealing information while preserving privacy", PODS 2003
[D06] Dwork, "Differential Privacy", ICALP 2006
[MKGV06] Machanavajjhala, Kifer, Gehrke, Venkitasubramaniam, "L-Diversity", ICDE 2006
[GKS08] Ganta, Kasiviswanathan, Smith, "Composition attacks and auxiliary information in data privacy", KDD 2008
[KL10] Kifer, Lin, "Towards an Axiomatization of Statistical Privacy and Utility", PODS 2010
[VSJO13] Vaidya, Shafiq, Jiang, Ohno-Machado, "Identifying inference attacks against healthcare data repositories", AMIA 2013
[MK15] Machanavajjhala, Kifer, "Designing statistical privacy for your data", CACM 2015