SLIDE 1

PRIVATE CORE-SETS FOR FACE IDENTIFICATION

DAN FELDMAN, UNIVERSITY OF HAIFA

A JOINT WORK WITH Rita Osadchy. SOME SLIDES FROM KOBBI NISSIM.

[STOC’11] PRIVATE CORESETS
[OAKLAND’10] SECURE COMPUTATION OF FACE IDENTIFICATION

SLIDE 2

GOALS OF THIS WORK

A practical and useful implementation of private core-sets, obtained by combining:

  • Theoretical proofs regarding private coresets [D., Nissim, Fiat, Kaplan]
  • A practical system for face identification [Osadchy, Pinkas, Jarrous, Moskovitch]

SLIDE 3

K-ANONYMITY [SS98,S02]

As in ancient crypto:

  • 'ehlol owrdl' looked like a good encryption for 'hello world'
  • No proofs against a smart attacker or the use of auxiliary data
  • In general:
  • (1) An algorithm is published that defends against the last attack mechanism
  • (2) A new attack mechanism is published
  • (3) Go to 1

SLIDE 4

K-ANONYMITY [SS98,S02]

Prevent re-identification:

  • Make every individual indistinguishable from at least k-1 other individuals

Original table:

Disease        Sex     Age  ZIP
Heart          Female  55   23456
Heart          Male    30   12345
Heart          Male    33   12346
Breast Cancer  Female  45   13144
Hepatitis      Male    42   13155
Viral          Male    42   23456

Anonymized table (k = 2):

Disease        Sex     Age  ZIP
Heart          *       **   23456
Heart          Male    3*   1234*
Heart          Male    3*   1234*
Breast Cancer  *       4*   131**
Hepatitis      *       4*   131**
Viral          *       **   23456

Bugger! I cannot tell which disease the patients from ZIP 23456 have.
Both guys from ZIP 1234* who are in their thirties have heart problems.
My (male) neighbor from ZIP 13155 has hepatitis!

Slide borrowed from Kobbi Nissim
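The table above can be checked mechanically. The sketch below is our own illustration, not part of the original slides: it assumes (sex, age, ZIP) is the quasi-identifier and disease is the sensitive attribute, tests whether every quasi-identifier value is shared by at least k rows, and lists the groups in which everyone has the same disease. That last check is how the 1234* leak above happens even though the anonymized table is 2-anonymous.

```python
from collections import defaultdict

# Rows transcribed from the slide: (disease, sex, age, ZIP).
ORIGINAL = [
    ("Heart", "Female", "55", "23456"),
    ("Heart", "Male", "30", "12345"),
    ("Heart", "Male", "33", "12346"),
    ("Breast Cancer", "Female", "45", "13144"),
    ("Hepatitis", "Male", "42", "13155"),
    ("Viral", "Male", "42", "23456"),
]
ANONYMIZED = [
    ("Heart", "*", "**", "23456"),
    ("Heart", "Male", "3*", "1234*"),
    ("Heart", "Male", "3*", "1234*"),
    ("Breast Cancer", "*", "4*", "131**"),
    ("Hepatitis", "*", "4*", "131**"),
    ("Viral", "*", "**", "23456"),
]

def equivalence_classes(rows):
    """Map each quasi-identifier (sex, age, ZIP) to the diseases of the rows sharing it."""
    groups = defaultdict(list)
    for disease, sex, age, zip_code in rows:
        groups[(sex, age, zip_code)].append(disease)
    return dict(groups)

def is_k_anonymous(rows, k):
    """True if every quasi-identifier value is shared by at least k rows."""
    return all(len(diseases) >= k for diseases in equivalence_classes(rows).values())

def homogeneous_classes(rows):
    """Quasi-identifier groups whose rows all have the same disease (leaked despite k-anonymity)."""
    return [qi for qi, diseases in equivalence_classes(rows).items() if len(set(diseases)) == 1]
```

Here `is_k_anonymous(ORIGINAL, 2)` is false because every row of the original table is unique, while `homogeneous_classes(ANONYMIZED)` returns only the 1234* group of men in their thirties.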

SLIDE 5

SEARCH FOR: PRIVACY

SLIDE 6

AOL SEARCH HISTORY RELEASE (2006)

  • 650,000 users, 20 million queries, 3 months
  • Goal: provide real query log data that is based on real users
  • “It could be used for personalization, query reformulation or other types of search research”
  • Privacy? Identifying information replaced with random identifiers

SLIDE 7

User ID  Query  Date and time  Result rank  Clicked URL
4417749  best dog for older owner  3/6/2006 11:48:24  1  http://www.canismajor.com
4417749  best dog for older owner  3/6/2006 11:48:24  5  http://dogs.about.com
4417749  landscapers in lilburn ga.  3/6/2006 18:37:26
4417749  effects of nicotine  3/7/2006 19:17:19  6  http://www.nida.nih.gov
4417749  best retirement in the world  3/9/2006 21:47:26  4  http://www.escapeartist.com
4417749  best retirement place in usa  3/9/2006 21:49:37  10  http://www.clubmarena.com
4417749  best retirement place in usa  3/9/2006 21:49:37  9  http://www.committment.com
4417749  bi polar and heredity  3/13/2006 20:57:11
4417749  adventure for the older american  3/17/2006 21:35:48
4417749  nicotine effects on the body  3/26/2006 10:31:15  3  http://www.geocities.com
4417749  nicotine effects on the body  3/26/2006 10:31:15  2  http://health.howstuffworks.com
4417749  wrinkling of the skin  3/26/2006 10:38:23
4417749  mini strokes  3/26/2006 14:56:56  1  http://www.ninds.nih.gov
4417749  panic disorders  3/26/2006 14:58:25
4417749  jarrett t. arnold eugene oregon  3/23/2006 21:48:01  2  http://www2.eugeneweekly.com
4417749  jarrett t. arnold eugene oregon  3/23/2006 21:48:01  3  http://www2.eugeneweekly.com
4417749  plastic surgeons in gwinnett county  3/28/2006 15:04:23  1  http://www.wedalert.com
4417749  plastic surgeons in gwinnett county  3/28/2006 15:04:23  4  http://www.implantinfo.com
4417749  plastic surgeons in gwinnett county  3/28/2006 15:31:00
4417749  60 single men  3/29/2006 20:11:52  6  http://www.adultlovecompass.com
4417749  60 single men  3/29/2006 20:14:14
4417749  clothes for 60 plus age  4/19/2006 12:44:03
4417749  clothes for age 60  4/19/2006 12:44:41  10  http://www.news.cornell.edu
4417749  clothes for age 60  4/19/2006 12:45:41
4417749  lactose intolerant  4/21/2006 20:53:51  2  http://digestive.niddk.nih.gov
4417749  lactose intolerant  4/21/2006 20:53:51  10  http://www.netdoctor.co.uk
4417749  dog who urinate on everything  4/28/2006 13:24:07  6  http://www.dogdaysusa.com
4417749  fingers going numb  5/2/2006 17:35:47

SLIDE 8

Name: Thelma Arnold
Age: 62
Widow
Residence: Lilburn, GA

SLIDE 9

LINKAGE ATTACKS [SWEENEY 02]

Anonymized GIC data: ethnicity, visit date, diagnosis, procedure, medication, total charge
Voter registration: name, address, date registered, party affiliation, date last voted
Attributes in both: ZIP, birth date, sex

  • GIC (Group Insurance Commission): patient-specific data (135,000 patients), 100 attributes per encounter, anonymized
  • Voter registration of Cambridge, MA: “public records”, open for inspection by anyone
SLIDE 10

LINKAGE ATTACKS [SWEENEY 02]

  • William Weld (governor of Massachusetts at the time)
  • According to the Cambridge voter list: six people had his particular birth date, three of them were men, and he was the only one in his 5-digit ZIP code!
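A linkage attack is essentially a join on the attributes the two data sets share (ZIP, birth date, sex). The sketch below uses hypothetical records (the names V1 to V6, the ZIP codes, the birth date and the diagnoses are all made up) arranged to mirror the counts above: six voters share a birth date, three of them are men, and only one of those men lives in ZIP 11111.

```python
from collections import Counter

QUASI_IDENTIFIER = ("zip", "birth_date", "sex")

def key(record):
    """The quasi-identifier tuple shared by both data sets."""
    return tuple(record[field] for field in QUASI_IDENTIFIER)

def link(anonymized, voters):
    """Return (name, diagnosis) for every anonymized record that matches exactly one voter."""
    counts = Counter(key(v) for v in voters)
    names = {key(v): v["name"] for v in voters}
    return [(names[key(r)], r["diagnosis"]) for r in anonymized if counts[key(r)] == 1]

# Hypothetical voter list: six people share one birth date, three are men,
# and V1 is the only man with that birth date in ZIP 11111.
VOTERS = [
    {"name": "V1", "zip": "11111", "birth_date": "1950-01-01", "sex": "M"},
    {"name": "V2", "zip": "22222", "birth_date": "1950-01-01", "sex": "M"},
    {"name": "V3", "zip": "22222", "birth_date": "1950-01-01", "sex": "M"},
    {"name": "V4", "zip": "11111", "birth_date": "1950-01-01", "sex": "F"},
    {"name": "V5", "zip": "22222", "birth_date": "1950-01-01", "sex": "F"},
    {"name": "V6", "zip": "33333", "birth_date": "1950-01-01", "sex": "F"},
]
# Hypothetical "anonymized" medical records: no names, but the quasi-identifier is kept.
MEDICAL = [
    {"zip": "11111", "birth_date": "1950-01-01", "sex": "M", "diagnosis": "D1"},
    {"zip": "22222", "birth_date": "1950-01-01", "sex": "M", "diagnosis": "D2"},
]
```

`link(MEDICAL, VOTERS)` re-identifies only the first record: the second one matches two voters (V2 and V3), so it stays ambiguous.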

SLIDE 11

LINKAGE ATTACK ON THE NETFLIX DATASET

  • Netflix: online movie rental service
  • In October 2006, released real movie ratings of 500,000 subscribers
  • 10% of all Netflix users as of late 2005
  • Names removed, maybe perturbed

SLIDE 12

THE NETFLIX DATASET

[Figure: a 500K users × 17K movies matrix; rows are users (e.g. 1234, 5678, 2589, 4379, …), columns are movies (Movie 1, Movie 2, Movie 3, …), and each entry is a rating/timestamp pair]

17K movies: high dimensional! The average subscriber has 214 dated ratings.

SLIDE 13

NETFLIX DATASET: NEAREST NEIGHBOR

Considering just movie names, for 90% of records there isn’t a single other record that is more than 30% similar.

[Figure: distribution of nearest-neighbor similarity]

Slide borrowed from Elaine Shi
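To make the statistic concrete, the sketch below measures the similarity of two records as the Jaccard similarity of their sets of rated movie titles. That measure is an assumption for illustration; the slide does not say which similarity was used. The function reports the fraction of records whose nearest neighbor is at most 30% similar.

```python
def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets of movie titles."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def nearest_neighbor_similarity(records):
    """For each record, its highest similarity to any *other* record."""
    return [max(jaccard(r, s) for j, s in enumerate(records) if j != i)
            for i, r in enumerate(records)]

def fraction_isolated(records, threshold=0.3):
    """Fraction of records with no other record more than `threshold` similar."""
    sims = nearest_neighbor_similarity(records)
    return sum(s <= threshold for s in sims) / len(records)
```

On the 500K-user, 17K-movie data the slide reports this fraction to be about 90%; with sparse, high-dimensional records almost everyone is far from everyone else, which is what makes the records easy to single out.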

SLIDE 14

Threat: deanonymization

Anonymized ratings:

User  Movie            Rating
1234  Rocky II         3/5
1234  The Wizard       4/5
1234  The Dark Knight  5/5
…
1234  Girls Gone Wild  5/5

Public ratings:

User     Movie            Rating
dukefan  The Wizard       8/10
dukefan  The Dark Knight  10/10
dukefan  Rocky II         6/10
…

User 1234 is dukefan!
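The matching step can be illustrated with the two tables above. The scoring rule below is a simplified one of our own (an assumption, not the scoring used in the actual Netflix attack): ratings are normalized to [0, 1] so that 3/5 and 6/10 agree, and each public identity is scored by the number of commonly rated movies whose normalized ratings are within 0.1 of each other. The user `other_user` is a hypothetical decoy.

```python
from fractions import Fraction as F

# Ratings from the slide, as exact fractions (3/5 == 6/10).
ANONYMIZED = {
    "1234": {"Rocky II": F(3, 5), "The Wizard": F(4, 5),
             "The Dark Knight": F(5, 5), "Girls Gone Wild": F(5, 5)},
}
PUBLIC = {
    "dukefan": {"The Wizard": F(8, 10), "The Dark Knight": F(10, 10), "Rocky II": F(6, 10)},
    "other_user": {"The Wizard": F(2, 10), "Rocky II": F(10, 10)},  # hypothetical decoy
}

def match_score(anon, pub, tolerance=F(1, 10)):
    """Number of commonly rated movies whose normalized ratings agree within `tolerance`."""
    common = anon.keys() & pub.keys()
    return sum(1 for movie in common if abs(anon[movie] - pub[movie]) <= tolerance)

def best_match(anon, public_users):
    """Public identity that agrees with the anonymized record on the most movies."""
    return max(public_users, key=lambda user: match_score(anon, public_users[user]))
```

`best_match(ANONYMIZED["1234"], PUBLIC)` returns `"dukefan"`, who agrees on all three commonly rated movies; the public side never needs to mention the sensitive rating of Girls Gone Wild for it to be revealed.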

SLIDE 15

Auditing using differential privacy

[Figure: an auditor sits between the data and whoever asks the queries, and keeps the query log q1, …, qi. Given a new query qi+1, it replies either “Answer is…” OR “Query denied (answering would cause privacy loss)”.]

SLIDE 16

Database Privacy: The Setting

  • Database x = (x1, x2, …, xn) (a table of n rows)
  • Each element is from some domain D
  • D can be numbers, categories, tax forms, etc.

[Figure: users (government, businesses, researchers, or a malicious adversary) send queries to algorithm A, which runs on the database x = (x1, x2, x3, …, xn-1, xn) and returns answers]

SLIDE 17

DIFFERENTIAL PRIVACY [DMNS06]

x = (x1, x2, x3, …, xn-1, xn)
x’ = (x1, x2’, x3, …, xn-1, xn): one row modified

Algorithm A is run on both databases, producing A(x) and A(x’); the two output distributions are at “distance” < ε.
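For reference, the “distance” < ε above is usually made precise as follows (pure ε-differential privacy, as defined in [DMNS06]): for every two databases x, x’ that differ in one row and every set S of possible outputs,

```latex
\Pr[A(x) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[A(x') \in S]
```

Applying the same inequality with x and x’ swapped gives the matching lower bound, so for small ε the distributions of A(x) and A(x’) are nearly the same, whatever the modified row contains.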

SLIDE 18

DIFFERENTIAL PRIVACY [DMNS06]
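A standard way to meet this definition, also from [DMNS06], is the Laplace mechanism: compute a numeric query whose sensitivity is Δ (the most that modifying one row can change the answer) and add Laplace noise of scale Δ/ε. The sketch below applies it to a counting query, whose sensitivity is 1. The function names are our own, and the Laplace sample is drawn as the difference of two independent exponentials with mean b, which is Laplace-distributed with scale b. It is an illustration, not a vetted DP library (for example, it ignores floating-point attacks on the noise).

```python
import random

def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(rows, predicate, epsilon, rng=random):
    """epsilon-DP count: modifying one row changes a count by at most 1, so noise scale is 1/epsilon."""
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

For example, `private_count(["Heart", "Heart", "Hepatitis", "Heart", "Viral"], lambda d: d == "Heart", 1.0)` returns 3 plus noise with standard deviation about 1.4; a smaller ε gives stronger privacy and noisier answers, which is the practicality cost mentioned in the conclusion.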

SLIDE 19

CONCLUSION

k-Anonymity is practical and easy to use, but not so safe in theory and practice.

Differential privacy is safe but not so practical to use.

SLIDE 20

Main Tool: Coresets

Given data D and an algorithm A for which computing A(D) is intractable, can we efficiently reduce D to a small set C so that A(C) is fast to compute and A(C) ≈ A(D)?
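As a minimal illustration of this question (not the coreset construction used in this work), let A(D) be the 1-mean cost of D at a center c: the sum of squared distances from every point to c. For this cost the whole data set can be replaced by its mean with weight n plus one additive constant, because the sum of squared distances to c equals the sum of squared distances to the mean plus n times the squared distance from the mean to c. Evaluating A(C) then takes O(d) time instead of O(nd), and A(C) = A(D) exactly. The function names are hypothetical.

```python
import math

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def one_mean_coreset(points):
    """Summarize the points as (mean, weight n, additive constant) for the 1-mean cost."""
    n = len(points)
    mean = tuple(sum(coord) / n for coord in zip(*points))
    const = math.fsum(sq_dist(p, mean) for p in points)
    return mean, n, const

def cost_full(points, c):
    """A(D): sum of squared distances from every point to center c, O(n d) time."""
    return math.fsum(sq_dist(p, c) for p in points)

def cost_coreset(coreset, c):
    """A(C): the same cost computed from the summary alone, O(d) time."""
    mean, n, const = coreset
    return n * sq_dist(mean, c) + const
```

Coresets for k-means or k-median are not this simple (they are approximate and contain more than one weighted point), but they answer the same question: a small weighted set on which the cost of every candidate solution is close to its cost on D.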

SLIDE 21
SLIDE 22
SLIDE 23
SLIDE 24
SLIDE 25
SLIDE 26
SLIDE 27
SLIDE 28
SLIDE 29
SLIDE 30
SLIDE 31
SLIDE 32

From Big Data to Small Data

SLIDE 33
SLIDE 34
SLIDE 35
SLIDE 36

Delete the pair of original coresets from memory

SLIDE 37
SLIDE 38
SLIDE 39
SLIDE 40
SLIDE 41
SLIDE 42
SLIDE 43
SLIDE 44
SLIDE 45
SLIDE 46

Parallel + Streaming Computation
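The streaming and parallel construction outlined in these slides (build coresets for small blocks, merge a pair of coresets, then delete the pair from memory) is commonly called merge-and-reduce. The sketch below assumes a toy “coreset”: the exact 1-mean summary (mean, weight, additive constant) of a block, whose merge is exact, so the result can be checked against the summary of the whole stream. A real coreset for k-means would lose a little accuracy at every merge. The names `summarize`, `merge` and `stream_summary` are hypothetical.

```python
import math

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def summarize(block):
    """Exact 1-mean summary (mean, weight, additive constant) of a block of points."""
    n = len(block)
    mean = tuple(sum(coord) / n for coord in zip(*block))
    return mean, n, math.fsum(sq_dist(p, mean) for p in block)

def merge(s1, s2):
    """Reduce the union of two summaries into a single summary of the same size."""
    (m1, n1, c1), (m2, n2, c2) = s1, s2
    n = n1 + n2
    m = tuple((n1 * a + n2 * b) / n for a, b in zip(m1, m2))
    return m, n, c1 + c2 + n1 * sq_dist(m1, m) + n2 * sq_dist(m2, m)

def stream_summary(stream, block_size):
    """Merge-and-reduce over a stream, keeping at most one summary per tree level."""
    levels = {}  # level -> summary, updated like a binary counter

    def push(summary):
        level = 0
        while level in levels:
            # Two summaries on the same level: merge them and delete the pair.
            summary = merge(levels.pop(level), summary)
            level += 1
        levels[level] = summary

    block = []
    for p in stream:
        block.append(p)
        if len(block) == block_size:
            push(summarize(block))
            block = []
    if block:
        push(summarize(block))
    stored = len(levels)  # summaries held at the end: one per set bit of the block count
    result = None
    for s in levels.values():
        result = s if result is None else merge(result, s)
    return result, stored
```

Because each level holds at most one summary, memory stays at O(log(n / block_size)) summaries; in the parallel setting, different machines can summarize different blocks and the same `merge` combines their results.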

SLIDE 47

Good: Coresets reveal little information

[Figure legend: coreset point, original point]

SLIDE 48

Bad: Still, coresets do not preserve privacy
Bad: Coresets are not differentially private

[Figure legend: coreset point, original point]

SLIDE 49
SLIDE 50

(STOC’11)

SLIDE 51

(STOC’11)

SLIDE 52
SLIDE 53

Application: Face Identification

Our user will have ID 72135: his eyes are similar to user No. 7, his ears to user No. 2, his lips to user No. 1, …

[Figure: the ID is computed against a public database of faces; an operator receives match / no match]

SLIDE 54

NEW FACE REPRESENTATION: PATCH-BASED FACE REPRESENTATION

A face is represented by a collection of informative patches. Assume that the face is represented by p patches.

[Figure labels: patch centers; patch size (could vary)]

SLIDE 55

COMPUTE K-REPRESENTATIVES FROM EACH DATABASE

K-Means: minimize the sum, over all points, of the squared distance from each point to its nearest center
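The k-means objective can be made concrete with Lloyd's heuristic, which alternates between assigning each point to its nearest center and moving each center to the mean of its points. This is a standard local-search method, not necessarily the one used to compute the representatives in this work, and it only guarantees a local optimum; the function names and the random initialization are our own choices.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(p, centers):
    """Index of the center closest to p."""
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))

def kmeans_cost(points, centers):
    """Sum over all points of the squared distance to the nearest center."""
    return sum(sq_dist(p, centers[nearest(p, centers)]) for p in points)

def lloyd(points, k, iters=20, seed=0):
    """Lloyd's heuristic: alternately assign points and recompute cluster means."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct input points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Move each center to its cluster mean; keep the old center if the cluster is empty.
        centers = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[j]
                   for j, cl in enumerate(clusters)]
    return centers
```

In the face setting, each patch database (eyes, ears, …) would get its own run, and the k returned centers play the role of the k representatives.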

SLIDE 56

Your Eye ID is the closest center

SLIDE 57

Representing a face

V =
  patch 1: 5, 8, 9, 14
  patch 2: 3, 8, 10, 11
  patch 3: 7, 9, 12, 18
  …
  patch p: 4, 6, 10, 12

For each of the p patches, store the indices of the closest patches in the dictionary.
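The representation above can be sketched as follows, under assumptions not stated on the slides: each patch is a feature vector, each patch position has its own dictionary of representatives (for example the k-means centers of the eye database), and closeness is squared Euclidean distance. With `m = 1` each patch gets a single index, matching “Your Eye ID is the closest center”; with `m = 4` each patch stores four indices, as in the table above. `represent_face` is a hypothetical name.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def represent_face(patches, dictionaries, m=4):
    """For each of the p patches, return the indices of its m closest dictionary entries."""
    representation = []
    for patch, dictionary in zip(patches, dictionaries):
        ranked = sorted(range(len(dictionary)), key=lambda j: sq_dist(patch, dictionary[j]))
        representation.append(sorted(ranked[:m]))  # stored as a set of indices, ascending
    return representation
```

Only the indices are stored, so comparing two faces reduces to comparing p short lists of integers rather than raw images.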

SLIDE 58

Privacy

  • The same technique can be used for databases of genes, drivers, and customers
  • Currently we use a public database: people agreed to have their faces published
  • We want to use a private database instead
  • You can compute your ID without learning about the actual people in the database

SLIDE 59

Experiment

  • Input: Database P of users (faces) that want to keep their privacy
  • Compute Private Coreset for k-means of P
  • ah the coreset with its k-means
  • Repeat for each database:
    – Ears, noses, etc.
  • Every user has a public ID based on a private coreset

SLIDE 60

Implementation

  • Generation of representations from images:
    – Implemented in Matlab, translated to Java using Matlab Java builder
  • Timing on Linux servers:
    – ~0.3 sec to compare to an image in the database
    – An implementation in C should be much faster
  • Private Coreset in Python (by Gilad Levi, Oren Efraimov, Yona Zahi)

SLIDE 61

Results on Face Identification

[Figure: ROC curves (false positive vs. true positive) for no privacy and for d=100, alpha=0.5 leakage; d=20, alpha=0.4 leakage; d=10, alpha=0.3 leakage; d=5, alpha=0.05 leakage; d=3, alpha=0.01 leakage]

SLIDE 62

Future Work

  • Compute Private Coreset on the Cloud
    – Using Homomorphic Encryption (with Shafi Goldwasser & Daniela Rus)
  • Compute coresets of error polynomial in d
    – (with Daniela Rus & Kobbi Nissim)
  • Fit the error function to Face Identification
    – (with Rita Osadchy and Kobbi Nissim?)
  • Private coresets for other machine learning problems

SLIDE 63
SLIDE 64
SLIDE 65
SLIDE 66
SLIDE 67
SLIDE 68
SLIDE 69
SLIDE 70
SLIDE 71
SLIDE 72
SLIDE 73
SLIDE 74
SLIDE 75
SLIDE 76
SLIDE 77
SLIDE 78
SLIDE 79
SLIDE 80
SLIDE 81
SLIDE 82
SLIDE 83
SLIDE 84
SLIDE 85
SLIDE 86
SLIDE 87
SLIDE 88
SLIDE 89
SLIDE 90
SLIDE 91
SLIDE 92
SLIDE 93
SLIDE 94
SLIDE 95
SLIDE 96
SLIDE 97
SLIDE 98

Input

A set of $n$ points $P \subset \mathbb{R}^d$, and $k \ge 1$.

SLIDE 99

Output

$N$: a small bicriteria approximation to the $k$-median of $P$

SLIDE 100

The Algorithm

1) $t \leftarrow 1$ (counter for iterations)
2) $N \leftarrow \emptyset$ (the output set of centers)
3) Construct a weak $\left(\frac{1}{8k}\right)$-net $N_t$ for $P$
4) $N \leftarrow N \cup N_t$
5) $\forall p$: compute $\mathrm{dist}(p, N_t)$
6) Remove $P_t$: the half of $P$ that is closer to $N_t$
7) $t \leftarrow t + 1$
8) Repeat steps 3 to 6 till there are no more input points.
9) Return $N$
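The steps above can be sketched in Python under an explicit assumption: we do not construct a true weak (1/(8k))-net. A uniform random sample of 8k remaining points stands in for it (a hypothetical choice with no guarantee), and distances are Euclidean. Step 6 removes the closer half, rounding up so that the loop always terminates; hence the loop runs about log2 n times and N has O(k log n) points, which is why the result is a bicriteria approximation (more than k centers).

```python
import math
import random

def bicriteria(points, k, seed=0):
    """Sketch of the iterative-halving bicriteria algorithm for k-median."""
    rng = random.Random(seed)
    net_size = 8 * k          # ASSUMPTION: a random sample stands in for the weak (1/(8k))-net
    remaining = list(points)  # input points not yet removed
    centers = set()           # step 2: N <- empty set
    t = 0
    while remaining:          # step 8: repeat until no input points remain
        t += 1
        net = rng.sample(remaining, min(net_size, len(remaining)))  # step 3: N_t
        centers.update(net)                                         # step 4: N <- N u N_t
        # Step 5: sort the remaining points by their distance to N_t.
        remaining.sort(key=lambda p: min(math.dist(p, q) for q in net))
        # Step 6: remove P_t, the closer half (rounded up); keep the farther half.
        remaining = remaining[(len(remaining) + 1) // 2:]
    return centers, t
```

For 1,000 points and k = 3 the loop runs 10 times and N contains at most 170 input points; replacing the random sample with a genuine weak net is what the proof on the following slides requires.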

SLIDE 101

Proof of Correctness (for the non-private case)

SLIDE 102

A point $b \in P$ is bad for $N_t$ if: $\mathrm{dist}(b, N_t) > 2\,\mathrm{dist}(b, N^*)$

SLIDE 103

A point $g \in P$ is good for $N_t$ otherwise: $\mathrm{dist}(g, N_t) \le 2\,\mathrm{dist}(g, N^*)$

SLIDE 104

Main Technical Theorem

We can map every bad point $b \in P_t$ to a distinct good point $g \in P_{t+1}$.

SLIDE 105

$\mathrm{dist}(b, N) \le \mathrm{dist}(b, N_t)$, because $N \supseteq N_t$.
Since $b \in P_t$ and $g \in P_{t+1}$: $\mathrm{dist}(b, N_t) \le \mathrm{dist}(g, N_t)$.
Since $g$ is good for $N_t$: $\mathrm{dist}(g, N_t) \le 2\,\mathrm{dist}(g, N^*)$.

SLIDE 106

$\mathrm{dist}(b, N) \le \mathrm{dist}(b, N_t)$, because $N \supseteq N_t$.
Since $b \in P_t$ and $g \in P_{t+1}$: $\mathrm{dist}(b, N_t) \le \mathrm{dist}(g, N_t)$.
Since $g$ is good for $N_t$: $\mathrm{dist}(g, N_t) \le 2\,\mathrm{dist}(g, N^*)$.
Hence $\mathrm{dist}(b, N) \le 2\,\mathrm{dist}(g, N^*)$.

SLIDE 107

Bi-Criteria for k-Median

$$\sum_{p \in P} \mathrm{dist}(p, N) = \sum_{g} \mathrm{dist}(g, N) + \sum_{b} \mathrm{dist}(b, N) \le \sum_{g} 2\,\mathrm{dist}(g, N^*) + \sum_{g} 2\,\mathrm{dist}(g, N^*) \le 4 \sum_{p \in P} \mathrm{dist}(p, N^*)$$

SLIDE 108
SLIDE 109

Bi-Criteria Approximation Algorithm [FFS07]

SLIDE 110

Initialization

1) $t \leftarrow 1$ (counter for iterations)
2) $F \leftarrow \emptyset$ (the output set of $j$-flats)

SLIDE 111

3) Construct a weak $\left(\frac{1}{8k}\right)$-net $N_t$ for $P$

(t = 1)

SLIDE 112

4) $N \leftarrow N \cup N_t$

(t = 1)

SLIDE 113

5) $\forall p$: compute $\mathrm{dist}(p, N_t)$

(t = 1)

SLIDE 114

(t = 1)

6) Remove $P_t$: the half of $P$ that is closer to $N_t$

SLIDE 115

(t = 1)

6) Remove $P_t$: the half of $P$ that is closer to $N_t$

SLIDE 116

7) $t \leftarrow t + 1$

8) Repeat steps 3 to 6:

SLIDE 117

(t = 2)

3) Construct a weak $\left(\frac{1}{8k}\right)$-net $N_t$ for $P$

SLIDE 118

(t = 2)

4) $N \leftarrow N \cup N_t$

SLIDE 119

5) $\forall p$: compute $\mathrm{dist}(p, N_t)$

(t = 2)

SLIDE 120

6) Remove $P_t$: the half of $P$ that is closer to $N_t$

(t = 2)

SLIDE 121

6) Remove $P_t$: the half of $P$ that is closer to $N_t$

(t = 2)

SLIDE 122

6) Remove $P_t$: the half of $P$ that is closer to $N_t$

(t = 2)

SLIDE 123

7) $t \leftarrow t + 1$
8) Repeat steps 3 to 6 till there are no more input points.
9) Return $N$

SLIDE 124

Let $N^*$ be any set of $k$ points in $\mathbb{R}^d$.

SLIDE 125

Let $N^*$ be any set of $k$ points in $\mathbb{R}^d$.

SLIDE 126

Let $N^*$ be any set of $k$ points in $\mathbb{R}^d$.

Consider $N_t$ that is constructed during the $t$-th iteration.

SLIDE 127

A point $b \in P$ is bad for $N_t$ if: $\mathrm{dist}(b, N_t) > 2\,\mathrm{dist}(b, N^*)$

SLIDE 128

A point $g \in P$ is good for $N_t$ otherwise: $\mathrm{dist}(g, N_t) \le 2\,\mathrm{dist}(g, N^*)$

SLIDE 129

Main Technical Theorem

We can map every bad point $b \in P_t$ to a distinct good point $g \in P_{t+1}$.

SLIDE 130

$\mathrm{dist}(b, N) \le \mathrm{dist}(b, N_t)$, because $N \supseteq N_t$.
Since $b \in P_t$ and $g \in P_{t+1}$: $\mathrm{dist}(b, N_t) \le \mathrm{dist}(g, N_t)$.
Since $g$ is good for $N_t$: $\mathrm{dist}(g, N_t) \le 2\,\mathrm{dist}(g, N^*)$.

SLIDE 131

$\mathrm{dist}(b, N) \le \mathrm{dist}(b, N_t)$, because $N \supseteq N_t$.
Since $b \in P_t$ and $g \in P_{t+1}$: $\mathrm{dist}(b, N_t) \le \mathrm{dist}(g, N_t)$.
Since $g$ is good for $N_t$: $\mathrm{dist}(g, N_t) \le 2\,\mathrm{dist}(g, N^*)$.
Hence $\mathrm{dist}(b, N) \le 2\,\mathrm{dist}(g, N^*)$.

SLIDE 132

Bi-Criteria for k-Median

$$\sum_{p \in P} \mathrm{dist}(p, N) = \sum_{g} \mathrm{dist}(g, N) + \sum_{b} \mathrm{dist}(b, N) \le \sum_{g} 2\,\mathrm{dist}(g, N^*) + \sum_{g} 2\,\mathrm{dist}(g, N^*) \le 4 \sum_{p \in P} \mathrm{dist}(p, N^*)$$

SLIDE 133

Proof of the Technical Theorem

  • The number of bad points is at most $|B| = \frac{|P_t|}{8}$.
  • $|P_{t+1}| = \frac{|P_t|}{2}$.
  • The number of good points in $P_{t+1}$ is at least

$$|P_{t+1}| - |B| \ge \frac{|P_t|}{2} - \frac{|P_t|}{8} = \frac{3|P_t|}{8} \ge |B|$$

SLIDE 134

Claim: Only $|B_0| = \frac{|P_t|}{8k}$ points are bad for $q \in N_t$

$\mathrm{dist}(p, q) \le 2\,\mathrm{dist}(p, q^*)$

SLIDE 135

$B_0$: the $\frac{|P_t|}{8k}$ closest points to $q^*$

SLIDE 136

$B_0$: the $\frac{|P_t|}{8k}$ closest points to $q^*$

$B_0$ contains $q \in N_t$ ($N_t$ is a $\left(\frac{1}{8k}\right)$-net)

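SLIDE 137

For every yellow point $p \in P \setminus B_0$:

$$\mathrm{dist}(p, q) \le \mathrm{dist}(p, q^*) + \mathrm{dist}(q^*, q) \le 2\,\mathrm{dist}(p, q^*)$$

The first step is the triangle inequality; the second holds because $q \in B_0$ and $p \notin B_0$, so $\mathrm{dist}(q^*, q) \le \mathrm{dist}(q^*, p)$.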

SLIDE 138

$\mathrm{dist}(p, q) \le 2\,\mathrm{dist}(p, q^*)$

All the yellow points are good for $N_t$

SLIDE 139

$|B_0| = \frac{|P_t|}{8}$

Only the black points $B_0$ are bad for $N_t$