Disclosure Risk Measurement with Entropy in Sample Based Frequency Tables

SLIDE 1

Disclosure Risk Measurement with Entropy in Sample Based Frequency Tables

  • L. Antal
  • N. Shlomo
  • M. Elliot

laszlo.antal@postgrad.manchester.ac.uk

University of Manchester

New Techniques and Technologies for Statistics 10 March 2015

  • L. Antal, N. Shlomo, M. Elliot

Disclosure Risk Measurement NTTS 2015 1 / 14

SLIDE 2

Outline

1. Idea and Notation
2. Disclosure Risk Measures
3. Results

SLIDE 4

Idea and Notation

  • We would like to measure the disclosure risk of a population based frequency table.
  • Information theoretical expressions (e.g. entropy) can reflect the properties of attribute disclosure.

Notation

  • Population based frequency table: $F = (F_1, F_2, \ldots, F_K)$
  • Population size: $N = \sum_{i=1}^{K} F_i$
  • Sample based frequency table: $f = (f_1, f_2, \ldots, f_K)$
  • Sample size: $n = \sum_{i=1}^{K} f_i$

SLIDE 6

Disclosure Risk Measures

Properties of a desired disclosure risk measure

  • If only one cell is populated in the table, then the disclosure risk should be high.
  • Uniformly distributed frequencies imply low risk.
  • The smaller the cells, the higher the disclosure risk.
  • The more zeroes the table contains, the higher the disclosure risk.
  • The disclosure risk should be bounded by 0 and 1.
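The first two properties can be illustrated with a normalised entropy. The sketch below is only an illustration, not the authors' code: it assumes H(X) is the Shannon entropy of the cell proportions with natural logarithms and that 0 · log 0 is taken as 0, neither of which the slides spell out. The function name and the toy tables are hypothetical.

```python
import math

def normalized_entropy(freqs):
    """Return H(X) / log K for a frequency table given as a list of cell counts.

    Assumption: H(X) is the Shannon entropy of the cell proportions
    (natural log), and empty cells contribute nothing to the sum.
    """
    total = sum(freqs)
    k = len(freqs)
    h = -sum((c / total) * math.log(c / total) for c in freqs if c > 0)
    return h / math.log(k)

# Hypothetical toy tables with K = 4 cells each.
one_cell = [12, 0, 0, 0]   # a single populated cell
uniform = [3, 3, 3, 3]     # uniformly distributed frequencies
skewed = [9, 1, 1, 1]      # somewhere in between

for name, table in [("one cell", one_cell), ("uniform", uniform), ("skewed", skewed)]:
    # 1 - H(X)/log K is large when the table is concentrated, small when it is flat.
    print(name, round(1 - normalized_entropy(table), 4))
```

Under these assumptions the quantity 1 − H(X)/log K equals 1 for the one-cell table and 0 for the uniform table, which matches the high-risk and low-risk ends of the list above.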

SLIDE 7

Disclosure Risk Measures

The Disclosure Risk Measure

  • We developed the disclosure risk measure for population based frequency tables first.
  • Now we extend it to sample based frequency tables.

The disclosure risk measure for population based frequency tables:

$$R_1(F, w) = w_1 \cdot \frac{|D|}{K} + w_2 \cdot \left(1 - \frac{H(X)}{\log K}\right) - w_3 \cdot \frac{1}{\sqrt{N}} \cdot \log \frac{1}{e \cdot \sqrt{N}}$$

where $D$ is the set of zeroes in $F$ and $w = (w_1, w_2, w_3)$ is a vector of weights.
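A minimal sketch of how R1 might be computed, to make the three terms concrete. It is not the authors' implementation: it assumes H(X) is the Shannon entropy of the cell proportions F_i / N with natural logarithms, and the function names and toy table are hypothetical.

```python
import math

def entropy(freqs):
    """Shannon entropy of the cell proportions (assumed definition of H(X))."""
    total = sum(freqs)
    return -sum((c / total) * math.log(c / total) for c in freqs if c > 0)

def r1(freqs, weights):
    """R1(F, w) for a population based table given as a flat list of counts."""
    w1, w2, w3 = weights
    k = len(freqs)                                       # number of cells K
    n_pop = sum(freqs)                                   # population size N
    zero_share = sum(1 for c in freqs if c == 0) / k     # |D| / K
    entropy_term = 1 - entropy(freqs) / math.log(k)      # 1 - H(X) / log K
    # (1 / sqrt(N)) * log(1 / (e * sqrt(N))); this is negative, so subtracting
    # w3 times it adds a positive amount that shrinks as N grows.
    size_term = math.log(1 / (math.e * math.sqrt(n_pop))) / math.sqrt(n_pop)
    return w1 * zero_share + w2 * entropy_term - w3 * size_term

toy_table = [5, 0, 3, 2, 0, 10]   # hypothetical counts, not census data
weights = (0.1, 0.8, 0.1)         # the weights reported on the results slide
print(r1(toy_table, weights))
```

With nonnegative weights summing to 1, each of the three parts lies between 0 and 1, so the value stays within the bounds asked for in the list of desired properties.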

SLIDE 8

Disclosure Risk Measures

Disclosure Risk Measure for Sample Based Tables

The disclosure risk of a sample based table should be lower than that of the original population based table.

$$R_2(F, f, w) = w_1 \cdot \left(\frac{|D|}{K}\right)^{\frac{|D \cup E|}{|D \cap E|}} + w_2 \cdot \left(1 - \frac{H(X)}{\log K}\right) \cdot \frac{H(X|Y)}{H(X)} - w_3 \cdot \frac{1}{\sqrt{N}} \cdot \log \frac{1}{e \cdot \sqrt{N}}$$

where $E$ is the set of zeroes in the sample based table and $H(X|Y)$ is the conditional entropy of the original table with respect to the sample based table.
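A short derivation of why this form cannot exceed the population based measure may help. It reads the first term of R2 as |D| / K raised to the power |D ∪ E| / |D ∩ E|, and it relies on assumptions the slide does not state: every sample count satisfies f_i ≤ F_i, the table has at least one zero cell, H(X) > 0, and the weights are nonnegative.

```latex
% Assumptions (not stated on the slide):
%   f_i \le F_i for every cell, |D| \ge 1, H(X) > 0, w_1, w_2 \ge 0.
\begin{align*}
f_i \le F_i \;&\Rightarrow\; D \subseteq E
  \;\Rightarrow\; \frac{|D \cup E|}{|D \cap E|} = \frac{|E|}{|D|} \ge 1, \\
0 \le \frac{|D|}{K} \le 1 \;&\Rightarrow\;
  \left(\frac{|D|}{K}\right)^{|E|/|D|} \le \frac{|D|}{K}, \\
0 \le H(X|Y) \le H(X) \;&\Rightarrow\;
  \left(1 - \frac{H(X)}{\log K}\right) \frac{H(X|Y)}{H(X)}
  \le 1 - \frac{H(X)}{\log K}, \\
&\Rightarrow\; R_2(F, f, w) \le R_1(F, w).
\end{align*}
```

The third term is the same in both measures, so only the first two terms can shrink, and they shrink more as the sample leaves more cells empty or tells us more about the original table.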

SLIDE 10

Results

  • Data: 2001 UK census tables
  • 10 selected output areas
  • $N = 2449$
  • Weights: $w = (0.1, 0.8, 0.1)$
  • Initial population based table: output area (10 output areas) × religion
  • 1,000 sample based tables, 1,000 estimated population based frequency tables for each sample based table

SLIDE 11

Results

Estimation of population based frequency tables:

  • Drawing samples from a population based table
  • Applying a log-linear model to the sample based tables to estimate population parameters
  • Drawing N − n 'individuals' from a multinomial distribution
  • Adding the individuals to the sample based table
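The four steps above can be sketched as follows. This is a rough illustration under stated assumptions, not the procedure used for the reported results: it assumes simple random sampling of individuals without replacement, and it uses the independence (main effects only) log-linear model because the slides do not say which model was fitted. All function names and the toy table are hypothetical.

```python
import random

def draw_sample(pop_table, fraction, rng):
    """Step 1: simple random sample of individuals (assumed sampling design)."""
    rows, cols = len(pop_table), len(pop_table[0])
    individuals = [(i, j) for i in range(rows) for j in range(cols)
                   for _ in range(pop_table[i][j])]
    n = round(fraction * len(individuals))
    sample = [[0] * cols for _ in range(rows)]
    for i, j in rng.sample(individuals, n):
        sample[i][j] += 1
    return sample

def independence_probs(sample):
    """Step 2: fitted cell probabilities under the independence log-linear
    model (an assumed model choice): row share times column share."""
    n = sum(map(sum, sample))
    row_tot = [sum(r) for r in sample]
    col_tot = [sum(c) for c in zip(*sample)]
    return [[row_tot[i] * col_tot[j] / n ** 2 for j in range(len(col_tot))]
            for i in range(len(row_tot))]

def estimate_population(sample, pop_size, rng):
    """Steps 3 and 4: draw N - n individuals from a multinomial distribution
    with the fitted probabilities and add them to the sample based table."""
    rows, cols = len(sample), len(sample[0])
    n = sum(map(sum, sample))
    probs = independence_probs(sample)
    cells = [(i, j) for i in range(rows) for j in range(cols)]
    flat = [probs[i][j] for i, j in cells]
    estimate = [row[:] for row in sample]
    for i, j in rng.choices(cells, weights=flat, k=pop_size - n):
        estimate[i][j] += 1
    return estimate

rng = random.Random(0)
population = [[40, 5, 0], [30, 10, 2], [25, 0, 8]]   # hypothetical toy table
sample = draw_sample(population, 0.1, rng)
estimate = estimate_population(sample, sum(map(sum, population)), rng)
print(sample)
print(estimate)
```

Repeating the last two lines many times per sample would give a set of estimated population based tables for each sample based table, which is the setup described on the previous slide.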

SLIDE 12

Results

Measure      Frequencies used                        Sampling fraction
                                                     0.1      0.05     0.01
R1(F, w)     From true population frequencies        0.2315   0.2315   0.2315
R1(F, w)     From estimated population frequencies   0.2173   0.2169   0.2161
R2(F, f, w)  From true population frequencies        0.1697   0.1533   0.0950
R2(F, f, w)  From estimated population frequencies   0.1543   0.1400   0.0884

Table: output area (10 output areas) × religion. 1,000 samples, 1,000 estimated population based tables for each sample.

SLIDE 13

Summary

  • A disclosure risk measure has been extended to sample based tables.
  • The disclosure risk measure is based on information theory.
  • Initial results show good estimates for a two-dimensional table.
  • The model needs to be explored for higher dimensional tables.

SLIDE 14

Thank you for your attention!
