TOWARDS A QUALITY ASSESSMENT OF DISCLOSURE-LIMITED STATISTICAL DATA - PDF document

TOWARDS A QUALITY ASSESSMENT OF DISCLOSURE-LIMITED STATISTICAL DATA Lawrence H. Cox, Ph.D. National Center for Health Statistics LCOX@CDC.GOV

QUALITY-CONFIDENTIALITY TRADEOFF To reduce risk of statistical disclosure to an acceptable level , statistical disclosure limitation (SDL) methods - abbreviate - eliminate - modify original data Lowering disclosure risk typically forces reduction of data quality in terms of - accuracy - completeness - usability Over the past 4 decades, SDL methods have been - studied/developed - improved/refined/implemented with considerable success At the same time, efforts to assess/control/assure quality were virtually absent

This presentation - examines quality effects of three SDL methods for tabular data - explores quality-preserving methods The three methods - rounding - complementary cell suppression - controlled tabular adjustment

HIGHLIGHTS In view of time limitations, the take-home messages are Rounding - rounding keeps the data release intact - methods for quality-preserving rounding � preserving mean, variance � preserving distribution - available to NSOs - rounding can limit disclosure effectively Complementary cell suppression - has very negative effects on data quality, especially as the data release is not intact - in the absence of a mathematical model, in some cases suppression can be undone - the security of suppression hinges on a single quantity that often can be estimated - p-percent rules can be vulnerable - p/q-ambiguity rules are vulnerable Controlled tabular adjustment - keeps the data release intact - can preserve key values and statistics - can preserve original distribution - effectively limits disclosure

ROUNDING Rounding (base B) : replace original data values x = qB + r, 0 < r < B by integer multiples R(x) = mB of an integer rounding base B Adjacent rounding (typical): |x – R(x)| < B Zero-restricted rounding (typical): R(mB) = mB Controlled rounding preserves additivity We are concerned with - effects of base B rounding on statistical properties of original data ( data quality ) • mean • variance/TMSE • distribution - effects on disclosure risk : P[x | R(x)]

Principal issues in evaluating an SDL method (1) Is the method effective for limiting disclosure? (2) Are its effects on data quality acceptable? Examined these questions for four rounding rules - conventional rounding - modified conventional rounding - zero-restricted 50/50 rounding - unbiased rounding We only on report zero-restricted 50/50 rounding We evaluate rounding rule/base (1) in terms of the posterior probability of an original data value given its rounded value (2) in terms expected increase in total mean squared error and expected difference between pre- and post-rounding distributions as measured by a conditional Chi-square statistic

We assume - r - and q -distributions independent B − ∼ {0, 1} - r Uniform (can be relaxed) Focus on adjacent rounding - R(x) = qB or (q + 1)B - R(x) = qB + R(r) with R(r) = 0 or B Zero-restricted 50/50 rounding - r = 0: round down - r ≠ 0: round down or up each with probability ½ Assumptions imply - E[x] = BE[q] + E[r] - P[r] = P[r|q] = 1/B - V(x) = B 2 V(q) + V(r)

EFFECTS OF ROUNDING ON MEAN, VARIANCE For zero-restricted 50/50 rounding + 1 -1 B B - [ ] and [ ] = = = = ( ) 0 ( ) , thus P R r P R r B 2 2 B B − -1 1 B B 2 - [ ] = = ( ) [ ( )] and E R r V R r 2 4 Expected value of x and R(x) Unrounded 50/50 -1 -1 B B qB + qB + 2 2 Variance of x and R(x) Unrounded 50/50 B − B − 2 2 1 1 B 2 V[q] + B 2 V[q] + 4 12

EFFECTS OF ROUNDING ON x-DISTRIBUTION Use the conditional Chi-square statistic χ = 2 - ∑ x U x − − [ ( ) ] [ ( ) ] R x x 2 R r r 2 = = - (x = 0, U x = 0) U x x x x x Degrees of freedom df determined by the tabular structure − ⎡ ⎤ ⎡ ⎤ ( ) 2 2 r B r = + = = ∑ ( ) [ | ] [ ( ) 0] [ ( ) ] P R r P R r B E U x x x ⎢ ⎥ ⎢ ⎥ ⎣ ⎦ ⎣ ⎦ x x x x x r x - d = # {x} = the number of x-observations - e = # {x < B}, viz., zeroes and confidential values Can derive − − 2 ( 1) 1 1 B B B ≤ + − ≥ [ ] ( ) [ | 1] - E U e d e E q 2 6 q − B B − ( ( ) ) R r r 2 2 1 ( 1) B − − 1 B [ ] ∑ E = ( ) 2 2 r = s s 1 − − − ( ( ) ) R r r 2 = ( 1)(2 1) B B [ ] E 6 B 1 ≥ [ | 1] NSO can estimate E q q So, NSO can select B so that the expected conditional Chi-square value is not statistically significant

EFFECTIVENESS FOR DISCLOSURE LIMITATION Evaluate effectiveness of rounding for SDL in terms of posterior predictive probabilities P[x=r|R(r) = 0] P[x|R(x)=0] x=r 50/50 {r = 0} 1 B + R(x)=0 1 2 { B + } 1 Confidentiality analysis - prior r-probabilities uniform on {0, 1, …, B-1} - ideally, posterior probabilities uniform on same set - or, if x=0 is not a confidential value, then uniform over its B-1 nonzero values - if x=r=0 is not confidential, under 50/50 rounding posterior probabilities are uniform over the confidential values Reference: Cox and Kim (2006)

COMPLEMENTARY CELL SUPPRESSION p-PERCENT RULE For magnitude data , each respondent ( contributor ) to the value of cell X contributes an individual amount, e.g., - monthly sales for a clothing store - weekly payroll for a factory - number of patient visits for an emergency room Cell value of X is x = sum of all contributions x i to X ∑ = ≥ ≥ ≥ ; .... .... x x x x x 1 2 i i i The p-percent rule is designed to prevent narrow estimation of any contribution to a cell value by a second contributor or third party. It says: A tabulation cell X is a disclosure ( sensitive ) cell if, after subtracting the second largest contribution from the cell value, the remainder is within p-percent of the largest contribution Express p as a decimal (not a percent); e.g., 20% = 0.20 ∑ = − > Sensitivity expressed via ( ) (1/ ) 0 S X x p x p 1 i ≥ 3 i NB: Protecting largest from second largest protects all

p/q-AMBIGUITY RULE In addition to p-percent protection, data releaser assumes intruder can estimate any contribution within q-percent Express q as decimal : q < 1 and, of course, q >> p ∑ = − > Sensitivity expressed via ( ) ( / ) 0 S X x q p x p q / 1 i ≥ 3 i Thus, p/q-ambiguity rule is stricter than p-percent rule, viz., all p-percent sensitive cells are p/q-sensitive When q = 1: p/q-ambiguity rule = p-percent rule Disclosure limitation method must take into account the ability of the intruder to estimate within q-percent

CCS - suppress from publication all sensitive cells - the disclosure rule enables releaser to compute for each sensitive cell the minimum uncertainty in estimation required to protect the cell - that quantity is dependent on the distribution of contributions within the cell and differs from cell to cell and cell value to cell value - it is called X ’s protection limit r(X) = r - select other, nonsensitive cells whose suppression will render the tabulations safe according to the disclosure rule--the complementary suppressions - safe means that no interval for x finer than [x-r, x+r] is derivable from released tabulations - select the complementary suppressions optimally with respect to some information loss criterion, e.g., # total value suppressed # total number of suppressions # Berg entropy - very complex mathematically/computationally - for the p/q-rule, the mathematical suppression must take into account the ability of the intruder to estimate values to within q-percent

Mathematical models for CCS Tabular structure is represented as Ay = 0 Entries of A = -1, 0, +1 Original data: a = (a 1 ,…., a n ); Aa = 0 Sensitive cell values: a d(i) , i = 1, …, s Protection: r d(i) , 0 < r d(i) < a d(i) , and r k = 0 otherwise CCS Models ∑ min c z k k k = = = 1,...., ; 1,2; 1,...., : i s j k n = 0 Ay , i j − ≤ ≤ − (1 ) a z y a r z ,1, k k i k k k k + ≥ ≥ + (1 ) a z y a r z ,2, k k i k k k k = = 0,1; 1 z z j d i ( ) Minimize number of cells suppressed: c k = 1 Minimize total value suppressed c k = a k Minimize Berg entropy c k = log (1+ a k )

Suppression done “by hand” can be vulnerable 3x3x3 contingency table, all internal entries suppressed ⎛ ⎞ ⎛ ⎞ ⎛ ⎞ ⎛ ⎞ ⎛ ⎞ ⎛ ⎞ * * * 11 * * * 5 * * * 5 ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ * * * 5 * * * 11 * * * 5 ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎝ * * * ⎠ ⎝ 5 ⎠ ⎝ * * * ⎠ ⎝ 5 ⎠ ⎝ * * * ⎠ ⎝ 11 ⎠ ( ) ( ) ( ) 11 5 5 (21) 5 11 5 (21) 5 5 11 (21) ⎛ ⎞ 1 10 10 ⎜ ⎟ 10 1 10 ⎜ ⎟ ⎜ ⎟ 10 10 1 ⎝ ⎠ Unique solution: ⎛ ⎞ ⎛ ⎞ ⎛ ⎞ 1 5 5 0 5 0 0 0 5 ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ 5 0 0 5 1 5 0 0 5 ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎝ 5 0 0 ⎠ ⎝ 0 5 0 ⎠ ⎝ 5 5 1 ⎠ contains three 1’s-- DISCLOSURE

TOWARDS A QUALITY ASSESSMENT OF DISCLOSURE-LIMITED STATISTICAL DATA - PDF document

TOWARDS A QUALITY ASSESSMENT OF DISCLOSURE-LIMITED STATISTICAL DATA Lawrence H. Cox, Ph.D. National Center for Health Statistics LCOX@CDC.GOV QUALITY-CONFIDENTIALITY TRADEOFF To reduce risk of statistical disclosure to an acceptable level ,

External Quality Assessment AIM of QUALITY SYSTEM AIM of QUALITY SYSTEM The aim of QUALITY SYSTEM

RAW CASHEW NUT QUALITY RAW CASHEW NUT QUALITY RAW CASHEW NUT QUALITY RAW CASHEW NUT QUALITY RAW

Student Assessment in Scarsdale Education Report November, 2016 Assessment Defined Purposes

7-Speech Quality Assessment Quality Levels Subjective Tests Objective Tests Intelligibility

Mail Service Quality Support: Mail Service Quality Support: Mail Service Quality Support: Mail

Molecule Screen and Cell Quality Molecule Screen and Cell Quality Assessment Assessment

Disclosure Disclosure Caries Management by Risk Caries Management by Risk Assessment

eQualite: eQualite: Quality Assessment Quality Assessment of of Software Suppliers Software

Assessment at SCIS February 2019 Why do we need assessment? How does assessment align with

Agritech Agritech Agritech Limited Agritech Agritech

New quality paradigm: New quality paradigm: Quality by Design Quality by Design ICH

The National Assessment of Education Quality in China National Assessment of Education Quality

ASX:LEG Investor Presentation July 2019 DISCLOSURE This is not a disclosure document. Any

Web Services Web Services Towards Web Services Towards Web Services Towards Web Services A

Towards an Italian RSG ? Towards an Italian RSG ? Achille Zappa achille.zappa@gmail.com

Towards Deep Multi-View Stereo Silvano Galliani October 2, 2017 1 / 40 Towards Deep Multi-View

DRUG RELEASE METHODOLOGIES FOR NANOMEDICINES ADDRESSING CHALLENGES PROF. PADMA V. DEVARAJAN

Turning Your Organization Into a Leadership Talent Factor Paul Tesluk Donald S. Carmichael

Lions College 2007-2008 Action Research Improving students completion of homework through

THE ESSENTIALS Camp History and Philosophy Community Partnerships Camp Dates

Factors affecting the water extractable phosphorus from compost M. Grigatti , L. Cavani, S.

Development of catch control devices Results from: Industral project (FHF) "

Alert!!! Today, the Nosocomial* infection rates have reached alarming levels in the Hospitals. The

Indemnification and Limitations of Liability August 29, 2019 Presented by: Kevin T. Wills,

TOWARDS A QUALITY ASSESSMENT OF DISCLOSURE-LIMITED STATISTICAL DATA - PDF document

TOWARDS A QUALITY ASSESSMENT OF DISCLOSURE-LIMITED STATISTICAL DATA Lawrence H. Cox, Ph.D. National Center for Health Statistics LCOX@CDC.GOV QUALITY-CONFIDENTIALITY TRADEOFF To reduce risk of statistical disclosure to an acceptable level ,

External Quality Assessment AIM of QUALITY SYSTEM AIM of QUALITY SYSTEM The aim of QUALITY SYSTEM

RAW CASHEW NUT QUALITY RAW CASHEW NUT QUALITY RAW CASHEW NUT QUALITY RAW CASHEW NUT QUALITY RAW

Student Assessment in Scarsdale Education Report November, 2016 Assessment Defined Purposes

7-Speech Quality Assessment Quality Levels Subjective Tests Objective Tests Intelligibility

Mail Service Quality Support: Mail Service Quality Support: Mail Service Quality Support: Mail

Molecule Screen and Cell Quality Molecule Screen and Cell Quality Assessment Assessment

Disclosure Disclosure Caries Management by Risk Caries Management by Risk Assessment

eQualite: eQualite: Quality Assessment Quality Assessment of of Software Suppliers Software

Assessment at SCIS February 2019 Why do we need assessment? How does assessment align with

Agritech Agritech Agritech Limited Agritech Agritech

New quality paradigm: New quality paradigm: Quality by Design Quality by Design ICH

The National Assessment of Education Quality in China National Assessment of Education Quality

ASX:LEG Investor Presentation July 2019 DISCLOSURE This is not a disclosure document. Any

Web Services Web Services Towards Web Services Towards Web Services Towards Web Services A

Towards an Italian RSG ? Towards an Italian RSG ? Achille Zappa achille.zappa@gmail.com

Towards Deep Multi-View Stereo Silvano Galliani October 2, 2017 1 / 40 Towards Deep Multi-View

DRUG RELEASE METHODOLOGIES FOR NANOMEDICINES ADDRESSING CHALLENGES PROF. PADMA V. DEVARAJAN

Turning Your Organization Into a Leadership Talent Factor Paul Tesluk Donald S. Carmichael

Lions College 2007-2008 Action Research Improving students completion of homework through

THE ESSENTIALS Camp History and Philosophy Community Partnerships Camp Dates

Factors affecting the water extractable phosphorus from compost M. Grigatti , L. Cavani, S.

Development of catch control devices Results from: Industral project (FHF) &quot;

Alert!!! Today, the Nosocomial* infection rates have reached alarming levels in the Hospitals. The

Indemnification and Limitations of Liability August 29, 2019 Presented by: Kevin T. Wills,

Development of catch control devices Results from: Industral project (FHF) "