De-Identifying Education Data: Future of Privacy Forum Webinar



  1. De-Identifying Education Data. Future of Privacy Forum Webinar, October 13, 2017. Michael Hawes, Director of Student Privacy Policy, U.S. Department of Education, Student Privacy Policy and Assistance Division.

  2. What is PII?

  3. Personally Identifiable Information: Captain Hook

  4. Personally Identifiable Information: a one-handed pirate with an irrational fear of crocodiles and ticking clocks

  5. FERPA: Personally Identifiable Information (PII)
     • Direct identifiers: e.g., name, SSN, student ID number (1:1 relationship to the student)
     • Indirect identifiers: e.g., birthdate, demographic information (1:many relationship to the student)
     • “Other information that, alone or in combination, is linked or linkable to a specific student that would allow a reasonable person in the school community, who does not have personal knowledge of the relevant circumstances, to identify the student with reasonable certainty.” (§ 99.3)

  6. FERPA’s Confidentiality Standard: Can a “reasonable person” in the school community re-identify the individual with reasonable certainty? For tabular data, a small degree of uncertainty (“reasonable doubt”) is often sufficient (e.g., “the rule of 3”). For individual-level data, the abundance of data points for each individual, the availability of easy-to-use data-manipulation and data-mining tools, and the ability to link to external data sources make the risk of re-identification much higher.

  7. FERPA vs. HIPAA’s “Safe Harbor”

  8. PII? “But I’m only releasing aggregate data…” Aggregate data tables can still contain PII if they report information on small groups, or on individuals with unique or uncommon characteristics.

  9. How States Are Doing It: Under ESEA, states adopted minimum n-size rules to protect student privacy in aggregate reports, but there is substantial variation across states in the minimum n-size selected. [Chart: State Adoption of Sub-Group Suppression Rules (circa 2012); number of states by suppression rule (n < X)] There is also substantial variation in how states have interpreted and implemented those minimum n-size requirements.
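To make the rule concrete, here is a minimal sketch (not from the deck) of applying a minimum n-size rule in Python; the threshold of 5, the subgroup names, and the counts are hypothetical:

```python
# Minimal sketch of a minimum n-size rule for an aggregate report.
# The threshold and the counts below are hypothetical.
MIN_N = 5

rows = [
    {"subgroup": "Subgroup A", "n_tested": 42, "n_proficient": 25},
    {"subgroup": "Subgroup B", "n_tested": 3,  "n_proficient": 2},   # below the threshold
]

for row in rows:
    if row["n_tested"] < MIN_N:
        print(f"{row['subgroup']}: *")          # suppress everything for small groups
    else:
        pct = 100 * row["n_proficient"] / row["n_tested"]
        print(f"{row['subgroup']}: {row['n_tested']} tested, {pct:.1f}% proficient")
```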

  10. Small cells increase disclosure risk… but suppressing the small cells may not be sufficient.

  11. Common Mistakes in Public Reporting

  12. Population Size vs. Cell Size. Assume a minimum n-size rule of 5:
      Subgroup   | # Tested | # Proficient | % Proficient
      Subgroup 1 | 6        | 1            | 16.7%

  13. Population Size vs. Cell Size. Assume a minimum n-size rule of 5:
      Subgroup   | # Tested | # Proficient | % Proficient
      Subgroup 1 | 6        | 1            | 16.7%
      What if I’m that 1 student? I now know something about the other 5!
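One way to guard against this, sketched below as an illustration rather than a prescribed method, is to require not only the group size but also the numerator and its complement to meet the threshold; the function name and example values are hypothetical:

```python
MIN_N = 5

def safe_to_report(n_tested: int, n_proficient: int, min_n: int = MIN_N) -> bool:
    """Require the group, the numerator, and its complement to all meet the threshold."""
    n_not_proficient = n_tested - n_proficient
    return (n_tested >= min_n
            and n_proficient >= min_n
            and n_not_proficient >= min_n)

# Subgroup 1 from the slide: 6 tested, 1 proficient.
print(safe_to_report(6, 1))   # False: only 1 proficient and 5 not proficient
print(safe_to_report(20, 9))  # True under this illustrative rule
```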

  14. Fixed Top/Bottom Coding Thresholds. Assume a minimum n-size rule of 5:
      Subgroup   | # Tested | # Proficient | % Proficient
      Subgroup 1 | 8        | *            | <5%

  15. Fixed Top/Bottom Coding Thresholds. Assume a minimum n-size rule of 5:
      Subgroup   | # Tested | # Proficient | % Proficient
      Subgroup 1 | 8        | *            | <5%
      0/8 = 0%; 1/8 = 12.5%. So “<5%” of 8 students = 0 students!
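The arithmetic behind the slide can be checked directly; the short sketch below (illustrative only) enumerates which counts are even consistent with a coded range:

```python
def feasible_counts(n: int, pct_upper: float):
    """Counts k (0..n) whose percentage k/n falls strictly below pct_upper."""
    return [k for k in range(n + 1) if 100 * k / n < pct_upper]

# "<5%" of 8 students is only consistent with k = 0, so the code discloses the value.
print(feasible_counts(8, 5))    # [0]
# With a larger denominator the same code genuinely blurs the count.
print(feasible_counts(300, 5))  # [0, 1, ..., 14]
```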

  16. A Better Approach for Handling Extreme Values
      Number of Students (denominator) | Top/Bottom Coding for Percentages
      1-5                              | Suppressed
      6-15                             | <50%, ≥50%
      16-30                            | ≤20%, ≥80%
      31-60                            | ≤10%, ≥90%
      61-300                           | ≤5%, ≥95%
      301-3,000                        | ≤1%, ≥99%
      3,001 or more                    | ≤0.1%, ≥99.9%
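A minimal sketch of how the denominator-dependent coding in the table might be implemented; the band boundaries come from the slide, but the function itself is illustrative, not official Department code:

```python
# Denominator-dependent top/bottom coding following the bands in the table above.
BANDS = [
    (1,    5,    None),   # 1-5 students: suppress the value entirely
    (6,    15,   50.0),   # 6-15: report only <50% or >=50%
    (16,   30,   20.0),   # 16-30: top/bottom-code at <=20% / >=80%
    (31,   60,   10.0),
    (61,   300,  5.0),
    (301,  3000, 1.0),
    (3001, None, 0.1),    # 3,001 or more
]

def report_percentage(count: int, n: int) -> str:
    pct = 100.0 * count / n
    for lo_n, hi_n, cut in BANDS:
        if n >= lo_n and (hi_n is None or n <= hi_n):
            if cut is None:
                return "*"                             # group too small: suppress
            if cut == 50.0:                            # 6-15 students: halves only
                return "<50%" if pct < 50 else ">=50%"
            if pct <= cut:
                return f"<={cut:g}%"                   # bottom-coded
            if pct >= 100 - cut:
                return f">={100 - cut:g}%"             # top-coded
            return f"{round(pct)}%"                    # mid-range values reported
    raise ValueError("n must be at least 1")

print(report_percentage(1, 8))    # '<50%' instead of a disclosive '<5%'
print(report_percentage(2, 100))  # '<=5%'
print(report_percentage(57, 80))  # '71%'
```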

  17. What’s the missing number? 12, 8, 14, ?, 6

  18. What’s the missing number? 12, 8, 14, ?, 6 (Total: 44)

  19. What’s the missing number? 12, 8, 14, 4, 6 (Total: 44)

  20. What’s the missing number? 12, 8, 14, CENSORED, 6, 44, CENSORED

  21. What’s the missing number? Students by Subgroup: 12, 8, 14, 4, 6 (Total: 44). Students by Gender: 20, 24 (Total: 44).
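The recovery shown in this sequence is simple subtraction from the published total; a brief illustrative sketch:

```python
# Recovering a single suppressed cell from its published marginal total.
published = [12, 8, 14, None, 6]   # None marks the suppressed cell
total = 44                          # published total for the same group

if published.count(None) == 1:
    missing = total - sum(v for v in published if v is not None)
    print(f"Suppressed cell must be {missing}")   # 44 - 40 = 4
```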

  22. Lack of Complementary Suppression
      Subgroup     | # Tested | Advanced | Proficient | Basic | Below Basic
      Subgroup 1   | 11       | 0%       | 45%        | 36%   | 18%
      Subgroup 2   | 1        | *        | *          | *     | *
      All Students | 12       | 0%       | 42%        | 42%   | 17%

  23. Lack of Complementary Suppression
      Subgroup     | # Tested | Advanced | Proficient | Basic | Below Basic
      Subgroup 1   | 11       | 0%       | 45%        | 36%   | 18%
      Subgroup 2   | 1        | *        | *          | *     | *
      All Students | 12       | 0%       | 42%        | 42%   | 17%

  24. Lack of Complementary Suppression
      Subgroup     | # Tested | Advanced | Proficient | Basic | Below Basic
      Subgroup 1   | 11       | 0%       | 45%        | 36%   | 18%
      Subgroup 2   | 1        | *        | *          | 100%  | *
      All Students | 12       | 0%       | 42%        | 42%   | 17%
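The deduction works mechanically from the published figures; the sketch below (with counts reconstructed from the slide’s percentages) reproduces it:

```python
# Back-calculating a suppressed subgroup from the "All Students" row (slide data).
# Percentages are converted to counts using the published "# Tested" denominators.
n_all, n_sub1, n_sub2 = 12, 11, 1

def counts(n, pcts):
    return [round(n * p / 100) for p in pcts]

all_students = counts(n_all,  [0, 42, 42, 17])   # Advanced, Proficient, Basic, Below Basic
subgroup_1   = counts(n_sub1, [0, 45, 36, 18])

subgroup_2 = [a - s for a, s in zip(all_students, subgroup_1)]
print(subgroup_2)                                         # [0, 0, 1, 0] -> the one student is Basic
print([f"{100 * c / n_sub2:.0f}%" for c in subgroup_2])   # ['0%', '0%', '100%', '0%']
```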

  25. The Trouble with Cell Size Rules. Remember: it’s not just the small cells that are important. Bigger cells/values can still be disclosive if:
      • they are extreme values (e.g., ~0% or ~100% of students in a group), or
      • they can be used to calculate the values of protected cells elsewhere (in the same table, or even in another data release!)

  26. Take Home Point: Consider All Reporting Levels. Education data are often reported in a multi-dimensional structure. To be effective, a disclosure avoidance methodology must consider all levels of aggregation.

  27. Lack of Complementary Suppression
      Subgroup     | # Tested | Advanced | Proficient | Basic | Below Basic
      Subgroup 1   | 11       | 0%       | 45%        | 36%   | 18%
      Subgroup 2   | 1        | *        | *          | 100%  | *
      All Students | 12       | 0%       | 42%        | 42%   | 17%

  28. Complementary Suppression. When using suppression to protect privacy, consider all the ways that data are aggregated: Subgroup → Grade → School → District → State → National. If a cell is suppressed (primary or complementary) at one level, it needs to be suppressed in at least one other reporting entity at the next level of aggregation. And make sure that those additional entities have proper complementary suppression too!
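As a rough illustration of the rule above (not a prescribed algorithm), the sketch below flags a school whose suppressed subgroup cell has no complementary suppression among sibling schools in the same district; the districts, schools, and counts are all hypothetical:

```python
# Check that a suppressed subgroup cell in one school has a complementary
# suppression in at least one sibling school of the same district.
# Hypothetical data; None marks a suppressed cell.
school_counts = {
    ("District A", "School 1"): {"Subgroup 1": None},  # primary suppression
    ("District A", "School 2"): {"Subgroup 1": 18},
    ("District A", "School 3"): {"Subgroup 1": 22},
}

for (district, school), cells in school_counts.items():
    for subgroup, value in cells.items():
        if value is None:
            siblings = [s for (d, s), c in school_counts.items()
                        if d == district and s != school and c.get(subgroup) is None]
            if not siblings:
                print(f"{school}/{subgroup}: recoverable from the {district} total "
                      f"-- needs complementary suppression in a sibling school")
```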

  29. Take Home Point: Data Releases by Others. When performing a disclosure risk analysis, you must consider data releases made by other organizations. How schools, districts, states, and the Federal government release the same (or related) data may impact the re-identifiability of the data you (or they) release!

  30. So What Are Your Options? The 3 “Flavors” of Disclosure Avoidance Techniques:
      • Suppression
      • “Blurring”
      • Perturbation

  31. Suppression
      Definition: Removing data to prevent the identification of individuals in small cells or with unique characteristics.
      Examples: cell suppression, row suppression, sampling.
      Effect on data utility: results in very little data being produced for small populations; requires suppression of additional, non-sensitive data (e.g., complementary suppression).
      Residual risk of disclosure: suppression can be difficult to perform correctly (especially for large multi-dimensional tables); if additional data is available elsewhere, the suppressed data may be re-calculated.

  32. “Blurring”
      Definition: Reducing the precision of data that is presented to reduce the certainty of identification.
      Examples: aggregation, percents, ranges, top/bottom-coding, rounding.
      Effect on data utility: users cannot make inferences about small changes in the data; reduces the ability to perform time-series or cross-case analysis.
      Residual risk of disclosure: generally low risk, but if row/column totals are published (or available elsewhere) then it may be possible to calculate the actual values of sensitive cells.
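For illustration only, a short sketch of two common blurring operations, banding a percentage into a range and rounding a count; the band width and rounding base are arbitrary choices:

```python
# Two simple "blurring" transformations: banding a percentage into a range,
# and rounding a count to the nearest 5 (illustrative parameter choices).
def band_percentage(pct: float, width: int = 10) -> str:
    low = int(pct // width) * width
    return f"{low}-{min(low + width, 100)}%"

def round_count(count: int, base: int = 5) -> int:
    return base * round(count / base)

print(band_percentage(16.7))   # '10-20%'
print(round_count(44))         # 45
```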
