Informing Decisions while Protecting Privacy: The Future of the Federal Statistical System
Katharine G. Abraham University of Maryland, NBER and IZA Data Science for the Public Good Forum September 17, 2019
statistical agencies for understanding our economy and society
Evidence-Based Policymaking two years ago this month
‒ Commission grew out of bipartisan interest in making better use of the data the Federal government holds while respecting rights to privacy and confidentiality
‒ Anniversary of the Commission’s report is a good occasion to take stock of the statistical agencies and where they’re headed
comes from surveys of households and businesses. Some examples:
‒ Poverty
‒ Health insurance coverage
‒ Crime victimization
‒ Employment and unemployment
‒ Wage rates and annual earnings
‒ Retail sales
‒ Consistent questions should produce consistent estimates over time
‒ Respondents provide information under a pledge of confidentiality (though understanding of what it means to honor that pledge is evolving)
survey respondents
‒ Respondents less motivated?
‒ Size of survey samples limits detail in published estimates
Source: Meyer, Mok and Sullivan (2015), adapted and updated
[Figure: Proportional bias in mean program dollars, by program (AFDC/TANF, FSP/SNAP, OASI, SSDI, SSI, UI, WC) and survey (SIPP, CPS, ACS, PSID, CE)]
Source: Meyer, Mok, and Sullivan (2015), by program and survey, 2000-2012
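As a rough illustration of the measure plotted in this figure, the sketch below assumes that proportional bias is the gap between the survey-based mean and the administrative benchmark, divided by the benchmark. That definition, the function name, and the dollar amounts are assumptions made for illustration, not figures from Meyer, Mok and Sullivan.

```python
def proportional_bias(survey_mean: float, admin_mean: float) -> float:
    """Return (survey - admin) / admin.

    Negative values mean the survey understates program dollars
    relative to the administrative benchmark.
    """
    if admin_mean == 0:
        raise ValueError("administrative benchmark must be non-zero")
    return (survey_mean - admin_mean) / admin_mean


# Hypothetical numbers: the survey captures $600 of every $1,000 paid out.
example_bias = proportional_bias(survey_mean=600.0, admin_mean=1000.0)
print(round(example_bias, 2))
```

Under this definition a value of zero means the survey total matches the administrative records, and values further below zero mean more benefit dollars go unreported in the survey.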
‒ Example: In 1990, 87% of the U.S. population had reported characteristics that likely made them unique based only on 5-digit ZIP code, gender, and date of birth (Sweeney 2002)
sample file can be matched to same variables in public records or other accessible information
by Massachusetts Group Insurance Commission of hospital records for Governor William Weld, based on sex, date of birth and zip code linked to voter records
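The re-identification risk described above can be sketched in a few lines. This is a minimal illustration, assuming a de-identified sample file and a public register that share three quasi-identifiers; every record, name, and field label below is hypothetical.

```python
from collections import Counter

# Hypothetical de-identified sample file: names removed, quasi-identifiers kept.
SAMPLE = [
    {"zip": "00001", "sex": "M", "dob": "1950-01-01", "diagnosis": "A"},
    {"zip": "00001", "sex": "F", "dob": "1960-01-15", "diagnosis": "B"},
    {"zip": "00002", "sex": "F", "dob": "1972-03-02", "diagnosis": "C"},
    {"zip": "00002", "sex": "F", "dob": "1972-03-02", "diagnosis": "D"},
]

# Hypothetical public register (for example, a voter list) with names attached.
PUBLIC = [
    {"name": "Person One", "zip": "00001", "sex": "M", "dob": "1950-01-01"},
    {"name": "Person Two", "zip": "00002", "sex": "F", "dob": "1972-03-02"},
]

QUASI_IDENTIFIERS = ("zip", "sex", "dob")


def key(record):
    """Build the matching key from the quasi-identifiers."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def share_unique(records):
    """Fraction of records whose quasi-identifier combination appears only once."""
    counts = Counter(key(r) for r in records)
    return sum(1 for r in records if counts[key(r)] == 1) / len(records)


def link(sample, public):
    """Attach names to sample records that are unique on the quasi-identifiers."""
    counts = Counter(key(r) for r in sample)
    unique = {key(r): r for r in sample if counts[key(r)] == 1}
    return [(p["name"], unique[key(p)]["diagnosis"])
            for p in public if key(p) in unique]


print(share_unique(SAMPLE))
print(link(SAMPLE, PUBLIC))
```

Only records that are unique on the matching variables link unambiguously, which is why the share of unique records is a common first-pass indicator of disclosure risk.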
increase risk of a disclosure
Source: Krenzke and Li (2019)
information
‒ Example: Query tool may preclude reporting for samples that are too small, but results that are individually acceptable may reveal information about smaller implicit samples
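The query-tool example can be made concrete with a small differencing sketch. The threshold, the records, and the `query_total` function are hypothetical; the point is only that two answers that each pass a minimum-cell-size rule can be subtracted to recover a value for a cell the tool would refuse to report directly.

```python
MIN_CELL_SIZE = 3  # hypothetical suppression threshold

# Hypothetical microdata behind the query tool.
RECORDS = [
    {"county": "A", "age": 34, "income": 40000},
    {"county": "A", "age": 41, "income": 52000},
    {"county": "A", "age": 29, "income": 38000},
    {"county": "A", "age": 70, "income": 91000},
]


def query_total(records, predicate, min_n=MIN_CELL_SIZE):
    """Return total income for matching records, or None if the cell is too small."""
    matched = [r["income"] for r in records if predicate(r)]
    if len(matched) < min_n:
        return None  # suppressed
    return sum(matched)


# Each of these two queries covers at least three people, so both are answered.
everyone = query_total(RECORDS, lambda r: r["county"] == "A")
under_65 = query_total(RECORDS, lambda r: r["county"] == "A" and r["age"] < 65)

# The direct query covers one person and is suppressed...
direct = query_total(RECORDS, lambda r: r["county"] == "A" and r["age"] >= 65)

# ...but the difference of the two permitted answers reveals the same value.
implied = everyone - under_65
print(direct, implied)
```

Checking each query in isolation is therefore not enough; the protection has to account for what combinations of released results imply.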
‒ More than 7.7 billion linearly independent statistics (about 25 statistics per person) published from 2010 Census data
‒ Can be shown that it is possible to infer information about individuals through comparisons across tables (Garfinkel, Abowd and Martindale 2018)
administering government programs.
‒ Income tax returns (household and business)
‒ Unemployment insurance wage records
‒ Social assistance program applications and benefit receipt histories (e.g., TANF, SNAP, housing assistance)
‒ Social Security and Medicare records
‒ School records
‒ Customs declarations
‒ Census Bureau authorized to obtain administrative data
‒ Other statistical agencies do not have same authority
‒ Transparency in use of data
‒ Opportunity for public comment
‒ Ensure that data releases do not reveal information about individuals
‒ For microdata: Coarsening categorical variables, top-coding continuous variables, noise infusion, data swapping
‒ Tabular releases: Cell suppression (Swiss cheese tables), noise infusion and data swapping in underlying microdata, cell value rounding
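Three of the techniques listed above are simple enough to sketch directly. The function names, the income cap, the band width, and the rounding base are illustrative assumptions, not any agency's actual parameters.

```python
def top_code(value: float, cap: float) -> float:
    """Top-coding: replace any value above the cap with the cap itself."""
    return min(value, cap)


def coarsen_age(age: int, width: int = 10) -> str:
    """Coarsening: report an age band instead of an exact age."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def round_cell(count: int, base: int = 5) -> int:
    """Cell value rounding: publish the nearest multiple of `base`.

    Integer arithmetic is used so that ties (possible only for an even
    base) always round up.
    """
    return base * ((count + base // 2) // base)


# Hypothetical record and table cell.
print(top_code(250000, cap=150000))
print(coarsen_age(37))
print(round_cell(12), round_cell(13))
```

Each transformation trades detail for protection: the published value is still useful in aggregate but no longer pins down an unusual individual or a small cell exactly.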
disclosure associated with a data release
‒ Measure pertains to most vulnerable case in data
‒ Risk controlled by adding noise to output data
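If the approach described here is differential privacy (named later in the talk), one standard way to add noise to an output is the Laplace mechanism. The sketch below is a textbook illustration under that assumption, not the Census Bureau's production method; the function names and the choice of a counting query with sensitivity 1 are assumptions.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count: float, epsilon: float, rng: random.Random,
                sensitivity: float = 1.0) -> float:
    """Release a count with Laplace noise of scale sensitivity / epsilon.

    `sensitivity` is the most that adding or removing one person can change
    the count, so the noise is calibrated to the most exposed case.
    A smaller epsilon means more noise and stronger protection.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(2019)  # fixed seed so the sketch is repeatable
print(noisy_count(100, epsilon=1.0, rng=rng))
print(noisy_count(100, epsilon=0.1, rng=rng))
```

The parameter `epsilon` makes the tradeoff explicit: lowering it reduces what any release can reveal about one person, at the cost of less accurate published statistics.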
Speaker Paul Ryan (R-WI) and Senator Patty Murray (D-WA)
‒ Signed into law March 30, 2016
‒ Determine optimal arrangement under which administrative data, survey data, and related statistical data series may be integrated and made available for evidence building while protecting privacy and confidentiality.
‒ Consider whether a clearinghouse for program and survey data should be established and how to create such a clearinghouse.
‒ Make recommendations on how best to incorporate evidence building into program design.
Minority Leader, and the Senate Majority and Minority Leaders
‒ 1/3 experts on privacy; 2/3 experts on program administration, data, or research
received and distilled areas of agreement into 22 recommendations
‒ Recommendations endorsed by all 15 Commissioners
researchers to data and facilitate linking of data sets
across government and not sufficiently dynamic in face of changing risks associated with use of data
and actors inside and outside government, including the establishment of a single entity to better support access and privacy
recommendations co-filed in House by Speaker Ryan and in Senate by Senator Murray
‒ Passed quickly through the House
‒ Voted out of Senate on December 19, 2018
‒ Law signed by President Trump on January 14, 2019
made available to statistical agencies for use in building evidence (i.e., for statistical purposes)
make recommendations regarding the coordination and availability of data
researchers, state and local governments and other entities to access data for evidence-building purposes
the management and governance of data at each agency in collaboration with the agency’s statistical officials
Efficiency Act of 2002
‒ Protects information collected for statistical purposes
‒ Violation a Class E felony (up to 5 years in prison and/or a fine of up to $250,000)
‒ Establishes responsibilities for statistical agencies
analyses of data sensitivity
agendas)
the value of federal data for mission, service and the public good by guiding the Federal Government in practicing ethical governance, conscious design and learning culture.”
‒ Ethical governance includes building in checks and balances; practicing effective data stewardship, protecting individual privacy, maintaining promised confidentiality, and ensuring appropriate access and use; and promoting transparency.
‒ Conscious design includes protecting data quality and integrity; harnessing existing data; anticipating future uses when new data collections are designed; and demonstrating responsiveness.
‒ Learning culture includes investing in data infrastructure and human resources; developing data leaders; and practicing accountability.
Fall 2019
towards making better use of administrative data while protecting privacy
Evidence-Based Policymaking. Still to be addressed
‒ Legal barriers that preclude legitimate statistical uses of administrative data
‒ Institutional capacity within the federal government to facilitate data linkages and drive implementation of new privacy protection methodologies
recommendations about how to do this
“big data”. Some examples:
‒ Prices and product characteristics posted to the Web
‒ Scanner data from retail outlets
‒ Credit card transactions data (e.g., JP Morgan Chase data, Spending Pulse MasterCard data)
‒ Medical records data
‒ Sensor data (e.g., satellite imaging, traffic cameras)
‒ GPS tracking data (e.g., tractors, trucks)
‒ Fill in missing information (e.g., industry, franchise status)
‒ Improve early estimates by providing timely information about recent trends
‒ Inform modeled estimates for local geographies
‒ More ambitiously, allow agencies to rethink how core estimates are produced
‒ Cost of acquiring data
‒ Suitability for use in producing official statistics
‒ Privacy and confidentiality
centralized agency approach
‒ Inefficient and possibly counterproductive for agencies all to be developing separate relationships with private sector data providers
‒ Expect new model to involve tiered access together with expanded capacity for external researchers to work behind a (virtual) firewall
results from 2020 Census
‒ Expect use of differential privacy to spread
methods
privacy and information
produce the data their customers need
‒ Better data
‒ Stronger privacy and confidentiality protections
Strategy are exciting developments
‒ Creating new opportunities