Economics and the economics of privacy: new methods of accessing new data
Lars Vilhuber1
1Labor Dynamics Institute, ILR, Cornell University, United States
November 2015, UQAM, Montréal, Canada
Vilhuber UQAM2015 1 / 96
Disclaimer Context Replicability Confidentiality Conclusion
◮ Vilhuber’s work is partially funded by NSF Grants
This paper reports the results of research and analysis undertaken by Census Bureau staff. It has undergone a more limited review by the Census Bureau than its official publications. This report is released to inform interested parties and to encourage discussion. Any findings, conclusions or opinions are those of the authors. They do not necessarily reflect those of the Center for Economic Studies, the U.S. Census Bureau, or the National Science Foundation.
Autor/Houseman doi:10.1257/app.2.3.96
Carrell and Hoekstra doi:10.1257/app.2.1.211
Damon doi:10.1257/app.2.2.147
◮ Sounded by young scholars pursuing research programs
◮ Geospatial relations,
◮ Exact genome data,
◮ Networks of all sorts,
◮ Linked administrative records
◮ These researchers acquire authorized, generally
◮ But...
◮ Replication of methods, data inputs, computational
◮ Journals, funding agencies (in the U.S.) have been moving
http://www.econometricsociety.org/submissions.asp#4 as cited in Anderson et al. (2005)
◮ Archiving (curation) of input data is complicated
◮ Knowledge discovery is complicated
◮ Census of articles in the American Economic Journal:
◮ Each article is analyzed for availability of replication
◮ If data and programs are available, reproducibility is tested.
Table: Replication Success
Table: Reason for Replication Failure
Year    Missing Data   Corrupted Data   Code Error   Missing Code   Sum
2010    15             1                1            2              19
2011    15             1                1            3              20
2013    12                                                          12
Total   42             2                2            5              51
Table: Reason for Missing Data
Table: Type of Access to Confidential Data
◮ article is open-access
◮ not clear about data access
(Huberman, Nature 482, 308 (16 February 2012) doi:10.1038/482308d)
◮ Biology (genetics data, chemical compounds)
◮ Computer science (search records, single-firm examples)
◮ Better documentation about confidential data
◮ Solving the reproducibility problem
◮ New disclosure limitation techniques
◮ New data access models
◮ openICPSR https://www.openicpsr.org/
◮ Harvard Dataverse
DS)
◮ Ontario Council of University Libraries:
DV, 5,289 files)
◮ Underutilized
◮ When integrated into journal workflows, useless (blobs of
◮ Review process scrutinizes article citations
◮ Would be easy to enforce data citations
Deschenes, Elizabeth Piper, Susan Turner, and Joan Petersilia. Intensive Community Supervision in Minnesota, 1990-1992: A Dual Experiment in Prison Diversion and Enhanced Supervised Release [Computer file]. ICPSR06849-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2000. doi:10.3886/ICPSR06849
Abowd, John M.; Vilhuber, Lars, 2014, "Replication data for: National estimates of gross employment and job flows from the Quarterly Workforce Indicators with demographic and industry detail", doi:10.7910/DVN/27923, Harvard Dataverse [Distributor], V2
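As a rough illustration of how a data-citation check could be automated during review, the sketch below scans reference strings for DOIs whose registrant prefix belongs to a data repository. The prefix table lists only the two prefixes visible in the example citations (10.3886 for ICPSR, 10.7910 for Harvard Dataverse). The function name `find_data_dois` and the regular expression are illustrative assumptions, not an existing journal tool.

```python
import re

# Registrant prefixes taken from the two example data citations only;
# a real check would need a longer, curated list (assumption).
DATA_DOI_PREFIXES = {
    "10.3886": "ICPSR",
    "10.7910": "Harvard Dataverse",
}

# Loose DOI matcher: registrant prefix, slash, then a suffix that runs
# until whitespace or common trailing punctuation.
DOI_PATTERN = re.compile(r'\b(10\.\d{4,9})/([^\s,;"\]]+)')


def find_data_dois(reference):
    """Return (doi, repository) pairs for data DOIs found in one reference."""
    hits = []
    for match in DOI_PATTERN.finditer(reference):
        prefix = match.group(1)
        suffix = match.group(2).rstrip(".")
        if prefix in DATA_DOI_PREFIXES:
            hits.append((prefix + "/" + suffix, DATA_DOI_PREFIXES[prefix]))
    return hits


icpsr_ref = "Deschenes et al. [Computer file]. ICPSR06849-v1. doi:10.3886/ICPSR06849"
dataverse_ref = 'Abowd; Vilhuber, 2014, "Replication data", doi:10.7910/DVN/27923, Harvard Dataverse'
article_ref = "Autor/Houseman doi:10.1257/app.2.3.96"
```

A reference list in which no entry yields a hit from `find_data_dois` would be a candidate for a "missing data citation" flag.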
Simmhan, Plale, and Gannon, “A survey of data provenance in e-science,” ACM Sigmod Record, 2005
◮ Capture the essential elements of programs, data, and
◮ Reproducible archives!
◮ Disclosure avoidance requests (Census RDC, German
◮ Lacking from existing repositories of both data and
◮ Exposure of data providers
◮ Sometimes manually (labor intensive) performed by data
◮ Not currently done on RePEc
◮ Deploy a graphical interface that maps co-author networks,
◮ ... and data provenance
◮ incoming: what data did an article use? (LDI Replication workshop scaled up)
◮ outgoing: what data did an article create? (Better tracking
◮ Users (or contributors!) can “claim” data, or if hosted on a
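The incoming and outgoing provenance links described above can be pictured as a small directed graph between articles and datasets. The sketch below is only an illustration under assumed names: the class `ProvenanceGraph`, its methods, and all identifiers such as `article:A` are hypothetical and do not describe any existing service.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Article/dataset links; every identifier used here is hypothetical."""

    def __init__(self):
        self._uses = defaultdict(set)     # article -> datasets it used (incoming)
        self._creates = defaultdict(set)  # article -> datasets it created (outgoing)

    def add_use(self, article, dataset):
        self._uses[article].add(dataset)

    def add_creation(self, article, dataset):
        self._creates[article].add(dataset)

    def incoming(self, article):
        """What data did this article use?"""
        return sorted(self._uses.get(article, ()))

    def outgoing(self, article):
        """What data did this article create?"""
        return sorted(self._creates.get(article, ()))

    def articles_using(self, dataset):
        """Which articles used this dataset?"""
        return sorted(a for a, used in self._uses.items() if dataset in used)


graph = ProvenanceGraph()
graph.add_use("article:A", "data:source-1")
graph.add_creation("article:A", "data:replication-A")
graph.add_use("article:B", "data:replication-A")
```

Keeping the two edge types separate lets the same dataset appear as one article's output and another article's input, which is the chain a graphical interface would need to draw.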
◮ RD-Switchboard, based on ORCID IDs
◮ Direct DataCite/ORCID efforts
Src: Univ. Edinburgh – Micro, remote, safe settings (safePODS) – extending a safe setting network across a country
◮ Providing detailed and accurate statistics
◮ Protecting privacy and confidentiality
◮ Let researchers run wild (with models)...
◮ ... and limit what can be removed (mostly ad hoc)
◮ RDCs
◮ remote processing with delay and cost
◮ Disclosure limitation (aggregation, swapping, suppression,
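\[
p(\delta_j) =
\begin{cases}
\dfrac{b - \delta}{(b - a)^2}, & \delta \in [a, b] \\[1ex]
\dfrac{b + \delta - 2}{(b - a)^2}, & \delta \in [2 - b, 2 - a] \\[1ex]
0, & \text{otherwise}
\end{cases}
\]
\[
F(\delta_j) =
\begin{cases}
0, & \delta < 2 - b \\[1ex]
\dfrac{(\delta + b - 2)^2}{2 (b - a)^2}, & \delta \in [2 - b, 2 - a] \\[1ex]
0.5, & \delta \in (2 - a, a) \\[1ex]
0.5 + \dfrac{(b - a)^2 - (b - \delta)^2}{2 (b - a)^2}, & \delta \in [a, b] \\[1ex]
1, & \delta > b
\end{cases}
\]
where $a = 1 + c/100$ and $b = 1 + d/100$ are constants chosen such that the true value is distorted by a minimum of $c$ percent and a maximum of $d$ percent.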
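One way to draw such a distortion factor is inverse-CDF sampling. The sketch below assumes the two-sided ramp shape: a density that falls linearly from a to b above 1 and is mirrored on [2 − b, 2 − a] below 1, with half the mass on each side. The closed-form inverses are derived from that assumption; the function names and the example values c = 10, d = 25 are illustrative and are not the parameters used in any production system.

```python
import math
import random


def ramp_cdf(delta, c, d):
    """CDF of the two-sided ramp distribution with bounds a=1+c/100, b=1+d/100."""
    a = 1 + c / 100
    b = 1 + d / 100
    if delta < 2 - b:
        return 0.0
    if delta <= 2 - a:
        return (delta + b - 2) ** 2 / (2 * (b - a) ** 2)
    if delta < a:
        return 0.5
    if delta <= b:
        return 0.5 + ((b - a) ** 2 - (b - delta) ** 2) / (2 * (b - a) ** 2)
    return 1.0


def sample_distortion_factor(c, d, rng):
    """Draw one factor by inverting the CDF at a uniform draw u."""
    a = 1 + c / 100
    b = 1 + d / 100
    u = rng.random()
    if u < 0.5:
        # Lower branch: solve (delta + b - 2)^2 / (2 (b - a)^2) = u.
        return 2 - b + (b - a) * math.sqrt(2 * u)
    # Upper branch: solve 0.5 + ((b - a)^2 - (b - delta)^2) / (2 (b - a)^2) = u.
    return b - (b - a) * math.sqrt(2 - 2 * u)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
draws = [sample_distortion_factor(10, 25, rng) for _ in range(10000)]
```

Every draw lies between c and d percent away from 1, so multiplying a confidential value by it distorts that value by at least c percent and at most d percent.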
$X^{*}_{jt}$ computed from confidential value $X_{jt}$ as
\[
X^{*}_{jt} = \delta_j X_{jt},
\]
◮ Users request account (no restrictions)
◮ Users run regression on synthetic data
◮ Users request validation against confidential data
[Figure: number of accounts by quarter, 2010Q4 to 2015Q4, for SSB and SynLBD; y-axis "Accounts" (ticks at 25, 50, 75, 100); annotated events: SSB v5.0 released, SynLBD v2 released, SSB v5.1 released, SSB training, SDS upgraded.]
k,m
k,m = 1
k,m over all estimated models and
Table: Confidence interval overlap $J^{*}_{k,m}$
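A confidence interval overlap measure is commonly computed, following Karr et al. (2006), as the average of two ratios: the length of the intersection of the confidential-data and synthetic-data intervals, divided by the length of each interval in turn. The sketch below implements that common form for a single coefficient. Whether the statistic reported here uses exactly this definition is an assumption, and truncating the overlap at zero for disjoint intervals is a choice made in this sketch.

```python
def ci_overlap(conf_lo, conf_hi, syn_lo, syn_hi):
    """Interval overlap between a confidential-data CI and a synthetic-data CI.

    Returns 1.0 for identical intervals and 0.0 for disjoint ones
    (truncation at zero is a choice of this sketch).
    """
    if conf_hi <= conf_lo or syn_hi <= syn_lo:
        raise ValueError("each interval must have positive length")
    # Length of the intersection of the two intervals, floored at zero.
    overlap = max(0.0, min(conf_hi, syn_hi) - max(conf_lo, syn_lo))
    # Average the share of each interval that the intersection covers.
    return 0.5 * (overlap / (conf_hi - conf_lo) + overlap / (syn_hi - syn_lo))


same = ci_overlap(0.0, 2.0, 0.0, 2.0)
shifted = ci_overlap(0.0, 2.0, 1.0, 3.0)
disjoint = ci_overlap(0.0, 1.0, 2.0, 3.0)
```

Averaging such values over coefficients and models gives a single utility summary per synthetic dataset.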
◮ Order matters!
◮ Data custodian must decide which queries (=tables) to
◮ Then leave remaining privacy budget to researchers (?)
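Why order matters can be seen in a minimal budget ledger. The sketch assumes differential privacy with basic sequential composition, where the privacy-loss parameters of successive releases add up. The class `PrivacyBudget`, the labels, and the numbers are hypothetical; they only illustrate that whatever the custodian spends first is no longer available to researchers.

```python
class PrivacyBudget:
    """Tracks cumulative privacy loss under basic sequential composition."""

    def __init__(self, total_epsilon):
        if total_epsilon <= 0:
            raise ValueError("total_epsilon must be positive")
        self.total = total_epsilon
        self.ledger = []  # list of (label, epsilon) in the order spent

    @property
    def spent(self):
        return sum(eps for _, eps in self.ledger)

    @property
    def remaining(self):
        return self.total - self.spent

    def spend(self, label, epsilon):
        """Record a release; refuse it if the budget would be exceeded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining + 1e-12:
            raise ValueError("privacy budget exhausted for: " + label)
        self.ledger.append((label, epsilon))


budget = PrivacyBudget(1.0)
budget.spend("custodian: published tables", 0.6)  # custodian goes first
budget.spend("researcher: query 1", 0.3)          # researchers get the rest
```

In this illustration a further researcher query costing 0.2 would be refused, because only about 0.1 of the budget is left.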
\[
\min_{1 \le i \le k} \left\{ \Pr\left[ \left| a_i - f_i(x) \right| \le \alpha \right] \right\} \ge 1 - \beta .
\]
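Reading the condition as requiring each of the k answers to be within α of the true query value with probability at least 1 − β, it can be checked in closed form for one simple case. The sketch assumes each answer adds Laplace noise with scale sensitivity/ε_i, for which Pr[|noise| ≤ α] = 1 − exp(−α ε_i / sensitivity). The choice of mechanism, the function names, and the example numbers are assumptions for illustration only.

```python
import math


def laplace_coverage(alpha, epsilon, sensitivity=1.0):
    """Pr[|a - f(x)| <= alpha] when a = f(x) + Laplace(sensitivity / epsilon)."""
    scale = sensitivity / epsilon
    return 1.0 - math.exp(-alpha / scale)


def worst_case_coverage(alpha, epsilons, sensitivity=1.0):
    """Smallest coverage probability across the k queries."""
    return min(laplace_coverage(alpha, eps, sensitivity) for eps in epsilons)


def is_alpha_beta_accurate(alpha, beta, epsilons, sensitivity=1.0):
    """True when every query is within alpha with probability >= 1 - beta."""
    return worst_case_coverage(alpha, epsilons, sensitivity) >= 1.0 - beta


generous = is_alpha_beta_accurate(3.0, 0.05, [1.0, 2.0])
stingy = is_alpha_beta_accurate(3.0, 0.05, [0.5, 2.0])
```

The query given the smallest ε determines the outcome, which is one way to see the trade-off between accuracy and privacy loss.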
◮ database
◮ history
◮ Restricted-access: e.g. Health and Retirement
◮ Restricted remote access (remote data enclave): health
◮ Trade-off: Midlife in the United States (MIDUS) coarsens
◮ In order to simulate Iterative Database Construction (IDC),
◮ Structure imposed by Synthetic Data Server (SDS) is
◮ Actionable metadata is critical for scalability
Extra slides