Locally Private Release of Marginal Statistics

Graham Cormode (g.cormode@warwick.ac.uk), Tejas Kulkarni (Warwick), Divesh Srivastava (AT&T)

Privacy with a coin toss

Perhaps the simplest possible formal privacy algorithm:

• Scenario. Each user has a single private bit of information
  - Encoding, e.g., political/sexual/religious preference, illness, etc.
• Algorithm. Toss a (biased) coin, and
  - With probability p > ½, report the true answer
  - With probability 1-p, lie
• Aggregation. Collect responses from a large number N of users
  - Can 'unbias' the estimate of the population fraction (if we know p)
  - The error in the estimate is proportional to 1/√N
• Analysis. Gives differential privacy with parameter ε = ln(p/(1-p))
  - Works well in theory, but would anyone ever use this?
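The coin-toss scheme above can be sketched in a few lines of Python. This is a minimal illustration rather than code from the talk: the function names, the choice p = 0.75, the population fraction 0.3, and the simulated users are all assumptions made for the example. The unbiasing step inverts the identity E[reported mean] = (1-p) + f(2p-1), where f is the true population fraction.

```python
import math
import random


def randomized_response(true_bit: int, p: float, rng: random.Random) -> int:
    """Report the true bit with probability p, otherwise report its flip."""
    return true_bit if rng.random() < p else 1 - true_bit


def unbiased_fraction(reports: list, p: float) -> float:
    """Invert the known bias: E[mean report] = (1 - p) + f * (2p - 1)."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)


def epsilon_for(p: float) -> float:
    """Privacy parameter from the slide: epsilon = ln(p / (1 - p))."""
    return math.log(p / (1 - p))


# Hypothetical simulation: all values below are illustrative assumptions.
rng = random.Random(0)
p = 0.75
true_fraction = 0.3
n_users = 100_000

bits = [1 if rng.random() < true_fraction else 0 for _ in range(n_users)]
reports = [randomized_response(b, p, rng) for b in bits]
estimate = unbiased_fraction(reports, p)

print(f"epsilon = {epsilon_for(p):.3f}")
print(f"true fraction = {sum(bits) / n_users:.4f}, estimate = {estimate:.4f}")
```

With p = 0.75 the privacy parameter is ln 3, and rerunning with a larger or smaller `n_users` shows the estimate tightening or loosening at roughly the 1/√N rate.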

Privacy in practice

• Differential privacy based on coin tossing is widely deployed
  - In the Google Chrome browser, to collect browsing statistics
  - In Apple iOS and macOS, to collect typing statistics
  - These deployments reach over 100 million users
• The model where users apply differential privacy locally, and the results are then aggregated, is known as "Local Differential Privacy"
  - The alternative is to give data to a third party to aggregate
  - The coin tossing method is known as 'randomized response'
• Local differential privacy is state of the art in 2017; randomized response was invented in 1965: a five-decade lead time!

Going beyond 1 bit of data

1 bit can tell you a lot, but can we do more?

• Recent work: materializing marginal distributions
  - Each user has d bits of data (encoding sensitive data)
  - We are interested in the distribution of combinations of attributes

         Gender  Obese  High BP  Smoke  Disease
Alice      1       0       0       1       0
Bob        0       1       0       1       1
…
Zayn       0       0       1       0       0

Gender/Obese     0      1         Disease/Smoke     0      1
     0         0.28   0.22              0         0.55   0.15
     1         0.29   0.21              1         0.10   0.20
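To make "marginal" concrete, here is a small non-private sketch that tabulates a 2-way marginal from binary records. It uses only the three rows visible on the slide (Alice, Bob, Zayn), so its output will not match the slide's marginal tables, which summarize the full (elided) dataset. The `marginal` helper and its signature are assumptions for illustration.

```python
from collections import Counter
from itertools import product

ATTRIBUTES = ["Gender", "Obese", "High BP", "Smoke", "Disease"]

# Only the three rows shown on the slide; the rest of the data is elided there.
rows = {
    "Alice": (1, 0, 0, 1, 0),
    "Bob": (0, 1, 0, 1, 1),
    "Zayn": (0, 0, 1, 0, 0),
}


def marginal(records, attrs):
    """Fraction of records falling in each cell of the chosen attributes."""
    idx = [ATTRIBUTES.index(a) for a in attrs]
    counts = Counter(tuple(r[i] for i in idx) for r in records)
    total = len(records)
    # Enumerate all 2^k cells so that empty cells appear with value 0.
    return {cell: counts[cell] / total for cell in product((0, 1), repeat=len(idx))}


gender_obese = marginal(list(rows.values()), ["Gender", "Obese"])
for cell, fraction in sorted(gender_obese.items()):
    print(cell, round(fraction, 3))
```

A k-way marginal always has 2^k cells that sum to 1, which is why the two tables on the slide each have four entries.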

Nail, meet hammer

• Could apply Randomized Response to each entry of each marginal
  - To give an overall guarantee of privacy, need to change p
  - The more bits released by a user, the closer p gets to ½ (noise)
• Need to design algorithms that minimize information per user
• First observation: a sampling trick
  - If we release n bits of information per user, the error is proportional to n/√N
  - If we sample 1 out of n bits, the error is proportional to √(n/N)
  - Quadratically better to sample than to share!
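The effect of releasing more bits can be illustrated numerically. The sketch below assumes the total budget ε is split evenly across the n released bits (basic sequential composition), which is one common way to obtain an overall guarantee; the slides do not spell out the exact split. The error functions simply evaluate the two expressions from the slide with constants dropped, and all function names are hypothetical.

```python
import math


def truth_probability(eps: float) -> float:
    """p such that ln(p / (1 - p)) = eps, i.e. p = e^eps / (1 + e^eps)."""
    return math.exp(eps) / (1 + math.exp(eps))


def per_bit_probability(total_eps: float, n_bits: int) -> float:
    """Assumption: the budget is divided evenly, eps / n per released bit."""
    return truth_probability(total_eps / n_bits)


def error_release_all(n_bits: int, n_users: int) -> float:
    """Slide expression for releasing all n bits: n / sqrt(N), constants dropped."""
    return n_bits / math.sqrt(n_users)


def error_sample_one(n_bits: int, n_users: int) -> float:
    """Slide expression for sampling 1 of n bits: sqrt(n / N), constants dropped."""
    return math.sqrt(n_bits / n_users)


for n in (1, 4, 16, 64):
    print(
        f"n={n:3d}  p={per_bit_probability(1.0, n):.4f}  "
        f"release-all={error_release_all(n, 10_000):.4f}  "
        f"sample-one={error_sample_one(n, 10_000):.4f}"
    )
```

As n grows, p drifts toward ½ and the gap between the two error expressions widens by a factor of √n.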

What to materialize?

Different approaches based on how information is revealed

1. We could reveal information about all marginals of size k
   - There are (d choose k) such marginals, of size 2^k each
2. Or we could reveal information about the full distribution
   - There are 2^d entries in the d-dimensional distribution
   - Then aggregate results here (incurring additional error)

• Still using randomized response on each entry
  - Approach 1 (marginals): cost proportional to 2^(3k/2) d^(k/2) / √N
  - Approach 2 (full): cost proportional to 2^((d+k)/2) / √N
• If k is small (say, 2) and d is large (say, in the tens), Approach 1 is better
  - But there's another approach to try…
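A quick way to see when Approach 1 wins is to evaluate the two cost expressions directly. The sketch below drops all constant factors, so the numbers indicate only how the expressions scale, not actual error values; the function names are illustrative assumptions.

```python
import math


def approach_1_cost(d: int, k: int, n_users: int) -> float:
    """All k-way marginals: 2^(3k/2) * d^(k/2) / sqrt(N), constants dropped."""
    return 2 ** (1.5 * k) * d ** (k / 2) / math.sqrt(n_users)


def approach_2_cost(d: int, k: int, n_users: int) -> float:
    """Full distribution: 2^((d + k) / 2) / sqrt(N), constants dropped."""
    return 2 ** ((d + k) / 2) / math.sqrt(n_users)


k, n_users = 2, 500_000
for d in (20, 30, 40):
    c1 = approach_1_cost(d, k, n_users)
    c2 = approach_2_cost(d, k, n_users)
    print(f"d={d}: approach 1 = {c1:.3f}, approach 2 = {c2:.3f}")
```

For k = 2 and d = 20 the expressions are 160/√N versus 2048/√N: Approach 1 grows polynomially in d while Approach 2 grows exponentially.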

Hadamard transform

Instead of materializing the data, we can transform it

• Via the Hadamard transform (the discrete Fourier transform for the binary hypercube)
  - Simple and fast to apply
• Property 1: only (d choose k) coefficients are needed to build any k-way marginal
  - Reduces the amount of information to release
• Property 2: the Hadamard transform is a linear transform
  - Can estimate global coefficients by sampling and averaging
• Yields error proportional to 2^(k/2) d^(k/2) / √N
  - Better than both previous methods (in theory)
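The two properties can be checked on a toy example. The sketch below is non-private and uses an unnormalized convention in which the coefficient for bitmask α is the population average of (-1)^⟨α,x⟩; scaling conventions vary, and the records, names, and bitmask encoding are assumptions for illustration. Each user's contribution to a coefficient is a single ±1 value, so averaging per-user values gives the population coefficient (linearity), and a marginal over attribute set β is rebuilt from only the coefficients whose index is a subset of β. In a private protocol each user would perturb a sampled ±1 value with randomized response before reporting; that step is omitted here.

```python
def parity(alpha: int, x: int) -> int:
    """Return (-1)^<alpha, x> for bitmask-encoded alpha and x."""
    return -1 if bin(alpha & x).count("1") % 2 else 1


def coefficient(records, alpha: int) -> float:
    """Population coefficient as the average of per-user +/-1 values."""
    return sum(parity(alpha, x) for x in records) / len(records)


def submasks(beta: int):
    """Yield every bitmask whose set bits are a subset of beta's."""
    alpha = beta
    while True:
        yield alpha
        if alpha == 0:
            return
        alpha = (alpha - 1) & beta


def marginal_cell(records, beta: int, gamma: int) -> float:
    """Marginal value for cell gamma of attribute set beta, from coefficients."""
    k = bin(beta).count("1")
    total = sum(coefficient(records, a) * parity(a, gamma) for a in submasks(beta))
    return total / 2 ** k


def direct_cell(records, beta: int, gamma: int) -> float:
    """Same cell computed by direct counting, for comparison."""
    return sum(1 for x in records if x & beta == gamma) / len(records)


# Hypothetical records with d = 3 bits each.
records = [0b101, 0b001, 0b111, 0b010, 0b101, 0b000]
beta = 0b101  # marginal over attributes 0 and 2
for gamma in sorted(submasks(beta)):
    print(f"{gamma:03b}", marginal_cell(records, beta, gamma), direct_cell(records, beta, gamma))
```

The two printed columns agree for every cell, confirming that the low-order coefficients carry everything needed for that marginal.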

Empirical behaviour

• Compare three methods: Hadamard-based (Inp_HT), marginal materialization (Marg_PS), expectation maximization (Inp_EM)
• Measure the sum of absolute errors in materializing 2-way marginals
• N = 0.5M individuals; vary the privacy parameter ε from 0.4 to 1.4

Application – χ-squared test

• Anonymized, binarized NYC taxi data
• Compute the χ-squared statistic to test correlation
• Want to be on the same side of the line as the non-private value!
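For reference, the χ-squared statistic for independence can be computed directly from a 2-way marginal and the population size. The sketch below is illustrative only: it does not use the taxi data, the input table reuses the Disease/Smoke values from the earlier example slide, and N = 1000 is a made-up population size. The threshold 3.841 is the standard 5% critical value for one degree of freedom.

```python
def chi_squared(table, n_users: int) -> float:
    """Pearson chi-squared statistic for a table of cell fractions."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j]  # fraction under independence
            stat += (observed - expected) ** 2 / expected
    return n_users * stat


CRITICAL_5_PERCENT_1_DF = 3.841

# Illustrative inputs: marginal from the earlier example slide, hypothetical N.
disease_smoke = [[0.55, 0.15], [0.10, 0.20]]
statistic = chi_squared(disease_smoke, n_users=1000)
print(f"chi-squared = {statistic:.1f}, dependent = {statistic > CRITICAL_5_PERCENT_1_DF}")
```

The private and non-private marginals would each be fed through the same function, and the comparison of interest is whether both statistics land on the same side of the threshold.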

Application – building a Bayesian model

• Aim: build the tree with the highest mutual information (MI)
• Plot shows MI on the ground truth data for evaluation purposes
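A minimal sketch of the ingredients: mutual information computed from a 2-way marginal, and a maximum spanning tree over pairwise MI values, which is the standard Chow-Liu construction. Whether the talk used exactly this procedure is not stated on the slide, so treat it as an assumption. The pairwise MI values passed to the tree builder are invented for the example, and the 2-way table reuses the Disease/Smoke values from the earlier example slide.

```python
import math


def mutual_information(table) -> float:
    """MI (in nats) of two attributes, given their joint table of fractions."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, p in enumerate(row):
            if p > 0:  # zero cells contribute nothing
                mi += p * math.log(p / (row_totals[i] * col_totals[j]))
    return mi


def max_spanning_tree(n_nodes: int, weights: dict) -> list:
    """Kruskal's algorithm, taking the heaviest edges first."""
    parent = list(range(n_nodes))

    def find(u: int) -> int:
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    tree = []
    for (u, v), _ in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            tree.append((u, v))
    return tree


disease_smoke = [[0.55, 0.15], [0.10, 0.20]]
print(f"MI(Disease; Smoke) = {mutual_information(disease_smoke):.4f} nats")

# Hypothetical pairwise MI values for three attributes (not from the talk).
hypothetical_mi = {(0, 1): 0.30, (0, 2): 0.05, (1, 2): 0.20}
tree = max_spanning_tree(3, hypothetical_mi)
print("tree edges:", tree)
```

In a private pipeline the MI values would come from the privately estimated 2-way marginals, while the quality of the resulting tree is scored against MI on the ground truth data, as the slide describes.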
