Differential Privacy and Redistricting: Investigations On and Off the Census Spine



slide-1
SLIDE 1

Differential Privacy and Redistricting: Investigations On and Off the Census Spine

Aloni Cohen, Moon Duchin, JN Matthews, Bhushan Suwal, Peter Wayner

slide-2
SLIDE 2

What is differential privacy? And Why is the Census using it?

slide-3
SLIDE 3

“We performed a reconstruction attack and re-identified data from 17% of the US population.”

Simson Garfinkel, Senior Scientist, US Census Bureau Joint Statistical Meetings, July 31 2019

https://tinyurl.com/y8zndygh

The problem

slide-4
SLIDE 4

A demonstration reconstruction attack

Encode constraints, throw them into a solver.

Garfinkel, Abowd, Martindale: https://dl.acm.org/doi/10.1145/3287287
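The "encode constraints, throw them into a solver" idea can be illustrated on a toy block. The following is a minimal sketch, not the attack from the cited paper: the published statistics (sum of ages, median, maximum), the helper name `reconstruct`, and the brute-force search standing in for a real solver are all assumptions made for illustration.

```python
from itertools import combinations_with_replacement
from statistics import median


def reconstruct(count, constraints, max_age):
    """Return every sorted tuple of ages that satisfies all published constraints.

    Brute-force enumeration stands in for a real constraint solver here.
    """
    return [
        ages
        for ages in combinations_with_replacement(range(max_age + 1), count)
        if all(check(ages) for check in constraints)
    ]


# Hypothetical published statistics for a block of 3 people.
published = [
    lambda ages: sum(ages) == 90,     # total of ages (count * mean age)
    lambda ages: median(ages) == 30,  # median age
]
loose = reconstruct(3, published, max_age=40)

# One more published statistic (the oldest resident's age) pins the block down.
tight = reconstruct(3, published + [lambda ages: max(ages) == 40], max_age=40)

print(len(loose), tight)
```

Each additional exact statistic shrinks the set of candidate databases, and once a single candidate remains the individual records are reconstructed.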

slide-5
SLIDE 5

Title 13, Section 9 of the US Code forbids the Bureau from releasing personally identifiable information about individuals. The Census Bureau is obligated to use a privacy-protecting mechanism.

slide-6
SLIDE 6

What is Differential Privacy?

“Differential Privacy is a definition, not an algorithm.” - Dwork and Roth, 2014

Mathematically:

Intuitively: You can extract essentially the same knowledge from the database, with or without my data.
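For reference, the standard definition of ε-differential privacy, as given in Dwork and Roth (2014), can be written as follows. The symbols here (a randomized mechanism M, databases D and D′ that differ in one person's record, and a set of outputs S) are notation chosen for this note and may differ from the slide's.

```latex
\Pr\big[\mathcal{M}(D) \in S\big] \;\le\; e^{\varepsilon}\,\Pr\big[\mathcal{M}(D') \in S\big]
\qquad \text{for all neighboring } D, D' \text{ and all output sets } S .
```

When ε is small, e^ε is close to 1, so the output distribution barely changes whether or not any one person's data is included.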

slide-7
SLIDE 7

Essentially, the Census Bureau is going to intentionally inject random noise into its counts to protect the privacy of the population. Its differentially private algorithm is called TopDown.

What is Differential Privacy for the US Census?
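As a rough illustration of what "injecting random noise into counts" means, the sketch below applies the textbook Laplace mechanism to a single count. It is not the TopDown code: the function names, the sensitivity of 1, and the choice of Laplace noise are assumptions made for this example.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Release a count with Laplace noise of scale sensitivity / epsilon."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)
for epsilon in (0.1, 1.0, 10.0):
    # Smaller epsilon gives a larger noise scale, so the release is less accurate.
    print(epsilon, round(noisy_count(1000, epsilon, rng), 2))
```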

slide-8
SLIDE 8

Some vocabulary: Epsilon Budget and Epsilon Splits

  • What is the “privacy budget” ε?

○ Real positive number
○ The bigger ε is, the more accurate the results
○ The smaller it is, the more privacy
○ At ε = 0: total noise
○ At ε = ∞: truth

  • The ε is split among the different levels; e.g., a valid split would be

Nation - State - County - Tract - Block Group - Block
0.1 - 0.1 - 0.2 - 0.2 - 0.2 - 0.2 = 1.0

  • How much epsilon budget to set, and how to split it, is a policy decision.
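The split above can be written out as a small calculation. This is a sketch under stated assumptions: `split_budget` is a hypothetical helper, and the per-level noise scale of 1/ε assumes Laplace noise with sensitivity 1, which the slides do not specify.

```python
LEVELS = ["Nation", "State", "County", "Tract", "Block Group", "Block"]


def split_budget(total_epsilon, fractions):
    """Divide a total privacy budget among the levels of the hierarchy."""
    if len(fractions) != len(LEVELS):
        raise ValueError("need one fraction per level")
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return {level: total_epsilon * f for level, f in zip(LEVELS, fractions)}


# The example split from the slide, with a total budget of 1.0.
budget = split_budget(1.0, [0.1, 0.1, 0.2, 0.2, 0.2, 0.2])

# Assumed Laplace noise scale at each level: less budget means more noise.
scales = {level: 1.0 / eps for level, eps in budget.items()}

for level in LEVELS:
    print(f"{level:12s} epsilon={budget[level]:.1f} noise scale={scales[level]:.1f}")
```

Under these assumptions, the levels that receive a smaller share of the budget (here Nation and State) get noisier individual counts.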
slide-9
SLIDE 9

Why TopDown?

We want consistency both at each level and up/down the hierarchy.

Within level: The sum of the counts of people in each age group is the same as the total population.

Across levels: The populations of the counties in a state sum to the population of that state.

Census Geographical Hierarchy
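Both kinds of consistency can be stated as simple checks on a tree of counts. The sketch below is illustrative only; the dictionary layout (`count`, `by_age`, `children`) and the numbers are hypothetical.

```python
def is_consistent(unit):
    """Check within-level and across-level consistency for a unit and its descendants."""
    # Within level: the age-group counts must add up to the unit's total.
    by_age = unit.get("by_age")
    if by_age is not None and sum(by_age.values()) != unit["count"]:
        return False
    # Across levels: the children's totals must add up to the parent's total.
    children = unit.get("children", [])
    if children and sum(child["count"] for child in children) != unit["count"]:
        return False
    return all(is_consistent(child) for child in children)


# A made-up state with two counties.
state = {
    "count": 100,
    "by_age": {"0-17": 25, "18-64": 60, "65+": 15},
    "children": [
        {"count": 60, "by_age": {"0-17": 15, "18-64": 35, "65+": 10}},
        {"count": 40, "by_age": {"0-17": 10, "18-64": 25, "65+": 5}},
    ],
}
print(is_consistent(state))
```

Independently noised counts would generally fail both checks, which is why an adjustment step is needed after noising.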

slide-10
SLIDE 10

How does TopDown work?

Input: Table of responses to the census (a list of people / responses grouped by household).

Output: List of counts of people at each census unit.

The consistency we want defines constraints:

  • Add noise to each level
  • Adjust the noised results at a level to satisfy the constraints
  • Then adjust the lower levels.

TopDown Algorithm

https://tinyurl.com/y8zndygh
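The three steps (noise, adjust, move down) can be sketched on a small tree of total counts. This is a simplified illustration, not the Bureau's implementation: it assumes Laplace noise with sensitivity 1, uses the closed-form least-squares adjustment (spread the discrepancy equally among children), and ignores non-negativity and integrality. All names are hypothetical.

```python
import random


def laplace(scale, rng):
    """Laplace(0, scale) as a difference of two exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def top_down(node, eps_by_depth, rng, depth=0, adjusted=None):
    """Return a noised copy of the tree whose children sum to their parent."""
    if adjusted is None:
        # Root: nothing above it to match, so just add noise.
        adjusted = node["count"] + laplace(1 / eps_by_depth[depth], rng)
    result = {"count": adjusted, "children": []}
    children = node.get("children", [])
    if children:
        child_eps = eps_by_depth[depth + 1]
        # Step 1: add noise to every child at this level.
        noisy = [c["count"] + laplace(1 / child_eps, rng) for c in children]
        # Step 2: least-squares adjustment so the children sum to the parent.
        shift = (adjusted - sum(noisy)) / len(noisy)
        # Step 3: recurse, passing each adjusted count down as fixed.
        for child, value in zip(children, noisy):
            result["children"].append(
                top_down(child, eps_by_depth, rng, depth + 1, value + shift)
            )
    return result


tree = {"count": 1000, "children": [
    {"count": 600, "children": [{"count": 350}, {"count": 250}]},
    {"count": 400, "children": [{"count": 100}, {"count": 300}]},
]}
result = top_down(tree, eps_by_depth=[0.2, 0.3, 0.5], rng=random.Random(0))
```

Because each level is adjusted to match the already-fixed level above it, the output is consistent up and down the hierarchy by construction.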

slide-11
SLIDE 11

What does this look like?

Dallas County:

  • 1 out of 254 Counties in Texas
  • 529 Tracts
  • 1669 Block Groups
  • 44113 Blocks
slide-12
SLIDE 12

How will it affect Redistricting?

VRA cases, Population Balance, and our experiments

slide-13
SLIDE 13

Common Concerns

  • Adding noise to the data will make it unusable for research.
  • Will it weaken a Racially Polarized Voting (RPV) signal? (Gingles 2)
  • Can we trust it to draw population-balanced districts?
slide-14
SLIDE 14

We started by building Toy Models

“noising” → “post-processing”

Captures: consistency across hierarchy

Does not capture: integer-valued counts, non-negativity, multiple attributes

slide-15
SLIDE 15

Less Error when you stay on the Census Spine

This Toy Experiment has 3 levels: Blocks, Block Groups, Tracts. We repeatedly constructed synthetic districts built from a proportion p of the blocks in every block group of a tract.
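The effect can be reproduced in a stripped-down simulation. The sketch below is a two-level simplification (one block group and its blocks) of the three-level experiment, with assumed Laplace noise of equal scale at both levels and an equal-share adjustment; it is not the authors' code, and `district_mse` is a hypothetical name.

```python
import random


def laplace(scale, rng):
    """Laplace(0, scale) as a difference of two exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def district_mse(p, n_blocks=20, scale=1.0, trials=2000, seed=0):
    """Mean squared error of a district made of a proportion p of the blocks."""
    rng = random.Random(seed)
    m = round(p * n_blocks)  # number of blocks placed in the district
    total = 0.0
    for _ in range(trials):
        # Only the errors matter, so simulate noise directly (true counts cancel).
        group_err = laplace(scale, rng)
        block_err = [laplace(scale, rng) for _ in range(n_blocks)]
        # Adjust the blocks so their errors sum to the block group's error.
        shift = (group_err - sum(block_err)) / n_blocks
        adjusted = [e + shift for e in block_err]
        total += sum(adjusted[:m]) ** 2
    return total / trials


for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(p, round(district_mse(p), 2))
```

In this simplified model the block-level noise contributes error proportional to p(1 − p), which vanishes when the district takes none or all of the block group (p = 0 or p = 1, i.e. on the spine) and peaks at p = 0.5.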

slide-16
SLIDE 16

“ToyDown” Model

Counts at Block level → Tree of census hierarchy with aggregated counts → Noised tree → Solver (Gurobi) → Tree with adjusted counts

Other inputs: Epsilon budget per level, Constraints, Previously adjusted level (TopDown)

Captures: consistency across hierarchy, non-negativity, limited multiple attributes

Does not capture: integer-valued counts
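For a single parent and its children, the adjustment step with non-negativity can be sketched without a commercial solver. The slide names Gurobi as the solver; the example below instead swaps in a closed-form Euclidean projection, assuming a least-squares objective (the slides do not state the objective). The function name and the numbers are hypothetical.

```python
def project_nonneg_sum(noisy, total):
    """Closest vector (in least squares) to `noisy` with entries >= 0 summing to `total`."""
    if total < 0:
        raise ValueError("total must be non-negative")
    if total == 0:
        return [0.0] * len(noisy)
    # Sort-based projection onto the scaled simplex: find the common shift
    # theta such that clipping (noisy - theta) at zero gives the right sum.
    ordered = sorted(noisy, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, value in enumerate(ordered, start=1):
        cumulative += value
        candidate = (cumulative - total) / j
        if value - candidate > 0:
            theta = candidate
    return [max(v - theta, 0.0) for v in noisy]


# Hypothetical noised child counts and an already-adjusted parent total of 6.
adjusted = project_nonneg_sum([5.2, -1.3, 3.1], total=6.0)
print(adjusted)
```

The negative noised count is clipped to zero and the remaining children absorb the discrepancy, so the result is non-negative and still sums to the parent. As in ToyDown, the values are not forced to be integers.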

slide-17
SLIDE 17

RPV in Irving ISD

Gaussian Noise vs. Model `ToyDown` Noise

slide-18
SLIDE 18

Variance in Random districts (Dallas County)

Random districts with ¼ of Dallas County Population