CSE 416, Section 1: Semester Project Approach




Session 3 – Project Approach 11/9/2020 1

Robert Kelly, 2020

CSE 416, Section 1

Semester Project Approach

Session Objectives

  • Understand the analysis approach taken by MGGG in their analysis of the effects of racial gerrymandering in the Virginia House of Delegates
  • Understand your high-level approach to the project
  • Begin to think about design choices
  • Begin to understand data requirements to support analysis


Reading

  • Comparison of Districting Plans for the Virginia House of Delegates, Metric Geometry and Gerrymandering Group (MGGG), https://mggg.org/VA-report.pdf
  • Wikipedia, https://en.wikipedia.org/wiki/Graph_partition (basic background)


There are many approaches to graph partitioning, but we are not concerned with minimizing edges or generating equal numbers of nodes, so we use a multi-level method

Project Background

Project based on MGGG analysis for the court

  • Virginia House of Delegates
  • Eleven state districts were ruled unconstitutional
  • Analysis examined the original (unconstitutional) plan and possible replacement plans (e.g., the Republican-suggested plan)
  • Analysis method “highlights and quantifies the dilutive effects of packing Black Voting Age Population (BVAP)” (>55% in 11 districts)
  • Analysis looked at the 11 districts and their immediate neighbors (total of 33)


The study remarks that 37% BVAP is the empirical line for African-American representation; >55% is considered excessive


Analysis Results

  • Districts sorted by BVAP (lowest to highest)
  • Each district shows the range of BVAP values in the ensemble


Typical box and whisker plot

Box And Whisker Plot


[Figure: annotated box-and-whisker plot. Labels: median, 99th percentile, evaluated plan result, target BVAP range]
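The data behind a plot like this can be sketched as follows. This is a minimal illustration, not the MGGG code: it assumes an ensemble is a list of plans, each plan a list of per-district BVAP fractions, and the function name `rank_summaries` is hypothetical. Whiskers here are simply the minimum and maximum, which is an assumption; the plot on the slide appears to mark a 99th percentile instead.

```python
from statistics import quantiles


def rank_summaries(ensemble):
    """Summarize BVAP by district rank across an ensemble of plans.

    ensemble: list of plans; each plan is a list of BVAP fractions,
    one per district (assumed input shape, at least two plans).
    Returns one summary dict per rank, lowest-BVAP district first.
    """
    # Sort each plan so that position i always means "i-th lowest BVAP".
    sorted_plans = [sorted(plan) for plan in ensemble]
    summaries = []
    # zip(*...) groups the i-th lowest value from every plan together.
    for rank_values in zip(*sorted_plans):
        q1, median, q3 = quantiles(rank_values, n=4, method="inclusive")
        summaries.append({
            "min": min(rank_values),
            "q1": q1,
            "median": median,
            "q3": q3,
            "max": max(rank_values),
        })
    return summaries
```

An evaluated plan can then be compared rank by rank against these summaries by sorting its own BVAP values the same way.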


How Do You Generate a “Random” Districting Plan?

Recall that a districting plan is a partition of the k-node precinct graph into n subgraphs, each of which 1) is connected and 2) adheres to state districting requirements (e.g., equal population, compact, fewer counties, etc.)

2-stage process in which initially each node is considered a cluster

  • 1. Recursively combine neighboring clusters until the overall graph reaches n clusters
  • 2. Recursively balance pairs of clusters until eventually each cluster achieves the compactness goal and the population distribution goal


How Do We Form n Clusters?

In each iteration

  • For each cluster, select a random neighboring cluster, and combine the two clusters into a new cluster
  • Update neighbors for each newly formed cluster
  • Terminate when the number of clusters equals n
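The merging steps above might be sketched as follows. This is one possible reading, not the required implementation: it merges a single randomly chosen pair per loop pass (a simplification of "for each cluster" in each iteration), and the function name `merge_to_n_clusters` and the adjacency-dict input are assumptions made for illustration.

```python
import random


def merge_to_n_clusters(adjacency, n, seed=None):
    """Merge random neighboring clusters until n clusters remain.

    adjacency: dict mapping each precinct id to the set of adjacent
    precinct ids (assumed symmetric; ids must be sortable).
    Returns a dict mapping cluster id -> set of precinct ids.
    """
    rng = random.Random(seed)
    # Initially every precinct is its own cluster.
    members = {node: {node} for node in adjacency}
    neighbors = {node: set(adj) for node, adj in adjacency.items()}

    while len(members) > n:
        candidates = sorted(c for c in members if neighbors[c])
        if not candidates:
            raise ValueError("no adjacent clusters left to merge")
        a = rng.choice(candidates)
        b = rng.choice(sorted(neighbors[a]))
        # Combine: cluster a absorbs cluster b.
        members[a] |= members.pop(b)
        # Update neighbors: everything that touched b now touches a.
        for c in neighbors.pop(b):
            neighbors[c].discard(b)
            if c != a:
                neighbors[c].add(a)
                neighbors[a].add(c)
    return members
```

Because two clusters are merged only when they are adjacent, every resulting cluster stays connected; population balance is deliberately ignored at this stage.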


There will likely be optional use cases in which you can try out variations on this algorithm


How Do We Balance the Clusters?

For each cluster

  • Select a neighboring cluster at random to rebalance
  • Generate a spanning tree of the combined cluster (you can try different approaches)
  • Form the set of edges to cut that will “improve” some combination of 1) compactness and 2) population equality


A tree is a connected undirected graph with no cycles. It is a spanning tree of a graph G if it spans G (that is, it includes every vertex of G) and is a subgraph of G (every edge in the tree belongs to G). A spanning tree of a connected graph G can also be defined as a maximal set of edges of G that contains no cycle, or as a minimal set of edges that connect all vertices. - Wikipedia
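A rough sketch of the spanning-tree step, under stated assumptions: the tree is grown by picking frontier edges at random (only one of the "different approaches" you could try), and the cut is chosen on population equality alone, ignoring compactness. The names `random_spanning_tree` and `best_population_cut` are hypothetical.

```python
import random


def random_spanning_tree(nodes, adjacency, rng):
    """Grow a spanning tree by adding random frontier edges.

    Returns a list of (parent, child) edges rooted at a random start node.
    """
    nodes = set(nodes)
    start = rng.choice(sorted(nodes))
    in_tree = {start}
    edges = []
    frontier = [(start, v) for v in sorted(adjacency[start]) if v in nodes]
    while frontier:
        u, v = frontier.pop(rng.randrange(len(frontier)))
        if v in in_tree:
            continue  # this edge would close a cycle
        in_tree.add(v)
        edges.append((u, v))
        frontier.extend((v, w) for w in sorted(adjacency[v])
                        if w in nodes and w not in in_tree)
    if in_tree != nodes:
        raise ValueError("combined cluster is not connected")
    return edges


def best_population_cut(nodes, tree_edges, population):
    """Cut the one tree edge that splits the population most evenly."""
    children = {}
    for parent, child in tree_edges:
        children.setdefault(parent, []).append(child)
    root = tree_edges[0][0]
    total = sum(population[v] for v in nodes)
    # Preorder walk; reversed, it visits children before their parents.
    order, stack = [], [root]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(children.get(u, []))
    subtree_pop = {}
    for u in reversed(order):
        subtree_pop[u] = population[u] + sum(
            subtree_pop[c] for c in children.get(u, []))
    # Cutting (parent, child) separates the subtree rooted at child.
    best = min((c for _, c in tree_edges),
               key=lambda c: abs(total - 2 * subtree_pop[c]))
    side, stack = set(), [best]
    while stack:
        u = stack.pop()
        side.add(u)
        stack.extend(children.get(u, []))
    return side, set(nodes) - side
```

Removing one edge from a spanning tree always leaves two connected pieces, which is why the two new clusters remain valid subgraphs.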

When Do We Terminate the Algorithm?

Terminate when the redistricting plan has

  • 1. A population difference between the most populous cluster (district) and the least populous cluster (district) in the state that is less than a user-provided threshold
  • 2. A compactness measure for each district that is less than a user-provided threshold
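The two termination conditions could be checked with something like the sketch below. The function and parameter names are hypothetical, and the compactness test follows the slide literally ("less than" the threshold), which assumes a measure where smaller values are better; flip the comparison if your measure scores compact shapes higher.

```python
def should_terminate(district_populations, district_compactness,
                     max_population_difference, compactness_threshold):
    """Return True when both user-provided thresholds are satisfied.

    district_populations: one population total per district.
    district_compactness: one compactness score per district
    (assumed here to be "smaller is better").
    """
    population_gap = max(district_populations) - min(district_populations)
    population_ok = population_gap < max_population_difference
    compactness_ok = all(score < compactness_threshold
                         for score in district_compactness)
    return population_ok and compactness_ok
```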


Will this approach provide “random” districting plans? How do you determine if the plans appear random?


Begin to Think about Implications

  • How do we represent a node (i.e., precinct)?
  • How do we represent a cluster (i.e., district)?
  • How do we calculate neighbors?
  • How do we measure compactness?
  • How do we display a “random” district plan to the user?
  • How do we verify that the results are random?
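As a starting point for the first few questions, here is one possible representation, offered as a sketch rather than a recommended design. All class and field names are hypothetical. Polsby-Popper is used only as an example of a compactness measure; the slides do not prescribe it.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Precinct:
    """One graph node. Field names are illustrative assumptions."""
    precinct_id: str
    population: int
    neighbors: set = field(default_factory=set)  # ids of adjacent precincts


@dataclass
class District:
    """One cluster: a collection of precincts keyed by precinct id."""
    district_id: int
    precincts: dict = field(default_factory=dict)

    def population(self):
        return sum(p.population for p in self.precincts.values())

    def neighboring_precinct_ids(self):
        """Ids of precincts outside this district that border it."""
        inside = set(self.precincts)
        touching = set()
        for p in self.precincts.values():
            touching |= p.neighbors
        return touching - inside


def polsby_popper(area, perimeter):
    """Polsby-Popper score 4*pi*A / P^2: 1.0 for a circle, lower otherwise."""
    return 4 * math.pi * area / perimeter ** 2
```

Storing neighbor ids on each precinct keeps the district-level neighbor calculation a simple set operation, at the cost of recomputing it after every merge or split.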


Multiple District Plans

  • You will generate multiple district plans (usually referred to as districtings)
  • Run multiple Python processes, each of which will generate a plan
  • Run a small number of processes on your laptop/desktop, but a larger number on the SeaWulf
  • Each process might run multiple algorithms in sequence until the desired number of plans is generated
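On a laptop, the "multiple Python processes" idea might look like the sketch below, which uses the standard library's `concurrent.futures`. The body of `generate_plan` is a hypothetical stand-in (a random assignment of 12 precincts to 3 districts), not a real districting algorithm, and how jobs are actually launched on the SeaWulf is not covered here.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def generate_plan(seed):
    """Stand-in for one districting run; returns (seed, assignment).

    A real run would build clusters and rebalance them. Here each of
    12 hypothetical precincts is just assigned to one of 3 districts.
    """
    rng = random.Random(seed)
    return seed, [rng.randrange(3) for _ in range(12)]


def run_ensemble(seeds, workers=2):
    """Generate one plan per seed, spread across worker processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(generate_plan, seeds))
```

When running this as a script, call `run_ensemble(range(8))` from inside an `if __name__ == "__main__":` guard so that worker processes can import the module safely. Giving each process its own seed also makes every plan reproducible.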


Graph Node Geography

  • Project building block unit is the geography of a precinct
  • Boundary data is generally available, but may not be totally accurate
  • Combine/split of a cluster requires calculating the new cluster boundaries
  • BVAP (and other minority) population data is generally not available for a precinct (you will need to map census data to it)


Note: MGGG paper uses census blocks as the building block
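Mapping census data onto precincts can be sketched as a simple aggregation. This assumes each census block lies wholly inside one precinct and that a block-to-precinct lookup already exists; neither is guaranteed in real data, and the field names `vap` and `bvap` are hypothetical.

```python
from collections import defaultdict


def aggregate_blocks_to_precincts(blocks, block_to_precinct):
    """Sum census-block counts into precinct totals.

    blocks: dict of block id -> {"vap": int, "bvap": int}.
    block_to_precinct: dict of block id -> precinct id (assumed known).
    """
    totals = defaultdict(lambda: {"vap": 0, "bvap": 0})
    for block_id, counts in blocks.items():
        precinct_id = block_to_precinct[block_id]
        totals[precinct_id]["vap"] += counts["vap"]
        totals[precinct_id]["bvap"] += counts["bvap"]
    return dict(totals)


def bvap_share(precinct_totals):
    """BVAP as a fraction of voting age population (0.0 if VAP is zero)."""
    vap = precinct_totals["vap"]
    return precinct_totals["bvap"] / vap if vap else 0.0
```

Blocks that straddle a precinct boundary would need a splitting rule (for example, by area), which this sketch does not attempt.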

Reports

Your project will generate summary data for a variety of runs

For example

  • Independence of results from seed
  • Change in population balance vs. iteration
  • Ideal Markov chain length
  • Single seed districting plan vs. one for each random district
  • Comparison with Gerrymandering measure results


Differences Between Project and MGGG Report

MGGG

  • State districting
  • Virginia
  • Analysis of 33 VA house districts
  • Markov chain
  • 100 seed plans
  • Census block building blocks
  • Seed plan population balanced
  • Spanning tree cuts balance
  • Phase 2: Flip, ReCom, and Mix
  • Specific compactness measure

CSE416

  • Congressional Districting
  • Multiple states
  • Complete state analysis
  • No concern for Markov chain validity
  • Phase 1 graph partition approach
  • Precinct building blocks
  • Seed plans unbalanced by population
  • Spanning tree cuts reduce pop. disparity
  • Phase 2: Modified ReCom only
  • Multiple compactness measures


This comparison is meant to help you when reading the MGGG paper

Things to Think About

  • What data is needed to run the algorithm?
  • What data is needed to display the results in the Web client?
  • How and when do you transmit data from the client to the server?
  • Can you store partial results when building a districting ensemble?
  • How are results passed from the SeaWulf to your server?
  • What does the GUI look like?
  • How does the user request a run of multiple districting plans?
  • How do you display summary data from a run?
  • What debugging features should be built into the GUI?
  • How do you keep track of design options/decisions/questions?


Top-Level System Architecture

Robert Kelly, 2017-2020

Components shown in the architecture diagram:

  • GUI (JavaScript)
  • Server Logic (Java)
  • Resource DB
  • Data Population (Python)
  • Data sources
  • SeaWulf (Java)
  • Project DB

Have You Satisfied the Objectives?

  • Understand the analysis approach taken by MGGG in their analysis of the effects of racial gerrymandering in the Virginia House of Delegates
  • Understand your high-level approach to the project
  • Begin to think about design choices
  • Begin to understand data requirements to support analysis
