slide-1
SLIDE 1

Fair k-centers via Maximum Matching

by Huy Nguyen, Matthew Jones, Thy Nguyen June 15, 2020

by Huy Nguyen, Matthew Jones, Thy Nguyen Fair k-centers via Maximum Matching June 15, 2020 1 / 18

slide-2
SLIDE 2

Content

• Introduction
• The fair k-centers problem
• Approach using maximum matching
• Experiments

slide-3
SLIDE 3

Introduction

Clustering

• Clustering - using a small set of centers to approximate a large data set.
• k-centers clustering - minimize the maximum cluster radius

Formally:
Input: k, a set S of n points, a metric d
Find: $\arg\min_{S' \subseteq S,\ |S'| = k} \; \max_{s \in S} d(s, S')$
where $d(s, S') = \min_{s' \in S'} d(s, s')$.
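The objective above can be evaluated directly. The following is a minimal sketch, assuming points are coordinate tuples and the metric is Euclidean (`math.dist`); the function name `kcenter_cost` is illustrative and not from the slides.

```python
import math

def kcenter_cost(points, centers, dist=math.dist):
    """Return max over points s of d(s, S'), where S' is `centers`."""
    # d(s, S') is the distance from s to its nearest center.
    return max(min(dist(p, c) for c in centers) for p in points)

points = [(0, 0), (1, 0), (5, 0), (6, 0)]
print(kcenter_cost(points, [(0, 0), (5, 0)]))  # 1.0
```

Any other metric can be passed through the `dist` argument.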

slide-8
SLIDE 8

Introduction

k-Centers Clustering

• The k-centers problem is NP-hard; approximating it within any factor smaller than 2 is also NP-hard.
• Gonzalez gives a greedy 2-approximation algorithm:
  • Choose the first center arbitrarily.
  • Choose each subsequent center as the point farthest from the previously selected centers.
  • Each center takes O(n) time to choose, so the whole algorithm runs in O(nk).
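A minimal sketch of the greedy procedure described on this slide, assuming Euclidean points as tuples. The function name `gonzalez`, the `first` parameter, and the index-based return value are illustrative choices, not taken from the slides.

```python
import math

def gonzalez(points, k, dist=math.dist, first=0):
    """Farthest-first traversal; returns indices of the centers in selection order."""
    n = len(points)
    centers = [first]  # the first center is arbitrary
    # nearest[i] = distance from point i to its closest chosen center
    nearest = [dist(points[i], points[first]) for i in range(n)]
    while len(centers) < min(k, n):
        # O(n): pick the point farthest from all chosen centers
        nxt = max(range(n), key=nearest.__getitem__)
        centers.append(nxt)
        # O(n): refresh nearest-center distances with the new center
        for i in range(n):
            nearest[i] = min(nearest[i], dist(points[i], points[nxt]))
    return centers

points = [(0, 0), (1, 0), (5, 0), (6, 0)]
print(gonzalez(points, 2))  # [0, 3]
```

Keeping the `nearest` array is what makes each round O(n), giving O(nk) overall. The selection order matters later, because the approach works with prefixes of this list.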

slide-9
SLIDE 9

Introduction

A Framework for Fairness

• Fairness - removing inherent bias in an algorithm.
  • Not necessarily an inherent mathematical concept
• To add fairness:
  • Items in S have a demographic group property
  • Each demographic group i gets $k_i$ centers
  • $\sum_{i=1}^{m} k_i = k$

In these slides, we use "fair" to mean satisfying all $k_i$ as upper bounds.
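Under this definition, checking whether a chosen set of centers is fair is a counting exercise. A small sketch, assuming each center is described by its group label and the quotas are given as a dictionary; the names `is_fair`, `center_groups`, and `quotas` are hypothetical.

```python
from collections import Counter

def is_fair(center_groups, quotas):
    """True if no group i contributes more than quotas[i] centers."""
    counts = Counter(center_groups)
    # Groups missing from `quotas` are treated as having quota 0.
    return all(counts[g] <= quotas.get(g, 0) for g in counts)

print(is_fair(["a", "b", "a"], {"a": 2, "b": 1}))  # True
print(is_fair(["a", "a", "a"], {"a": 2, "b": 1}))  # False
```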

slide-12
SLIDE 12

The Fair k-Centers Problem

Previous Work on k-centers with Fairness

Multiple papers present algorithms for fair k-centers:
• Chen et al. presented a 3-approximation algorithm that runs in $\Omega(n^2 \log n)$
• Kleindessner et al. introduced an $O(nkm^2 + km^4)$ algorithm with guaranteed approximation factor $3 \cdot 2^{m-1} - 1$
• We present an O(nk)-time 3-approximation algorithm for fair k-centers

slide-16
SLIDE 16

Our Approach

Overview

A high-level overview of the algorithm is as follows:
1. Obtain k initial (unfair) centers, using Gonzalez
2. Find the largest prefix of these which can be "shifted fairly"
3. Shift these centers, choose the rest arbitrarily

The first step is well-defined; how do we accomplish the second and third steps?

slide-23
SLIDE 23

Our Approach

Fair Shift Constraint

• Fair Shift - replacing each point with a "neighbor" such that the new set is fair
• Does a fair shift exist within radius r for some set of points P?
• Draw balls of radius r around the centers
• Reduce to matching:
  • Each point in P gets one vertex in partition A
  • Each demographic group i gets $k_i$ vertices in partition B
  • ab ∈ E iff point a (in P) has a point of demographic group b in its ball (including a itself)
• A matching of size |P| exists iff a fair shift exists, and its edges give the shift
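The reduction can be sketched with a plain augmenting-path bipartite matching. This is an illustrative sketch, not the authors' implementation. It assumes Euclidean points, a closed ball (distance ≤ r), and balls around the centers in `prefix` that are pairwise disjoint so that replacements are distinct. All names (`fair_shift`, `prefix`, `quotas`, `slots`) are hypothetical.

```python
import math

def fair_shift(points, groups, prefix, r, quotas, dist=math.dist):
    """Return {center index: replacement index} for a fair shift of `prefix`
    within radius r, or None if no matching of size |P| exists."""
    # reachable[a] = {group: one point of that group inside the ball of center a}
    reachable = []
    for a in prefix:
        near = {}
        for j, p in enumerate(points):
            if dist(points[a], p) <= r:
                near.setdefault(groups[j], j)
        reachable.append(near)

    # Partition B: quotas[g] copies of group g; slots[g][t] holds the matched center.
    slots = {g: [None] * q for g, q in quotas.items()}

    def augment(a, seen):
        """Try to match center a, re-routing earlier matches if needed."""
        for g in reachable[a]:
            if g in seen or g not in slots:
                continue
            seen.add(g)
            for t, holder in enumerate(slots[g]):
                if holder is None or augment(holder, seen):
                    slots[g][t] = a
                    return True
        return False

    for a in range(len(prefix)):
        if not augment(a, set()):
            return None
    return {prefix[a]: reachable[a][g]
            for g, held in slots.items() for a in held if a is not None}

points = [(0, 0), (1, 0), (10, 0), (11, 0)]
groups = ["x", "y", "x", "x"]
print(fair_shift(points, groups, [0, 2], 1.5, {"x": 1, "y": 1}))
```

In this example both centers belong to group "x", which has quota 1, so center 0 has to move to its group-"y" neighbor (point 1) while center 2 stays in place.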

slide-27
SLIDE 27

Our Approach

Optimizing the Algorithm

For runtime, it is more efficient to view this as a maximum flow problem:
• Partition B gets 1 vertex per demographic group.
• Add edges from s to Partition A with capacity 1 and from Partition B to t with capacity $k_i$.
• Now, each point in S yields at most 1 edge ab ∈ E, so |E| ≤ n + O(k) = O(n) and |V| = 2 + 2k + m = O(k)

⇒ $O(nk^{1/2})$ time to check for a fair shift, using Dinitz's algorithm.
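A sketch of the compact flow network described above. For brevity it uses a simple BFS augmenting-path routine (Edmonds-Karp) rather than Dinitz's algorithm, so it illustrates the construction but does not achieve the stated running time. The vertex labels and the names `shift_network` and `max_flow` are illustrative assumptions.

```python
from collections import defaultdict, deque

def shift_network(ball_groups, quotas):
    """ball_groups[a] = set of groups present in the ball of prefix center a."""
    cap = defaultdict(lambda: defaultdict(int))
    for a, present in enumerate(ball_groups):
        cap["s"][("A", a)] = 1              # s -> A, capacity 1
        for g in present:
            cap[("A", a)][("B", g)] = 1     # A -> B, one edge per group in the ball
    for g, q in quotas.items():
        cap[("B", g)]["t"] = q              # B -> t, capacity k_i
    return cap

def max_flow(cap, s, t):
    """BFS augmenting paths; `cap` is mutated into the residual graph."""
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # Every path starts with a capacity-1 edge out of s, so it carries one unit.
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
        flow += 1

net = shift_network([{"x", "y"}, {"x"}], {"x": 1, "y": 1})
print(max_flow(net, "s", "t"))  # 2
```

A fair shift for a prefix of |P| centers exists exactly when the maximum flow equals |P|.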

slide-34
SLIDE 34

Our Approach

Optimizing the Algorithm

$O(nk^{1/2})$ time to check for a fair shift ⇒ Use binary search, checking fair shift at each level
• Largest prefix of initial centers with a fair shift
  • binary search over k initial centers
  • r is maximized such that balls are non-overlapping.
• Optimizing the shift radius on largest prefix
  • binary search over discrete shift radii as r
  • at most $km \le k^2$ such values, so O(log k) levels

O(nk) time total to build all the graphs, so each binary search has time complexity $O(nk + nk^{1/2} \log k) = O(nk)$.
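The prefix search can be sketched as a standard binary search over prefix lengths. This sketch assumes feasibility is monotone (if a prefix admits a fair shift, so does every shorter one), which the use of binary search on the slide implies but does not state. `feasible` stands in for the fair-shift check, and both names are hypothetical.

```python
def largest_feasible_prefix(k, feasible):
    """Largest j in [0, k] with feasible(j) True, assuming monotone feasibility
    and that the empty prefix (j = 0) is always feasible."""
    lo, hi = 0, k
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if feasible(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo

calls = []
def toy_check(j):
    calls.append(j)
    return j <= 6  # toy predicate: prefixes up to length 6 are feasible

print(largest_feasible_prefix(10, toy_check), len(calls))
```

The search makes O(log k) calls to the check, which is where the log k factor in the running time comes from. The same pattern applies to the search over the sorted list of candidate shift radii.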

slide-42
SLIDE 42

Our Approach

Analysis

Therefore, it takes O(nk) time each to
• run Gonzalez's algorithm
• find the largest prefix of initial centers
• optimize the fair shift
• heuristically fill the remaining centers (similar to Gonzalez)

For performance,
• fair shift costs no more than $\mathrm{cost}_{\mathrm{OPT}}$
• largest prefix with a fair shift has objective cost at most $2 \cdot \mathrm{cost}_{\mathrm{OPT}}$

⇒ 3-approximation algorithm with O(nk) runtime.
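One way to read how the two bounds combine, as a sketch rather than the authors' proof: let $s$ be any point, $c$ its nearest center in the chosen prefix, and $c'$ the point that replaces $c$ after the fair shift. The symbols $c$ and $c'$ are introduced here for illustration.

```latex
\begin{align*}
d(s, c') &\le d(s, c) + d(c, c') && \text{(triangle inequality)} \\
         &\le 2 \cdot \mathrm{cost}_{\mathrm{OPT}} + \mathrm{cost}_{\mathrm{OPT}}
           && \text{(prefix bound, shift bound)} \\
         &= 3 \cdot \mathrm{cost}_{\mathrm{OPT}}.
\end{align*}
```

Centers added in the final fill step can only reduce the distance from a point to its nearest center, so the bound is preserved.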

slide-43
SLIDE 43

Experiments

Overview

We compared the following methods with Kleindessner et al.:
• Alg 2-Seq - our fair k-centers algorithm, arbitrarily picks centers at the last step.
• Alg 2-Heu B - our fair k-centers algorithm, uses Heuristic B at the last step.
• Heuristic A - runs Gonzalez for each demographic group i.
• Heuristic B - runs Gonzalez but only keeps centers that don't violate fairness.
• Heuristic C - similar to A, but uses distance to centers from all demographic groups.

slide-44
SLIDE 44

Experiments

Simulated Data

Figure: Mean runtime in seconds on simulated data


slide-45
SLIDE 45

Experiments

Simulated Data

Table: Mean and standard deviation of objective value on simulated data

Algo \ Task          50 Groups     100 Groups    250 Groups    500 Groups
Alg 2-Seq            6.89 (0.2)    6.52 (0.31)   6.5 (0.41)    6.46 (0.38)
Alg 2-Heu B          6.91 (0.26)   6.48 (0.25)   6.51 (0.43)   6.44 (0.38)
Kleindessner et al.  7.01 (0.46)   6.88 (0.75)   7.45 (0.78)   7.26 (0.51)
Heuristic A          21.38 (2.84)  17.7 (1.55)   16.61 (1.57)  13.87 (1.33)
Heuristic B          7.66 (1.09)   8.16 (0.94)   7.81 (0.71)   7.8 (0.62)
Heuristic C          7.26 (1.17)   7.43 (0.87)   7.44 (0.6)    7.42 (0.62)

slide-46
SLIDE 46

Experiments

Real Data

Figure: Adult dataset runtime


slide-47
SLIDE 47

Experiments

Real Data

Figure: Student and wholesale dataset runtime


slide-48
SLIDE 48

Experiments

Real Data

Table: Mean and standard deviation of objective value on real data

Algo \ Task          A-Gender     A-Race       S-Sex        S-School     S-Address    W-Location
Alg 2-Seq            0.32 (0.01)  0.32 (0.01)  1.29 (0.04)  1.3 (0.04)   1.31 (0.05)  0.26 (0.01)
Alg 2-Heu B          0.32 (0.01)  0.32 (0.01)  1.28 (0.03)  1.28 (0.04)  1.3 (0.04)   0.26 (0.01)
Kleindessner et al.  0.36 (0.03)  0.34 (0.02)  1.29 (0.05)  1.29 (0.06)  1.3 (0.05)   0.27 (0.03)
Heuristic A          0.41 (0.02)  0.35 (0.03)  1.36 (0.02)  1.39 (0.04)  1.37 (0.04)  0.28 (0.01)
Heuristic B          0.37 (0.02)  0.32 (0.01)  1.29 (0.03)  1.3 (0.04)   1.3 (0.04)   0.27 (0.01)
Heuristic C          0.4 (0.02)   0.32 (0.02)  1.29 (0.03)  1.29 (0.02)  1.35 (0.05)  0.24 (0.02)

slide-49
SLIDE 49

Summary

Our Algorithm:
• 3-approximation
• O(nk) runtime
• Best algorithm in both runtime and performance
• Experimental support