Guarantees for Spectral Clustering with Fairness Constraints
Matthäus Kleindessner, Samira Samadi, Pranjal Awasthi & Jamie Morgenstern
Spectral Clustering (SC) and Fairness
SC is the method of choice for clustering the nodes of a graph.
Friendship network: SC can result in highly unfair clustering with respect to the two demographic groups.
Fair clustering (Chierichetti et al. 2017): in every cluster, each group $V_s$ should be represented with (approximately) the same fraction as in the whole data set $V$.
Goal: Study spectral clustering with fairness constraints.
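Goal: Partition $V$ into $k$ clusters with minimum RatioCut objective value.
⋄ Encode a clustering $V = C_1 \dot\cup \ldots \dot\cup C_k$ by $H \in \mathbb{R}^{n \times k}$ with
$$H_{il} = \begin{cases} 1/\sqrt{|C_l|}, & i \in C_l \\ 0, & i \notin C_l \end{cases} \tag{1}$$
Then $\mathrm{RatioCut}(C_1, \ldots, C_k) = \mathrm{Tr}(H^T L H)$, where $L$ is the graph Laplacian matrix.
⋄ The exact problem:
$$\min_{H \in \mathbb{R}^{n \times k}} \mathrm{Tr}(H^T L H) \quad \text{subject to } H \text{ being of form (1)}$$
⋄ Solve the relaxed version:
$$\min_{H \in \mathbb{R}^{n \times k}} \mathrm{Tr}(H^T L H) \quad \text{subject to } H^T H = I_k$$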
⋄ Apply k-means clustering to the rows of H.
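The relaxed problem and the k-means step above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names, the farthest-point initialization of k-means, and the toy graph are assumptions made for the example. The relaxed minimizer is taken to be the matrix whose columns are the k eigenvectors of L with the smallest eigenvalues.

```python
import numpy as np

def spectral_embedding(W, k):
    """Relaxed RatioCut: the columns of H are the k eigenvectors of
    L = D - W with the smallest eigenvalues, so H^T H = I_k."""
    W = np.asarray(W, dtype=float)
    L = np.diag(W.sum(axis=1)) - W        # unnormalized graph Laplacian
    _, eigvecs = np.linalg.eigh(L)        # eigenvalues come back in ascending order
    return eigvecs[:, :k]

def kmeans_rows(H, k, n_iter=100):
    """Plain Lloyd's algorithm on the rows of H. The deterministic
    farthest-point initialization is an illustrative choice."""
    centers = [H[0]]
    for _ in range(1, k):
        d = np.min([np.sum((H - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(H[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((H[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            H[labels == l].mean(axis=0) if np.any(labels == l) else centers[l]
            for l in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def spectral_clustering(W, k):
    return kmeans_rows(spectral_embedding(W, k), k)

# Toy graph (hypothetical): two triangles joined by one weak edge.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1
labels = spectral_clustering(W, 2)
```

On this toy graph the two triangles end up in different clusters, since cutting the single weak edge gives the smallest RatioCut.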
Approach: Incorporate fairness as a linear constraint:
$$\min_{H \in \mathbb{R}^{n \times k}} \mathrm{Tr}(H^T L H) \quad \text{subject to } H^T H = I_k \text{ and } F^T H = 0$$
Convert the program to the standard form and solve.
Our approach is analogous to existing versions of constrained SC that try to incorporate must-link constraints (e.g., Yu and Shi '04).
Friendship network: Our algorithm finds a fair clustering with respect to the two demographic groups.
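One way to read "convert the program to the standard form" is the substitution H = ZY, where the columns of Z form an orthonormal basis of the null space of F^T. The constraint F^T H = 0 then holds automatically, and the problem becomes min Tr(Y^T (Z^T L Z) Y) subject to Y^T Y = I_k, which is again an eigenvector problem. The sketch below follows that reading. The slides do not define F, so the form used here (one column per group except the last, holding the group indicator minus the group's share of the data set) is an assumption, as are the function names and the toy graph.

```python
import numpy as np

def fairness_matrix(groups):
    """Assumed form of F: for every group except the last, one column with
    the group indicator minus that group's fraction of the data set.
    Requires at least two distinct groups."""
    groups = np.asarray(groups)
    ids = np.unique(groups)
    cols = [(groups == s).astype(float) - np.mean(groups == s) for s in ids[:-1]]
    return np.column_stack(cols)

def fair_spectral_embedding(W, groups, k):
    W = np.asarray(W, dtype=float)
    L = np.diag(W.sum(axis=1)) - W        # unnormalized graph Laplacian
    F = fairness_matrix(groups)
    # Orthonormal basis Z of the null space of F^T, read off the SVD of F.
    U, s, _ = np.linalg.svd(F, full_matrices=True)
    rank = int(np.sum(s > 1e-10))
    Z = U[:, rank:]
    # With H = Z Y: minimize Tr(Y^T (Z^T L Z) Y) subject to Y^T Y = I_k.
    _, Y = np.linalg.eigh(Z.T @ L @ Z)
    return Z @ Y[:, :k]

# Toy weighted graph (hypothetical): 2 clusters x 2 groups, with edges inside
# a group heavier than edges inside a cluster.
clusters = np.array([0, 0, 0, 0, 1, 1, 1, 1])
groups = np.array([0, 0, 1, 1, 0, 0, 1, 1])
same_c = clusters[:, None] == clusters[None, :]
same_g = groups[:, None] == groups[None, :]
W = np.where(same_g, np.where(same_c, 1.0, 0.8), np.where(same_c, 0.6, 0.1))
np.fill_diagonal(W, 0.0)
H = fair_spectral_embedding(W, groups, 2)
```

The rows of H can then be clustered with k-means, exactly as in unconstrained SC. On this toy graph the group split is no longer available as a direction of the embedding, so the rows of H separate the two clusters rather than the two groups.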
Given $V$ with a fair ground-truth clustering, e.g., $V = C_1 \dot\cup C_2$, nodes $i$ and $j$ are connected with probability
$$\Pr(i, j) = \begin{cases} a, & i \text{ and } j \text{ in same group and in same cluster} \\ b, & i \text{ and } j \text{ in same group, but in different clusters} \\ c, & i \text{ and } j \text{ in different groups, but in same cluster} \\ d, & i \text{ and } j \text{ in different groups and in different clusters} \end{cases}$$
for some $a > b > c > d$.
Theorem (informal): Fair SC recovers the ground-truth clustering $C_1 \dot\cup C_2$ with high probability.
[Figure: graph with groups $V_1$, $V_2$ and ground-truth clusters $C_1$, $C_2$]
Standard SC is likely to return $V_1 \dot\cup V_2$.
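To experiment with this model, one can sample a random graph from the four connection probabilities. The following is a minimal sketch; the function name, the seed handling, and the example sizes are assumptions made for illustration and are not taken from the slides.

```python
import numpy as np

def sample_planted_graph(clusters, groups, a, b, c, d, seed=0):
    """Sample a symmetric 0/1 adjacency matrix in which nodes i and j are
    connected independently with probability a, b, c or d, depending on
    whether they share a group and/or a cluster."""
    clusters = np.asarray(clusters)
    groups = np.asarray(groups)
    same_c = clusters[:, None] == clusters[None, :]
    same_g = groups[:, None] == groups[None, :]
    P = np.where(same_g, np.where(same_c, a, b), np.where(same_c, c, d))
    rng = np.random.default_rng(seed)
    n = len(clusters)
    upper = np.triu(rng.random((n, n)) < P, k=1)   # one draw per pair i < j
    return (upper | upper.T).astype(float)         # symmetric, zero diagonal

# Hypothetical example: 40 nodes, two clusters, two equally represented groups.
clusters = np.repeat([0, 1], 20)
groups = np.tile(np.repeat([0, 1], 10), 2)
W = sample_planted_graph(clusters, groups, a=0.9, b=0.7, c=0.5, d=0.2)
```

Because b > c, nodes of the same group in different clusters are more strongly connected than nodes of different groups in the same cluster, which is why an unconstrained method tends to split along the groups.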
FriendshipNet, FacebookNet, DrugNet
[Figure: three panels, FriendshipNet (gender), FacebookNet (gender) and DrugNet (ethnicity), each plotting Balance and RatioCut against k. Legend: Balance of data set, Standard SC, Algorithm 1, Normalized SC.]
Average balance of clusters and RatioCut value as a function of number of clusters.
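The two quantities in the plots can be computed as sketched below. The slides do not state the balance formula, so the definition used here is an assumption in the spirit of Chierichetti et al.: the balance of a cluster is the smallest ratio between the sizes of two groups inside it, which equals the smallest group count divided by the largest. RatioCut is computed as the sum over clusters of the cut weight divided by the cluster size, which agrees with Tr(H^T L H) for H of form (1). Function names and the toy data are hypothetical.

```python
import numpy as np

def average_balance(labels, groups):
    """Mean over clusters of (smallest group count) / (largest group count).
    Assumed definition; 1 means every group is equally present in a cluster,
    0 means some group is missing from it."""
    labels = np.asarray(labels)
    groups = np.asarray(groups)
    group_ids = np.unique(groups)
    balances = []
    for l in np.unique(labels):
        counts = np.array([np.sum((labels == l) & (groups == s)) for s in group_ids])
        balances.append(counts.min() / counts.max())
    return float(np.mean(balances))

def ratio_cut(W, labels):
    """Sum over clusters of (weight of edges leaving the cluster) / (cluster size)."""
    W = np.asarray(W, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for l in np.unique(labels):
        inside = labels == l
        total += W[np.ix_(inside, ~inside)].sum() / inside.sum()
    return float(total)

# Toy data (hypothetical): two triangles joined by one weak edge.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1
labels = np.array([0, 0, 0, 1, 1, 1])
groups = np.array([0, 1, 0, 1, 0, 1])
bal = average_balance(labels, groups)
rc = ratio_cut(W, labels)
```

Passing a single all-zero label vector to `average_balance` gives the balance of the whole data set, the reference value shown in the plots.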