

SLIDE 1

Fairness Constraints for Graph Embeddings*

William L. Hamilton
Assistant Professor at McGill University and Mila
Canada CIFAR Chair in AI
Visiting Researcher at Facebook AI Research

*Joint work with my PhD student Joey Bose, to appear in ICML 2019.

SLIDE 2

Graph embeddings


SLIDE 3

Application: Node classification

[Diagram: a graph with unlabeled nodes ("?") fed into a machine learning model that predicts their labels.]

SLIDE 4

Application: Link prediction

[Diagram: a machine learning model scoring candidate edges ("?") between nodes.]

SLIDE 5

Becoming ubiquitous in social applications

§ Graph embedding techniques are a powerful approach for social recommendations, bot detection, content screening, behavior prediction, geo-localization, etc.

§ E.g., Facebook, Huawei, Uber Eats, Pinterest, LinkedIn, WeChat

§ Classic collaborative filtering approaches can be re-interpreted in a more general graph embedding framework.

SLIDE 6

But what about fairness and privacy?

§ Graph embeddings are designed to capture everything that might be useful for the objective.
§ Even if we don't provide the model with information about sensitive attributes (e.g., gender or age), the model will use this information.
§ What if a user doesn't want this information used?

SLIDE 7

Fairness from a pragmatic perspective

§ Strict privacy and discrimination concerns are one motivation.
§ But what if users just don't want their recommendations to depend on certain attributes?
§ What if users want the system to "ignore" parts of their demographics or past behavior?

SLIDE 8

Fairness in graph embeddings

§ Basic idea: How can we learn node embeddings that are invariant to particular sensitive attributes?
§ Challenges:
  § Graph data is not i.i.d.
  § There is not just one classification task that we are trying to enforce fairness on.
  § There are often many possible sensitive attributes.

SLIDE 9

Our work: Fairness in graph embeddings


SLIDE 10

Preliminaries and set-up

§ Learning an encoder function to map nodes to embeddings: $\text{ENC}(u) = z_u$
§ Using these embeddings to "score" the likelihood of a relationship between nodes: $s(e) = s(\langle z_u, r, z_v \rangle)$

SLIDE 11

Preliminaries and set-up

§ Learning an encoder function to map nodes to embeddings: $\text{ENC}(u) = z_u$
§ Using these embeddings to "score" the likelihood of a relationship between nodes: $s(e) = s(\langle z_u, r, z_v \rangle)$

Score of a (possible) edge is a function of the two node embeddings and the relation type.

SLIDE 12

Preliminaries and set-up

§ Learning an encoder function to map nodes to embeddings: $\text{ENC}(u) = z_u$
§ Using these embeddings to "score" the likelihood of a relationship between nodes: $s(e) = s(\langle z_u, r, z_v \rangle)$

Goal: Train the embeddings (with a subset of the true edges) so that the scores for all real edges are larger than the scores for non-edges.
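As a concrete (hypothetical) sketch of these two pieces in PyTorch, assuming the simplest possible setup, a shallow embedding-lookup encoder and a single relation type scored by a dot product. All names here are illustrative, not the talk's actual code:

```python
import torch
import torch.nn as nn

class ShallowEncoder(nn.Module):
    """ENC: maps a node ID to a learned embedding vector (a lookup table)."""
    def __init__(self, num_nodes: int, dim: int):
        super().__init__()
        self.emb = nn.Embedding(num_nodes, dim)

    def forward(self, node_ids: torch.Tensor) -> torch.Tensor:
        return self.emb(node_ids)

def score(z_u: torch.Tensor, z_v: torch.Tensor) -> torch.Tensor:
    """Score of a (possible) edge as a function of the two node embeddings."""
    return (z_u * z_v).sum(dim=-1)
```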

SLIDE 13

Preliminaries and set-up

§ Generic loss function:

$$\sum_{e \in E_{\text{train}}} L_{\text{edge}}\big(s(e),\, s(e_1^-),\, \ldots,\, s(e_m^-)\big)$$

The sum runs over (a batch of) training edges; $L_{\text{edge}}$ is a task-specific loss function; $s(e)$ is the score assigned to the positive/real edge; and $s(e_1^-), \ldots, s(e_m^-)$ are the scores assigned to random negative sample edges.
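A sketch of this objective in code, reusing `ShallowEncoder` and `score` from the previous sketch and leaving the task-specific `edge_loss_fn` pluggable; the uniform tail-corruption negative sampler is one common choice, not necessarily the talk's:

```python
import torch

def batch_loss(encoder, edges, num_nodes, edge_loss_fn, num_negatives=5):
    """Sum a task-specific L_edge over a batch of training edges,
    comparing each real edge against m random negative samples."""
    u, v = edges[:, 0], edges[:, 1]
    pos_score = score(encoder(u), encoder(v))                    # s(e)
    # Corrupt the tail node uniformly at random to get e_1^-, ..., e_m^-.
    v_neg = torch.randint(0, num_nodes, (len(edges), num_negatives))
    neg_scores = score(encoder(u).unsqueeze(1), encoder(v_neg))  # s(e_i^-)
    return edge_loss_fn(pos_score, neg_scores).sum()
```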

SLIDE 14

Preliminaries and set-up: Concrete examples

§ Score functions:
§ Loss functions:

SLIDE 15

Preliminaries and set-up: Concrete examples

§ Score functions:
  § Dot-product: $s(e) = s(\langle z_u, r, z_v \rangle) = z_u^\top z_v$
§ Loss functions:

SLIDE 16

Preliminaries and set-up: Concrete examples

§ Score functions:
  § Dot-product: $s(e) = s(\langle z_u, r, z_v \rangle) = z_u^\top z_v$
  § TransE: $s(e) = s(\langle z_u, r, z_v \rangle) = -\|z_u + r - z_v\|_2^2$
§ Loss functions:
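The TransE scorer as code; the leading minus sign is an assumption (the extracted slide drops signs) chosen so that more plausible edges receive higher scores:

```python
import torch

def transe_score(z_u: torch.Tensor, r: torch.Tensor, z_v: torch.Tensor) -> torch.Tensor:
    """TransE: (negated) squared L2 distance between z_u + r and z_v."""
    return -((z_u + r - z_v) ** 2).sum(dim=-1)
```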

SLIDE 17

Preliminaries and set-up: Concrete examples

§ Score functions:
  § Dot-product: $s(e) = s(\langle z_u, r, z_v \rangle) = z_u^\top z_v$
  § TransE: $s(e) = s(\langle z_u, r, z_v \rangle) = -\|z_u + r - z_v\|_2^2$
§ Loss functions:
  § Max-margin: $L_{\text{edge}}\big(s(e), s(e_1^-), \ldots, s(e_m^-)\big) = \sum_{i=1}^{m} \max\big(1 - s(e) + s(e_i^-),\, 0\big)$
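The max-margin loss as code, with the margin fixed at 1 as in the formula; this is exactly the kind of `edge_loss_fn` the earlier `batch_loss` sketch expects:

```python
import torch

def max_margin_loss(pos_score: torch.Tensor, neg_scores: torch.Tensor) -> torch.Tensor:
    """sum_i max(1 - s(e) + s(e_i^-), 0), broadcast over the m negatives."""
    return torch.clamp(1.0 - pos_score.unsqueeze(-1) + neg_scores, min=0.0).sum(dim=-1)
```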

SLIDE 18

Preliminaries and set-up: Concrete examples

§ Score functions:
  § Dot-product: $s(e) = s(\langle z_u, r, z_v \rangle) = z_u^\top z_v$
  § TransE: $s(e) = s(\langle z_u, r, z_v \rangle) = -\|z_u + r - z_v\|_2^2$
§ Loss functions:
  § Max-margin: $L_{\text{edge}}\big(s(e), s(e_1^-), \ldots, s(e_m^-)\big) = \sum_{i=1}^{m} \max\big(1 - s(e) + s(e_i^-),\, 0\big)$
  § Cross-entropy: $L_{\text{edge}}\big(s(e), s(e_1^-), \ldots, s(e_m^-)\big) = -\log\big(\sigma(s(e))\big) - \sum_{i=1}^{m} \log\big(1 - \sigma(s(e_i^-))\big)$
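And the cross-entropy loss, written in the numerically stabler log-sigmoid form (mathematically equivalent to the formula above, since $\log(1 - \sigma(x)) = \log\sigma(-x)$):

```python
import torch.nn.functional as F
import torch

def cross_entropy_loss(pos_score: torch.Tensor, neg_scores: torch.Tensor) -> torch.Tensor:
    """-log sigma(s(e)) - sum_i log(1 - sigma(s(e_i^-)))."""
    return -F.logsigmoid(pos_score) - F.logsigmoid(-neg_scores).sum(dim=-1)
```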

SLIDE 19

Formalizing fairness

§ How do we ensure fairness in this context?


SLIDE 20

Formalizing fairness

§ How do we ensure fairness in this context?
§ Solution: representational invariance.
  § Want the embeddings to be independent of the sensitive attributes.
  § This is equivalent to minimizing the mutual information between the embeddings and the attributes.

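The formulas dropped from this slide presumably state these two conditions; a hedged reconstruction, writing $a_u^k$ for node $u$'s value of sensitive attribute $k$ in the set $S$:

$$z_u \;\perp\; a_u^k \quad \forall k \in S \qquad\Longleftrightarrow\qquad I\big(z_u;\, a_u^k\big) = 0 \quad \forall k \in S$$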

SLIDE 21

Enforcing fairness through an adversary


SLIDE 22

Enforcing fairness through an adversary

§ Key component 1: Compositional encoder.
  § Given a set of attributes, it outputs "filtered" embeddings that should be invariant to those attributes.

[Equation annotations: a trainable filter function (a neural network) outputs an embedding that is invariant to attribute k; the input is a node ID and a set of sensitive attributes; the filter outputs are summed over all sensitive attributes.]
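A minimal PyTorch sketch of such a compositional encoder, reusing `ShallowEncoder` as the base; the sum over the requested attributes follows the annotation above, while the small-MLP filter architecture and the `CompositionalEncoder` name are my assumptions:

```python
import torch
import torch.nn as nn

class CompositionalEncoder(nn.Module):
    """One trainable filter per sensitive attribute; the filtered embedding
    sums the filter outputs over the requested set of attributes."""
    def __init__(self, base: nn.Module, num_attrs: int, dim: int):
        super().__init__()
        self.base = base  # e.g., a ShallowEncoder
        self.filters = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
            for _ in range(num_attrs)
        )

    def forward(self, node_ids: torch.Tensor, sensitive: list) -> torch.Tensor:
        z = self.base(node_ids)
        if not sensitive:  # nothing to hide: return the raw embedding
            return z
        return torch.stack([self.filters[k](z) for k in sensitive]).sum(dim=0)
```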

SLIDE 23

Enforcing fairness through an adversary

§ Key component 2: Adversarial discriminators.
  § For each sensitive attribute, train an adversarial discriminator that tries to predict that sensitive attribute from the filtered embeddings.

[Equation annotations: the discriminator for sensitive attribute k takes as input the filtered embedding for node u and an attribute value, and outputs the likelihood that node u has that attribute value.]
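A matching discriminator sketch in the same style; the two-layer architecture is an assumption:

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """D_k: predicts the value of sensitive attribute k from the filtered
    embedding; outputs per-value log-likelihoods."""
    def __init__(self, dim: int, num_values: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                 nn.Linear(dim, num_values))

    def forward(self, z_filtered: torch.Tensor) -> torch.Tensor:
        return self.net(z_filtered).log_softmax(dim=-1)
```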

SLIDE 24

Enforcing fairness through an adversary

§ Putting it all together in an adversarial loss:

[Equation annotations: the original loss function for the edge prediction task; the likelihood of the discriminators predicting the sensitive attributes; a constant that determines the strength of the fairness constraints.]
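Assembling the annotated pieces, the dropped formula is presumably of this general form (a hedged reconstruction, with $\tilde{z}_u$ the filtered embedding, $a_u^k$ the value of attribute $k$ for node $u$, and $\lambda$ the fairness-strength constant used in the trade-off slide later):

$$L \;=\; L_{\text{edge}}\big(s(e), s(e_1^-), \ldots, s(e_m^-)\big) \;+\; \lambda \sum_{k \in S} \log D_k\big(\tilde{z}_u, a_u^k\big)$$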

SLIDE 25

Enforcing fairness through an adversary

§ Putting it all together in an adversarial loss:

§ During training, the encoder tries to minimize this loss, while the adversarial discriminators are trained to maximize it.
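A sketch of one alternating update, reusing `score`, `max_margin_loss`, `CompositionalEncoder`, and `Discriminator` from the earlier sketches; the optimizers, the uniform negative sampler, and the choice of the max-margin task loss are all assumptions, and `attrs[k]` is assumed to be a per-node tensor of attribute values:

```python
import torch

def train_step(encoder, discriminators, enc_opt, disc_opt,
               edges, attrs, num_nodes, sensitive, lam=1.0):
    u, v = edges[:, 0], edges[:, 1]

    def disc_loglik(z):
        # Sum of the log-likelihoods the discriminators assign to the
        # true attribute values of the head nodes.
        return sum(discriminators[k](z)
                   .gather(-1, attrs[k][u].unsqueeze(-1)).sum()
                   for k in sensitive)

    # (1) Discriminators maximize the log-likelihood (encoder detached).
    disc_opt.zero_grad()
    (-disc_loglik(encoder(u, sensitive).detach())).backward()
    disc_opt.step()

    # (2) Encoder minimizes L_edge + lambda * (discriminator log-likelihood).
    enc_opt.zero_grad()
    z_u, z_v = encoder(u, sensitive), encoder(v, sensitive)
    v_neg = torch.randint(0, num_nodes, (len(edges), 5))
    loss = max_margin_loss(score(z_u, z_v),
                           score(z_u.unsqueeze(1), encoder(v_neg, sensitive))).sum()
    (loss + lam * disc_loglik(z_u)).backward()
    enc_opt.step()
```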

SLIDE 26

Enforcing fairness through an adversary


SLIDE 27

Dataset 1: MovieLens-1M

§ Classic recommender system benchmark.
§ Bipartite graph between users and movies.
§ Nodes (~10,000): users and movies.
§ Edges (~1,000,000): the rating a user gives a movie.
§ Sensitive attributes:
  § Gender
  § Age (binned to become a categorical attribute)
  § Occupation

SLIDE 28

Dataset 2: Reddit

§ Derived from public Reddit comments.
§ Bipartite graph between users and communities.
§ Nodes (~300,000): users and communities.
§ Edges (~7,000,000): whether a user commented in that community.
§ Sensitive attributes: randomly select 50 communities to be "sensitive" communities.

SLIDE 29

Dataset 3: Freebase 15k-237

§ Derived from a classic knowledge base completion benchmark.
§ Knowledge graph between a set of typed entities.
§ Nodes (~15,000): entities.
§ Edges (~150,000): 237 different relation types (e.g., married_to, born_in, capital_of, director_of).
§ Sensitive attributes: randomly selected 3 entity type annotations (e.g., is_actor) to be "sensitive" attributes.

SLIDE 30

Experiments: Three questions

  • 1. What is the cost of invariance?
  • 2. What is the impact of compositionality?
  • 3. Can we generalize to unseen combinations of attributes?

SLIDE 31

MovieLens: Fairness results

§ How strongly can we enforce fairness?
§ Compare three approaches to enforcing fairness:
  § No adversary (i.e., just train on the recommendation task)
  § An independent adversarial model for each attribute
  § The full compositional model

SLIDE 32

MovieLens: Fairness results

§ How strongly can we enforce fairness?
§ Evaluate how well a two-layer MLP can classify the sensitive attributes from the learned node embeddings:
  § AUC for the binary gender attribute.
  § Micro-averaged F1-score for the age and occupation attributes.
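A sketch of this leakage evaluation with scikit-learn; the train/test split handling, hidden sizes, and iteration cap are illustrative:

```python
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score, f1_score

def attribute_leakage(z_train, y_train, z_test, y_test, binary: bool):
    """Train an MLP to recover a sensitive attribute from the learned
    embeddings; lower scores mean the embeddings are more invariant."""
    clf = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500)
    clf.fit(z_train, y_train)
    if binary:   # e.g., gender -> AUC
        return roc_auc_score(y_test, clf.predict_proba(z_test)[:, 1])
    # e.g., age or occupation -> micro-averaged F1
    return f1_score(y_test, clf.predict(z_test), average="micro")
```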

SLIDE 33

MovieLens: Fairness results

§ Key takeaways:
  § After applying the compositional adversary, accuracy is no better than a majority classifier!
  § Performance of the compositional adversary is on par with the independent adversaries!

SLIDE 34

MovieLens: Impact on recommendations

§ Evaluate recommendation performance (RMSE) with and without enforcing fairness.
§ There is a drop in accuracy, but it is not catastrophic.

[Figure: RMSE (y-axis, 0.8 to 1.8) over training epochs (25 to 200) for the Gender, Age, Occupation, and Compositional adversaries, and the no-adversary baseline.]

SLIDE 35

MovieLens: Trade-off

[Figure: two panels sweeping the adversarial weight λ on a log scale from 10⁻⁴ to 10. Left: compositional gender AUC vs. the baseline gender AUC (y-axis 0.50 to 0.70). Right: compositional adversary RMSE vs. the baseline RMSE (y-axis 0.90 to 1.00).]

§ λ allows a trade-off between fairness and recommendation performance.

SLIDE 36

Reddit results: Fairness


[Figure: left panel: accuracy predicting the sensitive attributes (AUC score, y-axis 0.0 to 0.8); right panel: edge-prediction accuracy (AUC, y-axis 0.70 to 0.82) over 50 training epochs. Legend: Baseline, Non-Compositional, Held-Out Compositional, No-Held-Out Compositional.]

§ Same set-up as MovieLens, but here we have 10 sensitive attributes.
§ Again, we are able to strongly enforce fairness, but at a non-trivial cost.

SLIDE 37

Freebase results

Ability to predict the sensitive attributes (measured in AUC) and the impact on task performance (mean rank).

§ On the synthetic Freebase data we see that enforcing fairness leads to a significant drop in task performance.

SLIDE 38

Conclusions and outlook

§ Fairness in network representation learning is an understudied issue.
§ We can enforce fairness in a flexible way, but at a cost.
§ There is no perfect notion of fairness.