A Study on Generative Adversarial Networks Exacerbating Social Data Bias
Thesis by Niharika Jain
Chair: Dr. Subbarao Kambhampati Committee Members: Dr. Huan Liu and Dr. Lydia Manikonda
It's not clear that they realize the dangers of this approach!
Machine learning practitioners have celebrated Generative Adversarial Networks (GANs) as an economical technique to augment their training sets for data-hungry models when acquiring real data is expensive or infeasible.
https://techcrunch.com/2018/05/11/deep-learning-with-synthetic-data-will-democratize-the-tech-industry/
https://www.forbes.com/sites/bernardmarr/2018/11/05/does-synthetic-data-hold-the-secret-to-artificial-intelligence/#3c30abd442f8
https://synthetichealth.github.io/synthea/
http://news.mit.edu/2017/artificial-data-give-same-results-as-real-data-0303

If GANs worked perfectly, they would capture the distribution of the data, and thus capture any biases within it.
[Figure: GAN architecture. The generator $G$ maps a noise vector $\mathbf{z}$ to a sample $G(\mathbf{z})$ from the fake distribution $p_g$; the discriminator $D$ receives either a real sample $\mathbf{y} \sim p_{data}$ or $G(\mathbf{z})$ and outputs Real or Fake.]

$$J_D = -\tfrac{1}{2}\,\mathbb{E}_{\mathbf{y} \sim p_{data}}\left[\log D(\mathbf{y})\right] - \tfrac{1}{2}\,\mathbb{E}_{\mathbf{z} \sim p_z}\left[\log\left(1 - D(G(\mathbf{z}))\right)\right]$$
$$J_G = \mathbb{E}_{\mathbf{z} \sim p_z}\left[-\log D(G(\mathbf{z}))\right]$$
Figure inspired by Thalles Silva 2018
(Goodfellow et al. 2014) (Goodfellow 2016)
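The losses above can be illustrated numerically. A minimal stdlib-Python sketch, where `d_real` and `d_fake` are hypothetical minibatches of discriminator outputs (probabilities), not values from the thesis:

```python
import math

def discriminator_loss(d_real, d_fake):
    """J_D = -1/2 E[log D(y)] - 1/2 E[log(1 - D(G(z)))],
    estimated on minibatches of discriminator outputs."""
    return (-0.5 * sum(math.log(d) for d in d_real) / len(d_real)
            - 0.5 * sum(math.log(1.0 - d) for d in d_fake) / len(d_fake))

def generator_loss(d_fake):
    """Non-saturating generator loss J_G = E[-log D(G(z))]."""
    return sum(-math.log(d) for d in d_fake) / len(d_fake)

# At the equilibrium where D outputs 1/2 everywhere, J_D = log 2.
print(discriminator_loss([0.5] * 4, [0.5] * 4))  # 0.6931...
```

At that equilibrium the discriminator cannot distinguish real from fake, which is exactly the regime in which the generator's distribution is supposed to match $p_{data}$, biases included.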
GANs are explosively popular, in part, because scalable models are readily available off-the-shelf.
Deep Convolutional Generative Adversarial Networks (DCGAN)
(Radford, Metz, and Chintala 2015) github.com/carpedm20/DCGAN-tensorflow
Cycle-Consistent Adversarial Networks (CycleGAN)
github.com/junyanz/pytorch-CycleGAN-and-pix2pix (Zhu et al. 2017)
These are GAN-generated faces, trained on a dataset of engineering professors.
What do these images have in common?
When a feature is biased in the training set, a GAN amplifies the bias along that dimension in its generated distribution.
This hypothesis makes a blanket claim about GANs indiscriminately picking up all types of biases that can exist in the data. For facial images, these biased features could be lighting, facial expression, accessories, or hairstyle. We aim only to bring attention to the exacerbation of biases against groups that have been historically discriminated against. This work investigates bias over race and gender.
When a feature is biased in the training set, a GAN amplifies the bias along that dimension in its generated distribution. Facial datasets are often skewed along race and gender, so GANs trained on them exacerbate sensitive social biases.
Using photos to measure human characteristics has a complicated and dangerous history: in the 19th century, "photography helped to animate—and lend a 'scientific' veneer to—various forms of phrenology, physiognomy, and eugenics" (Crawford and Paglen 2019). Neither gender nor race can be ascertained from appearance. We use human annotators to classify masculinity of features and lightness of skin color as a crude metric of gender and race to illustrate our argument. This work is not advocating for the use of facial data in machine learning applications; we create a hypothetical experiment using data with easily detectable biases to tell a cautionary tale about the shortcomings of this approach.
If we train a GAN to imagine faces of US university engineering professors, will it skew the new data toward white males?
We scrape engineering faculty directories from 47 universities on the U.S. News "Best Engineering Schools" list, remove all noisy images, and crop each to the face.
17,245 headshots
Images from cidse.engineering.asu.edu/faculty/
image pre-processing contribution: Alberto Olmo
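The face-cropping step can be sketched as pure bounding-box arithmetic. The function below is a hypothetical illustration (the exact scheme is not specified here), assuming a face detector has already returned an `(x, y, w, h)` box:

```python
def square_face_crop(box, img_w, img_h, margin=0.2):
    """Expand a detected face box by a margin, square it, and clamp to the
    image bounds. Returns (left, top, right, bottom) pixel coordinates."""
    x, y, w, h = box
    side = int(max(w, h) * (1 + margin))   # square side with breathing room
    cx, cy = x + w // 2, y + h // 2        # center of the detected face
    left = max(0, cx - side // 2)
    top = max(0, cy - side // 2)
    right = min(img_w, left + side)
    bottom = min(img_h, top + side)
    return left, top, right, bottom

print(square_face_crop((40, 40, 20, 20), 100, 100))  # (38, 38, 62, 62)
```

A square crop is the natural choice here because DCGAN-style architectures expect fixed-size square inputs.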
[Three DCGAN training runs produce the generated distributions $p_{g_1}$, $p_{g_2}$, and $p_{g_3}$.]
GAN training contribution: Alberto Olmo
To measure the diversity of these distributions along gender and race, we ask humans on Amazon Mechanical Turk to annotate the images. For each task, we ask master Turkers to annotate 50 images:
T1a: gender on professor images randomly sampled from $p_{data}$
T1b: gender on DCGAN-generated images randomly sampled from $p_g$
T2a: race on professor images randomly sampled from $p_{data}$
T2b: race on DCGAN-generated images randomly sampled from $p_g$
human annotation contribution: Sailik Sengupta
[Annotation interface: "For each image, select the most appropriate description:" with example selections ✓ neither of the above is true, ✓ skin color is white.]
Between-subject design: for each distribution ($p_{data}$, $p_{g_1}$, $p_{g_2}$, or $p_{g_3}$), we ask a Turker to annotate 50 images for race and gender.
One-tailed two-proportion z-test with hypotheses $H_0\!: \hat{p} = p_0$ and $H_1\!: \hat{p} < p_0$, yielding p = 0.0094 and p = 0.000087.
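The test statistic behind these p-values can be computed in a few lines of standard-library Python. The counts below are placeholders for illustration, not the actual annotation counts from this study:

```python
import math

def one_tailed_two_proportion_z(x1, n1, x2, n2):
    """One-tailed two-proportion z-test for H1: p1 < p2,
    using the pooled-proportion standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # lower-tail normal CDF
    return z, p_value

# Hypothetical counts: 10/100 minority-labeled generated images vs. 20/100 real.
z, p = one_tailed_two_proportion_z(10, 100, 20, 100)
print(round(z, 2), round(p, 4))
```

A small p-value here supports the claim that the minority proportion in the generated distribution is significantly lower than in the training data.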
Using majority thresholding to label images, we find that the representation of minorities is further decreased in the synthetic data.
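A majority-thresholding rule can be sketched as follows. This is a minimal illustration; the label names and the threshold value are hypothetical, not the study's exact scheme:

```python
from collections import Counter

def majority_label(votes, threshold=0.5):
    """Return the label whose vote share strictly exceeds the threshold,
    or None when annotators do not reach a majority."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) > threshold else None

print(majority_label(["masculine", "masculine", "feminine"]))  # masculine
print(majority_label(["masculine", "feminine"]))               # None
```

Images with no clear majority are left unlabeled, so the proportion test runs only over images the annotators agree on.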
confidence metrics
[Figure: plots of the percentage of images classified as a function of the threshold for classification.]
Turkers are not as confident when generated images belong to minority classes as they are when the images belong to the majority. Is human or machine bias to blame?
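One way to read the confidence plots above: sweep the classification threshold and record what fraction of images have annotator agreement at or above it. A minimal sketch, with made-up per-image agreement values:

```python
def percent_classified(agreement, thresholds):
    """For each threshold, the fraction of images whose majority-vote
    agreement meets or exceeds it."""
    return {t: sum(a >= t for a in agreement) / len(agreement)
            for t in thresholds}

# Hypothetical agreement levels (fraction of annotators picking the majority class):
print(percent_classified([0.6, 0.8, 1.0], [0.5, 0.9]))
```

Curves that fall off faster for minority-class images than for majority-class images would indicate lower annotator confidence on the minority classes.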
confidence metrics contribution: Alberto Olmo, Lydia Manikonda
[Figure legend: $p_{data}$ vs. $p_g$, plotted separately for gender and race.]