Bias, Fairness, Accountability, and Transparency in Machine Learning
CS 115: Computing for the Socio-Techno Web
Instructor: Brian Brubach
Announcements
- Adjustment to deadline schedule
- Assignment 5 due Tuesday
- Project milestone 4 due Friday
- Elissa Redmiles remote lecture Thursday 9:45-11:00am
- Reading posted
- If you can’t make it, but have questions, email me by Wednesday night
Some questions
- How much data about each of us is collected online (and offline)?
- How are computers/websites/algorithms using that data to make
decisions about us or that affect us?
- Can algorithms discriminate and how?
- Can we prevent algorithms from discriminating?
- Can algorithms combat discrimination and how?
Examples of computers making decisions
- Email spam filtering
- Is an email spam or not?
- Advertising
- Which ads should be shown to you?
- Social networks
- What posts do you see? Who sees your posts?
- Web search
- What results do you see when you search online?
Higher stakes examples of computer decisions
- Hiring and recruiting web sites
- Who sees job ad? Which applications get filtered out?
- Banking
- Which loans/credit cards do you qualify for? Amount? Interest rate?
- Criminal justice
- Who is released on bail and how much? Which neighborhoods get patrolled?
- Self-driving cars
- Insurance
- What should your insurance rate be? How risky are you?
- Healthcare
- Who gets access to more urgent care?
Introduction to machine learning classification
- Each data point has a set of features and a label
- Data point could be an email, job application, image, etc.
- Features → Information we have about the data point
- Email → Length, spelling errors, common spam words (watch, Rolex, medicine, prince)?
- Picture → Pixel colors, shapes
- Label → Something we want to know about the data
- Email → Spam or not spam
- Picture → This is a picture of a car, tree, horse, etc.
- Goal → Algorithm that can look at the features for a data point and guess its label (see the sketch below)
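As a rough illustration of features and labels (not part of the original slides), here is a minimal Python sketch that turns one email into a small feature vector. The feature names, spam-word list, and extract_features helper are all invented for illustration.

# Minimal sketch of features and a label for one data point (an email).
# The spam-word list, feature names, and extract_features helper are invented;
# a real spam filter would use far richer features.

SPAM_WORDS = {"watch", "rolex", "medicine", "prince"}

def extract_features(email_text):
    """Turn raw email text into a small dictionary of features."""
    words = email_text.lower().split()
    return {
        "length": len(words),                               # how long the email is
        "spam_words": sum(w in SPAM_WORDS for w in words),  # count of known spam words
        "spelling_errors": 0,  # placeholder: a real system would run a spell checker here
    }

email = "claim your genuine rolex watch from a prince today"
features = extract_features(email)  # information we have about the data point
label = "spam"                      # the thing we want an algorithm to guess
print(features, label)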
Introduction to machine learning classification
- One approach → “Train” a classifier
- Classifier → An algorithm that performs the classification task
- Show the algorithm labeled data (training set)
- Have it develop rules for predicting labels on unlabeled data
- Supervised learning
Spam filter example
- Linear classifier with two features
[Figure: scatter plot of emails with number of spelling errors on one axis and number of “spam words” (watch, rolex, medicine) on the other; + marks emails labeled spam, — marks emails labeled not spam]
Spam filter example
- Linear classifier with two features
[Figure: the same scatter plot with a line separating the + (spam) and — (not spam) regions; a new, unlabeled email is plotted and the classifier guesses its label from which side of the line it falls on]
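A minimal sketch of the two-feature linear classifier from the figure above, using scikit-learn. The tiny training set is made up; a real spam filter would be trained on far more emails and features.

# Sketch of a linear classifier with two features, using scikit-learn.
from sklearn.linear_model import LogisticRegression

# Each row: [number of "spam words", number of spelling errors]
X_train = [[5, 7], [6, 4], [4, 6], [7, 8],   # emails labeled spam
           [0, 1], [1, 0], [0, 2], [2, 1]]   # emails labeled not spam
y_train = ["spam", "spam", "spam", "spam",
           "not spam", "not spam", "not spam", "not spam"]

# "Train" the classifier: it learns a line separating the two groups of points.
clf = LogisticRegression()
clf.fit(X_train, y_train)

# Unlabeled data points: the classifier guesses each label from which side
# of the learned line the point falls on.
print(clf.predict([[6, 5]]))   # likely "spam"
print(clf.predict([[1, 1]]))   # likely "not spam"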
Real world classifiers
- May use thousands of features or more
- Previous example was 2-dimensional
- Imagine 3-dimensional, 4-dimensional, 1,000-dimensional
- Not limited to a linear classifier
- Could be a curvy line, a list of conditional rules, or something else entirely
- Often not obvious why a classifier is making a decision
- E.g., deep learning
- Obey the principle of garbage in, garbage out
- But this is not the only problem!
Sensitive features
- Some common features associated with people
- Browsing and shopping history
- Location
- Ratings (how they rate movies, recipes, books, etc.)
- Content of emails and social media posts
- Pictures of the person or pictures they share
- Medical history
- Common “sensitive” features
- Race, gender, age, disability status, etc.
- Often things you can’t legally discriminate based on
- How can we avoid bias and discrimination based on sensitive features?
- Simple idea → What if we just remove sensitive features from our data?
Redundant encoding: the invisible red line
- Redundant encoding → Information about one feature can be inferred from other features
- Well-known examples → Redlining and congressional districting
- Redlining → Discrimination based on residential location that masks discrimination based on a sensitive feature, often race
- Historic practice of color-coding a map based partly on racial and ethnic demographics and designating certain neighborhoods as risky to loan to
- Modern equivalent → Using a person’s address as a feature to determine their insurance rate or whether they qualify for a loan
- Sensitive features are redundantly encoded in the location feature (see the sketch below)
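A hedged sketch of redundant encoding on synthetic data (every value invented): the sensitive group label is never given to the classifier, yet it can be recovered from a zip-code feature that correlates with it.

# Sketch of redundant encoding with synthetic data: the sensitive feature is
# "removed" (the classifier only sees zip code), yet it is still recoverable.
import random
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

random.seed(0)
zip_codes, groups = [], []
for _ in range(1000):
    group = random.choice(["A", "B"])
    # In this toy world, group A mostly lives in zip 10001 and group B in 20002.
    weights = [0.9, 0.1] if group == "A" else [0.1, 0.9]
    zip_code = random.choices([10001, 20002], weights=weights)[0]
    zip_codes.append([zip_code])
    groups.append(group)

# Train on zip code alone, with the sensitive feature dropped from the inputs...
clf = DecisionTreeClassifier().fit(zip_codes, groups)

# ...and the sensitive group is still predictable from location.
print("Group recovered from zip code alone:",
      accuracy_score(groups, clf.predict(zip_codes)))  # about 0.9 in this toy setup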
Redundant encoding: the invisible red line
- Why not also remove features that redundantly encode sensitive
features?
- Might throw away too much useful information
- Location information can be useful
- Might be hard to identify which features to remove
- Which shopping data?
- Deeply engrained in image classification and facial recognition
Other issues (not an exhaustive list)
- Feature selection
- Biased training set
- Perpetuating existing biases
- Proxy labels
- Lack of diversity in tech
Feature selection
- Recall features are information about a data point
- Millions of features we could use
- Need to choose a smaller number of features for most classifiers
- Programmers get to choose which features to use
- Including or excluding certain features may lead to bias
- Redundant encoding of sensitive features
- Favoring features which measure one group better than another
- Intersects with lack of diversity in tech
- Can you think of examples?
Biased training set
- Ideal training set → Random sample of data points with accurate labels
- Reality → Nope!
- Biased labeling → How is the training set labeled? How will the bias of a human labeler affect the outcome?
- Biased sampling → Do the data points in the training set represent a random sample of the data points in the real world? (see the sketch after this list)
- Example → Bail recommendation software
- Predict likelihood someone will jump bail to decide whether to release a person on bail and what to set the bail at
- Also used in sentencing in some places
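To make the sampling issue concrete, here is a small synthetic sketch. All numbers are invented, and using the group itself as the only feature is a deliberate oversimplification: both groups jump bail at the same true rate, but the training data over-represents positives for one group, so the trained model reports inflated risk for that group.

# Sketch of biased sampling with synthetic data. In the "real world" both groups
# jump bail at the same 20% rate, but the training set over-represents group B's
# positive cases, so the learned risk for group B is inflated.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
TRUE_RATE = 0.2

def sample(group_value, n, observed_rate):
    X = np.full((n, 1), group_value)            # single feature: group membership (toy)
    y = (rng.random(n) < observed_rate).astype(int)
    return X, y

X_a, y_a = sample(0, 1000, TRUE_RATE)           # group A sampled at its true rate
X_b, y_b = sample(1, 1000, 2 * TRUE_RATE)       # group B's positives over-represented

model = LogisticRegression().fit(np.vstack([X_a, X_b]), np.concatenate([y_a, y_b]))
print("Learned risk, group A:", model.predict_proba([[0]])[0, 1])  # about 0.2
print("Learned risk, group B:", model.predict_proba([[1]])[0, 1])  # about 0.4, inflated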
Perpetuating existing bias
- Algorithms can perpetuate biases existing in society even if humans are
trying not to
- Wage gap problem → Different groups of people paid differently
- Human perpetuation → Employers ask about previous salary
- Possible legislative solution → Ban employers from asking about previous salary
- Algorithmic problem → Previous salary can be inferred from other data
- How do we even know if this is happening?
- Capable of magnifying bias
- Can you think of other examples?
Proxy labels
- Proxy label → Different from the true label you want to predict
- Used in classifier training when true labels are hard to get
- Hopefully correlated with the true label
- Triage problem → Predict which patients need extra care and attention
- True label to predict → Future healthcare needs
- Give those patients more attention and preventative care
- Proxy label used → Future healthcare expenses
- Problem → Racial disparities influence healthcare expenses
- Result → Healthier white patients prioritized over sicker black patients (toy illustration after this list)
- Good news → Computer science researchers contacted the software company and they made improvements
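A toy numeric sketch of the proxy-label problem, loosely modeled on the triage example above (all numbers invented): both groups have the same distribution of true need, but expenses run lower for one group at equal need, so ranking patients by the proxy favors the other group.

# Sketch of the proxy-label problem with synthetic data: true need is identical
# across groups, but expenses (the proxy) are systematically lower for group B
# at the same level of need, so prioritizing by expenses favors group A.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
group = rng.integers(0, 2, n)                  # 0 = group A, 1 = group B
need = rng.normal(50, 15, n)                   # true label: future healthcare needs
expenses = need * np.where(group == 1, 0.7, 1.0) + rng.normal(0, 5, n)  # proxy label

# Give extra care to the top 10% of patients ranked by the proxy.
prioritized = expenses >= np.quantile(expenses, 0.9)

print("Share of group A prioritized:", prioritized[group == 0].mean())  # much higher
print("Share of group B prioritized:", prioritized[group == 1].mean())  # much lower
print("Least needy prioritized group A patient:", need[prioritized & (group == 0)].min())
print("Most needy overlooked group B patient:  ", need[~prioritized & (group == 1)].max())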
Some solutions
- Fairness
- Can we make algorithms more fair than human decision-makers?
- Efforts to define “fair”
- Actually using sensitive features in the training step
- Accountability
- Testing algorithms for bias/discrimination
- Requiring companies to justify their decisions
- Transparency
- Translating the computer classifiers into something humans can read and
interpret
- Interactive machine learning → Lets us ask an algorithm why it made a decision
- Huge efforts to understand deep learning
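Picking up the “efforts to define fair” bullet above: one widely used (and contested) definition, demographic parity, simply compares the rate of positive decisions across groups. A minimal sketch with made-up predictions:

# Sketch of one proposed fairness check, demographic parity: compare the rate of
# positive predictions (e.g., "invite to interview") across groups.
# The predictions and group labels below are made up.
import numpy as np

predictions = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])          # classifier output
group       = np.array(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])

rate_a = predictions[group == "A"].mean()   # 0.6
rate_b = predictions[group == "B"].mean()   # 0.4
print("Positive rate, group A:", rate_a)
print("Positive rate, group B:", rate_b)
print("Demographic parity gap:", abs(rate_a - rate_b))   # large gap -> possible bias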
Testing for bias/discrimination
“We set the agents’ gender to female or male on Google’s Ad Settings page. We then had both the female and male groups of agents visit webpages associated with employment. We established that Google used this gender information to select ads, as one might expect. The interesting result was how the ads differed between the groups: during this experiment, Google showed the simulated males ads from a certain career coaching agency that promised large salaries more frequently than the simulated females, a finding suggestive of discrimination.”
- Automated Experiments on Ad Privacy Settings (Datta, Tschantz, and Datta, 2015)
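The study above measured ad frequencies experimentally; an audit like that typically ends with a statistical test of whether the difference between the simulated groups could be chance. A sketch with invented counts (not the study’s actual numbers), using a chi-square test:

# Sketch of the statistical step in an ad audit: did the high-salary career ad
# appear at different rates for simulated male vs. female agents?
# The counts below are invented for illustration, not taken from the 2015 study.
from scipy.stats import chi2_contingency

#          ad shown, ad not shown
counts = [[1800, 8200],   # simulated male agents
          [ 300, 9700]]   # simulated female agents

chi2, p_value, dof, expected = chi2_contingency(counts)
print("p-value:", p_value)  # very small -> the gap is unlikely to be random chance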
Some questions
- How much data about each of us is collected online (and offline)?
- How are computers/websites/algorithms using that data to make
decisions about us or that affect us?
- Can algorithms discriminate and how?
- Can we prevent algorithms from discriminating?
- Can algorithms combat discrimination and how?