Algorithmic Bias

Machine Learning
  • An area of AI that studies how to get computers to learn from experience (e.g. data)
  • Identify patterns from a training dataset
  • Then generalize from these patterns and apply them to future data that is different; this is called a test dataset
  • Supervised learning
    • Features -> Classifier -> Class Label
    • Features: traits of a data instance (e.g. keywords in your email) that are informative as to the classification
    • Class Label: the classification (e.g. Personal or School for email sorting)
    • Produce the classifier by training on the training data
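
The supervised pipeline (features -> classifier -> class label, trained on training data and checked on test data) can be sketched with a deliberately simple keyword-counting classifier. This assumes emails are already reduced to keyword lists; the data, the helper names `train` and `classify`, and the scoring rule are hypothetical illustrations, not a method from these slides. Real systems would typically use a model such as Naive Bayes or logistic regression.

```python
from collections import Counter

# Hypothetical training emails: (keywords in the email, class label)
TRAIN = [
    (["homework", "exam", "lecture"], "School"),
    (["syllabus", "exam", "grade"], "School"),
    (["dinner", "birthday", "mom"], "Personal"),
    (["party", "weekend", "birthday"], "Personal"),
]

# Hypothetical test emails: different from anything seen in training
TEST = [
    (["exam", "grade", "tomorrow"], "School"),
    (["birthday", "dinner", "friday"], "Personal"),
]


def train(data):
    """Produce the classifier: count how often each keyword appears with each label."""
    counts = {}
    for features, label in data:
        counts.setdefault(label, Counter()).update(features)
    return counts


def classify(counts, features):
    """Predict the label whose training keywords overlap most with `features`."""
    return max(counts, key=lambda label: sum(counts[label][f] for f in features))


model = train(TRAIN)
correct = sum(classify(model, features) == label for features, label in TEST)
print(f"test accuracy: {correct}/{len(TEST)}")  # test accuracy: 2/2
```

Unseen keywords such as "tomorrow" simply contribute zero, because a `Counter` returns 0 for missing keys.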

Algorithmic Bias: What is it?
  • Bias introduced into machine learning by the training data
  • Garbage in / garbage out: machine learning algorithms reflect societal bias (in the form of discrimination, prejudice and unfairness) when applied to biased data
  • Do machine learning algorithms respect protected variables?
    • These are characteristics that anti-discrimination laws protect in certain situations
    • E.g. the Fair Housing Act prohibits landlords from discriminating based on 7 protected classes: race, sex (gender), religion, disability, color, national origin, and familial status
    • You can't just ignore the features that correspond to these protected variables and claim your algorithm is not biased, because other features can act as proxies for them: e.g. zip code and race are closely correlated in many parts of the US
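
To see why dropping the protected column is not enough, the sketch below measures how well a remaining feature predicts the protected attribute on its own. The zip codes, groups, and counts are made up for illustration, and `proxy_strength` and `base_rate` are hypothetical helper names. If the feature predicts the group far better than chance, a model trained on that feature can effectively recover the protected variable.

```python
from collections import Counter, defaultdict


def proxy_strength(rows):
    """Accuracy of guessing the protected attribute from a single feature.

    rows: list of (feature_value, protected_value) pairs. For each feature
    value, the best guess is the protected value seen most often with it.
    """
    by_feature = defaultdict(Counter)
    for feature, protected in rows:
        by_feature[feature][protected] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_feature.values())
    return correct / len(rows)


def base_rate(rows):
    """Accuracy of always guessing the overall most common protected value."""
    counts = Counter(protected for _, protected in rows)
    return counts.most_common(1)[0][1] / len(rows)


# Hypothetical applicants as (zip code, group); "group" is the protected
# variable that was "ignored" by dropping it from the feature set.
rows = (
    [("11111", "A")] * 9 + [("11111", "B")]
    + [("22222", "B")] * 9 + [("22222", "A")]
)
print("guess group from zip code:", proxy_strength(rows))  # 0.9
print("guess with no feature:    ", base_rate(rows))       # 0.5
```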

What causes algorithmic bias?
  • Biased training data (e.g. biased class labels)
  • Inclusion of protected variables as features; including variables correlated with protected variables is also highly problematic
  • Downstream goals (e.g. business profitability) might conflict with avoiding discrimination
  • Misunderstanding / misuse of machine learning
    • Machine learning applied to the wrong tasks
  • Domain shift (the problem that domain adaptation tries to address): a machine learning algorithm trained on data from one distribution but applied to test data from another distribution
  • Missing / corrupted data
  • Sample selection bias
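
Sample selection bias and domain shift can be illustrated together: a threshold classifier fit only on one group's data is applied to a second group whose feature values come from a shifted distribution. The data, the 3-point shift, and the helper names `accuracy` and `fit_threshold` are hypothetical assumptions for this sketch, not a model of any real system.

```python
def accuracy(threshold, data):
    """Fraction of (x, label) pairs classified correctly by the rule x >= threshold."""
    return sum((x >= threshold) == bool(label) for x, label in data) / len(data)


def fit_threshold(data):
    """Pick the observed x value that maximizes training accuracy as the threshold."""
    candidates = sorted({x for x, _ in data})
    return max(candidates, key=lambda t: accuracy(t, data))


# Hypothetical data: (feature value, label), label 1 = qualified.
# Training sample drawn only from group A (sample selection bias).
group_a = [(2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1)]
# Assumption: group B has the same labels, but its feature values come from a
# shifted distribution (here simply 3 points lower).
group_b = [(x - 3, label) for x, label in group_a]

t = fit_threshold(group_a)
print("learned threshold:", t)                       # 6
print("accuracy on group A:", accuracy(t, group_a))  # 1.0
print("accuracy on group B:", accuracy(t, group_b))  # 0.5
```

The classifier is perfect on the group it was trained on, yet it rejects every qualified member of group B, even though qualification relates to the feature in the same way apart from the shift.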

How do you fix this problem?
  • Not sure if you can:
    • A lot of these are societal problems
    • Can you correct the bias without introducing bias of a different sort?
  • Understand the problem so that you can use the right machine learning algorithm
    • Know when NOT to use a particular algorithm
  • Make systems that are auditable
  • Start with lower-impact decisions early on, especially when an algorithm is involved
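
One concrete form of auditability is logging every decision together with the group it affected, so outcomes can be compared across groups afterwards. This sketch computes per-group selection rates and flags a group whose rate falls below 0.8 times the highest rate, which mirrors the "four-fifths rule" heuristic from US employment selection guidelines. The threshold, the decision log, and the function names are assumptions, and passing this check does not by itself show that a system is fair.

```python
from collections import defaultdict


def selection_rates(decisions):
    """decisions: list of (group, approved) pairs. Returns approval rate per group."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        approved[group] += int(ok)
    return {g: approved[g] / totals[g] for g in totals}


def audit(decisions, min_ratio=0.8):
    """Compare each group's rate to the highest rate; flag ratios below min_ratio."""
    rates = selection_rates(decisions)
    best = max(rates.values())
    if best == 0:
        # Nobody was approved, so there is no disparity in selection rates to report
        return {g: (1.0, False) for g in rates}
    return {g: (r / best, r / best < min_ratio) for g, r in rates.items()}


# Hypothetical decision log kept for auditing: (group, approved?)
log = ([("A", True)] * 6 + [("A", False)] * 4
       + [("B", True)] * 3 + [("B", False)] * 7)

for group, (ratio, flagged) in audit(log).items():
    print(f"group {group}: ratio {ratio:.2f}, flagged={flagged}")
# group A: ratio 1.00, flagged=False
# group B: ratio 0.50, flagged=True
```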

Difficulties that make a solution hard
  • Limited to the data you actually have. What about the data you don't have?
  • Definitions of fairness vary greatly: which one do you use?
  • Lack of social context: a machine learning algorithm can't simply be transferred from one context to another
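
To make "which definition do you use?" concrete, this sketch scores two classifiers on hypothetical data under two commonly used fairness definitions: demographic parity (equal selection rates across groups) and equal opportunity (equal true positive rates across groups). The labels, predictions, and helper names are assumptions chosen so that the two groups have different rates of qualified people.

```python
def group_rates(labels, preds):
    """Return (selection rate, true positive rate) for one group."""
    selection = sum(preds) / len(preds)
    preds_for_positives = [p for y, p in zip(labels, preds) if y == 1]
    tpr = sum(preds_for_positives) / len(preds_for_positives)
    return selection, tpr


def fairness_gaps(groups):
    """groups: {name: (labels, preds)}. Returns (parity gap, equal-opportunity gap)."""
    stats = [group_rates(y, p) for y, p in groups.values()]
    selections = [s for s, _ in stats]
    tprs = [t for _, t in stats]
    return max(selections) - min(selections), max(tprs) - min(tprs)


# Hypothetical labels (1 = qualified): the groups have different base rates
labels_a = [1] * 6 + [0] * 4
labels_b = [1] * 2 + [0] * 8

# Classifier 1: perfectly accurate predictions
perfect = {"A": (labels_a, labels_a), "B": (labels_b, labels_b)}
# Classifier 2: forced to select 4 people from each group
equal_selection = {
    "A": (labels_a, [1] * 4 + [0] * 6),  # 4 of the 6 qualified
    "B": (labels_b, [1] * 4 + [0] * 6),  # both qualified + 2 unqualified
}

for name, groups in [("perfect", perfect), ("equal selection", equal_selection)]:
    parity_gap, eo_gap = fairness_gaps(groups)
    print(f"{name}: parity gap {parity_gap:.2f}, equal-opportunity gap {eo_gap:.2f}")
# perfect: parity gap 0.40, equal-opportunity gap 0.00
# equal selection: parity gap 0.00, equal-opportunity gap 0.33
```

In this toy data neither classifier satisfies both definitions at once, so picking a definition is a choice about which kind of unfairness to accept.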