ML: Alice was excited! Lots of tutorials, loads of resources - PowerPoint PPT Presentation



slide-1
SLIDE 1
slide-2
SLIDE 2

ML

slide-3
SLIDE 3
slide-4
SLIDE 4
slide-5
SLIDE 5

ML

Alice was excited!
Lots of tutorials
Loads of resources
Endless examples
Fast-paced research

slide-6
SLIDE 6

How to even data science?

slide-7
SLIDE 7

How to even data science?

https://miro.medium.com/max/1552/1*Nv2NNALuokZEcV6hYEHdGA.png

slide-8
SLIDE 8

How to make this work in the real world?

Challenge

slide-9
SLIDE 9

A Checklist for Developers when Building ML Systems

Machine Learning’s Surprises

slide-10
SLIDE 10

Hi, I’m Jade Abbott

masakhane.io

@alienelf

slide-11
SLIDE 11

Hi, I’m Jade Abbott

slide-12
SLIDE 12
slide-13
SLIDE 13

Surprises while...

Trying to deploy the model
After deployment of model
Trying to improve the model

slide-14
SLIDE 14

Some context

❖ I won’t be talking about training machine learning models
❖ I won’t be talking about which models to choose
❖ I work primarily in deep learning & NLP
❖ I am a one-person ML team working in a startup context
❖ I work in a normal world where data is scarce and we need to collect more

slide-15
SLIDE 15

The Problem

I want to meet...
I can provide...
Yes, they should meet
No, they shouldn’t

Embedding + LSTM + Downstream NN

slide-16
SLIDE 16

The Problem

I want to meet... someone to look after my cat
I can provide... pet sitting, cat breeding, software development, chef lessons
Yes, they should meet
No, they shouldn’t

Language Model + Downstream Task

slide-17
SLIDE 17

The Problem

I want to meet... someone to look after my cat
I can provide... pet sitting, cat breeding, software development, chef lessons
Yes, they should meet
No, they shouldn’t

slide-18
SLIDE 18

The Problem

I want to meet... someone to look after my cat
I can provide... pet sitting, cat breeding, software development, chef lessons
Yes, they should meet
No, they shouldn’t

The Model

slide-19
SLIDE 19

Surprises trying to deploy the model

Surprises

slide-20
SLIDE 20

Expectations

model API

Unit Tests

CI/CD

user testing

train & evaluate model

slide-21
SLIDE 21

Is the model good enough?

Surprise #1

slide-22
SLIDE 22

75% Accuracy

slide-23
SLIDE 23

Performance Metrics

❖ Business needs to understand it
❖ Active discussion about pros & cons
❖ Get sign-off
❖ Threshold selection strategy
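
One possible threshold selection strategy, as a minimal sketch: sweep candidate thresholds over validation scores and keep the one with the best recall among those that meet a precision floor agreed with the business. The function names, the `min_precision` parameter, and the choice of precision and recall as the metrics are assumptions for illustration, not something the slides prescribe.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def pick_threshold(scores, labels, min_precision):
    """Return (threshold, precision, recall) with the highest recall among
    thresholds whose precision is at least min_precision, or None."""
    best = None
    for threshold in sorted(set(scores)):
        precision, recall = precision_recall(scores, labels, threshold)
        if precision >= min_precision and (best is None or recall > best[2]):
            best = (threshold, precision, recall)
    return best
```

Calling `pick_threshold(validation_scores, validation_labels, min_precision=0.9)` returns `None` when no threshold meets the floor, which is itself a useful signal for the sign-off discussion.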

slide-24
SLIDE 24

Can we trust it?

Surprise #2

slide-25
SLIDE 25

Skin Cancer Detection
Husky/Dog Classifier

1. https://visualsonline.cancer.gov/details.cfm?imageid=9288
2. https://arxiv.org/pdf/1602.04938.pdf

slide-26
SLIDE 26

Skin Cancer Detection
Husky/Dog Classifier

1. https://visualsonline.cancer.gov/details.cfm?imageid=9288
2. https://arxiv.org/pdf/1602.04938.pdf

slide-27
SLIDE 27

Explanations

slide-28
SLIDE 28

https://pair-code.github.io/what-if-tool/
https://github.com/marcotcr/lime

slide-29
SLIDE 29

Will this model harm users?

Surprise #3

slide-30
SLIDE 30

“Racial bias in a medical algorithm favors white patients over sicker black patients”

~ Washington Post
slide-31
SLIDE 31

“Racist robots, as I invoke them here, represent a much broader process: social bias embedded in technical artifacts, the allure of objectivity without public accountability”

~ Ruha Benjamin @ruha9

slide-32
SLIDE 32

“What are the unintended consequences of designing systems at scale on the basis of existing patterns of society?”

~ M.C. Elish & Danah Boyd, Don’t Believe Every AI You See @m_c_elish @zephoria

slide-33
SLIDE 33

❖ Word2Vec has known gender and race biases
❖ It’s in English
❖ Is it robust to spelling errors?
❖ How does it perform with malicious data?

slide-34
SLIDE 34

❖ Word2Vec has known gender and race biases
❖ It’s in English
❖ Is it robust to spelling errors?
❖ How does it perform with malicious data?

Make it measurable!
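
One way to make such a concern measurable, sketched under assumptions: slice the evaluation set into groups (for example clean text versus text with spelling errors, or one demographic group versus another), compute the same metric per group, and track the gap. The group names and the choice of accuracy as the metric are illustrative only; the toolkits linked on the next slide provide far more complete fairness metrics.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """records: iterable of (group, prediction, target) tuples.
    Returns a dict mapping each group to its accuracy."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, prediction, target in records:
        total[group] += 1
        correct[group] += int(prediction == target)
    return {group: correct[group] / total[group] for group in total}


def largest_gap(per_group):
    """Difference between the best and the worst group accuracy."""
    values = list(per_group.values())
    return max(values) - min(values)
```

A gap that grows between model versions can then be treated like any other regression, instead of remaining a vague worry.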

slide-35
SLIDE 35

http://aif360.mybluemix.net
https://pair-code.github.io
https://github.com/fairlearn/fairlearn
https://github.com/jphall663/awesome-machine-learning-interpretability

slide-36
SLIDE 36

Expectations

model API

Unit Tests

CI/CD

user testing

train & evaluate model

slide-37
SLIDE 37

Reality

model API

Unit Tests

user testing

Evaluate model
Choose threshold
Explain predictions
Fairness Framework
Choose a useful metric

slide-38
SLIDE 38

Surprises after deploying the model

Surprises

slide-39
SLIDE 39

Expectations

user testing
logs a bug or submits a complaint
user drop-off
bug tracking tool
Bug Triage
agile cycle
reproduce, debug, fix, release

slide-40
SLIDE 40

Surprise #5

I can provide marijuana and other drugs which improves health

I want to meet a doctor

slide-41
SLIDE 41

The model has some “bugs”

Surprise #5

slide-42
SLIDE 42

❖ What is a model “bug”?
❖ How to fix the bug?
❖ When is the “bug” fixed?
❖ How do I test for regressions?
❖ “Bug” priority?

Surprise #5 continued...

slide-43
SLIDE 43

Surprise #5

I can provide marijuana and other drugs which improves health

I want to meet a doctor

slide-44
SLIDE 44

Describing the “bugs”

I can provide | I want to meet | Prediction | Target | Outcome
marijuana and other drugs which improves health | a doctor | YES | NO | False Positive
marijuana | a doctor | NO | NO | True Negative
drugs for cancer patients | a doctor | YES | NO | False Positive
general practitioner services | a doctor | NO | YES | False Negative
medicine | a drug addiction sponsor | YES | YES | True Positive
medicine | a pharmacist | YES | YES | True Positive
illegal drugs | a drug dealer | YES | NO | False Positive

Add to your test set
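
The Prediction and Target values on this slide map directly onto confusion-matrix cells, so a "bug" description can be stored as plain test patterns. A minimal sketch follows; the tuple layout and the function name are assumptions for illustration, while the rows themselves are copied from the slide.

```python
def outcome(prediction, target):
    """Name the confusion-matrix cell for one YES/NO prediction."""
    if prediction == "YES":
        return "true positive" if target == "YES" else "false positive"
    return "false negative" if target == "YES" else "true negative"


# (I can provide, I want to meet, prediction, target), as listed on the slide.
PATTERNS = [
    ("marijuana and other drugs which improves health", "a doctor", "YES", "NO"),
    ("marijuana", "a doctor", "NO", "NO"),
    ("drugs for cancer patients", "a doctor", "YES", "NO"),
    ("general practitioner services", "a doctor", "NO", "YES"),
    ("medicine", "a drug addiction sponsor", "YES", "YES"),
    ("medicine", "a pharmacist", "YES", "YES"),
    ("illegal drugs", "a drug dealer", "YES", "NO"),
]

# One outcome label per pattern, in the same order as PATTERNS.
OUTCOMES = [outcome(prediction, target) for _, _, prediction, target in PATTERNS]
```

Keeping the correctly predicted rows alongside the failures matters: they guard against a fix that simply flips every drug-related pair to NO.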

slide-45
SLIDE 45
slide-46
SLIDE 46
slide-47
SLIDE 47

Is my “bug” fixed?

[Chart: Classification Error for each Candidate Model Over Time, one line per problem: drugs-doctors-false-pos, politicians-false-neg, tech-too-general, designers-too-general]
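
Tracking classification error per problem can be sketched as follows, assuming each test pattern carries a problem tag and that `predict` is whatever callable wraps the candidate model. Both the tuple layout and the `predict` interface are hypothetical.

```python
from collections import defaultdict


def error_by_problem(test_patterns, predict):
    """test_patterns: list of (problem_tag, inputs, target) tuples.
    predict: callable mapping inputs to a label.
    Returns a dict mapping each problem tag to its classification error."""
    wrong = defaultdict(int)
    total = defaultdict(int)
    for tag, inputs, target in test_patterns:
        total[tag] += 1
        wrong[tag] += int(predict(inputs) != target)
    return {tag: wrong[tag] / total[tag] for tag in total}
```

Running this for every candidate model and storing the result per model version gives the per-problem error curves; a "bug" can be called fixed once its curve drops below an agreed level and stays there.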

slide-48
SLIDE 48

How do we triage these “bugs”?

slide-49
SLIDE 49

How do we triage these “bugs”?

% Users Affected x Normalized Error x Harm
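
The triage formula is a straightforward product, so ranking problems takes only a few lines. In this sketch the scale of each factor is an assumption: a fraction between 0 and 1 for users affected and for normalized error, and a team-defined weight for harm.

```python
def priority(users_affected, normalized_error, harm):
    """Priority score: % users affected x normalized error x harm."""
    return users_affected * normalized_error * harm


def triage(problems):
    """problems: dict mapping a problem name to a tuple of
    (users_affected, normalized_error, harm).
    Returns the problem names ordered from highest to lowest priority."""
    return sorted(problems, key=lambda name: priority(*problems[name]), reverse=True)
```

Because harm is a multiplier, a rare but harmful problem can outrank a common but harmless one, which appears to be the intent of including it.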

slide-50
SLIDE 50

How do we triage these “bugs”?

Problem | Impact Error
the-arts-too-general | 2.931529
health-more-specific | 1.53985
brand-marketing-social-media | 1.285735
developer | 1.054248
1-services | 0.960129

slide-51
SLIDE 51

Is this new model better than my old model?

Surprise #6

slide-52
SLIDE 52

Alice replied, rather

shyly, “I—I hardly know, sir, just at present—at least I know who I was when I got up this morning, but I think I must have changed several times since then.”

slide-53
SLIDE 53

Why is model comparison hard?

slide-54
SLIDE 54

Living Test Set

0.8 0.75

slide-55
SLIDE 55

Re-evaluate ALL models

0.72 0.75
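
Because the test set keeps growing, a score recorded last month is not comparable with one recorded today. A minimal sketch of re-evaluating every stored model against the current test set follows; the `models` mapping of names to prediction callables is a hypothetical interface.

```python
def accuracy(predict, test_set):
    """Fraction of (inputs, target) pairs that predict labels correctly."""
    hits = sum(1 for inputs, target in test_set if predict(inputs) == target)
    return hits / len(test_set)


def reevaluate_all(models, test_set):
    """models: dict mapping a model name to its predict callable.
    Scores every model on the same, current version of the test set."""
    return {name: accuracy(predict, test_set) for name, predict in models.items()}
```

Only scores produced by the same call, on the same test set version, should be compared with each other.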

slide-56
SLIDE 56

Surprise #7

I demoed the model yesterday and it went off-script! What changed?

slide-57
SLIDE 57

Why is the model doing something differently today?

Surprise #7

slide-58
SLIDE 58

What changed?

❖ My data? ❖ My model? ❖ My preprocessing?

slide-59
SLIDE 59

How to figure out what changed?

Data Repository Code repository

ea2541df da1341bb

Experiment Metadata Store

experiment: 3
data: ea2541df
code: da1341bb
desc: “Added feature to training pipeline”
run_on: 10-10-2019
completed_on: 11-10-2019
model: model-3
results: 3

CI/CD Model Repository

model-3

Results Repository
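
The experiment metadata record can be sketched as a small immutable data class, with a helper that answers "what changed?" by comparing the data and code revisions of two runs. The field names follow the slide; the `Experiment` class, the `changed_inputs` helper, and every value in the `previous` record except the data revision are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    experiment: int
    data: str          # revision in the data repository
    code: str          # revision in the code repository
    desc: str
    run_on: str
    completed_on: str
    model: str         # key into the model repository
    results: int       # key into the results repository


def changed_inputs(previous, current, fields=("data", "code")):
    """Names of the tracked inputs that differ between two experiments."""
    return [name for name in fields if getattr(previous, name) != getattr(current, name)]


# The record shown on the slide.
current = Experiment(3, "ea2541df", "da1341bb", "Added feature to training pipeline",
                     "10-10-2019", "11-10-2019", "model-3", 3)
# An invented earlier run that used the same data revision but older code.
previous = Experiment(2, "ea2541df", "0f0f0f0f", "Hypothetical earlier run",
                      "08-10-2019", "09-10-2019", "model-2", 2)
```

Here `changed_inputs(previous, current)` reports that only the code revision moved, which narrows the search when a demo behaves differently from one day to the next.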

slide-60
SLIDE 60

Expectations

user testing
logs a bug or submits a complaint
user drop-off
bug tracking tool
Prioritization
agile cycle
reproduce, debug, fix

slide-61
SLIDE 61

Actual

user reports bug
Identify problem
Triage
Add to model bug tracking tool
Calculate Priority
Retrain
Describe problem with test patterns

  • Evaluate model against other models
  • Evaluate individual problems
  • Select model
  • Gather More Data for Problem
  • Change Model
  • Create Features

“Agile Sprint”
Pick Problem

slide-62
SLIDE 62

Surprises maintaining and improving the model over time

Surprises

slide-63
SLIDE 63

Expectation

Generate/select unlabelled patterns
Get them labelled
Add to data set
Pick an issue
Retrain

slide-64
SLIDE 64

User behaviour drifts

Surprise #8

slide-65
SLIDE 65

Now what?

  • Regularly sample data from production for training
  • Regularly refresh your test set

slide-66
SLIDE 66

Data labellers are rarely experts

Surprise #9

slide-67
SLIDE 67

The model is not robust

Surprise #10

slide-68
SLIDE 68

The model knows when it’s uncertain

Surprise #10

slide-69
SLIDE 69

Techniques for detecting robustness & uncertainty

❖ Softmax predictions that are uncertain
❖ Dropout at Inference
❖ Add noise to data and see how much output changes
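
Two of these techniques can be sketched without a deep learning framework: flagging uncertain softmax outputs, and measuring how often a prediction flips when noise is added to the input. The confidence cut-off, the typo-style perturbation, and the `predict` interface are all assumptions. Dropout at inference depends on the model framework and is left out.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means a less certain prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def is_uncertain(probs, min_confidence=0.6):
    """Flag a prediction whose top probability is below min_confidence."""
    return max(probs) < min_confidence


def swap_adjacent(text, rng):
    """Simulate a typo by swapping two neighbouring characters."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]


def flip_rate(predict, text, perturb=swap_adjacent, trials=20, seed=0):
    """Fraction of perturbed copies of text whose label differs from the original's."""
    rng = random.Random(seed)
    base = predict(text)
    flips = sum(1 for _ in range(trials) if predict(perturb(text, rng)) != base)
    return flips / trials
```

Patterns that are flagged as uncertain, or that have a high flip rate, are natural candidates to send for labelling first.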

slide-70
SLIDE 70

Changing and updating the data so often gets messy

Surprise #11

slide-71
SLIDE 71

Needed to check the following

  • Data Leakage
  • Duplicates
  • Distributions
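
The three checks can be written as small tests over the dataset, suitable for running in CI whenever the data changes. This sketch assumes each row is a hashable tuple such as (I can provide, I want to meet) and that labels are kept in a separate list; both layouts are illustrative.

```python
from collections import Counter


def find_duplicates(rows):
    """Rows that occur more than once in a single dataset split."""
    counts = Counter(rows)
    return [row for row, n in counts.items() if n > 1]


def find_leakage(train_rows, test_rows):
    """Rows that appear in both the training and the test split."""
    return sorted(set(train_rows) & set(test_rows))


def label_distribution(labels):
    """Share of each label, for comparing splits or dataset versions."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}
```

A data change would then be rejected when `find_duplicates` or `find_leakage` returns anything, or when the label distribution shifts further than the team has agreed to tolerate.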

slide-72
SLIDE 72

Expectation

Generate/select unlabelled patterns
Get them labelled
Add to data set
Pick an issue
Retrain

slide-73
SLIDE 73

Actual

Generate/select unlabelled data
Get data labelled on crowdsourced platform
Pick Problem
Model tells you which patterns it’s uncertain about
Expert data label platform
Reject
Review sample from each data labeller
New data!
Escalate conflicting data labels

CI/CD
Data Version Control
Add to branch of dataset
Runs tests on data

Data Version Control
Merge into dataset
Approve
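
The step of escalating conflicting labels can be sketched as an agreement check over the labels collected for each pattern. The `votes` layout and the `min_agreement` parameter are assumptions; with the default of 1.0, any disagreement between labellers is escalated.

```python
from collections import Counter


def resolve_labels(votes, min_agreement=1.0):
    """votes: dict mapping a pattern id to the list of labels it received.
    Returns (accepted, escalated): accepted maps a pattern id to its agreed
    label, escalated lists the pattern ids that need expert review."""
    accepted, escalated = {}, []
    for pattern_id, labels in votes.items():
        label, count = Counter(labels).most_common(1)[0]
        if count / len(labels) >= min_agreement:
            accepted[pattern_id] = label
        else:
            escalated.append(pattern_id)
    return accepted, escalated
```

Accepted labels would go onto a branch of the dataset for the data tests, while escalated patterns go to the expert labelling platform.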

slide-74
SLIDE 74

The Checklist

First Release

Careful metric selection
Threshold selection strategy
Explain Predictions
Fairness Framework

slide-75
SLIDE 75

The Checklist

After First Release

ML Problem Tracker
Problem Triage Strategy
Reproducible Training
Comparable Results
Result Management
Be able to answer why

slide-76
SLIDE 76

The Checklist

Long term improvements & maintenance

Data refresh strategy
Data Version Control
CI/CD or Metrics for Data
Data Labeller Platform + Strategy
Robustness & Uncertainty

slide-77
SLIDE 77

Things I didn’t cover

Pipelines & Orchestration: Kubeflow, MLflow
End-to-end Products: TFX, SageMaker, Azure ML
Unit Testing ML systems: “Testing your ML pipelines” by Kristina Georgieva
Debugging ML models: “A field guide to fixing your neural network model” by Josh Tobin
Privacy: Google’s Federated Learning
Hyperparameter optimization: So many!

slide-78
SLIDE 78

The End

@alienelf
ja@retrorabbit.co.za
https://retrorabbit.co
https://kalido.me
https://masakhane.io