Alice was excited! Lots of tutorials. Loads of resources. Endless examples. Fast paced research.
How to even data science?
How to make this work in the real world?
Challenge
A Checklist for Developers when Building ML Systems
Machine Learning’s Surprises
Hi, I’m Jade Abbott
masakhane.io
@alienelf
Surprises while...
Trying to deploy the model
After deployment of the model
Trying to improve the model
Some context
❖ I won’t be talking about training machine learning models
❖ I won’t be talking about which models to choose
❖ I work primarily in deep learning & NLP
❖ I am a one-person ML team working in a startup context
❖ I work in a normal world where data is scarce and we need to collect more
The Problem
I want to meet... | I can provide... → Yes, they should meet / No they shouldn’t
Embedding + LSTM + Downstream NN
The Problem
I want to meet... someone to look after my cat
I can provide... pet sitting | cat breeding | software development | chef lessons
→ Yes, they should meet / No they shouldn’t
Language Model + Downstream Task
The Model
Surprises trying to deploy the model
Surprises
Expectations
model API
Unit Tests
CI/CD
user testing
train & evaluate model
Is the model good enough?
Surprise #1
75% Accuracy
Performance Metrics
❖ Business needs to understand it
❖ Active discussion about pros & cons
❖ Get sign off
❖ Threshold selection strategy
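A threshold selection strategy can be made concrete in a few lines. This is a minimal sketch, assuming the business has signed off on a minimum precision floor; the function name and the 0.9 default are illustrative, not from the talk.

```python
def choose_threshold(scores, labels, min_precision=0.9):
    """Pick the lowest threshold whose precision meets the agreed floor.
    The lowest qualifying threshold maximizes recall subject to that
    constraint (one hypothetical strategy among many)."""
    for t in sorted(set(scores)):
        preds = [s >= t for s in scores]
        tp = sum(p and y for p, y in zip(preds, labels))
        fp = sum(p and not y for p, y in zip(preds, labels))
        if tp + fp == 0:
            continue
        if tp / (tp + fp) >= min_precision:
            return t
    return None
```

Other strategies (maximizing F1, fixing a false-positive budget) drop in the same way; the point is that the choice is written down and reviewable.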
Can we trust it?
Surprise #2
Skin Cancer Detection
1. https://visualsonline.cancer.gov/details.cfm?imageid=9288 2. https://arxiv.org/pdf/1602.04938.pdf
Husky/Dog Classifier
Explanations
https://pair-code.github.io/what-if-tool/ https://github.com/marcotcr/lime
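The linked What-If Tool and LIME are the real options here. As a toy illustration of the perturbation idea behind LIME, this sketch ranks words by how much a scoring function drops when each word is removed; `toy_score` is a made-up stand-in for a real classifier.

```python
def explain(text, score_fn):
    """Rank words by how much the model score drops when each one is
    removed: a crude leave-one-out cousin of LIME's perturbation approach."""
    words = text.split()
    base = score_fn(text)
    importance = {}
    for i, w in enumerate(words):
        perturbed = " ".join(words[:i] + words[i + 1:])
        importance[w] = base - score_fn(perturbed)
    return sorted(importance.items(), key=lambda kv: -kv[1])

def toy_score(text):
    """Hypothetical scorer: counts pet-related words (not a real model)."""
    return sum(w in {"cat", "pet"} for w in text.split()) / 5.0
```

Real tools perturb many samples and fit a local surrogate model, but the core question is the same: which inputs is the prediction actually leaning on?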
Will this model harm users?
Surprise #3
“Racial bias in a medical algorithm favors white patients over sicker black patients” ~ Washington Post
“Racist robots, as I invoke them here, represent a much broader process: social bias embedded in technical artifacts, the allure of objectivity without public accountability” ~ Ruha Benjamin @ruha9
“What are the unintended consequences of designing systems at scale on the basis of existing patterns of society?”
~ M.C. Eilish & Danah Boyd, Don’t Believe Every AI You See @m_c_elish @zephoria
❖ Word2Vec has known gender and race biases
❖ It’s in English
❖ Is it robust to spelling errors?
❖ How does it perform with malicious data?
Make it measurable!
http://aif360.mybluemix.net
https://pair-code.github.io
https://github.com/fairlearn/fairlearn
https://github.com/jphall663/awesome-machine-learning-interpretability
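“Make it measurable” can start very simply before reaching for AIF360 or fairlearn. A minimal sketch of one common metric, demographic parity difference, assuming binary predictions and a group label per example:

```python
def demographic_parity_gap(preds, groups):
    """Gap between the highest and lowest positive-prediction rates
    across groups. Zero means equal rates; a large gap is a measurable
    red flag worth investigating, not a verdict by itself."""
    rates = {}
    for g in set(groups):
        members = [p for p, gg in zip(preds, groups) if gg == g]
        rates[g] = sum(members) / len(members)
    vals = sorted(rates.values())
    return vals[-1] - vals[0]
```

Demographic parity is only one notion of fairness; equalized odds and others can disagree with it, which is exactly why the libraries above are worth the dependency.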
Expectations
model API
Unit Tests
CI/CD
user testing
train & evaluate model
Reality
model API
Unit Tests
user testing
choose a useful metric
Evaluate model
Choose threshold
Explain predictions
Fairness Framework
Surprises after deploying the model
Surprises

Expectations
user testing → logs a bug or submits a complaint (or user drop off) → bug tracking tool → Bug Triage → agile cycle: reproduce, debug, fix, release
Surprise #5 I can provide marijuana and other drugs which improves health
I want to meet a doctor
The model has some “bugs”
Surprise #5
❖ What is a model “bug”?
❖ How to fix the bug?
❖ When is the “bug” fixed?
❖ How do I ensure test regression?
❖ “Bug” priority?
Surprise #5 continued...
Describing the “bugs”
I can provide...                                               | I want to meet...                       | Prediction | Target
I can provide marijuana and other drugs which improves health  | I want to meet a doctor                 | YES        | NO
I can provide marijuana                                        | I want to meet a doctor                 | NO         | NO
I can provide drugs for cancer patients                        | I want to meet a doctor                 | YES        | NO
I can provide general practitioner services                    | I want to meet a doctor                 | NO         | YES
I can provide medicine                                         | I want to meet a drug addiction sponsor | YES        | YES
I can provide medicine                                         | I want to meet a pharmacist             | YES        | YES
I can provide illegal drugs                                    | I want to meet a drug dealer            | YES        | NO
False Positives / False Negatives / True Negatives / True Positives → Add to your test set
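Sorting flagged examples into confusion-matrix buckets, so that each kind of failure becomes a named regression test set, can be sketched like this (row format of text, prediction, target is an assumption):

```python
def bucket(rows):
    """Split (text, prediction, target) rows into confusion-matrix
    buckets, so false positives/negatives can be saved as named
    regression test sets like drugs-doctors-false-pos."""
    buckets = {"tp": [], "fp": [], "tn": [], "fn": []}
    for text, pred, target in rows:
        key = ("t" if pred == target else "f") + ("p" if pred else "n")
        buckets[key].append(text)
    return buckets
```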
[Chart: Classification Error of candidate models over time, per problem test set: drugs-doctors-false-pos, politicians-false-neg, tech-too-general, designers-too-general]
Is my “bug” fixed?
How do we triage these “bugs”?
% Users Affected x Normalized Error x Harm
Problem                      | Impact Error
the-arts-too-general         | 2.931529
health-more-specific         | 1.53985
brand-marketing-social-media | 1.285735
developer                    | 1.054248
1-services                   | 0.960129
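The priority formula can be turned into a tiny ranking helper. The input numbers below are illustrative stand-ins, not the real values behind the table:

```python
def impact(users_affected_pct, normalized_error, harm):
    """Triage score from the slide's formula:
    % Users Affected x Normalized Error x Harm (higher = fix sooner)."""
    return users_affected_pct * normalized_error * harm

# Hypothetical inputs for two problems (made up for illustration):
problems = {
    "health-more-specific": impact(0.10, 0.7, 22),  # rare but high harm
    "tech-too-general": impact(0.30, 0.5, 2),       # common but low harm
}
ranked = sorted(problems, key=problems.get, reverse=True)
```

Weighting by harm is what lets a rare-but-dangerous bug outrank a common-but-benign one.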
Is this new model better than my old model?
Surprise #6
Alice replied, rather shyly, “I—I hardly know, sir, just at present—at least I know who I was when I got up this morning, but I think I must have changed several times since then.”
Why is model comparison hard?
Living Test Set
0.8 0.75
Re-evaluate ALL models
0.72 0.75
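Re-evaluating every candidate on the current living test set, rather than trusting cached scores, might look like this sketch (models as plain callables and accuracy as the metric are both assumptions):

```python
def compare(models, test_set):
    """With a living test set, yesterday's cached score is stale:
    re-evaluate every candidate on the *current* test set, then compare.
    `models` maps name -> callable; `test_set` is (input, label) pairs."""
    def accuracy(model, data):
        return sum(model(x) == y for x, y in data) / len(data)
    scores = {name: accuracy(m, test_set) for name, m in models.items()}
    return max(scores, key=scores.get), scores
```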
Surprise #7 I demoed the model yesterday and it went off-script! What changed?
Why is the model doing something differently today?
Surprise #7
What changed?
❖ My data?
❖ My model?
❖ My preprocessing?
How to figure out what changed?
Data Repository Code repository
ea2541df da1341bb
Experiment Metadata Store
experiment: 3
data: ea2541df
code: da1341bb
desc: “Added feature to training pipeline”
run_on: 10-10-2019
completed_on: 11-10-2019
model: model-3
results: 3
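A metadata store like the record above can start as a plain list of dicts pinning the exact data and code versions per run; the second run’s code hash below is made up for illustration.

```python
def record_experiment(store, data_hash, code_hash, desc, model, results):
    """Append one run to an in-memory metadata store keyed by the exact
    data and code versions, so any result can be traced and reproduced."""
    entry = {
        "experiment": len(store) + 1,
        "data": data_hash,
        "code": code_hash,
        "desc": desc,
        "model": model,
        "results": results,
    }
    store.append(entry)
    return entry

def why_different(store, a, b):
    """Diff two runs' pinned versions to answer 'what changed?'."""
    ea, eb = store[a - 1], store[b - 1]
    return {k: (ea[k], eb[k]) for k in ("data", "code") if ea[k] != eb[k]}
```

In practice this would live in a file or a tool like MLFlow, but even a JSON log per run answers “what changed between yesterday’s demo and today’s?”.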
CI/CD Model Repository
model-3
Results Repository
Expectations
user testing → logs a bug or submits a complaint (or user drop off) → bug tracking tool → Prioritization → agile cycle: reproduce, debug, fix
Actual
user reports bug → Identify problem → Describe problem with test patterns → Add to model bug tracking tool → Calculate Priority → Triage → Retrain
“Agile Sprint”: Pick Problem
- Gather More Data for Problem
- Change Model
- Create Features
- Evaluate model against other models
- Evaluate individual problems
- Select model
Surprises maintaining and improving the model over time
Surprises
Expectation
Generate/select unlabelled patterns → Get them labelled → Add to data set → Pick an issue → Retrain
User behaviour drifts
Surprise #8
- Regularly sample data from production for training
- Regularly refresh your test set
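One way to regularly sample data from production is reservoir sampling, which keeps a uniform random sample from a stream of unknown length. A minimal sketch (the function name and fixed seed are illustrative):

```python
import random

def refresh_sample(production_stream, k, seed=0):
    """Reservoir-sample k items from a production stream, so the
    training and test data keep tracking real user behaviour as it
    drifts. Each stream item ends up in the sample with equal probability."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(production_stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir
```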
Now what?
Data labellers are rarely experts
Surprise #9
The model is not robust
Surprise #10
The model knows when it’s uncertain
Surprise #10
Techniques for detecting robustness & uncertainty
❖ Softmax predictions that are uncertain
❖ Dropout at inference
❖ Add noise to data and see how much the output changes
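The first technique, flagging uncertain softmax predictions, can be sketched with predictive entropy; the 0.5 threshold is an arbitrary assumption to be tuned per model.

```python
import math

def softmax(logits):
    """Stable softmax over a list of raw scores."""
    exps = [math.exp(l - max(logits)) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def is_uncertain(logits, entropy_threshold=0.5):
    """Flag predictions whose softmax distribution is high-entropy.
    The threshold is a hypothetical default, not a recommendation."""
    probs = softmax(logits)
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy > entropy_threshold
```

Uncertain examples are exactly the ones worth routing to labellers; dropout at inference (MC dropout) gives a second, complementary uncertainty signal.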
Changing and updating the data so often gets messy
Surprise #11
Needed to check the following:
- Data Leakage
- Duplicates
- Distributions
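The checks in this list can run as plain assertions in CI before new data is merged. A sketch, assuming hashable examples; distribution comparison is left as a note:

```python
def check_dataset(train, test):
    """Pre-merge data checks: duplicates within the training set, and
    leakage of test examples into train. A real pipeline would also
    compare distributions (label balance, text length) between versions."""
    issues = []
    if len(set(train)) != len(train):
        issues.append("duplicates")
    if set(train) & set(test):
        issues.append("leakage")
    return issues
```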
Expectation
Generate/select unlabelled patterns → Get them labelled → Add to data set → Pick an issue → Retrain
Actual
Generate/select unlabelled data → Get data labelled on crowdsourced platform
Pick Problem → Model tells you which patterns it’s uncertain about
Expert data label platform → Review sample from each data labeller → Reject / Escalate conflicting data labels
New data!
CI/CD + Data Version Control
Add to branch of dataset → Runs tests on data → Approve → Merge into dataset
The Checklist
First Release
- Careful metric selection
- Threshold selection strategy
- Explain Predictions
- Fairness Framework
The Checklist
After First Release
- ML Problem Tracker
- Problem Triage Strategy
- Reproducible Training
- Comparable Results
- Result Management
- Be able to answer why
The Checklist
Long term improvements & maintenance
- Data refresh strategy
- Data Version Control
- CI/CD or Metrics for Data
- Data Labeller Platform + Strategy
- Robustness & Uncertainty
Things I didn’t cover
Pipelines & Orchestration: Kubeflow, MLFlow
End-to-end Products: TFX, SageMaker, Azure ML
Unit Testing ML systems: “Testing your ML pipelines” by Kristina Georgieva
Debugging ML models: “A field guide to fixing your neural network model” by Josh Tobin
Privacy: Google’s Federated Learning
Hyperparameter optimization
So many!
The End
@alienelf ja@retrorabbit.co.za https://retrorabbit.co https://kalido.me https://masakhane.io