Scaling Data Products Under Startup Constraints: A Case Study of ML Bias Testing




SLIDE 1

Scaling Data Products Under Startup Constraints

A Case Study of ML Bias Testing

SLIDE 2

Scaling Data Products Under Startup Constraints

A Case Study of ML Bias Testing

SLIDE 3

Edwin Ong (@edwin)
Co-Founder, TinyData

  • Founded CastTV (acquired by Tribune)
  • Founded FileFish (acquired by Oracle)
  • Stanford Symbolic Systems

SLIDE 4

TinyData

  • Help other companies make data products
  • Make our own data products
SLIDE 5

Problem: Testing Machine Learning in Production

  • Tools for machine learning testing in training
  • Not as many tools for machine learning testing in production
  • Different tools needed because ML testing is different from traditional software testing

SLIDE 6

Traditional Software Has Deterministic Outcomes

SLIDE 7

Traditional Software Has Deterministic Outcomes

SLIDE 8

ML Has Probabilistic Outcomes

Dog vs Muffin given new user input

SLIDE 9

ML Has Probabilistic Outcomes That Change Over Time

Version 1: Muffin (59%)
Version 2: Muffin (66%)
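
A minimal sketch of what this drift means for a test, assuming each model version returns a label and a confidence score. The `Prediction` type, the `compare_versions` helper, and the 0.05 tolerance are illustrative assumptions, not part of the talk; only the two Muffin scores come from the slide.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    label: str
    confidence: float  # probability in [0, 1]


def compare_versions(old: Prediction, new: Prediction, tolerance: float = 0.05) -> str:
    """Classify how a prediction changed between two model versions."""
    if old.label != new.label:
        return "label_changed"
    if abs(new.confidence - old.confidence) > tolerance:
        return "confidence_shifted"
    return "stable"


# Scores from the slide: same label, different confidence.
v1 = Prediction("Muffin", 0.59)
v2 = Prediction("Muffin", 0.66)
print(compare_versions(v1, v2))  # confidence_shifted
```

An exact-equality assertion, the usual tool in traditional software testing, would flag every redeploy here; a tolerance separates small score movement from an actual label change.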

SLIDE 10

ML Platforms Often End at Deploy

New User Input

Production Testing ML Chaos Engineering

SLIDE 11

Requirements for Production ML Testing Tool

  • 1. “Entropy”: Generation of new inputs against model servers
  • 2. Recording of outputs from model servers
  • 3. Feedback loop for additional training
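
The three requirements can be sketched as three small functions. This is a rough outline under stated assumptions, not the tool described in the talk: `generate_inputs`, `record_outputs`, `select_for_feedback`, and the `fake_model_server` stand-in are hypothetical names, and a real harness would call an actual model server instead.

```python
import random
from typing import Callable, Iterable, Iterator


def generate_inputs(seeds: list[str], n: int, rng: random.Random) -> Iterator[str]:
    """1. 'Entropy': derive n new inputs from a pool of seed inputs."""
    for _ in range(n):
        seed = rng.choice(seeds)
        # Placeholder perturbation; a real generator would alter the image itself.
        yield f"{seed}?variant={rng.randrange(1000)}"


def record_outputs(inputs: Iterable[str], model: Callable[[str], dict]) -> list[dict]:
    """2. Call the model server on each input and keep input/output pairs."""
    return [{"input": x, "output": model(x)} for x in inputs]


def select_for_feedback(records: list[dict], is_failure: Callable[[dict], bool]) -> list[dict]:
    """3. Feedback loop: keep the failing cases as candidates for more training."""
    return [r for r in records if is_failure(r)]


def fake_model_server(x: str) -> dict:
    """Hypothetical stand-in for a real model server; returns a made-up score."""
    return {"label": "muffin", "confidence": (len(x) % 10) / 10}


rng = random.Random(0)  # fixed seed so runs are repeatable
records = record_outputs(generate_inputs(["img-a", "img-b"], 5, rng), fake_model_server)
low_confidence = select_for_feedback(records, lambda r: r["output"]["confidence"] < 0.6)
print(len(records), len(low_confidence))
```

Passing the model and the failure rule in as callables keeps the harness independent of any one service, so the same loop can be pointed at different model servers.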
SLIDE 12

Challenges for Building as a Startup

  • 1. Need access to non-toy model servers
  • 2. Need access to generated data for testing model servers
SLIDE 13

Access to Non-Toy Model Servers

SLIDE 14

Non-Toy Model Servers: Commercial Cloud Services

SLIDE 15

Commercial Image Recognition Services

  • Opaque systems
  • Object and scene detection, facial recognition, facial analysis, NSFW detection, text detection
  • Facial analysis includes gender detection
SLIDE 16

GenderShades.org

SLIDE 17

Testing Commercial Systems for Gender Bias

  • Testing = Finding cases where trained systems fail
  • Hypothesis: Gender labels are trained on traditional images
  • What if we generate “non-traditional” images?
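
If testing means finding failures, one simple measure is the failure rate per image category. A minimal sketch, assuming each test case carries a category, an expected label, and the label the service returned; `failure_rates` and the sample rows are placeholders for illustration, not results from the talk.

```python
from collections import defaultdict


def failure_rates(results):
    """Map each category to the fraction of cases where predicted != expected.

    results: iterable of (category, expected_label, predicted_label) tuples.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for category, expected, predicted in results:
        totals[category] += 1
        if predicted != expected:
            failures[category] += 1
    return {category: failures[category] / totals[category] for category in totals}


# Made-up rows, only to show the shape of the data.
sample = [
    ("man_long_hair", "male", "female"),
    ("man_long_hair", "male", "male"),
    ("woman_short_hair", "female", "female"),
    ("woman_long_hair", "female", "female"),
]
print(failure_rates(sample))
```

Grouping by category matters for the hypothesis above: a low overall failure rate can hide a high rate on the “non-traditional” images.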
SLIDE 18

Training Data vs Test Data

SLIDE 19

A Man with Long Hair

SLIDE 20

A Man with Long Hair

SLIDE 21

A Man with Long Hair

SLIDE 22

A Woman with Short Hair

SLIDE 23

A Woman with Short Hair

SLIDE 24

A Woman with Short Hair

SLIDE 25

A Woman with Short Hair

SLIDE 26

A Woman with Short Hair

SLIDE 27

Woman with Long Hair

SLIDE 28

Woman with Long Hair

SLIDE 29

“Facial Analysis”?

SLIDE 30

Data Generation

SLIDE 31

Data Generation

SLIDE 32

Prototype Data

SLIDE 33

Global Standard

SLIDE 34

Data Generation

SLIDE 35

Data Generation

SLIDE 36

Data Generation

SLIDE 37

Woman with Short Hair

SLIDE 38

Woman with Short Hair

SLIDE 39

Man with Long Hair

SLIDE 40

Man with Long Hair

SLIDE 41

Man with Long Hair

SLIDE 42

Man with Long Hair

SLIDE 43

Man with Makeup

SLIDE 44

Man with Makeup

SLIDE 45

Man with Makeup

SLIDE 46

Man with Makeup

SLIDE 47

Man with Makeup

SLIDE 48

Man with Makeup

SLIDE 49

Automating Data Generation + Testing

SLIDE 50

Automating Data Generation + Testing

SLIDE 51

Tracking Results Over Time
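
One way to track results over time is to keep a dated record of each run and look for labels that flip between runs. A minimal sketch, assuming each record holds a run date, an image id, and the returned label; `label_flips`, the field names, and the sample rows are hypothetical and do not come from the talk.

```python
from collections import defaultdict
from datetime import date


def label_flips(history):
    """Return {image_id: [(run_date, old_label, new_label), ...]} for changed labels."""
    by_image = defaultdict(list)
    # Sort by run date so consecutive rows for an image are consecutive runs.
    for row in sorted(history, key=lambda r: r["date"]):
        by_image[row["image_id"]].append(row)

    flips = {}
    for image_id, rows in by_image.items():
        changes = [
            (cur["date"], prev["label"], cur["label"])
            for prev, cur in zip(rows, rows[1:])
            if prev["label"] != cur["label"]
        ]
        if changes:
            flips[image_id] = changes
    return flips


# Placeholder rows with arbitrary dates, only to show the shape of the log.
history = [
    {"date": date(2000, 1, 1), "image_id": "img-1", "label": "female"},
    {"date": date(2000, 2, 1), "image_id": "img-1", "label": "male"},
    {"date": date(2000, 1, 1), "image_id": "img-2", "label": "male"},
    {"date": date(2000, 2, 1), "image_id": "img-2", "label": "male"},
]
print(label_flips(history))
```

Rerunning the same fixed image set on a schedule and diffing against earlier runs is what makes a silent model update on the service side visible.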

SLIDE 52

Takeaways

  • Even the best trained commercial ML systems are far from perfect
  • Systems return different results over time as new versions get deployed
  • Cumbersome & intractable to test without tools & automation

SLIDE 53

Scaling Data Products as a Startup

  • Bootstrap servers with commercial APIs
  • Bootstrap data with open web, public & synthetic datasets
  • Automation is startups’ best friend
SLIDE 54

Questions / Comments

Email: edwin@tinydata.co
Twitter: @edwin