data-driven AI Using data about models to accelerate ML development - - PowerPoint PPT Presentation




slide-1
SLIDE 1

How Captricity built a human-level handwriting recognition engine using

data-driven AI

Using data about models to accelerate ML development

Ramesh Sridharan @tweetsbyramesh

slide-2
SLIDE 2

Machine learning has the potential to change industries

slide-3
SLIDE 3

…but ML in real-world production workflows can be hazardous

slide-4
SLIDE 4

Example: Combining models

R&D: phase 1

Model 1 on validation data: ✓

Model 2 on validation data: ✓

slide-5
SLIDE 5

Example: Combining models

Stacked model

✓ ✓

R&D: phase 2

Held-out test data

slide-6
SLIDE 6

Example: Combining models

Real-world data

Stacked model

✓ ✗

Production: week 1

slide-7
SLIDE 7

Example: Combining models

Stacked model

✗ ✗

Production: week 4

Real-world data

slide-8
SLIDE 8

Example: Combining models

Stacked model

✗ ✗

Production: week 4

Real-world data

  • Silent failures go undetected
  • Can’t inspect model inputs/outputs
  • Rerunning models can be costly
  • Debugging is hard
slide-9
SLIDE 9
  • When input conditions change, ML models can be unpredictable
  • Unpredictability slows productionizing ML models

Challenges

slide-10
SLIDE 10
  • How Captricity works
  • Data-driven ML deployment
  • Data-driven ML development

Outline

slide-12
SLIDE 12

How Captricity works

slide-13
SLIDE 13

How Captricity works

(dummy data)

slide-15
SLIDE 15

Training data

Crowdsourcing

Machine Learning

Decision algorithms

To customer review

How Captricity works

Tristan Chan 561-80-0123 7/22/1950 Smoker

(dummy data)

slide-16
SLIDE 16

Challenge: scan quality

slide-17
SLIDE 17
  • How Captricity works
  • Data-driven ML deployment
  • Data-driven ML development

Outline

slide-18
SLIDE 18

Challenge: How can we accelerate the deployment of ML research into production?

slide-19
SLIDE 19

Solution: track all models

Metrics

Input

Model

Output

  • input
  • output
  • correctness
  • model_snapshot
  • …
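As a rough illustration of tracking every prediction, the sketch below stores one record per prediction. The field names `output`, `correctness`, and `model_snapshot` follow the slide; `input_id`, `created_at`, the `PredictionLog` class, and its methods are hypothetical and are not Captricity's actual implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class PredictionRecord:
    """One tracked prediction. Only some field names come from the slide."""
    input_id: str                       # hypothetical reference to the stored input
    output: str                         # what the model predicted
    model_snapshot: str                 # identifier of the exact model version used
    correctness: Optional[bool] = None  # unknown until ground truth arrives
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class PredictionLog:
    """In-memory stand-in for a real datastore (hypothetical)."""

    def __init__(self) -> None:
        self.records: List[PredictionRecord] = []

    def track(self, input_id: str, output: str, model_snapshot: str) -> PredictionRecord:
        """Record a prediction at the moment the model makes it."""
        record = PredictionRecord(input_id, output, model_snapshot)
        self.records.append(record)
        return record

    def mark(self, record: PredictionRecord, truth: str) -> None:
        """Fill in correctness once a trusted answer is available."""
        record.correctness = (record.output == truth)
```

Keeping `correctness` empty until ground truth arrives lets the same record serve both live monitoring and later debugging.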

slide-20
SLIDE 20

Provide access to aggregate metrics

  • Company-wide daily email
  • ML performance snapshot
  • Critical business metrics
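One way such a daily snapshot could be computed from tracked predictions is sketched below. The tuple layout `(day, model_snapshot, correct)` and the function names are assumptions made for illustration; the talk does not describe how the email is generated.

```python
from collections import defaultdict


def daily_accuracy(records):
    """Aggregate (day, model_snapshot, correct) tuples into per-day, per-model accuracy."""
    totals = defaultdict(lambda: [0, 0])  # key -> [n_correct, n_total]
    for day, model, correct in records:
        totals[(day, model)][0] += int(correct)
        totals[(day, model)][1] += 1
    return {key: n_correct / n_total for key, (n_correct, n_total) in totals.items()}


def format_report(accuracy):
    """Render the aggregate as plain text, one line per (day, model) pair."""
    lines = [
        f"{day}  {model}  {acc:.0%}"
        for (day, model), acc in sorted(accuracy.items())
    ]
    return "\n".join(lines)
```

The report body could then be handed to whatever mail or dashboard system is already in place.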
slide-21
SLIDE 21

Challenge: models will fail

✓ ✗

slide-22
SLIDE 22

Challenge: models will fail

✓ ✗

Solution: model tracking enables identification, debugging and data curation
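A minimal sketch of how tracked predictions might support identification and data curation follows. It assumes each record is a dict with `input`, `correctness`, and an optional `truth` key; these keys and both function names are hypothetical.

```python
def find_failures(records):
    """Identify tracked predictions that were later judged incorrect."""
    return [r for r in records if r["correctness"] is False]


def curate_training_examples(failures):
    """Turn known failures into (input, corrected label) pairs for retraining."""
    return [
        (r["input"], r["truth"])
        for r in failures
        if r.get("truth") is not None
    ]
```

Predictions whose correctness is still unknown (`None`) are left out of both steps on purpose, so that only verified failures feed back into training data.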

slide-23
SLIDE 23

Challenge: state changes

Model: F757558

Crowd: F757558

slide-24
SLIDE 24

Challenge: state changes

Model: F757558

Crowd: F757558

Customer/expert: E757558

slide-25
SLIDE 25

Challenge: state changes

Model: F757558

Crowd: F757558

Customer/expert: E757558

Solution: capture everything needed to reproduce state

slide-26
SLIDE 26

Parallel testing

Data

Model v3.0 metrics: 91%

Model v4.0 metrics: 94% ✓
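Parallel testing can be sketched as running the current and candidate models over the same labeled data and comparing their metrics. In the sketch below, models are plain callables and accuracy is the only metric; both choices are simplifying assumptions, and the function names are hypothetical.

```python
def accuracy(model, data):
    """Fraction of (input, truth) pairs the model gets right."""
    correct = sum(1 for x, truth in data if model(x) == truth)
    return correct / len(data)


def parallel_test(current, candidate, data):
    """Evaluate both models on identical data so the metrics are directly comparable."""
    current_acc = accuracy(current, data)
    candidate_acc = accuracy(candidate, data)
    return {
        "current": current_acc,
        "candidate": candidate_acc,
        "candidate_wins": candidate_acc > current_acc,
    }
```

Feeding both versions the same inputs removes differences in data as an explanation for any gap in the metrics.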

slide-27
SLIDE 27

Automatic model activation

Crowd

Evaluation Metrics Model
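One possible reading of this diagram is that a model is activated only once its outputs agree with crowd answers often enough. The sketch below assumes that reading; the thresholds `min_accuracy` and `min_samples` and the function name are hypothetical, and the talk does not state the actual activation rule.

```python
def should_activate(model_outputs, crowd_answers, min_accuracy=0.95, min_samples=100):
    """Decide whether a model may serve predictions, using crowd answers as reference."""
    if len(model_outputs) != len(crowd_answers):
        raise ValueError("model outputs and crowd answers must be paired")
    n = len(model_outputs)
    if n < min_samples:
        return False  # not enough evidence yet; keep routing work to the crowd
    agreements = sum(m == c for m, c in zip(model_outputs, crowd_answers))
    return agreements / n >= min_accuracy
```

Requiring a minimum sample size avoids activating a model on the strength of a handful of lucky agreements.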

slide-28
SLIDE 28

Challenge: How do we accelerate the deployment of research into production?

Key learning: Monitor and instrument all predictions from all ML models

slide-29
SLIDE 29
  • Carefully track every prediction from every model
  • Provide easy access to aggregation and reporting
  • Track any and all factors correlated with low accuracy
  • Capture all state to reproduce results

    – Training data
    – Model snapshot
    – Pre- and post-processing

Data-driven ML deployment
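The state listed above (training data, model snapshot, pre- and post-processing) could be tied to each prediction through a single fingerprint, as in this sketch. The function name, the argument names, and the choice of SHA-256 over a JSON encoding are assumptions made for illustration.

```python
import hashlib
import json


def state_fingerprint(training_data_id, model_snapshot, preprocessing, postprocessing):
    """Hash everything needed to reproduce a prediction into one stable identifier."""
    state = {
        "training_data_id": training_data_id,  # e.g. a dataset version label
        "model_snapshot": model_snapshot,      # e.g. a weights file checksum
        "preprocessing": preprocessing,        # version of the input pipeline
        "postprocessing": postprocessing,      # version of the output pipeline
    }
    # sort_keys makes the encoding, and therefore the hash, deterministic
    payload = json.dumps(state, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

Storing the fingerprint alongside each prediction makes it possible to tell whether two results came from the same overall configuration.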

slide-30
SLIDE 30
  • How Captricity works
  • Data-driven ML deployment
  • Data-driven ML development

Outline

slide-31
SLIDE 31

Challenge: How do we determine which (sub-)problems to tackle with ML?

slide-32
SLIDE 32

Evaluation

Input

Model

Results

Evaluation

slide-33
SLIDE 33

Challenge: How do we determine which (sub-)problems to tackle with ML?

Key learning: Collect data about input problem space, and use it to prioritize subproblems
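Using data about the input space to prioritize subproblems could look like the sketch below, which ranks input categories by how many errors they contribute. The notion of a "category" (for example a field type), the ranking criterion, and the function name are all assumptions; the talk does not specify how prioritization is done.

```python
from collections import Counter


def prioritize(records):
    """Rank input categories by error count.

    records: iterable of (category, correct) pairs.
    Returns a list of (category, total, error_rate, errors), most errors first.
    """
    totals = Counter()
    errors = Counter()
    for category, correct in records:
        totals[category] += 1
        if not correct:
            errors[category] += 1
    rows = [
        (category, totals[category], errors[category] / totals[category], errors[category])
        for category in totals
    ]
    return sorted(rows, key=lambda row: row[3], reverse=True)
```

Ranking by absolute error count rather than error rate favors subproblems that are both common and hard; a rare category with a high error rate would rank lower.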

slide-34
SLIDE 34
  • Gather data on all predictions from all models

    – Enables debugging, deployment, and decision-making
    – Capture relevant state information

  • Use data about inputs to drive problem-solving

Questions?

Key Learnings

Ramesh Sridharan @tweetsbyramesh rameshs@captricity.com