How Captricity built a human-level handwriting recognition engine using data-driven AI
Using data about models to accelerate ML development
Ramesh Sridharan @tweetsbyramesh
Machine learning has the potential to change industries
…but ML in real-world production workflows can be hazardous
Example: Combining models
R&D, phase 1 (validation data): Model 1 ✓, Model 2 ✓
R&D, phase 2 (held-out test data): Stacked model ✓ ✓
Production, week 1 (real-world data): Stacked model ✓ ✗
Production, week 4 (real-world data): Stacked model ✗ ✗
Challenges
Outline
How Captricity works
(dummy data)
Training data
Crowdsourcing
Decision algorithms
To customer review
How Captricity works
Tristan Chan 561-80-0123 7/22/1950 Smoker
(dummy data)
Challenge: scan quality
Challenge: How can we accelerate the deployment of ML research into production?
Solution: track all models
Input → Model → Output
Metrics: input, correctness, model_snapshot, …
Provide access to aggregate metrics
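A minimal sketch of what tracking every prediction could look like, assuming an in-memory log. The field names input, correctness, and model_snapshot come from the slide; the class names, the `track` method, and the sample values are hypothetical and are not Captricity's actual system.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class PredictionRecord:
    """One logged prediction (field names follow the slide's metrics table)."""
    input_id: str                       # identifier of the input the model saw
    model_snapshot: str                 # which trained model produced the output
    output: str                         # what the model predicted
    correctness: Optional[bool] = None  # unknown until a reference answer arrives


class PredictionLog:
    """Hypothetical in-memory stand-in for a real metrics store."""

    def __init__(self):
        self.records = []

    def track(self, input_id, model_snapshot, output):
        """Record a prediction and return it so it can be graded later."""
        record = PredictionRecord(input_id, model_snapshot, output)
        self.records.append(record)
        return record

    def accuracy_by_snapshot(self):
        """Aggregate accuracy per model snapshot, over graded records only."""
        graded = defaultdict(list)
        for r in self.records:
            if r.correctness is not None:
                graded[r.model_snapshot].append(r.correctness)
        return {snap: sum(vals) / len(vals) for snap, vals in graded.items()}


log = PredictionLog()
log.track("field-001", "v3.0", "F757558").correctness = True
log.track("field-002", "v3.0", "1950").correctness = False
log.track("field-003", "v4.0", "Smoker")  # logged, but not graded yet
print(log.accuracy_by_snapshot())  # {'v3.0': 0.5}
```

Keeping ungraded records in the log, rather than dropping them, means correctness can be filled in whenever a reference answer shows up.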
Challenge: models will fail
Challenge: state changes
Model: F757558
Crowd: F757558
Customer/expert: E757558
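One way to read the state-change example is that a prediction's correctness is not fixed: it depends on the best reference answer available at the time. The sketch below illustrates that idea under an assumed trust order (customer/expert over crowd). The `FieldState` class and `SOURCE_PRIORITY` table are hypothetical, not something the slides specify.

```python
# Assumed trust order: a more authoritative source overrides a less
# authoritative one. This ordering is an illustration, not from the slides.
SOURCE_PRIORITY = {"crowd": 1, "customer_expert": 2}


class FieldState:
    """Tracks a model output and every reference answer received for it."""

    def __init__(self, model_output):
        self.model_output = model_output
        self.references = []  # (source, value) pairs in arrival order

    def add_reference(self, source, value):
        self.references.append((source, value))

    def current_reference(self):
        """Most authoritative reference so far; the latest wins on ties."""
        best = None
        for source, value in self.references:
            if best is None or SOURCE_PRIORITY[source] >= SOURCE_PRIORITY[best[0]]:
                best = (source, value)
        return best

    def correctness(self):
        """None while ungraded, else whether the model matches the reference."""
        ref = self.current_reference()
        return None if ref is None else self.model_output == ref[1]


field = FieldState(model_output="F757558")
field.add_reference("crowd", "F757558")
print(field.correctness())  # True: the model agrees with the crowd
field.add_reference("customer_expert", "E757558")
print(field.correctness())  # False: the expert correction overrides the crowd
```

Storing the full reference history, instead of a single label, lets metrics be recomputed whenever a later correction arrives.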
Parallel testing
Data → Model v3.0 → Metrics: 91%
Data → Model v4.0 → Metrics: 94% ✓
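Parallel testing can be sketched as running every model version over the same labeled data and comparing the resulting metrics. The toy models and data below are made up for illustration and do not reproduce the 91% and 94% figures on the slide; `accuracy` and `parallel_test` are hypothetical helper names.

```python
def accuracy(model, labeled_data):
    """Fraction of (input, expected) pairs the model gets exactly right."""
    correct = sum(1 for x, expected in labeled_data if model(x) == expected)
    return correct / len(labeled_data)


def parallel_test(models, labeled_data):
    """Evaluate every model on the same data so the metrics are comparable."""
    return {name: accuracy(model, labeled_data) for name, model in models.items()}


# Toy stand-ins for two model versions: v3.0 only trims whitespace,
# v4.0 also normalizes case.
models = {
    "v3.0": lambda s: s.strip(),
    "v4.0": lambda s: s.strip().upper(),
}

labeled_data = [
    (" f757558 ", "F757558"),
    ("E757558", "E757558"),
    ("a1 ", "A1"),
    ("B2", "B2"),
]

results = parallel_test(models, labeled_data)
print(results)  # {'v3.0': 0.5, 'v4.0': 1.0}
```

Feeding both versions the identical data is the point of the design: any difference in the metrics is then attributable to the model, not to a shift in inputs.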
Automatic model activation
Crowd
Evaluation Metrics Model
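The slide names the components (crowd, evaluation, metrics, model) but not the activation rule itself. As a sketch only, one plausible rule is to activate a candidate model once it has enough crowd-graded predictions and beats the active model by a margin. The function name, `min_samples`, and `min_gain` are assumptions for illustration.

```python
def should_activate(candidate_correct, candidate_total,
                    active_correct, active_total,
                    min_samples=100, min_gain=0.01):
    """Decide whether a candidate model should replace the active one.

    Assumed rule (not from the slides): both models need at least
    `min_samples` graded predictions, and the candidate's accuracy must
    exceed the active model's accuracy by at least `min_gain`.
    """
    if candidate_total < min_samples or active_total < min_samples:
        return False  # too little evidence to switch automatically
    candidate_acc = candidate_correct / candidate_total
    active_acc = active_correct / active_total
    return candidate_acc - active_acc >= min_gain


# Counts chosen to mirror the 91% vs 94% comparison on the slide.
print(should_activate(94, 100, 91, 100))  # True
print(should_activate(9, 10, 5, 10))      # False: not enough graded samples
```

A sample-size floor guards against switching models on the strength of a handful of lucky predictions.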
Challenge: How do we accelerate the deployment of research into production?
Key learning: Monitor and instrument all predictions from all ML models
– Training data
– Model snapshot
– Pre- and post-processing
Data-driven ML deployment
Challenge: How do we determine which (sub-)problems to tackle with ML?
Evaluation
Input
Model
Results
Evaluation
Challenge: How do we determine which (sub-)problems to tackle with ML?
Key learning: Collect data about input problem space, and use it to prioritize subproblems
– Enables debugging, deployment, and decision-making
– Capture relevant state information
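As a rough illustration of using data about the input space to prioritize subproblems, the sketch below groups graded predictions by an input category and ranks categories by how many errors they contribute. The category names, the sample records, and the choice to rank by error count are all assumptions. The slides do not say which attributes or ranking criterion were used.

```python
from collections import Counter


def rank_subproblems(records):
    """Rank input categories by the number of errors they contribute.

    `records` is an iterable of (category, correct) pairs. Returns a list of
    (category, error_count, error_rate) tuples, worst category first.
    """
    totals, errors = Counter(), Counter()
    for category, correct in records:
        totals[category] += 1
        if not correct:
            errors[category] += 1
    ranked = sorted(totals, key=lambda c: errors[c], reverse=True)
    return [(c, errors[c], errors[c] / totals[c]) for c in ranked]


# Hypothetical graded predictions, tagged by field type.
records = [
    ("date", True), ("date", False), ("date", False),
    ("name", True), ("name", True), ("name", False),
    ("checkbox", True),
]

ranking = rank_subproblems(records)
for category, error_count, error_rate in ranking:
    print(f"{category}: {error_count} errors ({error_rate:.0%} of inputs)")
```

Ranking by error count rather than error rate favors categories that are both common and hard; a different criterion may suit other workloads.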
Key Learnings
Ramesh Sridharan @tweetsbyramesh rameshs@captricity.com