data-driven AI Using data about models to accelerate ML development - PowerPoint PPT Presentation

How Captricity built a human-level handwriting recognition engine using data-driven AI
Using data about models to accelerate ML development
Ramesh Sridharan, @tweetsbyramesh

Machine learning has the potential to change industries

…but ML in real-world production workflows can be hazardous

Example: Combining models (R&D: phase 1)
• Validation data → Model 1 (status marks: ✓ ✓)
• Validation data → Model 2 (status marks: ✓ ✓)

Example: Combining models (R&D: phase 2)
• Held-out test data → Stacked model (status marks: ✓ ✓ ✓)

Example: Combining models (Production: week 1)
• Real-world data → Stacked model (status marks: ✓ ✓ ✗)

Example: Combining models (Production: week 4)
• Real-world data → Stacked model (status marks: ✗ ✗ ✗)

Example: Combining models (Production: week 4)
• Real-world data → Stacked model (status marks: ✗ ✗ ✗)
• Silent failures go undetected
• Can’t inspect model inputs/outputs
• Rerunning models can be costly
• Debugging is hard

Challenges
• When input conditions change, ML models can be unpredictable
• Unpredictability slows productionizing ML models

Outline
• How Captricity works
• Data-driven ML deployment
• Data-driven ML development

How Captricity works

How Captricity works (dummy data)

How Captricity works
• Example extracted fields (dummy data): Smoker, 561-80-0123, Tristan Chan, 7/22/1950
• Diagram labels: Machine Learning, Crowdsourcing, Decision algorithms, Review, Training data, To customer

Challenge: scan quality

Challenge: How can we accelerate the deployment of ML research into production?

Solution: track all models
• Diagram: Input → Model → Output → Metrics
• Tracked fields: input, output, correctness, model_snapshot, …
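A minimal sketch of what tracking every prediction could look like, assuming an in-memory log. The field names `input`, `output`, `correctness`, and `model_snapshot` come from the slide; `model_name`, `timestamp`, the `PredictionLog` class, and the sample values are hypothetical and only illustrate the idea.

```python
import json
import time
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class PredictionRecord:
    model_name: str                      # hypothetical: which model produced the output
    model_snapshot: str                  # identifier of the exact model version used
    input: str                           # the input, or a pointer to where it is stored
    output: str                          # what the model predicted
    correctness: Optional[bool] = None   # unknown until a reference answer exists
    timestamp: float = 0.0               # hypothetical: when the prediction was made


class PredictionLog:
    """Append-only, in-memory log; a production system would persist records."""

    def __init__(self):
        self.records = []

    def track(self, model_name, model_snapshot, input_value, output):
        record = PredictionRecord(
            model_name=model_name,
            model_snapshot=model_snapshot,
            input=input_value,
            output=output,
            timestamp=time.time(),
        )
        self.records.append(record)
        return record

    def mark(self, record, reference):
        # Correctness is filled in later, once a reference answer is available.
        record.correctness = (record.output == reference)

    def to_jsonl(self):
        # One JSON object per line, convenient for later aggregation.
        return "\n".join(json.dumps(asdict(r)) for r in self.records)


log = PredictionLog()
rec = log.track("handwriting", "snapshot-001", "field-42", "F757558")
log.mark(rec, "F757558")
```

Recording correctness separately from the prediction lets the same record be re-scored when a better reference answer arrives.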

Provide access to aggregate metrics
• Company-wide daily email
• ML performance snapshot
• Critical business metrics
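As an illustration of the aggregation step, the sketch below rolls tracked predictions up into per-day, per-model accuracy and formats a plain-text summary of the kind that could go into a daily email. The tuple layout, function names, and sample rows are assumptions made for the example, not details from the talk.

```python
from collections import defaultdict


def daily_accuracy(records):
    """records: iterable of (day, model, correct) tuples; correct may be None."""
    counts = defaultdict(lambda: [0, 0])  # (day, model) -> [n_correct, n_scored]
    for day, model, correct in records:
        if correct is None:
            continue  # skip predictions that have not been scored yet
        counts[(day, model)][0] += int(correct)
        counts[(day, model)][1] += 1
    return {key: n_correct / n_scored
            for key, (n_correct, n_scored) in counts.items()}


def format_report(accuracy):
    """Render one line per (day, model) pair, sorted for stable output."""
    return "\n".join(
        f"{day}  {model:<12} {acc:.1%}"
        for (day, model), acc in sorted(accuracy.items())
    )


# Made-up sample rows, for illustration only.
sample = [
    ("day-1", "digits", True),
    ("day-1", "digits", False),
    ("day-1", "names", True),
    ("day-1", "names", None),
]
report = format_report(daily_accuracy(sample))
```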

Challenge: models will fail ✗ ✓

Challenge: models will fail ✗ ✓
Solution: model tracking enables identification, debugging and data curation

Challenge: state changes ✓
• Model: F757558
• Crowd: F757558

Challenge: state changes ✗
• Model: F757558
• Crowd: F757558
• Customer/expert: E757558

Challenge: state changes ✗
• Model: F757558
• Crowd: F757558
• Customer/expert: E757558
Solution: capture everything needed to reproduce state
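The slide's example can be sketched as a reference answer that gets revised: the model and crowd agree on F757558, then a customer/expert correction changes it to E757558, so the same prediction flips from correct to incorrect. One way to keep earlier results reproducible, assumed here for illustration, is to store reference answers as an append-only revision history. The class and field names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    input_ref: str
    value: str
    source: str    # e.g. "crowd" or "expert"
    version: int   # monotonically increasing revision number


class ReferenceHistory:
    """Append-only history: revisions are added, never overwritten."""

    def __init__(self):
        self._revisions = []

    def add(self, input_ref, value, source):
        ref = Reference(input_ref, value, source,
                        version=len(self._revisions) + 1)
        self._revisions.append(ref)
        return ref

    def as_of(self, input_ref, version):
        """Latest reference for input_ref with revision number <= version."""
        matches = [r for r in self._revisions
                   if r.input_ref == input_ref and r.version <= version]
        return matches[-1] if matches else None


history = ReferenceHistory()
v1 = history.add("field-1", "F757558", "crowd")
v2 = history.add("field-1", "E757558", "expert")

prediction = "F757558"
correct_then = prediction == history.as_of("field-1", v1.version).value
correct_now = prediction == history.as_of("field-1", v2.version).value
```

Because old revisions are kept, a metric computed before the correction can be reproduced exactly, and the change in correctness can be traced to a specific revision.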

Parallel testing
• Data → Model v3.0 → Metrics: 91%
• Data → Model v4.0 → Metrics: 94% ✓
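A rough sketch of parallel testing, assuming both model versions are plain callables scored on the same labeled examples. The toy models and examples below are invented for illustration and do not reproduce the 91% and 94% figures on the slide.

```python
def accuracy(model, examples):
    """Fraction of (input, expected) pairs the model gets exactly right."""
    correct = sum(1 for x, expected in examples if model(x) == expected)
    return correct / len(examples)


def parallel_test(current, candidate, examples):
    """Score both models on identical data so the metrics are comparable."""
    return {
        "current": accuracy(current, examples),
        "candidate": accuracy(candidate, examples),
    }


# Toy stand-ins for two model versions: the candidate also strips whitespace.
def current_model(text):
    return text.upper()


def candidate_model(text):
    return text.strip().upper()


examples = [("a ", "A"), ("b", "B"), ("c", "C"), (" d", "D")]
results = parallel_test(current_model, candidate_model, examples)
```

Running both versions on the same data means any difference in the metrics is attributable to the model rather than to a shift in inputs.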

Automatic model activation
• Diagram labels: Model, Metrics, Crowd, Evaluation
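One possible reading of this slide is that a model's outputs are scored against crowd answers and the model is switched on once the metrics clear a bar. The sketch below assumes that reading; the function name, the thresholds, and the agreement rule are all hypothetical.

```python
def should_activate(model_outputs, crowd_answers,
                    min_accuracy=0.9, min_samples=100):
    """Return True when the model agrees with the crowd often enough.

    Both thresholds are illustrative defaults, not values from the talk.
    """
    if len(model_outputs) != len(crowd_answers):
        raise ValueError("outputs and answers must be paired one-to-one")
    n = len(model_outputs)
    if n < min_samples:
        return False  # too little evidence to decide either way
    agreements = sum(m == c for m, c in zip(model_outputs, crowd_answers))
    return agreements / n >= min_accuracy


# 95 agreements out of 100 paired answers.
outputs = ["a"] * 100
answers = ["a"] * 95 + ["b"] * 5
decision = should_activate(outputs, answers)
```

Requiring a minimum sample size keeps a model from being activated on the strength of a handful of lucky predictions.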

Challenge: How do we accelerate the deployment of research into production?
Key learning: Monitor and instrument all predictions from all ML models

Data-driven ML deployment
• Carefully track every prediction from every model
• Provide easy access to aggregation and reporting
• Track any and all factors correlated with low accuracy
• Capture all state to reproduce results
  – Training data
  – Model snapshot
  – Pre- and post-processing
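The "capture all state" bullet could be sketched as a manifest that fingerprints each piece of state named on the slide (training data, model snapshot, pre- and post-processing). Hashing with SHA-256 and the `run_id` field are assumptions for the example; the talk does not say how state was captured.

```python
import hashlib
import json


def fingerprint(data: bytes) -> str:
    """Content hash, so identical state always yields the same identifier."""
    return hashlib.sha256(data).hexdigest()


def canonical(config: dict) -> bytes:
    """Serialize a config deterministically (sorted keys) before hashing."""
    return json.dumps(config, sort_keys=True).encode("utf-8")


def build_manifest(training_data: bytes, model_snapshot: bytes,
                   preprocessing: dict, postprocessing: dict) -> dict:
    manifest = {
        "training_data": fingerprint(training_data),
        "model_snapshot": fingerprint(model_snapshot),
        "preprocessing": fingerprint(canonical(preprocessing)),
        "postprocessing": fingerprint(canonical(postprocessing)),
    }
    # Hypothetical short identifier derived from the four fingerprints above.
    manifest["run_id"] = fingerprint(canonical(manifest))[:12]
    return manifest


# Placeholder inputs, for illustration only.
manifest = build_manifest(
    training_data=b"example training bytes",
    model_snapshot=b"example model bytes",
    preprocessing={"deskew": True, "resize": 64},
    postprocessing={"strip_whitespace": True},
)
```

Storing the manifest alongside each prediction record makes it possible to tell whether two results were produced under the same state.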

Challenge: How do we determine which (sub-)problems to tackle with ML?

Evaluation
• Diagram labels: Input, Model, Results, Evaluation

Challenge: How do we determine which (sub-)problems to tackle with ML?
Key learning: Collect data about input problem space, and use it to prioritize subproblems
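As a sketch of using input data to prioritize, the example below ranks subproblems by how many failures each one accounts for (volume times error rate). The ranking rule, the subproblem names, and the counts are all hypothetical; the talk does not state which criterion was used.

```python
from collections import Counter


def prioritize(observations):
    """observations: iterable of (subproblem, handled_correctly) pairs.

    Returns (subproblem, volume, error_rate) tuples, most failures first.
    """
    volume = Counter()
    failures = Counter()
    for subproblem, handled_correctly in observations:
        volume[subproblem] += 1
        if not handled_correctly:
            failures[subproblem] += 1
    ranked = sorted(volume, key=lambda s: failures[s], reverse=True)
    return [(s, volume[s], failures[s] / volume[s]) for s in ranked]


# Made-up observations, for illustration only.
observations = (
    [("dates", True)] * 8 + [("dates", False)] * 2
    + [("names", True)] * 1 + [("names", False)] * 3
    + [("checkboxes", True)] * 20
)
ranking = prioritize(observations)
```

Ranking by absolute failures rather than error rate alone avoids spending effort on a subproblem that fails often but rarely occurs.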

Key Learnings
• Gather data on all predictions from all models
  – Enables debugging, deployment, and decision-making
  – Capture relevant state information
• Use data about inputs to drive problem-solving

Questions?
Ramesh Sridharan
@tweetsbyramesh
rameshs@captricity.com
