Scaling Data Products Under Startup Constraints: A Case Study of ML Bias Testing

Apr 20, 2023

Scaling Data Products Under Startup Constraints A Case Study of ML Bias Testing
Edwin Ong (@edwin)
Co-Founder, TinyData
Founded CastTV (acquired by Tribune)
Founded FileFish (acquired by Oracle)
Stanford Symbolic Systems
TinyData
● Help other companies make data products
● Make our own data products
Problem: Testing Machine Learning in Production
● Tools exist for machine learning testing in training
● Not as many tools exist for machine learning testing in production
● Different tools are needed because ML testing is different from traditional software testing
Traditional Software Has Deterministic Outcomes
ML Has Probabilistic Outcomes
Dog vs. Muffin, given new user input
ML Has Probabilistic Outcomes That Change Over Time
Version 1: Muffin (59%)
Version 2: Muffin (66%)
ML Platforms Often End at Deploy
[Diagram labels: New User Input, Production Testing, ML Chaos Engineering]
Requirements for Production ML Testing Tool
1. “Entropy”: Generation of new inputs against model servers
2. Recording of outputs from model servers
3. Feedback loop for additional training
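The three requirements can be read as one loop. The sketch below is only an illustration under stated assumptions, not the tool described in the talk: `ProductionTester`, `toy_model`, the file names, and the 0.7 threshold are all hypothetical, and the model server is replaced by a local stand-in function.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Prediction = Tuple[str, float]  # (label, confidence)


@dataclass
class ProductionTester:
    """Hypothetical harness covering the three requirements."""
    model: Callable[[str], Prediction]  # stand-in for a call to a model server
    generate: Callable[[], str]         # requirement 1: source of new inputs
    log: List[dict] = field(default_factory=list)

    def run(self, n: int) -> None:
        for _ in range(n):
            x = self.generate()
            label, confidence = self.model(x)
            # Requirement 2: record every output next to its input.
            self.log.append({"input": x, "label": label, "confidence": confidence})

    def feedback(self, threshold: float) -> List[str]:
        # Requirement 3: low-confidence inputs become candidates
        # for labeling and additional training.
        return [r["input"] for r in self.log if r["confidence"] < threshold]


def toy_model(x: str) -> Prediction:
    # Hypothetical classifier: confident on familiar inputs, unsure otherwise.
    known = {"dog.jpg": ("dog", 0.97), "muffin.jpg": ("muffin", 0.95)}
    return known.get(x, ("muffin", 0.59))


rng = random.Random(0)  # seeded so the run is repeatable
tester = ProductionTester(
    model=toy_model,
    generate=lambda: rng.choice(["dog.jpg", "muffin.jpg", "chihuahua_blur.jpg"]),
)
tester.run(50)
candidates = tester.feedback(threshold=0.7)
```

Against a real service, `model` would wrap the HTTP call and `generate` would draw from a data generation pipeline; the recording and feedback steps would stay the same.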
Challenges for Building as a Startup
1. Need access to non-toy model servers
2. Need access to generated data for testing model servers
Access to Non-Toy Model Servers
Non-Toy Model Servers: Commercial Cloud Services
Commercial Image Recognition Services
● Opaque systems
● Object and scene detection, facial recognition, facial analysis, NSFW detection, text detection
● Facial analysis includes gender detection
GenderShades.org
Testing Commercial Systems for Gender Bias
● Testing = Finding cases where trained systems fail
● Hypothesis: Gender labels are trained on traditional images
● What if we generate “non-traditional” images?
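One way to make "finding cases where trained systems fail" concrete is to group test images by attribute and compare failure rates per group. The sketch below is an assumption-laden illustration: `Case`, `failure_rates`, and the image IDs are hypothetical, and `hair_length_stub` is a deliberately flawed stand-in that keys on hair length. It does not represent the measured behavior of any commercial service.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, NamedTuple


class Case(NamedTuple):
    image_id: str
    expected: str  # label assigned by the test author
    group: str     # attribute combination the image was generated for


def failure_rates(cases: Iterable[Case],
                  classify: Callable[[str], str]) -> Dict[str, float]:
    """Fraction of misclassified cases per group."""
    totals: Dict[str, int] = defaultdict(int)
    failures: Dict[str, int] = defaultdict(int)
    for case in cases:
        totals[case.group] += 1
        if classify(case.image_id) != case.expected:
            failures[case.group] += 1
    return {group: failures[group] / totals[group] for group in totals}


def hair_length_stub(image_id: str) -> str:
    # Hypothetical classifier that has learned hair length instead of gender.
    return "female" if "_long_" in image_id else "male"


cases = [
    Case("m_short_1", "male", "man, short hair"),
    Case("m_long_1", "male", "man, long hair"),
    Case("m_long_2", "male", "man, long hair"),
    Case("f_long_1", "female", "woman, long hair"),
    Case("f_short_1", "female", "woman, short hair"),
]
rates = failure_rates(cases, hair_length_stub)
```

With this stub, the "traditional" groups pass and the "non-traditional" groups fail, which is the pattern the hypothesis predicts a test set should be built to expose.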
Training Data vs Test Data
[Figure panels: Training Data, Test Data]
A Man with Long Hair
A Woman with Short Hair
Woman with Long Hair
“Facial Analysis”?
Data Generation
Prototype Data
Global Standard
Woman with Short Hair
Man with Long Hair
Man with Makeup
Automating Data Generation + Testing
Tracking Results Over Time
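Tracking results over time amounts to replaying the same inputs and diffing the recorded outputs between runs. The following is a minimal sketch, assuming each run is stored as a mapping from input ID to (label, confidence); `diff_runs`, `min_shift`, and the image IDs are hypothetical. The Muffin 59% and 66% values are the Version 1 and Version 2 figures from the earlier slide; the other rows are made up for illustration.

```python
from typing import Dict, List, Tuple

Prediction = Tuple[str, float]  # (label, confidence)


def diff_runs(old: Dict[str, Prediction],
              new: Dict[str, Prediction],
              min_shift: float = 0.05) -> List[dict]:
    """Report inputs whose label flipped or whose confidence moved by min_shift or more."""
    changes = []
    for input_id in sorted(old.keys() & new.keys()):
        old_label, old_conf = old[input_id]
        new_label, new_conf = new[input_id]
        if old_label != new_label:
            kind = "label_flip"
        elif abs(new_conf - old_conf) >= min_shift:
            kind = "confidence_shift"
        else:
            continue  # stable enough to ignore
        changes.append({"input": input_id, "kind": kind,
                        "old": old[input_id], "new": new[input_id]})
    return changes


# Recorded outputs for the same inputs against two deployed versions.
v1 = {"img_001": ("muffin", 0.59), "img_002": ("dog", 0.91), "img_003": ("dog", 0.80)}
v2 = {"img_001": ("muffin", 0.66), "img_002": ("dog", 0.92), "img_003": ("muffin", 0.55)}
changes = diff_runs(v1, v2)
```

Here `img_001` is reported as a confidence shift, `img_003` as a label flip, and `img_002` is treated as stable. The threshold is a judgment call; a stricter value surfaces smaller version-to-version drift.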
Takeaways
● Even the best-trained commercial ML systems are far from perfect
● Systems return different results over time as new versions get deployed
● Cumbersome & intractable to test without tools & automation
Scaling Data Products as a Startup
● Bootstrap servers with commercial APIs
● Bootstrap data with open web, public & synthetic datasets
● Automation is startups’ best friend
Questions / Comments
edwin@tinydata.co
Twitter: @edwin
