slide-1
SLIDE 1

Bootstrapping Labels for One-Hundred Million Images

Jimmy Whitaker

slide-2
SLIDE 2

We are drowning in data

4/5/16 GTC 2016 2

Data Never Sleeps 2.0 - DOMO (2014)

slide-3
SLIDE 3

Ripe Opportunities

  • Many problems to solve
  • Limitless amounts of image data
  • Deep Learning pushing the state of the art everywhere
  • GPUs making everything possible


slide-4
SLIDE 4

The Problem

  • Deep Learning is data-driven
  • ImageNet has 1.2 million training examples
  • Few large, labeled image datasets exist
  • It’s expensive to label data
  • Our datasets are 100M+ images
  • Few people are qualified to label them
  • Highly sensitive customer data
  • Subject matter expertise is required


slide-5
SLIDE 5

Ever labeled data?

  • Not as easy as it seems:
  • It’s repetitive
  • Accuracy declines over time
  • One day computers will do it all for you? Not yet.
  • Can some of this effort be automated?


slide-6
SLIDE 6

Many Approaches

  • Mechanical Turk
  • Costly
  • Time consuming
  • Clustering
  • Expensive
  • How many clusters?
  • What features to use?
  • Pre-trained classifiers
  • What if pre-trained classifiers don’t work well on the data?
  • Active learning
  • Iterative labeling
  • Open problem

Can we combine these into something useful?
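The clustering pain points above can be made concrete: even a toy k-means forces two up-front choices, the number of clusters k and the feature representation. A minimal pure-Python sketch on 1-D features, purely illustrative (a real pipeline would cluster high-dimensional CNN features):

```python
def kmeans_1d(xs, k, iters=20):
    """Toy k-means on 1-D points. Both k and the feature choice
    (here, raw scalars) must be picked before seeing any labels."""
    centers = xs[:k]                       # naive init: first k points
    for _ in range(iters):
        # assign each point to its nearest center
        groups = [[] for _ in range(k)]
        for x in xs:
            j = min(range(k), key=lambda j: (x - centers[j]) ** 2)
            groups[j].append(x)
        # recompute each center as the mean of its group
        centers = [sum(g) / len(g) if g else centers[j]
                   for j, g in enumerate(groups)]
    return centers
```

With a bad k the groups blur together, and with bad features even the right k separates nothing, which is exactly the objection raised on the slide.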

slide-7
SLIDE 7

The Goal

  • Inspired by Image Similarity experience and Jeremy Howard’s TED talk
  • Use machines to filter the noise
  • Reduce repetitive tasks
  • Leverage the human labeler
  • Understand the data
  • Label iteratively
  • Allow exploration


slide-8
SLIDE 8

Our Approach


slide-9
SLIDE 9

Our Approach


Compare Image Hashes to filter Duplicate Images
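The hash-based duplicate filter can be sketched in a few lines. This is an illustrative stand-in, not the deck's actual implementation: it assumes images are already decoded to 8x8 grayscale thumbnails (flat lists of 64 values) and uses a simple average hash with a Hamming-distance threshold.

```python
def average_hash(pixels):
    """64-bit hash: bit is 1 where the pixel is >= the mean brightness.
    `pixels` is a flat list of 64 grayscale values (an 8x8 thumbnail)."""
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p >= mean else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def filter_duplicates(images, threshold=5):
    """Keep one representative per near-duplicate group: an image is
    kept only if its hash is far from every hash already kept."""
    kept, hashes = [], []
    for img in images:
        h = average_hash(img)
        if all(hamming(h, h2) > threshold for h2 in hashes):
            kept.append(img)
            hashes.append(h)
    return kept
```

Unlike exact-byte deduplication, a perceptual hash like this also catches resized or lightly re-encoded copies, which matters at the 100M-image scale the deck describes.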

slide-10
SLIDE 10

Our Approach


slide-11
SLIDE 11

Our Approach


slide-12
SLIDE 12

Our Approach


slide-13
SLIDE 13

Our Approach


Prevents over-focusing on one portion of feature space
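One way to spread labeling effort across feature space instead of concentrating in one region is greedy farthest-point selection: repeatedly pick the feature vector farthest from everything chosen so far. The deck doesn't name its sampling method, so this is an assumed sketch:

```python
def squared_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_point_sample(features, k):
    """Greedily pick k indices, each maximally far from those
    already chosen, so samples cover the feature space."""
    chosen = [0]                      # start from the first vector
    while len(chosen) < k:
        best_i, best_d = None, -1.0
        for i in range(len(features)):
            if i in chosen:
                continue
            # distance from candidate i to its nearest chosen point
            d = min(squared_dist(features[i], features[j]) for j in chosen)
            if d > best_d:
                best_i, best_d = i, d
        chosen.append(best_i)
    return chosen
```

The O(n * k) loop is fine for a batch of candidates; at full dataset scale one would subsample first.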

slide-14
SLIDE 14

Our Approach


slide-15
SLIDE 15

Our Approach


Label Images on the boundary of the class
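"Label images on the boundary of the class" is classic uncertainty sampling: rank unlabeled images by how close the current classifier's score sits to the decision threshold, and send the closest ones to the human labeler. A minimal sketch, assuming single positive-class probabilities and a 0.5 threshold (both assumptions, not stated in the deck):

```python
def margin(prob):
    """Distance of a positive-class probability from the 0.5 boundary;
    small margin means the classifier is unsure."""
    return abs(prob - 0.5)

def boundary_candidates(probs, n):
    """Indices of the n images nearest the class boundary, i.e. the
    ones whose labels would be most informative."""
    order = sorted(range(len(probs)), key=lambda i: margin(probs[i]))
    return order[:n]
```

Images the model already scores near 0 or 1 are skipped, which is where the repetitive-labeling time savings come from.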

slide-16
SLIDE 16

Our Approach


Improve CNN features for labeled classes
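This retraining step closes the loop: newly labeled examples improve the model, which in turn improves the next round of boundary sampling. As a toy stand-in for fine-tuning CNN features, here is a tiny logistic head trained by SGD on fixed feature vectors; a real pipeline would fine-tune the network itself, and every name here is illustrative.

```python
import math

def train_head(feats, labels, lr=0.5, epochs=200):
    """Fit a logistic-regression head on (feature, label) pairs by SGD.
    Stand-in for the CNN fine-tuning step described on the slide."""
    w = [0.0] * len(feats[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y                      # gradient of the log-loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    """Positive-class probability for a feature vector x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))
```

After each labeling round the refreshed scores feed back into the boundary-sampling step, which is what makes the iterative gains multiplicative rather than additive.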

slide-17
SLIDE 17

GUI


slide-18
SLIDE 18

Hardware

  • Cirrascale GB5670
  • 56 CPU Cores
  • 8x NVIDIA Tesla K80
  • 512GB DDR4
  • 1 TB SSD


slide-19
SLIDE 19

Benefits

  • Create Large, Labeled Datasets
  • High quality
  • Allows data exploration
  • Dramatic time reduction
  • ~3-5x faster initially
  • Multiplicative efficiency gains
  • Flexible framework
  • Perform data science with images


slide-20
SLIDE 20

CONFIDENTIAL 20