Oct 14, 2023

SLIDE 1

Deep Learning in the Connected Kitchen

“Launching a Computer Vision program in a new vertical”

Hristo Bojinov, CTO

SLIDE 2

SLIDE 3

Company Vision

SLIDE 4

The Problem

Food ↔ People disconnect
Not-so-smart “smart kitchen”
Food info not available, not actionable

SLIDE 5

What We Do

Food, personalization, technology
“Give food a voice” (⇒ Computer Vision is essential)

Icons made by Madebyoliver, Popcorn Arts, Freepik from www.flaticon.com are licensed by CC 3.0 BY

SLIDE 6

SLIDE 7

Computer Vision at Innit

Helps us understand users

❖ Inventory, behaviors, multi-sensor fusion, market analytics
❖ And, build a delightful user experience

Applications in storage and processing

❖ Recognize and act on food state
❖ Visible light, depth, IR

SLIDE 8

Program Logistics

Multi-site program (HQ, academia)
Food Recognition service (AWS)

❖ G2 instance backend (blend of CPU and GPU workload)
❖ Frontend orchestrates auto and manual processing
❖ Service API for 3rd party use
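The slide does not say how the frontend decides between automatic and manual processing. One plausible arrangement, shown here purely as an assumption, is to accept confident predictions automatically and queue the rest for a human. The names `Prediction`, `route`, `split_batch` and the threshold value are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical cut-off; the slides do not say how the frontend decides.
AUTO_ACCEPT_THRESHOLD = 0.8


@dataclass
class Prediction:
    label: str
    confidence: float  # assumed to lie in [0, 1]


def route(prediction: Prediction, threshold: float = AUTO_ACCEPT_THRESHOLD) -> str:
    """Return "auto" for confident predictions and "manual" otherwise."""
    return "auto" if prediction.confidence >= threshold else "manual"


def split_batch(predictions, threshold=AUTO_ACCEPT_THRESHOLD):
    """Partition predictions into (auto-accepted, needs manual review)."""
    auto, manual = [], []
    for p in predictions:
        (auto if route(p, threshold) == "auto" else manual).append(p)
    return auto, manual
```

Items in the manual queue could later be fed back as labeled training data, which fits the "Data is King!" theme, though the slides do not confirm that loop.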

SLIDE 9

CV Tech: Food Recognition System

SLIDE 10

CV Tech: Food Recognition System

SLIDE 11

CV Tech: Food Recognition System

Data is King!

SLIDE 12

CV Tech: Object Detection Stage

SLIDE 13

CV Tech: Object Detection Stage

SLIDE 14

CV Tech: Object Detection Stage

SLIDE 15

CV Tech: Object Detection Stage

DetectNet
➔ Easy setup and initial training
➔ Python layers, “low resolution”
Faster R-CNN
➔ Multi-phase training/tuning
➔ High resolution & recall 😁
DeepMask & SharpMask
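Recall is the figure of merit the slide singles out for the detection stage. A minimal sketch of how detection recall is commonly measured follows, assuming axis-aligned boxes given as `(x1, y1, x2, y2)` and a greedy one-to-one matching at an IoU threshold of 0.5. The slides do not describe the evaluation protocol actually used, so the function names and the threshold are illustrative.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall(ground_truth, detections, iou_threshold=0.5):
    """Fraction of ground-truth boxes matched by some detection.

    Each detection may match at most one ground-truth box (greedy matching).
    """
    if not ground_truth:
        return 1.0
    unused = list(detections)
    matched = 0
    for gt in ground_truth:
        best = max(unused, key=lambda d: iou(gt, d), default=None)
        if best is not None and iou(gt, best) >= iou_threshold:
            matched += 1
            unused.remove(best)
    return matched / len(ground_truth)
```

For a two-stage pipeline like the one in this deck, high recall in detection matters because an item the detector misses never reaches the classifier at all.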

SLIDE 16

CV Tech: Object Detection Stage

SLIDE 17

CV Tech: Classification Stage

SLIDE 18

CV Tech: Classification Stage

SLIDE 19

CV Tech: Classification Stage

SLIDE 20

CV Tech: Classification Stage

Controlled scene layout ⇒ precision
In-house data collection and tools
Command-line → DIGITS
AlexNet → VGG

SLIDE 21

CV Tech: Product DB Image Retrieval

SLIDE 22

CV Tech: Product DB Image Retrieval

❖ Exact product (or attribute) matching
❖ KAZE descriptors (GPU acceleration WIP)
    ➢ Current need to balance CPU/GPU
    ➢ Order-of-magnitude acceleration
❖ Hierarchical analysis in the pipeline
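The slides name KAZE as the descriptor but do not say how descriptors are compared against the product database. A common choice for local float descriptors is nearest-neighbour matching with a ratio test, sketched below as an assumption rather than as the deck's method. Short toy vectors stand in for real KAZE descriptors, and `ratio_test_matches` and `match_score` are hypothetical helpers.

```python
import math


def l2(u, v):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def ratio_test_matches(query, reference, ratio=0.75):
    """Return (query_index, reference_index) pairs that pass the ratio test.

    A query descriptor is matched to its nearest reference descriptor only if
    that neighbour is clearly closer than the second nearest one.
    """
    matches = []
    if len(reference) < 2:
        return matches
    for qi, q in enumerate(query):
        ranked = sorted(range(len(reference)), key=lambda ri: l2(q, reference[ri]))
        best, second = ranked[0], ranked[1]
        if l2(q, reference[best]) < ratio * l2(q, reference[second]):
            matches.append((qi, best))
    return matches


def match_score(query, reference, ratio=0.75):
    """Fraction of query descriptors with an unambiguous match."""
    if not query:
        return 0.0
    return len(ratio_test_matches(query, reference, ratio)) / len(query)
```

The brute-force search here costs one distance computation per query/reference pair, which gives a sense of why this stage is a candidate for GPU acceleration.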

SLIDE 23

CV Research: Training on Synthetic Sets

SLIDE 24

CV Research: Text Extraction

SLIDE 25

In a nutshell...

❖ Focus on differentiated capabilities, in the food space
❖ Tie in with all stages of human ↔ food interaction
❖ Fusion of images & other “sensors”
❖ GPU tech a strong enabler

SLIDE 26

Takeaways

❖ Objectives → domain constraints (good!)
❖ Sources of initial training+test data; build tools
❖ Hardware (local experiments OK, cloud for serving)
❖ Software (don’t get tied to a framework; abstract away)
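The last takeaway, abstracting the framework away, can be illustrated with a thin interface that application code depends on instead of any one deep learning library. This is only a sketch: `Classifier`, `BrightnessStub` and `top_label` are hypothetical names, and the stub stands in for a real model backend.

```python
from typing import List, Protocol, Sequence, Tuple


class Classifier(Protocol):
    """Hypothetical framework-neutral interface; not an Innit API."""

    def predict(self, image: Sequence[float]) -> List[Tuple[str, float]]:
        """Return (label, score) pairs for one image."""
        ...


class BrightnessStub:
    """Toy backend standing in for a real framework-backed model."""

    def predict(self, image):
        mean = sum(image) / len(image)
        return [("bright", mean), ("dark", 1.0 - mean)]


def top_label(model: Classifier, image: Sequence[float]) -> str:
    """Application code depends only on the Classifier interface."""
    return max(model.predict(image), key=lambda pair: pair[1])[0]
```

Swapping one backend for another (for example when moving from AlexNet to VGG, or between training tools) then means writing a new class with the same `predict` method, leaving callers such as `top_label` unchanged.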

SLIDE 27

We are hiring! 🚁 hristo@innit.com

SLIDE 28

About Innit

❖ Inform and elevate the interaction between people and food
❖ 4+ years in the making, substantial funding, IP & tech
❖ Pirch SOHO, ShopWell

About the Speaker

❖ Embedded & Security
❖ Android, Computer Vision
❖ Computer technology at Innit