SLIDE 1
Deep Learning in the Connected Kitchen
“Launching a Computer Vision program in a new vertical”
Hristo Bojinov, CTO
SLIDE 2
SLIDE 3
Company Vision
SLIDE 4
The Problem
Food ↔ People disconnect
Not-so-smart “smart kitchen”
Food info not available, not actionable
SLIDE 5
What We Do
Food, personalization, technology
“Give food a voice” (⇒ Computer Vision is essential)
Icons made by Madebyoliver, Popcorn Arts, Freepik from www.flaticon.com are licensed by CC 3.0 BY
SLIDE 6
SLIDE 7
Computer Vision at Innit
Helps us understand users
❖ Inventory, behaviors, multi-sensor fusion, market analytics
❖ And, build a delightful user experience
Applications in storage and processing
❖ Recognize and act on food state
❖ Visible light, depth, IR
SLIDE 8
Program Logistics
Multi-site program (HQ, academia)
Food Recognition service (AWS)
❖ G2 instance backend (blend of CPU and GPU workload)
❖ Frontend orchestrates auto and manual processing
❖ Service API for 3rd party use
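One way a frontend could split work between automatic and manual processing is to route on recognizer confidence. The sketch below is only an illustration of that idea: the slides do not describe the actual routing rule, so the `orchestrate` function, the `Recognition` type, the 0.8 threshold, and the in-memory queue are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Recognition:
    """Hypothetical result record; not the real service schema."""
    label: str
    confidence: float  # assumed to lie in [0, 1]
    source: str        # "auto" or "manual"


def orchestrate(
    image_id: str,
    auto_recognize: Callable[[str], Tuple[str, float]],
    manual_queue: List[str],
    threshold: float = 0.8,  # assumed cutoff, chosen only for illustration
) -> Optional[Recognition]:
    """Accept a confident automatic result, otherwise queue the image for a human."""
    label, confidence = auto_recognize(image_id)
    if confidence >= threshold:
        return Recognition(label, confidence, "auto")
    # Low confidence: defer to manual processing and return nothing for now.
    manual_queue.append(image_id)
    return None
```

In this sketch the recognizer is passed in as a plain callable, so the same routing code works whether the backend runs on a GPU instance or is stubbed out locally.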
SLIDE 9
CV Tech: Food Recognition System
SLIDE 10
CV Tech: Food Recognition System
SLIDE 11
CV Tech: Food Recognition System
Data is King!
SLIDE 12
CV Tech: Object Detection Stage
SLIDE 13
CV Tech: Object Detection Stage
SLIDE 14
CV Tech: Object Detection Stage
SLIDE 15
CV Tech: Object Detection Stage
DetectNet
➔ Easy setup and initial training
➔ Python layers, “low resolution”
Faster R-CNN
➔ Multi-phase training/tuning
➔ High resolution & recall 😁
DeepMask & SharpMask
SLIDE 16
CV Tech: Object Detection Stage
SLIDE 17
CV Tech: Classification Stage
SLIDE 18
CV Tech: Classification Stage
SLIDE 19
CV Tech: Classification Stage
SLIDE 20
CV Tech: Classification Stage
Controlled scene layout ⇒ precision
In-house data collection and tools
Command-line → DIGITS
AlexNet → VGG
SLIDE 21
CV Tech: Product DB Image Retrieval
SLIDE 22
CV Tech: Product DB Image Retrieval
❖ Exact product (or attribute) matching
❖ KAZE descriptors (GPU acceleration WIP)
  ➢ Current need to balance CPU/GPU
  ➢ Order-of-magnitude acceleration
❖ Hierarchical analysis in the pipeline
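Descriptor-based retrieval of this kind is often done by nearest-neighbor matching with a ratio test. The sketch below shows that general technique on descriptors that have already been extracted; it is not taken from the slides. The function names, the 0.75 ratio, and the dictionary-shaped product database are assumptions. With OpenCV, the descriptors themselves would typically come from `cv2.KAZE_create()` and `detectAndCompute`.

```python
import math
from typing import Dict, List, Optional, Sequence

Descriptor = Sequence[float]


def l2(a: Descriptor, b: Descriptor) -> float:
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def count_good_matches(
    query: List[Descriptor], candidate: List[Descriptor], ratio: float = 0.75
) -> int:
    """Count query descriptors whose nearest neighbor passes the ratio test."""
    if len(candidate) < 2:
        return 0  # the ratio test needs a second-nearest neighbor
    good = 0
    for q in query:
        dists = sorted(l2(q, c) for c in candidate)
        # Keep the match only if the best distance is clearly below the runner-up.
        if dists[0] < ratio * dists[1]:
            good += 1
    return good


def best_product(
    query: List[Descriptor],
    product_db: Dict[str, List[Descriptor]],  # hypothetical: product id -> descriptors
    ratio: float = 0.75,
) -> Optional[str]:
    """Return the product id with the most good matches, or None if none match."""
    scores = {
        pid: count_good_matches(query, descs, ratio)
        for pid, descs in product_db.items()
    }
    pid = max(scores, key=scores.get, default=None)
    if pid is None or scores[pid] == 0:
        return None
    return pid
```

The brute-force loop is quadratic in the number of descriptors, which is one reason this stage is a natural candidate for GPU acceleration.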
SLIDE 23
CV Research: Training on Synthetic Sets
SLIDE 24
CV Research: Text Extraction
SLIDE 25
In a nutshell...
❖ Focus on differentiated capabilities, in the food space
❖ Tie in with all stages of human ↔ food interaction
❖ Fusion of images & other “sensors”
❖ GPU tech a strong enabler
SLIDE 26
Takeaways
❖ Objectives → domain constraints (good!)
❖ Sources of initial training+test data; build tools
❖ Hardware (local experiments OK, cloud for serving)
❖ Software (don’t get tied to a framework; abstract away)
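The last takeaway, abstracting the framework away, can be sketched as a pair of small interfaces that the pipeline depends on instead of any specific deep learning library. Everything named below (`Detector`, `Classifier`, `FoodRecognizer`, and the stand-in backends) is hypothetical and only mirrors the detection and classification stages described earlier.

```python
from typing import List, Protocol, Tuple

Box = Tuple[int, int, int, int]  # assumed (x, y, width, height) layout


class Detector(Protocol):
    def detect(self, image: object) -> List[Box]: ...


class Classifier(Protocol):
    def classify(self, image: object, box: Box) -> str: ...


class FoodRecognizer:
    """Two-stage pipeline that knows only the interfaces, not the framework."""

    def __init__(self, detector: Detector, classifier: Classifier) -> None:
        self.detector = detector
        self.classifier = classifier

    def recognize(self, image: object) -> List[Tuple[Box, str]]:
        # Detect candidate regions, then classify each region independently.
        return [
            (box, self.classifier.classify(image, box))
            for box in self.detector.detect(image)
        ]


class FixedDetector:
    """Stand-in backend; a real one would wrap a trained detection model."""

    def detect(self, image: object) -> List[Box]:
        return [(0, 0, 10, 10)]


class ConstantClassifier:
    """Stand-in backend; a real one would wrap a trained classification model."""

    def __init__(self, label: str) -> None:
        self.label = label

    def classify(self, image: object, box: Box) -> str:
        return self.label
```

Swapping one detector or classifier for another then means writing a new adapter class, while `FoodRecognizer` and its callers stay unchanged.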
SLIDE 27
We are hiring! 🚁 hristo@innit.com
SLIDE 28
About Innit
❖ Inform and elevate the interaction between people and food
❖ 4+ years in the making, substantial funding, IP & tech
❖ Pirch SOHO, ShopWell
About the Speaker
❖ Embedded & Security
❖ Android, Computer Vision
❖ Computer technology at Innit