Recent Progress on CNNs for Object Detection & Image Compression

Recent Progress on CNNs for Object Detection & Image Compression Rahul Sukthankar Google Research Confidential + Proprietary

Credits: My Research Group at Google Lifelong Learning Object Detection ++ Learning from Video NN Compression Individual Explorers - Vitto Ferrari (TL) - Kevin Murphy (TL) - Susanna Ricco (TL) - George Toderici (TL) - Chunhui Gu - Danfeng Qin - Alireza Fathi - Alexey Vorobyov - Damien Vincent - Ian Fischer - Hassan Rom - Anoop Korattikara - Bryan Seybold - David Minnen - Mohamad Tarifi - Jasper Uijlings - Chen Sun - Dave Marwood - Joel Shor - Noah Snavely - Stefan Popov - George Papandreou - David Ross - Nick Johnston - Shumeet Baluja - Hyun Oh Song - Sudheendra Vijayanarasimhan - Michele Covell - Jonathan Huang - Saurabh Singh 3D People/VR/AR Part-Time Faculty - Nathan Silberman - Sung Jin Hwang - Chris Bregler (TL) - Abhinav Gupta Event Understanding - Sergio Guadarrama - Avneesh Sud - Irfan Essa - Caroline Pantofaru (TL) - Tyler Zhu - Christian Frueh - Jitendra Malik NN Theorem Proving - Vivek Rathod - Diego Ruspini - Kate Fragkiadaki - Christian Szegedy (TL) - Arthur Wait - Nick Dufour [+ Noah & Vitto] - Alex Alemi - Cheol Park - Nori Kanazawa - Niklas Een - Eric Nichols - Vivek Kwatra - Sarah Loos - Radhika Marvin - Shrenik Lad - Vinay Bettadapura

Part 1: Object Detection
Huang, Rathod, Sun, Zhu, Korattikara, Fathi, Fischer, Wojna, Song, Guadarrama, and Murphy, "Speed/accuracy trade-offs for modern convolutional object detectors", https://arxiv.org/abs/1611.10012

Object Detection
For a given set of object categories, mark each instance with a bounding box and a category label.
[Figure labels: Battery, Bullet]

Object Detection
Can add more object categories (fine-grained recognition).
[Figure labels: AA Battery, 7.62x51mm NATO cartridge, 5.56x45mm NATO cartridge]

Object Detection
Becomes very challenging in complex scenes due to object size, clutter, and partial occlusion.

Object Detection -- Sampling of Key Ideas
 - Dense sliding windows -- searching over x, y, scale
 - Neural net based face detection [Rowley et al., 1995]
 - Classifier cascade, efficient "integral image" features [Viola & Jones, 2001]
 - HoG + SVM for pedestrian detection [Dalal & Triggs, 2005]
 - Deformable part models [Felzenszwalb et al., 2010]
 - Proposals (selective search) vs. sliding windows [e.g., van de Sande et al., 2011] {overcomes issue of densely sampling x, y, scale + aspect ratio}
 - Return of neural nets -- learned feature extractors [Krizhevsky et al., 2012]
 - Current generation of object detectors -- pioneered by Multibox and R-CNN
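As an illustration of the "integral image" idea from the list above, here is a minimal sketch of a summed-area table. It is not the Viola & Jones implementation; the function names `integral_image` and `box_sum` are chosen for this example. The point is that once the table is built, the sum of any rectangle costs four lookups, which is what makes dense sliding-window features cheap.

```python
def integral_image(img):
    """Build a summed-area table with one extra row and column of zeros.

    s[y][x] holds the sum of img over rows [0, y) and columns [0, x).
    """
    h, w = len(img), len(img[0])
    s = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            s[y + 1][x + 1] = s[y][x + 1] + row_sum
    return s


def box_sum(s, x0, y0, x1, y1):
    """Sum of pixels in columns [x0, x1) and rows [y0, y1), using four lookups."""
    return s[y1][x1] - s[y0][x1] - s[y1][x0] + s[y0][x0]


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
s = integral_image(img)
whole = box_sum(s, 0, 0, 3, 3)         # all nine pixels
lower_right = box_sum(s, 1, 1, 3, 3)   # 5 + 6 + 8 + 9
```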

Typical Modern Approach: Predict Region Offset & Classify
Classify regions as foreground (object) or background. Predict offset for positive patches. Classify foreground regions into 1 of C classes (e.g., Lizard: 0.8, Frog: 0.1, Dog: 0.1).
● Predicting bounding box offset is a counterintuitive concept
● How to select the initial boxes (often called anchors)?
  ○ External process (R-CNN)
  ○ Clustering ground truth boxes (Multibox)
  ○ Dense grid (now popular)
● Interesting connection to sliding windows and object proposals
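The offset idea can be sketched as follows. This is a hedged example, not the code used in the talk: it assumes square anchors on a dense grid and the center/size offset parameterization popularized by Faster R-CNN (one of several possible bounding box encodings). The names `dense_anchors`, `encode`, and `decode`, and the example numbers, are illustrative.

```python
import math


def dense_anchors(img_w, img_h, stride, sizes):
    """Place square anchors (cx, cy, w, h) at the center of each grid cell."""
    anchors = []
    for cy in range(stride // 2, img_h, stride):
        for cx in range(stride // 2, img_w, stride):
            for size in sizes:
                anchors.append((float(cx), float(cy), float(size), float(size)))
    return anchors


def encode(box, anchor):
    """Express a box as an offset relative to an anchor (assumed encoding)."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))


def decode(offset, anchor):
    """Invert encode(): turn a predicted offset back into a box."""
    tx, ty, tw, th = offset
    xa, ya, wa, ha = anchor
    return (xa + tx * wa, ya + ty * ha, wa * math.exp(tw), ha * math.exp(th))


# A 64 x 64 image with stride 32 gives a 2 x 2 grid; two sizes per cell.
anchors = dense_anchors(64, 64, stride=32, sizes=(16, 32))
ground_truth = (20.0, 12.0, 24.0, 40.0)   # hypothetical box (cx, cy, w, h)
anchor = anchors[1]                        # the 32 x 32 anchor at (16, 16)
offset = encode(ground_truth, anchor)      # what the network is trained to predict
recovered = decode(offset, anchor)         # what is reported at test time
```

The network regresses `offset` rather than absolute coordinates, so the same predictor can be applied at every grid position, which is the connection to sliding windows noted above.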

Aside: What is a Neural Network?
Numbers you have → Magic box → Numbers you want
Learns from lots of data using gradient and grad student descent

Aside: What is a Neural Network?
Numbers you have (e.g., RGB pixels) → Magic box → [0.01,…,0.76,…, 0.14] (scores for labels such as bicycle, building, forest)
Trained on a large labeled dataset like ImageNet

Aside: What is a Convolutional Neural Network?
Cuboid of numbers (X x Y x D) → CNN → Cuboid of numbers (X' x Y' x D')
● Patch-to-patch mapping
● Shared weights (shift invariant)
● Retinal connectivity (local support)
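The cuboid-to-cuboid mapping can be made concrete with a small pure-Python sketch of a single convolutional layer. It assumes no padding and stride 1, and (as in most deep learning libraries) it computes a cross-correlation. The function name `conv2d` and the toy filters are illustrative only.

```python
def conv2d(x, kernels):
    """Map an X x Y x D cuboid to an (X-K+1) x (Y-K+1) x D' cuboid.

    x:       input indexed as x[i][j][d]
    kernels: list of D' filters, each K x K x D, indexed as k[u][v][d]
    """
    X, Y, D = len(x), len(x[0]), len(x[0][0])
    K = len(kernels[0])
    out = []
    for i in range(X - K + 1):
        row = []
        for j in range(Y - K + 1):
            # The same filters are reused at every position (shared weights),
            # and each output looks only at a K x K patch (local support).
            cell = []
            for k in kernels:
                acc = 0.0
                for u in range(K):
                    for v in range(K):
                        for d in range(D):
                            acc += x[i + u][j + v][d] * k[u][v][d]
                cell.append(acc)
            row.append(cell)
        out.append(row)
    return out


# A 4 x 4 x 1 input of ones and two 3 x 3 x 1 filters.
x = [[[1.0] for _ in range(4)] for _ in range(4)]
box_filter = [[[1.0] for _ in range(3)] for _ in range(3)]   # sums the patch
center_filter = [[[1.0 if (u, v) == (1, 1) else 0.0] for v in range(3)]
                 for u in range(3)]                           # copies the center
y = conv2d(x, [box_filter, center_filter])
shape = (len(y), len(y[0]), len(y[0][0]))
```

Here X = Y = 4 and D = 1 on the input side, and X' = Y' = 2 and D' = 2 on the output side.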

Components of Modern Object Detection Systems
1. Feature Extractor
   Input: RGB pixels
   Output: a feature vector of numbers for each patch
2. Proposal Generator
   Input: feature vector
   Output: objectness classifier -- foreground or background?
   Output: bounding box regression -- where?
3. Box Classifier -- can be combined with (2)
   Input: features for cropped box
   Output: multi-way classifier -- what class is this object?
   Output: bounding box refinement -- how to adjust box to be on object

Object Detection Meta-Architecture Type 1: Single-Shot Detector (SSD) & variants [Liu et al., 2015]

Object Detection Meta-Architecture Type 2: Faster R-CNN & variants [Ren et al., 2015]

Object Detection Meta-Architecture Type 3: Region-Based Fully Convolutional Networks (R-FCN) [Dai et al., 2016]

Wide Choice of Feature Extractors
[Figure: accuracy on ImageNet vs. model size]

Build Your Own Object Detector -- Lots of Combinations!
Meta Architecture
1. SSD
2. Faster R-CNN
3. R-FCN
Feature Extractor
1. Inception Resnet V2
2. Inception V2
3. Inception V3
4. MobileNet
5. Resnet 101
6. VGG 16
Other Important Choices
● Input: low-res, hi-res
● Match: argmax, bipartite,...
● Location loss: smooth L1,...
● Bounding box encoding
● Stride
● # Proposals
● Other hyperparameters...
[Huang et al.] evaluate ~150 combinations in the paper!
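To see how quickly the design space grows, the sketch below enumerates only three of the axes named on this slide. It is an illustration of the combinatorics, not the experiment list from the paper; the dictionary keys are made up for this example, and the paper's ~150 configurations come from varying further choices such as stride and number of proposals.

```python
from itertools import product

meta_architectures = ["SSD", "Faster R-CNN", "R-FCN"]
feature_extractors = [
    "Inception Resnet V2", "Inception V2", "Inception V3",
    "MobileNet", "Resnet 101", "VGG 16",
]
input_resolutions = ["low-res", "hi-res"]

# Every (meta architecture, feature extractor, input resolution) triple.
configs = [
    {"meta": m, "features": f, "input": r}
    for m, f, r in product(meta_architectures, feature_extractors, input_resolutions)
]
num_configs = len(configs)  # 3 * 6 * 2
```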

mAP vs. Computation: Optimality "Frontier"
[Figure: mAP vs. computation]
● Models below the curve are generally dominated, both in accuracy & speed
● Focus discussion on the ones close to the curve
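The notion of a dominated model can be sketched in code. This is a minimal example under stated assumptions: each model is summarized by a running time and an mAP, and the model names and numbers below are invented for illustration and are not results from the paper.

```python
def pareto_frontier(models):
    """Return the models that no other model beats on both speed and accuracy.

    models: list of (name, time_ms, mAP). Lower time and higher mAP are better.
    """
    frontier = []
    best_map = float("-inf")
    # Visit models from fastest to slowest (ties: most accurate first).
    # A model stays only if it is more accurate than every faster model.
    for name, time_ms, m_ap in sorted(models, key=lambda r: (r[1], -r[2])):
        if m_ap > best_map:
            frontier.append((name, time_ms, m_ap))
            best_map = m_ap
    return frontier


# Hypothetical (name, time in ms, mAP) measurements.
models = [
    ("A", 30, 20.0),
    ("B", 50, 19.0),   # slower and less accurate than A: dominated
    ("C", 80, 28.0),
    ("D", 120, 27.0),  # slower and less accurate than C: dominated
    ("E", 200, 33.0),
]
frontier_names = [name for name, _, _ in pareto_frontier(models)]
```

Models B and D sit below the curve traced by A, C, and E, which is the sense in which they are dominated.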

mAP vs. Computation: Meta architecture
● SSD models are fastest
● Faster R-CNN is slow but more accurate
● Dropping #proposals makes Faster R-CNN fast w/o much mAP drop
● R-FCN is close to that sweet spot
