Turbo-boosting Neural Networks for Object Detection - Hongyang Li

S9551 | Mar 20, 2019 | 14:00, RM 231 | Turbo-boosting Neural Networks for Object Detection | Hongyang Li, The Chinese University of Hong Kong / Microsoft Research Asia

Research Timeline (first-author papers): Hongyang Ph.D. student start 2015 CUHK Ph.D. candidate / ImageNet Challenge (PAMI), Object Attributes (ICCV) 2015 Microsoft Intern Multi-bias Activation (ICML) 2016 Recurrent Design for Detection (ICCV), COCO Loss (NIPS) 2017 2018 Zoom-out-and-in Network (IJCV), Capsule Nets (ECCV) 2019 Feature Intertwiner (ICLR), Few-shot Learning (CVPR)

Outline
1. Introduction to Object Detection
   a. Pipeline overview
   b. Dataset and evaluation
   c. Popular methods
   d. Existing problems
2. Solution: A Feature Intertwiner Module
3. Detection in Reality
   a. Implementation on GPUs
   b. Efficiency and accuracy tradeoff
4. Future of Object Detection

1. Introduction to Object Detection

Object Detection: a core and fundamental task in computer vision. (Figure: He et al., Mask-RCNN, ICCV 2017 Best paper.)

Object Detection is everywhere

How to solve it? A naive solution: place many boxes on top of the image/feature maps and classify each of them ("person" or "not person").

How to solve it? And yet the challenges are:
1. Variations in shape/appearance/size
2. Ambiguity in cluttered scenarios
(Figure labels: helmet, cotton hat, baseball, person.)

How to solve it? (a) Place as many anchors as possible, and (b) make the network deeper and deeper. In short: (a) anchor placement, (b) network design.
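To make "place as many anchors as possible" concrete, here is a minimal NumPy sketch of dense anchor generation. The function name `make_anchors`, the stride, and the particular scales and aspect ratios are illustrative assumptions, not values taken from the talk.

```python
import numpy as np

def make_anchors(feat_h, feat_w, stride, scales=(32, 64), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as an (N, 4) array of (x1, y1, x2, y2) boxes in image pixels.

    Hypothetical helper: one anchor per (cell, scale, ratio) combination.
    """
    # Centre of every feature-map cell, expressed in image pixels.
    cx = (np.arange(feat_w) + 0.5) * stride
    cy = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(cx, cy)                            # both (feat_h, feat_w)
    centers = np.stack([cx.ravel(), cy.ravel()], axis=1)    # (cells, 2)

    # One (w, h) pair per scale/ratio: area = scale**2 and h / w = ratio.
    shapes = np.array([(s / np.sqrt(r), s * np.sqrt(r))
                       for s in scales for r in ratios])    # (A, 2)

    half = shapes[None, :, :] / 2.0                         # (1, A, 2)
    c = centers[:, None, :]                                 # (cells, 1, 2)
    boxes = np.concatenate([c - half, c + half], axis=2)    # (cells, A, 4)
    return boxes.reshape(-1, 4)

anchors = make_anchors(feat_h=4, feat_w=6, stride=16)
print(anchors.shape)
```

Even this tiny 4 x 6 feature map yields 4 * 6 * 6 anchors, which is why the number of boxes to classify grows quickly with image size and the number of levels.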

Popular methods at a glance
Pipeline/system design:
   One-stage: YOLO and variants; SSD and variants
   Two-stage: R-CNN family (Fast RCNN, Faster RCNN, etc)
Component/structure/loss design:
   Feature Pyramid Network
   Focal loss (RetinaNet)
   Online hard example mining (OHEM)
   Zoom-out-and-in Network (ours)
   Recurrent Scale Approximation (ours)
   Feature Intertwiner (ours)

Pipeline: a roadmap of R-CNN family (two-stage detector)
P_l is the feature map output at level l; P_m is from a higher level m.
Small anchors are cropped out of P_l and passed through the RoI operation, which produces a fixed-size output.
Large anchors are cropped out of P_m and passed through the RoI operation in the same way.
The RoI outputs are fed to the subsequent layers, which produce the detection ("Person detected!").
An RPN loss is attached at each level.

Side: what is the RoI (region of interest) operation? It takes a feature map of arbitrary size and produces an output of fixed size. This is achieved by pooling; there are no learned parameters here. There are many variants of RoI operations.
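As a rough illustration of "arbitrary size in, fixed size out, no learned parameters", the sketch below implements one simple variant: adaptive max pooling over a cropped region. The function name `roi_max_pool` and the integer box format are assumptions made for this example; practical detectors typically use more refined variants.

```python
import numpy as np

def roi_max_pool(feature_map, box, out_size):
    """Crop box = (x1, y1, x2, y2) from a (C, H, W) map and max-pool it to
    (C, out_size, out_size). Works whether the crop is larger or smaller
    than out_size; there is nothing to learn here."""
    x1, y1, x2, y2 = box
    crop = feature_map[:, y1:y2, x1:x2]
    channels, h, w = crop.shape
    out = np.empty((channels, out_size, out_size), dtype=crop.dtype)
    for i in range(out_size):
        # Bin i covers rows [floor(i*h/out), ceil((i+1)*h/out)), never empty.
        r0 = (i * h) // out_size
        r1 = -((-(i + 1) * h) // out_size)
        for j in range(out_size):
            c0 = (j * w) // out_size
            c1 = -((-(j + 1) * w) // out_size)
            out[:, i, j] = crop[:, r0:r1, c0:c1].max(axis=(1, 2))
    return out

fm = np.arange(2 * 40 * 40, dtype=float).reshape(2, 40, 40)
large = roi_max_pool(fm, (0, 0, 40, 40), out_size=20)   # 40 x 40 crop
small = roi_max_pool(fm, (3, 5, 10, 12), out_size=20)   # 7 x 7 crop
print(large.shape, small.shape)
```

Both crops come out with the same spatial size, which is what lets a single set of later layers process regions of any size.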

R-CNN family (two-stage detector) vs. YOLO (one-stage detector)
Two stage (R-CNN family): image size can vary. The RPN solves a two-class cls. problem (object or not?); after the RoI operation comes a K-class cls. problem (dog, cat, etc). More accurate.
One stage (YOLO/SSD): image size can NOT vary. Multiple K-class classifiers (dog, cat, etc). Faster.

Both R-CNN and SSD models have been widely adopted in academia and industry. In this talk, we focus on the two-stage detector with the RoI operation.

Datasets
COCO dataset: http://mscoco.org/
YouTube-8M dataset: https://research.google.com/youtube8m/
And many others: ImageNet, VisualGenome, Pascal VOC, KITTI, etc.

Evaluation - mean AP
For the category person: get a set of correct/incorrect predictions, compute the precision/recall, and get the average precision (AP) from the precision/recall figure. Done.
Average over all categories: that's mAP (under a given threshold).
A prediction is counted as correct if its IoU (intersection / union) with the ground truth exceeds the threshold, e.g. IoU = 0.65 > threshold.
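The two ingredients of this metric can be sketched in a few lines. This is a simplified version under stated assumptions: boxes are (x1, y1, x2, y2), each prediction has already been marked correct or incorrect by the IoU test, and AP is the area under the precision/recall curve with all-point interpolation. Benchmarks differ in the exact interpolation and matching rules, so treat the function names and details as illustrative.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def average_precision(scores, is_correct, num_gt):
    """AP for one category from per-prediction scores and correct/incorrect flags."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    hits = np.asarray(is_correct, dtype=float)[order]
    tp = np.cumsum(hits)            # correct predictions so far
    fp = np.cumsum(1.0 - hits)      # incorrect predictions so far
    recall = tp / num_gt
    precision = tp / (tp + fp)
    # Make precision non-increasing from right to left, then integrate over recall.
    for k in range(len(precision) - 2, -1, -1):
        precision[k] = max(precision[k], precision[k + 1])
    prev_recall = np.concatenate([[0.0], recall[:-1]])
    return float(np.sum((recall - prev_recall) * precision))

overlap = iou((0, 0, 10, 10), (5, 0, 15, 10))
ap_person = average_precision(scores=[0.9, 0.8, 0.7], is_correct=[1, 0, 1], num_gt=2)
print(overlap, ap_person)
```

mAP is then the mean of the per-category AP values, for example `np.mean([ap_person, ap_dog, ...])` over all categories.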

What is uncomfortable in current pipelines? Assume the RoI's output size is 20.
Large objects: RoI input 40 → 20. Accurate features, obtained by down-sampling!
Small objects: RoI input 7 → 20. Inaccurate features, due to up-sampling!
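A quick way to see why the 7 → 20 case is the problematic one is to look at which input cells feed each output cell. The sketch below assumes nearest-neighbour resampling purely for illustration (real RoI operations usually interpolate), and `nearest_resize_indices` is a hypothetical helper.

```python
import numpy as np

def nearest_resize_indices(in_size, out_size):
    """Source index read by each output cell under nearest-neighbour resizing."""
    return (np.arange(out_size) * in_size) // out_size

small = nearest_resize_indices(7, 20)    # small object: up-sampling
large = nearest_resize_indices(40, 20)   # large object: down-sampling

# Up-sampling: 20 output cells, but only 7 distinct sources, so values repeat.
print(len(np.unique(small)), small)
# Down-sampling: every output cell has its own source region to summarise.
print(len(np.unique(large)), large)
```

In the small-object case the output has no more information than the 7 input cells it was stretched from, which is the sense in which its features are less reliable.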

What percentage of objects suffer from this? See Table 3 in our paper: proposal assignment on each level before the RoI operation. 'below #' indicates how many proposals have a size below the size of the RoI output. We define the small set to be the anchors on the current level and the large set to be all anchors above the current level.

2. Solution: A Feature Intertwiner Module

Our assumption: the semantic features among instances (large or small) within the same class should be the same. (Figure: the visual features differ, but the semantic features are the same!)

Our motivation: suppose we already have two sets of features, one from large objects and the other from small ones (inaccurate maps/features). Naive feature intertwiner concept, the intuition: let the reliable features supervise/guide the learning of the less reliable ones.

The Feature Intertwiner, for small objects (at the current level l). Make-up layer (one conv. layer): feeds back the information lost during the RoI operation and compensates the necessary details for small instances. Outputs: cls. loss and reg. loss (bbox).

The Feature Intertwiner, for large objects (at the current level l). Critic layer (two conv. layers): transfers the features to a larger channel size and reduces the spatial size to one. Its output is the input to the intertwiner and its intertwiner loss. Other outputs: cls. loss and reg. loss (bbox).

The Feature Intertwiner (at the current level l): total loss = (intertwiner + cls. + reg.), summed over all levels.
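The slides do not spell out the form of the intertwiner loss, so the following is only a sketch under loud assumptions: the critic is replaced by a stand-in (global average pooling plus a linear projection instead of two conv. layers), and the intertwiner loss is taken to be a squared distance between each small-object feature and the mean large-object feature of the same class. All function names are hypothetical.

```python
import numpy as np

def critic(roi_feats, weight):
    """Stand-in critic: (N, C, S, S) -> (N, D), i.e. spatial size one and D > C
    channels. ASSUMPTION: pooling + projection instead of two conv. layers."""
    pooled = roi_feats.mean(axis=(2, 3))    # (N, C)
    return pooled @ weight                  # (N, D)

def intertwiner_loss(small_feats, small_labels, large_feats, large_labels):
    """ASSUMED form: mean squared distance between small-object features and
    the mean large-object feature of the same class."""
    losses = []
    for cls in np.unique(small_labels):
        ref = large_feats[large_labels == cls]
        if len(ref) == 0:
            continue                        # no large instance of this class here
        diff = small_feats[small_labels == cls] - ref.mean(axis=0)
        losses.append(np.mean(np.sum(diff ** 2, axis=1)))
    return float(np.mean(losses)) if losses else 0.0

def total_loss(per_level):
    """Total loss = (intertwiner + cls. + reg.) summed over all levels."""
    return sum(lv["intertwiner"] + lv["cls"] + lv["reg"] for lv in per_level)

rng = np.random.default_rng(0)
weight = rng.normal(size=(8, 32))                           # C = 8 -> D = 32
small = critic(rng.normal(size=(5, 8, 20, 20)), weight)     # small-object RoIs
large = critic(rng.normal(size=(3, 8, 20, 20)), weight)     # large-object RoIs
inter = intertwiner_loss(small, np.array([0, 0, 1, 1, 2]),
                         large, np.array([0, 1, 1]))
print(total_loss([{"intertwiner": inter, "cls": 0.7, "reg": 0.3}]))
```

Note that class 2 has no large instance in this toy batch and is simply skipped; handling that case more gracefully is what motivates keeping a record of large-object features across batches.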

The Feature Intertwiner: anchors are placed at various levels. What if there are no large instances in this mini-batch for the current level? (Recall: the small set is the anchors on the current level, and the large set is all anchors above the current level.)

The Feature Intertwiner - class buffer
We use a class buffer to store the accurate feature set from large instances. How to generate the buffer? One simple idea is to take the average of the features of all large objects during training.
(Diagram: a historical logger collects features from all levels, Level 2, Level 3, ..., and feeds the intertwiner loss at level l.)
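The "average of all large-object features" idea can be sketched as a per-class running mean. The class name `ClassBuffer` and its methods are hypothetical; the buffer stores plain arrays, so nothing is back-propagated through it in this sketch.

```python
import numpy as np

class ClassBuffer:
    """Running per-class mean of large-object features (illustrative only)."""

    def __init__(self, num_classes, feat_dim):
        self.total = np.zeros((num_classes, feat_dim))
        self.count = np.zeros(num_classes, dtype=np.int64)

    def update(self, feats, labels):
        # np.add.at accumulates correctly when a class repeats within a batch.
        np.add.at(self.total, labels, feats)
        np.add.at(self.count, labels, 1)

    def get(self, cls):
        """Mean feature for a class, or None if no large instance was seen yet."""
        if self.count[cls] == 0:
            return None
        return self.total[cls] / self.count[cls]

buf = ClassBuffer(num_classes=3, feat_dim=2)
buf.update(np.array([[1.0, 1.0], [3.0, 3.0]]), np.array([0, 0]))   # mini-batch 1
buf.update(np.array([[5.0, 5.0]]), np.array([0]))                  # mini-batch 2
print(buf.get(0), buf.get(1))
```

Because the buffer persists across mini-batches and is fed from all levels, a small object can still be given a target even when its own mini-batch contains no large instance of that class.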

Discussions on Feature Intertwiner
● The intertwiner is proposed to optimize the feature learning of the less reliable set. During test (inference), the green part of the diagram will be removed.
● It can be seen as a teacher-student guidance in the self-supervised domain.
● Detaching the gradient update in the buffer obtains better results. "Soft targets", similarly as in RL (replay memory).
● The buffer is level-agnostic. Improvements over all levels/sizes of objects are observed.

The Feature Intertwiner - choosing optimal feature maps
How to choose the appropriate maps for large objects, as input to the intertwiner?
(a) One simple solution is to use the feature map directly on the current level. This is inappropriate. Why?
(b) Use the feature maps on a higher level.
(c) Upsample higher-level maps to the current level, with learnable parameters (or not).
(d) Our final option, based on (c): build a better alignment between the upsampled feature map and the current map.
We will empirically analyze these later.
The approach for (d) is optimal transport (OT). In a nutshell, OT optimally moves one distribution (P_m|l) to the other (P_l). Q is a cost matrix (distance); P is a proxy matrix satisfying some constraint.

The Feature Intertwiner - choosing optimal feature maps
How to compute it: optimal transport (OT). (Diagram labels: P_m, F, H, cost matrix Q, Sinkhorn iterate, P, OT loss.)
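For readers unfamiliar with the "Sinkhorn iterate" step, here is a generic NumPy sketch of entropy-regularised OT. It is not taken from the talk: the squared Euclidean cost, the uniform marginals `a` and `b`, the regularisation strength `eps`, and the toy feature sets are all assumptions. Only the names Q (cost matrix) and P (transport plan) follow the slides.

```python
import numpy as np

def sinkhorn(Q, a, b, eps=0.2, n_iters=500):
    """Return a plan P >= 0 whose rows sum to a and whose columns sum to b,
    approximately minimising sum(P * Q) (entropy-regularised OT)."""
    K = np.exp(-Q / eps)                # Gibbs kernel of the cost matrix
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)                 # rescale rows to match a
        v = b / (K.T @ u)               # rescale columns to match b
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))             # toy stand-in for features of P_m|l
y = rng.normal(size=(4, 3))             # toy stand-in for features of P_l
Q = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)   # pairwise cost
Q = Q / Q.max()                         # keep exp(-Q / eps) well-conditioned
a = np.full(5, 1.0 / 5)                 # ASSUMED uniform marginals
b = np.full(4, 1.0 / 4)

P = sinkhorn(Q, a, b)
ot_loss = float((P * Q).sum())          # transport cost under the plan
print(ot_loss)
```

Every step is a matrix product or an element-wise division, so the iteration is differentiable and cheap, which is one reason it is a convenient way to turn an OT problem into a training loss.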
