1. Paper Motivation
● Fixed geometric structures of CNN models – "CNNs are inherently limited to model geometric transformations"
● Higher-level features combine lower-level features at fixed positions as a weighted sum
● Pooling selects the dominant features / averages features at fixed positions
Tomas Jenicek, CMP, CVUT

2. Invariance to Geometric Transformations
● Learned from data augmentation
● Using transformation-invariant features and algorithms
● "Unknown or complex geometric transformations not learned or modeled"

3. Standard Convolution and RoI Pooling
● Convolution samples the feature map at fixed locations
● RoI pooling reduces the spatial resolution at a fixed ratio
● "The higher the layer, the less desired behaviour"
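The fixed sampling the slide criticizes can be written out directly: each output value is a weighted sum of input values at positions given by a regular grid R around the output location. A minimal NumPy sketch (function name and shapes are illustrative, not from the paper):

```python
import numpy as np

def conv2d_fixed_grid(x, w):
    """Standard 2D convolution (correlation form): each output combines
    inputs at fixed positions given by the regular grid R, e.g.
    R = {(-1,-1), ..., (1,1)} for a 3x3 kernel. No padding, stride 1."""
    k = w.shape[0]
    grid = [(i - k // 2, j - k // 2) for i in range(k) for j in range(k)]
    h, wd = x.shape
    out = np.zeros((h - k + 1, wd - k + 1))
    for p0_y in range(out.shape[0]):
        for p0_x in range(out.shape[1]):
            c_y, c_x = p0_y + k // 2, p0_x + k // 2  # centre of the grid
            out[p0_y, p0_x] = sum(
                w[dy + k // 2, dx + k // 2] * x[c_y + dy, c_x + dx]
                for dy, dx in grid
            )
    return out
```

Deformable convolution keeps this weighted-sum form and only perturbs the grid positions.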

4. Deformable Convolution
● Adds a 2D offset to the regular grid sampling locations
● Free-form deformation of the sampling grid

5. Deformable Convolution
● Offsets are learned from the preceding feature maps via additional convolutional layers

6. Deformable RoI Pooling
● Adds a 2D offset to each bin position in the regular bin partition
● Adaptive part localization for objects with different shapes

7. Deformable RoI Pooling
● Offsets are learned from the preceding feature maps via an additional RoI pooling step and a fully connected layer
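A rough sketch of the pooling side, assuming square bins and the gamma = 0.1 scaling of normalized offsets mentioned in the paper-evaluation slide. For simplicity this rounds to integer positions, whereas the paper samples bin contents with bilinear interpolation; all names are illustrative:

```python
import numpy as np

def deformable_roi_pool(fmap, roi, k, bin_offsets, gamma=0.1):
    """Average-pool an RoI into k x k bins, shifting each bin by a learned
    offset. Offsets are normalized by RoI size and scaled by gamma
    (empirically 0.1). bin_offsets has shape (k, k, 2) as (dy, dx)."""
    y0, x0, y1, x1 = roi
    bh, bw = (y1 - y0) / k, (x1 - x0) / k
    out = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            dy, dx = bin_offsets[i, j]
            # normalized offset -> pixels, added to the regular bin corner
            sy = int(round(y0 + i * bh + gamma * dy * (y1 - y0)))
            sx = int(round(x0 + j * bw + gamma * dx * (x1 - x0)))
            ys = slice(max(sy, 0), max(sy + int(round(bh)), 1))
            xs = slice(max(sx, 0), max(sx + int(round(bw)), 1))
            out[i, j] = fmap[ys, xs].mean()
    return out
```

With zero offsets this is plain average RoI pooling; nonzero offsets let each bin slide toward the object part it covers.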

8. Deformable Position-Sensitive RoI Pooling
● Differs by using a separate set of feature maps for each bin position

9. Deformable Convolution and RoI Pooling Summary
● Inference: offsets depend on the input features
● Learning: offsets are learned from data
● Filters are differentiable

10. Method Details
● Offsets are fractional → bilinear interpolation
● For (PS) RoI pooling, normalized offsets must be used
● The number of additional parameters:
  – Convolution and RoI pooling:
  – PS RoI pooling:
● The learning rate for the offsets can differ from that of the other layers
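The bilinear interpolation used for fractional offsets can be sketched as follows: the value at a fractional location is a weighted sum of the four surrounding integer positions, which is what keeps the sampling differentiable with respect to the offsets (a minimal single-channel sketch, clamped at the border):

```python
import numpy as np

def bilinear(fmap, y, x):
    """Feature-map value at a fractional location (y, x): the weighted
    sum of the four surrounding integer positions. Weights are linear
    in y and x, so gradients flow back to the offsets."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, fmap.shape[0] - 1), min(x0 + 1, fmap.shape[1] - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * fmap[y0, x0]
            + (1 - wy) * wx * fmap[y0, x1]
            + wy * (1 - wx) * fmap[y1, x0]
            + wy * wx * fmap[y1, x1])
```

On a linear ramp the interpolation is exact, which makes it easy to sanity-check.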

11. PS RoI Offsets Examples
● One 3x3 deformable PS RoI pooling layer
● Input: a bounding box with a label

12. PS RoI Offsets Examples

13. Conv Offsets Examples
● Three consecutive 3x3 deformable convolutional layers = 9^3 = 729 points

14. Conv Example – Man and a Goat
● Blue dots – standard convolution sampling locations
● Red dots – deformable convolution sampling locations
● For 1, 2 and 3 consecutive layers

15. Conv Example – Man and a Goat
● Center of the convolution on a man, the sky and grass
● For 3 consecutive layers

16. Conv Example – Man and a Goat
● The magnitude of the offsets
● For 3 consecutive layers – res5a, res5b and res5c

17. Conv Example – Man and a Goat
● The anisotropic scale HSV visualization
● Red – horizontal, green – vertical
● For 3 consecutive layers

18. Conv Example – Man and a Goat
● HSV visualization of the offsets
● For 3 consecutive layers

19. Conv Example – Cars
● The magnitude of the offsets
● For 3 consecutive layers
● The foreground-background separation can be seen

20. Affine Transformation Approximation
● The "unknown and complex" transformation was approximated by an affine transformation
● Format is MEAN (STD); the first value is the vertical axis
● Unit is pixels in the feature map

                      Man and a Goat        Cars
Mean squared error    3.1 (1.5)             2.7 (1.4)
Scale                 3.4, 3.7 (0.8, 1.1)   2.9, 3.6 (1.0, 1.1)
Translation           0.8, 0.0 (1.3, 0.2)   0.3, 0.0 (1.2, 0.1)
Rotation              -0.1 (0.0)            -0.1 (0.0)
Shear                 0.0 (0.0)             0.0 (0.0)

● Other tested images had similar results
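An affine approximation like the one on this slide can be obtained by a least-squares fit of the deformed sampling locations against the regular grid points. A sketch of such a fit (not the authors' code; names are illustrative):

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine map dst ~ A @ src + t.
    src, dst: (N, 2) arrays of 2D points (e.g. regular grid locations
    vs. deformed sampling locations). Returns A, t and the RMS residual,
    which measures how well an affine model explains the deformation."""
    ones = np.ones((src.shape[0], 1))
    X = np.hstack([src, ones])               # (N, 3) design matrix
    # Solve X @ P = dst; P stacks [A^T; t]
    P, *_ = np.linalg.lstsq(X, dst, rcond=None)
    A, t = P[:2].T, P[2]
    resid = dst - (src @ A.T + t)
    return A, t, float(np.sqrt((resid ** 2).mean()))
```

A small residual (as in the table) indicates the learned deformation is close to affine; a large one means the offsets exploit non-affine freedom.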

21. Statistics of Learned Scale – Effective Dilation
● The mean of the distances between all adjacent pairs of sampling locations in the deformable convolution filter
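The effective-dilation statistic can be computed directly from the sampling locations; a sketch assuming row-major k×k ordering and 4-neighbour adjacency (the exact adjacency convention is an assumption here):

```python
import numpy as np

def effective_dilation(points, k=3):
    """Mean Euclidean distance between all horizontally and vertically
    adjacent pairs of the k x k sampling locations (row-major order).
    For a regular k x k grid with dilation d this returns d."""
    pts = np.asarray(points, float).reshape(k, k, 2)
    dists = []
    for i in range(k):
        for j in range(k):
            if j + 1 < k:
                dists.append(np.linalg.norm(pts[i, j + 1] - pts[i, j]))
            if i + 1 < k:
                dists.append(np.linalg.norm(pts[i + 1, j] - pts[i, j]))
    return float(np.mean(dists))
```

Larger objects pushing the sampling points apart thus show up directly as a larger effective dilation.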

22. Remarks
● The shift is a function of the feature maps and is not constrained by any (e.g. affine) transformation
● Surprisingly, no shift regularization is needed

23. Relation to Deformable Part Models
● Maximize the similarity of parts while minimizing the inter-part connection cost
● Inference can be converted to a CNN; learning is not end-to-end
● Deformable convolutions: no spatial relations between parts, unlimited in modeling deformations

24. Relation to Spatial Transformer Networks
1. Localization net
   ● Input: feature map
   ● Output: affine transformation
2. Grid generator
   ● Generates a sampling grid according to the transformation
3. Sampler

25. Relation to Spatial Transformer Networks
● Can be inserted between any two layers
● Deformable convolutions:
  – No global parametric transformation
  – Easier training

26. Relation to Atrous / Dilated Convolutions
● Exponential expansion of the receptive field
● Deformable convolutions: an input-dependent, learnable dilated convolution
● Both can replace filters with a larger receptive field while constraining their connectivity
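The exponential receptive-field growth of stacked dilated convolutions follows from rf = 1 + Σ dᵢ·(kᵢ − 1) for stride-1 layers; a small helper to illustrate the comparison:

```python
def receptive_field(kernel_sizes, dilations):
    """Receptive field of a stack of stride-1 dilated convolutions:
    rf = 1 + sum(d * (k - 1)) over the layers. Exponentially growing
    dilations (1, 2, 4, ...) give exponential receptive-field growth
    at constant parameter count."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += d * (k - 1)
    return rf
```

Three plain 3x3 layers reach a 7x7 receptive field, while the same three layers with dilations 1, 2, 4 reach 15x15; a deformable convolution instead lets each location pick its own, input-dependent "dilation".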

27. Relation to Active Convolution
● Learns the shape of the convolution during training
● Deformable convolutions: input-dependent offsets

28. Relation to Dynamic Filter Networks
● Weights for the convolution are generated from the input feature map
● Deformable convolutions: the same, but for offsets

29. Their Task
● Semantic segmentation
● Object detection

30. Their Setup
State-of-the-art object detection and semantic segmentation CNNs:
1. A deep network generates feature maps
   – Replace the last 3 conv layers with deformable ones
2. A shallow task-specific network generates results
   – Replace (PS) RoI pooling with the deformable variant
Convolutions and offsets are learned simultaneously

31. Results
● Object detection
  – VOC 07: 82.3 vs. 79.6 mAP@0.5
  – COCO: 56.8 vs. 54.3 mAP@0.5
● Semantic segmentation
  – Cityscapes: 75.2 vs. 70.3 mIoU
  – VOC 12: 75.9 vs. 70.7 mIoU
● Others' results
  – COCO (with Soft-NMS): 62.8 mAP@0.5

32. Paper Evaluation – Formal Objections
● Page 2, formula (2) – notation is misleading since depends on
● Page 3, paragraph 3 – a scalar gamma further scales the normalized offsets, empirically set to 0.1
● Page 5, figure 4 – the figure is misleading; the output feature map has depth (C+1)

33. Paper Evaluation – Subjective Objections
● Page 3, paragraphs 1 and 2 – the notation is ambiguous
● An application to max pooling is missing

34. References
● Jaderberg, Max, Karen Simonyan, and Andrew Zisserman. "Spatial transformer networks." Advances in Neural Information Processing Systems. 2015.
● Jeon, Yunho, and Junmo Kim. "Active Convolution: Learning the Shape of Convolution for Image Classification." arXiv preprint arXiv:1703.09076 (2017).
● Yu, Fisher, and Vladlen Koltun. "Multi-scale context aggregation by dilated convolutions." arXiv preprint arXiv:1511.07122 (2015).
● Felzenszwalb, Pedro F., et al. "Object detection with discriminatively trained part-based models." IEEE Transactions on Pattern Analysis and Machine Intelligence 32.9 (2010): 1627-1645.
● De Brabandere, Bert, et al. "Dynamic filter networks." Neural Information Processing Systems (NIPS). 2016.
