
Lecture 13: Segmentation and Attention
Fei-Fei Li & Andrej Karpathy & Justin Johnson
24 Feb 2016

Administrative: Assignment 3 due


  1. Semantic Segmentation: Upsampling (Long, Shelhamer, and Darrell, “Fully Convolutional Networks for Semantic Segmentation”, CVPR 2015)

  2. Semantic Segmentation: Upsampling. Learnable upsampling!

  3. Semantic Segmentation: Upsampling.

  4. Semantic Segmentation: Upsampling. “Skip connections”

  5. Semantic Segmentation: Upsampling. “Skip connections”: skip connections = better results.
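The skip-connection recipe on these slides comes down to one operation: upsample the coarse, deep-layer class scores and add the finer, shallow-layer scores. A minimal numpy sketch, with toy arrays and nearest-neighbor upsampling standing in for the FCN's learned layers (all names here are illustrative, not from the lecture):

```python
import numpy as np

def upsample_nn(x, factor):
    # Nearest-neighbor upsampling: repeat each element along both spatial axes.
    return x.repeat(factor, axis=0).repeat(factor, axis=1)

# Toy per-pixel scores from two depths of the network (stand-ins):
coarse = np.array([[1.0, 2.0],
                   [3.0, 4.0]])   # deep layer: low resolution
fine = np.ones((4, 4))            # shallower layer: higher resolution

# Skip connection: upsample the coarse scores, then add the fine ones.
fused = upsample_nn(coarse, 2) + fine
print(fused.shape)  # (4, 4)
```

In the FCN paper the upsampling is a learned transposed convolution rather than nearest-neighbor, but the fusion is the same elementwise sum.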

  6. Learnable Upsampling: “Deconvolution”. Typical 3 x 3 convolution, stride 1, pad 1. Input: 4 x 4. Output: 4 x 4.

  7. Learnable Upsampling: “Deconvolution”. Typical 3 x 3 convolution, stride 1, pad 1. Dot product between filter and input. Input: 4 x 4. Output: 4 x 4.

  8. Learnable Upsampling: “Deconvolution”. Typical 3 x 3 convolution, stride 1, pad 1. Dot product between filter and input. Input: 4 x 4. Output: 4 x 4.

  9. Learnable Upsampling: “Deconvolution”. Typical 3 x 3 convolution, stride 2, pad 1. Input: 4 x 4. Output: 2 x 2.

  10. Learnable Upsampling: “Deconvolution”. Typical 3 x 3 convolution, stride 2, pad 1. Dot product between filter and input. Input: 4 x 4. Output: 2 x 2.

  11. Learnable Upsampling: “Deconvolution”. Typical 3 x 3 convolution, stride 2, pad 1. Dot product between filter and input. Input: 4 x 4. Output: 2 x 2.
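The strided convolution on slides 9-11 can be checked with a naive implementation: each output element is the dot product of the filter with one input patch, and stride 2 with pad 1 maps a 4 x 4 input to a 2 x 2 output. A minimal single-channel numpy sketch (not the lecture's code):

```python
import numpy as np

def conv2d(x, w, stride=1, pad=0):
    # Naive single-channel convolution (cross-correlation, as in conv nets):
    # each output element is the dot product of the filter with an input patch.
    x = np.pad(x, pad)
    k = w.shape[0]
    out_size = (x.shape[0] - k) // stride + 1
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            patch = x[i*stride:i*stride+k, j*stride:j*stride+k]
            out[i, j] = np.sum(patch * w)
    return out

x = np.random.randn(4, 4)
w = np.random.randn(3, 3)
print(conv2d(x, w, stride=1, pad=1).shape)  # (4, 4): output matches input
print(conv2d(x, w, stride=2, pad=1).shape)  # (2, 2): stride 2 downsamples
```

The general shape rule this implements is floor((H + 2*pad - k) / stride) + 1, so (4 + 2 - 3) / 2 + 1 = 2 for the stride-2 case.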

  12. Learnable Upsampling: “Deconvolution”. 3 x 3 “deconvolution”, stride 2, pad 1. Input: 2 x 2. Output: 4 x 4.

  13. Learnable Upsampling: “Deconvolution”. 3 x 3 “deconvolution”, stride 2, pad 1. Input gives weight for filter. Input: 2 x 2. Output: 4 x 4.

  14. Learnable Upsampling: “Deconvolution”. 3 x 3 “deconvolution”, stride 2, pad 1. Input gives weight for filter. Input: 2 x 2. Output: 4 x 4.

  15. Learnable Upsampling: “Deconvolution”. 3 x 3 “deconvolution”, stride 2, pad 1. Input gives weight for filter; sum where outputs overlap. Input: 2 x 2. Output: 4 x 4.

  16. Learnable Upsampling: “Deconvolution”. 3 x 3 “deconvolution”, stride 2, pad 1. Input gives weight for filter; sum where outputs overlap. Same as backward pass for normal convolution! Input: 2 x 2. Output: 4 x 4.

  17. Learnable Upsampling: “Deconvolution”. 3 x 3 “deconvolution”, stride 2, pad 1. Input gives weight for filter; sum where outputs overlap. Same as backward pass for normal convolution! “Deconvolution” is a bad name, already defined as “inverse of convolution”. Better names: convolution transpose, backward strided convolution, 1/2 strided convolution, upconvolution. Input: 2 x 2. Output: 4 x 4.
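The copy-and-sum picture on slides 12-17 can be written out directly: each input value scales one copy of the filter, the stride controls where the copies land, copies are summed where they overlap, and the padding is cropped off at the end. This is exactly the backward pass of the strided convolution above. A minimal single-channel numpy sketch (the helper name and fixed sizes are illustrative):

```python
import numpy as np

def conv2d_transpose(y, w, stride=2, pad=1, out_size=4):
    # Each input value weights one copy of the filter ("input gives weight
    # for filter"); overlapping copies are summed; padding is cropped off.
    k = w.shape[0]
    full = np.zeros((out_size + 2 * pad, out_size + 2 * pad))
    for i in range(y.shape[0]):
        for j in range(y.shape[1]):
            full[i*stride:i*stride+k, j*stride:j*stride+k] += y[i, j] * w
    return full[pad:pad + out_size, pad:pad + out_size]

y = np.ones((2, 2))           # 2 x 2 input
w = np.ones((3, 3))           # 3 x 3 filter
z = conv2d_transpose(y, w)    # 4 x 4 output, as on the slides
print(z.shape)  # (4, 4)
```

With all-ones input and filter, interior positions of `z` are larger than the corners because several filter copies overlap and sum there.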

  18. Learnable Upsampling: “Deconvolution”. “Deconvolution” is a bad name, already defined as “inverse of convolution”. Better names: convolution transpose, backward strided convolution, 1/2 strided convolution, upconvolution. (Im et al, “Generating images with recurrent adversarial networks”, arXiv 2016; Radford et al, “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks”, ICLR 2016)

  19. Learnable Upsampling: “Deconvolution”. Great explanation in appendix. (Im et al, arXiv 2016; Radford et al, ICLR 2016)

  20. Semantic Segmentation: Upsampling (Noh et al, “Learning Deconvolution Network for Semantic Segmentation”, ICCV 2015)

  21. Semantic Segmentation: Upsampling. Normal VGG + “upside down” VGG. 6 days of training on a Titan X… (Noh et al, ICCV 2015)

  22. Instance Segmentation

  23. Instance Segmentation. Detect instances, give category, label pixels: “simultaneous detection and segmentation” (SDS). Lots of recent work (MS-COCO). (Figure credit: Dai et al, “Instance-aware Semantic Segmentation via Multi-task Network Cascades”, arXiv 2015)

  24. Instance Segmentation. Similar to R-CNN, but with segments. (Hariharan et al, “Simultaneous Detection and Segmentation”, ECCV 2014)

  25. Instance Segmentation. Similar to R-CNN, but with segments. External segment proposals.

  26. Instance Segmentation. Similar to R-CNN, but with segments. External segment proposals.

  27. Instance Segmentation. Similar to R-CNN, but with segments. External segment proposals. Mask out background with mean image.

  28. Instance Segmentation. Similar to R-CNN, but with segments. External segment proposals. Mask out background with mean image.

  29. Instance Segmentation. Similar to R-CNN, but with segments. External segment proposals. Mask out background with mean image.
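The “mask out background with mean image” step in SDS boils down to one array operation: inside the segment, keep the pixels; outside it, replace them with the mean image so the box CNN sees only the foreground shape. A minimal numpy sketch with toy data (the crop size, mask region, and per-channel mean are all illustrative stand-ins):

```python
import numpy as np

# Toy crop around a segment proposal, plus its binary foreground mask.
crop = np.random.rand(224, 224, 3)       # RGB values in [0, 1]
mask = np.zeros((224, 224), dtype=bool)  # True = pixel belongs to segment
mask[50:180, 60:170] = True

# Stand-in for the dataset mean image (here just a per-channel mean).
mean = np.array([0.485, 0.456, 0.406])

# Keep foreground pixels; replace background with the mean.
masked = np.where(mask[..., None], crop, mean)
print(masked.shape)  # (224, 224, 3)
```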

  30. Instance Segmentation: Hypercolumns (Hariharan et al, “Hypercolumns for Object Segmentation and Fine-grained Localization”, CVPR 2015)

  31. Instance Segmentation: Hypercolumns (Hariharan et al, CVPR 2015)

  32. Instance Segmentation: Cascades. Similar to Faster R-CNN. Won COCO 2015 challenge (with ResNet). (Dai et al, “Instance-aware Semantic Segmentation via Multi-task Network Cascades”, arXiv 2015)

  33. Instance Segmentation: Cascades. Similar to Faster R-CNN. Won COCO 2015 challenge (with ResNet).

  34. Instance Segmentation: Cascades. Similar to Faster R-CNN. Region proposal network (RPN). Won COCO 2015 challenge (with ResNet).

  35. Instance Segmentation: Cascades. Similar to Faster R-CNN. Region proposal network (RPN). Reshape boxes to fixed size; figure / ground logistic regression. Won COCO 2015 challenge (with ResNet).

  36. Instance Segmentation: Cascades. Similar to Faster R-CNN. Region proposal network (RPN). Reshape boxes to fixed size; figure / ground logistic regression. Mask out background, predict object class. Won COCO 2015 challenge (with ResNet).

  37. Instance Segmentation: Cascades. Similar to Faster R-CNN. Region proposal network (RPN). Reshape boxes to fixed size; figure / ground logistic regression. Mask out background, predict object class. Learn entire model end-to-end! Won COCO 2015 challenge (with ResNet).
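The three cascade stages can be sketched as a data-flow skeleton: propose boxes, crop each box to a fixed size and predict a figure/ground mask, then classify the masked region. Every stage below is a trivial stand-in (random boxes, a thresholded sigmoid of the channel mean as the mask head, a sum as the classifier); only the structure mirrors the pipeline of Dai et al.:

```python
import numpy as np

def cascade_forward(feature_map, num_props=3, box_size=7):
    """Structural sketch of the multi-task cascade; all heads are stand-ins."""
    H, W, C = feature_map.shape
    # Stage 1: region proposal network -> boxes (random stand-in here).
    boxes = [(np.random.randint(0, H - box_size),
              np.random.randint(0, W - box_size)) for _ in range(num_props)]
    results = []
    for r, c in boxes:
        # Stage 2: reshape box to fixed size, figure/ground logistic regression.
        roi = feature_map[r:r + box_size, c:c + box_size]
        mask = 1.0 / (1.0 + np.exp(-roi.mean(axis=-1))) > 0.5
        # Stage 3: mask out background, predict the object class.
        class_scores = (roi * mask[..., None]).sum(axis=(0, 1))
        results.append((mask, int(class_scores.argmax())))
    return results

out = cascade_forward(np.random.rand(32, 32, 16))
print(len(out))  # one (mask, class) pair per proposal
```

End-to-end training additionally requires the box-cropping step to be differentiable (the paper's RoI warping), which this sketch does not attempt.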

  38. Instance Segmentation: Cascades. Predictions vs. ground truth. (Dai et al, “Instance-aware Semantic Segmentation via Multi-task Network Cascades”, arXiv 2015)

  39. Segmentation Overview
  ● Semantic segmentation
    ○ Classify all pixels
    ○ Fully convolutional models, downsample then upsample
    ○ Learnable upsampling: fractionally strided convolution
    ○ Skip connections can help
  ● Instance segmentation
    ○ Detect instances, generate mask
    ○ Similar pipelines to object detection

  40. Attention Models

  41. Recall: RNN for Captioning. Image: H x W x 3.

  42. Recall: RNN for Captioning. Image (H x W x 3) → CNN → Features (D).

  43. Recall: RNN for Captioning. Image (H x W x 3) → CNN → Features (D) → Hidden state h0 (H).

  44. Recall: RNN for Captioning. The first word y1 and h0 produce h1, which emits d1, a distribution over the vocab.

  45. Recall: RNN for Captioning. The second word y2 and h1 produce h2, which emits d2.

  46. Recall: RNN for Captioning. The RNN only looks at the whole image, once.

  47. Recall: RNN for Captioning. The RNN only looks at the whole image, once. What if the RNN looks at different parts of the image at each timestep?

  48. Soft Attention for Captioning. Image (H x W x 3) → CNN → Features (L x D). (Xu et al, “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention”, ICML 2015)

  49. Soft Attention for Captioning. The features initialize the hidden state h0.

  50. Soft Attention for Captioning. h0 emits a1, a distribution over L locations.

  51. Soft Attention for Captioning. a1 gives a weighted combination of features: the weighted features z1 (D-dimensional).

  52. Soft Attention for Captioning. z1 and the first word y1 feed h1.

  53. Soft Attention for Captioning. h1 emits a2 (distribution over L locations) and d1 (distribution over vocab).

  54. Soft Attention for Captioning. a2 gives the weighted features z2.

  55. Soft Attention for Captioning. z2 and the second word y2 feed h2.

  56. Soft Attention for Captioning. h2 emits a3 and d2.

  57. Soft Attention for Captioning. Guess which framework was used to implement?

  58. Soft Attention for Captioning. Guess which framework was used to implement? Crazy RNN = Theano.

  59. Soft vs Hard Attention. Image (H x W x 3) → CNN → grid of features a, b, c, d (each D-dimensional). From the RNN: a distribution over grid locations p_a, p_b, p_c, p_d with p_a + p_b + p_c + p_d = 1. (Xu et al, “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention”, ICML 2015)

  60. Soft vs Hard Attention. Same setup, plus the context vector z (D-dimensional).
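The soft and hard variants differ only in how the distribution p over grid locations is used: soft attention takes the expectation of the features under p (differentiable, trainable with plain backprop), while hard attention samples a single location (which needs something like REINFORCE to train). A minimal numpy sketch with random stand-ins for the features and the RNN's attention scores:

```python
import numpy as np

rng = np.random.default_rng(0)
L, D = 4, 8                                # 2 x 2 grid of D-dim features a,b,c,d
features = rng.standard_normal((L, D))

scores = rng.standard_normal(L)            # stand-in for RNN attention scores
p = np.exp(scores) / np.exp(scores).sum()  # softmax: p_a + p_b + p_c + p_d = 1

# Soft attention: context vector z is the expected feature under p.
z_soft = p @ features                      # shape (D,), differentiable in p

# Hard attention: sample one grid location instead.
z_hard = features[rng.choice(L, p=p)]
print(z_soft.shape, z_hard.shape)  # (8,) (8,)
```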
