slide-1
SLIDE 1

Imagenet

Day 2 Lecture 4

Xavier Giró-i-Nieto

slide-2
SLIDE 2

2

ImageNet ILSVRC

Li Fei-Fei, “How we’re teaching computers to understand pictures” TEDTalks 2014.

Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web]

slide-3
SLIDE 3

3

ImageNet ILSVRC

Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web]

slide-4
SLIDE 4

4

ImageNet ILSVRC

  • 1,000 object classes (categories).
  • Images:
      ○ 1.2 M train
      ○ 100k test

slide-5
SLIDE 5

ImageNet ILSVRC

Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web]

  • Top-5 error rate
slide-6
SLIDE 6

Slide credit: Rob Fergus (NYU)

Image Classification 2012

  • 9.8%

Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web]

6

ImageNet ILSVRC

Based on SIFT + Fisher Vectors

slide-7
SLIDE 7

AlexNet (Supervision)

7

Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB-UPC 2015). Krizhevsky, A., Sutskever, I., & Hinton, G. E. “Imagenet classification with deep convolutional neural networks.” Advances in Neural Information Processing Systems 25 (NIPS 2012).

slide-8
SLIDE 8

8

Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB-UPC 2015)

AlexNet (Supervision)

slide-9
SLIDE 9

9

Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB-UPC 2015)

AlexNet (Supervision)

slide-10
SLIDE 10

10

Image credit: Deep learning Tutorial (Stanford University)

AlexNet (Supervision)

slide-11
SLIDE 11

11

Image credit: Deep learning Tutorial (Stanford University)

AlexNet (Supervision)

slide-12
SLIDE 12

12

Image credit: Deep learning Tutorial (Stanford University)

AlexNet (Supervision)

slide-13
SLIDE 13

13

Rectified Linear Unit (non-linearity) f(x) = max(0,x)

Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB-UPC 2015)

AlexNet (Supervision)
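The ReLU non-linearity on the slide above, f(x) = max(0, x), is simple to sketch in NumPy (an illustrative snippet, not code from the slides):

```python
import numpy as np

def relu(x):
    # Elementwise max(0, x): negative activations are zeroed,
    # positive ones pass through unchanged.
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 3.0])))  # [0. 0. 0. 3.]
```

Unlike sigmoid or tanh, the gradient is exactly 1 for positive inputs, which is one reason ReLUs train deep networks faster.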

slide-14
SLIDE 14

14

Dot Product

Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB-UPC 2015)

AlexNet (Supervision)
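The fully connected layers compute exactly this dot product, one per output unit; a minimal NumPy sketch (the layer sizes below are illustrative, loosely modeled on AlexNet's fc layers):

```python
import numpy as np

def fc_layer(x, W, b):
    # Fully connected layer: output = W·x + b, i.e. one dot product
    # between the input vector and each row of W.
    return W @ x + b

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)          # activation vector from the previous layer
W = rng.standard_normal((1000, 4096))  # 1000-way classifier weights
b = np.zeros(1000)
scores = fc_layer(x, W, b)
print(scores.shape)  # (1000,)
```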

slide-15
SLIDE 15

ImageNet Classification 2013

Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web]

Slide credit: Rob Fergus (NYU)

15

ImageNet ILSVRC

slide-16
SLIDE 16

The development of better convnets is reduced to trial-and-error.

16

Zeiler-Fergus (ZF)

Visualization can help in proposing better architectures.

Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833). Springer International Publishing.

slide-17
SLIDE 17

“A convnet model that uses the same components (filtering, pooling) but in reverse, so instead of mapping pixels to features it does the opposite.”

Zeiler, Matthew D., Graham W. Taylor, and Rob Fergus. "Adaptive deconvolutional networks for mid and high level feature learning." Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011.

17

Zeiler-Fergus (ZF)

slide-18
SLIDE 18

18

Zeiler-Fergus (ZF)

Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833). Springer International Publishing.

DeconvNet vs. ConvNet

slide-19
SLIDE 19

Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833). Springer International Publishing.

19

Zeiler-Fergus (ZF)

slide-20
SLIDE 20

Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833). Springer International Publishing.

20

Zeiler-Fergus (ZF)

slide-21
SLIDE 21

21

The smaller stride (2 vs 4) and filter size (7x7 vs 11x11) result in more distinctive features and fewer “dead” features.

Figure: AlexNet (Layer 1) vs. ZF (Layer 1)

Zeiler-Fergus (ZF): Stride & filter size
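The effect of stride and filter size on the output grid follows the standard formula out = ⌊(n + 2p − k)/s⌋ + 1; a quick check (the 227×227 input below is the commonly used effective AlexNet input size, assumed here for both networks for comparison):

```python
def conv_output_size(n, k, s, p=0):
    # Spatial size of a convolution output:
    # floor((n + 2*padding - kernel) / stride) + 1.
    return (n + 2 * p - k) // s + 1

# AlexNet layer 1: 11x11 filters, stride 4 -> 55x55 output.
print(conv_output_size(227, 11, 4))  # 55
# ZF layer 1: 7x7 filters, stride 2 -> 111x111 output.
print(conv_output_size(227, 7, 2))   # 111
```

The denser sampling (stride 2) is what avoids the aliasing artifacts mentioned on the next slide.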

slide-22
SLIDE 22

22

Cleaner features in ZF, without the aliasing artifacts caused by the stride 4 used in AlexNet.

Figure: AlexNet (Layer 2) vs. ZF (Layer 2)

Zeiler-Fergus (ZF)

slide-23
SLIDE 23

23

Regularization with more dropout: introduced in the input layer.

Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580.

Zeiler-Fergus (ZF): Dropout

slide-24
SLIDE 24

24

Zeiler-Fergus (ZF): Results

slide-25
SLIDE 25

25

Zeiler-Fergus (ZF): Results

slide-26
SLIDE 26

ImageNet Classification 2013

Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web]

  • 5%

26

E2E: Classification: ImageNet ILSVRC

slide-27
SLIDE 27

E2E: Classification

27

slide-28
SLIDE 28

E2E: Classification: GoogLeNet

28

Movie: Inception (2010)

slide-29
SLIDE 29

E2E: Classification: GoogLeNet

29

  • 22 layers, but 12 times fewer parameters than AlexNet.

Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. "Going deeper with convolutions." CVPR 2015. [video] [slides] [poster]

slide-30
SLIDE 30

E2E: Classification: GoogLeNet

30

slide-31
SLIDE 31

E2E: Classification: GoogLeNet

31

Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014.

slide-32
SLIDE 32

E2E: Classification: GoogLeNet

32

Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014.

Multiple scales

slide-33
SLIDE 33

E2E: Classification: GoogLeNet (NiN)

33

3x3 and 5x5 convolutions deal with different scales.

Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014. [Slides]

slide-34
SLIDE 34

E2E: Classification: GoogLeNet

34

Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014.

Dimensionality reduction

slide-35
SLIDE 35

35

1x1 convolutions perform dimensionality reduction (c3 < c2) and add extra rectified linear units (ReLU).

Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014. [Slides]

E2E: Classification: GoogLeNet (NiN)
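Per spatial position, a 1x1 convolution is just a matrix multiply across channels; a NumPy sketch of the c2 → c3 reduction followed by a ReLU (the shapes below are illustrative):

```python
import numpy as np

def conv1x1(x, W):
    # x: (c2, H, W) feature map; W: (c3, c2) filter bank.
    # Each pixel's c2-dim channel vector is projected to c3 channels,
    # then passed through a ReLU.
    c2, H, Wd = x.shape
    out = W @ x.reshape(c2, H * Wd)          # (c3, H*W)
    return np.maximum(0, out).reshape(-1, H, Wd)

x = np.random.default_rng(1).standard_normal((256, 28, 28))
W = np.random.default_rng(2).standard_normal((64, 256))
y = conv1x1(x, W)
print(y.shape)  # (64, 28, 28): 256 channels reduced to 64
```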

slide-36
SLIDE 36

E2E: Classification: GoogLeNet

36

In GoogLeNet, cascaded 1x1 convolutions compute reductions before the expensive 3x3 and 5x5 convolutions.
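The saving is easy to verify with a parameter count (the channel numbers below are illustrative, not GoogLeNet's exact configuration):

```python
def conv_params(k, c_in, c_out):
    # Weight count of a k x k convolution; biases ignored for simplicity.
    return k * k * c_in * c_out

c_in, c_out, c_mid = 256, 256, 64
direct = conv_params(5, c_in, c_out)                                  # 5x5 directly
reduced = conv_params(1, c_in, c_mid) + conv_params(5, c_mid, c_out)  # 1x1 then 5x5
print(direct, reduced)  # 1638400 425984
```

With these (assumed) sizes, the 1x1 bottleneck cuts the 5x5 branch to roughly a quarter of the parameters, and the same reduction applies to the multiply-adds.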

slide-37
SLIDE 37

E2E: Classification: GoogLeNet

37

Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014.

slide-38
SLIDE 38

E2E: Classification: GoogLeNet

38

Pooling layers provide some spatial invariance, and adding one as an alternative parallel path has proven beneficial.

slide-39
SLIDE 39

E2E: Classification: GoogLeNet

39

Two auxiliary softmax classifiers at intermediate layers combat the vanishing gradient while providing regularization at training time... and no fully connected layers are needed!
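The training-time combination can be sketched as a weighted sum of cross-entropy losses; the 0.3 auxiliary weight is the one reported in the GoogLeNet paper, while the logits below are purely illustrative:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # shift by max for numerical stability
    return e / e.sum()

def cross_entropy(z, y):
    # Negative log-probability of the true class y under softmax(z).
    return -np.log(softmax(z)[y])

# Main classifier logits plus two auxiliary heads, all on the same target.
main = np.array([2.0, 0.5, -1.0])
aux1 = np.array([1.0, 1.0, 0.0])
aux2 = np.array([0.5, 0.2, 0.1])
y = 0
total_loss = cross_entropy(main, y) + 0.3 * (cross_entropy(aux1, y)
                                             + cross_entropy(aux2, y))
```

At test time the auxiliary heads are simply discarded; only the main classifier produces predictions.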

slide-40
SLIDE 40

E2E: Classification: GoogLeNet

40

slide-41
SLIDE 41

E2E: Classification: GoogLeNet

41

NVIDIA, “NVIDIA and IBM Cloud Support ImageNet Large Scale Visual Recognition Challenge” (2015)

slide-42
SLIDE 42

E2E: Classification: GoogLeNet

42

Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. "Going deeper with convolutions." CVPR 2015. [video] [slides] [poster]

slide-43
SLIDE 43

E2E: Classification: VGG

43

Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." International Conference on Learning Representations (2015). [video] [slides] [project]

slide-44
SLIDE 44

E2E: Classification: VGG

44

Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." International Conference on Learning Representations (2015). [video] [slides] [project]

slide-45
SLIDE 45

E2E: Classification: VGG: 3x3 Stacks

45

Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." International Conference on Learning Representations (2015). [video] [slides] [project]
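The argument for 3x3 stacks can be checked numerically: two stacked 3x3 convolutions (stride 1) see a 5x5 receptive field, three see 7x7, and each stack uses fewer parameters than the single large filter it replaces (C input and output channels assumed throughout for simplicity):

```python
def receptive_field(num_3x3):
    # Each extra stride-1 3x3 conv grows the receptive field by 2.
    return 3 + 2 * (num_3x3 - 1)

def params_3x3_stack(num_3x3, C):
    # Weight count of the stack, C channels in and out at every layer.
    return num_3x3 * 3 * 3 * C * C

C = 64
print(receptive_field(2), params_3x3_stack(2, C), 5 * 5 * C * C)  # 5 73728 102400
print(receptive_field(3), params_3x3_stack(3, C), 7 * 7 * C * C)  # 7 110592 200704
```

The stack is also deeper, so it inserts extra ReLUs, making the decision function more discriminative for free.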

slide-46
SLIDE 46

E2E: Classification: VGG

46

Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." International Conference on Learning Representations (2015). [video] [slides] [project]

  • No pooling between some convolutional layers.
  • Convolution stride of 1 (no skipping).
slide-47
SLIDE 47

E2E: Classification

47

3.6% top-5 error… with 152 layers!

slide-48
SLIDE 48

E2E: Classification: ResNet

48

He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385 (2015). [slides]

slide-49
SLIDE 49

E2E: Classification: ResNet

49

  • Deeper plain networks (34 layers vs. 18) are more difficult to train.

He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385 (2015). [slides]

Thin curves: training error. Bold curves: validation error.

slide-50
SLIDE 50

ResNet

50

  • Residual learning: reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.

He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385 (2015). [slides]
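In code, the reformulation is a one-line change: the stacked layers learn F(x) and the block outputs F(x) + x. A minimal NumPy sketch, with convolutions abbreviated as matrix multiplies and all shapes illustrative:

```python
import numpy as np

def residual_block(x, W1, W2):
    # A plain block would return F(x); the residual block adds the
    # identity shortcut, so the layers only learn F(x) = H(x) - x.
    h = np.maximum(0, W1 @ x)    # first layer + ReLU
    F = W2 @ h                   # second layer (residual function)
    return np.maximum(0, F + x)  # skip connection, then ReLU

x = np.ones(4)
W1 = np.zeros((4, 4))
W2 = np.zeros((4, 4))
print(residual_block(x, W1, W2))  # [1. 1. 1. 1.]
```

Note that with all-zero weights the block reduces to the identity, which is exactly why very deep stacks of residual blocks remain trainable: doing nothing is easy to learn.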

slide-51
SLIDE 51

E2E: Classification: ResNet

51

He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385 (2015). [slides]

slide-52
SLIDE 52

52

Thanks! Q&A?

Follow me at

https://imatge.upc.edu/web/people/xavier-giro

@DocXavi /ProfessorXavi