

SLIDE 1

GoogLeNet

Deeper than deeper

Some slides are from Christian Szegedy

SLIDE 2

GoogLeNet

[Legend: Convolution, Pooling, Softmax, Other]

SLIDE 3

GoogLeNet vs Previous

[Diagrams: GoogLeNet; Zeiler-Fergus architecture (1 tower)]

[Legend: Convolution, Pooling, Softmax, Other]

SLIDE 4

Why is the deep learning revolution arriving just now?

SLIDE 5

Why is the deep learning revolution arriving just now?

SLIDE 6

Why is the deep learning revolution arriving just now?

Rectified Linear Unit

Glorot, X., Bordes, A., & Bengio, Y. (2011). Deep sparse rectifier neural networks. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS), JMLR W&CP Vol. 15, pp. 315-323.
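As a quick illustration (a minimal NumPy sketch, not from the slides):

```python
import numpy as np

def relu(x):
    # Rectified Linear Unit: elementwise max(0, x).
    # Its gradient is 1 wherever x > 0, so deep stacks avoid the
    # saturation that slows training with sigmoid/tanh units.
    return np.maximum(0.0, x)

print(relu(np.array([-2.0, 0.0, 3.0])))  # [0. 0. 3.]
```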

SLIDE 7

Theoretical breakthroughs

Arora, S., Bhaskara, A., Ge, R., & Ma, T. (2014). Provable bounds for learning some deep representations. ICML 2014.

SLIDE 8

Hebbian Principle

[Diagram: Input]

Cells that fire together, wire together

SLIDE 9

Cluster according to activation statistics

[Diagram: Input → Layer 1]

SLIDE 10

Cluster according to correlation statistics

[Diagram: Input → Layer 1 → Layer 2]

SLIDE 11

Cluster according to correlation statistics

[Diagram: Input → Layer 1 → Layer 2 → Layer 3]
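A toy sketch of the clustering idea (illustrative only; GoogLeNet does not literally run this procedure, the architecture approximates the resulting clusters):

```python
import numpy as np

# Record unit activations over many inputs, then group units whose
# activations are highly correlated ("fire together, wire together").
rng = np.random.default_rng(0)
acts = rng.standard_normal((1000, 8))                       # 1000 inputs x 8 units
acts[:, 1] = acts[:, 0] + 0.1 * rng.standard_normal(1000)   # make units 0 and 1 correlate

corr = np.corrcoef(acts.T)                                  # 8x8 correlation matrix
pairs = np.argwhere(np.triu(np.abs(corr) > 0.9, k=1))       # strongly correlated pairs
print(pairs)                                                # [[0 1]] -> units 0, 1 cluster
```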

SLIDE 12

In images, correlations tend to be local

SLIDE 13

Cover very local clusters by 1x1 convolutions

[Diagram: 1x1 filters; y-axis: number of filters]
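A minimal PyTorch sketch (shapes illustrative): a 1x1 convolution is a per-pixel linear map across channels, which is exactly what is needed to pool units whose correlations sit at the same spatial location.

```python
import torch
import torch.nn as nn

# 1x1 convolution: mixes channels at each spatial position without
# looking at any neighborhood. Here it projects 192 channels to 64.
proj = nn.Conv2d(in_channels=192, out_channels=64, kernel_size=1)

x = torch.randn(1, 192, 28, 28)   # (batch, channels, height, width)
print(proj(x).shape)              # torch.Size([1, 64, 28, 28])
```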

SLIDE 14

Less spread out correlations

[Diagram: 1x1 filters; y-axis: number of filters]

SLIDE 15

Cover more spread out clusters by 3x3 convolutions

[Diagram: 1x1 and 3x3 filters; y-axis: number of filters]

SLIDE 16

Cover more spread out clusters by 5x5 convolutions

[Diagram: 1x1 and 3x3 filters; y-axis: number of filters]

SLIDE 17

Cover more spread out clusters by 5x5 convolutions

[Diagram: 1x1, 3x3, and 5x5 filters; y-axis: number of filters]

SLIDE 18

A heterogeneous set of convolutions

[Diagram: 1x1, 3x3, and 5x5 filters; y-axis: number of filters]

SLIDE 19

Schematic view (naive version)

[Diagram: 1x1, 3x3, and 5x5 filters; y-axis: number of filters]

[Schematic (naive version): Previous layer → 1x1 convolutions, 3x3 convolutions, 5x5 convolutions → Filter concatenation]
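A minimal sketch of the naive module in PyTorch (channel counts are illustrative, not the paper's):

```python
import torch
import torch.nn as nn

class NaiveInception(nn.Module):
    """Naive version: parallel 1x1 / 3x3 / 5x5 convolutions over the
    same input, concatenated along the channel axis."""
    def __init__(self, c_in, c1, c3, c5):
        super().__init__()
        self.b1 = nn.Conv2d(c_in, c1, kernel_size=1)
        self.b3 = nn.Conv2d(c_in, c3, kernel_size=3, padding=1)  # keep H x W
        self.b5 = nn.Conv2d(c_in, c5, kernel_size=5, padding=2)  # keep H x W
    def forward(self, x):
        return torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1)

x = torch.randn(1, 192, 28, 28)
print(NaiveInception(192, 64, 128, 32)(x).shape)  # [1, 224, 28, 28]
```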

SLIDE 20

Naive idea

[Schematic: Previous layer → 1x1 convolutions, 3x3 convolutions, 5x5 convolutions → Filter concatenation]

SLIDE 21

Naive idea (does not work!)

[Schematic: Previous layer → 1x1 convolutions, 3x3 convolutions, 5x5 convolutions, 3x3 max pooling → Filter concatenation]

(The pooling path keeps all input channels, so the concatenated output grows wider after every module, and the 5x5 convolutions over such wide inputs become prohibitively expensive.)

SLIDE 22

Inception module

[Schematic: Previous layer → 1x1 convolutions; 1x1 convolutions → 3x3 convolutions; 1x1 convolutions → 5x5 convolutions; 3x3 max pooling → 1x1 convolutions; all branches → Filter concatenation]
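A minimal PyTorch sketch; the channel counts below follow the paper's inception(3a) block (ReLUs after the 3x3/5x5 convolutions are omitted for brevity):

```python
import torch
import torch.nn as nn

class Inception(nn.Module):
    """Inception module with 1x1 dimension reductions before the
    expensive 3x3 and 5x5 convolutions, and after the pooling path."""
    def __init__(self, c_in, c1, c3r, c3, c5r, c5, cp):
        super().__init__()
        self.b1 = nn.Conv2d(c_in, c1, kernel_size=1)
        self.b3 = nn.Sequential(nn.Conv2d(c_in, c3r, 1), nn.ReLU(inplace=True),
                                nn.Conv2d(c3r, c3, 3, padding=1))
        self.b5 = nn.Sequential(nn.Conv2d(c_in, c5r, 1), nn.ReLU(inplace=True),
                                nn.Conv2d(c5r, c5, 5, padding=2))
        self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(c_in, cp, 1))
    def forward(self, x):
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)

x = torch.randn(1, 192, 28, 28)                       # input to inception(3a)
m = Inception(192, c1=64, c3r=96, c3=128, c5r=16, c5=32, cp=32)
print(m(x).shape)                                     # [1, 256, 28, 28]
```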

SLIDE 23

Inception

[Legend: Convolution, Pooling, Softmax, Other]

Why does it have so many layers?

SLIDE 24

Inception

9 Inception modules

[Legend: Convolution, Pooling, Softmax, Other]

Network in a network in a network...

SLIDE 25

Inception

Width of inception modules ranges from 256 filters (in early modules) to 1024 in the top inception modules.

(Module output widths: 256, 480, 480, 512, 512, 512, 832, 832, 1024)

SLIDE 26

Inception

Width of inception modules ranges from 256 filters (in early modules) to 1024 in the top inception modules.

  • Can remove fully connected layers on top completely

(Module output widths: 256, 480, 480, 512, 512, 512, 832, 832, 1024)

SLIDE 27

Inception

Width of inception modules ranges from 256 filters (in early modules) to 1024 in the top inception modules.

  • Can remove fully connected layers on top completely
  • Number of parameters is reduced to 5 million

(Module output widths: 256, 480, 480, 512, 512, 512, 832, 832, 1024)

SLIDE 28

Inception

Width of inception modules ranges from 256 filters (in early modules) to 1024 in the top inception modules.

  • Can remove fully connected layers on top completely
  • Number of parameters is reduced to 5 million
  • Computational cost is increased by less than 2x compared to Krizhevsky's network (<1.5 billion operations per evaluation)

(Module output widths: 256, 480, 480, 512, 512, 512, 832, 832, 1024)
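A back-of-the-envelope check of why the 1x1 reductions keep the cost down, using the inception(3a) numbers from the paper (biases omitted):

```python
# 5x5 branch of inception(3a): 192 input channels, 32 output filters.
direct  = 5 * 5 * 192 * 32             # no reduction: 153,600 weights
reduced = 192 * 16 + 5 * 5 * 16 * 32   # 1x1 down to 16 channels first: 15,872 weights
print(direct, reduced, round(direct / reduced, 1))   # ~9.7x fewer parameters
```

The same ratio applies to multiply-adds per spatial position, which is how the whole network stays under ~1.5 billion operations per evaluation.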

SLIDE 29

Efficient Gradient Propagation

  • Shallow networks can already provide good performance
  • Auxiliary classifiers connected to intermediate layers
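In training, the auxiliary losses are added to the main loss with a small weight (0.3 in the paper) and the side branches are discarded at inference. A minimal sketch (function name is illustrative):

```python
import torch.nn.functional as F

def googlenet_loss(main_logits, aux1_logits, aux2_logits, target):
    # Main softmax loss plus two auxiliary losses computed from
    # classifiers attached to intermediate layers; the 0.3 weight
    # follows the paper. Aux heads are dropped at inference time.
    loss = F.cross_entropy(main_logits, target)
    loss = loss + 0.3 * F.cross_entropy(aux1_logits, target)
    loss = loss + 0.3 * F.cross_entropy(aux2_logits, target)
    return loss
```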
SLIDE 30

Performance breakdown

Multiple Models and Crops
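At test time the paper averages softmax probabilities over multiple trained models (7) and multiple crops per image (up to 144). A minimal PyTorch sketch of that averaging (function name illustrative):

```python
import torch

def ensemble_predict(models, crops):
    # models: list of trained networks; crops: (n_crops, C, H, W)
    # for one image. Average probabilities over crops, then models.
    probs = []
    for model in models:
        model.eval()
        with torch.no_grad():
            logits = model(crops)                        # (n_crops, n_classes)
            probs.append(logits.softmax(dim=1).mean(0))  # average over crops
    return torch.stack(probs).mean(0)                    # average over models
```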

SLIDE 31

Classification performance

SLIDE 32

Where Are We Now

SLIDE 33
SLIDE 34

Where Are We Now

  • It is very hard for humans
  • Even if the number of choices is reduced to 1000
SLIDE 35

Where Are We Now

  • It is very hard for humans, even if the number of choices is reduced to 1000
  • It is time consuming: about 1 image per minute
  • Human performance: 13-15% error without training, 5.1% with training
  • GoogLeNet: 6.7%