

SLIDE 1

EE-559 – Deep learning

1a. Introduction

François Fleuret https://fleuret.org/dlc/

[version of: June 5, 2018]

ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE

SLIDE 2

Why learning

François Fleuret EE-559 – Deep learning / 1a. Introduction 2 / 63

SLIDE 3

Many applications require the automatic extraction of “refined” information from raw signal (e.g. image recognition, automatic speech processing, natural language processing, robotic control, geometry reconstruction). (ImageNet)

SLIDE 4

Our brain is so good at interpreting visual information that the “semantic gap” is hard to assess intuitively. This: is a horse.

SLIDE 8

>>> from torchvision import datasets
>>> cifar = datasets.CIFAR10('./data/cifar10/', train=True, download=True)
Files already downloaded and verified
>>> x = torch.from_numpy(cifar.train_data)[43].transpose(2, 0).transpose(1, 2)
>>> x.size()
torch.Size([3, 32, 32])
>>> x.narrow(1, 0, 4).narrow(2, 0, 12)

(0 ,.,.) =
   99   98  100  103  105  107  108  110  114  115  117  118
  100  100  102  105  107  109  110  112  115  117  119  120
  104  104  106  109  111  112  114  116  119  121  123  124
  109  109  111  113  116  117  118  120  123  124  127  128

(1 ,.,.) =
  166  165  167  169  171  172  173  175  176  178  179  181
  166  164  167  169  169  171  172  174  176  177  179  180
  169  167  170  171  171  173  174  176  178  179  182  183
  170  169  172  173  175  176  177  178  179  181  183  184

(2 ,.,.) =
  198  196  199  200  200  202  203  204  205  206  208  209
  195  194  197  197  197  199  200  201  202  203  206  207
  197  195  198  198  198  199  201  202  203  204  206  207
  197  196  199  198  198  199  200  201  203  204  207  208

[torch.ByteTensor of size 3x4x12]

SLIDE 11

Extracting semantics automatically requires models of extreme complexity, which cannot be designed by hand. Techniques used in practice consist of

  • 1. defining a parametric model, and
  • 2. optimizing its parameters by “making it work” on training data.

This is similar to biological systems, for which the model (e.g. brain structure) is DNA-encoded and the parameters (e.g. synaptic weights) are tuned through experience. Deep learning encompasses software technologies to scale up to billions of model parameters and as many training examples.
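The two steps above can be sketched with a toy example. This is a hypothetical illustration, not from the course: the parametric model is a one-parameter linear function y = w·x, and its parameter is optimized by gradient descent on training data.

```python
# Toy illustration of "defining a parametric model and optimizing its
# parameters on training data" (hypothetical example, not from the course):
# a one-parameter linear model y = w * x, fitted by gradient descent.

def fit(pairs, lr=0.1, steps=100):
    w = 0.0  # the single parameter of the model
    for _ in range(steps):
        # gradient of the mean squared error (w*x - y)^2 with respect to w
        g = sum(2 * (w * x - y) * x for x, y in pairs) / len(pairs)
        w -= lr * g  # gradient descent step
    return w

# training data generated by the "true" model y = 3x
data = [(x, 3.0 * x) for x in [0.5, 1.0, 2.0]]
w = fit(data)  # converges toward w = 3
```
The same recipe scales, conceptually unchanged, to the billions of parameters mentioned above; only the model family, the loss, and the optimizer become more elaborate.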

SLIDE 14

There are strong connections between standard statistical modeling and machine learning. Classical ML methods combine a “learnable” model from statistics (e.g. “linear regression”) with prior knowledge in pre-processing. “Artificial neural networks” pre-dated these approaches, and do not follow that dichotomy. They consist of “deep” stacks of parametrized processing.
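That classical-ML recipe can be sketched in a few lines; this is a hypothetical toy, where the prior knowledge lives in a hand-designed pre-processing and the learnable part is a one-dimensional least-squares fit on the resulting feature.

```python
# Classical ML recipe (hypothetical sketch): hand-designed pre-processing
# encoding prior knowledge, followed by a simple "learnable" statistical
# model fitted on the resulting features.

def preprocess(x):
    # prior knowledge: we believe the signal depends on x through x^2
    return x * x

def fit_linear(features, targets):
    # one-dimensional least squares without intercept: w = <f, y> / <f, f>
    num = sum(f * y for f, y in zip(features, targets))
    den = sum(f * f for f in features)
    return num / den

xs = [1.0, 2.0, 3.0]
ys = [2.0, 8.0, 18.0]           # generated by y = 2 * x^2
feats = [preprocess(x) for x in xs]
w = fit_linear(feats, ys)        # recovers w = 2
```
Deep networks replace the hand-designed `preprocess` with additional learnable stages, which is precisely the departure from the dichotomy described above.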

SLIDE 15

From artificial neural networks to “Deep Learning”

SLIDE 16

[Figure 1 from McCulloch and Pitts’ “Logical Calculus for Nervous Activity”]

Networks of “Threshold Logic Unit” (McCulloch and Pitts, 1943)

SLIDE 20

1949 – Donald Hebb proposes the Hebbian learning principle.
1951 – Marvin Minsky creates the first ANN (Hebbian learning, 40 neurons).
1958 – Frank Rosenblatt creates a perceptron to classify 20 × 20 images.
1959 – David H. Hubel and Torsten Wiesel demonstrate orientation selectivity and columnar organization in the cat’s visual cortex.
1982 – Paul Werbos proposes back-propagation for ANNs.

SLIDE 21

Neocognitron

[Figures from Fukushima’s neocognitron paper: Fig. 1, correspondence between the hierarchy model by Hubel and Wiesel and the neural network of the neocognitron; Fig. 2, schematic diagram illustrating the interconnections between layers; Fig. 3, input interconnections to the cells within a single cell-plane]

Follows Hubel and Wiesel’s results. (Fukushima, 1980)

SLIDE 22

Network for the T-C problem, trained with back-prop. (Rumelhart et al., 1988)

SLIDE 23

LeNet-5

[LeNet-5 architecture diagram: INPUT 32×32 → C1: feature maps 6@28×28 (convolutions) → S2: f. maps 6@14×14 (subsampling) → C3: f. maps 16@10×10 (convolutions) → S4: f. maps 16@5×5 (subsampling) → C5: layer 120 → F6: layer 84 (full connections) → OUTPUT 10 (Gaussian connections)]

(LeCun et al., 1998)

SLIDE 24

AlexNet (Krizhevsky et al., 2012)

SLIDE 25

GoogLeNet

[GoogLeNet architecture diagram: a stack of Conv, MaxPool, and LocalRespNorm layers followed by nine inception modules (parallel 1×1, 3×3, 5×5 convolutions and 3×3 max-pooling merged by DepthConcat), with two auxiliary classifiers (softmax0, softmax1) and a final AveragePool, FC, and softmax2 output]

(Szegedy et al., 2015)

SLIDE 26

ResNet

[ResNet-34 architecture diagram: 7×7 conv 64 /2 and pooling, followed by stacks of 3×3 conv layers with 64, 128, 256, and 512 feature maps (stride 2 at each transition), average pooling, and a 1000-way fully connected layer]

(He et al., 2015)

SLIDE 27

Deep learning is built on a natural generalization of a neural network: a graph of tensor operators, taking advantage of

  • the chain rule (aka “back-propagation”),
  • stochastic gradient descent,
  • convolutions,
  • parallel operations on GPUs.

This does not differ much from networks from the 90s.
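The chain rule behind back-propagation can be illustrated on a scalar composition; this is a hypothetical toy example where the gradient obtained by multiplying local derivatives backward is checked against a finite difference.

```python
import math

# Back-propagation in miniature (hypothetical example): for the
# composition f(w) = tanh(w * x), the chain rule gives
# df/dw = tanh'(w * x) * x, accumulated backward through the two operators.

def forward_backward(w, x):
    a = w * x                 # first operator: product
    y = math.tanh(a)          # second operator: tanh
    dy_da = 1.0 - y * y       # local derivative of tanh at a
    da_dw = x                 # local derivative of the product w.r.t. w
    dw = dy_da * da_dw        # chain rule: backward pass
    return y, dw

y, dw = forward_backward(0.5, 2.0)

# sanity check against a centered finite difference
eps = 1e-6
num = (math.tanh((0.5 + eps) * 2.0) - math.tanh((0.5 - eps) * 2.0)) / (2 * eps)
```
In a full framework the same backward accumulation runs over an arbitrary graph of tensor operators instead of two scalar ones.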

SLIDE 28

This generalization makes it possible to design complex networks of operators dealing with images, sound, text, sequences, etc., and to train them end-to-end. (Yeung et al., 2015)

SLIDE 29

CIFAR10 32 × 32 color images, 50k train samples, 10k test samples. (Krizhevsky, 2009, chap. 3)

SLIDE 30

Performance on CIFAR10

[Plot: accuracy (%) on CIFAR10 vs. year (2010–2018), from Krizhevsky et al. (2012) through Graham (2015) to Real et al. (2018), surpassing human performance]

SLIDE 31

ImageNet Large Scale Visual Recognition Challenge. 1000 categories, > 1M images (http://image-net.org/challenges/LSVRC/2014/browse-synsets)

SLIDE 34

Table 4. Error rates (%) of single-model results on the ImageNet validation set (except † reported on the test set).

method                      top-1 err.  top-5 err.
VGG [41] (ILSVRC’14)        –           8.43†
GoogLeNet [44] (ILSVRC’14)  –           7.89
VGG [41] (v5)               24.4        7.1
PReLU-net [13]              21.59       5.71
BN-inception [16]           21.99       5.81
ResNet-34 B                 21.84       5.71
ResNet-34 C                 21.53       5.60
ResNet-50                   20.74       5.25
ResNet-101                  19.87       4.60
ResNet-152                  19.38       4.49

Table 5. Error rates (%) of ensembles. The top-5 error is on the test set of ImageNet and reported by the test server.

method                      top-5 err. (test)
VGG [41] (ILSVRC’14)        7.32
GoogLeNet [44] (ILSVRC’14)  6.66
VGG [41] (v5)               6.8
PReLU-net [13]              4.94
BN-inception [16]           4.82
ResNet (ILSVRC’15)          3.57

(He et al., 2015)

SLIDE 35

Current application domains

SLIDE 36

Object detection and segmentation (Pinheiro et al., 2016)

SLIDE 37

Human pose estimation (Wei et al., 2016)

SLIDE 38

Image generation (Radford et al., 2015)

SLIDE 39

Reinforcement learning: self-trained, plays 49 games at human level. (Mnih et al., 2015)

SLIDE 40

Strategy games: March 2016, 4-1 against a 9-dan professional without handicap. (Silver et al., 2016)

SLIDE 41

Translation

“The reason Boeing are doing this is to cram more seats in to make their plane more competitive with our products,” said Kevin Keniston, head of passenger comfort at Europe’s Airbus.

“La raison pour laquelle Boeing fait cela est de créer plus de sièges pour rendre son avion plus compétitif avec nos produits”, a déclaré Kevin Keniston, chef du confort des passagers chez Airbus.

When asked about this, an official of the American administration replied: “The United States is not conducting electronic surveillance aimed at offices of the World Bank and IMF in Washington.”

Interrogé à ce sujet, un fonctionnaire de l’administration américaine a répondu: “Les États-Unis n’effectuent pas de surveillance électronique à l’intention des bureaux de la Banque mondiale et du FMI à Washington”

(Wu et al., 2016)

SLIDE 42

Auto-captioning (Vinyals et al., 2015)

SLIDE 43

Question answering

I: Jane went to the hallway.
I: Mary walked to the bathroom.
I: Sandra went to the garden.
I: Daniel went back to the garden.
I: Sandra took the milk there.
Q: Where is the milk?
A: garden

I: It started boring, but then it got interesting.
Q: What’s the sentiment?
A: positive

(Kumar et al., 2015)

SLIDE 44

Why does it work now?

SLIDE 45

The success of deep learning is multi-factorial:

  • Five decades of research in machine learning,
  • CPUs/GPUs/storage developed for other purposes,
  • lots of data from “the internet”,
  • tools and culture of collaborative and reproducible science,
  • resources and efforts from large corporations.

SLIDE 46

Five decades of research in ML provided

  • a taxonomy of ML concepts (classification, generative models, clustering, kernels, linear embeddings, etc.),
  • a sound statistical formalization (Bayesian estimation, PAC),
  • a clear picture of fundamental issues (bias/variance dilemma, VC dimension, generalization bounds, etc.),
  • a good understanding of optimization issues,
  • efficient large-scale algorithms.

SLIDE 47

From a practical perspective, deep learning

  • lessens the need for a deep mathematical grasp,
  • makes the design of large learning architectures a system/software development task,
  • makes it possible to leverage modern hardware (clusters of GPUs),
  • does not plateau when using more data,
  • makes large trained networks a commodity.

SLIDE 48

[Plot: Flops per USD vs. year, 1960–2020, spanning 10^-3 to 10^12]

(Wikipedia “FLOPS”)

                    TFlops (10^12)  Price  GFlops per $
Intel i7-6700K      0.2             $344   0.6
AMD Radeon R-7 240  0.5             $55    9.1
NVIDIA GTX 750 Ti   1.3             $105   12.3
AMD RX 480          5.2             $239   21.6
NVIDIA GTX 1080     8.9             $699   12.7
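The last column is simply throughput divided by price. A quick sanity check of that arithmetic (the published figures presumably use slightly different prices, so small rounding discrepancies are possible):

```python
# Reproduce the "GFlops per $" column from the throughput and price
# figures: (TFlops * 1000) / price in USD, rounded to one decimal.

gpus = {
    "Intel i7-6700K":     (0.2, 344),
    "AMD Radeon R-7 240": (0.5, 55),
    "NVIDIA GTX 750 Ti":  (1.3, 105),
    "AMD RX 480":         (5.2, 239),
    "NVIDIA GTX 1080":    (8.9, 699),
}

gflops_per_dollar = {
    name: round(tflops * 1000 / price, 1)
    for name, (tflops, price) in gpus.items()
}
```
For instance the GTX 1080 gives 8.9 × 1000 / 699 ≃ 12.7 GFlops per dollar, matching the table.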

SLIDE 49

[Plot: bytes per USD vs. year, 1980–2020, spanning 10^3 to 10^12]

(John C. McCallum) The typical cost of a 4 TB hard disk is $120 (Dec 2016).

SLIDE 50

[Plots: top-1 accuracy (%) vs. operations (G-Ops), with marker size indicating parameter count (5M–155M), and forward time per image (ms) vs. batch size, for AlexNet, BN-AlexNet, BN-NIN, GoogLeNet, Inception-v3, VGG-16, VGG-19, ResNet-18, ResNet-34, ResNet-50, and ResNet-101]

(Canziani et al., 2016)

SLIDE 51

Data-set     Year  Nb. images   Resolution     Nb. classes
MNIST        1998  6.0 × 10^4   28 × 28        10
NORB         2004  4.8 × 10^4   96 × 96        5
Caltech 101  2003  9.1 × 10^3   ≃ 300 × 200    101
Caltech 256  2007  3.0 × 10^4   ≃ 640 × 480    256
LFW          2007  1.3 × 10^4   250 × 250      –
CIFAR10      2009  6.0 × 10^4   32 × 32        10
PASCAL VOC   2012  2.1 × 10^4   ≃ 500 × 400    20
MS-COCO      2015  2.0 × 10^5   ≃ 640 × 480    91
ImageNet     2016  14.2 × 10^6  ≃ 500 × 400    21,841
Cityscape    2016  25 × 10^3    2,000 × 1,000  30

SLIDE 52

“Quantity has a Quality All Its Own.” (Thomas A. Callaghan Jr.)

SLIDE 53

Implementing a deep network, PyTorch

SLIDE 54

Deep-learning development is usually done in a framework:

Framework   Language(s)            License        Main backer
PyTorch     Python                 BSD            Facebook
Caffe2      C++, Python            Apache         Facebook
TensorFlow  Python, C++            Apache         Google
MXNet       Python, C++, R, Scala  Apache         Amazon
CNTK        Python, C++            MIT            Microsoft
Torch       Lua                    BSD            Facebook
Theano      Python                 BSD            U. of Montreal
Caffe       C++                    BSD 2 clauses  U. of CA, Berkeley

These frameworks combine a fast, low-level, compiled backend to access computation devices with a slow, high-level, interpreted language.

SLIDE 55

We will use the PyTorch framework for our experiments. http://pytorch.org

“PyTorch is a python package that provides two high-level features:

  • Tensor computation (like numpy) with strong GPU acceleration
  • Deep Neural Networks built on a tape-based autograd system

You can reuse your favorite python packages such as numpy, scipy and Cython to extend PyTorch when needed.”
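The “tape-based autograd system” quoted above can be sketched in miniature. This is a hypothetical toy, not PyTorch's actual implementation: each operation records a backward closure on a tape during the forward pass, and replaying the tape in reverse applies the chain rule.

```python
# Toy tape-based autograd (hypothetical sketch, not PyTorch's actual
# implementation): operations record backward closures on a tape during
# the forward pass; replaying the tape in reverse accumulates gradients.

tape = []

class Var:
    def __init__(self, value):
        self.value, self.grad = value, 0.0

    def __mul__(self, other):
        out = Var(self.value * other.value)
        def backward():  # d(a*b)/da = b, d(a*b)/db = a
            self.grad += other.value * out.grad
            other.grad += self.value * out.grad
        tape.append(backward)
        return out

    def __add__(self, other):
        out = Var(self.value + other.value)
        def backward():  # d(a+b)/da = d(a+b)/db = 1
            self.grad += out.grad
            other.grad += out.grad
        tape.append(backward)
        return out

def backprop(out):
    out.grad = 1.0
    for op in reversed(tape):  # replay the tape backward
        op()

x, y = Var(2.0), Var(3.0)
z = x * y + x       # z = x*y + x
backprop(z)         # dz/dx = y + 1 = 4, dz/dy = x = 2
```
PyTorch does the same bookkeeping on tensors rather than scalars, with the tape built dynamically at every forward pass.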

SLIDE 56

MNIST data-set: 28 × 28 grayscale images, 60k train samples, 10k test samples. (LeCun et al., 1998)

SLIDE 57

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=5)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=5)
        self.fc1 = nn.Linear(256, 200)
        self.fc2 = nn.Linear(200, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), kernel_size=3))
        x = F.relu(F.max_pool2d(self.conv2(x), kernel_size=2))
        x = x.view(-1, 256)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

model = Net()

mu, std = train_input.data.mean(), train_input.data.std()
train_input.data.sub_(mu).div_(std)

optimizer = optim.SGD(model.parameters(), lr = 1e-1)
criterion, bs = nn.CrossEntropyLoss(), 100

model.cuda()
criterion.cuda()
train_input, train_target = train_input.cuda(), train_target.cuda()

for e in range(10):
    for b in range(0, nb_train_samples, bs):
        output = model(train_input.narrow(0, b, bs))
        loss = criterion(output, train_target.narrow(0, b, bs))
        model.zero_grad()
        loss.backward()
        optimizer.step()

≃7s on a GTX1080, ≃1% test error

Fran¸ cois Fleuret EE-559 – Deep learning / 1a. Introduction 45 / 63


slide-63
SLIDE 63

What is really happening?


slide-64
SLIDE 64

(Zeiler and Fergus, 2014)


slide-65
SLIDE 65

(Zeiler and Fergus, 2014)


slide-66
SLIDE 66

(Google’s Deep Dreams)


slide-67
SLIDE 67

(Google’s Deep Dreams)


slide-68
SLIDE 68

(Thorne Brandt)


slide-69
SLIDE 69

(Duncan Nicoll)


slide-70
SLIDE 70

(Szegedy et al., 2014) (Nguyen et al., 2015)
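These citations refer to adversarial images: inputs modified imperceptibly, yet confidently misclassified. The core mechanism can be sketched with a toy linear classifier in NumPy; the classifier, the step-size rule, and the perturbation direction below are illustrative, not taken from the cited papers.

```python
import numpy as np

# Toy sketch of an adversarial perturbation on a linear "network"
# sign(w . x): in high dimension, a small per-coordinate step, aligned
# against the gradient of the score, flips the decision while changing
# the input very little.

rng = np.random.default_rng(0)
w = rng.standard_normal(1000)   # weights of the classifier
x = rng.standard_normal(1000)   # input; decision is sign(w . x)

# Step size chosen just large enough to cross the decision boundary.
eps = (abs(w @ x) + 1.0) / np.abs(w).sum()
x_adv = x - eps * np.sign(w) * np.sign(w @ x)   # FGSM-style perturbation

print(np.sign(w @ x) != np.sign(w @ x_adv))           # decision flips: True
print(np.linalg.norm(x_adv - x) / np.linalg.norm(x))  # small relative change
```

The flip is guaranteed by construction (the step moves the score past zero by exactly 1), while the relative change of the input stays small because the perturbation is spread over all 1000 coordinates.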


slide-71
SLIDE 71

Relations with biology


slide-72
SLIDE 72

[Figure 1 of Yamins and DiCarlo (2016), "HCNNs as models of sensory cortex": (a) the basic framework in which sensory cortex is studied is one of encoding, the process by which stimuli are transformed into patterns of neural activity, and decoding, the process by which neural activity generates behavior. HCNNs model the encoding step as a cascade of linear-nonlinear (LN) layers (filter, threshold, pool, normalize, with spatial convolution over the image input), mirroring the ventral pathway from the retina (RGC) and LGN through V1, V2, and V4 to PIT/CIT/AIT during a 100-ms visual presentation.]

(Yamins and DiCarlo, 2016)


slide-73
SLIDE 73

[Figure from Yamins and DiCarlo (2016): (a, b) the top hidden layer of an HCNN model (HMO) predicts IT neural responses (e.g. r = 0.87 ± 0.15 at IT site 56) and far outperforms control models (pixels, V1-like, V2-like, SIFT, HMAX, PLOS09) in both categorization performance and IT single-site neural predictivity (% explained variance); (c) in monkey V4 (n = 128 sites) and IT (n = 168 sites), neural predictivity increases with HCNN layer depth; (d) the representational structure of the HCNN model matches that of human IT (fMRI) across animate/inanimate categories; (e) layer-wise RDM voxel correlations (Kendall's tau) show early convolutional layers matching human V1–V3 and late, fully connected layers matching human IT.]

(Yamins and DiCarlo, 2016)


slide-74
SLIDE 74

Species       Nb. neurons    Nb. synapses

Roundworm     302            7.5 × 10³
Jellyfish     800            –
Sea slug      1.8 × 10⁴      –
Fruit fly     1.0 × 10⁵      1.0 × 10⁷
Ant           2.5 × 10⁵      –
Cockroach     1.0 × 10⁶      –
Frog          1.6 × 10⁷      –
Mouse         7.1 × 10⁷      1.0 × 10¹¹
Rat           2.0 × 10⁸      4.5 × 10¹¹
Octopus       3.0 × 10⁸      –
Human         8.6 × 10¹⁰     1.0 × 10¹⁵
(Wikipedia “List of animals by number of neurons”)


slide-75
SLIDE 75

Device                                 Nb. transistors

Intel i7 Haswell-E (8 cores)           2.6 × 10⁹
Intel Xeon Broadwell-E5 (22 cores)     7.2 × 10⁹
AMD Epyc (32 cores)                    19.2 × 10⁹
Nvidia GeForce GTX 1080                7.2 × 10⁹
AMD Vega 10                            12.5 × 10⁹
Nvidia GV100                           21.1 × 10⁹

(Wikipedia “Transistor count”)


slide-76
SLIDE 76

[Plot: number of transistors per CPU and per GPU, 1960–2020, on a log scale from 10³ to 10¹⁸, compared with the number of synapses in the fruit fly, mouse, and human brains.]

(Wikipedia “Transistor count”)


slide-77
SLIDE 77

Plan, pre-requisites and grading



slide-80
SLIDE 80

Lecture content:

  • 1. Introduction.
  • 2. Standard machine-learning concepts and tools.
  • 3. Multi-layer perceptrons, back-prop, stochastic gradient descent.
  • 4. Convolutional networks, arbitrary graphs of operators.
  • 5. Initialization, optimization, and regularization.
  • 6. Going deeper.
  • 7. Deep models for Computer Vision.
  • 8. Analysis of deep models.
  • 9. Auto-encoders, embeddings, and generative models.
  • 10. Generative adversarial networks.
  • 11. Recurrent models, memory networks, NLP.
  • 12. Invited speaker (Soumith Chintala, Facebook).
  • 13. Invited lecture (Andreas Steiner, Google).
  • 14. Invited lecture (Andreas Steiner, Google).



slide-82
SLIDE 82

Pre-requisites:

  • Linear algebra (vector and Euclidean spaces),
  • differential calculus (gradient, Jacobian, Hessian, chain rule),
  • Python programming,

... but there is more!

  • basics in probability and statistics (discrete and continuous distributions, law of large numbers, conditional probabilities, Bayes, PCA),
  • basics in optimization (notion of minima, gradient descent),
  • basics in algorithmics (computational costs),
  • basics in signal processing (Fourier transform, wavelets).
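Of these tools, the chain rule is the one this course leans on most, since it underlies back-propagation. A quick numerical refresher, with functions chosen here purely for illustration, checks a hand-derived gradient against a finite difference:

```python
import math

# Chain rule: for f(x) = g(h(x)), f'(x) = g'(h(x)) * h'(x).
# Here h(x) = x^2 and g(u) = sin(u), so f'(x) = cos(x^2) * 2x.

def f(x):
    return math.sin(x * x)

def df(x):
    return math.cos(x * x) * 2 * x    # chain rule by hand

x, eps = 0.7, 1e-6
numeric = (f(x + eps) - f(x - eps)) / (2 * eps)   # central difference
print(abs(df(x) - numeric) < 1e-6)                # True: they agree
```

The same check, applied one layer at a time, is how gradient implementations are routinely debugged.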


slide-83
SLIDE 83

The evaluation will be:

  • 50% – one mini-project, by groups of one to three students: group report and source code, plus a 5-minute oral for each student/project.
  • 50% – final written exam.


slide-84
SLIDE 84

The end

slide-85
SLIDE 85

References

  • A. Canziani, A. Paszke, and E. Culurciello. An analysis of deep neural network models for practical applications. CoRR, abs/1605.07678, 2016.
  • K. Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4):193–202, April 1980.
  • K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015.
  • A. Krizhevsky. Learning multiple layers of features from tiny images. Master’s thesis, Department of Computer Science, University of Toronto, 2009.
  • A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet classification with deep convolutional neural networks. In Neural Information Processing Systems (NIPS), 2012.
  • A. Kumar, O. Irsoy, J. Su, J. Bradbury, R. English, B. Pierce, P. Ondruska, I. Gulrajani, and R. Socher. Ask me anything: Dynamic memory networks for natural language processing. CoRR, abs/1506.07285, 2015.
  • Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
  • W. S. McCulloch and W. Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

slide-86
SLIDE 86
  • V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, Feb. 2015.
  • A. M. Nguyen, J. Yosinski, and J. Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
  • P. O. Pinheiro, T.-Y. Lin, R. Collobert, and P. Dollár. Learning to refine object segments. In European Conference on Computer Vision (ECCV), pages 75–91, 2016.
  • A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. CoRR, abs/1511.06434, 2015.
  • D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Neurocomputing: Foundations of Research, chapter Learning Representations by Back-propagating Errors, pages 696–699. MIT Press, 1988.
  • D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis. Mastering the game of Go with deep neural networks and tree search. Nature, 529:484–503, 2016.
  • C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR), 2014.
  • C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Conference on Computer Vision and Pattern Recognition (CVPR), 2015.

slide-87
SLIDE 87
  • O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. In Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
  • S. Wei, V. Ramakrishna, T. Kanade, and Y. Sheikh. Convolutional pose machines. CoRR, abs/1602.00134, 2016.
  • Y. Wu, M. Schuster, Z. Chen, Q. V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey, J. Klingner, A. Shah, M. Johnson, X. Liu, L. Kaiser, S. Gouws, Y. Kato, T. Kudo, H. Kazawa, K. Stevens, G. Kurian, N. Patil, W. Wang, C. Young, J. Smith, J. Riesa, A. Rudnick, O. Vinyals, G. Corrado, M. Hughes, and J. Dean. Google’s neural machine translation system: Bridging the gap between human and machine translation. CoRR, abs/1609.08144, 2016.
  • D. L. K. Yamins and J. J. DiCarlo. Using goal-driven deep learning models to understand sensory cortex. Nature Neuroscience, 19:356–365, Feb. 2016.
  • S. Yeung, O. Russakovsky, G. Mori, and L. Fei-Fei. End-to-end learning of action detection from frame glimpses in videos. CoRR, abs/1511.06984, 2015.
  • M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. In European Conference on Computer Vision (ECCV), 2014.