

slide-1
SLIDE 1

AMLD – Deep Learning in PyTorch

1. Introduction

François Fleuret
http://fleuret.org/amld/
February 10, 2018

ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE

slide-2
SLIDE 2

Why learning


slide-3
SLIDE 3

Many applications require the automatic extraction of “refined” information from a raw signal, e.g. image recognition, automatic speech processing, natural language processing, robotic control, geometry reconstruction. [Illustration: sample images from ImageNet]



slide-5
SLIDE 5

Our brain is so good at interpreting visual information that the “semantic gap” is hard to assess intuitively. This: [photograph of a horse] is a horse.



slide-8
SLIDE 8

>>> from torchvision import datasets
>>> cifar = datasets.CIFAR10('./data/cifar10/', train=True, download=True)
Files already downloaded and verified
>>> x = torch.from_numpy(cifar.train_data)[43].transpose(2, 0).transpose(1, 2)
>>> x.size()
torch.Size([3, 32, 32])
>>> x.narrow(1, 0, 4).narrow(2, 0, 12)
(0 ,.,.) =
   99   98  100  103  105  107  108  110  114  115  117  118
  100  100  102  105  107  109  110  112  115  117  119  120
  104  104  106  109  111  112  114  116  119  121  123  124
  109  109  111  113  116  117  118  120  123  124  127  128

(1 ,.,.) =
  166  165  167  169  171  172  173  175  176  178  179  181
  166  164  167  169  169  171  172  174  176  177  179  180
  169  167  170  171  171  173  174  176  178  179  182  183
  170  169  172  173  175  176  177  178  179  181  183  184

(2 ,.,.) =
  198  196  199  200  200  202  203  204  205  206  208  209
  195  194  197  197  197  199  200  201  202  203  206  207
  197  195  198  198  198  199  201  202  203  204  206  207
  197  196  199  198  198  199  200  201  203  204  207  208
[torch.ByteTensor of size 3x4x12]



slide-10
SLIDE 10

Extracting semantics automatically requires models of extreme complexity, which cannot be designed by hand. Techniques used in practice consist of

  1. defining a parametric model, and
  2. optimizing its parameters by “making it work” on training data.

This is similar to biological systems for which the model (e.g. brain structure) is DNA-encoded, and parameters (e.g. synaptic weights) are tuned through experiences.



slide-13
SLIDE 13

A simple example is linear regression.

[Scatter plot: systolic blood pressure (mmHg) vs. age (years), with the data points and a fitted linear model]

Deep learning encompasses software technologies to scale up to billions of model parameters and as many training examples.
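For instance, the fit above can be computed in closed form with least squares. A minimal sketch of ours with synthetic data (the names age and bp are illustrative, not from the lecture):

import numpy as np

# Synthetic (age, blood pressure) pairs, loosely mimicking the plot above
rng = np.random.RandomState(0)
age = rng.uniform(20, 80, size=100)
bp = 100 + 0.8 * age + rng.normal(0, 10, size=100)

# Fit the linear model bp ~ a * age + b by least squares
a, b = np.polyfit(age, bp, deg=1)
print(a, b)  # slope and intercept of the fitted line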


slide-14
SLIDE 14

From artificial neural networks to “Deep Learning”


slide-15
SLIDE 15

[Figure 1 from “A Logical Calculus of the Ideas Immanent in Nervous Activity”: diagrams of small nerve nets]

Networks of “Threshold Logic Unit” (McCulloch and Pitts, 1943)



slide-19
SLIDE 19

1949 – Donald Hebb proposes the Hebbian Learning principle.
1951 – Marvin Minsky creates the first ANN (Hebbian learning, 40 neurons).
1958 – Frank Rosenblatt creates a perceptron to classify 20 × 20 images.
1959 – David H. Hubel and Torsten Wiesel demonstrate orientation selectivity and columnar organization in the cat’s visual cortex.
1982 – Paul Werbos proposes back-propagation for ANNs.


slide-20
SLIDE 20

Neocognitron

[Figures from the paper: the correspondence between Hubel and Wiesel’s hierarchy model (retina → LGB → simple → complex → lower-order hypercomplex → higher-order hypercomplex cells) and the neocognitron (U0 → Us1 → Uc1 → Us2 → Uc2 → Us3 → Uc3), and schematic diagrams of the interconnections between S-layers and C-layers]

Follows Hubel and Wiesel’s results. (Fukushima, 1980)


slide-21
SLIDE 21

Network for the T-C problem, trained with back-prop. (Rumelhart et al., 1988)


slide-22
SLIDE 22

LeNet-5

[LeNet-5 architecture: INPUT 32x32 → C1: feature maps 6@28x28 (convolutions) → S2: f. maps 6@14x14 (subsampling) → C3: f. maps 16@10x10 (convolutions) → S4: f. maps 16@5x5 (subsampling) → C5: layer 120 → F6: layer 84 (full connections) → OUTPUT 10 (Gaussian connections)]

(leCun et al., 1998)


slide-23
SLIDE 23

AlexNet (Krizhevsky et al., 2012)


slide-24
SLIDE 24

GoogLeNet

[GoogLeNet architecture diagram: a 7x7, stride-2 convolutional stem with max-pooling and local response normalization, followed by stacked “Inception” modules (parallel 1x1, 3x3, and 5x5 convolutions and 3x3 max-pooling, depth-concatenated), two auxiliary softmax classifiers, and a final average-pooling / FC / softmax head]

(Szegedy et al., 2015)


slide-25
SLIDE 25

ResNet

[ResNet-34 architecture diagram: a 7x7, stride-2 convolutional stem with pooling, followed by stacked 3x3 convolutional residual blocks of width 64, 128, 256, and 512 (stride 2 at each width change), average pooling, and a 1000-way fully connected output]

(He et al., 2015)
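The key ingredient is the residual connection, which adds a block’s input to its output. A minimal sketch of one such block in PyTorch (our own illustrative code, not the paper’s reference implementation):

import torch
from torch import nn
import torch.nn.functional as F

class ResBlock(nn.Module):
    # A basic residual block: the output is x + F(x)
    def __init__(self, channels):
        super(ResBlock, self).__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        y = F.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        return F.relu(x + y)  # the skip connection

block = ResBlock(64)
print(block(torch.randn(1, 64, 32, 32)).size())  # torch.Size([1, 64, 32, 32])

Because the block computes x + F(x), gradients can flow unimpeded through the identity path, which is what makes very deep stacks trainable.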


slide-26
SLIDE 26

Deep learning is built on a natural generalization of a neural network: a graph of tensor operators, taking advantage of

  • the chain rule (aka “back-propagation”),
  • stochastic gradient descent,
  • convolutions,
  • parallel operations on GPUs.

This does not differ much from the networks of the 90s.
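A minimal sketch of the first two ingredients with PyTorch’s autograd (our own example, written against the modern tensor API rather than the 2018-era Variable wrapper):

import torch

w = torch.randn(3, requires_grad=True)    # model parameters
x, y = torch.randn(3), torch.tensor(1.0)  # one training sample

for step in range(100):
    loss = (w.dot(x) - y) ** 2  # forward pass through the graph of tensor operators
    loss.backward()             # chain rule: computes dloss/dw
    with torch.no_grad():
        w -= 1e-2 * w.grad      # one gradient descent step
        w.grad.zero_()          # gradients accumulate in PyTorch; reset them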


slide-27
SLIDE 27

This generalization makes it possible to design complex networks of operators dealing with images, sound, text, sequences, etc., and to train them end-to-end. (Yeung et al., 2015)


slide-28
SLIDE 28

CIFAR10: 32 × 32 color images, 50k train samples, 10k test samples. (Krizhevsky, 2009, chap. 3)


slide-29
SLIDE 29

Performance on CIFAR10

[Plot: accuracy (%) on CIFAR10, from 75% to 100%, as a function of the year (2010 to 2018), marking Krizhevsky et al. (2012), Graham (2015), Real et al. (2018), and human performance]


slide-30
SLIDE 30

Error rates (%) of single-model results on the ImageNet validation set (except † reported on the test set):

  method                        top-1 err.   top-5 err.
  VGG [41] (ILSVRC’14)          –            8.43†
  GoogLeNet [44] (ILSVRC’14)    –            7.89
  VGG [41] (v5)                 24.4         7.1
  PReLU-net [13]                21.59        5.71
  BN-inception [16]             21.99        5.81
  ResNet-34 B                   21.84        5.71
  ResNet-34 C                   21.53        5.60
  ResNet-50                     20.74        5.25
  ResNet-101                    19.87        4.60
  ResNet-152                    19.38        4.49

Error rates (%) of ensembles; the top-5 error is on the test set of ImageNet and reported by the test server:

  method                        top-5 err. (test)
  VGG [41] (ILSVRC’14)          7.32
  GoogLeNet [44] (ILSVRC’14)    6.66
  VGG [41] (v5)                 6.8
  PReLU-net [13]                4.94
  BN-inception [16]             4.82
  ResNet (ILSVRC’15)            3.57

(He et al., 2015)


slide-31
SLIDE 31

Current application domains


slide-32
SLIDE 32

Object detection and segmentation (Pinheiro et al., 2016)


slide-33
SLIDE 33

Human pose estimation (Wei et al., 2016)


slide-34
SLIDE 34

Image generation (Radford et al., 2015)


slide-35
SLIDE 35

Reinforcement learning: self-trained, plays 49 games at human level. (Mnih et al., 2015)


slide-36
SLIDE 36

Strategy games: in March 2016, won 4-1 against a 9-dan professional without handicap. (Silver et al., 2016)


slide-37
SLIDE 37

Translation

“The reason Boeing are doing this is to cram more seats in to make their plane more competitive with our products,” said Kevin Keniston, head of passenger comfort at Europe’s Airbus.

“La raison pour laquelle Boeing fait cela est de créer plus de sièges pour rendre son avion plus compétitif avec nos produits”, a déclaré Kevin Keniston, chef du confort des passagers chez Airbus.

When asked about this, an official of the American administration replied: “The United States is not conducting electronic surveillance aimed at offices of the World Bank and IMF in Washington.”

Interrogé à ce sujet, un fonctionnaire de l’administration américaine a répondu: “Les États-Unis n’effectuent pas de surveillance électronique à l’intention des bureaux de la Banque mondiale et du FMI à Washington”

(Wu et al., 2016)


slide-38
SLIDE 38

Auto-captioning (Vinyals et al., 2015)


slide-39
SLIDE 39

Question answering

I: Jane went to the hallway.
I: Mary walked to the bathroom.
I: Sandra went to the garden.
I: Daniel went back to the garden.
I: Sandra took the milk there.
Q: Where is the milk?
A: garden

I: It started boring, but then it got interesting.
Q: What’s the sentiment?
A: positive

(Kumar et al., 2015)


slide-40
SLIDE 40

Why does it work now?


slide-41
SLIDE 41

The success of deep learning is multi-factorial:

  • Five decades of research in machine learning,
  • CPUs/GPUs/storage developed for other purposes,
  • lots of data from “the internet”,
  • tools and culture of collaborative and reproducible science,
  • resources and efforts from large corporations.


slide-42
SLIDE 42

Five decades of research in ML provided

  • a taxonomy of ML concepts (classification, generative models, clustering, kernels, linear embeddings, etc.),
  • a sound statistical formalization (Bayesian estimation, PAC),
  • a clear picture of fundamental issues (bias/variance dilemma, VC dimension, generalization bounds, etc.),
  • a good understanding of optimization issues,
  • efficient large-scale algorithms.


slide-43
SLIDE 43

From a practical perspective, deep learning

  • lessens the need for a deep mathematical grasp,
  • makes the design of large learning architectures a system/software development task,
  • allows us to leverage modern hardware (clusters of GPUs),
  • does not plateau when using more data,
  • makes large trained networks a commodity.


slide-44
SLIDE 44

[Plot: FLOPs per USD, from 10⁻³ to 10¹², 1960 to 2020 (Wikipedia “FLOPS”)]

                       TFlops (10¹²)   Price   GFlops per $
  Intel i7-6700K       0.2             $344    0.6
  AMD Radeon R-7 240   0.5             $55     9.1
  NVIDIA GTX 750 Ti    1.3             $105    12.3
  AMD RX 480           5.2             $239    21.6
  NVIDIA GTX 1080      8.9             $699    12.7


slide-45
SLIDE 45

[Plot: bytes of storage per USD, from 10³ to 10¹², 1980 to 2020 (John C. McCallum)]

The typical cost of a 4TB hard disk is $120 (Dec 2016).


slide-46
SLIDE 46

[Plots: ImageNet top-1 accuracy (%) vs. operations (G-Ops) and parameter count (5M to 155M) for AlexNet, BN-AlexNet, BN-NIN, LeNet, GoogLeNet, Inception-v3, VGG-16, VGG-19, and ResNet-18/34/50/101; and forward time per image (ms) vs. batch size (1 to 64) for the same networks]

(Canziani et al., 2016)


slide-47
SLIDE 47

Implementing a deep network, PyTorch


slide-48
SLIDE 48

Deep-learning development is usually done in a framework:

               Language(s)             License         Main backer
  PyTorch      Python                  BSD             Facebook
  Caffe2       C++, Python             Apache          Facebook
  TensorFlow   Python, C++             Apache          Google
  MXNet        Python, C++, R, Scala   Apache          Amazon
  CNTK         Python, C++             MIT             Microsoft
  Torch        Lua                     BSD             Facebook
  Theano       Python                  BSD             U. of Montreal
  Caffe        C++                     BSD 2 clauses   U. of CA, Berkeley

A fast, low-level, compiled backend to access computation devices, combined with a slow, high-level, interpreted language.


slide-49
SLIDE 49

We will use PyTorch for our examples. http://pytorch.org

“PyTorch is a python package that provides two high-level features:

  • Tensor computation (like numpy) with strong GPU acceleration
  • Deep Neural Networks built on a tape-based autograd system

You can reuse your favorite python packages such as numpy, scipy and Cython to extend PyTorch when needed.”
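A minimal sketch of the first feature (our own example):

import torch

a = torch.randn(1000, 1000)    # a tensor, much like a numpy array
b = torch.randn(1000, 1000)

if torch.cuda.is_available():  # move the operands to a GPU if there is one
    a, b = a.cuda(), b.cuda()

c = a.mm(b)                    # matrix product, on CPU or GPU transparently
print(c.size())                # torch.Size([1000, 1000])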


slide-50
SLIDE 50

MNIST data-set: 28 × 28 grayscale images, 60k train samples, 10k test samples. (leCun et al., 1998)


slide-51
SLIDE 51

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=5)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=5)
        self.fc1 = nn.Linear(256, 200)
        self.fc2 = nn.Linear(200, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), kernel_size=3))
        x = F.relu(F.max_pool2d(self.conv2(x), kernel_size=2))
        x = x.view(-1, 256)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

model = Net()

mu, std = train_input.data.mean(), train_input.data.std()
train_input.data.sub_(mu).div_(std)

optimizer = optim.SGD(model.parameters(), lr = 1e-1)
criterion, bs = nn.CrossEntropyLoss(), 100

model.cuda()
criterion.cuda()
train_input, train_target = train_input.cuda(), train_target.cuda()

for e in range(10):
    for b in range(0, nb_train_samples, bs):
        output = model(train_input.narrow(0, b, bs))
        loss = criterion(output, train_target.narrow(0, b, bs))
        model.zero_grad()
        loss.backward()
        optimizer.step()

≃7s on a GTX1080, ≃1% test error
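The test error quoted above can be estimated with a loop such as the following. A sketch of ours, assuming a recent PyTorch and test_input / test_target tensors analogous to the training ones:

test_input = test_input.sub(mu).div(std).cuda()  # normalize with the train statistics
test_target = test_target.cuda()

nb_errors = 0
for b in range(0, test_input.size(0), bs):
    output = model(test_input.narrow(0, b, bs))
    pred = output.max(1)[1]  # index of the highest score, i.e. the predicted class
    nb_errors += (pred != test_target.narrow(0, b, bs)).long().sum().item()

print('test error {:.2f}%'.format(100.0 * nb_errors / test_input.size(0)))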



slide-57
SLIDE 57

Learning from data



slide-59
SLIDE 59

The general objective of machine learning is to capture regularity in data to make predictions. In our regression example, we modeled age and blood pressure as being linearly related, to predict the latter from the former.

There are multiple types of inference that we can roughly split into three categories:

  • classification (e.g. object recognition, cancer detection, speech processing),
  • regression (e.g. customer satisfaction, stock prediction, epidemiology), and
  • density estimation (e.g. outlier detection, data visualization, sampling/synthesis).



slide-62
SLIDE 62

Learning consists of finding, in a set F of functionals, a “good” f* (or its parameters’ values), usually defined through a loss l : F × Z → R such that l(f, z) increases with how wrong f is on z. E.g.

  • for classification: l(f, (x, y)) = 1{f(x) ≠ y},
  • for regression: l(f, (x, y)) = (f(x) − y)²,
  • for density estimation: l(q, z) = −log q(z).

We are looking for an f with a small empirical loss

  L(f) = (1/N) ∑_{n=1}^{N} l(f, Z_n).

However, it may poorly reflect the “true” loss on test data.
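These losses are straightforward to compute. A small sketch of ours with toy numpy arrays:

import numpy as np

# Classification: average 0-1 loss
y_true = np.array([0, 1, 1, 0])
y_pred = np.array([0, 1, 0, 0])
print(np.mean(y_pred != y_true))  # 0.25

# Regression: average squared loss
f_x = np.array([1.1, 2.3, 2.9])
y = np.array([1.0, 2.0, 3.0])
print(np.mean((f_x - y) ** 2))

# Density estimation: average negative log-likelihood
q_z = np.array([0.2, 0.5, 0.9])  # q(z) evaluated at the samples
print(np.mean(-np.log(q_z)))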


slide-63
SLIDE 63

Capacity



slide-68
SLIDE 68

Consider a polynomial model

  ∀x, α_0, …, α_D ∈ R,  f(x; α) = ∑_{d=0}^{D} α_d x^d,

and training points (x_n, y_n) ∈ R², n = 1, …, N. Minimize the quadratic loss

  L(α) = ∑_n (f(x_n; α) − y_n)²
       = ∑_n (∑_{d=0}^{D} α_d x_n^d − y_n)²
       = ‖X α − y‖²,

where X is the N × (D+1) matrix with entries X_{n,d} = x_n^d, α = (α_0, …, α_D)ᵀ, and y = (y_1, …, y_N)ᵀ.

This is a standard quadratic problem, for which we have efficient algorithms.


slide-69
SLIDE 69

def fit_polynomial(D, x, y):
    N = x.size(0)
    X = Tensor(N, D + 1)
    Y = Tensor(N, 1)
    # Exercise: avoid the n loop
    for n in range(N):
        for d in range(D + 1):
            X[n, d] = x[n]**d
        Y[n, 0] = y[n]
    # LAPACK's GEneralized Least-Square
    alpha, _ = torch.gels(Y, X)
    return alpha
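A usage sketch of ours; torch.gels was the PyTorch API of the time (recent releases expose torch.linalg.lstsq instead), and the D+1 coefficients are returned in the first rows of alpha:

x = torch.linspace(0, 1, 20)
y = 0.5 + 2 * x - 3 * x**2 + 0.01 * torch.randn(20)  # noisy degree-2 polynomial

alpha = fit_polynomial(2, x, y)
print(alpha[:3].view(-1))  # should be close to (0.5, 2, -3)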


slide-70…80
SLIDES 70–80

[Plots: the training data alone, then its least-squares polynomial fit f* for degrees D = 0 through 9]

slide-81
SLIDE 81

[Plot: train and test error (MSE, log scale from 10⁻³ to 10⁻¹) as a function of the polynomial degree, 1 to 9]



slide-84
SLIDE 84

We define the capacity of a set of predictors as its ability to model an arbitrary functional. Although it is difficult to define precisely, it is quite clear in practice how to increase or decrease it for a given class of models.

  • If the capacity is too low, the predictor does not fit the data. The training error is high, and reflects the test error. ⇒ Under-fitting.
  • If the capacity is too high, the predictor fits the data well, including its noise. The training error is low, and does not reflect the test error. ⇒ Over-fitting (see the sketch below).
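A sketch of this effect on the polynomial example, with our own synthetic data:

import numpy as np

rng = np.random.RandomState(0)
x_train = rng.uniform(0, 1, 15)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.1, 15)
x_test = rng.uniform(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.1, 100)

for D in range(1, 10):
    coeffs = np.polyfit(x_train, y_train, deg=D)  # least-squares fit of degree D
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    # train error keeps shrinking; test error eventually grows
    print(D, train_mse, test_mse)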


slide-85
SLIDE 85

Proper evaluation protocols



slide-89
SLIDE 89

Learning algorithms, in particular deep-learning ones, require the tuning of many meta-parameters. These parameters have a strong impact on the performance, resulting in a “meta” over-fitting through experiments. We must be extra careful with performance estimation. Running our experiment on MNIST 100 times, with randomized weights, we get:

  Worst   Median   Best
  1.3%    1.0%     0.82%
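A table like this comes from a protocol such as the following sketch, where train_and_test is a hypothetical helper wrapping the training code above and returning the test error in percent:

import numpy as np
import torch

errors = []
for seed in range(100):
    torch.manual_seed(seed)          # randomizes the initial weights
    errors.append(train_and_test())  # hypothetical: trains a fresh Net, returns test error

errors = np.sort(np.array(errors))
print('worst {:.2f}% median {:.2f}% best {:.2f}%'.format(
    errors[-1], errors[len(errors) // 2], errors[0]))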



slide-94
SLIDE 94

The ideal development cycle is

  Write code → Train → Test → Results

or, in practice, something like

  Write code ⇄ Train → Test → Results

with iterations between writing code and training. There may be over-fitting, but it does not bias the final performance evaluation.



slide-100
SLIDE 100

Unfortunately, it often looks like

  Write code → Train → Test → Results

with the test results feeding back into the code and the meta-parameters. This should be avoided at all costs. The standard strategy is to have a separate validation set for the tuning:

  Write code → Train → Validation → Test → Results
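A sketch of such a split, with illustrative proportions:

import torch

nb = train_input.size(0)
shuffle = torch.randperm(nb)  # random permutation of the sample indices

nb_val = nb // 5              # e.g. keep 20% of the training data for validation
val_idx, train_idx = shuffle[:nb_val], shuffle[nb_val:]

val_input, val_target = train_input[val_idx], train_target[val_idx]
train_input, train_target = train_input[train_idx], train_target[train_idx]

# Tune meta-parameters against (val_input, val_target);
# touch the test set only once, at the very end.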



slide-102
SLIDE 102

Some data-sets (MNIST!) have been used by thousands of researchers, over millions of experiments, in hundreds of papers. The global overall process then looks more like

  Write code → Train → Test → Results

repeated at the scale of the whole community.



slide-104
SLIDE 104

“Cheating” in machine learning, from bad to “are you kidding?”:

  • “Early evaluation stopping”,
  • meta-parameter (over-)tuning,
  • data-set selection,
  • algorithm data-set specific clauses,
  • seed selection.

The community pushes toward accessible implementations, reference data-sets, leader boards, and constant upgrades of benchmarks.


slide-105
SLIDE 105

The end

slide-106
SLIDE 106

References

  • A. Canziani, A. Paszke, and E. Culurciello. An analysis of deep neural network models for practical applications. CoRR, abs/1605.07678, 2016.
  • K. Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4):193–202, April 1980.
  • K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015.
  • A. Krizhevsky. Learning multiple layers of features from tiny images. Master’s thesis, Department of Computer Science, University of Toronto, 2009.
  • A. Krizhevsky, I. Sutskever, and G. Hinton. Imagenet classification with deep convolutional neural networks. In Neural Information Processing Systems (NIPS), 2012.
  • A. Kumar, O. Irsoy, J. Su, J. Bradbury, R. English, B. Pierce, P. Ondruska, I. Gulrajani, and R. Socher. Ask me anything: Dynamic memory networks for natural language processing. CoRR, abs/1506.07285, 2015.
  • Y. leCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
  • W. S. McCulloch and W. Pitts. A logical calculus of the ideas immanent in nervous activity. The bulletin of mathematical biophysics, 5(4):115–133, 1943.

slide-107
SLIDE 107
  • V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, Feb. 2015.
  • P. O. Pinheiro, T.-Y. Lin, R. Collobert, and P. Dollár. Learning to refine object segments. In European Conference on Computer Vision (ECCV), pages 75–91, 2016.
  • A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. CoRR, abs/1511.06434, 2015.
  • D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Neurocomputing: Foundations of Research, chapter Learning Representations by Back-propagating Errors, pages 696–699. MIT Press, 1988.
  • D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis. Mastering the game of go with deep neural networks and tree search. Nature, 529:484–503, 2016.
  • C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
  • O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. In Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
  • S. Wei, V. Ramakrishna, T. Kanade, and Y. Sheikh. Convolutional pose machines. CoRR, abs/1602.00134, 2016.

slide-108
SLIDE 108
  • Y. Wu, M. Schuster, Z. Chen, Q. V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey, J. Klingner, A. Shah, M. Johnson, X. Liu, L. Kaiser, S. Gouws, Y. Kato, T. Kudo, H. Kazawa, K. Stevens, G. Kurian, N. Patil, W. Wang, C. Young, J. Smith, J. Riesa, A. Rudnick, O. Vinyals, G. Corrado, M. Hughes, and J. Dean. Google’s neural machine translation system: Bridging the gap between human and machine translation. CoRR, abs/1609.08144, 2016.
  • S. Yeung, O. Russakovsky, G. Mori, and L. Fei-Fei. End-to-end learning of action detection from frame glimpses in videos. CoRR, abs/1511.06984, 2015.