Introduction Feedforward Neural Networks
Deep learning
Jérémy Fix
CentraleSupélec, jeremy.fix@centralesupelec.fr
2016
Introduction and historical perspective [Schmidhuber, 2015]
Skip layer connection
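Figure: perceptron with sensory units x0, x1, x2, x3, associative units a_j = φ_j(x) for j = 0, 1, 2, and result units r0, r1, r2, each obtained from a weighted sum Σ of the a_j (weights w_ij) passed through g.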
Figure: geometric view of the perceptron weight update. Case y_i = +1: w becomes w + φ(x_i). Case y_i = −1: w becomes w − φ(x_i).
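The two cases in the figure can be written as the single update w ← w + y_i φ(x_i). The sketch below is a minimal illustration under assumptions that are not in the slides: the update is applied only to misclassified samples, the function name `perceptron_train` is hypothetical, and the toy data (with a constant first feature acting as a bias) is made up for the example.

```python
import numpy as np

def perceptron_train(phi, y, epochs=100):
    """phi: (n, d) array of feature vectors phi(x_i); y: (n,) labels in {-1, +1}."""
    w = np.zeros(phi.shape[1])
    for _ in range(epochs):
        mistakes = 0
        for phi_i, y_i in zip(phi, y):
            # Assumption: update only when the sample is misclassified
            # (or lies exactly on the decision boundary).
            if y_i * np.dot(w, phi_i) <= 0:
                # w + phi(x_i) when y_i = +1, w - phi(x_i) when y_i = -1
                w = w + y_i * phi_i
                mistakes += 1
        if mistakes == 0:  # every sample is on the correct side
            break
    return w

# Made-up, linearly separable toy data; the first feature is a constant 1.
phi = np.array([[1.0, 2.0, 1.0],
                [1.0, 1.0, 3.0],
                [1.0, -1.0, -2.0],
                [1.0, -2.0, -1.0]])
y = np.array([1, 1, -1, -1])

w = perceptron_train(phi, y)
predictions = np.sign(phi @ w)
```

On separable data such as this, the loop stops as soon as one full pass produces no update.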
$\sigma(x) = \frac{1}{1+\exp(-x)}, \quad \sigma(x) \in [0, 1]$
$\frac{d}{dx}\sigma(x) = \sigma(x)(1 - \sigma(x))$
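The derivative identity can be checked numerically. The sketch below compares σ(x)(1 − σ(x)) with a central finite difference; the helper names `sigmoid` and `sigmoid_prime`, the grid of test points, and the step size `h` are choices made for this example.

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    # closed form: sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.linspace(-4.0, 4.0, 9)
h = 1e-5
# central finite difference approximation of d/dx sigma(x)
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
max_gap = np.max(np.abs(numeric - sigmoid_prime(x)))
```

The derivative peaks at x = 0 with value 0.25 and shrinks toward zero for large |x|.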
j x)
k x)
j
Figure: feedforward network with layers 0 (inputs x0, x1, x2, x3), 1, ..., L−1, L. Unit i of layer l has pre-activation $a_i^{(l)}$ and output $y_i^{(l)}$; the weights of layer l are $w_{ij}^{(l)}$.

$y_i^{(L)} = f(a_i^{(L)}), \quad a_i^{(L)} = \sum_j w_{ij}^{(L)} y_j^{(L-1)}$
$y_i^{(L-1)} = g(a_i^{(L-1)}), \quad a_i^{(L-1)} = \sum_j w_{ij}^{(L-1)} y_j^{(L-2)}$
$y_i^{(1)} = g(a_i^{(1)}), \quad a_i^{(1)} = \sum_j w_{ij}^{(1)} x_j$

Named MLP for historical reasons. Should be called FNN.
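The layer equations amount to repeated matrix-vector products followed by an elementwise transfer function. The sketch below is one possible reading, with several assumptions that are not in the slides: the hidden transfer function g is tanh, the output function f is the identity, the layer sizes are arbitrary, and biases are left out because the equations do not show them.

```python
import numpy as np

def forward(x, weights, g=np.tanh, f=lambda a: a):
    """weights[l-1] holds the matrix W^(l) with entries w_ij^(l), for l = 1..L."""
    y = x
    n_layers = len(weights)
    for l, W in enumerate(weights, start=1):
        a = W @ y                           # a_i^(l) = sum_j w_ij^(l) y_j^(l-1)
        y = f(a) if l == n_layers else g(a)  # f on the output layer, g elsewhere
    return y

rng = np.random.default_rng(0)
sizes = [4, 5, 3, 2]                         # layer 0 has the four inputs x0..x3
weights = [rng.normal(size=(n_out, n_in))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]

x = np.array([1.0, 0.5, -0.3, 2.0])
y_out = forward(x, weights)
```

A bias can be added without changing the code by appending a constant 1 to each layer's output and one extra column to each weight matrix.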
Figure: the same network, labeled Input (layer 0), Hidden layers (layers 1 to L−1), and Output (layer L), with constant units 1 drawn alongside the layers; the layer equations are unchanged.
$e^{a_j}$
Backpropagation is usually attributed to [Rumelhart, 1986], but [Werbos, 1981] had already introduced the idea.
Choromanska, 2015: The Loss Surface of Multilayer Nets
Dauphin, 2014: Identifying and attacking the saddle point problem in high-dimensional non-convex optimization
Pascanu, 2014: On the saddle point problem for non-convex optimization
Figure: plot with axes x and y.
Figures (three slides): a panel with axes b and w next to a panel of log(J) against iteration (up to 1000).
Figures (two slides): two and three side-by-side panels of log(J) against iteration (up to 1000).
Figure: two panels, each with axes b and w.
B + ε
Figure: convolutional layer on a 3-channel input. Labels: 1 kernel; K kernels, K feature maps; Bias; ReLU; Max with stride s; K feature maps to K feature maps.
Layer                       Representation size   Input RF size
Input                       28x28                 1x1
Conv 3x3, same, stride 1    28x28                 3x3
Conv 3x3, same, stride 1    28x28                 5x5
Max 2x2, stride 2           14x14                 6x6
Conv 3x3, same, stride 1    14x14                 10x10
Conv 3x3, same, stride 1    14x14                 14x14
Max 2x2, stride 2           7x7                   15x15
Conv 3x3, same, stride 1    7x7                   23x23
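The representation sizes and receptive field (RF) sizes in this stack can be recomputed with the usual recurrence: a layer with kernel size k grows the RF by (k − 1) times the current spacing between neighbouring units, measured in input pixels, and that spacing is then multiplied by the layer's stride. The sketch below assumes this recurrence, "same" padding for the convolutions, and no padding for the max pooling; the function name `receptive_fields` is hypothetical. Under these assumptions the first six rows agree with the slide, while the last two come out as 16x16 and 24x24 instead of the listed 15x15 and 23x23, so the slide may use a different convention for the second pooling layer.

```python
def receptive_fields(layers, input_size=28):
    """layers: list of (name, kernel, stride, same_padding) tuples.
    Returns a list of (name, representation_size, rf_size) rows."""
    size, rf, jump = input_size, 1, 1   # jump: spacing of units in input pixels
    rows = [("Input", size, rf)]
    for name, k, stride, same in layers:
        rf = rf + (k - 1) * jump        # RF grows with the current spacing
        if same:
            size = -(-size // stride)   # ceil(size / stride)
        else:
            size = (size - k) // stride + 1
        jump *= stride
        rows.append((name, size, rf))
    return rows

conv = ("Conv 3x3", 3, 1, True)
pool = ("Max 2x2", 2, 2, False)
rows = receptive_fields([conv, conv, pool, conv, conv, pool, conv])

for name, size, rf in rows:
    print(f"{name:10s} {size}x{size}   RF {rf}x{rf}")
```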
Figure: 1x1 convolution. Input of size width x height with n channels; m filters of size 1 x 1 x n; output of size width x height with m channels, followed by ReLU.
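A 1x1 convolution applies the same n-to-m linear map at every pixel, so it can be written as a matrix product over the channel axis. The sketch below assumes a channels-last layout and includes a bias term, which the figure does not show; the function name `conv1x1_relu` and the tensor sizes are made up for the example.

```python
import numpy as np

def conv1x1_relu(x, W, b):
    """x: (height, width, n) input; W: (n, m) holds the m filters of size 1x1xn;
    b: (m,) bias (an assumption, not in the figure). Returns (height, width, m)."""
    # matmul broadcasts over the two spatial axes: each pixel's n channels
    # are mixed into m channels by the same matrix W
    return np.maximum(x @ W + b, 0.0)

rng = np.random.default_rng(0)
height, width, n, m = 6, 5, 8, 3
x = rng.normal(size=(height, width, n))
W = rng.normal(size=(n, m))
b = np.zeros(m)

out = conv1x1_relu(x, W, b)
```

The spatial size is unchanged; only the number of channels goes from n to m.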
Recurrent Neural Networks
Figure: feedforward network fed by a delay line holding x_t, x_{t−1}, x_{t−2}, x_{t−3}, x_{t−4}, x_{t−5}, x_{t−6}, followed by hidden layers and an output layer.
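A delay line turns a sequence into fixed-size input vectors, so that an ordinary feedforward network can process it. The sketch below builds these vectors from a 1-D series with six delays, matching x_t to x_{t−6} in the figure; the function name `delay_line` and the toy series are assumptions made for the example.

```python
import numpy as np

def delay_line(series, n_delays=6):
    """Return an array whose row for time t is [x_t, x_{t-1}, ..., x_{t-n_delays}].
    Only times t >= n_delays are included, so every row is complete."""
    series = np.asarray(series)
    rows = [series[t - n_delays:t + 1][::-1]   # newest sample first
            for t in range(n_delays, len(series))]
    return np.stack(rows)

x = np.arange(10.0)        # toy series: x_t = t
inputs = delay_line(x)     # each row is one input vector for the network
```

The network only sees the last seven samples at each step, which is the limitation that motivates recurrent connections.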