Summary
Key topics.
◮ Familiarity with form of basic network gradient.
◮ Deep network initialization.
◮ Minibatches.
◮ Momentum.
Next time: convexity.
Part 2: convexity
Why convexity? Deep networks are not convex in their …
$\bigcap_{i=1}^{m} \{x : a_i^{\top} x \le b_i\}$
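$\mathrm{conv}(S) := \left\{ \sum_{i=1}^{k} \alpha_i x_i \;:\; k \in \mathbb{N},\ x_i \in S,\ \alpha_i \ge 0,\ \sum_{i=1}^{k} \alpha_i = 1 \right\}.$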
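The convex-hull definition above can be illustrated with a minimal sketch. The point set `S`, the helper name `random_convex_combination`, and the random weighting scheme are assumptions made for this example only; they do not come from the slides.

```python
import random


def random_convex_combination(points, rng):
    """Return (weights, x) with x = sum_i weights[i] * points[i].

    The weights are nonnegative and sum to 1, so x lies in the convex
    hull of `points` by the definition above.
    """
    raw = [rng.random() for _ in points]      # nonnegative draws
    total = sum(raw)
    weights = [r / total for r in raw]        # normalize so the weights sum to 1
    dim = len(points[0])
    x = tuple(sum(w * p[j] for w, p in zip(weights, points)) for j in range(dim))
    return weights, x


# Hypothetical example set: the corners of a triangle in the plane.
S = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
rng = random.Random(0)
weights, x = random_convex_combination(S, rng)
print(weights, x)
```

For this particular `S`, every convex combination has nonnegative coordinates whose sum is at most 1, which gives a quick sanity check on the output.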
$\sum_{i=1}^{d} \exp(x_i)$
i
n
Txiyi) = 1
n
$(\cdots)^{\top}(x - x_0)$
$\frac{\partial}{\partial x} f(x)$
$\frac{\partial^2}{\partial x^2} f(x)$
$\tfrac{1}{2}(1 - z)^2$ is 1-strongly-convex.
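A small numerical check of the strong-convexity claim above, as a sketch. It assumes the standard definition (not restated in the surviving slide text) that f is λ-strongly-convex when f(z') ≥ f(z) + f'(z)(z' − z) + (λ/2)(z' − z)² for all z, z'. The function names and the test grid are chosen for this example only.

```python
def f(z):
    """The function from the slide: f(z) = (1/2) * (1 - z)^2."""
    return 0.5 * (1.0 - z) ** 2


def df(z):
    """Derivative of f: f'(z) = z - 1."""
    return z - 1.0


def strong_convexity_gap(z, z_prime, lam=1.0):
    """f(z') minus the lam-strongly-convex quadratic lower bound at z.

    A nonnegative value for every pair (z, z') is what the assumed
    definition of lam-strong convexity requires.
    """
    lower = f(z) + df(z) * (z_prime - z) + 0.5 * lam * (z_prime - z) ** 2
    return f(z_prime) - lower


# Grid of test points in [-2, 2] (an arbitrary choice for illustration).
grid = [k / 4.0 for k in range(-8, 9)]
gaps = [strong_convexity_gap(z, zp) for z in grid for zp in grid]
gaps_too_large = [strong_convexity_gap(z, zp, lam=1.5) for z in grid for zp in grid]

print(min(gaps), min(gaps_too_large))
```

Because f is a quadratic with second derivative 1, the bound with λ = 1 holds with equality (the gap is zero up to rounding), while any λ > 1 makes the gap negative for some pair, so 1 is the largest admissible constant under this definition.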
$x \in \mathbb{R}^d$
i≤t
t
w f(w)
$(\cdots)^{\top}(w' - x).$
$(\cdots)^{\top}(w' - x)$
x f(x).
$(\cdots)^{\top}(X - y)$
$(\cdots)^{\top}\,\mathbb{E}(X - y) = f(y).$