cuhk Optimization Basics Optimization of training deep neural networks Multi-GPU Training
Optimization for Training Deep Models
Xiaogang Wang
xgwang@ee.cuhk.edu.hk
February 12, 2019
Xiaogang Wang Optimization for Training Deep Models
$\theta^* = \arg\min_\theta J(X^{(\text{train})}, \theta)$
$\frac{\partial}{\partial x_j} f(x)_i$
$\frac{\partial^2}{\partial x_i \partial x_j} f$ tells us how the first derivative will change
$\max_{i,j} \left| \frac{\lambda_i}{\lambda_j} \right|$
Gradient descent fails to exploit the curvature information contained in the Hessian. Here we use gradient descent on a quadratic function whose Hessian matrix has condition number 5. The red lines indicate the path followed by gradient descent. This very elongated quadratic function resembles a long canyon. Gradient descent wastes time repeatedly descending canyon walls, because they are the steepest feature. Because the step size is somewhat too large, it has a tendency to overshoot the bottom of the function and thus needs to descend the opposite canyon wall on the next iteration. The large positive eigenvalue of the Hessian corresponding to the eigenvector pointed in this direction indicates that this directional derivative is rapidly increasing, so an optimization algorithm based on the Hessian could predict that the steepest direction is not actually a promising search direction in this context.
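The canyon behaviour described in this caption can be reproduced with a small sketch. The specific Hessian (eigenvalues 1 and 5, giving condition number 5), the starting point, and the step size 0.35 are assumptions chosen for illustration, not values taken from the figure; `gradient_descent` and `newton_step` are hypothetical helper names.

```python
import numpy as np

# Assumed quadratic f(x) = 0.5 * x^T H x with Hessian eigenvalues 1 and 5,
# so the condition number is 5. The steep (canyon-wall) direction is axis 1.
H = np.diag([1.0, 5.0])

def gradient_descent(x0, step, iters):
    """Plain gradient descent on f; returns the array of iterates."""
    x = np.asarray(x0, dtype=float)
    path = [x.copy()]
    for _ in range(iters):
        x = x - step * (H @ x)  # the gradient of f at x is H x
        path.append(x.copy())
    return np.array(path)

def newton_step(x0):
    """One Newton step: the gradient rescaled by the inverse Hessian."""
    x = np.asarray(x0, dtype=float)
    return x - np.linalg.solve(H, H @ x)

start = [4.0, 1.0]
path = gradient_descent(start, step=0.35, iters=20)
newton = newton_step(start)

# The steep coordinate changes sign at every step (overshooting the canyon
# floor), while the shallow coordinate shrinks slowly.
print(path[:4])
print(newton)
```

With this step size the steep coordinate is multiplied by 1 - 0.35 * 5 = -0.75 at every iteration, which is the overshoot onto the opposite wall, while a single Newton step lands on the minimum of this quadratic because it uses the curvature.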
∂w2
$f'_T f'_{T-1} \cdots f'_2 f'_1$
$f'_t = \frac{\partial f_t(\alpha_t)}{\partial \alpha_t}$
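As a rough illustration of why this product of per-layer derivatives matters, the sketch below multiplies T scalar factors. The depth T = 50 and the constant factor values 0.9 and 1.1 are assumptions made for the example, and `chain_derivative` is a hypothetical helper name.

```python
import numpy as np

def chain_derivative(factors):
    """Derivative of a composition of scalar maps, as the product of the
    per-layer derivatives f'_T ... f'_2 f'_1."""
    return float(np.prod(factors))

T = 50  # assumed depth
shrinking = chain_derivative(np.full(T, 0.9))  # every f'_t = 0.9
growing = chain_derivative(np.full(T, 1.1))    # every f'_t = 1.1

print(shrinking, growing)
```

Factors slightly below 1 drive the product toward zero and factors slightly above 1 make it grow quickly with depth, which is one common way to read the product form.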
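$\mu_i \leftarrow \frac{1}{m} \sum_{n=1}^{m} x_i^{(n)}$
$(\sigma_i)^2 \leftarrow \frac{1}{m} \sum_{n=1}^{m} (x_i^{(n)} - \mu_i)^2$
$\hat{x}_i^{(n)} \leftarrow \frac{x_i^{(n)} - \mu_i}{\sqrt{(\sigma_i)^2 + \epsilon}}$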
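$\frac{\partial J}{\partial (\sigma_i)^2} = \sum_{n=1}^{m} \frac{\partial J}{\partial \hat{x}_i^{(n)}} \cdot (x_i^{(n)} - \mu_i) \cdot \frac{-1}{2} \left((\sigma_i)^2 + \epsilon\right)^{-3/2}$
$\frac{\partial J}{\partial \mu_i} = \sum_{n=1}^{m} \frac{\partial J}{\partial \hat{x}_i^{(n)}} \cdot \frac{-1}{\sqrt{(\sigma_i)^2 + \epsilon}} + \frac{\partial J}{\partial (\sigma_i)^2} \cdot \frac{1}{m} \sum_{n=1}^{m} -2 (x_i^{(n)} - \mu_i)$
$\frac{\partial J}{\partial x_i^{(n)}} = \frac{\partial J}{\partial \hat{x}_i^{(n)}} \cdot \frac{1}{\sqrt{(\sigma_i)^2 + \epsilon}} + \frac{\partial J}{\partial (\sigma_i)^2} \cdot \frac{2 (x_i^{(n)} - \mu_i)}{m} + \frac{\partial J}{\partial \mu_i} \cdot \frac{1}{m}$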
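The normalization step and its gradients can be checked numerically with a short NumPy sketch. It assumes the standard batch-normalization backward pass (gradients with respect to the variance, the mean, and then each input), uses rows as the m mini-batch samples and columns as features i, and leaves out any learned scale and shift. The names `bn_forward`, `bn_backward`, the test loss, and the random upstream gradient `w` are hypothetical.

```python
import numpy as np

def bn_forward(x, eps=1e-5):
    """Normalize each feature (column) over the mini-batch (rows)."""
    mu = x.mean(axis=0)
    var = ((x - mu) ** 2).mean(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return x_hat, (x, mu, var, eps)

def bn_backward(dx_hat, cache):
    """Map the gradient w.r.t. x_hat to the gradient w.r.t. x."""
    x, mu, var, eps = cache
    m = x.shape[0]
    inv_std = 1.0 / np.sqrt(var + eps)
    # gradient w.r.t. the variance
    dvar = np.sum(dx_hat * (x - mu) * -0.5 * (var + eps) ** -1.5, axis=0)
    # gradient w.r.t. the mean (direct path plus path through the variance)
    dmu = np.sum(-dx_hat * inv_std, axis=0) + dvar * np.mean(-2.0 * (x - mu), axis=0)
    # gradient w.r.t. each input sample
    return dx_hat * inv_std + dvar * 2.0 * (x - mu) / m + dmu / m

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))
w = rng.normal(size=(8, 3))  # assumed upstream gradient dJ/dx_hat

x_hat, cache = bn_forward(x)
dx = bn_backward(w, cache)

def loss(x_in):
    """Test loss J = sum(w * x_hat), so that dJ/dx_hat equals w."""
    return float(np.sum(w * bn_forward(x_in)[0]))

# Central-difference estimate of dJ/dx for comparison.
num = np.zeros_like(x)
h = 1e-6
for idx in np.ndindex(*x.shape):
    xp, xm = x.copy(), x.copy()
    xp[idx] += h
    xm[idx] -= h
    num[idx] = (loss(xp) - loss(xm)) / (2 * h)

max_err = float(np.max(np.abs(dx - num)))
print(max_err)
```

The finite-difference comparison is only a sanity check on the algebra; a practical layer would also track running statistics for inference.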
All comparisons are against a 12-core Intel E5-2679v2 CPU @ 2.4GHz running Caffe with Intel MKL 11.1.3.