MIXED PRECISION TRAINING
Michael O'Connor
MIXED PRECISION
What is the benefit?
Using mixed precision on Volta, your networks can be:
1. 3-4x faster
2. Lighter on memory consumption and bandwidth pressure
3. Just as powerful, with no architecture change
A MIXED PRECISION SOLUTION
"Master" weights in FP32 Loss (Gradient) Scaling Accumulate to FP32 (Tensor Cores) Imprecise weight updates Gradients underflow Maintain precision
MIXED SOLUTION: FP32 MASTER WEIGHTS
Training loop with FP32 master weights:
1. Copy FP32 master weights to FP16 weights
2. Forward pass in FP16 → FP16 loss
3. Backward pass → FP16 gradients
4. Convert FP16 gradients to FP32 master gradients
5. Apply the update to the FP32 master weights
GRADIENTS RANGE OFFSET
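The range offset exists because many gradient values sit below FP16's smallest representable magnitude (the smallest FP16 subnormal is about 6e-8) and flush to zero; scaling the loss shifts the whole gradient range up into FP16's representable band. A NumPy sketch (the gradient value and scale factor are illustrative):

```python
import numpy as np

# A small gradient value, common late in training...
g = np.float32(1e-8)
# ...flushes to zero in FP16: the smallest FP16 subnormal is ~6e-8.
assert np.float16(g) == 0.0

# Scaling the loss scales every gradient by the same factor,
# offsetting the gradient range into FP16's representable band.
scale = np.float32(1024.0)
g_scaled = np.float16(g * scale)       # ~1e-5, now representable
assert g_scaled != 0.0

# After the backward pass, divide in FP32 to remove the scale.
g_recovered = np.float32(g_scaled) / scale
```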
MIXED PRECISION TRAINING
Training loop with FP32 master weights and loss scaling:
1. Copy FP32 master weights to FP16 weights
2. Forward pass in FP16 → FP32 loss
3. Loss scaling → scaled FP32 loss
4. Backward pass → scaled FP16 gradients, converted to scaled FP32 gradients
5. Remove the scale (plus clipping, etc.) → FP32 gradients
6. Apply the update to the FP32 master weights
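The full loop above can be sketched in NumPy; this is a toy sketch assuming a single linear layer with squared loss (the model, data, learning rate, and scale factor are all illustrative, not from the deck):

```python
import numpy as np

def mixed_precision_step(w_master, x, y, lr=0.01, loss_scale=1024.0):
    """One step of the loop on the slide, on a toy linear model."""
    # 1. Copy: FP32 master weights -> FP16 weights for the forward pass
    w16 = w_master.astype(np.float16)
    x16 = x.astype(np.float16)
    # 2. Forward pass in FP16; loss reported in FP32
    pred = x16 @ w16
    loss = np.float32(np.mean((pred.astype(np.float32) - y) ** 2))
    # 3.-4. Backward pass on the *scaled* loss, computed here analytically
    # in FP16 so small gradients survive
    err16 = pred - y.astype(np.float16)
    grad16 = (np.float16(loss_scale) * np.float16(2.0 / len(y))) * (x16.T @ err16)
    # 5. Convert to FP32 and remove the scale (clipping would go here)
    grad32 = grad16.astype(np.float32) / np.float32(loss_scale)
    # 6. Apply the update to the FP32 master weights
    return w_master - lr * grad32, loss

# Toy usage: two samples of a 2-feature linear problem
w = np.array([0.5, -0.5], dtype=np.float32)
x = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
y = np.array([-1.0, -1.0], dtype=np.float32)
w_new, loss = mixed_precision_step(w, x, y)
```

The master weights never leave FP32, so the step survives both gradient underflow (via the scale) and imprecise weight updates (via the FP32 copy).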
NVCAFFE V0.16 TRAINING ALEXNET
[Chart: images per second over time, single P100 GPU, batch size = 128]
Starting point: nvCaffe 0.15 at 1265 images/s (June 2016), rising to 2568 images/s (May 2017) through:
- Parallelized I/O decode & deserialize
- CPU affinity
- Improved algorithm selection
- Fused weight update
- Parallel AllReduce
- Balanced memory allocation between I/O and conv workspace
RESNET-50 FP32 PERFORMANCE
[Chart: images per second on 1, 2, 4, and 8 GPUs for Caffe, Caffe2, TensorFlow, MXNet, Torch, CNTK, and Chainer]
4/30/2017: DGX-1 with batch size = 64 per GPU. Chainer numbers are preliminary.
RESNET-50 MIXED PRECISION AND FP32
[Chart: images per second on 1, 2, 4, and 8 GPUs for MXNet FP32 (GTC 2017), MXNet FP32 (GTC 2018), and MXNet Mixed (GTC 2018)]
INFORMATION SOURCES
- CE8130 - Connect with the Experts: Deep Learning Training for Volta Tensor Cores (Tue 2 PM)
- S8923 - Training Neural Networks with Mixed Precision: Theory and Practice (Wed 2 PM)
- S81012 - Training Neural Networks with Mixed Precision: Real Examples (Thu 9 AM)
- CE8162 - Connect with the Experts: Deep Learning Training for Volta Tensor Cores (Thu 2 PM)
- Mixed-Precision Training of Deep Neural Networks (NVIDIA Developer Blog)
- Training with Mixed Precision (NVIDIA User Guide)