MIXED PRECISION TRAINING by Michael O'Connor (PowerPoint PPT Presentation)

SLIDE 1

MIXED PRECISION TRAINING

Michael O'Connor

SLIDE 2

MIXED PRECISION

What is the benefit?

Using mixed precision and Volta, your networks can be:
1. 3-4x faster
2. Lighter on memory consumption and bandwidth pressure
3. Just as powerful, with no architecture change

SLIDE 3

A MIXED PRECISION SOLUTION

"Master" weights in FP32 -> fixes imprecise weight updates
Loss (gradient) scaling -> fixes gradient underflow
Accumulate to FP32 (Tensor Cores) -> maintains precision
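The first problem above (imprecise weight updates) can be reproduced numerically. A minimal NumPy sketch, not from the slides, showing why pure-FP16 updates lose small increments while an FP32 master copy keeps them:

```python
import numpy as np

# FP16 has an 11-bit significand, so the spacing between adjacent values
# near 1.0 is 2**-10 (~0.000977). Any update smaller than half that
# spacing rounds away entirely.
w_fp16 = np.float16(1.0)
update = np.float16(1e-4)            # a typical lr * gradient magnitude
print(w_fp16 + update == w_fp16)     # True: the FP16 update is lost

# An FP32 "master" copy of the same weight preserves the update.
w_fp32 = np.float32(1.0)
print(w_fp32 + np.float32(1e-4) == w_fp32)  # False: the update sticks
```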

SLIDE 4

MIXED SOLUTION: FP32 MASTER WEIGHTS

FP32 Master Weights -> (copy) -> FP16 Weights -> (forward pass) -> FP16 Loss -> FP16 Gradients -> (copy) -> FP32 Master Gradients -> (apply) -> FP32 Master Weights
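The flow above can be sketched in a few lines of NumPy. This is a toy illustration with made-up names and a toy one-layer model, not NVIDIA's implementation:

```python
import numpy as np

# Sketch of the master-weights loop:
# FP32 master weights -> copy -> FP16 weights -> forward -> FP16 loss/grads
# -> copy grads back to FP32 -> apply the update to the FP32 master copy.
rng = np.random.default_rng(0)
master_w = rng.normal(size=4).astype(np.float32)  # FP32 master weights
x = rng.normal(size=4).astype(np.float16)         # toy input
lr = np.float32(0.01)

for step in range(3):
    w16 = master_w.astype(np.float16)     # copy: FP32 -> FP16
    # forward pass in FP16 (toy model: loss = 0.5 * (w . x)**2)
    y = np.dot(w16, x)
    grads16 = (y * x).astype(np.float16)  # FP16 gradients dloss/dw
    grads32 = grads16.astype(np.float32)  # copy: FP16 -> FP32
    master_w -= lr * grads32              # apply update in FP32

print(master_w.dtype)  # float32: updates never round away in FP16
```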

SLIDE 5

GRADIENTS RANGE OFFSET
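The range offset the title refers to can be demonstrated directly: many gradient magnitudes fall below FP16's smallest subnormal (~6e-8) and flush to zero, while multiplying by a scale factor shifts them into representable range. A NumPy sketch:

```python
import numpy as np

# A gradient magnitude well below FP16's smallest subnormal (~5.96e-8)
# flushes to zero when stored in half precision.
g = 1e-8                      # a plausible small FP32 gradient value
print(np.float16(g))          # 0.0: underflow, the gradient vanishes

# Scaling by a constant offsets the gradient range into FP16 territory.
scale = 1024.0
print(np.float16(g * scale))  # nonzero: representable and recoverable
```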

SLIDE 6

MIXED PRECISION TRAINING

FP32 Master Weights -> (copy) -> FP16 Weights -> (forward pass) -> FP32 Loss -> (loss scaling) -> Scaled FP32 Loss -> Scaled FP16 Gradients -> Scaled FP32 Gradients -> (remove scale, +clip, etc.) -> FP32 Gradients -> (apply) -> FP32 Master Weights
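A minimal NumPy sketch of this loss-scaling flow, using a toy scalar weight and illustrative names (not NVIDIA's code): scale the loss before backprop so the FP16 gradients don't underflow, then remove the scale in FP32 before applying the update.

```python
import numpy as np

loss_scale = np.float32(1024.0)
master_w = np.float32(0.5)            # FP32 master weight (scalar toy)
lr = np.float32(0.1)

true_grad = np.float32(2e-8)          # would underflow in FP16
unscaled = np.float16(true_grad)      # 0.0: gradient lost without scaling

# Scaling the loss scales every gradient by the same factor, so the
# FP16 gradient survives; the scale is removed again in FP32.
scaled_grad16 = np.float16(true_grad * loss_scale)
grad32 = scaled_grad16.astype(np.float32) / loss_scale  # remove scale
master_w -= lr * grad32               # apply update to the master weight

print(unscaled == 0.0, grad32 > 0.0)  # True True
```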

SLIDE 7

NVCAFFE V0.16 TRAINING ALEXNET

[Chart: images per second training AlexNet on a single P100 GPU, batch size = 128, June 2016 through May 2017. Throughput rises from 1265 img/s (starting point, nvCaffe 0.15) to 2568 img/s.]

Optimizations along the way:
  • Parallelize I/O decode & deserialize
  • CPU affinity
  • Improved algo selection
  • Fused weight update
  • Parallel AllReduce
  • Balance memory alloc (btw. I/O & conv w.s.)
SLIDE 8

RESNET-50 FP32 PERFORMANCE

[Chart: ResNet-50 images per second at 1, 2, 4, and 8 GPUs for Caffe, Caffe2, TensorFlow, MXNet, Torch, CNTK, and Chainer.]

4/30/2017: DGX-1 with batch size = 64 per GPU. Chainer numbers are preliminary.

SLIDE 9

RESNET-50 MIXED PRECISION AND FP32

[Chart: ResNet-50 images per second at 1, 2, 4, and 8 GPUs, comparing MXNet FP32 (GTC 2017), MXNet FP32 (GTC 2018), and MXNet mixed precision (GTC 2018).]

SLIDE 10

INFORMATION SOURCES

Where to learn about mixed precision training:

  • CE8130 - Connect with the Experts: Deep Learning Training for Volta Tensor Cores (Tue 2 PM)
  • S8923 - Training Neural Networks with Mixed Precision: Theory and Practice (Wed 2 PM)
  • S81012 - Training Neural Networks with Mixed Precision: Real Examples (Thu 9 AM)
  • CE8162 - Connect with the Experts: Deep Learning Training for Volta Tensor Cores (Thu 2 PM)
  • Mixed-Precision Training of Deep Neural Networks (NVIDIA Developer Blog)
  • Training with Mixed Precision (NVIDIA User Guide)

SLIDE 11