Mixed Precision Training
PAI Team, Computing Platform Business Unit
Overview
- What is mixed-precision & why mixed-precision
- How mixed-precision works
- Mixed-precision tools on PAI-TensorFlow
- Experimental results

1. What is mixed-precision
(Figure: FP16 vs. FP32 number formats)
Variables: 2^-16 to 2^-4; gradients: 2^-30 to 2^-5
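These ranges can be checked directly: Python's struct module supports the IEEE 754 half-precision format code 'e', so a small round-trip helper (to_fp16, introduced here for illustration) shows how values below FP16's smallest subnormal (2^-24) underflow to zero:

```python
import struct

def to_fp16(x):
    # Round-trip a Python float through IEEE 754 half precision (FP16).
    return struct.unpack('e', struct.pack('e', x))[0]

# Gradients near the bottom of the quoted range (2**-30) fall below
# FP16's smallest subnormal (2**-24) and underflow to zero, while
# typical variable magnitudes survive the conversion.
print(to_fp16(2**-30))   # underflows to 0.0
print(to_fp16(2**-24))   # smallest FP16 subnormal, kept exactly
print(to_fp16(2**-14))   # smallest normal FP16 value, kept exactly
```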
1. Compute-bound
2. Memory-bound
   ① Reductions
   ② Element-wise operations
Take advantage of Tensor Cores:
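The compute-/memory-bound split can be made concrete with a back-of-the-envelope arithmetic-intensity estimate (FLOPs per byte moved); the function below is a rough sketch, not a profiler:

```python
def arithmetic_intensity(m, k, n, bytes_per_elem=2):
    # FLOPs per byte moved for an (m,k) x (k,n) matmul with FP16 operands.
    flops = 2 * m * k * n                              # multiply-adds
    bytes_moved = bytes_per_elem * (m*k + k*n + m*n)   # read A, B; write C
    return flops / bytes_moved

# A large square matmul has high intensity (compute-bound, so Tensor Cores
# pay off); a GEMV-like shape is memory-bound, like reductions and
# element-wise ops, and benefits mainly from halved FP16 memory traffic.
print(arithmetic_intensity(4096, 4096, 4096))  # ~1365 FLOPs/byte
print(arithmetic_intensity(1, 4096, 4096))     # ~1 FLOP/byte
```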
Computation (forward and backward) → can be in MP
Optimizer-related → should be in FP32
MP training (var in FP32):
- Computation in MP
- Optimizer-related: in FP32
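A minimal sketch of the FP32-master-weight idea, using only the standard library (the fp16 helper round-trips a Python float through half precision; the names here are illustrative, not PAI's API):

```python
import struct

def fp16(x):
    # Round-trip through IEEE 754 half precision.
    return struct.unpack('e', struct.pack('e', x))[0]

w = 1.0        # master weight, kept in FP32
lr = 1e-4
for _ in range(100):
    w16 = fp16(w)               # FP16 copy used for forward/backward
    grad = fp16(w16 * 0.001)    # stand-in for an FP16 gradient
    w = w - lr * grad           # update applied to the FP32 master copy

# Had the update been applied in FP16, the tiny step lr*grad (~1e-7)
# would vanish against 1.0, since FP16 spacing just below 1.0 is 2**-11:
assert fp16(1.0 - lr * fp16(0.001)) == 1.0
print(w)   # the FP32 master copy did accumulate the small updates
```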
Scaling strategy
FP32 graph_def → MP graph_def: automatic conversion
Standard optimizer → mixed-precision optimizer
Karras, Tero, et al. "Progressive Growing of GANs for Improved Quality, Stability, and Variation." ICLR 2018.
(CIFAR-10 dataset; samples compared for fp32, mp-no-scaling, and mp-auto-scaling)

Exp.                 fp32     mp-auto-scaling   mp-no-scaling
sliced_wasserstein   9.3764   9.1662            7.9601
Pyramid Embedded Generative Adversarial Network for Automated Font Generation
(Generated font samples compared for fp32, mp-no-scaling, and mp-auto-scaling)
Wide & Deep Learning for Recommender Systems
Task: predict whether a person's income exceeds 50,000 dollars.
Exp        fp32     mp-no-scaling
Accuracy   84.31%   84.27%
Loss scaling (smaller inputs, bigger derivatives):
- Activations are kept in FP16, so as to reduce memory usage.
- Small derivatives underflow in FP16, so the loss is scaled up by a factor S before backpropagation; the resulting weight gradients are then multiplied by 1/S rather than 1.0, undoing the scaling before the update.
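The scaling idea can be sketched with the standard library's half-precision round-trip (S = 1024 is an arbitrary power of two chosen for illustration; the fp16 helper name is ours, not PAI's API):

```python
import struct

def fp16(x):
    # Round-trip a Python float through IEEE 754 half precision.
    return struct.unpack('e', struct.pack('e', x))[0]

S = 1024.0                  # loss scale factor (illustrative)
true_grad = 2**-27          # an activation gradient below FP16's range

assert fp16(true_grad) == 0.0     # without scaling: underflows to zero
scaled = fp16(true_grad * S)      # backward pass carries S * grad in FP16
recovered = scaled / S            # multiply by 1/S in FP32 to unscale
assert recovered == true_grad     # the gradient is recovered exactly
print(recovered)
```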
(Histograms of activations and activation gradients)