SLIDE 1 Recent Trends in Computer Vision and Deep Learning Systems
Yangqing Jia
Lead Researcher and Manager of AI Platform, Facebook
SLIDE 2
SLIDE 3
SLIDE 4
Computer Vision
SLIDE 5
SLIDE 6
AlexNet
So it begins.
SLIDE 7
VGGNet
Punch it.
SLIDE 8
GoogLeNet
We must go deeper.
SLIDE 9
ResNet
And we took the word seriously
SLIDE 11
ResNeXt
We totally see it coming
SLIDE 12 Pushing the Performance
ImageNet top-5 error (%):
ScSVM 28.2
AlexNet 16.4
VGGNet 7.3
GoogLeNet 6.7
ResNet 3.57
ResNeXt 3.03
SLIDE 13 Why is it challenging?
Gradients, as one example
[Plot: gradient behavior vs. depth (1 to 15); curves: exploding, vanishing, ideal]
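As a rough illustration of the plot above, here is a minimal sketch that assumes every layer scales the backpropagated gradient by the same constant factor. The factors 1.5, 0.5, and 1.0 and the name `gradient_scale` are hypothetical and chosen only to show the three regimes; real networks do not behave this uniformly.

```python
def gradient_scale(per_layer_factor: float, depth: int) -> float:
    """Gradient magnitude after `depth` layers, assuming each layer
    multiplies the gradient by the same factor (a simplification)."""
    return per_layer_factor ** depth


# Depths 1, 3, ..., 15, matching the axis ticks on the slide.
depths = list(range(1, 16, 2))

# Hypothetical per-layer factors, one for each regime named on the slide.
regimes = {"exploding": 1.5, "vanishing": 0.5, "ideal": 1.0}

curves = {
    name: [gradient_scale(factor, d) for d in depths]
    for name, factor in regimes.items()
}

for name, values in curves.items():
    print(name, ["%.3g" % v for v in values])
```

Under this assumption, any factor away from 1.0 compounds exponentially with depth, which is one way to read why very deep networks were hard to train.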
SLIDE 14
SLIDE 15
SLIDE 16
SLIDE 17
Deep Learning Systems
SLIDE 19 Scalability
Run fast, run far
“How do I train on multiple GPUs and machines?”
- Probably the most common question we got from Caffe users
SLIDE 20
Scalability
Run fast, run far
1.2 million =
- # of images in ImageNet1K
- # of new images @FB every 5 mins in 2013
- # of AI jobs per month @FB
SLIDE 21
Scalability
Run fast, run far
[Diagram: L1 L2 L3, L3b L2b L1b, U3 U2 U1]
SLIDE 22
Scalability
Run fast, run far
[Diagram: L1 L2 L3, L3b L2b L1b, U3 U2 U1, R3 R2 R1]
SLIDE 23
Scalability
Run fast, run far
[Diagram, two identical rows: L1 L2 L3, L3b L2b L1b, U3 U2 U1, R3 R2 R1]
SLIDE 24
Scalability
Run fast, run far
[Diagram, two identical rows: L1 L2 L3, L3b L2b L1b, U3 U2 U1, R3 R2 R1]
SLIDE 25 The Return of MPI
"I'm your father", said Allreduce.
Allreduce
- Tree based - O(M log N)
- Ring based - O(M)
- etc.
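To make the ring-based variant concrete, here is a minimal single-process simulation of a ring allreduce (a reduce-scatter phase followed by an all-gather phase). It is a sketch, not MPI or NCCL code: the function name `ring_allreduce` is hypothetical, workers are plain Python lists, and it reads the slide's O(M) as the data each worker sends, which is an assumption.

```python
def ring_allreduce(buffers):
    """Simulate a ring allreduce (sum) over equally sized integer buffers.

    Returns (result_buffers, elements_sent_per_worker).
    Assumes len(buffer) is divisible by the number of workers.
    """
    n = len(buffers)
    m = len(buffers[0])
    assert m % n == 0, "buffer length must divide evenly into n chunks"
    chunk = m // n
    data = [list(b) for b in buffers]
    sent = 0

    def span(c):
        return range(c * chunk, (c + 1) * chunk)

    def exchange(step, offset, combine):
        # Every worker i sends one chunk to its right neighbor (i + 1) % n.
        # Payloads are snapshotted first so all sends happen "at once".
        sends = []
        for i in range(n):
            c = (i + offset - step) % n
            sends.append((c, [data[i][k] for k in span(c)]))
        for i, (c, payload) in enumerate(sends):
            dst = (i + 1) % n
            for k, v in zip(span(c), payload):
                data[dst][k] = combine(data[dst][k], v)

    # Phase 1, reduce-scatter: after n - 1 steps, worker i holds the
    # fully summed chunk (i + 1) % n.
    for step in range(n - 1):
        exchange(step, 0, lambda old, new: old + new)
        sent += chunk

    # Phase 2, all-gather: circulate the finished chunks around the ring.
    for step in range(n - 1):
        exchange(step, 1, lambda old, new: new)
        sent += chunk

    return data, sent


workers = [[w * 10 + k for k in range(8)] for w in range(4)]
result, sent_per_worker = ring_allreduce(workers)
expected = [sum(col) for col in zip(*workers)]
print(result[0] == expected, sent_per_worker)
```

In this simulation each worker sends 2 * (N - 1) * M / N elements, which stays below 2M however many workers join the ring.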
SLIDE 26
And so we scale
SLIDE 28
Quantized Computation
Forget about float, the world is bigger
float: 8 exponent bits, 23 mantissa bits
fp16: 5 exponent bits, 10 mantissa bits
fixed16: 16 bits
fixed8: 8 bits
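The formats above can be tried out with a small sketch. It round-trips values through IEEE 754 half precision using the standard library's `struct` format `e`, and maps floats to signed 8-bit integers with a single symmetric scale. The helper names, the sample weights, and the symmetric scaling scheme are illustrative assumptions, not the scheme used by any particular framework.

```python
import struct


def to_fp16(x: float) -> float:
    """Round-trip a value through IEEE 754 half precision
    (1 sign, 5 exponent, 10 mantissa bits)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_fixed8(values, scale):
    """Map floats to signed 8-bit integers: clamp(round(v / scale), -128, 127)."""
    return [max(-128, min(127, round(v / scale))) for v in values]


def dequantize_fixed8(qvalues, scale):
    """Map 8-bit integers back to approximate float values."""
    return [q * scale for q in qvalues]


# Hypothetical weights, used only for illustration.
weights = [0.1, -0.25, 0.75, 1.0]

# Symmetric scale so the largest magnitude maps to 127 (an assumed choice).
scale = max(abs(w) for w in weights) / 127

q = quantize_fixed8(weights, scale)
restored = dequantize_fixed8(q, scale)
fp16_weights = [to_fp16(w) for w in weights]

print(q)
print(restored)
print(fp16_weights)
```

With round-to-nearest, the 8-bit reconstruction error for values inside the clamped range is at most half of `scale`, which is the trade made for the smaller and cheaper representation.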
SLIDE 29
Why do we care?
Battery life is life.
float add: 0.9
fp16 add: 0.4
fixed16 add: 0.05
fixed8 add: 0.03
float mul: 4.0
fp16 mul: 1.0
fixed8 mul: 0.2
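A quick way to read these numbers is as ratios against float. The sketch below assumes the values pair with the labels in the order listed on the slide and treats them as relative costs, since the slide text gives no unit. The name `savings_vs_float` is hypothetical.

```python
# Values copied from the slide; the label-to-value pairing is an assumption.
add_cost = {"float": 0.9, "fp16": 0.4, "fixed16": 0.05, "fixed8": 0.03}
mul_cost = {"float": 4.0, "fp16": 1.0, "fixed8": 0.2}


def savings_vs_float(costs):
    """How many times cheaper each format is than float for one operation."""
    base = costs["float"]
    return {name: base / cost for name, cost in costs.items()}


add_savings = savings_vs_float(add_cost)
mul_savings = savings_vs_float(mul_cost)

for name, factor in add_savings.items():
    print("add", name, round(factor, 1))
for name, factor in mul_savings.items():
    print("mul", name, round(factor, 1))
```

Read this way, an 8-bit fixed-point multiply costs about a twentieth of a float multiply, which is the battery-life argument in numbers.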
SLIDE 30 How does it perform?
Source: Nvidia https://devblogs.nvidia.com/parallelforall/mixed-precision-programming-cuda-8/
SLIDE 31
Why does it matter for cars?
250 watts: 10 -> 20 TFlops
10 watts: 0.7 -> 1.5 TFlops
SLIDE 33 Portable System
One software to rule them all, and...
AI Math and Algorithms Deployment Platforms
SLIDE 34
SLIDE 35 Portable System
Cloud, Mobile, IoT, Cars, Drones, Coffee makers
Model
C++: auto predictor = caffe2::Predictor(model_file)
Java: public class Predictor implements Caffe2ModelInterface;
SLIDE 36
SLIDE 37 The Land of Deep Learning Systems
Applications: Caffe, Torch, TF, etc...
Core Math: Eigen, CuDNN, NNPack, THNN, MKL
Comms: NCCL, MPI, ZeroMQ, Redis, ...
Low Level: CUDA, OpenGL, OpenCL, Vulkan, ...
Compilers
Databases: LevelDB, RocksDB, Hadoop, Amazon S3, your old disk
Not as complex as a car, but still.
SLIDE 38
SLIDE 39
SLIDE 40 Thank you!
Recent Trends in Computer Vision and Deep Learning Systems
Yangqing Jia