Industrial Level Deep Learning Training Infrastructure
—the Practice and Experience from SenseTime
Shengen Yan
SenseTime Group Limited
The Success of Deep Learning
[Figure: timeline from 2006 to 2016, annotated "Google Search" and "AlexNet won ImageNet".]
The Key to High Performance
[Figure: network depth and training time (years, months, weeks, days) for successive networks.]
# Layers: LeNet 5; AlexNet (2012) 8; GoogleNet (2014) 22; ResNet (2016) 169; Ours 1207
Accelerate the training time from several years to several days!
A deep learning framework that is efficient, scalable, and flexible.
A large-scale cluster platform designed for deep learning.
Delivers many application models
The deep learning community has developed frameworks to make life easier.
[Figure: GoogleNet (2014) network graph with parallel branches.]
Apply high-level compiler backend optimization algorithms to the intermediate representation.
Optimizations: liveness analysis, computation graph optimization.
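Liveness analysis finds, for every intermediate tensor, the last operation that reads it; after that point its memory can be released or reused. The sketch below is a minimal illustration, assuming a hypothetical graph format (a topologically ordered list of ops, each with named inputs, one output, and an output size in MB). It is not the framework's actual memory planner; it only compares peak memory with and without freeing dead tensors.

```python
from typing import Dict, List, Tuple

# Hypothetical op record: (output tensor, input tensors, output size in MB).
Op = Tuple[str, List[str], int]


def last_use(ops: List[Op]) -> Dict[str, int]:
    """Liveness analysis: index of the last op that reads each tensor."""
    last: Dict[str, int] = {}
    for i, (out, inputs, _) in enumerate(ops):
        last.setdefault(out, i)  # an unread tensor dies where it is produced
        for name in inputs:
            last[name] = i
    return last


def peak_memory(ops: List[Op], free_dead: bool) -> int:
    """Simulate execution and return the peak MB held by op outputs."""
    last = last_use(ops)
    live: Dict[str, int] = {}
    peak = 0
    for i, (out, _, size) in enumerate(ops):
        live[out] = size                      # allocate this op's output
        peak = max(peak, sum(live.values()))  # inputs and output coexist here
        if free_dead:
            # Release every tensor that no later op will read.
            for name in [n for n in live if last[n] <= i]:
                del live[name]
    return peak


# A small graph with a skip connection: d reads both c and a.
# The graph input "x" is not counted as allocated memory.
ops: List[Op] = [
    ("a", ["x"], 10),
    ("b", ["a"], 10),
    ("c", ["b"], 10),
    ("d", ["c", "a"], 10),
]
print(peak_memory(ops, free_dead=False))  # 40
print(peak_memory(ops, free_dead=True))   # 30
```

The skip connection keeps `a` alive until `d`, so only `b` can be freed early here. In a training graph the backward ops are part of the same list, so forward activations naturally stay live until the gradient op that reads them.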
[Figure: generated graph with mirror (re-compute) nodes.]
Chen T, Xu B, Zhang C, et al. Training deep nets with sublinear memory cost. arXiv preprint arXiv:1604.06174, 2016.
[Figure: memory usage efficiency (higher is better) of Ours, MxNet, TensorFlow, Chainer, Caffe, and Torch on VGG, ResNet50, ResNet152, Inception V4, ResNet269, and Inception ResNet.]
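Re-computation (the "mirror" nodes from Chen et al.) keeps only some activations during the forward pass and recomputes the rest during the backward pass. The following is a minimal sketch on a toy chain of scalar `tanh` layers; the functions `layer`, `layer_grad`, `grad_store_all`, and `grad_checkpointed` are hypothetical illustrations, not any framework's API. It keeps every k-th layer input as a checkpoint, recomputes one segment at a time in backward, and checks that the gradient is unchanged while far fewer values are held.

```python
import math
from typing import Dict, List, Tuple


def layer(x: float) -> float:
    """One toy 'layer': y = tanh(x)."""
    return math.tanh(x)


def layer_grad(x: float, grad_out: float) -> float:
    """Backward of tanh; it needs the layer's input x."""
    t = math.tanh(x)
    return grad_out * (1.0 - t * t)


def grad_store_all(x0: float, n: int) -> Tuple[float, int]:
    """Plain backprop: keep every layer input. Returns (grad, values stored)."""
    inputs: List[float] = []
    x = x0
    for _ in range(n):
        inputs.append(x)
        x = layer(x)
    grad = 1.0
    for x_in in reversed(inputs):
        grad = layer_grad(x_in, grad)
    return grad, len(inputs)


def grad_checkpointed(x0: float, n: int, k: int) -> Tuple[float, int]:
    """Keep only every k-th layer input; recompute each segment in backward."""
    checkpoints: Dict[int, float] = {}
    x = x0
    for i in range(n):
        if i % k == 0:
            checkpoints[i] = x
        x = layer(x)
    grad, peak = 1.0, 0
    for start in sorted(checkpoints, reverse=True):
        end = min(start + k, n)
        seg = [checkpoints[start]]  # re-run the forward pass inside the segment
        for _ in range(start, end - 1):
            seg.append(layer(seg[-1]))
        # Stored values: all checkpoints plus the recomputed ones (seg[0] is a checkpoint).
        peak = max(peak, len(checkpoints) + len(seg) - 1)
        for x_in in reversed(seg):
            grad = layer_grad(x_in, grad)
    return grad, peak


g_full, mem_full = grad_store_all(0.5, 100)
g_ckpt, mem_ckpt = grad_checkpointed(0.5, 100, k=10)  # k = sqrt(100)
print(mem_full, mem_ckpt)            # 100 19
print(math.isclose(g_full, g_ckpt))  # True
```

Choosing k near the square root of the number of layers balances the n/k checkpoints against the k recomputed values, which is the sublinear O(sqrt(n)) memory result of the cited paper; the price is roughly one extra forward pass.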
Training time per iteration (milliseconds / iteration; lower is better)
Framework     Batch-32   Batch-64   Batch-128
Caffe         497.5      1045       1965
Chainer       200        290        543
TensorFlow    178.6      315.7      587.2
Parrots       122.7      225.6      471
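The timing numbers convert directly into throughput and relative speedup. This short sketch simply re-enters the measured milliseconds per iteration; the helper names `throughput` and `speedup` are hypothetical, and the slide does not state which model or hardware produced these timings.

```python
# Milliseconds per iteration, re-entered from the framework comparison.
MS_PER_ITER = {
    "Caffe":      {32: 497.5, 64: 1045.0, 128: 1965.0},
    "Chainer":    {32: 200.0, 64: 290.0,  128: 543.0},
    "TensorFlow": {32: 178.6, 64: 315.7,  128: 587.2},
    "Parrots":    {32: 122.7, 64: 225.6,  128: 471.0},
}


def throughput(framework: str, batch: int) -> float:
    """Samples per second = batch size / seconds per iteration."""
    return batch / (MS_PER_ITER[framework][batch] / 1000.0)


def speedup(baseline: str, target: str, batch: int) -> float:
    """How many times faster `target` runs one iteration than `baseline`."""
    return MS_PER_ITER[baseline][batch] / MS_PER_ITER[target][batch]


for batch in (32, 64, 128):
    fastest = min(MS_PER_ITER, key=lambda fw: MS_PER_ITER[fw][batch])
    print(batch, fastest, round(speedup("Caffe", "Parrots", batch), 2))
```

At batch 128, for example, 1965 / 471 gives a speedup of about 4.17 over Caffe, and Parrots processes about 271.8 samples per second.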
Supports multiple GPUs and multiple nodes.
Three procedures: Copy, Allreduce, Copy.
Optimizations target communication and computation overhead.
[Figure: GPU0 to GPU3, CPU memory, and other nodes, connected by the Copy, Allreduce, and Copy steps.]
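The three procedures can be simulated in plain Python to show how gradients flow. This is a minimal sketch under stated assumptions: Python lists stand in for GPU and host buffers, the first Copy step is assumed to also sum the node's GPU gradients in CPU memory, and the final Copy is assumed to write averaged gradients back. The slide names only the three steps, so the functions below (`copy_to_host`, `allreduce`, `copy_to_gpus`, `hierarchical_allreduce`) are hypothetical and do not model the communication/computation overhead optimizations.

```python
from typing import List

Vector = List[float]


def copy_to_host(gpu_grads: List[Vector]) -> Vector:
    """Copy: gather one node's GPU gradients into CPU memory, summed element-wise."""
    return [sum(vals) for vals in zip(*gpu_grads)]


def allreduce(host_buffers: List[Vector]) -> List[Vector]:
    """Allreduce: every node ends up with the element-wise sum over all nodes."""
    total = [sum(vals) for vals in zip(*host_buffers)]
    return [list(total) for _ in host_buffers]


def copy_to_gpus(host_buffer: Vector, n_gpus: int, world_size: int) -> List[Vector]:
    """Copy: write the averaged gradient back to every GPU of the node."""
    avg = [v / world_size for v in host_buffer]
    return [list(avg) for _ in range(n_gpus)]


def hierarchical_allreduce(cluster: List[List[Vector]]) -> List[List[Vector]]:
    """cluster[node][gpu] is one GPU's gradient vector."""
    world_size = sum(len(node) for node in cluster)
    hosts = [copy_to_host(node) for node in cluster]
    reduced = allreduce(hosts)
    return [copy_to_gpus(buf, len(node), world_size)
            for buf, node in zip(reduced, cluster)]


# Two nodes with two GPUs each; every GPU should end with the global mean.
cluster = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
print(hierarchical_allreduce(cluster))
```

Reducing inside each node first means only one buffer per node crosses the network in the Allreduce step, instead of one per GPU.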
[Figure: milliseconds per iteration and scale efficiency versus # GPUs (1, 2, 3, 4, 8, 16, 24, 32), for a single node and for multiple nodes.]
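Scale efficiency is commonly defined as the throughput measured on n GPUs divided by n times the single-GPU throughput. The slide does not give its exact definition, so this sketch assumes data-parallel training with a fixed per-GPU batch, where the ratio reduces to the single-GPU time per iteration divided by the n-GPU time per iteration. The helper names and the timings in the example are hypothetical, not values read from the figure.

```python
def throughput(batch_per_gpu: int, n_gpus: int, ms_per_iter: float) -> float:
    """Samples per second for data-parallel training with a fixed per-GPU batch."""
    return batch_per_gpu * n_gpus / (ms_per_iter / 1000.0)


def scale_efficiency(t1_ms: float, tn_ms: float, n_gpus: int, batch_per_gpu: int) -> float:
    """Throughput on n GPUs divided by n times the single-GPU throughput."""
    return throughput(batch_per_gpu, n_gpus, tn_ms) / (
        n_gpus * throughput(batch_per_gpu, 1, t1_ms))


# Hypothetical timings: 100 ms/iter on 1 GPU, 125 ms/iter on 32 GPUs.
eff = scale_efficiency(t1_ms=100.0, tn_ms=125.0, n_gpus=32, batch_per_gpu=32)
print(round(eff, 3))  # 0.8
```

An efficiency of 1.0 means perfectly linear scaling; values drop below 1.0 as communication time grows relative to computation, which is why scaling across multiple nodes is usually harder than within a single node.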
It is just like the highways in a city: a key infrastructure for AI.
The key infrastructures for AI research.
DATA
COMPUTATION
MODEL
DeepLink
Designed for Deep Learning
Software-hardware co-design: high-performance hardware plus customized middleware.
Maximize the respective strengths of each while ensuring optimal cooperation.
Heterogeneous deep learning supercomputer
High-speed storage system
Operation/maintenance/monitoring system
Lightweight virtualization
Task scheduling system
Distributed training software
Deep learning training visualization system
Customized communication library for deep learning
Computation library
Distributed cache system
Software Platform
>3000 GPUs