Large-scale GPU Deep Learning Platform Design and Case Analysis - PowerPoint PPT Presentation



SLIDE 1

YOUR SUCCESS, WE SUCCEED

Large-scale GPU Deep Learning Platform Design and Case Analysis

Alfie Lew Zhang Qing

SLIDE 2

AI Age Has Arrived

Steam Age

  • In the 1760s
  • The first technological revolution

Electric Age

  • In the 1870s
  • The second technological revolution

Information Age

  • In the 1940s~1950s
  • The third technological revolution

AI Age

  • In 2012
  • The fourth technological revolution

SLIDE 3

AI Application Trend

  • More and more users
    – The Internet
    – Security and surveillance
    – Finance, health care
    – Car manufacturers
    – Robots, entertainment

  • More and more application scenarios
    – Image/video analysis
    – Speech recognition
    – NLP/OCR
    – …

Smart city · Finance · Medical care · Automobile · Household · Entertainment

SLIDE 4

Deep Learning Process Flow

Data sets → Data Preprocessing → Training → Model → Inference (example outputs: "Abnormal", "Thank you")

SLIDE 5

Deep Learning Computing Characteristics

Inference

High throughput and low latency

Training

Extreme Computing and Communication Intensity

Data Preprocess

High IO Intensity

SLIDE 6

Deep Learning Computing System Trend

  • Computing Mode
    – From single node to clusters
    – From local to cloud

  • Data Storage
    – From dedicated storage (training and inference) to unified storage

  • System Management
    – Development platform
    – Production platform
    – Cloud platform

  • Application Mode
    – From single user to multi-user
    – From single framework to multiple frameworks

SLIDE 7

Deep Learning Challenges

  • Obtaining large amounts of labeled data and preprocessing them efficiently
  • Implementing distributed parallel neural-network algorithms for speed, scale, and expandability
  • Building a large-scale deep learning computing platform

SLIDE 8

Architecture of Large-scale Deep Learning System

App Level
  • Image/video apps, speech apps, NLP apps

Framework Level
  • Caffe-MPI, TensorFlow, Caffe, CNTK, MXNet

Management Level
  • Inspur AIStation: monitoring, management, scheduling, image management, application analysis
  • Inspur Teye

Platform Level
  • GPU training platform, GPU inference platform, CPU pre-processing platform
  • Parallel storage, 10GbE/IB network

SLIDE 9

Deep Learning Challenges - Platform Level Design

  • IO efficiency for data pre-processing
  • Computing resources required for modeling, tuning, and optimization
  • Inference speed and throughput for processing large numbers of samples

SLIDE 10

Architecture of Large-scale Deep Learning Platform

  • Computing Architecture
    – Data preprocessing platform: CPU cluster
    – Training platform: CPU + P100/P40 GPUs, HPC cluster
    – Inference platform: CPU + P4 GPUs, Hadoop

  • Data Storage
    – Offline with Lustre
    – Online with HDFS

  • Network
    – Offline with InfiniBand
    – Online with 10GbE
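The stage-to-platform mapping on this slide can be sketched as a small lookup table. This is purely illustrative, not a real configuration format; pairing training and preprocessing with Lustre/InfiniBand and inference with HDFS/10GbE follows the slide's offline/online split.

```python
# Illustrative mapping of pipeline stages to the compute, storage,
# and network resources described on this slide (names are the
# slide's, the dict layout is hypothetical).
PLATFORMS = {
    "preprocessing": {"compute": "CPU cluster",
                      "storage": "Lustre", "network": "InfiniBand"},
    "training": {"compute": "CPU + P100/P40 GPU (HPC cluster)",
                 "storage": "Lustre", "network": "InfiniBand"},
    "inference": {"compute": "CPU + P4 GPU (Hadoop)",
                  "storage": "HDFS", "network": "10GbE"},
}

def platform_for(stage):
    """Look up the resources assigned to one pipeline stage."""
    return PLATFORMS[stage]
```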

SLIDE 11

Deep Learning Challenges - Management Layer

  • Managing different computing platforms and their configurations/devices
  • Managing different frameworks for different computing tasks
  • Managing the whole system and monitoring the various computing tasks

SLIDE 12

Deep Learning Management System

AIStation is deep learning cluster and training-task management software. It rapidly deploys training environments for deep learning, comprehensively manages deep learning training tasks, and provides an efficient, convenient platform for users.

Key Functions
  • GPU & CPU monitoring
  • Deployment of the deep learning environment
  • Management of deep learning training tasks
  • GPU resource management and scheduling
  • Cluster statistics & reports

SLIDE 13

AIStation - Workflow

Workflow stages: user interaction → compose training jobs → resource scheduling (assign GPUs) → container installation → containers run → applications start → training management.

  • Compose training jobs
    1. Resources: GPU
    2. Templates: TF1
    3. Images: TF/v1.0
    4. Parameters: ps, ws…
    5. Data: volume
  • Resource scheduling
    1. Job starter
    2. TF1.yaml
  • Containers run / applications start
    1. Run containers
    2. Execute job commands
  • User interaction / training management
    1. Shell access
    2. VNC access
    3. Training visualization
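The job-composition step above (resources, template, image, parameters, data volume) can be sketched as a job specification plus a minimal validation check. The field names and the `validate_job_spec` helper are hypothetical illustrations, not AIStation's actual schema or API.

```python
# Hypothetical training-job specification mirroring the five items
# composed in the AIStation workflow (field names are illustrative).
job_spec = {
    "name": "tf-training-job",
    "resources": {"gpus": 4},               # 1. Resources: GPU
    "template": "TF1",                      # 2. Template
    "image": "TF/v1.0",                     # 3. Container image
    "parameters": {"ps": 1, "workers": 4},  # 4. Parameters: ps, ws...
    "data": {"volume": "/mnt/datasets"},    # 5. Data volume
}

def validate_job_spec(spec):
    """Check that the fields a scheduler would need are present."""
    required = {"name", "resources", "image"}
    missing = required - spec.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return True
```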

SLIDE 14

AIStation - Integrating Deep Learning Frameworks

  – Supports multiple deep learning frameworks: Caffe, TensorFlow, CNTK, etc.
  – Supports various models: GoogleNet, VGG, ResNet, etc.
  – One-key deployment of the deep learning environment
  – Training job submission & scheduling
  – Training process management & visualization

GPU Resource Utilization: 20%
Training Jobs Throughput: 30%

SLIDE 15

Teye: Application Optimization Analysis Tool

  • Analyzes application bottlenecks and characteristics
    – GPU driver data: clock, ECC, power
    – GPU runtime data: memory utilization, memory copy, cache, SP/DP GFLOPS
    – CPU runtime info: AVX, SSE, SP/DP GFLOPS, CPI
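A tool that samples counters like these typically classifies each phase of an application as compute-bound or memory-bound by comparing achieved rates against hardware peaks. Here is a minimal illustrative sketch of that idea; the function and all numbers are hypothetical, not Teye's actual analysis.

```python
# Sketch of a roofline-style bottleneck check: compare the fraction of
# peak compute achieved against the fraction of peak memory bandwidth
# achieved, and report whichever dominates.
def classify_bottleneck(sp_gflops, mem_bw_gbs, peak_gflops, peak_bw_gbs):
    """Classify a sampled phase as compute-bound or memory-bound."""
    compute_frac = sp_gflops / peak_gflops   # achieved / peak FLOPS
    memory_frac = mem_bw_gbs / peak_bw_gbs   # achieved / peak bandwidth
    return "compute-bound" if compute_frac >= memory_frac else "memory-bound"
```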

SLIDE 16

Deep Learning Challenges - Framework

  • How to select from the many deep learning frameworks?

Caffe, TensorFlow, MXNet, CNTK, Torch, Theano, DeepLearning4j, PaddlePaddle …

  • What framework to use for a given scenario and model?
  • Using a single framework or multiple frameworks?
SLIDE 17

A Framework Comparison

  • Compute platform: Inspur SR-AI Rack (16 GPUs) + AIStation + Teye (management)
  • Frameworks: Caffe, TensorFlow, MXNet
  • Models: AlexNet, GoogLeNet
  • Performance
    – AlexNet: 4675.799 images/s (16 GPUs vs. 1 GPU = 14X speedup) → Caffe is best
    – GoogLeNet: 2462 images/s (16 GPUs vs. 1 GPU = 13X speedup) → MXNet is best
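The speedup figures above imply how close each run comes to linear scaling. A quick check, assuming the 14X and 13X figures are 16-GPU vs. single-GPU speedups as the slide suggests:

```python
# Scaling efficiency implied by the slide's figures: speedup divided
# by GPU count (1.0 would be perfect linear scaling).
def scaling_efficiency(speedup, n_gpus):
    """Fraction of ideal linear scaling achieved."""
    return speedup / n_gpus

alexnet_eff = scaling_efficiency(14, 16)    # 0.875, i.e. 87.5% of linear
googlenet_eff = scaling_efficiency(13, 16)  # 0.8125, i.e. 81.25% of linear
```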

SLIDE 18

Factors to Consider when Selecting a Framework

  • Based on model size and complexity
  • Based on different application scenarios

– Image – Speech – NLP

  • Based on data size (when selecting a distributed framework)

– Caffe-MPI – Tensorflow – MxNet

SLIDE 19

Deep Learning Challenges - Applications Layer

  • How to improve recognition accuracy?
    – Model design
    – Data pre-processing

  • How to improve training performance?
    – CUDA programming for half precision (Pascal)
    – CUDA programming for mixed precision

  • How to improve inference performance?
    – CUDA programming for INT8
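The idea behind INT8 inference is to scale real-valued tensors into the signed 8-bit range, compute at low precision, and dequantize afterwards. A minimal pure-Python sketch of symmetric per-tensor quantization, purely for illustration (the deck's actual implementation is CUDA code, not shown here):

```python
# Symmetric INT8 quantization sketch: map values into [-127, 127]
# using a single per-tensor scale, then recover approximations by
# multiplying back.
def quantize_int8(values):
    """Quantize a list of floats to signed 8-bit integers plus a scale."""
    scale = max(abs(v) for v in values) / 127.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate real values from INT8 codes."""
    return [v * scale for v in q]
```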

SLIDE 20

Deep Learning Applications on GPU

Applications accelerated on GPU: image search, speech training, image training, network security.

[Chart: runtime comparisons of a CPU (C+MKL) version against 1-GPU and 4-GPU versions, and of CPU vs. GPU on 1M samples with 180 dimensions; reported times include 256.1 s and 115.2 s.]

SLIDE 21

Deep Learning Platform End-to-End

Hardware
  • Data processing cluster: NF5280M4
  • Training GPU cluster: 16-card GPU box, 2U 4-card, 2U 8-card, 4U 4-card
  • Inference: P8000 workstation GPU, AI cloud, terminal
  • Storage: flash storage AS5600/13000
  • Network: 10G/IB

DL Management
  • AIStation management system, T-Eye tuning tool

DL Frameworks
  • Caffe-MPI, TensorFlow, MXNet, PaddlePaddle

Models & Algorithms
  • AlexNet/GoogLeNet/ResNet, CNN/RNN/LSTM

AI Recognition
  • Speech/image/video/natural language processing

Applications
  • Speech recognition, face recognition, video monitoring, medical imaging, personal assistant
  • Example outputs: "Big Win!", "This is Daniel Wu", "Retinopathy", "Have booked G6", "Pursuit staff"

SLIDE 22

Inspur Deep Learning GPU Servers

  • 2-GPU server: NF5280M4 (inference)
  • 4-GPU server: NF5568M4 (training)
  • 8-GPU server: AGX-2 (training)
  • 64-GPU server: SR-AI Rack (training)

Inspur is a leading AI computing provider, supplying more than 60% of the AI hardware used by cloud service providers (CSPs) in China.

SLIDE 23

COMPUTING INSPIRES FUTURE

Thank You

Visit us in Booth #911