Deep Learning on GPUs, March 2016 (PowerPoint presentation)




SLIDE 1

March 2016

Deep Learning on GPUs

SLIDE 2

AGENDA

What is Deep Learning?
GPUs and DL
DL in practice
Scaling up DL

SLIDE 3

What is Deep Learning?

SLIDE 4

DEEP LEARNING EVERYWHERE

INTERNET & CLOUD

Image Classification, Speech Recognition, Language Translation, Language Processing, Sentiment Analysis, Recommendation

MEDIA & ENTERTAINMENT

Video Captioning, Video Search, Real-Time Translation

AUTONOMOUS MACHINES

Pedestrian Detection, Lane Tracking, Traffic Sign Recognition

SECURITY & DEFENSE

Face Detection, Video Surveillance, Satellite Imagery

MEDICINE & BIOLOGY

Cancer Cell Detection, Diabetic Grading, Drug Discovery

SLIDE 5

Traditional machine perception

Hand-crafted feature extractors

Pipeline: Raw data → Feature extraction → Classifier/detector → Result

Example classifiers/detectors: SVM, shallow neural net, HMM, clustering, LDA, LSA, …

Example tasks: speaker ID, speech transcription, topic classification, machine translation, sentiment analysis, …

SLIDE 6

Deep learning approach

Train: labeled images (dog, cat, raccoon) are fed through the MODEL; classification errors are fed back to update the model.

Deploy: the trained MODEL labels new images (dog, cat, honey badger).

SLIDE 7

Artificial neural network

A collection of simple, trainable mathematical units that collectively learn complex functions

Diagram: Input layer → Hidden layers → Output layer

Given sufficient training data, an artificial neural network can approximate very complex functions mapping raw data to output decisions.

SLIDE 8

Artificial neurons

From Stanford cs231n lecture notes

Diagram: a biological neuron alongside an artificial neuron with inputs x1, x2, x3, weights w1, w2, w3, and output y = F(w1·x1 + w2·x2 + w3·x3), where F(x) = max(0, x) (ReLU).
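The neuron formula on this slide can be sketched directly in Python (NumPy used for illustration; the weight and input values are our own example, not from the slide):

```python
import numpy as np

def relu(x):
    # F(x) = max(0, x), the activation shown on the slide
    return np.maximum(0.0, x)

def neuron(w, x):
    # y = F(w1*x1 + w2*x2 + w3*x3): weighted sum, then nonlinearity
    return relu(np.dot(w, x))

w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 2.0, 0.5])
y = neuron(w, x)  # weighted sum is -0.5, so ReLU clips it to 0.0
```

A whole network is just many of these units composed layer by layer.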

SLIDE 9

Deep neural network (dnn)

Input → Result

Application components:
Task objective, e.g. identify a face
Training data: 10–100M images
Network architecture: ~10 layers, 1B parameters
Learning algorithm: ~30 exaflops, ~30 GPU-days

Feature hierarchy: Raw data → Low-level features → Mid-level features → High-level features

SLIDE 10

Deep learning benefits

Robust
No need to design the features ahead of time: features are automatically learned to be optimal for the task at hand, and robustness to natural variations in the data is automatically learned.

Generalizable
The same neural net approach can be used for many different applications and data types.

Scalable
Performance improves with more data, and the method is massively parallelizable.

SLIDE 11

Baidu Deep Speech 2

English and Mandarin speech recognition. The transition from English to Mandarin was made simpler by end-to-end DL:

No feature engineering or Mandarin-specific components required

More accurate than humans: 3.7% error rate vs. 4% for humans on the same tests

http://svail.github.io/mandarin/ http://arxiv.org/abs/1512.02595

End-to-end Deep Learning for English and Mandarin Speech Recognition

SLIDE 12

AlphaGo

Training DNNs: 3 weeks, 340 million training steps on 50 GPUs

Play: asynchronous multi-threaded search; simulations on CPUs, policy and value DNNs in parallel on GPUs
Single machine: 40 search threads, 48 CPUs, and 8 GPUs
Distributed version: 40 search threads, 1,202 CPUs, and 176 GPUs

Outcome: Beat both European and World Go champions in best of 5 matches

First Computer Program to Beat a Human Go Professional

http://www.nature.com/nature/journal/v529/n7587/full/nature16961.html http://deepmind.com/alpha-go.html

SLIDE 13

Deep Learning for Autonomous vehicles

SLIDE 14

Deep Learning Synthesis

Texture synthesis and transfer using CNNs. Timo Aila et al., NVIDIA Research

SLIDE 15


THE AI RACE IS ON

IBM Watson Achieves Breakthrough in Natural Language Processing
Facebook Launches Big Sur
Baidu Deep Speech 2 Beats Humans
Google Launches TensorFlow
Microsoft & U. Science & Tech, China Beat Humans on IQ
Toyota Invests $1B in AI Labs

[Chart: ImageNet accuracy rate, 2009–2016: traditional CV vs. deep learning]

SLIDE 16

The Big Bang in Machine Learning

“Google’s AI engine also reflects how the world of computer hardware is changing. (It) depends on machines equipped with GPUs… And it depends on these chips more than the larger tech universe realizes.”

DNN + GPU + BIG DATA

SLIDE 17

GPUs and DL

USE MORE PROCESSORS TO GO FASTER

SLIDE 18

Deep learning development cycle

SLIDE 19

Three Kinds of Networks

DNN – all fully connected layers
CNN – some convolutional layers
RNN – recurrent neural network, LSTM

SLIDE 20

DNN

Key operation is a dense M × V (matrix–vector) multiply

Backpropagation uses dense matrix-matrix multiply starting from softmax scores
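As a concrete sketch of the forward pass through one fully connected layer (the layer sizes are our own illustrative choices, not from the slide), the dense M × V operation is just:

```python
import numpy as np

rng = np.random.default_rng(0)

# One fully connected layer: weight matrix M times activation vector v.
M = rng.standard_normal((256, 784))   # 784 inputs -> 256 outputs (illustrative sizes)
v = rng.standard_normal(784)          # activations for a single example

out = M @ v                           # the dense matrix-vector multiply the slide refers to
```

Backpropagation through the same layer multiplies error signals by the transpose of M, which is why both passes are dominated by dense linear algebra.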

SLIDE 21

DNN

Batching is used for training and for latency-insensitive inference; the key operation becomes M × M.

The batched operation is a matrix–matrix multiply, which gives re-use of the weights.

Without batching, each element of the weight matrix would be used only once. Modern compute architectures want 10–50 arithmetic operations per memory fetch.
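A minimal illustration of the weight re-use (batch and layer sizes are our assumptions): stacking B inputs as columns turns B matrix–vector products into one matrix–matrix product, so each weight fetched from memory is used B times.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 784))      # weights (illustrative sizes)
X = rng.standard_normal((784, 64))       # batch of 64 inputs, one per column

# Batched forward pass: one M x M multiply instead of 64 M x V multiplies.
Y = W @ X                                # each weight element is used 64 times

# Equivalent unbatched computation, touching every element of W once per input:
Y_unbatched = np.stack([W @ X[:, i] for i in range(64)], axis=1)
```

Both give the same result; the batched form simply raises the ratio of arithmetic to memory traffic.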

SLIDE 22

CNN

Requires convolution and M × V

Filters are shared across the image plane, so the computation is multiply-limited even without batching.
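The filter sharing can be seen in a naive convolution sketch (ours, not the slide's; real frameworks use optimized GPU kernels): the same small kernel is applied at every position of the input plane.

```python
import numpy as np

def conv2d_valid(image, kernel):
    # Naive "valid" 2D convolution (really cross-correlation, as in most DL frameworks).
    # The same kernel weights are re-used at every spatial position: this weight
    # sharing is what makes CNNs multiply-limited even without batching.
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

img = np.arange(16.0).reshape(4, 4)
k = np.ones((2, 2))
res = conv2d_valid(img, k)   # 3x3 output; res[0, 0] sums the top-left 2x2 patch
```

In production the loops are replaced by highly tuned GEMM, FFT, or Winograd implementations.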

SLIDE 23

Other Operations

To finish building a DNN

These are not limiting factors with appropriate GPU use. Complex networks have hundreds of millions of weights.

SLIDE 24

Lots of Parallelism Available in a DNN

SLIDE 25

TESLA M40

World’s Fastest Accelerator for Deep Learning Training


[Chart: Caffe training time in number of days, GPU server with 4x Tesla M40 vs. dual-CPU server: 13x faster training]

CUDA Cores: 3,072
Peak SP: 7 TFLOPS
GDDR5 Memory: 12 GB
Bandwidth: 288 GB/s
Power: 250 W

Reduce Training Time from 13 Days to just 1 Day

Note: Caffe benchmark with AlexNet, CPU server uses 2x E5-2680v3 12 Core 2.5GHz CPU, 128GB System Memory, Ubuntu 14.04

28 GFLOPS/W

SLIDE 26

Comparing CPU and GPU – server class

Xeon E5-2698 and Tesla M40

NVIDIA Whitepaper “GPU based deep learning inference: A performance and power analysis.”

SLIDE 27

DL in practice

SLIDE 28

NVIDIA GPU PLATFORM: The Engine of Modern AI

Frameworks: TORCH, THEANO, CAFFE, MATCONVNET, PURINE, MOCHA.JL, MINERVA, MXNET*, CHAINER, BIG SUR, WATSON, OPENDEEP, KERAS, CNTK, TENSORFLOW, DL4J

Education and start-ups: VITRUVIAN, SCHULTS LABORATORIES

*U. Washington, CMU, Stanford, TuSimple, NYU, Microsoft, U. Alberta, MIT, NYU Shanghai

SLIDE 29

CUDA for Deep Learning Development

TITAN X, GPU CLOUD, DEVBOX

DEEP LEARNING SDK: cuDNN, cuBLAS, cuSPARSE, NCCL, DIGITS

SLIDE 30

GPU-accelerated deep learning subroutines
High-performance neural network training
Accelerates major deep learning frameworks: Caffe, Theano, Torch, TensorFlow
Up to 3.5x faster AlexNet training in Caffe than baseline GPU
Tiled FFT up to 2x faster than FFT
[Chart: millions of images trained per day]

developer.nvidia.com/cudnn

[Chart: relative speedup across cuDNN versions 1–4, 0.0x–2.5x scale]

Deep Learning Primitives

Accelerating Artificial Intelligence

SLIDE 31

CUDA BOOSTS DEEP LEARNING 5X IN 2 YEARS

Performance

AlexNet training throughput based on 20 iterations. CPU: 1x E5-2680v3 12-core 2.5GHz, 128GB system memory, Ubuntu 14.04

Caffe Performance

[Chart: Caffe performance from 11/2013 to 12/2015: K40, K40+cuDNN1, M40+cuDNN3, M40+cuDNN4]

SLIDE 32

NVIDIA DIGITS

Interactive Deep Learning GPU Training System

Test Image

Process Data → Configure DNN → Monitor Progress → Visualize Layers

developer.nvidia.com/digits

SLIDE 33

PC GAMING

ONE ARCHITECTURE — END-TO-END AI

DRIVE PX for Auto, Titan X for PC, Tesla for Cloud, Jetson for Embedded

SLIDE 34

Scaling DL

SLIDE 35

Scaling Neural Networks

Data Parallelism

[Diagram: Machine 1 holds weights W and processes Image 1; Machine 2 holds a copy of W and processes Image 2; the machines sync.]

Notes: the model must be synced across machines. The largest models do not fit on one GPU. Requires a P-fold larger batch size for P workers. Works across many nodes with the parameter-server approach, giving near-linear speedup.

Adam Coates, Brody Huval, Tao Wang, David J. Wu, Andrew Ng and Bryan Catanzaro
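The sync step above can be sketched as gradient averaging. This is a toy least-squares model of ours standing in for a real network; in practice each worker would be a separate GPU or machine and the averaging would go through a parameter server or all-reduce.

```python
import numpy as np

rng = np.random.default_rng(0)

# Data parallelism: P workers hold identical copies of the weights W and each
# computes a gradient on its own shard of the (P-fold larger) batch.
P = 4
W = rng.standard_normal((10, 5))
data = rng.standard_normal((P, 32, 5))      # P shards of 32 examples each
targets = rng.standard_normal((P, 32, 10))

def local_gradient(W, x, t):
    # Gradient of 0.5 * ||x W^T - t||^2 w.r.t. W on one worker's shard
    err = x @ W.T - t
    return err.T @ x / len(x)

grads = [local_gradient(W, data[p], targets[p]) for p in range(P)]

# "Sync." step from the diagram: average the gradients, then apply the same
# update everywhere so all model replicas stay identical.
avg_grad = sum(grads) / P
W_new = W - 0.01 * avg_grad
```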


SLIDE 36

Multiple GPUs

Near-linear scaling with data parallelism.

Ren Wu et al, Baidu, “Deep Image: Scaling up Image Recognition.” arXiv 2015

SLIDE 37

Scaling Neural Networks

Model Parallelism

[Diagram: the weights W of a single model are split across Machine 1 and Machine 2, which together process Image 1]

Notes: Allows for larger models than fit on one GPU. Requires much more frequent communication between GPUs. Most commonly used within a node, over GPU P2P. Effective for the fully connected layers.

Adam Coates, Brody Huval, Tao Wang, David J. Wu, Andrew Ng and Bryan Catanzaro
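Splitting a fully connected layer by output neurons, as described above, can be sketched as follows (layer sizes and the two-way split are our assumptions; real systems move the shards to separate GPUs):

```python
import numpy as np

rng = np.random.default_rng(0)

# Model parallelism: a fully connected layer too big for one device is split
# by output neurons; each "machine" holds half of the weight matrix W.
x = rng.standard_normal(784)               # the same input is sent to both machines
W = rng.standard_normal((512, 784))
W1, W2 = W[:256], W[256:]                  # machine 1 and machine 2 shards

y1 = W1 @ x                                # computed on machine 1
y2 = W2 @ x                                # computed on machine 2

# Communication step: the partial outputs must be gathered at every layer,
# which is why model parallelism needs frequent GPU-to-GPU traffic.
y = np.concatenate([y1, y2])
```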

SLIDE 38

Scaling Neural Networks

Hyper Parameter Parallelism

Try many alternative neural networks in parallel, on different CPUs, GPUs, or machines. Probably the most obvious and effective way!
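Because the trials are independent, hyperparameter search is embarrassingly parallel. A minimal sketch with a toy objective (our stand-in; a real `train_and_score` would launch a full training run on its own GPU or machine):

```python
from concurrent.futures import ThreadPoolExecutor

def train_and_score(lr):
    # Stand-in for a real training run; returns a validation "score".
    # Toy objective that peaks at lr = 0.1.
    return -(lr - 0.1) ** 2

candidates = [0.001, 0.01, 0.1, 1.0]

# Each candidate trains independently: no communication between trials.
with ThreadPoolExecutor(max_workers=4) as pool:
    scores = list(pool.map(train_and_score, candidates))

best = candidates[scores.index(max(scores))]
```

With a process pool or a cluster scheduler the same pattern spreads trials across machines with no change to the search logic.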

SLIDE 39

Deep Learning Everywhere

NVIDIA Titan X, NVIDIA Jetson, NVIDIA Tesla, NVIDIA DRIVE PX

Contact: jbarker@nvidia.com