

SLIDE 1

High-Performance Deep Learning: Issues, Trends, and Challenges

CSE 5194.01, Autumn '20

Dhabaleswar K. (DK) Panda, The Ohio State University
E-mail: panda@cse.ohio-state.edu
http://www.cse.ohio-state.edu/~panda

Hari Subramoni, The Ohio State University
E-mail: subramon@cse.ohio-state.edu
http://www.cse.ohio-state.edu/~subramon

Arpan Jain, The Ohio State University
E-mail: jain.575@osu.edu
http://www.cse.ohio-state.edu/~jain.575

SLIDE 2

Outline

• Introduction
  – The Past, Present, and Future of Deep Learning
  – What are Deep Neural Networks?
  – Diverse Applications of Deep Learning
  – Deep Learning Frameworks
• Overview of Execution Environments
• Parallel and Distributed DNN Training
• Latest Trends in HPC Technologies
• Challenges in Exploiting HPC Technologies for Deep Learning

SLIDE 3

What is Deep Learning?

• Deep Learning (DL)
  – A subset of Machine Learning that uses Deep Neural Networks (DNNs)
  – Perhaps the most revolutionary subset!
• Based on learning data representations
• Examples: Convolutional Neural Networks, Recurrent Neural Networks, Hybrid Networks
• Data Scientist or Developer Perspective (see the sketch below)
  1. Identify DL as the solution to a problem
  2. Determine the data set
  3. Select the deep learning algorithm to use
  4. Use a large data set to train the algorithm

Courtesy: https://hackernoon.com/difference-between-artificial-intelligence-machine-learning-and-deep-learning-1pcv3zeg, https://blog.dataiku.com/ai-vs.-machine-learning-vs.-deep-learning
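A minimal sketch of steps 2–4 above, assuming TensorFlow 2.x / tf.keras; the MNIST dataset and the layer sizes are illustrative stand-ins, not from the slides:

```python
import tensorflow as tf

# Step 2: determine the data set (MNIST: 28x28 grayscale digit images)
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

# Step 3: select the deep learning algorithm -- here, a small MLP
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Step 4: use the data set to train the model
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=64, epochs=2)
```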

SLIDE 4

Brief History of Deep Learning (DL)

Courtesy: http://www.zdnet.com/article/caffe2-deep-learning-wide-ambitions-flexibility-scalability-and-advocacy/

SLIDE 5

Milestones in the Development of Neural Networks

Courtesy: https://beamandrew.github.io/deeplearning/2017/02/23/deep_learning_101_part1.html

SLIDE 6

The Deep Learning Revolution

• Deep Learning (DL) is a subset of Machine Learning (ML)
  – Perhaps the most revolutionary subset!
  – Learned feature extraction vs. hand-crafted features
  – Availability of datasets!
• Deep Learning
  – A renewed interest and a lot of hype!
  – Key success: Deep Neural Networks (DNNs)
  – Everything was there since the late '80s except the "computability of DNNs"

[Venn diagram: AI contains Machine Learning (example: Logistic Regression), which contains Deep Learning (examples: MLPs, DNNs)]

Adopted from: http://www.deeplearningbook.org/contents/intro.html

SLIDE 7

Three Key Pieces in the DL Resurgence

• Modern and efficient hardware enabled
  – Computability of DNNs – impossible in the past!
  – GPUs – at the core of DNN training
  – CPUs – catching up fast
• Availability of datasets
  – MNIST, CIFAR10, ImageNet, and more…
• Excellent accuracy for many application areas
  – Vision, machine translation, and several others...

[Bar chart: minutes to train AlexNet, from 2x GTX 580 down to DGX-2 – roughly a 500x speedup in 5 years]

Courtesy: A. Canziani et al., "An Analysis of Deep Neural Network Models for Practical Applications", CoRR, 2016.

SLIDE 8

The Rise of GPU-based Deep Learning

Courtesy: http://images.nvidia.com/content/technologies/deep-learning/pdf/NVIDIA-DeepLearning-Infographic-v11.pdf

SLIDE 9

Intel is committed to AI and Deep Learning as well!

Courtesy: https://newsroom.intel.com/editorials/krzanich-ai-day/

SLIDE 10

Deep Learning and High-Performance Architectures

• NVIDIA GPUs are the main driving force for faster training of DL models
  – The ImageNet Challenge (ILSVRC) – 90% of the teams used GPUs (2014)*
  – Deep Neural Networks (DNNs) like ResNet(s) and Inception
• However, high-performance architectures for DL and HPC are evolving
  – 110 of the Top 500 HPC systems use NVIDIA GPUs (Jun '20)
  – DGX-1 (Pascal) and DGX-2 (Volta) – dedicated DL supercomputers
  – Cascade Lake Xeon CPUs have 28 cores/socket (TACC Frontera – #8 on Top500)
  – AMD EPYC (Rome) CPUs have 64 cores/socket (upcoming DOE clusters)
  – AMD GPUs will be powering Frontier – DOE's exascale system at ORNL
  – Domain-specific accelerators for DNNs are also emerging

[Chart: Accelerator/Co-Processor performance share, www.top500.org]

* https://blogs.nvidia.com/blog/2014/09/07/imagenet/

SLIDE 11

The Bright Future of Deep Learning

Courtesy: https://www.top500.org/news/market-for-artificial-intelligence-projected-to-hit-36-billion-by-2025/

SLIDE 12

Current and Future Use Cases of Deep Learning

Courtesy: https://www.top500.org/news/market-for-artificial-intelligence-projected-to-hit-36-billion-by-2025/

SLIDE 13

Outline

• Introduction
  – The Past, Present, and Future of Deep Learning
  – What are Deep Neural Networks?
  – Diverse Applications of Deep Learning
  – Deep Learning Frameworks
• Overview of Execution Environments
• Parallel and Distributed DNN Training
• Latest Trends in HPC Technologies
• Challenges in Exploiting HPC Technologies for Deep Learning

SLIDE 14

So what is a Deep Neural Network?

• Example of a 3-layer Deep Neural Network (DNN) – the input layer is not counted

Courtesy: http://cs231n.github.io/neural-networks-1/

SLIDE 15

Graphical/Mathematical Intuitions for DNNs

[Figures: drawing of a biological neuron; the mathematical model]

Courtesy: http://cs231n.github.io/neural-networks-1/
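The mathematical model computes a weighted sum of the inputs plus a bias, passed through an activation function: output = f(w·x + b). A minimal NumPy sketch, with made-up input and weight values for illustration:

```python
import numpy as np

def neuron(x, w, b):
    """Artificial neuron: activation applied to the weighted sum w.x + b."""
    z = np.dot(w, x) + b             # weighted sum of inputs plus bias
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid activation, as in cs231n

x = np.array([0.5, -1.0, 2.0])   # example inputs (illustrative values)
w = np.array([0.8, 0.2, -0.5])   # synaptic weights
print(neuron(x, w, b=0.1))
```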

SLIDE 16

Key Phases: DNN Training and Inference

• Training is compute-intensive
  – Many passes over the data
  – Can take days to weeks
  – Model adjustment is done
• Inference
  – Single pass over the data
  – Should take seconds
  – No model adjustment
• Challenge: how to make "training" faster?
  – Need parallel and distributed training (see the sketch below)

Courtesy: https://devblogs.nvidia.com/
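A minimal PyTorch sketch contrasting the two phases; the toy model and random data are illustrative assumptions, not from the slides:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
x, y = torch.randn(64, 10), torch.randint(0, 2, (64,))

# Training: many passes over the data, the model is adjusted every step
model.train()
for epoch in range(5):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()   # compute gradients
    opt.step()        # adjust the model

# Inference: a single forward pass, no model adjustment
model.eval()
with torch.no_grad():
    pred = model(x).argmax(dim=1)
```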

SLIDE 17

TensorFlow playground (Quick Demo)

• To actually train a network, please visit: http://playground.tensorflow.org

SLIDE 18

Inference on trained ResNet50 (Quick Demo)

• To try your own image, please visit: https://microsoft.github.io/onnxjs-demo/#/resnet50

SLIDE 19

Outline

• Introduction
  – The Past, Present, and Future of Deep Learning
  – What are Deep Neural Networks?
  – Diverse Applications of Deep Learning
  – Deep Learning Frameworks
• Overview of Execution Environments
• Parallel and Distributed DNN Training
• Latest Trends in HPC Technologies
• Challenges in Exploiting HPC Technologies for Deep Learning

SLIDE 20

Diverse Application Areas for Deep Learning

• Vision
  – Image Classification
  – Style Transfer
  – Caption Generation
• Speech
  – Speech Recognition
  – Real-time Translation
• Text
  – Sequence Recognition and Generation
• Disease Discovery
  – Cancer Detection
• Autonomous Driving
  – Combination of multiple areas like image/object detection, speech recognition, etc.

SLIDE 21

Style Transfer

Courtesy: https://github.com/alexjc/neural-doodle

SLIDE 22

Style Transfer

Courtesy: https://github.com/alexjc/neural-doodle

SLIDE 23

Caption Generation

Courtesy: https://machinelearningmastery.com/inspirational-applications-deep-learning/

SLIDE 24

Shakespeare-Style Passage Generation

Remember, all the RNN knows are characters, so in particular it samples both the speakers' names and the contents. Sometimes we also get relatively extended monologue passages, such as:

• VIOLA: Why, Salisbury must find his flesh and thought That which I am not aps, not a man and in fire, To show the reining of the raven and the wars To grace my hand reproach within, and not a fair are hand, That Caesar and my goodly father's world; When I was heaven of presence and our fleets, We spare with hours, but cut thy council I am great, Murdered and by thy master's ready there My power to give thee but so much as hell: Some service in the noble bondman here, Would show him to her wine.

• KING LEAR: O, if you were a feeble sight, the courtesy of your law, Your sight and several breath, will wear the gods With his heads, and my hands are wonder'd at the deeds, So drop upon your lordship's head, and your opinion Shall be against your honour.

Courtesy: http://karpathy.github.io/2015/05/21/rnn-effectiveness/

SLIDE 25

Machine Translation

Some of the "dirty" letters we use for training. Dirt, highlights, and rotation, but not too much because we don't want to confuse our neural net.

Courtesy: https://research.googleblog.com/2015/07/how-google-translate-squeezes-deep.html

SLIDE 26

Google Translate

Courtesy: https://www.theverge.com/2015/1/14/7544919/google-translate-update-real-time-signs-conversations

SLIDE 27

Self-Driving Cars

Courtesy: http://www.teslarati.com/teslas-full-self-driving-capability-arrive-3-months-definitely-6-months-says-musk/

SLIDE 28

Cancer Detection

Courtesy: https://blog.insightdatascience.com/automating-breast-cancer-detection-with-deep-learning-d8b49da17950

SLIDE 29

AI-Driven Digital Pathology

• Applications
  – Prostate Cancer Detection
  – Metastasis Detection in Breast Cancer
  – Genetic Mutation Prediction
  – Tumor Detection for Molecular Analysis

Courtesy: https://www.frontiersin.org/articles/10.3389/fmed.2019.00185/full

SLIDE 30

Most Well-Known Application Area: Computer Vision

• Computer Vision applications (image classification, object detection, ...)
  – For many, the default answer is Convolutional Neural Networks (CNNs)
• Convolutional Neural Network (see the sketch below)
  – Dense layers (used as the classifier)
  – Convolution layers (used as feature-extraction layers)
    • Convolution operation
    • Activation function
    • Pooling

Courtesy: https://adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks/
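A minimal tf.keras sketch of this structure – convolution + activation + pooling for feature extraction, followed by dense layers as the classifier; the filter counts and sizes are illustrative assumptions, not from the slides:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    # Feature extraction: convolution + activation, then pooling
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu",
                           input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    # Classifier: flatten the feature maps and apply dense layers
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.summary()
```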

SLIDE 31

What is a Convolution Operation? Why do we need it?

[Figure: sliding a filter over the image found a vertical edge; a different filter will give a different feature]

Courtesy: https://www.analyticsvidhya.com/blog/2018/12/guide-convolutional-neural-network-cnn/

SLIDE 32

Example of a Convolution Filter

Sobel filter (vertical-edge detector):

    -1  0  +1
    -2  0  +2
    -1  0  +1
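A small NumPy sketch of the convolution operation itself, sliding the Sobel filter above over a toy image; the image values are made up, and real frameworks use optimized libraries (e.g., cuDNN) instead of Python loops:

```python
import numpy as np

def conv2d(image, kernel):
    """Naive 2D 'convolution' (no padding, stride 1), for illustration.
    Strictly this is cross-correlation, which DL frameworks call convolution."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

sobel = np.array([[-1, 0, 1],
                  [-2, 0, 2],
                  [-1, 0, 1]])
# Toy 5x5 image: dark on the left, bright on the right -> a vertical edge
image = np.array([[0, 0, 10, 10, 10]] * 5)
print(conv2d(image, sobel))  # large responses where the vertical edge is
```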

SLIDE 33

Outline

• Introduction
  – The Past, Present, and Future of Deep Learning
  – What are Deep Neural Networks?
  – Diverse Applications of Deep Learning
  – Deep Learning Frameworks
• Overview of Execution Environments
• Parallel and Distributed DNN Training
• Latest Trends in HPC Technologies
• Challenges in Exploiting HPC Technologies for Deep Learning

SLIDE 34

DL Frameworks, Hardware Architectures, and Distributed Training

• Deep Learning frameworks have emerged to
  – hide most of the complicated mathematics
  – let users focus on the design of neural networks
• We have saturated the peak potential of current-generation architectures
  – A single GPU or a many-core CPU is not enough!
• Two strategies to deal with current limitations
  – Parallel (multiple units in a single node) and/or distributed (multiple nodes) training of DNNs
  – Dedicated hardware architectures for DNNs are being developed
• DL frameworks will need to be enhanced for both strategies

[Figure: a statement and its dataflow fragment; data and computing vertices with different colors reside on different processes]
Courtesy: https://web.stanford.edu/~rezab/nips2014workshop/submits/minerva.pdf

SLIDE 35

Deep Learning Frameworks

• Many Deep Learning frameworks!!
  – Google TensorFlow
  – Facebook Torch/PyTorch
  – Berkeley Caffe
  – Microsoft CNTK
  – Chainer/ChainerMN
  – Intel Neon/Nervana Graph
• Open Neural Network eXchange (ONNX) Format

SLIDE 36

Google TensorFlow (Most Popular)

• The most widely used framework, open-sourced by Google
• Replaced Google's DistBelief framework
• Runs on almost all architectures (CPU, GPU, TPU, Mobile, etc.)
• Has gone back and forth on APIs (see the sketch below)
  – TF 1.0 – lazy execution and Sessions/Estimators
  – TF 2.0 – eager execution and tf.keras
• https://github.com/tensorflow/tensorflow

Courtesy: https://www.tensorflow.org/
Martin Abadi et al., "TensorFlow: A system for large-scale machine learning", https://ai.google/research/pubs/pub45381
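A minimal sketch of the API shift, assuming TensorFlow 2.x; the TF 1.0 fragment is kept in comments since it only runs under the legacy API:

```python
import tensorflow as tf

# TF 1.0 style (lazy execution) looked roughly like:
#   x = tf.placeholder(tf.float32)
#   y = x * 2.0
#   with tf.Session() as sess:
#       print(sess.run(y, feed_dict={x: 3.0}))

# TF 2.0 executes eagerly -- operations run as they are called:
x = tf.constant(3.0)
print(x * 2.0)        # tf.Tensor(6.0, shape=(), dtype=float32)

# tf.keras is the recommended high-level API:
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
print(model(tf.ones((2, 4))))
```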

SLIDE 37

Facebook Torch/PyTorch – Catching up fast!

• Torch was written in Lua
  – Adoption wasn't widespread
• PyTorch is a Python adaptation of Torch
  – Gaining a lot of attention
• Several contributors
  – Biggest support by Facebook
• PyTorch and Caffe2 have now been merged into PyTorch
• Key selling point is ease of expression and the "define-by-run" approach (see the sketch below)
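A minimal sketch of define-by-run, assuming PyTorch: the computation graph is built on the fly as ordinary Python executes, so normal control flow (the if below) can depend on the data, and autograd records whatever actually ran:

```python
import torch

x = torch.randn(3, requires_grad=True)

# The graph is defined by running Python code -- including data-dependent
# control flow, which a static (define-then-run) graph cannot express directly.
if x.sum() > 0:
    y = (x * 2).sum()
else:
    y = (x ** 2).sum()

y.backward()      # autograd replays the operations that actually ran
print(x.grad)
```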

SLIDE 38

MXNet

• MXNet
  – An Apache incubator project
  – Now strongly supported by Amazon
• D2L.ai – a deep learning book with
  – interactive Jupyter notebooks, math formulas, and a forum
• MXNet can work as a Keras backend
• Key selling point: rich and flexible ecosystem with Gluon
  – GluonCV – Computer Vision
  – GluonNLP – Natural Language Processing

SLIDE 39

Open Neural Network eXchange (ONNX) Format

• ONNX – not a Deep Learning framework but an open format to exchange "trained" networks across different frameworks (see the sketch below)
• Currently supported
  – Frameworks: Caffe2, Chainer, CNTK, MXNet, PyTorch
  – Converters: CoreML, TensorFlow
  – Runtimes: NVIDIA
• https://onnx.ai
• https://github.com/onnx
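A minimal sketch of the exchange workflow, assuming PyTorch's built-in ONNX exporter; the model and file name are illustrative:

```python
import torch
import torch.nn as nn

# Train (or load) a model in one framework -- PyTorch here
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
model.eval()

# Export the trained network to the ONNX format
dummy_input = torch.randn(1, 10)   # example input fixes the graph's shapes
torch.onnx.export(model, dummy_input, "model.onnx")

# Another framework or runtime (e.g., ONNX Runtime) can now load model.onnx
```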

SLIDE 40

Many Other DL Frameworks…

• Caffe – https://caffe.berkeleyvision.org
• Keras – https://keras.io
• Theano – http://deeplearning.net/software/theano/
• Blocks – https://blocks.readthedocs.io/en/latest/
• Intel BigDL – https://software.intel.com/en-us/articles/bigdl-distributed-deep-learning-on-apache-spark
• The list keeps growing and the names keep getting longer and weirder ;-)
  – Livermore Big Artificial Neural Network Toolkit (LBANN) – https://github.com/LLNL/lbann
  – Deep Scalable Sparse Tensor Network Engine (DSSTNE) – https://github.com/amzn/amazon-dsstne

SLIDE 41

Statistics about DL Frameworks

• The AI Index report offers very detailed trends about AI and ML
  – Interesting stats about DL frameworks
• TheGradient* has a recent article on PyTorch winning over TensorFlow in CVPR, ICML, ICLR, and other conferences

Courtesy: https://aiindex.org
* https://thegradient.pub/state-of-ml-frameworks-2019-pytorch-dominates-research-tensorflow-dominates-industry/

SLIDE 42

Outline

• Introduction
• Overview of Execution Environments
• Parallel and Distributed DNN Training
• Latest Trends in HPC Technologies
• Challenges in Exploiting HPC Technologies for Deep Learning

SLIDE 43

So where do we run our DL framework?

• Early (2014) frameworks used a single fast GPU
  – As DNNs became larger, faster and better GPUs became available
• Today
  – Parallel training on multiple GPUs is supported by most frameworks
  – Distributed (multi-node) training is still upcoming
    • A lot of fragmentation in the efforts (MPI, Big-Data, NCCL, Gloo, etc.)
  – On the other hand, DL has made its way to mobile and the web too!
    • Smartphones – OK Google, Siri, Cortana, Alexa, etc.
    • DrivePX – the computer that drives NVIDIA's self-driving car
    • Very recently, Google announced Deeplearn.js (a DL framework in a web browser)
    • TensorFlow playground – http://playground.tensorflow.org/

SLIDE 44

Conventional Execution on GPUs and CPUs

• We have all heard
  – "Our framework is faster than your framework!"
• This needs to be understood in a holistic way
• Performance
  – Depends on the entire execution environment (the full stack)
  – Multiple helper libraries and systems have an impact
• An isolated view of performance is not helpful for ML/DL workloads
• CPU-specific and GPU-specific optimizations need to be understood

A. A. Awan, H. Subramoni, and D. K. Panda, "An In-depth Performance Characterization of CPU- and GPU-based DNN Training on Modern Architectures", In Proceedings of the Machine Learning on HPC Environments (MLHPC'17), ACM, New York, NY, USA, Article 8.
SLIDE 45

DL Frameworks and Underlying Libraries

• BLAS libraries – the heart of math operations
  – ATLAS/OpenBLAS
  – NVIDIA cuBLAS
  – Intel Math Kernel Library (MKL)
• The most compute-intensive layers are generally optimized for specific hardware
  – e.g., convolution layer, pooling layer, etc.
• DNN libraries – the heart of convolutions!
  – NVIDIA cuDNN (already reached its 7th iteration – cuDNN v7)
  – Intel MKL-DNN – a promising development for CPU-based ML/DL training

A. A. Awan, H. Subramoni, and D. K. Panda, "An In-depth Performance Characterization of CPU- and GPU-based DNN Training on Modern Architectures", In Proceedings of the Machine Learning on HPC Environments (MLHPC'17), ACM, New York, NY, USA, Article 8.
SLIDE 46

Outline

• Introduction
• Overview of Execution Environments
• Parallel and Distributed DNN Training
• Latest Trends in HPC Technologies
• Challenges in Exploiting HPC Technologies for Deep Learning

SLIDE 47

The Need for Parallel and Distributed Training

• Why do we need parallel training?
• Larger and deeper models are being proposed
  – AlexNet to ResNet to Neural Machine Translation (NMT)
  – DNNs require a lot of memory
  – Larger models cannot fit in a GPU's memory
• Single-GPU training became a bottleneck
• As mentioned earlier, the community has already moved to multi-GPU training
• Multi-GPU in one node is good, but there is a limit to scale-up (8 GPUs)
• Multi-node (distributed or parallel) training is necessary!! (see the sketch below)
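A minimal sketch of multi-node data-parallel training, assuming Horovod with tf.keras (one of several options; the model and dataset are illustrative). Each rank trains on its share of the data, and gradients are averaged across all ranks every step:

```python
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()  # one process (rank) per GPU, launched via horovodrun/mpirun

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# The distributed optimizer allreduce-averages gradients across all ranks
opt = hvd.DistributedOptimizer(tf.keras.optimizers.Adam(0.001))
model.compile(optimizer=opt, loss="sparse_categorical_crossentropy")

(x, y), _ = tf.keras.datasets.mnist.load_data()
x = x.reshape(-1, 784).astype("float32") / 255.0

# Broadcast rank 0's initial weights so every model replica starts identical
callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]
model.fit(x, y, batch_size=64, epochs=2, callbacks=callbacks,
          verbose=1 if hvd.rank() == 0 else 0)
```

Launched with, e.g., horovodrun -np 8 -H node1:4,node2:4 python train.py for two 4-GPU nodes.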

SLIDE 48

Benefits of Distributed Training: An Example with Caffe

• Strong scaling of CIFAR-10 training with OSU-Caffe (1 –> 4 GPUs) – batch size 2K
• A large batch size is needed for scalability
• Adding more GPUs may degrade the scaling efficiency

[Chart: CIFAR-10 training with OSU-Caffe – time in seconds for 1 GPU, 2 GPUs, and 4 GPUs]

Run command (change $np from 1 to 4):
mpirun_rsh -np $np ./build/tools/caffe train -solver examples/cifar10/cifar10_quick_solver.prototxt

Output: I0123 21:49:24.289763 75582 caffe.cpp:351] Avg. Time Taken: 142.101
Output: I0123 21:54:03.449211 97694 caffe.cpp:351] Avg. Time Taken: 74.6679
Output: I0123 22:02:46.858219 20659 caffe.cpp:351] Avg. Time Taken: 39.8109

From these timings, 2 GPUs give a 142.101 / 74.6679 ≈ 1.90x speedup (95% efficiency), while 4 GPUs give 142.101 / 39.8109 ≈ 3.57x (89% efficiency) – the scaling efficiency drops as GPUs are added.

OSU-Caffe is available from the HiDL project page (http://hidl.cse.ohio-state.edu)

SLIDE 49

Outline

• Introduction
• Overview of Execution Environments
• Parallel and Distributed DNN Training
• Latest Trends in HPC Technologies
• Challenges in Exploiting HPC Technologies for Deep Learning

SLIDE 50

Drivers of Modern HPC Cluster Architectures

• Multi-core/many-core technologies
• Remote Direct Memory Access (RDMA)-enabled networking (InfiniBand and RoCE)
• Solid State Drives (SSDs), Non-Volatile Random-Access Memory (NVRAM), NVMe-SSD
• Accelerators (NVIDIA GPGPUs)
• Available on HPC clouds, e.g., Amazon EC2, NSF Chameleon, Microsoft Azure, etc.

[Figure: accelerators – high compute density, high performance/watt, >1 TFlop DP on a chip; high-performance interconnects – InfiniBand, <1 usec latency, 200 Gbps bandwidth; multi-/many-core processors; SSD, NVMe-SSD, NVRAM. Example systems: K Computer, Sunway TaihuLight, Summit, Sierra]

SLIDE 51

HPC Technologies

• Hardware
  – Interconnects – InfiniBand, RoCE, Omni-Path, etc.
  – Processors – GPUs, multi-/many-core CPUs, Tensor Processing Unit (TPU), FPGAs, etc.
  – Storage – NVMe, SSDs, burst buffers, etc.
• Communication middleware (see the sketch below)
  – Message Passing Interface (MPI)
    • CUDA-aware MPI, many-core optimized MPI runtimes (KNL-specific optimizations)
  – NVIDIA NCCL
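A minimal sketch of the core communication pattern this middleware provides for distributed DNN training – averaging gradients across processes with MPI_Allreduce – assuming mpi4py on top of an MPI library such as MVAPICH2 (the gradient array is illustrative):

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
size = comm.Get_size()

# Each rank computes local gradients on its own mini-batch (random here)
local_grad = np.random.rand(1000).astype(np.float64)

# Sum the gradients across all ranks, then divide to get the average
global_grad = np.empty_like(local_grad)
comm.Allreduce(local_grad, global_grad, op=MPI.SUM)
global_grad /= size

# Every rank now applies the same averaged gradient to its model replica
```

Run with, e.g., mpirun_rsh -np 4 python allreduce.py; NCCL exposes the same allreduce primitive, optimized for NVIDIA GPUs.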

SLIDE 52

Outline

• Introduction
• Overview of Execution Environments
• Parallel and Distributed DNN Training
• Latest Trends in HPC Technologies
• Challenges in Exploiting HPC Technologies for Deep Learning

SLIDE 53

Exploiting HPC for Deep Learning

Convergence of HPC and Deep Learning! But how?

[Figure: Deep Learning (Caffe, TensorFlow, PyTorch, Horovod, BigDL, etc.) + HPC (MPI, RDMA, Lustre, etc.) on Advanced Hardware (OpenPOWER CPU, GPU, TPU, InfiniBand)]

SLIDE 54

Next Classes and Plan

• The next class will be an introduction to Deep Learning and its associated terminology
• After that, we have the following:
  – Introduction to HPC Technologies
  – Deep Learning Frameworks
  – Overview of State-of-the-art DL Models
  – Challenges in Exploiting HPC for DL
  – The Need for Co-Design
  – Solutions and Case Studies
  – Latest and Emerging Trends

SLIDE 55

Thank You!

The High-Performance Deep Learning Project: http://hidl.cse.ohio-state.edu/
Network-Based Computing Laboratory: http://nowlab.cse.ohio-state.edu/
The MVAPICH2 Project: http://mvapich.cse.ohio-state.edu/

panda@cse.ohio-state.edu, subramon@cse.ohio-state.edu, jain.575@osu.edu