Computer architecture for deep learning applications

David Brooks, School of Engineering and Applied Sciences, Harvard University


SLIDE 1

Computer architecture for deep learning applications

David Brooks, School of Engineering and Applied Sciences, Harvard University

SLIDE 2

The rise of deep learning

SLIDE 3

The rise of deep learning

SLIDE 4

The rise of deep learning

SLIDE 5

Google Translate → Neural in Nov ’16

https://blog.google/products/translate/translate-where-you-need-it-in-any-app/

SLIDE 6

Google Translate → Neural in Nov ’16

https://blog.google/products/translate/translate-where-you-need-it-in-any-app/

SLIDE 7


Why computer architecture for ML?

Roelof Pieters, Jan 2015

SLIDE 8


Why computer architecture for ML?

“The Navy revealed the embryo of an electronic computer today that it expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence… [It] is expected to be finished in about a year at a cost of $100,000… Later perceptrons will be able to recognize people and call out their names and instantly translate speech in one language to speech in another.”

New Navy Device Learns By Doing, New York Times, July 1958

SLIDE 9


Why computer architecture for ML?

“By May, the (Google) Brain team understood that the only way they were ever going to make the system fast enough to implement as a product was if they could run it on T.P.U.s, the special-purpose chips that (Jeff) Dean had called for. As (Zhifeng) Chen put it: ‘We did not even know if the code would work. But we did know that without T.P.U.s, it definitely wasn’t going to work.’”

The Great A.I. Awakening, New York Times, Dec 2016

SLIDE 10

Today’s virtuous cycle

More compute → bigger (and better) data → better algorithms → (back to) more compute

SLIDE 11

Architectural Support for Deep Learning at Harvard

Algorithms, tools, architectures, and circuits:
• Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators
• Co-Designing Deep Neural Network Accelerators for Accuracy and Energy Using Bayesian Optimization
• Fathom: Reference Workloads for Modern Deep Learning Methods
• SM2: A Deep Neural Network Accelerator SoC in 28nm bulk and 16nm FinFET

A Full-Stack Approach to Machine Learning

SLIDE 12

Architectural Support for Deep Learning at Harvard

Algorithms, tools, architectures, and circuits:
• Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators
• Co-Designing Deep Neural Network Accelerators for Accuracy and Energy Using Bayesian Optimization
• Fathom: Reference Workloads for Modern Deep Learning Methods
• SM2: A Deep Neural Network Accelerator SoC in 28nm bulk and 16nm FinFET

A Full-Stack Approach to Machine Learning

SLIDE 13

Shortcomings of current hardware research

1. Narrow focus: researchers have latched on to just a few methods.

2. Mismatch between research and reality: we need real models, real data, and real environments.

3. Abundant folklore: a lack of hard numbers leads to conflicting assumptions.

SLIDE 14

The community has a narrow focus

A survey of 16 research projects from top-tier conferences, characterized by properties of the deep learning models they target.

SLIDE 15

The community has a narrow focus

Neuronal style: what building blocks are used?
• Fully-connected (FC) neural networks
• Convolutional neural networks (CNN)
• Recurrent neural networks (RNN)
• Novel architectures (everything else)

SLIDE 16

The community has a narrow focus

Learning task: what are the underlying use-case assumptions?
• Inference: use a pre-trained network
• Supervised: train with labeled data
• Unsupervised: train without labels
• Reinforcement: train with loose feedback

SLIDE 17

The community has a narrow focus

Application: which problem domains are considered?
• Speech recognition
• Language modeling
• Function approximation
• Knowledge reasoning
• Computer vision
• General AI

SLIDE 18

The community has a narrow focus

Model depth: how large are the models? (1+, 6+, 11+, 16+, 21+, and 26+ layers)

SLIDE 19

The community has a narrow focus

This is a problem.


SLIDE 20

Realism in models, data, and environments

Existing research: stable, established models; avoids the state of the art.
Reality: models are constantly in flux; new ones appear often.

SLIDE 21

Realism in models, data, and environments

Existing research: stable, established models; avoids the state of the art. Small, manageable data sets, used in isolation.
Reality: models are constantly in flux; new ones appear often. Large, unwieldy data sets, often combined with preprocessing or staging.

SLIDE 22

Realism in models, data, and environments

Existing research: stable, established models; avoids the state of the art. Small, manageable data sets, used in isolation. Simple, stand-alone implementations.
Reality: models are constantly in flux; new ones appear often. Large, unwieldy data sets, often combined with preprocessing or staging. Kernels are embedded in complex, high-level frameworks.

SLIDE 23

Conflicting assumptions cause confusion

“Convolutions account for over 90% of the processing in CNNs for both inference/testing and training.” - Chen et al. (2016)

“In convolutional neural network (CNN), fully connected layers [make up] more than 96% of the connections … [and] up to 38% computation time.” - Han et al. (2016)

SLIDE 24

Conflicting assumptions cause confusion

“Convolutions account for over 90% of the processing in CNNs for both inference/testing and training.” - Chen et al. (2016)

“In convolutional neural network (CNN), fully connected layers [make up] more than 96% of the connections … [and] up to 38% computation time.” - Han et al. (2016)

The worst part? They’re both right. There is no single answer, no single design.

SLIDE 25

Conflicting assumptions cause confusion

And we finally start to see some industrial data: Jouppi et al. (ISCA 2017) report the mix of models that covers 95% of Google’s TPU workloads.

SLIDE 26

• Broaden architectural research
• Foster realism
• Abolish deep learning folklore
• Reduce barriers to entry

SLIDE 27

What is Fathom?

• 8 diverse, state-of-the-art learning models
• Compatible with widely-used datasets
• Clear, tested implementations in TensorFlow (high-level frameworks are here to stay)
• Training and inference modes provided
• High-level behavioral characterization: hard numbers and intuition

Workloads: Seq2Seq, MemNet, Speech, Autoenc, Residual, VGG, AlexNet, DeepQ

SLIDE 28

The Fathom workloads

Workloads: Seq2Seq, MemNet, Speech, Autoenc, Residual, VGG, AlexNet, DeepQ

AlexNet: the watershed model for deep neural networks
• Neuron style: convolutional/fully-connected
• Learning task: supervised learning
• Domain: image classification
• Model: 5-conv, 2-FC network, ReLU nonlinearity

Krizhevsky, et al. “ImageNet Classification with Deep Convolutional Neural Networks.” NIPS, 2012
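For concreteness, below is a minimal Keras sketch of the layer structure described above (five convolutional layers, two fully-connected hidden layers, ReLU nonlinearities, and a softmax classifier). Filter counts and sizes follow Krizhevsky et al. (2012); pooling and normalization details are simplified, so treat this as an illustration rather than the Fathom implementation.

```python
# Sketch of an AlexNet-like 5-conv, 2-FC network (illustrative, not Fathom's code).
from tensorflow.keras import layers, models

def alexnet_like(num_classes=1000):
    return models.Sequential([
        layers.Conv2D(96, 11, strides=4, activation="relu",
                      input_shape=(227, 227, 3)),                  # conv1
        layers.MaxPooling2D(3, strides=2),
        layers.Conv2D(256, 5, padding="same", activation="relu"),  # conv2
        layers.MaxPooling2D(3, strides=2),
        layers.Conv2D(384, 3, padding="same", activation="relu"),  # conv3
        layers.Conv2D(384, 3, padding="same", activation="relu"),  # conv4
        layers.Conv2D(256, 3, padding="same", activation="relu"),  # conv5
        layers.MaxPooling2D(3, strides=2),
        layers.Flatten(),
        layers.Dense(4096, activation="relu"),                     # fc6
        layers.Dense(4096, activation="relu"),                     # fc7
        layers.Dense(num_classes, activation="softmax"),           # classifier
    ])

alexnet_like().summary()
```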

SLIDE 29

The Fathom workloads

Workloads: Seq2Seq, MemNet, Speech, Autoenc, Residual, VGG, AlexNet, DeepQ

DeepQ: the Atari-playing neural network from DeepMind
• Neuron style: convolutional/fully-connected
• Learning task: reinforcement learning
• Domain: general AI
• Model: 3-conv, 2-FC network for estimating value, trained via Q-learning with experience replay

Mnih, et al. “Human-Level Control Through Deep Reinforcement Learning.” Nature, 2015

SLIDE 30

The Fathom workloads

Workloads: Seq2Seq, MemNet, Speech, Autoenc, Residual, VGG, AlexNet, DeepQ

MemNet: Facebook’s memory-oriented learning model
• Neuron style: memory networks
• Learning task: supervised learning
• Domain: question answering, automated reasoning
• Model: 3-layer memory network, built using indirect lookups on sentence embeddings

Sukhbaatar, et al. “End-To-End Memory Networks.” NIPS, 2015

SLIDE 31

Fathom is a tool. Tools require understanding to use.

Understanding the Fathom workloads

High-level, quantitative intuition on:
• Distribution of primitive operations
• Performance profiles
• Workload similarity
• Hardware and mode effects
• Parallelism and scaling

SLIDE 32

Deep learning models in a high-level framework

TensorFlow models are coarse-grained dataflow graphs whose basic building block is an “operation.” Ops are a useful abstraction:
• They map to the underlying library
• They enable causal reasoning
• Their performance is stable across the lifetime of a run
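As a concrete illustration of this op-level view (a minimal sketch, not Fathom's actual instrumentation), the TF1-style graph API exposes every node in the dataflow graph along with its op type:

```python
# Enumerate the ops in a tiny TensorFlow graph, grouped by op type.
# Uses the TF1-style graph API (tf.compat.v1 in current TensorFlow).
import collections
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

g = tf.Graph()
with g.as_default():
    x = tf.placeholder(tf.float32, [None, 784], name="x")
    w = tf.Variable(tf.random.normal([784, 10]), name="w")
    b = tf.Variable(tf.zeros([10]), name="b")
    y = tf.nn.softmax(tf.matmul(x, w) + b, name="y")

# Count graph nodes by op type: this is the granularity at which Fathom profiles.
op_types = collections.Counter(op.type for op in g.get_operations())
for op_type, count in op_types.most_common():
    print(f"{op_type:20s} {count}")
```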

SLIDE 33

Models are dominated by a few operation types

• Each model spends 90% of its time in ≤6 ops.
• All models jointly spend 90% of their time in 22 ops.
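The coverage numbers above come from ranking op types by time and accumulating until a threshold is reached. A small sketch of that calculation, using illustrative placeholder timings rather than Fathom measurements:

```python
# How many op types cover 90% of a model's runtime? (illustrative data only)
def ops_for_coverage(time_by_op, threshold=0.90):
    total = sum(time_by_op.values())
    covered, count = 0.0, 0
    for op, t in sorted(time_by_op.items(), key=lambda kv: kv[1], reverse=True):
        covered += t
        count += 1
        if covered / total >= threshold:
            break
    return count

example = {"Conv2D": 52.0, "MatMul": 21.0, "MaxPool": 9.0, "Relu": 6.0,
           "BiasAdd": 5.0, "Softmax": 2.0, "Add": 1.5, "Reshape": 1.0}
print(ops_for_coverage(example))  # number of op types covering 90% of time
```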

SLIDE 34

Operation type profiling

Deep learning methods rely on different primitives

SLIDE 35

Operation type profiling

Deep learning methods rely on different primitives. Some trends are obvious and expected (e.g., CNNs are dominated by convolutions).

SLIDE 36

Operation type profiling

Deep learning methods rely on different primitives. Some trends are obvious and expected. Most ops fall into a few broad performance classes.

SLIDE 37

Performance similarity in Fathom

Compute similarity via cosine similarity between op profiles
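A minimal sketch of the metric: each workload is summarized as a vector of time spent per op type, and two workloads are compared by the cosine of the angle between their vectors. The example profiles below are illustrative, not measured Fathom data:

```python
# Cosine similarity between per-op-type time profiles (illustrative vectors).
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Op-type order: [Conv2D, MatMul, Relu, MaxPool, Softmax]
vgg_like     = [0.70, 0.15, 0.08, 0.05, 0.02]  # convolution-heavy profile
seq2seq_like = [0.00, 0.65, 0.20, 0.00, 0.15]  # matmul-heavy (RNN) profile

print(cosine_similarity(vgg_like, vgg_like))      # 1.0: identical profiles
print(cosine_similarity(vgg_like, seq2seq_like))  # much lower: dissimilar
```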

SLIDE 38

Performance similarity in Fathom

Compute similarity via cosine similarity between op profiles; the CNNs cluster together.

SLIDE 39

Performance similarity in Fathom

Compute similarity via cosine similarity between op profiles; the CNNs and the RNNs each cluster together.

SLIDE 40

Architecture and mode effects

High-level models make discriminative analysis easy

SLIDE 41

Architecture and mode effects

High-level models make discriminative analysis easy

SLIDE 42

Architecture and mode effects

High-level models make discriminative analysis easy: ~3x mean speedup.

SLIDE 43

Architecture and mode effects

High-level models make discriminative analysis easy

SLIDE 44

Architecture and mode effects

High-level models make discriminative analysis easy: a ~350x difference in speedup. Why?

SLIDE 45

Parallel scaling

Model-aware analysis can provide causal performance cues

SLIDE 46

Parallel scaling

Model-aware analysis can provide causal performance cues; it is easy to pull out Amdahl’s law effects (see the sketch below).
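A short sketch of the Amdahl’s-law reasoning: if a fraction p of a model’s op time parallelizes and the remainder is serial, speedup on n cores is bounded by 1 / ((1 - p) + p / n). The fractions below are illustrative, not measured Fathom values:

```python
# Amdahl's-law speedup bound for a workload with parallel fraction p on n cores.
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

for p in (0.50, 0.90, 0.99):  # illustrative parallel fractions
    print(p, [round(amdahl_speedup(p, n), 2) for n in (2, 8, 32)])
```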

SLIDE 47

Parallel scaling

Model-aware analysis can provide causal performance cues: it is easy to pull out Amdahl’s law effects and to identify differences in operation usage.

SLIDE 48

Fathom is…

…a black-box workload; use it like a benchmark suite.
• Top-down bottleneck analysis for a semi-custom processor
• Model-aware library performance shootout

SLIDE 49

Fathom is…

…a black-box workload; use it like a benchmark suite.
• Top-down bottleneck analysis for a semi-custom processor
• Model-aware library performance shootout

…a performance analysis tool; use it for causal analysis.
• Analyze application-level characteristics (e.g., sparsity)
• Co-optimize system and learning-algorithm tuning knobs

SLIDE 50

Fathom is…

…a black-box workload; use it like a benchmark suite.
• Top-down bottleneck analysis for a semi-custom processor
• Model-aware library performance shootout

…a performance analysis tool; use it for causal analysis.
• Analyze application-level characteristics (e.g., sparsity)
• Co-optimize system and learning-algorithm tuning knobs

…a co-simulation tool; use it to augment a simulator.
• Use Fathom for correctness and behavioral statistics
• Feed a validated hardware simulator with these results

SLIDE 51

A research field in flux

DeepBench: primitive kernels; direct library comparisons; production-oriented; commensurability.

  Primitive    Configurations
  GEMM         72
  Convolution  36
  Recurrent    12+16
  All-reduce   25

Fathom: deep learning models; whole-system introspection; research-oriented; causal understanding.

  Workloads: Seq2Seq, MemNet, Speech, Autoenc, Residual, VGG, AlexNet, DeepQ

SLIDE 52

For more information:
• IISWC 2016 paper (pre-print on arXiv): arxiv.org/abs/1608.06581
• Code on GitHub: rdadolf.github.io/fathom

SLIDE 53

Architectural Support for Deep Learning at Harvard

Algorithms, tools, architectures, and circuits:
• Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators
• Co-Designing Deep Neural Network Accelerators for Accuracy and Energy Using Bayesian Optimization
• Fathom: Reference Workloads for Modern Deep Learning Methods
• SM2: A Deep Neural Network Accelerator SoC in 28nm bulk and 16nm FinFET

A Full-Stack Approach to Machine Learning

SLIDE 54

Problem: Hardware accelerator design for DNNs

Goal: build specialized hardware blocks to evaluate DNNs.
• Example: a speech recognition engine for a mobile phone
• Example: an object classifier for an autonomous robot

[Figure: design flow from training data and a neural network specification, through the training algorithm, to a trained neural network, design parameters, and a hardware implementation]

SLIDE 55

Problem: Hardware accelerator design for DNNs

• High-dimensional design space: dozens of different variables, even for basic designs
• Complex parameter interactions: DNNs are notoriously difficult to tune
• Multiple competing objectives: prediction accuracy vs. energy consumption
• Costly evaluation functions: DNN training and hardware simulations both require hours

SLIDE 56

Bayesian Optimization

• Build a rough statistical model of the optimization space; this “surrogate model” must be cheap to evaluate.
• Use it to choose candidate parameter configurations, balancing tweaking good designs and avoiding local optima.
• Improve the model as more data is collected.

The loop: surrogate model → propose candidate → simulation → learn from data → improved model.
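A minimal sketch of this loop, assuming a Gaussian-process surrogate and an expected-improvement acquisition function (the general recipe behind tools such as Spearmint, not the tool itself). The toy objective stands in for an expensive DNN-training or hardware-simulation run, which is what makes the cheap surrogate worthwhile:

```python
# Bayesian-optimization loop: fit surrogate, maximize acquisition, evaluate, repeat.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expensive_objective(x):            # placeholder for error/energy evaluation
    return np.sin(3 * x) + 0.2 * x ** 2

def expected_improvement(gp, X_cand, y_best):
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (y_best - mu) / sigma           # we are minimizing the objective
    return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(3, 1))     # a few initial random designs
y = expensive_objective(X).ravel()
X_cand = np.linspace(-2, 2, 200).reshape(-1, 1)

for _ in range(10):                     # the propose / evaluate / refit loop
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X, y)
    x_next = X_cand[np.argmax(expected_improvement(gp, X_cand, y.min()))]
    X = np.vstack([X, [x_next]])
    y = np.append(y, expensive_objective(x_next))

print("best x:", X[np.argmin(y)].item(), "best value:", y.min())
```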

SLIDE 57

Single-objective, one-dimensional example

[Figure: the objective function at iteration n]

SLIDE 58

Single-objective, one-dimensional example

[Figure: the objective and the acquisition function at iteration n]

SLIDE 59

Single-objective, one-dimensional example

[Figure: the objective and the acquisition function at iterations n and n+1]

SLIDE 60

Single-objective, one-dimensional example

[Figure: the objective and the acquisition function at iterations n, n+1, and n+2]

SLIDE 61

Co-designing deep neural network accelerators

The co-design loop couples Spearmint (Bayesian optimization), DeepNet (DNN training), and Aladdin (accelerator simulation):
1. Choose parameters: Spearmint proposes a candidate configuration.
2. Generate models: a DNN specification (JSON) for DeepNet and a DNN implementation (C code) for Aladdin.
3. Evaluate objective functions: DeepNet reports prediction error; Aladdin reports energy.
4. Update the design space: results feed back into Spearmint.
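Schematically, the loop looks like the sketch below. The helpers are hypothetical stand-ins (a random proposer instead of Spearmint, an analytical cost model instead of DeepNet training and Aladdin simulation); it shows the shape of the loop, not the real tools’ APIs:

```python
# Shape of the co-design loop, with stand-in helpers (not Spearmint/Aladdin APIs).
import random

def propose_parameters(history):
    """Step 1 (stub): propose a candidate design; Spearmint would use a surrogate."""
    return {"bitwidth": random.choice([4, 6, 8, 16]),
            "hidden_units": random.choice([64, 128, 256])}

def evaluate(params):
    """Steps 2-3 (stub): stand-in for training the spec (error) and
    simulating the implementation (energy)."""
    error = 0.05 + 2.0 / (params["bitwidth"] * params["hidden_units"])
    energy = 1e-3 * params["bitwidth"] * params["hidden_units"]
    return error, energy

history = []
for _ in range(50):                              # optimization budget
    params = propose_parameters(history)         # step 1
    error, energy = evaluate(params)             # steps 2-3
    history.append((params, error, energy))      # step 4: update design space

# A real run keeps the full accuracy/energy trade-off; here we scalarize for brevity.
print(min(history, key=lambda h: h[1] + 0.5 * h[2]))
```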

SLIDE 62

Bayesian optimization finds better designs on average

SLIDE 63

Bayesian optimization finds better designs on average

SLIDE 64

Bayesian optimization finds better designs on average

SLIDE 65

Bayesian optimization finds better designs on average

SLIDE 66

Bayesian optimization finds better designs on average

SLIDE 67

Architectural Support for Deep Learning at Harvard

Algorithms, tools, architectures, and circuits:
• Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators
• Co-Designing Deep Neural Network Accelerators for Accuracy and Energy Using Bayesian Optimization
• Fathom: Reference Workloads for Modern Deep Learning Methods
• SM2: A Deep Neural Network Accelerator SoC in 28nm bulk and 16nm FinFET

A Full-Stack Approach to Machine Learning

SLIDE 68

Questions and acknowledgments

Brandon Reagen, Bob Adolf, Saketh Rama
• Papers/software: vlsiarch.eecs.harvard.edu
• Thanks to Prof. Ryan Adams and Prof. Miguel Hernandez-Lobato for the Bayesian optimization collaboration