Deep Learning with Limited Numerical Precision

Suyog Gupta SUYOG@US.IBM.COM
Ankur Agrawal ANKURAGR@US.IBM.COM
Kailash Gopalakrishnan KAILASH@US.IBM.COM
IBM T. J. Watson Research Center, Yorktown Heights, NY 10598

Pritish Narayanan PNARAYA@US.IBM.COM
IBM Almaden Research Center, San Jose, CA 95120

Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 2015. JMLR: W&CP volume 37. Copyright 2015 by the author(s).

Abstract

Training of large-scale deep neural networks is often constrained by the available computational resources. We study the effect of limited precision data representation and computation on neural network training. Within the context of low-precision fixed-point computations, we observe the rounding scheme to play a crucial role in determining the network’s behavior during training. Our results show that deep networks can be trained using only 16-bit wide fixed-point number representation when using stochastic rounding, and incur little to no degradation in the classification accuracy. We also demonstrate an energy-efficient hardware accelerator that implements low-precision fixed-point arithmetic with stochastic rounding.

1. Introduction

To a large extent, the success of deep learning techniques is contingent upon the underlying hardware platform’s ability to perform fast, supervised training of complex networks using large quantities of labeled data. Such a capability enables rapid evaluation of different network architectures and a thorough search over the space of model hyperparameters. It should therefore come as no surprise that recent years have seen a resurgence of interest in deploying large-scale computing infrastructure designed specifically for training deep neural networks. Some notable efforts in this direction include distributed computing infrastructure using thousands of CPU cores (Dean et al., 2012; Chilimbi et al., 2014), or high-end graphics processors (GPUs) (Ciresan et al., 2010; Krizhevsky et al., 2012), or a combination of CPUs and GPUs scaled-up to multiple nodes (Coates et al., 2013; Wu et al., 2015).

At the same time, the natural error resiliency of neural network architectures and learning algorithms is well-documented, setting them apart from more traditional workloads that typically require precise computations and number representations with high dynamic range. It is well appreciated that in the presence of statistical approximation and estimation errors, high-precision computation in the context of learning is rather unnecessary (Bottou & Bousquet, 2007). Moreover, the addition of noise during training has been shown to improve the neural network’s performance (Murray & Edwards, 1994; Bishop, 1995; An, 1996; Audhkhasi et al., 2013). With the exception of employing the asynchronous version of the stochastic gradient descent algorithm (Recht et al., 2011) to reduce network traffic, the state-of-the-art large-scale deep learning systems fail to adequately capitalize on the error-resiliency of their workloads. These systems are built by assembling general-purpose computing hardware designed to cater to the needs of more traditional workloads, incurring high and often unnecessary overhead in the required computational resources.

This work is built upon the idea that algorithm-level noise tolerance can be leveraged to simplify underlying hardware requirements, leading to a co-optimized system that achieves significant improvements in computational performance and energy efficiency. Allowing the low-level hardware components to perform approximate, possibly non-deterministic computations and exposing these hardware-generated errors up to the algorithm level of the computing stack forms a key ingredient in developing such systems. Additionally, the low-level hardware changes need to be introduced in a manner that preserves the programming model so that the benefits can be readily absorbed at the application-level without incurring significant software redevelopment costs.
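The abstract attributes the success of 16-bit fixed-point training to stochastic rounding. As a rough illustration only, the sketch below rounds a real value onto a signed fixed-point grid, choosing the upper neighbor with probability equal to the fractional distance past the lower one, so that the result is unbiased in expectation. The function names, the 8/8 split between integer and fractional bits, and the saturation behavior are assumptions made for this example; they are not taken from the text above.

```python
import math
import random


def stochastic_round_fixed(x, int_bits=8, frac_bits=8, rng=random):
    """Stochastically round x onto a signed fixed-point grid.

    Hypothetical helper: int_bits (sign included) and frac_bits are
    illustrative defaults that add up to a 16-bit word.
    """
    eps = 2.0 ** -frac_bits          # smallest representable step
    scaled = x / eps
    steps = math.floor(scaled)       # lower grid neighbor, in units of eps
    # Round up with probability equal to the distance past the lower neighbor.
    if rng.random() < scaled - steps:
        steps += 1
    # Saturate to the range of a signed (int_bits + frac_bits)-bit word.
    word = int_bits + frac_bits
    steps = max(-(2 ** (word - 1)), min(2 ** (word - 1) - 1, steps))
    return steps * eps


def nearest_round_fixed(x, frac_bits=8):
    """Round-half-up onto the same grid, for comparison (no saturation)."""
    eps = 2.0 ** -frac_bits
    return math.floor(x / eps + 0.5) * eps


if __name__ == "__main__":
    random.seed(0)
    eps = 2.0 ** -8
    tiny = 0.3 * eps                 # smaller than half a grid step
    n = 20000
    mean = sum(stochastic_round_fixed(tiny) for _ in range(n)) / n
    print("round-to-nearest:", nearest_round_fixed(tiny) / eps, "eps")
    print("stochastic mean: ", mean / eps, "eps")
```

Under these assumptions, round-to-nearest maps a value below half a grid step to zero every time, whereas the stochastic scheme preserves it on average, which is one plausible reading of why the rounding scheme matters when many small updates are accumulated.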
