
Deep Learning on FPGAs: Past, Present, and Future

Griffin Lacey (laceyg@uoguelph.ca), Graham Taylor (gwtaylor@uoguelph.ca), Shawki Areibi (sareibi@uoguelph.ca)
University of Guelph, 50 Stone Rd E, Guelph, Ontario

arXiv:1602.04283v1 [cs.DC] 13 Feb 2016

ABSTRACT
The rapid growth of data size and accessibility in recent years has instigated a shift of philosophy in algorithm design for artificial intelligence. Instead of engineering algorithms by hand, the ability to learn composable systems automatically from massive amounts of data has led to groundbreaking performance in important domains such as computer vision, speech recognition, and natural language processing. The most popular class of techniques used in these domains is called deep learning, and it is seeing significant attention from industry. However, these models require enormous amounts of data and compute power to train, and are limited by the need for better hardware acceleration to accommodate scaling beyond current data and model sizes. While the current solution has been to use clusters of graphics processing units (GPUs) as general purpose processors (GPGPU), field programmable gate arrays (FPGAs) provide an interesting alternative. Current trends in design tools for FPGAs have made them more compatible with the high-level software practices typical of the deep learning community, making FPGAs more accessible to those who build and deploy models. Since FPGA architectures are flexible, they could also allow researchers to explore model-level optimizations beyond what is possible on fixed architectures such as GPUs. As well, FPGAs tend to provide high performance per watt of power consumption, which is of particular importance for application scientists interested in large-scale server-based deployment or resource-limited embedded applications. This review takes a look at deep learning and FPGAs from a hardware acceleration perspective, identifying trends and innovations that make these technologies a natural fit, and motivates a discussion on how FPGAs may best serve the needs of the deep learning community moving forward.

1. INTRODUCTION
The effects of machine learning on our everyday life are far-reaching. Whether you are clicking through personalized recommendations on websites, using speech to communicate with your smartphone, or using face detection to get the perfect picture on your digital camera, some form of artificial intelligence is involved. This new wave of artificial intelligence is accompanied by a shift in philosophy for algorithm design. Where past attempts at learning from data involved much "feature engineering" by hand using expert domain-specific knowledge, the ability to learn composable feature extraction systems automatically from massive amounts of example data has led to groundbreaking performance in important domains such as computer vision, speech recognition, and natural language processing. The study of these data-driven techniques is called deep learning, and it is seeing significant attention from two important groups of the technology community: researchers, who are interested in exploring and training these models to achieve top performance across tasks, and application scientists, who are interested in deploying these models for novel, real-world applications. However, both of these groups are limited by the need for better hardware acceleration to accommodate scaling beyond current data and algorithm sizes.

The current state of hardware acceleration for deep learning is largely dominated by using clusters of graphics processing units (GPUs) as general purpose processors (GPGPU) [18]. GPUs have orders of magnitude more computational cores than traditional general purpose processors (GPPs), which gives them a greater ability to perform parallel computations. In particular, the NVIDIA CUDA platform is the dominant choice for GPGPU programming, and major deep learning tools use it to access GPU acceleration [16, 26, 13, 19]. More recently, the open parallel programming standard OpenCL has gained traction as an alternative tool for heterogeneous hardware programming, and interest in it from these popular tools is gaining momentum. OpenCL, while trailing CUDA in terms of support in the deep learning community, has two unique features that distinguish it from CUDA. The first is that it is an open, royalty-free standard for development, as opposed to the single-vendor support of CUDA. The second is its support for a wide variety of alternative hardware, including GPUs, GPPs, field programmable gate arrays (FPGAs), and digital signal processors (DSPs).

1.1 The Case for FPGAs
The imminent support for alternative hardware is especially important for FPGAs, a strong competitor to GPUs for algorithm acceleration. Unlike GPUs, these devices have a flexible hardware configuration, and they often provide better performance per watt than GPUs for subroutines important to deep learning, such as sliding-window computations [24]. However, programming these devices requires hardware-specific knowledge that many researchers and application scientists may not possess, and as such, FPGAs have often been considered a specialist architecture. Recently, FPGA tools have adopted software-level programming models, including OpenCL, which have made them a more attractive option for users trained in mainstream software development practices.
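To make "sliding-window computation" concrete, the following is a minimal sketch of a 2D sliding-window operation (a "valid" cross-correlation, which deep learning frameworks call convolution) in plain Python. It assumes a single-channel input, stride 1, and no padding; the function name `sliding_window_2d` and the `Matrix` alias are hypothetical and for illustration only, not the authors' implementation or any FPGA toolchain's API.

```python
from typing import List

Matrix = List[List[float]]  # hypothetical alias: a row-major 2D grid


def sliding_window_2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide `kernel` over `image` (stride 1, no padding) and return the
    weighted sum at every position where the kernel fits entirely."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    if kh > ih or kw > iw:
        raise ValueError("kernel must fit inside the image")

    # Output size for a 'valid' window: one result per full window position.
    oh, ow = ih - kh + 1, iw - kw + 1
    out = [[0.0] * ow for _ in range(oh)]

    for r in range(oh):
        for c in range(ow):
            acc = 0.0
            # Multiply-accumulate over the kh x kw window anchored at (r, c).
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            out[r][c] = acc
    return out


if __name__ == "__main__":
    img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    k = [[1, 0], [0, 1]]  # adds each pixel to its lower-right neighbour
    print(sliding_window_2d(img, k))
```

The loop nest performs (H - kh + 1)(W - kw + 1) x kh x kw multiply-accumulates, and horizontally adjacent windows share all but one column of input. On a general purpose processor those overlapping values are refetched; a custom FPGA datapath can instead buffer input rows and reuse them while streaming, which is one plausible reason this pattern suits FPGAs well.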

