SLIDE 1

Near-Data Processing for Differentiable Machine Learning Models

Hyeokjun Choe¹, Seil Lee¹, Hyunha Nam¹, Seongsik Park¹, Seijoon Kim¹, Eui-Young Chung² and Sungroh Yoon¹,³∗

¹Electrical and Computer Engineering, Seoul National University
²Electrical and Electronic Engineering, Yonsei University
³Neurology and Neurological Sciences, Stanford University
∗Correspondence: sryoon@snu.ac.kr

Homepage: http://dsl.snu.ac.kr

May 19th, 2017

SLIDE 2

Outline

1 Introduction
2 Background
3 Proposed Methodology
4 Experimental Results
5 Discussion and Conclusion

SLIDE 4

Machine Learning’s Success

Big data + powerful parallel processors ⇒ sophisticated models

SLIDE 5

Issues in the Conventional Memory Hierarchy

Data movement in the memory hierarchy:

Computational efficiency ⇓
Power consumption ⇑

SLIDE 6

Near-data Processing (NDP)

Memory or storage with intelligence (i.e., computing power)
Processes the data where it is stored
Reduces data movement and offloads the CPU

SLIDE 7

ISP-ML

ISP-ML: a full-fledged ISP-supporting SSD platform (ISP: in-storage processing)
Easy to implement machine learning algorithms in C/C++
For validation, three SGD algorithms were implemented and evaluated on ISP-ML

[Figure: overall architecture — a host (CPU, main memory, OS, user application) attached to an ISP-supporting SSD; the SSD controller comprises the host interface, an embedded (ARM) processor running the ISP SW, cache and channel controllers with ISP HW, DRAM and SRAM, and multiple NAND flash channels.]

SLIDE 8

Outline

1 Introduction
2 Background
3 Proposed Methodology
4 Experimental Results
5 Discussion and Conclusion

SLIDE 9

Machine Learning as an Optimization Problem

Machine learning categories

Supervised learning, unsupervised learning, reinforcement learning

The main purpose of supervised machine learning

Find the optimal θ that minimizes F(D; θ)

F(D; θ) = L(D; θ) + r(θ) (1)

[Figure: a neural network with an input layer and an output layer.]

D: input data
θ: model parameters
L: loss function
r: regularization term
F: objective function
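
As a concrete illustration of Eq. (1) (our example, not one from the talk), take L2-regularized least squares: the loss L is the squared prediction error and the regularizer r penalizes large parameters with strength λ:

```latex
% Illustrative instance of F(D; \theta) = L(D; \theta) + r(\theta):
% least-squares loss plus an L2 (ridge) penalty of strength \lambda.
F(D; \theta) = \sum_{i} \bigl( y_i - \theta^{\top} x_i \bigr)^2
             + \lambda \lVert \theta \rVert_2^2
```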

SLIDE 10

Gradient Descent

θ_{t+1} = θ_t − η ∇F(D; θ_t) (2)
        = θ_t − η ∑_i ∇F(D_i; θ_t) (3)

η: learning rate
t: iteration index
i: data sample index

First-order iterative optimization algorithm
Uses all samples per iteration

Stochastic gradient descent (SGD)
  Uses only one sample per iteration

Minibatch stochastic gradient descent
  Between gradient descent and SGD: uses multiple samples per iteration (see the sketch below)
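
To make the three variants concrete, here is a minimal C++ sketch of one update step on a squared-error loss; it is our illustration (placeholder data layout and gradient), not the ISP-ML implementation:

```cpp
#include <cstddef>
#include <vector>

// One parameter update: theta <- theta - eta * (batch-averaged gradient).
// batch_size == X.size() -> gradient descent (all samples)
// batch_size == 1        -> stochastic gradient descent
// otherwise              -> minibatch SGD
void sgd_step(std::vector<double>& theta,
              const std::vector<std::vector<double>>& X,  // input samples
              const std::vector<double>& y,               // targets
              double eta, std::size_t start, std::size_t batch_size) {
    std::vector<double> grad(theta.size(), 0.0);
    for (std::size_t i = start; i < start + batch_size && i < X.size(); ++i) {
        // Gradient of the squared error 0.5 * (theta^T x_i - y_i)^2.
        double pred = 0.0;
        for (std::size_t j = 0; j < theta.size(); ++j) pred += theta[j] * X[i][j];
        for (std::size_t j = 0; j < theta.size(); ++j)
            grad[j] += (pred - y[i]) * X[i][j];
    }
    for (std::size_t j = 0; j < theta.size(); ++j)
        theta[j] -= eta * grad[j] / static_cast<double>(batch_size);
}
```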

Source: https://sebastianraschka.com/faq/docs/closed-form-vs-gd.html

SLIDE 11

Parallel and Distributed SGD

Synchronous SGD
  The parameter server aggregates ∇θ_slave synchronously.

Downpour SGD
  Workers communicate with the parameter server asynchronously.

Elastic Averaging SGD (EASGD)
  Each worker has its own parameters.
  Workers transfer (θ_slave − θ_master), not ∇θ_slave (update rule sketched below).
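
A minimal sketch of one EASGD round, following Zhang et al.'s formulation with elasticity coefficient α and communication period τ; the grad callback and all names are our placeholders, not ISP-ML code:

```cpp
#include <cstddef>
#include <vector>

// One EASGD round for a single worker. Between communications the worker
// runs plain SGD on its own replica; every tau local steps, worker and
// master pull toward each other by alpha * (theta_w - theta_m).
void easgd_round(std::vector<double>& theta_w,  // worker (slave) parameters
                 std::vector<double>& theta_m,  // master parameters
                 double eta, double alpha, int tau,
                 std::vector<double> (*grad)(const std::vector<double>&)) {
    for (int step = 0; step < tau; ++step) {
        std::vector<double> g = grad(theta_w);
        for (std::size_t j = 0; j < theta_w.size(); ++j)
            theta_w[j] -= eta * g[j];            // local SGD step
    }
    for (std::size_t j = 0; j < theta_w.size(); ++j) {
        double diff = theta_w[j] - theta_m[j];   // (theta_slave - theta_master)
        theta_w[j] -= alpha * diff;              // worker moves toward master
        theta_m[j] += alpha * diff;              // master moves toward worker
    }
}
```

Note that what crosses the link is the elastic difference, not a gradient, which is why EASGD tolerates infrequent communication.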

SLIDE 12

Fundamentals of Solid-State Drives (SSDs)

SSD controller
  Embedded processor running the FTL (flash translation layer): HDD emulation, wear leveling, garbage collection, etc. (a toy mapping sketch follows this list)
  Cache controller and channel controllers

DRAM
  Cache and buffer, 512MB - 2GB

NAND flash arrays
  Simultaneously accessible across channels

Host interface logic
  SATA, PCIe
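
To illustrate what the FTL's HDD emulation involves: NAND pages cannot be overwritten in place, so the FTL redirects every write to a fresh physical page and keeps a logical-to-physical map. A toy page-level sketch of ours, omitting the wear leveling and garbage collection a real FTL (such as DFTL) performs:

```cpp
#include <cstdint>
#include <unordered_map>

// Toy page-level FTL: maps logical pages to physical pages.
class ToyFTL {
    std::unordered_map<uint64_t, uint64_t> map_;  // logical -> physical
    uint64_t next_free_ = 0;                      // naive free-page allocator
public:
    // A write never rewrites in place: it claims a fresh physical page and
    // remaps; the old page becomes stale (reclaimed later by GC).
    uint64_t write(uint64_t logical_page) {
        uint64_t physical = next_free_++;
        map_[logical_page] = physical;
        return physical;
    }
    // A read follows the current mapping (-1 if the page was never written).
    int64_t read(uint64_t logical_page) const {
        auto it = map_.find(logical_page);
        return it == map_.end() ? -1 : static_cast<int64_t>(it->second);
    }
};
```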

SLIDE 13

Previous Work on Near-Data Processing: PIM (Processing in Memory)

Performs computation inside the main memory
3D-stacked memory (e.g., HMC) has recently been used for PIM

Processing units are implemented in the logic layer

Applications: sorting, string matching, CNNs, matrix multiplication, etc.

SLIDE 14

Previous Work on Near-Data Processing: ISP (In-Storage Processing)

Performs computation inside the storage device

ISP with an embedded processor
  Pros: easy to implement, flexible
  Cons: no parallelism

ISP with dedicated hardware logic
  Pros: channel parallelism, hardware acceleration
  Cons: hard to implement and change

Applications: DB queries (scan, join), linear regression, k-means, string matching, etc.

SLIDE 15

Outline

1 Introduction
2 Background
3 Proposed Methodology
4 Experimental Results
5 Discussion and Conclusion

SLIDE 16

ISP-ML: ISP Platform for Machine Learning on SSDs

ISP-supporting SSD simulator
  Implemented in SystemC on the Synopsys Platform Architect (a minimal module skeleton follows)
  Software/hardware co-simulation
  Easily executes various machine learning algorithms written in C/C++

Transaction-level simulator
  For reasonable simulation speed

ISP components
  ISP SW and ISP HW
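
For flavor, a minimal clocked SystemC module skeleton of our own making; the actual ISP-ML components, interfaces, and timing models are far richer than this sketch:

```cpp
#include <systemc.h>

// Skeleton of a clocked channel-controller model.
SC_MODULE(ChannelController) {
    sc_in<bool> clk;

    void on_clock() {
        // Per cycle: fetch a NAND page (a timed transaction at this
        // abstraction level) and apply an ISP primitive to it.
    }

    SC_CTOR(ChannelController) {
        SC_METHOD(on_clock);
        sensitive << clk.pos();
        dont_initialize();
    }
};

int sc_main(int, char*[]) {
    sc_clock clk("clk", 10, SC_NS);  // 100 MHz model clock
    ChannelController cc("cc");
    cc.clk(clk);
    sc_start(100, SC_NS);            // run the model briefly
    return 0;
}
```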

[Figure: the simulated architecture — the host/SSD block diagram of slide 7, with clk/rst distribution to the modeled SSD-controller components (host I/F, embedded processor, cache controller, channel controllers, SRAM, DRAM) and the NAND flash channels.]

SLIDE 17

ISP-ML: ISP Platform for Machine Learning on SSDs

We implemented two types of ISP hardware components:

Channel controllers: perform primitive operations on the stored data.
Cache controller: collects the results from each channel controller.

They form a master-slave architecture (the cache controller as master, the channel controllers as slaves) and communicate with each other (the aggregation pattern is sketched below).
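
Conceptually, the channel controllers act as parallel reducers over their channel-local data while the cache controller aggregates the partial results. A threaded C++ sketch of that master-slave pattern (our illustration; the real components are hardware blocks, not threads):

```cpp
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Each "channel controller" reduces its local shard in parallel; the
// "cache controller" (the calling thread) collects the partial results.
double channel_parallel_sum(const std::vector<std::vector<double>>& shards) {
    std::vector<double> partial(shards.size(), 0.0);
    std::vector<std::thread> channels;
    for (std::size_t c = 0; c < shards.size(); ++c)
        channels.emplace_back([&partial, &shards, c] {
            // Slave: primitive operation on the data local to channel c.
            partial[c] = std::accumulate(shards[c].begin(), shards[c].end(), 0.0);
        });
    for (auto& t : channels) t.join();
    // Master: combine the per-channel results.
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}
```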

[Figure: the host/SSD block diagram again, showing the ISP HW attached to the cache and channel controllers and the ISP SW on the embedded processor.]

SLIDE 18

Parallel SGD Implementation on ISP-ML

[Slides 18-34: an animated, step-by-step illustration of the parallel SGD implementation on ISP-ML; the figures were not captured in this extraction.]

SLIDE 35

Methodology for IHP-ISP Performance Comparison

Ideal ways to fairly compare ISP and IHP (in-host processing):

1 Implementing ISP-ML in a real semiconductor chip
  High chip-manufacturing costs

2 Simulating IHP in the ISP-ML framework
  Prohibitively long simulation time for IHP

3 Implementing both ISP and IHP using FPGAs
  Requires another significant development effort

⇒ Hard to fairly compare the performance of ISP and IHP
⇒ We propose a practical comparison methodology

SLIDE 36

Methodology for IHP-ISP Performance Comparison

[Figure: comparison flow — (a) real system: the application runs on the host with real storage while an IO trace is extracted; (b) simulator: the IO trace and ISP commands are replayed on ISP-ML (baseline and ISP-implemented configurations).]

In the host (real system):
  Measure the observed IHP execution time (T_total)
  Measure the data IO time (T_IO)
  Extract the IO trace while executing the application

In the SSD (simulator):
  Measure the baseline simulation time with the IO trace (T_IOsim)

Observed IHP execution time: T_total = T_nonIO + T_IO
Expected IHP simulation time: T_nonIO + T_IOsim = T_total − T_IO + T_IOsim (a worked example follows)

T_IO: data IO latency of the storage
T_nonIO: non-IO execution time
T_IOsim: data IO time of the baseline SSD in ISP-ML
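
A worked example with hypothetical numbers (ours, not measurements from the talk): if a real run takes T_total = 10 s, of which T_IO = 4 s is storage IO, and replaying the IO trace on the baseline simulated SSD takes T_IOsim = 6 s, the expected IHP simulation time is

```latex
% Hypothetical numbers, for illustration only.
T_{nonIO} + T_{IOsim} = T_{total} - T_{IO} + T_{IOsim}
                      = 10\,\mathrm{s} - 4\,\mathrm{s} + 6\,\mathrm{s} = 12\,\mathrm{s}
```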

SLIDE 37

Outline

1 Introduction
2 Background
3 Proposed Methodology
4 Experimental Results
5 Discussion and Conclusion

SLIDE 38

Setup and Implementation

Host specifications
  CPU: 8-core Intel(R) Core i7-3770K (3.50GHz)
  Main memory: 32GB DDR3 RAM
  Storage: Samsung SSD 840 Pro
  OS: Ubuntu 14.04 LTS

ISP-ML specifications
  Embedded processor: ARM 926EJ-S (400MHz)
  FTL: DFTL
  Page size: 8KB
  tprog / tread / tblockerase: 300µs / 75µs / 5ms
  FPU: 0.5 instructions/cycle (pipelined)

Dataset: 10x-amplified MNIST (handwritten digits)

SLIDE 39

Performance Comparison: ISP-Based Optimization

EASGD showed the best performance in this experiment:
  2.96x faster than synchronous SGD on average
  1.41x faster than Downpour SGD on average

For 4 and 8 channels, synchronous SGD was slower than Downpour SGD; for 16 channels, it was faster.

[Figure: test accuracy (0.82-0.94) vs. time (2-12 s) for (a) 4-channel, (b) 8-channel, and (c) 16-channel configurations, comparing synchronous SGD, Downpour SGD, and EASGD.]

SLIDE 40

Performance Comparison: IHP versus ISP

Compared IHP under memory shortage with ISP
  In large-scale machine learning, the computing system may suffer from memory shortage.
  Assumption: the host had already loaded all the data into main memory for IHP.

ISP-based EASGD with 16 channels achieved the best performance in our experiments.

[Figure: test accuracy (0.80-0.92) vs. time (4-20 s) for IHP with 2/4/8/16/32GB of main memory and ISP-based EASGD with 4/8/16 channels.]

SLIDE 41

Channel Parallelism

The speed-up tends to be proportional to the number of channels because the communication overhead within the SSD is negligible.

In distributed computing systems, by contrast, communication bottlenecks commonly occur.

[Figure: test accuracy (0.82-0.94) vs. time (2-12 s) for (a) synchronous SGD, (b) Downpour SGD, and (c) EASGD with 4/8/16 channels; (d) speed-up vs. number of channels for the three algorithms.]

SLIDE 42

Effects of Communication Period in Async. SGD

Downpour SGD
  High speed for a low communication period (τ = 1, 4)
  Unstable for a high communication period (τ = 16, 64)

EASGD
  As the communication period increases, convergence speed decreases.
  This is the opposite of distributed computing systems, owing to the low communication overhead in ISP.

[Figure: test accuracy vs. time for (a) Downpour SGD (0.50-0.90) and (b) EASGD (0.86-0.92) with communication periods τ = 1, 4, 16, 64.]

SLIDE 43

Experimental Results Summary

1 EASGD shows the best performance in our ISP-ML environment.

2 ISP is more efficient than IHP when the host suffers from insufficient main memory.
  ISP may be useful in large-scale machine learning.

3 The speed-up from parallelization is proportional to the number of channels.
  Because of the ultra-fast on-chip communication.

4 The performance of EASGD decreases as the communication period increases, unlike in conventional distributed systems.

SLIDE 44

Outline

1 Introduction
2 Background
3 Proposed Methodology
4 Experimental Results
5 Discussion and Conclusion

SLIDE 45

Parallelism in ISP

ISP can provide various advantages for the data processing involved in machine learning.
  E.g., ultra-fast on-chip communication
  ⇒ increased energy efficiency, security, and reliability

A high degree of parallelism can be achieved by increasing the number of channels inside an SSD.

Exploiting a hierarchy of parallelism
  Distributed systems + ISP-based SSDs

SLIDE 46

Opportunities for Future Research

1 Implementing deep neural networks in the ISP-ML framework
2 Implementing adaptive optimization algorithms (e.g., Adagrad and Adadelta)
3 Pre-computing metadata during data writes
4 Implementing data-shuffling functionality
5 Investigating the effect of NAND flash design on performance

SLIDE 47

Conclusion

Created a full-fledged ISP-supporting SSD simulator for machine learning
Implemented and compared multiple versions of parallel SGD
Proposed a fair methodology for comparing IHP and ISP
Opened future research opportunities for exploiting channel parallelism

SLIDE 48

Acknowledgments

SLIDE 49

Q/A
