

SLIDE 1

A Powerful, Flexible, and Intuitive Deep Learning Framework

@ NVIDIA GTC, April 6th, 2016

Shohei Hido, Chief Research Officer, Preferred Networks, Inc.

SLIDE 2

Overview

• Chainer is a Python-based deep learning framework
• Chainer v1.0 was released as open source in June 2015
• It DOESN'T rely on Theano, unlike other Python frameworks
• Chainer uses a unique scheme named Define-by-Run

http://chainer.org/

• Why do users still need another framework?
• How different and effective is Chainer?

SLIDE 3

Preferred Networks (PFN): A startup that applies deep learning to industrial IoT

• Founded: March 2014
• Headquarters: Tokyo, Japan
• U.S. Subsidiary: San Mateo, California
• Company size: 35 engineers & researchers
• Investors: Toyota, FANUC, NTT

(Diagram: deep learning × industrial IoT, applied to manufacturing, automotive, and healthcare)

SLIDE 4

Partnering with world-leading companies using Chainer

• R&D collaboration on industrial problems with real-world data
  – Specific requirements, modified algorithms, many trials and errors, etc.
  – Different from building a general-purpose recognition system

Toyota, FANUC, Panasonic, NTT, Cisco, NVIDIA

SLIDE 5

Two types of background behind DL frameworks

1. Scalability-oriented

• Use cases in mind
  – Image/speech recognition systems
  – Fast DL as a service in the cloud
• Problem type
  – A few general applications
  – 10+ million training samples
  – 10+ node cluster w/ fast network
• Possible bottleneck
  – Tuning of well-known algorithms
  – Distributed computation for model/data-parallel training

2. Flexibility-oriented

• Use cases in mind
  – Algorithm research
  – R&D projects for new products
• Problem type
  – Various specific applications
  – 10+ k training samples
  – 1 node with multiple GPUs
• Possible bottleneck
  – Trial-and-error in prototyping
  – Debugging, profiling & refactoring
  – (wait time during compilation)

SLIDE 6

Designed for efficient research & development

• Flexible: new kinds of complex models for various applications
• Intuitive: rapid prototyping and efficient trial-and-error
• Powerful: comparable performance for 1 node & multi-GPUs

(Diagram: scalability-oriented ↔ flexibility-oriented spectrum)

SLIDE 7

Agenda

• Deep learning framework basics
• Introduction to Chainer
• CuPy: NumPy-compatible GPU library
• Performance and applications

SLIDE 8

Neural network and computation

(Diagram: forward computation from input units x1…xN through hidden units h1…hH and k1…kM to output units y1…yM, and backward computation (backpropagation) in the reverse direction; example inputs: text, image, sensor data; example outputs: "Object: tulip", "Anomaly score: 0.35", "Category: sports")

SLIDE 9

Chainer focuses on network representation/training

• Design choices for deep learning frameworks
  – How to build neural networks?
  – How to train neural networks?
  – Which text format/language for modeling?
  – Which language for computing?
  – Run with a GPU?
  – Run on multiple GPUs?
  – Run on multiple compute nodes?

SLIDE 10

Building and training neural networks: Computational graph construction is the key

1. Construct a computational graph
   – Based on the network definition given by users
   – Chains of functions and operations on input variables
2. Compute loss and gradients
   – Forward computation to calculate the loss for a minibatch
   – Backpropagation gives gradients for all parameters
3. Optimize the model
   – Update each parameter with its gradient
   – Repeat until convergence

Step 1 is the most important, and there are many approaches to it.
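As a rough orientation (my own illustration, not from the slides), the loop below carries out steps 2 and 3 by hand for a single linear layer in plain NumPy; step 1, building the graph and deriving these gradients automatically, is exactly what frameworks such as Chainer take care of:

import numpy as np

# Hand-written steps 2 and 3 for y = x.dot(W) + b with a squared-error loss
rng = np.random.RandomState(0)
W = 0.1 * rng.randn(3, 2).astype(np.float32)
b = np.zeros(2, dtype=np.float32)
lr = 0.1

x = rng.randn(8, 3).astype(np.float32)   # one minibatch of 8 samples
t = rng.randn(8, 2).astype(np.float32)   # targets

for step in range(100):
    # Step 2a: forward computation of the loss for the minibatch
    y = x.dot(W) + b
    diff = y - t
    loss = 0.5 * (diff ** 2).sum() / len(x)

    # Step 2b: backpropagation gives gradients for all parameters
    gy = diff / len(x)
    gW = x.T.dot(gy)
    gb = gy.sum(axis=0)

    # Step 3: update each parameter with its gradient; repeat until convergence
    W -= lr * gW
    b -= lr * gb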

SLIDE 11

Building blocks

• These functionalities are very similar between frameworks
• But the structure, abstraction level, and interface are different
• It comes down to the design of a domain-specific language for NNs

Building blocks: array data structure (vector/matrix/tensor), operations & functions, network (computational graph), optimizer (SGD/AdaGrad/Adam)

SLIDE 12

Types of domain-specific language for neural networks

• Text DSL
  – Ex. Caffe (prototxt)
  – Ex. CNTK (NDL)

• Symbolic program
  – Operations on symbols
  – Ex. Theano
  – Ex. TensorFlow

• Imperative program
  – Direct computations on raw data arrays
  – Ex. Torch.nn
  – Ex. Chainer

# Symbolic definition
A = Variable('A')
B = Variable('B')
C = B * A
D = C + Constant(1)
# Compile
f = compile(D)
d = f(A=np.ones(10), B=np.ones(10) * 2)

# Imperative declaration
a = np.ones(10)
b = np.ones(10) * 2
c = b * a
d = c + 1

%% Definition in text
f: {
  "A": "Variable",
  "B": "Variable",
  "C": ["B", "*", "A"],
  "ret": ["C", "+", 1]
}
# Compile
f = compile("f.txt")
d = f(A=np.ones(10), B=np.ones(10) * 2)

• Ex. MXNet (provides both symbolic and imperative APIs)
SLIDE 13

Comparison of DSL types

• Text DSL
  – Pros: human-readable definition; non-programmers can easily edit the network
  – Cons: users must study the format; the format might have to be extended for new algorithms
• Internal DSL, symbolic
  – Pros: static analysis at compile time; optimization before training; easy to parallelize
  – Cons: users must study special syntax; may need more effort to implement new algorithms
• Internal DSL, imperative
  – Pros: less effort to learn syntax; easy debugging and profiling; suitable for new algorithms with complex logic
  – Cons: hard to optimize in advance; less efficient in memory allocation and parallelization

Chainer is at the extreme end of the imperative style, for high flexibility.

SLIDE 14

Agenda

• Deep learning framework basics
• Introduction to Chainer
• CuPy: NumPy-compatible GPU library
• Performance and applications

SLIDE 15

Chainer as an open-source project

• https://github.com/pfnet/chainer
• 50 contributors
• 1,277 stars & 255 forks
• 3,708 commits
• Active development & releases for the last 10 months
  – v1.0.0 (June 2015) to v1.7.2 (March 2016)

Original developer: Seiya Tokui

SLIDE 16

Chainer software stack

• Chainer is built on top of NumPy and CUDA
• CuPy is also introduced as an equivalent of NumPy on the GPU

(Diagram: Chainer sits on NumPy and CuPy; NumPy runs on the CPU via BLAS, while CuPy runs on NVIDIA GPUs via CUDA and cuDNN)

SLIDE 17

Graph build scheme (1/2) – Define-and-Run: most frameworks use this scheme (Chainer does not)

• Define: build a computational graph based on the definition
• Run: update the model (parameters) using the training dataset

(Diagram: the network definition is turned by automatic differentiation into a computational graph, gradient functions, and parameters; during Run, training data flows through the graph to produce loss & gradients, which update the parameters)

SLIDE 18

Graph build scheme (2/2) – Define-by-Run: computational graph construction on the fly

• No graph is constructed before training
• Instead, the graph is built at each forward computation
• The computational graph can be modified dynamically for each iteration/sample or depending on some conditions

(Diagram: the model definition and training data produce the computational graph, gradient functions, and parameters on the fly; conditions can dynamically change the graph, and parameters are updated at each iteration)
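To make this concrete, here is a minimal sketch (my own illustration, not code from the slides) of a forward pass whose graph changes per call; it uses the Chainer v1 API style shown in the later code samples, and the random branch is a hypothetical condition:

import numpy as np
import chainer
import chainer.functions as F
import chainer.links as L

class DynamicNet(chainer.Chain):
    """The graph is (re)built by whatever Python code runs in __call__."""

    def __init__(self):
        super(DynamicNet, self).__init__(
            l1=L.Linear(784, 100),
            l2=L.Linear(100, 100),
            l3=L.Linear(100, 10),
        )

    def __call__(self, x):
        h = F.relu(self.l1(x))
        # Hypothetical per-iteration condition: on some calls the extra
        # layer is skipped entirely, so the recorded graph differs each time
        if np.random.rand() < 0.5:
            h = F.relu(self.l2(h))
        return self.l3(h)

model = DynamicNet()
x = chainer.Variable(np.random.randn(32, 784).astype(np.float32))
y = model(x)  # the computational graph for this call is built right here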

SLIDE 19

Define-by-Run example: MLP for MNIST

• Only the transformations between units are set before training
• The connection is given as the forward computation

l1 = Linear(784, n_units)
l2 = Linear(n_units, 10)

def forward(x):
    h1 = ReLU(l1(x))
    return l2(h1)

(Diagram: x → Linear l1 (W, bias) → ReLU → h1 → Linear l2 (W, bias) → y, classifying MNIST digits 0–9)

SLIDE 20

Define-by-Run: An interpreted language for neural networks

• Idea
  – Forward computation actually goes through the computational graph
  – By remembering the history, the actual graph can be obtained
• Advantages
  – Flexibility for new algorithms with complex components
    (e.g. recurrent, recursive, attention, memory, adversarial, etc.)
  – Intuitive coding with a highly imperative network definition
    (e.g. a stochastic network whose graph changes for each iteration)
• Current drawbacks
  – The graph is regenerated every time, even for fixed networks
  – No optimization, even for static parts of graphs
    (JIT-like analysis and subgraph caching might be useful)

SLIDE 21

Basic components (1/2): Variable and Function

• Variable
  – Variable wraps arrays (.data)
  – It remembers its parent function (.creator)
  – It will be assigned a gradient (.grad)
  – It keeps track of not only data but also computations
• Function
  – A transformation between Variables
  – Stateless
  – e.g. sigmoid, tanh, ReLU, max pooling, dropout

(Diagram: Functions map Variables x → h1 → y in the MNIST example, digits 0–9)
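As a small, hedged illustration of these attributes (my own example, not from the slides), the snippet below wraps a NumPy array in a Variable, applies a stateless Function, and inspects .data, .creator, and .grad after backward, using the Chainer v1 API described in this talk:

import numpy as np
import chainer
import chainer.functions as F

x = chainer.Variable(np.array([[0.5, -1.0, 2.0]], dtype=np.float32))
h = F.relu(x)          # a stateless Function applied to a Variable
y = F.sum(h)           # scalar output so that backward() can be called

print(h.data)          # the wrapped array (.data)
print(h.creator)       # the parent Function that produced h (.creator)

y.backward()           # backpropagation assigns gradients
print(x.grad)          # gradient of y with respect to x (.grad)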

SLIDE 22

Basic components (2/2): Link and Chain

• Link = a function with state
  – Parameters are also Variables, and gradients will be assigned to them
  – e.g. Linear (fully-connected), LSTM, Convolution2D, word embedding
• Chain = a network
  – A Chain has a set of child Links
  – Forward computation is defined in .__call__()
  – e.g. MLP2, AlexNet, GoogLeNet, RNNLM, seq2seq

(Diagram: a Link (Linear) computes y = f(W*x + b); a Chain (MLP2) composes Linear l1 → ReLU → Linear l2, each Linear with its own W and bias)

SLIDE 23

Backpropagation through computational graph

• Consider an objective (Link.Linear): L = f(x * W + b)
• This computes the value of L in the forward computation, and simultaneously builds the following computational graph
• The gradient of L can be computed with respect to any variable by backpropagation
• Then the optimizer updates the values of the parameters

(Diagram: the computational graph x, W → * → + (with b) → f → L; the legend distinguishes Variables from Functions)
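A small, hedged example of this graph in Chainer v1 (my own sketch, not code from the slides): a Linear link builds the x*W + b part, f is taken to be a sum of squares purely for illustration, and backward() fills in gradients for W, b, and x:

import numpy as np
import chainer
import chainer.functions as F
import chainer.links as L

linear = L.Linear(3, 2)                      # holds the parameters W and b
x = chainer.Variable(np.random.randn(4, 3).astype(np.float32))

y = linear(x)                                # forward: builds x*W + b in the graph
loss = F.sum(y * y)                          # "f": an illustrative scalar objective

loss.backward()                              # backpropagation through the graph
print(linear.W.grad)                         # gradient of the loss w.r.t. W
print(linear.b.grad)                         # gradient w.r.t. b
print(x.grad)                                # gradient w.r.t. the input as well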

SLIDE 24

Code sample (1/4): Multi-layer perceptron

class MLP2(Chain):
    def __init__(self):
        super(MLP2, self).__init__(
            l1=L.Linear(784, 100),
            l2=L.Linear(100, 10),
        )

    def __call__(self, x):
        h1 = F.relu(self.l1(x))
        y = self.l2(h1)
        return y


class Classifier(Chain):
    def __init__(self, predictor):
        super(Classifier, self).__init__(predictor=predictor)

    def __call__(self, x, t):
        y = self.predictor(x)
        self.accuracy = F.accuracy(y, t)
        self.loss = F.softmax_cross_entropy(y, t)
        return self.loss, self.accuracy


# Model and optimizer setup
model = Classifier(MLP2())
optimizer = optimizers.SGD()
optimizer.setup(model)

# Training loop with minibatches
for i in range(0, datasize, batchsize):
    x = Variable(x_tr[i:i+batchsize])
    t = Variable(y_tr[i:i+batchsize])
    model.zerograds()
    loss, acc = model(x, t)
    loss.backward()
    optimizer.update()

(Diagram: the Chain MLP2 — x → Linear l1 (W, bias) → ReLU → h1 → Linear l2 (W, bias) → y)

SLIDE 25

Code sample (2/4): Convolutional neural network

class AlexNet(Chain):
    def __init__(self):
        super(AlexNet, self).__init__(
            conv1=L.Convolution2D(3, 96, 11, stride=4),
            conv2=L.Convolution2D(96, 256, 5, pad=2),
            conv3=L.Convolution2D(256, 384, 3, pad=1),
            conv4=L.Convolution2D(384, 384, 3, pad=1),
            conv5=L.Convolution2D(384, 256, 3, pad=1),
            fc6=L.Linear(9216, 4096),
            fc7=L.Linear(4096, 4096),
            fc8=L.Linear(4096, 1000),
        )

    def __call__(self, x, t):
        h = F.max_pooling_2d(F.relu(
            F.local_response_normalization(self.conv1(x))), 3, stride=2)
        h = F.max_pooling_2d(F.relu(
            F.local_response_normalization(self.conv2(h))), 3, stride=2)
        h = F.relu(self.conv3(h))
        h = F.relu(self.conv4(h))
        h = F.max_pooling_2d(F.relu(self.conv5(h)), 3, stride=2)
        h = F.dropout(F.relu(self.fc6(h)), train=self.train)
        h = F.dropout(F.relu(self.fc7(h)), train=self.train)
        y = self.fc8(h)
        return y

* ImageNet Classification with Deep Convolutional Neural Networks
  http://www.image-net.org/challenges/LSVRC/2012/supervision.pdf

(Diagram: five conv2d layers followed by three linear layers)

SLIDE 26

Code sample (3/4): Recurrent neural network

class SimpleRNN(Chain):
    def __init__(self, n_vocab, n_units):
        super(SimpleRNN, self).__init__(
            embed=L.EmbedID(n_vocab, n_units),
            x2h=L.Linear(n_units, n_units),
            h2h=L.Linear(n_units, n_units),
            h2y=L.Linear(n_units, n_vocab),
        )
        self.h = None

    def __call__(self, x):
        y, h_new = self.fwd_one_step(x, self.h)
        self.h = h_new
        return y

    def fwd_one_step(self, x, h):
        x = F.tanh(self.embed(x))
        if h is None:
            h = F.tanh(self.x2h(x))
        else:
            h = F.tanh(self.x2h(x) + self.h2h(h))
        y = F.softmax(self.h2y(h))
        return y, h

(Diagram: the recurrent state h is unrolled over input words x_1…x_4, producing outputs y_1…y_4; BPTT length = 3)

# Truncated BPTT (length=3)
for i in range(0, datasize, batchsize):
    ...
    accum_loss += model(x, t)
    if i % bptt_length == 0:
        model.zerograds()
        accum_loss.backward()
        accum_loss.unchain_backward()
        optimizer.update()

SLIDE 27

Code sample (4/4): Deep Networks with Stochastic Depth
(G. Huang et al., a paper published on arXiv, March 30, 2016; taken from http://arxiv.org/abs/1603.09382v2)

• A variant of Residual Net that skips connections stochastically
  – Outperformed the original Residual Net (ImageNet 2015 winner, MSR)
  – Stochastic skip: h = ReLU(b * f(h) + h), where b is Bernoulli with survival probability p

# Mock code in Chainer
class StochasticResNet(Chain):
    def __init__(self, prob, size, ...):
        super(StochasticResNet, self).__init__(
            # Define f[i] the same way as for Residual Net
        )
        self.size = size
        self.p = prob  # survival probabilities

    def __call__(self, h):
        for i in range(self.size):
            b = numpy.random.binomial(1, self.p[i])
            c = self.f[i](h) + h if b == 1 else h
            h = F.relu(c)
        return h

SLIDE 28

Miscellaneous

• Other features
  – Install with pip in one line: $ pip install chainer
  – Multi-GPU support by explicitly selecting the device ID to use
  – Pre-trained Caffe model import from the Model Zoo
  – Model serialization & save & load: HDF5 or NumPy npz (see the sketch after this list)
• Future directions (not only for Chainer)
  – JIT-like optimization during Define-by-Run
  – Memory consumption reduction (GPU memory is still small)
  – Handling variable-length inputs without minibatches
  – Maximizing performance in multi-node & multi-GPU environments
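As a hedged illustration of the serialization bullet above (my own minimal sketch, not from the slides), Chainer v1 ships chainer.serializers for saving and loading a Chain; this assumes the Classifier/MLP2 classes from code sample (1/4) are in scope and that h5py is installed:

from chainer import serializers

# Assuming `model` is a trained Chain, e.g. the Classifier(MLP2()) built earlier
serializers.save_hdf5('mlp2.model', model)      # save parameters to an HDF5 file

# Later: rebuild the same architecture and load the saved parameters into it
model2 = Classifier(MLP2())
serializers.load_hdf5('mlp2.model', model2)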

SLIDE 29

Agenda

• Deep learning framework basics
• Introduction to Chainer
• CuPy: NumPy-compatible GPU library
• Performance and applications

SLIDE 30

CuPy: (partially-)NumPy-compatible GPU library

• Motivation: NumPy + CUDA = CuPy
  – NumPy is the standard library for numerical computation in Python
  – CUDA is the standard API for using GPUs for high performance
  – Unfortunately, NumPy does NOT work with CUDA
• CuPy supports:
  – Fast computation using NVIDIA's cuBLAS and cuDNN
  – Array indexing, slicing, transpose, and reshape
  – Most of the operations/functions in NumPy
    (Chainer v1.7.2 already supports more than 170 functions)
  – User-defined functions and kernels
  – All dtypes, broadcasting, memory pool, etc.

SLIDE 31

How to use CuPy

• Usage of CuPy: just replace NumPy with CuPy
• Conversion between numpy.ndarray and cupy.ndarray
• Ex. a CPU/GPU-agnostic logsumexp function

def logsumexp(x, axis=None):
    xp = cuda.get_array_module(x)  # get CuPy or NumPy depending on the input
    x_max = x.max(axis)
    exp_sum = xp.exp(x - x_max).sum(axis)
    return x_max + xp.log(exp_sum)

import numpy, cupy
enable_cupy = True
xp = cupy if enable_cupy else numpy

w_c = cupy.asarray(numpy.ones(10))   # cupy.ndarray
w_n = cupy.asnumpy(cupy.ones(10))    # numpy.ndarray
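As a small usage note (my own example, not on the slide), the same logsumexp call then works unchanged whether its argument lives on the CPU or the GPU; this assumes the definition above and `from chainer import cuda` for get_array_module:

import numpy, cupy
from chainer import cuda

x_cpu = numpy.random.randn(32, 10).astype(numpy.float32)
x_gpu = cupy.asarray(x_cpu)                  # copy the same data to the GPU

print(logsumexp(x_cpu))                      # computed with NumPy on the CPU
print(cupy.asnumpy(logsumexp(x_gpu)))        # computed with CuPy on the GPU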

SLIDE 32

CuPy implementation: Optimized for performance & NumPy-compatibility

• Use Cython for cupy.core & cupy.cuda
• Dynamic code generation & compilation
  – CUDA code is generated for the specific tensor dimensions & data types
  – On-the-fly compilation by nvcc, with a binary cache (faster after first use)

(Diagram: cupy provides tensor operations & functions on top of cupy.core (ndarray, ufunc, elementwise, reduction) and cupy.cuda (CUDA Python wrapper), which call the CUDA libraries cuBLAS, cuRAND, and cuDNN)

SLIDE 33

CuPy performance on linear algebra: 5 to 25 times faster than NumPy

def test(xp):
    a = xp.arange(1000000).reshape(1000, -1)
    return a.T * 2

test(numpy)
t1 = datetime.datetime.now()
for i in range(1000):
    test(numpy)
t2 = datetime.datetime.now()
print(t2 - t1)

test(cupy)
t1 = datetime.datetime.now()
for i in range(1000):
    test(cupy)
t2 = datetime.datetime.now()
print(t2 - t1)

                     msec    speed-up
NumPy                2,929   1.0
CuPy                 585     5.0
CuPy + Memory Pool   123     23.8

(Intel Core i7-4790 @ 3.60 GHz, 32 GB RAM, GeForce GTX 970)

SLIDE 34

Use CuPy for GPU-based computation

• Three patterns are supported as wrappers
  – ElementwiseKernel: for element-wise computation
  – ReductionKernel: for reduce operations along an axis
  – ufunc: universal functions as in NumPy
• Ex. definition of an element-wise function, and its usage (automatic broadcasting and type checking are supported):

squared_diff = cupy.ElementwiseKernel(
    'float32 x, float32 y',      # input
    'float32 z',                 # output
    'z = (x - y) * (x - y)',     # operation
    'squared_diff')              # name

squared_diff(cupy.arange(10, dtype='float32'), 10)
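The slide only shows an ElementwiseKernel; for completeness, here is a hedged sketch of the ReductionKernel pattern mentioned above (adapted from the standard CuPy documentation example, not from the slides), computing an L2 norm along an axis:

import cupy

# ReductionKernel: map each element, reduce pairs, then post-process the result
l2norm = cupy.ReductionKernel(
    'T x',          # input params
    'T y',          # output params
    'x * x',        # map: applied to each element
    'a + b',        # reduce: how two mapped values are combined
    'y = sqrt(a)',  # post-reduction map
    '0',            # identity value of the reduction
    'l2norm')       # kernel name

a = cupy.arange(10, dtype='float32').reshape(2, 5)
print(l2norm(a, axis=1))  # one L2 norm per row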

SLIDE 35

Agenda

• Deep learning framework basics
• Introduction to Chainer
• CuPy: NumPy-compatible GPU library
• Performance and applications

SLIDE 36

Public benchmark results (CNN): Chainer shows comparable performance

• Forward computation is almost the same as TensorFlow
• Training with backward computation is slower, but this can be offset by having no compilation time while debugging/tuning

(Bar charts: forward and backward computation times (msec) for AlexNet, GoogLeNet, VGG-A, and OverFeat, comparing Torch, TensorFlow, Chainer, and Caffe (native))

Taken from https://github.com/soumith/convnet-benchmarks, using cuDNN except for Caffe

SLIDE 37

Chainer can benefit from the latest CUDA libraries:
Ex. the Winograd algorithm in cuDNN v5

• 3x3 convolutions are common in CNNs & are now computed with Winograd
• State-of-the-art CNN models (e.g., GoogLeNet, VGG-A) can be accelerated up to 2.0x at test time (forward only)

(Bar charts: forward and backward computation times (msec) for AlexNet, GoogLeNet, VGG-A, and OverFeat with cuDNN v4 vs. cuDNN v5)

Independently measured with a modified version of soumith/convnet-benchmarks; cuDNN v5 can be used in Chainer v1.8.0

SLIDE 38

Algorithm implementation in Chainer: A Neural Algorithm of Artistic Style (Gatys et al., 2015)

• https://github.com/mattya/chainer-gogh

(Images: content image (cat) + style image = new artistic image; the main code is 45 lines)

SLIDE 39

Chainer in industry: Used in demonstrations & being commercialized

• Many collaborations are ongoing w/ Chainer-based computer vision, deep reinforcement learning, etc.
• Ex. 1: Chainer-controlled toy cars in the Toyota booth at CES 2016 (http://tinyurl.com/pfn-ces16)
• Ex. 2: FANUC's highly accurate bin-picking robot at IREX 2015 (http://tinyurl.com/pfn-irex15)
  – 8 hours of training to reach expert level; commercialization by the end of 2016

SLIDE 40

Summary

• Chainer is a Python-based deep learning framework with a dynamic network construction scheme and CuPy
• It is designed for efficient research and prototyping while keeping comparable performance thanks to NVIDIA GPUs
• Official web: http://chainer.org/
• GitHub: https://github.com/pfnet/chainer

Your contributions will be appreciated & we are hiring!