A Powerful, Flexible, and Intuitive Deep Learning Framework
  1. A Powerful, Flexible, and Intuitive Deep Learning Framework
     Shohei Hido, Chief Research Officer, Preferred Networks, Inc.
     @ NVIDIA GTC, April 6th, 2016

  2. Overview (http://chainer.org/)
     - Chainer is a Python-based deep learning framework
     - Chainer v1.0 was released as open source in June 2015
     - It DOESN'T rely on Theano, unlike other Python frameworks
     - Chainer uses a unique scheme named Define-by-Run
     - Why do users still need another framework?
     - How is Chainer different, and how effective is it?

  3. Preferred Networks (PFN): a startup that applies deep learning to industrial IoT
     - Founded: March 2014
     - Headquarters: Tokyo, Japan
     - U.S. subsidiary: San Mateo, California
     - Company size: 35 engineers & researchers
     - Investors: Toyota, FANUC, NTT
     Focus areas: manufacturing, healthcare, and automotive, with deep learning applied to industrial IoT

  4. Partnering with world-leading companies using Chainer
     - R&D collaboration on industrial problems with real-world data
       - Specific requirements, modified algorithms, many rounds of trial and error, etc.
       - Different from building a general-purpose recognition system
     Partners: NTT, FANUC, Toyota, Panasonic, Cisco, NVIDIA

  5. Two types of background behind DL frameworks
     1. Scalability-oriented
        - Use cases in mind: image/speech recognition systems; fast DL as a service in the cloud
        - Problem type: a few general applications; 10+ million training samples; clusters of 10+ nodes with fast networks
        - Possible bottlenecks: tuning of well-known algorithms; distributed computation for model/data-parallel training
     2. Flexibility-oriented
        - Use cases in mind: algorithm research; R&D projects for new products
        - Problem type: various specific applications; 10+ k training samples; one node with multiple GPUs
        - Possible bottlenecks: trial and error in prototyping; debugging, profiling & refactoring; (wait time during compilation)

  6. Designed for efficient research & development
     - Flexible: new kinds of complex models for various applications
     - Intuitive: rapid prototyping and efficient trial and error
     - Powerful: comparable performance on a single node with multiple GPUs
     (Diagram: Chainer positioned toward the flexibility-oriented end of the scalability-oriented / flexibility-oriented spectrum)

  7. Agenda
     - Deep learning framework basics
     - Introduction to Chainer
     - CuPy: NumPy-compatible GPU library
     - Performance and applications

  8. Neural network and computation
     (Figure: forward computation maps inputs x_1 ... x_N through hidden units to outputs y_1 ... y_M, e.g. an image to the object "tulip", sensor readings to an anomaly score of 0.35, or text to the category "sports"; backward computation (backpropagation) propagates gradients in the reverse direction.)

  9. Chainer focuses on network representation and training
     - Design choices for deep learning frameworks:
       - How to build neural networks?
       - How to train neural networks?
       - Which text format/language for modeling?
       - Which language for computing?
       - Run with a GPU?
       - Run on multiple GPUs?
       - Run on multiple compute nodes?

  10. Building and training neural networks: computational graph construction is the key
      1. Construct a computational graph
         - Based on the network definition given by the user
         - Chains of functions and operations on input variables
      2. Compute loss and gradients
         - Forward computation calculates the loss for a minibatch
         - Backpropagation gives gradients for all parameters
      3. Optimize the model
         - Update each parameter with its gradient
         - Repeat until convergence
      Step 1 is the most important, and there are many approaches to it.
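      As a concrete illustration of these three steps, here is a minimal training-step sketch in Chainer-style Python. It is only a sketch, assuming Chainer v1.x-style APIs (chainer.Chain, L.Linear, F.softmax_cross_entropy, optimizers.SGD); the layer sizes and the random minibatch are purely illustrative.

          import numpy as np
          import chainer
          import chainer.functions as F
          import chainer.links as L
          from chainer import optimizers

          # Step 1: the graph is defined implicitly by the forward code (Define-by-Run).
          model = chainer.Chain(l1=L.Linear(784, 100), l2=L.Linear(100, 10))
          optimizer = optimizers.SGD(lr=0.01)
          optimizer.setup(model)

          def forward(x):
              h = F.relu(model.l1(x))
              return model.l2(h)

          # Dummy minibatch (illustrative only).
          x = chainer.Variable(np.random.rand(32, 784).astype(np.float32))
          t = chainer.Variable(np.random.randint(0, 10, 32).astype(np.int32))

          # Step 2: the forward pass computes the loss; backward() fills in gradients.
          model.zerograds()
          loss = F.softmax_cross_entropy(forward(x), t)
          loss.backward()

          # Step 3: the optimizer updates each parameter using its gradient.
          optimizer.update()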

  11. Building blocks
      - These functionalities are very similar across frameworks
      - But the structure, abstraction level, and interface are different
      - It comes down to the design of a domain-specific language for neural networks
      Components: array data structure (vector/matrix/tensor); network (computational graph); operations & functions; optimizer (SGD/AdaGrad/Adam)

  12. Types of domain-specific language for neural networks
      - Text DSL: the network is defined in a text format
        - Ex. Caffe (prototxt), CNTK (NDL)
      - Internal DSL, symbolic program: operations on symbols
        - Ex. Theano, TensorFlow, MXNet (symbolic API)
      - Internal DSL, imperative program: direct computations on raw data arrays
        - Ex. Torch.nn, Chainer, MXNet (imperative API)

      Example in a text DSL:
          %% Definition in text (f.txt)
          f: {
            "A": "Variable",
            "B": "Variable",
            "C": ["B", "*", "A"],
            "ret": ["C", "+", 1]
          }
          # Compile and run
          f = compile("f.txt")
          d = f(A=np.ones(10), B=np.ones(10) * 2)

      Example as a symbolic program:
          # Symbolic definition
          A = Variable('A')
          B = Variable('B')
          C = B * A
          D = C + Constant(1)
          # Compile and run
          f = compile(D)
          d = f(A=np.ones(10), B=np.ones(10) * 2)

      Example as an imperative program:
          # Imperative declaration
          a = np.ones(10)
          b = np.ones(10) * 2
          c = b * a
          d = c + 1

  13. Comparison of DSL types
      - Text DSL
        - Pros: human-readable definitions; non-programmers can easily edit the network
        - Cons: users must study the format; the format might have to be extended for new algorithms
      - Internal DSL, symbolic
        - Pros: static analysis at compile time; optimization before training; easy to parallelize
        - Cons: users must study special syntax; may need more effort to implement new algorithms
      - Internal DSL, imperative
        - Pros: less effort to learn syntax; easy debugging and profiling; suitable for new algorithms with complex logic
        - Cons: hard to optimize in advance; less efficient in memory allocation and parallelization
      Chainer is at the extreme end of the imperative approach, for high flexibility.

  14. Agenda
      - Deep learning framework basics
      - Introduction to Chainer
      - CuPy: NumPy-compatible GPU library
      - Performance and applications

  15. Chainer as an open-source project
      - https://github.com/pfnet/chainer
      - 50 contributors
      - 1,277 stars & 255 forks
      - 3,708 commits
      - Active development & releases for the last 10 months: v1.0.0 (June 2015) to v1.7.2 (March 2016)
      - Original developer: Seiya Tokui

  16. Chainer software stack
      - Chainer is built on top of NumPy and CUDA
      - CuPy is also introduced as an equivalent of NumPy on the GPU
      (Diagram: Chainer sits on CuPy and NumPy; CuPy runs on cuDNN/CUDA on NVIDIA GPUs, while NumPy runs on BLAS on the CPU.)
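      A minimal sketch of what "NumPy-compatible" means in practice, assuming CuPy is importable as the cupy package on a CUDA-enabled installation; the array sizes are illustrative.

          import numpy as np
          import cupy as cp  # assumes a CUDA-enabled installation

          # The same array API runs on the CPU (NumPy) and on the GPU (CuPy).
          x_cpu = np.random.rand(1000, 1000).astype(np.float32)
          x_gpu = cp.asarray(x_cpu)            # copy the host array to GPU memory

          y_cpu = np.tanh(x_cpu).sum(axis=1)   # computed on the CPU
          y_gpu = cp.tanh(x_gpu).sum(axis=1)   # same expression, computed on the GPU

          # Bring the GPU result back to host memory for comparison.
          print(np.allclose(y_cpu, cp.asnumpy(y_gpu), atol=1e-4))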

  17. Graph build scheme (1/2): Define-and-Run
      Most frameworks use this scheme (Chainer does not)
      - Define: build a computational graph based on the network definition; auto differentiation derives the gradient function
      - Run: update the model parameters using the training data, driven by the computed loss & gradients

  18. Graph build scheme (2/2): Define-by-Run
      Computational graph construction on the fly
      - No graph is constructed before training
      - Instead, the graph is built during each forward computation
      - The computational graph can be modified dynamically for each iteration/sample, or depending on some conditions
      (Diagram: the model definition, parameters, and training data feed each forward pass; the graph and gradient function are built on the fly and may change dynamically based on conditions before the parameters are updated.)
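      A minimal sketch of a graph that changes from iteration to iteration under Define-by-Run, assuming Chainer v1.x-style APIs; the coin-flip branching and layer sizes are purely illustrative.

          import numpy as np
          import chainer
          import chainer.functions as F
          import chainer.links as L

          model = chainer.Chain(l1=L.Linear(784, 100),
                                l2=L.Linear(100, 100),
                                l3=L.Linear(100, 10))

          def forward(x, train=True):
              # Ordinary Python control flow decides the graph shape at run time.
              h = F.relu(model.l1(x))
              if train and np.random.rand() < 0.5:
                  h = F.relu(model.l2(h))  # this layer joins the graph only on some iterations
              return model.l3(h)

          x = chainer.Variable(np.random.rand(8, 784).astype(np.float32))
          y = forward(x)  # the recorded graph reflects the branch actually taken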

  19. Define-by-Run example: MLP for MNIST
      - Only the transformations between units are set up before training:
            l1 = Linear(784, n_units)
            l2 = Linear(n_units, 10)
      - The connections are given by the forward computation:
            def forward(x):
                h1 = ReLU(l1(x))
                return l2(h1)
      (Diagram: input x (784 pixels) -> Linear l1 -> ReLU -> h1 -> Linear l2 -> y (10 digit scores))
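      A runnable version of this example, as a sketch that assumes Chainer v1.x names (L.Linear, F.relu) rather than the abbreviated names on the slide; n_units and the dummy input batch are illustrative.

          import numpy as np
          import chainer
          import chainer.functions as F
          import chainer.links as L

          n_units = 1000

          # Only the parameterized transformations exist before training.
          l1 = L.Linear(784, n_units)
          l2 = L.Linear(n_units, 10)

          # The connections are defined by the forward computation itself.
          def forward(x):
              h1 = F.relu(l1(x))
              return l2(h1)

          x = chainer.Variable(np.random.rand(4, 784).astype(np.float32))  # dummy MNIST batch
          y = forward(x)
          print(y.data.shape)  # (4, 10)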

  20. Define-by-Run: an interpreted language for neural networks
      - Idea
        - Forward computation actually goes through the computational graph
        - By remembering that history, the actual graph can be obtained
      - Advantages
        - Flexibility for new algorithms with complex components
          - e.g. recurrent, recursive, attention, memory, adversarial networks
        - Intuitive coding with a highly imperative network definition
          - e.g. stochastic networks whose graph changes at each iteration
      - Current drawbacks
        - The graph is regenerated every time, even for fixed networks
        - No optimization, even for the static parts of graphs
          - JIT-like analysis and subgraph caching might be useful

  21. Basic components (1/2): Variable and Function
      - Variable
        - Wraps arrays (.data)
        - Remembers its parent function (.creator)
        - Will be assigned a gradient (.grad)
        - Keeps track not only of the data but also of the computations that produced it
      - Function
        - A transformation between Variables
        - Stateless
        - e.g. sigmoid, tanh, ReLU, max pooling, dropout
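      A small sketch of these attributes in use, assuming Chainer v1.x (F.sigmoid and F.sum as stateless functions); the array values are illustrative.

          import numpy as np
          import chainer
          import chainer.functions as F

          x = chainer.Variable(np.array([[0.5, -1.0]], dtype=np.float32))
          y = F.sigmoid(x)      # a stateless Function applied to a Variable

          print(y.data)         # the wrapped array
          print(y.creator)      # the function object that produced y

          loss = F.sum(y)
          loss.backward()       # backprop through the recorded history
          print(x.grad)         # gradient assigned to the input Variable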

  22. Basic components (2/2): Link and Chain
      - Link = a function with state
        - Its parameters are also Variables, and gradients will be assigned to them
        - e.g. Linear (fully connected), LSTM, Convolution2D, word embedding
      - Chain = a network
        - A Chain has a set of child Links
        - The forward computation is defined in .__call__()
        - e.g. MLP2, AlexNet, GoogLeNet, RNNLM, seq2seq
      (Diagram: a Linear link computes y = f(W*x + b) from x using its parameters W and b; the Chain MLP2 connects x -> Linear l1 -> ReLU -> h1 -> Linear l2 -> y.)
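      A sketch of how such a Chain might be written, assuming Chainer v1.x conventions; the MLP2 name and layer sizes mirror the slide's two-layer example.

          import numpy as np
          import chainer
          import chainer.functions as F
          import chainer.links as L

          class MLP2(chainer.Chain):
              """Two-layer perceptron: Linear -> ReLU -> Linear."""

              def __init__(self, n_units=1000):
                  super(MLP2, self).__init__(
                      l1=L.Linear(784, n_units),   # child Links hold the parameters (W, b)
                      l2=L.Linear(n_units, 10),
                  )

              def __call__(self, x):
                  # The forward computation defines the network's connections.
                  h1 = F.relu(self.l1(x))
                  return self.l2(h1)

          model = MLP2()
          x = chainer.Variable(np.random.rand(4, 784).astype(np.float32))
          y = model(x)
          print(y.data.shape)  # (4, 10)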

  23. Backpropagation through the computational graph
      - Consider an objective (using Link.Linear): L = f(x * W + b)
      - Forward computation computes the value of L and simultaneously builds the computational graph
        (Graph: x and W enter a multiply node, its result and b enter an add node, and f applied to the sum gives L; boxes are Variables, circles are Functions.)
      - The gradient of L can then be computed with respect to any variable by backpropagation
      - Finally, the optimizer updates the values of the parameters
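      A sketch of this in code, assuming Chainer v1.x and using F.sum as a stand-in for the unspecified function f on the slide; shapes and values are illustrative.

          import numpy as np
          import chainer
          import chainer.functions as F

          # Parameters and input as Variables (a Link.Linear bundles W and b like this).
          x = chainer.Variable(np.random.rand(1, 3).astype(np.float32))
          W = chainer.Variable(np.random.rand(3, 2).astype(np.float32))
          b = chainer.Variable(np.random.rand(1, 2).astype(np.float32))

          # Forward computation builds the graph: x * W -> + b -> f -> L
          loss = F.sum(F.matmul(x, W) + b)  # loss plays the role of L on the slide
          loss.backward()                    # backprop fills in .grad for every Variable

          print(W.grad)  # dL/dW
          print(b.grad)  # dL/db
          print(x.grad)  # dL/dx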
