

SLIDE 1

Mocha.jl

Deep Learning in Julia
Chiyuan Zhang (@pluskid), CSAIL, MIT

SLIDE 2

Deep Learning

  • Learning with multi-layer (3~30) neural networks, on a huge training set.
  • State-of-the-art on many AI tasks
  • Computer Vision: Image Classification, Object Detection, Semantic Segmentation, etc.
  • Speech Recognition & Natural Language Processing: Acoustic Modeling, Language Modeling, Word / Sentence Embedding

SLIDE 3

[Figure: GoogLeNet network diagram: stacked Inception modules of 1x1, 3x3, and 5x5 convolutions with max pooling and DepthConcat merges, average pooling, and three softmax classifier heads (softmax0, softmax1, softmax2)]

GoogLeNet (Inception)

Winner of ILSVRC 2014, 27 layers, ~7 million parameters

SLIDE 4

ILSVRC on ImageNet

Deep learning has dominated since 2012, surpassing “human performance” in 2015.

[Figure: ILSVRC top-5 error (%) by year, 2010 through 2014 plus an arXiv 2015 result, with the human-performance level marked]

SLIDE 5

Deep Learning in Speech Recognition

Image source: Li Deng and Dong Yu. Deep Learning: Methods and Applications.

SLIDE 6

Deep Neural Networks

  • A network that consists of computation units (layers, or nodes) connected via a specific architecture (see the sketch below).

Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E Hinton. “ImageNet Classification with Deep Convolutional Neural Networks.” NIPS 2012.
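
Illustratively, in plain Julia (no Mocha), a network is just stacked computation units, each an affine map followed by a nonlinearity; all names and sizes below are made up for the sketch:

relu(x) = max(x, zero(x))

function network(x, W1, b1, W2, b2)
    h = map(relu, W1 * x .+ b1)   # hidden layer: computation unit 1
    return W2 * h .+ b2           # output layer: computation unit 2
end

x  = rand(784)                        # e.g. a flattened 28x28 image
W1 = randn(128, 784); b1 = zeros(128)
W2 = randn(10, 128);  b2 = zeros(10)
scores = network(x, W1, b1, W2, b2)   # 10 class scores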

SLIDE 7

Deep Learning Made Easy

A deep learning toolkit provides common layers, easy ways to define network architectures, and a transparent interface to high-performance computation backends (BLAS, GPUs, etc.)

  • C++: Caffe (widely used in academia), dmlc/cxxnet, cuda-convnet, etc.
  • Python: Theano (auto-differentiation) and its wrappers, NervanaSystems/neon, etc.
  • Lua: Torch7 (Facebook); Matlab: MatConvNet (VGG)
  • Julia: pluskid/Mocha.jl, dfdx/Boltzmann.jl
  • Julia: pluskid/Mocha.jl, dfdx/Boltzmann.jl
SLIDE 8

Why Mocha.jl?

  • Written in Julia, for Julia: easily reuses data pre/post-processing and visualization tools from the Julia ecosystem.
  • Minimal dependencies: the pure-Julia backend runs out of the box, convenient for fast prototyping.
  • Multiple backends: easily switch to the CUDA + cuDNN backend for highly efficient deep net training.
  • Correctness: all computation layers are unit-tested.
  • Modular architecture: layers, activation functions, network topology, etc. are easily extendable.

SLIDE 9

Mocha.jl

  • Deep learning framework for (and written in) Julia; inspired by Caffe; focused on easy prototyping, customization, and efficiency (switchable computation backends)

> Pkg.add("Mocha")
> Pkg.test("Mocha")

  • or, for the latest dev version:

> Pkg.checkout("Mocha")
SLIDE 10

IJulia Example

Image classification example in an IJulia notebook, using a pre-trained ImageNet model.
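
The flow of that notebook, compressed into a runnable toy sketch: the single randomly initialized inner-product layer below stands in for the full pre-trained ImageNet architecture, and the blob accessors (net.output_blobs, copy! from blob to array) are assumptions from this era's Mocha API:

using Mocha

backend = CPUBackend()
init(backend)

input = rand(Float32, 227, 227, 3, 1)   # assumed ImageNet-style input tensor
data  = MemoryDataLayer(name="data", tops=[:data], batch_size=1,
                        data=Array[input])
ip    = InnerProductLayer(name="ip", output_dim=1000,   # 1000 ImageNet classes
                          tops=[:ip], bottoms=[:data])
prob  = SoftmaxLayer(name="prob", tops=[:prob], bottoms=[:ip])

net = Net("toy-classifier", backend, [data, ip, prob])
init(net)
forward(net)                            # one mini-batch through the net

probs = zeros(Float32, 1000, 1)
copy!(probs, net.output_blobs[:prob])   # read predictions back to an Array
println(indmax(probs))                  # top-1 class index

destroy(net)
shutdown(backend)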

SLIDE 11

Mini-tutorial: CNN on MNIST

  • MNIST: handwritten digits
  • Data preparation:
  • Image data in Mocha is represented as a 4D tensor: width-by-height-by-channels-by-batch
  • MNIST: 28-by-28-by-1-by-64
  • Mocha supports ND-tensors for general data
  • HDF5 file: a general format for tensor data, also supported by numpy, Matlab, etc. (see the sketch below)
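
A minimal sketch of preparing such an HDF5 file with HDF5.jl; the dummy arrays, paths, and the dataset names data / label (chosen to match the data layer's blob names on the next slide) are assumptions:

using HDF5

# Dummy stand-ins for the real MNIST arrays
n      = 1000
images = rand(Float32, 28, 28, 1, n)              # width x height x channels x count
labels = convert(Array{Float32}, rand(0:9, n))

isdir("data") || mkdir("data")
h5open("data/train.hdf5", "w") do h5
    h5["data"]  = images                          # dataset names match the blob names
    h5["label"] = reshape(labels, 1, n)
end

# The data layer's source file simply lists HDF5 files, one per line
open("data/train.txt", "w") do io
    println(io, "data/train.hdf5")
end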

SLIDE 12

Mini-tutorial: CNN on MNIST

  • Data layer


data_layer = AsyncHDF5DataLayer(name="train-data", source="data/train.txt", batch_size=64, shuffle=true)

  • data/train.txt lists the HDF5 files for the training set
  • 64 images are provided in each mini-batch
  • the data is shuffled to improve convergence
  • the async data layer uses Julia’s @async to pre-read data while waiting for computation on the CPU / GPU

SLIDE 13

Convolution layer

conv_layer = ConvolutionLayer(name="conv1", n_filter=20, kernel=(5,5), bottoms=[:data], tops=[:conv])

[Figure: LeNet-5 architecture: INPUT 32x32 -> convolutions -> C1: feature maps 6@28x28 -> subsampling -> S2: 6@14x14 -> C3: 16@10x10 -> S4: 16@5x5 -> C5: 120 -> F6: 84 (full connections) -> OUTPUT 10 (Gaussian connections)]

LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.

SLIDE 14

Pooling Layer

pool_layer = PoolingLayer(name="pool1", kernel=(2,2), stride=(2,2), bottoms=[:conv], tops=[:pool])

  • The pooling layer operates on the output of the convolution layer
  • By default, MAX pooling is performed; switch to MEAN pooling by specifying pooling=Pooling.Mean(), as shown below
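
For instance, switching the same layer to mean pooling changes only the pooling argument:

pool_layer = PoolingLayer(name="pool1", kernel=(2,2), stride=(2,2),
                          pooling=Pooling.Mean(),
                          bottoms=[:conv], tops=[:pool])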

SLIDE 15

Blobs & Net Architecture

  • Network architecture is determined by connecting tops (output) blobs to bottoms (input) blobs with matching blob names.
  • Layers are automatically sorted and connected as a directed acyclic graph (DAG).

SLIDE 16

Rest of the layers

conv2_layer = ConvolutionLayer(name="conv2", n_filter=50, kernel=(5,5),
                               bottoms=[:pool], tops=[:conv2])
pool2_layer = PoolingLayer(name="pool2", kernel=(2,2), stride=(2,2),
                           bottoms=[:conv2], tops=[:pool2])
fc1_layer = InnerProductLayer(name="ip1", output_dim=500,
                              neuron=Neurons.ReLU(), bottoms=[:pool2], tops=[:ip1])
fc2_layer = InnerProductLayer(name="ip2", output_dim=10,
                              bottoms=[:ip1], tops=[:ip2])
loss_layer = SoftmaxLossLayer(name="loss", bottoms=[:ip2, :label])
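
With every layer defined, they can be assembled into a Net. A minimal sketch; the net name and the common_layers grouping (reused for the validation net later) follow the Mocha MNIST tutorial:

backend = CPUBackend()   # or GPUBackend(); see the GPU-vs-CPU demo below
init(backend)

common_layers = [conv_layer, pool_layer, conv2_layer, pool2_layer,
                 fc1_layer, fc2_layer]
net = Net("MNIST-train", backend, [data_layer, common_layers..., loss_layer])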

SLIDE 17

SGD Solver

params = SolverParameters(max_iter=10000, regu_coef=0.0005,
                          mom_policy=MomPolicy.Fixed(0.9),
                          lr_policy=LRPolicy.Inv(0.01, 0.0001, 0.75),
                          load_from=exp_dir)
solver = SGD(params)

SLIDE 18

Coffee Breaks…

… for the solver

setup_coffee_lounge(solver, save_into="$exp_dir/statistics.jld", every_n_iter=1000)

# report training progress every 100 iterations
add_coffee_break(solver, TrainingSummary(), every_n_iter=100)

# save snapshots every 5000 iterations
add_coffee_break(solver, Snapshot(exp_dir), every_n_iter=5000)
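
With the solver configured and coffee breaks registered, training runs (or resumes from the last snapshot in exp_dir) with a single call; the cleanup calls are the usual end of a Mocha session:

solve(solver, net)   # run SGD on the training net for up to max_iter iterations

destroy(net)         # release layer state
shutdown(backend)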

SLIDE 19

Solver Statistics

Solver statistics are saved automatically once the coffee lounge is set up. Snapshots record training progress periodically, so training can resume from the last snapshot after an interruption.
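
Since the statistics file is a standard JLD file, it can be loaded back for inspection or plotting. A sketch; the internal key layout is an assumption that may vary across Mocha versions, so list the keys first:

using JLD   # the format used by setup_coffee_lounge

stats = load("$exp_dir/statistics.jld")   # the save_into path from the previous slide
println(keys(stats))                      # discover what was recorded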

SLIDE 20

Demo: GPU vs CPU

backend = use_gpu ? GPUBackend() : CPUBackend()
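
One caveat: the CUDA backend must be enabled through an environment variable before the package is loaded, so the full switch looks roughly like this sketch:

use_gpu = true                       # flip to false for the CPU run
if use_gpu
    ENV["MOCHA_USE_CUDA"] = "true"   # must be set before `using Mocha`
end
using Mocha

backend = use_gpu ? GPUBackend() : CPUBackend()
init(backend)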

SLIDE 21

Parameter Sharing

  • When a layer has trainable parameters (e.g. convolution or inner-product layers), those parameters are registered under the layer name and shared by all layers with the same name
  • Use cases:
  • Validation network during training (see the sketch below)
  • Pre-training, fine-tuning
  • Advanced architectures, time-delayed nodes
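
The first use case is the one the coffee-break machinery makes convenient: a validation net that reuses the training layer names, and therefore their parameters. A sketch in the spirit of the Mocha MNIST tutorial (the test file list and batch size are assumptions; common_layers is the grouping from the net-assembly sketch above):

data_test = AsyncHDF5DataLayer(name="test-data", source="data/test.txt",
                               batch_size=100)
acc_layer = AccuracyLayer(name="test-accuracy", bottoms=[:ip2, :label])

# Same layer names as the training net, so parameters are shared
test_net = Net("MNIST-test", backend, [data_test, common_layers..., acc_layer])
add_coffee_break(solver, ValidationPerformance(test_net), every_n_iter=1000)
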
SLIDE 22

Parameter Sharing

SLIDE 23

The 3rd most-starred Julia package. Contributions are very welcome!