Mocha.jl
Deep Learning in Julia Chiyuan Zhang (@pluskid) CSAIL, MIT
Deep Learning
Learning with multi-layer (3~30) neural networks, on a huge training set. State-of-the-art on many AI tasks:
Computer Vision: Image Classification, Detection, Semantic Segmentation, etc.
Speech & Natural Language Processing: Acoustic Modeling, Language Modeling, Word / Sentence Embedding
GoogLeNet: winner of ILSVRC 2014, 27 layers, ~7 million parameters.
Deep learning has dominated since 2012, surpassing "human performance" since 2015.
[Figure: ILSVRC top-5 error rate, 2010-2014 plus 2015 ArXiv results, versus human performance.]
Image source: Li Deng and Dong Yu. Deep Learning: Methods and Applications.
A deep neural network is composed of computation components (layers, or nodes) connected via a specific architecture.
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet Classification with Deep Convolutional Neural Networks." NIPS 2012.
A deep learning toolkit provides common layers, easy ways to define network architectures, and a transparent interface to high-performance computation backends (BLAS, GPUs, etc.).
Existing toolkits: Caffe, cuda-convnet, etc.; NervanaSystems/neon, etc.
Why Mocha.jl?
Written in Julia: seamless access to pre/post processing and visualization tools from Julia.
Pure Julia backend for fast prototyping.
CUDA (+ cuDNN) based backend for highly efficient deep nets training.
Modular architecture: layers, activation functions, solvers, network topology, etc. Easily extendable.
Mocha.jl: inspired by Caffe; focusing on easy prototyping, customization, and efficiency (switchable computation backends).
> Pkg.add("Mocha")
> Pkg.test("Mocha")
> Pkg.checkout("Mocha")
IJulia demo: image classification example with a pre-trained ImageNet model.
Data are stored in blobs: for images, 4D tensors in width-by-height-by-channels-by-batch order.
Training data are read from HDF5 files, a standard format supported by numpy, Matlab, etc.
data_layer = AsyncHDF5DataLayer(name="train-data", source="data/train.txt", batch_size=64, shuffle=true)
The async data layer pre-fetches data while waiting for computation on the CPU / GPU.
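For reference, a minimal sketch of preparing the HDF5 source file (the file names and dataset shapes here are illustrative assumptions; the dataset names match the layer's default tops, :data and :label):

# Illustrative sketch: write an HDF5 file the data layer can consume.
using HDF5

h5open("data/train.hdf5", "w") do file
    write(file, "data",  rand(Float32, 28, 28, 1, 1000))   # W x H x C x N (assumed shape)
    write(file, "label", Float32.(rand(0:9, 1, 1000)))     # one label per sample
end

# The source file lists one HDF5 file path per line.
open("data/train.txt", "w") do io
    println(io, "data/train.hdf5")
end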
conv_layer = ConvolutionLayer(name="conv1", n_filter=20, kernel=(5,5), bottoms=[:data], tops=[:conv])
[Figure: LeNet-5 architecture. INPUT 32x32 -> C1: feature maps 6@28x28 (convolutions) -> S2: f. maps 6@14x14 (subsampling) -> C3: f. maps 16@10x10 (convolutions) -> S4: f. maps 16@5x5 (subsampling) -> C5: layer 120 (full connection) -> F6: layer 84 (full connection) -> OUTPUT 10 (Gaussian connections).]
LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.
pool_layer = PoolingLayer(name="pool1", kernel=(2,2), stride=(2,2), bottoms=[:conv], tops=[:pool])
Use MEAN pooling by specifying pooling=Pooling.Mean(); the default is max pooling.
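For example (the layer name here is an illustrative assumption):

# Same pooling layer as above, but averaging instead of taking the max.
pool_mean_layer = PoolingLayer(name="pool1-mean", kernel=(2,2), stride=(2,2),
                               pooling=Pooling.Mean(), bottoms=[:conv], tops=[:pool])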
Network architecture is determined by connecting tops (output) blobs to bottoms (input) blobs with matching blob names. Layers are topologically sorted and connected as a directed acyclic graph (DAG).
conv2_layer = ConvolutionLayer(name="conv2", n_filter=50, kernel=(5,5), bottoms=[:pool], tops=[:conv2])
pool2_layer = PoolingLayer(name="pool2", kernel=(2,2), stride=(2,2), bottoms=[:conv2], tops=[:pool2])
fc1_layer = InnerProductLayer(name="ip1", output_dim=500, neuron=Neurons.ReLU(), bottoms=[:pool2], tops=[:ip1])
fc2_layer = InnerProductLayer(name="ip2", output_dim=10, bottoms=[:ip1], tops=[:ip2])
loss_layer = SoftmaxLossLayer(name="loss", bottoms=[:ip2, :label])
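With all layers defined, the full network can be assembled into a Net (a sketch following Mocha's MNIST tutorial; the net name is an assumption, and backend selection is shown later in these slides):

# Sketch: assemble the layers above into a network.
backend = CPUBackend()   # or GPUBackend(); see the backend selection below
init(backend)

common_layers = [conv_layer, pool_layer, conv2_layer, pool2_layer, fc1_layer, fc2_layer]
net = Net("MNIST-train", backend, [data_layer, common_layers..., loss_layer])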
params = SolverParameters(max_iter=10000, regu_coef=0.0005,
    mom_policy=MomPolicy.Fixed(0.9),
    lr_policy=LRPolicy.Inv(0.01, 0.0001, 0.75),
    load_from=exp_dir)
solver = SGD(params)
Coffee breaks for the solver:
setup_coffee_lounge(solver, save_into="$exp_dir/statistics.jld", every_n_iter=1000)

# report training progress every 100 iterations
add_coffee_break(solver, TrainingSummary(), every_n_iter=100)

# save snapshots every 5000 iterations
add_coffee_break(solver, Snapshot(exp_dir), every_n_iter=5000)
Solver statistics are saved automatically if the coffee lounge is set up. Snapshots save the training progress periodically; training can continue from the last snapshot after an interruption.
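Training is then launched on the assembled net (a sketch using the names from the snippets above):

solve(solver, net)   # run SGD training, firing the registered coffee breaks
destroy(net)         # release the network's resources
shutdown(backend)    # shut down the computation backend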
backend = use_gpu ? GPUBackend() : CPUBackend()
For layers with trainable parameters (e.g. convolution and inner-product layers), the parameters are registered under the layer name and shared by layers with the same name.
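This makes it easy to build a validation net that reuses the trained parameters (a sketch along the lines of Mocha's MNIST tutorial; the test source file, batch size, and layer names are illustrative assumptions):

# Sketch: a test net sharing parameters with the training net above,
# because the conv/pool/ip layers keep the same names.
data_layer_test = AsyncHDF5DataLayer(name="test-data", source="data/test.txt", batch_size=100)
acc_layer = AccuracyLayer(name="test-accuracy", bottoms=[:ip2, :label])
test_net = Net("MNIST-test", backend, [data_layer_test, common_layers..., acc_layer])

# evaluate on the test set every 1000 iterations
add_coffee_break(solver, ValidationPerformance(test_net), every_n_iter=1000)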
Mocha.jl is the 3rd most-starred Julia package. Contributions are very welcome!