Deep Learning on Mobile Phones
- A Practitioner's Guide
Anirudh Koul, Siddha Ganju, Meher Kasam
Anirudh Koul @AnirudhKoul
Head of AI & Research, Aira [Lastname]@aira.io
Siddha Ganju @SiddhaGanju
Architect, Self-Driving Vehicles, NVIDIA [FirstnameLastname]@gmail.com
Meher Anand Kasam @MeherKasam
Software Engineer, Square [FirstnameMiddlenameK]@gmail.com
0.1 second: reacting instantly
1.0 second: user's flow of thought
10 seconds: keeping the user's attention
[Miller 1968; Card et al. 1991; Jakob Nielsen 1993]
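These budgets translate directly into per-frame latency targets for an on-device model. A tiny helper makes the arithmetic explicit (an illustration; the 0.1 s default is the "reacting instantly" threshold above):

```python
def within_budget(latency_ms, budget_s=0.1):
    # True if a model's per-frame latency fits a given response-time budget.
    return latency_ms / 1000.0 <= budget_s

# A 25 ms/frame model (~40 FPS) feels instant; a 700 ms/frame one does not.
instant = within_budget(25)
too_slow = within_budget(700)
```

The 25 ms figure is the per-frame time the case studies later in this deck report on an iPhone 7.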
Mobile Inference Engine (Efficient) + Pretrained Model (Efficient) = DL App
Benchmark & compare them:
COCO-Text v2.0 for text reading in the wild
COCO-Val 2017 for image tagging in the wild (tag similarity match instead of exact word match)
Evaluation criteria: text reading in the wild

API | Accuracy
Amazon Rekognition | 45.4%
Google Cloud Vision | 33.4%
Microsoft Cognitive Services | 55.4%
Evaluation criteria: image tagging in the wild

API | Accuracy | Avg #Tags
Amazon Rekognition | 65% | 14
Google Cloud Vision | 47.6% | 14
Microsoft Cognitive Services | 50.0% | 8

Hard to do precision-recall since COCO ground-truth tags are not exhaustive; a lower number of tags at a given accuracy indicates a higher F-measure.
http://deeplearningkit.org/2015/12/28/deeplearningkit-deep-learning-for-ios-tested-on-iphone-6s-tvos-and-os-x-developed-in-metal-and-swift/
Energy to train a Convolutional Neural Network vs. energy to use a Convolutional Neural Network
ImageNet – 1000-class Object Categorizer: VGG16, Inception-v3, ResNet-50, MobileNet, SqueezeNet
Core ML, TensorFlow Lite, Caffe2
Metal (2014) → BNNS + MPS (2016) → Core ML (2017) → Core ML 2 (2018)
Metal: a compute shader application programming interface (API)
BNNS + MPS: fast low-level primitives, but inconvenient for building large networks.
Convert a Caffe/TensorFlow model to a Core ML model in 3 lines:
import coremltools
coreml_model = coremltools.converters.caffe.convert('my_caffe_model.caffemodel')
coreml_model.save('my_model.mlmodel')
Add the model to the iOS project and call it for prediction. Direct support for Keras, Caffe, scikit-learn, XGBoost, LibSVM. Automatically minimizes memory footprint and power consumption.
Core ML Benchmark - Pick a DNN for your mobile architecture
Model | Top-1 Accuracy | Size (MB) | iPhone 5S (ms) | iPhone 6 (ms) | iPhone 6S/SE (ms) | iPhone 7 (ms) | iPhone 8/X (ms)
VGG 16 | 71 | 553 | 7408 | 4556 | 235 | 181 | 146
Inception v3 | 78 | 95 | 727 | 637 | 114 | 90 | 78
ResNet 50 | 75 | 103 | 538 | 557 | 77 | 74 | 71
MobileNet | 71 | 17 | 129 | 109 | 44 | 35 | 33
SqueezeNet | 57 | 5 | 75 | 78 | 36 | 30 | 29
[Chart: huge improvement in mobile GPU hardware starting in 2015 (2013-2017)]
TensorFlow (2015) → TensorFlow Mobile (2016) → TensorFlow Lite (2018)
TensorFlow: the full, bulky deal
TensorFlow Mobile:
Easy pipeline to bring TensorFlow models to mobile
Excellent documentation
Optimizations to bring models to mobile
1-line conversion from Keras to TensorFlow Lite
Takes advantage of on-device hardware acceleration
TensorFlow Lite Benchmarks - http://alpha.lab.numericcal.com/
Caffe2 (from Facebook):
Under 1 MB of binary size
Built for speed:
- ARM CPU: NEON kernels, NNPACK
- iPhone GPU: Metal Performance Shaders and Metal
- Android GPU: Qualcomm Snapdragon NPE (4-5x speedup)
ONNX format support to import models from CNTK/PyTorch
By leveraging the GPU delegate: ~4x speedup on Pixel 3, ~6x speedup on iPhone 7
Keras → .tflite file, via tflite_convert
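A sketch of that conversion using the tflite_convert command-line tool that ships with TensorFlow (file names here are placeholders):

```shell
# Convert a saved Keras HDF5 model into a TensorFlow Lite flatbuffer.
# Input and output file names are placeholders.
tflite_convert \
  --keras_model_file=model.h5 \
  --output_file=model.tflite
```

The resulting .tflite file is what ships inside the app bundle.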
“My app has become too big to download. What do I do?”
“Do I need to ship a new app update with every model improvement?”
“Why does my app not recognize objects at top/bottom of screen?”
Learning to play the accordion from scratch: 3 months
Already knows piano? Fine-tune the skills: 1 week
Step 1 : Find a pre-trained model Step 2 : Fine tune a pre-trained model Step 3 : Run using existing frameworks
Model Zoo
https://modelzoo.co
Papers with Code
https://paperswithcode.com/sota
[Krizhevsky, Sutskever, Hinton ’12]
Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Ng, “Unsupervised Learning of Hierarchical Representations with Convolutional Deep Belief Networks”
n-dimensional feature representation
Size of New Dataset | Similarity to Original Dataset | What to do?
Large | High | Fine-tune
Small | High | Don’t fine-tune (it will overfit); train a linear classifier on CNN features
Small | Low | Train a classifier from activations in lower layers (higher layers are specific to the old dataset)
Large | Low | Train CNN from scratch
http://blog.revolutionanalytics.com/2016/08/deep-learning-part-2.html
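The decision logic of the table above can be encoded as a small helper (a sketch; in practice the "large" and "similar" thresholds are judgment calls):

```python
def transfer_learning_strategy(large_dataset, similar_to_original):
    # Encodes the decision table: dataset size and similarity to the
    # original training data determine the transfer-learning approach.
    if large_dataset and similar_to_original:
        return "fine-tune"
    if not large_dataset and similar_to_original:
        return "train a linear classifier on CNN features"
    if not large_dataset and not similar_to_original:
        return "train a classifier on lower-layer activations"
    return "train CNN from scratch"
```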
Custom Vision Service (customvision.ai) – Drag and drop training
Tip: Upload 30 photos per class to make a prototype model; upload 200 photos per class for a more robust production model. The more distinct the shape/type of the object, the fewer images required.
Tip: Use the Fatkun browser extension to download images from a search engine; make sure you have the proper rights.
Core ML exporter from customvision.ai – drag and drop training
A 5-minute shortcut to training, fine-tuning, and getting a model ready in Core ML format. Drag-and-drop interface.
Aim: help blind users identify products using barcodes
Issue: blind users don’t know where the barcode is
Live: guide the user to the barcode with audio cues
With server: decode the barcode to identify the product
Tech: MPSCNN running on the mobile GPU + a barcode library
Metrics: 40 FPS (~25 ms) on iPhone 7
Aim: identify currency
Live: identify the denomination of paper currency instantly
With server:
Tech: task-specific CNN running on the mobile GPU
Metrics: 40 FPS (~25 ms) on iPhone 7
Request volunteers to take photos of objects in non-obvious settings Sends photos to cloud, trains model nightly Newsletter shows the best photos from volunteers Let them compete for fame
What you want ($200,000) vs. what you can afford ($2,000)
[Images: https://www.flickr.com/photos/kenjonbro/9075514760/ and http://www.newcars.com/land-rover/range-rover-sport/2016]
AlexNet, 8 layers (ILSVRC 2012)
VGG, 19 layers (ILSVRC 2014)
GoogleNet, 22 layers (ILSVRC 2014)
[Layer-by-layer architecture diagrams]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. “Deep Residual Learning for Image Recognition”, 2015
ResNet, 152 layers (ILSVRC 2015) - ultra deep
[Layer-by-layer architecture diagram, compared with AlexNet (8 layers) and VGG (19 layers)]
ImageNet classification top-5 error (%):

ILSVRC'10 (shallow) | 28.2
ILSVRC'11 (shallow) | 25.8
ILSVRC'12 AlexNet (8 layers) | 16.4
ILSVRC'13 (8 layers) | 11.7
ILSVRC'14 VGG (19 layers) | 7.3
ILSVRC'14 GoogleNet (22 layers) | 6.7
ILSVRC'15 ResNet (152 layers) | 3.6
ILSVRC'16 Ensemble | 2.9

Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. “Deep Residual Learning for Image Recognition”, 2015
Ensemble of ResNet, Inception-ResNet, Inception, and Wide Residual Network
[Chart: accuracy vs. operations; marker size is proportional to the number of parameters. VGG-16: 552 MB, AlexNet: 240 MB]
Alfredo Canziani, Adam Paszke, Eugenio Culurciello, “An Analysis of Deep Neural Network Models for Practical Applications”, 2016
What we want
Your budget - smartphone floating point operations per second (2015)
http://pages.experts-exchange.com/processing-power-compared/
https://thenextweb.com/apple/2017/09/12/apples-new-iphone-x-already-destroying-android-devices-g/
Before training
After training
Core ML Benchmark - Pick a DNN for your mobile architecture

Model | Top-1 Accuracy | Size (MB) | Million Multi-Adds | iPhone 5S (ms) | iPhone 6 (ms) | iPhone 6S/SE (ms) | iPhone 7 (ms) | iPhone 8/X (ms)
VGG 16 | 71 | 553 | 15300 | 7408 | 4556 | 235 | 181 | 146
Inception v3 | 78 | 95 | 5000 | 727 | 637 | 114 | 90 | 78
ResNet 50 | 75 | 103 | 3900 | 538 | 557 | 77 | 74 | 71
MobileNet | 71 | 17 | 569 | 129 | 109 | 44 | 35 | 33
SqueezeNet | 57 | 5 | 800 | 75 | 78 | 36 | 30 | 29
Splits the convolution into a 3x3 depthwise convolution and a 1x1 pointwise convolution. Tune with two parameters: a width multiplier and a resolution multiplier.
Andrew G. Howard et al, "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications”, 2017
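The multiply-add savings of the depthwise-separable split can be verified with a few lines of arithmetic (an illustrative sketch; the 512-channel, 14x14 layer is just an example size):

```python
def standard_conv_madds(k, c_in, c_out, f):
    # Multiply-adds for a standard k x k convolution producing an
    # f x f feature map with c_out channels from c_in channels.
    return k * k * c_in * c_out * f * f

def depthwise_separable_madds(k, c_in, c_out, f):
    # k x k depthwise convolution followed by a 1x1 pointwise convolution.
    return k * k * c_in * f * f + c_in * c_out * f * f

# Example layer: 3x3 kernel, 512 -> 512 channels, 14x14 output map.
std = standard_conv_madds(3, 512, 512, 14)
sep = depthwise_separable_madds(3, 512, 512, 14)
ratio = std / sep  # roughly 1 / (1/c_out + 1/k^2), about 8-9x fewer madds
```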
https://ai.googleblog.com/2018/04/mobilenetv2-next-generation-of-on.html
MobileNetV2 is the current favorite
Jonathan Huang et al, "Speed/accuracy trade-offs for modern convolutional object detectors”, 2017
ICNet - Image cascade network
Three layers of 3x3 convolutions >> one layer of 7x7 convolution.
Replace large 5x5, 7x7 convolutions with stacks of 3x3 convolutions.
Replace NxN convolutions with a stack of 1xN and Nx1.
Fewer parameters ☺ Less compute ☺ More non-linearity ☺
Better, Faster, Stronger
Andrej Karpathy, CS-231n Notes, Lecture 11
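The parameter savings from stacking 3x3 convolutions are easy to check (64 channels is an arbitrary example count; biases are ignored):

```python
def conv_weights(k, c_in, c_out):
    # Weight count of a k x k convolution layer (ignoring biases).
    return k * k * c_in * c_out

c = 64  # example channel count
one_7x7 = conv_weights(7, c, c)        # a single 7x7 layer
three_3x3 = 3 * conv_weights(3, c, c)  # a stack of three 3x3 layers

# The stack sees the same 7x7 receptive field with roughly 45% fewer weights.
```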
Idea: augment data limited to how your network will be used.
Example: if making a selfie app, there is no benefit in rotating training images beyond ±45 degrees; the phone will rotate the image anyway. (Approach followed by WordLens / Google Translate.)
Example: add blur if analyzing mobile phone frames.
Aim: remove all connections with absolute weights below a threshold
Song Han, Jeff Pool, John Tran, William J. Dally, "Learning both Weights and Connections for Efficient Neural Networks", 2015
AlexNet 240 MB (fully connected layers hold 96% of all parameters); VGG-16 552 MB (90% of all parameters)
Pruning gets the quickest model compression without accuracy loss
AlexNet 240 MB, VGG-16 552 MB: the first layer, which directly interacts with the image, is sensitive and cannot be pruned much without hurting accuracy.
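The core pruning step, zeroing weights below a magnitude threshold, can be sketched in plain NumPy (an illustration of the idea, not a production pruning pipeline; the example weights are made up):

```python
import numpy as np

def prune_by_magnitude(weights, threshold):
    # Zero out every weight whose absolute value is below the threshold;
    # return the pruned weights and the binary mask of surviving connections.
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

w = np.array([0.02, -0.8, 0.001, 0.5, -0.03])
pruned, mask = prune_by_magnitude(w, 0.1)
# Only -0.8 and 0.5 survive; 3 of the 5 connections are removed.
```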
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation=tf.nn.relu),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    # Wrap the layers to be pruned. (The equivalent API in the released
    # tensorflow_model_optimization package is
    # tfmot.sparsity.keras.prune_low_magnitude.)
    prune.Prune(tf.keras.layers.Dense(512, activation=tf.nn.relu)),
    tf.keras.layers.Dropout(0.2),
    prune.Prune(tf.keras.layers.Dense(10, activation=tf.nn.softmax))
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)
Idea: cluster weights with similar values together and store them in a dictionary.
Techniques: codebook, Huffman coding, HashedNets.
Cons: needs a special inference engine; doesn’t work for most applications.
Filter Pruning - ThiNet
Idea: discard whole filters that are not important to predictions. Advantage:
Just like feature selection, select which filters to discard. Possible greedy methods:
Reduce precision from 32 bits to 16 bits or fewer. Use stochastic rounding for best results. In practice:
Reducing Core ML models to half their size
# Load a model, lower its precision, and then save the smaller model.
model_spec = coremltools.utils.load_spec('model.mlmodel')
model_fp16_spec = coremltools.utils.convert_neural_network_spec_weights_to_fp16(model_spec)
coremltools.utils.save_spec(model_fp16_spec, 'modelFP16.mlmodel')
Reducing Core ML models to an even smaller size. Choose bits and quantization mode: bits from [1, 2, 4, 8]; quantization mode from ["linear", "linear_lut", "kmeans_lut", "custom_lut"].
from coremltools.models.neural_network.quantization_utils import *
quantized_model = quantize_weights(model, 8, 'linear')
quantized_model.save('quantizedModel.mlmodel')
compare_model(model, quantized_model, './sample_data/')
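What a "linear" quantization mode does can be sketched in plain NumPy (an illustration of the idea only, not Apple's implementation): map the weights onto evenly spaced levels, then reconstruct the lossy float values.

```python
import numpy as np

def linear_quantize(weights, bits):
    # Map weights onto 2**bits evenly spaced levels between min and max,
    # then reconstruct the (lossy) float values.
    levels = 2 ** bits - 1
    w_min, w_max = weights.min(), weights.max()
    scale = (w_max - w_min) / levels
    quantized = np.round((weights - w_min) / scale)
    return quantized * scale + w_min

w = np.linspace(-1.0, 1.0, 9)
w8 = linear_quantize(w, 8)  # 8 bits: per-weight error is at most scale/2
```

With 8 bits the reconstruction error is tiny; with 2 bits every weight collapses onto one of only four values, which is where accuracy can start to suffer.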
Idea: reduce the weights to -1, +1.
Speedup: the convolution operation can be approximated by only summations and subtractions.
Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, Ali Farhadi, “XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks”
Idea: reduce both weights and inputs to -1, +1.
Speedup: the convolution operation can be approximated by XNOR and bit-count operations.
Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, Ali Farhadi, “XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks”
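The XNOR + bit-count trick rests on a simple identity for {-1, +1} vectors: their dot product equals matches minus mismatches. A NumPy sketch of the arithmetic (an illustration, not the paper's optimized kernels):

```python
import numpy as np

def binary_dot(a, b):
    # Dot product of two {-1, +1} vectors via XNOR + bit count:
    # dot(a, b) = matches - mismatches = 2 * popcount(XNOR(a, b)) - n.
    a_bits = (a > 0).astype(np.int64)
    b_bits = (b > 0).astype(np.int64)
    agree = 1 - (a_bits ^ b_bits)  # XNOR: 1 where the signs agree
    return 2 * int(agree.sum()) - a.size

a = np.array([1, -1, 1, 1])
b = np.array([1, 1, -1, 1])
# binary_dot(a, b) equals the ordinary dot product of the +-1 vectors.
```

On real hardware the two vectors are packed into machine words, so one XNOR plus one popcount instruction replaces dozens of multiply-accumulates.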
Off-the-shelf CNNs are not robust for video. Solutions:
Competitions to follow
Winners = high accuracy + low energy consumption
* LPIRC - Low-Power Image Recognition Challenge
* EDLDC - Embedded Deep Learning Design Contest
* System Design Contest at the Design Automation Conference (DAC)
MnasNet: Platform-Aware Neural Architecture Search for Mobile
Searches for mobile models using reinforcement learning, measuring latency directly on the target platform.
[Diagram: a Controller samples models from the search space; a Trainer measures accuracy while real mobile phones measure latency; a multi-objective reward (accuracy + latency) is fed back to the Controller]
For the same accuracy: fewer parameters.