AI on the Edge
— Discussion on the Gap Between Industry and Academia
Yunhe Wang, Huawei Noah’s Ark Lab
About Me
PKUer · Researcher · Programmer · Enthusiast
Yunhe Wang
www.wangyunhe.site · yunhe.wang@huawei.com
Restrictions on using AI at the edge: limited memory, computation, and energy.
Deep Model Compression [Han et al., NIPS 2015; Han et al., ICLR 2016 best paper award]: pruning, trained quantization, and Huffman encoding.
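A minimal sketch of that three-stage pipeline (numpy/scipy; the threshold and cluster count are illustrative assumptions, and the Huffman stage is approximated by its entropy lower bound rather than an actual coder):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.cluster.vq import kmeans2

def deep_compression_sketch(w, prune_thresh=0.5, n_clusters=16):
    # 1) Pruning: drop small-magnitude weights and store the rest in CSR form.
    sparse = csr_matrix(np.where(np.abs(w) > prune_thresh, w, 0.0))
    # 2) Trained quantization (weight sharing): k-means over surviving weights.
    centroids, labels = kmeans2(sparse.data.reshape(-1, 1), n_clusters, minit='++')
    sparse.data = centroids[labels, 0]
    # 3) Huffman coding compresses the cluster indices; the index entropy below
    #    is a lower bound on the achievable bits per stored weight.
    p = np.bincount(labels, minlength=n_clusters) / len(labels)
    bits_per_index = -(p[p > 0] * np.log2(p[p > 0])).sum()
    return sparse, bits_per_index

sparse_w, bits = deep_compression_sketch(np.random.randn(64, 64))
```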
CNNpack: Packing Convolutional Neural Networks in the Frequency Domain (NIPS 2016)
Compressed network:    AlexNet   VGGNet-16   ResNet-50
Compression ratio rc:  39x       46x         12x
Speed-up ratio rs:     25x       9.4x        4.4x
Top-1 error:           41.6%     29.7%       25.2%
Top-5 error:           19.2%     10.4%       7.8%
[Figure: CNNpack pipeline — input data is decomposed over DCT bases to obtain frequency-domain feature maps, which are weighted-combined into the layer's feature maps. Filters are transformed by DCT, shrunk with an l1 penalty, clustered by k-means, and quantized (e.g., 0.499, 0.501, 0.502 → 0.5), then stored with Huffman coding and CSR format.]
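To make the pipeline concrete, here is a minimal sketch (numpy/scipy; the shrinkage threshold and cluster count are illustrative assumptions, and `cnnpack_sketch` is a hypothetical helper, not the released CNNpack implementation):

```python
import numpy as np
from scipy.fft import dctn, idctn
from scipy.cluster.vq import kmeans2

def cnnpack_sketch(filters, lam=0.05, n_clusters=8):
    """Illustrative frequency-domain compression of conv filters (num, k, k)."""
    # Project each filter onto DCT bases.
    coeffs = np.stack([dctn(f, norm='ortho') for f in filters])
    # l1-shrinkage: soft-threshold small DCT coefficients to exactly zero.
    shrunk = np.sign(coeffs) * np.maximum(np.abs(coeffs) - lam, 0.0)
    # k-means clustering quantizes surviving coefficients to shared values,
    # which the Huffman/CSR storage stage then exploits.
    nz_mask = shrunk != 0
    centroids, _ = kmeans2(shrunk[nz_mask].reshape(-1, 1), n_clusters, minit='++')
    flat = shrunk[nz_mask].reshape(-1, 1)
    nearest = np.argmin(np.abs(flat - centroids.T), axis=1)
    quantized = shrunk.copy()
    quantized[nz_mask] = centroids[nearest, 0]
    # Reconstruct spatial filters from the quantized DCT coefficients.
    return np.stack([idctn(c, norm='ortho') for c in quantized])

reconstructed = cnnpack_sketch(np.random.randn(16, 3, 3))
```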
[Charts: memory footprint and number of multiplications, before → after compression]
Model        Memory (MB)     Multiplications
AlexNet      232 → 5.9       7e8 → 3e7
VGGNet-16    572 → 12.4      2e10 → 2.1e9
ResNet-50    95 → 7.9        3.8e9 → 8.5e8
Adversarial Learning of Portable Student Networks (AAAI 2018)
[Figure: input images are fed to both the teacher and the student network; a discriminator (the assistant) compares teacher and student features in a shared feature space.]
We suggest developing a teaching-assistant (discriminator) network to identify the difference between features generated by the student and the teacher networks:

$\mathcal{L}_{GAN} = \frac{1}{n}\sum_{i=1}^{n} H(o_i^S, y_i) + \gamma \cdot \frac{1}{n}\sum_{i=1}^{n} \left[ \log\big(D(z_i^T)\big) + \log\big(1 - D(z_i^S)\big) \right]$

where $H$ is the cross-entropy between student outputs $o_i^S$ and labels $y_i$, $D$ is the discriminator, and $z_i^T$, $z_i^S$ are teacher and student features.
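A minimal sketch of this objective (PyTorch; the toy discriminator, feature sizes, and gamma value are illustrative assumptions, not the paper's settings):

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits, labels, teacher_feat, student_feat,
                 discriminator, gamma=0.1):
    # H(o_S, y): cross-entropy on the student's predictions.
    ce = F.cross_entropy(student_logits, labels)
    # Adversarial term: the assistant D scores teacher vs. student features.
    d_t = discriminator(teacher_feat).clamp(1e-6, 1 - 1e-6)
    d_s = discriminator(student_feat).clamp(1e-6, 1 - 1e-6)
    adv = (torch.log(d_t) + torch.log(1 - d_s)).mean()
    # The student minimizes this loss (pushing D(z_S) up to fool the assistant),
    # while D is trained separately to maximize the adversarial term.
    return ce + gamma * adv

discriminator = torch.nn.Sequential(torch.nn.Linear(64, 1), torch.nn.Sigmoid())
loss = distill_loss(torch.randn(8, 10), torch.randint(0, 10, (8,)),
                    torch.randn(8, 64), torch.randn(8, 64), discriminator)
```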
Visualization of features from different networks trained on the MNIST dataset, where features of a given category in each sub-figure share the same color: (a) features of the original teacher network; (b) features of the student network trained without the assistant; (c) features of the student network learned using the proposed method with a teaching assistant. (a) accuracy = 99.2%; (b) accuracy = 97.2%; (c) accuracy = 99.1%.
Towards Evolutionary Compression (SIGKDD 2018)
An illustration of the evolution of LeNet on the MNIST dataset. Each dot represents an individual in the population, and the thirty best individuals are shown at each evolutionary iteration. The fitness of individuals improves as iterations increase, implying that the network becomes more compact while retaining the same accuracy. [Figure legend: original filters, remaining filters, retrained filters.]
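A minimal sketch of the evolutionary search (plain numpy; the genetic operators and the toy "accuracy" below are illustrative stand-ins for the paper's fitness, which trades validation accuracy against compactness):

```python
import numpy as np

rng = np.random.default_rng(0)

def evolve(n_filters, eval_acc, pop=30, gens=50, p_mut=0.05, lam=0.3):
    """Search for a compact binary filter mask with a simple genetic algorithm."""
    population = rng.random((pop, n_filters)) < 0.5
    fitness = lambda m: eval_acc(m) - lam * m.mean()   # accuracy vs. compactness
    for _ in range(gens):
        scores = np.array([fitness(m) for m in population])
        survivors = population[np.argsort(scores)[-pop // 2:]]        # selection
        parents = survivors[rng.integers(0, len(survivors), (pop, 2))]
        mix = rng.random((pop, n_filters)) < 0.5                      # crossover
        children = np.where(mix, parents[:, 0], parents[:, 1])
        population = children ^ (rng.random((pop, n_filters)) < p_mut)  # mutation
    return population[np.argmax([fitness(m) for m in population])]

# Toy stand-in for "validation accuracy": keeping the right filters scores higher.
target = rng.random(64) < 0.4
best_mask = evolve(64, lambda m: (m == target).mean())
```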
Co-Evolutionary Compression for GANs (ICCV 2019)
The two generators in CycleGAN are compressed simultaneously.
[Table: statistics of the compressed generators]
Latency on the Huawei P30 Pro: 6.8 s → 2.1 s
[Figure: co-evolution — populations A and B of generator candidates evolve over iterations 1, 2, …, T, with the best Gen A and Gen B selected at each iteration.]
[Figure: qualitative comparison of translated images — Input, Baseline, ThiNet, Ours.]
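A minimal sketch of the co-evolution loop (numpy; `quality` is a toy stand-in for the paired evaluation of the two compressed generators, e.g. via cycle consistency, and the mask encoding is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def refill(survivors, pop, p_mut):
    # Clone random survivors and flip a few mask bits (mutation).
    children = survivors[rng.integers(0, len(survivors), pop)]
    return children ^ (rng.random(children.shape) < p_mut)

def co_evolve(quality, n=16, pop=8, gens=20, p_mut=0.1):
    # One binary filter-mask population per generator (A->B and B->A).
    A = rng.random((pop, n)) < 0.5
    B = rng.random((pop, n)) < 0.5
    for _ in range(gens):
        # Score each population against the current best of the other population.
        best_b = B[int(np.argmax([quality(A[0], b) for b in B]))]
        fit_a = np.array([quality(a, best_b) for a in A])
        best_a = A[int(np.argmax(fit_a))]
        fit_b = np.array([quality(best_a, b) for b in B])
        A = refill(A[np.argsort(fit_a)[-pop // 2:]], pop, p_mut)
        B = refill(B[np.argsort(fit_b)[-pop // 2:]], pop, p_mut)
    return best_a, best_b

# Toy quality: reward compact masks that stay consistent across the two sides.
best_a, best_b = co_evolve(lambda a, b: (a == b).mean() - 0.5 * (a.mean() + b.mean()))
```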
DAFL: Data-Free Learning of Student Networks (ICCV 2019)
A generator is introduced to approximate the original training data: random signals are fed into a generative network to synthesize images, which are then used to distill the teacher network into a student network.
[Figure: generative-network distillation pipeline.]
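A minimal sketch of the generator objective (PyTorch; the loss weights alpha and beta are illustrative, and this simplifies the paper's one-hot, activation, and information-entropy terms):

```python
import torch
import torch.nn.functional as F

def dafl_generator_loss(teacher_logits, teacher_feat, alpha=0.1, beta=5.0):
    # One-hot loss: the teacher should be confident on generated images.
    pseudo_labels = teacher_logits.argmax(dim=1)
    l_oh = F.cross_entropy(teacher_logits, pseudo_labels)
    # Activation loss: generated images should strongly excite teacher features.
    l_a = -teacher_feat.abs().mean()
    # Information-entropy loss: the batch should cover all classes evenly.
    p = F.softmax(teacher_logits, dim=1).mean(dim=0)
    l_ie = (p * torch.log(p + 1e-8)).sum()
    return l_oh + alpha * l_a + beta * l_ie

loss = dafl_generator_loss(torch.randn(32, 10), torch.randn(32, 256))
```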
How can we provide a perfect model-optimization service on the cloud?
Privacy-Related AI Applications
Entertainment apps · FaceID · Voice assistant · Fingerprint
[Figure: original vs. generated face images.]
Student accuracy with DAFL: 98.20% on MNIST, 92.22% on CIFAR-10, 74.47% on CIFAR-100.
AdderNet: Do We Really Need Multiplications in Deep Learning? (CVPR 2020)
Replacing multiplications with additions in deep learning can significantly reduce the energy consumption and area cost of chips.
References:
https://media.nips.cc/Conferences/2015/tutorialslides/Dally-NIPS-Tutorial-2015.pdf
http://eecs.oregonstate.edu/research/vlsi/teaching/ECE471_WIN15/mark_horowitz_ISSCC_2014.pdf
http://eyeriss.mit.edu/2019_neurips_tutorial.pdf
Feature calculation in a convolutional neural network (cross-correlation):
$Y(m,n,t) = \sum_{i}\sum_{j}\sum_{k} X(m+i,\,n+j,\,k) \times F(i,j,k,t)$

Feature calculation in an adder neural network ($\ell_1$ distance):
$Y(m,n,t) = -\sum_{i}\sum_{j}\sum_{k} \big|X(m+i,\,n+j,\,k) - F(i,j,k,t)\big|$

[Figure: feature visualization on MNIST for the adder network vs. the convolutional network.]
[Table: validation results on ImageNet.]
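A minimal sketch contrasting the two feature calculations (PyTorch; `adder2d_feature` is a hypothetical helper written for illustration, not the released AdderNet code):

```python
import torch
import torch.nn.functional as F

def conv2d_feature(x, w):
    # Convolutional feature: cross-correlation of input patches and filters.
    return F.conv2d(x, w)

def adder2d_feature(x, w):
    # Adder feature: negative l1 distance between input patches and filters,
    # so only additions/subtractions are needed instead of multiplications.
    n, c, h, wd = x.shape
    t, _, k, _ = w.shape
    patches = F.unfold(x, kernel_size=k)                  # (n, c*k*k, L)
    filters = w.view(t, -1)                               # (t, c*k*k)
    dist = (patches.unsqueeze(1) - filters[None, :, :, None]).abs().sum(dim=2)
    return -dist.view(n, t, h - k + 1, wd - k + 1)

x, w = torch.randn(1, 3, 8, 8), torch.randn(4, 3, 3, 3)
print(conv2d_feature(x, w).shape, adder2d_feature(x, w).shape)  # both (1, 4, 6, 6)
```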
Huawei HDC 2020: Real-time Video Style Transfer
Inference time on the Huawei Atlas 200 AI accelerator module: about 630 ms → 60 ms.
The key approaches used to complete this task:
1. Model distillation: remove the optical-flow module from the original network.
2. Filter pruning: reduce the computational complexity of the video generator (sketched below).
3. Operator optimization: automatically select suitable operators on the Atlas 200.
https://developer.huaweicloud.com/exhibition/Atlas_neural_style.html
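As an illustration of item 2, a minimal filter-pruning sketch (PyTorch; ranking filters by $\ell_1$ norm is a common criterion assumed here, not necessarily the exact criterion used in the demo):

```python
import torch

def prune_filters_by_l1(conv_weight: torch.Tensor, keep_ratio: float = 0.5):
    """Keep the output filters with the largest l1 norms (illustrative)."""
    # conv_weight: (out_channels, in_channels, k, k)
    norms = conv_weight.abs().sum(dim=(1, 2, 3))           # l1 norm per filter
    k = max(1, int(keep_ratio * conv_weight.shape[0]))
    keep = torch.topk(norms, k).indices.sort().values      # preserve filter order
    return conv_weight[keep], keep

pruned, kept_idx = prune_filters_by_l1(torch.randn(64, 32, 3, 3), keep_ratio=0.25)
print(pruned.shape)  # torch.Size([16, 32, 3, 3])
```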
The four reasons to move deep learning workloads from the cloud down onto the device
[Figure: server/cloud (fast, large memory, ample energy) vs. mobile device (limited resources) running a deep neural network.]
GitHub Link · Zhihu (知乎)
Thank You!
Contact me: yunhe.wang@huawei.com, wangyunhe@pku.edu.cn http://www.wangyunhe.site