A Unified Approximation Framework for Compressing and Accelerating Deep Neural Networks
Yuzhe Ma1, Ran Chen1, Wei Li1, Fanhua Shang2, Wenjian Yu3, Minsik Cho4, Bei Yu1
1CUHK, 2Xidian Univ., 3Tsinghua Univ., 4IBM T. J. Watson
1 Alfredo Canziani, Adam Paszke, and Eugenio Culurciello (2016). “An analysis of deep neural network models for practical applications”. In: arXiv preprint arXiv:1605.07678.
2 Song Han and William J Dally (2018). “Bandwidth-efficient deep learning”. In: Proc. DAC, pp. 1–6.
3 Song Han and William J Dally (2018). “Bandwidth-efficient deep learning”. In: Proc. DAC, pp. 1–6.
4 Wei Wen et al. (2016). “Learning structured sparsity in deep neural networks”. In: Proc. NIPS, pp. 2074–2082.
5 Yihui He, Xiangyu Zhang, and Jian Sun (2017). “Channel Pruning for Accelerating Very Deep Neural Networks”. In: Proc. ICCV.
6 Xiangyu Zhang et al. (2015). “Efficient and accurate approximations of nonlinear convolutional networks”. In: Proc. CVPR.
7 Xiyu Yu et al. (2017). “On compressing deep models by low rank and sparse decomposition”. In: Proc. CVPR.
Reconstructing the linear response:
$\min_{W}\ \frac{1}{2N}\|Y - WX\|_F^2$

Reconstructing the non-linear (ReLU) response:8
$\min_{W}\ \frac{1}{2N}\|\mathrm{ReLU}(Y) - \mathrm{ReLU}(WX)\|_F^2$

8 Xiangyu Zhang et al. (2015). “Efficient and accurate approximations of nonlinear convolutional networks”. In: Proc. CVPR.
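A small numpy illustration of why the non-linear response is the right target: an approximation that only errs on negative pre-activations has zero ReLU-response error but a large linear-response error. The matrices below are illustrative, not from the paper.

```python
import numpy as np

relu = lambda Z: np.maximum(Z, 0.0)

X = np.eye(2)                                  # identity input for simplicity
Y = np.array([[1.0, -1.0], [-2.0, 3.0]])       # target responses
W_hat = np.array([[1.0, -5.0], [-7.0, 3.0]])   # differs from Y only in negative entries

lin_err = np.linalg.norm(Y - W_hat @ X) ** 2
nonlin_err = np.linalg.norm(relu(Y) - relu(W_hat @ X)) ** 2
# lin_err > 0, but nonlin_err == 0: the ReLU masks errors in negative
# pre-activations, so only the non-linear response matters downstream.
```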
Approximate the weight matrix as W ≈ A + B, where A is sparse and B is low-rank:
$\min_{A,B}\ \frac{1}{2N}\|\mathrm{ReLU}(Y) - \mathrm{ReLU}((A+B)X)\|_F^2$
Add regularizers to promote structured sparsity in A and low rank in B:
$\min_{A,B}\ \frac{1}{2N}\|\mathrm{ReLU}(Y) - \mathrm{ReLU}((A+B)X)\|_F^2 + \lambda_1\|A\|_{2,1} + \lambda_2\|B\|_*$
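The regularized objective can be evaluated with a short numpy sketch; function and variable names here are illustrative, not from the paper.

```python
import numpy as np

def relu(Z):
    return np.maximum(Z, 0.0)

def l21_norm(A):
    # ||A||_{2,1}: sum of the l2 norms of the rows of A
    return np.sum(np.linalg.norm(A, axis=1))

def nuclear_norm(B):
    # ||B||_*: sum of the singular values of B
    return np.sum(np.linalg.svd(B, compute_uv=False))

def objective(A, B, X, Y, lam1, lam2):
    # (1/2N) ||ReLU(Y) - ReLU((A+B)X)||_F^2 + lam1 ||A||_{2,1} + lam2 ||B||_*
    N = X.shape[1]  # number of sampled responses (columns of X)
    resid = relu(Y) - relu((A + B) @ X)
    data_term = np.linalg.norm(resid) ** 2 / (2 * N)
    return data_term + lam1 * l21_norm(A) + lam2 * nuclear_norm(B)
```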
Introduce an auxiliary variable M to decouple the data term from the regularizers:
$\min_{A,B,M}\ \frac{1}{2N}\|\mathrm{ReLU}(Y) - \mathrm{ReLU}(MX)\|_F^2 + \lambda_1\|A\|_{2,1} + \lambda_2\|B\|_*, \quad \text{s.t. } M = A + B$

Augmented Lagrangian:
$L(A,B,M,\Lambda) = \frac{1}{2N}\|\mathrm{ReLU}(Y) - \mathrm{ReLU}(MX)\|_F^2 + \lambda_1\|A\|_{2,1} + \lambda_2\|B\|_* + \langle\Lambda,\ A + B - M\rangle + \frac{t}{2}\|A + B - M\|_F^2$
ADMM alternately minimizes L over each primal variable, then updates the dual variable:
$A^{k+1} = \arg\min_{A}\ L(A, B^k, M^k, \Lambda^k)$
$B^{k+1} = \arg\min_{B}\ L(A^{k+1}, B, M^k, \Lambda^k)$
$M^{k+1} = \arg\min_{M}\ \frac{1}{2N}\|\mathrm{ReLU}(Y) - \mathrm{ReLU}(MX)\|_F^2 + \langle\Lambda^k,\ A^{k+1} + B^{k+1} - M\rangle + \frac{t}{2}\|A^{k+1} + B^{k+1} - M\|_F^2$
$\Lambda^{k+1} = \Lambda^k + t\,(A^{k+1} + B^{k+1} - M^{k+1})$
A-subproblem:
$\min_{A}\ \lambda_1\|A\|_{2,1} + \frac{t}{2}\left\|A - \left(M^k - B^k - \frac{\Lambda^k}{t}\right)\right\|_F^2$

Closed-form solution via the proximal operator of the ℓ2,1 norm:9
$A^{k+1} = \mathrm{prox}_{\frac{\lambda_1}{t}\|\cdot\|_{2,1}}\!\left(M^k - B^k - \frac{\Lambda^k}{t}\right)$

9 Guangcan Liu et al. (2013). “Robust recovery of subspace structures by low-rank representation”. In: IEEE TPAMI 35.1.
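The ℓ2,1 proximal operator is a row-wise shrinkage; a minimal numpy sketch, assuming the rows of A are the sparsity groups:

```python
import numpy as np

def prox_l21(Z, tau):
    """Proximal operator of tau * ||.||_{2,1}: shrink each row of Z toward zero.

    Row z_i maps to max(0, 1 - tau / ||z_i||_2) * z_i (and to 0 when ||z_i|| = 0),
    so rows with l2 norm below tau are zeroed out entirely.
    """
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return scale * Z

# A-update: A_next = prox_l21(M - B - Lam / t, lam1 / t)
```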
B-subproblem:
$\min_{B}\ \lambda_2\|B\|_* + \frac{t}{2}\left\|B - \left(M^k - A^{k+1} - \frac{\Lambda^k}{t}\right)\right\|_F^2$

Closed-form solution by singular value thresholding:10 with the SVD $M^k - A^{k+1} - \frac{\Lambda^k}{t} = U\Sigma V^\top$,
$B^{k+1} = U\,D_{\frac{\lambda_2}{t}}(\Sigma)\,V^\top, \quad \text{where } D_{\frac{\lambda_2}{t}}(\Sigma) = \mathrm{diag}\!\left(\left\{\left(\sigma_i - \tfrac{\lambda_2}{t}\right)_+\right\}\right)$

10 Jian-Feng Cai, Emmanuel J Candès, and Zuowei Shen (2010). “A singular value thresholding algorithm for matrix completion”. In: SIAM Journal on Optimization (SIOPT) 20.4, pp. 1956–1982.
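Singular value thresholding in numpy is a few lines; a sketch of the B-update:

```python
import numpy as np

def svt(Z, tau):
    """Singular value thresholding: the prox of tau * ||.||_* evaluated at Z.

    Computes the SVD Z = U diag(sigma) V^T and returns U diag((sigma_i - tau)_+) V^T,
    which both shrinks and (typically) reduces the rank of Z.
    """
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (U * s_shrunk) @ Vt  # broadcasting scales the columns of U

# B-update: B_next = svt(M - A_next - Lam / t, lam2 / t)
```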
M-subproblem:
$\min_{M}\ \frac{1}{2N}\|\mathrm{ReLU}(Y) - \mathrm{ReLU}(MX)\|_F^2 + \langle\Lambda^k,\ A^{k+1} + B^{k+1} - M\rangle + \frac{t}{2}\|A^{k+1} + B^{k+1} - M\|_F^2$
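Putting the three subproblems together, one ADMM sweep might look like the sketch below. The ReLU leaves the M-subproblem without a closed form, so it is approximated here by a few subgradient steps; the step size, iteration counts, and solver choice are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def relu(Z):
    return np.maximum(Z, 0.0)

def prox_l21(Z, tau):
    # Row-wise shrinkage: prox of tau * ||.||_{2,1}
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    return np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12)) * Z

def svt(Z, tau):
    # Singular value thresholding: prox of tau * ||.||_*
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def admm_step(A, B, M, Lam, X, Y, lam1, lam2, t, lr=1e-2, m_iters=20):
    """One ADMM sweep for the sparse + low-rank decomposition (illustrative)."""
    N = X.shape[1]
    # A-update: l2,1 proximal step
    A = prox_l21(M - B - Lam / t, lam1 / t)
    # B-update: singular value thresholding
    B = svt(M - A - Lam / t, lam2 / t)
    # M-update: subgradient descent on the ReLU reconstruction term
    for _ in range(m_iters):
        P = M @ X
        grad_data = ((relu(P) - relu(Y)) * (P > 0)) @ X.T / N
        grad = grad_data - Lam - t * (A + B - M)
        M = M - lr * grad
    # Dual update on the constraint M = A + B
    Lam = Lam + t * (A + B - M)
    return A, B, M, Lam
```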
11 Hao Li et al. (2017). “Pruning filters for efficient convnets”. In: Proc. ICLR.
12 Cheng Tai et al. (2016). “Convolutional neural networks with low-rank regularization”. In: Proc. ICLR.
13 Shiva Prasad Kasiviswanathan, Nina Narodytska, and Hongxia Jin (2018). “Network Approximation using Tensor Sketching”. In: Proc. IJCAI, pp. 2319–2325.
Comparison of reconstructing linear response and non-linear response: (a) layer conv2-1; (b) layer conv3-1.
14 Cheng Tai et al. (2016). “Convolutional neural networks with low-rank regularization”. In: Proc. ICLR.
15 Yong-Deok Kim et al. (2016). “Compression of deep convolutional neural networks for fast and low power mobile applications”. In: Proc. ICLR.
16 Ruichi Yu et al. (2018). “NISP: Pruning networks using neuron importance score propagation”. In: Proc. CVPR.