Model Compression
Seminar: Advanced Machine Learning, SS 2016 Markus Beuckelmann markus.beuckelmann@stud.uni-heidelberg.de
July 19, 2016
Markus Beuckelmann Model Compression July 19, 2016 1 / 33
Introduction Outline
1 Overview & Motivation
2 Recap: Neural Networks for Prediction
3 Neural Network Compression & Model Compression
4 Summary
Introduction Overview & Motivation
(Han et al., 2015) (Tensorflow)
How good is your model in terms of...?
AlexNet (Krizhevsky et al., 2012)
(Krizhevsky et al., 2012)
Smartphone Hardware (2016)
(Micriµm, Embedded Software)
(Han et al., 2015)
Neural Networks
Feed–Forward Networks
$z^{(i+1)} = f\big(W^{(i)\top} z^{(i)}\big), \quad z^{(0)} := x$
(Rajesh Rai, AI lecture) (http://deepdish.io)
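The layer-wise recursion of a feed-forward network, starting from z(0) := x and multiplying by the transposed weight matrix at each layer, can be sketched in a few lines. This is a minimal sketch, not taken from the slides: the `tanh` activation, the omission of bias terms, and the layer sizes are all assumptions chosen for illustration.

```python
import numpy as np

def forward(x, weights, activation=np.tanh):
    """Propagate x through the network: z(0) := x, z(i+1) = f(W(i)^T z(i)).

    Biases are omitted and tanh is an assumed activation.
    """
    z = x
    for W in weights:
        z = activation(W.T @ z)
    return z

rng = np.random.default_rng(0)
# Hypothetical layer sizes: 4 inputs -> 8 hidden units -> 3 outputs
weights = [rng.normal(size=(4, 8)), rng.normal(size=(8, 3))]
x = rng.normal(size=4)
y = forward(x, weights)
```

Each weight matrix has shape (inputs, outputs), so the transpose maps the previous layer's activations to the next layer's pre-activations.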
(Zeiler, 2013)
Loss functions
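Sum of squares: $\mathcal{L} = \sum_{i=1}^{N} \lVert y^{(i)} - \hat{y}^{(i)} \rVert^2$
Cross-entropy: $\mathcal{L} = -\sum_{i=1}^{N} \sum_{k=1}^{K} y_k^{(i)} \log\big(\hat{y}_k^{(i)}\big)$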
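The two usual loss functions for prediction, a sum of squared errors over the N samples and a cross-entropy summed over N samples and K classes, might be computed as follows. This is a sketch under assumptions: the function names, the small `eps` guard inside the logarithm, and the toy data are illustrative and do not come from the slides.

```python
import numpy as np

def sum_of_squares(y_true, y_pred):
    """Sum over all samples of the squared distance between target and prediction."""
    return np.sum((y_true - y_pred) ** 2)

def cross_entropy(y_true, y_prob, eps=1e-12):
    """Sum over samples i and classes k of -y_k^(i) * log(yhat_k^(i)).

    eps is an assumed numerical guard against log(0).
    """
    return -np.sum(y_true * np.log(y_prob + eps))

# Toy data: N = 2 samples, K = 3 classes, one-hot targets
y_true = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
y_prob = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
sse = sum_of_squares(y_true, y_prob)
ce = cross_entropy(y_true, y_prob)
```

With one-hot targets, only the predicted probability of the correct class contributes to the cross-entropy.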
Model Compression Methods
Model Compression Methods Pruning
(Ben Lorica, O’Reilly Media)
Important Questions
(Seeman et al., 1987)
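$\delta\mathcal{L} = \sum_i \frac{\partial\mathcal{L}}{\partial w_i}\,\delta w_i + \frac{1}{2}\sum_{(i,j)} \delta w_i\,(H)_{ij}\,\delta w_j + \mathcal{O}(\lVert\delta w\rVert^3), \quad (H)_{ij} = \frac{\partial^2\mathcal{L}}{\partial w_i\,\partial w_j}$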
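$\delta\mathcal{L} = \sum_i \frac{\partial\mathcal{L}}{\partial w_i}\,\delta w_i + \frac{1}{2}\sum_i (H)_{ii}\,\delta w_i^2 + \frac{1}{2}\sum_{i\neq j} \delta w_i\,(H)_{ij}\,\delta w_j + \mathcal{O}(\lVert\delta w\rVert^3)$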
Approximations
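$\delta\mathcal{L} \approx \frac{1}{2}\sum_i (H)_{ii}\,\delta w_i^2 \;\rightarrow\; S_k = \frac{1}{2}\,(H)_{kk}\,w_k^2$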
1 Choose a reasonable network architecture
2 Train the network until a reasonable local minimum is obtained
3 Compute the diagonal of the Hessian, i.e. $(H)_{kk}$
4 Compute the saliencies given by $S_k = \frac{1}{2}\,(H)_{kk}\,w_k^2$ for each parameter
5 Sort the parameters by $S_k$
6 Delete parameters with low saliency
7 (Optional: Iterate to step 2)
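The saliency-based pruning procedure listed above can be sketched on a toy problem. This is a minimal sketch under stated assumptions: the "network" is a linear least-squares model, chosen because the diagonal of its Hessian is available in closed form; `obd_prune`, the data, and the number of pruned weights are hypothetical and not from the slides.

```python
import numpy as np

def obd_prune(w, hessian_diag, n_prune):
    """Zero out the n_prune parameters with the lowest saliency
    S_k = 0.5 * (H)_kk * w_k^2 (steps 4 to 6 of the procedure)."""
    saliency = 0.5 * hessian_diag * w ** 2
    order = np.argsort(saliency)            # ascending: least salient first
    mask = np.ones_like(w, dtype=bool)
    mask[order[:n_prune]] = False
    return np.where(mask, w, 0.0), mask, saliency

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = np.array([2.0, 0.0, -1.5, 0.01, 0.0])
y = X @ w_true + 0.01 * rng.normal(size=200)

# Step 2: "train" to the minimum of L(w) = 1/(2N) * ||Xw - y||^2
w, *_ = np.linalg.lstsq(X, y, rcond=None)
# Step 3: for this loss the Hessian is X^T X / N, so its diagonal is mean(X^2)
h_diag = np.mean(X ** 2, axis=0)
w_pruned, mask, saliency = obd_prune(w, h_diag, n_prune=3)
```

For a real network the Hessian diagonal has to be computed or approximated separately, and step 7 would retrain the surviving weights before pruning again.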
(Le Cun et al., 1990)
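$\min_{k}\,\min_{\delta w}\ \tfrac{1}{2}\,\delta w^\top H\,\delta w \quad \text{subject to} \quad \delta w_k + w_k = 0$
$\delta w = -\frac{w_k}{(H^{-1})_{kk}}\,H^{-1} e_k, \qquad S_k = \frac{1}{2}\,\frac{w_k^2}{(H^{-1})_{kk}}$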
The obvious drawback...
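Assuming this slide refers to Optimal Brain Surgeon (the OBS of the summary) in its standard formulation, one pruning step could look like the sketch below. The function name `obs_step` and the 2x2 Hessian are hypothetical. The explicit matrix inverse is presumably the drawback meant above: it needs the full Hessian, which is impractical for large networks.

```python
import numpy as np

def obs_step(w, H):
    """One OBS step (assumed standard formulation): pick the weight whose
    removal increases the quadratic loss least, and adjust all other weights."""
    H_inv = np.linalg.inv(H)                      # full inverse Hessian
    saliency = w ** 2 / (2.0 * np.diag(H_inv))    # S_k = w_k^2 / (2 (H^-1)_kk)
    k = int(np.argmin(saliency))
    delta_w = -(w[k] / H_inv[k, k]) * H_inv[:, k] # enforces delta_w[k] = -w[k]
    return k, w + delta_w, saliency

# Hypothetical Hessian and weights at a local minimum
H = np.array([[2.0, 0.5], [0.5, 1.0]])
w = np.array([1.0, 0.2])
k, w_new, saliency = obs_step(w, H)
```

Unlike a purely diagonal saliency, the update also shifts the remaining weights to compensate for the deleted one.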
Model Compression Methods Knowledge Distillation
Motivation: An Analogy to Insects
Similarly: There are different requirements for training and testing
(Yangyang, 2014)
The Algorithm
1 Feed teacher with data
2 Obtain logits from teacher (transfer training set)
3 Train student on these logits (ℓ2-regression)
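The three steps above can be sketched with a deliberately tiny setup. Everything concrete here is an assumption for illustration: the teacher is a random two-layer network, the student is a single linear layer, and the ℓ2 regression is solved in closed form by least squares rather than by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: data fed to the teacher (no labels needed)
X = rng.normal(size=(500, 10))

# Step 2: logits from a hypothetical teacher form the transfer training set
W1, W2 = rng.normal(size=(10, 32)), rng.normal(size=(32, 3))
teacher_logits = np.tanh(X @ W1) @ W2

# Step 3: fit the student to the teacher's logits with an L2 loss
W_student, *_ = np.linalg.lstsq(X, teacher_logits, rcond=None)
student_logits = X @ W_student

mse = np.mean((student_logits - teacher_logits) ** 2)
baseline = np.mean(teacher_logits ** 2)   # error of a student that outputs zeros
```

Comparing `mse` against `baseline` gives a rough sense of how much of the teacher's function the smaller student captures.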
$p_j = \frac{\exp\big(a_j^{(l)}/T\big)}{\sum_{k=1}^{K} \exp\big(a_k^{(l)}/T\big)}$
(Yangyang, 2014)
The Algorithm
1 Feed teacher with data (T1)
2 Obtain soft targets from teacher (T1) (transfer training set)
3 Train student on soft targets (T1) with cross–entropy loss
4 Use student with T < T1
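The role of the temperature in these steps can be illustrated with a temperature-scaled softmax and a cross-entropy against the teacher's soft targets. This is a sketch under assumptions: the function names, the logits, and the temperature value 5 are hypothetical, and no actual student training loop is shown.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Softmax over the last axis at temperature T (higher T gives softer outputs)."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T):
    """Cross-entropy between teacher soft targets and student outputs, both at T."""
    soft_targets = softmax(teacher_logits, T)
    log_q = np.log(softmax(student_logits, T) + 1e-12)
    return -np.mean(np.sum(soft_targets * log_q, axis=-1))

teacher_logits = np.array([[8.0, 2.0, -1.0]])
hard = softmax(teacher_logits, T=1.0)    # nearly one-hot
soft = softmax(teacher_logits, T=5.0)    # exposes the relative class similarities
loss_same = distillation_loss(teacher_logits, teacher_logits, T=5.0)
loss_diff = distillation_loss(np.zeros((1, 3)), teacher_logits, T=5.0)
```

After training at the high temperature, the student would be evaluated with the lower temperature from step 4 by calling `softmax` with a smaller `T`.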
Why does this work?
Model Compression Methods Deep Compression
(Han et al., 2015)
Summary
Model Compression: Why?
Model Compression: Different Approaches
1 Pruning: Selectively removing weights (by saliency for OBD & OBS)
2 Knowledge Distillation: Try to distill the model’s function ϕ(x) directly
3 Deep Compression: Pruning – Quantization – Encoding
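The three Deep Compression stages named in the last item can be sketched end to end on a flat weight vector. This is a rough sketch under assumptions, not the pipeline of Han et al. (2015) as implemented: magnitude thresholding stands in for pruning, a plain 1-D k-means for weight sharing, and Huffman coding is assumed as the encoding stage; all function names and parameters are hypothetical, and no retraining is performed.

```python
import heapq
from collections import Counter
import numpy as np

def prune(w, threshold):
    """Stage 1: zero out weights with magnitude below the threshold."""
    return np.where(np.abs(w) < threshold, 0.0, w)

def quantize(w, n_clusters=4, n_iter=20):
    """Stage 2: 1-D k-means over the nonzero weights.
    Returns the codebook and one cluster index per nonzero weight."""
    nz = w[w != 0]
    centroids = np.linspace(nz.min(), nz.max(), n_clusters)
    for _ in range(n_iter):
        idx = np.argmin(np.abs(nz[:, None] - centroids[None, :]), axis=1)
        for c in range(n_clusters):
            if np.any(idx == c):
                centroids[c] = nz[idx == c].mean()
    idx = np.argmin(np.abs(nz[:, None] - centroids[None, :]), axis=1)
    return centroids, idx

def huffman_code_lengths(symbols):
    """Stage 3: Huffman code length in bits for each distinct symbol."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries: (frequency, unique tie-breaker, {symbol: depth so far})
    heap = [(n, i, {s: 0}) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, d1 = heapq.heappop(heap)
        n2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]

rng = np.random.default_rng(0)
w = rng.normal(size=1000)
w_pruned = prune(w, threshold=1.0)
codebook, idx = quantize(w_pruned, n_clusters=4)
lengths = huffman_code_lengths(idx.tolist())
total_bits = sum(lengths[s] for s in idx.tolist())
```

Storing one small index per surviving weight plus a short codebook, rather than a full-precision float per weight, is where the size reduction comes from in this sketch.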
Summary Resources
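$\frac{\partial C}{\partial z_i} = \frac{1}{T}\,(q_i - p_i) = \frac{1}{T}\left(\frac{e^{z_i/T}}{\sum_j e^{z_j/T}} - \frac{e^{v_i/T}}{\sum_j e^{v_j/T}}\right)$
$\frac{\partial C}{\partial z_i} \approx \frac{1}{T}\left(\frac{1 + z_i/T}{K + \sum_j z_j/T} - \frac{1 + v_i/T}{K + \sum_j v_j/T}\right)$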