

SLIDE 1

LIT: Learned Intermediate representation Training for Model Compression

Animesh Koratana*, Daniel Kang*, Peter Bailis, Matei Zaharia
DAWN Project, Stanford InfoLab

http://dawn.cs.stanford.edu/

SLIDE 2

LIT can compress models up to 4x on CIFAR10: ResNet -> ResNet

This talk: achieving higher compression in modern deep networks

SLIDE 3

Deep networks can be compressed to reduce inference costs

e.g., deep compression, knowledge distillation, FitNets, …


These methods are largely architecture agnostic
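For reference, a minimal sketch of the knowledge-distillation loss (Hinton et al.) that LIT later combines with its IR loss; the temperature T and mixing weight alpha here are illustrative values, not numbers from this talk:

    import torch.nn.functional as F

    def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
        # Match the teacher's temperature-softened output distribution
        # (KL term), mixed with ordinary cross-entropy on the true labels.
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=1),
            F.softmax(teacher_logits / T, dim=1),
            reduction="batchmean",
        ) * (T * T)  # rescale so gradient magnitude is comparable across temperatures
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1 - alpha) * hard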

SLIDE 4

LIT: Learned Intermediate-representation Training for modern, very deep networks

Modern networks have highly repetitive sections – can we compress them?

[Diagram: teacher model ResNet-110 (three sections of 18 residual blocks, then an FC layer) beside student model ResNet-56 (three sections of 9 residual blocks, then an FC layer); panels labeled "IR comparison" and "KD comparison"; a KD loss compares the final outputs]

SLIDE 5

LIT: Learned Intermediate-representation Training for modern, very deep networks

[Diagram: the same teacher ResNet-110 / student ResNet-56 pairing; an IR loss is applied at each of the three section boundaries, plus the KD loss on the final outputs]

LIT penalizes deviations in intermediate representations of architectures with the same width
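A minimal PyTorch-style sketch of that penalty, assuming teacher and student expose intermediate representations (IRs) at matching section boundaries with the same width, so the tensors align shape-wise; MSE is an assumed choice of distance here, not necessarily the paper's exact formulation:

    import torch.nn.functional as F

    def ir_loss(student_irs, teacher_irs):
        # Sum of per-section deviations between student and teacher
        # intermediate representations; equal-width architectures mean
        # each pair of tensors has the same shape.
        return sum(F.mse_loss(s, t) for s, t in zip(student_irs, teacher_irs))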

SLIDE 6

LIT uses the output of the teacher model's previous section as input to the student model's current section
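A sketch of that training-time wiring, assuming teacher and student are split into the same number of sections (lists of nn.Module here; all names are illustrative):

    import torch
    import torch.nn.functional as F

    def lit_section_losses(teacher_sections, student_sections, x):
        # Teacher runs frozen; record its intermediate representations.
        with torch.no_grad():
            teacher_irs, t = [], x
            for section in teacher_sections:
                t = section(t)
                teacher_irs.append(t)
        # Student section i gets the teacher's section i-1 output as input,
        # so each section trains against a clean, teacher-produced input
        # rather than its own accumulated error. This wiring is used during
        # training only; at inference the student runs end to end on its own.
        losses, s_in = [], x
        for i, section in enumerate(student_sections):
            losses.append(F.mse_loss(section(s_in), teacher_irs[i]))
            s_in = teacher_irs[i]
        return sum(losses)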

LIT: Learned Intermediate-representation Training for modern, very deep networks

[Diagram: the same pairing, now showing the training-only connections: each teacher section's output feeds the next student section; IR losses at the section boundaries, KD loss at the final outputs]
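The diagram shows both loss families together; a minimal sketch of how they might combine into one training objective (beta is an assumed mixing weight, not a value from the talk):

    def lit_objective(ir_term, kd_term, beta=0.5):
        # Convex mix of the section-wise IR penalty and the KD loss
        # on final outputs; beta is an assumed hyperparameter.
        return beta * ir_term + (1 - beta) * kd_term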

SLIDE 7

LIT can compress models up to 4x on CIFAR10: ResNet -> ResNet

SLIDE 8

LIT can compress StarGAN up to 1.8x

The student model outperforms the teacher on Inception score and FID

SLIDE 9

LIT can compress GANs up to 1.8x

[Image grid: rows compare the Teacher (18), Student (10), and a model trained from Scratch (10); columns show the Original image and Black hair, Blond hair, Brown hair, Gender, and Age attribute edits]

The student model also outperforms the teacher in qualitative evaluation

SLIDE 10

Conclusions

Neural networks are becoming more expensive to deploy. LIT is a novel technique that combines:

1. matching intermediate representations, and
2. matching outputs,

improving training to give 3-5x compression on many tasks.

ddkang@stanford.edu
koratana@stanford.edu

Find our poster at Pacific Ballroom, #17!