
LIT: Learned Intermediate Representation Training for Model Compression



  1. LIT: Learned Intermediate Representation Training for Model Compression. Animesh Koratana*, Daniel Kang*, Peter Bailis, Matei Zaharia. DAWN Project, Stanford InfoLab. http://dawn.cs.stanford.edu/

  2. LIT can compress models up to 4x on CIFAR10: ResNet -> ResNet. This talk: achieving higher compression on modern deep networks.

  3. Deep networks can be compressed to reduce inference costs, e.g., deep compression, knowledge distillation, FitNets, … These methods are largely architecture agnostic.

  4. LIT: Learned Intermediate Representation Training for modern, very deep networks. [Diagram: teacher model ResNet-110, three groups of 18 residual blocks followed by an FC layer; student model ResNet-56, three groups of 9 residual blocks followed by an FC layer; a KD loss connects the two models' outputs.] Modern networks have highly repetitive sections – can we compress them?
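For reference, the KD loss in the diagram is standard knowledge distillation (Hinton et al.): the student matches the teacher's temperature-softened output distribution. A minimal PyTorch sketch, where the temperature `T` and mixing weight `alpha` are illustrative hyperparameters rather than values from the talk:

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Standard knowledge-distillation loss (Hinton et al.): a
    temperature-softened KL term against the teacher, plus the usual
    cross-entropy against the ground-truth labels."""
    # Soften both output distributions with temperature T.
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    log_soft_student = F.log_softmax(student_logits / T, dim=1)
    # KL divergence, scaled by T^2 to keep gradient magnitudes comparable.
    distill = F.kl_div(log_soft_student, soft_teacher,
                       reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1.0 - alpha) * hard
```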

  5. LIT: Learned Intermediate Representation Training for modern, very deep networks. [Diagram: same teacher/student pairing, now with an IR loss between each matched group of residual blocks in addition to the KD loss on the outputs.] LIT penalizes deviations in intermediate representations of architectures with the same width.
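A minimal sketch of the IR penalty, assuming PyTorch; `student_irs` and `teacher_irs` are hypothetical lists of activations captured at the matched section boundaries, and MSE is one plausible choice of elementwise loss (direct comparison is possible only because the matched sections share the same width):

```python
import torch.nn.functional as F

def ir_loss(student_irs, teacher_irs):
    """Penalize deviations between student and teacher intermediate
    representations at matched section boundaries."""
    assert len(student_irs) == len(teacher_irs)
    total = 0.0
    for s, t in zip(student_irs, teacher_irs):
        # t.detach(): gradients flow only into the student.
        total = total + F.mse_loss(s, t.detach())
    return total
```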

  6. LIT: Learned Intermediate Representation Training for modern, very deep networks (training only). [Diagram: same IR and KD losses as before.] LIT uses the output of the teacher model's previous section as input to the student model's current section.
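The training-time wiring can be sketched as follows, assuming PyTorch and hypothetical `teacher_sections`/`student_sections` lists of modules with matched input and output widths. Each student section receives the teacher's previous-section output, so every student section trains on well-behaved inputs from the start:

```python
import torch

def lit_section_outputs(teacher_sections, student_sections, x):
    """Training-time forward pass sketch: each student section is fed
    the teacher's output from the previous section (the raw input x
    for the first section), not the student's own previous output."""
    student_irs, teacher_irs = [], []
    t_in = x
    for t_sec, s_sec in zip(teacher_sections, student_sections):
        with torch.no_grad():
            t_out = t_sec(t_in)   # teacher IR, no gradients
        s_out = s_sec(t_in)       # student section sees the teacher's input
        teacher_irs.append(t_out)
        student_irs.append(s_out)
        t_in = t_out              # next section's input comes from the teacher
    return student_irs, teacher_irs
```

At inference time the teacher is discarded and the student runs end-to-end on its own activations.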

  7. LIT can compress models up to 4x on CIFAR10: ResNet -> ResNet

  8. LIT can compress StarGAN up to 1.8x. The student model outperforms the teacher in Inception/FID score.

  9. LIT can compress GANs up to 1.8x. [Image grid: Original, Black hair, Blond hair, Brown hair, Gender, and Age translations for Teacher (18), Student (10), and Scratch (10).] The student model also outperforms the teacher in qualitative evaluation.

  10. Conclusions: Neural networks are becoming more expensive to deploy. LIT is a novel technique that combines (1) matching intermediate representations and (2) matching outputs to improve training, giving 3-5x compression for many tasks. Contact: ddkang@stanford.edu, koratana@stanford.edu. Find our poster at Pacific Ballroom, #17!
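Putting the pieces together, one possible training step combining the two terms, reusing the hypothetical `ir_loss`, `kd_loss`, and `lit_section_outputs` sketches above (the `.sections` attribute and the weight `beta` are illustrative, not the talk's interface):

```python
import torch

def lit_training_loss(teacher, student, x, labels, beta=1.0):
    """Combined LIT-style objective sketch: IR matching across matched
    sections plus output matching via knowledge distillation. Assumes
    both models expose .sections (a list of modules) and a full
    end-to-end forward()."""
    # (1) IR term: student sections trained on teacher-provided inputs.
    s_irs, t_irs = lit_section_outputs(teacher.sections, student.sections, x)
    # (2) KD term: full student forward pass matched to the teacher's outputs.
    with torch.no_grad():
        teacher_logits = teacher(x)
    student_logits = student(x)
    return beta * ir_loss(s_irs, t_irs) + kd_loss(
        student_logits, teacher_logits, labels)
```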
