When Ensembling Smaller Models is More Efficient than Single Large Models

  1. When Ensembling Smaller Models is More Efficient than Single Large Models
     WebVision 2020
     Dan Kondratyuk, Mingxing Tan, Matthew Brown, Boqing Gong
     {dankondratyuk,tanmingxing,mtbr,bgong}@google.com

  2. Model Ensembles
     ● Train multiple models and average their predictions during inference
       ○ E.g., train one neural network architecture with different random initializations
     ● Easy method to reduce prediction error
     ● Introduces heavy efficiency penalties
       ○ Most commonly reserved for the largest models
     ● Can small ensembles be efficient?
     [Figure: Input Example → Model 1, Model 2, …, Model N → Aggregation → Prediction]
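The averaging step on this slide can be sketched as follows. This is a minimal illustration only, assuming each ensemble member is a callable that returns class logits and that aggregation means averaging softmax probabilities; the slide does not specify whether logits or probabilities are averaged, and `ensemble_predict` is a hypothetical helper, not the authors' code.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical interface: a "model" maps one input to a list of class logits.
Model = Callable[[Sequence[float]], List[float]]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(models: List[Model], x: Sequence[float]) -> List[float]:
    """Average the softmax probabilities of N models for a single input."""
    if not models:
        raise ValueError("need at least one model")
    probs = [softmax(m(x)) for m in models]  # every member runs on every input
    num_classes = len(probs[0])
    return [sum(p[c] for p in probs) / len(probs) for c in range(num_classes)]


def argmax(values: List[float]) -> int:
    """Index of the largest value (the predicted class)."""
    return max(range(len(values)), key=values.__getitem__)
```

Because every member runs a full forward pass on every input, inference cost grows roughly linearly with N, which is the efficiency penalty the slide refers to.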

  3. Image Classification - Wide ResNet - CIFAR-10
     ● Ensembles can be both more accurate and more efficient
       ○ Each line represents one model architecture
       ○ Each point indicates the number of models ensembled
       ○ As model sizes get larger, the performance gap widens
       ○ Larger ensembles produce diminishing returns and become less efficient
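Comparing an ensemble with a single model "at the same cost" comes down to simple bookkeeping. A minimal sketch, assuming every ensemble repeats one architecture, that its cost is N times the per-member FLOPs (aggregation overhead ignored), and that accuracies are measured elsewhere; `Candidate` and `best_under_budget` are hypothetical names, not part of the original work.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    """One point on an accuracy-vs-FLOPs plot: N copies of one architecture."""
    name: str
    flops_per_model: float  # FLOPs of a single member's forward pass
    num_models: int         # N = 1 means a single model, no ensembling
    accuracy: float         # measured accuracy of the N-model ensemble

    @property
    def total_flops(self) -> float:
        # Assumption: cost scales linearly with N; averaging cost is ignored.
        return self.flops_per_model * self.num_models


def best_under_budget(cands: List[Candidate], budget: float) -> Optional[Candidate]:
    """Return the most accurate candidate whose total FLOPs fit the budget."""
    feasible = [c for c in cands if c.total_flops <= budget]
    return max(feasible, key=lambda c: c.accuracy, default=None)
```

The sketch only does the accounting; the diminishing returns noted on the slide show up as accuracy gains that shrink with each added member while `total_flops` keeps growing linearly.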

  4. Image Classification - EfficientNet - ImageNet
     ● This trend appears for highly optimized models on larger datasets as well
       ○ EfficientNet scales the width, depth, and resolution of each model size
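For reference, the width/depth/resolution scaling mentioned here is EfficientNet's compound scaling rule, taken from the original EfficientNet paper rather than from these slides. A single coefficient φ sets depth d, width w, and input resolution r, and because FLOPs grow roughly as d·w²·r², each unit increase of φ roughly doubles the compute:

```latex
d = \alpha^{\phi}, \qquad w = \beta^{\phi}, \qquad r = \gamma^{\phi},
\qquad \text{s.t. } \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2,
\quad \alpha, \beta, \gamma \ge 1

\text{FLOPs} \propto d \cdot w^{2} \cdot r^{2}
  = \left(\alpha \cdot \beta^{2} \cdot \gamma^{2}\right)^{\phi} \approx 2^{\phi}
```

The EfficientNet paper reports α = 1.2, β = 1.1, γ = 1.15 for its B0 baseline; larger model sizes (B1, B2, …) correspond to larger φ.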

  5. NAS Ensemble - ImageNet
     ● Can we use NAS to generate diverse ensemble architectures?
       ○ Can architecture diversity boost the ratio of accuracy to FLOPs/latency?
       ○ Pareto curve shown for model ensembles searched with NAS
       ○ Surprisingly, a single searched model performs nearly the same as a diverse ensemble
     [Figure: Pareto curve for NAS-searched ensembles; axis: Latency (ms)]
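A Pareto curve like the one on this slide keeps only configurations that no other configuration beats on both latency and accuracy. A minimal sketch of extracting that frontier, assuming each searched model or ensemble has already been measured as a (latency_ms, accuracy) pair; `pareto_frontier` is an illustrative helper, not the authors' NAS code.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (latency_ms, accuracy)


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Return the points not dominated by any other point.

    A point is dominated if another point is at least as fast and at least as
    accurate. Sort by latency ascending (ties: most accurate first), then sweep
    and keep a point only if it beats the accuracy of every faster kept point.
    """
    frontier: List[Point] = []
    best_acc = float("-inf")
    for lat, acc in sorted(points, key=lambda p: (p[0], -p[1])):
        if acc > best_acc:
            frontier.append((lat, acc))
            best_acc = acc
    return frontier
```

Running this separately over single searched models and over diverse ensembles, then overlaying the two frontiers, is one way to reproduce the kind of comparison the slide describes.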

  6. Conclusion
     ● Ensembles of smaller models can be more accurate and more efficient than single large models, especially as model size grows
       ○ One can use ensembles as a more flexible trade-off between a model's inference speed and accuracy
       ○ Ensembles can be easily distributed across multiple workers, further increasing efficiency
     ● A single searched model using NAS can find a well-optimized architecture for ensembling
       ○ However, ensembling diverse architectures from a search on multiple models performs nearly the same as ensembling one model architecture from the search
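Ensembles distribute easily because their members are independent: each can run on its own worker, and only the outputs need to be gathered and averaged. A minimal sketch, assuming members are plain Python callables that return class-probability vectors and using a thread pool as a stand-in for separate workers; `distributed_ensemble` is hypothetical, and a real deployment would place members on separate accelerators or processes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

# Hypothetical member interface: one input -> class-probability vector.
Member = Callable[[Sequence[float]], List[float]]


def distributed_ensemble(members: List[Member], x: Sequence[float]) -> List[float]:
    """Evaluate each member on its own worker, then average the probabilities."""
    if not members:
        raise ValueError("need at least one member")
    with ThreadPoolExecutor(max_workers=len(members)) as pool:
        # Members share no state, so they can all be evaluated concurrently.
        outputs = list(pool.map(lambda m: m(x), members))
    n = len(outputs)
    # Column-wise mean over members: one averaged probability per class.
    return [sum(col) / n for col in zip(*outputs)]
```

With one member per worker, wall-clock latency is bounded roughly by the slowest member plus the gather step, rather than the sum over all members. Python threads are only an illustration here; CPU-bound members would need processes or separate devices to actually run in parallel.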
