  1. Parameter efficient training of deep convolutional neural networks by dynamic sparse reparameterization
     Hesham Mostafa (Intel AI), Xin Wang (Intel AI, Cerebras Systems)

  2. Easy: post-training (sparse) compression. Hard: direct training of sparse networks.
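     A minimal PyTorch sketch of the "easy" route, post-training magnitude compression (illustrative assumptions: `model` is an already-trained network and a single global threshold is used; this is not the slides' code):

         import torch

         def magnitude_prune(model, sparsity=0.9):
             """Zero the globally smallest-magnitude weights so that a
             `sparsity` fraction of all weight entries is removed."""
             weights = [p for name, p in model.named_parameters() if "weight" in name]
             mags = torch.cat([w.detach().abs().flatten() for w in weights])
             k = max(1, int(sparsity * mags.numel()))
             threshold = mags.kthvalue(k).values          # k-th smallest magnitude
             with torch.no_grad():
                 for w in weights:
                     w.mul_((w.abs() > threshold).float())  # keep only large weights

     In practice the surviving weights are then fine-tuned, as in the gradual pruning schedule of Zhu & Gupta (2017) cited later in the deck.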

  3. “Winning lottery tickets” (Frankle & Carbin, 2018): post hoc identification of trainable sparse nets
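     The lottery-ticket recipe in a minimal PyTorch sketch (a loose illustration: `train`, per-layer pruning, and one-shot rather than iterative pruning are assumptions, not Frankle & Carbin's exact procedure): train dense, prune by magnitude, rewind the survivors to their initial values, retrain the sparse subnetwork.

         import copy
         import torch

         def find_winning_ticket(model, train, sparsity=0.9):
             init_state = copy.deepcopy(model.state_dict())  # remember initialization
             train(model)                                    # 1) train the dense net
             masks = {}
             for name, p in model.named_parameters():        # 2) prune each layer
                 if "weight" in name:
                     k = max(1, int(sparsity * p.numel()))
                     thresh = p.detach().abs().flatten().kthvalue(k).values
                     masks[name] = (p.detach().abs() > thresh).float()
             model.load_state_dict(init_state)               # 3) rewind to init
             with torch.no_grad():
                 for name, p in model.named_parameters():
                     if name in masks:
                         p.mul_(masks[name])                 # apply the sparse mask
             train(model)                                    # 4) retrain the ticket;
             return model, masks                             #    masks must be re-applied
                                                             #    after every update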

  4. Dynamic sparse reparameterization (ours): training-time structural exploration

  5. Can sparse nets be trained directly to generalize as well as post-training compression? YES.
     Are directly trained sparse nets “winning lottery tickets”? NO.

  6. Dynamic sparse reparameterization: prune, then grow

     for each sparse parameter tensor W_i do
         (W_i, k_i) ← prune_by_threshold(W_i, H)    ◃ k_i is the number of pruned weights
         l_i ← number_of_nonzero_entries(W_i)       ◃ number of surviving weights after pruning
     end for
     (K, L) ← (Σ_i k_i, Σ_i l_i)                    ◃ total numbers of pruned and surviving weights
     H ← adjust_pruning_threshold(H, K, δ)          ◃ adjust the pruning threshold
     for each sparse parameter tensor W_i do
         W_i ← grow_back(W_i, (l_i / L) · K)        ◃ grow (l_i / L) · K zero-initialized weights at random in W_i
     end for
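     A minimal PyTorch sketch of one such prune/grow step, assuming each layer keeps a dense tensor plus a binary mask; `Np` (target pruned count per step) and the double/halve threshold update follow the paper's description, but the code is an illustrative reconstruction, not the authors' implementation:

         import torch

         def reparameterize(weights, masks, H, Np, delta):
             """One prune/grow step over parallel lists of weight tensors and
             binary masks. H: adaptive pruning threshold; Np: target number of
             weights to prune per step; delta: fractional tolerance on Np."""
             pruned, surviving = [], []
             for W, M in zip(weights, masks):                # prune_by_threshold
                 keep = (W.abs() >= H) & M.bool()
                 pruned.append(int(M.sum()) - int(keep.sum()))   # k_i
                 M.copy_(keep.float())
                 W.mul_(M)                                   # zero the pruned weights
                 surviving.append(int(keep.sum()))           # l_i
             K, L = sum(pruned), sum(surviving)
             if K < (1 - delta) * Np:                        # adjust_pruning_threshold:
                 H *= 2.0                                    # pruned too few -> raise H
             elif K > (1 + delta) * Np:
                 H /= 2.0                                    # pruned too many -> lower H
             for M, l in zip(masks, surviving):              # grow_back (l_i / L) * K
                 n_grow = int(round(l / L * K))
                 zeros = (M.view(-1) == 0).nonzero().flatten()
                 pick = zeros[torch.randperm(zeros.numel())[:n_grow]]
                 M.view(-1)[pick] = 1.0                      # reactivated weights start
             return H                                        # at zero and train from there

     Between reparameterization steps, which run periodically during training, gradients are masked so that only active weights are updated, keeping the parameter footprint sparse throughout.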

  7. Closed gap between post-training compression and direct training of sparse nets

     [Figure: WRN-28-2 on CIFAR10 — test accuracy (%) vs. number of parameters (161K–741K, i.e. global sparsity 0.9–0.5) for Thin dense, Static sparse, DeepR, SET, and Dynamic sparse, with Full dense and Compressed sparse shown for reference]

     ResNet-50 on ImageNet — top-1 / top-5 accuracy (%), [deviation from full dense]:

     Sparsity (# params)                     0.8 (7.3M)                  0.9 (5.1M)
     Thin dense                              72.4 / 90.9 [-2.5 / -1.5]   70.7 / 89.9 [-4.2 / -2.5]
     Static sparse                           71.6 / 90.4 [-3.3 / -2.0]   67.8 / 88.4 [-7.1 / -4.0]
     DeepR (Bellec et al., 2017)             71.7 / 90.6 [-3.2 / -1.8]   70.2 / 90.0 [-4.7 / -2.4]
     SET (Mocanu et al., 2018)               72.6 / 91.2 [-2.3 / -1.2]   70.4 / 90.1 [-4.5 / -2.3]
     Dynamic sparse (ours)                   73.3 / 92.4 [-1.6 /  0.0]   71.6 / 90.5 [-3.3 / -1.9]
     Compressed sparse (Zhu & Gupta, 2017)   73.2 / 91.5 [-1.7 / -0.9]   70.3 / 90.0 [-4.6 / -2.4]
     Full dense                              74.9 / 92.4 at sparsity 0.0 (25.6M params)

  8. Directly trained sparse nets are not “winning tickets”: exploration of structural degrees of freedom is crucial

  9. Visit our poster: Wednesday, Pacific Ballroom #248
