
Lightweight Neural Networks from PCA & LDA Based Distilled Dense Neural Networks



  1. Lightweight Neural Networks from PCA & LDA Based Distilled Dense Neural Networks
     ICIP 2020
     M.E.A. Seddik (1, 2, ∗), H. Essafi (1), A. Benzine (1, 3), M. Tamaazousti (1)
     (1) CEA List, France; (2) CentraleSupélec, L2S, France; (3) Sorbonne University, CNRS, France
     ∗ http://melaseddik.github.io/
     August 21, 2020

  2. Abstract
     Context:
     ◮ Compression of dense neural networks with the teacher-student approach.
     Motivation:
     ◮ Build lightweight neural networks that fit on edge and IoT devices with limited resources (memory and computation).
     Proposed methods:
     ◮ We propose two methods that rely on dimension reduction techniques (PCA and LDA).
     ◮ Dimension reduction is applied at each layer of the teacher network, and the reduced representations are mapped to the corresponding layers of the student network through a multi-task loss function.

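  3. Setting
     Given a Teacher Network (TN) trained on a dataset $\mathcal{D}$ with loss $\mathcal{L}_{TN}$:
     $$(\text{TN}):\quad h^{(0)} = x \in \mathbb{R}^{p_0}, \qquad h^{(\ell)} = f_\ell\big(W^{(\ell)} h^{(\ell-1)} + b^{(\ell)}\big) \in \mathbb{R}^{p_\ell} \quad \forall \ell \in [L]$$
     Construct a Student Network (SN) to train on $\mathcal{D}$:
     $$(\text{SN}):\quad \tilde h^{(0)} = x \in \mathbb{R}^{p_0}, \qquad \tilde h^{(\ell)} = f_\ell\big(\tilde W^{(\ell)} \tilde h^{(\ell-1)} + \tilde b^{(\ell)}\big) \in \mathbb{R}^{k_\ell} \quad \forall \ell \in [L]$$
     such that $k_\ell \ll p_\ell$ and Performance(SN) ≈ Performance(TN).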
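A minimal NumPy sketch of the two forward passes, to make the widths $p_\ell$ and $k_\ell$ concrete. It is an illustration under stated assumptions, not the authors' code: the activation choice (ReLU on hidden layers, identity on the output), the scaled Gaussian initialisation, and the input size $p_0 = 784$ are assumptions; the layer widths follow the architecture table on slide 5 with $k = 50$.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def init_dense(dims, rng):
    """One (W, b) pair per layer; W^(l) has shape (dims[l], dims[l-1])."""
    return [(rng.standard_normal((d_out, d_in)) / np.sqrt(d_in), np.zeros(d_out))
            for d_in, d_out in zip(dims[:-1], dims[1:])]

def forward(params, x):
    """Return [h^(1), ..., h^(L)] with h^(l) = f_l(W^(l) h^(l-1) + b^(l))."""
    hs, h = [], x
    for i, (W, b) in enumerate(params):
        z = W @ h + b
        # Assumption: f_l = ReLU on hidden layers, identity (logits) on the last one.
        h = relu(z) if i < len(params) - 1 else z
        hs.append(h)
    return hs

rng = np.random.default_rng(0)
P0 = 784                                             # assumed input size p_0
teacher = init_dense([P0, 1024, 512, 256, 10], rng)  # widths p_l
student = init_dense([P0, 50, 50, 50, 10], rng)      # widths k_l, with k = 50

x = rng.standard_normal(P0)
print([h.shape[0] for h in forward(teacher, x)])  # [1024, 512, 256, 10]
print([h.shape[0] for h in forward(student, x)])  # [50, 50, 50, 10]
```

The student keeps the teacher's depth and activations; only the hidden widths shrink from $p_\ell$ to $k_\ell$, while the output layer keeps the 10 class logits.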

  4. Proposed Methods (Net-PCAD & Net-LDAD)
     Given (TN), a data matrix $X$, and the (TN) loss function $\mathcal{L}_{TN}$, for each layer $\ell$:
     1. Extract the representations $H_\ell$ of $X$ from (TN).
     2. Compute a projection matrix $U_\ell \in \mathbb{R}^{p_\ell \times k_\ell}$ through PCA or LDA on $H_\ell$.
     Train (SN) as a multi-task¹ problem with
     $$\mathcal{L}_{SN} = \underbrace{e^{-\sigma}\,\mathcal{L}_{TN} + \sigma}_{\text{Learning Task}} + \underbrace{\sum_{\ell=1}^{L-1} \Big( e^{-\sigma_\ell}\,\mathcal{L}_{mse}\big(U_\ell^{\top} h^{(\ell)},\, \tilde h^{(\ell)}\big) + \sigma_\ell \Big)}_{\text{(SN) Hidden Layers Task}}$$
     where $\sigma$ and $\{\sigma_\ell\}_{\ell=1}^{L-1}$ are learnable parameters.
     ¹ Using the homoscedastic-uncertainty loss weighting of A. Kendall et al., “Multi-task learning using uncertainty to weigh losses for scene geometry and semantics,” in Proceedings of IEEE CVPR, 2018.
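The per-layer projection and the multi-task loss can be sketched in NumPy as follows. This is a hedged illustration, not the authors' implementation: `pca_projection`, `lda_projection`, `mse` and `multitask_loss` are hypothetical helpers, the LDA variant is classical Fisher LDA solved as a generalized eigenproblem with a small ridge term `reg` (an assumption), and $\mathcal{L}_{mse}$ is taken to be the mean squared error over all entries.

```python
import numpy as np

def pca_projection(H, k):
    """U_l in R^{p x k}: top-k principal directions of H (n samples x p features)."""
    Hc = H - H.mean(axis=0)
    # Right singular vectors of the centred data are the covariance eigenvectors,
    # returned in order of decreasing singular value.
    _, _, Vt = np.linalg.svd(Hc, full_matrices=False)
    return Vt[:k].T

def lda_projection(H, y, k, reg=1e-3):
    """U_l in R^{p x k}: leading solutions of S_b u = lambda S_w u (Fisher LDA).

    Classical LDA has at most (n_classes - 1) non-zero eigenvalues; columns beyond
    that span directions with zero between-class spread.
    """
    p = H.shape[1]
    mu = H.mean(axis=0)
    Sw = np.zeros((p, p))
    Sb = np.zeros((p, p))
    for c in np.unique(y):
        Hc = H[y == c]
        mc = Hc.mean(axis=0)
        Sw += (Hc - mc).T @ (Hc - mc)               # within-class scatter
        Sb += len(Hc) * np.outer(mc - mu, mc - mu)  # between-class scatter
    Sw += reg * np.eye(p)                           # assumed ridge term, keeps S_w invertible
    Linv = np.linalg.inv(np.linalg.cholesky(Sw))    # S_w = L L^T
    # Reduce to a symmetric problem: (L^-1 S_b L^-T) v = lambda v, then u = L^-T v.
    evals, V = np.linalg.eigh(Linv @ Sb @ Linv.T)
    top = np.argsort(evals)[::-1][:k]
    return Linv.T @ V[:, top]

def mse(a, b):
    return float(np.mean((a - b) ** 2))

def multitask_loss(loss_tn, hidden_mse, sigma, sigmas):
    """e^{-sigma} L_TN + sigma + sum_l (e^{-sigma_l} L_mse^l + sigma_l)."""
    total = np.exp(-sigma) * loss_tn + sigma
    for l_mse, s in zip(hidden_mse, sigmas):
        total += np.exp(-s) * l_mse + s
    return float(total)

# Toy usage: random stand-ins for one teacher layer's features H_l and labels.
rng = np.random.default_rng(0)
H = rng.standard_normal((500, 64))          # n x p_l
y = rng.integers(0, 10, size=500)
U_pca = pca_projection(H, 16)
U_lda = lda_projection(H, y, 16)
targets = H @ U_pca                         # row i is U_l^T h^(l) for sample i
H_student = rng.standard_normal((500, 16))  # placeholder for the student's h~^(l)
loss = multitask_loss(2.3, [mse(targets, H_student)], sigma=0.0, sigmas=[0.0])
print(U_pca.shape, U_lda.shape, targets.shape)  # (64, 16) (64, 16) (500, 16)
```

In training, $\sigma$ and the $\sigma_\ell$ would be optimised jointly with the student weights: a large $\sigma_\ell$ down-weights its hidden-layer term through $e^{-\sigma_\ell}$, while the additive $\sigma_\ell$ penalises letting it grow without bound.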

  5. Experimental Setting & Results
     Table: Network architectures (weight-matrix shapes).
     Layer     (TN)          (SN)
     Dense 1   p_0 × 1024    p_0 × k
     Dense 2   1024 × 512    k × k
     Dense 3   512 × 256     k × k
     Dense 4   256 × 10      k × 10

     Table: Network performances (time / accuracy).
     Dataset   (TN)          (SN) k = 50    (SN) k = 100   (SN) k = 200
     MNIST     2.23s / 98%   0.38s / 97%    0.45s / 97.5%  0.65s / 97.8%
     FASHION   2.23s / 88%   0.38s / 87.5%  0.45s / 88.5%  0.65s / 88.5%
     CIFAR10   4.63s / 45%   0.75s / 50%    0.92s / 50.1%  1.35s / 50.3%

     ⇒ $k_\ell \ll p_\ell$ and Performance(SN) ≈ Performance(TN).
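The architecture table translates into parameter counts with simple arithmetic. The sketch below counts weights plus biases for each dense layer; `dense_params` is a hypothetical helper, and the input size $p_0 = 784$ (a flattened 28×28 MNIST or Fashion-MNIST image) is an assumption, since the slides leave $p_0$ symbolic.

```python
def dense_params(dims):
    """Weights + biases of a stack of dense layers dims[0] -> dims[1] -> ..."""
    return sum(d_in * d_out + d_out for d_in, d_out in zip(dims[:-1], dims[1:]))

P0 = 784  # assumption: flattened 28x28 grayscale input
teacher = dense_params([P0, 1024, 512, 256, 10])
print(f"TN: {teacher} params")
for k in (50, 100, 200):
    student = dense_params([P0, k, k, k, 10])
    print(f"SN k={k}: {student} params, {teacher / student:.1f}x fewer")

# Output:
# TN: 1462538 params
# SN k=50: 44860 params, 32.6x fewer
# SN k=100: 99710 params, 14.7x fewer
# SN k=200: 239410 params, 6.1x fewer
```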
