Random Matrix Theory Proves that Deep Learning Representations of GAN-data Behave as Gaussian Mixtures (ICML 2020)


1. Random Matrix Theory Proves that Deep Learning Representations of GAN-data Behave as Gaussian Mixtures. ICML 2020. M. E. A. Seddik (1, 2, *), C. Louart (1, 3), M. Tamaazousti (1), R. Couillet (2, 3). (1) CEA List, France. (2) CentraleSupélec, L2S, France. (3) GIPSA Lab, Grenoble-Alpes University, France. (*) http://melaseddik.github.io/. June 8, 2020.

2. Abstract.
Context:
◮ Study of large Gram matrices of concentrated data.
Motivation:
◮ Gram matrices are at the core of various ML algorithms.
◮ RMT predicts their performance under Gaussian assumptions on the data.
◮ But real data are unlikely to be close to Gaussian vectors.
Results:
◮ GAN data (≈ real data) fall within the class of concentrated vectors.
◮ Universality result: only the first and second order statistics of concentrated data matter to describe the behavior of their Gram matrices.

3. Concentrated Vectors. Notion of Concentrated Vectors.
Definition (Concentrated Vectors). Given a normed space $(E, \|\cdot\|_E)$ and $q \in \mathbb{R}$, a random vector $Z \in E$ is $q$-exponentially concentrated if for any 1-Lipschitz function $\mathcal{F}: E \to \mathbb{R}$ there exist $C, c > 0$ such that
$$\forall t > 0, \quad \mathbb{P}\big\{ |\mathcal{F}(Z) - \mathbb{E}\,\mathcal{F}(Z)| \geq t \big\} \leq C e^{-(t/c)^q}, \quad \text{denoted } Z \in \mathcal{E}_q(c).$$
If $c$ is independent of $\dim(E)$, we write $Z \in \mathcal{E}_q(1)$.
Concentrated vectors enjoy:
(P1) If $X \sim \mathcal{N}(0, I_p)$ then $X \in \mathcal{E}_2(1)$: "Gaussian vectors are concentrated vectors."
(P2) If $X \in \mathcal{E}_q(1)$ and $\mathcal{G}$ is a $\lambda_{\mathcal{G}}$-Lipschitz map, then $\mathcal{G}(X) \in \mathcal{E}_q(\lambda_{\mathcal{G}})$: "Concentrated vectors are stable through Lipschitz maps."
Reminder: $\mathcal{F}: E \to F$ is $\lambda_{\mathcal{F}}$-Lipschitz if $\forall (x, y) \in E^2$: $\|\mathcal{F}(x) - \mathcal{F}(y)\|_F \leq \lambda_{\mathcal{F}} \|x - y\|_E$.
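A quick Monte-Carlo illustration of (P1), as a minimal NumPy sketch (not from the paper; the functional and sample sizes are illustrative): the deviation probability of the 1-Lipschitz functional $\mathcal{F}(z) = \|z\|$ of a standard Gaussian vector stays of the same order as the dimension grows, reflecting a concentration constant independent of $\dim(E)$.

```python
import numpy as np

rng = np.random.default_rng(0)

def tail_probability(dim, t, n_samples=10_000):
    """Monte-Carlo estimate of P{|F(Z) - E F(Z)| >= t} for the
    1-Lipschitz functional F(z) = ||z|| of Z ~ N(0, I_dim)."""
    Z = rng.standard_normal((n_samples, dim))
    F = np.linalg.norm(Z, axis=1)  # the Euclidean norm is 1-Lipschitz
    return np.mean(np.abs(F - F.mean()) >= t)

# The tail at a fixed deviation t does not grow with the dimension,
# illustrating (P1): Gaussian vectors are 2-exponentially concentrated
# with a constant independent of dim(E).
for dim in (10, 100, 1000):
    print(dim, tail_probability(dim, t=1.5))
```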

4. GAN Data: An Example of Concentrated Vectors. Why Concentrated Vectors?
Figure: Images artificially generated using the BigGAN model [Brock et al., ICLR'19].
$$\text{Real Data} \approx \text{GAN Data} = \underbrace{\mathcal{F}_L \circ \mathcal{F}_{L-1} \circ \cdots \circ \mathcal{F}_1}_{\mathcal{G}}(\text{Gaussian})$$
where the $\mathcal{F}_i$'s correspond to fully connected layers, convolutional layers, sub-sampling, pooling and activation functions, residual connections, or batch normalisation.
⇒ The $\mathcal{F}_i$'s are essentially Lipschitz operations.
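To make this concrete, here is a toy NumPy sketch (an illustrative architecture, not BigGAN): a composition of affine maps and ReLUs applied to Gaussian noise produces outputs whose 1-Lipschitz statistics concentrate, per (P1) and (P2).

```python
import numpy as np

rng = np.random.default_rng(1)

# A toy "generator": a composition of affine maps and 1-Lipschitz ReLU
# activations, hence a Lipschitz map G, applied to Gaussian noise.
# All dimensions here are illustrative, not the paper's setup.
dims = [64, 128, 128, 256]
weights = [rng.standard_normal((dims[i + 1], dims[i])) / np.sqrt(dims[i])
           for i in range(len(dims) - 1)]

def generator(z):
    for W in weights:
        z = np.maximum(W @ z, 0.0)  # affine map followed by ReLU
    return z

# G(Z) with Z Gaussian is concentrated by (P1) + (P2): a 1-Lipschitz
# statistic of the output fluctuates little around its mean.
samples = np.array([np.linalg.norm(generator(rng.standard_normal(dims[0])))
                    for _ in range(5000)])
print("mean:", samples.mean(), "std:", samples.std())
```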

5. GAN Data: An Example of Concentrated Vectors. Why Concentrated Vectors?
◮ Fully connected layers and convolutional layers are affine operations: $\mathcal{F}_i(x) = W_i x + b_i$, with $\|\mathcal{F}_i\|_{\mathrm{lip}} = \sup_{u \neq 0} \frac{\|W_i u\|_p}{\|u\|_p}$ for any $p$-norm.
◮ Pooling layers and activation functions are 1-Lipschitz operations with respect to any $p$-norm (e.g., ReLU and max-pooling).
◮ Residual connections: $\mathcal{F}_i(x) = x + \mathcal{F}_i^{(\ell)} \circ \cdots \circ \mathcal{F}_i^{(1)}(x)$, where the $\mathcal{F}_i^{(j)}$'s are Lipschitz operations; thus $\mathcal{F}_i$ is a Lipschitz operation with Lipschitz constant bounded by $1 + \prod_{j=1}^{\ell} \|\mathcal{F}_i^{(j)}\|_{\mathrm{lip}}$.
◮ ...
By (P1) and (P2): GAN data are concentrated vectors by design.
Remark: we still need to control $\lambda_{\mathcal{G}}$.
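As a quick illustration (a NumPy sketch with made-up layer sizes, not the paper's code): for the 2-norm, the Lipschitz constant of an affine layer is its largest singular value, and the residual bound above can be evaluated directly.

```python
import numpy as np

rng = np.random.default_rng(2)

def lipschitz_affine(W):
    """Lipschitz constant of x -> W x + b for the 2-norm:
    the largest singular value sigma_1(W)."""
    return np.linalg.svd(W, compute_uv=False)[0]

# Residual branch F_i(x) = x + F^(l) o ... o F^(1)(x): the identity
# contributes 1; the branch contributes at most the product of its
# layers' Lipschitz constants (the ReLUs in between are 1-Lipschitz).
branch = [rng.standard_normal((100, 100)) / 10 for _ in range(3)]
branch_constants = [lipschitz_affine(W) for W in branch]
residual_bound = 1.0 + np.prod(branch_constants)
print("bound on ||F_i||_lip:", residual_bound)
```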

6. GAN Data: An Example of Concentrated Vectors. Control of $\lambda_{\mathcal{G}}$ with Spectral Normalization.
Let $\sigma_* > 0$ and let $\mathcal{G}$ be a neural network composed of $N$ affine layers, each of input dimension $d_{i-1}$ and output dimension $d_i$ for $i \in [N]$, with 1-Lipschitz activation functions. Consider the following dynamics with learning rate $\eta$:
$$W \leftarrow W - \eta E, \quad \text{with } E_{i,j} \sim \mathcal{N}(0, 1),$$
$$W \leftarrow W - \max(0, \sigma_1(W) - \sigma_*)\, u_1(W)\, v_1(W)^\top.$$
The Lipschitz constant of $\mathcal{G}$ is bounded at convergence with high probability as
$$\lambda_{\mathcal{G}} \leq \varepsilon + \prod_{i=1}^{N} \sqrt{\sigma_*^2 + \eta^2 d_i d_{i-1}}.$$
Figure: Largest singular value $\sigma_1$ across iterations, without spectral normalization and with spectral normalization for $\sigma_* \in \{2, 3, 4\}$, against the theoretical bound. Parameters: $N = 1$, $d_0 = d_1 = 100$ and $\eta = 1/d_0$.
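Here is a small NumPy simulation of these dynamics in the figure's single-layer setting ($N = 1$, $d_0 = d_1 = 100$, $\eta = 1/d_0$); as on the slide, the gradient step is modelled as pure Gaussian noise, and the final $\sigma_1$ is compared against the bound.

```python
import numpy as np

rng = np.random.default_rng(3)
d0 = d1 = 100
eta = 1.0 / d0
sigma_star = 2.0
W = rng.standard_normal((d1, d0)) / np.sqrt(d0)

history = []
for _ in range(1000):
    # Gradient step modelled as additive Gaussian noise (as on the slide).
    W -= eta * rng.standard_normal((d1, d0))
    # Spectral-normalization step: shrink the top singular direction
    # whenever sigma_1(W) exceeds the target sigma_star.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    W -= max(0.0, s[0] - sigma_star) * np.outer(U[:, 0], Vt[0, :])
    history.append(np.linalg.svd(W, compute_uv=False)[0])

bound = np.sqrt(sigma_star**2 + eta**2 * d0 * d1)  # the slide's bound for N = 1
print("final sigma_1:", history[-1], "theoretical bound:", bound)
```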

7. GAN Data: An Example of Concentrated Vectors. Model & Assumptions.
(A1) Data matrix, distributed in $k$ classes $\mathcal{C}_1, \mathcal{C}_2, \ldots, \mathcal{C}_k$:
$$X = \big[ \underbrace{x_1, \ldots, x_{n_1}}_{\in\, \mathcal{E}_{q_1}(1)},\ \underbrace{x_{n_1+1}, \ldots, x_{n_1+n_2}}_{\in\, \mathcal{E}_{q_2}(1)},\ \ldots,\ \underbrace{x_{n-n_k+1}, \ldots, x_n}_{\in\, \mathcal{E}_{q_k}(1)} \big] \in \mathbb{R}^{p \times n}$$
Model statistics: $\mu_\ell = \mathbb{E}_{x_i \in \mathcal{C}_\ell}[x_i]$ and $C_\ell = \mathbb{E}_{x_i \in \mathcal{C}_\ell}[x_i x_i^\top]$.
(A2) Growth rate assumptions: as $p \to \infty$,
1. $p/n \to c \in (0, \infty)$.
2. The number of classes $k$ is bounded.
3. For any $\ell \in [k]$, $\|\mu_\ell\| = O(\sqrt{p})$.
Gram matrix and its resolvent:
$$G = \frac{1}{p} X^\top X, \qquad Q(z) = (G + z I_n)^{-1},$$
$$m_L(z) = \frac{1}{n} \operatorname{tr}\big(Q(-z)\big), \qquad U U^\top = -\frac{1}{2\pi i} \oint_\gamma Q(-z)\, dz,$$
where $\gamma$ is a contour enclosing the eigenvalues of interest and $U$ the matrix of associated eigenvectors.
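For concreteness, a minimal NumPy sketch instantiating the objects above (the two-class isotropic setup and the dimensions are illustrative, not the paper's experiment): the data matrix $X$, the Gram matrix $G$, its resolvent $Q(z)$, and the normalized resolvent trace underlying $m_L$.

```python
import numpy as np

rng = np.random.default_rng(4)
p, n, k = 200, 400, 2

# Two-class mixture with ||mu_l|| = O(sqrt(p)), as in Assumption (A2).
mus = [np.sqrt(p) * e / np.linalg.norm(e)
       for e in (rng.standard_normal(p), rng.standard_normal(p))]
X = np.concatenate(
    [rng.standard_normal((p, n // k)) + mu[:, None] for mu in mus], axis=1)

G = X.T @ X / p                        # Gram matrix, n x n
z = 1.0
Q = np.linalg.inv(G + z * np.eye(n))   # resolvent Q(z) = (G + z I_n)^{-1}
print("normalized resolvent trace:", np.trace(Q) / n)
```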

8. Behavior of the Gram Matrix for Concentrated Vectors. Main Result.
Theorem. Under Assumptions (A1) and (A2), we have $Q(z) \in \mathcal{E}_q(p^{-1/2})$. Furthermore,
$$\big\| \mathbb{E}[Q(z)] - \tilde{Q}(z) \big\| = O\Big(\sqrt{\tfrac{\log p}{p}}\Big), \qquad \tilde{Q}(z) = \frac{1}{z}\,\Lambda(z) + \frac{1}{pz}\, J\, \Omega(z)\, J^\top,$$
where $J \in \mathbb{R}^{n \times k}$ is the matrix of class-membership indicator vectors,
$$\Lambda(z) = \operatorname{diag}\Big\{ \frac{\mathbf{1}_{n_\ell}}{1 + \delta_\ell(z)} \Big\}_{\ell=1}^{k}, \qquad \Omega(z) = \operatorname{diag}\big\{ \mu_\ell^\top \tilde{R}(z)\, \mu_\ell \big\}_{\ell=1}^{k},$$
$$\tilde{R}(z) = \Big( \frac{1}{k} \sum_{\ell=1}^{k} \frac{C_\ell}{1 + \delta_\ell(z)} + z I_p \Big)^{-1},$$
and $\delta(z) = [\delta_1(z), \ldots, \delta_k(z)]$ is the unique fixed point of the system of equations
$$\delta_\ell(z) = \frac{1}{p} \operatorname{tr}\Big( C_\ell \Big( \frac{1}{k} \sum_{j=1}^{k} \frac{C_j}{1 + \delta_j(z)} + z I_p \Big)^{-1} \Big) \quad \text{for each } \ell \in [k].$$
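The deterministic equivalent is computable in practice: $\delta(z)$ can be obtained by plain fixed-point iteration. Below is a small NumPy sketch of that iteration; the trace normalization follows the reconstruction above and should be checked against the paper's exact statement.

```python
import numpy as np

def delta_fixed_point(Cs, z, tol=1e-10, max_iter=1000):
    """Solve the slide's fixed-point system for delta(z) by fixed-point
    iteration. Cs is the list of class second-moment matrices C_l."""
    k, p = len(Cs), Cs[0].shape[0]
    delta = np.zeros(k)
    for _ in range(max_iter):
        # R_tilde(z) with the current delta plugged in.
        R = np.linalg.inv(
            sum(C / (1 + d) for C, d in zip(Cs, delta)) / k + z * np.eye(p))
        new = np.array([np.trace(C @ R) / p for C in Cs])
        if np.max(np.abs(new - delta)) < tol:
            break
        delta = new
    return delta

# Example: two classes with isotropic and rank-one-spiked second moments.
p = 100
Cs = [np.eye(p), np.eye(p) + np.outer(np.ones(p), np.ones(p)) / p]
print(delta_fixed_point(Cs, z=1.0))
```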

9. Behavior of the Gram Matrix for Concentrated Vectors. Main Result (continued).
Same theorem as on the previous slide, with the key observation: only first and second order statistics matter!
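To see the key observation in action, here is a minimal NumPy sketch (an illustrative check, not the paper's experiment): it builds concentrated data as the image of Gaussian noise under a Lipschitz map (a random ReLU network), builds a Gaussian surrogate with the same empirical mean and covariance, and compares the spectra of the two Gram matrices, which the theorem predicts to be asymptotically close.

```python
import numpy as np

rng = np.random.default_rng(5)
p, n = 256, 512

# "GAN-like" concentrated data: a fixed Lipschitz map of Gaussian noise.
W1 = rng.standard_normal((p, p)) / np.sqrt(p)
W2 = rng.standard_normal((p, p)) / np.sqrt(p)
X = W2 @ np.maximum(W1 @ rng.standard_normal((p, n)), 0.0)

# Gaussian surrogate with the same first and second order statistics.
mu, C = X.mean(axis=1), np.cov(X)
L = np.linalg.cholesky(C + 1e-8 * np.eye(p))
Y = mu[:, None] + L @ rng.standard_normal((p, n))

# Universality: the two Gram matrices should have close spectra.
eig_X = np.linalg.eigvalsh(X.T @ X / p)
eig_Y = np.linalg.eigvalsh(Y.T @ Y / p)
print("largest eigenvalues:", eig_X[-1], eig_Y[-1])
```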

10. Application to CNN Representations of GAN Images.
Figure: Pipeline diagram. The generator (a Lipschitz operation) maps Gaussian noise to images, which the discriminator classifies as real/fake; a representation network (also a Lipschitz operation) maps the images to concentrated vectors.
◮ CNN representations correspond to the penultimate layer.
◮ Popular architectures considered in practice are: ResNet, VGG, DenseNet.

11. Application to CNN Representations of GAN Images.
Figure: GAN images vs. real images; k = 3 classes, n = 3000 images.

12. Application to CNN Representations of GAN Images. Figure: GAN images vs. real images (continued).

13. Application to CNN Representations of GAN Images. Figure: GAN images vs. real images (continued).

14. Application to CNN Representations of GAN Images. Figure: GAN images vs. real images (continued).

15. Application to CNN Representations of GAN Images. Performance of a linear SVM classifier.
Figure: Classification performance on GAN images.

16. Application to CNN Representations of GAN Images. Performance of a linear SVM classifier.
Figure: Classification performance on real images.
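A toy version of the experiment behind these two slides (a NumPy/scikit-learn sketch with synthetic "representations" standing in for actual CNN features of GAN and real images): per the universality result, a linear SVM scores nearly the same on concentrated data and on a Gaussian mixture matching its first two moments.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
p, n_c = 128, 1000
W = rng.standard_normal((p, p)) / np.sqrt(p)   # fixed Lipschitz map
shift = rng.standard_normal(p) / 4             # class-mean separation

def concentrated_class(mean_shift):
    # "Representation-like" samples: ReLU map of shifted Gaussians.
    Z = rng.standard_normal((p, n_c)) + mean_shift[:, None]
    return np.maximum(W @ Z, 0.0)

def gaussian_surrogate(block):
    # Gaussian samples with the block's empirical mean and covariance.
    mu, C = block.mean(axis=1), np.cov(block)
    L = np.linalg.cholesky(C + 1e-8 * np.eye(p))
    return mu[:, None] + L @ rng.standard_normal((p, n_c))

X0, X1 = concentrated_class(np.zeros(p)), concentrated_class(shift)
y = np.repeat([0, 1], n_c)
real = np.concatenate([X0, X1], axis=1).T
gauss = np.concatenate([gaussian_surrogate(X0), gaussian_surrogate(X1)], axis=1).T

# Universality: the linear SVM should score similarly on both datasets.
for name, data in (("concentrated", real), ("Gaussian surrogate", gauss)):
    print(name, cross_val_score(LinearSVC(dual=False), data, y, cv=3).mean())
```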

17. Application to CNN Representations of GAN Images. Take-away messages.
◮ Concentrated vectors seem appropriate for realistic data modelling.
◮ Universality of linear classifiers regardless of the data distribution.
◮ RMT can anticipate the performance of standard classifiers on DL representations of GAN images.
◮ Universality supports the Gaussianity assumption on data representations as considered in the literature, e.g., in the FID metric
$$d^2\big((\mu, C), (\mu_w, C_w)\big) = \|\mu - \mu_w\|^2 + \operatorname{tr}\Big(C + C_w - 2\,(C C_w)^{1/2}\Big).$$
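For reference, the FID-style distance above can be evaluated directly; here is a small NumPy/SciPy sketch (the function name and dimensions are illustrative).

```python
import numpy as np
from scipy.linalg import sqrtm

def fid_distance(mu, C, mu_w, C_w):
    """Squared Frechet distance between two Gaussians, as in the FID metric:
    ||mu - mu_w||^2 + tr(C + C_w - 2 (C C_w)^{1/2})."""
    cross = sqrtm(C @ C_w)
    if np.iscomplexobj(cross):       # discard tiny imaginary parts from sqrtm
        cross = cross.real
    return float(np.sum((mu - mu_w) ** 2) + np.trace(C + C_w - 2.0 * cross))

# Sanity check: the distance of a Gaussian to itself is ~0.
rng = np.random.default_rng(7)
A = rng.standard_normal((5, 5))
C = A @ A.T + np.eye(5)
mu = rng.standard_normal(5)
print(fid_distance(mu, C, mu, C))    # ~ 0 up to numerical error
```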
