Adversarial Learning Bounds for Linear Classes and Neural Nets


1. Adversarial Learning Bounds for Linear Classes and Neural Nets: Understanding Adversarial Learning through Rademacher Complexity. Pranjal Awasthi, Natalie Frank, Mehryar Mohri. Google Research & Courant Institute. August 14, 2020.

2. Adversarial Attacks
Figure: Imperceptible adversarial perturbations in $\ell_2$. [5]

3. Adversarial Robustness
Figure: A sparse perturbation. [1]
Overarching goal: train classifiers that are robust to adversarial perturbations.
◮ Examples arise in many application areas.
◮ Different possible forms of perturbation: changing every pixel in an image vs. placing a sticker on a stop sign.
◮ Can we derive learning guarantees for adversarial robustness?

4. Outline of Talk
Goal of our paper: understand what characterizes robust generalization and how it relates to non-robust generalization.
1. Classification & adversarial classification setup
2. Rademacher complexity & adversarial Rademacher complexity
3. Better bounds for adversarial Rademacher complexity of linear classes
4. Better bounds for Rademacher complexity of linear classes
5. Adversarial Rademacher complexity of neural nets

5. Standard Classification Setting
Binary classification: data distributed over $\mathbb{R}^d \times \{-1, +1\}$ according to $\mathcal{D}$.
Standard setting:
◮ Given a predictor $h : \mathbb{R}^d \to \mathbb{R}$, a point $x$ is classified as $\mathrm{sign}(h(x))$.
◮ There is an error if $y h(x) < 0$, i.e. if $\mathbb{1}_{y h(x) < 0} = 1$.
◮ The classification error is then $R(h) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\left[\mathbb{1}_{y h(x) < 0}\right]$.

6. Defining Adversarial Perturbations
Adversarial setting:
◮ The data point is perturbed by at most $\epsilon$ in the $\ell_r$ norm to "fool" the classifier; an error now occurs if $\sup_{\|x - x'\|_r \le \epsilon} \mathbb{1}_{y h(x') < 0} = 1$, equivalently if $\mathbb{1}_{\inf_{\|x - x'\|_r \le \epsilon} y h(x') < 0} = 1$.
◮ The adversarial classification error is then $\widetilde{R}(h) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\left[\mathbb{1}_{\inf_{\|x - x'\|_r \le \epsilon} y h(x') < 0}\right]$.
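For a linear predictor $h(x) = \langle w, x\rangle$, the inner infimum above has a closed form, $\inf_{\|x - x'\|_r \le \epsilon} y\langle w, x'\rangle = y\langle w, x\rangle - \epsilon\|w\|_{r^*}$ with $r^*$ the conjugate exponent of $r$, so the adversarial error is easy to evaluate exactly. A minimal NumPy sketch of this (the function and variable names are mine, not from the slides):

```python
import numpy as np

def dual_exponent(r):
    """Conjugate exponent r* with 1/r + 1/r* = 1 (r=1 -> inf, r=inf -> 1)."""
    if r == 1:
        return np.inf
    if np.isinf(r):
        return 1.0
    return r / (r - 1.0)

def adversarial_error(w, X, y, eps, r):
    """Empirical adversarial 0/1 error of x -> <w, x> under l_r perturbations
    of size eps, using inf_{||x-x'||_r <= eps} y<w,x'> = y<w,x> - eps*||w||_{r*}.
    X: (m, d) sample matrix, y: (m,) labels in {-1, +1}."""
    worst_case_margins = y * (X @ w) - eps * np.linalg.norm(w, ord=dual_exponent(r))
    return np.mean(worst_case_margins < 0)

# With eps = 0 this reduces to the standard classification error.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
w = rng.normal(size=10)
y = np.sign(X @ w)
print(adversarial_error(w, X, y, eps=0.0, r=2))   # standard error
print(adversarial_error(w, X, y, eps=0.5, r=2))   # robust error (larger)
```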

7. Rademacher Complexity
The empirical Rademacher complexity is
$$\mathfrak{R}_S(F) = \frac{1}{m}\,\mathbb{E}_{\sigma}\left[\sup_{f\in F}\sum_{i=1}^m \sigma_i f(z_i)\right].$$
$\rho$-margin loss: $\Phi_\rho(u) = \min\left(1, \max\left(0, 1 - \frac{u}{\rho}\right)\right)$.
Theorem (Margin bounds [4]). With probability at least $1-\delta$, for all $h \in F$,
$$R(h) \le \widehat{R}_{S,\rho}(h) + \frac{2}{\rho}\,\mathfrak{R}_S(F) + 3\sqrt{\frac{\log\frac{2}{\delta}}{2m}}.$$
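For the linear classes used in the later slides, $F_p = \{x \mapsto \langle w, x\rangle : \|w\|_p \le W\}$, the supremum over $f \in F$ can be evaluated via the dual norm, $\sup_{\|w\|_p \le W}\langle w, v\rangle = W\|v\|_{p^*}$, so the empirical Rademacher complexity can be estimated by sampling the signs $\sigma$. A small illustrative sketch, not code from the paper:

```python
import numpy as np

def rademacher_complexity_linear(X, W, p, n_draws=2000, seed=0):
    """Monte Carlo estimate of R_S(F_p) for F_p = {x -> <w,x> : ||w||_p <= W}:
    R_S(F_p) = (1/m) E_sigma sup_{||w||_p <= W} <w, sum_i sigma_i x_i>
             = (W/m) E_sigma || sum_i sigma_i x_i ||_{p*}."""
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    p_star = np.inf if p == 1 else (1.0 if np.isinf(p) else p / (p - 1.0))
    total = 0.0
    for _ in range(n_draws):
        sigma = rng.choice([-1.0, 1.0], size=m)     # i.i.d. Rademacher signs
        total += np.linalg.norm(sigma @ X, ord=p_star)
    return W * total / (n_draws * m)
```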

8. Adversarial Rademacher Complexity
Theorem (Robust margin bounds). Define the class $\widetilde{F}$ by
$$\widetilde{F} = \left\{(x,y) \mapsto \inf_{\|x - x'\|_r \le \epsilon} y f(x') : f \in F\right\}.$$
The following holds with probability at least $1-\delta$ for all $h \in F$:
$$\widetilde{R}(h) \le \widetilde{R}_{S,\rho}(h) + \frac{2}{\rho}\,\mathfrak{R}_S(\widetilde{F}) + 3\sqrt{\frac{\log\frac{2}{\delta}}{2m}}.$$
Definition. We define the adversarial Rademacher complexity as $\widetilde{\mathfrak{R}}_S(F) := \mathfrak{R}_S(\widetilde{F})$.
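In general $\widetilde{\mathfrak{R}}_S(F)$ involves a supremum of infima and is harder to compute, but for the linear class $F_2$ with $\ell_2$ perturbations the inner infimum equals $y\langle w, x\rangle - \epsilon\|w\|_2$, and the supremum over the ball still has a closed form. A hedged sketch of a Monte Carlo estimator for this special case (my own worked example, not from the paper):

```python
import numpy as np

def adv_rademacher_linear_l2(X, y, W, eps, n_draws=2000, seed=0):
    """Monte Carlo estimate of the adversarial Rademacher complexity of
    F_2 = {x -> <w,x> : ||w||_2 <= W} under l_2 perturbations of size eps.
    Since inf_{||x-x'||_2 <= eps} y<w,x'> = y<w,x> - eps*||w||_2, the sup over
    the l_2 ball equals W * max(0, ||sum_i sigma_i y_i x_i||_2 - eps * sum_i sigma_i)."""
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    total = 0.0
    for _ in range(n_draws):
        sigma = rng.choice([-1.0, 1.0], size=m)
        v = (sigma * y) @ X                      # sum_i sigma_i y_i x_i
        total += W * max(0.0, np.linalg.norm(v) - eps * sigma.sum())
    return total / (n_draws * m)
```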

9. Prior Work on Adversarial Rademacher Complexity of Linear Classes
$F_p = \{x \mapsto \langle w, x\rangle : \|w\|_p \le W\}$
Yin et al. [6]: for perturbations in the infinity norm, for some constant $c$,
$$\max\left(\mathfrak{R}_S(F_p),\; c\,\epsilon W\,\frac{d^{1/p^*}}{\sqrt{m}}\right) \le \widetilde{\mathfrak{R}}_S(F_p) \le \mathfrak{R}_S(F_p) + \epsilon W\,\frac{d^{1/p^*}}{\sqrt{m}}.$$
Khim and Loh [3]: for perturbations in the $r$-norm, there exists a constant $M_{r^*}$ for which
$$\widetilde{\mathfrak{R}}_S(F_2) \le \frac{W\max_{(x_i,y_i)\in S}\|x_i\|_2}{2\sqrt{m}} + \frac{\epsilon M_{r^*}}{\sqrt{m}}.$$

10. Adversarial Rademacher Complexity of Linear Classes
$F_p = \{x \mapsto \langle w, x\rangle : \|w\|_p \le W\}$
Theorem. Let $\epsilon > 0$, $r \ge 1$. Consider a sample $S = \{(x_1,y_1),\ldots,(x_m,y_m)\}$ with $x_i \in \mathbb{R}^d$ and $y_i \in \{\pm1\}$, and perturbations in the $r$-norm. Then
$$\max\left(\mathfrak{R}_S(F_p),\; \frac{\epsilon W}{2\sqrt{2m}}\max\left(d^{1-\frac{1}{r}-\frac{1}{p}}, 1\right)\right) \le \widetilde{\mathfrak{R}}_S(F_p) \le \mathfrak{R}_S(F_p) + \frac{\epsilon W}{2\sqrt{m}}\max\left(d^{1-\frac{1}{r}-\frac{1}{p}}, 1\right).$$
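As a sanity check, both sides of this theorem are easy to evaluate numerically and to compare against a Monte Carlo estimate such as the one sketched earlier. A small helper, with the constants as reconstructed from the slide (so treat the exact factors as indicative rather than authoritative):

```python
import numpy as np

def linear_adv_bounds(rad_S, eps, W, m, d, p, r):
    """Lower/upper bounds on the adversarial Rademacher complexity of F_p
    from the theorem above; rad_S is the standard complexity R_S(F_p)."""
    dim_factor = max(d ** (1.0 - 1.0 / r - 1.0 / p), 1.0)
    lower = max(rad_S, eps * W * dim_factor / (2.0 * np.sqrt(2.0 * m)))
    upper = rad_S + eps * W * dim_factor / (2.0 * np.sqrt(m))
    return lower, upper
```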

11. Rademacher Complexity of Linear Classes
$F_p = \{x \mapsto \langle w, x\rangle : \|w\|_p \le W\}$, $X = [x_1 \;\ldots\; x_m]$
Group norms: $\|A\|_{p,q} = \big\|(\|A_1\|_p, \cdots, \|A_m\|_p)\big\|_q$, where $A_i$ is the $i$-th row of $A$.
Prior work [2]:
$$\mathfrak{R}_S(F_p) \le \begin{cases} W\,\|X\|_{\max}\sqrt{\dfrac{2\log(2d)}{m}} & \text{if } p = 1,\\[6pt] \dfrac{W\sqrt{p^*-1}}{m}\,\|X\|_{p^*,2} & \text{if } 1 < p \le 2.\end{cases}$$
Our new bounds:
$$\mathfrak{R}_S(F_p) \le \begin{cases} \dfrac{W\sqrt{2\log(2d)}}{m}\,\|X^T\|_{2,p^*} & \text{if } p = 1,\\[6pt] \dfrac{\sqrt{2}\,W}{m}\left[\dfrac{\Gamma\!\left(\frac{p^*+1}{2}\right)}{\sqrt{\pi}}\right]^{\frac{1}{p^*}}\|X^T\|_{2,p^*} & \text{if } 1 < p \le 2,\\[6pt] \dfrac{W}{m}\,\|X^T\|_{2,p^*} & \text{if } p \ge 2.\end{cases}$$
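The group norms above are simple to compute. A short sketch (I take $X$ here as the $m \times d$ matrix with one sample point per row, which is one reading of the slide's notation; swap the transpose if the opposite convention is intended):

```python
import numpy as np

def group_norm(A, p, q):
    """||A||_{p,q}: the q-norm of the vector of p-norms of the rows of A."""
    row_norms = np.linalg.norm(A, ord=p, axis=1)
    return np.linalg.norm(row_norms, ord=q)

# Quantities appearing in the old and new bounds (X has one sample per row).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))          # m = 50 samples in R^5
p = 1.5
p_star = p / (p - 1.0)
print(np.abs(X).max())                # ||X||_max     (old bound, p = 1)
print(group_norm(X, p_star, 2))       # ||X||_{p*,2}  (old bound, 1 < p <= 2)
print(group_norm(X.T, 2, p_star))     # ||X^T||_{2,p*} (new bounds)
```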

12. Comparing the Bounds for $1 < p \le 2$
$$\mathfrak{R}_S(F_p) \le \begin{cases} \dfrac{W\sqrt{p^*-1}}{m}\,\|X\|_{p^*,2} & \text{old bound},\\[6pt] \dfrac{\sqrt{2}\,W}{m}\left[\dfrac{\Gamma\!\left(\frac{p^*+1}{2}\right)}{\sqrt{\pi}}\right]^{\frac{1}{p^*}}\|X^T\|_{2,p^*} & \text{new bound}.\end{cases}$$
Comparing the norms: if $p \le 2$, then
$$\|X^T\|_{2,p^*} \ge \|X\|_{p^*,2} \ge \frac{\|X^T\|_{2,p^*}}{\min(m,d)^{\frac{1}{2}-\frac{1}{p^*}}}.$$
Comparing the constants:
$$c_1(p) = \sqrt{p^*-1}, \qquad c_2(p) = \sqrt{2}\left[\frac{\Gamma\!\left(\frac{p^*+1}{2}\right)}{\sqrt{\pi}}\right]^{\frac{1}{p^*}}.$$
[Figure: plot of $c_1$ and $c_2$ as functions of $p^*$.]
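The two constants can be tabulated directly, which reproduces the comparison shown in the slide's plot; a quick sketch using the standard library's gamma function:

```python
from math import gamma, sqrt, pi

def c1(p_star):
    """Constant in the old bound: sqrt(p* - 1)."""
    return sqrt(p_star - 1.0)

def c2(p_star):
    """Constant in the new bound: sqrt(2) * [Gamma((p*+1)/2) / sqrt(pi)]^(1/p*)."""
    return sqrt(2.0) * (gamma((p_star + 1.0) / 2.0) / sqrt(pi)) ** (1.0 / p_star)

# The two constants coincide at p* = 2 and c2 grows more slowly as p* increases.
for p_star in [2.0, 4.0, 8.0, 16.0, 25.0]:
    print(f"p* = {p_star:4.0f}:  c1 = {c1(p_star):5.2f},  c2 = {c2(p_star):5.2f}")
```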

13. Adversarial Rademacher Complexity of the ReLU
$G_p = \{(x,y) \mapsto (y\langle w, x\rangle)_+ : \|w\|_p \le W,\; y \in \{-1, 1\}\}$, $F_p = \{x \mapsto \langle w, x\rangle : \|w\|_p \le W\}$
Theorem. The adversarial Rademacher complexity of $G_p$ can be bounded as follows:
$$\frac{W\delta\epsilon\,|T^\delta_{\epsilon,s^*}|}{2\sqrt{2}\,m}\max\left(d^{1-\frac{1}{p}-\frac{1}{r}}, 1\right) \le \widetilde{\mathfrak{R}}_S(G_p) \le \mathfrak{R}_{T_\epsilon}(F_p) + \frac{\epsilon W}{2\sqrt{m}}\max\left(1, d^{1-\frac{1}{r}-\frac{1}{p}}\right),$$
where
$$T_\epsilon = \{i : y_i = -1, \text{ or } y_i = 1 \text{ and } \|x_i\|_r > \epsilon\}, \qquad T^\delta_{\epsilon,s} = \{i : \langle s, x_i\rangle - (1 + \delta y_i)\,y_i\,\epsilon\,\|s\|_{r^*} > 0\},$$
and $s^*$ is the adversarial perturbation.
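The two index sets in the theorem are straightforward to form from their definitions; a small sketch (the function and variable names are mine):

```python
import numpy as np

def dual_exponent(r):
    return np.inf if r == 1 else (1.0 if np.isinf(r) else r / (r - 1.0))

def T_eps(X, y, eps, r):
    """Indices i with y_i = -1, or y_i = +1 and ||x_i||_r > eps."""
    norms = np.linalg.norm(X, ord=r, axis=1)
    return np.where((y == -1) | ((y == 1) & (norms > eps)))[0]

def T_delta_eps_s(X, y, eps, delta, s, r):
    """Indices i with <s, x_i> - (1 + delta*y_i) * y_i * eps * ||s||_{r*} > 0."""
    s_norm = np.linalg.norm(s, ord=dual_exponent(r))
    return np.where(X @ s - (1.0 + delta * y) * y * eps * s_norm > 0)[0]
```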

14. Adversarial Rademacher Complexity of Neural Nets
$$G^n_p = \left\{(x,y) \mapsto y\sum_{j=1}^n u_j\,\rho(w_j \cdot x) : \|u\|_1 \le \Lambda,\; \|w_j\|_p \le W\right\}.$$
Theorem. Let $\rho$ be a function with Lipschitz constant $L_\rho$ and $\rho(0) = 0$. Then the following upper bound holds for the adversarial Rademacher complexity of $G^n_p$:
$$\widetilde{\mathfrak{R}}_S(G^n_p) \le L_\rho\,\frac{W\Lambda\max\left(1, d^{1-\frac{1}{p}-\frac{1}{r}}\right)\left(\|X\|_{r,\infty} + \epsilon\right)}{\sqrt{m}}\left(1 + \sqrt{d(n+1)\log(36)}\right).$$
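For reference, the right-hand side of this bound is easy to evaluate for a given sample. The sketch below uses the formula as reconstructed above and reads $\|X\|_{r,\infty}$ as the largest $\ell_r$ norm of a sample point, so treat it as illustrative rather than as the paper's exact expression:

```python
import numpy as np

def neural_net_adv_bound(X, W, Lam, eps, p, r, n, L_rho=1.0):
    """Evaluate the stated upper bound on the adversarial Rademacher complexity
    of G^n_p (one hidden layer, n units, ||u||_1 <= Lam, ||w_j||_p <= W).
    X: (m, d) sample matrix; L_rho: Lipschitz constant of the activation."""
    m, d = X.shape
    dim_factor = max(d ** (1.0 - 1.0 / p - 1.0 / r), 1.0)
    X_norm = np.linalg.norm(X, ord=r, axis=1).max()      # ||X||_{r,inf}
    return (L_rho * W * Lam * dim_factor * (X_norm + eps) / np.sqrt(m)
            * (1.0 + np.sqrt(d * (n + 1) * np.log(36.0))))
```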

15. Towards Dimension-Independent Bounds
◮ Studying the structure of adversarial perturbations leads to equations qualitatively similar to γ-fat shattering.
◮ Under appropriate assumptions, this can lead to dimension-independent bounds.

16. Conclusion
We covered:
◮ New bounds for the Rademacher complexity of linear classes.
◮ New bounds for the adversarial Rademacher complexity of linear classes.
◮ New bounds for the adversarial Rademacher complexity of neural nets.
Open problems:
◮ Generalize to arbitrary norms: in general, is the dual norm a good regularizer?
◮ Improve the adversarial generalization bound for neural nets, or find a matching lower bound.

17. Bibliography
[1] Tianyu Gu, Brendan Dolan-Gavitt, and Siddharth Garg. BadNets: Identifying vulnerabilities in the machine learning model supply chain. CoRR, 2017.
[2] Sham M. Kakade, Karthik Sridharan, and Ambuj Tewari. On the complexity of linear prediction: Risk bounds, margin bounds, and regularization. In Proceedings of NIPS, pages 793–800, 2008.
[3] Justin Khim and Po-Ling Loh. Adversarial risk bounds via function transformation. arXiv preprint arXiv:1810.09519, 2018.
[4] Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of Machine Learning. The MIT Press, second edition, 2018.
[5] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In Proceedings of ICLR, 2014.
[6] Dong Yin, Kannan Ramchandran, and Peter L. Bartlett. Rademacher complexity for adversarially robust generalization. In Proceedings of ICML, pages 7085–7094, 2019.
