
Confidence-Calibrated Adversarial Training: Generalizing to Unseen Attacks



  1. Confidence-Calibrated Adversarial Training: Generalizing to Unseen Attacks. David Stutz, Matthias Hein, Bernt Schiele.

  2. 2-Minute Overview. Problem: robustness to various adversarial examples. Adversarial training on $L_\infty$ adversarial examples, with training $\epsilon = 0.03$. [Figure, SVHN: confidence on correct and adversarial examples against the $L_\infty$ perturbation in the adversarial direction, from 0 to 0.05. The range $\le \epsilon$ (seen) is marked as robust.]

  3. 2-Minute Overview. Problem: robustness to various adversarial examples. Adversarial training on $L_\infty$ adversarial examples, with training $\epsilon = 0.03$. [Figure, SVHN: confidence on correct and adversarial examples against the $L_\infty$ perturbation in the adversarial direction, from 0 to 0.05. The range $\le \epsilon$ (seen) is marked as robust, the range $> \epsilon$ (unseen) as not robust.]

  4. 2-Minute Overview. Problem: robustness to various adversarial examples. Adversarial training on $L_\infty$ adversarial examples. [Figure, SVHN: confidence on correct and adversarial examples against the $L_2$ perturbation in the adversarial direction, from 0 to 2. The unseen $L_2$ attack is marked as not robust.]

  5. 2-Minute Overview. Summary of adversarial training, with training $\epsilon = 0.03$. [Figures: confidence against the $L_\infty$ perturbation in the adversarial direction (0 to 0.05; robust for $\le \epsilon$ (seen), not robust for $> \epsilon$ (unseen)) and against the $L_2$ perturbation in the adversarial direction (0 to 2; not robust against the unseen $L_2$ attack).]
     ◮ High confidence on adversarial examples ($\le \epsilon$).
     ◮ No generalization to larger or other $L_p$ perturbations.
     ◮ Behavior is not meaningful for arbitrarily large $\epsilon$.

  6. 2-Minute Overview. Confidence-calibrated adversarial training ($L_\infty$ only), with training $\epsilon = 0.03$. [Figure, SVHN: confidence on correct and adversarial examples against the $L_\infty$ perturbation in the adversarial direction, from 0 to 0.05, with the range $\le \epsilon$ (seen) marked.]

  7. 2-Minute Overview. Confidence-calibrated adversarial training ($L_\infty$ only), with training $\epsilon = 0.03$. [Figure, SVHN: confidence on correct and adversarial examples against the $L_\infty$ perturbation in the adversarial direction, from 0 to 0.05, with the ranges $\le \epsilon$ (seen) and $> \epsilon$ (unseen) and a confidence threshold marked. Annotation: robust by rejecting.]

  8. 2-Minute Overview. Confidence-calibrated adversarial training ($L_\infty$ only). [Figure, SVHN: confidence on correct and adversarial examples against the $L_2$ perturbation in the adversarial direction, from 0 to 2, for the unseen $L_2$ attack, with a confidence threshold marked. Annotation: robust by rejecting.]

  9. 2-Minute Overview.
     Adversarial training (training $\epsilon = 0.03$):
     ◮ High confidence on adversarial examples.
     ◮ No robustness to unseen perturbations.
     [Figure: confidence against the $L_\infty$ perturbation, from 0 to 0.05; robust for $\le \epsilon$ (seen), not robust for $> \epsilon$ (unseen).]
     Confidence-calibrated adversarial training (training $\epsilon = 0.03$):
     ◮ Low confidence on adversarial examples.
     ◮ Robustness to unseen perturbations by confidence thresholding.
     [Figure: confidence against the $L_\infty$ perturbation, from 0 to 0.05, with the ranges $\le \epsilon$ (seen) and $> \epsilon$ (unseen) and a confidence threshold marked. Annotation: robust by rejecting.]

  10. Interested? More details: paper and code at davidstutz.de/ccat. Contact: david.stutz@mpi-inf.mpg.de

  11. Interested? More details: paper and code at davidstutz.de/ccat. Contact: david.stutz@mpi-inf.mpg.de
      Outline:
      1. Problems of adversarial training
      2. Confidence-calibrated adversarial training
      3. Confidence-thresholded robust test error
      4. Results on SVHN and CIFAR10

  12. Problems of Adversarial Training. Min-max formulation:
      $$\min_{w} \; \mathbb{E}_{p(x,y)} \left[ \max_{\|\delta\|_\infty \le \epsilon} L(f(x + \delta; w), y) \right],$$
      where $f$ is the classifier. Minimizing the cross-entropy loss $L$ on adversarial examples yields high confidence.

  13. Problems of Adversarial Training. Min-max formulation:
      $$\min_{w} \; \mathbb{E}_{p(x,y)} \left[ \max_{\|\delta\|_\infty \le \epsilon} L(f(x + \delta; w), y) \right],$$
      where $f$ is the classifier. Minimizing the cross-entropy loss $L$ on adversarial examples yields high confidence.
      [Figures, training $\epsilon = 0.03$: confidence against the $L_\infty$ perturbation in the adversarial direction (0 to 0.05; robust for $\le \epsilon$ (seen), not robust for $> \epsilon$ (unseen)) and against the $L_2$ perturbation in the adversarial direction (0 to 2; not robust against the unseen $L_2$ attack).]
      ◮ Robustness does not generalize to unseen attacks.
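The inner maximization of the min-max formulation above can be approximated by signed gradient ascent with projection onto the $\epsilon$-ball. The following is only a minimal sketch under strong assumptions: the classifier is a hypothetical linear softmax model (so the input gradient has a closed form), the step size and iteration count are placeholders, and clipping to the valid image range is omitted. None of the names come from the slides or the authors' code.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row maximum for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(W, b, x, y):
    """Per-example cross-entropy L(f(x; w), y) of a linear softmax classifier."""
    p = softmax(x @ W.T + b)
    return -np.log(p[np.arange(len(y)), y])

def pgd_linf(W, b, x, y, eps=0.03, step=0.01, iters=20):
    """Approximate the max over ||delta||_inf <= eps by signed gradient ascent."""
    delta = np.zeros_like(x)
    onehot = np.eye(W.shape[0])[y]
    for _ in range(iters):
        p = softmax((x + delta) @ W.T + b)
        grad = (p - onehot) @ W  # gradient of the loss w.r.t. the input (linear model only)
        delta = np.clip(delta + step * np.sign(grad), -eps, eps)  # ascent step, then projection
    return delta

# Toy data (hypothetical): 8 examples, 5 features, 3 classes.
rng = np.random.default_rng(0)
W, b = rng.normal(size=(3, 5)), rng.normal(size=3)
x, y = rng.uniform(size=(8, 5)), rng.integers(0, 3, size=8)

delta = pgd_linf(W, b, x, y)
# The outer minimization would take gradient steps on (W, b) to reduce this quantity.
adversarial_loss = cross_entropy(W, b, x + delta, y).mean()
```

Because training minimizes the cross-entropy on `x + delta` against the true label, the model is pushed towards high confidence everywhere inside the $\epsilon$-ball, which is the behavior the slide identifies as the problem.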

  14. Confidence-Calibrated Adversarial Training. Step 1: transition to a uniform distribution on adversarial examples within the $\epsilon$-ball. [Figure: confidence against the $L_\infty$ perturbation in the (adversarial) direction, from -0.04 to 0.04, with the training $\epsilon = 0.03$ marked on both sides.]
      ◮ Low confidence is extrapolated beyond the $\epsilon$-ball.

  15. Confidence-Calibrated Adversarial Training. Step 1: transition to low confidence on adversarial examples within the $\epsilon$-ball. Step 2: reject low-confidence (adversarial) examples via confidence thresholding. [Figures, training $\epsilon = 0.03$: confidence against the $L_\infty$ perturbation, from 0 to 0.04, with the confidence threshold and the reject region marked; histogram of the confidence on adversarial examples for CCAT, with the reject region marked.]

  16. Step 1: Transition to Low Confidence.
      1. Compute high-confidence adversarial examples:
         $$\tilde{\delta} = \operatorname*{arg\,max}_{\|\delta\|_\infty \le \epsilon} \; \max_{k \ne y} f_k(x + \delta; w),$$
         where $f_k$ is the confidence of class $k$.
      2. Impose a target distribution via the cross-entropy loss:
         $$\tilde{y} = \lambda \, \text{one\_hot}(y) + (1 - \lambda) \, \tfrac{1}{K}, \qquad \lambda = \left(1 - \min(1, \|\delta\|_\infty / \epsilon)\right)^{\rho}.$$
      [Figure: transition of the target distribution $\tilde{y}$ against the $L_\infty$ perturbation $\|\delta\|_\infty$, from 0 to 0.03; the target is completely uniform at $\|\delta\|_\infty = \epsilon$.]
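The target distribution $\tilde{y}$ from the slide above can be sketched in a few lines of NumPy. The function name, the array shapes, and the default value of `rho` are assumptions made for illustration; the slides do not state which $\rho$ is used.

```python
import numpy as np

def ccat_target(y, delta, eps=0.03, rho=10.0, num_classes=10):
    """Target distribution for perturbed inputs.

    y:     (N,) integer labels
    delta: (N, ...) perturbations, one per example
    rho:   exponent of the transition (placeholder value, not taken from the slides)
    """
    # L_inf norm of each perturbation
    linf = np.abs(delta).reshape(len(delta), -1).max(axis=1)
    # lambda = (1 - min(1, ||delta||_inf / eps)) ** rho
    lam = (1.0 - np.minimum(1.0, linf / eps)) ** rho
    onehot = np.eye(num_classes)[y]
    # Convex combination of the one-hot label and the uniform distribution 1/K
    return lam[:, None] * onehot + (1.0 - lam[:, None]) / num_classes

y = np.array([3, 7])
delta = np.stack([np.zeros(4), np.full(4, 0.03)])  # no perturbation vs. ||delta||_inf = eps
targets = ccat_target(y, delta)
```

For the unperturbed example, $\lambda = 1$ and the target is the one-hot label; at $\|\delta\|_\infty = \epsilon$, $\lambda = 0$ and the target is uniform, matching the "completely uniform" end of the transition in the figure.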

  18. Step 2: Robustness by Confidence Thresholding. Training $\epsilon = 0.03$. [Figure, SVHN: confidence on correct and adversarial examples against the $L_\infty$ perturbation in the adversarial direction, from 0 to 0.05, with the range $\le \epsilon$ (seen) marked.]

  19. Step 2: Robustness by Confidence Thresholding. Training $\epsilon = 0.03$. [Figure, SVHN: confidence on correct and adversarial examples against the $L_\infty$ perturbation in the adversarial direction, from 0 to 0.05, with the ranges $\le \epsilon$ (seen) and $> \epsilon$ (unseen) and a confidence threshold marked. Annotation: robust by rejecting.]

  20. Step 2: Robustness by Confidence Thresholding. [Figure, SVHN: confidence on correct and adversarial examples against the $L_2$ perturbation in the adversarial direction, from 0 to 2, for the unseen $L_2$ attack, with a confidence threshold marked. Annotation: robust by rejecting.]
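Rejecting by confidence thresholding, as described on the slides above, amounts to comparing the largest predicted probability with a threshold. A minimal sketch follows; the function name, the `-1` reject marker, and the threshold value `tau` are hypothetical, and how the threshold is chosen is not covered in this part of the slides.

```python
import numpy as np

REJECT = -1  # hypothetical marker for rejected inputs

def predict_with_reject(probs, tau):
    """Return the predicted class, or REJECT where the confidence is below tau.

    probs: (N, K) predicted class probabilities
    tau:   confidence threshold (assumed to be given)
    """
    confidence = probs.max(axis=1)    # confidence = highest class probability
    prediction = probs.argmax(axis=1)
    return np.where(confidence >= tau, prediction, REJECT)

probs = np.array([
    [0.90, 0.05, 0.05],  # confident prediction: kept
    [0.40, 0.35, 0.25],  # near-uniform prediction: rejected
])
decisions = predict_with_reject(probs, tau=0.6)
```

An adversarial example only defeats this rule if it is both misclassified and assigned a confidence of at least `tau`, which is why pushing adversarial examples towards a uniform prediction during training makes rejection effective.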

  21. Step 2: Meaningful Extrapolation of Confidence. [Figures: confidence along the interpolation between $x$ and $x'$ against the interpolation factor $\kappa$, from 0 to 1, for adversarial training and for confidence-calibrated adversarial training.]

  22. Summary: Generalizable Robustness. Confidence-calibrated adversarial training:
      1. Transition: low confidence on adversarial examples.
      2. Reject low-confidence (adversarial) examples.
      [Figures, training $\epsilon = 0.03$: confidence against the $L_\infty$ perturbation in the adversarial direction (0 to 0.05; ranges $\le \epsilon$ (seen) and $> \epsilon$ (unseen)) and against the $L_2$ perturbation in the adversarial direction (0 to 2; unseen $L_2$ attack), each with a confidence threshold marked. Annotation: robust by rejecting.]
      ◮ Robustness to previously unseen perturbations.

  23. “Standard” Robust Test Error. RErr = error on test examples that are “attacked”. Adversarial Training (AT): 57.3% RErr. Ours (CCAT): 97.8% RErr.

  24. “Standard” Robust Test Error. RErr = error on test examples that are “attacked”. Adversarial Training (AT): 57.3% RErr. Ours (CCAT): 97.8% RErr. [Figures: histograms of the confidence on adversarial examples, from 0 to 1, for AT (total: 539/1000) and for CCAT (total: 949/1000).]
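One simple reading of the "standard" robust test error is the fraction of attacked test examples whose prediction on the adversarial input differs from the true label. The sketch below follows that reading; it is an assumption, since the slides give only the one-line description, and the function name is hypothetical.

```python
import numpy as np

def robust_test_error(adv_predictions, labels):
    """Fraction of attacked test examples that are misclassified.

    adv_predictions: (N,) predicted classes on the adversarial inputs
    labels:          (N,) true classes
    Confidence plays no role here: a misclassification counts as an error
    even if its confidence is low enough to be rejected by a threshold.
    """
    return float(np.mean(adv_predictions != labels))

labels = np.array([0, 1, 1, 0])
adv_predictions = np.array([0, 1, 2, 2])  # two of the four attacks succeed
rerr = robust_test_error(adv_predictions, labels)
```

Since this measure ignores confidence, it cannot credit a model that is robust by rejecting, which is consistent with the outline listing a confidence-thresholded robust test error as the next topic.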
