Towards dependable steganalysis
  1. Towards dependable steganalysis
     Tomáš Pevný (a, c), Andrew D. Ker (b)
     (a) Cisco Systems, Inc., Cognitive Research Team in Prague, CZ
     (b) Department of Computer Science, University of Oxford, UK
     (c) Department of Computers, CVUT in Prague, CZ
     10th February 2015, SPIE/IS&T Electronic Imaging

  2. Motivation
     [ROC plot: detection accuracy vs. false positive rate, both axes linear from 0 to 1]

  3. Motivation
     [The same ROC plot with the false positive axis on a log scale, from 10^-6 to 10^0]

  4. Millions of images
     ◮ In 2014, Yahoo! released 100 million CC Flickr images.
     ◮ Selected images with quality factor 80 and known camera, split into two sets:
         Training & validation: 449,395 cover + 449,395 stego, from 4,781 users
         Testing:               4,062,128 cover + 407,417 stego, from 43,026 users
     ◮ Stego images: nsF5 at 0.5 bits per nonzero coefficient.
     ◮ JRM features computed from every image.

  5. Motivation
     What is a good benchmark?
     ◮ Equal prior error rate?
     ◮ Emphasizing false positives?
     Our error measure (FP-50): the false positive rate at 50% detection accuracy.
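A minimal sketch of how the FP-50 measure could be computed from raw detector outputs (the function name, the numpy implementation, and the convention that larger scores indicate stego are illustrative assumptions, not part of the original slides):

    import numpy as np

    def fp50(cover_scores, stego_scores):
        """False positive rate at 50% detection accuracy: the threshold is
        placed at the median stego score, so exactly half of the stego
        images are detected, and FP-50 is the fraction of cover images
        scoring above that threshold."""
        threshold = np.median(stego_scores)
        return np.mean(cover_scores > threshold)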


  7. Mathematical formulation
     Exact optimization criterion:
         $\arg\min_{f \in \mathcal{F}} \; \mathbb{E}_{x \sim \mathrm{cover}}\left[\, I\big(f(x) > \operatorname{median}\{ f(y) \mid y \sim \mathrm{stego} \}\big) \right]$
     ◮ $I(\cdot)$ is the indicator function
     ◮ $\mathcal{F}$ is the set of classifiers
     Simplifications:
     ◮ Restrict $\mathcal{F}$ to linear classifiers.
     ◮ Replace the median by the mean:
         $\arg\min_{f \in \mathcal{F}} \; \mathbb{E}_{x \sim \mathrm{cover}}\left[\, I\big(f(x) > \mathbb{E}_{y \sim \mathrm{stego}}[f(y)]\big) \right]$
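The simplified criterion can be evaluated empirically for a given linear classifier w. The sketch below (numpy; function and variable names are illustrative) computes the fraction of cover images whose projection exceeds the mean projection of the stego images:

    import numpy as np

    def empirical_criterion(w, X_cover, Y_stego):
        """Fraction of cover rows x with w^T x above the mean stego
        response E_{y~stego}[w^T y]; this is the quantity the convex
        surrogates on the following slides approximate."""
        stego_mean_response = (Y_stego @ w).mean()
        return np.mean(X_cover @ w > stego_mean_response)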


  9. Approximation by square loss
     [Plot: the square loss and the indicator I as functions of the distance from the hyperplane]
     Optimization criterion ($\bar{y}$ denotes the mean stego feature vector):
         $\arg\min_{w} \; \sum_{x \sim \mathrm{cover}} \big( w^{\top}(x - \bar{y}) + 1 \big)^{2} + \lambda \lVert w \rVert^{2}$
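Because the square-loss criterion is a regularized quadratic in w, it can be minimised in closed form. The sketch below follows the reconstruction above (unit offset, ridge term λ‖w‖²); the exact offset and scaling used in the paper may differ:

    import numpy as np

    def square_loss_classifier(X_cover, Y_stego, lam=1.0):
        """Closed-form minimiser of sum_x (w^T (x - y_bar) + 1)^2 + lam ||w||^2,
        where y_bar is the mean stego feature vector.  Setting the gradient
        to zero gives (Z^T Z + lam I) w = -Z^T 1, with Z the cover features
        shifted by y_bar."""
        y_bar = Y_stego.mean(axis=0)
        Z = X_cover - y_bar
        d = Z.shape[1]
        return np.linalg.solve(Z.T @ Z + lam * np.eye(d), -Z.sum(axis=0))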

  10. Approximation by hinge loss
     [Plot: the hinge loss and the indicator I as functions of the distance from the hyperplane]
     Optimization criterion:
         $\arg\min_{w} \; \sum_{x \sim \mathrm{cover}} \max\big\{ 0,\; w^{\top}(x - \bar{y}) + 1 \big\} + \lambda \lVert w \rVert^{2}$

  11. Approximation by exponential loss
     [Plot: the exponential loss and the indicator I as functions of the distance from the hyperplane]
     Optimization criterion:
         $\arg\min_{w} \; \sum_{x \sim \mathrm{cover}} e^{\,w^{\top}(x - \bar{y})} + \lambda \lVert w \rVert^{2}$
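Neither the hinge nor the exponential criterion has a closed-form solution, but both are convex and can be minimised by (sub)gradient descent. A minimal numpy sketch for the exponential loss (step size, iteration count, and initialisation are arbitrary choices, and features would typically be normalised first to keep the exponential from overflowing):

    import numpy as np

    def exp_loss_classifier(X_cover, Y_stego, lam=1e-3, lr=1e-4, iters=1000):
        """Gradient descent on sum_x exp(w^T (x - y_bar)) + lam ||w||^2."""
        y_bar = Y_stego.mean(axis=0)
        Z = X_cover - y_bar                   # cover features relative to the stego mean
        w = np.zeros(Z.shape[1])
        for _ in range(iters):
            e = np.exp(Z @ w)                 # per-image exponential loss
            grad = Z.T @ e + 2 * lam * w      # gradient of the loss plus the ridge term
            w -= lr * grad
        return w

The hinge criterion of the previous slide can be minimised the same way, replacing the exponential weights e by the subgradient indicator of the active hinge terms.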

  12. Toy example
     [Two scatter plots of the Banana Set (Feature 1 vs. Feature 2): Fisher linear discriminant (left) vs. optimizing the exponential loss (right)]

  13. Linear classifiers on JRM features
     ◮ 22,510 features
     ◮ 2 × 40,000 training images
     ◮ 2 × 250,000 validation images

     FP-50            FLD           Square loss   Exponential loss   weighted SVM*
     training set     1.11 · 10^-4   2.18 · 10^-5   1.45 · 10^-5       0
     validation set   2.52 · 10^-4   1.99 · 10^-4   5.61 · 10^-4       9.87 · 10^-4

     * $\arg\min_{w}\; \eta\, \mathbb{E}_{x \sim \mathrm{cover}} \max\{0, w^{\top}x\} + (1-\eta)\, \mathbb{E}_{y \sim \mathrm{stego}} \max\{0, -w^{\top}y\} + \lambda \lVert w \rVert^{2}$

  14. Optimizing an ensemble
     Ensembles based on random subspaces à la Kodovský:
     ◮ L base learners,
     ◮ each trained on a random subset of d_sub features, and on all data.
     Two thresholds to set:
     ◮ base-learner threshold:
         ◮ traditional: optimize equal-prior accuracy
         ◮ proposed: Neyman-Pearson criterion (identical false positive rate for every learner)
     ◮ voting threshold:
         ◮ traditional: majority vote
         ◮ proposed: arbitrary (tuned) threshold
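A sketch of this ensemble construction with the proposed thresholds, reusing the square-loss learner from the earlier sketch as the base learner (the base-learner choice, the target false positive rate, and the use of the training covers for the Neyman-Pearson threshold are illustrative assumptions):

    import numpy as np

    def train_ensemble(X_cover, Y_stego, L=300, d_sub=1000,
                       base_fp_rate=1e-3, seed=0):
        """L base learners, each trained on d_sub randomly chosen features
        and all images; each learner's threshold is set by a Neyman-Pearson
        criterion so that every learner has the same false positive rate
        on the cover set."""
        rng = np.random.default_rng(seed)
        learners = []
        for _ in range(L):
            idx = rng.choice(X_cover.shape[1], size=d_sub, replace=False)
            w = square_loss_classifier(X_cover[:, idx], Y_stego[:, idx])
            scores = X_cover[:, idx] @ w
            t = np.quantile(scores, 1.0 - base_fp_rate)   # NP threshold on covers
            learners.append((idx, w, t))
        return learners

    def ensemble_votes(learners, X):
        """Number of base learners that flag each image as stego."""
        votes = np.zeros(X.shape[0], dtype=int)
        for idx, w, t in learners:
            votes += X[:, idx] @ w > t
        return votes

The final decision compares the vote count against a voting threshold that is itself tuned (for example on held-out data, to hit a target false positive rate), rather than being fixed at a simple majority.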



  17. ROC of ensembles (L = 300, d_sub = 1000)
     ◮ 2 × 40,000 training images
     ◮ 2 × 250,000 validation images
     [ROC plot: detection accuracy vs. false positive rate (log scale, 10^-6 to 10^0) for the FLD, square-loss, and exponential-loss ensembles]

  18. ROC of ensembles (L = 300, d_sub = 500)
     ◮ 2 × 40,000 training images
     ◮ 2 × 250,000 validation images
     [ROC plot: detection accuracy vs. false positive rate (log scale, 10^-6 to 10^0) for the FLD, square-loss, and exponential-loss ensembles]

  19. ROC of ensembles (L = 300, d_sub = 250)
     ◮ 2 × 40,000 training images
     ◮ 2 × 250,000 validation images
     [ROC plot: detection accuracy vs. false positive rate (log scale, 10^-6 to 10^0) for the FLD, square-loss, and exponential-loss ensembles]

  20. ROC of ensembles (L = 300, d_sub = 100)
     ◮ 2 × 40,000 training images
     ◮ 2 × 250,000 validation images
     [ROC plot: detection accuracy vs. false positive rate (log scale, 10^-6 to 10^0) for the FLD, square-loss, and exponential-loss ensembles]

  21. ROC of ensembles (L = 300, d_sub = 100)
     ◮ On the 4.5M-image testing set:
         ◮ false negative rate 51.2%
         ◮ false positive rate 5.56 · 10^-5
     [ROC plot: detection accuracy vs. false positive rate (log scale, 10^-6 to 10^0) for the FLD, square-loss, and exponential-loss ensembles]

  22. Errors on testing set

     Base learner       Thresholds    False negative rate   False positive rate
     FLD                Traditional   1.33 · 10^-3           9.07 · 10^-3
     FLD                Proposed      4.58 · 10^-1           3.26 · 10^-4
     Exponential loss   Proposed      5.12 · 10^-1           5.56 · 10^-5

  23. Summary
     ◮ Classifiers derived from the FP-50 measure.
     ◮ The same classifiers can be derived in two different ways.
     ◮ Various convex surrogates for the step function:
         ◮ non-smooth losses are difficult to optimize,
         ◮ exponential loss encourages over-fitting,
         ◮ square loss (FLD) has a hidden weakness.
     ◮ The ensemble subdimension acts as an indirect regularizer.
     ◮ Ensemble thresholds need to be optimized differently.

  24. Summary
     [Scatter plot of the Banana Set (Feature 1 vs. Feature 2)]

  25. Summary
     [Plot: the square loss and the indicator I as functions of the distance from the hyperplane]

  26. Summary
     ◮ We detected lousy (very high bit-rate) steganography with a false positive rate of about 1 in 18,000.
