Perceptual Ad-Blocking: Meet Adversarial Machine Learning
Florian Tramèr Palo Alto Networks February 22nd 2019 Joint work with Pascal Dupré, Gili Rusak, Giancarlo Pellegrino and Dan Boneh
The Future of Ad-Blocking?
easylist.txt …markup… …URLs…
This is an ad
Ad disclosures:
> Legal requirement (U.S. FTC, EU E-Commerce)
> Industry self-regulation on ad-disclosure
Ad-Highlighter [Storey et al. 2017]:
> Visually detects ad-disclosures
> Traditional computer vision techniques
> Simplified version implementable in Adblock Plus
Sentinel:
> Locates ads in Facebook screenshots using neural networks
> Not yet deployed
Jerry uploads malicious content … so that Tom’s post gets blocked
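[Figure: ad-blocker pipeline applied to a page (https://www.example.com) containing an ad and an ad disclosure. Data Collection and Training produces the Classifier; at run time the steps are Page Segmentation, Classification, Action.]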
Ø Element-based (e.g., find all <img> tags) [Storey et al. 2017]
Ø Frame-based (segment rendered webpage into “frames”)
Ø Page-based (unsegmented screenshots à la Sentinel)
Classifiers: template matching, OCR, DNNs, object detector networks
§ Sentinel is not yet deployed so we rolled our own
§ Let’s aim bigger than just Facebook!
(data collection on FB is a pain / privacy issue anyways...)
§ We trained an object detector neural network (YOLO-v3)
> Use filter lists to create a labelled dataset for training
> Crop & replace ads for data augmentation (increase data diversity)
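The crop-and-replace augmentation above can be sketched as follows. This is a minimal illustration, not the pipeline from the talk: the function name `replace_ads`, the `(x, y, width, height)` box format, and the nearest-neighbour resize are all assumptions, and the arrays are stand-ins for real screenshots and ad creatives.

```python
import numpy as np

def replace_ads(page, ad_boxes, new_ads, rng):
    """Return a copy of `page` in which every labelled ad box is overwritten
    with a randomly chosen replacement ad (hypothetical helper)."""
    out = page.copy()
    for (x, y, w, h) in ad_boxes:
        ad = new_ads[rng.integers(len(new_ads))]
        # Nearest-neighbour resize of the replacement ad to the box size.
        rows = np.arange(h) * ad.shape[0] // h
        cols = np.arange(w) * ad.shape[1] // w
        out[y:y + h, x:x + w] = ad[rows][:, cols]
    return out

rng = np.random.default_rng(0)
page = np.zeros((600, 800, 3), dtype=np.uint8)           # stand-in screenshot
ad_boxes = [(100, 50, 300, 250)]                          # labels, e.g. derived from filter lists
new_ads = [np.full((90, 120, 3), 200, dtype=np.uint8)]    # stand-in ad creative
augmented = replace_ads(page, ad_boxes, new_ads, rng)
```

The bounding-box labels stay valid after replacement, so each screenshot can yield many training samples with different ad content.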
> Training ⟹ “tweak model parameters such that f(image) = panda”
> Attacking ⟹ “tweak input pixels such that f(image) = gibbon”
Szegedy et al., 2014 Goodfellow et al., 2015
Perturbation scale: 2/255
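To make "tweak input pixels" concrete, here is a minimal sketch of one fast-gradient-sign step in the style of Goodfellow et al., 2015. It assumes a toy logistic-regression model rather than a neural network; the weights `w`, the input `x`, and the bias `b` are randomly generated stand-ins, and only the 2/255 budget comes from the slide.

```python
import numpy as np

def fgsm(x, w, b, y, eps):
    """One fast-gradient-sign step against a logistic-regression model.
    x: pixels in [0, 1]; y: true label (0 or 1); eps: max change per pixel."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))   # model's probability of class 1
    grad_x = (p - y) * w                      # gradient of cross-entropy loss w.r.t. x
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)

rng = np.random.default_rng(0)
w = rng.normal(size=1024)        # hypothetical "trained" weights
x = rng.uniform(size=1024)       # hypothetical flattened image, pixels in [0, 1]
b = 2.0 - w @ x                  # bias chosen so the clean logit is +2 (class 1)

x_adv = fgsm(x, w, b, y=1, eps=2 / 255)
clean_logit = w @ x + b
adv_logit = w @ x_adv + b        # sign flips although no pixel moved by more than 2/255
```

Every pixel moves by a tiny amount, but all the moves push the score in the same direction, so the small changes add up across the input and the predicted class changes.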
> Attack must be implemented in HTML
> E.g., publishers can’t modify or know contents of ad frames
[Figure: page elements (<div>, <p>, <img>, <div>, <img>) passed to the classifier]
> Abilities: inspect ad-blocker classifier(s) offline; change page DOM, CSS, JavaScript...; cannot modify content of ad frames
> Abilities: inspect ad-blocker classifier(s) offline; arbitrary changes to content of ad frames
Use HTML tiling to minimize perturbation size (20 KB)
Ø 100% success rate on 20 webpages not used to create the overlay
Ø The attack is universal: the overlay is computed once and works for all (or most) websites
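The idea of a small tile that is repeated over the whole page and works for any page can be sketched on a toy model. The sketch below assumes a linear "ad score" `sum(w * page)` instead of the object detector from the talk, ignores pixel-range clipping, and uses a hypothetical function name and tile size; for a linear scorer the best tile does not depend on the page, which is what makes it universal here.

```python
import numpy as np

def tiled_universal_perturbation(w, tile, eps):
    """For a linear scorer score(page) = sum(w * page), build a page-sized
    perturbation by repeating one small tile chosen to lower the score."""
    rows, cols = w.shape
    th, tw = tile
    # Gradient of the score w.r.t. one tile pixel = sum of w over all its repeats.
    grad_tile = w.reshape(rows // th, th, cols // tw, tw).sum(axis=(0, 2))
    small = -eps * np.sign(grad_tile)               # only this tile has to be shipped
    return np.tile(small, (rows // th, cols // tw)) # repeated like a tiled background

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 96))                       # hypothetical linear score weights
delta = tiled_universal_perturbation(w, tile=(8, 8), eps=2 / 255)
pages = rng.uniform(size=(5, 64, 96))               # five unrelated stand-in pages
drops = [(w * (p + delta)).sum() - (w * p).sum() for p in pages]
```

The same `delta` lowers the score of every page in `pages`, and only the 8×8 tile needs to be transmitted, which mirrors why tiling keeps the overlay small.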
Original
> Creating a single perturbation that works for every ad on every website is hard
> Target a specific domain
Ø 100% success rate for ads served on BBC.com
Ø No CSS: ad image is directly perturbed on the server
Ø The perturbation is universal: it works for all ads (on this domain)
With adversarial ad
Alternative attack
Ø Publisher perturbs background below ad frame
Ø 100% success rate at evading ad detection
> Detect ad-blocking in client-side JavaScript or on server
> Applicability of these attacks depends on ad-blocker type
> Abilities: inspect ad-blocker classifier(s) offline; change page DOM, CSS; use client-side JavaScript to detect DOM changes
§ Publisher adds honeypot in page-region with fixed layout
> E.g., page header
With honeypot header
Jerry uploads malicious content … so that Tom’s post gets blocked
What happened?
Ø Object detector model generates box predictions from full page inputs
Ø Content from one user can affect predictions anywhere on page
Ø Model’s segmentation is not aligned with web-security boundaries
§ Ad-block evasion & detection is a well-known arms race. But there’s more!
Ø The adversary has white-box access to the ad-blocker
Ø The adversary can exploit false negatives and false positives in the classification pipeline
Ø The adversary prepares attacks offline
Ø The adversary can take part in crowd-sourced data collection
[Figure: the same ad-blocker pipeline (Data Collection and Training, Page Segmentation, Classification, Action), annotated with attacks]
The ad-blocker must defend against attacks in real time in the user’s browser
Ø Data Poisoning
Ø DOM Obfuscation
Ø Resource Exhaustion
Ø Adversarial Examples
Ø Privilege Abuse
§ Attacks are easy if the adversary has access to the ML model
> Hide model from adversary?
§ Obfuscate the ad-blocker?
> It isn’t hard to create adversarial examples for black-box classifiers
§ Randomize the ad-blocker?
> Deploy different models
> Randomly change page before classifying
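"Randomly change page before classifying" could look roughly like the sketch below. It is an illustration under stated assumptions, not a defense from the talk: the random shift, the majority vote, and the names `randomized_predict`, `max_shift`, and `n_votes` are hypothetical, and `classify` stands for any function that returns whether a page contains an ad.

```python
import numpy as np

def randomized_predict(classify, page, rng, max_shift=8, n_votes=5):
    """Classify several randomly shifted copies of `page` and majority-vote."""
    votes = []
    for _ in range(n_votes):
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        shifted = np.roll(page, shift=(dy, dx), axis=(0, 1))  # wrap-around shift
        votes.append(bool(classify(shifted)))
    return sum(votes) > n_votes / 2

rng = np.random.default_rng(0)
is_bright = lambda p: p.mean() > 0.5          # stand-in for a real ad classifier
has_ad = randomized_predict(is_bright, np.ones((32, 32)), rng)
```

The adversary can no longer predict the exact input the classifier sees, but an attack that is robust to the whole family of random shifts would still go through, so randomization of this kind only raises the cost of an attack.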
> Or train on adversarial examples proactively
> New arms race: the adversary finds new attacks and the ad-blocker re-trains
> Mounting a new attack is much easier than updating the model
> Ongoing research: so far the adversary always wins!
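Training on adversarial examples proactively can be sketched on a toy problem. The sketch below assumes a two-dimensional logistic-regression model, synthetic data, and a fast-gradient-sign attacker; all names and hyperparameters are hypothetical, and it only protects against the one attack it was trained on, which is exactly the arms-race problem noted above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_batch(X, y, w, b, eps):
    """Perturb each row of X by eps in the direction that increases its loss."""
    p = sigmoid(X @ w + b)
    return X + eps * np.sign((p - y)[:, None] * w[None, :])

def adversarial_train(X, y, eps, lr=0.1, epochs=200):
    """Gradient descent on examples re-perturbed against the current model."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        X_adv = fgsm_batch(X, y, w, b, eps)
        err = sigmoid(X_adv @ w + b) - y
        w -= lr * X_adv.T @ err / len(y)
        b -= lr * err.mean()
    return w, b

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.3, size=(100, 2)),   # class 0 stand-in data
               rng.normal(+2.0, 0.3, size=(100, 2))])  # class 1 stand-in data
y = np.repeat([0.0, 1.0], 100)
w, b = adversarial_train(X, y, eps=0.5)
clean_acc = np.mean((X @ w + b > 0) == (y == 1))
adv_acc = np.mean((fgsm_batch(X, y, w, b, 0.5) @ w + b > 0) == (y == 1))
```

Each training step has to regenerate the attack against the current model, while the adversary only needs to find one new attack, which is one way to see why re-training is the more expensive side of the race.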
1600 citations, 800 in 2018!
Broke 7 defenses, a few days after they were accepted for publication
> Simpler computer vision problem than full-page ad-detection
> Light-weight and mature techniques (OCR, perceptual hashing, SIFT)
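As an example of the light-weight techniques listed above, perceptual hashing can be sketched with a simple average hash. This is one common variant chosen for illustration, not necessarily the one any ad-blocker uses; the function names and the checkerboard "logo" are stand-ins.

```python
import numpy as np

def average_hash(gray, size=8):
    """Average hash: block-average a grayscale image down to size x size,
    then threshold each block at the overall mean (size*size bits)."""
    h, w = gray.shape
    gray = gray[:h - h % size, :w - w % size].astype(float)  # crop to a multiple of size
    blocks = gray.reshape(size, gray.shape[0] // size, size, gray.shape[1] // size)
    small = blocks.mean(axis=(1, 3))
    return (small > small.mean()).flatten()

def hamming(a, b):
    """Number of differing hash bits; small means visually similar."""
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(0)
checker = np.indices((8, 8)).sum(axis=0) % 2
logo = np.kron(checker, np.ones((4, 4))) * 255             # stand-in disclosure logo, 32x32
noisy = np.clip(logo + rng.uniform(-5, 5, size=logo.shape), 0, 255)
same = hamming(average_hash(logo), average_hash(noisy))       # slightly noisy copy
different = hamming(average_hash(logo), average_hash(255 - logo))  # inverted image
```

A detector would compare the hash of each page element against the hashes of known ad-disclosure images and flag elements whose Hamming distance falls below a chosen threshold.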
> Resisting adversarial examples is one of the most challenging open problems in ML security
> Evasion & detection with adversarial examples
> Privilege abuse attacks from arbitrary content providers
> Similar threats for other ML-based ad-blockers (e.g., AdGraph?)
Paper: http://arxiv.org/abs/1811.03194
Code: https://github.com/ftramer/ad-versarial
Ø Train a page-based ad-blocker
Ø Download pre-trained models
Ø Attack demos