Retrospective Measurement and Analysis of Anti-Adblock Filter Lists (PowerPoint presentation)

SLIDE 1

The Ad Wars: Retrospective Measurement and Analysis of Anti-Adblock Filter Lists

Umar Iqbal*, Zubair Shafiq*, and Zhiyun Qian†

The University of Iowa*

University of California-Riverside†

SLIDE 2

Agenda

The Ad Wars

  • Online ads
  • Adblocking
  • Anti-Adblocking
  • Anti-Anti-Adblocking

Contributions

  • Anti-Adblock filter list analysis
  • Retrospective coverage analysis
  • Detecting Anti-Adblock Scripts

Conclusion

2 Umar Iqbal

SLIDE 3

Online Advertising

Advertising enables free content

  • Publishers show free content
  • Earn revenue with ads

Problems with ads

  • Privacy
  • Intrusive
  • Malware
  • Performance

Solution

Adblocking

SLIDE 4

Ad/Tracker Blocking Solutions

  • Tracker blocking extensions: Privacy Badger & Ghostery
  • Adblocking extensions: Adblock Plus & Adblock
  • Ad/tracker blocking browsers: Brave browser & Cliqz browser

Mainstream Ad/Tracker Blocking Browsers

Apple Safari & Google Chrome

SLIDE 5

How do Adblockers Work?

[Figure: a client requests www.example.com, which returns 1st-party content; 3rd-party content and ads are loaded from 3rd-party and ad servers through further HTTP requests and responses. Adblockers block HTTP requests and block HTML elements.]

Crowdsourced Filter Lists

  • EasyList
  • Disconnect.me
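Request-level blocking can be illustrated as a host check against a list of blocked domains. This is a minimal sketch and does not describe how any particular adblocker works internally. `shouldBlock` and the `.example` domains are hypothetical, and real filter lists use a much richer rule syntax.

```javascript
// Simplified request blocking: block a request when its host is a listed
// domain or a subdomain of one. Names and domains are hypothetical.
function shouldBlock(requestUrl, blockedDomains) {
  const host = new URL(requestUrl).hostname;
  return blockedDomains.some(
    (domain) => host === domain || host.endsWith("." + domain)
  );
}

const blockedDomains = ["ads.example", "tracker.example"];
console.log(shouldBlock("https://cdn.ads.example/banner.js", blockedDomains)); // true
console.log(shouldBlock("https://www.example.com/index.html", blockedDomains)); // false
```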

SLIDE 6

Publishers vs Adblockers

Acceptable Ads Program

  • Whitelisting fee
  • Transparency concerns
  • Enabled by default in major adblockers

Use of Anti-Adblockers

  • Insert bait elements
  • Detect adblockers
  • Prompt users to disable their adblocker or whitelist the website
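The bait technique can be sketched in a few lines. The script adds an element whose class names look like an ad, then checks whether an adblocker hid it. This illustration is not taken from any vendor's actual script. `BAIT_CLASSES`, `isBaitBlocked`, and `runBaitCheck` are hypothetical names, and the zero-height/`offsetParent` test is only one common heuristic among several.

```javascript
// Sketch of bait-based adblock detection (all names hypothetical).
// Class names chosen because generic element-hiding rules tend to target them.
const BAIT_CLASSES = "ad ads adsbox banner-ad";

// A hidden or collapsed bait suggests an element-hiding rule fired.
// Accepts a real DOM element or any object exposing these two fields.
function isBaitBlocked(el) {
  return el.clientHeight === 0 || el.offsetParent === null;
}

// Browser-only: insert the bait, wait briefly, then check and clean up.
// Returns false immediately when there is no DOM (e.g. under Node).
function runBaitCheck(onBlocked, delayMs = 100) {
  if (typeof document === "undefined") return false;
  const bait = document.createElement("div");
  bait.className = BAIT_CLASSES;
  bait.innerHTML = "&nbsp;";
  document.body.appendChild(bait);
  setTimeout(() => {
    if (isBaitBlocked(bait)) onBlocked(); // e.g. prompt to disable or whitelist
    bait.remove();
  }, delayMs);
  return true;
}
```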

SLIDE 7

Anti-Anti-Adblocking

  • Block/allow bait HTTP requests
  • Hide/allow bait HTML elements
  • Use anti-adblocking filter lists: Anti-Adblock Killer, EasyList

SLIDE 8

Agenda

The Ad Wars

  • Online ads
  • Adblocking
  • Anti-Adblocking
  • Anti-Anti-Adblocking

Contributions

  • Anti-Adblock filter list analysis
  • Retrospective coverage analysis
  • Detecting Anti-Adblock Scripts

Conclusion

SLIDE 9

Filter List Rules

HTTP request filter rules
  • Domain anchor: ||
  • Domain tag: domain=

HTML element filter rules
  • With domain restriction
  • Without domain restriction

Exception rules
  • HTTP exception rules
  • HTML exception rules

! Rule with domain anchor
||example.com
! Rule with domain tag
/example.js$script,domain=example.com
! Rule with domain restriction
example.com###examplebanner
! Rule without domain restriction
###examplebanner
! Exception rule for HTTP request
@@/example.js$script,domain=example1.com
! Exception rule for HTML element
example.com#@##examplebanner
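The rule shapes above can be told apart mechanically. The following minimal sketch assumes only the six shapes on this slide. `classifyRule` is a hypothetical helper that separates comments, HTTP request rules, element-hiding rules, and their exceptions. It does not handle the full Adblock Plus syntax, such as regex rules, the full set of request options, or extended selectors.

```javascript
// Classify Adblock Plus-style rules of the shapes shown on the slide.
// Simplified sketch: not a complete parser for the filter syntax.
function classifyRule(line) {
  const rule = line.trim();
  if (rule === "" || rule.startsWith("!")) return { kind: "comment" };

  // Element rules: "domains##selector" (hide) or "domains#@#selector" (exception).
  const html = rule.match(/^([^#]*)#(@?)#(.+)$/);
  if (html) {
    return {
      kind: html[2] ? "html-exception" : "html",
      domains: html[1] ? html[1].split(",") : [], // empty = applies on every site
      selector: html[3],
    };
  }

  // Everything else is a request rule; a leading "@@" marks an exception.
  const exception = rule.startsWith("@@");
  const body = exception ? rule.slice(2) : rule;
  const [pattern, optionText = ""] = body.split("$");
  const options = optionText ? optionText.split(",") : [];
  const domainOpt = options.find((o) => o.startsWith("domain="));
  return {
    kind: exception ? "http-exception" : "http",
    pattern,
    anchored: pattern.startsWith("||"), // domain anchor
    types: options.filter((o) => !o.includes("=")), // e.g. "script"
    domains: domainOpt ? domainOpt.slice("domain=".length).split("|") : [],
  };
}

for (const rule of [
  "||example.com",
  "/example.js$script,domain=example.com",
  "example.com###examplebanner",
  "###examplebanner",
  "@@/example.js$script,domain=example1.com",
  "example.com#@##examplebanner",
]) {
  console.log(rule, "->", classifyRule(rule).kind);
}
```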

SLIDE 10

Popular Filter Lists

Anti-Adblock Killer (2014)

  • 353 to 1,811 filter rules
  • 6.2 filter rules per revision

EasyList (2011)

  • Anti-Adblock sections
  • 67 to 1,317 filter rules
  • 0.6 filter rules per day

Warning Removal List (2013)

  • 4 to 167 filter rules
  • 0.2 filter rules per day

EasyList + Warning Removal List = Combined EasyList
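The per-day and per-revision figures are averages of one kind: rules added divided by elapsed days or by the number of revisions. The following minimal sketch assumes a hypothetical `history` array with one `{date, rules}` entry per revision and uses net growth between the first and last revision. The slide does not say exactly how the study computed its averages.

```javascript
// Average growth of a filter list from its revision history.
// `history` (hypothetical shape): [{date: "YYYY-MM-DD", rules: number}, ...],
// one entry per revision, sorted by date.
function growthStats(history) {
  const first = history[0];
  const last = history[history.length - 1];
  const added = last.rules - first.rules; // net growth over the period
  const days = (Date.parse(last.date) - Date.parse(first.date)) / 86400000;
  return {
    added,
    perDay: days > 0 ? added / days : 0,
    // Revisions after the first one are the ones that changed the list.
    perRevision: history.length > 1 ? added / (history.length - 1) : 0,
  };
}

console.log(growthStats([
  { date: "2020-01-01", rules: 10 },
  { date: "2020-01-06", rules: 15 },
  { date: "2020-01-11", rules: 30 },
]));
```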

SLIDE 11

Anti-Adblock Killer vs Combined EasyList

Number of domains

  • Anti-Adblock Killer: 1,415
  • Combined EasyList: 1,394
  • Common domains: 282

  • Similar distribution of Alexa rankings
  • Similar distribution of categories
  • Exception vs non-exception domains:
      • Combined EasyList 4:1
      • Anti-Adblock Killer 1:1

Different Strategies of Crafting Anti-Adblocking Rules

[Figure: Domain categorization]
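Overlap between the two lists is a set comparison over the domains their rules mention. With the slide's counts, the union is 1,415 + 1,394 − 282 = 2,527 domains, so the 282 shared domains make up about 11% of it (a Jaccard index of roughly 0.11). The sketch below computes these quantities from domain lists. `compareDomainSets`, `exceptionSplit`, and the `{domain, isException}` record shape are hypothetical.

```javascript
// Compare the domains targeted by two filter lists.
function compareDomainSets(listA, listB) {
  const a = new Set(listA);
  const b = new Set(listB);
  const common = [...a].filter((d) => b.has(d)).length;
  const union = a.size + b.size - common;
  return {
    common,
    onlyA: a.size - common,
    onlyB: b.size - common,
    jaccard: union === 0 ? 0 : common / union,
  };
}

// Count distinct domains that appear in exception vs non-exception rules.
// records (hypothetical shape): [{domain, isException}], extracted from rules.
// A domain used by both kinds of rules is counted on both sides.
function exceptionSplit(records) {
  const exception = new Set();
  const nonException = new Set();
  for (const { domain, isException } of records) {
    (isException ? exception : nonException).add(domain);
  }
  return { exception: exception.size, nonException: nonException.size };
}
```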

SLIDE 12

Anti-Adblock Killer vs Combined EasyList

282 common domains

Promptness in adding new rules

  • 64% appear first in Combined EasyList
  • 34% appear first in Anti-Adblock Killer
  • 2% appear in both at the same time

Combined EasyList is More Prompt in Adding New Rules
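The 64% / 34% / 2% split comes from comparing, for each common domain, when each list first added a rule for it. The following minimal sketch assumes hypothetical `Map`s from each domain to the date of its first rule in each list. Ties at day granularity count as "same time", which may differ from how the study resolved revisions. `whoWasFirst` is a hypothetical name.

```javascript
// For domains present in both lists, compare when each list first added a rule.
// firstSeenA / firstSeenB (hypothetical): Map<domain, "YYYY-MM-DD">.
function whoWasFirst(firstSeenA, firstSeenB) {
  const counts = { aFirst: 0, bFirst: 0, same: 0 };
  for (const [domain, dateA] of firstSeenA) {
    if (!firstSeenB.has(domain)) continue; // only common domains are compared
    const diff = Date.parse(dateA) - Date.parse(firstSeenB.get(domain));
    if (diff < 0) counts.aFirst++;
    else if (diff > 0) counts.bFirst++;
    else counts.same++;
  }
  const total = counts.aFirst + counts.bFirst + counts.same;
  const pct = (n) => (total === 0 ? 0 : (100 * n) / total);
  return {
    ...counts,
    total,
    pctAFirst: pct(counts.aFirst),
    pctBFirst: pct(counts.bFirst),
    pctSame: pct(counts.same),
  };
}
```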

SLIDE 13

Agenda

The Ad Wars

  • Online ads
  • Adblocking
  • Anti-Adblocking
  • Anti-Anti-Adblocking

Contributions

  • Anti-Adblock filter list analysis
  • Retrospective coverage analysis
  • Detecting Anti-Adblock Scripts

Conclusion

SLIDE 14

The Internet Archive’s Wayback Machine

Archives web pages

  • 279 billion webpages
  • Archives webpage resources as well
  • Used in prior literature [USENIX Security ’16]
  • API to retrieve content

Alexa top 5K websites

  • 5 years (2011–2016)

The Wayback Machine is incomplete!

  • robots.txt permissions
  • Partial snapshots
  • Outdated URLs
  • Not archived URLs

[Figure: Missing snapshots]
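Retrieving snapshots starts by asking the Wayback Machine which captures exist. The slide does not name the endpoint. This sketch assumes the public availability API (`https://archive.org/wayback/available`), which returns the capture closest to a requested timestamp as JSON. The yearly sampling schedule and the helper names are hypothetical.

```javascript
// Query builder and response reader for the Wayback Machine availability API.
const AVAILABILITY_API = "https://archive.org/wayback/available";

function availabilityQuery(domain, timestamp) {
  const url = new URL(AVAILABILITY_API);
  url.searchParams.set("url", domain);
  url.searchParams.set("timestamp", timestamp); // YYYYMMDD[hhmmss]
  return url.toString();
}

// Hypothetical sampling schedule: one request per January 1st.
function yearlyTimestamps(fromYear, toYear) {
  const stamps = [];
  for (let y = fromYear; y <= toYear; y++) stamps.push(`${y}0101`);
  return stamps;
}

// Returns {url, timestamp} of the closest capture, or null if none exists
// (such domains correspond to "not archived" in the workflow).
function closestSnapshot(response) {
  const closest = response?.archived_snapshots?.closest;
  if (!closest || !closest.available) return null;
  return { url: closest.url, timestamp: closest.timestamp };
}

// Usage (needs network access; Node 18+ or a browser):
// const res = await fetch(availabilityQuery("example.com", "20110101"));
// const snap = closestSnapshot(await res.json());
```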

SLIDE 15

Analysis Workflow

[Figure: analysis workflow]

Top 5K Alexa domains

  • Request to the Wayback Machine JSON API
  • Remove not archived domains
  • List of Wayback URLs with timestamps
  • Remove outdated URLs
  • Request Wayback Machine URLs with Selenium
  • Store requests/responses and HTML content in a data repository
  • Remove partial snapshots
  • Filter list matching: match crawled content with anti-adblock filter lists
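The three clean-up steps in this workflow act as filters over candidate snapshots. The slide gives no thresholds. The 180-day gap for "outdated" and the 50% failed-resource share for "partial" below are hypothetical values chosen for illustration, and so are the record shapes and function names.

```javascript
// Convert a Wayback timestamp (YYYYMMDDhhmmss, trailing parts optional) to ms.
function wbTimestampToMs(ts) {
  const p = ts.padEnd(14, "0");
  return Date.UTC(+p.slice(0, 4), +p.slice(4, 6) - 1, +p.slice(6, 8),
    +p.slice(8, 10), +p.slice(10, 12), +p.slice(12, 14));
}

// Hypothetical rule: the closest capture is too far from the requested date.
function isOutdated(requested, captured, maxGapDays = 180) {
  const gapMs = Math.abs(wbTimestampToMs(captured) - wbTimestampToMs(requested));
  return gapMs / 86400000 > maxGapDays;
}

// Hypothetical rule: too many archived resources failed to load in the crawl.
// crawl: {requests, failed}, e.g. counted from the stored responses.
function isPartial(crawl, maxFailedShare = 0.5) {
  if (crawl.requests === 0) return true;
  return crawl.failed / crawl.requests > maxFailedShare;
}

// A snapshot is kept only if it exists, is recent enough, and loaded mostly intact.
// s: {requested, captured (null if not archived), crawl}
function keepSnapshot(s) {
  return s.captured !== null &&
    !isOutdated(s.requested, s.captured) &&
    !isPartial(s.crawl);
}
```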

SLIDE 16

Anti-Adblock Filter Lists Coverage

  • HTTP matching and HTML matching
  • Use the respective filter lists: Anti-Adblock Killer filter list, Combined EasyList filter list

[Figure: number of websites that trigger HTTP rules and number of websites that trigger HTML rules; values shown: 331, 16, 5, and 4 websites]

Anti-Adblock Killer Filter List Has Better Coverage
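Coverage here means the number of distinct websites that trigger at least one rule from a list. The following minimal counting sketch assumes a hypothetical stream of `{site, list, ruleType}` match records. A site that triggers many rules still counts once per list and rule type.

```javascript
// Count distinct websites per (filter list, rule type).
// matches (hypothetical shape): [{site, list, ruleType}] from the matching step.
function coverage(matches) {
  const sites = new Map(); // "list|ruleType" -> Set of sites
  for (const { site, list, ruleType } of matches) {
    const key = `${list}|${ruleType}`;
    if (!sites.has(key)) sites.set(key, new Set());
    sites.get(key).add(site);
  }
  const result = {};
  for (const [key, set] of sites) result[key] = set.size;
  return result;
}
```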

SLIDE 17

Anti-Adblock Filter Lists Coverage

Detection on the Live Web

Alexa top 100K websites

  • Anti-Adblock Killer: 4,942 websites
  • Combined EasyList: 195 websites

Anti-Adblock Killer Filter List Has Better Coverage on the Live Web

SLIDE 18

Anti-Adblock Filter Lists Lag

  • Crowdsourced and manually maintained
  • Challenging to keep pace
  • New rules within 100 days:
      • Combined EasyList: 82% of anti-adblockers
      • Anti-Adblock Killer: 32% of anti-adblockers

Combined EasyList is More Prompt in Adding New Rules While Anti-Adblock Killer Has More Coverage
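The lag figures compare when an anti-adblocker first appears on a website with when a list first gets a rule for it. This sketch rests on several assumptions. The `{site, firstDetected, ruleAdded}` records and `lagStats` are hypothetical. Rules that predate detection count as on time. Sites the list never covers count against it.

```javascript
// Share of anti-adblockers that got a rule within `windowDays` of detection.
// records (hypothetical): [{site, firstDetected, ruleAdded}] with ISO dates;
// ruleAdded is null when the list never added a rule in the study window.
function lagStats(records, windowDays = 100) {
  const DAY = 86400000;
  let within = 0;
  const lags = [];
  for (const r of records) {
    if (r.ruleAdded === null) continue; // never covered: counts as a miss
    const lag = (Date.parse(r.ruleAdded) - Date.parse(r.firstDetected)) / DAY;
    lags.push(lag);
    if (lag <= windowDays) within++; // negative lag (rule came first) is on time
  }
  return {
    lags,
    shareWithinWindow: records.length === 0 ? 0 : within / records.length,
  };
}
```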

SLIDE 19

Agenda

The Ad Wars

  • Online ads
  • Adblocking
  • Anti-Adblocking
  • Anti-Anti-Adblocking

Contributions

  • Anti-Adblock filter list analysis
  • Retrospective coverage analysis
  • Detecting Anti-Adblock Scripts

Conclusion

SLIDE 20

Static Code Analysis

  • Anti-adblocking code from 3rd-party vendors
  • Anti-adblocking code has structural similarities
  • Static analysis to capture code structure
  • Fingerprint anti-adblocking JavaScript

Prior work: Curtsinger et al. [USENIX Security ’11], Ikram et al. [PETS ’17]

SLIDE 21

Anti-Adblock Detection Workflow

JS file → unpacked JS file → anti-adblocking JS or non-anti-adblocking JS

  • Unpack packed JavaScript files with the V8 engine
  • Construct ASTs from the unpacked JavaScript code
  • Extract features from the ASTs and filter out features with low correlation
  • Train AdaBoost using SVM as the base classifier
  • Classify anti-adblocking and non-anti-adblocking JavaScript
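One way to unpack `eval()`-packed code without a browser is to run it with `eval` shadowed, so the packer hands over its decoded source instead of executing it. The slide used the V8 engine for this step. The sketch below is a simplified Node stand-in for the same idea, and `unpackEval` is a hypothetical helper. It still executes the packed code, so real use belongs in an isolated sandbox.

```javascript
// Unpack eval()-packed JavaScript by shadowing `eval` with a capturing function.
// Repeats for nested packing, up to maxDepth layers.
function unpackEval(source, maxDepth = 5) {
  let code = source;
  for (let depth = 0; depth < maxDepth; depth++) {
    const captured = [];
    try {
      // Function-constructor bodies are sloppy mode, which allows a parameter
      // named `eval`; calls to eval(...) inside `code` now reach our callback.
      new Function("eval", code)((s) => {
        captured.push(String(s));
      });
    } catch (err) {
      // Browser-only code (document, window, ...) may throw; keep what we got.
    }
    if (captured.length === 0) break; // no eval() call left: fully unpacked
    code = captured.join("\n");
  }
  return code;
}

console.log(unpackEval('eval("var BlockAdBlock = \\"abp\\";")'));
```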

SLIDE 22

JavaScript Code Example

if (ad_element.clientHeight == 0) { BlockAdBlock = "abp"; }

Feature Extraction

Preprocessing

  • Unpack eval() using the V8 engine
  • Construct Abstract Syntax Tree (AST)

Features (context:text)

  • All (AssignmentExpression:BlockAdBlock)
  • Literal (Literal:abp)
  • Keyword (Identifier:clientHeight)

Map scripts to a vector space

$$\phi : x \mapsto \big(\phi_s(x)\big)_{s \in S}, \qquad \phi_s(x) = \begin{cases} 1, & \text{if } x \text{ contains the feature } s \\ 0, & \text{otherwise} \end{cases}$$

Packed code: eval("var BlockAdBlock = \"abp\";");

Unpacked code: var BlockAdBlock = "abp";

[Figure: AST of the code example, with nodes IfStatement, ExpressionStatement, and Identifier, and leaves clientHeight, ad_element, BlockAdBlock, and abp]
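Feature extraction walks the AST, records (context:text) pairs, and maps each script to a binary vector with φ_s(x) = 1 if script x contains feature s and 0 otherwise. This simplified sketch assumes ESTree-format ASTs, the JSON shape produced by JavaScript parsers such as Acorn or Esprima. The context for identifiers and literals is the node's own type, and identifiers also get the type of their enclosing node. That reproduces the three example features but is not necessarily the study's exact definition. The AST is written by hand so the example needs no parser.

```javascript
// Collect "context:text" features from an ESTree-format AST.
function extractFeatures(ast) {
  const features = new Set();
  const walk = (node, parentType) => {
    if (node === null || typeof node !== "object") return;
    if (Array.isArray(node)) {
      for (const child of node) walk(child, parentType);
      return;
    }
    if (node.type === "Identifier") {
      features.add(`Identifier:${node.name}`); // keyword-style feature
      features.add(`${parentType}:${node.name}`); // enclosing-context feature
    } else if (node.type === "Literal") {
      features.add(`Literal:${node.value}`);
    }
    for (const [key, child] of Object.entries(node)) {
      if (key !== "type") walk(child, node.type);
    }
  };
  walk(ast, "Program");
  return features;
}

// phi_s(x) = 1 if script x contains feature s, else 0, over a fixed vocabulary S.
function toBinaryVector(features, vocabulary) {
  return vocabulary.map((s) => (features.has(s) ? 1 : 0));
}

// Hand-written AST for: if (ad_element.clientHeight == 0) { BlockAdBlock = "abp"; }
const sampleAst = {
  type: "Program",
  body: [{
    type: "IfStatement",
    test: {
      type: "BinaryExpression", operator: "==",
      left: {
        type: "MemberExpression", computed: false,
        object: { type: "Identifier", name: "ad_element" },
        property: { type: "Identifier", name: "clientHeight" },
      },
      right: { type: "Literal", value: 0 },
    },
    consequent: {
      type: "BlockStatement",
      body: [{
        type: "ExpressionStatement",
        expression: {
          type: "AssignmentExpression", operator: "=",
          left: { type: "Identifier", name: "BlockAdBlock" },
          right: { type: "Literal", value: "abp" },
        },
      }],
    },
    alternate: null,
  }],
};

const features = extractFeatures(sampleAst);
console.log(toBinaryVector(features, [
  "AssignmentExpression:BlockAdBlock",
  "Literal:abp",
  "Identifier:clientHeight",
  "Literal:adblock",
])); // logs [ 1, 1, 1, 0 ]
```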

SLIDE 23

Feature Selection & Training

Labeled Data

  • 372 anti-adblocking scripts
  • 4,021 non-anti-adblocking scripts

Feature selection

  • Filter features using χ2 correlation
  • Reduce the number of features
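χ2 filtering scores each feature by how far its co-occurrence with the label departs from independence, then keeps the top-scoring features (the results table reports 10K and 1K features). This minimal sketch uses the standard χ2 statistic for a 2×2 contingency table. `chiSquare`, `selectTopK`, and the sample shape are hypothetical.

```javascript
// Chi-square score of one binary feature against the binary label, from the
// 2x2 table: a = present & anti-adblocking, b = present & other,
// c = absent & anti-adblocking, d = absent & other.
function chiSquare(a, b, c, d) {
  const n = a + b + c + d;
  const denom = (a + b) * (c + d) * (a + c) * (b + d);
  return denom === 0 ? 0 : (n * (a * d - b * c) ** 2) / denom;
}

// Keep the k features with the highest chi-square scores.
// samples (hypothetical shape): [{features: Set<string>, label: 1 | 0}]
function selectTopK(samples, k) {
  const positives = samples.filter((s) => s.label === 1).length;
  const negatives = samples.length - positives;
  const vocabulary = new Set(samples.flatMap((s) => [...s.features]));
  const scored = [...vocabulary].map((feature) => {
    let a = 0;
    let b = 0;
    for (const s of samples) {
      if (!s.features.has(feature)) continue;
      if (s.label === 1) a++;
      else b++;
    }
    return { feature, score: chiSquare(a, b, positives - a, negatives - b) };
  });
  scored.sort((x, y) => y.score - x.score);
  return scored.slice(0, k).map((x) => x.feature);
}
```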

Classifier training

  • AdaBoost + SVM
  • 10-fold cross-validation
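With 372 anti-adblocking and 4,021 other scripts, a plain random split could leave some folds with few positives. The slide does not say whether the folds were stratified. This sketch assumes they were and deals each class round-robin into 10 folds. `stratifiedFolds` and `crossValidationRounds` are hypothetical names, and the AdaBoost + SVM training itself is not shown. In practice the indices would be shuffled before dealing.

```javascript
// Stratified k-fold split: deal each class's indices round-robin into k folds
// so every fold keeps roughly the overall class ratio.
function stratifiedFolds(labels, k = 10) {
  const folds = Array.from({ length: k }, () => []);
  const byClass = new Map();
  labels.forEach((label, i) => {
    if (!byClass.has(label)) byClass.set(label, []);
    byClass.get(label).push(i);
  });
  let next = 0; // continue across classes so fold sizes stay balanced
  for (const indices of byClass.values()) {
    for (const i of indices) {
      folds[next].push(i);
      next = (next + 1) % k;
    }
  }
  return folds;
}

// Each round trains on k-1 folds and tests on the held-out fold.
function* crossValidationRounds(folds) {
  for (let f = 0; f < folds.length; f++) {
    const test = folds[f];
    const train = folds.filter((_, g) => g !== f).flat();
    yield { train, test };
  }
}
```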

SLIDE 24

Results & Evaluation

Feature set   Classifier       Number of features   TP rate (%)   FP rate (%)
all           AdaBoost + SVM   10K                  99.6          3.9
literal       AdaBoost + SVM   10K                  99.6          3.9
keyword       AdaBoost + SVM   1K                   99.7          3.2

Results in terms of

  • True positive (TP) rate: anti-adblocking scripts correctly classified as anti-adblocking
  • False positive (FP) rate: non-anti-adblocking scripts incorrectly classified as anti-adblocking
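Both rates come from the confusion counts. TP rate = TP / (TP + FN) over anti-adblocking scripts, and FP rate = FP / (FP + TN) over non-anti-adblocking scripts. This is a minimal sketch with a hypothetical `rates` helper, using 1 for anti-adblocking and 0 otherwise.

```javascript
// TP rate: share of anti-adblocking scripts classified as anti-adblocking.
// FP rate: share of non-anti-adblocking scripts wrongly flagged.
function rates(labels, predictions) {
  let tp = 0, fn = 0, fp = 0, tn = 0;
  labels.forEach((y, i) => {
    const p = predictions[i];
    if (y === 1 && p === 1) tp++;
    else if (y === 1) fn++;
    else if (p === 1) fp++;
    else tn++;
  });
  return {
    tpRate: tp + fn === 0 ? 0 : tp / (tp + fn),
    fpRate: fp + tn === 0 ? 0 : fp / (fp + tn),
  };
}
```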

Test in the wild on Alexa top 100K websites

  • 2,701 detected anti-adblockers
  • TP rate of 92.5%

Complement manual analysis

  • Periodic crawl to expedite the manual process
  • Substantial reduction of manual effort

SLIDE 25

Key Takeaways

Comprehensive measurement study of anti-adblocking filter lists

  • Retrospective analysis on Alexa top 5K websites from 2011 to 2016
  • Effectiveness and evolution

Lightweight machine learning approach

  • Static analysis to detect anti-adblocking scripts
  • Complement filter list rule creation

The Wayback Machine enables retrospective analysis

  • Can be used to study similar filter lists: malware, tracking, censorship

SLIDE 26

Questions?

Umar Iqbal www.umariqbal.com @umaarr6

SLIDE 27

References

  • A. Lerner, A. K. Simpson, T. Kohno, and F. Roesner. Internet Jones and the Raiders of the Lost Trackers: An Archaeological Study of Web Tracking from 1996 to 2016. In USENIX Security Symposium, 2016.

  • C. Curtsinger, B. Livshits, B. Zorn, and C. Seifert. ZOZZLE: Fast and Precise In-Browser JavaScript Malware Detection. In USENIX Security Symposium, 2011.

  • M. Ikram, H. J. Asghar, M. A. Kaafar, A. Mahanti, and B. Krishnamurthy. Towards Seamless Tracking-Free Web: Improved Detection of Trackers via One-class Learning. In Privacy Enhancing Technologies Symposium (PETS), 2017.
