AdGraph: A Graph-Based Approach to Ad and Tracker Blocking



slide-1
SLIDE 1

AdGraph: A Graph-Based Approach to Ad and Tracker Blocking

Umar Iqbal, Peter Snyder, Shitong Zhu, Benjamin Livshits, Zhiyun Qian, and Zubair Shafiq

IEEE Symposium on Security and Privacy, 2020

slide-2
SLIDE 2

Online Advertising

Advertising enables “free” content
  Publishers show content
  Earn revenue with ads

slide-3
SLIDE 3

Online Advertising

Advertising enables “free” content
  Publishers show content
  Earn revenue with ads

Interactive Advertising Bureau (IAB) ’19

slide-4
SLIDE 4

Online Advertising

Advertising enables “free” content
  Publishers show content
  Earn revenue with ads
Problems with online advertising ecosystem

slide-5
SLIDE 5

Online Advertising

Advertising enables “free” content
  Publishers show content
  Earn revenue with ads
Problems with online advertising ecosystem
  Privacy concerns – Behavioral targeting

“I see ads for things I dream about.”

“My phone is eavesdropping on me”

slide-6
SLIDE 6

Online Advertising

Advertising enables “free” content
  Publishers show content
  Earn revenue with ads
Problems with online advertising ecosystem
  Privacy concerns – Behavioral targeting
  Performance issues – Slow page load

slide-7
SLIDE 7

Online Advertising

Advertising enables “free” content
  Publishers show content
  Earn revenue with ads
Problems with online advertising ecosystem
  Privacy concerns – Behavioral targeting
  Performance issues – Slow page load
  Malvertising

slide-8
SLIDE 8

Online Advertising

Advertising enables “free” content
  Publishers show content
  Earn revenue with ads
Problems with online advertising ecosystem
  Privacy concerns – Behavioral targeting
  Performance issues – Slow page load
  Malvertising
  Intrusive

slide-9
SLIDE 9

Online Advertising

Advertising enables “free” content
  Publishers show content
  Earn revenue with ads
Problems with online advertising ecosystem
  Privacy concerns – Behavioral targeting
  Performance issues – Slow page load
  Malvertising
  Intrusive
Solution
  Ad & tracker blockers

slide-10
SLIDE 10

Outline

State of Ad/Tracker Blocking
  Ads & Trackers
  Filter list blocking
  Machine learning based blocking
AdGraph
  Graph-based representation
  Machine learning on graph representation
  Evaluation

slide-11
SLIDE 11

Outline

State of Ad/Tracker Blocking
  Ads & Trackers
  Filter list blocking
  Machine learning based blocking

slide-12
SLIDE 12

What are Ads and Trackers?

slide-13
SLIDE 13

What are Ads and Trackers?

Ads are audio-visual promotional content

slide-14
SLIDE 14

What are Ads and Trackers?

Ads are audio-visual promotional content
Trackers collect sensitive information

[Figure: Tracking Pixel]

slide-15
SLIDE 15

What are Ads and Trackers?

Ads are audio-visual promotional content
Trackers collect sensitive information
They are:
  Created with JavaScript
  Requested with HTTP
  Displayed with HTML
Ads and trackers involve HTML, Network, and JavaScript

[Figure: Tracking Pixel]

slide-16
SLIDE 16

What are Ads and Trackers?

Ads are audio-visual promotional content
Trackers collect sensitive information
They are:
  Created with JavaScript

[Figure: JavaScript, Tracking Pixel]

slide-17
SLIDE 17

What are Ads and Trackers?

Ads are audio-visual promotional content
Trackers collect sensitive information
They are:
  Created with JavaScript
  Requested with HTTP

[Figure: HTTP, Tracking Pixel]

slide-18
SLIDE 18

What are Ads and Trackers?

Ads are audio-visual promotional content
Trackers collect sensitive information
They are:
  Created with JavaScript
  Requested with HTTP
  Displayed with HTML

[Figure: HTML, Tracking Pixel]

slide-19
SLIDE 19

What are Ads and Trackers?

Ads are audio-visual promotional content
Trackers collect sensitive information
They are:
  Created with JavaScript
  Requested with HTTP
  Displayed with HTML
Ads and trackers involve HTML, Network, and JavaScript

[Figure: JavaScript, HTTP, HTML, Tracking Pixel]

slide-20
SLIDE 20

Filter List Based Blocking

Manually curated with crowdsourcing

slide-21
SLIDE 21

Filter List Based Blocking

Manually curated with crowdsourcing
  Leads to scalability issues

3 months to add new rules [Iqbal et al. ’17]

slide-22
SLIDE 22

Filter List Based Blocking

Manually curated with crowdsourcing
  Leads to scalability issues

3.8 years to remove rules [Snyder et al. ’20]

slide-23
SLIDE 23

Filter List Based Blocking

Manually curated with crowdsourcing
  Leads to scalability issues

90% of rules are useless [Snyder et al. ’20]

slide-24
SLIDE 24

Filter List Based Blocking

Manually curated with crowdsourcing
  Leads to scalability issues
Operate at HTML/Network/JS layer in isolation

slide-25
SLIDE 25

Filter List Based Blocking

Manually curated with crowdsourcing
  Leads to scalability issues
Operate at HTML/Network/JS layer in isolation
  Leads to accuracy issues

Block network request

slide-26
SLIDE 26

Filter List Based Blocking

Manually curated with crowdsourcing
  Leads to scalability issues
Operate at HTML/Network/JS layer in isolation
  Leads to accuracy issues

Block network request
Hide HTML elements

slide-27
SLIDE 27

Filter List Based Blocking

Manually curated with crowdsourcing
  Leads to scalability issues
Operate at HTML/Network/JS layer in isolation
  Leads to accuracy issues

Block network request
Hide HTML elements
Block script execution

slide-28
SLIDE 28

Filter List Based Blocking

Manually curated with crowdsourcing
  Leads to scalability issues
Operate at HTML/Network/JS layer in isolation
  Leads to accuracy issues

Block network request
Hide HTML elements
Block script execution

slide-29
SLIDE 29

Filter List Based Blocking

Suffer from scalability issues
Suffer from accuracy issues
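
To make the "in isolation" point concrete, here is a minimal Python sketch of per-layer rule matching. The rule syntax is a heavily simplified stand-in loosely modeled on common filter-list notation. The rules, domains, class names, and function names are all hypothetical and are not taken from the talk or from any real list.

```python
from urllib.parse import urlparse

# Hypothetical, heavily simplified rules; real filter lists have far richer syntax.
NETWORK_RULES = ["||ads.example.com^", "||tracker.example.net^"]
HIDING_RULES = ["##.ad-banner", "##.sponsored"]


def blocks_request(url, rules=NETWORK_RULES):
    """Network layer: block if the host equals, or is a subdomain of, a rule's domain."""
    host = urlparse(url).hostname or ""
    for rule in rules:
        domain = rule[2:-1]  # strip the leading "||" and the trailing "^"
        if host == domain or host.endswith("." + domain):
            return True
    return False


def hides_element(css_classes, rules=HIDING_RULES):
    """HTML layer: hide if the element carries a class named by a "##.class" rule."""
    hidden = {rule[3:] for rule in rules}  # strip the leading "##."
    return any(c in hidden for c in css_classes)


# Each check sees only its own layer. An ad served from an unlisted host inside
# an element with a randomized class name passes both checks.
print(blocks_request("https://cdn.ads.example.com/banner.js"))    # listed domain
print(blocks_request("https://static.news.example.org/promo.js")) # unlisted host
print(hides_element(["card", "promo-7f3a"]))                      # unlisted class
```

Neither function can ask which script created the element or issued the request. That missing context is what the cross-layer approach later in the talk provides.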

slide-30
SLIDE 30

Machine Learning Based Blocking

Network layer [Bhagavatula et al. ’14, Gugelmann et al. ’15]
  HTTP header properties as features
    presence of words like “ad”
    cookies set by response

slide-31
SLIDE 31

Machine Learning Based Blocking

Network layer [Bhagavatula et al. ’14, Gugelmann et al. ’15]
  HTTP header properties as features
    presence of words like “ad”
    cookies set by response
JavaScript layer [Wu et al. ’16, Ikram et al. ’17]
  JS API names as features
    document.cookie
    element.clientWidth

slide-32
SLIDE 32

Machine Learning Based Blocking

Solve scalability issues

slide-33
SLIDE 33

Machine Learning Based Blocking

Solve scalability issues
Do not solve accuracy issues
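
As a rough illustration of "JS API names as features", the Python sketch below counts textual occurrences of a few API names in a script's source. This is an assumed, simplified stand-in, and the cited prior work may extract features quite differently. `document.cookie` and `clientWidth` come from the slide, while `navigator.userAgent` and the function name are hypothetical additions.

```python
import re

# API names from the slide plus one hypothetical extra ("navigator.userAgent").
API_NAMES = ("document.cookie", "clientWidth", "navigator.userAgent")


def js_api_features(script_source):
    """Count textual occurrences of each API name in a script's source."""
    return {
        name: len(re.findall(re.escape(name), script_source))
        for name in API_NAMES
    }


sample = "var w = el.clientWidth; document.cookie = 'id=1'; var c = document.cookie;"
features = js_api_features(sample)
print(features)
```

The features describe one script in isolation, so they carry no information about the elements or requests that script goes on to create.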

slide-34
SLIDE 34

Outline

AdGraph
  Graph-based representation
  Machine learning on graph representation
  Evaluation

slide-35
SLIDE 35

AdGraph

slide-36
SLIDE 36

AdGraph

Graph-based cross-layer representation of ad/tracker behavior

slide-37
SLIDE 37

AdGraph

Graph-based cross-layer representation of ad/tracker behavior
ML to automatically learn ad/tracker behavior

slide-38
SLIDE 38

AdGraph

Graph-based cross-layer representation of ad/tracker behavior
ML to automatically learn ad/tracker behavior

Chromium instrumentation

slide-39
SLIDE 39

AdGraph

Graph-based cross-layer representation of ad/tracker behavior
ML to automatically learn ad/tracker behavior

Chromium instrumentation → Graph representation

slide-40
SLIDE 40

AdGraph

Graph-based cross-layer representation of ad/tracker behavior
ML to automatically learn ad/tracker behavior

Chromium instrumentation → Graph representation → Model training

slide-41
SLIDE 41

AdGraph

Graph-based cross-layer representation of ad/tracker behavior
ML to automatically learn ad/tracker behavior

Chromium instrumentation → Graph representation → Model training → Classification decision

slide-42
SLIDE 42

AdGraph

Graph-based cross-layer representation of ad/tracker behavior

Chromium instrumentation → Graph representation → Model training → Classification decision

slide-43
SLIDE 43

Cross-layer Context

[Figure: Network Request, HTML Element, Script Element]

slide-44
SLIDE 44

Cross-layer Context

Cross-layer interactions

[Figure: Network Request, HTML Element, Script Element]

slide-45
SLIDE 45

Cross-layer Context

Cross-layer interactions
  JS (element) → Network (request)

[Figure: Network Request, HTML Element, Script Element]

slide-46
SLIDE 46

Cross-layer Context

Cross-layer interactions
  JS (element) → Network (request)
  Network (request) → HTML (response)

[Figure: Network Request, HTML Element, Script Element]

slide-47
SLIDE 47

Cross-layer Context

Cross-layer interactions
  JS (element) → Network (request)
  Network (request) → HTML (response)
Building cross-layer context

[Figure: Network Request, HTML Element, Script Element]

slide-48
SLIDE 48

Cross-layer Context

Cross-layer interactions
  JS (element) → Network (request)
  Network (request) → HTML (response)
Building cross-layer context
  Easy to link Network with HTML

[Figure: Network Request, HTML Element, Script Element]

slide-49
SLIDE 49

Cross-layer Context

Cross-layer interactions
  JS (element) → Network (request)
  Network (request) → HTML (response)
Building cross-layer context
  Easy to link Network with HTML
  JavaScript activity attribution is tricky

[Figure: Network Request, HTML Element, Script Element]

slide-50
SLIDE 50

JavaScript Attribution

No API to attribute JavaScript to HTML and Network requests

slide-51
SLIDE 51

JavaScript Attribution

No API to attribute JavaScript to HTML and Network requests
Stack Walking [Privacy Badger, OpenWPM]
  Look at stack at points of interest
  Incomplete and evadable, e.g. eval, inline scripts

slide-52
SLIDE 52

JavaScript Attribution

No API to attribute JavaScript to HTML and Network requests
Stack Walking [Privacy Badger, OpenWPM]
  Look at stack at points of interest
  Incomplete and evadable, e.g. eval, inline scripts
Browser Instrumentation [JSGraph ’18]
  Capture events as scripts execute
  Detailed cross-layer interaction

slide-53
SLIDE 53

Chromium Instrumentation

Instrument rendering (Blink) and JavaScript (V8) engines
Build cross-layer context as a graph
  HTML modifications, Network requests, JS attributions

slide-54
SLIDE 54

Chromium Instrumentation

Instrument rendering (Blink) and JavaScript (V8) engines
Build cross-layer context as a graph
  HTML modifications, Network requests, JS attributions

[Figure: example graph. Node types: HTML nodes, Network nodes, Script nodes. Edge types: edges created by HTML parser, edges created by scripts. Annotations: eval attribution to parent script, image attribution to script. Node labels: HTML, Script, Script (eval), Image, Image request, Iframe, Iframe request.]
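
The graph described on this slide can be approximated with a small Python sketch. It assumes typed nodes (`html`, `script`, `network`) and labeled edges that record who created what. The class, method, and node names are hypothetical and are not AdGraph's actual data structures.

```python
from collections import defaultdict


class PageGraphSketch:
    """Toy cross-layer graph: nodes are typed, edges record who created what."""

    def __init__(self):
        self.node_type = {}  # node id -> "html" | "script" | "network"
        self.edges = []      # (source id, target id, edge kind)

    def add_node(self, node_id, kind):
        self.node_type[node_id] = kind

    def add_edge(self, src, dst, kind):
        self.edges.append((src, dst, kind))

    def responsible_scripts(self, node_id):
        """Walk edges backwards and collect every script node upstream of node_id."""
        parents = defaultdict(list)
        for src, dst, _ in self.edges:
            parents[dst].append(src)
        found, stack, seen = [], [node_id], {node_id}
        while stack:
            for p in parents[stack.pop()]:
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
                    if self.node_type[p] == "script":
                        found.append(p)
        return found


g = PageGraphSketch()
for node_id, kind in [("html_root", "html"), ("script1", "script"),
                      ("script_eval", "script"), ("img", "html"),
                      ("img_request", "network")]:
    g.add_node(node_id, kind)

g.add_edge("html_root", "script1", "parser")      # edge created by the HTML parser
g.add_edge("script1", "script_eval", "eval")      # eval attributed to parent script
g.add_edge("script_eval", "img", "script")        # element created by a script
g.add_edge("img", "img_request", "request")       # element triggers a network request

print(g.responsible_scripts("img_request"))
```

Walking backwards from the image request reaches both the eval'd script and the script that called `eval`. This is the kind of attribution the earlier slide says stack walking can miss.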

slide-55
SLIDE 55

AdGraph

ML to automatically learn ad/tracker behavior

Chromium instrumentation → Graph representation → Model training → Classification decision

slide-56
SLIDE 56

Features Extraction

Extract two types of features
  Structural & Content

slide-57
SLIDE 57

Features Extraction

Extract two types of features
  Structural & Content
Structural features capture graph properties

slide-58
SLIDE 58

Features Extraction

Extract two types of features
  Structural & Content
Structural features capture graph properties
  Average degree connectivity

[Figure: fraction of requests (y-axis) against average degree connectivity (x-axis), plotted for Ad & Tracker and Non-Ad & Non-Tracker requests]
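
For readers unfamiliar with the structural feature named above, this Python sketch computes average degree connectivity using the usual graph-theory definition: for each degree k, the mean neighbor degree of nodes that have degree k. Treating the graph as undirected and unweighted is an assumption made here. The talk does not say exactly how AdGraph computes the feature.

```python
from collections import defaultdict


def average_degree_connectivity(edges):
    """Map each degree k to the mean neighbor degree of nodes with degree k.

    `edges` is an iterable of (u, v) pairs, treated as undirected.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    degree = {n: len(nbrs) for n, nbrs in neighbors.items()}

    per_degree = defaultdict(list)
    for n, nbrs in neighbors.items():
        avg_nbr_degree = sum(degree[m] for m in nbrs) / degree[n]
        per_degree[degree[n]].append(avg_nbr_degree)

    return {k: sum(vals) / len(vals) for k, vals in per_degree.items()}


# Star-shaped toy graph: one hub (say, a script) connected to three leaves.
star = [("hub", "a"), ("hub", "b"), ("hub", "c")]
result = average_degree_connectivity(star)
print(result)
```

In the star graph the hub has degree 3 and its neighbors each have degree 1, while each leaf's single neighbor has degree 3.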

slide-59
SLIDE 59

Features Extraction

Extract two types of features
  Structural & Content
Structural features capture graph properties
  Average degree connectivity
Content features capture node properties

https://events.bouncex.net/track.gif/bid_selected?partner=index&deployment=masthead&deal_id=106202001&price=3.50000&auction_number=1&ad_unit_id=26&source=ads&campaignid=917423&agent=user&mode=0&websiteid=340&visitid=1588398576368654&deviceid=2799665660403664656&pageviewid=1&sequenceid=17&clienttimestamp=1588398589360&clientapiversion=tag3&device=d

https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css

slide-60
SLIDE 60

Features Extraction

Extract two types of features
  Structural & Content
Structural features capture graph properties
  Average degree connectivity
Content features capture node properties
  Length of URL

https://events.bouncex.net/track.gif/bid_selected?partner=index&deployment=masthead&deal_id=106202001&price=3.50000&auction_number=1&ad_unit_id=26&source=ads&campaignid=917423&agent=user&mode=0&websiteid=340&visitid=1588398576368654&deviceid=2799665660403664656&pageviewid=1&sequenceid=17&clienttimestamp=1588398589360&clientapiversion=tag3&device=d

https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css

[Figure: fraction of requests (y-axis) against length of URL (x-axis), plotted for Ad & Tracker and Non-Ad & Non-Tracker requests]
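
Content features such as URL length can be read directly off a request node. The Python sketch below is illustrative only. URL length is the feature named on the slide, while the query-parameter count, the keyword list, the tracker-style example URL, and the function name are hypothetical additions.

```python
from urllib.parse import urlparse, parse_qsl

# Hypothetical keyword list, chosen only for illustration.
AD_KEYWORDS = ("ads", "track", "bid", "campaign")


def content_features(url):
    """Extract a few simple per-URL features from a request node."""
    parsed = urlparse(url)
    params = parse_qsl(parsed.query, keep_blank_values=True)
    text = (parsed.path + "?" + parsed.query).lower()
    return {
        "url_length": len(url),                      # feature named on the slide
        "num_query_params": len(params),             # assumed extra feature
        "has_ad_keyword": any(k in text for k in AD_KEYWORDS),  # crude substring test
    }


tracker_like = "https://tracker.example.com/track.gif?campaignid=917423&deviceid=42&price=3.5"
stylesheet = "https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css"

print(content_features(tracker_like))
print(content_features(stylesheet))
```

Substring keyword tests are easy to evade and prone to false matches, which is one reason to combine them with structural features rather than rely on them alone.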

slide-61
SLIDE 61

Machine Learning

Ground truth
  Filter lists – despite shortcomings [Iqbal et al. ’17, Snyder et al. ’20]
  Manual evaluation of disagreements with classifier

slide-62
SLIDE 62

Machine Learning

Ground truth
  Filter lists – despite shortcomings [Iqbal et al. ’17, Snyder et al. ’20]
  Manual evaluation of disagreements with classifier
Random forest classifier
  10-fold cross validation
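
The 10-fold cross validation mentioned above splits the labeled data into ten disjoint folds and holds each fold out once for testing. The Python sketch below shows only the splitting step. The classifier is left abstract, and the function name, seed, and sample count are hypothetical. In practice a library implementation of both the random forest and the split would likely be used.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k near-equal, disjoint folds
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(n_samples=25, k=10))
for train, test in splits[:2]:
    # A model would be fit on `train` and scored on `test` here.
    print(len(train), len(test))
```

Every sample appears in exactly one test fold, so the averaged score reflects performance on data the model did not see during training for that fold.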

slide-63
SLIDE 63

Evaluation: Accuracy

Accuracy is more than 95.33%
  Recall 86.6% – Precision 89.1%

slide-64
SLIDE 64

Evaluation: Accuracy

Accuracy is more than 95.33%
  Recall 86.6% – Precision 89.1%

[Figure: Stock Chromium vs. AdGraph]

slide-65
SLIDE 65

Evaluation: Accuracy

Accuracy is more than 95.33%
  Recall 86.6% – Precision 89.1%
Disagreement analysis with filter lists

slide-66
SLIDE 66

Evaluation: Accuracy

Accuracy is more than 95.33%
  Recall 86.6% – Precision 89.1%
Disagreement analysis with filter lists
  Filter lists under-block due to unknown ads/trackers
    AdGraph detects 43.1% new ads/trackers

[Figure: Filter Lists vs. AdGraph]

slide-67
SLIDE 67

Evaluation: Accuracy

Accuracy is more than 95.33%
  Recall 86.6% – Precision 89.1%
Disagreement analysis with filter lists
  Filter lists under-block due to unknown ads/trackers
    AdGraph detects 43.1% new ads/trackers
  Filter lists over-block due to generic rules
    AdGraph identifies 28.7% over-blocked functional content

[Figure: AdGraph vs. Filter Lists]

slide-68
SLIDE 68

Evaluation: Accuracy

[Figure: Filter Lists]

AdGraph outperforms the current state-of-the-art
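
For reference, accuracy, precision, and recall are standard confusion-matrix ratios. The Python sketch below computes them from hypothetical counts that are not the paper's data. It shows how accuracy can sit well above recall when most requests are not ads or trackers.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (precision, recall, accuracy) from confusion-matrix counts."""
    precision = tp / (tp + fp)                  # of everything blocked, how much was ad/tracker
    recall = tp / (tp + fn)                     # of all ads/trackers, how many were blocked
    accuracy = (tp + tn) / (tp + fp + fn + tn)  # overall share of correct decisions
    return precision, recall, accuracy


# Hypothetical counts for 1,000 requests, 100 of which are ads or trackers.
precision, recall, accuracy = classification_metrics(tp=80, fp=10, fn=20, tn=890)
print(round(precision, 3), round(recall, 3), round(accuracy, 3))
```

With these made-up counts the large number of true negatives lifts accuracy, while precision and recall depend only on how the ad and tracker class is handled.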

slide-69
SLIDE 69

Evaluation: Performance

Real-time ad and tracker blocking with ML
  Instrumentation overhead
  Classification overhead
Page load time comparison (Stock Chromium and Adblock Plus)
  Makes up for overhead through request blocking & less rendering

slide-70
SLIDE 70

Evaluation: Performance

Real-time ad and tracker blocking with ML
  Instrumentation overhead
  Classification overhead
Page load time comparison (Stock Chromium and Adblock Plus)
  Makes up for overhead through request blocking & less rendering
  Faster than Chromium on 42% of websites
    Faster when it blocks more

slide-71
SLIDE 71

Evaluation: Performance

Real-time ad and tracker blocking with ML
  Instrumentation overhead
  Classification overhead
Page load time comparison (Stock Chromium and Adblock Plus)
  Makes up for overhead through request blocking & less rendering
  Faster than Chromium on 42% of websites
    Faster when it blocks more
  Faster than Adblock Plus on 78% of websites
    Avoids rendering overhead

slide-72
SLIDE 72

Evaluation: Performance

Real-time ad and tracker blocking with ML
  Instrumentation overhead
  Classification overhead
Page load time comparison (Stock Chromium and Adblock Plus)
  Makes up for overhead through request blocking & less rendering
  Faster than Chromium on 42% of websites
    Faster when it blocks more
  Faster than Adblock Plus on 78% of websites
    Avoids rendering overhead
  Minor overhead on most websites

slide-73
SLIDE 73

Evaluation: Performance

AdGraph improves page load time

slide-74
SLIDE 74

Key Takeaways

Use cross-layer context to address accuracy issues
Use machine learning to address scalability issues

slide-75
SLIDE 75

Key Takeaways

Use cross-layer context to address accuracy issues
Use machine learning to address scalability issues
Open source implementation

https://uiowa-irl.github.io/AdGraph/

slide-76
SLIDE 76

Key Takeaways

Use cross-layer context to address accuracy issues
Use machine learning to address scalability issues
Open source implementation
Maintained by Brave as PageGraph

https://uiowa-irl.github.io/AdGraph/

slide-77
SLIDE 77

Key Takeaways

Use cross-layer context to address accuracy issues
Use machine learning to address scalability issues
Open source implementation
Maintained by Brave as PageGraph
Filter list generation

https://uiowa-irl.github.io/AdGraph/

slide-78
SLIDE 78

Questions?

Contact details
  Umar Iqbal
  @umarr6
  www.umariqbal.com

Paper link:

https://www.umariqbal.com/papers/adgraph-sp2020.pdf

Source code:

https://uiowa-irl.github.io/AdGraph/

slide-79
SLIDE 79

References

1. Advertising revenue – https://www.iab.com/wp-content/uploads/2019/05/Full-Year-2018-IAB-Internet-Advertising-Revenue-Report.pdf
2. Malvertising – https://www.zdnet.com/article/hackers-have-breached-60-ad-servers-to-load-their-own-malicious-ads/
3. Malvertising – https://www.theguardian.com/technology/2016/mar/16/major-sites-new-york-times-bbc-ransomware-malvertising
4. Slow page load – https://www.nytimes.com/interactive/2015/10/01/business/cost-of-mobile-ads.html
5. OpenWPM – https://github.com/mozilla/OpenWPM
6. Privacy Badger – https://github.com/EFForg/privacybadger
7. Iqbal, Umar, et al. "The ad wars: retrospective measurement and analysis of anti-adblock filter lists." Proceedings of the 2017 Internet Measurement Conference. 2017.
8. Snyder, Peter, et al. "Who filters the filters: Understanding the growth, usefulness and efficiency of crowdsourced ad blocking." SIGMETRICS. 2020.
9. Bhagavatula, Sruti, et al. "Leveraging machine learning to improve unwanted resource filtering." Proceedings of the 2014 Workshop on Artificial Intelligent and Security Workshop. 2014.
10. Gugelmann, David, et al. "An automated approach for complementing ad blockers’ blacklists." Proceedings on Privacy Enhancing Technologies. 2015.
11. Ikram, Muhammad, et al. "Towards seamless tracking-free web: Improved detection of trackers via one-class learning." Proceedings on Privacy Enhancing Technologies. 2017.
12. Wu, Qianru, et al. "A machine learning approach for detecting third-party trackers on the web." European Symposium on Research in Computer Security. Springer, Cham, 2016.
13. Li, Bo, et al. "JSgraph: Enabling Reconstruction of Web Attacks via Efficient Tracking of Live In-Browser JavaScript Executions." NDSS. 2018.
14. Icon made by Pixel perfect from www.flaticon.com