
Meerkat: Detecting Website Defacements through Image-based Object Recognition



  1. Meerkat: Detecting Website Defacements through Image-based Object Recognition
     Kevin Borgolte (kevinbo@cs.ucsb.edu), Christopher Kruegel (chris@cs.ucsb.edu), Giovanni Vigna (vigna@cs.ucsb.edu)
     seclab, The Computer Security Group at UC Santa Barbara
     USENIX Security 2015, August 13th, 2015

  2. Defacements
     Source: The Register, August 3rd 2015, http://www.theregister.co.uk/2015/08/03/trump_website_hacked/
     Slide footer: Kevin Borgolte, Meerkat: Detecting Website Defacements through Image-based Object Recognition

  3. Defacements: Scale
     • Prolific defacers: Team System Dz, 2,800 websites in 10 months (~10/day)
     • Over 4,700 manually-verified defacements each day (Zone-H)
     • Defacements to Phishing Pages
       • Average: ~7 to 1
       • Maximum: ~33 to 1
     • Over 53,000 websites from top 1 million lists were defaced in 2014
     [Figure: Reported Websites per Month, Defacements vs. Phishing Pages. x-axis: Month, 2000-01 to 2014-01. y-axis: Reported Websites, logarithmic, 100 to 1,000,000.]

  4. Approach
     • Prior work looks at websites’ code, host-based IDS, etc.
       • Compares to prior version / known good state
     • Meerkat: Visually, like a human analyst
       • Render website in browser
       • Take screenshot
       • Does the screenshot look like a defacement?
     • No previous version of website needed
     • No manual feature engineering

  5. Approach: Deep Neural Network
     [Figure: network pipeline. Stages: Screenshot Collection, Window Extraction, Feature Learning, Classification. Labels: 1600x900x3 screenshot, 160x160x3 window, 18x18x3 local receptive fields, L2 pooling, local contrast normalization, feed-forward with dropout. Outputs: Defaced, Legitimate.]

  6. Approach: A Window “Into” The Defacement
     [Figure: the network pipeline again, with the 1600x900x3 screenshot and the 160x160x3 window highlighted.]
     • Full-size screenshots impractical; window “into” defacement instead
     • Size of window?
       • Too large ⇒ overfit (high variance)
       • Too small ⇒ underfit (high bias)
     • Extract window from which part of the screenshot?

  7. Approach: Representative Window Extraction (1)

  8. Approach: Representative Window Extraction (2)
     • Sample windows from 2-dimensional Gaussian distribution
       • Bias heavily toward center of page
       • If outside of screenshot, resample
     • Why?
       • Center of page is descriptive!
       • Non-trivial to poison training set
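The sampling rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Meerkat's implementation: the function name `sample_window`, the spread `sigma_frac`, and the use of independent Gaussians per axis (a diagonal covariance) are hypothetical choices; only the 1600x900 screenshot and 160x160 window sizes come from the slides.

```python
import random


def sample_window(width, height, window=160, sigma_frac=0.1,
                  rng=random, max_tries=10_000):
    """Return (left, top) of a window x window crop near the page center.

    sigma_frac is an assumed spread (fraction of each dimension); the
    slides do not state the variance that was actually used.
    """
    if window > width or window > height:
        raise ValueError("window does not fit inside the screenshot")
    cx, cy = width / 2, height / 2
    for _ in range(max_tries):
        # Draw the window center from a 2-D Gaussian centered on the page
        # (independent per axis, which is an assumption).
        x = rng.gauss(cx, sigma_frac * width)
        y = rng.gauss(cy, sigma_frac * height)
        left = int(round(x - window / 2))
        top = int(round(y - window / 2))
        # Resample if any part of the window falls outside the screenshot.
        if (0 <= left and left + window <= width
                and 0 <= top and top + window <= height):
            return left, top
    raise RuntimeError("no valid window found")


if __name__ == "__main__":
    print(sample_window(1600, 900, rng=random.Random(42)))
```

Rejection sampling keeps the draw simple: clamping out-of-range windows to the border instead would pile probability mass onto the edges of the page, which the resampling step avoids.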

  9. Approach: Deep Neural Network
     • Feature Learning: Stacked Auto-Encoders
     • Classification: Feed-Forward Neural Network with Dropout
     • Implemented on top of Caffe
     • Trained on GPU, training time in weeks
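For readers unfamiliar with dropout, the following is a minimal pure-Python sketch of the general technique. It does not reproduce Meerkat's Caffe configuration: the drop probability of 0.5 and the "inverted" scaling of surviving units are assumptions, and the function name `dropout` is hypothetical.

```python
import random


def dropout(activations, p_drop=0.5, training=True, rng=random):
    """Inverted dropout over a list of activations.

    p_drop=0.5 is an assumed value; the slides do not give the rate.
    """
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    if not training or p_drop == 0.0:
        # At inference time the layer passes activations through unchanged.
        return list(activations)
    keep = 1.0 - p_drop
    # Zero each unit with probability p_drop and scale the survivors by
    # 1/keep so the expected activation stays the same.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


if __name__ == "__main__":
    print(dropout([0.2, 1.5, 0.7, 3.1], rng=random.Random(7)))
```

Randomly silencing units during training discourages the classifier from relying on any single learned feature, which is the usual motivation for the technique.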

  10. Approach: Features Learned
      • Color combinations
        • Green on black? Black on white/bright gray?
      • Letter combinations
        • Broken and mixed encodings
      • Leetspeak
        • “pwned” or “h4x0red”
      • Typographical and grammatical errors
        • “greats to” or “visit us in our website”
      • Defacement group logos

  11. Approach: Detection

  12. Evaluation
      • Dataset
        • 10,053,772 defaced websites = positives
          • Defaced websites manually verified by Zone-H
        • 2,554,905 legitimate websites = negatives
          • Legitimate websites not verified, might be defaced
      • Dataset skewed toward defacements
        • Report Bayesian detection rate (BDR): P(true positive|positive)
      • Unverified legitimate websites ⇒ TPR & BDR are lower bounds!

  13. Evaluation: Traditional
      • 10-fold cross-validation
      • Results:
        • TPR: avg. 97.878% [97.422%, 98.375%]
        • FPR: avg. 1.012% [0.547%, 1.419%]
        • BDR: avg. 99.716% [99.603%, 99.845%]
      • Traditional evaluation has problems:
        • Same defacement possibly in two bins
        • Defacements from 1998 vs. 2014
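The Bayesian detection rate can be related to the TPR, the FPR, and the class balance of the dataset through Bayes' rule. The sketch below reads BDR as P(defaced | flagged as defaced) and plugs the averaged TPR and FPR from this slide into the dataset sizes from the previous one. Using the averages rather than per-fold rates is a simplification, so the result should land near the reported average BDR without matching it exactly. The function name is hypothetical.

```python
def bayesian_detection_rate(tpr, fpr, base_rate):
    """P(defaced | alert) by Bayes' rule.

    base_rate is P(defaced), the fraction of positives in the data.
    """
    true_alerts = tpr * base_rate            # P(alert and defaced)
    false_alerts = fpr * (1.0 - base_rate)   # P(alert and legitimate)
    return true_alerts / (true_alerts + false_alerts)


# Dataset sizes and averaged rates as given on the slides.
positives = 10_053_772   # defaced websites
negatives = 2_554_905    # legitimate websites
base_rate = positives / (positives + negatives)
bdr = bayesian_detection_rate(tpr=0.97878, fpr=0.01012, base_rate=base_rate)

if __name__ == "__main__":
    print(f"base rate: {base_rate:.4f}, BDR: {bdr:.4%}")
```

Because the dataset is skewed toward defacements, the base rate is high and so is the BDR. Calling the same function with a smaller `base_rate` shows how quickly the BDR falls for the same TPR and FPR, which is why the metric is reported together with the class balance.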

  14. Limitations
      • Fingerprinting and delayed defacements
      • Tiny defacements
      • Huge advertisements
      • Concept drift (natural and adversarial)
        • Major: learn new features from new data (no feature engineering)
        • Minor: adjust weights of deeper classification layer

  15. Limitations: Minor Concept Drift & Fine-Tuning
      • Train on Dec 2012 to Dec 2013
        • 1.78 million defacements
        • 1.76 million legitimate pages
      • Test on Jan to May 2014
        • 1.54 million samples, 50/50 split
      • Fine-tune Jan, Feb, Mar, Apr
      • BDR in Jan: 98.583%
        • w/o FT drops to 97.177%
        • w/ FT increases to 98.717%
      • Team System Dz started Jan 2014!
      [Figure: Time-wise Split, with and without Fine-Tuning. Panels: True Positive Rate, False Positive Rate, and Difference (w/ FT - w/o FT) for both rates. x-axis: Month of 2014, January to May.]
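A time-wise split differs from cross-validation in that every training sample predates every test sample, so the same defacement cannot end up on both sides. A minimal sketch follows, assuming each sample carries an observation date and that the month ranges are inclusive. The helper name `timewise_split` and the toy samples are hypothetical and only illustrate the mechanics.

```python
from datetime import date


def timewise_split(samples, train_start, train_end, test_start, test_end):
    """Split (day, label) pairs by date; ranges are inclusive (assumed)."""
    train = [s for s in samples if train_start <= s[0] <= train_end]
    test = [s for s in samples if test_start <= s[0] <= test_end]
    return train, test


# Hypothetical toy samples: (observation date, label).
samples = [
    (date(2012, 12, 5), "defaced"),
    (date(2013, 6, 1), "legitimate"),
    (date(2013, 12, 31), "defaced"),
    (date(2014, 1, 15), "defaced"),
    (date(2014, 5, 20), "legitimate"),
    (date(2014, 6, 2), "defaced"),  # outside both ranges, so it is dropped
]

# Train on Dec 2012 to Dec 2013, test on Jan to May 2014, as on the slide.
train, test = timewise_split(
    samples,
    train_start=date(2012, 12, 1), train_end=date(2013, 12, 31),
    test_start=date(2014, 1, 1), test_end=date(2014, 5, 31),
)

if __name__ == "__main__":
    print(len(train), "training samples,", len(test), "test samples")
```

Fine-tuning as described on the slide would then repeat this per month: evaluate on one month, update the classifier with that month's data, and move on to the next.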

  16. Conclusion
      • Introduced Meerkat
        • Learns features automatically; the learned features match domain knowledge
        • Does not require prior version of website
        • Outperforms state of the art
        • Gracefully tackles minor and major concept drift

  17. Thank you for your attention! Questions?
      kevinbo@cs.ucsb.edu
      http://kevin.borgolte.me
      twitter: @caovc
      seclab, The Computer Security Group at UC Santa Barbara
