seclab Detecting Website Defacements through Image-based Object - PowerPoint PPT Presentation

Meerkat: Detecting Website Defacements through Image-based Object Recognition
Kevin Borgolte (kevinbo@cs.ucsb.edu), Christopher Kruegel (chris@cs.ucsb.edu), Giovanni Vigna (vigna@cs.ucsb.edu)
seclab, THE COMPUTER SECURITY GROUP AT UC SANTA BARBARA
August 13th, 2015, USENIX Security 2015

Defacements
Source: The Register, August 3rd 2015, http://www.theregister.co.uk/2015/08/03/trump_website_hacked/
Kevin Borgolte, Meerkat: Detecting Website Defacements through Image-based Object Recognition

Defacements: Scale
• Prolific defacers: Team System Dz, 2,800 websites in 10 months (~10/day)
• Over 4,700 manually-verified defacements each day (Zone-H)
• Defacements to Phishing Pages
  • Average: ~7 to 1
  • Maximum: ~33 to 1
• Over 53,000 websites from top 1 million lists were defaced in 2014
[Figure: Reported Websites per Month, defacements vs. phishing pages, 2000-01 to 2014-01; y-axis: reported websites, 100 to 1,000,000]

Approach
• Prior work looks at websites’ code, host-based IDS, etc.
  • Compares to prior version / known good state
• Meerkat: Visually, like a human analyst
  • Render website in browser
  • Take screenshot
  • Does the screenshot look like a defacement?
• No previous version of website needed
• No manual feature engineering

Approach: Deep Neural Network
[Figure: pipeline with the stages Screenshot Collection (1600x900x3), Window Extraction (160x160x3), Feature Learning (local receptive fields, 18x18x3; L2 pooling; local contrast normalization), and Classification (feed-forward with dropout); output: Defaced or Legitimate]

Approach: A Window “Into” The Defacement
[Figure: network diagram showing the 1600x900x3 screenshot, the 160x160x3 window, 18x18x3 local receptive fields, L2 pooling, local contrast normalization, and feed-forward with dropout; output: Defaced or Legitimate]
• Full-size screenshots impractical; window “into” defacement instead
• Size of window?
  • Too large ⇒ overfit (high variance)
  • Too small ⇒ underfit (high bias)
• Extract window from which part of the screenshot?

Approach: Representative Window Extraction (1)

Approach: Representative Window Extraction (2)
• Sample windows from 2-dimensional Gaussian distribution
  • Bias heavily toward center of page
  • If outside of screenshot, resample
• Why?
  • Center of page is descriptive!
  • Non-trivial to poison training set
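The sampling step on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the screenshot and window sizes (1600x900 and 160x160) come from the slides, while the function name `sample_window` and the standard deviation (`sigma_frac`, a fraction of each dimension) are assumptions made only for the example.

```python
import math
import random


def sample_window(width=1600, height=900, size=160, sigma_frac=0.15, rng=random):
    """Return (left, top) of a size x size window.

    The window center is drawn from a 2-dimensional Gaussian centered on the
    middle of the screenshot. sigma_frac is an assumed value, not taken from
    the slides. Windows that do not fit inside the screenshot are resampled.
    """
    center_x, center_y = width / 2, height / 2
    while True:
        x = rng.gauss(center_x, sigma_frac * width)
        y = rng.gauss(center_y, sigma_frac * height)
        left = math.floor(x - size / 2)
        top = math.floor(y - size / 2)
        inside = 0 <= left and left + size <= width and 0 <= top and top + size <= height
        if inside:
            return left, top
        # Otherwise: the window falls outside the screenshot, so resample.


if __name__ == "__main__":
    rng = random.Random(2015)
    print([sample_window(rng=rng) for _ in range(3)])
```

Because the window position is random, someone who wants to poison the training set cannot know in advance which part of a page will be seen, which is presumably what the last bullet refers to.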

Approach: Deep Neural Network
• Feature Learning: Stacked Auto-Encoders
• Classification: Feed-Forward Neural Network with Dropout
• Implemented on top of Caffe
• Trained on GPU, training time in weeks
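To make the classification stage concrete, here is a small plain-Python sketch of a feed-forward classifier with dropout. It does not use Caffe and is not the network from the talk: the single hidden layer, the ReLU activation, the 0.5 dropout probability, and all function names are assumptions chosen for illustration.

```python
import math
import random


def dropout(values, p_drop, rng, training=True):
    """Inverted dropout: during training, zero each activation with
    probability p_drop and rescale the survivors so the expected value
    stays the same. At test time the activations pass through unchanged."""
    if not training or p_drop == 0.0:
        return list(values)
    keep = 1.0 - p_drop
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def dense(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def classify(features, w_hidden, b_hidden, w_out, b_out,
             rng=random, training=False, p_drop=0.5):
    """Return an estimated probability that the input window is a defacement."""
    hidden = [max(0.0, h) for h in dense(features, w_hidden, b_hidden)]  # ReLU (assumed)
    hidden = dropout(hidden, p_drop, rng, training)
    logit = dense(hidden, [w_out], [b_out])[0]
    return 1.0 / (1.0 + math.exp(-logit))
```

Dropout is only active while training; at detection time `training=False` makes the output deterministic.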

Approach: Features Learned
• Color combinations
  • Green on black? Black on white/bright gray?
• Letter combinations
  • Broken and mixed encodings
• Leetspeak
  • “pwned” or “h4x0red”
• Typographical and grammatical errors
  • “greats to” or “visit us in our website”
• Defacement group logos

Approach: Detection

Evaluation
• Dataset
  • 10,053,772 defaced websites = positives
    • Defaced websites manually verified by Zone-H
  • 2,554,905 legitimate websites = negatives
    • Legitimate websites not verified, might be defaced
• Dataset skewed toward defacements
  • Report Bayesian detection rate (BDR): P(true positive|positive)
• Unverified legitimate websites ⇒ TPR & BDR are lower bounds!

Evaluation: Traditional
• 10-fold cross-validation
• Results:
  • TPR: avg. 97.878% [97.422%, 98.375%]
  • FPR: avg. 1.012% [0.547%, 1.419%]
  • BDR: avg. 99.716% [99.603%, 99.845%]
• Traditional evaluation has problems:
  • Same defacement possibly in two bins
  • Defacements from 1998 vs. 2014
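The Bayesian detection rate is the probability that a page flagged as defaced really is defaced. A minimal sketch, assuming the usual Bayes' rule form and taking the base rate from the dataset sizes on the Evaluation slide; the function name is made up for the example. Plugging in the average TPR and FPR gives a value close to the reported BDR, but it need not match exactly, since the reported figure is an average over folds.

```python
def bayesian_detection_rate(tpr, fpr, base_rate):
    """P(defaced | flagged as defaced), computed with Bayes' rule.

    tpr: true positive rate, fpr: false positive rate,
    base_rate: fraction of all samples that are defaced.
    """
    true_alerts = tpr * base_rate
    false_alerts = fpr * (1.0 - base_rate)
    return true_alerts / (true_alerts + false_alerts)


# Dataset sizes and average rates as stated on the slides.
positives = 10_053_772
negatives = 2_554_905
base_rate = positives / (positives + negatives)

bdr = bayesian_detection_rate(tpr=0.97878, fpr=0.01012, base_rate=base_rate)
print(f"base rate: {base_rate:.4f}, BDR: {bdr:.5f}")
```

The base rate matters: with the same TPR and FPR but a much rarer positive class, the BDR drops sharply, which is why a dataset skewed toward defacements calls for reporting it explicitly.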

Limitations
• Fingerprinting and delayed defacements
• Tiny defacements
• Huge advertisements
• Concept drift (natural and adversarial)
  • Major: learn new features from new data (no feature engineering)
  • Minor: adjust weights of deeper classification layer
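Handling minor concept drift by adjusting only the classification layer can be illustrated as follows. This is a sketch under strong assumptions, not the procedure from the talk: the lower layers are treated as a frozen feature extractor, the classification layer is reduced to a single logistic unit, and the function name, learning rate, and epoch count are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fine_tune_output_layer(weights, bias, samples, lr=0.1, epochs=50):
    """Stochastic gradient descent on log-loss, updating only the final layer.

    samples: list of (features, label) pairs, where features are the outputs
    of the frozen lower layers and label is 1 (defaced) or 0 (legitimate).
    Returns the adjusted (weights, bias); the feature extractor is untouched.
    """
    weights = list(weights)
    for _ in range(epochs):
        for features, label in samples:
            logit = sum(w * x for w, x in zip(weights, features)) + bias
            error = sigmoid(logit) - label  # gradient of log-loss w.r.t. the logit
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


if __name__ == "__main__":
    # Hypothetical feature vectors for newly observed pages.
    new_samples = [([1.0, 0.0], 1), ([0.0, 1.0], 0)]
    print(fine_tune_output_layer([0.0, 0.0], 0.0, new_samples))
```

Updating only the last layer is cheap compared to relearning features, which fits the slide's distinction between minor and major drift.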

Limitations: Minor Concept Drift & Fine-Tuning
• Train on Dec 2012 to Dec 2013
  • 1.78 million defacements
  • 1.76 million legitimate pages
• Test on Jan to May 2014
  • 1.54 million samples, 50/50 split
• Fine-tune Jan, Feb, Mar, Apr
• BDR in Jan: 98.583%
  • w/o fine-tuning (FT) drops to 97.177%
  • w/ FT increases to 98.717%
• Team System Dz started Jan 2014!
[Figure: Time-wise Split, with and without Fine-Tuning. Panels: True Positive Rate, False Positive Rate, and Difference (w/ FT - w/o FT) for both rates; x-axis: Month of 2014, January to May]
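A time-wise split trains only on samples observed before the test period, so the same defacement cannot leak from the future into training. A minimal sketch with hypothetical records; the date ranges mirror the slide, but the exact day boundaries, the record layout, and the function name are assumptions.

```python
from datetime import date


def time_wise_split(samples, train_start, train_end, test_start, test_end):
    """Split (observed_on, label) records by date; both ranges are inclusive."""
    train = [s for s in samples if train_start <= s[0] <= train_end]
    test = [s for s in samples if test_start <= s[0] <= test_end]
    return train, test


# Hypothetical records, for illustration only.
samples = [
    (date(2012, 11, 30), "defaced"),     # before the training range, unused
    (date(2013, 6, 1), "legitimate"),
    (date(2013, 12, 31), "defaced"),
    (date(2014, 1, 15), "defaced"),
    (date(2014, 5, 31), "legitimate"),
    (date(2014, 6, 1), "defaced"),       # after the test range, unused
]

train, test = time_wise_split(
    samples,
    train_start=date(2012, 12, 1), train_end=date(2013, 12, 31),
    test_start=date(2014, 1, 1), test_end=date(2014, 5, 31),
)
print(len(train), "training samples,", len(test), "test samples")
```

Fine-tuning month by month would then repeat this split with the test window moved forward and the previous month added to the data used for adjustment.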

Conclusion
• Introduced Meerkat
• Learns features automatically; the learned features match domain knowledge
• Does not require prior version of website
• Outperforms state of the art
• Gracefully tackles minor and major concept drift

Thank you for your attention! Questions?
kevinbo@cs.ucsb.edu
http://kevin.borgolte.me
twitter: @caovc
