MEERKAT: Detecting Website Defacements through Image-based Object Recognition



SLIDE 1

MEERKAT: Detecting Website Defacements through Image-based Object Recognition

Kevin Borgolte (kevinbo@cs.ucsb.edu), Christopher Kruegel (chris@cs.ucsb.edu), Giovanni Vigna (vigna@cs.ucsb.edu)

seclab, The Computer Security Group at UC Santa Barbara

USENIX Security 2015, August 13th, 2015

SLIDE 2


Defacements

Source: The Register, August 3rd 2015, http://www.theregister.co.uk/2015/08/03/trump_website_hacked/

SLIDE 3


Defacements: Scale

[Figure: Reported Websites per Month. x-axis: Month (2000-01 to 2014-01); y-axis: Reported Websites (ticks from 100 to 1,000,000); series: Defacements, Phishing Pages]

  • Prolific defacers: Team System Dz, 2,800 websites in 10 months (~10/day)
  • Over 4,700 manually verified defacements each day (Zone-H)
  • Defacements to phishing pages: average ~7 to 1, maximum ~33 to 1
  • Over 53,000 websites from top 1 million lists were defaced in 2014

SLIDE 4


Approach

  • Prior work looks at websites’ code, host-based IDS etc.
  • Compares to prior version / known good state
  • MEERKAT: Visually, like a human analyst
  • Render website in browser
  • Take screenshot
  • Does the screenshot look like a defacement?
  • No previous version of website needed
  • No manual feature engineering
SLIDE 5


Approach: Deep Neural Network

[Figure: network architecture. Stages: Screenshot Collection, Window Extraction, Feature Learning, Classification. Labels: 1600x900x3, 160x160x3, 18x18x3; Local Receptive Fields; L2 Pooling; Local Contrast Normalization; Feed-forward with Dropout; outputs: Defaced, Legitimate]
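To make the "L2 Pooling" label in the architecture figure concrete, here is a minimal sketch of the generic operation: each output value is the square root of the sum of squares over a small region of a feature map. The function name `l2_pool`, the pool size, and the stride are assumptions for illustration; the slides do not state the values MEERKAT uses.

```python
import math

def l2_pool(feature_map, size=2, stride=2):
    """Generic L2 pooling over a 2-D list of numbers.

    size and stride are hypothetical defaults, not values from the slides.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, stride):
        out_row = []
        for c in range(0, cols - size + 1, stride):
            # Sum of squares over the size x size region starting at (r, c).
            total = sum(
                feature_map[r + i][c + j] ** 2
                for i in range(size)
                for j in range(size)
            )
            out_row.append(math.sqrt(total))
        pooled.append(out_row)
    return pooled

print(l2_pool([[3, 4], [0, 0]]))
```

Compared with max pooling, which keeps only the largest value in a region, L2 pooling lets every value in the region contribute to the output.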

SLIDE 6


Approach: A Window “Into” The Defacement

  • Full-size screenshots impractical; window “into” defacement instead
  • Size of window?
  • Too large ⇒ overfit (high variance)
  • Too small ⇒ underfit (high bias)
  • Extract window from which part of the screenshot?

[Figure: network architecture (repeated), with labels 1600x900x3 and 160x160x3]

SLIDE 7


Approach: Representative Window Extraction (1)

SLIDE 8


Approach: Representative Window Extraction (2)

  • Sample windows from 2-dimensional Gaussian distribution
  • Bias heavily toward center of page
  • If outside of screenshot, resample
  • Why?
  • Center of page is descriptive!
  • Non-trivial to poison training set
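The sampling procedure on this slide can be sketched as follows. This is a rough illustration, not the authors' implementation: it assumes the Gaussian draw gives the window center, and the function name `sample_window` and the spread parameter `sigma_frac` are hypothetical. Only the 1600x900 screenshot size and the 160x160 window size come from the slides.

```python
import random

def sample_window(width=1600, height=900, win=160, sigma_frac=0.1, rng=random):
    """Return (left, top) of a win x win window that lies inside the screenshot.

    The window center is drawn from a 2-D Gaussian centered on the page.
    sigma_frac (standard deviation as a fraction of each dimension) is an
    assumed value; the slides do not give the variance.
    """
    cx, cy = width / 2, height / 2
    while True:
        x = rng.gauss(cx, sigma_frac * width)
        y = rng.gauss(cy, sigma_frac * height)
        left = int(round(x - win / 2))
        top = int(round(y - win / 2))
        inside = 0 <= left and left + win <= width and 0 <= top and top + win <= height
        if inside:
            return left, top
        # Window would extend past the screenshot: draw again (resample).

rng = random.Random(42)
print([sample_window(rng=rng) for _ in range(3)])
```

Resampling instead of clamping to the border keeps rejected draws from piling up along the edges of the screenshot, so accepted windows stay concentrated near the center.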

SLIDE 9


Approach: Deep Neural Network

  • Feature Learning: Stacked Auto-Encoders
  • Classification: Feed-Forward Neural Network with Dropout
  • Implemented on top of Caffe
  • Trained on GPU, training time in weeks
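For readers unfamiliar with dropout, the following is a minimal sketch of the general technique in plain Python. It is not the Caffe layer or MEERKAT's configuration: it shows the common "inverted" variant, and the drop probability `p=0.5` and the function name `dropout` are assumptions.

```python
import random

def dropout(activations, p=0.5, training=True, rng=random):
    """Inverted dropout on a list of activations.

    During training, each activation is zeroed with probability p and the
    survivors are scaled by 1 / (1 - p), so the expected value is unchanged.
    At test time the activations pass through untouched.
    p = 0.5 is an assumed value, not taken from the slides.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(7)
print(dropout([1.0, 2.0, 3.0, 4.0], rng=rng))
print(dropout([1.0, 2.0, 3.0, 4.0], training=False))
```

Randomly silencing units during training discourages the classifier from relying on any single learned feature, which is one common way to limit overfitting.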
SLIDE 10


Approach: Features Learned

  • Color combinations
  • Green on black? Black on white/bright gray?
  • Letter combinations
  • Broken and mixed encodings
  • Leetspeak
  • “pwned” or “h4x0red”
  • Typographical and grammatical errors
  • “greats to” or “visit us in our website”
  • Defacement group logos
SLIDE 11


Approach: Detection

SLIDE 12


Evaluation

  • Dataset
  • 10,053,772 defaced websites = positives
  • Defaced websites manually verified by Zone-H
  • 2,554,905 legitimate websites = negatives
  • Legitimate websites not verified, might be defaced
  • Dataset skewed toward defacements
  • Report Bayesian detection rate (BDR): P(true positive|positive)
  • Unverified legitimate websites ⇒ TPR & BDR are lower bounds!
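The Bayesian detection rate can be computed from TPR, FPR, and the base rate of defacements with Bayes' rule. The sketch below assumes the base rate is the share of positives in the dataset above, and it plugs in the averaged TPR and FPR reported in the traditional evaluation; the function name is hypothetical, and the result need not match the reported BDR average exactly, since that figure may be averaged per fold.

```python
def bayesian_detection_rate(tpr, fpr, base_rate):
    """P(defaced | flagged as defaced) via Bayes' rule.

    BDR = TPR * P(D) / (TPR * P(D) + FPR * (1 - P(D)))
    """
    flagged_and_defaced = tpr * base_rate
    flagged_and_legitimate = fpr * (1.0 - base_rate)
    return flagged_and_defaced / (flagged_and_defaced + flagged_and_legitimate)

# Dataset sizes from the slide; using their ratio as P(D) is an assumption.
positives = 10_053_772
negatives = 2_554_905
base_rate = positives / (positives + negatives)

bdr = bayesian_detection_rate(tpr=0.97878, fpr=0.01012, base_rate=base_rate)
print(f"base rate = {base_rate:.4f}, BDR = {bdr:.4%}")
```

With these inputs the result is about 99.7%. Because the dataset is skewed toward defacements, the base rate is high, which pushes the BDR up; the same TPR and FPR under a much lower base rate would yield a noticeably lower BDR.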
SLIDE 13


Evaluation: Traditional

  • 10-fold cross-validation
  • Results:
  • TPR: avg. 97.878% [97.422%, 98.375%]
  • FPR: avg. 1.012% [0.547%, 1.419%]
  • BDR: avg. 99.716% [99.603%, 99.845%]
  • Traditional evaluation has problems:
  • Same defacement possibly in two bins
  • Defacements from 1998 vs. 2014
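A plain 10-fold split can be sketched as below. It is a generic illustration with a hypothetical helper, `k_fold_indices`, not the evaluation code behind the numbers above. Because it shuffles samples at random, it shows why the two caveats arise: near-identical defacements can land in different folds, and samples from very different years are mixed.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Samples are shuffled uniformly at random, ignoring duplicates and
    timestamps, which is exactly what the caveats on the slide are about.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

for train, test in k_fold_indices(25, k=5):
    print(len(train), len(test))
```

Each sample is used for testing exactly once and for training in the remaining folds.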
SLIDE 14


Limitations

  • Fingerprinting and delayed defacements
  • Tiny defacements
  • Huge advertisements
  • Concept drift (natural and adversarial)
  • Major: learn new features from new data (no feature engineering)
  • Minor: adjust weights of deeper classification layer
SLIDE 15


Limitations: Minor Concept Drift & Fine-Tuning

  • Train on Dec 2012 to Dec 2013
  • 1.78 million defacements
  • 1.76 million legitimate pages
  • Test on Jan to May 2014
  • 1.54 million samples, 50/50 split
  • Fine-tune Jan, Feb, Mar, Apr
  • BDR in Jan: 98.583%
  • w/o FT drops to 97.177%
  • w/ FT increases to 98.717%
  • Team System Dz started Jan 2014!
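The time-wise split described above can be sketched as a simple date filter. The sample representation (tuples of report date and label), the function name, and the exact day boundaries are assumptions; the slide only names the months.

```python
from datetime import date

def time_wise_split(
    samples,
    train_start=date(2012, 12, 1),
    train_end=date(2013, 12, 31),
    test_start=date(2014, 1, 1),
    test_end=date(2014, 5, 31),
):
    """Split (report_date, label) tuples into train and test sets by date.

    Day-level boundaries are assumed; the slide gives only month ranges.
    """
    train = [s for s in samples if train_start <= s[0] <= train_end]
    test = [s for s in samples if test_start <= s[0] <= test_end]
    return train, test

# Hypothetical samples for illustration only.
samples = [
    (date(2012, 11, 30), "defaced"),
    (date(2013, 6, 15), "legitimate"),
    (date(2013, 12, 31), "defaced"),
    (date(2014, 1, 1), "defaced"),
    (date(2014, 6, 1), "legitimate"),
]
train, test = time_wise_split(samples)
print(len(train), len(test))
```

Unlike a random split, every test sample here is newer than every training sample, so the evaluation reflects how a deployed detector would meet defacements it could not have seen during training.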

[Figure: Time-wise Split, with and without Fine-Tuning. True Positive Rate and False Positive Rate by month of 2014 (January to May), with fine-tuning vs. without fine-tuning; difference (w/ FT - w/o FT) for True Positive Rate and False Positive Rate]

SLIDE 16


Conclusion

  • Introduced MEERKAT
  • Learns features automatically; learned features match domain knowledge
  • Does not require prior version of website
  • Outperforms state of the art
  • Gracefully tackles minor and major concept drift
SLIDE 17

kevinbo@cs.ucsb.edu http://kevin.borgolte.me twitter: @caovc

Thank you for your attention! Questions?
