deSEO: Combating Search-Result Poisoning


slide-1
SLIDE 1

deSEO: Combating Search-Result Poisoning

John P. John, Fang Yu, Yinglian Xie, Arvind Krishnamurthy, Martin Abadi

University of Washington & MSR, Silicon Valley

slide-2
SLIDE 2

The malware pipeline

  • find vulnerable web servers
  • compromise web servers and host malicious content
  • spread malicious links via email, IM, search results
  • bad stuff

slide-3
SLIDE 3

The malware pipeline

  • Malware links spread through:
      • spam emails, spam IMs, social networks, search results, etc.
  • We look at search results


slide-4
SLIDE 4
slide-5
SLIDE 5

Is this really a problem?

  • ~40% of popular searches contain at least one malicious link in top results
  • Scareware fraud made $150 million in profit last year

slide-7
SLIDE 7

Contributions

  • How does the search poisoning attack work?
  • What can we learn about such attacks?
  • How can we defend against them?
  • examined a live attack involving 5,000 compromised sites
  • identified common features in search poisoning attacks
  • developed deSEO, which detected new live SEO attacks on 1,000+ domains
slide-8
SLIDE 8

Anatomy of SEO attack

Diagram components: search engine, redirection server, exploit server, compromised Web server

slide-9
SLIDE 9

Anatomy of SEO attack

search query

Diagram components: search engine, redirection server, exploit server, compromised Web server

slide-14
SLIDE 14

Analysis of an attack

  • Examine a specific attack
  • August - October 2010
  • 5,000 compromised domains
  • Tens of thousands of compromised keywords
  • Millions of SEO pages generated
slide-15
SLIDE 15

How are servers compromised?

  • Sites running osCommerce
  • Unpatched vulnerabilities
  • Allows attackers to host any file on the Web server, including executables

www.example.com/admin/file_manager.php/login.php?action=processuploads

slide-19
SLIDE 19

What files are uploaded?

  • php shell to manage file operations
  • HTML templates, images
  • php script to generate SEO web pages
slide-20
SLIDE 20

The main php script

www.example.com/images/page.php?page=kobayashi+arrested

slide-21
SLIDE 21

The main php script

www.example.com/images/page.php?page=kobayashi+arrested

kobayashi arrested

slide-22
SLIDE 22

The main php script

  • Obfuscated script
  • Simple encryption using nested evals

www.example.com/images/page.php?page=kobayashi+arrested

slide-27
SLIDE 27

The main script (de-obfuscated)

  • Check if search crawler
  • Generate page for keyword
  • Fetch: snippets from Google, images from Bing
  • Add links to other compromised sites
  • Cache page
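The control flow listed on this slide can be modeled in a few lines. This is a minimal Python sketch for analysis purposes, not the attackers' PHP script: the names `CRAWLER_TOKENS`, `is_search_crawler`, `handle_request`, and `build_page` are hypothetical, the User-Agent test is an assumed stand-in for whatever check the real script performs, and the `"REDIRECT"` branch for ordinary visitors is an assumption based on the redirection server in the attack anatomy.

```python
from typing import Callable, Dict

# Hypothetical crawler markers; the slides do not show the script's real check.
CRAWLER_TOKENS = ("googlebot", "bingbot")


def is_search_crawler(user_agent: str) -> bool:
    """Return True when the User-Agent header looks like a search crawler."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def handle_request(keyword: str, user_agent: str, cache: Dict[str, str],
                   build_page: Callable[[str], str]) -> str:
    """Model the listed steps: crawler check, page generation, caching."""
    if not is_search_crawler(user_agent):
        # Assumption: ordinary visitors are sent toward the redirection server.
        return "REDIRECT"
    if keyword not in cache:
        # build_page stands in for fetching snippets and images and adding
        # links to other compromised sites; no network access happens here.
        cache[keyword] = build_page(keyword)
    return cache[keyword]


cache: Dict[str, str] = {}
page = handle_request("kobayashi arrested", "Googlebot/2.1", cache,
                      lambda kw: f"<h1>{kw}</h1>")
print(page)
```

Passing `build_page` in as a parameter keeps the sketch free of network calls, so the branching can be studied on its own.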

slide-28
SLIDE 28

Dense link structure

  • Other compromised domains found by crawling included links

  • Each site linked to 200 other sites
  • ~5,000 compromised domains identified
  • Each site hosted 8,000 SEO pages
  • 40 million pages total
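Discovering further compromised domains by following the links embedded in SEO pages amounts to a graph traversal. The sketch below uses a breadth-first search over a toy link map; `discover_domains` and the `*.example` domains are hypothetical, and the link map is supplied directly rather than obtained by crawling. The last line checks the slide's arithmetic (5,000 domains times 8,000 pages each).

```python
from collections import deque
from typing import Dict, List, Set


def discover_domains(seed: str, links: Dict[str, List[str]]) -> Set[str]:
    """Breadth-first traversal from one known compromised domain."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        domain = queue.popleft()
        for neighbor in links.get(domain, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Toy link structure; real SEO pages linked to about 200 other sites each.
toy_links = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["a.example", "d.example"],
    "c.example": [],
    "d.example": ["a.example"],
}
found = discover_domains("a.example", toy_links)

# 5,000 domains x 8,000 SEO pages per domain = 40 million pages.
total_pages = 5_000 * 8_000
```

Because each site links to many others, a single seed is usually enough to reach most of a densely linked set.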
slide-33
SLIDE 33

Poisoned keywords

  • 20,000+ popular search terms poisoned
  • Google Trends + Bing related searches
      • haiti earthquake
      • senate elections
      • veterans day 2010
      • halloween 2010
      • thanksgiving 2010 ...
  • 95% of Google Trends keywords poisoned
slide-35
SLIDE 35

Redirection servers

  • Three domains used for redirection
  • Over 1,000 exploit URLs fetched

Almost 100,000 victims over 10 weeks

Figure: number of victim visits by date

slide-36
SLIDE 36

Evasive techniques

  • Why can’t redirection behavior be easily detected?

  • Cloaking
  • Requiring user interaction
  • Redirection through javascript or flash
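One naive way to probe for cloaking is to request the same URL under a crawler and a browser User-Agent and compare the responses. The sketch below is illustrative only: `looks_cloaked`, the two User-Agent strings, and the injected `fetch` callable are assumptions, and a plain equality test would misfire on any page with dynamic content. It also does nothing about the other two techniques listed (required user interaction, JavaScript or Flash redirection), which is part of why the redirection is hard to detect.

```python
from typing import Callable

# Assumed header values, chosen for illustration.
CRAWLER_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def looks_cloaked(url: str, fetch: Callable[[str, str], str]) -> bool:
    """Return True when crawler and browser receive different content.

    fetch(url, user_agent) is supplied by the caller, so this sketch
    performs no network access by itself.
    """
    return fetch(url, CRAWLER_UA) != fetch(url, BROWSER_UA)


def fake_fetch(url: str, user_agent: str) -> str:
    """Toy server: shows an SEO page to crawlers, a redirect stub to others."""
    if "Googlebot" in user_agent:
        return "<html>keyword-rich SEO page</html>"
    return "<html>redirecting...</html>"


print(looks_cloaked("http://www.example.com/images/page.php?page=a+b", fake_fetch))
```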
slide-37
SLIDE 37

What are prominent features in search poisoning?

  • Dense link structure
  • Automatic generation of relevant pages
  • Large number of pages with popular keywords
  • Behavior of compromised sites
      • before - diverse content and behavior
      • after - similar content and behavior
slide-39
SLIDE 39

deSEO steps

  • 1. History-based filtering: select domains where many new pages are set up, different from older pages
  • 2. Clustering suspicious domains: using K-means++
  • 3. Group similarity analysis: select groups where new pages are similar across domains
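Step 1 can be illustrated with a small filter over per-domain URL histories. This is a sketch under stated assumptions, not deSEO's actual implementation: `history_filter`, the `min_new_pages` threshold, and the rule "a page is new if its path was never seen on that domain" are all hypothetical simplifications of "many new pages, different from older pages".

```python
from typing import Dict, List
from urllib.parse import urlsplit


def history_filter(history: Dict[str, List[str]],
                   recent: Dict[str, List[str]],
                   min_new_pages: int = 3) -> List[str]:
    """Flag domains with many recent URLs whose path never appeared before."""
    flagged = []
    for domain, urls in recent.items():
        old_paths = {urlsplit(u).path for u in history.get(domain, [])}
        new_pages = [u for u in urls if urlsplit(u).path not in old_paths]
        if len(new_pages) >= min_new_pages:  # assumed threshold
            flagged.append(domain)
    return sorted(flagged)


history = {
    "shop.example": ["http://shop.example/index.php?id=1"],
    "blog.example": ["http://blog.example/post.php?id=1"],
}
recent = {
    "shop.example": [
        "http://shop.example/images/page.php?page=a+b",
        "http://shop.example/images/page.php?page=c+d",
        "http://shop.example/images/page.php?page=e+f",
    ],
    "blog.example": [
        "http://blog.example/post.php?id=2",
        "http://blog.example/post.php?id=3",
        "http://blog.example/post.php?id=4",
    ],
}
suspicious = history_filter(history, recent)
```

In this toy data only `shop.example` is flagged: its new URLs use a script path absent from its history, while `blog.example` keeps publishing under its usual path.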

slide-40
SLIDE 40

Sample web URLs with trendy keywords

http://www.askania-fachmaerkte.de/images/news.php?page=justin+bieber+breaks+neck
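The trendy keywords in such a URL sit in the query string, with `+` as the separator. A short sketch using Python's standard `urllib.parse` shows how they can be pulled out; the helper name `query_keywords` is hypothetical.

```python
from typing import List
from urllib.parse import parse_qs, urlsplit


def query_keywords(url: str) -> List[str]:
    """Return the words found in all query-string values of a URL."""
    query = parse_qs(urlsplit(url).query)  # parse_qs decodes '+' as a space
    words: List[str] = []
    for values in query.values():
        for value in values:
            words.extend(value.split())
    return words


sample = ("http://www.askania-fachmaerkte.de/images/news.php"
          "?page=justin+bieber+breaks+neck")
print(query_keywords(sample))
```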

slide-41
SLIDE 41

Sample web URLs with trendy keywords

History-based detection

slide-42
SLIDE 42

History-based detection

Domain clustering

  • lexical features of URLs

  • String features: keyword separators, arguments, filename, path
  • Numerical features: number of arguments, length of arguments, length of keywords
  • Bag of words: set of keywords
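The three feature families can be computed from a URL with the standard library alone. The function below is a sketch: `lexical_features`, the dictionary keys, and the choice to treat every non-alphanumeric character inside an argument value as a keyword separator are assumptions for illustration, not the feature definitions used by deSEO.

```python
import re
from typing import Any, Dict
from urllib.parse import urlsplit


def lexical_features(url: str) -> Dict[str, Any]:
    """Extract string, numerical, and bag-of-words features from a URL."""
    parts = urlsplit(url)
    path, _, filename = parts.path.rpartition("/")
    names, values = [], []
    for pair in parts.query.split("&"):
        if pair:
            name, _, value = pair.partition("=")
            names.append(name)
            values.append(value)
    # Assumed rule: any non-alphanumeric character in a value is a separator.
    separators = sorted({ch for v in values for ch in v if not ch.isalnum()})
    keywords = [w for v in values for w in re.split(r"[^A-Za-z0-9]+", v) if w]
    return {
        # String features
        "separators": separators,
        "arguments": names,
        "filename": filename,
        "path": path or "/",
        # Numerical features
        "num_arguments": len(names),
        "argument_lengths": [len(v) for v in values],
        "keyword_lengths": [len(w) for w in keywords],
        # Bag of words
        "bag_of_words": set(keywords),
    }


features = lexical_features(
    "http://www.askania-fachmaerkte.de/images/news.php"
    "?page=justin+bieber+breaks+neck"
)
```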

Sample web URLs with trendy keywords

slide-43
SLIDE 43

History-based detection

Domain clustering

  • lexical features of URLs
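The deSEO steps name K-means++ as the clustering method. What distinguishes K-means++ from plain K-means is its seeding: each new center is drawn with probability proportional to its squared distance from the nearest center already chosen. The sketch below shows only that seeding step in pure Python; `kmeanspp_seeds` and the two-dimensional toy vectors (imagined as, say, number of arguments and keyword length) are hypothetical, and a full clustering run would follow the seeding with the usual K-means iterations.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seeds(points: List[Point], k: int, rng: random.Random) -> List[Point]:
    """Pick k initial centers using K-means++ weighting."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Weight of each point: squared distance to its nearest chosen center.
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            # All points coincide with existing centers; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


toy_vectors = [(1.0, 25.0), (1.0, 24.0), (3.0, 5.0), (3.0, 6.0)]
seeds = kmeanspp_seeds(toy_vectors, k=2, rng=random.Random(0))
```

A point that is already a center has weight zero, so it is not picked a second time while other points remain.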

Group analysis

  • web page feature similarity
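Group analysis asks whether the new pages in a cluster of domains look alike. One simple way to express that is the mean pairwise Jaccard similarity of per-domain page feature sets. This is a sketch, assuming features are sets of strings such as HTML tag names; `jaccard`, `group_similarity`, and the toy data are hypothetical, and the slides do not specify which similarity measure deSEO uses.

```python
from itertools import combinations
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_similarity(pages: Dict[str, Set[str]]) -> float:
    """Mean pairwise Jaccard similarity across the domains in one group."""
    pairs = list(combinations(pages.values(), 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


seo_group = {
    "a.example": {"div", "img", "a", "h1"},
    "b.example": {"div", "img", "a", "h1"},
    "c.example": {"div", "img", "a", "p"},
}
benign_group = {
    "x.example": {"table", "form"},
    "y.example": {"video", "nav"},
}
seo_score = group_similarity(seo_group)
benign_score = group_similarity(benign_group)
```

A group would then be selected when its score exceeds some threshold, which would have to be chosen empirically.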

Sample web URLs with trendy keywords

slide-45
SLIDE 45

History-based detection

Domain clustering

  • lexical features of URLs

Group analysis

  • web page feature similarity

Figures (two plots): axes labeled "Fraction of pages" and "% of URLs"

Sample web URLs with trendy keywords

slide-46
SLIDE 46

History-based detection

Domain clustering

  • lexical features of URLs

Group analysis

  • web page feature similarity

Regular expressions

  • to match URLs not in our sample

.*\/xmlrpc\.php\/\?showc=\w+(\+\w+)+$
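The expression above can be tried directly with Python's `re` module. In this sketch the pattern is copied from the slide, while the test URLs on `blog.example` are made up for illustration.

```python
import re

# Pattern as shown on the slide: a path ending in xmlrpc.php/, a showc
# argument, and at least two words joined by '+'.
PATTERN = re.compile(r".*\/xmlrpc\.php\/\?showc=\w+(\+\w+)+$")


def matches_attack_url(url: str) -> bool:
    """Return True when the URL has the shape captured by the expression."""
    return PATTERN.match(url) is not None


candidates = [
    "http://blog.example/xmlrpc.php/?showc=justin+bieber+breaks+neck",
    "http://blog.example/xmlrpc.php/?showc=single",
    "http://blog.example/index.php?showc=two+words",
]
for url in candidates:
    print(url, matches_attack_url(url))
```

Only the first candidate matches: the second has a single keyword with no `+`, and the third uses a different script path.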

Sample web URLs with trendy keywords

slide-47
SLIDE 47

deSEO findings

  • 11 malicious groups from sampled web graph in January 2011
      • 957 domains
      • 15,482 URLs
  • Revealed a new search poisoning attack
      • compromised WordPress installations
      • cloaking to avoid detection
      • different link topology
slide-48
SLIDE 48

Applying to search results

  • 120 keyword searches in Google and Bing
  • 163 malicious URLs detected in results
  • 43 search terms affected

Figure: number of malicious links by search result page

slide-49
SLIDE 49

Conclusion

  • Malware and SEO are big problems
  • Analyzed an ongoing scareware campaign
  • Identified thousands of compromised domains
  • Identified prominent features in SEO attacks and used them to build deSEO
  • Promising results on a partial dataset from Bing
  • Identified multiple live SEO attacks
slide-50
SLIDE 50

Thank You

jjohn@cs.washington.edu