Needle in a Haystack: Tracking Down Elite Phishing Domains in the Wild

slide-1
SLIDE 1

Needle in a Haystack: Tracking Down Elite Phishing Domains in the Wild

Ke Tian, Steve T.K. Jan, Hang Hu, Danfeng Yao, Gang Wang
Computer Science, Virginia Tech

slide-2
SLIDE 2

Phishing is a Big Threat

  • Phishing: a fraudulent attempt to obtain credentials (e.g., passwords)
  • Big threat: an estimated $30M loss in 2017 [1]
  • Exploiting the human factor is easier than exploiting system vulnerabilities

  • The Yahoo data breach in 2014 affected 500 million Yahoo! user accounts
  • Ubiquiti Networks lost $46.7M to scammers in 2015

[1] Internet Crime Report, FBI, 2017.

slide-3
SLIDE 3
Some Phishing Websites are Easy to Tell

  • Phishing is a long-standing problem
  • Good news: some phishing websites are easy to detect

Example URLs:

  • http://178.128.85.7/banks/National
  • http://account-updates-center-service.beedoces.com.br (URL not related to PayPal: phishing)

slide-4
SLIDE 4
Some Phishing Websites are Easy to Tell

  • Phishing is a long-standing problem
  • Good news: some phishing websites are easy to detect

Example URLs:

  • http://178.128.85.7/banks/National (URL does not include a domain name: phishing)
  • http://account-updates-center-service.beedoces.com.br
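The two easy-to-tell cases can be checked with a few lines of code. The sketch below is only an illustration, not the authors' tool: the function name `url_red_flags` and the choice of `paypal.com` as the brand domain are assumptions made for the example.

```python
import ipaddress
from urllib.parse import urlsplit


def url_red_flags(url, brand_domain):
    """Return a list of simple warning signs for a URL claiming to be a brand."""
    host = urlsplit(url).hostname or ""
    flags = []
    try:
        # Succeeds only when the host is a literal IP address.
        ipaddress.ip_address(host)
        flags.append("host is a raw IP address, not a domain name")
    except ValueError:
        # A domain name: check whether it is the brand domain or a subdomain.
        if host != brand_domain and not host.endswith("." + brand_domain):
            flags.append("host is unrelated to " + brand_domain)
    return flags


if __name__ == "__main__":
    for url in (
        "http://178.128.85.7/banks/National",
        "http://account-updates-center-service.beedoces.com.br",
        "https://www.paypal.com/signin",
    ):
        print(url, url_red_flags(url, "paypal.com"))
```

Checks like these catch only the crude cases; a squatting domain that looks like the brand needs more than this.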

slide-5
SLIDE 5

More Sophisticated Phishing Example

  • This is an IDN (Internationalized Domain Name) homograph attack
  • Homograph domain squatting: exploits the fact that many different characters look alike

Example: http://www.apple.com vs. http://www.apple.com (different characters)

slide-6
SLIDE 6

More Sophisticated Phishing Example

  • This is an IDN (Internationalized Domain Name) homograph attack
  • Homograph domain squatting: exploits the fact that many different characters look alike

Example: http://www.apple.com vs. http://www.apple.com (different characters)
Example: http://get.adoḅe.com/es/flashplayer
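One way to make a homograph visible is to list every non-ASCII character in the host name and show the ASCII (Punycode) form of each label. The sketch below is a simplified illustration: the function name `non_ascii_report` is an assumption, and it uses Python's raw `punycode` codec without the normalization steps of full IDNA processing.

```python
import unicodedata
from urllib.parse import urlsplit


def non_ascii_report(url):
    """Return (ascii_host, [(char, unicode_name), ...]) for the URL's host."""
    host = urlsplit(url).hostname or ""
    # Every character outside ASCII is a candidate look-alike.
    suspicious = [
        (ch, unicodedata.name(ch, "UNKNOWN")) for ch in host if ord(ch) > 127
    ]
    labels = []
    for label in host.split("."):
        if label.isascii():
            labels.append(label)
        else:
            # Simplified: real IDNA also normalizes the label first.
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels), suspicious


if __name__ == "__main__":
    # \u1e05 is the dotted "b" from the adobe example.
    print(non_ascii_report("http://get.ado\u1e05e.com/es/flashplayer"))
    print(non_ascii_report("http://www.apple.com"))
```

A host that is entirely ASCII comes back unchanged with an empty list, so the report is non-empty only when something in the name is not what it appears to be.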

slide-7
SLIDE 7

How can we systematically capture these sophisticated phishing websites in practice?

slide-8
SLIDE 8

This Study

  • We focus on squatting phishing domains
      • Web contents: phishing content, mimicking real websites
      • Domain name: “squatting” domain that impersonates popular brands
  • Research questions
      • How can squatting phishing domains be systematically detected in practice?
      • What types of impersonation/evasion techniques do they use?
      • How effective are existing blacklists at detecting them?
  • Large-scale empirical measurements
      • Search over 224 million DNS records
      • 702 popular brands

slide-9
SLIDE 9

Outline

  • Introduction
  • Detection methodology
      • Detect squatting domains
      • Detect phishing pages under squatting domains
  • Measuring squatting-based phishing
  • Conclusion

slide-10
SLIDE 10

Detection Methodology

  • Our detection methodology is based on a series of filtering steps

  1. Input: 224,810,532 DNS records and 702 popular brands
  2. Squatting domain detection: 657,663 squatting domains
  3. Phishing classifier: 1,741 phishing pages
  4. Manual check: confirmed phishing pages on web: 857; on mobile: 908

slide-11
SLIDE 11

Detect Squatting Domains

  • Goal: detect squatting domains that impersonate brands
  • Given a brand, search for squatting domains in DNS
  • Capture five types of squatting domains

Examples for the target domain facebook.com:

  1. Homograph: looks similar to the target domain (faceb00k.com)
  2. Bits: flips a bit of the target domain (facebnok.com)
  3. Typo: mimics a mistyped version of the target domain (fcaebook.com)
  4. Combo: connects the target domain with other strings (facebook-stroty.com)
  5. WrongTLD: uses a different TLD from the target domain (facebook.audi)
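The five squatting types can be sketched as a small classifier that labels a candidate domain against a brand domain. This is a minimal illustration and not the released tool: the look-alike table, the helper names, and the restriction to simple `name.tld` domains are all assumptions.

```python
from itertools import product
import string

ALLOWED = set(string.ascii_lowercase + string.digits + "-")
# Tiny illustrative look-alike table (assumption); a real one also covers Unicode.
LOOKALIKES = {"o": "0", "l": "1", "i": "1", "e": "3"}


def homographs(name):
    """Variants where one or more characters are swapped for a look-alike."""
    options = [(ch, LOOKALIKES[ch]) if ch in LOOKALIKES else (ch,) for ch in name]
    return {"".join(p) for p in product(*options)} - {name}


def bit_flips(name):
    """Variants that differ from `name` by one flipped bit in one character."""
    out = set()
    for i, ch in enumerate(name):
        for bit in range(8):
            flipped = chr(ord(ch) ^ (1 << bit))
            if flipped in ALLOWED:
                out.add(name[:i] + flipped + name[i + 1:])
    return out


def typos(name):
    """Single omissions, single repeated characters, and adjacent swaps."""
    out = set()
    for i in range(len(name)):
        out.add(name[:i] + name[i + 1:])        # omission
        out.add(name[:i] + name[i] + name[i:])  # repetition
        if i + 1 < len(name):
            out.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])  # swap
    return out - {name}


def squat_type(domain, brand_domain):
    """Label `domain` with a squatting type relative to `brand_domain`, or None."""
    name, _, tld = domain.partition(".")
    brand, _, brand_tld = brand_domain.partition(".")
    if name == brand:
        return "wrongTLD" if tld != brand_tld else None
    if name in homographs(brand):
        return "homograph"
    if name in bit_flips(brand):
        return "bits"
    if name in typos(brand):
        return "typo"
    if brand in name:
        return "combo"
    return None


if __name__ == "__main__":
    for d in ("faceb00k.com", "facebnok.com", "fcaebook.com",
              "facebook-stroty.com", "facebook.audi", "facebook.com"):
        print(d, squat_type(d, "facebook.com"))
```

In a large-scale search, each name in the DNS records would be passed through a check like `squat_type` for every brand; generating the variant sets once per brand and reusing them would avoid repeating the work.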

slide-12
SLIDE 12

Detect Squatting Domains

  • 224,810,532 DNS records → 657,663 squatting domains
  • Crawl the web and mobile versions of pages that are still alive
      • Dynamic crawler: it can load JavaScript and follow redirections
  • 6,115 squatting domains (1.7%) are redirected to the original brand
      • Some businesses purchase squatting domains to protect their own customers

Example: pricelin.com (squatting domain) redirects to priceline.com (original brand)

slide-13
SLIDE 13

Phishing Classifier

  • Goal: classify phishing pages under squatting domains
  • Ground truth data:
      • 1,731 phishing pages from PhishTank (manually confirmed)
      • 1,565 benign pages from squatting domains (manually confirmed)
  • Our classifier is motivated by observations on evasion techniques:
      1. Layout obfuscation
      2. String obfuscation
      3. Code obfuscation

slide-14
SLIDE 14

Layout Obfuscation

  • Change the style/color/layout of the target brand website
  • Evade screenshot-similarity-based detection methods

Figure: the target brand page next to phishing websites, labeled "Detected by existing methods" and "Not detected by existing methods".

slide-15
SLIDE 15

String/Code Obfuscation

  • Hide important text and keywords in the HTML source code
  • Evade keyword-similarity-based or source-code-similarity-based detection

Target brand HTML: <title> Log in to your PayPal </title>
Phishing HTML: <title> Log in to your PayPal </title> (detected by keyword-similarity-based methods)
String obfuscation: <title> Log in to your PayPa1 </title>
Code obfuscation: <script> String.fromCharCode(50) + “a” + ….

slide-16
SLIDE 16

Our Design

  • Intuition 1: phishing pages will be visually displayed to users
  • Extract keywords from their screenshots with OCR
  • Tesseract OCR: extract keywords from images

Figure: screenshot → Google OCR → keyword list (Paypol, Email, passward, ……) → NLTK spell check → keyword list (Paypal, Email, password, ……)
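The spell-check step after OCR can be illustrated without any OCR engine. The sketch below starts from a list of words that OCR has already produced and maps each onto a small vocabulary. It uses Python's standard `difflib` in place of the NLTK spell check named on the slide, and the vocabulary, the function name, and the 0.75 cutoff are assumptions chosen for the example.

```python
import difflib

# Hypothetical vocabulary of keywords of interest (assumption).
VOCABULARY = ["paypal", "email", "password", "login", "account"]


def correct_keywords(ocr_words, vocabulary=VOCABULARY, cutoff=0.75):
    """Map each OCR word to its closest vocabulary entry, if close enough."""
    corrected = []
    for word in ocr_words:
        lowered = word.lower()
        match = difflib.get_close_matches(lowered, vocabulary, n=1, cutoff=cutoff)
        # Keep the word unchanged when nothing in the vocabulary is similar.
        corrected.append(match[0] if match else lowered)
    return corrected


if __name__ == "__main__":
    print(correct_keywords(["Paypol", "Email", "passward"]))
```

Because the keywords come from the rendered screenshot rather than the HTML source, a misread such as "Paypol" can still be tied back to the brand name.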

slide-17
SLIDE 17

Our Design Cont.

  • Intuition 2: phishing pages contain forms to collect user credentials
  • Extract keywords from HTML forms
  • Use text-based features from the source code as a complement
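Form keyword extraction can be sketched with Python's standard HTML parser. This is an illustrative example, not the authors' feature extractor: the class name `FormKeywordParser`, the attributes it collects, and the sample login form are assumptions.

```python
from html.parser import HTMLParser


class FormKeywordParser(HTMLParser):
    """Collect visible text and input attributes that appear inside <form>."""

    def __init__(self):
        super().__init__()
        self.in_form = 0
        self.keywords = []
        self.has_password_field = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.in_form += 1
        elif self.in_form and tag == "input":
            if (attrs.get("type") or "").lower() == "password":
                self.has_password_field = True
            for key in ("name", "placeholder", "value"):
                if attrs.get(key):
                    self.keywords.append(attrs[key].lower())

    def handle_endtag(self, tag):
        if tag == "form" and self.in_form:
            self.in_form -= 1

    def handle_data(self, data):
        # Text between tags (labels, button captions) inside a form.
        if self.in_form and data.strip():
            self.keywords.append(data.strip().lower())


SAMPLE = """
<form action="/login.php" method="post">
  <label>Email</label>
  <input type="text" name="login_email" placeholder="Email address">
  <input type="password" name="login_password">
  <button type="submit">Log In</button>
</form>
"""

if __name__ == "__main__":
    parser = FormKeywordParser()
    parser.feed(SAMPLE)
    print(parser.keywords, parser.has_password_field)
```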

slide-18
SLIDE 18

Ground Truth Evaluation

  • Feed features to machine learning classifiers
      • Image (OCR) features, form features, text-based features
      • Naive Bayes, KNN, and Random Forest
  • Results of 10-fold cross-validation:

Classifier      False Positive   False Negative   AUC
Naive Bayes     0.5              0.05             0.64
KNN             0.04             0.1              0.92
Random Forest   0.03             0.06             0.97

Random Forest is highly accurate
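The slide does not define the False Positive and False Negative columns. The sketch below assumes they are the usual rates (false positives over all benign pages, false negatives over all phishing pages) and shows one simple way to split the data into 10 folds. The function names and the toy labels are illustrative assumptions, and no classifier is trained here.

```python
def error_rates(y_true, y_pred):
    """Return (false_positive_rate, false_negative_rate); label 1 = phishing."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    negatives = sum(1 for t in y_true if t == 0)
    positives = sum(1 for t in y_true if t == 1)
    return fp / negatives, fn / positives


def k_fold_indices(n, k=10):
    """Yield (train_idx, test_idx) pairs; fold sizes differ by at most one."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


if __name__ == "__main__":
    # Toy labels: 100 benign pages (0) followed by 100 phishing pages (1).
    y_true = [0] * 100 + [1] * 100
    # Toy predictions with 3 false positives and 6 false negatives.
    y_pred = [1] * 3 + [0] * 97 + [0] * 6 + [1] * 94
    print(error_rates(y_true, y_pred))
    print(sum(len(test) for _, test in k_fold_indices(len(y_true))))
```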

slide-19
SLIDE 19

Outline

  • Introduction
  • Detection methodology
      • Detect squatting domains
      • Detect phishing pages under squatting domains
  • Measuring squatting-based phishing
  • Conclusion

slide-20
SLIDE 20

Detection in Practice

  1. DNS records: 224,810,532; popular brands: 702
  2. Squatting domains: 657,663
  3. Detected phishing pages: 1,741
  4. Confirmed phishing pages on web: 857; on mobile: 908; combined: 1,175

Breakdown of confirmed phishing pages: web only: 267; both mobile and web: 590; mobile only: 318

  • Squatting phishing websites do exist
  • More phishing websites appear on mobile
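The confirmed counts are consistent with one another, as the short check below shows. It uses only the numbers reported on this slide; the variable names are illustrative.

```python
# Counts taken from the slide.
web_only, both, mobile_only = 267, 590, 318

web = web_only + both                    # pages confirmed on web
mobile = mobile_only + both              # pages confirmed on mobile
total = web_only + both + mobile_only    # distinct confirmed pages

assert web == 857
assert mobile == 908
assert total == 1175
# Inclusion-exclusion gives the same total.
assert web + mobile - both == total
```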

slide-21
SLIDE 21

Can Current Blacklists Detect Them?

  • Run 70+ phishing blacklists, including PhishTank, eCrimeX, VirusTotal

Figure: bar chart of the number of pages (axis 0 to 1,200) for PhishTank, VirusTotal, eCrimeX, and pages that evaded blacklists. Annotations: "Over 90% live over a month" and "Reported them".

Existing blacklists/tools are not yet capable of capturing squatting phishing

slide-22
SLIDE 22

Squatting Domain Types

  • Combo squatting domains contain the largest number of phishing pages
  • Bits and homograph squatting domains: hard to register

Figure: bar chart of the number of pages per squatting type (Homograph, Bits, Typo, Combo, WrongTLD), for web and mobile.

slide-23
SLIDE 23

Example Study: Uber

  • Attackers steal Uber truck drivers’ accounts

Target domain: freight.uber.com
Squatting domain: go-uberfreight.com

slide-24
SLIDE 24

Example Study: Office 365

  • Attackers compromise users’ Office 365 accounts

Target domain: office365.com
Squatting domain: outlook-office365.net

slide-25
SLIDE 25

Conclusion

  • An extensive measurement of squatting phishing domains
      • From 224,810,532 DNS records and 700+ brands
      • Detected and identified 1,175 squatting phishing pages
      • Open-sourced our tool at: https://github.com/SquatPhish
  • Future work
      • Adversarial attacks on OCR-based phishing detection
      • Deploy the system for long-term measurement

slide-26
SLIDE 26

Thank You

slide-27
SLIDE 27

APPENDIX

slide-28
SLIDE 28

Evasions in Squatting Phishing

  • Layout obfuscation: average Hamming distance of 28.5
  • String obfuscation: adopted by 68%
  • Code obfuscation: adopted by 35%

Obfuscation is common in squatting phishing.
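The slide does not say what the Hamming distance is measured over. The sketch below assumes it compares fixed-length image hashes of two screenshots, represented here as integers, and counts the bits that differ. The function names and the sample values are illustrative.

```python
def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Number of bit positions in which two integer hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def average_hamming(pairs):
    """Mean Hamming distance over (hash_a, hash_b) pairs."""
    distances = [hamming_distance(a, b) for a, b in pairs]
    return sum(distances) / len(distances)


if __name__ == "__main__":
    # 0b1011 and 0b0010 differ in two bit positions.
    print(hamming_distance(0b1011, 0b0010))
    print(average_hamming([(0b1011, 0b0010), (0xFF, 0x0F)]))
```

Under this reading, a larger distance means the phishing page looks less like the brand page it imitates, which is what layout obfuscation aims for.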

slide-29
SLIDE 29

IP Location

  • Checked the geolocation of 1,021 IP addresses, hosted in 53 different countries
  • The U.S. hosts the most sites, followed by Germany

slide-30
SLIDE 30

False Positive Prediction

http://paypal.me