When threat hunting fails: Identifying malvertising domains using lexical clustering

Tucson, January 9th, 2018. Authors: Matt Foley, David Rodriguez, Dhia Mahjoub. Agenda: Background; Ad Network Profiling and Filtering; Lexical Clustering.


SLIDE 1

When threat hunting fails

Identifying malvertising domains using lexical clustering

Tucson, January 9th, 2018

SLIDE 2

kitty

Authors

Matt Foley, David Rodriguez, Dhia Mahjoub

SLIDE 3

Agenda

  • Background
  • Ad Network Profiling and Filtering
  • Lexical Clustering
  • Hosting space and top talkers

SLIDE 4

Background

SLIDE 5

Exploit Kits

[Diagram labels: Victim, Publisher, Ad Net., Compromised Site, Staged Site (Ad), Malvertising, EK Server. Step 1: Gets lander (proxy).]

SLIDE 6

What is Malvertising

Visitors, Publishers, Ad Networks, Ad Exchanges, DSPs, Ad Agencies, Ad Servers

SLIDE 7

SLIDE 8

Ad Campaign Flow

  • User visits publisher site
  • Publisher site includes ad network JavaScript
  • Ad network fingerprints and sends user to malvertisement

Examples:

  • Tech support scam
  • Rig Exploit Kit
  • Fake Flash/Java update

[Diagram labels: Publisher Site, Compromised Ad Net.]

SLIDE 9

Exploit Kits

SLIDE 10

Tech Support Scams

SLIDE 11

Fake Flash and Java Updates

SLIDE 12

Ad Network Profiling and Filtering

SLIDE 13

Filtering on non-residential IP Address

SLIDE 14

Proxy Network

  • Rotating IPs
  • Choice of region
  • Squid Proxy

[Diagram label: 403]

SLIDE 15

Filtering on non-residential IP Address

  • Browsing with a DigitalOcean proxy: GET request to the ad network
  • Ad network returns a 403
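The slides do not say how the ad network decides to return a 403. One plausible mechanism, assumed here purely for illustration, is a lookup of the client IP against known hosting-provider ranges. In this sketch the ranges are reserved documentation networks standing in for real provider allocations, and `is_hosting_ip` and `status_for` are hypothetical names.

```python
import ipaddress

# Hypothetical stand-ins for hosting-provider ranges (reserved documentation
# networks, not real VPS allocations).
HOSTING_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_hosting_ip(ip: str) -> bool:
    """Return True if the address falls inside one of the listed ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in HOSTING_RANGES)


def status_for(ip: str) -> int:
    """Mimic the observed behaviour: 403 for hosting IPs, 200 otherwise."""
    return 403 if is_hosting_ip(ip) else 200


print(status_for("192.0.2.15"))   # address inside a listed range
print(status_for("203.0.113.7"))  # address outside the listed ranges
```

A filter of this kind would explain why requests from VPS-hosted proxies are rejected while residential visitors are served.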

SLIDE 16

Attempts with other VPS providers

SLIDE 17

Attempts with other VPS providers

SLIDE 18

SLIDE 19

SLIDE 20

Lexical Clustering

SLIDE 21

Attention to Details

SLIDE 22

Fake Flash and Java Updates

SLIDE 23

SLIDE 24

More or Less Traveled Roads

SLIDE 25

Consider the almighty RegEx

Keywords

  • Known keywords: safe, build, click, content, free, apple
  • Unknown keywords: synonyms, typos

SLIDE 26

Consider the almighty RegEx

grep ".*fake.*"
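Keyword matching of this kind can be sketched in Python. The keyword list is the "known keywords" set from the deck; the pattern construction and the `keyword_hits` helper are illustrative assumptions. The second call shows the limitation the deck points at: a typo-style domain contains none of the known keywords, so a fixed pattern never sees it.

```python
import re

# Known keywords from the slides; building one alternation pattern from them
# is an illustrative choice, not the authors' documented approach.
KNOWN_KEYWORDS = ["safe", "build", "click", "content", "free", "apple"]
KEYWORD_RE = re.compile("|".join(map(re.escape, KNOWN_KEYWORDS)))


def keyword_hits(domain: str) -> list[str]:
    """Return the distinct known keywords found in a domain, sorted."""
    return sorted(set(KEYWORD_RE.findall(domain)))


print(keyword_hits("contentfreeandsafe4update"))         # several keywords match
print(keyword_hits("call-mlcrosoftnw-err81711102.win"))  # no known keyword matches
```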

SLIDE 27

Traffic Pattern of Fake Update Sites

SLIDE 28

Traffic Pattern of Fake Update Sites

Look for burst in traffic
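The deck does not define what counts as a burst. A minimal sketch, assuming hourly query counts per domain and a simple "several times the median" rule, could look like this. The `factor` threshold and the sample counts are made up for illustration.

```python
from statistics import median


def burst_hours(hourly_counts: list[int], factor: float = 5.0) -> list[int]:
    """Return indices of hours whose count exceeds factor times the median."""
    baseline = max(median(hourly_counts), 1)  # avoid a zero baseline
    return [i for i, c in enumerate(hourly_counts) if c > factor * baseline]


# Hypothetical hourly DNS query counts for one domain (made-up numbers).
counts = [2, 3, 1, 2, 250, 310, 4, 2]
print(burst_hours(counts))
```

The median is used instead of the mean so that the burst itself does not inflate the baseline it is compared against.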

SLIDE 29

For one word, many

SLIDE 30

Shingling Fake Flash and Java Update

Host name: contentfreeandsafe4update
Trigrams: {‘con’, ‘ont’, ‘nte’, ‘ten’, ‘ent’, …, ‘ate’}
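Character shingling of a host name can be sketched in a few lines. The function name `shingles` and the handling of hosts shorter than `k` are assumptions for illustration; the example host is the one from the slide.

```python
def shingles(host: str, k: int = 3) -> set[str]:
    """Return the set of overlapping k-character shingles of a host name.

    Hosts shorter than k yield a single shingle containing the whole host.
    """
    return {host[i:i + k] for i in range(max(len(host) - k + 1, 1))}


trigrams = shingles("contentfreeandsafe4update")
print(sorted(trigrams))
```

Because the result is a set, repeated trigrams collapse into one entry, which is what set-similarity measures such as Jaccard expect.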

SLIDE 31

Shingling Fake Flash and Java Update

Host name: contentfreeandsafe4update
Trigrams: {‘con’, ‘ont’, ‘nte’, ‘ten’, ‘ent’, …, ‘ate’}
Then: MinHash, LSH
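The slides name MinHash and LSH without implementation detail. The following is a minimal pure-Python sketch of both steps, not the authors' production code: signatures use salted BLAKE2b hashes in place of random permutations, and candidates are grouped by the standard banding trick. The parameters `num_perm=64` and `bands=16` are arbitrary illustrative choices.

```python
import hashlib
from collections import defaultdict


def shingles(host, k=3):
    """Overlapping k-character shingles; short hosts yield one shingle."""
    return {host[i:i + k] for i in range(max(len(host) - k + 1, 1))}


def minhash(items, num_perm=64):
    """Keep the minimum of each seeded hash over the shingle set."""
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "big")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode(), digest_size=8, salt=salt).digest(),
                "big")
            for s in items))
    return signature


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching positions approximates the Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def lsh_buckets(signatures, bands=16):
    """Group hosts whose signatures agree on every row of at least one band."""
    buckets = defaultdict(set)
    for host, sig in signatures.items():
        rows = len(sig) // bands
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(host)
    return {key: hosts for key, hosts in buckets.items() if len(hosts) > 1}


hosts = ["contentfreeandsafe4update", "content4freeandsafeupdate"]
sigs = {h: minhash(shingles(h)) for h in hosts}
print(estimate_jaccard(sigs[hosts[0]], sigs[hosts[1]]))
print({frozenset(group) for group in lsh_buckets(sigs).values()})
```

Fewer rows per band makes the banding more permissive (more candidate pairs), while more rows per band makes it stricter.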

SLIDE 32

Locality Sensitive Hashing: Fake Flash

3 domains with a lot of shingles in common:

  • contentfreeandforupdate
  • content4freeandsafeupdate
  • contentfreeandsafe4update

Shingles shown: and, con, tent, fre, saf, dat
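To make "a lot of shingles in common" concrete, the exact Jaccard similarity of the three host names can be computed directly from their trigram sets. This is a small sketch using the domains from the slide; the helper names are illustrative, and MinHash/LSH would approximate the same quantity at scale.

```python
from itertools import combinations


def shingles(host, k=3):
    """Overlapping k-character shingles; short hosts yield one shingle."""
    return {host[i:i + k] for i in range(max(len(host) - k + 1, 1))}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    return len(a & b) / len(a | b)


domains = [
    "contentfreeandforupdate",
    "content4freeandsafeupdate",
    "contentfreeandsafe4update",
]
sets = {d: shingles(d) for d in domains}

for left, right in combinations(domains, 2):
    print(left, right, round(jaccard(sets[left], sets[right]), 2))

# Trigrams present in all three host names.
common = set.intersection(*sets.values())
print(sorted(common))
```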

SLIDE 33

On to production

SLIDE 34

Clustering Pipeline Realtime/Batch

[Diagram labels: goodnewcontentssafe.download, pipeline, hasher, Cluster DB, Count-min sketch, Out pipeline, Analyst Dashboard]
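The pipeline diagram includes a count-min sketch but does not say what it counts. A minimal sketch of the data structure is shown below, assuming it is used to keep approximate counts per key (for example, per cluster identifier) in bounded memory. The class, its `width` and `depth` defaults, and the key names are illustrative.

```python
import hashlib


class CountMinSketch:
    """Approximate counter: estimates never undercount, but may overcount."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One salted hash per row plays the role of an independent hash function.
        digest = hashlib.blake2b(
            item.encode(), digest_size=8, salt=row.to_bytes(8, "big")).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions only inflate cells, so the smallest cell is the best guess.
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))


cms = CountMinSketch()
for _ in range(3):
    cms.add("cluster_1")  # hypothetical key: one increment per clustered domain
print(cms.estimate("cluster_1"))
```

Memory use is fixed at `width * depth` counters regardless of how many distinct keys are seen, which suits a real-time stream.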

SLIDE 35

Payday

SLIDE 36

Fake Flash and Java Update Lexical Clustering

cluster_1:
  • goodnewcontentssafe.download
  • goodnewfreecontentsload.date
  • goodnewfreecontentall.trade
  • ...

cluster_2:
  • call-mlcrosoftnw-err81711102.win
  • call-mlcrosoftnw-err99817109.win
  • call-mlcrosoftnw-err81711101.win
  • ...

cluster_3:
  • artificialintelligencesweden.se
  • artificialintelligencechip.com
  • artificialintelligence.net.cm
  • ...

cluster_4:
  • mkto-sj220048.com
  • mkto-sj220146.com
  • mkto-sj220162.com
  • ...

SLIDE 37

We need help

SLIDE 38

Simple Flask App Dashboard

SLIDE 39

Hosting space and top talkers

SLIDE 40

Where are these hosted? Any patterns?

  • Take 1 week’s worth of detections (Jan 1-7) and their hosting space
  • Some hosters are consistently abused
      • AS12876 (FR), AS14618 (Amazon AWS), and more
      • Some IPs have been actively hosting thousands of domains for months
  • Some hosters are highly infested with shady, toxic content; dedicated?
      • AS202023 (LLHOST, RO): phishing, tech support scams, fake updates, porn
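The hosting-space analysis amounts to grouping detected domains by hosting IP and by ASN and ranking the groups. A minimal sketch follows, under the assumption that each detection is already enriched with its hosting IP and ASN. All rows are hypothetical: domains use the reserved `.example` TLD, IPs are documentation addresses, and ASNs come from the private-use range.

```python
from collections import defaultdict

# Hypothetical detections: (domain, hosting IP, ASN). None of this is real data.
DETECTIONS = [
    ("update-a.example", "192.0.2.10", 64512),
    ("update-b.example", "192.0.2.10", 64512),
    ("scam-a.example", "192.0.2.10", 64512),
    ("scam-b.example", "198.51.100.5", 64513),
    ("update-c.example", "203.0.113.9", 64512),
]


def domains_per_key(detections, key_index):
    """Count distinct domains per hosting IP (index 1) or per ASN (index 2)."""
    groups = defaultdict(set)
    for row in detections:
        groups[row[key_index]].add(row[0])
    counts = {key: len(domains) for key, domains in groups.items()}
    # Largest groups first; ties broken by key for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0])))


print(domains_per_key(DETECTIONS, 1))  # per hosting IP
print(domains_per_key(DETECTIONS, 2))  # per ASN
```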

SLIDE 41

Who is querying these domains?

  • Take 1 week’s worth of detections (Jan 1-7) and user IPs
  • 10 busiest hours
      • 20000+ user IPs querying 2000+ malvertising domains
  • Some top talker clusters emerge
      • Ranges owned by security companies querying hundreds of domains
      • Some rogue networks querying hundreds of domains
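The top-talker analysis can be sketched as two aggregations over query logs: the busiest hours by query volume, and the user IPs ranked by how many distinct detected domains they queried. The log layout, function names, and sample rows below are hypothetical (documentation IPs, `.example` domains).

```python
from collections import Counter, defaultdict

# Hypothetical DNS query log rows: (hour bucket, user IP, queried domain).
LOGS = [
    ("2018-01-01T10", "192.0.2.1", "a.example"),
    ("2018-01-01T10", "192.0.2.1", "b.example"),
    ("2018-01-01T10", "192.0.2.2", "a.example"),
    ("2018-01-01T11", "192.0.2.1", "c.example"),
    ("2018-01-01T12", "198.51.100.7", "a.example"),
]


def busiest_hours(logs, n=10):
    """Return the n hour buckets with the most queries."""
    return Counter(hour for hour, _, _ in logs).most_common(n)


def top_talkers(logs, n=5):
    """Rank user IPs by the number of distinct domains they queried."""
    seen = defaultdict(set)
    for _, ip, domain in logs:
        seen[ip].add(domain)
    ranked = sorted(seen.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(ip, len(domains)) for ip, domains in ranked[:n]]


print(busiest_hours(LOGS, 1))
print(top_talkers(LOGS))
```

Counting distinct domains rather than raw queries keeps a single noisy client that hammers one domain from ranking as a top talker.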

SLIDE 42

Summary

SLIDE 43

  • grep ".*fake.*"
  • Look for burst in traffic
  • User IPs, hosting IPs

SLIDE 44

Current and Future Work

  • NLP on misspellings and common typos
  • Models to categorize clusters
  • Identifying malicious file hosts using belief propagation

SLIDE 45

Thank you

Questions?

We are hiring

  • Matt Foley, matfoley@cisco.com
  • David Rodriguez, davrodr3@cisco.com
  • Dhia Mahjoub, dmahjoub@cisco.com