Identifying malvertising domains using lexical clustering
When threat hunting fails
Tucson, January 9th, 2018
Authors: kitty, Matt Foley, David Rodriguez, Dhia Mahjoub
Agenda: Background; Ad Network Profiling and Filtering; Lexical Clustering; Hosting space and top talkers
Diagram labels: victim; compromised site; malvertising (publisher, ad network, staged ad site); EK server.
Step 1: gets lander (proxy).
Visitors, publishers, ad networks, ad exchanges, DSPs, ad agencies, ad servers
Compromised ad network
User visits publisher site.
Publisher site includes ad network JavaScript.
Ad network fingerprints the user and sends them to a malvertisement.
Examples: tech support scam, Rig Exploit Kit, fake Flash/Java update
Diagram labels: publisher site, compromised ad network.
Rotating IPs, choice of region, Squid proxy
Browsing with a DigitalOcean proxy: GET request to the ad network
Ad network returns a 403
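The crawling setup on these slides (rotating IPs, a choice of region, Squid proxies, and an ad network that answers a proxied GET with a 403) might be sketched as follows. The proxy addresses, region labels, and helper names are hypothetical, and reading a 403 as "the ad network filtered this exit IP" is an assumption drawn from the slide, not a documented behaviour.

```python
import itertools
import urllib.error
import urllib.request

# Hypothetical Squid proxies keyed by region (documentation-range IPs).
PROXIES = {
    "us": ["http://192.0.2.10:3128", "http://192.0.2.11:3128"],
    "eu": ["http://198.51.100.20:3128"],
}

def proxy_cycle(region):
    """Return an endless round-robin iterator over one region's proxies."""
    return itertools.cycle(PROXIES[region])

def fetch_status(url, proxy, timeout=10):
    """GET url through the given proxy and return the HTTP status code."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    )
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 403 when the server rejects the request

def is_blocked(status):
    """Assumption: a 403 means the ad network filtered this exit IP."""
    return status == 403
```

A crawler built on this would call `next()` on the cycle before each request and move to another proxy or region whenever `is_blocked` returns true.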
Known keywords: safe, build, click, content, free, apple
Unknown keywords: synonyms, typos
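A minimal sketch of keyword-based hunting, and of why it misses unknown keywords: the known-keyword list comes from the slide, while the variant map (one synonym, two typos) and the function name are hypothetical examples. Anything not in either table goes undetected, which is the gap lexical clustering is meant to close.

```python
# Known keywords, taken from the slide.
KNOWN_KEYWORDS = {"safe", "build", "click", "content", "free", "apple"}

# Hypothetical variants; the slide only names the categories (synonyms, typos).
VARIANTS = {"secure": "safe", "contnet": "content", "aple": "apple"}

def keyword_hits(hostname):
    """Return the known keywords found in a host name, directly or via a variant."""
    name = hostname.lower()
    hits = {kw for kw in KNOWN_KEYWORDS if kw in name}
    hits |= {canon for variant, canon in VARIANTS.items() if variant in name}
    return hits
```

For example, `keyword_hits("contentfreeandsafe4update")` finds `content`, `free`, and `safe`, while a host name built from an unlisted synonym returns an empty set.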
Look for burst in traffic
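One simple way to look for a burst is to compare each interval's query count with the mean of the preceding intervals. This is a sketch under assumptions: the window size, the multiplier, and the function name are illustrative choices, not values from the talk.

```python
from statistics import mean

def find_bursts(counts, window=3, factor=5.0):
    """Return indices where a count exceeds factor times the trailing-window mean."""
    bursts = []
    for i in range(window, len(counts)):
        baseline = mean(counts[i - window:i])
        # Floor the baseline at 1 so near-silent domains do not trigger on noise.
        if counts[i] > factor * max(baseline, 1.0):
            bursts.append(i)
    return bursts
```

With hourly counts `[2, 3, 1, 2, 40, 35, 3]`, only index 4 is flagged: the jump to 40 stands out against a baseline of 2, and the following hour is measured against a baseline that already includes the spike.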
Host name: contentfreeandsafe4update
Trigrams: {'con', 'ont', 'nte', 'ten', 'ent', …, 'ate'}
Trigrams, then MinHash, then LSH
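The trigram, MinHash, and LSH steps can be sketched with the standard library alone. This is an illustration of the general technique, not the authors' implementation: the salted BLAKE2b hash family, the signature length, and the band and row counts are all assumptions.

```python
import hashlib
from collections import defaultdict

def trigrams(hostname):
    """Return the set of 3-character shingles of a host name."""
    return {hostname[i:i + 3] for i in range(len(hostname) - 2)}

def minhash(shingles, num_perm=32):
    """MinHash signature: for each seeded hash, keep the smallest value.

    Assumes at least one shingle (host names of 3+ characters).
    """
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")  # a different salt per hash function
        signature.append(min(
            hashlib.blake2b(s.encode(), digest_size=8, salt=salt).digest()
            for s in shingles
        ))
    return signature

def lsh_candidates(hostnames, bands=16, rows=2):
    """Pair up host names whose signatures agree on at least one band."""
    buckets = defaultdict(set)
    for name in hostnames:
        sig = minhash(trigrams(name), num_perm=bands * rows)
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(name)
    pairs = set()
    for members in buckets.values():
        ordered = sorted(members)
        for i, left in enumerate(ordered):
            for right in ordered[i + 1:]:
                pairs.add((left, right))
    return pairs
```

More rows per band make the match stricter, and more bands make it more permissive, so the two parameters together set the similarity level at which two host names become candidates for the same cluster.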
32
3 domains with a lot of shingles in common:
contentfreeandforupdate
content4freeandsafeupdate
contentfreeandsafe4update
Shared pieces highlighted: and, con, tent, fre, saf, dat
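To make "a lot of shingles in common" concrete, the sketch below computes the trigram sets of the three example domains, their intersection, and the exact Jaccard similarity that MinHash approximates. The helper names are illustrative.

```python
def trigrams(hostname):
    """Return the set of 3-character shingles of a host name."""
    return {hostname[i:i + 3] for i in range(len(hostname) - 2)}

def jaccard(a, b):
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    return len(a & b) / len(a | b)

DOMAINS = [
    "contentfreeandforupdate",
    "content4freeandsafeupdate",
    "contentfreeandsafe4update",
]

shingle_sets = {d: trigrams(d) for d in DOMAINS}

# Trigrams present in all three host names.
common = set.intersection(*shingle_sets.values())

# Pairwise similarity between the first and last example.
similarity = jaccard(shingle_sets[DOMAINS[0]], shingle_sets[DOMAINS[2]])
```

Here the first and last domains share 16 of 28 distinct trigrams, a Jaccard similarity of about 0.57, even though a keyword moved and a digit was inserted.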
Pipeline diagram (example input: goodnewcontentssafe.download). Labels: pipeline, hasher, cluster DB, count-min sketch, out pipeline, analyst dashboard.
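The count-min sketch named in the pipeline is a fixed-size structure for approximate counting over a stream. A minimal version is sketched below; the width, depth, and hash construction are assumptions, and what the pipeline actually counts (domains, clusters, or something else) is not stated on the slide.

```python
import hashlib

class CountMinSketch:
    """Approximate counter: may overestimate a count, never underestimates."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One salted hash per row stands in for independent hash functions.
        salt = row.to_bytes(8, "little")
        digest = hashlib.blake2b(item.encode(), digest_size=8, salt=salt).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # The smallest cell across rows is the one least inflated by collisions.
        return min(
            self.table[row][self._index(item, row)] for row in range(self.depth)
        )
```

Memory stays constant no matter how many distinct items arrive, which is the usual reason to prefer a sketch over an exact dictionary in a streaming pipeline.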
cluster_1: goodnewcontentssafe.download, goodnewfreecontentsload.date, goodnewfreecontentall.trade, ...
cluster_2: call-mlcrosoftnw-err81711102.win, call-mlcrosoftnw-err99817109.win, call-mlcrosoftnw-err81711101.win, ...
cluster_3: artificialintelligencesweden.se, artificialintelligencechip.com, artificialintelligence.net.cm, ...
cluster_4: mkto-sj220048.com, mkto-sj220146.com, mkto-sj220162.com, ...
AS12876, FR
AS14618, Amazon AWS
and more
Some IPs have been actively hosting thousands of domains for months
AS202023, LLHOST, RO: phishing, tech support scams, fake updates, porn
20,000+ user IPs querying 2,000+ malvertising domains
Ranges owned by security companies querying hundreds of domains
Some rogue networks querying hundreds of domains
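Finding top talkers amounts to counting, per client IP, how many distinct flagged domains it queried. The sketch below assumes the query log is available as `(client_ip, domain)` pairs; that input shape, the threshold, and the function name are illustrative.

```python
from collections import defaultdict

def top_talkers(query_log, min_domains=2):
    """Rank client IPs by the number of distinct domains they queried.

    query_log is an iterable of (client_ip, domain) pairs.
    """
    seen = defaultdict(set)
    for client_ip, domain in query_log:
        seen[client_ip].add(domain)
    ranked = sorted(
        ((ip, len(domains)) for ip, domains in seen.items()),
        key=lambda item: (-item[1], item[0]),  # most domains first, then by IP
    )
    return [(ip, n) for ip, n in ranked if n >= min_domains]
```

Clients near the top of the ranking are the ones worth checking against known scanner ranges, since the slide notes that both security companies and rogue networks query hundreds of these domains.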
grep "*.fake.*"
Look for burst in traffic
User IPs, hosting IPs
NLP on misspellings and common typos
Models to categorize clusters
Identifying malicious file hosts using belief propagation
Matt Foley, matfoley@cisco.com
David Rodriguez, davrodr3@cisco.com
Dhia Mahjoub, dmahjoub@cisco.com