Semantic based DNS Forensics: Samuel Marchal, Jérôme François, Radu State and Thomas Engel (PowerPoint presentation)

SLIDE 1

samuel.marchal@uni.lu 3/12/12

Semantic based DNS Forensics

Samuel Marchal, Jérôme François, Radu State and Thomas Engel

SLIDE 2

Motivations Semantic analysis Experiments and Results Conclusion

Outline

1 Motivations
2 Semantic analysis
3 Experiments and Results
4 Conclusion

SLIDE 3

Outline, section 1: Motivations

SLIDE 4

DNS misuse

DNS (Domain Name System) is used to support many malicious activities.

[Figure: bots on a compromised host send DNS requests (malwareupdate.com, commandandcontrol.net) to a DNS recursive server, which forwards them to the authoritative DNS server for the malicious domains. The DNS replies (123.45.67.8, 56.7.89.10) lead the bots to the malware update server (123.45.67.8) and the C&C server (56.7.89.10); the phishing web servers are at 76.54.32.1.]

· malware updates
· botnet C&C
· phishing
· backdoor communications
· etc.

SLIDE 5

DNS misuse

[Figure, next animation step: same resolution chain; the DNS replies (76.54.32.1, 56.7.89.10) are used for the request to the C&C server (56.7.89.10) and the connection to the phishing website (76.54.32.1).]

SLIDE 6

DNS for forensics

Why perform DNS analysis for forensic purposes?

◮ find proof of infection (requests for malicious domains)
◮ reduced amount of data to analyse: DNS is a small subset of network traffic
◮ DNS analysis preserves users' anonymity

⇒ useful as a first step before in-depth analysis

SLIDE 7

DNS for forensics

Issue: how do we know whether a domain is malicious?

SLIDE 8

State of the art

Identification of malicious domains:

◮ user reports + manual checking
◮ DNS packet field analysis + classification with machine learning algorithms:
  ◮ once domain records are removed, the data is no longer available
  ⇒ problematic for forensic analysis
◮ domain-name-based analysis:
  ◮ number of domain levels
  ◮ relative position of labels
  ◮ domain length
  ◮ etc.
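As a rough illustration of the domain-name-based features listed on this slide, the sketch below computes two of them from the name alone. `lexical_features` is a hypothetical helper, not code from the cited approaches, and it leaves out "relative position of labels" because the slide does not define that feature.

```python
from __future__ import annotations


def lexical_features(domain: str) -> dict[str, int]:
    """Compute two simple lexical features of a domain name (illustrative only)."""
    name = domain.rstrip(".")  # drop a trailing root dot, if any
    labels = name.split(".")
    return {
        "levels": len(labels),  # number of domain levels (labels)
        "length": len(name),    # total length in characters, dots included
    }


if __name__ == "__main__":
    print(lexical_features("paypal.com-us.webscr.cmd-homeelocale.gumuspena.com"))
```

Features like these only look at the shape of a name, which is what motivates the semantic approach in the next section.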

SLIDE 9

Outline, section 2: Semantic analysis

SLIDE 10

Analysing domain semantics

◮ Domain names are meant to be meaningful
◮ Observation: malicious domains often use words from the same semantic fields:
  ◮ www.visa-sweden.mastercard.forever4c.com
  ◮ myvodafone.vodafone-security-update78.systemknight.com
  ◮ paypal.com-us.webscr.cmd-homeelocale.gumuspena.com
◮ Issue: a single domain is not significant enough
  ⇒ group domains according to common features (IP address, etc.)
◮ Knowing groups of malicious and legitimate domains
  ⇒ deduce whether an unknown group is malicious or not
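A minimal sketch of the grouping idea, assuming DNS records are available as (domain, resolved IP) pairs and that some set-level similarity function is supplied. Both helpers (`group_by_ip`, `label_group`) are hypothetical, and the decision rule (pick the class whose known sets are more similar on average) is an illustrative assumption, not the rule used in this work.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import mean
from typing import Callable, Iterable

# Any set-level similarity, e.g. one of the semantic metrics (hypothetical alias).
SetSim = Callable[[set[str], set[str]], float]


def group_by_ip(records: Iterable[tuple[str, str]]) -> dict[str, set[str]]:
    """Group queried domain names by the IP address they resolved to."""
    groups: defaultdict[str, set[str]] = defaultdict(set)
    for domain, ip in records:
        groups[ip].add(domain.lower().rstrip("."))  # normalise case and trailing root dot
    return dict(groups)


def label_group(unknown: set[str], malicious: list[set[str]],
                legitimate: list[set[str]], set_sim: SetSim) -> str:
    """Assumed rule: pick the class whose known sets are, on average, more similar."""
    mal_score = mean(set_sim(unknown, known) for known in malicious)
    leg_score = mean(set_sim(unknown, known) for known in legitimate)
    return "malicious" if mal_score > leg_score else "legitimate"  # ties -> legitimate


if __name__ == "__main__":
    # Domain/IP pairs taken from the DNS misuse example in the Motivations section.
    records = [("malwareupdate.com", "123.45.67.8"), ("commandandcontrol.net", "56.7.89.10")]
    print(group_by_ip(records))
```

Any set similarity can be plugged into `label_group`; the slides use semantic metrics for this role rather than, say, plain set overlap.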

SLIDE 11

Feature extraction

Splitting a domain name:

myvodafone.vodafone-security-update78.systemknights.com

◮ ‘.’ splitting: myvodafone | vodafone-security-update78 | systemknights
◮ ‘-’ splitting: vodafone-security-update78 → vodafone | security | update78
◮ number extraction: update78 → update | 78
◮ word segmentation: myvodafone → my | vodafone; systemknights → system | knights

◮ distword = {(my, 0.125), (vodafone, 0.25), (security, 0.125), ...}
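A minimal sketch of this splitting pipeline, under several assumptions not stated on the slide: a toy vocabulary (`VOCAB`, hypothetical) stands in for a real dictionary, word segmentation picks the split with the fewest dictionary words, extracted numbers are discarded, and the top-level label (com) is kept as a word. With those choices the example domain yields 8 words, which matches the frequencies shown for my, vodafone and security; the slide does not say which token is the eighth word, so that part is a guess.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical toy vocabulary; a real system would use a full dictionary.
VOCAB = {"my", "vodafone", "security", "update", "system", "knights", "com"}


def segment(token: str, vocab: set[str]) -> list[str]:
    """Split a concatenated token into dictionary words, preferring the fewest words.

    Returns [token] unchanged when no complete segmentation exists.
    """
    n = len(token)
    best: list[list[str] | None] = [None] * (n + 1)  # best[i]: segmentation of token[:i]
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            prefix = best[start]
            word = token[start:end]
            if prefix is not None and word in vocab:
                candidate = prefix + [word]
                current = best[end]
                if current is None or len(candidate) < len(current):
                    best[end] = candidate
    result = best[n]
    return result if result is not None else [token]


def extract_words(domain: str, vocab: set[str]) -> list[str]:
    words: list[str] = []
    for label in domain.lower().rstrip(".").split("."):  # '.' splitting
        for part in label.split("-"):                     # '-' splitting
            for chunk in re.split(r"\d+", part):          # number extraction (digits dropped)
                if chunk:
                    words.extend(segment(chunk, vocab))   # word segmentation
    return words


def distword(words: list[str]) -> dict[str, float]:
    """Relative occurrence frequency of each extracted word."""
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


if __name__ == "__main__":
    words = extract_words("myvodafone.vodafone-security-update78.systemknights.com", VOCAB)
    print(words)
    print(distword(words))
```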

SLIDE 12

Semantic relatedness evaluation

How can we evaluate the semantic similarity between two sets of domain names? ⇒ Start from the similarity between two words, using WordNet or Disco:

◮ compute a similarity score (semantic relatedness) between 2 words
◮ give the n words most related to a word w
◮ based on text corpora (Wikipedia, BNC, PubMed, etc.)

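$$\mathrm{sim}(w_1, w_2) = \frac{\sum_{(r,w) \in T(w_1) \cap T(w_2)} \left( I(w_1, r, w) + I(w_2, r, w) \right)}{\sum_{(r,w) \in T(w_1)} I(w_1, r, w) + \sum_{(r,w) \in T(w_2)} I(w_2, r, w)}$$

where T(w) is the set of (relation, word) pairs (r, w′) for which the mutual information I(w, r, w′) is positive (Lin's similarity measure).

⇒ this word-level metric is used to build new set-level metrics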
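The word-level measure can be sketched directly from its definition, assuming each word's feature set T(w) is given as a mapping from (relation, word) pairs to their mutual information values. In practice these values would come from a resource such as Disco; the function name `lin_sim` and the `Features` alias are hypothetical.

```python
from __future__ import annotations

# Features of a word w: maps each pair (r, w') in T(w) to I(w, r, w').
Features = dict[tuple[str, str], float]


def lin_sim(f1: Features, f2: Features) -> float:
    """Lin-style similarity between two words given their feature maps."""
    shared = f1.keys() & f2.keys()                  # T(w1) ∩ T(w2)
    numerator = sum(f1[k] + f2[k] for k in shared)
    denominator = sum(f1.values()) + sum(f2.values())
    return numerator / denominator if denominator > 0 else 0.0
```

The score is 1 when both words have identical features and 0 when they share none.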

SLIDE 13

Semantic metrics

Three metrics are defined to compare two sets of domains. Given two domain sets A and B, their extracted word sets W_A and W_B with occurrence frequencies distword, and Disco(w, n), the n words most related to w, we have:

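$$\mathrm{Sim}_1(A, B) = \sum_{w_A \in W_A} \sum_{w_B \in W_B} \mathrm{sim}(w_A, w_B)$$

$$\mathrm{Sim}_2(A, B) = \sum_{w_A \in W_A} \sum_{w_B \in W_B} \mathrm{sim}(w_A, w_B) \times \mathrm{distword}_{w_A, W_A} \times \mathrm{distword}_{w_B, W_B}$$

$$\mathrm{Sim}'_3(A, B) = \sum_{w \in W_A} \sum_{w' \in \mathrm{Disco}(w, n)} \mathrm{sim}(w, w') \times \mathrm{distword}_{w', W_B}$$

$$\Rightarrow \mathrm{Sim}_3(A, B) = \mathrm{Sim}'_3(A, B) + \mathrm{Sim}'_3(B, A)$$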
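The three set-level metrics translate into a few lines of Python, assuming each word set is given as its distword mapping (word to relative frequency) and that the word similarity and a related-words lookup (standing in for Disco(w, n)) are passed in as functions. All names are hypothetical, and the sketch treats the distword of w′ in W_B as 0 when w′ does not occur in W_B, which is an assumption.

```python
from __future__ import annotations

from typing import Callable

Dist = dict[str, float]                    # distword: word -> relative frequency
WordSim = Callable[[str, str], float]      # word-level similarity sim(w1, w2)
Related = Callable[[str, int], list[str]]  # stand-in for Disco(w, n)


def sim1(wa: Dist, wb: Dist, sim: WordSim) -> float:
    """Sum of sim over all word pairs; frequencies are ignored."""
    return sum(sim(a, b) for a in wa for b in wb)


def sim2(wa: Dist, wb: Dist, sim: WordSim) -> float:
    """Pairwise sim weighted by both words' occurrence frequencies."""
    return sum(sim(a, b) * wa[a] * wb[b] for a in wa for b in wb)


def sim3_prime(wa: Dist, wb: Dist, sim: WordSim, related: Related, n: int) -> float:
    """Directed part: related words of each w in W_A, weighted by their frequency in W_B."""
    total = 0.0
    for w in wa:
        for w2 in related(w, n):
            total += sim(w, w2) * wb.get(w2, 0.0)  # distword taken as 0 if w2 not in W_B
    return total


def sim3(wa: Dist, wb: Dist, sim: WordSim, related: Related, n: int) -> float:
    """Symmetric Sim3 = Sim3'(A, B) + Sim3'(B, A)."""
    return sim3_prime(wa, wb, sim, related, n) + sim3_prime(wb, wa, sim, related, n)
```

Sim3 only looks at the n related words of each word, so it avoids the full |W_A| × |W_B| pairwise loop of Sim1 and Sim2.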

SLIDE 14

Outline, section 3: Experiments and Results

SLIDE 15

Efficiency of the similarity metrics

Pairwise comparison of domain sets (Sim3(A, B)):

◮ 10 sets of around 13,000 domains each:
  ◮ 5 legitimate (Alexa + passive DNS)
  ◮ 5 malicious (PhishTank, DNS-BH, MDL)

Sim3 between each pair of domain sets (heatmap, colour scale from 0.7 to 1.00):

       leg-5  leg-4  leg-3  leg-2  leg-1  mal-5  mal-4  mal-3  mal-2
mal-1  0.776  0.795  0.793  0.789  0.785  0.955  0.962  0.965  0.975
mal-2  0.782  0.800  0.798  0.797  0.797  0.965  0.968  0.973
mal-3  0.772  0.796  0.793  0.788  0.784  0.951  0.962
mal-4  0.783  0.804  0.804  0.800  0.796  0.953
mal-5  0.769  0.785  0.784  0.782  0.772
leg-1  0.946  0.948  0.952  0.938
leg-2  0.915  0.924  0.922
leg-3  0.936  0.934
leg-4  0.935
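To check the separation visible in the heatmap, the sketch below transcribes its 45 pairwise Sim3 values and splits them into same-class and cross-class pairs. It assumes the flattened heatmap reads with rows mal-1 to leg-4 and columns leg-5 to mal-2, as its labels suggest; `split_by_class` is a hypothetical helper. Under that reading, every same-class score (lowest 0.915) is above every cross-class score (highest 0.804).

```python
from __future__ import annotations

# Column order of the heatmap (assumed from the flattened labels).
COLUMNS = ["leg-5", "leg-4", "leg-3", "leg-2", "leg-1", "mal-5", "mal-4", "mal-3", "mal-2"]

# Lower-triangular Sim3 values transcribed from the slide, one row per set.
ROWS = {
    "mal-1": [0.776, 0.795, 0.793, 0.789, 0.785, 0.955, 0.962, 0.965, 0.975],
    "mal-2": [0.782, 0.800, 0.798, 0.797, 0.797, 0.965, 0.968, 0.973],
    "mal-3": [0.772, 0.796, 0.793, 0.788, 0.784, 0.951, 0.962],
    "mal-4": [0.783, 0.804, 0.804, 0.800, 0.796, 0.953],
    "mal-5": [0.769, 0.785, 0.784, 0.782, 0.772],
    "leg-1": [0.946, 0.948, 0.952, 0.938],
    "leg-2": [0.915, 0.924, 0.922],
    "leg-3": [0.936, 0.934],
    "leg-4": [0.935],
}


def split_by_class(rows: dict[str, list[float]],
                   columns: list[str]) -> tuple[list[float], list[float]]:
    """Split pairwise scores into same-class (mal/mal, leg/leg) and cross-class pairs."""
    same: list[float] = []
    cross: list[float] = []
    for row, values in rows.items():
        for col, value in zip(columns, values):  # each row covers only its first len(values) columns
            same_class = row.split("-")[0] == col.split("-")[0]
            (same if same_class else cross).append(value)
    return same, cross


if __name__ == "__main__":
    same, cross = split_by_class(ROWS, COLUMNS)
    print(f"same-class: n={len(same)}, min={min(same):.3f}")
    print(f"cross-class: n={len(cross)}, max={max(cross):.3f}")
```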

SLIDE 16

Size of domain sets

The similarity metrics are able to distinguish legitimate from malicious sets of domains:

◮ for big sets (13,000 domains): OK!
◮ what is the minimum number of domains a set needs in order to be evaluated?

[Figure: "Value of Sim1 between datasets" against "# of domains in the dataset", for legitimate (leg) and malicious (mal) sets; the plot also carries a Sim3 label.]

SLIDE 17

Outline, section 4: Conclusion

SLIDE 18

Conclusion

Technique for comparing domain sets:

◮ semantic similarity scoring
◮ applied to the identification of malicious domain sets
◮ useful as the first step of a forensic analysis

Results:

◮ able to distinguish malicious from legitimate domains...
◮ ... for sets of at least 10 domains

Future work:

◮ improve the similarity metrics
◮ correlate with IP flow records
