Faster C&C detection - strategies for finding algorithmically generated domain names


Malgorzata Debska, CERT Polska, September 22, 2015


slide-1
SLIDE 1

faster c&c detection - strategies for finding algorithmically generated domain names

Malgorzata Debska September 22, 2015

CERT Polska

slide-2
SLIDE 2

list of topics

  • Introduction - what is DGA?
  • Malicious usage in botnets
  • Benign DGA - false alarms in detection systems
  • Current detection techniques - classification
  • Challenges and conclusion

slide-3
SLIDE 3

introduction - what is dga?

slide-4
SLIDE 4

algorithmically generated domain names

slide-5
SLIDE 5

differences in generated domains:

  • randomness of characters
  • character set
  • frequency distribution of character usage
  • length of generated domains
  • domain level at which names are generated
  • set of top-level domains used

slide-11
SLIDE 11

examples

  • tinba-dga: jmqvlmmbred2e.com, fg4zstnd3ftwh.net, qeh2p2u9pd3i1.com, pttthldqrdt.net
  • dircrypt: mhrmhuxlcvkxay.com, ctskthnhq.com, safkylboxhb.com
  • simda: lykef.eu, qekol.eu, puzej.eu, galin.eu
  • dyre: a3f6e2d182a40304a8874e994a294ec314.cc, b5191b0ad53da1f1fa66653610e7601856.ws, cc466dc54278d8e0fe14bdd2038b927e6f.to
  • gameover-zeus: 1g22l018lpt4alpeypioqq24k.com, 1yz3uuo1yg5zmf1u7goe81sy0xy9.net, 1fhvdfa1hr7na1gu9vmv6r710j.biz, 5bpzt0njqbkqlbwupc8vi3yt.org
  • banjori: xjsrrsensinaix.com, hlrfrsensinaix.com, antisemitismgavenuteq.com, bnmtsemitismgavenuteq.com

slide-13
SLIDE 13

malicious usage in botnets

slide-14
SLIDE 14

c&c server’s name example

Every second infected host tries to connect to hundreds or thousands of algorithmically generated domain names

  • most of the domains return an NX (NXDOMAIN) response
  • the attacker needs only a couple of registered domains
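
Because most generated names do not exist, a host that collects many NXDOMAIN answers is a cheap first signal. The following is a minimal sketch, assuming a hypothetical log of (client_ip, queried_name, rcode) tuples and arbitrary thresholds; none of these names or values come from the talk.

```python
from collections import defaultdict


def nxdomain_suspects(records, min_names=3, min_ratio=0.5):
    """Flag hosts whose lookups are dominated by NXDOMAIN answers.

    records: iterable of (client_ip, queried_name, rcode) tuples (assumed format).
    min_names, min_ratio: illustrative thresholds, not tuned values.
    Returns {client_ip: nx_ratio} for the flagged hosts.
    """
    total = defaultdict(int)       # all answers seen per host
    nx_total = defaultdict(int)    # NXDOMAIN answers per host
    nx_names = defaultdict(set)    # distinct non-existent names per host
    for client, name, rcode in records:
        total[client] += 1
        if rcode == "NXDOMAIN":
            nx_total[client] += 1
            nx_names[client].add(name.lower())

    suspects = {}
    for client, names in nx_names.items():
        ratio = nx_total[client] / total[client]
        if len(names) >= min_names and ratio >= min_ratio:
            suspects[client] = ratio
    return suspects


# Invented sample traffic: one noisy host, one host with a single typo.
records = [
    ("10.0.0.5", "qx1.example", "NXDOMAIN"),
    ("10.0.0.5", "qx2.example", "NXDOMAIN"),
    ("10.0.0.5", "qx3.example", "NXDOMAIN"),
    ("10.0.0.5", "qx4.example", "NOERROR"),
    ("10.0.0.9", "www.example", "NOERROR"),
    ("10.0.0.9", "wwww.example", "NXDOMAIN"),
]
print(nxdomain_suspects(records))  # {'10.0.0.5': 0.75}
```

A real deployment would also need a time window and a whitelist, since benign software can produce NXDOMAIN bursts too.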

slide-16
SLIDE 16

dga botnet communication

  • DNS communication
  • an algorithm that generates domain names
  • a seed shared between the botmaster and the clients
  • victims locate the C&C server through DNS queries

slide-20
SLIDE 20

generator’s seed

Is it easy to predict and sinkhole DGA domains ahead of time?

slide-21
SLIDE 21

generator’s seed

All algorithmically generated domains depend on a specific seed:

  • date (CryptoLocker, Conficker, GameOverZeus)
  • currently trending Twitter hashtag (Torpig)
  • seed hardcoded in the infected file (Tinba)
  • ...

Figure 1: Ramnit

Seeds are globally consistent - all victims use the same one at the same time
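
The role of the seed can be illustrated with a toy generator. This is a sketch written for this text only: it does not reproduce the algorithm of any malware family named above, and the hash choice, the seed string and the function name `toy_dga` are all assumptions. It shows why anyone who knows both the algorithm and the seed (bot, botmaster or analyst) obtains the same list for a given day, which is what makes sinkholing ahead of time possible.

```python
import hashlib
from datetime import date


def toy_dga(seed, day, count=5, tld=".com"):
    """Derive `count` pseudo-random domain names from a shared seed and a date."""
    domains = []
    for i in range(count):
        material = f"{seed}|{day.isoformat()}|{i}".encode("ascii")
        digest = hashlib.sha256(material).hexdigest()
        # Map the first 12 hex digits to the letters a..p to form the label.
        label = "".join(chr(ord("a") + int(c, 16)) for c in digest[:12])
        domains.append(label + tld)
    return domains


# The same inputs always give the same list; a different date gives a new one.
print(toy_dga("shared-secret", date(2015, 9, 22)))
print(toy_dga("shared-secret", date(2015, 9, 23)))
```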

slide-29
SLIDE 29

is it a serious problem? which malware uses dga?

  • Dyre
  • GameOverZeus
  • Banjori
  • Matsu
  • Pushdo
  • Emotet
  • Pykspa
  • Ramnit
  • Conficker
  • Bobax
  • Murofet
  • Necurs
  • Shiotob
  • CryptoLocker
  • Rovnix
  • Gozi
  • BankPatch
  • Qakbot
  • DirCrypt
  • Flashback
  • Ramdo

AND MORE ...

slide-30
SLIDE 30

different techniques but still dga

  • domain names contain random alphanumeric characters and dictionary words
  • names are built from English syllables

slide-32
SLIDE 32

benign dga - false alarms in detection systems

slide-33
SLIDE 33

requests of av tools

Example¹

  • 0.0.0.0.1.0.0.4e.135jg5e1pd7s4735ftrqweufm5.avqs.mcafee.com
  • 0.0.0.0.1.0.0.4e.13cfus2drmdq3j8cafidezr8l6.avqs.mcafee.com
  • 0.0.0.0.1.0.0.4e.13kqas3qjj46ttkdhastkrdsv6.avqs.mcafee.com
  • 0.0.0.0.1.0.0.4e.13pq3hfpunqn1d51pmvbdkk5s6.avqs.mcafee.com
  • 0.0.0.0.1.0.0.4e.13qh71bf782qb54uzz9uhdz4mq.avqs.mcafee.com

This higher level domain contains basic information about the file, its hash, the version of the McAfee system and information about the execution environment

¹ DNS Noise: Measuring the Pervasiveness of Disposable Domains in Modern DNS Traffic, Yizheng Chen et al.

slide-35
SLIDE 35

internationalized domain name

  • in their ASCII (Punycode) form, IDN labels always begin with the 'xn--' prefix
  • IDNs are now also used for malicious purposes
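
A Punycode label looks random to a lexical classifier, so it helps to recognize IDNs and decode them before computing features. This is a small sketch using Python's built-in `idna` codec (which implements the older IDNA 2003 rules); the helper names are assumptions made for this example.

```python
def is_idn(domain):
    """True if any label of the domain is in Punycode form (starts with 'xn--')."""
    return any(label.lower().startswith("xn--") for label in domain.split("."))


def to_unicode(domain):
    """Decode Punycode labels into their Unicode form; plain labels pass through."""
    return domain.encode("ascii").decode("idna")


print(is_idn("xn--mnchen-3ya.de"))      # True
print(to_unicode("xn--mnchen-3ya.de"))  # münchen.de
print(is_idn("example.com"))            # False
```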

slide-39
SLIDE 39

current detection techniques - classification

slide-40
SLIDE 40

detection techniques classification

slide-41
SLIDE 41

what kind of data do we have?

Environment:

  • Probe placement (LAN, upper levels of DNS hierarchy)
  • Network trace (NX, XD, NX+XD, only requests)
  • Raw text data (e.g. zone dump)

The input data always determines the solution

slide-46
SLIDE 46

architecture

GROUND TRUTH

slide-47
SLIDE 47

botnet detection - architecture solutions

Methods based on:

  • correlations
      • analysis of DNS traffic between all hosts in the network
  • DGA features
      1. DNS traffic features
          • TTL value-based features
          • DNS answer-based features
          • SOA record
      2. Lexical features
          • domain length
          • number of n-grams
          • character entropy
          • frequency of character usage
          • domain level
  • hybrid
      • mix of correlations and DGA features
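
The lexical features listed above are cheap to compute from the name alone. Below is a minimal sketch; the function names and the choice to analyze only the label directly left of the TLD are assumptions made for this example (multi-label suffixes such as co.uk would need a public-suffix list).

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())


def lexical_features(domain):
    """Compute a few lexical features of the label directly left of the TLD."""
    labels = domain.lower().rstrip(".").split(".")
    name = labels[-2] if len(labels) >= 2 else labels[0]
    return {
        "length": len(name),
        "domain_level": len(labels),
        "distinct_bigrams": len({name[i:i + 2] for i in range(len(name) - 1)}),
        "entropy": shannon_entropy(name),
        "char_frequency": Counter(name),
    }


for d in ("lykef.eu", "a3f6e2d182a40304a8874e994a294ec314.cc", "www.example.com"):
    f = lexical_features(d)
    print(d, f["length"], f["domain_level"], round(f["entropy"], 2))
```

No single feature separates DGA names from benign ones; such values are normally fed to a classifier together.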

slide-59
SLIDE 59

correlation method

An example of a method based on correlations between hosts in a network is the solution by K. Sato et al. in Extending Black Domain Name List by Using Co-occurrence Relation between DNS queries.²

  • check DNS traffic for all hosts in the network
  • choose hosts with suspicious traffic
  • correlate queries sent by infected hosts with queries sent by non-infected hosts

² Kazumichi Sato, Keisuke Ishibashi, Tsuyoshi Toyono, Nobuhisa Miyake, Extending Black Domain Name List by Using Co-occurrence Relation between DNS queries
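
The idea of the three steps above can be sketched as follows. This is a simplified co-occurrence score written for illustration, not the scoring function from the cited paper: hosts that queried a known black domain are treated as infected, and every other domain is scored by the share of its querying hosts that are infected. The data layout and names are assumptions.

```python
from collections import defaultdict


def cooccurrence_scores(queries_by_host, blacklist):
    """Score unknown domains by how exclusively infected hosts query them.

    queries_by_host: {host: set of queried domains} (assumed format).
    blacklist: set of already known malicious domains.
    Returns {domain: fraction of its querying hosts that are infected}.
    """
    infected = {h for h, doms in queries_by_host.items() if doms & blacklist}
    hosts_by_domain = defaultdict(set)
    for host, doms in queries_by_host.items():
        for d in doms:
            if d not in blacklist:
                hosts_by_domain[d].add(host)
    return {d: len(hosts & infected) / len(hosts)
            for d, hosts in hosts_by_domain.items()}


queries = {
    "h1": {"bad.example", "x1.example", "news.example"},
    "h2": {"bad.example", "x1.example"},
    "h3": {"news.example", "mail.example"},
}
scores = cooccurrence_scores(queries, {"bad.example"})
print(scores["x1.example"], scores["news.example"], scores["mail.example"])  # 1.0 0.5 0.0
```

Domains with a high score and enough distinct querying hosts become candidates for extending the blacklist.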

slide-63
SLIDE 63

lexical features

Label length distribution in a dataset of 12000+ DGA domains³

³ OpenDNS Security Lab

slide-64
SLIDE 64

lexical features - example

In the paper Detecting Algorithmically Generated Malicious Domain Names, S. Yadav and A. Reddy described a system based mainly on lexical features.⁴

Features:

  • length
  • entropy
  • K-L divergence
  • Jaccard index between bigrams
  • edit distance

⁴ Sandeep Yadav, Ashwath K.K. Reddy and A.L. Narasimha Reddy, Detecting Algorithmically Generated Malicious Domain Names
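
Three of the listed measures can be written down in a few lines. The sketch below computes them for a pair of strings, which is a simplification: the paper applies such metrics to groups of domains, and the add-alpha smoothing in `kl_divergence` is an assumption added here to avoid division by zero.

```python
import math
from collections import Counter


def bigrams(s):
    """Set of character bigrams of a string."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard_index(a, b):
    """Jaccard index between the bigram sets of two strings."""
    set_a, set_b = bigrams(a), bigrams(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def edit_distance(a, b):
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def kl_divergence(sample, reference, alpha=1.0):
    """D_KL(P || Q) in bits between character distributions, add-alpha smoothed."""
    alphabet = set(sample) | set(reference)
    p_counts, q_counts = Counter(sample), Counter(reference)
    p_total = len(sample) + alpha * len(alphabet)
    q_total = len(reference) + alpha * len(alphabet)
    total = 0.0
    for ch in alphabet:
        p = (p_counts[ch] + alpha) / p_total
        q = (q_counts[ch] + alpha) / q_total
        total += p * math.log2(p / q)
    return total


print(jaccard_index("xjsrrsensinaix", "hlrfrsensinaix"))
print(edit_distance("kitten", "sitting"))  # 3
```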

slide-69
SLIDE 69

lexical feature ratios comparison

Domain length vs n-gram frequency in the PL zone

slide-70
SLIDE 70

machine learning

Regardless of which set of features we choose (lexical or based on DNS traffic), a classifier needs to build a detection model.

  • it is based on ground truth
  • most approaches use genetic algorithms, decision trees (J48, Random Forests) or other classifiers such as SVM

slide-72
SLIDE 72

example of machine learning use

A combination of lexical and DNS traffic features was used by Bilge et al. in Exposure: A Passive DNS Analysis Service to Detect and Report Malicious Domains.⁵

  • compute feature ratios based on ground truth
  • create a training model (J48)
  • compute feature ratios for the host
  • classify the host by comparing its ratios with the training model

⁵ Leyla Bilge, Engin Kirda, Christopher Kruegel and Marco Balduzzi, EXPOSURE: Finding Malicious Domains Using Passive DNS Analysis
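
The train-then-classify loop above can be shown with the simplest possible tree: a one-level decision stump on a single feature. This is a stand-in chosen to keep the example dependency-free, not J48 and not the EXPOSURE implementation; the feature values are invented, and the sketch assumes that malicious samples have higher values.

```python
def train_stump(samples):
    """Pick the threshold on one feature that misclassifies the fewest samples.

    samples: list of (feature_value, is_malicious) pairs (the ground truth).
    Returns a threshold t; values above t are classified as malicious.
    Returns None if all feature values are identical.
    """
    values = sorted({value for value, _ in samples})
    best_t, best_errors = None, len(samples) + 1
    for low, high in zip(values, values[1:]):
        t = (low + high) / 2  # candidate split between neighbouring values
        errors = sum((value > t) != label for value, label in samples)
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


def classify(value, threshold):
    """Apply the trained model to a new feature value."""
    return value > threshold


# Invented ground truth: (name entropy, is_malicious).
ground_truth = [(1.5, False), (2.0, False), (2.2, False),
                (3.1, True), (3.4, True), (3.8, True)]
threshold = train_stump(ground_truth)
print(classify(3.5, threshold), classify(1.9, threshold))  # True False
```

A full decision tree repeats this split search recursively over many features.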

slide-78
SLIDE 78

dga in top-level domains

⁵ OpenDNS Security Lab

slide-79
SLIDE 79

hybrid method

Phoenix: DGA-Based Botnet Tracking and Intelligence, Schiavoni et al.⁶

  • Filtering XD domains - lexical features analysis
      1. meaningful characters ratio
      2. n-gram normality score
      3. statistical linguistic ratios
  • Recursive clustering using a bipartite graph
      1. IP address features
      2. DBSCAN algorithm

⁶ Stefano Schiavoni, Federico Maggi, Lorenzo Cavallaro, Stefano Zanero, Phoenix: DGA-Based Botnet Tracking and Intelligence
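
The first filter, the meaningful characters ratio, measures how much of a name can be covered by real words. Here is a minimal sketch under stated assumptions: the three-word dictionary is a placeholder, the minimum word length of 3 is a parameter of this example, and a dynamic program finds the best non-overlapping cover.

```python
def meaningful_char_ratio(label, words, min_len=3):
    """Largest fraction of `label` covered by non-overlapping dictionary words
    that are at least `min_len` characters long."""
    n = len(label)
    if n == 0:
        return 0.0
    best = [0] * (n + 1)  # best[i]: most characters covered within label[:i]
    for i in range(1, n + 1):
        best[i] = best[i - 1]  # leave character i-1 uncovered
        for j in range(0, i - min_len + 1):
            if label[j:i] in words:
                best[i] = max(best[i], best[j] + (i - j))
    return best[n] / n


WORDS = {"face", "book", "pub"}  # placeholder dictionary for the example
print(meaningful_char_ratio("facebook", WORDS))  # 1.0
print(meaningful_char_ratio("pub03str", WORDS))  # 0.375
```

Names with a low ratio look machine-generated; dictionary-based DGAs defeat this feature, which is why it is combined with the others.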

slide-86
SLIDE 86

hybrid method

Phoenix: DGA-Based Botnet Tracking and Intelligence, Schiavoni et al.⁷

  • Fingerprinting
      1. C&C servers’ IP addresses
      2. length of the shortest and longest domain name
      3. character set used
      4. number of numerical characters in the chosen prefix of a domain from the cluster
      5. set of TLDs used in the cluster

⁷ Stefano Schiavoni, Federico Maggi, Lorenzo Cavallaro, Stefano Zanero, Phoenix: DGA-Based Botnet Tracking and Intelligence

slide-92
SLIDE 92

detection results

  • Phoenix (hybrid method): FPR ?, TPR 81.4-94.8
  • Exposure (machine learning): FPR 0.3-1.1, TPR 98.4-99.5
  • Lexical method: FPR 0.3-0.8, TPR 83.3-100.0
  • Correlation method: FPR 0.5, TPR 80.0
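
For reference, FPR and TPR are computed from the confusion counts of a labelled test set. The counts in this sketch are invented for illustration and are not taken from any of the systems above.

```python
def detection_rates(tp, fp, tn, fn):
    """Return (TPR, FPR) from confusion counts.

    TPR = TP / (TP + FN): share of malicious items that were detected.
    FPR = FP / (FP + TN): share of benign items that were flagged.
    """
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Hypothetical test set: 100 malicious and 1000 benign domains.
tpr, fpr = detection_rates(tp=80, fp=5, tn=995, fn=20)
print(f"TPR {tpr:.1%}, FPR {fpr:.1%}")  # TPR 80.0%, FPR 0.5%
```

Because benign names vastly outnumber malicious ones in real traffic, even a small FPR can produce more false alarms than true detections.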

slide-93
SLIDE 93

current detection and protection techniques

As we have seen, two general properties define each DGA:

  • predictability
  • time-dependence

So far, no detection system can predict all generated domains without the malware’s algorithm and seed.

slide-98
SLIDE 98

challenges and conclusion

slide-99
SLIDE 99

Questions?
