Melting the Snow: Detecting Snowshoe Spam Domains Using Active DNS Measurements

Olivier van der Toorn <o.i.vandertoorn@utwente.nl> November 13, 2018 University of Twente, Design and Analysis of Communication Systems NOMS 2018 Melting the Snow: Detecting Snowshoe Spam Domains Using Active DNS Measurements


slide-1
SLIDE 1

Melting the Snow: Detecting Snowshoe Spam Domains Using Active DNS Measurements

Olivier van der Toorn <o.i.vandertoorn@utwente.nl> November 13, 2018

University of Twente, Design and Analysis of Communication Systems

NOMS 2018

slide-2
SLIDE 2

Introduction

slide-10
SLIDE 10

Snowshoe Spam

Internet

Spam

  • few hosts
  • many messages per host

Snowshoe Spam

  • many hosts
  • few messages per host
slide-11
SLIDE 11

Assumption

slide-13
SLIDE 13

Assumption: Background

SPF record from ‘consultant.com’

    v=spf1 ip4:213.165.64.0/23 ip4:74.208.5.64/26 ip4:74.208.122.0/26 ip4:212.227.126.128/25 ip4:212.227.15.0/24 ip4:212.227.17.0/27 ip4:74.208.4.192/26 ip4:82.165.159.0/24 ip4:217.72.207.0/27 -all

Email from ‘paypoint_sanchez@consultant.com’

    Received-SPF: fail (google.com: domain of paypoint_sanchez@consultant.com does not designate 103.10.4.139 as permitted sender) client-ip=103.10.4.139;

Typical usage of SPF

    v=spf1 a mx ip4:167.160.22.0/24 -all
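The SPF failure on this slide can be reproduced with a short sketch. It assumes a deliberately simplified evaluation that only looks at `ip4:` mechanisms and ignores `a`, `mx`, `include`, `ip6` and qualifiers; the helper names `ip4_networks` and `is_permitted` are illustrative and not part of the presentation.

```python
import ipaddress

# SPF record of consultant.com, as shown on the slide.
CONSULTANT_SPF = (
    "v=spf1 ip4:213.165.64.0/23 ip4:74.208.5.64/26 ip4:74.208.122.0/26 "
    "ip4:212.227.126.128/25 ip4:212.227.15.0/24 ip4:212.227.17.0/27 "
    "ip4:74.208.4.192/26 ip4:82.165.159.0/24 ip4:217.72.207.0/27 -all"
)


def ip4_networks(spf_record):
    """Return the networks named by the ip4: mechanisms of an SPF record."""
    networks = []
    for term in spf_record.split()[1:]:  # skip the "v=spf1" version tag
        if term.startswith("ip4:"):
            networks.append(ipaddress.ip_network(term[len("ip4:"):], strict=False))
    return networks


def is_permitted(spf_record, client_ip):
    """Simplified check: is client_ip inside any ip4: network of the record?"""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ip4_networks(spf_record))


# The sending host from the Received-SPF header is not covered by the record.
print(is_permitted(CONSULTANT_SPF, "103.10.4.139"))
```

Because the record ends in `-all`, a sender that matches none of the listed networks results in `fail`, which is what the `Received-SPF` header reports.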

slide-22
SLIDE 22

Hypothesis

While snowshoe spammers are hard to detect, they still leave a trace in the DNS.

Snowshoe spam + SPF

Many hosts + a DNS record for each host or a long SPF record

Domain with many records or long SPF records

Active DNS measurements are a good way to detect snowshoe spam domains.

slide-23
SLIDE 23

Methodology

slide-27
SLIDE 27

Overview

  • OpenINTEL (DNS data source)
  • Machine Learning (processing)
  • Realtime Blackhole List (storage)
  • SURFnet (validation)

slide-28
SLIDE 28

OpenINTEL: Background

  • Active DNS measurement platform
  • Queries more than 60% of registered domain names (in total more than 206 million) for:
      • A
      • AAAA
      • MX
      • NS
  • A new measurement starts every 24 hours

slide-32
SLIDE 32

OpenINTEL: Datasets & Features

37 features

  • Simple: number of MX addresses
  • Complex: number of IP addresses inside an SPF record

These features are not computed for every domain in OpenINTEL.
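The two example features can be sketched as follows. This is an illustration only: the `(record type, value)` tuple layout, the `example.com` host names and the function names are assumptions, and the presentation does not state how the features are actually computed. The SPF string is the "typical usage" record from the Assumption: Background slide.

```python
import ipaddress


def count_mx(records):
    """Simple feature: number of MX records observed for a domain."""
    return sum(1 for record_type, _ in records if record_type == "MX")


def count_spf_ip4_addresses(records):
    """Complex feature: number of IPv4 addresses covered by ip4: mechanisms."""
    total = 0
    for record_type, value in records:
        if record_type == "TXT" and value.startswith("v=spf1"):
            for term in value.split()[1:]:
                if term.startswith("ip4:"):
                    network = ipaddress.ip_network(term[len("ip4:"):], strict=False)
                    total += network.num_addresses
    return total


# Hypothetical measurement result for one domain.
records = [
    ("MX", "10 mail.example.com."),
    ("MX", "20 backup.example.com."),
    ("TXT", "v=spf1 a mx ip4:167.160.22.0/24 -all"),
]

print(count_mx(records), count_spf_ip4_addresses(records))
```

A domain whose SPF record authorises many large networks, or which has an unusually high record count, would stand out on features of this kind.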

slide-36
SLIDE 36

OpenINTEL: Long Tail Analysis

slide-37
SLIDE 37

Machine Learning

slide-40
SLIDE 40

Machine Learning: 12 algorithms

We have trained and evaluated 12 Machine Learning algorithms.

  • Training dataset from domains on the long tail which appear in known blacklists.

The performance of each classifier is compared based on the precision metric.

$$\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$$

We selected the ‘AdaBoost’ classifier, since it had the highest precision: 98%, with a false positive rate (FPR) of 1%.
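As a worked example of the two metrics, the sketch below computes precision and the false positive rate from confusion-matrix counts. The counts are made up so that they reproduce the reported percentages; they are not the evaluation data behind the presentation.

```python
def precision(true_positives, false_positives):
    """Share of domains flagged as spam that really are spam."""
    return true_positives / (true_positives + false_positives)


def false_positive_rate(false_positives, true_negatives):
    """Share of ham domains that were wrongly flagged as spam."""
    return false_positives / (false_positives + true_negatives)


# Hypothetical counts, chosen only to illustrate 98% precision and 1% FPR.
tp, fp, tn = 98, 2, 198

print(f"precision = {precision(tp, fp):.2%}")
print(f"FPR       = {false_positive_rate(fp, tn):.2%}")
```

Optimising for precision keeps the number of legitimate domains that end up on the list low, which matters when the output is used as a blacklist.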

slide-41
SLIDE 41

Realtime Blackhole List (RBL)

slide-42
SLIDE 42

SURFnet

slide-43
SLIDE 43

Methodology: Recap

  • Active DNS measurements of more than 60% of registered domain names form the source of our data.
  • We filter out large domains via the Long Tail Analysis.
  • We selected the AdaBoost classifier, since it had the highest precision.
  • The results of our daily detections are stored in an RBL.
  • We have evaluated the RBL in SURFmailfilter.
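How a mail filter might consult such an RBL can be sketched under one assumption: that the list is published the way DNS-based blacklists conventionally are, where the domain is prepended to the list's zone and an answer in 127.0.0.0/8 means "listed". The zone `rbl.example.org`, the `resolve` callable and the fake resolver are hypothetical; the presentation does not describe how the RBL is served.

```python
import ipaddress

LISTED_RANGE = ipaddress.ip_network("127.0.0.0/8")


def rbl_query_name(domain, zone):
    """Build the name to look up: <domain>.<zone>."""
    return f"{domain.rstrip('.')}.{zone.strip('.')}"


def is_listed(domain, zone, resolve):
    """resolve maps a query name to a list of IPv4 strings (empty if no answer)."""
    answers = resolve(rbl_query_name(domain, zone))
    return any(ipaddress.ip_address(answer) in LISTED_RANGE for answer in answers)


def fake_resolve(name):
    """Stand-in for a real DNS lookup so the sketch runs without network access."""
    zone_data = {"twirlmore.com.rbl.example.org": ["127.0.0.2"]}
    return zone_data.get(name, [])


print(is_listed("twirlmore.com", "rbl.example.org", fake_resolve))
print(is_listed("google.com", "rbl.example.org", fake_resolve))
```

Passing the resolver in as a parameter keeps the lookup logic testable; in a deployment it would be replaced by an actual DNS query.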

slide-47
SLIDE 47

Results

slide-48
SLIDE 48

Distinction between two types

[Figure: two CDF plots comparing spam and ham domains, one over the number of A records and one over the number of MX records. Annotated values: 11.2 and 16.6 (A records plot), 77.0 (MX records plot).]

slide-49
SLIDE 49

Example

Domain                   A records   MX records
(ham)  google.com                1            5
(spam) giftiedan.com            61            1
(spam) twirlmore.com             1          253

slide-58
SLIDE 58

RBL comparison (2 month period)

[Figure: number of detected domains (logarithmic axis, 1 to 100000) against detection in advance (days, axis ticks 10 to 80), with regions marked Δt < 2 days and Δt ≥ 2 days. Annotated values, in order of appearance: 30705, 1972, 1154, 1105, 971, 949.]

slide-59
SLIDE 59

RBL comparison (9 month period)

[Figure: number of detected domains (logarithmic axis, 1 to 10000) against detection in advance (days, axis ticks 20 to 180), with regions marked Δt < 2 days and Δt ≥ 2 days. Annotated values: 57724, 6710, 1305, 205.]

slide-62
SLIDE 62

SURFnet evaluation

[Figure: timeline per domain name (daadzgam.com, realdrippy.com, coachspoke.com, stillscratch.com, homerope.com, quittradition.com) over observation dates from 2017-05-24 to 2017-07-23, with markers for "Blacklisted" and "Detected".]

slide-63
SLIDE 63

SURFnet evaluation

Δt < 2 days

  • 45% of received emails fall in this category
  • 18% of observed domains fall in this category

slide-64
SLIDE 64

SURFnet evaluation

Δt ≥ 2 days

  • 17% of received emails fall in this category
  • 26% of observed domains fall in this category

slide-65
SLIDE 65

SURFnet evaluation

?: domain not on existing blacklist yet

  • 38% of received emails fall in this category
  • 57% of observed domains fall in this category
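The three categories used in this evaluation can be expressed as a small sketch. It assumes that Δt is the number of days between our detection of a domain and its first appearance on an existing blacklist, which is inferred from the "Detection in advance (days)" axis label rather than stated outright; the dates and the function name are hypothetical.

```python
from datetime import date


def categorise(detected, blacklisted):
    """Assign a domain to one of the three evaluation categories."""
    if blacklisted is None:
        return "not on existing blacklist yet"
    delta_days = (blacklisted - detected).days
    return "Δt < 2 days" if delta_days < 2 else "Δt ≥ 2 days"


# Hypothetical detection and blacklisting dates.
print(categorise(date(2017, 6, 23), date(2017, 6, 24)))
print(categorise(date(2017, 5, 24), date(2017, 6, 23)))
print(categorise(date(2017, 7, 23), None))
```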

slide-66
SLIDE 66

SURFnet evaluation

  • 41% of emails were received in the purple areas
  • 59% of these emails have not been marked as spam

slide-69
SLIDE 69

SURFnet evaluation

[Figure: percentage of emails marked as spam (0% to 100%) against the additional score of the RBL (0.0 to 5.0 in steps of 0.5). Annotated values, in ascending order: 0%, 3%, 19%, 50%, 53%, 55%, 70%, 79%, 88%, 99%, 100%.]

slide-70
SLIDE 70

Conclusion

slide-71
SLIDE 71

Conclusion

  • Using active DNS measurements and machine learning, we are able to detect snowshoe spam domains.
  • We detect domains 2 to 180 days earlier than other blacklists.
  • This time advantage translates into additional emails being marked as spam, emails which would otherwise bypass the email filter.
  • After the evaluation period, SURFnet deployed our RBL in production.

slide-75
SLIDE 75

Thank you & questions

Thank you for listening.¹ Are there any questions?

¹ Images are from Pixabay and Wikimedia